2021-05-03
|
11:56 |
<kharlan@deploy1002> |
Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080|refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078|Handle DB readonly errors (T281382)]] (duration: 00m 58s) |
[production] |
11:15 |
<urbanecm@deploy1002> |
Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b641c81fa16faba287407012beaff8b1f3ba: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s) |
[production] |
11:11 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s) |
[production] |
11:04 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s) |
[production] |
10:59 |
<moritzm> |
installing avahi security updates on buster |
[production] |
10:47 |
<jdrewniak@deploy1002> |
Synchronized portals: Wikimedia Portals Update: [[gerrit:684302| Bumping portals to master (T128546)]] (duration: 00m 57s) |
[production] |
10:46 |
<jdrewniak@deploy1002> |
Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:684302| Bumping portals to master (T128546)]] (duration: 00m 58s) |
[production] |
09:42 |
<moritzm> |
installing python3.7 security updates |
[production] |
09:41 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s) |
[production] |
09:12 |
<joal@deploy1002> |
Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] |
[production] |
09:10 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s) |
[production] |
09:10 |
<joal@deploy1002> |
Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] |
[production] |
09:09 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s) |
[production] |
08:52 |
<joal@deploy1002> |
Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] |
[production] |
08:01 |
<moritzm> |
installing edk2 security updates |
[production] |
07:31 |
<moritzm> |
installing libimage-exiftool-perl security updates |
[production] |
2021-04-30
|
21:54 |
<mutante> |
people1003 - rsyncing /home from people1002 |
[production] |
15:30 |
<dcaro@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host |
[production] |
15:29 |
<dcaro@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host |
[production] |
15:25 |
<bstorm> |
hard rebooting cloudmetrics1002 T275605 |
[production] |
11:40 |
<ladsgroup@deploy1002> |
Synchronized static/favicon/wikitech.ico: Config: [[gerrit:683835|Update wikitech logo]] (duration: 00m 56s) |
[production] |
11:36 |
<ladsgroup@deploy1002> |
Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:683835|Update wikitech logo]] (duration: 00m 56s) |
[production] |
11:34 |
<ladsgroup@deploy1002> |
Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:683835|Update wikitech logo]] (duration: 00m 57s) |
[production] |
11:33 |
<ladsgroup@deploy1002> |
Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:683835|Update wikitech logo]] (duration: 00m 57s) |
[production] |
11:31 |
<ladsgroup@deploy1002> |
Synchronized logos/config.yaml: Config: [[gerrit:683835|Update wikitech logo]] (duration: 00m 57s) |
[production] |
09:04 |
<dcaro@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected |
[production] |
09:03 |
<dcaro@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected |
[production] |
08:11 |
<moritzm> |
remove mc1027 from debmonitor, server is broken and won't return (T276415) |
[production] |
07:38 |
<moritzm> |
installing iputils updates from Buster point release |
[production] |
06:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json |
[production] |
06:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json |
[production] |
05:51 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 |
[production] |
05:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json |
[production] |
05:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json |
[production] |
05:16 |
<marostegui> |
Upgrade kernel on db1114 |
[production] |
05:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1114 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json |
[production] |
05:08 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet |
[production] |
04:57 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet |
[production] |
04:56 |
<ryankemper> |
[WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph` |
[production] |
04:43 |
<ryankemper> |
T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts` |
[production] |
04:43 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 |
[production] |
04:42 |
<ryankemper> |
T261239 `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now |
[production] |
04:41 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563 |
[production] |