2021-01-03 §
15:30 <andrewbogott> disabling puppet fleet-wide to avert potential disaster from acme-chief cert rotation T271063 [production]
14:42 <andrewbogott> restarting slapd on serpens and seaborgium [production]
11:38 <elukey> powercycle an-worker1114 (kernel errors in the serial console) [production]
09:07 <elukey> reboot ms-be2050 as an attempt to recover/fix its broken networking state (started on Dec 30th) - T271041 [production]
2021-01-02 §
19:27 <vgutierrez> restart acme-chief on acmechief1001 [production]
2021-01-01 §
14:49 <milimetric@deploy1001> Finished deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 00m 07s) [production]
14:49 <milimetric@deploy1001> Started deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent [production]
14:48 <milimetric@deploy1001> Finished deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 10m 00s) [production]
14:38 <milimetric@deploy1001> Started deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent [production]
08:52 <legoktm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Switch fiwiki to their 500k temporary logo! (T270974) (duration: 00m 55s) [production]
08:46 <legoktm@deploy1001> Synchronized static/images/project-logos/fiwiki-500k-2x.png: Add fiwiki 500k temporary logos (3/3) (duration: 00m 55s) [production]
08:45 <legoktm@deploy1001> Synchronized static/images/project-logos/fiwiki-500k-1.5x.png: Add fiwiki 500k temporary logos (2/3) (duration: 00m 54s) [production]
08:44 <legoktm@deploy1001> Synchronized static/images/project-logos/fiwiki-500k.png: Add fiwiki 500k temporary logos (1/3) (duration: 00m 58s) [production]
2020-12-29 §
15:52 <vgutierrez> reloading nginx on cloudelastic1005 and cloudelastic1006 [production]
15:48 <vgutierrez> triggering a puppet run on cp nodes [production]
15:45 <vgutierrez> restarting acme-chief on acmechief1001 [production]
2020-12-28 §
09:54 <elukey> reboot an-coord1002 (puppet in D state after issues with broken disk - host in standby, no traffic) [production]
2020-12-24 §
12:41 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:37 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
12:34 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:23 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
11:22 <volans> running on cumin1001: homer asw2-*-eqiad.mgmt.eqiad.wmnet commit "Fix numbering of an-worker hosts - T260445" [production]
11:08 <hashar> gerrit2001 (replica) restarting Gerrit server [production]
00:45 <legoktm> reset maxmind password [production]
2020-12-23 §
21:33 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:30 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:58 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:51 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:51 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
16:44 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
15:15 <cdanis> disabling puppet on alert1001 for klaxon rollout [production]
09:59 <hashar> gerrit: removed old gerrit directory /srv/var-lib-gerrit2-cobalt.wikimedia.org/.gerritcodereview/ (was some tmp dirs for Gerrit jars) [production]
09:54 <volans> upgraded python3-wmflib to 0.0.5 on cumin1001 [production]
05:54 <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:651682|Fix typo in autoreview right of eliminators in fawiki]] (duration: 00m 57s) [production]
2020-12-22 §
21:57 <mutante> apt1001 - sudo systemctl status rsync-aptrepo-apt2001.wikimedia.org.service - confirmed timer job is working like the cron before [production]
21:31 <mutante> deploy1002/deploy2002 - apt-get remove --purge php-readline and let puppet reinstall it (7.2 vs 7.3 after gerrit 651158) T265963 [production]
21:26 <andrewbogott> upgrading wikitech-static: mediawiki to 1.35.1 and general apt upgrade [production]
20:26 <eileen> civicrm revision changed from e86e756807 to 6150267979, config revision is 52f1cbc5dd [production]
19:32 <mutante> restarting gerrit to pick up config change in gitiles for T269300 [production]
18:29 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE [production]
18:27 <andrew@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE [production]
17:27 <andrewbogott> shutting down labstore1004 in preparation for move and reimage [production]
16:51 <mforns@deploy1001> Finished deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] (duration: 00m 08s) [production]
16:51 <mforns@deploy1001> Started deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] [production]
16:48 <volans> restarted ferm on ms-be1026 (failed with DNS query for 'ms-be1055.eqiad.wmnet' failed: query timed out) [production]
16:15 <bstorm> downtimed and stopped puppet on labstore1004 and labstore1005 for failover T266202 [production]
15:23 <jgiannelos@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [production]
15:12 <jgiannelos@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [production]
15:08 <jgiannelos@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . [production]
11:52 <marostegui> Set db1151 to writable T269324 [production]