2020-12-23 §
16:44 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
15:53 <ottomata> point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768 [analytics]
15:38 <andrewbogott> restarting rabbitmq on cloudcontrol1004; suspected leaks [admin]
15:35 <wm-bot> <lucaswerkmeister> deployed 6d8bae537b (Esperanto verb) [tools.lexeme-forms]
15:33 <andrewbogott> restarting each cloudcontrol galera node in turn to see if that quiets down the syncing warnings [admin]
15:15 <cdanis> disabling puppet on alert1001 for klaxon rollout [production]
14:32 <wm-bot> <lucaswerkmeister> deployed 69f610af18 (Breton noun, without mutation, collective) [tools.lexeme-forms]
12:08 <arturo> move memory out of the swap in cloudcontrol1004 by disabling/enabling it (1GB swap was being used) [admin]
09:59 <hashar> gerrit: removed old gerrit directory /srv/var-lib-gerrit2-cobalt.wikimedia.org/.gerritcodereview/ (was some tmp dirs for Gerrit jars) [production]
09:54 <volans> upgraded python3-wmflib to 0.0.5 on cumin1001 [production]
05:54 <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:651682|Fix typo in autoreview right of eliminators in fawiki]] (duration: 00m 57s) [production]
2020-12-22 §
22:40 <James_F> Zuul: [integration/docroot] Only test PHP 7.3+ from now on [releng]
21:57 <mutante> apt1001 - sudo systemctl status rsync-aptrepo-apt2001.wikimedia.org.service - confirmed timer job is working like the cron before [production]
21:31 <mutante> deploy1002/deploy2002 - apt-get remove --purge php-readline and let puppet reinstall it (7.2 vs 7.3 after gerrit 651158) T265963 [production]
21:26 <andrewbogott> upgrading wikitech-static: mediawiki to 1.35.1 and general apt upgrade [production]
21:16 <James_F> Docker: Building and publishing tox-labs-striker:0.5.0 [releng]
20:26 <eileen> civicrm revision changed from e86e756807 to 6150267979, config revision is 52f1cbc5dd [production]
19:35 <elukey> restart hive daemons on an-coord1001 to pick up new settings [analytics]
19:32 <mutante> restarting gerrit to pick up config change in gitiles for T269300 [production]
18:29 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE [production]
18:27 <andrew@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE [production]
18:22 <bstorm> rebooting the grid master because it is misbehaving following the NFS outage [tools]
18:13 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) [analytics]
18:07 <elukey> restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore as opposed to what it is pointed by analytics-hive) [analytics]
17:27 <andrewbogott> shutting down labstore1004 in preparation for move and reimage [production]
17:00 <mforns> Deployed refinery as part of weekly train (v0.0.142) [analytics]
16:51 <mforns@deploy1001> Finished deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] (duration: 00m 08s) [production]
16:51 <mforns@deploy1001> Started deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] [production]
16:48 <volans> restarted ferm on ms-be1026 (failed with DNS query for 'ms-be1055.eqiad.wmnet' failed: query timed out) [production]
16:42 <mforns> Deployed refinery-source v0.0.142 [analytics]
16:30 <mforns> Deployed refinery-source v0.0.142 [analytics]
16:15 <bstorm> downtimed and stopped puppet on labstore1004 and labstore1005 for failover T266202 [production]
15:30 <dcaro> cleaning up 6778 dangling snapshots for glance images in eqiad (T270478) [admin]
15:23 <jgiannelos@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [production]
15:12 <jgiannelos@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [production]
15:08 <jgiannelos@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . [production]
15:00 <razzi> stopping superset server on analytics-tool1004 [analytics]
13:51 <dcaro> merged patch to move wikidumpparse backups to cloudvirt1025 to free space on cloudvirt1026 [admin]
11:52 <marostegui> Set db1151 to writable T269324 [production]
11:16 <wm-bot> <lucaswerkmeister> deployed 6e1185532d (Basque adjective) [tools.lexeme-forms]
11:10 <jbond42> upload puppet 5.5.22 to jessie-wikimedia [production]
11:02 <jbond42> upload puppet 5.5.22 to stretch-wikimedia [production]
10:53 <arturo> rebase & resolve ugly git merge conflict in labs/private.git [tools]
10:51 <volans@cumin2001> test SAL message from wmflib, please ignore [production]
10:48 <arturo> rebase & resolve ugly git merge conflict in labs/private.git [toolsbeta]
10:36 <elukey> restart presto coordinator to pick up analytics-hive settings [analytics]
10:25 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1001 [analytics]
10:06 <volans> upgraded python3-wmflib to 0.0.5 on cumin2001 [production]
09:56 <elukey> restart hive daemons on an-coord1001 to pick up analytics-hive settings [analytics]
08:52 <hashar> gerrit: running jhat heap analyzer on gerrit2001 # T263008 [production]