2020-12-23
|
21:33 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:30 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
20:40 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/651819 |
[releng] |
20:32 |
<bstorm> |
Created the directory /srv/misc/shared/wikilink/project on labstore1004 and verified puppet and nfs-exportd are happy T264107 |
[wikilink] |
19:48 |
<James_F> |
Zuul: [mediawiki/tools/dependency-analysis] Add composer test CI |
[releng] |
19:20 |
<bstorm> |
created clouddb-wikireplicas-proxy-1 and clouddb-wikireplicas-proxy-2 as well as the 16 neutron ports for wikireplicas proxying |
[clouddb-services] |
19:03 |
<balloons> |
resized deployment-puppetdb03 to g2.cores2.ram4.disk40 (T270420) |
[deployment-prep] |
19:03 |
<balloons> |
resized deployment-puppetdb03 to g2.cores2.ram4.disk40 (T270420) |
[releng] |
16:58 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:51 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:51 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
16:44 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
15:53 |
<ottomata> |
point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768 |
[analytics] |
15:38 |
<andrewbogott> |
restarting rabbitmq on cloudcontrol1004; suspected leaks |
[admin] |
15:35 |
<wm-bot> |
<lucaswerkmeister> deployed 6d8bae537b (Esperanto verb) |
[tools.lexeme-forms] |
15:33 |
<andrewbogott> |
restarting each cloudcontrol galera node in turn to see if that quiets down the syncing warnings |
[admin] |
15:15 |
<cdanis> |
disabling puppet on alert1001 for klaxon rollout |
[production] |
14:32 |
<wm-bot> |
<lucaswerkmeister> deployed 69f610af18 (Breton noun, without mutation, collective) |
[tools.lexeme-forms] |
12:08 |
<arturo> |
move memory out of the swap in cloudcontrol1004 by disabling/enabling it (1GB swap was being used) |
[admin] |
09:59 |
<hashar> |
gerrit: removed old gerrit directory /srv/var-lib-gerrit2-cobalt.wikimedia.org/.gerritcodereview/ (was some tmp dirs for Gerrit jars) |
[production] |
09:54 |
<volans> |
upgraded python3-wmflib to 0.0.5 on cumin1001 |
[production] |
05:54 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: [[gerrit:651682|Fix typo in autoreview right of eliminators in fawiki]] (duration: 00m 57s) |
[production] |
2020-12-22
|
22:40 |
<James_F> |
Zuul: [integration/docroot] Only test PHP 7.3+ from now on |
[releng] |
21:57 |
<mutante> |
apt1001 - sudo systemctl status rsync-aptrepo-apt2001.wikimedia.org.service - confirmed timer job is working like the cron before |
[production] |
21:31 |
<mutante> |
deploy1002/deploy2002 - apt-get remove --purge php-readline and let puppet reinstall it (7.2 vs 7.3 after gerrit 651158) T265963 |
[production] |
21:26 |
<andrewbogott> |
upgrading wikitech-static: mediawiki to 1.35.1 and general apt upgrade |
[production] |
21:16 |
<James_F> |
Docker: Building and publishing tox-labs-striker:0.5.0 |
[releng] |
20:26 |
<eileen> |
civicrm revision changed from e86e756807 to 6150267979, config revision is 52f1cbc5dd |
[production] |
19:35 |
<elukey> |
restart hive daemons on an-coord1001 to pick up new settings |
[analytics] |
19:32 |
<mutante> |
restarting gerrit to pick up config change in gitiles for T269300 |
[production] |
18:29 |
<andrew@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE |
[production] |
18:27 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: REIMAGE |
[production] |
18:22 |
<bstorm> |
rebooting the grid master because it is misbehaving following the NFS outage |
[tools] |
18:13 |
<elukey> |
failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) |
[analytics] |
18:07 |
<elukey> |
restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore rather than the one analytics-hive points to) |
[analytics] |
17:27 |
<andrewbogott> |
shutting down labstore1004 in preparation for move and reimage |
[production] |
17:00 |
<mforns> |
Deployed refinery as part of weekly train (v0.0.142) |
[analytics] |
16:51 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] (duration: 00m 08s) |
[production] |
16:51 |
<mforns@deploy1001> |
Started deploy [analytics/refinery@21c0c89] (thin): Regular analytics weekly train THIN [analytics/refinery@Ie7bce02179547ee4c6756d52f9956f492c5b4df6] |
[production] |
16:48 |
<volans> |
restarted ferm on ms-be1026 (failed with DNS query for 'ms-be1055.eqiad.wmnet' failed: query timed out) |
[production] |
16:42 |
<mforns> |
Deployed refinery-source v0.0.142 |
[analytics] |
16:30 |
<mforns> |
Deployed refinery-source v0.0.142 |
[analytics] |
16:15 |
<bstorm> |
downtimed and stopped puppet on labstore1004 and labstore1005 for failover T266202 |
[production] |
15:30 |
<dcaro> |
cleaning up 6778 dangling snapshots for glance images in eqiad (T270478) |
[admin] |
15:23 |
<jgiannelos@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . |
[production] |
15:12 |
<jgiannelos@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . |
[production] |
15:08 |
<jgiannelos@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . |
[production] |
15:00 |
<razzi> |
stopping superset server on analytics-tool1004 |
[analytics] |