2020-05-28
|
14:30 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:01 |
<ema> |
atskafka 0.8 uploaded to buster-wikimedia T253551 |
[production] |
13:49 |
<godog> |
roll-restart prometheus k8s-staging to enable thanos upload - T252186 |
[production] |
13:36 |
<hashar> |
Restarting CI Jenkins for plugin rollback |
[production] |
11:49 |
<moritzm> |
installing unbound security updates |
[production] |
11:03 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Add db2138 to s2+s4 T252985', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json |
[production] |
10:36 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
10:34 |
<jayme@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
10:30 |
<jayme@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
10:02 |
<mutante> |
gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey (T239151) |
[production] |
09:51 |
<XioNoX> |
failover VRRP in ulsfo |
[production] |
09:41 |
<XioNoX> |
re-activate peering/transit on cr2-eqdfw - T243080 |
[production] |
09:35 |
<mutante> |
restarting gerrit on gerrit1002 after fixing db_pass to the readonly one (T243800) |
[production] |
09:33 |
<XioNoX> |
restart cr2-eqdfw for upgrade - T243080 |
[production] |
09:30 |
<XioNoX> |
deactivate peering/transit on cr2-eqdfw - T243080 |
[production] |
09:25 |
<_joe_> |
updating ACLs on all etcd servers |
[production] |
09:22 |
<XioNoX> |
install new Junos on cr2-eqdfw - T243080 |
[production] |
09:16 |
<XioNoX> |
rollback cr2-eqord ospf/bgp - T243080 |
[production] |
09:07 |
<XioNoX> |
restart cr2-eqord for upgrade - T243080 |
[production] |
09:05 |
<jayme@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
08:50 |
<_joe_> |
upgrading etcd ACLs (adding new users) to conf1004 |
[production] |
08:50 |
<XioNoX> |
install new Junos on cr2-eqord - T243080 |
[production] |
08:46 |
<XioNoX> |
deactivate peering/transit on cr2-eqord - T243080 |
[production] |
08:45 |
<XioNoX> |
de-pref all OSPF links to cr2-eqord - T243080 |
[production] |
08:13 |
<marostegui> |
Pool db1141 into labsdb analytics role - T249188 |
[production] |
07:33 |
<gilles@deploy1001> |
Synchronized static/images: T252108 Deploying optimised static PNGs (duration: 01m 39s) |
[production] |
07:31 |
<gilles@deploy1001> |
Synchronized static/apple-touch: T252108 Deploying optimised static PNGs (duration: 01m 12s) |
[production] |
06:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover T253808', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json |
[production] |
04:44 |
<marostegui> |
Run check_private data on db1141 - T249188 |
[production] |
04:22 |
<marostegui> |
Stop MySQL on db1141 - T249188 |
[production] |
2020-05-27
|
23:20 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki (T252986) (duration: 01m 05s) |
[production] |
23:16 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there (T252959) (duration: 01m 06s) |
[production] |
22:58 |
<gehel@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
22:02 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) (duration: 00m 10s) |
[production] |
22:02 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) |
[production] |
22:01 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) (duration: 01m 29s) |
[production] |
22:00 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) |
[production] |
22:00 |
<crusnov@deploy1001> |
deploy aborted: Netbox Upgrade to 2.8.4 (part2) (duration: 01m 31s) |
[production] |
21:58 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part2) |
[production] |
21:58 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) (duration: 01m 01s) |
[production] |
21:57 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) |
[production] |
20:43 |
<gehel@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
20:28 |
<marostegui> |
Decrease innodb poolsize on s4 master and restart mysql |
[production] |
20:11 |
<mbsantos@deploy1001> |
Finished deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648) (duration: 03m 31s) |
[production] |
20:08 |
<mbsantos@deploy1001> |
Started deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648) |
[production] |
20:04 |
<twentyafterfour@deploy1001> |
Synchronized php: group1 wikis to 1.35.0-wmf.32 refs T253022 (duration: 01m 04s) |
[production] |
20:03 |
<twentyafterfour@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32 refs T253022 |
[production] |
20:00 |
<gehel@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
19:56 |
<twentyafterfour@deploy1001> |
scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details) |
[production] |
19:46 |
<jforrester@deploy1001> |
Synchronized php-1.35.0-wmf.34/includes/parser/CoreParserFunctions.php: T253725 Partially revert 'Fix impedance mismatch with Parser::getRevisionRecordObject()' (duration: 01m 05s) |
[production] |