2020-05-28
14:30 <andrew@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:01 <ema> atskafka 0.8 uploaded to buster-wikimedia T253551 [production]
13:49 <godog> roll-restart prometheus k8s-staging to enable thanos upload - T252186 [production]
13:36 <hashar> Restarting CI Jenkins for plugin rollback [production]
11:49 <moritzm> installing unbound security updates [production]
11:03 <kormat@cumin1001> dbctl commit (dc=all): 'Add db2138 to s2+s4 T252985', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json [production]
10:36 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
10:34 <jayme@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
10:30 <jayme@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
10:02 <mutante> gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey (T239151) [production]
09:51 <XioNoX> failover VRRP in ulsfo [production]
09:41 <XioNoX> re-activate peering/transit on cr2-eqdfw - T243080 [production]
09:35 <mutante> restarting gerrit on gerrit1002 after fixing db_pass to the readonly one (T243800) [production]
09:33 <XioNoX> restart cr2-eqdfw for upgrade - T243080 [production]
09:30 <XioNoX> deactivate peering/transit on cr2-eqdfw - T243080 [production]
09:25 <_joe_> updating ACLs on all etcd servers [production]
09:22 <XioNoX> install new Junos on cr2-eqdfw - T243080 [production]
09:16 <XioNoX> rollback cr2-eqord ospf/bgp - T243080 [production]
09:07 <XioNoX> restart cr2-eqord for upgrade - T243080 [production]
09:05 <jayme@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
08:50 <_joe_> upgrading etcd ACLs (adding new users) to conf1004 [production]
08:50 <XioNoX> install new Junos on cr2-eqord - T243080 [production]
08:46 <XioNoX> deactivate peering/transit on cr2-eqord - T243080 [production]
08:45 <XioNoX> de-pref all OSPF links to cr2-eqord - T243080 [production]
08:13 <marostegui> Pool db1141 into labsdb analytics role - T249188 [production]
07:33 <gilles@deploy1001> Synchronized static/images: T252108 Deploying optimised static PNGs (duration: 01m 39s) [production]
07:31 <gilles@deploy1001> Synchronized static/apple-touch: T252108 Deploying optimised static PNGs (duration: 01m 12s) [production]
06:30 <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover T253808', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json [production]
04:44 <marostegui> Run check_private data on db1141 - T249188 [production]
04:22 <marostegui> Stop MySQL on db1141 - T249188 [production]
2020-05-27
23:20 <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki (T252986) (duration: 01m 05s) [production]
23:16 <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there (T252959) (duration: 01m 06s) [production]
22:58 <gehel@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
22:02 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) (duration: 00m 10s) [production]
22:02 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) [production]
22:01 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) (duration: 01m 29s) [production]
22:00 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) [production]
22:00 <crusnov@deploy1001> deploy aborted: Netbox Upgrade to 2.8.4 (part2) (duration: 01m 31s) [production]
21:58 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part2) [production]
21:58 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) (duration: 01m 01s) [production]
21:57 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) [production]
20:43 <gehel@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
20:28 <marostegui> Decrease innodb poolsize on s4 master and restart mysql [production]
20:11 <mbsantos@deploy1001> Finished deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648) (duration: 03m 31s) [production]
20:08 <mbsantos@deploy1001> Started deploy [mobileapps/deploy@9dc827f]: Update mobileapps to b3b9214c (T253648) [production]
20:04 <twentyafterfour@deploy1001> Synchronized php: group1 wikis to 1.35.0-wmf.32 refs T253022 (duration: 01m 04s) [production]
20:03 <twentyafterfour@deploy1001> rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32 refs T253022 [production]
20:00 <gehel@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
19:56 <twentyafterfour@deploy1001> scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details) [production]
19:46 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.34/includes/parser/CoreParserFunctions.php: T253725 Partially revert 'Fix impedance mismatch with Parser::getRevisionRecordObject()' (duration: 01m 05s) [production]