2020-09-09
|
08:40 |
<oblivian@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
08:40 |
<oblivian@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . |
[production] |
08:36 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json |
[production] |
08:34 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . |
[production] |
08:34 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
08:30 |
<kormat@cumin1001> |
dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json |
[production] |
08:30 |
<kormat@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
08:30 |
<kormat@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:14 |
<oblivian@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . |
[production] |
07:41 |
<urbanecm@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s) |
[production] |
07:25 |
<elukey> |
restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage |
[production] |
06:21 |
<XioNoX> |
push new pfw policies - T262297 |
[production] |
01:58 |
<eileen> |
civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc |
[production] |
2020-09-08
|
23:47 |
<eileen> |
civicrm revision is 4e40a59d42, config revision is d26334fa36 |
[production] |
23:25 |
<eileen> |
civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789 |
[production] |
22:14 |
<pt1979@cumin2001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
22:12 |
<andrew@deploy1001> |
Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s) |
[production] |
22:08 |
<andrew@deploy1001> |
Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update |
[production] |
22:02 |
<pt1979@cumin2001> |
START - Cookbook sre.dns.netbox |
[production] |
21:57 |
<andrew@deploy1001> |
Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s) |
[production] |
21:57 |
<andrew@deploy1001> |
Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks |
[production] |
19:19 |
<jhuneidi@deploy1001> |
rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8 |
[production] |
19:12 |
<jhuneidi@deploy1001> |
Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s) |
[production] |
18:22 |
<elukey> |
rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305 |
[production] |
18:00 |
<jhuneidi@deploy1001> |
Started scap: testwikis wikis to 1.36.0-wmf.8 |
[production] |
17:58 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-reload |
[production] |
17:57 |
<ryankemper@cumin1001> |
END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) |
[production] |
17:54 |
<Amir1> |
Deployed patch for T262240 |
[production] |
17:53 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-reload |
[production] |
17:23 |
<andrewbogott> |
rebooting cloudvirt1033 |
[production] |
17:03 |
<klausman> |
attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs |
[production] |
16:35 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s) |
[production] |
16:34 |
<herron> |
increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag) |
[production] |
16:12 |
<longma> |
1.36.0-wmf.8 was branched at e81e81e91473cc8259c473165863aca8ecea2784 for T257976 |
[production] |
16:03 |
<akosiaris@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
16:03 |
<akosiaris@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
16:02 |
<akosiaris@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
15:34 |
<jayme@cumin1001> |
conftool action : set/pooled=yes; selector: name=kubernetes1004.* |
[production] |
15:32 |
<jayme@cumin1001> |
conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.* |
[production] |
15:30 |
<elukey> |
roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed |
[production] |
15:26 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) |
[production] |
15:20 |
<_joe_> |
restarted celery-ores-worker.service on ores1007 |
[production] |
15:19 |
<_joe_> |
restarted ferm on wdqs1011 |
[production] |
15:18 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-masters |
[production] |
15:16 |
<_joe_> |
starting wdqs-updater on wdqs1005 |
[production] |
15:15 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet |
[production] |
15:14 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet |
[production] |
15:14 |
<bblack> |
repool cp1087-90 (eqiad row D) |
[production] |
15:13 |
<herron> |
rolling restart of elk5 logstashes |
[production] |
15:10 |
<marostegui> |
Start mysql on db1106 after PDU maintenance is done |
[production] |