2021-09-24
§
|
09:25 |
<marostegui> |
Rename flaggedimages on db1096(ruwiki) and db1098(arwiki) T290340 |
[production] |
09:25 |
<elukey@cumin1001> |
START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 |
[production] |
09:09 |
<jynus> |
upgrade and restart db2139, db2101 |
[production] |
09:03 |
<btullis@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001 |
[production] |
08:35 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
08:22 |
<jynus> |
upgrade and restart db2098 T290868 |
[production] |
08:20 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 |
[production] |
08:08 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2002.wikimedia.org |
[production] |
07:59 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts mx2002.wikimedia.org |
[production] |
07:42 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1002.wikimedia.org |
[production] |
07:34 |
<elukey@cumin1001> |
START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 |
[production] |
07:17 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
07:11 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts mx1002.wikimedia.org |
[production] |
07:01 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
07:01 |
<elukey@cumin1001> |
END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
07:00 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
06:55 |
<elukey@cumin1001> |
START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 |
[production] |
06:53 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 |
[production] |
06:44 |
<elukey@cumin1001> |
START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 |
[production] |
06:41 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 |
[production] |
06:30 |
<elukey@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 |
[production] |
06:26 |
<elukey> |
restart archiva on archiva1002 to pick up new openjdk upgrades |
[production] |
06:11 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17324 and previous config saved to /var/cache/conftool/dbconfig/20210924-061105-root.json |
[production] |
05:56 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17323 and previous config saved to /var/cache/conftool/dbconfig/20210924-055601-root.json |
[production] |
05:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17322 and previous config saved to /var/cache/conftool/dbconfig/20210924-054057-root.json |
[production] |
05:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17321 and previous config saved to /var/cache/conftool/dbconfig/20210924-052554-root.json |
[production] |
05:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After fixing some indexes T291584', diff saved to https://phabricator.wikimedia.org/P17320 and previous config saved to /var/cache/conftool/dbconfig/20210924-051050-root.json |
[production] |
05:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1177 T291584', diff saved to https://phabricator.wikimedia.org/P17319 and previous config saved to /var/cache/conftool/dbconfig/20210924-050739-marostegui.json |
[production] |
01:27 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
01:23 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
01:16 |
<krinkle@deploy1002> |
Synchronized wmf-config/profiler.php: I25f4b70b9d4b (duration: 00m 57s) |
[production] |
00:39 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
00:39 |
<legoktm@deploy1002> |
Synchronized php-1.38.0-wmf.1/resources/src/mediawiki.searchSuggest/searchSuggest.js: Hiding fallback button depends on HTML order (T291272) (duration: 00m 57s) |
[production] |
00:36 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
2021-09-23
§
|
23:38 |
<foks> |
running wm-scripts/mcdc2021/populateEditCount.php on each wiki (s1 thru s8 simultaneously) https://phabricator.wikimedia.org/T291668 |
[production] |
22:58 |
<bd808@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
22:58 |
<foks> |
creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668 |
[production] |
22:37 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:34 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:33 |
<reedy@deploy1002> |
Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: T291668 (duration: 00m 57s) |
[production] |
22:27 |
<ryankemper> |
T280001 `ryankemper@cumin1001:~$ sudo cumin 'P{puppetmaster*}' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck |
[production] |
22:27 |
<ryankemper> |
T280001 The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/${DC}/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/${DC}/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*` |
[production] |
22:20 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:18 |
<ryankemper> |
T280001 `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10` |
[production] |
22:17 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: name=wcqs.* |
[production] |
22:17 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:13 |
<ryankemper> |
T280001 [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` |
[production] |
22:13 |
<ryankemper> |
T280001 [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` |
[production] |
22:06 |
<ryankemper> |
T280001 Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P{lvs2009*,lvs1015*}' 'sudo systemctl restart pybal'` |
[production] |
22:06 |
<ryankemper> |
T280001 Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015` |
[production] |