2021-06-04
05:10 <marostegui@cumin1001> dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json [production]
04:43 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE [production]
04:41 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE [production]
04:25 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
04:22 <ryankemper> T280382 `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv` [production]
03:49 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
02:42 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
02:33 <ryankemper> [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph` [production]
02:32 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
02:30 <ryankemper> T280382 `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv` [production]
02:25 <ryankemper> [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag) [production]
02:09 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
02:06 <ebernhardson> post-deploy restart airflow-(webserver|scheduler) on an-airflow1001 [production]
02:05 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s) [production]
02:00 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift [production]
01:38 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
01:24 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer [production]
00:12 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
00:08 <reedy@deploy1002> Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 57s) [production]
00:07 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
00:06 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer [production]
00:05 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
00:05 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
00:05 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [production]
2021-06-03
23:41 <reedy@deploy1002> Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s) [production]
23:40 <reedy@deploy1002> Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s) [production]
23:33 <mutante> installing OS on fresh VM doh5001 [production]
23:30 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE [production]
23:28 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE [production]
23:09 <thcipriani@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686|Restrict changetags to sysops and bots on meta]] T283625 (duration: 00m 58s) [production]
22:41 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
22:39 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
22:39 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
22:36 <ryankemper> T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after [production]
22:36 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [production]
22:35 <ryankemper> T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv` [production]
22:28 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
22:15 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
21:55 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
20:54 <shdubsh> restart kafka on kafka-logging to take new retention config [production]
20:47 <sbassett> Deployed security patch for T282932 [production]
20:37 <ebernhardson> restart mjolnir-kafka-bulk-daemon on search-loader[12]001 [production]
20:35 <ebernhardson@deploy1002> Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s) [production]
20:34 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
20:34 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
20:34 <ebernhardson@deploy1002> Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container [production]
20:34 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
20:34 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer [production]
19:58 <mutante> [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts [production]
19:56 <mutante> [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service [production]