2021-06-03 §
19:58 <mutante> [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts [production]
19:56 <mutante> [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service [production]
19:41 <dzahn@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org [production]
19:41 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org [production]
19:39 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s) [production]
19:37 <dzahn@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org [production]
19:34 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs [production]
19:33 <mutante> [deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 - icinga-wm> RECOVERY - Check systemd state on deneb is OK [production]
19:33 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
19:32 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
19:32 <mutante> [deneb:~] $ sudo systemctl start docker-reporter-releng-images [production]
19:28 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
19:27 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer [production]
19:27 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
19:27 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:23 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org [production]
19:14 <mutante> install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456 [production]
19:05 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE [production]
19:03 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE [production]
19:03 <ryankemper@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE [production]
19:01 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE [production]
18:52 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s) [production]
18:51 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter [production]
18:46 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage` [production]
18:46 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [production]
18:39 <ryankemper> [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on) [production]
18:37 <ryankemper> [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547) [production]
18:37 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 [production]
18:37 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729 [production]
18:28 <mutante> temp. disabling puppet on install* servers. switching nginx to light variant (T164456) [production]
18:16 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s) [production]
18:16 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter [production]
17:49 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE [production]
17:47 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE [production]
17:47 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE [production]
17:45 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE [production]
17:37 <brennen> gitlab1001: re-running install-gitlab-server.sh [production]
17:16 <urandom> remove dropped Cassandra keyspace snapshots -- T258414 [production]
16:55 <ejegg> updated payments-wiki from 6fac77f60e to 7be0534b91 [production]
16:23 <ayounsi@cumin1001> START - Cookbook sre.dns.netbox [production]
15:49 <topranks> Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs. [production]
15:27 <papaul> pdu replacement complete [production]
15:25 <moritzm> upgrading gitlab to 13.11.5 [production]
15:08 <papaul> disconnect ps2-d8-codfw for replacement [production]
14:55 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
14:54 <topranks> Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002. [production]
14:23 <moritzm> installing nginx security updates on buster [production]
14:12 <moritzm> installing postgresql-9.6 security updates [production]
13:55 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
13:25 <oblivian@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]