2021-02-08
10:05 <moritzm> updating netboot images to Buster 10.8 T274099 [production]
10:05 <jiji@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc2025.codfw.wmnet [production]
09:43 <XioNoX> failover pfw3-eqiad RG1 to node 0 T263833 [production]
09:42 <marostegui> Stop MySQL on db1111 T273982 [production]
09:36 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet [production]
09:23 <vgutierrez> restart varnish-fe on cp1087 [production]
09:21 <vgutierrez@cumin1001> START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet [production]
09:20 <vgutierrez> rolling restart of LVS instances to catch up on kernel upgrades [production]
09:00 <gehel> depool and restart blazegraph on wdqs1005 / wdqs1012 [production]
08:56 <XioNoX> push pfw policies T273989 [production]
08:33 <godog> swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837 [production]
07:08 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1111 T273982', diff saved to https://phabricator.wikimedia.org/P14229 and previous config saved to /var/cache/conftool/dbconfig/20210208-070858-marostegui.json [production]
06:50 <effie> Removed mc1024 from mcrouter, some resharding is expected [production]
06:13 <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1094 from dbctl T273710', diff saved to https://phabricator.wikimedia.org/P14228 and previous config saved to /var/cache/conftool/dbconfig/20210208-061319-marostegui.json [production]
2021-02-07
22:58 <Urbanecm> Reset password for TheresNoTime (T274087) [production]
2021-02-06
08:59 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
08:58 <elukey@cumin1001> START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
08:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
08:52 <elukey@cumin1001> START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
03:40 <ryankemper> Deleted dump taking up diskspace on `wdqs1009`, disk space warning will resolve now [production]
01:30 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet [production]
01:29 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet [production]
01:25 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet [production]
01:25 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet [production]
01:00 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet [production]
00:57 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet [production]
00:46 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet [production]
00:46 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet [production]
00:30 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE [production]
00:28 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE [production]
00:25 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE [production]
00:23 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE [production]
00:19 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE [production]
00:17 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE [production]
00:15 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE [production]
00:13 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE [production]
2021-02-05
23:37 <legoktm@cumin1001> conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet [production]
23:35 <ryankemper> T267927 Re-downloading latest dumps (main database, lexeme) in tmux session `downloads_dumps` on `ryankemper@wdqs1009.eqiad.wmnet` [production]
23:15 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1285.eqiad.wmnet [production]
22:56 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
22:56 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
22:50 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
22:50 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
22:46 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
22:46 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
22:42 <ryankemper> T267927 `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason 'T267927: Reload wikidata jnl from fresh dumps' --task-id T267927` failing with `ERROR org.wikidata.query.rdf.tool.Munge - Fatal error munging RDF` [production]
22:41 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
22:41 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
22:38 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
22:38 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-reload [production]