2021-11-22
07:17 <Amir1> running OPTIMIZE TABLE on the image table in commonswiki on codfw with replication enabled; it will cause replication lag (T296143) [production]
07:10 <marostegui@cumin1001> dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17788 and previous config saved to /var/cache/conftool/dbconfig/20211122-071006-root.json [production]
06:55 <marostegui@cumin1001> dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17787 and previous config saved to /var/cache/conftool/dbconfig/20211122-065502-root.json [production]
06:46 <marostegui> Revoke dump grants for scholarships database T296166 [production]
06:39 <marostegui@cumin1001> dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json [production]
06:24 <marostegui@cumin1001> dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json [production]
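The four dbctl entries above are the standard gradual repool after hardware maintenance: the pool weight is stepped up 1% → 5% → 10% → 20%, with a config commit deployed after each step. A minimal sketch of one step, assuming the dbctl syntax documented on Wikitech (the flags are an assumption; only the commit message is taken from the log):

    # Bring db1131 back in at 10% of its normal weight, then commit so the
    # change is deployed to both datacenters.
    dbctl instance db1131 pool -p 10
    dbctl config commit -m 'db1131 (re)pooling @ 10%: Repool after HW maintenance'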
03:30 <Amir1> running OPTIMIZE TABLE on db2140 for the image table (T296143) [production]
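The two optimize-table entries for T296143 (07:17 and 03:30) come down to an in-place table rebuild. A minimal sketch, assuming the image table lives in the commonswiki database on db2140; the sql_log_bin toggle is an assumption about how a direct run can stay out of the binlog, whereas the 07:17 run deliberately kept replication on and accepted the lag:

    # Rebuild the image table in place to reclaim space and rebuild indexes,
    # without writing the statement to the binary log.
    mysql -h db2140.codfw.wmnet commonswiki \
        -e "SET SESSION sql_log_bin = 0; OPTIMIZE TABLE image;"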
2021-11-21
13:17 <dcausse> restarting blazegraph on wdqs1007 (jvm stuck for 10h) [production]
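A restart like the one above is a single systemd action on the host, assuming the unit carries its usual wdqs-blazegraph name:

    # Bounce the stuck Blazegraph JVM on the query service host.
    sudo systemctl restart wdqs-blazegraph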
07:26 <XioNoX> cr1-eqiad# deactivate protocols bgp group Confed_eqord [production]
05:22 <Amir1> running cleanup of djvu files on all wikis (T275268) [production]
05:13 <Amir1> end of djvu metadata maint script run (T275268) [production]
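The two entries above bracket a long-running maintenance pass over DjVu file metadata (T275268). Purely as a hedged illustration of the shape of such a run (the script name and flags are assumptions, not taken from the log):

    # Re-extract and rewrite the stored metadata of DjVu uploads, one wiki
    # at a time, using MediaWiki's refreshImageMetadata maintenance script.
    foreachwiki maintenance/refreshImageMetadata.php --force --mime image/vnd.djvu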
2021-11-20
01:02 <mutante> lists1001 - restarted apache after icinga alerts for the web UI; it has since recovered [production]
00:27 <cdanis@cumin1001> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
00:26 <cdanis@cumin1001> START - Cookbook sre.network.cf [production]
00:25 <bblack> lvs3005 - re-enabling puppet + pybal [production]
00:25 <legoktm@cumin1001> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
00:25 <legoktm@cumin1001> START - Cookbook sre.network.cf [production]
00:24 <cdanis@cumin1001> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
00:23 <cdanis@cumin1001> START - Cookbook sre.network.cf [production]
00:06 <bblack> lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007) [production]
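The 00:06/00:25 pair above is the usual LVS maintenance dance: stop pybal on one load balancer so its routes are withdrawn and the backup (lvs3007) takes over, do the work, then bring it back. A sketch of the two ends of that window, assuming the WMF disable-puppet/enable-puppet wrapper scripts (a plain `puppet agent --disable` would do as well):

    # On lvs3005: freeze puppet so it cannot restart pybal mid-maintenance,
    # then stop pybal; traffic fails over to lvs3007.
    sudo disable-puppet 'lvs3005 maintenance'
    sudo systemctl stop pybal
    # ... maintenance window ...
    sudo systemctl start pybal
    sudo enable-puppet 'lvs3005 maintenance'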
2021-11-19
23:52 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:25 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:24 <pt1979@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:15 <mutante> LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) T295789 [production]
22:59 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye [production]
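The reimage runs above and below are spicerack cookbooks driven from a cumin host; exit codes 97 and 99 mark aborted and failed runs. A sketch of the invocation, with the argument form being an assumption:

    # Reinstall prometheus2005 with Debian bullseye via the reimage cookbook.
    sudo cookbook sre.hosts.reimage --os bullseye prometheus2005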
20:24 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch [production]
20:21 <mutante> phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group (T295928) [production]
20:20 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet [production]
20:05 <legoktm@cumin1001> START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet [production]
20:00 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet [production]
19:55 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch [production]
19:51 <mutante> shutting down undead server mw2280 - no longer in icinga or puppetdb, but still in debmonitor, and it still has an IP and a puppet cert [production]
19:45 <dzahn@cumin1001> START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet [production]
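The decommission runs above are cookbooks as well; the mw2280 failure (exit_code=1) is what left the 'undead' state noted at 19:51. A sketch of the invocation, assuming the standard task flag:

    # Wipe and remove mw2280 from production; TXXXXXX is a placeholder for
    # the tracking task, not a real ID.
    sudo cookbook sre.hosts.decommission -t TXXXXXX mw2280.codfw.wmnet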
18:54 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
18:10 <andrew@deploy1002> Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s) [production]
18:06 <andrew@deploy1002> Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone [production]
17:45 <pt1979@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:41 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
17:25 <andrew@deploy1002> Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s) [production]
17:21 <andrew@deploy1002> Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing [production]
17:19 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
17:10 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:54 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
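The four mwdebug entries above come from the automated deploy flow; done by hand, the equivalent is one helmfile sync per datacenter from the service's deployment directory on the deploy host (the path is an assumption based on the usual deployment-charts layout):

    # Review, then apply, the mwdebug release in each datacenter.
    cd /srv/deployment-charts/helmfile.d/services/mwdebug
    helmfile -e eqiad diff
    helmfile -e eqiad sync
    helmfile -e codfw sync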
16:42 <thcipriani@deploy1002> rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098" [production]
16:35 <thcipriani> rolling back to group0 for T296098 [production]
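The rollback at 16:35/16:42 is a wikiversions change pushed out with scap; a sketch assuming the standard sync-wikiversions subcommand (the message is taken from the log):

    # After reverting the wikiversions change in mediawiki-config, rebuild
    # and sync the wikiversions files to all app servers.
    scap sync-wikiversions 'Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098"'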
16:20 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
15:31 <akosiaris> rolling restart of php7.2-fpm on wtp10*, excluding wtp1025 and wtp1041 [production]
15:29 <akosiaris> depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig deeper into the problem before it builds up again. [production]
15:28 <akosiaris@cumin1001> conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet [production]
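The 15:28-15:31 entries map to conftool depools plus a batched restart via cumin. The confctl selector below matches the one recorded in the log; the cumin batching flags are assumptions:

    # Depool the two hosts kept aside from traffic.
    sudo confctl select 'cluster=parsoid,name=wtp1025.eqiad.wmnet' set/pooled=no
    sudo confctl select 'cluster=parsoid,name=wtp1041.eqiad.wmnet' set/pooled=no
    # Roll php7.2-fpm across the wtp10* hosts, two at a time with a pause
    # between batches (the real run excluded wtp1025 and wtp1041; the exact
    # exclusion expression is omitted here).
    sudo cumin -b 2 -s 30 'wtp10*' 'systemctl restart php7.2-fpm'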