2023-07-27
09:56 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons. [production]
09:54 <fabfur> begin restarting lvs3005 (T335835) [production]
09:44 <fabfur> done restarting lvs3007 (T335835) [production]
09:42 <fabfur@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet [production]
09:40 <fabfur@cumin1001> START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet [production]
09:38 <fabfur> begin restarting lvs3007 (T335835) [production]
09:20 <urbanecm> Run `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=frwiki --page="Sensibilité électromagnétique" --force` to debug T342488 [production]
09:12 <fabfur> done restarting lvs1019 (T335835) [production]
09:11 <fabfur@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet [production]
09:07 <fabfur@cumin1001> START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet [production]
08:42 <fabfur> begin restarting lvs1019 (T335835) [production]
08:34 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. [production]
08:15 <jnuche@deploy1002> rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.19 refs T340247 [production]
07:54 <oblivian@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply [production]
07:54 <oblivian@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-misc: apply [production]
07:54 <oblivian@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-misc: apply [production]
07:54 <oblivian@deploy1002> helmfile [codfw] START helmfile.d/services/mw-misc: apply [production]
07:40 <XioNoX> reboot lsw1-a1-codfw (test device) [production]
06:53 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. [production]
06:39 <isaranto@deploy1002> helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
06:38 <isaranto@deploy1002> helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
06:36 <isaranto@deploy1002> helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' . [production]
06:03 <oblivian@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply [production]
05:57 <oblivian@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-misc: apply [production]
05:45 <oblivian@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-misc: apply [production]
05:40 <oblivian@deploy1002> helmfile [codfw] START helmfile.d/services/mw-misc: apply [production]
05:26 <oblivian@deploy1002> Started scap: (no justification provided) [production]
05:26 <_joe_> scap is not syncing; just rebuilding the image from scratch to verify the reason for a bug. [production]
05:22 <oblivian@deploy1002> Started scap: (no justification provided) [production]
03:19 <cstone> payments-wiki upgraded from 2a68dfe2 to 1a6ca7ab [production]
03:04 <eileen> civicrm upgraded from 5a84b138 to 16c2e58a [production]
00:54 <eileen> civicrm upgraded from 68f29b70 to 5a84b138 [production]
00:51 <eileen> civicrm upgraded from 853c14f3 to 68f29b70 [production]
00:20 <eileen> rollback because I got an error when I tried to view - so let's see [production]
00:20 <eileen> civicrm rolled back from 68f29b70 to 853c14f3 (locked) [production]
00:17 <eileen> civicrm upgraded from 853c14f3 to 68f29b70 [production]
2023-07-26
23:01 <jforrester@deploy1002> Synchronized wmf-config/interwiki.php: Update interwiki cache now that wikifunctions is here (duration: 06m 52s) [production]
21:53 <bking@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wcqs2001.codfw.wmnet [production]
21:46 <bking@cumin1001> START - Cookbook sre.hosts.reboot-single for host wcqs2001.codfw.wmnet [production]
21:23 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49745 and previous config saved to /var/cache/conftool/dbconfig/20230726-212310-ladsgroup.json [production]
21:08 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49744 and previous config saved to /var/cache/conftool/dbconfig/20230726-210804-ladsgroup.json [production]
21:04 <jhancock@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye [production]
21:04 <jhancock@cumin2002> START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye [production]
21:00 <taavi> manually attach User:WikiLambda_system to SUL T342811 [production]
20:52 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49743 and previous config saved to /var/cache/conftool/dbconfig/20230726-205257-ladsgroup.json [production]
20:37 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49742 and previous config saved to /var/cache/conftool/dbconfig/20230726-203751-ladsgroup.json [production]
20:34 <taavi@deploy1002> Finished scap: Backport for [[gerrit:941954|clienthints: Start collecting client hints data on testwiki (T341110)]], [[gerrit:941021|CheckUser event table migration: Write new on group0 (T330158)]] (duration: 26m 17s) [production]
20:15 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depooling db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49741 and previous config saved to /var/cache/conftool/dbconfig/20230726-201554-ladsgroup.json [production]
20:15 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance [production]
20:15 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance [production]