2020-06-17 §
18:21 <urbanecm@deploy1001> scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details) [production]
14:49 <mdholloway> rolled back recommendation-api deployment due to canary endpoint check failure (T255683) [production]
08:43 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.37/includes/EditPage.php: T255177 T255614 Do not return internal edit status from EditPage (duration: 01m 08s) [production]
2020-06-16 §
00:16 <ebernhardson@deploy1001> Finished deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist (duration: 00m 45s) [production]
00:16 <ebernhardson@deploy1001> Started deploy [wikimedia/discovery/analytics@17212bb]: airflow: migrate leven-dist to edit-dist [production]
2020-06-11 §
23:51 <ladsgroup@deploy1001> scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details) [production]
2020-06-10 §
16:40 <godog> EDIT: in esams [production]
2020-06-02 §
07:32 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1079 after data check', diff saved to https://phabricator.wikimedia.org/P11351 and previous config saved to /var/cache/conftool/dbconfig/20200602-073245-marostegui.json [production]
07:22 <marostegui> Stop slave on db1079 for data check [production]
07:22 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1079 for data check', diff saved to https://phabricator.wikimedia.org/P11350 and previous config saved to /var/cache/conftool/dbconfig/20200602-072214-marostegui.json [production]
2020-05-27 §
19:56 <twentyafterfour@deploy1001> scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details) [production]
2020-05-26 §
14:35 <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: Clean up MWMultiVersion check in CommonSettings.php (duration: 00m 59s) [production]
2020-05-04 §
11:46 <tgr@deploy1001> Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594134|Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 06s) [production]
11:43 <tgr@deploy1001> Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594137|Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 10s) [production]
2020-04-20 §
04:55 <ariel@deploy1001> Finished deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests (duration: 00m 04s) [production]
04:55 <ariel@deploy1001> Started deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests [production]
2020-03-27 §
11:44 <oblivian@puppetmaster1001> conftool action : edit; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[1].codfw.wmnet [production]
2020-03-25 §
21:16 <rlazarus> holding off on updating eventgate-analytics until EU time, to check on unexpected helmfile diffs T246868 [production]
2020-03-24 §
19:29 <twentyafterfour@deploy1001> scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
19:28 <twentyafterfour@deploy1001> scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
2020-03-12 §
12:00 <tarrow@deploy1001> Synchronized php-1.35.0-wmf.23/extensions/TwoColConflict: SWAT: [[gerrit:579221|Detect whether an edit came from VisualEditor (T245722)]] (duration: 01m 10s) [production]
2020-03-03 §
00:14 <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: T240055: Point the Parsoid cluster at the train version of Parsoid, not a special check-out (duration: 00m 56s) [production]
00:13 <jforrester@deploy1001> sync aborted: wmf-config/CommonSettings.php T240055: Point the Parsoid cluster at the train version of Parsoid, not a special check-out (duration: 00m 03s) [production]
00:13 <jforrester@deploy1001> Started scap: wmf-config/CommonSettings.php T240055: Point the Parsoid cluster at the train version of Parsoid, not a special check-out [production]
2020-03-02 §
06:24 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1111 to s8 with minimal weight to check grants and any other issues T246447', diff saved to https://phabricator.wikimedia.org/P10564 and previous config saved to /var/cache/conftool/dbconfig/20200302-062435-marostegui.json [production]
2020-03-01 §
16:08 <reedy@deploy1001> scap failed: average error rate on 5/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
2020-02-27 §
21:39 <jforrester@deploy1001> scap failed: average error rate on 11/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
16:50 <volans> temporarily decommented external check for icinga2001. Restarting Icinga on icinga2001 [production]
2020-02-26 §
15:51 <jynus> starting s2, s3 eqiad backup source data check; expect increase read traffic on db1095:3313, db1140:3312, db1078, db1090:3312 T244958 [production]
2020-02-24 §
07:31 <cdanis> dbctl: edit es4/es5 sections in eqiad (flavor & master & min_replicas fields) T245806 [production]
07:29 <cdanis> dbctl: edit es4/es5 sections in codfw (flavor & master fields) T245806 [production]
2020-02-21 §
21:53 <andrew@deploy1001> Finished deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two (duration: 03m 41s) [production]
21:49 <andrew@deploy1001> Started deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two [production]
21:45 <andrew@deploy1001> Finished deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel (duration: 00m 11s) [production]
21:45 <andrew@deploy1001> Started deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel [production]
2020-02-19 §
19:31 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.19/skins/MinervaNeue/includes/MinervaHooks.php: T245162 Check title value before proceeding to check if user page (duration: 01m 04s) [production]
19:27 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.20/skins/MinervaNeue/includes/MinervaHooks.php: T245162 Check title value before proceeding to check if user page (duration: 01m 04s) [production]
17:40 <jynus> starting data check between db1078 and db1140:3313 T244958 [production]
2020-02-13 §
00:45 <niharika29@deploy1001> Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - T244488 (duration: 01m 07s) [production]
2020-02-12 §
15:32 <marostegui> Disable event handler for db1095 RAID check on icinga - T244958 [production]
15:32 <marostegui> Disable event handler for db1095 RAID check on icinga - [production]
2020-02-10 §
21:36 <robh> cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled) [production]
2020-02-05 §
19:10 <jforrester@deploy1001> scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
2020-02-03 §
22:13 <mutante> rebooting ganeti1010, ganeti1011 and other new ganeti machines to pickup microcode mitigations, for some reason the previous reboots did not do it. rescheduled service check on icinga for ganeti1010 and now it recovered (T228924) [production]
2020-01-31 §
16:59 <marostegui> Re-enable notifications on the dbstore1005:3318 check T243871 [production]
2020-01-22 §
12:43 <ladsgroup@deploy1001> scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
2020-01-15 §
01:17 <mutante> dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. did the check described on https://wikitech.wikimedia.org/wiki/HAProxy#Failover and it claimed on both that db1133 was DOWN..but checking db1133 itself showed it was up and working normal. in that case the docs said to 'systemctl reload haproxy'. done on both and things recovered [production]
2020-01-10 §
20:45 <jeh> cloudcontrol200[13]-dev schedule downtime until Feb 28 2020 on systemd service check T242462 [production]
20:29 <jeh> cloudmetrics100[12] schedule downtime until Feb 28 2020 on prometheus check T242460 [production]
2020-01-06 §
12:31 <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:562083|Don’t check constraints on P6685 statements]] Bypassing T236104 (duration: 00m 55s) [production]