2020-09-04
16:19 <James_F> Zuul: Voting FR jobs for ParserFunctions and cldr. [releng]
15:32 <hashar> Updated doc.wikimedia.org docroot for https://gerrit.wikimedia.org/r/c/integration/docroot/+/624714 [releng]
14:10 <Reedy> rebooted due to laggy IRC echoing [tools.wikibugs]
10:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
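These START/END lines are emitted automatically by Spicerack when a cookbook is run from a cumin host. A minimal sketch of the invocation behind them; the positional cluster argument is an assumption based on the 08:29 entry below, which says both the test and analytics clusters were restarted:

```
# Run from a cumin host; Spicerack logs the START/END lines to SAL itself.
# The "analytics" cluster argument is an assumption, not taken from the log.
sudo cookbook sre.hadoop.roll-restart-workers analytics
```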
10:29 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json [production]
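For context, a depool like the one above is staged and then committed with dbctl; a minimal sketch assuming the usual two-step instance/config flow (the commit message is taken from the entry, the rest is illustrative):

```
# Stage the change, then commit it; dbctl saves the resulting diff (here to
# a Phabricator paste) and a rollback copy under /var/cache/conftool/dbconfig/.
dbctl instance db1087 depool
dbctl config commit -m 'Depool db1087 for MCR schema change'
```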
10:28 <marostegui> Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts T238966 [production]
09:48 <marostegui> Restart prometheus-mysqld-exporter on db2125 [production]
09:39 <rxy> Flags +AV were set on Mirinano in #cvn-ja. [cvn]
09:39 <rxy> Flags +AV were set on Mirinano in #cvn-sw. [cvn]
09:11 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
08:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
08:31 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
08:29 <elukey> roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades [production]
08:08 <moritzm> installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately) [production]
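"Only installing the deb" means the new kernel image is staged but does not take effect until each host is rebooted later. A hedged sketch for a single buster host; on Debian the versioned image normally comes in via the metapackage, so the exact package name here is an assumption:

```
# Stage the 4.19.132 kernel image; the running kernel is unchanged until reboot.
sudo apt-get update
sudo apt-get install -y linux-image-amd64
# The running kernel stays on the old version until the separate reboot:
uname -r
```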
07:30 <moritzm> installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately) [production]
07:07 <wm-bot> <jeanfred> Deploy latest from Git master: 3099ce3 [tools.wikiloves]
06:54 <joal> Manually restart mediawiki-history-drop-snapshot after hive-partitions/hdfs-folders mismatch fix [analytics]
06:08 <elukey> reset-failed mediawiki-history-drop-snapshot on an-launcher1002 to clear icinga errors [analytics]
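reset-failed only clears a unit's failed state so Icinga stops alerting on it; it does not rerun the job (the 06:54 entry above covers the actual restart). A minimal sketch on an-launcher1002; the .service suffix is an assumption about the unit type:

```
# Clear the failed state without starting anything.
sudo systemctl reset-failed mediawiki-history-drop-snapshot.service
```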
05:13 <marostegui> Deploy MCR schema change on s4 eqiad master T238966 [production]
01:52 <milimetric> aborted aqs deploy due to cassandra error [analytics]
01:51 <milimetric@deploy1001> Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s) [production]
01:35 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
01:30 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
01:23 <ryankemper> (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699) [production]
01:16 <ryankemper> Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, looks like `wdqs2003`'s blazegraph isn't happy based on the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph` [production]
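For anyone reproducing this kind of check, a hedged sketch of confirming that the restart took; the SPARQL endpoint port and path are assumptions based on a standard WDQS setup, not taken from the log:

```
# Check the unit came back cleanly and probe the local SPARQL endpoint.
sudo systemctl status wdqs-blazegraph
sudo journalctl -u wdqs-blazegraph --since '10 min ago'
# Endpoint port/path are assumptions for a stock WDQS blazegraph instance.
curl -s 'http://localhost:9999/bigdata/namespace/wdq/sparql' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1'
```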
00:48 <milimetric@deploy1001> Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints [production]
2020-09-03
23:31 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 93947391e97be11a9cd7eb4713b274b05d5b371a: Start logging log-ins on select wikis (T253802) (duration: 00m 56s) [production]
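"Synchronized" entries like this are written by scap when a single config file is pushed from the deployment host; the message becomes part of the log line. A minimal sketch, assuming the standard sync-file invocation:

```
# Push one config file to the fleet; scap logs the sync (with duration) to SAL.
scap sync-file wmf-config/InitialiseSettings.php \
    'Start logging log-ins on select wikis (T253802)'
```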
22:18 <legoktm> manually kicking mirror script, it apparently got stuck on 2020-07-01 [packagist-mirror]
22:10 <legoktm> switch domain to wmcloud.org [packagist-mirror]
21:50 <legoktm> added libraryupgrader2.wmcloud.org DNS proxy and removed wmflabs.org one for automatic redirect (T261995) [library-upgrader]
21:18 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
21:15 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
20:14 <balloons> increased cores to 24 and RAM to 49152 [wmde-dashboards]
19:55 <milimetric@deploy1001> deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s) [production]
19:54 <milimetric@deploy1001> Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints [production]
19:15 <milimetric> finished deploying refinery and refinery-source, restarting jobs now [analytics]
19:07 <milimetric@deploy1001> Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s) [production]
19:07 <milimetric@deploy1001> Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] [production]
19:06 <milimetric@deploy1001> Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s) [production]
18:57 <milimetric@deploy1001> Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] [production]
17:50 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:48 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:47 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:46 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
17:46 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:45 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:44 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:43 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:43 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:41 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]