2019-12-16
12:23 <joal> Kill backfilling job for mediarequest-per-file with 2017-07-0[2345] days not done [analytics]
12:22 <joal> Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2019-12-15 [analytics]
12:17 <elukey> kill netflow realtime druid supervisor as prep step for kerberos [analytics]
12:16 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:15 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:12 <Urbanecm> EU SWAT done [production]
12:11 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: 026913d: Add no=>nb in $wgInterlanguageLinkCodeMap (T174160) (duration: 00m 53s) [production]
11:58 <jynus@cumin1001> dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P9873 and previous config saved to /var/cache/conftool/dbconfig/20191216-115841-jynus.json [production]
11:55 <hashar> Restarting Jenkins completely to flush out stall Gearman functions in Zuul [production]
11:41 <jdrewniak@deploy1001> Synchronized portals: Wikimedia Portals Update: [[gerrit:558017| Bumping portals to master (T128546)]] (duration: 00m 52s) [production]
11:40 <jdrewniak@deploy1001> Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:558017| Bumping portals to master (T128546)]] (duration: 00m 56s) [production]
11:14 <joal> Clean spark-shell drivers on cluster before kerberos [analytics]
10:57 <elukey> disable puppet on labstore100[6,7] and stop analytics-related systemd timers - prep step for Kerberos [production]
10:46 <elukey> stop airflow-* on an-airflow1001 [analytics]
10:41 <XioNoX> delete virtual chassis ID on asw-d-codfw [production]
10:41 <elukey> stop jupyterhub on notebook100[3,4] as prep step for kerberos [analytics]
10:38 <elukey> kill Nuria's spark shell application masters in Yarn [analytics]
10:17 <elukey> stop hadoop-related timers on stat1007 [analytics]
10:14 <hashar> Restarting CI Jenkins due to out of sync state between Zuul Gearman and what is actually running (some jobs got lost) [production]
10:04 <joal> Killing user-app eating all cluster (application_1573208467349_190044) [analytics]
09:50 <marostegui> Stop replication in the same position in labsdb1010 and labsdb1012 - T238399 [production]
09:35 <hashar> doc1001: sudo -u doc-uploader rm -fR /srv/docroot/org/wikimedia/doc/DOCKER-mediawiki-core [releng]
09:24 <hashar> Reloading Jenkins CI [production]
09:14 <godog> upgrade hw raid firmware on ms-be2016 and reboot - T240798 [production]
09:14 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:13 <filippo@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:05 <joal> Rerun webrequest-load-wf-text-2019-12-14-18 with updated error-checking parameters (all false positive) [analytics]
09:04 <Urbanecm> mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Coffeeandcrumbs /home/urbanecm/T240825 (T240825) [production]
08:54 <ema> cp1077: ats-backend-restart to increase RAM cache size T238494 [production]
08:53 <moritzm> powercycling ms-be2016 T240798 [production]
08:49 <elukey> re-run webrequest-load 2019-12-14-13 and 2019-12-15-12 with higher mapreduce limits (modified version of refinery on hdfs /user/elukey with https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/557794/) [analytics]
08:36 <ema> cp1075: repool all services T240826 [production]
08:12 <ema> cp1075: wipe varnish-fe and ats-be caches due to missed purges T240826 [production]
08:08 <ema> cp1075: manually start vhtcpd.service T240826 [production]
07:52 <ema> cp1075: depool, vhtcpd not running [production]
07:38 <marostegui> Disable auto-learn on db21[03-35] T240823 [production]
07:27 <marostegui> Disable auto-learn on db[1126-1138].eqiad.wmnet T240823 [production]
07:22 <elukey> stop camus timers as prep step for maintenance (if we'll do it) [analytics]
07:13 <_joe_> restarting cpjobqueue on scb1001 to check if processing rate of recentChanges recovers T240518 [production]
07:11 <marostegui> Stop replication in the same position in labsdb1010 and labsdb1012 - T238399 [production]
07:09 <onimisionipe> depool maps2001 for postgres reinit - T239728 [production]
06:59 <onimisionipe> pool maps2004. osm import is complete - T239728 [production]
06:58 <_joe_> clearing apcu across multiple api servers to allow metrics to be collected again (task coming soon) [production]
06:56 <marostegui> Force re-learn cycle on db1130 [production]
06:42 <marostegui> Depool labsdb1010 - T238399 [production]
06:39 <marostegui> Recreate views on commonswiki,testcommonswiki for protected_titles on all labsdb hosts - T233135 [production]
06:29 <marostegui> Remove triggers for ar_comment on db1125:3314 T234704 [production]
06:28 <marostegui> Stop replication on db1121 for schema change [production]
06:28 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P9871 and previous config saved to /var/cache/conftool/dbconfig/20191216-062809-marostegui.json [production]
03:52 <tstarling@deploy1001> Synchronized docroot/mediawiki.org/keys/keys.html: (no justification provided) (duration: 00m 57s) [production]