2020-05-07 §
12:34 <mutante> removing role::labs::lvm::srv from deployment servers since this is now included in role:deployment_server and should never have been a role in the first place [releng]
12:34 <mutante> removing role::labs::lvm::srv from deployment servers since this is now included in role:deployment_server and should never have been a role in the first place [deployment-prep]
12:27 <zpapierski@deploy1001> Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI [production]
12:13 <addshore@deploy1001> Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] T252079 Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s) [production]
12:12 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
12:10 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:07 <mutante> - puppet still broken on deployment_servers due to unrelated pre-existing issues, also no alerts about it in shinken [releng]
12:07 <mutante> - puppet still broken on deployment_servers due to unrelated pre-existing issues, also no alerts about it in shinken [deployment-prep]
12:04 <mutante> - puppet broken on deployment_servers - fix deployed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594932 [releng]
12:04 <mutante> - puppet broken on deployment_servers - fix deployed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594932 [deployment-prep]
11:55 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
11:53 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
11:33 <moritzm> imported component/puppet5 for jessie-wikimedia into "main" [production]
11:31 <jbond42> enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102 [production]
11:12 <arturo> livehack toolsbeta-puppetmaster-03 with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594925 and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594926 (T251297 and T250866) [toolsbeta]
11:10 <matthiasmullie> EU swat done [production]
11:07 <mlitn@deploy1001> Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s) [production]
11:00 <joal> Moving application_1583418280867_334532 to the nice queue [analytics]
10:58 <joal> Rerun wikidata-articleplaceholder_metrics-wf-2020-5-6 [analytics]
10:07 <moritzm> installing Java security updates on restbase/sessionstore [production]
09:24 <mutante> - cloud puppetmasters still affected by https://phabricator.wikimedia.org/T83447#5807825 [devtools]
09:11 <elukey> roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary) [production]
09:07 <mutante> - puppetmaster-1001 - Permission denied @ rb_sysopen - /var/lib/puppet/volatile/GeoIP/.geoipupdate.lock [devtools]
09:06 <mutante> - avoiding the need for a second role for deployment_servers in cloud with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594903 [devtools]
09:05 <mutante> - puppet fixed on deploy-1002 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594900 [devtools]
08:32 <moritzm> upgrading restbase-dev to latest OpenJDK security update [production]
08:06 <jynus> setting pc2007, pc2009 as read-write [production]
08:04 <mutante> - broken puppet again from prod changes. this time: deploy-1002 - '[]' is not applicable to an Undef Value. mediawiki/mcrouter_wancache.pp, line: 19 [devtools]
07:59 <mutante> - shutting down instance puppet-paladox, backups created and uploaded to deploy-1002 in devtools (T236569) [git]
07:55 <mutante> - shutting down instance gerrit-test7, backups created and uploaded to deploy1002 in devtools, disassociating floating IP (T236569) [git]
07:45 <elukey> re-run mediawiki-history-denormalize [analytics]
07:44 <godog> further decrease weight for ms-be10[678] - T252008 [production]
07:43 <elukey> kill application_1583418280867_333560 after a chat with David, the job is consuming ~2TB of RAM [analytics]
07:32 <elukey> re-run mediawiki history load [analytics]
07:18 <elukey> execute yarn application -movetoqueue application_1583418280867_332862 -queue root.nice [analytics]
07:06 <elukey> restart mediawiki-history-load via hue [analytics]
06:41 <elukey> restart oozie on an-coord1001 [analytics]
05:49 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
05:46 <elukey> re-run mediarequest-hourly-wf-2020-5-6-19 [analytics]
05:45 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
05:35 <elukey> re-run two failed hours for webrequest load text (07/05T05) and upload (06/05T23) [analytics]
05:33 <elukey> restart hadoop yarn nodemanager on analytics1071 [analytics]
05:33 <elukey> restart hadoop yarn nodemanager on analytics1071 [production]
05:22 <marostegui> Reimage db2078 [production]
05:04 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json [production]
05:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance T251158', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json [production]
02:56 <brennen@deploy1001> rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for T252079 [production]
02:55 <brennen> reverting group1 to 1.35.0-wmf.30 for T252079 [production]
00:12 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
2020-05-06 §
23:59 <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s) [production]