2020-05-07
§
|
13:01 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:59 |
<zpapierski@deploy1001> |
Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers |
[production] |
12:50 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
12:48 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:43 |
<zpapierski@deploy1001> |
Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s) |
[production] |
12:36 |
<arturo> |
cleanup livehacks in toolsbeta-puppetmaster-03 |
[toolsbeta] |
12:34 |
<mutante> |
removing role::labs::lvm::srv from deployment servers since this is now included in role:deployment_server and should never have been a role in the first place |
[releng] |
12:34 |
<mutante> |
removing role::labs::lvm::srv from deployment servers since this is now included in role:deployment_server and should never have been a role in the first place |
[deployment-prep] |
12:27 |
<zpapierski@deploy1001> |
Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI |
[production] |
12:13 |
<addshore@deploy1001> |
Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] T252079 Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s) |
[production] |
12:12 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
12:10 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:07 |
<mutante> |
- puppet still broken on deployment_servers due to unrelated pre-existing issues, also no alerts about it in shinken |
[releng] |
12:07 |
<mutante> |
- puppet still broken on deployment_servers due to unrelated pre-existing issues, also no alerts about it in shinken |
[deployment-prep] |
12:04 |
<mutante> |
- puppet broken on deployment_servers - fix deployed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594932 |
[releng] |
12:04 |
<mutante> |
- puppet broken on deployment_servers - fix deployed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594932 |
[deployment-prep] |
11:55 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
11:53 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
11:33 |
<moritzm> |
imported component/puppet5 for jessie-wikimedia into "main" |
[production] |
11:31 |
<jbond42> |
enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102 |
[production] |
11:12 |
<arturo> |
livehack toolsbeta-puppetmaster-03 with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594925 and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594926 (T251297 and T250866) |
[toolsbeta] |
11:10 |
<matthiasmullie> |
EU swat done |
[production] |
11:07 |
<mlitn@deploy1001> |
Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s) |
[production] |
11:00 |
<joal> |
Moving application_1583418280867_334532 to the nice queue |
[analytics] |
10:58 |
<joal> |
Rerun wikidata-articleplaceholder_metrics-wf-2020-5-6 |
[analytics] |
10:07 |
<moritzm> |
installing Java security updates on restbase/sessionstore |
[production] |
09:24 |
<mutante> |
- cloud puppetmasters still affected by https://phabricator.wikimedia.org/T83447#5807825 |
[devtools] |
09:11 |
<elukey> |
roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary) |
[production] |
09:07 |
<mutante> |
- puppetmaster-1001 - Permission denied @ rb_sysopen - /var/lib/puppet/volatile/GeoIP/.geoipupdate.lock |
[devtools] |
09:06 |
<mutante> |
- avoiding the need for a second role for deployment_servers in cloud with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594903 |
[devtools] |
09:05 |
<mutante> |
- puppet fixed on deploy-1002 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594900 |
[devtools] |
08:32 |
<moritzm> |
upgrading restbase-dev to latest OpenJDK security update |
[production] |
08:06 |
<jynus> |
setting pc2007, pc2009 as read-write |
[production] |
08:04 |
<mutante> |
- broken puppet again from prod changes. this time: deploy-1002 - '[]' is not applicable to an Undef Value. mediawiki/mcrouter_wancache.pp, line: 19 |
[devtools] |
07:59 |
<mutante> |
- shutting down instance puppet-paladox, backups created and uploaded to deploy-1002 in devtools (T236569) |
[git] |
07:55 |
<mutante> |
- shutting down instance gerrit-test7, backups created and uploaded to deploy-1002 in devtools, disassociating floating IP (T236569) |
[git] |
07:45 |
<elukey> |
re-run mediawiki-history-denormalize |
[analytics] |
07:44 |
<godog> |
further decrease weight for ms-be10[678] - T252008 |
[production] |
07:43 |
<elukey> |
kill application_1583418280867_333560 after a chat with David, the job is consuming ~2TB of RAM |
[analytics] |
07:32 |
<elukey> |
re-run mediawiki history load |
[analytics] |
07:18 |
<elukey> |
execute yarn application -movetoqueue application_1583418280867_332862 -queue root.nice |
[analytics] |
07:06 |
<elukey> |
restart mediawiki-history-load via hue |
[analytics] |
06:41 |
<elukey> |
restart oozie on an-coord1001 |
[analytics] |
05:49 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
05:46 |
<elukey> |
re-run mediarequest-hourly-wf-2020-5-6-19 |
[analytics] |
05:45 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
05:35 |
<elukey> |
re-run two failed hours for webrequest load text (07/05T05) and upload (06/05T23) |
[analytics] |
05:33 |
<elukey> |
restart hadoop yarn nodemanager on analytics1071 |
[analytics] |
05:33 |
<elukey> |
restart hadoop yarn nodemanager on analytics1071 |
[production] |
05:22 |
<marostegui> |
Reimage db2078 |
[production] |