2020-05-28
11:27 <arturo> cleanup livehackings [toolsbeta]
11:27 <arturo> merging change to front-proxy: https://gerrit.wikimedia.org/r/c/operations/puppet/+/599139 (T253816) [tools]
11:03 <kormat@cumin1001> dbctl commit (dc=all): 'Add db2138 to s2+s4 T252985', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json [production]
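The "dbctl commit (dc=all)" entries in this log are produced by the dbctl CLI on the cumin hosts. A minimal sketch of the workflow behind the entry above, with section/weight flags omitted because they are not recorded here; an illustration, not the exact commands run:
sudo dbctl instance db2138 pool                               # mark the replica as pooled (per-section flags omitted)
sudo dbctl config diff                                        # review the pending change
sudo dbctl config commit -m 'Add db2138 to s2+s4 T252985'     # emits the diff/backup paths quoted in the log line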
10:36 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
10:34 <jayme@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
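The helmfile [EQIAD]/[CODFW]/[STAGING] lines from jayme@deploy1001 come from the helmfile-based service deploys. Roughly, assuming the conventional deployment-charts checkout on the deploy host (path and environment names are the usual ones, not taken from this log):
cd /srv/deployment-charts/helmfile.d/services/blubberoid
helmfile -e staging sync    # logs "helmfile [STAGING] Ran 'sync' command ..."
helmfile -e codfw sync      # logs "helmfile [CODFW] ..."
helmfile -e eqiad sync      # logs "helmfile [EQIAD] ..."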
10:31 <arturo> livehacking puppetmaster and toolsbeta-proxy-1 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/599139 (T253816) [toolsbeta]
10:30 <arturo> livehacking puppetmaster to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/599139 [toolsbeta]
10:30 <jayme@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
10:21 <RhinosF1> Known inconsistency in git - unable to pull changes (NOW RESOLVED) [tools.zppixbot]
10:20 <RhinosF1> Known inconsistency in git - unable to pull changes [tools.zppixbot]
10:07 <wm-bot> <rhinosf1> tweaking stuff for testing [tools.zppixbot-test]
10:02 <mutante> gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey (T239151) [production]
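A rough shell reconstruction of that gerrit1002 fix; only the chown is quoted from the log, the restart and checks are the usual follow-up and are assumptions:
chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site   # restore ownership of the review site
systemctl restart gerrit                               # service should stay up instead of crash-looping
ss -tln | grep 29418                                   # 29418 is Gerrit's default SSH port; confirms gerrit-ssh is listening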
09:51 <XioNoX> failover VRRP in ulsfo [production]
09:41 <XioNoX> re-activate peering/transit on cr2-eqdfw - T243080 [production]
09:35 <mutante> restarting gerrit on gerrit1002 after fixing db_pass to the readonly one (T243800) [production]
09:33 <XioNoX> restart cr2-eqdfw for upgrade - T243080 [production]
09:30 <XioNoX> deactivate peering/transit on cr2-eqdfw - T243080 [production]
09:25 <_joe_> updating ACLs on all etcd servers [production]
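The etcd ACL changes themselves are rolled out via configuration management; purely as an illustration of what "updating ACLs" and "adding new users" mean at the etcd level, a rough etcdctl (v3 auth API) equivalent with made-up user, role and key-prefix names:
etcdctl user add conftool-ro                                          # prompts for a password
etcdctl role add conftool-read
etcdctl role grant-permission --prefix conftool-read read /conftool/  # read-only access under the prefix
etcdctl user grant-role conftool-ro conftool-read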
09:22 <XioNoX> install new Junos on cr2-eqdfw - T243080 [production]
09:16 <XioNoX> rollback cr2-eqord ospf/bgp - T243080 [production]
09:07 <XioNoX> restart cr2-eqord for upgrade - T243080 [production]
09:05 <jayme@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
08:50 <_joe_> upgrading etcd ACLs (adding new users) to conf1004 [production]
08:50 <XioNoX> install new Junos on cr2-eqord - T243080 [production]
08:46 <XioNoX> deactivate peering/transit on cr2-eqord - T243080 [production]
08:45 <XioNoX> de-pref all OSPF links to cr2-eqord - T243080 [production]
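"De-pref all OSPF links" means raising the OSPF cost on the links facing cr2-eqord so traffic drains away before the restart. A Junos configuration sketch for a neighbouring router; interface name and metric value are invented for illustration:
set protocols ospf area 0.0.0.0 interface ae0.0 metric 50000
set protocols ospf3 area 0.0.0.0 interface ae0.0 metric 50000
commit comment "drain cr2-eqord before Junos upgrade - T243080"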
08:13 <marostegui> Pool db1141 into labsdb analytics role - T249188 [production]
07:33 <gilles@deploy1001> Synchronized static/images: T252108 Deploying optimised static PNGs (duration: 01m 39s) [production]
07:31 <gilles@deploy1001> Synchronized static/apple-touch: T252108 Deploying optimised static PNGs (duration: 01m 12s) [production]
06:40 <elukey> slowly restarting all RU units on an-launcher1001 [analytics]
06:32 <elukey> delete old RU pid files with timestamp May 27 19:00 (scap deployment failed to an-launcher due to disk issues) except ./jobs/reportupdater-queries/pingback/.reportupdater.pid that was working fine [analytics]
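A hypothetical reconstruction of that pid-file cleanup with GNU find, assuming the reportupdater jobs live under a base directory such as /srv/reportupdater on an-launcher1001 (the actual cleanup may well have been done by hand):
cd /srv/reportupdater
find . -name '.reportupdater.pid' \
     -newermt '2020-05-27 18:59' ! -newermt '2020-05-27 19:01' \
     ! -path './jobs/reportupdater-queries/pingback/*' \
     -print -delete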
06:30 <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover T253808', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json [production]
04:44 <marostegui> Run check_private data on db1141 - T249188 [production]
04:22 <marostegui> Stop MySQL on db1141 - T249188 [production]
00:33 <andrewbogott> shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for T253780 [admin]
2020-05-27
23:29 <andrewbogott> disabling the backup job on cloudbackup2001 (just like last week) so the backup doesn't start while Brooke is rebuilding labstore1004 tomorrow. [admin]
23:20 <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki (T252986) (duration: 01m 05s) [production]
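The "Synchronized wmf-config/..." lines are emitted by scap on the deployment host. Approximately, for the entry above (assuming the standard /srv/mediawiki-staging working copy):
cd /srv/mediawiki-staging
scap sync-file wmf-config/InitialiseSettings.php 'Add autoreviewrestore right to rollbacker group on hiwiki (T252986)'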
23:16 <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there (T252959) (duration: 01m 06s) [production]
22:58 <gehel@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
22:02 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) (duration: 00m 10s) [production]
22:02 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part4) [production]
22:01 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) (duration: 01m 29s) [production]
22:00 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part3) [production]
22:00 <crusnov@deploy1001> deploy aborted: Netbox Upgrade to 2.8.4 (part2) (duration: 01m 31s) [production]
21:58 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.4 (part2) [production]
21:58 <crusnov@deploy1001> Finished deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) (duration: 01m 01s) [production]
21:57 <crusnov@deploy1001> Started deploy [netbox/deploy@5251cf1]: Netbox Upgrade to 2.8.1 (part1) [production]
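The Started/Finished/aborted deploy lines above come from scap's deploy subcommand, run from the service's deploy repository checkout. Roughly, for each part (the repository path is the conventional one and is an assumption):
cd /srv/deployment/netbox/deploy
scap deploy 'Netbox Upgrade to 2.8.4 (part4)'   # logs Started/Finished, or "deploy aborted" if interrupted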
21:55 <James_F> Nicely restarting Jenkins for xunit plugin upgrade. [releng]
20:43 <gehel@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
20:28 <marostegui> Decrease innodb poolsize on s4 master and restart mysql [production]
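A sketch of what that last entry typically involves on the s4 master, assuming MariaDB with innodb_buffer_pool_size set in my.cnf; the example sizes are placeholders, not the real values:
# lower innodb_buffer_pool_size in the my.cnf override, e.g. 350G -> 300G, then:
sudo systemctl restart mariadb
mysql -e 'SELECT @@innodb_buffer_pool_size'   # confirm the new size after the restart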