2020-06-08
|
10:41 |
<jdrewniak@deploy1001> |
Synchronized portals: Wikimedia Portals Update: [[gerrit:603408| Bumping portals to master (603408)]] (duration: 00m 57s) |
[production] |
10:40 |
<jdrewniak@deploy1001> |
Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:603408| Bumping portals to master (603408)]] (duration: 01m 09s) |
[production] |
10:39 |
<XioNoX> |
depool codfw - T243080 |
[production] |
09:46 |
<moritzm> |
installing gnutls28 security updates on buster (older releases not affected) |
[production] |
09:32 |
<qchris> |
Re-enabling puppet on gerrit1002 to avoid lagging too far behind |
[production] |
08:17 |
<XioNoX> |
push T250136 to eqsin - T250136 |
[production] |
08:09 |
<XioNoX> |
push T250136 to eqiad - T250136 |
[production] |
08:07 |
<moritzm> |
upgrading mw1349-mw1383 to PHP 7.2.31 |
[production] |
08:07 |
<mutante> |
stat1006: moved broken jupyter-dedcode-singleuser.service out of /run/systemd/transient and ran systemctl reset-failed |
[production] |
08:02 |
<XioNoX> |
push T250136 to codfw - T250136 |
[production] |
07:58 |
<XioNoX> |
push T250136 to eqord/eqdfw - T250136 |
[production] |
07:58 |
<mutante> |
stat1006 bash[40607]: /bin/bash: line 0: exec: jupyterhub-singleuser: not found |
[production] |
07:57 |
<mutante> |
ran puppet on all stat* hosts for an access request (dcipoletti was added); stat1006 systemd state broke right after, as jupyter-dedcode-singleuser.service failed |
[production] |
07:46 |
<XioNoX> |
push T250136 to esams/knams - T250136 |
[production] |
07:42 |
<XioNoX> |
cr4-ulsfo protocols bgp group Transit4 family inet any -> unicast - T250136 |
[production] |
07:39 |
<XioNoX> |
cr3-ulsfo protocols bgp group Transit4 family inet any -> unicast - T250136 |
[production] |
07:37 |
<moritzm> |
installing nodejs security updates |
[production] |
07:05 |
<marostegui> |
Stop MySQL on labsdb1012 to clone labsdb1011 T249188 |
[production] |
05:22 |
<marostegui> |
Upgrade db1077 to 10.4.13 to test events memory leak |
[production] |
04:45 |
<_joe_> |
de-firewalling mc1029 |
[production] |
04:27 |
<_joe_> |
firewalling off memcached on mc1029 |
[production] |
2020-06-05
|
16:45 |
<elukey@deploy1001> |
Finished deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 (duration: 00m 11s) |
[production] |
16:45 |
<elukey@deploy1001> |
Started deploy [analytics/turnilo/deploy@f7e4f78]: Upgrade to 1.24.0 |
[production] |
16:29 |
<bd808> |
Testing stashbot following hard restart of service. It was having LDAP connection failure problems. |
[production] |
16:00 |
<AndyRussG> |
Turned off Fundraising job recurring_smashpig_charge |
[production] |
15:54 |
<cdanis> |
enabling & rerunning puppet on netflow* T254574 |
[production] |
15:39 |
<cdanis> |
disabling puppet on netflow* and trying I6598d8f8 on netflow3001 first T254574 |
[production] |
15:39 |
<cdanis> |
disabling puppet on netflow* and trying I6598d8f8 on netflow3001 first |
[production] |
13:33 |
<jayme@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' . |
[production] |
13:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' . |
[production] |
13:19 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
13:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' . |
[production] |
13:18 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
13:15 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:55 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/interwiki.php: Hotfix for be-tarask interwiki link being broken (T111853) (duration: 01m 00s) |
[production] |
12:41 |
<mutante> |
rebooting gerrit1002 to add more vCPUs, after [ganeti1009:~] $ sudo gnt-instance modify -B vcpus=8 gerrit1002.wikimedia.org T239151 |
[production] |
12:20 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' . |
[production] |
12:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . |
[production] |
12:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' . |
[production] |
12:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' . |
[production] |
12:19 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
12:17 |
<akosiaris> |
update blubberoid changeprop changeprop-jobqueue citoid cxserver wikifeeds zotero in staging to latest charts |
[production] |
12:17 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . |
[production] |
12:17 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
12:17 |
<akosiaris> |
fix typo in ganeti2016 /etc/network/interfaces and reboot |
[production] |
11:28 |
<akosiaris> |
master-failover from ganeti2001 to ganeti2019 for ganeti01.svc.codfw.wmnet |
[production] |
11:25 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
11:25 |
<akosiaris@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
11:25 |
<akosiaris@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
11:14 |
<mutante> |
running puppet on all ganeti nodes |
[production] |