2020-04-20
|
08:09 |
<jynus> |
restarting s3 instance on db1095 to reduce its buffer pool T250602 |
[production] |
07:22 |
<_joe_> |
restarting php-fpm on the eqiad appservers to pick up the new max_execution_time |
[production] |
07:20 |
<marostegui> |
Re-add tl_namespace index to db1104 and db1092 - T250060 |
[production] |
06:44 |
<moritzm> |
installing python2.7 security updates on jessie |
[production] |
06:41 |
<elukey> |
execute find -mtime +30 -delete in /var/log/airflow/scheduler on an-airflow1001 to free space |
[production] |
06:25 |
<moritzm> |
installing libxdmcp security updates on jessie |
[production] |
06:16 |
<moritzm> |
installing bash updates on jessie |
[production] |
05:54 |
<vgutierrez> |
rolling restart of ats-tls in cp[3052,3054,3056,3058,3060,4028,4029,4030,4031,4032] - T249335 |
[production] |
05:53 |
<marostegui> |
Deploy schema change on s8 eqiad hosts T250060 |
[production] |
05:50 |
<marostegui> |
Deploy schema change on s8 codfw - lag will show up T250060 |
[production] |
04:55 |
<ariel@deploy1001> |
Finished deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests (duration: 00m 04s) |
[production] |
04:55 |
<ariel@deploy1001> |
Started deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests |
[production] |
2020-04-18
|
22:50 |
<addshore> |
pool wdqs1006 blazegraph caught up T242453 |
[production] |
20:30 |
<cdanis@cumin1001> |
conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad |
[production] |
20:27 |
<thcipriani> |
restart gerrit-replica |
[production] |
16:40 |
<dcausse> |
forcing replica count to 1 on some cloudelastic@chi indices |
[production] |
15:13 |
<Amir1> |
applying schema change of T139090 on labswiki (wikitech) |
[production] |
14:03 |
<cdanis@cumin1001> |
conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad |
[production] |
12:19 |
<addshore> |
restarting blazegraph on wdqs1006 blazegraph stuck T242453 |
[production] |
12:15 |
<addshore> |
depool wdqs1006 blazegraph stuck T242453 |
[production] |
12:15 |
<addshore> |
depool wdqs1006 blazegraph stuck |
[production] |
06:07 |
<XioNoX> |
change OSPF metrics to prefer ulsfo tunnel transport |
[production] |
2020-04-17
|
19:33 |
<Krinkle> |
Depool mw1407.eqiad.wmnet for opcache testing. Do not repool without first reverting https://gerrit.wikimedia.org/r/589674. |
[production] |
19:32 |
<Krinkle> |
Depool mw1407.eqiad.wmnet for opcache and LCStoreStaticArray testing. – T99740 |
[production] |
17:41 |
<cmjohnson1> |
replacing network cable pc1009 T250257 |
[production] |
17:34 |
<cmjohnson1> |
moving msw1 to msw-c racks mounted switch cable ports from port 49 to port 50 |
[production] |
17:22 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
17:22 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:15 |
<Urbanecm> |
Revert recent email change for User:CPHL@SUL |
[production] |
16:05 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . |
[production] |
16:05 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' . |
[production] |
15:52 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
15:52 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . |
[production] |
15:48 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' . |
[production] |
15:48 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' . |
[production] |
15:42 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
15:41 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
15:20 |
<rzl> |
remove cronjobs from mwmaint1002 previously updated to systemd timers and erroneously left in crontab -- diffs: https://phabricator.wikimedia.org/P11012 T211250 |
[production] |
14:29 |
<mutante> |
ganeti2001 - killed and restarted gnt-rapi process with the correct new key and cert |
[production] |
14:19 |
<cdanis> |
add peer AS29802 to cr2-eqdfw and cr2-esams |
[production] |
14:01 |
<mutante> |
netbox1001 - netbox_ganeti_eqiad_sync / systemd state fixed after gnt-rapi is running again on ganeti1003 |
[production] |
14:00 |
<mutante> |
ganeti1003 - fixing gnt-rapi daemon not running |
[production] |
13:54 |
<mateusbs17> |
Running VACUUM FULL for gis DB in maps2004.codfw.wmnet (which is depooled at the moment) |
[production] |
13:00 |
<mutante> |
netbox1001 - sudo systemctl start netbox_ganeti_eqiad_sync (was failed) |
[production] |
12:54 |
<mutante> |
contint2001 /usr/local/sbin/build-envoy-config -c /etc/envoy ; restart envoyproxy; was not listening on admin port |
[production] |
12:45 |
<mutante> |
contint2001 - restart nagios-nrpe-server |
[production] |
12:28 |
<moritzm> |
copied kubernetes-client from stretch-wikimedia to buster-wikimedia T224591 |
[production] |