2020-04-18
|
22:50 |
<addshore> |
pool wdqs1006 blazegraph caught up T242453 |
[production] |
21:40 |
<elukey> |
force hdfs-balancer as an attempt to redistribute hdfs blocks more evenly across worker nodes (hoping to free the busiest ones) |
[analytics] |
21:32 |
<elukey> |
drop /user/analytics-privatedata/.Trash/* from hdfs to free some space (~100G used) |
[analytics] |
21:25 |
<elukey> |
drop /var/log/hadoop-yarn/apps/analytics-search/* from hdfs to free space (~8T replicated used) |
[analytics] |
21:21 |
<elukey> |
drop /user/{analytics|hdfs}/.Trash/* from hdfs to free space (~100T used) |
[analytics] |
21:12 |
<elukey> |
drop /var/log/hadoop-yarn/apps/analytics from hdfs to free space (15.1T replicated) |
[analytics] |
20:30 |
<cdanis@cumin1001> |
conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad |
[production] |
20:27 |
<thcipriani> |
restart gerrit-replica |
[production] |
18:16 |
<wm-bot> |
<lucaswerkmeister> deployed c815a210bd (Hebrew nouns) |
[tools.lexeme-forms] |
17:34 |
<wm-bot> |
<lucaswerkmeister> deployed 33c3ac264e (fix english-adverb edit mode) |
[tools.lexeme-forms] |
16:40 |
<dcausse> |
forcing replica count to 1 on some cloudelastic@chi indices |
[production] |
15:13 |
<Amir1> |
applying schema change of T139090 on labswiki (wikitech) |
[production] |
14:03 |
<cdanis@cumin1001> |
conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad |
[production] |
12:19 |
<addshore> |
restarting blazegraph on wdqs1006 blazegraph stuck T242453 |
[production] |
12:15 |
<addshore> |
depool wdqs1006 blazegraph stuck T242453 |
[production] |
12:15 |
<addshore> |
depool wdqs1006 blazegraph stuck |
[production] |
11:55 |
<wm-bot> |
<lucaswerkmeister> deployed 2959ebf637 (fix duplicates in advanced mode) |
[tools.lexeme-forms] |
06:07 |
<XioNoX> |
change OSPF metrics to prefer ulsfo tunnel transport |
[production] |
2020-04-17
|
23:02 |
<bd808> |
Added bd808 (self) as project admin |
[meet] |
21:21 |
<hashar> |
Building Docker image releng/node10-portals:0.1.2 for longma |
[releng] |
21:11 |
<James_F> |
Zuul: Add Scribunto to the gate after only five years of talking about it T125050 |
[releng] |
19:59 |
<James_F> |
Docker: Rebuild the quibble world for 0.0.42 |
[releng] |
19:33 |
<Krinkle> |
Depool mw1407.eqiad.wmnet for opcache testing. Do not repool without first reverting https://gerrit.wikimedia.org/r/589674. |
[production] |
19:32 |
<Krinkle> |
Depool mw1407.eqiad.wmnet for opcache and LCStoreStaticArray testing. – T99740 |
[production] |
19:01 |
<longma> |
Updating docker-pkg files on contint2001 for https://gerrit.wikimedia.org/r/c/integration/config/+/589414 |
[releng] |
17:41 |
<cmjohnson1> |
replacing network cable pc1009 T250257 |
[production] |
17:39 |
<James_F> |
Docker: Rebuilding the PHP world to have native zip and unzip T250496 |
[releng] |
17:34 |
<cmjohnson1> |
moving msw1 to msw-c racks mounted switch cable ports from port 49 to port 50 |
[production] |
17:22 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
17:22 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:15 |
<Urbanecm> |
Revert recent email change for User:CPHL@SUL |
[production] |
16:05 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' . |
[production] |
16:05 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' . |
[production] |
15:52 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
15:52 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . |
[production] |
15:49 |
<brennen> |
Updating dev-images docker-pkg files on contint2001 for T231864 |
[releng] |
15:48 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' . |
[production] |
15:48 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' . |
[production] |
15:42 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
15:41 |
<otto@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
15:20 |
<rzl> |
remove cronjobs from mwmaint1002 previously updated to systemd timers and erroneously left in crontab -- diffs: https://phabricator.wikimedia.org/P11012 T211250 |
[production] |
14:29 |
<mutante> |
ganeti2001 - killed and restarted gnt-rapi process with the correct new key and cert |
[production] |
14:19 |
<cdanis> |
add peer AS29802 to cr2-eqdfw and cr2-esams |
[production] |
14:01 |
<mutante> |
netbox1001 - netbox_ganeti_eqiad_synx / systemd state fixed after gnt-rapi is running again on ganeti1003 |
[production] |
14:00 |
<mutante> |
ganeti1003 - fixing gnt-rapi daemon not running |
[production] |
13:54 |
<mateusbs17> |
Running VACUUM FULL for gis DB in maps2004.codfw.wmnet (which is depooled at the moment) |
[production] |
13:45 |
<elukey> |
lock down /srv/log/mw-log/archive/ on stat1007 to analytics-privatedata-users access only |
[analytics] |