2021-04-14
§
|
14:05 |
<elukey> |
run build/env/bin/hue migrate on an-tool1009 after the hue upgrade |
[analytics] |
13:10 |
<elukey> |
rollback hue-next to 4.8 - issues not present in staging |
[analytics] |
13:00 |
<elukey> |
upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org |
[analytics] |
10:02 |
<elukey> |
roll restart yarn nodemanagers on hadoop prod (graceful restart, to check whether they had entered a weird state) |
[analytics] |
09:54 |
<elukey> |
kill long running mediawiki-job refine erroring out application_1615988861843_166906 |
[analytics] |
09:46 |
<elukey> |
kill application_1615988861843_163186 for the same reason |
[analytics] |
09:43 |
<elukey> |
kill application_1615988861843_164387 to see whether socket consumption improves |
[analytics] |
09:14 |
<elukey> |
run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running |
[analytics] |
2021-04-08
§
|
16:33 |
<elukey> |
reboot an-worker1100 again to check if all the disks come up correctly |
[analytics] |
15:43 |
<razzi> |
rebalance kafka partitions for webrequest_text partitions 17, 18 |
[analytics] |
15:35 |
<elukey> |
reboot an-worker1100 to see if it helps with the strange BBU behavior in T279475 |
[analytics] |
14:07 |
<elukey> |
drop /var/spool/rsyslog from stat1008 - files corrupted when the root partition filled up caused rsyslog to SEGV |
[analytics] |
11:14 |
<hnowlan> |
created aqs user and loaded full schemas into analytics wmcs cassandra |
[analytics] |
08:35 |
<elukey> |
apt-get clean on stat1008 to free some space |
[analytics] |
07:44 |
<elukey> |
restart hadoop hdfs masters on an-master100[1,2] to apply the new log4j settings for the audit log |
[analytics] |
06:44 |
<elukey> |
re-deployed refinery to hadoop-test after fixing permissions on an-test-coord1001 |
[analytics] |
2021-04-07
§
|
23:03 |
<ottomata> |
installing anaconda-wmf-2020.02~wmf5 on remaining nodes - T279480 |
[analytics] |
22:51 |
<ottomata> |
installing anaconda-wmf-2020.02~wmf5 on stat boxes - T279480 |
[analytics] |
22:47 |
<mforns> |
finished refinery deployment up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 |
[analytics] |
22:39 |
<mforns> |
deployment of refinery via scap to hadoop-test failed with Permission denied: '/srv/deployment/analytics/refinery-cache/.config' (deployment to production went fine) |
[analytics] |
21:44 |
<mforns> |
starting refinery deploy up to 1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3 |
[analytics] |
21:26 |
<mforns> |
deployed refinery-source v0.1.4 |
[analytics] |
21:25 |
<razzi> |
sudo apt-get install --reinstall anaconda-wmf on stat1008 |
[analytics] |
20:15 |
<razzi> |
rebalance kafka partitions for webrequest_text partitions 15, 16 |
[analytics] |
19:53 |
<ottomata> |
upgrade anaconda-wmf everywhere to 2020.02~wmf4 with fixes for T279480 |
[analytics] |