2024-01-31
|
17:00 |
<phuedx> |
phuedx@deploy2002 Started deploy [analytics/refinery@2c00cad] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2c00cad1] |
[analytics] |
16:57 |
<phuedx> |
phuedx@deploy2002 Finished deploy [analytics/refinery@2c00cad] (thin): Regular analytics weekly train THIN [analytics/refinery@2c00cad1] (duration: 00m 06s) |
[analytics] |
16:57 |
<phuedx> |
phuedx@deploy2002 Started deploy [analytics/refinery@2c00cad] (thin): Regular analytics weekly train THIN [analytics/refinery@2c00cad1] |
[analytics] |
16:53 |
<phuedx> |
phuedx@deploy2002 Finished deploy [analytics/refinery@2c00cad]: Regular analytics weekly train [analytics/refinery@2c00cad1] (duration: 09m 52s) |
[analytics] |
16:52 |
<phuedx> |
Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)] |
[analytics] |
12:12 |
<btullis> |
rebooting dbstore1009 for new kernel version (T356239) |
[analytics] |
11:56 |
<btullis> |
rebooting dbstore1008 for new kernel version (T356239) |
[analytics] |
10:57 |
<btullis> |
deploying https://gerrit.wikimedia.org/r/c/analytics/superset/deploy/+/994213 to superset-next to test nested display of presto columns |
[analytics] |
2024-01-15
|
17:02 |
<btullis> |
roll-restarting public druid cluster |
[analytics] |
17:01 |
<btullis> |
roll-restarting analytics druid cluster |
[analytics] |
16:55 |
<joal> |
Clearing failed analytics airflow tasks after fix |
[analytics] |
16:47 |
<btullis> |
restarted the hive-server2 and hive-metastore services on an-coord100[3-4] which had been accidentally omitted earlier for T332573 |
[analytics] |
12:00 |
<btullis> |
removing all downtime for hadoop-all for T332573 |
[analytics] |
11:57 |
<btullis> |
un-pausing all previously paused DAGs on all airflow instances for T332573 |
[analytics] |
11:55 |
<btullis> |
re-enabling gobblin jobs |
[analytics] |
11:38 |
<brouberol> |
redeploying the Spark History Server to pick up the new HDFS namenodes - T332573 |
[analytics] |
11:29 |
<btullis> |
puppet runs cleanly on an-master1003 and it is the active namenode - running puppet on an-master1004. |
[analytics] |
11:20 |
<btullis> |
running puppet on an-master1003 to set it to active for T332573 |
[analytics] |
11:16 |
<btullis> |
running puppet on journal nodes first for T332573 |
[analytics] |
11:03 |
<btullis> |
stopping all hadoop services |
[analytics] |
10:59 |
<btullis> |
disabling puppet on all hadoop nodes |
[analytics] |
10:54 |
<btullis> |
putting HDFS into safe mode for T332573 |
[analytics] |
2024-01-09
|
21:28 |
<aqu> |
airflow-dags/analytics(_test) are both deployed |
[analytics] |
21:18 |
<aqu> |
analytics/refinery not deployed fully on test cluster. Ticket for the bug here: https://phabricator.wikimedia.org/T354703 |
[analytics] |
21:07 |
<aqu> |
Deployed refinery using scap, then deployed onto hdfs |
[analytics] |
20:48 |
<aqu> |
about to deploy analytics/refinery - weekly train |
[analytics] |
12:57 |
<stevemunene> |
roll restarting analytics hadoop masters to pick up new net_topology script and new JRE T254480 |
[analytics] |
11:48 |
<stevemunene> |
roll restarting hadoop test masters to pick up new net_topology script and new JRE |
[analytics] |
11:36 |
<stevemunene> |
disable puppet on hadoop masters both test and production to test/implement new net_topology script |
[analytics] |
10:39 |
<btullis> |
roll-restarting kafka-jumbo to pick up new JRE |
[analytics] |