2024-04-17
§
|
07:39 |
<aqu> |
analytics/refinery deploy begin (added source jars 0.2.35) |
[analytics] |
07:37 |
<stevemunene> |
disable puppet on an-test-client1002 to test new conda analytics deb T362648 |
[analytics] |
2024-04-16
§
|
20:08 |
<aqu> |
Weekly deploy of refinery using scap, then deployed onto hdfs |
[analytics] |
15:00 |
<btullis> |
kicked off a rolling restart of the hadoop worker datanode and nodemanager process for T356382 |
[analytics] |
14:40 |
<btullis> |
failed back HDFS namenode from an-master1004 to an-master1003. |
[analytics] |
11:02 |
<stevemunene> |
upgrade datahub to v0.12.1 T361688 |
[analytics] |
09:16 |
<btullis> |
restarting mapreduce history service on an-master1003 for T356382 |
[analytics] |
2024-04-15
§
|
11:05 |
<btullis> |
sudo systemctl start hadoop-hdfs-namenode.service on an-master1003 after failed failback operation. |
[analytics] |
10:45 |
<btullis> |
roll-restarting hadoop masters on the prod cluster for T356382 |
[analytics] |
08:54 |
<btullis> |
roll-restarting hadoop masters on test cluster for T356382 |
[analytics] |
08:36 |
<btullis> |
roll-restarting druid on test cluster for T356382 |
[analytics] |
2024-04-11
§
|
15:25 |
<btullis> |
restarting hive-server2 and hive-metastore on an-test-coord1001 for T356382 |
[analytics] |
14:10 |
<elukey> |
move cassandra instances on aqs1010 to PKI TLS certs |
[analytics] |
12:21 |
<btullis> |
deploying editor-analytics with the new aqs-http-gateway chart |
[analytics] |
2024-04-09
§
|
13:20 |
<btullis> |
shut down stat1010 to have the GPU power connected for T336040 |
[analytics] |
12:56 |
<gmodena> |
successfully deployed refinery to hadoop and hadoop-test |
[analytics] |
12:06 |
<gmodena> |
starting a refinery deployment for 2024-04-09 |
[analytics] |
2024-04-08
§
|
15:43 |
<btullis> |
decommissioning dumpsdata1002 for T362065 |
[analytics] |
15:25 |
<btullis> |
decommissioning dumpsdata1001 |
[analytics] |
12:00 |
<btullis> |
rebooting stat1011 due to unresponsiveness |
[analytics] |
2024-04-03
§
|
11:46 |
<stevemunene> |
disable puppet on `an-test-client1002` to test new conda-analytics version T356231 |
[analytics] |
2024-03-28
§
|
18:04 |
<btullis> |
deploying refinery to HDFS. |
[analytics] |
16:22 |
<btullis> |
deploying refinery to test the git-lfs integration with scap for T328472 |
[analytics] |
15:00 |
<elukey> |
remove GPU labels in Hadoop Yarn for an-worker[1096-1099] (the hosts don't have a GPU anymore) - T361225 |
[analytics] |
2024-03-27
§
|
15:14 |
<brouberol> |
decommissioning an-tool1009 now that hue is fully offline - T341895 |
[analytics] |
15:02 |
<brouberol> |
dropping the hue.wikimedia.org CNAME - T341895 |
[analytics] |
2024-03-25
§
|
15:02 |
<btullis> |
updating the ssl_provider for eventstreams schema servers to cfssl for T360412 |
[analytics] |
2024-03-22
§
|
13:17 |
<elukey> |
`elukey@cumin1002:~$ sudo cumin 'stat100[4,5,8,9]*' 'kill `pgrep -u kcv-wikimf`'` to unblock puppet on various stat nodes |
[analytics] |
10:44 |
<btullis> |
shut down an-worker1168 to investigate disk controller failure for T360594 |
[analytics] |
2024-03-20
§
|
10:50 |
<brouberol> |
superset.wikimedia.org is now migrated to the DSE k8s cluster, CAS errors have receded |
[analytics] |
10:20 |
<brouberol> |
migrating superset to Kubernetes. Some CAS errors are expected for ~15 minutes |
[analytics] |
2024-03-07
§
|
14:01 |
<btullis> |
deploying updated mediawiki_history_reduced snapshots to AQS 2.0 |
[analytics] |
2024-03-04
§
|
12:22 |
<btullis> |
restarting hive-server2 and hive-metastore service on an-coord1003 |
[analytics] |
12:00 |
<btullis> |
migrating analytics-hive from an-coord1003 to an-coord1004 with https://gerrit.wikimedia.org/r/c/operations/dns/+/1008414 |
[analytics] |
10:32 |
<btullis> |
restart hive-server2 and hive-metastore service on an-coord1004 |
[analytics] |
2024-02-29
§
|
14:06 |
<btullis> |
sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service |
[analytics] |
09:59 |
<joal> |
Deploying refinery with scap (fix sqoop for tomorrow) |
[analytics] |
09:25 |
<brouberol> |
decommissioning an-tool1005 now that superset-next is migrated to k8s - T358706 |
[analytics] |
2024-02-28
§
|
11:08 |
<btullis> |
reimaging dbstore1007 to bookworm for T356961 |
[analytics] |
09:48 |
<joal> |
Deploying refinery onto HDFS |
[analytics] |
09:28 |
<joal> |
Deploying Refinery for T357859 |
[analytics] |
2024-02-27
§
|
18:14 |
<tchin> |
deploying eventstreams |
[analytics] |
2024-02-22
§
|
11:52 |
<brouberol> |
redeploying the spark-history server with expanded egress rules for hadoop workers - T358206 |
[analytics] |
2024-02-21
§
|
21:21 |
<joal> |
Update airflow variable for pageview_actor-hourly leading to 64 written files instead of 32 - this should ease the job resource consumption and prevent failures |
[analytics] |
19:51 |
<joal> |
Rerun pageview_actor_hourly for hour 2024-02-20T07:00 |
[analytics] |
2024-02-20
§
|
22:52 |
<sfaci> |
Deployed refinery using scap, then deployed onto hdfs |
[analytics] |
22:18 |
<sfaci> |
Starting refinery deployment |
[analytics] |
15:57 |
<xcollazo> |
deployed latest Airflow DAG updates for the analytics instance |
[analytics] |
2024-02-19
§
|
11:14 |
<sfaci> |
rerunning the compute_pageview_actor_hourly task in the pageview_actor_hourly DAG 2024-02-17 08:00:00 UTC |
[analytics] |
2024-02-13
§
|
09:03 |
<brouberol> |
attempting a reimage of apifeatureusage1001 to bookworm - T346053 |
[analytics] |