2020-10-05
19:27 <dzahn@cumin1001> START - Cookbook sre.dns.netbox [production]
19:14 <mforns> restarted oozie coord unique_devices-per_domain-monthly after deployment [analytics]
19:05 <mforns> finished deploying refinery to unblock deletion of raw mediawiki_job and raw netflow data [analytics]
18:59 <mforns@deploy1001> Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s) [production]
18:59 <mforns@deploy1001> Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] [production]
18:58 <mforns@deploy1001> Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s) [production]
18:46 <mforns@deploy1001> Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] [production]
18:46 <mutante> marked project for deletion in 2020 purge [planet]
18:45 <mforns> deploying refinery to unblock deletion of raw mediawiki_job and raw netflow data [analytics]
18:44 <mutante> shutting down instance pk8s - not in use since 2019 [planet]
18:20 <elukey> manual creation of /opt/rocm -> /opt/rocm-3.3.0 on stat1008 to avoid failures in finding the lib dir [analytics]
18:17 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) [production]
18:17 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
18:15 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
18:13 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
18:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
18:10 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
17:53 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:51 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:44 <wm-bot> <bd808> Purged cache of puppet roles for T264649 [tools.openstack-browser]
17:40 <bd808> `service uwsgi-labspuppetbackend restart` on cloud-puppetmaster-03 (T264649) [admin]
17:29 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:27 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:25 <hnowlan@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
17:25 <hnowlan@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
17:11 <elukey> bootstrap an-worker[1115-1117] as hadoop workers [analytics]
17:00 <hnowlan@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
17:00 <hnowlan@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
16:59 <hnowlan@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
16:59 <hnowlan@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
16:51 <hnowlan@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
16:51 <hnowlan@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
15:50 <thcipriani> start deployment-mediawiki-07 [releng]
15:43 <brennen> restarting gitlab-test after disabling prometheus metrics [releng]
15:15 <ppchelko@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
14:56 <ppchelko@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
14:55 <ppchelko@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
14:52 <milimetric> disabling drop-el-unsanitized-events timer until https://gerrit.wikimedia.org/r/c/analytics/refinery/+/631804/ is deployed [analytics]
14:41 <elukey> shutdown stat1005 and stat1008 for ram expansion (1005 again) [analytics]
14:41 <elukey> shutdown stat1005 and stat1008 for ram expansion (1005 again) [production]
14:36 <ppchelko@deploy1001> Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s) [production]
14:25 <elukey> shutdown an-master1001 for ram expansion [production]
14:25 <elukey> shutdown an-master1001 for ram expansion [analytics]
14:13 <ppchelko@deploy1001> Started deploy [restbase/deploy@366a543]: T263133 T264035 [production]
14:01 <filippo@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
13:58 <filippo@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
13:55 <filippo@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [production]
13:54 <elukey> shutdown stat1005 for ram upgrade [analytics]
13:54 <elukey> shutdown stat1005 for ram upgrade [production]
13:31 <elukey> shutdown an-master1002 for ram expansion (64 -> 128G) [production]