2015-12-07
|
13:46 |
<hashar> |
Reloading Jenkins configuration from disk following mass deletion of jobs directly on gallium |
[releng] |
13:46 |
<Coren> |
The new grid masters are happy, killing the old ones (-shadow, -master) |
[tools] |
13:41 |
<hashar> |
deleting a bunch of unmanaged Jenkins jobs (no longer in JJB / no longer in Zuul) |
[releng] |
12:56 |
<jynus> |
rolling restart, configuration upgrade of es1015 |
[production] |
12:20 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Depool es1015; es1013 at 100% load; pool es1017 with low weight (duration: 00m 28s) |
[production] |
11:08 |
<YuviPanda> |
restarting pdns on holmium |
[production] |
10:52 |
<jynus> |
database and system maintenance to es1017 |
[production] |
10:46 |
<YuviPanda> |
restarted nscd on tools-proxy-01 |
[tools] |
10:43 |
<hashar> |
CI / Zuul / Nodepool recovered. Root cause was a malfunction in OpenStack on wmflabs |
[production] |
10:20 |
<YuviPanda> |
restarted nova-conductor and scheduler on labcontrol1001 |
[production] |
10:07 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Repool es1013 (lower weight for now) and depool es1017 (duration: 00m 41s) |
[production] |
10:05 |
<hashar> |
stopped Nodepool. Cannot create instances anymore on wmflabs ( https://phabricator.wikimedia.org/T120586 ) |
[production] |
09:46 |
<hashar> |
restarting Nodepool on labnodepool1001.eqiad.wmnet |
[production] |
09:40 |
<hashar> |
CI / Zuul stalled. Nodepool can no longer spawn instances :-/ |
[production] |
09:27 |
<godog> |
nodetool decommission restbase1008 |
[production] |
09:13 |
<jynus> |
es1013 maintenance (mysql restart, upgrade, possible reboot) |
[production] |
08:27 |
<_joe_> |
uploaded etcd 2.2 package from stretch to jessie-wikimedia |
[production] |
04:24 |
<bd808> |
The ip address in jenkins for ci-jessie-wikimedia-10306 now belongs to an instance named future-wikipedia.reading-web-staging.eqiad.wmflabs (obviously the config is wrong) |
[releng] |
04:12 |
<bd808> |
ci-jessie-wikimedia-10306 down and blocking many zuul queues |
[releng] |
03:56 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Mon Dec 7 03:56:49 UTC 2015 (duration 1h 32m 22s) |
[production] |
02:24 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.7) (duration: 09m 59s) |
[production] |
2015-12-06
|
21:48 |
<ori> |
krypton unresponsive, nothing on console. Shutting down, increasing instance RAM from 2G to 4G, and rebooting. |
[production] |
21:01 |
<Luke081515> |
Enable rcm-5, try to replicate phabricator update issue with puppet |
[rcm] |
21:00 |
<Luke081515> |
deleted rcm-3 (Not needed) |
[rcm] |
18:49 |
<legoktm> |
reset auth token for User:QuimGil |
[production] |
10:29 |
<YuviPanda> |
ran webservice start on tool 'derivative'; service.manifest was missing |
[tools] |
05:50 |
<mutante> |
silver: gzip /var/log/nutcracker.log.1 |
[production] |
05:40 |
<mutante> |
silver: apt-get clean for disk space |
[production] |
03:57 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Sun Dec 6 03:57:02 UTC 2015 (duration 1h 31m 41s) |
[production] |
02:25 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.7) (duration: 10m 04s) |
[production] |
2015-12-05
|
18:30 |
<gwicke> |
started nodetool decommission on restbase1008 |
[production] |
11:35 |
<reedy@tin> |
Synchronized wmf-config/CommonSettings.php: Disable common password password policy to come in wmf.8 (duration: 00m 28s) |
[production] |
11:23 |
<reedy@tin> |
Purged l10n cache for 1.27.0-wmf.5 |
[production] |
11:22 |
<reedy@tin> |
Synchronized php-1.27.0-wmf.7/extensions/WikimediaMaintenance/refreshMessageBlobs.php: Less waiting for slaves (duration: 00m 28s) |
[production] |
11:13 |
<reedy@tin> |
Synchronized docroot and w: Add jobqueue-labs to noc (duration: 00m 28s) |
[production] |
08:59 |
<bblack> |
offlined db1019 megacli disk 32:11 |
[production] |
06:09 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Sat Dec 5 06:09:07 UTC 2015 (duration 3h 44m 18s) |
[production] |
02:24 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.7) (duration: 09m 59s) |
[production] |
2015-12-04
|
21:44 |
<andrewbogott> |
disabling puppet on labcontrol1002 for ldap testing |
[production] |
21:36 |
<ori@tin> |
Synchronized php-1.27.0-wmf.7/includes/Hooks.php: Iba0138a: Don't install a custom error handler for hooks (T117553) (duration: 00m 28s) |
[production] |
20:28 |
<ori@tin> |
Synchronized wmf-config/jobqueue-eqiad.php: Idee6a1980: job queue: use instances on port 6378 as aggregators (duration: 00m 30s) |
[production] |
19:33 |
<Coren> |
switching master role to tools-grid-master |
[tools] |
19:24 |
<MaxSem> |
bumped portals |
[releng] |
19:21 |
<ori> |
krypton: updated Grafana to 2.6.0-beta1 for bug fix for issue 3422 |
[production] |
15:52 |
<Jeff_Green> |
add mx record for donate.wikimedia.org |
[production] |
15:33 |
<godog> |
ms-be2019 rebooted by itself, ilo event log shows "Uncorrectable Machine Check Exception (Board 0, Processor 2, APIC ID 0x00000038, Bank 0x00000003, Status 0xFE000040'00020135, Address 0x00000000'FEB82F63, Misc 0x00000000'00002285)" |
[production] |
09:15 |
<hashar> |
salt --show-timeout '*' cmd.run 'rm -fR /mnt/jenkins-workspace/workspace/mwext-qunit/src/skins/*' ( https://phabricator.wikimedia.org/T120349 ) |
[releng] |
08:52 |
<godog> |
reimage restbase1009 |
[production] |
05:59 |
<gwicke> |
ran systemctl mask cassandra on restbase1009; it is important that this node does not start up. |
[production] |
05:53 |
<gwicke> |
moved /var/lib/cassandra out of the way in an attempt to stop puppet restarting cassandra on decommissioned restbase1009 |
[production] |