2020-06-11 §
13:35 <filippo@cumin1001> conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=eqiad [production]
13:33 <wm-bot> <zppixbot> auto-update@website: Synced website repo in 95.s [tools.zppixbot]
13:16 <wm-bot> <zppixbot> auto-update@website: Synced website repo in 45.s [tools.zppixbot]
12:42 <arturo> introduce puppet profile 'toolsbeta-docker-registry' and relocate some hiera config there [toolsbeta]
12:39 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) [production]
12:39 <arturo> for the record, k8s etcd servers certificate changed (puppet based) and k8s just kept working [toolsbeta]
12:36 <elukey> updated pcc facts [production]
12:35 <arturo> according to `aborrero@cloud-cumin-01:~$ sudo cumin --force -x 'O{project:toolsbeta}' 'run-puppet-agent'` we are mostly back in business [toolsbeta]
12:28 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
12:28 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
12:28 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:25 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:15 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . [production]
12:15 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . [production]
12:14 <arturo> try switching all VMs to toolsbeta-puppetmaster-04 [toolsbeta]
12:14 <arturo> poweroff toolsbeta-puppetmaster-03 [toolsbeta]
12:12 <arturo> copy over labs/private from toolsbeta-puppetmaster-03 to toolsbeta-puppetmaster-04 [toolsbeta]
12:04 <jforrester@deploy1001> Synchronized php-1.35.0-wmf.36/includes/title/NamespaceInfo.php: T253098 NamespaceInfo::makeValidNamespace: Don't throw for -1 or -2 (duration: 01m 06s) [production]
12:03 <marostegui> Reimage es2023 (es5 codfw master) [production]
11:54 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db2075 T254139', diff saved to https://phabricator.wikimedia.org/P11469 and previous config saved to /var/cache/conftool/dbconfig/20200611-115430-marostegui.json [production]
11:53 <arturo> create VM toolsbeta-puppetmaster-04 [toolsbeta]
11:46 <marostegui> Deploy schema change on s6 codfw - T250066 [production]
11:44 <volans@deploy1001> Finished deploy [homer/deploy@df83901]: Release v0.2.3 (duration: 00m 25s) [production]
11:44 <volans@deploy1001> Started deploy [homer/deploy@df83901]: Release v0.2.3 [production]
11:36 <ayounsi@cumin1001> START - Cookbook sre.network.prepare-upgrade [production]
11:36 <matthiasmullie> EU BACON done [production]
11:35 <arturo> try reinstalling the python3 stack in toolsbeta-puppetmaster-03, because everything python-related segfaults [toolsbeta]
11:35 <mlitn@deploy1001> Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments: Help panel: Update guidance behavior rules (duration: 01m 06s) [production]
11:34 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . [production]
11:34 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . [production]
11:33 <arturo> reboot toolsbeta-puppetmaster-03 to try cleaning up potential kernel/filesystem problems [toolsbeta]
11:32 <arturo> apparently every python script segfaults in toolsbeta-puppetmaster-03 [toolsbeta]
11:28 <kartik@deploy1001> Synchronized php-1.35.0-wmf.36/extensions/ContentTranslation/modules/tools/mw.cx.tools.IssueTrackingTool.js: Backport: [[gerrit|604587|IssueTrackingTool: Fix js error in getCurrentNodeId method (T254965)]] (duration: 01m 07s) [production]
11:27 <arturo> puppetdb wasn't the problem. The problem is puppet-enc segfaulting in toolsbeta-puppetmaster-03 [toolsbeta]
11:21 <arturo> puppet not working bc puppetdb, run `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb` [toolsbeta]
11:11 <arturo> deployed nginx-ingress for some early testing (not definitive) with code https://github.com/crookedstorm/paws/commit/bee62b3fd57f9804aa27e7b8b41fde50bd93df94 (T195217) [paws]
11:08 <jayme@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . [production]
11:04 <legoktm> restarting everything after gerrit-replica 502s fixed T255094 T255125 [codesearch]
11:04 <mlitn@deploy1001> Synchronized php-1.35.0-wmf.36/extensions/MachineVision: $aliases should be an array of strings, not AliasGroup objects (duration: 01m 07s) [production]
10:47 <moritzm> repooling mw1318,mw2139,mw2145,mw2147,mw2221,mw2219,mw2250,mw2350 (these were depooled, but seem all fine in Icinga and were probably just forgotten) [production]
10:41 <filippo@cumin1001> conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-swift [production]
10:40 <filippo@cumin1001> conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-query [production]
10:37 <Urbanecm> Run `update page set page_content_model="json" where page_content_model = "CollaborationListContent" OR page_content_model = "CollaborationHubContent";` at beta enwiki (T255107) [releng]
10:37 <moritzm> installing buster kernel security updates (no reboots yet, on hold for regression-free microcode update) [production]
10:32 <godog> roll-restart pybal in eqiad lvs low-traffic [production]
10:21 <mutante> restarting gerrit on gerrit-replica (gerrit2001) - java.lang.OutOfMemoryError: Java heap space [production]
10:21 <Urbanecm> Run scap pull at mwdebug1001 to revert temporary changes [production]
10:18 <RhinosF1> tools.zppixbot-test@tools-sgebastion-08:~$ grep -r -D skip "last_event_at" (in case anything seems slow, may take a while, please don't kill anything while I do it) END [tools.zppixbot-test]
10:15 <arturo> added role (just a label) for ingress nodes: `kubectl label node paws-k8s-ingress-1 kubernetes.io/role=ingress` (T195217) [paws]
10:14 <RhinosF1> tools.zppixbot-test@tools-sgebastion-08:~$ grep -r -D skip "last_event_at" (in case anything seems slow, may take a while, please don't kill anything while I do it) [tools.zppixbot-test]