I'm also the kind of person who will spin up a virtual machine to build some file-based ZFS pools in order to test the vdev names that my code generates for raidz and draid vdevs to verify that they match the names the zpool command generates. Draid vdevs have very complicated names, and I hope they can't be reshaped.

(I have real machines running ZFS but I wasn't about to assemble some test pools on them; that goes on scratch VMs that can explode without problems. Just in case.)

It turns out that I am the kind of person who will make a big ball of intertwined changes, publish it as that to get it out there, and then go back to carefully construct a series of individual changes (mostly) in a plausible order so that people could theoretically cherry-pick some of them and it looks neat.

(I did wind up with two changes intertwined because I wasn't going to actually build and test a split-apart version.)

I've now published the modified version of zfs_exporter that we're using on Linux to expose ZFS IO stats to Prometheus. github.com/siebenmann/zfs_expo (branch cks-upstream, which should be the default).

Life is healing: the graduate students are making coffee in their lounge again. They weren't for a while and I wondered what was wrong (it wasn't a broken coffee-making machine).

The problem with having a bunch of ongoing virtual machines on my desktop is that I have to turn them on every so often and update them. Well, I don't have to, I just feel twitchy if I let them sit out of date (powered off) for too long.

It turns out that basically all of our SSH authentication probes are coming from one remote AWS machine trying to hammer on the root account of one server (something that would never succeed). I only discovered this when I started pulling stuff out of Grafana Loki out of curiosity.

(Now I have a new dashboard.)

I'm an experienced sysadmin and only today, after <redacted> years, have I firewalled incoming IPv6 connections to client devices on my network. IPv4? Implicitly firewalled years ago. IPv6? Until now, sail on through, because it was just a bit too much of a pain to get around to (and who attacks things via IPv6, really).

At some point I'm going to write a techblog entry on why I'm finding Grafana Loki to be the lazy person's log grep+awk. It's not as straightforward an issue as you might think, since you do have to learn LogQL (which is not PromQL and has irritating differences and omissions).

Stupid Grafana Loki trick of the time interval: how much do our server clocks drift over a day for machines that adjust them periodically via ntpdate?

sum(sum_over_time({syslog_identifier="ntpdate"} |= "adjust time server " | pattern "adjust time server <server> offset <delta> sec" | unwrap delta [24h])) by (host)
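(For checking the query's numbers outside Loki, here is a rough Python sketch of the same idea: pull the offset out of each ntpdate "adjust time server" message and sum it per host. The function name and the (host, message) input shape are made up for illustration; how you get the log lines out of Loki or syslog is left open.)

```python
import re
from collections import defaultdict

# Mirrors the LogQL pattern "adjust time server <server> offset <delta> sec".
# Assumption: the offset is a plain decimal number with an optional sign.
ADJUST_RE = re.compile(
    r"adjust time server (\S+) offset ([+-]?\d+(?:\.\d+)?) sec"
)


def sum_offsets_by_host(entries):
    """Sum ntpdate adjustment offsets per host.

    entries is an iterable of (host, message) pairs (a hypothetical
    input shape); the result maps each host to its summed offset in
    seconds. Messages that don't match the pattern are skipped.
    """
    totals = defaultdict(float)
    for host, message in entries:
        match = ADJUST_RE.search(message)
        if match is None:
            continue
        totals[host] += float(match.group(2))
    return dict(totals)
```

(Like the LogQL version, this adds up signed offsets, so positive and negative adjustments partially cancel; summing the absolute values instead would show the total amount of correction.)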

Toronto's thunderstorm show has helpfully started while I'm still at work. Hopefully it will also finish before I need to bicycle back home. Thunder and pouring rain is nicely dramatic from inside and unpleasantly dramatic from outside.

“User Interface”? You mean clicking randomly on the ⁝ and ☰ and ⚙️ buttons trying to figure out what's in each one?

Current status: finished just over 85 km of bicycling, aka 'cooked'. TBN's "Richmond Hill and Dale" had a lot of hill (much of it a slow long ascent hidden in meandering through suburbia) and not as much dale as I like. We also got to see some still-surviving (for the moment) farmland.


(The route is only 54 km or so, but I rode to and from the start point, as I always do because I'm crazy.)

Having tracked how much an IPMI's idea of time drifts across our modest fleet for a while, well, we're now resetting the IPMI time once a day.

I remain terrible at putting computer hardware I've bought into actual use, because I am very lazy about things like 'opening up my case' and 'shuffling everything around'.

Current status: treating Grafana dashboard panels somewhat like high explosives. If they aren't solving my problems, just add more.

(This is not how I get my 'why is the system doing that' questions answered, but it feels productive.)

Current Toronto storm status: well, there goes my power. This is certainly a dramatic storm, but I hope the power comes back on before too long (certainly before the UPS runs out).

One reason Loki isn't a clear slam-dunk for us is that we run systems, not applications, and Linux system logs are both voluminous and very noisy (and also everything logs messages in different formats and ways). Still, I could do basic things like put kernel log messages on our per-system dashboards. Maybe a log volume summary would also be useful.

I'll have to experiment to see what information is worth presenting (the eternal dashboard problem).

We now have a working (Grafana) Loki setup that's ingesting logs from our modest fleet (in addition to our existing central syslog server). It works, but I'm not sure what to do with it now.

(This is unlike Prometheus, which was clearly immediately useful for showing us the status and performance of various things.)

I should probably get a CO2 monitor or two, but every time I think about doing this I go stare at Amazon listings for a bit then quietly back away for various reasons. My kingdom for a 'Linux users, buy this' answer, or at least a 'this doesn't demand you install a phone app and is pleasant to use' one.

Let me tell you, don't rely on flash for cold storage of important data. I learned it the hard way, despite knowing better.

A short thread. (1/n)
