Hey admin colleagues

1) Do you run Ceph?
2) If so, do you have an SSD-backed pool?
3) Do you use this pool for RBD images?

If you've answered all three with yes: do you run fstrim inside these RBD devices? (If yes, please leave a comment so we can maybe reach out to you.) If you deliberately don't, please leave a comment as well.

@ops yep - and I did even back when I still had HDDs underneath, as fstrim on an RBD image reduces the number of objects in use - thin provisioning and stuff

Discard for the SSDs underneath is already issued by BlueStore
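
For reference, a minimal sketch of how one could check both points - the pool and image names here are made up, and bdev_enable_discard is off by default in most Ceph releases:

  rbd du ssd-pool/vm-disk                    # space actually used by the image before fstrim
  # ... run fstrim inside the guest / on the mount ...
  rbd du ssd-pool/vm-disk                    # used size should shrink as trimmed objects are reclaimed

  ceph config get osd bdev_enable_discard    # whether BlueStore itself sends discards to the SSDs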

@LittleFox the performance with HDDs here was too bad, and HDDs too cheap.

Do you see major performance issues during the fstrim run on your VMs? How big is your cluster?

@ops the fstrim operation takes some time (at most around a minute per image), but that's mostly due to it only being half-automated and not run very frequently

For the few volumes with automated fstrim I don't see any performance problems so far - but I mostly use these volumes as k8s PVs, not as disks for virtual machines, so they don't serve an OS, only the application itself
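
For reference, one common way to automate this fully is the fstrim.timer shipped with util-linux; note that for VM disks the guest's trim only reaches the RBD image if discard is enabled on the virtual disk, which is an assumption about the setup described here:

  systemctl enable --now fstrim.timer    # periodic "fstrim -a" (weekly by default)
  fstrim -av                             # or a one-off trim of all mounted filesystems that support it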

@LittleFox how big are these images? A minute is quite short compared to our timings. Could you say something about the number of OSDs and the current write rate in IOPS across the pool?

@ops most images (again, serving only a single application most of the time) are around 8GiB, but there are some outliers of up to 300GiB (the biggest one actually serves as a VM disk)

Gonna run fstrim on that big disk now and report wall time and IOPS while it's running

@LittleFox ok, that's far too small to compare to our clusters. Thanks for your information anyway.

@ops alright, good luck finding more suitable comparison data

@ops fwiw, a full fstrim on the 300GiB image took 159s, with up to 1.5k IOPS reported by Ceph - the filesystems are a mix of ext4, xfs and btrfs, with most data on btrfs
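
For reference, a minimal sketch of how such a measurement could be reproduced - the mountpoint and pool name here are hypothetical:

  time fstrim -v /mnt/bigdisk     # wall time plus the number of bytes trimmed
  ceph osd pool stats ssd-pool    # client I/O rate on the pool while the trim runs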

@ops it's currently a 3-node / 4-OSD cluster; at its biggest it had 6 nodes and 6 OSDs (but at that point all nodes were virtual servers rented from netcup)

All current nodes are both k8s and Ceph control plane nodes, as well as Ceph storage and k8s worker nodes - but it runs quite well, and fstrim has never been a problem so far (had enough other ones)
