Prometheus host metrics and graphing just solved a mysterious 'machine out of memory without OOM notification' problem we had earlier today. I feel like I just got a win.

(In retrospect it's not much of a mystery; we'd just forgotten that we were enforcing strict overcommit limits on this class of machines.)
