Prometheus host metrics and graphing just solved a mysterious 'machine out of memory without OOM notification' problem we had earlier today. I feel like I just got a win.

(In retrospect it's not much of a mystery; we'd just forgotten that we were enforcing strict overcommit limits on this class of machines.)
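
For anyone who wants to check for the same thing by hand: under strict overcommit (vm.overcommit_memory set to 2), the kernel refuses new allocations once Committed_AS would exceed CommitLimit, so processes get allocation failures and the OOM killer never logs anything. Here is a minimal sketch of that check in Python. It is not what we actually ran; the function names are made up for illustration, and it assumes a Linux /proc/meminfo with the usual CommitLimit and Committed_AS fields.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Values with a "kB" unit are converted to bytes; unitless values
    (such as HugePages_Total) are kept as plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def commit_usage(fields):
    """Return Committed_AS as a fraction of CommitLimit."""
    return fields["Committed_AS"] / fields["CommitLimit"]


if __name__ == "__main__":
    # Hypothetical manual check on a Linux host.
    with open("/proc/sys/vm/overcommit_memory") as f:
        mode = f.read().strip()
    with open("/proc/meminfo") as f:
        usage = commit_usage(parse_meminfo(f.read()))
    print(f"overcommit_memory={mode} committed/limit={usage:.2f}")
```

A ratio near 1.0 with overcommit_memory at 2 means the machine is effectively out of memory even if plenty of RAM looks free. The Prometheus node exporter normally publishes the same two fields as node_memory_Committed_AS_bytes and node_memory_CommitLimit_bytes, so the same ratio can be graphed there.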

