Really sorry for this situation. Some instances are still down, but I can give you an update on what is happening.
One of the database servers stopped responding at 8:30 pm UTC, and after the team in the data center intervened they found that one disk in the RAID had failed.
All the servers I use have RAID 1, and in theory a disk failure shouldn't bring the server down, but this one did and, worse, I can't get it to boot now.
OK, it all looks good and everything is running smoothly.
This was by far the biggest downtime of Masto.host. Luckily it was partial and only affected around 10% of the hosted services.
Still, I am really sorry that this happened. I am exhausted (it's 3:24 am for me) and tomorrow I will think about finding solutions for dealing with situations like this.
Ideally I would add redundancy, but it's hard to do that without increasing prices. It's a hard balance.
Thank you for your patience and for making it possible for me to run this fun project.
And just as a reminder, I'll leave here the 4th paragraph from the Masto.host Terms of Service: https://masto.host/tos/
... I was dreading a day like this for over 4 years.
@mastohost keep strong, this is a bad situation to be in, but everyone will appreciate your clear communication.
@nattukaran mostly because all options were bad. Either recover the backups and lose over 12 hours of data or wait and cross my fingers I could bring the server back up once the disk was replaced.
Luckily I was able to do it, but it took way more time than it should have.
The silver lining is that I learned a couple more things that will help me deal with this kind of issue faster in the future.
@mastohost please don't be so hard on yourself. These things happen, it's how you respond that is the most important thing.
@duncanhart Well, in the moment it's hard for me to be clear headed but now looking back it's much easier to gain perspective. Thanks :)
@mastohost the heat and intensity of a 'crisis' blind us and we forget the wider, bigger perspectives. You're human, we all are and the feelings you experienced are because you care about the service.
@duncanhart Yep, pretty true. Still, was only able to sleep 3 hours because I was pretty wired. Let's see if I can chill now :)
@mastohost You’re awesome, Hugo. Thank you for getting it all fixed so quickly and for the detailed report. Hope you’re getting some rest now :)
@mastohost Sometimes stuff happens that's beyond our control. Not your fault! Used to work for a server hosting company, I know how that goes XD Thanks for your hard work 💜
@welshpixie Thanks for understanding. It's just hard to keep cool while the situation is happening. Today I can look at it in a different light, but there are still a couple of lessons I can take from it. 🤗