There is currently an issue with a database server that stopped responding. I am waiting for a reply from the data center to know what could be causing this.

Some instances are currently down. I am trying to solve the issue as fast as possible and will update here when I know more.

Really sorry for this situation. Some instances are still down, but I can give you an update on what is happening.

One of the database servers stopped responding at 8:30 pm UTC. After the data center team intervened, they found that one disk in the RAID had failed.

All servers that I use have RAID 1, and in theory a disk failure shouldn't bring the server down. But this one did and, worse, I can't get it to boot now.

1/3

So, I have asked for the faulty disk to be replaced and I am waiting for that to be done.

Worst case scenario, I will restore the database backups for the affected Mastodon servers, and there could be a loss of around 12 hours of data, as that was the time of the last backup.

2/3

I am so sorry for this situation and will do my best to try and bring the failed instances up as soon as possible. I will keep this thread updated.

3/3

OK, we are back online! :blobsweats:

The data center replaced the disk, I rebuilt the RAID, and everything is there. No data should have been lost.

I will keep checking just to be certain and will be back with more information.

I will now do a restart of all instances to clear out all processes and be sure nothing is stuck.

Expect less than 30 seconds of downtime during this process.

OK, it all looks good and everything is running smoothly.

This was by far the biggest downtime of Masto.host. Thankfully it was partial and only affected around 10% of the hosted services.

Still, I am really sorry that this happened. I am exhausted (it's 3:24 am for me) and tomorrow I will think about finding solutions for dealing with situations like this.

Ideally the solution would be to add redundancy, but it's hard to do that without increasing prices. It's a hard balance.

Thank you for your patience and for making it possible for me to run this fun project.

And just as a reminder, I will leave here the 4th paragraph of the Masto.host Terms of Service: masto.host/tos/

... I was dreading a day like this for over 4 years.

@nattukaran mostly because all options were bad. Either recover the backups and lose over 12 hours of data or wait and cross my fingers I could bring the server back up once the disk was replaced.
Thankfully I was able to do it, but it took way more time than it should have.
The silver lining is that I learned a couple more things that will help me deal with this kind of issue faster in the future.

@mastohost Ok😀
You know what they say: experience is the best teacher.

@mastohost please don't be so hard on yourself. These things happen, it's how you respond that is the most important thing.

@duncanhart Well, in the moment it's hard for me to be clear headed but now looking back it's much easier to gain perspective. Thanks :)

@mastohost the heat and intensity of a 'crisis' blind us and we forget the wider, bigger perspectives. You're human, we all are and the feelings you experienced are because you care about the service.

@duncanhart Yep, pretty true. Still, was only able to sleep 3 hours because I was pretty wired. Let's see if I can chill now :)

@mastohost You’re awesome, Hugo. Thank you for getting it all fixed so quickly and for the detailed report. Hope you’re getting some rest now :)

💕

@aral Thank you so much Aral, I slept 11 hours today :) Feeling much better
