Thank you for your patience and for making it possible for me to run this fun project.

And just as a reminder, I leave here the 4th paragraph from Masto.host Terms of Service: masto.host/tos/

... I was dreading a day like this for over 4 years.

OK, it all looks good and everything is running smoothly.

This was by far the biggest downtime of Masto.host. Thankfully it was partial and only affected around 10% of the hosted services.

Still, I am really sorry that this happened. I am exhausted (it's 3:24 am for me) and tomorrow I will think about finding solutions for dealing with situations like this.

Ideally the solution would be to create redundancy, but it's hard to do that without increasing prices. It's a hard balance.

I will now do a restart of all instances to clear out all processes and be sure nothing is stuck.

Less than 30 seconds of downtime during this process.

OK, we are back online! :blobsweats:

The data center replaced the disk and I rebuilt the RAID and all is there. No data should have been lost.

I will look more just to be certain and will be back with more information.

I am so sorry for this situation and will do my best to try and bring the failed instances up as soon as possible. I will keep this thread updated.

3/3

So, I have requested the faulty disk to be replaced and I am waiting for that to be done.

Worst case scenario, I will restore the database backups for the affected Mastodon servers, and there could be a loss of around 12 hours of data, as that was the time of the last backup.

2/3

Really sorry for this situation. Some instances are still down, but I can give you an update on what is happening.

One of the database servers stopped responding at 8:30 pm UTC and after the team in the data center made an intervention they detected that one disk in the RAID had failed.

All servers that I use have RAID 1 and, in theory, a disk failure shouldn't bring the server down, but this one did and, worse, I can't get it to boot now.

1/3

There is currently an issue with a database server that stopped responding. I am waiting for a reply from the data center to know what could be causing this.

Some instances are currently down. I am trying to solve the issue as fast as possible and will update here when I know something.

I am migrating the email hosting for Masto.host to fastmail.com

In theory it is all working, but DNS propagation/caches can cause some problems over the next few hours. If you try to reach me via email and get a return message, please let me know here or try again later.

Notice: this will only affect email addressed to/from info@masto.host. All notifications sent by instances will continue to use the Mailgun SMTP.

Also, I will be updating the Privacy Policy to reflect this change.

Upgrade finished. All servers hosted by Masto.host are now running v3.4.1

Any issues please let me know 🐘 thanks

After testing v3.4.1 for 48 hours the issue that I reported has not caused any problems.

So, I will be starting the upgrade to v3.4.1 for every server on Masto.host. The downtime during upgrade should be less than 30 seconds.

You can see the v3.4.1 release notes here: github.com/tootsuite/mastodon/

Any questions please let me know.

There is a new release of Mastodon ( v3.4.1 - github.com/tootsuite/mastodon/ ) but it introduces a strange warning github.com/tootsuite/mastodon/ when running the upgrade commands.

It's probably not a problem and wouldn't affect the upgrade but out of an abundance of caution, I will wait to see if someone reports any problems related to this issue or if a patch is released.

So, the upgrade to v3.4.1 will be delayed for the time being but I will announce as soon as I have more information.

Upgrade finished. All servers on Masto.host are now running Mastodon v3.4.0

Any problems please let me know. Thanks 🐘

I will be starting the upgrade to Mastodon v3.4.0 for all servers hosted on Masto.host.

During the upgrade there will be around 1 minute of downtime.

You can find the v3.4.0 release notes here: github.com/tootsuite/mastodon/

Any issues or questions please let me know.

Data center incident not affecting production 

Finally the new media backups have been copied to the new data center and the daily rotation is starting now. In the next couple of days it should be running daily for every account.

The old Object Storage (PCS) is still "Under investigation" by OVH to see if the data is recoverable: ovhcloud.com/en/lp/status-serv - still, even if it's recovered I will end up deleting it as the new backups are up to date.

Any questions please let me know.

Data center incident not affecting production 

All going as expected, the backups for database, configuration files, SSL certificates and other config files will be fully copied today and will start the usual daily copy to the new data center tomorrow.

The media files will take some days (or a couple of weeks) to be fully copied and to resume the daily rotation that I had in place.

Still, hopefully the old backups will be retrieved but if not the new backup is already going.

Data center incident not affecting production 

Replies from OVH about restoring their Object Storage from the data center have been vague. So, I can't be sure of a recovery of the backup data.

To try and be proactive in case of non-recovery, I have started the backup from zero to another OVH data center in Warsaw.

The database backups from the past 2 days are currently being copied there, and I will start backing up media files later today.

Data center incident not affecting production 

The reason to locate backups in a different data center is exactly to allow for a situation like the mentioned fire to be recovered from.

I will see if I can get confirmation of whether the backup data was lost or not. The backup data was hosted on their Object Storage, so hopefully it was not, but I can't be sure right now.

I will keep you posted when I have more information.

Data center incident not affecting production 

There was a fire at one of OVH's data centers (Strasbourg) travaux.ovh.net/?do=details&id and from what I could understand the building is lost.

In case you don't know, I use OVH to host all of Masto.host, but thankfully everything is hosted in a different data center in France, with the exception of backups. The backups were in the building that burned.

So, everything is working in production, just not sure if the backups from the last days were lost.

Because I have been getting too many strange signups in the last 24 hours and most people don't read the Terms of Service, I added a Temporary Notice to the Masto.host Pricing page: masto.host/pricing/

"If you are looking for an alternative to Parler, this is not it. Please look elsewhere. Thanks."

I hope that's clear enough :)
