mastodon.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
The original server operated by the Mastodon gGmbH non-profit.

nixCraft 🐧

PSA: Test your backups by doing a full restore.
Please find out how many hours it takes to restore your data. You will be surprised by the results. 🧑🏻‍💻
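One way to act on this is to wrap the full restore in a timer, so the drill produces an actual number of hours instead of a guess. A minimal Python sketch; `restore_cmd` is a placeholder for whatever restore command your backup tool uses (hypothetical, not any specific tool's CLI):

```python
import subprocess
import time


def timed_restore(restore_cmd: list[str]) -> tuple[int, float]:
    """Run a full restore command and return (exit code, elapsed hours).

    restore_cmd is a placeholder: substitute your backup tool's real
    restore invocation. Nothing here is specific to any one tool.
    """
    # The monotonic clock is unaffected by wall-clock changes (NTP, DST)
    start = time.monotonic()
    result = subprocess.run(restore_cmd)
    elapsed_hours = (time.monotonic() - start) / 3600
    return result.returncode, elapsed_hours
```

A non-zero exit code means the drill failed regardless of how fast it was, so check both values before writing down the time.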

@nixCraft I've always said, it's not a backup service, it's a restore service..... and it's definitely not an archive service!

@nixCraft And while we're at it, here's another #psa: RAID is not a backup; it's redundancy that keeps things running (for both SSDs and HDDs) or a speedup for HDDs.

@leean00

@nixCraft That is a myth that doesn't help anyone. RAID is always a backup, but it doesn't provide the highest level of protection. The best backups have multiple redundant copies in different geographical locations. What disaster are you worried about? The number one for most people should be disk failure, as this is common and a good RAID protects against it. Number two should be deleted files; RAID with snapshots handles that. Number three is theft of your computer or its destruction in a fire, which can only be protected against by having backups in different geographical locations. You should also be concerned about your backup media failing, and about ransomware attackers (perhaps this should be the number one concern). RAID is a useful part of a backup system. RAID is very expensive, so it's not recommended for any backup that you are not using regularly, but if money is no object, RAID has the fastest restore time of them all.

@leean00 @nixCraft What if you physically remove half of disks from RAID and rotate them? : ]

@nixCraft A really good way to test your backups is to reinstall your OS

@nixCraft, that's probably a good idea. I should probably do that. Time to spin up a virtual machine. My backups do check themselves for integrity after each run, and I have to test the key manually every two months. I really need to get a 4-drive fire/waterproof Synology backup system for my server.

@soundconjurer @nixCraft Probably better off spending money on an off-site solution, if you don't already have that as part of your setup, rather than beefing up a local component that's still vulnerable to other local problems.

@hatter @nixCraft, definitely a very logical answer. I'm just entirely self-reliant on everything. With that unnecessary preference, I'd probably be best off getting a multi-drive bay, maybe a 4-drive one, backing up to that, and putting it in a fire/waterproof safe. Then I'd store it at my sibling's place on the other side of the state. I live in NW Florida and they live right on the Atlantic. We're unlikely to ever have the same event affect both of us.

@nixCraft

Well if you have no backups you have instant restore! And you need no storage space.

Win-Win

(Edit: Obviously, this is a joke. I run regular backups on all my devices, including phones. I have not tried a full restore yet, though.)

@nixCraft I once had an employer who demanded that we make sure we could restore backups of our streaming platform within 8 hours. We were talking about almost 60 TB of data, which had to come from a second location over public lines while that location remained in production, severely limiting the transfer speeds.
He was not amused when we told him this was not possible.
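A quick back-of-the-envelope check shows why. Assuming decimal terabytes and ignoring protocol overhead and the restore step itself, moving 60 TB in 8 hours needs a sustained link of roughly 16.7 Gbit/s (about 2.1 GB/s), before the production load at the second location takes its share:

```python
def required_gbit_per_s(data_bytes: float, window_hours: float) -> float:
    """Sustained throughput (Gbit/s) needed to move data_bytes within window_hours."""
    bits = data_bytes * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e9


# 60 TB (decimal, 60 * 10**12 bytes) in an 8-hour window
print(f"{required_gbit_per_s(60e12, 8):.1f} Gbit/s")
```

Real links rarely sustain their nominal rate, so the actual requirement would be higher still.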

@valheru @nixCraft I’m still laughing thinking about his “demand.” If you could keep the smile off your face then you may have a future in professional poker.

@patmikemid @nixCraft well, since his demand came with a lot of yelling and accusing us of being incompetent I didn't smile, I did start looking for a new job though 😉

@nixCraft
The meme aside, I find it really hard to restore a backup because there are quite a lot of things on my server. That makes it even more important...
But setting up Ansible for the whole thing is quite a hassle too :/

Any advice?

@nixCraft
You never know until you put it to the test.
I've fallen short every time, but if you never test, you never know.

@nixCraft that's great advice...that few individual people actually have the resources to take. I dunno about you, but I don't have an extra PC lying around so I can test restores on it, nor do I have the many hours it would take to come up with and write and maintain the tests that would establish that restoring was actually successful. As is often the case, the cobbler's children go barefoot.

Let's be very, very real about this. How often do you test restores of your desktop (or laptop, if you don't use a desktop) computer?

@nixCraft And have someone in your team do the restore following the SOP.

@nixCraft I've seen it: when the person who wrote and runs the backups does the restore, it works perfectly, but the rest of the time other team members with little or no restore experience do it and usually encounter issues.

We follow the rule that when we do our BR/DR, the team rookie does the restore, using the current SOPs, if they have never done a BR/DR. If the rookie did the last BR/DR, then the team member who has not done it in years is nominated.

Timelines are part of the restore exercise and the actual results are logged for the future.
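Logging the timeline of each exercise can be as simple as appending one record per drill to a file. A minimal Python sketch, assuming a JSON Lines log; the field names (`operator`, `sop_version`, and so on) are hypothetical, not taken from any particular team's process:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_restore_drill(log_path: Path, operator: str, sop_version: str,
                      hours_taken: float, succeeded: bool, notes: str = "") -> dict:
    """Append one restore-drill result to a JSON Lines log and return the record.

    All field names are illustrative; adapt them to what your team tracks.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,        # who ran the restore (e.g. the rookie)
        "sop_version": sop_version,  # which SOP revision they followed
        "hours_taken": hours_taken,
        "succeeded": succeeded,
        "notes": notes,              # what broke or surprised you
    }
    # Append mode keeps every past drill, so trends stay visible over time
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Recording the SOP version alongside the operator makes it easier to tell whether a slow drill came from an outdated procedure or from inexperience.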

Amazing what you find when doing restores.

@nixCraft
I would still love to get a reasonably priced tape drive as an addition to my disk based backup server.....

@nixCraft
If you back up onto tape (especially onto multiple tapes, and especially in tape robots), regularly make a backup of the backup server, or at least of its config and the tape catalog, onto rotated USB drives that you keep offline.
Otherwise you’ll have to read through ALL THE TAPES, COMPLETELY, before you can even tell what you might be able to restore.
Think 1 to 5 days per tape per tape drive.
Now look at your tape library with 450 tapes.
Then think again.
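To put numbers on it: a minimal sketch of the worst case where the catalog is lost, using the 450 tapes and 1 to 5 days per tape from the post. The drive counts are assumptions, and the sketch optimistically assumes drives read in parallel without contention:

```python
def catalog_rebuild_days(tapes: int, days_per_tape: float, drives: int) -> float:
    """Days to read every tape when the catalog is lost, spread across drives.

    Assumes tapes are split evenly across drives and that drives run
    in parallel without contention (an optimistic simplification).
    """
    # Ceiling division: the last round still takes a full tape's worth of time
    rounds = -(-tapes // drives)
    return rounds * days_per_tape


for drives in (1, 4):  # hypothetical drive counts
    low = catalog_rebuild_days(450, 1, drives)
    high = catalog_rebuild_days(450, 5, drives)
    print(f"{drives} drive(s): {low:g} to {high:g} days")
```

Even with four drives, the best case is well over three months of reading before a single restore can start.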

@nixCraft I know this is boring in the age of Cloud/NoOps, but
1) you don't have a working backup system until you've verified you can restore from it.
2) RPO/RTO (recovery point objective / recovery time objective) are important. If you don't know what those are and you're responsible for the backup system, I strongly suggest you do some reading.
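RPO (recovery point objective) is how much data you can afford to lose; RTO (recovery time objective) is how long you can afford to be down. A minimal sketch of checking a setup against both, assuming worst-case data loss equals the backup interval and the restore time comes from an actual test restore; the class and function names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rpo_hours: float  # Recovery Point Objective: max tolerable data loss, in hours
    rto_hours: float  # Recovery Time Objective: max tolerable downtime, in hours


def check_objectives(obj: RecoveryObjectives, backup_interval_hours: float,
                     measured_restore_hours: float) -> list[str]:
    """Return the violated objectives (an empty list means both are met).

    Worst-case data loss is approximated by the backup interval, i.e. a
    failure just before the next backup runs.
    """
    problems = []
    if backup_interval_hours > obj.rpo_hours:
        problems.append(
            f"RPO missed: backups every {backup_interval_hours} h, objective {obj.rpo_hours} h")
    if measured_restore_hours > obj.rto_hours:
        problems.append(
            f"RTO missed: tested restore took {measured_restore_hours} h, objective {obj.rto_hours} h")
    return problems
```

The RTO check is only meaningful if `measured_restore_hours` comes from a real full restore, not an estimate.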

@nixCraft when I worked in storage, the "laws" of backups were:

1) Very few make backups.
2) Even fewer test their backups.
3) Untested backups usually fail in some way.

@lizakowski @nixCraft
Yes. I always tell clients, "If you make a backup, but you don't test restoring from that backup, then in actuality you do not have a backup."

@nixCraft When you need to restore, you have to

* copy the backup back to your server (you do keep them off the server, don't you?)
* have the restore tool ready!
* know where to restore the data
* clean up the destination (if broken files are left)
* restore from the backup...

You need to know the total time required, including the transfer time (1 GB vs. 100 GB makes a difference!)

As your DB or data directories grow, you need to repeat this to keep those numbers up to date!
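A rough way to keep those numbers current is to re-run a simple estimate whenever the data size changes. A minimal sketch, assuming the copy-back and the restore run one after the other (some tools stream and overlap them) and that you have measured both throughputs yourself; the throughput values below are made up for illustration:

```python
def estimate_restore_hours(data_gb: float, transfer_mb_s: float,
                           restore_mb_s: float, fixed_overhead_hours: float = 0.0) -> float:
    """Rough restore duration: copy back + restore + fixed steps (tool setup, cleanup).

    Throughputs are in decimal MB/s; measure them on your own hardware.
    """
    data_mb = data_gb * 1000
    transfer_h = data_mb / transfer_mb_s / 3600  # copying the backup back
    restore_h = data_mb / restore_mb_s / 3600    # the restore itself
    return transfer_h + restore_h + fixed_overhead_hours


for size_gb in (1, 100):
    hours = estimate_restore_hours(size_gb, transfer_mb_s=100, restore_mb_s=200,
                                   fixed_overhead_hours=0.5)
    print(f"{size_gb} GB: ~{hours:.2f} h")
```

An estimate like this is only a starting point; the figure that counts is the one from an actual timed restore.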

@nixCraft If there is no defined and practiced process for disaster recovery, there is no disaster recovery.

If you just make backups, you might as well save the money and time by not doing it.

@studiohope

@nixCraft there are reasonable odds that, despite being untested, your backups are there and someone can restore them. Tested backups have better odds, but any backup is likely better than none.

@nixCraft Data restore should take about an hour. Restoring all services would be a pain, TBH.

@nixCraft I did something equivalent: getting data off a failing hard drive. Disks are large enough that a full copy would take 4-5 hours.

Or in this case, even longer, because the hard drive stopped working whenever it read a bad sector and needed to be turned off and on again.

@nixCraft *prints this on a poster board*

*turns that poster into a wearable sandwich board where the other side says "RESTORING ONE SINGLE FILE FROM THE BACKUP SERVICE TELLS YOU JACK SHIT ABOUT YOUR ABILITY TO HIT RESTORE TIME AND RESTORE POINT OBJECTIVES AND IS IRRELEVANT IN DETERMINING WHETHER SERVICE RECOVERY IS EVEN POSSIBLE"*

*parades in front of $JOB_IT_DEPARTMENT building just wordlessly screaming through a bullhorn at every single person entering or leaving*

@gnomon Psst, let's not ask about cold-starting your environment from either a total power loss or a disaster recovery scenario.

(We walked through a mostly theoretical exercise on this at work at one point, although we did do a proof of concept set of restores from our offsite backups along with bringing up core servers. It was interesting and useful.)