It's really important to back your files up. Most people don't give it much thought, but it really matters. On top of that, backups must be partially recoverable: a backup is of no use if you have to load the entire (possibly massive) thing onto existing hardware just to pull one file out.
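One nice property of storing backups as plain directory trees (as the rsync setup below does) is that "partial recovery" is just a copy. A minimal sketch of that idea; the helper name and the `share/backups/<server>/<path>` layout are assumptions for illustration, not a universal convention:

```python
def backup_path(server, remote_path):
    """Map a server name and an absolute remote path to its
    spot in a plain-directory backup tree (hypothetical layout)."""
    return "share/backups/{}/{}".format(server, remote_path.lstrip('/'))

# Restoring one nginx config is then just copying this file back out:
print(backup_path("example_name", "/etc/nginx/nginx.conf"))
```

No special tooling, no mounting an image, no streaming a whole archive to find one file.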
Not backing up my data reliably has almost kicked me in the butt several times. ClassiCube probably wouldn't still be around (at least not in the same capacity as it is now) if I hadn't gotten lucky with a backup made shortly before a huge unrecoverable disk failure.
At home, I have a NAS with a nice chunky hard drive that runs a Python script every night at 11pm, rsyncing data from all of my Linux servers on the internet. It's been really helpful for me, so maybe you'll find it useful too:
import os, sys

# run from the script's own directory so the relative backup paths
# resolve the same way under cron (abspath avoids os.chdir('') when
# the script is invoked with a bare filename)
os.chdir(os.path.dirname(os.path.abspath(sys.argv[0])))

DEBUG_RUN = False

BACKUP = {
    "example_name": {
        "host": "root@192.168.0.44",  # must be reachable with current server's ssh key
        "source": [
            "/etc/nginx",      # nginx configs
            "/etc/mysql",      # mariadb configs
            "/var/lib/mysql",  # mysql data (won't be perfect but should be recoverable)
            "/home"            # home directory (why not?)
        ]
    },
}

def main():
    for server in BACKUP:
        s = BACKUP[server]
        args = ['rsync', '-trvz', '--links']
        if DEBUG_RUN:
            args.append("--dry-run")
        if type(s['source']) is str:
            sources = [s['source']]
        else:
            sources = s['source']
        for source in sources:
            if source.startswith('/'):
                source = source[1:]
            levels = source.split('/')
            if len(levels) > 1:  # rsync needs every parent directory included explicitly
                curlvl = ''
                for level in levels[:-1]:
                    curlvl = "{}{}".format(curlvl, level)
                    args.append('--include="{}"'.format(curlvl))
                    curlvl = "{}/".format(curlvl)
            if not source.endswith('/'):
                source = '{}/'.format(source)
            source = '{}***'.format(source)
            args.append('--include="{}"'.format(source))
        args.append('--exclude="***"')
        args.append('-e')
        args.append('"ssh"')
        args.append("{}:/".format(s['host']))
        args.append("share/backups/{}".format(server))
        run(args)
    run(['chmod', '777', 'share/backups/', '-R'])  # clobber the permissions so I can view them on smb

def run(args):
    cmd = " ".join(args)
    print(cmd)
    status = os.system(cmd)
    if status != 0:
        print("warning: '{}' exited with status {}".format(args[0], status))

if __name__ == '__main__':
    main()
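The include/exclude dance is the least obvious part of the script: rsync only descends into a subtree if every parent directory is explicitly included first, and the trailing `***` matches the directory plus everything inside it. Here's that logic pulled out into a standalone sketch (the helper name is made up for illustration), showing what gets generated for one source path:

```python
def include_args(source):
    """Build the rsync --include patterns for one source path:
    one include per parent directory, then the subtree itself."""
    source = source.lstrip('/')
    args = []
    levels = source.split('/')
    curlvl = ''
    for level in levels[:-1]:
        curlvl += level
        args.append('--include="{}"'.format(curlvl))
        curlvl += '/'
    if not source.endswith('/'):
        source += '/'
    args.append('--include="{}***"'.format(source))
    return args

print(include_args('/var/lib/mysql'))
# ['--include="var"', '--include="var/lib"', '--include="var/lib/mysql/***"']
```

Paired with a final `--exclude="***"`, rsync copies exactly those subtrees from `/` and ignores everything else.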
Thanks to this, I rest easy knowing that my data is at least somewhat safe.
what if I have a house fire or the drives in my NAS die?
I haven't gotten that far ahead yet - recurring payments for anything beyond random bits haven't really fit my budget until the last year, so I've barely scratched the surface of the research I need to do.
Backblaze B2 sounds like a good fit, though. Some people recommend Amazon S3 Glacier, but the cost of continual storage is higher than it sounds, and forget about trying to retrieve that data. Ouch!
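Rough arithmetic shows the shape of that tradeoff. The per-GB rates below are illustrative placeholders, not current prices - check the providers' pricing pages before trusting any of this:

```python
# Hypothetical rates, for illustration only ($/GB).
storage_gb = 500
archive_storage_rate = 0.004   # archival tier, per GB-month (assumed)
archive_retrieval_rate = 0.03  # archival tier, per GB retrieved (assumed)
b2_storage_rate = 0.006        # hot storage, per GB-month (assumed)

monthly_archive = storage_gb * archive_storage_rate
monthly_b2 = storage_gb * b2_storage_rate
full_restore = storage_gb * archive_retrieval_rate  # one-time, on top of egress

print(round(monthly_archive, 2), round(monthly_b2, 2), round(full_restore, 2))
```

The archival tier wins a dollar or so per month, but one full restore of the archive can cost more than several months of either plan - and for a disaster-recovery backup, the full restore is exactly the scenario you're paying for.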
Having an off-site backup is just as important as an on-site one. Just as remote hard drives can die, any number of things could render my NAS inoperable: flooding, fire, building collapse, and more!
why backups are important
Backups are important to me for two primary reasons.
- Society as a whole is built on the knowledge of the past. Maintaining archival backups of old code, old documentation, old data, etc. can provide knowledge that helps you in the future. Looking back on old data is fun and brings back memories, and looking at old code reminds me why I didn't use x or y for whatever.
- Recovery from hardware failure. This one is obvious, and it's why most people make backups.
There are endless reasons why backups might be important to somebody, but those are my driving points.
The best data recovery is having a backup. There's no guarantee that you'll be able to recover anything from a dead drive.
Even if you don't think something is important now, in 10 years a family member might die and you'll really wish you still had that picture of the two of you together.