2013-03-09

DISASTER RECOVERY








On the need to do backups

A view from a philosophical and from a practical perspective.




THE CASE

Recently I survived a rather nasty case of data loss, or TOTAL SYSTEMIC FAILURE as it looked at the time. At this moment the data is mostly restored and new preliminary backups have just been made. No data was really lost, only time and my health. And as the occasion arose, I'd like to share some observations on the subject of backups.

I can't be 100% sure what actually happened.
It seems to be a chain of unrelated events that just coincided in time.

FIRST, one of my systems stopped booting (I know I caused the mess myself). I decided to restore the system from the install disc. But the stupid installer decided that some of my disks suddenly required a chkdsk. After that, lots of files (not yet backed up) just disappeared from one partition. The reinstall also failed in the end and I had to fall back to a backup image of the system partition.
The files were meant to be backed up later; they were just a loose collection of stuff to be sorted and then archived (old mails, some mp3 lectures, some photos, some installers). After a painstaking NTFS undelete I ran another chkdsk, only to discover that all the files were back again.
Annoying waste of time, but no harm done.

THEN one impatient hard reset too many did some more nasty things to my other partitions. (I guess that was the reason, not a virus and not an HD failure. Hm... Well, my PC sometimes fails to start and turns itself off just after power-on; that may have been the reason too, however remotely.)
First I discovered that one program I had written (tested and working OK) started to complain that some weird entry point didn't exist in kernel32.dll.
The binary was just several bits different from what it should be (why no CRC check failed, I do not know).
I checked it by running diff on hexdumps of one apparently bad copy and one good copy of the exe.
So I tried to recompile, but then I realized my BDS compiler had stopped working (one dll got broken in the same way). Fortunately a reinstall fixed it.
But then I found out that the sources didn't compile anymore!!! Damn, was I furious!
Some random bits were also changed in one or two files: 'd' changed to 'e' or so. Text files don't have checksums.
But an HDD should somehow stay consistent nonetheless, right? I never imagined this could REALLY happen. But, well... I was WRONG!
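
For illustration only, the hexdump comparison mentioned above can be approximated in a few lines of Python. This is a minimal sketch and not the tool actually used here; the function name byte_differences and its limit parameter are made up for the example.

```python
from pathlib import Path


def byte_differences(good_path, bad_path, limit=20):
    """List up to `limit` differing positions as (offset, good, bad, flipped_bits)."""
    good = Path(good_path).read_bytes()
    bad = Path(bad_path).read_bytes()
    found = []
    # zip() stops at the shorter file, so a length mismatch must be checked separately.
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            flipped_bits = bin(g ^ b).count("1")  # XOR marks the bits that changed
            found.append((offset, g, b, flipped_bits))
            if len(found) >= limit:
                break
    return found
```

A 'd' (0x64) that turned into an 'e' (0x65) would show up as a single flipped bit, which is the kind of damage described above.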

So I spent the last 6 days painfully checking all my projects against the latest backups, doing diffs and restoring what seemed corrupted.
Older projects, untouched for a long time, had good and consistent (old) backups. Also, they were not affected. Actually, most projects were clean.
But some recent ones had corrupted files. And some of those had no good backups yet, as they had just been heavily changed.

A lot of work, no profit. Not to mention system restoration and extensive AV checks.


FIRST OF ALL

Do backups. I really mean it. Do them. Do them often. And do them smart.
Do backups. I always repeat that to everyone and on every occasion. But I ALMOST failed in that practice myself.
If you do (create) anything worth doing - it is also worth backing up.
If what you do is not worth a backup, it is probably not worth doing. You'd be better off going for a walk.
One can say: what you do is perhaps less important than the way in which you back it up.

And by smart I mean:
1) implement a backup policy that works for you, e.g. one simple enough that you won't be too lazy to follow it,
2) automate backup procedures if you can,
3) check if backups are restorable,
4) store backups safely (safe place, but also on multiple media - external offline drive + DVD + more),
5) consider using remote backup 'in the cloud',
6) encrypt what is sensitive,
7) be paranoid (despite what people may think, it can be useful),
8) learn from your (or, preferably, others') mistakes and adapt.
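
Points 2) and 3) can be partly covered with a checksum manifest: record a hash of every file when the backup is made, then recompute the hashes later on the restored copy. Below is a minimal Python sketch under that assumption. The helper names sha256_of, build_manifest and verify are hypothetical, and where the manifest is stored is left open.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the files from the manifest that are missing or changed under root."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

An empty list from verify means every file recorded in the manifest is still present and unchanged. It also gives text files the checksum they otherwise lack.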


ADMIN VS PROGRAMMER

I think it is highly beneficial for admins to be able to think like a programmer. And for a programmer to think like an admin. At least from time to time. Especially the latter. Take the time to understand the need to prepare for disasters. Take steps to prepare. And - again - do backups, or at least make sure they are done by the people responsible. But don't blindly assume they will do their job. Better take care of your data yourself too. Of course you may be too limited to do that in your particular work environment. I write this from a freelancer's point of view. But ask yourself this: what good is it to commit to your SVN every 30 seconds if there is no backup of that SVN?


TWO CORPORATE STORIES

Story 1:

There was allegedly an admin at one Polish telecom who had a saying: "Real men don't do backups". I don't know if the story is true, but it is said that he didn't work there for long. (*)

Story 2:

My old workplace (10 years ago). There was always one admin assigned to backups (or the backups were the responsibility of one). And that one was always the most likely to be fired (**). Of course we didn't know that back then. But now I can see it clearly. We did one or maybe two incremental tape backups every day (VSS, Exchange, other stuff, I don't really remember, but it's not important), plus one extra full backup every week. The tapes held something like 100 GB back then, which was a lot. But they were slow, and they were faulty.
When disaster happened - and it happened sometimes - and it very soon became obvious that the tape was unreadable or that yesterday's backup hadn't really run... there was always a mess.

(*) it wasn't me
(**) I was the third in the row


THE METHOD - FREELANCE VS ORGANISATION MAY DIFFER

Of course: Implement the method that is best for you.

I personally don't use SVN for my own projects, that is, when there is no team collaboration (most of my projects). I prefer the fast and simple (for me) method of just zipping the project's folder and naming the zip with the date, exe version and a number. Until now, each project had a 'zipstorage' subfolder in its root for such backups.
Then, periodically, I would copy the latest zips to an offline USB drive (sometimes encrypted) or a pendrive (in that case ENCRYPTED). When? When I felt like it (THAT WAS A MISTAKE). For example after a major milestone or when I thought it was time for a 'backup day'. In effect each project would have multiple backups on multiple destination media. But in no consistent fashion (THAT WAS A MISTAKE). And a lot of zips in 'zipstorage'. Another 'backup day' arrives when I make a 'projects snapshot' DVD: a current snapshot of everything is burnt, usually unzipped. The 'big backup day' is when I copy some selected (newest & most important) zips to multiple destinations at one time. I even had it scripted once - one click would make nicely named zips via USB, but it didn't work (THAT IS TO BE CORRECTED). I'm probably still going to do most of my backups manually. The only difference: I'll probably use some remote storage (after encryption).
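
The zip-and-name step described above is easy to script. Here is a minimal Python sketch of the idea, not the script mentioned above (which is not shown in this post). The function name snapshot, the exact file name pattern and the today parameter are assumptions made for the example.

```python
import datetime
import zipfile
from pathlib import Path


def snapshot(project_dir, storage_dir, version, today=None):
    """Zip project_dir into storage_dir as <name>_<date>_v<version>_<NN>.zip."""
    project_dir = Path(project_dir)
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    today = today or datetime.date.today()
    stem = f"{project_dir.name}_{today:%Y-%m-%d}_v{version}"
    number = 1  # first free sequence number for this date and version
    while (storage_dir / f"{stem}_{number:02d}.zip").exists():
        number += 1
    target = storage_dir / f"{stem}_{number:02d}.zip"
    storage_root = storage_dir.resolve()
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.rglob("*")):
            # Skip anything inside the storage folder, so old zips are not re-zipped.
            if path.is_file() and storage_root not in path.resolve().parents:
                archive.write(path, path.relative_to(project_dir).as_posix())
    with zipfile.ZipFile(target) as archive:
        # testzip() returns the first member with a bad CRC, or None if all are fine.
        if archive.testzip() is not None:
            raise RuntimeError(f"corrupt archive: {target}")
    return target
```

Because storage_dir is a parameter, the same function works with a per-project 'zipstorage' subfolder or with one common storage folder outside the projects.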

Advantages of my method:
- easy,
- works against entropy - any file can be destroyed, but probably not all of them, so the more copies the better,
- full control - what, when, where,
- full backup (all/whole files are duplicated) - this can be a disadvantage too for big, seldom changing files.

Disadvantages:
- done manually,
- need to remember to do the backup (a zip may soon become too old to restore from),
- difficult to restore after a major failure,
- I now know that one common 'zipstorage' is better - it is easier (faster) to back up multiple project snapshots w/o zips in them.


A WORD ABOUT THE TOOLS

I really bless the author or authors of WinDiff. I'd be lost without it. OR I'd have had to invent one myself. That was the miracle tool I used to check my projects (one by one) against zipped-then-unzipped backups. Also, plain chkdsk appears really effective in fixing NTFS partitions. Other cool software I'd like to mention here: 7-Zip, TrueCrypt, DrvImageXP. The last one makes backups of whole partitions. It helped me to avoid reinstalling the whole system from scratch.
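
For those without WinDiff at hand, the same kind of check (working folder against an unzipped backup) can be sketched in Python with the standard filecmp module. This is only an illustration, not a description of how WinDiff works; the names relative_files and compare_trees are hypothetical.

```python
import filecmp
import os


def relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for folder, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def compare_trees(work_dir, backup_dir):
    """Report files that differ or exist on one side only."""
    work = relative_files(work_dir)
    backup = relative_files(backup_dir)
    differ = sorted(
        rel
        for rel in work & backup
        # shallow=False compares file contents, not just size and timestamp.
        if not filecmp.cmp(
            os.path.join(work_dir, rel), os.path.join(backup_dir, rel), shallow=False
        )
    )
    return {
        "differ": differ,
        "only_in_work": sorted(work - backup),
        "only_in_backup": sorted(backup - work),
    }
```

The shallow=False part matters for the kind of damage described in this post: a file with a few flipped bits keeps its size and can keep its timestamp, so a comparison based on file metadata alone would call it identical.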


SUMMARY

It is simple. Tools may vary, methods may vary, policies may vary.
Don't learn it the hard way. SAVE YOUR ASS, DO BACKUPS!


POST SCRIPTUM

This may have been a virus after all. But again I can't be 100% sure.
AVAST found one (Win32:Malware-gen) in some MSI installer in \%system%\Installers.
It came with Spectaculator (I paid some good money for that).
But it can still be a false positive. I will try to verify...

I still don't feel sure if all is really OK with my system.
Some weird behavior I saw (that never happened before) could indicate a virus:
- very slow compilation,
- clipboard not working with screen capture images in some programs,
- DNS suddenly not resolving popular addresses,
- http traffic lost while other protocols worked fine.
Maybe there was a virus and it just lay dormant until now?

One thing is sure: my backups saved me. But I should do them more often.

