Backup strategy for a web server

Hi

I’m setting up a web server and need a strategy to back it up. I want to be able to bring a backup service online if the primary service fails, without having to do much thinking. Here’s what I think might be a strategy. The questions are: will it work, and what have I overlooked?

I thought I would duplicate the partitions from the primary server onto a backup server.

Dynamic content: There are several web servers, one for each of several domain names. The document roots for these are on a /home partition at /home/username/public_html where there’s a separate “username/public_html” for each domain name. I propose to use rsync to update /home periodically over the network, mapping to the same locations on the backup server. Will rsync do the job over a Samba network, transferring only files that have changed?
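
Roughly what I have in mind for the periodic sync is something like this (a sketch only – /mnt/backupserver is just a placeholder for however the backup server’s disk ends up being reached):

    # mirror the document roots onto the backup server, copying only
    # files that have changed and removing files deleted on the primary
    rsync -av --delete /home/ /mnt/backupserver/home/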

Static content: all the directories other than /home are static (except I suppose /tmp). These I would update to the backup server only occasionally rather than periodically; e.g. after working on the root filesystem of the server. Is there any reason why rsync wouldn’t work for that too?
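
For the static directories I imagine something similar, run by hand after I’ve worked on the root filesystem (again a sketch with a placeholder path, and the exclude list would need checking):

    # occasional sync of everything except /home and the volatile bits
    rsync -av --delete \
        --exclude=/home --exclude=/tmp --exclude=/proc \
        --exclude=/sys --exclude=/dev --exclude=/run \
        / /mnt/backupserver/root/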

Thanks for your advice
Swerdna

bump – any comments?

I was going to comment, but then you mentioned Samba, and I wondered whether Windows filesystems came into it, with all the attendant file permission and timestamp details, so I pushed it into the too-hard basket.

If the primary and backup servers are Linux, there should be no problems at all keeping them in sync using rsync over ssh.
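
Something along these lines would do it, run from the primary (the hostname is hypothetical; treat it as a sketch to adapt rather than a recipe):

    # push /home to the backup box over ssh, deleting files that
    # have been removed on the primary
    rsync -az --delete -e ssh /home/ root@backup.example.com:/home/

and a cron entry on the primary takes care of the “periodically” part, e.g. every half hour:

    */30 * * * * rsync -az --delete -e ssh /home/ root@backup.example.com:/home/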

OK, I’ll use that, thanks.
(don’t know why I specified Samba – unclear thinking).

Err, I’m nervous about doing this, because I’m in danger of teaching my grandmother to suck eggs, but you asked for comments and you can always ignore them…

Is there any reason why rsync wouldn’t work for that too?

Mmmm, rsync. Or Unison. Unison. Or Rsync.

From what I’ve read (and this may be the irrelevant raving of a disturbed mind), there are some circumstances in which Unison is more efficient than rsync because it’s cleverer about transmitting just the diffs. If this makes a critical difference to you, I’d suggest you check it out, because my memory on this can’t be relied on.
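
For what it’s worth, a one-shot Unison run looks something like this (hostname hypothetical, and do check the manual, since as I say my memory can’t be relied on):

    # synchronise local /home with the copy on the backup box, no prompting
    unison /home ssh://backup.example.com//home -batch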

…mapping to the same locations on the backup server…

Rather than /home/server1/somedata and /home/server2/somedata (etc., etc.)? Have I misunderstood? My assumption was that you were backing up several web servers to one backup server, and that if you don’t separate the data sets from the different web servers, there will be problems. Or are you saying that the data is primarily duplicated? (And if you are saying that, I’m going to worry, but not necessarily about the right thing.)

OK, and now I’m going to get controversial. This isn’t the real deal; it doesn’t do the full job. You’ve got an active copy of the data and a near-line copy, which is good, but IMHO it’s not enough. I would want to be writing out copies of that data to some removable media (maybe not on every occasion that you grab a new near-line backup), and that raises the issue of timing. I’m guessing that you intend to write to near-line storage in an off-peak period, if you have one. The inference then is that you would be writing from near-line to removable media in an on-peak period. (Which is convenient for changing media, but may be inconvenient for network utilisation.)
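
By way of illustration, the removable-media step from the near-line copy needn’t be anything fancier than a dated tarball written on the backup box (the device path is just a placeholder):

    # run on the backup server, writing to an external disk mounted at /media/usbdisk
    tar -czf /media/usbdisk/home-$(date +%Y%m%d).tar.gz /home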

Does that make sense? If it does, just check data rates and see whether everything is likely to get done in the time windows you have available. Depending on how the system is architected, you may find yourself thinking about second network cards so that the backup traffic doesn’t overload the network while normal traffic is also going on (assuming there isn’t a dead period in which this can happen - that would certainly make things easier, but I assume you are nominally in a 24x7 world).
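
(To put rough numbers on it, purely for illustration: if, say, 50 GB changes per day and the link runs at 100 Mbit/s, that’s about 400 gigabits at 0.1 Gbit/s, i.e. roughly 4,000 seconds, a bit over an hour before protocol overhead. A half-hour window wouldn’t cut it; an overnight one would be fine.)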

Err, I’m nervous about doing this, because I’m in danger of teaching my grandmother to suck eggs, but you asked for comments and you can always ignore them…
Thanks for your reply. One thing is certain in Linux: there are always several ways to suck the egg.

Unison: I wasn’t aware of it before you pointed to it. I see that it’s in the openSUSE repos and it looks impressive. I’ll look at it carefully.

You said:

OK, and now I’m going to get controversial. This isn’t the real deal; it doesn’t do the full job. You’ve got an active copy of the data and a near-line copy, which is good, but IMHO it’s not enough.
Losing some recently-added data if the store burns down isn’t a critical issue in this case. But of course, you’re right and an off-site copy is easy enough to make from time to time, so I will, and losses would be acceptable.

Losing some recently-added data if the store burns down isn’t a critical issue in this case. But of course, you’re right and an off-site copy is easy enough to make from time to time, so I will, and losses would be acceptable.

Well, this is all dependent upon context, so you are probably right.

The big advantage of having a near-line copy is that you can copy that copy to removable media without beating the performance of the main system to death; as the backup media are (usually) slow, this can be a big advantage.

The other thing is that losing data through what I’ll call an ‘idiot-user’ problem (including the ‘completely non-idiotic, but distracted, sys admin’ problem) is probably several times more common than the straightforward ‘woke up this morning, my hard disk had died… oooh, oooh, I’ve got those sys admin blues’ problem. This is something that the idiots/fools/candidates for promotion to management who say ‘we don’t need a backup, we’ve got RAID’ completely overlook; even with near-line, once you’ve deleted a file from your main data, you can lose it from your near-line copy fairly quickly.
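
If you stay with rsync, one way to soften that is to keep dated snapshots on the backup box instead of a single mirror; unchanged files are hard-linked against the previous snapshot, so it costs little extra space, and yesterday’s copy survives today’s deletion. A sketch only, with hypothetical paths and hostname:

    # run on the backup server: pull today's copy, hard-linking unchanged
    # files against the most recent snapshot
    today=$(date +%Y%m%d)
    rsync -az --delete --link-dest=/backup/home-latest \
        root@www.example.com:/home/ /backup/home-$today/
    ln -sfn /backup/home-$today /backup/home-latest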

Points understood. Fortunately I’ve got as much time as I need to get this right (in my own mind) before going live.