You are afraid of losing hard disk data, and you would like to make regular incremental backups of your data. You are also reluctant to spend time learning about and configuring the machines involved in doing proper backups. When disaster does strike (touch wood), you want to access the backup data in its 'ordinary' form, without having to install extra heavyweight software to extract it. What can you do? Well, use rsync, which is packaged in almost every GNU/Linux distribution. Ports also exist for other UNIX systems and Microsoft Windows.
There are many rsync tutorials online, but many of them are out of date. Hopefully this quick guide helps you start using rsync in a decent way.
Rsync behaves like cp or scp, but it uses an efficient algorithm to update the destination files. It computes the differences between the two sets of files and applies only the essential 'patches' to the destination files. This saves network bandwidth and makes rsync a good choice for incremental backups.
To copy files from a local directory src to dest while skipping files that are already up to date in dest, run:
$ rsync -av src dest
This recursively copies all necessary files from src to dest (option -a) while preserving symlinks, owners, groups, permissions, devices, and modification times. It also shows a list of the files transferred and a brief summary (option -v).
An important note about a trailing slash on the src directory:
Without a trailing slash (e.g., src): Copy the directory by name. A directory dest/src will be created.
With a trailing slash (e.g., src/): Copy the contents of the directory into dest.
Whether to include the slash depends entirely on how you want to structure your destination directory.
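The following sketch shows the difference in a throwaway directory; rsync-demo2, dest_noslash and dest_slash are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory for comparing "src" and "src/".
rm -rf rsync-demo2
mkdir -p rsync-demo2/src
echo "hello" > rsync-demo2/src/file.txt

# No trailing slash: the directory itself is copied,
# giving rsync-demo2/dest_noslash/src/file.txt
rsync -a rsync-demo2/src rsync-demo2/dest_noslash

# Trailing slash: only the contents are copied,
# giving rsync-demo2/dest_slash/file.txt
rsync -a rsync-demo2/src/ rsync-demo2/dest_slash

# Show the two resulting layouts.
find rsync-demo2/dest_noslash rsync-demo2/dest_slash -type f
```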
The following options are commonly used:
-c. This makes rsync compare file checksums (MD4 in older rsync versions, other algorithms in newer ones) instead of modification times to decide whether a file needs to be transferred. (File sizes are always compared, regardless of this option.) Without this option, more files than necessary may be transferred if you rsync between partition types that track modification times differently (e.g., an ext3 partition and a vfat partition), or between two servers whose clocks are not synchronized (when you perform a remote backup as described next). This option should solve the problem.
--modify-window=NUM. This makes rsync compare modification times with a tolerance of NUM seconds. It is useful in situations such as transfers between an ext3 partition (1-second time resolution) and a vfat partition (2-second time resolution), where --modify-window=1 avoids needless transfers. The comparison is probably less accurate than a checksum comparison, but it uses less processing time.
--delete. This deletes files under dest that don't exist in src, so that dest exactly mirrors the contents of src.
--filter=PATTERN, --exclude=PATTERN, --exclude-from=FILE, --include=PATTERN, --include-from=FILE. These let you add sophisticated rules to select which files to transfer (include) and which files to skip (exclude). You can combine several of them. Refer to the rsync manual (man rsync) for further details.
--delete-excluded. This tells rsync to also delete excluded files under dest.
-n (--dry-run). This performs a trial run without making any changes.
For remote file transfer, simply specify src or dest in the form [user@]host:dir. This is very similar to using scp. Yes, it's that simple! (Note: the source and the destination cannot both be remote machines.)
You may consider using the following option:
-z. This makes rsync compress file data as it is sent. The option helps reduce the total transfer time, especially for remote backups over a slow network connection.
Normally, remote transfers work fine with a typical GNU/Linux system setup. In case of errors, check the following issues:
… (Use -e to specify an alternative remote shell.)
A bit of my personal experience in studying rsync (safe to skip if you like): I found that some online tutorials introduce the rsync daemon when they talk about remote copying, and some don't mention the remote shell at all. After checking the OLDNEWS file in the rsync source, the reason seems to be that rsync did not support remote shell connections in its early versions. In particular, rsync only switched to the more secure ssh as its default remote shell in the version released in January 2004, which seems rather late. So some tutorials are probably out of date and can be misleading.
There are pros and cons to using the rsync daemon for remote connections. You can predefine named modules for convenient transfers, but you also need to configure the remote server properly before you can use it. I don't use it personally, as I find the remote shell method sufficient for most uses. Furthermore, I sometimes don't have enough privileges to set up the configuration on the remote server. Anyway, the required setup steps are outlined here; please refer to the manual or other tutorials for details.
… Edit /etc/rsyncd.conf for the essential configuration and define at least one module.
Rsync is very powerful and can be used in many different ways (I counted about 100 options). People usually combine rsync with cron jobs and shell scripts for effective and efficient incremental backups. For example, by using the option --link-dest, which makes rsync create hard links to unchanged files instead of copying them again, together with a script that properly organizes some directories in the destination, you generally need only a little extra space to keep snapshots of the source files taken at different times. To learn more, read the rsync manual and find out how others make use of it.
I have written a script that uses rsync to back up a Drupal-powered website.
Added a link to my script for website backup.
Added and improved the descriptions of the --modify-window and -c options.