Friday 24th May 2013 

About


This tutorial explains how to create a fully functional data synchronisation script using the Perl scripting language.

Such a script can be used to back up large amounts of data from a primary data store to a removable drive in a very short time. The speed comes from the script processing only deltas, i.e. files that have been modified since the last backup. On my machine a 50GB data set is processed in one minute, although this obviously depends on the size of the deltas each time.

Prerequisites


ActivePerl
Version used: 5.8.8.820 (Win32)
Download file: ActivePerl-5.8.8.820-MSWin32-x86-274739.msi
Description: ActivePerl is a multi-platform Perl distribution used to interpret and run Perl scripts.

Notepad++
Version used: 4.0.2 (Win32)
Download file: npp.4.0.2.Installer.exe
Description: Notepad++ is a free source code editor which supports a multitude of programming languages and contains a myriad of special features.

Backup drive
Your backup drive can take the form of any removable drive. The script will also work if backing up to a different location on the same drive or a separate disk partition. However, doing real backups to the same drive is definitely not recommended!
Note: this script will not work when backing up to CD, DVD, or any other write-once media.

Main script body


Include directives


The code begins with a set of Perl library include directives, namely:
use strict; - Disallows undeclared variables, ambiguous bareword identifiers, and symbolic references.
use File::Glob ':glob'; - Overrides the core 'glob' function with the 'File::Glob' implementation. We must use this because the core 'glob' treats whitespace as a pattern separator, so a path containing spaces would be split into several patterns.
use File::Copy; - Includes the 'copy' and 'move' functions. Note that 'move' is not used since it will not move directories.
use File::Basename; - Includes the 'fileparse' function, which takes a file path and splits it into directory path, file name and file extension.
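
As a rough illustration of why the glob behaviour matters, the sketch below lists a directory whose name contains a space and then splits the result with 'fileparse'. It is not the tutorial's own listing: the scratch directory and file names are hypothetical, and it calls 'bsd_glob' directly rather than relying on the ':glob' override, since 'bsd_glob' is available in the ActivePerl 5.8.8 used here as well as in newer Perl releases, where the ':glob' tag is discouraged.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # bsd_glob() treats the whole string as one pattern
use File::Basename;            # fileparse()
use File::Temp qw(tempdir);

# Hypothetical scratch directory whose name contains a space.
my $root = tempdir(CLEANUP => 1);
my $dir  = "$root/my documents";
mkdir($dir) or die "Cannot create '$dir': $!\n";
open(my $fh, '>', "$dir/notes.txt") or die "Cannot create file: $!\n";
close($fh);

# The space does not split the pattern, so the file is found.
my @found = bsd_glob("$dir/*");
print scalar(@found), " file(s) found\n";

# fileparse() returns the name, the directory part and the matched suffix.
my ($name, $path, $suffix) = fileparse($found[0], qr/\.[^.]*/);
print "name=$name suffix=$suffix\n";
```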

Define primary data stores


Next we define the primary data stores to be backed up. These must be absolute paths that include the drive letter, e.g. "E:/primaryDataStore1".
You may specify as many primary data stores as you wish.
Here three stores are specified:
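
A minimal sketch of such a definition follows. The array name @primaryStoreDirs is the one used later in this tutorial; the three paths are hypothetical placeholders and should be replaced with your own.

```perl
use strict;
use warnings;

# Hypothetical store locations: absolute paths including the drive letter.
my @primaryStoreDirs = (
    "E:/primaryDataStore1",
    "E:/primaryDataStore2",
    "F:/My Documents/primaryDataStore3",   # spaces in a path are allowed
);

print scalar(@primaryStoreDirs), " primary stores defined\n";
```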



Define backup data store


Now we define the location of the backup drive. This can be an absolute path, or a path relative to the script's location. It is recommended to place the script in the root level of your backup drive and define the backup location simply as ".". This is shown below.
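
A sketch of that definition, assuming a variable named $backupDir (the name is an assumption, not taken from the original script). Note that "." resolves against the current working directory, so this only points at the backup drive if the script is launched from its own folder.

```perl
use strict;
use warnings;
use File::Spec;

# Assumed variable name; "." is the directory the script is run from.
my $backupDir = ".";

# Show the absolute location that "." currently resolves to.
print "Backing up to: ", File::Spec->rel2abs($backupDir), "\n";
```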



Create output action log file


We use a log file to record all actions executed by the script. The log file name is timestamped with the current date and time, so a new log file is created for each run.
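
One way to build such a name is with 'strftime' from the core POSIX module. The "backup_" prefix, the ".log" extension and the timestamp layout below are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Timestamp such as 20130524_183005 (year, month, day, hour, minute, second).
my $timestamp = strftime("%Y%m%d_%H%M%S", localtime);

# Assumed naming scheme for the log file.
my $logFile = "backup_$timestamp.log";
print "$logFile\n";
```

A layout that starts with the year keeps the log files in chronological order when they are sorted by name.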



We open the log file and write a message to it, as well as a message to stdout to let the user know the script is executing.
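
This step might be sketched as follows. The fixed file name and the message text are hypothetical; in the real script the name would be the timestamped one described above.

```perl
use strict;
use warnings;

# Hypothetical fixed name, used here only to keep the sketch short.
my $logFile = "backup_example.log";

# Open the log for writing and stop with a clear message if that fails.
open(my $log, '>', $logFile) or die "Cannot open log file '$logFile': $!\n";

# Write the same start message to the log and to stdout.
my $startMsg = "Backup started: " . localtime() . "\n";
print $log $startMsg;
print $startMsg;

close($log);
```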



Validate primary store and backup store definitions


We now check that we have valid primary store and backup path definitions, and that the backup location exists.
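
The checks can be sketched as a small helper that returns an error message, or an empty string when everything is in order. The helper name ValidateConfig, the variable $backupDir and the store path are assumptions made for this example.

```perl
use strict;
use warnings;

my @primaryStoreDirs = ("E:/primaryDataStore1");   # hypothetical store
my $backupDir        = ".";                        # assumed variable name

# Hypothetical helper: returns "" if the configuration is usable.
sub ValidateConfig {
    my ($stores, $backup) = @_;
    return "No primary stores defined"  unless @$stores;
    return "No backup location defined" unless defined $backup && length $backup;
    return "Backup location '$backup' does not exist" unless -d $backup;
    return "";
}

my $error = ValidateConfig(\@primaryStoreDirs, $backupDir);
die "$error\n" if $error;
print "Configuration looks valid\n";
```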



Traverse each primary store directory


For each primary store directory defined in @primaryStoreDirs we check if that directory exists. Non-existent directories are skipped. If it does exist, then two recursive functions are called: 'ReplicateDirTree' and 'MoveRedundantBackupFiles'. These are discussed in detail in a later section.
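
The loop can be outlined as below. The two subroutines are replaced by stubs that only print what they would do, and both the argument order and the way the backup and "deleted" folders are derived from the store name are assumptions, not the original script's layout.

```perl
use strict;
use warnings;
use File::Basename;

my @primaryStoreDirs = ("E:/primaryDataStore1", "E:/primaryDataStore2");  # hypothetical
my $backupDir        = ".";                                               # assumed name

# Stubs standing in for the real subroutines described later.
sub ReplicateDirTree {
    my ($fromDir, $toDir) = @_;
    print "Would replicate $fromDir to $toDir\n";
}
sub MoveRedundantBackupFiles {
    my ($primDir, $backDir, $primStoreDeletedDir) = @_;
    print "Would move redundant files from $backDir to $primStoreDeletedDir\n";
}

my ($processed, $skipped) = (0, 0);
foreach my $primDir (@primaryStoreDirs) {
    # Non-existent stores are skipped rather than treated as fatal.
    unless (-d $primDir) {
        print "Skipping missing store: $primDir\n";
        $skipped++;
        next;
    }
    # Assumed layout: one folder per store, named after the store itself.
    my $backDir = "$backupDir/" . basename($primDir);
    ReplicateDirTree($primDir, $backDir);
    MoveRedundantBackupFiles($primDir, $backDir, "$backDir.deleted");
    $processed++;
}
print "$processed store(s) processed, $skipped skipped\n";
```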



End of script


The script has now finished successfully, so we close any open file handles and write the end time to the log file.



Subroutines


This section describes the subroutines called from the main script body.

cexit


A trivial subroutine that closes any open file handles before calling the standard 'exit' function.
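
A sketch of such a subroutine, assuming the log file handle lives in a package variable called $log (an assumed name) and that the exit code is passed as an optional argument:

```perl
use strict;
use warnings;

our $log;   # assumed: the log file handle opened by the main script

sub cexit {
    my ($code) = @_;
    # Close the log only if it was actually opened and is still open.
    close($log) if defined $log && defined fileno($log);
    exit(defined $code ? $code : 0);
}

# Typical use: cexit(1) after a fatal error, cexit(0) at the normal end.
```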



ReplicateDirTree


This function replicates a directory structure from one location to another. It traverses each file in each directory and checks whether that file exists in the target location. The file comparison is achieved by checking the file size and file modification timestamp.

In this subroutine, after obtaining the input parameters, we check if our 'fromDir' exists, and if not, we exit with an error. If 'toDir' does not exist, then we create it.



Next we traverse all files in 'fromDir'. For each file we check if it is a directory, and if so then we make a recursive call to 'ReplicateDirTree' so as to act on that subdirectory. If the file is not a directory then we perform the file check mentioned above on the mirrored copy in 'toDir'. If the check fails or the file does not exist in 'toDir' we copy it there from 'fromDir'.
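
The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original code: it uses 'die' where the script exits with an error, it returns the number of files copied, it folds the size and timestamp check into a small helper (SameSizeAndMtime, a hypothetical name), and it calls 'utime' after each copy so that the timestamps match on platforms where 'copy' does not preserve them.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Copy;
use File::Basename;
use File::Temp qw(tempdir);

# Hypothetical helper: true if both files have the same size and mtime.
sub SameSizeAndMtime {
    my ($fileA, $fileB) = @_;
    my @statA = stat($fileA);
    my @statB = stat($fileB);
    return 0 unless @statA && @statB;
    return ($statA[7] == $statB[7] && $statA[9] == $statB[9]) ? 1 : 0;
}

sub ReplicateDirTree {
    my ($fromDir, $toDir) = @_;
    die "Source directory '$fromDir' does not exist\n" unless -d $fromDir;
    unless (-d $toDir) {
        mkdir($toDir) or die "Cannot create '$toDir': $!\n";
    }
    my $copied = 0;
    # Note: "*" does not match names that begin with a dot.
    foreach my $path (bsd_glob("$fromDir/*")) {
        my $target = "$toDir/" . basename($path);
        if (-d $path) {
            $copied += ReplicateDirTree($path, $target);   # recurse into subdirectory
        }
        elsif (!-e $target || !SameSizeAndMtime($path, $target)) {
            copy($path, $target) or die "Cannot copy '$path': $!\n";
            my @st = stat($path);
            utime($st[8], $st[9], $target);   # carry the timestamps over
            $copied++;
        }
    }
    return $copied;
}

# Demonstration on scratch directories.
my $src = tempdir(CLEANUP => 1);
my $dst = tempdir(CLEANUP => 1) . "/mirror";
mkdir("$src/sub dir") or die "Cannot create directory: $!\n";
open(my $fh, '>', "$src/sub dir/a.txt") or die "Cannot create file: $!\n";
print $fh "hello\n";
close($fh);

my $first  = ReplicateDirTree($src, $dst);   # copies the new file
my $second = ReplicateDirTree($src, $dst);   # nothing has changed
print "first run copied $first, second run copied $second\n";
```

The second run copies nothing because the mirrored file already has the same size and modification time, which is what keeps repeat backups fast.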



MoveRedundantBackupFiles


This function traverses the directory 'backDir' to search for files and directories that have been deleted from 'primDir' and should therefore no longer exist in the backup image. If found, these files are moved from 'backDir' to 'primStoreDeletedDir'. Note that no files are actually deleted; deletion is left to the user once they have inspected the contents of 'primStoreDeletedDir'.
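
A possible shape for this function is sketched below. The parameter names come from the description above; everything else is an assumption. In particular, the built-in 'rename' stands in for the script's own move routine and only works when source and target are on the same volume, and the sketch returns a count of moved entries.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Basename;
use File::Path;                 # mkpath()
use File::Temp qw(tempdir);

sub MoveRedundantBackupFiles {
    my ($primDir, $backDir, $primStoreDeletedDir) = @_;
    my $moved = 0;
    foreach my $path (bsd_glob("$backDir/*")) {
        my $name     = basename($path);
        my $primPath = "$primDir/$name";
        if (!-e $primPath) {
            # Gone from the primary store: move it aside, never delete it.
            mkpath($primStoreDeletedDir) unless -d $primStoreDeletedDir;
            rename($path, "$primStoreDeletedDir/$name")
                or die "Cannot move '$path': $!\n";
            $moved++;
        }
        elsif (-d $path && -d $primPath) {
            # Still present in both places: check its contents recursively.
            $moved += MoveRedundantBackupFiles($primPath, $path,
                                               "$primStoreDeletedDir/$name");
        }
    }
    return $moved;
}

# Demonstration: "gone" and "keep/old.txt" exist only in the backup.
my $root = tempdir(CLEANUP => 1);
mkpath(["$root/prim/keep", "$root/back/keep", "$root/back/gone"]);
foreach my $file ("$root/prim/keep/a.txt", "$root/back/keep/a.txt",
                  "$root/back/keep/old.txt") {
    open(my $fh, '>', $file) or die "Cannot create '$file': $!\n";
    close($fh);
}

my $moved = MoveRedundantBackupFiles("$root/prim", "$root/back", "$root/deleted");
print "$moved item(s) moved to the deleted folder\n";
```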



CompareFiles


This subroutine uses the 'stat' function to obtain file information. It compares two files by size and modification timestamp, and returns one if they match, and zero otherwise.
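
A sketch of such a comparison is shown below. In the list returned by 'stat', element 7 is the size in bytes and element 9 is the modification time. Treating a missing file as a mismatch is an assumption of this sketch, and the scratch files are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub CompareFiles {
    my ($fileA, $fileB) = @_;
    my @statA = stat($fileA);
    my @statB = stat($fileB);
    # stat returns an empty list if the file cannot be read.
    return 0 unless @statA && @statB;
    return ($statA[7] == $statB[7] && $statA[9] == $statB[9]) ? 1 : 0;
}

# Demonstration with two scratch files of identical content.
my $dir = tempdir(CLEANUP => 1);
foreach my $file ("$dir/a.txt", "$dir/b.txt") {
    open(my $fh, '>', $file) or die "Cannot create '$file': $!\n";
    print $fh "same\n";
    close($fh);
}

my $now = time;
utime($now, $now, "$dir/a.txt", "$dir/b.txt");       # identical timestamps
my $match = CompareFiles("$dir/a.txt", "$dir/b.txt");

utime($now, $now - 60, "$dir/b.txt");                # b.txt now looks older
my $mismatch = CompareFiles("$dir/a.txt", "$dir/b.txt");

print "match=$match mismatch=$mismatch\n";
```

Because the file contents are never read, the check stays cheap even for very large files, at the cost of missing a change that alters neither size nor timestamp.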



WinMove


This (Windows only) function moves files or whole folders from one location to another. Overwrite can be specified as a third argument. The function returns one on success and zero otherwise. It is superior to the 'move' function in File::Copy, which will not move entire directories.



Script


Download (ZIP file - 2.35KB).