This tutorial explains how to create a fully functional data synchronisation script using the Perl scripting language.
Such a script can be used to back up large amounts of data from a primary data store to a removable drive in a very short amount of time. The high speed comes from the script processing only deltas, i.e. files that have been modified since the last backup. On my machine I obtain results of 50GB in one minute, although this obviously depends on the size of the deltas each time.
Main script body
The code begins with a set of Perl library include directives, namely:
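The original include list is not reproduced here. A plausible set for a script like this, assuming the standard File::Copy, File::Path, and POSIX modules are the ones used, might be:

```perl
use strict;                # catch undeclared variables and typos
use warnings;              # warn about suspicious constructs
use File::Copy qw(copy);   # file copying for the replication step
use File::Path qw(mkpath); # create missing directories in the backup tree
use POSIX qw(strftime);    # timestamp for the log file name
```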
Define primary data stores
Next we define the primary data stores that we want backed up. These must be absolute paths and include the drive letter, e.g. "E:/primaryDataStore1".
You may specify as many primary data stores as you wish.
Here three stores are specified:
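The listing itself is not shown; a sketch with three hypothetical store paths (the array name @primaryStoreDirs is taken from the main-loop description later in the article) might look like:

```perl
# Hypothetical example paths; replace these with your own absolute
# paths, including the drive letter.
my @primaryStoreDirs = (
    "E:/primaryDataStore1",
    "E:/primaryDataStore2",
    "E:/primaryDataStore3",
);
```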
Define backup data store
Now we define the location of the backup drive. This can be an absolute path, or a path relative to the script's location. It is recommended to place the script in the root level of your backup drive and define the backup location simply as ".". This is shown below.
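With the script placed at the root of the backup drive, the definition is a single line (the variable name $backupDir is an assumption):

```perl
# "." resolves to the current working directory, which is the backup
# drive's root when the script is placed and run there.
my $backupDir = ".";
```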
Create output action log file
We use a log file to record all actions executed by the script. The log file name is timestamped with the current date and time, thus a new log file will be created for each run.
We open the log file and write a message to it, as well as a message to stdout to let the user know the script is executing.
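A minimal sketch of this step, assuming a strftime-based timestamp format and the handle name $logFH (both are my choices, not taken from the original listing):

```perl
use POSIX qw(strftime);

# Timestamped log file name, e.g. backupLog_20240101_120000.txt
my $logFileName = "backupLog_" . strftime("%Y%m%d_%H%M%S", localtime) . ".txt";
open(my $logFH, '>', $logFileName)
    or die "Cannot create log file '$logFileName': $!";

print $logFH "Backup run started at " . localtime() . "\n";
print "Backup script running; actions are logged to '$logFileName'\n";
```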
Validate primary store and backup store definitions
We now check that we have valid primary store and backup path definitions, and that the backup location exists.
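These checks might look as follows, reusing the variable names assumed above (the example definitions at the top are only there to make the snippet self-contained):

```perl
my @primaryStoreDirs = ("E:/primaryDataStore1");   # as defined earlier
my $backupDir        = ".";                        # as defined earlier

die "Error: no primary data stores defined\n" unless @primaryStoreDirs;
die "Error: no backup location defined\n"     unless defined $backupDir;
die "Error: backup location '$backupDir' does not exist\n"
    unless -d $backupDir;
```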
Traverse each primary store directory
For each primary store directory defined in @primaryStoreDirs we check if that directory exists. Non-existent directories are skipped. If it does exist, then two recursive functions are called: 'ReplicateDirTree' and 'MoveRedundantBackupFiles'. These are discussed in detail in a later section.
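The loop above can be sketched as follows. The mirror-folder layout and the name of the "deleted" folder are assumptions; the stub subroutines only stand in for the real recursive routines described in the subroutine section:

```perl
# Stubs standing in for the real recursive subroutines; names and
# parameters follow the article's descriptions.
sub ReplicateDirTree         { my ($from, $to) = @_; print "replicating $from -> $to\n"; }
sub MoveRedundantBackupFiles { my ($back, $prim, $del) = @_; print "pruning $back\n"; }

my @primaryStoreDirs = ("E:/primaryDataStore1");
my $backupDir        = ".";

foreach my $primDir (@primaryStoreDirs) {
    unless (-d $primDir) {
        warn "Skipping non-existent primary store '$primDir'\n";
        next;
    }
    # Mirror each store into a same-named folder under the backup root.
    my ($storeName) = $primDir =~ m{([^/]+)$};
    ReplicateDirTree($primDir, "$backupDir/$storeName");
    MoveRedundantBackupFiles("$backupDir/$storeName", $primDir,
                             "$backupDir/${storeName}_deleted");
}
```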
End of script
The script has now finished successfully, so close any file handles and write the end time to the log file.
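This amounts to a couple of lines; $logFH is the handle opened at the start of the run, re-opened here on a throwaway file only so the snippet stands alone:

```perl
# $logFH is the log handle opened at the start of the run.
open(my $logFH, '>', 'backupLog_example.txt') or die $!;

print $logFH "Backup run finished successfully at " . localtime() . "\n";
close $logFH or die "Could not close log file: $!";
```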
This section describes the subroutines called from the main script body.
A trivial subroutine that closes any open file handles before calling the standard 'exit' function.
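A sketch of such a wrapper; the article does not give the subroutine's name, so 'CleanExit' here is hypothetical:

```perl
our $logFH;   # the log file handle opened in the main script body

# Hypothetical name; closes open handles, then exits with the given code.
sub CleanExit {
    my ($exitCode) = @_;
    close $logFH if defined $logFH;   # flush and release the log file
    exit(defined $exitCode ? $exitCode : 0);
}
```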
This function replicates a directory structure from one location to another. It traverses each file in each directory and checks whether that file exists in the target location. The file comparison is achieved by checking the file size and file modification timestamp.
In this subroutine, after obtaining the input parameters, we check if our 'fromDir' exists, and if not, we exit with an error. If 'toDir' does not exist, then we create it.
Next we traverse all files in 'fromDir'. For each file we check if it is a directory, and if so then we make a recursive call to 'ReplicateDirTree' so as to act on that subdirectory. If the file is not a directory then we perform the file check mentioned above on the mirrored copy in 'toDir'. If the check fails or the file does not exist in 'toDir' we copy it there from 'fromDir'.
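A self-contained sketch of 'ReplicateDirTree' following this description. The size-and-mtime comparison helper is inlined here under the hypothetical name 'CompareFiles'; note that the target's timestamp is set to match the source after copying, since copy() alone would not preserve it and every file would be re-copied on the next run:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(mkpath);

sub ReplicateDirTree {
    my ($fromDir, $toDir) = @_;

    die "Error: source directory '$fromDir' does not exist\n"
        unless -d $fromDir;
    mkpath($toDir) unless -d $toDir;    # create the mirror directory

    opendir(my $dh, $fromDir) or die "Cannot read '$fromDir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    foreach my $entry (@entries) {
        my $from = "$fromDir/$entry";
        my $to   = "$toDir/$entry";
        if (-d $from) {
            ReplicateDirTree($from, $to);       # recurse into subdirectory
        }
        elsif (!-e $to || !CompareFiles($from, $to)) {
            my ($atime, $mtime) = (stat $from)[8, 9];
            copy($from, $to) or die "Copy '$from' -> '$to' failed: $!";
            utime($atime, $mtime, $to);  # preserve the timestamp so the
                                         # next run sees the files as equal
        }
    }
}

# Size + modification-time comparison used above (hypothetical name).
sub CompareFiles {
    my ($fileA, $fileB) = @_;
    my @a = stat $fileA;
    my @b = stat $fileB;
    return 0 unless @a && @b;
    return ($a[7] == $b[7] && $a[9] == $b[9]) ? 1 : 0;  # 7 = size, 9 = mtime
}
```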
This function traverses the directory 'backDir' to search for files and directories deleted from 'primDir' which should no longer exist in the backup image. If found, these files are moved from 'backDir' to 'primStoreDeletedDir'. Note that no files are actually deleted; deletion is left to the user, after they have inspected the contents of 'primStoreDeletedDir'.
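A sketch following this description, with parameter names taken from the article. Perl's built-in rename() is used here as a portable stand-in for the Windows-only move helper described later:

```perl
use strict;
use warnings;
use File::Path qw(mkpath);

sub MoveRedundantBackupFiles {
    my ($backDir, $primDir, $primStoreDeletedDir) = @_;
    return unless -d $backDir;

    opendir(my $dh, $backDir) or die "Cannot read '$backDir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    foreach my $entry (@entries) {
        my $back = "$backDir/$entry";
        my $prim = "$primDir/$entry";
        if (!-e $prim) {
            # Gone from the primary store: quarantine it rather than
            # deleting it, so the user can inspect it first.
            mkpath($primStoreDeletedDir) unless -d $primStoreDeletedDir;
            rename($back, "$primStoreDeletedDir/$entry")
                or warn "Could not move '$back': $!\n";
        }
        elsif (-d $back) {
            MoveRedundantBackupFiles($back, $prim,
                                     "$primStoreDeletedDir/$entry");
        }
    }
}
```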
This subroutine uses the 'stat' function to obtain file information. It compares two files by size and modification timestamp, and returns one if they match, and zero otherwise.
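This comparison can be written as follows (the name 'CompareFiles' is hypothetical; the size and mtime indices into stat's return list are standard Perl):

```perl
# Returns 1 when the two files have the same size and modification
# timestamp, 0 otherwise.
sub CompareFiles {
    my ($fileA, $fileB) = @_;
    my @statA = stat $fileA;   # element 7 is the size in bytes
    my @statB = stat $fileB;   # element 9 is the modification time
    return 0 unless @statA && @statB;   # a missing file never matches
    return ($statA[7] == $statB[7] && $statA[9] == $statB[9]) ? 1 : 0;
}
```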
This (Windows only) function will move files or whole folders from one location to another. Overwrite can be specified as a third argument. The function returns one on success and zero otherwise. This function is superior to the 'move' function in File::Copy, which will not move entire directories.
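One way to implement such a helper is to shell out to cmd.exe's built-in 'move' command, which does move whole directories. The subroutine name 'MoveFileOrFolder' and the mapping of the overwrite flag to move's /Y switch are assumptions, not the article's actual implementation:

```perl
# Windows-only sketch: delegate to cmd.exe's built-in 'move' command.
sub MoveFileOrFolder {
    my ($from, $to, $overwrite) = @_;
    my $switch = $overwrite ? '/Y' : '/-Y';   # /Y = overwrite silently
    s{/}{\\}g for ($from, $to);               # cmd.exe expects backslashes
    my $status = system('cmd', '/C', 'move', $switch, $from, $to);
    return $status == 0 ? 1 : 0;              # one on success, zero otherwise
}
```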
Download (ZIP file - 2.35KB).