When moving many files across the wire, for example via FTP, files sometimes get corrupted or arrive only partially. Two files having the same size does not mean the transfer succeeded, and when you move many thousands of files, the chance that at least one of them arrives damaged goes up. I usually use one of two methods to verify that files have been transferred properly:

(a) Use file checksums (e.g. using md5deep, sha256deep).
(b) Use rsync’s checksum option.

Using file checksums

The md5deep project is a collection of utilities that compute a checksum for each file using a variety of hashing algorithms. To create hashes for a tree of files, use md5deep itself or the included sha256deep, which computes a SHA-256 message digest for each file. The resulting list of checksums can then be used on the receiving side of a transfer to verify the integrity of the transferred files.

To create a list of file checksums with relative paths on the transmitting side (-r recurses into subdirectories, -l records relative paths):

cd files_to_send
sha256deep -rl * > hashes.sha

Now copy the hashes.sha file to the receiving side along with the files. On the receiving side, -x hashes.sha switches sha256deep to negative matching, so it prints only the files whose hash does not appear in hashes.sha:

cd files_received
sha256deep -rl -x hashes.sha * > files_did_not_match.txt

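The files_did_not_match.txt file will list every file whose hash is not in hashes.sha. Expect hashes.sha itself to show up there too, since its own hash is not in the list; every other entry is a file that did not arrive intact. Resend those files and verify them again.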
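If md5deep is not installed, roughly the same round trip can be sketched with sha256sum from GNU coreutils. This swaps in a different tool rather than reproducing the sha256deep workflow, and the helper names make_hashes and check_hashes are hypothetical. Because sha256sum --check looks files up by name, it also reports files that are missing on the receiving side.

```shell
#!/bin/sh
# Sketch only: GNU coreutils sha256sum used in place of sha256deep.
# make_hashes and check_hashes are hypothetical helper names.

# make_hashes DIR LIST
# Hash every regular file under DIR, recording paths relative to DIR.
# The redirection is resolved before the cd, so LIST is relative to the
# caller's directory. Keep LIST outside DIR so it is not hashed itself.
make_hashes() {
    (cd "$1" && find . -type f -exec sha256sum {} +) > "$2"
}

# check_hashes DIR LIST
# Re-hash the files under DIR against LIST (read on stdin, "-").
# --quiet suppresses the "OK" lines, so only mismatched or missing
# files are printed; the exit status is non-zero if any file fails.
check_hashes() {
    (cd "$1" && sha256sum --check --quiet -) < "$2"
}
```

For example, run make_hashes files_to_send hashes.sha before sending and check_hashes files_received hashes.sha afterwards; it prints nothing and exits with status 0 when every file matches.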

Using rsync --checksum

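You can also rely on rsync. By default rsync decides which files to send by comparing size and modification time; with --checksum (-c) it compares file checksums instead, so any file whose contents differ on the destination is sent again. rsync cannot copy between two remote hosts in a single command, so run it on the source host:

rsync -av --checksum /uploads/files_to_send/ \
      user@destination.host.com:/uploads/files_received/ \
      > sync.log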

This is a one-step operation with less hassle, but it requires rsync on both hosts. rsync can do much more than this, so if you often transfer, mirror, or sync files between hosts, its options are worth a closer look. On Windows, you can use cwRsync, a packaging of rsync with Cygwin.
