When moving many files across the wire, for example via FTP, some files can arrive corrupted or only partially transferred. Two files having the same size does not prove that the transfer succeeded, and the problem becomes more likely to show up when moving many thousands of files. I usually use one of two methods to verify that files have been transferred properly:
(a) Use file checksums (e.g. with md5deep or sha256deep).
(b) Use rsync's --checksum option.
Using file checksums
To create hashes for a tree of files, use the md5deep utility or the sha256deep utility included with it, which computes a SHA-256 message digest for each file. The md5deep project is a collection of utilities that compute a checksum for each file using a variety of hashing algorithms. The resulting list of checksums can then be used on the receiving side of a transfer to verify the integrity of the transferred files.
To create a list of file checksums with relative paths (-l), recursing into subdirectories (-r), run the following on the transmitting side:
cd files_to_send
sha256deep -rl * > hashes.sha
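If md5deep is not installed, a similar manifest can be produced with GNU coreutils' sha256sum. The following is a minimal sketch, assuming GNU find and sha256sum are available; the function name make_manifest is hypothetical, and the output uses sha256sum's own format, which is not assumed to be interchangeable with sha256deep's.

```shell
#!/bin/sh
# Minimal sketch (assumes GNU find and coreutils sha256sum are installed).
# make_manifest is a hypothetical helper name, not part of md5deep.

# make_manifest DIR MANIFEST
# Hash every regular file under DIR, recursively, and write one
# "<sha256>  ./relative/path" line per file to MANIFEST.
# Keep MANIFEST outside DIR so the manifest is not hashed on a later run.
make_manifest() {
    # The redirection is resolved before the subshell changes directory,
    # so a relative MANIFEST path is relative to the caller's directory.
    (cd "$1" && find . -type f -exec sha256sum {} +) > "$2"
}
```

For example, `make_manifest files_to_send hashes.sha256` writes the manifest next to the files_to_send directory rather than inside it.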
Now copy the hashes.sha file to the receiving side along with the files. To check the received files against it on the receiving side:
cd files_received
sha256deep -rl -x hashes.sha * > files_did_not_match.txt
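The files_did_not_match.txt file will list every received file whose hash is not in hashes.sha (if hashes.sha itself was copied into files_received, it is listed too, since its own hash is not in the list). Resend those files and verify them again. Note that this mode only checks the files that are present, so a file that never arrived is not reported; compare the file lists or counts as well.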
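With a sha256sum-style manifest, the matching check is sha256sum -c. Because it reads the file names from the manifest, it also reports files that are missing on the receiving side. A minimal sketch, assuming GNU coreutils and a manifest whose paths are relative to the received directory; verify_manifest is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch (assumes GNU coreutils sha256sum).
# verify_manifest is a hypothetical helper name.

# verify_manifest DIR MANIFEST
# Re-hash the files listed in MANIFEST (paths relative to DIR) and compare.
# --quiet suppresses the "OK" lines, so only FAILED entries are printed;
# a missing file is reported as "FAILED open or read".
# Returns non-zero if any listed file is corrupted or missing.
verify_manifest() {
    # MANIFEST is opened before the subshell changes into DIR.
    (cd "$1" && sha256sum --quiet -c -) < "$2"
}
```

For example, `verify_manifest files_received hashes.sha256 > files_did_not_match.txt` collects the failing entries, and the exit status tells a script whether a resend is needed.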
Using rsync --checksum
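You can also use rsync's --checksum (-c) option, which makes rsync decide which files need transferring by comparing checksums of the source and destination files instead of their size and modification time. rsync cannot copy between two remote hosts in a single command, so run it on the source host (here source.host.com):
rsync -av --checksum /uploads/files_to_send/ \
    user@destination.host.com:/uploads/files_received/ \
    > sync.log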
This is a one-step operation with less hassle, but it requires rsync to be installed on both hosts. rsync has many other options, so if you often transfer, mirror, or sync files between hosts, it is worth reading through them. On Windows, you can download cwRsync, a packaging of rsync with Cygwin.
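The --checksum option can also be combined with a dry run (-n) to list which files differ without copying anything, for example to check a tree that was already transferred by FTP. A minimal sketch, assuming rsync is installed; list_mismatches is a hypothetical helper name, and either argument (but not both) may be a user@host:path spec.

```shell
#!/bin/sh
# Minimal sketch (assumes rsync is installed).
# list_mismatches is a hypothetical helper name.

# list_mismatches SRC DEST
# -r  recurse into subdirectories
# -c  compare files by checksum instead of size and modification time
# -n  dry run: report what would be transferred, but copy nothing
# --out-format='%n' prints one path (relative to SRC) per file that
# differs from, or is missing in, DEST.
list_mismatches() {
    rsync -rcn --out-format='%n' "$1/" "$2/"
}
```

Running the same command without -n would then resend exactly the files it listed.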