I’ve been writing a lot lately about the new features in Subversion 1.8, but a little nugget in Subversion 1.7 recently caught my attention. I knew that Subversion stored MD5 checksums of files in the repository, but I wasn’t sure how to access that information easily. The svnrdump command introduced in Subversion 1.7 provides the answer and makes data auditing in Subversion much easier.
So why is this important? Well, to put it bluntly, stuff happens to data: it may be corrupted due to hardware failure, lost due to improper backup procedures, or purposely damaged by someone with bad intentions. Subversion MultiSite can protect you against all the vagaries of hardware and network, but if you work in a regulated environment you will someday have to prove that the data you took out of Subversion is the same as the data you put in.
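That’s where the checksums come in. Let’s say I check out an important file from Subversion, like a configuration script or a data file with sensitive information. I can compute a checksum locally with md5sum and compare it against the Text-content-md5 header that appears in the svnrdump dump output for the same file on the server. If the two values match, the file I have is the file that was committed.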
> md5sum BigDataFile.csv
> svnrdump dump svn://myrepo/trunk/BigDataFile.csv
Simple enough, and very easy to script for automated auditing. If you store any important data in Subversion in a regulated environment, this simple feature is another way to help satisfy any compliance concerns about data integrity.