Data Auditing in Subversion

I’ve been writing a lot lately about the new features in Subversion 1.8, but there’s a little nugget in Subversion 1.7 that only recently caught my attention. I knew that Subversion stored MD5 checksums of files in the repository, but I wasn’t sure how to access that information easily. The svnrdump command, introduced in Subversion 1.7, provides the answer and makes data auditing in Subversion much easier.

So why is this important? To put it bluntly, stuff happens to data: it may be corrupted by hardware failure, lost through improper backup procedures, or deliberately damaged by someone with bad intentions. Subversion MultiSite can protect you against the vagaries of hardware and networks, but if you work in a regulated environment, you will someday have to prove that the data you took out of Subversion is the same as the data you put in.

That’s where the checksums come in. Say I check out an important file from Subversion, such as a configuration script or a data file with sensitive information. I can compute a checksum of my local copy and compare it against the checksum stored on the server to see whether they match.

> md5sum BigDataFile.csv
3eba79a554754ac31fa0ade31cd0efe5  BigDataFile.csv
> svnrdump dump svn://myrepo/trunk/BigDataFile.csv
Text-content-md5: 3eba79a554754ac31fa0ade31cd0efe5

Simple enough, and very easy to script for automated auditing. If you store important data in Subversion in a regulated environment, this feature is one more way to address compliance concerns about data integrity.
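One way to automate the check is a small shell function. This is a sketch under stated assumptions, not a tested tool: it assumes GNU coreutils `md5sum` (macOS ships `md5` instead), that the URL names a single file, and that the last `Text-content-md5` header in the dump stream belongs to the newest revision of that file. The function names `last_repo_md5`, `local_md5`, and `audit_file` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a Subversion dump stream on stdin and print the
# value of the LAST Text-content-md5 header. Assumption: the dumped URL is a
# single file, so the last header describes its newest text revision.
last_repo_md5() {
    awk '/^Text-content-md5: / { sum = $2 } END { if (sum != "") print sum }'
}

# Hypothetical helper: print the MD5 of a local file (GNU coreutils md5sum).
local_md5() {
    md5sum "$1" | awk '{ print $1 }'
}

# audit_file LOCAL_PATH URL
# Prints OK or MISMATCH; returns 0 on a match, 1 otherwise.
audit_file() {
    local_sum=$(local_md5 "$1")
    # svnrdump writes progress to stderr, so only the dump stream is piped.
    repo_sum=$(svnrdump dump "$2" | last_repo_md5)
    if [ -n "$repo_sum" ] && [ "$local_sum" = "$repo_sum" ]; then
        echo "OK       $1"
        return 0
    fi
    echo "MISMATCH $1 (local $local_sum, repository ${repo_sum:-none})"
    return 1
}
```

For example, `audit_file BigDataFile.csv svn://myrepo/trunk/BigDataFile.csv` reports the result and sets the exit status, so it can run from cron or a CI step. Because svnrdump streams the file's history, expect it to be slow on files with many revisions, and note that a file whose own content contains a line beginning with `Text-content-md5:` could fool this simple parser.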

If you have regulatory or compliance concerns around Subversion, grab the latest certified binaries, ask us for advice, or try out SVN MultiSite’s 100% data safety capability.


1 Response to “Data Auditing in Subversion”

  • I do not see how this is useful. For example, you have always been able to do this:

    $ svn cat | md5

    But if you do this:

    $ svnrdump dump

    You are going to get output for all 1,524,177+ revisions in the repository. Try it.

    Also, keep in mind that repository files do not have keywords expanded, so the checksum in the repository does not necessarily match your working copy if you use keywords or if you have different line endings. Using svn cat causes keywords and line endings to be normalized, so you can compare apples to apples.
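    The comment's alternative is also easy to script. A minimal sketch, assuming GNU `md5sum` and relying on the comment's point that `svn cat` output is normalized like a working copy; `stream_md5` and `cat_matches` are hypothetical names. Returning 2 when `svn cat` fails keeps a network or permission error from being reported as a checksum mismatch.

```shell
#!/bin/sh
# Hypothetical helper: MD5 of whatever arrives on stdin (GNU coreutils md5sum).
stream_md5() {
    md5sum | awk '{ print $1 }'
}

# cat_matches LOCAL_PATH URL
# Returns 0 if the local file matches `svn cat URL`, 1 on a mismatch,
# and 2 if the repository content could not be fetched.
cat_matches() {
    tmp=$(mktemp) || return 2
    if ! svn cat "$2" > "$tmp"; then
        rm -f "$tmp"
        return 2   # fetch failed: an error, not a checksum mismatch
    fi
    local_sum=$(stream_md5 < "$1")
    repo_sum=$(stream_md5 < "$tmp")
    rm -f "$tmp"
    [ "$local_sum" = "$repo_sum" ]
}
```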
