Subversion 1.8 Caches Pristine Data to Reduce Data Transfer
One of the less-noticed improvements in Subversion 1.8 is the efficient caching of pristine file data in a workspace. In many cases, this improvement can make working copy checkouts and updates much faster.
Subversion workspaces that contain multiple branches often hold duplicate copies of the same files. Before 1.8, every time you checked out or updated those files, you also downloaded a duplicate copy of each file’s pristine version (the unmodified base copy kept in the .svn directory). Subversion 1.8 now checks whether the pristine data cache already has a file with the same checksum and, if so, skips downloading it again.
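The idea can be modeled as a content-addressed cache: before fetching a file, the client looks up the checksum reported for it and skips the transfer if a pristine copy with that checksum is already present. The following is a minimal Python sketch under stated assumptions, not Subversion’s implementation: `PristineStore`, `checkout`, and the `skip_known` flag are hypothetical names, SHA-1 stands in for the checksum being compared, and `fetch` stands in for the network transfer.

```python
import hashlib


class PristineStore:
    """Hypothetical model of a working copy's pristine cache, keyed by checksum."""

    def __init__(self):
        self._blobs = {}  # SHA-1 hex digest -> file content

    def has(self, sha1):
        return sha1 in self._blobs

    def add(self, content):
        sha1 = hashlib.sha1(content).hexdigest()
        self._blobs.setdefault(sha1, content)  # one copy per distinct content
        return sha1


def checkout(listing, store, fetch, skip_known=True):
    """Populate `store` from (path, sha1) pairs and return the bytes 'downloaded'.

    `fetch(path)` stands in for a network request; skip_known=False models a
    client that downloads every file even when its content is already cached.
    """
    downloaded = 0
    for path, sha1 in listing:
        if skip_known and store.has(sha1):
            continue  # same checksum already cached: no transfer needed
        content = fetch(path)
        downloaded += len(content)
        store.add(content)
    return downloaded


# A toy repository: trunk plus two branches copied from it unchanged.
trunk = {"README.txt": b"Hadoop readme\n", "src/Main.java": b"class Main {}\n"}
repo = {
    f"{root}/{rel}": data
    for root in ("trunk", "branches/b1", "branches/b2")
    for rel, data in trunk.items()
}
# Each file is listed with its path and the checksum of its content.
listing = [(path, hashlib.sha1(data).hexdigest()) for path, data in repo.items()]

store = PristineStore()
sent = checkout(listing, store, fetch=repo.__getitem__)
naive = checkout(listing, PristineStore(), fetch=repo.__getitem__, skip_known=False)
total = sum(len(data) for data in repo.values())
print(f"with cache check: {sent} bytes, without: {naive} bytes (of {total})")
```

Running it reports 28 bytes transferred with the checksum check versus 84 without: only the trunk copies are fetched, because both branches contain byte-identical files. A file that a branch has modified would have a different checksum and would still be downloaded.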
If you have a large workspace with several branches, this improvement can make checkouts and updates much faster, particularly if you’re working over a slow connection. To get a sense of the improvement, I set up a Subversion 1.7 server and loaded the Hadoop 2.0.5 source code into it. I made two new branches, then checked out a working copy containing all three branches. On Subversion 1.7, Wireshark showed that about 49 MB of data was transferred during the checkout. On a Subversion 1.8 server, that dropped to 21 MB, a reduction of about 57%.
Branching is cheap and easy in Subversion, so it’s great that Subversion is now smarter about not sending duplicate data. Of course, if you work with media or documentation, you can end up with duplicate files in the same branch, so this improvement is a big help in that situation as well.