Monthly Archive for December, 2013

Git Document Sharing

If you’re a Git user, you may have heard about SparkleShare, a clever tool that gives you a Dropbox-style interface for document sharing and collaboration backed by Git. Storing your documents in an SCM system like Git gives you strong version management of the documents, the ability to host the Git repository on your own servers, and easier collaboration between software teams and document writers. The only thing better than Git document sharing is…Git document sharing backed by Git MultiSite!

A member of WANdisco’s talented crew in Sheffield ran a proof-of-concept with SparkleShare on two machines, each instance using a different Git MultiSite node as a Git remote. It works as expected, which is no surprise, since Git MultiSite can replicate any Git repository. Setup is simple:

  • Set up two or more Git MultiSite nodes with a replicated repository.

  • Set up SparkleShare on two machines, each using a different Git MultiSite node as a Git remote.

  • Add documents in one SparkleShare folder and see them appear on the other machine.

The process is simple and convenient, and your SparkleShare folder is now backed by a Git repository that enjoys zero downtime, LAN speeds at every location, strong security, and all the other benefits of Git MultiSite. That means you can have SparkleShare folders automatically available at remote sites, and use replication groups to control where SparkleShare data is replicated.
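To make the replication-group idea concrete, here is a minimal toy model, assuming a group is simply a set of nodes that each hold a full copy of the repository. The class, method, and node names (`ReplicationGroup`, `push`, `sites`, "boston", "singapore") are hypothetical and are not Git MultiSite's API. The real replication engine coordinates the order of changes across nodes; this sketch just applies each push at every member at once.

```python
class ReplicationGroup:
    """Hypothetical toy model: every member node keeps a full copy of the
    repository content assigned to this group."""

    def __init__(self, name, nodes):
        self.name = name
        self.copies = {node: {} for node in nodes}  # node -> {path: bytes}

    def push(self, via_node, path, content):
        """Accept a change at one member node and apply it at every member."""
        if via_node not in self.copies:
            raise ValueError(f"{via_node!r} is not a member of {self.name!r}")
        for files in self.copies.values():
            files[path] = content

    def read(self, node, path):
        """Return the content a given node holds for a path (None if absent)."""
        return self.copies[node].get(path)

    def sites(self):
        """List the nodes where this group's data is replicated."""
        return sorted(self.copies)


docs = ReplicationGroup("sparkleshare-docs", ["boston", "singapore"])
# Machine 1's SparkleShare folder pushes through the Boston node...
docs.push("boston", "plans/roadmap.odt", b"draft 1")
# ...and machine 2, using the Singapore node as its remote, sees the file.
print(docs.read("singapore", "plans/roadmap.odt"))  # b'draft 1'
print(docs.sites())  # ['boston', 'singapore']
```

Because placement is decided per group, adding a site to the group is what makes the SparkleShare data available there; a node outside the group never receives it.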

If you’re looking for a Git document sharing solution that works in the enterprise, this proof-of-concept is a great place to start.


The Open Source Wave in SCM

A recent Forrester survey has confirmed what those of us working in the ALM space have seen coming for several years: the open source wave has hit SCM. The wave isn’t on the horizon, and it isn’t something you need to prepare for someday. It has well and truly arrived.

The numbers tell the tale.  As you can see in the infographic below, Subversion and Git lead the enterprise SCM market with a share of 28.8%.  Subversion is a stable and mature system proven at scale in challenging environments, and is widely accepted in mainstream enterprise development organizations.  Git is now moving past the early adopter phase.  And of course Subversion and Git are the dominant SCM solutions for open source projects.

SCM Adoption

The gradual adoption of proven open source technologies in the enterprise should come as no surprise.  We’ve seen this trend before with the Apache web server (51% market share), Linux data center servers (23% of revenue), and Android (81% of devices shipped).

Why did the open source wave arrive in SCM over the last couple of years?  (Depending on your perspective, you may be thinking ‘why did it take so long’ or ‘how did it happen so fast’.)  A few trend lines converged at the right time.

First, the face of enterprise development is changing.  The software industry is widely adopting lean development principles like the Scaled Agile Framework and continuous delivery.  Subversion and Git are well suited for the workflows that support lean development.

Second, Git and particularly Subversion have matured both in features and in commercial support options.  Maintaining enough internal expertise to be completely self-sufficient is both expensive and difficult, so the rise of a thriving ecosystem of commercial vendors around an open source project is a hallmark of enterprise adoption.  Vendors like WANdisco both sponsor future development and provide enterprise support, services, and products for Subversion and Git.

The future is indeed bright for Subversion and Git.  The 21% of developers not using SCM will likely adopt an open source SCM solution – why would they look anywhere else – while the 17% using legacy solutions will likewise look to Subversion and Git as the logical upgrade paths.  Economics will eventually drive a good part of the ‘everything else’ category into the Subversion and Git camps as well.  If you’re looking to make the move, our quick reference cards on Subversion and Git are a good place to start.

The open source wave in SCM is here.  Are you ready?

Storing Binary Files Efficiently with Subversion

In the course of running a recent performance test, I remembered another big advantage that Subversion has when used for managing large digital assets. Subversion practices deduplication (also known as “rep sharing”) in its back-end storage system.

That can produce considerable savings on costly storage. Of course, Subversion doesn’t create physical server-side copies of data when branching, but users sometimes copy files to stand up new projects, particularly game artists who may not be familiar with SCM. Deduplication stores each identical file only once, and you may find that it saves more than 20% of your storage capacity.

It’s great that Subversion makes this so easy.  And it’s also surprising that Perforce, a system known for handling large binary data, doesn’t provide any deduplication out of the box.  You must (carefully) script it yourself or rely on more expensive storage solutions to provide deduplication.  The savings quickly add up when you use Subversion instead.
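The principle behind deduplication can be sketched as a content-addressed store: hash each file's contents and keep the bytes only the first time a given hash appears. This is a toy illustration loosely modeled on the idea behind rep sharing (a cache keyed by content checksum); it is not Subversion's on-disk format, and `DedupStore` and the file paths are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical file contents are kept once."""

    def __init__(self):
        self.blobs = {}  # checksum -> bytes, stored only once
        self.paths = {}  # repository path -> checksum

    def add(self, path, data):
        digest = hashlib.sha1(data).hexdigest()
        # Only store the bytes the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.paths[path] = digest
        return digest

    def logical_size(self):
        """Bytes a naive store would keep: one full copy per path."""
        return sum(len(self.blobs[d]) for d in self.paths.values())

    def physical_size(self):
        """Bytes actually stored after deduplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
texture = b"\x00" * 1000  # stand-in for a binary asset
store.add("/projA/art/tree.png", texture)
store.add("/projB/art/tree.png", texture)  # an artist copied the file
store.add("/projB/art/rock.png", b"\x01" * 500)
print(store.logical_size(), store.physical_size())  # 2500 1500
```

In this made-up example the copied texture costs nothing extra, so physical storage is 40% below the naive total; real savings depend entirely on how much duplicate content a repository holds.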


Repository Storage when importing binary files


Subversion deduplication is enabled by default, although you can toggle the setting in your repository’s db/fsfs.conf file. So relax: you don’t need to do anything to take advantage of this capability. Of course, if you have any questions, our team of Subversion experts is here to help!
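If you want to check the setting from a script, a small reader for fsfs.conf might look like the following. It assumes the option is `enable-rep-sharing` in the `[rep-sharing]` section, as in the template Subversion generates, and treats a missing or commented-out option as enabled, matching the default described above. `rep_sharing_enabled` is a hypothetical helper, not part of Subversion.

```python
import configparser


def rep_sharing_enabled(fsfs_conf_text):
    """Return True unless [rep-sharing] enable-rep-sharing is explicitly false.

    Assumes the default (rep sharing on) when the section or option is absent.
    """
    # Interpolation is disabled so stray '%' characters cannot cause errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(fsfs_conf_text)
    return parser.getboolean("rep-sharing", "enable-rep-sharing", fallback=True)


sample = """
[rep-sharing]
### To conserve space, the filesystem can optionally avoid storing
### duplicate representations.
# enable-rep-sharing = true
"""
print(rep_sharing_enabled(sample))  # True: option is commented out
print(rep_sharing_enabled("[rep-sharing]\nenable-rep-sharing = false\n"))  # False
```

In practice you would pass in the text of `<repository>/db/fsfs.conf`; lines starting with `#` are treated as comments, which is how the generated template ships its defaults.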


Monitoring Subversion and Git Repository Activity

There’s nothing as frustrating as trying to diagnose a slow Subversion or Git repository. You might spend a lot of time digging through logs and system monitoring tools before finally discovering that someone is submitting a 2GB file that needs to transfer from Singapore to Boston. That’s why SVN MultiSite Plus and Git MultiSite give you built-in tools for monitoring repository activity.

In the administration console I can see how many transactions are pending for a particular repository.


Repository Transactions

I can also see transactions pending for a particular server.


Transactions per Server

In addition to viewing the number of transactions pending for repositories and servers, I can drill down into the details of repository events to try to pin down what’s causing a hang-up.

Monitoring a big Subversion or Git deployment is challenging and requires several types of tools, but the quick view of pending transactions gives you a fast sense of whether there are a lot of transactions stacked up waiting to process. To get a sense of typical system load over time, you can always inject these data points into a monitoring tool like Graphite.
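As a sketch of that last step, Graphite's Carbon daemon accepts a plaintext protocol of one `<metric path> <value> <timestamp>` line per data point, by default on TCP port 2003. The example below formats pending-transaction counts in that shape. The counts, repository names, and the `multisite.pending.` metric prefix are hypothetical; how you collect the numbers from the console is left open.

```python
import socket
import time


def graphite_line(metric_path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric_path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Send plaintext-protocol lines to a Carbon listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical pending-transaction counts per repository.
pending = {"repo1": 3, "repo2": 0}
lines = [
    graphite_line(f"multisite.pending.{repo}", count, 1386000000)
    for repo, count in pending.items()
]
print("".join(lines), end="")
# send_to_graphite(lines, host="graphite.example.com")  # needs a live Carbon server
```

Sampling these counts on a schedule (for example once a minute) gives Graphite the time series it needs to show typical load and spot unusual backlogs.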

Interested in learning more? Give us a call and see a demo or start a free trial!





Faster Subversion working copy updates

Subversion 1.8 Caches Pristine Data to Reduce Data Transfer

One of the less noticed improvements in Subversion 1.8 is the efficient caching of pristine file data in a workspace.  This improvement can actually result in much faster working copy updates in many cases.

Subversion workspaces that contain multiple branches will often have duplicate copies of the same files. Before 1.8, every time you checked out or updated those files, you’d also download a duplicate copy of each pristine file for storage in the .svn directory. Subversion 1.8 now checks whether the pristine data cache already has a file with the same checksum, and avoids downloading duplicate copies.
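The checksum check described above can be sketched as a cache lookup before each download. This is a toy model, not Subversion's client code: it assumes the client learns each file's SHA-1 before deciding whether to fetch it, and the repository layout and file contents are made up.

```python
import hashlib


class PristineCache:
    """Toy pristine store keyed by SHA-1 checksum (illustration only)."""

    def __init__(self):
        self.by_sha1 = {}          # checksum -> file text
        self.bytes_downloaded = 0  # bytes this client "transferred"

    def ensure(self, sha1, download):
        """Return the pristine text, calling download() only on a cache miss."""
        if sha1 not in self.by_sha1:
            data = download()
            self.bytes_downloaded += len(data)
            self.by_sha1[sha1] = data
        return self.by_sha1[sha1]


# Hypothetical repository: trunk plus two branches holding identical files.
server = {}
for branch in ("trunk", "branches/b1", "branches/b2"):
    server[f"{branch}/README.txt"] = b"hello\n" * 100    # 600 bytes
    server[f"{branch}/build.xml"] = b"<project/>\n" * 50  # 550 bytes

cache = PristineCache()
for path, data in server.items():
    # Assumption: the checksum is known before the file text is fetched.
    checksum = hashlib.sha1(data).hexdigest()
    # The default argument binds the current `data` (avoids late binding).
    cache.ensure(checksum, lambda data=data: data)

naive_bytes = sum(len(d) for d in server.values())
print(naive_bytes, cache.bytes_downloaded)  # 3450 1150
```

With three identical branches, only one copy of each distinct file crosses the wire; the more branches share content, the larger the saving.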

If you have a large workspace with several branches, this improvement can result in much faster checkouts and updates, particularly if you’re working over a slow connection. To get a sense of the improvement, I set up a Subversion 1.7 server and loaded in the Hadoop 2.0.5 source code. I made two new branches, then checked out a working copy with all three branches. On Subversion 1.7, Wireshark showed that I transferred about 49 MB of data during the checkout. On a Subversion 1.8 server that was down to 21 MB, a reduction of about 57% (28 MB saved out of 49 MB).

Branching is cheap and easy in Subversion, so it’s great that Subversion is now smarter about not sending duplicate data. Of course, if you work with media or documentation, you can end up with duplicate files in the same branch, so this improvement is a big help in that situation as well.

The efficiency improvement is impressive, and marks another milestone in Subversion’s performance story for larger digital assets.  Want to try it?  Grab a certified Subversion 1.8 release.