Recently I wrote about Git MultiSite’s reliable Git replication – guaranteed data safety, zero downtime, and no concern about replicas falling out of sync. These are huge benefits, but do they come at a cost? To the skeptical, ‘reliable replication’ sounds like overhead. I ran some simple tests to find out, and I’m happy to report that the overhead is insignificant compared to the value of reliable replication. What’s more, using Git MultiSite makes you less vulnerable to the effects of network latency.
I wanted to measure three things:
Pure overhead of the replication algorithm. Compared to pushing directly to a bare Git repo on a remote server, how much overhead do you see when pushing to a local Git MultiSite node in a replication group with a remote node?
Effect of increasing latency. As the latency between the user and the bare Git master repo or the remote Git MultiSite node increases, what happens to performance?
Effect of guaranteed content delivery. What’s the additional overhead imposed by guaranteeing replication of content to at least one other remote or local node before the push succeeds?
Just for fun, I decided to include some other name-brand Git repository management solutions in the test.
The test configuration is shown below.
The test program pushed 50 times to each remote in round-robin fashion. After each run I increased the latency, using these values:
32 ms (e.g. coast to coast in US)
64 ms (e.g. trans-Atlantic)
128 ms (e.g. trans-Pacific)
256 ms (e.g. US to India)
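The harness described above is straightforward to reproduce. Here is a minimal sketch of one way to do it; the remote names, the network device, and the use of Linux `tc`/`netem` to inject latency are my assumptions, since the article doesn’t say which tooling was used.

```python
import subprocess
import time

# Hypothetical remote names -- substitute your own replication group members.
REMOTES = ["multisite-local", "bare-master"]
LATENCIES_MS = [0, 32, 64, 128, 256]


def set_latency(ms, dev="eth0"):
    """Inject artificial WAN latency on a Linux interface (requires root).

    This assumes tc/netem; the article does not state how latency was added.
    """
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root", "netem", "delay", f"{ms}ms"],
        check=True,
    )


def round_robin(remotes, pushes):
    """Order of remotes for `pushes` total pushes, cycling through the list."""
    return [remotes[i % len(remotes)] for i in range(pushes)]


def timed_push(remote, branch="master"):
    """Run `git push` to one remote and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(["git", "push", remote, branch], check=True)
    return time.monotonic() - start


def run_trial(remotes, pushes_per_remote=50):
    """Push round-robin to every remote, collecting per-remote timings."""
    results = {r: [] for r in remotes}
    for remote in round_robin(remotes, pushes_per_remote * len(remotes)):
        results[remote].append(timed_push(remote))
    return results
```

In practice you would call `set_latency()` between trials with each value from `LATENCIES_MS`, then `run_trial()` to collect the 50 timings per remote at that setting.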
Since most Git repository management solutions offer only read-only mirrors, pushes for those systems go directly to the master repo. All pushes were done over SSH.
I used three different Git MultiSite configurations and measured the push completion times for each. First I used the two-node configuration shown in the diagram without guaranteed content delivery (i.e. measuring the overhead of the replication algorithm). Next I used the two-node configuration with guaranteed content delivery to the remote node over SSL. In both of these cases the remote node had to participate in the replication proposal. Finally, I used the three-node configuration with guaranteed content delivery to one node over SSL.
To account for any outliers in the data I used the median value of the results for each Git remote.
The graphs below summarize the results.
This chart shows the median push time for every system tested. Let’s drill down into the most common Git MultiSite configurations compared to bare Git repos.
It’s easy to see that there is only a nominal amount of overhead at zero latency. The gap quickly closes, and Git MultiSite even pulls ahead at higher latencies.
This chart shows the delta between any Git system’s performance with no latency and its performance at higher latencies. In other words, it’s the penalty you pay with a given system when latency increases. It’s very clear that Git MultiSite (the three bars on the right of each group) has the best relative performance as latency increases.
Reliable Replication: Low Overhead, Better Over a WAN
As the results show, the basic overhead of the replication algorithm, even with guaranteed content delivery, is minimal. Compared to the benefit of 100% confidence in the replication system, it’s a small price to pay.
Of course, real world performance will vary based on usage. Guaranteed content delivery of very large commits will take a bit of time, but that’s why Git MultiSite gives you the ability to control how and where data is replicated and how many nodes must receive the push before it is accepted. You can choose the balance of performance and redundancy that makes sense for you.
Finally, for those of you supporting remote sites, you’ll appreciate that Git MultiSite’s performance holds up well as latency increases.