Monthly Archive for October, 2013

Thoughts on RICON West 2013

RICON bills itself as a “Distributed Systems Conference for Developers”, but in recent years it has increasingly become a lively meeting point for leading academics and distributed systems practitioners. WANdisco straddles these worlds as well: on the one hand a five-year sponsor of Berkeley’s AMPLab, and on the other a vendor that has deployed advanced distributed systems into enterprise production environments for almost a decade.

A recent and gratifying trend is a maturing understanding of the Paxos algorithm as a practical solution for implementing distributed consensus, an essential building block for distributed systems. We still heard some speakers repeat the common opinion that Paxos is “too hard to implement”, but we also saw others dipping their toes into it. And as evidence of increased interest in coordination algorithms, one talk presented a “stepping stone” algorithm shown to be easier to teach to undergraduate computer science students.

Another area of beneficial progress is the growth of understanding around the subtleties of the CAP theorem. As Michael Bernstein put it, “now you have CAP, which is an acronym, which is super easy to make s–t up about.” Of course, what we’ve often heard are witty-sounding but dead-wrong simplifications of the CP and AP tradeoffs. As industry knowledge about distributed systems matures, real-life implementations are proving increasingly effective and durable.

There was also increased interest in methods for strengthening consistency in eventually consistent databases. Note that the entire subject of eventual consistency leaves us a little squeamish. As I wrote in Why Cassandra Lies to You, the eventual consistency model does not, in practice, provide strong consistency. In cases where true consistency is required, choosing the weaker BASE guarantee of eventual consistency will likely prove a painful mistake.

Perhaps it is inevitable that distributed databases will eventually displace the relational databases powering the vast, churning machines of industry.  RICON is one window into that future.

How much Git downtime can you tolerate?

Enterprise SCM administrators realize the valuable service they’re providing to development organizations and strive to avoid outages, but exactly how costly is SCM downtime? Put another way, how much is avoiding Git downtime worth to a company that relies on Git for enterprise software development?

A recent study concluded that a data center outage costs an average of $5,600 per minute in general use cases. To get a more concrete number, assume that a single developer costs $50–$300 per hour depending on location, and that, faced with SCM downtime, a few hundred developers are unable to be productive.

A developer with a private Git repository or a local read-only mirror can still get some work done, but if the master Git repository is down they can’t pull the latest work from other developers and they can’t push their own. The productivity loss factor may not be 100%, but it’s not trivial either. You also need to include the cost of any schedule impact – days matter when deadlines are looming.
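As a rough, back-of-the-envelope illustration – the head count, hourly rate, productivity-loss factor, and outage length below are assumptions for the sketch, not figures from the study:

```python
# Back-of-the-envelope downtime cost estimate. All inputs are assumed
# values for illustration; substitute your own numbers.
DEVELOPERS_BLOCKED = 200     # developers who cannot work normally
HOURLY_COST = 100            # fully loaded cost per developer-hour (USD)
PRODUCTIVITY_LOSS = 0.5      # fraction of productivity lost during the outage
OUTAGE_HOURS = 4

cost = DEVELOPERS_BLOCKED * HOURLY_COST * PRODUCTIVITY_LOSS * OUTAGE_HOURS
print(f"Estimated cost of a {OUTAGE_HOURS}-hour SCM outage: ${cost:,.0f}")
# -> Estimated cost of a 4-hour SCM outage: $40,000
```

Even with that conservative 50% productivity-loss factor, a single afternoon outage runs well into five figures before counting any schedule impact.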

That’s why WANdisco provides non-stop data solutions for Git and Subversion. Zero downtime and continuous availability are guarantees, not perks. Enterprise SCM administrators can count on High Availability (HA) and Disaster Recovery (DR) out of the box, with an aggressive Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Compared to Git MultiSite and SVN MultiSite, one-off home-grown solutions simply aren’t battle-tested.

With WANdisco MultiSite products for Git and Subversion, every node in the deployment is a replicated peer node, and every node is fully writable. You can choose how these nodes behave by setting up different replication groups, but for the purposes of HA/DR, you might set up a deployment like this:

HA/DR configuration

In this simplified view, users at two sites have a set of local nodes to use. If one node fails, failover to another is automated by a load balancer (the HA case). Note that all the nodes are active and thus also serve to improve overall performance. In a DR scenario, users at one site can simply switch their Git remote to the load balancer at the other site, which routes them to any of that site’s fully writable nodes.
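For the DR case, that remote switch is a one-line change on each clone and is easy to script. The sketch below is a hedged illustration only – the hostnames and repository path are hypothetical, and it is not part of any WANdisco tooling:

```python
# Hedged sketch of client-side DR failover: repoint "origin" at the
# load balancer in front of the surviving site. Hostnames and the
# repository path are hypothetical.
import subprocess

PRIMARY_LB = "git@lb.site1.example.com:project.git"
SECONDARY_LB = "git@lb.site2.example.com:project.git"

def fail_over(repo_dir: str) -> None:
    # Point the existing remote at the secondary site's load balancer...
    subprocess.run(["git", "remote", "set-url", "origin", SECONDARY_LB],
                   cwd=repo_dir, check=True)
    # ...and confirm we can still reach a writable node through it.
    subprocess.run(["git", "fetch", "origin"], cwd=repo_dir, check=True)

def fail_back(repo_dir: str) -> None:
    # Reverse the change once the primary site is healthy again.
    subprocess.run(["git", "remote", "set-url", "origin", PRIMARY_LB],
                   cwd=repo_dir, check=True)

if __name__ == "__main__":
    fail_over("/work/project")
```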

This setup is quite simple to achieve with Git MultiSite and a stock load balancer like HAProxy, giving you an effective zero downtime solution at very low cost.

How much downtime can you tolerate in your enterprise Git deployment? If the answer is close to zero, learn more about Git MultiSite or start a free trial.

Subversion 1.8.4 released!

Today the Apache Software Foundation (ASF) announced the release of Subversion 1.8.4, which features a number of bug fixes.

Apache Subversion 1.8.4 fixes include:

– revert: fix problems reverting moves

– translation updates for Swedish language users

– merge: reduce network connections for automatic merge

– fix crash on Windows when a piped command is interrupted

– fix assertion when upgrading old working copies

– fsfs: improve error message when unsupported fsfs format found

For a full list of all bug fixes and improvements, see the Apache changelog for Subversion 1.8.

You can download our fully tested, certified binaries for Subversion 1.8.4 free here.

WANdisco’s binaries are a complete, fully-tested version of Subversion based on the most recent stable release, including the latest fixes, and undergo the same rigorous quality assurance process that WANdisco uses for its enterprise products that support the world’s largest Subversion implementations.

Using TortoiseSVN?

To go along with the update to Apache Subversion, we are pleased to announce an update to TortoiseSVN. You can download the latest version for free here.

WANdisco Announces SVN On-Demand Training for Administrators and Developers

Whether you’re looking to get started with Subversion or build your skills in managing large-scale Subversion deployments, WANdisco’s new SVN On-Demand Training offers courses designed for Subversion administrators and developers of all skill levels.

SVN On-Demand Training offers instruction to boost administrators’ and developers’ knowledge of Subversion; the library includes more than 30 videos and supporting reference materials, and new material is continually added for subscribers.

Some of the current SVN On-Demand Training courses include:

  • Introduction to Subversion

  • Subversion for Beginners

  • Intermediate Subversion

  • Advanced Subversion

SVN On-Demand Training is available now. Visit wandisco.com/training/subversion for more information and to request a quote.

Non-Stop Hadoop for Hortonworks HDP 2.0

As part of our partnership with Hortonworks, today we announced support for HDP 2.0. With its new YARN-based architecture, HDP 2.0 is the most flexible, complete and integrated Apache Hadoop distribution to date.

By combining WANdisco’s non-stop technology with HDP 2.0, WANdisco’s Non-Stop Hadoop for Hortonworks addresses critical enterprise requirements for global data availability so customers can better leverage Apache Hadoop. The solution delivers 100% uptime for large enterprises using HDP with automatic failover and recovery both within and across data centers. Whether a single server or an entire site goes down, HDP is always available.

“Hortonworks and WANdisco share the vision of delivering an enterprise-ready data platform for our mutual customers,” said David Richards, Chairman and CEO, WANdisco. “Non-Stop Hadoop for Hortonworks combines the YARN-based architecture of HDP 2.0 with WANdisco’s patented Non-Stop technology to deliver a solution that enables global enterprises to deploy Hadoop across multiple data centers with continuous availability.”

Stop by WANdisco’s booth (110) at Strata + Hadoop World in New York October 28-30 for a live demonstration and pick up an invitation to theCUBE Party @ #BigDataNYC Tuesday, October 29, 2013 from 6:00 to 9:00 PM at the Warwick Hotel, co-sponsored by Hortonworks and WANdisco.

Reliable Git Replication with Git MultiSite

Setting up Git replication to help with backups and scalability may seem easy: just use a read-only mirror. In reality, setting up reliable Git replication, particularly in a large global deployment, is much more difficult than simply creating a mirror. That’s where Git MultiSite comes in.

Replication is Hard

Let’s look at a few of the challenges involved in managing a Git deployment. If you’re an enterprise Git administrator, I’ll wager that you’ve run into several of these problems:

  1. Failures will happen – especially in a WAN environment. Network interruptions, hardware failures, user error: all of these factors interrupt the ‘golden path’ of simple master-mirror replication. Since Git doesn’t provide replication out of the box, you need to either write your own tools or rely on a free mirror solution. In either case you won’t have a replication solution that stands up to every failure condition (see the sketch after this list).

  2. Replicas get out of sync. When a failure happens, it must be reliably detected and corrective action must be taken. Otherwise, a replica can be out of sync and contain old or incorrect data without your knowledge.

  3. Replication should be the first tier in your High Availability / Disaster Recovery (HA/DR) plan. Your data is the most important thing you own and you want a multi-tiered strategy for keeping it safe. Unreliable replication takes away a vital part of that strategy. Plus, failover is hard. Even if you have a perfect backup, how quickly can you bring it online and redirect all of the connections to it? How do you fail back when the primary server is back online?

  4. Security is essential. Every Git mirror should be subject to the same access control rules, yet there is almost no capability to enforce that in most systems.

  5. The biggest sites need replication more than anyone. Are you running 50 Git mirrors to support a large distributed user base and build automation? How are you monitoring all of those mirrors?
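To make the first point concrete, here is a hedged sketch of the kind of home-grown mirror refresh many teams run from cron (the mirror path is hypothetical). Notice what it doesn’t do: no consistency check against the master, no agreement between replicas, no failover – exactly the gaps described above.

```python
# Hedged sketch of a typical home-grown Git mirror refresh, the kind of
# script often run from cron (the mirror path is hypothetical).
# If the fetch fails, or the master is mid-failure, the mirror silently
# serves stale data -- there is no consistency check or failover logic.
import subprocess
import sys

MIRROR_DIR = "/srv/git-mirrors/project.git"   # created with: git clone --mirror

def refresh_mirror() -> bool:
    result = subprocess.run(["git", "remote", "update", "--prune"],
                            cwd=MIRROR_DIR)
    return result.returncode == 0

if __name__ == "__main__":
    if not refresh_mirror():
        # In practice failures here often go unnoticed until a developer
        # reports missing or stale commits.
        sys.exit("mirror refresh failed; mirror may be stale")
```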

Git MultiSite Solution

So how does Git MultiSite solve these problems?

All failure conditions are accounted for.

Git MultiSite uses a patented coordination algorithm based on Paxos, which means failure conditions are handled within the algorithm itself. Dropped data, too few nodes online to agree on a proposal – these cases are all covered by the design. It’s hard stuff, and that’s why we wrote a very long paper on it.
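To make the ‘agree on a proposal’ idea concrete, here is a minimal, hedged sketch of a majority-quorum check – an illustration of the general principle only, not WANdisco’s patented algorithm:

```python
# Minimal sketch of a majority-quorum check (illustrative only, not
# WANdisco's actual implementation). A proposal is only accepted when a
# strict majority of replicated nodes acknowledge it, so a minority of
# offline or partitioned nodes cannot cause silent divergence.

def quorum_size(total_nodes: int) -> int:
    """Smallest number of nodes that constitutes a majority."""
    return total_nodes // 2 + 1

def proposal_accepted(acks: int, total_nodes: int) -> bool:
    return acks >= quorum_size(total_nodes)

if __name__ == "__main__":
    print(quorum_size(5))            # 3
    print(proposal_accepted(2, 5))   # False: too few nodes online to agree
    print(proposal_accepted(3, 5))   # True
```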

Easy monitoring and administration.

Git MultiSite monitors the consistency and availability of each replicated peer node. The administration console tells you at a glance if anything is out of sync.

MultiSite Dashboard

Zero downtime.

If a node is out of sync, it goes offline automatically while work continues on the other peer nodes. Failover is accomplished instantly with a load balancer, or manually by using a secondary Git remote. When a node recovers, it catches up automatically (failback) and starts participating in new activity.

Consistent security across the deployment.

Access control is consistent across every node. Using flexible replication groups, you can control where a repository is replicated and how it is used at each site.

Replication group

Guaranteed data safety.

Every commit is guaranteed to reach at least one other node before it is accepted, guaranteeing the safety of your data. The content delivery system has several parameters you can adjust to suit your deployment.

How Important is Your Data?

If you need a zero down time solution with guaranteed data safety, start a free trial of Git MultiSite’s reliable Git replication. Our team of Git experts can help you get started.

Choosing Between Subversion and Git

Subversion and Git are the two dominant SCM systems in use today. Collectively they represent more than 85% of open source projects and around 60% of the enterprise (private development) market. The numbers vary a bit depending on which survey you read, but the trends are clear. So how do you choose between Subversion and Git?

Much has been written about this topic already, and those of us who follow the SCM space could spend a very long time debating different features. I’m going to throw in my 2¢ with a focus on the things that matter most in large enterprise deployments.

Distributed

Git is a distributed SCM system. Subversion is not. Though this may seem like a substantial advantage for Git, it matters much less in the enterprise than it does for personal projects or at smaller shops. In the enterprise, Git is deployed a lot like Subversion, with master repositories that are secure, highly available, and controlled. Local operations in Git are faster; it’s easier for a small team to stand up a new Git repository for a skunk works project, and it’s easier for a road warrior to work from their laptop for a few days, but otherwise the central model of Subversion is not a key limitation.

Workflows

Subversion is very good at the mainline model but can be used in a lot of other ways. Git supports the mainline model, some workflows based on the pull request concept, and a stream-like workflow known as Git Flow. Tool selection in this area largely boils down to a matter of how you prefer to work.

If you collaborate frequently with teams outside the firewall, then Git is a solid choice. History in Git is not tightly coupled to a particular repository, making it very easy to push and pull changes between repositories on separate networks and even do it via sneakernet.
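As a hedged illustration of that decoupling (the paths below are hypothetical), Git’s built-in bundle command packages a repository’s history into a single file that can be carried across an air gap on removable media and fetched from like any other remote:

```python
# Minimal sketch of "sneakernet" transfer with git bundle (paths are
# hypothetical). Create the bundle on the source network, carry the file
# across, then fetch from it on the destination network.
import subprocess

def create_bundle(repo_dir: str, bundle_path: str) -> None:
    # Package every branch and tag into a single portable file.
    subprocess.run(["git", "bundle", "create", bundle_path, "--all"],
                   cwd=repo_dir, check=True)

def fetch_bundle(repo_dir: str, bundle_path: str) -> None:
    # A bundle file can be fetched from just like a normal remote.
    subprocess.run(["git", "fetch", bundle_path,
                    "refs/heads/*:refs/remotes/sneakernet/*"],
                   cwd=repo_dir, check=True)

if __name__ == "__main__":
    create_bundle("/work/project", "/media/usb/project.bundle")
    fetch_bundle("/work/project-copy", "/media/usb/project.bundle")
```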

Maturity and the Cool Factor

Subversion has been around longer than Git and is widely used in large deployments. It’s a proven tool with a solid feature set. It has a few shortcomings but in most situations it just works. Subversion administrators know how to deploy, configure, secure, and maintain their systems.

Git is less of a known quantity in the enterprise, although that’s changing rapidly. Some of the enterprise parts of Git are still evolving (although the introduction of Git MultiSite has helped a lot), but Git is riding a wave of popularity, and that matters too. The next generation of developers is growing up on Git: they know it and prefer it.

Learning Curve

Subversion and Git are both very easy to learn for daily use and have good tool and plugin support, but Git’s learning curve gets very steep once you’re past the basics, and not every Git feature is exposed through a GUI.

Community and Future

Both Subversion and Git benefit from a strong open source community with commercial sponsors. Although the Git community has the momentum, both tools will be strong and viable for many years.

The Choice is Yours

Subversion and Git are powerful SCM tools that have different strengths. Whichever you choose, you can feel confident that the software and its community will be around for many years to come.  And if you’re using CVS or some other legacy SCM system, there’s no better time to move to one of the two powerhouse open source choices.

If you’re interested in training, support, data migration, or advice on how to use both Subversion and Git in tandem, call on our team of Subversion and Git experts.

Reporting on Subversion & Git Live 2013

We’re two-thirds of the way through Subversion & Git Live 2013, and I’d like to share a few observations before we take the show to London next week.

For this year’s conference we offered a Git track for the first time, to go along with WANdisco’s enterprise Git products, services and support. It proved quite popular, with good attendance at all sessions. The response to the talks and the questions asked gave a good indication of how nascent enterprise adoption of Git still is. In contrast, the Subversion sessions tended to focus on specific, deeper technical material, reflecting Subversion’s role as the SCM workhorse of the enterprise, with Git as the new kid on the block.

That said, there were also some companies with years of experience supporting Git deployments involving thousands of users and tens of thousands of shared repositories.  The majority of attendees were in earlier stages of Git enterprise adoption, either with relatively few users or in initial evaluations.

Many found my intermediate-level talk, “Git Enterprise Challenges”, sobering or even frightening. It was certainly not intended that way, but I can understand the reaction: Git still poses unanswered questions for enterprise-scale deployments. WANdisco is addressing a number of these with our Git MultiSite and Git Access Control products.

Although most of the sessions focused on either Git or Subversion topics, the reality is that virtually every SCM administrator we talked to is supporting, or thinking about supporting, both Subversion and Git, along with a variety of legacy SCM systems, throughout their organizations. Clearly, Subversion and Git will be co-deployed or run in hybrid configurations for a long time to come.

There were so many interesting discussions that I’ll touch on a few of the topics in articles over the next few weeks. If there are any follow-up questions on your mind, please leave them in the comments below and/or come see us in London on the 16th.

Reliable Git Replication: Low Overhead and Good WAN Performance

Recently I wrote about Git MultiSite’s reliable Git replication – guaranteed data safety, zero downtime, and no concern about replicas falling out of sync. These are huge benefits, but do they come at a cost? To the skeptical, ‘reliable replication’ sounds like overhead. I did some simple testing to find out, and I’m happy to report that the overhead is insignificant compared to the value of reliable replication. What’s more, using Git MultiSite makes you less vulnerable to the effects of network latency.

Test Goals

I wanted to measure three things:

  • Pure overhead of replication algorithm. Compared to pushing directly to a bare Git repo on a remote server, how much overhead do you see when pushing to a local Git MultiSite node in a replication group with a remote node?

  • Effect of increasing latency. As the latency between the user and the bare Git master repo or the remote Git MultiSite node increases, what happens to performance?

  • Effect of guaranteed content delivery. What’s the additional overhead imposed by guaranteeing replication of content to at least one other remote or local node before the push succeeds?

Just for fun, I decided to include some other name-brand Git repository management solutions in the test.

Test Setup

The test configuration is shown below.

Test Configuration

The test program would push 50 times to each remote in round-robin fashion. After each run I would increase the latency, using these values:

  • 0 ms

  • 32 ms (e.g. coast to coast in US)

  • 64 ms (e.g. trans-Atlantic)

  • 128 ms (e.g. trans-Pacific)

  • 256 ms (e.g. US to India)

Since most Git repository management solutions offer only read-only mirrors, pushes for those systems go directly to the master repo. All pushes were done over SSH.

I used three different Git MultiSite configurations and measured the push completion times for each. First I used the two-node configuration shown in the diagram without guaranteed content delivery (i.e. measuring the overhead of the replication algorithm). Next I used the two-node configuration with guaranteed content delivery to the remote node over SSL. In both of these cases the remote node had to participate in the replication proposal. Finally, I used the three-node configuration with guaranteed content delivery to one node over SSL.

To account for any outliers in the data, I used the median value of the results for each Git remote.
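For reference, below is a hedged sketch of the kind of timing harness described above. The remote names, repository path, and the use of empty commits are assumptions for illustration, not the actual test code; latency was injected at the network level outside the script and is not simulated here.

```python
# Hedged sketch of a round-robin push timing harness. Remote names and
# the repository path are hypothetical; each round makes a fresh commit
# so that every push actually transfers data.
import statistics
import subprocess
import time

REPO_DIR = "/work/benchmark-repo"            # hypothetical local clone
REMOTES = ["bare-master", "multisite-node"]  # hypothetical remote names
RUNS_PER_REMOTE = 50

def make_commit(message: str) -> None:
    subprocess.run(["git", "commit", "--allow-empty", "-m", message],
                   cwd=REPO_DIR, check=True)

def time_push(remote: str) -> float:
    """Push the current branch to one remote and return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(["git", "push", remote], cwd=REPO_DIR, check=True)
    return time.monotonic() - start

def run_benchmark() -> dict:
    timings = {remote: [] for remote in REMOTES}
    for i in range(RUNS_PER_REMOTE):
        make_commit(f"benchmark run {i}")
        # Round-robin: alternate remotes rather than finishing one first,
        # so transient network conditions don't bias a single remote.
        for remote in REMOTES:
            timings[remote].append(time_push(remote))
    # The median per remote reduces the influence of outliers.
    return {remote: statistics.median(times) for remote, times in timings.items()}

if __name__ == "__main__":
    for remote, median_s in run_benchmark().items():
        print(f"{remote}: median push {median_s:.2f}s")
```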

Results

The graphs below summarize the results.

Median Push Time

This chart shows the median push time for every system tested. Let’s drill down into the most common Git MultiSite configurations compared to bare Git repos.

Median Push Time (Summary)

It’s easy to see that there is a nominal amount of overhead with no latency. The gap quickly closes and Git MultiSite even pulls ahead with higher latency.

Effects of Latency

This chart shows the delta between any Git system’s performance with no latency and its performance at higher latencies. In other words, it’s the penalty you pay with a given system when latency increases. It’s very clear that Git MultiSite (the three bars on the right of each group) has the best relative performance as latency increases.

Reliable Replication: Low Overhead, Better Over a WAN

As the results show, the basic overhead for the replication algorithm, even with content delivery, is minimal. Compared to the benefits of 100% confidence in the replication system, it’s a small price to pay.

Of course, real world performance will vary based on usage. Guaranteed content delivery of very large commits will take a bit of time, but that’s why Git MultiSite gives you the ability to control how and where data is replicated and how many nodes must receive the push before it is accepted. You can choose the balance of performance and redundancy that makes sense for you.

Finally, for those of you supporting remote sites, you’ll appreciate that Git MultiSite’s performance holds up well to increased latency.

If 100% data safety and good WAN performance are important to you, give Git MultiSite a try. You can start a free trial or talk to one of our Git experts.