The First Subversion 1.7 Alpha Release is Available Now and The Community Needs Your Feedback

The long-awaited release of Subversion 1.7 is almost upon us. Like any successful open source project, Subversion needs active participation and feedback from its user community. This is especially true for 1.7, given its emphasis on client-side performance and new client tools. That’s why Subversion 1.7.0 alpha-1 has now been made available for user testing. Major 1.7 enhancements include:

  • HTTPv2 – a protocol rewrite designed to improve performance by reducing the number of round trips between the client and the server required for each operation.
  • WC-NG – a rewrite of the working copy library that improves performance by centralizing metadata storage in a single SQLite database at the root of the working copy, and provides a foundation for features such as shelving and offline commits in future releases.
  • svnrdump – a new client tool that provides the same functionality as svnadmin dump and svnadmin load, but on remote repositories. There’s no need for administrator access to the source or target repository on the remote server’s filesystem.
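As an illustration of what svnrdump makes possible, the Python sketch below streams a dump of one remote repository directly into another using only svnrdump’s `dump` and `load` subcommands. It assumes svnrdump is on the PATH, that the target repository is empty, and (as is typically required for svnrdump load) that the target permits revision property changes through a pre-revprop-change hook. The helper names and URLs are hypothetical.

```python
import subprocess


def svnrdump_copy_commands(source_url: str, target_url: str) -> tuple[list[str], list[str]]:
    """Build the argv lists for dumping source_url and loading the stream into target_url."""
    dump_cmd = ["svnrdump", "dump", source_url]
    load_cmd = ["svnrdump", "load", target_url]
    return dump_cmd, load_cmd


def copy_remote_repository(source_url: str, target_url: str) -> None:
    """Pipe 'svnrdump dump' straight into 'svnrdump load' without a temporary dump file."""
    dump_cmd, load_cmd = svnrdump_copy_commands(source_url, target_url)
    with subprocess.Popen(dump_cmd, stdout=subprocess.PIPE) as dump:
        # The load process reads the dump stream from the dump process's stdout.
        subprocess.run(load_cmd, stdin=dump.stdout, check=True)
    # Popen's context manager has already waited for the dump process; report its failure too.
    if dump.returncode != 0:
        raise subprocess.CalledProcessError(dump.returncode, dump_cmd)
```

For example, `copy_remote_repository("https://svn.example.com/repos/old", "https://svn.example.net/repos/new")` would copy the full history between two hypothetical servers without shell access to either one.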

A complete list of what’s new is available at http://subversion.apache.org/docs/release-notes/1.7.html. Download 1.7 now at: http://www.wandisco.com/subversion/download/1_7-alpha. Contribute to the success of this major new release by providing your feedback at: http://www.svnforum.org/forums/56-Apache-Subversion-1.7.0-Alpha-Support. This is your chance to make a positive impact on the world’s most popular version control system and its more than five million users.


Support for Mac OS X Added to the World’s Most Comprehensive Set of Fully Tested Free Subversion Binaries

Certified Subversion 1.6.17 binaries are now available for the Mac OS X platform. These binaries work with Mac OS X versions 10.5.x and 10.6.x on PPC and Intel architectures, in both 32-bit and 64-bit versions. This software provides a complete, fully tested version of Subversion based on the most recent stable release, including the latest fixes.

To upgrade to the newest release of Subversion on Mac OS X and take advantage of the fixes and enhancements that it offers, go to: http://www.wandisco.com/subversion/download. You can register for free community support for the software at: www.svnforum.org. Professional support is also available if you require secure, high-quality online, phone and email support with guaranteed response times and automated access to the latest fixes and updates.

Subversion 1.6.17 Now Available

Subversion 1.6.17 was released today. This newest version provides several key enhancements as well as bug fixes. Most notably, 1.6.17 includes major improvements in checkout performance for large working copies on Windows, greater efficiency of ‘svn blame -g’ for users dealing with large amounts of mergeinfo, and improved error handling on Windows. A detailed list of the changes included in this new release is available at: http://svn.apache.org/repos/asf/subversion/tags/1.6.17/CHANGES. Release notes for the entire 1.6.x series can be found at: http://subversion.apache.org/docs/release-notes/1.6.html.

To upgrade to this latest release of Subversion and take advantage of the fixes and enhancements that it offers, go to: http://www.wandisco.com/subversion/download

The Next Frontier of Software Development: Social Coding for Subversion

WANdisco recently unveiled uberSVN – a major new product available free of charge that transforms Subversion into an open, extensible platform for application lifecycle management (ALM). In addition to plug-and-play flexibility and rich system and user administration capabilities, uberSVN provides the first-ever social coding environment for Subversion, taking enterprise software development beyond the limits of email, wikis, defect trackers, peer code review tools and other applications typically used to manage projects.

uberSVN’s social coding environment reflects the convergence of two trends: the social networking paradigms popularized by Facebook and Twitter, which foster instant communication, and the collaborative development models of open source communities, where software with features similar to these social networking sites was first used. And it’s having the same positive impact on software quality and developer productivity behind corporate firewalls that it’s had in the open source communities that deliver such market-dominating software as the Apache web server, the Linux operating system and even Subversion itself.

uberSVN is organized around development teams and their activities. Each team has a home page that profiles the team members, lists the projects they’re working on, repositories they’re using and their latest activity and status. Team members can see each other’s real-time progress by simply subscribing to Twitter-like feeds that managers can also monitor.

With uberSVN, just like developers in an open source community, software engineers in corporate IT environments can rapidly exchange information and continually learn from one another. Peer review and continuous feedback are the norm. The overall skill level of the development team goes up and the all-too-common pitfall of reinventing the wheel is avoided. The end result is higher quality software delivered in far less time.

uberSVN is free.  Download it now at http://www.ubersvn.com/download.

SVNref Cards – Free Cheat Sheets for Subversion Developers and Administrators

In conjunction with our free Subversion training webinars, we’re building a library of free cheat sheets for developers and administrators that we call SVNref Cards. Based on the highlights from each webinar, SVNref Cards are written by active members of the Apache Subversion project. Their easy-to-use format focuses on the most important concepts from each session and illustrates them with practical examples.

We introduced SVNref Cards after one of our most popular sessions, Hidden Subversion, which had over a thousand attendees. Since then, we’ve followed up with one for each webinar. Anyone who registers for a free Subversion training webinar receives an SVNref Card for that session automatically. If you miss a session you can download them anytime and stay up-to-date with the latest tips and tricks.

Our rapidly expanding library includes:

Hidden Subversion – An inventory of powerful, but relatively unknown and seldom used Subversion features.

Subversion Administration Best Practices – A quick reference guide covering administration policies and procedures, repository organization, backup and recovery and hook script usage.

Introduction to Branching and Merging – A review of branching and merging basics and merge conflict resolution.

Advanced Branching and Merging – Picks up where the introduction leaves off with examples of merge types, use of mergeinfo, analysis of branches using revision graphs and enforcement of standards.

WANdisco provides SVNref Cards and the free training webinars they’re based on to promote Subversion’s use and adoption, benefiting the entire community. We’re even planning to make recordings of our free training webinars available in response to the thousands of requests we’ve had since we began offering them nearly a year ago. Of course, if more extensive Subversion training, support, or other services are required, WANdisco has those too.

We’re First Again with Certified Binaries for the Latest Release of Apache Subversion

How and Why Do We Do it Every Time?

The Subversion community just announced the release of Subversion 1.6.16. Moments later, WANdisco announced the availability of its fully tested, certified Subversion binaries for this new release. Before we make these pure, certified binaries available for free download under the Apache 2.0 license, we put them through the same QA processes we use for our enterprise products that support Subversion deployments with tens of thousands of users processing millions of transactions each day. And because we verify that these binaries are pure, unmodified open source before we make them available, there’s no risk of being blindsided by IP infringement claims when you use them, or getting forced down the path of implementing proprietary solutions for defect tracking and other applications with Subversion.

The reason we’re able to accomplish this so quickly with every release is that WANdisco is committed to Subversion’s success and we’ve backed that commitment with our own very talented resources. First and foremost, these resources include core Subversion developers who have become our employees. These individuals have been a part of the project since the beginning and they have the status within the community to make changes to Subversion’s code base. They’re actively involved with the rest of the Subversion community from the time a new release is in the planning stages until it’s publicly available. And they’re led by Hyrum Wright, WANdisco’s Director of Open Source and the release manager for the Subversion project since 2008.

In fact, the security bug fixed in 1.6.16 (CVE-2011-0715) was reported by Philip Martin, one of our very talented, full-time Subversion developers.

There’s no denying that WANdisco has an interest in Subversion’s continued success, particularly with large enterprises that have adopted it so enthusiastically over the last few years. But at the same time that this rapid adoption has validated Subversion’s success, it’s placed demands on the project to meet the kind of tough requirements that these large enterprises have. In addition, they have clear requirements for enterprise class support that’s on a par with the support services available for closed source solutions, as well as professional training and consulting services.

At WANdisco we’ve hired senior Subversion committers, offered enterprise class Subversion support, provided free training webinars as well as paid training classes, hosted our Subversion Live user conferences where attendees meet with committers in person, and become corporate sponsors of the Apache Software Foundation. We’ve also taken the lead on fixing branching and merging, a requirement that’s been out there since 2007 waiting to be addressed. We’ve done all of these things not because they are easy and make good press, but because they are required for Subversion to continue on its very successful path. That’s something we all have a stake in.

How to Cut Development Time in Half, Improve Build Performance by 500% and Eliminate Downtime

SSP, a leader in software applications for insurance and financial services, with developers in the UK, Australia and South Africa, did it by implementing Subversion WAN Clustering (MultiSite). As soon as developers at one site commit changes, they’re available everywhere at LAN speed. SSP’s developers in the UK, Australia and South Africa check out and commit changes to the same files simultaneously and have immediate access to each other’s work. Merge conflicts and other problems that used to go undiscovered for days or weeks, until it was time to create a build, are now caught and fixed when they happen. The best talent for a project, regardless of location, can work together as one agile, virtual development team to get the job done faster. The net result is that the time SSP spends on QA and rework has dropped so dramatically that development cycles have been cut in half.

Now that every developer has instant access to the latest changes regardless of where they came from, builds can be created and tested at each site in less than a day, instead of waiting up to 5 days for a central team to complete their development work and schedule in builds for other sites.

SSP has also been able to go 24-by-7 with no downtime because Subversion WAN Clustering has turned their distributed Subversion servers into mirrors of each other. When one server goes down for either a planned or unplanned outage, users fail over to another site and keep working. When a server comes back online it recovers automatically, grabbing all of the changes that happened at other sites while it was offline.

Get SSP’s full story to learn more about how all of this was accomplished.

Making Subversion Agile in a Distributed Environment

Agile development is iterative and incremental. It requires continuous build-test-deploy cycles and continuous communication. Your SCM, whether it’s Subversion or any other, is the key enabler for communication about the most important aspect of any software development project: the current state of your code.

The biggest challenge to effective agile development in a distributed environment lies in maintaining the same level of communication, the same level of continuous integration and the same level of Subversion repository access that’s possible when everyone works from a single location. While you can partially overcome the effects of distance by putting certain practices into place, the implementation architecture and tools you choose can either help or hurt your efforts to be agile.

Central Subversion server and proxy server solutions like svnsync often can’t address requirements for high levels of communication and continuous build integration in large distributed organizations because:

  • Network latency causes remote developers to check out source code and commit changes infrequently. Merge conflicts and other problems aren’t uncovered until later in the development cycle, resulting in broken builds, extra QA and rework.
  • The latest changes aren’t available from remote sites, making continuous build integration impossible.
  • Proxy server solutions such as svnsync have the same network latency issues for write transactions that central server implementations have. In addition, if the replication of a write transaction from the master to a read-only slave fails, svnsync has no built-in capability for catching the error and retrying the failed transaction.
  • If an IT organization wants the flexibility to allow all of its sites to run builds in parallel, so that remote developers aren’t dependent on a central build team’s schedule and time zone differences, this won’t be practical with a proxy server solution if build scripts include write steps that can only take place on the master server.
  • Agile development requires a high level of availability. Central server and proxy server implementations have a single point of failure inherent in their architectures. This limits availability when the central or master server crashes, the network connection is lost, or the server has to be taken offline for maintenance.
  • Even when development takes place at only one location, with a large enough team the combination of development activity and continuous build cycles on a central Subversion server can turn it into a performance bottleneck as well as a single point of failure, slowing down both developers and the build team.
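Because svnsync itself does not retry a failed replication, it can be wrapped in a script that does. The Python sketch below assumes the mirror has already been set up with `svnsync initialize` and relies on the fact that re-running `svnsync synchronize` resumes from the last revision it copied. `sync_with_retry`, the attempt count and the backoff delays are illustrative choices, not part of svnsync; the `run` and `sleep` parameters exist only so the logic can be exercised without a real mirror.

```python
import subprocess
import time


def sync_with_retry(mirror_url: str, attempts: int = 5, base_delay: float = 2.0,
                    run=subprocess.run, sleep=time.sleep) -> int:
    """Run 'svnsync synchronize' until it succeeds; return the attempt number that worked."""
    cmd = ["svnsync", "synchronize", mirror_url]
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Exponential backoff between failed attempts: 2 s, 4 s, 8 s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"svnsync failed {attempts} times for {mirror_url}")
```

Even with a wrapper like this, write transactions still travel over the WAN to the master, so retries address reliability, not latency.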

In contrast to central server or proxy server solutions, with Subversion MultiSite’s peer-to-peer architecture there is no single point of failure or performance bottleneck. Distributed Subversion servers are fully readable and writable at LAN speed at every location and kept continuously in sync by WANdisco’s unique active-active replication capability. Each site can perform builds and test locally with the latest source code, regardless of where it originates. In a distributed environment, this provides the support necessary for continuous build integration that detects problems early and keeps software development projects from going over budget. Delays caused by broken builds and scheduling conflicts with a centralized build team go away.

In addition, because WANdisco’s replication technology turns distributed Subversion repositories into mirrors of each other, continuous hot backup is achieved by default. Subversion MultiSite leverages this with automated recovery features that allow one site to recover from any other after a network outage or server crash. These capabilities can also be used to take servers offline for routine maintenance without interrupting user access, making 24-by-7 operation possible. As a result, Subversion MultiSite delivers a higher level of availability in a globally distributed environment than a central server can with everyone at the same location. Downtime is eliminated and business continuity is ensured.

Subversion Clustering is built on the same replication technology as Subversion MultiSite and delivers the same automated recovery capabilities that enable full 24-by-7 operation, with immediate and transparent failover. It allows Subversion to be deployed in an active-active cluster over a LAN and removes the single point of failure and performance bottleneck of a central Subversion server. It provides truly shared-nothing clustering: there is no sharing of disk, CPU or memory between servers in a cluster. Intelligent load balancing capabilities optimize performance even further by taking each server’s current load into account before routing requests, rather than relying on simplistic round-robin approaches. Subversion Clustering is often used to improve Subversion’s performance by spreading the load created by continuous builds across a cluster of servers at large sites.
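The difference between load-aware routing and round robin can be seen in a small simulation. The Python sketch below uses a generic least-loaded policy (fewest active requests wins) purely as an illustration; it is not WANdisco’s balancing algorithm, and the server names and load figures are hypothetical.

```python
from itertools import cycle, islice


def least_loaded(loads: dict[str, int]) -> str:
    """Pick the server with the fewest active requests (ties broken by name)."""
    return min(loads, key=lambda name: (loads[name], name))


def assign_least_loaded(loads: dict[str, int], requests: int) -> dict[str, int]:
    """Route each new request to whichever server is least busy at that moment."""
    loads = dict(loads)
    for _ in range(requests):
        loads[least_loaded(loads)] += 1
    return loads


def assign_round_robin(loads: dict[str, int], requests: int) -> dict[str, int]:
    """Route requests in a fixed rotation, ignoring how busy each server already is."""
    loads = dict(loads)
    for server in islice(cycle(sorted(loads)), requests):
        loads[server] += 1
    return loads
```

Starting from loads of 12, 3 and 7 on svn1, svn2 and svn3, six new requests leave the cluster at 12/8/8 under least-loaded routing but at 14/5/9 under round robin, which keeps piling work onto the busiest server.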

Subversion Clustering can be implemented standalone, or in combination with Subversion MultiSite to combine local clusters into an active-active WAN cluster that can be monitored and administered from one location. Remote users are routed to the closest site where Subversion servers are available.

Distributed software development reduces labor costs and provides access to wider talent pools and global markets. Agile development delivers cost savings by catching problems before unplanned rework and QA lead to missed deadlines, and by being flexible enough to let software change as fast as user requirements do. To achieve this, the right processes have to be combined with the right technology to overcome the impact of distance on communication between developers. The bottom line is that when agile development is implemented successfully in a distributed environment, it can deliver the greatest possible cost savings and productivity improvements.

The Centralized vs. Distributed Debate Continues

After my previous post, “Can Globally Distributed Development Really be Supported with a Central Subversion Server?”, CollabNet’s Jack Repenning recently responded with a post of his own: “Optimizing Globally Distributed Source Management”.

Needless to say, everyone at WANdisco has tremendous respect for Subversion and what the Subversion community has achieved. We’ve built a significant portion of our business around Subversion because of that respect. My post had everything to do with implementation architecture and nothing to do with the quality of Subversion. In fact, it could have been applied just as well to a debate over whether to centralize or distribute a database, or any other application.

And while Jack and I still agree on the list of tradeoffs in the centralized versus distributed debate, including his additional item of enterprise workflow, we see different outcomes:

WAN costs in large corporations, in terms of bandwidth and latency, are not merely a problem of “…configured “Quality of Service” policies colliding with evolving use patterns which can be reconfigured…” WANdisco’s Fortune Global 1000 customers moved away from relying on a central Subversion server, even though they controlled their networks and had ample resources to obtain extra bandwidth. Some of them are even network service providers. The bottom line is that no matter how good the network is, there’s still the unavoidable speed of light problem, which gets compounded by the number of WAN round trips required to complete a transaction. Then there’s always the risk of timeouts with each network hop as data gets transmitted over a WAN. There’s no forward recovery after a timeout, so transactions have to be resubmitted and data can be lost. Finally, if remote users lose their connection to the central server, they have no repository access at all.
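The speed of light problem is easy to quantify. The Python sketch below computes only the propagation floor for an operation that needs a given number of round trips, assuming light in optical fiber covers roughly 200 km per millisecond (about two thirds of its speed in a vacuum). The distances and round-trip counts in the example are hypothetical; real counts depend on the operation, the client and the protocol in use.

```python
# Light in optical fiber travels at roughly two thirds of its vacuum speed,
# i.e. about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_wan_delay_ms(distance_km: float, round_trips: int) -> float:
    """Lower bound on network delay for an operation needing `round_trips` exchanges.

    Counts propagation delay only: no bandwidth limits, queuing, routing detours
    or server time, so real figures will be higher.
    """
    round_trip_ms = 2 * distance_km / FIBER_KM_PER_MS
    return round_trips * round_trip_ms
```

With a 10,000 km path and an operation needing 50 round trips, the floor is 5,000 ms, while the same operation over a 100 km metro link costs at least 50 ms. Extra bandwidth lowers neither number; only fewer round trips or a shorter distance does.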

Server performance can be addressed up to a point by adding more memory, CPU or disk. Some of WANdisco’s customers, including those that actually manufacture servers, did just that, or asked their hosted service provider to do it for them as a stopgap. But no increase in the capacity of a central server can get away from the fact that it’s a single point of failure that’s destined to become a performance bottleneck again. Distributing Subversion repositories with the right solution gives IT organizations the ability to balance workload across their development sites and locate data close to users, which by default leads to better performance, scalability and availability (the next item on the list) in a way that no central server implementation can.

Availability is ultimately a function of the failover and disaster recovery features of an implementation, and, since there must be something to restore from, the backup solution. Most large data centers rely on disk mirroring, coupled with backups taken at regular intervals in case the mirrors fail, to provide the data to restore from. There are three challenges with disk mirroring solutions: (1) they’re designed primarily to protect against disk failure, not total server failure, so the application still has to be brought up on another server, (2) some manual intervention is required, so the risk of human error leading to data loss and extended downtime is real, and (3) they only work over distances covered by a metro-LAN at best, not a WAN.

Continuous hot backup and automated disaster recovery are included with Subversion MultiSite, so there’s no need to rely on disk mirroring solutions and incremental backups, or manual procedures of any kind. Servers can even be taken offline for routine maintenance without interrupting user access, making full 24-by-7 operation a reality in a global environment. As soon as a server comes back online, it catches up automatically with the other servers.
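The catch-up step can be pictured with a deliberately simplified model in which every replica applies the same ordered log of committed transactions, so a server that was offline simply holds an older prefix of that log. The Python sketch below is only that model, not a description of how Subversion MultiSite is implemented; the function names and revision labels are hypothetical.

```python
def entries_to_replay(local_log: list[str], peer_log: list[str]) -> list[str]:
    """Return the transactions a recovering replica is missing, taken from a peer's log.

    Assumes every replica applies transactions in the same agreed order, so a replica
    that was offline holds a prefix of the peer's log.
    """
    if peer_log[:len(local_log)] != local_log:
        raise ValueError("local log is not a prefix of the peer's log; cannot catch up safely")
    return peer_log[len(local_log):]


def recover(local_log: list[str], peer_log: list[str]) -> list[str]:
    """Bring the local replica up to date by appending the missing transactions in order."""
    return local_log + entries_to_replay(local_log, peer_log)
```

The prefix check is the important design point: automatic recovery is only safe when replicas agree on the order of every write, which is what an active-active replication scheme has to guarantee in the first place.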

Enterprise workflow in this context refers to the ability to map and selectively replicate groups of Subversion repositories to only those sites where the development teams that use those repositories are located. As projects, teams and locations shift and change, the replication map needs to change with them.
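A replication map of this kind is essentially a mapping from each repository to the set of sites that hold a replica of it. The Python sketch below models that mapping and the update needed when a team moves; the data structure, function names and site names are hypothetical and are not Subversion MultiSite’s configuration format.

```python
def sites_for(repo: str, replication_map: dict[str, set[str]]) -> set[str]:
    """Sites that hold a replica of `repo` and therefore receive its commits."""
    return set(replication_map.get(repo, set()))


def relocate_team(replication_map: dict[str, set[str]], repos: list[str],
                  old_site: str, new_site: str) -> dict[str, set[str]]:
    """Return a new map in which `repos` are replicated to new_site instead of old_site."""
    updated = {repo: set(sites) for repo, sites in replication_map.items()}
    for repo in repos:
        sites = updated.setdefault(repo, set())
        sites.discard(old_site)
        sites.add(new_site)
    return updated
```

For example, if the "billing" repository is replicated to London and Sydney and its Sydney team moves to Cape Town, `relocate_team(m, ["billing"], "sydney", "cape-town")` yields a map that replicates it to London and Cape Town while leaving other repositories untouched.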

While this can get complicated without the right tools, large enterprises have implemented Subversion MultiSite partly on the basis of its ability to handle selective replication. They didn’t see it as a problem when they moved away from a central server.
One of the key features provided by Subversion MultiSite is the ability to administer all sites from a single location. Selective replication is just one facet of this.

Can Globally Distributed Development Really be Supported with a Central Subversion Server?

Based on the feedback we’ve received from our customers and prospects, the answer to this question is a resounding no, at least not in a distributed development organization of any size, with a number of users at remote sites.

There are essentially three major obstacles that distributed development organizations will run into with a central server approach:

1. WAN Latency.

WAN latency becomes a problem not only because of increased network traffic as the number of remote users grows, but also due to the fact that every remote request entails a WAN penalty. Even though Subversion clients only send changes to the central server when modifications to existing source code files are committed, when a new source code file is committed, or an existing file is checked out, the entire file is sent over the WAN.

2. Degradation of the Central Server’s Performance.

This results not only from the extra load generated by increasing numbers of remote users, but also from read transactions that would otherwise be unnecessary if those users had access to a local copy of the Subversion repository. A frequent pattern that arises with a central Subversion server configuration is that multiple developers at the same remote location repeatedly perform checkouts, updates and other read operations against the same files.

These repeated, unnecessary reads use up central server memory and processor capacity, as well as network bandwidth.
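A back-of-the-envelope model makes the cost of these repeated reads concrete. The Python sketch below assumes, purely for illustration, that each checkout or update can be described as a list of (path, revision) reads, and that with a local repository copy the content of a given file revision crosses the WAN at most once per site; the paths and counts in the example are hypothetical.

```python
def wan_transfers(requests: list[tuple[str, int]], local_replica: bool) -> int:
    """Count how many (path, revision) reads must cross the WAN.

    Without a local replica every read goes to the central server. With one, the
    content of a given path at a given revision crosses the WAN at most once per site.
    """
    if not local_replica:
        return len(requests)
    return len(set(requests))
```

If ten developers at one remote site each check out the same three files at revision 42, the central server handles 30 WAN reads, whereas a local replica needs each of the three file revisions to cross the WAN only once.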

3. Availability.

The ultimate weakness with a central server approach is its single point of failure architecture, and the impact this has on repository availability. When the network connection is lost, remote users have no repository access at all. Even a transient WAN connection failure between a remote Subversion client and the central server can slow down developers at remote sites if it takes place during a large commit, since the entire commit will have to be resubmitted.

Obviously, if the server hosting the repository is down for any reason, all users will be impacted, not just those at remote locations, unless a backup is available. Even if a backup server is available, the time involved in bringing it into service can be significant, and there’s always the risk of data loss and extended periods of downtime due to human error during the recovery process.

The arguments typically used in favor of a central server approach are that everyone works from a single consistent copy of the repository, maintenance and administration are only required at one location, and overall control can be implemented more effectively from both a project management and data security perspective. The bottom line for advocates of supporting globally distributed development with a central Subversion server is that these perceived benefits greatly outweigh any gains that might be achieved by distributing Subversion repositories.

Let’s take each of these arguments and examine their validity in light of the available alternatives.

Single Consistent Copy of the Subversion Repository.

The first argument that advocates of a central server approach make is that it’s the only way to maintain a single consistent copy of the repository.

There are a number of master-slave solutions available that provide a partial response to this argument. The most commonly used is svnsync, introduced with Subversion 1.4. While svnsync and the other solutions do offer the advantage of allowing remote site developers to access a local read-only slave mirror of a master Subversion repository, the slaves are only as current as the last instance of replication from the master. The lag time between each instance of replication often leaves developers at remote sites checking out stale versions of source code files. This in turn leads to update conflicts when remote developers perform their commits against the master. This then requires them to perform updates over the WAN against the master to get the latest revision and resolve any conflicts before reattempting their commit. This can negate some of the expected improvements in network performance and developer productivity, because read operations still have to be performed over the WAN. Finally, the master repository represents a single point of failure for write operations.
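The stale-read problem can be reproduced with a toy model of Subversion’s out-of-date check: the master rejects a commit to a file that has changed since the client’s base revision. The Python sketch below is such a model, not Subversion’s code; the `sync` function is a hypothetical stand-in for a periodic svnsync run, and the file path is illustrative.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision of a file than the master has."""


class Repository:
    def __init__(self) -> None:
        self.head = 0
        self.last_changed: dict[str, int] = {}  # path -> revision that last modified it

    def commit(self, path: str, base_rev: int) -> int:
        # Refuse the commit if the path changed after the client's base revision.
        changed = self.last_changed.get(path, 0)
        if changed > base_rev:
            raise OutOfDateError(f"{path} is out of date (changed in r{changed})")
        self.head += 1
        self.last_changed[path] = self.head
        return self.head


def sync(master: Repository, slave: Repository) -> None:
    """Copy the master's state to the read-only slave, as a periodic svnsync run would."""
    slave.head = master.head
    slave.last_changed = dict(master.last_changed)
```

In the model, a remote developer who checked out from a slave last synced at r1 gets an out-of-date error when committing a file that headquarters changed in r2; only after the slave is synced (or the developer updates over the WAN against the master) does the commit succeed.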

In contrast to a master-slave approach, Subversion MultiSite relies on a peer-to-peer architecture with no single point of failure. All of the repository replicas are readable and writable for the entire code base, and consistency across the repositories is guaranteed. In addition, WANdisco’s active-active replication capability allows developers at all locations to work at LAN speed over a WAN for both read and write operations, while at the same time keeping all of the repository replicas continually in sync. In effect, Subversion MultiSite delivers one-copy equivalence across a system of distributed Subversion repositories, and provides the same user experience that would be achieved if all of the developers worked at one location over a LAN against a single repository, instead of thousands of miles apart.

Maintenance and Administration.

The second major argument in favor of a central repository is that maintenance and administration only have to be performed at one location.

While this may sound like a major benefit at first glance, for remote developers there can be a significant negative impact if they are separated from the central server site by large time zone differences, and either the network connection or the server goes down during their location’s normal working hours. From the remote site’s perspective, it can take until the next business day to restore access.

In addition, in a typical Subversion implementation, not only Subversion, but Apache and the operating system have to be maintained as well. It is true that if Subversion repositories are distributed, maintenance, particularly applying patches and performing upgrades, becomes more of a challenge. If it’s handled inconsistently, problem resolution can become incredibly complex, resulting in data loss and extended periods of downtime.

WANdisco addresses these challenges by making it possible to monitor and administer servers at all sites from a single location. In addition, Subversion MultiSite is now available as a virtual software appliance, with access to an update server. Patches and upgrades can be applied automatically for all of the components of the implementation, including Subversion, Apache, and the operating system at each location, eliminating the risks inherent in performing these tasks manually.

Backup and Recovery

Since backup and recovery is such an important aspect of Subversion repository maintenance and administration, I’d like to discuss it briefly here. In an upcoming post, I’ll cover this topic in more detail.

With a central server approach, backup and recovery solutions typically either rely on disk mirroring, or svnadmin scripts used to copy the repository to a standby backup server. In any event, even if a backup server is available, the lag time involved in bringing it into service can be significant. In addition, there’s always the risk of data loss and extended downtime resulting from human error during the failover and recovery process.

Although master-slave solutions like svnsync can be used for backup and recovery, if the master goes down, the slaves are likely to be missing data that may be unrecoverable depending upon the nature of the master server failure. The extent of data loss will depend upon the lag time and size of the changes to the master since the last instance of replication.

The other issue to be aware of is what actually gets replicated to the mirror slave repositories by the tool you’re using. For example, with svnsync, only the versioned repository data gets synchronized. Repository configuration files, user-specified repository path locks, and other items that might live in the physical repository directory but not inside the repository’s virtual versioned filesystem are not replicated.
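Before relying on an svnsync mirror for recovery, it can help to inventory what the mirror will not receive. The Python sketch below lists the files under a repository’s conf/ and hooks/ directories, skipping the .tmpl templates that `svnadmin create` installs, plus any lock files under db/locks, which assumes the FSFS back end. The function name is hypothetical, and the output is a checklist for manual review rather than something to copy blindly: a mirror normally needs its own pre-revprop-change hook for svnsync to work.

```python
from pathlib import Path

# Directories whose contents live outside the versioned filesystem and are therefore
# not carried over by svnsync. "db/locks" assumes an FSFS repository.
UNREPLICATED_DIRS = ("conf", "hooks", "db/locks")


def unreplicated_items(repo: Path) -> list[str]:
    """Return repository-relative paths of files that an svnsync mirror will not receive."""
    found = []
    for rel in UNREPLICATED_DIRS:
        base = repo / rel
        if not base.is_dir():
            continue
        for item in sorted(base.rglob("*")):
            # Skip the sample hook templates shipped with every new repository.
            if item.is_file() and item.suffix != ".tmpl":
                found.append(item.relative_to(repo).as_posix())
    return found
```

Running it against the master repository directory gives administrators a concrete list of configuration files, hook scripts and locks to handle by other means.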

With Subversion MultiSite, continuous hot backup is achieved by default as a byproduct of active-active replication and all repository data is replicated. After an outage, recovery from any other site’s server is automatic.

Overall Control.

The third major argument used by advocates of a central Subversion server approach is that development projects can be managed more tightly. What happens in practice is often just the opposite.

Many of our customers report that prior to implementing Subversion MultiSite, remote developers often held back large commits until the end of the day or end of the week, using WAN latency as an excuse. This made it harder to monitor what everyone was doing on a day-to-day basis, and meant that it took longer to find out that developers didn’t understand the specs they were given. As a result, code was delivered that had to be rewritten and project deadlines were frequently missed.

In terms of maintaining control from a data security perspective, the goal is to achieve consistent enforcement of security policy across all development sites. When Subversion MultiSite is implemented with Subversion Access Control, the security policy configuration is automatically replicated to all sites when it’s initially set up, as are any future changes. This guarantees that access control is enforced consistently at every location. Subversion Access Control also provides audit capabilities that track every user access to the repository and alert administrators whenever access violations occur.

Data security as it relates to the contents of source code repositories has become a greater concern in recent years as IT organizations have begun to outsource development work to countries where enforcement of intellectual property rights is relatively weak. In addition, Sarbanes-Oxley and other regulations have begun to reach into the IT organization, imposing requirements of their own. In an upcoming post, I’ll cover this topic in more detail and describe what’s really required to secure the intellectual property stored in source code repositories in a globally distributed environment.

Multi-site Development and the Write-Thru Proxy

With Subversion 1.5, along with other notable new features such as built-in merge tracking, the WebDAV write-thru proxy will be introduced to simplify use of svnsync for Subversion deployments based around Apache 2.2.x.

Prior to Subversion 1.5, users had to manually point their working copies at the master server with the “svn switch --relocate” command whenever they executed a commit or other write operation. The WebDAV write-thru proxy will now detect when a commit or other write request has been issued by a client connected to a slave repository and automatically pass it through to the master server. This should make life somewhat easier for end users, and help prevent unintended writes against slave repositories from leading to split-brain scenarios that can be difficult to recover from.
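The write-thru behaviour boils down to classifying each request that arrives at the slave. The sketch below is a hypothetical illustration, not mod_dav_svn’s code (on the server, the feature is enabled with the SVNMasterURI directive in the slave’s Apache configuration): reads are answered locally and writes go to the master. The method sets are an approximation of the WebDAV/DeltaV methods a Subversion client of this era sends over HTTP, and the URLs are placeholders.

```python
# Hypothetical classifier illustrating the write-thru idea; not mod_dav_svn's logic.
# Approximate sets of HTTP/WebDAV methods a Subversion client sends.
READ_METHODS = {"GET", "PROPFIND", "OPTIONS", "REPORT"}
WRITE_METHODS = {
    "MKACTIVITY", "CHECKOUT", "PUT", "DELETE", "MKCOL",
    "PROPPATCH", "COPY", "MOVE", "MERGE", "LOCK", "UNLOCK",
}


def route(method, slave_url, master_url):
    """Return the server that should handle a request with this HTTP method."""
    method = method.upper()
    if method in WRITE_METHODS:
        return master_url  # writes must land on the single writable master
    if method in READ_METHODS:
        return slave_url  # reads are answered from the local slave copy
    raise ValueError(f"unclassified method: {method}")


SLAVE = "http://svn-mirror.example.com/repos"  # placeholder URLs
MASTER = "http://svn-master.example.com/repos"

# An update (REPORT) stays local; a file upload (PUT) is passed to the master.
print(route("REPORT", SLAVE, MASTER))
print(route("PUT", SLAVE, MASTER))
```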

However, the WebDAV write-thru proxy leaves svnsync’s master-slave architecture unchanged. While svnsync does offer the advantage of local reads, eliminating WAN traffic that would otherwise take place between a remote client and a central Subversion server, writes only happen on the master. Thus, the master repository can become a single point of failure for write transactions. In addition, the lag time between each instance of master repository replication can result in users at remote sites checking out stale copies of source code files from their local slave. This in turn can lead to update conflicts when changes are committed against the master. If the replication process fails due to network outages or server crashes, there are no built-in recovery capabilities.
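The stale-checkout problem can be sketched with a simplified version of Subversion’s out-of-date check: a commit is rejected for any path the master has changed since the revision the working copy was based on. This is a model under stated assumptions (per-path revision numbers only, no directory or property subtleties), and the paths and revision numbers are made up.

```python
def out_of_date_paths(base_revision, edited_paths, last_changed_on_master):
    """Return edited paths the master has modified after base_revision.

    last_changed_on_master maps a path to the last revision that changed it
    on the master; paths missing from the map are treated as unchanged.
    """
    return sorted(
        path for path in edited_paths
        if last_changed_on_master.get(path, 0) > base_revision
    )


# The local slave was last synced at r100, so the checkout is based on r100,
# but the master has since committed r103, which touched trunk/a.c.
master_history = {"trunk/a.c": 103, "trunk/b.c": 90}
conflicts = out_of_date_paths(100, {"trunk/a.c", "trunk/b.c"}, master_history)
print(conflicts)  # ['trunk/a.c']
```

Any non-empty result means the commit fails until the developer updates against the master.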

In contrast, WANdisco’s Subversion MultiSite turns every Subversion repository into a peer of every other, and every repository is readable as well as writeable for the entire code base. Replication is triggered automatically when a write operation is done at any location, and transactional consistency is guaranteed across all of the repositories. Self-healing capabilities are provided to automate the recovery process after a network outage or server crash, and prevent any data loss.

Although both svnsync and Subversion MultiSite support the WebDAV HTTP protocol, Subversion MultiSite only uses this protocol over a LAN. WANdisco’s own optimized protocol is used over a WAN on top of TCP/IP. The result is that commits consisting of hundreds of files are sent in a single pass during replication with Subversion MultiSite, rather than one by one as a series of HTTP PUTs, as is the case with svnsync. This enables Subversion MultiSite to deliver a significant performance boost over a wide area network.

With svnsync, user information, including access privileges, must be maintained consistently across all of the servers, and there are no built-in features to support this. When WANdisco’s Subversion Access Control solution is implemented with Subversion MultiSite, the security configuration is replicated automatically when it’s initially set up, as are any changes, ensuring consistency across all of the servers.
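Without built-in support, keeping access privileges consistent across svnsync servers is a manual job. One hedged way to at least detect drift is to fingerprint each server’s access-control file and flag the ones that differ from the majority. The server names and the authz-style contents below are hypothetical, and collecting the files from each server is left out.

```python
import hashlib
from collections import Counter


def fingerprint(text):
    """SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def inconsistent_servers(configs):
    """Return servers whose access-control file differs from the majority version.

    configs maps a server name to the text of its access-control file.
    """
    if not configs:
        return []
    prints = {name: fingerprint(body) for name, body in configs.items()}
    majority, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(name for name, fp in prints.items() if fp != majority)


policy = "[/trunk]\n@devs = rw\n* = r\n"
configs = {
    "us-master": policy,
    "uk-slave": policy,
    "india-slave": "[/trunk]\n@devs = rw\n* = rw\n",  # drifted copy
}
print(inconsistent_servers(configs))  # ['india-slave']
```

This only detects inconsistency; an administrator still has to push the corrected file to every server by hand.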

To learn more, check out Subversion MultiSite.

Get the WAN Out of the Way

Virtually every approach to globally distributed multi-site development using Subversion leaves the WAN in the way of developer productivity. In this post, I’ll explain why this is the case, and how it can be dealt with.

The WAN performance that developers experience results from a combination of two factors: (1) the number of WAN round trips required to complete a write operation between a Subversion client and a master server, and (2) the available throughput on the network.

I’ll focus on write operations over the WAN such as commits, since there are master-slave solutions like svnsync that allow developers to do checkouts, updates and other read operations locally, without generating WAN traffic. However, read operations over the WAN may still be required with master-slave solutions like svnsync. To understand why this is the case, see my earlier post, “Keeping Multiple Subversion Repositories in Sync.”

Let’s examine the first factor impacting WAN performance, the number of WAN round trips. With each Subversion commit using the SVN RA protocol (Subversion without Apache), up to six WAN round trips will take place between a remote developer’s Subversion client and the master Subversion server. These round trips are required to open the connection, authenticate the user, and write the commit to the master server. This introduces some minimal latency, typically on the order of 2 to 3 seconds over a WAN. If Subversion is implemented with Apache using the WebDAV HTTP protocol, four WAN round trips will be incurred for each file in the commit, since each file is transferred with its own separate HTTP PUT. With a large number of files in a commit, several minutes of wait time can be incurred over a WAN.
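The round-trip arithmetic can be put into a small model. It is only a sketch: it takes the counts quoted above at face value (up to six round trips per commit over the svn protocol, four per file over WebDAV/HTTP), ignores data transfer time entirely, and assumes a round-trip time you would have to measure on your own WAN. The 400 ms used below is an illustrative figure, chosen so the svn case lands inside the 2 to 3 second range quoted above.

```python
def round_trip_wait_s(rtt_ms, files, protocol):
    """Seconds spent waiting on WAN round trips for one commit.

    protocol is "svn" (up to 6 round trips per commit, whatever the file
    count) or "http" (4 round trips per file). Transfer time is not included.
    """
    if protocol == "svn":
        trips = 6
    elif protocol == "http":
        trips = 4 * files
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return trips * rtt_ms / 1000


# Illustrative 400 ms round-trip time, commit touching 300 files.
print(round_trip_wait_s(400, 300, "svn"))   # 2.4
print(round_trip_wait_s(400, 300, "http"))  # 480.0 seconds, i.e. eight minutes
```

The svn figure stays flat as the commit grows, while the HTTP figure scales linearly with the number of files, which is why large commits hurt so much more over WebDAV.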

However, the real impact on developer productivity comes from the second factor: the time required to transmit data at WAN speed, given the available throughput on the network. For example, consider a commit consisting of 500 MB of data sent over a WAN from India to the US. The typical E-1 line used between the US and India operates at approximately 2 megabits per second, and 500 MB is 4,000 megabits, so the transfer should take 2,000 seconds, or a little over 33 minutes. This assumes an absolute best-case scenario in which there’s no competition with other network traffic at the same time, and everything goes smoothly without any connection loss or communication error between the client and the remote master server.

What if a remote site developer’s commits could be processed at LAN-speed, instead of WAN-speed? Given that most LANs operate at one gigabit per second, it should take about four seconds to transfer 500MB of data between the Subversion client and the server over a LAN. Instead of waiting over 33 minutes under the best of circumstances, remote site developers would see their commits complete in four seconds! Developers would check in their changes more frequently, rather than waiting until the end of the day, or end of the week as they would have done in the past, due to the pain of poor network performance.
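Both throughput figures come from the same back-of-the-envelope formula: convert megabytes to megabits (multiply by 8) and divide by the link rate. The sketch below reproduces them under the same best-case assumptions stated above (no competing traffic, no protocol overhead, no retransmissions), using 1 MB = 8 megabits as the arithmetic in this post does.

```python
def transfer_time_s(size_mb, link_mbps):
    """Idealized transfer time in seconds.

    size_mb is in megabytes, link_mbps in megabits per second; assumes the
    link is fully available and ignores protocol overhead and retransmissions.
    """
    return size_mb * 8 / link_mbps


wan = transfer_time_s(500, 2)     # E-1 class link, about 2 Mbit/s
lan = transfer_time_s(500, 1000)  # gigabit LAN
print(f"WAN: {wan:.0f} s ({wan / 60:.1f} min)")  # WAN: 2000 s (33.3 min)
print(f"LAN: {lan:.0f} s")                       # LAN: 4 s
```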

In addition, what if the distributed Subversion servers were kept in sync in real-time over the WAN? The 33 minutes saved would be just the tip of the iceberg. If developers across all sites had access to the latest source code without having to wait for a master server to be copied to their local read-only servers, then update conflicts and other problems could be fixed as soon as they were found. It would also be possible to achieve real-time collaboration between distributed development teams instead of having them work in silos. As a result, less time would be spent on QA and rework, and a significant amount of time and cost would be squeezed out of the development cycle.

WANdisco, with its unique active-active replication capabilities, allows all of this to be accomplished. WANdisco delivers LAN-speed performance for both read and write operations, while keeping distributed Subversion repositories in sync in real-time. WANdisco gets the WAN out of the way, so that all of the productivity improvements and cost-savings that IT organizations are seeking from globally distributed development can be achieved.

Keeping Multiple Subversion Repositories in Sync

With Subversion 1.4, svnsync was introduced for this purpose. The key problem with using svnsync for multiple Subversion repositories distributed over the WAN is its reliance on a master-slave architecture. While svnsync does provide the advantage of having local read-only repositories at each of the remote development sites, only the master repository is writeable. The master repository is then replicated to the read-only slaves. However, the replication process can place a significant load on the network and servers. Because of this, replication tends to happen on an infrequent basis, leaving the read-only slave repositories that remote sites do their checkouts from out of sync with the master much of the time. As a result, commit failures due to update conflicts on the master repository can become a problem. In order to avoid commit failures, developers at the slave repository sites have to do updates over the WAN against the master Subversion repository before doing their commits. This can negate most of the expected network performance and developer productivity benefits of using svnsync in a distributed development environment.

Other solutions such as svk do allow multiple repositories to be readable as well as writeable, but there are no guarantees of consistency across the repositories. A commit can succeed on a developer’s local repository where there are no conflicts, and fail when it’s copied to other sites’ repositories due to update conflicts. This can make administration extremely difficult.

WANdisco solves these problems by turning distributed Subversion repositories into peers. All of the repositories are writeable, and consistency across the repositories is guaranteed. WANdisco’s active-active replication capabilities allow developers to work at LAN speed over the WAN for both read and write operations, while keeping all of the repositories in sync, in effect in real-time. WANdisco also provides self-healing capabilities that automate disaster recovery after a network outage or server failure.


About Jim Campigli