Monthly Archive for November, 2013

Subversion 1.8.5 and 1.7.14 released!

Today the Apache Software Foundation (ASF) announced the release of Subversion 1.8.5 and 1.7.14, and we’re proud to announce our own fully tested and certified binaries are also available from our website.

Subversion 1.8.5 changes include:

  • Client-side bugfixes

    • Fix externals that point at redirected locations (issues #4428, #4429)

    • diff: fix assertion with move inside a copy (issue #4444)

  • Server-side bugfixes

    • mod_dav_svn: Prevent crashes with some 3rd party modules (r1537360 et al)

    • hotcopy: fix hotcopy losing revprop files in packed repos (issue #4448)

Subversion 1.7.14 changes include:

  • Client-side bugfixes

    • Fix externals that point at redirected locations (issues #4428, #4429)

    • diff: fix incorrect calculation of changes in some cases (issue #4283)

    • diff: fix errors with added/deleted targets (issues #4153, #4421)

  • Server-side bugfixes

    • mod_dav_svn: Prevent crashes with some 3rd party modules (r1537360 et al)

    • fsfs: limit commit time of files with deep change histories (r1536790)

Visit the Apache changelogs for Subversion 1.8 and 1.7.

You can download our fully tested, certified binaries for Subversion 1.8.5 and 1.7.14 free here.

WANdisco’s binaries are a complete, fully-tested version of Subversion based on the most recent stable release, including the latest fixes, and undergo the same rigorous quality assurance process that WANdisco uses for its enterprise products that support the world’s largest Subversion implementations.

Subversion Password Security Upgrade

Continuing a series of articles on the latest improvements in Subversion, this article will focus on a small but significant Subversion password security upgrade. Subversion 1.8 now allows passwords to be cached in memory rather than on disk.

Passwords or authentication tickets cached on disk are a security vulnerability if the drive is lost or stolen, so this is a welcome improvement. Note, however, that the password exists in memory in plain text, and if an intruder accesses the machine while the cache is live and knows the cache ID, the password could still be compromised.

In order to use this new feature you’ll need Subversion 1.8 binaries compiled with gpg-agent support, gpg-agent itself, and a pinentry program. You’ll also need to configure a couple of gpg-agent environment variables.
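If you want to experiment, the setup looks roughly like this. The `password-stores` key comes from Subversion's runtime configuration; the shell lines are illustrative and the exact environment variables depend on your gpg-agent version:

```shell
# Illustrative setup; exact variables depend on your gpg-agent version.
# Start the agent and export its environment so svn can find it:
eval "$(gpg-agent --daemon)"

# pinentry needs to know which terminal to prompt on:
export GPG_TTY=$(tty)

# Then tell Subversion to prefer the gpg-agent store,
# in ~/.subversion/config:
#   [auth]
#   password-stores = gpg-agent
```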

If password security is an important concern for you, get your certified Subversion 1.8 binaries and take advantage of this improvement.

Subversion is a registered trademark of the Apache Software Foundation.


SVN MultiSite Plus: The Fastest Solution for Game Developers

Handling large binary files, such as the media assets used during game development or the design files used by hardware and firmware engineers, can be challenging for SCM systems.  Apache Subversion has been making rapid progress on enterprise features, including performance with large binary assets.  Subversion now shows a significant advantage when committing and transferring files over a WAN compared to Perforce, a commercial SCM system with a reputation as the performance leader in binary asset management.

Test Setup

In order to test the performance of large binary file handling over a WAN, I set up a test configuration as follows.


SVN MultiSite Plus test configuration

Perforce test configuration



During the test, a 1.7 GB ISO file is committed to a local node three times, and each iteration is timed.  Two measurements are taken:

  • Time to run the commit command. This is the user’s experience of system speed.

  • Time for the file to transfer to the remote node over a link with 256 ms simulated latency. This is the time after which the file would be available to users at other sites.

Test Results

As the chart below shows, Subversion is significantly faster to commit and transfer the data.

Test results


Several conclusions can be drawn.

1. Perforce is very slow to transfer a large binary file over a WAN. No compression was used between client and server, and the link was unencrypted. As the chart indicates, the bulk of the commit time was simply waiting for the file transfer. Subversion is significantly faster. [1]

2. SVN MultiSite Plus can use a second local node to satisfy its data integrity requirement before accepting the commit. By default, commit data must reach at least one other node to ensure availability, but that node can be local, so the user does not need to wait for the file to transfer over the WAN. This is clearly visible in the results for the first iteration, where the commit completed in 5 minutes and the file appeared on the remote node 9 minutes later.

3. Subversion does not retransfer or store duplicate objects in the repository storage area, so no time is spent transferring commit content in the second and subsequent runs.
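The storage behavior in the third conclusion is essentially content addressing. As a toy illustration (a sketch of the general technique, not Subversion's actual implementation), a content-addressed store keys each object by a hash of its content, so committing the same large binary twice stores it only once:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical content is stored only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        if key not in self.blobs:
            # Duplicate content hashes to the same key, so a repeat
            # commit costs neither storage nor transfer time.
            self.blobs[key] = data
        return key


store = ContentStore()
k1 = store.put(b"1.7 GB ISO image contents")
k2 = store.put(b"1.7 GB ISO image contents")  # second identical commit
assert k1 == k2 and len(store.blobs) == 1
```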

The departments working on game development or hardware/firmware design are often separated by function (e.g. game artists versus game developers or hardware engineers versus application developers), and often each team will be in a separate location. For these teams, SVN MultiSite Plus offers a significant performance advantage when handling large binary files.

[1] The ‘svn’ protocol used by svnserve is relatively unaffected by increased latency because it does not wait for responses from the server while transmitting data.  Perforce, however, is very sensitive to TCP send and receive buffer sizes over a high-latency network.  Increasing the operating system’s network tunables does improve Perforce performance: in a simpler test of a regular commit from client to server over a 256 ms latency link, the transfer time dropped from 2 hours to 20 minutes.  That still compares poorly with Subversion’s out-of-the-box, non-replicated transfer time of 7 minutes, and again SVN MultiSite Plus completes a replicated commit in even less time.




WANdisco Available from Ingram Micro

We’re pleased to announce our products are now available to channel partners through Ingram Micro. Ingram Micro’s authorized technology resellers in North America can now quickly and easily add WANdisco’s solutions to their existing product offerings via our agreement with IT solutions aggregator DistiNow.

“Ingram Micro is well positioned to help us identify new business opportunities and drive significant growth throughout North America,” said David Richards, WANdisco CEO. “As a single-source distributor offering its channel partners full access to a complete array of solutions and services, Ingram Micro offers extensive reach within vertical markets.”

“Data availability is a high priority that spans across nearly every vertical market and as such presents a growing opportunity for our channel partners,” said Bill Brandel, senior director, Advanced Computing Division, Ingram Micro U.S. “We’re pleased to bring WANdisco’s portfolio of products to our channel partners and provide them with a needed solution that allows them to monetize the open source product market.”

Visit our website to learn more about WANdisco’s solutions providing continuous availability for Hadoop Big Data, Subversion, and Git. While you’re there, register for our upcoming webinar with Hortonworks, “The Modern Data Architecture for a Non-Stop Hadoop” scheduled for December 5th.

Dynamic Subversion Deployments

Among the many improvements in the latest release of SVN MultiSite Plus, the ability to add new nodes on the fly to replication groups really stands out. SVN MultiSite Plus helps a Subversion administrator cope with uncertainty: you don’t know very far in advance how many developers you have, where they’ll be located, and how much build automation they’ll use. When I worked as a consultant I would often ask these basic questions in the process of hardware sizing for a deployment, but in reality you’re just guessing if you try to figure out how many servers you need years in advance. SVN MultiSite Plus gives you a dynamic Subversion deployment that grows in response to your environment.

As a simple example, let’s say that I start out with three nodes in a single office. After six months I need to add two more nodes to handle additional build automation load, then after a year I need to add a node at a remote site to support a new office. These routine capacity events shouldn’t cause a big disruption to your Subversion service.

With SVN MultiSite Plus I can add a new server to the deployment on the fly. After the server is provisioned and configured, I simply go to the administration console and add it to the replication group.


After I add a node, SVN MultiSite Plus walks me through the process of synchronizing data onto the new node. While the new node is being synchronized it is not usable, of course, but it is automatically activated once the data transfer is complete.

If you’re tired of constantly scrambling to keep your Subversion deployments up to speed, grab a free trial of SVN MultiSite Plus. If you face the same problem for Git, Git MultiSite offers the same support for quickly expanding a Git deployment.


A View From Strata NY: Big Data is Getting Bigger

In general, a trade show is a dangerous place to gauge sentiment.  Full of marketing and sales, backslapping and handshakes, and marketecture rather than architecture, the world there is indeed viewed through rose-tinted spectacles. Even so, Strata, the Hadoop Big Data conference held in New York last week, was very interesting, albeit through my own rose-tinted spectacles.

Firstly, the sheer number of people, over 3,500, is telling.  This show used to draw a few hundred, primarily techies inventing the future.  The show is now bigger, much bigger.  A cursory glance at the exhibit hall revealed a mix of the biggest tech companies and hot start-ups.  The keynotes, to the disappointment of those original techies, were primarily press-driven product announcements lacking real technical substance.  This is not such a bad thing, though: it’s a sign that Hadoop is coming of age. It’s what happens when technology moves into the mainstream.

Second, the agenda has changed quite dramatically.  Companies looking at Hadoop are no longer trying to figure out whether it fits into their data centers; they are trying to figure out how to deploy it.  2014 will indeed be the end of trials and the beginning of full-scale enterprise roll-out.  The use cases are all over the place.  Analysts yearn for clues and clusters to explain this (“Are you seeing mainly telcos or financial services?”).  Analysts must, of course, try to enumerate in order to explain, but the shift is seismic, and the only explanation is a fundamental change in the very nature of enterprise applications.

My third theme is the discussion around why Hadoop is driving this move to rewrite enterprise applications.  As someone at the show told me, “the average age of an enterprise application is 19 years”.  Hence, this is part of a classic business cycle.  Hadoop is a major technological shift that takes advantage of dramatic changes in the capabilities and economics of hardware.  Expensive spinning disks, processing speeds, bandwidth, networks, and so on were the limitations, and hence the assumptions, that the last generation of enterprise applications had to deal with.  Commodity hardware and massive in-memory processing are the new assumptions that Hadoop exploits.  In a few years we will not be talking about ‘Big Data’; we will simply use the term ‘Data’, because it will no longer be unusual for data to be so large.

My fourth observation was that Hadoop 2 has changed the agenda for the type of use case.  In very rough terms, Hadoop 1 was primarily about storage and batch processing.  Hadoop 2 is about YARN and run-time applications. In other words, processing can now take place on top of Hadoop, rather than storing data in Hadoop and processing it somewhere else.  This change is highly disruptive because it means that software vendors can no longer rely on customers using their products alongside Hadoop.  Rather, vendors are talking about building on top of Hadoop.  To them, Hadoop is a new type of operating system.  This disruption is very good news for the new breed of companies building pure applications from the ground up, and really bad news for those who believe they can loosely integrate or even store data in two places. That’s not going to happen. Some of the traditional companies had a token presence at Strata, which suggests they are still unsure of exactly what they are going to do: they are neither fully embracing nor ignoring this new trend.

My final observation is about confusion.  There’s a lot of money at stake here, so naturally everyone wants a piece of the action.  There are a lot of flashing lights and noise from vendors, lavish claims, and a lack of substance.  Forking core open source is nearly always a disaster. As open-source guru Karl Fogel says, forks happen due to “irreconcilable disagreements, technical disagreements or interpersonal conflicts” and are something developers should fear and try to avoid.  A fork creates natural barriers to using third-party products, and with an open source project moving as quickly as this one, you have to stay extremely close to the de facto open source project.

A forked version of core Hadoop is not Hadoop, it’s something else.  If customers go down a forked path it’s difficult to get back and they will lose competitive edge because they will be unable to use the community of products being built as part of the wider community.  Customers should think of Hadoop like an operating system or database.  If it’s merely embedded and heavily modified then this is not Hadoop.

So 2014 it is, then.  As the Wall Street Journal put it: “Elephant in the Room to Weigh on Growth for Oracle, Teradata”.

Here’s a great video demo of the new @WANdisco continuous availability technology running on the Hortonworks Hadoop 2.2 distro.



About David Richards

David is CEO, President and co-founder of WANdisco and has quickly established WANdisco as one of the world’s most promising technology companies. Since co-founding the company in Silicon Valley in 2005, David has led WANdisco on a course for rapid international expansion, opening offices in the UK, Japan and China. David spearheaded the acquisition of Altostor, which accelerated the development of WANdisco’s first products for the Big Data market. The majority of WANdisco’s core technology is now produced out of the company’s flourishing software development base in David’s hometown of Sheffield, England and in Belfast, Northern Ireland.

David has become recognised as a champion of British technology and entrepreneurship. In 2012, he led WANdisco to a hugely successful listing on the London Stock Exchange (WAND:LSE), raising over £24m to drive business growth. With over 15 years’ executive experience in the software industry, David sits on a number of advisory and executive boards of Silicon Valley start-up ventures. A passionate advocate of entrepreneurship, he has established many successful start-up companies in enterprise software and is recognised as an industry leader in Enterprise Application Integration and its standards.

David is a frequent commentator on a range of business and technology issues, appearing regularly on Bloomberg and CNBC. Profiles of David have appeared in a range of leading publications including the Financial Times, The Daily Telegraph and the Daily Mail.

Challenges of the Git Enterprise Architect, Part 1

This is the first of more than 20 articles, each examining a key challenge facing anyone responsible for deploying Git at scale in their enterprise software development environment.

Part of my role in Product Management is to seek out early adopters of emerging technology and study their process, challenges, and techniques deploying new technology. This helps ensure that we build products that people want to buy. As I wrote in Problem-centric Products, “it’s so important to deeply understand the challenges faced by your customers, and speak to the problems first whenever possible.” By studying a variety of early adopters, patterns start to emerge, and that’s where Product Management’s “ear to the ground” starts to turn information into a product vision.

I also like to share what I’ve learned. My original “Top Challenges for the Git Enterprise Architect” document was circulated first to our customers, then delivered as a talk at our Subversion & Git Live 2013 conferences in Boston, San Francisco and London, and now appears as a set of blog articles.

It surprised me that many found my intermediate-level talk, “Git Enterprise Challenges”, to be sobering or even frightening.  Perhaps the rose-colored glasses of my optimism cause me to see problems as opportunities; or, equally possible, common challenges mean there is a chance to create a product that will benefit many people.

Note that not every development environment will face every one of these challenges; together, however, they comprise a checklist of issues to consider when adopting Git in your environment.  I’ll drill down into each of them over the next few months, and the result should paint a reasonably complete, if high-level, picture.

I should also point out that I don’t address solutions in this series. Our products, Git MultiSite and Git Access Control, are just the first step in a roadmap that eventually visits every challenge. Rest assured that, just as your needs around deploying Git in your enterprise grow, WANdisco’s Git products will grow with you.

As a side note, WANdisco is now a general SCM expert, with deep knowledge for deploying and supporting leading tools like Git and Subversion, as well as advice and professional services for migrating from legacy tools like ClearCase, CVS, TFS, Perforce, and others.

And without further ado, here are the topics I’ll be covering:

  • Managing many repos
  • Access control
  • Multi-repo codebases
  • Ever-growing repos
  • Large binaries
  • Shared code
  • Large repos
  • Long clone times
  • Supporting add-ons
  • Splitting repos
  • Combining repos
  • IP protection
  • IP reuse
  • Contaminating licenses
  • Code refactoring
  • Multi-site
  • The scaling myth of dictator-lieutenant
  • Untracked renames
  • Supporting a successor to Git
  • Untracked rebases
  • Permanent file removal
  • Excessive cloning

If there are any additional topics you’d like to see covered, please leave a comment and I’ll try to address them.

Tune in soon for “Managing Many Repos”.

Efficient Incremental Backups for Subversion

Subversion 1.8 introduces an improved method for efficient incremental backups for Subversion. In a nutshell, the hotcopy command can run incrementally for faster backups.

To take a step back, Subversion repositories hold the intellectual property equivalent of your crown jewels, and guarding that data requires a layered strategy.

  • The first step is usually a local mirror that has a full copy of the repository data and can be used as a warm or hot spare. Of course, if you’re using WANdisco SVN MultiSite, every replicated peer has a complete set of data and failover is automated.

  • The second step is a remote mirror that has a full copy of the repository data and can be used for failover in a disaster recovery scenario. Again, SVN MultiSite provides this capability (and more) along with automated failover.

  • Finally, you need to have full offline backups of your repository data. Offline backups often progress through a storage cycle, with the most recent backups kept on faster, short-term storage and the oldest moving to cheaper offline storage. The backups generated by svnadmin hotcopy (possibly from a spare server) fall into this category.

Prior to Subversion 1.8, hotcopy created a full backup of the entire repository. The operation was often slow on large repositories, requiring careful scheduling or alternative strategies. Since hotcopy is only bound by disk I/O speed, it is faster than either native Subversion replication (svnsync) or creating incremental dump files, making it a great choice for a more efficient backup strategy. And of course hotcopy retains all of its usual advantages: it makes a full backup of a running repository in the right sequence to ensure data integrity.

Though hotcopy is only part of a layered backup strategy, it is a powerful tool for Subversion administrators not enjoyed by administrators of many commercial SCM systems. hotcopy is simple, reliable, and now pretty efficient. There are no concerns over backing up several parts of a complex system in the right order.

To use the incremental mode, simply pass the new --incremental option. If you want to give it a try, download the latest certified SVN binaries for Subversion 1.8.
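As a sketch (the repository and backup paths here are illustrative), the first run makes a full copy and subsequent runs only bring the backup up to date:

```shell
# First run: full hotcopy of a live repository (paths are illustrative)
svnadmin hotcopy /srv/svn/myrepo /backups/myrepo

# Later runs: with Subversion 1.8+, copy only what changed since last time
svnadmin hotcopy --incremental /srv/svn/myrepo /backups/myrepo
```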

Subversion is a registered trademark of the Apache Software Foundation.


SmartSVN 8 RC1 released!

Yesterday we released the first Release Candidate (RC) for SmartSVN 8.

SmartSVN is a cross-platform graphical client for Apache Subversion.

New SmartSVN 8 RC1 features include:

  • The Project menu: “Open or Manage projects” is now available without a project window
  • OS X: dock icon click will reopen minimized windows
  • Reintegrate Merge: removed (as it’s no longer relevant with Subversion 1.8)
  • Upgrade: SmartSVN will convert 1.7 working copies to 1.8 format

Fixes include:

  • Refresh: file and property conflicts were not displayed at all
  • Start Up: crash on Ubuntu 13.10
  • Conflict Solver: possible modification of edited file even if modifications were rejected
  • Commit: committing the removal of a directory over the svn protocol did not work

For a full list of all improvements and bug fixes, view the changelog.

Have your feedback included in a future version of SmartSVN

Many of the fixes and suggestions included in new versions of SmartSVN are raised via our dedicated SmartSVN forum, so if you’ve got an issue or a request for a new feature, head over there and let us know.

You can download RC1 for SmartSVN 8 from our early access page.

Haven’t yet started with SmartSVN? Claim your free trial of SmartSVN Professional here.