Monthly Archive for August, 2013

Subversion 1.7.13 and 1.8.3 Released

Today the Apache Software Foundation (ASF) announced the release of Subversion 1.7.13 and 1.8.3, bringing a number of fixes to each.

Apache Subversion 1.7.13 includes fixes for the following:

  • merge – fix bogus mergeinfo with conflicting file merges
  • diff – fix duplicated path component in ‘--summarize’ output
  • ra_serf – ignore case when checking certificate common names
  • svnserve – fix creation of pid files
  • mod_dav_svn – better status codes for commit failures
  • mod_dav_svn – do not map requests to filesystem

1.8.3 fixes:

  • ra_serf – fix crash when committing cp with deep deletion
  • diff – issue an error for files that can’t fit in memory
  • update – fix a crash when a temp file doesn’t exist
  • diff – continue on missing or obstructing files
  • ra_serf – include library version in ‘--version’ output
  • svnadmin – fix output encoding in non-UTF8 environments

For a full list of all bug fixes and improvements, see the Apache changelog for 1.7 and 1.8.

You can download our fully tested, certified binaries for Subversion 1.7.13 and 1.8.3 free here.

WANdisco’s binaries are a complete, fully-tested version of Subversion based on the most recent stable release, including the latest fixes, and undergo the same rigorous quality assurance process that WANdisco uses for its enterprise products that support the world’s largest Subversion implementations.

SmartSVN 8 Preview 1 Released

Yesterday we released SmartSVN 8, Preview 1. SmartSVN is the cross-platform graphical client for Apache Subversion.

New SmartSVN 8 features include:

  • Support for Subversion 1.8 working copy
  • Ability to specify different merge tools for different file patterns as conflict solvers

SmartSVN 8 fixes include:

  • Possible internal error closing a project window
  • Text editors:
    • “Autoindent new lines” did not work correctly when typing with an IME (for example, entering CJK characters)
    • Internal error related to syntax highlighting when using an IME

For a full list of all improvements and bug fixes, view the changelog.

Have your feedback included in a future version of SmartSVN

Many issues resolved in this release were raised via our dedicated SmartSVN forum, so if you’ve got an issue or a request for a new feature, head over there and let us know.

You can download Preview 1 for SmartSVN 8 from our early access page.

Haven’t yet started with SmartSVN? Claim your free trial of SmartSVN Professional here.

Git as a Service

Git MultiSite Solves Availability and Management Challenges

As an enterprise SCM administrator you’re a service provider to development organizations.  You may even have a formal Service Level Agreement (SLA) that identifies the allowed mean time between failures (MTBF) and mean time to recovery (MTTR) among other metrics. In layman’s terms, development teams expect their SCM system to be secure, highly available, and high-speed. So how do you provide Git as a service to a global development team?

Availability

First, you need to ensure availability:

  • Data integrity. How do you safeguard against potential data corruption?

  • Outages. How do you prevent downtime and meet your SLA requirements for MTBF and MTTR?

  • Performance. How do you scale up the Git service to handle more users, more sites, and more build automation?

Active-active replication provides a unique solution for each of these concerns. Each replicated peer node in an active-active deployment has a full copy of repository data, and each node periodically validates itself to detect any corruption. If one of the nodes falls out of sync due to subtle data corruption on its file system, you’ll see a warning in the Git MultiSite administration console.

Similarly, an active-active deployment provides failover out of the box. Git users can simply start using another node as their Git remote in the event of failure, or a load balancer can transparently redirect them to another node. This built-in failover provides a zero-downtime solution, yielding an excellent MTBF and MTTR.

Simple HA/DR Configuration.  Note that all five nodes in this example are replicated peers; marking them as ‘HA’ or ‘DR’ nodes is simply by convention

Finally, additional read-only nodes can be added to the deployment to service build automation, while end user traffic can be handled by adding more writable nodes. Every commit is done locally, and depending on your deployment configuration, data may never have to transfer over a WAN for the commit to complete. If you are supporting users at multiple locations with servers in several data centers, you’ll appreciate the performance and flexibility that active-active replication provides.

Supporting Users in Multiple Data Centers with Flexible Replication Groups. In this example, replication groups support granular replication to different sites based on repository type and security.

Security

Next, you need to be concerned about the security of your data. This includes:

  • Access control

  • Security of data in transit

  • Security of data at rest

  • Controlling where and how Git repositories are used

Once again, Git MultiSite covers all the bases. Git MultiSite works with WANdisco’s own access control product, and also interfaces with leading third-party solutions like Gitolite. It works with secure Git transmission protocols like HTTPS and SSH, and because it is a pure Git solution with no closed back-end storage, it is compatible with disk encryption mechanisms.

Git MultiSite also offers selective replication to control where and how your repositories are accessed. Flexible replication groups can be configured to limit where sensitive repositories are replicated, and can designate read-only nodes. If you need to push a subset of your repositories to a public cloud for deployment purposes, it’s as simple as setting up a new replication group.

Read-only nodes used to serve deployment repositories

Management

Just like any other IT service, Git repositories and servers must be managed properly. You’ll need proper reporting, auditing, and administration tools.

Git MultiSite’s administration console is your entry point to a comprehensive solution. In the console you can see the status of every node and repository, define their replication behavior, and induct new servers and repositories as they become available. More detailed deployment metrics can be obtained by sending information from Git MultiSite into a metrics tool like Graphite; the REST API provides a useful integration point.

Git MultiSite Global Management

Auditing can be accomplished in several ways. Git MultiSite’s logs contain a rich history of how the deployment is configured, and normal system logs (e.g., from Apache or sshd) provide another layer of information.

Enterprise Git

The short answer to all of these concerns is that you need Git to be an enterprise software solution, just like your databases, email servers, and other critical infrastructure. WANdisco provides the total package. Git MultiSite solves the availability, security, and management challenges of Git, and is backed by WANdisco’s Git support and services offerings. Eventually you’ll need to pick up the phone and get help quickly for a problem, and WANdisco has a team of experts ready to help.

Contact us for a free Git MultiSite trial and get started on your way to providing an outstanding Git service to your development organization.

Git Repository Metrics

Managing Git repositories means looking ahead, not just fighting today’s fire. Keeping an eye on key Git repository metrics will keep you a step ahead – and keep your development teams happy. There are several useful predictive metrics you can look at including repository size, growth rate, number of references, number of files exceeding a size threshold, and number of operations per day. These metrics help you with hardware sizing and also help you maintain good performance. You can see if you need more Git replicas to handle clones and pulls from a new development team, or if someone is checking in too many large binary files and slowing down repository performance.
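
As a rough illustration of two of these metrics, the sketch below collects a repository’s on-disk size and its number of references with the standard `git count-objects -v` and `git for-each-ref` commands. The function names and the returned dictionary layout are my own assumptions for this example, not part of Git MultiSite.

```python
import subprocess


def parse_count_objects(output):
    """Turn `git count-objects -v` output ('key: value' lines) into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_snapshot(repo_path):
    """Collect on-disk object size (KiB) and reference count for one repository."""

    def git(*args):
        # Run git against repo_path and return its standard output as text.
        result = subprocess.run(
            ["git", "-C", repo_path, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    stats = parse_count_objects(git("count-objects", "-v"))
    refs = len(git("for-each-ref", "--format=%(refname)").splitlines())
    # 'size' covers loose objects and 'size-pack' covers packfiles, both in KiB.
    size_kib = stats.get("size", 0) + stats.get("size-pack", 0)
    return {"size_kib": size_kib, "refs": refs}
```

Running `repo_snapshot` on a schedule and keeping the results gives you the raw series from which growth rate can be derived.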

Metrics in the Dashboard

How do you go about collecting this data? Most Git reporting tools focus purely on development metrics like number of commits and developer activity. By contrast, Git MultiSite has some useful metrics built right into its administration console, viewable either graphically or in a list.

Repository Size and Activity Over Time

Collecting and Viewing Metrics with Graphite

The administration console dashboard gives you a quick snapshot of key metrics over time. However, you may have your own reporting and analysis tools that provide a more elaborate monitoring framework. In that case you can pull data out of Git MultiSite’s REST API to feed into an external system, giving you complete control over how you use the repository metrics.

As a simple example, let’s look at how to track repository size over time using Graphite.  Graphite is an open source tool for storing and charting any type of numeric time-series metric.  Internally it uses a round-robin database that allows for flexible data storage management and purging of old data.

Collecting Data

First, I’ll write a script that uses curl to gather the latest repository statistics from Git MultiSite’s REST API, parse out the size, and feed it to Graphite using the plaintext protocol.

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Git MultiSite REST endpoint and the Graphite plaintext listener.
my $ENDPOINT     = 'http://gitms1:8082/dcone/';
my $REPOSITORIES = 'repositories/';
my $PORT         = '2003';
my $SERVER       = '127.0.0.1';
my $FEED_PREFIX  = 'gitms.';
my $FEED_SUFFIX  = '.size';

# Fetch the repository list as XML (-s silences curl's progress meter).
my $rest_call   = 'curl -s ' . $ENDPOINT . $REPOSITORIES;
my $rest_output = `$rest_call`;

my $ref  = XMLin($rest_output, ForceArray => 1);
my $date = time;    # epoch seconds, without the trailing newline `date +%s` adds

# Send one "<metric path> <value> <timestamp>" line per repository.
foreach my $repo (@{ $ref->{repository} }) {
   my $size = $repo->{repoSize}->[0];
   my $name = $repo->{name}->[0];
   my $feed = $FEED_PREFIX . $name . $FEED_SUFFIX;
   print "$name\n$size\n$date\n";
   system("echo \"$feed $size $date\" | nc $SERVER $PORT");
}

I’ll set up a cron job to run this script every 5 minutes. The script will insert a metric called gitms.<repo name>.size for each repository.

Note that there are more efficient ways to send the data to Graphite, but the plaintext protocol works well for demonstrations.
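
One such improvement is to send every metric over a single TCP connection rather than starting `nc` once per repository. The following is a minimal sketch of that idea in Python, still using the plaintext protocol (one `<path> <value> <timestamp>` line per metric). The function names are my own, and the host and port simply mirror the defaults used in the script above.

```python
import socket
import time


def format_lines(sizes, timestamp, prefix="gitms.", suffix=".size"):
    """Build one plaintext-protocol line per repository: '<path> <value> <timestamp>'."""
    return "".join(
        f"{prefix}{name}{suffix} {size} {int(timestamp)}\n"
        for name, size in sorted(sizes.items())
    )


def send_to_graphite(sizes, host="127.0.0.1", port=2003, timestamp=None):
    """Send all metrics in one TCP connection and return the payload that was sent."""
    when = time.time() if timestamp is None else timestamp
    payload = format_lines(sizes, when)
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload.encode("ascii"))
    return payload
```

Here `sizes` is a dictionary mapping repository names to sizes, for example the values parsed from the REST response. Repository names containing dots or spaces would need sanitizing first, since Graphite treats dots as path separators.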

Viewing Data

Next I’ll configure a simple Graphite chart that shows the repository size over time.

Repository Size

Graphite can also show calculated metrics. Here I’ll look at a chart showing repository growth over time. (Specifically, the chart is showing the 7 day delta in repository size for each time point.)

Repository Growth
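
For readers who want to reproduce the growth calculation outside Graphite, here is a small sketch of a 7-day delta over a list of `(timestamp, size)` samples. It is my own illustration of the arithmetic, not the expression used to build the chart; the function name and the choice to skip points that have no sample a full week earlier are assumptions.

```python
from bisect import bisect_right

WEEK_SECONDS = 7 * 24 * 60 * 60


def seven_day_delta(samples, window=WEEK_SECONDS):
    """Return (timestamp, growth) pairs for a series of (timestamp, size) samples.

    Growth is the size at each point minus the most recent size recorded at
    least `window` seconds earlier; points without such a sample are skipped.
    """
    samples = sorted(samples)
    times = [t for t, _ in samples]
    deltas = []
    for t, size in samples:
        # Index of the latest sample taken at or before (t - window).
        idx = bisect_right(times, t - window) - 1
        if idx >= 0:
            deltas.append((t, size - samples[idx][1]))
    return deltas
```

A sudden jump in these deltas corresponds to the kind of spike discussed next.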

As a Git administrator, I’d keep an eye out for unusual spikes in repository growth. These spikes may indicate an automated build system run amok, or a new project starting up. I may need to take corrective action or start planning for a capacity upgrade.

Tools like Graphite are purpose-built for metric storage and charting, so having an easy way to extract data from Git MultiSite using the REST API makes a great integration point.

Get Going

Git MultiSite provides an open and extensible management framework for your Git repositories, along with all the benefits of true active-active replication. If you’re interested in setting up a comprehensive Git monitoring system, ask for advice or start a free trial of Git MultiSite today.

Simple Subversion Benchmarking

A simple Subversion benchmarking tool included in the Subversion 1.8 release helps you sort out performance complaints from developers more quickly. Whenever someone complains about slow Subversion performance, you know there are at least three possibilities:

  • The Subversion server is actually slow, perhaps due to heavy load.

  • The user’s machine is slow. Recall that the Subversion client does some disk I/O and other processing during some operations, and the user might be running virus scanners and the like.

  • The user suffers from a slow network connection.

The svn-bench command is a lightweight Subversion client that omits most of the local processing. That makes it easier to get a real performance measurement without being affected by the user’s virus scanner or slow file system.

If you run svn-bench on the Subversion server itself you’ll get a baseline performance metric for a few Subversion operations. If that baseline seems slow, you can try to improve the server performance.

If you then run svn-bench on a client workstation, you can get a sense of the effects of network latency. If there’s little latency apparent, then the problem may lie in the user’s workstation.

For instance, I ran svn-bench null-export on the trunk of a Subversion repository. On the server itself, the real time was 4.1 seconds. On a workstation connected over a slow network, the real time was 32.5 seconds. That’s a good indicator that network latency is slowing things down. To confirm my suspicion, I ran a normal svn export on that workstation; it took only a second or so longer, which tells me that local processing adds little and the problem lies in the network.
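
The same reasoning can be written down as a small helper. This is only a sketch of the rule of thumb described here: the function names are mine, and the factor of 2 used to decide that one timing is "much slower" than another is an arbitrary assumption you would tune for your own environment.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def likely_bottleneck(server_s, client_bench_s, client_full_s, ratio=2.0):
    """Compare three timings (seconds) and name the most likely slow component.

    server_s:       svn-bench run on the server itself
    client_bench_s: svn-bench run on the user's workstation
    client_full_s:  the normal svn client run on the user's workstation
    """
    if client_bench_s >= ratio * server_s:
        return "network"  # even the lightweight client is slow when run remotely
    if client_full_s >= ratio * client_bench_s:
        return "workstation"  # local processing accounts for most of the time
    return "server baseline"  # remote and local overhead are small; judge the baseline
```

With the numbers above, `likely_bottleneck(4.1, 32.5, 33.5)` points at the network. The timings themselves could be gathered with something like `time_command(["svn-bench", "null-export", url])`, where `url` is your repository URL.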

svn-bench is a simple but useful tool for Subversion benchmarking. You can try it out by downloading a certified SVN 1.8 binary. If you need help with Subversion performance analysis, our team of Subversion experts can help.

 Subversion is a registered trademark of the Apache Software Foundation.

Non-Stop News

Last week, our Non-Stop NameNode received a lot of attention. First of all, we announced the next version of Non-Stop NameNode WAN Edition, which includes:

  • Dynamic Group Evolution – This provides the ability to add and remove namenodes within Hadoop clusters on the fly without downtime, eliminating the need for scheduled maintenance.
  • Configurable Quorum Schema – New node configurations enable increased availability and deployment flexibility for more efficient use of IT infrastructure.
  • Rapid recovery for a namenode that has been down for an extended period of time.

In addition, Non-Stop NameNode received integration and interoperability certification with Dell PowerEdge servers. Customers can now deploy Apache Hadoop in mission-critical environments where processing and access to data requires continuous availability, as opposed to relying on active-passive solutions.

Non-Stop NameNode WAN Edition applies WANdisco’s patented replication technology to deliver 100% uptime by eliminating Hadoop’s most problematic single point of failure, the NameNode, providing the first and only continuous availability solution for globally distributed deployments. With it, all NameNode servers in a Hadoop cluster deployed over a WAN actively support clients at each location and, along with the data nodes, are continuously synchronized. The result is LAN-speed performance and access to the same data at every location. Failover and recovery are automatic both within and across data centers. Whether a single NameNode or an entire site goes down, Hadoop is always available.

Securing Your Data with Selective Git Replication

Git MultiSite Gives You Control Over Where Your Data Ends Up

If you administer Git for anything other than a personal project, you’ll wind up thinking about replication – and then you’ll wind up thinking about securing your Git data during the process.  Git MultiSite is the first Enterprise Git management system that lets you control both where and how your data is available.

To recap, there are a lot of reasons why you’ll want to replicate Git data:

  • You need a non-stop data solution with zero downtime (high availability and disaster recovery).

  • You support development teams at different sites and they all need good performance.

  • You’ve invested in a continuous integration system to support Agile and continuous delivery, and it’s putting a strain on your Git repositories.

  • Your company has grown by organic expansion or acquisition and your SCM infrastructure needs to scale up to support the larger user base.

Whatever the reason, you’ve realized that you need highly available Git data. Git MultiSite is the only active-active replication solution that supports truly distributed development – but putting that aside for now, you also need to think about the security of your data as it moves around the world. [1]

There are three key questions to consider.

  1. Where should each repository be available? You may have a very sensitive repository that should not be available to partner sites in different locations. You may want to limit which repositories are available in a public cloud environment that’s used for deploying production app servers; typically only the repositories that contain your runtime configuration and environment settings belong there. Alternatively, you may not need every repository available at every location and don’t want to waste the bandwidth.

  2. How is each repository being used at each site? Should a repository be writable, or should it only be available as a read-only resource for build farms and downstream consumption?

  3. How easy is it to manage the problem? As your deployment grows from a few Git repositories to a few hundred, how are you going to monitor and audit your replication strategy?

Git MultiSite has selective replication and effective management tools baked in, so it provides an out-of-the-box answer to all three questions.

Where Does This Repository Go: Defining Replication Groups

Git MultiSite lets you define one or more replication groups to manage your deployment. A replication group is a flexible way to define how the replicated peer nodes in your MultiSite deployment share data.

As a simple example, assume that the deployment has five nodes in total, one each in Boston, Seattle, London, Sydney, and Chennai. Boston and London are the primary offices; Seattle and Sydney are data centers used for deploying production app servers; and Chennai is a partner site.

I might set up three replication groups.

  • Default Group replicates to all of the development sites – Boston, London, and Chennai.

  • Proprietary Group contains repositories with sensitive IP, and only replicates to the primary offices in Boston and London.

  • Deployment Group contains repositories with runtime configuration and environment data like Puppet manifests. It replicates to the development sites and the data center sites.

Replication Groups

How Is the Repository Used: Refining Replication Groups

WANdisco’s Distributed Coordination Engine (DConE) distinguishes between several types of replicated peer nodes. The most common type is an active voter, which participates in transaction proposals and can accept write activity. Another type is a passive node, which receives all repository data but will not accept write activity.

In the example in the previous section, the two data center nodes in the Deployment Group are passive nodes. They are necessary to provide runtime data to the production servers in the data centers, but any changes are made at the development sites.

Different Node Types
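
To make the example concrete, the five-site deployment can be modeled as plain data. This sketch is only a way to reason about the layout, not Git MultiSite’s configuration format. The repository names are hypothetical, and I assume each repository belongs to exactly one replication group.

```python
# Node role per site in each replication group, following the example above.
GROUPS = {
    "Default": {"Boston": "active", "London": "active", "Chennai": "active"},
    "Proprietary": {"Boston": "active", "London": "active"},
    "Deployment": {
        "Boston": "active", "London": "active", "Chennai": "active",
        "Seattle": "passive", "Sydney": "passive",  # data centers: read-only
    },
}

# Hypothetical repositories and the group each one is assigned to.
REPO_GROUP = {
    "webapp": "Default",
    "billing-engine": "Proprietary",
    "puppet-manifests": "Deployment",
}


def sites_for(repo):
    """List the sites that hold a replica of the given repository."""
    return sorted(GROUPS[REPO_GROUP[repo]])


def can_write(repo, site):
    """True only if the site has an active (writable) node for the repository."""
    return GROUPS[REPO_GROUP[repo]].get(site) == "active"
```

For example, `sites_for("billing-engine")` returns only Boston and London, and `can_write("puppet-manifests", "Seattle")` is false because the data center nodes are passive.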

Management and Auditing: Easy Administration, Central View

Git MultiSite provides easy central management of replication groups. The administration console, available with proper authentication from any site, first provides a single view of all the nodes in the system.

Global View

The console provides a simple graphical tool for setting up replication groups, where you can define which nodes belong to a group and how they are used.

Managing Replication Groups

And finally, there’s a quick list of the repositories belonging to each replication group.

Replicas in Group

The entire configuration is captured in the audit logs.

Audit log snippet…

< X-Jersey-Trace-006: matched resource method: public javax.ws.rs.core.Response com.wandisco.application.rest.resources.ReplicationGroupResource.createReplicationGroup(com.wandisco.application.dto.ReplicationGroupDTO)
INFO: 4984 * Server in-bound request
4984 > GET http://gitms1.demo.wandisco.com:8082/dcone/nodes/local

Total Control Over the Non-Stop Data

WANdisco provides non-stop data solutions, but we haven’t forgotten about the administration and security side of the picture. Git MultiSite gives you complete control and visibility over where and how your repositories are used.

 [1]  For the purposes of this discussion, consider the problem of secure transmission solved by Git’s use of either SSH or HTTPS as transmission protocols.

Versioned Access Control in Subversion 1.8

Managing and monitoring access control just got a little easier thanks to the introduction of versioned access control files in Subversion 1.8. You can now store the authz file, which often governs repository access when Subversion is served through Apache or svnserve, in the repository itself.

The easiest way to try this is to check in your authz file, then reference it in the server configuration using repository-relative syntax. Let’s say I have it in the repository under the path svn://repo-host/protected/authz. I would then refer to it in svnserve.conf:

authz-db = ^/protected/authz

You should, of course, make sure that only authorized users can see and change the authz file. You may worry that you’ll lock yourself out of the repository if you make a mistake that denies all write access to the authz file, but you can always temporarily switch Apache or svnserve back to using a local authz.
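
One way to reduce the lockout risk is to check a proposed authz file before committing it. The sketch below is a deliberately simplified illustration: it only looks at rules written directly under one path section, and it ignores group expansion, rules inherited from parent paths, and per-repository section names. The function name and the sample file used with it are my own assumptions.

```python
import configparser


def writers_of(authz_text, path):
    """Return the principals granted 'rw' directly on `path` in an authz file."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep user and group names case-sensitive
    parser.read_string(authz_text)
    if not parser.has_section(path):
        return []
    return sorted(
        name for name, access in parser.items(path) if access.strip() == "rw"
    )
```

If `writers_of(new_authz_text, "/protected")` comes back empty, nobody would be able to commit a correction through the repository, so the change deserves a second look before it goes in.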

If you manage several related repositories, you can store all of their authz files in a central management repository, and refer to the authz files with absolute file:// URLs. In this case, all of the repositories must have access to the same file system.

With this change, Subversion moves one step closer to the ideal of ‘infrastructure as code’, taking a lesson from the DevOps space. In many ways, your SCM configuration is as important as the data in the SCM system itself, so capturing this data in the SCM system is simply good practice.

Grab a certified SVN 1.8 binary today and give it a try.

SmartSVN 7.6 – It’s All About Performance

We’re pleased to announce SmartSVN 7.6 is now available to download. SmartSVN is the cross-platform graphical client for Apache Subversion.

The focus with 7.6 has been performance, performance and more performance. Responding to customer feedback, we’ve worked to make 7.6 faster and lighter than its predecessors.

New SmartSVN 7.6 features include:

– Auto-update – there is no need to install new versions manually

– Repository Browser – defined svn:externals are shown as separate entries

– proxy auto-detection

– external tools menu

– OS X retina support

– Project data is saved on project creation rather than when exiting

GUI improvements include:

– file/directory input fields – support for ~ on unix-like operating systems

– natural sorting (“foo-9.txt” before “foo-10.txt”)

– more readable colors on Transactions and other panes

SmartSVN 7.6 fixes include:

– speed-search – possible internal error typing Chinese characters

– Revision Graph – errors when deselecting all branches

– Tag Browser – possible internal error

– SVN operations – significant performance improvements

– Check Out – checking out to an already versioned directory appeared to work, then failed later

– Refresh – possible performance problems and a fix for displaying conflicts at drive root

– Issues with migrating settings and auth credentials from pre-7.5 versions

– Foundation edition: changing the project root was not possible

For a full list of all improvements and bug fixes, view the changelog.

Contribute to Future Releases

Many features and enhancements in this release were due to comments made by users in our dedicated SmartSVN forum, so if you’ve got an issue, or a request for a new feature, head over there and let us know.

Supporting Git Build Automation with Git MultiSite

The two building blocks of continuous delivery are version control and automation – and boy, do you need a lot of automation. If you’re serious about building every commit and then running a progressively tougher sequence of test and deployment steps in your delivery pipeline, you should also make sure that your version control infrastructure doesn’t start creaking under the load. That’s particularly true for Git, since cloning repositories and fetching the latest data can be intensive operations. So let’s look at supporting Git build automation.

The natural first step is to use a replicated Git repository to serve the build farm. That way the load from the build servers doesn’t impact the human users on the main repository. Git MultiSite provides the perfect solution:

  • New replicated peer nodes can be added at any time to serve a growing build farm.

  • The nodes serving the build farm can be designated as read-only for better performance and security.

  • Like any Git MultiSite node, a node serving a build farm is easily managed from the Git MultiSite administration console.

Git MultiSite Deployment for Build Farm

Git MultiSite Replication Management

Sometimes your build processes can’t just use a read-only replica. They may check in modified configuration data, or simply create a tag. Do you let your main Git repository serve the build farm, or try to make the CI system use a different Git remote for write operations? After all, very few Git management solutions even offer a pass-through mirror (a mirror that forwards write activity to the master repository).

Git MultiSite’s active-active replication solves this performance challenge as well. You can use one or more active replicated peer nodes to serve the build farm.

Git MultiSite Deployment for Build Farm - Commits Accepted

In this configuration, all read requests are purely local, and any writes are coordinated with the other Git nodes in the deployment. It’s really that seamless, since each Git MultiSite node is a fully writable peer. Git MultiSite enables the ideal configuration: build automation doesn’t put load on the Git nodes that service your developers, yet your build processes can still push data if necessary.

Finally, Git MultiSite’s flexible replication groups give you the ability to provide more peer nodes for build farms on busier projects. Less active projects are not replicated to as many nodes, saving valuable bandwidth and hardware resources.

Replication Groups Support Projects with Different Loads

Supporting build automation is another example of the fantastic flexibility and scalability achieved with active-active replication. If you’re ready to set up an enterprise continuous delivery system built on Git, contact WANdisco for expert advice and a free trial of Git MultiSite.