Securing Your Data with Selective Git Replication

Git MultiSite Gives You Control Over Where Your Data Ends Up

If you administer Git for anything other than a personal project, you’ll eventually have to think about replication – and then about how to keep your Git data secure while it is being replicated. Git MultiSite is the first enterprise Git management system that lets you control both where your data is available and how it can be used there.

To recap, there are a lot of reasons why you’ll want to replicate Git data:

  • You need a non-stop data solution with zero downtime (high availability and disaster recovery).

  • You support development teams at different sites and they all need good performance.

  • You’ve invested in a continuous integration system to support Agile and continuous delivery, and it’s putting a strain on your Git repositories.

  • Your company has grown by organic expansion or acquisition and your SCM infrastructure needs to scale up to support the larger user base.

Whatever the reason, you’ve realized that you need highly available Git data. Git MultiSite is the only active-active replication solution that supports truly distributed development – but putting that aside for now, you also need to think about the security of your data as it moves around the world. [1]

There are three key questions to consider.

  1. Where should each repository be available? You may have a very sensitive repository that should not be available to partner sites in different locations. You may want to limit which repositories are available in a public cloud environment that’s used for deploying production app servers; typically only the repositories that contain your runtime configuration and environment settings belong there. Alternatively, you may not need every repository available at every location and don’t want to waste the bandwidth.

  2. How is each repository being used at each site? Should a repository be writable, or should it only be available as a read-only resource for build farms and downstream consumption?

  3. How easy is it to manage the problem? As your deployment grows from a few Git repositories to a few hundred, how are you going to monitor and audit your replication strategy?

Git MultiSite has selective replication and effective management tools baked in, so it provides an out-of-the-box answer to all three questions.

Where Does This Repository Go: Defining Replication Groups

Git MultiSite lets you define one or more replication groups to manage your deployment. A replication group is a flexible way to define how the replicated peer nodes in your MultiSite deployment share data.

As a simple example, assume that the deployment has five nodes in total, one each in Boston, Seattle, London, Sydney, and Chennai. Boston and London are the primary offices; Seattle and Sydney are data centers used for deploying production app servers; and Chennai is a partner site.

I might set up three replication groups.

  • Default Group replicates to all of the development sites – Boston, London, and Chennai.

  • Proprietary Group contains repositories with sensitive IP, and only replicates to the primary offices in Boston and London.

  • Deployment Group contains repositories with runtime configuration and environment data like Puppet manifests. It replicates to the development sites and the data center sites.
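The three groups above can be sketched as a small data model. This is only an illustration of the idea, not Git MultiSite configuration or API: the names `REPLICATION_GROUPS`, `groups_at`, and `is_replicated_to` are hypothetical, and the site lists are taken directly from the example.

```python
# Illustrative model only; not Git MultiSite configuration or API.
# Site names come from the five-node example in the text.
DEVELOPMENT_SITES = {"Boston", "London", "Chennai"}
DATA_CENTER_SITES = {"Seattle", "Sydney"}

# Each replication group maps to the set of sites that receive its repositories.
REPLICATION_GROUPS = {
    "Default": DEVELOPMENT_SITES,
    "Proprietary": {"Boston", "London"},
    "Deployment": DEVELOPMENT_SITES | DATA_CENTER_SITES,
}


def groups_at(site):
    """Return the sorted names of the groups that replicate to a site."""
    return sorted(
        name for name, sites in REPLICATION_GROUPS.items() if site in sites
    )


def is_replicated_to(group, site):
    """Return True if repositories in the group are available at the site."""
    return site in REPLICATION_GROUPS[group]


if __name__ == "__main__":
    for site in sorted(DEVELOPMENT_SITES | DATA_CENTER_SITES):
        print(site, groups_at(site))
```

Under this model the partner site in Chennai sees the Default and Deployment groups but never the Proprietary Group, and the data centers see only the Deployment Group.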

Replication Groups

How Is the Repository Used: Refining Replication Groups

WANdisco’s Distributed Coordination Engine (DConE) distinguishes between several types of replicated peer nodes. The most common type is an active voter, which participates in transaction proposals and can accept write activity. Another type is a passive node, which receives all repository data but will not accept write activity.

In the example in the previous section, the two data center nodes in the Deployment Group are passive nodes. They are necessary to provide runtime data to the production servers in the data centers, but any changes are made at the development sites.
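To make the distinction concrete, here is a minimal sketch of the Deployment Group with a role per node. It models only the two node types described above, it assumes that all three development sites are active voters (the text says only that changes are made there), and the names `NodeRole`, `DEPLOYMENT_GROUP`, `accepts_writes`, and `writable_sites` are hypothetical rather than part of DConE or Git MultiSite.

```python
from enum import Enum


class NodeRole(Enum):
    """The two node types described in the text; others exist but are not modeled."""
    ACTIVE_VOTER = "active voter"  # takes part in proposals and accepts writes
    PASSIVE = "passive"            # receives all repository data, rejects writes


# Assumption: every development site is an active voter in this group.
DEPLOYMENT_GROUP = {
    "Boston": NodeRole.ACTIVE_VOTER,
    "London": NodeRole.ACTIVE_VOTER,
    "Chennai": NodeRole.ACTIVE_VOTER,
    "Seattle": NodeRole.PASSIVE,
    "Sydney": NodeRole.PASSIVE,
}


def accepts_writes(group, site):
    """Return True only if the site is a member of the group and an active voter."""
    return group.get(site) is NodeRole.ACTIVE_VOTER


def writable_sites(group):
    """Return the sorted list of sites where the group's repositories are writable."""
    return sorted(site for site in group if accepts_writes(group, site))


if __name__ == "__main__":
    print(writable_sites(DEPLOYMENT_GROUP))
```

A site that is not a member of the group is treated the same as a passive node for writes: the check simply returns False.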

Different Node Types

Management and Auditing: Easy Administration, Central View

Git MultiSite provides easy central management of replication groups. The administration console, available with proper authentication from any site, first provides a single view of all the nodes in the system.

Global View

The console provides a simple graphical tool for setting up replication groups, where you can define which nodes belong to a group and how they are used.

Managing Replication Groups

And finally, there’s a quick list of the repositories belonging to each replication group.

Replicas in Group

The entire configuration is captured in the audit logs.

Audit log snippet…

< X-Jersey-Trace-006: matched resource method: public
INFO: 4984 * Server in-bound request
4984 > GET

Total Control Over the Non-Stop Data

WANdisco provides non-stop data solutions, but we haven’t forgotten about the administration and security side of the picture. Git MultiSite gives you complete control and visibility over where and how your repositories are used.

 [1]  For the purposes of this discussion, consider the problem of secure transmission solved by Git’s use of either SSH or HTTPS as transmission protocols.
