Using MultiSite Replication to Facilitate a Global Mainline
Although Git is a distributed version control system (DVCS), it can support almost any style of software configuration management (SCM) workflow. The lines between the four prominent workflows in the Git user community can be blurry in implementation, but there are important conceptual differences between them. Understanding these differences is important when considering the use of Git workflows and continuous delivery in your organization.
After an introduction to these workflows, we’ll evaluate how they match up against continuous integration and continuous delivery best practices, and then look at their application with global software development teams.
Fork and pull
In this model, a developer forks (clones) a Git repository and works independently on a personal server-side copy. When a change is ready to contribute, the developer asks the upstream maintainers to pull it into the original repository.
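A minimal sketch of the flow, using throwaway local repositories to stand in for the hosted upstream and the contributor's fork (all paths, branch names, and commit messages are illustrative):

```shell
# Sketch of fork-and-pull using only local repositories; paths, branch
# names, identities, and messages are illustrative stand-ins.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

# The original (upstream) project, with one commit on its mainline.
git init -q "$work/upstream"
git -C "$work/upstream" checkout -q -b master
git -C "$work/upstream" commit -q --allow-empty -m "initial commit"

# The contributor forks upstream and works on an independent copy.
git clone -q "$work/upstream" "$work/fork"
git -C "$work/fork" commit -q --allow-empty -m "feature: add widget"

# The maintainer reviews, then pulls the change back into upstream.
git -C "$work/upstream" pull -q "$work/fork" master
```

After the maintainer's pull, the upstream mainline contains both the initial commit and the contributed change; no contributor ever pushed directly to the original repository.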
This model originated in open source projects and is prominent in that community. Contributors to open source projects may not even know one another and rely on a trusted set of upstream maintainers to review any contributions.
Feature branch
In this model, a new branch is made for each feature (also called a task or topic), and these branches are sometimes pushed to the shared central repository. When changes are approved, they are merged to the mainline (master) branch.
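As a minimal sketch in a throwaway local repository (branch and commit names are illustrative), the lifecycle of a feature branch looks like this:

```shell
# Sketch of the feature branch workflow in a throwaway local repository;
# branch and commit names are illustrative.
set -e
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q "$repo"
git -C "$repo" checkout -q -b master
git -C "$repo" commit -q --allow-empty -m "initial commit"

# Isolate the new work on a short-lived feature (topic) branch.
git -C "$repo" checkout -q -b feature/login-form
git -C "$repo" commit -q --allow-empty -m "add login form"

# Once approved, merge to the mainline and delete the branch.
git -C "$repo" checkout -q master
git -C "$repo" merge -q --no-ff -m "merge feature/login-form" feature/login-form
git -C "$repo" branch -q -d feature/login-form
```

The branch exists only for the duration of the work and its review; once merged, nothing is left behind to drift out of date.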
This model suits many small teams, as they are able to collaborate in a single shared repository yet still isolate new work to an individual or a small group. Functionally it is very similar to fork and pull, but a feature branch usually has a shorter lifespan than a forked repository.
Mainline
In this model, most work is committed directly to the trunk (master branch). There are few, if any, long-lived branches less stable than the trunk; long-lived branches are sometimes used for release maintenance. Developers are encouraged to commit to the trunk frequently, perhaps daily. Local branches and stashes can be used for pre-flight review and builds, but are not promoted to the shared repository.
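A minimal local sketch of the trunk-based rhythm, with illustrative commit messages and a stash standing in for unpromoted local work:

```shell
# Sketch of mainline (trunk-based) development in a throwaway local
# repository; commit messages are illustrative.
set -e
trunk=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com

git init -q "$trunk"
git -C "$trunk" checkout -q -b master
git -C "$trunk" commit -q --allow-empty -m "initial commit"

# Small, frequent commits land directly on the trunk.
git -C "$trunk" commit -q --allow-empty -m "refactor: extract parser"
git -C "$trunk" commit -q --allow-empty -m "feat: date parsing (behind toggle)"

# Unfinished local work stays in a stash, never in a shared branch.
echo "work in progress" > "$trunk/notes.txt"
git -C "$trunk" stash push -q -u -m "wip notes"
```

The trunk accumulates small, complete increments, while anything not ready to share remains purely local.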
The mainline model is strongly recommended in continuous integration and continuous delivery paradigms. It encourages very frequent reconciliation of new work, preventing any buildup of merge debt. Following this model, work is merged and kept up to date on a regular basis, and is available for testing and possible deployment.
The mainline model scales to large teams in enterprise settings but requires a high level of development discipline. For example, large new features must be decomposed into small incremental changes that can be committed rapidly. Furthermore, incomplete work may need to be hidden behind configuration or feature toggles.
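A feature toggle can be as simple as a flag read from configuration or the environment. A minimal sketch, assuming a hypothetical FEATURE_ISO_DATES flag and two hypothetical output formats:

```shell
# Hypothetical feature toggle: the new code path ships on the trunk but
# stays dark until the flag is switched on in configuration.
render_date() {
  if [ "${FEATURE_ISO_DATES:-false}" = "true" ]; then
    echo "2024-01-31"     # new, still-incomplete code path
  else
    echo "31 Jan 2024"    # stable code path every user sees today
  fi
}

default_output=$(FEATURE_ISO_DATES=false; render_date)   # -> 31 Jan 2024
toggled_output=$(FEATURE_ISO_DATES=true;  render_date)   # -> 2024-01-31
```

Because the flag defaults to off, the incomplete feature can merge to the trunk daily without ever being visible to users until it is finished.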
There is often only a fine distinction between practical use of the mainline model and the feature branch workflow. If feature branches are personal, local, and short-lived, they are consistent with the mainline model. However, a formal promotion process (merge request), as opposed to a pure push, can slow the pace of commits: if every developer commits once a day, every one of those commits needs a human review.
Git flow
“Git flow” is a popular model developed by Vincent Driessen. It recommends a long-lived development branch containing work in progress, a stable mainline, and feature, hotfix, and release branches as necessary. It is somewhat similar to a mainline model with long-lived integration branches and feature branches.
Unlike the mainline model, however, Git flow violates some of the precepts of continuous integration. Notably, work may be left on the development branch or feature branches, unintegrated with the latest changes on the mainline, for long periods of time. Nonetheless, this model is often a comfortable transition for teams new to Git and continuous integration. It may also feel more natural for products with a clear distinction between stable development and production code, as opposed to SaaS products that deliver new changes daily.
Application to Continuous Delivery
Continuous delivery treats each commit as a potential release candidate. Building on continuous integration principles, each commit is merged into the trunk and subjected to a progressively more rigorous series of test and verification steps. For example, a commit may run through a pre-flight build, unit testing, component testing, performance testing, staging deployment, and production deployment. The later stages are more expensive and time consuming, and may even involve human review. A commit that passes all the stages is available to deploy (but is only deployed when the business is ready); a failure must be addressed as soon as possible.
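The staged vetting described above can be sketched as a simple driver loop. The stage names and the always-passing run_stage stub are assumptions standing in for real build and test tooling:

```shell
# Sketch of a staged verification pipeline: cheap stages run first, and
# the first failure stops the pipeline so expensive stages never run.

run_stage() {
  # Stub that always passes; a real pipeline would run the actual
  # build/test command for the named stage here.
  true
}

run_pipeline() {
  for stage in "pre-flight build" "unit tests" "component tests" \
               "performance tests" "staging deployment"; do
    if ! run_stage "$stage"; then
      echo "failed at: $stage"
      return 1
    fi
  done
  echo "release candidate: available to deploy"
}

result=$(run_pipeline)   # -> release candidate: available to deploy
```

Ordering the stages from cheapest to most expensive means a broken commit is rejected with the least possible spend, while a commit that survives every stage becomes a deployable release candidate.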
Viewed in another light, continuous delivery tries to reduce isolation by vetting and surfacing new work as quickly as possible. Important new features are not hidden in forks or branches for weeks; they are integrated, tested, and made available to the business as soon as possible.
As noted earlier, the mainline model is best suited to continuous delivery and is strongly recommended in the literature. Eliminating long-lived development branches ensures that every change is tested and integrated quickly, delivering value to the business frequently. It also enforces good habits, like decomposing stories and features into incremental tasks that are less likely to cause breakages.
The fork and pull model can leave changes isolated in other repositories for long periods of time, and often involves a gated promotion process. It is the workflow least suited to rapid development in large enterprise teams.
The feature branch workflow occupies a middle ground. If the feature branches are local and short lived, they effectively serve as private staging areas. The promotion process (merge request) should be automated as much as possible with little human intervention.
Git flow is a workable model but introduces a second long-lived branch, putting distance between development and deployment.
Consider adoption of the mainline development model as advocated by the continuous integration paradigm. Committing once a day to the trunk is a sea change for developers used to working on isolated branches (or forks) for long periods of time. Though developers may be skeptical, the risk and discomfort are mitigated by:
Being able to run rigorous pre- and post-commit tests, provided you have the latest code and dependencies and can rely on fast continuous integration.
Being able to pull updates quickly several times a day.
Being able to commit quickly, particularly if a prior commit introduced a breakage and you must fix it or roll back.
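The first of these points can be wired directly into Git with a pre-commit hook. A minimal sketch in a throwaway repository, with a hypothetical run-fast-tests.sh stub standing in for a real fast test suite:

```shell
# Sketch: a pre-commit hook that gates every commit on a fast test run.
# The run-fast-tests.sh stub stands in for a real test suite.
set -e
gated=$(mktemp -d)
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$gated"
git -C "$gated" checkout -q -b master

cat > "$gated/run-fast-tests.sh" <<'EOF'
#!/bin/sh
exit 0   # stand-in: a real suite would run unit tests here
EOF
chmod +x "$gated/run-fast-tests.sh"

# Hooks run from the top of the working tree before the commit lands;
# a nonzero exit status from the hook aborts the commit.
cat > "$gated/.git/hooks/pre-commit" <<'EOF'
#!/bin/sh
exec ./run-fast-tests.sh
EOF
chmod +x "$gated/.git/hooks/pre-commit"

git -C "$gated" add run-fast-tests.sh
git -C "$gated" commit -q -m "gated commit: tests passed"
```

A hook like this only helps the mainline model if the suite it runs is fast; a slow gate discourages exactly the frequent small commits the model depends on.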
Reducing the risk and discomfort of the mainline model imposes several demands of this nature on the SCM system. These demands are even more challenging when you are working with several teams in different locations; you have many more contributors, and the product is assembled from multiple components.
These scaling and infrastructure challenges highlight the isolation that often arises from working in a large distributed environment. Data may be local to, or effectively mastered at, one site, and all the complications of working over a WAN hinder performance and slow the development tempo.
Global development of complex projects is common in enterprise software and complicates the adoption of continuous delivery. For a set of large distributed teams to adopt continuous delivery and the mainline model, they must have the tools to overcome data isolation of all kinds:
A version control infrastructure that allows a developer at any site full access to the latest source code with the ability to commit frequently.
The ability to set up continuous integration (build and test) infrastructure that operates well under heavy load at multiple locations.
Support for tens or hundreds of repositories containing the product components, configuration data, environment settings, and other necessary material.
In short, the mainline model reduces the isolation introduced by non-optimal codeline models (i.e., new work lingering in long-lived branches), making sure that new work is available quickly. Development teams need the support of a solid SCM infrastructure to adopt the mainline model and avoid the isolation that often comes with large distributed teams.
Solving Continuous Delivery Challenges for Global Development with MultiSite Replication
An SCM system that only functions well in a LAN environment under moderate load will not suffice for global development projects. A simple master-slave data replication scheme will not overcome the complexities of operating in a large distributed environment.
Only a true active-active replication system can scale up an SCM system to cope with continuous delivery for a global distributed software organization. With active-active replication as provided by WANdisco’s family of MultiSite products, each node in the system is a peer, usable for any operation at LAN speeds.
With an active-active replication system, teams at all sites are first class citizens and can use and access key data with no latency bottlenecks.
Likewise, additional peer nodes can handle the load imposed by larger teams of contributors and the associated build and test automation.
Since the system is self-healing, with automated failover and high availability, there is no risk of downtime due to maintenance windows, hardware failures, or network outages.
Selective replication means that an administrator can choose which repositories are replicated to which sites. For example, repositories containing production environment data might be replicated only to sites that interact with runtime servers.
The MultiSite administration console provides global visibility across all servers and repositories, making it easier to coordinate a product assembled from several components kept in separate repositories.
Conclusion
Git can support many development workflows. The mainline model is considered optimal for continuous delivery.
The code in the SCM system delivers value to the business when it is available to the customer. Continuous delivery is a set of practices designed to reduce the isolation of the data and get it to customers sooner. Active-active replication fully supports the mainline model and other continuous delivery best practices by making the data available when and where it is needed throughout the delivery pipeline.