The architecture of GitLab running with Git MultiSite is worth exploring. In the interest of saving a thousand words, here’s the picture.
As you can see, the topology is quite a bit more complex when you use a Git repository management system that uses multiple data stores. Git MultiSite coordinates with GitLab to replicate all repository activity, including wiki repositories. Git MultiSite also replicates some important files like the GitLab authorization files for access control.
As for the other data stores, we’re relying on GitLab’s ability to run with multiple web apps connected to a single logical relational database and a single logical Redis database. They can be connected directly or via pass-through mirrors. Kudos to the GitLab team for a clean architecture that facilitates this multi-master setup; they’ve avoided some of the nasty caching issues that other applications encounter. This topology is in fact similar to what you can do with GitLab when you use shared storage for the repositories. Git MultiSite provides the missing link: full repository replication with robust performance in a WAN environment and a shared-nothing architecture.
Short of relying completely on Git as a data store for code reviews and other metadata, this architecture is about as clean as it gets.
Now for some nuts and bolts…
We are making some simplifying assumptions for the first release of GitLab integration. The biggest assumption is that all nodes run all the software, and that all repositories originate in GitLab and exist on all nodes. We plan to relax some of these constraints in the future.
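To make those first-release assumptions concrete, here is a minimal sketch in Python of how you might sanity-check a planned deployment against them. It is not part of Git MultiSite or GitLab; the `Node` class, the service names in `REQUIRED_SERVICES`, and the `check_first_release_assumptions` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical service names, used only for this illustration.
REQUIRED_SERVICES = frozenset({"gitlab", "git-multisite"})


@dataclass
class Node:
    """A hypothetical description of one node in the deployment."""
    name: str
    services: set = field(default_factory=set)
    repositories: set = field(default_factory=set)


def check_first_release_assumptions(nodes):
    """Return a list of violations; an empty list means the plan fits.

    Assumptions checked (taken from the text above):
      1. every node runs all the software
      2. every repository exists on every node
    """
    problems = []
    # The union of every repository seen on any node.
    all_repos = set().union(*(n.repositories for n in nodes))
    for node in nodes:
        missing_services = REQUIRED_SERVICES - node.services
        if missing_services:
            problems.append(
                f"{node.name}: missing services {sorted(missing_services)}"
            )
        missing_repos = all_repos - node.repositories
        if missing_repos:
            problems.append(
                f"{node.name}: missing repositories {sorted(missing_repos)}"
            )
    return problems
```

For example, a node that runs only `gitlab` and lacks a wiki repository that another node holds would be reported twice, once for the missing service and once for the missing repository.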
And what about performance? Well, I’m happy to report that you’ll see very good performance in all cases and much improved performance in some cases. Balancing repository activity across several nodes gives better throughput when the system is under realistic load.
Well, that picture saved a few words, but nothing speaks better than a demo or a proof-of-concept deployment. Contact us for details!