Gerrit Code Review or GitHub’s fork and pull? Take both!

Searching Google for the keywords “Gerrit” and “GitHub” turns up lots of different links with more questions than answers; below is a selection of the most interesting ones:

In addition, Linus Torvalds, the father of the Git version control system, while keeping the Kernel source on GitHub, has explicitly expressed, in his own way, what he thinks about Pull Requests.

Google decided to use a different tool than GitHub and developed Gerrit Code Review for managing the community effort of developing the Android Operating System, mainly for two reasons:

  1. GitHub’s pull request model wouldn’t have worked for Android: forking the projects several thousand times would have been simply unsustainable. Google recognized that early on, and Gerrit was developed with the “not like GitHub pull request” requirement.
  2. GitHub is not (and today has no plans to become) OpenSource.

There are surely additional reasons as well: even today, and even if GitHub decided to become OpenSource in the future, it would still need a long list of features in order to support a large-scale cooperative project.

What is Gerrit Code Review today?

Today Gerrit is much more than the Android OS review tool! There are around 80 contributors, a number that keeps growing over time, coming from both large industries and OpenSource projects. SAP, Sony Mobile and Qualcomm IC are amongst the most active companies contributing to the tool, whilst from the OpenSource community there are LibreOffice, OpenStack and Wikimedia.

What is the right choice then? Red pill or blue pill? Open or commercial?

We thought about the problem very deeply at GerritForge.com, as some of our customers decided to quit GitHub completely, mainly for security and confidentiality reasons, while others moved in the opposite direction and embraced GitHub:Enterprise.

In a nutshell, the criteria that drove those customers in one direction (GitHub) or the other (Gerrit Code Review) were based on the following aspects.

Security.

  • GitHub: its security history is quite weak, because of an architecture mainly based on Ruby (or, let’s say, a naive implementation based on Ruby, as the language itself is not so weak from a security perspective). The problem was solved, but it raised many concerns in the industry about how many more security problems are still to be found.
  • Gerrit: completely written in Java and with security in mind. Large corporations such as SAP, Sony Mobile, Qualcomm and many other enterprises, organisations and non-affiliated individuals/volunteers have contributed to the review and development of the code-base. OpenSource and community code inspection has always been the golden rule for very secure projects (e.g. OpenSSH and OpenSSL are OpenSource and widely reviewed), and code obscurity has always been a security anti-pattern.

High availability.

  • GitHub: it has historically been very reliable, especially at the beginning. When it became popular and saw its traffic increase exponentially, it started to be rather unreliable because of repeated DDoS attacks. GitHub:Enterprise is a proprietary, locked VM that can be installed on-premises but not on a private / public Cloud.
  • Gerrit: unlike GitHub, it is not a service and can be hosted either on your private / public Cloud or on-premises. Google runs some instances in its own distributed cloud network around the world, managed with high availability in mind, for Android OS development and other OpenSource projects (and for hosting Gerrit itself, of course). Google’s deployment has not been impacted by DDoS attacks so far, and its physical deployment is protected by the standard network security of Google’s data centers. Other deployments are either private or distributed across different projects’ sites.

Usability.

  • GitHub: the key to GitHub’s success is its amazing user experience and its ability to push OpenSource development to a new level of social collaboration! We all need to be grateful to GitHub for having made OpenSource development ever more interesting and fun for the masses.
  • Gerrit: the user interface is functional but not as “shiny” or “attractive” as a modern social collaboration platform should be. In a nutshell, Gerrit does not want to be a developer’s social network but rather targets its specific objective of managing Code and Projects across large teams. This is the reason why large OpenSource communities such as the Eclipse Foundation embraced Gerrit.

Scalability.

  • GitHub: based on the C-Git implementation (using GitHub’s libgit2 library), which works very well with small repositories. However, as the number of BLOBs and Packs increases, the effort of counting them through the repository history grows linearly over time (*). With regard to the number of repositories, GitHub has proven very effective at distributing the data across its nodes and sharing BLOBs to limit the disk space needed for forked repositories.
  • Gerrit: the R&D folks working at Google have invested a lot of time in optimising JGit for large repositories and a large number of users accessing them. The latest highlight of their performance work is the JGit bitmap implementation (thanks to the fantastic work by Colby Ranger). Those optimisations, however, are not present in the C-Git code-base used by GitHub. With regard to the number of repositories, the largest installation I have ever seen has fewer than 50K projects: as far as I know, Gerrit has never been used or tested with millions of repositories.
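To illustrate why forks need not multiply disk usage: Git names every blob by a hash of its content, so an identical file in a repository and in its forks resolves to the same object ID. The sketch below computes that ID the way Git does for repositories using the SHA-1 object format. The `SharedObjectStore` class is a hypothetical name invented for this example; it is not a description of GitHub’s actual storage layer.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob (SHA-1 object format)."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by
    # the raw file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class SharedObjectStore:
    """Hypothetical object store shared by a repository and its forks."""

    def __init__(self) -> None:
        self._objects = {}  # object ID -> content

    def add(self, content: bytes) -> str:
        object_id = git_blob_id(content)
        # Identical content maps to the same ID, so it is kept only once.
        self._objects.setdefault(object_id, content)
        return object_id

    def __len__(self) -> int:
        return len(self._objects)


store = SharedObjectStore()
upstream_id = store.add(b"shared file\n")  # file in the source repository
fork_id = store.add(b"shared file\n")      # the same file in a fork
store.add(b"fork-only change\n")           # new content pushed to the fork

print(upstream_id == fork_id)  # True
print(len(store))              # 2
```

Three files were added, but only two objects are stored, because the unchanged file is shared between the repository and its fork.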

(*) Note from Shawn Pearce on this topic: “It’s just crazy slow per object, the C implementation discovers around 70k objects/second. 3M objects takes 42 seconds at best, the truth is the rate of new object discovery slows as it goes further back in history, which is why counting 3M objects takes modern machines minutes. GitHub has tried porting the bitmap code to C. It’s running in some limited cases on their site, at one time https://github.com/torvalds/linux/ had it enabled. We haven’t seen updated patches for it, and it looks like it’s disabled again.”
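The arithmetic in the note can be checked quickly. The sketch below takes the two figures quoted above (about 70k objects per second and 3M objects) and computes the best-case counting time. The 120-second figure is only an illustrative stand-in for “minutes”, not a measurement, and the function names are made up for this example.

```python
def best_case_seconds(objects: int, objects_per_second: float) -> float:
    """Time to count all objects if the discovery rate never dropped."""
    return objects / objects_per_second


def average_rate(objects: int, elapsed_seconds: float) -> float:
    """Average discovery rate implied by a given total counting time."""
    return objects / elapsed_seconds


OBJECTS = 3_000_000    # from the note above
PEAK_RATE = 70_000     # objects/second, from the note above
ASSUMED_ELAPSED = 120  # seconds; illustrative assumption for "minutes"

print(round(best_case_seconds(OBJECTS, PEAK_RATE), 1))  # 42.9
print(average_rate(OBJECTS, ASSUMED_ELAPSED))           # 25000.0
```

Under that assumed two-minute run, the average rate would be roughly a third of the peak rate, which is consistent with the note’s point that discovery slows down further back in history.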

Code Review.

  • GitHub: uses the fork + pull model. In a nutshell, every user always pushes to their own “forked version” of the repository and, once the changes are ready, asks the owner of the source repository to pull them. This works very well for projects where a single approver handles all the incoming changes, and the GitHub user interface is simply amazing in the way changes are displayed and navigated in a unified diff view, which makes reviewing multiple commits a simpler task.
  • Gerrit: being designed for projects with many contributors and committers, it does not embrace the fork + pull model at all. It would have been simply unmanageable to have hundreds of thousands of forked versions of the Android OS code-base! The Gerrit workflow is therefore mainly derived from the Android OS contribution workflow: each contribution is defined as a “Change”, has a unique ID (Change-ID) and is composed of a set of Patches (Patch-Sets) of candidate changes. When the latest Patch-Set reaches the score needed for approval (Code-Review +2 and Verified +1 for the Android OS workflow), it can be merged.
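The Change / Patch-Set workflow described above can be sketched as a small data model. This is a deliberately simplified illustration, not Gerrit’s implementation: the class and method names are invented here, the Change-ID and commit names are made up, and the label spellings (`Code-Review` for the review score, `Verified` for the validation score) are assumptions of this sketch. It also ignores negative (veto) votes and any project-specific rules.

```python
from dataclasses import dataclass, field


@dataclass
class PatchSet:
    number: int
    commit: str
    # label name -> scores given by reviewers or automated checks
    votes: dict = field(default_factory=dict)

    def vote(self, label: str, score: int) -> None:
        self.votes.setdefault(label, []).append(score)


@dataclass
class Change:
    change_id: str
    patch_sets: list = field(default_factory=list)

    def upload(self, commit: str) -> PatchSet:
        """Add a new Patch-Set; it starts with no votes of its own."""
        patch_set = PatchSet(number=len(self.patch_sets) + 1, commit=commit)
        self.patch_sets.append(patch_set)
        return patch_set

    def is_mergeable(self, required: dict) -> bool:
        """True when the latest Patch-Set reaches every required score."""
        if not self.patch_sets:
            return False
        latest = self.patch_sets[-1]
        return all(
            max(latest.votes.get(label, [0])) >= minimum
            for label, minimum in required.items()
        )


# Required scores for the Android OS workflow described in the text;
# the label spellings are an assumption of this sketch.
REQUIRED = {"Code-Review": 2, "Verified": 1}

change = Change(change_id="I0123456789abcdef0123456789abcdef01234567")  # made-up ID
first = change.upload("commit-a")   # placeholder commit name
first.vote("Code-Review", 1)
print(change.is_mergeable(REQUIRED))  # False

second = change.upload("commit-b")  # reworked Patch-Set after review
second.vote("Code-Review", 2)
second.vote("Verified", 1)
print(change.is_mergeable(REQUIRED))  # True
```

The point of the model is that the review history stays attached to one Change while the candidate commit is replaced Patch-Set by Patch-Set, and only the latest one is evaluated for merging.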

Why not use Gerrit and GitHub together?

This is not a new idea as it has been proposed and successfully implemented by some popular OpenSource projects such as:

The benefits of using both tools are twofold.

From the features and performance perspective, projects can benefit from the Gerrit JGit engine and the associated Code Review capabilities. The Gerrit Code Review model may seem less friendly than GitHub’s Pull Request, but it ultimately produces a more readable and maintainable code history, which is essential for long-term products in production.

From the point of view of accessibility and social community, using GitHub gives Wikimedia and OpenStack an extended reach and, at the same time, even off-loads all the clone traffic to GitHub nodes instead of their Gerrit servers!

Why GerritHub? What is the value added by the platform?

We thought about creating GerritHub about two years ago, when we first discussed the adoption of Gerrit for the Jenkins Continuous Integration project with Kohsuke Kawaguchi. He liked Gerrit at first sight when he joined the Git Together in Mountain View in 2011, but at the same time he was concerned about losing the reach and ease of use of GitHub.

The integration between the two tools was technically possible but challenging, and it needed a significant set of Gerrit skills to be implemented correctly, including the integration between the Pull Request model and Gerrit Code Reviews.

GerritHub is the first Gerrit-powered platform that offers the best of Gerrit 2.8 (current master release), integrated with GitHub SSO (using OAuth 2.0) and replicated to GitHub repositories and Pull Requests. Unlike the Wikimedia and OpenStack implementations, it is a self-service platform: anyone who has a GitHub account and repositories can self-register at GerritHub and use it for their own OpenSource projects!

Summary.

There is no winner in the battle between GitHub and Gerrit because they are simply different tools for different audiences. There are cases where the needs are mixed and both can provide a valid platform for the purpose of the projects.

Gerrit has historically been a niche tool, confined to Android OS development: now things are different, and major OpenSource projects have adopted it as their standard. However, a “public GitHub presence” was still needed, and it has been implemented.

GerritHub gives you the choice of taking and using the best of both!

Learn more about Gerrit Code Review and GerritHub.

Gerrit Code Review home:
http://code.google.com/p/gerrit/

One-click sign-in and auto-registration to GerritHub:
https://review.gerrithub.io/login

Book about Gerrit Code Review:
http://gerrithub.io/book
