14 years of JGit/EGit Code Reviews migrated to GerritHub

Posted on November 21, 2023 by Git and Gerrit Code Review for the Enterprise

21 November 2023 (Sunnyvale, CA) – GerritForge Inc., the leader in Gerrit Code Review Enterprise Support, has successfully re-hosted the Eclipse JGit/EGit projects on GerritHub.io, preserving 14 years of repository history, including all changes, reviews and comments. Everything that was produced and historically available on the https://git.eclipse.org/r website is now fully available on https://eclipse.gerrithub.io.

From repo.or.cz to Eclipse

Shawn Pearce (RIP) started the JGit project back in 2006 on repo.or.cz and later joined Google in 2008 where he was given the task to adapt the Gerrit Rietveld Code Review tool for the development of the Android Operating System.

Later, in 2009, Shawn started the dogfooding practice by re-hosting the project on a Gerrit Code Review instance, kindly offered to the Eclipse Foundation to self-host the Eclipse plugin for Git (i.e. EGit) and its 100% pure Java implementation of the Git protocol and data format (i.e. JGit). The URL of the self-hosted dogfooding Gerrit instance was https://egit.eclipse.org, which was later exposed as https://git.eclipse.org/r.

Here is the first Gerrit change, https://git.eclipse.org/r/c/egit/egit/+/1, hosted on the first Gerrit Code Review server, which Shawn Pearce and Matthias Sohn ran themselves on a vserver provided by the Eclipse Foundation.

Since then, the Gerrit Code Review project has massively evolved, and Google adopted the tool for all its Open-Source projects in a highly available multi-site and multi-domain setup across the globe. Noteworthy examples are https://gerrit-review.googlesource.com, https://android-review.googlesource.com and https://chromium-review.googlesource.com.

Project growth on Eclipse

The Eclipse Foundation started to encourage all of its projects to adopt Gerrit Code Review, which became the main hub where all the other Open-Source components and contributors were uploading their code and collaborating.

Today, the https://git.eclipse.org/r site hosts over 1300 repositories and tens of thousands of contributors and reviewers.

The risks of the announced shutdown

The Eclipse Foundation started looking at more comprehensive hosting solutions, well beyond pure Git hosting and the associated Code Review, including GitHub and GitLab, and started using them side-by-side with the existing https://git.eclipse.org/r.
In November 2021, the organisation decided to shut down the Gerrit Code Review instance, offering migration to either GitHub or GitLab as the alternatives.

Although both GitHub and GitLab would have offered to keep the code history of all projects, the review information would have been completely lost. Gerrit Code Review has a JSON format (code-named NoteDb) for storing all the review comments together with the repository so that code and review meta-data can be kept safe in the same place. However, GitHub and GitLab have a more traditional relational DBMS approach and would have been unable to render Gerrit’s NoteDb.

Had the projects migrated to GitHub or GitLab, three main issues would have arisen:

All the review history would have been formally accessible in the repository but not visible on the GitHub or GitLab UI
All associations between the NoteDb data and the committers’ identity would have been lost.
New reviews of the code developed on GitHub or GitLab UI would have been stored on a server-side relational DBMS.

GerritForge offers to rescue 14 years of review data

GerritForge, the largest contributor to the Gerrit Code Review project outside of Google, leader of the Gerrit Code Review Enterprise Support, launched a new dogfooding project called GerritHub.io back in 2013 with the aim of providing the richer Code Review experience of Gerrit on top of every GitHub repository.

The main goal of GerritHub.io was to enable anyone who has a public or private repository on GitHub to use Gerrit Code Review on top of their existing data. All the authentication, authorisation and publishing of the repository stay on GitHub, whilst GerritHub.io provides the Code Review and collaboration experience.

Because the Eclipse Foundation offered GitHub as one of the alternatives to https://git.eclipse.org/r, GerritHub.io was the most likely candidate to achieve a win-win situation:

The Eclipse Foundation’s win: they can shut down https://git.eclipse.org/r and save on hosting and maintenance costs.
The projects’ win: all their repositories move to GitHub, and the existing 14 years of review history, together with all new reviews, remain accessible through GerritHub.io.

The migration project from git.eclipse.org/r to eclipse.gerrithub.io

The migration journey started six months ago, when Matthias Sohn, the project leader of JGit and EGit, announced on the Eclipse Foundation issue tracker that he was planning to use GerritHub.io as the Code Review frontend for his projects migrated to GitHub.

The project was made possible thanks to the introduction of the “importing feature” in Gerrit v3.7, which allows projects to be moved between Gerrit instances while keeping their change numbers, account identity mapping and all associated review data.

Using existing GitHub projects on GerritHub.io is straightforward, and anyone can get started in a matter of minutes; however, the Eclipse Foundation case was more complex because of multiple additional requirements:

Custom validation of the authors of incoming Git commits against the Eclipse ECA policies. The Foundation had developed a custom plugin on Gerrit Code Review that needed to be amended to be suitable for a shared-hosting platform like GerritHub.io.
Virtual isolation of the Eclipse Foundation projects from all the other 56k repositories on GerritHub.io. All the repositories that were migrated from the legacy https://git.eclipse.org/r needed a new “home page” in GerritHub.io called https://eclipse.gerrithub.io
The Eclipse Foundation needed the configuration of specific OAuth scopes and permissions tailored to the roles of the Eclipse Foundation contributors and reviewers.

Last but not least, the migration from https://git.eclipse.org/r to https://eclipse.gerrithub.io needed to be completed with zero downtime and minimal disruption for the existing committers and contributors to the project. Therefore, a classic “big-bang” migration with a planned outage was not an option.

Gerrit multi-site and the enablement of smooth migration paths

Gerrit Code Review has been multi-site at Google for many years, but that deployment was limited to the forked version hosted in Google’s data centres.
GerritForge and the rest of the Open-Source community have invested a lot in publicly available multi-site support since 2018, which can now provide an equivalent solution on standard infrastructure, leveraging an off-the-shelf global-refdb and events-broker.

Being multi-site means that the “logical domain” (e.g. eclipse.gerrithub.io), instead of being served by a set of hosts in a single data centre, can point to different locations across the globe, all active at the same time and accepting read/write operations, such as Git push, clone, fetch and code reviews. The full design of the solution is available on the multi-site plugin repository.

When two users are pushing code at the same time to two different sites, Gerrit will check the destination refs against the SHA1 stored in the global-refdb and will coordinate the transactions to avoid ending up in a split-brain situation. Synchronisation between sites is achieved using the pull-replication plugin.
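
The coordination described above can be illustrated with a minimal Python sketch. It is not the multi-site plugin's implementation: the GlobalRefDb class, its compare_and_put method, the in-memory dictionary and the SHA-1 values are all hypothetical stand-ins for a shared ref database. The only idea shown is that a site may advance a ref only if the value it last saw still matches the globally recorded one.

```python
import threading


class GlobalRefDb:
    """Hypothetical in-memory stand-in for a shared ref database."""

    def __init__(self):
        self._refs = {}  # (project, ref) -> SHA-1 string
        self._lock = threading.Lock()

    def get(self, project, ref):
        with self._lock:
            return self._refs.get((project, ref))

    def compare_and_put(self, project, ref, expected_sha1, new_sha1):
        """Update the ref only if the stored value equals expected_sha1."""
        with self._lock:
            if self._refs.get((project, ref)) != expected_sha1:
                return False
            self._refs[(project, ref)] = new_sha1
            return True


def push(refdb, site, project, ref, local_sha1, new_sha1):
    """Simulate a push received by one site (site names are illustrative)."""
    if refdb.compare_and_put(project, ref, local_sha1, new_sha1):
        return f"{site}: accepted {new_sha1}"
    return f"{site}: rejected, local ref is out of date"


refdb = GlobalRefDb()
refdb.compare_and_put("jgit", "refs/heads/master", None, "aaa111")

# Both sites last saw aaa111, so only the first push can win.
result_us = push(refdb, "review-am", "jgit", "refs/heads/master", "aaa111", "bbb222")
result_eu = push(refdb, "review-eu", "jgit", "refs/heads/master", "aaa111", "ccc333")
print(result_us)
print(result_eu)
```

In this sketch the rejected site would have to catch up with the other site's update before the user retries, which is the role the text assigns to replication between sites.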

Gerrit Code Review is designed to be future-proof, thanks to a clear separation and contract between the front end and the backend REST-API. That allows a smooth blue-green migration between releases because every release of Gerrit is forward and backwards compatible with its next release +1. For example, GerritHub.io is running two different versions of Gerrit Code Review on different sites as we speak: v3.8.2 in the US and Canada (https://review-am.gerrithub.io) and v3.9.0-rc5 in Europe (https://review-eu.gerrithub.io), without anyone noticing any disruption. Each site progresses towards newer releases bi-weekly whilst the overall service remains active.

Project-based migration from git.eclipse.org to eclipse.gerrithub.io

Gerrit projects include all the commits and meta-data in the same repository and, therefore, have the perfect design to allow an easy migration between servers. However, there are some gotchas:

Every Gerrit server has a server-id associated with it, which is used to “tag” every change. That prevents Gerrit from parsing and indexing data that does not necessarily belong to the server.
Every NoteDb meta-data record is strictly decoupled from any Personally Identifiable Information (aka PII), including the full name and e-mails of the authors, committers, owners and reviewers of the changes under review. The lookup between the anonymised identity (aka account-id) and the PII is contained in a centralised repository called ‘All-Users.git’, which isn’t accessible.
Every change has a unique incremental number associated with it, the change number. The numbering sequence is unique per Gerrit server, but when moving projects between different servers, you may have numbering conflicts.

Luca Milanesio and Matthias Sohn, both maintainers of the Gerrit Code Review project, have cooperated to find solutions to all three problems and have included them in Gerrit v3.7 onwards.

GerritForge has configured the server ID of git.eclipse.org as an “external imported server ID” so that every project coming from the Eclipse Foundation can be parsed and indexed. Its review metadata is rendered on the UI.

The identities are mapped using the public REST-API https://git.eclipse.org/r/accounts/NN/detail, which allows the association of GerritHub users with the legacy Eclipse Foundation account IDs matched by e-mail address.
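
A rough Python sketch of this e-mail based matching follows. It makes several assumptions: the sample payloads are made up and only imitate what an account detail response might contain, the `_account_id` and `email` field names are taken from Gerrit's account JSON as commonly documented, and the map_accounts helper and the GerritHub-side lookup table are hypothetical. The sketch works on canned strings instead of calling the real endpoint.

```python
import json

# Gerrit prefixes JSON responses with this line to protect against XSSI.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip the XSSI-protection prefix, then parse the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def map_accounts(legacy_bodies, gerrithub_ids_by_email):
    """Return {legacy account id: GerritHub account id}, matched by e-mail."""
    mapping = {}
    for body in legacy_bodies:
        account = parse_gerrit_json(body)
        email = account.get("email", "").lower()
        if email in gerrithub_ids_by_email:
            mapping[account["_account_id"]] = gerrithub_ids_by_email[email]
    return mapping


# Made-up sample data standing in for legacy account detail responses.
legacy_bodies = [
    ")]}'\n" + json.dumps({"_account_id": 11, "email": "Dev.One@example.org"}),
    ")]}'\n" + json.dumps({"_account_id": 12, "email": "dev.two@example.org"}),
]
# Hypothetical GerritHub-side lookup: lower-cased e-mail -> account id.
gerrithub_ids_by_email = {"dev.one@example.org": 5001}

account_mapping = map_accounts(legacy_bodies, gerrithub_ids_by_email)
print(account_mapping)
```

Legacy accounts without a matching e-mail on the GerritHub side are simply left out of the mapping in this sketch; how the real import treats them is not covered here.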

With regards to the change numbers, the legacy sequence numbers coming with https://git.eclipse.org/r are in conflict with the changes on GerritHub.io; see, for example, https://review.gerrithub.io/5819 and https://git.eclipse.org/r/5819, both valid change numbers but pointing to different projects on different servers.
GerritForge has developed a new ad-hoc plugin to allow existing URLs, previously pointing to https://git.eclipse.org/r, to continue to work as expected on the projects migrated to eclipse.gerrithub.io.
The plugin has a full list of the legacy URLs on https://git.eclipse.org/r and performs the correct redirect to the full equivalent project / change on eclipse.gerrithub.io.
For example, https://git.eclipse.org/r/5819 and https://eclipse.gerrithub.io/5819 are both referring to the same Change-Id:Iff84409c of the JGit project.
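
The redirect logic can be pictured with the following Python sketch. It is only an illustration of a lookup-table redirect, not the plugin's code: the LEGACY_CHANGES table, the redirect_target function, the target project name eclipse-jgit/jgit and the exact target URL layout are assumptions made for the example.

```python
import re

# Hypothetical lookup table: legacy change number -> project on GerritHub.io.
LEGACY_CHANGES = {5819: "eclipse-jgit/jgit"}

# Accepts both the short form /r/<number> and the long form /r/c/<project>/+/<number>.
LEGACY_URL = re.compile(r"^https://git\.eclipse\.org/r/(?:c/.+/\+/)?(\d+)/?$")


def redirect_target(legacy_url):
    """Return the new URL for a legacy change URL, or None if it is unknown."""
    match = LEGACY_URL.match(legacy_url)
    if not match:
        return None
    number = int(match.group(1))
    project = LEGACY_CHANGES.get(number)
    if project is None:
        return None
    return f"https://eclipse.gerrithub.io/c/{project}/+/{number}"


short_form = redirect_target("https://git.eclipse.org/r/5819")
long_form = redirect_target("https://git.eclipse.org/r/c/jgit/jgit/+/5819")
unknown = redirect_target("https://git.eclipse.org/r/999999")
print(short_form)
print(long_form)
print(unknown)
```

Keeping the project name in the table is what lets the same change number resolve unambiguously, even though that number also exists for an unrelated project on the main GerritHub.io domain.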

eclipse.gerrithub.io as a Gerrit Code Review multi-tenant domain

Gerrit Code Review has secretly supported multi-tenant domains for over a decade; however, that was implemented in a private fork at Google and only in their data centres, as Patrick Hiesel presented at the Gerrit User Summit 2017 in London.

The Open-Source version does not have support for multi-tenancy in the Gerrit core. However, I developed a minimalistic solution six years ago that would give the “user experience” of virtual hosting on Gerrit.
The idea behind the solution is quite simple: hide unwanted projects based on the full domain name, pretty much like the virtual hosts work on the HTTP Servers world.

For example, you could define eclipse.gerrithub.io as follows:

 [server "eclipse.gerrithub.io"]
  projects = eclipse-jgit/*
  projects = eclipse-egit/*
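
To make the filtering idea concrete, here is a small Python sketch of a host-based visibility filter driven by the same patterns as the configuration above. It is not the virtual-host libmodule's code (which is written in Java); the VIRTUAL_HOSTS dictionary, the visible_projects function and the sample project names are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the configuration above: host name -> patterns of visible projects.
VIRTUAL_HOSTS = {
    "eclipse.gerrithub.io": ["eclipse-jgit/*", "eclipse-egit/*"],
}


def visible_projects(host, all_projects):
    """Return the projects a user should see when browsing the given host."""
    patterns = VIRTUAL_HOSTS.get(host)
    if patterns is None:
        # No virtual host defined for this domain: nothing is hidden.
        return list(all_projects)
    return [
        project
        for project in all_projects
        if any(fnmatchcase(project, pattern) for pattern in patterns)
    ]


# Illustrative project names only.
all_projects = ["eclipse-jgit/jgit", "eclipse-egit/egit", "some-org/some-repo"]

eclipse_view = visible_projects("eclipse.gerrithub.io", all_projects)
main_view = visible_projects("review.gerrithub.io", all_projects)
print(eclipse_view)
print(main_view)
```

The filter only hides projects from listings on the virtual host; it does not replace the normal access controls, which matches the description of it as a visibility setting.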

Shawn himself was stunned when he saw the source code of the virtual-host libmodule back in 2017, with the comment “how did I end up writing so much code, if you did everything in just 7 Java classes?”

To be fair, the solution Shawn implemented on review-*.googlesource.com was a lot more comprehensive than the virtual-host libmodule, because it also included the ability to have different gerrit.config per tenant, whilst the solution implemented on GerritHub.io is a simple extra permission filter applied based on the domain name.

That means that all the Eclipse repositories are effectively available on any of the GerritHub.io sites and also accessible with the main domain URL https://review.gerrithub.io; the filtering on the virtual host is purely a visibility setting that prevents users coming from the Eclipse Foundation from being overwhelmed by the other 50k projects hosted on GerritHub.io.

The advantage is that all the current GerritHub.io sites replicate the Eclipse Foundation’s repositories, thereby providing additional redundancy to the overall setup. All commits pushed to any of the repositories on eclipse.gerrithub.io will also be replicated to all sites, including the ones NOT starting with eclipse.gerrithub.io. Thanks to this redundancy, all the projects hosted on GerritHub.io can benefit from an astonishing 99.997% availability, well above any other free Git hosting sites for Open-Source available right now.
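
To put the availability figure into perspective, the short Python calculation below converts a percentage into the downtime it allows. It assumes the figure is measured over a full non-leap year, which the text does not state; the function name is made up for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes of unavailability per year implied by an availability figure."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


downtime = downtime_minutes_per_year(99.997)
print(f"{downtime:.1f} minutes per year")
```

Under that assumption, 99.997% corresponds to roughly a quarter of an hour of unavailability per year.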

What’s next for the other 1,300 repositories on git.eclipse.org?

The work done to migrate the JGit and EGit projects to https://eclipse.gerrithub.io lays the groundwork for many more repositories and projects to follow the same path and keep their review history before the Eclipse Foundation shuts down the legacy git.eclipse.org site.
The scope definition, the user accounts association, and the provisioning of users and projects are going to be exactly the same for any other project that wants to move and keep its history.

Once all the projects are migrated, the Eclipse Foundation can define a redirection rule that serves all the incoming requests to https://git.eclipse.org/r and redirects them to https://eclipse.gerrithub.io.

Lessons learnt and takeaway for other migrations

Migrating projects between Gerrit instances was declared impossible just a few years ago; however, that was the end goal of the whole Gerrit NoteDb project. Shawn Pearce used to say that he “would like to make all his reviews locally on his laptops and just push code and reviews once they landed”, making the Code Review an integral part of the Git data format.

The success of this migration project is the demonstration that Shawn’s vision was really innovative and, thanks to the cooperation of the community, projects can last and persevere well beyond the boundaries and lifetime of the people who initially founded them.

Migrating projects and consolidating Gerrit Servers is not something that is only applicable to this example of the Eclipse Foundation server shutdown, but can be further applied to other domains and use cases.
Companies are constantly changing, splitting and merging; projects need to follow the organisation and also move between Gerrit Servers and domains.

All the innovations introduced in Gerrit v3.7 and beyond can serve as an example of the implementation of a different migration path compared to the traditional big-bang approach.

One important lesson from the Eclipse Foundation’s experience is that every migration comes with many small but important details: all of them need accurate evaluation, implementation and testing. Upfront planning is needed; however, many more details are often found along the migration path, making it difficult to estimate correctly all the associated efforts and costs. Migrating is like daily exercise: the first round feels lengthy and challenging, but the following rounds can reuse the tools and experience gained in the previous migrations.

Lastly, this exercise has shown how important it is to keep the project’s history for planning its future. It would have been unthinkable for the JGit/EGit projects to continue developing without being able to leverage the learnings, discussions and experience from the past.

“The Code Review history is our legacy; learning from our past gives us direction for our future.”

Luca Milanesio
GerritForge, Inc. – CEO and CTO
Gerrit Code Review Maintainer
Gerrit Release Manager
Member of the Gerrit Engineering Steering Committee

Gerrit: 2021 in review

Posted on December 30, 2021 by Git and Gerrit Code Review for the Enterprise

Yet another year has passed for the Gerrit Code Review project with many challenges posed by the COVID-19 pandemic, new exciting releases, and the most popular Gerrit User Summit with the largest audience ever in its 12 years of history.

2021 in numbers

93 registered attendees to the Gerrit Virtual User Summit 2021, connecting from 56 companies over 17 countries, 14 talks showcased by 15 presenters over 2 days
1 Gerrit Contributors’ Summit
35 releases of which 2 major versions (v3.4.0 and v3.5.0.1) and 33 patches
107 contributors from 32 organizations, merging 4763 changes to 84 projects

The Gerrit Code Review community has shown resiliency during these difficult times, with outstanding participation in the events organized during the year, all remote and lacking the much-needed face-to-face interaction.

2021 vs. 2020 trends

Commits: -26%
Projects: -16%
Contributors: -30%
Companies: -41%
Average changes/contributor: +10%

Two years of pandemic have taken their toll on engagement, with fewer organizations willing to invest time in contributing to Gerrit, possibly also impacted by the uncertainty of the future. 2021 has also been the first whole year of the project without David Pursehouse, one of the Gerrit project’s top three contributors. He used to contribute 1.5k changes per year, which alone would easily explain the drop observed.

On the bright side, the contributors that continued over the year 2021 have shown an increased commitment as the number of active projects and commits has dropped less than the contributors, increasing the change/contributor rate compared to 2020.

Major organisations contributing to Gerrit in 2021

Google is confirmed to be the leading force of the Gerrit Code Review project, with over 62% of the changes merged, while GerritForge continues to be the #1 contributor from the rest of the community. There are a couple of pleasant surprises among the contributors.

The Wikimedia Foundation is confirmed as the #2 contributor from the community, with all changes provided by Paladox, who was made a Gerrit Maintainer in November.
SAP continues to be a strong contributor, just below the Wikimedia Foundation, with Thomas being made a Gerrit Maintainer in November.
Qualcomm is back on the shortlist of the top maintainers, with many new names in the list of contributors, well done!

Top-ten projects with major activity in 2021

gerrit (2,903 changes)
plugins/code-owners (447 changes)
jgit (287 changes)
plugins/task (83 changes)
plugins/multi-site (57 changes)
aws-gerrit (44 changes)
modules/cache-chroniclemap (40 changes)
plugins/checks (39 changes)
plugins/high-availability (39 changes)
plugins/replication (38 changes)

The first surprise is that the code-owners, the emerging star of the Gerrit plugins, received a massive investment of effort from Edwin (Google), who contributed 89% of the changes to it. The code-owners plugin has also been presented at the Gerrit Virtual User Summit 2021 and attracted the community’s attention.

The second surprise is the decline in contributions to the jgit project during the past two years: from 820 changes/year, it is now down to 374 changes in 2021.

Task is now the #2 plugin project in terms of merged changes in 2021. Qualcomm keeps the project’s full ownership with 98.9% of changes in 2021.

GerritForge confirm their commitment to improving Gerrit Multi-Site, as its plugin is the #3 in terms of changes merged in 2021.

Aws-gerrit is a relatively new project, presented less than two years ago by GerritForge, who contributed over 99% of the changes. It has proven to be a very active project that has helped the Gerrit Code Review open-source project deploy and test well-known “recipes” of infrastructure setups and see how Gerrit performs and works on those. Many bugs have been detected and identified before release thanks to the aws-gerrit project and its CI integration.

The cache-chroniclemap module was very active in 2021, with 40 changes, all provided by GerritForge. This relatively new module allows existing Gerrit setups to increase the overall performance of all persistent caches, which are vital in reducing the REST-API latency across all Gerrit features.

The checks plugin was deprecated back in 2020. It still shows significant changes and investment from Google in supporting the new Gerrit checks-API and UI. However, the rate of contributions is in steep decline, down from the 324 changes in 2019 when it was still an actively developed project.

The last two plugin projects in the top ten are the replication and high-availability plugins, which have received major contributions from Qualcomm, GerritForge, Google and Ericsson.

Top events in 2021

The Gerrit Code Review community abandoned the idea of a face-to-face event in 2021 because of the continued global pandemic of COVID-19.
Instead, there were two separate virtual events for sharing the news of what is happening on the platform and the expectations from the community.

Virtual Gerrit Contributors’ Summit – 9th of June

The summit was organized by the Gerrit Community Managers and had an amazing audience amongst the contributors. The presentations showed what different teams are working on and were recorded in the summit notes:

GerritForge’s initiative of making Gerrit Code Review a cloud-native service
SAP’s work on supporting case-insensitive usernames in Gerrit v3.5
Recent performance improvements in JGit by Google
docker-based replicator service by SAP
Han-Wen’s work and the current status of integrating reftable in Git
Edwin showed his demo on the code-owners plugin
Google’s work on replacing Prolog with composable Submit Requirements in Gerrit
Google’s improvements on the Gerrit UI
Han-Wen showed how Google is managing consistency using GCloud pubsub

Gerrit Virtual User Summit 2021 – 2-3 of December

It was the first entirely virtual User Summit in the history of the Gerrit Code Review project. The challenges were multiple, including the limitations of allowing up to 100s of attendees, shortening the overall time to 3h x 2 days, and still allowing some interactions between the audience and the presenters. After two years of silence, we have finally received some user stories of using Gerrit in the wild.

The Summit received overwhelmingly positive feedback and was rated 7.9/10, a fantastic achievement. The quality and interest of the talks scored even higher, reaching 8.2/10.

The talks have been fully recorded and published on the GerritForge TV channel:

Luca (GerritForge) and Milutin (Google) presented what’s new in Gerrit v3.4 and v3.5
Google showed the brand-new Checks UI and its capabilities using the Checks API
Google also introduced the current status of the Submit Requirements and how the forthcoming Gerrit v3.6 will allow fully Prolog-less submit rules
Edwin demoed the code-owners and its shiny UI
Paul (CUE open-source project), showed how they use Gerrit hosted by GerritHub.io and the GitHub actions for their CI validation pipeline.
Ian Gauthier (Flywheel.io), presented his study results on the effectiveness of using historical reviews as criteria for selecting reviewers for new changes.
Qualcomm presented the status of their migration endeavor from v2.7 to the latest version of Gerrit, and the associated performance improvements contributed to the Gerrit open-source project.
Luca (GerritForge) presented his work and discoveries on Gerrit’s bottlenecks with large mono-repos and some ideas on overcoming some of them.
Shane McIntosh (University of Waterloo – Canada), presented the research work of his Software REBELs group on data analysis on code reviews.
Marcin (GerritForge) showed the work done in implementing a brand-new plugin that promises msec latencies for replicating repositories 1000x faster than the current replication plugin.
Tony (GerritForge) presented the aws-gerrit project, and the recent improvements in integrating Gerrit Code Review API and Git/HTTP calls with AWS’s X-Ray for performance and latency analysis.
Ponch (GerritForge) also presented how to leverage AWS’s Kinesis Streams for sending Gerrit’s stream events using a reliable cloud-native pubsub system.

It was definitely a lot of information and sharing, which showed that the Gerrit Code Review open-source project is alive and active more than ever.

Gerrit features highlights in 2021

Gerrit Code Review saw major innovations developed and decisions made during 2021. See below a short recap of the ones that represent a turning point in the evolution of the Gerrit open-source project. Some of them are considered breaking changes and, therefore, need careful analysis and a planned upgrade path.

Speed up of Gerrit upgrade from v2.7 to the latest version

2021 has seen a significant increase in the cooperation and contributions of Qualcomm to the rest of the Gerrit Code Review community, focussed on the speed-up of the Gerrit upgrade process from v2.7 to v3.5.
The contributions and cooperation have brought many improvements to JGit and Gerrit and will allow many more companies to migrate faster and smoother than ever before.

Goodbye to Java 8

From Gerrit v3.5 onwards, the source code and binaries of Gerrit Code Review won’t be compatible with Java 8 anymore.

JSch SSH library is completely removed from Gerrit Code Review

The quirks and obsolescence of the JSch library have cursed Gerrit’s destiny for years. Thanks to Thomas Wolf (Paranor), JGit moved away from it and rebuilt all its Git/SSH stack on top of Apache Mina. That has made it possible to remove the JSch library from the Gerrit dependencies and use the Apache Mina SSHD client stack instead.

ElasticSearch is removed from Gerrit Code Review

On the 2nd of February 2021, Elasticsearch B.V. changed its license model and abandoned the Apache 2.0 open-source license for the new versions of ElasticSearch v8 and over.

Gerrit cannot include or require any commercial product not released under one of the open-source licenses allowed by the project. Based on a recent survey sent to the community, the ElasticSearch backend has not been widely used anyway; therefore, the ESC decided on the 3rd of November that the ElasticSearch backend will be removed from Gerrit core and moved into a libModule.

Submit Requirements waving goodbye to Prolog

Since the 16th of December, the Gerrit Code Review project no longer uses Prolog rules for its own submit rules. The support for Prolog-less submit rules is now mature and will be part of the forthcoming v3.6 release in 2022.

What’s coming in 2022?

The future of Gerrit Code Review is bright and full of innovative ideas and improvements on the overall development and CI/CD lifecycle. With the forced remote working of millions of developers worldwide, more and more companies are looking at how to make remote interactions more useful and fruitful, reducing friction and making the workflow smoother and faster than ever.

Stay tuned and keep on using and contributing to Gerrit Code Review, one of the most innovative and productive platforms for code review and collaboration.

Happy New Year, Gerrit Code Review

Posted on December 31, 2019 by Git and Gerrit Code Review for the Enterprise

It has been a hectic and productive year for us at GerritForge and for the Gerrit Code Review Community.
We want to take this opportunity to recap some of the milestones of 2019 and the exciting perspectives for 2020 and beyond.

Gerrit Code Review, 2019 in numbers

Gerrit had over 120 contributors from all around the world, coming from 33 different companies and organisations, which is excellent. There is a robust 6% increase in the number of commits (+231 commits) but a reduction in the number of contributors (-7 authors).

With regards to the overall trend of commits during the year, the success of the Gerrit User Summit 2019 in Sunnyvale is visible, with an increase of the rate of commits around October/November.

Top-three projects of 2019

Gerrit (1,626 commits) is, of course, the most active project. However, it is visibly down in terms of number of commits from 2018 (-19%). That is a consequence of the shift of focus to the other two key components listed below, which are available as plugins and therefore not counted in the gerrit core repository statistics.
Checks (315 commits) is the brand-new 1st class CI integration API for external build systems, such as Jenkins and Zuul. It is incredible how in just 12 months it has become robust and fully mature. It is currently used for the validation of all changes on the Gerrit project.
Multi-site (234 commits) is the long-awaited support for Gerrit that everyone has been waiting on for years. It is finally available for all active and supported versions (from 2.16 onwards).

Top-three companies contributing to Gerrit

Google is, with no surprise, still the top contributor of the Gerrit project overall. It is basically stable from 2018 (around 43%) as a confirmation of the continued commitment to the project.
GerritForge is growing significantly in the contribution to the project, with exactly half of the contributions of Google. This is a significant result from 2018 with a 7% growth of involvement.
CollabNet is sliding to the 3rd position (it was 2nd in 2018) with a 3% decrease of contributions. As a notable mention, however, David Pursehouse from CollabNet is still the #1 maintainer in terms of number of commits.

Even if it is outside the top three contributing companies, SAP deserves a special mention for its continuous involvement in the JGit project, which is the basis of the Gerrit engine, and its fantastic engagement in improving the Gerrit CI system and integrating it with the checks plugin.

Top-three achievements from GerritForge

The outstanding results of contributions of GerritForge in 2019 have been focused on three major topics.

Gerrit multi-site, released and production ready

We released the Gerrit Multi-Site plugin, allowing seamless balancing in a distributed environment, a technologically highly advanced development, crucial for very distributed companies. See https://gerrit.googlesource.com/plugins/multi-site for more information.

Gerrit User Summits in Europe, USA and streaming

We successfully organised and executed the Gerrit User Group in Europe and the US. The event was very well received by the community with an overall attendance of some 87 on-site and 38 in streaming. Have a look at https://gitenterprise.me/2019/12/23/gerrit-user-summit-survey/ for interesting feedback on those from the attendees.
We opened our own local office in Sunnyvale, in the heart of Silicon Valley. A crucial move to better serve our ever-expanding US customer base.

Gerrit Analytics for the Android Open-Source Project

We kickstarted the Gerrit Analytics for the Android open-source project initiative: after the successful adoption of the automatic collection of code metrics on the Gerrit project (see https://analytics.gerrithub.io) the Android team asked GerritForge to start working on extracting the same metrics from their code.

What’s coming in 2020

Gerrit v3.2 is currently under development and is planned to be released around April/May 2020. It represents a major milestone for the Gerrit project with the support for Java 11 and large JVM heaps, up to hundreds of GBytes. Gerrit v3.2 is definitely the release that everyone who has a big repository (mono-repos) should target as their next upgrade. See the Gerrit roadmap at https://www.gerritcodereview.com/roadmap.html for more details about the planned features.

More work and improvements on the checks plugin are planned, with the aim of fully integrating it into everyone’s user-journey and their CI/CD pipeline. Our first blog post of 2020 will cover how to use Jenkins and the Checks plugin together with GerritHub.io.

Multi-site and HA will become more integrated with Gerrit, with the aim of moving parts of their technologies (e.g. global ref-db) into JGit and thus used in Gerrit core.

The Gerrit User Summit 2020 will continue the experiment of cross-pollination with other communities, after the success of the interactions with the JGit and OpenStack communities in 2019. Bazel is the next target, as it is used as the de-facto standard build system for Gerrit and its plugins.

Again, best wishes from your friends at GerritForge; we look forward to a continuing successful partnership in the coming years.

Luca Milanesio
Gerrit Maintainer, Release Manager and member of the ESC.

Gerrit: OpenSource and Multi-Site

Posted on March 2, 2019 by Git and Gerrit Code Review for the Enterprise

One more recording from the Gerrit User Summit 2018 at Cloudera in Palo Alto.

Luca Milanesio, Gerrit Code Review Maintainer and Release Manager, presented the current status of the support for multi-master and multi-site setups with the standard OpenSource Components, developed by GerritForge and the Gerrit Code Review Community.

Introduction

The focus of this talk is sharing with you an experience that we had with the Gerrit server that we maintain, GerritHub.

First of all, I’m just going to tell you how we went through the journey from a single master-slave installation back in 2013 to a fully multi-site setup across two continents.

The evolution of GerritHub to multi-site

GerritHub was born in November 2013. The idea was straightforward. It was just an idea on how to take a single Gerrit server and put the replication plug-in to push to GitHub.

To implement a good, scalable and reliable architecture, you don’t need to design everything up front. At the beginning of your journey, you don’t know who your users are, how many repos they are going to create, what the traffic looks like, what the latency looks like: you know nothing.

You need to start small, and we did back in 2013, with a single Gerrit master located in Germany, because we had no idea where the users would come from.

Would the people in Europe like it, or rather the people in the U.S. like it, or again the people in China like it? We did not know. So we started with one in Germany.

Because we wanted to make a self-service system, what we did was very simple: a simple plugin called “the GitHub plug-in”. That was just a wizard to add an entry in the replication config.

You have Gerrit incoming traffic, then you configure the replication plugin, and eventually you push to GitHub. The only complicated part here is that if you do it as a Gerrit administrator, you have to define these remotes in the replication.config, but you can express them in an optimized way. On a self-service system, you’ve got 1000s of people who will then create 1000s of remotes automatically. Luckily, the replication plugin works very well and was able to cope with it.
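
As an illustration of what such a wizard could generate, here is a minimal Python sketch that renders one remote section per user repository. The `url`, `push` and `projects` keys are settings of the replication plugin; the remote naming scheme, the GitHub SSH URL layout and the `render_remote` helper are assumptions made for this example, not the actual output of the GitHub plugin.

```python
def render_remote(user: str, repo: str) -> str:
    """Render one [remote] section, in git-config syntax, for a user's repository."""
    name = f"{user}-{repo}"  # hypothetical naming scheme, one remote per repository
    lines = [
        f'[remote "{name}"]',
        f"\turl = git@github.com:{user}/{repo}.git",
        "\tpush = +refs/heads/*:refs/heads/*",
        "\tpush = +refs/tags/*:refs/tags/*",
        f"\tprojects = {user}/{repo}",  # limit this remote to a single project
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Thousands of self-service users simply mean thousands of such sections.
    print(render_remote("alice", "demo"))
```

An administrator who controls the whole configuration would more likely write a single remote that covers many projects, which is the "optimized way" mentioned above.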

Moving to Canada

Then we evolved. The reason why we changed is that people started saying “Listen, GerritHub is cool, I can use it, it works well in the morning. Why is it so slow in the afternoon?”. Uh oh.

We needed to do some data mining to see precisely who was using it, where they were coming from, and what operations they were doing. Then we realized that we had chosen the wrong location: we had put the Gerrit master in Germany, but the majority of people were coming from the USA.

Depending on how the backbone across the Atlantic Ocean was performing, GerritHub could be faster or slower. One of the complaints was that in the morning GitHub was slower than GerritHub, but in the afternoon it was exactly the opposite.

We were doing some performance tuning and analyzing the traffic, and even when people were saying that it’s very slow, actually GerritHub was a lot faster than GitHub in terms of throughput. The problem was the number of hops between the end user and GerritHub.

We decided that we needed to move from Germany to the other side of the Atlantic Ocean. We could have moved the service to the USA, but we decided to go with Canada because the latency was precisely the same as hosting in the USA and it was less expensive.

What we could have done is just to move the Master from Germany to the other side of the Atlantic Ocean, but because, from the beginning, we wanted to give a service that is always available, we decided to keep both zones.

We didn’t want to have any downtime, even in this migration. We wanted, definitely, to do one step at a time. No changes in releases, no changes in configurations, only moving stuff around. Whenever you change something, even if it’s a small release change, you change the function, and that has to be properly communicated.

If we had changed data center and version at the same time and something had gone wrong, we would have been in doubt about the cause. Would it be the new version that is slower or the new data center that is slower? You don’t know. If you change one thing at a time, it must be that thing that isn’t working.

We did the migration in two steps.

Step-1: The Gerrit master stayed in Germany, still replicating to GitHub, and the new master in Canada was just one extra replication end.
The traffic was still coming to the German master, but it was replicated to both Canada and GitHub. While we were doing all the testing and waiting for that to be stable, the other master was used as if it were a Gerrit slave. It was not a slave, though: all the nodes were masters, just with a different role.

Step-2: Flip the switch, so that the Gerrit master is in Canada. When replication was online and everything was aligned, we put a small read-only plugin on the German side, which made the whole node read-only for a few minutes, to give the last replication queue time to drain. When the replication queue was drained, we flipped the switch, and the traffic going to the new master was already read/write.
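
The order of operations in Step-2 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Site`, `cut_over` and the in-memory queue are hypothetical stand-ins, not Gerrit or replication plugin APIs, and a real cut-over also involves switching the traffic at the proxy or DNS level.

```python
from collections import deque


class Site:
    """A hypothetical stand-in for one Gerrit node and the refs it holds."""

    def __init__(self, name, refs=None, read_only=False):
        self.name = name
        self.refs = dict(refs or {})
        self.read_only = read_only


def cut_over(old, new, pending):
    """Move write traffic from `old` to `new` once replication has caught up."""
    old.read_only = True            # 1. stop accepting writes on the old master
    while pending:                  # 2. let the last replication queue drain
        ref, sha = pending.popleft()
        new.refs[ref] = sha
    if new.refs != old.refs:        # 3. refuse to switch if the sites differ
        raise RuntimeError("sites are not aligned, keep traffic on " + old.name)
    new.read_only = False           # 4. the new master is already read/write
    return new


if __name__ == "__main__":
    germany = Site("germany", {"refs/heads/master": "b2"})
    canada = Site("canada", {"refs/heads/master": "a1"}, read_only=True)
    active = cut_over(germany, canada, deque([("refs/heads/master", "b2")]))
    print("writes now go to", active.name)
```

The point of the read-only window is step 2: no new writes can arrive while the queue drains, so the two sites are guaranteed to be aligned when the switch is flipped.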

People didn’t change their domain and didn’t notice any difference, apart from the much-improved performance. The feedback was like “Oh my, what have you done to GerritHub since yesterday? It’s so much faster. It has never been so fast before.”
Because it was the same version, and we were testing in parallel for quite some time, nobody had a single disruption.

Zero-downtime migration leveraging multi-site

But that was not enough, because we wanted to keep Gerrit always up and running and always up to date with the latest and greatest version. Gerrit is typically released twice a year; however, the code-base is stable in every single commit.
However, we were still forced to do the ping pong between the two data centers when we were doing our roll out. It means that every time that an upgrade was done, users had a few minutes of read-only state. That was not good enough, because we wanted to release more frequently.

When you upgrade Gerrit within the same release, let’s say between 2.15.4 and 2.15.5, the process is really straightforward, because you just replace the .war, restart Gerrit, done.

However, if you don’t have at least two nodes on either side, you need to ping-pong between the two different data centers and then apply the read-only window, which isn’t great.
We started with a second server on the central node so each node can deal with the entire traffic of GerritHub. We were not concerned about the German side, because we were just using it as disaster recovery.

Going multi-site: issues

We started doubling the Canadian side with one extra server. Of course, if you do that with version v2.14 which problem do you have?

Sessions. So, how do you share the sessions? If you log in to one Gerrit server, you create one session, then you go to the other and you don’t have a session anymore.
Caches. That is easier to resolve, you just put the TTL of the cache to a very low value, put some stickiness, you may sort this out. However, cache consistency is another problem and needs to be sorted.
Indexes are the very painful one, because, at that time there was no support for ElasticSearch. Now things are different, but back in 2017, it wasn’t there.
What happens is that every single node has its own index. If an index entry is stale, it’s not right until someone is going to re-index.

The guys from Ericsson were developing a high availability plugin. We said instead of reinventing the wheel, why don’t we use a high availability plugin? We started rolling it out to GerritHub and actually, the configuration is more complex and looks like this one.

So, imagine that in Canada you’ve got two different masters, and still only one in Germany. They align the consistency of the Lucene index and the cache through the HA plugin and they still use the replication plugin.

How do you share the repository between the two? You need to have a shared file system. We use exactly the same one used by Ericsson, NFS.

Then, for exposing the service we needed HAProxy, not just one but at least two. If you put one HAProxy, you’re not HA anymore, because if that HAProxy dies, your service goes down. So, you have two HAProxy instances and they must have a cross configuration: both of them can redirect traffic to the first master or to the second master. It’s not one primary and one backup: they have exactly the same role. They do exactly the same thing, they contain exactly the same code, they’ve got exactly the same cache, exactly the same index. They’re both running at the same time. They’re both accepting traffic.

This is something similar to what Martin Fick (Qualcomm) did, I believe, last year, with the only difference that they did not use HAProxy but only DNS round-robin.

Adoption of the high-availability plugin

Based on the experience of running this configuration on GerritHub, we started contributing a lot of fixes to the high-availability plugin.

A lot of people are asking “Oh, GerritHub is amazing. Can you sell me GerritHub?”. I reply with “What do you mean exactly?”

GerritHub is just a domain name that I own, with Gerrit 2.15, plus a bunch of plugins: replication plugin, GitHub plugin, the high-availability plugin (we use a fork), the web session flat file and a bunch of scripts to implement the health check.

If you guys want to do your own, the same configuration, you don’t need to buy any commercial product. We don’t sell commercial products. We just put the ideas into the OpenSource community to make it happen. Then if you need Enterprise Support, we can help you implement it.

The need for a Gerrit disaster-recovery site

Then we needed to do something more because we had one problem. OVH had historically been very reliable, but, of course, shit happens sometimes.
It happened one day that OVH network backbone was down for a few hours.

That means that any server that was on that Data-Center was absolutely unreachable. You couldn’t even connect to them, you couldn’t even check their status, zero. We turned the traffic to the disaster recovery side, but then we faced a different challenge because we had only one master.
It means that if something happens to that master, I don’t know, a peak, or it becomes a little bit unhealthy, then we are going to have an outage. We didn’t want to risk having an outage in that situation.

So we moved to Germany with two servers, OVH with two servers, and afterward, we migrated to Gerrit v2.15 and NoteDb.

Your disaster recovery side is never really safe until you need it and use it. So use it all the time, on a regular basis. This is what we ended up implementing.

We now have two different data centers, managed by two different cloud providers. One is still OVH in Canada, and the other is Hetzner in Germany. We still have the same configuration, HA plugin over a shared NFS, so this one is completely replicated into the disaster recovery site, and we are using the disaster recovery site continuously to make sure that it is always healthy and aligned.

Leverage Gerrit-DR site for Analytics

Because we didn’t want to serve actual user traffic on the disaster recovery site, because of the synchronization lag between the two sides, we ended up using it for all the data mining activities. There are a lot of things that we do on data, trying to understand how our system performs. There is a universe of data that you typically either never look at or don’t really extract and process in the right way.

Have you ever noticed how much data Gerrit generates under the logs directory? A tremendous amount of data, and that data tells you exactly the stories that you want to know. It tells you exactly how Gerrit works today, what you need to do to make sure that Gerrit will work tomorrow, how functions are performing for the end users, if something has blown up: it’s all there.

We started working in that DevOps Analytics space a long time ago, and we started providing those metrics and insights for the Gerrit Code Review project itself, reporting them back to the project through the service https://analytics.gerrithub.io.

Therefore we started using the disaster recovery site for analytics traffic. If I extract and process data from my logs and my activities, is there really a need to analyse the visit of 10 seconds ago? A small time lag on data doesn’t make any difference from an analytics perspective.

We were running Gerrit v2.15 here, so the HA plugin needed to be radically different from the one that exists today. We are still heavily reliant on the HA plugin fork, but all the changes have been pushed for review to the high-availability plugin.

However, the solution was still not good enough because there were still some problems. The problem is that the HA plugin within the same data center relies on the shared file system. We knew within the same data center that was not a problem. But, what about creating an NFS across data-centers in different continents? It would never work effectively because of the latency limitations.

We then started with a low-tech solution: rely on the replication plugin for the Git data in the repository. Then, every 30 minutes, a cronjob checked the consistency between the two sites and did a delta re-index between them.
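
The talk does not say how the cronjob detected what was out of date, so the following Python sketch is only one plausible way to compute such a delta. It assumes the job can obtain a ref-to-SHA-1 map from each site (for instance from `git ls-remote`), and it relies on the standard Gerrit layout of change refs, `refs/changes/NN/CHANGE/PATCHSET`. The function name is hypothetical.

```python
def changes_to_reindex(local_refs, remote_refs):
    """Return the change numbers whose refs differ between the two sites."""
    stale = set()
    for ref in local_refs.keys() | remote_refs.keys():
        if local_refs.get(ref) == remote_refs.get(ref):
            continue  # same SHA-1 on both sides, nothing to do
        parts = ref.split("/")
        # Only change refs matter here: refs/changes/NN/CHANGE/PATCHSET (or /meta)
        is_change_ref = (
            len(parts) == 5 and parts[:2] == ["refs", "changes"] and parts[3].isdigit()
        )
        if is_change_ref:
            stale.add(int(parts[3]))
    return sorted(stale)


if __name__ == "__main__":
    canada = {"refs/changes/34/1234/1": "aaa", "refs/changes/34/1234/2": "bbb"}
    germany = {"refs/changes/34/1234/1": "aaa"}
    print(changes_to_reindex(canada, germany))
```

Only the changes returned by the comparison need to be re-indexed, which is what keeps the job a delta rather than a full re-index.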

Back in Gerrit v2.14, I also needed to do a database export and import, because the database contained the reviews.

But there was also the timing problem: if a disaster occurred, people would have to wait for half an hour to get the data after the re-index.
They would not have lost anything, because the index can be recreated at any time, but the user experience was not ideal. And also, you’ve got the DNS-related issues when going from one zone to the other.

Sharding Gerrit across sites

First of all, we want to leverage sharding based on the repository, available from Gerrit 2.15, which includes the project name in each page or REST-API URL. That allows achieving basic sharding across different nodes even with a simple OpenSource HAProxy or other load balancer, without the magic of Google’s intelligent Gerrit/Git proxy. Bear in mind this is not pure sharding, because every node keeps all the repositories available. HAProxy is going to be clever and, based on the project name and action, will use the most appropriate backend for the read and write operations. In that way, we make sure that you never push to the same branch of the same repository from two different nodes concurrently.
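
In the setup described here the routing decision lives in HAProxy; the Python below only illustrates the decision itself, not an HAProxy configuration. It assumes Git-over-HTTP requests (`/<project>/info/refs?service=...`, `/<project>/git-receive-pack`) and change pages of the form `/c/<project>/+/<number>`. The `WRITE_SITE` table, the site names and both function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical table: which site accepts writes for each repository.
WRITE_SITE = {"eclipse/jgit": "site-eu", "android/platform": "site-us"}
DEFAULT_WRITE_SITE = "site-us"


def classify(url, method="GET"):
    """Return (project, is_write) for a request; project is None if unknown."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("/a/"):                        # authenticated URL prefix
        path = path[2:]
    match = re.match(r"^/c/(.+?)/\+/\d+", path)       # change page with project name
    if match:
        return match.group(1), method != "GET"
    match = re.match(r"^/(.+?)/info/refs$", path)     # Git smart HTTP discovery
    if match:
        service = parse_qs(parts.query).get("service", [""])[0]
        return match.group(1), service == "git-receive-pack"
    match = re.match(r"^/(.+?)/git-(upload|receive)-pack$", path)
    if match:
        return match.group(1), match.group(2) == "receive"
    return None, method != "GET"


def choose_backend(url, method, local_site):
    """Serve reads locally; send writes to the site that owns the repository."""
    project, is_write = classify(url, method)
    if not is_write:
        return local_site
    return WRITE_SITE.get(project, DEFAULT_WRITE_SITE)


if __name__ == "__main__":
    push = "https://example.org/eclipse/jgit/git-receive-pack"
    print(choose_backend(push, "POST", local_site="site-us"))
```

Because every push for a given repository ends up on the same site, two nodes never update the same branch concurrently, while fetches and page views stay on the local site.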

How does the magic work? The replication plugin takes care of syncing the nodes. Historically with master-slave, the synchronization was unidirectional. With multi-site instead, all masters are replicating to each other. It means that the first master will replicate to the second and the other way around. Thanks to the new URL scheme, Google made our lives so much easier.

Introducing the multi-site plugin

We have already started writing a brand-new plugin called multi-site. Yes, it has a different name, because the direction that it’s taking is completely different in terms of decisions compared to the high-availability plugin, which required having a shared file system. Secondly, the HTTP synchronous communication with the other site was not feasible anymore. What happens if, for instance, a remote site in India is not reachable for 50 seconds? I still want events and data to be put into a persistent queue and eventually sent to the remote site.

For the multi-site broker implementation, we have decided to start with Kafka, because there is already an implementation of the stream events for it in Gerrit as a plugin. The multi-site plugin, however, won’t be limited to Kafka but could be extended to support other popular brokers such as NATS.
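
To make the store-and-forward idea concrete, here is a deliberately tiny Python sketch. It is not how Kafka or the multi-site plugin works internally: the class name, the file names and the event fields are all assumptions for illustration. Events are appended to a file, and a delivery offset only advances when the remote site accepts an event, so an unreachable site delays delivery without losing anything.

```python
import json
import os


class StoreAndForwardQueue:
    """File-backed queue: events survive restarts and remote outages."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "events.jsonl")
        self.offset_path = os.path.join(directory, "delivered.offset")

    def publish(self, event):
        """Persist the event first; delivery happens later in flush()."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(event) + "\n")

    def _delivered(self):
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def flush(self, send):
        """Deliver pending events in order; stop at the first failure."""
        if not os.path.exists(self.log_path):
            return 0
        with open(self.log_path, encoding="utf-8") as log:
            events = [json.loads(line) for line in log if line.strip()]
        start = self._delivered()
        sent = 0
        for event in events[start:]:
            try:
                send(event)
            except ConnectionError:
                break  # remote site unreachable: keep the rest for the next attempt
            sent += 1
            with open(self.offset_path, "w", encoding="utf-8") as f:
                f.write(str(start + sent))
        return sent
```

Calling `flush` periodically with a function that posts to the remote site gives eventual delivery; a real broker adds what this sketch leaves out, such as partitioning, multiple consumers and retention.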

We will go with location-aware DNS because we want users to access the HAProxy that is closest to them. If I am living in Germany, maybe it makes sense for me to use only the European servers, but definitely not the servers in California.

You will go by default to a “location-aware” site that is closer to you. However, depending on what you do, you could still be redirected later on to another server across zones.

For example, if I want to fetch data, I can still fetch everything on my local site. But, if I want to push data, the operation can either be executed locally or forwarded to the remote site, if the target repository has been sharded remotely.

The advantage is the location-awareness and data-locality. The majority of data transfer will happen with my local site because, at the end of the day, the major complaint of people using Gerrit with remote masters is a sluggish GUI.

If you have Gerrit master/slave, for instance, a Gerrit master in San Francisco and your Gerrit slaves in India, you have the problem that everyone from India will still have to access a remote GUI from the server in San Francisco, and thus would experience a very slow GUI.

Even if only 10 percent of your traffic is a write operation, you would suffer from all the performance penalties of using a remote server in San Francisco. However, if I have an intelligent HAProxy, with also an intelligent logic in Gerrit that understands where the traffic needs to go to, I can always talk to the local server in my zone. Then, depending on what I do, I can use my local server or a remote server. This happens behind the scenes and is completely transparent to the user.
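
A back-of-the-envelope calculation shows why this matters. The round-trip times below are made-up numbers chosen only for illustration, and the model is the worst case for multi-site, where every write has to cross the ocean; the 10 percent write share is the figure used in the talk.

```python
def mean_latency_ms(write_percent, local_ms, remote_ms):
    """Average latency when reads are served locally and writes go remote."""
    read_percent = 100 - write_percent
    return (read_percent * local_ms + write_percent * remote_ms) / 100


# Hypothetical round-trip times for a user in India.
LOCAL_MS = 20      # to a server in the same zone
REMOTE_MS = 250    # to a server in San Francisco

if __name__ == "__main__":
    # Remote master only: every GUI request pays the remote round trip.
    print("remote master:", REMOTE_MS, "ms per request")
    # Multi-site: 90% of requests stay local, 10% are forwarded.
    print("multi-site:", mean_latency_ms(10, LOCAL_MS, REMOTE_MS), "ms on average")
    # multi-site: 43.0 ms on average
```

With these assumed numbers the average request is several times faster, even though the writes themselves are no quicker than before.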

Q&A

Q: I just wanted to ask if you’re ignoring SSH?

SSH is a problem in this architecture, right, because HAProxy supports SSH, but has a big problem.

Q: You don’t know what projects, you don’t have any idea of the operation, whether it’s read or write.

Exactly. HAProxy works at the transport level, which means that it sees the flow of encrypted data going back and forth to the Gerrit server, but cannot see anything in the clear and thus cannot make an educated decision based on it. Your data may end up being forwarded to the zone where your traffic is not optimized, and you will end up waiting a little bit more. But bear in mind that the majority of complaints about Gerrit master/slave are not really about the speed of the push/pull but rather about the sluggish Gerrit GUI.

Of course, we’ll find a solution to the SSH problem as a Community, and I would be more than happy to address that in multi-site as well.

Q: Solution is to get a better LAN because we don’t have any complaints about the GUI across the continents.

Okay, yeah, get a super fast network across the globe, so everyone and everywhere in the world will be super fast accessing a single unique central point. However, not everyone can have a super-fast network. For instance in GerritHub, we don’t own the infrastructure so we cannot control the speed. Similarly, a lot of people are adopting a cloud infrastructure, where you don’t own and control the infrastructure anymore.

Q: Would you also have some Gerrit slaves in this setup. Would there be several masters and some of the slaves? Or everything will become just a master?

In multi-site architecture, everything can be a master. The distinction between master and slave does not exist anymore. Functionally speaking, some of them will be master for certain repositories and slave for other repositories. You can say that every single node is capable of managing any type of operations. But, based on this smart routing, any node can potentially manage everything, but effectively forwards the calls based on where is best to execute them.
If with this simple multi-site solution you can serve 99% of the traffic locally, and for that one percent we accept the compromise of going a bit slower, then it could still sound very good.

We implemented this architecture last year with a client that is distributed worldwide and it just worked, because it is very simple and very effective. There isn’t any “magic product” involved but simply standard OpenSource components.

We are on a journey, because as Martin said last year in his talk, “Don’t try to do multi-master all at once.” Every single situation is different, every client is different, every installation is different, and all your projects are different. You will always need to compromise, find the trade-off, and tailor the solution to your needs.

Q: I think in most of the cases that I saw up there, you were deploying a Gerrit master in multiple locations? If you want to deploy multi-master in the same sites, like say I want two Gerrit servers in Palo Alto, does that change the architecture at all? Basically, I am referring to shared NFS.

This is actually the point made by Martin: if you have the problem on multi-site, it’s not a problem of Gerrit, it’s a problem of your network because your network is too slow. You should solve the network, you shouldn’t solve this problem in Gerrit.
Actually, the architecture is this one. Imagine you have a super fast network all across the globe, everyone is reaching the same Gerrit in the same way. You have a super fast direct connection with your shared NFS, so you can go and grow your master and scale horizontally.

The answer is, for instance, yes, if your company can do that. Absolutely, in that case I wouldn’t recommend doing multi-site at all. But if you cannot do anything about speed, and you’ve got people that work remotely and, unfortunately, you cannot go and put a cable between the U.S. and India under the sea just for your company, then maybe you want to address the problem in a multi-site fashion.

One more thing that I wanted to point out is that the multi-site plugin makes sense even on the same site. Why is this picture better than the previous one? I’m not talking about multi-site, I’m talking about multi-master in the same data center. Why is this one better?

You’ll read and write into both. Read and write traffic goes to both. The difference between this one and the previous one is that there is no shared file system on this one. In the previous one, even within the same data center, if you use a shared file system, you still have a single point of failure. Because, even if you are willing to buy the most expensive shared file system with the most expensive network and system that exists in the world, it will still fail. Reliability cannot be simply resolved by throwing more money at it.
It’s not the money that makes my system reliable. Machines and disks will fail one day or another. If you don’t plan for failures, you’re going to fail anyway.

Q: Let’s say when you’re doing an upgrade on all of your Gerrit masters, right? Do you have an automatic mechanism to upgrade? I’m coming from a perspective where we have our Gerrit on Kubernetes. So, sure, if I’m doing a rolling upgrade, it’s kind of like a piece of cake, HAProxy is taken care of by confd and all that stuff, right?
In your situation, what would an upgrade look like, especially if you’re trying to upgrade all the masters across the globe? Do you pick and choose which site gets upgraded first and then let the replication plugins catch up after that?

First of all, the rolling upgrade with Kubernetes will work if you do minor upgrades. If you’re doing major Gerrit upgrades, it doesn’t. This is because that changes the data, right? With the minor upgrades, you do exactly the same here. Even if the others in the other zones have a different version, they are still interoperable, right, because they talk exactly to the same schema, exactly to the same API, exactly in the same way.

Q: I guess taking Kubernetes out of the picture, even, so with the current multi-master setup you have, if you’re not doing just a trivial upgrade, how do you approach that usually?

If you are doing a major version upgrade, that is going to be orchestrated exactly like the GerritHub upgrade from 2.14 to 2.15. You need to use what I called a ping-pong technique. You basically need to do it across data centers.
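
The rule of thumb from these answers fits in a few lines of Python. The sketch assumes, based on the examples given in the talk (2.15.4 to 2.15.5 versus 2.14 to 2.15), that the first two components of the version number identify the release; the function names are hypothetical.

```python
def release_line(version):
    """Return the (major, minor) pair that identifies a Gerrit release line."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_strategy(current, target):
    """Pick a rolling upgrade within a release line, ping-pong otherwise."""
    if release_line(current) == release_line(target):
        return "rolling"      # replace the .war and restart, node by node
    return "ping-pong"        # upgrade one data center while the other serves


if __name__ == "__main__":
    print(upgrade_strategy("2.15.4", "2.15.5"))
    print(upgrade_strategy("2.14.8", "2.15.1"))
```

Within a release line the nodes stay interoperable, so they can be upgraded one at a time; across release lines the data changes, which is why the upgrade has to alternate between data centers.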

It’s not fully automated yet. I’m trying to automate it as much as possible and contribute back to the community.

Q: In the multi-master, when you’re doing the major upgrade, and even in the ping pong, so if the schemas changed, and you’re adding the events to replication plugin, are you going to temporarily suspend replication during the period of time? Because the other items on the earlier version don’t understand that schema yet. Can you explain that a little?

Okay, when you do the ping-pong that was in this morning’s presentation, what happens is that you upgrade the first node and interrupt the traffic there. It falls behind the original master, but it catches up with replication, and you do all the testing you want.

With regards to the replication events, they are not understood by clients such as the Jenkins Gerrit trigger plugin. That point was raised this morning as well. If you go to YouTube.com/GerritForgeTV, there is the recording of my talk from last year about a new plugin that is not subject to this fluctuation of Gerrit version.

Luca Milanesio – Gerrit Code Review Maintainer and Release Manager