About Git and Gerrit Code Review for the Enterprise

Official blog for GerritForge

What’s new in Gerrit Code Review v3.2


Gerrit Code Review is unstoppable: despite the recent COVID-19 pandemic and the cancellation of the Spring Hackathon 2020, the community has made an extraordinary effort to deliver the Gerrit v3.2 release remotely and on time on the 1st of June.

GerritForge has already migrated GerritHub.io on day one of the release and is happy to share with you the highlights of this new release. If you need help assessing your current setup and migrating, please get in touch with us at https://gerritforge.com/contact.

Get ready to migrate: get rid of zombie comments

The migration process performs the cleanup of the zombie draft comments in the All-Users.git repository that have been left behind since the introduction of NoteDb back in v2.16.
Every user commenting on any change creates a series of commits on the All-Users.git repository, where the draft comments are stored. Once the comments were finalised and applied to the change, they were not fully removed from All-Users.git. That created a backlog of zombie comments in All-Users.git, which are now completely removed during the Gerrit v3.2 migration process.

Since Gerrit v2.16.16, there is a standalone utility to remove the zombie draft comments. You may want to run it upfront so that the migration to v3.2 does not have a lot of processing to do during the init step. Also, make sure that All-Users.git resides on a fast local filesystem to minimize the migration time.

If you do nothing, the cleanup utility will be automatically executed when migrating to Gerrit v3.2, bearing in mind that it may take quite a long time to complete. In our tests, it took around 10 minutes for 10k zombie comments.

WARNING: the execution time is not linear and it may take up to 48h of processing time for a staggering number of 1M zombie comments.
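As a reference, here is a minimal sketch of the offline cleanup-and-init flow described above, assuming a site on a fast local disk; the site path and the war file name are placeholders, and the exact invocation of the standalone cleanup utility should be taken from the v2.16.16+ release notes rather than from this sketch.

GERRIT_SITE=/var/gerrit

# Optional: run the standalone zombie-draft cleanup upfront (v2.16.16 or later);
# see the release notes for the exact program name and flags.

# The v3.2 init step performs the cleanup automatically if it has not been done
# before; budget roughly 10 minutes per 10k zombie comments (see above).
java -jar gerrit-3.2.war init -d "$GERRIT_SITE" --batch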

Migrate with zero-downtime

If you are on Gerrit v3.1.x in a high-availability configuration, you can upgrade seamlessly to Gerrit v3.2 without having to suspend or degrade the service in any way. GerritForge has a record number of installations done in high-availability and multi-site: if you are running a single Gerrit master today, you should get in touch with the GerritForge Team for help moving to high availability.

For the very first time, the whole Gerrit Community can benefit from the ability to perform a rolling upgrade without any downtime.

The zero-downtime upgrade consists of the following steps:

  1. Have Gerrit masters upgraded to v3.1.6 (or later) in a high-availability configuration, healthy and able to handle the incoming traffic properly.
  2. Set gerrit.experimentalRollingUpgrade to true in gerrit.config on both Gerrit masters.
  3. Set the first Gerrit master unhealthy.
  4. Shutdown the first Gerrit master and then upgrade to v3.2.
  5. Start up the first Gerrit master and wait for the online reindex to complete.
  6. Verify that the first Gerrit master is working correctly and then make it healthy again.
  7. Wait for the first Gerrit master to start serving traffic regularly.
  8. Repeat steps 3. to 7. for the second Gerrit master.
  9. Remove gerrit.experimentalRollingUpgrade from gerrit.config on both Gerrit masters.

NOTE: Gerrit v3.1.6 has not been released yet. However, if you want to perform a rolling upgrade today, you can download the latest build on the stable-3.1 branch from the GerritForge’s CI at https://gerrit-ci.gerritforge.com/job/Gerrit-bazel-stable-3.1/
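As a minimal sketch of steps 2 and 9 above, the flag can be toggled directly in gerrit.config with git config on each master; the site path below is a placeholder.

GERRIT_SITE=/var/gerrit

# Step 2: enable the rolling-upgrade flag on both masters before starting.
git config -f "$GERRIT_SITE/etc/gerrit.config" gerrit.experimentalRollingUpgrade true

# Step 9: remove the flag once both masters are running v3.2.
git config -f "$GERRIT_SITE/etc/gerrit.config" --unset gerrit.experimentalRollingUpgrade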

GerritHub.io has been successfully upgraded on the 1st of June without any interruption of any kind using the above procedure.

Java 11 official support

Gerrit is now officially supported on Java 11, in addition to Java 8. Running on Java 11 was already possible from v2.16.13, v3.0.4 and v3.1.0, but it was not officially supported because of the lack of CI validation on Java 11 for the stable-2.16, stable-3.0 and stable-3.1 branches.

Gerrit v3.2 has been validated with Java 11, with the following known issues:

  • Issue 11567: Java 11 runtime & startTLS LDAP broken: ‘error code 8 – BindSimple: Transport encryption’.
  • Issue 12639: WARNING: An illegal reflective access operation has occurred, when starting Gerrit.
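If you want to try Java 11 on an existing site, here is a hedged sketch of the relevant gerrit.config settings; the JVM path is a placeholder for your distribution's Java 11 installation, and the GC-logging option is only an example, useful for comparing GC behaviour before and after the switch.

GERRIT_SITE=/var/gerrit

# Point the Gerrit daemon at a Java 11 runtime (path is an assumption).
git config -f "$GERRIT_SITE/etc/gerrit.config" container.javaHome /usr/lib/jvm/java-11-openjdk

# Optional: unified GC logging (Java 11 syntax) to observe GC cycles and pauses.
git config -f "$GERRIT_SITE/etc/gerrit.config" --add container.javaOptions "-Xlog:gc*:file=$GERRIT_SITE/logs/gc.log"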

After 24 hours of running Gerrit v3.2 on GerritHub.io, we have seen two major benefits from the migration to Java 11: an overall reduction of the "old generation" build-up in the JVM heap and a massive reduction of GC cycle times and full GCs.

[Chart: JVM old-generation heap usage before and after the upgrade to Gerrit v3.2 / Java 11]

Before the 29th of May, all GerritHub.io nodes were on Gerrit v3.1 / Java 8. The old-generation JVM heap kept building up constantly until it reached 60 GB and triggered a full GC cycle. After the upgrade to Gerrit v3.2 / Java 11, memory consumption is very much under control. Peaks with associated full GCs are still possible (see the one on the 30th of May around 12:00 BST), but there is no build-up of old-generation objects anymore.

[Chart: GC cycle times before and after the upgrade to Gerrit v3.2 / Java 11]

Java 11 also brings a lot of benefits in reducing the latency of the individual GC cycles, showing much better performance with large heaps.
After the migration on the 29th of May, the GC graph is pretty much flat. The only noticeable full-GC peak, on the 30th of May, lasted just 5 msec, while the normal GC cycles are well below 1 msec and barely noticeable.

Performance is a feature

Shawn Pearce, the Gerrit Code Review project founder, used to say "performance is a feature", which is very true. Any software nowadays can provide basic features out of the box, thanks to the plethora of open-source components available. However, designing an architecture and making it scale and perform to the levels that an enterprise code review system needs is not easy.

Gerrit v3.2 is yet another significant milestone in the continued effort of the Gerrit maintainers and contributors in making Gerrit Code Review faster, more stable and available than ever before.

Performance tuning isn't a "one-off task" but a continuous improvement of thousands of little details, ranging from front-end JavaScript tuning down to the backend of the platform.

New accounts cache

From the data collected on googlesource.com, Patrick Hiesel (Google) identified the loading of accounts from NoteDb as a significant cause of delay in backend calls. That is true for all Gerrit installations, but especially for distributed setups or setups that restart often.

Gerrit v3.2 introduces a brand-new AccountCache decomposed into smaller chunks that can be cached individually:

  1. External IDs + user name (cached in ExternalIdCache)
  2. CachedAccountDetails (newly cached) – a new class representing all information stored under the user's ref (refs/users/<sharded-id>)
  3. Gerrit's default settings

The new structure is cleverly designed to require a lot less I/O when an entry needs to be reloaded and to lower the cache-miss ratio when a user's details are updated.

The new structure has the following advantages:

  1. CachedAccountDetails contains only details from refs/users/<sharded-id>. By that, we can use the SHA1 of that ref as cache key and start serializing the cache to eliminate cold start penalty as well as router assignment change penalty (for distributed setups). It also means that we don’t have to do invalidation ourselves anymore.
  2. When the server’s default preferences change, we don’t have to invalidate all accounts anymore.
  3. The projected speed improvements that come from persisting the cache make it possible to remove the logic that loads accounts in parallel.
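On the operational side, here is a hedged sketch for inspecting and tuning the account-related caches; the cache name and limit below are assumptions, so list the actual cache names exposed by your server with the show-caches SSH command before changing anything in gerrit.config.

# List the caches (and their sizes and hit ratios) exposed by your Gerrit server.
ssh -p 29418 admin@gerrit.example.com gerrit show-caches

# Example only: raise the in-memory limit of the accounts cache.
GERRIT_SITE=/var/gerrit
git config -f "$GERRIT_SITE/etc/gerrit.config" cache.accounts.memoryLimit 1024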

Migration to Polymer 3

The PolyGerrit UX roadmap continues with yet another important milestone: the migration to Polymer 3. The result is visible in a more polished GUI, significantly faster rendering and reduced page-loading times.
There is also a significant number of small refinements to the GUI, coming from the meticulous work of fixes included in this release.
Not surprisingly, the issues fixed on the PolyGerrit UX account for by far the largest share of the items in the v3.2 release notes.

[Screenshot: the new "Findings" tab in Gerrit v3.2]

PolyGerrit is giving special attention to the classification of feedback coming from robots rather than humans.
Most of the efforts made in the past 12 months target improving the support for robot comments and giving them some extra dedicated space.
In Gerrit v3.2 there is a special place for them in a brand-new "Findings" tab. It is currently empty on GerritHub.io, as people have not started using robot comments much yet. However, I see a lot of room for adoption of this new feature, enabling more integration of linters and automatic validation feedback in this tab.

A flood of fixes and small improvements

The list of fixes and improvements in Gerrit v3.2 is really huge. Please check the release notes on the Gerrit Code Review release page for all the details.

There are a lot of reasons to migrate to Gerrit v3.2, the fastest, most stable and most scalable release of Gerrit Code Review ever.


Thanks a lot to the whole Gerrit Code Review Community of maintainers and contributors for making this release happen. Thanks to Patrick Hiesel for the technical description of the account cache improvements and the replication clustering.

Luca Milanesio (GerritForge)
Gerrit Code Review Maintainer, Release Manager, ESC Member


Summary of the Gerrit User Summit & Hackathon in Sunnyvale

sunnyvale-gerritforge-live.jpg

After months of reviews and contributions by different speakers and attendees, the summary of the last Gerrit User Summit & Hackathon in Sunnyvale CA has been published on the Gerrit Code Review News page.

High-performance Summit in numbers

The Gerrit User Summit 2019 has ended with the highest score of achievements
in the 11-year history of the Gerrit open-source project:

  • Two dates and locations in a 12-months period: Gothenburg (Sweden) and
    Sunnyvale (California).
  • Four Gerrit releases delivered: v2.15.16, v2.16.11, v3.0.2, v3.1.0
  • 127 people registered across the two locations,
    87 people attended on-site (70% turnout) and 38 people followed the event
    remotely at different times using the live streaming coverage
    provided by GerritForge.
  • 373 changes merged (204 in Gothenburg, 169 in Sunnyvale).
  • 32 developers attended the Hackathons, 8 of whom had never contributed or
    attended an event before.
  • The highest performing version of Gerrit v3.1.0 released, with over
    2x git and REST-API performance compared to v3.0.x.
  • 22 talks presented across Gothenburg and Sunnyvale, with 6 new speakers
    that have never presented before at the Summit.

The performance of the Summit is yet more evidence of the continuous
growth of the community and the increased synergies with the JGit, OpenStack/Zuul
and Tuleap open-source projects.

Read the full Summit and Hackathon summary on the Gerrit Code Review web-site.

Happy New Year, Gerrit Code Review

It has been a hectic and productive year for us at GerritForge and the Gerrit Code Review Community.
We want to take this opportunity to recap some of the milestones of 2019 and the exciting perspectives for 2020 and beyond.

Gerrit Code Review, 2019 in numbers

[Chart: Gerrit commits during 2019]

Gerrit had over 120 contributors from all around the world, coming from 33 different companies and organisations, which is excellent. There is a robust 6% increase in the number of commits (+231 commits) but a reduction in the number of contributors (-7 authors).

With regards to the overall trend of commits during the year, the success of the Gerrit User Summit 2019 in Sunnyvale is visible, with an increase of the rate of commits around October/November.

Top three projects of 2019

  1. Gerrit (1,626 commits) is, of course, the most active project. However, it is visibly down in terms of number of commits from 2018 (-19%). That is a consequence of the shift of focus to the other two key components listed below, which are available as plugins and therefore not accounted for in the gerrit core repository statistics.
  2. Checks (315 commits) is the brand-new first-class CI integration API for external build systems, such as Jenkins and Zuul. It is incredible how in just 12 months it has become robust and fully mature. It is currently used for the validation of all changes on the Gerrit project.
  3. Multi-site (234 commits) is the multi-site support for Gerrit that everyone has been waiting on for years. It is finally available for all active and supported versions (v2.16 onwards).

Top-three companies contributing to Gerrit

[Chart: Gerrit contributions by company in 2019]

  1. Google is, unsurprisingly, still the top contributor to the Gerrit project overall. Its share is basically stable from 2018 (around 43%), a confirmation of its continued commitment to the project.
  2. GerritForge is growing significantly in its contribution to the project, with exactly half of the contributions of Google. This is a significant result from 2018, with a 7% growth in involvement.
  3. CollabNet is sliding to 3rd position (it was 2nd in 2018) with a 3% decrease in contributions. As a notable mention, however, David Pursehouse from CollabNet is still the #1 maintainer in terms of number of commits.

Even though it is outside the top three contributing companies, SAP deserves a special mention for its continuous involvement in the JGit project, which is at the basis of the Gerrit engine, and its fantastic engagement in improving the Gerrit CI system and integrating it with the checks plugin.

Top-three achievements from GerritForge

The outstanding contributions of GerritForge in 2019 have been focused on three major topics.

Gerrit multi-site, released and production ready

We released the Gerrit Multi-Site plugin, allowing seamless balancing in a distributed environment, a technologically highly advanced development, crucial for very distributed companies. See https://gerrit.googlesource.com/plugins/multi-site for more information.

Gerrit User Summits in Europe, USA and streaming

We successfully organised and executed the Gerrit User Group in Europe and the US. The event was very well received by the community with an overall attendance of some 87 on-site and 38 in streaming. Have a look at https://gitenterprise.me/2019/12/23/gerrit-user-summit-survey/ for interesting feedback on those from the attendees.
We opened our own local office in Sunnyvale, in the heart of Silicon Valley. A crucial move to better serve our ever-expanding US customer base.

Gerrit Analytics for the Android Open-Source Project

We kickstarted the Gerrit Analytics for the Android open-source project initiative: after the successful adoption of the automatic collection of code metrics on the Gerrit project (see https://analytics.gerrithub.io) the Android team asked GerritForge to start working on extracting the same metrics from their code.

What’s coming in 2020

Gerrit v3.2 is currently under development and is planned to be released around April/May 2020. It represents a major milestone for the Gerrit project, with support for Java 11 and large JVM heaps, up to hundreds of GBytes. Gerrit v3.2 is definitely the release that everyone with big repositories (mono-repos) should target as their next upgrade. See the Gerrit roadmap at https://www.gerritcodereview.com/roadmap.html for more details about the planned features.

More work and improvements are coming on the checks plugin, with the aim of fully integrating it into everyone's user journey and CI/CD pipeline. Our first blog post of 2020 will be about how to use Jenkins and the checks plugin together with GerritHub.io.

Multi-site and HA will become more integrated with Gerrit, with the aim of moving parts of their technologies (e.g. the global ref-db) into JGit so that they can be used in Gerrit core.

The Gerrit User Summit 2020 will continue the experiment of cross-pollination with other communities, after the success of the interactions with the JGit and OpenStack communities in 2019. Bazel is the next target, as it is used as the de-facto standard build system for Gerrit and its plugins.



Again, best wishes from your friends at GerritForge; we look forward to a continuing successful partnership in the coming years.

Luca Milanesio
Gerrit Maintainer, Release Manager and member of the ESC.

Gerrit User Summit Survey

2019 has been an exceptional year, with the introduction of the next generation of Gerrit Code Review v3 releases and the largest ever Gerrit User Summit in the 11-year history of the project.

As a community, we want to improve even further and make the project and the community even better. Collecting metrics has been key to improving the Gerrit product and its performance and, similarly, collecting feedback from the community events is key to growing participation and sharing experiences about Gerrit Code Review.

Survey results

We ran a survey directed at all of those who attended the two Gerrit User Summits this year, in Gothenburg and Sunnyvale. Below is an executive summary of the results.

Did you achieve your objectives at the Summit?

[Chart: survey results – did you achieve your objectives at the Summit?]

All of the attendees achieved their objectives, which differed from person to person depending on their position and role in the community:

  • Getting the latest news of what’s happening in the Gerrit community and open-source product
  • Meeting the existing members of the community and welcome new contributors
  • Networking with the other Gerrit admin and users around the world
  • Influencing with ideas the future Gerrit roadmap

Overall, how would you rate the event?

[Chart: survey results – overall event rating]

Over 76% of the people rated the event very good or excellent. However, as we strive for improvement, there is a substantial 24% of people looking for a better event next year.

What did you like/dislike?

The positives of the event have been:

  • Presentation of the Gerrit roadmap and associated discussions
  • Successful mix of topics, including Zuul and JGit
  • People, atmosphere, friendship and networking
  • High quality of the talks and content

The not-so-positive sides were:

  • The summit covering the weekend
  • Too focused on Gerrit contributors and admins, no space for users
  • There were too many people for the chosen location
  • The talks and discussions went over the planned schedule

How organized was the event?

[Chart: survey results – how organized was the event?]

89% of the people considered the event very well organized, whilst 11% are looking for improvement, possibly with a bigger venue and better timing.

What topics would you like to see covered next year?

  • Evolution of the User-Interface, roundtable with developers, user-journeys
  • Migration talks and discussions
  • CI/CD integration
  • Monitoring
  • Load testing
  • GitHub integration and pull-requests
  • Gerrit with large clusters
  • User-stories on using Gerrit

Would you like to have a workshop next year?

[Chart: survey results – interest in a workshop next year]

The vast majority of people would like the next year event to be more informative, including a workshop for learning some of the features of Gerrit Code Review.

What would be the best time for the Summit next year?

[Chart: survey results – best time for the Summit next year]

For the majority of people (75%), the best time for next year's event would be two days during the week, rather than having it again over the weekend.


Thanks everyone again for attending the Gerrit User Summit 2019 in Gothenburg and Sunnyvale, and thanks to GerritForge, Volvo Cars and Google for sponsoring it. We are looking forward to seeing you next year.

Luca Milanesio (GerritForge)
Gerrit Code Review Maintainer, Release Manager and ESC Member.


Gerrit User Summit LIVE!

[Image: Gerrit User Summit 2019 live streaming announcement]

The Gerrit User Summit 2019 is going live and allows anyone to join and participate from across the world.

There are only 30 days left until the Gerrit User Summit 2019, the 12th annual event of the Gerrit Code Review community. It is the year of records, with Gerrit reaching its largest audience ever in its 11 years of history:

  • Over 120 seats
  • People coming from 27 countries
  • 2 major dates and locations, in Sweden and in the USA
  • 20 talks and presentations
  • All seats sold out 2 months before the event

This is also a historical moment for the community because, for the first time since 2011, the JGit and Gerrit contributors will get together and talk to each other face to face, strengthening the cooperation between the two projects.

Do not miss the event, go live

We have received an enormous amount of requests to join the event on-site in Sunnyvale, much more than any previous year: the event was sold out on Eventbrite 2 months before the starting date.

GerritForge has therefore decided to invest further sponsorship funding to organise full live coverage of the event.

How to participate?

GerritForge has launched a new live event broadcasting site, https://live.gerritforge.com.

Watching the event will be FREE OF CHARGE and without adverts, thanks to the sponsorship by GerritForge. To ensure the maximum video quality, there is a limit on the number of online watchers and pre-registration is needed.

  1. Go to https://live.gerritforge.com
  2. Click on the “Register to Watch” orange button
  3. Enter your full name, e-mail, company name and country of origin
  4. Click the green "Register to Watch" button at the bottom of the page

The live event will allow remote attendees to ask questions and interact with the audience in Sunnyvale: it is going to be truly interactive and useful for the whole JGit and Gerrit community.

What to expect from the Sunnyvale event?

The Sunnyvale event includes a huge number of innovations on the JGit and Gerrit projects.

  • The introduction of the Git reftable for repositories with a huge number of refs
  • Support for Git protocol v2 in Gerrit
  • Git / Gerrit plugin for Gatling, for generating consistent end to end and load tests on Gerrit
  • Zuul support for the new Gerrit’s Checks CI integration
  • Introduction of Gerrit Code Review Analytics for the Android open-source project
  • Frictionless and zero downtime upgrades for Gerrit
  • and many more talks and presentations

Your last chance to attend, reserve your live spot now

There are brand-new ways this year to get in touch and be part of the Gerrit User Summit 2019.

Reserve your live spot today by registering at https://live.gerritforge.com and be part of this record event for the JGit and Gerrit Code Review community.

Luca Milanesio
Gerrit maintainer, release manager and ESC member


Gerrit User Summit at Volvo Cars

[Photo: Gerrit User Summit at Volvo Cars]

2019 is a year of Summit innovations

The Gerrit User Summit 2019 can definitely be defined as truly innovative in its format and audience.

For the first time in Gerrit history, the Summit is split into two parts. Volvo Cars has hosted the first in Gothenburg (Sweden), while the second will take place from the 11th to the 17th of November at the GerritForge Inc. HQ in Sunnyvale, CA (USA).

The Summit has been repeated on both sides of the Atlantic: the European and US communities come from different backgrounds and have different needs. The Gerrit Code Review Community is global and is willing to share experiences and receive feedback from both sides.

A truly open Gerrit Hackathon

We are also innovating on the Hackathon side, with three new elements:

  1. The Hackathon is now open to everyone, including people who have never contributed to Gerrit before. Experienced maintainers have paired with newbies to guide them through their very first contributions.
  2. The Hackathon at Volvo Cars has been 100% focused on triaging the massive backlog of open issues and fixing as many bugs as possible for the latest three supported branches: stable-3.0, stable-2.16 and stable-2.15.
  3. The OpenStack and Gerrit communities have finally met and started talking and interacting more closely.

Read the full story on gerritcodereview.com/news.html

The full summary of the event has been published on the Gerrit Code Review project news: read what happened in Gothenburg and, if you are in the USA, do not miss the forthcoming Gerrit User Summit USA in Sunnyvale.

Hurry up, as the seats are running out: REGISTER NOW to avoid missing the event.

Luca Milanesio (GerritForge Ltd)
Gerrit Maintainer, Release Manager, ESC member

Gerrit User Summit 2019

[Image: Gerrit User Summit 2019 – Gothenburg and Sunnyvale]

The Gerrit User Summit 2019 is approaching fast, with new exciting features and a brand-new Gerrit v3.0 release to present and discuss together.

The event is FREE but you need to register in advance for the Gerrit User Summit 2019 on Eventbrite.

One Summit, two events

The Gerrit User Summit & Hackathon is composed of two different events and locations, one in Sweden (Europe), hosted and sponsored by Volvo Cars, and another in California (USA) in the new GerritForge Inc. HQ. Having two separate events in two different quarters will allow most of the community around the globe to attend and share their experience and ideas.

Hackathon open to new contributors

The first part of the event is a 5-day Hackathon reserved for the current Gerrit contributors and maintainers, plus anyone who is willing to start contributing to the platform. Differently from previous years, the community now welcomes even people who have not contributed to Gerrit before but are willing to do so.

It is a fantastic opportunity for people to join, work side-by-side and pair with the Gerrit maintainers for a whole week. It can be a unique opportunity to implement the features that you always wanted to see in Gerrit and to learn how we develop and review our changes.

The Summit

The usual 2-day User Summit after the Hackathon is open to all members of the community and to anyone willing to adopt Gerrit Code Review in their development process in the near future.

This year there is plenty of exciting news:

  • The introduction of an official Gerrit Community Process with an Engineering Steering Committee and Community Managers
  • Gerrit v3.0 and the full migration to NoteDb and PolyGerrit
  • The multi-site plugin goes open-source, allowing anyone to run multiple masters on different sites

The full schedule of the event is available on the Gerrit User Summit 2019 site.

Proposing a new talk

More talks and customer stories are scheduled and, if you have something to tell the rest of the community, you can submit your talk by creating a change and pushing it to the Gerrit Summit 2019 repository:

  1. Open the repository commands page at https://gerrit-review.googlesource.com/admin/repos/summit/2019,commands
  2. Click on “CREATE CHANGE”
  3. Select “master” branch, put your description and create the new change
  4. Click on “Edit” on the top-right of the page
  5. Click on “Open” in the mid toolbar and open the page you would like to edit.

To propose a new session, you need to add one file with the name of your talk into the sessions folder, using template.md as an example.
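The steps above use the inline web editor; for those who prefer the command line, here is a minimal sketch of the same proposal pushed as a regular Gerrit change. The file name my-talk.md is a placeholder, and the commit-msg hook download is the standard way to get a Change-Id added to the commit.

git clone https://gerrit-review.googlesource.com/summit/2019
cd 2019
curl -Lo .git/hooks/commit-msg https://gerrit-review.googlesource.com/tools/hooks/commit-msg
chmod +x .git/hooks/commit-msg
cp sessions/template.md sessions/my-talk.md   # then edit the title and abstract
git add sessions/my-talk.md
git commit -m "Add talk proposal: my talk"
git push origin HEAD:refs/for/master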

Where and when

Volvo Cars HQ in Gothenburg (Sweden)

  • 24-28th August 2019 – Gerrit Hackathon Europe
  • 29-30th August 2019 – Gerrit User Summit Europe

GerritForge Inc HQ in Sunnyvale CA (USA)

  • 11-15th November 2019 – Gerrit Hackathon USA
  • 16-17th November 2019 – Gerrit User Summit USA

Thanks to our sponsors

I would like to thank Volvo Cars and Nicholas Mucci for hosting, sponsoring and organizing the Gerrit User Summit Europe in Gothenburg (Sweden) and GerritForge for hosting and sponsoring the events in both Europe and the USA in Sunnyvale CA (USA).

Luca Milanesio
Gerrit Code Review Maintainer and Release Manager
Member of the Engineering Steering Committee


GerritHub.io is moving to Gerrit v3.0

It has been a very long journey, from the initial adoption of PolyGerrit at GerritHub to the epic moment when Gerrit's historic GWT UI was dropped with the Gerrit v3.0 release last month.

GerritHub.io has always been aligned with the latest and greatest of Gerrit Code Review, and thus the moment has come for us to upgrade to v3.0 and drop the GWT UI forever.

PolyGerrit vs. GWT adoption

[Chart: PolyGerrit vs. GWT UI adoption on GerritHub.io over time]

The PolyGerrit UX was pretty much experimental until the beginning of 2018: the features were incomplete and people needed to go back to the old GWT UI for many of the basic use-cases.

However, things started to change radically in April 2018 when GerritHub.io adopted Gerrit v2.15, which had a 100% functionally complete PolyGerrit UI. The number of users choosing PolyGerrit jumped from 10% to 35% (3.5x), with +70% growth in the overall number of accesses. That means that the adoption was mainly driven by users attracted by the new UI.

In the past 12 months, PolyGerrit became the default user interface and was renamed simply the Gerrit UI. Gradually, more and more users abandoned the old GWT interface, which now represents 30% of the overall accesses.

Timeline of the upgrade

For the 70% of people who are already using the new Gerrit UI, the upgrade to Gerrit v3.0 will not be noticeable at all:

  • Gerrit v3.0 UI is absolutely identical to the current one in v2.16
  • All existing API and integration points (e.g. Jenkins integration) in Gerrit v3.0 are 100% compatible with v2.16

For the 30% of people that are still using the old GWT UI, things will be very different as their favorite interface will not be available anymore.

The upgrade will happen with zero-downtime across the various GerritHub.io multi-site deployments and will start around mid-June.

Can I still use GWT with GerritHub.io?

The simple answer is NO: Gerrit v3.0 does not contain any GWT code anymore and thus it is impossible for GerritHub.io to bring back the old UI.

The journey to fill the gaps and reach 100% feature and functional equivalence between the old GWT and the new Polymer-based UI took around 6 years, 18k commits and 1M lines of code written by 260+ contributors from 60+ different organizations. It has been tested by hundreds of thousands of developers across the globe and is 100% production-ready and functionally complete.

If you feel that there was "something you could do in the GWT UI and cannot do anymore with the new Polymer-based UI", please file a bug in the Gerrit Code Review issue tracker and you will get prompt attention and replies from the community.

Can I stay with Gerrit v2.16 on GerritHub.io?

If your organization cannot migrate to Gerrit v3.0, you can still request dedicated hosting from GerritForge Ltd, which is the company behind GerritHub.io.

Please fill in the GerritForge feedback form and a Sales Representative will come back to you with the possible options and associated costs.

If you fully endorse GerritHub.io with Gerrit v3.0 and start using the new UI, the service will continue to be FREE for public and private repositories and organizations of all types and sizes. You can optionally purchase Enterprise Support from one of our plans if you require extra help in using and configuring your Gerrit projects with your tools and organization.

Enjoy the future of Gerrit v3.0 with GerritHub.io and GerritForge.

Luca Milanesio, GerritForge Ltd.
Gerrit Code Review Maintainer and Release Manager
Member of the Engineering Steering Committee

Gerrit v3.0 is here

[Photo: Gerrit Spring Hackathon 2019 at Google in Munich]

Gerrit v3.0 has been released during the last Spring Hackathon at Google in Munich, involving over 20 developers for one week.

It can be downloaded from www.gerritcodereview.com/3.0.html and installed on top of any existing Gerrit v2.16/NoteDb installation. Native packages have been distributed through the standard channels and upgrading is as simple as shutting down the service, running the rpm, deb or dnf upgrade command and starting it again.

You can also try Gerrit v3.0 using Docker by simply running the following command:

docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit:3.0.0

This article goes through the whole history of the Gerrit v3.0 development and highlights the differences from the previous releases.

Milestone for the Gerrit OpenSource Project

After 6 years, 18k commits and 1M lines of code written by 260+ contributors from 60+ different organizations, Gerrit v3.0 is finally out.

The event is a fundamental milestone for the project for two reasons:

  • The start of a new journey for Gerrit, without the legacy code of the old GUI based on Google Web Toolkit and without any relational database. Gerrit is now fully based on a Git repository and nothing else.
  • The definition of a clear community organization, with the foundation of a new Engineering Steering Committee and the role of Community Manager.

The new structure will drive the product forward for the years to come and will help to define a clear roadmap to bring Gerrit back to the center of the software development pipeline.

Evolution vs. revolution

When a product release increments the first major number, it typically introduces a series of massive breaking changes and, unfortunately, a period of instability. Gerrit, however, is NOT a typical OpenSource product, because since the beginning it has been based on rigorous Code Review that brought stability and reliability from its initial inception back in 2008. Gerrit v3.0 was developed during the years by following a rigorous backward compatibility rule that has made Gerrit one of the most reliable and scalable Code Review systems on the planet.

For all existing Gerrit v2.16 installations, v3.0 will feel much more like a minor upgrade and may not even require any downtime or interruption of the incoming read/write traffic, assuming that you have at least a high-availability setup. How is this possible? Magic? Basically, yes, it's a "kind of magic" that made this happen, and it is all thanks to the new repository format for storing all the review meta-data: NoteDb.

Last but not least, all the features that Gerrit v3.0 brings to the table have been implemented iteratively over the last 6 years and released gradually from v2.13 onwards. Gerrit v3.0 is the "final step" of the implementation that fills the gaps left open in the past v2.16 release.

With regards to the statistics of the changes from v2.16 to v3.0, it is clear that the code-base has basically been stabilized and cleaned up, as you can see from the official GerritForge Code Analytics extracted from analytics.gerrithub.io.

  • 1.5k commits from 63 contributors worldwide
  • 62k lines added and 72k lines removed
  • Google, CollabNet, and GerritForge are the top three organizations that invested in developing this release

In a nutshell, the Gerrit code-base has shrunk by 10k lines of code compared to v2.16. So, instead of talking about what's new in v3.0, we should instead describe what is inside the 72k lines removed.

Removal of the GWT UI

The GWT UI, also referred to as the "Old UI", has been around since the inception of the project back in 2008.

[Screenshot: the old GWT-based Gerrit UI]

Back in 2008, it seemed a good idea to build the Gerrit UI on top of GWT, a web framework released by Google two years earlier and aimed at reusing the same Java language for both the backend and the Ajax front-end.

However, from 2012 onwards things started to change. The interest of the overall community in GWT decreased, as clearly shown by the StackOverflow trends.

[Chart: StackOverflow interest in GWT over time]

In 2015, Andrew Bonventre from the Chromium Project, one of the major users of the Gerrit Code Review platform apart from the Android developers, presented the new prototype of the Gerrit Code Review UI, based on the Polymer project and code-named PolyGerrit, merged as change #72086.

commit ba698359647f565421880b0487d20df086e7f82a
Author: Andrew Bonventre <andybons@google.com>
Date: Wed Nov 4 11:14:54 2015 -0500

Add the skeleton of a new UI based on Polymer, PolyGerrit

This is the beginnings of an experimental new non-GWT web UI developed
using a modern JS web framework, http://www.polymer-project.org/. It
will coexist alongside the GWT UI until it is feature-complete.

The functionality of this change is light years from complete, with
a full laundry list of things that don't work. This change is simply
meant to get the starting work in and continue iteration afterward.

The contents of the polygerrit-ui directory started as the full tree of
https://github.com/andybons/polygerrit at 219f531, plus a few more
local changes since review started. In the future this directory will
be pruned, rearranged, and integrated with the Buck build.

Change-Id: Ifb6f5429e8031ee049225cdafa244ad1c21bf5b5

The PolyGerrit project introduced two major innovations:

  • Gerrit REST API: for the first time, the interactions of the code review process have been formalized in a stable and well-documented REST API that can be used as the "backend contract" for the design of the new GUI
  • The PolyGerrit front-end Team: for the first time, a specific, experienced team focused on user experience and UI workflow was dedicated to rethinking and redesigning iteratively all the components of the Gerrit Code Review interactions.

The GWT UI and PolyGerrit lived in the same "package" from v2.14 onwards for two years, with users left with the option to switch between the two. Then, in 2018 with v2.16, the PolyGerrit UI became the "default" interface and was thus renamed simply the "Gerrit" UI.

With Gerrit v3.0, the entire GWT code-base in Gerrit has been completely removed with the epic change by David Ostrovsky “Remove GWT UI“, which deleted 33k lines of code in one single commit.

The new Polymer-based UI of Gerrit Code Review is not very different from the one seen in Gerrit v2.16, but it includes more bug fixes and is 100% feature complete, including the project administration and ACL configuration screens.

[Screenshot: the Polymer-based Gerrit UI in v3.0]

Removal of ReviewDb

Gerrit v3.0 does not have a DBMS anymore, not even for storing its schema version as happened in v2.16. This means that almost everything is stored in the Git repositories.

The journey started back in October 2013, when Shawn Pearce gave Dave Borowitz the task of converting all the review meta-data managed by Gerrit into a new format inside the Git repository, called NoteDb.

After two years of design and implementation, Dave Borowitz presented NoteDb at the Gerrit User Summit 2015 and called Gerrit v3.0 the release that will be fully working without the need of any other external DBMS (see the full description of the talk at https://storage.googleapis.com/gerrit-talks/summit/2015/NoteDB.pdf).

Google started adopting NoteDb in parallel with ReviewDb on their own internal setup and, in June 2017, the old changes table was finally removed. However, there was more on the to-do list: at the Gerrit User Summit 2017, Dave Borowitz presented the final roadmap to make ReviewDb finally disappear from everyone's Gerrit server.

[Slide: NoteDb migration roadmap presented at the Gerrit User Summit 2017]

In the initial plans, the first version with NoteDb fully working should have been v2.15. However, things went a bit differently and a new minor release was needed in 2018 to make the format really stable and reliable with v2.16.

Gerrit v2.16 is officially the last release that contains both code-bases and allows the migration from ReviewDb to NoteDb.

Dave Borowitz used the hashtag “RemoveReviewDb” to allow anyone to visualize the huge set of commits that removed 35k lines of code complexity from the Gerrit project.

Migrating to Gerrit v3.0, step-by-step

Gerrit v3.0 requires NoteDb as a pre-requisite: if you are on v2.16 with NoteDb, the migration to v3.0 is straightforward and can be done with the following simple steps:

  1. Shutdown Gerrit
  2. Upgrade Gerrit war and plugins
  3. Run Gerrit init with the “batch” option
  4. Start Gerrit
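As a reference, here is a minimal sketch of the four steps above for a single node; the site path and war file name are placeholders, and the plugin upgrade is reduced to a comment because it depends on which plugins you have installed.

GERRIT_SITE=/var/gerrit

"$GERRIT_SITE/bin/gerrit.sh" stop                                        # 1. shutdown
cp gerrit-3.0.0.war "$GERRIT_SITE/bin/gerrit.war"                        # 2. upgrade the war (and the plugins in $GERRIT_SITE/plugins)
java -jar "$GERRIT_SITE/bin/gerrit.war" init -d "$GERRIT_SITE" --batch   # 3. init in batch mode
"$GERRIT_SITE/bin/gerrit.sh" start                                       # 4. start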

If you are running Gerrit in a high-availability configuration, the above process can be executed on the two nodes individually, with a rolling restart and without interrupting the incoming traffic.

If you are running an earlier version of Gerrit and you are still on ReviewDb, then you should upgrade in three steps:

  1. Migrate from your version v2.x (x < v2.16) to v2.16 staying on ReviewDb. Make sure to upgrade through all the intermediate versions. (Example: migrate from v2.13 to v2.14, then from v2.14 to v2.15 and finally from v2.15 to v2.16)
  2. Convert v2.16 from ReviewDb to NoteDb
  3. Migrate v2.16 to v3.0
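For step 2, here is a hedged sketch of the offline ReviewDb to NoteDb conversion on v2.16; the site path is a placeholder and the command should be run with Gerrit stopped (v2.16 also offers an online migration alternative, see its documentation for noteDb.changes.autoMigrate).

GERRIT_SITE=/var/gerrit
java -jar "$GERRIT_SITE/bin/gerrit.war" migrate-to-note-db -d "$GERRIT_SITE"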

The leftovers of a DBMS stored in H2 files

Is Gerrit v3.0 running completely without any DBMS at all? Yes and no. There are some leftovers that aren't necessarily associated with the code review meta-data and thus did not make sense to store in NoteDb.

  • Persistent storage for in-memory caches.
    Some of the Gerrit caches store their status on the filesystem as H2 tables, so that Gerrit can save a lot of CPU time after a restart by reusing the previous in-memory cache status.
  • Reviewed flag of changes.
    This is the flag that enables the "bold" rendering of a change, storing its status for every user. It is stored by default on the filesystem as an H2 table; however, it can alternatively be stored on a remote DBMS or potentially managed by a plugin.
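A hedged sketch of moving the reviewed-flag storage from the local H2 files to an external database; accountPatchReviewDb.url is the gerrit.config key involved, but the JDBC URL, host and credentials below are placeholders for your own DBMS.

GERRIT_SITE=/var/gerrit

# Example only: keep the per-user "reviewed" flags in an external PostgreSQL.
git config -f "$GERRIT_SITE/etc/gerrit.config" accountPatchReviewDb.url \
  "jdbc:postgresql://db.example.com:5432/gerrit_reviewflags?user=gerrit"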

New core plugins

Some of the plugins that were initially distributed only with the native packages and Docker versions are now an integral part of the WAR distribution as well:

  • delete-project
    which allows removing a project from Gerrit and the associated changes.
  • gitiles
    a lightweight code-browser created by Dave Borowitz based on JGit
  • plugin-manager
    the interface to discover, download and install Gerrit plugins
  • webhooks
    the HTTP-based remote trigger to schedule remote builds on CI systems or activate any other service from a Gerrit event

The above four plugins already existed before Gerrit v3.0, but they were not included in the gerrit.war.

Farewell to Dave Borowitz and the PolyGerrit Team

After having completed the feature parity between GWT and PolyGerrit, the original PolyGerrit Team members left the Gerrit Code Review project.

Their journey came to an end with the release of the new shiny Polymer-based Gerrit UI. The PolyGerrit Team contributed 45k lines of code in 5.3k commits over 4 years.

Then the last event unfolded during the release of Gerrit v3.0: Dave Borowitz announced that he was leaving the Gerrit Code Review project. I described the event as being like "Linus Torvalds announcing he was abandoning the Linux Kernel project".

Dave Borowitz contributed 316k lines of code in 3.6k commits over 36 repositories in 8 years. He also helped the development of the new Gerrit multi-site plugin by donating its ZooKeeper-based implementation of a global ref-database.

On behalf of GerritForge and the Gerrit Code Review community, I would like to thank all the past contributors and maintainers who brought the PolyGerrit and NoteDb code-bases into Gerrit: Dave, Logan, Kasper, Becky, Viktar, Andrew and Wyatt.

Luca Milanesio – GerritForge
Gerrit Code Review Maintainer, Release Manager
and member of the Engineering Steering Committee

Gerrit: OpenSource and Multi-Site

One more recording from the Gerrit User Summit 2018 at Cloudera in Palo Alto.

Luca Milanesio, Gerrit Code Review Maintainer and Release Manager, presented the current status of the support for multi-master and multi-site setups with the standard OpenSource Components, developed by GerritForge and the Gerrit Code Review Community.

Introduction

The focus of this talk is sharing with you the experience that we have had with the Gerrit server that we maintain, GerritHub.

First of all, I’m just going to tell you how we went through the journey from a single master-slave installation back in 2013 to a fully multi-site setup across two continents.

The evolution of GerritHub to multi-site

GerritHub was born in November 2013. The idea was straightforward: take a single Gerrit server and use the replication plugin to push to GitHub.

To implement a good, scalable and reliable architecture, you don't need to design everything up front. At the beginning of your journey, you don't know who your users are, how many repos they are going to create, what the traffic looks like, what the latency looks like: you know nothing.

You need to start small, and we did back in 2013, with a single Gerrit master located in Germany, because we had no idea where the users would come from.

Would people in Europe like it, or people in the U.S., or people in China? We did not know. So we started with one server in Germany.

[Diagram: the initial GerritHub.io setup, a single Gerrit master replicating to GitHub]

Because we wanted to make a self-service system, what we did was very simple: a plugin called the "GitHub plugin", which was just a wizard to add an entry to the replication config.

You have incoming Gerrit traffic, then you configure the replication plugin and eventually push to GitHub. The only complicated part is that, as a Gerrit administrator, you have to define these remotes in replication.config, although you can express them in an optimized way. On a self-service system, you've got thousands of people who will create thousands of remotes automatically. Luckily, the replication plugin was able to cope with that very well.
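As an illustration, here is a hedged sketch of a single optimized remote in replication.config that pushes every repository to a matching GitHub repository; the organization name and the credential setup are placeholders, while ${name} is expanded by the replication plugin.

GERRIT_SITE=/var/gerrit

git config -f "$GERRIT_SITE/etc/replication.config" remote.github.url 'git@github.com:example-org/${name}.git'
git config -f "$GERRIT_SITE/etc/replication.config" remote.github.push '+refs/heads/*:refs/heads/*'
git config -f "$GERRIT_SITE/etc/replication.config" --add remote.github.push '+refs/tags/*:refs/tags/*'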

Moving to Canada

Then we evolved. The reason why we changed is that people started saying "Listen, GerritHub is cool, I can use it, it works well in the morning. Why is it so slow in the afternoon?". Uh oh.

We needed to do some data mining to see precisely who was using it, where they were coming from, and what operations they were doing. Then we realized that we had chosen the wrong location: we had decided to put the Gerrit master in Germany, but the majority of people were coming from the USA.

Depending on how the transatlantic backbone was performing, GerritHub could be faster or slower. One of the complaints was that in the morning GitHub was slower than GerritHub, but in the afternoon it was exactly the opposite.

We were doing some performance tuning and analyzing the traffic, and even when people were saying that it was very slow, GerritHub was actually a lot faster than GitHub in terms of throughput. The problem was the number of hops between the end user and GerritHub.

We decided that we needed to move from Germany to the other side of the Atlantic Ocean. We could have moved the service to the USA, but we decided to go with Canada because the latency was precisely the same as hosting in the USA but less expensive.

[Diagram: moving the GerritHub.io master from Germany to Canada]

We could have just moved the master from Germany to the other side of the Atlantic Ocean but, because we wanted from the beginning to offer a service that is always available, we decided to keep both zones.

We didn’t want to have any downtime, even in this migration. We wanted, definitely, to do one step at a time. No changes in releases, no changes in configurations, only moving stuff around. Whenever you change something, even if it’s a small release change, you change the function, and that has to be properly communicated.

If we had changed data center and version at the same time, when something went wrong we would have been in doubt about the cause. Would it be the new version that is slower, or the new data center? You don't know. If you change one thing at a time, it must be that thing that isn't working.

We did the migration in two steps.

  • Step-1: Keep the Gerrit master in Germany, still replicating to GitHub, and add the new master in Canada as just one extra replication end.

    The traffic was still coming to the German master, but it was replicated to both Canada and GitHub. Then, while we were doing all the testing, the other master was used as if it were a Gerrit slave. It was not really a slave, though: all the nodes were masters, just with different roles.

  • Step-2: Flip the switch so that the Gerrit master is in Canada. When replication was online and everything was aligned, we put a small read-only plug on the German side, which made the whole node read-only for a few minutes, to give the last replication queue time to drain. When the replication queue was drained, we flipped the switch, and traffic going to the new master was already read/write.

People didn't change their domain and didn't notice any difference, apart from the much-improved performance. The feedback was like "Oh my, what have you done to GerritHub since yesterday? It's so much faster. It has never been so fast before."
Because it was the same version, and we had been testing in parallel for quite some time, nobody had a single disruption.

Zero-downtime migration leveraging multi-site

But that was not enough, because we wanted to keep Gerrit always up and running and always up to date with the latest and greatest version. Gerrit is typically released twice a year; however, the code-base is stable in every single commit.
We were still forced to ping-pong between the two data centers when we were doing our roll-outs. It means that every time an upgrade was done, users had a few minutes of read-only state. That was not good enough, because we wanted to release more frequently.

When you upgrade Gerrit within the same release, let's say between 2.15.4 and 2.15.5, the process is really straightforward: you just replace the .war file, restart Gerrit, done.

However, if you don't have at least two nodes on each side, you need to ping-pong between the two different data centers and then apply the read-only window, which isn't great.
We started with a second server on the Canadian side, so that each node could deal with the entire traffic of GerritHub. We were not concerned about the German side, because we were just using it as disaster recovery.

Going multi-site: issues

We started doubling the Canadian side with one extra server. Of course, if you do that with version v2.14, which problems do you have?

  • Sessions. How do you share the sessions? If you log in to one Gerrit server, you create one session, then you go to the other and you don't have a session anymore.
  • Caches. That is easier to resolve: you set the TTL of the cache to a very low value and add some stickiness, and you may sort this out. However, cache consistency is another problem and needs to be sorted.
  • Indexes are the really painful one because, at that time, there was no support for ElasticSearch. Now things are different, but back in 2017 it wasn't there.
    What happens is that every single node has its own index. If an index entry is stale, it's not right until someone re-indexes it.

The guys from Ericsson were developing a high-availability plugin. We said: instead of reinventing the wheel, why don't we use the high-availability plugin? We started rolling it out to GerritHub and the configuration became more complex, looking like this one.

So, imagine that in Canada you've got two different masters, and still only one in Germany. They align the consistency of the Lucene index and the caches through the HA plugin, and they still use the replication plugin.

How do you share the repository between the two? You need to have a shared file system. We use exactly the same one used by Ericsson: NFS.

[Diagram: GerritHub.io high-availability setup with two masters, the HA plugin and shared NFS]

Then, for exposing the service, we needed HAProxy; not just one, but at least two. If you put in one HAProxy, you're not HA anymore, because if that HAProxy dies, your service goes down. So you have two HAProxies with a crossed configuration: both of them can redirect traffic to the first master or the second master. It's not one primary and one backup: they have exactly the same role. They do exactly the same thing, they contain exactly the same code, they've got exactly the same cache, exactly the same index. They're both running at the same time and both accepting traffic.

This is something similar to what Martin Fick (Qualcomm) did, I believe, last year, with the only difference that they did not use HAProxy but only DNS round-robin.

Adoption of the high-availability plugin

Based on the experience of running this configuration on GerritHub, we started contributing a lot of fixes to the high-availability plugin.

A lot of people are asking “Oh, GerritHub is amazing. Can you sell me GerritHub?”. I reply with “What do you mean exactly?”

GerritHub is just a domain name that I own, with Gerrit 2.15, plus a bunch of plugins: replication plugin, GitHub plugin, the high-availability plugin (we use a fork), the web session flat file and a bunch of scripts to implement the health check.

If you guys want to build your own with the same configuration, you don't need to buy any commercial product. We don't sell commercial products. We just put the ideas into the open-source community to make it happen. Then, if you need Enterprise Support, we can help you implement it.

The need for a Gerrit disaster-recovery site

Then we needed to do something more, because we had one problem. OVH had historically been very reliable but, of course, shit happens sometimes.
One day the OVH network backbone was down for a few hours.

That meant that any server in that data center was absolutely unreachable. You couldn't even connect to them, you couldn't even check their status: zero. We turned the traffic over to the disaster recovery side, but then we faced a different challenge because we had only one master there.
It meant that if something happened to that master, a peak or whatever made it a little bit unhealthy, then we were going to have an outage. We didn't want to risk an outage in that situation.

So we moved to two servers in Germany and two servers on OVH, and afterwards we migrated to Gerrit v2.15 and NoteDb.

Your disaster recovery site is never really safe until you need it and use it. So use it all the time, on a regular basis. This is what we ended up implementing.

We now have two different data centers, managed by two different cloud providers. One is still OVH in Canada, and the other is Hetzner in Germany. We still have the same configuration, the HA plugin over a shared NFS, so this one is completely replicated to the disaster recovery site, and we are using the disaster recovery site continuously to make sure that it is always healthy and aligned.

Leverage Gerrit-DR site for Analytics

Because we didn't want to serve actual user traffic from the disaster recovery site, due to the synchronization lag between the two sides, we ended up using it for all the data-mining activities. There are a lot of things that we do with data, trying to understand how our system performs. There is a universe of data that is typically either never looked at or not really extracted and processed in the right way.

Have you ever noticed how much data Gerrit generates under the logs directory? A tremendous amount of data, and that data tells you exactly the stories that you want to know. It tells you exactly how Gerrit works today, what you need to do to make sure that Gerrit will work tomorrow, how functions are performing for the end users and whether something has blown up: it's all there.

We started working in that DevOps analytics space a really long time ago, providing metrics and insights data for the Gerrit Code Review project itself and reporting them back to the project through the service https://analytics.gerrithub.io.

Therefore we started using the disaster recovery site for analytics traffic because, if I am extracting and processing data from my logs and my activities, is there really a need to analyse a visit from 10 seconds ago? A small time lag on the data doesn't make any difference from an analytics perspective.

[Diagram: the disaster recovery site serving the analytics traffic]

We were running Gerrit v2.15 here, so the HA plugin needed to be radically different from the one that exists today. We still rely massively on our HA plugin fork, but all the changes have been pushed for review to the high-availability plugin.

However, the solution was still not good enough, because there were still some problems. The HA plugin relies on a shared file system; within the same data center that is not a problem, but what about creating an NFS across data centers on different continents? It would never work effectively because of the latency limitations.

We then started with a low-tech solution: rely on the replication plugin for the Git data in the repositories. Then, every 30 minutes, a cronjob checked the consistency between the two sites and performed a delta re-index of the changes that differed, as in the sketch below.
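
Here is a minimal, hypothetical sketch of that cron-driven delta re-index. It assumes a separate consistency check (not shown) has already written the out-of-date change numbers to a file; the hostname and file path are placeholders, while gerrit index changes is the standard Gerrit SSH command.

    #!/bin/bash
    # Hypothetical delta re-index, run by cron every 30 minutes on the DR site,
    # e.g. via a crontab entry like: */30 * * * * /usr/local/bin/delta-reindex.sh
    # The consistency check (not shown) writes one change number per line here:
    CHANGED_LIST=/var/tmp/changes-to-reindex.txt
    DR_HOST=gerrit-dr.example.com   # placeholder hostname

    while read -r change; do
      # Re-index a single change on the disaster-recovery site using
      # Gerrit's standard "index changes" SSH command.
      ssh -p 29418 "admin@${DR_HOST}" gerrit index changes "${change}"
    done < "${CHANGED_LIST}"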

Back in Gerrit v2.14, I also needed to do a database export and import, because the database contained the reviews.

But there was also a timing problem: if a disaster occurred, people would have to wait up to half an hour for the re-index to catch up with the data.
They would not have lost anything, because the index can be recreated at any time, but the user experience was not ideal. On top of that, you have the DNS-related issues of moving from one zone to the other.

Sharding Gerrit across sites

First of all, we want to leverage sharding based on the repository, available from Gerrit v2.15, which includes the project name in each page and REST-API URL. That allows achieving basic sharding across different nodes even with a simple OpenSource HAProxy or another workload balancer, without the magic of Google’s intelligent Gerrit/Git proxy. Bear in mind this is not pure sharding, because every node keeps all the repositories available locally. HAProxy is going to be clever: based on the project name and the action, it will use the most appropriate backend for read and write operations. In that way, we make sure that you never push to the same branch of the same repository from two different nodes concurrently. A minimal HAProxy sketch is shown below.
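
Purely as an illustration of that routing idea, here is a minimal HAProxy sketch for the HTTP traffic (SSH is discussed in the Q&A below); the backend names, addresses and the “team-b/” project prefix are made-up placeholders, not our actual configuration.

    frontend gerrit_http
        bind *:80
        # A push over HTTP ends in /git-receive-pack or requests that service
        acl is_push path_end /git-receive-pack
        acl is_push url_param(service) -m str git-receive-pack
        # Hypothetical rule: repositories under "team-b/" are owned by site B
        acl project_on_b path_beg /team-b/ /a/team-b/ /c/team-b/
        # Writes for site-B projects are forwarded; everything else stays local
        use_backend gerrit_site_b if is_push project_on_b
        default_backend gerrit_site_a

    backend gerrit_site_a
        server gerrit-a 10.0.1.10:8080 check
    backend gerrit_site_b
        server gerrit-b 10.0.2.10:8080 check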

Screenshot 2019-03-02 at 01.23.16

How does the magic work? The replication plugin takes care of syncing the nodes. Historically, with master-slave, the synchronization was unidirectional. With multi-site, instead, all masters replicate to each other: the first master replicates to the second and the other way around. Thanks to the new URL scheme, Google made our lives so much easier. A sketch of the replication configuration follows.
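
As a sketch of what that looks like on one site, assuming SSH replication and placeholder hostnames, the replication.config on site A could be along these lines; in a symmetric multi-site setup, site B carries the mirror-image configuration pointing back at site A.

    # replication.config on site A (hostname is a placeholder)
    [remote "site-b"]
        url = ssh://gerrit@gerrit-site-b.example.com:29418/${name}.git
        push = +refs/*:refs/*
        mirror = true
        replicationDelay = 0
        threads = 4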

Introducing the multi-site plugin

We have already started writing a brand-new plugin called multi-site. Yes, it has a different name, because the direction it is taking is completely different, in terms of design decisions, from the high-availability plugin, which required a shared file system. Secondly, synchronous HTTP communication with the other site was not feasible anymore: what happens if, for instance, a remote site in India is not reachable for 50 seconds? I still want events and data to be put into a persistent queue and eventually sent to the remote site.

Screenshot 2019-03-02 at 01.24.06

For the multi-site broker implementation, we have decided to start with Kafka, because there is already an implementation of Gerrit stream events on Kafka as a plugin. The multi-site plugin, however, won’t be limited to Kafka and could be extended to support other popular brokers such as NATS.
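
As a rough sketch only: the Kafka-based stream-events plugin is pointed at the brokers from gerrit.config. The section and property names below are assumptions for illustration; check the events-kafka and multi-site plugin documentation for the exact keys.

    # gerrit.config (illustrative only; verify the key names in the plugin docs)
    [plugin "events-kafka"]
        bootstrapServers = kafka-1.example.com:9092,kafka-2.example.com:9092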

We will move to location-aware DNS because we want users to access the HAProxy that is closest to them. If I live in Germany, it makes sense for me to use the European servers, but definitely not the servers in California.

By default you will go to the “location-aware” site that is closest to you. However, depending on what you do, you could later be redirected to another server across zones.

For example, if I want to fetch data, I can still fetch everything from my local site. But if I want to push data, the operation is either executed locally or forwarded to the remote site, if the target repository has been sharded remotely.

Screenshot 2019-03-02 at 01.24.43

The advantage is location-awareness and data locality. The majority of data transfers will happen with my local site because, at the end of the day, the major complaint of people using Gerrit with remote masters is a sluggish GUI.

If you have Gerrit master/slave, for instance a Gerrit master in San Francisco and your Gerrit slaves in India, you have the problem that everyone in India still has to access the remote GUI on the server in San Francisco, and thus experiences a very slow GUI.

Even if only 10 percent of your traffic is write operations, you would suffer all the performance penalties of using a remote server in San Francisco. However, with an intelligent HAProxy, plus intelligent logic in Gerrit that understands where the traffic needs to go, I can always talk to the local server in my zone. Then, depending on what I do, it uses my local server or a remote server. This happens behind the scenes and is completely transparent to the user.

Q&A

Screenshot 2019-03-02 at 01.25.33

Q: I just wanted to ask if you’re ignoring SSH?

SSH is a problem in this architecture, right? HAProxy supports SSH, but it has a big problem.

Q: You don’t know which project, and you don’t have any idea of the operation, whether it’s a read or a write.

Exactly. HAProxy works at the transport level, so it sees the flow of encrypted data going back and forth to the Gerrit server but cannot see anything in the clear, and thus cannot make an educated decision based on it. Your data may end up being forwarded to a zone where your traffic is not optimized, and you will end up waiting a little bit longer. But bear in mind that the majority of complaints about Gerrit master/slave are not really about the speed of push/pull but rather about the sluggish Gerrit GUI.
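
For SSH, the best HAProxy can do is plain TCP pass-through, which is exactly why it cannot route by project or operation; a minimal sketch, with placeholder addresses:

    # SSH can only be proxied in TCP mode: HAProxy sees an encrypted stream
    # and cannot inspect the project name or whether it is a read or a write.
    frontend gerrit_ssh
        mode tcp
        bind *:29418
        default_backend gerrit_ssh_local

    backend gerrit_ssh_local
        mode tcp
        server gerrit-a 10.0.1.10:29418 check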

Of course, as a Community we’ll find a solution to the SSH problem; I would be more than happy to address that in multi-site as well.

Q: The solution is to get a better network, because we don’t have any complaints about the GUI across the continents.

Okay, yeah: get a super-fast network across the globe, so that everyone, everywhere in the world, can be super fast accessing a single central point. However, not everyone can have a super-fast network. For instance, on GerritHub we don’t own the infrastructure, so we cannot control the speed. Similarly, a lot of people are adopting cloud infrastructure, where you don’t own or control the infrastructure anymore.

Q: Would you also have some Gerrit slaves in this setup? Would there be several masters and some slaves? Or would everything become a master?

In a multi-site architecture, everything can be a master; the distinction between master and slave does not exist anymore. Functionally speaking, some nodes will be master for certain repositories and slave for other repositories. Every single node is capable of handling any type of operation but, thanks to the smart routing, it effectively forwards calls based on where it is best to execute them.
If with this simple multi-site solution you can serve 99% of the traffic locally, and for the remaining one percent you accept the compromise of going a bit slower, then it can still sound very good.

We implemented this architecture last year with a client that is distributed worldwide, and it just worked, because it is very simple and very effective. There isn’t any “magic product” involved, simply standard OpenSource components.

We are on a journey, because as Martin said last year in his talk, “Don’t try to do multi-master all at once.” Every single situation is different, every client is different, every installation is different, and all your projects are different. You will always need to compromise, find the trade-off, and tailor the solution to your needs.

Q: I think in most of the cases I saw up there, you were deploying a Gerrit master in multiple locations. If you want to deploy multi-master in the same site, say I want two Gerrit servers in Palo Alto, does that change the architecture at all? Basically, I am referring to the shared NFS.

This is actually the point made by Martin: if you have a problem with multi-site, it’s not a problem of Gerrit, it’s a problem of your network, because your network is too slow. You should fix the network; you shouldn’t solve this problem in Gerrit.
In that case, the architecture is this one: imagine you have a super-fast network all across the globe, everyone reaches the same Gerrit in the same way, and you have a super-fast direct connection to your shared NFS, so you can grow your masters and scale horizontally.

The answer is: yes, if your company can do that, then absolutely, I wouldn’t recommend doing multi-site at all. But if you cannot do anything about the speed, you have people working remotely and, unfortunately, you cannot put a cable under the sea between the U.S. and India just for your company, then maybe you want to address the problem in a multi-site fashion.

One more thing I wanted to point out is that the multi-site plugin makes sense even within a single site. Why is this picture better than the previous one? I’m not talking about multi-site here, I’m talking about multi-master in the same data center. Why is this one better?

You read and write into both: read and write traffic goes to both nodes. The difference between this one and the previous one is that there is no shared file system here. In the previous setup, even within the same data center, if you use a shared file system you still have a single point of failure. Even if you are willing to buy the most expensive shared file system, with the most expensive network and systems that exist in the world, it will still fail. Reliability cannot simply be bought by throwing more money at it.
It’s not money that makes my system reliable: machines and disks will fail one day or another, and if you don’t plan for failures, you’re going to fail anyway.

Q: Let’s say you’re doing an upgrade on all of your Gerrit masters, right? Do you have an automatic mechanism to upgrade? I’m coming from a perspective where we have our Gerrit on Kubernetes, so, sure, if I’m doing a rolling upgrade it’s kind of a piece of cake: HAProxy is taken care of by confd and all that stuff, right?
In your situation, what would an upgrade look like, especially if you’re trying to upgrade all the masters across the globe? Do you pick and choose which site gets upgraded first and then let the replication plugins catch up after that?

First of all, the rolling upgrade with Kubernetes works if you do minor upgrades; if you’re doing a major Gerrit upgrade, it doesn’t, because a major upgrade changes the data, right? With minor upgrades, you do exactly the same here: even if the nodes in the other zones are on a different version, they are still interoperable, because they talk to exactly the same schema and exactly the same API, in exactly the same way.

Q: Taking Kubernetes out of the picture, then, with the current multi-master setup you have, if you’re not doing just a trivial upgrade, how do you usually approach that?

If you are doing a major version upgrade, it has to be orchestrated exactly like the GerritHub upgrade from v2.14 to v2.15: you need to use what I called the ping-pong technique, which you basically perform across data centers.

It’s not fully automated yet. I’m trying to automate it as much as possible and contribute back to the community.

Q: In multi-master, when you’re doing the major upgrade, even with the ping-pong, if the schema has changed and you’re adding events to the replication plugin, are you going to temporarily suspend replication during that period of time? Because the other nodes on the earlier version don’t understand the new schema yet. Can you explain that a little?

Okay, when you do the ping-pong that was in this morning’s presentation, what happens is: you upgrade the first node, interrupt the traffic there, and do all the testing you want. You are behind the original master, but it catches up through replication while you do all the testing.

With regards to the replication events, they are not understood by the clients, such as the Jenkins Gerrit Trigger plugin; that point was raised this morning as well. If you go to YouTube.com/GerritForgeTV, there is the recording of my talk from last year about a new plugin that is not subject to these fluctuations of the Gerrit version.

Luca Milanesio – Gerrit Code Review Maintainer and Release Manager