2023: New Year and opportunities for GerritForge and Gerrit Code Review

TL;DR: GerritForge dedicated its efforts to organising and managing the Gerrit User Summit in London back in November 2022, in conjunction with the release of Gerrit v3.7. The event was a great success, with a significant on-site presence and record-breaking attendance on the GerritForge TV YouTube channel. GerritForge has also delivered on its promise to research and improve JGit and Gerrit scalability for large mono-repos, with tens of millions of objects and refs. 2023 will see the finalisation of these efforts, with increased development capacity, a new JGit Committer pushing the platform to a new level of performance and scalability, and a new innovative system for collecting and optimising repository metrics automatically. Stay tuned.

Read the full story here below (9 mins read).


2022 has been a critical year for turning the Gerrit Code Review community and development back on track after the COVID-19 pandemic. At GerritForge, we’ve been working hard to make sure that the development, support, and innovation of Gerrit Code Review continue along their main objectives.

Gerrit Code Review v3.6 and v3.7

We have continued to deliver on the development and release of Gerrit Code Review and its plugins, helping to test and release versions v3.6.0 (May) and v3.7.0 (November).

Some numbers from the past 12 months of development contributions by individual committers and companies:

  • 3,627 Changes have been merged on 76 projects related to the Gerrit Code Review platform, including JGit
  • 113 committers from 42 different organisations

A special mention to the top 10 contributors: Google (Ben Rohlfs, Edwin Kempin, Chris Pouchet, Dhruv Srivastava, Frank Borden, Milutin Kristofic), GerritForge (Luca Milanesio), Wikimedia (Paladox) and SAP (Matthias Sohn and Thomas Dräbing).

In comparison with 2021, we had 25% fewer changes merged, but with more contributors coming from more companies, which is the sign of a very healthy and thriving ecosystem of maintainers.

GerritForge has committed to resuming the face-to-face user summits, which had been suspended since 2020.

The Gerrit User Summit 2022 took place in London, UK, on the 10th and 11th of November in a hybrid format, with people having the opportunity to participate either on-site or remotely on the GerritForge TV YouTube channel.

It was a glorious success, with record-breaking attendance from all around the globe:

  • 50 people registered to attend on-site, 26 of them managed to arrive despite the London tube strike, whilst the others attended remotely
  • 235 people viewed the summit on YouTube with an average view time of 40 mins (one talk)

The summit survey had an outstanding report showing a huge acceptance and appreciation of the event:

  • 82% rated the remote video streaming as “good” or “outstanding”
  • 96% rated the quality of the summit as “good” or “outstanding”
  • 100% would recommend the summit to a colleague, with 83% strongly recommending it

GerritHub.io SLA gets closer to five-nines

We have been working hard to make Gerrit more stable and resilient throughout 2022, discovering and fixing many issues in the code base and on the multi-site software architecture.
In 2022, GerritHub.io had only six small hiccups for a total of 19 mins of downtime (SLA = 99.997%) over a 12-month period, a 75% reliability improvement compared to 2021.

We have run extensive RCAs on the causes of the downtime and identified two leading issues, which are explained in detail below.

The “anonymous unlimited query” hole in Gerrit
GerritHub.io was subject to a 15-minute outage because anonymous users were able to bring all the sites offline before the system could auto-recover.
Gerrit allows bypassing all the limits set in the ACLs for running queries by simply adding the “no-limit” parameter.
Returning an arbitrary payload without limits allows a single user to generate the server-side workload of collecting and building a GBytes-sized JSON payload; unfortunately, that option was available to everyone, including anonymous users, making any publicly facing Gerrit Code Review installation subject to denial-of-service attacks.
We have identified the issue, reported and fixed it in Gerrit with Change 333304, which has been included in Gerrit v3.3.10, v3.4.4, v3.5.1, and all v3.6.0 or later releases.
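To illustrate the issue, before the fix an unauthenticated client could issue a query like the following (a hypothetical example; gerrit.example.com is a placeholder) and force the server to build the entire result set in a single response:

# Hypothetical anonymous REST query: “no-limit” bypasses the ACL query
# limits and asks the server for every matching change in one JSON payload.
curl "https://gerrit.example.com/changes/?q=status:open&no-limit"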

More granular monitoring and alerting
We have lowered the threshold of uptime checks on GerritHub.io to 1 minute, giving us the ability to detect and react immediately to 4 smaller hiccups. We have also detected a lack of scalability for some specific higher-load projects; those hiccups were responsible for 2 mins of downtime over the 2nd half of 2022. Many more projects are planning to onboard onto GerritHub.io; hence we do need to address these project-specific capacity needs.

Scaling Gerrit Code Review and JGit beyond its limits

We have been investing a massive effort in building a test environment designed to stress Gerrit and JGit to their limits and identify all the limitations and bottlenecks that prevented us from scaling further.

Scaling the test repository
Over the months, we have created test repositories that grew in every dimension:

  • Tens of millions of refs as both refs/changes and refs/heads
  • Millions of delta-chains
  • Tens of millions of Git objects
  • Packfiles of tens of gigabytes and packed-refs files of hundreds of megabytes
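As a sketch of how repositories of this shape can be synthesized (assuming a plain Git clone; this is not necessarily the exact tooling we used), millions of change-like refs can be created in bulk with git update-ref:

# Sketch: bulk-create one million synthetic change refs pointing at HEAD.
head=$(git rev-parse HEAD)
seq 1 1000000 | awk -v h="$head" \
  '{ printf "create refs/changes/%02d/%d/1 %s\n", $1 % 100, $1, h }' |
  git update-ref --stdin
git pack-refs --all   # collapse them into a large packed-refs file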

For generating a significant load on both the client and server side, we have invested more into the aws-gerrit cloud setups and the gatling-git performance loading tool.

There were some “well-known” issues and additional surprising ones.

SHA1 complexity and CPU utilization for large entities
JGit has been using SHA1 to identify uniqueness not just for Git objects but also for other large entities. However, computing SHA1 has become increasingly CPU-intensive because of the relatively recent findings about collisions on shattered.io.
We have highlighted two major potential improvements in cooperation with Matthias Sohn (SAP) on the raw SHA1 performance and its application for detecting packed-refs changes on the filesystem.

Commit priority queues
JGit has a custom implementation of priority queues, used intensively in RevWalk, which has almost quadratic complexity. That isn’t a problem for small to medium chains of commits; however, when the number of commits reaches millions, the performance degradation becomes unbearable.
We have replaced JGit’s custom implementation with the priority queue provided by the Java standard library, which has logarithmic complexity and massively improves performance with large commit chains.

Unwanted reachability checks
JGit needs to perform a full reachability check whenever an unknown remote client advertises refs, which makes sense when serving a remote client. However, the cost of checking the full reachability of millions of advertised refs can be daunting, and it may be alleviated if the remote end can be considered trusted.

Fixing JGit bitmaps
Since the introduction of Git bitmaps, the whole community has learned how key they are in speeding up the counting and selection during the clone phase.
However, large and unoptimized bitmaps can be so unhelpful for Git that, instead of speeding things up, they represent a massive overhead for the system, causing CPU spikes and, eventually, lowering the throughput of the server.
Git bitmaps are compressed using the JavaEWAH library, which is good for memory consumption but evil for CPU utilization: that is why smaller bitmaps are better for performance.
We have discovered and fixed a critical issue with the JGit bitmap generation that was causing the inclusion of all commits and BLOBs pointed to by annotated tags. Also, we have introduced the ability to inform JGit about the heads that can be excluded from the bitmap, shortening the creation time by orders of magnitude (from a 5h generation time down to as little as 60s on a repository with 2k refs) and increasing its effectiveness by 200%.
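As a minimal sketch of how the exclusion can be configured, assuming the pack.bitmapExcludedRefsPrefixes option introduced by the JGit changes mentioned above (verify the option name against your JGit/Gerrit version; for Gerrit, the setting lives in etc/jgit.config):

# Exclude change refs from bitmap selection on a Gerrit site
# ($GERRIT_SITE is a placeholder for your installation path).
git config --file "$GERRIT_SITE/etc/jgit.config" \
  pack.bitmapExcludedRefsPrefixes refs/changes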

Millions of unneeded ref logs
When performing a clone of a repository with millions of heads, JGit created one local reflog file for every remote ref, including the ones that were not actually cloned but just fetched as remote references. This created a significant performance gap between JGit and Git, which instead lazily creates the reflog files only when the refs are first checked out. Cloning a single branch of a repository with millions of remote refs took around 1h with JGit, compared to a few minutes with Git.

All of the findings were included in multiple updates on the following components:

  • JGit changes: all fixes were also provided to stable-5.13, the last supported branch for Java 8, which allows benefiting from these improvements for older versions of Gerrit from v2.16 onwards.
  • pull-replication went through major performance improvements, achieving 1000x faster execution compared to the traditional replication plugin
  • aws-gerrit is going through upgrades to make use of the pull-replication plugin, including support for the bearer token, which allows replicating virtually any repository, including All-Users.git
  • gatling-git: we have upgraded the Gatling version and JGit to the latest stable-5.13 to include the latest performance improvements.
  • git-repo-metrics: we have introduced a brand-new plugin that allows us to keep the major dimensions of a repository under control and graph their growth over time.

GerritForge goals for 2023

We are definitely not done yet with the performance improvements on Gerrit and JGit: there are still significant improvements to be made, and JGit changes to get merged into the mainstream branches.
We believe we are on track to finalize the job and allow a stable and scalable platform for large Git repositories in 2023.

Finalise what we cooked in 2022 for JGit
JGit has a new maintainer: David Ostrovsky was awarded committer status on the project in 2022. GerritForge’s devs are focused on getting more reviews and attention on the JGit performance improvements. We are committed to finalising all the open changes related to large repositories.

JGit multi-pack indexes support
There is still a major gap between JGit and Git when dealing with very active repositories: multi-pack indexes. The proliferation of packfiles eventually leads to a long and painful search-for-reuse phase for BLOBs, which could be cut down hundreds of times with a multi-pack index.
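For comparison, native Git can already build and check a single index covering all packfiles; closing this gap in JGit is the work stream described here:

# Native Git: write a multi-pack index so object lookups no longer scan
# every pack index in turn, then verify its integrity.
git multi-pack-index write
git multi-pack-index verify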

Git repository optimiser for Gerrit
We have been working on tracking live information about the Git repository, thanks to the git-repo-metrics plugin. Wouldn’t it be nice to have a tool that can act on that information automatically?
We will be doing R&D on how to correlate the repository metrics, the Git audit trail, and the performance data to make AI-based decisions on what needs to be improved in the repository.
This work stream is going to be useful for any Git repository, not just the ones powered by Gerrit Code Review. The ‘git-repo-metrics’ plugin and the repository optimiser would also apply to other products, including GitHub and GitLab.

Gerrit v3.8 and project-specific change numbers
We will finalise the design document for the transition to project-specific change numbers in Gerrit v3.8. That would allow the seamless migration of projects across Gerrit setups without having to worry about change renumbering anymore.

Gerrit Code Review testing and GerritForge-certified binaries
GerritForge is spending a tremendous amount of time developing test environments and tools for serving the Gerrit community with more stable releases and improving the quality of its code. We want to intensify the effort and also offer our platinum support customers a unique service that includes the GerritForge digital signature and rubber stamp on the binaries of Gerrit Code Review and its plugins that have been successfully tested and validated for being production-ready.
Stay tuned; more details are coming soon …

GerritForge company forecast in 2023

GerritForge Inc. will finalise its roll-out to the USA, and all contracts and services will be run from Sunnyvale, CA and Europe. Over 2022, 60% of the customers and businesses have already been moved, and the operation will be completed over the course of 2023.

We are looking forward to doubling our revenue figures in 2023 and also our contributions to the open-source community, with a main focus on JGit as the driver of performance growth for Gerrit Code Review.


2023 is going to be an incredible year for GerritForge, Gerrit Code Review, and the JGit community altogether.

Happy start of the New Year 2023!

Luca Milanesio (GerritForge)
Gerrit Code Review Maintainer and Release Manager
Member of the Gerrit Engineering Steering Committee

New year, new JGit contributions coming

We have shared GerritForge’s goals for improving Gerrit in 2022: most of them will include significant contributions to JGit, the Java-based engine powering Gerrit Code Review’s support for the Git data format and protocol.

GerritForge will contribute many more changes to JGit during 2022, all focused on improving the functionality and performance of large mono-repos. All changes will go through the formal review process on the Eclipse Foundation’s JGit Gerrit project.

Lack of knowledge and reviews

The JGit project has suffered major losses in the past few years, which is clearly shown by the list of top contributors vs. their activity in the recent 12 months. I have tried running a “git blame” against the whole JGit code-base, which is a heuristic (and therefore a rough approximation) of which parts of the JGit code were last written/edited.

  1. (49998 LOC) – Shawn Pearce
  2. (37854 LOC) – Thomas Wolf
  3. (31417 LOC) – Matthias Sohn
  4. (13593 LOC) – David Pursehouse
  5. (13200 LOC) – Christian Halstrick
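For reference, here is a rough way to reproduce this kind of per-author blame aggregation (a sketch, not necessarily the exact script used for the numbers above); run it from the root of a JGit clone:

# Count, per author, the lines each one last touched across all Java files.
git ls-files '*.java' | while read -r f; do
  git blame --line-porcelain -- "$f"
done | sed -n 's/^author //p' | sort | uniq -c | sort -rn | head -5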

See below the number of contributions (excluding merges and trivial changes) of the above 5 maintainers in the past 12 months:

  1. Shawn Pearce – 0 changes – He sadly passed away in 2018 
  2. Thomas Wolf – 86 changes
  3. Matthias Sohn – 128 changes
  4. David Pursehouse – 0 changes
  5. Christian Halstrick – 0 changes

The above stats show that the currently active maintainers (Thomas and Matthias) appear in the git blame of 69k out of the total 390k LOCs.

Thomas and Matthias are doing a fantastic job in keeping up as much as possible with the pace of incoming changes and reviewing them at their best. At times, though, an incoming change may touch parts of the code they are less familiar with and, therefore, would require more eyes or more time to review.

Breaking the vicious circle

The JGit project is in a dangerous vicious circle.

  1. Incoming changes would take longer to get reviewed and merged.
  2. The lengthy reviews discourage contributors, who may lose interest in following up on contributions or uploading new changes.
  3. The lack of contributions and merged changes keeps the pressure on the current maintainers, which fuels point 1 again.

How can GerritForge help break this vicious circle, provide meaningful contributions, and get them merged fast and with proper and thorough reviews?

Keeping up the pace of contributions is key to avoiding discouragement: GerritForge will therefore create a “dev branch” of JGit. All of GerritForge’s contributions to the JGit master branch will be part of the dev branch and will go through a rigorous code-review and E2E validation cycle, including the Gatling tests for Gerrit.

Two-steps validation workflow

GerritForge’s workflow for validating JGit changes with Gerrit
  1. A new change is uploaded to the Eclipse Foundation JGit project.
  2. The normal Eclipse Foundation CI verification builds the change and, if it passes all tests, provides a Verified +1
  3. One of the JGit maintainers, or one of the GerritForge contributors, can provide a Code-Review +1 score with the additional description “Approved for dev”
  4. The special “Approved for dev” description triggers the cherry-pick of the Change onto GerritForge’s JGit dev branch
  5. The new change for review on the JGit dev branch triggers the creation of a Change on gerrit-review.googlesource.com/gerrit with the update of the JGit submodule pointing to the open change.
  6. The JGit submodule update Change triggers the current E2E validation using the Gatling tests, developed and hosted by GerritForge. If all tests are passing, the Gerrit change receives a Verified +1.
  7. The cherry-picked Change on JGit dev branch receives a Verified +1
  8. The cherry-picked Change is merged to the JGit dev branch
  9. The merge of the cherry-picked Change is notified on the original Change with a Code-Review +1 score with description “Merged in dev”.
  10. One of the JGit maintainers can finalise the review and, if all is good, provide the final Code-Review +2 and merge the change on JGit master.

NOTE: The above workflow will only apply to the upcoming changes on the master branch, where we do need to innovate and implement new features at a faster pace. We have no plans to apply the workflow to the stable branches.

Pluses and minuses

There are good things in the above workflow; however, there are also risks:

  • Complexity: the secondary dev branch will undergo an E2E validation process with Gerrit, which is obviously complex and may break at times.
  • Danger of forking: if the JGit maintainers veto the Change at step 10, the changes already merged in dev would effectively make the dev and master branches diverge, which isn’t a good thing and should be avoided as much as possible.

The augmented lifecycle, which involves Gerrit E2E tests with Gatling, also has many advantages:

  • Additional E2E validation: incoming changes on JGit undergo an E2E validation with Gerrit against the suite of E2E Gatling tests, which provides good feedback and gives more confidence in merging code even on less-known parts of the JGit codebase.
  • Increased velocity: speeds up the validation of new incoming changes and gets them merged to the JGit dev branch, without impacting the pace and quality of reviews from the current JGit maintainers.
  • Gerrit edge release: allows a downloadable Gerrit build that includes the JGit dev branch, enabling canary deployments to see how Gerrit behaves with the latest and greatest JGit code.

From the JGit project’s perspective, the flow of incoming changes will gain an additional E2E validation, which is always a good thing. Additionally, it will bring more contributions and inspiration for new innovative changes on the project, attracting more and more talent.

Ready to gear up contributions on JGit?

The workflow proposed is a starting point; however, we are committed to giving it a go, seeing how it works in practice, and finding out whether it is enough to gear up the contributions to the JGit project.

How to enable Git v2 in Gerrit Code Review


Git protocol v2 landed in Gerrit 3.1 on the 11th of October 2019. This is the last email from David Ostrovsky concluding a thread of discussion about it:

It is done now. Git wire protocol v2 is a part of open source Gerrit and will be
shipped in upcoming Gerrit 3.1 release.

And, it is even enabled per default!

Huge thank to everyone who helped to make it a reality!

A big thanks to David and the whole community for the hard work in getting this done!

This was the 3rd attempt to get the feature in Gerrit after a couple of issues encountered along the path.

Why Git protocol v2?

The Git protocol v2 introduces a big optimization in the way client and server communicate during clones and fetches.

The big change is the possibility of filtering, server-side, the refs not required by the client. In the previous version of the protocol, whenever a client issued a fetch, all the references were sent from the server to the client, even if the client was fetching a single ref!

In Gerrit this issue was even more evident since, as you might know, Gerrit leverages refs heavily for its internal functionality, even more so with the introduction of NoteDb.

Whenever you are creating a Change in Gerrit you are updating/creating at least 3 refs:

  • refs/changes/NN/<change-num>/<patch-set>
  • refs/changes/NN/<change-num>/meta
  • refs/sequences/changes

In the Gerrit project itself, there are currently about 104K refs/changes and 24K refs/changes/*/meta. Imagine you are updating a repo that is behind by just a couple of commits: you would receive all those references, which would take up most of your bandwidth.

Git protocol v2 avoids this, sending back just the references that the Git client requested.
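You can get a feel for the scale of the problem by counting the advertised change refs yourself (the numbers will have grown since this was written):

# Count the change refs advertised by the upstream Gerrit repository.
git ls-remote https://gerrit.googlesource.com/gerrit 'refs/changes/*' | wc -l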

Is it really faster?

Let’s see if it really does what it says on the tin. We enabled Git protocol v2 at the end of 2019 on GerritHub.io, so let’s test it there. You will need a Git client from version 2.18 onwards.

> git clone "ssh://barbasa@review.gerrithub.io:29418/GerritCodeReview/gerrit"
> cd gerrit
> export GIT_TRACE_PACKET=1
> git -c protocol.version=2 fetch --no-tags origin master
19:16:34.583720 pkt-line.c:80           packet:        fetch< version 2
19:16:34.585050 pkt-line.c:80           packet:        fetch< ls-refs
19:16:34.585064 pkt-line.c:80           packet:        fetch< fetch=shallow
19:16:34.585076 pkt-line.c:80           packet:        fetch< server-option
19:16:34.585084 pkt-line.c:80           packet:        fetch< 0000
19:16:34.585094 pkt-line.c:80           packet:        fetch> command=ls-refs
19:16:34.585107 pkt-line.c:80           packet:        fetch> 0001
19:16:34.585116 pkt-line.c:80           packet:        fetch> peel
19:16:34.585124 pkt-line.c:80           packet:        fetch> symrefs
19:16:34.585133 pkt-line.c:80           packet:        fetch> ref-prefix master
19:16:34.585142 pkt-line.c:80           packet:        fetch> ref-prefix refs/master
19:16:34.585151 pkt-line.c:80           packet:        fetch> ref-prefix refs/tags/master
19:16:34.585160 pkt-line.c:80           packet:        fetch> ref-prefix refs/heads/master
19:16:34.585168 pkt-line.c:80           packet:        fetch> ref-prefix refs/remotes/master
19:16:34.585177 pkt-line.c:80           packet:        fetch> ref-prefix refs/remotes/master/HEAD
19:16:34.585186 pkt-line.c:80           packet:        fetch> 0000
19:16:35.052622 pkt-line.c:80           packet:        fetch< d21ee1980f6db7a0845e6f9732471909993a205c refs/heads/master
19:16:35.052687 pkt-line.c:80           packet:        fetch< 0000
From ssh://review.gerrithub.io:29418/GerritCodeReview/gerrit
 * branch                  master     -> FETCH_HEAD
19:16:35.175324 pkt-line.c:80           packet:        fetch> 0000

> git -c protocol.version=1 fetch --no-tags origin master
19:16:57.035135 pkt-line.c:80           packet:        fetch< d21ee1980f6db7a0845e6f9732471909993a205c HEAD\0 include-tag multi_ack_detailed multi_ack ofs-delta side-band side-band-64k thin-pack no-progress shallow agent=JGit/unknown symref=HEAD:refs/heads/master
19:16:57.037456 pkt-line.c:80           packet:        fetch< 07c8a169d6341c586a10163e895973f1bdccff92 refs/changes/00/100000/1
19:16:57.037489 pkt-line.c:80           packet:        fetch< 0014ca6443ac0af338e2677b45e538782bb7a12e refs/changes/00/100000/meta
19:16:57.037502 pkt-line.c:80           packet:        fetch< b4af8cad4d3982a0bba763a5e681d26078da5a0e refs/changes/00/100400/1
19:16:57.037513 pkt-line.c:80           packet:        fetch< 9ec6e507c493f4f1905cd090b47447e66b51b7e1 refs/changes/00/100400/meta
19:16:57.037523 pkt-line.c:80           packet:        fetch< a80359367529288eea3c283e7d542164bced1e2f refs/changes/00/100800/1
19:16:57.037533 pkt-line.c:80           packet:        fetch< 170cced6d81c25d1082d95e50b37883e113efd01 refs/changes/00/100800/meta
19:16:57.037544 pkt-line.c:80           packet:        fetch< 6cb616e0ad4b3274d4b728f8f7b641b6bd22dce4 refs/changes/00/100900/1
19:16:57.037554 pkt-line.c:80           packet:        fetch< 286d1ee1574127b76c4c1a6ef0f918ad4c61953a refs/changes/00/100900/meta
19:16:57.037606 pkt-line.c:80           packet:        fetch< 312ba566d2620b43fb90be3e7c406949edf6b6d9 refs/changes/00/10100/1
19:16:57.037619 pkt-line.c:80           packet:        fetch< dde4b73cb011178584aae4fb29a528018149d20b refs/changes/00/10100/meta

…. This will go on forever …. 

As you can see there is a massive difference in the data sent back on the wire!

How to enable it?

If you want to enable it, you just need to update your git config (etc/jgit.config in 3.1 and $HOME/.gitconfig in previous versions) with the protocol version and restart your server:

[protocol]
  version = 2

Enjoy your new blazing fast protocol!
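A quick client-side check that the server actually negotiated protocol v2, using the same packet tracing shown above:

# If v2 is active, the packets traced from the server announce "version 2".
GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote origin HEAD 2>&1 | grep "version 2"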

If you are interested in more details about the Git v2 protocol you can find the specs here.

Fabio Ponciroli (GerritForge)
Gerrit Code Review Contributor

Gerrit User Summit LIVE!


The Gerrit User Summit 2019 is going live and allows anyone to join and participate from across the world.

There are only 30 days left until the Gerrit User Summit 2019, the 12th annual event of the Gerrit Code Review community. It is a year of records, with Gerrit reaching its largest audience ever in its 11 years of history:

  • Over 120 seats
  • People coming from 27 countries
  • 2 major dates and locations, in Sweden and in the USA
  • 20 talks and presentations
  • All seats sold out 2 months before the event

This is also a historical moment for the community because, for the first time since 2011, the JGit and Gerrit contributors will get together and talk to each other face to face, strengthening the cooperation between the two projects.

Do not miss the event, go live

We have received an enormous number of requests to join the event on-site in Sunnyvale, many more than in any previous year: the event was sold out on Eventbrite 2 months before the starting date.

GerritForge has therefore decided to invest further funding in sponsorship and organise full live coverage of the event.

How to participate?

GerritForge has launched a new live event broadcasting site, https://live.gerritforge.com.

Watching the event will be FREE OF CHARGE and without adverts, thanks to the sponsorship by GerritForge. To assure the maximum quality of the video, there is a limit on the number of online watchers, and pre-registration is needed.

  1. Go to https://live.gerritforge.com
  2. Click on the “Register to Watch” orange button
  3. Enter your full name, e-mail, company name and country of origin
  4. Click the “Register to Watch” green button at the bottom of the page

The live event will allow remote attendees to ask questions and interact with the audience in Sunnyvale: it is going to be truly interactive and useful for the whole JGit and Gerrit community.

What to expect from the Sunnyvale event?

The Sunnyvale event includes a huge number of innovations on the JGit and Gerrit projects.

  • The introduction of the Git ref-table for repositories with a huge number of refs
  • Support for Git protocol v2 in Gerrit
  • Git / Gerrit plugin for Gatling, for generating consistent end-to-end and load tests on Gerrit
  • Zuul support for the new Gerrit Checks CI integration
  • Introduction of Gerrit Code Review Analytics for the Android open-source project
  • Frictionless and zero downtime upgrades for Gerrit
  • and many more talks and presentations

Your last chance to attend, reserve your live spot now

There are brand-new ways this year to get in touch and be part of the Gerrit User Summit 2019.

Reserve your live spot today by registering at https://live.gerritforge.com and be part of this record event for the JGit and Gerrit Code Review community.

Luca Milanesio
Gerrit maintainer, release manager and ESC member

 

Gerrit v3.0 is here


Gerrit v3.0 has been released during the last Spring Hackathon at Google in Munich, involving over 20 developers for one week.

It can be downloaded from www.gerritcodereview.com/3.0.html and installed on top of any existing Gerrit v2.16/NoteDb installation. Native packages have been distributed through the standard channels, and upgrading is as simple as shutting down the service, running the rpm, deb, or dnf upgrade command, and starting it again.

You can also try Gerrit v3.0 using Docker by simply running the following command:

docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit:3.0.0

This article goes through the whole history of the Gerrit v3.0 development and highlights the differences between the previous releases.

Milestone for the Gerrit OpenSource Project

Finally, after 6 years, 18k commits, and 1M lines of code written by 260+ contributors from 60+ different organizations, Gerrit v3.0 is out.

The event is a fundamental milestone for the project for two reasons:

  • The start of a new journey for Gerrit, without the legacy code of the old GUI based on Google Web Toolkit and without any relational database. Gerrit is now fully based on a Git repository and nothing else.
  • The definition of a clear community organization, with the foundation of a new Engineering Steering Committee and the role of Community Manager.

The new structure will drive the product forward for the years to come and will help define a clear roadmap to bring Gerrit back to the center of the Software Development Pipeline.

Evolution vs. revolution

When a product release increments the first major number, it typically introduces a series of massive breaking changes and, unfortunately, a period of instability. Gerrit, however, is NOT a typical OpenSource product, because since the beginning it has been based on rigorous Code Review, which brought stability and reliability from its initial inception back in 2008. Gerrit v3.0 was developed over the years following a rigorous backward-compatibility rule that has made Gerrit one of the most reliable and scalable Code Review systems on the planet.

For all the existing Gerrit v2.16 installations, v3.0 will feel much more like a minor upgrade and may not even require any downtime or interruption of the incoming read/write traffic, assuming that you have at least a high-availability setup. How is this possible? Magic? Basically, yes, it’s a “kind of magic” that made this happen, and it is all thanks to the new repository format for storing all the review meta-data: NoteDb.

Last but not least, all the features that Gerrit v3.0 brings to the table have been implemented iteratively over the last 6 years and released gradually from v2.13 onwards. Gerrit v3.0 is the “final step” of the implementation that fills the gaps left open in the past v2.16 release.

With regard to the statistics of the changes from v2.16 to v3.0, it is clear that the code-base has basically been stabilized and cleaned up, as you can see from the official GerritForge Code Analytics extracted from analytics.gerrithub.io.

  • 1.5k commits from 63 contributors worldwide
  • 62k lines added and 72k lines removed
  • Google, CollabNet, and GerritForge are the top 3 organizations that invested in developing this release

In a nutshell, the Gerrit code-base has shrunk by 10k lines of code compared to v2.16. So, instead of talking about what’s new in v3.0, we should instead describe what is inside the 72k lines removed.

Removal of the GWT UI

The GWT UI, also referred to as the “Old UI”, has been around since the inception of the project back in 2008.


Back in 2008, it seemed a good idea to build the Gerrit UI on top of GWT, a Web framework created by Google two years earlier and aimed at reusing the same Java language for both the backend and the Ajax front-end.

However, starting in 2012, things changed. The interest of the overall community in GWT decreased, as clearly shown by the StackOverflow trends.


In 2015, Andrew Bonventre from the Chromium Project (one of the major users of the Gerrit Code Review platform, alongside the Android Developers) presented the new prototype of the Gerrit Code Review UI, based on the Polymer project, with the code-name PolyGerrit, merged as change #72086.

commit ba698359647f565421880b0487d20df086e7f82a
Author: Andrew Bonventre <andybons@google.com>
Date: Wed Nov 4 11:14:54 2015 -0500

Add the skeleton of a new UI based on Polymer, PolyGerrit

This is the beginnings of an experimental new non-GWT web UI developed
using a modern JS web framework, http://www.polymer-project.org/. It
will coexist alongside the GWT UI until it is feature-complete.

The functionality of this change is light years from complete, with
a full laundry list of things that don't work. This change is simply
meant to get the starting work in and continue iteration afterward.

The contents of the polygerrit-ui directory started as the full tree of
https://github.com/andybons/polygerrit at 219f531, plus a few more
local changes since review started. In the future this directory will
be pruned, rearranged, and integrated with the Buck build.

Change-Id: Ifb6f5429e8031ee049225cdafa244ad1c21bf5b5

The PolyGerrit project introduced two major innovations:

  • Gerrit REST-API: for the first time, the interaction of the code-review process was formalized in a stable and well-documented REST-API that can be used as a “backend contract” for the design of the new GUI
  • The PolyGerrit front-end Team: for the first time, a specific, experienced team focused on user experience and UI workflow was dedicated to iteratively rethinking and redesigning all the components of the Gerrit Code Review interactions.

The GWT UI and PolyGerrit lived in the same “package” from v2.14 onwards for two years, with users left with the option to switch between the two. Then, in 2018, with v2.16, the PolyGerrit UI became the “default” interface and was thus renamed simply the “Gerrit” UI.

With Gerrit v3.0, the entire GWT code-base in Gerrit has been completely removed with the epic change by David Ostrovsky “Remove GWT UI“, which deleted 33k lines of code in one single commit.

The new Polymer-based UI of Gerrit Code Review is not very different from the one seen in Gerrit v2.16, but it includes more bug fixes and is 100% feature-complete, including the projects administration and ACLs configuration.


Removal of ReviewDb

Gerrit v3.0 does not have a DBMS anymore, not even for storing its schema version as happened in v2.16. This means that almost everything gets stored in the Git repositories.

The journey started back in October 2013, when Shawn Pearce gave Dave Borowitz the task of converting all the review meta-data managed by Gerrit into a new format inside the Git repository, called NoteDb.

After two years of design and implementation, Dave Borowitz presented NoteDb at the Gerrit User Summit 2015 and called Gerrit v3.0 the release that will be fully working without the need of any other external DBMS (see the full description of the talk at https://storage.googleapis.com/gerrit-talks/summit/2015/NoteDB.pdf).

Google started adopting NoteDb in parallel with ReviewDb on their own internal setup and, in June 2017, the old changes table was finally removed. However, there was more on the todo-list: at the Gerrit User Summit 2017, Dave Borowitz presented the final roadmap to make ReviewDb disappear from everyone’s Gerrit server.


In the initial plans, the first version with NoteDb fully working should have been v2.15. However, things went a bit differently, and a new minor release, v2.16, was needed in 2018 to make the format really stable and reliable.

Gerrit v2.16 is officially the last release that contains both code-bases and allows the migration from ReviewDb to NoteDb.

Dave Borowitz used the hashtag “RemoveReviewDb” to allow anyone to visualize the huge set of commits that removed 35k lines of code complexity from the Gerrit project.

Migrating to Gerrit v3.0, step-by-step

Gerrit v3.0 requires NoteDb as a prerequisite: if you are on v2.16 with NoteDb, the migration to v3.0 is straightforward and can be done with the following simple steps:

  1. Shutdown Gerrit
  2. Upgrade Gerrit war and plugins
  3. Run Gerrit init with the “batch” option
  4. Start Gerrit

If you are running Gerrit in a high-availability configuration, the above process can be executed on the two nodes individually, with a rolling restart and without interrupting the incoming traffic.
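A minimal sketch of the four steps above (paths and service commands are illustrative; adapt them to your installation):

# 1. shutdown, 2. upgrade war and plugins, 3. init in batch mode, 4. start
GERRIT_SITE=/opt/gerrit                     # example site path
systemctl stop gerrit
cp gerrit-3.0.0.war "$GERRIT_SITE/bin/gerrit.war"
java -jar "$GERRIT_SITE/bin/gerrit.war" init --batch -d "$GERRIT_SITE"
systemctl start gerrit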

If you are running an earlier version of Gerrit and you are still on ReviewDb, then you should upgrade in three steps:

  1. Migrate from your version v2.x (x < v2.16) to v2.16 staying on ReviewDb. Make sure to upgrade through all the intermediate versions. (Example: migrate from v2.13 to v2.14, then from v2.14 to v2.15 and finally from v2.15 to v2.16)
  2. Convert v2.16 from ReviewDb to NoteDb (see the sketch below)
  3. Migrate v2.16 to v3.0
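For step 2, Gerrit v2.16 ships an offline migration program; a sketch, with the site path as a placeholder and the server stopped:

# Offline ReviewDb -> NoteDb migration on a stopped v2.16 site.
java -jar gerrit.war migrate-to-note-db -d /opt/gerrit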

The leftover of a DBMS, stored in H2 files

Is Gerrit v3.0 completely running without any DBMS at all? Yes and no. There is some leftover data that isn’t necessarily associated with the Code Review meta-data and thus did not make sense to store in NoteDb.

  • Persistent storage for in-memory caches.
    Some of the Gerrit caches store their status on the filesystem as H2 tables, so that Gerrit can save a lot of CPU time after a restart by reusing the previous in-memory cache status.
  • Reviewed flag of changes.
    Represents the flag that enables the “bold” rendering of a change, storing the update status for every user. It is stored by default on the filesystem as an H2 table; however, it can alternatively be stored on a remote DBMS or potentially managed by a plugin.

New core plugins

Some of the plugins that were initially distributed only with the Native Packages and Docker versions are now an integral part of the WAR distribution as well:

  • delete-project
    which allows removing a project from Gerrit and the associated changes.
  • gitiles
    a lightweight code-browser created by Dave Borowitz based on JGit
  • plugin-manager
    the interface to discover, download and install Gerrit plugins
  • webhooks
    the HTTP-based remote trigger to schedule remote builds on CI systems or activate any other service from a Gerrit event

The above four plugins already existed before Gerrit v3.0, but they were not included in the gerrit.war.

Farewell to Dave Borowitz and the PolyGerrit Team

After having completed the feature parity between GWT and PolyGerrit, the original PolyGerrit Team members left the Gerrit Code Review project.

Their journey came to an end with the release of the new shiny Polymer-based Gerrit UI. The PolyGerrit Team contributed 45k lines of code in 5.3k commits over 4 years.

Then the last event unfolded during the release of Gerrit v3.0: Dave Borowitz announced that he was leaving the Gerrit Code Review project. I described the event as being like “Linus Torvalds announcing he was abandoning the Linux Kernel project”.

Dave Borowitz contributed 316k lines of code in 3.6k commits over 36 repositories in 8 years. He also helped the development of the new Gerrit Multi-Site plugin by donating its ZooKeeper-based implementation of a global ref-database.

On behalf of GerritForge and the Gerrit Code Review community, I would like to thank all the past contributors and maintainers who brought the PolyGerrit and NoteDb code-bases into Gerrit: Dave, Logan, Kasper, Becky, Viktar, Andrew and Wyatt.

Luca Milanesio – GerritForge
Gerrit Code Review Maintainer, Release Manager
and member of the Engineering Steering Committee

Security: do not use Git v2 in Gerrit 2.16

Git protocol v2 was released as an experimental feature in Gerrit v2.16. However, the introduction came with a quite serious unnoticed bug: all refs are visible to all users, regardless of the ACL configuration, giving any registered user complete access to all branches, tags, and meta-data refs, their associated commit SHA1s, and the ability to fetch them locally … ouch!

What is the Security impact?

If you were using Gerrit v2.16 for OpenSource projects, not much: everything is visible to everyone anyway, so what’s the point?

For not-so-OpenSource projects, well, it certainly cannot be used as-is in production. It was flagged as experimental, after all, because it was intended more for early adopters to “have a go with it” rather than for use at full scale in production on sensitive projects.

See the full details of the security advisory at:
https://groups.google.com/forum/#!topic/repo-discuss/z_QCw2QHbbc

How did we get there?

The problem is mainly located in the JGit implementation of the refs filtering for the Git protocol v2. Gerrit ACLs are enforced using JGit’s AdvertiseRefsHook, which calls a RefFilter whose Gerrit-specific implementation processes the access permissions associated with a user.

The AdvertiseRefsHook is usually set by UploadPack.setAdvertiseRefsHook but, if Gerrit has protocol v2 enabled in the gerrit.config and the client is leveraging the Git protocol v2 feature, the hook is not invoked. The bug has already been fixed in JGit but, of course, before enabling the support again in v2.16 we need to be sure that no other vulnerabilities are exposed.

What if I have Gerrit v2.16 in production?

On Gerrit v2.16 and v2.16.1, the Git protocol v2 was disabled anyway by default.

If you had it enabled in production by mistake (or bravery?), then just set it as disabled explicitly.

[receive]
  enableProtocolV2 = false

If you can, just upgrade to v2.16.2 whenever possible, where the Git protocol v2 is always disabled.

What’s next?

Git protocol v2 is coming back to Gerrit v2.16 very soon, possibly as early as next week or the one after, ready for Christmas 🙂

The fix is ready, but the Gerrit and JGit teams are working hard to put more specific security testing in place so that the new reintroduction can be safer to be rolled out.

Merry Christmas to everyone … and hope that Git Protocol v2 will be back with us very soon, just in time for the end of year celebrations.

Luca Milanesio – Gerrit Code Review Maintainer.

Shawn Pearce: a true leader

Shawn Pearce Memorial Fund

Shawn Pearce was the founder of JGit and Gerrit Code Review OpenSource projects. I am truly honored to have worked with him on the same project, even if we were sitting on the opposite sides of the Atlantic Ocean and working in offices with different company names.

Many times people talk about real leaders, but having known someone like Shawn for over eight years makes a big difference. We both started with the same ideas back in 2008, even before meeting each other. Shawn had the goal of making Git scalable for its adoption in the Android OpenSource project and started by rewriting the Rietveld project in pure Java, while I had the purpose of extending the use of Git version control to large Enterprises in the world, with the GitEnterprise.com service.

Embracing ideas

We first met face-to-face in 2011 at the GitTogether in Mountain View, CA, at the Googleplex. Shawn was the initiator of this famous “unconference.” It was an incubator of ideas and creative minds that spent time together to exchange and enrich each other’s experience with new revolutionary concepts.

I was new, and I did not know anyone on the project. My ideas were very different from whatever was discussed by the people in the room. Then I stood up and proposed to change the architecture of the project and make it fully pluggable, inspired by the experience and success of Jenkins CI.

Shawn did not have a clue who I was, where I was coming from, or what my experience or work history was: he just answered, “yeah, that makes sense. Would you like to come and help us?”. And so I did, and we worked together for a week; that is where my journey with Shawn all started.

Inspiring people

What I liked about Shawn was his unique style of leading the project. He never announced or presented himself with glamour at the conferences, but everyone knew who he was, and when he entered the room he always captured the interest and attention of everyone.

His passion for talking about the challenges he was facing with the project and for always pushing the boundaries of what could be achieved was contagious. He never told people what to do and still left everyone to make their own mistakes, to learn and improve.

Shawn was the seed that made the project grow far beyond its initial size and scope. He was able to take promising young people like Dave Borowitz, who had started just a year before as an intern at Google, and let him rewrite the entire backend to implement NoteDb.

Humanity first


Shawn was bright, a genius, but what inspired me most of all was his passion for his children and his lovely family. He used to have his son come over to our weekend Hackathons as well, to be close to his dad. He merged his family life with his daily work and passion. The best moment I remember was when we had a “hacking barbecue” at his new house in San Jose, facing the beautiful hills at sunset, with Noah playing in the garden and hugging his loving dad.

He always looked at combining the needs of his family with the presence and duties of the Gerrit project, which reminded everyone that people are the most critical part of it.

Dedication


The loss of Shawn was unexpected for a lot of people: he kept on posting code and feedback on the mailing lists until his very last days. He implemented brand-new support for OpenSSH in Gerrit back in November 2017 and kept in touch with the community and contributors until Christmas. His last commit was on the 3rd of January 2018: he remained committed until the very end of his life.

commit ea974d725ed1d5d5ffa1461bef7986168819dee3
Merge: e650e06 bd42cc7
Author: Shawn Pearce <sop@google.com>

Date:   Wed Jan 3 01:06:03 2018 +0000
    Merge "Prolog cookbook: attribute success to uploader, not author"

He never publicly disclosed his illness or how severe it was. He lived his life with the best of his ideas and passions till the very end, with his family, his project, and all the things he loved the most.

Shawn’s Legacy

When great leaders like Shawn leave, there is always the danger of a void. I believe it is not possible to replace Shawn, because he was a genuinely unique leader in his style and passion. What he has created, however, is a vibrant and healthy community that is out there and will continue with the values that he taught us through his professional and personal inspiring life.

What Shawn created is what he defined as a “healthy live project”, which will be able to continue beyond his life and our existence. See below the 10 years of the project in numbers.


Gratitude

We are all grateful for what Shawn has done for us and for all the future contributors and adopters of Git, JGit, and Gerrit Code Review.

I am a lucky person to have known and worked with such an inspiring leader. Thanks, Shawn.

 

Gerrit User Summit: Multimaster at Qualcomm

My name is Martin Fick, I work at Qualcomm as Gerrit Administrator, and I have been one of the key historical Gerrit maintainers since its very beginning.

I am going to talk about my experience in introducing a truly multi-master Gerrit setup at Qualcomm and share the findings and the experience we gained making that happen.

Let’s start with a question so that we can make this session very interactive from the start.

Q: Who would like to have multi-master in Gerrit Code Review? Why would you need it on your server? Failover? High-availability? Both?

A: All of that, but also latency. We have developers all around the world, and we would have at least two datacentres, one in the States and another one in Europe; we were looking for a multi-site multi-master setup to decrease latency between the sites.

A: (Luca Milanesio – GerritForge) Scalability and elasticity that allow growing and shrinking the instances based on the incoming traffic. Additionally, we would like to have zero downtime and rolling upgrades, without stopping the services when moving between minor versions of Gerrit.

Q: (Han-Wen Nienhuys – Google) Let me reverse the question: who is running multi-master and would like to have a single master?

A: (Dave Borowitz – Google) I would love it if we had a single server powerful enough to serve all our traffic with no latency, and if we had not had to deal with slow database backends and replication lags between sites.

Q: Does anybody need multi-master? How are you surviving today if you needed it?

A: Right now we maintain a second master server for DR purposes; it is a passive standby backup server. If our master went down, we would have to fail over to it.

A: We will need multi-master soon. Currently, we are experimenting with the HA plugin as a fallback solution, but we are working towards achieving multi-master so that we can scale up, which is our primary concern here.

Gerrit Multi-Master requirements

There are needs that I hear from the audience: high availability and scalability. They are not necessarily the same requirements, and I believe that multi-master would address some of them. What is important is that everyone focuses on their own needs and thinks about how they can approach those first, instead of going for the “big everything” solution.

We have high availability as an issue somewhere, so we want better availability. We have some scalability issues as well. We do have some site issues at times, but we generally deal with those through slaves all over the world, between 50 and 100, and we primarily have one data-center in San Diego, USA. If that goes down, nobody is getting anything done even if Gerrit is up, which makes failing over to other sites not particularly useful right now.
Our current focus is on better availability and better scalability at one site.

Scalability problems at Qualcomm

We have a pretty hefty load: we have around 6k projects, but our load is enormous on a few large projects. They are not massive projects, but they are significant from the standpoint of having many changes on them, and there are a lot of users using those projects. Mainly, a few kernel projects take most of the use: even if there are many other little projects, most of the users are pushing to those few projects.


We are a single-tenant server, so we are primarily dealing with that. Other people need other Gerrit servers, and they manage their own, but at the moment we don’t have to deal with those. However, if we had a more scalable system, then I believe multi-tenancy would become the next problem on the horizon. If I had a great scalable system, why would I create these instances here and there? I would instead put all of them on the central system so that I can manage one system. But then I would have the problem that Gerrit doesn’t currently handle multi-tenancy well. Maybe it is because we don’t have a multi-master system that we are not ready to put everything on one server anyway. Once we have a multi-master solution, multi-tenancy is something that will become more interesting.

On one server we have around 6k projects, the main project has about half a million refs, and we have in total 2.3M changes on the entire server.

I like to break down the scaling topic into two different ideas: horizontal scaling and vertical scaling. Horizontal is where you have lots of projects and users; vertical is where the load is concentrated on a few projects. In our case, we need to scale vertically more.

Stairway to Gerrit Multi-Master

Here is what we decided to do, and we decided to build it incrementally. These are the four phases in which we have approached it.


Step #1: Active/Passive

We have an active/passive standby, and we started doing regular failovers. Let’s face it: if you have a failover machine that you have never failed over to, it would probably not work.
We have so much software around the Gerrit ecosystem: we have a lot of scripts, cronjobs, and things that run on our server, and they use a lot of software; there are a lot of Python hooks and stuff like that.
All that software gets updated. If the failover machine is not updated through the same process and tested, there could be missing dependencies, wrong package versions, etc. Maybe some hacks or links on the filesystem are pointing over here and not over there, or maybe filesystem positions are different, and things like that.

Before even dreaming about multi-master, you need a way to keep your server systems consistent. If you don’t think that your organization can do that and make two systems precisely the same, you are not going to be anywhere near ready to start doing multi-master. If you are a large organization with a lot of history, it’s a more significant challenge. If you are starting from scratch, it is a lot easier: think about it from the start. How can I focus on repeatability and deploy the same stack of software to multiple servers? Then concentrate on failing over.

Sometime this year, around February/March, we started failing over, and now we are trying to fail over weekly. We have been doing that to get the Team used to it and to ensure that our processes work.

Step #2: Active/Hot Standby

Then we moved to hot standby. The difference here is that the passive node previously had all the software in place but was not necessarily running. So now we just started the other server, but nobody was pointing to it. The hot standby gives you the opportunity to test the software you have around your Gerrit master. If you have hooks or cronjobs, try to run them on the failover server, even if it is not the active master. Do you have the coordination needed? Do you have locks in place? If you are doing repository repacking, can you do it on both servers at the same time? Do they conflict with each other?

The assumption is that you have a shared repository on the backend, such as NFS, and thus activities like repacking need that coordination. And you need that stuff to work before you can go to multi-master.

Step #3: Active/Active with Round-Robin DNS (low tech)

The third step, which is where we are at, is active/active. We chose a simple, low-tech, and easy-to-back-out solution: round-robin DNS. It sucks from a load-balancing standpoint and for failover recovery, but it gets traffic going to two servers. We have been running that for quite some time now: 15 hours!

Step #4: Active/Active with Load Balancer(s)

And the next step is, of course, to have a load balancer. We have a load-balancer setup that we use for slaves. It is much harder to back out, because you have to set up IPs and have DNS in place pointing to them, but we plan to transition to that.

What Gerrit data to share in a multi-master setup?

In a single-site Gerrit multi-master setup, you necessarily need to have shared Git repositories; you don’t have to share your H2 caches, because each node needs its own; and the ReviewDb needs to be shared until it goes away with Gerrit v2.15.


Sharing sessions

There is other stuff you need to share, like web sessions.
When you log in with your web browser and you hit one machine, your browser keeps track that you are logged in with a cookie, while the server keeps track of it in a shared persisted session. If your next hit goes to the other server, you don’t want that user to be kicked out, so you need to share the server session data across nodes.

In 2014 we developed the websession-flatfile plugin, and that helps take care of sharing web sessions; I know that a lot of people have already used it for HA solutions. It is straightforward: it just stores the sessions on the filesystem, and if you are keeping your Git repositories on NFS, just put the sessions there also and share them.

Sharing events

The other thing that you need to share is the events data. Our events need persistence on the server side, and we share them on the filesystem too. That allows sharing them across masters as well. If you connect to one of our masters through SSH, you get events from both of the masters. It’s a poor man’s sharing: it is a simple set of files that is there and is shared. One thing that we realized using multi-master is that your events are possibly going to be disrupted much more. Even if you think you have high availability, the client can only connect to one server. So it doesn’t matter if the cluster has even ten servers; the client is still attached to just one of them.

Generally speaking, most of us want to do rolling upgrades, and that brings one server down, which is one of the reasons we want to go to multi-master. If you are doing that frequently, you are disrupting your SSH clients, and even more so the ones that are permanently connected, such as the stream-events listeners.

I have contributed a new stream events plugin, which I merged upstream during the last hackathon, so you can download it if you want to look at it; it helps you store the events. You get the added benefit of some flags to put IDs on your events and then replay from an ID. Since the events are stored on the filesystem, when you connect you can tell it the last ID you received, and you start receiving all the events after that. It does it through the same interface as the old stream events: you drop it in, and it works. It respects Gerrit permissions, just like the core plugin does, so that the user who is connecting will see only the events of the projects and branches they have access to.

Coordinating other processes on the Gerrit nodes

As processes on our servers we have the Gerrit JVM, but we also have a lot of hooks, some of which run for a very long time. You need to get those to run on both servers. We have a few cronjobs that do repacking, others that check whether users are active or inactive on LDAP and update their status on Gerrit, and that verify their e-mails to make sure they are really who they say they are. And lastly we have various maintenance cronjobs.

We focussed on the background stuff first, because you need to get that done anyway and you can tackle it piecemeal, so we tried to coordinate those first.

We went for the low-tech solution, sharing as much as possible via the filesystem.
Years ago I created a lock implementation based on the filesystem. It is similar to echoing a PID to a file, but it is a lot more robust and can recover from deadlocks, as long as it can contact the other server using SSH, and it does all of that automatically.

We use that as the primary method to coordinate things, and on top of it I have built a scripting queue executor for hooks. Years ago, before multi-master, we had issues with hooks that were running for too long. If you remember how Gerrit manages hooks, it runs them from the Java process with a queue that executes only one at a time. If your hooks take, for instance, three minutes, you may build up a backlog on your server, and if your node goes down, you lose that backlog. What we decided years ago is to have the Java process just write the hook invocation to a file, and then a scheduler runs them in the background. The queue in Gerrit is then always at zero: it takes less than a second to write the file, and then it is done.

We made that multi-master friendly, and because that queue uses the locks, it just worked when we ran it on more than one master. One server can write the hooks to a file, and the other can be the one that runs them, and thus the load gets distributed that way.
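A toy sketch of the idea, with hypothetical paths and hypothetical acquire_lock/release_lock helpers standing in for the SSH-based lock described above:

# Enqueue side, installed as the Gerrit hook: record the invocation and
# return immediately, so Gerrit's internal hook queue stays at zero.
QUEUE=/shared/gerrit/hook-queue
printf '%s\n' "$0 $*" > "$QUEUE/$(date +%s%N).$(hostname).$$"

# Runner side, scheduled on every master: whoever grabs the lock executes
# the queued hook, so the load spreads across the nodes.
for job in "$QUEUE"/*; do
  acquire_lock "$job" || continue   # hypothetical cluster-wide lock helper
  sh "$job" && rm -f "$job"
  release_lock "$job"               # hypothetical unlock helper
done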

Gerrit cluster events

If you think about it, there are a lot of things hooked into startup, so I put in place a little framework that checks whether the other nodes are running by ssh-ing into them. That allows identifying that if both servers are down and one comes up, then not only is your node starting, your cluster is starting too; if the other one then comes up, you are starting the node but not the cluster. Likewise, if one node goes down you have stopped only the node, and when the other goes down as well you have stopped the cluster.

I have created a little hooks framework so that you can plug into those events and do things based on them. For example, at the startup of any node we start the hooks, just to make sure that they are running; but when you stop a node, we disable them only when the whole cluster goes down. If only one node goes down, the hooks need to keep running, because they can still contact the other server: they go through the domain name and not the local server.
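A minimal sketch of that check, with a hypothetical peer host name and a hypothetical run_hooks dispatcher:

# At node startup: if the peer answers over SSH, only this node is starting;
# if it does not, the whole cluster is starting.
if ssh -o ConnectTimeout=5 peer-gerrit.example.com true 2>/dev/null; then
  run_hooks node-startup      # hypothetical dispatcher into the framework
else
  run_hooks cluster-startup   # e.g. re-enable cronjobs and hooks here
fi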

We plan to use the hooks framework for replication too. If you are familiar with the replication plugin, currently at startup it tries to replicate all the projects to all the slaves, to see whether they are up to date. The theory behind it is that data could have been modified behind Gerrit's back, possibly because when the server was shut down it was in the middle of doing something and never finished. In a cluster situation, if at any time a node goes down, it could have been in the middle of replicating, and the other node won't know. So there is no need to do that full replication when any single node starts, but only when the cluster starts.
If there is at least one node up when the other comes up, you don't want to do global replication. You could replicate anyway, but it would just be extra load.

Caching across the cluster nodes

You can do some coordination better than via the simple filesystem. For instance, the high-availability plugin uses one-to-one connections: you configure the IP of one master to connect to the other, and they talk to each other. That works for two nodes, but it is not a super-scalable system; it is an N-squared problem. Eventually, you want to move to a pub/sub system for things like that.

Something I haven't mentioned yet, coordination-wise, is caching. The high-availability plugin evicts caches, but we don't do that yet; we consider it an optimization. The primary remedy is just to reduce the lifetime of the caches, mainly decreasing it to the values that we had on our slaves. You probably already have lower cache times on your slaves, to make sure that you are checking your projects' ACLs and your group memberships a little more often than on your master. A Gerrit master knows when things get changed without the need to evict any cache, but the slaves do not. We made the masters a little bit dumber, like the slaves: they know when they changed something themselves, but they don't know when the other masters did. As long as you are fine with the delay, let's say 5 minutes, it means that when you change an ACL it may take up to 5 minutes to be seen by the other masters.
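For instance (cache names and window illustrative), the same lowering can be expressed per cache in gerrit.config:

# Sketch: cap the cache entry age so a change made on one master becomes
# visible on the others within the window you can tolerate.
git config -f "$SITE/etc/gerrit.config" cache.projects.maxAge "5 minutes"
git config -f "$SITE/etc/gerrit.config" cache.groups.maxAge "5 minutes"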

Demo: make your laptop a multi-master Gerrit server

Has anybody tried to run two Gerrit instances on the same laptop?
The first problem that you will see is that if you point them to the same H2 caches, they will just not start. But if you run init on two different directories, you don't share any data between the two masters; to start sharing something, you need to go and modify your gerrit.config.
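A quick sketch of that starting point (site directories hypothetical, ports as used later in this demo):

# Initialise two independent sites from the same gerrit.war, then adjust
# the second site's ports so both instances can run on one laptop.
java -jar gerrit.war init --batch -d ~/gerrit-master-1   # SSH 29418, HTTP 8080
java -jar gerrit.war init --batch -d ~/gerrit-master-2   # set SSH 29419, HTTP 8081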

Out of the box sharing abilities

You need to reconfigure the database to be the same on both, and you should use PostgreSQL or MySQL, because the default H2 won't allow any sharing. I am running 2.14 here, and I did not share the indexes, so they are going to be out of date for this demo. The users' sessions are not shared either; you need to configure that manually.
If you use the websession-flatfile plugin, it will put the sessions on the filesystem, and then you can share them over the filesystem.
Lastly, the stream events won't be shared, but most of the rest will just work.

I am about to run stream-events on both masters: on the left screen I am connecting to port 29418 and on the right screen to port 29419.
Now if I go to the Web UI, on port 8080 I have the same master as the one on the left, and on port 8081 the one on the right.
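The two listeners are plain stream-events connections, one per master (user name illustrative):

ssh -p 29418 admin@localhost gerrit stream-events   # left screen, master one
ssh -p 29419 admin@localhost gerrit stream-events   # right screen, master two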

I am going to create a project using the master on the left: I hit create, and you can see that the events appear in the stream on 29418 but not on 29419.

Now let's go to port 8081, where I have to log in again because the web sessions are not shared yet. I am showing you what it does out of the box. This time the events came up on the right screen but not on the left one.

The two masters are just pointing to the same Git data: I am trying to make them multi-master, but so far it is not working very well.

Adding plugins to share events and sessions

So, let's step in and start deploying some new plugins. What we did is create a new plugin called “events”, so that when you want the distributed events you just call “events stream”, where “events” is the name of the plugin and “stream” is the command.

What we did in core is remap the actual “gerrit stream-events” to “gerrit stream-events-core”, so that you can still get to it if you want, which helps for debugging. Then we pointed “gerrit stream-events” to “events stream”, so that users automatically get the new plugin's events: to them the change is invisible.

I am now on 8081 in the browser; I create a new project called 'Luca' and here we go: events are generated on both the 29418 and 29419 ports. You may have noticed that they showed up on the right first and then on the left.


The events are on the filesystem, so the servers just have to poll; here I am polling every second. If something happens on this server, it will automatically catch up with all the events generated on the other server as well.

The polling is a backup: the events should come up even without a polling mechanism. Even if we eventually had a pub/sub mechanism, I would still suggest keeping the polling. Events are inherently unreliable and will get lost: you need a polling mechanism in place if you want true reliability. Polling is not great, but it is reliable; events are effectively an optimisation for speed.

Let's go back now to the other server on port 8080: I did not have to log in, because the sessions are now shared and kept across masters. I create a 'hugo' project, and it comes up pretty quickly in the stream events on both masters.

So we have now shared web sessions and shared events. That's on 2.14; to make it a true multi-master you would need to share your indexes as well, but as we are still on Gerrit v2.7 at Qualcomm, we don't have an index and don't have to share it.

More exciting features in the events plugin

We've been using the events plugin in a single-master scenario since early this year, and it has been pretty reliable. I can show some of the other exciting features in there. There are some new options here: IDs and resume-after. If I give --ids and --resume-after 0, it returns all the events since I started and initialized the instance. Even if the server started and stopped a bunch of times, I still have all those events recorded. The event IDs have two parts: the left part is a UUID identifying the event store, and the right part is a number. The idea is that the UUID is unique to your file store: if someone deleted the store on disk, a new one would be created with a new UUID. Suppose you had reached event 1M, someone removed the file store, and you then ask for everything after event 1M: the new event store restarts at zero, and you would get nothing. With the two-part IDs, when you ask “give me all the events after ID 1M” the UUID has changed, and the plugin realizes you need everything. That gives some little extra safety. The plugin works on Gerrit v2.14, and I made some changes to make it work on v2.15 as well.
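A usage sketch of those options (user, host and ID format illustrative; check the plugin documentation for the exact syntax on your version):

# Replay everything recorded since the event store was initialised:
ssh -p 29418 admin@localhost events stream --ids --resume-after 0
# Resume from the last ID this client saw, a UUID plus a sequence number:
ssh -p 29418 admin@localhost events stream --ids --resume-after '<uuid>:41999'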


Q: If you use Hugo’s high-availability plugin for the index, you basically can probably do multi-master right now. Why then are you guys not running multi-master? Are you guys concerned about NFS and ref-updates?

A: (Hugo Ares – Gerrit Maintainer) We use it in failover mode for the simple reason that with just a little bit of load you are safe and consistency is kept, but if you push it a bit more, sharing Git repositories between different machines in write mode doesn't work. If you have two computers writing to precisely the same Git repository, the same branch, and the same file at the same time, you are going to lose history; we know, because it happened to us. That's why we don't do multi-master right now.

We have been writing to the same repositories from different machines over NFS for several years, and we haven't found any issues, because we do the repacking everywhere. Perhaps we have different NFS settings or just a different implementation.

A: It did not happen that often, but at least a couple of times, and that's why we don't do it. We mainly use the high-availability plugin for evicting the caches, so that we don't need to lower the cache settings: entries are evicted automatically. It also takes care of the events in a slightly different way and triggers the reindex on the other copies. We can keep nodes down during the day and people would not notice any difference at all, no big deal. We are just a little bit reluctant to write from both nodes, but that's the plan.

So you are already on a “multi-master ready” setup. We hear your reports of the JGit problems, and we will keep an eye on them. I am confident we can fix any issues that may show up there, understanding how Git works on NFS: there are some tricks that we can pull from our code base.

The (in)famous NFS stale file handle bug

I know we fixed a bug on JGit in the past to handle NFS: when you delete a file, anyone who tries to access it from a different node gets a stale file handle error. That's the main problem you are running into; we encountered and fixed some of those issues in JGit, and there are possibly a few more left here and there. The workaround to the stale file handle problem is just to make a copy of, or a hard link to, the file, keep it around for a bit longer, and remove it later; then you are fine. Generally speaking, it is just the brief moment after the operation that is a problem. It is what would happen on a local filesystem anyway: if you delete a file that is still opened by other processes, the inode stays there for a while until it gets unreferenced. That is the main issue we have been running into, and most of the other problems have already been fixed in JGit.
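A small illustration of that workaround (file names hypothetical):

# Instead of deleting a packfile that other nodes may still have open, keep
# a hard link alive for a grace period so NFS readers do not hit a stale
# file handle, and remove the link later.
ln pack-abc123.pack .pack-abc123.pack.old
rm pack-abc123.pack
# ... after a grace period, when no reader can still be using it:
rm .pack-abc123.pack.old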

Distributed GC on a multi-master setup

We have a distributed Git GC and repacking relying on the SSH filesystem lock mechanism that I talked about before. We basically keep a list of repos to be GCed on the filesystem. One node says: “I am going to do this one,” and another node says: “Oh good, then I am going to do this other one,” and in this way the GC load gets fairly well distributed; it takes only a few hours to get through the whole thing. The lock prevents stepping on each other's toes when repacking. As far as backup is concerned, we do an rsync and take database dumps, so that we don't get inconsistent snapshots. You need to take your database backup first and the repositories next: if you restore an older version of the database backup, you are generally safe, because you don't risk having references to nonexistent objects.
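A sketch of that ordering (commands and paths hypothetical, assuming a PostgreSQL ReviewDb):

# Dump the database first, then copy the repositories: restoring an older
# database against newer Git data cannot reference objects that are missing.
pg_dump reviewdb > /backup/reviewdb-$(date +%F).sql
rsync -a /shared/gerrit/git/ /backup/git/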

Questions

Q: I may have missed one point: the shared filesystem approach assumes you are not doing multi-site, right? If you were doing multi-site, it would introduce more problems and add more latency, which was the problem you were trying to solve.

A: Yes, that's correct. We are not interested in multi-site, because there are a lot of downsides to it: it is a lot more complicated and makes your writes a lot slower, while the reads would be faster. When you push to gerrit-review.googlesource.com you realize that it is not the best experience: you need to reach a quorum of nodes around the world and get the agreement of the majority of them to finalize a git push operation.
Single-site is the first initial step for us; multi-site requires a very different approach. You need the Git objects to be replicated, and then somewhere to store the refs that have to be shared, without a system like Google's database that assures the refs are replicated around the world.
It is not impossible; we made some experiments with it. A few years ago Dave Borowitz uploaded a ZooKeeper implementation that does the refs sharing, but it only does the refs; you then need to do something for the Git objects replication. You could use the regular Git replication for that, but you need to come up with a bit of a magical ref scheme that does the job behind the scenes.

A: (Luca Milanesio – GerritForge) Last year at the Gerrit Hackathon in Mountain View, we presented a new implementation of the “missing bit” you were talking about. It is an open-source project based on JGit that leverages Apache Cassandra for the Git objects, while using the ZooKeeper implementation you mentioned for the refs. We are making a lot of progress on the project, and it may soon become THE solution for Gerrit multi-site.

Q: (Luca Milanesio – GerritForge) We recently found and fixed a cache consistency problem in JGit: it was reported on NFS but had nothing to do with it. When JGit was not able to open a file for whatever reason (NFS latency, maximum open files, whatever), the packfile was removed from the list of packs because it was considered missing. Then JGit started failing to fetch objects from the repository, and people panicked thinking that their Git repository was corrupted. But then, why did the other nodes accessing the same repository not have the same problem? And why did the problem “magically” disappear when you restarted Gerrit? The problem has been fixed in recent versions of JGit, but it is still there in the version you are currently using as the basis of Gerrit v2.7. Did you find the same problem? How do you manage to keep up with the recent releases of JGit while staying on Gerrit v2.7?

A: We have an old version of JGit, but we have put our own patches on it. I believe we discovered and fixed a bug like that years ago. We ported a series of patches upstream, and we have a set of performance improvements on our branch. However, it is sometimes hard to get stuff merged into JGit; there are not enough maintainers to even look at the changes.

Gerrit User Summit: Script plugins with Docker

My name is Luca Milanesio, and I work for GerritForge. My talk today is about plugins and how to create them using scripting languages.

Gerrit plugins, where it all began

My contribution to the Gerrit Code Review project started in 2011 with the introduction of plugins. To understand where we are coming from, we need to go back to those times, when the project had been born just a year earlier. Gerrit was mighty from its very beginning, and the different companies that used and contributed to the tool had tailored the code base to their specific needs. When I joined the GitTogether conference in 2011, almost every user was talking about their fork of Gerrit. Forking is excellent, especially in open source, because you can customise a project as much as you want; we were all excited about the growing popularity of GitHub, and forking was a popular concept. However, keeping a fork up to date is not as easy as you may initially envision: moving on with the upstream releases is hard when you are working on a fork.

Back in 2011 when I was at the conference, I thought: “how can the Gerrit project evolve and grow if we are all working on forks?”. My way to convince the Gerrit Community to change that status quo was inviting Kohsuke Kawaguchi, the Jenkins CI project founder, to the summit. Jenkins CI is wholly based on plugins while the core does not do much: the plugins are making the whole thing work as a CI.

That was enough to convince the community that a change was needed and, during the next Hackathon in 2012, I wrote the initial version of the Gerrit plugin loader and the first “Hello world” Gerrit plugin was born.

The introduction of scripting languages

After two years, Gerrit had only 50 plugins. If you had looked at Jenkins, at that time they had over 600 plugins, ten times as many plugins compared to Gerrit. Writing a new plugin for Gerrit was still too hard for most developers and administrators.

To develop a new Gerrit plugin, you needed to know way too many things and have many skills: a different build system (Buck, and now Bazel), a full development environment, and all the required dependent packages.


We still had new plugins, because some people went through the initial pain of setting up the environment. However, for a project to thrive, you need to get people together and embrace a diversity of skills, allowing people to give the best of their knowledge.
Maybe the typical Gerrit admin is not a Java developer; possibly they are more familiar with Groovy, whose Ruby-like syntax is used in a lot of DevOps tools. Others are more familiar with Python; and if you accept what they can contribute, the project can benefit from many more experiences, from different people and backgrounds.

What does the community think about it?

Once I shared my ideas with the community, the feedback was great. However, different people with different backgrounds started asking to use very different languages, ranging from Scala to Groovy and Python. I then realized that supporting one scripting language would not have been good enough for most people.

“Hello world” in Groovy

To give you an idea of how easy it is to write a new plugin in Groovy, see the following example.

import com.google.gerrit.sshd.*
import com.google.gerrit.extensions.annotations.*

@Export("groovy")
class GroovyCommand extends SshCommand {
  public void run() { stdout.println "Hi from Groovy" } 
}

It is straightforward to write scripting plugins: put the above content in a hello-1.0.groovy file in Gerrit's /plugins directory, and as soon as the file is saved the plugin is there and will be loaded into Gerrit within a few seconds.

The way Gerrit recognizes this file as a plugin is through its .groovy extension. The file name denotes both the plugin name and its version, delimited by the hyphen. In this example, the file hello-1.0.groovy identifies a plugin called 'hello' with version '1.0'.
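Once loaded, the exported command becomes available over SSH under the plugin name (user and host illustrative):

# 'hello' comes from the file name, 'groovy' from the @Export annotation.
ssh -p 29418 admin@localhost hello groovy
# Hi from Groovy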

One warning about Groovy: it is a language that relies on Java reflection for method invocation. Reflection is a capability of the Java runtime that enables method discovery, which is handy to use but slower than native Java calls.
The drawback of the Groovy language's ease of use is the extra CPU cycles at runtime.

The beauty of using a scripting language for plugins is the speedup of the development cycle: as soon as you edit the Groovy file on the filesystem, the old plugin is unloaded and the new one loaded into Gerrit. The plugin development lifecycle becomes so much faster compared to traditional Java application development.

Develop Scripting plugins using Docker


Gerrit is provided as a Docker image on DockerHub. The 'gerritcodereview' organization has an image called 'gerrit', with all the versions available as tags since Ver. 2.14. Earlier versions of the Gerrit Docker images are available in the 'gerritforge' DockerHub organization.

In the following example I am running Gerrit 2.14.4 on Docker fetching the image directly from DockerHub:

docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit:2.14.4

In the above example, Gerrit is exposed through HTTP on port 8080 and exposes its SSH interface at port 29418.

Docker is a system for running containers, which are applications “packaged” with everything they need, including the components and libraries of the underlying operating system. The only requirement on your physical host is the Docker engine, which nowadays exists for macOS and Windows as well as Linux, where it was originally designed. Whatever operating system you are running on your laptop, Docker is there.

Docker can be handy for all the contributors who are not familiar with the Gerrit development environment. There is no need to know or install anything on the local box, other than running the Gerrit Docker container. When I run Gerrit this way, as in the example, it starts straight away, with zero installation or configuration steps.

Gerrit out-of-the-box experience

The second significant value of the Gerrit Docker container is that it includes an out-of-the-box configuration, a welcome screen, and the plugin manager. It already contains a set of components that, if you are not familiar with Gerrit, will help you a lot in understanding what Gerrit is and how to use it.

As you can see from this screen, Gerrit has started, and if you navigate to http://localhost:8080, it shows you an initial welcome screen.


Historically, the very first screen once you had installed Gerrit was a blank screen. I remember, a few years ago, people coming to me saying that as new Gerrit users they were quite confused: they just did not know what to do with the initial blank screen. In Gerrit Docker, the initial screen is a “Welcome”, which is a beautiful thing to say to people you did not know who came to your house. Additionally, it provides some useful links and information for installing plugins, which is very important because Gerrit without plugins is missing some fundamental parts of its functionality.

Playing with Gerrit Plugin Manager

By clicking the “Install plugins” button, you reach the Gerrit Plugin Manager screen. For those who are familiar with Jenkins, it provides precisely the same functionality as the Jenkins plugin manager. If you type 'groovy' in the search bar, you can easily find the Groovy scripting provider and install it with a simple click. That is the plugin you need to tell Gerrit that, from now on, every file in the /plugins directory with a .groovy extension is a plugin that needs to be parsed and loaded at runtime.


You can discover and install other plugins as well. For instance, typing ‘github’ would list the integration of Gerrit with GitHub authentication and pull requests, or typing ‘jira’ would return the association and workflow integration with Jira Tickets.
The plugin manager is a fantastic discovery mechanism for understanding what integrations are available for Gerrit Code Review.

The plugin manager automatically discovers the versions of the plugins that are compatible with the Gerrit you are currently running and, when you click ‘Install’, it downloads them and installs them locally. When you are done, just click on the top-right link “Go To Gerrit” and you are straight into Gerrit UX.

Now we have a running Gerrit instance with all the plugins I need installed, including the support for Groovy plugins.

Writing plugins in Scala

If you want to leverage the Gerrit scripting plugins but need optimal performance at runtime, you can use a different scripting language, such as Scala.


The Scala language compiles into native Java bytecode; it does not use reflection for method calls and, for some operations, can be even faster than Java itself. See the same hello-world example rewritten in Scala.

import com.google.gerrit.sshd._
import com.google.gerrit.extensions.annotations._

@Export("scala")
class ScalaCommand extends SshCommand {
  override def run = stdout println "Hi from Scala" 
}

When I showed this to the community, people got so excited and started writing tons of scripting plugins.

What scripting plugins do in Gerrit?

Admin tasks as SSH commands

Sometimes Gerrit admins need to automate specific tasks; however, coding an external script can be slow and difficult to implement. Inside Gerrit there are already a lot of objects representing pre-processed, in-memory entities ready to be used. It makes sense to leverage all the information that is already in memory and write new SSH commands as scripting plugins to control admin tasks remotely.

Scripted REST API

At times you may also need to tailor the existing Gerrit REST API to your needs. For instance, imagine that your company has specific policies for requesting new repositories: why not create a new 'Create Project' REST API tailored to your needs using scripting plugins, and expose it through a company HTML form? You can do it with a simple Groovy script, without needing to be an experienced Java or Gerrit contributor.

Low-footprint hooks events

A third option is fascinating because, before the introduction of Gerrit plugins, the only way to react to Gerrit events was through hooks or stream events. Hooks are a traditional Git mechanism and, in Gerrit, have a scalability problem: they are invoked for every project and every event that happens anywhere, and each invocation spawns a separate asynchronous process. Over time, the extra processes created can cause significant overhead on your super-busy Gerrit server.
When a hook script needs to read from the Git repository, it has to process the packfiles from the local filesystem from scratch, uncompressing and parsing them in memory over and over again, which can slow down your server significantly.
If you implement Gerrit event handling with plugins instead, the same processing can be ten or even a hundred times faster.

Gerrit v2.14 brings new features and UX

A brand new version of Gerrit is out; the modest increment of the minor version number to 14 hides a set of unique innovations that this release provides.

Gerrit Ver. 2.14 is most likely the last 2.x version before the introduction of Gerrit 3.0, which will change forever the way we look at and interact with code reviews. That means that, even though 3.0 isn't ready yet, some experimental features have already been introduced in Gerrit 2.14. Those are tagged with the [exp] prefix in this article, but don't be scared by the wording: all Gerrit features, including the experimental ones, are heavily used on a daily basis by large installations like Google's and GerritHub.io.

Java 8

For the first time, Java 8 is a mandatory requirement to run Gerrit. It was previously the strongly recommended option, but both Java 7 and 8 were equally supported. The switch to Java 8 brings incompatibilities with all the operating systems that do not support its latest version and updates, such as Ubuntu 15.x or CentOS 5.x, to name a few.

PolyGerrit and review by e-mail [exp]

Gerrit includes a richer user experience with two major improvements: a newly redesigned HTML5 WebComponents UX (code-named PolyGerrit) and fully featured bi-directional HTML e-mails. Interacting with Gerrit is becoming easier and more intuitive.

With PolyGerrit, the change diffs are included in the main screen, and viewing them is as simple as expanding a div section. Page loading is much faster thanks to browser caching of the core building blocks of the UX. Even though the UX isn't complete yet, a lot of Google's teams use it already on a daily basis, including the Chromium and Go-Lang projects.

The redesigned and richer HTML e-mails are now bidirectional and include all the information you need to perform an off-line review from your e-mail client. If you are on the move, just reply to the e-mail with your comments, and Gerrit will pick them up and include them in the change review as messages. Amazing, isn't it?

ElasticSearch [exp]

It is now possible to use an alternative indexing engine, ElasticSearch, which allows a clustered setup of distributed index nodes. That is a major stepping stone towards the full implementation of Gerrit multi-master, giving multiple Gerrit masters the possibility of sharing the index data with replication over the network.

Out of the box UX and Plugin Manager [exp]

Installing Gerrit with the associated plugins is so much easier: there is no need to clone the code or to google around for a compatible plugin build; everything is included in Gerrit with an intuitive and user-friendly experience. Just use the search box to find the plugins compatible with Gerrit v2.14 and install them with a single click.

This new feature is provided by the new native packages (RPMs, Debian and Docker), which benefit from two new plugins (out-of-the-box and plugin-manager) that are included by default and executed as the first action after a fresh installation.

What’s next?

A lot more is coming, as NoteDb support becomes more mature every day. Google has announced that it has switched off ReviewDb in production and is using NoteDb as the “unique source of truth” for all its projects. Gerrit 3.0, with 100% NoteDb support, is coming very soon and will change the way you think about and interact with your code review forever.

Stay tuned; more innovation is coming! – Luca Milanesio – GerritForge