Gerrit 3.0 plan announced: we need stabilisation now
Gerrit 3.0 plan and its NoteDB reviews have been officially announced at the Gerrit User Summit 2015. It is already available as an experimental feature in the current Gerrit master but it needs much more stability in order to be officially supported for production.
GerritForge decided to help and reuse its existing continuous integration system to validate every Gerrit patch set against the current and the new NoteDB review persistence back-end in order to avoid regressions during the 2.13 and 3.0 development
Pre-commit validation by GerritForge CI
If you have posted a patch to gerrit-review.googlesource.com in November, you may have hopefully received a Verified+1 from a strange user with a Diffy logo on the side.
GerritForge’s provided CI on gerrit-ci.gerritforge.com fetches automatically every patch-set pushed to gerrit-review.googlesource.com and triggers a slightly modified Gerrit build with the purpose of checking whether the code change introduces a regression or not. This may seem at first sight a quite normal Gerrit to Jenkins job integration, however implementing it on top of Google’s multi-master replicated installation was not a piece of cake.
Gerrit Trigger plugin limitations on multi-master setups
Jenkins has already an out-of-the-box integration with Gerrit provided by the Gerrit Trigger plugin maintained by Robert Sandell – Cloudbees. It leverages the Gerrit stream events through an SSH channel and make use of Gerrit REST API to action them according to the build result.
The Google’s Gerrit setup, however, is not a trivial one-node installation and is further limited by the security constraints of the Google infrastructure, which does not allow any incoming SSH connectivity.
Additionally all concept of “getting the events in a stream” isn’t going to work when events can come concurrently from multiple places at the same time: who is going to define the “global ordering” and how to put all those events in a single TCP/IP Socket? Even UDP would not work in this case because SSH channel requires confidentiality between two and only two peers.
Alternatives to SSH
During the hackathon, other approaches have been discussed by Shawn Pearce, including the use of HTTP WebSockets (or Cometd) for fetching events without the need of an SSH connection. Events are still distributed and generated by multiple masters all the time, and the Jenkins plugin would then have the onus of contacting all the Gerrit servers and keep a connection opened to all of them. This is clearly not going to work because the number of servers, their IPs and locations may change at any time and the solution would eventually be in danger of losing precious events.
Back to polling
The only solution we envisaged was to fall back to a polling logic where Jenkins ever 10 minutes is asking Gerrit “what’s new since last time we spoke?”. This solution goes against the main reason the Gerrit Trigger plugin was designed: avoiding SCM polling. It is, however, a much better and optimised polling strategy and let’s see why.
Query and then fetch
The typical Git SCM polling relies on fetching all references every poll interval and detect if new Git commits are available. This is notably slow and generates a huge overhead on the Git server. The approach we took is quite different and makes use of the Gerrit search capabilities that are way faster and more powerful than a simple Git fetch.
Jenkins first ask Gerrit the list of changes and associated commit-IDs involved in any event since the last polling time: the result may include patchsets that have been already built to avoid having any gaps between polling intervals. The search is fast and implemented in … you know, Google is a search company isn’t it?
Once the list of candidate commit-ids is identified, Jenkins goes through all of them and checks using the Gerrit REST-API:
– has it been build during my previous execution?
– has it been already accepted (or rejected) by me?
The Commit-IDs that results as not being checked before and not yet validated are then used to trigger a specific job parametrised on:
– Specific branch
– Specific change ref-spect
Fetching is performed avoiding any wildcard and the corresponding load on the Git server is minimum. Fetch (Git protocol) + build (using Buck) + test (unit + integration) + review feedback (REST API) is taking an average of 5 minutes, which is an amazing result if you consider the size of the Gerrit project and the typical slow speed of a default Jenkins Git fetch.
The bottom line
Using the query + fetch approach, which seemed a bit slow and old-fashioned at the beginning, was eventually very simple and successful. Instead of setting up SSH hostkey verification, key exchange and ad-hoc channels, the only configuration needed is a valid Gerrit user and the HTTPS endpoint URL, the same used for cloning the code.
The solution is much more reliable as SSH channels are notably unstable and consume server threads. The only drawback is the slight delay between the patch-set upload the start of the build (at max 10 minutes) which is acceptable in most cases.
Results
Since its roll-out more than 1200 patches have been checked and rated, a lot of potentially Gerrit regression avoided and more importantly we have prevented the NoteDB code to start diverging regarding stability from the current mainstream development.
How can re-trigger validation for a single change?
We have enabled anyone to trigger ad-hoc executions of the Gerrit validation flow using the following URL:
https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-flow/build
This is a standard Jenkins parametrized build that request the change-id to be built, as either SHA1 or number. Once the job is triggered the build will be executed and the validation feedback applied to your change, regardless of the previous build or validation status.