How to enable Git v2 in Gerrit Code Review

git-2-26

(c) Shutterstock / spainter_vfx

Git protocol v2 landed in Gerrit 3.1 on the 11th of October 2019. This is the last email from David Ostrovsky concluding a thread of discussion about it:

It is done now. Git wire protocol v2 is a part of open source Gerrit and will be
shipped in upcoming Gerrit 3.1 release.

And, it is even enabled per default!

Huge thank to everyone who helped to make it a reality!

A big thanks to David and the whole community for the hard work in getting this done!

This was the 3rd attempt to get the feature in Gerrit after a couple of issues encountered along the path.

Why Git protocol v2?

The Git protocol v2 introduces a big optimization in the way client and server communicate during clones and fetches.

The big change has been the possibility of filtering server-side the refs not required by the client. In the previous version of the protocol, whenever a client was issuing a fetch, all the references were sent from the server to the client, even if the client was fetching a single ref!

In Gerrit this issue was even more evident, since, as you might know, Gerrit leverages a lot the refs for its internal functionality, even more with the introduction of NoteDb.

Whenever you are creating a Change in Gerrit you are updating/creating at least 3 refs:

  • refs/changes/NN/<change-num>/<patch-set>
  • refs/changes/NN/<change-num>/meta
  • refs/sequences/changes

In the Gerrit project itself, there are currently about 104K refs/change and 24K refs/change/*/meta. Imagine you are updating a repo which is behind just a couple of commits, you will get all those references which will take up most of your bandwidth.

Git protocol v2 will avoid this, just sending you back the references that the Git client requested.

Is it really faster?

Let’s see if it really does what is written on the tin. We have enabled Gerrit v2 at the end of 2019 on GerritHub.io, so let’s test it there. You will need a Git client from version 2.18 onwards.

> git clone "ssh://barbasa@review.gerrithub.io:29418/GerritCodeReview/gerrit"
> cd gerrit
> export GIT_TRACE_PACKET=1
> git -c protocol.version=2 fetch --no-tags origin master
19:16:34.583720 pkt-line.c:80           packet:        fetch< version 2
19:16:34.585050 pkt-line.c:80           packet:        fetch< ls-refs
19:16:34.585064 pkt-line.c:80           packet:        fetch< fetch=shallow
19:16:34.585076 pkt-line.c:80           packet:        fetch< server-option
19:16:34.585084 pkt-line.c:80           packet:        fetch< 0000
19:16:34.585094 pkt-line.c:80           packet:        fetch> command=ls-refs
19:16:34.585107 pkt-line.c:80           packet:        fetch> 0001
19:16:34.585116 pkt-line.c:80           packet:        fetch> peel
19:16:34.585124 pkt-line.c:80           packet:        fetch> symrefs
19:16:34.585133 pkt-line.c:80           packet:        fetch> ref-prefix master
19:16:34.585142 pkt-line.c:80           packet:        fetch> ref-prefix refs/master
19:16:34.585151 pkt-line.c:80           packet:        fetch> ref-prefix refs/tags/master
19:16:34.585160 pkt-line.c:80           packet:        fetch> ref-prefix refs/heads/master
19:16:34.585168 pkt-line.c:80           packet:        fetch> ref-prefix refs/remotes/master
19:16:34.585177 pkt-line.c:80           packet:        fetch> ref-prefix refs/remotes/master/HEAD
19:16:34.585186 pkt-line.c:80           packet:        fetch> 0000
19:16:35.052622 pkt-line.c:80           packet:        fetch< d21ee1980f6db7a0845e6f9732471909993a205c refs/heads/master
19:16:35.052687 pkt-line.c:80           packet:        fetch< 0000
From ssh://review.gerrithub.io:29418/GerritCodeReview/gerrit
 * branch                  master     -> FETCH_HEAD
19:16:35.175324 pkt-line.c:80           packet:        fetch> 0000

> git -c protocol.version=1 fetch --no-tags origin master
19:16:57.035135 pkt-line.c:80           packet:        fetch< d21ee1980f6db7a0845e6f9732471909993a205c HEAD\0 include-tag multi_ack_detailed multi_ack ofs-delta side-band side-band-64k thin-pack no-progress shallow agent=JGit/unknown symref=HEAD:refs/heads/master
19:16:57.037456 pkt-line.c:80           packet:        fetch< 07c8a169d6341c586a10163e895973f1bdccff92 refs/changes/00/100000/1
19:16:57.037489 pkt-line.c:80           packet:        fetch< 0014ca6443ac0af338e2677b45e538782bb7a12e refs/changes/00/100000/meta
19:16:57.037502 pkt-line.c:80           packet:        fetch< b4af8cad4d3982a0bba763a5e681d26078da5a0e refs/changes/00/100400/1
19:16:57.037513 pkt-line.c:80           packet:        fetch< 9ec6e507c493f4f1905cd090b47447e66b51b7e1 refs/changes/00/100400/meta
19:16:57.037523 pkt-line.c:80           packet:        fetch< a80359367529288eea3c283e7d542164bced1e2f refs/changes/00/100800/1
19:16:57.037533 pkt-line.c:80           packet:        fetch< 170cced6d81c25d1082d95e50b37883e113efd01 refs/changes/00/100800/meta
19:16:57.037544 pkt-line.c:80           packet:        fetch< 6cb616e0ad4b3274d4b728f8f7b641b6bd22dce4 refs/changes/00/100900/1
19:16:57.037554 pkt-line.c:80           packet:        fetch< 286d1ee1574127b76c4c1a6ef0f918ad4c61953a refs/changes/00/100900/meta
19:16:57.037606 pkt-line.c:80           packet:        fetch< 312ba566d2620b43fb90be3e7c406949edf6b6d9 refs/changes/00/10100/1
19:16:57.037619 pkt-line.c:80           packet:        fetch< dde4b73cb011178584aae4fb29a528018149d20b refs/changes/00/10100/meta

…. This will go on forever …. 

As you can see there is a massive difference in the data sent back on the wire!

How to enable it?

If you want to enable it, you just need to update you git config (etc/jgit.config in 3.1 and $HOME/.gitconfig in previous versions) with the protocol version to enable it and restart your server:

[protocol]
  version = 2

Enjoy your new blazing fast protocol!

If you are interested in more details about the Git v2 protocol you can find the specs here.

Fabio Ponciroli (GerritForge)
Gerrit Code Review Contributor

Jenkins ❤︎ Gerrit Code Review, again

Gerrit Code Review has been integrated with Jenkins for over nine years. It was back when Kohsuke was still a Senior Engineer at Sun Microsystem, which was just announced to be acquired by Oracle and his OpenSource CI project was still called Hudson.

Jenkins and Gerrit are the most critical components of the DevOps Pipeline because of their focus on people (the developers), their code and collaboration (the code review) their builds and tests (the Jenkinsfile pipeline) that produce the value stream to the end user.

The integration between code and build is so important that other solutions like GitLab have made it a unique integrated tool and even GitHub has started covering the “last mile” a few months ago by offering powerful actions APIs and workflow to automate build actions around the code collaboration.

Accelerate the CI/CD pipeline

DevOps is all about iteration and fast feedback. That can be achieved by automating the build and verification of the code changes into a target environment, by allowing all the stakeholder to have early access to what the feature will look like and validating the results with speed and quality at the same time.

Every development team wants to make the cycle time smaller and spend less time in tedious work by automating it as much as possible. That trend has created a new explosion of fully automated processes called “Bots” that are more and more responsible for performing those tasks that developers are not interested in doing manually over and over again.

As a result, developers are doing more creative and design work, are more motivated and productive, can address technical debt a lot sooner and allow the business to go faster in more innovative directions.

As more and more companies are adopting DevOps, it becomes more important to be better and faster than your competitors. The most effective way to accelerate is to extract your data, understand where your bottlenecks are, experiment changes and measure progress.

Humans vs. Bots

The Gerrit Code Review project is fully based on an automated DevOps pipeline using Jenkins. We collect the data produced during the development and testing of the platform and extract metrics and graphs around it constantly https://analytics.gerrithub.io thanks to the OpenSource solution Gerrit DevOps Analytics (aka GDA).

By looking at the protocol and code statistics, we founded out that bots are much more hard worker than humans on GerritHub.io, which hosts, apart from the Gerrit Code Review mirrored projects, also many other popular OpenSource.

That should not come as a surprise if you think of how many activities could potentially happen whenever a PatchSet is submitted in Gerrit: style checking, static code analysis, unit and integration testing, etc.

human-vs-bot

We also noticed that most of the activities of the bots are over SSH. We started to analyze what the Bots are doing and see what the impact is on our service and possibly see if there are any improvements we can do.

Build integration, the wrong way

GerritHub has an active site with multiple nodes serving read/write traffic and a disaster recovery site ready to take over whenever the active one has any problem.

Whenever we roll out a new version of Gerrit, using the so-called ping-pong technique, we swap the roles of the two sites (see here for more details).  Within the same site, also, the traffic can jump from one to the other in the same cluster using active failover, based on health, load and availability. The issue is that we end up in a situation like the following:

Basic Use Case Diagram

The “old” instance still served SSH traffic after the switch. We noticed we had loads of long-lived SSH connections. These are mostly integration tools keeping SSH connections open listening to Gerrit events.

Long-lived SSH connections have several issues:

  • SSH traffic doesn’t allow smart routing. Hence we end up with HTTP traffic going on the currently active node and most of the SSH one still on the old one
  • There is no resource pooling since the connections are not released
  • There is the potential loss of events when restarting the connections

That impacts the overall stability of the software delivery lifecycle, extending the feedback loop and slowing your DevOps pipeline down.

Then we started to drill down into the stateful connections to understand why they exist, where are they coming from and, most importantly, which part of the pipeline they belong to.

Jenkins Integration use-case

The Gerrit Trigger plugin for Jenkins is one of the integration tools that has historically been suffering from those problems, and unfortunately, the initial tight integration has become over the years less effective, slow and complex to use.

There are mainly two options to integrate Jenkins with Gerrit:

We use both of them with the Gerrit Code Review project, and we have put together a summary of how they compare to each other:

Gerrit Trigger Plugin Gerrit Code review Plugin Notes
Trigger mechanism Stateful

Jenkins listens for Gerrit events stream

Stateless

Gerrit webhooks notify events to Jenkins

Stateful stream events are consuming resources on both Jenkins and Gerrit
Transport Protocol SSH session on a long-lived stream events connection HTTP calls for each individual stream event – SSH cannot be load-balanced
– SSH connections cannot be pooled or reused
Setup Complexity Hard: requires a node-level and project-level configuration.

No native Jenkinsfile pipeline integration

Easy: no special knowledge required.

Integrates natively with Jenkinsfile and multi-branch pipeline

Configuring the Gerrit Trigger Plugin is more error-prone because requires a lot of parameters and settings.
Systems dependencies Tightly Coupled with Gerrit versions and plugins. Uses Gerrit as a generic Git server, loosely coupled. Upgrade of Gerrit might break the Gerrit Trigger Plugin integration.
Gerrit knowledge Admin: You need to know a lot of Gerrit-specific settings to integrate with Jenkins. User. You only need to know Gerrit clone URL and credentials. The Gerrit Trigger plugin requires special user and permissions to listen to Gerrit stream events.
Fault tolerance to Jenkins restart Missed events: unless you install a server-side DB to capture and replay the events. Transparent: all events are sent as soon as Jenkins is back. Gerrit webhook automatically tracks and retries events transparently.
Tolerance to Gerrit rolling restart Events stuck: Gerrit events are stuck until the connection is reset. Transparent: any of the Gerrit nodes active continue to send events. Gerrit trigger plugin is forced to terminate stream with a watchdog, but will still miss events.
Differentiate actions per stage No flexibility to tailor the Gerrit labels to each stage of the Jenkinsfile pipeline. Full availability to Gerrit labels and comments in the Jenkinsfile pipeline
Multi-branch support Custom: you need to use the Gerrit Trigger Plugin environment variables to checkout the right branch. Native: integrates with the multi-branch projects and Jenkinsfile pipelines, without having to setup anything special.

Gerrit and Jenkins friends again

After so many years of adoption, evolution and also struggles of using them together, finally Gerrit Code Review has the first-class integration with Jenkins, liberating the Development Team from the tedious configuration and BAU management of triggering a build from a change under review.

Jenkins users truly love using Gerrit and the other way around, friends and productive together, again.

Conclusion

Thanks to Gerrit DevOps Analytics (GDA) we managed to find one of the bottlenecks of the Gerrit DevOps Pipeline and making changes to make it faster, more effective and reliable than ever before.

In this case, by just picking the right Jenkins integration plugin, your Gerrit Code Review Master Server would run faster, with less resource utilization. Your Jenkins pipeline is going to be simpler and more reliable with the validation of each change under review, without delays or hiccups.

The Gerrit Code Review plugin for Jenkins is definitively the first-class integration to Gerrit. Give it a try yourself, you won’t believe how easy it is to set up.

Fabio Ponciroli
Gerrit Code Review Contributor, GerritForge.

Accelerate with Gerrit DevOps Analytics, in one click!

 

Accelerating your time to market while delivering high-quality products is vital for any company of any size. This fast pacing and always evolving world relies on getting quicker and better in the production pipeline of the products. The whole DevOps and Lean methodologies help to achieve the speed and quality needed by continuously improving the process in a so-called feedback loop. The faster the cycle, the quicker is the ability to achieve the competitive advantage to outperform and beat the competition.

It is fundamental to have a scientific approach and put metrics in place to measure and monitor the progress of the different actors in the whole software lifecycle and delivery pipeline.

Gerrit DevOps Analytics (GDA) to the rescue

We need data to build metrics to design our continuous improvement lifecycle around it. We need to juice information from all the components we use, directly or indirectly, on a daily basis:

  • SCM/VCS (Source and Configuration Management, Version Control System)
    how many commits are going through the pipeline?
  • Code Review
    what’s the lead time for a piece of code to get validated?
    How are people interacting and cooperating around the code?
  • Issue tracker (e.g. Jira)
    how long does it take the end-to-end lifecycle outside the development, from idea to production?

Getting logs from these sources and understanding what they are telling us is fundamental to anticipate delays in deliveries, evaluate the risk of a product release and make changes in the organization to accelerate the teams’ productivity. That is not an easy task.

Gerrit DevOps Analytics (aka GDA) is an OpenSource solution for collecting data, aggregating them based on different dimensions and expose meaningful metrics in a timely fashion.

GDA is part of the Gerrit Code Review ecosystem and has been presented during the last Gerrit User Summit 2018 at Cloudera HQ in Palo Alto. However, GDA is not limited to Gerrit and is aiming at integrating and processing any information coming from other version control and code-review systems, including GitLab, GitHub and BitBucket.

Case study: GDA applied to the Gerrit Code Review project

One of the golden rules of Lean and DevOps is continuous improvement: “eating your dog food” is the perfect way to measure the progress of the solution by using its outcome in our daily life of developing GDA.

As part of the Gerrit project, I have been working with GerritForge to create Open Source tools to develop the GDA dashboards. These are based on events coming from Gerrit and Git, but we also extract data coming from the CI system, the Issue tracker. These tools include the ETL, for the data extraction and the presentation of the data.

As you will see in the examples Gerrit is not just the code review tool itself, but also its plugins ecosystem, hence you might want to include them as well into any collection and processing of analytics data.

Wanna try GDA? You are just one click away.

We made the GDA more accessible to everybody, so more people can play with it and understand its potentials. We create the Gerrit Analytics Wizard plugin so you can have some insights in your data with just one click.

What you can do

With the Gerrit Analytics Wizard you can get started quickly and with only one click you can get:

  • Initial setup with an Analytics playground with some defaults charts
  • Populate the Dashboard with data coming from one or more projects of your choice

The full GDA experience

When using the full GDA experience, you have the full control of your data:

  • Schedule recurring data imports. It is just meant to run a one-off import of the data
  • Create a production ready environment. It is meant to build a playground to explore the potentials of GDA

What components are needed?

To run the Gerrit Analytics Wizard you need:

You can find here more detailed information about the installation.

One click to crunch loads of data

Once you have Gerrit and the GDA Analytics and Wizard plugins installed, chose the top menu item Analytics Wizard > Configure Dashboard.

You land on the Analytics Wizard and can configure the following parameters:

  • Dashboard name (mandatory): name of the dashboard to create
  • Projects prefix (optional): prefix of the projects to import, i.e.: “gerrit” will match all the projects that are starting with the prefix “gerrit”. NOTE: The prefix does not support wildcards or regular expressions.
  • Date time-frame (optional): date and time interval of the data to import. If not specified the whole history will be imported without restrictions of date or time.
  • Username/Password (optional): credentials for Gerrit API, if basic auth is needed to access the project’s data.

Sample dashboard analytics wizard page:

wizard.pngOnce you are done with the configuration, press the “Create Dashboard” button and wait for the Dashboard, tailored to your data, to be created (beware this operation will take a while since it requires to download several Docker images and run an ETL job to collect and aggregate the data).

At the end of the data crunching you will be presented with a Dashboard with some initial Analytics graphs like the one below:

dashboard-e1549490575330.png

You can now navigate among the different charts from different dimensions, through time, projects, people and Teams, uncovering the potentials of your data thanks to GDA!

What has just happened behind the scenes?

When you press the “Create Dashboard” button, loads of magic happens behind the scenes. Several Docker images will be downloaded to run an ElasticSearch and Kibana instance locally, to set up the Dashboard and run the ETL job to import the data. Here a sequence workflow to illustrate the chain of events is happening:

components.png

Conclusion

Getting insights into your data is so important and has never been so simple. GDA is an OpenSource and SaaS (Software as a Service) solution designed, implemented and operated by GerritForge. GDA allows setting up the extraction flows and gives you the “out-of-the-box” solution for accelerating your company’s business right now.

Contact us if you need any help with setting up a Data Analytics pipeline or if you have any feedback about Gerrit DevOps Analytics.

Fabio Ponciroli – Gerrit Code Review Contributor – GerritForge Ltd.