2023: New Year and opportunities for GerritForge and Gerrit Code Review

TL;DR: GerritForge has been dedicating its efforts to organising and managing the Gerrit User Summit in London back in November 2022, in conjunction with the release of Gerrit v3.7. The event has been a great success, with a significant presence on-site and record-breaking attendees on the GerritForge TV youtube channel. It has also committed to its promises to research and improve the JGit and Gerrit scalability to large mono-repos, with tens of millions of objects and refs. 2023 will see the finalisation of these efforts with an increase in development efforts and a new JGit Committer for pushing the platform to a new level of performance and scalability and a new innovating system for collecting and optimising the repository metrics automatically. Stay tuned.

Read the full story here below (9 mins read).


2022 has been a critical year for turning the Gerrit Code Review community and development back on track after the COVID-19 pandemic. At GerritForge, we’ve been working hard to make sure that the development, support, and innovation of Gerrit Code Review continue on its main objectives.

Gerrit Code Review v3.6 and v3.7

We have continued to deliver on the development and release of Gerrit Code Review and its plugins, helping the testing and releasing of versions v3.6.0 (May) and v3.7.0 (November).

Some numbers of the past 12 months’ development contributions by individual committers and companies:

  • 3,627 Changes have been merged on 76 projects related to the Gerrit Code Review platform, including JGit
  • 113 committers from 42 different organisations

A special mention to the top #10 contributors: Google (Ben Rohlfs, Edwin Kempin, Chris Pouchet, Dhruv Srivastava, Frank Borden, Milutin Kristofic), GerritForge (Luca Milanesio), Wikimedia (Paladox) and SAP (Matthias Sohn and Thomas Dräbing).

In comparison with 2021, we had 25% fewer changes merged but with more contributors coming from more companies, which is a symptom to a very healthy and thriving ecosystem of maintainers.

GerritForge has committed to resuming the face-to-face user summits, which were suspended since 2020.

The Gerrit User Summit 2022 took place in London, UK the 10-11 of November in a hybrid format, with people having the opportunity to participate either on-site or remotely on GerritForge’s YouTube TV channel.

It was a glorious success, with record-breaking attendance from all around the globe:

  • 50 people registered to attend on-site, 26 of them managed to arrive despite the London tube strike, whilst the others attended remotely
  • 235 people viewed the summit on YouTube with an average view time of 40 mins (one talk)

The summit survey had an outstanding report showing a huge acceptance and appreciation of the event:

  • 82% rated the remote video streaming as “good” or “outstanding”
  • 96% rated the quality of the summit as “good” or “outstanding.”
  • 100% would recommend the summit to a colleague, with 83% strongly recommending it

GerritHub.io SLA gets closer to five-nines.

We have been working hard to make Gerrit more stable and resilient throughout 2022, discovering and fixing many issues in the code base and on the multi-site software architecture.
In 2022, GerritHub.io had only six small hiccups for a total of 19 mins of downtime (SLA = 99.997%) over a 12-month period, a 75% reliability improvement compared to 2021.

We have run extensive RCAs on the causes of the downtime and identified two leading issues, which are explained in the details below.

The “anonymous unlimited query” hole in Gerrit
GerritHub.io has been subject to a 15 mins outage because of anonymous users being able to bring offline all the sites before the system could auto-recover.
Gerrit allows bypassing of all limits set in the ACLs for running queries by simply adding the “no-limit” parameter.
Returning an arbitrary payload without limits could allow a single user to generate a server-side workload for collecting and building a GBytes-sized JSON payload; unfortunately, that option was available to everyone, including anonymous users making any publicly faced Gerrit Code Review installation subject to deny-of-service attacks.
We have identified the issue, reported and fixed it in Gerrit with Change 333304, which has been included in Gerrit v3.3.10, v3.4.4, v3.5.1, and all v3.6.0 or later releases.

More granular monitoring and alerting
We have lowered the threshold of uptime checks on GerritHub.io to 1 minute, giving us the ability to detect and react immediately to 4 smaller hiccups. We have detected a lack of scalability for some specific higher-load projects. Those hiccups have been responsible for 2 mins of downtime over the 2nd part of 2022. Many more projects are also planning to be onboarded on GerritHub.io; hence we do need to address this project-specific capacity needs.

Scaling Gerrit Code Review and JGit beyond its limits

We have been investing a massive effort in building a test environment designed to stress Gerrit and JGit to its limits and identify all the limitations and bottlenecks that prevented us from scaling further.

Scaling the test repository
We have created over the months some test repositories that increased in every dimension:

  • Tens of millions of refs as both refs/changes and refs/heads
  • Millions of delta-chains
  • Tens of millions of Git objects
  • Packfiles of tens of Giga-bytes and packed refs of hundreds of megabytes

For generating a significant load on both client and server side, we have invested more into the aws-gerrit cloud setups and gatling-git performance loading tool.

There were some “well-known” issues and additional surprising ones.

SHA1 complexity and CPU utilization for large entities
JGit has been used SHA1 for identifying uniqueness not just for Git objects but also for other large entities. However, computing SHA1 has become increasingly CPU intensive because of the relatively recent findings about collisions on shattered.io.
We have highlighted two major potential improvements in cooperation with Matthias Sohn (SAP) on the raw SHA1 performance and its application for detecting packed-refs changes on the filesystem.

Commit priority queues
JGit has a custom implementation of priority queues which are intensively used in RevWalk, which has almost quadratic complexity. That isn’t a problem for small to medium chains of commits; however, when the number of commits reaches millions, the performance degradation becomes unbearable.
We have replaced the JGit’s custom implementation with the one provided by the Java JVM library, which has a logarithmic complexity that massively improves its performance with large commit chains.

Unwanted reachability checks
JGit needs to perform a full reachability check whenever a remote unknown client is advertising refs, which makes sense when serving a remote client. However, the cost of full reachability of millions of advertised refs can be a daunting task that may be alleviated if the remote end can be considered trusted.

Fixing JGit bitmaps
Since the introduction of Git bitmap, the whole community has learned how key they are in speeding up the counting and selection during the clone phase.
However, large and unoptimized bitmaps could be so unhelpful for Git that instead of speeding up, they could represent a massive overhead for the system, causing CPU spikes and, eventually, lowering the throughput of the server.
Git bitmaps are compressed using the JavaEWAH library, which is good for memory consumption but evil for CPU utilization: that is the reason why the smaller is best for performance.
We have discovered and fixed a critical issue with the JGit bitmap generation that was causing the inclusion of all commits and BLOBs pointed by annotated tags. Also, we have introduced the ability to inform JGit about the heads that can be excluded from the bitmap, allowing to shorten the creation tens of thousands times (5h generation time for a 2k refs to as little as 60s) and increase its effectiveness by 200%.

Millions of unneeded ref logs
When performing a clone of a repository with millions of heads, JGit created one local reflog file for every remote ref, including the ones there were not actually cloned but just fetched as remote references. This was creating a significant performance gap between JGit and Git, which would instead lazily create the reflog files once they are effectively checked out the first time. Cloning a single branch of a repository with millions of remote refs took around 1h, compared to a few minutes of Git.

All of the findings were included in multiple updates on the following components:

  • JGit changes: all fixes were also provided to stable-5.13, the last supported branch for Java 8, which allows benefiting from these improvements for older versions of Gerrit from v2.16 onwards.
  • pull-replication went through major performance improvements, achieving a 1000x times faster execution time compared to the traditional replication plugin
  • aws-gerrit is going through upgrades for making use of pull-replication plugin, including the support for the bearer token which allows to replicate virtually any repository, including All-Users.git
  • gatling-git: we have upgraded the Gatling version and JGit to the latest stable-5.13 to include the latest performance improvements.
  • git-repo-metrics: we have introduced a brand-new plugin that allows us to keep under control the major dimensions of a repository and therefore graph their increase over time.

GerritForge goals for 2023

We are definitely not done yet with the performance improvements on Gerrit and JGit: there are still significant improvements to be made, and JGit changes to get merged into the mainstream branches.
We believe we are on track to finalize the job and allow a stable and scalable platform for large Git repositories in 2023.

Finalise what we cooked in 2022 for JGit
JGit has a new maintainer, David Ostrovsky, awarded in 2022 as Git committer of the project. GerritForge’s devs are focused to get more reviews and attention to the JGit performance improvements. We are committed to finalising all the open changes related to large repositories.

JGit multi-pack indexes support
There is still a major gap between JGit and Git when dealing with very active repositories: multi-pack indexes. The proliferation of packfiles would eventually lead to a long and painful search-for-reuse phase for BLOBs which could be cut down 100s of times with a multi-pack index.

Git repository optimiser for Gerrit
We have been working on tracking the live information on the Git repository, thanks to the git-repo-metrics plugin. Wouldn’t it be nice to have a tool that can do something with it and automatically?
We would be doing R&D on how to correlate the repository metrics, the Git audit trail, and the performance data for making AI-based decisions on what needs to be improved on the repository.
This work stream is going to be useful for any Git repository, not just the ones powered by Gerrit Code Review. The ‘git-repo-metrics’ and the repository optimiser would also apply to other products, including GitHub and GitLab.

Gerrit v3.8 and projects-specific change numbers
We will finalise the design document for the transition to project-specific change numbers in Gerrit v3.8. That would allow the seamless migration of projects across Gerrit setups without having to worry about changes renumbering anymore.

Gerrit Code Review testing and GerritForge-certified binaries
GerritForge is spending a tremendous amount of time developing test environments and tools for serving the Gerrit community with more stable releases and improving the quality of its code. We want to intensify the effort and also offer our platinum support customers a unique service that includes the GerritForge digital signature and rubber stamp on the binaries of Gerrit Code Review and its plugins that have been successfully tested and validated for being production-ready.
Stay tuned; more details are coming soon …

GerritForge company forecast in 2023

GerritForge Inc. will finalise its roll-out to the USA, and all contracts and services will be run from Sunnyvale, CA and Europe. Over 2022, 60% of the customers and businesses have already been moved, and the operation will be completed over the course of 2023.

We are looking forward to doubling our revenue figures in 2023 and also our contributions to the open-source community, with a main focus on JGit as the driver of performance growth for Gerrit Code Review.


2023 is going to be an incredible year for GerritForge, Gerrit Code Review, and the JGit community altogether.

Happy New start of the Year 2023!

Luca Milanesio (GerritForge)
Gerrit Code Review Maintainer and Release Manager
Member of the Gerrit Engineering Steering Committee

Go Agile with Git Part 1 of 3: Workflows, Branching & Merging

git+gerrit-webinar LM-20130111-Series-1 LS final-LM-fixes

The first webinar of the CollabNet series “Go Agile with Git” had a great audience of over 200 attendees from all over the world !! A big THANK YOU to all the attendees for the great result and the great questions during the webinar.

Unfortunately not all of the questions could be answered during the Webinar because of the limited time available. You can find below all the Webinar questions and answers on Git and Agile Development … and feel free to post additional ones by commenting this post: you will definitely get our answer shortly on the blog.

GerritForge LLP is proud partner of CollabNet Inc. and invites you to replay the first webinar and register to the following two webinars of the series:

  1. Workflows, Branching & Merging (View on demand)
    Learn the basic concepts of Git and Gerrit and see explore the power of merging and applying code from different branches thanks to recursive merge, rebase and cherry-pick.
  2. Peer Programming & Code Reviews (January 29th) << Next webinar
    See how Gerrit Code Review can enforce full code collaboration and collective ownership, similarly to peer programming you can achieve faster and better code delivery. Learn how to integrate Jenkins CI into the development process and have automatic validation to feedback into your code review workflow.
  3. Hands-On Lab with Gerrit & Jenkins (February 12th)
    Step-by-step hands-on with Gerrit to learn how to move your first steps with Code Review: contribute your change, trigger your Jenkins CI validation and learn how to understand the feedbacks and amend your code for achieving a successful merged change into the main branch. Learn how to manage multiple reviews when the code evolves.

Workflows, Branching and Merging – Q&A

Q: Do you have any tool to migrate source code of one SCM tool to GIT ?
A: Git native distributions already include tools that allow to fetch code from external SVN repository and push to a Git, including their history and branches. For other SCMs (i.e. CVS, Perforce, TFS) there are OpenSource and commercial tools to migrate the full history of your repository into Git.

Q: Gerrit Dashboards already works?
A: Dashboards have been introduced in Gerrit in November 2012 hackathon and are one of the major features of Gerrit 2.6. For more information on the Gerrit roadmap see our previous post.

Q: how can use GIT to map it ClearCase UCM
A: Git is not really comparable to ClearCase UCM. Gerrit Code Review plus the additional ALM features provided by CollabNet TeamForge or the Enterprise Issue-Tracker integrations of GerritForge LLP, can really be mapped to ClearCase UCM concepts as both define the concept of “project” and “tracking” between code changes and work items into your development lifecycle. Git can be mapped to ClearCase whilst Gerrit and its Issue-Tracker integrations map to UCM with the role of associating one or more Git commits to a delivery or set of issues tracked in ClearQuest.

Q: will we cover Gerrit and AD integration? or Git AD integration ?
A: Git does not provide AD integration out-of-the-box, unless integrated and customised through an HTTP front-end (i.e. Apache) to resolve user credentials authentication. Gerrit provides AD and industry-standard LDAP support and will be covered in the second and third webinars in the series.

Q: are there any danger when auto-merging?
A: It depends on whether you have Gerrit triggering your Jenkins CI to “protect” from wrong auto-merges or if you just rely on nightly builds and standard QA validation: when Gerrit and Jenkins are integrated, even auto-mergine is not a risk as the consistency of the code-merge is validated. Typically a code-review workflow requires the change to be already “rebased” on the branch candidate to receive the change, otherwise you would run into the risk of validating a change on the wrong context.

Q: If someone commits a huge binary file by accident and pushes it to the origin/master, can that be undone or is it there to stay forever? Or is this also covered in history protection link?
A: Git always allows to “remove” commits permanently, using “git reset” or “git rebase -i” (interactive rebase) to remove or amend the commit containing the huge binary file. However Git keeps the binary blob into its objects until garbage collection will trigger an physical removal of all unreferenced objects, including the huge binary file. With CollabNet TeamForge history protection you have full control of when and what to remove completely from your Git history (in this case the huge binary file).

Q: Can roles in Gerrit be defined in ldap?  If so, what are the requirements? Can roles in Gerrit be defined in an external source?
A: Gerrit allows the usage of RBAC (role-based access control) through its group level permissions. Groups can be defined in LDAP and Gerrit Ver. 2.5 can use them and retrieve them dynamically using the its pluggable “group backend” infrastructure. Groups LDAP entry points need to be configured in gerrit.config in order to be fetched by Gerrit. Similarly other external sources can be plugged as “group backend” into Gerrit by implementing the connector interface to the external roles as Gerrit plugin. For more information on how to create a Gerrit plugin see our talk on “How to cook a Gerrit Plugin in 10 mins” presented @GooglePlex in Mountain View in November 2012.

Q: You mentioned Jenkins fits the best here. How do you compare it with Team City in this regard? 
A: TeamCity is a very advanced CI system and has introduced the concept of “test build” and “private builds” to prevent faulty code to break the team build, same battleground of Code Review to improve Team Agility without loosing control and code quality. Unfortunately TeamCity is not OpenSource and thus has a more limited community of plugin developers providing integrations for Git and Gerrit. Jenkins CI  has the world’s largest community of plugin developers for a Continuous Integration engine and already provides support for Gerrit Code Review workflow.

Q: How many people voted on Gerrit poll?
A: 183

Q: What is the minimum number of branches needed if two geographically seperated teams are sharing the same codebase and committing to the same codebase? e.g. production, master, dev1, de2, etc. How many should we maintain to be minimal?
A: Minimum number of branches is actually one, the master. Two repositories in two locations have already a “copy” of the “master” branch and then are effectively different branches that can be merged during the repository synchronisation process across locations. In practice it is recommended to have at least one active development branch (“master”) and then one branch per stable release, independently from the number of geographically separated teams. Additionally you may want to develop features in parallel using the “topic” branches and then having them merged into the “master” as they are validated through Code Review by the Teams.

Q: We are about to start the migration from SVN to GIT and are worried about the best way to train the developers – can you give any tips of the best approach
A: Git requires a lot of “day-by-day” support from the beginning, especially for Teams coming from a SVN experience. Concepts are different and sometimes SVN command names have a different meaning and behaviour in Git. It is recommended to carefully tune Gerrit permissions to avoid “dangerous” pushes  from “beginners” Team members (i.e. disabling forge identity and forced push) and start with a dedicated Git training to a group of “champions” that will be then spread across all of the Teams to provide support. Those “champions” will become then the “day-by-day” tutorials of the other Team members learning how to use Git in everyday work. P.S. Stick the “Git cheat sheet” on the wall of every Team room to enforce the Git development SCM workflow.

Q: How do users use git on windows systems? Is there a tool like Tortoise?
A: Yes and its name is TortoiseGit (no surprise here !) but it is only a GUI front-end to Git for Windows that needs to be installed as pre-requisite. However I do encourage the Teams to use the Git command line in order to first understand the concepts and then to learn how to get the best from the tool when they need it. For most common operations they can use TortoiseGit or any other Git client GUI tool (personally I use and like a lot GitX on MacOS) but they need to understand the underlying commands that are executed and generated as Git actions on the repository.

Q: How to find out which files someone has edited over history of project?  It is easy in a specific commit, but seems hard over the life of the project …
A: Git is a very powerful tool designed to provide full history inspection and code search. Each command is very sophisticated and flexible to perform even complex tasks. This specific task mentioned is not complex for Git, the following example provides a list and stat of lines changed by someone@somecompany.com over the current branch:
git log –author someone@somecompany.com –numstat

Q: Do you have any best practices to migrate from Subversion to GIT?
A: My best suggestion is to assure consistency of Subversion vs Git usernames and e-mails. You wouldn’t like to loose visibility and association of who made a commit in SVN and his identity in Git. In order to enforce consistency make sure that “forge identity” permission is disabled when start using Git after the initial migration, in order to keep your ex-SVN users always aligned with their own identity in Git. “Forge identity” can then be granted gradually as developers become more aware and familiar with the Git tool.

Q: C1 to C5 is like Team Branch. C6, C7 is like Dev Branch.

recursive-merge

A: C6-C7 is more like a “topic-branch” where one or more developers are working on a specific feature. Master is the main development branch.

Q: Why do we need a Contributor who is NOT a commiter?
A: Anybody in the Team or outside the Team (when authorised to do so) should be able to provide their ideas and possibly even their fixes to the code; however you do not necessarily want to allow everybody to commit and merge a change (potentially flawed) and break the Team master build, causing delay on your project sprint. Defining “contributors” you can allow everybody to provide their changes into the Team’s discussion and promote them through automatic validation (Jenkins CI builds) and collective code-review, avoiding the risk of breaking the build by unwanted or not validated changes. You can then differentiate them from the “Senior Team Members” (Committer) who have the technical ownership of the design and timelines of the Team Project.

Q: any comment on gerrit vs sonar code reviews?
A: Sonar is a fantastic infrastructure to trigger code quality checks but not necessarily to action on code promotion. Gerrit is the place where people decide what to do with regards to a change, whilst Sonar is only providing a feedback on “how the change looks like” without any active review action on it. However it would be possible to integrate Sonar feedback into Gerrit code-review lifecycle developing a Sonar plugin for Gerrit or simply getting Sonar feedback into Jenkins build and then providing a validation result to Gerrit through the Jenkins Gerrit plugin.

Q: In the Git merge slide, what’s the recursive option?
A: Recursive merge strategy (the default merge option in Git) allows automatically to detect complex branches history and merge them together by walking recursively into each branch history and looking by a common ancestor branch point.

Q: does Gerrit only works with Git?
A: Yes, mainly because the it needs all the power and ability of Git to merge and rebase changes once they have been reviewed, approved and then submitted. Additionally Gerrit uses Git ability to define custom “hidden branch namespaces” to store Security and Roles definition of the repository itself.

Q: will Gerrit work with Git Fusion? Do we need a mirror repository in that case?
A: Perforce Git Fusion, used as “blessed Git repository” is a Perforce instance that “talks” the Git protocol to allow cloning and pushing to it. However Gerrit uses JGit to access its Git repository on the Filesystem and cannot then use a Perforce backend. However Perforce Git Fusion could be used as synchronisation peer of a Gerrit repo, through the usage of the Git replication protocol.

Q: how does the distributed scm work in git when the development teams are spread across geographical locations?
A: You typically configure one Gerrit replicated instance per geographical site, in order to allow maximum performance on the “git pull” operation. However for minimising conflicts during geographical repository replication, it is important to perform “git push” to only one of the repositories replicas; all the others will get the changes eventually thanks to the Gerrit replication events being propagated.

Q: how does git auto resolve work, how does git know which files to merge first in order to keep all changes from all submits?
A: Git has a lot of powerful commands and hidden (or semi-hidden) functionalities, waiting to be discovered and leveraged !
I think you refer here to the “git automatic conflict resolution” (aka git rerere) described in this Git documentation link. Git stores the “history” on how a conflict was resolved in the past and then reuse the same info to resolve future conflicts. Functionality is normally disabled but can be turned on with the following command:
git config –global rerere.enabled true

Q: Can you please explain what’s ther diff between Recursive Merge vs Rebase? the diagrams look very similar
A: There is one fundamental difference between “merging” and “rebasing” two branches: in the first case the merge preserves the full history of the two branches and creates one extra commit containing the “merged code”; in the second case the rebase modifies the history of the branch by “replaying the changes” on top of the target branch, as you were in a time-machine and you would push changes into a future state and the branch point was never created in the past. In a nutshell, the “merge” can be reverted because doesn’t change the history (removing the merged commit would suffice) whilst a “rebase” cannot be reverted easily as the “rebased branch” history has been modified and reapplied in the future. Rebase however has the effect of “flattening up” the branch history and allowing a more concise and readable set of commits on a development branch.

Thank you again for your attendance and don’t miss our next Webinar with CollabNet Inc, Tuesday, January 29, 10:00 AM – 11:00 AM PST.