Gerrit Code Review RBE: moving to BuildBuddy on-prem

The Gerrit Code Review Open-Source project has transitioned from Google Cloud Platform’s Remote Build Execution (RBE) to an on-premises BuildBuddy deployment to address performance, stability and latency issues. The migration included setting up a new Jenkins controller and provisioning BuildBuddy executors on newly provisioned on-premises boxes, which delivered significantly shorter build times and more consistent, reliable performance. After a thorough evaluation and community consensus, BuildBuddy was adopted as the new default for Gerrit’s CI/CD pipeline, improving overall efficiency and stability.

Historical Context

The Gerrit Code Review project has undergone significant evolution in its build processes to enhance efficiency and performance. This evolution reflects the increasing complexity and demands of modern CI/CD pipelines.

Overview of Gerrit Code Review

Gerrit is a powerful code review tool with rich web and command-line interfaces, built on top of the Git open-source project. The Gerrit codebase is large and multifaceted, using Python tooling, a TypeScript front-end and a Java-based backend. To appreciate the challenges and the need for robust build tools, consider the scope of Gerrit’s codebase and build activity:

  • Plugins: Gerrit comprises 14 core plugins maintained as git submodules, plus a universe of over 300 community-based plugins developed in multiple languages, from Java to Scala and Groovy.
  • Java Codebase: The project includes 6011 Java files, with 4765 dedicated to production code, amassing ca. 411,768 lines of code (LoC). Additionally, there are 1246 test files (924 unit tests and 322 integration tests) contributing another ca. 276,632 LoC.
  • Frontend Codebase: The frontend is built with 110 JavaScript files (ca. 2345 LoC), 733 TypeScript files (ca. 175,765 LoC), 293 HTML files, and 9 CSS files.
  • Dependencies: Gerrit relies on 135 Java dependencies managed through Maven and 25 NPM dependencies (5 runtime and 20 development).

Gerrit was founded in 2008 and has over 15 years of code history, reflecting the evolution of the build tools, Java VMs and front-end technologies used over more than a decade. The prerequisites you need to manage in order to build Gerrit are diverse and quite challenging.

Build and Verification Activity

The Gerrit project is highly active, with rigorous commit-level verification processes to ensure code quality and stability. For example, from June 9 to June 23, 2024, Gerrit handled:

Total number of changes:

Branch        Number of changes
master        65
stable-3.10   18
stable-3.9    16
stable-3.8    15
stable-3.7    2
stable-3.6    0
stable-3.5    1
stable-3.4    1
Total         118

Total number of revisions (patch sets):

Branch        Number of revisions
master        230
stable-3.10   86
stable-3.9    21
stable-3.8    22
stable-3.7    4
stable-3.6    0
stable-3.5    4
stable-3.4    1
Total         368

Total number of Gerrit verifications:

Type of verification   Number of verifications
Build/Tests            277
Code Style             320
PolyGerrit UI Tests    124
RBE BB Build/Tests     271
Total                  992

Evolution of Build Tools

The journey of Gerrit’s build tools reflects its growth and the increasing complexity of its CI/CD requirements:

  1. Apache Maven: Up until version 2.7, Gerrit used Apache Maven as its build tool. Maven, known for its comprehensive project management capabilities, was sufficient during Gerrit’s early stages.
  2. Buck: From version 2.8 to 2.13, Gerrit transitioned to Buck, a build tool designed for faster builds. Buck’s incremental build capabilities helped manage the growing codebase more efficiently than Maven.
  3. Bazel: Since version 2.14, Bazel has been the default build tool for Gerrit. Bazel’s advanced features, including its support for remote caching and execution, provided significant improvements in build performance and scalability.

Transition to Bazel with Remote Execution and Caching

In December 2020, Gerrit Code Review made a significant shift by adopting Bazel with remote execution and caching to address the challenges of long build times. This strategic move aimed to leverage Bazel’s advanced capabilities to enhance the efficiency of the CI processes.

Reasons for the Shift

The primary driver for this transition was the increasing build times due to the growing complexity and size of the Gerrit codebase. The conventional local build processes were becoming a bottleneck, slowing down the development and integration cycles.

Implementation with GCP Remote Build Execution (RBE)

Gerrit integrated Google Cloud Platform’s Remote Build Execution (GCP RBE) as the remote server to support this transition. The integration provided several key benefits:

  • Reduced Build Times: By offloading build and test tasks to powerful remote servers, build times were significantly reduced.
  • Efficient Resource Utilization: Local machines were freed from heavy build tasks, allowing developers to continue working without interruptions.
  • Scalability and Parallelisation: Remote execution and the parallelisation of Gerrit’s Bazel tasks allowed the project to leverage scalable cloud resources.

This implementation marked a crucial enhancement in Gerrit’s CI/CD pipeline, setting the stage for further optimisations and improvements in the build process.

Motivation to find RBE alternatives

The RBE implementation on Google Cloud has served the Gerrit Code Review project successfully for many years; however, the needs of the project grew over time and the CI/CD infrastructure had to satisfy additional requirements.

  1. Stability: Google Cloud RBE is a SaaS solution that could be flaky at times, whereas the project needed a stable deployment under its full control, not influenced by external factors.
  2. Latency between the controller and the executors: the latency between the main CI/CD controller (Jenkins) and the RBE executors imposed a significant penalty on shorter builds such as the Code Style checks, whereas localised data processing would result in faster build times and quicker feedback cycles.
  3. Predictability: Consistent and reliable performance is crucial for efficient CI/CD workflows.

Moving to BuildBuddy RBE

BuildBuddy is an open-core Bazel build event viewer, result store, remote cache, and remote build execution platform that provided many new benefits to the Gerrit Code Review builds:

  1. Integration and Customisation: the integration with the existing CI/CD pipelines was straightforward.
  2. Open Source Community: BuildBuddy, being open-core, benefits from community-driven innovation and collaborative support.
  3. Enterprise Features: BuildBuddy Enterprise offers advanced features for companies that need robust capabilities:
    • OpenID Connect Auth Support: Integrates with Google OAuth.
    • Remote Build Execution: Supports custom Docker images.
    • Configurable Bazel Caches TTL: Allows setting TTL for build results and cache with support for persistent build artifact storage.
    • High Availability: Configurations for high availability, including on-premises.
  4. Control and Stability: On-premises deployment offers full control and enhanced stability by minimizing reliance on external factors.
  5. Very Low Latency: Localized data processing results in faster build times and quicker feedback cycles: the executors and the Jenkins controller could be located in the same data centre, with microsecond network latency.
  6. Predictable Performance: Dedicated, always-on executors provide the consistent and reliable performance that is crucial for efficient CI/CD workflows.

BuildBuddy RBE allowed more development efficiency and reliability for the Gerrit Code Review project, making it a compelling choice for optimizing CI/CD processes while leveraging the benefits of open-source software and robust enterprise features.

What was the migration plan?

To clarify a few points for a better understanding of this section:

  1. Scope of Bazel RBE Execution: Bazel RBE is executed only in the Gerrit project and its core plugins (git submodules). It is not executed in non-core plugins, such as pull-replication, high-availability, multi-site, etc.
  2. Branch Support: From a CI/CD perspective, only the master branch and the last three stable branches are supported for the Gerrit project and its core and non-core plugins. At the time of the migration, these branches were master, stable-3.7, stable-3.8, and stable-3.9.

The initial phase of the migration aimed to assess the reliability and stability of BuildBuddy RBE. A priority in this phase was to maintain the current CI/CD process while simultaneously evaluating BuildBuddy RBE without any disruptions.

To achieve this phase, several updates and new services were implemented:

Adding BuildBuddy Bazel remote configuration: a BuildBuddy remote configuration for Bazel was added to the Gerrit master branch.

Provisioning BuildBuddy Executors: A cloud host was provisioned with the following specifications: 128 CPUs (Intel(R) Xeon(R) Gold 6438Y+), 128 GB RAM and SSD storage. This host runs 3 BuildBuddy executors as Docker containers.

Setting up a new Gerrit CI Server: A new Jenkins server was set up to run build jobs against BuildBuddy RBE on the Gerrit master branch. This server is not accessible from outside.

Registering a new Gerrit verification: A new verification named RBE BB Build/Tests was added to gerrit-review.googlesource.com to trigger builds on the new Gerrit CI server whenever a new revision was created on the Gerrit master branch.
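The BuildBuddy Bazel remote configuration itself is not reproduced here. As a rough illustration only, the Python sketch below assembles the standard Bazel flags that such a configuration typically sets: `--remote_executor`, `--remote_cache`, `--bes_backend`, `--bes_results_url`, and optionally `--remote_header` carrying BuildBuddy’s `x-buildbuddy-api-key`. The host name, ports and `--jobs` value are hypothetical placeholders, not Gerrit’s actual settings. In practice these flags would normally live in a named `.bazelrc` config rather than being built in code.

```python
import shlex

# Hypothetical on-prem BuildBuddy endpoints; the real host and ports are not given here.
BB_GRPC = "grpc://buildbuddy.example.internal:1985"
BB_RESULTS_URL = "http://buildbuddy.example.internal:8080/invocation/"


def rbe_flags(grpc_endpoint, results_url, api_key=None, jobs=100):
    """Standard Bazel flags routing execution, caching and build events to one server."""
    flags = [
        f"--remote_executor={grpc_endpoint}",  # run actions on the remote executors
        f"--remote_cache={grpc_endpoint}",     # share the action cache and CAS
        f"--bes_backend={grpc_endpoint}",      # stream build events to the BuildBuddy UI
        f"--bes_results_url={results_url}",    # results link printed at the end of the build
        f"--jobs={jobs}",                      # parallel remote actions (hypothetical value)
    ]
    if api_key:
        # Only needed when the BuildBuddy deployment requires API keys.
        flags.append(f"--remote_header=x-buildbuddy-api-key={api_key}")
    return flags


def bazel_test_command(targets, **kwargs):
    """Full `bazel test` argv for the given targets, using the hypothetical endpoints."""
    return ["bazel", "test", *rbe_flags(BB_GRPC, BB_RESULTS_URL, **kwargs), *targets]


if __name__ == "__main__":
    print(shlex.join(bazel_test_command(["//..."])))
```

Keeping executor, cache and build-event endpoints on the same on-prem server is what allows the low controller-to-executor latency described in this article. Splitting them across sites would reintroduce network round-trips.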

Figure 1: Architecture Migration diagram with default CI flow and new BuildBuddy CI flow:

What does the new CI/CD flow look like?

In the default CI flow, when a user creates a new revision (patch set) on the Gerrit master, stable-3.7, stable-3.8 or stable-3.9 branch, a set of verifications triggers Jenkins jobs. These verification jobs include:

  • RBE GCP Build/Tests: Builds the codebase and executes all the unit/integration tests on GCP RBE.
  • Code Style: Checks Java and Bazel formatting, and JavaScript lint.
  • Build/Tests: Builds the codebase and executes one single no-op test.
  • PolyGerrit UI Tests: Executes unit/integration tests for PolyGerrit UI.

If any of the verification jobs fail, the verification status of the revision is marked with a -1.

As mentioned earlier, the intention when testing the reliability and stability of BuildBuddy RBE was to avoid interfering with the default CI/CD flow. To achieve this, a new verification job called RBE BB Build/Tests was added. This verification triggers a Jenkins job on the new Gerrit CI, which builds the codebase and executes unit/integration tests on BuildBuddy RBE. This setup allowed the default flow and the BuildBuddy RBE flow to coexist without affecting each other.

It is important to note two things:

  • Only revisions on the master branch of the Gerrit project triggered this new verification job, as the data collected from the master branch was sufficient to draw conclusions.
  • The status of this new verification job does not affect the overall verification status of the revision.
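The voting rule above can be sketched in a few lines of Python. The job names come from this article. Two behaviours are assumptions about how the Verified label is computed, not a description of Gerrit CI’s actual scripts: the +1 when every voting job passes, and the 0 while a voting job is still pending.

```python
from typing import Mapping

# Voting verifications from the default flow. Any other job, such as
# "RBE BB Build/Tests" during the evaluation, is ignored for the vote.
VOTING_JOBS = frozenset({
    "RBE GCP Build/Tests",
    "Code Style",
    "Build/Tests",
    "PolyGerrit UI Tests",
})


def verified_vote(results: Mapping[str, bool]) -> int:
    """Compute the revision's Verified vote from {job name: passed?}."""
    if any(not passed for job, passed in results.items() if job in VOTING_JOBS):
        return -1  # any failed voting job marks the revision -1
    if not VOTING_JOBS.issubset(results.keys()):
        return 0   # assumption: no vote while a voting job is still pending
    return 1       # assumption: all voting jobs passed


if __name__ == "__main__":
    results = {job: True for job in VOTING_JOBS}
    results["RBE BB Build/Tests"] = False  # non-voting: does not affect the outcome
    print(verified_vote(results))  # prints 1
```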

Figure 2: Verification jobs, default ones and the BuildBuddy RBE, triggered in a Gerrit master branch revision:

Once the first phase concluded, it was important to analyze the data to determine if BuildBuddy RBE was reliable and stable enough to proceed to the next phase. In the second phase, the plan was to evaluate the performance of BuildBuddy RBE against GCP RBE. Architecturally, the CI/CD process remained the same as in the first phase, with one key difference: the verification job RBE BB Build/Tests would be triggered when revisions were created for the Gerrit repo on the master, stable-3.7, stable-3.8, and stable-3.9 branches. This was necessary to ensure that BuildBuddy RBE handled the same number of jobs as GCP RBE, allowing for a fair performance comparison.

Data Collection

Before analysing the data, it is important to explain our data collection methodology. To obtain the build data (build number, execution time on GCP RBE and BB RBE, and status), we developed a Python script that employed two APIs:

Notes:

  • A build number is a unique identifier represented by the tuple (change number, revision number).
  • All the graphs show builds in chronological order.
  • The build numbers are not shown in the graphs, for readability.
  • Builds labelled as “RUNNING” or those lacking specification according to the API have been excluded from the calculations.
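The two APIs are not named here, so the following is only a hypothetical sketch of the clean-up described in the notes. It assumes records with the fields `change`, `revision`, `started`, `status` and `minutes`, and Jenkins-style status strings. Builds that are `RUNNING` or have no status are dropped, the rest are ordered chronologically and keyed by the (change number, revision number) tuple, and GCP and BuildBuddy runs of the same revision are then paired.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BuildKey = Tuple[int, int]  # (change number, revision number)


@dataclass(frozen=True)
class RawBuild:
    # Hypothetical record shape; the real API fields are not given in the article.
    change: int
    revision: int
    started: str           # ISO-8601 timestamp, same format for every record
    status: Optional[str]  # e.g. "SUCCESS", "FAILURE", "RUNNING", or None
    minutes: float


def clean(builds: List[RawBuild]) -> Dict[BuildKey, RawBuild]:
    """Drop running/unspecified builds and index the rest by build number."""
    kept = [b for b in builds if b.status not in (None, "RUNNING")]
    kept.sort(key=lambda b: b.started)  # same-format ISO-8601 strings sort chronologically
    # If a build number appears twice, the most recent record wins.
    return {(b.change, b.revision): b for b in kept}


def paired(gcp: Dict[BuildKey, RawBuild], bb: Dict[BuildKey, RawBuild]):
    """Build numbers present on both platforms, in GCP chronological order."""
    return [(key, gcp[key], bb[key]) for key in gcp if key in bb]
```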

Key Performance Indicators

  • Average Build Time: Calculate the average build time for each platform (GCP RBE and BuildBuddy RBE) to understand the typical time it takes to complete a build on each platform.
  • Percentage of Builds Faster: Determine the percentage of builds that are completed faster on BuildBuddy RBE compared to GCP RBE. This helps assess which platform is more efficient in terms of build time.
  • Overall Success Rate / Failure Rate: Calculate the overall success and failure rates of builds on BuildBuddy RBE. This considers both successful and failed builds to provide a comprehensive view of platform reliability.
  • Outliers (>60 minutes): Identify the percentage of builds that exceed a certain threshold, such as 60 minutes in BuildBuddy RBE. This helps pinpoint builds that take exceptionally long and may require investigation or optimization.
  • Average Build Time Reduction: Determine the average reduction in build time when using BuildBuddy RBE compared to GCP RBE. This quantifies the efficiency improvement gained by using the BuildBuddy platform.
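The KPIs above reduce to simple arithmetic over paired builds. The sketch below is one possible way to compute them, assuming a hypothetical `BuildPair` record per build number. As in the tables that follow, the averages, the percentage of faster builds and the outliers are computed over builds that succeeded on both platforms. The average reduction is the difference between the two averages (e.g. 18.69 - 10.2 = 8.49 minutes in Phase 1). The `bb_only_failures_pct` field (BuildBuddy failures on builds that passed on GCP) is an extra figure included for illustration. The sketch assumes at least one build succeeded on both platforms.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

OUTLIER_MINUTES = 60.0  # BuildBuddy outlier threshold used in this article


@dataclass(frozen=True)
class BuildPair:
    # Hypothetical record: one build number run on both platforms.
    change: int
    revision: int
    gcp_minutes: float
    bb_minutes: float
    gcp_ok: bool
    bb_ok: bool


def kpis(builds: List[BuildPair]) -> Dict[str, float]:
    both_ok = [b for b in builds if b.gcp_ok and b.bb_ok]
    gcp_avg = mean(b.gcp_minutes for b in both_ok)
    bb_avg = mean(b.bb_minutes for b in both_ok)
    faster = sum(b.bb_minutes < b.gcp_minutes for b in both_ok)
    outliers = sum(b.bb_minutes > OUTLIER_MINUTES for b in both_ok)
    bb_successes = sum(b.bb_ok for b in builds)
    bb_only_failures = sum(b.gcp_ok and not b.bb_ok for b in builds)
    total = len(builds)
    return {
        "gcp_avg": gcp_avg,
        "bb_avg": bb_avg,
        "avg_reduction": gcp_avg - bb_avg,
        "pct_bb_faster": 100 * faster / len(both_ok),
        "bb_success_rate": 100 * bb_successes / total,
        "bb_failure_rate": 100 * (total - bb_successes) / total,
        "pct_outliers": 100 * outliers / len(both_ok),
        "bb_only_failures_pct": 100 * bb_only_failures / total,
    }


if __name__ == "__main__":
    # Hypothetical sample, not data from the article.
    sample = [
        BuildPair(1001, 1, 20.0, 10.0, True, True),
        BuildPair(1001, 2, 15.0, 12.0, True, True),
        BuildPair(1002, 1, 10.0, 70.0, True, True),  # BuildBuddy outlier
        BuildPair(1003, 1, 12.0, 9.0, True, False),  # BuildBuddy-only failure
    ]
    print(kpis(sample))
```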

Phases

As we mentioned above, the migration has been segmented into two distinct phases:

Phase 1: Spanning from December 28th, 2023, to February 9th, 2024, during which RBE BuildBuddy operated against the Gerrit master branch.
Phase 2: Spanning from February 10th, 2024, to February 26th, 2024, during which RBE BuildBuddy operated against the Gerrit master, stable-3.7, stable-3.8, and stable-3.9 branches.

Phase 1: Evaluate if BuildBuddy RBE offers stability and low latency

To make the data more readable and understandable, the data has been split into two graphs:

Figure 3: RBE Successful Build time for Gerrit master between 28th December 2023 and 18th January 2024:

Figure 4: RBE Successful Build time for Gerrit master between 19th January 2024 and 9th February 2024:

Total number of builds:

               master
GCP builds     489
BB builds      489

Build status:

                  BB successful   BB failed
GCP successful    390             17
GCP failed        0               82

Initially, 3.47% of BuildBuddy RBE builds failed due to CPU exhaustion caused by running 100 BuildBuddy executors simultaneously. This problem was addressed by reducing the number of executors to 3. BuildBuddy engineers advise running only one executor container per host/node, with each executor capable of handling multiple RBE Actions concurrently. For each action, an executor initiates an isolated runner to execute it. We plan to reassess our configuration in due course.

Average build time when GCP and BuildBuddy builds were successful:

              Minutes
GCP average   18.69
BB average    10.2

The average build time reduction is 8.49 minutes, and 96.4% (376 out of 390 builds) of BuildBuddy builds were faster than the corresponding GCP builds.

We found that 1.5% of successful BuildBuddy builds were outliers (longer than 60 minutes). These were caused by a necessary restart of the new Gerrit CI server, which temporarily disrupted the builds.

change_numberREVISION_NUMBERGCP RBE MINUTESBB RBE MINUTES
40039816.7868.68
3996571113.71293.55
3996571421.45137.47
400958214.52154.3
247812726.6267.17
406597114.1879.55

Average build time when both GCP and BB builds failed:

              Minutes
GCP average   17.68
BB average    23.29

Conclusions:

In terms of performance and stability, the results were promising, with the BuildBuddy platform showing superior performance, as highlighted in the table “Average build time when GCP and BuildBuddy builds were successful”. The BuildBuddy failures on builds that succeeded on GCP stemmed primarily from configuration problems, which have since been resolved. Outliers represented a mere 1.5% of builds, so their impact was negligible. However, despite these favourable outcomes, caution was warranted because GCP handled a higher overall volume of builds than BuildBuddy, as it also served the stable branches.

Phase 2: Compare BuildBuddy RBE with GCP RBE based on performance

To make the data more readable and understandable, the data has been split into four graphs:

Figure 5: RBE Successful Build time for Gerrit master:

Figure 6: RBE Successful Build time for Gerrit stable-3.9:

Figure 7: RBE Successful Build time for Gerrit stable-3.8:

Figure 8: RBE Successful Build time for Gerrit stable-3.7:

Successful BB Build status / Successful GCP Build status:

         master   stable-3.9   stable-3.8   stable-3.7   Total
Builds   119      26           6            11           162

Average time when GCP and BB Successful:

              Minutes
GCP average   13.91
BB average    8.45

The average build time reduction is 5.46 minutes, and 90.74% (147 out of 162 builds) of BuildBuddy builds were faster than the corresponding GCP builds.

Failed BB Build status / Failed GCP Build status:

         master   stable-3.9   stable-3.8   stable-3.7   Total
Builds   30       12           1            1            44

Failed BB Build status / Successful GCP Build status:

         master   stable-3.9   stable-3.8   stable-3.7   Total
Builds   1        2            0            0            3

It is worth noting that 1.14% of BuildBuddy builds failed.

Average time when GCP and BB builds failed:

              Minutes
GCP average   10.96
BB average    9.43

Conclusions:

The findings indicated that BuildBuddy delivered more consistent performance, thanks to its dedicated on-premises resources, as emphasised in the table “Average time when GCP and BB Successful”, with comparable volumes of builds on both platforms. Moreover, stability remained highly consistent, as evident from the table “Failed BB Build status / Successful GCP Build status” and from the absence of outliers.

Gerrit code review community decision

On February 27, 2024, the collected data was shared with the Gerrit code review open-source community. After careful consideration and thorough analysis, BuildBuddy was found to demonstrate remarkable stability. While it cannot be definitively stated that BuildBuddy surpasses GCP in all aspects, it notably outperforms GCP in terms of latency. Given its superior latency performance and strong stability, the decision was made to adopt BuildBuddy to replace GCP in the CI/CD pipeline.

Final migration phase

On March 29, 2024, the new Gerrit CI was established as the default CI using BuildBuddy RBE, and the following actions were taken:

  • Decommissioned the old Gerrit CI server.
  • Configured Gerrit CI to support both core and non-core plugin jobs, ensuring external visibility.
  • Unregistered the Gerrit verification RBE GCP Build/Tests on gerrit-review.googlesource.com.

Figure 9: Default Architecture diagram with BuildBuddy CI/CD flow as default CI/CD flow:

Final Conclusions

Following the completion of the migration, data on BuildBuddy RBE was collected from May 1, 2024, to June 24, 2024, to validate all assumptions. Subsequent statistical analysis yielded the following results:

Figure 10: Successful Builds:

Builds               465
Mean                 13.62 min
Median               10.47 min
Standard deviation   10.79 min (a high standard deviation indicates that build times are spread over a wide range, meaning there is a lot of variability in the times)
Q3                   15.23 min (75% of builds complete in less than 15.23 minutes)

Figure 11: Failed Builds:

Builds               105
Mean                 7.72 min
Median               6.22 min
Standard deviation   7.56 min
Q3                   7.7 min (75% of builds complete in less than 7.7 minutes)
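Summary statistics like those in the two tables above can be reproduced with Python’s standard `statistics` module. This is a sketch only, because the article does not say which tool produced its figures. The choice matters: `statistics.stdev` is the sample standard deviation and `statistics.quantiles` defaults to the "exclusive" method, so other tools may report slightly different standard deviation and Q3 values for the same data.

```python
from statistics import mean, median, quantiles, stdev
from typing import Dict, Sequence


def summarise(minutes: Sequence[float]) -> Dict[str, float]:
    """Mean, median, sample standard deviation and third quartile of build times."""
    _q1, _q2, q3 = quantiles(minutes, n=4)  # 'exclusive' method by default
    return {
        "builds": len(minutes),
        "mean": mean(minutes),
        "median": median(minutes),
        "stdev": stdev(minutes),
        "q3": q3,
    }


if __name__ == "__main__":
    # Hypothetical build times in minutes, not taken from the article.
    sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(summarise(sample))
```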

While we are satisfied with our current results, we recognize that the build times of successful builds still need improvement. Our next step will be to analyze all the build data provided by the BuildBuddy dashboard, including target-level metrics, timing, artifacts, cache and executions. This analysis will help us tune the Bazel configuration and improve build performance.

Figure 12: BuildBuddy dashboard


Alvaro Vilaplana-Garcia – Gerrit Code Review Contributor
Luca Milanesio – Gerrit Code Review Maintainer and Release Manager

GerritForge looks at a bright AI future in 2024

Looking back at 2023 in numbers

2023 has been an outstanding year for GerritForge and Gerrit Code Review, with excellent achievements against our 2023 goals.

The numbers show the GerritForge commitment throughout the past 12 months:

  • 853 changes merged (26% of the whole project contributions)
  • 47 projects, including Gerrit, JGit and major core and non-core plugins
  • 12 contributors
  • 4 maintainers, including the Gerrit Code Review release manager
  • 4 Gerrit community events, including the Gerrit User Summit 2023 and GerritMeets

Top #5 projects’ contributions

Throughout 2023, GerritForge confirmed its commitment to the Gerrit Code Review platform, helping deliver two major releases: Gerrit v3.8 and v3.9.

Most of the contributions focused on the plugins that extend the reach of the Gerrit platform, first and foremost pull-replication and multi-site. This is shown by the split of the 853 contributions across the projects, weighted by the number of changes and the average modifications per change.

  1. Pull replication plugin
    This is where GerritForge excelled, providing an unprecedented level of performance compared with anything built so far for Git replication in Gerrit. Roughly one-third of the Team’s efforts went into the pull-replication plugin, which over 2022/23 provided a 1000x speedup compared with Gerrit’s traditional replication. GerritForge has further improved its stability, resilience and self-healing capabilities thanks to a fully distributed and pluggable message broker system.
  2. Gerrit v3.8 and v3.9
    GerritForge helped release two major versions of Gerrit Code Review, contributing noteworthy features like Java 17 support, cross-plugin communication, importing of projects across instances and the migration to Bazel 7.
  3. Owners plugin
    Jacek has completely revamped the engine of the owners plugin, making it hundreds of times faster than the previous release and modernising it with submit requirements, without the need to write any Prolog rules.
  4. Multi-site plugin
    The whole team helped provide more stability and bug fixes across multiple versions of Gerrit, from v3.4 up to the latest v3.9.
  5. JGit
    GerritForge kept its promise to step up its efforts in getting important fixes merged. These included the optimisation of refs scanning in Git Protocol v2 and the fix for bitmap processing with concurrent incoming Git receive-pack operations, which we had promised at the beginning of 2023.

Migration of Eclipse JGit/EGit to GerritHub.io

2023 also saw a major improvement in GerritHub stability and availability, halving the total outage time over a 12-month period from 19 to 10 minutes, with a total uptime of 99.998% (source: Pingdom.com).

With the increased stability plus the new project import features available since v3.7, the Eclipse JGit and EGit projects decided on and completed their migration to GerritHub.io on the 21st of November, 2023. Since then, hundreds of changes have continued their reviews, and 62 of them have been merged on GerritHub.

The whole process was completed without any downtime, with only a short read-only window on Eclipse’s legacy instance, git.eclipse.org. This window was needed because of the lack of multi-site support on the Eclipse side.

What we did achieve from our goals of 2023

  • JGit changes: we merged 22 changes in 2023, most of them within our list of targets for the year. One change, related to the packed-refs loading optimisation, was abandoned because it did not get much traction from the rest of the community. The last major one left, the priority queue refactoring, is still in progress on stable-6.6. Also, thanks to the migration of JGit/EGit to GerritHub.io, David Ostrovsky obtained committer status and will now be able to provide more help in getting changes reviewed and merged.
  • JGit multi-pack index support: we did not have the bandwidth and focus to tackle this major improvement. The task is still open for anyone willing to help implement it.
  • Git repository optimiser: we kick-started the activity and researched the topic, with Ponch presenting the current status at the Gerrit User Summit 2023 in Sunnyvale CA.
  • Gerrit v3.8 and project-specific change numbers: the design document has been abandoned because of the need to rethink its end-to-end user goals. However, we found and fixed many use cases where Gerrit wasn’t using the project/change-number pair to identify changes, which is a prerequisite for implementing any future project-specific change number use case.
  • Gerrit Certified Binaries: the Platinum Enterprise Support for Gerrit has been enriched in 2023 with the certified binaries programme, with enhanced Gatling tests and E2E validation using AWS-Gerrit. Many bugs have been found and fixed in all the active versions of Gerrit; some of them were very critical and surprisingly undiscovered for months.
  • GerritForge Inc. revenue targets in the USA: the revenues increased by 50% in 2023, which was slightly below the initial expectations but still remarkable, despite the latest economic downturn of the past 12 months. 100% of the business has been transferred to the USA, including the GerritForge trademark and logo and we are now ready to start a new robust growth cycle in 2024 and beyond.

Looking at the future with AI in 2024

The economic news of the past 6 months has highlighted a difficult moment after the COVID-19 pandemic. The combination of the cost-of-living crisis, rising interest rates and two new major wars across the globe has pushed major tech companies to revise their short- to medium-term growth figures, resulting in several waves of layoffs in the tech sector and beyond.

Whilst the layoffs are not immediately related to a lack of profitability of the companies involved, they highlight that, in the medium term, there will be far fewer engineers looking after production systems across companies, including SCM.

SCM and Code Review are at the heart of the software lifecycle of tech companies and, therefore, represent the most critical part of the business that would need to be protected at all costs. GerritForge sees this change as a pivotal moment for stepping up its efforts in serving the community and helping companies to thrive with Gerrit and its Git SCM projects.

How do we maintain SCM stability with fewer people?

Gerrit Code Review has become more and more stable and reliable over the years, which should be reassuring for all the companies facing reduced staff and the challenge of keeping the SCM lights on. However, the major cause of disruption lies not in the SCM code but in its data.

The Git repositories and their status are nowadays responsible for 80% of the stability issues with Gerrit, and possibly with other Git servers as well. Imagine a system receiving a high rate of Git traffic (e.g. Git clones) of 100 operations per minute, which it copes with thanks to a well-optimised repository and bitmaps. However, things may change quickly. Some user actions (e.g. a force-push on a feature branch) could invalidate the effectiveness of the Git bitmap, and the server would start accumulating a backlog of traffic.

In a fully staffed team of SCM administrators and with all the necessary metrics and alerts in place, the above condition would trigger a specific alert that can be noticed, analysed, and actioned swiftly before anyone notices any service degradation.

However, when there is a shortage of Git SCM admins, the number of metrics and alerts to keep under control could be overwhelming, and the trade-offs could leave the system congestion classified as a lower-priority problem.

When system congestion lasts too long, the queue of incoming tasks could reach its limits, and users may start noticing issues. If the resource pools are too congested, the system could also enter a catastrophic failure loop. In that loop, the workload further reduces the fan-out of the execution pool and soon causes a global outage.

The above condition is only one example of what could happen to a Git SCM system. There are many variables to take into account to prevent a system from failing. The knowledge and experience of managing them are held by many of the engineers who are potentially being laid off, with potentially serious consequences for tech companies.
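To make the backlog scenario concrete, the toy simulation below uses the 100 operations per minute from the example above, together with a hypothetical service capacity and queue limit. Once capacity drops below the arrival rate (for instance, after the bitmap stops being effective), the backlog grows linearly until the queue is full, and further requests are rejected. The simulation deliberately does not model the feedback loop in which congestion reduces capacity further, which would only make things worse.

```python
from typing import List, Tuple


def simulate_backlog(arrivals_per_min: int, capacity_per_min: int,
                     queue_limit: int, minutes: int) -> Tuple[List[int], int]:
    """Return the queued requests at the end of each minute and the total rejected."""
    backlog = 0
    rejected = 0
    history = []
    for _ in range(minutes):
        backlog += arrivals_per_min                # new Git requests arrive
        backlog -= min(backlog, capacity_per_min)  # the server serves what it can
        if backlog > queue_limit:                  # a full queue starts rejecting work
            rejected += backlog - queue_limit
            backlog = queue_limit
        history.append(backlog)
    return history, rejected


if __name__ == "__main__":
    # Healthy: capacity above the arrival rate, so nothing ever queues.
    print(simulate_backlog(100, 120, queue_limit=500, minutes=30)[1])  # prints 0
    # Degraded: hypothetical capacity of 80 ops/min after bitmaps lose effectiveness.
    history, rejected = simulate_backlog(100, 80, queue_limit=500, minutes=30)
    print(history[9], rejected)  # prints 200 100
```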

GerritForge brings AI to the rescue of Git SCM stability

GerritForge has been active in the past 14 years in making the Git SCM system more suitable for enterprises from its very first inception: that’s the reason why this blog is named “GitEnterprise” after all.

Throughout 2022 and 2023, we have been investing in analysing, gathering and exporting all the Git repository metrics to the eyes and minds of SCM administrators, thanks to open-source products like the Git repo-metrics plugin. However, the recent economic downturn could leave all the knowledge and value of this data lost in a black hole if it is left in its current form.

When the work of analysing, monitoring and taking action on the data becomes too overwhelming for the size of the SCM Team left after the layoffs, there are other AI-based tools that can come to the rescue. However, none of them is available “out of the box” and their setup, maintenance and operation could also become an impediment.

GerritForge has long-standing know-how in knowledge-based systems and has been lecturing on data collection and analysis in the Gerrit Code Review community for many years, for example with the Gerrit DevOps Analytics initiative back in 2017. Now is the right time to push on these technologies and package them in a form that is directly usable by all the companies that need it now.

Introducing GHS – GerritForge-AI Health Service

As part of our 2024 goals, GerritForge will release a brand-new service called GHS, directly addressing the needs of all companies that need to have a fully automated intelligent system for collecting, analysing and acting on the Git repository metrics.

The high-level description of the service has already been anticipated at the Gerrit User Summit 2023 in Sunnyvale by Ponch and the first release of the product is due in Q1 of 2024.

How does GHS work?

GHS is a multi-stage system composed of four basic processes:

  1. Collect the metrics of your Gerrit or other Git repositories automatically and publish them on your registry of choice (e.g. Prometheus)
  2. Combine the repository metrics with the other metrics of the system, including the CPU, memory and system load, automatically.
  3. Detect dangerous situations where the repository or the system is starting to struggle and suggest a series of remediation policies, using the knowledge base and experience of GerritForge’s Team encoded as part of the AI engine.
  4. Define a direct remediation plan with suggested priorities and, if requested, act on them automatically, assessing the results.
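As an illustration of stage 1 only, the sketch below renders per-repository metrics in the Prometheus text exposition format. A Prometheus server could scrape this output and correlate it with system metrics such as CPU and load, as in stage 2. The metric names and the `repo` label are hypothetical: they are not the names used by GHS or by the Git repo-metrics plugin.

```python
from typing import Dict, Mapping

# Hypothetical metric names and help texts, for illustration only.
METRICS: Dict[str, str] = {
    "git_repo_loose_objects": "Number of loose objects in the repository",
    "git_repo_packfiles": "Number of packfiles in the repository",
}


def _escape_label(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(samples: Mapping[str, Mapping[str, float]]) -> str:
    """Render {repo: {metric: value}} in the Prometheus text exposition format."""
    lines = []
    for metric, help_text in METRICS.items():
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        for repo, values in sorted(samples.items()):
            lines.append(f'{metric}{{repo="{_escape_label(repo)}"}} {values[metric]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_prometheus({"gerrit": {"git_repo_loose_objects": 12, "git_repo_packfiles": 3}}), end="")
```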

Stage 4, the automatic execution of the suggested remediation, can also be performed in cooperation with the SCM Administrators’ Team, as it may need to go through company procedures such as the change-management process or communication with the business.

However, if needed, stage 4 can also be fully automated, allowing GHS to act unless the SCM admins provide negative feedback on the proposed actions.
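Stages 3 and 4 can be pictured with a deliberately simple, rule-based stand-in. This is not GHS’s AI engine, whose rules are not described here. The `RepoHealth` fields, thresholds and action names are all hypothetical. The sketch turns detected conditions into a prioritised remediation plan. It then applies the veto-based automation described above, running only the actions the SCM admins did not object to.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class RepoHealth:
    # Hypothetical health snapshot built from repository and system metrics.
    repo: str
    has_fresh_bitmap: bool  # e.g. False after a force-push invalidated it
    packfiles: int
    loose_objects: int


# Hypothetical rules: (priority, condition, suggested action); 1 is most urgent.
RULES: List[Tuple[int, Callable[[RepoHealth], bool], str]] = [
    (1, lambda h: not h.has_fresh_bitmap, "repack with bitmaps"),
    (2, lambda h: h.packfiles > 50, "consolidate packfiles"),
    (3, lambda h: h.loose_objects > 10_000, "pack loose objects"),
]


def remediation_plan(health: RepoHealth) -> List[str]:
    """Stage 3: detect problems and suggest actions, most urgent first."""
    hits = sorted((prio, action) for prio, check, action in RULES if check(health))
    return [action for _prio, action in hits]


def actions_to_run(plan: List[str], vetoed: Iterable[str], fully_automated: bool) -> List[str]:
    """Stage 4: act automatically only on actions the admins did not veto."""
    if not fully_automated:
        return []  # leave execution to the SCM admins and change management
    rejected = set(vetoed)
    return [action for action in plan if action not in rejected]


if __name__ == "__main__":
    plan = remediation_plan(RepoHealth("gerrit", has_fresh_bitmap=False, packfiles=60, loose_objects=100))
    print(plan)  # prints ['repack with bitmaps', 'consolidate packfiles']
    print(actions_to_run(plan, vetoed={"consolidate packfiles"}, fully_automated=True))  # prints ['repack with bitmaps']
```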

What are the benefits of GHS for the SCM Team?

GHS is the natural evolution of GerritForge’s services, which have historically been proactive in analysing Git SCM data and proposing action plans. GerritForge’s Health Check is a service that we have successfully provided to our customers for years. The GerritForge Health Service completes the end-to-end stability that SCM Teams need now more than ever to survive with a reduced workforce.

  • To the SCM Administrator, GHS provides metrics, analysis and tailored recommendations in real time.
  • To the Head of SCM and the Release Management Team, GHS gives the peace of mind of keeping the system stable with a reduced workforce.
  • To the SCM users and developers, GHS provides a stable and responsive system throughout the day, without slowdowns or outages.
  • To the Head of IT, GHS makes it possible to meet the company’s cost and headcount reduction targets without sacrificing the overall productivity of the Teams involved.

GerritForge’s pledges to Gerrit Code Review quality and Open-Source

GerritForge has provided Enterprise Support and free contributions to Gerrit Code Review for 14 years, almost since the beginning of the project. We have pledged in the past to always be 100% Open-Source, and we keep our promises.

For 2024, GerritForge will focus on delivering its promised Open-Source contributions to the stability and reliability of Gerrit Code Review, with:

  • Support for the Gerrit Code Review platform releases, Gerrit v3.10 and v3.11
  • Free support and development of the Gerrit CI validation process, in collaboration with all the other Gerrit Code Review contributors and maintainers
  • Free Open-Source fixes for all critical problems raised by any of its Enterprise Support Customers, available to everyone in the Gerrit Code Review community
  • A free Open-Source code base for the four main components of the new GHS product, following the Open-Core methodology for developing the service

As for the initiatives we started in the past few years, including pull-replication and multi-site, we believe they have reached a level of maturity that does not require major refactoring or extensions in 2024. We will continue to support and improve them over the years, based on the feedback and support requests coming from the Enterprise Support Customers and the wider Gerrit Open-Source community.

GHS AI engine and dogfooding on GerritHub.io

GHS will have a rule-based AI system that drives all the main decisions on selecting and prioritising corrective actions on the system. As with all AI systems, the engine will need to start from a baseline of knowledge and intelligence and evolve based on the experience gathered on real-life systems.

GerritForge’s commitment to quality rests on the principle of dogfooding: we use the systems we develop every single day and learn from them. This paradigm underpins our 14 years of success, and we are committed to applying it to the development of GHS as well.

GerritForge has been hosting GerritHub.io since 2013, and tens of thousands of people and hundreds of companies use it for their private and Open-Source projects every single day. The system is fully self-service; however, it still relies on manual maintenance by our Gerrit and Git SCM admins.

We are committed to using GHS on GerritHub.io from day 1 and to using the system’s metrics and learnings to continuously improve its AI rule engine. All customers of GerritForge’s GHS service will therefore benefit from the historical knowledge and experience gained from the learnings and optimisations made on GerritHub.io in the months and years to come.

GHS = human Git SCM admins and AI robots working together

GHS will revolutionise the way Git SCM admins manage their systems today: they will no longer be alone, juggling a series of tools to understand what’s going on, but will have an intelligent, expert robot at their service every single day, driven by the wisdom and continuous learnings of GerritForge.

We see a different way of working ahead of us: we are embracing this radical change in how people and companies work, and building GHS to serve them effectively and in line with our Open-Source pledges.

The future is bright with GerritForge-AI Health Service, Git and Gerrit Code Review at your service!


Luca Milanesio
GerritForge CEO
Gerrit Code Review Release Manager and member of the Engineering Steering Committee