Accelerate with Gerrit DevOps Analytics, in one click!

Featured

 

Accelerating your time to market while delivering high-quality products is vital for any company of any size. This fast pacing and always evolving world relies on getting quicker and better in the production pipeline of the products. The whole DevOps and Lean methodologies help to achieve the speed and quality needed by continuously improving the process in a so-called feedback loop. The faster the cycle, the quicker is the ability to achieve the competitive advantage to outperform and beat the competition.

It is fundamental to have a scientific approach and put metrics in place to measure and monitor the progress of the different actors in the whole software lifecycle and delivery pipeline.

Gerrit DevOps Analytics (GDA) to the rescue

We need data to build metrics to design our continuous improvement lifecycle around it. We need to juice information from all the components we use, directly or indirectly, on a daily basis:

  • SCM/VCS (Source and Configuration Management, Version Control System)
    how many commits are going through the pipeline?
  • Code Review
    what’s the lead time for a piece of code to get validated?
    How are people interacting and cooperating around the code?
  • Issue tracker (e.g. Jira)
    how long does it take the end-to-end lifecycle outside the development, from idea to production?

Getting logs from these sources and understanding what they are telling us is fundamental to anticipate delays in deliveries, evaluate the risk of a product release and make changes in the organization to accelerate the teams’ productivity. That is not an easy task.

Gerrit DevOps Analytics (aka GDA) is an OpenSource solution for collecting data, aggregating them based on different dimensions and expose meaningful metrics in a timely fashion.

GDA is part of the Gerrit Code Review ecosystem and has been presented during the last Gerrit User Summit 2018 at Cloudera HQ in Palo Alto. However, GDA is not limited to Gerrit and is aiming at integrating and processing any information coming from other version control and code-review systems, including GitLab, GitHub and BitBucket.

Case study: GDA applied to the Gerrit Code Review project

One of the golden rules of Lean and DevOps is continuous improvement: “eating your dog food” is the perfect way to measure the progress of the solution by using its outcome in our daily life of developing GDA.

As part of the Gerrit project, I have been working with GerritForge to create Open Source tools to develop the GDA dashboards. These are based on events coming from Gerrit and Git, but we also extract data coming from the CI system, the Issue tracker. These tools include the ETL, for the data extraction and the presentation of the data.

As you will see in the examples Gerrit is not just the code review tool itself, but also its plugins ecosystem, hence you might want to include them as well into any collection and processing of analytics data.

Wanna try GDA? You are just one click away.

We made the GDA more accessible to everybody, so more people can play with it and understand its potentials. We create the Gerrit Analytics Wizard plugin so you can have some insights in your data with just one click.

What you can do

With the Gerrit Analytics Wizard you can get started quickly and with only one click you can get:

  • Initial setup with an Analytics playground with some defaults charts
  • Populate the Dashboard with data coming from one or more projects of your choice

The full GDA experience

When using the full GDA experience, you have the full control of your data:

  • Schedule recurring data imports. It is just meant to run a one-off import of the data
  • Create a production ready environment. It is meant to build a playground to explore the potentials of GDA

What components are needed?

To run the Gerrit Analytics Wizard you need:

You can find here more detailed information about the installation.

One click to crunch loads of data

Once you have Gerrit and the GDA Analytics and Wizard plugins installed, chose the top menu item Analytics Wizard > Configure Dashboard.

You land on the Analytics Wizard and can configure the following parameters:

  • Dashboard name (mandatory): name of the dashboard to create
  • Projects prefix (optional): prefix of the projects to import, i.e.: “gerrit” will match all the projects that are starting with the prefix “gerrit”. NOTE: The prefix does not support wildcards or regular expressions.
  • Date time-frame (optional): date and time interval of the data to import. If not specified the whole history will be imported without restrictions of date or time.
  • Username/Password (optional): credentials for Gerrit API, if basic auth is needed to access the project’s data.

Sample dashboard analytics wizard page:

wizard.pngOnce you are done with the configuration, press the “Create Dashboard” button and wait for the Dashboard, tailored to your data, to be created (beware this operation will take a while since it requires to download several Docker images and run an ETL job to collect and aggregate the data).

At the end of the data crunching you will be presented with a Dashboard with some initial Analytics graphs like the one below:

dashboard-e1549490575330.png

You can now navigate among the different charts from different dimensions, through time, projects, people and Teams, uncovering the potentials of your data thanks to GDA!

What has just happened behind the scenes?

When you press the “Create Dashboard” button, loads of magic happens behind the scenes. Several Docker images will be downloaded to run an ElasticSearch and Kibana instance locally, to set up the Dashboard and run the ETL job to import the data. Here a sequence workflow to illustrate the chain of events is happening:

components.png

Conclusion

Getting insights into your data is so important and has never been so simple. GDA is an OpenSource and SaaS (Software as a Service) solution designed, implemented and operated by GerritForge. GDA allows setting up the extraction flows and gives you the “out-of-the-box” solution for accelerating your company’s business right now.

Contact us if you need any help with setting up a Data Analytics pipeline or if you have any feedback about Gerrit DevOps Analytics.

Fabio Ponciroli – Gerrit Code Review Contributor – GerritForge Ltd.

How to migrate Gerrit from v2.15 to v2.16

Time has come to migrate gerrithub.io to the latest Gerrit v2.16, from the outdated v2.15 we had so far. The big change between the two is the full adoption of NoteDB: the internal Gerrit groups were still kept in ReviewDb on v2.15, which forced us to keep a PostgreSQL instance active in production. This means we can finally say goodbye to the ReviewDb 👋 and eliminated yet another SPoF (Single-Point-of-Failure) from the GerritHub high-availability infrastructure.

Migrating to Gerrit v2.16 implies:

  1. Gerrit WAR upgrade
  2. GIT repos upgrade because of a change in the NoteDb format
  3. Change in the database used, from PostgreSQL to H2 (for the schema_version)
  4. Introduction of the new Projects index

The above is a quite complex process and, here at GerritForge, we executed the migration on a running GerritHub.io with 15k of active users avoiding any downtime during the migration.

Architecture

This is the initial architecture we are starting the GerritHub.io v2.15 migration from:

Initial status - 15_01.png

In this setup, we have 2 sites, one in Canada (active) and one in Germany (active for analytics and disaster recovery). The latter is aligned with the active master via replication plugin.

The HA Plugin used between the 2 Canadian nodes is a GerritForge fork enhanced with the ability to align the Lucene Indexes, Caches and Events when sharing repositories via NFS with caching enabled.

NOTE: The original High-Availability plugin is certified and tested on Gerrit v2.14 / ReviewDb only and requires the use of NFS without caching, which requires a direct fiber-channel connection between the Gerrit nodes the disks.

The traffic is routed with HAProxy to the active node. This allows us easy code migrations with no downtimes, using what we call the “ping-pong” technique between the Canadian and the German site, which is inspired by the classical Blue/Green deployment with some adjustments for the peculiarities of the Gerrit multi-site setup.

The migration pattern, in a nutshell, is composed of the following phases:

  1. Upgrade code in Germany
    The Gerrit site in Germany is used for Analytics and thus can be upgraded first with low risk associated.
    German site -> passive, Canadian site -> active
     
  2. Redirect traffic in Germany
    Once the site in Germany is ready and warmed up, the GerritHub users are redirected to it. GerritHub is technically serving the v2.16 user-experience to all users.
    German site -> active, Canadian site -> passive
     
  3. Upgrade code in Canada
    The site in Canada is put offline and upgraded as well.
    German site -> active, Canadian site -> passive
     
  4. Redirect traffic back to Canada
    Once the site in Canada is fully ready and warmed up, the entire user-base is redirected back.
    German site -> passive, Canadian site -> active

Each HAProxy has the same configuration with a primary and 2 backups as follow:

HAProxy CA Primary.png

Timeline of events – 2nd of Jan 2019

2/1/2019 – 8:00 GMT: Starting point of the GerritHub configuration

  • Review-1 – Gerrit 2.15 – active node
  • Review-2 – Gerrit 2.15 –  failover node
  • Review-DE – Gerrit 2.15 – analytics node, used for disaster recovery

2/1/2019 – 10:10 GMT: Upgrade disaster recovery server

  • Stopped all services using Gerrit on review-de (we use the disaster recovery to crunch and serve the analytics dashboard)
  • Disabled replication plugin
  • Stopped Gerrit 2.15 and upgraded to Gerrit 2.16
  • Restarted Gerrit

2/1/2019 – 10:44 GMT: Re-enabled disaster recovery server 

  • Re-Enabled replication from review 1…boom!
    • First issue: mirror option of the replication plugin was set to true, hence all the branches containing the groups on the All-Users repo been dropped from the recovery server. All the Groups were suddenly gone from the disaster recovery server
  • Remove mirror option in replication plugin
  • Re-Enabled replication from review-1…this time everything was ok!
  • Migration re-executed and everything was fine

2/1/2019 – 11:00 GMT: Removed ReviewDB

  • Once we were happy with the replication of the Groups we could remove PostgreSQL

The only information left outside NoteDB is the schema_version table, which contains only one row and it is static. We moved it into H2 by copying the DB from a vanilla 2.16 installation and changing Gerrit Config to use it.

DE 2.16 - 15_01.png

Before the next step, we had to wait for the online reindexing on review-de to finish (~2 hours).

Note: we didn’t consider offline reindexing since it is basically sequential, and it would have been way slower compared to the concurrent online one. Additionally, it does not compute all the Prolog rules in full.

2/1/2019 – 15:15 GMT: Reduce delta between masters

  • Reducing the delta of data between the 2 sites (Canada and Germany) will allow having a shorter read-only window when upgrading the currently active master
  • Manually replicate and reindex misaligned repositories on review-de (see below the effect on the system load)

Screenshot 2019-01-14 at 20.33.10.png

Screenshot 2019-01-14 at 20.33.23.png

  • Pro tip: if you want to check queue status to see, for example, if the replication is still ongoing this command can be used:

    ssh -p 29419 <gerrit_admin_user>@localhost \
                 gerrit show-queue --by-queue --wide

2/1/2019 – 15:50 GMT: Final master catchup

  • Switched on read-only plugin on the active master
  • Service degraded for few minutes (i.e.: Gerrit was read-only), but most of the operations were available, i.e.: Gerrit index/query/plugin/version, git-upload-pack, replication
  • Waited for review-de to catch up with the latest changes that come in review-1 (we monitored it using the above “gerrit show-queue” command)

CA Readonly - 15_01.png

2/1/2019 – 15:54 GMT: Made disaster recovery active

  • Changed HAProxy configuration, and reloaded, to re-direct all the traffic to review-de, which become the active node in the cluster

HAProxy-DE-primary-transition.png

  • See the transition of the traffic to review-de

Screenshot 2019-01-14 at 20.39.22.png

  • Left review-de the whole night as the primary site. This way we also tested the disaster recovery site stability

DE Active - 15_01.png

2/1/2019 – 19:47 GMT: Upgrade review-1 and review-2 to Gerrit 2.16

  • Stopped Gerrit 2.15 and upgraded to Gerrit 2.16
  • Wait for offline reindexing of Projects, Accounts and Groups
  • Started with Gerrit 2.16 with online reindexing of the changesCA 2.16 - 15_01.png

It was possible to see an expected increase in the system load due to the reindexing, lasted for about 2 hours:

System load.png

Furthermore, despite review-1 not being the active node, the HTTP workload grew disproportionately:

HTTP requests.png

This was due to a well-known issue of the high-availability plugin, where the reindexing are forwarded to the passive nodes, creating an excessive workload on them.

3/1/2019 – 10:14 GMT: Made review 1 active

  • We used the same pattern used when upgrading review-de to align the data between masters
  • Changed HAProxy configuration, and reloaded, to re-direct back all the traffic to review-1

 

Final - 15_01.png

Conclusions

Migration was completed and production is back to stable again with the latest and greatest Gerrit v2.16.2 and the full PolyGerrit UI. With the migration of the Groups in NoteDB, ReviewDB leaves the stage completely to NoteDB. PostgreSQL is no more needed, simplifying the overall architecture.

The migration itself was quite smooth, the only issue was due to a plugin misconfiguration, nothing to have with Gerrit core. With the good monitoring we have in place, we managed to spot the issues straight away. Still, we will further automate our release process to avoid these issues from happening again.

Fabio Ponciroli (aka Ponch) – Gerrit Code Review Contributor – GerritForge