How to migrate Gerrit from v2.15 to v2.16

The time has come to migrate gerrithub.io from the outdated v2.15 we had so far to the latest Gerrit v2.16. The big change between the two is the full adoption of NoteDb: on v2.15 the internal Gerrit groups were still kept in ReviewDb, which forced us to keep a PostgreSQL instance active in production. With v2.16 we can finally say goodbye to ReviewDb 👋 and eliminate yet another SPoF (Single-Point-of-Failure) from the GerritHub high-availability infrastructure.

Migrating to Gerrit v2.16 implies:

  1. Gerrit WAR upgrade
  2. Git repositories upgrade because of a change in the NoteDb format
  3. Change in the database used, from PostgreSQL to H2 (for the schema_version)
  4. Introduction of the new Projects index

The above is quite a complex process and, here at GerritForge, we executed it on a running GerritHub.io with 15k active users, without any downtime.

Architecture

This is the initial architecture we are starting the GerritHub.io v2.15 migration from:

Initial status - 15_01.png

In this setup, we have 2 sites, one in Canada (active) and one in Germany (active for analytics and disaster recovery). The latter is kept aligned with the active master via the replication plugin.

The HA Plugin used between the 2 Canadian nodes is a GerritForge fork enhanced with the ability to align the Lucene Indexes, Caches and Events when sharing repositories via NFS with caching enabled.

NOTE: The original High-Availability plugin is certified and tested on Gerrit v2.14 / ReviewDb only and requires the use of NFS without caching, which in turn requires a direct fiber-channel connection between the Gerrit nodes and the disks.

The traffic is routed with HAProxy to the active node. This allows us easy code migrations with no downtime, using what we call the “ping-pong” technique between the Canadian and the German site, which is inspired by the classical Blue/Green deployment with some adjustments for the peculiarities of the Gerrit multi-site setup.

The migration pattern, in a nutshell, is composed of the following phases:

  1. Upgrade code in Germany
    The Gerrit site in Germany is used for Analytics and thus can be upgraded first with little risk.
    German site -> passive, Canadian site -> active
     
  2. Redirect traffic to Germany
    Once the site in Germany is ready and warmed up, the GerritHub users are redirected to it. From this point on, GerritHub is serving the v2.16 user-experience to all users.
    German site -> active, Canadian site -> passive
     
  3. Upgrade code in Canada
    The site in Canada is put offline and upgraded as well.
    German site -> active, Canadian site -> passive
     
  4. Redirect traffic back to Canada
    Once the site in Canada is fully ready and warmed up, the entire user-base is redirected back.
    German site -> passive, Canadian site -> active

Each HAProxy has the same configuration with a primary and 2 backups as follows:

HAProxy CA Primary.png
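
A primary-plus-two-backups layout like the one pictured could be expressed roughly as the fragment below. This is a minimal sketch rather than the actual GerritHub configuration: the backend name, host names and ports are hypothetical, and only the HTTP side is shown.

```
# Hypothetical HAProxy backend: names, hosts and ports are assumptions for illustration
backend gerrit_http
    mode http
    # Primary: receives all the traffic while its health check passes
    server review-1  review-1.example.com:8080  check
    # Backups: used only when no non-backup server is available
    server review-2  review-2.example.com:8080  check backup
    server review-de review-de.example.com:8080 check backup
```

With several backup servers, HAProxy by default sends traffic to the first available one in declaration order, so listing the local failover node before the remote site would make it the preferred fallback.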

Timeline of events – 2nd of Jan 2019

2/1/2019 – 8:00 GMT: Starting point of the GerritHub configuration

  • Review-1 – Gerrit 2.15 – active node
  • Review-2 – Gerrit 2.15 – failover node
  • Review-DE – Gerrit 2.15 – analytics node, used for disaster recovery

2/1/2019 – 10:10 GMT: Upgrade disaster recovery server

  • Stopped all services using Gerrit on review-de (we use the disaster recovery to crunch and serve the analytics dashboard)
  • Disabled replication plugin
  • Stopped Gerrit 2.15 and upgraded to Gerrit 2.16
  • Restarted Gerrit

2/1/2019 – 10:44 GMT: Re-enabled disaster recovery server

  • Re-enabled replication from review-1…boom!
    • First issue: the mirror option of the replication plugin was set to true, hence all the branches containing the groups in the All-Users repo were dropped from the disaster recovery server: all the groups were suddenly gone
  • Removed the mirror option from the replication plugin configuration
  • Re-enabled replication from review-1…this time everything was OK!
  • Re-executed the migration and everything was fine
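
For reference, the mirror behaviour is controlled per remote in replication.config. The fragment below is a sketch of the corrected setting on the active master; the remote name, user, host and path are hypothetical. With mirror set to true the plugin removes remote branches that are absent locally, which is presumably why the freshly migrated group branches on review-de, unknown to the v2.15 master, were deleted.

```
# etc/replication.config on the active master (sketch; remote name and URL are hypothetical)
[remote "review-de"]
  url = gerrit@review-de.example.com:/path/to/git/${name}.git
  # false is the plugin default: refs missing on the source are left alone on the target
  mirror = false
```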

2/1/2019 – 11:00 GMT: Removed ReviewDb

  • Once we were happy with the replication of the Groups we could remove PostgreSQL

The only information left outside NoteDb is the schema_version table, which contains a single, static row. We moved it into H2 by copying the DB from a vanilla 2.16 installation and changing the Gerrit configuration to use it.
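
The configuration change could look roughly like the fragment below. It is a sketch under assumptions: the database path is hypothetical and should point to wherever the H2 files copied from the vanilla installation were placed.

```
# etc/gerrit.config (sketch): point Gerrit at a local H2 database instead of PostgreSQL
[database]
  type = h2
  # Hypothetical location; a relative path is resolved against the Gerrit site directory
  database = db/ReviewDB
```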

DE 2.16 - 15_01.png

Before the next step, we had to wait for the online reindexing on review-de to finish (~2 hours).

Note: we didn’t consider offline reindexing since it is basically sequential, and it would have been way slower compared to the concurrent online one. Additionally, it does not compute all the Prolog rules in full.

2/1/2019 – 15:15 GMT: Reduce delta between masters

  • Reducing the delta of data between the 2 sites (Canada and Germany) allows a shorter read-only window when upgrading the currently active master
  • Manually replicated and reindexed the misaligned repositories on review-de (the effect on the system load is shown below)

Screenshot 2019-01-14 at 20.33.10.png

Screenshot 2019-01-14 at 20.33.23.png

  • Pro tip: if you want to check queue status to see, for example, if the replication is still ongoing this command can be used:

    ssh -p 29419 <gerrit_admin_user>@localhost \
                 gerrit show-queue --by-queue --wide
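
The manual replication of individual repositories mentioned above can be triggered through the replication plugin's SSH command. The helper below is a sketch that only prints the commands (a dry run) so they can be reviewed first; the function name and the repository names in the usage example are hypothetical, and the port simply mirrors the one used in the command above.

```shell
# Print one "replication start" command per repository (dry run, nothing is executed).
# Usage: replication_cmds <gerrit_admin_user> <repo>...
replication_cmds() {
  admin="$1"
  shift
  for repo in "$@"; do
    # --wait blocks until the replication of the matching project has completed
    echo "ssh -p 29419 ${admin}@localhost replication start --wait ${repo}"
  done
}

# Example with hypothetical repository names:
#   replication_cmds admin org/repo-a org/repo-b
```

Printing the commands instead of running them keeps the sketch safe to try on any machine; piping the output to sh would execute them.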

2/1/2019 – 15:50 GMT: Final master catchup

  • Switched on the read-only plugin on the active master
  • Service was degraded for a few minutes (i.e. Gerrit was read-only), but most operations remained available, namely Gerrit index/query/plugin/version, git-upload-pack, replication
  • Waited for review-de to catch up with the latest changes that came into review-1 (we monitored it using the above “gerrit show-queue” command)
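
Waiting for the catch-up can be scripted as a simple polling loop. The sketch below is a generic helper, not the procedure used here: it retries any command until it succeeds, and the queue check shown in the comments is a hypothetical predicate whose grep pattern is an assumption about the show-queue output.

```shell
# Poll a command until it succeeds or the attempts run out.
# Usage: wait_until <sleep_seconds> <max_attempts> <command> [args...]
wait_until() {
  interval="$1"
  attempts="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1
}

# Hypothetical check: succeeds once show-queue no longer lists tasks for review-de.
# The grep pattern is an assumption; adjust it to what your instance prints.
#   queue_drained() {
#     ! ssh -p 29419 admin@localhost gerrit show-queue --by-queue --wide | grep -q 'review-de'
#   }
#   wait_until 10 60 queue_drained
```

Bounding the number of attempts keeps the read-only window from silently growing if replication gets stuck.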

CA Readonly - 15_01.png

2/1/2019 – 15:54 GMT: Made disaster recovery active

  • Changed the HAProxy configuration and reloaded it to redirect all the traffic to review-de, which became the active node in the cluster

HAProxy-DE-primary-transition.png
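
One way to script such a switch, not necessarily the one used here, is to move the backup flag between the server lines. The sketch below assumes each node is declared on a single "server" line with "backup" as the last keyword; the function name and the addresses in the example are hypothetical.

```shell
# Read an HAProxy backend on stdin and print it with <primary> as the only
# non-backup server. Assumes "backup" is the last keyword on each server line.
# Usage: set_primary <server_name> < haproxy.cfg
set_primary() {
  primary="$1"
  awk -v primary="$primary" '
    $1 == "server" {
      sub(/[ \t]+backup[ \t]*$/, "")        # drop an existing trailing backup flag
      if ($2 != primary) $0 = $0 " backup"  # every other node becomes a backup
    }
    { print }
  '
}

# Example with hypothetical addresses:
#   printf "server review-1 10.0.0.1:8080 check\nserver review-de 10.0.0.3:8080 check backup\n" \
#     | set_primary review-de
```

The generated file can be validated with haproxy -c -f <file> before reloading the service.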

  • See the transition of the traffic to review-de

Screenshot 2019-01-14 at 20.39.22.png

  • Left review-de the whole night as the primary site. This way we also tested the disaster recovery site stability

DE Active - 15_01.png

2/1/2019 – 19:47 GMT: Upgrade review-1 and review-2 to Gerrit 2.16

  • Stopped Gerrit 2.15 and upgraded to Gerrit 2.16
  • Waited for offline reindexing of Projects, Accounts and Groups
  • Started Gerrit 2.16 with online reindexing of the changes

CA 2.16 - 15_01.png
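
The offline part of these steps can be sketched with the standard gerrit.war subcommands. The helper below only prints the commands for review; the function name, WAR file name and site path are hypothetical, and the explicit reindex call is an assumption about how the three small indexes could be rebuilt before starting the daemon.

```shell
# Print the offline upgrade commands for a Gerrit site (dry run, nothing is executed).
# Usage: upgrade_cmds <path/to/gerrit.war> <site_dir>
upgrade_cmds() {
  war="$1"
  site="$2"
  # init with the new WAR upgrades the site in non-interactive (batch) mode
  echo "java -jar ${war} init -d ${site} --batch"
  # rebuild only the small indexes offline; changes are left to online reindexing
  echo "java -jar ${war} reindex -d ${site} --index projects --index accounts --index groups"
}

# Example with hypothetical paths:
#   upgrade_cmds gerrit-2.16.war /opt/gerrit
```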

As expected, the reindexing increased the system load, which lasted for about 2 hours:

System load.png

Furthermore, despite review-1 not being the active node, the HTTP workload grew disproportionately:

HTTP requests.png

This was due to a well-known issue of the high-availability plugin, where reindexing events are forwarded to the passive nodes, creating an excessive workload on them.

3/1/2019 – 10:14 GMT: Made review-1 active

  • We used the same pattern used when upgrading review-de to align the data between masters
  • Changed the HAProxy configuration and reloaded it to redirect all the traffic back to review-1

 

Final - 15_01.png

Conclusions

The migration was completed and production is stable again on the latest and greatest Gerrit v2.16.2 with the full PolyGerrit UI. With the migration of the groups to NoteDb, ReviewDb leaves the stage completely. PostgreSQL is no longer needed, which simplifies the overall architecture.

The migration itself was quite smooth: the only issue was due to a plugin misconfiguration and had nothing to do with Gerrit core. With the good monitoring we have in place, we managed to spot the issues straight away. Still, we will further automate our release process to prevent these issues from happening again.

Fabio Ponciroli (aka Ponch) – Gerrit Code Review Contributor – GerritForge

New year, free GerritHub: unlimited private reviews with anyone, forever

Today GitHub has announced the extension of its free plan to include unlimited private repositories. This is great because it allows a lot more people to start experimenting with their side projects and keep them confidential until they are ready to be shared publicly.

GerritHub.io extends this amazing offer with a fully-featured code review process on top of your private GitHub repositories, while keeping the confidentiality needed for early-stage projects. Unlike GitHub, however, GerritHub allows you to have an unlimited number of reviewers and collaborators, for free, forever.

A wonderful new 2019 is starting with two amazing free offers to allow everyone to experiment and unleash their potential:

  • Free unlimited repos from GitHub, limited to 3 collaborators
  • Free unlimited repos from GerritHub, with unlimited collaborators for reviews

That’s super-cool, how do I start?

Getting started with your private GitHub repositories on GerritHub is easy:

  1. Go to https://review.gerrithub.io
  2. Click the top-right “Sign-in” link
  3. Select “Private” option and click the top-right “Login” button
  4. Enter your GitHub credentials
  5. Allow GerritHub read/write access to your private repositories
  6. Select the GitHub SSH keys and profile to import into Gerrit, and click the top-right “Next” button
  7. Select the organization and repositories to import into GerritHub, and click the top-right “Import” button
  8. Select the GitHub PRs you want to import into GerritHub for review, and click the top-right “Import Selected” button

Once you’re done with the above steps, you’re up-and-running with GerritHub and you are free to invite collaborators and accept reviews.

You can follow the GerritHub video on YouTube which describes the above process.

I am new to Gerrit Code Review, where do I start?

There is plenty of information on the web about Gerrit Code Review. The best place to start is the project’s tutorial in the documentation.

Alternatively, you can watch the presentation by Shawn Pearce, the Gerrit Code Review project’s founder.

 

Have questions? Get in touch with the Community.

In case of issues or questions, you can get in touch with the Gerrit Code Review Community, and they will be happy to guide you through and provide support.

Want to use Gerrit in your Enterprise?

If you decide to use Gerrit Code Review in your Enterprise and you need a service level compliant with your company's standards, you can get in touch with GerritForge, which offers the full coverage of the Enterprise Support you will need:

  • Silver: 8×5 Support, with 24h turnaround for P1 issues
  • Gold: 24×7 Support, with 8h turnaround for P1 issues
  • Platinum: 24x7x365 Support, with 4h turnaround for P1 issues

What’s next?

With GitHub and GerritHub you have no more excuses not to start innovating right now, with free unlimited repositories and free unlimited Gerrit reviewers and contributors.

Go and innovate, the future is now.