GerritHub and Zero-Downtime Upgrade

GerritHub gets bigger on Mon, 21 March 08:00 GMT


GerritHub has experienced unprecedented growth over the past two years. The November 2015 numbers presented at the Google User Summit in Mountain View – CA have been surpassed again, and we do need to make sure that our infrastructure is still capable of dealing with current and future users’ needs.

What is changing in GerritHub.io?

We are changing everything, from the version of Gerrit to the hardware, network and storage infrastructure. Data, DBMS, Indexes and cache, need to be upgraded and refreshed to make sure that the new systems are reflecting exacting the current production data and sessions.
We are changing as well the geo-location of our servers, from the current server farm in Germany (Bayern, Nuremberg – 100 MBps) to a new server farm in Canada (Quebec, Beauharnois – 1 GBps).

Why have so many changes?

We started to measure some significant delay in the Git and review operations on the old infrastructure, mainly due to three factors:

  1. More users, more repositories, more concurrency. Individuals, OpenSource projects and Businesses started using GerritHub.io for their mission-critical repositories, considering Gerrit the “source of truth” of their review workflow. We needed more horsepower, memory, storage and ability to scale even further.
  2. Bandwidth from USA and Far-east. The majority of people using GerritHub.io are from the other side of the Atlantic Ocean: this is typically not a problem from 7 AM to 3 PM … but after 4 PM the connectivity between Europe and the Americas becomes slow. Additionally, people using GerritHub.io from India, Japan, Australia and New Zealand experienced terrible slowdowns because of the excessive number of hops to reach Germany.
  3. Gerrit master is much faster. Based on the current data and metrics measured on GerritHub.io, we have contributed a lot of patches to reduce the overhead caused by Gerrit DB and lessen the number of SQL queries per minute. All those new improvements are on Gerrit master, and we need to catch-up with the “latest and greatest” version.

Will I experience any GerritHub.io outage?

Last time that GitHub needed to make a major upgrade, asked his 5M users to stop working for 23 minutes,. This translates to a loss of two millions of hours of continuous delivery lifecycle, equivalent to over 130 man/years, worth no less than eight millions dollars.
We are going to adopt a new Zero-Downtime Gerrit roll-out strategy to make sure that all those changes are not going to impact your day-by-day activity. If you were not reading this post you would possibly even not notice the “switch” from the old to the new infrastructure, apart from the increase in speed and bandwidth.

Zero-downtime GerritHub.io migration, step by step with the associated expected timings.

Phase 0 – Replication to the new Gerrit infrastructure. (- 1 month ago)
We started migrating everything one month ago, and the old and new infrastructure are working side-by-side, thanks to Gerrit master-slave replication. The new Gerrit servers are active as slaves and are read-only.

Phase 1 – Migration kick-off. (08:00 GMT)
We install a Gerrit plugin that rejects all the push to GerritHub.io repositories providing a courtesy message: “Gerrit is under maintenance, all projects are READ ONLY”. All the HTTP POST, PUT and DELETE are disable on the Gerrit REST-API.

Phase 2 – Wait for replication events to complete and migrate DB. (08:02 GMT)
Git repositories are continuously replicated, but we do need to make sure that the event queue is empty. Once that happens we schedule the last final DB migration to the new infrastructure.

Phase 3 – Gerrit DB upgrade and reindex (08:04 GMT)
New Gerrit server executes the final upgrade and off-line reindex of the latest received changes.

Phase 4 – Gerrit start-up and cache warm-up (08:05 GMT)
New Gerrit is restarted and the most critical Gerrit caches (projects, accounts and groups) are pre-loaded in memory. This allows the incoming traffic spike to avoid the collapse of used threads and makes the transition as smooth as possible without slowdowns.

Phase 5 – Traffic switch and DNS updates (08:06 GMT)
GerritHub.io redirects all incoming HTTPS and SSH traffic to the new infrastructure. Git pushes and HTTP PUT, POST and DELETE operations of the REST API are operational again and served by the new Gerrit infrastructure. GerritHub.io DNS is updated to the new Canadian IPs.

Phase 6 – New IPs gets propagate to all worldwide DNSs (+ 1 day)
Once all the DNSs in the world would have been updated, everyone will start going directly to the new infrastructure without further hops or redirection from Germany. Customers from USA, Canada, South America, Asia, Japan, New Zealand and Australia should see a significant reduction of the network latency and increase of GerritHub.io responsiveness.

Firewall and SSH considerations

Even if Gerrit server’s SSH key is not changing, some of you may see a warning similar to this when they push or pull over SSH:

Warning: the RSA host key for ‘review.gerrithub.io’ differs from the key for the IP address ‘148.251.77.70’

The warning message will also tell you which lines in your ~/.ssh/known_hosts need to change. Open that file in your favorite editor, remove or comment out those lines, then retry your push or pull.

Should your network have some strict firewall rules to access external sites, you may want to whitelist the IP of the new infrastructure WLB to: 192.99.233.76.

Follow GerritHub.io migration progress.

We will advertise the migration progress on Twitter at @GitEnterprise. Should you have any issue you can tweet us or contact GerritForge Customer Support at support@gerritforge.com.

Advertisements

Gerrit User Summit & Hackathon 2015

Gerrit User Summit at Google Mountain View – CA

Gerrit user Summit 2015

Exciting times this year at the Gerrit User Summit and Hackathon 2015: the major contributors and players of the Gerrit community shared experiencing, opinions and news in an intense 7-days event in Google – Mountain View – CA.

The User Summit

The first two days have seen the User Summit at the centre of the stage: scalability, scalability and scalability have been the Leitmotif of the discussion.
GerritHub.io started the saga with the astonishing numbers of the two years growth:

  • 6.5K users (+580%)
  • 41K changes (+1700%)
  • 16K projects (+530%)
  • 500GBytes (+500%)

The two years of live production experience with users coming from GitHub have highlighted problems (replication, repositories sharding, multi-master) and possible solutions, ranging from Gerrit Virtual Private Hosting to the on-premises deployment integrated with BitBucket and GitHub:Enterprise.
Ericsson continued with a very detailed description of how Gerrit is used as “Enterprise Washing Machine” of all code that goes through the development pipeline: scalability and control are the fundamental keywords that were repeatedly mentioned and enforced.
The replication across sites have been massively improved regarding performance and stability since the introduction of Git/HTTPS as the main protocol for replication, as previously advised by GerritForge in 2014.
The first day ended with Google smashing out everyone his hypersonic numbers:

  • 1.6M changes / 3.8M patch-sets
  • 240 Gerrit virtual nodes
  • 2.5M repositories

The second day was all projected on the future of Gerrit, with three very interesting features coming soon.

Gerrit 3.0 and NoteDB

Dave Borowitz presented the status of the replacement of Gerrit DBMS
with a 100% open standard solution based on commit notes, implemented by all OpenSource and Commercial “flavours” of Git. The solution will allow interoperability with other code-review implementations (e.g. Phabricator) and fully enable reviews replication and off-line operations.
The new DBMS-free version of Gerrit will be called the Ver. 3.0 and it will be the next version after current Gerrit master (Ver. 2.13) gets released. released.

Gerrit PluginManager: one plugin to rule them all

Luca Milanesio presented a new vision on how to discover and install plugins on the Gerrit platform: Gerrit PluginManager. We should not bundle more and more plugins with Gerrit, which eventually would lead to an explosion of the WAR file-size. You can now install only a “plugin-manager” which then guide you through the “one-click” set-up of all the others.
Differently from similar solutions such as Jenkins Update Centre, the Gerrit PluginManager is based on the live status of the plugins and their compatibility with Gerrit: as soon as one plugin gets patched and successfully compiled with a version of Gerrit, it will automatically be listed and made available by the PluginManager. Additionally GerritForge will provide a list of certified and guaranteed plugins that have been successfully tested with Gerrit.

PolyGerrit, the new web-component UX for Gerrit.

Andrew Bonventre is the Googler that is driving the transformation of Gerrit UX to the new Polymer platform, however for personal reasons he was not able to attend the Summit. Dave Borowitz replaced him and unveiled that will be the (very) near future of Gerrit UX: no more GWT and complex and black magic transforming Java in obscure JavaScript, the future is coming through web-components, a new emerging and promising standard for the HTML and modern web-browsers.
Despite the UX being at very early stages, Dave was able to showcase a fully functional list of changes and search box powered by Polymer and calling Gerrit REST-API, really fast and promising!

Gertty, the text-only Gerrit Code Review.

On the complete opposite side, why not using Gerrit from an 80×25 text-only console? Gertty is an astonishing 100% char-based console, all based on Gerrit REST-Api and a local SqLite local DB for caching changes. Allows complete off-line operations and synchronisation with Gerrit changes. Productive and effective while you are on the go.

CollabNet and Gerrit tuning cheat-sheet

CollabNet presented a useful four pages brochure to guide through the tuning of a Gerrit set-up for a small, medium or large installation. Based on their experience of running TeamForge SCM, the commercial fork of Gerrit Code Review based on their existing TeamForge ALM proprietary solution, they have been able to experiment the Gerrit default settings and learn how to adjust them to leverage the full power of your setup.
The audience appreciated the effort and encouraged CollabNet to post all the findings as Gerrit reviews to get the code-base better and improve the default settings of the set-up.

Gerrit and Jenkins Workflow dance together with Docker.

Cloudbees presented a very effective demo on how to use Gerrit and Jenkins Workflow plugin to implement a real-life Continuous Delivery Pipeline. The presentation leveraged the use of Docker images as previously presented by Stefano Galarraga in his “Gerrit and Jenkins Continuous Delivery Pipeline for BigData” talk. They both explained and showed how Code Review is key in implementing a successful and smooth code validation and roll-out. Stefano’s presentation made use of the new exciting feature of “topic submission” that enables the grouping and commit of multiple changes across repositories.

The Hackathon, topics and improvements.

Gerrit Hackathon 2015

A lot of code have been pushed during the hackathon:

  • 400 changes merged
  • 3 Gerrit version released

Gerrit metrics

Surely the hot-topic of the hackathon has been the introduction of a new pluggable metrics engine in Gerrit, currently half-merged in master branch. Gustaf demoed on how now is possible to use standard tools such as Graphite and JMX console to extract, display and graph the most relevant Gerrit metrics in real-time. This is similar to what the JavaMelody plugin was providing, with the added value of getting the data outside the Gerrit JVM and analyse with greater detail and a standard monitoring platform.

PolyGerrit

Gerrit master has been officially upgraded to allow the development of the new Polymer-based UX for Gerrit, code-named “PolyGerrit”. Gerrit master will need from now on the installation of NodeJS during development. This is needed for building and packaging the “vulcanised” version of Gerrit UX which contains the basic components of the user interface. At the moment the only thing that will be visible is the demo of list of changes presented by Dave Borowitz at the User Summit, however new changes are coming over and Google announced that they are targeting Q4-2015 for a first internal release of the new UX.

GerritForge CI Verifier

As Gerrit 3.0 will be completely revamped in terms of reviews persistence, the community pushed for having a stricter changes validation on both the old DBMS and new NoteDB based persistence. GerritForge extended the use of the CI system (https://gerrit-ci.gerritforge.com) to cover the validation of every change / patch-set that will be uploaded to Gerrit from now on. This is a substantial improvement on the Code Review workflow of Gerrit itself and will hopefully contribute to a stable and solid Ver. 3.0 release next year.

Externalisation of Gerrit Hooks and Events to plugins

Qualcomm worked at completing the externalisation of Gerrit hooks and stream events into plugins. This change will allow to plug different events providers depending on the type of Gerrit set-up, single node or multi-master. One more important step towards an OpenSource implementation of Gerrit Multi-Master.

GerritHub user-controlled GitHub Scopes

Nowadays people are very careful about privacy and user data: nobody grants access to their profile without checking first the possible consequences.
We want to give the user the ability always to know and control what level of access is given to their data: that’s why we improved the way you login in GerritHub.io.

GitHub scopes: what is it?

GitHub provides the authentication and access to user’s profile using a protocol called OAuth 2.0. When GerritHub is requesting a user to authenticate is then granted a set of permissions to operate on behalf of the user on their GitHub resources, which include:

  • User’s personal data (name, e-mails)
  • User’s membership to organisations and teams
  • User’s repositories

The set of permissions to access and operate on your data is also known as “Scope” in GitHub terms.

How is GerritHub helping me to control my access?

GerritHub has from today a new “Scope Selection” screen with two main objectives:

  1. Displaying your current scope and associated rights GerritHub has on your GitHub profile
  2. Giving you the ability to switch to a different “Scope” and consequently the rights that GerritHub has on your profile data

Screen Shot 2015-10-08 at 10.16.53

Transparency is good, but what is the practical added value?

There has been in the past a common complaint about GerritHub having too much or too little access to your GitHub profile:

  • Too much? Why GerritHub.io needs access to my e-mail address? Why does GerritHub need to see my public keys?
  • Too little? Why does GerritHub not show my private repositories in the import screen? How can I see my organisation membership in GerritHub project security screen?

With the ability now to visualise and change the current “Scope”, people can be more aware of why things are not showing up. They can make conscious decisions about how to change them with full transparency on the associated implications.

A common scenario: importing and accessing private GitHub Organisations, Teams and Repositories.

When you need to import an existing private GitHub project, you need to access information that is not publicly available:

  • Your membership to a private organisation
  • Your ownership of a Team structure
  • Ability to clone and push your private organisations’ repositories

There is now a special information box suggesting that you have the ability to change your “Scope” if you don’t see the organisations and repositories you want to import.

Screen Shot 2015-10-08 at 10.22.12

After changing the scope, you can then log in again and you will have an improved set of options to get more data and repositories from your GitHub account.

Like it? Will you use it on a daily basis?

We are eager to get your feedback on this new feature: Tell us what you think and let us know what you would change or add to the set of “Scope” permissions.

GitHub outage, again :-( What is the real cost of FREE services?

Screen Shot 2015-05-06 at 12.47.43

As a bitter surprise today, we are experiencing another GitHub outage. This time it seems a more serious problem than the average DDoS: GitHub’s Ops Team is perform an emergency maintenance on the whole site to recover the situation.

How much a FREE GitHub Service outage really costs me?

Everyone loves GitHub because it is nice, easy and most of all … it’s FREE ! Lots of projects started using it for much more than pure source code versioning:

  • People write books and documentation with it (see gitbook.com)
  • Teams started using it as free artifacts repository manager: projects wouldn’t build at all when GitHub is down
  • Companies started hosting web-pages on GitHub (see the nicely rendered microsoft.github.io)
  • GitHub Issue tracking and wikis are so simple that people are using for project collaboration

When everything works, it is amazing how your Team can be productive using GitHub on a daily basis. But when it fails, what can you do? And what if my Team cannot progress because they can’t see the tasks, wikis, requirement documents, web-pages … how much money am I really wasting when people is hanging around for hours?

Let’s consider a small Agile Team composed by a 1 x BA, 8 x Agile Devs, 1 x Scrum Master, 2 x DevOps and 2 x QA: a 30′ minutes outage like the one today would have an impact on 16 people of 1 man/day that means (for the US market) roughly $1,000 (as optimistic guess, it may cost even more). Even if GitHub goes down twice a year (gosh this happened more than twice I am afraid) your start-up will end up paying around $ 2,000 /year for GitHub. The overall amount doesn’t sound that expensive … but you wonder why GitHub “was supposed to be really FREE” if you end up spending money with it.

If we apply the same figures to a medium size company with at least 160 people working on development, your overall figure would jump to $20,000 /year. More importantly the time lost and delays caused on the project schedule may then have an avalanche effect on other teams and maybe causing additional  pain and costs across your organisation and programme plan. Those extra costs can be sometimes difficult to quantify but for sure are much more relevant on your overall business.

Shall we give up using GitHub then? Or shall we move to GitHub:Enterprise instead?

The typical reaction to a GitHub outage is: “we cannot rely on the FREE version, we should buy GitHub:Enterprise which will run inside our company network” and use this argument with your manager to get a Purchase Order finalised NOW (I may be too malicious … but a outage may actually generate more money to GitHub than loss of reputation). When you look at the GitHub:Enterprise pricing it ends up that for your 160 people you would need to spend only $36,000 /year which seems on the same order of magnitude of your $20,000 wasted money without considering the extra hidden costs of project delays.

But are you really solving the problem? GitHub and GitHub:Enterprise are the same product, same code-base, just different pricing. What makes you wonder that your internal Ops Team can do a better job than GitHub? What makes you wonder that a GitHub bug would not appear on your GitHub:Enterprise set-up? Are you just an optimistic person?

Moving to GitHub:Enterprise is typicall needed when you have compliance / security requirements on data at-rest, but is not really addressing the problem of reliability and would potentially expose your Team to even further outages for software upgrade and management that typically you don’t have using GitHub alone. You are then spending $36,000 on top of your $20,000 (or even more) wasted previously without having real benefits.

Learning how to fly with GitHub

How to solve the problem then? Can we learn from somebody’s else experience?

Airplanes have exactly (if not even more demanding) requirements on their engines as we on a Version Control System. For an aircraft the cruising speed is everything, without that speed provided by its engines he cannot fly; we have similar requirements in our Development Team where GitHub is really what we need for progressing our development otherwise we are blocked.

The solution to the problem for an airplane to be reliable is not buying more expensive engines (which are not necessarily more reliable) but instead using two engines instead of one. Can we apply the same to GitHub? GitHub is in a nutshell a Git Server, why not relying on redundancy and replication? Can I set-up a replica of GitHub and use it for my reviews?

You can of course build your own replica using plain Git and GitHub WebHooks: it would require a bit of scripting but it can be done. During an outage you can use the replica and when GitHub is back all the pending changes can be pushed back to GitHub.

Can I have another FREE and automated replica of GitHub?

This is becoming challenging now: we want something that is completely FREE (no time spent in writing scripts, webhooks, no service provider to pay, no commercial product) but that allows us to use GitHub replicated, including Code Reviews.

It seems strange but what we are looking for actually exists and it is an OpenSource project called Gerrit Code Review. It is not only a Code Review and Git Server like GitHub but offers as well more advanced security and replication capabilities. It has been designed taking into account the needs of large distributed Teams and making their daily development lifecycle more reliable independently from local failures.

Cool, how can I get started with Gerrit and GitHub now with no hassles?

You read this quick introduction for getting started in setting up your private replica or, you are really in a hurry and you wanted a FREE hosted service, you can sign-up with 3 clicks to GerritHub.io.

I have only 5 mins of free time today: what can I read/watch to understand how it works?

Well, there are plenty of resources but if you are really in a hurry, you can watch the following YouTube Video:

If you have more time, you can read the Gerrit Code Review overview and tutorial at: https://review.gerrithub.io/Documentation/intro-quick.html

Get ready now to avoid wasting again money when the next GitHub outage … that nobody wishes … will (sadly) happen 😦