Gerrit User Summit 2016: Ten days to go


In ten days’ time, the 2016 Gerrit User Summit will open its doors at the Googleplex Campus in Mountain View – CA.

It is going to be an amazing event, with a lot of exciting updates from the Gerrit Community, thanks to the many innovations coming into the platform: large file support (Git LFS), a new HTML5 Polymer-based UX, NoteDb, multi-master updates and a lot more.

GerritForge will be present with five attendees and six amazing talks, covering many aspects of the world of Code Review and the Continuous Delivery Pipeline:

  • Code Review Analytics
  • High Availability and Zero-Downtime upgrades
  • Infinite scalability with Gerrit on Apache Cassandra
  • Jenkins 2.0 Continuous Delivery pipeline for Gerrit
  • Speeding up builds with Bazel on Gerrit
  • Robot feedback management for Gerrit

Once again, as we have always done, we continue to fuel new ideas and innovation in the world of Code Review, guided by the feedback of the Gerrit users’ community and developing those ideas as OpenSource, always.

See you soon at the Gerrit User Summit 2016.

Code Review Analytics at Gerrit User Summit

I love hiking mountains; I always have, since I was a child. There is a mix of challenge, enthusiasm, and learning in walking for long hours along narrow, tortuous and steep trails: it challenges your mind and body and makes you stronger.

One thing that has always fascinated me is how the shape of the peaks and valleys changes as you go higher and raise your perspective. When, finally exhausted, you reach the mountain peak, you feel a mix of pleasure and relief. You have achieved your goal, and you can, at last, take in the full view of the horizon, understand how the mountain chains are linked together, where rivers start and end up in lakes, and see farther than ever before.

Continuous Delivery is a landscape of rivers, lakes, and mountains

I believe today’s landscape of Software Engineering is very diverse: there are so many powerful tools that generate streams of data continuously, and we need to stay on top of them every day to be successful in an ever-evolving market space.
Continuous Delivery is the key methodology that nowadays allows the entire “software production chain” to work smoothly. However, it poses challenges which are not always technical but are often related to the flow of information across systems and people.

See the big picture

To succeed, we need to raise our point of view to see the “big picture”, understand how things are connected and where the improvement points are, considering all the data we have about:

  • Tools
    collect software and system metrics, logs, test results, build trends
  • Projects
    repositories, commits, branches, pull requests, patch sets
  • People and Teams
    active and passive collaborators, contributors, reviewers, comments and replies

Collecting data is not enough: we need to raise our point of view, hike the Continuous Delivery mountain of problems, and reach a point where all those elements make sense because they are:

  • all visible from a single perspective
  • correlated
  • aggregated

Yet another BigData problem

The problem is that there are too many data sources to manage.
The typical solution is to take all the logs from everywhere, publish them to a single repository using an ELK (ElasticSearch + Logstash + Kibana) stack, and build fancy dashboards. I have used this approach for small-scale projects, and it works quite well, but … when I tried to scale it to a much bigger Continuous Delivery pipeline, the complexity, diversity, and granularity of the data just killed my ability to see the “bigger picture”, and I felt almost helpless in front of my Kibana dashboards.
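
For a small setup, the plumbing behind that approach can be as simple as posting JSON events to Elasticsearch over its REST API. The sketch below only illustrates the idea; the host, index and type names and the event fields are assumptions made up for the example, not part of any specific pipeline.

```python
# Minimal sketch: push one CI build event into Elasticsearch over its REST
# API. Host, index/type names and the event fields are illustrative only.
import datetime
import requests

event = {
    "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
    "job": "gerrit-verify",      # hypothetical CI job name
    "result": "SUCCESS",
    "duration_sec": 312,
}

# Index the event as a document of type "build" in the "ci-builds" index
# (endpoint shape as per the Elasticsearch versions current at the time).
resp = requests.post("http://localhost:9200/ci-builds/build",
                     json=event, timeout=10)
resp.raise_for_status()
print(resp.json()["_id"])        # auto-generated document id
```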

Back to the source of data

Trying to understand what was missing, I realized that one important dimension was neither captured nor correlated: Code Reviews.

All the tooling was about test results, builds, system and application logs, but none of it took the code into account, even though the code is where the whole pipeline starts. You can only understand where to go if you know where you come from: the code is the source of every build chain.
When I started collecting data from the code repository and the reviews, everything made sense again, and I felt I had reached the peak of my Continuous Delivery hike: I could finally see the overall perspective.
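
Most of that review data is already exposed by the Gerrit REST API. As a minimal sketch (the server URL, query and limits below are assumptions chosen only for illustration), collecting the recently merged changes can be as simple as:

```python
# Minimal sketch: pull recently merged changes from a Gerrit server through
# its REST API. Server URL, query and limits are illustrative assumptions.
import json
import requests

GERRIT = "https://review.gerrithub.io"

resp = requests.get(
    f"{GERRIT}/changes/",
    params={"q": "status:merged", "n": 25, "o": "DETAILED_ACCOUNTS"},
    timeout=30,
)
resp.raise_for_status()

# Gerrit prefixes JSON responses with ")]}'" to prevent XSSI; strip it first.
changes = json.loads(resp.text.split("\n", 1)[1])

for change in changes:
    print(change["project"], change["_number"], change["owner"].get("name"))
```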

Continuous Delivery Analytics

In exactly the same way Application Analytics collects data about our production systems, splits it into dimensions and analyzes it, we need to start doing the same with our Continuous Delivery pipeline.
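
Continuing the previous sketch, even a trivial aggregation turns the raw list of changes into dimensions (per project, per owner) that can be compared and trended over time; again, this is only an illustration:

```python
# Minimal sketch: bucket the changes fetched above into simple dimensions
# (project and owner) to get a first aggregated view. Purely illustrative.
from collections import Counter

by_project = Counter(change["project"] for change in changes)
by_owner = Counter(change["owner"].get("name", "unknown") for change in changes)

print(by_project.most_common(5))   # most active projects
print(by_owner.most_common(5))     # most active change owners
```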

There are already tools on the market that integrate some parts of it, but I haven’t seen a single one that gives you the bird’s-eye perspective you need to understand the big picture.

That’s why I started writing one, in the only way that makes sense to me: in the open, with the help and cooperation of the OpenSource community of people and companies who share the same problem and the same perspective.

Gerrit Analytics coming at the User Summit 2016

I presented my ideas at a couple of conferences (Devoxx, JenkinsWorld) and received a lot of appreciation and feedback: the next stop is the Gerrit User Summit in Mountain View – CA.

Large companies like SAP, Qualcomm, Ericsson, Google and Intel exchange their problems and ideas every year on how to make their Continuous Delivery pipelines smoother, better and faster.
The focus will be, of course, more Gerrit Code Review centric, with more data and views that make sense from a review perspective.

Call to action

Come to the Gerrit User Summit 2016 at Google HQ in Mountain View – CA, on the 12th and 13th of November, and see Gerrit and the Continuous Delivery Pipeline in action.

The event is FREE, register now at https://goo.gl/forms/oeEnQweHl2noNSnn1

 

Gerrit Upgraded with No Downtime

[Pingdom uptime and response-time graph for the past 24 hours]

A Zero-Downtime success story.

From today at 08:06 GMT, GerritHub users are served by our brand new infrastructure geo-located in Beauharnois, Quebec, Canada. It is the first time we have applied a zero-downtime roll-out scheme: Pingdom reported 100% uptime over the past 24h and an average response time of 688 ms for the page listing open changes. The two response-time spikes on the graph above are actually due to the old German infrastructure and happened before the roll-out started.

The switch of traffic to the new infrastructure is visible as an increase in the overall response time (IP packets were initially routed from Germany to Canada, causing extra hops); as the DNS propagation spread across the world, the overall number of hops gradually came back to normal.

Timeline of the events.

  • 08:00:00 GMT – Phase 1 – Set Gerrit READ-ONLY. All changes and Git repositories started to refuse push and updates.
  • 08:00:01 GMT – Phase 2 – Wait for pending replication to complete. Replication queue was empty; there was no need to wait.
  • 08:00:02 GMT – Phase 3 – Mirror DB and Git for the last time, delta-reindex, DB upgrade and Gerrit restart. This was the longest part of the roll-out and lasted 5′ 32”, in line with our estimates.
  • 08:05:34 GMT – Phase 4 – Cache warm-up. 20K projects, 8K accounts and 4.6K groups were pre-loaded into Gerrit. This step was optional but allowed us to redirect all the traffic without the risk of causing thread spikes on the new infrastructure.
  • 08:06:23 GMT – Phase 5 – Redirect traffic to the new infrastructure.

Did anybody notice the rollout?

During the rollout the Git projects and Gerrit changes were read-only for 6′ and 23”. According to the logs, 493 Git/HTTP and 172 Git/SSH invocations were made and completed successfully: none of them failed.

What is the situation right now?

The new infrastructure’s public IP (192.99.233.76) has almost completed its DNS propagation around the world; the only countries not yet entirely covered are Australia and China. The rest of the world now reaches Canada directly, avoiding the German hops. Metrics are good: lower CPU utilization and thread consumption compared to the old German infrastructure, a symptom of reduced execution and serving times and lower latency.
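
If you want to check whether the propagation has already reached your network, a quick way is to resolve the hostname locally and compare it with the new IP; the snippet below is just a convenience sketch using the Python standard library.

```python
# Minimal sketch: check whether your local DNS already resolves
# review.gerrithub.io to the new Canadian IP (192.99.233.76).
import socket

NEW_IP = "192.99.233.76"

resolved = socket.gethostbyname("review.gerrithub.io")
if resolved == NEW_IP:
    print("DNS propagated: traffic goes straight to the new infrastructure")
else:
    print(f"Still resolving to {resolved}; propagation has not reached you yet")
```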

What’s next?

From now on we will continue to use this Blue/Green roll-out strategy, possibly shrinking the read-only window by introducing live distributed reindexing and cache warm-up.

We are fully committed to Zero-Downtime and Stability, the most valuable assets for our clients.

GerritHub and Zero-Downtime Upgrade

GerritHub gets bigger on Mon, 21 March 08:00 GMT


GerritHub has experienced unprecedented growth over the past two years. The November 2015 numbers presented at the Gerrit User Summit in Mountain View – CA have been surpassed again, and we need to make sure that our infrastructure is still capable of dealing with current and future users’ needs.

What is changing in GerritHub.io?

We are changing everything, from the version of Gerrit to the hardware, network and storage infrastructure. Data, DBMS, indexes and caches need to be upgraded and refreshed to make sure that the new systems reflect exactly the current production data and sessions.
We are also changing the geo-location of our servers, from the current server farm in Germany (Bavaria, Nuremberg – 100 Mbps) to a new server farm in Canada (Quebec, Beauharnois – 1 Gbps).

Why so many changes?

We started to measure significant delays in the Git and review operations on the old infrastructure, mainly due to three factors:

  1. More users, more repositories, more concurrency. Individuals, OpenSource projects and businesses started using GerritHub.io for their mission-critical repositories, considering Gerrit the “source of truth” of their review workflow. We needed more horsepower, memory, storage and the ability to scale even further.
  2. Bandwidth from the USA and the Far East. The majority of people using GerritHub.io are on the other side of the Atlantic Ocean: this is typically not a problem from 7 AM to 3 PM … but after 4 PM the connectivity between Europe and the Americas becomes slow. Additionally, people using GerritHub.io from India, Japan, Australia and New Zealand experienced terrible slowdowns because of the excessive number of hops needed to reach Germany.
  3. Gerrit master is much faster. Based on the data and metrics measured on GerritHub.io, we have contributed a lot of patches to reduce the overhead caused by the Gerrit DB and lessen the number of SQL queries per minute. All those improvements are on Gerrit master, and we need to catch up with the “latest and greatest” version.

Will I experience any GerritHub.io outage?

The last time GitHub needed to make a major upgrade, it asked its 5M users to stop working for 23 minutes. That translates into a loss of two million hours of continuous delivery lifecycle, equivalent to over 130 man-years, worth no less than eight million dollars.
We are adopting a new Zero-Downtime Gerrit roll-out strategy to make sure that all these changes do not impact your day-by-day activity. If you were not reading this post, you would possibly not even notice the “switch” from the old to the new infrastructure, apart from the increase in speed and bandwidth.

Zero-downtime GerritHub.io migration, step by step with the associated expected timings.

Phase 0 – Replication to the new Gerrit infrastructure. (– 1 month)
We started migrating everything one month ago, and the old and new infrastructures have been working side by side since then, thanks to Gerrit master-slave replication. The new Gerrit servers are active as slaves and are read-only.

Phase 1 – Migration kick-off. (08:00 GMT)
We install a Gerrit plugin that rejects all pushes to GerritHub.io repositories with a courtesy message: “Gerrit is under maintenance, all projects are READ ONLY”. All HTTP POST, PUT and DELETE methods are disabled on the Gerrit REST API.

Phase 2 – Wait for replication events to complete and migrate DB. (08:02 GMT)
Git repositories are continuously replicated, but we need to make sure that the event queue is empty. Once that happens, we perform the final DB migration to the new infrastructure.

Phase 3 – Gerrit DB upgrade and reindex (08:04 GMT)
The new Gerrit server executes the final upgrade and the off-line reindex of the latest received changes.

Phase 4 – Gerrit start-up and cache warm-up (08:05 GMT)
The new Gerrit is started and the most critical Gerrit caches (projects, accounts and groups) are pre-loaded into memory. This prevents the incoming traffic spike from exhausting the available threads and makes the transition as smooth as possible, without slowdowns.
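
One way such a warm-up can be scripted is by simply listing projects, accounts and groups through the Gerrit REST API right before the switch. The sketch below only illustrates the idea; the URL, limits and credentials are assumptions, not the actual procedure used on GerritHub.io.

```python
# Minimal sketch: warm the projects, accounts and groups caches by listing
# them through the Gerrit REST API before traffic is switched over.
# Server URL, limits and credentials are assumptions for illustration.
import requests

GERRIT = "https://review.gerrithub.io"

session = requests.Session()
session.auth = ("admin-user", "http-password")   # hypothetical credentials

# Exact query support (e.g. account queries) depends on the Gerrit version.
for endpoint in ("/a/projects/?n=25000",
                 "/a/accounts/?q=is:active&n=10000",
                 "/a/groups/"):
    resp = session.get(GERRIT + endpoint, timeout=120)
    resp.raise_for_status()
    print(endpoint, "warmed, payload size:", len(resp.text))
```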

Phase 5 – Traffic switch and DNS updates (08:06 GMT)
GerritHub.io redirects all incoming HTTPS and SSH traffic to the new infrastructure. Git pushes and HTTP PUT, POST and DELETE operations of the REST API are operational again and served by the new Gerrit infrastructure. GerritHub.io DNS is updated to the new Canadian IPs.

Phase 6 – New IPs propagate to all worldwide DNS servers (+ 1 day)
Once all the DNS servers in the world have been updated, everyone will reach the new infrastructure directly, without further hops or redirections from Germany. Customers from the USA, Canada, South America, Asia, Japan, New Zealand and Australia should see a significant reduction in network latency and an increase in GerritHub.io responsiveness.

Firewall and SSH considerations

Even though the Gerrit server’s SSH key is not changing, some of you may see a warning similar to this when pushing or pulling over SSH:

Warning: the RSA host key for ‘review.gerrithub.io’ differs from the key for the IP address ‘148.251.77.70’

The warning message will also tell you which lines in your ~/.ssh/known_hosts need to change. Open that file in your favorite editor, remove or comment out those lines, then retry your push or pull.

Should your network have strict firewall rules for accessing external sites, you may want to whitelist the IP of the new infrastructure’s WLB: 192.99.233.76.

Follow GerritHub.io migration progress.

We will advertise the migration progress on Twitter at @GitEnterprise. Should you have any issues, you can tweet us or contact GerritForge Customer Support at support@gerritforge.com.