Code Review Analytics at Gerrit User Summit

I love hiking in the mountains; I have since I was a child. There is a mix of challenge, enthusiasm, and learning in walking for long hours along narrow, tortuous, steep trails: it challenges your mind and body and makes you stronger.

One thing that has always fascinated me is how the shape of peaks and valleys changes as you go higher and raise your perspective. When you finally reach the mountain peak, exhausted, you feel a mix of pleasure and relief. You have achieved your goal, and you can, at last, have a full view of the horizon, understand how mountain chains are linked together, see where rivers start and end up in lakes, and see farther than ever before.

Continuous Delivery is a landscape of rivers, lakes, and mountains

I believe today’s landscape of Software Engineering is very diverse: there are so many powerful tools that generate streams of data continuously, and we need to get on top of them every day to be successful in an ever-evolving market space.
Continuous Delivery is the key methodology that nowadays allows the entire “software production chain” to work smoothly. However, it poses challenges that are not always technical but are often related to the flow of information across systems and people.

See the big picture

To succeed, we need to raise our point of view to see the “big picture” and understand how things are connected and where the improvement points are, considering all the data we have about:

  • Tools
    collect software and system metrics, logs, test results, build trends
  • Projects
    repositories, commits, branches, pull requests, patch sets
  • People and Teams
    active and passive collaborators, contributors, reviewers, comments and replies

Collecting data is not enough: we need to raise our point of view, hike the Continuous Delivery mountain of problems, and reach a point where all those elements make sense because they are:

  • all visible from a single perspective
  • correlated
  • aggregated

Yet another BigData problem

The problem is that there are too many data sources to manage.
The typical solution is to take all logs from everywhere, publish them to a single repository using an ELK (ElasticSearch + Logstash + Kibana) stack, and build fancy dashboards. I have used this approach for small-scale projects, and it works quite well, but when I tried to scale it to a much bigger Continuous Delivery pipeline, the complexity, diversity, and granularity of the data just killed my ability to see the “bigger picture”, and I felt almost helpless in front of my Kibana dashboards.

Back to the source of data

Trying to understand what was missing, I ended up realizing that one dimension had not been taken into account or correlated: Code Reviews.

All the tooling was about test results, builds, and system and application logs, but none of it took the code into account, which is the source of the whole pipeline. You can understand where to go only if you realize where you are coming from: the code is the source of all build chains.
When I started collecting data from the code repository and reviews, it all made sense again, and I felt that I had reached the peak of my Continuous Delivery hike: I could finally see the overall perspective.

Continuous Delivery Analytics

In exactly the same way that Application Analytics collects data about our production systems, splits it into dimensions, and analyzes it, we need to start doing the same with our Continuous Delivery pipeline.

There are already some tools on the market that integrate parts of it, but I haven’t seen a single one that can give you the bird’s-eye perspective you need to understand the big picture.

That’s why I started writing one, in the only way that makes sense to me: writing in the open, with the help and cooperation of the OpenSource community of people and companies who share the same problem and have the same perspective.
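
As a rough illustration of what splitting pipeline data into dimensions could look like, here is a minimal sketch that assumes review and build events can both be keyed by the commit they refer to. The event fields, the sample values, and the function names are all hypothetical and are not taken from any real tool:

```python
from collections import defaultdict

# Hypothetical sample events; in practice these would come from the
# code review tool and the build server. Field names are assumptions.
reviews = [
    {"commit": "a1b2c3", "reviewer": "alice", "score": 2},
    {"commit": "d4e5f6", "reviewer": "bob", "score": -1},
]
builds = [
    {"commit": "a1b2c3", "status": "SUCCESS", "duration_s": 310},
    {"commit": "d4e5f6", "status": "FAILURE", "duration_s": 95},
    {"commit": "d4e5f6", "status": "SUCCESS", "duration_s": 305},
]


def correlate(reviews, builds):
    """Group review and build events under the commit they refer to."""
    by_commit = defaultdict(lambda: {"reviews": [], "builds": []})
    for review in reviews:
        by_commit[review["commit"]]["reviews"].append(review)
    for build in builds:
        by_commit[build["commit"]]["builds"].append(build)
    return dict(by_commit)


def summarize(by_commit):
    """Aggregate the correlated events into one row per commit."""
    summary = {}
    for commit, data in by_commit.items():
        failed = sum(1 for b in data["builds"] if b["status"] == "FAILURE")
        summary[commit] = {
            "review_count": len(data["reviews"]),
            "build_count": len(data["builds"]),
            "failed_builds": failed,
        }
    return summary


summary = summarize(correlate(reviews, builds))
```

The commit is used as the join key here because the code is what every build and review ultimately points back to; other dimensions (team, repository, branch) could be added in the same way.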

Gerrit Analytics coming at the User Summit 2016

I presented my ideas at a couple of conferences (Devoxx, JenkinsWorld) and received a lot of appreciation and feedback; the next one is the Gerrit User Summit in Mountain View – CA.

Every year, large companies like SAP, Qualcomm, Ericsson, Google and Intel exchange their problems and ideas on how to make their Continuous Delivery Pipelines smoother, better and faster.
The perspective will be, of course, more Gerrit Code Review centric, with more data and views that make sense from a review perspective.

Call to action

Come to the Gerrit User Summit 2016 in Mountain View – Google HQ on the 12th and 13th of November, and see the Gerrit and Continuous Delivery Pipeline in Action.

The event is FREE, register now at https://goo.gl/forms/oeEnQweHl2noNSnn1

 


GitMinutes #30: Luca Milanesio on Gerrit Code Review

Many thanks to Thomas Ferris Nicolaisen for inviting me to talk about Gerrit Code Review at GitMinutes.

It has been a very interesting discussion on the benefits of Code Review and how Gerrit can help out small and large companies embracing it.

The interview is available on-line at http://episodes.gitminutes.com/2014/07/gitminutes-30-luca-milanesio-on-gerrit.html; alternatively, you can download and listen to the 1h 27′ conversation as a podcast at https://itunes.apple.com/de/podcast/gitminutes-podcasts/id637843725?l=en.

Use the force Luca!

We started (of course!) by talking about the [in]famous force push of 186 Jenkins repositories to GitHub. I was in the Top-10 on Hacker News for over 7 hours … so I was expecting the question to pop up during the interview 🙂

My friend Alex Blewitt also took the opportunity to forge a Star Wars-like headline for his InfoQ article on what happened.

Git adoption in the Enterprise, where it all began

We moved the discussion to the foundation of my business on Git and Code Review and the reasons and challenges that an Enterprise company faces when moving to Git. We went through the history of how LMIT started GitEnterprise.com and then focused on Gerrit Code Review based products and services for large Enterprises World-Wide: a niche and successful business nowadays.

GitHub or Gerrit? or both with GerritHub?

As I expected, we ended up comparing GitHub and Gerrit, analysing the similarities and differences between the two. This topic has also been presented at two conferences, the Gerrit User Summit @GooglePlex – Mountain View CA and the 33rd Degree.org Java Developers Conference in Krakow; slides are available at http://www.slideshare.net/lucamilanesio/gerrit-codereviewgit-hubplugin.

Gerrit has historically been considered “more difficult” than GitHub: true in the past but not anymore, apart from the Web User-Experience and CSS styling, which are much nicer and more pleasant on GitHub. The availability of http://gerrithub.io has allowed over 1,800 developers since October 2013 to get started with Gerrit in less than 5 minutes by watching a Gerrit introductory YouTube video: using it was then just 3 clicks away, no installation or configuration needed! The availability of an easy and accessible Public Cloud instance represents a big improvement in the accessibility and usability of Gerrit.

For which teams is Gerrit the right choice?

We talked about the “typical learning curve” of people coming from previous version control systems, such as Subversion. Does it make sense to get started with Git and Gerrit at the same time? When is Gerrit needed and when is it going to provide most of its value?

I’ve covered the topic in past webinars and talks: recordings of the hands-on webinars are freely available on-line at:

The size of the project (in terms of number of people x number of repositories) is typically one of the key factors in Code Review adoption. Gerrit, however, can also be used as a standalone OpenSource Git Server, even without leveraging its Code Review capabilities: this makes the choice of Gerrit a good first step towards a smoother Git adoption.

What are Gerrit Topics about?

We went through a very interesting discussion about “Gerrit Topic”, a feature that is not new to Gerrit but is sometimes forgotten despite its importance and relevance for medium-large teams.

With the forthcoming support for multi-repository atomic commits in Gerrit, it will be possible to merge multiple changes on multiple repositories at the same time for a single topic. This feature is not ready yet, but the Google Gerrit Team developers and contributors are working on it and it will hopefully arrive in the near future.

The ability to make an atomic commit across multiple repositories will also allow a more consistent Jenkins build process, with fewer broken builds caused by interdependent changes on multiple components.

Who is using Gerrit today?

We talked about the adoption of Gerrit in the community, which is growing year after year. A lot of medium-sized companies have adopted Gerrit in the past, including Spotify, side-by-side with GitHub.

The ability to “submit a change” to any project without the risk of breaking the build is definitely an incentive that encourages even more people to contribute, share knowledge and improve the code base, without breaking anything or forking the code. This is one of the reasons that drove large OpenSource organisations such as the Eclipse Foundation and OpenStack to adopt Gerrit Code Review in their tools platform.

How to embrace Code Review in a Team or Company?

We went through an interesting comparison / discussion of Agile Methodology vs. Code Review. Teams often misunderstand and confuse the concept of “review” with “pair-programming”: the problem is analysed in my book “Learning Gerrit Code Review” (available on Amazon.com at http://www.amazon.com/Learning-Gerrit-Code-Review-Milanesio/dp/1783289473). I defined pair-programming as a dot in a time/people space: two developers writing a piece of code at the same time. This, however, does not exclude all the other points in the time/people space where multiple people at different times read the code and provide their feedback: pair-programming is then a “specific example” of the “code review space”.

Because of the different perspectives (pair-programming is a dot whilst code-review is a “cloud of dots” in time/people space) they are not mutually exclusive: they are equally important and both enable effective collective code ownership and knowledge sharing.

References and greetings.

It has been a very long but interesting discussion with Thomas and I hope you’ll enjoy it.

See below the links of the resources we mentioned during the interview:

Thanks again to Thomas for his fantastic initiative: GitMinutes PodCast!

Luca Milanesio 

Gerrit User Summit 2014 talks proposals

The list of talks proposed for the forthcoming Gerrit User Summit in Mountain View has been published.

There are very interesting talks on ideas, extensions and case studies from large enterprises and projects: it is going to be, once again, an exciting rendez-vous for all those interested in SCM, SDLC and Continuous Agile.

See below a distilled summary of the proposed topics:

  • Using Gerrit and Jenkins together for the LibreOffice OpenSource Project
  • How to manage and monitor Gerrit using JavaMelody
  • Extend the GitHub fork & pull-request model using Gerrit Code Review lifecycle and GerritHub.io
  • Extending Gerrit with scripting plugins (Groovy, Jython and Scala)
  • Continuous Development and Code Review with Codenvy
  • Large scale Gerrit installations with testimonials from OpenStack, Yahoo and Ericsson !
  • Integrating and using Gerrit in the Enterprise with CollabNet TeamForge
  • … and new talks are coming over !

Seats are running out quickly but there are still places available: you can register now, for free, for the Gerrit User Summit event:

See you soon at the Gerrit User Summit 2014 !

Gerrit Code Review or Github’s fork and pull ? Take both !

When searching on Google with the keywords “Gerrit” and “GitHub” you find lots of different links with more questions than answers; see below a selection of the most interesting ones:

In addition, Linus Torvalds, the father of the Git version control system, whilst keeping the Kernel source on GitHub, has explicitly expressed, in his own way, what he thinks about Pull Requests.

Google decided to use a different tool than GitHub and developed Gerrit Code Review for managing the community effort of developing the Android Operating System, mainly for two reasons:

  1. The GitHub pull request model wouldn’t have worked for Android: forking the projects several thousand times would have been simply unsustainable. Google recognized that early on, and Gerrit was developed with the “not like GitHub pull request” requirement.
  2. GitHub is not (and today has no plans to become) OpenSource

There are certainly additional reasons: even today, and even if GitHub decided to become OpenSource in the future, it would still need a long list of features in order to support a large-scale cooperative project !

What is Gerrit Code Review today ?

Today Gerrit is much more than the Android OS review tool ! There are around 80 contributors, a number growing over time, from both large industries and OpenSource projects. SAP, Sony Mobile and Qualcomm IC are amongst the most active companies contributing to the tool, whilst from the OpenSource community there are LibreOffice, Openstack and Wikimedia.

What is the right choice then ? red pill or blue pill ? Open or commercial ?

We thought about the problem very deeply at GerritForge.com, as some of our customers decided to quit GitHub completely, mainly for security and confidentiality reasons, while others moved in the opposite direction, embracing GitHub:Enterprise.

In a nutshell, the criteria that drove those customers in one direction (GitHub) or the other (Gerrit Code Review) were based on the following aspects.

Security.

  • GitHub: its security history is quite weak because of its architecture, mainly based on Ruby (or, let’s say, a naive implementation based on Ruby, as the language itself is not so weak from a security perspective). The problem was solved but raised many concerns in the industry about how many more security problems are still to be found.
  • Gerrit: completely written in Java and with Security in mind. Large corporations such as SAP, Sony Mobile, Qualcomm and many other enterprises, organisations and non-affiliated individuals/volunteers have contributed to the review and development of the code-base. OpenSource and community code inspection has always been the golden rule for very secure projects (e.g. OpenSSH and OpenSSL are widely reviewed and OpenSource), and code obscurity has always been a security anti-pattern.

High availability.

  • GitHub: it has been historically very reliable, especially at the beginning. When it became popular and saw its traffic increase exponentially, it started to be rather unreliable because of several repeated DDoS attacks. GitHub:Enterprise is a proprietary, locked VM that can be installed on-premises but not on a private / public Cloud.
  • Gerrit: unlike GitHub, it is not a service and can be hosted either on your private / public Cloud or on-premises. Google has some instances in its own distributed cloud network around the world, managed with high availability in mind, for Android OS development and other OpenSource projects (and for Gerrit self-hosting of course). Google’s deployment has not been impacted by DDoS attacks so far, and its physical deployment is protected by the standard Google DataCenters network security. Other deployments are either private or distributed around different projects’ sites.

Usability.

  • GitHub: the key to the success of GitHub is its amazing user-experience and the ability to push OpenSource development to a new level of social collaboration ! We all need to be grateful to GitHub for having made OpenSource development ever more interesting and fun for the masses.
  • Gerrit: the user interface is functional but not as “shiny” or “attractive” as a modern social collaboration platform should be. In a nutshell, Gerrit does not want to be a developer’s social network but rather targets its specific objective of managing Code and Projects across large teams. This is the reason why large OpenSource communities such as the Eclipse Foundation embraced Gerrit.

Scalability

  • GitHub: based on the C-Git implementation (using the GitHub libgit2 library), which works very well with small repositories. However, when the number of BLOBs and Packs increases, the effort of counting them through the repository history grows linearly over time (*). With regard to the number of repositories, GitHub has proved very effective in distributing the data across its nodes and sharing BLOBs to limit the disk-space needed for forked repositories.
  • Gerrit: the R&D folks working at Google have invested a lot of time in optimising JGit for large repositories and a large number of users accessing them. The latest highlight of their performance improvements is the JGit bitmap implementation (thanks to the fantastic work by Colby Ranger). Those optimisations, however, are not present in the C-Git code-base used by GitHub. With regard to the number of repositories, the largest installation I have ever seen has fewer than 50K projects: AFAIK it has never been used or tested with millions of repositories.

(*) Note from Shawn Pearce on this topic: “Its just crazy slow per object, the C implementation discovers around 70k objects/second. 3M objects takes 42 seconds at best, the truth is the rate of new object discovery slows as it goes further back in history, which is why counting 3M objects takes modern machines minutes. GitHub has tried porting the bitmap code to C. Its running in some limited cases on their site, at one time https://github.com/torvalds/linux/ had it enabled. We haven’t seen updated patches for it, and it looks like its disabled again.”
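
To put the best-case figure in the note above into numbers: dividing the object count by the quoted discovery rate gives roughly the 42 seconds mentioned. A trivial sketch (the function name is made up for illustration, and it assumes a constant rate, which the note says does not hold in practice):

```python
def best_case_seconds(object_count, objects_per_second):
    """Time to discover all objects if the rate never slowed down."""
    return object_count / objects_per_second


# Figures quoted in the note: 3M objects at around 70k objects/second.
seconds = best_case_seconds(3_000_000, 70_000)
print(f"{seconds:.1f} seconds")  # prints "42.9 seconds"
```

Because the discovery rate drops further back in history, the real time is longer than this lower bound, which matches the "minutes" reported in the note.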

Code Review

  • GitHub: uses the fork + pull model. In a nutshell, every user always pushes to their own “forked version” of the repository and, once the changes are ready, requests the source repository owner to pull them. This works very well for projects where there is a single approver of all the incoming changes, and the GitHub user-interface is simply amazing in the way that changes are displayed and navigated in a unified-diff view, making the review of multiple commits a simpler task.
  • Gerrit: being designed for projects with many contributors and committers, it does not embrace the fork + pull model at all. It would have been simply unmanageable to have hundreds of thousands of forked versions of the Android OS code-base ! The Gerrit workflow is mainly derived from the Android OS contribution workflow: each contribution is defined as a “Change”, has a unique ID (Change-ID) and is composed of a set of Patches (Patch-Set) of candidate changes. When the latest Patch-Set reaches the score necessary to be approved (Code-Review +2 and Validate +1 for the Android OS workflow), it can be merged.
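
The Change / Patch-Set model described above can be sketched as a small data structure. This is a simplified illustration, not Gerrit's actual implementation: the class and method names are hypothetical, the label names and required scores are the ones quoted in the text, and real submit rules (for example blocking negative votes) are deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Scores required on the latest Patch-Set, as quoted in the text
# for the Android OS workflow.
REQUIRED_SCORES = {"Code-Review": 2, "Validate": 1}


@dataclass
class PatchSet:
    number: int
    # label name -> scores given by reviewers on this Patch-Set
    votes: Dict[str, List[int]] = field(default_factory=dict)

    def vote(self, label: str, score: int) -> None:
        self.votes.setdefault(label, []).append(score)


@dataclass
class Change:
    change_id: str
    patch_sets: List[PatchSet] = field(default_factory=list)

    def upload_patch_set(self) -> PatchSet:
        """Add a new Patch-Set; votes do not carry over in this sketch."""
        patch_set = PatchSet(number=len(self.patch_sets) + 1)
        self.patch_sets.append(patch_set)
        return patch_set

    def can_be_merged(self) -> bool:
        """True when the latest Patch-Set reached every required score."""
        if not self.patch_sets:
            return False
        latest = self.patch_sets[-1]
        return all(
            max(latest.votes.get(label, [0])) >= required
            for label, required in REQUIRED_SCORES.items()
        )


change = Change(change_id="I0123abc")  # made-up Change-ID
first = change.upload_patch_set()
first.vote("Code-Review", 2)
merged_after_first = change.can_be_merged()   # Validate +1 still missing

second = change.upload_patch_set()            # new Patch-Set, fresh votes
second.vote("Code-Review", 2)
second.vote("Validate", 1)
merged_after_second = change.can_be_merged()
```

Only the latest Patch-Set is checked, which reflects the point made above: earlier Patch-Sets stay in the history of the Change, but approval applies to the most recent candidate.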

Why not use Gerrit and GitHub together ?

This is not a new idea as it has been proposed and successfully implemented by some popular OpenSource projects such as:

The benefits of using both tools are twofold.

From the features and performance perspective, the projects can benefit from the Gerrit JGit engine and associated Code Review capabilities. The Gerrit Code Review model may seem less friendly than GitHub’s Pull Request, but it eventually generates a more readable and maintainable code-history, essential for long-term products in production.

From the point of view of accessibility and social community, using GitHub allows WikiMedia and Openstack to extend their reach and, at the same time, off-load all the clone traffic to GitHub nodes instead of their Gerrit servers !

Why GerritHub ? What is the value added by the platform ?

We thought about creating GerritHub about 2 years ago, when we first discussed the adoption of Gerrit for the Jenkins Continuous Integration project with Kohsuke Kawaguchi. He liked Gerrit at first sight when he joined the Git Together in 2011 @Mountain View, but at the same time he was concerned about the loss of GitHub’s reach and ease of use.

The integration between the two tools was technically possible but challenging, and it needed a significant set of Gerrit skills to be implemented correctly, including the integration between the Pull Request model and the Gerrit Code Reviews.

GerritHub is the first Gerrit-powered platform that offers the best of Gerrit 2.8 (current master release) integrated with GitHub SSO (using OAuth 2.0) and replicated to GitHub repositories and Pull Requests. Unlike the WikiMedia and Openstack implementations, it is a self-service platform: anyone who has a GitHub account and repositories can self-register at GerritHub and use it for their own OpenSource projects !

Summary.

There is no winner in the battle between GitHub and Gerrit because they are simply different tools for different audiences. There are cases where the needs are mixed and both can provide a valid platform for the purpose of the projects.

Gerrit has historically been a niche tool, confined to Android OS development: now things are different and major OpenSource projects have adopted it as their standard. However, a “public GitHub presence” was still needed and has been implemented.

GerritHub gives you the choice of taking and using the best of both !

Learn more about Gerrit Code Review and GerritHub.

Gerrit Code Review home:
http://code.google.com/p/gerrit/

One-click sign-In and auto-registration to GerritHub:
https://review.gerrithub.io/login

Book about Gerrit Code Review:
http://gerrithub.io/book