London to host Gerrit User Summit 2017

gerritusersummit-2017-poll-results.png

… and the winner is … Europe/London!

Despite the future Brexit plans, London will still be the beating heart of Code Review innovation this year by hosting the Gerrit User Summit 2017.

Here are the numbers in detail:

  • 157 people visited the poll page (goo.gl/M7X6rp)
  • 75 people from 14 countries cast their votes over the past two weeks
  • Summit in Europe (only Europe + USA/Europe) received 54 votes
  • Summit in USA (only USA + USA/Europe) received 37 votes

Countries

GerritUserSummit2017-countries.png

The audience is very diverse, with most votes coming from the West and East Coasts of the USA, the British Isles, and Germany. There was some interest in the Summit from Israel, India, and Japan as well.

It will be excellent to see new faces at this year’s Gerrit User Summit, to exchange ideas and capture new and essential requirements for the versions to come.

The London Venue and Dates

SkillsMatter.png

SkillsMatter offered the CodeNode venue in London (10 South Place, London, EC2M 7EB, GB), which is entirely dedicated to hosting events, meetups, and user groups from the global OpenSource Community.

The venue can record every session, has an open policy on taking pictures and sharing content, and offers the possibility of streaming the event so that remote attendees can watch and interact.

The proposed dates for the event are:

  • Saturday 30/09 to Sunday 01/10 – Pre-summit and Gerrit+Plugins Hackathon
  • Monday 02/10 to Tuesday 03/10 – Gerrit User Summit

New location, same community, and format

Even though this year the country and location will be different from the past User Summits hosted by Google in Mountain View – CA, we want to keep the organization and format of the Summit exactly as it was before:

  • User-driven: focused on sharing experiences and networking between the users of Gerrit Code Review
  • Self-organized by the Community: no calls for papers, sponsored talks, or product presentations. Everything has to come from the users and be voted on by the users.
  • No commercials: even though the supporters of the event are business entities (SkillsMatter, eSynergy, GerritForge and many others), we will be very careful to keep the spotlight on the users and not to interfere with them.

Costs and sponsorships

Historically, Google has paid for all the costs involved in the Summits (venue, catering, marketing, etc.). This year, a lot of companies who are using Gerrit Code Review and contributing to it have already expressed their interest in helping to cover the costs and make this European event successful.

We want to keep the list of sponsors small and tightly coupled with the Gerrit Community. If you are a company and you want to sponsor the event, please contact Luca Milanesio (luca@gerritforge.com) or post your offer to the Gerrit Code Review mailing list https://groups.google.com/forum/#!forum/repo-discuss.

Next steps

We will work together with the Gerrit Community to organize this event and make it fruitful and profitable for the future of this amazing OpenSource project.

Thanks again for your vote and your interest in the 2017 Summit.

Luca Milanesio – GerritForge

GerritHub.io powered by Polymer

polymer-logo

You can now use the Gerrit Code Review interface on GerritHub.io using the new Polymer-based UX. The code-name of the project is PolyGerrit.

Enabling Polymer is easy: just add “?polygerrit=1” to the regular GerritHub.io URL:

https://gerrithub.io/?polygerrit=1

Alternatively, you can click on the “PolyGerrit” link at the bottom of the GerritHub.io footer.

polygerrit-footer

What is Polymer and why does it matter?

Polymer is a new type of library for the web, built on top of Web Components and designed to leverage the evolving web platform on modern browsers. Unlike the traditional Gerrit GWT UX, Polymer is developed using HTML and JavaScript and is supported and cached natively by your web browser.

Why is Gerrit moving away from GWT? Well, there are many reasons for the change of direction:

  • Speed.
    With Web Components, the core building blocks of the UX are transferred once and then cached by the browser, giving a much faster and more fluid user experience.
  • Multi-sized screen support.
    Web Components can be “instructed” to render differently on different devices, giving the best user experience and accessibility for the device’s form factor and native interactions.
  • Branding.
    By using native HTML and CSS rendering, it becomes much easier to control and customize the look and feel of Gerrit.
  • Stability.
    Web Components development allows focusing on basic building blocks that are straightforward and testable, making it much easier to develop stable and robust components.
  • Skills-set.
    Typically, web developers find themselves more comfortable using a front-end language and have much more control over what gets generated, understood, and rendered by the web browser.

GerritHub.io on mobile

One interesting feature of PolyGerrit is the mobile view of Gerrit UX.

polygerrit-on-mobile

Gerrit’s GWT interface has always been quite a challenge on small-screen devices, mainly because of its inability to reorganize the page real estate based on the device’s form factor.

Additionally, the buttons did not have the real estate needed for “big thumbs”.

The ability to use Polymer has opened up a wide range of interaction possibilities and, for the first time, a real fully-featured review experience “on the road”.

Big thumbs up for the new Gerrit Mobile UX !

What and when?

What is currently implemented in the new Gerrit Polymer-based UX? Not much in terms of screens, but a lot of the core functionality of Code Review: the changes list and the review panels. What is missing? All the other Gerrit screens dedicated to the management of the platform and the account, the projects and groups listing, and all the support for UX plugins.

The PolyGerrit team announced at last year’s Gerrit User Summit 2016 that they are making good progress and expect to have the first “production-ready” release by Feb/Mar 2017. In the meantime, the code is stable and the current screens are good enough to be used on a daily basis.

Do I still need the GWT UX?

Yes, at least for the functionality that isn’t available yet in PolyGerrit. To switch back to GWT, simply add “?polygerrit=0” to the target URL.

https://gerrithub.io/?polygerrit=0

Credits

Many thanks to Google’s PolyGerrit team, led by Andrew Bonventre, which started the Polymer support for the Gerrit UX. A special thanks to David Ostrovsky, who implemented the Gerrit switcher support that enables using both the GWT UI and the PolyGerrit UI at the same time.

Gerrit User Summit 2016 Report

gerritusersummit-2016-google

The 9th edition of the Gerrit User Summit has ended today: a lot of new features and ideas have been presented and experimented with over those four days.
Here is a summary of what happened and a description of the main innovations and presentations at the Summit and Hackathon at the Googleplex in Mountain View – CA.

12th-13th of November – The Summit

Gerrit v2.13

David Pursehouse from CollabNet presented what is coming in Gerrit v2.12 and v2.13.
Gerrit v2.13 has been the biggest release of Gerrit ever, with over 2,600 commits from 83 different contributors. That shows that Gerrit is an OpenSource project with a growing contributors’ community, getting more diverse over time.

There have been a lot of new features and improvements introduced. The most significant and noteworthy features are …

Git LFS

Ability to store large files outside the Git repository in an externally pluggable repository, such as HDFS or AWS S3.
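On the client side, the standard Git LFS commands should apply unchanged; a minimal sketch, assuming the administrator has already enabled an LFS backend for the project (file names are examples only):

$ git lfs install                       # once per machine, enables the LFS filters
$ git lfs track "*.iso"                 # mark large binaries to be stored via LFS
$ git add .gitattributes disk-image.iso
$ git commit -m "Add disk image via Git LFS"
$ git push origin HEAD:refs/for/master  # the large blob goes to the LFS backend, not into Git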

Gerrit Metrics

New extension point to extract in real-time the performance data of Gerrit’s internal activity and publish it externally to Graphite, ElasticSearch, JConsole, or other similar graphing and monitoring tools.

Hooks

There is now the ability to replace the dated hook system with much more powerful in-process plugins. Existing hooks are still supported through the “hooks” core plugin. Gerrit is getting more modular and extensible.
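For installations relying on the legacy behavior, a classic hook script keeps working once the “hooks” core plugin is enabled; a minimal sketch, assuming $GERRIT_SITE points to your Gerrit site directory, the default hooks location is used, and the log path is just an example:

$ cat > $GERRIT_SITE/hooks/change-merged <<'EOF'
#!/bin/sh
# Example legacy hook: append every merged change and its arguments to a log
echo "$(date): $*" >> /var/log/gerrit/merged-changes.log
EOF
$ chmod +x $GERRIT_SITE/hooks/change-merged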

Improved review of Merge Commits

Gerrit can now implement an effective “merge-request” lifecycle on top of regular code review. You can create feature branches exactly as you do with GitHub or GitLab and then submit only the “merge request” as a change for review.

What’s cooking in Gerrit v2.14

PolyGerrit

After eight years, the Gerrit GUI needed a complete revamp and modernisation to compete with all the other web-based Code Review systems such as GitHub, GitLab, and BitBucket.

Google has allocated a brand new front-end team, located in San Francisco, to redesign and implement a brand-new user experience. It is much more than a simple rewrite; it is a complete rethinking of how to get the most out of your code review functionality with the optimal utilization of space.
The new UX will be more fluid and adaptable to different-sized screens, including tablets and even smartphones. It will then be possible to have a usable GUI even when accessing Gerrit “on the road.”

Suggested Reviewers

Gerrit is getting smarter: by using sophisticated heuristics, it can now suggest and rank the people most likely to be interested in reviewing a change. This saves time browsing for individuals and contributes to the accuracy of the reviews.
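The suggestions are also reachable programmatically through the suggest_reviewers REST endpoint; a minimal sketch using curl (host, credentials, change number, and query are examples only):

$ curl -u john:secret "https://gerrit.mycompany.com/a/changes/12345/suggest_reviewers?q=joh&n=5"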

New HTML e-mails interaction with Code Reviews

Say goodbye to long ugly diffs in e-mails: Gerrit can now generate tidy e-mails where all the details are carefully formatted for maximum clarity. Additionally, you will even be able to reply in-line to the comments received via e-mail, and Gerrit will capture the message and carefully insert the feedback as code-review comments.

Robot comments

CI and automated systems can contribute comments to the code without spamming the overall review discussion flow. That contributes to a much cleaner user interface.
Google uses automated code analysis tools such as Tricium and ShipShape to capture the most common coding mistakes and even provide the corresponding suggested fixes. All this feedback will be captured and integrated with Gerrit in the change comments GUI. The change’s author will then be able to review the suggested fix and apply it with a simple click.

The management of robot comments will be fully operational with the new PolyGerrit GUI.

ElasticSearch Indexes

Instead of using the local Lucene index, Gerrit can use an external ElasticSearch cluster to achieve maximum scalability and share the index across multiple nodes. That is one of the cornerstone features needed for a multi-master setup.

A lot of time has been spent in making sure that the ElasticSearch support was aligned with the current Lucene implementation, but unfortunately, it will take until v2.14 to fill the gap. There are still parts of the index not yet implemented in ElasticSearch, such as on-line reindexing. Hopefully, v2.14 will see all those gaps resolved and enable a real multi-master shared-index setup.
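Once the gap is filled, switching the indexing backend should be a matter of configuration; a hypothetical sketch of the relevant gerrit.config settings (the key names are an assumption and may differ in the final implementation):

$ # hypothetical: select ElasticSearch instead of the local Lucene index
$ git config -f $GERRIT_SITE/etc/gerrit.config index.type elasticsearch
$ git config -f $GERRIT_SITE/etc/gerrit.config elasticsearch.server http://es.mycompany.com:9200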

Atomicity with Change-Sets

Basavaraj Karadakal from Juniper Networks presented a real-life implementation of the “submit whole topic” in his organization. We always said that “topics” require some company-wide naming policy to avoid conflicts and unexpected behaviors: Juniper found the solution by introducing a new concept, the “changeset.”

The name isn’t new to people that have been in the SCM world long enough to remember IBM Rational ClearCase. Interestingly, Basavaraj used that term, possibly to facilitate the understanding and adoption into the company.

The idea behind a changeset is to have a single change review “leading” the topic submission, stored in a “tracking repository”. One can understand the list of changesets and their status by querying the “tracking repository”. This feature is fascinating because it allows obtaining a full list of “open topics” inside Gerrit, which isn’t currently available as an out-of-the-box feature.

The overall implementation of changesets is composed of a set of client wrapper scripts and a server-side plugin to orchestrate the review status of the changeset based on the review status of each commit included.

We foresee a lot of feedback in the community around changesets and topics: the discussion started a couple of years ago and everyone agreed that “something was missing” in the current Gerrit topic concept … possibly “changesets” are the missing piece of the jigsaw to make “submit whole topic” a feature ready for production use.

Pull Requests in Gerrit

Since v2.13, Gerrit allows full reviews of merge commits. However, the workflow isn’t obvious (a plain-Git sketch follows the list):

  • create a feature branch feature/abc
  • commit changes directly to the feature/abc branch, without requesting any review
  • once the branch is ready for review, merge it into its target branch locally and push the merge commit to refs/for/target-branch
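In plain Git commands, the whole flow looks like this minimal sketch (branch names are examples only):

$ git checkout -b feature/abc origin/master
$ # ...commit to feature/abc as usual...
$ git push origin feature/abc            # direct push, no review requested
$ git checkout master
$ git merge --no-ff feature/abc
$ git push origin HEAD:refs/for/master   # only the merge commit is put up for review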

Eryk Szymanski from CollabNet presented an alternative GUI to make this scenario, typical of GitLab, GitHub, and BitBucket, more accessible for novice users.

The new Pull-Request GUI is based on two components:

  1. REST API, different from Gerrit’s native ones
  2. Pull-Request GUI, not integrated with Gerrit’s GWT or PolyGerrit UX but available as a component of CollabNet’s TeamForge product.

Gerrit Analytics

Luca Milanesio from GerritForge introduced two new plugins in Gerrit 2.14:

  • Kafka stream events: to consume stream events and send them to Kafka topics
  • Project analytics: to crunch data from Gerrit repositories to expose aggregated contributor statistics

The vision of Luca Milanesio, GerritForge, is to split Gerrit statistics into three dimensions:

  • People. What is the list of contributors, maintainers, reviewers? What do they do, how often do they commit, and which ones are the most active influencers?
  • Projects. How are repositories evolving over time and space, and what do they need moving forward regarding storage and performance?
  • Systems. How is Gerrit behaving concerning CPU, memory, and network utilization?

Correlating the metrics collected across these three dimensions makes it possible to put together KPIs that help understand how Gerrit is used and how to improve its current and future performance.
Additionally, you could use them to reward and promote good code-review practices and fuel continuous innovation across the teams.

In one sentence, Gerrit Analytics will allow to “uncover the hidden value of your Gerrit Code Reviews”.

Plugins for CI Systems

Martin Fick from Qualcomm Innovation Center gave an overview of three new plugins developed for use by CI systems: the batch, manifest, and task plugins, along with some modifications to Gerrit core that they needed.

The batch plugin provides a mechanism for building and previewing sets of proposed updates to multiple projects/branches/refs that should be applied together. The focus of batch updates tends to be verification (by CI systems). The batch update service provides the tools to build refs by merging changes to temporary “snapshot” refs, which can then be tested extensively, and finally submitted “as is.”
That represents a different approach to resolving the “topic submission” problem: Martin’s plugin works with all Gerrit versions from v2.7 onwards.

The manifest plugin provides server-side utilities to operate on, and query information about repo manifests (XML) stored in git projects on the current server. This plugin provides APIs to update values in manifests, and to search for manifests with certain values.

The task plugin provides a mechanism to manage tasks which need to be performed on changes along with a way to expose and query this information. Tasks are organized hierarchically, and their definitions use Gerrit queries to define which changes each task applies to, and how to set the status criteria for each task. An important use case of the task plugin is to have a common place for CI systems to define which changes they will operate on, and when they will do so.

Gerrit CI Pipeline

Gerrit finally has a fully-featured CI pipeline, thanks to GerritForge, which invested time and resources to set up the https://gerrit-ci.gerritforge.com web site.

During the Gerrit Hackathon 2015, Luca Milanesio, Doug Kelly, and Khai Do put together their ideas and skills to introduce an entirely automated and community-driven Jenkins set-up, which generates the build jobs from the YAML files archived in the gerrit-ci-scripts project.

The magic behind the system is that nobody directly controls the definition and execution of the jobs: it is the collective contribution and code-review of the YAML files that re-configures the Jenkins instance automatically.

Over 30K builds were executed last year, with an average of over 80 builds per day. Thanks to the Gerrit CI, it is now possible to discover which plugins are available and stable on different branches of Gerrit and to download the pre-built artifacts with a simple click.

What is still missing is an official “distribution site” for the Gerrit plugins: a place where successful plugin builds are installed on real Gerrit instances, tested, and released. GerritForge is investing time and money to fill the gap and will provide a premium service called “Gerrit Central”, available as an add-on to existing Enterprise Customers. The fixes coming from this process of plugin validation will be contributed back to the community and will be accessible to everyone.

Gerrit Bazel build: a farewell to Buck

From v2.14, Gerrit is moving away from Buck and, after eight months of effort migrating the build system, David Ostrovsky has finally announced that the build is green and we are ready to move to Bazel, yeah!

Building Gerrit isn’t an easy task: 35 repositories, 400K lines of Java and 16K lines of JavaScript, over 140 dependencies, and a complete JavaScript toolchain based on NodeJS. In addition to that, there are over 100 plugins automatically built by our CI system hosted and managed by GerritForge.

Gerrit started initially with Maven, but it was very slow and clumsy. After the London Hackathon in 2013, the project was successfully migrated to Buck, which promised to be much faster and more flexible. However, this choice drove the project build into a swamp of problems and bugs/fixes needed on top of Buck to make the build work.
As a result, a custom version of Buck needed to be built using Ant for potentially every branch of Gerrit development. The Gerrit project has submitted over 100 bugs to the Buck project over its three years of adoption.

Bazel is an OpenSource build tool based on an internal system used at Google; it is much more advanced and stable, offering a simpler installation and an advanced sandboxing capability to isolate builds and results.
Differently from Buck, Bazel promises both speed and accuracy, qualities that were not always present at the same time.

During the Hackathon, David posted a milestone change named “Remove Buck build”, which is currently under review. We all hope that the change will be merged very soon to give the final farewell to the Buck build, with our thanks for having helped and supported us in our journey for over three years.

Gerrit reviews from e-mails

Patrick Hiesel from Google presented a very innovative feature that extends the ability to interact with Gerrit reviews by simply answering e-mails.

The feature is still in the very early stages of development, but the demo shown at the Summit was impressive. When you receive the e-mail notification from Gerrit with people’s comments on your code, you just answer the e-mail in-line and … the magic happens! Gerrit fetches the e-mail automatically and understands the in-line replies as comments to be added to the review.

This new workflow dramatically reduces the cycle time of interactions during the review and represents IMHO the “killer feature” to push Gerrit to the top of the Code Review systems currently on the market.

Let me recap what you have to do currently with Gerrit:

1. Receive a notification e-mail
2. Read the comments
3. Click on the change link
4. Open a browser at the change page
5. Sign in (assuming that you are using PolyGerrit, it would work on mobile too)
6. Locate the comment you want to reply to
7. Click on the comment
8. Click on Reply
9. Enter your Reply
10. Click save
11. Go back to the main change screen
12. Click on Reply
13. Click on Post

… and with the inbound e-mail ingestion in Gerrit it will be:

1. Receive a notification e-mail
2. Read the comments
3. Click on Reply
4. Insert your replies in-line
5. Send

There are issues, though, mostly related to the security and trustworthiness of e-mails as the vehicle for reviews.
We know that e-mails can be SPAM or can be forged by faking source e-mail addresses. Currently, the prototype shown was only matching the e-mail sender to associate it with the reviewer identity, which is fine for the majority of cases but could raise some concerns regarding compliance.

Reviews coming from e-mails will appear with a different style and will be noticeable compared to more secure web-based comments. That allows judging the comments accordingly and avoids over-reliance on their value.

It has also been suggested to consider the integration of PGP-signed e-mails and the matching of the PGP key with the one stored in Gerrit for that user: that would be the perfect secure solution to avoid the risks of SPAM or forged e-mails. However, Patrick did not commit to implementing the feature at this level for the initial MVP.

Inbound e-mail processing is a feature to keep on your radar, as it will drive a wave of Gerrit adoption by speeding up the entire Code Review lifecycle and consequently the time-to-market of project feature development.

Keep the conversation flowing with Gerrit at OpenStack

Khai Do, the maintainer of the Gerrit review installation at OpenStack, presented the challenges of managing comments in a large-scale project.

The OpenStack numbers for Code Review are astonishing:

  • Nine servers (1 master + 8 slaves)
  • 1500 repositories
  • 400K changes
  • 20K users
  • 700 events generated every hour

In addition to the scale of the installation, OpenStack allows third parties to connect to Gerrit using stream events and trigger validation builds. This third-party CI workflow is code-named TCPI at OpenStack. There are currently 180 registered TCPIs, which generate *a lot* of messages appended to each change under review.
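A third-party CI typically subscribes to those stream events over SSH; a minimal sketch (host and user are examples only):

$ ssh -p 29418 third-party-ci@review.mycompany.com gerrit stream-events
$ # emits one JSON event per line: patchset-created, comment-added, change-merged, ...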

The overload of information associated with a change makes the “discussion flow” hard to follow in Gerrit: each change is swamped by CI messages, which makes the overall review unreadable.

Khai has developed a plugin that avoids storing all those messages as regular reviews in Gerrit and collects them instead in a relational database. The overall result is then nicely rendered in a different section of the change, making the review discussion visible and flowing again.

The plugin works out of the box on the current version of Gerrit. Well done Khai.

Infinite Gerrit

Haithem Jarraya from GerritForge presented the idea of adopting a much more scalable storage backend for JGit based on Apache Cassandra. Shawn Pearce experimented with the same scenario a few years ago and contributed an initial prototype on GitHub: piggybacking on the same concept, the new experiments are now driven by the need to scale Gerrit up in space and number of nodes.

Gerrit Multi-Master is getting closer day-by-day, thanks to the joint efforts of all contributors from different points of view:

  • Sessions can be externalized to a flat file to support sharing across nodes (Qualcomm contribution)
  • Indexes and Caches can be aligned across nodes through events (Ericsson contribution)
  • ElasticSearch will soon be available as an alternative shared indexing engine (Sony & CollabNet contributions)

Haithem is now adding one extra significant piece of the jigsaw: the ability to have a scalable, shared, and extensible storage that allows all the master nodes of the cluster to read/write JGit pack files at maximum speed. Additionally, thanks to Apache Cassandra multi-node data sharing and replication, you will be able to dynamically extend and reallocate the size and structure of the storage without impacting the current operations on the Gerrit cluster.

The Cassandra backend wouldn’t cover the ref-database, which could be implemented with an entirely distributed implementation such as JGit Ketch (still in early development) or a more standard one based on Apache Zookeeper.

Accounts and status of NoteDB

The task of migrating all the Code Review meta-data to the Git repository, aka NoteDB, started over two years ago. Edwin Kempin from Google presented the current status of the implementation, with a focus on the accounts migration.

In Gerrit v2.13 there is already a new repository dedicated to the archiving of users, called All-Users. However, some of the information is still stored in ReviewDb, such as the external IDs. Accounts will be exposed via the REST API, and sensitive information will be encrypted end-to-end on the repository, using the same technique used for secure.config.

A new “magic ref” concept has been specifically forged for the All-Users use-case: users reference their branches as refs/users/self. Each user will be able to get their meta-data by simply fetching refs/users/self from the All-Users repository; easy, isn’t it?
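In Git terms, that boils down to a minimal sketch like this (the remote URL is an example only):

$ git clone https://gerrit.mycompany.com/a/All-Users && cd All-Users
$ git fetch origin refs/users/self
$ git log -p FETCH_HEAD   # your own account meta-data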

There is still one piece of information that will stay outside NoteDB: the reviewed flag. The choice was made because of the inherent volatility of that information: NoteDB would have been overkill for a piece of information that typically lasts a few minutes or at most hours. The default implementation will be H2, but it could then be replaced with something more “multi-master” aware.

What’s left? Gerrit groups are the very last piece of the jigsaw, and then NoteDB will finally be 100% feature complete. Bye-bye database, NoteDB is approaching fast!

Zero-Downtime Gerrit Upgrades

Luca Milanesio from GerritForge closed the Summit with a case study on how GerritHub.io achieved a 100% zero-downtime migration process, with limited disruption to its 10K users.

Apart from Google, the current implementations of Gerrit Code Review around the world do not allow keeping the service running during upgrades between major releases. That was true until GerritHub.io recently introduced a well-tested roll-out mechanism inspired by the well-known “blue-green” deployment.

The approach adopted by GitHub in the case of DB upgrades is to advise the customers up-front and schedule a planned outage of the system. The last time this happened was last year, and the blackout lasted for around an hour. GerritHub.io, however, decided to be different and developed a strategy where the old version and the new version work side-by-side by synchronizing Git repositories and reviews, even across different versions of the DB schema and index.

The “preparation phase”, or Step 1 as Luca described it, may take even days or weeks to complete because it relies on low-bandwidth mirroring to avoid service disruption for the current user base. Once the two instances are aligned, the subsequent steps take only a few minutes.

During the “flipping” of the blue (old release) and green (new release) instances, users will notice that the repositories become read-only and receive a courtesy message saying “Upgrade is in progress, all projects are read-only. Please retry a few minutes later”.

The read-only “service degradation” would last only a few minutes (2 to 5) and wouldn’t cause a service disruption for all the other users that are fetching, cloning, or just browsing the reviews.

The Gerrit Hackathon

gerrithackathon-2016

Even though this year we had only two “hacking days,” the overall productivity was remarkable: 158 changes created, with 82% of them merged during the Hackathon.

Bazel Build

Authors: David Ostrovsky, Luca Milanesio, Edwin Kempin, Han-Wen Nienhuys

Changes: https://gerrit-review.googlesource.com/#/q/project:gerrit+after:2016-11-10+before:2016-11-17+message:bazel

The summary of the changes is perfectly represented by David’s announcement to the Gerrit mailing list:

“I’m pleased to announce, that we are moving from Buck to Bazel build tool [1].
I’m working for a while on the transition now, and we are reaching the feature
parity for Bazel build toolchain.

I think that Bazel is a better tool for us and provides the following advantages:

* all actions are hermetic by design (no arbitrary python code execution)
* bootstrapping doesn’t require Ant
* can be installed on the main operating systems, including Windows
* better support for re-using of build rules across local and remote repositories
* switching to different branches wouldn’t mean rebuilding build tool
* extensible rules
* failed tests are not cached
* flaky tests are re-run multiple times
* error prone static analyzer integration out of the box

I would like to thank Shawn for supporting this project, and reviewers/contributors:
Han-Wen Nienhuys, David Pursehouse and Damien Martin-Guillerez from Bazel team.

Also big thanks to Luca and GerritForge for helping out with the CI and running a *lot* of tests.
Here is my related talk from Gerrit User Summit 2016: [2].

[1] https://bazel.build
[2] http://ostrovsky.org/gerrit/bazel-build-gerrit”

Allow plugins to define “has” search operands

Author: Martin Fick

Changes: https://gerrit-review.googlesource.com/#/c/91514/

Plugins can define new search operands to extend change searching.
Plugin methods implementing search operands (returning a
`Predicate<ChangeData>`) must be defined on a class implementing one
of the `ChangeQueryBuilder.ChangeOperandFactory` interfaces (e.g.,
`ChangeQueryBuilder.ChangeHasOperandFactory`). The specific
`ChangeOperandFactory` class must also be bound to the `DynamicSet`
from a module’s `configure()` method in the plugin.

The new operand, when used in a search would appear as:
operatorName:operandName_pluginName

A sample `ChangeHasOperandFactory` class implementing, and registering, a
new `has:sample_pluginName` operand is shown below:

 @Singleton
 public class SampleHasOperand implements ChangeHasOperandFactory {
   public static class Module extends AbstractModule {
     @Override
     protected void configure() {
       bind(ChangeHasOperandFactory.class)
           .annotatedWith(Exports.named("sample"))
           .to(SampleHasOperand.class);
     }
   }

   @Override
   public Predicate<ChangeData> create(ChangeQueryBuilder builder)
       throws QueryParseException {
     return new HasSamplePredicate();
   }
 }
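Once such a plugin is installed, the new operand can be exercised like any other search term; a minimal sketch over SSH, assuming the plugin is called “myplugin” (host and names are examples only):

$ ssh -p 29418 admin@gerrit.mycompany.com gerrit query "has:sample_myplugin"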

Zoekt

Author: Han-Wen Nienhuys

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-11-10+project:zoekt

Han-Wen officially announced at the Gerrit Hackathon the general availability of the Zoekt code search engine.

“Hi there,

I’d like to announce “zoekt”, a search engine for source code I wrote over the last month or so. It was made to work well with (large) Git based repositories, so I think some of you may find it useful.

If you’re interested in code search, please try it out, and let me know about your experiences.

http://github.com/google/zoekt
http://gerrit.googlesource.com/zoekt

I didn’t do any work to try and integrate it with Gerrit/Gitiles, but
if someone wants to invest effort in making that happen, I’d be
happy to review CLs.”

ElasticSearch Index

Authors: Hugo Ares, Dariusz Łuksza

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-11-10+project:gerrit+message:elasticsearch

Small steps towards the goal of having ElasticSearch integrated as Gerrit’s secondary index:

  • On-line reindexing (in progress)
  • Accounts indexing (in progress)

Plugin examples

Author: Martin Fick

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-11-10+project:plugins/examples

Over the years, the Gerrit cookbook has become a mix of different types of plugin functionality. That is good and bad at the same time. A typical “cookbook” should have a set of recipes for obtaining a final dish ready to eat … whilst ours was more like a mixed salad with a bit of everything.

Martin started the noteworthy task of reorganizing the cookbook into a set of self-contained examples, hopefully easier to manage and with a clearer purpose.

Cassandra Repository Manager

Author: Haithem Jarraya

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-11-10+project:libs/modules/repomanager/cassandra

Haithem started a new change (still in draft at the moment) for plugging an alternative JGit DFS implementation into the Gerrit repositories. It is the very first time that such a challenging goal has been attempted outside Google’s data-centers and BigTable/GFS infrastructure. If we succeed in completing the implementation and demonstrating that the resulting architecture is more robust and scalable than the local file-system, it would be the first 100% OpenSource distributed JGit backend ever.

Gerrit LibModules

Author: Luca Milanesio

Changes: https://gerrit-review.googlesource.com/#/q/message:libModule+after:2016-11-10+project:gerrit

Luca discussed how to “plug” an external repository manager into Gerrit: it is something “similar” to a plugin but more fundamental to the system. Differently from a plugin, it needs to satisfy the following requirements:

  • Loaded at start-up time only
  • Cannot be unloaded
  • Must be injected in the primary Gerrit sys injector

It is not the first time that we have needed to introduce a “modular piece of software” which isn’t part of Gerrit Code Review:

  • Java CryptoProvider – BouncyCastle
  • Encrypted secure store for gerrit.config
  • Servlet filter for authentication and SSO
  • JDBC Interceptor for JavaMelody
  • Multi-hosted Git repository directories

Instead of creating “yet another exception” in Gerrit, we discussed amongst the hackathon members and agreed to come up with a new concept: LibModules.

The rationale behind the name is:

  • They need to be loaded from the same Gerrit class path from /lib
  • They provide extra Guice modules to be added to the sys injector at start-up

The configuration of a set of LibModules in Gerrit is straightforward:

[gerrit]
installModule = com.mycompany.MyModule
installModule = com.googlesource.gerrit.modules.FancyModule

Now that the change is merged and the new concept has become part of Gerrit, we are expecting a growing number of LibModules to become available.

Removal of references to LocalDiskRepositoryManager

Authors: Hugo Ares, Shawn Pearce

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-10-11+project:gerrit+message:LocalDiskRepositoryManager

Shawn and Hugo started removing the explicit references in Gerrit to the local filesystem for resolving Git repositories. Those changes are needed to fully support an alternative implementation of a repository manager that is no longer based on a local filesystem.

DynamicBeans for Gerrit Plugins

Author: Martin Fick

Changes: https://gerrit-review.googlesource.com/#/q/before:2016-11-17+after:2016-11-10+project:gerrit+message:DynamicBeans

Until now, it has been possible to create new SSH commands or REST APIs through plugins, which is great but sometimes not enough. Martin came up with a use-case where he needed to extend the current core commands without necessarily changing their code. How can you satisfy this use-case with plugins? Martin has the solution, and it is called “DynamicBeans”.

It is definitely an interesting use-case that is going to trigger a lot of interest and discussions amongst the Gerrit Community.

Analytics plugin

Author: Luca Milanesio

Changes: https://gerrit-review.googlesource.com/#/q/after:2016-11-10+project:plugins/analytics

One of the things that people miss in Gerrit, when comparing it to GitHub, is the overall repository statistics:

  • List of contributors with their number of commits / additions / removals
  • Project statistics on the number of commits over time

The new analytics plugin extends the Gerrit project’s SSH and REST API to expose aggregated data extraction. It is extremely fast compared to GitHub because it works purely in streaming and doesn’t actually load the data in memory. The amount of data generated by the analytics can be huge and is intended to be consumed by an external analysis and dashboard system.
Differently from the typical Gerrit API, the returned result has one item per line instead of one big JSON object. The rationale is that, to generate and process a set of JSON elements in streaming, you need to make each record self-contained and processable independently. That also allows batch-processing with a parallel system, without having to parse the entire file content all at once.

Example of how to extract the list of contributors out of a Gerrit project via SSH:

$ ssh -p 29418 admin@gerrit.mycompany.com analytics contributors

{"name":"John Doe","email":"john.doe@mycompany.com","num_commits":1,"commits":[{"sha1":"6a1f73738071e299f600017d99f7252d41b96b4b","date":"Apr 28, 2011 5:13:14 AM","merge":false}]}
{"name":"Matt Smith","email":"matt.smith@mycompany.com","num_commits":1,"commits":[{"sha1":"54527e7e3086758a23e3b069f183db6415aca304","date":"Sep 8, 2015 3:11:23 AM","merge":true}]}

Wrap-up and Summit in 2017

It has been a very intense and revolutionary Gerrit User Summit & Hackathon this year, with a lot of new faces, new contributors, and new features coming in the forthcoming releases.

At the summit, Shawn Pearce and Luca Milanesio announced the proposal to organize the event in Europe next year, possibly in June/July or Oct/Nov 2017. The reason for changing the location is to gather feedback from more Gerrit users that have not been heard so far and to increase the participation of the community in the future of the project.

The summit has already been endorsed by the major organizations driving the Gerrit Code Review project and by other very active sponsors of the OpenSource community, including CloudBees, CollabNet, eSynergy, Google, GerritForge, SAP, and SkillsMatter, which has offered to host the event at CodeNode.

Shawn Pearce, the Gerrit Code Review project founder, will send out a survey in the next few weeks to capture the Gerrit community’s opinion on this proposal.

See you in 2017 for new and amazing features coming on Gerrit Code Review.

Luca Milanesio – GerritForge.

Gerrit User Summit 2016: Ten days to go

gerritforge-keepcalmandcodereview-copy

In ten days’ time, the 2016 Gerrit User Summit will open its doors at the Googleplex Campus in Mountain View – CA.

It is going to be an amazing event, with a lot of exciting updates from the Gerrit community, thanks to the number of innovations coming into the platform: large file support (Git LFS), the HTML5 Polymer-based new UX, NoteDB, multi-master updates, and a lot more.

GerritForge will be present with five attendees and six amazing talks, covering many aspects of the world of Code Review and the Continuous Delivery Pipeline:

  • Code Review Analytics
  • High Availability and Zero-Downtime upgrades
  • Infinite scalability with Gerrit on Apache Cassandra
  • Jenkins 2.0 Continuous Delivery pipeline for Gerrit
  • Speeding up builds with Bazel on Gerrit
  • Robot feedback management for Gerrit

Once again, as we always have in the past, we continue to bring new ideas and innovation to the world of Code Review, with feedback from the Gerrit users’ community, and developing the ideas as OpenSource, always.

See you soon at the Gerrit User Summit 2016.

Gerrit Summit 2016 is coming

google

Four weeks from now, the eighth edition of the Gerrit User Summit will open its doors at Google HQ in Mountain View – CA, on the 12th-13th of November 2016.
It has been a long journey since the first GitTogether in 2008, and after the split between the Git[Hub Universe] summit and the traditional “unconference”-style Gerrit event at Google’s, things have changed quite a lot. While Gerrit has remained a 100% OpenSource user-centric project, GitHub has attracted $350M in VC funding and over the years has lost interest in joining the unconference-style events.

What’s new this year?

For the first time, the proposals of talks for the Gerrit User Summit are submitted directly in Gerrit (yeah!) on the summit/2016 repository.

The list of currently approved talks is available by searching for “status:merged project:summit/2016” (https://gerrit-review.googlesource.com/#/q/status:merged+project:summit/2016)
The talks awaiting review are under “status:open project:summit/2016” (https://gerrit-review.googlesource.com/#/q/status:open+project:summit/2016)

How cool is that? I foresee already a Doodle plugin for Gerrit 😉

How to register for the User Summit?

Shawn Pearce has prepared a registration form for you to sign up for the event:
https://goo.gl/forms/oeEnQweHl2noNSnn1

Once you access the registration form at the above URL, you need to sign in with your Google Account credentials and then complete the following information:
– Your name
– Your Organisation
– Your previous attendance to the user summit
– Any dietary restrictions

The User Summit is FREE for EVERYONE, including novice users of Git and Gerrit Code Review, but you need to register beforehand.
The Summit is a unique opportunity to learn about new Gerrit features, contribute to the product roadmap with your needs and requirements and, most of all, network with other users to learn about new use-cases where Gerrit can be very helpful.

How to submit my talk proposal?

Well, you need to demonstrate a good understanding and use of Gerrit Code Review if you want to teach and talk to other people about it! At the end of the day, if you want to talk about Gerrit, you should be able to clone a repository and submit a patch to a project 🙂

If you need just a little help … see my “Diffy super super talk” example:

$ git clone https://gerrit.googlesource.com/summit/2016 && (cd 2016 && curl -Lo `git rev-parse --git-dir`/hooks/commit-msg https://gerrit-review.googlesource.com/tools/hooks/commit-msg ; chmod +x `git rev-parse --git-dir`/hooks/commit-msg)
$ cd 2016
$ cat - > sessions/my-amazing-talk.md
# My amazing talk at Gerrit User Summit

Hi folks, this is my super-duper-talk. You should be interested in it as I will unleash the dark force of Code Review Diffy Kung Fu Review Cuckoo.

*Diffy, Birds & CO. Inc.*
^D
$ git add sessions/my-amazing-talk.md && git commit -m "Diffy super-duper talk"
$ git push origin HEAD:refs/for/master

Talk highlights

There are already some fascinating talks submitted and approved, and more will undoubtedly come in the next couple of weeks. We will start sharing some highlights of what’s happening at the conference. Here is an overview of the first talks.

What’s new in Gerrit 2.12 and 2.13

Two major versions of Gerrit have been released since the last summit in 2015, and they contain significant improvements to the platform:

  • Topic submission workflow – aka Git commits across repositories (v2.12).
    Group multiple changes in a “topic” and have them merged as a whole, even across multiple repositories, in a single submit operation.
  • GPG signed push verification (v2.12).
    Allows people to upload their GPG public keys into Gerrit and have them used to verify signed Git commits.
  • Large File Storage support (Git LFS) (v2.13).
    Gerrit finally supports the automatic management of large files outside the Git repository. The feature is fully pluggable and exposed via plugins. Amazon S3 and local file system support are available at the moment, but more plugins are to come for this feature.
  • Gerrit metrics (v2.13).
    Expose the internal metrics to external consumers. The extension allows plugins to gather this data and send it to external systems for analysis and visualization purposes. Graphite, ElasticSearch, and JMX plugins are available.
  • Hooks plugin (v2.13).
    Finally, the Gerrit hooks mechanism has been entirely externalized and implemented in a pluggable way. The legacy hooks have become a core plugin. You can now also develop a new generation of hooks by leveraging the new extension points provided.
  • New HTML5 UX with Web Components – PolyGerrit preview (v2.13).
    The next generation of the Gerrit UX based on Polymer Web Components is available. Even though not complete, it offers a sneak preview of what the new interface looks like and, if you like it as-is and it is good enough for your use-cases, you can enable it and start using it already. Both the GWT and the Polymer-based UX use the same REST API, and thus the changes generated and reviewed with them are 100% interoperable.

There is more to come.

In the next few days, we will keep publishing the highlights of the topics coming to the Gerrit User Summit this year. Stay tuned and REGISTER NOW at:
https://goo.gl/forms/oeEnQweHl2noNSnn1

The GerritForge Team.

Gerrit Upgraded with No Downtime

Screen Shot 2016-03-21 at 20.48.12

Zero DownTime success story.

From today at 08:06 GMT, GerritHub users are served by our brand new infrastructure geo-located in Beauharnois, Quebec, Canada. It is the first time we have applied a zero-downtime roll-out scheme: the Pingdom uptime for the past 24h reported 100% uptime and a 688 msec average response time for the page listing open changes. The two response-time spikes on the above graphs are actually due to the old German infrastructure and happened before the start of the roll-out.

We can see the switch of the traffic to the new infrastructure from the increase in the overall response time (IP packets were routed from Germany to Canada, causing extra hops); as the DNS propagation spread across the world, the overall number of hops gradually came back to normal.

Timeline of the events.

  • 08:00:00 GMT – Phase 1 – Set Gerrit READ-ONLY. All changes and Git repositories started to refuse push and updates.
  • 08:00:01 GMT – Phase 2 – Wait for pending replication to complete. Replication queue was empty; there was no need to wait.
  • 08:00:02 GMT – Phase 3 – Mirror DB and Git for the last time, delta-reindex, DB upgrade and Gerrit restart. It has been the longest part of the roll-out and lasted 5′ 32”, aligned with our estimates.
  • 08:05:34 GMT – Phase 4 – Cache warm-up. 20K projects, 8K accounts and 4.6K groups were pre-loaded in Gerrit. This step was optional but allowed us to redirect all the traffic without risks of causing thread spikes on the new infrastructure.
  • 08:06:23 GMT – Phase 5 – Redirect traffic to the new infrastructure.

Did anybody notice the rollout?

During the rollout, the Git projects and Gerrit changes were read-only for 6′ 23″. According to the logs, 493 Git/HTTP and 172 Git/SSH invocations were made and completed successfully: none of them failed.

What is the situation right now?

The new infrastructure public IP (192.99.233.76) has almost completed its DNS propagation around the world; the only countries not entirely covered are Australia and China. The rest of the world is coming directly to Canada, avoiding the German hops. Metrics are good: low CPU utilization and thread consumption compared to the old German infrastructure, a symptom of the reduction in execution and serving times and latency.

What’s next?

From now on, we will continue to use this Blue/Green roll-out strategy, possibly improving the read-only window by introducing live distributed reindexing and cache warm-up.

We are fully committed to zero downtime and stability, the most valuable assets for our clients.

GerritHub and Zero-Downtime Upgrade

GerritHub gets bigger on Mon, 21 March 08:00 GMT


GerritHub has experienced unprecedented growth over the past two years. The November 2015 numbers presented at the Google User Summit in Mountain View – CA have been surpassed again, and we do need to make sure that our infrastructure is still capable of dealing with current and future users’ needs.

What is changing in GerritHub.io?

We are changing everything, from the version of Gerrit to the hardware, network, and storage infrastructure. Data, DBMS, indexes, and caches need to be upgraded and refreshed to make sure that the new systems reflect exactly the current production data and sessions.
We are also changing the geo-location of our servers, from the current server farm in Germany (Bayern, Nuremberg – 100 MBps) to a new server farm in Canada (Quebec, Beauharnois – 1 GBps).

Why so many changes?

We started to measure some significant delay in the Git and review operations on the old infrastructure, mainly due to three factors:

  1. More users, more repositories, more concurrency. Individuals, OpenSource projects, and businesses started using GerritHub.io for their mission-critical repositories, considering Gerrit the “source of truth” of their review workflow. We needed more horsepower, memory, storage, and the ability to scale even further.
  2. Bandwidth from the USA and Far East. The majority of people using GerritHub.io are on the other side of the Atlantic Ocean: this is typically not a problem from 7 AM to 3 PM … but after 4 PM the connectivity between Europe and the Americas becomes slow. Additionally, people using GerritHub.io from India, Japan, Australia, and New Zealand experienced terrible slowdowns because of the excessive number of hops needed to reach Germany.
  3. Gerrit master is much faster. Based on the current data and metrics measured on GerritHub.io, we have contributed a lot of patches to reduce the overhead caused by the Gerrit DB and lessen the number of SQL queries per minute. All those improvements are on Gerrit master, and we need to catch up with the “latest and greatest” version.

Will I experience any GerritHub.io outage?

The last time GitHub needed to make a major upgrade, it asked its 5M users to stop working for 23 minutes. This translates to a loss of two million hours of continuous delivery lifecycle, equivalent to over 130 man-years, worth no less than eight million dollars.
We are going to adopt a new zero-downtime Gerrit roll-out strategy to make sure that all those changes are not going to impact your day-by-day activity. If you were not reading this post, you would possibly not even notice the “switch” from the old to the new infrastructure, apart from the increase in speed and bandwidth.

Here is the zero-downtime GerritHub.io migration, step by step, with the associated expected timings.

Phase 0 – Replication to the new Gerrit infrastructure (started 1 month ago)
We started migrating everything one month ago, and the old and new infrastructures are working side-by-side, thanks to Gerrit master-slave replication. The new Gerrit servers are active as slaves and are read-only.

Phase 1 – Migration kick-off. (08:00 GMT)
We install a Gerrit plugin that rejects all pushes to GerritHub.io repositories with a courtesy message: “Gerrit is under maintenance, all projects are READ ONLY”. All HTTP POST, PUT, and DELETE operations are disabled on the Gerrit REST API.

Phase 2 – Wait for replication events to complete and migrate DB. (08:02 GMT)
Git repositories are continuously replicated, but we need to make sure that the event queue is empty. Once that happens, we schedule the final DB migration to the new infrastructure.

Phase 3 – Gerrit DB upgrade and reindex (08:04 GMT)
The new Gerrit server executes the final upgrade and an off-line reindex of the latest received changes.

Phase 4 – Gerrit start-up and cache warm-up (08:05 GMT)
The new Gerrit is restarted, and the most critical Gerrit caches (projects, accounts, and groups) are pre-loaded in memory. This allows the incoming traffic spike to be absorbed without exhausting the available threads and makes the transition as smooth as possible, without slowdowns.

Phase 5 – Traffic switch and DNS updates (08:06 GMT)
GerritHub.io redirects all incoming HTTPS and SSH traffic to the new infrastructure. Git pushes and the HTTP PUT, POST, and DELETE operations of the REST API are operational again and served by the new Gerrit infrastructure. The GerritHub.io DNS is updated to the new Canadian IPs.

Phase 6 – New IPs get propagated to all worldwide DNSs (+ 1 day)
Once all the DNSs in the world have been updated, everyone will go directly to the new infrastructure without further hops or redirection from Germany. Customers from the USA, Canada, South America, Asia, Japan, New Zealand, and Australia should see a significant reduction in network latency and an increase in GerritHub.io responsiveness.

Firewall and SSH considerations

Even though the Gerrit server’s SSH key is not changing, some of you may see a warning similar to this when pushing or pulling over SSH:

Warning: the RSA host key for ‘review.gerrithub.io’ differs from the key for the IP address ‘148.251.77.70’

The warning message will also tell you which lines in your ~/.ssh/known_hosts need to change. Open that file in your favorite editor, remove or comment out those lines, then retry your push or pull.
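Alternatively, the standard OpenSSH tooling can drop the stale entries in one go; a minimal sketch:

$ ssh-keygen -R review.gerrithub.io
$ ssh-keygen -R 148.251.77.70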

Should your network have strict firewall rules for accessing external sites, you may want to whitelist the IP of the new infrastructure’s WLB: 192.99.233.76.

Follow GerritHub.io migration progress.

We will advertise the migration progress on Twitter at @GitEnterprise. Should you have any issues, you can tweet us or contact GerritForge Customer Support at support@gerritforge.com.

GerritForge helps Gerrit 3.0 stability

Gerrit 3.0 plan announced: we need stabilisation now

Screen Shot 2015-11-23 at 09.52.19

The Gerrit 3.0 plan and its NoteDB reviews have been officially announced at the Gerrit User Summit 2015. NoteDB is already available as an experimental feature in the current Gerrit master, but it needs much more stability in order to be officially supported for production.
GerritForge decided to help and reuse its existing continuous integration system to validate every Gerrit patch-set against both the current and the new NoteDB review persistence back-ends, in order to avoid regressions during the 2.13 and 3.0 development.

Pre-commit validation by GerritForge CI

If you have posted a patch to gerrit-review.googlesource.com in November, you may have hopefully received a Verified+1 from a strange user with a Diffy logo on the side.
The GerritForge-provided CI on gerrit-ci.gerritforge.com automatically fetches every patch-set pushed to gerrit-review.googlesource.com and triggers a slightly modified Gerrit build to check whether the code change introduces a regression or not. This may seem at first sight a quite normal Gerrit-to-Jenkins job integration; however, implementing it on top of Google’s multi-master replicated installation was not a piece of cake.

Gerrit Trigger plugin limitations on multi-master setups

Jenkins already has an out-of-the-box integration with Gerrit provided by the Gerrit Trigger plugin maintained by Robert Sandell – CloudBees. It leverages the Gerrit stream events through an SSH channel and makes use of the Gerrit REST API to action them according to the build result.
Google’s Gerrit setup, however, is not a trivial one-node installation and is further limited by the security constraints of the Google infrastructure, which does not allow any incoming SSH connectivity.
Additionally, the whole concept of “getting the events in a stream” isn’t going to work when events can come concurrently from multiple places at the same time: who is going to define the “global ordering”, and how do you put all those events into a single TCP/IP socket? Even UDP would not work in this case, because an SSH channel requires confidentiality between two, and only two, peers.

Alternatives to SSH

During the hackathon, other approaches were discussed by Shawn Pearce, including the use of HTTP WebSockets (or Cometd) for fetching events without the need for an SSH connection. Events are still distributed and generated by multiple masters all the time, and the Jenkins plugin would then have the onus of contacting all the Gerrit servers and keeping a connection open to each of them. This is clearly not going to work, because the number of servers, their IPs, and their locations may change at any time, and the solution would eventually be in danger of losing precious events.

Back to polling

The only solution we envisaged was to fall back to a polling logic where Jenkins asks Gerrit every 10 minutes: “what’s new since the last time we spoke?”. This solution goes against the main reason the Gerrit Trigger plugin was designed: avoiding SCM polling. It is, however, a much better and more optimised polling strategy; let’s see why.

Query and then fetch

Typical Git SCM polling relies on fetching all the references at every poll interval and detecting whether new Git commits are available. This is notably slow and generates a huge overhead on the Git server. The approach we took is quite different and makes use of the Gerrit search capabilities, which are way faster and more powerful than a simple Git fetch.
Jenkins first asks Gerrit for the list of changes and associated commit IDs involved in any event since the last polling time: the result may include patch sets that have already been built, which avoids leaving any gaps between polling intervals. The search is fast and implemented in … you know, Google is a search company, isn’t it?
Once the list of candidate commit IDs is identified, Jenkins goes through all of them and checks, using the Gerrit REST API:
– has it been built during a previous execution?
– has it already been accepted (or rejected) by me?
The commit IDs that have not been checked before and are not yet validated are then used to trigger a specific job parametrised on:
– the specific branch
– the specific change refspec
Fetching is performed without any wildcards, so the corresponding load on the Git server is minimal. Fetch (Git protocol) + build (using Buck) + test (unit + integration) + review feedback (REST API) takes an average of 5 minutes, which is an amazing result if you consider the size of the Gerrit project and the typically slow speed of a default Jenkins Git fetch.
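To make the flow above concrete, here is a minimal sketch of the query-then-fetch logic using the public Gerrit REST API. The credentials, the 15-minute overlap window and the two stub functions are illustrative assumptions; the real CI keeps its build and vote history inside Jenkins:

```python
import json
import requests

GERRIT = "https://gerrit-review.googlesource.com"
AUTH = ("ci-user", "http-password")  # hypothetical HTTP credentials

def gerrit_get(path):
    # Gerrit prefixes every JSON response with a ")]}'" magic line.
    response = requests.get(GERRIT + "/a" + path, auth=AUTH)
    response.raise_for_status()
    return json.loads(response.text.split("\n", 1)[1])

def already_validated(sha1):
    return False  # stub: the real job checks its own build/vote history

def trigger_build(branch, refspec):
    print(f"would build {refspec} on branch {branch}")  # stub

# 1. Query: changes touched within (slightly more than) the last polling
#    interval; the overlap deliberately re-includes already-built patch
#    sets so that no event falls between two polls.
for change in gerrit_get("/changes/?q=-age:15m&o=CURRENT_REVISION"):
    sha1 = change["current_revision"]
    revision = change["revisions"][sha1]
    # 2. Filter out commit IDs already built or already voted on.
    if already_validated(sha1):
        continue
    # 3. Trigger a job parametrised on the exact branch and change
    #    refspec, so the subsequent fetch needs no wildcards.
    trigger_build(change["branch"], revision["ref"])  # e.g. refs/changes/34/1234/2
```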

The bottom line

The query + fetch approach, which seemed a bit slow and old-fashioned at the beginning, turned out to be very simple and successful. Instead of setting up SSH host-key verification, key exchange and ad-hoc channels, the only configuration needed is a valid Gerrit user and the HTTPS endpoint URL, the same one used for cloning the code.
The solution is also much more reliable, as SSH channels are notably unstable and consume server threads. The only drawback is the slight delay between the patch-set upload and the start of the build (at most 10 minutes), which is acceptable in most cases.

Results

Since its roll-out, more than 1,200 patches have been checked and rated, a lot of potential Gerrit regressions have been avoided and, more importantly, we have prevented the NoteDB code from diverging in stability from the current mainstream development.

How can I re-trigger validation for a single change?

We have enabled anyone to trigger ad-hoc executions of the Gerrit validation flow using the following URL:
https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-flow/build
This is a standard Jenkins parametrised build that requests the change to be built, identified by either its SHA-1 or its number. Once the job is triggered, the build is executed and the validation feedback is applied to your change, regardless of any previous build or validation status.
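The same job can also be triggered programmatically through Jenkins’ standard buildWithParameters endpoint; a hedged sketch (the parameter name below is a guess, the job’s build page shows the real one):

```python
import requests

# Hypothetical parameter name; check the Jenkins build form for the
# actual one expected by the Gerrit-verifier-flow job.
requests.post(
    "https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-flow/buildWithParameters",
    data={"CHANGE_ID": "169870"},  # a change number or SHA-1
).raise_for_status()
```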

GerritHub user-controlled GitHub Scopes

Nowadays people are very careful about privacy and user data: nobody grants access to their profile without first checking the possible consequences.
We want to give users the ability to always know and control what level of access is granted to their data: that’s why we have improved the way you log in to GerritHub.io.

GitHub scopes: what are they?

GitHub provides authentication and access to the user’s profile using a protocol called OAuth 2.0. When GerritHub asks a user to authenticate, it is granted a set of permissions to operate on behalf of the user on their GitHub resources, which include:

  • User’s personal data (name, e-mails)
  • User’s membership to organisations and teams
  • User’s repositories

The set of permissions to access and operate on your data is also known as “Scope” in GitHub terms.
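For illustration, the scope is simply a parameter in the OAuth 2.0 authorisation URL that GitHub presents when an application asks you to log in; the client_id below is a placeholder and the scopes shown are just an example:

```
https://github.com/login/oauth/authorize?client_id=<app-client-id>&scope=user:email%20repo
```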

How is GerritHub helping me to control my access?

As of today, GerritHub has a new “Scope Selection” screen with two main objectives:

  1. Displaying your current scope and associated rights GerritHub has on your GitHub profile
  2. Giving you the ability to switch to a different “Scope” and consequently change the rights that GerritHub has on your profile data

Screen Shot 2015-10-08 at 10.16.53

Transparency is good, but what is the practical added value?

There has been a common complaint in the past about GerritHub having either too much or too little access to your GitHub profile:

  • Too much? Why does GerritHub.io need access to my e-mail address? Why does GerritHub need to see my public keys?
  • Too little? Why does GerritHub not show my private repositories in the import screen? How can I see my organisation membership in the GerritHub project security screen?

With the ability to visualise and change the current “Scope”, people can now be more aware of why things are not showing up, and can make conscious decisions about how to change them, with full transparency about the associated implications.

A common scenario: importing and accessing private GitHub Organisations, Teams and Repositories.

When you need to import an existing private GitHub project, you need to access information that is not publicly available:

  • Your membership of a private organisation
  • Your ownership of a Team structure
  • The ability to clone and push your private organisations’ repositories

There is now a special information box suggesting that you have the ability to change your “Scope” if you don’t see the organisations and repositories you want to import.

Screen Shot 2015-10-08 at 10.22.12

After changing the scope, you can then log in again and you will have an improved set of options to get more data and repositories from your GitHub account.

Like it? Will you use it on a daily basis?

We are eager to get your feedback on this new feature: Tell us what you think and let us know what you would change or add to the set of “Scope” permissions.

Gerrit Code Review and Jenkins Continuous Delivery Pipeline on BigData

Gerrit at the Jenkins User Conference 2015 – London

For the very first time, CloudBees organised a full User Conference in London, and we were very pleased to present a real-life case study of Continuous Integration and Continuous Delivery applied to a large-scale BigData project.

Below is a summary of the overall presentation, published in the YouTube video above.

The trap of the BigData production phase

BigData has historically been used by data scientists to analyse data and extract features that are relevant for the business. This has typically been a very interactive process happening mostly in “notebook-style” environments, where almost everything, from ad-hoc queries to graphs, can be edited and executed interactively. This early stage of the process is typically known as the “exploration” or “prototype analysis” phase. It sometimes lasts only a few days, but often it becomes the day-by-day modus operandi.

However, when the exploration phase is over, projects need to be rewritten or adapted using a programming language (Scala, Python or Java), with transformations and aggregations expressed as jobs. During the “production-isation” phase, the code needs to be properly written and tested to be suitable for production.

Many projects fall into the trap of reducing the “production phase” to a mere translation of notebooks (or spreadsheets) into Scala, Java or Python code, relying only on manual analysis of the resulting data as the sole testing methodology. The lack of software engineering practices generates complex monolithic code that is difficult to maintain, to understand and thus to validate: the agility of the initial “exploration” phase is then miserably lost in the translation into production code.

Why Continuous Delivery on BigData?

We have approached the development of BigData projects in a radically different way: instead of simply relying on existing tools, which are often not enough for setting up a proper Agile Delivery Pipeline, we introduced brand-new frameworks and applied them to the building blocks of a Continuous Delivery pipeline.

This is how Stefano Galarraga started the ScaldingUnit project, aimed at decomposing the development of complex Scalding MapReduce jobs into simple and testable units.

We then started to benefit from the improved agility and speed of delivery, giving constant feedback to data scientists and delivering constant value to the Business stakeholders during the production phase. The talk presented at the Jenkins User Conference 2015 is a smaller-scale showcase of the pipeline we created for our large clients.

Continuous Delivery Pipeline Building Blocks

In order to build a robust continuous delivery pipeline, we need a robust code base to start with: it seems a bit obvious, but it is often forgotten. The only way to create a stable code base, collectively developed and shared across different [distributed] Teams, is to adopt a robust code review lifecycle.

Gerrit Code Review is the most robust and scalable collaboration system that allows distributed teams to submit their changes and provide valuable feedback about the building blocks of the BigData solution. Data scientists can participate as well during the early stages of the production code development, giving suggestions and insight on the solution whilst it is still in progress.

Docker provided the pipeline with the ability to define a set of “standard disposable systems” to host the real-life components of the target runtime, from Oracle to a BigData CDH Cluster.

Jenkins Continuous Integration is the glue that coordinates all the different actors of the pipeline, activating the builds based on the stream events received from Gerrit Code Review and orchestrating the activation of the integration test environments on Docker.

Mesos and Marathon managed all the physical resources to allow a balanced allocation of all the Docker containers across the cluster. Everything has been managed through Mesos / Marathon, including the Gerrit and Jenkins services.
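As a purely illustrative sketch (the image name and resource figures are made up), a Marathon application definition for running one of these services as a Docker container is just a small JSON document posted to the Marathon REST API:

```json
{
  "id": "/pipeline/gerrit",
  "instances": 1,
  "cpus": 2,
  "mem": 4096,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/gerrit:2.11",
      "network": "BRIDGE"
    }
  }
}
```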

Pipeline flow – Pushing a new change to Gerrit Code Review

The BigData pipeline starts when a new piece of code is changed on the local development environment. Typically developers test local changes using the IDE and the Hadoop “local mode” which allows the local machine to “simulate” the behaviour of the runtime cluster.

Local mode testing is typically good enough for running unit tests, but it is often unable to detect problems (e.g. non-serialisable objects, compression, performance) that are likely to appear only in the target BigData cluster. Allowing a code change to be pushed to a target branch without having been tested on a real cluster represents a potential risk of breaking the continuous delivery pipeline.

Gerrit Code Review allows the change to be committed and pushed to the Server repository and built on Jenkins Continuous Integration before the code is actually merged into the master branch (pre-commit validation).

Pipeline flow – Build and Unit-tests execution

Jenkins uses the Gerrit Trigger Plugin to fetch the code currently under review (which is not on master but on an open change) and triggers the standard Scala SBT build. This phase is typically very fast, taking only a few seconds to complete and provide the first validation feedback to Gerrit Code Review (Verified +1).
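Under the hood, “fetching the code currently under review” means fetching the change refspec rather than a branch; a minimal sketch with hypothetical change and patch-set numbers:

```
git fetch origin refs/changes/34/1234/2
git checkout FETCH_HEAD
sbt clean test
```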

Until now we haven’t done anything special or different from a normal git-flow based continuous integration: we pushed our code and had it validated in Jenkins before merging it to master. You could actually implement the pipeline up to this point using GitHub Pull Requests or similar.

Pipeline flow – Integration test automation with a real BigData Cloudera CDH Cluster

Instead of considering the change “good enough” after a unit-test validation phase and then automatically merging it, we wanted to go through a further validation on a real cluster. We have completely automated the provisioning of a fully featured Cloudera CDH BigData cluster for running our change under review with the real Hadoop components.

In a typical pipeline, integration tests in a BigData cluster are executed *after* the code is merged, mainly because of the intrinsic latencies associated with provisioning a proper, reproducible integration environment. How, then, can the integration phase be sped up without necessarily blocking the development of new features?

We introduced Docker with Mesos / Marathon to have a much more flexible and intelligent management of the virtual resources: without having to virtualise the hardware, we were able to spawn new Docker instances in seconds instead of minutes! Additionally, the provisioning was coordinated by the Docker Build Step Jenkins plugin to allow the orchestration of the integration test execution and the feedback on Gerrit Code Review.

Whenever an integration test phase succeeded or failed, Jenkins would then submit “Integrated +1/-1” feedback to the original Gerrit Code Review change that triggered the test.
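A sketch of how such a post-build step could report the custom “Integrated” label back through Gerrit’s standard set-review REST endpoint (the host, credentials and IDs below are placeholders, and the “Integrated” label itself must be configured in the Gerrit project):

```python
import requests

def post_integration_feedback(change_id, score, message):
    # POST /a/changes/{id}/revisions/current/review with a label vote;
    # all identifiers here are illustrative.
    requests.post(
        f"https://gerrit.example.com/a/changes/{change_id}"
        "/revisions/current/review",
        auth=("jenkins", "http-password"),
        json={"labels": {"Integrated": score}, "message": message},
    ).raise_for_status()

post_integration_feedback("12345", +1,
                          "Integration tests passed on the CDH cluster")
```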

Pipeline flow – Change submission and release

When a change has received both Verified +1 (build + unit tests successful) and Integrated +1 (integration tests successful), it is definitely ready to be reviewed and submitted to the master branch. The additional commit on master then triggers the final release build, which tags the code and uploads it to Nexus, ready to be elected for production.

Pipeline flow – Rollout to production

The decision to roll out a new change to production is typically enabled by a continuous delivery pipeline but manually operated by the Business stakeholders. Even though we could *potentially* roll out every change, we did not necessarily want to do that, because of the associated business implications.

Our approach was then to publish to Nexus all the potential *candidates* for production and roll them out to a pre-production environment, ready to be assessed by data scientists and the Business in real time. The daily job scheduler had a configuration parameter that simply “pointed” to the version of the code to run every day. In this way, whatever is deployed to Nexus is potentially fully working in production, and rolling out or rolling back a release is just a matter of changing a label in the daily job scheduler.
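As a purely illustrative example of this “pointer” approach (the property name is made up), the scheduler configuration can be as small as a single line, making rollout and rollback one-line changes:

```
# daily job scheduler configuration (illustrative)
pipeline.artifact.version=1.4.2
```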

Summary

Building a Continuous Delivery Pipeline for BigData has been a lot of fun and improved the agility of the Business in rolling out changes more quickly without having to compromise on features or stability.

When using a traditional Continuous Integration pipeline, the different stages (build + unit tests, integration tests, system tests, rollout) all happen on the target branch, causing it to be amber or red at times: whenever tests fail, the pipeline needs to be restarted from the beginning and people are blocked.

By adopting a Code Review-driven Continuous Integration pipeline we managed to get the best of both worlds: we avoided artificial and distracting feature branches while still keeping the ability to validate the code at each stage of the pipeline, reporting the results back to the original change and the associated developer without compromising the stability of the target branch.

Resources

The slides of the talk are published on SlideShare.

All the Docker images used during the presentation are available on GitHub: