Gerrit User Summit: Script plugins with Docker

My name is Luca Milanesio, and I work for GerritForge. My talk today is about plugins and how to create them using scripting languages.

Gerrit plugins, where it all began

My contribution to the Gerrit Code Review project started in 2011 with the introduction of plugins. To understand where we are coming from we need to back to those times when the project was just born a year earlier. Gerrit was mighty since its very beginning, and different companies that used and contributed to the tool had tailored the code base to their specific needs. When I joined the GitTogether conference in 2011, almost every user was talking about their fork of Gerrit. Forking is excellent especially in OpenSource because you can customise a project as much as you want and, we were all excited about the growing popularity of GitHub and forking was a popular concept. However, keeping a fork up-to-date is not as easy as you may initially envision. Moving on with the upstream releases is hard when you are working on a fork.

Back in 2011 when I was at the conference, I thought: “how can the Gerrit project evolve and grow if we are all working on forks?”. My way to convince the Gerrit Community to change that status quo was inviting Kohsuke Kawaguchi, the Jenkins CI project founder, to the summit. Jenkins CI is wholly based on plugins while the core does not do much: the plugins are making the whole thing work as a CI.

That was enough to convince the community that a change was needed and, during the next Hackathon in 2012, I wrote the initial version of the Gerrit plugin loader and the first “Hello world” Gerrit plugin was born.

The introduction of scripting languages

After two years, Gerrit had only 50 plugins. If you had looked at Jenkins, at that time they had over 600 plugins, ten times as many plugins compared to Gerrit. Writing a new plugin for Gerrit was still too hard for most developers and administrators.

To develop a new Gerrit plugin you needed to know way too many things and have many skills: a different build system (Buck and now Bazel), having a full development environment and all the required dependent packages.

Screen Shot 2017-11-21 at 09.38.07.png

We still had new plugins because some people went through the initial pain of setting up the environment. However, for a project to thrive, you need to get people together and embrace a diversity of skills to allow people to give the best of their knowledge.
Maybe the typical Gerrit admin is not a Java Developer, possibly could be more familiar with Groovy because the Ruby syntax is used a lot of DevOps tools. Others are more familiar with Python, and if you accept what they can contribute, the project can benefit from many more experiences from different people and backgrounds.

What does the community think about it?

Once I shared my ideas with the community, the feedback was great. However, different people with different backgrounds started asking to use very different languages, ranging from Scala to Groovy and Python. Then I realized that supporting one scripting language would not have been good enough for most of the people.

“Hello world” in Groovy

To give you an idea of how easy is to write a new plugin in Groovy, see the following example.

import com.google.gerrit.sshd.*
import com.google.gerrit.extensions.annotations.*

@Export("groovy")
class GroovyCommand extends SshCommand {
  public void run() { stdout.println "Hi from Groovy" } 
}

It is straightforward to write scripting plugins: put the above content in a hello-1.0.groovy file in the Gerrit’s /plugins directory and as soon as the file is saved the plugin is there and will be loaded in Gerrit within a few seconds.

The way that Gerrit recognize this file being a plugin is through its .groovy extension. The file name denotes both the plugin name and its version, delimited by the ‘hyphen’ on the filename. In this example the file hello-1.0.groovy identify a plugin called ‘hello’ with a ‘1.0’ version.

One warning about Groovy: it is a language that relies on Java Reflection for method invocation. Reflection is a capability of the Java Runtime and enables methods discovery which is handy to use but is slower than a native Java language.
The drawback of the ease of use of the Groovy language is the CPU cycles at runtime.

The beauty of using a scripting language for plugins is the speedup of the development cycle: as soon as you edit the Groovy file on the file system, the old plugin is unloaded and the new one loaded in Gerrit. The plugin development lifecycle becomes so much faster compared to the traditional Java application development.

Develop Scripting plugins using Docker

Slide01.jpg

Gerrit is provided as a Docker image on DockerHub. The ‘gerritcodereview’ organization has an image name called ‘gerrit’ with all the versions available denoted as tags since Ver. 2.14. Earlier versions of Gerrit docker images are available on the ‘gerritforge’ DockerHub organization.

In the following example I am running Gerrit 2.14.4 on Docker fetching the image directly from DockerHub:

docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit:2.14.4

In the above example, Gerrit is exposed through HTTP on port 8080 and exposes its SSH interface at port 29418.

Docker is a system that allows running containers, which are application “packaged” with everything needed, including other components of libraries of the underlying operating system. The only requirement on your physical host is the Docker engine, which exists nowadays for MacOS and Windows other than Linux where it was originally designed. Whatever operating system you are running on your laptop, Docker is there.

Docker can be handy for all the contributors that are not familiar with Gerrit Development Environment. There is no need to know or install anything on the local box, other than running the Gerrit Docker container. When I am running Gerrit in this way in this example, it starts straight away, with zero installation steps or configuration.

Gerrit out-of-the-box experience

The second significant value of the Gerrit Docker container is that includes an out-of-the-box configuration, a welcome screen, and the plugin manager. It consists already a set of components that, if you are not familiar with Gerrit, will help you a lot to understand what is Gerrit and how to use it.

As you can see from this screen, Gerrit has started, and if you navigate to http://localhost:8080, it shows you an initial welcome screen.

Screen Shot 2017-11-21 at 09.42.10.png

Historically the very first screen, once you have installed Gerrit, was a blank screen. I remember a few years ago people coming to me saying that as new Gerrit users they were quite confused: they just did not know what to do with the initial blank screen. In Gerrit Docker, the initial screen is a “Welcome” which is a beautiful thing to say to people that you did not know that came to your house. Additionally, it provides some useful links and information to install plugins, which is very important because Gerrit without plugins is missing some fundamental parts of its functionality.

Playing with Gerrit Plugin Manager

By clicking the “Install plugins” button, you reach the Gerrit Plugin Manager screen. For all of those who are familiar with Jenkins, it provides precisely the same functionality as in Jenkins. If you type ‘groovy’ in the search bar, you can easily find where the Groovy scripting provider is, and you can install it with a simple click. That is the plugin you need to tell Gerrit that from now on, every file in the /plugins directory with a .groovy extension is a plugin that needs to be parsed and loaded at runtime.

Screen Shot 2017-11-21 at 09.44.14.png

You can discover and install other plugins as well. For instance, typing ‘github’ would list the integration of Gerrit with GitHub authentication and pull requests, or typing ‘jira’ would return the association and workflow integration with Jira Tickets.
The plugin manager is a fantastic discovery mechanism to understand what are the integrations available for Gerrit Code Review.

The plugin manager automatically discovers the versions of the plugins that are compatible with the Gerrit you are currently running and, when you click ‘Install’, it downloads them and installs them locally. When you are done, just click on the top-right link “Go To Gerrit” and you are straight into Gerrit UX.

How we have a running Gerrit instance that has installed all the plugins I need, including the support for Groovy plugins.

Writing plugins in Scala

If you need want to leverage the Gerrit scripting plugins, but you need optimal performance at runtime, you can use a different scripting language such as Scala.

GerritUserSummit-2017-Scala.png

The Scala language allows compiling into the native Java bytecode; it does not use reflection for method calls and, for some operations could be even faster than the Java language itself. See the same hello world example but rewritten in Scala.

import com.google.gerrit.sshd._
import com.google.gerrit.extensions.annotations._

@Export("scala")
class ScalaCommand extends SshCommand {
  override def run = stdout println "Hi from Scala" 
}

When I showed this to the community people got so excited and started writing tons of scripting plugins.

What scripting plugins do in Gerrit?

Admin tasks as SSH commands

Sometimes Gerrit admins need to automate specific tasks, however, coding an external script could be slower and difficult to implement. Inside Gerrit, there are already a lot of objects which represent pre-processed in-memory entities ready to be used. It makes sense to leverage all the information that is in-memory already and write new SSH commands like Scripting plugins to control admin tasks remotely.

Scripted REST API

At times you need as well to tailor existing Gerrit REST API to your needs. For instance, imagine that your company has specific policies for requesting new repositories: why not then creating a new ‘Create Project’ REST API tailored for your needs using the Scripting plugins and expose it through a company HTML form? You can do it without the need to be an experienced Java or Gerrit contributor and using a simple Groovy script for the new REST API.

Low-footprint hooks events

A third option is fascinating because, before the introduction of Gerrit plugins, the only way to react to Gerrit events was through hooks or stream events. Hooks are a traditional Git mechanism and, in Gerrit, have a scalability problem: they are invoked for every project and every event that happens anywhere and spawn a different asynchronous process. Over time the extra processes created can cause a significant overhead for your super-busy Gerrit server.
When a hook script needs to read from the Git repository, it would then need to process from scratch the packfiles from the local filesystem, uncompress and parse them in memory over and over again, which could slow down your server significantly.
If you are implementing Gerrit events using plugins, the same processing could be ten or even hundreds times faster.

 

 

 

 

Gerrit Code Review and Jenkins Continuous Delivery Pipeline on BigData

Gerrit at the Jenkins User Conference 2015 – London

For the very first time, CloudBees organised a full User Conference in London and we have been very pleased to speak to present a real-life case-study of Continuous Integration and Continuous Delivery applied to a large-scale BigData Project.

See below a summary of the overall presentation published on the above YouTube video.

The trap of the BigData production phase

BigData has been historically used by data scientists in order to analyse data and extract  features that are relevant for the business. This has typically been a very interactive process happing mostly on “notebook-style” environments where almost everything, from ad-hoc queries and graphs, could have been edited and executed interactively. This early stage of the process is typically known as “exploration” or “prototype analysis” phase. Sometimes last only a few days but often is used as day-by-day modus operandi.

However when the exploration phase is over, projects needed to be rewritten or adapted using a programming language (Scala, Python or Java) and transformations and aggregations expressed in jobs. During the “production-isation” phase code needs to be properly written and tested to be suitable for production.

Many projects fall into the trap of reducing the “production phase” to a mere translation of notebooks (or spreadsheets) into Scala, Java or Python code, relying only on the manual analysis of the resulting data as unique testing methodology. The lack of software engineering practices generates complex monolithic code,  difficult to maintain, to understand and thus to validate: the agility of the initial “exploration” phase was then miserably lost in the translation into production code.

Why Continuous Delivery on BigData?

We have approached the development of BigData projects in a radically different way: instead of simply relying on existing tools, often not enough for setting up a proper Agile Delivery Pipeline, we introduced brand-new frameworks and applied them to the building blocks of a Continuous Delivery pipeline.

This is how Stefano Galarraga wrote started the ScaldingUnit project, aimed in de-composing the development of complex Scalding MapReduce jobs in simple and testable units.

We started then to benefit from the improved Agility and speed of delivery, giving constant feedback to data-scientists and delivering constant value to the Business stakeholders during the production phase. The talk presented at the Jenkins User Conference 2015 is smaller-scale show-case of the pipeline we created for our large clients.

Continuous Delivery Pipeline Building Blocks

In order to build a robust continuous delivery pipeline, we do need a robust code-base to start with: seems a bit obvious but is often forgotten. The only way to create a stable code-base,  collectively developed and shared across different [distributed] Teams, is to adopt a robust code review lifecycle.

Gerrit Code Review is the most robust and scalable collaboration system that allows distributed teams to submit their changes and provide valuable feedback about the building blocks of the BigData solution. Data scientists can participate as well during the early stage of the production code development, giving suggestions and insight on the solution whilst is still in progress.

Docker provided the pipeline with the ability to define a set of “standard disposable systems” to host the real-life components of the target runtime, from Oracle to a BigData CDH Cluster.

Jenkins Continuous Integration is the glue that allowed coordinating all the different actors of the pipeline, activating the builds based on the stream events received from Gerrit Code Review and orchestrating the activation of the integration test environments on Docker.

Mesos and Marathon managed all the physical resources to allow a balanced allocation of all the Docker containers across the cluster. Everything has been managed through Mesos / Marathon, including the Gerrit and Jenkins services.

Pipeline flow – Pushing a new change to Gerrit Code Review

The BigData pipeline starts when a new piece of code is changed on the local development environment. Typically developers test local changes using the IDE and the Hadoop “local mode” which allows the local machine to “simulate” the behaviour of the runtime cluster.

The local mode testing is typically good enough for running unit-tests but often is unable to detect problems (e.g. non-serialisable objects, compression, performance) that are likely to appear in the target BigData cluster only. Allowing to push a code change to a target branch without having tested on a real cluster represent a potential risk of breaking the continuous delivery pipeline.

Gerrit Code Review allows the change to be committed and pushed to the Server repository and built on Jenkins Continuous Integration before the code is actually merged into the master branch (pre-commit validation).

Pipeline flow – Build and Unit-tests execution

Jenkins uses the Gerrit Trigger Plugin to fetch the code currently under review (which is not on master but on an open change) and triggers the standard Scala SBT build. This phase is typically very fast and takes only a few seconds to complete and provide the first validation feedback to Gerrit Code Review (Verified +1).

Until now we haven’t done anything special of different than a normal git-flow based continuous integration: we pushed our code and we got it validated in Jenkins before merging it to master. You could actually implement the pipeline until this point using GitHub Pull Requests or similar.

Pipeline flow – Integration test automation with a real BigData Cloudera CDH Cluster

Instead of considering the change “good enough” after a unit-test validation phase and then automatically merging it, we wanted to go through a further validation on a real cluster. We have completely automated the provisioning of a fully featured Cloudera CDH BigData cluster for running our change under review with the real Hadoop components.

In a typical pipeline, integration tests in a BigData Cluster are executed *after* the code is merged, mainly because of the intrinsic latencies associated to the provisioning of a proper reproducible integration environment. How then to speed-up the integration phase without necessarily blocking the development of new features?

We introduced Docker with Mesos / Marathon to have a much more flexible and intelligent management of the virtual resources: without having to virtualise the Hardware we were able to spawn new Docker instances in seconds instead of minutes ! Additionally the provisioning was coordinated by the Docker Build Step Jenkins plugin to allow the orchestration of the integration tests execution and the feedback on Gerrit Code Review.

Whenever an integration test phase succeeded or failed, Jenkins would have then submitted an “Integrated +1/-1” feedback to the original Gerrit Code Review change that triggered the test.

Pipeline flow – Change submission and release

When a change has received the Verified+1 (build + unit-tests successful) and Integrated+1 (integration-tests successful) is definitely ready to be reviewed and submitted to the master branch. The additional commit triggers the final release build that tags the code and uploads it to Nexus ready to be elected for production.

Pipeline flow – Rollout to production

The decision to rollout to production with a new change is typically enabled by a continuous delivery pipeline but manually operated by the Business stakeholders. Even though we could *potentially* rollout every change, we did not want *necessarily* do that because of the associated business implications.

Our approach was then to publish to Nexus all the potential *candidates* to production and roll-them-out to a pre-production environment, ready to be assessed by Data-Scientists and Business in real-time. The daily job scheduler had a configuration parameter that simply allowed to “pointing” to the version of the code to run every day. In this way whatever is deployed to Nexus is potentially fully working in production and rollout or rollback a release is just a matter of changing a label in the daily job scheduler.

Summary

Building a Continuous Delivery Pipeline for BigData has been a lot of fun and improved the agility of the Business in rolling out changes more quickly without having to compromise on features or stability.

When using a traditional Continuous Integration pipeline, the different stages (build + unit-test, integration-tests, system-tests, rollout) are all happening on the target branch causing it to be amber or red at times: whenever tests are failing the pipeline need to be restarted from start and people are blocked.

By adopting a Code Review-driven Continuous Integration Pipeline we managed to get the best of both worlds, avoiding feature branches but still keeping the ability to validate the code at each stage of the pipeline and reporting it back to the original change and the associated developer without to compromise the stability of the target branch or introducing artificial and distracting feature branches.

Resources

The slides of the talk are published on SlideShare.

All the docker images used during the presentation are available on GitHub:

No more tears with Gerrit Code Review, thanks to Docker

docker-gerrit
Gerrit has finally landed on Docker!
The first official set of Docker images have been uploaded and are available on DockerHub.

Gerrit with its default configuration is now available for CentOS 7 and Ubuntu 15.04

Thanks to the new Gerrit installer project a new set of native distribution means are becoming available: starting from the native packages for Linux to the Docker images published today.

Why Docker?

Well, why not? Docker is an amazing technology for packaging an application with its dependencies and activating it in a sandbox virtualised environment. It allows a more effective isolation of an application container and at the same time assures to have a clear separation between the application distribution and its data.

Additionally for those who want to run more than one Gerrit instance at a time, it allows to define specific QoS for the activated docker containers and assign to them an internal IP and ports to be routed in a multi-hosted environment.

How can I get Gerrit on Docker?

Well, it is simpler than you can imagine. Once you’ve installed Docker on your Linux box you just need to execute the following commands:

To download and run Gerrit on CentOS 7:

$ docker pull gerritforge/gerrit-centos7
$ docker run -d -p 8080:8080 -p 29418:29418 \
  gerritforge/gerrit-centos7

To download Gerrit and run Ubuntu 15.04:

$ docker pull gerritforge/gerrit-ubuntu15.04
$ docker run -d -p 8080:8080 -p 29418:29418 \
  gerritforge/gerrit-ubuntu15.04

Gerrit will be started inside a Docker container and will be exposed on ports 8080 and 29418 on the host machine IP.

How can I customise my Docker container?

Gerrit Dockerfiles are available in the gerrit-installer project and can be easily customised and tailored to your needs. A much better idea however would be to generate a new Dockerfile that starts from Dockerhub image and then change your Gerrit configuration files and steps to perform your desired set-up.

Dockerfile sample for Gerrit listening on HTTP port 8090:

FROM gerritforge/gerrit-centos7
MAINTAINER GerritForge

USER gerrit
RUN git config -f /var/gerrit/etc/gerrit.config httpd.listenurl http://*:8090/
EXPOSE 29418 8090

# Start Gerrit
CMD /var/gerrit/bin/gerrit.sh start && tail -f /var/gerrit/logs/error_log

How can I install a specific Gerrit Docker version?

All Docker images published are associated to a specific Gerrit tag representing the version installed on that image. The default is always the latest Gerrit version that in this case is 2.11.

To download and run a Gerrit 2.10.3.1 Docker image on CentOS 7:

$ docker pull gerritforge/gerrit-centos7:2.10.3.1
$ docker run -d -p 8080:8080 -p 29418:29418 \
  gerritforge/gerrit-centos7:2.10.3.1

What about having typical Gerrit configurations  as Dockerfiles?

There are a lot of possible Gerrit configuration settings but the most typical ones are:

  • LDAP authentication
  • OAuth with Google / GitHub authentication
  • Master / Slave with Git over SSH
  • Master / Slave with Git over HTTPS
  • PostgreSQL DB
  • MySQL DB

All the above settings can be represented by a set of Dockerfiles similar to the one above mentioned: they will all start from a plain Gerrit Docker image (e.g. gerritforge/gerrit-centos7) and follow with the amended settings.

Where can I find the “pre-digested” Dockerfiles for the typical Gerrit configurations?

We are planning to enrich the Gerrit installer project with all the above typical scenarios and publish the associated Dockerfiles so that people can “pick&mix” the perfect recipe for a flawless installation.

Have fun with Gerrit and Docker, no more installation tears with Gerrit!