Jenkins and Gerrit are the most critical components of the DevOps pipeline because they focus on the people (the developers), their code and collaboration (the code review), and their builds and tests (the Jenkinsfile pipeline) that produce the value stream delivered to the end user.
DevOps is all about iteration and fast feedback. That can be achieved by automating the build and verification of code changes into a target environment, giving all stakeholders early access to what the feature will look like, and validating the results with speed and quality at the same time.
Every development team wants to shorten its cycle time and spend less time on tedious work by automating as much of it as possible. That trend has created an explosion of fully automated processes, called “bots”, that are increasingly responsible for performing the tasks developers are not interested in doing manually over and over again.
As a result, developers do more creative and design work, are more motivated and productive, can address technical debt much sooner, and allow the business to move faster in more innovative directions.
As more and more companies adopt DevOps, it becomes more important to be better and faster than your competitors. The most effective way to accelerate is to extract your data, understand where your bottlenecks are, experiment with changes and measure progress.
Humans vs. Bots
The Gerrit Code Review project is fully based on an automated DevOps pipeline using Jenkins. We constantly collect the data produced during the development and testing of the platform and extract metrics and graphs from it at https://analytics.gerrithub.io, thanks to the open-source solution Gerrit DevOps Analytics (aka GDA).
By looking at the protocol and code statistics, we found out that bots are much harder workers than humans on GerritHub.io, which hosts, apart from the mirrored Gerrit Code Review projects, many other popular open-source projects.
That should not come as a surprise if you think of how many activities could potentially happen whenever a PatchSet is submitted in Gerrit: style checking, static code analysis, unit and integration testing, etc.
We also noticed that most of the bots’ activity happens over SSH. We started to analyze what the bots are doing, assess their impact on our service and see whether there are any improvements we can make.
Build integration, the wrong way
GerritHub has an active site with multiple nodes serving read/write traffic and a disaster recovery site ready to take over whenever the active one has any problem.
Whenever we roll out a new version of Gerrit, we swap the roles of the two sites using the so-called ping-pong technique (see here for more details). Within the same site, traffic can also jump from one node to another in the cluster using active failover, based on health, load and availability. The issue is that we end up in a situation like the following:
The “old” instance still served SSH traffic after the switch. We noticed we had loads of long-lived SSH connections. These are mostly integration tools keeping SSH connections open listening to Gerrit events.
Long-lived SSH connections have several issues:
SSH traffic doesn’t allow smart routing; hence we end up with HTTP traffic going to the currently active node while most of the SSH traffic stays on the old one
There is no resource pooling since the connections are not released
There is the potential loss of events when restarting the connections
That impacts the overall stability of the software delivery lifecycle, extending the feedback loop and slowing your DevOps pipeline down.
Then we started to drill down into the stateful connections to understand why they exist, where they are coming from and, most importantly, which part of the pipeline they belong to.
Jenkins Integration use-case
The Gerrit Trigger plugin for Jenkins is one of the integration tools that has historically suffered from those problems, and unfortunately, its initially tight integration has over the years become less effective, slower and more complex to use.
There are mainly two options to integrate Jenkins with Gerrit:
We use both of them with the Gerrit Code Review project, and we have put together a summary of how they compare to each other:
| | Gerrit Trigger Plugin | Gerrit Code Review Plugin | Notes |
|---|---|---|---|
| Trigger mechanism | Stateful: Jenkins listens for the Gerrit events stream | Stateless: Gerrit webhooks notify events to Jenkins | Stateful stream events consume resources on both Jenkins and Gerrit |
| Transport protocol | SSH session on a long-lived stream-events connection | HTTP call for each individual event | SSH cannot be load-balanced; SSH connections cannot be pooled or reused |
| Setup complexity | Hard: requires node-level and project-level configuration; no native Jenkinsfile pipeline integration | Easy: no special knowledge required; integrates natively with Jenkinsfile and multi-branch pipelines | Configuring the Gerrit Trigger Plugin is more error-prone because it requires a lot of parameters and settings |
| Systems dependencies | Tightly coupled with Gerrit versions and plugins | Uses Gerrit as a generic Git server, loosely coupled | An upgrade of Gerrit might break the Gerrit Trigger Plugin integration |
| Gerrit knowledge | Admin: you need to know a lot of Gerrit-specific settings to integrate with Jenkins | User: you only need to know the Gerrit clone URL and credentials | The Gerrit Trigger Plugin requires a special user and permissions to listen to the Gerrit stream events |
| Fault tolerance to Jenkins restart | Missed events, unless you install a server-side DB to capture and replay them | Transparent: all events are sent as soon as Jenkins is back | The Gerrit webhook automatically tracks and retries events transparently |
| Tolerance to Gerrit rolling restart | Events stuck: Gerrit events are stuck until the connection is reset | Transparent: any of the active Gerrit nodes continues to send events | The Gerrit Trigger Plugin is forced to terminate the stream with a watchdog, but will still miss events |
| Differentiate actions per stage | No flexibility to tailor the Gerrit labels to each stage of the Jenkinsfile pipeline | Full access to Gerrit labels and comments in the Jenkinsfile pipeline | |
| Multi-branch support | Custom: you need to use the Gerrit Trigger Plugin environment variables to check out the right branch | Native: integrates with multi-branch projects and Jenkinsfile pipelines without any special setup | |
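To give an idea of what the stateless integration looks like in practice, here is a minimal sketch of a Jenkinsfile for a multi-branch project using the Gerrit Code Review plugin for Jenkins. The stage names, shell scripts and label values are illustrative, not taken from the Gerrit project’s actual pipeline; check the plugin documentation for the exact step parameters supported by your version.

```groovy
// Jenkinsfile (declarative), discovered by a Jenkins multi-branch project that
// uses the Gerrit Code Review plugin as its branch source.
pipeline {
  agent any
  stages {
    stage('Build') {
      steps {
        // The change ref is checked out by the multi-branch machinery itself;
        // no stream-events listener or SSH connection is involved.
        sh './tools/build.sh'      // hypothetical build script
      }
    }
    stage('Test') {
      steps {
        sh './tools/run-tests.sh'  // hypothetical test script
      }
    }
  }
  post {
    success { gerritReview labels: [Verified: 1],  message: 'Build and tests passed' }
    failure { gerritReview labels: [Verified: -1], message: 'Build or tests failed' }
  }
}
```

The key point is that each verification is an ordinary multi-branch pipeline run: there is no long-lived SSH connection, and the Verified feedback is sent back to Gerrit as a plain HTTP call at the end of the run.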
Gerrit and Jenkins friends again
After so many years of adoption, evolution and also struggles in using them together, Gerrit Code Review finally has a first-class integration with Jenkins, liberating the development team from the tedious configuration and day-to-day management of triggering a build from a change under review.
Jenkins users truly love using Gerrit, and vice versa: friends and productive together, again.
Conclusion
Thanks to Gerrit DevOps Analytics (GDA) we managed to find one of the bottlenecks of the Gerrit DevOps pipeline and make changes that have made it faster, more effective and more reliable than ever before.
In this case, just by picking the right Jenkins integration plugin, your Gerrit Code Review master server will run faster, with less resource utilization. Your Jenkins pipeline will be simpler and more reliable in validating each change under review, without delays or hiccups.
The Gerrit Code Review plugin for Jenkins is definitely the first-class integration with Gerrit. Give it a try yourself; you won’t believe how easy it is to set up.
This week we are going to publish a talk from the Gerrit User Summit 2017 about Gerrit and Jenkins used together. It is a real-life story on how to set up a CI/CD pipeline for a high-traffic open-source project such as Gerrit Code Review, and the learnings on how to manage the storage and consumption of the Jenkins build logs and the associated meta-data.
Even if you are not a Gerrit Code Review user, the learnings of this talk are going to be exciting and useful for any high-load CI/CD pipeline project with Jenkins.
GerritForge: Gerrit Code Review and Jenkins expertise
I am part of GerritForge, a London-based limited company specialized not only in Gerrit, as the name would suggest, but also in Jenkins, Continuous Integration and Delivery. Why don’t we use our skills to serve the Gerrit Code Review project? A couple of years ago the project did not have an official CI yet, so we said: “why not help the project and set up an official pipeline to verify all the incoming changes to the Gerrit Code Review project itself?”
We then created https://gerrit-ci.gerritforge.com and, as you can see, it is nowadays a jam-packed CI system. We have been running a hackathon over the weekend, and now, even while the people in this room are following this talk, new changes are being produced and reviews are being pushed to Gerrit, which keeps our CI busy all the time.
We have a lot of slaves, some of them provided for free by Google and others paid for by GerritForge. We have been running this service for the last couple of years, and even non-contributors to the Gerrit project, like most of you guys, are possibly using it for downloading some useful artifacts such as the Gerrit plugins. Additionally, if you want to download and demo the latest and greatest version of Gerrit master, as we just did with some of you before lunch, you can use the Gerrit artifacts on Gerrit-CI instead of building them yourself on your local box.
Gerrit-CI pipeline walkthrough
Let’s have a look at how Gerrit-CI works. You can log in with your GitHub credentials, and then trigger builds for your Gerrit Code Review contribution using a job called “Gerrit verifier change”. That is the most important job of the pipeline and it verifies every single change we make on the Gerrit Code Review project.
How can you manually trigger the build and verification of a change in Gerrit? You navigate to https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-change/ and click on the “Build with parameters” link. You enter your change number and then click “Build”: it is straightforward.
What this job does is trigger a workflow developed in Groovy which, at the end, provides a series of feedback messages to Gerrit. When you go to https://gerrit-review.googlesource.com and list the open changes, you will notice that some of them have been verified by a guy called “GerritForge CI”. That means that our CI works, yeah!
At a certain point in time, someone on the Gerrit mailing list said: “Houston, we have a problem, we are too productive! We have produced so many changes and patch sets that by the time you finish building a change, we have already produced another 300 patch sets on that job and the build logs get lost”.
The Gerrit change verifier workflow
Let’s go back for a moment to review how the workflow that we came up with works. It does not rely on the Gerrit Trigger plugin, the de-facto out-of-the-box Gerrit/Jenkins integration that most people use, but rather on a completely “new thing” that we have built ad-hoc for our purpose.
We couldn’t use the Gerrit Trigger plugin because of two reasons:
Google data-centers do not allow incoming SSH connections
The SSH stream-events channel would not have been good enough for us, because of the parallelism needed.
The way our workflow works is very simple.
The verifier flow requests the list of changes that need verification by leveraging the Gerrit query language, which allows you to search through most of the fields of a change using a Lucene-like syntax. For each change that needs checking, a corresponding number of parallel jobs is triggered. This parallelism is potentially unlimited; the only limit is the number of machines that Google can assign to the Gerrit-CI: if it can allocate one hundred, we will be able to perform hundreds of parallel change verifications.
That means that we can produce a lot of verification jobs at the same time. Bear in mind that for every change we do not trigger just one build: we have NoteDb vs. ReviewDb verification, PolyGerrit UX tests, code-style checks; there was a moment in time when a single change needed up to 6 parallel builds! That resulted in a lot of builds which, as long as you have enough horsepower in the slaves, worked just fine for us.
We do not send feedback to Gerrit for every single build; instead, a “Gerrit verifier change” job coordinates the workflow and makes a decision accordingly. The criteria include the number of failed builds and the retries for flaky builds. At the end of the process, all the build results are collected and sent back to Gerrit as a single, coordinated verification message.
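As an illustration of that coordination logic, here is a simplified Groovy sketch. It is not the actual Gerrit-verifier-change job: the query string, the downstream job names, the build parameter and the (omitted) authentication handling are all hypothetical.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

def gerritUrl = 'https://gerrit-review.googlesource.com'

// 1. Ask Gerrit which open changes still need a Verified vote (Lucene-style query).
def query = URLEncoder.encode('status:open project:gerrit -label:Verified', 'UTF-8')
def raw = new URL("${gerritUrl}/changes/?q=${query}").text
def changes = new JsonSlurper().parseText(raw.substring(raw.indexOf('\n')))  // strip the )]}' XSSI guard

changes.each { change ->
  // 2. Run the different verification flavours of this change in parallel.
  def results = [:]
  def branches = [:]
  ['notedb', 'reviewdb', 'polygerrit-ui'].each { flavour ->
    branches[flavour] = {
      def run = build(job: "Gerrit-verifier-${flavour}",                   // hypothetical downstream jobs
                      parameters: [string(name: 'CHANGE', value: "${change._number}")],
                      propagate: false)
      results[flavour] = run.result
    }
  }
  parallel(branches)

  // 3. Collapse all the outcomes into one coordinated Verified vote and message.
  def failed = results.values().count { it != 'SUCCESS' }
  def review = JsonOutput.toJson([labels : [Verified: failed ? -1 : 1],
                                  message: "Verification finished: ${failed} failed build(s)"])
  def conn = new URL("${gerritUrl}/a/changes/${change._number}/revisions/current/review").openConnection()
  conn.requestMethod = 'POST'
  conn.doOutput = true
  conn.setRequestProperty('Content-Type', 'application/json')
  conn.outputStream.withWriter('UTF-8') { it << review }   // CI user credentials omitted for brevity
  assert conn.responseCode in [200, 201]
}
```

The important property is that Gerrit receives exactly one Verified vote per change, whatever the number of parallel builds behind it.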
Too many logs for Jenkins lead to a 404 page.
This is all good, but as we said earlier: “Houston, we have a problem, we are too productive!”.
Here are some numbers illustrating our productivity:
300 jobs
170,000 builds
4.8 million jar artifacts produced
1.7 billion lines of logs
And of course, we want to send a link to the build logs to give context to the change failure or success. Unfortunately, we ended up with Gerrit changes containing nice-looking links that pointed to a quite unpleasant 404 page in Jenkins.
Why did it happen? We have a lot of contributors who generate lots of commit traffic and thus many build runs. There is a policy in Jenkins to remove “old” builds, and so it happened that we lost the build logs of changes still under active review.
Q. (Han-Wen Nienhuys – Google) At Google, our internal build system also sees these kinds of numbers, of course with more zeros at the end, but we actually throw our logs away; and if you build binaries, they are very large.
In the beginning, we tried to keep more stuff online in Jenkins but people started saying “Luca, we have a much bigger problem now: gerrit-ci.gerritforge.com doesn’t respond anymore. When I open the Jenkins home page, it takes a very long time and eventually times out.”
That is caused by the Jenkins design, which is problematic when the number of logs increases considerably: everything is stored as a file and there is no efficient indexing for discovering the data on the filesystem. Additionally, if your company does not have a large infrastructure, your disk space is limited anyway. At GerritForge the Jenkins master has only 8 TBytes of disk space, and we don’t have a system with petabytes or more available.
Keeping Jenkins logs forever
I made the Gerrit Contributors’ Community aware of the problem and I asked: do we like that? If you think about it, logs are not rubbish. Logs are of immense value, logs are like your money, and analyzing them, crunching and understanding them is our daily job. The timestamps in the logs are like precious diamonds because they tell you that you may have made a mistake in your code and some parts of your pipeline execution start taking a lot more time than before.
When you remove the “old” logs, you make it much more difficult to investigate a failed verification build: the link attached to the change verification message points to a page that returns a 404. That’s not a bug in Jenkins; it’s a feature of removing old logs and keeping the master instance fast and healthy. But it is actually a real functionality gap, because Jenkins doesn’t yet know how to manage log archiving.
Then I asked the community: “For how long do you want your logs to be retained?”, because I needed to raise a PO for a much bigger machine. “One day, one week, one month?” and the answer I got was “Forever!”
If you think about it carefully, the answer is correct. You may not need all those logs at the moment, but in a month’s time you may need to crunch some data to extract features or metrics. Additionally, getting rid of all the logs means generating broken links in past reviews, which may need to be preserved together with the Gerrit changes as an audit requirement.
Sending Jenkins data to a Logstash appender
It was about time for me to think about a solution and here is a description of what I have done.
First of all, I needed to get more disk space from Google, but then how could I tell Jenkins to use an alternative disk storage mechanism for its logs?
I then started adding to the jobs a plugin called “Logstash” (https://wiki.jenkins.io/display/JENKINS/Logstash+Plugin) which is responsible for capturing and sending Jenkins data to a configured stream appender.
All the Gerrit CI jobs are managed through YAML files which are submitted through code reviews, using the Jenkins Job Builder tool. However, it is much easier to show on the Jenkins UI where Logstash plays a role in the Gerrit-verifier-change job configuration.
I have enabled a new feature on all the jobs to send the whole log stream to the Logstash plugin. This works differently from what most people would do. Instead of just posting the log file as a stream of lines to ElasticSearch, the plugin gets the information directly from the JVM memory together with its metadata – the timestamp, the build parameters, the environment variables – and sends them to an endpoint, which could be anything. In this case, I have chosen RabbitMQ as the stream appender. On RabbitMQ you can see that I have created a queue for the incoming Jenkins messages.
You may notice a lot of activity because every time the Jenkins jobs produce something, a message is sent to RabbitMQ with the log line and its attached meta-data. RabbitMQ is not used as a storage system though: it acts only as a vehicle to transfer the information to a long-term storage system, which could be Google Cloud Storage.
The organization of the files is straightforward: one file per hour. Looking at the content, each one is a highly compressed JSON file that contains all the information I need: the build id, the result, the logs, the parameters.
Spark to the rescue
Problem solved? Can I tell all the Gerrit contributors that they have to look for a build result into a JSON file? Maybe this is not a very nice user experience.
A little more digging is needed to make the solution more transparent to the end user.
GerritForge as a company works on and contributes to many BigData projects, including Apache Spark. Why don’t we build an elementary Spark transformation that consumes the input JSON files and materialises the logs back into a readable format?
So we built a Spark job that crunches this data and produces something very, very similar to what Jenkins would render. However, we need to make sure we perform all those operations outside the Jenkins domain; otherwise, it would very soon become overloaded and thus unusable.
I have then created another directory that is not actually managed by Jenkins but gets populated by a Spark job. This parallel file structure has exactly the same organization as the build files generated by Jenkins builds.
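The real transformation runs as an Apache Spark job; the plain-Groovy sketch below only illustrates the idea. The record fields (jobName, buildNumber, timestamp, message) and the input and output directories are assumptions about the layout, not the actual schema used on Gerrit-CI.

```groovy
import groovy.json.JsonSlurper
import java.util.zip.GZIPInputStream

// Assumed input: hourly, gzip-compressed files with one JSON record per log line.
def slurper = new JsonSlurper()
def outRoot = new File('/data/jenkins-archive')          // hypothetical archive root

new File('/data/logstash-hourly').eachFileMatch(~/.*\.json\.gz/) { hourly ->
  hourly.withInputStream { raw ->
    new GZIPInputStream(raw).withReader('UTF-8') { reader ->
      reader.eachLine { line ->
        def record = slurper.parseText(line)
        // Mirror the Jenkins layout: jobs/<job>/builds/<number>/log
        def logFile = new File(outRoot,
            "jobs/${record.jobName}/builds/${record.buildNumber}/log")
        logFile.parentFile.mkdirs()
        // Prepend the timestamp so it can be rendered even without a Jenkins plugin.
        logFile << "${record.timestamp} ${record.message}\n"
      }
    }
  }
}
```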
Let’s have a look, for instance, at the oldest build that is still recorded by Jenkins: build #31639. If I go to build #31444, which is older than #31638, Jenkins gives me a 404 because that job execution has been removed.
However, if I now navigate to the build log of #31444, wow, I can see the full results as if the build log were still there.
Additionally, as this log has been produced from the previous JSON file that contains all the meta-data, I can even render more information such as the time-stamps, which are not typically available in Jenkins unless you enable a specific plugin.
Moving forward, by leveraging the same input JSON file, we could do a lot more data crunching as well. It would be interesting, for instance, to draw a graph of the correlation between the Gerrit changes and the build execution times at the different stages.
Uncovering the hidden value of your Jenkins logs
There is a lot more we can do with the JSON I’ve shown you before. It contains not just the log messages, but everything related to the build: its meta-data and its execution metrics. That means that if we go to this change, #129553, the link that points to the Jenkins logs is not broken anymore, even though it is no longer served by Jenkins but backed by the Spark job results.
By applying the same mechanism to all the Gerrit changes and redirecting their links to the Google storage where all the files are archived, no change in the Gerrit history will contain broken links anymore, and every change will be perfectly auditable.
That means that from now on, whenever you receive a Verified notification from Gerrit and navigate to your change links, you will not land on a 404 page anymore.
Questions.
Q: What if I have a Jenkins instance and I want to do some of this, but I don’t have infinite disk space like Google. Is it possible to implement?
A: With regards to disk space, you don’t have to go to Google or AWS. You can set up an HDFS filesystem yourself. All the cloud storage implementations available on the Cloud are mainly based on something very similar to HDFS, which is an open standard and is available as open source. That means you can store the information there, and you do not necessarily need to keep it forever. In practical terms, what you need to keep is the lifetime of a release of the software, or a few software iterations, maybe six or twelve months. As the JSON files are organized as a time-series, it is going to be very easy to remove or archive all the data you do not need anymore. I have shown you how to store those files in JSON, but you can use even more optimised and compressed formats such as Avro or Parquet, which may contain 10x the information in a fraction of the disk space. Additionally, when you process them, they can be even faster because they encode data in a binary format. In a nutshell, the term “keep the logs forever” can be read as “keep them for as long as you need: one week, one month, six months, …”. The problem with Jenkins is that for very busy servers like the Gerrit CI you cannot keep even a single day of logs, and when people come back the next day to check what’s wrong with a failed verification, they risk hitting a 404 error page.
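To make the “keep them for as long as you need” point concrete, here is a tiny Groovy sketch of a retention sweep over such a time-series layout; the directory and the yyyy-MM-dd-HH file-naming convention are illustrative assumptions, not the exact naming used on Gerrit-CI.

```groovy
import java.time.LocalDate

def retentionDays = 180                         // e.g. "six months"
def cutoff = LocalDate.now().minusDays(retentionDays)

// Assumed hourly files named e.g. 2018-06-01-09.json.gz (yyyy-MM-dd-HH prefix).
new File('/data/logstash-hourly').eachFileMatch(~/\d{4}-\d{2}-\d{2}-\d{2}\.json\.gz/) { f ->
  def day = LocalDate.parse(f.name.take(10))    // date part of the file name
  if (day.isBefore(cutoff)) {
    f.delete()                                  // or move to colder/cheaper storage first
  }
}
```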
Q: So if you do compression and decompression, that needs to happen server-side, so that is transparent to the browser?
A: Yes, that needs to happen on the server, and there are a lot of ways of doing it; it can even be done on-the-fly, streaming, and it is pretty fast. There will be a talk tomorrow about the methodology for crunching large amounts of data and about the lambda architecture.
Q: Does it generate a RabbitMQ message for each log statement or a unique one at the end of the build?
A: It generates one message per log statement, and the reason is straightforward: if the build crashes or gets aborted for any reason, you do not want to lose your build logs. There was an implementation of Logstash for the Jenkins pipeline that collected the logs only at the end of the build, but that design is wrong because if the build gets aborted you do not get any feedback at all. So yes, it generates a message for every single line, and possibly RabbitMQ is not the right implementation for it. But as soon as the Logstash plugin supports the Kafka transport, the performance issues related to the use of RabbitMQ for log streaming will be resolved.
Q: The Logstash plugin that you mentioned, has nothing to do with the “ElasticLogstash” implementation?
A: Yes, it is just an unfortunate naming clash. Actually, the Jenkins Logstash plugin was possibly born before Elastic called its implementation ‘Logstash’.
Q: You mentioned that you do Spark processing at some point, but it wasn’t part of your presentation.
A: Yes, it is not part of this presentation for reasons of time, but it is trivial.
Q: A question about the GerritForge CI: I frequently have problems with tests failing not because of my code, and I want to re-trigger the tests without having to add a commit to re-trigger the CI. Is there a way to re-trigger the CI build?
A: Yes, it can be done by going to the Gerrit-verifier-change URL, clicking on “Build with Parameters” and entering your change number. In this way you can re-trigger any build without having to commit anything.
Q: And if that passes, would it assign the Verified approval to the change?
A: Yes. I would like to add a button to Gerrit-Review to avoid people having to navigate to a different URL.
Q: We are relatively heavy users of Gerrit topics because we have changes that are across multiple repositories. We have a very similar job to this one but we can either put a single or multiple change IDs or a topic name, and it will work out whether it is a consistent declaration. Another thing that you can comment on, you mentioned that the verifier job which runs some independent verifications and then feeds the result as one result to Gerrit, that sounds like something we could use. What is that build using?
A: Tomorrow there will be a presentation of a brand-new integration between Gerrit and Jenkins. The rationale for writing a new integration lies in the thinking that “maybe the Gerrit project is not the only one that needs a bit more from Jenkins.” So why not create a Jenkins plugin that takes the best of the experience we have gained in integrating Gerrit with Jenkins for the Gerrit Code Review project and makes it available to the rest of the world? There will be a plugin implementing that workflow.
New and exciting features are coming for this year's Gerrit User Summit, with the launch of Ver. 2.15, NoteDb, high-availability, multi-master and much more.
The Summit will take place in Europe for the very first time: in London, the location chosen by the community after a public consultation, on the 2nd and 3rd of October at CodeNode (Skills Matter).
See below an overview of the topics that will be presented and discussed during the User Summit.
What’s new in Gerrit 2.14.x.
Gerrit v2.14 was released during the last hackathon in April and has gone through three patch releases. David Pursehouse from CollabNet will give an overview of the new features introduced, which will be highly beneficial for all those who haven’t migrated yet.
Gerrit at Google: Multi-master, Multi-tenant.
Google is the founder, main contributor and possibly the most advanced user of Gerrit Code Review: learning from their experience is a unique opportunity to be able to leverage and use the tool at its best.
Patrick Hiesel from Google will go through the insights of their Gerrit Code Review architecture and will provide some of their metrics of scale. In addition to that, he will present some findings from the recent switch of their load-balancing infrastructure and the associated pitfalls encountered.
Google is possibly the only one in the world using Gerrit in a multi-tenant setup, having a unique multi-master installation that serves a constellation of domains and projects, including huge and familiar ones like Android and Chromium.
Standing “on the shoulders of giants” like Google helps a lot in preventing scalability issues as the audience and adoption of Gerrit Code Review grows in large companies: being part of the audience in the talk is a unique opportunity to learn and ask questions directly to the maintainers of their infrastructure.
PolyGerrit: a new UX experience for Gerrit Code Review
Google has invested a lot in reinventing and reengineering the user interface of Gerrit Code Review, which remained mostly unchanged for almost a decade. A new team has been put together in their San Francisco offices with experienced UX developers that leveraged the new Polymer framework of web components.
The result is PolyGerrit, a modern web UX which provides an unprecedented browsing speed and flexible rendering across different devices, including mobile and tablets.
The PolyGerrit Team will be presenting the findings of their user-experience research and show some of the features and insights of the new UX.
Gerrit CI and keeping logs forever.
Gerrit Code Review itself is a large project, involving over 300 developers across the globe and using the most advanced DevOps practices. The CI/CD pipeline has been provided and managed by GerritForge on https://gerrit-ci.gerritforge.com, and Luca Milanesio from GerritForge will present the latest improvements in the pipeline plus an interesting way of collecting and reusing the logs.
Leveraging the logs to identify the bottlenecks of the CI/CD pipeline is the way to drive improvement. GerritForge leveraged the expertise of its engineers to harvest and organize the data and will give it back to the community as powerful dashboards.
Beyond Gerrit.
Gerrit is great. However, it is also quite an important part of a bigger ALM process. Jacek Centkowski from CollabNet will describe how multiple tools can be unified under a single TeamForge umbrella and what the immediate benefits of that are.
What’s coming in Gerrit 2.15
After only four months, we are already close to v2.15 of Gerrit Code Review, which will possibly be the last one before the step to v3.0.
Dave Borowitz from Google, principal maintainer of the Gerrit Code Review project, will go through the new features of v2.15 and possibly give a glimpse in what to expect from v3.0.
Mining Gerrit Data to Study Contentious Reviews and Community Evolution
Gerrit Code Review is much more than a tool: it is a way for people to work together in companies that are large and mostly distributed across the globe.
Shane McIntosh from McGill University has been running a research lab on this topic. The Software REBELs – his research lab at McGill – mine code review data to study topics like the impact that code review practices have on software release and design quality. Their more recent work mines code review data to study the reviewing process itself. In this talk, Shane will describe the results of two empirical studies of data collected from the Gerrit instances of the OpenStack project. The first study aims to understand the reviews where reviewers disagree about a patch. The second study follows how the concerns that reviewers raise evolve as the OpenStack community ages and individual reviewers accrue experience.
Gerrit Analytics: dashboards, networks, KPI
Gerrit has always been lacking major code analytics features compared to other Git Server tools like GitBlit or GitLab. GerritForge Ltd is filling the gap and adds one important asset to the Gerrit Code Review platform: code review analytics.
We need to harvest and unify the logs and events coming from the different components of the CI/CD pipeline by putting at the center of it the people and teams that are building and discussing the code on Gerrit. The resulting data-lake of information can be later analyzed and correlated to calculate the cycle time of the entire pipeline.
Luca Milanesio from GerritForge will show the new analytics dashboards that are going to be published and provided back to the Team that is developing the Gerrit Code Review project as a precious contribution to the community.
How to extend Gerrit using Scripting Plugins
Gerrit Code Review has a robust set of API that can be used to extend its functionalities and provide a more integrated development workflow for the Teams.
Luca Milanesio from GerritForge will present how to use different scripting tools to extend the capabilities of Gerrit without the need of developing and building a plugin, using Jython, Groovy and Scala.
A new simpler but powerful Gerrit Jenkins plugin
Gerrit Code Review is an essential part of a larger CI/CD pipeline. Most of the times it is used in conjunction with Jenkins, the most popular OpenSource Continuous Integration and Delivery tool.
The integration between Gerrit and Jenkins (the Gerrit Trigger Plugin) was developed back in 2010 at Sony and has since been extended and adopted in thousands of Jenkins installations. However, Jenkins has evolved too and now has a brand-new concept and definition of multi-branch pipelines, which struggle to integrate seamlessly with the current Gerrit Trigger Plugin.
Luca Milanesio from GerritForge will present a brand new plugin based on the new Jenkins branch discovery API which works seamlessly with Jenkins multi-branch pipelines and provides a simpler interface with Gerrit by leveraging the new WebHooks.
Diffy with enterprise grade
Since 2012 CollabNet has been working on improving Gerrit integration with TeamForge. Many features have been created to satisfy the needs of enterprise customers. Eryk Szymanski from CollabNet will present features like RBAC, history protection, Git style notifications, quality gates, pull request and code browser which have been implemented on top of vanilla Gerrit.
Q&A with the maintainers
Have you ever wondered why something works in a certain way? Have you ever wanted to voice a complaint about some part of Gerrit? Would you like to congratulate the people who made this project? Would you like to make a feature request or propose new ideas?
This is the moment where you can speak directly face-to-face to the people that are building this project every single day, the Gerrit maintainers.
Gerrit 3.0 plan announced: we need stabilisation now
The Gerrit 3.0 plan and its NoteDB reviews were officially announced at the Gerrit User Summit 2015. NoteDB is already available as an experimental feature in the current Gerrit master, but it needs much more stability in order to be officially supported for production.
GerritForge decided to help and reuse its existing continuous integration system to validate every Gerrit patch set against both the current and the new NoteDB review persistence back-end, in order to avoid regressions during the 2.13 and 3.0 development.
Pre-commit validation by GerritForge CI
If you have posted a patch to gerrit-review.googlesource.com in November, you may have hopefully received a Verified+1 from a strange user with a Diffy logo on the side.
The CI provided by GerritForge on gerrit-ci.gerritforge.com automatically fetches every patch set pushed to gerrit-review.googlesource.com and triggers a slightly modified Gerrit build to check whether the code change introduces a regression. This may seem at first sight a quite normal Gerrit-to-Jenkins job integration; however, implementing it on top of Google’s multi-master replicated installation was not a piece of cake.
Gerrit Trigger plugin limitations on multi-master setups
Jenkins already has an out-of-the-box integration with Gerrit provided by the Gerrit Trigger plugin maintained by Robert Sandell – Cloudbees. It leverages the Gerrit stream events through an SSH channel and makes use of the Gerrit REST API to act on them according to the build result.
The Google Gerrit setup, however, is not a trivial one-node installation and is further limited by the security constraints of the Google infrastructure, which does not allow any incoming SSH connectivity.
Additionally, the whole concept of “getting the events in a stream” isn’t going to work when events can arrive concurrently from multiple places at the same time: who is going to define the “global ordering”, and how would all those events be put into a single TCP/IP socket? Even UDP would not work in this case, because an SSH channel requires confidentiality between two, and only two, peers.
Alternatives to SSH
During the hackathon, other approaches were discussed by Shawn Pearce, including the use of HTTP WebSockets (or CometD) for fetching events without the need for an SSH connection. Events are still distributed and generated by multiple masters all the time, and the Jenkins plugin would then have the onus of contacting all the Gerrit servers and keeping a connection open to all of them. This is clearly not going to work, because the number of servers, their IPs and their locations may change at any time, and the solution would eventually be in danger of losing precious events.
Back to polling
The only solution we envisaged was to fall back to a polling logic where Jenkins, every 10 minutes, asks Gerrit “what’s new since the last time we spoke?”. This solution goes against the main reason the Gerrit Trigger plugin was designed: avoiding SCM polling. It is, however, a much better and more optimised polling strategy; let’s see why.
Query and then fetch
The typical Git SCM polling relies on fetching all references every poll interval and detecting whether new Git commits are available. This is notably slow and generates a huge overhead on the Git server. The approach we took is quite different and makes use of the Gerrit search capabilities, which are way faster and more powerful than a simple Git fetch.
Jenkins first asks Gerrit for the list of changes and associated commit-IDs involved in any event since the last polling time: the result may include patch sets that have already been built, to avoid any gaps between polling intervals. The search is fast and implemented in… well, you know, Google is a search company, isn’t it?
Once the list of candidate commit-IDs is identified, Jenkins goes through all of them and checks, using the Gerrit REST API:
– has it been built during my previous execution?
– has it already been accepted (or rejected) by me?
The commit-IDs that have not been checked before and are not yet validated are then used to trigger a specific job parametrised on:
– the specific branch
– the specific change ref-spec
Fetching is performed avoiding any wildcards, and the corresponding load on the Git server is minimal. Fetch (Git protocol) + build (using Buck) + test (unit + integration) + review feedback (REST API) takes an average of 5 minutes, which is an amazing result if you consider the size of the Gerrit project and the typically slow speed of a default Jenkins Git fetch.
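Below is a rough Groovy sketch of that query-then-fetch loop; the query string, the CI account name and the downstream job parameters are illustrative assumptions, not the exact values used by the Gerrit-CI jobs.

```groovy
import groovy.json.JsonSlurper

def gerritUrl = 'https://gerrit-review.googlesource.com'
def ciUser    = 'gerritforge-ci'                            // hypothetical CI account name

// 1. Ask Gerrit which changes saw activity recently; the window deliberately overlaps
//    the polling interval so that nothing can fall into a gap between two polls.
def query = URLEncoder.encode('status:open -age:15min', 'UTF-8')
def raw = new URL("${gerritUrl}/changes/?q=${query}" +
                  '&o=CURRENT_REVISION&o=DETAILED_LABELS&o=DETAILED_ACCOUNTS').text
def candidates = new JsonSlurper().parseText(raw.substring(raw.indexOf('\n')))  // strip )]}' guard

candidates.each { change ->
  // 2. Skip anything this CI user has already voted on.
  def alreadyVoted = change.labels?.Verified?.all?.any { it.username == ciUser && it.value }
  if (alreadyVoted) return

  // 3. Trigger the verifier on the exact branch and change ref, so the subsequent
  //    Git fetch needs no wildcard refspecs at all.
  def revision = change.revisions[change.current_revision]
  build job: 'Gerrit-verifier-flow',
        parameters: [string(name: 'BRANCH',  value: change.branch),
                     string(name: 'REFSPEC', value: revision.ref)],
        wait: false
}
```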
The bottom line
The query + fetch approach, which seemed a bit slow and old-fashioned at the beginning, turned out to be very simple and successful. Instead of setting up SSH host-key verification, key exchange and ad-hoc channels, the only configuration needed is a valid Gerrit user and the HTTPS endpoint URL, the same one used for cloning the code.
The solution is much more reliable, as SSH channels are notably unstable and consume server threads. The only drawback is the slight delay between the patch-set upload and the start of the build (at most 10 minutes), which is acceptable in most cases.
Results
Since its roll-out, more than 1,200 patches have been checked and rated, a lot of potential Gerrit regressions have been avoided and, more importantly, we have prevented the NoteDB code from diverging in stability from the current mainstream development.
How can you re-trigger validation for a single change?
We have enabled anyone to trigger ad-hoc executions of the Gerrit validation flow using the following URL: https://gerrit-ci.gerritforge.com/job/Gerrit-verifier-flow/build
This is a standard Jenkins parametrised build that requests the change to be built, identified either by SHA-1 or by number. Once the job is triggered, the build is executed and the validation feedback is applied to your change, regardless of the previous build or validation status.
Thank you to CollabNet Inc. for the opportunity to showcase Gerrit Code Review “in action” to a broad audience of over 120 attendees from all over the world! During the last session we saw how to create your first code review with Gerrit and get automatic validation through the Jenkins Gerrit Trigger Plugin.
NOTE: Unfortunately the session was interrupted in the middle of the Q&A due to a technical issue with the webinar conference tool: we invite you to comment on this blog by posting your additional questions, and we will be happy to answer all of them.
Workflows, Branching & Merging (View on demand)
Learn the basic concepts of Git and Gerrit and explore the power of merging and applying code from different branches thanks to recursive merge, rebase and cherry-pick.
Peer Programming & Code Reviews (View on demand)
See how Gerrit Code Review can enforce full code collaboration and collective ownership: similarly to peer programming, you can achieve faster and better code delivery. Learn how to integrate Jenkins CI into the development process and have automatic validation feed back into your code review workflow.
Hands-On Lab with Gerrit & Jenkins (View on demand)
A step-by-step hands-on with Gerrit to learn how to take your first steps with code review: contribute your change, trigger your Jenkins CI validation, and learn how to understand the feedback and amend your code to achieve a successfully merged change into the main branch. Learn how to manage multiple reviews when the code evolves.
See below the log of questions and answers from the Gerrit Code Review webinar, brought to you by GerritForge LLP to give you the opportunity to get more insight into Gerrit and trigger more discussion and feedback on code review.
Gerrit Code Review – Q&A
Q: Is the refs/for/master branch created for each user by Gerrit? A: Gerrit automatically manages the “refs/for/<target-branch>” logical branches (they are not real branches on the Git server). Whenever a commit is pushed to “refs/for/<target-branch>”, Gerrit captures the commit and creates (or adds to) a change for review for the indicated target branch. Different commits pushed by different users create different changes.
Q: Would you recommend using -SNAPSHOT in the POM’s version during the development phase, and if so, when should it be removed for production releases? A: Yes, it is a common and good practice to use the -SNAPSHOT suffix for labelling Maven artifacts that are still in development. On stable branches, however, it is recommended to edit (possibly via scripting) all the pom.xml files and use a final version number, without the -SNAPSHOT suffix.
Q: Is it possible to share the job configurations from Jenkins?
A: Jenkins has been integrated with Gerrit using the Git Plugin Ver. 1.1.18 plus Gerrit Trigger Plugin Ver. 2.5.2; Gerrit configuration can be found in the global Jenkins configuration screen:
Gerrit Trigger contains the SSH details to connect from Jenkins to the Gerrit server (gitent and /home/gitent are the CollabNet TeamForge 6.2 user and user home directory under which Jenkins CI is running):
Jenkins Job configuration / Git settings, needed to instruct the Git plugin on the ref-spec and branch to clone when a new Gerrit Change is pushed:
Git plugin advanced options are specified:
1) wipe out the workspace before build (as changes could be taken from completely different branches every time)
2) Trigger a build from Gerrit events / changes.
Gerrit Trigger settings on the Jenkins Job:
Q: Is the amend command part of Git? Can it be done via Eclipse or some other IDE?
A: The “--amend” option allows you to edit the last commit object in HEAD. EGit (the Eclipse Git plugin) allows amending using a specific icon in the top-right corner of the commit window.
Q: What code deployment tools are recommended with the Git/Gerrit/Jenkins TeamForge solution?
A: CollabNet recommends UC4 (See: http://www.collab.net/deploy). Please refer to a CollabNet tech sales representative for more information.
Thank you again for your attendance at all three webinars, and stay tuned as new webinars and training material on Git and Gerrit are coming soon!
The session was about a high-level view of the Gerrit Code Review workflow and its benefits in encouraging collaboration around the code and collective ownership of the software project. Moreover, we saw how code review and continuous integration can bring “early integration” together with the scoring and promotion of changes into mainstream branches.
GerritForge LLP is a proud partner of CollabNet Inc. and invites you to replay the first two webinars and register for the last webinar of the series:
Workflows, Branching & Merging (View on demand)
Learn the basic concepts of Git and Gerrit and explore the power of merging and applying code from different branches thanks to recursive merge, rebase and cherry-pick.
Peer Programming & Code Reviews (View on demand)
See how Gerrit Code Review can enforce full code collaboration and collective ownership: similarly to peer programming, you can achieve faster and better code delivery. Learn how to integrate Jenkins CI into the development process and have automatic validation feed back into your code review workflow.
Hands-On Lab with Gerrit & Jenkins (February 12th) << Next Webinar
A step-by-step hands-on with Gerrit to learn how to take your first steps with code review: contribute your change, trigger your Jenkins CI validation, and learn how to understand the feedback and amend your code to achieve a successfully merged change into the main branch. Learn how to manage multiple reviews when the code evolves.
See below the full log of questions and answers from the Gerrit Code Review webinar, brought to you by GerritForge LLP to give you the opportunity to get more insight into Gerrit and trigger more discussion and feedback on code review.
Gerrit Code Review – Q&A
Q: Please stop selling and instead show us the features. A: The focus of the initial two webinars was more on the concepts introduced by Git and Gerrit, alongside their support in commercial tools such as the CollabNet TeamForge ALM product: this may have been perceived as “product selling” rather than a functional and conceptual introduction to code review; that was not really our intention. The third webinar of the series (Hands-On Lab with Gerrit & Jenkins – February 12th) includes a step-by-step hands-on lab on how to use Gerrit from both the Web UI and the command line for managing the code review workflow in a real-life environment, integrated with the Jenkins CI Gerrit Trigger plug-in for code validation. We will use Gerrit 2.1.8 embedded in CollabNet TeamForge 6.2 for our lab exercises; however, the same exercises can be repeated with any other version of Gerrit.
Q: What were the three branches you mentioned? (1: _____ branch, 2: stabilization branch, 3: ______ branch) A: You typically have at least three branches on your Git repositories: 1. the main development branch (master), 2. the stabilisation and release branch, 3. the production and maintenance branch. The level of access control and code-review workflow you may want to apply to different branches is potentially different: Gerrit Code Review allows the definition of ACLs on a per-branch basis, allowing you to easily manage more fine-grained access for contributors on the development branch whilst putting stricter control and validation on the stabilisation and production branches.
Q: I tried to PoC Gerrit but fell at the first hurdle – the login to the website was insisting on an OpenID logon – we run behind a strict firewall so OpenID did not work – is there a way to allow a simple logon process, at least for the QA?
A: A Gerrit Code Review set-up typically requires an existing user registry for authenticating users: I suggest using a simple plain LDAP configuration rather than OpenID (e.g. you could download ApacheDS to set up a sample registry) or downloading the CollabNet TeamForge 6.2 VMware appliance that includes Gerrit 2.1.8 integrated with the CollabNet user registry.
For a more recent Gerrit 2.5.1 binary distribution, you could download the GerritForge 2.5.1 binary distribution that includes a step-by-step setup procedure that provides: 1. creation of an initial administrative account without the need of any external user registries or LDAP, 2. initial configuration of groups and roles, 3. sample repositories ready to be used as templates.
Q: Can Gerrit be used on Linux and Microsoft OSes?
A: Gerrit is a Java EE based application; it can be hosted on any OS that supports Java JDK 1.6 or later. There is still a binary dependency on the Git executable in the Gerrit wrapper start-up shell: this can simply be resolved by porting gerrit.sh to the target operating system where you want Gerrit to be hosted. GerritForge 2.5.1, for instance, is available for all major UNIX platforms, Mac OS X and Microsoft Windows Server and is integrated as an automatic native system service.
Q: How is Gerrit different from Sonar and Hudson integrated with SVN?
A: Sonar and Hudson are excellent tools for executing a number of quality checks every time a commit is created. However, they do not include the ability to provide change promotion based on voting, plus collaboration tracking in terms of ideas and comments on the code. They remain excellent tools that can be used as a complement to the code review process, providing additional automated scores on change-sets uploaded to Gerrit.
Q: Is there any good web based code review tool like Gerrit on Subversion?
A: There are definitely other tools on the market; for instance, CollabNet TeamForge includes ReviewBoard whilst Atlassian has its own Crucible commercial product: both can be used on top of other SCMs, including Subversion. They are not comparable to Gerrit though, because the underlying SCM is not as powerful as Git: when managing code reviews you do need the ability to amend your code history and merge changes back to the target branch when the review is completed. Subversion and other SCMs do not provide that flexibility; as a consequence, a code-review tool on top of them cannot provide all the functionality provided out-of-the-box by Gerrit.
Q: What are the major differences between Gerrit/Jenkins and existing enterprise application testing tools such as JUnit/Ant or the Oracle Application Testing framework? Can they be integrated with the Eclipse IDE for code reviewing and testing purposes?
A: Gerrit/Jenkins are not integrated out-of-the-box with JUnit/Ant or Oracle Application Testing tools: the main purpose of Gerrit is not to execute tests against the code but to enforce maximum visibility of changes within the development team and beyond.
Gerrit itself does not enforce the number or nature of checks against your code; however, Gerrit can potentially be integrated with JUnit/Ant and Oracle Application Testing tools to collect their feedback and track it as a score in a change-set review. From Gerrit 2.6 the code-review functions will also be available via a RESTful API: automation through Ant will be possible via scripting or custom tasks. Gerrit is already integrated with the Eclipse IDE via Mylyn and its native Gerrit plugin: it allows you to receive real-time feedback on incoming code reviews and display comments as annotations inside the standard Eclipse code editor.
Q: Can we integrate Maven and/or Ant with TeamForge?
A: TeamForge provides a SOAP interface and both Java and Python APIs for access via external tools or scripting. This means that potentially both Maven and Ant can be used to access TeamForge artifacts.
Q: Can we block commits to the master branch, use only refs/for/master and force all commits through Gerrit code review?
A: Yes, by applying Gerrit fine-grained ACLs to refs/heads/master and allowing push-for-review to refs/for/refs/heads/master.
Q: Slide #28 (Gerrit workflow): What happens if master contains code conflicting with the check-in of A2* (because this check-in was done prior to change A2*)? Can Gerrit auto-resolve this conflict?
A: No, as Gerrit does not know how to automatically resolve a conflict: Gerrit can try to merge the code anyway, depending on the Gerrit project settings, but if a conflict is detected, change-sets cannot be merged back into the branch.
Q: That overview graph (CI + CR with traffic lights, branches) you showed was very helpful to see what you are talking about. Which tools did you say covered that? Gerrit and Git? TeamForge?
A: The workflow shows the usage of Git, Gerrit Code Review and Jenkins CI together to validate a change-set and merge it to the master branch. The master branch always has a “green” traffic light (the build is healthy) whilst the “red” and “amber” lights only appear on refs/for/master (the code-review branch) and do not impact the rest of the project team's development. TeamForge is the ALM that connects everything together (Git, Gerrit and Jenkins) in a single environment.
Q: Are there integrations between other CI tools and TeamForge?
A: TeamForge provides integration with CruiseControl and Maven as well. Potentially, other continuous integration tools can be integrated with TeamForge by using the SOAP or the Java and Python APIs.
Q: Just wondering about the number of participants on the call, to relate to the % in the polls.
A: 138
Q: How would you recommend a small distributed group (4 devs, 2 testers) to sort of “try out” this complex approach you are promoting here, while continuing to maintain our daily work via Subversion?
A: Git allows you to pull from and push back to Subversion, using the “git svn” commands. You can dedicate a branch (e.g. trunk-review) to experimenting with the Gerrit code-review workflow, whilst keeping the rest of the development on the standard trunk branch. Once the code review is completed and merged to trunk-review, the branch can be pushed back to Subversion and tracked as a normal commit. Bear in mind that Subversion does not have all of Git's power in merging code together, so you may then need extra manual work to bring the changes back into the main trunk.
Q: How does TeamForge run Gerrit? Can Gerrit run as a Windows service?
A: TeamForge runs Gerrit in a separate JVM, activated and managed in the same way as any other TeamForge daemon. Gerrit itself can be activated either via the gerrit.sh wrapper script or as a standard Java WAR application in a standalone servlet container (e.g. Tomcat). In this second form it can technically be executed as a Windows service, such as Apache Tomcat running inside a Java Service Wrapper.
Q: We’re using TeamCity as our CI; can it be used and integrated with Gerrit?
A: In theory they can be integrated, using Gerrit SSH or RESTful (from Gerrit 2.6) scripting. See this example from Timothy Basanov's blog. Unfortunately, TeamCity is not open source and thus has a more limited community of plugin developers providing integrations for Git and Gerrit. Jenkins CI has the world’s largest community of plugin developers for a continuous integration engine and already provides support for the Gerrit Code Review workflow.
Q: Can Gerrit or TeamForge help to migrate code from ClearCase to Git?
A: Not really, they are not intended to drive migration from one SCM to another.
Thank you again for your attendance and don’t miss our next Webinar with CollabNet Inc, Tuesday, February 12, 10:00 AM – 11:00 AM PST.
The first webinar of the CollabNet series “Go Agile with Git” had a great audience of over 200 attendees from all over the world!! A big THANK YOU to all the attendees for the great result and the great questions during the webinar.
Unfortunately, not all of the questions could be answered during the webinar because of the limited time available. You can find below all the webinar questions and answers on Git and agile development… and feel free to post additional ones by commenting on this post: you will definitely get our answer shortly on the blog.
GerritForge LLP is a proud partner of CollabNet Inc. and invites you to replay the first webinar and register for the following two webinars of the series:
Workflows, Branching & Merging (View on demand)
Learn the basic concepts of Git and Gerrit and explore the power of merging and applying code from different branches thanks to recursive merge, rebase and cherry-pick.
Peer Programming & Code Reviews (January 29th) << Next webinar
See how Gerrit Code Review can enforce full code collaboration and collective ownership: similarly to peer programming, you can achieve faster and better code delivery. Learn how to integrate Jenkins CI into the development process and have automatic validation feed back into your code review workflow.
Hands-On Lab with Gerrit & Jenkins (February 12th)
A step-by-step hands-on with Gerrit to learn how to take your first steps with code review: contribute your change, trigger your Jenkins CI validation, and learn how to understand the feedback and amend your code to achieve a successfully merged change into the main branch. Learn how to manage multiple reviews when the code evolves.
Workflows, Branching and Merging – Q&A
Q: Do you have any tool to migrate source code of one SCM tool to GIT ?
A: Git native distributions already include tools that allow to fetch code from external SVN repository and push to a Git, including their history and branches. For other SCMs (i.e. CVS, Perforce, TFS) there are OpenSource and commercial tools to migrate the full history of your repository into Git.
Q: Gerrit Dashboards already works?
A: Dashboards have been introduced in Gerrit in November 2012 hackathon and are one of the major features of Gerrit 2.6. For more information on the Gerrit roadmap see our previous post.
Q: how can use GIT to map it ClearCase UCM
A: Git is not really comparable to ClearCase UCM. Gerrit Code Review plus the additional ALM features provided by CollabNet TeamForge or the Enterprise Issue-Tracker integrations of GerritForge LLP, can really be mapped to ClearCase UCM concepts as both define the concept of “project” and “tracking” between code changes and work items into your development lifecycle. Git can be mapped to ClearCase whilst Gerrit and its Issue-Tracker integrations map to UCM with the role of associating one or more Git commits to a delivery or set of issues tracked in ClearQuest.
Q: Will we cover Gerrit and AD integration? Or Git and AD integration?
A: Git does not provide AD integration out-of-the-box, unless it is integrated and customised through an HTTP front-end (e.g. Apache) that authenticates user credentials. Gerrit provides AD and industry-standard LDAP support, which will be covered in the second and third webinars of the series.
Q: Are there any dangers when auto-merging?
A: It depends on whether you have Gerrit triggering your Jenkins CI to “protect” you from wrong auto-merges or whether you just rely on nightly builds and standard QA validation: when Gerrit and Jenkins are integrated, even auto-merging is not a risk, as the consistency of the code merge is validated. Typically a code-review workflow requires the change to be already “rebased” on the candidate branch that will receive it, otherwise you would run the risk of validating a change in the wrong context (see the example below).
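As a sketch of that practice, a developer could rebase a change on the latest tip of the target branch before sending it for review and CI validation; the remote and branch names are just examples:
git fetch origin
git rebase origin/master
git push origin HEAD:refs/for/master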
Q: If someone commits a huge binary file by accident and pushes it to the origin/master, can that be undone or is it there to stay forever? Or is this also covered in history protection link?
A: Git always allows you to “remove” commits permanently, using “git reset” or “git rebase -i” (interactive rebase) to remove or amend the commit containing the huge binary file. However, Git keeps the binary blob among its objects until garbage collection triggers the physical removal of all unreferenced objects, including the huge binary file. With CollabNet TeamForge history protection you have full control over when and what to remove completely from your Git history (in this case, the huge binary file).
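A minimal sketch of that clean-up on a local clone could be the following; the rebase depth and the expiry options are illustrative, and pushing the rewritten history back requires force-push permissions on the repository:
git rebase -i HEAD~3   # mark the commit containing the huge binary as "drop"
git reflog expire --expire=now --all
git gc --prune=now
git push --force origin master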
Q: Can roles in Gerrit be defined in LDAP? If so, what are the requirements? Can roles in Gerrit be defined in an external source?
A: Gerrit allows the usage of RBAC (role-based access control) through its group-level permissions. Groups can be defined in LDAP, and Gerrit Ver. 2.5 can use and retrieve them dynamically via its pluggable “group backend” infrastructure. The LDAP group entry points need to be configured in gerrit.config in order to be fetched by Gerrit (see the sketch below). Similarly, other external sources can be plugged into Gerrit as a “group backend” by implementing the connector interface to the external roles in a Gerrit plugin. For more information on how to create a Gerrit plugin, see our talk “How to cook a Gerrit Plugin in 10 mins” presented @GooglePlex in Mountain View in November 2012.
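As a minimal sketch, the relevant gerrit.config section might look like this; the server URL and base DNs are placeholders for your own directory:
[ldap]
  server = ldap://ldap.example.com
  accountBase = ou=people,dc=example,dc=com
  groupBase = ou=groups,dc=example,dc=com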
Q: You mentioned Jenkins fits best here. How do you compare it with TeamCity in this regard?
A: TeamCity is a very advanced CI system and has introduced the concepts of “test builds” and “private builds” to prevent faulty code from breaking the team build, the same battleground as Code Review for improving Team Agility without losing control and code quality. Unfortunately TeamCity is not OpenSource and thus has a more limited community of plugin developers providing integrations for Git and Gerrit. Jenkins CI has the world’s largest community of plugin developers for a Continuous Integration engine and already provides support for the Gerrit Code Review workflow.
Q: How many people voted on Gerrit poll?
A: 183
Q: What is the minimum number of branches needed if two geographically separated teams are sharing and committing to the same codebase (e.g. production, master, dev1, dev2, etc.)? How many should we maintain at a minimum?
A: The minimum number of branches is actually one, the master. Two repositories in two locations already have a “copy” of the “master” branch, and those are effectively different branches that can be merged during the repository synchronisation process across locations. In practice it is recommended to have at least one active development branch (“master”) and then one branch per stable release, independently of the number of geographically separated teams. Additionally, you may want to develop features in parallel using “topic” branches and then have them merged into “master” as they are validated through Code Review by the Teams (see the sketch below).
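A quick sketch of that topic-branch flow, with an illustrative branch name:
git checkout -b topic/login-feature master
# commit the feature work on the topic branch
git checkout master
git merge topic/login-feature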
Q: We are about to start the migration from SVN to Git and are worried about the best way to train the developers: can you give any tips on the best approach?
A: Git requires a lot of “day-by-day” support from the beginning, especially for Teams coming from an SVN background. The concepts are different, and sometimes SVN command names have a different meaning and behaviour in Git. It is recommended to carefully tune Gerrit permissions to avoid “dangerous” pushes from “beginner” Team members (e.g. disabling forge identity and forced push) and to start with a dedicated Git training for a group of “champions”, who will then be spread across all of the Teams to provide support. Those “champions” will become the “day-by-day” tutors of the other Team members learning how to use Git in their everyday work. P.S. Stick the “Git cheat sheet” on the wall of every Team room to reinforce the Git development SCM workflow.
Q: How do users use Git on Windows systems? Is there a tool like Tortoise?
A: Yes, and its name is TortoiseGit (no surprise here!), but it is only a GUI front-end to Git for Windows, which needs to be installed as a prerequisite. However, I do encourage Teams to use the Git command line in order to first understand the concepts and then learn how to get the best from the tool when they need it. For most common operations they can use TortoiseGit or any other Git GUI client (personally I use and really like GitX on MacOS), but they need to understand the underlying commands that are executed as Git actions on the repository.
Q: How do you find out which files someone has edited over the history of a project? It is easy in a specific commit, but seems hard over the life of the project …
A: Git is a very powerful tool designed to provide full history inspection and code search. Each command is sophisticated and flexible enough to perform even complex tasks. The specific task mentioned is not complex for Git; the following example lists the files and the number of lines changed by someone@somecompany.com on the current branch: git log --author someone@somecompany.com --numstat
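As a variant (same hypothetical e-mail address), the following prints just the unique list of files that person has ever touched on the current branch:
git log --author someone@somecompany.com --name-only --pretty=format: | sort -u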
Q: Do you have any best practices to migrate from Subversion to Git?
A: My best suggestion is to ensure consistency between Subversion and Git usernames and e-mails. You would not want to lose visibility of who made a commit in SVN and the association with their identity in Git (see the mapping sketch below). In order to enforce consistency, make sure the “forge identity” permission is disabled when you start using Git after the initial migration, so that your ex-SVN users stay aligned with their own identity in Git. “Forge identity” can then be granted gradually as developers become more aware of and familiar with the Git tool.
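A common way to keep that mapping during the import is git-svn’s authors file; the file name and URL below are just examples:
git svn clone --stdlayout --authors-file=authors.txt https://svn.example.com/myproject myproject
Each line of authors.txt maps an SVN username to a Git identity, e.g. jdoe = John Doe <john.doe@example.com>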
Q: C1 to C5 is like Team Branch. C6, C7 is like Dev Branch.
A: C6-C7 is more like a “topic-branch” where one or more developers are working on a specific feature. Master is the main development branch.
Q: Why do we need a Contributor who is NOT a Committer?
A: Anybody in the Team or outside the Team (when authorised to do so) should be able to provide their ideas and possibly even their fixes to the code; however, you do not necessarily want to allow everybody to commit and merge a change (potentially flawed) and break the Team master build, causing delays in your project sprint. By defining “contributors” you can allow everybody to bring their changes into the Team’s discussion and promote them through automatic validation (Jenkins CI builds) and collective code review, avoiding the risk of breaking the build with unwanted or unvalidated changes. You can then differentiate them from the “Senior Team Members” (Committers) who have the technical ownership of the design and timelines of the Team Project.
Q: Any comment on Gerrit vs Sonar code reviews?
A: Sonar is a fantastic infrastructure for triggering code quality checks, but not necessarily for acting on code promotion. Gerrit is the place where people decide what to do with a change, whilst Sonar only provides feedback on “how the change looks” without any active review action on it. However, it would be possible to integrate Sonar feedback into the Gerrit code-review lifecycle by developing a Sonar plugin for Gerrit, or simply by getting Sonar feedback into the Jenkins build and then providing a validation result to Gerrit through the Jenkins Gerrit plugin.
Q: In the Git merge slide, what’s the recursive option?
A: The recursive merge strategy (the default merge option in Git) automatically detects complex branch histories and merges them together by walking recursively through each branch history and looking for a common ancestor branch point (see the example below).
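Recursive has historically been Git’s default strategy when merging a single branch, so the two commands below (with an illustrative branch name) normally produce the same result:
git merge feature/login
git merge --strategy=recursive feature/login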
Q: Does Gerrit only work with Git?
A: Yes, mainly because it needs all the power of Git to merge and rebase changes once they have been reviewed, approved and submitted. Additionally, Gerrit uses Git’s ability to define custom “hidden branch namespaces” to store the Security and Roles definitions of the repository itself.
Q: Will Gerrit work with Git Fusion? Do we need a mirror repository in that case?
A: Perforce Git Fusion, used as a “blessed Git repository”, is a Perforce instance that “talks” the Git protocol to allow cloning and pushing to it. However, Gerrit uses JGit to access its Git repository on the filesystem and therefore cannot use a Perforce backend. Perforce Git Fusion could still be used as a synchronisation peer of a Gerrit repo, through the Git replication protocol.
Q: How does the distributed SCM work in Git when development teams are spread across geographical locations?
A: You typically configure one replicated Gerrit instance per geographical site, in order to allow maximum performance of the “git pull” operation. However, to minimise conflicts during geographical repository replication, it is important to perform “git push” against only one of the repository replicas; all the others will eventually get the changes thanks to the Gerrit replication events being propagated (see the sketch below).
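From a developer’s point of view, a minimal sketch with hypothetical hostnames for the local read replica and the single central write instance could be:
git remote add mirror ssh://john@gerrit-eu.example.com:29418/myproject
git remote add origin ssh://john@gerrit-us.example.com:29418/myproject
git pull mirror master
git push origin HEAD:refs/for/master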
Q: How does Git auto-resolve work? How does Git know which files to merge first in order to keep all changes from all submits?
A: Git has a lot of powerful commands and hidden (or semi-hidden) functionalities, waiting to be discovered and leveraged!
I think you are referring here to “git automatic conflict resolution” (aka git rerere), described in this Git documentation link. Git stores the “history” of how a conflict was resolved in the past and then reuses the same information to resolve future conflicts. The functionality is normally disabled but can be turned on with the following command:
git config --global rerere.enabled true
Q: Can you please explain the difference between recursive merge and rebase? The diagrams look very similar.
A: There is one fundamental difference between “merging” and “rebasing” two branches: in the first case the merge preserves the full history of the two branches and creates one extra commit containing the “merged code”; in the second case the rebase modifies the history of the branch by “replaying the changes” on top of the target branch, as if you were in a time machine pushing the changes into a future state where the branch point was never created. In a nutshell, a “merge” can be reverted because it doesn’t change the history (removing the merge commit would suffice), whilst a “rebase” cannot be reverted easily, as the “rebased branch” history has been modified and reapplied in the future. Rebase, however, has the effect of “flattening” the branch history, allowing a more concise and readable set of commits on a development branch (see the commands below).
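As a quick sketch of the two approaches on a hypothetical topic branch:
git checkout master
git merge topic      # creates a merge commit; both histories are preserved
git checkout topic
git rebase master    # replays topic's commits on top of master, rewriting its history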
—
Thank you again for your attendance and don’t miss our next Webinar with CollabNet Inc, Tuesday, January 29, 10:00 AM – 11:00 AM PST.