GitHub API change causes problems for Jenkins and Gerrit

GitHub has recently changed its API default permissions, causing serious problems and outages for Jenkins and Gerrit instances configured with OAuth 2.0 authentication.

Unfortunately GerritHub.io has been impacted, with two outages today:

  1. 0:40 – 1:10 CEST (temporary overload of GitHub API errors, resolved automatically)
  2. 10:50 – 11:25 CEST (overload of GitHub API errors, slowing down HTTP calls and subsequently exhausting our DBMS connection pool)

The second outage was more serious because the GitHub API problems occurred exactly during peak hours for European customers.

What is the current situation?

We have added the extra “read:org” scope to the default public access permissions requested by GerritHub.io, so that the GitHub API calls no longer fail. This change requires you to log out and log back in to GerritHub.io to approve the extra permission.
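For anyone running their own Gerrit or Jenkins with GitHub OAuth, the change boils down to the `scope` parameter sent to GitHub's authorization endpoint. The minimal sketch below assumes GitHub's standard web application flow (`https://github.com/login/oauth/authorize` with a space-separated `scope`); the function and constant names, client ID and callback URL are hypothetical, and this is not GerritHub.io's actual code.

```python
from typing import List
from urllib.parse import quote, urlencode

# GitHub's OAuth authorization endpoint (web application flow).
GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"

# Hypothetical constant: the previous default scopes plus the new "read:org".
PUBLIC_PLAN_SCOPES = ["user:email", "public_repo", "read:org"]


def authorize_url(client_id: str, redirect_uri: str, state: str,
                  scopes: List[str]) -> str:
    """Build the URL the browser is sent to for the OAuth consent page."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # GitHub expects multiple scopes as one space-separated string.
        "scope": " ".join(scopes),
        # Opaque value echoed back to the callback, to protect against CSRF.
        "state": state,
    }
    # quote (instead of the default quote_plus) encodes spaces as %20.
    return f"{GITHUB_AUTHORIZE_URL}?{urlencode(params, quote_via=quote)}"
```

Calling `authorize_url("my-client-id", "https://review.example.com/oauth", "xyz123", PUBLIC_PLAN_SCOPES)` yields a URL whose `scope` parameter reads `user%3Aemail%20public_repo%20read%3Aorg`; the extra `read:org` entry is what users are asked to approve on their next login.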

IMPORTANT NOTE: previously authenticated sessions (including batch users for Jenkins jobs) can no longer read your GitHub organisation membership and, as a consequence, your Gerrit permissions cannot be fully evaluated. You need to log in to GerritHub.io as each batch user and accept the new GitHub permissions in order to obtain a new, valid OAuth token.
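One way to find tokens that predate the change is to inspect the `X-OAuth-Scopes` header, which GitHub returns on authenticated API responses with the comma-separated list of scopes granted to the token. The sketch below only parses a header value already obtained (fetching it is left out); the function names are hypothetical, and treating `write:org` and `admin:org` as implying `read:org` follows GitHub's documented scope hierarchy.

```python
from typing import Optional, Set

# Scopes that include read-only access to organisation membership.
READ_ORG_EQUIVALENTS = {"read:org", "write:org", "admin:org"}


def parse_scopes(header_value: Optional[str]) -> Set[str]:
    """Split an X-OAuth-Scopes value such as "public_repo, user:email"."""
    if not header_value:
        return set()
    return {s.strip() for s in header_value.split(",") if s.strip()}


def can_read_orgs(header_value: Optional[str]) -> bool:
    """True if the token's scopes allow listing the user's organisations."""
    return bool(parse_scopes(header_value) & READ_ORG_EQUIVALENTS)
```

A batch user's old token would typically report `public_repo, user:email`, for which `can_read_orgs` returns `False`; after an interactive login that approves the new permission, the header should also list `read:org`.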

The system is back up and running but is slower than usual, due to the extra throttling applied by GitHub because of the error overload. As people log in again and approve the new permissions, the error rate should drop and the situation should return to normal.

What if I still have problems after logging in and approving the read:org permission?

In case of any further issues, please contact GerritForge Support:
www.gerritforge.com/support

[EDIT: 17:53 BST]

We have been monitoring the situation throughout the day, but the system's performance was not recovering as quickly as we wanted. The problem was related to batch users still running in the background with OAuth tokens that were no longer authorised to perform their actions.

One user from Red Hat pointed out:

“You can see it triggered job and the Build results is SUCCESS. But there is no votes or verified status.”

This happened because the batch user (configured on Jenkins in this case) was still authenticated with its old OAuth token but was no longer authorised to set the “Verified” status. Batch users typically do not use the GUI, so they have few opportunities to obtain a renewed OAuth token with the correct permissions.

Current situation: workaround in place.

The OAuth scope problem only affected users associated with a public GitHub plan, and thus using the default scopes user:email + public_repo. All other users, associated with a private GitHub plan, had already granted access to all private information, including the full list of their public and private organisations.

The workaround exploits the weakest link in GitHub’s protection of a user’s organisation memberships:

  • A user logged in with scopes [user:email + public_repo] cannot access their own list of organisations through the API (strongest link).
  • The same user can, however, open a web browser and navigate, even without being authenticated, to the URL https://github.com/username and read the list of organisations at the bottom left of the page, under the H3 heading “Organizations” (weakest link).
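A minimal sketch of the page-scraping side, using Python's standard `html.parser`. It assumes, purely for illustration, that organisation links appear as `<a href="/orgname">` elements after an `<h3>Organizations</h3>` heading and before the next `<h3>`; GitHub's real markup is not documented and may differ or change at any time. Fetching the page (an anonymous GET of `https://github.com/username`) is left out, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class OrganizationsScraper(HTMLParser):
    """Collect org links that follow an <h3>Organizations</h3> heading.

    Assumed (illustrative) markup: <a href="/orgname"> links after the h3,
    up to the next h3. This is not GitHub's documented page layout.
    """

    def __init__(self) -> None:
        super().__init__()
        self.orgs: List[str] = []
        self._in_h3 = False
        self._h3_text = ""
        self._in_section = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            # Any new h3 closes the previous section; its text decides the next.
            self._in_h3 = True
            self._h3_text = ""
            self._in_section = False
        elif tag == "a" and self._in_section:
            href = dict(attrs).get("href") or ""
            # Keep only top-level paths such as /orgname.
            if href.startswith("/") and href.count("/") == 1 and len(href) > 1:
                self.orgs.append(href[1:])

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            self._in_section = self._h3_text.strip() == "Organizations"

    def handle_data(self, data):
        if self._in_h3:
            self._h3_text += data


def extract_organizations(html: str) -> List[str]:
    """Return the org names found under the Organizations heading."""
    scraper = OrganizationsScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.orgs
```

Because the parser keys on the heading text rather than on CSS classes, it tolerates some cosmetic changes, but any scraping approach remains fragile by nature.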

The patch applied later today simply applies this principle, using the weakest link (page scraping with an anonymous HTTP GET) to compensate for the failure of the strongest link.
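The fallback described above can be sketched as follows, assuming the API call and the scraper are plugged in as plain callables; `OrgLookup` and both callables are hypothetical names, not GerritHub.io's actual code. Results from either path are cached so that repeated permission checks for the same user stop hitting GitHub.

```python
from typing import Callable, Dict, List


class OrgLookup:
    """Resolve a user's organisations: API first, page scraping as fallback.

    Hypothetical interface:
    - api_lookup(username, token) returns the orgs, or raises on failure
      (e.g. for a token without read:org);
    - scrape_lookup(username) returns the orgs from the public profile page.
    """

    def __init__(self,
                 api_lookup: Callable[[str, str], List[str]],
                 scrape_lookup: Callable[[str], List[str]]) -> None:
        self._api_lookup = api_lookup
        self._scrape_lookup = scrape_lookup
        self._cache: Dict[str, List[str]] = {}

    def organizations(self, username: str, token: str) -> List[str]:
        # Serve from the cache to avoid further (possibly throttled) calls.
        if username in self._cache:
            return self._cache[username]
        try:
            orgs = self._api_lookup(username, token)  # strongest link
        except Exception:
            # Any API failure falls back to the anonymous page scrape.
            orgs = self._scrape_lookup(username)      # weakest link
        self._cache[username] = orgs
        return orgs
```

A real cache would also need an expiry, so that membership changes are eventually picked up; note as well that an anonymous request can presumably only see organisation memberships the user has made public.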

NOTE: The workaround fills up the Gerrit cache and gradually eliminates the GitHub throttling caused by the failed API calls, allowing the service to return to its normal response times much more quickly. You should nevertheless authenticate to GerritHub.io interactively to get a renewed OAuth token, as we hope the workaround will no longer be necessary within the next few days.
