GitHub

Overview

The GitHub source supports both Full Refresh and Incremental syncs. You can choose whether each sync copies only new or updated data, or all rows in the tables and columns you set up for replication.

Output schema

This connector outputs the following full refresh streams:
This connector outputs the following incremental streams:

Notes

  1. Only 4 of the 17 incremental streams above (comments, commits, issues and review comments) are purely incremental, meaning that they:
    • read only new records;
    • output only new records.
  2. The other 13 incremental streams are also incremental, with one difference. They:
    • read all records;
    • output only new records.
    Keep this behaviour in mind when using those 13 incremental streams, because it may affect your API call limits.
  3. The connector passes a few parameters (since, sort and direction) to GitHub in order to filter records. For large streams, specifying a start_date far in the past may cause GitHub to keep returning errors instead of records (a corresponding WARN message is logged). In this case, specifying a more recent start_date may help.

The "Start Date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering:
  • assignees
  • branches
  • collaborators
  • issue_labels
  • organizations
  • pull_request_commits
  • pull_request_stats
  • repositories
  • tags
  • teams
  • users
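
The two kinds of incremental stream described in the notes can be sketched as follows. This is a minimal illustration and not the connector's actual code: the function names, the updated_at cursor field and the sort/direction values are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2022-03-31T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pure_incremental_params(state: str) -> Dict[str, str]:
    """Purely incremental: ask the API to return only records newer than the state.

    The parameter names come from the notes above; the values used for
    sort and direction are assumptions for this sketch.
    """
    return {"since": state, "sort": "updated", "direction": "asc"}


def read_all_output_new(
    records: Iterable[dict], state: str, cursor_field: str = "updated_at"
) -> Iterator[dict]:
    """Other incremental streams: every record is read, older ones are dropped locally.

    All records still have to be fetched, which is why these streams
    consume more API calls than the purely incremental ones.
    """
    cutoff = parse_ts(state)
    for record in records:
        if parse_ts(record[cursor_field]) > cutoff:
            yield record
```

In the first case the filtering happens on the GitHub side, so fewer pages are requested. In the second case the number of requests stays the same on every sync and only the output shrinks.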

Features

Feature | Supported?
Full Refresh Sync | Yes
Incremental - Append Sync | Yes
Replicate Incremental Deletes | Coming soon
SSL connection | Yes
Namespaces | No

Performance considerations

The GitHub connector should not run into GitHub API limitations under normal usage. Please create an issue if you see any rate limit issues that are not automatically retried successfully.
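
As an illustration of what an automatic retry may need to compute, the sketch below derives a wait time from the rate limit headers GitHub returns (Retry-After, X-RateLimit-Remaining and X-RateLimit-Reset). It is not the connector's implementation, and the function name seconds_until_retry is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Return how long to wait before retrying, or None if no limit was hit.

    Header names are matched exactly here; pass a case-insensitive mapping
    if your HTTP client does not normalise header names.
    """
    now = time.time() if now is None else now

    # Secondary rate limits usually carry an explicit Retry-After in seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(float(retry_after), 0.0)

    # Primary rate limit: wait until the reset time (epoch seconds).
    if headers.get("X-RateLimit-Remaining") == "0":
        reset = headers.get("X-RateLimit-Reset")
        if reset is not None:
            # Clamp to zero so a reset time in the past never yields a negative wait.
            return max(float(reset) - now, 0.0)

    return None
```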

Getting started

Requirements

  • GitHub account;
  • Authentication - Select one of 2 authentication methods:
    • Authenticate via GitHub (OAuth) - Only available in Airbyte Cloud. Authenticate by clicking the "Authenticate your account" button;
    • Authenticate with Personal Access Token - Use this method for Airbyte Open-Source. Log into GitHub and then generate a personal access token. To load balance your API quota consumption across multiple API tokens, input multiple tokens separated with a comma (,);
  • Start Date - The date from which you'd like to replicate data for streams: comments, commit_comment_reactions, commit_comments, commits, deployments, events, issue_comment_reactions, issue_events, issue_milestones, issue_reactions, issues, project_cards, project_columns, projects, pull_request_comment_reactions, pull_requests, pull_request_stats, releases, review_comments, reviews, stargazers;
  • GitHub Repositories - Space-delimited list of GitHub organizations/repositories, e.g. airbytehq/airbyte for a single repository, airbytehq/airbyte airbytehq/another-repo for multiple repositories. To receive data from all repositories of an organization, specify it as in the following example: airbytehq/*;
  • Branch (Optional) - Space-delimited list of GitHub repository branches to pull commits for, e.g. airbytehq/airbyte/master for a single branch, or airbytehq/airbyte/master airbytehq/airbyte/my-branch for several. If no branches are specified for a repository, the default branch will be pulled;
  • Page size for large streams (Optional) - The GitHub connector contains several streams with a large load. The page size of such streams depends on the size of your repository. Values between 10 and 30 are recommended.
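
The input formats above (space-delimited repositories and branches, comma-separated tokens) can be interpreted roughly as in the sketch below. The helper names are hypothetical, and the round-robin token rotation is only one possible way to spread quota consumption across tokens, not necessarily what the connector does.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


def split_repositories(value: str) -> Tuple[List[str], List[str]]:
    """Split the "GitHub Repositories" value into (organizations, repositories)."""
    organizations: List[str] = []
    repositories: List[str] = []
    for item in value.split():
        owner, _, name = item.partition("/")
        if name == "*":
            # "airbytehq/*" means every repository of the organization.
            organizations.append(owner)
        else:
            repositories.append(item)
    return organizations, repositories


def group_branches(value: str) -> Dict[str, List[str]]:
    """Group "owner/repo/branch" entries by "owner/repo"."""
    branches: Dict[str, List[str]] = defaultdict(list)
    for item in value.split():
        # maxsplit=2 keeps branch names that themselves contain slashes intact.
        owner, repo, branch = item.split("/", 2)
        branches[f"{owner}/{repo}"].append(branch)
    return dict(branches)


def token_rotation(value: str) -> Iterator[str]:
    """Cycle through comma-separated tokens (assumed round-robin strategy)."""
    tokens = [token.strip() for token in value.split(",") if token.strip()]
    return cycle(tokens)
```

For example, split_repositories("airbytehq/airbyte airbytehq/*") separates the single repository from the organization wildcard, and group_branches("airbytehq/airbyte/master airbytehq/airbyte/my-branch") collects both branches under airbytehq/airbyte.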

Permissions and scopes

If you use the OAuth authentication method, the OAuth 2.0 application requests the following scopes: repo, read:org, read:repo_hook, read:user, read:discussion, workflow. For a personal access token, you need to select the required scopes manually.
Your token should have at least the repo scope. Depending on which streams you want to sync, the user generating the token needs more permissions:
  • For syncing Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. See GitHub's documentation on access permissions for details.
  • Syncing Teams is only available to authenticated members of a team's organization. Personal user accounts and repositories belonging to them don't have access to Teams features. In this case no records will be synced.
  • To sync the Projects stream, the repository must have the Projects feature enabled.
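
For classic personal access tokens, GitHub reports the granted scopes in the X-OAuth-Scopes response header as a comma-separated list. The sketch below compares such a header value against a set of required scopes. The function name and the default required set are assumptions for illustration. The check is a plain set difference, so it does not account for broader scopes that imply narrower ones.

```python
from typing import Iterable, Set

# Minimum scope named in this section; extend it for the streams you need.
REQUIRED_SCOPES = frozenset({"repo"})


def missing_scopes(
    granted_header: str, required: Iterable[str] = REQUIRED_SCOPES
) -> Set[str]:
    """Return the required scopes that are absent from an X-OAuth-Scopes value."""
    granted = {scope.strip() for scope in granted_header.split(",") if scope.strip()}
    return set(required) - granted
```

An empty result means every required scope is present in the header value.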

Changelog

Version | Date | Pull Request | Subject
0.2.25 | 2022-03-31 | 11567 | Improve code for better error handling
0.2.24 | 2022-03-30 | 9251 | Add Streams Workflow and WorkflowRuns
0.2.23 | 2022-03-17 | 11212 | Improve documentation and spec for Beta
0.2.22 | 2022-03-10 | 10878 | Fix error handling for unavailable streams with 404 status code
0.2.21 | 2022-03-04 | 10749 | Add new stream ProjectCards
0.2.20 | 2022-02-16 | 10385 | Add new stream Deployments, ProjectColumns, PullRequestCommits
0.2.19 | 2022-02-07 | 10211 | Add human-readable error in case of incorrect organization or repo name
0.2.18 | 2021-02-09 | 10193 | Add handling secondary rate limits
0.2.17 | 2021-02-02 | 9999 | Remove BAD_GATEWAY code from backoff_time
0.2.16 | 2021-02-02 | 9868 | Add log message for streams that are restricted for OAuth. Update oauth scopes.
0.2.15 | 2021-01-26 | 9802 | Add missing fields for auto_merge in pull request stream
0.2.14 | 2021-01-21 | 9664 | Add custom pagination size for large streams
0.2.13 | 2021-01-20 | 9619 | Fix logging for function should_retry
0.2.11 | 2021-01-17 | 9492 | Remove optional parameter Accept for reaction streams to fix error with 502 HTTP status code in response
0.2.10 | 2021-01-03 | 7250 | Use CDK caching and convert PR-related streams to incremental
0.2.9 | 2021-12-29 | 9179 | Use default retry delays on server error responses
0.2.8 | 2021-12-07 | 8524 | Update connector fields title/description
0.2.7 | 2021-12-06 | 8518 | Add connection retry with GitHub
0.2.6 | 2021-11-24 | 8030 | Support start date property for PullRequestStats and Reviews streams
0.2.5 | 2021-11-21 | 8170 | Fix slow check connection for organizations with a lot of repos
0.2.4 | 2021-11-11 | 7856 | Resolve $ref fields in some stream schemas
0.2.3 | 2021-10-06 | 6833 | Fix config backward compatibility
0.2.2 | 2021-10-05 | 6761 | Add oauth workflow specification
0.2.1 | 2021-09-22 | 6223 | Add option to pull commits from user-specified branches
0.2.0 | 2021-09-19 | 5898 and 6227 | Don't minimize any output fields & add better error handling
0.1.11 | 2021-09-15 | 5949 | Add caching for all streams
0.1.10 | 2021-09-09 | 5860 | Add reaction streams
0.1.9 | 2021-09-02 | 5788 | Handling empty repository, check method using RepositoryStats stream
0.1.8 | 2021-09-01 | 5757 | Add more streams
0.1.7 | 2021-08-27 | 5696 | Handle negative backoff values
0.1.6 | 2021-08-18 | 5456 | Add MultipleTokenAuthenticator
0.1.5 | 2021-08-18 | 5456 | Fix set up validation
0.1.4 | 2021-08-13 | 5136 | Support syncing multiple repositories/organizations
0.1.3 | 2021-08-03 | 5156 | Extended existing schemas with users property for certain streams
0.1.2 | 2021-07-13 | 4708 | Fix bug with IssueEvents stream and add handling for rate limiting
0.1.1 | 2021-07-07 | 4590 | Fix schema in the pull_request stream
0.1.0 | 2021-07-06 | 4174 | New Source: GitHub