Connector Catalog
Sources
Microsoft SQL Server (MSSQL)

Feature
Supported
Notes
Full Refresh Sync
Yes
Incremental Sync - Append
Yes
Replicate Incremental Deletes
Yes
CDC (Change Data Capture)
Yes
SSL Support
Yes
SSH Tunnel Connection
Yes
Namespaces
Yes
Enabled by default
The MSSQL source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details.

You may run into an issue where the connector provides wrong values for some data types. See discussion on unexpected behaviour for certain datatypes.

On Airbyte Cloud, only TLS connections to your MSSQL instance are supported in source configuration. Other than that, you can proceed with the open-source instructions below.

  1. 1.
    MSSQL Server Azure SQL Database, Azure Synapse Analytics, Azure SQL Managed Instance, SQL Server 2019, SQL Server 2017, SQL Server 2016, SQL Server 2014, SQL Server 2012, PDW 2008R2 AU34.
  2. 2.
    Create a dedicated read-only Airbyte user with access to all tables needed for replication
  3. 3.
    If you want to use CDC, please see the relevant section below for further setup requirements

This is dependent on your networking setup. The easiest way to verify if Airbyte is able to connect to your MSSQL instance is via the check connection tool in the UI.
This step is optional but highly recommended to allow for better permission control and auditing. Alternatively, you can use Airbyte with an existing user in your database.
Coming soon: suggestions on how to create this user.

We use SQL Server's change data capture feature to capture row-level INSERT, UPDATE and DELETE operations that occur on cdc-enabled tables.
Some extra setup requiring at least db_owner permissions on the database(s) you intend to sync from will be required (detailed below).
Please read the CDC docs for an overview of how Airbyte approaches CDC.

  • If you need a record of deletions and can accept the limitations posted below, CDC is the way to go!
  • If your data set is small and/or you just want a snapshot of your table in the destination, consider using Full Refresh replication for your table instead of CDC.
  • If the limitations below prevent you from using CDC and your goal is to maintain a snapshot of your table in the destination, consider using non-CDC incremental and occasionally reset the data and re-sync.
  • If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (i.e. updated_at), CDC allows you to sync your table incrementally.

  • Make sure to read our CDC docs to see limitations that impact all databases using CDC replication.
  • There are some critical issues regarding certain datatypes. Please find detailed info in this Github issue.
  • CDC is only available for SQL Server 2016 Service Pack 1 (SP1) and later.
  • db_owner (or higher) permissions are required to perform the neccessary setup for CDC.
  • You must enable snapshot isolation mode on the database(s) you want to sync. This is used for retrieving an initial snapshot without locking tables.
  • On Linux, CDC is not supported on versions earlier than SQL Server 2017 CU18 (SQL Server 2019 is supported).
  • Change data capture cannot be enabled on tables with a clustered columnstore index. (It can be enabled on tables with a non-clustered columnstore index).
  • The SQL Server CDC feature processes changes that occur in user-created tables only. You cannot enable CDC on the SQL Server master database.
  • Using variables with partition switching on databases or tables with change data capture (CDC) is not supported for the ALTER TABLE ... SWITCH TO ... PARTITION ... statement
  • Our implementation has not been tested with managed instances, such as Azure SQL Database (we welcome any feedback from users who try this!)
    • If you do want to try this, CDC can only be enabled on Azure SQL databases tiers above Standard 3 (S3+). Basic, S0, S1 and S2 tiers are not supported for CDC.
  • Our CDC implementation uses at least once delivery for all change records.
  • Read more on CDC limitations in the Microsoft docs.

MS SQL Server provides some built-in stored procedures to enable CDC.
  • To enable CDC, a SQL Server administrator with the necessary privileges (db_owner or sysadmin) must first run a query to enable CDC at the database level.
    USE {database name}
    GO
    EXEC sys.sp_cdc_enable_db
    GO
  • The administrator must then enable CDC for each table that you want to capture. Here's an example:
    USE {database name}
    GO
    EXEC sys.sp_cdc_enable_table
    @source_schema = N'{schema name}',
    @source_name = N'{table name}',
    @role_name = N'{role name}', [1]
    @filegroup_name = N'{fiilegroup name}', [2]
    @supports_net_changes = 0 [3]
    GO
    • [1] Specifies a role which will gain SELECT permission on the captured columns of the source table. We suggest putting a value here so you can use this role in the next step but you can also set the value of @role_name to NULL to allow only sysadmin and db_owner to have access. Be sure that the credentials used to connect to the source in Airbyte align with this role so that Airbyte can access the cdc tables.
    • [2] Specifies the filegroup where SQL Server places the change table. We recommend creating a separate filegroup for CDC but you can leave this parameter out to use the default filegroup.
    • [3] If 0, only the support functions to query for all changes are generated. If 1, the functions that are needed to query for net changes are also generated. If supports_net_changes is set to 1, index_name must be specified, or the source table must have a defined primary key.
  • (For more details on parameters, see the Microsoft doc page for this stored procedure).
  • If you have many tables to enable CDC on and would like to avoid having to run this query one-by-one for every table, this script might help!

  • When a sync runs for the first time using CDC, Airbyte performs an initial consistent snapshot of your database. To avoid acquiring table locks, Airbyte uses snapshot isolation, allowing simultaneous writes by other database clients. This must be enabled on the database like so:
    ALTER DATABASE {database name}
    SET ALLOW_SNAPSHOT_ISOLATION ON;

  • Rather than use sysadmin or db_owner credentials, we recommend creating a new user with the relevant CDC access for use with Airbyte. First let's create the login and user and add to the db_datareader role:
    USE {database name};
    CREATE LOGIN {user name}
    WITH PASSWORD = '{password}';
    CREATE USER {user name} FOR LOGIN {user name};
    EXEC sp_addrolemember 'db_datareader', '{user name}';
    • Add the user to the role specified earlier when enabling cdc on the table(s):
      EXEC sp_addrolemember '{role name}', '{user name}';
    • This should be enough access, but if you run into problems, try also directly granting the user SELECT access on the cdc schema:
      USE {database name};
      GRANT SELECT ON SCHEMA :: [cdc] TO {user name};
    • If feasible, granting this user 'VIEW SERVER STATE' permissions will allow Airbyte to check whether or not the SQL Server Agent is running. This is preferred as it ensures syncs will fail if the CDC tables are not being updated by the Agent in the source database.
      USE master;
      GRANT VIEW SERVER STATE TO {user name};

  • In SQL Server, by default, only three days of data are retained in the change tables. Unless you are running very frequent syncs, we suggest increasing this retention so that in case of a failure in sync or if the sync is paused, there is still some bandwidth to start from the last point in incremental sync.
  • These settings can be changed using the stored procedure sys.sp_cdc_change_job as below:
    -- we recommend 14400 minutes (10 days) as retention period
    EXEC sp_cdc_change_job @job_type='cleanup', @retention = {minutes}
  • After making this change, a restart of the cleanup job is required:
EXEC sys.sp_cdc_stop_job @job_type = 'cleanup';
EXEC sys.sp_cdc_start_job @job_type = 'cleanup';

  • MSSQL uses the SQL Server Agent
    for CDC. It is therefore vital that the Agent is operational in order for to CDC to work effectively. You can check
    the status of the SQL Server Agent as follows:
EXEC xp_servicecontrol 'QueryState', N'SQLServerAGENT';
  • If you see something other than 'Running.' please follow
    to start the service.

Airbyte has the ability to connect to a MSSQL instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address).
When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that does have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server.
Using this feature requires additional configuration, when creating the source. We will talk through what each piece of configuration means.
  1. 1.
    Configure all fields for the source as you normally would, except SSH Tunnel Method.
  2. 2.
    SSH Tunnel Method defaults to No Tunnel (meaning a direct connection). If you want to use an
    SSH Tunnel choose SSH Key Authentication or Password Authentication.
    1. 1.
      Choose Key Authentication if you will be using an RSA private key as your secret for
      establishing the SSH Tunnel (see below for more information on generating this key).
    2. 2.
      Choose Password Authentication if you will be using a password as your secret for establishing
      the SSH Tunnel.
  3. 3.
    SSH Tunnel Jump Server Host refers to the intermediate (bastion) server that Airbyte will connect to. This should
    be a hostname or an IP Address.
  4. 4.
    SSH Connection Port is the port on the bastion server with which to make the SSH connection. The default port for
    SSH connections is 22, so unless you have explicitly changed something, go with the default.
  5. 5.
    SSH Login Username is the username that Airbyte should use when connection to the bastion server. This is NOT the
    MSSQL username.
  6. 6.
    If you are using Password Authentication, then SSH Login Username should be set to the
    password of the User from the previous step. If you are using SSH Key Authentication leave this
    blank. Again, this is not the MSSQL password, but the password for the OS-user that Airbyte is
    using to perform commands on the bastion.
  7. 7.
    If you are using SSH Key Authentication, then SSH Private Key should be set to the RSA
    private Key that you are using to create the SSH connection. This should be the full contents of
    the key file starting with -----BEGIN RSA PRIVATE KEY----- and ending
    with -----END RSA PRIVATE KEY-----.

The connector expects an RSA key in PEM format. To generate this key:
ssh-keygen -t rsa -m PEM -f myuser_rsa
This produces the private key in pem format, and the public key remains in the standard format used by the authorized_keys file on your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion.

MSSQL data types are mapped to the following data types when synchronizing data. You can check the test values examples here. If you can't find the data type you are looking for or have any problems feel free to add a new test!
MSSQL Type
Resulting Type
Notes
bigint
number
binary
string
bit
boolean
char
string
date
number
datetime
string
datetime2
string
datetimeoffset
string
decimal
number
int
number
float
number
geography
string
geometry
string
money
number
numeric
number
ntext
string
nvarchar
string
nvarchar(max)
string
real
number
smalldatetime
string
smallint
number
smallmoney
number
sql_variant
string
uniqueidentifier
string
text
string
time
string
tinyint
number
varbinary
string
varchar
string
varchar(max) COLLATE Latin1_General_100_CI_AI_SC_UTF8
string
xml
string
If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.

Version
Date
Pull Request
Subject
0.3.17
2022-02-21
10242
Fixed cursor for old connectors that use non-microsecond format. Now connectors work with both formats
0.3.16
2022-02-18
10242
Updated timestamp transformation with microseconds
0.3.15
2022-02-14
10256
Add -XX:+ExitOnOutOfMemoryError JVM option
0.3.14
2022-01-24
9554
Allow handling of java sql date in CDC
0.3.13
2022-01-07
9094
Added support for missed data types
0.3.12
2021-12-30
9206
Update connector fields title/description
0.3.11
2021-12-24
8958
Add support for JdbcType.ARRAY
0.3.10
2021-12-01
8371
Fixed incorrect handling "\n" in ssh key
0.3.9
2021-11-09
7386
Improve support for binary and varbinary data types
0.3.8
2021-10-26
7386
Fixed data type (smalldatetime, smallmoney) conversion from mssql source
0.3.7
2021-09-30
6585
Improved SSH Tunnel key generation steps
0.3.6
2021-09-17
6318
Added option to connect to DB via SSH
0.3.4
2021-08-13
4699
Added json config validator
0.3.3
2021-07-05
4689
Add CDC support
0.3.2
2021-06-09
3179
Add AIRBYTE_ENTRYPOINT for Kubernetes support
0.3.1
2021-06-08
3893
Enable SSL connection
0.3.0
2021-04-21
2990
Support namespaces
0.2.3
2021-03-28
2600
Add NCHAR and NVCHAR support to DB and cursor type casting
0.2.2
2021-03-26
2460
Destination supports destination sync mode
0.2.1
2021-03-18
2488
Sources support primary keys
0.2.0
2021-03-09
2238
Protocol allows future/unknown properties
0.1.11
2021-02-02
1887
Migrate AbstractJdbcSource to use iterators
0.1.10
2021-01-25
1746
Fix NPE in State Decorator
0.1.9
2021-01-19
1724
Fix JdbcSource handling of tables with same names in different schemas
0.1.9
2021-01-14
1655
Fix JdbcSource OOM
0.1.8
2021-01-13
1588
Handle invalid numeric values in JDBC source
0.1.6
2020-12-09
1172
Support incremental sync
0.1.5
2020-11-30
1038
Change JDBC sources to discover more than standard schemas
0.1.4
2020-11-30
1046
Add connectors using an index YAML file
Copy link
On this page
Features
Troubleshooting
Getting Started (Airbyte Cloud)
Getting Started (Airbyte Open-Source)
Change Data Capture (CDC)
Should I use CDC for MSSQL?
Setting up CDC for MSSQL
Connection to MSSQL via an SSH Tunnel
Generating an SSH Key Pair
Data type mapping
Changelog