AirbyteCatalog approachable to someone contributing to Airbyte for the first time. If you are looking to get deeper into the details of the catalog, you can read our technical specification on it here.
AirbyteCatalog is to describe what data is available in a source. The goal of the ConfiguredAirbyteCatalog is to specify, based on an AirbyteCatalog, how data from the source is replicated.
AirbyteCatalog via a series of examples. We recommend reading the Database Example first; the other examples refer to knowledge described in that section. After that, jump around to whichever example is most pertinent to your inquiry.
AirbyteStream. In the case of a database, a "stream" is analogous to a table. (For APIs, the mapping can be more creative; we will discuss it later in API Examples.)
name - The name of the stream.
source_defined_cursor - If the stream supports INCREMENTAL replication, then this field signals whether the source can figure out how to detect new records on its own or not.
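To make the stream-as-table analogy concrete, here is a minimal Python sketch of a database table described as a stream. It is a simplified model and not Airbyte's own classes. The text above only describes name and source_defined_cursor. The json_schema and supported_sync_modes fields, the lowercase sync-mode strings, and the "users" table are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


# Simplified, hypothetical model of an AirbyteStream. The text describes
# name and source_defined_cursor; json_schema and supported_sync_modes are
# assumptions modeled on the Airbyte protocol, not a full definition.
@dataclass(frozen=True)
class AirbyteStream:
    name: str
    json_schema: dict
    supported_sync_modes: List[str]
    source_defined_cursor: bool = False


# A hypothetical "users" table exposed as one stream: the table becomes the
# stream and each column becomes a property in the stream's JSON schema.
users_stream = AirbyteStream(
    name="users",
    json_schema={
        "type": "object",
        "properties": {
            "id": {"type": "number"},
            "name": {"type": "string"},
            "updated_at": {"type": "string"},
        },
    },
    supported_sync_modes=["full_refresh", "incremental"],
    # The source cannot detect new rows by itself, so a user-chosen cursor
    # field would be needed for incremental replication.
    source_defined_cursor=False,
)

# At its core, a catalog is a list of such streams.
catalog = {"streams": [users_stream]}
```

The dataclass is frozen to reflect that a discovered stream is meant to be read, not edited.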
FULL_REFRESH. Here's what our ConfiguredAirbyteCatalog would look like.
ConfiguredAirbyteCatalog contains a list. This time it is a list of ConfiguredAirbyteStream (instead of just
sync_mode - This field must be one of the sync modes that the AirbyteStream supports. It configures which sync mode will be used when data is replicated.
stream - Hopefully this one looks familiar! This field contains an AirbyteStream. It should be identical to the one we saw in the
source_defined_cursor = false, this field configures which field in the stream will be used to determine if a record should be replicated or not. Read more about this concept in our documentation of incremental replication.
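The rules for the configured-stream fields above can be sketched as a small check. The validate helper and the simplified classes below are hypothetical and are not part of any Airbyte library. The sketch also assumes lowercase sync-mode strings and represents cursor_field as a list of strings (a field path).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AirbyteStream:
    # Simplified, hypothetical stand-in for the discovered stream.
    name: str
    supported_sync_modes: List[str]
    source_defined_cursor: bool = False


@dataclass
class ConfiguredAirbyteStream:
    # Simplified, hypothetical stand-in for the configured stream.
    stream: AirbyteStream
    sync_mode: str
    cursor_field: Optional[List[str]] = None


def validate(configured: ConfiguredAirbyteStream) -> List[str]:
    """Return a list of problems; an empty list means the configuration looks valid."""
    problems = []
    stream = configured.stream
    # sync_mode must be one of the modes the source said the stream supports.
    if configured.sync_mode not in stream.supported_sync_modes:
        problems.append(f"{stream.name}: unsupported sync_mode {configured.sync_mode!r}")
    # For incremental syncs where the source cannot detect new records itself,
    # the user has to choose the cursor field.
    if (configured.sync_mode == "incremental"
            and not stream.source_defined_cursor
            and not configured.cursor_field):
        problems.append(f"{stream.name}: cursor_field is required")
    return problems


users = AirbyteStream("users", ["full_refresh", "incremental"])
print(validate(ConfiguredAirbyteStream(users, "full_refresh")))  # []
print(validate(ConfiguredAirbyteStream(users, "incremental")))  # ['users: cursor_field is required']
print(validate(ConfiguredAirbyteStream(users, "incremental", ["updated_at"])))  # []
```

Returning a list of problems, rather than raising on the first one, lets a UI show every configuration issue at once.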
ConfiguredAirbyteCatalog, remember that the AirbyteCatalog describes what data is present in the source (and metadata about what replication configuration it can support). It is output by the discover method of a source. It should be treated as an immutable object; if you are ever manually editing a catalog outside of a source, you've gone off the rails. The
ConfiguredAirbyteCatalog is a mutable configuration object that specifies, for each AirbyteStream, how (and if) it should be replicated. The ConfiguredAirbyteCatalog does this by wrapping each
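The split between an immutable discovered catalog and a mutable configuration can be illustrated with plain dictionaries. In the sketch below, configure_catalog, the selections mapping, and the "audit_log" table are hypothetical. Real ConfiguredAirbyteStream objects also carry fields omitted here, such as a destination sync mode.

```python
import copy
from typing import Dict


def configure_catalog(catalog: dict, selections: Dict[str, str]) -> dict:
    """Wrap each selected stream of a discovered catalog in a configured stream.

    selections maps stream name -> sync_mode; streams that are not listed are
    not replicated. The input catalog is never modified.
    """
    configured_streams = []
    for stream in catalog["streams"]:
        mode = selections.get(stream["name"])
        if mode is None:
            continue  # the "if" part: unselected streams are skipped
        if mode not in stream["supported_sync_modes"]:
            raise ValueError(f"{stream['name']} does not support {mode}")
        configured_streams.append({
            # Deep copy so later edits to the configured catalog cannot leak
            # back into the discovered catalog.
            "stream": copy.deepcopy(stream),
            "sync_mode": mode,  # the "how" part
        })
    return {"streams": configured_streams}


discovered = {
    "streams": [
        {"name": "users", "supported_sync_modes": ["full_refresh", "incremental"]},
        {"name": "audit_log", "supported_sync_modes": ["full_refresh"]},
    ]
}
configured = configure_catalog(discovered, {"users": "full_refresh"})
print([s["stream"]["name"] for s in configured["streams"]])  # ['users']
```

Copying each stream, instead of referencing it, is one way to honor the rule that the discovered catalog is never edited outside the source.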
AirbyteCatalog offers flexibility in how to model the data for an API. In the next two examples, we will model data from the same API (a stock ticker) in two different ways. In the first, the source will return a single stream called ticker; in the second, the source will return a stream for each stock symbol it is configured to retrieve data for. Each stream's name will be a stock symbol.
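The two modeling choices described above can be compared side by side in a short sketch. Both functions are hypothetical, and so are the record fields (symbol, date, price, where price stands for the closing price) and the example symbols. Only the stream names (a single ticker stream versus one stream per symbol) come from the text.

```python
import copy
from typing import List


def single_stream_catalog() -> dict:
    # Approach 1: one stream called "ticker"; the stock symbol is a field
    # inside every record.
    return {
        "streams": [{
            "name": "ticker",
            "json_schema": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string"},
                    "date": {"type": "string"},
                    "price": {"type": "number"},
                },
            },
            "supported_sync_modes": ["full_refresh"],
        }]
    }


def per_symbol_catalog(symbols: List[str]) -> dict:
    # Approach 2: one stream per configured symbol; the stream name carries
    # the symbol, so records no longer need a "symbol" field.
    schema = {
        "type": "object",
        "properties": {
            "date": {"type": "string"},
            "price": {"type": "number"},
        },
    }
    return {
        "streams": [
            {
                "name": symbol,
                "json_schema": copy.deepcopy(schema),
                "supported_sync_modes": ["full_refresh"],
            }
            for symbol in symbols
        ]
    }


print([s["name"] for s in single_stream_catalog()["streams"]])  # ['ticker']
print([s["name"] for s in per_symbol_catalog(["AAPL", "TSLA"])["streams"]])  # ['AAPL', 'TSLA']
```

In the first approach, every symbol lands in one destination table. In the second, each symbol gets its own table, and the catalog's contents depend on the source's configuration.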
ticker and will contain the closing price of the stock. We will assume that you already have a rough understanding of the ConfiguredAirbyteCatalog from the previous database example.
AirbyteCatalog might look like.
AirbyteCatalog that we created for the Database Example. For the data we've picked here, you can think of ticker as a table and each field it returns in a record as a column, so it makes sense that these look similar.
AirbyteCatalog for a source in whatever way makes the most sense for the use case you are trying to fulfill.
AirbyteCatalog would look like this: