Apify Dataset
Web scraping and automation platform.

Apify is a web scraping and web automation platform providing both ready-made and custom solutions, an open-source SDK for web scraping, proxies, and many other tools to help you build and run web automation jobs at scale.
The results of a scraping job are usually stored in an Apify dataset. This Airbyte connector lets you automatically sync the contents of a dataset to your chosen destination.
To sync data from a dataset, all you need to know is its ID. You will find it in the Apify Console under Storages.

When your Apify job (also called an actor run) finishes, it can trigger an Airbyte sync by calling the Airbyte API's manual connection trigger (POST /v1/connections/sync). The call can be made from an Apify webhook, which is executed when your Apify run finishes.
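As a rough illustration of the request such a webhook would send, here is a minimal Python sketch that builds the POST call. The base URL, the `connectionId` body field, and the helper name `build_sync_request` are assumptions for illustration; check your own Airbyte deployment for the actual host, path prefix, and authentication.

```python
import json
import urllib.request


def build_sync_request(airbyte_base_url: str, connection_id: str) -> urllib.request.Request:
    """Build (but do not send) a manual sync trigger request.

    Assumptions: the API is reachable under `airbyte_base_url`, and the
    endpoint accepts a JSON body carrying the connection ID.
    """
    url = airbyte_base_url.rstrip("/") + "/v1/connections/sync"
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace with your own deployment and connection.
    request = build_sync_request("http://localhost:8000/api", "<your-connection-id>")
    print(request.get_method(), request.full_url)
    # To actually trigger the sync: urllib.request.urlopen(request)
```

In an Apify webhook you would configure the same URL and JSON payload directly rather than running Python; the sketch only shows the shape of the call.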

Since the dataset items do not have a strongly typed schema, they are synced as objects, without any assumptions about their content.
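To make "synced as objects" concrete, the sketch below wraps arbitrary dataset items in a single untyped object field. The field name `data`, the schema constant, and the function `wrap_item` are hypothetical and chosen for illustration; they are not confirmed to match the connector's actual output.

```python
from typing import Any, Dict

# Hypothetical permissive schema: one object field, no constraints on its keys.
ITEM_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "data": {"type": "object", "additionalProperties": True},
    },
}


def wrap_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a dataset item as-is, without inspecting or validating its content."""
    return {"data": item}


if __name__ == "__main__":
    # Two items with unrelated shapes pass through unchanged.
    print(wrap_item({"url": "https://example.com", "title": "Example"}))
    print(wrap_item({"price": 9.99, "tags": ["a", "b"]}))
```

Because nothing is assumed about the item keys, any typing or flattening has to happen downstream in the destination.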

| Feature | Supported? |
| --- | --- |
| Full Refresh Sync | Yes |
| Incremental Sync | No |

The Apify dataset connector uses the Apify Python Client under the hood and should handle any API limitations under normal usage.
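A full refresh reads a dataset from start to end, which in practice means fetching it page by page. The following is a minimal sketch of such a paged read, not the connector's actual implementation. The helper `iter_dataset_items` and the `fetch_page(offset, limit)` callable are hypothetical; with the Apify Python Client, the callable could wrap a dataset client's `list_items(offset=..., limit=...)` call and return the page's `items`.

```python
from typing import Any, Callable, Dict, Iterator, List

Item = Dict[str, Any]


def iter_dataset_items(
    fetch_page: Callable[[int, int], List[Item]],
    page_size: int = 1000,
) -> Iterator[Item]:
    """Yield every item by requesting consecutive pages until one comes back empty."""
    offset = 0
    while True:
        items = fetch_page(offset, page_size)
        if not items:
            break
        yield from items
        # Advance by the number of items actually returned, not by page_size.
        offset += len(items)


if __name__ == "__main__":
    # In-memory stand-in for a remote dataset, used only for demonstration.
    fake_dataset = [{"n": i} for i in range(5)]

    def fake_fetch(offset: int, limit: int) -> List[Item]:
        return fake_dataset[offset:offset + limit]

    for item in iter_dataset_items(fake_fetch, page_size=2):
        print(item)
```

Keeping the fetch step behind a callable makes the loop easy to exercise without network access.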

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 0.1.4 | 2021-12-23 | PR#8434 | Update fields in source-connectors specifications |
| 0.1.2 | 2021-11-08 | PR#7499 | Remove base-python dependencies |
| 0.1.0 | 2021-07-29 | PR#5069 | Initial version of the connector |