Airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides 300+ connectors for popular APIs, databases, data warehouses and data lakes.

Airbyte connectors can be implemented in any language and take the form of a Docker image that follows the Airbyte specification. You can create new connectors very fast with:

Airbyte has a built-in scheduler and uses Temporal to orchestrate jobs and ensure reliability at scale. Airbyte leverages dbt to normalize extracted data and can trigger custom transformations in SQL and dbt. You can also orchestrate Airbyte syncs with Airflow, Prefect or Dagster.

Airbyte OSS Connections UI

Explore our demo app.

Quick start

Run Airbyte locally

You can run Airbyte locally with Docker.

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker compose up

Disaster Recovery:

How to restart the Service:

If the Airbyte service is down and needs to be started again, follow these steps:

  1. Destroy the current AirbyteStack and create another one from scratch:
    1. Go to the infrastructure repo and cd to infrastructure/stacks/data_platform
    2. Destroy the current stack: cdk destroy --exclusively AirbyteStack --context config=datascience
    3. Recreate the stack: cdk deploy --exclusively AirbyteStack --context config=datascience
  2. Go to the newly created EC2 instance and copy its Public IPv4 DNS.
  3. Paste the IPv4 DNS into the SERVER variable in the Makefile in the root of this repo.
  4. Make sure that you have the airbyte.pem key in your ssh folder.
  5. IMPORTANT: Change the Airbyte password variable BASIC_AUTH_PASSWORD in the .env.prod file.
  6. From the root of this repo run make disaster_recovery. It takes a few minutes to run all the commands.
  7. From the root of this repo run make forward_ec2_port. The Airbyte instance should now be accessible at http://localhost:8000/.
  8. Now it is time to deploy the Sources, Destinations and Connections. For that we will use Octavia.
    1. Store the passwords as secrets in the Octavia config file (~/.octavia): from the root of this repo run make store_passwords. You need the AWS credentials for the Data Science prod account to run this command, because it reads the passwords from AWS Secrets Manager.
    2. From the root of this repo run make octavia_apply. Once it is done, go to the Airbyte UI and enable all the connections.
  9. Remember to go to the data-airflow repo and change the connection IDs in the data_replication_airbyte_qogita_db_public_to_snowflake_raw and data_replication_airbyte_revenue_db_public_to_snowflake_raw DAGs.
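
Before running the destructive step 1, it can help to check the prerequisites from steps 4 and 5 and to preview the cdk commands. The following is a minimal sketch, not part of this repo: the helper names (run, preflight, recreate_stack), the DRY_RUN switch and the ~/.ssh/airbyte.pem key location are assumptions made for illustration. The preflight check only verifies that BASIC_AUTH_PASSWORD is non-empty; it cannot tell whether the password was actually changed.

```shell
#!/usr/bin/env bash
# Sketch only: check prerequisites, then print (or run) the stack commands.
set -eo pipefail

DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only (default), 0 = execute
KEY_FILE="${KEY_FILE:-$HOME/.ssh/airbyte.pem}"  # assumed location of the key (step 4)
ENV_FILE="${ENV_FILE:-.env.prod}"               # env file from step 5

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Steps 4 and 5: the key must exist and BASIC_AUTH_PASSWORD must be non-empty.
preflight() {
  [ -f "$KEY_FILE" ] || { echo "missing key: $KEY_FILE" >&2; return 1; }
  grep -Eq '^BASIC_AUTH_PASSWORD=.+' "$ENV_FILE" \
    || { echo "BASIC_AUTH_PASSWORD not set in $ENV_FILE" >&2; return 1; }
}

# Step 1: call this from infrastructure/stacks/data_platform in the infrastructure repo.
recreate_stack() {
  run cdk destroy --exclusively AirbyteStack --context config=datascience
  run cdk deploy --exclusively AirbyteStack --context config=datascience
}
```

With the default DRY_RUN=1, calling recreate_stack only prints the two cdk commands so they can be reviewed; setting DRY_RUN=0 executes them. The remaining make targets are left out on purpose, since make forward_ec2_port may keep running in the foreground.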

How to upgrade Airbyte or change environment variables in the service:

  1. Make the changes you want to apply in the .env.prod file.
  2. From the root of the repo run make apply_new_envs. PLEASE NOTE THAT THIS WILL STOP THE SERVICE AND RESTART IT.
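
Step 1 can be done in any editor. For scripted changes, the sketch below shows one way to set a single variable, assuming .env.prod holds plain KEY=VALUE lines (the file format is not confirmed here). The set_env_var helper is hypothetical and not part of this repo.

```shell
# Sketch: set KEY=VALUE in an env file, replacing an existing line or appending a new one.
# Assumes one KEY=VALUE pair per line with no quoting or "export" prefix.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Pass key and value through the environment so awk does not interpret
  # special characters (&, /, backslashes) in the value.
  KEY="$key" VALUE="$value" awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"]; found = 0 }
    index($0, key "=") == 1 { print key "=" value; found = 1; next }
    { print }
    END { if (!found) print key "=" value }
  ' "$file" > "$tmp"
  # Write back in place so the original file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, set_env_var .env.prod BASIC_AUTH_PASSWORD 'new-password' followed by make apply_new_envs would apply a new password. Going through awk and ENVIRON rather than sed avoids escaping problems when the value contains characters such as & or /.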

Confluent Kafka control center

How to deploy the service?

  1. Open /qogita-airbyte/kowl_config.yaml and modify it to use the SCRAM AWS MSK password
  2. Run make run_kafka_docker_compose_up

How to connect to the Confluent Kafka control center?

From the root of this repo run make forward_kowl_console_port. The control center instance is then accessible at http://localhost:8080/.
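
The make target is not reproduced here. If it opens an SSH tunnel to the instance, an equivalent manual command could look like the sketch below. The tunnel_cmd helper, the ec2-user login name, the ~/.ssh/airbyte.pem key path and the idea that the console listens on port 8080 on the remote host are all assumptions for illustration.

```shell
# Sketch: print an ssh command that forwards a local port to the same port on a host.
# Usage: tunnel_cmd HOST PORT [USER] [KEY]
tunnel_cmd() {
  local host="$1" port="$2"
  local user="${3:-ec2-user}"               # assumed login user
  local key="${4:-$HOME/.ssh/airbyte.pem}"  # assumed key location
  # -i: identity file, -N: do not run a remote command, -L: local port forwarding
  printf 'ssh -i %s -N -L %s:localhost:%s %s@%s\n' \
    "$key" "$port" "$port" "$user" "$host"
}
```

Calling tunnel_cmd with the Public IPv4 DNS stored in SERVER and port 8080 prints a command to review and run by hand; the tunnel stays open until the ssh process is stopped.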