- Docker (docker-compose)
- Jupyter Notebook using PySpark
- GitHub
- DBT (Data Build Tool)
- Airflow
- PostgreSQL and pgAdmin
- Docker >= 17.12.0
- docker-compose
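To confirm the installed Docker meets the minimum version, you can parse the output of `docker --version`. This is a minimal sketch. It assumes the output contains a `MAJOR.MINOR.PATCH` number, as in `Docker version 17.12.0-ce, build c97c6d6`. The function names are hypothetical helpers, not part of this project.

```python
import re
import subprocess


def parse_docker_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `docker --version` output."""
    # Assumes the first "N.N.N" in the text is the Docker version.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"could not find a version number in: {output!r}")
    return tuple(int(part) for part in match.groups())


def docker_meets_minimum(output: str, minimum: tuple = (17, 12, 0)) -> bool:
    """True if the reported Docker version is >= the minimum (tuple comparison)."""
    return parse_docker_version(output) >= minimum


def installed_docker_ok() -> bool:
    """Run `docker --version` on this machine and check it against the minimum."""
    result = subprocess.run(["docker", "--version"],
                            capture_output=True, text=True, check=True)
    return docker_meets_minimum(result.stdout)
```

Calling `installed_docker_ok()` requires the `docker` CLI on your `PATH`. The parsing functions work on any captured string.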
Go to the Docker folder and run:
docker-compose up
Pay attention to the log output of docker-compose up. It shows the token you need to access the Jupyter notebook.
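If the log scrolls by quickly, you can save it and search it for the token. This is a small sketch. It assumes Jupyter prints an access URL containing `?token=<value>`, which is typical, but the exact line format varies between Jupyter versions. `find_jupyter_token` is a hypothetical helper.

```python
import re
from typing import Optional

# Jupyter usually logs a URL such as "http://127.0.0.1:8888/lab?token=<hex string>".
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9A-Za-z]+)")


def find_jupyter_token(log_text: str) -> Optional[str]:
    """Return the first token found in the log text, or None if there is none."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None
```

For example, save the output of `docker-compose logs` to a file, read it, and pass the text to `find_jupyter_token`.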
The jupyther folder contains the notebook with the scripts that read the CSV file and ingest its data into the PostgreSQL database.
You need to update POSTGRES_DB_IP. To find the right value, open a new terminal and run:
docker network inspect docker_network-common
The address to use is the "Gateway" value inside this object:

    "IPAM": {
        "Driver": "default",
        "Options": null,
        "Config": [
            {
                "Subnet": "172.18.0.0/16",
                "Gateway": "172.18.0.1"
            }
        ]
    }

In this example, the address is 172.18.0.1.
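You can also read the gateway programmatically instead of copying it by hand. The sketch below parses the JSON that `docker network inspect` prints, which is a list with one object per network, and returns the first `IPAM.Config` entry that has a `Gateway`. The helper names are hypothetical.

```python
import json
import subprocess


def gateway_from_inspect(inspect_json: str) -> str:
    """Return the first IPAM gateway found in `docker network inspect` JSON output."""
    networks = json.loads(inspect_json)  # a list with one object per inspected network
    for entry in networks[0]["IPAM"]["Config"]:
        if "Gateway" in entry:
            return entry["Gateway"]
    raise KeyError("no Gateway entry found in IPAM.Config")


def lookup_gateway(network: str = "docker_network-common") -> str:
    """Run `docker network inspect` for the network and return its gateway."""
    output = subprocess.run(
        ["docker", "network", "inspect", network],
        capture_output=True, text=True, check=True,
    ).stdout
    return gateway_from_inspect(output)
```

Alternatively, you can skip the JSON parsing and let Docker's Go template output do the same job: `docker network inspect -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}' docker_network-common`.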
To access pgAdmin, open http://localhost:5050 in your preferred browser.
The default password is admin.
You need to create a server. In my case, I created one called california_house_training under Servers.
The database is cht, and the table that receives the CSV data is cht_raw.
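The notebook's own code is not shown here. As a sketch of how a PySpark job can write to the cht database and cht_raw table, the helper below builds the JDBC URL and connection properties from POSTGRES_DB_IP. It assumes the default postgres/postgres credentials, port 5432, and that the PostgreSQL JDBC driver jar is on Spark's classpath. `postgres_jdbc_options` and the CSV path are hypothetical.

```python
def postgres_jdbc_options(db_ip: str, database: str = "cht", user: str = "postgres",
                          password: str = "postgres", port: int = 5432) -> tuple:
    """Build the JDBC URL and properties dict for PySpark's DataFrameWriter.jdbc."""
    url = f"jdbc:postgresql://{db_ip}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        # Driver class of the PostgreSQL JDBC driver; its jar must be available to Spark.
        "driver": "org.postgresql.Driver",
    }
    return url, properties


# Usage inside the notebook, where `spark` is the active SparkSession:
#   url, props = postgres_jdbc_options(POSTGRES_DB_IP)
#   df = spark.read.csv("path/to/file.csv", header=True, inferSchema=True)
#   df.write.jdbc(url, "cht_raw", mode="append", properties=props)
```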
For DBT, update the host parameter in the profile.yml file and set it to the gateway IP of the Docker network found above.
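If you prefer to script this change, a regular-expression replace on the `host:` lines is enough for a simple profile. This sketch assumes every `host:` key in the file should point to the same address and uses no YAML library. The helper names and the example profile in the test are hypothetical.

```python
import re
from pathlib import Path

# A "host:" key at any indentation; spaces and tabs only, so the match never crosses lines.
HOST_LINE = re.compile(r"^([ \t]*host:[ \t]*).*$", re.MULTILINE)


def set_profile_host(profile_text: str, new_host: str) -> str:
    """Return the profile text with every host value replaced by new_host."""
    return HOST_LINE.sub(lambda m: m.group(1) + new_host, profile_text)


def update_profile_file(path: str, new_host: str) -> None:
    """Rewrite the host entries of a DBT profile file in place."""
    file = Path(path)
    file.write_text(set_profile_host(file.read_text(), new_host))
```

For example, `update_profile_file("profile.yml", "172.18.0.1")` rewrites the file using the gateway from the example above.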
Environment variables:
- POSTGRES_USER (default: postgres)
- POSTGRES_PASSWORD (default: postgres)
- PGADMIN_PORT (default: 5050)
- PGADMIN_DEFAULT_EMAIL (default: [email protected])
- PGADMIN_DEFAULT_PASSWORD (default: admin)
- AIRFLOW__CORE__EXECUTOR: LocalExecutor
- AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:5432/${POSTGRES_DATABASE}
- AIRFLOW__CORE__LOAD_EXAMPLES: False
- AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS: False
- AIRFLOW__CORE__FERNET_KEY: ${generate_fernet_key.py} (a key produced by the generate_fernet_key.py script)
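The two computed Airflow values can be sketched in Python. A SQLAlchemy URL takes the form `dialect+driver://user:password@host:port/database`. A Fernet key is 32 random bytes encoded as URL-safe base64, which is what `cryptography.fernet.Fernet.generate_key()` returns. This is not the project's generate_fernet_key.py. The fallback values for POSTGRES_HOST and POSTGRES_DATABASE are assumptions made for illustration.

```python
import base64
import os
from collections.abc import Mapping
from urllib.parse import quote


def sql_alchemy_conn(env: Mapping = os.environ) -> str:
    """Build AIRFLOW__CORE__SQL_ALCHEMY_CONN as user:password@host:port/database."""
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    host = env.get("POSTGRES_HOST", "postgres")          # assumed fallback
    database = env.get("POSTGRES_DATABASE", "airflow")   # assumed fallback
    # Percent-encode credentials so characters like "@" or ":" do not break the URL.
    return (f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:5432/{database}")


def generate_fernet_key() -> str:
    """Return a Fernet key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()
```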
PostgreSQL:
- Address: localhost:5432
- User: postgres (default)
- Password: postgres (default)

pgAdmin:
- URL: http://localhost:5050
- User: [email protected] (default)
- Password: admin (default)

Airflow:
- URL: http://localhost:8080
- User: airflow (default)
- Password: airflow (default)
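After docker-compose up, a quick way to check that the published ports (5432, 5050 and 8080) accept connections is a plain TCP connect. This sketch only tests reachability, not credentials. The service labels in `SERVICES` and the helper names are hypothetical.

```python
import socket

# Published ports listed above.
SERVICES = {"PostgreSQL": 5432, "pgAdmin": 5050, "Airflow": 8080}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```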
PostgreSQL connection settings from other containers on the Docker network:
- Host: postgres
- Port: 5432
- User: POSTGRES_USER (default: postgres)
- Password: POSTGRES_PASSWORD (default: postgres)
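Because AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS is False, you may need to register this database as an Airflow connection yourself. Airflow can read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` that hold a connection URI. The sketch below builds such a URI from the settings above. The database name `cht` and any connection id you pass are assumptions.

```python
from urllib.parse import quote


def airflow_postgres_uri(user: str = "postgres", password: str = "postgres",
                         host: str = "postgres", port: int = 5432,
                         database: str = "cht") -> str:
    """Build an Airflow connection URI for the PostgreSQL service."""
    # Credentials are percent-encoded so special characters survive URI parsing.
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


def airflow_conn_env_var(conn_id: str) -> str:
    """Name of the environment variable Airflow checks for a given connection id."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"
```

For example, setting `AIRFLOW_CONN_POSTGRES_CHT` to `airflow_postgres_uri()` would expose the database under the hypothetical connection id `postgres_cht`.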