- Python version: 3.9.7
- Code formatter: autopep8
- Spark: 3.1.2
- Hadoop: 3.2.0
- Scala: 2.12
- py4j: 0.10.9
- Docker Desktop: https://www.docker.com/products/docker-desktop/
  - You may have to go to Preferences > Resources to increase memory; > 6 GB is recommended.
- Conda: https://docs.conda.io/
- Git clone and `cd 5003-project`.
- Duplicate `.env.example` and rename it to `.env`; update the credentials inside if needed. (Tip: if you can't find the file, try opening the folder with an IDE.)
- Update `KAFKA_CONNECTION_STRING`, `KAFKA_TOPIC_NAME` and `ENV` in `.env` accordingly (a sample `.env` is sketched after this list).
  - If `ENV` is not set or is `dev`, the ingestor will send messages to the local dockerized Kafka broker.
  - [Optional] To send messages to a cloud endpoint (Azure Event Hubs), simply update `KAFKA_CONNECTION_STRING` and set `ENV` to `prod`.
- Run `docker compose pull`.
- Run `docker compose up` to start the services.
- Run `docker compose down` to stop the services.
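For reference, a minimal `.env` might look like the sketch below. The variable names come from the steps above; the values are illustrative placeholders (the broker address and topic name in particular are assumptions, not project defaults):

```
# Placeholder values only -- copy .env.example and adjust.
# Local broker address in dev; in prod, use the Event Hubs connection string.
KAFKA_CONNECTION_STRING=localhost:9092
# Hypothetical topic name.
KAFKA_TOPIC_NAME=events
# "dev" (or unset) targets the local broker; "prod" targets Event Hubs.
ENV=dev
```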
- Create the conda environment with packages: `conda env create -f environment.yml`
- Activate the conda environment: `conda activate 5003-project`
- Export the conda package list: `conda env export --no-builds --from-history > environment.yml`
- `cd` to the project root.
- API
  - Run `uvicorn src.backend_api.app.main:app --reload --env-file=".env" --app-dir="src/backend_api/app"`
  - Access the docs at http://127.0.0.1:8000/latest/docs (a quick smoke test is sketched after this list).
- Notebook
  - Run `jupyter-lab --config=/jupyter_lab_config.py`
  - Access at http://127.0.0.1:9999/
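Once the API is running, a quick way to confirm it responds is to fetch the docs page. A minimal sketch using only the standard library (the port and path come from the URL above):

```python
# Fetch the API docs page from the locally running uvicorn server.
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8000/latest/docs") as resp:
    print(resp.status)  # expected: 200 once the API is up
```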
- Running locally: `docker compose up`
  - API docs: http://localhost:80/latest/docs
  - Notebook: http://localhost:8888/lab?token=5003-project
  - TimescaleDB: `localhost:5432`
  - Spark master node (PySpark endpoint): `localhost:7077` (see the PySpark sketch after this list)
  - Spark master node (Web UI): http://localhost:8080/
  - Grafana: http://localhost:3000/
- Running locally with a rebuild: `docker compose up --build`
- Interactive shell for debugging: `docker compose up -d && docker compose run backend-api sh`
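For the Spark master endpoint above, a minimal PySpark sketch for submitting work from the host. It assumes the `pyspark` in your conda environment matches the cluster's Spark 3.1.2; the app name is arbitrary:

```python
# Connect a local PySpark session to the dockerized Spark master.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://localhost:7077")    # Spark master endpoint from the list above
    .appName("5003-project-smoke-test")  # arbitrary app name
    .getOrCreate()
)

# Run a trivial job to confirm the cluster accepts work.
print(spark.range(10).count())  # expected: 10
spark.stop()
```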
Example notebooks can be found in the `notebook` directory.
- Config at `setup.cfg`
- Token: 5003-project
- DB Name: 5003-project-dev
- Username: postgres
- Password: 5003-project
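A minimal sketch for connecting to the dockerized TimescaleDB with the defaults above, assuming `psycopg2` is installed in the environment:

```python
# Connect to TimescaleDB (Postgres-compatible) using the defaults listed above.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="5003-project-dev",
    user="postgres",
    password="5003-project",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```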
- Grafana provisioning: https://grafana.com/tutorials/provision-dashboards-and-data-sources/
- Question: TimescaleDB keeps complaining about `WARNING: could not open statistics file "pg_stat_tmp/global.stat": Operation not permitted`
- Answer: This is a known problem documented on Postgres' official Docker Hub page. In short, it does not affect operation and can be safely ignored.