Load data into Vercel using GitHub Actions #161
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
name: ETL to neon | ||
|
||
# Workflow triggers | ||
on: | ||
schedule: | ||
- cron: "0 2 * * 0" # Runs at 2am UTC every Sunday | ||
workflow_dispatch: # Allows manual triggering of the workflow | ||
|
||
jobs: | ||
neon-etl: | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout repository | ||
uses: actions/checkout@v4 | ||
|
||
- name: Set up Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: "3.12" | ||
|
||
- name: Install dependencies | ||
run: | | ||
pip install -r requirements.txt | ||
- name: Get Run ID of Most Recent Successful Run | ||
id: get_run_id | ||
run: | | ||
response=$(curl -s -H "Authorization: token ${{ secrets.GH_PAT }}" \ | ||
"https://api.github.com/repos/sfbrigade/datasci-earthquake/actions/workflows/env_vars.yml/runs?status=completed&conclusion=success") | ||
run_id=$(echo $response | jq '.workflow_runs[0].id') | ||
echo "Run ID: $run_id" | ||
echo "run_id=$run_id" >> $GITHUB_ENV | ||
- name: Download .env Artifact | ||
uses: actions/download-artifact@v4 | ||
with: | ||
name: env-file | ||
github-token: ${{ secrets.GH_PAT }} | ||
repository: sfbrigade/datasci-earthquake | ||
run-id: ${{ env.run_id }} | ||
|
||
- name: ETL data to Neon DB | ||
run: | | ||
python -m backend.etl.tsunami_data_handler | ||
python -m backend.etl.soft_story_properties_data_handler | ||
python -m backend.etl.liquefaction_data_handler |
Review comment: Added a
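The "Get Run ID of Most Recent Successful Run" step queries the GitHub REST API with curl and pulls the run id out of the JSON with jq. A rough Python equivalent of that step, for readers following along in the backend code, might look like the sketch below (assumptions: the requests library is available, the token is exposed as a GH_PAT environment variable, and the function name is illustrative):

```python
# Sketch of the curl + jq step above; not part of the workflow itself.
import os

import requests

RUNS_URL = (
    "https://api.github.com/repos/sfbrigade/datasci-earthquake"
    "/actions/workflows/env_vars.yml/runs"
)


def latest_successful_run_id() -> int:
    """Return the id of the most recent completed, successful run of env_vars.yml."""
    response = requests.get(
        RUNS_URL,
        headers={"Authorization": f"token {os.environ['GH_PAT']}"},
        params={"status": "completed", "conclusion": "success"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["workflow_runs"][0]["id"]
```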
Change to the database engine setup (@@ -3,7 +3,7 @@):

```diff
 from backend.api.config import settings

 # Set up the database engine using settings
-engine = create_engine(settings.database_url_sqlalchemy, echo=True)
+engine = create_engine(settings.neon_url, echo=True)

 # Create a session factory
 SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
```

Review comment: Are there consequences to changing this script to connect to the Neon DB as opposed to the local DB? As in, if I want to test my local DB, do I need to change this line back to connect to the local DB?

Reply: Yes, you'd have to change this line back to test locally. If this is a problem, I can add an if-clause that would switch the URL based on the environment.

Reply: OK, I think it's fine for now. I would probably prefer putting more data in the test DB.
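For illustration, the if-clause suggested in the reply above could look roughly like this (a sketch only; the `environment` setting is hypothetical and not part of this PR, which keeps the hard-coded Neon URL):

```python
# Sketch: choose the connection URL based on a hypothetical `environment`
# setting instead of hard-coding the Neon URL. Not part of this PR.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from backend.api.config import settings

# Use the Neon URL in production and the local URL everywhere else.
database_url = (
    settings.neon_url
    if getattr(settings, "environment", "local") == "production"
    else settings.database_url_sqlalchemy
)

engine = create_engine(database_url, echo=True)

# Create a session factory bound to whichever engine was selected
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
```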
Review comment: The duplicate check relied on the auto-increment PK, which prevented duplicates from being detected. Replaced the auto-increment key with a composite PK.
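For context, a composite primary key in SQLAlchemy looks roughly like the sketch below. The model and column names are hypothetical (the PR's real models live in the backend package); the point is that the key is derived from the data itself, so re-running the ETL cannot insert the same record under a fresh auto-increment id.

```python
# Hypothetical model illustrating a composite primary key; not the PR's actual code.
from sqlalchemy import Column, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class SoftStoryProperty(Base):
    __tablename__ = "soft_story_properties"

    # The (block, lot) pair identifies a record, so loading the same source
    # row twice violates the primary key instead of creating a duplicate.
    block = Column(String, primary_key=True)
    lot = Column(String, primary_key=True)
    address = Column(String)
```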
Review comment: Got a mypy error:
Review comment: While in development, the workflow is only triggered once a week, to save on data transfer costs. The free limit of data transfer is 5 GB/month.

Reply: Is this a Neon or GitHub Actions limit? Just curious.

Reply: This is a Neon limit.