Move to cloud composer 2; better handle long-running tasks #36

Merged
jmelot merged 21 commits into master from 35-cloud-composer-2-and-sensors
Mar 1, 2024

Conversation

jmelot
Member

@jmelot jmelot commented Jan 12, 2024

This PR splits the long-running simhash and merged-ID generation task into multiple tasks and uses sensors to reduce issues with lost connections.

It also adds linting and moves the bucket to Cloud Composer 2.

Closes #35
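
For readers unfamiliar with the sensor pattern mentioned above, here is a minimal sketch in plain Python, not the code from this PR. It assumes the long-running job can be started without waiting on it and that completion can be checked cheaply (for example, by looking for a marker file). The names `wait_for`, `poke`, and `FakeJob` are hypothetical and exist only for illustration; in Airflow, a sensor task plays the role of `wait_for`.

```python
import time
from typing import Callable


def wait_for(
    poke: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `poke` until it returns True or `timeout` seconds have elapsed.

    Each check is short-lived, so no single connection has to stay open
    for the full duration of the long-running job.
    """
    deadline = clock() + timeout
    while True:
        if poke():
            return True
        if clock() >= deadline:
            return False
        sleep(poke_interval)


class FakeJob:
    """Hypothetical stand-in for a remote job that finishes after N checks."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done

    def is_done(self) -> bool:
        self.remaining -= 1
        return self.remaining <= 0


if __name__ == "__main__":
    job = FakeJob(polls_until_done=3)
    finished = wait_for(job.is_done, poke_interval=0.1, timeout=5.0)
    print("job finished" if finished else "timed out")
```

The design choice here is that the step that starts the work and the step that waits for it are separate, so a dropped connection during the wait only costs one retry of a cheap check rather than the whole job.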

@jmelot jmelot force-pushed the 35-cloud-composer-2-and-sensors branch from 66d1e5c to eee4ab6 Compare January 12, 2024 13:35

github-actions bot commented Jan 12, 2024

No need for rebasing 👍
behind_count is 0
ahead_count is 21

github-actions bot commented Jan 22, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|---|---|---|---|---|
| 300 | 260 | 87% | 0% | 🟢 |

New Files

| File | Coverage | Status |
|---|---|---|
| tests/test_make_unlink_rows.py | 100% | 🟢 |
| utils/make_unlink_rows.py | 48% | 🟢 |
| TOTAL | 74% | 🟢 |

Modified Files

| File | Coverage | Status |
|---|---|---|
| tests/test_clean_corpus.py | 100% | 🟢 |
| tests/test_create_merge_ids.py | 98% | 🟢 |
| utils/clean_corpus.py | 77% | 🟢 |
| utils/create_merge_ids.py | 92% | 🟢 |
| TOTAL | 92% | 🟢 |

updated for commit: 05a076c by action🐍

@jmelot jmelot force-pushed the 35-cloud-composer-2-and-sensors branch from 7d39f07 to b3d2590 Compare January 31, 2024 16:21
@jmelot jmelot requested a review from rggelles January 31, 2024 16:22
@jmelot jmelot requested review from niharikasingh and removed request for rggelles February 14, 2024 15:06
linkage_dag.py Outdated

push_to_gcs = BashOperator(
    task_id="push_to_gcs",
    bash_command=f'gcloud compute ssh jm3312@{gce_resource_id} --zone {gce_zone} --command "run_ids_script.sh &"',

Where is the run_ids_script.sh file? I couldn't find this one.

Member Author

It's in utils!

Member Author

@jmelot jmelot Mar 1, 2024

Oh wait, I think you're right and this was a leftover task that wasn't doing anything. Fixed, thanks!!

@jmelot jmelot merged commit a743d50 into master Mar 1, 2024
3 checks passed
@jmelot jmelot deleted the 35-cloud-composer-2-and-sensors branch March 1, 2024 18:25
Development

Successfully merging this pull request may close these issues.

Improve handling of long-running tasks
2 participants