New Celery app, remove kubernetes / flask, update tests to iRODS 4.3.3 #267

Merged: 8 commits, Sep 25, 2024

Collaborator

@alanking alanking commented Sep 11, 2024

Addresses #211
Addresses #262
Addresses #269
Addresses #272
Addresses #274

Some notes on this PR...

  1. Main test suite is passing.
  2. I have not tested S3 yet.
  3. The existing Celery application and many of its constellation of files have been removed.

I don't think we need any new tests because the goal is for the new implementation to be identical in behavior to the old implementation, apart from the new Celery application and task names.

Also, I realize this is a big change, so here are a few areas where eyeballs are most needed (IMO):

  • New file and directory names
  • New Celery task names (look for @app.task)
  • Any other re-organization efforts that seem like they would be good to include here

Will leave in draft til I can test with S3.

Member

@trel trel left a comment

please run black if you haven't already.

those imports all look pretty organized, so maybe you already did it...

irods_capability_automated_ingest/irods/filesystem.py (outdated, resolved)
irods_capability_automated_ingest/custom_event_handler.py (outdated, resolved)

@alanking
Collaborator Author

Ah yes. I'll run black just before we # this because it's going to make rebasing very tricky later should I need to squash things.

@alanking
Collaborator Author

Never mind, I found a way to thread the needle. I had to run the formatter on the test suite before the new changes came in, and things worked out pretty cleanly. Force-pushed because this is still in draft. Will test S3 shortly...

@trel
Member

trel commented Sep 12, 2024

nice.

@alanking
Collaborator Author

Had to make a tweak and then was able to successfully sync an S3 bucket with Minio in the demo Docker Compose project. So, I guess this is ready for review.

@FifthPotato FifthPotato left a comment

First set of review comments, mostly on one file. I'll post more on the rest later, since this is a pretty hefty PR.

Collaborator Author

@alanking alanking left a comment

I'll consider making the suggested changes in this PR. The primary goal of this PR is to restructure the project under a new Celery app and make the tasks a little more straightforward, so the existing implementation was retained as much as possible. That said, the suggested changes address valid concerns that should be fixed.

@alanking
Collaborator Author

Caught an issue in the latest changes... Tests are passing again.

@FifthPotato FifthPotato left a comment

Alright, that should be the rest of them. Just a couple more small things!

Collaborator Author

@alanking alanking left a comment

Thanks, @FifthPotato

@alanking
Collaborator Author

Tests are passing with latest changes

@alanking
Collaborator Author

Rebased and squashed. Didn't mean to push, though. Sorry about that, reviewers...

@trel
Member

trel commented Sep 19, 2024

very shady.

@FifthPotato FifthPotato left a comment

Looks good to me!

Contributor

@korydraughn korydraughn left a comment

I noticed a few TODO comments in here. Let's get some issue numbers attached to them and get this merged if everything's working.

irods_capability_automated_ingest/tasks/filesystem_sync.py (outdated, resolved)

@korydraughn
Contributor

We got an approval from @FifthPotato.

Do you feel this is ready for squashing/merging? Is there anything else left to do for this PR?

@alanking
Collaborator Author

I think this is ready. The goal was to keep the behavior unchanged as much as possible and the tests are passing, so I think adding anything else in this PR is unnecessary if we're all good with the existing changes.

@korydraughn
Contributor

Sounds good to me.

Please squash to taste.

@alanking
Collaborator Author

Squashed

Contributor

@korydraughn korydraughn left a comment

Pound and merge it.

The kubernetes directory and flask app are not part of the main
Automated Ingest framework or the sync script. They introduce
unnecessary dependencies and clutter the codebase, so they have
been removed.

Updates the ingest-test Docker Compose project to install iRODS 4.3.3
on the iRODS catalog service provider service.

Also runs the black formatter on the test suite Python file.

This commit creates an alternative implementation of the ingest Celery
app. The most relevant changes to the project structure are described
below.

The Celery application is now housed in celery.py which allows the Celery
application run by the workers to refer simply to the name of the Python
module irods_capability_automated_ingest instead of requiring
irods_capability_automated_ingest.sync_task. The Celery application includes
the tasks defined in the tasks subdirectory.

The tasks subdirectory includes a file for syncing a filesystem and a file
for syncing an S3 bucket. There is currently a lot of duplicate code between
the two, but separating these is an important step for adding more types of
storage for syncing.
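
As an illustration only, the sketch below shows one way shared transfer logic could sit beside storage-specific readers. Nothing here is taken from the PR: `upload`, `read_chunks`, `open_local_file`, and `make_object_opener` are hypothetical names, the in-memory dictionary stands in for an S3 bucket, and a plain writable stream stands in for an iRODS data object.

```python
import io
from typing import BinaryIO, Callable, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def upload(open_source: Callable[[str], BinaryIO], path: str,
           target: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Shared logic: copy `path` into `target`, returning the bytes written.

    `open_source` is the storage-specific part (hypothetical interface).
    """
    total = 0
    with open_source(path) as stream:
        for chunk in read_chunks(stream, chunk_size):
            target.write(chunk)
            total += len(chunk)
    return total


def open_local_file(path: str) -> BinaryIO:
    """Storage-specific reader for a local filesystem."""
    return open(path, "rb")


def make_object_opener(objects: dict) -> Callable[[str], BinaryIO]:
    """Storage-specific reader for an in-memory stand-in for an object store."""
    def open_object(key: str) -> BinaryIO:
        return io.BytesIO(objects[key])
    return open_object
```

With this shape, adding another storage type would mean writing one more opener function while `upload` stays unchanged. Whether the project moves in this direction is not stated in the PR.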

In order to remove circular dependencies, an irods subdirectory has also
been added to provide the functionality for interactions with the Python
iRODS Client (PRC). The Celery tasks call into these functions just as they
always have, and storage-specific implementations are still needed for things
like reading data and performing parallel transfers.

This commit updates README instructions, tests, examples and the irods_sync
script to use the new Celery application and tasks.

This commit removes the historical implementation of the Automated
Ingest framework, including the Celery application, tasks, and sync
machinery. A new Celery application has been created in its place.

This commit makes some changes to irods_session to prevent empty
strings in environment variables and other potential issues caused
by invalid inputs.
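
The kind of guard described for irods_session can be sketched as follows. This is an assumption about the general technique, not the code in the commit: `clean_env` and `port_from_env` are hypothetical helpers, and the `EXAMPLE_IRODS_*` variable names in the usage note are made up.

```python
import os


def clean_env(name, default=None, environ=None):
    """Return an environment variable, treating unset, empty, and blank alike."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    return value if value else default


def port_from_env(name, default, environ=None):
    """Read a TCP port from the environment, rejecting invalid input early."""
    raw = clean_env(name, environ=environ)
    if raw is None:
        return default
    try:
        port = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"{name} must be between 1 and 65535, got {port}")
    return port
```

For example, `clean_env("EXAMPLE_IRODS_HOST", "localhost")` falls back to `"localhost"` when the variable is set to an empty string, and `port_from_env("EXAMPLE_IRODS_PORT", 1247)` raises a `ValueError` for a value such as `"abc"` instead of passing it along to the session.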
@alanking
Collaborator Author

#'d, mergin

@alanking alanking merged commit bd88378 into irods:main Sep 25, 2024
@alanking alanking deleted the new-celery-app branch September 25, 2024 19:48