
Add a helper to get the object store ids and the datasets size for every dataset in a job #125

Merged · 5 commits · Jan 25, 2024

Conversation

sanjaysrikakulam (Contributor):

This will be used in the ESG WP4 TPV meta-scheduling API for decision-making. If you think this is not useful for a larger audience, please close this PR.

nuwang (Member) left a comment:

This looks useful in a general sense, so it's a good idea to include it here I think.

Two review threads on tpv/core/helpers.py (outdated, resolved)
sanjaysrikakulam and others added 2 commits January 24, 2024 14:54
Co-authored-by: Nuwan Goonasekera <[email protected]>
Co-authored-by: Nuwan Goonasekera <[email protected]>
nuwang (Member) left a comment:

Thanks for the changes. Can we also add a test for this please?

Review thread on tpv/core/helpers.py (outdated, resolved)
Co-authored-by: Nuwan Goonasekera <[email protected]>
sanjaysrikakulam (Contributor, Author) commented Jan 24, 2024:

Thanks for the changes. Can we also add a test for this please?

Could you please point me to where the test should go? Should I create a new file in tests, or add it to an existing file?

A sample test would look like this:

First, a minor update to the Dataset class in mock_galaxy:

class Dataset:
    counter = 0

    def __init__(self, file_name, file_size, object_store_id=None):
        # Use the class-level counter so each dataset gets a unique id
        self.id = Dataset.counter
        Dataset.counter += 1
        self.file_name = file_name
        self.file_size = file_size
        self.object_store_id = object_store_id

    def get_size(self, calculate_size=False):
        return self.file_size

Test:

def test_get_dataset_attributes():
    """Test that the helper returns a dictionary with the correct attributes"""
    mock_job = mock_galaxy.Job()
    mock_job.add_input_dataset(
        mock_galaxy.DatasetAssociation(
            "test",
            mock_galaxy.Dataset("test.txt", file_size=7 * 1024**3, object_store_id="files1")
        )
    )
    dataset_attributes = get_dataset_attributes(mock_job.input_datasets)
    expected_result = {0: {'object_store_id': 'files1', 'size': 7 * 1024**3}}

    assert dataset_attributes == expected_result

nuwang (Member) commented Jan 24, 2024:

Most helpers have been tested by exercising them via an actual YAML file. For example, the tool_version_gte helper in this:

- if: helpers.tool_version_gte(tool, '42')

However, since this is needlessly verbose, perhaps we could just write the unit test you wrote above in a new file named test_helpers in the tests folder?

sanjaysrikakulam (Contributor, Author):

Yeah, sounds like a plan. I will create the file and push a new commit shortly.

nuwang (Member) left a comment:

Looks great!

nuwang (Member) left a comment:

One thing I noticed while reading this: #123

"Deferred datasets need to be excluded". Should some logic+test be added for that?

sanjaysrikakulam (Contributor, Author):

Thank you for pointing out that issue. I'm not sure what deferred dataset job objects look like. However, if the input datasets are empty, we already return an empty dict, so this should be fine, right? That silently handles these cases (and probably many others we are unaware of).

nuwang (Member) commented Jan 25, 2024:

I think you're right. A deferred dataset may return an empty object_store_id, but that's fine I guess; it shouldn't break anything. If necessary, we can always do something in a follow-up.
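
If exclusion ever becomes necessary, a rough sketch of such a follow-up, assuming datasets expose a state attribute with a 'deferred' value (not verified against the Galaxy model):

def get_dataset_attributes(datasets):
    # Hypothetical follow-up: skip datasets that are not materialised yet.
    attributes = {}
    for da in datasets or []:
        dataset = da.dataset
        if getattr(dataset, "state", None) == "deferred":
            continue  # deferred datasets have no local data to size
        attributes[dataset.id] = {
            'object_store_id': dataset.object_store_id,
            'size': dataset.get_size(),
        }
    return attributes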
