
Add a helper to get the object store ids and the datasets size for every dataset in a job #125

Merged · 5 commits · Jan 25, 2024

Conversation

sanjaysrikakulam (Contributor):

This will be used in the ESG WP4 TPV meta-scheduling API for decision-making. If you think this is not useful for a larger audience, please close this PR.

nuwang (Member) left a comment:

This looks useful in a general sense, so it's a good idea to include it here I think.

Two review threads on tpv/core/helpers.py (outdated, resolved)
sanjaysrikakulam and others added 2 commits January 24, 2024 14:54
Co-authored-by: Nuwan Goonasekera <[email protected]>
Co-authored-by: Nuwan Goonasekera <[email protected]>
nuwang (Member) left a comment:

Thanks for the changes. Can we also add a test for this please?

Review thread on tpv/core/helpers.py (outdated, resolved)
Co-authored-by: Nuwan Goonasekera <[email protected]>
sanjaysrikakulam (Contributor, Author) commented Jan 24, 2024:

Thanks for the changes. Can we also add a test for this please?

Could you please point me to where the test should go? Should I create a new file in tests, or add it to an existing file?

A sample test would look like this:

First, a minor update to the Dataset class in mock_galaxy:

class Dataset:
    counter = 0

    def __init__(self, file_name, file_size, object_store_id=None):
        # Use the class-level counter so each dataset gets a unique id
        self.id = Dataset.counter
        Dataset.counter += 1
        self.file_name = file_name
        self.file_size = file_size
        self.object_store_id = object_store_id

    def get_size(self, calculate_size=False):
        return self.file_size

Test:

def test_get_dataset_attributes():
    """Test that the helper returns a dictionary with the correct attributes"""
    mock_job = mock_galaxy.Job()
    mock_job.add_input_dataset(
        mock_galaxy.DatasetAssociation(
            "test",
            mock_galaxy.Dataset("test.txt", file_size=7 * 1024**3, object_store_id="files1")
        )
    )
    dataset_attributes = get_dataset_attributes(mock_job.input_datasets)
    expected_result = {0: {'object_store_id': 'files1', 'size': 7 * 1024**3}}

    assert dataset_attributes == expected_result

nuwang (Member) commented Jan 24, 2024:

Most helpers have been tested by exercising them via an actual YAML file. For example, the tool_version_gte helper in this:

- if: helpers.tool_version_gte(tool, '42')

However, since this is needlessly verbose, perhaps we could just write the unit test you wrote above in a new file named test_helpers in the tests folder?

sanjaysrikakulam (Contributor, Author):

Yeah, sounds like a plan. I will create the file and push a new commit shortly.

nuwang (Member) left a comment:

Looks great!

nuwang (Member) left a comment:

One thing I noticed while reading this: #123

"Deferred datasets need to be excluded". Should some logic+test be added for that?

sanjaysrikakulam (Contributor, Author):

Thank you for pointing out that issue. I'm not sure what deferred dataset job objects look like. However, if the input datasets are empty, we already return an empty dict, so this should be fine, right? That silently handles these cases (and probably many others we are unaware of).

nuwang (Member) commented Jan 25, 2024:

I think you're right. A deferred dataset may return an empty object_store_id, but that's fine I guess; it shouldn't break anything. If necessary, we can always do something in a follow-up.
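
If exclusion ever becomes necessary, a rough sketch of such a follow-up, assuming datasets expose a state attribute with a 'deferred' value (not verified against the Galaxy model):

def get_dataset_attributes(datasets):
    # Hypothetical follow-up: skip datasets that are not materialised yet.
    attributes = {}
    for da in datasets or []:
        dataset = da.dataset
        if getattr(dataset, "state", None) == "deferred":
            continue  # deferred datasets have no local data to size
        attributes[dataset.id] = {
            'object_store_id': dataset.object_store_id,
            'size': dataset.get_size(),
        }
    return attributes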
