Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion bruker data #148

Merged
merged 31 commits into from
Sep 5, 2023
Merged

Conversation

ypriverol
Copy link
Member

@ypriverol ypriverol commented Sep 4, 2023

@ypriverol ypriverol linked an issue Sep 4, 2023 that may be closed by this pull request
@ypriverol ypriverol linked an issue Sep 4, 2023 that may be closed by this pull request
@ypriverol ypriverol linked an issue Sep 4, 2023 that may be closed by this pull request
@jpfeuffer
Copy link
Collaborator

Can you add a "--version" command so we dont have to hardcode the version in workflows everytime and everywhere?
Would be awesome

@ypriverol
Copy link
Member Author

parse_sdrf.py --version give the version of the tool.

@ypriverol ypriverol mentioned this pull request Sep 4, 2023
@ypriverol ypriverol linked an issue Sep 4, 2023 that may be closed by this pull request
@jpfeuffer
Copy link
Collaborator

How can you specify that both raw and d are converted to mzml?

@ypriverol
Copy link
Member Author

You mean having a conversion where you have more than one file type? Right now you will only have one file type to another, this is allowed: d:mzML or raw:mzML, d:d, etc.

@jpfeuffer
Copy link
Collaborator

what if you have multiple file types in the initial design?

@ypriverol
Copy link
Member Author

We don't allow that in quantms or even OpenMS. I guess is a really future use case.

@jpfeuffer
Copy link
Collaborator

It is allowed in quantms.
I ran pipelines with raw and mzml before.

@jpfeuffer
Copy link
Collaborator

If you only allow one type of raw file, why a mapping at all?
Then you just need an option "out_type".

@ypriverol
Copy link
Member Author

Ok, what will be the expected behaviour, something like:

raw:mzml
d:d

Which mean convert the raw to mzml and keep the d as d?

@jpfeuffer
Copy link
Collaborator

Yes exactly.

@ypriverol
Copy link
Member Author

What should happen if the user gives contradictory changes, like d:d and d:raw ? Error?

@ypriverol
Copy link
Member Author

I will put the following behavior: --extension_convert 'raw:mzml, d:mzml' if multiple conversion options are provided for the same filetype: raw:mzml, raw:raw the tool will through an error.

@ypriverol
Copy link
Member Author

@jpfeuffer the following commit allows multiple file conversions: 9c304a7

@ypriverol ypriverol removed the request for review from fabianegli September 5, 2023 14:03
@ypriverol ypriverol merged commit 03d8e8b into bigbio:master Sep 5, 2023
Copy link
Collaborator

@fabianegli fabianegli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. Most comments can be safely ignored, but the tests should all use asserts.

(and please excuse my post merge review...)

@@ -12,17 +12,28 @@ def test_validate_srdf():
runner = CliRunner()
result = runner.invoke(cli, ["validate-sdrf", "--sdrf_file", "testdata/PXD000288.sdrf.tsv", "--check_ms"])

print(result.output)
assert "ERROR" not in result.output
print("validate sdrf " + result.output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to not use an assert statement here instead of the print? Just reading this test, I don't know what would count as a test failure. Or success. There might be an

assert "ERROR" not in result.output

missing?

Comment on lines +4 to +14
def test_search_mods_by_accession():
unimod = UnimodDatabase()
ptm = unimod.get_by_accession("UNIMOD:21")
print(ptm.get_name())


def test_search_mods_by_keyword():
unimod = UnimodDatabase()
ptms = unimod.search_mods_by_keyword("Phospho")
for ptm in ptms:
print(ptm.to_str())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, tests should have an assert. Without they still require human interpretation and the idea is to have the test automated and as little as possible human interpretation/interaction.

@@ -37,7 +37,7 @@ def verify_content(pname, pvalue, ptype):
# exit("ERROR: " + pname + " needs to be a numeric value!!")
elif ptype == "class":
not_matching = [x for x in pvalue.split(",") if x not in p["value"]]
if not_matching != []:
if len(not_matching) != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: This could be just if not_matching:

@@ -98,7 +98,7 @@ def add_ptms(mods, pname, mod_columns):
modname = tmod[0]
modpos = tmod[1]
found = [x for x in unimod.modifications if modname == x.get_name()]
if found == []:
if len(found) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: I would probably have gone with if not found.

@@ -35,6 +41,15 @@ def get_name(self):
def get_accession(self):
return self._ontology_term.get_accession()

def get_delta_mono_mass(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to make them properties with the "@Property" decorator. That way the "get_" prefix could be left out and the attributes would still have some protection against overwriting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants