Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pharokka wrapper #5130

Merged
merged 14 commits into from
Mar 1, 2023
Merged

Pharokka wrapper #5130

merged 14 commits into from
Mar 1, 2023

Conversation

paulzierep
Copy link
Contributor

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

Had initial problems with testing the data table. But the maintainer provided a tiny version of the DB and the tests passed. Tests are only written for the main output.

<!-- check file size since output is non-deterministic -->
<output name="pharokka_gbk">
<assert_contents>
<has_size value="353875" delta="300" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_size is the worst test you can do as it is very unspecific, you could use other more specific asserts if you like

#else:
echo "use cache" &&
mkdir pharokka_db &&
tar -xvf "$reference_source.db_cached.fields.path" --strip 1 -C pharokka_db &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


## run tool
#if str( $terminase.terminase_selector ) == "no_terminase":
pharokka.py -i $fasta -o pharokka_output -d pharokka_db -t 8 $gene_predictor $meta -e $evalue &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single-quote all data params and text params

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is -t 8 the number of cores? if so please use GALAXY_SLOTS

## create output
zip -r out.zip pharokka_db &&
cp out.zip "$archive_output" &&
cp pharokka_output/pharokka.gbk "$pharokka_gbk" &&
Copy link
Member

@bgruening bgruening Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of copy, you can use from_work_dir= in the output section

</command>
<inputs>
<!-- the genome -->
<param type="data" name="fasta" format="data" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a title here and maybe some help to assist the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With title, do you mean using a label tag ?

</param>
</when>
<when value="history">
<param name="db_histroy" type="data" format="data" label="Use the folloing pharokka DB" help="You can upload a pharokka DB as tar.gz to the history and use it as DB" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format is wrong, or needs to be specified

* improved tests
* added archive test
* single quotes changed
* zip as test data
* improved tests
* GALAXY_SLOTS
* using from_work_dir=
Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please see if you can add a bio.tools ID

rapid standardised annotation tool for bacteriophage genomes and metagenomes
</description>
<requirements>
<requirement type='package' version='1.2.0'>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


## run tool
#if str( $terminase.terminase_selector ) == 'no_terminase':
pharokka.py -i $fasta -o pharokka_output -d pharokka_db -t \${GALAXY_SLOTS:-8} $gene_predictor $meta -e $evalue &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all paths and text parameter needs to be single-quoted

]]>
</help>
<citations>
<citation type='bibtex'>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use type=doi here

</param>
</when>
<when value='history'>
<param name='db_histroy' type='data' format='zip' label='Use the folloing pharokka DB' help='You can upload a pharokka DB as zip to the history and use it as DB' />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do users get such a DB?

</conditional>
</inputs>
<outputs>
<data name='archive_output' format='zip' from_work_dir='out.zip' label='${tool.name} on ${on_string}: zip of the complete output' />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this output file optional, so that the user needs to select an option to output it ... I don't think its so useful by default.

* optional zip output
* DB source
* single-quotes in cheetah
* citation doi
* macros and tokens
* bio tools ID
<![CDATA[
pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes.

If you are looking for rapid standardised annotation of bacterial genomes, please use prokka, which inspired the creation of pharokka, or bakta.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bakta should be used today, so I guess its save to recommend bakta.
This help could be enhanced a bit and more information be given?

</option>
</param>
<param name="meta" type="boolean" checked="false" truevalue="--meta" falsevalue="" label="meta mode for metavirome input samples" />
<param name="evalue" type="integer" value="100000" label="E-value threshold for mmseqs2 PHROGs database search. Defaults to 1E-05." />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use min and max value whenever you can for floats and ints. the value looks for an evalue strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it, but the values go from 1e-20 to 10, however the bar does not really allow choosing something like 1e-10 ... I think it would be a good feature if a log scale could be used...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But max 10 you can add correct?

## create output
#if $zip_output == 'true':
zip -r out.zip pharokka_output
#else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else should not be needed

<command detect_errors="exit_code">
<![CDATA[
## run tool
#if str( $terminase.terminase_selector ) == 'no_terminase':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there a else case missing for run_terminase?

long_description: |
pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes.
If you are looking for rapid standardised annotation of bacterial genomes, please use prokka,
which inspired the creation of pharokka, or bakta. Repository-Maintainer: Paul Zierep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get credits by adding tags to the tools ... see https://docs.galaxyproject.org/en/latest/dev/schema.html

And add yourself as maintainer via the codeowner file in this repo.

@@ -0,0 +1,154 @@
<tool id="pharokka" name="bacteriophage annotation" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.7" profile="@PROFILE@">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the python_template_version its not needed.

* help
* credits
* min/max parameter
* else in code
]]>
</help>
<citations>
<citation type="bibtex">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was addressed, github for some reason does not show it here.

@bgruening bgruening merged commit b5b9d62 into galaxyproject:main Mar 1, 2023
@bgruening
Copy link
Member

Thanks @paulzierep

@gxydevbot
Copy link
Collaborator

Attention: deployment skipped!

https://github.com/galaxyproject/tools-iuc/actions/runs/4308502357

@bernt-matthias
Copy link
Contributor

Due to test failures. Will restart .. lets see

@gxydevbot
Copy link
Collaborator

Attention: deployment skipped!

https://github.com/galaxyproject/tools-iuc/actions/runs/4308502357

@bernt-matthias
Copy link
Contributor

Still fails with exit code 1

@bernt-matthias
Copy link
Contributor

Traceback (most recent call last):
  File "/usr/local/bin/pharokka.py", line 5, in <module>
    import processes
  File "/usr/local/bin/processes.py", line 8, in <module>
    from BCBio import GFF
  File "/usr/local/lib/python3.10/site-packages/BCBio/GFF/__init__.py", line 3, in <module>
    from BCBio.GFF.GFFParser import GFFParser, DiscoGFFParser, GFFExaminer, parse, parse_simple
  File "/usr/local/lib/python3.10/site-packages/BCBio/GFF/GFFParser.py", line 34, in <module>
    from Bio.Seq import UnknownSeq
ImportError: cannot import name 'UnknownSeq' from 'Bio.Seq' (/usr/local/lib/python3.10/site-packages/Bio/Seq.py)

https://github.com/biopython/biopython/blob/dcf52bd4546410e1a60d39fbcd4c0041ec1e6fe6/DEPRECATED.rst#biosequnknownseq

So it seems that in biopython bcbio-gff should pin biopython to <=1.79 .. then we can bump the multipackage container and we are fine.

Could you do this @paulzierep? .. maybe also ask the pharokka developers why they use an extra gff module .. biopython itself should have a gff parser

@abretaud
Copy link
Contributor

abretaud commented Mar 2, 2023

For the record, the error in bcbio-gff is tracked there (there's a pending PR)

@paulzierep
Copy link
Contributor Author

I guess that can be solved by updating to v1.2.1 gbouras13/pharokka#238

@bernt-matthias
Copy link
Contributor

I guess that can be solved by updating to v1.2.1

Think so. Could you prepare a PR for the IUC repo?

Note, I also added the biopython pin to bcbio-gff: bioconda/bioconda-recipes#39703

@bernt-matthias
Copy link
Contributor

Ahh. you already did :)

Still wondering why our CI did not catch this problem before merging ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants