Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pandas from gsaid api #3393

Open
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

henrikstranneheim
Copy link
Contributor

@henrikstranneheim henrikstranneheim commented Jul 4, 2024

Description

Added

Changed

  • Use Python naitive CSV and Pydantic instead of Pandas
  • Remove Pandas from GISAID API

Fixed

How to prepare for test

  • Ssh to relevant server (depending on type of change)
  • Use stage: us
  • Paxa the environment: paxa
  • Install on stage (example for Hasta):
    bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_cg -t cg -b remove-pandas-from-gsaid-api -a

How to test

  • Do ...

Expected test outcome

  • Check that ...
  • Take a screenshot and attach or copy/paste the output.

Review

  • Tests executed by
  • "Merge and deploy" approved by
    Thanks for filling in who performed the code review and the test!

This version is a

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

  • Document in ...
  • Deploy this branch on ...
  • Inform to ...

@henrikstranneheim henrikstranneheim self-assigned this Jul 4, 2024
@henrikstranneheim henrikstranneheim requested a review from a team as a code owner July 4, 2024 07:48
Copy link

sonarqubecloud bot commented Jul 4, 2024

@henrikstranneheim henrikstranneheim mentioned this pull request Jul 15, 2024
3 tasks
)
reports: list[dict] = [report.model_dump() for report in sars_cov_complementary_reports]
write_csv_from_dict(
content=reports, fieldnames=HEADERS, file_path=Path(complementary_report.full_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What headers are required here and does it does it have to be in Swedish? @Clinical-Genomics/prodbioinfo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the headers as is, this format was not defined by us iirc

@henrikstranneheim
Copy link
Contributor Author

@karlnyr This one is quite hard to test. Do you have any feedback on a proper way to test this?

@henrikstranneheim
Copy link
Contributor Author

@Clinical-Genomics/prodbioinfo How does an invalid gisaid log entry look like? I.e.

log.get("code") == "validation_error" and "existing_ids" in log.get("msg"):

I cannot find one in stage.

@henrikstranneheim
Copy link
Contributor Author

When already uploaded, 👍

henrik.stranneheim@hasta:~] [S_base] $ cg upload gisaid fastbluejay
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
Called undefined __fields__ on HousekeeperAPI, please wrap
----------------- UPLOAD -----------------
----------------- GISAID UPLOAD -------------------
All samples already uploaded
Upload to GISAID successful

@henrikstranneheim
Copy link
Contributor Author

Hard to test this more since one needs to make an actual upload to GISAID.

Would be great with a review @Clinical-Genomics/sysdev @Clinical-Genomics/prodbioinfo ans suggestions how to further test this are very welcome.

@henrikstranneheim
Copy link
Contributor Author

Would be great with a review @Clinical-Genomics/sysdev @Clinical-Genomics/prodbioinfo ans suggestions how to further test this are very welcome.

Copy link

Copy link
Contributor

@diitaz93 diitaz93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactoring 🚀

return gisaid_accession_count == sample_number_count

def upload_to_gisaid(self, case_id: str) -> None:
"""Uploading results to GISAID and saving the accession numbers in complementary file."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Uploading results to GISAID and saving the accession numbers in complementary file."""
"""Upload results to GISAID and save the accession numbers in complementary file."""

Comment on lines +10 to +11
sample_number: str = Field(str, alias="provnummer")
selection_criteria: str = Field(str, alias="urvalskriterium")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the str in the default position might cause a bug

Comment on lines +237 to +242
complementary_report_content_raw: list[dict] = self.get_complementary_report_content_raw(
case_id
)
sars_cov_complementary_reports: list[GisaidComplementaryReport] = (
self.parse_and_get_sars_cov_complementary_reports(complementary_report_content_raw)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same logic is done in the upload_to_gisaid function. Wouldn't it be better if this function got as a parameter the sars_cov_complementary_reports instead of calculating it again?

LOG.info("GISAID log exists in case bundle in Housekeeper")
def create_and_include_gisaid_log_file_to_hk(self, case_id: str) -> None:
"""Create and include GISAID log file to a Housekeeper."""
if _ := self.housekeeper_api.get_files(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this walrus operator is needed here if the value is not going to be used

Suggested change
if _ := self.housekeeper_api.get_files(
if self.housekeeper_api.get_files(

def test_validate_gisaid_complementary_reports(
gisaid_api: GisaidAPI, gisaid_complementary_report_raw: dict[str, str]
):
# GIVEN a dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# GIVEN a dict
# GIVEN a valid GISAID complementary report as a dict

def test_add_gisaid_accession_to_reports(
gisaid_complementary_reports: list[GisaidComplementaryReport], gisaid_api: GisaidAPI
):
"""Test adding gisaid accession to the reports."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we stick to using capitals for GISAID?

Suggested change
"""Test adding gisaid accession to the reports."""
"""Test adding GISAID accession to the reports."""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants