vcf_collect fusion split error #468

Closed
berguner opened this issue Feb 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@berguner

berguner commented Feb 7, 2024

Description of the bug

Hi,
For some samples, the pipeline (version 3.0.1) fails with the error below:

Caused by:
  Essential container in task exited

Command executed:

  vcf_collect.py --fusioninspector RNA2956_3.FusionInspector.fusions.abridged.tsv.annotated.coding_effect --fusionreport RNA2956_3_fusionreport_index.html --fusioninspector_gtf RNA2956_3.tsv --fusionreport_csv RNA2956_3.fusions.csv --hgnc hgnc_complete_set.txt --sample RNA2956_3 --out RNA2956_3_fusion_data.vcf
  gzip RNA2956_3_fusion_data.vcf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT":
      python: $(python --version | sed 's/Python //g')
      HGNC DB retrieval: $(cat HGNC-DB-timestamp.txt)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "//nextflow-bin/vcf_collect.py", line 505, in <module>
      sys.exit(main())
               ^^^^^^
    File "//nextflow-bin/vcf_collect.py", line 493, in main
      vcf_collect(
    File "//nextflow-bin/vcf_collect.py", line 42, in vcf_collect
      .join(read_build_fusionreport(fusionreport_in_file), how="outer", on="FUSION")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "//nextflow-bin/vcf_collect.py", line 340, in read_build_fusionreport
      fusion_report[["GeneA", "GeneB"]] = fusion_report["FUSION"].str.split("--", expand=True)
      ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3966, in __setitem__
      self._setitem_array(key, value)
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 4008, in _setitem_array
      check_key_length(self.columns, key, value)
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 401, in check_key_length
      raise ValueError("Columns must be same length as key")
  ValueError: Columns must be same length as key

It looks like one of the input files contains a fusion name that yields more or fewer than two genes when split on "--". I checked the fusions.csv and .fusionreport.tsv files, but that wasn't the case. I am baffled as to why this error occurs.
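The mechanism can be reproduced outside the pipeline. The following is a minimal sketch, not code from vcf_collect.py: `find_malformed_fusions` and the sample data are hypothetical, and it assumes the fusion names live in a column called FUSION, as in the traceback. pandas raises this ValueError whenever `str.split(..., expand=True)` returns a number of columns other than the two being assigned, so counting the separators per row points at the offending names. Judging by the command line, `read_build_fusionreport` presumably receives the file passed via `--fusionreport` (the index.html), so that file may be worth checking as well.

```python
import re

import pandas as pd


def find_malformed_fusions(fusion_report, column="FUSION", sep="--"):
    """Return rows whose fusion name does not contain exactly one separator."""
    # astype(str) turns missing values into "nan", which has no separator
    # and is therefore reported as malformed too.
    counts = fusion_report[column].astype(str).str.count(re.escape(sep))
    return fusion_report[counts != 1]


# Hypothetical data: the third name has two separators, the fourth has none.
report = pd.DataFrame({"FUSION": ["BCR--ABL1", "EML4--ALK", "A--B--C", "NOSEP"]})

# Reproduce the failure: the split yields three columns for a two-column key.
try:
    report[["GeneA", "GeneB"]] = report["FUSION"].str.split("--", expand=True)
    error_message = None
except ValueError as err:
    error_message = str(err)

bad_rows = find_malformed_fusions(report)
print(error_message)
print(bad_rows["FUSION"].tolist())
```

Running the helper on the frame that `read_build_fusionreport` builds, just before the split, would show whether any name deviates from the GeneA--GeneB pattern.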

I am also attaching the nextflow.log file:
vcf_collect_error.nextflow.log

Command used and terminal output

No response

Relevant files

No response

System information

No response

@berguner berguner added the bug Something isn't working label Feb 7, 2024
@rannick
Collaborator

rannick commented Feb 7, 2024

I just merged a few fixes for vcf_collect into dev. Could you try that branch and report the behaviour, please?

@berguner
Author

berguner commented Feb 7, 2024

I tried the dev branch, but now I get a different error:

Command executed:

  vcf_collect.py --fusioninspector RNA2600_5.FusionInspector.fusions.abridged.tsv.annotated.coding_effect --fusionreport RNA2600_5_fusionreport_index.html --fusioninspector_gtf RNA2600_5.tsv --fusionreport_csv RNA2600_5.fusions.csv --hgnc hgnc_complete_set.txt --sample RNA2600_5 --out RNA2600_5_fusion_data.vcf
  gzip RNA2600_5_fusion_data.vcf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNAFUSION:RNAFUSION:FUSIONINSPECTOR_WORKFLOW:VCF_COLLECT":
      python: $(python --version | sed 's/Python //g')
      HGNC DB retrieval: $(cat HGNC-DB-timestamp.txt)
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "//nextflow-bin/vcf_collect.py", line 528, in <module>
      sys.exit(main())
               ^^^^^^
    File "//nextflow-bin/vcf_collect.py", line 516, in main
      vcf_collect(
    File "//nextflow-bin/vcf_collect.py", line 42, in vcf_collect
      .join(read_build_fusionreport(fusionreport_in_file), how="outer", on="FUSION")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "//nextflow-bin/vcf_collect.py", line 339, in read_build_fusionreport
      fusion_report = pd.DataFrame.from_dict({k: [v] for k, v in expression.items()})
                                                                 ^^^^^^^^^^^^^^^^
  AttributeError: 'tuple' object has no attribute 'items'
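
The thread does not say why `expression` is a tuple at this point. In Python generally, this error appears when a dict ends up wrapped in a tuple, for example through a stray trailing comma or a function that returns several values. The sketch below only illustrates that general pattern; it is an assumption, not a diagnosis of the dev branch, and `to_single_row` and the field values are hypothetical.

```python
def to_single_row(expression):
    """Wrap each value of a mapping in a list, as the failing line does."""
    if not isinstance(expression, dict):
        # Fail early with a message that names the unexpected type.
        raise TypeError(
            f"expected a dict of fusion fields, got {type(expression).__name__}"
        )
    return {key: [value] for key, value in expression.items()}


fields = {"FUSION": "BCR--ABL1", "Databases": "demo"}  # hypothetical values

# A trailing comma (or a function returning several values) yields a tuple.
wrapped = (fields,)

# Same comprehension as in the traceback, applied to the tuple.
try:
    {k: [v] for k, v in wrapped.items()}
    attribute_error = None
except AttributeError as err:
    attribute_error = str(err)

row = to_single_row(fields)
print(attribute_error)
print(row)
```

An explicit type check like this one turns the failure into a message that names the unexpected type, which can make the upstream cause easier to trace.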

@rannick rannick mentioned this issue Apr 3, 2024
@rannick
Collaborator

rannick commented Apr 3, 2024

This should be solved by #481.

@rannick rannick closed this as completed Apr 3, 2024