Errors when running the pipeline #51

Open
calocascio opened this issue Aug 9, 2024 · 25 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@calocascio
Collaborator

When running the pipeline, I get methylation classification results in the GUI, but there are several errors in the terminal. CNV data is missing, and I cannot generate a report from the GUI. I have attached the log file (NB19-509.log). Please let me know if you need any additional information.

Environment details

Ubuntu 20.04.6

Additional context

The sequence data is from a PromethION 48 machine. The lab followed the ligation protocol, and we have not run adaptive sampling.

I ran the pipeline like this:
robin --threads 4 -r /data/GRCh38.p14.genome.fa -w /path/to/bam_pass /path/to/output
Then I opened http://10.54.216.13:8081 and clicked "Live data".

@mattloose
Contributor

Sorry for the delay in getting back to you.

I can see the issue: ROBIN currently expects a reference with chromosome names in the form chr1, chr2, etc. Your reference uses names such as GL000008.2.

You could try using this one:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

or this:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz

I am thinking of bundling a reference with the tool going forward. The main issue is that your reads will need to be mapped to a reference that uses the chr1, chr2, ... naming convention.
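To check which naming convention a reference uses before aligning, the sequence names can be read from the first column of its `.fai` index. This is a minimal sketch, not part of ROBIN; the helper names and the `chr`-prefix pattern are assumptions based on the convention described above.

```python
import re

# Assumed pattern for UCSC-style names: "chr" followed by word characters
# (matches chr1, chrX, chr1_KI270706v1_random; does not match GL000008.2).
UCSC_STYLE = re.compile(r"^chr\w+$")


def sequence_names(fai_text):
    """Return the sequence names listed in .fai index text (first tab-separated column)."""
    return [line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()]


def non_ucsc_names(names):
    """Return the names that do not follow the chr1/chr2/... naming convention."""
    return [name for name in names if not UCSC_STYLE.match(name)]
```

For example, `non_ucsc_names(sequence_names(open("/data/GRCh38.p14.genome.fa.fai").read()))` would list any scaffold names such as `GL000008.2`.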

mattloose added the documentation and enhancement labels on Aug 22, 2024
@calocascio
Collaborator Author

Hi again,

Thank you for your answer! We tried a new run with the reference genome from the second link you suggested. This time a report was generated, and I get copy number variation data for chr1-22 and chrX, but I think the haplotype chromosomes and unplaced/unlocalized contigs are creating error messages. I have attached the new log file: IPD0241.log

Thanks again for your support.

@mattloose
Contributor

I'll have a look at this and get back to you asap.

If you got the report, that is excellent!

@mattloose
Contributor

Hi, can I just check: are you aligning the data to the same reference you are using for ROBIN, or are the data aligned to the complete reference?

@calocascio
Collaborator Author

Hi! We used hg38.p14.fa both for alignment and for running ROBIN. Sorry, I wasn't aware the patches were not complete. I do see that, for example, "chr1_GL383518v1_alt" is present in hg38.p14.fa. Which version do you recommend using for alignment?
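If you would rather keep using hg38.p14.fa, one option is to write a copy that contains only the primary chromosomes and realign the reads to it. The sketch below streams a FASTA (plain or gzipped) and drops every other record. The set of names kept (`PRIMARY`) is an assumption about what ROBIN needs, and the result is not identical to the no-alt analysis set linked earlier, which differs in other ways.

```python
import gzip

# Assumption: the sequences to keep are the primary assembly chromosomes.
PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def filter_fasta(lines, keep=PRIMARY):
    """Yield FASTA lines, dropping every record whose name (first word of its header) is not in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line


def write_primary_only(src, dst, keep=PRIMARY):
    """Stream `src` (plain or .gz FASTA) to an uncompressed `dst`, keeping only records in `keep`."""
    opener = gzip.open if str(src).endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(filter_fasta(fin, keep))
```

For example, `write_primary_only("hg38.p14.fa.gz", "hg38.primary.fa")`, followed by indexing (e.g. `samtools faidx`) and realigning the reads to the new file.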

@calocascio
Collaborator Author

Hi again,
Sorry to add to the issue, but we tried the same thing again with another sample just to see, and this time the report could not be generated. The relevant errors are towards the bottom of this log: IPD1119.log

It seems the image size is too large. I'm not sure whether this is related to the reference genome issue or something else.

Just let me know if you need any other information!

@mattloose
Contributor

ValueError: Image size of 521657841x958 pixels is too large. It must be less than 2^16 in each direction.
2024-09-12 11:25:32,404 - nicegui - ERROR - Image size of -1257769606x701 pixels is too large. It must be less than 2^16 in each direction.

Yes! That is a big image.
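Both messages come from Matplotlib's Agg renderer, which refuses to draw an image that is 2^16 (65,536) pixels or more in either direction. A figure's pixel size is roughly its size in inches multiplied by its dpi, so a width of 521,657,841 pixels suggests the plot width was derived from something far larger than intended. The negative width in the second message is consistent with a 32-bit signed integer wrapping around; for example, a width of 3,037,197,690 pixels would wrap to -1,257,769,606. The sketch below shows those calculations and a hypothetical guard, `clamp_figsize`, that a plotting routine could apply; it is an illustration, not ROBIN's code.

```python
# Matplotlib's Agg renderer rejects images with width or height >= 2**16 pixels.
AGG_LIMIT = 2 ** 16


def pixel_size(width_in, height_in, dpi):
    """Approximate pixel dimensions of a figure of the given size in inches at `dpi`."""
    return int(width_in * dpi), int(height_in * dpi)


def clamp_figsize(width_in, height_in, dpi):
    """Hypothetical guard: shrink (width, height) in inches so neither side reaches AGG_LIMIT pixels."""
    max_in = (AGG_LIMIT - 1) / dpi
    return min(width_in, max_in), min(height_in, max_in)


def as_int32(n):
    """Wrap an integer the way a signed 32-bit C int overflows."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31
```

Clamping only hides the symptom; the oversized width itself still needs explaining.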

Can I check which version of ROBIN you are currently running? If you run robin --version, what do you get?

@mattloose
Contributor

Actually, if you are on any version earlier than 0.1.0, you should update. That might not fix the problem, but it would be worth checking!
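Comparing versions by their numeric parts is enough for the check described above. This sketch keeps only the leading dotted numbers, so a letter suffix such as the `b` in `0.1.0b` is ignored. `importlib.metadata` can read an installed version directly, but the distribution name `robin` used here is an assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(text):
    """Return the first dotted run of numbers in `text` as a tuple, e.g. '0.1.0b' -> (0, 1, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def needs_update(installed, minimum="0.1.0"):
    """True if the installed release is older than `minimum`; letter suffixes are ignored."""
    return release_tuple(installed) < release_tuple(minimum)


def installed_version(dist_name="robin"):
    """Installed version of `dist_name`, or None if it is not installed ('robin' is an assumed name)."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

For example, `needs_update("0.0.6")` is `True`, while `needs_update("0.1.0")` is `False`.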

@calocascio
Collaborator Author

Ah! I was using 0.0.6. I have now re-cloned the repository and have version 0.1.0. I tried running the same command as before, but now I get this error:
Error: invalid value for '--bed_file' / '-b': File '/path/to/bed/file' does not exist.

After running robin --help, I see that many more options are now marked "[required]". Most should be fine, but what do I do if I don't have a BED file (we have not run adaptive sampling)?

By the way, there may have been a problem with the index file for the reference genome I was using. I have now recreated the .fai index file, so we can see whether that helps :)
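A stale or mismatched `.fai` can be checked by comparing the names and lengths it lists with the FASTA itself. This sketch reads the whole FASTA, so it is slow on a full genome, and it does not verify the byte offsets stored in the index; regenerating the index with `samtools faidx` is the usual fix.

```python
def fasta_lengths(lines):
    """Return [(name, length), ...] for each FASTA record, in file order."""
    records = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            records.append([fields[0] if fields else "", 0])
        elif records:
            records[-1][1] += len(line.strip())
    return [tuple(record) for record in records]


def fai_lengths(lines):
    """Return [(name, length), ...] from the first two tab-separated columns of a .fai index."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    return [(row[0], int(row[1])) for row in rows if len(row) >= 2]


def index_matches(fasta_lines, fai_lines):
    """True if the index lists the same sequences as the FASTA, in the same order and with the same lengths."""
    return fasta_lengths(fasta_lines) == fai_lengths(fai_lines)
```

For example: `with open(ref) as fa, open(ref + ".fai") as fai: print(index_matches(fa, fai))`, where `ref` is the path to an uncompressed reference.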

@mattloose
Contributor

Ah, that's a new feature.

You can pass it the path to this file, which is included in the downloaded repository:

src/robin/resources/panel_11092024_5kb_pad.bed

It won't affect anything you do, but ROBIN assumes that these were the targets for some of the analysis steps.
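To see what the panel BED file covers (and confirm it uses chr-style names), its intervals can be grouped by chromosome and merged. A minimal sketch, assuming only the standard first three BED columns (chrom, 0-based start, end) and making no assumption about any further columns in `panel_11092024_5kb_pad.bed`.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, skipping blank, comment, track and browser lines."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def covered_bases(intervals):
    """Total bases covered by half-open [start, end) intervals, counting overlaps only once."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

For example: `with open("src/robin/resources/panel_11092024_5kb_pad.bed") as fh: regions = read_bed(fh)`, then `{c: covered_bases(iv) for c, iv in regions.items()}`.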

@calocascio
Collaborator Author

Thank you - that's great! I ran it again like this:
robin --threads 4 -r /path/to/ref -b /path/to/bed --centreID "IPD1119" --basecall_config "guppy" --experiment_duration 72 -w /path/to/bam /path/to/output

I think I should have put "dorado" instead of "guppy", though. I didn't get any errors until I tried to create a report (log file: IPD1119.log). I have also attached screenshots of the output in case you spot anything there: screenshots.zip

Please let me know if you see anything that looks wrong, and if you have any more thoughts about the correct reference genome to use when creating the BAM files.

@mattloose
Contributor

Hi, yes, you should have used dorado (though at the moment this makes no difference).

I will try to recreate the report generation error here and see if I can solve it.

@mattloose
Contributor

Hi - I'm struggling to reproduce the error on the current version of the code.

Could you check that when you type:

robin --version

you see something like:

0.1.0

Assuming that is the case, could you look in the output folder, which should be /path/to/output/?

In this folder you should see a subfolder that corresponds to the data set you are analysing. Within that folder there should be two files called CNV.npy and CNV_dict.npy.

These files contain no sequence data, but do contain a description of the copy number profile. I think it is these files that are causing the issue. Would you be able to share those with me? Then I might be able to track down what is causing the problem.
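Before sharing the files, it may help to see which contigs they contain. This sketch assumes `CNV_dict.npy` holds a Python dict keyed by chromosome name, saved with `numpy.save`; that layout is a guess, not something confirmed in this thread. Keys other than chr1-chr22, chrX and chrY (alt, random or chrUn contigs, for example) would point to the multi-contig problem.

```python
import re

# Assumption: primary chromosomes are chr1..chr22, chrX and chrY.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def load_npy_dict(path):
    """Load a dict saved with numpy.save; numpy is imported here so the helper below works without it."""
    import numpy as np

    return np.load(path, allow_pickle=True).item()


def non_primary_keys(cnv_dict):
    """Return the keys that are not primary chromosomes, e.g. alt or unplaced contigs."""
    return sorted(key for key in cnv_dict if not PRIMARY.match(str(key)))
```

For example, `non_primary_keys(load_npy_dict("/path/to/output/<sample>/CNV_dict.npy"))`, where `<sample>` stands for the data set subfolder.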

Thanks.

Matt

@calocascio
Collaborator Author

Hi,
Yes, I am now using version 0.1.0. Sure – here are the two files from this run:
IPD1119_CNV.zip

Thank you!

@mattloose
Contributor

Brilliant - this is very helpful. I've been able to track down at least part of the issue.

Could you try installing the version of robin on the branch I've just pushed.

https://github.com/LooseLab/ROBIN/tree/fix/reporting_multi_chrom

This should enable you to generate a report. If it works I will merge into the main branch.

@calocascio
Collaborator Author

Thanks, that's great! I tried it out (on another sample), and for some reason I can create a report after it has been running for a little while, but not after it has finished. The report is 158 pages and too big to attach here, so I'm only adding the first five pages of the report I got. Note that the results changed after this point: once all of the BAM files had been processed, the estimated coverage was about double, and I got results for nanoDx, for instance. Here is the report: IPD0737-DXX-P01-F08_run_report_short.pdf. I also saved all of the terminal output, in case that is helpful: IPD0737_stdout.log.

@mattloose
Contributor

Did you reinstall from the report branch above?

It appears that you may not have, as in your report I can see this:

[screenshot from the report]

This shows that you have reads aligned to alt and unplaced contigs, which ROBIN is trying to plot. The version of the code on the branch above should now ignore those (though perhaps it doesn't!).

If your BAM files are aligned to a reference that does not include alt and unplaced contigs, this shouldn't be happening.
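One way to check whether BAM files contain reads on alt or unplaced contigs is `samtools idxstats`, which prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads) for an indexed BAM, plus a final `*` line for unplaced reads. The sketch below parses that output and reports non-primary contigs with mapped reads; the primary-chromosome pattern is an assumption.

```python
import re

# Assumption: the contigs that should be plotted are chr1-chr22, chrX and chrY.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def mapped_off_primary(idxstats_text):
    """From `samtools idxstats` output, return {contig: mapped reads} for non-primary contigs with reads."""
    off_primary = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue  # skip blank lines and the unplaced-reads summary row
        name, mapped = fields[0], int(fields[2])
        if mapped and not PRIMARY.match(name):
            off_primary[name] = mapped
    return off_primary
```

For example, with a hypothetical indexed file `reads.bam`: `mapped_off_primary(subprocess.run(["samtools", "idxstats", "reads.bam"], capture_output=True, text=True, check=True).stdout)`.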

Can you confirm the reference you are using? And also try installing from the reporting_multi_chrom branch as above?

@calocascio
Collaborator Author

I did the following: git fetch origin, git checkout fix/reporting_multi_chrom, git submodule update --init --recursive. It looks like I am in the right branch when I run git branch, but let me know if I should have used different code. I am using this reference genome: , both for alignment and when running ROBIN, so maybe I need to use a different one?

@calocascio
Collaborator Author

Hmm, looks like the link didn't work. This one: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz

@mattloose
Contributor

Hi,

Can you run:

git checkout fix/reporting_multi_chrom
git pull
pip install -e .

Then run:

robin --version

And hopefully it will show

0.1.0b

If so then you have updated.

Then, if you restart ROBIN, you should be able to browse to the previous run folder and generate the report from the existing analysis, without the multiple alignments showing (I hope!).

@calocascio
Collaborator Author

OK, it now shows version 0.1.0b. Fantastic, that seems to have worked! (See the CNV screenshot: CNV - All Chromosomes.) However, I'm sorry, but now a new error is coming up (log file: IPD0737_run2.log) that stops the report from being generated. Thank you for all of your help so far!

@mattloose
Contributor

Hey - we are making progress!

This is good news :-)

I'll have a look at what is causing the date error. That one is odd and I haven't seen it before. I'll get back to you asap.

@mattloose
Contributor

In the folder with all the results, you should have a number of files whose names end in _scores.csv.

There could be three or four of these. Would you be able to share them with me?
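Collecting those files can be scripted. This sketch zips every file ending in `_scores.csv` under a results folder (including subfolders), keeping their relative paths; the folder layout and the output name `scores_output.zip` are only examples.

```python
import zipfile
from pathlib import Path


def zip_score_files(results_dir, zip_path):
    """Zip every *_scores.csv under results_dir and return their paths relative to it."""
    results_dir = Path(results_dir)
    files = sorted(results_dir.rglob("*_scores.csv"))  # list before the zip file is created
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.relative_to(results_dir))
    return [str(path.relative_to(results_dir)) for path in files]
```

For example, `zip_score_files("/path/to/output", "scores_output.zip")`.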

Thanks.

@calocascio
Collaborator Author

Great! :) Yes, here are the four files, zipped:
scores_output.zip

@mattloose
Contributor

Please could you try the latest version on the main branch (should be version 0.1.3). This may well have resolved some of the reporting issues you were seeing.

Projects
None yet
Development

No branches or pull requests

2 participants