Errors when running the pipeline #51

calocascio · 2024-08-09T10:44:44Z

When running the pipeline, I get methylation classification results in the GUI, but there are several errors in the terminal. CNV data is missing, and I cannot generate a report from the GUI. I have attached the log file (NB19-509.log). Please let me know if you need any additional information.

Environment details

Ubuntu 20.04.6

Additional context

The sequence data is from a PromethION 48 machine. The lab followed the ligation protocol, and we have not run adaptive sampling.

I ran the pipeline like this:
robin --threads 4 -r /data/GRCh38.p14.genome.fa -w /path/to/bam_pass /path/to/output
And I opened the URL http://10.54.216.13:8081 and pressed Live data.


mattloose · 2024-08-22T11:27:54Z

Sorry for the delay in getting back to you.

I can see the issue - currently ROBIN expects a reference with chromosome names such as chr1, chr2, etc. Your reference uses names like GL000008.2.

You could try using this one:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

or this:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz

I am thinking of bundling a reference with the tool going forwards. The main issue is that your reads will need to be mapped to a reference using the chr1 etc. nomenclature.
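
One way to check the naming before a run is to read the contig names from the reference's .fai index and split them by the chr1-style pattern. This is a minimal sketch, not part of ROBIN: the helper names `contig_names` and `split_contigs` are hypothetical, and the pattern assumes the UCSC-style primary names chr1 to chr22, chrX, chrY and chrM.

```python
import re
from pathlib import Path

# Assumed UCSC-style primary names: chr1..chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def contig_names(fai_path):
    """Return contig names from a samtools-style .fai index (first tab-separated column)."""
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():
            names.append(line.split("\t")[0])
    return names


def split_contigs(names):
    """Split names into (primary, other) according to the chr-style pattern."""
    primary = [n for n in names if PRIMARY.fullmatch(n)]
    other = [n for n in names if not PRIMARY.fullmatch(n)]
    return primary, other


# Example with names mentioned in this thread:
primary, other = split_contigs(["chr1", "GL000008.2", "chr1_GL383518v1_alt", "chrX"])
print(primary)  # ['chr1', 'chrX']
print(other)    # ['GL000008.2', 'chr1_GL383518v1_alt']
```

If `primary` comes back empty for a reference, its names probably do not follow the chr1 convention described above.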

calocascio · 2024-09-09T09:38:31Z

Hi again,

Thank you for your answer! We tried a new run with the reference genome from the second link you suggested. It worked to generate a report, and I get copy number variation data for chr 1-22 + chr X, but I think the haplotype chromosomes and unplaced/unlocalized contigs are creating error messages. I attach the new log file: IPD0241.log

Thanks again for your support.

mattloose · 2024-09-09T09:53:25Z

I'll have a look at this and get back to you asap.

If you got the report that is excellent!

mattloose · 2024-09-10T12:20:51Z

Hi - can I just check - are you aligning the data to the same reference you are using for ROBIN or are the data aligned to the complete reference?

calocascio · 2024-09-11T07:24:26Z

Hi! We used the hg38.p14.fa for both alignment and when running ROBIN. Sorry, I wasn't aware the patches were not complete. I do see that for example "chr1_GL383518v1_alt" is present in hg38.p14.fa. Which version do you recommend using for alignment?

calocascio · 2024-09-17T10:56:24Z

Hi again,
Sorry to add to the issue, but we tried the same again with another sample just to see, and this time it did not work to generate a report. The errors pertaining to that are towards the bottom of this error log: IPD1119.log

It seems the image size is too large. I'm not sure if this is related to the issue with the reference genome or if it's unrelated?

Just let me know if you need any other information!

mattloose · 2024-09-17T12:09:51Z

ValueError: Image size of 521657841x958 pixels is too large. It must be less than 2^16 in each direction.
2024-09-12 11:25:32,404 - nicegui - ERROR - Image size of -1257769606x701 pixels is too large. It must be less than 2^16 in each direction.

Yes! That is a big image.
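
For context, the message appears to come from Matplotlib, where the rendered size in pixels is the figure size in inches multiplied by the DPI. The sketch below only illustrates that arithmetic against the 2^16 limit quoted in the error; the helper names and the example dimensions are hypothetical and are not taken from ROBIN's plotting code.

```python
MAX_PIXELS = 2 ** 16  # limit quoted in the error message (65536)


def pixel_size(width_in, height_in, dpi):
    """Rendered size in pixels for a figure given in inches at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def fits_limit(width_in, height_in, dpi):
    """True if both dimensions are positive and below the 2^16 limit."""
    width_px, height_px = pixel_size(width_in, height_in, dpi)
    return 0 < width_px < MAX_PIXELS and 0 < height_px < MAX_PIXELS


print(pixel_size(8, 6, 100))    # (800, 600)
print(fits_limit(8, 6, 100))    # True
print(fits_limit(700, 6, 100))  # False: 70000 pixels wide
```

A width in the hundreds of millions of pixels, as in the log above, suggests the figure width is being scaled by something much larger than intended.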

Can I check which version of ROBIN you are currently running? If you do robin --version what do you get?

mattloose · 2024-09-17T12:19:39Z

Actually - if you are on any version less than 0.1.0 then you could do with updating. That might not fix the problem, but it would be worth checking!
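
As a rough illustration of the comparison, the sketch below compares the numeric part of a version string against 0.1.0. The helpers `version_tuple` and `needs_update` are hypothetical; they ignore any suffix after the three numbers and assume the version is printed as `major.minor.patch`.

```python
import re


def version_tuple(text):
    """Parse the leading 'major.minor.patch' numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(installed, minimum="0.1.0"):
    """True if the installed version is numerically below the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


print(needs_update("0.0.6"))  # True
print(needs_update("0.1.0"))  # False
```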

calocascio · 2024-09-18T13:10:14Z

Ah! I was using 0.0.6. I have now recloned it and have version 0.1.0. I tried running the same as before, but now I get this error:
"Error: invalid value for '--bed_file' / '-b': File '/path/to/bed/file' does not exist."

I see now after running robin --help, that there are now many more options where it says "[required]". Most should be fine, but what do I do if I don't have a bed file (we have not run adaptive sampling)?

By the way, there may have been a problem with the index file for the reference genome I was using. I have now recreated the fasta.fai file, so we can see if that helps after this :)

mattloose · 2024-09-18T13:17:43Z

Ah - that's a new feature.

You can pass it the link to this file which will be in the downloaded repository:

src/robin/resources/panel_11092024_5kb_pad.bed

It won't affect anything you do, but ROBIN assumes that these were the targets for some of the analysis steps.
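
Since the error came from a path that did not exist, a small pre-flight check on the input files can catch this before launching. This is only a sketch: `missing_inputs` is a hypothetical helper, and treating the `.fai` index next to the reference as required is an assumption based on the index problem mentioned earlier in this thread.

```python
from pathlib import Path


def missing_inputs(reference, bed_file):
    """Return the required input paths that do not exist as files."""
    reference = Path(reference)
    required = [
        reference,
        Path(str(reference) + ".fai"),  # assumed: index as written by samtools faidx
        Path(bed_file),
    ]
    return [str(path) for path in required if not path.is_file()]


# Placeholder paths, as used elsewhere in this thread:
problems = missing_inputs("/path/to/ref", "/path/to/bed")
if problems:
    print("Missing before starting robin:", ", ".join(problems))
```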

calocascio · 2024-09-23T11:37:40Z

Thank you - that's great! I ran it again like this:
robin --threads 4 -r /path/to/ref -b /path/to/bed --centreID "IPD1119" --basecall_config "guppy" --experiment_duration 72 -w /path/to/bam /path/to/output

Although I think I should have put "dorado" instead of "guppy". However, I didn't get any errors until I tried to create a report (log file: IPD1119.log). I also attach screenshots of the output in case you see anything there: screenshots.zip

Please let me know if you see anything that looks wrong, and if you have any more thoughts about the correct reference genome to use when creating the BAM files.

mattloose · 2024-09-23T12:54:55Z

Hi - yes you should have used dorado (though at the moment this will not matter at all).

I will try and recreate the report generation error here and see if I can solve it.

mattloose · 2024-09-23T19:21:59Z

Hi - I'm struggling to reproduce the error on the current version of the code.

Could you check that when you type:

robin --version

you see something like:

0.1.0

Assuming that is the case, could you look in the output folder which should be in /path/to/output/

In this folder you should see a subfolder that corresponds to the data set you are analysing. Within that folder should be two files called CNV.npy and CNV_dict.npy

These files contain no sequence data, but do contain a description of the copy number profile. I think it is these files that are causing the issue. Would you be able to share those with me? Then I might be able to track down what is causing the problem.

Thanks.

Matt

calocascio · 2024-09-24T11:38:35Z

Hi,
Yes, I am now using version 0.1.0. Sure – here are the two files from this run:
IPD1119_CNV.zip

Thank you!

mattloose · 2024-09-24T13:56:46Z

Brilliant - this is very helpful. I've been able to track down at least part of the issue.

Could you try installing the version of robin on the branch I've just pushed.

https://github.com/LooseLab/ROBIN/tree/fix/reporting_multi_chrom

This should enable you to generate a report. If it works I will merge into the main branch.

calocascio · 2024-09-26T09:44:49Z

Thanks - that's great! I tried it out (another sample), and for some reason I can create a report after it has run a little while, but not after it has finished. Since the report is 158 pages and too big to attach here, I'm only adding the first five pages of the report I got: IPD0737-DXX-P01-F08_run_report_short.pdf. It changed after this though: after it had finished processing all of the BAM files, the estimated coverage was about double, and I got results for nanoDx, for instance. I also saved all of the output to the terminal, in case that is helpful: IPD0737_stdout.log.

mattloose · 2024-09-26T17:39:41Z

Did you reinstall from the report branch above?

It appears that you may not have done as in your report I can see this:

This is showing that you have reads aligned to alts and unplaced contigs which it is trying to plot. The version of the code on the alternate branch above should (and perhaps it doesn't!) ignore those now.

If the bam files that you have are aligned to a reference that does not include alts and unplaced etc. then this shouldn't be happening.

Can you confirm the reference you are using? And also try installing from the reporting_multi_chrom branch as above?
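
One way to confirm whether a BAM file has reads on alts or unplaced contigs is to look at the output of `samtools idxstats`, which prints one tab-separated line per contig (name, length, mapped reads, unmapped reads). The sketch below parses that text; `off_primary_reads` is a hypothetical helper, the primary-name pattern is an assumption, and the numbers in the example are made up for illustration.

```python
import re

# Assumed UCSC-style primary names: chr1..chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def off_primary_reads(idxstats_text):
    """Map each non-primary contig with mapped reads to its mapped-read count."""
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the trailing '*' line for unplaced reads.
        if len(fields) < 4 or fields[0] == "*":
            continue
        name, mapped = fields[0], int(fields[2])
        if mapped > 0 and not PRIMARY.fullmatch(name):
            counts[name] = mapped
    return counts


# Illustrative idxstats-style text (values are invented):
example = (
    "chr1\t248956422\t1200\t0\n"
    "chr1_GL383518v1_alt\t1000\t7\t0\n"
    "chrUn_KI270742v1\t1000\t0\t0\n"
    "*\t0\t0\t35\n"
)
print(off_primary_reads(example))  # {'chr1_GL383518v1_alt': 7}
```

An empty result would mean no reads are mapped outside the primary chromosomes in that file.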

calocascio · 2024-09-27T10:07:42Z

I did the following: git fetch origin, git checkout fix/reporting_multi_chrom, git submodule update --init --recursive. It looks like I am in the right branch when I run git branch, but let me know if I should have used different code. I am using this reference genome: , both for alignment and when running ROBIN, so maybe I need to use a different one?

calocascio · 2024-09-27T10:09:34Z

Hmm, looks like the link didn't work. This one: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz

mattloose · 2024-09-27T10:13:01Z

Hi,

Can you run:

git checkout fix/reporting_multi_chrom
git pull
pip install -e .

Then run:

robin --version

And hopefully it will show

0.1.0b

If so then you have updated.

Then if you restart robin you should be able to browse to the previous run folder and generate the report from the existing analysis but without the multiple alignments showing (I hope!)

calocascio · 2024-09-30T12:16:35Z

Ok, now it is showing version 0.1.0b. Fantastic – that seemed to have worked! (see CNV screenshot). However, I'm sorry, but now there is a new error coming up (log file: IPD0737_run2.log), stopping the report from being generated. Thank you for all of your help so far!

mattloose · 2024-09-30T12:20:16Z

Hey - we are making progress!

This is good news :-)

I'll have a look at what is causing the date error. That one is odd and I haven't seen it before. I'll get back to you asap.

mattloose · 2024-09-30T12:28:37Z

In the folder with all the results you should have a number of files called something followed by _scores.csv.

There could be three or four of these. Would you be able to share them with me?

Thanks.

calocascio · 2024-09-30T12:47:15Z

Great! :) yes - here are the four files zipped:
scores_output.zip

mattloose · 2024-10-14T08:23:42Z

Please could you try the latest version on the main branch (should be version 0.1.3). This may well have resolved some of the reporting issues you were seeing.

mattloose added the documentation and enhancement labels Aug 22, 2024

mattloose added a commit that referenced this issue Sep 24, 2024

Fix to adress reporting problems in issue #51

ee6b38e
