Errors when running the pipeline #51
Comments
Sorry for the delay in getting back to you. I can see the issue: ROBIN currently expects a reference with chromosome names of the form chr1, chr2, etc. Your reference uses names such as GL000008.2. You could try using this one: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz or this: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz I am thinking of bundling a reference with the tool going forward. The main issue is that your reads will need to be mapped to a reference using the chr1 etc. nomenclature. |
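For anyone who wants to check a reference before aligning, here is a minimal sketch (not part of ROBIN) that lists contigs not following the chr1, chr2, ... naming. It assumes a samtools-style .fai index, where the contig name is the first tab-separated column; the function names and the toy index lines are made up for illustration.

```python
def contig_names(fai_lines):
    """Contig names are the first tab-separated column of a .fai index."""
    return [line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()]


def non_chr_contigs(names):
    """Return the names that do not follow the chr1, chr2, ... style."""
    return [name for name in names if not name.startswith("chr")]


# Toy index lines; the lengths and offsets are invented for illustration.
toy_fai = [
    "chr1\t1000\t6\t60\t61\n",
    "chr2\t800\t1030\t60\t61\n",
    "GL000008.2\t500\t1850\t60\t61\n",
]

print(non_chr_contigs(contig_names(toy_fai)))
```

With a real index you could pass an open file handle instead of the toy list, since iterating over a file yields its lines. An empty result only means every name starts with "chr"; alt and unplaced contigs such as chr1_GL383518v1_alt would still pass this check.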
Hi again, Thank you for your answer! We tried a new run with the reference genome from the second link you suggested. It worked to generate a report, and I get copy number variation data for chr 1-22 + chr X, but I think the haplotype chromosomes and unplaced/unlocalized contigs are creating error messages. I attach the new log file: Thanks again for your support. |
I'll have a look at this and get back to you asap. If you got the report that is excellent! |
Hi - can I just check - are you aligning the data to the same reference you are using for ROBIN or are the data aligned to the complete reference? |
Hi! We used the hg38.p14.fa for both alignment and when running ROBIN. Sorry, I wasn't aware the patches were not complete. I do see that for example "chr1_GL383518v1_alt" is present in hg38.p14.fa. Which version do you recommend using for alignment? |
Hi again, It seems the image size is too large. I'm not sure if this is related to the issue with the reference genome or if it's unrelated? Just let me know if you need any other information! |
ValueError: Image size of 521657841x958 pixels is too large. It must be less than 2^16 in each direction.
Yes! That is a big image. Can I check which version of ROBIN you are currently running? If you run robin --version, what do you get? |
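To put the numbers in that error in context, here is a small sketch of the size check it describes. The 2^16 limit is taken from the error message itself; the helper names are hypothetical, and this is not how ROBIN or the plotting library implements the check.

```python
MAX_PIXELS = 2 ** 16  # limit quoted in the error message (65536)


def figure_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a figure given its size in inches and its DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def fits_limit(width_px, height_px, limit=MAX_PIXELS):
    """True if both dimensions are below the per-direction limit."""
    return width_px < limit and height_px < limit


# The dimensions reported in the error: the width is far over the limit.
print(fits_limit(521657841, 958))

# A hypothetical 12 x 6 inch figure at 100 DPI, for comparison.
print(fits_limit(*figure_pixels(12, 6, 100)))
```

The height of 958 pixels is fine; only the width is out of range, which suggests the plot is being stretched along one axis rather than being large overall.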
Actually, if you are on any version earlier than 0.1.0, it would be worth updating. That might not fix the problem, but it is worth checking! |
Ah! I was using 0.0.6. I have now recloned it and have version 0.1.0. I tried running the same as before, but now I get this error: I see now after running robin --help, that there are now many more options where it says "[required]". Most should be fine, but what do I do if I don't have a bed file (we have not run adaptive sampling)? By the way, there may have been a problem with the index file for the reference genome I was using. I have now recreated the fasta.fai file, so we can see if that helps after this :) |
Ah, that's a new feature. You can pass it the path to this file, which is in the downloaded repository: src/robin/resources/panel_11092024_5kb_pad.bed It won't affect anything you do, but ROBIN assumes that these were the targets for some of the analysis steps. |
Thank you - that's great! I ran it again like this: Although I think I should have put "dorado" instead of "guppy". However, I didn't get any errors, until I tried to create a report (log file: Please let me know if you see anything that looks wrong, and if you have any more thoughts about the correct reference genome to use when creating the BAM files. |
Hi - yes you should have used dorado (though at the moment this will not matter at all). I will try and recreate the report generation error here and see if I can solve it. |
Hi - I'm struggling to reproduce the error on the current version of the code. Could you check that when you type:
you see something like:
Assuming that is the case, could you look in the output folder, which should be /path/to/output/? In this folder you should see a subfolder that corresponds to the data set you are analysing. Within that folder should be two files called CNV.npy and CNV_dict.npy. These files contain no sequence data, but do contain a description of the copy number profile. I think it is these files that are causing the issue. Would you be able to share those with me? Then I might be able to track down what is causing the problem. Thanks. Matt |
Hi, Thank you! |
Brilliant - this is very helpful. I've been able to track down at least part of the issue. Could you try installing the version of robin on the branch I've just pushed. https://github.com/LooseLab/ROBIN/tree/fix/reporting_multi_chrom This should enable you to generate a report. If it works I will merge into the main branch. |
Thanks - that's great! I tried it out (another sample), and for some reason I can create a report after it has run a little while, but not after it has finished. Since the report is 158 pages and too big to attach here, I'm only adding the first five pages of the report I got. It changed after this though: after it had finished processing all of the BAM files, the estimated coverage was about double, and I got results for nanoDx, for instance. But here it is: |
Did you reinstall from the report branch above? It appears that you may not have, as in your report I can see this: This is showing that you have reads aligned to alts and unplaced contigs, which it is trying to plot. The version of the code on the alternate branch above should (and perhaps it doesn't!) ignore those now. If the BAM files that you have are aligned to a reference that does not include alts, unplaced contigs, etc., then this shouldn't be happening. Can you confirm the reference you are using? And also try installing from the reporting_multi_chrom branch as above? |
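As an illustration of what "ignoring" alts and unplaced contigs could look like, here is a sketch that keeps only chr1 to chr22, chrX and chrY from a per-contig mapping before plotting. This is an assumption about the general approach, not the code on the branch; the pattern, the function name and the example values are all hypothetical.

```python
import re

# Matches chr1..chr22, chrX and chrY only. chrM, alt contigs and
# unplaced/unlocalized contigs are deliberately excluded in this sketch.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def primary_only(per_contig):
    """Drop every entry whose contig name is not a primary chromosome."""
    return {name: value for name, value in per_contig.items() if PRIMARY.match(name)}


# Made-up read counts keyed by contig name, for illustration only.
counts = {
    "chr1": 10,
    "chr1_GL383518v1_alt": 2,
    "chrUn_example": 1,
    "chrX": 5,
}

print(sorted(primary_only(counts)))
```

Filtering by an anchored pattern rather than by a "chr" prefix matters here, because alt contigs in hg38.p14.fa also start with "chr".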
Hmm, looks like the link didn't work. This one: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/p14/hg38.p14.fa.gz |
Hi, Can you run:
Then run:
And hopefully it will show
If so then you have updated. Then if you restart robin you should be able to browse to the previous run folder and generate the report from the existing analysis but without the multiple alignments showing (I hope!) |
Ok, now it is showing version 0.1.0b. Fantastic – that seemed to have worked! (see CNV screenshot: |
Hey - we are making progress! This is good news :-) I'll have a look at what is causing the date error. That one is odd and I haven't seen it before. I'll get back to you asap. |
In the folder with all the results you should have a number of files named something followed by _scores.csv. There could be three or four of these. Would you be able to share them with me? Thanks. |
Great! :) yes - here are the four files zipped: |
Please could you try the latest version on the main branch (should be version 0.1.3). This may well have resolved some of the reporting issues you were seeing. |
When running the pipeline, I get methylation classification results in the GUI, but there are several errors in the terminal. CNV data is missing, and I cannot generate a report from the GUI. I have attached the log file NB19-509.log. Please let me know if you need any additional information.
Environment details
Ubuntu 20.04.6
Additional context
The sequence data is from a PromethION 48 machine. The lab followed the ligation protocol, and we have not run adaptive sampling.
I ran the pipeline like this:
robin --threads 4 -r /data/GRCh38.p14.genome.fa -w /path/to/bam_pass /path/to/output
And I opened the URL http://10.54.216.13:8081 and pressed Live data.