-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extracting insertion/deletion data from aligned region of interest #281
Comments
Extracting the data with samtools is pretty trivial if you know the region samtools view chr1:100-200 What information are you looking for, and in what format? |
With samtools view, I can see the alignment results (mismatches, insertions, deletions) and count the sum of any one of these (or all) over a given range (i.e. chr1:100-200), however I would lose base pair resolution of this information (i.e. if my region is chr1:100-200, I will not be able to see indel counts for chr1:100, chr1:101, chr1:102, and so on). By manually clicking a position of my aligned read on IGV web app, I can obtain this information, but it is not scalable for many alignment files - is there a faster, higher-throughput way to do this via IGV or some package offered by IGV? Output format of this information is not particular important , but something simple such as columns for reference position, insertions, and deletions would be ideal. Thanks for the quick response! |
I would find it really surprising if there isn't a simple tool to do this, but I assume you have looked. This isn't the sort of thing we design IGV for, which is a visual tool, but leave this open we will consider it for the future. |
BTW, on a tangent here, but where would you expect an insertion between positions 10 and 11 to be counted? |
Great question - I am exploring variant calling programs to count these. Out of curiosity, how does IGV do this? |
Yes there is no simple tool to do this. When you map variants with Minimap2 --cs -xasm5 /-xasm20 ; or unimap, --cs you get a CS tag which is a newer type of MD type which is basically telling you what sequence is ins or del relative to the reference ( its what you see when you click the insertion in IGV, if I understand it correctly..). When you select regions in IGV you get a bed file. And when we call variants against this bed file using either sniffles2 --regions regions_igv.bed --reference reference.fasta ; or using Pilon --targets regions.bed Well it doesnt work .. It doesnt work as you explained, similarly to with the mpileup example. In bcftools mpileup it doesnt work because the max ins length is limited to 500 based but when you increase it bugs. You have the case where your region of interests spans multiple reads in your bam but only 1 region/gap in your reference. You have the case where your region of interest on IGV spans the flanks of a gap in your query.bam and so samtools view -L cannot gap two reads of your query.bam with NNNN You have the case were your region of interest spans multiple gaps in your reference.fasta but only 1 read in your query.bam, this makes took like ragtag or quartet crash. There was an option on IGV to copy the consensus sequence of an alignment when you right click on the alignment. It doesnt even work and would extract the full consensus and doesn't even work when we paste it in notes.. Unfortunately no one ever solved this problem. I am trying to write a script and test it. Without much success. |
@jsromanowski I'm not exactly sure what you are asking. Are you asking about the insertion/deletion details of a single aligned read, or a pileup at a specific location of multiple aligned reads? For a single read the indel information is in the alignment record, specifically the "cigar" string. For a pileup at a location IGV simply counts the indels at that location for the alignments intersecting the location. @Isoris I'm not following your discussion at all. However there is no concept of a consensus sequence for a single alignment, by definition a consensus sequence is the consensus of multiple alignments. However this is a tangent discussion. |
Hi @jrobinso, Thank you for clarifying. Let me rephrase my question for better understanding. I'm asking whether IGV can be used to extract and manipulate data when there is a missing sequence in the reference (REF) relative to the alternate allele (ALT) or query (QRY). Specifically:
|
@Isoris No, thats not in IGV's scope. |
Hello IGV team,
I am currently analyzing amplicons from an insect genomic DNA sample edited with CRISPR/Cas9. I am aware IGV offers insertion and deletion data by selecting a specific base pair of your aligned read(s). Is it possible to access and export this data for a highlighted region of interest from IGV?
Side note: I am aware this data is contained within .bam files, however extracting the information in a clean way is not so simple and can get a bit messy (i.e. with mpileup offered by bcftools). I would also like to add I do not intend to use this data in a clinical or therapeutic context, but instead am using it to analyze gene editing in insects - so although I'm aware this web app is not particularly intended for this purpose, the ability to this data from IGV would be extremely useful and sufficient for my application.
Thanks,
Joe
The text was updated successfully, but these errors were encountered: