what is the "per set sequence lengths" #11

Open
smallfishcui opened this issue Nov 9, 2019 · 5 comments

Comments

@smallfishcui

Hi,

I am not clear from the manual what "per set sequence lengths" means. Is it about the target genome or the query sequence?
I am comparing two genome assemblies with minimap2 and using minidot.R to plot the PAF output, but I don't have a separate length file. Is it okay to take the header from the SAM file produced by minimap2? Is there a defined format for the length file?
It seems the minidot executable in the bin folder only works with minimap version 1, and it is recommended to use minimap2 now....

Thanks,
Cui

@thackl
Owner

thackl commented Nov 9, 2019

Hi Cui,

It is a 3-column file that gives the length of each contig of each genome:

  1. genome_id - name of genome fasta file without .fa
  2. contig_id
  3. contig_length

I usually create it using samtools faidx and then adding the genome_id as the first column.

If you run bin/minimap it will do that automatically. It should work with minimap2 too, but I never had the time to really test it. Hope that helps!
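
A minimal sketch of the manual route described above, in Python. It reads the `.fai` index that `samtools faidx` writes next to each FASTA file (the first two tab-separated columns are the contig name and its length) and prepends the genome id. The function names, the file names, and the tab-separated, header-less output layout are assumptions for illustration, not something confirmed in this thread.

```python
import os


def fai_to_length_rows(fai_path, genome_id=None):
    """Return (genome_id, contig_id, contig_length) rows from a .fai index."""
    if genome_id is None:
        # Assumption: genome_id is the FASTA file name without its directory
        # and extension, e.g. "asm1.fa.fai" -> "asm1".
        name = os.path.basename(fai_path)
        if name.endswith(".fai"):
            name = name[: -len(".fai")]
        genome_id = os.path.splitext(name)[0]
    rows = []
    with open(fai_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            # .fai columns: name, length, offset, linebases, linewidth
            rows.append((genome_id, fields[0], int(fields[1])))
    return rows


def write_length_file(fai_paths, out_path):
    """Write one header-less, tab-separated length file for all genomes."""
    with open(out_path, "w") as out:
        for fai_path in fai_paths:
            for genome_id, contig_id, length in fai_to_length_rows(fai_path):
                out.write(f"{genome_id}\t{contig_id}\t{length}\n")
```

For two assemblies this could be called as `write_length_file(["asm1.fa.fai", "asm2.fa.fai"], "lengths.tsv")`, where all three file names are placeholders.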

@smallfishcui
Author

Hi Thomas,

Thank you for your quick and clear explanation. I have one more question: should the target genome go in the first half of the file and the query genome in the second half, or does the order not matter?
I generated the PAF file using minimap2. Would this file be compatible?
I couldn't find installation instructions on the minimap website, so I am not sure if I could run it at all....

Thanks,
Cui

@thackl
Owner

thackl commented Nov 9, 2019

The order shouldn't matter - I think it is just joined by genome_id. minimap2 paf files should work too. But let me know if they don't.
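
One way to sanity-check a length file against a minimap2 PAF before plotting is sketched below. It relies only on the PAF column layout (column 1 query name, column 2 query length, column 6 target name, column 7 target length). The function name and the header-less, tab-separated three-column length file are assumptions, and the check ignores genome_id, so it only confirms that every contig in the PAF has a row with the same name and length.

```python
def paf_contigs_missing_from_lengths(paf_path, length_path):
    """Return sorted (contig_id, length) pairs seen in the PAF but not in the length file."""
    known = set()
    with open(length_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # not a 3-column row (blank or malformed line)
            # Assumed columns: genome_id, contig_id, contig_length.
            # A header row would raise ValueError here, which is itself a useful hint.
            known.add((fields[1], int(fields[2])))

    missing = set()
    with open(paf_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # PAF records have at least 12 mandatory columns
            pairs = ((fields[0], int(fields[1])), (fields[5], int(fields[6])))
            for pair in pairs:
                if pair not in known:
                    missing.add(pair)
    return sorted(missing)
```

An empty list means every query and target contig in the PAF was found with a matching length; anything else points at rows to fix in the length file.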

@smallfishcui
Author

Hi,

I made a length file as instructed and a PAF file with minimap2, then used the minidot.R script to draw the dot plot, but got this error message:
Error in Math.factor(x$V2) : ‘cumsum’ not meaningful for factors
Calls: cbind -> lapply -> FUN -> Math.factor
Execution halted

Do you know what could be the reason? Or maybe there is something wrong with my steps?

thanks,
Cui

@jdamas13

@smallfishcui were you able to fix this problem?
I am in the same situation.
Thanks!
