Can I get the detailed K-mer abundance information? #1890

Open
ZhuangZK opened this issue May 13, 2019 · 2 comments

ZhuangZK commented May 13, 2019

The output of abundance-dist.py looks like this:
abundance,count,cumulative,cumulative_fraction
0,0,0,0.0
1,6694694,6694694,0.48
2,906389,7601083,0.545
3,592628,8193711,0.588
4,524304,8718015,0.626
5,488859,9206874,0.661

I wonder if I can get the exact k-mer sequences instead of just the number in the first column (abundance).

@ZhuangZK ZhuangZK changed the title Can Can I get the detailed K-mer abundance information? May 13, 2019
@standage
Member

I don't know if khmer provides any way to do this out of the box. The problem is that the CountMin sketch (the Countgraph and Counttable objects in khmer) doesn't store k-mer sequences, only counts indexed by each k-mer's hash value. If you know the k-mer sequence, you can query for its abundance, but you can't determine the k-mer from the CountMin sketch alone.
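To make that concrete, here is a toy Count-Min sketch in plain Python. It is only an illustration, not khmer's implementation: the `ToyCountMin` class, its table sizes, and the SHA-256 hashing are all made up for this example, and it skips things a real k-mer counter would do, such as treating a k-mer and its reverse complement as the same. The point is that the structure holds nothing but counters, so it can answer "how often did I see this k-mer?" but has no way to list the k-mers it has seen.

```python
import hashlib


class ToyCountMin:
    """Toy Count-Min sketch (hypothetical, not khmer's API): tables of counters only."""

    def __init__(self, table_sizes=(1009, 1013, 1019)):
        # Each table is just a list of integer counters; no sequences are kept.
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # Hash the k-mer once, then pick one slot per table by taking the
        # hash modulo that table's size.
        digest = hashlib.sha256(kmer.encode()).digest()
        value = int.from_bytes(digest[:8], "big")
        return [value % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def get(self, kmer):
        # Collisions can only inflate a counter, so the minimum across
        # tables is the best available estimate (never an undercount).
        return min(table[slot] for table, slot in zip(self.tables, self._slots(kmer)))


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


sketch = ToyCountMin()
for kmer in kmers("ATGGCATGGC", 4):
    sketch.add(kmer)

# Querying works only because we supply the sequence ourselves.
print("ATGG", sketch.get("ATGG"))
```

Nothing in `sketch.tables` can be turned back into "ATGG"; going from k-mer to slot is easy, going from slot to k-mer is not possible.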

One way to do this would be to count k-mers with load-into-counting.py, and then iterate over the reads again and query the count of each k-mer. If you weren't careful, you'd end up reporting many of the k-mers multiple times, which is probably not what you want. Storing the k-mers so that each one is reported only once will take A LOT of memory, depending on the number and size of your sample(s).
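A rough sketch of that two-pass idea, again in plain Python. Everything here is an assumption made for illustration: the `report_kmer_abundances` helper is hypothetical, the reads are hard-coded strings rather than a FASTA/FASTQ file, and a `collections.Counter` stands in for the table that load-into-counting.py would produce. With khmer you would replace `get_count` with a lookup against the loaded counting structure.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def report_kmer_abundances(reads, k, get_count):
    """Yield (kmer, count) once per distinct k-mer found in the reads.

    get_count is any callable that maps a k-mer string to its abundance.
    """
    seen = set()  # holds every distinct k-mer: this is the memory cost
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                continue  # already reported, skip the duplicate
            seen.add(kmer)
            yield kmer, get_count(kmer)


reads = ["ATGGCATGGC", "GGCATGGA"]
k = 4

# Pass 1: count k-mers (stand-in for building the counting table).
counts = Counter(kmer for read in reads for kmer in kmers(read, k))

# Pass 2: walk the reads again and look up each k-mer's count.
results = list(report_kmer_abundances(reads, k, lambda kmer: counts[kmer]))

print("kmer,abundance")
for kmer, count in results:
    print(kmer, count, sep=",")
```

The `seen` set grows with the number of distinct k-mers, which is where the memory goes on a real data set. Dropping it removes that cost but brings back the duplicate reporting.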

Is there a specific reason you need every k-mer sequence? If you let us know what you're trying to do, perhaps we can help you find an alternative path to your goal.

@ctb
Member

ctb commented May 13, 2019 via email
