Scaffolding by yahs introduces 200 bp gaps in the assembly #87

Open
janina-rinke opened this issue Apr 23, 2024 · 3 comments

@janina-rinke

Hi,

Thank you very much for this nice tool; it does a great job of scaffolding our ONT-generated assembly using Hi-C reads.

However, after scaffolding with yahs, gaps of a fixed 200 bp length are introduced, as can be seen in the final.agp file (see below). I am wondering how this occurs and whether I could set a parameter to avoid such gaps in the final scaffolds. For example, I would like to avoid the gap introduced on scaffold_1 at positions 19784001-19784200.

Looking at the documentation, I could not find any parameter to deal with the introduced gaps in the agp file. Thanks!

```
scaffold_1      1       19784000        1       W       old_scaffold_1  1       19784000        +
scaffold_1      19784001        19784200        2       N       200     scaffold        yes     proximity_ligation
scaffold_1      19784201        31747224        3       W       old_scaffold_1  19784001        31747024        +
scaffold_2      1       22623878        1       W       old_scaffold_3  1       22623878        -
scaffold_2      22623879        22624078        2       N       200     scaffold        yes     proximity_ligation
scaffold_2      22624079        22655529        3       W       old_scaffold_14 1       31451   +
```
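
The gap records in an AGP listing like the one above can be picked out programmatically. The following is a minimal sketch, not part of yahs: it assumes the standard nine-column AGP layout with whitespace-separated fields, and the names `Gap`, `gap_records`, and `AGP_EXAMPLE` are hypothetical helpers introduced only for illustration. It also checks that each gap's 1-based, inclusive coordinates agree with its stated length.

```python
from typing import NamedTuple


class Gap(NamedTuple):
    scaffold: str
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    length: int
    linkage_evidence: str


def gap_records(agp_text: str) -> list[Gap]:
    """Return the gap lines (component type N or U) found in AGP text."""
    gaps = []
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if fields[4] in ("N", "U"):
            start, end, length = int(fields[1]), int(fields[2]), int(fields[5])
            # Coordinates are inclusive, so the span must equal the gap length.
            if end - start + 1 != length:
                raise ValueError(f"inconsistent gap record: {line!r}")
            gaps.append(Gap(fields[0], start, end, length, fields[8]))
    return gaps


# The six records quoted in this issue.
AGP_EXAMPLE = """\
scaffold_1 1 19784000 1 W old_scaffold_1 1 19784000 +
scaffold_1 19784001 19784200 2 N 200 scaffold yes proximity_ligation
scaffold_1 19784201 31747224 3 W old_scaffold_1 19784001 31747024 +
scaffold_2 1 22623878 1 W old_scaffold_3 1 22623878 -
scaffold_2 22623879 22624078 2 N 200 scaffold yes proximity_ligation
scaffold_2 22624079 22655529 3 W old_scaffold_14 1 31451 +
"""

for gap in gap_records(AGP_EXAMPLE):
    print(gap)
```

For the records above this reports two gaps, each 200 bp long and marked `proximity_ligation`: one where old_scaffold_1 was split and rejoined, and one joining old_scaffold_3 to old_scaffold_14.
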
@Sven-Winter

Why would you not want the gaps? A scaffold is supposed to consist of contigs linked by gaps, and YaHS uses a fixed run of 200 Ns for each one. If you want to get rid of them, you need to run a gap-closing step.
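
To see where these N runs sit in a scaffold sequence, a short sketch like the one below can help. It is an illustration only, not part of yahs: it assumes the scaffold has already been loaded into a single Python string, and `n_runs` is a hypothetical helper name. Note that any Ns already present in the input assembly would be reported as well.

```python
import re
from typing import Iterator


def n_runs(sequence: str) -> Iterator[tuple[int, int, int]]:
    """Yield (start, end, length) for each run of N/n.

    Coordinates are 1-based and inclusive, matching the AGP convention.
    """
    for match in re.finditer(r"[Nn]+", sequence):
        yield match.start() + 1, match.end(), match.end() - match.start()


# Toy scaffold: 12 bases, a 200 bp gap, then 4 more bases.
toy_scaffold = "ACGT" * 3 + "N" * 200 + "GGCC"
print(list(n_runs(toy_scaffold)))  # [(13, 212, 200)]
```
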

@MboiTui

MboiTui commented May 31, 2024

To my understanding, the gaps are there for a reason: two contigs were found to be adjacent based on contact data, but no sequence overlapping both contigs was found, so they were placed next to each other with an arbitrary 200 bp gap. If you want to close the gaps, the best way is to increase coverage or add ultra-long reads on top.

@c-zhou
Owner

c-zhou commented Jun 4, 2024

Thanks @Sven-Winter and @MboiTui

@janina-rinke, yes, we put some N's between two contigs in scaffolds so people know some sequence is missing there.

Best,
Chenxi
