This repository has been archived by the owner on Mar 2, 2021. It is now read-only.

svtk bedcluster doesn't cluster events with identical coordinates #99

Open
mgonzalezporta opened this issue Sep 10, 2020 · 0 comments

E.g. for this input:

$ cat input.bed
chr10   105297001       105303000       chr10:105297001-105303000       WHB3855 deletion
chr10   105297001       105302000       chr10:105297001-105302000       WHB3873 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3880 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3882 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3884 deletion
chr10   105297001       105302000       chr10:105297001-105302000       WHB3904 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3934 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3939 deletion
chr10   105297001       105303000       chr10:105297001-105303000       WHB3961 deletion

Running svtk bedcluster gives the following output:

$ svtk bedcluster input.bed output.bed --merge-coordinates --prefix bedcluster --frac 0.8

$ cat output.bed
#chrom  start   end     name    svtype  sample  call_name       vaf     vac     pre_rmsstd      post_rmsstd
chr10   105297001       105303000       bedcluster_0    deletion        WHB3855 chr10:105297001-105303000       0.111   1       0.000   0.000
chr10   105297001       105302000       bedcluster_1    deletion        WHB3873 chr10:105297001-105302000       0.111   1       0.000   0.000
chr10   105297001       105303000       bedcluster_2    deletion        WHB3880 chr10:105297001-105303000       0.111   1       0.000   0.000
chr10   105297001       105303000       bedcluster_3    deletion        WHB3882 chr10:105297001-105303000       0.111   1       0.000   0.000
chr10   105297001       105303000       bedcluster_4    deletion        WHB3884 chr10:105297001-105303000       0.111   1       0.000   0.000
chr10   105297001       105302500       bedcluster_5    deletion        WHB3904 chr10:105297001-105302000       0.222   2       500.000 500.000
chr10   105297001       105302500       bedcluster_5    deletion        WHB3961 chr10:105297001-105303000       0.222   2       500.000 500.000
chr10   105297001       105303000       bedcluster_6    deletion        WHB3934 chr10:105297001-105303000       0.111   1       0.000   0.000
chr10   105297001       105303000       bedcluster_7    deletion        WHB3939 chr10:105297001-105303000       0.111   1       0.000   0.000

There are actually 6 calls that share the same coordinates in the output, but they are reported with different bedcluster ids:

$ cat output.bed | cut -f 1-3 | grep -v '^#' | sort | uniq -c | sort -r
      6 chr10   105297001       105303000
      2 chr10   105297001       105302500
      1 chr10   105297001       105302000
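The shell check above counts coordinates but does not show which ids they were split across. A minimal Python sketch that lists them is below. The function name `split_coordinate_groups` is made up for illustration and is not part of svtk; it assumes the column order shown in the output.bed header (chrom, start, end, name, ...).

```python
from collections import defaultdict

def split_coordinate_groups(lines):
    """Return {(chrom, start, end): [cluster ids]} for coordinates seen under more than one id."""
    ids_by_coords = defaultdict(set)
    for line in lines:
        # Skip blank lines and the '#chrom ...' header.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name = line.split()[:4]
        ids_by_coords[(chrom, int(start), int(end))].add(name)
    # Keep only coordinates that were assigned to several cluster ids.
    return {coords: sorted(ids) for coords, ids in ids_by_coords.items() if len(ids) > 1}

if __name__ == "__main__":
    # Three rows copied from the output.bed shown in this issue.
    rows = [
        "chr10 105297001 105303000 bedcluster_0 deletion WHB3855 chr10:105297001-105303000 0.111 1 0.000 0.000",
        "chr10 105297001 105303000 bedcluster_2 deletion WHB3880 chr10:105297001-105303000 0.111 1 0.000 0.000",
        "chr10 105297001 105302500 bedcluster_5 deletion WHB3904 chr10:105297001-105302000 0.222 2 500.000 500.000",
    ]
    for coords, ids in split_coordinate_groups(rows).items():
        print(coords, ids)
```

To run it on the full file, pass `open("output.bed")` instead of the `rows` list.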

A suggestion: ensure that events with identical coordinates have the same bedcluster id in the clustered output bed.
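To make the suggestion concrete, here is a rough sketch of the behaviour I would expect for exact matches. This is not how svtk bedcluster is implemented, and `cluster_identical` is a hypothetical helper; it only groups calls whose chrom, start, end and svtype are identical and does not attempt the overlap-based clustering controlled by `--frac`.

```python
def cluster_identical(records, prefix="bedcluster"):
    """Assign one cluster id per distinct (chrom, start, end, svtype).

    records: iterable of (chrom, start, end, call_name, sample, svtype),
    i.e. the column order of the input.bed shown in this issue.
    """
    ids = {}
    clustered = []
    for chrom, start, end, call_name, sample, svtype in records:
        key = (chrom, start, end, svtype)
        # The first call with these coordinates creates the id; later ones reuse it.
        if key not in ids:
            ids[key] = f"{prefix}_{len(ids)}"
        clustered.append((chrom, start, end, ids[key], svtype, sample, call_name))
    return clustered

if __name__ == "__main__":
    # Three calls copied from the input.bed shown in this issue.
    calls = [
        ("chr10", 105297001, 105303000, "chr10:105297001-105303000", "WHB3855", "deletion"),
        ("chr10", 105297001, 105302000, "chr10:105297001-105302000", "WHB3873", "deletion"),
        ("chr10", 105297001, 105303000, "chr10:105297001-105303000", "WHB3880", "deletion"),
    ]
    for row in cluster_identical(calls):
        print("\t".join(str(field) for field in row))
```

With this grouping, WHB3855 and WHB3880 would share one id, and WHB3873 would get its own.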

Thanks!
