computing cross-mappability from gene, conflicting_genes is empty #2

Open
FunongLuo opened this issue Jun 21, 2024 · 1 comment

@FunongLuo

FunongLuo commented Jun 21, 2024

Hi @alorchhota,
When I reached step 5 (Compute cross-mappability), I could not generate a valid .crossmap.txt file no matter what I adjusted. I tried both NCBI’s GTF and Ensembl’s GTF, but neither worked. The other steps seem fine, and the .RData files under pos2gene are generated successfully.
Here are some of the files:
$ head annot.exon_utr.txt
gene_id chr annotation_source feature start_pos end_pos strand gene_name gene_type
ENSBTAG00000006648 1 ensembl exon 350267 350389 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 346602 346924 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 342547 342721 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 339070 339312 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 346925 346959 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 346602 346889 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 342547 342721 - NA protein_coding
ENSBTAG00000006648 1 ensembl exon 339070 339312 - NA protein_coding
ENSBTAG00000054829 1 ensembl exon 626079 626473 + NA protein_coding

$ head gene_mappability.txt
ENSBTAG00000000088 1
ENSBTAG00000000160 0.971354893963076
ENSBTAG00000000201 1
ENSBTAG00000000210 1
ENSBTAG00000000213 1
ENSBTAG00000000242 1
ENSBTAG00000000290 0.985351034106335
ENSBTAG00000000394 1
ENSBTAG00000000448 1
ENSBTAG00000000456 1

$ head ENSBTAG00000009831.kmer.fa
0
aaaatagatagccattgagaatttgctgtataactc
1
aaatagatagccattgagaatttgctgtataactca
2
aacctggaggggtgggatggagtgggaggtgagagg
3
aatagatagccattgagaatttgctgtataactcag
4
aatatgtaaaatagatagccattgagaatttgctgt

Is the size of these files normal?
$ l cross_mappability/pos2gene/
total 11M
-rw-r--r-- 1 user host 653K Jun 21 09:30 pos2gene_1.RData
-rw-r--r-- 1 user host 570K Jun 21 09:31 pos2gene_2.RData
-rw-r--r-- 1 user host 527K Jun 21 09:33 pos2gene_3.RData
-rw-r--r-- 1 user host 526K Jun 21 09:34 pos2gene_5.RData
-rw-r--r-- 1 user host 500K Jun 21 09:35 pos2gene_4.RData
-rw-r--r-- 1 user host 492K Jun 21 09:36 pos2gene_6.RData
-rw-r--r-- 1 user host 470K Jun 21 09:37 pos2gene_8.RData
-rw-r--r-- 1 user host 483K Jun 21 09:38 pos2gene_7.RData
-rw-r--r-- 1 user host 455K Jun 21 09:39 pos2gene_11.RData
-rw-r--r-- 1 user host 431K Jun 21 09:39 pos2gene_9.RData
-rw-r--r-- 1 user host 445K Jun 21 09:40 pos2gene_10.RData
-rw-r--r-- 1 user host 367K Jun 21 09:41 pos2gene_12.RData
-rw-r--r-- 1 user host 372K Jun 21 09:42 pos2gene_15.RData
-rw-r--r-- 1 user host 356K Jun 21 09:43 pos2gene_13.RData
-rw-r--r-- 1 user host 350K Jun 21 09:43 pos2gene_14.RData
-rw-r--r-- 1 user host 348K Jun 21 09:44 pos2gene_16.RData
-rw-r--r-- 1 user host 319K Jun 21 09:45 pos2gene_17.RData
-rw-r--r-- 1 user host 304K Jun 21 09:45 pos2gene_20.RData
-rw-r--r-- 1 user host 303K Jun 21 09:46 pos2gene_21.RData
-rw-r--r-- 1 user host 309K Jun 21 09:46 pos2gene_18.RData
-rw-r--r-- 1 user host 303K Jun 21 09:47 pos2gene_19.RData
-rw-r--r-- 1 user host 266K Jun 21 09:47 pos2gene_24.RData
-rw-r--r-- 1 user host 267K Jun 21 09:48 pos2gene_22.RData
-rw-r--r-- 1 user host 244K Jun 21 09:48 pos2gene_23.RData
-rw-r--r-- 1 user host 232K Jun 21 09:49 pos2gene_26.RData
-rw-r--r-- 1 user host 239K Jun 21 09:49 pos2gene_29.RData
-rw-r--r-- 1 user host 201K Jun 21 09:50 pos2gene_28.RData
-rw-r--r-- 1 user host 194K Jun 21 09:50 pos2gene_27.RData
-rw-r--r-- 1 user host 202K Jun 21 09:50 pos2gene_25.RData

$ head ENSBTAG00000039129.alignment.txt
0 29 48559787
0 8 14704324
0 12 57984984
0 25 23555622
0 18 18976585
0 8 2155642
0 11 29727308
0 1 39163552
0 24 22006827
0 7 84106561
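
A quick sanity check that may be worth running here is whether the chromosome labels in the alignment file also occur in the annotation, since a naming mismatch (for example "chr1" versus "1") would leave every lookup empty. The sketch below is only an illustration: the function name chr_missing_from_annot is hypothetical, and the column layout (chromosome in column 2 of both files, header line in the annotation only) is assumed from the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: print chromosome labels that occur in the alignment
# file but never in the annotation file.
# Assumed layouts (taken from the excerpts in this thread):
#   annotation: whitespace-separated, one header line, chromosome in column 2
#   alignment:  k-mer id, chromosome, position (chromosome in column 2)
chr_missing_from_annot() {
    annot=$1
    align=$2
    awk 'NR == FNR {                    # first file: the annotation
             if (FNR > 1) seen[$2] = 1  # skip the header, remember each label
             next
         }
         # second file: report each unknown label once
         !($2 in seen) && !reported[$2]++ { print $2 }' "$annot" "$align"
}
```

Called as `chr_missing_from_annot annot.exon_utr.txt ENSBTAG00000039129.alignment.txt`, empty output would mean every aligned chromosome label is present in the annotation, so the cause lies elsewhere.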

When I print conflicting_genes right after the line ‘conflicting_genes <- align_dt[,get_chr_conflicts(kmer, pos, chr), by=chr]’ in the R script, it is empty for every gene:

[1] "[06/21/24 10:39:17] computing cross-mappability from gene ENSBTAG00000007898 to genes in 1,2,3,5,4,6,8,7,11,9,10,12,15,13,14,16,17,20,21,18,19,24,22,23,26,29,28,27,25"
Empty data.table (0 rows and 4 cols): chr,G1,G2,S1
[1] "[06/21/24 10:39:32] computing cross-mappability from gene ENSBTAG00000007998 to genes in 1,2,3,5,4,6,8,7,11,9,10,12,15,13,14,16,17,20,21,18,19,24,22,23,26,29,28,27,25"
Empty data.table (0 rows and 4 cols): chr,G1,G2,S1
[1] "[06/21/24 10:39:39] computing cross-mappability from gene ENSBTAG00000008072 to genes in 1,2,3,5,4,6,8,7,11,9,10,12,15,13,14,16,17,20,21,18,19,24,22,23,26,29,28,27,25"
Empty data.table (0 rows and 4 cols): chr,G1,G2,S1
[1] "[06/21/24 10:39:45] computing cross-mappability from gene ENSBTAG00000008314 to genes in 1,2,3,5,4,6,8,7,11,9,10,12,15,13,14,16,17,20,21,18,19,24,22,23,26,29,28,27,25"
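
One way to narrow down an empty result like the above is to count, outside of R, how many aligned positions fall inside any annotated exon/UTR interval on the same chromosome. This is a rough sketch, not the script's actual logic: it assumes conflicts come from looking up aligned positions in the annotated intervals, that the alignment columns are k-mer id, chromosome and position, and that both files use the same coordinate base. The function name count_annotated_hits is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<hits> <total>", where hits is the number of
# alignment rows whose position lies inside at least one annotated interval
# on the same chromosome, and total is the number of alignment rows.
# Assumed layouts (taken from the excerpts in this thread):
#   annotation: header line; chr in column 2, start in column 5, end in column 6
#   alignment:  k-mer id, chromosome, position
count_annotated_hits() {
    annot=$1
    align=$2
    awk 'NR == FNR {                       # first file: the annotation
             if (FNR > 1) {                # skip the header line
                 n[$2]++
                 s[$2, n[$2]] = $5
                 e[$2, n[$2]] = $6
             }
             next
         }
         {                                 # second file: the alignments
             total++
             for (i = 1; i <= n[$2] + 0; i++)
                 if ($3 + 0 >= s[$2, i] + 0 && $3 + 0 <= e[$2, i] + 0) {
                     hits++
                     break
                 }
         }
         END { printf "%d %d\n", hits, total }' "$annot" "$align"
}
```

If `count_annotated_hits annot.exon_utr.txt ENSBTAG00000039129.alignment.txt` reports zero hits, the alignments probably never land in annotated regions (or the coordinates or labels do not match). A non-zero count would point toward the lookup step in the R script instead. The linear scan per row is slow on large files, so running it on a `head` of the alignment file first may be more practical.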

I cannot figure out what is causing this problem. I hope you can help. Thanks!
If you need further assistance with debugging or understanding the issue, feel free to ask!

@Yang-Yingshan

@FunongLuo Hi, I have the same problem. Did you find a solution?
