Describe the bug
Filtering data by a list of genes (e.g. C12orf4, C12orf40 etc.) does not return any variants present in these genes.
After the search completes, the gene names in the search box appear capitalized (e.g. C12ORF4, C12ORF40). So, is the search case-sensitive?
If a full chromosome is searched (in this case chr12) without any filtering, all variants present in the chromosome are returned, including those in orf genes. (I increased the number of displayed variants to over 100000 to see the entire chromosome.) The results in the Variant tab show the variants with the gene name in lower case (orf).
Looking in the sqlite Gencode 33 database under mappers, which is still used by OC (see issue #269), the orf genes are stored in lower case. Not one of the 387 genes with lowercase "orf" in the name produced a variant in a full genome, but genes like MORF4 did.
So, I guess we are losing all variants from orf genes if we filter by genes.
Other genes potentially affected are hsa-mir-423 and hsa-mir-1253.
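A minimal sketch of what I suspect is happening, assuming the filter input is upper-cased and then compared with a case-sensitive match. The table name `variant` and column `hugo` are hypothetical stand-ins used only for illustration, not the actual OpenCRAVAT schema:

```python
import sqlite3

# In-memory stand-in for a result database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (hugo TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO variant VALUES (?, ?)",
    [("C12orf4", 1), ("C12orf40", 2), ("MORF4", 3)],
)


def filter_by_genes(genes, force_upper):
    """Return gene symbols of variants matching the requested genes.

    SQLite compares text case-sensitively by default, so upper-casing
    the input first makes symbols with a lowercase "orf" unmatchable.
    """
    if not genes:
        return []
    if force_upper:
        genes = [g.upper() for g in genes]
    placeholders = ",".join("?" for _ in genes)
    rows = conn.execute(
        f"SELECT hugo FROM variant WHERE hugo IN ({placeholders}) ORDER BY pos",
        list(genes),
    ).fetchall()
    return [row[0] for row in rows]


# Upper-cased input loses the orf gene but still finds MORF4.
print(filter_by_genes(["C12orf4", "MORF4"], force_upper=True))
# Input passed through unchanged finds both.
print(filter_by_genes(["C12orf4", "MORF4"], force_upper=False))
```

This would be consistent with all-uppercase symbols such as MORF4 still returning variants while the orf genes return none.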
To Reproduce
Add any open reading frame (orf) gene to the filter box (with the gene name in either lowercase or uppercase letters).
Click on search.
Zero variants are produced.
Expected behavior
Retrieve the variants in the orf genes regardless of the letter case.
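To illustrate the behavior I have in mind, here is a rough sketch of case-insensitive matching that resolves whatever the user typed back to the symbol as it is stored. The function name `match_genes_ignore_case` is hypothetical, and the sketch assumes no two stored symbols differ only by letter case:

```python
def match_genes_ignore_case(requested, known_symbols):
    """Map user-entered gene names to stored symbols, ignoring case.

    Names with no stored counterpart are dropped. Assumes no two
    stored symbols differ only by case.
    """
    by_folded = {symbol.casefold(): symbol for symbol in known_symbols}
    matched = []
    for name in requested:
        key = name.casefold()
        if key in by_folded:
            matched.append(by_folded[key])
    return matched


stored = ["C12orf4", "C12orf40", "MORF4"]
print(match_genes_ignore_case(["C12ORF4", "c12orf40", "morf4"], stored))
```

The filter could then run against the stored spelling, so it would not matter how the name was typed.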
Desktop:
OS: Win10; latest version of OC
Browser: Chrome
Version: 131
Additional context
The long story: I am trying to filter VCF data using gene symbols from the HPO database (specific pathologies). Genes in HPO are up to date and listed by Entrez ID number and symbol. Gene symbols in OpenCRAVAT are still based on Gencode 33. So I must match the new name in HPO with the old name in OpenCRAVAT for those genes whose symbol has changed. The problem arises when the only option for matching is an orf gene symbol, like STEEP1 = CXorf56 and ZFTA = C11orf95.
Perhaps once issue #269 is resolved, we will not have to do this.
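For context, the matching step looks roughly like the sketch below. Only the two symbol pairs mentioned above are included; the names `NEW_TO_GENCODE33` and `to_gencode33_symbols` are hypothetical, and a real mapping would have to be built from the actual HPO and Gencode 33 gene lists:

```python
# Current symbol -> symbol used in Gencode 33 (only the pairs named above).
NEW_TO_GENCODE33 = {
    "STEEP1": "CXorf56",
    "ZFTA": "C11orf95",
}


def to_gencode33_symbols(symbols):
    """Replace renamed symbols with their Gencode 33 names.

    Symbols without an entry in the mapping are passed through unchanged.
    """
    return [NEW_TO_GENCODE33.get(symbol, symbol) for symbol in symbols]


print(to_gencode33_symbols(["STEEP1", "ZFTA", "MORF4"]))
```

The translated list is what goes into the gene filter, which is why the lowercase "orf" in the old names matters.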
Thanks,
Consuel
We've made the gene name filter case-sensitive. Previously, all gene names were being forced to upper-case. This is out now on the website and in version 2.11.1 of local oc.
Regarding the update from Gencode 33: we expect to be able to release Gencode 46 by February 2025. It required some reverse-engineering and is taking longer than expected.