Filtering by gene does not report existing variants - at least for orf genes #317

Closed
clinicalngs opened this issue Dec 17, 2024 · 1 comment

Comments

@clinicalngs

Describe the bug
Filtering data by a list of genes (e.g. C12orf4, C12orf40 etc.) does not return any variants present in these genes.
After the search completes, the gene names in the search box appear in upper case (e.g. C12ORF4, C12ORF40). So, is the search case-sensitive?
If a full chromosome is searched (in this case chr12) without any filtering, all variants present in the chromosome are returned, including those in orf genes. (I increased the number of displayed variants to over 100000 to see the entire chromosome.) The results in the Variant tab show the variants with the gene names in their original case (lower-case "orf").
Looking in the sqlite Gencode 33 database under mappers, which is still used by OC (see issue #269), the orf genes are stored with lower-case "orf". Not one of the 387 genes with lower-case "orf" in the name returned a variant when filtering a full genome, but genes like MORF4 did.
So, I guess we are losing all variants from orf genes if we filter by genes.
Other genes potentially affected are hsa-mir-423 and hsa-mir-1253.
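
The suspected failure mode can be illustrated with a minimal sketch. It assumes the filter upper-cases the requested symbols and then compares them exactly against the stored mixed-case symbols. The `variant` table, its columns, and the two helper functions are hypothetical stand-ins for illustration, not OpenCRAVAT's actual schema or code.

```python
import sqlite3

# Hypothetical toy table; the real OpenCRAVAT result schema is not shown in this issue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (id INTEGER, gene TEXT)")
conn.executemany(
    "INSERT INTO variant VALUES (?, ?)",
    [(1, "C12orf4"), (2, "C12orf40"), (3, "MORF4")],
)


def filter_forced_upper(symbols):
    """Upper-case the query symbols, then compare exactly (assumed buggy behaviour)."""
    wanted = [s.upper() for s in symbols]
    marks = ",".join("?" * len(wanted))
    sql = f"SELECT id, gene FROM variant WHERE gene IN ({marks}) ORDER BY id"
    return conn.execute(sql, wanted).fetchall()


def filter_case_insensitive(symbols):
    """Upper-case both sides so the letter case of the input does not matter."""
    wanted = [s.upper() for s in symbols]
    marks = ",".join("?" * len(wanted))
    sql = f"SELECT id, gene FROM variant WHERE UPPER(gene) IN ({marks}) ORDER BY id"
    return conn.execute(sql, wanted).fetchall()


print(filter_forced_upper(["C12orf4", "C12orf40"]))      # orf genes are lost
print(filter_forced_upper(["MORF4"]))                    # all-caps symbols still match
print(filter_case_insensitive(["c12ORF4", "C12orf40"]))  # matches regardless of case
```

Under these assumptions, symbols that are already all upper case (such as MORF4) survive the forced upper-casing, while any symbol containing lower-case letters never matches, which is consistent with the behaviour described above.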

To Reproduce

  1. Add any open reading frame (orf) gene to the filter box, using either lower-case or upper-case letters in the gene name.
  2. Click on search.
  3. Zero variants are returned.

Expected behavior
Retrieve the variants in the orf genes regardless of the letter case.

Desktop:

  • OS: Win10; latest version of OC
  • Browser: Chrome
  • Version: 131

Additional context
The long story: I am trying to filter VCF data using gene symbols from the HPO database (specific pathologies). Genes in HPO are up to date and listed by Entrez ID number and symbol. Gene symbols in OpenCRAVAT are still based on Gencode 33. So, for genes whose symbol has changed, I must match the new name in HPO with the old name in OpenCRAVAT. The problem arises when the only option for matching is an orf gene symbol, like STEEP1 = CXorf56 and ZFTA = C11orf95.
Perhaps once issue #269 is resolved, this will no longer be necessary.
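
As a rough illustration of the matching step described above, the sketch below translates current symbols to their Gencode 33 names before filtering. The two renamed pairs come from the examples in this issue; the dictionary and the `to_gencode33` helper are hypothetical names, and a real mapping would have to be built from an external symbol-history source.

```python
# Renamed symbols from the examples in this issue: new (HPO) -> old (Gencode 33).
HPO_TO_GENCODE33 = {
    "STEEP1": "CXorf56",
    "ZFTA": "C11orf95",
}


def to_gencode33(symbols, mapping=HPO_TO_GENCODE33):
    """Replace each symbol with its older name when one is known; keep it otherwise."""
    return [mapping.get(symbol, symbol) for symbol in symbols]


print(to_gencode33(["STEEP1", "ZFTA", "MORF4"]))
# ['CXorf56', 'C11orf95', 'MORF4']
```

The translated list keeps the mixed-case "orf" spelling, so it only helps if the gene filter preserves the case of the symbols it is given.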

Thanks,
Consuel

@kmoad
Collaborator

kmoad commented Dec 17, 2024

We've made the gene name filter case-sensitive. Previously, all gene names were being forced to upper-case. This is out now on the website and in version 2.11.1 of local oc.

Regarding the update from Gencode 33: we expect to be able to release Gencode 46 by February 2025. It required some reverse-engineering and is taking longer than expected.

@kmoad kmoad closed this as completed Dec 17, 2024