Make biomart query permissive to more organisms #168

Draft
wants to merge 7 commits into
base: master

kelshmo
Collaborator

@kelshmo kelshmo commented Sep 14, 2021

Fixes #167 and #138. Proposes changes to functions get_biomart and plot_sexcheck_pca

Changes to get_biomart:

  • In both functions, hgnc_symbol was hard-coded throughout. For non-human organisms, the gene symbol column name may differ (e.g. mgi_symbol for mouse). This change uses grep to find the column whose name contains "symbol" instead.
  • When querying for Mus musculus, I noticed a mix of empty strings ("") and NA character values. I added a cleaning step that converts these values to proper NA, which some downstream code needs in order to work correctly.
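
The two changes above can be sketched roughly as follows. This is a minimal illustration in base R, not the code in this PR: the helper names find_symbol_column and clean_missing are hypothetical, the example data frame is a made-up stand-in for a biomart query result, and treating the literal string "NA" as missing is an assumption about what "NA characters" looks like in practice.

```r
# Made-up stand-in for a biomart query result (mouse-style column names).
# The IDs and values are placeholders, not real query output.
biomart_results <- data.frame(
  ensembl_gene_id = c("gene_1", "gene_2", "gene_3"),
  mgi_symbol      = c("Gnai3", "", NA),
  chromosome_name = c("3", "X", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: locate the gene symbol column by name instead of
# assuming it is called hgnc_symbol.
find_symbol_column <- function(df) {
  hits <- grep("symbol", colnames(df), value = TRUE)
  if (length(hits) != 1) {
    stop("Expected exactly one column containing 'symbol', found ", length(hits))
  }
  hits
}

# Hypothetical helper: convert empty strings and literal "NA" strings in
# character columns to proper NA values. Non-character columns are untouched.
clean_missing <- function(df) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      x <- df[[col]]
      x[!is.na(x) & (x == "" | x == "NA")] <- NA_character_
      df[[col]] <- x
    }
  }
  df
}

symbol_col <- find_symbol_column(biomart_results)
cleaned <- clean_missing(biomart_results)
```

Stopping when more than one column matches "symbol" is a design choice of this sketch; the PR text only says that grep is used to search for such columns.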

Changes to plot_sexcheck_pca:

  • The function returns a message if xist and uty are missing from the biomart query. Presumably the user ran get_biomart with the gene features from counts, so the gene list in the biomart query object will match counts. Because the function checks whether these markers correlate with PCs, the markers (and their matching Ensembl gene IDs) must be present. On its own, this function might be better served by raising an error, but since this is a non-essential step of the workflow, it returns a message instead.
  • chromosome_name remains hard-coded. I'm not sure whether that will cause a problem in the future.
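
The message-instead-of-error behavior could look something like the sketch below. It is only an illustration under stated assumptions: has_sex_markers is a hypothetical helper and not a function in this PR, the comparison is made case-insensitive on the assumption that symbol capitalization differs between organisms, and the check is treated as failed if either marker is absent.

```r
# Hypothetical helper: TRUE if every marker is present in the symbol column,
# otherwise emit a message (not an error) and return FALSE so the rest of
# the workflow can continue.
has_sex_markers <- function(biomart_results, markers = c("xist", "uty")) {
  symbol_col <- grep("symbol", colnames(biomart_results), value = TRUE)
  symbols <- if (length(symbol_col) > 0) {
    tolower(biomart_results[[symbol_col[1]]])
  } else {
    character(0)
  }
  missing_markers <- setdiff(markers, symbols)
  if (length(missing_markers) > 0) {
    message(
      "Skipping sex check; marker(s) not found in biomart results: ",
      paste(missing_markers, collapse = ", ")
    )
    return(FALSE)
  }
  TRUE
}

# Made-up example inputs, not real query output.
only_x <- data.frame(
  ensembl_gene_id = c("gene_1", "gene_2"),
  mgi_symbol = c("Xist", "Gnai3"),
  stringsAsFactors = FALSE
)
both <- data.frame(
  ensembl_gene_id = c("gene_1", "gene_2"),
  mgi_symbol = c("Xist", "Uty"),
  stringsAsFactors = FALSE
)

only_x_ok <- has_sex_markers(only_x)
both_ok <- has_sex_markers(both)
```

A caller could then skip the PCA plot when the helper returns FALSE rather than stopping the whole workflow.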

@kelshmo kelshmo marked this pull request as draft September 14, 2021 23:30
Development

Successfully merging this pull request may close these issues.

sex_plot_pca() errors if only X or Y in counts
1 participant