Create UniProt Mapping file. #691
base: staging
Made some comments based on what we talked about on the call - let me know if you have any questions!
tmp_file = tmp_file(e.file_name)
tmp_file.puts(e.headers.join("\t"))

e.objects.find_each do |object|
You probably don't need the indirection here of calling `e.objects` and can delete `def self.objects` in the presenter file. This can probably just be `Gene.find_each`.
e.objects.find_each do |object|

row = e.row_from_object(object)
if row[1].is_a?(Array)
I might change this to be `e.rows_from_object()` and assume you always get back an Array of rows, pushing the logic of handling multiple (or no) UniProt IDs down a level. (See other comment.)
]
end

def self.row_from_object(gene)
I'd rename to `rows_from_object()` and do something along these lines (haven't tested it, just off the top of my head):
swissprot_names = Array(Scrapers::MyGeneInfo.get_swissprot_name(gene))
formatted_overview = formatted_overview_col(gene)
swissprot_names.map do |swissprot_name|
  if swissprot_name == 'N/A'
    nil
  else
    [gene.name, swissprot_name, formatted_overview]
  end
end.compact
That way you have a list of rows for your TSV, `compact` will remove the `nil`s, and the code that actually writes the TSV can just be a simple iteration over genes, calling this, and then writing a row for each item this returns.
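To illustrate the shape being suggested, here is a minimal sketch in plain Ruby, so it runs without Rails. Everything named here is a hypothetical stand-in: `FakeGene` replaces the `Gene` model, `SWISSPROT_LOOKUP` replaces the call to `Scrapers::MyGeneInfo.get_swissprot_name`, and `write_tsv` replaces the generator loop. The gene and protein names are made up.

```ruby
require "stringio"

# Hypothetical stand-in for the Gene model; only the fields the sketch needs.
FakeGene = Struct.new(:name, :overview)

# Hypothetical stand-in for Scrapers::MyGeneInfo.get_swissprot_name.
# Assumed to yield a String, an Array of Strings, 'N/A', or nil.
SWISSPROT_LOOKUP = {
  "GENE1" => "GENE1_NAME",
  "GENE2" => ["GENE2_NAME_A", "GENE2_NAME_B"],
  "GENE3" => "N/A"
}.freeze

# Always returns an Array of rows (possibly empty), one per usable name.
def rows_from_object(gene, lookup = SWISSPROT_LOOKUP)
  Array(lookup[gene.name])
    .reject { |swissprot_name| swissprot_name == "N/A" }
    .map { |swissprot_name| [gene.name, swissprot_name, gene.overview] }
end

# Writes a header line, then one tab-separated line per row.
def write_tsv(io, headers, genes)
  io.puts(headers.join("\t"))
  genes.each do |gene|
    rows_from_object(gene).each { |row| io.puts(row.join("\t")) }
  end
  io
end

SAMPLE_GENES = [
  FakeGene.new("GENE1", "overview 1"),
  FakeGene.new("GENE2", "overview 2"),
  FakeGene.new("GENE3", "overview 3"),
  FakeGene.new("GENE4", "overview 4") # no lookup entry at all
].freeze

TSV_OUTPUT = write_tsv(StringIO.new, %w[gene swissprot overview], SAMPLE_GENES).string
```

Because `rows_from_object` always returns an Array, the writer needs no `is_a?(Array)` check: a gene with two names yields two rows, and a gene with `'N/A'` or no name yields none.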
"UniprotMapping.tsv"
end

def self.formatted_overview_col(gene)
I'd get the counts in the following ways:
eid_count = EvidenceItem.joins(variant: [:gene]).where("evidence_items.status != 'rejected'").where(variant: {gene: gene}).distinct.count
variant_count = gene.variants.joins(:evidence_items).where("evidence_items.status != 'rejected'").distinct.count
assertion_count = gene.assertions.where("status != 'rejected'").distinct.count
You could also invert the logic and do something like this:
Assertion.joins(:gene).where("status != 'rejected'").where(gene: gene).distinct.count
depending on what's more clear to you.
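To make the intent of these counts concrete, here is a plain-Ruby sketch of the same logic over in-memory records rather than ActiveRecord queries. The structs, helper names, and sample statuses are hypothetical and are not the app's models. The sketch assumes `status` is never nil; in SQL, `status != 'rejected'` also drops rows whose status is NULL.

```ruby
# Hypothetical in-memory stand-ins for the ActiveRecord models.
SketchEvidenceItem = Struct.new(:id, :status)
SketchVariant = Struct.new(:id, :evidence_items)

# Keeps only items whose status is not 'rejected'.
def non_rejected(items)
  items.reject { |item| item.status == "rejected" }
end

# Distinct non-rejected evidence items across all of a gene's variants.
def evidence_item_count(variants)
  non_rejected(variants.flat_map(&:evidence_items)).map(&:id).uniq.count
end

# Variants that have at least one non-rejected evidence item.
def variant_count(variants)
  variants.count { |variant| non_rejected(variant.evidence_items).any? }
end

SKETCH_VARIANTS = [
  SketchVariant.new(1, [SketchEvidenceItem.new(1, "accepted"),
                        SketchEvidenceItem.new(2, "rejected")]),
  SketchVariant.new(2, [SketchEvidenceItem.new(3, "rejected")]),
  SketchVariant.new(3, [SketchEvidenceItem.new(4, "submitted"),
                        SketchEvidenceItem.new(5, "accepted")])
].freeze
```

Here variant 2 is excluded from the variant count because its only evidence item is rejected, which is the behaviour the `joins(:evidence_items)` plus `distinct` query is meant to give.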
Corrected spelling error in generate_tsvs.rb, created tsv generator and presenter, scheduled the task to run monthly, added functions to get uniprot names and created the uniprotmapping tsv. Need help getting the correct counts for variants and evidence items that are non-rejected.