
Create UniProt Mapping file. #691

Open: wants to merge 4 commits into base: staging


nairod2000

Corrected a spelling error in generate_tsvs.rb, created the TSV generator and presenter, scheduled the task to run monthly, added functions to get UniProt names, and created the UniprotMapping.tsv file. I need help getting the correct counts for non-rejected variants and evidence items.

acoffman (Member) left a comment


Made some comments based on what we talked about on the call. Let me know if you have any questions!

tmp_file = tmp_file(e.file_name)
tmp_file.puts(e.headers.join("\t"))

e.objects.find_each do |object|

You probably don't need the indirection of calling e.objects here, and you can delete def self.objects from the presenter file. This can probably just be Gene.find_each.
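A minimal sketch of the loop without the e.objects indirection. Gene below is a hypothetical in-memory stand-in so the example runs without a database (the real ActiveRecord Gene.find_each loads records in batches, 1,000 by default); UniprotMappingPresenter, write_tsv, and the single-column row are illustrative assumptions, not code from this PR.

```ruby
require "stringio"

# Hypothetical stand-in for the ActiveRecord Gene model.
GeneRecord = Struct.new(:name)

class Gene
  RECORDS = [GeneRecord.new("GENE1"), GeneRecord.new("GENE2")].freeze

  # Mimics the shape of ActiveRecord's find_each: yields one record at a time.
  def self.find_each(&block)
    RECORDS.each(&block)
  end
end

# Hypothetical presenter: note there is no `def self.objects` any more.
module UniprotMappingPresenter
  def self.headers
    %w[gene_name]
  end

  def self.row_from_object(gene)
    [gene.name]
  end
end

# The generator iterates Gene directly instead of going through e.objects.
def write_tsv(io, presenter)
  io.puts(presenter.headers.join("\t"))
  Gene.find_each do |gene|
    io.puts(presenter.row_from_object(gene).join("\t"))
  end
end
```

The presenter keeps only what it is actually responsible for (headers and row formatting); which records to export stays visible in the generator.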

e.objects.find_each do |object|

row = e.row_from_object(object)
if row[1].is_a?(Array)

I might change this to e.rows_from_object(), always expect an Array of rows back, and push the logic for handling multiple (or no) UniProt IDs down a level (see the other comment).
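A sketch of that contract, assuming a hypothetical presenter whose rows_from_object always returns an Array of rows, possibly empty. The writer no longer inspects row[1]: a gene with two names simply yields two rows, and a gene with none yields nothing. MappingPresenter, its NAMES lookup table (placeholder data standing in for the UniProt name lookup), and write_rows are illustrative, not code from the PR.

```ruby
require "stringio"

# Hypothetical presenter: always returns an Array of rows, never a bare row.
module MappingPresenter
  # Placeholder data standing in for the real UniProt name lookup.
  NAMES = {
    "GENE1" => ["PROT1_HUMAN"],
    "GENE2" => ["PROT2A_HUMAN", "PROT2B_HUMAN"],
    "GENE3" => []
  }.freeze

  def self.headers
    %w[gene uniprot_name]
  end

  # Zero, one, or many rows per gene; callers never special-case the count.
  def self.rows_from_object(gene_name)
    NAMES.fetch(gene_name, []).map { |uniprot| [gene_name, uniprot] }
  end
end

# The writer just iterates: no is_a?(Array) check on individual columns.
def write_rows(io, presenter, objects)
  io.puts(presenter.headers.join("\t"))
  objects.each do |object|
    presenter.rows_from_object(object).each { |row| io.puts(row.join("\t")) }
  end
end
```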

]
end

def self.row_from_object(gene)

I'd rename this to rows_from_object() and do something along these lines (untested, just off the top of my head):

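def self.rows_from_object(gene)
  # Array() turns a single name (or nil) into an Array, so one code path handles every gene.
  swissprot_names = Array(Scrapers::MyGeneInfo.get_swissprot_name(gene))
  formatted_overview = formatted_overview_col(gene)
  # One row per SwissProt name; 'N/A' placeholders map to nil and compact drops them.
  swissprot_names.map do |swissprot_name|
    if swissprot_name == 'N/A'
      nil
    else
      [gene.name, swissprot_name, formatted_overview]
    end
  end.compact
end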

That way you get a list of rows for your TSV: compact removes the nils, and the code that actually writes the TSV can be a simple iteration over genes that calls this method and writes one row for each item it returns.
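On Ruby 2.7 or later, Enumerable#filter_map folds the map-then-compact step into a single pass. The sketch below uses a hypothetical stub (fake_swissprot_name) in place of Scrapers::MyGeneInfo.get_swissprot_name; the real method's return shape isn't shown in this thread, so it is assumed here to return a String, an Array of Strings, 'N/A', or nil. The overview column is a placeholder string.

```ruby
# Hypothetical stub data standing in for Scrapers::MyGeneInfo.get_swissprot_name.
FAKE_SWISSPROT = {
  "GENE1" => "PROT1_HUMAN",
  "GENE2" => ["PROT2A_HUMAN", "PROT2B_HUMAN"],
  "GENE3" => "N/A"
}.freeze

def fake_swissprot_name(gene_name)
  FAKE_SWISSPROT[gene_name] # nil for unknown genes
end

# Same rows as the map/compact version: `unless` yields nil for 'N/A',
# and filter_map keeps only the truthy results.
def rows_for(gene_name, overview)
  Array(fake_swissprot_name(gene_name)).filter_map do |swissprot_name|
    [gene_name, swissprot_name, overview] unless swissprot_name == "N/A"
  end
end
```

For example, rows_for("GENE2", "overview") returns two rows, while "GENE3" (only 'N/A') and an unknown gene (nil, which Array() turns into []) both return an empty Array. If the app targets an older Ruby, map plus compact is the equivalent.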

"UniprotMapping.tsv"
end

def self.formatted_overview_col(gene)

I'd get the counts in the following ways:

eid_count = EvidenceItem.joins(variant: [:gene]).where("evidence_items.status != 'rejected'").where(variant: {gene: gene}).distinct.count
variant_count = gene.variants.joins(:evidence_items).where("evidence_items.status != 'rejected'").distinct.count
assertion_count = gene.assertions.where("status != 'rejected'").distinct.count

You could also invert the logic and do something like this:

Assertion.joins(:gene).where("assertions.status != 'rejected'").where(gene: gene).distinct.count

depending on which is clearer to you.
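To pin down what each query is meant to count, here is a plain-Ruby sketch over hypothetical in-memory records (Struct stand-ins, not the app's ActiveRecord models; the "accepted" and "submitted" statuses are illustrative). It counts evidence items that are not rejected, variants with at least one non-rejected evidence item, and assertions that are not rejected.

```ruby
# Hypothetical in-memory stand-ins for the ActiveRecord models.
EvidenceItem = Struct.new(:id, :status)
Variant = Struct.new(:id, :evidence_items)
Assertion = Struct.new(:id, :status)

def not_rejected?(record)
  record.status != "rejected"
end

# Counts for a single gene, given that gene's variants and assertions.
def gene_counts(variants, assertions)
  # Non-rejected evidence items across all of the gene's variants, de-duplicated.
  live_eids = variants.flat_map(&:evidence_items).select { |e| not_rejected?(e) }.uniq(&:id)
  # A variant counts if at least one of its evidence items is not rejected,
  # which is what joins(:evidence_items) + where + distinct expresses in SQL.
  live_variants = variants.select { |v| v.evidence_items.any? { |e| not_rejected?(e) } }
  live_assertions = assertions.select { |a| not_rejected?(a) }.uniq(&:id)
  { evidence_items: live_eids.size, variants: live_variants.size, assertions: live_assertions.size }
end
```

One SQL detail worth checking against the schema: status != 'rejected' is not true for rows whose status is NULL, so the database queries would skip those rows, whereas this Ruby version would count them.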


2 participants