Create UniProt Mapping file. #691

nairod2000 · 2021-09-02T17:27:53Z

Corrected a spelling error in generate_tsvs.rb, created a TSV generator and presenter, scheduled the task to run monthly, added functions to get UniProt names, and created the UniprotMapping TSV. I need help getting the correct counts for non-rejected variants and evidence items.


acoffman

Made some comments based on what we talked about on the call - let me know if you have any questions!

acoffman · 2021-09-03T20:35:23Z

app/jobs/generate_uniprot_mapping_tsv.rb

+          tmp_file = tmp_file(e.file_name)
+          tmp_file.puts(e.headers.join("\t"))
+
+          e.objects.find_each do |object|


You probably don't need the indirection of calling e.objects here, so you can delete def self.objects in the presenter file. This can probably just be Gene.find_each.

acoffman · 2021-09-03T20:36:53Z

app/jobs/generate_uniprot_mapping_tsv.rb

+          e.objects.find_each do |object|
+
+            row = e.row_from_object(object)
+            if row[1].is_a?(Array)


I might change this to e.rows_from_object(), assume you always get back an Array of rows, and push the logic of handling multiple (or no) UniProt IDs down a level (see other comment).

acoffman · 2021-09-03T20:41:22Z

app/presenters/uniprot_mapping_tsv_presenter.rb

+      ]
+    end 
+
+    def self.row_from_object(gene)


I'd rename to rows_from_object() and do something along these lines (haven't tested it, just off the top of my head):

swissprot_names = Array(Scrapers::MyGeneInfo.get_swissprot_name(gene))
formatted_overview = formatted_overview_col(gene)
swissprot_names.map do |swissprot_name|
  if swissprot_name == 'N/A'
    nil
  else
    [gene.name, swissprot_name, formatted_overview]
  end
end.compact

That way you have a list of rows for your TSV; compact will remove the nils, and the code that actually writes the TSV can just be a simple iteration over genes, calling this and then writing a row for each item it returns.
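A minimal sketch of what the writing side could look like once the presenter always returns a list of rows. It is written to run outside Rails, so everything in it is a stand-in: GeneStub, FakePresenter, write_tsv, and the header names are hypothetical, the swissprot field replaces the real Scrapers::MyGeneInfo lookup, and a StringIO replaces the temp file. In the app, the loop would presumably be over Gene.find_each.

```ruby
require 'stringio'

# Hypothetical stand-in for the Gene model so the sketch runs outside Rails.
# `swissprot` may be nil, a single String, or an Array of Strings.
GeneStub = Struct.new(:name, :swissprot)

# Hypothetical presenter with the suggested rows_from_object shape.
module FakePresenter
  # Placeholder header names; the real presenter defines its own.
  def self.headers
    %w[gene swissprot_name overview]
  end

  # Always returns an Array of rows (possibly empty).
  def self.rows_from_object(gene)
    # Array() normalizes nil, a single value, or an array into an array.
    Array(gene.swissprot).map do |swissprot_name|
      next if swissprot_name == 'N/A' # becomes nil, dropped by compact

      [gene.name, swissprot_name, "overview for #{gene.name}"]
    end.compact
  end
end

# The writer no longer needs to know whether a gene has zero, one,
# or many UniProt names: it writes one line per returned row.
def write_tsv(io, presenter, genes)
  io.puts(presenter.headers.join("\t"))
  genes.each do |gene|
    presenter.rows_from_object(gene).each do |row|
      io.puts(row.join("\t"))
    end
  end
end

genes = [
  GeneStub.new('GENE_A', %w[A_HUMAN A2_HUMAN]), # two names, two rows
  GeneStub.new('GENE_B', 'N/A'),                # skipped
  GeneStub.new('GENE_C', nil)                   # skipped
]
out = StringIO.new
write_tsv(out, FakePresenter, genes)
puts out.string
```

With this shape, the is_a?(Array) check in the job is no longer needed, because the zero, one, and many cases are all handled inside rows_from_object.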

acoffman · 2021-09-03T20:46:06Z

app/presenters/uniprot_mapping_tsv_presenter.rb

+      "UniprotMapping.tsv"
+    end 
+
+    def self.formatted_overview_col(gene)


I'd get the counts in the following ways:

eid_count = EvidenceItem.joins(variant: [:gene]).where("evidence_items.status != 'rejected'").where(variant: {gene: gene}).distinct.count
variant_count = gene.variants.joins(:evidence_items).where("evidence_items.status != 'rejected'").distinct.count
assertion_count = gene.assertions.where("status != 'rejected'").distinct.count

You could also invert the logic and do something like this:

Assertion.joins(:gene).where("status != 'rejected'").where(gene: g).distinct.count

depending on what's more clear to you.
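To illustrate why the distinct matters for the variant count, here is a small in-memory analogue in plain Ruby. It is not ActiveRecord and not the app's code: EvidenceStub, the helper methods, and the sample statuses are hypothetical, and uniq stands in for SQL DISTINCT. Joining variants to evidence items yields one row per (variant, evidence item) pair, so a variant with two non-rejected evidence items would otherwise be counted twice.

```ruby
# Hypothetical in-memory stand-in for evidence items; not ActiveRecord.
EvidenceStub = Struct.new(:id, :variant_id, :status)

# Mirrors the "status != 'rejected'" filter.
def non_rejected(evidence_items)
  evidence_items.reject { |e| e.status == 'rejected' }
end

# One entry per non-rejected evidence item.
def evidence_count(evidence_items)
  non_rejected(evidence_items).map(&:id).uniq.size
end

# Variant ids repeat once per joined evidence item; uniq plays the
# role of DISTINCT so each variant is counted once.
def variant_count(evidence_items)
  non_rejected(evidence_items).map(&:variant_id).uniq.size
end

items = [
  EvidenceStub.new(1, 10, 'accepted'),
  EvidenceStub.new(2, 10, 'submitted'), # same variant as item 1
  EvidenceStub.new(3, 11, 'rejected'),  # variant 11 has only rejected evidence
  EvidenceStub.new(4, 12, 'accepted')
]

puts evidence_count(items) # 3
puts variant_count(items)  # 2 (would be 3 without uniq)
```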

corrected spelling error in generate_tsvs.rb, created tsv generator a…

176ecf2

…nd presentor, scheduled the task to run monthly, added functions to get uniprot names and created the uniprotmapping tsv.

acoffman requested changes Sep 3, 2021


nairod2000 added 3 commits September 9, 2021 21:02

testing

471f8d7

Implemented sugestions in pull request

3cb3393

fixed database.yml file

2c4e371
