similarity scores only on leaf nodes #19

shawntanzk · 2022-05-17T13:39:11Z

See obophenotype/brain_data_standards_ontologies#281
It seems that the similarity scores are only on leaf nodes. We can represent these scores in the homology relations, but there are also many homology relations that aren't on leaf nodes. Do we have scores for those too?

jeremymiller · 2022-05-17T15:16:59Z

I think (and maybe @raymond-sanchez can confirm?) that these scores came from a confusion matrix, which compares each cell type in one species to each cell type in another. If this is true, then we should be able to calculate scores for the one-few or few-few relationships by summing the scores. If someone can provide me with the raw data file that these numbers were generated from, I can see if I can figure it out.
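To illustrate the summing idea: a minimal sketch, assuming a confusion matrix of raw cell counts, where `confusion[a][b]` is the number of species-A cells of type `a` assigned to species-B type `b`. The function `group_score` and the cell type labels `A1`…`B3` are hypothetical, not taken from the Allen data or nik_script.R. A one-few or few-few score is then the summed block of counts divided by the total number of cells in the species-A group.

```python
from typing import Dict, List

# Hypothetical confusion matrix of counts:
# confusion[a][b] = cells of species-A type `a` assigned to species-B type `b`.
ConfusionMatrix = Dict[str, Dict[str, int]]


def group_score(confusion: ConfusionMatrix, group_a: List[str], group_b: List[str]) -> float:
    """Fraction of cells in group_a (species A) that map into group_b (species B).

    Summing a block of entries only makes sense because the entries are counts;
    distance-based scores could not be combined this way.
    """
    total = sum(sum(confusion[a].values()) for a in group_a)
    if total == 0:
        return 0.0
    hits = sum(confusion[a].get(b, 0) for a in group_a for b in group_b)
    return hits / total


# Toy counts (made up for illustration only).
confusion: ConfusionMatrix = {
    "A1": {"B1": 8, "B2": 2, "B3": 0},
    "A2": {"B1": 1, "B2": 7, "B3": 2},
    "A3": {"B1": 0, "B2": 1, "B3": 9},
}

leaf = group_score(confusion, ["A1"], ["B1"])                 # leaf-to-leaf
few_few = group_score(confusion, ["A1", "A2"], ["B1", "B2"])  # few-to-few
print(leaf, few_few)
```

Here the leaf pair scores 0.8 and the two-by-two group scores 0.9 (18 of 20 cells), showing how an intermediate node's score could be built from leaf-level counts.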

raymond-sanchez · 2022-05-17T16:26:16Z

That's correct - Nik calculated the scores only for leaf nodes within the same class between species (Glut-Glut, GABA-GABA, etc.), but not between subclasses, classes (I think both of these were assumed to essentially correlate 1-to-1) or intermediate nodes. @jeremymiller relevant files below, let me know how I can help!

Scores
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/mouseMOp_CCN202002013/Mouse_CrossSpecies_Similarity.csv
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/humanM1_CCN201912131/Human_CrossSpecies_Similarity.csv
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/marmosetM1_CCN201912132/Marmoset_CrossSpecies_Similarity.csv

Code to generate scores (including directions to relevant raw data files):
https://github.com/AllenInstitute/celltype_cards_contenthub/blob/main/all_code/cross_species_heatmaps/input%20files/nik_script.R

shawntanzk · 2022-05-17T16:29:42Z

I'm a bit confused: how is the homology for nodes like lamp5-like C2, which is an intermediate node but cross-species, calculated or determined if scores are only calculated for leaf nodes?

raymond-sanchez · 2022-05-17T16:37:06Z

I think those must have been determined in a separate analysis from the one I'm pointing to above, which Nik did mainly for cell type cards. Let me do some digging and get back to you.

jeremymiller · 2022-05-17T16:44:23Z

Okay, these are scores based on distance matrices and not a confusion matrix, which means we cannot directly sum values in the way I said above. I don't actually know how we'd calculate similarities this way for the other nodes, if it's even possible. I would suggest removing these values for now (or I suppose you could leave them incomplete as is). Ray: you are correct about it being a separate analysis. There were two strategies used to define cross-species homologies, which is a bit confusing. We might need to bring Trygve into this discussion if this is critical, but for now I'm again going to vote for removing these values.

raymond-sanchez · 2022-05-17T16:55:34Z

Ah sorry, yes I think the original analysis was done with confusion matrices, but this one for cell type cards was Euclidean distances. That sounds good, I'd be okay with removing or leaving incomplete the values that we cannot generate these same scores for.

shawntanzk · 2022-05-17T18:08:01Z

Are the leaf node scores (the CSV files @raymond-sanchez linked in the comment above) safe to use? We would like to include examples of how confidence can be annotated in the ontology for the paper, but we definitely don't want to use anything that isn't accurate.

raymond-sanchez · 2022-05-17T18:17:53Z

I think they're fine to use for this purpose, but Jeremy let me know if you think otherwise. Nik calculated those scores and told me that they were "a clear and more accurate representation of the data" but we could also double check with Trygve if we want to get another look.

jeremymiller · 2022-05-17T18:21:14Z

They are accurate (e.g., higher is better in a quantitative way) and can be used. Moving forward (and maybe as a topic in workshop #4?) we'll want to think about a more general metric for cell type comparisons within and between taxonomies and how those can be used in an ontology.

shawntanzk · 2022-05-19T07:34:13Z

Thanks for all the information, that was super useful.
We have decided not to add any homology scores for now - I think we do want to in the end, and that might involve a discussion with Trygve, but for now we will leave it out as we want to finalise the manuscript.
We will instead add it as a discussion/challenges point (e.g. discussing how we represent confidence, and whether it should be specific to a dataset) and maybe say that it will be included in a future release.
I will keep this ticket open until we figure it out, just so we don't forget about it :) Thanks!

shawntanzk assigned jeremymiller and raymond-sanchez May 17, 2022
