similarity scores only on leaf nodes #19

Open
shawntanzk opened this issue May 17, 2022 · 10 comments

@shawntanzk
Collaborator

see obophenotype/brain_data_standards_ontologies#281
It seems that the similarity scores are only on leaf nodes. We can represent these scores in the homology relation, but there are also many homology relations that aren't on leaf nodes. Do we have scores for those too?

@jeremymiller
Collaborator

I think (and maybe @raymond-sanchez can confirm?) that these scores came from a confusion matrix, which compares each cell type in one species to each cell type in another. If this is true, then we should be able to calculate scores for the one-few or few-few relationships by summing the scores. If someone can provide me with the raw data file that these numbers were generated from, I can see if I can figure it out.
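
For illustration only, here is a minimal sketch of what that summing could look like, assuming the scores really are confusion-matrix counts (cells of one species' leaf type assigned to another species' leaf type). The cell type names, the counts, and the `group_overlap` helper are all hypothetical and are not taken from the actual data files or script.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy data: counts[(a, b)] = number of cells of species-A leaf
# type `a` that were assigned to species-B leaf type `b`. Names and numbers
# are made up for this sketch.
counts: Dict[Tuple[str, str], int] = {
    ("A_Lamp5_1", "B_Lamp5_1"): 80,
    ("A_Lamp5_1", "B_Lamp5_2"): 15,
    ("A_Lamp5_1", "B_Sst_1"): 5,
    ("A_Lamp5_2", "B_Lamp5_1"): 10,
    ("A_Lamp5_2", "B_Lamp5_2"): 85,
    ("A_Lamp5_2", "B_Sst_1"): 5,
}


def group_overlap(
    counts: Dict[Tuple[str, str], int],
    group_a: Iterable[str],
    group_b: Iterable[str],
) -> float:
    """Fraction of cells from the leaf types in group_a that land in group_b.

    The numerator is the sum of the confusion-matrix block group_a x group_b;
    the denominator is the total number of cells in the group_a rows.
    """
    set_a, set_b = set(group_a), set(group_b)
    in_block = sum(n for (a, b), n in counts.items() if a in set_a and b in set_b)
    row_total = sum(n for (a, _), n in counts.items() if a in set_a)
    return in_block / row_total if row_total else 0.0


# One-to-few: one species-A leaf against two species-B leaves.
one_to_few = group_overlap(counts, ["A_Lamp5_1"], ["B_Lamp5_1", "B_Lamp5_2"])
# Few-to-few: an intermediate node on each side, given by its leaves.
few_to_few = group_overlap(
    counts, ["A_Lamp5_1", "A_Lamp5_2"], ["B_Lamp5_1", "B_Lamp5_2"]
)
print(one_to_few, few_to_few)
```

This only makes sense for count-like (or probability-like) entries, where adding cells across leaves is meaningful.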

@raymond-sanchez
Collaborator

That's correct - Nik calculated the scores only for leaf nodes within the same class between species (Glut-Glut, GABA-GABA, etc.), but not between subclasses, classes (I think both of these were assumed to essentially correlate 1-to-1) or intermediate nodes. @jeremymiller relevant files below, let me know how I can help!

Scores
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/mouseMOp_CCN202002013/Mouse_CrossSpecies_Similarity.csv
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/humanM1_CCN201912131/Human_CrossSpecies_Similarity.csv
https://raw.githubusercontent.com/AllenInstitute/MOp_taxonomies_ontology/main/marmosetM1_CCN201912132/Marmoset_CrossSpecies_Similarity.csv

Code to generate scores (including directions to relevant raw data files):
https://github.com/AllenInstitute/celltype_cards_contenthub/blob/main/all_code/cross_species_heatmaps/input%20files/nik_script.R

@shawntanzk
Collaborator Author

I'm a bit confused: how is homology for something like lamp5-like C2, which is an intermediate node but cross-species, calculated/determined to be a homology node if scores are only calculated for leaf nodes?
[Screenshot attached: 2022-05-17 at 17:28:57]

@raymond-sanchez
Collaborator

I think those must have been determined in a separate analysis to the one I'm pointing to above, which Nik did mainly for cell type cards. Let me do some digging and get back to you

@jeremymiller
Collaborator

Okay, these are scores based on distance matrices and not a confusion matrix, which means we cannot directly sum values in the way I said above. I don't actually know how we'd calculate similarities this way for the other nodes, if it's even possible. I would suggest removing these values for now (or I suppose you could leave them incomplete as is). Ray: you are correct about it being a separate analysis. There were two strategies used to define cross-species homologies, which is a bit confusing. We might need to bring Trygve into this discussion if this is critical, but for now I'm again going to vote for removing these values.

@raymond-sanchez
Collaborator

Ah sorry, yes I think the original analysis was done with confusion matrices, but this one for cell type cards was Euclidean distances. That sounds good, I'd be okay with removing or leaving incomplete the values that we cannot generate these same scores for.
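
To make the point about summing concrete, here is a toy sketch of why distance-based scores for leaf nodes do not add up to a score for a parent node. It assumes, purely for illustration, that each cell type is summarised by a centroid expression vector and that a parent node's centroid is the unweighted mean of its leaves; the vectors and helper functions are made up and may not match what the actual script does.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of equal-length vectors (an assumption of this sketch)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Made-up two-gene expression centroids for two leaves in one species
# and a single leaf in another species.
human_leaf_1 = [1.0, 0.0]
human_leaf_2 = [0.0, 1.0]
mouse_leaf = [1.0, 1.0]

d1 = euclidean(human_leaf_1, mouse_leaf)
d2 = euclidean(human_leaf_2, mouse_leaf)

# Distance from the parent of the two human leaves to the mouse leaf.
parent = centroid([human_leaf_1, human_leaf_2])
d_parent = euclidean(parent, mouse_leaf)

print(d1 + d2)   # 2.0
print(d_parent)  # about 0.707, not the sum of the leaf distances
```

In this toy case the parent-level distance has to be recomputed from the underlying vectors; it cannot be recovered by summing (or here, even averaging) the leaf-level distances.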

@shawntanzk
Collaborator Author

Are the leaf node scores (the CSV files @raymond-sanchez linked in the comment above) safe to use? We would like to include examples of how we can annotate confidence in the ontology for the paper, but we definitely don't want to use anything that isn't accurate.

@raymond-sanchez
Collaborator

I think they're fine to use for this purpose, but Jeremy let me know if you think otherwise. Nik calculated those scores and told me that they were "a clear and more accurate representation of the data" but we could also double check with Trygve if we want to get another look.

@jeremymiller
Collaborator

They are accurate (e.g., higher is better in a quantitative way) and can be used. Moving forward (and maybe as a topic in workshop #4?) we'll want to think about a more general metric for cell type comparisons within and between taxonomies and how those can be used in an ontology.

@shawntanzk
Collaborator Author

Thanks for all the information, that was super useful.
We have decided not to add any homology scores for now. I think we do want to in the end, and that might involve a discussion with Trygve, but for now we will leave them out as we want to finalise the manuscript.
We will instead add this as a discussion/challenges point (e.g., discussing how we represent confidence and whether it should be specific to a dataset) and maybe say that it will be included in a future release.
I will keep this ticket open till we figure it out just so we don't forget about it :) thanks!

3 participants