similarity scores only on leaf nodes #19
Comments
I think (and maybe @raymond-sanchez can confirm?) that these scores came from a confusion matrix, which compares each cell type in one species to each cell type in another. If this is true, then we should be able to calculate scores for the one-few or few-few relationships by summing the scores. If someone can provide me with the raw data file that these numbers were generated from, I can see if I can figure it out.
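To illustrate the summing idea for anyone following along, here is a minimal sketch. It assumes the scores really are entries of a leaf-by-leaf confusion matrix (which has not been confirmed at this point in the thread). The function name `aggregate_confusion`, the parent labels, and the counts are all hypothetical and are not taken from the actual score files.

```python
from collections import defaultdict

def aggregate_confusion(conf, row_parent, col_parent):
    """Sum a leaf-by-leaf confusion matrix into parent-by-parent totals.

    conf       -- list of rows; conf[i][j] is the count for leaf i
                  (species 1) versus leaf j (species 2)
    row_parent -- parent node label for each species-1 leaf (hypothetical)
    col_parent -- parent node label for each species-2 leaf (hypothetical)
    """
    totals = defaultdict(float)
    for i, row in enumerate(conf):
        for j, value in enumerate(row):
            totals[(row_parent[i], col_parent[j])] += value
    return dict(totals)

# Made-up counts, for illustration only.
conf = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
row_parent = ["A", "A", "B"]   # two species-1 leaves share parent A
col_parent = ["X", "Y", "Y"]   # two species-2 leaves share parent Y

agg = aggregate_confusion(conf, row_parent, col_parent)
```

Here `agg[("A", "Y")]` collects every leaf-level cell whose row falls under A and whose column falls under Y, which is the few-few case described above. This only makes sense for count-like entries; it would not carry over to distance-based scores.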
That's correct - Nik calculated the scores only for leaf nodes within the same class between species (Glut-Glut, GABA-GABA, etc.), but not between subclasses, classes (I think both of these were assumed to essentially correlate 1-to-1) or intermediate nodes. @jeremymiller relevant files below, let me know how I can help! Scores Code to generate scores (including directions to relevant raw data files):
I think those must have been determined in a separate analysis to the one I'm pointing to above, which Nik did mainly for cell type cards. Let me do some digging and get back to you.
Okay, these are scores based on distance matrices and not a confusion matrix, which means we cannot directly sum values in the way I said above. I don't actually know how we'd calculate similarities this way for the other nodes, if it's even possible. I would suggest removing these values for now (or I suppose you could leave them incomplete as is). Ray: you are correct about it being a separate analysis. There were two strategies used to define cross-species homologies, which is a bit confusing. We might need to bring Trygve into this discussion if this is critical, but again, I'm going to vote for removing these values for now.
Ah sorry, yes I think the original analysis was done with confusion matrices, but this one for cell type cards was Euclidean distances. That sounds good, I'd be okay with removing or leaving incomplete the values that we cannot generate these same scores for.
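For context on why distance-based scores cannot simply be summed up the tree, here is a small toy sketch. It assumes, purely for illustration, that a parent node is represented by the unweighted mean of its leaf centroids; the thread does not say how parent nodes would actually be represented, and the coordinates below are made up.

```python
import math

# Hypothetical 2-D centroids for two sibling leaves in one species
# and a single leaf in the other species.
a1 = (1.0, 0.0)
a2 = (-1.0, 0.0)
b = (0.0, 0.0)

# Leaf-level Euclidean distances, the kind of value a distance matrix holds.
leaf_distances = [math.dist(a1, b), math.dist(a2, b)]
summed = sum(leaf_distances)

# Assumption: the parent of a1 and a2 is the mean of their centroids.
parent = tuple((x + y) / 2 for x, y in zip(a1, a2))
parent_distance = math.dist(parent, b)
```

In this toy case the summed leaf distances come to 2.0 while the parent-level distance is 0.0, so the sum says nothing useful about how close the parent node is. Counts in a confusion matrix add up across merged cells; distances do not.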
Are the leaf node scores (the tsv files @raymond-sanchez mentioned in the comment above) safe to use? We would like to include examples of how we can annotate confidence in the ontology for the paper, but we definitely don't want to use anything that isn't accurate.
I think they're fine to use for this purpose, but Jeremy let me know if you think otherwise. Nik calculated those scores and told me that they were "a clear and more accurate representation of the data" but we could also double check with Trygve if we want to get another look.
They are accurate (e.g., higher is better in a quantitative way) and can be used. Moving forward (and maybe as a topic in workshop #4?) we'll want to think about a more general metric for cell type comparisons within and between taxonomies and how those can be used in an ontology.
Thanks for all the information, that was super useful.
see obophenotype/brain_data_standards_ontologies#281
It seems that the similarity scores are only on leaf nodes. We can represent these scores in the homology relation, but there are also many homology relations that aren't on leaf nodes. Do we have scores for those too?