P2 02. Final model for the LINCS dataset (batch 1) #13
Comments
As discussed during yesterday's check-in, I have computed Figure 4D as in the LINCS manuscript. I only have dose points 3.33 and 10 available. In general, we see that:
|
Interpretability analysis rerun for LINCS data
From plate SQ00015142 I inspected images from well B13, which is 10 uM sulfafurazole, and computed the same saliencies as before for the Stain data. I chose the well at random and the plate because of its large file size. I tried inspecting SQ00015106 before, but the seeding was so sparse that picking the top and bottom saliency cells resulted in only a handful of cells in total. The seeding generally seems to be less dense than in the Stain experiments.
Main takeaways
No conclusion can be drawn from these results because the high and low saliency cells are not consistent in their appearance. |
Interpretability analysis rerun for LINCS data
From plate SQ00015131 I inspected images from well E13, which is 10 uM ganetespib and has HSP inhibitor as its MoA, and computed the same saliencies as before for the Stain data. This is the MoA whose results improved the most, in relative terms, at both the 3.33 and 10 uM dose points when using model profiling versus average profiling.
Main takeaways
I think we can now see that the green-outlined cells tend to be brighter and have stronger contrast in general than the red-outlined cells. We also see that (again) features that calculate the correlation between different channels are the most important for deciding which cells are the most or least important. IIUC, that means that cells which are very flat are not important and cells that are 'fat' in the depth dimension are more important. Then the question is: does it make sense that flat cells are less representative of the compound than fat cells? |
@EchteRobert Do you happen to know which version of CellProfiler your features were made in, 3.X or 4.X? I don't know whether this is fatal to your analysis, but while putting 4.0 together we realized that the Costes features in CP3.X are improperly calculated. |
Ah, that's interesting @bethac07. According to the LINCS manuscript, it was version 2.3.1 so I'm guessing they were improperly calculated there as well. I'm wondering what the model is picking up then... Do you know how they are calculated exactly then? |
My level of understanding from memory (which I cannot stress enough may be wrong) and a bit of digging is this: Costes measurements are a special case of the Manders coefficient (which involves looking at which parts of an image are above threshold in each of 2 channels), where in Costes that threshold is defined in a particular way. In at least CellProfiler 3, but possibly/probably also 2.3, there was an assumption that the maximum gray level (numerical value) was 255, which is true in 8 bit images but wrong in 16 bit images (which these are), where the maximum is 65535. So the threshold was basically always being set to 255, which most of the image is brighter than, so the calculated correlation coefficients were nearly always 1. So basically, I think it was measuring "pixels brighter than 255"? |
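To make the effect described above concrete, here is a toy sketch of a Manders-style coefficient evaluated with a threshold stuck at 255 on 16 bit data. It is not CellProfiler's code: the function `manders_m1`, the random pixel values, and the use of the median as a "sensible" threshold are all assumptions made for illustration, and the real Costes method picks its threshold differently.

```python
import random

def manders_m1(ch1, ch2, threshold2):
    """Fraction of channel-1 intensity found where channel 2 exceeds its threshold."""
    total = sum(ch1)
    if total == 0:
        return 0.0
    above = sum(a for a, b in zip(ch1, ch2) if b > threshold2)
    return above / total

random.seed(0)
n = 10_000
# Toy 16 bit "images" flattened to pixel lists; every pixel is at least 500,
# so every pixel is brighter than the 8 bit maximum of 255.
ch1 = [random.randint(500, 65535) for _ in range(n)]
ch2 = [random.randint(500, 65535) for _ in range(n)]

# Hypothetical buggy case: threshold capped at the 8 bit maximum.
capped = manders_m1(ch1, ch2, threshold2=255)

# Hypothetical alternative for comparison: the channel median as threshold.
median2 = sorted(ch2)[n // 2]
sensible = manders_m1(ch1, ch2, threshold2=median2)

print(f"threshold=255: {capped:.3f}, threshold=median: {sensible:.3f}")
```

With the capped threshold every pixel counts as "positive", so the coefficient saturates at 1 regardless of how the two channels relate, which matches the near-constant feature values reported in this thread.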
Great catch Beth! I checked out the values of those Costes Correlation features and they are indeed all equal (or almost equal) to 1. To get these features I just looked at which features resulted in the highest saliency values (absolute). I think there are two possible explanations as to why these features popped up as 'most salient':
Instead, I now calculated the correlation between saliency and feature values (something I also did before) and that points to different features, which I hope do have some actual meaning 😄 . Below are the results for this particular well for the different saliency scores.
Main takeaways
Combined saliency score
L1 norm activation saliency score
Gradient analysis score
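The correlation-based ranking mentioned in this comment could look roughly like the sketch below. This is an assumption about the approach, not the code actually used: the helper names, feature names, and per-cell numbers are invented, and Pearson correlation is assumed as the correlation measure. Constant features (such as a feature stuck at 1) are skipped because their correlation is undefined.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def rank_features_by_correlation(saliency, features):
    """Sort (name, r) pairs by |r| with per-cell saliency, strongest first."""
    scored = []
    for name, values in features.items():
        r = pearson(saliency, values)
        if r is not None:  # constant features carry no correlation signal
            scored.append((name, r))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical per-cell values for one well (names and numbers are made up).
saliency = [0.1, 0.4, 0.5, 0.9, 1.0]
features = {
    "Feature_A": [1.0, 2.0, 2.5, 4.0, 4.5],         # rises with saliency
    "Feature_B": [5.0, 4.0, 4.2, 1.0, 0.5],         # falls with saliency
    "Feature_Constant": [1.0, 1.0, 1.0, 1.0, 1.0],  # saturated feature
}
ranking = rank_features_by_correlation(saliency, features)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```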
|
Does this change the cells that would be green and red then? Seems like
yes, happy to take another look.
By the way, for the colorblind you will eventually want to change to
another color scheme. The wiki has a section that can help.
|
It could change them yes - and I think they did (but I don't have a lot of experience with analyzing cells by eye)
Yes, I will change that! |
Model trained on 3.33 uM dose point.
3.33 uM
10 uM
|
Here I trained a model on all data available from batch 1 in the LINCS dataset, which can be found like this:
aws s3 ls s3://cellpainting-gallery/cpg0004-lincs/broad/workspace/backend/2016_04_01_a549_48hr_batch1/
The model uses 1745 features, because of an issue with 10 plates (broadinstitute/lincs-cell-painting#88 (comment)). In total, I trained the model on 136 plates, 5965 wells, including 1228 unique compounds using the 10 uM dose point. During preprocessing I removed 1587 wells due to missing MoA or compound name (pert_iname) annotation. I used the following hyperparameters:
I assess the model on the 10 uM dose point using replicate and MoA prediction and similarly on the 3.33 uM dose, which is considered the test set.
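For readers unfamiliar with the mAP scores reported below, here is a minimal sketch of mean average precision over ranked neighbour lists. It assumes each query profile has its neighbours sorted by similarity and flagged as a match (same compound for replicate prediction, same MoA for MoA prediction); the function names and the toy lists are hypothetical, and the exact evaluation code used in this issue may differ.

```python
def average_precision(ranked_labels):
    """AP for one query; ranked_labels[i] is True if the i-th neighbour is a match."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_labels, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two made-up queries, neighbours ordered from most to least similar.
queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2
    [False, True, False, False],  # AP = 1/2
]
map_score = mean_average_precision(queries)
print(f"mAP = {map_score:.4f}")
```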
Results
Results 10 uM dose point
Replicate prediction
Welch's t-test between mlp mAP and bm mAP: Ttest_indResult(statistic=84.81208433212997, pvalue=0.0)
MoA prediction
Welch's t-test between mlp mAP and bm mAP: Ttest_indResult(statistic=6.753694914168434, pvalue=1.5518902810751288e-11)
Results 3.33 uM dose point
Replicate prediction
Welch's t-test between mlp mAP and bm mAP: Ttest_indResult(statistic=49.02599189522616, pvalue=0.0)
MoA prediction
Welch's t-test between mlp mAP and bm mAP: Ttest_indResult(statistic=3.525483296865904, pvalue=0.0004250301209859708)
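The `Ttest_indResult(...)` outputs above have the shape of SciPy's `ttest_ind` result, presumably called with `equal_var=False`, although that is an inference from the printed output. As a dependency-free sketch of what the Welch statistic measures, the snippet below computes the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The AP values are invented, and the p-value is omitted because it needs a t-distribution CDF.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard errors of each sample mean (unequal variances allowed).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up per-compound AP values for the model (mlp) and the baseline (bm).
mlp_ap = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
bm_ap = [0.41, 0.52, 0.38, 0.60, 0.47]
t_stat, dof = welch_t(mlp_ap, bm_ap)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A positive statistic here means the first sample (mlp) has the higher mean, which is the direction of all four results reported above.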
Loss curves
All plate names
SQ00014812_SQ00014813_SQ00014814_SQ00014815_SQ00014816_SQ00014817_SQ00014818_SQ00014819_SQ00014820_SQ00015041_SQ00015042_SQ00015043_SQ00015044_SQ00015045_SQ00015046_SQ00015047_SQ00015048_SQ00015049_SQ00015050_SQ00015051_SQ00015052_SQ00015053_SQ00015054_SQ00015055_SQ00015056_SQ00015057_SQ00015058_SQ00015059_SQ00015096_SQ00015097_SQ00015098_SQ00015099_SQ00015100_SQ00015101_SQ00015102_SQ00015103_SQ00015105_SQ00015106_SQ00015107_SQ00015108_SQ00015109_SQ00015110_SQ00015111_SQ00015112_SQ00015116_SQ00015117_SQ00015118_SQ00015119_SQ00015120_SQ00015121_SQ00015122_SQ00015123_SQ00015124_SQ00015125_SQ00015126_SQ00015127_SQ00015128_SQ00015129_SQ00015130_SQ00015131_SQ00015132_SQ00015133_SQ00015134_SQ00015135_SQ00015136_SQ00015137_SQ00015138_SQ00015139_SQ00015140_SQ00015141_SQ00015142_SQ00015143_SQ00015144_SQ00015145_SQ00015146_SQ00015147_SQ00015148_SQ00015149_SQ00015150_SQ00015151_SQ00015152_SQ00015153_SQ00015154_SQ00015155_SQ00015156_SQ00015157_SQ00015158_SQ00015159_SQ00015160_SQ00015162_SQ00015163_SQ00015164_SQ00015165_SQ00015166_SQ00015167_SQ00015168_SQ00015169_SQ00015170_SQ00015171_SQ00015172_SQ00015173_SQ00015194_SQ00015195_SQ00015196_SQ00015197_SQ00015198_SQ00015199_SQ00015200_SQ00015201_SQ00015202_SQ00015203_SQ00015204_SQ00015205_SQ00015206_SQ00015207_SQ00015208_SQ00015209_SQ00015210_SQ00015211_SQ00015212_SQ00015214_SQ00015215_SQ00015216_SQ00015217_SQ00015218_SQ00015219_SQ00015220_SQ00015221_SQ00015222_SQ00015223_SQ00015224_SQ00015229_SQ00015230_SQ00015231_SQ00015232_SQ00015233