Hello,
I'm trying to run some parts of your benchmark and I have some questions about your code and some of the choices you made.
First, I have a question about the MultiCCA run:
It seems that only the first omic dataset is used to generate sample.rep, by reducing it with the canonical variates found for this dataset. sample.rep is then used for the clustering. Why did you choose the first omic? Could we consider using another dataset? For example:
sample.rep = omics.transposed[[2]] %*% cca.ret$ws[[2]]
What would be the consequences for the results?
Second, in the same MultiCCA run, the silhouette values of the clusters are computed to choose coherent clusters:
I don't understand the last line of this code: why did you choose the minimum average silhouette width? I thought that the higher the silhouette value, the better the clustering. Shouldn't it be
which.max(sils)
instead?
Finally, my last question is about the choice of removing some tissues from the datasets:
filter.non.tumor.samples <- function(raw.datum, only.primary=only.primary) {
  # 01 is primary, 06 is metastatic, 03 is blood derived cancer
  if (!only.primary)
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01', '03', '06')])
  else
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01')])
}
Why did you choose to select only primary tumors for some cancers and discard other sample types such as metastatic or recurrent tumors? Would it be coherent to discard only the "normal" samples and keep the sample-type information (by not running the fix.patient.names function), so that the clustering also takes this information into account?
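To make this last question concrete, here is a minimal sketch of how I understand the filter. The barcodes and the toy matrix are made up for illustration and are not taken from the benchmark data. The sketch assumes the usual TCGA barcode convention, in which characters 14-15 hold the sample-type code, codes below 10 denote tumor samples, and codes 10-19 denote normal samples:

```r
# Hypothetical TCGA-style column names (not real samples).
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0000-07",  # 01: primary tumor
  "TCGA-AA-0001-11A-11R-0000-07",  # 11: solid tissue normal
  "TCGA-AA-0002-06A-11R-0000-07",  # 06: metastatic
  "TCGA-AA-0003-03A-11R-0000-07",  # 03: blood derived cancer
  "TCGA-AA-0004-02A-11R-0000-07"   # 02: recurrent tumor
)

# Toy feature-by-sample matrix standing in for one omic.
raw.datum <- matrix(seq_len(2 * length(barcodes)), nrow = 2,
                    dimnames = list(c("gene1", "gene2"), barcodes))

# Sample-type code read from characters 14-15 of each column name,
# as in filter.non.tumor.samples.
sample.type <- substring(colnames(raw.datum), 14, 15)

# The two selections made by the filter.
only.primary.kept <- raw.datum[, sample.type %in% c("01"), drop = FALSE]
all.tumor.kept <- raw.datum[, sample.type %in% c("01", "03", "06"), drop = FALSE]

# The alternative I am asking about (an assumption, not the benchmark's code):
# drop only normal samples and keep the sample type as an annotation.
is.tumor <- as.integer(sample.type) < 10
non.normal <- raw.datum[, is.tumor, drop = FALSE]
non.normal.types <- sample.type[is.tumor]
```

With these toy names, the recurrent sample (02) is dropped by both branches of the filter, whereas the last selection keeps it together with its sample-type label.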
I hope my questions are clear,
Thank you in advance!
Galadriel