diff --git a/README.md b/README.md index 92a7400..81b1af6 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ Dispersion=var(PMBC_perc_filtered,0,2)./mean(PMBC_perc_filtered,2); [P I]=sort(Dispersion,'descend'); ``` -### Supervised Clustering and visualization +### Supervised manual hierarchical clustering and visualization PMBCClg=clustergram(PMBC_perc_filtered(I(1:2000),:),'Standardize',0,'DisplayRange',7.5,'Colormap',colormap(jet),'RowPDist','correlation','ColumnPDist','spearman','Symmetric',false,'linkage','complete', 'OptimalLeafOrder',false) @@ -75,11 +75,12 @@ And then import this list to an array ``` GenesOfInterest=table2array(readtable('test.dat','ReadVariableNames',false,'Delimiter','\t')) ``` -Then plot into a HeatMap matching the cell order of your clustergram +Then plot a HeatMap matching the cell order of your clustergram ``` HeatMap(PMBC_perc(GenesOfInterest,get(PMBCClg,'ColumnLabels')),'Standardize',2,'Colormap',colormap(jet),'Symmetric',true,'DisplayRange',2.5) ``` -screen shot 2018-09-28 at 4 58 52 pm +screen shot 2018-10-05 at 6 12 10 pm + *If you only know the gene name (e.g. HOPX) but not the gene ID, you can look it up using this:* ``` @@ -93,32 +94,37 @@ get(PMBCdm(strcontain('HOPX',get(PMBCdm,'RowNames')),:),'RowNames') Before we do this, let's first drop down our manually assigned cell clusters from clustering analysis. Based on the coloring in the example image, the clusters are: ``` -ClusterID=[repmat([1],1,451) repmat([0],1,44) repmat([2],1,137) repmat([3],1,23) repmat([0],1,664) repmat([4],1,119) repmat([0],1,20) repmat([5],1,313) repmat([0],1,195) repmat([6],1,54) repmat([7],1,202) repmat([8],1,244) repmat([0],1,15)]; +ClusterID=repmat([0],size(get(PMBCClg,'ColumnLabels'))); +ClusterID(GetOrder(get(Cluster1,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=1; +ClusterID(GetOrder(get(Cluster2,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=2; +ClusterID(GetOrder(get(Cluster3,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=3; +ClusterID(GetOrder(get(Cluster4,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=4; +ClusterID(GetOrder(get(Cluster5,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=5; +ClusterID(GetOrder(get(Cluster6,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=6; +ClusterID(GetOrder(get(Cluster7,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=7; +ClusterID(GetOrder(get(Cluster8,'ColumnLabels'),get(PMBCClg,'ColumnLabels')))=8; ``` -Now inspect top 30 PCs for the normalized dataset (you can use absolute values, if you remove the function zscore()) +Now inspect top 30 PCs for the normalized dataset (you can use normalized values, if you use the function [zscore()](https://www.mathworks.com/help/stats/zscore.html?searchHighlight=zscore&s_tid=doc_srchtitle)). ``` -PCAPlot(zscore(double(PMBC_perc_filtered(I(1:500),get(PMBCClg,'ColumnLabels')))')',get(PMBCClg,'ColumnLabels'),get(PMBC_perc_filtered(I(1:500),:),'RowNames'),ClusterID,100,30); +[COEFF,SCORE,latent]=PCAPlot(double(PMBC_perc_filtered(I(1:2000),get(PMBCClg,'ColumnLabels'))),get(PMBCClg,'ColumnLabels'),get(PMBC_perc_filtered(I(1:2000),:),'RowNames'),ClusterID',100,30); ``` Among the outputs you get an image of PC scores, where each row represents each PC's scores across individual cells, ordered in the same way as the clustergram you generated. -screen shot 2018-10-02 at 10 57 15 pm +screen shot 2018-10-05 at 6 41 52 pm -Based on this diagram, we see that only the top 9 PCs show cell-type specificity. So we only use these 9 PCs to plot t-SNE. +Based on this diagram, we see that only the top 10 PCs show cell-type specificity. So we only use these 10 PCs to plot t-SNE. Now calculate t-SNE coordinates and plot ``` -mappedX=tsne(zscore(double(PMBC_perc_filtered(I(1:500),get(PMBCClg,'ColumnLabels')))'),'Algorithm','barneshut','NumPCAComponents',9,'Distance','spearman','Perplexity',30); +mappedX=tsne(double(PMBC_perc_filtered(I(1:2000),get(PMBCClg,'ColumnLabels')))','Algorithm','barneshut','NumPCAComponents',10,'Distance','spearman','Perplexity',30); scatter(mappedX(:,1),mappedX(:,2),50,[0.8 0.8 0.8],'.') ``` -![untitled](https://user-images.githubusercontent.com/4110443/46392760-5b5d4280-c697-11e8-9754-b8db194e5e2a.jpg) +![untitled](https://user-images.githubusercontent.com/4110443/46567228-7375d180-c8e4-11e8-9364-a1ac24ae106e.jpg) *You should right click mappedX in Workspace window to save it to a file. So that next time you can use mappedX=importdata(); to import it* Now paint the t-SNE plot with cluster colors. ``` hold on -scatter(mappedX(ClusterID==1,1),mappedX(ClusterID==1,2),10,[0 0 1],'.') -scatter(mappedX(ClusterID==1,1),mappedX(ClusterID==1,2),10,[0 0 1],'.') -scatter(mappedX(ClusterID==1,1),mappedX(ClusterID==1,2),50,[0 0 1],'.') scatter(mappedX(ClusterID==1,1),mappedX(ClusterID==1,2),50,[0 0.45 0.74],'.') scatter(mappedX(ClusterID==2,1),mappedX(ClusterID==2,2),50,[0.85 0.33 0.1],'.') scatter(mappedX(ClusterID==3,1),mappedX(ClusterID==3,2),50,[0.93 0.69 0.13],'.') @@ -128,7 +134,15 @@ scatter(mappedX(ClusterID==6,1),mappedX(ClusterID==6,2),50,[0.3 0.75 0.93],'.') scatter(mappedX(ClusterID==7,1),mappedX(ClusterID==7,2),50,[0.64 0.08 0.18],'.') scatter(mappedX(ClusterID==8,1),mappedX(ClusterID==8,2),50,[1 0 0],'.') ``` -![untitled](https://user-images.githubusercontent.com/4110443/46392851-aa0adc80-c697-11e8-9582-e92dd10096c0.jpg) + +![untitled](https://user-images.githubusercontent.com/4110443/46567238-b89a0380-c8e4-11e8-931d-a4ba22b8c7d7.jpg) + +#### Unsupervised k-means clustering +Of course, you can do an unsupervised clustering using the top 10 PCs I mentioned above. +kmeans_colors=distinguishable_colors(10); +scatter(mappedX(:,1),mappedX(:,2),50,kmeans_colors(kmeans(SCORE(:,1:10),10),:),'.'); + +![untitled](https://user-images.githubusercontent.com/4110443/46567688-c30bcb80-c8eb-11e8-94bf-9cbfe379af40.jpg) #### Check expression of a specific gene ``` @@ -136,7 +150,7 @@ figure scatter(mappedX(:,1),mappedX(:,2),50,PMBC_perc('ENSG00000105369_CD79A',get(PMBCClg,'ColumnLabels')),'.') colormap(hot) ``` -![untitled](https://user-images.githubusercontent.com/4110443/46392894-d1fa4000-c697-11e8-865b-c94143d9ba9b.jpg) +![untitled](https://user-images.githubusercontent.com/4110443/46567725-acb23f80-c8ec-11e8-8858-c5ff76de3a3f.jpg) ## Discussions ### Parameter optimizations