| title | author | date |
|---|---|---|
| Eigen vector selection | Sudeep Sahadevan | 09/17/2015 |

Functions to perform informative eigen vector selection based on the algorithm proposed by:
Tao Xiang and Shaogang Gong (2008). Spectral clustering with eigenvector selection. Pattern Recogn. 41(3), 1012-1029.
Article DOI: 10.1016/j.patcog.2007.07.023
See the original manuscript or open the html file for equations
##compute.params
Compute posterior probabilities for the Gaussian mixture model. Given an eigenvector, compute the posterior probabilities that the vector follows a Gaussian mixture under the given parameters. The variable names in this function follow the notation adopted by Xiang and Gong (2008) in their manuscript.
- vec: input eigen vector, $e_{kn}$
- rel: numeric variable, $0 \leq R_{ek} \leq 1$, relevance of the vector (default value: 0.50)
- mean2: mean of the first gaussian mixture, $\mu_{k2}$; if NULL, this parameter is estimated based on the `init` option
- mean3: mean of the second gaussian mixture, $\mu_{k3}$; if NULL, this parameter is randomly estimated based on the `init` option
- var2: variance of the first gaussian mixture, $\sigma_{k2}$; if NULL, this parameter is randomly estimated based on the `init` option
- var3: variance of the second gaussian mixture, $\sigma_{k3}$; if NULL, this parameter is randomly estimated based on the `init` option
- w: weight of the gaussian mixture, $\mathit{w}_{k}$; if NULL, the weight is randomly estimated as `w <- runif(1,min = 0, max = 1)`
- init: initialization option, "random" or "cluster". "random" uses random estimation of the parameters; "cluster" uses the cluster means from k-means clustering with centers = 2. For details on k-means clustering see the `kmeans` R function.
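The two `init` options can be illustrated with a minimal sketch. The helper name `init_params_sketch`, the ordering of the clusters by their centers, and the particular random scheme are assumptions made for illustration; the package may initialize its parameters differently.

```r
# Hypothetical helper, not the package's own initialization code.
init_params_sketch <- function(x, init = c("random", "cluster")) {
  init <- match.arg(init)
  if (init == "cluster") {
    # Two-cluster k-means on the vector; clusters are ordered by center
    # (the ordering is an assumption made here for reproducibility)
    km <- kmeans(x, centers = 2)
    ord <- order(km$centers[, 1])
    means <- unname(km$centers[ord, 1])
    # Within-cluster variances (NA if a cluster holds a single point)
    vars <- sapply(ord, function(k) var(x[km$cluster == k]))
  } else {
    # Assumed random scheme: means drawn from the range of x,
    # variances drawn as a random fraction of the sample variance
    means <- sort(runif(2, min = min(x), max = max(x)))
    vars <- var(x) * runif(2, min = 0.1, max = 1)
  }
  list(mean2 = means[1], mean3 = means[2],
       var2 = vars[1], var3 = vars[2],
       w = runif(1, min = 0, max = 1))
}

set.seed(1)
x <- c(rnorm(20, mean = -1, sd = 0.2), rnorm(20, mean = 1, sd = 0.2))
params <- init_params_sketch(x, init = "cluster")
str(params)
```

With "cluster", the starting means sit near the two modes of the vector, which usually gives the iterative estimation a better starting point than a purely random draw.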
This function is not intended to be called directly; it is used as part of the compute.relevance function.
Returns a list with the following elements:
- rnk: estimated $R_{ek}^{new}$
- wnk: estimated $\mathit{w}_{k}^{new}$
- m2nk: estimated $\mu_{k2}^{new}$
- v2nk: estimated $\sigma_{k2}^{new}$
- m3nk: estimated $\mu_{k3}^{new}$
- v3nk: estimated $\sigma_{k3}^{new}$
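One update of this kind can be sketched as follows. The sketch assumes the mixture takes the form written in the comment at the top of the block, with the first component fixed at the sample mean and variance of the vector. The function name `em_step_sketch` and its return names are hypothetical, and this is not the package's implementation of compute.params.

```r
# Hypothetical sketch of one posterior/update step.
# Assumed model for the elements x of one eigenvector:
#   p(x) = (1 - rel) * N(x; mean1, var1)
#        + rel * (w * N(x; mean2, var2) + (1 - w) * N(x; mean3, var3))
# mean1 and var1 are held fixed at the sample mean and variance of x.
em_step_sketch <- function(x, rel = 0.5, mean2, mean3, var2, var3, w) {
  mean1 <- mean(x)
  var1 <- var(x)
  # Weighted component densities (var* are variances, dnorm needs sd)
  d1 <- (1 - rel) * dnorm(x, mean1, sqrt(var1))
  d2 <- rel * w * dnorm(x, mean2, sqrt(var2))
  d3 <- rel * (1 - w) * dnorm(x, mean3, sqrt(var3))
  total <- d1 + d2 + d3
  # Posterior probability of each component for each element
  h1 <- d1 / total
  h2 <- d2 / total
  h3 <- d3 / total
  # Re-estimate the parameters from the posteriors
  new_mean2 <- sum(h2 * x) / sum(h2)
  new_mean3 <- sum(h3 * x) / sum(h3)
  list(
    rel = 1 - mean(h1),
    w = sum(h2) / (sum(h2) + sum(h3)),
    mean2 = new_mean2,
    var2 = sum(h2 * (x - new_mean2)^2) / sum(h2),
    mean3 = new_mean3,
    var3 = sum(h3 * (x - new_mean3)^2) / sum(h3)
  )
}

set.seed(1)
x <- c(rnorm(20, mean = -1, sd = 0.2), rnorm(20, mean = 1, sd = 0.2))
updated <- em_step_sketch(x, rel = 0.5, mean2 = -0.5, mean3 = 0.5,
                          var2 = 0.5, var3 = 0.5, w = 0.5)
str(updated)
```

Repeating this step with the returned values as the new inputs gives the iterative estimation that compute.relevance is described as performing.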
##compute.relevance
Given an eigenvector, compute the relevance of the vector according to the expectation maximization algorithm proposed by Xiang and Gong (2008).
- vec: input eigen vector, $e_{k}$ in the equations
- tol: tolerance level for convergence (default: $10^{-6}$)
- maxit: maximum number of iterations for the expectation maximization step before convergence (default: 2500)
- maxtrials: maximum number of multiple runs (default: 25)
- init: see the description of init under compute.params
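How `tol`, `maxit` and `maxtrials` typically interact can be sketched with a generic driver. Everything here is an assumption for illustration: `run_trials_sketch` is a hypothetical name, the toy update merely stands in for an expectation maximization step, and how compute.relevance combines the results of its multiple runs is not described in this document.

```r
# Hypothetical driver, not the package's compute.relevance.
# `update` maps the current estimate to the next one (one EM step in the
# real setting); `start` draws a random starting value for each trial.
run_trials_sketch <- function(update, start, tol = 1e-6, maxit = 2500,
                              maxtrials = 25) {
  lapply(seq_len(maxtrials), function(trial) {
    current <- start()
    converged <- FALSE
    iterations <- 0
    while (!converged && iterations < maxit) {
      proposal <- update(current)
      iterations <- iterations + 1
      # Converged once the estimate changes by less than tol
      converged <- abs(proposal - current) < tol
      current <- proposal
    }
    list(value = current, iterations = iterations, converged = converged)
  })
}

# Toy update with a fixed point at 0.8, standing in for an EM step
set.seed(42)
toy_update <- function(r) 0.5 * r + 0.4
runs <- run_trials_sketch(toy_update, start = function() runif(1),
                          maxtrials = 3)
values <- sapply(runs, function(run) run$value)
values
```

Each trial stops either when successive estimates differ by less than `tol` or after `maxit` iterations, whichever comes first; running several trials from different random starts guards against a single poor starting point.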
Example usage: create a random dataset
testdata <- matrix(runif(36,0,1),6,6)
testdata
Make it symmetric, and assign 0 to diagonal elements
testdata <- testdata %*% t(testdata)
diag(testdata) <- 0
testdata
Compute the Laplacian as the diagonal matrix of row sums minus the data matrix
testlap <- diag(rowSums(testdata))-testdata
testlap
Compute the eigenvalues and eigenvectors of the Laplacian
testeig <- eigen(testlap, symmetric = isSymmetric(testlap))
testeig
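A few properties of this construction can be checked directly. The sketch below rebuilds the example with a fixed seed (the seed and the `demo_` variable names are additions for reproducibility and are not part of the original example) and relies only on base R behaviour: `eigen` returns the eigenvalues of a symmetric matrix in decreasing order, so the near-zero eigenvalue of the Laplacian and its constant eigenvector appear in the last position.

```r
# Rebuild the example with a fixed seed (the seed is an addition)
set.seed(123)
demo_data <- matrix(runif(36, 0, 1), 6, 6)
demo_data <- demo_data %*% t(demo_data)
diag(demo_data) <- 0
demo_lap <- diag(rowSums(demo_data)) - demo_data
demo_eig <- eigen(demo_lap, symmetric = TRUE)

# Every row of the Laplacian sums to zero
max(abs(rowSums(demo_lap)))

# Eigenvalues come back in decreasing order; the last one is close to 0
demo_eig$values

# The eigenvector of the near-zero eigenvalue is constant up to sign
demo_eig$vectors[, 6]
```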
The eigenvectors can be used for relevance estimation as follows:
testrel <- compute.relevance(testeig$vectors[,2],tol=1e-6,maxit=2500,maxtrials=2 )
testrel
Eigenvectors with rnk > 0.50 are considered relevant and can be used for further downstream processing.
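Selecting the relevant eigenvectors amounts to a column filter. The sketch below uses a random stand-in matrix and made-up relevance scores, both purely illustrative; in practice the scores would come from the rnk values returned for each eigenvector.

```r
# Stand-in eigenvector matrix and hypothetical relevance scores
# (illustrative values only, not output of compute.relevance)
set.seed(7)
vectors <- matrix(rnorm(18), nrow = 6, ncol = 3)
relevance <- c(0.91, 0.12, 0.67)

# Keep the columns whose relevance exceeds the 0.50 cutoff
keep <- which(relevance > 0.50)
selected <- vectors[, keep, drop = FALSE]
dim(selected)
```

Using `drop = FALSE` keeps `selected` a matrix even when only one eigenvector passes the cutoff.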
A list of values:
##wrapper.compute.relevance
Wrapper for the function compute.relevance: instead of a single eigenvector, it takes the eigenvector matrix as input.
- mat: input eigen vector matrix
- tol: tolerance level for convergence (default: $10^{-6}$)
- maxit: maximum number of iterations for the expectation maximization step (default: 2500)
- maxtrials: maximum number of multiple runs (default: 25)
- init: see the description of init under compute.params
- ncpus: number of cores to use, requires doMC and foreach packages for ncpus>1
testrel <- wrapper.compute.relevance(testeig$vectors[,c(2:4)])
testrel
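A wrapper of this kind can be pictured as applying a function to each column, optionally in parallel. The sketch below is an assumption about the general pattern, not the package's code: `apply_per_column_sketch` is a hypothetical name, `var` stands in for the per-vector computation, and the parallel branch runs only when `ncpus > 1` and both foreach and doMC are installed.

```r
# Hypothetical column-wise wrapper, not the package's implementation.
apply_per_column_sketch <- function(mat, fun, ncpus = 1) {
  cols <- seq_len(ncol(mat))
  if (ncpus > 1 && requireNamespace("foreach", quietly = TRUE) &&
      requireNamespace("doMC", quietly = TRUE)) {
    # Parallel branch: register a multicore backend, then loop over columns
    doMC::registerDoMC(cores = ncpus)
    `%dopar%` <- foreach::`%dopar%`
    foreach::foreach(i = cols) %dopar% fun(mat[, i])
  } else {
    # Sequential fallback using base R only
    lapply(cols, function(i) fun(mat[, i]))
  }
}

set.seed(3)
demo_mat <- matrix(rnorm(18), nrow = 6, ncol = 3)
# var() is a stand-in for the per-eigenvector computation
per_column <- apply_per_column_sketch(demo_mat, var, ncpus = 1)
unlist(per_column)
```

Both branches return one list element per column, so the caller does not need to know whether the work ran in parallel.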