Finite mixture models are useful for data in which observations originate from several groups and the group memberships are unknown. For example, in single-cell RNA-seq data, transcripts in each cell can be modeled as a mixture of two probabilistic processes: 1) a negative binomial process for when a transcript is amplified and detected at a level correlating with its abundance and 2) a low-magnitude Poisson process for when drop-outs occur. This error model can then provide a basis for further statistical analysis, including the analyses described in Fan et al.
In this repository, I use simulations and sample data to learn about methods for model-based clustering of finite mixtures of Gaussian distributions.
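As a rough illustration of the kind of simulation meant here, the sketch below draws data from a two-component Gaussian mixture in base R. The mixing proportions, means, and standard deviations are made-up values chosen for illustration, and the names `lambda`, `mu`, `sigma`, `z`, and `dmix` are hypothetical rather than taken from this repository.

```r
# Simulate n draws from a two-component Gaussian mixture.
# All parameter values below are illustrative assumptions.
set.seed(1)
n      <- 1000
lambda <- c(0.3, 0.7)   # mixing proportions (must sum to 1)
mu     <- c(0, 5)       # component means
sigma  <- c(1, 1.5)     # component standard deviations

# Latent group label for each observation (unknown in real data)
z <- sample(1:2, size = n, replace = TRUE, prob = lambda)

# Draw each observation from the Gaussian of its own group
x <- rnorm(n, mean = mu[z], sd = sigma[z])

# Mixture density: weighted sum of the component densities
dmix <- function(x) {
  lambda[1] * dnorm(x, mu[1], sigma[1]) +
    lambda[2] * dnorm(x, mu[2], sigma[2])
}
```

Plotting `hist(x, freq = FALSE)` and overlaying `curve(dmix, add = TRUE)` is one way to check that the simulated sample follows the intended density.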
This is my attempt at implementing the EM algorithm for finite mixture modeling and model-based clustering from scratch in the R programming language, without the help of libraries or packages (e.g. flexmix).
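To make the idea concrete, here is a minimal sketch of EM for a univariate Gaussian mixture using only base R. It is not necessarily the implementation in this repository: the function name `em_gmm`, the quantile-based initialisation, and the convergence tolerance are all assumptions made for illustration.

```r
# EM for a K-component univariate Gaussian mixture, base R only.
# Hypothetical helper; initialisation and stopping rule are assumptions.
em_gmm <- function(x, K = 2, max_iter = 500, tol = 1e-8) {
  n <- length(x)
  # Crude initialisation: spread the means over quantiles of the data
  mu     <- as.numeric(quantile(x, probs = (seq_len(K) - 0.5) / K))
  sigma  <- rep(sd(x), K)
  lambda <- rep(1 / K, K)
  loglik <- -Inf

  for (iter in seq_len(max_iter)) {
    # E-step: n x K matrix of lambda_k * N(x_i | mu_k, sigma_k),
    # normalised row-wise to give posterior group probabilities
    dens <- sapply(seq_len(K),
                   function(k) lambda[k] * dnorm(x, mu[k], sigma[k]))
    resp <- dens / rowSums(dens)

    # Log-likelihood under the parameters used in this E-step
    new_loglik <- sum(log(rowSums(dens)))

    # M-step: weighted updates of proportions, means, and sds
    nk     <- colSums(resp)
    lambda <- nk / n
    mu     <- colSums(resp * x) / nk
    sigma  <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)

    # Stop once the log-likelihood has stopped improving
    if (abs(new_loglik - loglik) < tol) break
    loglik <- new_loglik
  }

  list(lambda = lambda, mu = mu, sigma = sigma, loglik = new_loglik,
       cluster = max.col(resp, ties.method = "first"),
       iterations = iter)
}

# Usage on simulated data (parameter values are illustrative)
set.seed(42)
x   <- c(rnorm(300, mean = 0, sd = 1), rnorm(700, mean = 5, sd = 1.5))
fit <- em_gmm(x, K = 2)
print(fit$mu)
print(fit$lambda)
```

The `cluster` element assigns each observation to the component with the highest posterior probability, which is the model-based clustering step. A fuller implementation would likely add multiple random restarts and guard against a component's variance collapsing to zero.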
Feel free to contact me with any questions or concerns.
MIT