fdacluster

The fdacluster package provides implementations of the $k$-means, hierarchical agglomerative clustering (HAC) and DBSCAN methods for functional data. Variability in functional data is intrinsically divided into three components: amplitude, phase and ancillary variability. The first two sources of variability can be captured with a dedicated statistical analysis that integrates a curve alignment step. The $k$-means and HAC algorithms implemented in fdacluster provide clustering structures that are based either on amplitude variation (default behavior) or on phase variation. This is achieved by jointly performing clustering and alignment of a functional data set. The three main functions are fdakmeans() for $k$-means, fdahclust() for HAC and fdadbscan() for DBSCAN. These methods handle multivariate codomains.
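To make the distinction between amplitude and phase variability concrete, here is a minimal base R sketch. It does not use fdacluster: the template function, the sample size and the names `amplitude_factors`, `phase_shifts` and `curve_fun` are all hypothetical choices made for illustration. Each curve is a vertically scaled (amplitude) and horizontally shifted (phase) copy of a common template, and undoing the known shifts plays the role of the alignment step.

```r
# Hypothetical illustration in base R; none of these names come from fdacluster.
set.seed(1234)
grid <- seq(0, 1, length.out = 101)
template <- function(t) sin(2 * pi * t)

n_curves <- 6
amplitude_factors <- runif(n_curves, min = 0.5, max = 1.5)  # vertical scaling
phase_shifts <- runif(n_curves, min = -0.1, max = 0.1)      # horizontal shift

# Curve i evaluated at t: a_i * template(t - b_i).
curve_fun <- function(i, t) {
  amplitude_factors[i] * template(t - phase_shifts[i])
}

# Observed curves (one per row) mix amplitude and phase variability.
curves <- t(sapply(seq_len(n_curves), function(i) curve_fun(i, grid)))

# Evaluating each curve on a shifted grid undoes its phase shift, so the
# aligned curves differ only through their amplitude factors.
aligned <- t(sapply(seq_len(n_curves), function(i) {
  curve_fun(i, grid + phase_shifts[i])
}))

# After alignment, all curves peak at the same grid point.
peak_positions <- apply(aligned, 1, which.max)
```

In this toy setting the shifts are known. In practice they have to be estimated, which is why clustering and alignment are performed jointly.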

Installation

You can install the official version from CRAN via:

install.packages("fdacluster")

or you can opt to install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("astamm/fdacluster")

Example

Data set

Let us consider the simulated data set simulated30, which contains $30$ $1$-dimensional curves.

Looking at the data set, we would expect $3$ groups when clustering on phase variability but probably only $2$ groups when clustering on amplitude variability.

$k$-means based on amplitude variability

We can perform $k$-means clustering based on amplitude variability as follows:

library(fdacluster)

out1 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 21),
  n_clusters = 2,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "normalized_l2",
  cluster_on_phase = FALSE
)

The fdakmeans(), fdahclust() and fdadbscan() functions all return an object of class caps (for Clustering with Amplitude and Phase Separation), for which specialized S3 methods of ggplot2::autoplot() and graphics::plot() have been implemented. Therefore, we can visualize the results simply with:

plot(out1, type = "amplitude")

plot(out1, type = "phase")

$k$-means based on phase variability

We can perform $k$-means clustering based on phase variability only by switching the cluster_on_phase argument to TRUE:

out2 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 11, 21),
  n_clusters = 3,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "normalized_l2", 
  cluster_on_phase = TRUE
)

We can inspect the result:

plot(out2, type = "amplitude")

plot(out2, type = "phase")

We can perform similar analyses using HAC or DBSCAN instead of $k$-means. The fdacluster package also provides visualization tools to help choose the optimal number of clusters based on within-cluster sum of squares (WSS) and silhouette values. This can be achieved by combining the functions compare_caps() and plot.mcaps().
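As a rough illustration of how WSS can guide the choice of the number of clusters, the sketch below uses base R's stats::kmeans() on discretized toy curves, without any alignment step. It is not the compare_caps() workflow: the data, the candidate values of $k$ and all variable names are assumptions made for this example.

```r
# Hypothetical illustration with stats::kmeans(); not the fdacluster API.
set.seed(42)
grid <- seq(0, 1, length.out = 50)

# Two toy groups of 10 curves each (one curve per row), differing in amplitude.
group1 <- t(replicate(10, 1 * sin(2 * pi * grid) + rnorm(50, sd = 0.05)))
group2 <- t(replicate(10, 2 * sin(2 * pi * grid) + rnorm(50, sd = 0.05)))
curves <- rbind(group1, group2)

# Total within-cluster sum of squares for each candidate number of clusters.
candidate_k <- 1:4
wss <- sapply(candidate_k, function(k) {
  kmeans(curves, centers = k, nstart = 20)$tot.withinss
})

# Cluster memberships for the two-cluster solution.
fit2 <- kmeans(curves, centers = 2, nstart = 20)
```

Plotting wss against candidate_k would typically show a sharp drop from $1$ to $2$ clusters and little gain afterwards, which is the usual "elbow" reading of WSS.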
