---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  fig.retina = 2
)
```
# fdacluster <img src="man/figures/logo.png" align="right" height="139" />
<!-- badges: start -->
[![R-CMD-check](https://github.com/astamm/fdacluster/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/astamm/fdacluster/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://github.com/astamm/fdacluster/workflows/test-coverage/badge.svg)](https://github.com/astamm/fdacluster/actions)
[![codecov](https://codecov.io/gh/astamm/fdacluster/branch/master/graph/badge.svg)](https://app.codecov.io/gh/astamm/fdacluster)
[![pkgdown](https://github.com/astamm/fdacluster/workflows/pkgdown/badge.svg)](https://github.com/astamm/fdacluster/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/fdacluster)](https://CRAN.R-project.org/package=fdacluster)
<!-- badges: end -->
The [**fdacluster**](https://astamm.github.io/fdacluster/) package provides
implementations of the $k$-means, hierarchical agglomerative and DBSCAN
clustering methods for functional data. Variability in functional data is
intrinsically divided into three components: *amplitude*, *phase* and
*ancillary* variability. The first two sources of variability can be captured
with a dedicated statistical analysis that integrates a *curve alignment* step.
The $k$-means and HAC algorithms implemented in
[**fdacluster**](https://astamm.github.io/fdacluster/) provide clustering
structures that are based either on amplitude variation (default behavior) or
phase variation. This is achieved by jointly performing clustering and alignment
of a functional data set. The three main related functions are
[`fdakmeans()`](https://astamm.github.io/fdacluster/reference/fdakmeans.html)
for the $k$-means,
[`fdahclust()`](https://astamm.github.io/fdacluster/reference/fdahclust.html)
for HAC and
[`fdadbscan()`](https://astamm.github.io/fdacluster/reference/fdadbscan.html)
for DBSCAN. The methods handle
**multivariate codomains**.
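To make the distinction between amplitude and phase variability concrete, the following base R sketch builds two toy families of curves from a hypothetical template `sin(2 * pi * t)`: one varies only in vertical scale, the other only in timing. The template, scales and shifts are illustrative assumptions and are not part of the package.

``` r
# Toy illustration only: the template, scales and shifts are made up.
grid <- seq(0, 1, length.out = 101)
template <- function(t) sin(2 * pi * t)

# Amplitude variability: identical timing, different vertical scale.
amp_scales <- c(0.5, 1, 1.5)
amplitude_curves <- sapply(amp_scales, function(a) a * template(grid))

# Phase variability: identical shape, shifted along the time axis.
shifts <- c(-0.1, 0, 0.1)
phase_curves <- sapply(shifts, function(b) template(grid - b))

# Peak locations stay fixed under amplitude variation and move under phase variation.
amplitude_peaks <- grid[apply(amplitude_curves, 2, which.max)]
phase_peaks <- grid[apply(phase_curves, 2, which.max)]

# Plot both families side by side.
op <- par(mfrow = c(1, 2))
matplot(grid, amplitude_curves, type = "l", xlab = "x", ylab = "y", main = "Amplitude")
matplot(grid, phase_curves, type = "l", xlab = "x", ylab = "y", main = "Phase")
par(op)
```

In the first family the peaks share the same location and differ only in height, while in the second the peaks have the same height but occur at different times. Aligning the curves before comparing them is what removes the second kind of variability.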
## Installation
You can install the official version from CRAN via:
``` r
install.packages("fdacluster")
```
or you can opt to install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("astamm/fdacluster")
```
## Example
### Data set
```{r load-pkg, echo=FALSE}
library(fdacluster)
```
Let us consider the following simulated example of $30$ $1$-dimensional curves:
```{r data, echo=FALSE}
matplot(t(simulated30$x), t(simulated30$y[, 1, ]), type = "l", xlab = "x", ylab = "y")
```
Looking at the data set, we would expect $3$ groups when clustering on phase variability but probably only $2$ groups when clustering on amplitude variability.
### $k$-means based on amplitude variability
We can perform $k$-means clustering based on amplitude variability as follows:
```{r kmeans-amplitude-run}
out1 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 21),
  n_clusters = 2,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "normalized_l2",
  cluster_on_phase = FALSE
)
```
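The `warping_class = "affine"` argument restricts the alignment step to warping functions of the form $h(t) = a t + b$. The base R sketch below illustrates the idea on a hypothetical pair of curves: it recovers the affine warp that aligns a distorted bump onto its template by minimizing a squared $L^2$ discrepancy with `optim()`. The bump, the distortion parameters and the use of `optim()` are assumptions made for illustration; this is not the optimizer used inside the package.

``` r
# Illustration only: toy curves and a generic optimizer, not fdacluster internals.
grid <- seq(0, 1, length.out = 201)

# Hypothetical template: a Gaussian bump centred at 0.5.
template <- function(t) exp(-(t - 0.5)^2 / (2 * 0.1^2))

# Observed curve: the template distorted by an affine change of time.
a0 <- 1.25
b0 <- -0.1
observed <- function(t) template(a0 * t + b0)

# Discretized squared L2 discrepancy between the warped observed curve
# and the template, as a function of the warp parameters (a, b).
alignment_cost <- function(par) {
  warped <- observed(par[1] * grid + par[2])
  mean((warped - template(grid))^2)
}

# Start from the identity warp h(t) = t and search over (a, b).
fit <- optim(c(1, 0), alignment_cost)

# The exact inverse warp is a = 1 / a0 = 0.8 and b = -b0 / a0 = 0.08.
estimated_warp <- fit$par
```

The estimate in `estimated_warp` can be compared with the exact inverse warp given in the last comment. In joint clustering and alignment, a step of this kind is applied to each curve with respect to the centroid of its cluster.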
The
[`fdakmeans()`](https://astamm.github.io/fdacluster/reference/fdakmeans.html),
[`fdahclust()`](https://astamm.github.io/fdacluster/reference/fdahclust.html)
and
[`fdadbscan()`](https://astamm.github.io/fdacluster/reference/fdadbscan.html)
functions all return an object of class
[`caps`](https://astamm.github.io/fdacluster/reference/caps.html) (for
**C**lustering with
**A**mplitude and **P**hase **S**eparation) for which `S3` specialized methods
of
[`ggplot2::autoplot()`](https://astamm.github.io/fdacluster/reference/autoplot.caps.html)
and
[`graphics::plot()`](https://astamm.github.io/fdacluster/reference/plot.caps.html)
have been implemented. Therefore, we can visualize the results simply with:
```{r kmeans-amplitude-viz}
plot(out1, type = "amplitude")
plot(out1, type = "phase")
```
### $k$-means based on phase variability
We can perform $k$-means clustering based on phase variability only by switching
the `cluster_on_phase` argument to `TRUE`:
```{r kmeans-phase-run}
out2 <- fdakmeans(
  simulated30$x,
  simulated30$y,
  seeds = c(1, 11, 21),
  n_clusters = 3,
  centroid_type = "mean",
  warping_class = "affine",
  metric = "normalized_l2",
  cluster_on_phase = TRUE
)
```
We can inspect the result:
```{r kmeans-phase-viz}
plot(out2, type = "amplitude")
plot(out2, type = "phase")
```
We can perform similar analyses using HAC or DBSCAN instead of $k$-means. The
[**fdacluster**](https://astamm.github.io/fdacluster/) package also provides
visualization tools to help choose the optimal number of clusters based on the
within-cluster sum of squares (WSS) and silhouette values. This can be achieved
by using a combination of the
functions
[`compare_caps()`](https://astamm.github.io/fdacluster/reference/compare_caps.html)
and
[`plot.mcaps()`](https://astamm.github.io/fdacluster/reference/plot.mcaps.html).
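
As a hedged sketch of that workflow, the code below runs HAC with `fdahclust()` and then compares $k$-means fits for several numbers of clusters with `compare_caps()`. The argument names and values mirror the `fdakmeans()` calls above and the package reference pages; they are assumed to match the installed version, so check `?fdahclust`, `?compare_caps` and `?plot.mcaps` before relying on them. The guard on `requireNamespace()` only keeps the snippet from failing when the package is not installed.

``` r
if (requireNamespace("fdacluster", quietly = TRUE)) {
  library(fdacluster)

  # HAC on amplitude variability, reusing the settings of the k-means example.
  # The linkage criterion is left at its default.
  out3 <- fdahclust(
    simulated30$x,
    simulated30$y,
    n_clusters = 2,
    centroid_type = "mean",
    warping_class = "affine",
    metric = "normalized_l2",
    cluster_on_phase = FALSE
  )
  plot(out3, type = "amplitude")

  # Fit k-means for k = 1, 2, 3 and collect the validation criteria.
  # Argument names are assumed from the reference documentation.
  mcaps <- compare_caps(
    x = simulated30$x,
    y = simulated30$y,
    n_clusters = 1:3,
    metric = "normalized_l2",
    clustering_method = "kmeans",
    warping_class = "affine",
    centroid_type = "mean",
    cluster_on_phase = FALSE
  )

  # Compare the configurations on WSS, then on silhouette values.
  plot(mcaps, validation_criterion = "wss", what = "mean")
  plot(mcaps, validation_criterion = "silhouette", what = "mean")
}
```

The range `1:3` keeps the run short here; widening it, or passing several clustering methods or warping classes at once, produces a broader comparison at a higher computational cost.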