# Support Vector Machines
**Learning objectives:**
- Implement a binary classification model using a **maximal margin classifier**.
- Implement a binary classification model using a **support vector classifier**.
- Implement a binary classification model using a **support vector machine** (SVM).
- Generalize SVM models to **multi-class** cases.
The **support vector machine (SVM)** is an approach for classification developed in the computer science community in the 1990s.
The SVM is a generalization of a family of simpler, more intuitive classifiers:
- **maximal margin classifier**: requires that the classes be separable by a linear boundary.
- **support vector classifier**: extends the maximal margin classifier so it can be applied in a broader range of cases (the classes need not be perfectly separable).
- **support vector machine**: extends the support vector classifier to accommodate non-linear class boundaries.
All three are introduced in the binary (two-class) setting; extensions to more than two classes come at the end of the chapter.
```{r echo=FALSE, fig.align='center', out.width="100%", message = FALSE, warning = FALSE}
library("DiagrammeR")
mermaid("
graph TB
A[SVM<br>support vector machine<br>non-linear class boundaries]
B[MMC<br>maximal margin classifier<br>linear class boundaries]
B-->C[SVC<br>support vector classifier<br>applied in a broader range of cases]
A-->C
")
```
## Hyperplane
![image credit: Deep AI](images/hyperplane.png)
- A *hyperplane* is a $p-1$-dimensional flat subspace of a $p$-dimensional space. For example, in a 2-dimensional space, a hyperplane is a flat one-dimensional space: a line.
- (standard form) Definition of 2D hyperplane in 3D space:
$$\beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \beta_{3}X_{3}= 0$$
- Any $X = (X_{1}, X_{2}, X_{3})^T$ for which the equation above is satisfied is a point on the hyperplane. A small numeric check follows below.
Additional resource: [Deep AI](https://deepai.org/machine-learning-glossary-and-terms/hyperplane)
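To make the side-of-the-hyperplane idea concrete, here is a minimal sketch with arbitrarily chosen coefficients (not from the book): the sign of $\beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \beta_{3}X_{3}$ tells us which side of the hyperplane a point lies on, and a value of zero means the point sits on the hyperplane itself.

```{r 09-hyperplane-side}
# Arbitrary illustrative coefficients for a hyperplane in 3D space
beta0 <- -6
beta  <- c(1, 2, 3)

# Evaluate beta0 + beta1*X1 + beta2*X2 + beta3*X3 for a point
hyperplane_side <- function(x) beta0 + sum(beta * x)

hyperplane_side(c(1, 1, 1))   #  0: the point lies on the hyperplane
hyperplane_side(c(2, 2, 2))   #  6: one side of the hyperplane
hyperplane_side(c(0, 0, 0))   # -6: the other side
```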
## Separating Hyperplane
- Consider an $n \times p$ data matrix $\mathbf{X}$ with class labels $y_{1}, \dots, y_{n} \in \{-1, 1\}$, and a new observation $x^* = (x^*_{1}, \dots, x^*_{p})^T$ that we wish to classify into one of the two groups.
- We will use a *separating hyperplane* to classify the observation.
![](images/fig9_2.png)
- We can label the blue observations as $y_{i} = 1$ and the pink observations as $y_{i} = -1$.
- Thus, a separating hyperplane has the property that $\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{p}X_{ip} > 0$ if $y_{i} = 1$ and $\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{p}X_{ip} < 0$ if $y_{i} = -1$.
- In other words, a separating hyperplane satisfies $y_{i}(\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{p}X_{ip}) > 0$ for all $i = 1, \dots, n$.
- Also consider the *magnitude* of $f(x^*) = \beta_{0} + \beta_{1}x^*_{1} + \cdots + \beta_{p}x^*_{p}$. If it is far from zero, we can be confident in the classification; if it is close to zero, $x^*$ lies near the hyperplane and we are less confident. The sketch below computes $f$ for a tiny hand-made data set and checks the separation property above.
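Here is a small sketch with hand-picked numbers (not the book's data): for a hyperplane that separates the two classes, $y_{i}f(x_{i}) > 0$ for every training observation, and the magnitude of $f(x_{i})$ reflects how far each point is from the hyperplane.

```{r 09-separating-check}
# Six hand-made points: three from class -1, three from class +1
x <- rbind(c(1, 1), c(2, 1), c(1, 2),   # class -1
           c(4, 4), c(5, 4), c(4, 5))   # class +1
y <- c(-1, -1, -1, 1, 1, 1)

# A hand-picked separating hyperplane: x1 + x2 - 6 = 0
beta0 <- -6
beta  <- c(1, 1)

f <- as.numeric(beta0 + x %*% beta)     # f(x_i) for each observation
data.frame(y = y, f = f, correct_side = y * f > 0)
```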
## Maximal Margin Classifier
![](images/fig9_3.png)
- In general, if the data can be perfectly separated by a hyperplane, then an infinite number of such hyperplanes exists.
- An intuitive choice is the *maximal margin hyperplane*, which is the hyperplane that is farthest from the training data.
- We compute the perpendicular distance from each training observation to the hyperplane. The smallest of these distances is known as the *margin*.
- The *maximal margin hyperplane* is the hyperplane for which the *margin* is maximized. We can classify a test observation based on which side of the maximal margin hyperplane it lies on, and this is known as the *maximal margin classifier*.
- The maximal margin classifier classifies $x^*$ based on the sign of $f(x^*) = \beta_{0} + \beta_{1}x^*_{1} + ... + \beta_{p}x^*_{p}$.
![](images/fig9_3.png)
- Note the 3 training observations that lie on the margin and are equidistant from the hyperplane. These are the **support vectors** (vectors in $p$-dimensional space; in this case $p=2$).
- They "support" the hyperplane in the sense that if they were moved, the maximal margin hyperplane would move as well.
- The maximal margin hyperplane depends on these observations, *but not the others* (unless the other observations were moved at or within the margin).
## Mathematics of the MMC
- Consider constructing an MMC based on the training observations $x_{1}...x_{n} \in \mathbb{R}^p$. This is the solution to the optimization problem:
$$\text{max}_{\beta_{0}...\beta_{p}, M} \space M$$
$$\text{subject to } \sum_{j=1}^{p}\beta_{j}^2 = 1$$
$$y_{i}(\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{p}X_{ip}) \geq M \quad \forall i = 1, \dots, n$$
- $M$ is the *margin*, and the $\beta$ coefficients are chosen to maximize $M$.
- The constraint (3rd equation) ensures that each observation will be correctly classified, as long as M is positive.
![](images/fig9_3.png)
- The 2nd and 3rd equations ensure that each data point is on the correct side of the hyperplane and at least M-distance away from the hyperplane.
- Given the constraint $\sum_{j=1}^{p}\beta_{j}^{2} = 1$, the perpendicular distance from the $i^{\text{th}}$ observation to the hyperplane is $y_{i}(\beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \cdots + \beta_{p}x_{ip})$. The sketch below fits such a classifier to a small separable data set.
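As a rough illustration (not the book's lab code), a support vector classifier fitted with a very large cost behaves approximately like the maximal margin classifier on separable data. The sketch uses `kernlab::ksvm()` with a linear (`vanilladot`) kernel, and the tiny data set is made up for the example.

```{r 09-mmc-sketch, message = FALSE}
library("kernlab")

# Tiny, perfectly separable toy data
x <- rbind(c(1, 1), c(2, 1), c(1, 2),   # class -1
           c(4, 4), c(5, 4), c(4, 5))   # class +1
y <- factor(c(-1, -1, -1, 1, 1, 1))

# A very large cost means essentially no margin violations are allowed,
# so the fit approximates the maximal margin classifier
mmc_fit <- ksvm(x, y, kernel = "vanilladot", C = 1e6, scaled = FALSE)
mmc_fit              # note the small number of support vectors
SVindex(mmc_fit)     # indices of the observations acting as support vectors
```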
> But what if our data is not separable by a linear hyperplane?
![](images/fig9_4.png)
> Individual data points greatly affect formation of the maximal margin classifier
![](images/fig9_5.png)
## Support Vector Classifiers
- We can't always use a hyperplane to separate two classes.
- Even if such a classifier does exist, it's not always desirable, due to overfitting or too much sensitivity to individual observations.
- Thus, it might be worthwhile to consider a classifier/hyperplane that misclassifies a few observations in order to improve classification of the remaining data points.
- The *support vector classifier*, a.k.a the *soft margin classifier*, allows some training data to be on the wrong side of the margin or even the hyperplane.
## Mathematics of the SVC
- The SVC classifies a test observation based on which side of the hyperplane it falls. The hyperplane is chosen by solving:
$$\text{max}_{\beta_{0}...\beta_{p}, \epsilon_{1}...\epsilon_{n}, M} \space M$$
$$\text{subject to } \sum_{j=1}^{p}\beta_{j}^2 = 1$$
$$y_{i}(\beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{p}X_{ip}) \geq M(1 - \epsilon_{i}) \quad \forall i = 1, \dots, n$$
$$\epsilon_{i} \geq 0, \quad \sum_{i=1}^{n}\epsilon_{i} \leq C$$
- $C$ is a nonnegative tuning parameter, typically chosen through cross-validation, and can be thought of as the budget for margin violation by the observations.
- The $\epsilon_{i}$ are *slack variables* that allow individual observations to be on the wrong side of the margin or the hyperplane. The value of $\epsilon_{i}$ indicates where the $i^{\text{th}}$ observation is located relative to the margin and the hyperplane:
- If $\epsilon_{i} = 0$, the observation is on the correct side of the margin.
- If $\epsilon_{i} > 0$, the observation is on the wrong side of margin
- If $\epsilon_{i} > 1$, the observation is on the wrong side of the hyperplane.
- Since $C$ constrains the sum of the $\epsilon_{i}$, it determines the number and severity of violations that will be tolerated. If $C = 0$, there is no budget for violations, so $\epsilon_{1} = \cdots = \epsilon_{n} = 0$ and we recover the maximal margin classifier (when a separating hyperplane exists).
- If $C > 0$, no more than $C$ observations can be on the wrong side of the hyperplane, since each such observation has $\epsilon_{i} > 1$ and $\sum_{i=1}^{n}\epsilon_{i} \leq C$.
## Tuning Parameter
![](images/fig9_7.png)
- A property of the classifier is that only data points which lie on or violate the margin will affect the hyperplane. These data points are known as *support vectors*.
- $C$ controls the bias-variance tradeoff of the classifier.
- When $C$ is large: high bias, low variance
- When $C$ is small: low bias, high variance
- The fact that the SVC's decision rule depends only on a (potentially small) subset of the observations distinguishes it from methods such as LDA, whose rule depends on the mean of all observations within each class and on a covariance matrix computed using all of the observations.
- Logistic regression, by contrast, is more similar to the SVC in that it is not very sensitive to observations far from the decision boundary. The sketch below refits the classifier at several values of `cost` and counts the resulting support vectors.
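A quick sketch of this on simulated data (not the book's). Note that the `cost` argument used by kernlab/libsvm penalizes margin violations, so it behaves roughly like the *inverse* of the budget $C$ above: a large `cost` tolerates few violations and typically yields fewer support vectors.

```{r 09-svc-cost-sketch, message = FALSE}
library("kernlab")

# Simulated, overlapping two-class data (illustrative only)
set.seed(42)
x <- matrix(rnorm(40 * 2), ncol = 2)
y <- factor(rep(c(-1, 1), each = 20))
x[y == 1, ] <- x[y == 1, ] + 1

# Count support vectors at several values of cost
sapply(c(0.01, 1, 100), function(cost) {
  fit <- ksvm(x, y, kernel = "vanilladot", C = cost, scaled = FALSE)
  nSV(fit)   # number of support vectors typically drops as cost grows
})
```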
## Nonlinear Classification
- Many decision boundaries are not linear.
- One way to handle a non-linear boundary is to enlarge the feature space: for example, fit an SVC using $2p$ features by adding the square of each of the $p$ original features.
$$X_{1}, X_{1}^{2}, \quad X_{2}, X_{2}^{2}, \quad\cdots, \quad X_{p}, X_{p}^{2}$$
$$\text{max}_{\beta_{0},\beta_{11},\beta_{12},\dots,\beta_{p1},\beta_{p2},\epsilon_{1},\dots,\epsilon_{n}, M} \space M$$
$$\text{subject to } y_{i}\left(\beta_{0} + \sum_{j=1}^{p} \beta_{j1}x_{ij} + \sum_{j=1}^{p} \beta_{j2}x_{ij}^{2}\right) \geq M(1 - \epsilon_{i})$$
$$\epsilon_{i} \geq 0, \quad \sum_{i=1}^{n}\epsilon_{i} \leq C, \quad \sum_{j=1}^{p}\sum_{k=1}^{2} \beta_{jk}^{2} = 1$$
- In the enlarged feature space (here, with the quadratic terms), the decision boundary is linear. In the original feature space, however, it is the set of points where a quadratic polynomial equals zero, $q(x) = 0$, so the boundary is generally non-linear.
- One could also include interaction terms, higher-degree polynomials, and so on, so the feature space can grow very quickly and the computations can become unmanageable. The sketch below fits a linear SVC after manually adding squared terms.
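As a small sketch of the enlarged-feature-space idea (simulated data, not the book's lab), we add squared terms by hand and fit a *linear* SVC in the enlarged space; the resulting boundary is linear in $(X_{1}, X_{1}^{2}, X_{2}, X_{2}^{2})$ but quadratic in the original features.

```{r 09-enlarged-features-sketch, message = FALSE}
library("kernlab")

# Simulated data with a circular (non-linear) class boundary
set.seed(2)
nonlin_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
nonlin_data$y <- factor(ifelse(nonlin_data$x1^2 + nonlin_data$x2^2 > 1.5, 1, -1))

# Linear kernel, but with squared features added by hand
ksvm(y ~ x1 + I(x1^2) + x2 + I(x2^2),
     data = nonlin_data,
     kernel = "vanilladot", C = 1)
```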
## Support Vector Machines
- The SVM is an extension of the SVC which results from using *kernels* to enlarge the feature space. A kernel is a function that quantifies the similarity of two data points.
- Essentially, we want to enlarge the feature space to make use of a nonlinear decision boundary, while avoiding getting bogged down in unmanageable calculations.
- The solution to the SVC problem in the SVM context involves only the *inner products* (AKA dot products) of the observations.
$$\langle x_{i} \; , x_{i'} \; \rangle = \sum_{j=1}^{p}x_{ij}x_{i'j}$$
In the context of SVM, the linear support vector classifier is as follows:
$$f(x) = \beta_{0} + \sum_{i=1}^{n}\alpha_{i}\langle \; x, x_i\; \rangle$$
- To estimate the $n$ $\alpha_{i}$ coefficients and $\beta_{0}$, we only need the $\binom{n}{2}$ inner products between all pairs of training observations.
- To compute $f(x)$ for a new point $x$, we need the inner product between $x$ and each training observation. However, $\alpha_{i} = 0$ for every observation that is not a support vector (i.e., not on or within the margin), so we can rewrite the sum over only the set $S$ of support-vector indices:
$$f(x) = \beta_{0} + \sum_{i \in S}\alpha_{i}\langle \; x, x_{i} \; \rangle$$
- Replace every inner product with $K(x_{i}, x_{i'})$, where $K$ is a kernel function.
- Taking $K(x_{i}, x_{i'}) = \sum_{j=1}^{p}x_{ij}x_{i'j}$ gives back the support vector classifier; this is known as a linear kernel because it is linear in the features.
- One could also have kernel functions of the following form, where $d$ is a positive integer:
$$K(x_{i}, x_{i'}) = \left(1 + \sum_{j=1}^{p}x_{ij}x_{i'j}\right)^d$$
- This leads to a much more flexible decision boundary: it amounts to fitting an SVC in a higher-dimensional space involving polynomials of degree $d$ rather than in the original feature space.
- When the SVC is combined with a non-linear kernel such as this, the result is a *support vector machine*. The sketch after the equation below evaluates these kernel functions directly in R.
$$f(x) = \beta_{0} + \sum_{i \in S}\alpha_{i}K(x, x_{i})$$
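The kernels above are simple enough to write directly in R; this small sketch (not from the book) just evaluates them for a pair of made-up points.

```{r 09-kernel-sketch}
# Linear kernel: the plain inner product of two observations
linear_kernel <- function(x, z) sum(x * z)

# Polynomial kernel of degree d
poly_kernel <- function(x, z, d) (1 + sum(x * z))^d

x_i <- c(1, 2)
x_j <- c(0.5, -1)
linear_kernel(x_i, x_j)        # -1.5
poly_kernel(x_i, x_j, d = 2)   # (1 + (-1.5))^2 = 0.25
```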
## Radial Kernels
![image credit: Manin Bocss](images/gaussian.png)
- There are other options besides polynomial kernel functions, and a popular one is a *radial kernel*.
$$K(x_{i}, x_{i'}) = \text{exp}\left(-\gamma\sum_{j=1}^p(x_{ij} - x_{i'j})^2\right), \quad \gamma > 0$$
- For a given test observation $x^*$, if it is far from a training observation $x_{i}$ in terms of Euclidean distance, then $\sum_{j=1}^p(x^*_{j} - x_{ij})^2$ is large, so $K(x^*, x_{i}) = \text{exp}\left(-\gamma\sum_{j=1}^p(x^*_{j} - x_{ij})^2\right)$ will be tiny (recall that $\gamma > 0$).
- Thus, $x_{i}$ will play little role in $f(x^*)$.
- The predicted class for $x^*$ is based on the sign of $f(x^*)$, so training observations far from a given test point play little part in determining the label for a test observation.
- The radial kernel therefore has very *local* behaviour: only training observations near a test observation affect its predicted class. (The sketch below evaluates the radial kernel for a nearby and a distant pair of points.)
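A minimal sketch of that local behaviour (points chosen arbitrarily): the radial kernel is close to 1 for nearby points and decays toward 0 as the points move apart.

```{r 09-radial-kernel-sketch}
# Radial (Gaussian) kernel
radial_kernel <- function(x, z, gamma) exp(-gamma * sum((x - z)^2))

radial_kernel(c(0, 0), c(0.1, 0.1), gamma = 1)   # nearby points  -> close to 1
radial_kernel(c(0, 0), c(3, 3),     gamma = 1)   # distant points -> close to 0
```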
## SVM with Radial Kernels
![](images/fig9_9.png)
- The advantage of using a kernel rather than explicitly enlarging the feature space is computational: we only need to compute $K(x_{i}, x_{i'})$ for the $\binom{n}{2}$ distinct pairs of training observations.
- For the radial kernel the enlarged feature space is implicit and infinite-dimensional, so we could not carry out the computations in that space anyway. The sketch below fits a radial-kernel SVM through parsnip's `svm_rbf()`.
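As a sketch (not part of the book's lab), a radial-kernel SVM can be specified through parsnip's `svm_rbf()`; the simulated data and the `cost`/`rbf_sigma` values here are purely illustrative, not tuned.

```{r 09-svm-rbf-sketch, message = FALSE, warning = FALSE}
library("tidymodels")

# Simulated data with a circular class boundary
set.seed(3)
ring_data <- tibble(x1 = rnorm(200), x2 = rnorm(200)) %>%
  mutate(y = factor(ifelse(x1^2 + x2^2 > 1.5, 1, -1)))

svm_rbf_fit <- svm_rbf(cost = 1, rbf_sigma = 0.5) %>%
  set_mode("classification") %>%
  set_engine("kernlab") %>%
  fit(y ~ ., data = ring_data)

# Training-set confusion matrix, as a quick sanity check only
svm_rbf_fit %>%
  augment(new_data = ring_data) %>%
  conf_mat(truth = y, estimate = .pred_class)
```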
## More than Two Classes
- The concept of separating hyperplanes does not extend naturally to more than two classes, but there are some ways around this.
- A *one-versus-one* approach constructs $\binom{K}{2}$ SVMs, one for each pair of classes, where $K$ is the number of classes. Each of the $\binom{K}{2}$ classifiers is applied to a test observation, and we count the number of times the observation is assigned to each of the $K$ classes.
- In the classifier comparing class $k$ to class $k'$, the $k^\text{th}$ class might be coded as +1 and the $(k')^\text{th}$ class as -1.
- The data point is classified to the class for which it was most often assigned in the pairwise classifications.
- Another option is *one-versus-all* classification, which can be useful when there are many classes.
- $K$ SVMs are fitted, each comparing one of the $K$ classes to the remaining $K - 1$ classes.
- Let $\beta_{0k}, \dots, \beta_{pk}$ denote the parameters that result from fitting the SVM that compares the $k^\text{th}$ class (coded as +1) to the others (coded as -1).
- Assign the test observation $x^*$ to the class $k$ for which $\beta_{0k} + \beta_{1k}x^*_{1} + \cdots + \beta_{pk}x^*_{p}$ is largest. (A short multi-class sketch follows.)
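kernlab handles the multi-class case for us; per its documentation, `ksvm()` uses the one-versus-one approach when there are more than two classes. A quick sketch on the built-in `iris` data (three classes):

```{r 09-multiclass-sketch, message = FALSE}
library("kernlab")

# Three-class problem; ksvm fits the pairwise (one-versus-one) classifiers internally
multi_fit <- ksvm(Species ~ ., data = iris, kernel = "rbfdot", C = 1)
table(predicted = predict(multi_fit, iris), truth = iris$Species)
```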
## Lab: Support Vector Classifier
<!--
[obsolete link, but could still be a good resource if found]
This is a hybrid of [Emil's tidymodels labs](https://emilhvitfeldt.github.io/ISLR-tidymodels-labs/support-vector-machines.html) and the original lab. I make sure our data matches theirs so we can compare more directly.
-->
```{r 09-tidymodels, message = FALSE, warning = FALSE}
library("tidymodels")
library("kernlab") # We'll use the plot method from this.
```
```{r 09-lab-data}
set.seed(1)
sim_data <- matrix(
rnorm(20 * 2),
ncol = 2,
dimnames = list(NULL, c("x1", "x2"))
) %>%
as_tibble() %>%
mutate(
y = factor(c(rep(-1, 10), rep(1, 10)))
) %>%
mutate(
x1 = ifelse(y == 1, x1 + 1, x1),
x2 = ifelse(y == 1, x2 + 1, x2)
)
sim_data %>%
ggplot() +
aes(x1, x2, color = y) +
geom_point() +
labs(title = "Trying to make a hyperplane classifier",
subtitle = "simulated data",
caption = "DSLC book club") +
theme_minimal()
```
```{r 09-lab-data-2}
# generated this using their process then saved it to use here.
test_data <- readRDS("data/09-testdat.rds") %>%
rename(x1 = x.1, x2 = x.2)
test_data %>%
ggplot() +
aes(x1, x2, color = y) +
geom_point() +
labs(title = "Trying to make a hyperplane classifier",
subtitle = "simulated data",
caption = "DSLC book club") +
theme_minimal()
```
We create a `spec` for a model, which we'll update throughout this lab with different costs.
```{r 09-lab-spec}
svm_linear_spec <- svm_poly(degree = 1) %>%
set_mode("classification") %>%
set_engine("kernlab", scaled = FALSE)
```
Then we fit the model a few times with manually chosen costs.
```{r 09-lab-simple_fits}
svm_linear_fit_10 <- svm_linear_spec %>%
set_args(cost = 10) %>%
fit(y ~ ., data = sim_data)
svm_linear_fit_10
svm_linear_fit_10 %>%
extract_fit_engine() %>%
plot()
svm_linear_fit_01 <- svm_linear_spec %>%
set_args(cost = 0.1) %>%
fit(y ~ ., data = sim_data)
svm_linear_fit_01
svm_linear_fit_01 %>%
extract_fit_engine() %>%
plot()
svm_linear_fit_001 <- svm_linear_spec %>%
set_args(cost = 0.01) %>%
fit(y ~ ., data = sim_data)
svm_linear_fit_001
svm_linear_fit_001 %>%
extract_fit_engine() %>%
plot()
```
### Tuning
Let's find the best cost.
```{r 09-lab-tune}
svm_linear_wf <- workflow() %>%
add_model(
svm_linear_spec %>% set_args(cost = tune())
) %>%
add_formula(y ~ .)
set.seed(1234)
sim_data_fold <- vfold_cv(sim_data, strata = y)
param_grid <- grid_regular(cost(), levels = 10)
# Our grid isn't identical to the book, but it's close enough.
param_grid
```
```{r 09-show-tune-res, eval = FALSE}
tune_res <- tune_grid(
svm_linear_wf,
resamples = sim_data_fold,
grid = param_grid
)
# We ran this locally and then saved it so everyone doesn't need to wait for
# this to process each time they build the book.
# saveRDS(tune_res, "data/09-tune_res.rds")
```
```{r 09-load-tune-res, include = FALSE}
tune_res <- readRDS("data/09-tune_res.rds")
```
```{r 09-autoplot-tune_res, eval = FALSE}
autoplot(tune_res)
```
The tune package can pull out the best result for us.
```{r 09-lab-tune-best}
best_cost <- select_best(tune_res, metric = "accuracy")
svm_linear_final <- finalize_workflow(svm_linear_wf, best_cost)
svm_linear_fit <- svm_linear_final %>% fit(sim_data)
svm_linear_fit %>%
augment(new_data = test_data) %>%
conf_mat(truth = y, estimate = .pred_class)
```
$$\text{accuracy} = \frac{9 + 8}{9 + 1 + 2 + 8} = 0.85$$
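For reference (a small addition, not in the original lab), yardstick can compute the same accuracy directly from the augmented predictions:

```{r 09-lab-accuracy-yardstick}
svm_linear_fit %>%
  augment(new_data = test_data) %>%
  accuracy(truth = y, estimate = .pred_class)
```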
```{r 09-lab-tune-other}
svm_linear_fit_001 %>%
augment(new_data = test_data) %>%
conf_mat(truth = y, estimate = .pred_class)
```
$$\text{accuracy} = \frac{11 + 3}{11 + 6 + 0 + 3} = 0.70$$
### Linearly separable data
```{r 09-lab-separate}
sim_data_sep <- sim_data %>%
mutate(
x1 = ifelse(y == 1, x1 + 0.5, x1),
x2 = ifelse(y == 1, x2 + 0.5, x2)
)
sim_data_sep %>%
ggplot() +
aes(x1, x2, color = y) +
geom_point()
svm_fit_sep_1e5 <- svm_linear_spec %>%
set_args(cost = 1e5) %>%
fit(y ~ ., data = sim_data_sep)
svm_fit_sep_1e5
svm_fit_sep_1e5 %>%
extract_fit_engine() %>%
plot()
svm_fit_sep_1 <- svm_linear_spec %>%
set_args(cost = 1) %>%
fit(y ~ ., data = sim_data_sep)
svm_fit_sep_1
svm_fit_sep_1 %>%
extract_fit_engine() %>%
plot()
test_data_sep <- test_data %>%
mutate(
x1 = ifelse(y == 1, x1 + 0.5, x1),
x2 = ifelse(y == 1, x2 + 0.5, x2)
)
svm_fit_sep_1e5 %>%
augment(new_data = test_data_sep) %>%
conf_mat(truth = y, estimate = .pred_class)
```
$$\text{accuracy} = \frac{9 + 8}{9 + 1 + 2 + 8} = 0.85$$
```{r 09-lab-sep-cost1}
svm_fit_sep_1 %>%
augment(new_data = test_data_sep) %>%
conf_mat(truth = y, estimate = .pred_class)
```
$$\text{accuracy} = \frac{9 + 9}{9 + 0 + 2 + 9} = 0.90$$
<!--
[obsolete link, but the resource could be good if found]
## Lab: Support Vector Machine (non-linear kernel)
Now that we've seen one that's comparable, I want to focus on the tidymodels version. For the rest of the meeting, see [Emil's version](https://emilhvitfeldt.github.io/ISLR-tidymodels-labs/support-vector-machines.html#support-vector-machine)
-->
## Meeting Videos
### Cohort 1
`r knitr::include_url("https://www.youtube.com/embed/g9ICZXZ2p4s")`
<details>
<summary> Meeting chat log </summary>
```
00:41:54 Jon Harmon (jonthegeek): 9.23 in the book *does* still have the alpha.
00:52:14 Federica Gazzelloni: dealing with grouping based on some indications
00:54:37 Ryan Metcalf: Laura, not sure if this is the same video series. https://www.youtube.com/channel/UCB2p-jaoolkv0h22m4I9l9Q/videos
00:55:02 Jon Harmon (jonthegeek): Here specifically: https://www.youtube.com/watch?v=ooYwHNvH-YU&list=PLAOUn-KLSAVOf4Uk-WbLGPUDFjMSyytkw&index=3
00:59:15 Federica Gazzelloni: let’s share the lab??
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/fX_hQJ_fXVw")`
### Cohort 2
`r knitr::include_url("https://www.youtube.com/embed/-hwGjFSv1Lo")`
`r knitr::include_url("https://www.youtube.com/embed/EgQ7Z_UoRcY")`
<details>
<summary> Meeting chat log </summary>
```
00:10:29 Federica Gazzelloni: https://emilhvitfeldt.github.io/ISLR-tidymodels-labs/support-vector-machines.html
01:04:02 Jenny Smith: https://www.youtube.com/c/TidyX_screencast
```
</details>
### Cohort 3
`r knitr::include_url("https://www.youtube.com/embed/I7F49a6Xhw4")`
<details>
<summary> Meeting chat log </summary>
```
00:02:19 Mei Ling Soh: https://r4ds.github.io/bookclub-islr/support-vector-machines.html
00:04:52 Mei Ling Soh: Is my audio working?
00:05:09 Rose Hartman: Yes, I think so!
00:05:17 Mei Ling Soh: 👍
00:05:30 Rose Hartman: Oh, sorry, now it's not
00:41:08 Mei Ling Soh: https://www.youtube.com/watch?v=8A7L0GsBiLQ
00:41:12 Rose Hartman: Thank you!
00:46:06 Fariborz Soroush: my understanding of the radial one is it has 3d or higher dimensions capabilities, meaning that it will be nonlinear when it is plotted in a 2d domain.
00:48:27 Fariborz Soroush: true... it should be 3D+
00:49:04 Fariborz Soroush: think about the 2D projection of 3D complex objects.
00:49:26 Mei Ling Soh: 👍
00:50:34 Rose Hartman: Thank you!
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/ZYRmWNx1FX0")`
<details>
<summary> Meeting chat log </summary>
```
00:03:14 Nilay Yönet: Helloooo! I am joining from the library today! :) so I will be muted :)
00:03:25 Mei Ling Soh: No problem :)
00:05:41 Nilay Yönet: I can hear you :)
00:06:34 Nilay Yönet: I will stop my video, since my internet connection is unstable :(
00:09:46 Nilay Yönet: Mei Ling I cannot hear you?
00:09:59 Jeremy Selva: You are muted
00:10:17 Nilay Yönet: yes! :)
00:39:29 Mei Ling Soh: https://juliasilge.com/blog/last-airbender/
00:42:35 Mei Ling Soh: https://www.listendata.com/2017/01/support-vector-machine-in-r-tutorial.html
00:44:27 Nilay Yönet: thank you! :) thank you also fro the links :)
00:45:00 Nilay Yönet: see you! :)
```
</details>
### Cohort 4
`r knitr::include_url("https://www.youtube.com/embed/Lp15fbaohww")`
<details>
<summary> Meeting chat log </summary>
```
01:02:33 shamsuddeen: Thank you
01:02:37 shamsuddeen: Kevin
01:02:50 shamsuddeen: Yes
01:03:08 shamsuddeen: I am rusty also in Linear Algebra
01:05:18 shamsuddeen: I need to go. See you next week. Bye and thanks Kevin
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/TllKjSTVeqU")`
<details>
<summary> Meeting chat log </summary>
```
00:37:36 Sandra Muroy: https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
```
</details>
### Cohort 5
`r knitr::include_url("https://www.youtube.com/embed/mpnBWjgzunw")`
<details>
<summary> Meeting chat log </summary>
```
00:04:14 Derek Sollberger (he/him): start
00:59:44 Lucio Cornejo: https://link.springer.com/chapter/10.1007/3-540-45665-1_31
01:01:38 Derek Sollberger (he/him): end
```
</details>
`r knitr::include_url("https://www.youtube.com/embed/lHD82la2uvU")`