
Added R materials #1

Open · wants to merge 3 commits into base: master
571 changes: 571 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/0_intro_to_ML_R-checkpoint.ipynb

Large diffs are not rendered by default.

710 changes: 710 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/1_linear_regression_R-checkpoint.ipynb

Large diffs are not rendered by default.

608 changes: 608 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/2_logistic_regression_R-checkpoint.ipynb

Large diffs are not rendered by default.

853 changes: 853 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/3_trees_R-checkpoint.ipynb

Large diffs are not rendered by default.

199 changes: 199 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/4_unsupervised-checkpoint.ipynb
@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notes:\n",
"* Using the Titanic dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In unsupervised learning (UML), no labels are provided, and the learning algorithm focuses solely on detecting structure in unlabelled input data. One generally differentiates between\n",
"\n",
"Clustering, where the goal is to find homogeneous subgroups within the data; the grouping is based on distance between observations.\n",
"\n",
"Dimensionality reduction, where the goal is to identify patterns in the features of the data. Dimensionality reduction is often used to facilitate visualisation of the data, as well as a pre-processing method before supervised learning.\n",
"\n",
"UML presents specific challenges and benefits:\n",
"\n",
"there is no single goal in UML\n",
"there is generally much more unlabelled data available than labelled data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# kmeans clustering\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"k-means:\n",
"* n observations\n",
"* k clusters (we choose k)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"stats::kmeans(x, centers = 3, nstart = 10)\n",
"\n",
"x -> numeric data matrix\n",
"centers is k (# clusters)\n",
"nstart is number of times we can repeat process (to improve model\n",
" \n",
" \n",
"cl <- kmeans(x, 3, nstart = 10)\n",
"plot(x, col = cl$cluster))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"how it works:\n",
"Initialisation: randomly assign class membership\n",
"\n",
"Iteration:\n",
"\n",
"Calculate the centre of each subgroup as the average position of all observations is that subgroup.\n",
"Each observation is then assigned to the group of its nearest centre.\n",
"\n",
"\n",
"Get convergence GIF from Wikipedia"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"choosing the best model: (elbow method)\n",
"ks <- 1:5\n",
"tot_within_ss <- sapply(ks, function(k) {\n",
" cl <- kmeans(x, k, nstart = 10)\n",
" cl$tot.withinss\n",
"})\n",
"plot(ks, tot_within_ss, type = \"b\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PCA\n",
"\n",
"4.5 Principal component analysis (PCA)\n",
"\n",
"Dimensionality reduction techniques are widely used and versatile techniques that can be used o\n",
"\n",
"find structure in features\n",
"pre-processing for other ML algorithms, and\n",
"as an aid in visualisation.\n",
"The basic principle of dimensionality reduction techniques is to transform the data into a new space that summarise properties of the whole data set along a reduced number of dimensions. These are then ideal candidates used to visualise the data along these reduced number of informative dimensions.\n",
"\n",
"4.5.1 How does it work\n",
"\n",
"Principal Component Analysis (PCA) is a technique that transforms the original n-dimensional data into a new n-dimensional space.\n",
"\n",
"These new dimensions are linear combinations of the original data, i.e. they are composed of proportions of the original variables.\n",
"Along these new dimensions, called principal components, the data expresses most of its variability along the first PC, then second, …\n",
"Principal components are orthogonal to each other, i.e. non-correlated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"In R, we can use the prcomp function.\n",
"\n",
"Let’s explore PCA on the iris data. While it contains only 4 variables, is already becomes difficult to visualise the 3 groups along all these dimensions.\n",
"\n",
"pairs(iris[, -5], col = iris[, 5], pch = 19)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Let's use PCA to reduce the dimension.\n",
"irispca <- prcomp(iris[, -5])\n",
"summary(irispca)\n",
"## Importance of components:\n",
"##                           PC1     PC2    PC3     PC4\n",
"## Standard deviation     2.0563 0.49262 0.2797 0.15439\n",
"## Proportion of Variance 0.9246 0.05307 0.0171 0.00521\n",
"## Cumulative Proportion  0.9246 0.97769 0.9948 1.00000\n",
"# The summary shows that along PC1 alone, we are able to retain\n",
"# over 92% of the total variability in the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DBSCAN\n",
"\n",
"Density-Based Spatial Clustering of Applications with Noise"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
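The DBSCAN section of the notebook above introduces the name but leaves its code cell empty. A minimal sketch of what it might contain, assuming the `dbscan` package is installed; the `eps` and `minPts` values are illustrative, not tuned:

```r
# DBSCAN groups points that lie in dense regions; points belonging to no
# dense region are labelled 0 (noise). eps is the neighbourhood radius and
# minPts the minimum neighbourhood size for a point to count as "dense".
library(dbscan)

x <- as.matrix(iris[, -5])

# Common heuristic: pick eps near the "knee" of the k-NN distance plot.
kNNdistplot(x, k = 5)

cl <- dbscan(x, eps = 0.5, minPts = 5)
table(cl$cluster)                      # cluster sizes; cluster 0 = noise
plot(x[, 1:2], col = cl$cluster + 1L)  # colour points by cluster
```

Unlike k-means, DBSCAN does not require choosing the number of clusters in advance and can mark outliers as noise rather than forcing them into a cluster.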
130 changes: 130 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/5_basic_neural_networks-checkpoint.ipynb
@@ -0,0 +1,130 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using r-tensorflow conda environment for TensorFlow installation\n",
"Determining latest installable release of TensorFlow...done\n",
"Installing TensorFlow...\n",
"\n",
"Installation complete.\n",
"\n"
]
}
],
"source": [
"library(keras)\n",
"install_keras()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"ename": "ERROR",
"evalue": "Error: ImportError: cannot import name 'abs'\n\nDetailed traceback: \n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/__init__.py\", line 3, in <module>\n from . import utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/__init__.py\", line 6, in <module>\n from . import conv_utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/conv_utils.py\", line 9, in <module>\n from .. import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/__init__.py\", line 87, in <module>\n from .tensorflow_backend import *\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py\", line 5, in <module>\n import tensorflow as tf\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/__init__.py\", line 22, in <module>\n from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/__init__.py\", line 81, in <module>\n from tensorflow.python import keras\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py\", line 24, in <module>\n from tensorflow.python.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.activations import elu\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py\", line 21, in <module>\n from tensorflow.python.keras._impl.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py\", line 
23, in <module>\n from tensorflow.python.keras._impl.keras import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py\", line 37, in <module>\n from tensorflow.python.layers import base as tf_base_layers\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py\", line 25, in <module>\n from tensorflow.python.keras.engine import base_layer\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py\", line 21, in <module>\n from tensorflow.python.keras.engine.base_layer import InputSpec\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py\", line 33, in <module>\n from tensorflow.python.keras import backend\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.backend import abs\n\n",
"output_type": "error",
"traceback": [
"Error: ImportError: cannot import name 'abs'\n\nDetailed traceback: \n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/__init__.py\", line 3, in <module>\n from . import utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/__init__.py\", line 6, in <module>\n from . import conv_utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/conv_utils.py\", line 9, in <module>\n from .. import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/__init__.py\", line 87, in <module>\n from .tensorflow_backend import *\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py\", line 5, in <module>\n import tensorflow as tf\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/__init__.py\", line 22, in <module>\n from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/__init__.py\", line 81, in <module>\n from tensorflow.python import keras\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py\", line 24, in <module>\n from tensorflow.python.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.activations import elu\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py\", line 21, in <module>\n from tensorflow.python.keras._impl.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py\", line 23, in 
<module>\n from tensorflow.python.keras._impl.keras import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py\", line 37, in <module>\n from tensorflow.python.layers import base as tf_base_layers\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py\", line 25, in <module>\n from tensorflow.python.keras.engine import base_layer\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py\", line 21, in <module>\n from tensorflow.python.keras.engine.base_layer import InputSpec\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py\", line 33, in <module>\n from tensorflow.python.keras import backend\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.backend import abs\n\nTraceback:\n",
"1. dataset_mnist()",
"2. keras$datasets",
"3. `$.python.builtin.module`(keras, \"datasets\")",
"4. py_resolve_module_proxy(x)",
"5. on_error(result)",
"6. stop(e$message, call. = FALSE)"
]
}
],
"source": [
"mnist <- dataset_mnist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"x_train <- mnist$train$x\n",
"y_train <- mnist$train$y\n",
"x_test <- mnist$test$x\n",
"y_test <- mnist$test$y\n",
"\n",
"# reshape\n",
"x_train <- array_reshape(x_train, c(nrow(x_train), 784))\n",
"x_test <- array_reshape(x_test, c(nrow(x_test), 784))\n",
"# rescale\n",
"x_train <- x_train / 255\n",
"x_test <- x_test / 255\n",
"\n",
"y_train <- to_categorical(y_train, 10)\n",
"y_test <- to_categorical(y_test, 10)\n",
"\n",
"model <- keras_model_sequential() \n",
"model %>% \n",
" layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% \n",
" layer_dropout(rate = 0.4) %>% \n",
" layer_dense(units = 128, activation = 'relu') %>%\n",
" layer_dropout(rate = 0.3) %>%\n",
" layer_dense(units = 10, activation = 'softmax')\n",
"\n",
"summary(model)\n",
"\n",
"model %>% compile(\n",
" loss = 'categorical_crossentropy',\n",
" optimizer = optimizer_rmsprop(),\n",
" metrics = c('accuracy')\n",
")\n",
"\n",
"history <- model %>% fit(\n",
" x_train, y_train, \n",
" epochs = 30, batch_size = 128, \n",
" validation_split = 0.2\n",
")\n",
"\n",
"plot(history)\n",
"\n",
"model %>% evaluate(x_test, y_test)\n",
"\n",
"model %>% predict_classes(x_test)\n",
"\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
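The `to_categorical(y_train, 10)` call in the notebook above one-hot encodes the integer digit labels. The same transformation can be written in a few lines of base R; `to_one_hot` is a hypothetical helper name, sketched here only to show what the encoding does:

```r
# One-hot encode integer labels 0..(num_classes - 1) into a 0/1 matrix,
# mirroring what keras::to_categorical produces.
to_one_hot <- function(y, num_classes) {
  m <- matrix(0L, nrow = length(y), ncol = num_classes)
  m[cbind(seq_along(y), y + 1L)] <- 1L  # label j sets column j + 1
  m
}

y <- c(0L, 3L, 9L)
to_one_hot(y, 10)
```

Each row contains a single 1 in the column corresponding to that observation's label, which is the representation expected by the `categorical_crossentropy` loss used above.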