
Added R materials #1

Open · wants to merge 3 commits into base: master
571 changes: 571 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/0_intro_to_ML_R-checkpoint.ipynb

Large diffs are not rendered by default.

710 changes: 710 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/1_linear_regression_R-checkpoint.ipynb

Large diffs are not rendered by default.

608 changes: 608 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/2_logistic_regression_R-checkpoint.ipynb

Large diffs are not rendered by default.

853 changes: 853 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/3_trees_R-checkpoint.ipynb

Large diffs are not rendered by default.

199 changes: 199 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/4_unsupervised-checkpoint.ipynb
@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notes:\n",
"* Using the Titanic dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In unsupervised learning (UML), no labels are provided, and the learning algorithm focuses solely on detecting structure in unlabelled input data. One generally differentiates between\n",
"\n",
"Clustering, where the goal is to find homogeneous subgroups within the data; the grouping is based on distance between observations.\n",
"\n",
"Dimensionality reduction, where the goal is to identify patterns in the features of the data. Dimensionality reduction is often used to facilitate visualisation of the data, as well as a pre-processing method before supervised learning.\n",
"\n",
"UML presents specific challenges and benefits:\n",
"\n",
"there is no single goal in UML\n",
"there is generally much more unlabelled data available than labelled data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# kmeans clustering\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"k-means:\n",
"* n observations\n",
"* k clusters (we choose k)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"stats::kmeans(x, centers = 3, nstart = 10)\n",
"\n",
"x -> numeric data matrix\n",
"centers is k (# clusters)\n",
"nstart is number of times we can repeat process (to improve model\n",
" \n",
" \n",
"cl <- kmeans(x, 3, nstart = 10)\n",
"plot(x, col = cl$cluster))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"how it works:\n",
"Initialisation: randomly assign class membership\n",
"\n",
"Iteration:\n",
"\n",
"Calculate the centre of each subgroup as the average position of all observations is that subgroup.\n",
"Each observation is then assigned to the group of its nearest centre.\n",
"\n",
"\n",
"Get convergence GIF from Wikipedia"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"choosing the best model: (elbow method)\n",
"ks <- 1:5\n",
"tot_within_ss <- sapply(ks, function(k) {\n",
" cl <- kmeans(x, k, nstart = 10)\n",
" cl$tot.withinss\n",
"})\n",
"plot(ks, tot_within_ss, type = \"b\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PCA\n",
"\n",
"4.5 Principal component analysis (PCA)\n",
"\n",
"Dimensionality reduction techniques are widely used and versatile techniques that can be used o\n",
"\n",
"find structure in features\n",
"pre-processing for other ML algorithms, and\n",
"as an aid in visualisation.\n",
"The basic principle of dimensionality reduction techniques is to transform the data into a new space that summarise properties of the whole data set along a reduced number of dimensions. These are then ideal candidates used to visualise the data along these reduced number of informative dimensions.\n",
"\n",
"4.5.1 How does it work\n",
"\n",
"Principal Component Analysis (PCA) is a technique that transforms the original n-dimensional data into a new n-dimensional space.\n",
"\n",
"These new dimensions are linear combinations of the original data, i.e. they are composed of proportions of the original variables.\n",
"Along these new dimensions, called principal components, the data expresses most of its variability along the first PC, then second, …\n",
"Principal components are orthogonal to each other, i.e. non-correlated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"In R, we can use the prcomp function.\n",
"\n",
"Let’s explore PCA on the iris data. While it contains only 4 variables, is already becomes difficult to visualise the 3 groups along all these dimensions.\n",
"\n",
"pairs(iris[, -5], col = iris[, 5], pch = 19)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Let's use PCA to reduce the dimension.\n",
"irispca <- prcomp(iris[, -5])\n",
"summary(irispca)\n",
"## Importance of components:\n",
"##                           PC1     PC2    PC3     PC4\n",
"## Standard deviation     2.0563 0.49262 0.2797 0.15439\n",
"## Proportion of Variance 0.9246 0.05307 0.0171 0.00521\n",
"## Cumulative Proportion  0.9246 0.97769 0.9948 1.00000\n",
"# The summary shows that along PC1 alone, we are able to retain\n",
"# over 92% of the total variability in the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DBSCAN\n",
"\n",
"Density-Based Spatial Clustering of Applications with Noise"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
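The DBSCAN section of the notebook above introduces the name but leaves its code cell empty. A minimal sketch of what it might contain, assuming the `dbscan` package is installed; the `eps` and `minPts` values are illustrative, not tuned:

```r
# DBSCAN groups points that lie in dense regions; points belonging to no
# dense region are labelled 0 (noise). eps is the neighbourhood radius and
# minPts the minimum neighbourhood size for a point to count as "dense".
library(dbscan)

x <- as.matrix(iris[, -5])

# Common heuristic: pick eps near the "knee" of the k-NN distance plot.
kNNdistplot(x, k = 5)

cl <- dbscan(x, eps = 0.5, minPts = 5)
table(cl$cluster)                      # cluster sizes; cluster 0 = noise
plot(x[, 1:2], col = cl$cluster + 1L)  # colour points by cluster
```

Unlike k-means, DBSCAN does not require choosing the number of clusters in advance and can mark outliers as noise rather than forcing them into a cluster.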
130 changes: 130 additions & 0 deletions bootcamp/R/.ipynb_checkpoints/5_basic_neural_networks-checkpoint.ipynb
@@ -0,0 +1,130 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using r-tensorflow conda environment for TensorFlow installation\n",
"Determining latest installable release of TensorFlow...done\n",
"Installing TensorFlow...\n",
"\n",
"Installation complete.\n",
"\n"
]
}
],
"source": [
"library(keras)\n",
"install_keras()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"ename": "ERROR",
"evalue": "Error: ImportError: cannot import name 'abs'\n\nDetailed traceback: \n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/__init__.py\", line 3, in <module>\n from . import utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/__init__.py\", line 6, in <module>\n from . import conv_utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/conv_utils.py\", line 9, in <module>\n from .. import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/__init__.py\", line 87, in <module>\n from .tensorflow_backend import *\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py\", line 5, in <module>\n import tensorflow as tf\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/__init__.py\", line 22, in <module>\n from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/__init__.py\", line 81, in <module>\n from tensorflow.python import keras\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py\", line 24, in <module>\n from tensorflow.python.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.activations import elu\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py\", line 21, in <module>\n from tensorflow.python.keras._impl.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py\", line 
23, in <module>\n from tensorflow.python.keras._impl.keras import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py\", line 37, in <module>\n from tensorflow.python.layers import base as tf_base_layers\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py\", line 25, in <module>\n from tensorflow.python.keras.engine import base_layer\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py\", line 21, in <module>\n from tensorflow.python.keras.engine.base_layer import InputSpec\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py\", line 33, in <module>\n from tensorflow.python.keras import backend\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.backend import abs\n\n",
"output_type": "error",
"traceback": [
"Error: ImportError: cannot import name 'abs'\n\nDetailed traceback: \n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/__init__.py\", line 3, in <module>\n from . import utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/__init__.py\", line 6, in <module>\n from . import conv_utils\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/utils/conv_utils.py\", line 9, in <module>\n from .. import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/__init__.py\", line 87, in <module>\n from .tensorflow_backend import *\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py\", line 5, in <module>\n import tensorflow as tf\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/__init__.py\", line 22, in <module>\n from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/__init__.py\", line 81, in <module>\n from tensorflow.python import keras\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py\", line 24, in <module>\n from tensorflow.python.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.activations import elu\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py\", line 21, in <module>\n from tensorflow.python.keras._impl.keras import activations\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py\", line 23, in 
<module>\n from tensorflow.python.keras._impl.keras import backend as K\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py\", line 37, in <module>\n from tensorflow.python.layers import base as tf_base_layers\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py\", line 25, in <module>\n from tensorflow.python.keras.engine import base_layer\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py\", line 21, in <module>\n from tensorflow.python.keras.engine.base_layer import InputSpec\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py\", line 33, in <module>\n from tensorflow.python.keras import backend\n File \"/Users/caitlin/anaconda/envs/r-tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py\", line 22, in <module>\n from tensorflow.python.keras._impl.keras.backend import abs\n\nTraceback:\n",
"1. dataset_mnist()",
"2. keras$datasets",
"3. `$.python.builtin.module`(keras, \"datasets\")",
"4. py_resolve_module_proxy(x)",
"5. on_error(result)",
"6. stop(e$message, call. = FALSE)"
]
}
],
"source": [
"mnist <- dataset_mnist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"\n",
"\n",
"\n",
"x_train <- mnist$train$x\n",
"y_train <- mnist$train$y\n",
"x_test <- mnist$test$x\n",
"y_test <- mnist$test$y\n",
"\n",
"# reshape\n",
"x_train <- array_reshape(x_train, c(nrow(x_train), 784))\n",
"x_test <- array_reshape(x_test, c(nrow(x_test), 784))\n",
"# rescale\n",
"x_train <- x_train / 255\n",
"x_test <- x_test / 255\n",
"\n",
"y_train <- to_categorical(y_train, 10)\n",
"y_test <- to_categorical(y_test, 10)\n",
"\n",
"model <- keras_model_sequential() \n",
"model %>% \n",
" layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% \n",
" layer_dropout(rate = 0.4) %>% \n",
" layer_dense(units = 128, activation = 'relu') %>%\n",
" layer_dropout(rate = 0.3) %>%\n",
" layer_dense(units = 10, activation = 'softmax')\n",
"\n",
"summary(model)\n",
"\n",
"model %>% compile(\n",
" loss = 'categorical_crossentropy',\n",
" optimizer = optimizer_rmsprop(),\n",
" metrics = c('accuracy')\n",
")\n",
"\n",
"history <- model %>% fit(\n",
" x_train, y_train, \n",
" epochs = 30, batch_size = 128, \n",
" validation_split = 0.2\n",
")\n",
"\n",
"plot(history)\n",
"\n",
"model %>% evaluate(x_test, y_test)\n",
"\n",
"model %>% predict_classes(x_test)\n",
"\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
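The `to_categorical(y_train, 10)` call in the notebook above one-hot encodes the integer digit labels. The same transformation can be written in a few lines of base R; `to_one_hot` is a hypothetical helper name, sketched here only to show what the encoding does:

```r
# One-hot encode integer labels 0..(num_classes - 1) into a 0/1 matrix,
# mirroring what keras::to_categorical produces.
to_one_hot <- function(y, num_classes) {
  m <- matrix(0L, nrow = length(y), ncol = num_classes)
  m[cbind(seq_along(y), y + 1L)] <- 1L  # label j sets column j + 1
  m
}

y <- c(0L, 3L, 9L)
to_one_hot(y, 10)
```

Each row contains a single 1 in the column corresponding to that observation's label, which is the representation expected by the `categorical_crossentropy` loss used above.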