-
Notifications
You must be signed in to change notification settings - Fork 112
/
collaborative-filter.Rmd
61 lines (38 loc) · 3.26 KB
/
collaborative-filter.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
---
title: "collaborative-filter"
author: "Charles Lang"
date: "1/31/2019"
output: html_document
---
In HUDK4051 there are six units, we will use your ratings of these units in terms of both interest and difficulty to produce individual suggestions about what unit to attempt next.
Start by uploading both the interest and difficulty csv files:
```{r}
```
We will be using matrix operations in this assignment, so convert your data frames to matrices:
```{r, echo = FALSE}
#HINT: First you will need to remove the student ids as matrices can only contain one data type. You will then need to rename your row names with the student ids.
```
First, lets look at the interest data. We can generate a user-based similarity matrix based on cosine similarity using the ratings the class gave each unit. This matrix will represent the similarity of interests between students in the class.
```{r, echo = FALSE}
#First let's transpose the matrix so that multiplication occurs by students rather than units.
#Look at your data, missing values are coded with zero not NA. Investigate cosine similarity and decide whether your missing values should be coded with zeros or NAs. Explain your choice.
I2 <- t(I2)
#Then we can generate the cosine similarity values for each pair of students
#install.packages("lsa") #You will need to install the lsa package to access the cosine command.
library(lsa)
I.SIM <- cosine(I2) #This command generates the cosine similarity values as a new matrix. Click on I.SIM in the Global Environment pane to see what it looks like.
diag(I.SIM) <- NA #Since each student will be most similar to themselves we want to remove that information
```
Now, we can make a quick query to find out which students are most similar to you.
```{r}
my.name <- "XXXX" #Input your name as it appears in the data set
head(rownames(I.SIM[order(I.SIM[my.name,], decreasing = TRUE),]), n = 2) #This code orders the column of the matrix corresponding to your UNI according to similarity and returns the top two UNI ids for the students who's interests are most similar to yours
```
This is a basic collaborative filter! You have used information about interest across the class to generate an individual suggestion. Email one of your top matches, you may find them to be a good person to work with or ask questions during the semester.
Now create a unit-based, rather than student-based similarity matrix for difficulty. Then use your similarity matrix to provide a suggested next unit to a student who is looking for the unit that is most similar in terms of difficulty to the "prediction" unit.
```{r}
```
Finally, educational settings have important differences to purely commercial settings such as film or product suggestions. In education we want people not to just follow their interests as they may simply choose things that are easy for them so they learn very little. To reduce this possibility with your collaborative filter create a composite measure from interest and difficulty, then construct a similarity matrix using this measure. (HINT: PCA). Once you have built the similarity matrix generate a suggestion for a student who has just completed the "prediction" unit.
```{r}
```
Once you have completed your collaborative filter you can return to it each time you are choosing a new unit to complete.