A recommender system, or recommendation system (sometimes "system" is replaced with a synonym such as platform or engine), is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are used primarily in commercial applications.
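To make "predict the rating a user would give to an item" concrete, here is a minimal sketch of a common baseline predictor: global mean plus an item bias plus a user bias. This is an illustrative technique, not the method used later in this notebook; the toy data, the function name `predict_rating`, and the 0.5 to 5.0 clipping range (MovieLens' star scale) are assumptions.

```python
import pandas as pd

# Toy ratings in MovieLens' long format; the values are made up for illustration.
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.0],
})

def predict_rating(ratings, user_id, movie_id, low=0.5, high=5.0):
    """Hypothetical baseline: global mean + item bias + user bias, clipped to the scale."""
    mu = ratings["rating"].mean()
    item = ratings.loc[ratings["movieId"] == movie_id, "rating"]
    user = ratings.loc[ratings["userId"] == user_id, "rating"]
    # An unseen item or user contributes no bias, so the guess falls back toward mu.
    b_i = item.mean() - mu if len(item) else 0.0
    b_u = user.mean() - mu if len(user) else 0.0
    return float(min(high, max(low, mu + b_i + b_u)))

# User 3 has not rated movie 10, so this is a genuine prediction.
print(predict_rating(toy_ratings, user_id=3, movie_id=10))
```

The biases here are plain averages without regularization, so a title with a single rating gets a bias as large as one with hundreds; real systems usually shrink such estimates toward zero.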
import numpy as np
import pandas as pd
ratings_data=pd.read_csv(r'C:\Users\TTos\Desktop\dataset\ml-latest-small\ml-latest-small\ratings.csv')
ratings_data.head()
| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
movies_names=pd.read_csv(r'C:\Users\TTos\Desktop\dataset\ml-latest-small\ml-latest-small\movies.csv')
movies_names.head()
| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
movies_data=pd.merge(ratings_data,movies_names,on="movieId")
movies_data.head()
| | userId | movieId | rating | timestamp | title | genres |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 5 | 1 | 4.0 | 847434962 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 2 | 7 | 1 | 4.5 | 1106635946 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 3 | 15 | 1 | 2.5 | 1510577970 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 4 | 17 | 1 | 4.5 | 1305696483 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
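`pd.merge(..., on="movieId")` defaults to an inner join, so ratings whose `movieId` has no row in `movies.csv` (and movies nobody rated) silently disappear. The toy sketch below, with made-up IDs such as `99`, shows that effect and how `how="left"` with `indicator=True` exposes unmatched rows.

```python
import pandas as pd

# Toy tables: movieId 99 has a rating but no title; movieId 3 has a title but no rating.
toy_ratings = pd.DataFrame({"userId": [1, 1, 2], "movieId": [1, 2, 99], "rating": [4.0, 3.5, 5.0]})
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})

inner = pd.merge(toy_ratings, toy_movies, on="movieId")  # default how="inner"
left = pd.merge(toy_ratings, toy_movies, on="movieId", how="left", indicator=True)

# "_merge" is "left_only" for ratings that found no matching title.
print(inner)
print(left[left["_merge"] == "left_only"])
```

Checking the `left_only` rows once on the real data is a cheap way to confirm the inner join is not dropping ratings.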
movies_data.groupby('title')['rating'].mean().head()
title
'71 (2014) 4.0
'Hellboy': The Seeds of Creation (2004) 4.0
'Round Midnight (1986) 3.5
'Salem's Lot (2004) 5.0
'Til There Was You (1997) 4.0
Name: rating, dtype: float64
movies_data.groupby('title')['rating'].mean().sort_values(ascending=False).head()
title
Karlson Returns (1970) 5.0
Winter in Prostokvashino (1984) 5.0
My Love (2006) 5.0
Sorority House Massacre II (1990) 5.0
Winnie the Pooh and the Day of Concern (1972) 5.0
Name: rating, dtype: float64
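The titles at the top of this ranking are obscure films with a perfect 5.0, which usually means one or two ratings. A minimal sketch of the usual fix is to compute the count alongside the mean and keep only titles with enough ratings; the toy data and the threshold `min_count = 3` are assumptions to tune on the real dataset.

```python
import pandas as pd

# Toy merged data: one obscure film with a single 5.0, one film with four ratings.
toy_data = pd.DataFrame({
    "title": ["Obscure Film (1970)"] + ["Popular Film (1995)"] * 4,
    "rating": [5.0, 4.0, 4.5, 3.5, 4.0],
})

stats = toy_data.groupby("title")["rating"].agg(["mean", "count"])
min_count = 3  # hypothetical threshold
ranked = stats[stats["count"] >= min_count].sort_values("mean", ascending=False)
print(ranked)
```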
# Mean rating per title, plus the number of ratings that mean is based on.
ratings_mean_count = pd.DataFrame(movies_data.groupby('title')['rating'].mean())
ratings_mean_count['rating_count'] = movies_data.groupby('title')['rating'].count()
ratings_mean_count.head()
| title | rating | rating_count |
|---|---|---|
| '71 (2014) | 4.0 | 1 |
| 'Hellboy': The Seeds of Creation (2004) | 4.0 | 1 |
| 'Round Midnight (1986) | 3.5 | 2 |
| 'Salem's Lot (2004) | 5.0 | 1 |
| 'Til There Was You (1997) | 4.0 | 2 |
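The same table can be built in one `groupby` pass with pandas named aggregation (available since pandas 0.25). This sketch uses toy data with hypothetical titles, but the output columns match the `rating` and `rating_count` columns above.

```python
import pandas as pd

# Toy long-format data; titles are placeholders.
toy_data = pd.DataFrame({
    "title": ["Film A", "Film B", "Film B"],
    "rating": [4.0, 3.0, 4.0],
})

# Named aggregation: output column name = (input column, aggregation function).
toy_mean_count = toy_data.groupby("title").agg(
    rating=("rating", "mean"),
    rating_count=("rating", "count"),
)
print(toy_mean_count)
```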
Correlation is a statistical technique that shows whether, and how strongly, pairs of variables are related. For example, height and weight are related: taller people tend to be heavier than shorter people. The relationship isn't perfect, since people of the same height can differ in weight.
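Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand on made-up height and weight values and checks it against pandas' `Series.corr`, whose default method is Pearson.

```python
import math
import pandas as pd

# Illustrative heights (cm) and weights (kg); not real measurements.
height = pd.Series([150.0, 160.0, 170.0, 180.0, 190.0])
weight = pd.Series([52.0, 58.0, 69.0, 72.0, 84.0])

def pearson_r(x, y):
    """Sum of co-deviations divided by the root of the product of squared deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / math.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson_r(height, weight)
print(r_manual, height.corr(weight))
```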
# x: number of ratings per title; y: mean rating per title (both indexed by title).
x = ratings_mean_count['rating_count'].astype(float)
y = ratings_mean_count['rating']
| Pearson's r value | Relationship between x and y |
|---|---|
| equal to 1 | perfect positive linear relationship |
| greater than 0 | positive correlation |
| equal to 0 | no linear correlation (x and y may still be dependent) |
| less than 0 | negative correlation |
| equal to -1 | perfect negative linear relationship |
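The "equal to 0" row deserves care: Pearson's r only measures linear association. In this toy example `ys` is completely determined by `xs`, yet r is zero because the relationship is a symmetric parabola.

```python
import pandas as pd

xs = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = xs ** 2  # fully dependent on xs, but not linearly

r_quad = xs.corr(ys)  # Pearson by default
print(r_quad)
```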
r=x.corr(y)
r2=x.corr(y, method='spearman')
r3=x.corr(y, method='kendall')
from pprint import pprint
pprint(
{
"Pearson's r":r,
"Spearman's rho":r2,
"Kendall's tau":r3
})
{"Kendall's tau": 0.037132866375530676,
"Pearson's r": 0.12730726667013137,
"Spearman's rho": 0.0397780088264808}
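All three coefficients are small here, so the number of ratings tells us little about a title's mean rating. To clarify how they differ, the sketch below implements Spearman's rho as Pearson's r on ranks and Kendall's tau-a by counting concordant and discordant pairs. The function names are hypothetical, and tau-a assumes no ties (pandas' `method='kendall'` delegates to SciPy, which handles ties differently). On a monotonic but non-linear toy series, both rank-based measures reach 1 while Pearson does not.

```python
import itertools
import pandas as pd

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r on ranks (ties receive average ranks)."""
    return x.rank().corr(y.rank())

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        s = (x.iloc[i] - x.iloc[j]) * (y.iloc[i] - y.iloc[j])
        if s > 0:
            score += 1  # concordant pair
        elif s < 0:
            score -= 1  # discordant pair
    return score / len(pairs)

xs = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
cubes = xs ** 3  # monotonic but not linear

print(xs.corr(cubes), spearman_rho(xs, cubes), kendall_tau_a(xs, cubes))
```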
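from random import randint

# True when a correlation coefficient is non-negative.
corrChek = lambda r: r >= 0

def Recommander():
    # If any coefficient shows a non-negative association between popularity
    # and mean rating, return the first five titles (in alphabetical order).
    if any([corrChek(r), corrChek(r2), corrChek(r3)]):
        return ratings_mean_count['rating'].head()
    # Otherwise return a reversed slice with a random stride; the stride starts
    # at 1 because a slice step of 0 raises ValueError.
    return ratings_mean_count['rating'][::-randint(1, len(ratings_mean_count['rating']))]

Recommander()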
title
'71 (2014) 4.0
'Hellboy': The Seeds of Creation (2004) 4.0
'Round Midnight (1986) 3.5
'Salem's Lot (2004) 5.0
'Til There Was You (1997) 4.0
Name: rating, dtype: float64
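The function above returns the alphabetically first titles rather than titles chosen for a particular user. A common next step, sketched here as a different technique than the one in this notebook, is item-based collaborative filtering: pivot the ratings into a user-by-title matrix and rank other titles by the Pearson correlation of their rating columns with a chosen title. The toy data, the function name `similar_titles`, and the `min_common` threshold are assumptions.

```python
import pandas as pd

# Toy long-format ratings; titles and values are placeholders.
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "title":  ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "C"],
    "rating": [5.0, 4.5, 1.0, 4.0, 4.0, 2.0, 2.0, 1.5, 5.0, 3.0, 3.0],
})

def similar_titles(ratings, title, min_common=3):
    """Hypothetical helper: rank titles by Pearson correlation with `title`."""
    user_item = ratings.pivot_table(index="userId", columns="title", values="rating")
    target = user_item[title]
    # Correlation of every title column with the target, using users who rated both.
    sims = user_item.corrwith(target)
    # Number of users who rated both titles; too few makes r unreliable.
    common = user_item.notna().astype(int).mul(target.notna().astype(int), axis=0).sum()
    sims = sims[common >= min_common].drop(index=title, errors="ignore")
    return sims.sort_values(ascending=False)

print(similar_titles(toy_ratings, "A"))
```

On the full MovieLens data, `min_common` plays the same role as the `rating_count` filter: it keeps a title correlated on one or two shared users from topping the list.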