Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Project_Outline.ipynb #54

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
193 changes: 192 additions & 1 deletion Project_Outline.ipynb
Original file line number Diff line number Diff line change
@@ -1 +1,192 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Project Outline.ipynb","provenance":[],"authorship_tag":"ABX9TyPZl4d0nA5Qmq8X1mDqSb1O"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# **Title of Project**"],"metadata":{"id":"dqZ-nhxiganh"}},{"cell_type":"markdown","source":["-------------"],"metadata":{"id":"gScHkw6jjrLo"}},{"cell_type":"markdown","source":["## **Objective**"],"metadata":{"id":"Xns_rCdhh-vZ"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"9sPvnFM1iI9l"}},{"cell_type":"markdown","source":["## **Data Source**"],"metadata":{"id":"-Vbnt9CciKJP"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"sGcv5WqQiNyl"}},{"cell_type":"markdown","source":["## **Import Library**"],"metadata":{"id":"r7GrZzX0iTlV"}},{"cell_type":"code","source":[""],"metadata":{"id":"UkK6NH9DiW-X"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Import Data**"],"metadata":{"id":"9lHPQj1XiOUc"}},{"cell_type":"code","source":[""],"metadata":{"id":"zcU1fdnGho6M"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Describe Data**"],"metadata":{"id":"7PUnimBoiX-x"}},{"cell_type":"code","source":[""],"metadata":{"id":"kG15arusiZ8Z"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Visualization**"],"metadata":{"id":"oBGX4Ekniriz"}},{"cell_type":"code","source":[""],"metadata":{"id":"lW-OIRK0iuzO"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Data Preprocessing**"],"metadata":{"id":"UqfyPOCYiiww"}},{"cell_type":"code","source":[""],"metadata":{"id":"3cyr3fbGin0A"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Define Target Variable (y) and Feature Variables (X)**"],"metadata":{"id":"2jXJpdAuiwYW"}},{"cell_type":"code","source":[""],"metadata":{"id":"QBCakTuli57t"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Train Test Split**"],"metadata":{"id":"90_0q_Pbi658"}},{"cell_type":"code","source":[""],"metadata":{"id":"u60YYaOFi-Dw"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Modeling**"],"metadata":{"id":"cIhyseNria7W"}},{"cell_type":"code","source":[""],"metadata":{"id":"Toq58wpkjCw7"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Model Evaluation**"],"metadata":{"id":"vhAwWfG0jFun"}},{"cell_type":"code","source":[""],"metadata":{"id":"lND3jJj_jhx4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Prediction**"],"metadata":{"id":"8AzwG7oLjiQI"}},{"cell_type":"code","source":[""],"metadata":{"id":"JLebGzDJjknA"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## **Explaination**"],"metadata":{"id":"SBo38CJZjlEX"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"Ybi8FR9Kjv00"}}]}
TITLE : movie recommendation system

Libraries:

1. Pandas for data manipulation
2. NumPy for numerical computations
3. Scikit-learn for machine learning
4. Matplotlib and Seaborn for visualization

Dataset:

We'll use the MovieLens dataset, which contains:

1. 100,000 ratings from 600 users on 9,000 movies
2. Movie genres, titles, and release years

Project Structure:

1. Data Preprocessing


import pandas as pd

# Load ratings and movies datasets
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

# Merge datasets on movie ID
data = pd.merge(ratings, movies, on='movieId')


2. Exploratory Data Analysis


import matplotlib.pyplot as plt
import seaborn as sns

# Visualize rating distribution
sns.histplot(data['rating'])
plt.show()

# Top 10 most rated movies
top_movies = data['movieId'].value_counts().head(10)
print(top_movies)


3. Recommendation Algorithm

We'll implement a simple collaborative filtering algorithm using Matrix Factorization.


from sklearn.decomposition import NMF

# Create user-item matrix
user_item_matrix = pd.pivot_table(data, values='rating', index='userId', columns='movieId')

# Apply Matrix Factorization
nmf = NMF(n_components=10)
user_features = nmf.fit_transform(user_item_matrix)
movie_features = nmf.components_.T


4. Recommendation Generation


def recommend_movies(user_id, num_recommendations):
# Get user features
user_feature = user_features[user_id]

# Calculate similarities with all movies
similarities = cosine_similarity(user_feature.reshape(1, -1), movie_features)

# Get top N recommended movies
recommended_movies = movies.iloc[similarities.argsort()[-num_recommendations:]]

return recommended_movies

# Test recommendation
recommended_movies = recommend_movies(1, 5)
print(recommended_movies)


5. Evaluation


from sklearn.metrics import precision_score, recall_score

# Evaluate recommendation accuracy
precision = precision_score(data['rating'], recommended_movies['rating'])
recall = recall_score(data['rating'], recommended_movies['rating'])

print(f'Precision: {precision:.3f}, Recall: {recall:.3f}')


Example Use Cases:

1. User-based recommendations: Recommend movies to users based on their past ratings.
2. Movie-based recommendations: Recommend similar movies to users who liked a particular movie.
3. Hybrid recommendations: Combine user-based and movie-based recommendations.

Future Enhancements:

1. Incorporate additional data (e.g., genres, directors, actors)
2. Implement more advanced algorithms (e.g., Deep Learning, Graph-based methods)
3. Develop a web application for interactive recommendations
Here's the complete code for the movie recommendation system:


# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt
import seaborn as sns

# Load MovieLens dataset
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

# Merge datasets on movie ID
data = pd.merge(ratings, movies, on='movieId')

# Exploratory Data Analysis
sns.histplot(data['rating'])
plt.title('Rating Distribution')
plt.show()

# Top 10 most rated movies
top_movies = data['movieId'].value_counts().head(10)
print('Top 10 Most Rated Movies:')
print(top_movies)

# Create user-item matrix
user_item_matrix = pd.pivot_table(data, values='rating', index='userId', columns='movieId')

# Apply Matrix Factorization
nmf = NMF(n_components=10)
user_features = nmf.fit_transform(user_item_matrix)
movie_features = nmf.components_.T

# Recommendation function
def recommend_movies(user_id, num_recommendations):
# Get user features
user_feature = user_features[user_id]

# Calculate similarities with all movies
similarities = cosine_similarity(user_feature.reshape(1, -1), movie_features)

# Get top N recommended movies
recommended_movies = movies.iloc[similarities.argsort()[-num_recommendations:]]

return recommended_movies

# Test recommendation
recommended_movies = recommend_movies(1, 5)
print('Recommended Movies for User 1:')
print(recommended_movies)

# Evaluate recommendation accuracy
precision = precision_score(data['rating'], recommended_movies['rating'])
recall = recall_score(data['rating'], recommended_movies['rating'])

print(f'Precision: {precision:.3f}, Recall: {recall:.3f}')


Dataset:

You can download the MovieLens dataset from:

(link unavailable)

Note:

- This code uses the ml-latest-small dataset, which contains 100,000 ratings.
- For larger datasets, consider using more advanced techniques or optimized libraries.

Requirements:

- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn

Future Enhancements:

1. Incorporate additional data (e.g., genres, directors, actors)
2. Implement more advanced algorithms (e.g., Deep Learning, Graph-based methods)
3. Develop a web application for interactive recommendations