This project analyzes the top 10 most popular movies in an IMDb dataset. Through visualizations and summary statistics, you can explore the relationships between variables such as budget, revenue, popularity, and vote average.
- Correlation Analysis: View a correlation matrix of key variables.
- Genre Analysis: Bar chart and pie chart showing genre distributions.
- Release Year Analysis: Bar chart showing the number of movies released per year.
- Budget vs Revenue: Scatter plot comparing the budget and revenue of the top 10 movies.
- Popularity vs Vote Average: Scatter plot comparing movie popularity with average votes.
- Summary Statistics: Descriptive statistics for the top 10 movies dataset.
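
As a rough illustration of the correlation and summary-statistics features listed above, the sketch below selects the 10 most popular rows and computes both tables with pandas. The column names (`title`, `popularity`, `budget`, `revenue`, `vote_average`) and the small inline table are assumptions made for illustration; the values are placeholders, not real IMDb figures, and the actual script may be organized differently.

```python
import pandas as pd

# Placeholder rows for illustration only; these are not real IMDb values.
movies = pd.DataFrame(
    {
        "title": [f"Movie {i}" for i in range(1, 13)],
        "popularity": [95, 90, 88, 85, 80, 78, 75, 70, 65, 60, 20, 10],
        "budget": [200, 150, 180, 90, 120, 60, 100, 40, 30, 50, 5, 2],
        "revenue": [800, 500, 650, 300, 400, 150, 320, 90, 70, 130, 8, 1],
        "vote_average": [8.1, 7.4, 7.9, 7.0, 7.2, 6.5, 7.1, 6.0, 6.2, 6.8, 5.0, 4.5],
    }
)

# Assumed names of the numeric columns used in the analysis.
NUMERIC_COLUMNS = ["popularity", "budget", "revenue", "vote_average"]


def top_n_by_popularity(df, n=10):
    """Return the n rows with the highest popularity, most popular first."""
    return df.nlargest(n, "popularity").reset_index(drop=True)


top10 = top_n_by_popularity(movies)

# Pairwise Pearson correlation between the numeric columns.
correlation = top10[NUMERIC_COLUMNS].corr()

# Count, mean, standard deviation, min, quartiles, and max per column.
summary = top10[NUMERIC_COLUMNS].describe()

print(correlation.round(2))
print(summary.round(2))
```

A table like `correlation` could then be drawn with `seaborn.heatmap(correlation, annot=True)` to get a correlation-matrix plot.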
- Clone the repository:
    git clone https://github.com/yourusername/top-10-movie-analyzer.git
- Navigate to the project directory:
    cd top-10-movie-analyzer
- Install the required Python packages:
    pip install pandas seaborn matplotlib
- Download the IMDb dataset: The analyzer reads an IMDb dataset in CSV format. Download it from IMDb datasets and place it where the script expects it, or update the file path in the script.
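
The dataset step above depends on the script finding the CSV file. One way to make that path easy to change is sketched below. The default path `data/movies.csv` and the function name `load_movies` are hypothetical; the real script may store the path differently.

```python
from pathlib import Path

import pandas as pd

# Hypothetical default location; change it to wherever the CSV was saved.
DEFAULT_DATASET_PATH = Path("data") / "movies.csv"


def load_movies(csv_path=DEFAULT_DATASET_PATH):
    """Load the movie dataset, failing with a clear message if it is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {csv_path}. "
            "Download the CSV or update the path."
        )
    return pd.read_csv(csv_path)


if __name__ == "__main__":
    try:
        movies = load_movies()
        print(movies.head())
    except FileNotFoundError as err:
        print(err)
```

Checking for the file before reading it gives a message that points at the path setting instead of a long pandas traceback.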
- Run the Movie Analyzer:
    python top_10_movie_analyzer.py
- Choose the type of analysis:
    - Correlation matrix
    - Genre distribution (bar chart/pie chart)
    - Movies released per year
    - Budget vs. Revenue comparison
    - Popularity vs. Vote Average
    - Summary statistics, and more.
- Explore various plots: You can interactively explore different aspects of the top 10 movies by following the menu in the terminal.
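
A terminal menu like the one described above is often built as a mapping from a typed choice to a handler function. The sketch below shows that pattern using only the standard library. The option numbers, labels, and handler names are hypothetical, and the handlers return a string where a real one would draw a plot.

```python
# Hypothetical handlers; a real one would compute and display a plot.
def show_correlation_matrix():
    return "correlation matrix"


def show_genre_distribution():
    return "genre distribution"


def show_summary_statistics():
    return "summary statistics"


# Maps the key the user types to a (label, handler) pair.
MENU = {
    "1": ("Correlation matrix", show_correlation_matrix),
    "2": ("Genre distribution", show_genre_distribution),
    "3": ("Summary statistics", show_summary_statistics),
}


def format_menu(menu=MENU):
    """Build the menu text, one option per line, ending with a quit option."""
    lines = [f"{key}. {label}" for key, (label, _) in menu.items()]
    lines.append("q. Quit")
    return "\n".join(lines)


def dispatch(choice, menu=MENU):
    """Run the handler for a choice, or return None for an unknown choice."""
    entry = menu.get(choice.strip())
    if entry is None:
        return None
    _, handler = entry
    return handler()


def run_menu(read_input=input, write=print):
    """Show the menu repeatedly until the user enters 'q'."""
    while True:
        write(format_menu())
        choice = read_input("Choose an analysis: ")
        if choice.strip().lower() == "q":
            break
        result = dispatch(choice)
        write(f"Showing: {result}" if result else "Unknown option, try again.")
```

Calling `run_menu()` starts the interactive loop. Passing `read_input` and `write` as parameters keeps the loop testable without a real terminal.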
Requirements:
- Python 3.x
- pandas
- seaborn
- matplotlib
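
To confirm that the packages listed above are installed before running the analyzer, a small check along these lines may help. It is a sketch that assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; the function name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Third-party packages named in the requirements list.
REQUIRED_PACKAGES = ["pandas", "seaborn", "matplotlib"]


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```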
This project uses IMDb data, which is available as the CSV files uploaded to this repository.