Analyzed IMDb data using Python to uncover trends in title releases, genres, and runtimes, offering insights into evolving viewer preferences. Applied regression models like linear, polynomial, and random forest to predict title ratings and identify key factors influencing audience behavior.

JESUSC1/IMDb-Data-Insights

Comprehensive IMDb Data Analysis: Insights into Movie Trends and Patterns

This project explores IMDb data with Python and visualization tools to uncover title release patterns and viewer preferences. It applies regression models to predict title ratings and lays the groundwork for recommender systems for TV shows and movies, or for revenue prediction models built on IMDb data.

Data Source

The primary data source for this analysis is IMDb, an extensive online database that provides detailed information about films, TV series, podcasts, video games, and other media content.

Analysis

  • Conducted exploratory data analysis (EDA) to determine the dataset's time span and classify titles by type and genre.
  • Visualized the number of titles released each year, identifying predominant title types like TV episodes, movies, and short films.
  • Explored viewer preferences, identifying Drama, Comedy, and Documentary as the most popular genres.
  • Analyzed title runtime trends over the years, highlighting shifts in movie and TV episode durations.
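The EDA steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the data resembles IMDb's public title.basics table (columns `tconst`, `titleType`, `startYear`, `runtimeMinutes`, `genres`, with `\N` marking missing values), and the six rows below are made up for the example.

```python
import pandas as pd

# Tiny synthetic sample shaped like IMDb's title.basics table (all values are made up).
titles = pd.DataFrame(
    {
        "tconst": ["tt1", "tt2", "tt3", "tt4", "tt5", "tt6"],
        "titleType": ["movie", "tvEpisode", "short", "movie", "tvEpisode", "movie"],
        "startYear": ["1999", "2005", "2005", "2005", "\\N", "2010"],
        "runtimeMinutes": ["120", "22", "10", "95", "45", "\\N"],
        "genres": ["Drama", "Comedy", "Documentary,Short", "Comedy,Drama", "Drama", "\\N"],
    }
)

# Non-numeric entries such as the "\N" missing-value marker become NaN.
titles["startYear"] = pd.to_numeric(titles["startYear"], errors="coerce")
titles["runtimeMinutes"] = pd.to_numeric(titles["runtimeMinutes"], errors="coerce")

# Time span covered by the dataset.
year_span = (int(titles["startYear"].min()), int(titles["startYear"].max()))

# Number of titles per release year, split by title type.
titles_per_year = (
    titles.dropna(subset=["startYear"])
    .groupby(["startYear", "titleType"])
    .size()
    .unstack(fill_value=0)
)

# Genre frequency: split the comma-separated genre string into one row per genre.
genre_counts = (
    titles.loc[titles["genres"] != "\\N", "genres"]
    .str.split(",")
    .explode()
    .value_counts()
)

# Median movie runtime per release year, as a simple view of runtime trends.
median_movie_runtime = (
    titles[titles["titleType"] == "movie"]
    .groupby("startYear")["runtimeMinutes"]
    .median()
)

print(year_span)
print(titles_per_year)
print(genre_counts)
print(median_movie_runtime)
```

On the full dataset, `titles_per_year.plot()` would give a line chart of releases per year by title type, and `genre_counts.head(10)` a ranking of the most frequent genres.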

Libraries Used

The analysis utilizes the following Python libraries and packages:

  • Seaborn: For enhanced data visualization.
  • scikit-learn (sklearn): For machine learning and data preprocessing (mean_squared_error, LinearRegression, PolynomialFeatures, RandomForestRegressor, train_test_split, OneHotEncoder).
  • Matplotlib: For data visualization.
  • NumPy: For numerical computations.
  • pandas: For data manipulation and analysis.
  • urllib: For URL handling and web access.
  • os: For interacting with the operating system.
  • io: For handling streams.
  • gzip: For working with gzipped files.
  • zipfile: For extracting and creating zip archives.
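As an illustration of how gzip, io, and pandas from the list above might work together, the sketch below parses a gzipped, tab-separated file held in memory. The helper name `read_gzipped_tsv` and the sample rows are hypothetical, and the notebook's own loading code may differ. The `\N` missing-value marker is an assumption based on IMDb's public dataset files.

```python
import gzip
import io

import pandas as pd


def read_gzipped_tsv(raw_bytes: bytes) -> pd.DataFrame:
    """Decompress gzipped TSV bytes and parse them into a DataFrame (hypothetical helper)."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8") as handle:
        # Read every column as a string and treat the "\N" marker as a missing value.
        return pd.read_csv(handle, sep="\t", dtype=str, na_values="\\N")


# In-memory stand-in for a downloaded file (made-up rows).
sample_tsv = "tconst\ttitleType\tstartYear\ntt1\tmovie\t1999\ntt2\tshort\t\\N\n"
sample_bytes = gzip.compress(sample_tsv.encode("utf-8"))
sample_df = read_gzipped_tsv(sample_bytes)
print(sample_df)

# With network access, the bytes could instead be fetched with urllib:
# from urllib.request import urlopen
# with urlopen(DATASET_URL) as response:  # DATASET_URL is a placeholder, not defined here
#     sample_df = read_gzipped_tsv(response.read())
```

Reading columns as strings first keeps the load step simple; numeric columns can then be converted explicitly with `pd.to_numeric(..., errors="coerce")`.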

Key Achievements

  • Successfully analyzed and visualized IMDb data, uncovering key trends and patterns in title releases and viewer preferences.
  • Applied regression models, including linear, polynomial, and random forest, to predict title ratings based on runtime, gaining insights into factors influencing viewer ratings.
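The model comparison described above could look something like the sketch below. It is an assumption-laden illustration: the runtime and rating values are synthetic, the polynomial degree and forest size are arbitrary choices, and `make_pipeline` is used here for convenience even though it is not among the imports the project lists. The scores it prints say nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: ratings loosely related to runtime plus noise (not real IMDb values).
rng = np.random.default_rng(0)
runtime = rng.uniform(5, 200, size=500)
rating = 6.0 + 0.01 * runtime - 0.00004 * runtime**2 + rng.normal(0, 0.5, size=500)

# scikit-learn expects a 2D feature matrix, so reshape the single feature.
X = runtime.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, rating, test_size=0.2, random_state=0
)

# Three regressors that all predict rating from runtime alone.
models = {
    "linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {test_mse[name]:.3f}")
```

Scoring every model on the same held-out split makes the mean squared errors directly comparable across the three approaches.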

Conclusion

The "IMDb-Data-Analysis-Exercise-Part-1" provides an in-depth look into IMDb data, revealing valuable insights into media consumption trends, viewer preferences, and title characteristics. This foundational analysis sets the stage for more advanced studies, including the development of recommendation systems.

Future Work

The next phase, "Part 2", will focus on obtaining data directly from the IMDb database using their API. It will also delve into creating a recommender system for TV shows/movies using the comprehensive IMDb dataset.

Note

To fully understand the conclusions drawn in this analysis, it is recommended to go through the entire notebook, including the code and its outputs. You can view the HTML version of the notebook here.

Author

Jesus Cantu Jr.

Last Updated

June 6, 2023
