This repository contains a comprehensive analysis of IMDB movie data using SQL. The analysis is designed to extract meaningful insights to assist in strategic decision-making for a global film production initiative. The project leverages structured data and SQL expertise to derive actionable intelligence from the IMDB dataset.
This work demonstrates expertise in SQL, data analysis, and business intelligence, making it a valuable addition to your pinned projects on GitHub.
- Detailed Data Analysis: Analysis performed on a dataset spanning three years, covering multiple attributes such as genres, directors, actors, production companies, and movie ratings.
- Actionable Insights: Extracted critical patterns and trends to support data-driven decision-making in the entertainment industry.
- Structured Process: The analytics process is divided into logical segments, each contributing to a thorough understanding of the dataset.
- Scalable Design: The project framework and SQL scripts are designed for scalability and adaptability to other datasets.
The IMDB dataset includes information about movies produced over the past three years. The data spans attributes such as:
- Movie titles, genres, and production years
- Ratings and user votes
- Directors, actors, and production companies
- Movie durations and release months
- Most movies were produced in 2017, with March being the most active production month.
- The Drama genre dominates in terms of the number of movies produced, while Action movies have the longest average duration.
- Top-performing directors, production companies, and actors were identified based on ratings and other metrics.
- Key global production insights, including:
- Dream Warrior Pictures and National Theater Live have produced the most hit movies.
- Marvel Studios leads in votes received for its productions.
- Complex JOINs for multi-table analysis
- Use of aggregate functions (e.g.,
AVG
,COUNT
,SUM
) to summarize data - Advanced filtering with subqueries and CTEs (Common Table Expressions)
- Comprehensive GROUP BY and HAVING clauses for detailed breakdowns
- Analytical queries for ranking and trend detection
├── IMDB_dataset_import.sql # SQL script to create and populate the database
├── IMDB_question.sql # SQL queries to address specific analytical objectives
├── Solution_IMDB_questions.sql # Consolidated solutions for the analytical tasks
├── IMDb_movies_Data_and_ERD.xlsx # Dataset and Entity-Relationship Diagram
├── Executive_Summary.pdf # Business insights and findings
└── README.md # Project documentation
- Database Setup:
- Use
IMDB_dataset_import.sql
to create and populate the database.
- Use
- Analysis:
- Execute
IMDB_question.sql
orSolution_IMDB_questions.sql
to replicate the analysis and derive insights.
- Execute
- Review Results:
- Refer to
Executive_Summary.pdf
for a summary of findings.
- Refer to
The Entity-Relationship Diagram (ERD) provides a clear representation of database relationships, making it easier to understand the dataset's structure and logical flow.
- Full list of insights provided in
Executive_Summary.pdf
- SQL: Primary language for data analysis
- RDBMS: Tested on MySQL, compatible with other relational database systems
- Excel: For dataset visualization and ERD creation
- Integration with visualization tools like Tableau or Power BI for advanced analytics dashboards
- Incorporation of machine learning techniques for predictive modeling
- Expansion to include audience sentiment analysis and box-office trends
- Satvik Praveen
Data Science Enthusiast | LinkedIn