IMDB Data Analysis Using SQL

Overview

This repository contains a comprehensive analysis of IMDB movie data using SQL. The analysis is designed to extract meaningful insights to assist in strategic decision-making for a global film production initiative. The project leverages structured data and SQL expertise to derive actionable intelligence from the IMDB dataset.

This work demonstrates expertise in SQL, data analysis, and business intelligence, making it a valuable addition to your pinned projects on GitHub.

Key Features

Detailed Data Analysis: Analysis performed on a dataset spanning three years, covering multiple attributes such as genres, directors, actors, production companies, and movie ratings.
Actionable Insights: Extracted critical patterns and trends to support data-driven decision-making in the entertainment industry.
Structured Process: The analytics process is divided into logical segments, each contributing to a thorough understanding of the dataset.
Scalable Design: The project framework and SQL scripts are designed for scalability and adaptability to other datasets.

Dataset Description

The IMDB dataset includes information about movies produced over the past three years. The data spans attributes such as:

Movie titles, genres, and production years
Ratings and user votes
Directors, actors, and production companies
Movie durations and release months

Highlights

Insights Derived

Most movies were produced in 2017, with March being the most active production month.
The Drama genre dominates in terms of the number of movies produced, while Action movies have the longest average duration.
Top-performing directors, production companies, and actors were identified based on ratings and other metrics.
Key global production insights, including:
- Dream Warrior Pictures and National Theater Live have produced the most hit movies.
- Marvel Studios leads in votes received for its productions.

Advanced SQL Techniques Used

Complex JOINs for multi-table analysis
Use of aggregate functions (e.g., AVG, COUNT, SUM) to summarize data
Advanced filtering with subqueries and CTEs (Common Table Expressions)
Comprehensive GROUP BY and HAVING clauses for detailed breakdowns
Analytical queries for ranking and trend detection

Repository Structure

├── IMDB_dataset_import.sql       # SQL script to create and populate the database
├── IMDB_question.sql             # SQL queries to address specific analytical objectives
├── Solution_IMDB_questions.sql   # Consolidated solutions for the analytical tasks
├── IMDb_movies_Data_and_ERD.xlsx # Dataset and Entity-Relationship Diagram
├── Executive_Summary.pdf         # Business insights and findings
└── README.md                     # Project documentation

Usage

Database Setup:
- Use IMDB_dataset_import.sql to create and populate the database.
Analysis:
- Execute IMDB_question.sql or Solution_IMDB_questions.sql to replicate the analysis and derive insights.
Review Results:
- Refer to Executive_Summary.pdf for a summary of findings.

ERD Overview

The Entity-Relationship Diagram (ERD) provides a clear representation of database relationships, making it easier to understand the dataset's structure and logical flow.

Key Business Insights

Full list of insights provided in Executive_Summary.pdf

Technologies Used

SQL: Primary language for data analysis
RDBMS: Tested on MySQL, compatible with other relational database systems
Excel: For dataset visualization and ERD creation

Future Scope

Integration with visualization tools like Tableau or Power BI for advanced analytics dashboards
Incorporation of machine learning techniques for predictive modeling
Expansion to include audience sentiment analysis and box-office trends

Author

Satvik Praveen
Data Science Enthusiast | LinkedIn

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Executive_Summary.pdf		Executive_Summary.pdf
IMDB+dataset+import.sql		IMDB+dataset+import.sql
IMDB+question.sql		IMDB+question.sql
IMDb+movies+Data+and+ERD+final.xlsx		IMDb+movies+Data+and+ERD+final.xlsx
README.md		README.md
Solution_IMDB_questions.sql		Solution_IMDB_questions.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IMDB Data Analysis Using SQL

Overview

Key Features

Dataset Description

Highlights

Insights Derived

Advanced SQL Techniques Used

Repository Structure

Usage

ERD Overview

Key Business Insights

Technologies Used

Future Scope

Author

About

Releases

Packages

SatvikPraveen/RSVP_Case_Study

Folders and files

Latest commit

History

Repository files navigation

IMDB Data Analysis Using SQL

Overview

Key Features

Dataset Description

Highlights

Insights Derived

Advanced SQL Techniques Used

Repository Structure

Usage

ERD Overview

Key Business Insights

Technologies Used

Future Scope

Author

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages