Around 8.2 million people live in the five boroughs that make up New York City, and thousands of accidents occur there each year for a variety of reasons. The NYPD gathers data on each of these accidents and makes it available to the public on nycopendata.socrata.com. We decided to dig deeper into the crash data to see whether any underlying patterns or relationships could explain the high frequency of collisions. The data covers July 2012 to March 2022 and includes almost 2,00,000 observations.
- We accessed the data from "nycopendata.com" using the Open Data API (OData API) and connected it to Tableau.
- Then, we cleaned the data using Python and stored it in a Google Cloud Storage bucket to create a virtual instance.
- We performed our analysis using Google's BigQuery in Google Cloud Platform and stored the query results in CSV files.
- After our analysis, we generated a report using Google Sites to share our insights and give recommendations.
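
As a rough illustration of the data-access step, the sketch below builds paged request URLs for the dataset. It is not the OData/Tableau connection the project actually used. It assumes the Socrata SODA JSON endpoint for dataset ID `h9gi-nx95`, the column names `crash_date` and `collision_id`, and exact window boundaries of 1 July 2012 and 31 March 2022. The helper names `build_page_url` and `page_urls` are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Socrata SODA JSON endpoint for the Motor Vehicle Collisions dataset.
BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"


def build_page_url(offset, limit=50000,
                   start="2012-07-01T00:00:00", end="2022-03-31T23:59:59"):
    """Return the URL for one page of crash records inside the study window."""
    params = {
        # Assumed column names: crash_date, collision_id.
        "$where": f"crash_date between '{start}' and '{end}'",
        "$order": "collision_id",  # stable ordering so pages do not overlap
        "$limit": limit,
        "$offset": offset,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(total_rows, limit=50000):
    """Yield one URL per page until total_rows records are covered."""
    for offset in range(0, total_rows, limit):
        yield build_page_url(offset, limit)


if __name__ == "__main__":
    for url in page_urls(120000):
        print(url)
```

Each URL can then be fetched with any HTTP client. The code only builds the URLs, so it runs without network access.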
Tableau Story Link: https://public.tableau.com/app/profile/aditya.agarwal1269/viz/NYPDMotorCollisionProject/Story1
Report Link (Google Slides): https://drive.google.com/file/d/16r6KAuHcV5lPYZfqCMQMxOaRkbRwGH77/view?usp=sharing
The Motor Vehicle Collisions crash table contains details on the crash event. Each row represents a crash event. The Motor Vehicle Collisions data tables contain information from all police-reported motor vehicle collisions in NYC. The police report (MV104-AN) is required to be filled out for collisions where someone is injured or killed, or where there is at least $1,000 worth of damage.
Dataset link: https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
- Understanding the Data - It is important to understand our data and our problem statement, i.e., how to decrease the number of injuries and deaths in New York City.
- Preparing the Data - After understanding our dataset, it is essential to prepare the data. We have used GCP BigQuery to remove null values and duplicate entries.
- Perform Analysis - We have carried out a time-series analysis and made dashboards to understand more about the factors and causes of motor collisions in New York City.
- Get Insights - We generated interactive Tableau dashboards to support our findings and get insights from the data.
- Give Recommendations - Based on our analysis, we will provide recommendations to decrease the number of motor collisions.
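
The data-preparation step (removing null values and duplicate entries) can be sketched in plain Python. This is a minimal illustration, not the project's actual BigQuery queries or cleaning script. The column names `collision_id`, `crash_date`, and `crash_time`, and the choice of which fields count as required, are assumptions. The sample rows are made up for the example.

```python
import csv
import io

# Assumed column names; adjust them to match the exported file.
REQUIRED = ("collision_id", "crash_date", "crash_time")


def clean_rows(rows, required=REQUIRED, key="collision_id"):
    """Drop rows missing a required field and keep the first row per key."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat missing, empty, or whitespace-only values as null.
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        if row[key] in seen:
            continue  # duplicate entry
        seen.add(row[key])
        cleaned.append(row)
    return cleaned


# Made-up sample data: one duplicate and one row with a missing date.
sample = io.StringIO(
    "collision_id,crash_date,crash_time,borough\n"
    "1,2018-05-01,16:30,BROOKLYN\n"
    "1,2018-05-01,16:30,BROOKLYN\n"
    "2,,09:15,QUEENS\n"
    "3,2019-01-12,08:05,\n"
)
cleaned = clean_rows(csv.DictReader(sample))
print([r["collision_id"] for r in cleaned])  # ['1', '3']
```

Row 3 is kept even though its borough is empty, because borough is not in the assumed list of required fields. Whether such rows should be dropped depends on the analysis.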
Before exploring the data, we created a list of questions we wanted to address:
- Is there a trend in the number of accidents?
- Is there a relationship between the time of day and the contributing factors of the accident? (Time Series Analysis)
- Which areas are more "collision-prone"? (Collision-prone analysis)
- Analysis performed using Google Cloud Platform:
  A) Which vehicle type caused the most injuries and deaths?
  B) Which factor caused most of the collisions?
- Analysis performed using Tableau:
  A) Detecting Collision-Prone Areas
  B) Time Series Analysis
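
The two Tableau analyses listed above come down to simple aggregations, sketched here in Python. This is an illustration only, not the dashboards' actual logic. The column names `crash_time`, `zip_code`, and `number_of_persons_injured` are assumptions, and the rows are toy data rather than results from the dataset.

```python
from collections import Counter


def injuries_by_hour(rows):
    """Sum injured persons per hour of day (0-23) from 'H:MM' crash times."""
    totals = Counter()
    for row in rows:
        hour = int(row["crash_time"].split(":")[0])
        # An empty injury count is treated as zero.
        totals[hour] += int(row["number_of_persons_injured"] or 0)
    return totals


def collisions_by_zip(rows):
    """Count crashes per ZIP code, skipping rows with no ZIP code."""
    return Counter(row["zip_code"] for row in rows if row.get("zip_code"))


# Toy rows for illustration only.
rows = [
    {"crash_time": "16:30", "zip_code": "11207", "number_of_persons_injured": "2"},
    {"crash_time": "16:05", "zip_code": "11236", "number_of_persons_injured": "1"},
    {"crash_time": "9:15", "zip_code": "11207", "number_of_persons_injured": "0"},
    {"crash_time": "23:50", "zip_code": "", "number_of_persons_injured": ""},
]

peak_hour, peak_injuries = injuries_by_hour(rows).most_common(1)[0]
print(peak_hour, peak_injuries)                # 16 3
print(collisions_by_zip(rows).most_common(1))  # [('11207', 2)]
```

Counting vehicle types or contributing factors works the same way: replace the ZIP code column with the relevant column.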
- The peak time of day was between 4 pm and 5 pm, when the maximum number of people got injured.
- The number of people injured rose from 2012 and peaked in 2018 at 123,859 injuries.
- After 2018, the total number of injured people decreased, reaching 29,604 injuries in 2022.
- The highest number of deaths and injuries were mainly caused by a lack of driver attention. The other factors also point toward drivers' lack of driving skills.
- Most of the accidents were caused by Sports utility/Station wagon vehicles, followed by Sedan and Passenger vehicles.
- Also, 4-wheeled vehicles were more prone to accidents than 2-wheeled vehicles.
- Increase the number of Traffic Officers between 4 pm and 6 pm on days with the highest accident rates.
- Raise the availability of ambulances between 1 pm and 5 pm in collision-prone areas.
- Provide a more robust and efficient Public transit system to encourage usage by commuters.
- Focus on high collision-prone areas such as ZIP codes 11236, 11207, and 11234 when prioritizing new projects like traffic lights or street signs.
- Increase the frequency of driver re-training and impose stricter fines for repeat offenders.
- Increase awareness among commuters about using public transport instead of walking or using personal vehicles to reduce accidents.
- Among all the boroughs, BROOKLYN and QUEENS had the highest number of deaths in New York City.