"Taco or Burrito, that is the question ..."

A Goldman Sachs Challenge project at TAMUDatathon 2019, by Peng He and Pei-Chun Lai.

Inspiration

Food is one of the best parts of life and brings everyone together, and in Texas (or maybe the entire US?) that means BURRITOS and TACOS. This project is inspired by a public Kaggle dataset on taco and burrito restaurants, and sets out to find meaningful insights about these delicious Mexican foods in our everyday life.

What it does

The data is a list of 19,439 restaurants and similar businesses with menu items containing "burrito" or "taco" in their names, as provided by Datafiniti's Business Database. A full schema for the data is available in their support documentation.

Using exploratory data analysis (EDA) and visualization, we bring out the "hot spot" locations, featured restaurants, and menu highlights for tacos and burritos.

How we built it

Raw taco and burrito restaurant data was imported into a Jupyter notebook using pandas, where we did initial EDA to learn the data schema and spot quality issues in the data.
We dropped all columns that contain no data or only a single unique value (e.g. "country" is always US).
The reformatted data was exported from pandas and imported into a MySQL server for efficient data wrangling.
We imported a zip-code-to-state/city lookup table from U.S. Census public data to fix the errors in the "province" column and attribute each restaurant to the correct city and state.
We ran SQL queries to extract information on top states, top cities, fast-food chain restaurants, authentic Mexican restaurants, and top menu keywords related to tacos and burritos.
The extracted data was visualized: a) as US heatmaps using GeoPandas in Jupyter notebooks (notebook1, notebook2, notebook3); b) as top-city bar charts in an Excel spreadsheet; and c) as word clouds of menu names in a Jupyter notebook.
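
The column pruning and the zip-code fix described above can be sketched in pandas. This is only an illustration: the toy rows, the `fax` column, and the lookup table are made up, the column names are assumptions, and the actual zip-code join in this project was done in MySQL rather than pandas.

```python
import pandas as pd

# Toy stand-in for the raw data; rows and column names are assumptions.
raw = pd.DataFrame({
    "name": ["Taco Hut", "Burrito Bar", "Casa Taco"],
    "country": ["US", "US", "US"],                 # one single unique value
    "fax": [None, None, None],                     # no data at all
    "postalCode": ["77840", "78701", "90001"],
    "province": ["College Station", "TX", "CA"],   # first row holds a city, not a state
})

# Keep only columns with more than one distinct non-null value.
# nunique() ignores missing values, so an all-empty column counts as 0.
keep = [col for col in raw.columns if raw[col].nunique() > 1]
cleaned = raw[keep]

# Hypothetical zip-code lookup table (the project used U.S. Census public data).
zip_lookup = pd.DataFrame({
    "postalCode": ["77840", "78701", "90001"],
    "city": ["College Station", "Austin", "Los Angeles"],
    "state": ["TX", "TX", "CA"],
})

# A left join keeps every restaurant and attaches city/state by zip code.
fixed = cleaned.merge(zip_lookup, on="postalCode", how="left")
print(fixed[["name", "province", "city", "state"]])
```

Joining on the zip code rather than trusting "province" means a row whose "province" holds a city name still ends up with a usable state.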

Challenges we ran into

11% of the data (8,499 records) have a city or town name in the "province" column.
Unrealistic prices on the menu. Can you believe a taco that costs $1,990?
A significant share of the data is missing in most of the columns.
Several columns, e.g. “categories”, have multiple entries packed into one cell.
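
Two of these issues are easy to quantify or work around in pandas. The sketch below measures the share of missing values per column and splits a multi-entry cell into one row per entry. The toy rows, the `phone` column, and the comma separator are assumptions, not taken from the dataset schema.

```python
import pandas as pd

# Toy rows; the values and the comma separator are assumptions.
menu = pd.DataFrame({
    "name": ["Taco Hut", "Burrito Bar"],
    "categories": ["Mexican Restaurant, Fast Food", "Burrito Place"],
    "phone": [None, "555-0100"],
})

# Fraction of missing values in each column (0.0 = complete, 1.0 = empty).
missing_share = menu.isna().mean()

# Split each multi-entry cell into a list, then give every entry its own row.
exploded = (
    menu.assign(categories=menu["categories"].str.split(","))
        .explode("categories")
        .reset_index(drop=True)
)
# Remove the whitespace left over around each entry after splitting.
exploded["categories"] = exploded["categories"].str.strip()

print(missing_share)
print(exploded[["name", "categories"]])
```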

Accomplishments that we're proud of

We leveraged U.S. Census public data to fix the errors in the "province" column and attributed 100% of the restaurants to the correct city and state.
We created vivid visualizations to highlight the hot locations and menu choices for tacos and burritos.

What we learned

Top places for tacos/burritos are California, Texas, Florida, New York City and Chicago

Texas favors tacos over burritos

Fish & Chicken Tacos are the favorites, and Burritos = Breakfast
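
A comparison such as tacos versus burritos per state boils down to labelling each menu item and counting by state. The project did this with SQL queries; the pandas sketch below shows the same idea on made-up rows with assumed column names, and does not reproduce the actual counts.

```python
import pandas as pd

# Made-up menu items, for illustration only.
items = pd.DataFrame({
    "state": ["TX", "TX", "TX", "CA", "CA"],
    "menu_name": [
        "Fish Taco", "Chicken Tacos", "Breakfast Burrito",
        "Bean Burrito", "Carne Asada Burrito",
    ],
})

# Label each item by the keyword found in its name (case-insensitive).
# An item naming both foods would end up labelled "burrito" here.
lower = items["menu_name"].str.lower()
items["kind"] = "other"
items.loc[lower.str.contains("taco"), "kind"] = "taco"
items.loc[lower.str.contains("burrito"), "kind"] = "burrito"

# One row per state, one column per kind, cells hold item counts.
counts = pd.crosstab(items["state"], items["kind"])
print(counts)
```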

What's next for Taco or Burrito, that is the question ...

Based on our investigation, the following issues should be pursued to improve the data quality and gain more insights:

Check for misplaced decimals and other ETL errors in menu prices.
Use third-party data sources (such as Yelp, Google, …) or reasonable guesses to fill in the missing data in this dataset.
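
As a starting point for the price check, one simple heuristic is to flag prices above a fixed threshold and show what they would be if the decimal point had been dropped. The threshold of 50 and the column names are arbitrary assumptions, and a flagged price is only a candidate for review, not a confirmed error.

```python
import pandas as pd

# Toy menu prices; the $1,990 taco mirrors the outlier mentioned above.
prices = pd.DataFrame({
    "item": ["Fish Taco", "Chicken Taco", "Breakfast Burrito", "Street Taco"],
    "price": [3.50, 2.99, 6.25, 1990.00],
})

# Assumed cutoff: anything above this is implausible for a single menu item.
threshold = 50.0
prices["suspect"] = prices["price"] > threshold

# For suspect rows only, show the price as if it had been recorded in cents.
prices["price_if_cents"] = prices["price"].where(
    ~prices["suspect"], prices["price"] / 100
)
print(prices)
```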
