This repo is for a final project in Dr. Michelle Levine's COMS4995 - Empirical Methods of Data Science course at Columbia University. It includes the code we used to build our web scrapers as well as the datasets they produced.
We have organized our work into three directories (bye_weeks, distance, and tnf), one for each dataset we analyze. Each dataset is described in more detail below.
This dataset tracks each team's performance in the game directly before its bye week (pre-bye) and the game directly after it (post-bye). Our data begins in 1990 (when bye weeks were first introduced into the NFL) and ends in 2022 (the most recent NFL season).
To access our entire dataset, click here. To access subframes for each individual season, click here. The source code for our web scraper can be found here.
The features in this dataset include:
year: the season in question (1990-2022)
team: the team in question (string)
post_bye: whether the game was pre-bye (0) or post-bye (1)
week: the week of the NFL season in which the game took place (2-16)
win_pct: team's win percentage before the game (0.000-1.000)
home_team: whether or not team was the home team (1 for home, 0 for away)
opp: team's opponent in the game (string)
opp_win_pct: opp's win percentage before the game (0.000-1.000)
result: whether team lost or won the game (1 if win, 0 otherwise)
pf: number of points team scored in the game (int)
pa: number of points team allowed in the game (int)
yds: number of yards team recorded in the game (int)
opp_yds: number of yards team allowed in the game (int)
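As a rough illustration of how these features might be used, the sketch below compares win rates in pre-bye and post-bye games. It assumes the data is available as CSV with the column names listed above; the inline rows are made up for the example (they are not taken from the real dataset), and the function name win_rate_by_bye is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT from the real dataset.
SAMPLE_CSV = """year,team,post_bye,week,win_pct,home_team,opp,opp_win_pct,result,pf,pa,yds,opp_yds
2022,Team A,0,6,0.600,1,Team B,0.400,1,24,17,350,300
2022,Team A,1,8,0.667,0,Team C,0.500,1,20,10,330,280
2022,Team B,0,8,0.429,0,Team D,0.571,0,13,27,250,400
2022,Team B,1,10,0.375,1,Team A,0.625,1,31,28,410,390
"""


def win_rate_by_bye(rows):
    """Return {post_bye flag: fraction of games won} for an iterable of row dicts."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for row in rows:
        flag = int(row["post_bye"])  # 0 = pre-bye, 1 = post-bye
        games[flag] += 1
        wins[flag] += int(row["result"])  # 1 if win, 0 otherwise
    return {flag: wins[flag] / games[flag] for flag in games}


rates = win_rate_by_bye(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(rates)
```

To run this on the real data, the inline string could be replaced with an open file handle to the dataset, assuming it is stored as a CSV with a header row.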
This dataset records the result of every NFL game played from 1990 to 2022. We use it to analyze how distance traveled and time zones traversed affect road teams' performance.
To access our entire dataset, click here. To access subframes for each individual season, click here. The source code for our web scraper can be found here.
The features in this dataset include:
year: the season in question (1990-2022)
team: the road team (string)
week: the week the game was played (2-18)
opp: the home team (string)
distance: distance, in miles, that team had to travel to reach the game (from airport to airport) (float)
time_zone_diff: the number of time zones team had to traverse to reach the game (negative values indicate east-to-west travel, while positive values indicate west-to-east travel) (int)
win_pct: team's win percentage going into the game in question (0.000-1.000)
opp_win_pct: opp's win percentage going into the game in question (0.000-1.000)
result: whether team won the game in question (1 if win, 0 otherwise)
pf: number of points team scored in the game in question (int)
pa: number of points opp scored in the game in question (int)
yds: number of yards team recorded in the game in question (int)
opp_yds: number of yards opp recorded in the game in question (int)
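The sign convention for time_zone_diff can be made concrete with a small sketch that groups road teams' point margins (pf minus pa) by travel direction. It assumes CSV input with the columns listed above; the inline rows are invented for the example, and the helper names travel_direction and mean_margin_by_direction are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; they are NOT from the real dataset.
SAMPLE_CSV = """year,team,week,opp,distance,time_zone_diff,win_pct,opp_win_pct,result,pf,pa,yds,opp_yds
2022,Team A,3,Team B,2400.5,-3,0.500,0.500,0,17,27,310,380
2022,Team B,5,Team A,2400.5,3,0.750,0.250,1,30,20,400,330
2022,Team C,7,Team D,150.0,0,0.333,0.667,0,14,21,290,340
2022,Team D,9,Team A,900.0,-1,0.625,0.375,1,24,23,350,345
"""


def travel_direction(tz_diff):
    """Map time_zone_diff to a label: negative = east-to-west, positive = west-to-east."""
    if tz_diff < 0:
        return "east_to_west"
    if tz_diff > 0:
        return "west_to_east"
    return "same_zone"


def mean_margin_by_direction(rows):
    """Return {direction: mean (pf - pa)} for the road team across the given rows."""
    margins = defaultdict(list)
    for row in rows:
        direction = travel_direction(int(row["time_zone_diff"]))
        margins[direction].append(int(row["pf"]) - int(row["pa"]))
    return {direction: mean(values) for direction, values in margins.items()}


margins = mean_margin_by_direction(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(margins)
```

Because every row in this dataset describes the road team, a negative mean margin for a direction would suggest road teams traveling that way were outscored on average, though a real analysis would also need to control for team strength (for example, via win_pct and opp_win_pct).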
This dataset tracks each team's performance in a Thursday Night Football (TNF) game played after a Sunday game (short rest) and in the game directly after that Thursday night game (long rest). Our data begins in 2006 and ends in 2022 (the most recent NFL season).
To access our entire dataset, click here. To access subframes for each individual season, click here. The source code for our web scraper can be found here.
The features in this dataset include:
year: the season in question (2006-2022)
team: the team in question (string)
week_1: the week of the NFL season in which team's game was played (2-16)
win_pct_1: team's win percentage going into the game (0.000-1.000)
home_team_1: whether or not team was the home team in the game (boolean value)
opp_1: team's opponent in the game (string)
opp_win_pct_1: opp_1's win percentage before the game (0.000-1.000)
result_1: whether team lost or won the game (1 if win, 0 otherwise)
pf_1: number of points team scored in the game (int)
pa_1: number of points team allowed in the game (int)
yds_1: number of yards team recorded in the game (int)
opp_yds_1: number of yards team allowed in the game (int)
tnf: whether the game was on (1) or after (0) TNF
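One way to use the tnf flag is to pair each team's short-rest game with its following long-rest game and compare points scored. The sketch below assumes CSV input with the columns listed above and, as a simplifying assumption, at most one TNF game and one follow-up game per team per season. The inline rows are invented, and the function name paired_point_diffs is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT from the real dataset.
SAMPLE_CSV = """year,team,week_1,win_pct_1,home_team_1,opp_1,opp_win_pct_1,result_1,pf_1,pa_1,yds_1,opp_yds_1,tnf
2022,Team A,5,0.500,True,Team B,0.500,0,17,20,300,320,1
2022,Team A,6,0.400,False,Team C,0.600,1,27,24,380,350,0
2022,Team B,5,0.500,False,Team A,0.500,1,20,17,320,300,1
2022,Team B,6,0.600,True,Team D,0.200,1,16,10,290,240,0
"""


def paired_point_diffs(rows):
    """Return {(year, team): points on TNF minus points in the following game}.

    Assumes at most one tnf=1 row and one tnf=0 row per (year, team);
    teams missing either row are skipped.
    """
    points = defaultdict(dict)
    for row in rows:
        key = (row["year"], row["team"])
        points[key][int(row["tnf"])] = int(row["pf_1"])
    return {
        key: by_flag[1] - by_flag[0]
        for key, by_flag in points.items()
        if 0 in by_flag and 1 in by_flag
    }


diffs = paired_point_diffs(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(diffs)
```

A negative difference means the team scored fewer points on short rest than in its next game. If a team can appear in more than one TNF game per season in the real data, the pairing key would need to include the week as well.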
Robert Gao - rzg2107
Matt Kim - mjk2244
Aaron Ouyang - ao2764
Evan Tong - eht2126