This is a personal project that applies concepts from web scraping, database management, and machine learning to basketball data. It is broken down into three phases and a possible fourth phase.
Current Phase: 2
Phase 1: Web scraping
Phase 2: Creating the database
Phase 3: Predicting the MVP based on player stats
Phase 4: Creating a web application
Additional Features:
1) Create a Dockerfile and a Docker container for the local database
Phase 1: Using the Python library BeautifulSoup, we can scrape data from basketball-reference.com. Multiple scrapers will collect season stats, team stats, and player stats from 1980 to the current date. The scrapers all write their results to multiple CSV files in a directory called Output.
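As a rough illustration of the scraping step, the sketch below parses a stats table with BeautifulSoup and serializes it as CSV. The HTML snippet, the parse_stats_table and to_csv_text helpers, the table id, and the column names are all made up for the example. The real pages on basketball-reference.com are laid out differently, and a real scraper would first download the page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded stats page.
SAMPLE_HTML = """
<table id="stats">
  <thead><tr><th>Player</th><th>Year</th><th>FG</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>1980</td><td>500</td></tr>
    <tr><td>Player B</td><td>1980</td><td>450</td></tr>
  </tbody>
</table>
"""


def parse_stats_table(html, table_id="stats"):
    """Return (header, rows) for the table with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find("tbody").find_all("tr")
    ]
    return header, rows


def to_csv_text(header, rows):
    """Serialize the parsed table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, rows = parse_stats_table(SAMPLE_HTML)
csv_text = to_csv_text(header, rows)
print(csv_text)
```

Writing csv_text to a file under Output would complete the step. The in-memory buffer is used here only to keep the example free of side effects.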
Phase 2: Using MySQL Workbench with a locally hosted database, we can load the CSV files from the first phase into a database and write queries for certain stats. For example, we can write a query to see who made the most total field goals in a given year.
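The query idea can be sketched as follows. To keep the example self-contained, it uses Python's built-in sqlite3 module as a stand-in for the local MySQL database. The table name, column names, and rows are hypothetical, not the project's actual schema or data.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; the real files come from the Output directory.
SAMPLE_CSV = """player,year,fg
Player A,1980,500
Player B,1980,450
Player A,1981,300
Player B,1981,620
"""


def load_csv(conn, csv_text):
    """Create a player_stats table and load the CSV rows into it."""
    conn.execute("CREATE TABLE player_stats (player TEXT, year INTEGER, fg INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO player_stats (player, year, fg) VALUES (?, ?, ?)",
        [(r["player"], int(r["year"]), int(r["fg"])) for r in reader],
    )


def most_field_goals(conn, year):
    """Return (player, total_fg) for the leader in a year, or None if no rows match."""
    return conn.execute(
        """
        SELECT player, SUM(fg) AS total_fg
        FROM player_stats
        WHERE year = ?
        GROUP BY player
        ORDER BY total_fg DESC
        LIMIT 1
        """,
        (year,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_csv(conn, SAMPLE_CSV)
print(most_field_goals(conn, 1981))
```

The SELECT statement uses only standard SQL, so a similar query should work when typed into MySQL Workbench against the real tables.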
Phase 3: Using the database, we can retrieve player stats for a given year. With these stats, we can identify the important features by passing them to a random forest. The random forest should show which stats are important, and then we can use linear regression to predict who won MVP for a given year.
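One way to read this approach is sketched below with scikit-learn: a random forest ranks the features, and a linear regression fitted on the top-ranked features scores each player. This is an assumption about how the two models fit together, not the project's implementation. The data is synthetic, and the feature names and the vote_share target are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data: 200 hypothetical player seasons with three features.
rng = np.random.default_rng(0)
feature_names = ["points", "assists", "jersey_number"]
X = rng.uniform(0.0, 1.0, size=(200, 3))
# Made-up target that depends on the first two features only.
vote_share = 0.7 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.01, size=200)

# Step 1: rank features by random forest importance (most important first).
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, vote_share)
ranking = np.argsort(forest.feature_importances_)[::-1]
top_features = ranking[:2]  # keep the two most important features
print([feature_names[i] for i in top_features])

# Step 2: fit a linear regression on the selected features only.
model = LinearRegression()
model.fit(X[:, top_features], vote_share)

# Step 3: the predicted MVP is the row with the highest predicted score.
predicted = model.predict(X[:, top_features])
predicted_mvp = int(np.argmax(predicted))
print(predicted_mvp)
```

For brevity the sketch predicts on the same rows it was trained on. A real evaluation would train on past seasons and predict a held-out year.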
Phase 4: Using Flask, create a web application for the scrapers, the database, and the MVP prediction. This phase is optional and may or may not be built.
This is a guide to create a new virtual environment using conda from
https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/.
1) Check if conda is installed and in your path
If conda is installed, this should be your output.
$ conda -V
conda 3.7.0
2) Check if conda is up to date.
$ conda update conda
3) Create a virtual environment for your project, where yourenvname is the name of the environment and x.x is the version of Python.
$ conda create -n yourenvname python=x.x anaconda
4) Activating your virtual environment.
$ source activate yourenvname
5) Install additional Python packages to a virtual environment.
$ conda install -n yourenvname [package]
6) Deactivate and delete your virtual environment.
$ source deactivate # Deactivate your virtual environment
$ conda remove -n yourenvname --all # Deletes your virtual environment
Run the command below to install the libraries required to run the scrapers.
$ conda install --file requirements.txt
You can clone this repo and import the libraries at your own discretion.
The commands below show how to run each of the scrapers.
$ python /your/path/Basketball-Stats/Python_Scrapers/Create_Player_Name # Gets dataframe of player names from 1980 - current
$ python /your/path/Basketball-Stats/Run_Scraper/get_season_stats.py # Gets season stats from 1980 - current
$ python /your/path/Basketball-Stats/Run_Scraper/get_team_stats.py # Gets team stats from 1980 - current
$ python /your/path/Basketball-Stats/Run_Scraper/get_player_stats.py # Gets player stats from 1980 - current
NOTE: You must run Create_Player_Name before any other scraper.
Here, /your/path is wherever you decide to store the source directory, Basketball-Stats. NOTE: The scripts should be run from the root directory, i.e. Basketball-Stats.
Inside the source directory, Basketball-Stats, there will be a directory called Output.
Inside Output, there should be three directories, corresponding to the names of the scrapers. More information will be in an api.md file.
IMPORTANT: For any queries, names containing accented characters have been normalized by removing the accents.
Example: Nikola Jokić will be Nikola Jokic
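A minimal sketch of this kind of normalization, using Python's standard unicodedata module. The helper name strip_accents is hypothetical, and the project's scrapers may normalize names differently.

```python
import unicodedata


def strip_accents(name):
    """Decompose accented characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Nikola Jokić"))
```

Applying the same helper to a search term before querying keeps lookups consistent with the stored names. Characters that have no Unicode decomposition, such as "ø", pass through unchanged with this approach.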
TBD
TBD
TBD