machine-learning-with-stock-data

This is a machine learning model built on S&P 500 stock data, which is pulled using the Yahoo Finance API.

Getting Started

This is a Python-based project. I highly recommend using the Anaconda platform, since it makes Python modules easy to manage.

Prerequisites

A working Python environment and fundamental Python knowledge
Basic API knowledge
Python Modules
- Numpy
- Pandas
- Pandas-datareader
- Matplotlib
- BeautifulSoup4
- sklearn (scikit-learn)
- yfinance (yahoo-finance)

Machine Learning Details

Model Type

Supervised with classified outputs (buy, hold, sell)

Methods used

cross_validation; lets us create shuffled training and testing samples. This matters because it keeps us from testing the algorithm on the same data we used for training.
LinearSVC, KNeighborsClassifier, RandomForestClassifier; the classifiers used to predict.
VotingClassifier; lets the three classifiers above vote on which class each feature set belongs to.
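
The methods above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the data is synthetic, and the split size, seeds, and classifier settings are assumptions. Note that the `sklearn.cross_validation` module was removed in scikit-learn 0.20; in current releases `train_test_split` lives in `sklearn.model_selection`, which is what the sketch uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): 300 days x 10 feature columns,
# with labels -1 (sell), 0 (hold), 1 (buy).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = rng.choice([-1, 0, 1], size=300)

# Shuffled split, so the model is scored on rows it never saw in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hard voting: each classifier casts one vote and the majority class wins.
clf = VotingClassifier(
    estimators=[
        ("lsvc", LinearSVC(max_iter=5000)),
        ("knn", KNeighborsClassifier()),
        ("rfor", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
predictions = clf.predict(X_test)
print("accuracy:", accuracy)
```

Because the labels here are random, the printed accuracy only shows that the pipeline runs; it says nothing about real stock data.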

Feature Engineering

Remove unnecessary data; we only need the adj_close column, since we want to predict based on previous closing values.
Generate a correlation table to see if you can identify any relationships.
Fill in the missing data with 0. Some companies may not have existed or gone public yet during the time period we chose.
Our features are each company's price change from the previous day, so we normalize the prices into percentage changes.
Some normalized values will be infinite because of the 0 values filled in earlier; convert these to NaNs and drop them later.
Our labels are 1, 0, and -1, which indicate buy, hold, and sell respectively.
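
The steps above could look roughly like this in pandas. It is a sketch under stated assumptions: the tickers and prices are made up, and the 2% threshold and one-day look-ahead used for labeling are illustrative choices that this README does not specify.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the joined adjusted-close table (assumption: one column
# per ticker, one row per trading day). "NEW" has no price on the first two days.
closes = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 10.4, 11.0, 10.8, 11.2],
        "BBB": [20.0, 19.8, 20.2, 20.1, 20.6, 20.5],
        "NEW": [np.nan, np.nan, 5.0, 5.2, 5.1, 5.3],
    }
)

# Correlation table of the closing prices.
correlations = closes.corr()

# Fill missing prices with 0.
filled = closes.fillna(0)

# Features: fractional change from the previous day for every ticker.
features = filled.pct_change()

# Dividing by a filled-in 0 produces inf; convert those values to NaN.
features = features.replace([np.inf, -np.inf], np.nan)

# Labels for one target ticker. The threshold and the one-day look-ahead
# are assumptions for illustration only.
target = "AAA"
threshold = 0.02
future_change = filled[target].shift(-1) / filled[target] - 1
labels = pd.Series(
    np.select(
        [future_change > threshold, future_change < -threshold],
        [1, -1],  # 1 = buy, -1 = sell
        default=0,  # 0 = hold
    ),
    index=filled.index,
)

# Drop rows that still contain NaN features or have no next-day price.
valid = features.notna().all(axis=1) & future_change.notna()
X = features[valid].to_numpy()
y = labels[valid].to_numpy()
```

Dropping every row that contains a NaN also discards all days before the newest company started trading, which is worth keeping in mind when the table has hundreds of columns.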

Evaluation

This model's accuracy varies roughly between 37% and 49%, depending on the company we choose to predict. The results are not very satisfying, and there could be several reasons. The model is built on data from 505 different companies. Some companies certainly have relationships and strong correlations with each other; in general, however, different companies behave differently, and it is not easy to come up with a single general model for all 505. To improve accuracy, I recommend grouping companies into industry categories (such as tech, pharmaceuticals, and banking) and generating a model for each category.
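
As a rough sketch of the grouping idea, the joined close-price table could be split into one table per sector before training. The tickers, the sector mapping, and the prices below are hypothetical placeholders; a real mapping would have to come from an external source.

```python
import pandas as pd

# Hypothetical ticker-to-sector mapping (placeholder names).
sector_of = {
    "AAA": "tech",
    "BBB": "tech",
    "CCC": "banking",
    "DDD": "pharma",
    "EEE": "banking",
}

# Stand-in for the joined close-price table: one column per ticker.
closes = pd.DataFrame({ticker: [1.0, 1.1, 1.2] for ticker in sector_of})

# Split the wide table into one smaller table per sector, so that a
# separate model can be trained on each group of related companies.
tables_by_sector = {}
for sector in sorted(set(sector_of.values())):
    tickers = [t for t in closes.columns if sector_of[t] == sector]
    tables_by_sector[sector] = closes[tickers]

for sector, table in tables_by_sector.items():
    print(sector, list(table.columns))
```

Each per-sector table can then go through the same feature and label steps as the full table.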

Acknowledgement

Sentdex
