Stock Price Prediction using Machine Learning (#387)
## Pull Request for PyVerse 💡

### Requesting to submit a pull request to the PyVerse repository.

---

#### Issue Title
Implementing Machine Learning Models for Stock Market Prediction


**Please enter the title of the issue related to your pull request.**  
*Enter the issue title here.*

- [x] I have provided the issue title.

---

#### Info about the Related Issue
**What's the goal of the project?**  
*Describe the aim of the project.*

- [ ] I have described the aim of the project.
To build and compare various machine learning models that predict stock
prices and trends from historical stock market data, evaluating the
performance of each model with key metrics such as accuracy, mean absolute
error (MAE), and root mean squared error (RMSE).

🔴 Brief Explanation:
The goal of this project is to implement and compare multiple machine
learning models for predicting stock prices and identifying trends. The
stock market dataset will be used to forecast continuous variables, such
as stock prices (regression), and to classify conditions, such as
whether the price will increase or decrease (classification).

Models to be Evaluated:

Classification Models:

- Logistic Regression
- Random Forest Classifier
- Support Vector Machine (SVM)
- k-Nearest Neighbors (k-NN)
- Neural Networks
- Gradient Boosting Classifier

Regression Models:

- Linear Regression
- Decision Trees
- Random Forest Regression
- Support Vector Regression (SVR)
- Gradient Boosting Regression

Evaluation Metrics:

- For classification tasks: Accuracy, Precision, and F1 Score
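
For the regression models, the project goal names MAE and RMSE as the metrics. A minimal sketch of how they could be computed with scikit-learn follows. The helper name `regression_metrics` and the toy price arrays are assumptions made for illustration; they are not part of this PR's code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE) for a set of price predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    # RMSE is the square root of the mean squared error
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return mae, rmse


# Toy values, invented for illustration only
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 100.0, 101.0, 108.0])

mae, rmse = regression_metrics(y_true, y_pred)
print(f"MAE: {mae:.4f}, RMSE: {rmse:.4f}")
```

Both metrics are in the same unit as the price, which makes them easier to read than raw MSE. RMSE penalises large misses more heavily than MAE does.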
---

#### Name
**Please mention your name.**  
*Enter your name here.*

- [ ] I have provided my name.
Benak Deepak
---

#### GitHub ID
**Please mention your GitHub ID.**  
*Enter your GitHub ID here.*
BenakDeepak
- [ ] I have provided my GitHub ID.
#151528559
---

#### Email ID
**Please mention your email ID for further communication.**  
*Enter your email ID here.*
[email protected]
- [ ] I have provided my email ID.

---

#### Identify Yourself
**Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC,
SWOC).**
*Enter your participant role here.*
GSSOC
- [ ] I have mentioned my participant role.
GSSOC
---

#### Closes
**Enter the issue number that will be closed through this PR.**  
*Closes: #issue-number*

- [✅ ] I have provided the issue number.
#330 
---

#### Describe the Add-ons or Changes You've Made
**Give a clear description of what you have added or modified.**  
*Describe your changes here.*

- [ ✅ ] I have described my changes.
I have added an ML model to the Machine_Learning directory, in three files:
a README, a requirements file, and main.py.
---

#### Type of Change
**Select the type of change:**  
- [ ] Bug fix (non-breaking change which fixes an issue)
- [✅  ] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update

---

#### How Has This Been Tested?
**Describe how your changes have been tested.**  
*Describe your testing process here.*
I ran the project locally (on localhost).
- [ ✅ ] I have described my testing process.

---

#### Checklist
**Please confirm the following:**  
- [ ✅ ] My code follows the guidelines of this project.
- [ ✅ ] I have performed a self-review of my own code.
- [ ✅ ] I have commented my code, particularly wherever it was hard to
understand.
- [ ✅ ] I have made corresponding changes to the documentation.
- [ ✅ ] My changes generate no new warnings.
- [✅ ] I have added things that prove my fix is effective or that my
feature works.
- [ ✅ ] Any dependent changes have been merged and published in
downstream modules.
UTSAVS26 authored Oct 11, 2024
2 parents 9073c40 + 2f424fa commit 23250da
Showing 3 changed files with 128 additions and 0 deletions.
63 changes: 63 additions & 0 deletions Machine_Learning/Analying_ML_model_using_maang/README.md
@@ -0,0 +1,63 @@
ML Project Title: Stock Price Prediction using Machine Learning
🎯 Goal
The main goal of this project is to develop machine learning models for predicting stock prices based on historical data. The objective is twofold: to classify whether the closing price will increase or decrease the next day, and to predict the actual stock price as a regression task.

🧵 Dataset
This project utilizes a dataset containing historical stock market data. The dataset includes the following columns:

Date: The date of the stock prices.
Open: The opening price of the stock.
High: The highest price of the stock during the day.
Low: The lowest price of the stock during the day.
Close: The closing price of the stock.
Volume: The number of shares traded.
Target: The target variable for regression (stock prices).
🧾 Description
This project involves implementing a stock price prediction system using machine learning techniques with Python. Key components include:

Data Loading: Loading the dataset from a CSV file and preprocessing it.
Feature Engineering: Selecting relevant features and defining the target variables for both classification and regression tasks.
Missing Values Handling: Using scikit-learn's SimpleImputer (mean strategy) to fill missing values in the dataset.
Data Splitting: Dividing the dataset into training and testing sets for both classification and regression tasks.
Feature Scaling: Scaling the features using StandardScaler to improve model performance.
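
The imputation, splitting, and scaling steps above can be sketched as a single scikit-learn pipeline. This is a hedged illustration, not the code in main.py. It runs on synthetic data and uses a chronological split (earliest 80% of dates for training) instead of a random split, which is an assumption of this sketch that suits time-ordered data. Wrapping the imputer and scaler in a pipeline means they are fitted on the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stock data (assumption: not the project's dataset)
rng = np.random.default_rng(0)
n = 200
dates = pd.date_range("2024-01-01", periods=n, freq="D")
X = pd.DataFrame(
    rng.normal(size=(n, 4)),
    index=dates,
    columns=["Open", "High", "Low", "Volume"],
)
X.iloc[5, 0] = np.nan  # inject one missing value for the imputer to fill
y = pd.Series(rng.integers(0, 2, size=n), index=dates)

# Chronological split: the first 80% of dates train, the last 20% test
split = int(n * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Imputer and scaler statistics are learned from the training rows only
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LogisticRegression(),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```

Because the labels here are random, the printed accuracy carries no meaning. The point is the order of operations: split first, then fit every preprocessing step on the training portion.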
🧮 What You Have Done
Implemented data preprocessing steps, including date conversion and setting the date as an index.
Constructed features for predicting stock price movement (classification) and actual prices (regression).
Handled missing values in the dataset effectively.
Split the data into training and testing sets to evaluate model performance.
Scaled the features to standardize the input for machine learning algorithms.
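
One way to build the next-day movement label is to compare each close with the following day's close using `shift(-1)`. The sketch below uses toy prices invented for illustration. Dropping the final row, which has no following close to compare against, is an assumption of this sketch.

```python
import pandas as pd

# Toy closing prices (assumption: illustrative values only)
close = pd.Series(
    [10.0, 11.0, 10.5, 10.5, 12.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="Close",
)

next_close = close.shift(-1)   # tomorrow's close aligned to today's row
has_next = next_close.notna()  # False for the final day only

# 1 if the price rises the next day, 0 if it falls or stays flat
labels = (next_close[has_next] > close[has_next]).astype(int)
print(labels.tolist())
```

Without the `has_next` mask, the last row would compare against NaN, evaluate to False, and be labelled 0 even though its outcome is unknown.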
🚀 Models Implemented
The following machine learning classification models were implemented to predict stock price movements:

Logistic Regression: A model to predict the probability of stock price increase or decrease.
Random Forest: An ensemble model that improves prediction accuracy through multiple decision trees.
Support Vector Machine (SVM): A classifier that works well in high-dimensional feature spaces.
k-Nearest Neighbors (k-NN): A simple yet effective model for classification based on feature similarity.
Neural Networks (MLP): A multi-layer perceptron for complex classification tasks.
Gradient Boosting: An ensemble technique that builds models sequentially to minimize prediction error.
📚 Libraries Needed
Pandas: For data manipulation and analysis.
NumPy: For numerical computations.
Scikit-learn: For implementing machine learning algorithms, preprocessing, and evaluation metrics.
Matplotlib: For data visualization (if used for visualizing results).
📊 Exploratory Data Analysis Results
Although exploratory data analysis (EDA) is not detailed in the code, it is recommended to explore:

The distribution of stock prices.
Correlations between different features.
Trends over time.
📈 Performance of the Models Based on Accuracy Scores
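Each classifier is evaluated on the held-out test set using accuracy, precision, and F1 score. The values below are placeholders that have not been filled in yet: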

Logistic Regression: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
Random Forest: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
SVM: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
k-NN: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
Neural Networks: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
Gradient Boosting: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
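
For reference, the three metrics listed above can be computed with scikit-learn as in the following sketch. The labels are toy values invented for illustration; they are not results from this project.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Toy labels: 1 = price increased the next day, 0 = it did not
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning if a model never predicts the positive class
precision = precision_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, F1 Score: {f1:.4f}")
```

Accuracy alone can mislead when up days and down days are unbalanced, which is why precision and F1 are reported alongside it.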
📢 Conclusion
This project demonstrates the use of machine learning techniques to predict stock price movements. main.py currently trains and evaluates the classification models only; the regression target and splits are prepared, but no regression model is fitted yet. Future work may include evaluating the regression models, refining model parameters, incorporating additional features, or exploring other algorithms to enhance predictive performance.

✒️ Your Signature
Benak Deepak
https://www.linkedin.com/in/benak-deepak-210918254/
61 changes: 61 additions & 0 deletions Machine_Learning/Analying_ML_model_using_maang/main.py
@@ -0,0 +1,61 @@
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score, mean_absolute_error, mean_squared_error
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('C:\\Users\\Deepak\\Desktop\\merge\\merged_output.csv') # Update with your dataset path

# Feature engineering and data preparation
# Assuming the dataset has the following columns: 'Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Target'
# You can change this according to your actual dataset
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Selecting features and target
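df['Next_Close'] = df['Close'].shift(-1) # Next row's closing price
df = df.dropna(subset=['Next_Close']) # The last row has no next-day close, so it cannot be labelled
X = df[['Open', 'High', 'Low', 'Volume']] # Example features, adjust based on your dataset
y_classification = (df['Next_Close'] > df['Close']).astype(int) # Binary classification (increase/decrease)
y_regression = df['Close'] # Continuous target for regression (stock prices)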

# Handling missing values using SimpleImputer (mean strategy)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)

# Splitting dataset into training and testing sets
X_train, X_test, y_train_class, y_test_class = train_test_split(X_imputed, y_classification, test_size=0.2, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_imputed, y_regression, test_size=0.2, random_state=42)

# Scaling features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_reg_scaled = scaler.fit_transform(X_train_reg)
X_test_reg_scaled = scaler.transform(X_test_reg)

# Classification Models
classifiers = {
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "Neural Networks": MLPClassifier(),
    "Gradient Boosting": GradientBoostingClassifier()
}

print("Classification Results:")
for name, model in classifiers.items():
    model.fit(X_train_scaled, y_train_class)
    y_pred_class = model.predict(X_test_scaled)
    accuracy = accuracy_score(y_test_class, y_pred_class)
    precision = precision_score(y_test_class, y_pred_class)
    f1 = f1_score(y_test_class, y_pred_class)
    print(f"{name} - Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, F1 Score: {f1:.4f}")
@@ -0,0 +1,4 @@
pandas
numpy
scikit-learn
matplotlib
