Stock Price Prediction using Machine Learning (#387)

## Pull Request for PyVerse 💡

### Requesting to submit a pull request to the PyVerse repository.

---

#### Issue Title

Implementing Machine Learning Models for Stock Market Prediction

**Please enter the title of the issue related to your pull request.**
*Enter the issue title here.*

- [ yes] I have provided the issue title.

---

#### Info about the Related Issue

**What's the goal of the project?**
*Describe the aim of the project.*

- [ ] I have described the aim of the project.

To build and compare various machine learning models to predict stock prices and trends based on historical stock market data, evaluating the performance of each model using key metrics like accuracy, mean absolute error (MAE), and root mean squared error (RMSE).

🔴 Brief Explanation: The goal of this project is to implement and compare multiple machine learning models for predicting stock prices and identifying trends. The stock market dataset will be used to forecast continuous variables, such as stock prices (regression), and to classify conditions, such as whether the price will increase or decrease (classification).

Models to be Evaluated:

Classification Models:

- Logistic Regression
- Random Forest Classifier
- Support Vector Machine (SVM)
- k-Nearest Neighbors (k-NN)
- Neural Networks
- Gradient Boosting Classifier

Regression Models:

- Linear Regression
- Decision Trees
- Random Forest Regression
- Support Vector Regression (SVR)
- Gradient Boosting Regression

Evaluation Metrics:

For classification tasks: Accuracy, Precision, and F1 Score

---

#### Name

**Please mention your name.**
*Enter your name here.*

- [ ] I have provided my name.

Benak Deepak

---

#### GitHub ID

**Please mention your GitHub ID.**
*Enter your GitHub ID here.*

BenakDeepak

- [ ] I have provided my GitHub ID.

#151528559

---

#### Email ID

**Please mention your email ID for further communication.**
*Enter your email ID here.*

[email protected]

- [ ] I have provided my email ID.

---

#### Identify Yourself

**Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).**
*Enter your participant role here.*

GSSOC

- [ ] I have mentioned my participant role.

GSSOC

---

#### Closes

**Enter the issue number that will be closed through this PR.**
*Closes: #issue-number*

- [✅ ] I have provided the issue number.

#330

---

#### Describe the Add-ons or Changes You've Made

**Give a clear description of what you have added or modified.**
*Describe your changes here.*

- [ ✅ ] I have described my changes.

I have added an ML model to the Machine_Learning directory with three files: README.md, requirements.txt, and main.py.

---

#### Type of Change

**Select the type of change:**

- [ ] Bug fix (non-breaking change which fixes an issue)
- [✅ ] New feature (non-breaking change which adds functionality)
- [ ] Code style update (formatting, local variables)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

---

#### How Has This Been Tested?

**Describe how your changes have been tested.**
*Describe your testing process here.*

I have hosted it on localhost.

- [ ✅ ] I have described my testing process.

---

#### Checklist

**Please confirm the following:**

- [ ✅ ] My code follows the guidelines of this project.
- [ ✅ ] I have performed a self-review of my own code.
- [ ✅ ] I have commented my code, particularly wherever it was hard to understand.
- [ ✅ ] I have made corresponding changes to the documentation.
- [ ✅ ] My changes generate no new warnings.
- [✅ ] I have added things that prove my fix is effective or that my feature works.
- [ ✅ ] Any dependent changes have been merged and published in downstream modules.
UTSAVS26 · Oct 11, 2024 · 23250da · 23250da
2 parents 9073c40 + 2f424fa
commit 23250da
Showing 3 changed files with 128 additions and 0 deletions.
diff --git a/Machine_Learning/Analying_ML_model_using_maang/README.md b/Machine_Learning/Analying_ML_model_using_maang/README.md
@@ -0,0 +1,63 @@
+ML Project Title: Stock Price Prediction using Machine Learning
+🎯 Goal
+The main goal of this project is to develop machine learning models that predict stock prices from historical data. There are two tasks: classifying whether the closing price will rise or fall the next day, and predicting the price itself as a regression problem.
+
+🧵 Dataset
+This project utilizes a dataset containing historical stock market data. The dataset includes the following columns:
+
+Date: The date of the stock prices.
+Open: The opening price of the stock.
+High: The highest price of the stock during the day.
+Low: The lowest price of the stock during the day.
+Close: The closing price of the stock.
+Volume: The number of shares traded.
+Target: The target variable for regression (stock prices).
+🧾 Description
+This project involves implementing a stock price prediction system using machine learning techniques with Python. Key components include:
+
+Data Loading: Loading the dataset from a CSV file and preprocessing it.
+Feature Engineering: Selecting relevant features and defining the target variables for both classification and regression tasks.
+Missing Values Handling: Using scikit-learn's SimpleImputer (mean strategy) to fill missing values in the dataset.
+Data Splitting: Dividing the dataset into training and testing sets for both classification and regression tasks.
+Feature Scaling: Scaling the features using StandardScaler to improve model performance.
+🧮 What You Have Done
+Implemented data preprocessing steps, including date conversion and setting the date as an index.
+Constructed features for predicting stock price movement (classification) and actual prices (regression).
+Handled missing values in the dataset effectively.
+Split the data into training and testing sets to evaluate model performance.
+Scaled the features to standardize the input for machine learning algorithms.
+🚀 Models Implemented
+The following machine learning classification models were implemented to predict stock price movements:
+
+Logistic Regression: A model to predict the probability of stock price increase or decrease.
+Random Forest: An ensemble model that improves prediction accuracy through multiple decision trees.
+Support Vector Machine (SVM): A powerful classifier that works well in high-dimensional spaces.
+k-Nearest Neighbors (k-NN): A simple yet effective model for classification based on feature similarity.
+Neural Networks (MLP): A multi-layer perceptron for complex classification tasks.
+Gradient Boosting: An ensemble technique that builds models sequentially to minimize prediction error.
+📚 Libraries Needed
+Pandas: For data manipulation and analysis.
+NumPy: For numerical computations.
+Scikit-learn: For implementing machine learning algorithms, preprocessing, and evaluation metrics.
+Matplotlib: For data visualization (imported in main.py, which does not produce any plots yet).
+📊 Exploratory Data Analysis Results
+Although exploratory data analysis (EDA) is not detailed in the code, it is recommended to explore:
+
+The distribution of stock prices.
+Correlations between different features.
+Trends over time.
+📈 Performance of the Models Based on Accuracy Scores
+Each classifier is evaluated on the held-out 20% test split using accuracy, precision, and F1 score:
+
+Logistic Regression: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
+Random Forest: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
+SVM: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
+k-NN: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
+Neural Networks: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
+Gradient Boosting: Accuracy: [insert accuracy], Precision: [insert precision], F1 Score: [insert F1 score].
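The placeholders above are meant to be filled with whatever `main.py` prints for the real dataset. As a reminder of what the three numbers measure, here is a toy calculation on made-up labels. The `y_true` and `y_pred` lists are illustrative assumptions and are not project results.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Hypothetical labels for illustration only: 1 = next close is higher, 0 = not higher.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Confusion counts for the positive class: TP = 3, FP = 1, FN = 1, TN = 3.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
f1 = f1_score(y_true, y_pred)                # 2 * TP / (2 * TP + FP + FN) = 6 / 8

print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, F1 Score: {f1:.4f}")
```

Accuracy alone can look acceptable on price-direction data when one class dominates, which is presumably why precision and F1 are reported alongside it.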
+📢 Conclusion
+This project demonstrates the use of machine learning classifiers to predict next-day stock price movements. The regression models for predicting actual prices are imported in main.py but are not yet trained or evaluated there. Comparing the classifiers gives insight into how well different algorithms perform in this domain. Future work may include adding the regression evaluation, tuning model parameters, incorporating additional features, or exploring other algorithms to improve predictive performance.
+
+✒️ Your Signature
+Benak Deepak
+https://www.linkedin.com/in/benak-deepak-210918254/
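The preprocessing steps the README lists (imputation, splitting, scaling) can be chained in a scikit-learn `Pipeline` so that the imputer and scaler are fitted on the training rows only. The following is a minimal sketch on synthetic prices, assuming a chronological split (`shuffle=False`) is wanted because the rows are time-ordered. The `make_prices` helper, the synthetic frame, and the choice of logistic regression are illustrative assumptions, not the author's code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_prices(n_days=200, seed=0):
    """Hypothetical helper: build a synthetic OHLCV frame for illustration."""
    rng = np.random.default_rng(seed)
    close = 100 + np.cumsum(rng.normal(0, 1, n_days))
    frame = pd.DataFrame(
        {
            "Open": close + rng.normal(0, 0.5, n_days),
            "High": close + 1.0,
            "Low": close - 1.0,
            "Close": close,
            "Volume": rng.integers(1_000, 5_000, n_days).astype(float),
        },
        index=pd.date_range("2024-01-01", periods=n_days, freq="D"),
    )
    frame.iloc[5, frame.columns.get_loc("Volume")] = np.nan  # simulate a missing value
    return frame


df = make_prices()

# Label: 1 if the next day's close is higher. The last row has no next day, so drop it.
X = df[["Open", "High", "Low", "Volume"]].iloc[:-1]
y = (df["Close"].shift(-1) > df["Close"]).astype(int).iloc[:-1]

# shuffle=False keeps time order: the earliest rows train, the latest rows test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# The pipeline fits the imputer and scaler on the training rows only.
model = Pipeline(
    [
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression()),
    ]
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.4f}")
```

Whether a chronological split suits the merged dataset depends on how its rows are ordered, which the commit does not show. If the file holds several tickers, the next-day label would also need to be computed per ticker.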
diff --git a/Machine_Learning/Analying_ML_model_using_maang/main.py b/Machine_Learning/Analying_ML_model_using_maang/main.py
@@ -0,0 +1,61 @@
+import pandas as pd
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.preprocessing import StandardScaler
+from sklearn.linear_model import LogisticRegression, LinearRegression
+from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, GradientBoostingRegressor
+from sklearn.svm import SVC, SVR
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.tree import DecisionTreeRegressor
+from sklearn.neural_network import MLPClassifier
+from sklearn.metrics import accuracy_score, precision_score, f1_score, mean_absolute_error, mean_squared_error
+from sklearn.impute import SimpleImputer
+import matplotlib.pyplot as plt
+
+# Load dataset
+df = pd.read_csv('C:\\Users\\Deepak\\Desktop\\merge\\merged_output.csv') # Update with your dataset path
+
+# Feature engineering and data preparation
+# Assuming the dataset has the following columns: 'Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Target'
+# You can change this according to your actual dataset
+df['Date'] = pd.to_datetime(df['Date'])
+df.set_index('Date', inplace=True)
+
+# Selecting features and target
+X = df[['Open', 'High', 'Low', 'Volume']].iloc[:-1]  # Example features; the last row is dropped because it has no next-day close
+y_classification = (df['Close'].shift(-1) > df['Close']).astype(int).iloc[:-1]  # Binary classification: 1 if the next close is higher, else 0
+y_regression = df['Close'].iloc[:-1]  # Continuous target for regression (same-day closing price)
+
+# Handling missing values using SimpleImputer (mean strategy)
+imputer = SimpleImputer(strategy='mean')
+X_imputed = imputer.fit_transform(X)
+
+# Splitting dataset into training and testing sets
+X_train, X_test, y_train_class, y_test_class = train_test_split(X_imputed, y_classification, test_size=0.2, random_state=42)
+X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_imputed, y_regression, test_size=0.2, random_state=42)
+
+# Scaling features
+scaler = StandardScaler()
+X_train_scaled = scaler.fit_transform(X_train)
+X_test_scaled = scaler.transform(X_test)
+X_train_reg_scaled = scaler.fit_transform(X_train_reg)
+X_test_reg_scaled = scaler.transform(X_test_reg)
+
+# Classification Models
+classifiers = {
+    "Logistic Regression": LogisticRegression(),
+    "Random Forest": RandomForestClassifier(),
+    "SVM": SVC(),
+    "k-NN": KNeighborsClassifier(),
+    "Neural Networks": MLPClassifier(),
+    "Gradient Boosting": GradientBoostingClassifier()
+}
+
+print("Classification Results:")
+for name, model in classifiers.items():
+    model.fit(X_train_scaled, y_train_class)
+    y_pred_class = model.predict(X_test_scaled)
+    accuracy = accuracy_score(y_test_class, y_pred_class)
+    precision = precision_score(y_test_class, y_pred_class)
+    f1 = f1_score(y_test_class, y_pred_class)
+    print(f"{name} - Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, F1 Score: {f1:.4f}")
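`main.py` imports several regressors together with `mean_absolute_error` and `mean_squared_error`, but the diff ends after the classification loop. A regression loop in the same style might look like the sketch below. It runs on synthetic data so that it is self-contained. The synthetic features, the target definition, and the subset of models are assumptions for illustration, not the author's method or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the price features (assumption: 4 numeric columns).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

regressors = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
}

results = {}
print("Regression Results:")
for name, model in regressors.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE is the square root of MSE
    results[name] = (mae, rmse)
    print(f"{name} - MAE: {mae:.4f}, RMSE: {rmse:.4f}")
```

With the real data, note that `y_regression` in `main.py` is the same-day `Close`, which lies between `Low` and `High` by construction. Those features make the target easy to predict, so a next-day target may be a more informative benchmark.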
diff --git a/Machine_Learning/Analying_ML_model_using_maang/requirements.txt b/Machine_Learning/Analying_ML_model_using_maang/requirements.txt
@@ -0,0 +1,4 @@
+pandas
+numpy
+scikit-learn
+matplotlib