This repository provides a Python implementation for loading, exploring, visualizing, preprocessing, and modeling data using regression techniques. The project uses pandas, matplotlib, seaborn, and scikit-learn to handle and analyze the dataset.
- Data Loading and Exploration: Loads a dataset, provides an overview, checks for missing values, and prints dataset information.
- Data Visualization: Generates histograms, bar plots, and pair plots for insights into features and relationships.
- Data Preprocessing: Cleans data, encodes categorical features, and splits the dataset into training and testing sets.
- Model Training: Trains multiple regression models including Linear Regression, Lasso, and Ridge.
- Hyperparameter Tuning: Performs grid search to optimize Ridge regression.
- Model Evaluation: Evaluates models using metrics like R², Mean Absolute Error, and Mean Squared Error.
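The training and evaluation features listed above can be illustrated with a minimal sketch. The function name `compare_models`, the regularization strengths, and the synthetic data are assumptions for illustration and are not taken from main.py.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state=0):
    """Fit three linear regressors and return their test-set metrics.

    Hypothetical helper: the repository's own training code may differ.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    # Alpha values are illustrative defaults, not tuned values.
    models = {
        "LinearRegression": LinearRegression(),
        "Lasso": Lasso(alpha=0.1),
        "Ridge": Ridge(alpha=1.0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "r2": r2_score(y_test, pred),
            "mae": mean_absolute_error(y_test, pred),
            "mse": mean_squared_error(y_test, pred),
        }
    return results


# Synthetic data standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

results = compare_models(X, y)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Reporting R², MAE, and MSE side by side makes it easier to see whether regularization (Lasso, Ridge) helps or hurts relative to plain linear regression on a given dataset.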
The following Python libraries are required:
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn
Install the dependencies using the following command:
pip install -r requirements.txt
Note: Ensure you have Python 3.7 or later installed.
- main.py: Contains the core implementation.
- January_MyCall_2022.csv: Placeholder for the dataset (update with the actual file).
- README.md: Documentation for the project.
- Clone the repository:
  git clone https://github.com/MohamadPirniakan/Customer-Call-Quality-Analysis.git
  cd Customer-Call-Quality-Analysis
- Run the script: Update the filepath variable in main.py with the path to your dataset, then execute:
  python main.py
- Analyze the results: Review the printed metrics, visualizations, and sample predictions for insights.
- Load Data:
  df = load_data(filepath)
- Visualize Data:
  visualize_data(df)
- Preprocess Data:
  X_train, X_test, y_train, y_test = preprocess_data(df)
- Train Models:
  results = train_models(X_train, y_train)
- Hyperparameter Tuning:
  best_model = grid_search_ridge(X_train, y_train)
- Evaluate Best Model:
  evaluate_model(best_model, X_test, y_test)
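The bodies of the functions called above are not shown in this README. As a rough illustration of what the preprocessing and tuning steps might involve, here is a sketch with hypothetical stand-ins named `preprocess_sketch` and `grid_search_ridge_sketch`. The 80/20 split, the one-hot encoding, the alpha grid, and the toy data are all assumptions and may not match main.py.

```python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def preprocess_sketch(df, target="rating"):
    """Drop incomplete rows, one-hot encode text columns, and split 80/20."""
    df = df.dropna()
    # dtype=float keeps the encoded columns numeric for scikit-learn.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
    y = df[target]
    return train_test_split(X, y, test_size=0.2, random_state=42)


def grid_search_ridge_sketch(X_train, y_train):
    """Search a small, illustrative alpha grid with 5-fold cross-validation."""
    search = GridSearchCV(
        Ridge(),
        param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
        scoring="r2",
        cv=5,
    )
    search.fit(X_train, y_train)
    return search.best_estimator_


# Toy data using a subset of the expected column names (values are made up).
toy = pd.DataFrame({
    "operator": ["A", "B"] * 50,
    "network_type": ["4G", "4G", "3G", "3G"] * 25,
    "latitude": [20.0 + 0.1 * i for i in range(100)],
    "longitude": [75.0 + 0.05 * i for i in range(100)],
})
toy["rating"] = 3.0 + (toy["operator"] == "A") - 0.5 * (toy["network_type"] == "3G")

X_train, X_test, y_train, y_test = preprocess_sketch(toy)
best_model = grid_search_ridge_sketch(X_train, y_train)
test_r2 = r2_score(y_test, best_model.predict(X_test))
print("best alpha:", best_model.alpha, "test R2:", round(float(test_r2), 3))
```

Splitting before tuning keeps the test set out of the cross-validation folds, so the final R² is measured on data the grid search never saw.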
The script generates the following plots:
- Histograms for categorical features.
- Bar plots showing the average rating by categorical features.
- Pair plots for latitude and longitude with rating distribution.
- Scatter plot comparing true vs. predicted ratings.
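Two of the plots listed above can be sketched with pandas and matplotlib alone. The helper names `plot_mean_rating` and `plot_true_vs_predicted` and the toy values are hypothetical; the repository's `visualize_data` may build its figures differently (for example with seaborn).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def plot_mean_rating(df, by, target="rating"):
    """Bar plot of the mean target value for each level of a categorical column."""
    means = df.groupby(by)[target].mean().sort_values(ascending=False)
    fig, ax = plt.subplots(figsize=(6, 4))
    means.plot.bar(ax=ax)
    ax.set_xlabel(by)
    ax.set_ylabel(f"Mean {target}")
    ax.set_title(f"Average {target} by {by}")
    fig.tight_layout()
    return fig, means


def plot_true_vs_predicted(y_true, y_pred):
    """Scatter of predictions against true values with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.6)
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    # Points on the dashed diagonal are perfect predictions.
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("True rating")
    ax.set_ylabel("Predicted rating")
    fig.tight_layout()
    return fig


# Made-up values for illustration only.
toy = pd.DataFrame({"operator": ["A", "A", "B", "B"], "rating": [5, 3, 2, 4]})
fig_bar, means = plot_mean_rating(toy, "operator")
fig_scatter = plot_true_vs_predicted([1, 2, 3, 4, 5], [1.2, 1.9, 3.4, 3.8, 4.6])
fig_bar.savefig("mean_rating_by_operator.png")
fig_scatter.savefig("true_vs_predicted.png")
```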
The dataset (January_MyCall_2022.csv) is expected to have the following columns:
- operator
- inout_travelling
- network_type
- rating
- calldrop_category
- state_name
- latitude
- longitude
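A quick check that a CSV actually contains these columns can catch a wrong file path or a renamed header early. The sketch below uses a hypothetical helper `load_and_check` and an in-memory CSV with made-up rows; it is not the repository's `load_data`.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = [
    "operator",
    "inout_travelling",
    "network_type",
    "rating",
    "calldrop_category",
    "state_name",
    "latitude",
    "longitude",
]


def load_and_check(source):
    """Read a CSV and raise ValueError if any expected column is absent."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing expected columns: {missing}")
    return df


# In-memory CSV with made-up rows, standing in for the real file.
sample_csv = io.StringIO(
    "operator,inout_travelling,network_type,rating,"
    "calldrop_category,state_name,latitude,longitude\n"
    "OpA,Indoor,4G,4,Satisfactory,StateX,19.07,72.87\n"
    "OpB,Travelling,3G,2,Call Dropped,StateY,,\n"
)
df = load_and_check(sample_csv)
print(df.shape)
print(df.isna().sum())  # per-column count of missing values
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pd.read_csv` accepts both.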
Replace the placeholder file path with your actual dataset location.
This project uses scikit-learn for machine learning and matplotlib/seaborn for data visualization.
Feel free to contribute, raise issues, or suggest improvements. 😄