Car Price Prediction Data Analysis

This project involves analyzing a car price dataset using Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, and sklearn. The goal is to pre-process the dataset by applying feature engineering, feature selection, and exploratory data analysis. Note that actual model building is not part of this project.

Project Description

The goal of this project is to analyze a car price dataset and perform various data preprocessing tasks, including feature engineering, feature selection, and exploratory data analysis. This analysis will help develop machine learning models that can accurately predict car prices based on various features such as model, production year, category, brand, fuel type, engine volume, mileage, cylinders, vehicle style, and others.

Dataset

The dataset consists of 19,237 samples and includes features such as:

Model
Production year
Category
Brand
Fuel type
Engine volume
Mileage
Cylinders
Vehicle style
Price (target variable)

Installation

To run this project, you'll need to install the following dependencies:

Python 3.x
NumPy
Pandas
Matplotlib
Seaborn
scikit-learn
Jupyter Notebook

You can install the required packages using pip:

pip install numpy pandas matplotlib seaborn scikit-learn jupyter

Usage

git clone https://github.com/yourusername/car-price-prediction.git

cd car-price-prediction

python -m venv envsource env/bin/activate # On Windows use \`env\\Scripts\\activate\`

pip install numpy pandas matplotlib seaborn scikit-learn jupyter

jupyter notebook

Open car_price_analysis.ipynb and run the cells sequentially to perform the data analysis.

Project Structure

The project directory contains the following files:

car_price_analysis.ipynb: The Jupyter Notebook containing the data analysis and preprocessing steps.
price_prediction_batch_23.csv: The dataset used for the analysis.
README.md: The project documentation.

Analysis Steps

Section 1: Data Types and Preprocessing

Analyze data types of features and update if required.
Process the 'Levy' column and convert it to an integer type.
Process the 'Mileage' column and convert it to an integer.
Check for NaN values in the data and remove them if necessary.

Section 2: Data Cleaning and Exploration

Check for duplicates and remove them.
Check for outliers using boxplots and statistical methods, and remove them if necessary.
Draw countplots for categorical features and write observations.
Draw histograms for numeric features, compute skewness, and apply transformation functions if needed.

Section 3: Feature Engineering and Encoding

Create a joint plot with the hue parameter and write observations.
Apply scaling methods to independent features.
Convert categorical features into numeric ones using appropriate encoding techniques.
Combine results and compute the correlation among all independent features using a heatmap. Discard one of the variables if high correlation is detected (above 0.7).

Section 4: Feature Selection Based on Correlation

Split the dataset into training and testing sets using an 80-20 split.
Compute the correlation of each independent feature with the dependent variable 'Price'.
Select the seven most important independent features based on the correlation values.

Section 5: Feature Selection Using SelectKBest

Apply the SelectKBest method to the dataset to reduce the feature set to the seven most important features.

Conclusions

The analysis identified key features influencing car prices and reduced the feature set to the most relevant ones for potential model building.
Data preprocessing steps, including handling missing values, outliers, and feature scaling, were crucial in preparing the data for analysis.

Contributing

Contributions are welcome! Please fork this repository and submit a pull request for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
car_price_analysis.ipynb		car_price_analysis.ipynb
price_prediction_batch_23.csv		price_prediction_batch_23.csv
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Car Price Prediction Data Analysis

Table of Contents

Project Description

Dataset

Installation

Usage

Project Structure

Analysis Steps

Section 1: Data Types and Preprocessing

Section 2: Data Cleaning and Exploration

Section 3: Feature Engineering and Encoding

Section 4: Feature Selection Based on Correlation

Section 5: Feature Selection Using SelectKBest

Conclusions

Contributing

License

About

Releases

Packages

Languages

License

Thabhelo/car-price-prediction

Folders and files

Latest commit

History

Repository files navigation

Car Price Prediction Data Analysis

Table of Contents

Project Description

Dataset

Installation

Usage

Project Structure

Analysis Steps

Section 1: Data Types and Preprocessing

Section 2: Data Cleaning and Exploration

Section 3: Feature Engineering and Encoding

Section 4: Feature Selection Based on Correlation

Section 5: Feature Selection Using SelectKBest

Conclusions

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages