This project is part of my portfolio showcasing my skills in data science and machine learning. It focuses on building a robust machine learning model to classify and detect breast cancer based on diagnostic features. It is based on the guided project with the same name in coursera.
Breast cancer is a prevalent disease, and early detection can significantly increase the chances of successful treatment. In this project, I have developed a predictive model using logistic regression to classify breast cancer cases into benign or malignant based on a dataset of diagnostic features.
- Data Cleaning and Preprocessing: The dataset was cleaned and prepared to ensure high-quality input for the model.
- Exploratory Data Analysis (EDA): Insightful visualizations and statistical analysis were performed to understand the data distribution and relationships.
- Machine Learning: A logistic regression model was implemented and evaluated for its predictive performance.
- Evaluation Metrics: Confusion matrix and accuracy score were used to evaluate model performance.
- Programming Language: Python
- Libraries:
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- Matplotlib/Seaborn: Data visualization
- Scikit-learn: Machine learning model implementation and evaluation
-
Confusion Matrix: The confusion matrix illustrates the model's performance in distinguishing between benign and malignant cases.
-
Accuracy Score: The model achieved an accuracy of 94.7% on the test set.
- Clone the repository to your local machine:
git clone https://github.com/your-username/breast-cancer-detection.git cd breast-cancer-detection
- Install the required dependencies:
pip install -r requirements.txt
- Open the Jupyter Notebook:
jupyter notebook BreastCancerDetection.ipynb
- Run the cells to preprocess the data, train the model, and evaluate its performance.
The dataset used for this project contains diagnostic features of breast cancer cases. [Provide a link to the dataset source if available].
- Experiment with different machine learning models and compare their performance.
- Implement hyperparameter tuning to further optimize the logistic regression model.
- Develop a user-friendly web application for real-time breast cancer detection using the trained model.
This project demonstrates my ability to preprocess data, implement machine learning models, and evaluate their performance using standard metrics. It also highlights the importance of EDA in uncovering patterns and relationships within the data.