Data Science Project Collection 📊💻

This repository collects data science exercises and projects covering clustering, data wrangling, simulation, time series analysis, and machine learning classification. Each project focuses on a specific set of techniques and its own dataset.

Clustering Activities 🤖👟

In this project, we use the k-means clustering algorithm to analyze time series data collected from wearable devices during activities like sleeping, running, and walking.

Dataset: data/activity.csv
Goal: Identify the number of activities and group individuals into clusters based on their activity patterns.
Challenge: Sampling rates are inconsistent: one cohort was sampled every second and another every 2 seconds, so the recordings must be aligned before clustering.
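One way to handle the mixed sampling rates is to bring every recording onto a common 2-second grid before clustering. The sketch below is only an illustration: the helper name to_two_second_grid, the pair-averaging strategy, and the toy arrays are assumptions, and the actual layout of data/activity.csv may call for a different approach.

```python
import numpy as np

def to_two_second_grid(signal, period_s):
    """Return the signal on a common 2-second grid (hypothetical helper).

    period_s is the sampling period of the input in seconds (1 or 2).
    """
    signal = np.asarray(signal, dtype=float)
    if period_s == 2:
        return signal  # already on the target grid
    if period_s == 1:
        # Average consecutive pairs; drop a trailing unpaired sample.
        usable = len(signal) - len(signal) % 2
        return signal[:usable].reshape(-1, 2).mean(axis=1)
    raise ValueError(f"unsupported sampling period: {period_s}")

# Toy recordings standing in for the two cohorts
fast = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # sampled every second
slow = np.array([2.0, 6.0])                 # sampled every 2 seconds

aligned_fast = to_two_second_grid(fast, period_s=1)
aligned_slow = to_two_second_grid(slow, period_s=2)
print(aligned_fast)  # [2. 6.]
print(aligned_slow)  # [2. 6.]
```

Averaging pairs smooths the faster cohort slightly; taking every second sample instead (signal[::2]) would keep the raw values at the cost of discarding half the data.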

Steps

Read and inspect the dataset for missing values.
Perform clustering to uncover hidden patterns.
Visualize and interpret the results.
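The clustering step could look roughly like the sketch below. It uses scikit-learn's KMeans and picks the number of clusters by silhouette score, which is one common heuristic and not necessarily what the notebook does. The feature matrix is synthetic (three well-separated blobs) because the real per-person features from data/activity.csv are not described here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-person activity features: 3 separated groups
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

# Put features on a comparable scale before computing distances
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts and keep the silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
final_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
print(best_k, np.bincount(final_labels))
```

On real wearable data the silhouette curve is rarely this clear-cut, so it is worth inspecting the scores for several values of k alongside plots of the clusters.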

Data Wrangling Medicaid 🏥💊

This exercise involves working with two datasets:

IRS Statistics of Income (SOI)
Medicaid Data per State

The goal is to create a summary table that explores medication costs per Medicaid enrollee by state.

Answer questions such as:
- What drugs contribute most to a state's spending?
- Are there regional patterns in drug prescriptions?
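A summary table of this kind can be sketched with pandas as follows. The two toy frames and all column names (state, drug, amount_reimbursed, enrollees) are hypothetical placeholders; the real SOI and Medicaid files will use their own schemas and need their own cleaning.

```python
import pandas as pd

# Hypothetical columns; the real files may name and structure these differently.
drug_spend = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX"],
    "drug": ["A", "B", "A", "C"],
    "amount_reimbursed": [300.0, 100.0, 50.0, 150.0],
})
enrollment = pd.DataFrame({
    "state": ["CA", "TX"],
    "enrollees": [100, 40],
})

# Total spending per state, joined to enrollee counts
state_totals = (
    drug_spend.groupby("state", as_index=False)["amount_reimbursed"].sum()
    .merge(enrollment, on="state", how="left")
)
state_totals["cost_per_enrollee"] = (
    state_totals["amount_reimbursed"] / state_totals["enrollees"]
)

# Drug with the largest spending in each state
top_drug = (
    drug_spend.sort_values("amount_reimbursed", ascending=False)
    .drop_duplicates("state")
    .set_index("state")["drug"]
)

print(state_totals)
print(top_drug)
```

Using how="left" in the merge keeps states that lack an enrollee count visible as missing values instead of silently dropping them.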

Outcome

Gain insight into healthcare spending and practice the data wrangling skills that real-world projects require.

Simulation and Hypothesis: Memberships 💻📈

This project models revenue for a membership-based training website.

Dataset: memberships_info.csv
Goal: Develop a generative model to predict revenue for the upcoming year.

Key Factors

Gender differences in training completion rates and dropout probabilities.
Annual growth in memberships (13% increase, SD: 1.4%).
Historical data for over 90,000 enrollees in 2021.

Use the model to estimate revenue trends and understand membership dynamics.
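A minimal Monte Carlo sketch of such a revenue model is shown below. Only the growth parameters (13% mean, 1.4% SD) and the rough 2021 enrollment come from the description above. The annual fee, the gender split, and the dropout probabilities are made-up placeholders, and the sketch assumes a single year of growth and uses expected retention instead of simulating each member.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 10_000

base_members = 90_000  # rounded from "over 90,000 enrollees in 2021"
growth = rng.normal(loc=0.13, scale=0.014, size=n_sims)  # 13% mean, 1.4% SD
members = base_members * (1.0 + growth)

# Everything below is a hypothetical placeholder, not taken from the dataset.
annual_fee = 100.0
share = {"female": 0.5, "male": 0.5}
dropout_prob = {"female": 0.20, "male": 0.30}

# Expected fraction of members who stay (and pay) for the year
retained = sum(share[g] * (1.0 - dropout_prob[g]) for g in share)
revenue = members * retained * annual_fee

# Summarize the simulated revenue distribution
low, mid, high = np.percentile(revenue, [2.5, 50, 97.5])
print(f"median: {mid:,.0f}  95% interval: {low:,.0f} to {high:,.0f}")
```

Replacing the placeholders with rates estimated from memberships_info.csv, and drawing individual dropouts with rng.binomial, would turn this into a fuller generative model.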

Time Series Analysis 📅🔢

Analyze and generate synthetic time series data based on the following components:

Trend: Exponential growth function.
Seasonality: Quarterly sine wave.
Noise: Gaussian distribution.

Steps

Compute and plot each component independently.
Combine the components to create the final time series.
Call np.random.seed(42) before generating the noise to ensure reproducibility.
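The steps above can be sketched as follows. The 200-month span and the seed come from this README. The growth rate, amplitude, and noise scale are illustrative guesses, "quarterly" is assumed to mean a 3-month period, and the components are assumed to combine additively.

```python
import numpy as np

np.random.seed(42)  # seed before drawing noise so runs are repeatable

n_months = 200
t = np.arange(n_months)

# Hypothetical parameters; the notebook's actual values are not given here.
trend = 10.0 * np.exp(0.01 * t)                # exponential growth
seasonality = 2.0 * np.sin(2 * np.pi * t / 3)  # assumes a 3-month period
noise = np.random.normal(loc=0.0, scale=1.0, size=n_months)  # Gaussian noise

# Additive combination (an assumption; a multiplicative form is also common)
series = trend + seasonality + noise
print(series[:5])
```

Each of trend, seasonality, noise, and series is a plain one-dimensional array, so each can be plotted against t on its own before viewing the combined series.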

Visualize

The dataset spans 200 months, showcasing seasonal patterns and trends.

Wine Quality Classification 🍷📊

Classify wine quality (scale: 0–10) using various machine learning models.

Dataset: Contains features like acidity, density, etc.
Goal: Build models to classify wine quality and evaluate performance.

Models

K-Nearest Neighbors (KNN)
Logistic Regression
Random Forest
XGBoost

Evaluation

Compare models based on classification accuracy.
Visualize confusion matrix heatmaps to analyze prediction quality.
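The comparison could be organized along these lines. This is a sketch under stated assumptions: the data is a synthetic stand-in generated with make_classification (the wine file path and column names are not given here), hyperparameters are arbitrary, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features and quality labels
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Distance- and gradient-based models get scaled inputs; the forest does not need it
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "confusion": confusion_matrix(y_test, pred),
    }

for name, res in results.items():
    print(f"{name}: accuracy = {res['accuracy']:.3f}")
```

Each stored confusion matrix is a square array of counts that can be drawn as a heatmap, for example with scikit-learn's ConfusionMatrixDisplay.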

How to Use this Repository 📂🛠️

Clone the repository: git clone https://github.com/data-science-notebooks

Happy coding! 🚀

