Notebook: Customer Segmentation with K-Means Clustering using BigQuery Dataframes #26

Open
NiloFreitas opened this issue Sep 26, 2024 · 0 comments
Labels
good first issue Good for newcomers

Description

This issue proposes the development of a new notebook that demonstrates how to perform customer segmentation using K-Means clustering with BigQuery Dataframes. The notebook should cover the following aspects:

Data Preparation

Select an appropriate customer dataset, either a BigQuery public dataset or one from our public GCS bucket, if available.
Perform feature engineering and selection relevant to customer segmentation (e.g., recency, frequency, monetary value - RFM analysis).
Prepare the data for K-Means clustering using BigQuery Dataframes.
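
As a rough illustration of the RFM features mentioned above, here is a minimal local sketch in plain Python. The `compute_rfm` helper, the `(customer_id, order_date, amount)` row layout, and the sample rows are all hypothetical; in the notebook itself the same aggregation would presumably be expressed as a group-by in BigQuery Dataframes rather than a Python loop.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(transactions, snapshot_date):
    """Aggregate (customer_id, order_date, amount) rows into RFM features.

    recency_days: days between the snapshot date and the latest order
    frequency:    number of orders
    monetary:     total amount spent
    """
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in transactions:
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (snapshot_date - last_order[cid]).days,
            "frequency": frequency[cid],
            "monetary": round(monetary[cid], 2),
        }
        for cid in last_order
    }


# Hypothetical sample rows, for illustration only.
sample = [
    ("c1", date(2024, 9, 1), 20.0),
    ("c1", date(2024, 9, 20), 35.5),
    ("c2", date(2024, 6, 15), 120.0),
]
rfm = compute_rfm(sample, snapshot_date=date(2024, 9, 26))
print(rfm["c1"])
```

Because K-Means is distance based, the three features would normally be scaled to comparable ranges before clustering.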

Model Training

Use bigframes.ml.cluster.KMeans to train a K-Means clustering model.
Optimize the number of clusters (k) using techniques like the elbow method or silhouette analysis.
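
One possible shape for the elbow method is sketched below. It assumes that `features` is a BigQuery Dataframes DataFrame of numeric columns and that `KMeans.score()` returns a one-row frame of evaluation metrics including a `mean_squared_distance` column; both assumptions should be checked against the bigframes reference before relying on them. The `pick_elbow` helper is a hypothetical heuristic (largest second difference of the curve), not part of bigframes.

```python
def mean_squared_distance_by_k(features, k_values):
    """Train one KMeans model per k and collect its mean squared distance.

    The import is local so that pick_elbow() below can be used without
    bigframes installed or a BigQuery session configured.
    """
    from bigframes.ml.cluster import KMeans

    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k)
        model.fit(features)
        # Assumption: score() returns a one-row frame of evaluation metrics
        # that includes a "mean_squared_distance" column.
        metrics = model.score(features).to_pandas()
        scores[k] = float(metrics["mean_squared_distance"].iloc[0])
    return scores


def pick_elbow(scores):
    """Return the k where the curve bends most sharply.

    Uses the largest second difference of the scores, which assumes the
    candidate k values are evenly spaced (for example 2, 3, 4, ...).
    """
    ks = sorted(scores)
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate an elbow")
    best_k, best_bend = ks[1], float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = scores[prev_k] - 2 * scores[k] + scores[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical scores, for illustration only.
example_scores = {1: 100.0, 2: 60.0, 3: 20.0, 4: 18.0, 5: 17.0}
print(pick_elbow(example_scores))
```

A numeric heuristic like this is best treated as a starting point; plotting the curve and inspecting it is still worthwhile, since real data often has no sharp elbow.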

Cluster Analysis

Analyze the characteristics of each customer segment.
Visualize the clusters using appropriate techniques (e.g., scatter plots, t-SNE).
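
To make the per-segment analysis concrete, the following plain-Python sketch averages each feature within a cluster. The `profile_segments` helper, the `cluster` key, and the sample rows are hypothetical; the notebook would more likely compute the same summary with a group-by on the predictions returned by the model.

```python
from collections import defaultdict


def profile_segments(rows, label_key="cluster"):
    """Return the size and per-feature mean of each cluster.

    `rows` is a list of dicts holding numeric features plus a cluster label.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        label = row[label_key]
        counts[label] += 1
        for name, value in row.items():
            if name != label_key:
                sums[label][name] += value
    profile = {}
    for label in sorted(counts):
        means = {name: total / counts[label] for name, total in sums[label].items()}
        profile[label] = {"n_customers": counts[label], **means}
    return profile


# Hypothetical labelled customers, for illustration only.
labelled = [
    {"cluster": 1, "recency_days": 5, "frequency": 10, "monetary": 500.0},
    {"cluster": 1, "recency_days": 15, "frequency": 6, "monetary": 300.0},
    {"cluster": 2, "recency_days": 200, "frequency": 1, "monetary": 40.0},
]
for label, stats in profile_segments(labelled).items():
    print(label, stats)
```

Reading the means side by side is usually what turns a cluster id into a named segment, such as recent high spenders versus lapsed one-time buyers.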

Interpretation and Application

Draw insights from the customer segments.
Discuss potential applications of the segmentation results (e.g., targeted marketing, personalized recommendations).

Instructions for Contributors

Use the existing notebooks in the repository as a template for structure and style.
Ensure the notebook is well-documented and easy to follow.
Include a clear explanation of the concepts and techniques used.
Provide visualizations to illustrate the results.

Use a publicly available dataset or provide instructions on how to generate synthetic data.
Test the notebook thoroughly before submitting a pull request.
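
For the synthetic data option mentioned above, a seeded generator keeps the notebook reproducible. The sketch below is one possible approach using only the Python standard library; the `generate_transactions` name, the row layout, and the value ranges are illustrative assumptions rather than a required schema.

```python
import random
from datetime import date, timedelta


def generate_transactions(n_customers=100, seed=42, end_date=date(2024, 9, 26)):
    """Generate reproducible (customer_id, order_date, amount) rows.

    Each customer gets 1 to 20 orders placed within the 365 days up to and
    including `end_date`, with amounts between 5.00 and 250.00.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    rows = []
    for i in range(n_customers):
        customer_id = f"cust_{i:04d}"
        for _ in range(rng.randint(1, 20)):
            days_ago = rng.randint(0, 364)
            amount = round(rng.uniform(5.0, 250.0), 2)
            rows.append((customer_id, end_date - timedelta(days=days_ago), amount))
    return rows


rows = generate_transactions(n_customers=10, seed=1)
print(len(rows), rows[0])
```

Uniformly random customers will not form clear clusters, so a notebook relying on synthetic data may want to draw customers from a few distinct behavior profiles instead.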

Resources

BigQuery Dataframes documentation: https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.cluster.KMeans
K-Means Clustering documentation: https://cloud.google.com/bigquery/docs/kmeans-tutorial

Contributing guidelines: CONTRIBUTING.md

Note: Please refer to the contributing guidelines for detailed instructions on how to contribute to this repository.

This notebook will provide a valuable resource for users interested in applying K-Means clustering for customer segmentation using BigQuery Dataframes. We encourage contributions from the community to help develop this notebook.

We really appreciate your contribution! :)

@NiloFreitas NiloFreitas added the good first issue Good for newcomers label Sep 26, 2024