The raw telco data is not the best data set to demonstrate on #2

Open
mbkupfer opened this issue Jul 2, 2022 · 2 comments

mbkupfer commented Jul 2, 2022

Hi, while going through the tutorials, I noticed several places where the functions end up having no effect because the data set is already pretty clean. This makes some of the demonstrations a bit confusing, since they turn out to be no-ops. It may be beneficial if the raw data set started out more, well, raw.

Example 1: In the load data notebook, there are cells that apply transformations to null values. None of the data in the file has any nulls, though.

Example 2: In the exploratory notebook, in the drop_outliers definitions, only 1 of the 6 possible drops actually removes any outliers.
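
For anyone who wants to confirm Example 1 on their own copy of the data, here is a minimal sketch of a null check, using only the Python standard library. The inline sample and its column names (`customerID`, `tenure`, `MonthlyCharges`) are assumptions standing in for the real file, and `count_missing` is a hypothetical helper, not something from the tutorial notebooks.

```python
import csv
import io

# Hypothetical three-row sample standing in for the raw telco file.
# The column names are assumptions, not taken from the tutorial.
SAMPLE = """customerID,tenure,MonthlyCharges
0001,1,29.85
0002,34,56.95
0003,2,53.85
"""


def count_missing(rows):
    """Return {column: number of empty or missing values} for a list of dict rows."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts.setdefault(col, 0)
            if val is None or val.strip() == "":
                counts[col] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing = count_missing(rows)

# If every count is zero, any null-handling step on this data changes nothing.
no_op = all(n == 0 for n in missing.values())
print(missing, "null handling is a no-op:", no_op)
```

To run it against a real file, swap `io.StringIO(SAMPLE)` for an open file handle.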

sfc-gh-cbaechtold (Collaborator) commented

@sfc-gh-pjain flagging for your awareness

sfc-gh-pjain (Collaborator) commented

The purpose of this guide was to make it easy for everyone to get started with Snowpark (setup and simple ML engineering), so the data set was kept deliberately simple.

You can use the same notebooks on a more robust dataset.
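
One possible way to get a messier data set without finding a new one is to inject missing values and outliers into a copy of the clean data. The sketch below is only an illustration under stated assumptions: `make_dirty`, its parameters, and the `MonthlyCharges` column name are all hypothetical and are not part of the guide or of Snowpark.

```python
import random


def make_dirty(rows, numeric_col, null_rate=0.1, outlier_rate=0.05, seed=0):
    """Return a copy of rows where numeric_col is sometimes blanked or inflated.

    Roughly null_rate of the rows get None, and roughly outlier_rate get the
    original value multiplied by 100. The input rows are not modified.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    dirty = []
    for row in rows:
        new_row = dict(row)
        roll = rng.random()
        if roll < null_rate:
            new_row[numeric_col] = None
        elif roll < null_rate + outlier_rate:
            new_row[numeric_col] = row[numeric_col] * 100
        dirty.append(new_row)
    return dirty


# Hypothetical clean data: 200 rows with charges between 50 and 249.
clean = [{"customerID": i, "MonthlyCharges": 50.0 + i} for i in range(200)]
dirty = make_dirty(clean, "MonthlyCharges")

n_null = sum(r["MonthlyCharges"] is None for r in dirty)
n_outlier = sum(
    r["MonthlyCharges"] is not None and r["MonthlyCharges"] > 1000 for r in dirty
)
print("nulls:", n_null, "outliers:", n_outlier)
```

With data like this, the null-handling and outlier-dropping cells would have something to act on, so their effect becomes visible.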
