3.8 One-hot encoding

Slides

Notes

One-hot encoding converts categorical variables into numerical ones. Each category of a variable becomes its own column, which holds 1 if the row's value belongs to that category and 0 otherwise.
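
To make the idea concrete, here is a minimal sketch in plain Python, without any library. The `contracts` list and its values are hypothetical toy data, not taken from the course dataset.

```python
# Hypothetical toy data: one categorical variable with three categories.
contracts = ["monthly", "yearly", "monthly", "two_year"]

# One column per distinct category, in sorted order.
categories = sorted(set(contracts))

# Each row gets a 1 in the column of its own category and 0 elsewhere.
encoded = [
    [1 if value == category else 0 for category in categories]
    for value in contracts
]

print(categories)
for row in encoded:
    print(row)
```

Every row contains exactly one 1, which is where the name "one-hot" comes from.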

Classes, functions, and methods:

  • df[x].to_dict(orient='records') - converts the columns x of the dataframe into a list of dictionaries, one per row.
  • DictVectorizer().fit_transform(x) - fits a Scikit-Learn DictVectorizer on the list of dictionaries x and converts it into a sparse matrix, one-hot encoding the string-valued features. Numerical variables are passed through unchanged.
  • DictVectorizer().get_feature_names_out() - returns the names of the columns in the encoded matrix. Older Scikit-Learn versions used get_feature_names(), which was removed in version 1.2.

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
