A collection of snippets and functions that I regularly use in my workflows as a data scientist. These snippets are utility functions that speed up my work, and focus on Pandas, Numpy and visualization libraries.
Snippet | Code | Description |
---|---|---|
Boosted value_counts | Code | This function improves the value_counts function by outputing absolute and normalized counts simultaneously, for faster analysis. It also sets the default value of dropna to False, so any NaNs that exist easily spotted. |
Pandarallel Configuration | Code | Pandas isn't great when it comes to handling large amounts of data, mainly because it natively uses only a single core. Pandarallel is a very straightforward alternative to parallelize pandas code. |
Snippet | Code | Description |
---|---|---|
Color print in jupyter notebooks | Code | Coloring specific values in your output can be an easy way to highlight important information. While there are packages like termcolor or colorama , I find that simply using ANSI color outputs works best. |
Progress bars in jupyter notebooks | Code | tqdm is a package that lets you create progress bars. While it has notebook specific versions via tqdm_notebook , I find that directly using tqdm works just as well without the hassle of setting up ipywidgets and IProgress . |