wcs training tools
for Google Colaboratory
!pip install "http://github.com/raoulvm/wcs/archive/main.zip" --upgrade > /dev/null
# e.g. import the Google Drive share helper
from wcs.google import google_drive_share
Helps with loading files from Google Drive share links. See the docstring.
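The usual trick behind loading a file from a Drive share link is to pull out the file ID and build a direct-download URL. The sketch below shows that idea; `drive_share_to_download_url` is a hypothetical name, not the `google_drive_share` API, and the two link shapes it recognises are assumptions about typical share links.

```python
import re
from typing import Optional

# Matches the file ID in two common share-link shapes:
#   https://drive.google.com/file/d/<ID>/view?usp=sharing
#   https://drive.google.com/open?id=<ID>
_DRIVE_ID = re.compile(r"(?:/file/d/|[?&]id=)([A-Za-z0-9_-]+)")


def drive_share_to_download_url(share_link: str) -> Optional[str]:
    """Hypothetical helper: turn a Drive share link into a direct-download URL."""
    match = _DRIVE_ID.search(share_link)
    if match is None:
        return None  # not a recognised Drive share link
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"
```

The resulting URL can then be handed to e.g. `pd.read_csv()` for small, publicly shared files; large files may first return a confirmation page instead of the data.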
More than df.info(), less than pandas_profiling: something in between.
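To give an idea of what "in between" can mean, here is a minimal per-column overview in plain pandas. The function name `quick_overview` and the chosen statistics are assumptions for illustration, not the output of the wcs helper.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: one row per column with dtype, missing and cardinality stats."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),
        # first non-missing value as a quick example of the column's content
        "example": df.apply(lambda s: s.dropna().iloc[0] if s.notna().any() else None),
    })
```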
Lookup join that supports function-based min/max/threshold conditions. See the docstring!
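A function-based lookup join can be pictured as a left join where the match is decided by a predicate instead of key equality. The sketch below uses plain dicts and a hypothetical `lookup_join` to show a min/max threshold condition; it is not the wcs signature.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def lookup_join(left: List[Row], lookup: List[Row],
                condition: Callable[[Row, Row], bool]) -> List[Row]:
    """Hypothetical sketch: attach the first lookup row whose condition holds (left join)."""
    joined = []
    for row in left:
        match: Optional[Row] = next((lk for lk in lookup if condition(row, lk)), None)
        # rows without a match are kept unchanged, as in a left join
        joined.append({**row, **(match or {})})
    return joined


# Threshold condition: a lookup row applies when min <= value < max.
tariffs = [
    {"min": 0, "max": 10, "rate": 0.05},
    {"min": 10, "max": 100, "rate": 0.03},
]
orders = [{"id": 1, "value": 4}, {"id": 2, "value": 42}, {"id": 3, "value": 500}]
result = lookup_join(orders, tariffs, lambda o, t: t["min"] <= o["value"] < t["max"])
```

This brute-force version checks every lookup row per left row, which is fine for small lookup tables such as tariffs or bins.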
wcs.kraus.build_histograms_with_target(df: pd.DataFrame, target_col: str, cont_cols: List[str], ord_cols: List[str], cat_cols: List[str], pred_col: str = None, perc_winsor: float = 1, error:str=None, fix_ratio_scale:Union[float,NoneType]=1, as_pdf_name:str=None)
Clemens' helper to plot features against binary targets.
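The core of plotting a feature against a binary target is histogramming each class on the same bin edges, so the bars are comparable. This numpy sketch (hypothetical `class_histograms`, not Clemens' implementation) also returns the share of positives per bin.

```python
import numpy as np


def class_histograms(values, target, bins=10):
    """Sketch: histogram a feature separately for target==0 and target==1 on shared edges."""
    values = np.asarray(values, dtype=float)
    target = np.asarray(target)
    edges = np.histogram_bin_edges(values, bins=bins)  # same edges for both classes
    counts = {cls: np.histogram(values[target == cls], bins=edges)[0] for cls in (0, 1)}
    # share of positives per bin; NaN where a bin is empty
    total = counts[0] + counts[1]
    with np.errstate(invalid="ignore", divide="ignore"):
        pos_rate = np.where(total > 0, counts[1] / total, np.nan)
    return edges, counts, pos_rate
```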
Shorthand for wcs.skl.metrics.pretty_confusionmatrix() wrapped around an inner sklearn.metrics.confusion_matrix() call. It automatically sorts the labels so that the positive label is 1 and the negative label is 0, if the values support this assumption.
Prints nicer, more explainable confusion matrices: pass it a confusion matrix and enjoy. Work In Progress Warning
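Putting the positive class first (matching the default labels ['Positive','Negative']) gives the layout [[TP, FN], [FP, TN]], with actual classes as rows and predictions as columns. A minimal sketch, assuming labels 1 (positive) and 0 (negative); with sklearn the same order comes from `confusion_matrix(y_true, y_pred, labels=[1, 0])`.

```python
from typing import List, Sequence


def confusion_positive_first(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Sketch: 2x2 confusion matrix with the positive class (1) in the first row/column.

    Rows are actual classes, columns are predicted classes:
        [[TP, FN],
         [FP, TN]]
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return [[tp, fn], [fp, tn]]
```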
New location since v0.0.17.
pretty_confusionmatrix(confusionmatrix: np.ndarray, textlabels:List[str]=['Positive','Negative'], title:str='Confusion Matrix', texthint:str='', metrics:bool=True)->Union[object, dict]:
    """Create a more readable, HTML-based confusion matrix from an sklearn one.

    Args:
        confusionmatrix (np.ndarray): a confusion matrix as returned by
            sklearn.metrics.confusion_matrix().
        textlabels (List[str], optional): The class labels as a list of strings.
            Defaults to ['Positive','Negative'].
        title (str, optional): The confusion matrix's title. Defaults to 'Confusion Matrix'.
        texthint (str, optional): Text to print in the top left corner. Defaults to ''.
            If an empty string (the default) is passed, the population size is printed instead.
        metrics (bool, optional): If True (the default), print the confusion matrix
            immediately and return a dict with the metrics. If False, return the
            confusion matrix as an HTMLTable object.

    Returns:
        Union[HTMLTable, dict]: The matrix as an HTMLTable if `metrics` is False,
        otherwise (the default) a dict with the metrics.
    """
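The metrics dict is derived from the four cells of the matrix. The sketch below computes the standard definitions from a [[TP, FN], [FP, TN]] matrix; the key names are assumptions and may differ from what `pretty_confusionmatrix` returns.

```python
from typing import Dict, List


def metrics_from_matrix(cm: List[List[int]]) -> Dict[str, float]:
    """Sketch: common metrics from a [[TP, FN], [FP, TN]] matrix (positive class first)."""
    (tp, fn), (fp, tn) = cm
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }
```

Zero denominators are mapped to 0.0 here; other conventions (NaN, a warning) are equally valid.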
Returns the output column names of, e.g., a ColumnTransformer with nested pipelines.
Collates transformations for the same columns into Pipelines. See the docstring. Caveat: do not use it if you have transformations that require multiple columns to be passed at once! The "re-piper" will break them into multiple calls, one per column.
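The collation step, and why the caveat matters, can be sketched as grouping (name, transformer) pairs by column. This is a hypothetical `collate_by_column`, not the re-piper's code; each resulting chain could then become one `Pipeline` per column (ideally with cloned transformers).

```python
from typing import Any, Dict, List, Sequence, Tuple

Step = Tuple[str, Any, Sequence[str]]  # (name, transformer, columns), as in ColumnTransformer


def collate_by_column(steps: List[Step]) -> Dict[str, List[Tuple[str, Any]]]:
    """Hypothetical sketch: group (name, transformer) pairs per column, preserving order.

    A step listed for several columns is appended to each column's chain. This is
    why multi-column transformations break: each chain later sees one column only.
    """
    per_column: Dict[str, List[Tuple[str, Any]]] = {}
    for name, transformer, columns in steps:
        for col in columns:
            per_column.setdefault(col, []).append((name, transformer))
    return per_column
```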
Winsorization transformer; supports fit() and transform(), compatible with other sklearn transformers.
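Winsorization learns per-column cut-offs in `fit()` and clips to them in `transform()`. The class below is a sketch of that pattern on sklearn's base classes; its name and the percentile parameters `lower`/`upper` are assumptions, not the wcs transformer's parameters.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class WinsorizerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical sklearn-style winsorizer (not the wcs class).

    fit() learns per-column lower/upper percentiles; transform() clips to them.
    """

    def __init__(self, lower: float = 5.0, upper: float = 95.0):
        self.lower = lower  # percentile in 0..100
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # per-column cut-offs, ignoring NaNs
        self.lower_ = np.nanpercentile(X, self.lower, axis=0)
        self.upper_ = np.nanpercentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.lower_, self.upper_)  # broadcasts over rows
```

Inheriting from `BaseEstimator` and `TransformerMixin` provides `get_params()`, `clone()` support and `fit_transform()`, so the sketch can sit inside a Pipeline.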
Instantiates transformers for repeated use of the transformation list, without the need to reset them again.
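In plain scikit-learn the same goal is usually reached with `sklearn.base.clone`, which returns an unfitted copy with identical parameters. A minimal sketch with a hypothetical template list:

```python
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler

# Template list: configured once, never fitted directly.
TEMPLATE = [("scale", StandardScaler(with_mean=False))]


def fresh_steps(template):
    """Return unfitted copies with the same parameters, so the template stays reusable."""
    return [(name, clone(est)) for name, est in template]


first = fresh_steps(TEMPLATE)
second = fresh_steps(TEMPLATE)
```

Fitting `first` leaves `second` and the template untouched.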
Wraps sklearn.model_selection.train_test_split() so that all indices are reset before the data is returned.
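A minimal sketch of such a wrapper, assuming pandas inputs: every returned object that has `reset_index()` gets a fresh 0..n-1 index, while rows stay aligned between X and y. The name `split_reset_index` is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_reset_index(*arrays, **kwargs):
    """Sketch: train_test_split(), then reset the index of every pandas result."""
    parts = train_test_split(*arrays, **kwargs)
    return [p.reset_index(drop=True) if hasattr(p, "reset_index") else p for p in parts]


X = pd.DataFrame({"x": range(10)})
y = pd.Series(range(10), name="y")
X_train, X_test, y_train, y_test = split_reset_index(X, y, test_size=0.3, random_state=0)
```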
The split into
Like rcat(), but with the variances weighted by the group sizes (less sensitive to outliers).
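The weighting step can be sketched on its own: each group's variance counts in proportion to its size, so a tiny group with an extreme variance has less influence. This shows only that step, under the assumption of population variances; how rcat() itself is defined is not covered here.

```python
from statistics import pvariance
from typing import Dict, Sequence


def size_weighted_variance(groups: Dict[str, Sequence[float]]) -> float:
    """Sketch: average of per-group (population) variances, weighted by n_g / N."""
    total = sum(len(values) for values in groups.values())
    return sum(len(values) * pvariance(values) for values in groups.values()) / total
```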
Nicer printout of 1- and 2-dimensional matrices in Colab; can also print some matrix properties. See the docstring.
Calculates haversine-based great-circle distances on a DataFrame. The DataFrame has to have four columns in the order lon1, lat1, lon2, lat2 (the names do not matter, only the order, as access is iloc[]-based).
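For reference, the haversine great-circle distance between the points (λ₁, φ₁) and (λ₂, φ₂), with longitudes λ and latitudes φ in radians and Earth radius R (a mean value of about 6371 km is common), is:

```latex
d = 2R \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right)}\right)
```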
Calculates haversine-based great-circle distances between two sets of longitude/latitude coordinates.
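A plain-Python sketch of the haversine formula for a single pair of points, using the same lon/lat argument order as above. The 6371 km radius is an assumption; the wcs functions may use a different value or unit.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# Row-wise use on a DataFrame whose first four columns are lon1, lat1, lon2, lat2:
# df.apply(lambda r: haversine_km(*r.iloc[:4]), axis=1)
```

One degree of longitude along the equator comes out at about 111.19 km with this radius.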
Prints a correlation heatmap (Pearson) from a DataFrame. Defaults to a symmetric black-white-black scale, with white at zero correlation.
def corrheatmap(data:pd.core.frame.DataFrame,
vmax:float=1.0,
diagonal:bool=False,
decimals:int=2,
title:str='Correlation Matrix',
colors:List[str]=['black', 'white', 'black'],
annot:bool = True,
as_figure:bool=True,
figure_params:Dict[Any,Any]={'figsize':(14,8), 'dpi':75},
)
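The symmetric scale comes from a three-color colormap whose midpoint sits at zero: with `vmin=-vmax`, white lands exactly on r = 0 and both strong positive and strong negative correlations turn black. A matplotlib sketch of that idea; `sketch_corrheatmap` and its reduced parameter set are assumptions, not the `corrheatmap` implementation.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def sketch_corrheatmap(data, vmax=1.0, diagonal=False, decimals=2,
                       colors=("black", "white", "black"), figsize=(14, 8)):
    """Sketch: Pearson correlation heatmap with a scale symmetric around 0."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)  # columns = variables
    if not diagonal:
        np.fill_diagonal(corr, np.nan)  # hide the trivial 1.0 diagonal
    cmap = LinearSegmentedColormap.from_list("symmetric", list(colors))
    fig, ax = plt.subplots(figsize=figsize)
    # vmin=-vmax puts the middle color (white) at zero correlation
    im = ax.imshow(corr, cmap=cmap, vmin=-vmax, vmax=vmax)
    for (i, j), value in np.ndenumerate(corr):
        if not np.isnan(value):
            ax.text(j, i, f"{value:.{decimals}f}", ha="center", va="center", color="tab:red")
    fig.colorbar(im, ax=ax)
    return fig, corr
```

Passing a DataFrame works too, since `np.asarray` turns its columns into the variables.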
Easily create a simple interactive geo plot from lat/lon coordinates without tilting at windmills over the axes.
Easily plot a function
Create and modify text tables for display in Colab. Work In Progress Warning