Release v0.6 · microsoft/SynapseML

New functionality:

Similar to Spark's StringIndexer, the new ValueIndexer indexes values
of any type rather than only strings. A reverse mapping is also
provided via IndexToValue, similar to Spark's IndexToString transform.
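
To make the forward and reverse mappings concrete, here is a minimal plain-Scala sketch of the idea. It does not use the library: `ValueIndexSketch`, `fitLevels`, `toIndex`, and `toValue` are hypothetical names, and assigning indices in sorted order of the distinct values is an assumption for illustration, not a statement about how ValueIndexer orders its levels.

```scala
// Conceptual sketch only; not the ValueIndexer / IndexToValue implementation.
object ValueIndexSketch {
  // "Fit": collect the distinct values and fix their order.
  // Sorted order is an assumption made for this sketch.
  def fitLevels[T: Ordering](values: Seq[T]): Vector[T] =
    values.distinct.sorted.toVector

  // Forward mapping (value -> index), stored as a Double like Spark's indexers.
  def toIndex[T](levels: Vector[T], value: T): Double = {
    val i = levels.indexOf(value)
    require(i >= 0, s"value not seen during fit: $value")
    i.toDouble
  }

  // Reverse mapping (index -> original value).
  def toValue[T](levels: Vector[T], index: Double): T =
    levels(index.toInt)
}
```

Because the levels are kept alongside the indices, any value type with an ordering (integers, doubles, strings) can be mapped back to its original value, which is the role IndexToValue plays for ValueIndexer output.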

A new "clean missing data" estimator, CleanMissingData. Example:

val cmd = new CleanMissingData()
  .setInputCols(Array("some-column"))
  .setOutputCols(Array("some-column"))
  .setCleaningMode(CleanMissingData.customOpt)
  .setCustomValue(someCustomValue)
val cmdModel = cmd.fit(dataset)
val result = cmdModel.transform(dataset)

New default featurization for Spark date and timestamp types and for
our internal image type:
Date columns are converted to double features: year, day of week,
month, and day of month.
Timestamp columns get the same features as date columns plus hour of
day, minute of hour, and second of minute.
Image columns use the image data converted to doubles, along with
width and height information.
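
As a rough illustration of the date and timestamp features listed above, the following plain-Scala sketch derives them with `java.time`. It is not the library's featurizer: `DateFeaturesSketch` and its method names are hypothetical, the feature order simply follows the text, and the ISO day-of-week numbering (Monday = 1 through Sunday = 7) is an assumption that may differ from what the library emits.

```scala
import java.time.{LocalDate, LocalDateTime}

// Conceptual sketch only; not the library's featurization code.
object DateFeaturesSketch {
  // Date -> [year, day of week, month, day of month] as doubles.
  def dateFeatures(d: LocalDate): Array[Double] = Array(
    d.getYear.toDouble,
    d.getDayOfWeek.getValue.toDouble, // ISO numbering, an assumption here
    d.getMonthValue.toDouble,
    d.getDayOfMonth.toDouble
  )

  // Timestamp -> the date features plus [hour, minute, second].
  def timestampFeatures(t: LocalDateTime): Array[Double] =
    dateFeatures(t.toLocalDate) ++ Array(
      t.getHour.toDouble,
      t.getMinute.toDouble,
      t.getSecond.toDouble
    )
}
```

A date column therefore contributes four numeric features and a timestamp column seven, which can be assembled with the other featurized columns.
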

Previously, starting the Docker image without the ACCEPT_EULA variable
set threw an error. It now starts a tiny web server that shows the
EULA and replaces itself with the Jupyter interface when you click the
AGREE button.

Breaking changes:

Renamed ImageTransform to ImageTransformer.

Notable bug fixes and other changes:

Improved sample notebooks, and a new one: "303 - Transfer Learning by
DNN Featurization - Airplane or Automobile".
Fixed serialization bugs in generated Python PipelineStages.

Acknowledgments

Thanks to Ali Zaidi for some notebook beautifications.
