v0.6

@elibarzilay elibarzilay released this 18 Jul 02:03
· 1608 commits to master since this release

New functionality:

  • A new ValueIndexer, similar to Spark's StringIndexer, indexes values
    of any type rather than only strings. The reverse mapping is
    available through IndexToValue, similar to Spark's IndexToString
    transform.

  • A new CleanMissingData estimator for cleaning missing data. Example:

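    // Replace missing values in "some-column" with a custom value,
    // writing the result back to the same column.
    val cmd = new CleanMissingData()
      .setInputCols(Array("some-column"))
      .setOutputCols(Array("some-column"))
      .setCleaningMode(CleanMissingData.customOpt)
      .setCustomValue(someCustomValue)
    // Fit the estimator, then apply the resulting model.
    val cmdModel = cmd.fit(dataset)
    val result = cmdModel.transform(dataset)
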
  • New default featurization for Spark date and timestamp types and for
    our internal image type. Date columns are converted to double
    features: year, day of week, month, and day of month. Timestamp
    columns get the same features plus hour of day, minute of hour, and
    second of minute. Image columns are featurized as the image data
    converted to doubles, together with the width and height.

  • Starting the Docker image without setting the ACCEPT_EULA variable
    used to throw an error. It now starts a tiny web server that shows
    the EULA and replaces itself with the Jupyter interface when you
    click the AGREE button.
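
The idea behind the ValueIndexer / IndexToValue pairing listed above can be
illustrated with plain Scala collections. This is only a conceptual sketch,
not the library's API or implementation: the object name, the `buildIndex`
helper, and the choice to assign indices in sorted order are all assumptions
made for illustration.

```scala
// Conceptual sketch only: plain Scala collections, not the ValueIndexer API.
object ValueIndexSketch {
  // Build a forward map (value -> index) and a reverse lookup (index -> value).
  // Assumption: indices follow the sorted order of the distinct values.
  def buildIndex[T: Ordering](values: Seq[T]): (Map[T, Int], Vector[T]) = {
    val levels = values.distinct.sorted.toVector
    (levels.zipWithIndex.toMap, levels)
  }

  def main(args: Array[String]): Unit = {
    val column = Seq(30, 10, 20, 10, 30)      // non-string values
    val (toIndex, toValue) = buildIndex(column)
    val indexed = column.map(toIndex)         // forward: value -> index
    val restored = indexed.map(toValue)       // reverse: index -> value
    println(indexed.mkString(","))
    println(restored == column)
  }
}
```

Because the reverse lookup is built from the same list of distinct values,
applying it to the indexed column recovers the original values.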

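The date and timestamp featurization listed above can be pictured roughly as
follows. This is a hedged sketch using `java.time`, not the library's code:
the object and function names are hypothetical, the feature order simply
follows the list in these notes, and the ISO day-of-week numbering
(Monday = 1) is an assumption that may differ from what the featurizer
actually emits.

```scala
import java.time.{LocalDate, LocalDateTime}

// Illustrative sketch only; names and encodings are assumptions.
object DateFeaturizationSketch {
  // Date -> year, day of week, month, day of month, all as doubles.
  def dateFeatures(d: LocalDate): Array[Double] =
    Array(d.getYear, d.getDayOfWeek.getValue, d.getMonthValue, d.getDayOfMonth)
      .map(_.toDouble)

  // Timestamp -> the date features plus hour, minute, and second.
  def timestampFeatures(t: LocalDateTime): Array[Double] =
    dateFeatures(t.toLocalDate) ++
      Array(t.getHour, t.getMinute, t.getSecond).map(_.toDouble)

  def main(args: Array[String]): Unit = {
    // Arbitrary example timestamp.
    val ts = LocalDateTime.of(2017, 7, 18, 2, 3, 0)
    println(dateFeatures(ts.toLocalDate).mkString(", "))
    println(timestampFeatures(ts).mkString(", "))
  }
}
```

A date therefore yields four double features and a timestamp yields seven.
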
Breaking changes:

  • Renamed ImageTransform to ImageTransformer.

Notable bug fixes and other changes:

  • Improved sample notebooks, and a new one: "303 - Transfer Learning by
    DNN Featurization - Airplane or Automobile".

  • Fixed serialization bugs in generated Python PipelineStages.

Acknowledgments:

Thanks to Ali Zaidi for some notebook beautifications.