Releases: tensorflow/data-validation
TensorFlow Data Validation 0.26.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.25,!=2.26.*,<2.29.
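The pin above has three clauses: at least 2.25, nothing from the 2.26 series, and below 2.29. The hypothetical `satisfies` helper below makes that reading concrete, but it only handles plain numeric versions. Real installers apply the full PEP 440 rules (pre-releases, epochs, local versions), for example through `packaging.specifiers.SpecifierSet`.

```python
from itertools import zip_longest
from typing import Tuple

# Hypothetical helper for illustration only: plain numeric versions and the
# operators used in these pins; no pre-release/epoch/local-version handling.


def _release(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def _compare(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    # Zero-pad the shorter release so that "2.25" compares equal to "2.25.0".
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def satisfies(version: str, spec: str) -> bool:
    v = _release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators must be tried before "<" and ">".
        for op in (">=", "<=", "!=", "==", ">", "<"):
            if clause.startswith(op):
                target = clause[len(op):]
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if target.endswith(".*"):
            # Prefix match: "!=2.26.*" excludes every 2.26.x release.
            if op not in ("==", "!="):
                raise ValueError(f"wildcard not allowed with {op!r}")
            prefix = _release(target[:-2])
            padded = v + (0,) * max(0, len(prefix) - len(v))
            ok = (padded[:len(prefix)] == prefix) == (op == "==")
        else:
            c = _compare(v, _release(target))
            ok = {">=": c >= 0, "<=": c <= 0, "==": c == 0,
                  "!=": c != 0, ">": c > 0, "<": c < 0}[op]
        if not ok:
            return False
    return True


BEAM_PIN = ">=2.25,!=2.26.*,<2.29"
for candidate in ("2.24.0", "2.25.0", "2.26.1", "2.28.0", "2.29.0"):
    print(candidate, satisfies(candidate, BEAM_PIN))
```

Only 2.25.0 and 2.28.0 pass in this loop.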
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.30.0
Major Features and Improvements
- This version is the last version before TFDV 1.0. Starting with 1.0, all TFDV public APIs (i.e. symbols in the root __init__.py) will be subject to semantic versioning. We are deprecating some public APIs in this version, and they will be removed in 1.0.
- The sketch-based top-k/unique stats generator is now able to detect invalid UTF-8 sequences and large texts and replace them with a placeholder, so it does not suffer from the memory issues usually caused by image or large text features in the data. Note that this generator is not yet used by default.
- Added StatsOptions.experimental_use_sketch_based_topk_uniques, which enables the sketch-based top-k/unique stats generator.
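The notes do not describe the generator's internals. As a hedged illustration of the general idea, the sketch below replaces invalid UTF-8 and oversized values with placeholder tokens before counting them in a Misra-Gries summary, a classic bounded-memory heavy-hitters sketch. The placeholder strings, the 1024-byte threshold, and the choice of Misra-Gries are all assumptions, not TFDV's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder strings and size threshold, chosen for this sketch.
INVALID_UTF8_PLACEHOLDER = "__INVALID_UTF8__"
LARGE_VALUE_PLACEHOLDER = "__LARGE_VALUE__"
MAX_VALUE_BYTES = 1024


def normalize(value: bytes) -> str:
    """Maps a raw bytes value to the token the sketch will count."""
    if len(value) > MAX_VALUE_BYTES:
        return LARGE_VALUE_PLACEHOLDER
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return INVALID_UTF8_PLACEHOLDER


class MisraGries:
    """Approximate heavy hitters using at most `k` counters.

    Each stored count underestimates the true count by at most n / (k + 1),
    where n is the number of items added.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.counters: Dict[str, int] = {}

    def add(self, item: str) -> None:
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k:
            self.counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def top_k(self, n: int) -> List[Tuple[str, int]]:
        return sorted(self.counters.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


values = [b"cat", b"dog", b"cat", b"\xff\xfe", b"x" * 2000, b"cat", b"\xff"]
sketch = MisraGries(k=10)
for raw in values:
    sketch.add(normalize(raw))
print(sketch.top_k(2))
```

The 2,000-byte value is stored as a short placeholder token, which is what keeps memory bounded regardless of how large individual values are.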
Bug Fixes and Other Changes
- Fixed a bug in display_schema that caused domains not to be displayed.
- Modified how get_schema_dataframe outputs numeric domains.
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
- Depends on tensorflow-metadata>=0.30,<0.31.
- Depends on tfx-bsl>=0.30,<0.31.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- tfdv.LiftStatsGenerator is going to be removed from the public API in the next version. To enable that generator, supply StatsOptions.label_feature.
- tfdv.NonStreamingCustomStatsGenerator is going to be removed from the public API in the next version. You may continue to import it from TFDV, but it will not be subject to compatibility guarantees.
- tfdv.validate_instance is going to be removed from the public API in the next version. You may continue to import it from TFDV, but it will not be subject to compatibility guarantees.
- Removed tfdv.DecodeCSV and tfdv.DecodeTFExample (deprecated in 0.27).
- Removed feature_whitelist in tfdv.StatsOptions (deprecated in 0.28). Use feature_allowlist instead.
- tfdv.get_feature_value_slicer is deprecated. tfdv.experimental_get_feature_value_slicer is introduced as a replacement. TFDV is likely to have different slicing functionality after 1.0, which may not be compatible with the current slicers.
- StatsOptions.slicing_functions is deprecated. StatsOptions.experimental_slicing_functions is introduced as a replacement.
- tfdv.WriteStatisticsToText is removed (deprecated in 0.25.0).
- Parameter compression_type in tfdv.generate_statistics_from_tfrecord is deprecated. The compression type is now determined automatically.
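The notes do not say how the compression type is detected. A minimal sketch, assuming detection by file extension (similar in spirit to Apache Beam's AUTO compression mode, which infers the codec from the file suffix); the `open_auto` helper and its extension table are hypothetical and cover only the stdlib gzip and bz2 codecs.

```python
import bz2
import gzip
import os
import tempfile
from typing import IO

# Hypothetical extension table for this sketch; unknown suffixes are read raw.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_auto(path: str) -> IO[bytes]:
    """Opens `path` for binary reading, choosing a decompressor by suffix."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rb")
    return open(path, "rb")


# Usage: the caller never states the compression type.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"payload")
    with open_auto(path) as f:
        print(f.read())
```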
TensorFlow Data Validation 0.29.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Added a check for invalid min and max values for value_counts for nested features.
- Bumped the minimum Bazel version required to build TFDV to 3.7.2.
- Depends on absl-py>=0.9,<0.13.
- Depends on tensorflow-metadata>=0.29,<0.30.
- Depends on tfx-bsl>=0.29,<0.30.
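The notes do not list which min/max combinations count as invalid. A minimal sketch, assuming the basic invariants that bounds are non-negative and that min does not exceed max at any nesting level; `invalid_value_counts` and its list of (min, max) tuples are hypothetical stand-ins for the per-level value counts of a nested feature in the schema.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one (min, max) pair per nesting level.
# None means the bound is unset.
ValueCount = Tuple[Optional[int], Optional[int]]


def invalid_value_counts(levels: List[ValueCount]) -> List[str]:
    """Returns a message for every nesting level whose bounds are inconsistent."""
    problems = []
    for depth, (lo, hi) in enumerate(levels):
        if lo is not None and lo < 0:
            problems.append(f"level {depth}: min {lo} is negative")
        if hi is not None and hi < 0:
            problems.append(f"level {depth}: max {hi} is negative")
        if lo is not None and hi is not None and lo > hi:
            problems.append(f"level {depth}: min {lo} > max {hi}")
    return problems


print(invalid_value_counts([(1, 1), (3, 2)]))  # ['level 1: min 3 > max 2']
```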
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.28.0
Major Features and Improvements
- Added anomaly detection for the maximum byte size of images.
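TFDV validates data by comparing computed statistics with a schema (tfdv.validate_statistics). A minimal sketch of that two-step flow for image sizes, assuming the schema supplies a single byte-size limit per image feature; `max_byte_size`, `check_image_size`, and the 4096-byte limit are hypothetical and not TFDV's API.

```python
from typing import Iterable, Optional


def max_byte_size(values: Iterable[bytes]) -> int:
    """Statistic: largest encoded value size seen, in bytes (0 if empty)."""
    return max((len(v) for v in values), default=0)


def check_image_size(observed_max: int, limit: int) -> Optional[str]:
    """Validation: compares the statistic against a schema-style limit."""
    if observed_max > limit:
        return f"max image byte size {observed_max} exceeds limit {limit}"
    return None


# Two fake "encoded images" of 104 and 5002 bytes.
images = [b"\x89PNG" + b"\x00" * 100, b"\xff\xd8" + b"\x00" * 5000]
print(check_image_size(max_byte_size(images), limit=4096))
```

Keeping the statistic separate from the check mirrors the stats-then-validate split, so the same statistic can be validated against different schemas.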
Bug Fixes and Other Changes
- Depends on numpy>=1.16,<1.20.
- Fixed a bug that affected all CombinerFeatureStatsGenerators.
- Allowed the bytes type in get_feature_value_slicer, in addition to Text and int.
- Fixed a bug that caused TFDV to improperly infer a fixed shape when tfdv.infer_schema and tfdv.update_schema were called with infer_feature_shape=True.
- Deprecated the infer_feature_shape parameter of tfdv.update_schema. If a schema feature has a pre-defined shape, tfdv.update_schema will always validate it. Otherwise, it will not try to add a shape.
- Deprecated tfdv.StatsOptions.feature_whitelist and added feature_allowlist as a replacement. The former will be removed in the next release.
- Added the get_schema_dataframe and get_anomalies_dataframe utility functions.
- Depends on apache-beam[gcp]>=2.28,<3.
- Depends on tensorflow-metadata>=0.28,<0.29.
- Depends on tfx-bsl>=0.28.1,<0.29.
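The notes do not spell out when a fixed shape is appropriate. A minimal sketch of one conservative rule, assumed here rather than taken from TFDV: infer shape [n] only when the feature is present in every example with exactly n values. `infer_fixed_shape` is hypothetical and works on plain per-example value counts.

```python
from typing import List, Optional, Sequence


def infer_fixed_shape(value_counts: Sequence[Optional[int]]) -> Optional[List[int]]:
    """Returns [n] if the feature has exactly n values in every example, else None.

    value_counts[i] is the number of values in example i, or None when the
    feature is missing from that example (hypothetical input format).
    """
    if not value_counts or any(c is None for c in value_counts):
        return None  # sometimes-missing features get no fixed shape
    distinct = set(value_counts)
    if len(distinct) != 1:
        return None  # variable-length feature
    return [distinct.pop()]


print(infer_fixed_shape([3, 3, 3]))     # [3]
print(infer_fixed_shape([3, None, 3]))  # None
print(infer_fixed_shape([1, 2]))        # None
```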
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.27.0
Major Features and Improvements
- Performance improvement to BasicStatsGenerator.
Bug Fixes and Other Changes
- Added compact() and setup() interfaces to CombinerStatsGenerator, CombinerFeatureStatsWrapperGenerator, BasicStatsGenerator, CompositeStatsGenerator, and ConstituentStatsGenerator.
- Stopped depending on tensorflow-transform.
- Depends on apache-beam[gcp]>=2.27,<3.
- Depends on pyarrow>=1,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3.
- Depends on tensorflow-metadata>=0.27,<0.28.
- Depends on tfx-bsl>=0.27,<0.28.
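For readers new to the combiner pattern: each worker builds a partial accumulator, the partials are merged, and a final result is extracted. setup() runs one-time initialization, and compact() gives the combiner a chance to shrink an accumulator before it is sent between workers. The `MeanCombiner` below is a hypothetical example whose method names follow Apache Beam's CombineFn; it does not subclass any TFDV or Beam type, and TFDV's own generators consume Arrow record batches rather than lists of floats.

```python
from typing import Iterable, List, Tuple

Accumulator = Tuple[float, int]  # (running sum, running count)


class MeanCombiner:
    """Hypothetical combiner computing a mean; method names follow Beam's CombineFn."""

    def setup(self) -> None:
        # One-time initialization before any accumulator is created
        # (e.g. loading a lookup table); nothing is needed for a mean.
        pass

    def create_accumulator(self) -> Accumulator:
        return (0.0, 0)

    def add_input(self, acc: Accumulator, values: Iterable[float]) -> Accumulator:
        total, count = acc
        for v in values:
            total += v
            count += 1
        return (total, count)

    def merge_accumulators(self, accs: Iterable[Accumulator]) -> Accumulator:
        total, count = 0.0, 0
        for t, c in accs:
            total += t
            count += c
        return (total, count)

    def compact(self, acc: Accumulator) -> Accumulator:
        # Chance to shrink the accumulator before it is sent between workers.
        # A (sum, count) pair is already minimal, so it is returned as-is.
        return acc

    def extract_output(self, acc: Accumulator) -> float:
        total, count = acc
        return total / count if count else float("nan")


combiner = MeanCombiner()
combiner.setup()
shards: List[List[float]] = [[1.0, 2.0], [3.0, 4.0, 5.0]]  # e.g. two workers
partials = [
    combiner.compact(combiner.add_input(combiner.create_accumulator(), shard))
    for shard in shards
]
print(combiner.extract_output(combiner.merge_accumulators(partials)))  # 3.0
```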
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- tfdv.DecodeCSV and tfdv.DecodeTFExample are deprecated. Use tfx_bsl.public.tfxio.CsvTFXIO and tfx_bsl.public.tfxio.TFExampleRecord instead.
TensorFlow Data Validation 0.26.0
Major Features and Improvements
- Added support for per-feature example weights, which allows associating each column with its own weight column. See the per_feature_weight_override parameter in StatsOptions.__init__.
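The notes point to per_feature_weight_override without showing its effect. A minimal sketch of the idea, assuming a mapping from feature name to the name of that feature's weight column, with a default weight column for every other feature; `weighted_means` and its arguments are hypothetical and operate on plain dicts rather than Arrow record batches.

```python
from collections import defaultdict
from typing import Dict, List

Example = Dict[str, float]  # hypothetical: feature name -> single numeric value


def weighted_means(examples: List[Example], default_weight_col: str,
                   overrides: Dict[str, str]) -> Dict[str, float]:
    """Per-feature weighted mean; `overrides` maps a feature to its own weight column."""
    weight_cols = {default_weight_col, *overrides.values()}
    sums: Dict[str, float] = defaultdict(float)
    totals: Dict[str, float] = defaultdict(float)
    for ex in examples:
        for feature, value in ex.items():
            if feature in weight_cols:
                continue  # weight columns themselves are not summarized here
            weight = ex.get(overrides.get(feature, default_weight_col))
            if weight is None:
                continue  # no weight for this feature in this example
            sums[feature] += weight * value
            totals[feature] += weight
    return {f: sums[f] / totals[f] for f in sums if totals[f] > 0}


examples = [
    {"price": 10.0, "clicks": 1.0, "w": 1.0, "w_price": 3.0},
    {"price": 20.0, "clicks": 3.0, "w": 1.0, "w_price": 1.0},
]
print(weighted_means(examples, "w", {"price": "w_price"}))
```

Here "price" is weighted by "w_price" (mean 12.5) while "clicks" falls back to the default "w" (mean 2.0).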
Bug Fixes and Other Changes
- The newly added LifecycleStage.DISABLED is now exempt from validation (similar to LifecycleStage.DEPRECATED, etc.).
- Fixed a bug where TFDV blindly trusted the claimed type in the provided schema. TFDV now computes stats according to the actual type of the data, and computes type-specific stats (e.g. categorical ints) only when the actual type matches the claim in the schema.
- Added an option to control whether default stats generators are added when using tfdv.GenerateStatistics().
- Started using a new quantiles computation routine that does not depend on TF. This could increase the performance of TFDV under certain workloads.
- Extended schema_util to support semantic domains.
- Moved natural_language_stats_generator to natural_language_domain_inferring_stats_generator.
- Added vocab_utils to assist in opening and loading vocabulary files.
- A SchemaDiff will be reported upon J-S skew/drift.
- Fixed a bug in the FLOAT_TYPE_SMALL_FLOAT anomaly message.
- Depends on apache-beam[gcp]>=2.25,!=2.26.*,<3.
- Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3.
- Depends on tensorflow-metadata>=0.26,<0.27.
- Depends on tensorflow-transform>=0.26,<0.27.
- Depends on tfx-bsl>=0.26,<0.27.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.25.0
Major Features and Improvements
- Added support for detecting drift and distribution skew in numeric features.
- tfdv.validate_statistics now also reports the raw measurements of distribution skew/drift (if any are computed), regardless of whether skew/drift is detected. The report is in the drift_skew_info field of the Anomalies proto (the return value of validate_statistics).
- From this release, TFDV will also host nightly packages on https://pypi-nightly.tensorflow.org. To install the nightly package, use the following command:
  pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation
  Note: These nightly packages are unstable, and breakages are likely to happen. Depending on the complexity involved, a fix can take a week or more to become available as wheels on the PyPI cloud service. You can always use the stable version of TFDV available on PyPI by running the command:
  pip install tensorflow-data-validation
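Jensen-Shannon (J-S) divergence, which these notes mention for skew/drift, compares two distributions and is bounded in [0, 1] when measured in bits. A minimal sketch, assuming both numeric feature distributions have already been bucketed into histograms over the same boundaries (that bucketing step is assumed, not shown); the function names are hypothetical, and this is not TFDV's implementation or its thresholding logic.

```python
import math
from typing import Sequence


def _kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Kullback-Leibler divergence in bits. Terms with p_i == 0 contribute 0.
    # It is only called with q = the mixture m, which is positive wherever p is.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(counts_a: Sequence[float],
                              counts_b: Sequence[float]) -> float:
    """J-S divergence in bits (so in [0, 1]) between two bucketed histograms."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same buckets")
    total_a, total_b = sum(counts_a), sum(counts_b)
    p = [c / total_a for c in counts_a]
    q = [c / total_b for c in counts_b]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)


# Counts per shared bucket, e.g. training vs. serving data for one feature.
training = [10, 20, 30, 40]
serving = [40, 30, 20, 10]
print(round(jensen_shannon_divergence(training, serving), 4))
```

Identical histograms score 0 and histograms with no overlapping buckets score 1, which makes a single threshold meaningful across features.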
Bug Fixes and Other Changes
- Added tfdv.load_stats_binary to load stats that were written using tfdv.WriteStatisticsToText (now tfdv.WriteStatisticsToBinaryFile).
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific anomaly types: DOMAIN_INVALID_FOR_TYPE, UNEXPECTED_DATA_TYPE, FEATURE_MISSING_NAME, FEATURE_MISSING_TYPE, and INVALID_SCHEMA_SPECIFICATION.
- Fixed a bug where import tensorflow_data_validation would fail if IPython is not installed. IPython is an optional dependency of TFDV.
- Depends on apache-beam[gcp]>=2.25,<3.
- Depends on tensorflow-metadata>=0.25,<0.26.
- Depends on tensorflow-transform>=0.25,<0.26.
- Depends on tfx-bsl>=0.25,<0.26.
Known Issues
- N/A
Breaking Changes
- tfdv.WriteStatisticsToText is renamed to tfdv.WriteStatisticsToBinaryFile. The former is still available but will be removed in a future release.
Deprecations
- N/A
TensorFlow Data Validation 0.24.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.24,<3.
- Depends on tensorflow-transform>=0.24.1,<0.25.
- Depends on tfx-bsl>=0.24.1,<0.25.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- N/A
TensorFlow Data Validation 0.23.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on apache-beam[gcp]>=2.24,<3.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Deprecated Python 3.5 support.
TensorFlow Data Validation 0.24.0
Major Features and Improvements
- You can now build the TFDV wheel with python setup.py bdist_wheel. Note:
  - If you want to build a manylinux2010 wheel, you'll still need to use Docker.
  - Bazel is still required.
- You can now build a manylinux2010 TFDV wheel for Python 3.8.
Bug Fixes and Other Changes
- Added support for allowlist and denylist features in the tfdv.visualize_statistics method.
- Depends on absl-py>=0.9,<0.11.
- Depends on pandas>=1.0,<2.
- Depends on protobuf>=3.9.2,<4.
- Depends on tensorflow-metadata>=0.24,<0.25.
- Depends on tensorflow-transform>=0.24,<0.25.
- Depends on tfx-bsl>=0.24,<0.25.
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- Deprecated Py3.5 support.
- Deprecated the sample_count option in tfdv.StatsOptions. Use the sample_rate option instead.
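To make the replacement option concrete, the sketch below assumes that a rate is applied per example, so each example is kept independently with that probability (Bernoulli sampling) and the sample size is only approximately rate times N. `bernoulli_sample` is a hypothetical helper, not TFDV's sampling code.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(items: Iterable[T], sample_rate: float, seed: int = 0) -> List[T]:
    """Keeps each item independently with probability `sample_rate`."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the sample is reproducible
    # rng.random() is in [0, 1), so a rate of 1.0 keeps every item.
    return [item for item in items if rng.random() < sample_rate]


sampled = bernoulli_sample(range(10000), sample_rate=0.1, seed=7)
print(len(sampled))  # about 10% of the items
```

Because each keep/drop decision depends only on the example itself, a rate-based sample can be taken independently on every worker without knowing the total dataset size in advance.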