AWS Data Wrangler 2.14.0

@jaidisido released this 28 Jan 14:24
7604507

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Support Athena Unload 🚀 #1038
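A minimal sketch of how the unload path might be used when reading query results. It assumes the `unload_approach` and `ctas_approach` arguments of `wr.athena.read_sql_query` as described in the library documentation for this release; the wrapper name `read_with_unload` and all argument values are hypothetical, so check the API reference before relying on it.

```python
from typing import Any


def read_with_unload(sql: str, database: str, s3_output: str) -> Any:
    """Run an Athena query through UNLOAD and return the result as a DataFrame.

    Hypothetical wrapper for illustration; needs AWS credentials at call time.
    """
    # Imported lazily so that defining this function does not require awswrangler.
    import awswrangler as wr

    return wr.athena.read_sql_query(
        sql=sql,
        database=database,
        ctas_approach=False,   # assumed: CTAS and UNLOAD are alternative strategies
        unload_approach=True,  # assumed: wraps the query in an UNLOAD statement
        s3_output=s3_output,   # S3 prefix where the unloaded files are written
    )
```

Usage would look like `read_with_unload("SELECT * FROM my_table", "my_database", "s3://my-bucket/unload/")`, where the table, database, and bucket names are placeholders.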

Enhancements

  • Add the ExcludeColumnSchema=True argument to the glue.get_partitions call to reduce response size #1094
  • Add PyArrow flavor argument to write_parquet via pyarrow_additional_kwargs #1057
  • Add rename_duplicate_columns and handle_duplicate_columns flags to sanitize_dataframe_columns_names method #1124
  • Add timestamp_as_object argument to all database read_sql_table methods #1130
  • Add ignore_null to read_parquet_metadata method #1125

Documentation

  • Improve documentation on installing SAR Lambda layers with the CDK #1097
  • Fix broken link to tutorial in to_parquet method #1058

Bug Fix

  • Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
  • Fix bucketing overflow issue in Athena #1086
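The trailing-slash fix can be illustrated with a small sketch. This is not the library's implementation; `ensure_trailing_slash` and the sample locations are hypothetical. One plausible motivation is that a prefix such as `year=1` without a trailing slash would also match `year=10` in a prefix-based S3 listing.

```python
def ensure_trailing_slash(location: str) -> str:
    """Return the location unchanged if it ends in "/", else append one."""
    return location if location.endswith("/") else location + "/"


# Hypothetical partition locations mapped to their partition values.
partitions = {
    "s3://bucket/table/year=2021": ["2021"],
    "s3://bucket/table/year=2022/": ["2022"],
}

# Normalize every key so that all locations end in "/".
normalized = {
    ensure_trailing_slash(location): values
    for location, values in partitions.items()
}
```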

Thanks

We thank the following contributors/users for their work on this release:

@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!