AWS Data Wrangler 2.14.0
Caveats
⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ pip install pyarrow==2 awswrangler
New Functionalities
- Support Athena Unload 🚀 #1038
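For context, Athena's UNLOAD statement writes the result of a SELECT query directly to Amazon S3 in a chosen file format. The sketch below only assembles such a statement as a string, to show its general shape. The helper name `build_unload_statement`, its parameters, the table name, and the bucket path are all hypothetical placeholders and are not part of the awswrangler API.

```python
def build_unload_statement(sql, path, file_format="PARQUET", compression=None):
    """Build an Athena UNLOAD statement (hypothetical helper, not awswrangler API)."""
    # Athena takes its options as key = 'value' pairs inside WITH (...).
    properties = [f"format = '{file_format}'"]
    if compression is not None:
        properties.append(f"compression = '{compression}'")
    joined = ", ".join(properties)
    return f"UNLOAD ({sql}) TO '{path}' WITH ({joined})"


statement = build_unload_statement(
    "SELECT * FROM my_table",  # placeholder query
    "s3://my-bucket/unload/",  # placeholder destination prefix
)
print(statement)
```

The resulting string could be submitted to Athena like any other query; the library feature presumably handles this step, plus reading the results back, on the caller's behalf.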
Enhancements
- Add the ExcludeColumnSchema=True argument to the glue.get_partitions call to reduce response size #1094
- Add PyArrow flavor argument to write_parquet via pyarrow_additional_kwargs #1057
- Add rename_duplicate_columns and handle_duplicate_columns flag to sanitize_dataframe_columns_names method #1124
- Add timestamp_as_object argument to all database read_sql_table methods #1130
- Add ignore_null to read_parquet_metadata method #1125
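To illustrate what renaming duplicate columns can look like, here is a minimal standard-library sketch. The function `rename_duplicates` and its `_1`, `_2` suffix scheme are assumptions made for illustration only; the naming that sanitize_dataframe_columns_names actually produces may differ.

```python
from collections import Counter


def rename_duplicates(columns):
    """Suffix repeated names with _1, _2, ... (illustrative scheme, an assumption)."""
    seen = Counter()
    renamed = []
    for name in columns:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        renamed.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return renamed


columns = rename_duplicates(["id", "value", "id", "id"])
print(columns)
```

This simple scheme does not guard against a suffixed name colliding with an existing column (for example, an input that already contains `id_1`), so a production version would need an extra check.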
Documentation
- Improve documentation on installing SAR Lambda layers with the CDK #1097
- Fix broken link to tutorial in to_parquet method #1058
Bug Fix
- Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
- Fix bucketing overflow issue in Athena #1086
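The trailing-slash fix above amounts to a small normalization step. A minimal sketch, assuming partition locations are plain S3 URI strings; the helper name `ensure_trailing_slash` and the sample paths are hypothetical and not taken from the library's source:

```python
def ensure_trailing_slash(location):
    """Append "/" when the location does not already end with one."""
    return location if location.endswith("/") else location + "/"


locations = [
    ensure_trailing_slash(path)
    for path in [
        "s3://my-bucket/table/year=2022",   # placeholder, missing the slash
        "s3://my-bucket/table/year=2021/",  # placeholder, already normalized
    ]
]
print(locations)
```

Normalizing locations this way keeps prefix comparisons consistent, so `year=2022` is not mistaken for a prefix of a sibling such as `year=20220`.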
Thanks
We thank the following contributors/users for their work on this release:
@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!