Changelog

0.24.0 (2024-10-14)

⚠ BREAKING CHANGES

  • to_gbq loads naive (no timezone) columns to BigQuery DATETIME instead of TIMESTAMP (#814)
  • to_gbq loads object column containing bool values to BOOLEAN instead of STRING (#814)
  • to_gbq loads object column containing dictionary values to STRUCT instead of STRING (#814)
  • to_gbq loads uint8 columns to BigQuery INT64 instead of STRING (#814) (see the sketch after this list)
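
Taken together, these changes give to_gbq richer default type mappings for common pandas columns. A minimal sketch of a DataFrame that exercises the new mappings (the project and table names are placeholders):

```python
import datetime

import pandas as pd
import pandas_gbq

df = pd.DataFrame(
    {
        # naive datetimes (no timezone) now load as DATETIME rather than TIMESTAMP
        "created": [datetime.datetime(2024, 1, 1, 12, 30)] * 2,
        # object column holding bools now loads as BOOLEAN rather than STRING
        "is_active": pd.Series([True, False], dtype="object"),
        # object column holding dicts now loads as STRUCT rather than STRING
        "attrs": [{"color": "blue"}, {"color": "red"}],
        # uint8 now loads as INT64 rather than STRING
        "flags": pd.Series([1, 2], dtype="uint8"),
    }
)

# Placeholder destination; running this requires credentials and permissions.
pandas_gbq.to_gbq(df, "my_dataset.my_table", project_id="my-project")
```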

Features

  • Adds the capability to include a custom user agent string (#819) (d43457b)

Bug Fixes

  • to_gbq loads uint8 columns to BigQuery INT64 instead of STRING (#814) (107bb40)
  • to_gbq loads naive (no timezone) columns to BigQuery DATETIME instead of TIMESTAMP (#814) (107bb40)
  • to_gbq loads object column containing bool values to BOOLEAN instead of STRING (#814) (107bb40)
  • to_gbq loads object column containing dictionary values to STRUCT instead of STRING (#814) (107bb40)

Dependencies

  • The minimum pyarrow version is now 4.0.0, to support compliant nested types (#814) (107bb40)

0.23.2 (2024-09-20)

Bug Fixes

  • deps: Require google-cloud-bigquery >= 3.4.2 (5e14496)
  • deps: Require numpy >=1.18.1 (5e14496)
  • deps: Require packaging >= 22.0 (5e14496)

Documentation

0.23.1 (2024-06-07)

Bug Fixes

  • Handle None when converting numerics to parquet (#768) (53a4683)

Documentation

  • Use a short-link to BigQuery DataFrames (#773) (7cd4287)

0.23.0 (2024-05-20)

Features

  • read_gbq suggests using BigQuery DataFrames with large results (#769) (f937edf)

0.22.0 (2024-03-05)

Features

  • Move bqstorage to extras and add debug capability (#735) (366cb55)

Bug Fixes

  • Remove Python 3.7 support due to end of life (EOL) (#737) (d0810e8)

0.21.0 (2024-01-25)

Features

  • Use faster query_and_wait method from google-cloud-bigquery when available (#722) (ac3ce3f)

Bug Fixes

  • Update runtime check for min google-cloud-bigquery to 3.3.5 (#721) (b5f4869)

0.20.0 (2023-12-10)

Features

Bug Fixes

Documentation

  • Migrate .readthedocs.yml to configuration file v2 (#689) (d921219)

0.19.2 (2023-05-10)

Bug Fixes

  • Add exception context to GenericGBQExceptions (#629) (d17ae24)

Documentation

  • Correct the documented dtypes for read_gbq (#598) (b45651d)
  • Google Colab auth is used with pydata-google-auth 1.8.0+ (#631) (257aa62)
  • Updates with a link to the canonical source of documentation (#620) (1dca732)

0.19.1 (2023-01-25)

Documentation

  • Updates the user instructions regarding OAuth (0c2b716)

0.19.0 (2023-01-11)

Features

  • Adds the ability to provide a redirect URI (#595) (a06085e)

0.18.1 (2022-11-28)

Dependencies

  • Remove upper bound for python and pyarrow (#592) (4d28684)

0.18.0 (2022-11-19)

Features

  • Map "if_exists" value to LoadJobConfig.WriteDisposition (#583) (7389cd2)

0.17.9 (2022-09-27)

Bug Fixes

  • Updates requirements.txt to fix failing tests due to missing req (#575) (1d797a3)

0.17.8 (2022-08-09)

Bug Fixes

0.17.7 (2022-07-11)

Bug Fixes

  • Allow to_gbq to run without the bigquery.tables.create permission (#539) (3988306)

0.17.6 (2022-06-03)

Documentation

  • Fix changelog headers to a consistent size (#529) (218e06a)

0.17.5 (2022-05-09)

Bug Fixes

0.17.4 (2022-03-14)

Bug Fixes

  • Avoid the deprecated "out-of-band" authentication flow (#500) (4758e3a)
  • Correctly transform query job timeout configuration and exceptions (#492) (d8c3900)

0.17.3 (2022-03-05)

Bug Fixes

  • deps: require google-api-core>=1.31.5, >=2.3.2 (#493) (744a71c)
  • deps: require google-auth>=1.25.0 (744a71c)
  • deps: require proto-plus>=1.15.0 (744a71c)

0.17.2 (2022-03-02)

Dependencies

0.17.1 (2022-02-24)

Bug Fixes

  • Avoid TypeError when executing DML statements with read_gbq (#483) (e9f0e3f)

Documentation

  • Document additional breaking change in 0.17.0 (#477) (a858c80)

0.17.0 (2022-01-19)

⚠ BREAKING CHANGES

  • The first argument of read_gbq is renamed from query to query_or_table (#443) (bf0e863)
  • Use nullable Int64 and boolean dtypes if available (#445) (89078f8)

Features

  • read_gbq accepts a table ID, which downloads the table without a query (see the sketch after this list) (#443) (bf0e863)
  • Use nullable Int64 and boolean dtypes if available (#445) (89078f8)
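
A minimal sketch of both features (the project and table names are placeholders): a bare table ID downloads the table without issuing a query, and integer results use the nullable Int64 extension dtype when available:

```python
import pandas_gbq

# Passing a table ID instead of SQL downloads the table directly, without a query.
table_df = pandas_gbq.read_gbq("my_dataset.my_table", project_id="my-project")

# NULLs no longer force integer columns to float: the column below comes back
# as the nullable Int64 dtype when pandas supports it.
query_df = pandas_gbq.read_gbq(
    "SELECT 1 AS n UNION ALL SELECT CAST(NULL AS INT64)",
    project_id="my-project",
)
print(query_df.dtypes)
```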

Bug Fixes

  • read_gbq supports extreme DATETIME values such as 0001-01-01 00:00:00 (#444) (d120f8f)
  • to_gbq allows strings for DATE and floats for NUMERIC with api_method="load_parquet" (#423) (2180836)
  • Allow extreme DATE values such as datetime.date(1, 1, 1) in load_gbq (#442) (e13abaf)
  • Avoid iteritems deprecation in pandas prerelease (#469) (7379cdc)
  • Use data project for destination in to_gbq (#455) (891a00c)

Miscellaneous Chores

0.16.0 (2021-11-08)

Features

  • to_gbq uses Parquet by default; use api_method="load_csv" for the old behavior (see the sketch after this list) (#413) (9a65383)
  • Allow Python 3.10 (#417) (faba940)
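
A sketch of opting back into the previous CSV-based upload path (the destination names are placeholders):

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

# Parquet is now the default upload format; api_method="load_csv" restores
# the old CSV-based behavior.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",    # placeholder destination table
    project_id="my-project",  # placeholder project
    api_method="load_csv",
)
```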

Miscellaneous Chores

Documentation

0.15.0 / 2021-03-30

Features

  • Load DataFrame with to_gbq to a table in a project different from the API client project. Specify the target table ID as project.dataset.table to use this feature (see the sketch after this list). (#321, #347)
  • Allow billing project to be separate from destination table project in to_gbq. (#321)
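
A sketch of loading into a table that lives in a different project than the one billed for the job (all IDs are placeholders):

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame({"value": [1, 2, 3]})

# The fully-qualified table ID selects the data project; project_id is the
# project billed for the load job, and the two may now differ.
pandas_gbq.to_gbq(
    df,
    "data-project.my_dataset.my_table",  # placeholder data project and table
    project_id="billing-project",        # placeholder billing project
)
```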

Bug fixes

  • Avoid 403 error from to_gbq when table has policyTags. (#354)
  • Avoid client.dataset deprecation warnings. (#312)

Dependencies

  • Drop support for Python 3.5 and 3.6. (#337)
  • Drop support for google-cloud-bigquery==2.4.* due to query hanging bug. (#343)

0.14.1 / 2020-11-10

Bug fixes

  • Use object dtype for TIME columns. (#328)
  • Encode floating point values with greater precision. (#326)
  • Support INT64 and other standard SQL aliases in the pandas_gbq.to_gbq table_schema argument. (#322)
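
A sketch of a table_schema entry using a standard SQL alias (the destination names are placeholders); legacy names such as INTEGER continue to work:

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.7]})

pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",    # placeholder destination table
    project_id="my-project",  # placeholder project
    table_schema=[
        {"name": "id", "type": "INT64"},       # standard SQL alias now accepted
        {"name": "score", "type": "FLOAT64"},
    ],
)
```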

0.14.0 / 2020-10-05

  • Add dtypes argument to read_gbq. Use this argument to override the default dtype for a particular column in the query results. For example, this can be used to select nullable integer columns as the Int64 nullable integer pandas extension type. (#242, #332)
import pandas_gbq as gbq

df = gbq.read_gbq(
    "SELECT CAST(NULL AS INT64) AS null_integer",
    dtypes={"null_integer": "Int64"},  # pandas' nullable integer extension dtype
)

Dependency updates

  • Support google-cloud-bigquery-storage 2.0 and higher. (#329)
  • Update the minimum version of pandas to 0.20.1. (#331)

Internal changes

  • Update tests to run against Python 3.8. (#331)

0.13.3 / 2020-09-30

  • Include needed "extras" from google-cloud-bigquery package as dependencies. Exclude incompatible 2.0 version. (#324, #329)

0.13.2 / 2020-05-14

  • Fix "Provided Schema does not match Table" error when the existing table contains required fields. (#315)

0.13.1 / 2020-02-13

  • Fix AttributeError when using the BQ Storage API to download empty results. (#299)

0.13.0 / 2019-12-12

  • Raise NotImplementedError when the deprecated private_key argument is used. (#301)

0.12.0 / 2019-11-25

New features

  • Add max_results argument to pandas_gbq.read_gbq(). Use this argument to limit the number of rows in the results DataFrame. Set max_results to 0 to ignore query outputs, such as for DML or DDL queries (see the sketch after this list). (#102)
  • Add progress_bar_type argument to pandas_gbq.read_gbq(). Use this argument to display a progress bar when downloading data. (#182)
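
A sketch combining both new arguments against a public dataset (the billing project ID is a placeholder):

```python
import pandas_gbq

df = pandas_gbq.read_gbq(
    "SELECT name, number FROM `bigquery-public-data.usa_names.usa_1910_2013`",
    project_id="my-project",   # placeholder billing project
    max_results=1000,          # cap the number of rows in the returned DataFrame
    progress_bar_type="tqdm",  # show a download progress bar (requires tqdm)
)
```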

Bug fixes

  • Fix resource leak with use_bqstorage_api by closing BigQuery Storage API client after use. (#294)

Dependency updates

  • Update the minimum version of google-cloud-bigquery to 1.11.1. (#296)

Documentation

  • Add code samples to introduction and refactor howto guides. (#239)

0.11.0 / 2019-07-29

  • Breaking Change: Python 2 support has been dropped. This aligns with the pandas package, which dropped Python 2 support at the end of 2019. (#268)

Enhancements

  • Ensure table_schema argument is not modified inplace. (#278)

Implementation changes

  • Use object dtype for STRING, ARRAY, and STRUCT columns when there are zero rows. (#285)

Internal changes

  • Populate user-agent with pandas version information. (#281)
  • Fix pytest.raises usage for latest pytest. Fix warnings in tests. (#282)

0.10.0 / 2019-04-05

  • Breaking Change: Default SQL dialect is now standard. Use pandas_gbq.context.dialect to override the default value. (#195, #245)
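
A sketch of opting back into legacy SQL globally (the project ID is a placeholder); passing dialect="legacy" on an individual call also works:

```python
import pandas_gbq

# Standard SQL is now the default; set the shared context to keep using
# legacy SQL without passing dialect="legacy" on every call.
pandas_gbq.context.dialect = "legacy"

df = pandas_gbq.read_gbq(
    "SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 10",
    project_id="my-project",  # placeholder billing project
)
```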

Documentation

  • Document BigQuery data type to pandas dtype conversion for read_gbq. (#269)

Dependency updates

  • Update the minimum version of google-cloud-bigquery to 1.9.0. (#247)
  • Update the minimum version of pandas to 0.19.0. (#262)

Internal changes

  • Update the authentication credentials. Note: You may need to set reauth=True in order to update your credentials to the most recent version. This is required to use new functionality such as the BigQuery Storage API. (#267)
  • Use to_dataframe() from google-cloud-bigquery in the read_gbq() function. (#247)

Enhancements

  • Fix a bug where pandas-gbq could not upload an empty DataFrame. (#237)
  • Allow table_schema in to_gbq to contain only a subset of columns, with the rest being populated using the DataFrame dtypes (#218) (contributed by @johnpaton)
  • Read project_id in to_gbq from provided credentials if available (contributed by @daureg)
  • read_gbq uses the timezone-aware DatetimeTZDtype(unit='ns', tz='UTC') dtype for BigQuery TIMESTAMP columns. (#269)
  • Add use_bqstorage_api to read_gbq. The BigQuery Storage API can be used to download large query results (>125 MB) more quickly. If the BQ Storage API can't be used, the BigQuery API is used instead. (#133, #270)

0.9.0 / 2019-01-11

  • Warn when deprecated private_key parameter is used (#240)
  • New dependency: Use the pydata-google-auth package for authentication. (#241)

0.8.0 / 2018-11-12

Breaking changes

  • Deprecate private_key parameter to pandas_gbq.read_gbq and pandas_gbq.to_gbq in favor of new credentials argument. Instead, create a credentials object using google.oauth2.service_account.Credentials.from_service_account_info or google.oauth2.service_account.Credentials.from_service_account_file. See the authentication how-to guide for examples. (#161, #231)
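
A sketch of the credentials-based flow that replaces private_key (the key file path and project ID are placeholders):

```python
import pandas_gbq
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/key.json",  # placeholder path to a service account key file
)

df = pandas_gbq.read_gbq(
    "SELECT 1 AS one",
    project_id="my-project",  # placeholder project
    credentials=credentials,
)
```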

Enhancements

  • Allow newlines in data passed to to_gbq. (#180)
  • Add pandas_gbq.context.dialect to allow overriding the default SQL syntax dialect. (#195, #235)
  • Support Python 3.7. (#197, #232)

Internal changes

  • Migrate tests to CircleCI. (#228, #232)

0.7.0 / 2018-10-19

  • int columns which contain NULL are now cast to float, rather than object type. (#174)
  • DATE, DATETIME and TIMESTAMP columns are now parsed as pandas' timestamp objects (#224)
  • Add pandas_gbq.Context to cache credentials in-memory, across calls to read_gbq and to_gbq. (#198, #208)
  • Fast queries now do not log above DEBUG level. (#204) With BigQuery's release of clustering, querying smaller samples of data is now faster and cheaper.
  • Don't load credentials from disk if reauth is True. (#212) This fixes a bug where pandas-gbq could not refresh credentials if the cached credentials were invalid, revoked, or expired, even when reauth=True.
  • Catch RefreshError when trying credentials. (#226)

Internal changes

  • Avoid listing datasets and tables in system tests. (#215)
  • Improved performance from eliminating some duplicative parsing steps (#224)

0.6.1 / 2018-09-11

  • Improved read_gbq performance and memory consumption by delegating DataFrame construction to the pandas library, radically reducing the number of loops that execute in Python (#128)
  • Reduced verbosity of logging from read_gbq, particularly for short queries. (#201)
  • Avoid SELECT 1 query when running to_gbq. (#202)

0.6.0 / 2018-08-15

  • Warn when dialect is not passed in to read_gbq. The default dialect will be changing from 'legacy' to 'standard' in a future version. (#195)
  • Use general float with 15 decimal digit precision when writing to local CSV buffer in to_gbq. This prevents numerical overflow in certain edge cases. (#192)

0.5.0 / 2018-06-15

  • Project ID parameter is optional in read_gbq and to_gbq when it can be inferred from the environment. Note: you must still pass in a project ID when using user-based authentication. (#103)
  • Add a progress bar for to_gbq, using the optional tqdm library as a dependency. (#162)
  • Add location parameter to read_gbq and to_gbq so that pandas-gbq can work with datasets in the Tokyo region. (#177)
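
A sketch of reading a dataset stored in the Tokyo region (the query and project ID are placeholders):

```python
import pandas_gbq

df = pandas_gbq.read_gbq(
    "SELECT * FROM my_dataset.my_table",  # placeholder query
    project_id="my-project",              # placeholder project
    location="asia-northeast1",           # BigQuery Tokyo region
)
```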

Documentation

  • Add authentication how-to guide. (#183)
  • Update contributing guide with new paths to tests. (#154, #164)

Internal changes

  • Tests now use nox to run in multiple Python environments. (#52)
  • Renamed internal modules. (#154)
  • Refactored auth to an internal auth module. (#176)
  • Add unit tests for get_credentials(). (#184)

0.4.1 / 2018-04-05

  • Only show verbose deprecation warning if Pandas version does not populate it. (#157)

0.4.0 / 2018-04-03

  • Fix bug in read_gbq when building a dataframe with integer columns on Windows. Explicitly use 64bit integers when converting from BQ types. (#119)
  • Fix bug in read_gbq when querying for an array of floats (#123)
  • Fix bug in read_gbq with configuration argument. Updates read_gbq to account for breaking change in the way google-cloud-python version 0.32.0+ handles query configuration API representation. (#152)
  • Fix bug in to_gbq where seconds were discarded in timestamp columns. (#148)
  • Fix bug in to_gbq when supplying a user-defined schema (#150)
  • Deprecate the verbose parameter in read_gbq and to_gbq. Messages use the logging module instead of printing progress directly to standard output. (#12)

0.3.1 / 2018-02-13

  • Fix an issue where Unicode couldn't be uploaded in Python 2 (#106)
  • Add support for a passed schema in to_gbq instead of inferring the schema from the passed DataFrame with DataFrame.dtypes (#46)
  • Fix an issue where a dataframe containing both integer and floating point columns could not be uploaded with to_gbq (#116)
  • to_gbq now uses to_csv to avoid manually looping over rows in a dataframe (should result in faster table uploads) (#96)

0.3.0 / 2018-01-03

  • Use the google-cloud-bigquery library for API calls. The google-cloud-bigquery package is a new dependency, and dependencies on google-api-python-client and httplib2 are removed. See the installation guide for more details. (#93)
  • Structs and arrays are now named properly (#23) and BigQuery functions like array_agg no longer run into errors during type conversion (#22).
  • to_gbq now uses a load job instead of the streaming API. Remove StreamingInsertError class, as it is no longer used by to_gbq. (#7, #75)

0.2.1 / 2017-11-27

  • read_gbq now raises QueryTimeout if the request exceeds the query.timeoutMs value specified in the BigQuery configuration. (#76)
  • Environment variable PANDAS_GBQ_CREDENTIALS_FILE can now be used to override the default location where the BigQuery user account credentials are stored. (#86)
  • BigQuery user account credentials are now stored in an application-specific hidden user folder on the operating system. (#41)

0.2.0 / 2017-07-24

  • Drop support for Python 3.4 (#40)
  • The DataFrame passed to to_gbq(..., if_exists='append') needs to contain only a subset of the fields in the BigQuery schema. (#24)
  • Use the google-auth library for authentication because oauth2client is deprecated. (#39)
  • read_gbq now has an auth_local_webserver boolean argument for controlling whether to use the web server or console flow when getting user credentials; it replaces the --noauth_local_webserver command line argument (see the sketch after this list). (#35)
  • read_gbq now displays the BigQuery Job ID and standard price in verbose output. (#70 and #71)
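
A sketch of the new argument (the project ID is a placeholder); passing False falls back to the console flow that the removed command-line flag used to select:

```python
import pandas_gbq

df = pandas_gbq.read_gbq(
    "SELECT 1 AS one",
    project_id="my-project",    # placeholder project
    auth_local_webserver=True,  # use a local web server for the OAuth flow
)
```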

0.1.6 / 2017-05-03

  • All gbq errors will simply be subclasses of ValueError and no longer inherit from the deprecated PandasError.

0.1.4 / 2017-03-17

  • InvalidIndexColumn will be raised instead of InvalidColumnOrder in read_gbq when the index column specified does not exist in the BigQuery schema. (#6)

0.1.3 / 2017-03-04

  • Fix a bug with appending to a BigQuery table where fields have modes (NULLABLE, REQUIRED, REPEATED) specified. These modes were compared against the remote schema, and writing a table via to_gbq would previously raise an error. (#13)

0.1.2 / 2017-02-23

Initial release of transferred code from pandas.

Includes patches since the 0.19.2 release of pandas, with the following:

  • read_gbq now allows query configuration preferences. (pandas-GH#14742)
  • read_gbq now stores INTEGER columns as dtype=object if they contain NULL values. Otherwise they are stored as int64. This prevents precision loss for integers greater than 2**53. Furthermore, FLOAT columns with values above 10**4 are no longer cast to int64, which also caused precision loss. (pandas-GH#14064, pandas-GH#14305)