0.24.0 (2024-10-14)
- to_gbq loads naive (no timezone) columns to BigQuery DATETIME instead of TIMESTAMP (#814) (107bb40)
- to_gbq loads object column containing bool values to BOOLEAN instead of STRING (#814) (107bb40)
- to_gbq loads object column containing dictionary values to STRUCT instead of STRING (#814) (107bb40)
- to_gbq loads uint8 columns to BigQuery INT64 instead of STRING (#814) (107bb40)
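The 0.24.0 type changes can be read as a before/after lookup. The sketch below only restates the four notes for this release; `TYPE_CHANGES` and `loaded_type` are hypothetical names made up for illustration, the keys are informal labels rather than pandas dtypes, and none of this is pandas-gbq's own code.

```python
# Informal column kind -> (BigQuery type before 0.24.0, type from 0.24.0 on),
# transcribed from the release notes. The labels are illustrative only.
TYPE_CHANGES = {
    "naive datetime": ("TIMESTAMP", "DATETIME"),
    "object column of bool": ("STRING", "BOOLEAN"),
    "object column of dict": ("STRING", "STRUCT"),
    "uint8": ("STRING", "INT64"),
}


def loaded_type(kind, version):
    """Return the type to_gbq is described as using for `kind` at `version`.

    `version` is a (major, minor, patch) tuple. Hypothetical helper.
    """
    before, after = TYPE_CHANGES[kind]
    return after if version >= (0, 24, 0) else before
```

Code that relied on the older STRING or TIMESTAMP results may need its destination table schema checked after upgrading.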
0.23.2 (2024-09-20)
- deps: Require google-cloud-bigquery >= 3.4.2 (5e14496)
- deps: Require numpy >=1.18.1 (5e14496)
- deps: Require packaging >= 22.0 (5e14496)
0.23.1 (2024-06-07)
0.23.0 (2024-05-20)
0.22.0 (2024-03-05)
0.21.0 (2024-01-25)
0.20.0 (2023-12-10)
- Add 'columns' as an alias for 'col_order' (#701) (e52e8f8)
- Add support for Python 3.12 (#702) (edb93cc)
- Migrating off of circle/ci (#638) (08fe090)
- Removed pkg_resources for native namespace support (#707) (eeb1959)
0.19.2 (2023-05-10)
- Correct the documented dtypes for read_gbq (#598) (b45651d)
- Google Colab auth is used with pydata-google-auth 1.8.0+ (#631) (257aa62)
- Updates with a link to the canonical source of documentation (#620) (1dca732)
0.19.1 (2023-01-25)
- Updates the user instructions re OAuth (0c2b716)
0.19.0 (2023-01-11)
0.18.1 (2022-11-28)
0.18.0 (2022-11-19)
0.17.9 (2022-09-27)
0.17.8 (2022-08-09)
0.17.7 (2022-07-11)
0.17.6 (2022-06-03)
0.17.5 (2022-05-09)
0.17.4 (2022-03-14)
- avoid deprecated "out-of-band" authentication flow (#500) (4758e3a)
- correctly transform query job timeout configuration and exceptions (#492) (d8c3900)
0.17.3 (2022-03-05)
- deps: require google-api-core>=1.31.5, >=2.3.2 (#493) (744a71c)
- deps: require google-auth>=1.25.0 (744a71c)
- deps: require proto-plus>=1.15.0 (744a71c)
0.17.2 (2022-03-02)
0.17.1 (2022-02-24)
0.17.0 (2022-01-19)
- the first argument of read_gbq is renamed from query to query_or_table (#443) (bf0e863)
- use nullable Int64 and boolean dtypes if available (#445) (89078f8)
- read_gbq accepts a table ID, which downloads the table without a query (#443) (bf0e863)
- read_gbq supports extreme DATETIME values such as 0001-01-01 00:00:00 (#444) (d120f8f)
- to_gbq allows strings for DATE and floats for NUMERIC with api_method="load_parquet" (#423) (2180836)
- allow extreme DATE values such as datetime.date(1, 1, 1) in load_gbq (#442) (e13abaf)
- avoid iteritems deprecation in pandas prerelease (#469) (7379cdc)
- use data project for destination in to_gbq (#455) (891a00c)
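Since 0.17.0 the first argument of read_gbq may be either SQL text or a table ID. One way to picture how the two could be told apart is a whitespace check: SQL contains whitespace, a bare table ID does not. This is a sketch under that assumption; `is_query` is a hypothetical helper and the rule pandas-gbq actually applies may differ.

```python
import re


def is_query(query_or_table: str) -> bool:
    """Guess whether the string is SQL (contains whitespace) or a table ID.

    Hypothetical heuristic for illustration only.
    """
    return re.search(r"\s", query_or_table.strip()) is not None


# Intended usage (not executed here, requires credentials and a project):
#   pandas_gbq.read_gbq("SELECT 1 AS x", project_id="my-project")
#   pandas_gbq.read_gbq("my_dataset.my_table", project_id="my-project")
```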
0.16.0 (2021-11-08)
- to_gbq uses Parquet by default, use api_method="load_csv" for old behavior (#413) (9a65383)
- allow Python 3.10 (#417) (faba940)
- Load DataFrame with to_gbq to a table in a project different from the API client project. Specify the target table ID as project.dataset.table to use this feature. (#321, #347)
- Allow billing project to be separate from destination table project in to_gbq. (#321)
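The project.dataset.table form lets the destination project differ from the billing project. A minimal sketch of how such an ID could be resolved follows; `split_table_id` is a hypothetical helper, and it assumes that no component contains a dot, which does not hold for every legacy project ID.

```python
def split_table_id(destination_table, billing_project):
    """Split 'dataset.table' or 'project.dataset.table' into three parts.

    When no project is given, the billing project is assumed to own the table.
    Hypothetical helper; assumes components contain no dots.
    """
    parts = destination_table.split(".")
    if len(parts) == 3:
        project, dataset, table = parts
    elif len(parts) == 2:
        project = billing_project
        dataset, table = parts
    else:
        raise ValueError(
            "expected 'dataset.table' or 'project.dataset.table', "
            f"got {destination_table!r}"
        )
    return project, dataset, table
```

With this reading, a call such as `to_gbq(df, "data-project.my_dataset.my_table", project_id="billing-project")` would bill one project while writing to another.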
- Avoid 403 error from to_gbq when table has policyTags. (#354)
- Avoid client.dataset deprecation warnings. (#312)
- Drop support for Python 3.5 and 3.6. (#337)
- Drop support for google-cloud-bigquery==2.4.* due to query hanging bug. (#343)
- Use object dtype for TIME columns. (#328)
- Encode floating point values with greater precision. (#326)
- Support INT64 and other standard SQL aliases in the pandas_gbq.to_gbq table_schema argument. (#322)
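To illustrate what accepting standard SQL aliases in table_schema can mean, the sketch below maps a few standard SQL type names to their legacy BigQuery equivalents. `STANDARD_TO_LEGACY` and `normalize_schema` are hypothetical names; whether pandas-gbq normalizes in this direction, or at all, is an assumption made for the example.

```python
# Standard SQL type names and the legacy names BigQuery treats as equivalent.
STANDARD_TO_LEGACY = {
    "INT64": "INTEGER",
    "FLOAT64": "FLOAT",
    "BOOL": "BOOLEAN",
    "STRUCT": "RECORD",
}


def normalize_schema(table_schema):
    """Return a new schema list with aliases replaced by legacy names.

    The input list and its dicts are left unmodified. Hypothetical helper.
    """
    normalized = []
    for field in table_schema:
        type_name = field["type"].upper()
        normalized.append(
            {**field, "type": STANDARD_TO_LEGACY.get(type_name, type_name)}
        )
    return normalized
```

A schema such as `[{"name": "n", "type": "INT64"}]` can then be passed as table_schema with either spelling of the type.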
- Add dtypes argument to read_gbq. Use this argument to override the default dtype for a particular column in the query results. For example, this can be used to select nullable integer columns as the Int64 nullable integer pandas extension type. (#242, #332)

    import pandas_gbq as gbq  # alias assumed; the original snippet omits its import

    df = gbq.read_gbq(
        "SELECT CAST(NULL AS INT64) AS null_integer",
        dtypes={"null_integer": "Int64"},
    )

- Support google-cloud-bigquery-storage 2.0 and higher. (#329)
- Update the minimum version of pandas to 0.20.1. (#331)
- Update tests to run against Python 3.8. (#331)
- Include needed "extras" from google-cloud-bigquery package as dependencies. Exclude incompatible 2.0 version. (#324, #329)
- Fix Provided Schema does not match Table error when the existing table contains required fields. (#315)
- Fix AttributeError with BQ Storage API to download empty results. (#299)
- Raise NotImplementedError when the deprecated private_key argument is used. (#301)
- Add max_results argument to pandas_gbq.read_gbq(). Use this argument to limit the number of rows in the results DataFrame. Set max_results to 0 to ignore query outputs, such as for DML or DDL queries. (#102)
- Add progress_bar_type argument to pandas_gbq.read_gbq(). Use this argument to display a progress bar when downloading data. (#182)
- Fix resource leak with use_bqstorage_api by closing BigQuery Storage API client after use. (#294)
- Update the minimum version of google-cloud-bigquery to 1.11.1. (#296)
- Add code samples to introduction and refactor howto guides. (#239)
- Breaking Change: Python 2 support has been dropped. This is to align with the pandas package which dropped Python 2 support at the end of 2019. (#268)
- Ensure table_schema argument is not modified in place. (#278)
- Use object dtype for STRING, ARRAY, and STRUCT columns when there are zero rows. (#285)
- Populate user-agent with pandas version information. (#281)
- Fix pytest.raises usage for latest pytest. Fix warnings in tests. (#282)
- Breaking Change: Default SQL dialect is now standard. Use pandas_gbq.context.dialect to override the default value. (#195, #245)
- Document BigQuery data type to pandas dtype conversion for read_gbq. (#269)
- Update the minimum version of google-cloud-bigquery to 1.9.0. (#247)
- Update the minimum version of pandas to 0.19.0. (#262)
- Update the authentication credentials. Note: You may need to set reauth=True in order to update your credentials to the most recent version. This is required to use new functionality such as the BigQuery Storage API. (#267)
- Use to_dataframe() from google-cloud-bigquery in the read_gbq() function. (#247)
- Fix a bug where pandas-gbq could not upload an empty DataFrame. (#237)
- Allow table_schema in to_gbq to contain only a subset of columns, with the rest being populated using the DataFrame dtypes (#218) (contributed by @johnpaton)
- Read project_id in to_gbq from provided credentials if available (contributed by @daureg)
- read_gbq uses the timezone-aware DatetimeTZDtype(unit='ns', tz='UTC') dtype for BigQuery TIMESTAMP columns. (#269)
- Add use_bqstorage_api to read_gbq. The BigQuery Storage API can be used to download large query results (>125 MB) more quickly. If the BQ Storage API can't be used, the BigQuery API is used instead. (#133, #270)
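The partial table_schema behaviour noted above (#218) amounts to overriding an inferred schema by column name. A rough sketch, assuming schemas are lists of {"name": ..., "type": ...} dicts; `complete_schema` is a hypothetical helper, and the inferred schema is supplied by hand here rather than derived from DataFrame dtypes.

```python
def complete_schema(partial_schema, inferred_schema):
    """Fill in columns missing from partial_schema using inferred_schema.

    User-provided fields win by name; column order follows inferred_schema.
    Hypothetical helper for illustration.
    """
    overrides = {field["name"]: field for field in partial_schema}
    return [overrides.get(field["name"], field) for field in inferred_schema]
```

For example, a user could pin one column to TIMESTAMP and let every other column keep its inferred type.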
- Warn when deprecated private_key parameter is used (#240)
- New dependency: Use the pydata-google-auth package for authentication. (#241)
- Deprecate private_key parameter to pandas_gbq.read_gbq and pandas_gbq.to_gbq in favor of new credentials argument. Instead, create a credentials object using google.oauth2.service_account.Credentials.from_service_account_info or google.oauth2.service_account.Credentials.from_service_account_file. See the authentication how-to guide for examples. (#161, #231)
- Allow newlines in data passed to to_gbq. (#180)
- Add pandas_gbq.context.dialect to allow overriding the default SQL syntax dialect. (#195, #235)
- Support Python 3.7. (#197, #232)
- int columns which contain NULL are now cast to float, rather than object type. (#174)
- DATE, DATETIME and TIMESTAMP columns are now parsed as pandas' timestamp objects (#224)
- Add pandas_gbq.Context to cache credentials in-memory, across calls to read_gbq and to_gbq. (#198, #208)
- Fast queries now do not log above DEBUG level. (#204) With BigQuery's release of clustering, querying smaller samples of data is now faster and cheaper.
- Don't load credentials from disk if reauth is True. (#212) This fixes a bug where pandas-gbq could not refresh credentials if the cached credentials were invalid, revoked, or expired, even when reauth=True.
- Catch RefreshError when trying credentials. (#226)
- Avoid listing datasets and tables in system tests. (#215)
- Improved performance from eliminating some duplicative parsing steps (#224)
- Improved read_gbq performance and memory consumption by delegating DataFrame construction to the Pandas library, radically reducing the number of loops that execute in python (#128)
- Reduced verbosity of logging from read_gbq, particularly for short queries. (#201)
- Avoid SELECT 1 query when running to_gbq. (#202)
- Warn when dialect is not passed in to read_gbq. The default dialect will be changing from 'legacy' to 'standard' in a future version. (#195)
- Use general float with 15 decimal digit precision when writing to local CSV buffer in to_gbq. This prevents numerical overflow in certain edge cases. (#192)
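The effect of a 15 significant digit "general" float format can be seen with plain string formatting. The snippet below is only an illustration of the `%.15g` format; the 17-digit variant is included for comparison because 17 digits is enough to round-trip any 64-bit float, not because the notes name that value.

```python
value = 0.1 + 0.2  # stored as the nearest double, slightly above 0.3

fifteen = "%.15g" % value    # 15 significant digits, trailing zeros dropped
seventeen = "%.17g" % value  # 17 significant digits

# 15 digits hides the representation error, so the text no longer
# parses back to exactly the same double.
lossy = float(fifteen) != value

# 17 digits always parses back to the same double.
exact = float(seventeen) == value
```

This trade-off is the background to the later note about encoding floating point values with greater precision (#326).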
- Project ID parameter is optional in read_gbq and to_gbq when it can be inferred from the environment. Note: you must still pass in a project ID when using user-based authentication. (#103)
- Progress bar added for to_gbq, through an optional library tqdm as dependency. (#162)
- Add location parameter to read_gbq and to_gbq so that pandas-gbq can work with datasets in the Tokyo region. (#177)
- Add authentication how-to guide. (#183)
- Update contributing guide with new paths to tests. (#154, #164)
- Tests now use nox to run in multiple Python environments. (#52)
- Renamed internal modules. (#154)
- Refactored auth to an internal auth module. (#176)
- Add unit tests for get_credentials(). (#184)
- Only show verbose deprecation warning if Pandas version does not populate it. (#157)
- Fix bug in read_gbq when building a dataframe with integer columns on Windows. Explicitly use 64bit integers when converting from BQ types. (#119)
- Fix bug in read_gbq when querying for an array of floats (#123)
- Fix bug in read_gbq with configuration argument. Updates read_gbq to account for breaking change in the way google-cloud-python version 0.32.0+ handles query configuration API representation. (#152)
- Fix bug in to_gbq where seconds were discarded in timestamp columns. (#148)
- Fix bug in to_gbq when supplying a user-defined schema (#150)
- Deprecate the verbose parameter in read_gbq and to_gbq. Messages use the logging module instead of printing progress directly to standard output. (#12)
- Fix an issue where Unicode couldn't be uploaded in Python 2 (#106)
- Add support for a passed schema in to_gbq instead of inferring the schema from the passed DataFrame with DataFrame.dtypes (#46)
- Fix an issue where a dataframe containing both integer and floating point columns could not be uploaded with to_gbq (#116)
- to_gbq now uses to_csv to avoid manually looping over rows in a dataframe (should result in faster table uploads) (#96)
- Use the google-cloud-bigquery library for API calls. The google-cloud-bigquery package is a new dependency, and dependencies on google-api-python-client and httplib2 are removed. See the installation guide for more details. (#93)
- Structs and arrays are now named properly (#23) and BigQuery functions like array_agg no longer run into errors during type conversion (#22).
- to_gbq now uses a load job instead of the streaming API. Remove StreamingInsertError class, as it is no longer used by to_gbq. (#7, #75)
- read_gbq now raises QueryTimeout if the request exceeds the query.timeoutMs value specified in the BigQuery configuration. (#76)
- Environment variable PANDAS_GBQ_CREDENTIALS_FILE can now be used to override the default location where the BigQuery user account credentials are stored. (#86)
- BigQuery user account credentials are now stored in an application-specific hidden user folder on the operating system. (#41)
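The PANDAS_GBQ_CREDENTIALS_FILE override described above follows a common pattern: consult the environment first, then fall back to a default location. A minimal sketch of that pattern; `credentials_path` and the default path passed to it are hypothetical, and the library's real default location is not reproduced here.

```python
import os


def credentials_path(default_path):
    """Return the credentials file location, honouring the env override.

    `default_path` stands in for the library's own default, which this
    sketch does not try to reproduce.
    """
    return os.environ.get("PANDAS_GBQ_CREDENTIALS_FILE", default_path)
```

Setting the variable before starting Python, for example in a shell profile, would point credential storage at a custom file.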
- Drop support for Python 3.4 (#40)
- The dataframe passed to .to_gbq(...., if_exists='append') needs to contain only a subset of the fields in the BigQuery schema. (#24)
- Use the google-auth library for authentication because oauth2client is deprecated. (#39)
- read_gbq now has an auth_local_webserver boolean argument for controlling whether to use web server or console flow when getting user credentials. Replaces --noauth_local_webserver command line argument. (#35)
- read_gbq now displays the BigQuery Job ID and standard price in verbose output. (#70 and #71)
- All gbq errors will simply be subclasses of ValueError and no longer inherit from the deprecated PandasError.
- InvalidIndexColumn will be raised instead of InvalidColumnOrder in read_gbq when the index column specified does not exist in the BigQuery schema. (#6)
- Fix a bug with appending to a BigQuery table where fields have modes (NULLABLE, REQUIRED, REPEATED) specified. These modes were compared against the remote schema, and writing a table via to_gbq would previously raise. (#13)
Initial release of transferred code from pandas
Includes patches since the 0.19.2 release on pandas with the following:
- read_gbq now allows query configuration preferences pandas-GH#14742
- read_gbq now stores INTEGER columns as dtype=object if they contain NULL values. Otherwise they are stored as int64. This prevents precision loss for integers greater than 2**53. Furthermore FLOAT columns with values above 10**4 are no longer cast to int64, which also caused precision loss pandas-GH#14064, and pandas-GH#14305
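The 2**53 threshold comes from the 64-bit float format, which has a 53-bit significand, so converting larger integers to float (as happens when a NULL forces a float column) can silently change them. A short worked example using only built-in Python numbers:

```python
limit = 2**53          # 9007199254740992, largest power of two with exact neighbours below it
big = limit + 1        # 9007199254740993, not representable as a 64-bit float

as_float = float(big)  # rounds to the nearest representable value

# The round trip through float loses the final unit.
round_trip = int(as_float)
collides = as_float == float(limit)
```

Keeping such columns as Python objects preserves the exact integer values at the cost of a slower dtype.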