What is your issue?
Calling Dataset.to_dataframe() currently always produces a memory copy of all arrays. This is not optimal for all scenarios. We should make it possible to convert Xarray objects to Pandas objects without a memory copy.
This behavior may depend on Pandas version. As of 2.2, here are the relevant Pandas docs: https://pandas.pydata.org/docs/user_guide/copy_on_write.html
Here's the key point:
"The Series and DataFrame constructors will now copy NumPy array by default when not otherwise specified. This was changed to avoid mutating a pandas object when the NumPy array is changed inplace outside of pandas. You can set copy=False to avoid this copy."
Xarray constructs its DataFrames in xarray/xarray/core/dataset.py (lines 7386 to 7388 in d5f84dd).
Here's a minimal example
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.DataArray(np.ones(1_000_000), dims=('x',), name="foo").to_dataset()
df = ds.to_dataframe()
print(np.shares_memory(df.foo.values, ds.foo.values))  # -> False

# can see the memory locations
print(ds.foo.values.__array_interface__)
print(df.foo.values.__array_interface__)

# compare to this
df2 = pd.DataFrame(
    {
        "foo": ds.foo.values,
    },
    copy=False
)
np.shares_memory(df2.foo.values, ds.foo.values)  # -> True
Solution
I propose we add a copy keyword option to Dataset.to_dataframe() (and similar for DataArray) which defaults to True (the current behavior, which copies) but allows users to select False if they want to avoid the copy.
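To make the proposal concrete, here is a rough sketch of how such a keyword could behave. `to_dataframe_sketch` is a hypothetical helper, not part of xarray, and it assumes a Dataset with exactly one dimension and NumPy-backed variables. The `copy` argument has the same meaning as in the `pd.DataFrame` constructor (`copy=False` avoids the copy) and is passed straight through.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_dataframe_sketch(ds: xr.Dataset, *, copy: bool) -> pd.DataFrame:
    """Hypothetical helper (not an xarray API): convert a one-dimensional
    Dataset to a DataFrame, forwarding `copy` to the pandas constructor."""
    (dim,) = ds.sizes  # assumption: the Dataset has exactly one dimension
    # Collect the underlying NumPy arrays, one per data variable.
    columns = {name: var.values for name, var in ds.data_vars.items()}
    # Use the dimension's index if it has one, otherwise a plain range.
    if dim in ds.indexes:
        index = ds.indexes[dim]
    else:
        index = pd.RangeIndex(ds.sizes[dim], name=dim)
    return pd.DataFrame(columns, index=index, copy=copy)


ds = xr.DataArray(np.ones(1_000_000), dims=("x",), name="foo").to_dataset()
no_copy = to_dataframe_sketch(ds, copy=False)
with_copy = to_dataframe_sketch(ds, copy=True)
print(np.shares_memory(no_copy["foo"].values, ds["foo"].values))
print(np.shares_memory(with_copy["foo"].values, ds["foo"].values))
```

Multi-dimensional Datasets are deliberately left out: they need each variable broadcast and flattened before it can become a column, and that step may force a copy regardless of the keyword.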