BUG: Fix extra decimal places in DataFrame.to_csv() with quoting=csv.QUOTE_NONNUMERIC and float16/float32 dtypes (#60699) #60804
Added an entry in the latest `doc/source/whatsnew/v3.0.0.rst` file if fixing a bug or adding a new feature.
Removed the separate `quoting=None` logic for `float` arrays.

Issue
`DataFrame.to_csv()` generates extra decimal places in its output when `quoting=csv.QUOTE_NONNUMERIC`, the DataFrame's dtype is `float16` or `float32`, and `float_format=None`.

Reason
`DataFrame.to_csv()` internally uses `get_values_for_csv()`, and when `quoting` is specified (here `quoting=csv.QUOTE_NONNUMERIC`), it converts the NumPy `float` array to `object`. See pandas/pandas/core/indexes/base.py, lines 7751 to 7765 in 57d2489.
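A minimal sketch of the effect using only the standard library. As an assumption standing in for NumPy, the `float32` to `object` conversion is emulated with `struct`, which rounds a value to the nearest `float32` and widens it back to a Python `float`. The `csv` module then writes that widened value with its full double-precision repr, while a genuine `float64` value is written in its short form. The helpers `as_float32` and `write_row` are hypothetical.

```python
import csv
import io
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest float32 and widen it back to a Python float.

    Hypothetical helper: emulates the Python float that
    np.array(float32_values, dtype="object") stores for each element.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def write_row(row, quoting=csv.QUOTE_NONNUMERIC) -> str:
    """Write a single CSV row into a string buffer and return the text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()


# float32 values widened to Python floats: the exact binary value leaks out.
print(repr(write_row([as_float32(8.57), as_float32(-1.1)])))
# '8.569999694824219,-1.100000023841858\r\n'

# The same numbers held as float64: the repr is already the short decimal.
print(repr(write_row([8.57, -1.1])))
# '8.57,-1.1\r\n'
```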
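`np.array(values, dtype="object")` affects `float16`, `float32` and `float64` differently. When a `float16` or `float32` array is converted to an `object` array, each element is stored as a Python `float` (a 64-bit double, equivalent to `numpy.float64`) that holds the exact binary value of the original low-precision number. Python's `float` repr prints the shortest decimal string that round-trips at double precision, so it fully exposes that binary value: for `dtype=float16` and `dtype=float32`, the conversion to `dtype=object` therefore produces extra decimal places (for example, `float32` 8.57 is written as 8.569999694824219).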
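The claim that the widened value carries the exact binary representation can be checked with `decimal.Decimal`, which shows the exact value stored in a Python `float`. This sketch again emulates `float32` rounding with `struct` (an assumption standing in for NumPy): neither double equals 8.57 exactly, but only the widened `float32` value has a long repr.

```python
import struct
from decimal import Decimal

# Nearest double to 8.57 (what a float64 array holds).
x64 = 8.57
# Nearest float32 to 8.57, widened to a double (emulated with struct).
x32 = struct.unpack("f", struct.pack("f", 8.57))[0]

# Decimal(float) shows the exact binary value stored in each double.
print(Decimal(x64) == Decimal("8.57"))  # False: 8.57 is not exact in binary
print(Decimal(x32))                     # 8.56999969482421875

# repr() uses the shortest string that round-trips at double precision.
print(repr(x64))  # '8.57'
print(repr(x32))  # '8.569999694824219'
```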
`float64` cannot represent most decimal numbers (like 8.57) exactly either, but a `float64` value obtained from such a decimal is already the double closest to it, so Python's `float` repr prints it back in its short form (`8.57`). When a `float64` NumPy array is converted to `object`, the binary value is transferred unchanged and no extra decimals appear in the output.

Fix Implemented
To preserve the decimal representation for `dtype=float16` and `float32`, we convert the NumPy float array to strings and then convert them back to Python `float`, which has the same precision as `numpy.float64`.
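A minimal sketch of that round trip, assuming NumPy is available. The helper `float_values_for_csv` is hypothetical and is not the code in pandas; it only mirrors the steps described here: format each value as a string at its own precision, parse it back into a Python `float`, then put `na_rep` into the missing positions of an `object` array.

```python
import csv
import io

import numpy as np


def float_values_for_csv(values: np.ndarray, na_rep: str = "") -> np.ndarray:
    """Hypothetical helper: str -> float round trip for a float array."""
    mask = np.isnan(values)
    # NumPy formats float32 8.57 as "8.57", not "8.569999694824219".
    as_str = values.astype(str)
    # Parse back into Python floats so csv still sees numbers, not strings.
    out = np.array([float(s) for s in as_str], dtype=object)
    # Missing values become na_rep (a str) directly in the object array.
    out[mask] = na_rep
    return out


values = np.array([8.57, -1.1, np.nan], dtype=np.float32)
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(float_values_for_csv(values))
print(repr(buf.getvalue()))  # '8.57,-1.1,""\r\n'
```

Parsing with Python's `float()` also accepts the `"nan"` strings produced for missing values, which are then overwritten with `na_rep`; because `na_rep` is a string, `csv.QUOTE_NONNUMERIC` quotes it.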
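Converting to `str` preserves the decimal representation and avoids exposing the internal binary representation, since NumPy formats each element with the shortest string that round-trips at that element's own precision. Converting back to `float` is necessary so the values are not treated as strings (which `csv.QUOTE_NONNUMERIC` would quote), and storing them as 64-bit (double-precision) floats preserves the string representation: a shortest `float32` string has at most 9 significant digits, well within the 15 significant digits that survive a round trip through a double.

Additionally, in the original code, when `quoting` is `None`, converting first to `str` and then back to `object` is unnecessary work, because `na_rep` (a `str`) can be written directly into an object array. Therefore, the `quoting=None` branch was removed to streamline the logic.

Testing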
All existing tests in `test_to_csv.py` pass. New tests were added for DataFrames with `dtype` `float16`, `float32` and `float64` containing a mix of negative, positive and missing values, written with `quoting=csv.QUOTE_NONNUMERIC`.