BUG: ensure to_numeric down-casts to uint64 for large unsigned integers #61766

Status: Open, 1 commit to be merged into base: main
18 changes: 18 additions & 0 deletions pandas/core/tools/numeric.py
@@ -271,6 +271,24 @@ def to_numeric(
if values.dtype == dtype:
    break

# Fallback: if we requested an unsigned downcast but did not
# successfully convert (e.g. because the data was float64 after
# parsing large Python ints), attempt a direct cast to uint64 as a
# last resort. This addresses GH#14422 where `to_numeric` failed to
# downcast `[0, 9223372036854775808]` to ``uint64``.
if (
Review comment (Member): Could you investigate why this isn't being done in maybe_downcast_numeric instead?

    downcast == "unsigned"
    and values.dtype.kind == "f"  # still a float dtype
    and (not len(values) or np.all(np.mod(values, 1) == 0))  # integral values
    and (not len(values) or np.min(values) >= 0)
    and (not len(values) or np.max(values) <= np.iinfo(np.uint64).max)
):
    try:
        values = values.astype(np.uint64)
    except (OverflowError, ValueError):
        # If casting is unsafe, keep original dtype
        pass

# GH33013: for IntegerArray, BooleanArray & FloatingArray need to reconstruct
# masked array
if (mask is not None or new_mask is not None) and not is_string_dtype(values.dtype):
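
The guard conditions in the hunk above can be exercised outside pandas with a small NumPy-only sketch. The helper name `fallback_to_uint64` and the constant `UINT64_UPPER_EXCLUSIVE` are hypothetical and not part of this PR. The sketch also makes two assumptions that differ from the diff: it rejects non-finite values explicitly, and it checks the upper bound with a strict `<` against `2.0**64`, because `2**64 - 1` has no exact float64 representation and converts to `2.0**64`.

```python
import numpy as np

# 2**64 - 1 is not representable as float64; converting it to float gives
# 2.0**64, so the upper bound is checked with a strict "<" against 2.0**64.
# (Assumption of this sketch; the diff uses "<= np.iinfo(np.uint64).max".)
UINT64_UPPER_EXCLUSIVE = 2.0**64


def fallback_to_uint64(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cast float values to uint64 only when lossless."""
    if values.dtype.kind != "f":
        return values  # only float input is considered here
    if len(values) == 0:
        # mirrors the diff's `not len(values)` guards: empty input is cast
        return values.astype(np.uint64)
    if not np.all(np.isfinite(values)):
        return values  # NaN or inf cannot be stored in uint64 (added check)
    is_integral = np.all(np.mod(values, 1) == 0)
    in_range = np.min(values) >= 0 and np.max(values) < UINT64_UPPER_EXCLUSIVE
    if is_integral and in_range:
        return values.astype(np.uint64)
    return values  # keep the original float dtype


# The values from the code comment above, already parsed as float64.
converted = fallback_to_uint64(np.array([0.0, 2.0**63]))
# Negative or fractional input is left untouched.
unchanged = fallback_to_uint64(np.array([-1.0, 2.5]))
```

Here `converted` has dtype `uint64` and `unchanged` stays `float64`. Whether logic like this belongs in `to_numeric` or in `maybe_downcast_numeric` is the open question raised in the review comment; the sketch takes no position on that.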