
BUG: StringDtype objects from pandas <2.3.0 cannot be reliably unpickled in 2.3.0. #61763

Open
@Liam3851


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# Using pandas 2.2.3
import pandas as pd

pd.DataFrame([['a', 'b'], ['c', 'd']]).astype('string').to_pickle('G:/temp/test2.pkl')

# Using pandas 2.3.0
import pandas as pd

df = pd.read_pickle('G:/temp/test2.pkl')  # looks ok

df.dtypes  # raises AttributeError: 'StringDtype' object has no attribute '_na_value'

df[0] + df[1]  # also raises AttributeError

Issue Description

In 2.3.0, StringDtype relies on an internal _na_value attribute that does not appear to have existed before 2.3.0. Objects containing StringDtype columns that were pickled in earlier versions, including 2.2.3, may appear to unpickle successfully at first. However, listing the dtypes, or even checking them implicitly by performing an operation, raises an AttributeError.
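The failure mode described above can be sketched without pandas. By default, unpickling restores an instance's saved attributes without running __init__, so an attribute that only a newer __init__ sets is missing on objects saved by an older release. Everything in this sketch is hypothetical and for illustration only: ToyDtype, ToyDtypeCompat, and restore_like_pickle are made-up names, restore_like_pickle merely imitates pickle's default load step, and the __setstate__ shown is one possible way for a class to tolerate old state, not pandas' actual code or fix.

```python
class ToyDtype:
    """Hypothetical stand-in for a dtype class; not pandas' StringDtype."""

    def __init__(self, storage="python"):
        self.storage = storage
        self._na_value = None  # attribute introduced in the "newer" release

    @property
    def na_value(self):
        return self._na_value


class ToyDtypeCompat(ToyDtype):
    """Same class, but tolerant of state saved without _na_value."""

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Fill in the attribute that older saved state does not carry.
        self.__dict__.setdefault("_na_value", None)


def restore_like_pickle(cls, state):
    """Imitate pickle's default load: skip __init__, just restore state."""
    obj = cls.__new__(cls)
    if hasattr(obj, "__setstate__"):
        obj.__setstate__(state)
    else:
        obj.__dict__.update(state)
    return obj


# State as an "older" release would have saved it: no _na_value key.
old_state = {"storage": "python"}

broken = restore_like_pickle(ToyDtype, old_state)
try:
    broken.na_value
    broken_error = None
except AttributeError as exc:
    broken_error = str(exc)

fixed = restore_like_pickle(ToyDtypeCompat, old_state)
print(broken_error)
print(fixed.na_value)
```

The restored ToyDtype looks intact until something touches the missing attribute, which matches the report that read_pickle appears to succeed and the error only surfaces later.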

Expected Behavior

The read_pickle documentation indicates backward compatibility to pandas 0.20.3, so a pickle written by 2.2.3 should be readable and usable in 2.3.0.

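A current workaround is to rebuild each string column with a freshly created 2.3.0-compatible dtype, along these lines:

import pandas as pd

def unpickle_wrap(fn):
    df = pd.read_pickle(fn)
    for col, dtype in df.dtypes.items():
        # Match only StringDtype columns; is_string_dtype can also return True for object-dtype columns.
        if isinstance(dtype, pd.StringDtype):
            # Round-trip through object so the column gets a dtype created under 2.3.0.
            df[col] = df[col].astype(object).astype('string')
    return df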

Installed Versions

In [55]: pd.show_versions()

INSTALLED VERSIONS

commit : 2cc3762
python : 3.11.12
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.3.0
numpy : 2.2.6
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.1.1
Cython : None
sphinx : None
IPython : 9.3.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : 1.5.0
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.5.1
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 5.4.0
matplotlib : 3.10.3
numba : 0.61.2+0.g1e70d8ceb.dirty
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.1
pyreadstat : None
pytest : 8.4.1
python-calamine : None
pyxlsb : None
s3fs : 2025.5.1
scipy : 1.15.2
sqlalchemy : 2.0.41
tables : None
tabulate : 0.9.0
xarray : 2025.6.1
xlrd : None
xlsxwriter : 3.2.5
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

(Edit: fixed example to make copy-pastable, and confirmed on main)

Labels

Bug, Needs Triage (issue that has not been reviewed by a pandas team member)
