DOC: Point out difference in usage of "str" dtype in constructor and astype member #61992

@cbourjau

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

This concerns the 3.0 migration guide: https://pandas.pydata.org/docs/user_guide/migration-3-strings.html

Documentation problem

The string migration guide suggests using "str" in place of "object" to write compatible code. The example demonstrates this only for the Series constructor, where it works as intended (pandas 2.3.0):

>>> import numpy as np
>>> import pandas as pd
>>> pd.Series(["a", None, np.nan, pd.NA], dtype="str").array
<NumpyExtensionArray>
['a', None, nan, <NA>]
Length: 4, dtype: object

However, the semantics of using "str" are different if the series has already been initialized with an "object" dtype and the user calls astype("str") on it:

>>> series = pd.Series(["a", None, np.nan, pd.NA])
>>> series.array
<NumpyExtensionArray>
['a', None, nan, <NA>]
Length: 4, dtype: object
>>> series.astype("str").array
<NumpyExtensionArray>
['a', 'None', 'nan', '<NA>']
Length: 4, dtype: object

Note that every value, including the missing ones, has been cast to a string. This appears to match the behavior of passing the literal str as the data type, which the guide mentions later in its bug-fix section.

Suggested fix for documentation

I believe this subtle difference should be pointed out in the migration guide. Ideally, the guide would also suggest how to write 3.0-compatible code using astype. In my case, the current pandas 2 code casts a categorical column (with string categories) to an object column, and I'd like to write it so that the same operation yields a string column in pandas 3.
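
To make the request concrete, here is a minimal sketch of one possible workaround. It is not taken from the migration guide. The helper name `astype_str_keep_missing` is hypothetical, and the approach assumes that recording the missing-value mask before the cast and re-applying it afterwards is acceptable for the use case:

```python
import numpy as np
import pandas as pd


def astype_str_keep_missing(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to "str" but keep missing values missing."""
    # Record which entries are missing before the cast.
    missing = series.isna()
    # On pandas 2.x this cast may turn missing values into strings such as 'nan'.
    as_str = series.astype("str")
    # Blank out the originally missing positions again; this changes nothing
    # if the cast already preserved them.
    return as_str.mask(missing)


# Object column from the report above, and a categorical column with string categories.
obj_result = astype_str_keep_missing(pd.Series(["a", None, np.nan, pd.NA]))
cat_result = astype_str_keep_missing(pd.Series(["x", None, "y"], dtype="category"))

print(obj_result.isna().tolist())
print(cat_result.isna().tolist())
```

The resulting dtype still depends on the installed pandas version (object on 2.x without the string option enabled). The sketch only aims to keep the missing-value positions consistent across versions.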

Metadata

Labels

  • Docs
  • Needs Discussion: Requires discussion from core team before further action
  • Strings: String extension data type and string data