pandas-dev · jorisvandenbossche · Jun 27, 2025 · Jun 30, 2025 · Jul 15, 2025 · Jul 15, 2025
diff --git a/doc/source/whatsnew/v3.0.0.rst b/doc/source/whatsnew/v3.0.0.rst
@@ -14,10 +14,108 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
-.. _whatsnew_300.enhancements.enhancement1:
+.. _whatsnew_300.enhancements.string_dtype:
 
-Enhancement1
-^^^^^^^^^^^^
+Dedicated string data type by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Historically, pandas represented string columns with the NumPy ``object`` data type.
+This representation has several problems: it is not specific to strings (any
+Python object can be stored in an ``object``-dtype array, not just strings) and
+it is often inefficient (in both performance and memory usage).
+
+Starting with pandas 3.0, a dedicated string data type is enabled by default
+(backed by PyArrow under the hood, if installed, otherwise falling back to
+being backed by NumPy ``object``-dtype). This means that pandas will start
+inferring columns containing string data as the new ``str`` data type when
+creating pandas objects, such as in constructors or IO functions.
+
+Old behavior:
+
+.. code-block:: python
+
+    >>> ser = pd.Series(["a", "b"])
+    >>> ser
+    0    a
+    1    b
+    dtype: object
+
+New behavior:
+
+.. code-block:: python
+
+    >>> ser = pd.Series(["a", "b"])
+    >>> ser
+    0    a
+    1    b
+    dtype: str
+
+The string data type that is used in these scenarios will mostly behave as NumPy
+``object`` dtype would, including missing value semantics and general operations
+on these columns.
+
+The main characteristics of the new string data type are:
+
+- It is inferred by default for string data (instead of ``object`` dtype).
+- The ``str`` dtype can only hold strings (or missing values), in contrast to
+  ``object`` dtype. Setting a non-string value fails.
+- The missing value sentinel is always ``NaN`` (``np.nan``) and follows the same
+  missing value semantics as the other default dtypes.
+
+These intentional changes can have breaking consequences, for example when checking
+whether the ``.dtype`` is ``object`` dtype or checking the exact missing value sentinel.
+See the :ref:`string_migration_guide` for more details on the behaviour changes
+and how to adapt your code to the new default.
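+
+As a minimal sketch of such an adaptation (the helper name ``is_string_column``
+is hypothetical, not part of pandas), a dtype check can go through
+``pd.api.types.is_string_dtype`` instead of comparing against ``object``, and a
+missing value check can go through ``isna()`` instead of comparing against a
+specific sentinel. Both are intended to give the same answer under the old and
+the new default:
+

```python
import pandas as pd


def is_string_column(ser: pd.Series) -> bool:
    # Hypothetical helper: avoids `ser.dtype == object`, which is False
    # for the new ``str`` dtype.
    return pd.api.types.is_string_dtype(ser.dtype)


ser = pd.Series(["a", None, "c"])

print(is_string_column(ser))
print(is_string_column(pd.Series([1, 2, 3])))

# Detect missing values without relying on the exact sentinel (None vs NaN).
print(ser.isna().tolist())
```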
+
+.. seealso::
+
+    `PDEP-14: Dedicated string data type for pandas 3.0 <https://pandas.pydata.org/pdeps/0014-string-dtype.html>`__
+
+
+.. _whatsnew_300.enhancements.copy_on_write:
+
+Copy-on-Write
+^^^^^^^^^^^^^
+
+The new "copy-on-write" behaviour in pandas 3.0 changes how pandas operates with
+respect to copies and views. A summary of the changes:
+
+1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
+   e.g. accessing a DataFrame column as a Series) or any method returning a
+   new DataFrame or Series, always *behaves as if* it were a copy in terms of user
+   API.
+2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
+   to do this is to directly modify that object itself.
+
+The main goal of this change is to make the user API more consistent and
+predictable. There is now a clear rule: *any* subset or returned
+Series/DataFrame **always** behaves as a copy of the original, and thus never
+modifies the original (before pandas 3.0, whether a derived object would be a
+copy or a view depended on the exact operation performed, which was often
+confusing).
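+
+The rule can be sketched as follows. This is a minimal illustration rather than
+an excerpt from the pandas documentation; the version check is an assumption for
+running it on pandas 2.x, where Copy-on-Write is opt-in through the
+``mode.copy_on_write`` option:
+

```python
import pandas as pd

# Assumption: before pandas 3.0, Copy-on-Write has to be enabled explicitly.
if int(pd.__version__.split(".")[0]) < 3:
    pd.options.mode.copy_on_write = True

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col = df["a"]      # column access behaves as a copy
col.iloc[0] = 100  # only `col` changes

print(df["a"].tolist())  # [1, 2, 3]
print(col.tolist())      # [100, 2, 3]

# To change `df`, modify `df` itself.
df.loc[0, "a"] = -1
print(df["a"].tolist())  # [-1, 2, 3]
print(col.tolist())      # [100, 2, 3]
```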
+
+Because every single indexing step now behaves as a copy, this also means that
+"chained assignment" (updating a DataFrame with multiple setitem steps) will
+stop working. Because chained assignment now consistently never updates the
+original object, the ``SettingWithCopyWarning`` is removed.
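+
+A sketch of the difference, using a hypothetical DataFrame: the chained form is
+shown only as a comment, and the single ``.loc`` step is one way to express the
+same update on the object itself:
+

```python
import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})

# Chained assignment: two setitem steps. Under Copy-on-Write the first step
# (df["foo"]) behaves as a copy, so `df` would be left unchanged:
#   df["foo"][df["bar"] > 5] = 100

# A single setitem step on `df` itself updates the original.
df.loc[df["bar"] > 5, "foo"] = 100

print(df["foo"].tolist())  # [1, 2, 100]
```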
+
+The new behavioral semantics are explained in more detail in the
+:ref:`user guide about Copy-on-Write <copy_on_write>`.
+
+A secondary goal is to improve performance by avoiding unnecessary copies. As
+mentioned above, every new DataFrame or Series returned from an indexing
+operation or method *behaves* as a copy, but under the hood pandas will use
+views as much as possible, and only copy when needed to guarantee the "behaves
+as a copy" behaviour (this is the actual "copy-on-write" mechanism used as an
+implementation detail).
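+
+The deferred copy can be observed with ``np.shares_memory``. A minimal sketch,
+assuming ``reset_index`` returns a lazy copy (the version check is only there
+to enable Copy-on-Write on pandas 2.x, and the variable names are
+illustrative):
+

```python
import numpy as np
import pandas as pd

# Assumption: before pandas 3.0, Copy-on-Write has to be enabled explicitly.
if int(pd.__version__.split(".")[0]) < 3:
    pd.options.mode.copy_on_write = True

df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
df2 = df.reset_index(drop=True)  # behaves as a copy, no data copied yet

shared_before = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

df2.iloc[0, 0] = 100  # the first write to df2 triggers the actual copy

shared_after = np.shares_memory(df["foo"].to_numpy(), df2["foo"].to_numpy())

print(shared_before, shared_after)
print(df["foo"].tolist(), df2["foo"].tolist())
```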
+
+Some of the behaviour changes described above are breaking changes in pandas
+3.0. When upgrading to pandas 3.0, it is recommended to first upgrade to pandas
+2.3 to get deprecation warnings for a subset of those changes. The
+:ref:`migration guide <copy_on_write.migration_guide>` explains the upgrade
+process in more detail.
+
+.. seealso::
+
+    `PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write <https://pandas.pydata.org/pdeps/0007-copy-on-write.html>`__
 
 .. _whatsnew_300.enhancements.enhancement2: