Commit 0baa925

DEPR: observed=False default in groupby (#51811)
* DEPR: observed=False default in groupby
* Fixup docs
* DEPR: observed=False default in groupby
* fixup
* Mention defaulting to True
* fixup
1 parent 26999eb commit 0baa925

24 files changed, with 84 additions and 47 deletions

doc/source/user_guide/10min.rst

Lines changed: 2 additions & 2 deletions
@@ -702,11 +702,11 @@ Sorting is per order in the categories, not lexical order:

    df.sort_values(by="grade")

-Grouping by a categorical column also shows empty categories:
+Grouping by a categorical column with ``observed=False`` also shows empty categories:

 .. ipython:: python

-   df.groupby("grade").size()
+   df.groupby("grade", observed=False).size()


 Plotting

doc/source/user_guide/advanced.rst

Lines changed: 2 additions & 2 deletions
@@ -800,8 +800,8 @@ Groupby operations on the index will preserve the index nature as well.

 .. ipython:: python

-   df2.groupby(level=0).sum()
-   df2.groupby(level=0).sum().index
+   df2.groupby(level=0, observed=True).sum()
+   df2.groupby(level=0, observed=True).sum().index

 Reindexing operations will return a resulting index based on the type of the passed
 indexer. Passing a list will return a plain-old ``Index``; indexing with

doc/source/user_guide/categorical.rst

Lines changed: 5 additions & 5 deletions
@@ -607,7 +607,7 @@ even if some categories are not present in the data:
    s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))
    s.value_counts()

-``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories.
+``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories when ``observed=False``.

 .. ipython:: python

@@ -618,17 +618,17 @@ even if some categories are not present in the data:
        data=[[1, 2, 3], [4, 5, 6]],
        columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
    ).T
-   df.groupby(level=1).sum()
+   df.groupby(level=1, observed=False).sum()

-Groupby will also show "unused" categories:
+Groupby will also show "unused" categories when ``observed=False``:

 .. ipython:: python

    cats = pd.Categorical(
        ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
    )
    df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})
-   df.groupby("cats").mean()
+   df.groupby("cats", observed=False).mean()

    cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])
    df2 = pd.DataFrame(
@@ -638,7 +638,7 @@ Groupby will also show "unused" categories:
            "values": [1, 2, 3, 4],
        }
    )
-   df2.groupby(["cats", "B"]).mean()
+   df2.groupby(["cats", "B"], observed=False).mean()


 Pivot tables:

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -1401,7 +1401,7 @@ can be used as group keys. If so, the order of the levels will be preserved:

    factor = pd.qcut(data, [0, 0.25, 0.5, 0.75, 1.0])

-   data.groupby(factor).mean()
+   data.groupby(factor, observed=False).mean()

 .. _groupby.specify:

doc/source/whatsnew/v0.15.0.rst

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ For full docs, see the :ref:`categorical introduction <categorical>` and the
                                                  "medium", "good", "very good"])
    df["grade"]
    df.sort_values("grade")
-   df.groupby("grade").size()
+   df.groupby("grade", observed=False).size()

 - ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct
   a dataframe and use ``df.groupby(<group>).agg(<func>)``.

doc/source/whatsnew/v0.19.0.rst

Lines changed: 1 addition & 1 deletion
@@ -1134,7 +1134,7 @@ As a consequence, ``groupby`` and ``set_index`` also preserve categorical dtypes
 .. ipython:: python

    df = pd.DataFrame({"A": [0, 1], "B": [10, 11], "C": cat})
-   df_grouped = df.groupby(by=["A", "C"]).first()
+   df_grouped = df.groupby(by=["A", "C"], observed=False).first()
    df_set_idx = df.set_index(["A", "C"])

 **Previous behavior**:

doc/source/whatsnew/v0.20.0.rst

Lines changed: 2 additions & 2 deletions
@@ -289,15 +289,15 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr

 .. code-block:: ipython

-   In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
+   In [3]: df[df.chromosomes != '1'].groupby('chromosomes', observed=False, sort=False).sum()
    ---------------------------------------------------------------------------
    ValueError: items in new_categories are not the same as in old categories

 **New behavior**:

 .. ipython:: python

-   df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()
+   df[df.chromosomes != '1'].groupby('chromosomes', observed=False, sort=False).sum()

 .. _whatsnew_0200.enhancements.table_schema:

doc/source/whatsnew/v0.22.0.rst

Lines changed: 3 additions & 3 deletions
@@ -109,7 +109,7 @@ instead of ``NaN``.

    In [8]: grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])

-   In [9]: pd.Series([1, 2]).groupby(grouper).sum()
+   In [9]: pd.Series([1, 2]).groupby(grouper, observed=False).sum()
    Out[9]:
    a    3.0
    b    NaN
@@ -120,14 +120,14 @@ instead of ``NaN``.
 .. ipython:: python

    grouper = pd.Categorical(["a", "a"], categories=["a", "b"])
-   pd.Series([1, 2]).groupby(grouper).sum()
+   pd.Series([1, 2]).groupby(grouper, observed=False).sum()

 To restore the 0.21 behavior of returning ``NaN`` for unobserved groups,
 use ``min_count>=1``.

 .. ipython:: python

-   pd.Series([1, 2]).groupby(grouper).sum(min_count=1)
+   pd.Series([1, 2]).groupby(grouper, observed=False).sum(min_count=1)

 Resample
 ^^^^^^^^

doc/source/whatsnew/v2.1.0.rst

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ Deprecations
 - Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (:issue:`18702`)
 - Deprecated :meth:`DataFrame._data` and :meth:`Series._data`, use public APIs instead (:issue:`33333`)
 - Deprecating pinning ``group.name`` to each group in :meth:`SeriesGroupBy.aggregate` aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (:issue:`41090`)
+- Deprecated the default of ``observed=False`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby`; this will default to ``True`` in a future version (:issue:`43999`)
 - Deprecated ``axis=1`` in :meth:`DataFrame.groupby` and in :class:`Grouper` constructor, do ``frame.T.groupby(...)`` instead (:issue:`51203`)
 - Deprecated passing a :class:`DataFrame` to :meth:`DataFrame.from_records`, use :meth:`DataFrame.set_index` or :meth:`DataFrame.drop` instead (:issue:`51353`)
 - Deprecated accepting slices in :meth:`DataFrame.take`, call ``obj[slicer]`` or pass a sequence of integers instead (:issue:`51539`)
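
For readers of the ``observed`` deprecation entry in this hunk, the difference between the two settings can be sketched as follows. This is a minimal illustration that assumes pandas is installed; the frame, its column names, and the variable names are made up for the example and do not come from the commit.

```python
import pandas as pd

# Illustrative data: category "c" is declared but never appears in the column.
cats = pd.Categorical(["a", "b", "b"], categories=["a", "b", "c"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 3]})

# observed=False keeps unobserved categories in the result ("c" gets size 0).
with_empty = df.groupby("cats", observed=False).size()

# observed=True keeps only the categories that actually occur in the data.
only_seen = df.groupby("cats", observed=True).size()

print(with_empty)
print(only_seen)
```

Passing ``observed`` explicitly, as the documentation changes in this commit do, states the intended behavior at the call site instead of relying on the deprecated default.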

pandas/core/frame.py

Lines changed: 1 addition & 1 deletion
@@ -8681,7 +8681,7 @@ def groupby(
         as_index: bool = True,
         sort: bool = True,
         group_keys: bool = True,
-        observed: bool = False,
+        observed: bool | lib.NoDefault = lib.no_default,
         dropna: bool = True,
     ) -> DataFrameGroupBy:
         if axis is not lib.no_default:
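
The signature change in this hunk swaps a plain ``False`` default for a sentinel, which lets the function tell "argument omitted" apart from "argument explicitly set to ``False``". The general pattern can be sketched in plain Python as below. This is not the pandas implementation: ``_NoDefault``, ``no_default``, and ``resolve_observed`` are hypothetical names used only for illustration, and the warning text is invented.

```python
import warnings


class _NoDefault:
    """Hypothetical sentinel type; stands in for a 'no value passed' marker."""

    def __repr__(self) -> str:
        return "<no_default>"


no_default = _NoDefault()


def resolve_observed(observed=no_default) -> bool:
    """Return the effective value of ``observed``, warning if it was omitted."""
    if observed is no_default:
        # The caller relied on the default: warn, then keep the old behavior.
        warnings.warn(
            "The default of observed=False is deprecated; pass observed "
            "explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        return False
    # An explicit True or False is respected without any warning.
    return bool(observed)
```

With a boolean default the function could not distinguish ``resolve_observed()`` from ``resolve_observed(False)``; the sentinel makes it possible to warn only in the first case.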
