Labels: Copy / view semantics, Groupby, Performance, Regression, Transformations
Description
import numpy as np
import pandas as pd

pd.options.mode.copy_on_write = False  # set to True for the CoW timings
size = 10_000
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 100, size),
        'b': np.random.randint(0, 100, size),
        'c': np.random.randint(0, 100, size),
    }
).set_index(['a', 'b']).sort_index()
gb = df.groupby(['a', 'b'])
%timeit gb.transform(lambda x: x == x.shift(-1).fillna(0))
# 2.0.x - CoW=False
# 1.46 s ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#
# 2.0.x - CoW=True
# 1.47 s ± 6.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#
# main - CoW=False
# 4.35 s ± 50.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#
# main - CoW=True
# 9.11 s ± 76.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I encountered this while updating some code to use Copy-on-Write (CoW). The regression exists without CoW (roughly 3x slower on main than on 2.0.x) and is worse with it (roughly 6x slower). I haven't yet investigated why.
PS: This code should not have been using transform with a UDF 😄
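As a side note to the PS, the same result can likely be computed without a UDF by shifting within groups using the built-in `shift`. The sketch below is only a possible workaround, not a fix for the regression itself. The names `shifted`, `result`, and `expected`, the seeded generator, and the smaller frame size are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Smaller frame than in the report so the UDF path finishes quickly (assumption).
rng = np.random.default_rng(0)
size = 1_000
df = pd.DataFrame(
    {
        "a": rng.integers(0, 10, size),
        "b": rng.integers(0, 10, size),
        "c": rng.integers(0, 10, size),
    }
).set_index(["a", "b"]).sort_index()
gb = df.groupby(["a", "b"])

# UDF version from the report: compare each value with the next one in its group.
expected = gb.transform(lambda x: x == x.shift(-1).fillna(0))

# Vectorized sketch: shift once across all groups, then compare positionally.
# Comparing the underlying arrays sidesteps label alignment on the
# duplicated MultiIndex.
shifted = gb.shift(-1).fillna(0)
result = pd.DataFrame(
    df.to_numpy() == shifted.to_numpy(),
    index=df.index,
    columns=df.columns,
)

print((result.to_numpy() == expected.to_numpy()).all())
```

This relies on `shift` being a groupby transformation that returns rows in the original order, so the positional comparison lines up with `df`.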