Status: Closed
Labels: Apply (Apply, Aggregate, Transform, Map), Enhancement, Performance (Memory or execution speed performance), Window (rolling, ewma, expanding)
Description
I was generating lots of features for time-dependent data and ended up writing a lot of expanding-window apply operations in Cython. Would the community be interested in something like this? Imagine you have a DataFrame with an "entity" column, a "time" column, and a numeric "feature" column, and you want to calculate the expanding sum/mean/mode/etc. of the feature column for each entity.
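For the sum, mean, and similar statistics, pandas' cumulative groupby methods already compute an expanding result per entity. A minimal sketch, assuming the column names `entity`, `time`, and `feature` from the description and made-up toy values:

```python
import pandas as pd

# Toy frame with the columns described above (values are illustrative)
df = pd.DataFrame({
    "entity": [1, 2, 1, 2, 1, 2],
    "time": [1, 1, 2, 2, 3, 3],
    "feature": [3.0, 10.0, 4.0, 11.0, 5.0, 12.0],
})
# Expanding statistics must follow time order within each entity
df = df.sort_values(["entity", "time"]).reset_index(drop=True)

g = df.groupby("entity")["feature"]
df["expanding_sum"] = g.cumsum()
# cumcount() is 0-based, so +1 gives the number of rows seen so far
df["expanding_mean"] = df["expanding_sum"] / (df.groupby("entity").cumcount() + 1)
df["expanding_max"] = g.cummax()
```

Each new column restarts at the first row of every entity, which is the per-entity expanding behaviour described above.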
This is currently not well optimized in pandas, especially for computing the expanding mode of categorical variables, where keeping track of state between rows saves a lot of time compared with recomputing the mode over the whole window at every row.
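The state-keeping idea for the mode can be sketched in plain Python. Instead of recomputing the mode over the whole prefix at every row, it keeps running counts per category and updates the current mode as each value arrives, so each entity is processed in one pass. This is an illustrative sketch, not the author's Cython code; `expanding_mode`, the toy data, and the smallest-value tie-breaking rule are assumptions.

```python
from collections import defaultdict

import pandas as pd


def expanding_mode(values):
    """Hypothetical one-pass expanding mode using running counts as state.

    Ties are broken by the smallest value (an assumed rule, chosen to be
    deterministic), so values must be hashable and mutually comparable.
    """
    counts = defaultdict(int)  # category -> occurrences seen so far
    best, best_count = None, 0
    out = []
    for v in values:
        counts[v] += 1
        c = counts[v]
        if c > best_count:
            # v is now the unique most frequent value
            best, best_count = v, c
        elif c == best_count and v < best:
            # v ties the current maximum; keep the smallest tied value
            best = v
        out.append(best)
    return out


df = pd.DataFrame({
    "entity": [1, 1, 1, 1, 2, 2, 2],
    "feature": ["a", "b", "b", "a", "x", "y", "y"],
})
# Apply the stateful function separately to each entity's rows
df["feature_expanding_mode"] = df.groupby("entity")["feature"].transform(
    lambda s: pd.Series(expanding_mode(s.tolist()), index=s.index)
)
```

Each update is O(1) on average, whereas `expanding().apply()` with a mode function rescans the whole prefix at every row.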
Example of a use case:
```python
import pandas as pd
import cython_opt  # cython functions are defined here

df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2],
                   "value": [3, 4, 5, 6, 10, 11, 12, 13]})
# group by the project_id column (level= only works if project_id is an index level)
out = df.groupby("project_id")["value"].transform(lambda x: cython_opt.expanding_mean(x.values))
# output like {"project_id": [1,1,1,1,2,2,2,2], "value": [3, 3.5, 4, 4.5, 10, 10.5, 11, 11.5]}
```
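`cython_opt` is the author's own module and is not shown in the issue. To make the example runnable, a hypothetical pure-NumPy stand-in for `expanding_mean` can be used, with its output checked against pandas' per-group `expanding().mean()`:

```python
import numpy as np
import pandas as pd


def expanding_mean(values):
    """Hypothetical pure-NumPy stand-in for cython_opt.expanding_mean."""
    values = np.asarray(values, dtype=float)
    # running sum divided by the number of observations seen so far
    return np.cumsum(values) / np.arange(1, len(values) + 1)


df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2],
                   "value": [3, 4, 5, 6, 10, 11, 12, 13]})
custom = df.groupby("project_id")["value"].transform(lambda x: expanding_mean(x.values))
# pandas' own expanding window, applied per group, should agree
builtin = df.groupby("project_id")["value"].transform(lambda s: s.expanding().mean())
```

Both series should match the expected output in the comment of the original example.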
I can provide many more examples and functions (I wrote around 20 of these).