Cython-optimized expanding apply transformations

While generating many features for time-dependent data, I ended up writing a lot of expanding apply operations in Cython. Would the community want something like this? Imagine you have a DataFrame with an "entity" column, a "time" column, and a numeric "feature" column, and you want to calculate the expanding sum/mean/mode/etc. of the feature column for each entity.

This is currently not well optimized in pandas, especially for computing the mode of categorical variables, where keeping track of state between steps saves a lot of time.
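To illustrate why carrying state helps for the mode, here is a minimal pure-Python sketch (not the Cython code itself). The function name `expanding_mode` and the tie-breaking rule (the value that reached the top count first keeps the lead) are assumptions for illustration only. It updates one counter per element instead of recomputing the mode over every prefix.

```python
from collections import defaultdict


def expanding_mode(values):
    """Return the running mode after each element (hypothetical helper).

    Assumption: on ties, the value that reached the highest count first
    stays the mode until another value strictly exceeds it.
    """
    counts = defaultdict(int)  # running count per distinct value
    best = None                # current mode
    best_count = 0             # count of the current mode
    out = []
    for v in values:
        counts[v] += 1
        # Only the value just seen can overtake the current mode,
        # so a single comparison per element is enough.
        if counts[v] > best_count:
            best, best_count = v, counts[v]
        out.append(best)
    return out


result = expanding_mode(["a", "b", "b", "a", "a"])
print(result)
```

Each step costs one dictionary update and one comparison, so the whole pass is linear in the number of rows, whereas recomputing the mode from scratch on each expanding window grows quadratically.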

Example of a use case:

```
import pandas as pd
import cython_opt  # the Cython functions are defined here

df = pd.DataFrame({"project_id": [1, 1, 1, 1, 2, 2, 2, 2], "value": [3, 4, 5, 6, 10, 11, 12, 13]})
# project_id is a column here, so group by column name rather than by index level
df.groupby("project_id")["value"].transform(lambda x: cython_opt.expanding_mean(x.values))
# resulting values: [3, 3.5, 4, 4.5, 10, 10.5, 11, 11.5]
```
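For reference, the computation in the example can be sketched in plain Python as a single pass that keeps a running sum and count per entity. This is only an assumed illustration of what a kernel like `expanding_mean` computes, not the Cython implementation; the name `grouped_expanding_mean` is hypothetical.

```python
def grouped_expanding_mean(keys, values):
    """Expanding mean of `values` within each group given by `keys`.

    Hypothetical helper: rows are assumed to be in time order already.
    """
    state = {}  # key -> (running sum, running count)
    out = []
    for k, v in zip(keys, values):
        total, n = state.get(k, (0.0, 0))
        total, n = total + v, n + 1
        state[k] = (total, n)
        out.append(total / n)  # mean of all rows seen so far for this key
    return out


project_id = [1, 1, 1, 1, 2, 2, 2, 2]
value = [3, 4, 5, 6, 10, 11, 12, 13]
print(grouped_expanding_mean(project_id, value))
```

Because the state is keyed by entity, the rows do not need to be contiguous per group; they only need to be sorted by time within each entity.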

I can provide many more examples and functions (I wrote around 20 of these).

Cython-optimized expanding apply transformations #12430