This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Dynamic batching? #1132

@johann-petrak

Description


TorchServe mentions that it is derived from the Multi Model Server: https://github.com/awslabs/multi-model-server

As far as I remember, MMS allows dynamic batching: the method that processes instances always receives an array of instances.
Depending on the configuration, if the server receives multiple requests within a configurable timespan, these requests are dynamically collected into batches of up to BATCHSIZE, run through the model together, and the results are returned to the callers individually.
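
For reference, the batching behaviour described above can be sketched in plain Python. This is only an illustration of the scheme, not TorchServe's or MMS's actual API; the names (`DynamicBatcher`, `handler`, `batch_size`, `max_batch_delay`) are invented for the example:

```python
import queue
import threading
import time


class DynamicBatcher:
    """Illustrative sketch of dynamic batching (not the TorchServe API).

    A batch is dispatched when either `batch_size` requests have arrived
    or `max_batch_delay` seconds have passed since the first request of
    the current batch.
    """

    def __init__(self, handler, batch_size=8, max_batch_delay=0.1):
        self.handler = handler            # fn: list of inputs -> list of outputs
        self.batch_size = batch_size
        self.max_batch_delay = max_batch_delay
        self._queue = queue.Queue()

    def submit(self, request):
        """Enqueue one request; returns an Event plus a result holder."""
        done = threading.Event()
        slot = {"result": None}
        self._queue.put((request, done, slot))
        return done, slot

    def run_once(self):
        """Collect one batch (size or timeout), run the model once, fan out results."""
        batch = [self._queue.get()]       # block until at least one request arrives
        deadline = time.monotonic() + self.max_batch_delay
        while len(batch) < self.batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                     # timespan elapsed: dispatch a partial batch
            try:
                batch.append(self._queue.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = self.handler([req for req, _, _ in batch])  # one model call
        for (_, done, slot), out in zip(batch, outputs):
            slot["result"] = out          # each caller gets its own result back
            done.set()
```

A caller would `submit()` a request from its own thread and wait on the returned event, while a worker loop calls `run_once()`; this is the "collect, run through the model, return individually" shape the question refers to.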

This is a crucial feature for models where running single instances through the model is highly inefficient.

I could not figure out whether, or how, this is already supported by TorchServe, and I could not find anything about it in the documentation either.

Could somebody confirm that this is actually missing from TorchServe, or point me to where it is documented if it is already implemented?

Metadata

Labels: triaged_wait (Waiting for the Reporter's resp)
