As far as I remember, MMS allows dynamic batching: the method that processes instances always receives an array of instances.
Depending on the configuration, if the server receives more than BATCHSIZE requests within a configurable timespan, these requests are dynamically collected into a batch, run through the model together, and the results are returned to the individual callers.
This is a crucial feature for models where running single instances through the model is highly inefficient.
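To illustrate what I mean, here is a minimal sketch of the batching contract I have in mind. The handler name, signature, and the way inputs arrive are assumptions based on how I remember MMS, not a confirmed torch serve API:

```python
# Sketch of a dynamic-batching handler (names and signature are assumptions,
# not a confirmed torch serve API): the server hands the handler a list of
# requests collected within the batching window, and the handler returns one
# result per request, in the same order.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a real model

def handle(batch):
    # `batch`: list of per-request input tensors, dynamically collected
    # by the server (up to BATCHSIZE entries within the timespan).
    inputs = torch.stack(batch)               # shape: (len(batch), 4)
    with torch.no_grad():
        outputs = model(inputs)               # one forward pass for the whole batch
    return [out.tolist() for out in outputs]  # one response per request, in order

# Three requests arriving within the window share a single forward pass.
responses = handle([torch.randn(4) for _ in range(3)])
assert len(responses) == 3
```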
I could not figure out whether or how this is already supported by torch serve, and I could not find anything about it in the documentation either.
Could somebody confirm that this is actually missing from torch serve, or point me to the relevant information if it is already implemented?