Depends on #42, #41, #46, #61
As soon as we have multiple Node instances per thread (and noderoutines), we should start load balancing them.
Proposed is a scheduler/balancer (Erlang/OTP style) with the following design:
- we have one "worker" thread per each core we use
- if an API such as SetThreadIdealProcessor() is available, "worker" threads are "preferred" (but NOT strictly assigned via an affinity mask) to specific cores
- otherwise, we'll use a notion of "current core" for each thread (on Linux, we can use sched_getcpu())
- each thread has its own queue of outstanding non-running "tasks" (Nodes/noderoutines); whenever a Node/noderoutine yields, it goes to the queue of its own thread
- to stay responsive while processing calculations, a worker thread has to "delegate" monitoring (though not processing) of its own sockets to one or more dedicated "monitoring" threads; whenever a "monitoring" thread gets an event on any of the worker thread's sockets, it puts the respective Node into the queue of that worker thread (where it can be "stolen" by another worker thread if necessary)
- if a thread has nothing to do, it tries to do "work stealing" ("stealing" either Node or noderoutine)
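A minimal sketch of the per-thread queues and "stealing" described above (plain integers stand in for Nodes/noderoutines; names such as `WorkerQueue`, `try_steal`, and `steal_from_any` are illustrative, not from the codebase):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Each worker owns a deque of runnable task ids; a yielding task is pushed to
// the back of its own queue, while an idle worker "steals" from the back of a
// victim's queue.
struct WorkerQueue {
    std::mutex mtx;
    std::deque<int> tasks;  // task ids standing in for Nodes/noderoutines

    void push(int task) {  // a yielding task goes back to its own thread's queue
        std::lock_guard<std::mutex> lk(mtx);
        tasks.push_back(task);
    }
    std::optional<int> pop_own() {  // called by the owning worker
        std::lock_guard<std::mutex> lk(mtx);
        if (tasks.empty()) return std::nullopt;
        int t = tasks.front();  // oldest first, to keep waiting times bounded
        tasks.pop_front();
        return t;
    }
    std::optional<int> try_steal() {  // called by an idle "stealing" worker
        std::lock_guard<std::mutex> lk(mtx);
        if (tasks.empty()) return std::nullopt;
        int t = tasks.back();  // newest first (assumption: cheaper cache-wise)
        tasks.pop_back();
        return t;
    }
};

// An idle worker with nothing to do scans the other workers' queues.
std::optional<int> steal_from_any(std::vector<WorkerQueue>& queues,
                                  std::size_t self) {
    for (std::size_t i = 0; i < queues.size(); ++i) {
        if (i == self) continue;
        if (auto t = queues[i].try_steal()) return t;
    }
    return std::nullopt;
}
```

A real implementation would replace the mutex-per-queue with something cheaper (e.g. a lock-free deque) and add the locality preferences discussed below, but the queue ownership and steal direction are the point here.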
When "stealing", we have to take into account (a) latencies, and (b) the cost of Node/noderoutine migration to another core (avoiding unnecessary migrations as long as possible). Proposed initial model for preventing unnecessary migrations while keeping latencies at bay:
- we keep stats on task lengths (run time before rescheduling); out of these, we maintain median_task_length (exact mechanism TBD)
- for each task, we have a max_latency parameter AND an elapsed_waiting parameter (whenever elapsed_waiting exceeds max_latency, we start missing our latency requirements)
- additionally, we keep a global list of "emergency" tasks which are about to miss their latency requirements (say, those with max_latency - elapsed_waiting < median_task_length); this list will need to be updated by a 3rd-party thread if the current thread's task takes over NN ms to run
- when choosing tasks within a thread (without "stealing"), we look, in order, at:
  - any emergency tasks (preferably in our own thread/on the same core/on the same socket/...)
  - any of the thread's own tasks
- a "stealing" thread looks, in order, at:
  - any emergency tasks (preferably on the same core/same socket)
  - any tasks on the same core (HT)
  - any tasks on the same socket
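The emergency criterion and the non-stealing selection order above can be sketched as follows (a toy model: `TaskTiming`, `PendingTask`, `locality_distance`, and the microsecond units are all illustrative assumptions, not from the codebase):

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Emergency test from the model above: a task becomes an "emergency" once
// max_latency - elapsed_waiting < median_task_length, i.e. its remaining
// budget is smaller than a typical task's run time.
struct TaskTiming {
    long max_latency_us;      // latency budget for this task
    long elapsed_waiting_us;  // time already spent waiting in a queue
};

bool is_emergency(const TaskTiming& t, long median_task_length_us) {
    return t.max_latency_us - t.elapsed_waiting_us < median_task_length_us;
}

// The "preferably in our own thread/same core/same socket" preference is
// reduced here to a precomputed distance: 0 = own thread, 1 = same core (HT),
// 2 = same socket, and so on.
struct PendingTask {
    int id;
    int locality_distance;
};

// Selection for a non-stealing worker: emergency tasks first (closest one, to
// limit migration cost), then the thread's own queue.
int pick_next(const std::vector<PendingTask>& emergency,
              const std::deque<int>& own_queue) {
    if (!emergency.empty()) {
        auto best = std::min_element(
            emergency.begin(), emergency.end(),
            [](const PendingTask& a, const PendingTask& b) {
                return a.locality_distance < b.locality_distance;
            });
        return best->id;
    }
    if (!own_queue.empty()) return own_queue.front();
    return -1;  // nothing runnable: caller would proceed to "stealing"
}
```

A "stealing" worker would apply the same shape with the same-core/same-socket tiers in place of the own-queue fallback.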
For the time being, we're speaking about INTRA-process load balancing, so Node migration is trivial (and relatively cheap too).
Note: by itself, this is NOT 100% equivalent to Erlang/OTP; to get there (for those deployments which DO need it), we also need #45.