
Prevent Ruby thread pools from creating too many threads. #326


Merged
merged 2 commits into master from mri-max-threads on Jun 6, 2015

Conversation

jdantonio
Member

In testing I've discovered that MRI will raise a ThreadError if threads are created too fast. I believe I've been able to get more than 2046 threads in the past (see e3bb86f) but I cannot verify. In my current testing on 2.2.2 I've not been able to get more than 2046.

I have mixed feelings about this PR. With this change a pure-Ruby thread pool will gracefully handle ThreadError exceptions when adding threads to the pool. In such cases the :fallback_policy will be triggered. This seems like a reasonable response. Unfortunately, suppressing this exception may mask other problems that we haven't discovered yet.
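A minimal sketch of the idea (illustrative only; the class and method names are hypothetical, not the actual RubyThreadPoolExecutor internals, and the fallback is simplified to a caller-runs policy):

```ruby
# Illustrative sketch, not the concurrent-ruby implementation: when the OS
# refuses to create another native thread, Thread.new raises ThreadError;
# rescue it and fall back to running the job on the caller's thread.
class TinyPool
  def initialize
    @workers = []
  end

  # Returns :new_thread when a worker was spawned, or :caller_runs when
  # the fallback policy had to run the job inline.
  def post(&job)
    @workers << Thread.new(&job)
    :new_thread
  rescue ThreadError
    job.call
    :caller_runs
  end

  def shutdown
    @workers.each(&:join)
  end
end
```

Under normal load `post` spawns a thread; only when `Thread.new` raises does the fallback fire, which mirrors triggering the `:fallback_policy` instead of crashing.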

Thoughts, anyone?

@pitr-ch
Member

pitr-ch commented Jun 5, 2015

My thoughts/assumptions:

  • we are trying to avoid deadlocks at all cost
  • we don't have any detection of deadlocks
  • users are managing any thread-hungry resources, e.g. they throttle incoming requests

Since we cannot detect which blocking jobs are blocked by other jobs on the io-thread-pool, we have to try to process each job immediately; otherwise it could create a deadlock (all already-blocked threads in the pool can be directly or transitively blocked by the job in the queue). Based on that, the ability to add more threads to the pool is quite important. Therefore I think we have to either figure out what the limits are for given platforms, or keep just the graceful handling of ThreadError without any hard limit, so the pool can grow as far as each platform allows.

We should also log a warning that we could not create another thread and that it may lead to deadlock; this tells users what the problem is if they do hit a deadlock.
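A sketch of that suggested warning (illustrative; the helper name and message are hypothetical, and concurrent-ruby's own logging hook is not shown):

```ruby
require 'logger'

POOL_LOGGER = Logger.new($stderr)

# Hypothetical helper: try to grow the pool, and warn (rather than raise)
# when the OS refuses, since queued jobs may now deadlock on one another.
def try_spawn_worker(&job)
  Thread.new(&job)
rescue ThreadError => e
  POOL_LOGGER.warn(
    "could not create another pool thread (#{e.message}); " \
    "queued jobs may deadlock if they block on one another"
  )
  nil
end
```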

@ianks
Contributor

ianks commented Jun 5, 2015

(all already-blocked threads in the pool can be directly or transitively blocked by the job in the queue)

Can you give an example of this?

Also, I am somewhat confused as to why someone would ever want so many threads as a default to begin with. To me, it seems the overhead of context switching and accounting will almost always far outweigh any benefits here. I could be confused about the functionality, since I do not know the full implementation of ThreadPool.

@jdantonio
Member Author

I am somewhat confused as to why someone would ever want to have so many threads...

@ianks I can't think of any good reason, either. But the bigger concern is with the global thread pools. Since we create those thread pools implicitly we have an obligation to prevent those thread pools from crashing the application.

In theory this may never be a problem. In my testing I was able to prevent a thread pool from reaching the maximum number of threads simply by adding slight delays and varying intervals. In a real application the execution of the jobs on the pool will cause natural delays. My test script was an extreme example where the tasks immediately return nil and I create the threads as fast as the system will allow. Hardly a realistic use case. So this is likely an edge case that may never occur in production, but we still have to do something.

@pitr-ch If I follow your response correctly, you are suggesting that we keep the rescue ThreadError clause but we increase the DEFAULT_MAX_POOL_SIZE to a higher level. Is this correct?

I like the idea of keeping the rescue clause. I believe it provides a reasonable fallback behavior for any number of unpredictable scenarios. I have no strong preference with DEFAULT_MAX_POOL_SIZE. This should never be an issue on JRuby since apps on that platform should always use the Java thread pools. The max on Rbx may be higher but I'm not too concerned with unintentionally limiting applications on that platform. 2046 is still a lot of threads. I'll do some experimenting to see if Rbx can go higher.

@jdantonio
Member Author

FWIW I did a quick test on several platforms and all give up after creating ~2000 threads on my MacBook Pro. I'm going to try on my Linux machine when I get home, because now I am curious. Here are the numbers:

  • jruby-1.7.19 2023
  • jruby-head 2023
  • rbx-2.5.2 2039
  • ruby-1.9.3-p551 2046
  • ruby-2.0.0-p598 2046
  • ruby-2.1.1 2046
  • ruby-2.2.0 2046
  • ruby-2.2.2 2046

Here is my test code:

foo = []
1_000_000.times { foo << Thread.new { sleep(1) } } #=> ThreadError: unable to create new native thread
foo.length

@jdantonio
Member Author

Not being a systems programmer, I never had reason to ask how many threads per process operating system X will allow. Apparently, many others have. The TL;DR is "it depends." But the ~2000 cap I experienced in my earlier tests seems to be an expected limit.

@ianks
Contributor

ianks commented Jun 5, 2015

yeah for me: $ cat /proc/sys/kernel/threads-max #=> 127713
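For reference, the same kernel limit can be read from Ruby (Linux only; the procfs file does not exist on macOS or Windows, hence the guard):

```ruby
# Linux exposes the system-wide thread cap in procfs; guard the read so
# this is a no-op on other operating systems.
path = '/proc/sys/kernel/threads-max'
if File.exist?(path)
  puts "kernel threads-max: #{File.read(path).to_i}"
else
  puts "#{path} not available on this OS"
end
```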

@jdantonio
Member Author

I've done some research and we have an answer.

The maximum number of threads that can be created is set by the operating system. There is no way for us to reliably determine this value within our code. Moreover, the cap is a maximum either per process or across all processes. Even if we could determine the number, we couldn't set a reliable max for any given thread pool, because the budget would have to be shared across all the thread pools we create, plus any threads spawned by other code in the application. This is impossible.

Next I dug into the Java source code for java.util.concurrent.ThreadPoolExecutor. It turns out that--quite coincidentally--my solution is exactly what Java does.

Internally, Java's ThreadPoolExecutor uses java.util.concurrent.ThreadFactory to create new threads. When the operating system refuses to create a new thread, the factory returns null. When this happens, ThreadPoolExecutor simply triggers the current RejectedExecutionHandler (which we call :fallback_policy) and moves on. Which is exactly what this PR does.

With respect to the maximum number of threads that can be set on a given thread pool, Java's ThreadPoolExecutor sets the cap at java.lang.Integer::MAX_VALUE. This is as close to infinity as Java can get. This has always been the maximum for our JavaThreadPoolExecutor. For RubyThreadPoolExecutor it was incorrect of me to set a lower value in hopes of never getting a thread creation exception. The correct behavior is to set an (effectively) infinite value and let the system handle things. I considered setting the max value in RubyThreadPoolExecutor to Infinity (1 / 0.0, for those who may not be aware), but Infinity cannot be cast to an integer with to_i, and I was concerned that this could cause weird edge-case exceptions. Instead I set the max equal to the max on JRuby. I also updated the tests to reflect this.
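That edge case is easy to demonstrate (here `2**31 - 1` stands in for java.lang.Integer::MAX_VALUE, and the constant name is illustrative):

```ruby
# Infinity is a Float and cannot be converted to an Integer, so using it
# as a pool maximum would break any code path that calls #to_i on the max.
infinity = 1 / 0.0
puts infinity.infinite?     # positive infinity reports 1

begin
  infinity.to_i             # raises FloatDomainError
rescue FloatDomainError => e
  puts "to_i failed: #{e.message}"
end

# Mirroring JRuby's cap instead, i.e. java.lang.Integer::MAX_VALUE:
DEFAULT_MAX_POOL_SIZE = 2**31 - 1
puts DEFAULT_MAX_POOL_SIZE
```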

With respect to logging, I am choosing not to log when there is a thread creation error. The reason is that we cannot log a thread creation error on JRuby: Java's ThreadPoolExecutor never raises an exception, logs, or tells the user that anything happened. It suppresses the error and silently enacts the fallback policy. Our RubyThreadPoolExecutor must exhibit the same behavior as our JavaThreadPoolExecutor, so we cannot log in either.

jdantonio added a commit that referenced this pull request Jun 6, 2015
Prevent Ruby thread pools from creating too many threads.
@jdantonio jdantonio merged commit 24336f5 into master Jun 6, 2015
@jdantonio jdantonio deleted the mri-max-threads branch June 6, 2015 15:33
@pitr-ch
Member

pitr-ch commented Jun 6, 2015

@jdantonio thanks for the detailed information. This is what I was thinking and suggesting 👍. Just a note: There may be valid use-cases when concurrent-ruby is running on JRuby on some really big server.
