Fix a deadlock in ModuleList when starting a standalone lldb client/server #148774
Summary: A deadlock was introduced by [PR llvm#146441](llvm#146441), which changed `CurrentThreadIsPrivateStateThread()` to `CurrentThreadPosesAsPrivateStateThread()`. That change caused the execution path in `ExecutionContextRef::SetTargetPtr()` to enter a code block that was previously skipped, triggering `GetSelectedFrame()`, which leads to a deadlock. In particular, one thread held `m_modules_mutex` and tried to acquire `m_language_runtimes_mutex` (via the notifier call chain), while another thread held `m_language_runtimes_mutex` and tried to acquire `m_modules_mutex` (via `ScanForGNUstepObjCLibraryCandidate`). This fixes the deadlock by wrapping the mutex lock in a scoped block and moving the notifier call outside the mutex-guarded section. Test Plan: Tested manually
@llvm/pr-subscribers-lldb

Author: None (qxy11)

Changes

Summary: Thread 1 gets m_modules_mutex in `ModuleList::AppendImpl` and Thread 3 gets m_language_runtimes_mutex in `GetLanguageRuntime`; Thread 1 then waits for m_language_runtimes_mutex in `GetLanguageRuntime` while Thread 3 waits for m_modules_mutex in `ScanForGNUstepObjCLibraryCandidate`. This fixes the deadlock by wrapping the mutex lock in a scoped block and moving the notifier call outside the mutex-guarded section. The notifier call `NotifyModuleAdded` should be thread-safe: the module is added to the `ModuleList` before the mutex is released, and the notifier operates only on local state and the `Target` instance.

Deadlocked Thread backtraces:
Test Plan:
Built a hello world a.out
Run server in one terminal:
Run client in another terminal
Before: Client hangs indefinitely
After:
Full diff: https://github.com/llvm/llvm-project/pull/148774.diff (1 file affected)
diff --git a/lldb/source/Core/ModuleList.cpp b/lldb/source/Core/ModuleList.cpp
index d5ddf6e846112..4ec093b5bc5b4 100644
--- a/lldb/source/Core/ModuleList.cpp
+++ b/lldb/source/Core/ModuleList.cpp
@@ -215,30 +215,34 @@ ModuleList::~ModuleList() = default;
void ModuleList::AppendImpl(const ModuleSP &module_sp, bool use_notifier) {
if (module_sp) {
- std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
- // We are required to keep the first element of the Module List as the
- // executable module. So check here and if the first module is NOT an
- // executable but the new one is, we insert this module at the beginning, rather than
- // at the end.
- // We don't need to do any of this if the list is empty:
- if (m_modules.empty()) {
- m_modules.push_back(module_sp);
- } else {
- // Since producing the ObjectFile may take some work, first check the 0th
- // element, and only if that's NOT an executable look at the incoming
- // ObjectFile. That way in the normal case we only look at the element
- // 0 ObjectFile.
- const bool elem_zero_is_executable
- = m_modules[0]->GetObjectFile()->GetType()
- == ObjectFile::Type::eTypeExecutable;
- lldb_private::ObjectFile *obj = module_sp->GetObjectFile();
- if (!elem_zero_is_executable && obj
- && obj->GetType() == ObjectFile::Type::eTypeExecutable) {
- m_modules.insert(m_modules.begin(), module_sp);
- } else {
+ {
+ std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
+ // We are required to keep the first element of the Module List as the
+ // executable module. So check here and if the first module is NOT an
+ // executable but the new one is, we insert this module at the beginning, rather than
+ // at the end.
+ // We don't need to do any of this if the list is empty:
+ if (m_modules.empty()) {
m_modules.push_back(module_sp);
+ } else {
+ // Since producing the ObjectFile may take some work, first check the
+ // 0th element, and only if that's NOT an executable look at the
+ // incoming ObjectFile. That way in the normal case we only look at the
+ // element 0 ObjectFile.
+ const bool elem_zero_is_executable =
+ m_modules[0]->GetObjectFile()->GetType() ==
+ ObjectFile::Type::eTypeExecutable;
+ lldb_private::ObjectFile *obj = module_sp->GetObjectFile();
+ if (!elem_zero_is_executable && obj &&
+ obj->GetType() == ObjectFile::Type::eTypeExecutable) {
+ m_modules.insert(m_modules.begin(), module_sp);
+ } else {
+ m_modules.push_back(module_sp);
+ }
}
}
+ // Release the mutex before calling the notifier to avoid deadlock
+ // NotifyModuleAdded should be thread-safe
if (use_notifier && m_notifier)
m_notifier->NotifyModuleAdded(*this, module_sp);
}
This patch looks good to me unless we are trying to protect the ModuleList contents during the notification to ensure it doesn't change before the notification has been delivered. Jim and Jonas? Thoughts?
Would it be possible to add a test that triggers this bug? It looks like a fairly simple scenario that launches lldb-server and then attaches to it. Seems like good coverage to have in the test suite if we don't already. Also, is this a 100% reproducible deadlock, or is it somewhat non-deterministic?
lldb/source/Core/ModuleList.cpp (Outdated)
@@ -215,30 +215,34 @@ ModuleList::~ModuleList() = default;
void ModuleList::AppendImpl(const ModuleSP &module_sp, bool use_notifier) {
  if (module_sp) {
Since you're already touching this, let's make this an early return to offset the indentation for the lexical block.
Suggested change:
-  if (module_sp) {
+  if (!module_sp)
+    return;
Thanks for catching this. Keeping the module list locked too long is one of the easiest ways to produce deadlocks in lldb at present!
I can't see a reason why the work done when lldb is notified of a new module should require that the module list be locked at the point where this module was loaded until the notification is complete. For the most part the notification reactions are things like adding breakpoints found in the new module, or seeing if it indicates the presence of one of the known runtimes - so specific to just the module_sp that was passed in. It shouldn't need to require that no more modules get loaded while it is doing that.
So this looks like a good change to me.
It would be great to get a test for this. The test I added does something like your repro conditions, but because it was specifically about Python files in dSYMs, it was a macOS-only test. But you might be able to use the running parts without the dSYM parts to test this?
Actually, the test will be a little harder than that because the deadlock comes between the main lldb work and async work done by the status line. So my test, which just runs lldb in Python, wouldn't have shown that error. I wonder if it would be possible to have a version of the status line that is always running, just discarding its output if there's no terminal to write it to. As it stands, the test suite isn't exercising the interaction between lldb and the status-line filling thread nearly as much as our users now are...
Summary: Added a unit test that would've failed prior to the fix and passes now. The test launches lldb-server and connects the client to it with the statusline enabled to trigger the deadlock. I was not able to reproduce the issue reliably with a mock gdb-server.

Test Plan:
```
ninja lldb-dotest && bin/lldb-dotest -f TestStatusline.test_modulelist_deadlock ~/llvm/llvm-project/lldb/test/API/functionalities/statusline/
```
Summary: Address comment to return if !module_sp
✅ With the latest revision this PR passed the C/C++ code formatter.
I added a regression test that fails before the fix in the statusline tests. I was trying to mock the server-side behavior, but it doesn't reliably reproduce the deadlock, so I ended up starting the actual lldb-server in the test. Open to any suggestions for whether there's a better way to go about testing this. cc @dmpots @jimingham
✅ With the latest revision this PR passed the Python code formatter.