Fix a deadlock in ModuleList when starting a standalone lldb client/server #148774

Open: qxy11 wants to merge 6 commits into main.

qxy11 (Contributor) commented Jul 15, 2025

Summary:
A deadlock was introduced by PR #146441, which changed CurrentThreadIsPrivateStateThread() to CurrentThreadPosesAsPrivateStateThread(). This change causes the execution path in ExecutionContextRef::SetTargetPtr() to enter a code block that was previously skipped, calling GetSelectedFrame(), which leads to the deadlock.

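Thread 1 acquires m_modules_mutex in ModuleList::AppendImpl, and Thread 3 acquires m_language_runtimes_mutex in GetLanguageRuntime. Thread 1 then waits for m_language_runtimes_mutex in GetLanguageRuntime, while Thread 3 waits for m_modules_mutex in ScanForGNUstepObjCLibraryCandidate. Because the two threads take the same two locks in opposite orders, each holds the lock the other needs and neither can make progress.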

This patch fixes the deadlock by wrapping the mutex lock in a scoped block, so m_modules_mutex is released before the notifier is called; the NotifyModuleAdded call now runs outside the mutex-guarded section. NotifyModuleAdded should be thread-safe: the module has already been added to the ModuleList before the mutex is released, the notifier doesn't modify the module list further, and the call operates on local state and the Target instance.

Backtraces of the deadlocked threads:

```
* thread #3, name = 'dbg.evt-handler', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563786bd5f40) at    futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007f2f0f1927b0, __m=0x0000563786bd5f40) at std_mutex.h:229:19
    frame #8: 0x00007f2f27946eb7 liblldb.so.21.0git`ScanForGNUstepObjCLibraryCandidate(modules=0x0000563786bd5f28, TT=0x0000563786bd5eb8) at GNUstepObjCRuntime.cpp:60:41
    frame #9: 0x00007f2f27946c80 liblldb.so.21.0git`lldb_private::GNUstepObjCRuntime::CreateInstance(process=0x0000563785e1d360, language=eLanguageTypeObjC) at GNUstepObjCRuntime.cpp:87:8
    frame #10: 0x00007f2f2746fca5 liblldb.so.21.0git`lldb_private::LanguageRuntime::FindPlugin(process=0x0000563785e1d360, language=eLanguageTypeObjC) at LanguageRuntime.cpp:210:36
    frame #11: 0x00007f2f2742c9e3 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeObjC) at Process.cpp:1516:9
    ...
    frame #21: 0x00007f2f2750b5cc liblldb.so.21.0git`lldb_private::Thread::GetSelectedFrame(this=0x0000563785e064d0, select_most_relevant=DoNoSelectMostRelevantFrame) at Thread.cpp:274:48
    frame #22: 0x00007f2f273f9957 liblldb.so.21.0git`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:525:32
    frame #23: 0x00007f2f273f9714 liblldb.so.21.0git`lldb_private::ExecutionContextRef::ExecutionContextRef(this=0x00007f2f0f193778, target=0x0000563786bd5be0, adopt_selected=true) at ExecutionContext.cpp:413:3
    frame #24: 0x00007f2f270e80af liblldb.so.21.0git`lldb_private::Debugger::GetSelectedExecutionContext(this=0x0000563785d83bc0) at Debugger.cpp:1225:23
    frame #25: 0x00007f2f271bb7fd liblldb.so.21.0git`lldb_private::Statusline::Redraw(this=0x0000563785d83f30, update=true) at Statusline.cpp:136:41
    ...
* thread #1, name = 'lldb', stop reason = signal SIGSTOP
  * frame #0: 0x00007f2f1e2973dc libc.so.6`futex_wait(private=0, expected=2, futex_word=0x0000563785e1dd98) at futex-internal.h:146:13
   /*... a bunch of mutex related bt ... */
   liblldb.so.21.0git`std::lock_guard<std::recursive_mutex>::lock_guard(this=0x00007ffe62be0488, __m=0x0000563785e1dd98) at std_mutex.h:229:19
    frame #8: 0x00007f2f2742c8d1 liblldb.so.21.0git`lldb_private::Process::GetLanguageRuntime(this=0x0000563785e1d360, language=eLanguageTypeC_plus_plus) at Process.cpp:1510:41
    frame #9: 0x00007f2f2743c46f liblldb.so.21.0git`lldb_private::Process::ModulesDidLoad(this=0x0000563785e1d360, module_list=0x00007ffe62be06a0) at Process.cpp:6082:36
    ...
    frame #13: 0x00007f2f2715cf03 liblldb.so.21.0git`lldb_private::ModuleList::AppendImpl(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, use_notifier=true) at ModuleList.cpp:246:19
    frame #14: 0x00007f2f2715cf4c liblldb.so.21.0git`lldb_private::ModuleList::Append(this=0x0000563786bd5f28, module_sp=ptr = 0x563785cec560, notify=true) at ModuleList.cpp:251:3
    ...
    frame #19: 0x00007f2f274349b3 liblldb.so.21.0git`lldb_private::Process::ConnectRemote(this=0x0000563785e1d360, remote_url=(Data = "connect://localhost:1234", Length = 24)) at Process.cpp:3250:9
    frame #20: 0x00007f2f27411e0e liblldb.so.21.0git`lldb_private::Platform::DoConnectProcess(this=0x0000563785c59990, connect_url=(Data = "connect://localhost:1234", Length = 24), plugin_name=(Data = "gdb-remote", Length = 10), debugger=0x0000563785d83bc0, stream=0x00007ffe62be3128, target=0x0000563786bd5be0, error=0x00007ffe62be1ca0) at Platform.cpp:1926:23
```

Test Plan:

Built a hello-world a.out.
Run the server in one terminal:

```
~/llvm/build/Debug/bin/lldb-server g :1234 a.out
```

Run the client in another terminal:

```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
```

Before: the client hangs indefinitely.

```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b main"
(lldb) gdb-remote 1234

^C^C
```

After:

```
~/llvm/build/Debug/bin/lldb -o "gdb-remote 1234" -o "b hello.cc:3"
(lldb) gdb-remote 1234
Process 837068 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSTOP
    frame #0: 0x00007ffff7fe4a60
ld-linux-x86-64.so.2`_start:
->  0x7ffff7fe4a60 <+0>: movq   %rsp, %rdi
    0x7ffff7fe4a63 <+3>: callq  0x7ffff7fe5780 ; _dl_start at rtld.c:522:1

ld-linux-x86-64.so.2`_dl_start_user:
    0x7ffff7fe4a68 <+0>: movq   %rax, %r12
    0x7ffff7fe4a6b <+3>: movl   0x18067(%rip), %eax ; _dl_skip_args
(lldb) b hello.cc:3
Breakpoint 1: where = a.out`main + 15 at hello.cc:4:13, address = 0x00005555555551bf
(lldb) c
Process 837068 resuming
Process 837068 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00005555555551bf a.out`main at hello.cc:4:13
   1   	#include <iostream>
   2
   3   	int main() {
-> 4   	  std::cout << "Hello World" << std::endl;
   5   	  return 0;
   6   	}
```

Summary:
A deadlock was introduced by PR llvm#146441, which changed `CurrentThreadIsPrivateStateThread()` to `CurrentThreadPosesAsPrivateStateThread()`. This change causes the execution path in `ExecutionContextRef::SetTargetPtr()` to enter a code block that was previously skipped, calling `GetSelectedFrame()`, which leads to the deadlock.

In particular, one thread held `m_modules_mutex` and tried to acquire `m_language_runtimes_mutex` (via the notifier call chain), while another thread held `m_language_runtimes_mutex` and tried to acquire `m_modules_mutex` (via `ScanForGNUstepObjCLibraryCandidate`).

This patch fixes the deadlock by wrapping the mutex lock in a scoped block and moving the notifier call outside of the mutex-guarded section.

Test Plan:
Tested manually.

qxy11 requested a review from JDevlieghere as a code owner July 15, 2025 03:45
llvmbot added the lldb label Jul 15, 2025

llvmbot (Member) commented Jul 15, 2025

@llvm/pr-subscribers-lldb

Author: None (qxy11)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/148774.diff

1 Files Affected:

  • (modified) lldb/source/Core/ModuleList.cpp (+25-21)
```diff
diff --git a/lldb/source/Core/ModuleList.cpp b/lldb/source/Core/ModuleList.cpp
index d5ddf6e846112..4ec093b5bc5b4 100644
--- a/lldb/source/Core/ModuleList.cpp
+++ b/lldb/source/Core/ModuleList.cpp
@@ -215,30 +215,34 @@ ModuleList::~ModuleList() = default;
 
 void ModuleList::AppendImpl(const ModuleSP &module_sp, bool use_notifier) {
   if (module_sp) {
-    std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
-    // We are required to keep the first element of the Module List as the
-    // executable module.  So check here and if the first module is NOT an 
-    // but the new one is, we insert this module at the beginning, rather than 
-    // at the end.
-    // We don't need to do any of this if the list is empty:
-    if (m_modules.empty()) {
-      m_modules.push_back(module_sp);
-    } else {
-      // Since producing the ObjectFile may take some work, first check the 0th
-      // element, and only if that's NOT an executable look at the incoming
-      // ObjectFile.  That way in the normal case we only look at the element
-      // 0 ObjectFile. 
-      const bool elem_zero_is_executable 
-          = m_modules[0]->GetObjectFile()->GetType() 
-              == ObjectFile::Type::eTypeExecutable;
-      lldb_private::ObjectFile *obj = module_sp->GetObjectFile();
-      if (!elem_zero_is_executable && obj 
-          && obj->GetType() == ObjectFile::Type::eTypeExecutable) {
-        m_modules.insert(m_modules.begin(), module_sp);
-      } else {
+    {
+      std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
+      // We are required to keep the first element of the Module List as the
+      // executable module.  So check here and if the first module is NOT an
+      // but the new one is, we insert this module at the beginning, rather than
+      // at the end.
+      // We don't need to do any of this if the list is empty:
+      if (m_modules.empty()) {
         m_modules.push_back(module_sp);
+      } else {
+        // Since producing the ObjectFile may take some work, first check the
+        // 0th element, and only if that's NOT an executable look at the
+        // incoming ObjectFile.  That way in the normal case we only look at the
+        // element 0 ObjectFile.
+        const bool elem_zero_is_executable =
+            m_modules[0]->GetObjectFile()->GetType() ==
+            ObjectFile::Type::eTypeExecutable;
+        lldb_private::ObjectFile *obj = module_sp->GetObjectFile();
+        if (!elem_zero_is_executable && obj &&
+            obj->GetType() == ObjectFile::Type::eTypeExecutable) {
+          m_modules.insert(m_modules.begin(), module_sp);
+        } else {
+          m_modules.push_back(module_sp);
+        }
       }
     }
+    // Release the mutex before calling the notifier to avoid deadlock
+    // NotifyModuleAdded should be thread-safe
     if (use_notifier && m_notifier)
       m_notifier->NotifyModuleAdded(*this, module_sp);
   }
```

jasonmolenda requested a review from jimingham July 15, 2025 03:50

clayborg (Collaborator) left a comment

This patch looks good to me unless we are trying to protect the ModuleList contents during the notification to ensure it doesn't change before the notification has been delivered. Jim and Jonas? Thoughts?

dmpots (Contributor) commented Jul 15, 2025

Would it be possible to add a test that triggers this bug? It looks like a fairly simple scenario that launches lldb-server and then attaches to it. It seems like good coverage to have in the test suite if we don't have it already. Also, is this a 100% reproducible deadlock, or is it somewhat non-deterministic?

```
@@ -215,30 +215,34 @@ ModuleList::~ModuleList() = default;

 void ModuleList::AppendImpl(const ModuleSP &module_sp, bool use_notifier) {
   if (module_sp) {
```

A reviewer (Member) commented:

Since you're already touching this, let's make this an early return to offset the indentation for the lexical block.

Suggested change:

```diff
-  if (module_sp) {
+  if (!module_sp)
+    return;
```

jimingham (Collaborator) left a comment

Thanks for catching this. Keeping the module list locked too long is one of the easiest ways to produce deadlocks in lldb at present!

I can't see a reason why the work done when lldb is notified of a new module should require that the module list be locked at the point where this module was loaded until the notification is complete. For the most part the notification reactions are things like adding breakpoints found in the new module, or seeing if it indicates the presence of one of the known runtimes - so specific to just the module_sp that was passed in. It shouldn't need to require that no more modules get loaded while it is doing that.

So this looks like a good change to me.

It would be great to get a test for this. The test I added does something like your repro conditions, but because it was specifically about Python files in dSYMs, it was a macOS-only test. But you might be able to use the running parts without the dSYM parts to test this?

jimingham (Collaborator) commented

Actually, the test will be a little harder than that because the deadlock comes between the main lldb work and async work done by the status line. So my test, which just runs lldb in Python, wouldn't have shown that error.

I wonder if it would be possible to have a version of the status line that is always running, just discarding its output if there's no Terminal to write it to. As it stands, the test suite isn't exercising the interaction between lldb and the status-line-filling thread nearly as much as our users now are...

qxy11 added 3 commits July 17, 2025 15:32
Summary:
Added a unit test that would've failed prior to the fix and passes now.

The test launches lldb-server and connects the client to it with the statusline enabled to trigger the deadlock. I was not able to reproduce the issue reliably with a mock gdb-server.

Test Plan:
```
ninja lldb-dotest && bin/lldb-dotest -f TestStatusline.test_modulelist_deadlock ~/llvm/llvm-project/lldb/test/API/functionalities/statusline/
```
Summary:
Address comment to return if !module_sp
github-actions bot commented Jul 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

qxy11 (Contributor, Author) commented Jul 17, 2025

I added a regression test to the statusline tests that fails without the fix. I tried to mock the server-side behavior, but the mock doesn't reliably reproduce the deadlock, so the test starts the actual lldb-server instead. I'm open to suggestions on whether there's a better way to test this. cc @dmpots @jimingham

github-actions bot commented Jul 18, 2025

✅ With the latest revision this PR passed the Python code formatter.

qxy11 force-pushed the fix-deadlock-in-module-list branch from cb2cec0 to 5cf94e9 July 18, 2025 03:39