blob: 6cd2395339f3709009d64463cdd7c445000b528b [file] [log] [blame]
Kenny Yu66fb4d22017-02-02 10:05:31 -08001Demonstrations of deadlock_detector.
2
3This program detects potential deadlocks on a running process. The program
4attaches uprobes on `pthread_mutex_lock` and `pthread_mutex_unlock` to build
5a mutex wait directed graph, and then looks for a cycle in this graph. This
6graph has the following properties:
7
8- Nodes in the graph represent mutexes.
9- Edge (A, B) exists if there exists some thread T where lock(A) was called
10 and lock(B) was called before unlock(A) was called.
11
12If there is a cycle in this graph, this indicates that there is a lock order
13inversion (potential deadlock). If the program finds a lock order inversion, the
14program will dump the cycle of mutexes, dump the stack traces where each mutex
15was acquired, and then exit.
16
Kenny Yue7dff432017-02-03 09:39:39 -080017This program can only find potential deadlocks that occur while the program
18is tracing the process. It cannot find deadlocks that may have occurred
19before the program was attached to the process.
20
21Since this traces all mutex lock and unlock events and all thread creation
22events on the traced process, the overhead of this bpf program can be very
23high if the process has many threads and mutexes. You should only run this on
24a process where the slowdown is acceptable.
Kenny Yu66fb4d22017-02-02 10:05:31 -080025
26Note: This tool does not work for shared mutexes or recursive mutexes.
27
28For shared (read-write) mutexes, a deadlock requires a cycle in the wait
29graph where at least one of the mutexes in the cycle is acquiring exclusive
30(write) ownership.
31
32For recursive mutexes, lock() is called multiple times on the same mutex.
33However, there is no way to determine if a mutex is a recursive mutex
34after the mutex has been created. As a result, this tool will not find
35potential deadlocks that involve only one mutex.
36
37
Kenny Yuc44e92c2017-02-03 18:50:33 -080038# ./deadlock_detector.py 181
Kenny Yu66fb4d22017-02-02 10:05:31 -080039Tracing... Hit Ctrl-C to end.
Kenny Yu66fb4d22017-02-02 10:05:31 -080040----------------
41Potential Deadlock Detected!
42
Kenny Yue7dff432017-02-03 09:39:39 -080043Cycle in lock order graph: Mutex M0 (main::static_mutex3 0x0000000000473c60) => Mutex M1 (0x00007fff6d738400) => Mutex M2 (global_mutex1 0x0000000000473be0) => Mutex M3 (global_mutex2 0x0000000000473c20) => Mutex M0 (main::static_mutex3 0x0000000000473c60)
Kenny Yu66fb4d22017-02-02 10:05:31 -080044
Kenny Yue7dff432017-02-03 09:39:39 -080045Mutex M1 (0x00007fff6d738400) acquired here while holding Mutex M0 (main::static_mutex3 0x0000000000473c60) in Thread 357250 (lockinversion):
46@ 00000000004024d0 pthread_mutex_lock
47@ 0000000000406dd0 std::mutex::lock()
48@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
49@ 0000000000402e38 main::{lambda()#3}::operator()() const
50@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
51@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
52@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
53@ 00007fd4496564e1 execute_native_thread_routine
54@ 00007fd449dd57f1 start_thread
55@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -080056
Kenny Yue7dff432017-02-03 09:39:39 -080057Mutex M0 (main::static_mutex3 0x0000000000473c60) previously acquired by the same Thread 357250 (lockinversion) here:
58@ 00000000004024d0 pthread_mutex_lock
59@ 0000000000406dd0 std::mutex::lock()
60@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
61@ 0000000000402e22 main::{lambda()#3}::operator()() const
62@ 0000000000406ba8 void std::_Bind_simple<main::{lambda()#3} ()>::_M_invoke<>(std::_Index_tuple<>)
63@ 0000000000406951 std::_Bind_simple<main::{lambda()#3} ()>::operator()()
64@ 000000000040673a std::thread::_Impl<std::_Bind_simple<main::{lambda()#3} ()> >::_M_run()
65@ 00007fd4496564e1 execute_native_thread_routine
66@ 00007fd449dd57f1 start_thread
67@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -080068
Kenny Yue7dff432017-02-03 09:39:39 -080069Mutex M2 (global_mutex1 0x0000000000473be0) acquired here while holding Mutex M1 (0x00007fff6d738400) in Thread 357251 (lockinversion):
70@ 00000000004024d0 pthread_mutex_lock
71@ 0000000000406dd0 std::mutex::lock()
72@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
73@ 0000000000402ea8 main::{lambda()#4}::operator()() const
74@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
75@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
76@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
77@ 00007fd4496564e1 execute_native_thread_routine
78@ 00007fd449dd57f1 start_thread
79@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -080080
Kenny Yue7dff432017-02-03 09:39:39 -080081Mutex M1 (0x00007fff6d738400) previously acquired by the same Thread 357251 (lockinversion) here:
82@ 00000000004024d0 pthread_mutex_lock
83@ 0000000000406dd0 std::mutex::lock()
84@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
85@ 0000000000402e97 main::{lambda()#4}::operator()() const
86@ 0000000000406b46 void std::_Bind_simple<main::{lambda()#4} ()>::_M_invoke<>(std::_Index_tuple<>)
87@ 000000000040692d std::_Bind_simple<main::{lambda()#4} ()>::operator()()
88@ 000000000040671c std::thread::_Impl<std::_Bind_simple<main::{lambda()#4} ()> >::_M_run()
89@ 00007fd4496564e1 execute_native_thread_routine
90@ 00007fd449dd57f1 start_thread
91@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -080092
Kenny Yue7dff432017-02-03 09:39:39 -080093Mutex M3 (global_mutex2 0x0000000000473c20) acquired here while holding Mutex M2 (global_mutex1 0x0000000000473be0) in Thread 357247 (lockinversion):
94@ 00000000004024d0 pthread_mutex_lock
95@ 0000000000406dd0 std::mutex::lock()
96@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
97@ 0000000000402d5f main::{lambda()#1}::operator()() const
98@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
99@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
100@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
101@ 00007fd4496564e1 execute_native_thread_routine
102@ 00007fd449dd57f1 start_thread
103@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -0800104
Kenny Yue7dff432017-02-03 09:39:39 -0800105Mutex M2 (global_mutex1 0x0000000000473be0) previously acquired by the same Thread 357247 (lockinversion) here:
106@ 00000000004024d0 pthread_mutex_lock
107@ 0000000000406dd0 std::mutex::lock()
108@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
109@ 0000000000402d4e main::{lambda()#1}::operator()() const
110@ 0000000000406c6c void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>)
111@ 0000000000406999 std::_Bind_simple<main::{lambda()#1} ()>::operator()()
112@ 0000000000406776 std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run()
113@ 00007fd4496564e1 execute_native_thread_routine
114@ 00007fd449dd57f1 start_thread
115@ 00007fd44909746d __clone
116
117Mutex M0 (main::static_mutex3 0x0000000000473c60) acquired here while holding Mutex M3 (global_mutex2 0x0000000000473c20) in Thread 357248 (lockinversion):
118@ 00000000004024d0 pthread_mutex_lock
119@ 0000000000406dd0 std::mutex::lock()
120@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
Kenny Yu66fb4d22017-02-02 10:05:31 -0800121@ 0000000000402dc9 main::{lambda()#2}::operator()() const
Kenny Yue7dff432017-02-03 09:39:39 -0800122@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
123@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
124@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
125@ 00007fd4496564e1 execute_native_thread_routine
126@ 00007fd449dd57f1 start_thread
127@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -0800128
Kenny Yue7dff432017-02-03 09:39:39 -0800129Mutex M3 (global_mutex2 0x0000000000473c20) previously acquired by the same Thread 357248 (lockinversion) here:
130@ 00000000004024d0 pthread_mutex_lock
131@ 0000000000406dd0 std::mutex::lock()
132@ 00000000004070d2 std::lock_guard<std::mutex>::lock_guard(std::mutex&)
133@ 0000000000402db8 main::{lambda()#2}::operator()() const
134@ 0000000000406c0a void std::_Bind_simple<main::{lambda()#2} ()>::_M_invoke<>(std::_Index_tuple<>)
135@ 0000000000406975 std::_Bind_simple<main::{lambda()#2} ()>::operator()()
136@ 0000000000406758 std::thread::_Impl<std::_Bind_simple<main::{lambda()#2} ()> >::_M_run()
137@ 00007fd4496564e1 execute_native_thread_routine
138@ 00007fd449dd57f1 start_thread
139@ 00007fd44909746d __clone
Kenny Yu66fb4d22017-02-02 10:05:31 -0800140
Kenny Yue7dff432017-02-03 09:39:39 -0800141Thread 357248 created by Thread 350692 (lockinversion) here:
142@ 00007fd449097431 __clone
143@ 00007fd449dd5ef5 pthread_create
144@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
145@ 00000000004033ac std::thread::thread<main::{lambda()#2}>(main::{lambda()#2}&&)
146@ 000000000040308f main
147@ 00007fd448faa0f6 __libc_start_main
Kenny Yu66fb4d22017-02-02 10:05:31 -0800148@ 0000000000402ad8 [unknown]
149
Kenny Yue7dff432017-02-03 09:39:39 -0800150Thread 357250 created by Thread 350692 (lockinversion) here:
151@ 00007fd449097431 __clone
152@ 00007fd449dd5ef5 pthread_create
153@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
154@ 00000000004034b2 std::thread::thread<main::{lambda()#3}>(main::{lambda()#3}&&)
155@ 00000000004030b9 main
156@ 00007fd448faa0f6 __libc_start_main
Kenny Yu66fb4d22017-02-02 10:05:31 -0800157@ 0000000000402ad8 [unknown]
158
Kenny Yue7dff432017-02-03 09:39:39 -0800159Thread 357251 created by Thread 350692 (lockinversion) here:
160@ 00007fd449097431 __clone
161@ 00007fd449dd5ef5 pthread_create
162@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
163@ 00000000004035b8 std::thread::thread<main::{lambda()#4}>(main::{lambda()#4}&&)
164@ 00000000004030e6 main
165@ 00007fd448faa0f6 __libc_start_main
Kenny Yu66fb4d22017-02-02 10:05:31 -0800166@ 0000000000402ad8 [unknown]
167
Kenny Yue7dff432017-02-03 09:39:39 -0800168Thread 357247 created by Thread 350692 (lockinversion) here:
169@ 00007fd449097431 __clone
170@ 00007fd449dd5ef5 pthread_create
171@ 00007fd449658440 std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>)
172@ 00000000004032a6 std::thread::thread<main::{lambda()#1}>(main::{lambda()#1}&&)
173@ 0000000000403070 main
174@ 00007fd448faa0f6 __libc_start_main
Kenny Yu66fb4d22017-02-02 10:05:31 -0800175@ 0000000000402ad8 [unknown]
176
Kenny Yu66fb4d22017-02-02 10:05:31 -0800177This is output from a process that has a potential deadlock involving 4 mutexes
178and 4 threads:
179
Kenny Yue7dff432017-02-03 09:39:39 -0800180- Thread 357250 acquired M1 while holding M0 (edge M0 -> M1)
181- Thread 357251 acquired M2 while holding M1 (edge M1 -> M2)
182- Thread 357247 acquired M3 while holding M2 (edge M2 -> M3)
183- Thread 357248 acquired M0 while holding M3 (edge M3 -> M0)
Kenny Yu66fb4d22017-02-02 10:05:31 -0800184
185This is the C++ program that generated the output above:
186
187```c++
Kenny Yu66fb4d22017-02-02 10:05:31 -0800188#include <chrono>
189#include <iostream>
190#include <mutex>
191#include <thread>
192
Kenny Yue7dff432017-02-03 09:39:39 -0800193std::mutex global_mutex1;
194std::mutex global_mutex2;
195
Kenny Yu66fb4d22017-02-02 10:05:31 -0800196int main(void) {
Kenny Yue7dff432017-02-03 09:39:39 -0800197 static std::mutex static_mutex3;
198 std::mutex local_mutex4;
Kenny Yu66fb4d22017-02-02 10:05:31 -0800199
Kenny Yu66fb4d22017-02-02 10:05:31 -0800200 std::cout << "sleeping for a bit to allow trace to attach..." << std::endl;
201 std::this_thread::sleep_for(std::chrono::seconds(10));
202 std::cout << "starting program..." << std::endl;
203
Kenny Yue7dff432017-02-03 09:39:39 -0800204 auto t1 = std::thread([] {
205 std::lock_guard<std::mutex> g1(global_mutex1);
206 std::lock_guard<std::mutex> g2(global_mutex2);
Kenny Yu66fb4d22017-02-02 10:05:31 -0800207 });
208 t1.join();
209
Kenny Yue7dff432017-02-03 09:39:39 -0800210 auto t2 = std::thread([] {
211 std::lock_guard<std::mutex> g2(global_mutex2);
212 std::lock_guard<std::mutex> g3(static_mutex3);
Kenny Yu66fb4d22017-02-02 10:05:31 -0800213 });
214 t2.join();
215
Kenny Yue7dff432017-02-03 09:39:39 -0800216 auto t3 = std::thread([&local_mutex4] {
217 std::lock_guard<std::mutex> g3(static_mutex3);
218 std::lock_guard<std::mutex> g4(local_mutex4);
Kenny Yu66fb4d22017-02-02 10:05:31 -0800219 });
220 t3.join();
221
Kenny Yue7dff432017-02-03 09:39:39 -0800222 auto t4 = std::thread([&local_mutex4] {
223 std::lock_guard<std::mutex> g4(local_mutex4);
224 std::lock_guard<std::mutex> g1(global_mutex1);
Kenny Yu66fb4d22017-02-02 10:05:31 -0800225 });
226 t4.join();
227
228 std::cout << "sleeping to allow trace to collect data..." << std::endl;
229 std::this_thread::sleep_for(std::chrono::seconds(5));
230 std::cout << "done!" << std::endl;
231}
232```
233
234Note that an actual deadlock did not occur, although this mutex lock ordering
235creates the possibility of a deadlock, and this is a hint to the programmer to
Kenny Yue7dff432017-02-03 09:39:39 -0800236reconsider the lock ordering. If the mutexes are global or static and debug
237symbols are enabled, the output will contain the mutex symbol name. The output
238uses a similar format as ThreadSanitizer
239(https://github.com/google/sanitizers/wiki/ThreadSanitizerDeadlockDetector).
Kenny Yu66fb4d22017-02-02 10:05:31 -0800240
241
Kenny Yuc44e92c2017-02-03 18:50:33 -0800242# ./deadlock_detector.py 181 --binary /usr/local/bin/lockinversion
243
244Tracing... Hit Ctrl-C to end.
245^C
246
247If the traced process is instantiated from a statically-linked executable,
248this argument is optional, and the program will determine the path of the
249executable from the pid. However, on older kernels without this patch
250("uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET",
251https://lkml.org/lkml/2017/1/13/585), binaries that contain `:` in the path
252cannot be attached with uprobes. As a workaround, we can create a symlink
253to the binary, and provide the symlink name instead to the `--binary` option.
254
255
256# ./deadlock_detector.py 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0
257
258Tracing... Hit Ctrl-C to end.
259^C
260
261If the traced process is instantiated from a dynamically-linked executable,
262this argument is required and needs to be the path to the pthread shared
263library used by the executable.
264
265
266# ./deadlock_detector.py 181 --dump-graph graph.json --verbose
Kenny Yu66fb4d22017-02-02 10:05:31 -0800267
268Tracing... Hit Ctrl-C to end.
Kenny Yue7dff432017-02-03 09:39:39 -0800269Mutexes: 0, Edges: 0
270Mutexes: 532, Edges: 411
271Mutexes: 735, Edges: 675
272Mutexes: 1118, Edges: 1278
273Mutexes: 1666, Edges: 2185
274Mutexes: 2056, Edges: 2694
275Mutexes: 2245, Edges: 2906
276Mutexes: 2656, Edges: 3479
277Mutexes: 2813, Edges: 3785
Kenny Yu66fb4d22017-02-02 10:05:31 -0800278^C
279
280If the program does not find a deadlock, it will keep running until you hit
Kenny Yue7dff432017-02-03 09:39:39 -0800281Ctrl-C. If you pass the `--verbose` flag, the program will also dump statistics
282about the number of mutexes and edges in the mutex wait graph. If you want to
283serialize the graph to analyze it later, you can pass the `--dump-graph FILE`
284flag, and the program will serialize the graph in json.
Kenny Yu66fb4d22017-02-02 10:05:31 -0800285
286
Kenny Yuc44e92c2017-02-03 18:50:33 -0800287# ./deadlock_detector.py 181 --lock-symbols custom_mutex1_lock,custom_mutex2_lock --unlock_symbols custom_mutex1_unlock,custom_mutex2_unlock --verbose
Kenny Yu66fb4d22017-02-02 10:05:31 -0800288
289Tracing... Hit Ctrl-C to end.
Kenny Yue7dff432017-02-03 09:39:39 -0800290Mutexes: 0, Edges: 0
291Mutexes: 532, Edges: 411
292Mutexes: 735, Edges: 675
293Mutexes: 1118, Edges: 1278
294Mutexes: 1666, Edges: 2185
295Mutexes: 2056, Edges: 2694
296Mutexes: 2245, Edges: 2906
297Mutexes: 2656, Edges: 3479
298Mutexes: 2813, Edges: 3785
Kenny Yu66fb4d22017-02-02 10:05:31 -0800299^C
300
301If your program is using custom mutexes and not pthread mutexes, you can use
302the `--lock-symbols` and `--unlock-symbols` flags to specify different mutex
303symbols to trace. The flags take a comma-separated string of symbol names.
304Note that if the symbols are inlined in the binary, then this program can result
305in false positives.
306
307
308USAGE message:
309
310# ./deadlock_detector.py -h
311
Kenny Yue7dff432017-02-03 09:39:39 -0800312usage: deadlock_detector.py [-h] [--binary BINARY] [--dump-graph DUMP_GRAPH]
313 [--verbose] [--lock-symbols LOCK_SYMBOLS]
Kenny Yu66fb4d22017-02-02 10:05:31 -0800314 [--unlock-symbols UNLOCK_SYMBOLS]
Kenny Yue7dff432017-02-03 09:39:39 -0800315 pid
Kenny Yu66fb4d22017-02-02 10:05:31 -0800316
Kenny Yuc44e92c2017-02-03 18:50:33 -0800317Detect potential deadlocks (lock inversions) in a running binary.
318Must be run as root.
Kenny Yu66fb4d22017-02-02 10:05:31 -0800319
320positional arguments:
Kenny Yu66fb4d22017-02-02 10:05:31 -0800321 pid Pid to trace
322
323optional arguments:
324 -h, --help show this help message and exit
Kenny Yuc44e92c2017-02-03 18:50:33 -0800325 --binary BINARY If set, trace the mutexes from the binary at this
326 path. For statically-linked binaries, this argument is
327 not required. For dynamically-linked binaries, this
328 argument is required and should be the path of the
329 pthread library the binary is using. Example:
330 /lib/x86_64-linux-gnu/libpthread.so.0
Kenny Yu66fb4d22017-02-02 10:05:31 -0800331 --dump-graph DUMP_GRAPH
332 If set, this will dump the mutex graph to the
333 specified file.
Kenny Yue7dff432017-02-03 09:39:39 -0800334 --verbose Print statistics about the mutex wait graph.
Kenny Yu66fb4d22017-02-02 10:05:31 -0800335 --lock-symbols LOCK_SYMBOLS
336 Comma-separated list of lock symbols to trace. Default
Kenny Yuc44e92c2017-02-03 18:50:33 -0800337 is pthread_mutex_lock. These symbols cannot be inlined
338 in the binary.
Kenny Yu66fb4d22017-02-02 10:05:31 -0800339 --unlock-symbols UNLOCK_SYMBOLS
340 Comma-separated list of unlock symbols to trace.
Kenny Yuc44e92c2017-02-03 18:50:33 -0800341 Default is pthread_mutex_unlock. These symbols cannot
342 be inlined in the binary.
343
344Examples:
345 deadlock_detector 181 # Analyze PID 181
346
347 deadlock_detector 181 --binary /lib/x86_64-linux-gnu/libpthread.so.0
348 # Analyze PID 181 and locks from this binary.
349 # If tracing a process that is running from
350 # a dynamically-linked binary, this argument
351 # is required and should be the path to the
352 # pthread library.
353
354 deadlock_detector 181 --verbose
355 # Analyze PID 181 and print statistics about
356 # the mutex wait graph.
357
358 deadlock_detector 181 --lock-symbols my_mutex_lock1,my_mutex_lock2 \
359 --unlock-symbols my_mutex_unlock1,my_mutex_unlock2
360 # Analyze PID 181 and trace custom mutex
361 # symbols instead of pthread mutexes.
362
363 deadlock_detector 181 --dump-graph graph.json
364 # Analyze PID 181 and dump the mutex wait
365 # graph to graph.json.