blob: 75b07e22f4a8737babcdebc88bd3f0a7b38a2c99 [file] [log] [blame]
Brendan Gregg38cef482016-01-15 17:26:30 -08001Demonstrations of offcputime, the Linux eBPF/bcc version.
2
3
4This program shows stack traces that were blocked, and the total duration they
5were blocked. It works by tracing when threads block and when they return to
6CPU, measuring both the time they were blocked (aka the "off-CPU time") and the
7blocked kernel stack trace and the task name. This data is summarized in kernel
8by summing the blocked time by unique stack trace and task name.
9
10Here is some example output. To explain what we are seeing: the very first
11stack trace looks like a page fault (do_page_fault() etc) from the "chmod"
12command, and in total was off-CPU for 13 microseconds.
13
14# ./offcputime
15Tracing off-CPU time (us) by kernel stack... Hit Ctrl-C to end.
16^C
17 schedule
18 schedule_timeout
19 io_schedule_timeout
20 bit_wait_io
21 __wait_on_bit
22 wait_on_page_bit_killable
23 __lock_page_or_retry
24 filemap_fault
25 __do_fault
26 handle_mm_fault
27 __do_page_fault
28 do_page_fault
29 page_fault
30 chmod
31 13
32
33 schedule
34 rcu_nocb_kthread
35 kthread
36 ret_from_fork
37 ddebug_tables
38 rcuos/0
39 22
40
41 schedule
42 schedule_timeout
43 io_schedule_timeout
44 bit_wait_io
45 __wait_on_bit_lock
46 __lock_page
47 lock_page
48 __do_fault
49 handle_mm_fault
50 __do_page_fault
51 do_page_fault
52 page_fault
53 run
54 27
55
56 schedule
57 schedule_timeout
58 io_schedule_timeout
59 bit_wait_io
60 __wait_on_bit
61 wait_on_page_bit_killable
62 __lock_page_or_retry
63 filemap_fault
64 __do_fault
65 handle_mm_fault
66 __do_page_fault
67 do_page_fault
68 page_fault
69 clear_user
70 padzero
71 load_elf_binary
72 search_binary_handler
73 load_script
74 search_binary_handler
75 do_execveat_common.isra.27
76 run
77 28
78
79 schedule
80 schedule_timeout
81 io_schedule_timeout
82 bit_wait_io
83 __wait_on_bit
84 wait_on_page_bit_killable
85 __lock_page_or_retry
86 filemap_fault
87 __do_fault
88 handle_mm_fault
89 __do_page_fault
90 do_page_fault
91 page_fault
92 run
93 82
94
95 schedule
96 pipe_wait
97 pipe_read
98 __vfs_read
99 vfs_read
100 sys_read
101 entry_SYSCALL_64_fastpath
102 bash
103 94
104
105 schedule
106 rcu_gp_kthread
107 kthread
108 ret_from_fork
109 ddebug_tables
110 rcu_sched
111 104
112
113 schedule
114 schedule_timeout
115 io_schedule_timeout
116 bit_wait_io
117 __wait_on_bit
118 out_of_line_wait_on_bit
119 __wait_on_buffer
120 jbd2_journal_commit_transaction
121 kjournald2
122 kthread
123 ret_from_fork
124 mb_cache_list
125 jbd2/xvda1-8
126 986
127
128 schedule
129 schedule_timeout
130 io_schedule_timeout
131 bit_wait_io
132 __wait_on_bit
133 out_of_line_wait_on_bit
134 __wait_on_buffer
135 jbd2_journal_commit_transaction
136 kjournald2
137 kthread
138 ret_from_fork
139 mb_cache_list
140 jbd2/xvda1-8
141 6630
142
143 schedule
144 schedule_timeout
145 io_schedule_timeout
146 bit_wait_io
147 __wait_on_bit
148 out_of_line_wait_on_bit
149 do_get_write_access
150 jbd2_journal_get_write_access
151 __ext4_journal_get_write_access
152 ext4_mb_mark_diskspace_used
153 ext4_mb_new_blocks
154 ext4_ext_map_blocks
155 ext4_map_blocks
156 ext4_writepages
157 do_writepages
158 __filemap_fdatawrite_range
159 filemap_flush
160 ext4_alloc_da_blocks
161 ext4_rename
162 ext4_rename2
163 supervise
164 6645
165
166 schedule
167 schedule_timeout
168 io_schedule_timeout
169 bit_wait_io
170 __wait_on_bit
171 out_of_line_wait_on_bit
172 do_get_write_access
173 jbd2_journal_get_write_access
174 __ext4_journal_get_write_access
175 __ext4_new_inode
176 ext4_create
177 vfs_create
178 path_openat
179 do_filp_open
180 do_sys_open
181 sys_open
182 entry_SYSCALL_64_fastpath
183 supervise
184 12702
185
186 schedule
187 rcu_nocb_kthread
188 kthread
189 ret_from_fork
190 rcuos/2
191 16036
192
193 schedule
194 rcu_nocb_kthread
195 kthread
196 ret_from_fork
197 rcuos/4
198 24085
199
200 schedule
201 do_wait
202 sys_wait4
203 entry_SYSCALL_64_fastpath
204 run
205 233055
206
207 schedule
208 schedule_timeout
209 io_schedule_timeout
210 bit_wait_io
211 __wait_on_bit
212 wait_on_page_bit
213 truncate_inode_pages_range
214 truncate_inode_pages_final
215 ext4_evict_inode
216 evict
217 iput
218 __dentry_kill
219 dput
220 sys_rename
221 entry_SYSCALL_64_fastpath
222 supervise
223 297113
224
225 schedule
226 schedule_timeout
227 wait_woken
228 n_tty_read
229 tty_read
230 __vfs_read
231 vfs_read
232 sys_read
233 entry_SYSCALL_64_fastpath
234 bash
235 1789866
236
237 schedule
238 schedule_timeout
239 io_schedule_timeout
240 do_blockdev_direct_IO
241 __blockdev_direct_IO
242 blkdev_direct_IO
243 generic_file_read_iter
244 blkdev_read_iter
245 __vfs_read
246 vfs_read
247 sys_read
248 entry_SYSCALL_64_fastpath
249 dd
250 3310763
251
252 schedule
253 smpboot_thread_fn
254 kthread
255 ret_from_fork
256 watchdog/1
257 3999989
258
259 schedule
260 smpboot_thread_fn
261 kthread
262 ret_from_fork
263 watchdog/5
264 3999995
265
266 schedule
267 smpboot_thread_fn
268 kthread
269 ret_from_fork
270 watchdog/4
271 3999996
272
273 schedule
274 smpboot_thread_fn
275 kthread
276 ret_from_fork
277 watchdog/0
278 3999996
279
280 schedule
281 smpboot_thread_fn
282 kthread
283 ret_from_fork
284 watchdog/3
285 3999998
286
287 schedule
288 smpboot_thread_fn
289 kthread
290 ret_from_fork
291 watchdog/7
292 3999999
293
294 schedule
295 smpboot_thread_fn
296 kthread
297 ret_from_fork
298 watchdog/2
299 4000001
300
301 schedule
302 smpboot_thread_fn
303 kthread
304 ret_from_fork
305 watchdog/6
306 4000001
307
308 schedule
309 do_wait
310 sys_wait4
311 entry_SYSCALL_64_fastpath
312 bash
313 4039675
314
315 schedule
316 do_nanosleep
317 hrtimer_nanosleep
318 sys_nanosleep
319 entry_SYSCALL_64_fastpath
320 svscan
321 5000112
322
323 schedule
324 schedule_hrtimeout_range_clock
325 schedule_hrtimeout_range
326 poll_schedule_timeout
327 do_select
328 core_sys_select
329 sys_select
330 entry_SYSCALL_64_fastpath
331 snmpd
332 5998761
333
334 schedule
335 smpboot_thread_fn
336 kthread
337 ret_from_fork
338 migration/3
339 6149779
340
341 schedule
342 schedule_hrtimeout_range_clock
343 schedule_hrtimeout_range
344 poll_schedule_timeout
345 do_select
346 core_sys_select
347 sys_select
348 entry_SYSCALL_64_fastpath
349 ntpd
350 6999832
351
352 schedule
353 worker_thread
354 kthread
355 ret_from_fork
356 kworker/u16:2
357 7131941
358
359 schedule
360 worker_thread
361 kthread
362 ret_from_fork
363 kworker/3:0
364 7999844
365
366 schedule
367 worker_thread
368 kthread
369 ret_from_fork
370 kworker/1:1
371 7999872
372
373 schedule
374 worker_thread
375 kthread
376 ret_from_fork
377 kworker/2:1
378 7999889
379
380 schedule
381 worker_thread
382 kthread
383 ret_from_fork
384 kworker/5:1
385 7999936
386
387 schedule
388 worker_thread
389 kthread
390 ret_from_fork
391 kworker/7:1
392 7999938
393
394 schedule
395 worker_thread
396 kthread
397 ret_from_fork
398 kworker/6:1
399 7999940
400
401 schedule
402 do_nanosleep
403 hrtimer_nanosleep
404 sys_nanosleep
405 entry_SYSCALL_64_fastpath
406 tail
407 8000905
408
409 schedule
410 smpboot_thread_fn
411 kthread
412 ret_from_fork
413 migration/7
414 8197046
415
416 schedule
417 pipe_wait
418 pipe_read
419 __vfs_read
420 vfs_read
421 sys_read
422 entry_SYSCALL_64_fastpath
423 readproctitle
424 8197835
425
426 schedule
427 smpboot_thread_fn
428 kthread
429 ret_from_fork
430 migration/4
431 8201851
432
433 schedule
434 smpboot_thread_fn
435 kthread
436 ret_from_fork
437 migration/2
438 8203375
439
440 schedule
441 smpboot_thread_fn
442 kthread
443 ret_from_fork
444 migration/6
445 8208664
446
447 schedule
448 smpboot_thread_fn
449 kthread
450 ret_from_fork
451 migration/5
452 8209819
453
454 schedule
455 smpboot_thread_fn
456 kthread
457 ret_from_fork
458 ddebug_tables
459 migration/0
460 8211292
461
462 schedule
463 smpboot_thread_fn
464 kthread
465 ret_from_fork
466 migration/1
467 8212100
468
469 schedule
470 worker_thread
471 kthread
472 ret_from_fork
473 kworker/0:2
474 8270305
475
476 schedule
477 rcu_nocb_kthread
478 kthread
479 ret_from_fork
480 rcuos/3
481 8349697
482
483 schedule
484 rcu_nocb_kthread
485 kthread
486 ret_from_fork
487 rcuos/2
488 8363357
489
490 schedule
491 rcu_nocb_kthread
492 kthread
493 ret_from_fork
494 rcuos/1
495 8365338
496
497 schedule
498 schedule_timeout
499 xfs_buf_terminate
500 kthread
501 ret_from_fork
502 xfsaild/md0
503 8371514
504
505 schedule
506 rcu_nocb_kthread
507 kthread
508 ret_from_fork
509 rcuos/4
510 8384013
511
512 schedule
513 rcu_nocb_kthread
514 kthread
515 ret_from_fork
516 rcuos/5
517 8390016
518
519 schedule
520 rcu_nocb_kthread
521 kthread
522 ret_from_fork
523 ddebug_tables
524 rcuos/0
525 8405428
526
527 schedule
528 schedule_timeout
529 rcu_gp_kthread
530 kthread
531 ret_from_fork
532 ddebug_tables
533 rcu_sched
534 8406930
535
536 schedule
537 rcu_nocb_kthread
538 kthread
539 ret_from_fork
540 rcuos/7
541 8409575
542
543 schedule
544 rcu_nocb_kthread
545 kthread
546 ret_from_fork
547 rcuos/6
548 8415062
549
550 schedule
551 schedule_hrtimeout_range_clock
552 schedule_hrtimeout_range
553 poll_schedule_timeout
554 do_select
555 core_sys_select
556 sys_select
557 entry_SYSCALL_64_fastpath
558 offcputime
559 8421478
560
561 schedule
562 worker_thread
563 kthread
564 ret_from_fork
565 kworker/4:0
566 8421492
567
568 schedule
569 schedule_hrtimeout_range_clock
570 schedule_hrtimeout_range
571 poll_schedule_timeout
572 do_select
573 core_sys_select
574 sys_select
575 entry_SYSCALL_64_fastpath
576 sshd
577 14249005
578
579 schedule
580 schedule_hrtimeout_range_clock
581 schedule_hrtimeout_range
582 poll_schedule_timeout
583 do_sys_poll
584 sys_poll
585 entry_SYSCALL_64_fastpath
586 supervise
587 81670888
588
589Detaching...
590
591The last few stack traces aren't very interesting, since they are threads that
592are ofter blocked off-CPU waiting for work.
593
594Do be somewhat careful with overhead: this is tracing scheduler functions, which
595can be called very frequently. While this uses in-kernel summaries for
596efficiency, the rate of scheduler functions can be very high (> 1,000,000/sec),
597and this is performing stack walks when threads return to CPU. At some point
598the overhead will be measurable.
599
600
601A -p option can be used to filter (in-kernel) on a single process ID. For
602example, only matching PID 26651, which is a running "dd" command:
603
604# ./offcputime -p 26651
605Tracing off-CPU time (us) by kernel stack... Hit Ctrl-C to end.
606^C
607 schedule
608 schedule_timeout
609 io_schedule_timeout
610 do_blockdev_direct_IO
611 __blockdev_direct_IO
612 blkdev_direct_IO
613 generic_file_read_iter
614 blkdev_read_iter
615 __vfs_read
616 vfs_read
617 sys_read
618 entry_SYSCALL_64_fastpath
619 dd
620 2405710
621
622Detaching...
623
624The stack trace shows "dd" is blocked waiting on disk I/O, as expected, for a
625total of 2.4 seconds during tracing.
626
627
628A duration can be added, for example, tracing for 5 seconds only:
629
630# ./offcputime -p 26651 5
631Tracing off-CPU time (us) by kernel stack for 5 secs.
632
633 schedule
634 schedule_timeout
635 io_schedule_timeout
636 do_blockdev_direct_IO
637 __blockdev_direct_IO
638 blkdev_direct_IO
639 generic_file_read_iter
640 blkdev_read_iter
641 __vfs_read
642 vfs_read
643 sys_read
644 entry_SYSCALL_64_fastpath
645 dd
646 4413909
647
648Detaching...
649
650Here, dd was blocked for 4.4 seconds out of 5. Or put differently, likely
651on-CPU for about 12% of the time. Which matches the ratio seen by time(1):
652
653# time dd if=/dev/md0 iflag=direct of=/dev/null bs=1k
654^C108115+0 records in
655108114+0 records out
656110708736 bytes (111 MB) copied, 13.7565 s, 8.0 MB/s
657
658real 0m13.760s
659user 0m0.000s
660sys 0m1.739s
661
662
663A -f option will emit output using the "folded stacks" format, which can be
664read directly by flamegraph.pl from the FlameGraph open source software
665(https://github.com/brendangregg/FlameGraph). Eg:
666
667# ./offcputime -f 5
668bash;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;tty_read;n_tty_read;call_rwsem_down_read_failed;rwsem_down_read_failed;schedule 8
669yes;entry_SYSCALL_64_fastpath;sys_write;vfs_write;__vfs_write;tty_write;n_tty_write;call_rwsem_down_read_failed;rwsem_down_read_failed;schedule 14
670run;page_fault;do_page_fault;__do_page_fault;handle_mm_fault;__do_fault;filemap_fault;__lock_page_or_retry;wait_on_page_bit_killable;__wait_on_bit;bit_wait_io;io_schedule_timeout;schedule_timeout;schedule 33
671rcuos/4;ret_from_fork;kthread;rcu_nocb_kthread;schedule 45
672bash;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;pipe_read;pipe_wait;schedule 88
673run;page_fault;do_page_fault;__do_page_fault;handle_mm_fault;__do_fault;filemap_fault;__lock_page_or_retry;wait_on_page_bit_killable;__wait_on_bit;bit_wait_io;io_schedule_timeout;schedule_timeout;schedule 108
674jbd2/xvda1-8;mb_cache_list;ret_from_fork;kthread;kjournald2;jbd2_journal_commit_transaction;__wait_on_buffer;out_of_line_wait_on_bit;__wait_on_bit;bit_wait_io;io_schedule_timeout;schedule_timeout;schedule 828
675jbd2/xvda1-8;mb_cache_list;ret_from_fork;kthread;kjournald2;jbd2_journal_commit_transaction;__wait_on_buffer;out_of_line_wait_on_bit;__wait_on_bit;bit_wait_io;io_schedule_timeout;schedule_timeout;schedule 6201
676supervise;entry_SYSCALL_64_fastpath;sys_rename;dput;__dentry_kill;iput;evict;ext4_evict_inode;truncate_inode_pages_final;truncate_inode_pages_range;wait_on_page_bit;__wait_on_bit;bit_wait_io;io_schedule_timeout;schedule_timeout;schedule 41049
677run;entry_SYSCALL_64_fastpath;sys_wait4;do_wait;schedule 120709
678bash;entry_SYSCALL_64_fastpath;sys_wait4;do_wait;schedule 699320
679ksoftirqd/0;ret_from_fork;kthread;smpboot_thread_fn;schedule 1077529
680bash;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;tty_read;n_tty_read;wait_woken;schedule_timeout;schedule 1362045
681sshd;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule 1377627
682migration/0;ddebug_tables;ret_from_fork;kthread;smpboot_thread_fn;schedule 2040753
683snmpd;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule 2197568
684migration/5;ret_from_fork;kthread;smpboot_thread_fn;schedule 3079426
685migration/7;ret_from_fork;kthread;smpboot_thread_fn;schedule 3084746
686kworker/6:2;ret_from_fork;kthread;worker_thread;schedule 3940583
687kworker/5:1;ret_from_fork;kthread;worker_thread;schedule 3944892
688kworker/1:2;ret_from_fork;kthread;worker_thread;schedule 3999646
689ntpd;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule 3999904
690kworker/u16:0;ret_from_fork;kthread;worker_thread;schedule 3999967
691kworker/7:0;ret_from_fork;kthread;worker_thread;schedule 3999987
692tail;entry_SYSCALL_64_fastpath;sys_nanosleep;hrtimer_nanosleep;do_nanosleep;schedule 4000473
693migration/1;ret_from_fork;kthread;smpboot_thread_fn;schedule 4091150
694migration/4;ret_from_fork;kthread;smpboot_thread_fn;schedule 4095217
695readproctitle;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;pipe_read;pipe_wait;schedule 4108470
696migration/3;ret_from_fork;kthread;smpboot_thread_fn;schedule 4109264
697migration/2;ret_from_fork;kthread;smpboot_thread_fn;schedule 4109280
698migration/6;ret_from_fork;kthread;smpboot_thread_fn;schedule 4111143
699kworker/4:0;ret_from_fork;kthread;worker_thread;schedule 4402350
700kworker/3:0;ret_from_fork;kthread;worker_thread;schedule 4433988
701kworker/2:1;ret_from_fork;kthread;worker_thread;schedule 4636142
702kworker/0:2;ret_from_fork;kthread;worker_thread;schedule 4832023
703rcuos/1;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4974186
704rcuos/5;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4977137
705rcuos/6;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4987769
706rcuos/3;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4992282
707rcuos/4;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4992364
708rcuos/2;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4992714
709rcuos/0;ddebug_tables;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4996504
710rcuos/7;ret_from_fork;kthread;rcu_nocb_kthread;schedule 4998497
711rcu_sched;ddebug_tables;ret_from_fork;kthread;rcu_gp_kthread;schedule_timeout;schedule 5000686
712offcputime;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule 5005063
713dd;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;blkdev_read_iter;generic_file_read_iter;blkdev_direct_IO;__blockdev_direct_IO;do_blockdev_direct_IO;io_schedule_timeout;schedule_timeout;schedule 8025599
714supervise;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule 40835611
715
716The stack traces are shown as single lines, with functions separated by
717semicolons. The first entry is the task name. The 2nd column is the total
718off-CPU time.
719
720I'd save this output to a file, then move it to the system where you'll be
721creating your "off-CPU time flame graphs".
722
723
724USAGE message:
725
Brendan Gregg670a6aa2016-01-19 17:15:02 -0800726# ./offcputime -h
727usage: offcputime [-h] [-u] [-p PID] [-v] [-f] [duration]
Brendan Gregg38cef482016-01-15 17:26:30 -0800728
729Summarize off-CPU time by kernel stack trace
730
731positional arguments:
732 duration duration of trace, in seconds
733
734optional arguments:
735 -h, --help show this help message and exit
Brendan Gregg670a6aa2016-01-19 17:15:02 -0800736 -u, --useronly user threads only (no kernel threads)
Brendan Gregg38cef482016-01-15 17:26:30 -0800737 -p PID, --pid PID trace this PID only
738 -v, --verbose show raw addresses
739 -f, --folded output folded format
740
741examples:
742 ./offcputime # trace off-CPU stack time until Ctrl-C
743 ./offcputime 5 # trace for 5 seconds only
744 ./offcputime -f 5 # 5 seconds, and output in folded format
Brendan Gregg670a6aa2016-01-19 17:15:02 -0800745 ./offcputime -u # don't include kernel threads (user only)
Alex Bagehot3b9679a2016-02-06 16:01:02 +0000746 ./offcputime -p 185 # trace for PID 185 only