blob: d2d6448500fc0692135f8ba29032a61a8f86c923 [file] [log] [blame]
Demonstrations of stackcount, the Linux eBPF/bcc version.
This program traces functions and frequency counts them with their entire
stack trace, summarized in-kernel for efficiency. For example, counting
stack traces that led to submit_bio(), which creates block device I/O:
# ./stackcount submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
submit_bh
journal_submit_commit_record.isra.13
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
1
submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
2
submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__filemap_fdatawrite_range
filemap_fdatawrite
fdatawrite_one_bdev
36
submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
38
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
79
Detaching...
The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, ext4_rename(), and filemap_flush(): looks like an
application issued file rename has caused back end disk I/O due to ext4
block allocation and a filemap_flush(). I'd have to browse the code to those
functions to confirm!
The order of printed stack traces is from least to most frequent. The most
frequent in this case, the ext4_rename() stack, was taken 79 times during
tracing.
It can be useful to trace the path to submit_bio to explain unusual rates of
disk IOPS. These could have in-kernel origins (eg, background scrub).
As another example, here are the code paths that led to ip_output(), which
sends a packet at the IP level:
# ./stackcount ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
tcp_tsq_handler.part.32
tcp_tasklet_func
tasklet_action
__do_softirq
do_softirq_own_stack
do_softirq
__local_bh_enable_ip
1
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_send_ack
tcp_send_delayed_ack
__tcp_ack_snd_check
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
1
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_send_ack
tcp_send_delayed_ack
__tcp_ack_snd_check
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
1
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_send_ack
tcp_delack_timer_handler
tcp_delack_timer
call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
xen_evtchn_do_upcall
1
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_rcv_established
tcp_v4_do_rcv
release_sock
tcp_sendmsg
inet_sendmsg
sock_sendmsg
3
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
tcp_tsq_handler.part.32
tcp_tasklet_func
tasklet_action
__do_softirq
run_ksoftirqd
smpboot_thread_fn
kthread
3
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
4
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_send_ack
tcp_cleanup_rbuf
tcp_recvmsg
inet_recvmsg
sock_recvmsg
sock_read_iter
__vfs_read
vfs_read
5
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_rcv_established
tcp_v4_do_rcv
release_sock
tcp_sendmsg
inet_sendmsg
sock_sendmsg
9
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_push
tcp_sendmsg
inet_sendmsg
sock_sendmsg
sock_write_iter
__vfs_write
51
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
171
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
994
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
tcp_tsq_handler.part.32
tcp_tasklet_func
tasklet_action
__do_softirq
irq_exit
xen_evtchn_do_upcall
xen_do_hypervisor_callback
1002
Detaching...
The last two most frequent stack traces are via tcp_transmit_skb(). Note the
send last begins with ip_rcv_finish(), for a local receive, which then becomes
a transmit: likely pushing more frames when processing the newly received ones.
As may be obvious, this is a great tool for quickly understanding kernel code
flow.
User-space functions can also be traced if a library name is provided. For
example, to quickly identify code locations that allocate heap memory:
# ./stackcount -l c -p 4902 malloc
Tracing 1 functions for "malloc"... Hit Ctrl-C to end.
^C
malloc
rbtree_new
main
[unknown]
12
malloc
_rbtree_node_new_internal
_rbtree_node_insert
rbtree_insert
main
[unknown]
1189
Detaching...
Note that user-space uses of stackcount can be somewhat more limited because
a lot of user-space libraries and binaries are compiled without debuginfo, or
with frame-pointer omission (-fomit-frame-pointer), which makes it impossible
to reliably obtain the stack trace.
In addition to kernel and user-space functions, kernel tracepoints and USDT
tracepoints are also supported.
For example, to determine where threads are being created in a particular
process, use the pthread_create USDT tracepoint:
# ./stackcount -p $(pidof parprimes) u:pthread:pthread_create
Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
^C
parprimes [11923]
pthread_create@@GLIBC_2.2.5
main
__libc_start_main
[unknown]
7
Similarly, to determine where context switching is happening in the kernel,
use the sched:sched_switch kernel tracepoint:
# ./stackcount t:sched:sched_switch
... (omitted for brevity)
__schedule
schedule
schedule_hrtimeout_range_clock
schedule_hrtimeout_range
poll_schedule_timeout
do_select
core_sys_select
SyS_select
entry_SYSCALL_64_fastpath
40
__schedule
schedule
schedule_preempt_disabled
cpu_startup_entry
start_secondary
85
A -i option can be used to set an output interval, and -T to include a
timestamp. For example:
# ./stackcount -Ti 1 submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
01:11:37
submit_bio
submit_bh
journal_submit_commit_record.isra.13
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
1
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
20
submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
39
01:11:38
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
20
01:11:39
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
20
^C
01:11:39
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
20
Detaching...
The -s output prints the return instruction offset for each function (aka
symbol offset). Eg:
# ./stackcount -s tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
tcp_sendmsg0x1
sock_sendmsg0x38
sock_write_iter0x78
__vfs_write0xaa
vfs_write0xa9
sys_write0x46
entry_SYSCALL_64_fastpath0x16
29
Detaching...
If it wasn't clear how one function called another, knowing the instruction
offset can help you locate the lines of code from a disassembly dump.
A wildcard can also be used. Eg, all functions beginning with "tcp_send":
# ./stackcount -s 'tcp_send*'
Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end.
^C
tcp_send_delayed_ack0x1
tcp_rcv_established0x3b1
tcp_v4_do_rcv0x130
tcp_v4_rcv0x8e0
ip_local_deliver_finish0x9f
ip_local_deliver0x51
ip_rcv_finish0x8a
ip_rcv0x29d
__netif_receive_skb_core0x637
__netif_receive_skb0x18
netif_receive_skb_internal0x23
1
tcp_send_delayed_ack0x1
tcp_rcv_established0x222
tcp_v4_do_rcv0x130
tcp_v4_rcv0x8e0
ip_local_deliver_finish0x9f
ip_local_deliver0x51
ip_rcv_finish0x8a
ip_rcv0x29d
__netif_receive_skb_core0x637
__netif_receive_skb0x18
netif_receive_skb_internal0x23
4
tcp_send_mss0x1
inet_sendmsg0x67
sock_sendmsg0x38
sock_write_iter0x78
__vfs_write0xaa
vfs_write0xa9
sys_write0x46
entry_SYSCALL_64_fastpath0x16
7
tcp_sendmsg0x1
sock_sendmsg0x38
sock_write_iter0x78
__vfs_write0xaa
vfs_write0xa9
sys_write0x46
entry_SYSCALL_64_fastpath0x16
7
Detaching...
Use -r to allow regular expressions.
USAGE message:
# ./stackcount -h
usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s]
[-l LIBRARY] [-v] [-d] pattern
Count function calls and their stack traces
positional arguments:
pattern search expression for functions
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-i INTERVAL, --interval INTERVAL
summary interval, seconds
-T, --timestamp include timestamp on output
-r, --regexp use regular expressions. Default is "*" wildcards
only.
-s, --offset show address offsets
-l, --library trace user-space functions from this library or executable
-v, --verbose show raw addresses
-d, --debug print BPF program before starting (for debugging purposes)
examples:
./stackcount submit_bio # count kernel stack traces for submit_bio
./stackcount ip_output # count kernel stack traces for ip_output
./stackcount -s ip_output # show symbol offsets
./stackcount -sv ip_output # show offsets and raw addresses (verbose)
./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send*
./stackcount -r '^tcp_send.*' # same as above, using regular expressions
./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps
./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only
./stackcount -p 185 -l c malloc # count stacks for malloc in PID 185
./stackcount t:sched:sched_fork # count stacks for the sched_fork tracepoint
./stackcount -p 185 u:node:* # count stacks for all USDT probes in node