| Demonstrations of stackcount, the Linux eBPF/bcc version. |
| |
| |
| This program traces functions and frequency counts them with their entire |
| stack trace, summarized in-kernel for efficiency. For example, counting |
| stack traces that led to submit_bio(), which creates block device I/O: |
| |
| # ./stackcount submit_bio |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. |
| ^C |
| submit_bio |
| submit_bh |
| journal_submit_commit_record.isra.13 |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 1 |
| |
| submit_bio |
| __block_write_full_page.constprop.39 |
| block_write_full_page |
| blkdev_writepage |
| __writepage |
| write_cache_pages |
| generic_writepages |
| do_writepages |
| __writeback_single_inode |
| writeback_sb_inodes |
| __writeback_inodes_wb |
| 2 |
| |
| submit_bio |
| __block_write_full_page.constprop.39 |
| block_write_full_page |
| blkdev_writepage |
| __writepage |
| write_cache_pages |
| generic_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_fdatawrite |
| fdatawrite_one_bdev |
| 36 |
| |
| submit_bio |
| submit_bh |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 38 |
| |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 79 |
| |
| Detaching... |
| |
| The output shows unique stack traces, in order from leaf (on-CPU) to root, |
| followed by their occurrence count. The last stack trace in the above output |
| shows syscall handling, ext4_rename(), and filemap_flush(): looks like an |
| application issued file rename has caused back end disk I/O due to ext4 |
| block allocation and a filemap_flush(). I'd have to browse the code to those |
| functions to confirm! |
| |
| The order of printed stack traces is from least to most frequent. The most |
| frequent in this case, the ext4_rename() stack, was taken 79 times during |
| tracing. |
| |
| It can be useful to trace the path to submit_bio to explain unusual rates of |
| disk IOPS. These could have in-kernel origins (eg, background scrub). |
| |
| |
| As another example, here are the code paths that led to ip_output(), which |
| sends a packet at the IP level: |
| |
| # ./stackcount ip_output |
| Tracing 1 functions for "ip_output"... Hit Ctrl-C to end. |
| ^C |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| do_softirq_own_stack |
| do_softirq |
| __local_bh_enable_ip |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_send_delayed_ack |
| __tcp_ack_snd_check |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_send_delayed_ack |
| __tcp_ack_snd_check |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_delack_timer_handler |
| tcp_delack_timer |
| call_timer_fn |
| run_timer_softirq |
| __do_softirq |
| irq_exit |
| xen_evtchn_do_upcall |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| release_sock |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| 3 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| run_ksoftirqd |
| smpboot_thread_fn |
| kthread |
| 3 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 4 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_cleanup_rbuf |
| tcp_recvmsg |
| inet_recvmsg |
| sock_recvmsg |
| sock_read_iter |
| __vfs_read |
| vfs_read |
| 5 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| release_sock |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| 9 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_push |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| sock_write_iter |
| __vfs_write |
| 51 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 171 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 994 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| irq_exit |
| xen_evtchn_do_upcall |
| xen_do_hypervisor_callback |
| 1002 |
| |
| Detaching... |
| |
| The last two most frequent stack traces are via tcp_transmit_skb(). Note the |
| send last begins with ip_rcv_finish(), for a local receive, which then becomes |
| a transmit: likely pushing more frames when processing the newly received ones. |
| |
| As may be obvious, this is a great tool for quickly understanding kernel code |
| flow. |
| |
| |
| User-space functions can also be traced if a library name is provided. For |
| example, to quickly identify code locations that allocate heap memory: |
| |
| # ./stackcount -l c -p 4902 malloc |
| Tracing 1 functions for "malloc"... Hit Ctrl-C to end. |
| ^C |
| malloc |
| rbtree_new |
| main |
| [unknown] |
| 12 |
| |
| malloc |
| _rbtree_node_new_internal |
| _rbtree_node_insert |
| rbtree_insert |
| main |
| [unknown] |
| 1189 |
| |
| Detaching... |
| |
| Note that user-space uses of stackcount can be somewhat more limited because |
| a lot of user-space libraries and binaries are compiled without debuginfo, or |
| with frame-pointer omission (-fomit-frame-pointer), which makes it impossible |
| to reliably obtain the stack trace. |
| |
| |
| In addition to kernel and user-space functions, kernel tracepoints and USDT |
| tracepoints are also supported. |
| |
| For example, to determine where threads are being created in a particular |
| process, use the pthread_create USDT tracepoint: |
| |
| # ./stackcount -p $(pidof parprimes) u:pthread:pthread_create |
| Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end. |
| ^C |
| |
| parprimes [11923] |
| pthread_create@@GLIBC_2.2.5 |
| main |
| __libc_start_main |
| [unknown] |
| 7 |
| |
| Similarly, to determine where context switching is happening in the kernel, |
| use the sched:sched_switch kernel tracepoint: |
| |
| # ./stackcount t:sched:sched_switch |
| ... (omitted for brevity) |
| |
| __schedule |
| schedule |
| schedule_hrtimeout_range_clock |
| schedule_hrtimeout_range |
| poll_schedule_timeout |
| do_select |
| core_sys_select |
| SyS_select |
| entry_SYSCALL_64_fastpath |
| 40 |
| |
| __schedule |
| schedule |
| schedule_preempt_disabled |
| cpu_startup_entry |
| start_secondary |
| 85 |
| |
| |
| A -i option can be used to set an output interval, and -T to include a |
| timestamp. For example: |
| |
| # ./stackcount -Ti 1 submit_bio |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. |
| |
| 01:11:37 |
| submit_bio |
| submit_bh |
| journal_submit_commit_record.isra.13 |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 1 |
| |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| submit_bio |
| submit_bh |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 39 |
| |
| |
| 01:11:38 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| |
| 01:11:39 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| ^C |
| 01:11:39 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| Detaching... |
| |
| |
| The -s output prints the return instruction offset for each function (aka |
| symbol offset). Eg: |
| |
| # ./stackcount -s tcp_sendmsg |
| Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. |
| ^C |
| tcp_sendmsg0x1 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 29 |
| |
| Detaching... |
| |
| If it wasn't clear how one function called another, knowing the instruction |
| offset can help you locate the lines of code from a disassembly dump. |
| |
| |
| A wildcard can also be used. Eg, all functions beginning with "tcp_send": |
| |
| # ./stackcount -s 'tcp_send*' |
| Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end. |
| ^C |
| tcp_send_delayed_ack0x1 |
| tcp_rcv_established0x3b1 |
| tcp_v4_do_rcv0x130 |
| tcp_v4_rcv0x8e0 |
| ip_local_deliver_finish0x9f |
| ip_local_deliver0x51 |
| ip_rcv_finish0x8a |
| ip_rcv0x29d |
| __netif_receive_skb_core0x637 |
| __netif_receive_skb0x18 |
| netif_receive_skb_internal0x23 |
| 1 |
| |
| tcp_send_delayed_ack0x1 |
| tcp_rcv_established0x222 |
| tcp_v4_do_rcv0x130 |
| tcp_v4_rcv0x8e0 |
| ip_local_deliver_finish0x9f |
| ip_local_deliver0x51 |
| ip_rcv_finish0x8a |
| ip_rcv0x29d |
| __netif_receive_skb_core0x637 |
| __netif_receive_skb0x18 |
| netif_receive_skb_internal0x23 |
| 4 |
| |
| tcp_send_mss0x1 |
| inet_sendmsg0x67 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 7 |
| |
| tcp_sendmsg0x1 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 7 |
| |
| Detaching... |
| |
| Use -r to allow regular expressions. |
| |
| |
| USAGE message: |
| |
| # ./stackcount -h |
| usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s] |
| [-l LIBRARY] [-v] [-d] pattern |
| |
| Count function calls and their stack traces |
| |
| positional arguments: |
| pattern search expression for functions |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -p PID, --pid PID trace this PID only |
| -i INTERVAL, --interval INTERVAL |
| summary interval, seconds |
| -T, --timestamp include timestamp on output |
| -r, --regexp use regular expressions. Default is "*" wildcards |
| only. |
| -s, --offset show address offsets |
| -l, --library trace user-space functions from this library or executable |
| -v, --verbose show raw addresses |
| -d, --debug print BPF program before starting (for debugging purposes) |
| |
| examples: |
| ./stackcount submit_bio # count kernel stack traces for submit_bio |
| ./stackcount ip_output # count kernel stack traces for ip_output |
| ./stackcount -s ip_output # show symbol offsets |
| ./stackcount -sv ip_output # show offsets and raw addresses (verbose) |
| ./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send* |
| ./stackcount -r '^tcp_send.*' # same as above, using regular expressions |
| ./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps |
| ./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only |
| ./stackcount -p 185 -l c malloc # count stacks for malloc in PID 185 |
| ./stackcount t:sched:sched_fork # count stacks for the sched_fork tracepoint |
| ./stackcount -p 185 u:node:* # count stacks for all USDT probes in node |