| Demonstrations of stackcount, the Linux eBPF/bcc version. |
| |
| |
| This program traces kernel functions and frequency counts them with their entire |
| kernel stack trace, summarized in-kernel for efficiency. For example, counting |
| stack traces that led to submit_bio(), which creates block device I/O: |
| |
| # ./stackcount submit_bio |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. |
| ^C |
| submit_bio |
| submit_bh |
| journal_submit_commit_record.isra.13 |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 1 |
| |
| submit_bio |
| __block_write_full_page.constprop.39 |
| block_write_full_page |
| blkdev_writepage |
| __writepage |
| write_cache_pages |
| generic_writepages |
| do_writepages |
| __writeback_single_inode |
| writeback_sb_inodes |
| __writeback_inodes_wb |
| 2 |
| |
| submit_bio |
| __block_write_full_page.constprop.39 |
| block_write_full_page |
| blkdev_writepage |
| __writepage |
| write_cache_pages |
| generic_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_fdatawrite |
| fdatawrite_one_bdev |
| 36 |
| |
| submit_bio |
| submit_bh |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 38 |
| |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 79 |
| |
| Detaching... |
| |
| The output shows unique stack traces, in order from leaf (on-CPU) to root, |
| followed by their occurrence count. The last stack trace in the above output |
| shows syscall handling, ext4_rename(), and filemap_flush(): looks like an |
| application issued file rename has caused back end disk I/O due to ext4 |
| block allocation and a filemap_flush(). I'd have to browse the code to those |
| functions to confirm! |
| |
| The order of printed stack traces is from least to most frequent. The most |
| frequent in this case, the ext4_rename() stack, was taken 79 times during |
| tracing. |
| |
| It can be useful to trace the path to submit_bio to explain unusual rates of |
| disk IOPS. These could have in-kernel origins (eg, background scrub). |
| |
| This version of stackcount truncates stacks to 10 levels deep (plus 1 for |
| the traced function, so 11). |
| |
| |
| As another example, here are the code paths that led to ip_output(), which |
| sends a packet at the IP level: |
| |
| # ./stackcount ip_output |
| Tracing 1 functions for "ip_output"... Hit Ctrl-C to end. |
| ^C |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| do_softirq_own_stack |
| do_softirq |
| __local_bh_enable_ip |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_send_delayed_ack |
| __tcp_ack_snd_check |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_send_delayed_ack |
| __tcp_ack_snd_check |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_delack_timer_handler |
| tcp_delack_timer |
| call_timer_fn |
| run_timer_softirq |
| __do_softirq |
| irq_exit |
| xen_evtchn_do_upcall |
| 1 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| release_sock |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| 3 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| run_ksoftirqd |
| smpboot_thread_fn |
| kthread |
| 3 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 4 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_send_ack |
| tcp_cleanup_rbuf |
| tcp_recvmsg |
| inet_recvmsg |
| sock_recvmsg |
| sock_read_iter |
| __vfs_read |
| vfs_read |
| 5 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| release_sock |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| 9 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_push |
| tcp_sendmsg |
| inet_sendmsg |
| sock_sendmsg |
| sock_write_iter |
| __vfs_write |
| 51 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 171 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| __tcp_push_pending_frames |
| tcp_rcv_established |
| tcp_v4_do_rcv |
| tcp_v4_rcv |
| ip_local_deliver_finish |
| ip_local_deliver |
| ip_rcv_finish |
| 994 |
| |
| ip_output |
| ip_queue_xmit |
| tcp_transmit_skb |
| tcp_write_xmit |
| tcp_tsq_handler.part.32 |
| tcp_tasklet_func |
| tasklet_action |
| __do_softirq |
| irq_exit |
| xen_evtchn_do_upcall |
| xen_do_hypervisor_callback |
| 1002 |
| |
| Detaching... |
| |
| The last two most frequent stack traces are via tcp_transmit_skb(). Note the |
| send last begins with ip_rcv_finish(), for a local receive, which then becomes |
| a transmit: likely pushing more frames when processing the newly received ones. |
| |
| As may be obvious, this is a great tool for quickly understanding kernel code |
| flow. |
| |
| |
| A -i option can be used to set an output interval, and -T to include a |
| timestamp. For example: |
| |
| # ./stackcount -Ti 1 submit_bio |
| Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. |
| |
| 01:11:37 |
| submit_bio |
| submit_bh |
| journal_submit_commit_record.isra.13 |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 1 |
| |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| submit_bio |
| submit_bh |
| jbd2_journal_commit_transaction |
| kjournald2 |
| kthread |
| ret_from_fork |
| mb_cache_list |
| 39 |
| |
| |
| 01:11:38 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| |
| 01:11:39 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| ^C |
| 01:11:39 |
| submit_bio |
| ext4_writepages |
| do_writepages |
| __filemap_fdatawrite_range |
| filemap_flush |
| ext4_alloc_da_blocks |
| ext4_rename |
| ext4_rename2 |
| vfs_rename |
| sys_rename |
| entry_SYSCALL_64_fastpath |
| 20 |
| |
| Detaching... |
| |
| |
| The -s output prints the return instruction offset for each function (aka |
| symbol offset). Eg: |
| |
| # ./stackcount -s tcp_sendmsg |
| Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. |
| ^C |
| tcp_sendmsg0x1 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 29 |
| |
| Detaching... |
| |
| If it wasn't clear how one function called another, knowing the instruction |
| offset can help you locate the lines of code from a dissassembly dump. |
| |
| |
| A wildcard can also be used. Eg, all functions beginning with "tcp_send": |
| |
| # ./stackcount -s 'tcp_send*' |
| Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end. |
| ^C |
| tcp_send_delayed_ack0x1 |
| tcp_rcv_established0x3b1 |
| tcp_v4_do_rcv0x130 |
| tcp_v4_rcv0x8e0 |
| ip_local_deliver_finish0x9f |
| ip_local_deliver0x51 |
| ip_rcv_finish0x8a |
| ip_rcv0x29d |
| __netif_receive_skb_core0x637 |
| __netif_receive_skb0x18 |
| netif_receive_skb_internal0x23 |
| 1 |
| |
| tcp_send_delayed_ack0x1 |
| tcp_rcv_established0x222 |
| tcp_v4_do_rcv0x130 |
| tcp_v4_rcv0x8e0 |
| ip_local_deliver_finish0x9f |
| ip_local_deliver0x51 |
| ip_rcv_finish0x8a |
| ip_rcv0x29d |
| __netif_receive_skb_core0x637 |
| __netif_receive_skb0x18 |
| netif_receive_skb_internal0x23 |
| 4 |
| |
| tcp_send_mss0x1 |
| inet_sendmsg0x67 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 7 |
| |
| tcp_sendmsg0x1 |
| sock_sendmsg0x38 |
| sock_write_iter0x78 |
| __vfs_write0xaa |
| vfs_write0xa9 |
| sys_write0x46 |
| entry_SYSCALL_64_fastpath0x16 |
| 7 |
| |
| Detaching... |
| |
| Use -r to allow regular expressions. |
| |
| |
| USAGE message: |
| |
| # ./stackcount -h |
| usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s] [-v] pattern |
| |
| Count kernel function calls and their stack traces |
| |
| positional arguments: |
| pattern search expression for kernel functions |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -p PID, --pid PID trace this PID only |
| -i INTERVAL, --interval INTERVAL |
| summary interval, seconds |
| -T, --timestamp include timestamp on output |
| -r, --regexp use regular expressions. Default is "*" wildcards |
| only. |
| -s, --offset show address offsets |
| -v, --verbose show raw addresses |
| |
| examples: |
| ./stackcount submit_bio # count kernel stack traces for submit_bio |
| ./stackcount ip_output # count kernel stack traces for ip_output |
| ./stackcount -s ip_output # show symbol offsets |
| ./stackcount -sv ip_output # show offsets and raw addresses (verbose) |
| ./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send* |
| ./stackcount -r '^tcp_send.*' # same as above, using regular expressions |
| ./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps |
| ./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only |