			kmemtrace - Kernel Memory Tracer

			  by Eduard - Gabriel Munteanu
			     <eduard.munteanu@linux360.ro>

I. Introduction
===============

kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much

To do this, we trace every allocation and export information to userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.
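
As a small worked example of the two byte counts mentioned above, internal
fragmentation is the allocated size minus the requested size. The sketch
below is not part of kmemtrace: frag() is a hypothetical helper, and the
sizes are made up for illustration (a 100-byte request served from a
128-byte object).

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of kmemtrace).
# Prints the internal fragmentation of a single allocation.
#   $1 = bytes requested, $2 = bytes actually allocated
frag() {
	req=$1
	alloc=$2
	# Avoid dividing by zero on a bogus record.
	[ "$alloc" -gt 0 ] || return 1
	wasted=$((alloc - req))
	# Integer percentage of the allocated size that is wasted.
	echo "$wasted bytes wasted ($((wasted * 100 / alloc))%)"
}

frag 100 128	# prints: 28 bytes wasted (21%)
```

Summing these per-allocation figures per call-site or over the whole trace
gives the kind of totals the userspace tool reports.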

The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (at the time of writing) can provide the following
information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)

Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).

kmemtrace also makes provisions for tracing on one architecture and analysing
the data on another.

II. Design and goals
====================

kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting are done asynchronously, that is, after the
data has been collected and stored. By design, it allows one to log and
analyse on different machines and different architectures.

At the time of writing, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.

Summary of design goals:
- allow logging and analysis to be done across different machines
- be fast and anticipate usage in high-load environments (*)
- be reasonably extensible
- make it possible for GNU/Linux distributions to have kmemtrace
  included in their repositories

(*) - one of the reasons Pekka Enberg's original userspace data analysis
    tool's code was rewritten from Perl to C (although this is more than a
    simple conversion)


III. Quick usage guide
======================

1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE).

2) Get the userspace tool and build it:
$ git clone git://repo.or.cz/kmemtrace-user.git		# current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make

3) Boot the kmemtrace-enabled kernel if you haven't already, preferably in
the 'single' runlevel (so that the relay buffers don't fill up easily), and
run kmemtrace. Note that '$' below denotes a root prompt, not an unprivileged
user:
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns	# Should be zero; a
							# non-zero value means
							# the buffers overran.
Optionally, run kmemtrace_check separately on each cpu[0-9]*.out file to
check its correctness.
$ ./kmemtrace-report

Now you should have a short summary of how the allocator performs.
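
The optional checking step in 3) can be scripted as a loop over the per-CPU
trace files. This is only a sketch: check_traces() is a hypothetical helper,
and it ASSUMES the checker accepts the trace file as its only argument, which
this document does not specify. Adjust the invocation to match the tool you
built.

```shell
#!/bin/sh
# Hypothetical helper: run a checker on every cpu[0-9]*.out file.
#   $1 = directory holding the trace files (default: current directory)
#   $2 = checker program (default: ./kmemtrace_check)
# ASSUMPTION: the checker takes the trace file as its only argument.
check_traces() {
	dir="${1:-.}"
	checker="${2:-./kmemtrace_check}"
	failed=0
	for f in "$dir"/cpu[0-9]*.out; do
		# If the glob matched nothing, the pattern itself is returned.
		[ -e "$f" ] || continue
		if "$checker" "$f"; then
			echo "ok: $f"
		else
			echo "FAILED: $f"
			failed=1
		fi
	done
	# Non-zero exit status if any file failed the check.
	return "$failed"
}
```

From the kmemtrace-user directory, this would be called as
'check_traces . ./kmemtrace_check' before running the report.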

IV. FAQ and known issues
========================

Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: A non-zero value means the relay buffers overran, which reduces
kmemtrace's accuracy; the larger the number, the greater the effect. You can
fix it by supplying a higher 'kmemtrace.subbufs=N' kernel parameter.
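
The overrun check can also be wrapped in a small script. The following is a
minimal sketch, not part of kmemtrace: check_overruns() is a hypothetical
helper, and the only kmemtrace-specific detail it relies on is the
total_overruns path given above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the relay buffers overran.
#   $1 = file holding the overrun count
#        (default: the debugfs path used in this document)
# Returns 0 if there were no overruns, 1 if there were, 2 if unreadable.
check_overruns() {
	file="${1:-/sys/kernel/debug/kmemtrace/total_overruns}"
	if [ ! -r "$file" ]; then
		echo "cannot read $file (is debugfs mounted?)" >&2
		return 2
	fi
	count=$(cat "$file")
	if [ "$count" -eq 0 ]; then
		echo "no overruns"
		return 0
	fi
	echo "$count overruns; consider a larger kmemtrace.subbufs=N"
	return 1
}
```

Run 'check_overruns' as root after stopping kmemtraced; a non-zero exit
status indicates that the collected data is incomplete or could not be
checked.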
---

Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
- possible bugs in relay code
- possible misuse of relay by kmemtrace
- timestamps being collected out of order
Alternatively, you may fix it yourself and send us a patch.
---

Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.

It may also be due to a lack of tracing in some unusual allocator functions.

We don't want bug reports regarding this issue yet.
---

V. See also
===========

Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace
