Demonstrations of compactsnoop, the Linux eBPF/bcc version.


compactsnoop traces zone compaction system-wide and prints various details.
Example output (compaction triggered manually with
echo 1 > /proc/sys/vm/compact_memory):

# ./compactsnoop
COMM           PID    NODE ZONE         ORDER MODE  LAT(ms) STATUS
zsh            23685  0    ZONE_DMA     -1    SYNC  0.025   complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  3.925   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  113.975 complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  81.57   complete
zsh            23685  0    ZONE_DMA     -1    SYNC  0.02    complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  4.631   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  113.975 complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  80.647  complete
zsh            23685  0    ZONE_DMA     -1    SYNC  0.020   complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  3.367   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  115.18  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  81.766  complete
zsh            23685  0    ZONE_DMA     -1    SYNC  0.025   complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  4.346   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  114.570 complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  80.820  complete
zsh            23685  0    ZONE_DMA     -1    SYNC  0.026   complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  4.611   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  113.993 complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  80.928  complete
zsh            23685  0    ZONE_DMA     -1    SYNC  0.02    complete
zsh            23685  0    ZONE_DMA32   -1    SYNC  3.889   complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC  113.776 complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC  80.727  complete
^C

While tracing, processes were allocating pages, but memory fragmentation was
too severe to satisfy their requests for contiguous memory, so zone compaction
events occurred. These events increase the time the processes spend waiting.

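To see where that waiting time goes, the output can be post-processed. The
following is a minimal sketch, not part of the tool: it assumes the default
eight-column output (no -T or -e) and a COMM without spaces, and the function
name latency_by_zone is made up for this example.

```python
from collections import defaultdict

# A few rows copied from the example output above (default columns only).
SAMPLE_OUTPUT = """\
COMM PID NODE ZONE ORDER MODE LAT(ms) STATUS
zsh 23685 0 ZONE_DMA -1 SYNC 0.025 complete
zsh 23685 0 ZONE_DMA32 -1 SYNC 3.925 complete
zsh 23685 0 ZONE_NORMAL -1 SYNC 113.975 complete
zsh 23685 1 ZONE_NORMAL -1 SYNC 81.57 complete
"""


def latency_by_zone(text):
    """Sum LAT(ms) per (node, zone) for default-format compactsnoop output."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, "^C", and anything that is not an
        # eight-column event row.
        if len(fields) != 8 or fields[0] == "COMM":
            continue
        _comm, _pid, node, zone, _order, _mode, lat_ms, _status = fields
        totals[(int(node), zone)] += float(lat_ms)
    return dict(totals)


if __name__ == "__main__":
    for (node, zone), total in sorted(latency_by_zone(SAMPLE_OUTPUT).items()):
        print("node %d %-12s %.3f ms" % (node, zone, total))
```

In the sample rows, almost all of the latency comes from ZONE_NORMAL on the two
nodes.
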
When the compact_stall counter in /proc/vmstat keeps increasing, compactsnoop
can help determine whether the stalls are caused by critical processes.

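Whether compact_stall is still increasing can be checked by comparing two
snapshots of /proc/vmstat. A minimal sketch follows; the helper names are
invented for this example, and the counter values in the demo are placeholders
rather than measurements.

```python
def read_compact_stall(vmstat_text):
    """Return the compact_stall counter from /proc/vmstat-style text, or None."""
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "compact_stall":
            return int(parts[1])
    return None


def stall_delta(before_text, after_text):
    """Return how much compact_stall grew between two snapshots, or None."""
    before = read_compact_stall(before_text)
    after = read_compact_stall(after_text)
    if before is None or after is None:
        return None
    return after - before


if __name__ == "__main__":
    # Placeholder snapshots; on a real system, read /proc/vmstat twice with
    # open("/proc/vmstat").read() and a pause in between.
    first = "compact_fail 3\ncompact_stall 10\ncompact_success 7\n"
    second = "compact_fail 3\ncompact_stall 14\ncompact_success 11\n"
    print("compact_stall grew by", stall_delta(first, second))
```

A delta that stays above zero across intervals is the cue to run compactsnoop
and see which processes are stalling.
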
The STATUS values include (CentOS 7.6's kernel):

    compact_status = {
        # COMPACT_SKIPPED: compaction didn't start as it was not possible or direct reclaim was more suitable
        0: "skipped",
        # COMPACT_CONTINUE: compaction should continue to another pageblock
        1: "continue",
        # COMPACT_PARTIAL: direct compaction partially compacted a zone and there are suitable pages
        2: "partial",
        # COMPACT_COMPLETE: the full zone was compacted
        3: "complete",
    }

or (kernel 4.7 and above):

    compact_status = {
        # COMPACT_NOT_SUITABLE_ZONE: for more detailed tracepoint output - internal to compaction
        0: "not_suitable_zone",
        # COMPACT_SKIPPED: compaction didn't start as it was not possible or direct reclaim was more suitable
        1: "skipped",
        # COMPACT_DEFERRED: compaction didn't start as it was deferred due to past failures
        2: "deferred",
        # COMPACT_NO_SUITABLE_PAGE: for more detailed tracepoint output - internal to compaction
        3: "no_suitable_page",
        # COMPACT_CONTINUE: compaction should continue to another pageblock
        4: "continue",
        # COMPACT_COMPLETE: the full zone was scanned, but compaction did not produce suitable pages
        5: "complete",
        # COMPACT_PARTIAL_SKIPPED: direct compaction scanned part of the zone, but did not produce suitable pages
        6: "partial_skipped",
        # COMPACT_CONTENDED: compaction terminated prematurely due to lock contention
        7: "contended",
        # COMPACT_SUCCESS: direct compaction terminated after concluding that the allocation should now succeed
        8: "success",
    }

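Because the same number means different things in the two tables, a status code
must be decoded against the right one. The sketch below picks a table from a
kernel release string; this version check is an assumption made for
illustration (distribution kernels may backport changes), and status_name and
parse_release are hypothetical helpers, not part of compactsnoop.

```python
import re

# Status tables copied from the two listings above.
OLD_STATUS = {0: "skipped", 1: "continue", 2: "partial", 3: "complete"}
NEW_STATUS = {
    0: "not_suitable_zone",
    1: "skipped",
    2: "deferred",
    3: "no_suitable_page",
    4: "continue",
    5: "complete",
    6: "partial_skipped",
    7: "contended",
    8: "success",
}


def parse_release(release):
    """Extract (major, minor) from a string such as '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def status_name(code, release):
    """Decode a compaction status code for the given kernel release string."""
    # Assumption: 4.7 and above use the longer table, older kernels the shorter.
    table = NEW_STATUS if parse_release(release) >= (4, 7) else OLD_STATUS
    return table.get(code, "unknown(%d)" % code)


if __name__ == "__main__":
    import platform
    print(status_name(3, "3.10.0-957.el7.x86_64"))  # old table
    print(status_name(3, "4.7.0"))                  # new table
    print("running kernel:", platform.release())
```

Code 3 is "complete" in the older table but "no_suitable_page" in the newer one,
which is why the table has to match the kernel being traced.
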
The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:

# ./compactsnoop -Tp 24376
TIME(s)        COMM           PID    NODE ZONE         ORDER MODE  LAT(ms) STATUS
101.364115000  zsh            24376  0    ZONE_DMA     -1    SYNC  0.025   complete
101.364555000  zsh            24376  0    ZONE_DMA32   -1    SYNC  3.925   complete
^C

This shows the zsh process allocating pages and triggering zone compaction
events; the resulting delays are small.

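In-kernel filtering in bcc tools is commonly done by substituting a condition
into the BPF C source before it is compiled. The sketch below illustrates that
pattern only: the C fragment, the PID_FILTER placeholder, and render_bpf_text
are assumptions for this example and are not taken from compactsnoop's source.

```python
# Hypothetical, trimmed-down BPF C text; not the tool's actual program.
BPF_TEMPLATE = """
int trace_compaction_entry(struct pt_regs *ctx)
{
    u32 tgid = bpf_get_current_pid_tgid() >> 32;
    if (PID_FILTER)
        return 0;   /* other process: drop the event inside the kernel */
    /* the real tool would record a timestamp, zone, order, and so on */
    return 0;
}
"""


def render_bpf_text(pid=None):
    """Return the C text with the PID_FILTER placeholder filled in."""
    if pid is None:
        condition = "0"  # no filter: the early return is never taken
    else:
        condition = "tgid != %d" % int(pid)
    return BPF_TEMPLATE.replace("PID_FILTER", condition)


if __name__ == "__main__":
    print(render_bpf_text(24376))
```

Because the check runs in the probe itself, events from other processes are
never copied to user space, which keeps the tracing overhead low.
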
A maximum tracing duration can be set with the -d option. For example, to trace
for 2 seconds:

# ./compactsnoop -d 2
COMM           PID    NODE ZONE         ORDER MODE  LAT(ms)  STATUS
zsh            26385  0    ZONE_DMA     -1    SYNC  0.025444 complete
^C

The -e option prints out extra columns:

# ./compactsnoop -e
COMM           PID    NODE ZONE         ORDER MODE  FRAGIDX MIN   LOW   HIGH  FREE  LAT(ms) STATUS
summ           28276  1    ZONE_NORMAL  3     ASYNC 0.728   11284 14105 16926 14193 3.58    partial
summ           28276  0    ZONE_NORMAL  2     ASYNC -1.000  11043 13803 16564 14479 0.0     complete
summ           28276  1    ZONE_NORMAL  2     ASYNC -1.000  11284 14105 16926 14785 0.019   complete
summ           28276  0    ZONE_NORMAL  2     ASYNC -1.000  11043 13803 16564 15199 0.006   partial
summ           28276  1    ZONE_NORMAL  2     ASYNC -1.000  11284 14105 16926 17360 0.030   complete
summ           28276  0    ZONE_NORMAL  2     ASYNC -1.000  11043 13803 16564 15443 0.024   complete
summ           28276  1    ZONE_NORMAL  2     ASYNC -1.000  11284 14105 16926 15634 0.018   complete
summ           28276  1    ZONE_NORMAL  3     ASYNC 0.832   11284 14105 16926 15301 0.006   partial
summ           28276  0    ZONE_NORMAL  2     ASYNC -1.000  11043 13803 16564 14774 0.005   partial
summ           28276  1    ZONE_NORMAL  3     ASYNC 0.733   11284 14105 16926 19888 0.012   partial
^C

FRAGIDX is short for fragmentation index, which only makes sense if an
allocation of the requested size would fail. If that is true, the fragmentation
index indicates whether external fragmentation or a lack of memory was the
problem. The value can be used to determine whether page reclaim or compaction
should be used. The -1.000 shown in the output above is the value reported when
the index does not apply because the allocation would not fail.

The index is between 0 and 1, so it is printed to 3 decimal places:

0 => allocation would fail due to lack of memory
1 => allocation would fail due to fragmentation

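The reading of FRAGIDX described above can be sketched as a small helper. This
is an illustration only: describe_fragidx is a made-up name, the 0.5 cutoff is
an arbitrary assumption rather than a value used by the tool, and negative
values such as -1.000 are treated as "index not meaningful".

```python
def describe_fragidx(value, cutoff=0.5):
    """Give a rough reading of a FRAGIDX value from compactsnoop -e output."""
    if value < 0:
        # The index only makes sense when the allocation would fail.
        return "not applicable"
    if value >= cutoff:
        # Closer to 1: failure would be due to external fragmentation.
        return "fragmentation (compaction may help)"
    # Closer to 0: failure would be due to a lack of free memory.
    return "lack of memory (reclaim may help)"


if __name__ == "__main__":
    # FRAGIDX values taken from the -e example output above.
    for fragidx in (0.728, -1.000, 0.832):
        print("%7.3f  %s" % (fragidx, describe_fragidx(fragidx)))
```
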
The fragmentation index for each order of the buddy allocator can be read from
/sys/kernel/debug/extfrag/extfrag_index.

MIN, LOW, and HIGH show the zone's watermarks, and FREE is nr_free_pages; all
of them can also be found in /proc/zoneinfo.


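To cross-check these columns, the same numbers can be pulled out of
/proc/zoneinfo. The sketch below parses an excerpt whose layout is an
assumption (field names and ordering differ between kernel versions); the
numbers reuse the node 1 ZONE_NORMAL row from the -e output above, and
parse_zoneinfo is a hypothetical helper.

```python
import re

# Assumed layout of one /proc/zoneinfo entry; real files contain many more
# fields and the exact format varies between kernel versions.
SAMPLE_ZONEINFO = """\
Node 1, zone   Normal
  pages free     14193
        min      11284
        low      14105
        high     16926
    nr_free_pages 14193
"""

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
WANTED = ("min", "low", "high", "nr_free_pages")


def parse_zoneinfo(text):
    """Map (node, zone) to the first min/low/high/nr_free_pages values seen."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            key = (int(header.group(1)), header.group(2))
            current = zones.setdefault(key, {})
            continue
        tokens = line.split()
        if (current is not None and len(tokens) == 2
                and tokens[0] in WANTED and tokens[1].isdigit()):
            # Keep the first occurrence so later per-CPU fields cannot
            # overwrite the zone watermarks.
            current.setdefault(tokens[0], int(tokens[1]))
    return zones


if __name__ == "__main__":
    for (node, zone), values in parse_zoneinfo(SAMPLE_ZONEINFO).items():
        print(node, zone, values)
```

On a live system the text would come from open("/proc/zoneinfo").read() instead
of the sample string.
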
The -K option prints the kernel stack for each event:

# ./compactsnoop -K -e

summ           28276  0    ZONE_NORMAL  3     ASYNC 0.528   11043 13803 16564 22654 13.258  partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44

summ           28276  1    ZONE_NORMAL  3     ASYNC -1.000  11284 14105 16926 22074 0.008   partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44

summ           28276  0    ZONE_NORMAL  3     ASYNC 0.527   11043 13803 16564 25653 9.812   partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44

# ./compactsnoop -h
usage: compactsnoop.py [-h] [-T] [-p PID] [-d DURATION] [-K] [-e]

Trace compact zone

optional arguments:
  -h, --help            show this help message and exit
  -T, --timestamp       include timestamp on output
  -p PID, --pid PID     trace this PID only
  -d DURATION, --duration DURATION
                        total duration of trace in seconds
  -K, --kernel-stack    output kernel stack trace
  -e, --extended_fields
                        show system memory state

examples:
    ./compactsnoop          # trace all compact stall
    ./compactsnoop -T       # include timestamps
    ./compactsnoop -d 10    # trace for 10 seconds only
    ./compactsnoop -K       # output kernel stack trace
    ./compactsnoop -e       # show extended fields