Zwb | 6d147e6 | 2019-03-10 01:59:53 +0800 | [diff] [blame] | 1 | Demonstrations of drsnoop, the Linux eBPF/bcc version. |
| 2 | |
| 3 | |
| 4 | drsnoop traces the direct reclaim system-wide, and prints various details. |
| 5 | Example output: |
| 6 | |
| 7 | # ./drsnoop |
| 8 | COMM PID LAT(ms) PAGES |
| 9 | summond 17678 0.19 143 |
| 10 | summond 17669 0.55 313 |
| 11 | summond 17669 0.15 145 |
| 12 | summond 17669 0.27 237 |
| 13 | summond 17669 0.48 111 |
| 14 | summond 17669 0.16 75 |
| 15 | head 17821 0.29 339 |
| 16 | head 17825 0.17 109 |
| 17 | summond 17669 0.14 73 |
| 18 | summond 17496 104.84 40 |
| 19 | summond 17678 0.32 167 |
| 20 | summond 17678 0.14 106 |
| 21 | summond 17678 0.16 67 |
| 22 | summond 17678 0.29 267 |
| 23 | summond 17678 0.27 69 |
| 24 | summond 17678 0.32 46 |
| 25 | base64 17816 0.16 85 |
| 26 | summond 17678 0.43 283 |
| 27 | summond 17678 0.14 182 |
| 28 | head 17736 0.57 135 |
| 29 | ^C |
| 30 | |
| 31 | While tracing, the processes alloc pages,due to insufficient memory available |
| 32 | in the system, direct reclaim events happened, which will increase the waiting |
| 33 | delay of the processes. |
| 34 | |
| 35 | drsnoop can be useful for discovering when allocstall(/proc/vmstat) continues to increase, |
| 36 | whether it is caused by some critical processes or not. |
| 37 | |
| 38 | The -p option can be used to filter on a PID, which is filtered in-kernel. Here |
| 39 | I've used it with -T to print timestamps: |
| 40 | |
senofgithub | e3668b9 | 2019-10-16 15:13:24 +0800 | [diff] [blame] | 41 | # ./drsnoop -Tp 17491 |
Zwb | 6d147e6 | 2019-03-10 01:59:53 +0800 | [diff] [blame] | 42 | TIME(s) COMM PID LAT(ms) PAGES |
| 43 | 107.364115000 summond 17491 0.24 50 |
| 44 | 107.364550000 summond 17491 0.26 38 |
| 45 | 107.365266000 summond 17491 0.36 72 |
| 46 | 107.365753000 summond 17491 0.22 49 |
| 47 | ^C |
| 48 | |
| 49 | This shows the summond process allocs pages, and direct reclaim events happening, |
| 50 | and the delays are not affected much. |
| 51 | |
| 52 | The -U option include UID on output: |
| 53 | |
| 54 | # ./drsnoop -U |
| 55 | UID COMM PID LAT(ms) PAGES |
| 56 | 1000 summond 17678 0.32 46 |
| 57 | 0 base64 17816 0.16 85 |
| 58 | 1000 summond 17678 0.43 283 |
| 59 | 1000 summond 17678 0.14 182 |
| 60 | 0 head 17821 0.29 339 |
| 61 | 0 head 17825 0.17 109 |
| 62 | ^C |
| 63 | |
| 64 | The -u option filtering UID: |
| 65 | |
| 66 | # ./drsnoop -Uu 1000 |
| 67 | UID COMM PID LAT(ms) PAGES |
| 68 | 1000 summond 17678 0.19 143 |
| 69 | 1000 summond 17669 0.55 313 |
| 70 | 1000 summond 17669 0.15 145 |
| 71 | 1000 summond 17669 0.27 237 |
| 72 | 1000 summond 17669 0.48 111 |
| 73 | 1000 summond 17669 0.16 75 |
| 74 | 1000 summond 17669 0.14 73 |
| 75 | 1000 summond 17678 0.32 167 |
| 76 | ^C |
| 77 | |
| 78 | A maximum tracing duration can be set with the -d option. For example, to trace |
| 79 | for 2 seconds: |
| 80 | |
| 81 | # ./drsnoop -d 2 |
| 82 | COMM PID LAT(ms) PAGES |
| 83 | head 21715 0.15 195 |
| 84 | |
| 85 | The -n option can be used to filter on process name using partial matches: |
| 86 | |
| 87 | # ./drsnoop -n mond |
| 88 | COMM PID LAT(ms) PAGES |
| 89 | summond 10271 0.03 51 |
| 90 | summond 10271 0.03 51 |
| 91 | summond 10259 0.05 51 |
| 92 | summond 10269 319.41 37 |
| 93 | summond 10270 111.73 35 |
| 94 | summond 10270 0.11 78 |
| 95 | summond 10270 0.12 71 |
| 96 | summond 10270 0.03 35 |
| 97 | summond 10277 111.62 41 |
| 98 | summond 10277 0.08 45 |
| 99 | summond 10277 0.06 32 |
| 100 | ^C |
| 101 | |
| 102 | This caught the 'summond' command because it partially matches 'mond' that's passed |
| 103 | to the '-n' option. |
| 104 | |
| 105 | |
| 106 | The -v option can be used to show system memory state (now only free mem) at |
| 107 | the beginning of direct reclaiming: |
| 108 | |
| 109 | # ./drsnoop.py -v |
| 110 | COMM PID LAT(ms) PAGES FREE(KB) |
| 111 | base64 34924 0.23 151 86260 |
| 112 | base64 34962 0.26 149 86260 |
| 113 | head 34931 0.24 150 86260 |
| 114 | base64 34902 0.19 148 86260 |
| 115 | head 34963 0.19 151 86228 |
| 116 | base64 34959 0.17 151 86228 |
| 117 | head 34965 0.29 190 86228 |
| 118 | base64 34957 0.24 152 86228 |
| 119 | summond 34870 0.15 151 86080 |
| 120 | summond 34870 0.12 115 86184 |
| 121 | |
| 122 | USAGE message: |
| 123 | |
| 124 | # ./drsnoop -h |
| 125 | usage: drsnoop.py [-h] [-T] [-U] [-p PID] [-t TID] [-u UID] [-d DURATION] |
| 126 | [-n NAME] |
| 127 | |
| 128 | Trace direct reclaim |
| 129 | |
| 130 | optional arguments: |
| 131 | -h, --help show this help message and exit |
| 132 | -T, --timestamp include timestamp on output |
| 133 | -U, --print-uid print UID column |
| 134 | -p PID, --pid PID trace this PID only |
| 135 | -t TID, --tid TID trace this TID only |
| 136 | -u UID, --uid UID trace this UID only |
| 137 | -d DURATION, --duration DURATION |
| 138 | total duration of trace in seconds |
| 139 | -n NAME, --name NAME only print process names containing this name |
| 140 | |
| 141 | examples: |
| 142 | ./drsnoop # trace all direct reclaim |
| 143 | ./drsnoop -T # include timestamps |
| 144 | ./drsnoop -U # include UID |
senofgithub | e3668b9 | 2019-10-16 15:13:24 +0800 | [diff] [blame] | 145 | ./drsnoop -p 181 # only trace PID 181 |
Zwb | 6d147e6 | 2019-03-10 01:59:53 +0800 | [diff] [blame] | 146 | ./drsnoop -t 123 # only trace TID 123 |
| 147 | ./drsnoop -u 1000 # only trace UID 1000 |
| 148 | ./drsnoop -d 10 # trace for 10 seconds only |
| 149 | ./drsnoop -n main # only print process names containing "main" |