Brendan Gregg | 251823a | 2016-12-14 12:10:59 -0800 | [diff] [blame] | 1 | Demonstrations of runqlen, the Linux eBPF/bcc version. |
| 2 | |
| 3 | |
| 4 | This program summarizes scheduler queue length as a histogram, and can also |
| 5 | show run queue occupancy. It works by sampling the run queue length on all |
| 6 | CPUs at 99 Hertz. |
| 7 | |
| 8 | As an example, here is an idle system: |
| 9 | |
| 10 | # ./runqlen.py |
| 11 | Sampling run queue length... Hit Ctrl-C to end. |
| 12 | ^C |
| 13 | runqlen : count distribution |
| 14 | 0 : 1776 |****************************************| |
| 15 | |
| 16 | This shows a zero run queue length each time it was sampled. |
| 17 | |
| 18 | And now a heavily loaded system: |
| 19 | |
| 20 | # ./runqlen.py |
| 21 | Sampling run queue length... Hit Ctrl-C to end. |
| 22 | ^C |
| 23 | runqlen : count distribution |
| 24 | 0 : 1068 |****************************************| |
| 25 | 1 : 642 |************************ | |
| 26 | 2 : 369 |************* | |
| 27 | 3 : 183 |****** | |
| 28 | 4 : 104 |*** | |
| 29 | 5 : 42 |* | |
| 30 | 6 : 13 | | |
| 31 | 7 : 2 | | |
| 32 | 8 : 1 | | |
| 33 | |
| 34 | Now there is often threads queued, with one sample reaching a queue length |
| 35 | of 8. This will cause run queue latency, which can be measured by the bcc |
| 36 | runqlat tool. |
| 37 | |
| 38 | |
| 39 | Here's an example of an issue that runqlen can indentify. Starting with the |
| 40 | system-wide summary: |
| 41 | |
| 42 | # ./runqlen.py |
| 43 | Sampling run queue length... Hit Ctrl-C to end. |
| 44 | ^C |
| 45 | runqlen : count distribution |
| 46 | 0 : 1209 |****************************************| |
| 47 | 1 : 372 |************ | |
| 48 | 2 : 73 |** | |
| 49 | 3 : 3 | | |
| 50 | 4 : 1 | | |
| 51 | 5 : 0 | | |
| 52 | 6 : 0 | | |
| 53 | 7 : 237 |******* | |
| 54 | |
| 55 | This shows there is often a run queue length of 7. Now using the -C option to |
| 56 | see per-CPU histograms: |
| 57 | |
| 58 | # ./runqlen.py -C |
| 59 | Sampling run queue length... Hit Ctrl-C to end. |
| 60 | ^C |
| 61 | |
| 62 | cpu = 0 |
| 63 | runqlen : count distribution |
| 64 | 0 : 257 |****************************************| |
| 65 | 1 : 64 |********* | |
| 66 | 2 : 5 | | |
| 67 | 3 : 0 | | |
| 68 | 4 : 0 | | |
| 69 | 5 : 0 | | |
| 70 | 6 : 1 | | |
| 71 | |
| 72 | cpu = 1 |
| 73 | runqlen : count distribution |
| 74 | 0 : 226 |****************************************| |
| 75 | 1 : 90 |*************** | |
| 76 | 2 : 11 |* | |
| 77 | |
| 78 | cpu = 2 |
| 79 | runqlen : count distribution |
| 80 | 0 : 264 |****************************************| |
| 81 | 1 : 52 |******* | |
| 82 | 2 : 8 |* | |
| 83 | 3 : 1 | | |
| 84 | 4 : 0 | | |
| 85 | 5 : 0 | | |
| 86 | 6 : 1 | | |
| 87 | 7 : 0 | | |
| 88 | 8 : 1 | | |
| 89 | |
| 90 | cpu = 3 |
| 91 | runqlen : count distribution |
| 92 | 0 : 0 | | |
| 93 | 1 : 0 | | |
| 94 | 2 : 0 | | |
| 95 | 3 : 0 | | |
| 96 | 4 : 0 | | |
| 97 | 5 : 0 | | |
| 98 | 6 : 0 | | |
| 99 | 7 : 327 |****************************************| |
| 100 | |
| 101 | cpu = 4 |
| 102 | runqlen : count distribution |
| 103 | 0 : 255 |****************************************| |
| 104 | 1 : 63 |********* | |
| 105 | 2 : 9 |* | |
| 106 | |
| 107 | cpu = 5 |
| 108 | runqlen : count distribution |
| 109 | 0 : 244 |****************************************| |
| 110 | 1 : 78 |************ | |
| 111 | 2 : 3 | | |
| 112 | 3 : 2 | | |
| 113 | |
| 114 | cpu = 6 |
| 115 | runqlen : count distribution |
| 116 | 0 : 253 |****************************************| |
| 117 | 1 : 66 |********** | |
| 118 | 2 : 6 | | |
| 119 | 3 : 1 | | |
| 120 | 4 : 1 | | |
| 121 | |
| 122 | cpu = 7 |
| 123 | runqlen : count distribution |
| 124 | 0 : 243 |****************************************| |
| 125 | 1 : 74 |************ | |
| 126 | 2 : 6 | | |
| 127 | 3 : 1 | | |
| 128 | 4 : 0 | | |
| 129 | 5 : 1 | | |
| 130 | 6 : 2 | | |
| 131 | |
| 132 | The run queue length of 7 is isolated to CPU 3. It was caused by CPU binding |
| 133 | (taskset). This can sometimes happen by applications that try to auto-bind |
| 134 | to CPUs, leaving other CPUs idle while work is queued. |
| 135 | |
| 136 | |
| 137 | runqlat accepts an interval and a count. For example, with -T for timestamps: |
| 138 | |
| 139 | # ./runqlen.py -T 1 5 |
| 140 | Sampling run queue length... Hit Ctrl-C to end. |
| 141 | |
| 142 | 19:51:34 |
| 143 | runqlen : count distribution |
| 144 | 0 : 635 |****************************************| |
| 145 | 1 : 142 |******** | |
| 146 | 2 : 13 | | |
| 147 | 3 : 0 | | |
| 148 | 4 : 1 | | |
| 149 | |
| 150 | 19:51:35 |
| 151 | runqlen : count distribution |
| 152 | 0 : 640 |****************************************| |
| 153 | 1 : 136 |******** | |
| 154 | 2 : 13 | | |
| 155 | 3 : 1 | | |
| 156 | 4 : 0 | | |
| 157 | 5 : 0 | | |
| 158 | 6 : 0 | | |
| 159 | 7 : 0 | | |
| 160 | 8 : 0 | | |
| 161 | 9 : 0 | | |
| 162 | 10 : 1 | | |
| 163 | |
| 164 | 19:51:36 |
| 165 | runqlen : count distribution |
| 166 | 0 : 603 |****************************************| |
| 167 | 1 : 170 |*********** | |
| 168 | 2 : 16 |* | |
| 169 | 3 : 1 | | |
| 170 | 4 : 0 | | |
| 171 | 5 : 0 | | |
| 172 | 6 : 0 | | |
| 173 | 7 : 0 | | |
| 174 | 8 : 0 | | |
| 175 | 9 : 1 | | |
| 176 | |
| 177 | 19:51:37 |
| 178 | runqlen : count distribution |
| 179 | 0 : 617 |****************************************| |
| 180 | 1 : 154 |********* | |
| 181 | 2 : 20 |* | |
| 182 | 3 : 0 | | |
| 183 | 4 : 0 | | |
| 184 | 5 : 0 | | |
| 185 | 6 : 0 | | |
| 186 | 7 : 0 | | |
| 187 | 8 : 0 | | |
| 188 | 9 : 0 | | |
| 189 | 10 : 0 | | |
| 190 | 11 : 1 | | |
| 191 | |
| 192 | 19:51:38 |
| 193 | runqlen : count distribution |
| 194 | 0 : 603 |****************************************| |
| 195 | 1 : 161 |********** | |
| 196 | 2 : 24 |* | |
| 197 | 3 : 4 | | |
| 198 | |
| 199 | The spikes in run queue length of 11 are likely threads waking up at the same |
| 200 | time (a thundering herd), and then are scheduled and complete their execution |
| 201 | quickly. |
| 202 | |
| 203 | |
| 204 | The -O option prints run queue occupancy: the percentage of time that there |
| 205 | was work queued waiting its turn. Eg: |
| 206 | |
| 207 | # ./runqlen.py -OT 1 |
| 208 | Sampling run queue length... Hit Ctrl-C to end. |
| 209 | |
| 210 | 19:54:53 |
| 211 | runqocc: 41.09% |
| 212 | |
| 213 | 19:54:54 |
| 214 | runqocc: 41.85% |
| 215 | |
| 216 | 19:54:55 |
| 217 | runqocc: 41.47% |
| 218 | |
| 219 | 19:54:56 |
| 220 | runqocc: 42.35% |
| 221 | |
| 222 | 19:54:57 |
| 223 | runqocc: 40.83% |
| 224 | [...] |
| 225 | |
| 226 | This can also be examined per-CPU: |
| 227 | |
| 228 | # ./runqlen.py -COT 1 |
| 229 | Sampling run queue length... Hit Ctrl-C to end. |
| 230 | |
| 231 | 19:55:03 |
| 232 | runqocc, CPU 0 32.32% |
| 233 | runqocc, CPU 1 26.26% |
| 234 | runqocc, CPU 2 38.38% |
| 235 | runqocc, CPU 3 100.00% |
| 236 | runqocc, CPU 4 26.26% |
| 237 | runqocc, CPU 5 32.32% |
| 238 | runqocc, CPU 6 39.39% |
| 239 | runqocc, CPU 7 46.46% |
| 240 | |
| 241 | 19:55:04 |
| 242 | runqocc, CPU 0 35.00% |
| 243 | runqocc, CPU 1 32.32% |
| 244 | runqocc, CPU 2 37.00% |
| 245 | runqocc, CPU 3 100.00% |
| 246 | runqocc, CPU 4 43.43% |
| 247 | runqocc, CPU 5 31.31% |
| 248 | runqocc, CPU 6 28.00% |
| 249 | runqocc, CPU 7 31.31% |
| 250 | |
| 251 | 19:55:05 |
| 252 | runqocc, CPU 0 43.43% |
| 253 | runqocc, CPU 1 32.32% |
| 254 | runqocc, CPU 2 45.45% |
| 255 | runqocc, CPU 3 100.00% |
| 256 | runqocc, CPU 4 29.29% |
| 257 | runqocc, CPU 5 36.36% |
| 258 | runqocc, CPU 6 36.36% |
| 259 | runqocc, CPU 7 30.30% |
| 260 | |
| 261 | 19:55:06 |
| 262 | runqocc, CPU 0 40.00% |
| 263 | runqocc, CPU 1 38.00% |
| 264 | runqocc, CPU 2 31.31% |
| 265 | runqocc, CPU 3 100.00% |
| 266 | runqocc, CPU 4 31.31% |
| 267 | runqocc, CPU 5 28.28% |
| 268 | runqocc, CPU 6 31.00% |
| 269 | runqocc, CPU 7 29.29% |
| 270 | [...] |
| 271 | |
| 272 | |
| 273 | USAGE message: |
| 274 | |
| 275 | # ./runqlen -h |
| 276 | usage: runqlen [-h] [-T] [-O] [-C] [interval] [count] |
| 277 | |
| 278 | Summarize scheduler run queue length as a histogram |
| 279 | |
| 280 | positional arguments: |
| 281 | interval output interval, in seconds |
| 282 | count number of outputs |
| 283 | |
| 284 | optional arguments: |
| 285 | -h, --help show this help message and exit |
| 286 | -T, --timestamp include timestamp on output |
| 287 | -O, --runqocc report run queue occupancy |
| 288 | -C, --cpus print output for each CPU separately |
| 289 | |
| 290 | examples: |
| 291 | ./runqlen # summarize run queue length as a histogram |
| 292 | ./runqlen 1 10 # print 1 second summaries, 10 times |
| 293 | ./runqlen -T 1 # 1s summaries and timestamps |
| 294 | ./runqlen -O # report run queue occupancy |
| 295 | ./runqlen -C # show each CPU separately |