Demonstrations of cpudist.

This program summarizes task on-CPU time as a histogram, showing how long tasks
spent on the CPU before being descheduled. This provides valuable information
that can indicate oversubscription (too many tasks for too few processors),
overhead due to excessive context switching (e.g. a common shared lock for
multiple threads), uneven workload distribution, too-granular tasks, and more.

Alternatively, the same options are available for summarizing task off-CPU
time, which helps you understand how often threads are descheduled and how long
they spend waiting for I/O, locks, timers, and other causes of suspension.

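Under the hood, this kind of measurement instruments the kernel's context
switch path: record a timestamp when a thread is switched onto a CPU, and add
the elapsed time to a log2 histogram when it is switched off again (for off-CPU
time the bookkeeping is simply inverted). The following is a minimal,
illustrative BCC sketch of that idea -- not the actual cpudist implementation,
which also handles the idle task, per-PID/TID keys, millisecond mode, and so
on. It assumes a kernel where finish_task_switch() can be attached with a
kprobe:

#!/usr/bin/env python
# on_cpu_sketch.py: minimal sketch of on-CPU time measurement (illustration
# only, not the real cpudist).
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

BPF_HASH(start, u32, u64);   // switch-in timestamp, keyed by thread id
BPF_HISTOGRAM(dist);         // log2 histogram of on-CPU time in usecs

int trace_switch(struct pt_regs *ctx, struct task_struct *prev) {
    u64 ts = bpf_ktime_get_ns();
    u32 prev_tid = prev->pid;                   // thread leaving the CPU
    u32 next_tid = bpf_get_current_pid_tgid();  // thread entering the CPU

    // The departing thread was on-CPU since its recorded switch-in time.
    u64 *tsp = start.lookup(&prev_tid);
    if (tsp != 0) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&prev_tid);
    }

    // Stamp the arriving thread. For off-CPU time you would invert this
    // bookkeeping: stamp at switch-out, compute the delta at switch-in.
    start.update(&next_tid, &ts);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="finish_task_switch", fn_name="trace_switch")

print("Tracing on-CPU time... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

Timing both the outgoing and the incoming thread from a single probe keeps the
overhead low, since finish_task_switch() runs exactly once per context switch.
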
# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 1        |                                        |
         8 -> 15         : 13       |**                                      |
        16 -> 31         : 187      |****************************************|
        32 -> 63         : 89       |*******************                     |
        64 -> 127        : 26       |*****                                   |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |

This is from a mostly idle system. Tasks wake up occasionally and run for only
a few dozen microseconds, and then get descheduled.

Here's some output from a system that is heavily loaded by threads that perform
computation but also compete for a lock:

# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 51       |*                                       |
         2 -> 3          : 395      |***********                             |
         4 -> 7          : 259      |*******                                 |
         8 -> 15         : 61       |*                                       |
        16 -> 31         : 75       |**                                      |
        32 -> 63         : 31       |                                        |
        64 -> 127        : 7        |                                        |
       128 -> 255        : 5        |                                        |
       256 -> 511        : 3        |                                        |
       512 -> 1023       : 5        |                                        |
      1024 -> 2047       : 6        |                                        |
      2048 -> 4095       : 4        |                                        |
      4096 -> 8191       : 1361     |****************************************|
      8192 -> 16383      : 523      |***************                         |
     16384 -> 32767      : 3        |                                        |

A bimodal distribution is now clearly visible. Most of the time, tasks were
able to run for 4-16ms before being descheduled (this is likely the quantum
length). Occasionally, tasks had to be descheduled much earlier -- possibly
because they competed for a shared lock.

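If you want to sanity-check the quantum guess, you can look at the CFS
scheduling tunables. Note that CFS does not use a fixed quantum: the effective
time slice is derived from the target latency, the minimum granularity, and the
number of runnable tasks, and the paths below are only exposed on kernels built
with CONFIG_SCHED_DEBUG (newer kernels move them to debugfs). A small,
illustrative snippet:

# Print the CFS latency/granularity tunables, if this kernel exposes them
# under /proc/sys/kernel (availability and paths vary by kernel version).
for path in ("/proc/sys/kernel/sched_latency_ns",
             "/proc/sys/kernel/sched_min_granularity_ns"):
    try:
        with open(path) as f:
            print("%s = %.1f ms" % (path, int(f.read()) / 1e6))
    except IOError:
        print("%s not available on this kernel" % path)
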
If necessary, you can restrict the output to include only threads from a
particular process -- this helps reduce noise:

# ./cpudist.py -p $(pidof parprimes)
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 17       |                                        |
         4 -> 7          : 39       |                                        |
         8 -> 15         : 52       |*                                       |
        16 -> 31         : 43       |                                        |
        32 -> 63         : 12       |                                        |
        64 -> 127        : 13       |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 11       |                                        |
      1024 -> 2047       : 15       |                                        |
      2048 -> 4095       : 41       |                                        |
      4096 -> 8191       : 1134     |************************                |
      8192 -> 16383      : 1883     |****************************************|
     16384 -> 32767      : 65       |*                                       |

You can also ask for output at predefined intervals, and include timestamps for
easier interpretation. While we're at it, the -P switch will print a histogram
separately for each process:

# ./cpudist.py -TP 5 3
Tracing on-CPU time... Hit Ctrl-C to end.

03:46:51

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 17       |**********************************      |
         8 -> 15         : 11       |**********************                  |
        16 -> 31         : 20       |****************************************|
        32 -> 63         : 15       |******************************          |
        64 -> 127        : 9        |******************                      |
       128 -> 255        : 6        |************                            |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |*************                           |
         4 -> 7          : 3        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |*************                           |

03:46:56

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 19       |****************************************|
         8 -> 15         : 11       |***********************                 |
        16 -> 31         : 9        |******************                      |
        32 -> 63         : 3        |******                                  |
        64 -> 127        : 1        |**                                      |
       128 -> 255        : 3        |******                                  |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 1        |********************                    |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 2        |****************************************|

03:47:01

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 12       |********************************        |
         8 -> 15         : 15       |****************************************|
        16 -> 31         : 15       |****************************************|
        32 -> 63         : 0        |                                        |
        64 -> 127        : 3        |********                                |
       128 -> 255        : 1        |**                                      |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |******                                  |
         4 -> 7          : 6        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 2        |*************                           |

This histogram was obtained while executing `dd if=/dev/zero of=/dev/null` with
fairly large block sizes.

You could also ask for an off-CPU report using the -O switch. Here's a
histogram of task block times while the system is heavily loaded:

# ./cpudist.py -O -p $(pidof parprimes)
Tracing off-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 3        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 2        |                                        |
      1024 -> 2047       : 4        |                                        |
      2048 -> 4095       : 3        |                                        |
      4096 -> 8191       : 70       |***                                     |
      8192 -> 16383      : 867      |****************************************|
     16384 -> 32767      : 141      |******                                  |
     32768 -> 65535      : 8        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 1        |                                        |
    262144 -> 524287     : 2        |                                        |
    524288 -> 1048575    : 3        |                                        |

As you can see, threads are switching out for relatively long intervals, even
though we know the workload doesn't have any significant blocking. This can be
a result of oversubscription -- too many threads contending for too few CPUs.
Indeed, there are four available CPUs and more than four runnable threads:

# nproc
4
# cat /proc/loadavg
0.04 0.11 0.06 9/147 7494

(This shows we have 9 threads runnable out of 147 total. This is more than 4,
the number of available CPUs.)
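
The same check can be scripted. Here is a small, illustrative helper (not part
of cpudist) that compares the runnable-entity count from the fourth field of
/proc/loadavg with the number of online CPUs:

#!/usr/bin/env python
# Rough oversubscription check: compare the number of currently runnable
# scheduling entities (fourth field of /proc/loadavg, "runnable/total")
# with the number of online CPUs. Illustrative helper, not part of cpudist.
import multiprocessing

with open("/proc/loadavg") as f:
    runnable = int(f.read().split()[3].split("/")[0])

cpus = multiprocessing.cpu_count()
status = "oversubscribed" if runnable > cpus else "not oversubscribed"
print("%d runnable vs. %d CPUs: %s" % (runnable, cpus, status))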

Finally, let's ask for a per-thread report and values in milliseconds instead
of microseconds:

# ./cpudist.py -p $(pidof parprimes) -mL
Tracing on-CPU time... Hit Ctrl-C to end.


tid = 5092
     msecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 535      |****************************************|
        16 -> 31         : 14       |*                                       |

tid = 5093
     msecs               : count     distribution
         0 -> 1          : 8        |                                        |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 534      |****************************************|
        16 -> 31         : 12       |                                        |

tid = 5094
     msecs               : count     distribution
         0 -> 1          : 38       |***                                     |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 5        |                                        |
         8 -> 15         : 476      |****************************************|
        16 -> 31         : 25       |**                                      |

tid = 5095
     msecs               : count     distribution
         0 -> 1          : 31       |**                                      |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 10       |                                        |
         8 -> 15         : 478      |****************************************|
        16 -> 31         : 20       |*                                       |

tid = 5096
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 523      |****************************************|
        16 -> 31         : 16       |*                                       |

tid = 5097
     msecs               : count     distribution
         0 -> 1          : 11       |                                        |
         2 -> 3          : 7        |                                        |
         4 -> 7          : 7        |                                        |
         8 -> 15         : 502      |****************************************|
        16 -> 31         : 23       |*                                       |

tid = 5098
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 3        |                                        |
         8 -> 15         : 494      |****************************************|
        16 -> 31         : 28       |**                                      |

tid = 5099
     msecs               : count     distribution
         0 -> 1          : 15       |*                                       |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 6        |                                        |
         8 -> 15         : 521      |****************************************|
        16 -> 31         : 12       |                                        |

It looks like all threads are more-or-less equally busy, and are typically
switched out after running for 8-15 milliseconds (again, this is the typical
quantum length).


USAGE message:

# ./cpudist.py -h

usage: cpudist.py [-h] [-O] [-T] [-m] [-P] [-L] [-p PID] [interval] [count]

Summarize on-CPU time per task as a histogram.

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -O, --offcpu        measure off-CPU time
  -T, --timestamp     include timestamp on output
  -m, --milliseconds  millisecond histogram
  -P, --pids          print a histogram per process ID
  -L, --tids          print a histogram per thread ID
  -p PID, --pid PID   trace this PID only

examples:
    cpudist              # summarize on-CPU time as a histogram
    cpudist -O           # summarize off-CPU time as a histogram
    cpudist 1 10         # print 1 second summaries, 10 times
    cpudist -mT 1        # 1s summaries, milliseconds, and timestamps
    cpudist -P           # show each PID separately
    cpudist -p 185       # trace PID 185 only