Demonstrations of cpudist.

This program summarizes task on-CPU time as a histogram, showing how long tasks
spent on the CPU before being descheduled. This provides valuable information
that can indicate oversubscription (too many tasks for too few processors),
overhead due to excessive context switching (e.g. a common shared lock for
multiple threads), uneven workload distribution, too-granular tasks, and more.
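
The measurement idea can be sketched in a few lines of BCC. The snippet below
is a simplified illustration only (not the real cpudist.py, which also handles
idle-thread filtering, PID/TID grouping and more): it timestamps a thread when
it is switched onto a CPU at the sched:sched_switch tracepoint, and records the
elapsed microseconds when that same thread is later switched out.

#!/usr/bin/env python
# on-CPU time sketch -- simplified, for illustration only
from bcc import BPF
from time import sleep

bpf_text = """
BPF_HASH(start, u32, u64);   // tid -> timestamp of going on-CPU
BPF_HISTOGRAM(dist);         // log2 histogram of on-CPU time, in usecs

TRACEPOINT_PROBE(sched, sched_switch) {
    u64 ts = bpf_ktime_get_ns();
    u32 prev = args->prev_pid;   // thread leaving the CPU
    u32 next = args->next_pid;   // thread going on the CPU

    // account the on-CPU interval of the departing thread, if known
    u64 *tsp = start.lookup(&prev);
    if (tsp) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&prev);
    }

    // remember when the incoming thread went on-CPU
    start.update(&next, &ts);
    return 0;
}
"""

b = BPF(text=bpf_text)
print("Tracing on-CPU time... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")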

Alternatively, the same options are available for summarizing task off-CPU
time, which helps understand how often threads are being descheduled and how
long they spend waiting for I/O, locks, timers, and other causes of suspension.
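
Relative to the on-CPU sketch above, only the bookkeeping inside the probe
flips: the timestamp is taken when a thread is switched out, and the delta is
accounted when that thread is switched back in. A hypothetical variant of the
tracepoint body (again just a sketch, not the real tool):

TRACEPOINT_PROBE(sched, sched_switch) {
    u64 ts = bpf_ktime_get_ns();
    u32 prev = args->prev_pid;   // thread leaving the CPU
    u32 next = args->next_pid;   // thread going back on the CPU

    // prev is now off-CPU: remember when it left
    start.update(&prev, &ts);

    // next was off-CPU until now: account how long it waited
    u64 *tsp = start.lookup(&next);
    if (tsp) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&next);
    }
    return 0;
}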

# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 1        |                                        |
         8 -> 15         : 13       |**                                      |
        16 -> 31         : 187      |****************************************|
        32 -> 63         : 89       |*******************                     |
        64 -> 127        : 26       |*****                                   |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |

This is from a mostly idle system. Tasks wake up occasionally and run for only
a few dozen microseconds, and then get descheduled.

Here's some output from a system that is heavily loaded by threads that perform
computation but also compete for a lock:

# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 51       |*                                       |
         2 -> 3          : 395      |***********                             |
         4 -> 7          : 259      |*******                                 |
         8 -> 15         : 61       |*                                       |
        16 -> 31         : 75       |**                                      |
        32 -> 63         : 31       |                                        |
        64 -> 127        : 7        |                                        |
       128 -> 255        : 5        |                                        |
       256 -> 511        : 3        |                                        |
       512 -> 1023       : 5        |                                        |
      1024 -> 2047       : 6        |                                        |
      2048 -> 4095       : 4        |                                        |
      4096 -> 8191       : 1361     |****************************************|
      8192 -> 16383      : 523      |***************                         |
     16384 -> 32767      : 3        |                                        |

A bimodal distribution is now clearly visible. Most of the time, tasks were
able to run for 4-16ms before being descheduled (this is likely the quantum
length). Occasionally, tasks had to be descheduled a lot earlier -- possibly
because they competed for a shared lock.
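
If you want to sanity-check the quantum guess, you can compare it against the
CFS scheduler tunables, which bound how long a task may run before preemption
(the effective timeslice also depends on how many tasks are runnable and on
their weights, so this is only a rough cross-check). A small helper -- purely
illustrative, and assuming a kernel that exposes these tunables under
/proc/sys/kernel/ (some newer kernels move them to /sys/kernel/debug/sched/):

#!/usr/bin/env python
# Print CFS scheduling tunables, if the kernel exposes them.
import os

candidates = [
    "/proc/sys/kernel/sched_latency_ns",
    "/proc/sys/kernel/sched_min_granularity_ns",
    "/sys/kernel/debug/sched/latency_ns",           # some newer kernels
    "/sys/kernel/debug/sched/min_granularity_ns",
]

for path in candidates:
    if os.path.exists(path):
        with open(path) as f:
            ns = int(f.read())
        print("%-45s %8.2f ms" % (path, ns / 1e6))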

If necessary, you can restrict the output to include only threads from a
particular process -- this helps reduce noise:

# ./cpudist.py -p $(pidof parprimes)
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 17       |                                        |
         4 -> 7          : 39       |                                        |
         8 -> 15         : 52       |*                                       |
        16 -> 31         : 43       |                                        |
        32 -> 63         : 12       |                                        |
        64 -> 127        : 13       |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 11       |                                        |
      1024 -> 2047       : 15       |                                        |
      2048 -> 4095       : 41       |                                        |
      4096 -> 8191       : 1134     |************************                |
      8192 -> 16383      : 1883     |****************************************|
     16384 -> 32767      : 65       |*                                       |

You can also ask for output at predefined intervals, and include timestamps for
easier interpretation. While we're at it, the -P switch will print a histogram
separately for each process:

# ./cpudist.py -TP 5 3
Tracing on-CPU time... Hit Ctrl-C to end.

03:46:51

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 17       |**********************************      |
         8 -> 15         : 11       |**********************                  |
        16 -> 31         : 20       |****************************************|
        32 -> 63         : 15       |******************************          |
        64 -> 127        : 9        |******************                      |
       128 -> 255        : 6        |************                            |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |*************                           |
         4 -> 7          : 3        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |*************                           |

03:46:56

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 19       |****************************************|
         8 -> 15         : 11       |***********************                 |
        16 -> 31         : 9        |******************                      |
        32 -> 63         : 3        |******                                  |
        64 -> 127        : 1        |**                                      |
       128 -> 255        : 3        |******                                  |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 1        |********************                    |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 2        |****************************************|

03:47:01

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 12       |********************************        |
         8 -> 15         : 15       |****************************************|
        16 -> 31         : 15       |****************************************|
        32 -> 63         : 0        |                                        |
        64 -> 127        : 3        |********                                |
       128 -> 255        : 1        |**                                      |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |******                                  |
         4 -> 7          : 6        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 2        |*************                           |

These histograms were obtained while executing `dd if=/dev/zero of=/dev/null`
with fairly large block sizes.

You could also ask for an off-CPU report using the -O switch. Here's a
histogram of task block times while the system is heavily loaded:

# ./cpudist.py -O -p $(pidof parprimes)
Tracing off-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 3        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 2        |                                        |
      1024 -> 2047       : 4        |                                        |
      2048 -> 4095       : 3        |                                        |
      4096 -> 8191       : 70       |***                                     |
      8192 -> 16383      : 867      |****************************************|
     16384 -> 32767      : 141      |******                                  |
     32768 -> 65535      : 8        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 1        |                                        |
    262144 -> 524287     : 2        |                                        |
    524288 -> 1048575    : 3        |                                        |

As you can see, threads are switching out for relatively long intervals, even
though we know the workload doesn't have any significant blocking. This can be
a result of over-subscription -- too many threads contending over too few CPUs.
Indeed, there are four available CPUs and more than four runnable threads:

# nproc
4
# cat /proc/loadavg
0.04 0.11 0.06 9/147 7494

(This shows we have 9 threads runnable out of 147 total. This is more than 4,
the number of available CPUs.)
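
If you want to script this check, a small, hypothetical helper (not part of
cpudist) can compare the runnable-entity count reported in the fourth field of
/proc/loadavg with the number of online CPUs:

#!/usr/bin/env python
# Rough oversubscription check: compare the number of runnable scheduling
# entities ("runnable/total", 4th field of /proc/loadavg) with the CPU count.
# Note: the runnable count includes this process itself while it reads the file.
from multiprocessing import cpu_count

with open("/proc/loadavg") as f:
    fields = f.read().split()

runnable, total = map(int, fields[3].split("/"))
cpus = cpu_count()

print("runnable=%d total=%d cpus=%d" % (runnable, total, cpus))
if runnable > cpus:
    print("more runnable entities than CPUs -- possible oversubscription")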

Finally, let's ask for a per-thread report and values in milliseconds instead
of microseconds:

# ./cpudist.py -p $(pidof parprimes) -mL
Tracing on-CPU time... Hit Ctrl-C to end.


tid = 5092
     msecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 535      |****************************************|
        16 -> 31         : 14       |*                                       |

tid = 5093
     msecs               : count     distribution
         0 -> 1          : 8        |                                        |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 534      |****************************************|
        16 -> 31         : 12       |                                        |

tid = 5094
     msecs               : count     distribution
         0 -> 1          : 38       |***                                     |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 5        |                                        |
         8 -> 15         : 476      |****************************************|
        16 -> 31         : 25       |**                                      |

tid = 5095
     msecs               : count     distribution
         0 -> 1          : 31       |**                                      |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 10       |                                        |
         8 -> 15         : 478      |****************************************|
        16 -> 31         : 20       |*                                       |

tid = 5096
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 523      |****************************************|
        16 -> 31         : 16       |*                                       |

tid = 5097
     msecs               : count     distribution
         0 -> 1          : 11       |                                        |
         2 -> 3          : 7        |                                        |
         4 -> 7          : 7        |                                        |
         8 -> 15         : 502      |****************************************|
        16 -> 31         : 23       |*                                       |

tid = 5098
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 3        |                                        |
         8 -> 15         : 494      |****************************************|
        16 -> 31         : 28       |**                                      |

tid = 5099
     msecs               : count     distribution
         0 -> 1          : 15       |*                                       |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 6        |                                        |
         8 -> 15         : 521      |****************************************|
        16 -> 31         : 12       |                                        |

It looks like all threads are more-or-less equally busy, and are typically
switched out after running for 8-15 milliseconds (again, this is the typical
quantum length).


USAGE message:

# ./cpudist.py -h

usage: cpudist.py [-h] [-O] [-T] [-m] [-P] [-L] [-p PID] [interval] [count]

Summarize on-CPU time per task as a histogram.

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -O, --offcpu        measure off-CPU time
  -T, --timestamp     include timestamp on output
  -m, --milliseconds  millisecond histogram
  -P, --pids          print a histogram per process ID
  -L, --tids          print a histogram per thread ID
  -p PID, --pid PID   trace this PID only

examples:
    cpudist              # summarize on-CPU time as a histogram
    cpudist -O           # summarize off-CPU time as a histogram
    cpudist 1 10         # print 1 second summaries, 10 times
    cpudist -mT 1        # 1s summaries, milliseconds, and timestamps
    cpudist -P           # show each PID separately
    cpudist -p 185       # trace PID 185 only