________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.

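The 10% rule of thumb above can be sketched in a few lines of Python.
This is only an illustration, not part of pybench: the function name
unstable_tests, the dictionary layout and the timings are all made up
for the example, and pybench's own comparison ('pybench.py -c <file>')
remains the tool to use in practice.

```python
def unstable_tests(reference, current, threshold=0.10):
    """Return {name: relative difference} for tests above the threshold.

    reference and current map test names to minimum run-times in
    milliseconds (a hypothetical layout chosen for this sketch).
    Tests missing from either run are skipped.
    """
    flagged = {}
    for name, ref_time in reference.items():
        if name not in current or ref_time <= 0:
            continue
        diff = abs(current[name] - ref_time) / ref_time
        if diff > threshold:
            flagged[name] = diff
    return flagged


# Made-up minimum timings (ms) from two runs on the same machine.
run_a = {"ForLoops": 110.0, "IfThenElse": 118.0, "Recursion": 156.0}
run_b = {"ForLoops": 112.0, "IfThenElse": 119.0, "Recursion": 181.0}

noisy = unstable_tests(run_a, run_b)
for name, diff in sorted(noisy.items()):
    print("%s: %.1f%% difference" % (name, diff * 100.0))
```

With these made-up numbers only Recursion is flagged, which would be
a hint to look for background activity before trusting the results.
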
If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

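As a rough picture of what such a name filter does, here is a small
sketch. It assumes the match is a case-insensitive search, which is
an assumption made for this example (the test names in the sample
output are capitalized, e.g. StringSlicing). The helper select_tests
is hypothetical and not a pybench function.

```python
import re


def select_tests(names, pattern):
    """Keep the names in which pattern is found, ignoring case."""
    matcher = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if matcher.search(name)]


# A handful of test names taken from the sample output.
names = ["CompareStrings", "DictWithStringKeys", "ForLoops",
         "StringSlicing", "TupleSlicing"]

selected = select_tests(names, "string")
print(selected)
```
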
This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.1

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:    x86_64

    Python:
       Implementation: CPython
       Executable:   /usr/local/bin/python
       Version:      3.0
       Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:         64bit
       Build:        Oct 1 2005 15:24:35 (#1)
       Unicode:      UCS2


Test                              minimum   average operation   overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:     126ms     145ms    0.28us    0.274ms
           BuiltinMethodLookup:     124ms     130ms    0.12us    0.316ms
                 CompareFloats:     109ms     110ms    0.09us    0.361ms
         CompareFloatsIntegers:     100ms     104ms    0.12us    0.271ms
               CompareIntegers:     137ms     138ms    0.08us    0.542ms
        CompareInternedStrings:     124ms     127ms    0.08us    1.367ms
                  CompareLongs:     100ms     104ms    0.10us    0.316ms
                CompareStrings:     111ms     115ms    0.12us    0.929ms
                CompareUnicode:     108ms     128ms    0.17us    0.693ms
                 ConcatStrings:     142ms     155ms    0.31us    0.562ms
                 ConcatUnicode:     119ms     127ms    0.42us    0.384ms
               CreateInstances:     123ms     128ms    1.14us    0.367ms
            CreateNewInstances:     121ms     126ms    1.49us    0.335ms
       CreateStringsWithConcat:     130ms     135ms    0.14us    0.916ms
       CreateUnicodeWithConcat:     130ms     135ms    0.34us    0.361ms
                  DictCreation:     108ms     109ms    0.27us    0.361ms
             DictWithFloatKeys:     149ms     153ms    0.17us    0.678ms
           DictWithIntegerKeys:     124ms     126ms    0.11us    0.915ms
            DictWithStringKeys:     114ms     117ms    0.10us    0.905ms
                      ForLoops:     110ms     111ms    4.46us    0.063ms
                    IfThenElse:     118ms     119ms    0.09us    0.685ms
                   ListSlicing:     116ms     120ms    8.59us    0.103ms
                NestedForLoops:     125ms     137ms    0.09us    0.019ms
          NormalClassAttribute:     124ms     136ms    0.11us    0.457ms
       NormalInstanceAttribute:     110ms     117ms    0.10us    0.454ms
           PythonFunctionCalls:     107ms     113ms    0.34us    0.271ms
             PythonMethodCalls:     140ms     149ms    0.66us    0.141ms
                     Recursion:     156ms     166ms    3.32us    0.452ms
                  SecondImport:     112ms     118ms    1.18us    0.180ms
           SecondPackageImport:     118ms     127ms    1.27us    0.180ms
         SecondSubmoduleImport:     140ms     151ms    1.51us    0.180ms
       SimpleComplexArithmetic:     128ms     139ms    0.16us    0.361ms
        SimpleDictManipulation:     134ms     136ms    0.11us    0.452ms
         SimpleFloatArithmetic:     110ms     113ms    0.09us    0.571ms
      SimpleIntFloatArithmetic:     106ms     111ms    0.08us    0.548ms
       SimpleIntegerArithmetic:     106ms     109ms    0.08us    0.544ms
        SimpleListManipulation:     103ms     113ms    0.10us    0.587ms
          SimpleLongArithmetic:     112ms     118ms    0.18us    0.271ms
                    SmallLists:     105ms     116ms    0.17us    0.366ms
                   SmallTuples:     108ms     128ms    0.24us    0.406ms
         SpecialClassAttribute:     119ms     136ms    0.11us    0.453ms
      SpecialInstanceAttribute:     143ms     155ms    0.13us    0.454ms
                StringMappings:     115ms     121ms    0.48us    0.405ms
              StringPredicates:     120ms     129ms    0.18us    2.064ms
                 StringSlicing:     111ms     127ms    0.23us    0.781ms
                     TryExcept:     125ms     126ms    0.06us    0.681ms
                TryRaiseExcept:     133ms     137ms    2.14us    0.361ms
                  TupleSlicing:     117ms     120ms    0.46us    0.066ms
               UnicodeMappings:     156ms     160ms    4.44us    0.429ms
             UnicodePredicates:     117ms     121ms    0.22us    2.487ms
             UnicodeProperties:     115ms     153ms    0.38us    2.070ms
                UnicodeSlicing:     126ms     129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
                        Totals:    6283ms    6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

| 227 | from pybench import Test |
| 228 | |
| 229 | class IntegerCounting(Test): |
| 230 | |
| 231 | # Version number of the test as float (x.yy); this is important |
| 232 | # for comparisons of benchmark runs - tests with unequal version |
| 233 | # number will not get compared. |
| 234 | version = 1.0 |
| 235 | |
| 236 | # The number of abstract operations done in each round of the |
| 237 | # test. An operation is the basic unit of what you want to |
| 238 | # measure. The benchmark will output the amount of run-time per |
| 239 | # operation. Note that in order to raise the measured timings |
| 240 | # significantly above noise level, it is often required to repeat |
| 241 | # sets of operations more than once per test round. The measured |
| 242 | # overhead per test round should be less than 1 second. |
| 243 | operations = 20 |
| 244 | |
| 245 | # Number of rounds to execute per test run. This should be |
| 246 | # adjusted to a figure that results in a test run-time of between |
Thomas Wouters | 0e3f591 | 2006-08-11 14:57:12 +0000 | [diff] [blame] | 247 | # 1-2 seconds (at warp 1). |
Thomas Wouters | 49fd7fa | 2006-04-21 10:40:58 +0000 | [diff] [blame] | 248 | rounds = 100000 |
| 249 | |
| 250 | def test(self): |
| 251 | |
| 252 | """ Run the test. |
| 253 | |
| 254 | The test needs to run self.rounds executing |
| 255 | self.operations number of operations each. |
| 256 | |
| 257 | """ |
| 258 | # Init the test |
| 259 | a = 1 |
| 260 | |
| 261 | # Run test rounds |
| 262 | # |
Georg Brandl | c9a5a0e | 2009-09-01 07:34:27 +0000 | [diff] [blame^] | 263 | for i in range(self.rounds): |
Thomas Wouters | 49fd7fa | 2006-04-21 10:40:58 +0000 | [diff] [blame] | 264 | |
| 265 | # Repeat the operations per round to raise the run-time |
| 266 | # per operation significantly above the noise level of the |
| 267 | # for-loop overhead. |
| 268 | |
| 269 | # Execute 20 operations (a += 1): |
| 270 | a += 1 |
| 271 | a += 1 |
| 272 | a += 1 |
| 273 | a += 1 |
| 274 | a += 1 |
| 275 | a += 1 |
| 276 | a += 1 |
| 277 | a += 1 |
| 278 | a += 1 |
| 279 | a += 1 |
| 280 | a += 1 |
| 281 | a += 1 |
| 282 | a += 1 |
| 283 | a += 1 |
| 284 | a += 1 |
| 285 | a += 1 |
| 286 | a += 1 |
| 287 | a += 1 |
| 288 | a += 1 |
| 289 | a += 1 |
| 290 | |
| 291 | def calibrate(self): |
| 292 | |
| 293 | """ Calibrate the test. |
| 294 | |
| 295 | This method should execute everything that is needed to |
| 296 | setup and run the test - except for the actual operations |
| 297 | that you intend to measure. pybench uses this method to |
| 298 | measure the test implementation overhead. |
| 299 | |
| 300 | """ |
| 301 | # Init the test |
| 302 | a = 1 |
| 303 | |
| 304 | # Run test rounds (without actually doing any operation) |
Georg Brandl | c9a5a0e | 2009-09-01 07:34:27 +0000 | [diff] [blame^] | 305 | for i in range(self.rounds): |
Thomas Wouters | 49fd7fa | 2006-04-21 10:40:58 +0000 | [diff] [blame] | 306 | |
| 307 | # Skip the actual execution of the operations, since we |
| 308 | # only want to measure the test's administration overhead. |
| 309 | pass |
| 310 | |
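To make the relationship between .test() and .calibrate() concrete,
here is a simplified sketch of how a per-operation time could be
derived from the two methods: time both, subtract the overhead, and
divide by rounds * operations. This is not pybench's actual
calibration strategy. CountingTest and time_per_operation are
hypothetical names, and time.perf_counter is used only as a
convenient default timer for the example.

```python
import time


class CountingTest:
    """Stand-in for a pybench.Test subclass (not the real base class)."""

    operations = 5
    rounds = 10000

    def test(self):
        a = 1
        for i in range(self.rounds):
            # Five operations per round, matching self.operations.
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):
        a = 1
        for i in range(self.rounds):
            pass


def time_per_operation(test, timer=time.perf_counter, runs=5):
    """Estimate seconds per operation, net of the loop overhead."""

    def best(func):
        # Take the minimum of several samples as the timing estimate.
        samples = []
        for _ in range(runs):
            start = timer()
            func()
            samples.append(timer() - start)
        return min(samples)

    overhead = best(test.calibrate)
    total = best(test.test)
    effective = max(total - overhead, 0.0)
    return effective / (test.rounds * test.operations)


print("%.3f us per operation" % (time_per_operation(CountingTest()) * 1e6))
```

Passing the timer in as a parameter keeps the sketch independent of
any particular clock, in the same spirit as the --timer option.
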
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.

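The scanning step described above can be pictured with the following
sketch. It is an illustration only: the Test class here is an empty
stand-in for pybench.Test, find_tests is a hypothetical helper, and
the throwaway module merely plays the role of pybench.Setup.

```python
import inspect
import types


class Test:
    """Empty stand-in for pybench.Test, used only in this sketch."""


def find_tests(module, base=Test):
    """Collect the subclasses of base defined as symbols in module."""
    found = {}
    for name, obj in vars(module).items():
        if inspect.isclass(obj) and issubclass(obj, base) and obj is not base:
            found[name] = obj
    return found


class IntegerCounting(Test):
    pass


class Helper:
    """Not a Test subclass, so it should be ignored."""


# Build a throwaway module that plays the role of pybench.Setup.
setup = types.ModuleType("FakeSetup")
setup.Test = Test
setup.IntegerCounting = IntegerCounting
setup.Helper = Helper
setup.some_setting = 10

tests = find_tests(setup)
print(sorted(tests))
```

The base class itself and unrelated symbols are skipped, so importing
a test class into the module is enough for it to be picked up.
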

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.

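The effect of a version bump on a comparison can be sketched like
this. The function compare_entry and the (version, time) tuple layout
are made up for the example and do not reflect how pybench stores
its results. The sign convention (positive means slower than the
reference) is likewise a choice made here.

```python
def compare_entry(reference, current):
    """Return a percentage string, or "n/a" when the versions differ.

    reference and current are (version, min_time_ms) tuples, a
    hypothetical layout used only in this sketch.
    """
    ref_version, ref_time = reference
    cur_version, cur_time = current
    if ref_version != cur_version:
        # The test changed, so the timings are no longer comparable.
        return "n/a"
    return "%+.1f%%" % ((cur_time - ref_time) / ref_time * 100.0)


# Same test version: the difference is reported.
print(compare_entry((2.0, 110.0), (2.0, 121.0)))

# Version bumped from 2.0 to 2.1: the comparison is suppressed.
print(compare_entry((2.0, 110.0), (2.1, 121.0)))
```
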

Version History
---------------

  2.1: made some minor changes for compatibility with Python 3.0:
        - replaced cmp with divmod and range with max in Calls.py
          (cmp no longer exists in 3.0, and range returns a list in
          Python 2.x and a lazy range object in Python 3.x)

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off by default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat
  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0
  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com