________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web-browsers, email clients, RSS
readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.
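
For example, a typical session (the file name below is just a
placeholder) might look like this:

    # Create a reference run and save its results to a file
    python pybench.py -f reference.pybench

    # Later on: run the suite again and compare against the reference
    python pybench.py -c reference.pybench

    # Run/show only the string-related tests
    python pybench.py -t string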

This is the current output of pybench.py --help:

| """ |
| ------------------------------------------------------------------------ |
| PYBENCH - a benchmark test suite for Python interpreters/compilers. |
| ------------------------------------------------------------------------ |
| |
| Synopsis: |
| pybench.py [option] files... |
| |
| Options and default settings: |
| -n arg number of rounds (10) |
| -f arg save benchmark to file arg () |
| -c arg compare benchmark with the one in file arg () |
| -s arg show benchmark in file arg, then exit () |
| -w arg set warp factor to arg (10) |
| -t arg run only tests with names matching arg () |
| -C arg set the number of calibration runs to arg (20) |
| -d hide noise in comparisons (0) |
| -v verbose output (not recommended) (0) |
| --with-gc enable garbage collection (0) |
| --with-syscheck use default sys check interval (0) |
| --timer arg use given timer (time.time) |
| -h show this help text |
| --help show this help text |
| --debug enable debugging |
| --copyright show copyright |
| --examples show examples of usage |
| |
| Version: |
| 2.1 |
| |
| The normal operation is to run the suite and display the |
| results. Use -f to save them for later reuse or comparisons. |
| |
| Available timers: |
| |
| time.time |
| time.clock |
| systimes.processtime |
| |
| Examples: |
| |
| python3.0 pybench.py -f p30.pybench |
| python3.1 pybench.py -f p31.pybench |
| python pybench.py -s p31.pybench -c p30.pybench |
| """ |
| |
License
-------

See LICENSE file.


Sample output
-------------

| """ |
| ------------------------------------------------------------------------------- |
| PYBENCH 2.1 |
| ------------------------------------------------------------------------------- |
| * using CPython 3.0 |
| * disabled garbage collection |
| * system check interval set to maximum: 2147483647 |
| * using timer: time.time |
| |
| Calibrating tests. Please wait... |
| |
| Running 10 round(s) of the suite at warp factor 10: |
| |
| * Round 1 done in 6.388 seconds. |
| * Round 2 done in 6.485 seconds. |
| * Round 3 done in 6.786 seconds. |
| ... |
| * Round 10 done in 6.546 seconds. |
| |
| ------------------------------------------------------------------------------- |
| Benchmark: 2006-06-12 12:09:25 |
| ------------------------------------------------------------------------------- |
| |
| Rounds: 10 |
| Warp: 10 |
| Timer: time.time |
| |
| Machine Details: |
| Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64 |
| Processor: x86_64 |
| |
| Python: |
| Implementation: CPython |
| Executable: /usr/local/bin/python |
| Version: 3.0 |
| Compiler: GCC 3.3.4 (pre 3.3.5 20040809) |
| Bits: 64bit |
| Build: Oct 1 2005 15:24:35 (#1) |
| Unicode: UCS2 |
| |
| |
Test                        minimum  average operation   overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:         126ms    145ms    0.28us    0.274ms
BuiltinMethodLookup:          124ms    130ms    0.12us    0.316ms
CompareFloats:                109ms    110ms    0.09us    0.361ms
CompareFloatsIntegers:        100ms    104ms    0.12us    0.271ms
CompareIntegers:              137ms    138ms    0.08us    0.542ms
CompareInternedStrings:       124ms    127ms    0.08us    1.367ms
CompareLongs:                 100ms    104ms    0.10us    0.316ms
CompareStrings:               111ms    115ms    0.12us    0.929ms
CompareUnicode:               108ms    128ms    0.17us    0.693ms
ConcatStrings:                142ms    155ms    0.31us    0.562ms
ConcatUnicode:                119ms    127ms    0.42us    0.384ms
CreateInstances:              123ms    128ms    1.14us    0.367ms
CreateNewInstances:           121ms    126ms    1.49us    0.335ms
CreateStringsWithConcat:      130ms    135ms    0.14us    0.916ms
CreateUnicodeWithConcat:      130ms    135ms    0.34us    0.361ms
DictCreation:                 108ms    109ms    0.27us    0.361ms
DictWithFloatKeys:            149ms    153ms    0.17us    0.678ms
DictWithIntegerKeys:          124ms    126ms    0.11us    0.915ms
DictWithStringKeys:           114ms    117ms    0.10us    0.905ms
ForLoops:                     110ms    111ms    4.46us    0.063ms
IfThenElse:                   118ms    119ms    0.09us    0.685ms
ListSlicing:                  116ms    120ms    8.59us    0.103ms
NestedForLoops:               125ms    137ms    0.09us    0.019ms
NormalClassAttribute:         124ms    136ms    0.11us    0.457ms
NormalInstanceAttribute:      110ms    117ms    0.10us    0.454ms
PythonFunctionCalls:          107ms    113ms    0.34us    0.271ms
PythonMethodCalls:            140ms    149ms    0.66us    0.141ms
Recursion:                    156ms    166ms    3.32us    0.452ms
SecondImport:                 112ms    118ms    1.18us    0.180ms
SecondPackageImport:          118ms    127ms    1.27us    0.180ms
SecondSubmoduleImport:        140ms    151ms    1.51us    0.180ms
SimpleComplexArithmetic:      128ms    139ms    0.16us    0.361ms
SimpleDictManipulation:       134ms    136ms    0.11us    0.452ms
SimpleFloatArithmetic:        110ms    113ms    0.09us    0.571ms
SimpleIntFloatArithmetic:     106ms    111ms    0.08us    0.548ms
SimpleIntegerArithmetic:      106ms    109ms    0.08us    0.544ms
SimpleListManipulation:       103ms    113ms    0.10us    0.587ms
SimpleLongArithmetic:         112ms    118ms    0.18us    0.271ms
SmallLists:                   105ms    116ms    0.17us    0.366ms
SmallTuples:                  108ms    128ms    0.24us    0.406ms
SpecialClassAttribute:        119ms    136ms    0.11us    0.453ms
SpecialInstanceAttribute:     143ms    155ms    0.13us    0.454ms
StringMappings:               115ms    121ms    0.48us    0.405ms
StringPredicates:             120ms    129ms    0.18us    2.064ms
StringSlicing:                111ms    127ms    0.23us    0.781ms
TryExcept:                    125ms    126ms    0.06us    0.681ms
TryRaiseExcept:               133ms    137ms    2.14us    0.361ms
TupleSlicing:                 117ms    120ms    0.46us    0.066ms
UnicodeMappings:              156ms    160ms    4.44us    0.429ms
UnicodePredicates:            117ms    121ms    0.22us    2.487ms
UnicodeProperties:            115ms    153ms    0.38us    2.070ms
UnicodeSlicing:               126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                      6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds executing
            self.operations number of operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        for i in range(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in range(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
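
As a rough sketch, assuming the IntegerCounting example above lives
in a module called MyTests.py (a hypothetical name) placed next to
pybench's Setup.py, registering it amounts to adding an import line
such as:

    # In Setup.py, next to the existing test module imports
    # (the module name MyTests is just an example):
    from MyTests import *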


Breaking Comparability
----------------------

If a change is made to an individual test that makes it no longer
strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
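
For example, if the IntegerCounting test shown above were changed in
a way that affects its timing, its version attribute could be bumped
like this (the new number is only an illustration):

    from pybench import Test

    class IntegerCounting(Test):

        # Bumped from 1.0 because the work done per operation changed;
        # comparisons against runs of version 1.0 will show "n/a".
        version = 1.1

        # ... operations, rounds, test() and calibrate() stay as before.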


Version History
---------------

2.1: made some minor changes for compatibility with Python 3.0:
     - replaced cmp with divmod and range with max in Calls.py
       (cmp no longer exists in 3.0, and range is a list in
       Python 2.x and an iterator in Python 3.x)

2.0: rewrote parts of pybench which resulted in more repeatable
     timings:
     - made timer a parameter
     - changed the platform default timer to use high-resolution
       timers rather than process timers (which have a much lower
       resolution)
     - added option to select timer
     - added process time timer (using systimes.py)
     - changed to use min() as timing estimator (average
       is still taken as well to provide an idea of the difference)
     - garbage collection is turned off per default
     - sys check interval is set to the highest possible value
     - calibration is now a separate step and done using
       a different strategy that allows measuring the test
       overhead more accurately
     - modified the tests to each give a run-time of between
       100-200ms using warp 10
     - changed default warp factor to 10 (from 20)
     - compared results with timeit.py and confirmed measurements
     - bumped all test versions to 2.0
     - updated platform.py to the latest version
     - changed the output format a bit to make it look
       nicer
     - refactored the APIs somewhat
1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a long
      discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com