________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

   Extendable suite of low-level benchmarks for measuring
   the performance of the Python implementation
   (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you decide
which factors are more important to you than others, rather than
wrapping everything up in one number, as other performance tests do
(e.g. pystone, which is included in the Python Standard Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is well suited to benchmark testing. If you get random
differences of more than 10%, or significant differences between the
minimum and average times, then you likely have background processes
running which cause the readings to become inconsistent. Examples
include: web-browsers, email clients, RSS readers, music players,
backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.1

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
        Platform ID:    Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
        Processor:      x86_64

    Python:
        Implementation: CPython
        Executable:     /usr/local/bin/python
        Version:        3.0
        Compiler:       GCC 3.3.4 (pre 3.3.5 20040809)
        Bits:           64bit
        Build:          Oct  1 2005 15:24:35 (#1)
        Unicode:        UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
           BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
                 CompareFloats:    109ms    110ms    0.09us    0.361ms
         CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
               CompareIntegers:    137ms    138ms    0.08us    0.542ms
        CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
                  CompareLongs:    100ms    104ms    0.10us    0.316ms
                CompareStrings:    111ms    115ms    0.12us    0.929ms
                CompareUnicode:    108ms    128ms    0.17us    0.693ms
                 ConcatStrings:    142ms    155ms    0.31us    0.562ms
                 ConcatUnicode:    119ms    127ms    0.42us    0.384ms
               CreateInstances:    123ms    128ms    1.14us    0.367ms
            CreateNewInstances:    121ms    126ms    1.49us    0.335ms
       CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
       CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
                  DictCreation:    108ms    109ms    0.27us    0.361ms
             DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
           DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
            DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                      ForLoops:    110ms    111ms    4.46us    0.063ms
                    IfThenElse:    118ms    119ms    0.09us    0.685ms
                   ListSlicing:    116ms    120ms    8.59us    0.103ms
                NestedForLoops:    125ms    137ms    0.09us    0.019ms
          NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
       NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
           PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
             PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                     Recursion:    156ms    166ms    3.32us    0.452ms
                  SecondImport:    112ms    118ms    1.18us    0.180ms
           SecondPackageImport:    118ms    127ms    1.27us    0.180ms
         SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
       SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
        SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
         SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
       SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
        SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
          SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                    SmallLists:    105ms    116ms    0.17us    0.366ms
                   SmallTuples:    108ms    128ms    0.24us    0.406ms
         SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
      SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
                StringMappings:    115ms    121ms    0.48us    0.405ms
              StringPredicates:    120ms    129ms    0.18us    2.064ms
                 StringSlicing:    111ms    127ms    0.23us    0.781ms
                     TryExcept:    125ms    126ms    0.06us    0.681ms
                TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
                  TupleSlicing:    117ms    120ms    0.46us    0.066ms
               UnicodeMappings:    156ms    160ms    4.44us    0.429ms
             UnicodePredicates:    117ms    121ms    0.22us    2.487ms
             UnicodeProperties:    115ms    153ms    0.38us    2.070ms
                UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                             6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations operations each,
and .calibrate(), which does the same except that it doesn't actually
execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds rounds executing
            self.operations operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        for i in range(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in range(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

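Putting the pieces above together, the per-operation figure that
pybench reports can be approximated as follows. This is a simplified
sketch using illustrative names, not pybench's actual internals:

```python
def per_operation_time(test_time, calibration_time, rounds, operations):
    """Estimate the run-time per abstract operation in seconds.

    test_time        -- measured duration of Test.test()
    calibration_time -- measured duration of Test.calibrate()
    rounds, operations -- as defined on the Test subclass
    """
    # Subtract the administrative overhead measured by .calibrate(),
    # then divide by the total number of operations executed.
    return (test_time - calibration_time) / (rounds * operations)

# 0.150s test run, 0.010s calibration overhead, 100000 rounds of 20 ops:
print(per_operation_time(0.150, 0.010, 100000, 20))
```

This also shows why the docs above insist on repeating operations per
round: the larger rounds * operations is, the less the timer noise in
the two measured durations matters.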
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.

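
The discovery step can be sketched like this. This is a minimal
illustration of the scanning logic described above, with a stand-in
Test base class; it is not pybench's actual code:

```python
import inspect

class Test:
    """Stand-in for the pybench.Test base class."""

def find_tests(namespace):
    """Collect all Test subclasses defined in a module namespace."""
    return [obj for obj in namespace.values()
            if inspect.isclass(obj) and issubclass(obj, Test)
            and obj is not Test]

# Simulate a test module whose symbols were imported into Setup:
class IntegerCounting(Test):
    pass

print([cls.__name__ for cls in find_tests(globals())])
```

Because discovery is by subclass scan, simply importing your module's
names into Setup is enough; no explicit registration call is needed.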

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.

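The effect of a version bump on comparisons can be illustrated with a
hypothetical helper (this is not the actual pybench code):

```python
def compare_entry(old, new):
    """Compare one test across two runs; old/new are (version, minimum_ms).

    Returns 'n/a' when the test versions differ, otherwise the
    percentage change of the minimum time relative to the reference run.
    """
    old_version, old_min = old
    new_version, new_min = new
    if old_version != new_version:
        # The test changed between the two runs: not comparable.
        return "n/a"
    return "%+.1f%%" % ((new_min - old_min) / old_min * 100.0)

print(compare_entry((1.0, 126.0), (1.0, 113.4)))  # -10.0%
print(compare_entry((1.0, 126.0), (2.0, 113.4)))  # n/a
```
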

Version History
---------------

 2.1: made some minor changes for compatibility with Python 3.0:
      - replaced cmp with divmod and range with max in Calls.py
        (cmp no longer exists in 3.0, and range is a list in
        Python 2.x and an iterator in Python 3.x)

 2.0: rewrote parts of pybench which resulted in more repeatable
      timings:
      - made timer a parameter
      - changed the platform default timer to use high-resolution
        timers rather than process timers (which have a much lower
        resolution)
      - added option to select timer
      - added process time timer (using systimes.py)
      - changed to use min() as timing estimator (average
        is still taken as well to provide an idea of the difference)
      - garbage collection is turned off per default
      - sys check interval is set to the highest possible value
      - calibration is now a separate step and done using
        a different strategy that allows measuring the test
        overhead more accurately
      - modified the tests to each give a run-time of between
        100-200ms using warp 10
      - changed default warp factor to 10 (from 20)
      - compared results with timeit.py and confirmed measurements
      - bumped all test versions to 2.0
      - updated platform.py to the latest version
      - changed the output format a bit to make it look
        nicer
      - refactored the APIs somewhat
 1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0
 1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com