stringbench is a set of performance tests comparing byte string
operations with unicode operations. The two string implementations
are loosely based on each other and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    # STR is the string class being tested (bytes or unicode)
    s1 = STR("Andrew")
    s2 = STR("A")
    # Bind the method once so the loop measures the call, not the lookup.
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters. The first is a short
description of how the code works. In most cases this is a Python
code snippet. It is not the code which is actually run, because the
real code is hand-optimized to focus on the method being tested.

The second parameter is a group title. All benchmarks with the same
group title are listed together. This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)", as in the sketch below.

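For instance, two benchmarks sharing a group title might look like
this (a sketch in the style of the example above; the function names
are illustrative, not necessarily those used in stringbench):

@bench('"A" in "A"*1000', 'early match, single character', 1000)
def in_early_match_single(STR):
    s1 = STR("A" * 1000)
    s2 = STR("A")
    for x in _RANGE_1000:
        s2 in s1

@bench('("A"*1000).find("A")', 'early match, single character', 1000)
def find_early_match_single(STR):
    s1 = STR("A" * 1000)
    s2 = STR("A")
    s1_find = s1.find
    for x in _RANGE_1000:
        s1_find(s2)
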
The last parameter is a loop count. Each benchmark loops over the
algorithm either 100 or 1000 times, depending on its performance.
The reported time is the time per benchmark call, so the reader needs
the count in order to scale the result to a single operation.

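For example, to get the time per individual operation, divide the
reported time by the count (numbers taken from the sample output
further below):

time_per_call_ms = 1.14   # reported for ("A"*1000).find("A") (*1000)
count = 1000              # the benchmark's inner loop count
time_per_op_us = time_per_call_ms / count * 1000
# -> 1.14 microseconds per individual find() call
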
These parameters become function attributes.

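A minimal sketch of how such a decorator could work (attribute names
like is_bench are assumptions; the actual implementation in
stringbench.py may differ in detail):

def bench(title, group, count):
    # Attach the parameters as attributes on the benchmark function
    # so the test runner can discover and group the benchmarks.
    def wrapper(func):
        func.is_bench = True
        func.title = title
        func.group = group
        func.count = count
        return func
    return wrapper
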
Here is an example of the output:


========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings. The third is a
percentage: byte time / unicode time, times 100. Values above 100
mean the unicode implementation is faster; values below 100 mean the
byte string implementation is faster.

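For the first row above, that works out as follows (the table shows
92.7, presumably because it is computed from the unrounded times):

byte_ms, unicode_ms = 38.54, 41.60
percent = byte_ms / unicode_ms * 100
# -> 92.6; below 100, so the byte string version is faster here
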
The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with 'timeit.py', which repeats the test more
and more times until the total run time exceeds 0.2 seconds, and
returns the best time for a single iteration.

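The measurement strategy resembles the standard timeit pattern,
roughly like this (a sketch of the idea only, not stringbench's
actual code):

import timeit

# Increase the loop count until one run takes longer than 0.2 seconds,
# then report the best (smallest) per-iteration time over a few runs.
timer = timeit.Timer("s.startswith('A')", "s = 'Andrew'")
number = 1
while timer.timeit(number) < 0.2:
    number *= 10
best = min(timer.repeat(repeat=3, number=number)) / number
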
The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes. For example:

4079.83 5432.25 75.1    TOTAL

However, this total has no real meaning, since it weights every test
equally.