stringbench is a set of performance tests comparing byte string
operations with unicode operations.  The two string implementations
are loosely based on each other and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters.  The first is a short
description of how the code works.  In most cases this is a Python
code snippet.  It is not the code which is actually run, because the
real code is hand-optimized to focus on the method being tested.

The second parameter is a group title.  All benchmarks with the same
group title are listed together.  This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

The last is a count.  Each benchmark loops over the algorithm either
100 or 1000 times, depending on how fast the algorithm is.  The
output time is the time per benchmark call, so the reader needs the
count to scale it to a per-operation figure; for instance, a line
printed with (*100) and a time of 38.54 ms works out to roughly
0.39 ms per operation.

These parameters become function attributes.
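
For concreteness, here is a minimal sketch of how the decorator and
driver might fit together.  It is a guess at the machinery, not
stringbench's actual code: the attribute names, the driver loop, and
the idea that STR is a "string maker" (a bytes or unicode
constructor) are assumptions, as is _RANGE_1000 being a precomputed
range(1000) that keeps loop setup out of the measured time.

_RANGE_1000 = range(1000)        # assumed: precomputed so range() isn't timed

def bench(snippet, group, count):
    # Attach the three parameters to the benchmark function as
    # attributes.  The attribute names here are illustrative.
    def decorate(func):
        func.snippet = snippet   # display text, not the measured code
        func.group = group       # benchmarks in a group are listed together
        func.count = count       # inner-loop repetitions per call
        return func
    return decorate

@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

# Assumed driver: run each benchmark once per string type.  Here a
# small encoder stands in for the byte-string maker.
for STR in (str, lambda s: s.encode("ascii")):
    startswith_single(STR)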

Here is an example of the output:


========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings.  The third is a
percentage: byte time / unicode time.  Values above 100 mean the
unicode implementation is faster than the byte string one.
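
As a sanity check on that arithmetic, here are two of the ratios
above recomputed from the printed (rounded) times; the printed ratios
differ slightly, presumably because they are computed from the
unrounded timings:

print(38.54 / 41.60 * 100)   # ~92.6; the report prints 92.7
print(0.44 / 0.41 * 100)     # ~107.3; the report prints 105.6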

The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with 'timeit.py', which runs the test with more
and more repetitions until the total run time exceeds 0.2 seconds, and
then reports the best time for a single iteration.
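
A rough sketch of that strategy with the timeit module (a
hypothetical helper for illustration, not the code stringbench
actually uses):

import timeit

def best_time(stmt, setup="pass"):
    # Grow the repetition count until one run takes more than 0.2 s,
    # then take the best of a few runs and report per-iteration time.
    timer = timeit.Timer(stmt, setup)
    number = 1
    while timer.timeit(number) <= 0.2:
        number *= 10
    return min(timer.repeat(repeat=3, number=number)) / number

Taking the minimum of several runs helps filter out timing noise from
other processes on the machine.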

The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes.  For example:

4079.83 5432.25 75.1    TOTAL

Here 4079.83 / 5432.25 * 100 is 75.1, meaning the byte string runs
took about three-quarters of the unicode time overall.  However, this
total has no real meaning, as it weights every test evenly.