| stringbench is a set of performance tests comparing byte string |
| operations with unicode operations. The two string implementations |
| are loosely based on each other, and sometimes the algorithm for one |
| is faster than the algorithm for the other. |
| |
| This test set was started at the Need For Speed sprint in Reykjavik |
| to identify which string methods could be sped up quickly and to |
| identify obvious places for improvement. |
| |
| Here is an example of a benchmark: |
| |
| |
| @bench('"Andrew".startswith("A")', 'startswith single character', 1000) |
| def startswith_single(STR): |
|     s1 = STR("Andrew") |
|     s2 = STR("A") |
|     s1_startswith = s1.startswith |
|     for x in _RANGE_1000: |
|         s1_startswith(s2) |
| |
| The bench decorator takes three parameters. The first is a short |
| description of how the code works. In most cases this is a Python |
| code snippet. It is not the code which is actually run, because the |
| real code is hand-optimized to focus on the method being tested. |
| |
| The second parameter is a group title. All benchmarks with the same |
| group title are listed together. This lets you compare different |
| implementations of the same algorithm, such as "t in s" |
| vs. "s.find(t)". |
| |
| The last is a count. Each benchmark loops over the algorithm either |
| 100 or 1000 times, depending on the algorithm performance. The output |
| time is the time per benchmark call so the reader needs a way to know |
| how to scale the performance. |
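To make the scaling concrete, here is a small sketch that converts a reported time per benchmark call into an approximate time per single operation, using figures from the sample output shown later in this file. The function name `per_operation_us` is hypothetical and not part of stringbench.

```python
def per_operation_us(ms_per_call, repeat_count):
    """Convert milliseconds per benchmark call into microseconds per operation.

    ms_per_call is a value from the first or second output column;
    repeat_count is the (*100) or (*1000) figure printed after the snippet.
    """
    return ms_per_call * 1000.0 / repeat_count


# ("A"*1000).find("A") took 1.14 ms per call with a count of 1000,
# so one find() took roughly 1.14 microseconds.
find_us = per_operation_us(1.14, 1000)

# The newline count took 38.54 ms per call with a count of 100,
# so one count() took roughly 385.4 microseconds.
count_us = per_operation_us(38.54, 100)
```

Note that the per-operation figure still includes the small overhead of the benchmark's own loop.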
| |
| These parameters become function attributes. |
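As a rough illustration of how decorator parameters can become function attributes, here is a minimal sketch. It is not the actual stringbench code: the attribute names `comment`, `group` and `repeat_count` are assumptions chosen for this example, and `_RANGE_1000` is defined here only so the snippet is self-contained.

```python
def bench(comment, group, repeat_count):
    """Return a decorator that stores benchmark metadata on the function."""
    def decorator(func):
        # Attribute names are illustrative assumptions, not confirmed
        # to match the real implementation.
        func.comment = comment            # description, usually a code snippet
        func.group = group                # group title used to list results
        func.repeat_count = repeat_count  # iterations of the inner loop
        return func
    return decorator


_RANGE_1000 = range(1000)


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)


# The metadata can then be read back when reporting results.
label = "%s (*%d)" % (startswith_single.comment, startswith_single.repeat_count)
```

A driver can then collect the decorated functions, sort them by `group`, and call each one with the string type under test.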
| |
| |
| Here is an example of the output: |
| |
| |
| ========== count newlines |
| 38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100) |
| ========== early match, single character |
| 1.14    1.18    96.8    ("A"*1000).find("A") (*1000) |
| 0.44    0.41    105.6   "A" in "A"*1000 (*1000) |
| 1.15    1.17    98.1    ("A"*1000).index("A") (*1000) |
| |
| The first column is the run time in milliseconds for byte strings. |
| The second is the run time for unicode strings. The third is the |
| ratio byte time / unicode time, expressed as a percentage. A value |
| below 100 means byte strings were faster; a value above 100 means |
| unicode strings were faster. |
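The third column can be reproduced with a one-line calculation. The sketch below uses a hypothetical helper name, `ratio_percent`. Recomputing from the rounded times in the first two columns may differ slightly from the printed percentage, presumably because the printed value is derived from unrounded timings.

```python
def ratio_percent(byte_ms, unicode_ms):
    """Return byte time / unicode time as a percentage."""
    return 100.0 * byte_ms / unicode_ms


# Count newlines: byte strings were faster, so the ratio is below 100.
newlines = ratio_percent(38.54, 41.60)

# "A" in "A"*1000: unicode was faster, so the ratio is above 100.
contains = ratio_percent(0.44, 0.41)
```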
| |
| The last column contains the code snippet and the repeat count for the |
| internal benchmark loop. |
| |
| The times are computed with 'timeit.py', which repeats the test more |
| and more times until the total time exceeds 0.2 seconds, and then |
| returns the best time for a single iteration. |
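The timing strategy described above can be sketched with the standard `timeit` module. This is an approximation under stated assumptions, not the script's actual code: the function name `best_time_per_call`, the factor of 10 used to grow the iteration count, and the `repeats` default are all choices made for this example.

```python
import timeit


def best_time_per_call(func, min_total=0.2, repeats=3):
    """Estimate the best per-call time of func, in seconds.

    Grows the iteration count until one timing run takes at least
    min_total seconds, then takes the best of several runs.
    """
    timer = timeit.Timer(func)
    number = 1
    while timer.timeit(number) < min_total:
        number *= 10  # growth factor is an assumption for this sketch
    best = min(timer.repeat(repeat=repeats, number=number))
    return best / number


def sample_benchmark():
    # Stand-in for a decorated benchmark function called with a string type.
    s = "A" * 1000
    for _ in range(100):
        s.find("A")


seconds = best_time_per_call(sample_benchmark, min_total=0.01)
milliseconds = seconds * 1000.0
```

Taking the minimum rather than the mean reduces the influence of interference from other processes on the reported figure.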
| |
| The final line of the output is the cumulative time for byte and |
| unicode strings, and the overall performance of unicode relative to |
| bytes. For example: |
| |
| 4079.83 5432.25 75.1 TOTAL |
| |
| However, this total has little meaning because it weights every test |
| equally. |
| |