# Armin Rigo, cd73a78, 2006-08-25 12:44:28 +0000
"""
The regular expression engine in '_sre' can segfault when interpreting
bogus bytecode.

It is unclear whether this is a real bug or a "won't fix" case like
bogus_code_obj.py, because it requires bytecode that is built by hand,
as opposed to compiled by 're' from a string-source regexp.  The
difference from bogus_code_obj, though, is that the only existing regexp
compiler is written in Python, so the C code has no choice but to
accept arbitrary bytecode from Python level.

The test below builds and runs random bytecodes until 'match' crashes
Python.  I have not investigated exactly why the segfaults occur or how
hard they would be to fix.  Here are a few examples of 'code' that
segfault for me:

    [21, 50814, 8, 29, 16]
    [21, 3967, 26, 10, 23, 54113]
    [29, 23, 0, 2, 5]
    [31, 64351, 0, 28, 3, 22281, 20, 4463, 9, 25, 59154, 15245, 2,
     16343, 3, 11600, 24380, 10, 37556, 10, 31, 15, 31]

Here is also a 'code' that triggers an infinite uninterruptible loop:

    [29, 1, 8, 21, 1, 43083, 6]

"""

import _sre, random

def pick():
    # Negative draws are folded into 0..31; the rest stay in 0..65535.
    n = random.randrange(-65536, 65536)
    if n < 0:
        n &= 31
    return n
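
# Illustrative sketch, not part of the original test: a seeded replay of the
# arithmetic in pick(), to show how its values are distributed.  About half
# of the draws are negative and get folded into 0..31, presumably so that
# many values are small enough to pass for opcode numbers; the other half
# stay spread over 0..65535.  The name small_value_fraction and its
# parameters are hypothetical, and nothing below calls it.
def small_value_fraction(samples=10000, seed=0):
    import random  # repeated here so the helper also works when copied out
    rng = random.Random(seed)  # private generator, global state untouched
    small = 0
    for _ in range(samples):
        n = rng.randrange(-65536, 65536)
        if n < 0:
            n &= 31
        if n < 32:
            small += 1
    return float(small) / samples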

ss = ["", "world", "x" * 500]

while 1:
    code = [pick() for i in range(random.randrange(5, 25))]
    # Print before compiling, so the last list shown is the one that crashed.
    print code
    pat = _sre.compile(None, 0, code)
    for s in ss:
        try:
            pat.match(s)
        except RuntimeError:
            pass