Elliott Hughes | 2dbd7d2 | 2020-06-03 14:32:37 -0700 | [diff] [blame] | 1 | # These test special UTF and UCP features of DFA matching. The output is |
| 2 | # different for the different widths. |
Janis Danisevskis | 112c9cc | 2016-03-31 13:35:25 +0100 | [diff] [blame] | 3 | |
| 4 | #subject dfa |
| 5 | |
Elliott Hughes | 2dbd7d2 | 2020-06-03 14:32:37 -0700 | [diff] [blame] | 6 | # ---------------------------------------------------- |
| 7 | # These are a selection of the more comprehensive tests that are run for |
| 8 | # non-DFA matching. |
| 9 | |
Janis Danisevskis | 112c9cc | 2016-03-31 13:35:25 +0100 | [diff] [blame] | 10 | /X/utf |
| 11 | XX\x{d800} |
| 12 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 13 | XX\x{d800}\=offset=3 |
| 14 | Error -36 (bad UTF-8 offset) |
| 15 | XX\x{d800}\=no_utf_check |
| 16 | 0: X |
| 17 | XX\x{da00} |
| 18 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 19 | XX\x{da00}\=no_utf_check |
| 20 | 0: X |
| 21 | XX\x{dc00} |
| 22 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 23 | XX\x{dc00}\=no_utf_check |
| 24 | 0: X |
| 25 | XX\x{de00} |
| 26 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 27 | XX\x{de00}\=no_utf_check |
| 28 | 0: X |
| 29 | XX\x{dfff} |
| 30 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 31 | XX\x{dfff}\=no_utf_check |
| 32 | 0: X |
| 33 | XX\x{110000} |
| 34 | Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2 |
| 35 | XX\x{d800}\x{1234} |
| 36 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 |
| 37 | |
| 38 | /badutf/utf |
| 39 | X\xdf |
| 40 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1 |
| 41 | XX\xef |
| 42 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 43 | XXX\xef\x80 |
| 44 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 |
| 45 | X\xf7 |
| 46 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1 |
| 47 | XX\xf7\x80 |
| 48 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 49 | XXX\xf7\x80\x80 |
| 50 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 |
| 51 | |
| 52 | /shortutf/utf |
| 53 | XX\xdf\=ph |
| 54 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 |
| 55 | XX\xef\=ph |
| 56 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 57 | XX\xef\x80\=ph |
| 58 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 |
| 59 | \xf7\=ph |
| 60 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 61 | \xf7\x80\=ph |
| 62 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
Elliott Hughes | 2dbd7d2 | 2020-06-03 14:32:37 -0700 | [diff] [blame] | 63 | |
| 64 | # ---------------------------------------------------- |
| 65 | # UCP and casing tests - except for the first two, these will all fail in 8-bit |
| 66 | # mode because they are testing UCP without UTF and use characters > 255. |
| 67 | |
| 68 | /\x{c1}/i,no_start_optimize |
| 69 | \= Expect no match |
| 70 | \x{e1} |
| 71 | No match |
| 72 | |
| 73 | /\x{c1}+\x{e1}/iB,ucp |
| 74 | ------------------------------------------------------------------ |
| 75 | Bra |
| 76 | /i \x{c1}+ |
| 77 | /i \x{e1} |
| 78 | Ket |
| 79 | End |
| 80 | ------------------------------------------------------------------ |
| 81 | \x{c1}\x{c1}\x{c1} |
| 82 | 0: \xc1\xc1\xc1 |
| 83 | 1: \xc1\xc1 |
| 84 | \x{e1}\x{e1}\x{e1} |
| 85 | 0: \xe1\xe1\xe1 |
| 86 | 1: \xe1\xe1 |
| 87 | |
| 88 | /\x{120}\x{c1}/i,ucp,no_start_optimize |
| 89 | Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large |
| 90 | \x{121}\x{e1} |
| 91 | |
| 92 | /\x{120}\x{c1}/i,ucp |
| 93 | Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large |
| 94 | \x{121}\x{e1} |
| 95 | |
| 96 | /[^\x{120}]/i,no_start_optimize |
| 97 | Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large |
| 98 | \x{121} |
| 99 | |
| 100 | /[^\x{120}]/i,ucp,no_start_optimize |
| 101 | Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large |
| 102 | \= Expect no match |
| 103 | \x{121} |
| 104 | |
| 105 | /[^\x{120}]/i |
| 106 | Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large |
| 107 | \x{121} |
| 108 | |
| 109 | /[^\x{120}]/i,ucp |
| 110 | Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large |
| 111 | \= Expect no match |
| 112 | \x{121} |
| 113 | |
| 114 | /\x{120}{2}/i,ucp |
| 115 | Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large |
| 116 | \x{121}\x{121} |
| 117 | |
| 118 | /[^\x{120}]{2}/i,ucp |
| 119 | Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large |
| 120 | \= Expect no match |
| 121 | \x{121}\x{121} |
| 122 | |
| 123 | # ---------------------------------------------------- |
Janis Danisevskis | 112c9cc | 2016-03-31 13:35:25 +0100 | [diff] [blame] | 124 | |
| 125 | # End of testinput14 |