Janis Danisevskis | 112c9cc | 2016-03-31 13:35:25 +0100 | [diff] [blame] | 1 | # This set of tests is for UTF-8 support and Unicode property support, with |
| 2 | # relevance only for the 8-bit library. |
| 3 | |
Elliott Hughes | 2dbd7d2 | 2020-06-03 14:32:37 -0700 | [diff] [blame] | 4 | # The next 5 patterns have UTF-8 errors |
| 5 | |
| 6 | /[Ã]/utf |
| 7 | Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80 |
| 8 | |
| 9 | /Ã/utf |
| 10 | Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end |
| 11 | |
| 12 | /ÃÃÃxxx/utf |
| 13 | Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80 |
| 14 | |
| 15 | /ÃÃ/utf |
| 16 | Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set |
| 17 | |
Elliott Hughes | 2dbd7d2 | 2020-06-03 14:32:37 -0700 | [diff] [blame] | 18 | /ÃÃ/match_invalid_utf |
| 19 | Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set |
| 20 | |
| 21 | # Now test subjects |
| 22 | |
| 23 | /badutf/utf |
| 24 | \= Expect UTF-8 errors |
| 25 | X\xdf |
| 26 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1 |
| 27 | XX\xef |
| 28 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 29 | XXX\xef\x80 |
| 30 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 |
| 31 | X\xf7 |
| 32 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1 |
| 33 | XX\xf7\x80 |
| 34 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 35 | XXX\xf7\x80\x80 |
| 36 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 |
| 37 | \xfb |
| 38 | Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 |
| 39 | \xfb\x80 |
| 40 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 41 | \xfb\x80\x80 |
| 42 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
| 43 | \xfb\x80\x80\x80 |
| 44 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 |
| 45 | \xfd |
| 46 | Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0 |
| 47 | \xfd\x80 |
| 48 | Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 |
| 49 | \xfd\x80\x80 |
| 50 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 51 | \xfd\x80\x80\x80 |
| 52 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
| 53 | \xfd\x80\x80\x80\x80 |
| 54 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 |
| 55 | \xdf\x7f |
| 56 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 |
| 57 | \xef\x7f\x80 |
| 58 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 |
| 59 | \xef\x80\x7f |
| 60 | Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 |
| 61 | \xf7\x7f\x80\x80 |
| 62 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 |
| 63 | \xf7\x80\x7f\x80 |
| 64 | Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 |
| 65 | \xf7\x80\x80\x7f |
| 66 | Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 |
| 67 | \xfb\x7f\x80\x80\x80 |
| 68 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 |
| 69 | \xfb\x80\x7f\x80\x80 |
| 70 | Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 |
| 71 | \xfb\x80\x80\x7f\x80 |
| 72 | Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 |
| 73 | \xfb\x80\x80\x80\x7f |
| 74 | Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 |
| 75 | \xfd\x7f\x80\x80\x80\x80 |
| 76 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 |
| 77 | \xfd\x80\x7f\x80\x80\x80 |
| 78 | Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 |
| 79 | \xfd\x80\x80\x7f\x80\x80 |
| 80 | Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 |
| 81 | \xfd\x80\x80\x80\x7f\x80 |
| 82 | Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 |
| 83 | \xfd\x80\x80\x80\x80\x7f |
| 84 | Failed: error -12: UTF-8 error: byte 6 top bits not 0x80 at offset 0 |
| 85 | \xed\xa0\x80 |
| 86 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 |
| 87 | \xc0\x8f |
| 88 | Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 0 |
| 89 | \xe0\x80\x8f |
| 90 | Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 0 |
| 91 | \xf0\x80\x80\x8f |
| 92 | Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 |
| 93 | \xf8\x80\x80\x80\x8f |
| 94 | Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 |
| 95 | \xfc\x80\x80\x80\x80\x8f |
| 96 | Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 |
| 97 | \x80 |
| 98 | Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0 |
| 99 | \xfe |
| 100 | Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 |
| 101 | \xff |
| 102 | Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 |
| 103 | |
| 104 | /badutf/utf |
| 105 | \= Expect UTF-8 errors |
| 106 | XX\xfb\x80\x80\x80\x80 |
| 107 | Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 2 |
| 108 | XX\xfd\x80\x80\x80\x80\x80 |
| 109 | Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 2 |
| 110 | XX\xf7\xbf\xbf\xbf |
| 111 | Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2 |
| 112 | |
| 113 | /shortutf/utf |
| 114 | \= Expect UTF-8 errors |
| 115 | XX\xdf\=ph |
| 116 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 |
| 117 | XX\xef\=ph |
| 118 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 |
| 119 | XX\xef\x80\=ph |
| 120 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 |
| 121 | \xf7\=ph |
| 122 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 123 | \xf7\x80\=ph |
| 124 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
| 125 | \xf7\x80\x80\=ph |
| 126 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 |
| 127 | \xfb\=ph |
| 128 | Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 |
| 129 | \xfb\x80\=ph |
| 130 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 131 | \xfb\x80\x80\=ph |
| 132 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
| 133 | \xfb\x80\x80\x80\=ph |
| 134 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 |
| 135 | \xfd\=ph |
| 136 | Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0 |
| 137 | \xfd\x80\=ph |
| 138 | Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 |
| 139 | \xfd\x80\x80\=ph |
| 140 | Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 |
| 141 | \xfd\x80\x80\x80\=ph |
| 142 | Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 |
| 143 | \xfd\x80\x80\x80\x80\=ph |
| 144 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 |
| 145 | |
| 146 | /anything/utf |
| 147 | \= Expect UTF-8 errors |
| 148 | X\xc0\x80 |
| 149 | Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 1 |
| 150 | XX\xc1\x8f |
| 151 | Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 2 |
| 152 | XXX\xe0\x9f\x80 |
| 153 | Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 3 |
| 154 | \xf0\x8f\x80\x80 |
| 155 | Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 |
| 156 | \xf8\x87\x80\x80\x80 |
| 157 | Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 |
| 158 | \xfc\x83\x80\x80\x80\x80 |
| 159 | Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 |
| 160 | \xfe\x80\x80\x80\x80\x80 |
| 161 | Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 |
| 162 | \xff\x80\x80\x80\x80\x80 |
| 163 | Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 |
| 164 | \xf8\x88\x80\x80\x80 |
| 165 | Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 |
| 166 | \xf9\x87\x80\x80\x80 |
| 167 | Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 |
| 168 | \xfc\x84\x80\x80\x80\x80 |
| 169 | Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 |
| 170 | \xfd\x83\x80\x80\x80\x80 |
| 171 | Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 |
| 172 | \= Expect no match |
| 173 | \xc3\x8f |
| 174 | No match |
| 175 | \xe0\xaf\x80 |
| 176 | No match |
| 177 | \xe1\x80\x80 |
| 178 | No match |
| 179 | \xf0\x9f\x80\x80 |
| 180 | No match |
| 181 | \xf1\x8f\x80\x80 |
| 182 | No match |
| 183 | \xf8\x88\x80\x80\x80\=no_utf_check |
| 184 | No match |
| 185 | \xf9\x87\x80\x80\x80\=no_utf_check |
| 186 | No match |
| 187 | \xfc\x84\x80\x80\x80\x80\=no_utf_check |
| 188 | No match |
| 189 | \xfd\x83\x80\x80\x80\x80\=no_utf_check |
| 190 | No match |
| 191 | |
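The error numbers and offsets reported for the subjects above follow a fixed classification of malformed UTF-8. The following is a minimal Python sketch that reproduces those codes for the inputs shown here. The name `first_utf8_error` is hypothetical, the numeric codes are copied from this output, and the logic is an assumption about how such a checker could be written, not a copy of the library's source.

```python
def first_utf8_error(data: bytes):
    """Return (error_code, offset) for the first UTF-8 problem, or None.

    Codes mirror the ones printed in this test output (assumed mapping):
    -3..-7 missing bytes, -8..-12 bad continuation byte, -13/-14 5- and
    6-byte forms, -15 above 0x10ffff, -16 surrogates, -17..-21 overlong,
    -22 isolated continuation byte, -23 illegal 0xfe/0xff.
    """
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c < 0x80:                      # plain ASCII
            i += 1
            continue
        if c < 0xC0:                      # 10xxxxxx with no lead byte
            return (-22, i)
        if c >= 0xFE:                     # never valid in UTF-8
            return (-23, i)
        # Number of continuation bytes implied by the lead byte.
        if c < 0xE0:
            extra = 1
        elif c < 0xF0:
            extra = 2
        elif c < 0xF8:
            extra = 3
        elif c < 0xFC:
            extra = 4
        else:
            extra = 5
        available = n - i - 1
        if available < extra:             # truncated at end of subject
            return (-2 - (extra - available), i)
        for k in range(1, extra + 1):     # every continuation must be 10xxxxxx
            if data[i + k] & 0xC0 != 0x80:
                return (-7 - k, i)
        d = data[i + 1]
        if extra == 1 and (c & 0x3E) == 0:
            return (-17, i)               # overlong 2-byte
        if extra == 2:
            if c == 0xE0 and (d & 0x20) == 0:
                return (-18, i)           # overlong 3-byte
            if c == 0xED and d >= 0xA0:
                return (-16, i)           # surrogate range
        if extra == 3:
            if c == 0xF0 and (d & 0x30) == 0:
                return (-19, i)           # overlong 4-byte
            if c > 0xF4 or (c == 0xF4 and d > 0x8F):
                return (-15, i)           # above 0x10ffff
        if extra == 4:
            if c == 0xF8 and (d & 0x38) == 0:
                return (-20, i)           # overlong 5-byte
            return (-13, i)               # 5-byte form not allowed
        if extra == 5:
            if c == 0xFC and (d & 0x3C) == 0:
                return (-21, i)           # overlong 6-byte
            return (-14, i)               # 6-byte form not allowed
        i += extra + 1
    return None


if __name__ == "__main__":
    for subject in (b"X\xdf", b"\xed\xa0\x80", b"XX\xf7\xbf\xbf\xbf", b"\xc3\x8f"):
        print(subject, first_utf8_error(subject))
```

Subjects listed under "Expect no match", such as `\xc3\x8f`, are well-formed, so the sketch returns `None` for them.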
| 192 | # Similar tests with offsets |
| 193 | |
| 194 | /badutf/utf |
| 195 | \= Expect UTF-8 errors |
| 196 | X\xdfabcd |
| 197 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 198 | X\xdfabcd\=offset=1 |
| 199 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 200 | \= Expect no match |
| 201 | X\xdfabcd\=offset=2 |
| 202 | No match |
| 203 | |
| 204 | /(?<=x)badutf/utf |
| 205 | \= Expect UTF-8 errors |
| 206 | X\xdfabcd |
| 207 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 208 | X\xdfabcd\=offset=1 |
| 209 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 210 | X\xdfabcd\=offset=2 |
| 211 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 212 | X\xdfabcd\xdf\=offset=3 |
| 213 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 6 |
| 214 | \= Expect no match |
| 215 | X\xdfabcd\=offset=3 |
| 216 | No match |
| 217 | |
| 218 | /(?<=xx)badutf/utf |
| 219 | \= Expect UTF-8 errors |
| 220 | X\xdfabcd |
| 221 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 222 | X\xdfabcd\=offset=1 |
| 223 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 224 | X\xdfabcd\=offset=2 |
| 225 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 226 | X\xdfabcd\=offset=3 |
| 227 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 228 | |
| 229 | /(?<=xxxx)badutf/utf |
| 230 | \= Expect UTF-8 errors |
| 231 | X\xdfabcd |
| 232 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 233 | X\xdfabcd\=offset=1 |
| 234 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 235 | X\xdfabcd\=offset=2 |
| 236 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 237 | X\xdfabcd\=offset=3 |
| 238 | Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 |
| 239 | X\xdfabc\xdf\=offset=6 |
| 240 | Failed: error -3: UTF-8 error: 1 byte missing at end at offset 5 |
| 241 | X\xdfabc\xdf\=offset=7 |
| 242 | Failed: error -33: bad offset value |
| 243 | \= Expect no match |
| 244 | X\xdfabcd\=offset=6 |
| 245 | No match |
| 246 | |
| 247 | /\x{100}/IB,utf |
| 248 | ------------------------------------------------------------------ |
| 249 | Bra |
| 250 | \x{100} |
| 251 | Ket |
| 252 | End |
| 253 | ------------------------------------------------------------------ |
Elliott Hughes | 0c26e19 | 2019-08-07 12:24:46 -0700 | [diff] [blame] | 254 | Capture group count = 0 |
| 255 | Options: utf |
| 256 | First code unit = \xc4 |
| 257 | Last code unit = \x80 |
| 258 | Subject length lower bound = 1 |
| 259 | |
| 260 | /\x{1000}/IB,utf |
| 261 | ------------------------------------------------------------------ |
| 262 | Bra |
| 263 | \x{1000} |
| 264 | Ket |
| 265 | End |
| 266 | ------------------------------------------------------------------ |
| 267 | Capture group count = 0 |
| 268 | Options: utf |
| 269 | First code unit = \xe1 |
| 270 | Last code unit = \x80 |
| 271 | Subject length lower bound = 1 |
| 272 | |
| 273 | /\x{10000}/IB,utf |
| 274 | ------------------------------------------------------------------ |
| 275 | Bra |
| 276 | \x{10000} |
| 277 | Ket |
| 278 | End |
| 279 | ------------------------------------------------------------------ |
| 280 | Capture group count = 0 |
| 281 | Options: utf |
| 282 | First code unit = \xf0 |
| 283 | Last code unit = \x80 |
| 284 | Subject length lower bound = 1 |
| 285 | |
| 286 | /\x{100000}/IB,utf |
| 287 | ------------------------------------------------------------------ |
| 288 | Bra |
| 289 | \x{100000} |
| 290 | Ket |
| 291 | End |
| 292 | ------------------------------------------------------------------ |
| 293 | Capture group count = 0 |
| 294 | Options: utf |
| 295 | First code unit = \xf4 |
| 296 | Last code unit = \x80 |
| 297 | Subject length lower bound = 1 |
| 298 | |
| 299 | /\x{10ffff}/IB,utf |
| 300 | ------------------------------------------------------------------ |
| 301 | Bra |
| 302 | \x{10ffff} |
| 303 | Ket |
| 304 | End |
| 305 | ------------------------------------------------------------------ |
| 306 | Capture group count = 0 |
| 307 | Options: utf |
| 308 | First code unit = \xf4 |
| 309 | Last code unit = \xbf |
| 310 | Subject length lower bound = 1 |
| 311 | |
| 312 | /[\x{ff}]/IB,utf |
| 313 | ------------------------------------------------------------------ |
| 314 | Bra |
| 315 | \x{ff} |
| 316 | Ket |
| 317 | End |
| 318 | ------------------------------------------------------------------ |
| 319 | Capture group count = 0 |
| 320 | Options: utf |
| 321 | First code unit = \xc3 |
| 322 | Last code unit = \xbf |
| 323 | Subject length lower bound = 1 |
| 324 | |
| 325 | /[\x{100}]/IB,utf |
| 326 | ------------------------------------------------------------------ |
| 327 | Bra |
| 328 | \x{100} |
| 329 | Ket |
| 330 | End |
| 331 | ------------------------------------------------------------------ |
| 332 | Capture group count = 0 |
| 333 | Options: utf |
| 334 | First code unit = \xc4 |
| 335 | Last code unit = \x80 |
| 336 | Subject length lower bound = 1 |
| 337 | |
| 338 | /\x80/IB,utf |
| 339 | ------------------------------------------------------------------ |
| 340 | Bra |
| 341 | \x{80} |
| 342 | Ket |
| 343 | End |
| 344 | ------------------------------------------------------------------ |
| 345 | Capture group count = 0 |
| 346 | Options: utf |
| 347 | First code unit = \xc2 |
| 348 | Last code unit = \x80 |
| 349 | Subject length lower bound = 1 |
| 350 | |
| 351 | /\xff/IB,utf |
| 352 | ------------------------------------------------------------------ |
| 353 | Bra |
| 354 | \x{ff} |
| 355 | Ket |
| 356 | End |
| 357 | ------------------------------------------------------------------ |
| 358 | Capture group count = 0 |
| 359 | Options: utf |
| 360 | First code unit = \xc3 |
| 361 | Last code unit = \xbf |
| 362 | Subject length lower bound = 1 |
| 363 | |
| 364 | /\x{D55c}\x{ad6d}\x{C5B4}/IB,utf |
| 365 | ------------------------------------------------------------------ |
| 366 | Bra |
| 367 | \x{d55c}\x{ad6d}\x{c5b4} |
| 368 | Ket |
| 369 | End |
| 370 | ------------------------------------------------------------------ |
| 371 | Capture group count = 0 |
| 372 | Options: utf |
| 373 | First code unit = \xed |
| 374 | Last code unit = \xb4 |
| 375 | Subject length lower bound = 3 |
| 376 | \x{D55c}\x{ad6d}\x{C5B4} |
| 377 | 0: \x{d55c}\x{ad6d}\x{c5b4} |
| 378 | |
| 379 | /\x{65e5}\x{672c}\x{8a9e}/IB,utf |
| 380 | ------------------------------------------------------------------ |
| 381 | Bra |
| 382 | \x{65e5}\x{672c}\x{8a9e} |
| 383 | Ket |
| 384 | End |
| 385 | ------------------------------------------------------------------ |
| 386 | Capture group count = 0 |
| 387 | Options: utf |
| 388 | First code unit = \xe6 |
| 389 | Last code unit = \x9e |
| 390 | Subject length lower bound = 3 |
| 391 | \x{65e5}\x{672c}\x{8a9e} |
| 392 | 0: \x{65e5}\x{672c}\x{8a9e} |
| 393 | |
| 394 | /\x{80}/IB,utf |
| 395 | ------------------------------------------------------------------ |
| 396 | Bra |
| 397 | \x{80} |
| 398 | Ket |
| 399 | End |
| 400 | ------------------------------------------------------------------ |
| 401 | Capture group count = 0 |
| 402 | Options: utf |
| 403 | First code unit = \xc2 |
| 404 | Last code unit = \x80 |
| 405 | Subject length lower bound = 1 |
| 406 | |
| 407 | /\x{084}/IB,utf |
| 408 | ------------------------------------------------------------------ |
| 409 | Bra |
| 410 | \x{84} |
| 411 | Ket |
| 412 | End |
| 413 | ------------------------------------------------------------------ |
| 414 | Capture group count = 0 |
| 415 | Options: utf |
| 416 | First code unit = \xc2 |
| 417 | Last code unit = \x84 |
| 418 | Subject length lower bound = 1 |
| 419 | |
| 420 | /\x{104}/IB,utf |
| 421 | ------------------------------------------------------------------ |
| 422 | Bra |
| 423 | \x{104} |
| 424 | Ket |
| 425 | End |
| 426 | ------------------------------------------------------------------ |
| 427 | Capture group count = 0 |
| 428 | Options: utf |
| 429 | First code unit = \xc4 |
| 430 | Last code unit = \x84 |
| 431 | Subject length lower bound = 1 |
| 432 | |
| 433 | /\x{861}/IB,utf |
| 434 | ------------------------------------------------------------------ |
| 435 | Bra |
| 436 | \x{861} |
| 437 | Ket |
| 438 | End |
| 439 | ------------------------------------------------------------------ |
| 440 | Capture group count = 0 |
| 441 | Options: utf |
| 442 | First code unit = \xe0 |
| 443 | Last code unit = \xa1 |
| 444 | Subject length lower bound = 1 |
| 445 | |
| 446 | /\x{212ab}/IB,utf |
| 447 | ------------------------------------------------------------------ |
| 448 | Bra |
| 449 | \x{212ab} |
| 450 | Ket |
| 451 | End |
| 452 | ------------------------------------------------------------------ |
| 453 | Capture group count = 0 |
| 454 | Options: utf |
| 455 | First code unit = \xf0 |
| 456 | Last code unit = \xab |
| 457 | Subject length lower bound = 1 |
| 458 | |
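The "First code unit" and "Last code unit" values in the preceding patterns correspond to the first and last bytes of the pattern's UTF-8 encoding. A minimal Python sketch for checking them by hand follows; `first_last_code_unit` is a hypothetical helper, and it only applies to patterns made of fixed literal characters like the ones above.

```python
def first_last_code_unit(code_points):
    """Return the first and last UTF-8 bytes of a literal character sequence."""
    encoded = "".join(chr(cp) for cp in code_points).encode("utf-8")
    return encoded[0], encoded[-1]


if __name__ == "__main__":
    # Code points taken from the patterns in this section.
    samples = [
        [0x100], [0x1000], [0x10000], [0x100000], [0x10FFFF],
        [0x80], [0xFF], [0x84], [0x104], [0x861], [0x212AB],
        [0xD55C, 0xAD6D, 0xC5B4],
        [0x65E5, 0x672C, 0x8A9E],
    ]
    for cps in samples:
        first, last = first_last_code_unit(cps)
        name = "".join("\\x{%x}" % cp for cp in cps)
        print("%-28s first=\\x%02x last=\\x%02x" % (name, first, last))
```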
| 459 | /[^ab\xC0-\xF0]/IB,utf |
| 460 | ------------------------------------------------------------------ |
| 461 | Bra |
| 462 | [\x00-`c-\xbf\xf1-\xff] (neg) |
| 463 | Ket |
| 464 | End |
| 465 | ------------------------------------------------------------------ |
| 466 | Capture group count = 0 |
| 467 | Options: utf |
| 468 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a |
| 469 | \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 |
| 470 | \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 |
| 471 | 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y |
| 472 | Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f |
| 473 | \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 |
| 474 | \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf |
| 475 | \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee |
| 476 | \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd |
| 477 | \xfe \xff |
| 478 | Subject length lower bound = 1 |
| 479 | \x{f1} |
| 480 | 0: \x{f1} |
| 481 | \x{bf} |
| 482 | 0: \x{bf} |
| 483 | \x{100} |
| 484 | 0: \x{100} |
| 485 | \x{1000} |
| 486 | 0: \x{1000} |
| 487 | \= Expect no match |
| 488 | \x{c0} |
| 489 | No match |
| 490 | \x{f0} |
| 491 | No match |
| 492 | |
| 493 | /Ā{3,4}/IB,utf |
| 494 | ------------------------------------------------------------------ |
| 495 | Bra |
| 496 | \x{100}{3} |
| 497 | \x{100}?+ |
| 498 | Ket |
| 499 | End |
| 500 | ------------------------------------------------------------------ |
| 501 | Capture group count = 0 |
| 502 | Options: utf |
| 503 | First code unit = \xc4 |
| 504 | Last code unit = \x80 |
| 505 | Subject length lower bound = 3 |
| 506 | \x{100}\x{100}\x{100}\x{100\x{100} |
| 507 | 0: \x{100}\x{100}\x{100} |
| 508 | |
| 509 | /(\x{100}+|x)/IB,utf |
| 510 | ------------------------------------------------------------------ |
| 511 | Bra |
| 512 | CBra 1 |
| 513 | \x{100}++ |
| 514 | Alt |
| 515 | x |
| 516 | Ket |
| 517 | Ket |
| 518 | End |
| 519 | ------------------------------------------------------------------ |
| 520 | Capture group count = 1 |
| 521 | Options: utf |
| 522 | Starting code units: x \xc4 |
| 523 | Subject length lower bound = 1 |
| 524 | |
| 525 | /(\x{100}*a|x)/IB,utf |
| 526 | ------------------------------------------------------------------ |
| 527 | Bra |
| 528 | CBra 1 |
| 529 | \x{100}*+ |
| 530 | a |
| 531 | Alt |
| 532 | x |
| 533 | Ket |
| 534 | Ket |
| 535 | End |
| 536 | ------------------------------------------------------------------ |
| 537 | Capture group count = 1 |
| 538 | Options: utf |
| 539 | Starting code units: a x \xc4 |
| 540 | Subject length lower bound = 1 |
| 541 | |
| 542 | /(\x{100}{0,2}a|x)/IB,utf |
| 543 | ------------------------------------------------------------------ |
| 544 | Bra |
| 545 | CBra 1 |
| 546 | \x{100}{0,2}+ |
| 547 | a |
| 548 | Alt |
| 549 | x |
| 550 | Ket |
| 551 | Ket |
| 552 | End |
| 553 | ------------------------------------------------------------------ |
| 554 | Capture group count = 1 |
| 555 | Options: utf |
| 556 | Starting code units: a x \xc4 |
| 557 | Subject length lower bound = 1 |
| 558 | |
| 559 | /(\x{100}{1,2}a|x)/IB,utf |
| 560 | ------------------------------------------------------------------ |
| 561 | Bra |
| 562 | CBra 1 |
| 563 | \x{100} |
| 564 | \x{100}{0,1}+ |
| 565 | a |
| 566 | Alt |
| 567 | x |
| 568 | Ket |
| 569 | Ket |
| 570 | End |
| 571 | ------------------------------------------------------------------ |
| 572 | Capture group count = 1 |
| 573 | Options: utf |
| 574 | Starting code units: x \xc4 |
| 575 | Subject length lower bound = 1 |
| 576 | |
| 577 | /\x{100}/IB,utf |
| 578 | ------------------------------------------------------------------ |
| 579 | Bra |
| 580 | \x{100} |
| 581 | Ket |
| 582 | End |
| 583 | ------------------------------------------------------------------ |
| 584 | Capture group count = 0 |
| 585 | Options: utf |
| 586 | First code unit = \xc4 |
| 587 | Last code unit = \x80 |
| 588 | Subject length lower bound = 1 |
| 589 | |
| 590 | /a\x{100}\x{101}*/IB,utf |
| 591 | ------------------------------------------------------------------ |
| 592 | Bra |
| 593 | a\x{100} |
| 594 | \x{101}*+ |
| 595 | Ket |
| 596 | End |
| 597 | ------------------------------------------------------------------ |
| 598 | Capture group count = 0 |
| 599 | Options: utf |
| 600 | First code unit = 'a' |
| 601 | Last code unit = \x80 |
| 602 | Subject length lower bound = 2 |
| 603 | |
| 604 | /a\x{100}\x{101}+/IB,utf |
| 605 | ------------------------------------------------------------------ |
| 606 | Bra |
| 607 | a\x{100} |
| 608 | \x{101}++ |
| 609 | Ket |
| 610 | End |
| 611 | ------------------------------------------------------------------ |
| 612 | Capture group count = 0 |
| 613 | Options: utf |
| 614 | First code unit = 'a' |
| 615 | Last code unit = \x81 |
| 616 | Subject length lower bound = 3 |
| 617 | |
| 618 | /[^\x{c4}]/IB |
| 619 | ------------------------------------------------------------------ |
| 620 | Bra |
| 621 | [^\x{c4}] |
| 622 | Ket |
| 623 | End |
| 624 | ------------------------------------------------------------------ |
| 625 | Capture group count = 0 |
| 626 | Subject length lower bound = 1 |
| 627 | |
| 628 | /[\x{100}]/IB,utf |
| 629 | ------------------------------------------------------------------ |
| 630 | Bra |
| 631 | \x{100} |
| 632 | Ket |
| 633 | End |
| 634 | ------------------------------------------------------------------ |
| 635 | Capture group count = 0 |
| 636 | Options: utf |
| 637 | First code unit = \xc4 |
| 638 | Last code unit = \x80 |
| 639 | Subject length lower bound = 1 |
| 640 | \x{100} |
| 641 | 0: \x{100} |
| 642 | Z\x{100} |
| 643 | 0: \x{100} |
| 644 | \x{100}Z |
| 645 | 0: \x{100} |
| 646 | |
| 647 | /[\xff]/IB,utf |
| 648 | ------------------------------------------------------------------ |
| 649 | Bra |
| 650 | \x{ff} |
| 651 | Ket |
| 652 | End |
| 653 | ------------------------------------------------------------------ |
| 654 | Capture group count = 0 |
| 655 | Options: utf |
| 656 | First code unit = \xc3 |
| 657 | Last code unit = \xbf |
| 658 | Subject length lower bound = 1 |
| 659 | >\x{ff}< |
| 660 | 0: \x{ff} |
| 661 | |
| 662 | /[^\xff]/IB,utf |
| 663 | ------------------------------------------------------------------ |
| 664 | Bra |
| 665 | [^\x{ff}] |
| 666 | Ket |
| 667 | End |
| 668 | ------------------------------------------------------------------ |
| 669 | Capture group count = 0 |
| 670 | Options: utf |
| 671 | Subject length lower bound = 1 |
| 672 | |
| 673 | /\x{100}abc(xyz(?1))/IB,utf |
| 674 | ------------------------------------------------------------------ |
| 675 | Bra |
| 676 | \x{100}abc |
| 677 | CBra 1 |
| 678 | xyz |
| 679 | Recurse |
| 680 | Ket |
| 681 | Ket |
| 682 | End |
| 683 | ------------------------------------------------------------------ |
| 684 | Capture group count = 1 |
| 685 | Options: utf |
| 686 | First code unit = \xc4 |
| 687 | Last code unit = 'z' |
| 688 | Subject length lower bound = 7 |
| 689 | |
| 690 | /\777/I,utf |
| 691 | Capture group count = 0 |
| 692 | Options: utf |
| 693 | First code unit = \xc7 |
| 694 | Last code unit = \xbf |
| 695 | Subject length lower bound = 1 |
| 696 | \x{1ff} |
| 697 | 0: \x{1ff} |
| 698 | \777 |
| 699 | 0: \x{1ff} |
| 700 | |
| 701 | /\x{100}+\x{200}/IB,utf |
| 702 | ------------------------------------------------------------------ |
| 703 | Bra |
| 704 | \x{100}++ |
| 705 | \x{200} |
| 706 | Ket |
| 707 | End |
| 708 | ------------------------------------------------------------------ |
| 709 | Capture group count = 0 |
| 710 | Options: utf |
| 711 | First code unit = \xc4 |
| 712 | Last code unit = \x80 |
| 713 | Subject length lower bound = 2 |
| 714 | |
| 715 | /\x{100}+X/IB,utf |
| 716 | ------------------------------------------------------------------ |
| 717 | Bra |
| 718 | \x{100}++ |
| 719 | X |
| 720 | Ket |
| 721 | End |
| 722 | ------------------------------------------------------------------ |
| 723 | Capture group count = 0 |
| 724 | Options: utf |
| 725 | First code unit = \xc4 |
| 726 | Last code unit = 'X' |
| 727 | Subject length lower bound = 2 |
| 728 | |
| 729 | /^[\QÄ\E-\QÅ\E/B,utf |
| 730 | Failed: error 106 at offset 15: missing terminating ] for character class |
| 731 | |
| 732 | # This tests the stricter UTF-8 check according to RFC 3629. |
| 733 | |
| 734 | /X/utf |
| 735 | \= Expect UTF-8 errors |
| 736 | \x{d800} |
| 737 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 |
| 738 | \x{da00} |
| 739 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 |
| 740 | \x{dfff} |
| 741 | Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 |
| 742 | \x{110000} |
| 743 | Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 0 |
| 744 | \x{2000000} |
| 745 | Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 |
| 746 | \x{7fffffff} |
| 747 | Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 |
| 748 | \= Expect no match |
| 749 | \x{d800}\=no_utf_check |
| 750 | No match |
| 751 | \x{da00}\=no_utf_check |
| 752 | No match |
| 753 | \x{dfff}\=no_utf_check |
| 754 | No match |
| 755 | \x{110000}\=no_utf_check |
| 756 | No match |
| 757 | \x{2000000}\=no_utf_check |
| 758 | No match |
| 759 | \x{7fffffff}\=no_utf_check |
| 760 | No match |
| 761 | |
| 762 | /(*UTF8)\x{1234}/ |
| 763 | abcd\x{1234}pqr |
| 764 | 0: \x{1234} |
| 765 | |
| 766 | /(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I |
| 767 | Capture group count = 0 |
| 768 | Compile options: <none> |
| 769 | Overall options: utf |
| 770 | \R matches any Unicode newline |
| 771 | Forced newline is CRLF |
| 772 | First code unit = 'a' |
| 773 | Last code unit = 'b' |
| 774 | Subject length lower bound = 3 |
| 775 | |
| 776 | /\h/I,utf |
| 777 | Capture group count = 0 |
| 778 | Options: utf |
| 779 | Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3 |
| 780 | Subject length lower bound = 1 |
| 781 | ABC\x{09} |
| 782 | 0: \x{09} |
| 783 | ABC\x{20} |
| 784 | 0: |
| 785 | ABC\x{a0} |
| 786 | 0: \x{a0} |
| 787 | ABC\x{1680} |
| 788 | 0: \x{1680} |
| 789 | ABC\x{180e} |
| 790 | 0: \x{180e} |
| 791 | ABC\x{2000} |
| 792 | 0: \x{2000} |
| 793 | ABC\x{202f} |
| 794 | 0: \x{202f} |
| 795 | ABC\x{205f} |
| 796 | 0: \x{205f} |
| 797 | ABC\x{3000} |
| 798 | 0: \x{3000} |
| 799 | |
| 800 | /\v/I,utf |
| 801 | Capture group count = 0 |
| 802 | Options: utf |
| 803 | Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 |
| 804 | Subject length lower bound = 1 |
| 805 | ABC\x{0a} |
| 806 | 0: \x{0a} |
| 807 | ABC\x{0b} |
| 808 | 0: \x{0b} |
| 809 | ABC\x{0c} |
| 810 | 0: \x{0c} |
| 811 | ABC\x{0d} |
| 812 | 0: \x{0d} |
| 813 | ABC\x{85} |
| 814 | 0: \x{85} |
| 815 | ABC\x{2028} |
| 816 | 0: \x{2028} |
| 817 | |
| 818 | /\h*A/I,utf |
| 819 | Capture group count = 0 |
| 820 | Options: utf |
| 821 | Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 |
| 822 | Last code unit = 'A' |
| 823 | Subject length lower bound = 1 |
| 824 | CDBABC |
| 825 | 0: A |
| 826 | |
| 827 | /\v+A/I,utf |
| 828 | Capture group count = 0 |
| 829 | Options: utf |
| 830 | Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 |
| 831 | Last code unit = 'A' |
| 832 | Subject length lower bound = 2 |
| 833 | |
| 834 | /\s?xxx\s/I,utf |
| 835 | Capture group count = 0 |
| 836 | Options: utf |
| 837 | Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x |
| 838 | Last code unit = 'x' |
| 839 | Subject length lower bound = 4 |
| 840 | |
| 841 | /\sxxx\s/I,utf,tables=2 |
| 842 | Capture group count = 0 |
| 843 | Options: utf |
| 844 | Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2 |
| 845 | Last code unit = 'x' |
| 846 | Subject length lower bound = 5 |
| 847 | AB\x{85}xxx\x{a0}XYZ |
| 848 | 0: \x{85}xxx\x{a0} |
| 849 | AB\x{a0}xxx\x{85}XYZ |
| 850 | 0: \x{a0}xxx\x{85} |
| 851 | |
| 852 | /\S \S/I,utf,tables=2 |
| 853 | Capture group count = 0 |
| 854 | Options: utf |
| 855 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f |
| 856 | \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e |
| 857 | \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C |
| 858 | D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h |
| 859 | i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 |
| 860 | \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 |
| 861 | \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 |
| 862 | \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 |
| 863 | \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff |
| 864 | Last code unit = ' ' |
| 865 | Subject length lower bound = 3 |
| 866 | \x{a2} \x{84} |
| 867 | 0: \x{a2} \x{84} |
| 868 | A Z |
| 869 | 0: A Z |
| 870 | |
| 871 | /a+/utf |
| 872 | a\x{123}aa\=offset=1 |
| 873 | 0: aa |
| 874 | a\x{123}aa\=offset=3 |
| 875 | 0: aa |
| 876 | a\x{123}aa\=offset=4 |
| 877 | 0: a |
| 878 | \= Expect bad offset value |
| 879 | a\x{123}aa\=offset=6 |
| 880 | Failed: error -33: bad offset value |
| 881 | \= Expect bad UTF-8 offset |
| 882 | a\x{123}aa\=offset=2 |
| 883 | Error -36 (bad UTF-8 offset) |
| 884 | \= Expect no match |
| 885 | a\x{123}aa\=offset=5 |
| 886 | No match |
| 887 | |
| 888 | /\x{1234}+/Ii,utf |
| 889 | Capture group count = 0 |
| 890 | Options: caseless utf |
| 891 | Starting code units: \xe1 |
| 892 | Subject length lower bound = 1 |
| 893 | |
| 894 | /\x{1234}+?/Ii,utf |
| 895 | Capture group count = 0 |
| 896 | Options: caseless utf |
| 897 | Starting code units: \xe1 |
| 898 | Subject length lower bound = 1 |
| 899 | |
| 900 | /\x{1234}++/Ii,utf |
| 901 | Capture group count = 0 |
| 902 | Options: caseless utf |
| 903 | Starting code units: \xe1 |
| 904 | Subject length lower bound = 1 |
| 905 | |
| 906 | /\x{1234}{2}/Ii,utf |
| 907 | Capture group count = 0 |
| 908 | Options: caseless utf |
| 909 | Starting code units: \xe1 |
| 910 | Subject length lower bound = 2 |
| 911 | |
| 912 | /[^\x{c4}]/IB,utf |
| 913 | ------------------------------------------------------------------ |
| 914 | Bra |
| 915 | [^\x{c4}] |
| 916 | Ket |
| 917 | End |
| 918 | ------------------------------------------------------------------ |
| 919 | Capture group count = 0 |
| 920 | Options: utf |
| 921 | Subject length lower bound = 1 |
| 922 | |
| 923 | /X+\x{200}/IB,utf |
| 924 | ------------------------------------------------------------------ |
| 925 | Bra |
| 926 | X++ |
| 927 | \x{200} |
| 928 | Ket |
| 929 | End |
| 930 | ------------------------------------------------------------------ |
| 931 | Capture group count = 0 |
| 932 | Options: utf |
| 933 | First code unit = 'X' |
| 934 | Last code unit = \x80 |
| 935 | Subject length lower bound = 2 |
| 936 | |
| 937 | /\R/I,utf |
| 938 | Capture group count = 0 |
| 939 | Options: utf |
| 940 | Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 |
| 941 | Subject length lower bound = 1 |
| 942 | |
| 943 | /\777/IB,utf |
| 944 | ------------------------------------------------------------------ |
| 945 | Bra |
| 946 | \x{1ff} |
| 947 | Ket |
| 948 | End |
| 949 | ------------------------------------------------------------------ |
| 950 | Capture group count = 0 |
| 951 | Options: utf |
| 952 | First code unit = \xc7 |
| 953 | Last code unit = \xbf |
| 954 | Subject length lower bound = 1 |
| 955 | |
| 956 | /\w+\x{C4}/B,utf |
| 957 | ------------------------------------------------------------------ |
| 958 | Bra |
| 959 | \w++ |
| 960 | \x{c4} |
| 961 | Ket |
| 962 | End |
| 963 | ------------------------------------------------------------------ |
| 964 | a\x{C4}\x{C4} |
| 965 | 0: a\x{c4} |
| 966 | |
| 967 | /\w+\x{C4}/B,utf,tables=2 |
| 968 | ------------------------------------------------------------------ |
| 969 | Bra |
| 970 | \w+ |
| 971 | \x{c4} |
| 972 | Ket |
| 973 | End |
| 974 | ------------------------------------------------------------------ |
| 975 | a\x{C4}\x{C4} |
| 976 | 0: a\x{c4}\x{c4} |
| 977 | |
| 978 | /\W+\x{C4}/B,utf |
| 979 | ------------------------------------------------------------------ |
| 980 | Bra |
| 981 | \W+ |
| 982 | \x{c4} |
| 983 | Ket |
| 984 | End |
| 985 | ------------------------------------------------------------------ |
| 986 | !\x{C4} |
| 987 | 0: !\x{c4} |
| 988 | |
| 989 | /\W+\x{C4}/B,utf,tables=2 |
| 990 | ------------------------------------------------------------------ |
| 991 | Bra |
| 992 | \W++ |
| 993 | \x{c4} |
| 994 | Ket |
| 995 | End |
| 996 | ------------------------------------------------------------------ |
| 997 | !\x{C4} |
| 998 | 0: !\x{c4} |
| 999 | |
| 1000 | /\W+\x{A1}/B,utf |
| 1001 | ------------------------------------------------------------------ |
| 1002 | Bra |
| 1003 | \W+ |
| 1004 | \x{a1} |
| 1005 | Ket |
| 1006 | End |
| 1007 | ------------------------------------------------------------------ |
| 1008 | !\x{A1} |
| 1009 | 0: !\x{a1} |
| 1010 | |
| 1011 | /\W+\x{A1}/B,utf,tables=2 |
| 1012 | ------------------------------------------------------------------ |
| 1013 | Bra |
| 1014 | \W+ |
| 1015 | \x{a1} |
| 1016 | Ket |
| 1017 | End |
| 1018 | ------------------------------------------------------------------ |
| 1019 | !\x{A1} |
| 1020 | 0: !\x{a1} |
| 1021 | |
| 1022 | /X\s+\x{A0}/B,utf |
| 1023 | ------------------------------------------------------------------ |
| 1024 | Bra |
| 1025 | X |
| 1026 | \s++ |
| 1027 | \x{a0} |
| 1028 | Ket |
| 1029 | End |
| 1030 | ------------------------------------------------------------------ |
| 1031 | X\x20\x{A0}\x{A0} |
| 1032 | 0: X \x{a0} |
| 1033 | |
| 1034 | /X\s+\x{A0}/B,utf,tables=2 |
| 1035 | ------------------------------------------------------------------ |
| 1036 | Bra |
| 1037 | X |
| 1038 | \s+ |
| 1039 | \x{a0} |
| 1040 | Ket |
| 1041 | End |
| 1042 | ------------------------------------------------------------------ |
| 1043 | X\x20\x{A0}\x{A0} |
| 1044 | 0: X \x{a0}\x{a0} |
| 1045 | |
| 1046 | /\S+\x{A0}/B,utf |
| 1047 | ------------------------------------------------------------------ |
| 1048 | Bra |
| 1049 | \S+ |
| 1050 | \x{a0} |
| 1051 | Ket |
| 1052 | End |
| 1053 | ------------------------------------------------------------------ |
| 1054 | X\x{A0}\x{A0} |
| 1055 | 0: X\x{a0}\x{a0} |
| 1056 | |
| 1057 | /\S+\x{A0}/B,utf,tables=2 |
| 1058 | ------------------------------------------------------------------ |
| 1059 | Bra |
| 1060 | \S++ |
| 1061 | \x{a0} |
| 1062 | Ket |
| 1063 | End |
| 1064 | ------------------------------------------------------------------ |
| 1065 | X\x{A0}\x{A0} |
| 1066 | 0: X\x{a0} |
| 1067 | |
| 1068 | /\x{a0}+\s!/B,utf |
| 1069 | ------------------------------------------------------------------ |
| 1070 | Bra |
| 1071 | \x{a0}++ |
| 1072 | \s |
| 1073 | ! |
| 1074 | Ket |
| 1075 | End |
| 1076 | ------------------------------------------------------------------ |
| 1077 | \x{a0}\x20! |
| 1078 | 0: \x{a0} ! |
| 1079 | |
| 1080 | /\x{a0}+\s!/B,utf,tables=2 |
| 1081 | ------------------------------------------------------------------ |
| 1082 | Bra |
| 1083 | \x{a0}+ |
| 1084 | \s |
| 1085 | ! |
| 1086 | Ket |
| 1087 | End |
| 1088 | ------------------------------------------------------------------ |
| 1089 | \x{a0}\x20! |
| 1090 | 0: \x{a0} ! |
| 1091 | |
| 1092 | /A/utf |
| 1093 | \x{ff000041} |
| 1094 | ** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8 |
| 1095 | \x{7f000041} |
| 1096 | Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 |
| 1097 | |
| 1098 | /(*UTF8)abc/never_utf |
| 1099 | Failed: error 174 at offset 7: using UTF is disabled by the application |
| 1100 | |
| 1101 | /abc/utf,never_utf |
| 1102 | Failed: error 174 at offset 0: using UTF is disabled by the application |
| 1103 | |
| 1104 | /A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf |
| 1105 | ------------------------------------------------------------------ |
| 1106 | Bra |
| 1107 | /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} |
| 1108 | Ket |
| 1109 | End |
| 1110 | ------------------------------------------------------------------ |
| 1111 | Capture group count = 0 |
| 1112 | Options: caseless utf |
| 1113 | First code unit = 'A' (caseless) |
| 1114 | Subject length lower bound = 5 |
| 1115 | |
| 1116 | /A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf |
| 1117 | ------------------------------------------------------------------ |
| 1118 | Bra |
| 1119 | A\x{391}\x{10427}\x{ff3a}\x{1fb0} |
| 1120 | Ket |
| 1121 | End |
| 1122 | ------------------------------------------------------------------ |
| 1123 | Capture group count = 0 |
| 1124 | Options: utf |
| 1125 | First code unit = 'A' |
| 1126 | Last code unit = \xb0 |
| 1127 | Subject length lower bound = 5 |
| 1128 | |
| 1129 | /AB\x{1fb0}/IB,utf |
| 1130 | ------------------------------------------------------------------ |
| 1131 | Bra |
| 1132 | AB\x{1fb0} |
| 1133 | Ket |
| 1134 | End |
| 1135 | ------------------------------------------------------------------ |
| 1136 | Capture group count = 0 |
| 1137 | Options: utf |
| 1138 | First code unit = 'A' |
| 1139 | Last code unit = \xb0 |
| 1140 | Subject length lower bound = 3 |
| 1141 | |
| 1142 | /AB\x{1fb0}/IBi,utf |
| 1143 | ------------------------------------------------------------------ |
| 1144 | Bra |
| 1145 | /i AB\x{1fb0} |
| 1146 | Ket |
| 1147 | End |
| 1148 | ------------------------------------------------------------------ |
| 1149 | Capture group count = 0 |
| 1150 | Options: caseless utf |
| 1151 | First code unit = 'A' (caseless) |
| 1152 | Last code unit = 'B' (caseless) |
| 1153 | Subject length lower bound = 3 |
| 1154 | |
| 1155 | /\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf |
| 1156 | Capture group count = 0 |
| 1157 | Options: caseless utf |
| 1158 | Starting code units: \xd0 \xd1 |
| 1159 | Subject length lower bound = 17 |
| 1160 | \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} |
| 1161 | 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} |
| 1162 | \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} |
| 1163 | 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} |
| 1164 | |
| 1165 | /[ⱥ]/Bi,utf |
| 1166 | ------------------------------------------------------------------ |
| 1167 | Bra |
| 1168 | /i \x{2c65} |
| 1169 | Ket |
| 1170 | End |
| 1171 | ------------------------------------------------------------------ |
| 1172 | |
| 1173 | /[^ⱥ]/Bi,utf |
| 1174 | ------------------------------------------------------------------ |
| 1175 | Bra |
| 1176 | /i [^\x{2c65}] |
| 1177 | Ket |
| 1178 | End |
| 1179 | ------------------------------------------------------------------ |
| 1180 | |
| 1181 | /\h/I |
| 1182 | Capture group count = 0 |
| 1183 | Starting code units: \x09 \x20 \xa0 |
| 1184 | Subject length lower bound = 1 |
| 1185 | |
| 1186 | /\v/I |
| 1187 | Capture group count = 0 |
| 1188 | Starting code units: \x0a \x0b \x0c \x0d \x85 |
| 1189 | Subject length lower bound = 1 |
| 1190 | |
| 1191 | /\R/I |
| 1192 | Capture group count = 0 |
| 1193 | Starting code units: \x0a \x0b \x0c \x0d \x85 |
| 1194 | Subject length lower bound = 1 |
| 1195 | |
| 1196 | /[[:blank:]]/B,ucp |
| 1197 | ------------------------------------------------------------------ |
| 1198 | Bra |
| 1199 | [\x09 \xa0] |
| 1200 | Ket |
| 1201 | End |
| 1202 | ------------------------------------------------------------------ |
| 1203 | |
| 1204 | /\x{212a}+/Ii,utf |
| 1205 | Capture group count = 0 |
| 1206 | Options: caseless utf |
| 1207 | Starting code units: K k \xe2 |
| 1208 | Subject length lower bound = 1 |
| 1209 | KKkk\x{212a} |
| 1210 | 0: KKkk\x{212a} |
| 1211 | |
| 1212 | /s+/Ii,utf |
| 1213 | Capture group count = 0 |
| 1214 | Options: caseless utf |
| 1215 | Starting code units: S s \xc5 |
| 1216 | Subject length lower bound = 1 |
| 1217 | SSss\x{17f} |
| 1218 | 0: SSss\x{17f} |
| 1219 | |
| 1220 | /\x{100}*A/IB,utf |
| 1221 | ------------------------------------------------------------------ |
| 1222 | Bra |
| 1223 | \x{100}*+ |
| 1224 | A |
| 1225 | Ket |
| 1226 | End |
| 1227 | ------------------------------------------------------------------ |
| 1228 | Capture group count = 0 |
| 1229 | Options: utf |
| 1230 | Starting code units: A \xc4 |
| 1231 | Last code unit = 'A' |
| 1232 | Subject length lower bound = 1 |
| 1233 | A |
| 1234 | 0: A |
| 1235 | |
| 1236 | /\x{100}*\d(?R)/IB,utf |
| 1237 | ------------------------------------------------------------------ |
| 1238 | Bra |
| 1239 | \x{100}*+ |
| 1240 | \d |
| 1241 | Recurse |
| 1242 | Ket |
| 1243 | End |
| 1244 | ------------------------------------------------------------------ |
| 1245 | Capture group count = 0 |
| 1246 | Options: utf |
| 1247 | Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 |
| 1248 | Subject length lower bound = 1 |
| 1249 | |
| 1250 | /[Z\x{100}]/IB,utf |
| 1251 | ------------------------------------------------------------------ |
| 1252 | Bra |
| 1253 | [Z\x{100}] |
| 1254 | Ket |
| 1255 | End |
| 1256 | ------------------------------------------------------------------ |
| 1257 | Capture group count = 0 |
| 1258 | Options: utf |
| 1259 | Starting code units: Z \xc4 |
| 1260 | Subject length lower bound = 1 |
| 1261 | Z\x{100} |
| 1262 | 0: Z |
| 1263 | \x{100} |
| 1264 | 0: \x{100} |
| 1265 | \x{100}Z |
| 1266 | 0: \x{100} |
| 1267 | |
| 1268 | /[z-\x{100}]/IB,utf |
| 1269 | ------------------------------------------------------------------ |
| 1270 | Bra |
| 1271 | [z-\xff\x{100}] |
| 1272 | Ket |
| 1273 | End |
| 1274 | ------------------------------------------------------------------ |
| 1275 | Capture group count = 0 |
| 1276 | Options: utf |
| 1277 | Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 |
| 1278 | Subject length lower bound = 1 |
| 1279 | |
| 1280 | /[z\Qa-d]Ā\E]/IB,utf |
| 1281 | ------------------------------------------------------------------ |
| 1282 | Bra |
| 1283 | [\-\]adz\x{100}] |
| 1284 | Ket |
| 1285 | End |
| 1286 | ------------------------------------------------------------------ |
| 1287 | Capture group count = 0 |
| 1288 | Options: utf |
| 1289 | Starting code units: - ] a d z \xc4 |
| 1290 | Subject length lower bound = 1 |
| 1291 | \x{100} |
| 1292 | 0: \x{100} |
| 1293 | Ā |
| 1294 | 0: \x{100} |
| 1295 | |
| 1296 | /[ab\x{100}]abc(xyz(?1))/IB,utf |
| 1297 | ------------------------------------------------------------------ |
| 1298 | Bra |
| 1299 | [ab\x{100}] |
| 1300 | abc |
| 1301 | CBra 1 |
| 1302 | xyz |
| 1303 | Recurse |
| 1304 | Ket |
| 1305 | Ket |
| 1306 | End |
| 1307 | ------------------------------------------------------------------ |
| 1308 | Capture group count = 1 |
| 1309 | Options: utf |
| 1310 | Starting code units: a b \xc4 |
| 1311 | Last code unit = 'z' |
| 1312 | Subject length lower bound = 7 |
| 1313 | |
| 1314 | /\x{100}*\s/IB,utf |
| 1315 | ------------------------------------------------------------------ |
| 1316 | Bra |
| 1317 | \x{100}*+ |
| 1318 | \s |
| 1319 | Ket |
| 1320 | End |
| 1321 | ------------------------------------------------------------------ |
| 1322 | Capture group count = 0 |
| 1323 | Options: utf |
| 1324 | Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4 |
| 1325 | Subject length lower bound = 1 |
| 1326 | |
| 1327 | /\x{100}*\d/IB,utf |
| 1328 | ------------------------------------------------------------------ |
| 1329 | Bra |
| 1330 | \x{100}*+ |
| 1331 | \d |
| 1332 | Ket |
| 1333 | End |
| 1334 | ------------------------------------------------------------------ |
| 1335 | Capture group count = 0 |
| 1336 | Options: utf |
| 1337 | Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 |
| 1338 | Subject length lower bound = 1 |
| 1339 | |
| 1340 | /\x{100}*\w/IB,utf |
| 1341 | ------------------------------------------------------------------ |
| 1342 | Bra |
| 1343 | \x{100}*+ |
| 1344 | \w |
| 1345 | Ket |
| 1346 | End |
| 1347 | ------------------------------------------------------------------ |
| 1348 | Capture group count = 0 |
| 1349 | Options: utf |
| 1350 | Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P |
| 1351 | Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z |
| 1352 | \xc4 |
| 1353 | Subject length lower bound = 1 |
| 1354 | |
| 1355 | /\x{100}*\D/IB,utf |
| 1356 | ------------------------------------------------------------------ |
| 1357 | Bra |
| 1358 | \x{100}* |
| 1359 | \D |
| 1360 | Ket |
| 1361 | End |
| 1362 | ------------------------------------------------------------------ |
| 1363 | Capture group count = 0 |
| 1364 | Options: utf |
| 1365 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a |
| 1366 | \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 |
| 1367 | \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > |
| 1368 | ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c |
| 1369 | d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 |
| 1370 | \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 |
| 1371 | \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 |
| 1372 | \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef |
| 1373 | \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe |
| 1374 | \xff |
| 1375 | Subject length lower bound = 1 |
| 1376 | |
| 1377 | /\x{100}*\S/IB,utf |
| 1378 | ------------------------------------------------------------------ |
| 1379 | Bra |
| 1380 | \x{100}* |
| 1381 | \S |
| 1382 | Ket |
| 1383 | End |
| 1384 | ------------------------------------------------------------------ |
| 1385 | Capture group count = 0 |
| 1386 | Options: utf |
| 1387 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f |
| 1388 | \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e |
| 1389 | \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C |
| 1390 | D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h |
| 1391 | i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 |
| 1392 | \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 |
| 1393 | \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 |
| 1394 | \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 |
| 1395 | \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff |
| 1396 | Subject length lower bound = 1 |
| 1397 | |
| 1398 | /\x{100}*\W/IB,utf |
| 1399 | ------------------------------------------------------------------ |
| 1400 | Bra |
| 1401 | \x{100}* |
| 1402 | \W |
| 1403 | Ket |
| 1404 | End |
| 1405 | ------------------------------------------------------------------ |
| 1406 | Capture group count = 0 |
| 1407 | Options: utf |
| 1408 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a |
| 1409 | \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 |
| 1410 | \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > |
| 1411 | ? @ [ \ ] ^ ` { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 |
| 1412 | \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 |
| 1413 | \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 |
| 1414 | \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 |
| 1415 | \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff |
| 1416 | Subject length lower bound = 1 |
| 1417 | |
| 1418 | /[\x{105}-\x{109}]/IBi,utf |
| 1419 | ------------------------------------------------------------------ |
| 1420 | Bra |
| 1421 | [\x{104}-\x{109}] |
| 1422 | Ket |
| 1423 | End |
| 1424 | ------------------------------------------------------------------ |
| 1425 | Capture group count = 0 |
| 1426 | Options: caseless utf |
| 1427 | Starting code units: \xc4 |
| 1428 | Subject length lower bound = 1 |
| 1429 | \x{104} |
| 1430 | 0: \x{104} |
| 1431 | \x{105} |
| 1432 | 0: \x{105} |
| 1433 | \x{109} |
| 1434 | 0: \x{109} |
| 1435 | \= Expect no match |
| 1436 | \x{100} |
| 1437 | No match |
| 1438 | \x{10a} |
| 1439 | No match |
| 1440 | |
| 1441 | /[z-\x{100}]/IBi,utf |
| 1442 | ------------------------------------------------------------------ |
| 1443 | Bra |
| 1444 | [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] |
| 1445 | Ket |
| 1446 | End |
| 1447 | ------------------------------------------------------------------ |
| 1448 | Capture group count = 0 |
| 1449 | Options: caseless utf |
| 1450 | Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xce \xe1 \xe2 |
| 1451 | Subject length lower bound = 1 |
| 1452 | Z |
| 1453 | 0: Z |
| 1454 | z |
| 1455 | 0: z |
| 1456 | \x{39c} |
| 1457 | 0: \x{39c} |
| 1458 | \x{178} |
| 1459 | 0: \x{178} |
| 1460 | | |
| 1461 | 0: | |
| 1462 | \x{80} |
| 1463 | 0: \x{80} |
| 1464 | \x{ff} |
| 1465 | 0: \x{ff} |
| 1466 | \x{100} |
| 1467 | 0: \x{100} |
| 1468 | \x{101} |
| 1469 | 0: \x{101} |
| 1470 | \= Expect no match |
| 1471 | \x{102} |
| 1472 | No match |
| 1473 | Y |
| 1474 | No match |
| 1475 | y |
| 1476 | No match |
| 1477 | |
| 1478 | /[z-\x{100}]/IBi,utf |
| 1479 | ------------------------------------------------------------------ |
| 1480 | Bra |
| 1481 | [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] |
| 1482 | Ket |
| 1483 | End |
| 1484 | ------------------------------------------------------------------ |
| 1485 | Capture group count = 0 |
| 1486 | Options: caseless utf |
| 1487 | Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xce \xe1 \xe2 |
| 1488 | Subject length lower bound = 1 |
| 1489 | |
| 1490 | /\x{3a3}B/IBi,utf |
| 1491 | ------------------------------------------------------------------ |
| 1492 | Bra |
| 1493 | clist 03a3 03c2 03c3 |
| 1494 | /i B |
| 1495 | Ket |
| 1496 | End |
| 1497 | ------------------------------------------------------------------ |
| 1498 | Capture group count = 0 |
| 1499 | Options: caseless utf |
| 1500 | Starting code units: \xce \xcf |
| 1501 | Last code unit = 'B' (caseless) |
| 1502 | Subject length lower bound = 2 |
| 1503 | |
| 1504 | /abc/utf,replace=Ã |
| 1505 | abc |
| 1506 | Failed: error -3: UTF-8 error: 1 byte missing at end |
| 1507 | |
| 1508 | /(?<=(a)(?-1))x/I,utf |
| 1509 | Capture group count = 1 |
| 1510 | Max lookbehind = 2 |
| 1511 | Options: utf |
| 1512 | First code unit = 'x' |
| 1513 | Subject length lower bound = 1 |
| 1514 | a\x80zx\=offset=3 |
| 1515 | Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1 |
| 1516 | |
| 1517 | /[\W\p{Any}]/B |
| 1518 | ------------------------------------------------------------------ |
| 1519 | Bra |
| 1520 | [\x00-/:-@[-^`{-\xff\p{Any}] |
| 1521 | Ket |
| 1522 | End |
| 1523 | ------------------------------------------------------------------ |
| 1524 | abc |
| 1525 | 0: a |
| 1526 | 123 |
| 1527 | 0: 1 |
| 1528 | |
| 1529 | /[\W\pL]/B |
| 1530 | ------------------------------------------------------------------ |
| 1531 | Bra |
| 1532 | [\x00-/:-@[-^`{-\xff\p{L}] |
| 1533 | Ket |
| 1534 | End |
| 1535 | ------------------------------------------------------------------ |
| 1536 | abc |
| 1537 | 0: a |
| 1538 | \= Expect no match |
| 1539 | 123 |
| 1540 | No match |
| 1541 | |
| 1542 | /(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/utf |
| 1543 | Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) |
| 1544 | |
| 1545 | /[\s[:^ascii:]]/B,ucp |
| 1546 | ------------------------------------------------------------------ |
| 1547 | Bra |
| 1548 | [\x80-\xff\p{Xsp}] |
| 1549 | Ket |
| 1550 | End |
| 1551 | ------------------------------------------------------------------ |
| 1552 | |
| 1553 | # A special extra option allows escaped surrogate code points in 8-bit mode, |
| 1554 | # but subjects containing them must not be UTF-checked. |
| 1555 | |
| 1556 | /\x{d800}/I,utf,allow_surrogate_escapes |
| 1557 | Capture group count = 0 |
| 1558 | Options: utf |
| 1559 | Extra options: allow_surrogate_escapes |
| 1560 | First code unit = \xed |
| 1561 | Last code unit = \x80 |
| 1562 | Subject length lower bound = 1 |
| 1563 | \x{d800}\=no_utf_check |
| 1564 | 0: \x{d800} |
| 1565 | |
| 1566 | /\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes |
| 1567 | \x{dfff}\x{df01}\=no_utf_check |
| 1568 | 0: \x{dfff}\x{df01} |
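# The surrogate cases above can be cross-checked outside pcre2test. The sketch
# below is an illustration in Python, not part of the PCRE2 test suite. It
# assumes Python's "surrogatepass" error handler as a rough stand-in for the
# combination of allow_surrogate_escapes and no_utf_check.

```python
# U+D800 is a surrogate, so a strict UTF-8 encoder rejects it.
try:
    "\ud800".encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" emits the generalized three-byte form anyway.
encoded = "\ud800".encode("utf-8", "surrogatepass")
first_unit, last_unit = encoded[0], encoded[-1]

# \o{157401} is an octal escape; it names the same code point as \x{df01}.
octal_code_point = int("157401", 8)

print(strict_ok, encoded.hex(), hex(octal_code_point))
```

# The first and last bytes of the encoded form correspond to the "First code
# unit" and "Last code unit" values reported for /\x{d800}/ above.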
| 1569 | |
| 1570 | # This has different starting code units in 8-bit mode. |
| 1571 | |
| 1572 | /^[^ab]/IB,utf |
| 1573 | ------------------------------------------------------------------ |
| 1574 | Bra |
| 1575 | ^ |
| 1576 | [\x00-`c-\xff] (neg) |
| 1577 | Ket |
| 1578 | End |
| 1579 | ------------------------------------------------------------------ |
| 1580 | Capture group count = 0 |
| 1581 | Compile options: utf |
| 1582 | Overall options: anchored utf |
| 1583 | Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a |
| 1584 | \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 |
| 1585 | \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 |
| 1586 | 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y |
| 1587 | Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f |
| 1588 | \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 |
| 1589 | \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf |
| 1590 | \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee |
| 1591 | \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd |
| 1592 | \xfe \xff |
| 1593 | Subject length lower bound = 1 |
| 1594 | c |
| 1595 | 0: c |
| 1596 | \x{ff} |
| 1597 | 0: \x{ff} |
| 1598 | \x{100} |
| 1599 | 0: \x{100} |
| 1600 | \= Expect no match |
| 1601 | aaa |
| 1602 | No match |
| 1603 | |
| 1604 | # Offsets are different in 8-bit mode. |
| 1605 | |
| 1606 | /(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout |
| 1607 | 123abcáyzabcdef789abcሴqr |
| 1608 | 1(2) Old 6 6 "" New 6 8 "<>" |
| 1609 | 2(2) Old 13 13 "" New 15 17 "<>" |
| 1610 | 3(2) Old 13 16 "def" New 17 22 "<def>" |
| 1611 | 4(2) Old 22 22 "" New 28 30 "<>" |
| 1612 | 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr |
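# Why the offsets differ in 8-bit mode: the callout offsets count UTF-8 code
# units (bytes), not characters. The Python sketch below is illustrative only
# and is not part of the test suite. It uses Python's re module, not PCRE2, to
# locate the positions just after each "abc" and converts them to byte offsets.

```python
import re

subject = "123abc\u00e1yzabcdef789abc\u1234qr"

# Character offsets just after each "abc", where (?<=abc) can succeed.
char_offsets = [m.end() for m in re.finditer("abc", subject)]

# The same positions measured in UTF-8 code units (bytes).
byte_offsets = [len(subject[:i].encode("utf-8")) for i in char_offsets]

print(char_offsets)  # [6, 12, 21]
print(byte_offsets)  # [6, 13, 22]
```

# The byte offsets 6, 13 and 22 are the "Old" start offsets in the callout
# lines above; the character offsets diverge after the two-byte U+00E1.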
| 1613 | |
| 1614 | # Check name length with non-ASCII characters |
| 1615 | |
| 1616 | /(?'ABáC678901234567890123456789012'...)/utf |
| 1617 | |
| 1618 | /(?'ABáC6789012345678901234567890123'...)/utf |
| 1619 | Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units) |
| 1620 | |
| 1621 | /(?'ABZC6789012345678901234567890123'...)/utf |
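# The three name-length cases above hinge on the limit being counted in code
# units rather than characters. A minimal Python sketch of that arithmetic
# follows; name_fits and MAX_NAME_CODE_UNITS are hypothetical names for this
# illustration, and the value 32 is taken from the error message above.

```python
MAX_NAME_CODE_UNITS = 32  # limit quoted in the error message above


def name_fits(name: str) -> bool:
    """Return True if the name's UTF-8 encoding is within the limit."""
    return len(name.encode("utf-8")) <= MAX_NAME_CODE_UNITS


names = [
    "AB\u00e1C678901234567890123456789012",   # 31 characters, 32 bytes
    "AB\u00e1C6789012345678901234567890123",  # 32 characters, 33 bytes
    "ABZC6789012345678901234567890123",       # 32 characters, 32 bytes
]
results = [name_fits(n) for n in names]
print(results)  # [True, False, True]
```

# Only the second name is rejected: U+00E1 occupies two bytes, which pushes
# its 32 characters to 33 code units.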
| 1622 | |
| 1623 | /(?(n/utf |
| 1624 | Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) |
| 1625 | |
| 1626 | /(?(á/utf |
| 1627 | Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?) |
| 1628 | |
| 1629 | # Invalid UTF-8 tests |
| 1630 | |
| 1631 | /.../g,match_invalid_utf |
| 1632 | abcd\x80wxzy\x80pqrs |
| 1633 | 0: abc |
| 1634 | 0: wxz |
| 1635 | 0: pqr |
| 1636 | abcd\x{80}wxzy\x80pqrs |
| 1637 | 0: abc |
| 1638 | 0: d\x{80}w |
| 1639 | 0: xzy |
| 1640 | 0: pqr |
| 1641 | |
| 1642 | /abc/match_invalid_utf |
| 1643 | ab\x80ab\=ph |
| 1644 | Partial match: ab |
| 1645 | \= Expect no match |
| 1646 | ab\x80cdef\=ph |
| 1647 | No match |
| 1648 | |
| 1649 | /ab$/match_invalid_utf |
| 1650 | ab\x80cdeab |
| 1651 | 0: ab |
| 1652 | \= Expect no match |
| 1653 | ab\x80cde |
| 1654 | No match |
| 1655 | |
| 1656 | /.../g,match_invalid_utf |
| 1657 | abcd\x{80}wxzy\x80pqrs |
| 1658 | 0: abc |
| 1659 | 0: d\x{80}w |
| 1660 | 0: xzy |
| 1661 | 0: pqr |
| 1662 | |
| 1663 | /(?<=x)../g,match_invalid_utf |
| 1664 | abcd\x{80}wxzy\x80pqrs |
| 1665 | 0: zy |
| 1666 | abcd\x{80}wxzy\x80xpqrs |
| 1667 | 0: zy |
| 1668 | 0: pq |
| 1669 | |
| 1670 | /X$/match_invalid_utf |
| 1671 | \= Expect no match |
| 1672 | X\xc4 |
| 1673 | No match |
| 1674 | |
| 1675 | /(?<=..)X/match_invalid_utf,aftertext |
| 1676 | AB\x80AQXYZ |
| 1677 | 0: X |
| 1678 | 0+ YZ |
| 1679 | AB\x80AQXYZ\=offset=5 |
| 1680 | 0: X |
| 1681 | 0+ YZ |
| 1682 | AB\x80\x80AXYZXC\=offset=5 |
| 1683 | 0: X |
| 1684 | 0+ C |
| 1685 | \= Expect no match |
| 1686 | AB\x80XYZ |
| 1687 | No match |
| 1688 | AB\x80XYZ\=offset=3 |
| 1689 | No match |
| 1690 | AB\xfeXYZ |
| 1691 | No match |
| 1692 | AB\xffXYZ\=offset=3 |
| 1693 | No match |
| 1694 | AB\x80AXYZ |
| 1695 | No match |
| 1696 | AB\x80AXYZ\=offset=4 |
| 1697 | No match |
| 1698 | AB\x80\x80AXYZ\=offset=5 |
| 1699 | No match |
| 1700 | |
| 1701 | /.../match_invalid_utf |
| 1702 | AB\xc4CCC |
| 1703 | 0: CCC |
| 1704 | \= Expect no match |
| 1705 | A\x{d800}B |
| 1706 | No match |
| 1707 | A\x{110000}B |
| 1708 | No match |
| 1709 | A\xc4B |
| 1710 | No match |
| 1711 | |
| 1712 | /\bX/match_invalid_utf |
| 1713 | A\x80X |
| 1714 | 0: X |
| 1715 | |
| 1716 | /\BX/match_invalid_utf |
| 1717 | \= Expect no match |
| 1718 | A\x80X |
| 1719 | No match |
| 1720 | |
| 1721 | /(?<=...)X/match_invalid_utf |
| 1722 | AAA\x80BBBXYZ |
| 1723 | 0: X |
| 1724 | \= Expect no match |
| 1725 | AAA\x80BXYZ |
| 1726 | No match |
| 1727 | AAA\x80BBXYZ |
| 1728 | No match |
| 1729 | |
| 1730 | # ------------------------------------- |
| 1731 | |
| 1732 | /(*UTF)(?=\x{123})/I |
| 1733 | Capture group count = 0 |
| 1734 | May match empty string |
| 1735 | Compile options: <none> |
| 1736 | Overall options: utf |
| 1737 | First code unit = \xc4 |
| 1738 | Last code unit = \xa3 |
| 1739 | Subject length lower bound = 1 |
| 1740 | |
| 1741 | /[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf |
| 1742 | Capture group count = 0 |
| 1743 | Options: utf |
| 1744 | Starting code units: \xc3 |
| 1745 | Last code unit = 'X' |
| 1746 | Subject length lower bound = 3 |
| 1747 | |
| 1748 | /[ó¿¾,]/BI,utf |
| 1749 | ------------------------------------------------------------------ |
| 1750 | Bra |
| 1751 | [,\x{fff9f}] |
| 1752 | Ket |
| 1753 | End |
| 1754 | ------------------------------------------------------------------ |
| 1755 | Capture group count = 0 |
| 1756 | Options: utf |
| 1757 | Starting code units: , \xf3 |
| 1758 | Subject length lower bound = 1 |
| 1759 | |
| 1760 | /[\x{fff4}-\x{ffff8}]/I,utf |
| 1761 | Capture group count = 0 |
| 1762 | Options: utf |
| 1763 | Starting code units: \xef \xf0 \xf1 \xf2 \xf3 |
| 1764 | Subject length lower bound = 1 |
| 1765 | |
| 1766 | /[\x{fff4}-\x{afff8}\x{10ffff}]/I,utf |
| 1767 | Capture group count = 0 |
| 1768 | Options: utf |
| 1769 | Starting code units: \xef \xf0 \xf1 \xf2 \xf4 |
| 1770 | Subject length lower bound = 1 |
| 1771 | |
| 1772 | /[\xff\x{ffff}]/I,utf |
| 1773 | Capture group count = 0 |
| 1774 | Options: utf |
| 1775 | Starting code units: \xc3 \xef |
| 1776 | Subject length lower bound = 1 |
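# The "Starting code units" lines for the three character classes above can be
# reproduced by collecting the first UTF-8 byte of every code point in each
# class. The brute-force Python sketch below is only an illustration; the
# helper name lead_bytes is hypothetical, and this is not a description of how
# PCRE2 itself computes its starting code unit bitmap.

```python
def lead_bytes(ranges):
    """Return the sorted set of first UTF-8 bytes for inclusive code point ranges."""
    found = set()
    for lo, hi in ranges:
        for cp in range(lo, hi + 1):
            if 0xD800 <= cp <= 0xDFFF:
                continue  # surrogates cannot be encoded as strict UTF-8
            found.add(chr(cp).encode("utf-8")[0])
    return sorted(found)


a = lead_bytes([(0xFFF4, 0xFFFF8)])
b = lead_bytes([(0xFFF4, 0xAFFF8), (0x10FFFF, 0x10FFFF)])
c = lead_bytes([(0xFF, 0xFF), (0xFFFF, 0xFFFF)])

for result in (a, b, c):
    print(" ".join(f"\\x{byte:02x}" for byte in result))
```

# Each printed line lists lead bytes in the same \xNN notation as the
# corresponding "Starting code units" line above.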
| 1777 | |
| 1778 | /[\xff\x{ff}]/I,utf |
| 1779 | Capture group count = 0 |
| 1780 | Options: utf |
| 1781 | Starting code units: \xc3 |
| 1782 | Subject length lower bound = 1 |
| 1783 | abc\x{ff}def |
| 1784 | 0: \x{ff} |
| 1785 | |
| 1786 | /[\xff\x{ff}]/I |
| 1787 | Capture group count = 0 |
| 1788 | First code unit = \xff |
| 1789 | Subject length lower bound = 1 |
| 1790 | abc\x{ff}def |
| 1791 | 0: \xff |
| 1792 | |
| 1793 | /[Ss]/I |
| 1794 | Capture group count = 0 |
| 1795 | First code unit = 'S' (caseless) |
| 1796 | Subject length lower bound = 1 |
| 1797 | |
| 1798 | /[Ss]/I,utf |
| 1799 | Capture group count = 0 |
| 1800 | Options: utf |
| 1801 | Starting code units: S s |
| 1802 | Subject length lower bound = 1 |
| 1803 | |
| 1804 | /(?:\x{ff}|\x{3000})/I,utf |
| 1805 | Capture group count = 0 |
| 1806 | Options: utf |
| 1807 | Starting code units: \xc3 \xe3 |
| 1808 | Subject length lower bound = 1 |
| 1809 | |
| 1810 | /x/utf |
| 1811 | abxyz |
| 1812 | 0: x |
| 1813 | \x80\=startchar |
| 1814 | Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0 |
| 1815 | abc\x80\=startchar |
| 1816 | Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 3 |
| 1817 | abc\x80\=startchar,offset=3 |
| 1818 | Error -36 (bad UTF-8 offset) |
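# The offsets in the UTF-8 error messages above are byte offsets of the first
# invalid sequence. As a hedged cross-check, Python's strict UTF-8 decoder
# reports the same position through UnicodeDecodeError.start. The helper name
# first_utf8_error is hypothetical, and Python's error wording differs from
# PCRE2's.

```python
def first_utf8_error(data: bytes):
    """Return the byte offset of the first UTF-8 error, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_utf8_error(b"abxyz"))     # None
print(first_utf8_error(b"\x80"))      # 0
print(first_utf8_error(b"abc\x80"))   # 3
```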
| 1819 | |
| 1820 | /\x{c1}+\x{e1}/iIB,ucp |
| 1821 | ------------------------------------------------------------------ |
| 1822 | Bra |
| 1823 | /i \x{c1}+ |
| 1824 | /i \x{e1} |
| 1825 | Ket |
| 1826 | End |
| 1827 | ------------------------------------------------------------------ |
| 1828 | Capture group count = 0 |
| 1829 | Options: caseless ucp |
| 1830 | First code unit = \xc1 (caseless) |
| 1831 | Last code unit = \xe1 (caseless) |
| 1832 | Subject length lower bound = 2 |
| 1833 | \x{c1}\x{c1}\x{c1} |
| 1834 | 0: \xc1\xc1\xc1 |
| 1835 | \x{e1}\x{e1}\x{e1} |
| 1836 | 0: \xe1\xe1\xe1 |
| 1837 | |
| 1838 | /a|\x{c1}/iI,ucp |
| 1839 | Capture group count = 0 |
| 1840 | Options: caseless ucp |
| 1841 | Starting code units: A a \xc1 \xe1 |
| 1842 | Subject length lower bound = 1 |
| 1843 | \x{e1}xxx |
| 1844 | 0: \xe1 |
| 1845 | |
| 1846 | /a|\x{c1}/iI,utf |
| 1847 | Capture group count = 0 |
| 1848 | Options: caseless utf |
| 1849 | Starting code units: A a \xc3 |
| 1850 | Subject length lower bound = 1 |
| 1851 | \x{e1}xxx |
| 1852 | 0: \x{e1} |
| 1853 | |
| 1854 | /\x{c1}|\x{e1}/iI,ucp |
| 1855 | Capture group count = 0 |
| 1856 | Options: caseless ucp |
| 1857 | First code unit = \xc1 (caseless) |
| 1858 | Subject length lower bound = 1 |
| 1859 | |
| 1860 | /X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended |
| 1861 | X\x{e1}Y |
| 1862 | 1: >\xc1< |
| 1863 | |
| 1864 | /X(\x{e1})Y/i,ucp,replace=>\L$1<,substitute_extended |
| 1865 | X\x{c1}Y |
| 1866 | 1: >\xe1< |
| 1867 | |
| 1868 | # Without UTF or UCP, characters > 127 have only one case in the default locale. |
| 1869 | |
| 1870 | /X(\x{e1})Y/replace=>\U$1<,substitute_extended |
| 1871 | X\x{e1}Y |
| 1872 | 1: >\xe1< |
| 1873 | |
| 1874 | /A/utf,match_invalid_utf,caseless |
| 1875 | \xe5A |
| 1876 | 0: A |
| 1877 | |
| 1878 | /\bch\b/utf,match_invalid_utf |
| 1879 | qchq\=ph |
| 1880 | Partial match: |
| 1881 | qchq\=ps |
| 1882 | Partial match: |
| 1883 | |
| 1884 | # End of testinput10 |