| 1 | .TH PCRE2SYNTAX 3 "12 January 2022" "PCRE2 10.40" |
| 2 | .SH NAME |
| 3 | PCRE2 - Perl-compatible regular expressions (revised API) |
| 4 | .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" |
| 5 | .rs |
| 6 | .sp |
| 7 | The full syntax and semantics of the regular expressions that are supported by |
| 8 | PCRE2 are described in the |
| 9 | .\" HREF |
| 10 | \fBpcre2pattern\fP |
| 11 | .\" |
| 12 | documentation. This document contains a quick-reference summary of the syntax. |
| 13 | . |
| 14 | . |
| 15 | .SH "QUOTING" |
| 16 | .rs |
| 17 | .sp |
| 18 | \ex where x is non-alphanumeric is a literal x |
| 19 | \eQ...\eE treat enclosed characters as literal |
| 20 | . |
| 21 | . |
| 22 | .SH "ESCAPED CHARACTERS" |
| 23 | .rs |
| 24 | .sp |
| 25 | This table applies to ASCII and Unicode environments. An unrecognized escape |
| 26 | sequence causes an error. |
| 27 | .sp |
| 28 | \ea alarm, that is, the BEL character (hex 07) |
| 29 | \ecx "control-x", where x is any ASCII printing character |
| 30 | \ee escape (hex 1B) |
| 31 | \ef form feed (hex 0C) |
| 32 | \en newline (hex 0A) |
| 33 | \er carriage return (hex 0D) |
| 34 | \et tab (hex 09) |
| 35 | \e0dd character with octal code 0dd |
| 36 | \eddd character with octal code ddd, or backreference |
| 37 | \eo{ddd..} character with octal code ddd.. |
| 38 | \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only) |
| 39 | \exhh character with hex code hh |
| 40 | \ex{hh..} character with hex code hh.. |
| 41 | .sp |
| 42 | If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the |
| 43 | following are also recognized: |
| 44 | .sp |
| 45 | \eU the character "U" |
| 46 | \euhhhh character with hex code hhhh |
| 47 | \eu{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX |
| 48 | .sp |
| 49 | When \ex is not followed by {, from zero to two hexadecimal digits are read, |
| 50 | but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be |
| 51 | recognized as a hexadecimal escape; otherwise it matches a literal "x". |
| 52 | Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits |
| 53 | or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it |
| 54 | matches a literal "u". |
| 55 | .P |
| 56 | Note that \e0dd is always an octal code. The treatment of backslash followed by |
| 57 | a non-zero digit is complicated; for details see the section |
| 58 | .\" HTML <a href="pcre2pattern.html#digitsafterbackslash"> |
| 59 | .\" </a> |
| 60 | "Non-printing characters" |
| 61 | .\" |
| 62 | in the |
| 63 | .\" HREF |
| 64 | \fBpcre2pattern\fP |
| 65 | .\" |
| 66 | documentation, where details of escape processing in EBCDIC environments are |
| 67 | also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not |
| 68 | supported in EBCDIC environments. Note that \eN not followed by an opening |
| 69 | curly bracket has a different meaning (see below). |
| 70 | . |
| 71 | . |
| 72 | .SH "CHARACTER TYPES" |
| 73 | .rs |
| 74 | .sp |
| 75 | . any character except newline; |
| 76 | in dotall mode, any character whatsoever |
| 77 | \eC one code unit, even in UTF mode (best avoided) |
| 78 | \ed a decimal digit |
| 79 | \eD a character that is not a decimal digit |
| 80 | \eh a horizontal white space character |
| 81 | \eH a character that is not a horizontal white space character |
| 82 | \eN a character that is not a newline |
| 83 | \ep{\fIxx\fP} a character with the \fIxx\fP property |
| 84 | \eP{\fIxx\fP} a character without the \fIxx\fP property |
| 85 | \eR a newline sequence |
| 86 | \es a white space character |
| 87 | \eS a character that is not a white space character |
| 88 | \ev a vertical white space character |
| 89 | \eV a character that is not a vertical white space character |
| 90 | \ew a "word" character |
| 91 | \eW a "non-word" character |
| 92 | \eX a Unicode extended grapheme cluster |
| 93 | .sp |
| 94 | \eC is dangerous because it may leave the current matching point in the middle |
| 95 | of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by |
| 96 | setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2 |
| 97 | with the use of \eC permanently disabled. |
| 98 | .P |
| 99 | By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode |
| 100 | or in the 16-bit and 32-bit libraries. However, if locale-specific matching is |
| 101 | happening, \es and \ew may also match characters with code points in the range |
| 102 | 128-255. If the PCRE2_UCP option is set, the behaviour of these escape |
| 103 | sequences is changed to use Unicode properties and they match many more |
| 104 | characters. |
| 105 | .P |
| 106 | Property descriptions in \ep and \eP are matched caselessly; hyphens, |
| 107 | underscores, and white space are ignored, in accordance with Unicode's "loose |
| 108 | matching" rules. |
| 109 | . |
| 110 | . |
| 111 | .SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP" |
| 112 | .rs |
| 113 | .sp |
| 114 | C Other |
| 115 | Cc Control |
| 116 | Cf Format |
| 117 | Cn Unassigned |
| 118 | Co Private use |
| 119 | Cs Surrogate |
| 120 | .sp |
| 121 | L Letter |
| 122 | Ll Lower case letter |
| 123 | Lm Modifier letter |
| 124 | Lo Other letter |
| 125 | Lt Title case letter |
| 126 | Lu Upper case letter |
| 127 | Lc Ll, Lu, or Lt |
| 128 | L& Ll, Lu, or Lt |
| 129 | .sp |
| 130 | M Mark |
| 131 | Mc Spacing mark |
| 132 | Me Enclosing mark |
| 133 | Mn Non-spacing mark |
| 134 | .sp |
| 135 | N Number |
| 136 | Nd Decimal number |
| 137 | Nl Letter number |
| 138 | No Other number |
| 139 | .sp |
| 140 | P Punctuation |
| 141 | Pc Connector punctuation |
| 142 | Pd Dash punctuation |
| 143 | Pe Close punctuation |
| 144 | Pf Final punctuation |
| 145 | Pi Initial punctuation |
| 146 | Po Other punctuation |
| 147 | Ps Open punctuation |
| 148 | .sp |
| 149 | S Symbol |
| 150 | Sc Currency symbol |
| 151 | Sk Modifier symbol |
| 152 | Sm Mathematical symbol |
| 153 | So Other symbol |
| 154 | .sp |
| 155 | Z Separator |
| 156 | Zl Line separator |
| 157 | Zp Paragraph separator |
| 158 | Zs Space separator |
| 159 | . |
| 160 | . |
| 161 | .SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP" |
| 162 | .rs |
| 163 | .sp |
| 164 | Xan Alphanumeric: union of properties L and N |
| 165 | Xps POSIX space: property Z or tab, NL, VT, FF, CR |
| 166 | Xsp Perl space: property Z or tab, NL, VT, FF, CR |
| 167 | Xuc Universally-named character: one that can be |
| 168 | represented by a Universal Character Name |
| 169 | Xwd Perl word: property Xan or underscore |
| 170 | .sp |
| 171 | Perl and POSIX space are now the same. Perl added VT to its space character set |
| 172 | at release 5.18. |
| 173 | . |
| 174 | . |
| 175 | .SH "BINARY PROPERTIES FOR \ep AND \eP" |
| 176 | .rs |
| 177 | .sp |
| 178 | Unicode defines a number of binary properties, that is, properties whose only |
| 179 | values are true or false. You can obtain a list of those that are recognized by |
| 180 | \ep and \eP, along with their abbreviations, by running this command: |
| 181 | .sp |
| 182 | pcre2test -LP |
| 183 | . |
| 184 | . |
| 185 | . |
| 186 | .SH "SCRIPT MATCHING WITH \ep AND \eP" |
| 187 | .rs |
| 188 | .sp |
| 189 | Many script names and their 4-letter abbreviations are recognized in |
| 190 | \ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of |
| 191 | course). You can obtain a list of these scripts by running this command: |
| 192 | .sp |
| 193 | pcre2test -LS |
| 194 | . |
| 195 | . |
| 196 | . |
| 197 | .SH "THE BIDI_CLASS PROPERTY FOR \ep AND \eP" |
| 198 | .rs |
| 199 | .sp |
| 200 | \ep{Bidi_Class:<class>} matches a character with the given class |
| 201 | \ep{BC:<class>} matches a character with the given class |
| 202 | .sp |
| 203 | The recognized classes are: |
| 204 | .sp |
| 205 | AL Arabic letter |
| 206 | AN Arabic number |
| 207 | B paragraph separator |
| 208 | BN boundary neutral |
| 209 | CS common separator |
| 210 | EN European number |
| 211 | ES European separator |
| 212 | ET European terminator |
| 213 | FSI first strong isolate |
| 214 | L left-to-right |
| 215 | LRE left-to-right embedding |
| 216 | LRI left-to-right isolate |
| 217 | LRO left-to-right override |
| 218 | NSM non-spacing mark |
| 219 | ON other neutral |
| 220 | PDF pop directional format |
| 221 | PDI pop directional isolate |
| 222 | R right-to-left |
| 223 | RLE right-to-left embedding |
| 224 | RLI right-to-left isolate |
| 225 | RLO right-to-left override |
| 226 | S segment separator |
| 227 | WS white space |
| 228 | . |
| 229 | . |
| 230 | .SH "CHARACTER CLASSES" |
| 231 | .rs |
| 232 | .sp |
| 233 | [...] positive character class |
| 234 | [^...] negative character class |
| 235 | [x-y] range (can be used for hex characters) |
| 236 | [[:xxx:]] positive POSIX named set |
| 237 | [[:^xxx:]] negative POSIX named set |
| 238 | .sp |
| 239 | alnum alphanumeric |
| 240 | alpha alphabetic |
| 241 | ascii 0-127 |
| 242 | blank space or tab |
| 243 | cntrl control character |
| 244 | digit decimal digit |
| 245 | graph printing, excluding space |
| 246 | lower lower case letter |
| 247 | print printing, including space |
| 248 | punct printing, excluding alphanumeric |
| 249 | space white space |
| 250 | upper upper case letter |
| 251 | word same as \ew |
| 252 | xdigit hexadecimal digit |
| 253 | .sp |
| 254 | In PCRE2, POSIX character set names recognize only ASCII characters by default, |
| 255 | but some of them use Unicode properties if PCRE2_UCP is set. You can use |
| 256 | \eQ...\eE inside a character class. |
| 257 | . |
| 258 | . |
| 259 | .SH "QUANTIFIERS" |
| 260 | .rs |
| 261 | .sp |
| 262 | ? 0 or 1, greedy |
| 263 | ?+ 0 or 1, possessive |
| 264 | ?? 0 or 1, lazy |
| 265 | * 0 or more, greedy |
| 266 | *+ 0 or more, possessive |
| 267 | *? 0 or more, lazy |
| 268 | + 1 or more, greedy |
| 269 | ++ 1 or more, possessive |
| 270 | +? 1 or more, lazy |
| 271 | {n} exactly n |
| 272 | {n,m} at least n, no more than m, greedy |
| 273 | {n,m}+ at least n, no more than m, possessive |
| 274 | {n,m}? at least n, no more than m, lazy |
| 275 | {n,} n or more, greedy |
| 276 | {n,}+ n or more, possessive |
| 277 | {n,}? n or more, lazy |
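.P
The difference between greedy and lazy quantifiers can be sketched as follows.
The sketch uses Python's re module purely as a stand-in: it is not PCRE2, and
it is an assumption of this example that the two engines agree on the greedy
and lazy forms shown. Possessive quantifiers are left out because not every
Python version supports them. All variable names are illustrative.

```python
import re

subject = "<a><b>"

# Greedy: ".+" takes as much as it can, then gives back only what it must.
greedy = re.match(r"<.+>", subject).group()

# Lazy: ".+?" takes as little as it can, extending only when forced to.
lazy = re.match(r"<.+?>", subject).group()

# Bounded repetition follows the same rule: {2,3} is greedy, {2,3}? is lazy.
bounded_greedy = re.match(r"a{2,3}", "aaaa").group()
bounded_lazy = re.match(r"a{2,3}?", "aaaa").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```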
| 278 | . |
| 279 | . |
| 280 | .SH "ANCHORS AND SIMPLE ASSERTIONS" |
| 281 | .rs |
| 282 | .sp |
| 283 | \eb word boundary |
| 284 | \eB not a word boundary |
| 285 | ^ start of subject |
| 286 | also after an internal newline in multiline mode |
| 287 | (after any newline if PCRE2_ALT_CIRCUMFLEX is set) |
| 288 | \eA start of subject |
| 289 | $ end of subject |
| 290 | also before newline at end of subject |
| 291 | also before internal newline in multiline mode |
| 292 | \eZ end of subject |
| 293 | also before newline at end of subject |
| 294 | \ez end of subject |
| 295 | \eG first matching position in subject |
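.P
A minimal sketch of how the circumflex, dollar, and word-boundary assertions
behave, again using Python's re module as a stand-in rather than PCRE2 itself.
It is an assumption of this example that the two engines agree on these four
cases. \eA, \eZ, \ez, and \eG are not shown, because Python either lacks them
or gives them a different meaning.

```python
import re

subject = "one\ntwo\n"

# Default mode: "^" matches only at the start of the subject, so no line
# can satisfy both "^" and "$" here.
default_lines = re.findall(r"^\w+$", subject)

# Multiline mode: "^" and "$" also match after and before internal newlines.
multiline_lines = re.findall(r"^\w+$", subject, flags=re.MULTILINE)

# "$" matches before a newline that ends the subject, even in default mode.
before_final_newline = re.search(r"two$", subject) is not None

# "\b" is a word boundary, so "cat" inside "concat" is not found.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(default_lines, multiline_lines, before_final_newline, whole_words)
```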
| 296 | . |
| 297 | . |
| 298 | .SH "REPORTED MATCH POINT SETTING" |
| 299 | .rs |
| 300 | .sp |
| 301 | \eK set reported start of match |
| 302 | .sp |
| 303 | From release 10.38 \eK is not permitted by default in lookaround assertions, |
| 304 | for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK |
| 305 | option is set, the previous behaviour is re-enabled. When this option is set, |
| 306 | \eK is honoured in positive assertions, but ignored in negative ones. |
| 307 | . |
| 308 | . |
| 309 | .SH "ALTERNATION" |
| 310 | .rs |
| 311 | .sp |
| 312 | expr|expr|expr... |
| 313 | . |
| 314 | . |
| 315 | .SH "CAPTURING" |
| 316 | .rs |
| 317 | .sp |
| 318 | (...) capture group |
| 319 | (?<name>...) named capture group (Perl) |
| 320 | (?'name'...) named capture group (Perl) |
| 321 | (?P<name>...) named capture group (Python) |
| 322 | (?:...) non-capture group |
| 323 | (?|...) non-capture group; reset group numbers for |
| 324 | capture groups in each alternative |
| 325 | .sp |
| 326 | In non-UTF modes, names may contain underscores and ASCII letters and digits; |
| 327 | in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In |
| 328 | both cases, a name must not start with a digit. |
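.P
The following sketch shows Python-style named capture groups alongside a
non-capture group. It runs on Python's re module, not on PCRE2; the
(?P<name>...) and (?:...) forms are the ones the two engines share. The
pattern and the group names "year" and "day" are illustrative only.

```python
import re

# Two named capture groups around one non-capture group.
pattern = r"(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})"
m = re.match(pattern, "2022-01-12")

year = m.group("year")   # look up a group by name
day = m.group(2)         # named groups are still numbered; (?:...) is skipped
groups = m.groups()      # only the capture groups appear here

print(year, day, groups)
```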
| 329 | . |
| 330 | . |
| 331 | .SH "ATOMIC GROUPS" |
| 332 | .rs |
| 333 | .sp |
| 334 | (?>...) atomic non-capture group |
| 335 | (*atomic:...) atomic non-capture group |
| 336 | . |
| 337 | . |
| 338 | .SH "COMMENT" |
| 339 | .rs |
| 340 | .sp |
| 341 | (?#....) comment (not nestable) |
| 342 | . |
| 343 | . |
| 344 | .SH "OPTION SETTING" |
| 345 | .rs |
| 346 | Changes of these options within a group are automatically cancelled at the end |
| 347 | of the group. |
| 348 | .sp |
| 349 | (?i) caseless |
| 350 | (?J) allow duplicate named groups |
| 351 | (?m) multiline |
| 352 | (?n) no auto capture |
| 353 | (?s) single line (dotall) |
| 354 | (?U) default ungreedy (lazy) |
| 355 | (?x) extended: ignore white space except in classes |
| 356 | (?xx) as (?x) but also ignore space and tab in classes |
| 357 | (?-...) unset option(s) |
| 358 | (?^) unset imnsx options |
| 359 | .sp |
| 360 | Unsetting x or xx unsets both. Several options may be set at once, and a |
| 361 | mixture of setting and unsetting such as (?i-x) is allowed, but there may be |
| 362 | only one hyphen. Setting (but not unsetting) is allowed after (?^, for example |
| 363 | (?^in). An option setting may appear at the start of a non-capture group, for |
| 364 | example (?i:...). |
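.P
As a rough illustration of an option confined to a non-capture group, the
sketch below uses Python's re module, which is not PCRE2. Python accepts a
bare (?i) only at the start of a pattern, so the mid-pattern settings that
PCRE2 allows are not shown; the scoped (?i:...) form is assumed to behave the
same way in both engines.

```python
import re

# (?i:...) applies caseless matching to "abc" only; "def" stays case-sensitive.
scoped_hit = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_miss = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None

# (?i) at the very start applies to the whole pattern.
whole_pattern = re.fullmatch(r"(?i)abcdef", "ABCDEF") is not None

print(scoped_hit, scoped_miss, whole_pattern)
```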
| 365 | .P |
| 366 | The following are recognized only at the very start of a pattern or after one |
| 367 | of the newline or \eR options with similar syntax. More than one of them may |
| 368 | appear. For the first three, d is a decimal number. |
| 369 | .sp |
| 370 | (*LIMIT_DEPTH=d) set the backtracking limit to d |
| 371 | (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes |
| 372 | (*LIMIT_MATCH=d) set the match limit to d |
| 373 | (*NOTEMPTY) set PCRE2_NOTEMPTY when matching |
| 374 | (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching |
| 375 | (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) |
| 376 | (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR) |
| 377 | (*NO_JIT) disable JIT optimization |
| 378 | (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE) |
| 379 | (*UTF) set appropriate UTF mode for the library in use |
| 380 | (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc) |
| 381 | .sp |
| 382 | Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of |
| 383 | the limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, |
| 384 | not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The |
| 385 | application can lock out the use of (*UTF) and (*UCP) by setting the |
| 386 | PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time. |
| 387 | . |
| 388 | . |
| 389 | .SH "NEWLINE CONVENTION" |
| 390 | .rs |
| 391 | .sp |
| 392 | These are recognized only at the very start of the pattern or after option |
| 393 | settings with a similar syntax. |
| 394 | .sp |
| 395 | (*CR) carriage return only |
| 396 | (*LF) linefeed only |
| 397 | (*CRLF) carriage return followed by linefeed |
| 398 | (*ANYCRLF) all three of the above |
| 399 | (*ANY) any Unicode newline sequence |
| 400 | (*NUL) the NUL character (binary zero) |
| 401 | . |
| 402 | . |
| 403 | .SH "WHAT \eR MATCHES" |
| 404 | .rs |
| 405 | .sp |
| 406 | These are recognized only at the very start of the pattern or after option |
| 407 | settings with a similar syntax. |
| 408 | .sp |
| 409 | (*BSR_ANYCRLF) CR, LF, or CRLF |
| 410 | (*BSR_UNICODE) any Unicode newline sequence |
| 411 | . |
| 412 | . |
| 413 | .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS" |
| 414 | .rs |
| 415 | .sp |
| 416 | (?=...) ) |
| 417 | (*pla:...) ) positive lookahead |
| 418 | (*positive_lookahead:...) ) |
| 419 | .sp |
| 420 | (?!...) ) |
| 421 | (*nla:...) ) negative lookahead |
| 422 | (*negative_lookahead:...) ) |
| 423 | .sp |
| 424 | (?<=...) ) |
| 425 | (*plb:...) ) positive lookbehind |
| 426 | (*positive_lookbehind:...) ) |
| 427 | .sp |
| 428 | (?<!...) ) |
| 429 | (*nlb:...) ) negative lookbehind |
| 430 | (*negative_lookbehind:...) ) |
| 431 | .sp |
| 432 | Each top-level branch of a lookbehind must be of a fixed length. |
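.P
The assertions above consume no characters; they only test what lies ahead of
or behind the current position. A small sketch, using Python's re module as a
stand-in (it is not PCRE2, and only the (?=...), (?<=...), and (?<!...) forms
are shown; the alphabetic synonyms such as (*pla:...) are PCRE2-specific):

```python
import re

prices = "10 USD, 20 EUR, 30 USD"

# Positive lookahead: digits that are followed by " USD"; the suffix is
# tested but not included in the match.
usd_amounts = re.findall(r"\d+(?= USD)", prices)

text = "cost $15 or 20"

# Positive lookbehind: digits immediately preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers not immediately preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(usd_amounts, after_dollar, not_after_dollar)
```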
| 433 | . |
| 434 | . |
| 435 | .SH "NON-ATOMIC LOOKAROUND ASSERTIONS" |
| 436 | .rs |
| 437 | .sp |
| 438 | These assertions are specific to PCRE2 and are not Perl-compatible. |
| 439 | .sp |
| 440 | (?*...) ) |
| 441 | (*napla:...) ) synonyms |
| 442 | (*non_atomic_positive_lookahead:...) ) |
| 443 | .sp |
| 444 | (?<*...) ) |
| 445 | (*naplb:...) ) synonyms |
| 446 | (*non_atomic_positive_lookbehind:...) ) |
| 447 | . |
| 448 | . |
| 449 | .SH "SCRIPT RUNS" |
| 450 | .rs |
| 451 | .sp |
| 452 | (*script_run:...) ) script run, can be backtracked into |
| 453 | (*sr:...) ) |
| 454 | .sp |
| 455 | (*atomic_script_run:...) ) atomic script run |
| 456 | (*asr:...) ) |
| 457 | . |
| 458 | . |
| 459 | .SH "BACKREFERENCES" |
| 460 | .rs |
| 461 | .sp |
| 462 | \en reference by number (can be ambiguous) |
| 463 | \egn reference by number |
| 464 | \eg{n} reference by number |
| 465 | \eg+n relative reference by number (PCRE2 extension) |
| 466 | \eg-n relative reference by number |
| 467 | \eg{+n} relative reference by number (PCRE2 extension) |
| 468 | \eg{-n} relative reference by number |
| 469 | \ek<name> reference by name (Perl) |
| 470 | \ek'name' reference by name (Perl) |
| 471 | \eg{name} reference by name (Perl) |
| 472 | \ek{name} reference by name (.NET) |
| 473 | (?P=name) reference by name (Python) |
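.P
Two of the forms in this table, reference by number and the Python-style
reference by name, can be tried out in Python's re module. This is a sketch
only: re is not PCRE2, and the \eg and \ek forms listed above are not shown
because Python does not accept them in patterns. The group name "q" is
illustrative.

```python
import re

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
doubled_word = doubled.group(1)

# (?P=q) refers back to the named group "q", so the closing quote
# must be the same character as the opening one.
quoted = re.search(r"(?P<q>[\"']).*?(?P=q)", "say \"hi\" now").group()

# With mismatched quote characters there is nothing to match.
mismatched = re.search(r"(?P<q>[\"']).*?(?P=q)", "say \"hi' now")

print(doubled_word, quoted, mismatched)
```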
| 474 | . |
| 475 | . |
| 476 | .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)" |
| 477 | .rs |
| 478 | .sp |
| 479 | (?R) recurse whole pattern |
| 480 | (?n) call subroutine by absolute number |
| 481 | (?+n) call subroutine by relative number |
| 482 | (?-n) call subroutine by relative number |
| 483 | (?&name) call subroutine by name (Perl) |
| 484 | (?P>name) call subroutine by name (Python) |
| 485 | \eg<name> call subroutine by name (Oniguruma) |
| 486 | \eg'name' call subroutine by name (Oniguruma) |
| 487 | \eg<n> call subroutine by absolute number (Oniguruma) |
| 488 | \eg'n' call subroutine by absolute number (Oniguruma) |
| 489 | \eg<+n> call subroutine by relative number (PCRE2 extension) |
| 490 | \eg'+n' call subroutine by relative number (PCRE2 extension) |
| 491 | \eg<-n> call subroutine by relative number (PCRE2 extension) |
| 492 | \eg'-n' call subroutine by relative number (PCRE2 extension) |
| 493 | . |
| 494 | . |
| 495 | .SH "CONDITIONAL PATTERNS" |
| 496 | .rs |
| 497 | .sp |
| 498 | (?(condition)yes-pattern) |
| 499 | (?(condition)yes-pattern|no-pattern) |
| 500 | .sp |
| 501 | (?(n) absolute reference condition |
| 502 | (?(+n) relative reference condition |
| 503 | (?(-n) relative reference condition |
| 504 | (?(<name>) named reference condition (Perl) |
| 505 | (?('name') named reference condition (Perl) |
| 506 | (?(name) named reference condition (PCRE2, deprecated) |
| 507 | (?(R) overall recursion condition |
| 508 | (?(Rn) specific numbered group recursion condition |
| 509 | (?(R&name) specific named group recursion condition |
| 510 | (?(DEFINE) define groups for reference |
| 511 | (?(VERSION[>]=n.m) test PCRE2 version |
| 512 | (?(assert) assertion condition |
| 513 | .sp |
| 514 | Note the ambiguity of (?(R) and (?(Rn) which might be named reference |
| 515 | conditions or recursion tests. Such a condition is interpreted as a reference |
| 516 | condition if the relevant named group exists. |
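.P
An absolute reference condition can be sketched like this, using Python's re
module as a stand-in for PCRE2 (an assumption of the example is that both
engines treat the (?(n)yes-pattern) form alike). The pattern requires a
closing angle bracket only if an opening one was captured by group 1.

```python
import re

# Group 1 optionally captures "<"; the condition (?(1)>) then demands ">"
# only when group 1 took part in the match.
pattern = re.compile(r"^(<)?\w+(?(1)>)$")

bracketed = pattern.match("<tag>") is not None
bare = pattern.match("tag") is not None
unclosed = pattern.match("<tag") is not None
unopened = pattern.match("tag>") is not None

print(bracketed, bare, unclosed, unopened)
```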
| 517 | . |
| 518 | . |
| 519 | .SH "BACKTRACKING CONTROL" |
| 520 | .rs |
| 521 | .sp |
| 522 | All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the |
| 523 | name is mandatory, for the others it is optional. (*SKIP) changes its behaviour |
| 524 | if :NAME is present. The others just set a name for passing back to the caller, |
| 525 | but this is not a name that (*SKIP) can see. The following act as soon as they |
| 526 | are reached: |
| 527 | .sp |
| 528 | (*ACCEPT) force successful match |
| 529 | (*FAIL) force backtrack; synonym (*F) |
| 530 | (*MARK:NAME) set name to be passed back; synonym (*:NAME) |
| 531 | .sp |
| 532 | The following act only when a subsequent match failure causes a backtrack to |
| 533 | reach them. They all force a match failure, but they differ in what happens |
| 534 | afterwards. Those that advance the start-of-match point do so only if the |
| 535 | pattern is not anchored. |
| 536 | .sp |
| 537 | (*COMMIT) overall failure, no advance of starting point |
| 538 | (*PRUNE) advance to next starting character |
| 539 | (*SKIP) advance to current matching position |
| 540 | (*SKIP:NAME) advance to position corresponding to an earlier |
| 541 | (*MARK:NAME); if not found, the (*SKIP) is ignored |
| 542 | (*THEN) local failure, backtrack to next alternation |
| 543 | .sp |
| 544 | The effect of one of these verbs in a group called as a subroutine is confined |
| 545 | to the subroutine call. |
| 546 | . |
| 547 | . |
| 548 | .SH "CALLOUTS" |
| 549 | .rs |
| 550 | .sp |
| 551 | (?C) callout (assumed number 0) |
| 552 | (?Cn) callout with numerical data n |
| 553 | (?C"text") callout with string data |
| 554 | .sp |
| 555 | The allowed string delimiters are ` ' " ^ % # $ (which are the same for the |
| 556 | start and the end), and the starting delimiter { matched with the ending |
| 557 | delimiter }. To encode the ending delimiter within the string, double it. |
| 558 | . |
| 559 | . |
| 560 | .SH "SEE ALSO" |
| 561 | .rs |
| 562 | .sp |
| 563 | \fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3), |
| 564 | \fBpcre2matching\fP(3), \fBpcre2\fP(3). |
| 565 | . |
| 566 | . |
| 567 | .SH AUTHOR |
| 568 | .rs |
| 569 | .sp |
| 570 | .nf |
| 571 | Philip Hazel |
| 572 | Retired from University Computing Service |
| 573 | Cambridge, England. |
| 574 | .fi |
| 575 | . |
| 576 | . |
| 577 | .SH REVISION |
| 578 | .rs |
| 579 | .sp |
| 580 | .nf |
| 581 | Last updated: 12 January 2022 |
| 582 | Copyright (c) 1997-2022 University of Cambridge. |
| 583 | .fi |