Ian Hodson | 2ee91b4 | 2012-05-14 12:29:36 +0100 | 1 | RE2 regular expression syntax reference |
| 2 | ------------------------------------- |
| 3 | |
| 4 | Single characters: |
Alexander Gutkin | 0d4c523 | 2013-02-28 13:47:27 +0000 | 5 | . any character, possibly including newline (s=true) |
| 6 | [xyz] character class |
| 7 | [^xyz] negated character class |
| 8 | \d Perl character class |
| 9 | \D negated Perl character class |
| 10 | [:alpha:] ASCII character class |
| 11 | [:^alpha:] negated ASCII character class |
| 12 | \pN Unicode character class (one-letter name) |
| 13 | \p{Greek} Unicode character class |
| 14 | \PN negated Unicode character class (one-letter name) |
| 15 | \P{Greek} negated Unicode character class |
| 16 | |
| 17 | Composites: |
| 18 | xy «x» followed by «y» |
| 19 | x|y «x» or «y» (prefer «x») |
| 20 | |
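The "prefer «x»" rule for alternation can be checked with a small sketch. It assumes Go's standard regexp package as a stand-in for RE2 (its documentation states that it accepts the RE2 syntax, except for \C); the helper name firstMatch is purely illustrative and not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text, or "" if none.
// (Illustrative helper, not part of RE2 or Go's regexp package.)
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// Alternation tries the left branch first, even when the right one is longer.
	fmt.Printf("%q\n", firstMatch(`a|ab`, "ab")) // "a"
	fmt.Printf("%q\n", firstMatch(`ab|a`, "ab")) // "ab"
	// Concatenation: «x» followed by «y».
	fmt.Printf("%q\n", firstMatch(`ab`, "cab")) // "ab"
}
```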
| 21 | Repetitions: |
| 22 | x* zero or more «x», prefer more |
| 23 | x+ one or more «x», prefer more |
| 24 | x? zero or one «x», prefer one |
| 25 | x{n,m} «n» or «n»+1 or ... or «m» «x», prefer more |
| 26 | x{n,} «n» or more «x», prefer more |
| 27 | x{n} exactly «n» «x» |
| 28 | x*? zero or more «x», prefer fewer |
| 29 | x+? one or more «x», prefer fewer |
| 30 | x?? zero or one «x», prefer zero |
| 31 | x{n,m}? «n» or «n»+1 or ... or «m» «x», prefer fewer |
| 32 | x{n,}? «n» or more «x», prefer fewer |
| 33 | x{n}? exactly «n» «x» |
| 34 | x{} (== x*) NOT SUPPORTED vim |
| 35 | x{-} (== x*?) NOT SUPPORTED vim |
| 36 | x{-n} (== x{n}?) NOT SUPPORTED vim |
| 37 | x= (== x?) NOT SUPPORTED vim |
| 38 | |
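The difference between "prefer more" and "prefer fewer" is easiest to see on a run of identical characters. The sketch below assumes Go's standard regexp package as a stand-in for RE2 (same syntax except \C); firstMatch is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text, or "" if none.
// (Illustrative helper, not part of RE2 or Go's regexp package.)
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	text := "aaaa"
	fmt.Printf("%q\n", firstMatch(`a*`, text))      // "aaaa" (prefer more)
	fmt.Printf("%q\n", firstMatch(`a*?`, text))     // ""     (prefer fewer: zero)
	fmt.Printf("%q\n", firstMatch(`a+?`, text))     // "a"
	fmt.Printf("%q\n", firstMatch(`a??`, text))     // ""     (prefer zero)
	fmt.Printf("%q\n", firstMatch(`a{2,3}`, text))  // "aaa"
	fmt.Printf("%q\n", firstMatch(`a{2,3}?`, text)) // "aa"
	fmt.Printf("%q\n", firstMatch(`a{2}`, text))    // "aa"   (exact count)
}
```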
| 39 | Possessive repetitions: |
| 40 | x*+ zero or more «x», possessive NOT SUPPORTED |
| 41 | x++ one or more «x», possessive NOT SUPPORTED |
| 42 | x?+ zero or one «x», possessive NOT SUPPORTED |
| 43 | x{n,m}+ «n» or ... or «m» «x», possessive NOT SUPPORTED |
| 44 | x{n,}+ «n» or more «x», possessive NOT SUPPORTED |
| 45 | x{n}+ exactly «n» «x», possessive NOT SUPPORTED |
| 46 | |
| 47 | Grouping: |
| 48 | (re) numbered capturing group |
| 49 | (?P<name>re) named & numbered capturing group |
| 50 | (?<name>re) named & numbered capturing group NOT SUPPORTED |
| 51 | (?'name're) named & numbered capturing group NOT SUPPORTED |
| 52 | (?:re) non-capturing group |
| 53 | (?flags) set flags within current group; non-capturing |
| 54 | (?flags:re) set flags during re; non-capturing |
| 55 | (?#text) comment NOT SUPPORTED |
| 56 | (?|x|y|z) branch numbering reset NOT SUPPORTED |
| 57 | (?>re) possessive match of «re» NOT SUPPORTED |
| 58 | re@> possessive match of «re» NOT SUPPORTED vim |
| 59 | %(re) non-capturing group NOT SUPPORTED vim |
| 60 | |
| 61 | Flags: |
| 62 | i case-insensitive (default false) |
| 63 | m multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false) |
| 64 | s let «.» match «\n» (default false) |
| 65 | U ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc (default false) |
| 66 | Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»). |
| 67 | |
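The flags above can be tried inline with the (?flags) and (?flags:re) forms. This is a minimal sketch assuming Go's standard regexp package as a stand-in for RE2; firstMatch is a hypothetical helper, and the sample strings are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text, or "" if none.
// (Illustrative helper, not part of RE2 or Go's regexp package.)
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// i: case-insensitive matching.
	fmt.Printf("%q\n", firstMatch(`(?i)hello`, "Say HELLO")) // "HELLO"
	// s: «.» does not match a newline unless s is set.
	fmt.Printf("%q\n", firstMatch(`a.b`, "a\nb"))     // ""
	fmt.Printf("%q\n", firstMatch(`(?s)a.b`, "a\nb")) // "a\nb"
	// U: swaps the meaning of greedy and non-greedy operators.
	fmt.Printf("%q\n", firstMatch(`(?U)a+`, "aaa"))  // "a"
	fmt.Printf("%q\n", firstMatch(`(?U)a+?`, "aaa")) // "aaa"
	// Clearing a flag mid-pattern, and scoping a flag to a group.
	fmt.Printf("%q\n", firstMatch(`(?i)a(?-i)b`, "AB Ab")) // "Ab"
	fmt.Printf("%q\n", firstMatch(`(?i:a)b`, "AB Ab"))     // "Ab"
}
```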
| 68 | Empty strings: |
| 69 | ^ at beginning of text or line («m»=true) |
| 70 | $ at end of text (like «\z» not «\Z») or line («m»=true) |
| 71 | \A at beginning of text |
| 72 | \b at word boundary («\w» on one side and «\W», «\A», or «\z» on the other) |
| 73 | \B not a word boundary |
| 74 | \G at beginning of subtext being searched NOT SUPPORTED pcre |
| 75 | \G at end of last match NOT SUPPORTED perl |
| 76 | \Z at end of text, or before newline at end of text NOT SUPPORTED |
| 77 | \z at end of text |
| 78 | (?=re) before text matching «re» NOT SUPPORTED |
| 79 | (?!re) before text not matching «re» NOT SUPPORTED |
| 80 | (?<=re) after text matching «re» NOT SUPPORTED |
| 81 | (?<!re) after text not matching «re» NOT SUPPORTED |
| 82 | re& before text matching «re» NOT SUPPORTED vim |
| 83 | re@= before text matching «re» NOT SUPPORTED vim |
| 84 | re@! before text not matching «re» NOT SUPPORTED vim |
| 85 | re@<= after text matching «re» NOT SUPPORTED vim |
| 86 | re@<! after text not matching «re» NOT SUPPORTED vim |
| 87 | \zs sets start of match (= \K) NOT SUPPORTED vim |
| 88 | \ze sets end of match NOT SUPPORTED vim |
| 89 | \%^ beginning of file NOT SUPPORTED vim |
| 90 | \%$ end of file NOT SUPPORTED vim |
| 91 | \%V on screen NOT SUPPORTED vim |
| 92 | \%# cursor position NOT SUPPORTED vim |
| 93 | \%'m mark «m» position NOT SUPPORTED vim |
| 94 | \%23l in line 23 NOT SUPPORTED vim |
| 95 | \%23c in column 23 NOT SUPPORTED vim |
| 96 | \%23v in virtual column 23 NOT SUPPORTED vim |
| 97 | |
| 98 | Escape sequences: |
| 99 | \a bell (== \007) |
| 100 | \f form feed (== \014) |
| 101 | \t horizontal tab (== \011) |
| 102 | \n newline (== \012) |
| 103 | \r carriage return (== \015) |
| 104 | \v vertical tab character (== \013) |
| 105 | \* literal «*», for any punctuation character «*» |
| 106 | \123 octal character code (up to three digits) |
| 107 | \x7F hex character code (exactly two digits) |
| 108 | \x{10FFFF} hex character code |
| 109 | \C match a single byte even in UTF-8 mode |
| 110 | \Q...\E literal text «...» even if «...» has punctuation |
| 111 | |
| 112 | \1 backreference NOT SUPPORTED |
| 113 | \b backspace NOT SUPPORTED (use «\010») |
| 114 | \cK control char ^K NOT SUPPORTED (use «\001» etc) |
| 115 | \e escape NOT SUPPORTED (use «\033») |
| 116 | \g1 backreference NOT SUPPORTED |
| 117 | \g{1} backreference NOT SUPPORTED |
| 118 | \g{+1} backreference NOT SUPPORTED |
| 119 | \g{-1} backreference NOT SUPPORTED |
| 120 | \g{name} named backreference NOT SUPPORTED |
| 121 | \g<name> subroutine call NOT SUPPORTED |
| 122 | \g'name' subroutine call NOT SUPPORTED |
| 123 | \k<name> named backreference NOT SUPPORTED |
| 124 | \k'name' named backreference NOT SUPPORTED |
| 125 | \lX lowercase «X» NOT SUPPORTED |
| 126 | \ux uppercase «x» NOT SUPPORTED |
| 127 | \L...\E lowercase text «...» NOT SUPPORTED |
| 128 | \K reset beginning of «$0» NOT SUPPORTED |
| 129 | \N{name} named Unicode character NOT SUPPORTED |
| 130 | \R line break NOT SUPPORTED |
| 131 | \U...\E upper case text «...» NOT SUPPORTED |
| 132 | \X extended Unicode sequence NOT SUPPORTED |
| 133 | |
| 134 | \%d123 decimal character 123 NOT SUPPORTED vim |
| 135 | \%xFF hex character FF NOT SUPPORTED vim |
| 136 | \%o123 octal character 123 NOT SUPPORTED vim |
| 137 | \%u1234 Unicode character 0x1234 NOT SUPPORTED vim |
| 138 | \%U12345678 Unicode character 0x12345678 NOT SUPPORTED vim |
| 139 | |
| 140 | Character class elements: |
| 141 | x single character |
| 142 | A-Z character range (inclusive) |
| 143 | \d Perl character class |
| 144 | [:foo:] ASCII character class «foo» |
| 145 | \p{Foo} Unicode character class «Foo» |
| 146 | \pF Unicode character class «F» (one-letter name) |
| 147 | |
| 148 | Named character classes as character class elements: |
| 149 | [\d] digits (== \d) |
| 150 | [^\d] not digits (== \D) |
| 151 | [\D] not digits (== \D) |
| 152 | [^\D] not not digits (== \d) |
| 153 | [[:name:]] named ASCII class inside character class (== [:name:]) |
| 154 | [^[:name:]] named ASCII class inside negated character class (== [:^name:]) |
| 155 | [\p{Name}] named Unicode property inside character class (== \p{Name}) |
| 156 | [^\p{Name}] named Unicode property inside negated character class (== \P{Name}) |
| 157 | |
| 158 | Perl character classes: |
| 159 | \d digits (== [0-9]) |
| 160 | \D not digits (== [^0-9]) |
| 161 | \s whitespace (== [\t\n\f\r ]) |
| 162 | \S not whitespace (== [^\t\n\f\r ]) |
| 163 | \w word characters (== [0-9A-Za-z_]) |
| 164 | \W not word characters (== [^0-9A-Za-z_]) |
| 165 | |
| 166 | \h horizontal space NOT SUPPORTED |
| 167 | \H not horizontal space NOT SUPPORTED |
| 168 | \v vertical space NOT SUPPORTED |
| 169 | \V not vertical space NOT SUPPORTED |
| 170 | |
| 171 | ASCII character classes: |
| 172 | [:alnum:] alphanumeric (== [0-9A-Za-z]) |
| 173 | [:alpha:] alphabetic (== [A-Za-z]) |
| 174 | [:ascii:] ASCII (== [\x00-\x7F]) |
| 175 | [:blank:] blank (== [\t ]) |
| 176 | [:cntrl:] control (== [\x00-\x1F\x7F]) |
| 177 | [:digit:] digits (== [0-9]) |
| 178 | [:graph:] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]) |
| 179 | [:lower:] lower case (== [a-z]) |
| 180 | [:print:] printable (== [ -~] == [ [:graph:]]) |
| 181 | [:punct:] punctuation (== [!-/:-@[-`{-~]) |
| 182 | [:space:] whitespace (== [\t\n\v\f\r ]) |
| 183 | [:upper:] upper case (== [A-Z]) |
| 184 | [:word:] word characters (== [0-9A-Za-z_]) |
| 185 | [:xdigit:] hex digit (== [0-9A-Fa-f]) |
| 186 | |
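The Perl and ASCII classes above are defined over ASCII only, which is easy to overlook. The sketch below contrasts them with Unicode classes, assuming Go's standard regexp package as a stand-in for RE2; firstMatch and the sample inputs are illustrative. Note that an ASCII class such as [:alpha:] is a character class element, so it is written inside an enclosing bracket pair.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text, or "" if none.
// (Illustrative helper, not part of RE2 or Go's regexp package.)
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// Two Arabic-Indic digits (U+0663, U+0664), a space, then ASCII "42".
	digits := "\u0663\u0664 42"
	fmt.Printf("%q\n", firstMatch(`\d+`, digits))     // "42"  (\d is [0-9] only)
	fmt.Printf("%q\n", firstMatch(`\p{Nd}+`, digits)) // "٣٤"  (any decimal number)

	// "héllo" with a precomposed e-acute (U+00E9).
	word := "h\u00e9llo"
	fmt.Printf("%q\n", firstMatch(`[[:alpha:]]+`, word)) // "h"
	fmt.Printf("%q\n", firstMatch(`\w+`, word))          // "h"
	fmt.Printf("%q\n", firstMatch(`\pL+`, word))         // "héllo"

	// Two equivalent ways to negate an ASCII class.
	fmt.Printf("%q\n", firstMatch(`[[:^digit:]]+`, "abc123")) // "abc"
	fmt.Printf("%q\n", firstMatch(`[^[:digit:]]+`, "abc123")) // "abc"
	// Classes combine like any other class element.
	fmt.Printf("%q\n", firstMatch(`[[:upper:]][[:lower:]]+`, "hello World")) // "World"
}
```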
| 187 | Unicode character class names--general category: |
| 188 | C other |
| 189 | Cc control |
| 190 | Cf format |
| 191 | Cn unassigned code points NOT SUPPORTED |
| 192 | Co private use |
| 193 | Cs surrogate |
| 194 | L letter |
| 195 | LC cased letter NOT SUPPORTED |
| 196 | L& cased letter NOT SUPPORTED |
| 197 | Ll lowercase letter |
| 198 | Lm modifier letter |
| 199 | Lo other letter |
| 200 | Lt titlecase letter |
| 201 | Lu uppercase letter |
| 202 | M mark |
| 203 | Mc spacing mark |
| 204 | Me enclosing mark |
| 205 | Mn non-spacing mark |
| 206 | N number |
| 207 | Nd decimal number |
| 208 | Nl letter number |
| 209 | No other number |
| 210 | P punctuation |
| 211 | Pc connector punctuation |
| 212 | Pd dash punctuation |
| 213 | Pe close punctuation |
| 214 | Pf final punctuation |
| 215 | Pi initial punctuation |
| 216 | Po other punctuation |
| 217 | Ps open punctuation |
| 218 | S symbol |
| 219 | Sc currency symbol |
| 220 | Sk modifier symbol |
| 221 | Sm math symbol |
| 222 | So other symbol |
| 223 | Z separator |
| 224 | Zl line separator |
| 225 | Zp paragraph separator |
| 226 | Zs space separator |
| 227 | |
| 228 | Unicode character class names--scripts: |
| 229 | Arabic Arabic |
| 230 | Armenian Armenian |
| 231 | Balinese Balinese |
| 232 | Bengali Bengali |
| 233 | Bopomofo Bopomofo |
| 234 | Braille Braille |
| 235 | Buginese Buginese |
| 236 | Buhid Buhid |
| 237 | Canadian_Aboriginal Canadian Aboriginal |
| 238 | Carian Carian |
| 239 | Cham Cham |
| 240 | Cherokee Cherokee |
| 241 | Common characters not specific to one script |
| 242 | Coptic Coptic |
| 243 | Cuneiform Cuneiform |
| 244 | Cypriot Cypriot |
| 245 | Cyrillic Cyrillic |
| 246 | Deseret Deseret |
| 247 | Devanagari Devanagari |
| 248 | Ethiopic Ethiopic |
| 249 | Georgian Georgian |
| 250 | Glagolitic Glagolitic |
| 251 | Gothic Gothic |
| 252 | Greek Greek |
| 253 | Gujarati Gujarati |
| 254 | Gurmukhi Gurmukhi |
| 255 | Han Han |
| 256 | Hangul Hangul |
| 257 | Hanunoo Hanunoo |
| 258 | Hebrew Hebrew |
| 259 | Hiragana Hiragana |
| 260 | Inherited inherit script from previous character |
| 261 | Kannada Kannada |
| 262 | Katakana Katakana |
| 263 | Kayah_Li Kayah Li |
| 264 | Kharoshthi Kharoshthi |
| 265 | Khmer Khmer |
| 266 | Lao Lao |
| 267 | Latin Latin |
| 268 | Lepcha Lepcha |
| 269 | Limbu Limbu |
| 270 | Linear_B Linear B |
| 271 | Lycian Lycian |
| 272 | Lydian Lydian |
| 273 | Malayalam Malayalam |
| 274 | Mongolian Mongolian |
| 275 | Myanmar Myanmar |
| 276 | New_Tai_Lue New Tai Lue (aka Simplified Tai Lue) |
| 277 | Nko Nko |
| 278 | Ogham Ogham |
| 279 | Ol_Chiki Ol Chiki |
| 280 | Old_Italic Old Italic |
| 281 | Old_Persian Old Persian |
| 282 | Oriya Oriya |
| 283 | Osmanya Osmanya |
| 284 | Phags_Pa 'Phags Pa |
| 285 | Phoenician Phoenician |
| 286 | Rejang Rejang |
| 287 | Runic Runic |
| 288 | Saurashtra Saurashtra |
| 289 | Shavian Shavian |
| 290 | Sinhala Sinhala |
| 291 | Sundanese Sundanese |
| 292 | Syloti_Nagri Syloti Nagri |
| 293 | Syriac Syriac |
| 294 | Tagalog Tagalog |
| 295 | Tagbanwa Tagbanwa |
| 296 | Tai_Le Tai Le |
| 297 | Tamil Tamil |
| 298 | Telugu Telugu |
| 299 | Thaana Thaana |
| 300 | Thai Thai |
| 301 | Tibetan Tibetan |
| 302 | Tifinagh Tifinagh |
| 303 | Ugaritic Ugaritic |
| 304 | Vai Vai |
| 305 | Yi Yi |
| 306 | |
| 307 | Vim character classes: |
| 308 | \i identifier character NOT SUPPORTED vim |
| 309 | \I «\i» except digits NOT SUPPORTED vim |
| 310 | \k keyword character NOT SUPPORTED vim |
| 311 | \K «\k» except digits NOT SUPPORTED vim |
| 312 | \f file name character NOT SUPPORTED vim |
| 313 | \F «\f» except digits NOT SUPPORTED vim |
| 314 | \p printable character NOT SUPPORTED vim |
| 315 | \P «\p» except digits NOT SUPPORTED vim |
| 316 | \s whitespace character (== [ \t]) NOT SUPPORTED vim |
| 317 | \S non-white space character (== [^ \t]) NOT SUPPORTED vim |
| 318 | \d digits (== [0-9]) vim |
| 319 | \D not «\d» vim |
| 320 | \x hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim |
| 321 | \X not «\x» NOT SUPPORTED vim |
| 322 | \o octal digits (== [0-7]) NOT SUPPORTED vim |
| 323 | \O not «\o» NOT SUPPORTED vim |
| 324 | \w word character vim |
| 325 | \W not «\w» vim |
| 326 | \h head of word character NOT SUPPORTED vim |
| 327 | \H not «\h» NOT SUPPORTED vim |
| 328 | \a alphabetic NOT SUPPORTED vim |
| 329 | \A not «\a» NOT SUPPORTED vim |
| 330 | \l lowercase NOT SUPPORTED vim |
| 331 | \L not lowercase NOT SUPPORTED vim |
| 332 | \u uppercase NOT SUPPORTED vim |
| 333 | \U not uppercase NOT SUPPORTED vim |
| 334 | \_x «\x» plus newline, for any «x» NOT SUPPORTED vim |
| 335 | |
| 336 | Vim flags: |
| 337 | \c ignore case NOT SUPPORTED vim |
| 338 | \C match case NOT SUPPORTED vim |
| 339 | \m magic NOT SUPPORTED vim |
| 340 | \M nomagic NOT SUPPORTED vim |
| 341 | \v verymagic NOT SUPPORTED vim |
| 342 | \V verynomagic NOT SUPPORTED vim |
| 343 | \Z ignore differences in Unicode combining characters NOT SUPPORTED vim |
| 344 | |
| 345 | Magic: |
| 346 | (?{code}) arbitrary Perl code NOT SUPPORTED perl |
| 347 | (??{code}) postponed arbitrary Perl code NOT SUPPORTED perl |
| 348 | (?n) recursive call to regexp capturing group «n» NOT SUPPORTED |
| 349 | (?+n) recursive call to relative group «+n» NOT SUPPORTED |
| 350 | (?-n) recursive call to relative group «-n» NOT SUPPORTED |
| 351 | (?C) PCRE callout NOT SUPPORTED pcre |
| 352 | (?R) recursive call to entire regexp (== (?0)) NOT SUPPORTED |
| 353 | (?&name) recursive call to named group NOT SUPPORTED |
| 354 | (?P=name) named backreference NOT SUPPORTED |
| 355 | (?P>name) recursive call to named group NOT SUPPORTED |
| 356 | (?(cond)true|false) conditional branch NOT SUPPORTED |
| 357 | (?(cond)true) conditional branch NOT SUPPORTED |
| 358 | (*ACCEPT) make regexps more like Prolog NOT SUPPORTED |
| 359 | (*COMMIT) NOT SUPPORTED |
| 360 | (*F) NOT SUPPORTED |
| 361 | (*FAIL) NOT SUPPORTED |
| 362 | (*MARK) NOT SUPPORTED |
| 363 | (*PRUNE) NOT SUPPORTED |
| 364 | (*SKIP) NOT SUPPORTED |
| 365 | (*THEN) NOT SUPPORTED |
| 366 | (*ANY) set newline convention NOT SUPPORTED |
| 367 | (*ANYCRLF) NOT SUPPORTED |
| 368 | (*CR) NOT SUPPORTED |
| 369 | (*CRLF) NOT SUPPORTED |
| 370 | (*LF) NOT SUPPORTED |
| 371 | (*BSR_ANYCRLF) set \R convention NOT SUPPORTED pcre |
| 372 | (*BSR_UNICODE) NOT SUPPORTED pcre |
| 373 | |
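Constructs marked NOT SUPPORTED are expected to fail when the pattern is compiled rather than at match time. The sketch below checks a handful of them, assuming Go's standard regexp package as a stand-in for RE2; the compiles helper is hypothetical, and the exact error text produced by either library is not shown because it may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is accepted by the regexp compiler.
// (Illustrative helper, not part of RE2 or Go's regexp package.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(?P<name>a)`, // named group: supported
		`(a)\1`,       // backreference
		`foo(?=bar)`,  // lookahead
		`(?<!x)y`,     // negative lookbehind
		`a*+`,         // possessive repetition
		`(?>a+)b`,     // possessive (atomic) group
		`(?R)`,        // recursive call to entire regexp
		`(?P=name)`,   // named backreference
		`(*FAIL)`,     // backtracking control verb
	}
	// Only the first pattern is reported as accepted.
	for _, p := range patterns {
		fmt.Printf("%-12s accepted=%v\n", p, compiles(p))
	}
}
```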