| 1 | PCRE2GREP(1) General Commands Manual PCRE2GREP(1) |
| 2 | |
| 3 | |
| 4 | |
| 5 | NAME |
| 6 | pcre2grep - a grep with Perl-compatible regular expressions. |
| 7 | |
| 8 | SYNOPSIS |
| 9 | pcre2grep [options] [long options] [pattern] [path1 path2 ...] |
| 10 | |
| 11 | |
| 12 | DESCRIPTION |
| 13 | |
| 14 | pcre2grep searches files for character patterns, in the same way as |
| 15 | other grep commands do, but it uses the PCRE2 regular expression li- |
| 16 | brary to support patterns that are compatible with the regular expres- |
| 17 | sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of |
| 18 | pattern syntax, or pcre2pattern(3) for a full description of the syntax |
| 19 | and semantics of the regular expressions that PCRE2 supports. |
| 20 | |
| 21 | Patterns, whether supplied on the command line or in a separate file, |
| 22 | are given without delimiters. For example: |
| 23 | |
| 24 | pcre2grep Thursday /etc/motd |
| 25 | |
| 26 | If you attempt to use delimiters (for example, by surrounding a pattern |
| 27 | with slashes, as is common in Perl scripts), they are interpreted as |
| 28 | part of the pattern. Quotes can of course be used to delimit patterns |
| 29 | on the command line because they are interpreted by the shell, and in- |
| 30 | deed quotes are required if a pattern contains white space or shell |
| 31 | metacharacters. |
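The quoting rules above can be tried with a short sketch. The scratch file and its contents are invented for illustration; only the two pcre2grep invocations reflect the text.

```shell
# Scratch file with invented contents (an assumption for this sketch).
tmp=$(mktemp -d)
printf 'see you next Thursday\nmeet on /Thursday/ instead\n' > "$tmp/motd"

# The shell strips the quotes, so pcre2grep receives: next Thursday
quoted=$(pcre2grep 'next Thursday' "$tmp/motd")

# Slashes are not delimiters; they must occur literally in the line.
slashed=$(pcre2grep '/Thursday/' "$tmp/motd")

printf '%s\n' "$quoted" "$slashed"
rm -rf "$tmp"
```

The first command selects only the first line and the second selects only the line that really contains the slashes.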
| 32 | |
| 33 | The first argument that follows any option settings is treated as the |
| 34 | single pattern to be matched when neither -e nor -f is present. Con- |
| 35 | versely, when one or both of these options are used to specify pat- |
| 36 | terns, all arguments are treated as path names. At least one of -e, -f, |
| 37 | or an argument pattern must be provided. |
| 38 | |
| 39 | If no files are specified, pcre2grep reads the standard input. The |
| 40 | standard input can also be referenced by a name consisting of a single |
| 41 | hyphen. For example: |
| 42 | |
| 43 | pcre2grep some-pattern file1 - file3 |
| 44 | |
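A minimal sketch of mixing a named file with the standard input, assuming a Unix-like shell. The file name and the text fed through the pipe are made up for the example.

```shell
tmp=$(mktemp -d)
printf 'some-pattern in a file\n' > "$tmp/file1"

# "-" names the standard input, here fed by printf through the pipe.
# Two inputs are searched, so each output line carries a name prefix.
out=$(cd "$tmp" && printf 'some-pattern on stdin\n' | pcre2grep some-pattern file1 -)

printf '%s\n' "$out"
rm -rf "$tmp"
```

The line that came from the pipe is labelled "(standard input)" unless --label supplies another name.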
| 45 | Input files are searched line by line. By default, each line that |
| 46 | matches a pattern is copied to the standard output, and if there is |
| 47 | more than one file, the file name is output at the start of each line, |
| 48 | followed by a colon. However, there are options that can change how |
| 49 | pcre2grep behaves. In particular, the -M option makes it possible to |
| 50 | search for strings that span line boundaries. What defines a line |
| 51 | boundary is controlled by the -N (--newline) option. |
| 52 | |
| 53 | The amount of memory used for buffering files that are being scanned is |
| 54 | controlled by parameters that can be set by the --buffer-size and |
| 55 | --max-buffer-size options. The first of these sets the size of buffer |
| 56 | that is obtained at the start of processing. If an input file contains |
| 57 | very long lines, a larger buffer may be needed; this is handled by au- |
| 58 | tomatically extending the buffer, up to the limit specified by --max- |
| 59 | buffer-size. The default values for these parameters can be set when |
| 60 | pcre2grep is built; if nothing is specified, the defaults are set to |
| 61 | 20KiB and 1MiB respectively. An error occurs if a line is too long and |
| 62 | the buffer can no longer be expanded. |
| 63 | |
| 64 | The block of memory that is actually used is three times the "buffer |
| 65 | size", to allow for buffering "before" and "after" lines. If the buffer |
| 66 | size is too small, fewer than requested "before" and "after" lines may |
| 67 | be output. |
| 68 | |
| 69 | Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the |
| 70 | greater. BUFSIZ is defined in <stdio.h>. When there is more than one |
| 71 | pattern (specified by the use of -e and/or -f), each pattern is applied |
| 72 | to each line in the order in which they are defined, except that all |
| 73 | the -e patterns are tried before the -f patterns. |
| 74 | |
| 75 | By default, as soon as one pattern matches a line, no further patterns |
| 76 | are considered. However, if --colour (or --color) is used to colour the |
| 77 | matching substrings, or if --only-matching, --file-offsets, or --line- |
| 78 | offsets is used to output only the part of the line that matched (ei- |
| 79 | ther shown literally, or as an offset), scanning resumes immediately |
| 80 | following the match, so that further matches on the same line can be |
| 81 | found. If there are multiple patterns, they are all tried on the re- |
| 82 | mainder of the line, but patterns that follow the one that matched are |
| 83 | not tried on the earlier matched part of the line. |
| 84 | |
| 85 | This behaviour means that the order in which multiple patterns are |
| 86 | specified can affect the output when one of the above options is used. |
| 87 | This is no longer the same behaviour as GNU grep, which now manages to |
| 88 | display earlier matches for later patterns (as long as there is no |
| 89 | overlap). |
| 90 | |
| 91 | Patterns that can match an empty string are accepted, but empty string |
| 92 | matches are never recognized. An example is the pattern |
| 93 | "(super)?(man)?", in which all components are optional. This pattern finds |
| 94 | all occurrences of both "super" and "man"; the output differs from |
| 95 | matching with "super|man" when only the matching substrings are being |
| 96 | shown. |
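The difference between the two patterns can be seen with -o, which shows only the matched substrings. This is a sketch with an invented input line; the behaviour it illustrates is the one described above and may differ in other pcre2grep releases.

```shell
line='superman super man'

# Every component is optional and greedy; empty matches are skipped,
# so "superman" is reported as a single match.
optional=$(printf '%s\n' "$line" | pcre2grep -o '(super)?(man)?')

# With alternation, "super" and "man" are always separate matches.
alternation=$(printf '%s\n' "$line" | pcre2grep -o 'super|man')

printf '%s\n--\n%s\n' "$optional" "$alternation"
```

The first command should report three substrings (superman, super, man) and the second four (super, man, super, man).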
| 97 | |
| 98 | If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses |
| 99 | the value to set a locale when calling the PCRE2 library. The --locale |
| 100 | option can be used to override this. |
| 101 | |
| 102 | |
| 103 | SUPPORT FOR COMPRESSED FILES |
| 104 | |
| 105 | It is possible to compile pcre2grep so that it uses libz or libbz2 to |
| 106 | read compressed files whose names end in .gz or .bz2, respectively. You |
| 107 | can find out whether your pcre2grep binary has support for one or both |
| 108 | of these file types by running it with the --help option. If the appro- |
| 109 | priate support is not present, all files are treated as plain text. The |
| 110 | standard input is always so treated. When input is from a compressed |
| 111 | .gz or .bz2 file, the --line-buffered option is ignored. |
| 112 | |
| 113 | |
| 114 | BINARY FILES |
| 115 | |
| 116 | By default, a file that contains a binary zero byte within the first |
| 117 | 1024 bytes is identified as a binary file, and is processed specially. |
| 118 | However, if the newline type is specified as NUL, that is, the line |
| 119 | terminator is a binary zero, the test for a binary file is not applied. |
| 120 | See the --binary-files option for a means of changing the way binary |
| 121 | files are handled. |
| 122 | |
| 123 | |
| 124 | BINARY ZEROS IN PATTERNS |
| 125 | |
| 126 | Patterns passed from the command line are strings that are terminated |
| 127 | by a binary zero, so cannot contain internal zeros. However, patterns |
| 128 | that are read from a file via the -f option may contain binary zeros. |
| 129 | |
| 130 | |
| 131 | OPTIONS |
| 132 | |
| 133 | The order in which some of the options appear can affect the output. |
| 134 | For example, both the -H and -l options affect the printing of file |
| 135 | names. Whichever comes later in the command line will be the one that |
| 136 | takes effect. Similarly, except where noted below, if an option is |
| 137 | given twice, the later setting is used. Numerical values for options |
| 138 | may be followed by K or M, to signify multiplication by 1024 or |
| 139 | 1024*1024 respectively. |
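As an illustration of the K and M suffixes, the following sketch sets both buffer parameters. The sizes and the scratch file are arbitrary choices for the example, not recommendations.

```shell
tmp=$(mktemp -d)
printf 'haystack\nneedle\n' > "$tmp/data.txt"

# 64K means 64*1024 bytes and 2M means 2*1024*1024 bytes.
out=$(pcre2grep --buffer-size=64K --max-buffer-size=2M needle "$tmp/data.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```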
| 140 | |
| 141 | -- This terminates the list of options. It is useful if the next |
| 142 | item on the command line starts with a hyphen but is not an |
| 143 | option. This allows for the processing of patterns and file |
| 144 | names that start with hyphens. |
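A small sketch of the -- terminator, using a made-up file name and pattern that both begin with a hyphen:

```shell
tmp=$(mktemp -d)
printf 'plain text\n-verbose flag\n' > "$tmp/-notes"

# After "--", "-verbose" is the pattern and "-notes" is a file name,
# even though both start with a hyphen.
out=$(cd "$tmp" && pcre2grep -- -verbose -notes)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Without the terminator, both items would be parsed as option strings.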
| 145 | |
| 146 | -A number, --after-context=number |
| 147 | Output up to number lines of context after each matching |
| 148 | line. Fewer lines are output if the next match or the end of |
| 149 | the file is reached, or if the processing buffer size has |
| 150 | been set too small. If file names and/or line numbers are be- |
| 151 | ing output, a hyphen separator is used instead of a colon for |
| 152 | the context lines. A line containing "--" is output between |
| 153 | each group of lines, unless they are in fact contiguous in |
| 154 | the input file. The value of number is expected to be rela- |
| 155 | tively small. When -c is used, -A is ignored. |
| 156 | |
| 157 | -a, --text |
| 158 | Treat binary files as text. This is equivalent to --binary- |
| 159 | files=text. |
| 160 | |
| 161 | --allow-lookaround-bsk |
| 162 | PCRE2 now forbids the use of \K in lookarounds by default, in |
| 163 | line with Perl. This option causes pcre2grep to set the |
| 164 | PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this |
| 165 | somewhat dangerous usage. |
| 166 | |
| 167 | -B number, --before-context=number |
| 168 | Output up to number lines of context before each matching |
| 169 | line. Fewer lines are output if the previous match or the |
| 170 | start of the file is within number lines, or if the process- |
| 171 | ing buffer size has been set too small. If file names and/or |
| 172 | line numbers are being output, a hyphen separator is used in- |
| 173 | stead of a colon for the context lines. A line containing |
| 174 | "--" is output between each group of lines, unless they are |
| 175 | in fact contiguous in the input file. The value of number is |
| 176 | expected to be relatively small. When -c is used, -B is ig- |
| 177 | nored. |
| 178 | |
| 179 | --binary-files=word |
| 180 | Specify how binary files are to be processed. If the word is |
| 181 | "binary" (the default), pattern matching is performed on bi- |
| 182 | nary files, but the only output is "Binary file <name> |
| 183 | matches" when a match succeeds. If the word is "text", which |
| 184 | is equivalent to the -a or --text option, binary files are |
| 185 | processed in the same way as any other file. In this case, |
| 186 | when a match succeeds, the output may be binary garbage, |
| 187 | which can have nasty effects if sent to a terminal. If the |
| 188 | word is "without-match", which is equivalent to the -I op- |
| 189 | tion, binary files are not processed at all; they are assumed |
| 190 | not to be of interest and are skipped without causing any |
| 191 | output or affecting the return code. |
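The following sketch contrasts two of the --binary-files settings. The file "blob" is invented; it counts as binary only because of the NUL byte written near its start.

```shell
tmp=$(mktemp -d)
# A NUL byte within the first 1024 bytes marks the file as binary.
printf 'abc\000def\n' > "$tmp/blob"

# Default ("binary"): only a one-line notice is output.
notice=$(cd "$tmp" && pcre2grep abc blob)

# "without-match" (same as -I): the file is skipped, so nothing matches.
skipped=$(cd "$tmp" && pcre2grep --binary-files=without-match abc blob)
skip_status=$?

printf '%s\nexit status when skipped: %s\n' "$notice" "$skip_status"
rm -rf "$tmp"
```

The "text" setting is left out of the sketch because its output would contain the raw NUL byte.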
| 192 | |
| 193 | --buffer-size=number |
| 194 | Set the parameter that controls how much memory is obtained |
| 195 | at the start of processing for buffering files that are being |
| 196 | scanned. See also --max-buffer-size below. |
| 197 | |
| 198 | -C number, --context=number |
| 199 | Output number lines of context both before and after each |
| 200 | matching line. This is equivalent to setting both -A and -B |
| 201 | to the same value. |
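The context options can be sketched as follows, with -n added so that the colon and hyphen separators are visible. The log file contents are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'a\nHIT one\nb\nc\nd\nHIT two\ne\n' > "$tmp/log.txt"

# -n adds line numbers; matching lines use ":" and context lines "-".
# Line 4 is in neither group, so "--" separates the two groups.
out=$(pcre2grep -n -C 1 HIT "$tmp/log.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Each match is shown with one line of context on either side.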
| 202 | |
| 203 | -c, --count |
| 204 | Do not output lines from the files that are being scanned; |
| 205 | instead output the number of lines that would have been |
| 206 | shown, either because they matched, or, if -v is set, because |
| 207 | they failed to match. By default, this count is exactly the |
| 208 | same as the number of lines that would have been output, but |
| 209 | if the -M (multiline) option is used (without -v), there may |
| 210 | be more suppressed lines than the count (that is, the number |
| 211 | of matches). |
| 212 | |
| 213 | If no lines are selected, the number zero is output. If |
| 214 | several files are being scanned, a count is output for each |
| 215 | of them and the -t option can be used to cause a total to be |
| 216 | output at the end. However, if the --files-with-matches op- |
| 217 | tion is also used, only those files whose counts are greater |
| 218 | than zero are listed. When -c is used, the -A, -B, and -C op- |
| 219 | tions are ignored. |
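A sketch of -c on its own and combined with -l (the short form of --files-with-matches). The two log files are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'error: disk\nok\nerror: net\n' > "$tmp/a.log"
printf 'ok\nok\n' > "$tmp/b.log"

# -c alone: one count per file, including files with zero matches.
counts=$(cd "$tmp" && pcre2grep -c error a.log b.log)

# -c with -l: only files whose count is greater than zero are listed.
nonzero=$(cd "$tmp" && pcre2grep -c -l error a.log b.log)

printf '%s\n--\n%s\n' "$counts" "$nonzero"
rm -rf "$tmp"
```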
| 220 | |
| 221 | --colour, --color |
| 222 | If this option is given without any data, it is equivalent to |
| 223 | "--colour=auto". If data is required, it must be given in |
| 224 | the same shell item, separated by an equals sign. |
| 225 | |
| 226 | --colour=value, --color=value |
| 227 | This option specifies under what circumstances the parts of a |
| 228 | line that matched a pattern should be coloured in the output. |
| 229 | By default, the output is not coloured. The value (which is |
| 230 | optional, see above) may be "never", "always", or "auto". In |
| 231 | the latter case, colouring happens only if the standard out- |
| 232 | put is connected to a terminal. More resources are used when |
| 233 | colouring is enabled, because pcre2grep has to search for all |
| 234 | possible matches in a line, not just one, in order to colour |
| 235 | them all. |
| 236 | |
| 237 | The colour that is used can be specified by setting one of |
| 238 | the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, |
| 239 | PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that |
| 240 | order. If none of these are set, pcre2grep looks for |
| 241 | GREP_COLORS or GREP_COLOR (in that order). The value of the |
| 242 | variable should be a string of two numbers, separated by a |
| 243 | semicolon, except in the case of GREP_COLORS, which must |
| 244 | start with "ms=" or "mt=" followed by two semicolon-separated |
| 245 | colours, terminated by the end of the string or by a colon. |
| 246 | If GREP_COLORS does not start with "ms=" or "mt=" it is ig- |
| 247 | nored, and GREP_COLOR is checked. |
| 248 | |
| 249 | If the string obtained from one of the above variables con- |
| 250 | tains any characters other than semicolon or digits, the set- |
| 251 | ting is ignored and the default colour is used. The string is |
| 252 | copied directly into the control string for setting colour on |
| 253 | a terminal, so it is your responsibility to ensure that the |
| 254 | values make sense. If no relevant environment variable is |
| 255 | set, the default is "1;31", which gives red. |
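A sketch of choosing the colour through the environment. The value "1;32" is an arbitrary pick for illustration, and od is used only to make the escape sequences visible rather than sending them to a terminal.

```shell
# "1;32" is an arbitrary choice for this sketch.
out=$(printf 'disk error\n' |
      PCRE2GREP_COLOUR='1;32' pcre2grep --colour=always error)

# Show the raw bytes, including the escape character, in printable form.
printf '%s\n' "$out" | od -c
```

The string from the variable is copied into the control sequence as is, so the dump should contain "[1;32m" immediately before the matched text.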
| 256 | |
| 257 | -D action, --devices=action |
| 258 | If an input path is not a regular file or a directory, "ac- |
| 259 | tion" specifies how it is to be processed. Valid values are |
| 260 | "read" (the default) or "skip" (silently skip the path). |
| 261 | |
| 262 | -d action, --directories=action |
| 263 | If an input path is a directory, "action" specifies how it is |
| 264 | to be processed. Valid values are "read" (the default in |
| 265 | non-Windows environments, for compatibility with GNU grep), |
| 266 | "recurse" (equivalent to the -r option), or "skip" (silently |
| 267 | skip the path, the default in Windows environments). In the |
| 268 | "read" case, directories are read as if they were ordinary |
| 269 | files. In some operating systems the effect of reading a di- |
| 270 | rectory like this is an immediate end-of-file; in others it |
| 271 | may provoke an error. |
| 272 | |
| 273 | --depth-limit=number |
| 274 | See --match-limit below. |
| 275 | |
| 276 | -e pattern, --regex=pattern, --regexp=pattern |
| 277 | Specify a pattern to be matched. This option can be used mul- |
| 278 | tiple times in order to specify several patterns. It can also |
| 279 | be used as a way of specifying a single pattern that starts |
| 280 | with a hyphen. When -e is used, no argument pattern is taken |
| 281 | from the command line; all arguments are treated as file |
| 282 | names. There is no limit to the number of patterns. They are |
| 283 | applied to each line in the order in which they are defined |
| 284 | until one matches. |
| 285 | |
| 286 | If -f is used with -e, the command line patterns are matched |
| 287 | first, followed by the patterns from the file(s), independent |
| 288 | of the order in which these options are specified. Note that |
| 289 | multiple use of -e is not the same as a single pattern with |
| 290 | alternatives. For example, X|Y finds the first character in a |
| 291 | line that is X or Y, whereas if the two patterns are given |
| 292 | separately, with X first, pcre2grep finds X if it is present, |
| 293 | even if it follows Y in the line. It finds Y only if there is |
| 294 | no X in the line. This matters only if you are using -o or |
| 295 | --colo(u)r to show the part(s) of the line that matched. |
| 296 | |
| 297 | --exclude=pattern |
| 298 | Files (but not directories) whose names match the pattern are |
| 299 | skipped without being processed. This applies to all files, |
| 300 | whether listed on the command line, obtained from --file- |
| 301 | list, or by scanning a directory. The pattern is a PCRE2 reg- |
| 302 | ular expression, and is matched against the final component |
| 303 | of the file name, not the entire path. The -F, -w, and -x op- |
| 304 | tions do not apply to this pattern. The option may be given |
| 305 | any number of times in order to specify multiple patterns. If |
| 306 | a file name matches both an --include and an --exclude pat- |
| 307 | tern, it is excluded. There is no short form for this option. |
| 308 | |
| 309 | --exclude-from=filename |
| 310 | Treat each non-empty line of the file as the data for an |
| 311 | --exclude option. What constitutes a newline when reading the |
| 312 | file is the operating system's default. The --newline option |
| 313 | has no effect on this option. This option may be given more |
| 314 | than once in order to specify a number of files to read. |
| 315 | |
| 316 | --exclude-dir=pattern |
| 317 | Directories whose names match the pattern are skipped without |
| 318 | being processed, whatever the setting of the --recursive op- |
| 319 | tion. This applies to all directories, whether listed on the |
| 320 | command line, obtained from --file-list, or by scanning a |
| 321 | parent directory. The pattern is a PCRE2 regular expression, |
| 322 | and is matched against the final component of the directory |
| 323 | name, not the entire path. The -F, -w, and -x options do not |
| 324 | apply to this pattern. The option may be given any number of |
| 325 | times in order to specify more than one pattern. If a direc- |
| 326 | tory matches both --include-dir and --exclude-dir, it is ex- |
| 327 | cluded. There is no short form for this option. |
| 328 | |
| 329 | -F, --fixed-strings |
| 330 | Interpret each data-matching pattern as a list of fixed |
| 331 | strings, separated by newlines, instead of as a regular ex- |
| 332 | pression. What constitutes a newline for this purpose is con- |
| 333 | trolled by the --newline option. The -w (match as a word) and |
| 334 | -x (match whole line) options can be used with -F. They ap- |
| 335 | ply to each of the fixed strings. A line is selected if any |
| 336 | of the fixed strings are found in it (subject to -w or -x, if |
| 337 | present). This option applies only to the patterns that are |
| 338 | matched against the contents of files; it does not apply to |
| 339 | patterns specified by any of the --include or --exclude op- |
| 340 | tions. |
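The effect of -F can be sketched with a string that contains regular expression metacharacters. The file contents are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'total a.b*c\ntotal axbbc\n' > "$tmp/expr.txt"

# As a regex, a.b*c matches "axbbc" but not the literal text "a.b*c".
regex=$(pcre2grep 'a.b*c' "$tmp/expr.txt")

# With -F the same string is taken literally.
fixed=$(pcre2grep -F 'a.b*c' "$tmp/expr.txt")

printf '%s\n%s\n' "$regex" "$fixed"
rm -rf "$tmp"
```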
| 341 | |
| 342 | -f filename, --file=filename |
| 343 | Read patterns from the file, one per line, and match them |
| 344 | against each line of input. As is the case with patterns on |
| 345 | the command line, no delimiters should be used. What consti- |
| 346 | tutes a newline when reading the file is the operating sys- |
| 347 | tem's default interpretation of \n. The --newline option has |
| 348 | no effect on this option. Trailing white space is removed |
| 349 | from each line, and blank lines are ignored. An empty file |
| 350 | contains no patterns and therefore matches nothing. Patterns |
| 351 | read from a file in this way may contain binary zeros, which |
| 352 | are treated as ordinary data characters. See also the com- |
| 353 | ments about multiple patterns versus a single pattern with |
| 354 | alternatives in the description of -e above. |
| 355 | |
| 356 | If this option is given more than once, all the specified |
| 357 | files are read. A data line is output if any of the patterns |
| 358 | match it. A file name can be given as "-" to refer to the |
| 359 | standard input. When -f is used, patterns specified on the |
| 360 | command line using -e may also be present; they are tested |
| 361 | before the file's patterns. However, no other pattern is |
| 362 | taken from the command line; all arguments are treated as the |
| 363 | names of paths to be searched. |
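A minimal sketch of a pattern file. The file names and contents are made up; the pattern file deliberately contains trailing spaces and a blank line to show that both are ignored.

```shell
tmp=$(mktemp -d)
# Pattern file: trailing spaces after "timeout" and one blank line.
printf 'timeout  \n\nrefused\n' > "$tmp/patterns"
printf 'connect ok\nconnect timeout\nconnection refused\n' > "$tmp/net.log"

# The only command line arguments are paths; patterns come from the file.
out=$(pcre2grep -f "$tmp/patterns" "$tmp/net.log")

printf '%s\n' "$out"
rm -rf "$tmp"
```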
| 364 | |
| 365 | --file-list=filename |
| 366 | Read a list of files and/or directories that are to be |
| 367 | scanned from the given file, one per line. What constitutes a |
| 368 | newline when reading the file is the operating system's de- |
| 369 | fault. Trailing white space is removed from each line, and |
| 370 | blank lines are ignored. These paths are processed before any |
| 371 | that are listed on the command line. The file name can be |
| 372 | given as "-" to refer to the standard input. If --file and |
| 373 | --file-list are both specified as "-", patterns are read |
| 374 | first. This is useful only when the standard input is a ter- |
| 375 | minal, from which further lines (the list of files) can be |
| 376 | read after an end-of-file indication. If this option is given |
| 377 | more than once, all the specified files are read. |
| 378 | |
| 379 | --file-offsets |
| 380 | Instead of showing lines or parts of lines that match, show |
| 381 | each match as an offset from the start of the file and a |
| 382 | length, separated by a comma. In this mode, no context is |
| 383 | shown. That is, the -A, -B, and -C options are ignored. If |
| 384 | there is more than one match in a line, each of them is shown |
| 385 | separately. This option is mutually exclusive with --output, |
| 386 | --line-offsets, and --only-matching. |
| 387 | |
| 388 | -H, --with-filename |
| 389 | Force the inclusion of the file name at the start of output |
| 390 | lines when searching a single file. By default, the file name |
| 391 | is not shown in this case. For matching lines, the file name |
| 392 | is followed by a colon; for context lines, a hyphen separator |
| 393 | is used. If a line number is also being output, it follows |
| 394 | the file name. When the -M option causes a pattern to match |
| 395 | more than one line, only the first is preceded by the file |
| 396 | name. This option overrides any previous -h, -l, or -L op- |
| 397 | tions. |
| 398 | |
| 399 | -h, --no-filename |
| 400 | Suppress the output of file names when searching multiple files. |
| 401 | By default, file names are shown when multiple files are |
| 402 | searched. For matching lines, the file name is followed by a |
| 403 | colon; for context lines, a hyphen separator is used. If a |
| 404 | line number is also being output, it follows the file name. |
| 405 | This option overrides any previous -H, -L, or -l options. |
| 406 | |
| 407 | --heap-limit=number |
| 408 | See --match-limit below. |
| 409 | |
| 410 | --help Output a help message, giving brief details of the command |
| 411 | options and file type support, and then exit. Anything else |
| 412 | on the command line is ignored. |
| 413 | |
| 414 | -I Ignore binary files. This is equivalent to --binary- |
| 415 | files=without-match. |
| 416 | |
| 417 | -i, --ignore-case |
| 418 | Ignore upper/lower case distinctions during comparisons. |
| 419 | |
| 420 | --include=pattern |
| 421 | If any --include patterns are specified, the only files that |
| 422 | are processed are those whose names match one of the patterns |
| 423 | and do not match an --exclude pattern. This option does not |
| 424 | affect directories, but it applies to all files, whether |
| 425 | listed on the command line, obtained from --file-list, or by |
| 426 | scanning a directory. The pattern is a PCRE2 regular expres- |
| 427 | sion, and is matched against the final component of the file |
| 428 | name, not the entire path. The -F, -w, and -x options do not |
| 429 | apply to this pattern. The option may be given any number of |
| 430 | times. If a file name matches both an --include and an |
| 431 | --exclude pattern, it is excluded. There is no short form for |
| 432 | this option. |
| 433 | |
| 434 | --include-from=filename |
| 435 | Treat each non-empty line of the file as the data for an |
| 436 | --include option. What constitutes a newline for this purpose |
| 437 | is the operating system's default. The --newline option has |
| 438 | no effect on this option. This option may be given any number |
| 439 | of times; all the files are read. |
| 440 | |
| 441 | --include-dir=pattern |
| 442 | If any --include-dir patterns are specified, the only direc- |
| 443 | tories that are processed are those whose names match one of |
| 444 | the patterns and do not match an --exclude-dir pattern. This |
| 445 | applies to all directories, whether listed on the command |
| 446 | line, obtained from --file-list, or by scanning a parent di- |
| 447 | rectory. The pattern is a PCRE2 regular expression, and is |
| 448 | matched against the final component of the directory name, |
| 449 | not the entire path. The -F, -w, and -x options do not apply |
| 450 | to this pattern. The option may be given any number of times. |
| 451 | If a directory matches both --include-dir and --exclude-dir, |
| 452 | it is excluded. There is no short form for this option. |
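The include and exclude options take PCRE2 regular expressions rather than shell globs, which is easy to get wrong. The sketch below assumes a Unix-like system and uses an invented directory tree; -r and -l are the recursion and file-listing options.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/proj/src" "$tmp/proj/build"
printf 'int main(void) { return 0; }\n' > "$tmp/proj/src/main.c"
printf 'int main(void) { return 0; }\n' > "$tmp/proj/build/main.c"
printf 'main notes\n' > "$tmp/proj/src/main.txt"

# Regular expressions, not globs: \.c$ rather than *.c, and ^build$
# to match the final component of the directory name exactly.
out=$(cd "$tmp" && pcre2grep -r -l --include='\.c$' --exclude-dir='^build$' main proj)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the C file under src should be listed: the text file fails the --include test and the build directory is never entered.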
| 453 | |
| 454 | -L, --files-without-match |
| 455 | Instead of outputting lines from the files, just output the |
| 456 | names of the files that do not contain any lines that would |
| 457 | have been output. Each file name is output once, on a sepa- |
| 458 | rate line. This option overrides any previous -H, -h, or -l |
| 459 | options. |
| 460 | |
| 461 | -l, --files-with-matches |
| 462 | Instead of outputting lines from the files, just output the |
| 463 | names of the files containing lines that would have been out- |
| 464 | put. Each file name is output once, on a separate line. |
| 465 | Searching normally stops as soon as a matching line is found |
| 466 | in a file. However, if the -c (count) option is also used, |
| 467 | matching continues in order to obtain the correct count, and |
| 468 | those files that have at least one match are listed along |
| 469 | with their counts. Using this option with -c is a way of sup- |
| 470 | pressing the listing of files with no matches that occurs |
| 471 | with -c on its own. This option overrides any previous -H, |
| 472 | -h, or -L options. |
| 473 | |
| 474 | --label=name |
| 475 | This option supplies a name to be used for the standard input |
| 476 | when file names are being output. If not supplied, "(standard |
| 477 | input)" is used. There is no short form for this option. |
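A short sketch of --label. The name "input.log" is made up, and -H is added because a single input would otherwise be shown without a name.

```shell
# "input.log" is a made-up label; -H forces the name to be shown.
out=$(printf 'cache hit\n' | pcre2grep -H --label=input.log hit)

printf '%s\n' "$out"
```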
| 478 | |
| 479 | --line-buffered |
| 480 | When this option is given, non-compressed input is read and |
| 481 | processed line by line, and the output is flushed after each |
| 482 | write. By default, input is read in large chunks, unless |
| 483 | pcre2grep can determine that it is reading from a terminal, |
| 484 | which is currently possible only in Unix-like environments or |
| 485 | Windows. Output to a terminal is normally automatically flushed |
| 486 | by the operating system. This option can be useful when the |
| 487 | input or output is attached to a pipe and you do not want |
| 488 | pcre2grep to buffer up large amounts of data. However, its |
| 489 | use will affect performance, and the -M (multiline) option |
| 490 | ceases to work. When input is from a compressed .gz or .bz2 |
| 491 | file, --line-buffered is ignored. |
| 492 | |
| 493 | --line-offsets |
| 494 | Instead of showing lines or parts of lines that match, show |
| 495 | each match as a line number, the offset from the start of the |
| 496 | line, and a length. The line number is terminated by a colon |
| 497 | (as usual; see the -n option), and the offset and length are |
| 498 | separated by a comma. In this mode, no context is shown. |
| 499 | That is, the -A, -B, and -C options are ignored. If there is |
| 500 | more than one match in a line, each of them is shown sepa- |
| 501 | rately. This option is mutually exclusive with --output, |
| 502 | --file-offsets, and --only-matching. |
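The two offset formats can be compared on the same input. This sketch uses an invented two-line file in which the first line occupies 12 bytes including its newline.

```shell
tmp=$(mktemp -d)
printf 'hello world\nfoo world\n' > "$tmp/w.txt"

# Offset from the start of the file, then the match length.
file_offsets=$(pcre2grep --file-offsets world "$tmp/w.txt")

# Line number, then offset within that line, then the match length.
line_offsets=$(pcre2grep --line-offsets world "$tmp/w.txt")

printf '%s\n--\n%s\n' "$file_offsets" "$line_offsets"
rm -rf "$tmp"
```

The second match is at file offset 16 (12 bytes for the first line plus 4 for "foo ") but at offset 4 within its own line.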
| 503 | |
| 504 | --locale=locale-name |
| 505 | This option specifies a locale to be used for pattern match- |
| 506 | ing. It overrides the value in the LC_ALL or LC_CTYPE envi- |
| 507 | ronment variables. If no locale is specified, the PCRE2 li- |
| 508 | brary's default (usually the "C" locale) is used. There is no |
| 509 | short form for this option. |
| 510 | |
| 511 | -M, --multiline |
| 512 | Allow patterns to match more than one line. When this option |
| 513 | is set, the PCRE2 library is called in "multiline" mode. This |
| 514 | allows a matched string to extend past the end of a line and |
| 515 | continue on one or more subsequent lines. Patterns used with |
| 516 | -M may usefully contain literal newline characters and inter- |
| 517 | nal occurrences of ^ and $ characters. The output for a suc- |
| 518 | cessful match may consist of more than one line. The first |
| 519 | line is the line in which the match started, and the last |
| 520 | line is the line in which the match ended. If the matched |
| 521 | string ends with a newline sequence, the output ends at the |
| 522 | end of that line. If -v is set, none of the lines in a |
| 523 | multi-line match are output. Once a match has been handled, |
| 524 | scanning restarts at the beginning of the line after the one |
| 525 | in which the match ended. |
| 526 | |
| 527 | The newline sequence that separates multiple lines must be |
| 528 | matched as part of the pattern. For example, to find the |
| 529 | phrase "regular expression" in a file where "regular" might |
| 530 | be at the end of a line and "expression" at the start of the |
| 531 | next line, you could use this command: |
| 532 | |
| 533 | pcre2grep -M 'regular\s+expression' <file> |
| 534 | |
| 535 | The \s escape sequence matches any white space character, in- |
| 536 | cluding newlines, and is followed by + so as to match trail- |
| 537 | ing white space on the first line as well as possibly han- |
| 538 | dling a two-character newline sequence. |
| 539 | |
| 540 | There is a limit to the number of lines that can be matched, |
| 541 | imposed by the way that pcre2grep buffers the input file as |
| 542 | it scans it. With a sufficiently large processing buffer, |
| 543 | this should not be a problem, but the -M option does not work |
| 544 | when input is read line by line (see --line-buffered). |
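A runnable variant of the "regular expression" example, with invented input text fed through a pipe, shows the difference that -M makes:

```shell
text='This is a regular
expression that spans a line break.'

# Without -M each line is searched separately, so nothing matches.
single=$(printf '%s\n' "$text" | pcre2grep 'regular\s+expression')

# With -M, \s+ can consume the newline between the two words.
multi=$(printf '%s\n' "$text" | pcre2grep -M 'regular\s+expression')

printf 'without -M: [%s]\nwith -M:\n%s\n' "$single" "$multi"
```

With -M both input lines are output, because the match starts on the first and ends on the second.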
| 545 | |
| 546 | -m number, --max-count=number |
| 547 | Stop processing after finding number matching lines, or non- |
| 548 | matching lines if -v is also set. Any trailing context lines |
| 549 | are output after the final match. In multiline mode, each |
| 550 | multiline match counts as just one line for this purpose. If |
| 551 | this limit is reached when reading the standard input from a |
| 552 | regular file, the file is left positioned just after the last |
| 553 | matching line. If -c is also set, the count that is output |
| 554 | is never greater than number. This option has no effect if |
| 555 | used with -L, -l, or -q, or when just checking for a match in |
| 556 | a binary file. |
| 557 | |
| 558 | --match-limit=number |
| 559 | Processing some regular expression patterns may take a very |
| 560 | long time to search for all possible matching strings. Others |
| 561 | may require a very large amount of memory. There are three |
| 562 | options that set resource limits for matching. |
| 563 | |
| 564 | The --match-limit option provides a means of limiting comput- |
| 565 | ing resource usage when processing patterns that are not go- |
| 566 | ing to match, but which have a very large number of possibil- |
| 567 | ities in their search trees. The classic example is a pattern |
| 568 | that uses nested unlimited repeats. Internally, PCRE2 has a |
| 569 | counter that is incremented each time around its main pro- |
| 570 | cessing loop. If the value set by --match-limit is reached, |
| 571 | an error occurs. |
| 572 | |
| 573 | The --heap-limit option specifies, as a number of kibibytes |
| 574 | (units of 1024 bytes), the amount of heap memory that may be |
| 575 | used for matching. Heap memory is needed only if matching the |
| 576 | pattern requires a significant number of nested backtracking |
| 577 | points to be remembered. This parameter can be set to zero to |
| 578 | forbid the use of heap memory altogether. |
| 579 | |
| 580 | The --depth-limit option limits the depth of nested back- |
| 581 | tracking points, which indirectly limits the amount of memory |
| 582 | that is used. The amount of memory needed for each backtrack- |
| 583 | ing point depends on the number of capturing parentheses in |
| 584 | the pattern, so the amount of memory that is used before this |
| 585 | limit acts varies from pattern to pattern. This limit is of |
| 586 | use only if it is set smaller than --match-limit. |
| 587 | |
| 588 | There are no short forms for these options. The default lim- |
| 589 | its can be set when the PCRE2 library is compiled; if they |
| 590 | are not specified, the defaults are very large and so effec- |
| 591 | tively unlimited. |
| 592 | |
| 593 | --max-buffer-size=number |
| 594 | This limits the expansion of the processing buffer, whose |
| 595 | initial size can be set by --buffer-size. The maximum buffer |
| 596 | size is silently forced to be no smaller than the starting |
| 597 | buffer size. |
| 598 | |
| 599 | -N newline-type, --newline=newline-type |
| 600 | Six different conventions for indicating the ends of lines in |
| 601 | scanned files are supported. For example: |
| 602 | |
| 603 | pcre2grep -N CRLF 'some pattern' <file> |
| 604 | |
| 605 | The newline type may be specified in upper, lower, or mixed |
| 606 | case. If the newline type is NUL, lines are separated by bi- |
| 607 | nary zero characters. The other types are the single-charac- |
| 608 | ter sequences CR (carriage return) and LF (linefeed), the |
| 609 | two-character sequence CRLF, an "anycrlf" type, which recog- |
| 610 | nizes any of the preceding three types, and an "any" type, |
| 611 | for which any Unicode line ending sequence is assumed to end |
| 612 | a line. The Unicode sequences are the three just mentioned, |
| 613 | plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL |
| 614 | (next line, U+0085), LS (line separator, U+2028), and PS |
| 615 | (paragraph separator, U+2029). |
| 616 | |
| 617 | When the PCRE2 library is built, a default line-ending se- |
| 618 | quence is specified. This is normally the standard sequence |
| 619 | for the operating system. Unless otherwise specified by this |
| 620 | option, pcre2grep uses the library's default. |
| 621 | |
| 622 | This option makes it possible to use pcre2grep to scan files |
| 623 | that have come from other environments without having to mod- |
| 624 | ify their line endings. If the data that is being scanned |
| 625 | does not agree with the convention set by this option, |
| 626 | pcre2grep may behave in strange ways. Note that this option |
| 627 | does not apply to files specified by the -f, --exclude-from, |
| 628 | or --include-from options, which are expected to use the op- |
| 629 | erating system's standard newline sequence. |
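The six newline types can be pictured as six different line-ending patterns. The Python sketch below is only a model of the description above, not pcre2grep's own code; the names NEWLINE_PATTERNS and split_lines are invented for the illustration.

```python
import re

# Hypothetical model: one regular expression per -N newline type.
NEWLINE_PATTERNS = {
    "nul": r"\x00",
    "cr": r"\r",
    "lf": r"\n",
    "crlf": r"\r\n",
    # CRLF comes first so that it is not counted as two line endings.
    "anycrlf": r"\r\n|\r|\n",
    # CR, LF, CRLF plus VT, FF, NEL, LS and PS.
    "any": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, newline_type):
    """Split text into lines; the type name is accepted in any case."""
    return re.split(NEWLINE_PATTERNS[newline_type.lower()], text)


if __name__ == "__main__":
    data = "one\r\ntwo\nthree"
    print(split_lines(data, "CRLF"))     # only the CRLF pair ends a line
    print(split_lines(data, "anycrlf"))  # CR, LF and CRLF all end a line
    print(split_lines(data, "lf"))       # a stray CR stays attached to "one"
```

The last call hints at the "strange ways" mentioned above: when the data does not agree with the chosen convention, leftover carriage returns become part of the line that patterns are matched against.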
| 630 | |
| 631 | -n, --line-number |
| 632 | Precede each output line by its line number in the file, fol- |
| 633 | lowed by a colon for matching lines or a hyphen for context |
| 634 | lines. If the file name is also being output, it precedes the |
| 635 | line number. When the -M option causes a pattern to match |
| 636 | more than one line, only the first is preceded by its line |
| 637 | number. This option is forced if --line-offsets is used. |
| 638 | |
| 639 | --no-jit If the PCRE2 library is built with support for just-in-time |
| 640 | compiling (which speeds up matching), pcre2grep automatically |
| 641 | makes use of this, unless it was explicitly disabled at build |
| 642 | time. This option can be used to disable the use of JIT at |
| 643 | run time. It is provided for testing and working round prob- |
| 644 | lems. It should never be needed in normal use. |
| 645 | |
| 646 | -O text, --output=text |
| 647 | When there is a match, instead of outputting the line that |
| 648 | matched, output just the text specified in this option, fol- |
| 649 | lowed by an operating-system standard newline. In this mode, |
| 650 | no context is shown. That is, the -A, -B, and -C options are |
| 651 | ignored. The --newline option has no effect on this option, |
| 652 | which is mutually exclusive with --only-matching, --file-off- |
| 653 | sets, and --line-offsets. However, like --only-matching, if |
| 654 | there is more than one match in a line, each of them causes a |
| 655 | line of output. |
| 656 | |
| 657 | Escape sequences starting with a dollar character may be used |
| 658 | to insert the contents of the matched part of the line and/or |
| 659 | captured substrings into the text. |
| 660 | |
| 661 | $<digits> or ${<digits>} is replaced by the captured sub- |
| 662 | string of the given decimal number; zero substitutes the |
| 663 | whole match. If the number is greater than the number of cap- |
| 664 | turing substrings, or if the capture is unset, the replace- |
| 665 | ment is empty. |
| 666 | |
| 667 | $a is replaced by bell; $b by backspace; $e by escape; $f by |
| 668 | form feed; $n by newline; $r by carriage return; $t by tab; |
| 669 | $v by vertical tab. |
| 670 | |
| 671 | $o<digits> or $o{<digits>} is replaced by the character whose |
| 672 | code point is the given octal number. In the first form, up |
| 673 | to three octal digits are processed. When more digits are |
| 674 | needed in Unicode mode to specify a wide character, the sec- |
| 675 | ond form must be used. |
| 676 | |
| 677 | $x<digits> or $x{<digits>} is replaced by the character rep- |
| 678 | resented by the given hexadecimal number. In the first form, |
| 679 | up to two hexadecimal digits are processed. When more digits |
| 680 | are needed in Unicode mode to specify a wide character, the |
| 681 | second form must be used. |
| 682 | |
| 683 | Any other character is substituted by itself. In particular, |
| 684 | $$ is replaced by a single dollar. |
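The escape rules above can be summarized in a short Python sketch. This is a model written for illustration, not pcre2grep's implementation: the names SIMPLE, ESCAPE and expand_output are invented, Python's re module stands in for PCRE2, and it is an assumption that $<digits> consumes every consecutive decimal digit. A dollar at the very end of the text is treated as a syntax error.

```python
import re

# The single-character escapes listed above.
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# One alternative per escape form; the braced forms are tried first.
ESCAPE = re.compile(
    r"\$(?:\{(?P<braced>[0-9]+)\}"
    r"|(?P<plain>[0-9]+)"
    r"|o\{(?P<octb>[0-7]+)\}"
    r"|o(?P<oct>[0-7]{1,3})"
    r"|x\{(?P<hexb>[0-9A-Fa-f]+)\}"
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"
    r"|(?P<other>.)"
    r"|(?P<bad>\Z))",
    re.DOTALL,
)


def expand_output(text, match):
    """Expand $-escapes in text using the groups of a Python match object."""
    def replace(m):
        if m.group("bad") is not None:
            raise ValueError("dollar not followed by another character")
        number = m.group("braced") or m.group("plain")
        if number is not None:
            n = int(number)
            # Out-of-range or unset captures expand to nothing.
            return (match.group(n) or "") if n <= match.re.groups else ""
        octal = m.group("octb") or m.group("oct")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = m.group("hexb") or m.group("hex")
        if hexa is not None:
            return chr(int(hexa, 16))
        other = m.group("other")
        return SIMPLE.get(other, other)  # any other character stands for itself

    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    found = re.search(r"(\w+)@(\w+)", "mail bob@example now")
    print(expand_output("user=$1 host=${2} whole=$0 cost=$$5", found))
```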
| 685 | |
| 686 | -o, --only-matching |
| 687 | Show only the part of the line that matched a pattern instead |
| 688 | of the whole line. In this mode, no context is shown. That |
| 689 | is, the -A, -B, and -C options are ignored. If there is more |
| 690 | than one match in a line, each of them is shown separately, |
| 691 | on a separate line of output. If -o is combined with -v (in- |
| 692 | vert the sense of the match to find non-matching lines), no |
| 693 | output is generated, but the return code is set appropri- |
| 694 | ately. If the matched portion of the line is empty, nothing |
| 695 | is output unless the file name or line number are being |
| 696 | printed, in which case they are shown on an otherwise empty |
| 697 | line. This option is mutually exclusive with --output, |
| 698 | --file-offsets and --line-offsets. |
| 699 | |
| 700 | -onumber, --only-matching=number |
| 701 | Show only the part of the line that matched the capturing |
| 702 | parentheses of the given number. Up to 50 capturing parenthe- |
| 703 | ses are supported by default. This limit can be changed via |
| 704 | the --om-capture option. A pattern may contain any number of |
| 705 | capturing parentheses, but only those whose number is within |
| 706 | the limit can be accessed by -o. An error occurs if the num- |
| 707 | ber specified by -o is greater than the limit. |
| 708 | |
| 709 | -o0 is the same as -o without a number. Because these options |
| 710 | can be given without an argument (see above), if an argument |
| 711 | is present, it must be given in the same shell item, for ex- |
| 712 | ample, -o3 or --only-matching=2. The comments given for the |
| 713 | non-argument case above also apply to this option. If the |
| 714 | specified capturing parentheses do not exist in the pattern, |
| 715 | or were not set in the match, nothing is output unless the |
| 716 | file name or line number are being output. |
| 717 | |
| 718 | If this option is given multiple times, multiple substrings |
| 719 | are output for each match, in the order the options are |
| 720 | given, and all on one line. For example, -o3 -o1 -o3 causes |
| 721 | the substrings matched by capturing parentheses 3 and 1 and |
| 722 | then 3 again to be output. By default, there is no separator |
| 723 | (but see --om-separator below). |
| 724 | |
| 725 | --om-capture=number |
| 726 | Set the number of capturing parentheses that can be accessed |
| 727 | by -o. The default is 50. |
| 728 | |
| 729 | --om-separator=text |
| 730 | Specify a separating string for multiple occurrences of -o. |
| 731 | The default is an empty string. Separating strings are never |
| 732 | coloured. |
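The effect of repeating -o with a separator can be modelled in a few lines of Python. This is an illustrative sketch only: only_matching is an invented name, Python's re module stands in for PCRE2, and how unset groups interact with the separator is not specified above, so the sketch simply leaves them out.

```python
import re


def only_matching(pattern, line, groups, separator=""):
    """Return one output string per match, built from the requested groups."""
    rows = []
    for m in re.finditer(pattern, line):
        # Assumption: unset or empty groups contribute nothing at all.
        parts = [m.group(g) for g in groups if m.group(g)]
        rows.append(separator.join(parts))
    return rows


if __name__ == "__main__":
    line = "2021-08-31 and 1997-01-02"
    # Models: -o3 -o1 -o3 --om-separator=/
    for row in only_matching(r"(\d+)-(\d+)-(\d+)", line, [3, 1, 3], "/"):
        print(row)
```

Each match in the line produces its own output line, and within that line the substrings appear in the order in which the options were given.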
| 733 | |
| 734 | -q, --quiet |
| 735 | Work quietly, that is, display nothing except error messages. |
| 736 | The exit status indicates whether or not any matches were |
| 737 | found. |
| 738 | |
| 739 | -r, --recursive |
| 740 | If any given path is a directory, recursively scan the files |
| 741 | it contains, taking note of any --include and --exclude set- |
| 742 | tings. By default, a directory is read as a normal file; in |
| 743 | some operating systems this gives an immediate end-of-file. |
| 744 | This option is a shorthand for setting the -d option to "re- |
| 745 | curse". |
| 746 | |
| 747 | --recursion-limit=number |
| 748 | This is an obsolete synonym for --depth-limit. See --match- |
| 749 | limit above for details. |
| 750 | |
| 751 | -s, --no-messages |
| 752 | Suppress error messages about non-existent or unreadable |
| 753 | files. Such files are quietly skipped. However, the return |
| 754 | code is still 2, even if matches were found in other files. |
| 755 | |
| 756 | -t, --total-count |
| 757 | This option is useful when scanning more than one file. If |
| 758 | used on its own, -t suppresses all output except for a grand |
| 759 | total number of matching lines (or non-matching lines if -v |
| 760 | is used) in all the files. If -t is used with -c, a grand to- |
| 761 | tal is output except when the previous output is just one |
| 762 | line. In other words, it is not output when just one file's |
| 763 | count is listed. If file names are being output, the grand |
| 764 | total is preceded by "TOTAL:". Otherwise, it appears as just |
| 765 | another number. The -t option is ignored when used with -L |
| 766 | (list files without matches), because the grand total would |
| 767 | always be zero. |
| 768 | |
| 769 | -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2 |
| 770 | has been compiled with UTF-8 support. All patterns (including |
| 771 | those for any --exclude and --include options) and all lines |
| 772 | that are scanned must be valid strings of UTF-8 characters. |
| 773 | If an invalid UTF-8 string is encountered, an error occurs. |
| 774 | |
| 775 | -U, --utf-allow-invalid |
| 776 | As --utf, but in addition subject lines may contain invalid |
| 777 | UTF-8 code unit sequences. These can never form part of any |
| 778 | pattern match. Patterns themselves, however, must still be |
| 779 | valid UTF-8 strings. This facility allows valid UTF-8 strings |
| 780 | to be sought within arbitrary byte sequences in executable or |
| 781 | other binary files. For more details about matching in non- |
| 782 | valid UTF-8 strings, see the pcre2unicode(3) documentation. |
| 783 | |
| 784 | -V, --version |
| 785 | Write the version numbers of pcre2grep and the PCRE2 library |
| 786 | to the standard output and then exit. Anything else on the |
| 787 | command line is ignored. |
| 788 | |
| 789 | -v, --invert-match |
| 790 | Invert the sense of the match, so that lines which do not |
| 791 | match any of the patterns are the ones that are found. When |
| 792 | this option is set, options such as --only-matching and |
| 793 | --output, which specify parts of a match that are to be out- |
| 794 | put, are ignored. |
| 795 | |
| 796 | -w, --word-regex, --word-regexp |
| 797 | Force the patterns only to match "words". That is, there must |
| 798 | be a word boundary at the start and end of each matched |
| 799 | string. This is equivalent to having "\b(?:" at the start of |
| 800 | each pattern, and ")\b" at the end. This option applies only |
| 801 | to the patterns that are matched against the contents of |
| 802 | files; it does not apply to patterns specified by any of the |
| 803 | --include or --exclude options. |
| 804 | |
| 805 | -x, --line-regex, --line-regexp |
| 806 | Force the patterns to start matching only at the beginnings |
| 807 | of lines, and in addition, require them to match entire |
| 808 | lines. In multiline mode the match may be more than one line. |
| 809 | This is equivalent to having "^(?:" at the start of each pat- |
| 810 | tern and ")$" at the end. This option applies only to the |
| 811 | patterns that are matched against the contents of files; it |
| 812 | does not apply to patterns specified by any of the --include |
| 813 | or --exclude options. |
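The non-capturing group in the -w and -x equivalents matters when a pattern contains a top-level alternation. The Python sketch below shows the difference; word_pattern and line_pattern are names invented for the illustration, and Python's re module is used only because these particular constructs behave the same way in it.

```python
import re


def word_pattern(pattern):
    """Wrap a pattern the way -w is described as doing."""
    return r"\b(?:" + pattern + r")\b"


def line_pattern(pattern):
    """Wrap a pattern the way -x is described as doing."""
    return "^(?:" + pattern + ")$"


if __name__ == "__main__":
    # Without the group, \b and ^ or $ bind to only one alternative each.
    print(bool(re.search(r"\bcat|dog\b", "hotdog")))            # naive wrapping
    print(bool(re.search(word_pattern("cat|dog"), "hotdog")))   # grouped wrapping
    print(bool(re.search("^ab|cd$", "abX")))                    # naive wrapping
    print(bool(re.search(line_pattern("ab|cd"), "abX")))        # grouped wrapping
```

With the naive wrapping, "hotdog" and "abX" both match because only one side of the alternation carries the anchor; with the grouped form neither matches.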
| 814 | |
| 815 | |
| 816 | ENVIRONMENT VARIABLES |
| 817 | |
| 818 | The environment variables LC_ALL and LC_CTYPE are examined, in that or- |
| 819 | der, for a locale. The first one that is set is used. This can be over- |
| 820 | ridden by the --locale option. If no locale is set, the PCRE2 library's |
| 821 | default (usually the "C" locale) is used. |
| 822 | |
| 823 | |
| 824 | NEWLINES |
| 825 | |
| 826 | The -N (--newline) option allows pcre2grep to scan files with newline |
| 827 | conventions that differ from the default. This option affects only the |
| 828 | way scanned files are processed. It does not affect the interpretation |
| 829 | of files specified by the -f, --file-list, --exclude-from, or --in- |
| 830 | clude-from options. |
| 831 | |
| 832 | Any parts of the scanned input files that are written to the standard |
| 833 | output are copied with whatever newline sequences they have in the in- |
| 834 | put. However, if the final line of a file is output, and it does not |
| 835 | end with a newline sequence, a newline sequence is added. If the new- |
| 836 | line setting is CR, LF, CRLF or NUL, that line ending is output; for |
| 837 | the other settings (ANYCRLF or ANY) a single LF (linefeed) is used. |
| 838 | |
| 839 | The newline setting does not affect the way in which pcre2grep writes |
| 840 | newlines in informational messages to the standard output and error |
| 841 | streams. Under Windows, the standard output is set to be binary, so |
| 842 | that "\r\n" at the ends of output lines that are copied from the input |
| 843 | is not converted to "\r\r\n" by the C I/O library. This means that any |
| 844 | messages written to the standard output must end with "\r\n". For all |
| 845 | other operating systems, and for all messages to the standard error |
| 846 | stream, "\n" is used. |
| 847 | |
| 848 | |
| 849 | OPTIONS COMPATIBILITY |
| 850 | |
| 851 | Many of the short and long forms of pcre2grep's options are the same as |
| 852 | in the GNU grep program. Any long option of the form --xxx-regexp (GNU |
| 853 | terminology) is also available as --xxx-regex (PCRE2 terminology). How- |
| 854 | ever, the --depth-limit, --file-list, --file-offsets, --heap-limit, |
| 855 | --include-dir, --line-offsets, --locale, --match-limit, -M, --multi- |
| 856 | line, -N, --newline, --om-separator, --output, -u, --utf, -U, and |
| 857 | --utf-allow-invalid options are specific to pcre2grep, as is the use of |
| 858 | the --only-matching option with a capturing parentheses number. |
| 859 | |
| 860 | Although most of the common options work the same way, a few are dif- |
| 861 | ferent in pcre2grep. For example, the --include option's argument is a |
| 862 | glob for GNU grep, but a regular expression for pcre2grep. If both the |
| 863 | -c and -l options are given, GNU grep lists only file names, without |
| 864 | counts, but pcre2grep gives the counts as well. |
| 865 | |
| 866 | |
| 867 | OPTIONS WITH DATA |
| 868 | |
| 869 | There are four different ways in which an option with data can be spec- |
| 870 | ified. If a short form option is used, the data may follow immedi- |
| 871 | ately, or (with one exception) in the next command line item. For exam- |
| 872 | ple: |
| 873 | |
| 874 | -f/some/file |
| 875 | -f /some/file |
| 876 | |
| 877 | The exception is the -o option, which may appear with or without data. |
| 878 | Because of this, if data is present, it must follow immediately in the |
| 879 | same item, for example -o3. |
| 880 | |
| 881 | If a long form option is used, the data may appear in the same command |
| 882 | line item, separated by an equals character, or (with two exceptions) |
| 883 | it may appear in the next command line item. For example: |
| 884 | |
| 885 | --file=/some/file |
| 886 | --file /some/file |
| 887 | |
| 888 | Note, however, that if you want to supply a file name beginning with ~ |
| 889 | as data in a shell command, and have the shell expand ~ to a home di- |
| 890 | rectory, you must separate the file name from the option, because the |
| 891 | shell does not treat ~ specially unless it is at the start of an item. |
| 892 | |
| 893 | The exceptions to the above are the --colour (or --color) and --only- |
| 894 | matching options, for which the data is optional. If one of these op- |
| 895 | tions does have data, it must be given in the first form, using an |
| 896 | equals character. Otherwise pcre2grep will assume that it has no data. |
| 897 | |
| 898 | |
| 899 | USING PCRE2'S CALLOUT FACILITY |
| 900 | |
| 901 | pcre2grep has, by default, support for calling external programs or |
| 902 | scripts or echoing specific strings during matching by making use of |
| 903 | PCRE2's callout facility. However, this support can be completely or |
| 904 | partially disabled when pcre2grep is built. You can find out whether |
| 905 | your binary has support for callouts by running it with the --help op- |
| 906 | tion. If callout support is completely disabled, all callouts in pat- |
| 907 | terns are ignored by pcre2grep. If the facility is partially disabled, |
| 908 | calling external programs is not supported, and callouts that request |
| 909 | it are ignored. |
| 910 | |
| 911 | A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu- |
| 912 | ment is either a number or a quoted string (see the pcre2callout docu- |
| 913 | mentation for details). Numbered callouts are ignored by pcre2grep; |
| 914 | only callouts with string arguments are useful. |
| 915 | |
| 916 | Echoing a specific string |
| 917 | |
| 918 | Starting the callout string with a pipe character invokes an echoing |
| 919 | facility that avoids calling an external program or script. This facil- |
| 920 | ity is always available, provided that callouts were not completely |
| 921 | disabled when pcre2grep was built. The rest of the callout string is |
| 922 | processed as a zero-terminated string, which means it should not con- |
| 923 | tain any internal binary zeros. It is written to the output, having |
| 924 | first been passed through the same escape processing as text from the |
| 925 | --output (-O) option (see above). However, $0 cannot be used to insert |
| 926 | a matched substring because the match is still in progress. Instead, |
| 927 | the single character '0' is inserted. Any syntax error in the string |
| 928 | (for example, a dollar not followed by another character) causes the |
| 929 | callout to be ignored. No terminator is added to the output string, so |
| 930 | if you want a newline, you must include it explicitly using the escape |
| 931 | $n. For example: |
| 932 | |
| 933 | pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file> |
| 934 | |
| 935 | Matching continues normally after the string is output. If you want to |
| 936 | see only the callout output but not any output from an actual match, |
| 937 | you should end the pattern with (*FAIL). |
| 938 | |
| 939 | Calling external programs or scripts |
| 940 | |
| 941 | This facility can be independently disabled when pcre2grep is built. It |
| 942 | is supported for Windows, where a call to _spawnvp() is used, for VMS, |
| 943 | where lib$spawn() is used, and for any Unix-like environment where |
| 944 | fork() and execv() are available. |
| 945 | |
| 946 | If the callout string does not start with a pipe (vertical bar) charac- |
| 947 | ter, it is parsed into a list of substrings separated by pipe charac- |
| 948 | ters. The first substring must be an executable name, with the follow- |
| 949 | ing substrings specifying arguments: |
| 950 | |
| 951 | executable_name|arg1|arg2|... |
| 952 | |
| 953 | Any substring (including the executable name) may contain escape se- |
| 954 | quences started by a dollar character. These are the same as for the |
| 955 | --output (-O) option documented above, except that $0 cannot insert the |
| 956 | matched string because the match is still in progress. Instead, the |
| 957 | character '0' is inserted. If you need a literal dollar or pipe charac- |
| 958 | ter in any substring, use $$ or $| respectively. Here is an example: |
| 959 | |
| 960 | echo -e "abcde\n12345" | pcre2grep \ |
| 961 | '(?x)(.)(..(.)) |
| 962 | (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' - |
| 963 | |
| 964 | Output: |
| 965 | |
| 966 | Arg1: [a] [bcd] [d] Arg2: |a| () |
| 967 | abcde |
| 968 | Arg1: [1] [234] [4] Arg2: |1| () |
| 969 | 12345 |
| 970 | |
| 971 | The parameters for the system call that is used to run the program or |
| 972 | script are zero-terminated strings. This means that binary zero charac- |
| 973 | ters in the callout argument will cause premature termination of their |
| 974 | substrings, and therefore should not be present. Any syntax error in |
| 975 | the string (for example, a dollar not followed by another character) |
| 976 | causes the callout to be ignored. If running the program fails for any |
| 977 | reason (including the non-existence of the executable), a local match- |
| 978 | ing failure occurs and the matcher backtracks in the normal way. |
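The way a callout string is divided into an executable name and arguments can be sketched in Python. This is a simplified model for illustration, not pcre2grep's parser: parse_callout and its captures dictionary are invented names, and the $o and $x numeric escapes are left out to keep the sketch short.

```python
import re

SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# A token is a capture reference, an escaped character, a separator, or text.
TOKEN = re.compile(
    r"\$\{(?P<braced>[0-9]+)\}|\$(?P<plain>[0-9]+)|\$(?P<lit>.)"
    r"|(?P<sep>\|)|(?P<text>[^$|]+)",
    re.DOTALL,
)


def parse_callout(callout, captures):
    """Split 'executable|arg1|arg2' and expand escapes; captures maps number to text."""
    if callout.startswith("|"):
        raise ValueError("a leading pipe selects the echo facility instead")
    argv, current, pos = [], "", 0
    while pos < len(callout):
        m = TOKEN.match(callout, pos)
        if m is None:  # only a lone trailing dollar fails to tokenize
            raise ValueError("dollar not followed by another character")
        pos = m.end()
        number = m.group("braced") or m.group("plain")
        if number is not None:
            # $0 yields the character '0' because the match is still in progress.
            current += "0" if int(number) == 0 else (captures.get(int(number)) or "")
        elif m.group("lit") is not None:
            current += SIMPLE.get(m.group("lit"), m.group("lit"))  # covers $$ and $|
        elif m.group("sep") is not None:
            argv.append(current)
            current = ""
        else:
            current += m.group("text")
    argv.append(current)
    return argv


if __name__ == "__main__":
    callout = "/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)"
    print(parse_callout(callout, {1: "a", 2: "bcd", 3: "d"}))
```

For the first input line of the example above, the model yields the executable name followed by two arguments, which is consistent with the two space-separated fields that /bin/echo prints.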
| 979 | |
| 980 | |
| 981 | MATCHING ERRORS |
| 982 | |
| 983 | It is possible to supply a regular expression that takes a very long |
| 984 | time to fail to match certain lines. Such patterns normally involve |
| 985 | nested indefinite repeats, for example: (a+)*\d when matched against a |
| 986 | line of a's with no final digit. The PCRE2 matching function has a re- |
| 987 | source limit that causes it to abort in these circumstances. If this |
| 988 | happens, pcre2grep outputs an error message and the line that caused |
| 989 | the problem to the standard error stream. If there are more than 20 |
| 990 | such errors, pcre2grep gives up. |
| 991 | |
| 992 | The --match-limit option of pcre2grep can be used to set the overall |
| 993 | resource limit. There are also other limits that affect the amount of |
| 994 | memory used during matching; see the discussion of --heap-limit and |
| 995 | --depth-limit above. |
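To see why nested indefinite repeats are costly, and what a match limit buys, consider this toy backtracking matcher in Python. It is not PCRE2's algorithm: match_nested_repeat and MatchLimitExceeded are invented names, the matcher handles only the pattern (a+)*\d tried at the start of the subject, and its step counter is a loose stand-in for PCRE2's internal counter.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher's step counter passes the limit."""


def match_nested_repeat(subject, match_limit):
    # Toy backtracking matcher for (a+)*\d, tried at offset 0 only.
    steps = 0

    def after_group(pos):
        nonlocal steps
        steps += 1  # one count per position visited
        if steps > match_limit:
            raise MatchLimitExceeded(f"gave up after {match_limit} steps")
        # Find how far a run of a's extends from pos.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Greedy a+: try the longest run first, then back off one at a time.
        for stop in range(end, pos, -1):
            if after_group(stop):
                return True
        # No further group iteration: the next character must be a digit.
        return pos < len(subject) and subject[pos].isdigit()

    matched = after_group(0)
    return matched, steps


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, match_nested_repeat("a" * n, 10**7))
    try:
        match_nested_repeat("a" * 30, 1000)
    except MatchLimitExceeded as err:
        print("error:", err)
```

In this model the number of steps needed to fail doubles with each extra "a", so a line of only a few dozen characters can already exceed any practical limit, whereas a subject that ends in a digit succeeds almost at once.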
| 996 | |
| 997 | |
| 998 | DIAGNOSTICS |
| 999 | |
| 1000 | Exit status is 0 if any matches were found, 1 if no matches were found, |
| 1001 | and 2 for syntax errors, overlong lines, non-existent or inaccessible |
| 1002 | files (even if matches were found in other files) or too many matching |
| 1003 | errors. Using the -s option to suppress error messages about inaccessi- |
| 1004 | ble files does not affect the return code. |
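A calling script usually needs to tell "no match" apart from a real error. The Python sketch below shows one way to do that; describe_status and run_search are invented names, and the command line passed to run_search is whatever the caller supplies (for example a pcre2grep invocation, assuming the program is installed and on the PATH).

```python
import subprocess

# Meanings of the exit status as described above.
STATUS = {0: "match found", 1: "no match", 2: "error"}


def describe_status(code):
    """Translate an exit status into a short description."""
    return STATUS.get(code, "unexpected status")


def run_search(argv):
    """Run a command given as a list of strings; return (description, stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return describe_status(result.returncode), result.stdout


# Hypothetical usage:
#   outcome, lines = run_search(["pcre2grep", "-s", "Thursday", "/etc/motd"])
#   if outcome == "error":
#       ...  # status 2 is reported even though -s hid the message
```

Because -s affects only the messages and not the status, a script that relies on the exit status still sees status 2 for an unreadable file.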
| 1005 | |
| 1006 | When run under VMS, the return code is placed in the symbol |
| 1007 | PCRE2GREP_RC because VMS does not distinguish between exit(0) and |
| 1008 | exit(1). |
| 1009 | |
| 1010 | |
| 1011 | SEE ALSO |
| 1012 | |
| 1013 | pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3). |
| 1014 | |
| 1015 | |
| 1016 | AUTHOR |
| 1017 | |
| 1018 | Philip Hazel |
| 1019 | Retired from University Computing Service |
| 1020 | Cambridge, England. |
| 1021 | |
| 1022 | |
| 1023 | REVISION |
| 1024 | |
| 1025 | Last updated: 31 August 2021 |
| 1026 | Copyright (c) 1997-2021 University of Cambridge. |