| 1 | .TH PCRE2GREP 1 "31 August 2021" "PCRE2 10.38" |
| 2 | .SH NAME |
| 3 | pcre2grep - a grep with Perl-compatible regular expressions. |
| 4 | .SH SYNOPSIS |
| 5 | .B pcre2grep [options] [long options] [pattern] [path1 path2 ...] |
| 6 | . |
| 7 | .SH DESCRIPTION |
| 8 | .rs |
| 9 | .sp |
| 10 | \fBpcre2grep\fP searches files for character patterns, in the same way as other |
| 11 | grep commands do, but it uses the PCRE2 regular expression library to support |
| 12 | patterns that are compatible with the regular expressions of Perl 5. See |
| 13 | .\" HREF |
| 14 | \fBpcre2syntax\fP(3) |
| 15 | .\" |
| 16 | for a quick-reference summary of pattern syntax, or |
| 17 | .\" HREF |
| 18 | \fBpcre2pattern\fP(3) |
| 19 | .\" |
| 20 | for a full description of the syntax and semantics of the regular expressions |
| 21 | that PCRE2 supports. |
| 22 | .P |
| 23 | Patterns, whether supplied on the command line or in a separate file, are given |
| 24 | without delimiters. For example: |
| 25 | .sp |
| 26 | pcre2grep Thursday /etc/motd |
| 27 | .sp |
| 28 | If you attempt to use delimiters (for example, by surrounding a pattern with |
| 29 | slashes, as is common in Perl scripts), they are interpreted as part of the |
| 30 | pattern. Quotes can of course be used to delimit patterns on the command line |
| 31 | because they are interpreted by the shell, and indeed quotes are required if a |
| 32 | pattern contains white space or shell metacharacters. |
| 33 | .P |
| 34 | The first argument that follows any option settings is treated as the single |
| 35 | pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present. |
| 36 | Conversely, when one or both of these options are used to specify patterns, all |
| 37 | arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an |
| 38 | argument pattern must be provided. |
| 39 | .P |
| 40 | If no files are specified, \fBpcre2grep\fP reads the standard input. The |
| 41 | standard input can also be referenced by a name consisting of a single hyphen. |
| 42 | For example: |
| 43 | .sp |
| 44 | pcre2grep some-pattern file1 - file3 |
| 45 | .sp |
| 46 | Input files are searched line by line. By default, each line that matches a |
| 47 | pattern is copied to the standard output, and if there is more than one file, |
| 48 | the file name is output at the start of each line, followed by a colon. |
| 49 | However, there are options that can change how \fBpcre2grep\fP behaves. In |
| 50 | particular, the \fB-M\fP option makes it possible to search for strings that |
| 51 | span line boundaries. What defines a line boundary is controlled by the |
| 52 | \fB-N\fP (\fB--newline\fP) option. |
| 53 | .P |
| 54 | The amount of memory used for buffering files that are being scanned is |
| 55 | controlled by parameters that can be set by the \fB--buffer-size\fP and |
| 56 | \fB--max-buffer-size\fP options. The first of these sets the size of buffer |
| 57 | that is obtained at the start of processing. If an input file contains very |
| 58 | long lines, a larger buffer may be needed; this is handled by automatically |
| 59 | extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The |
| 60 | default values for these parameters can be set when \fBpcre2grep\fP is |
| 61 | built; if nothing is specified, the defaults are set to 20KiB and 1MiB |
| 62 | respectively. An error occurs if a line is too long and the buffer can no |
| 63 | longer be expanded. |
| 64 | .P |
| 65 | The block of memory that is actually used is three times the "buffer size", to |
| 66 | allow for buffering "before" and "after" lines. If the buffer size is too |
| 67 | small, fewer than requested "before" and "after" lines may be output. |
| 68 | .P |
| 69 | Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater. |
| 70 | BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern |
| 71 | (specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to |
| 72 | each line in the order in which they are defined, except that all the \fB-e\fP |
| 73 | patterns are tried before the \fB-f\fP patterns. |
| 74 | .P |
| 75 | By default, as soon as one pattern matches a line, no further patterns are |
| 76 | considered. However, if \fB--colour\fP (or \fB--color\fP) is used to colour the |
| 77 | matching substrings, or if \fB--only-matching\fP, \fB--file-offsets\fP, or |
| 78 | \fB--line-offsets\fP is used to output only the part of the line that matched |
| 79 | (either shown literally, or as an offset), scanning resumes immediately |
| 80 | following the match, so that further matches on the same line can be found. If |
| 81 | there are multiple patterns, they are all tried on the remainder of the line, |
| 82 | but patterns that follow the one that matched are not tried on the earlier |
| 83 | matched part of the line. |
| 84 | .P |
| 85 | This behaviour means that the order in which multiple patterns are specified |
| 86 | can affect the output when one of the above options is used. This is no longer |
| 87 | the same behaviour as GNU grep, which now manages to display earlier matches |
| 88 | for later patterns (as long as there is no overlap). |
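.P
The scanning order described above can be modelled with a short sketch. It is
an illustration only, not the \fBpcre2grep\fP implementation: it uses Python's
\fBre\fP module rather than PCRE2, and the function name is hypothetical.

```python
import re

def matches_in_order(patterns, line):
    """Model of the scan described above: try the patterns in order, take
    the first one that matches, then resume just after that match."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while pos <= len(line):
        hit = None
        for rx in compiled:
            m = rx.search(line, pos)
            # Empty-string matches are never recognized, so skip them.
            if m and m.end() > m.start():
                hit = m
                break
        if hit is None:
            break
        found.append(hit.group(0))
        pos = hit.end()
    return found

# Two separate patterns, X first: X is found even though Y precedes it,
# and Y is never tried on the part of the line before the match.
separate = matches_in_order(["X", "Y"], "aYbXc")
# A single pattern with alternatives finds the matches left to right.
combined = matches_in_order(["X|Y"], "aYbXc")
print(separate)
print(combined)
```

In this model the two separate patterns report only "X", whereas the single
pattern with alternatives reports "Y" and then "X".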
| 89 | .P |
| 90 | Patterns that can match an empty string are accepted, but empty string |
| 91 | matches are never recognized. An example is the pattern "(super)?(man)?", in |
| 92 | which all components are optional. This pattern finds all occurrences of both |
| 93 | "super" and "man"; the output differs from matching with "super|man" when only |
| 94 | the matching substrings are being shown. |
| 95 | .P |
| 96 | If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set, |
| 97 | \fBpcre2grep\fP uses the value to set a locale when calling the PCRE2 library. |
| 98 | The \fB--locale\fP option can be used to override this. |
| 99 | . |
| 100 | . |
| 101 | .SH "SUPPORT FOR COMPRESSED FILES" |
| 102 | .rs |
| 103 | .sp |
| 104 | It is possible to compile \fBpcre2grep\fP so that it uses \fBlibz\fP or |
| 105 | \fBlibbz2\fP to read compressed files whose names end in \fB.gz\fP or |
| 106 | \fB.bz2\fP, respectively. You can find out whether your \fBpcre2grep\fP binary |
| 107 | has support for one or both of these file types by running it with the |
| 108 | \fB--help\fP option. If the appropriate support is not present, all files are |
| 109 | treated as plain text. The standard input is always so treated. When input is |
| 110 | from a compressed .gz or .bz2 file, the \fB--line-buffered\fP option is |
| 111 | ignored. |
| 112 | . |
| 113 | . |
| 114 | .SH "BINARY FILES" |
| 115 | .rs |
| 116 | .sp |
| 117 | By default, a file that contains a binary zero byte within the first 1024 bytes |
| 118 | is identified as a binary file, and is processed specially. However, if the |
| 119 | newline type is specified as NUL, that is, the line terminator is a binary |
| 120 | zero, the test for a binary file is not applied. See the \fB--binary-files\fP |
| 121 | option for a means of changing the way binary files are handled. |
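.P
As a rough sketch of the default test described above (the function name and
its arguments are hypothetical, and the real check is done in C inside
\fBpcre2grep\fP):

```python
def looks_binary(data, newline_is_nul=False):
    """Return True if data would be classed as binary by the default rule."""
    # When NUL is the line terminator, the binary test is not applied.
    if newline_is_nul:
        return False
    # Otherwise, look for a zero byte within the first 1024 bytes only.
    return 0 in data[:1024]

text_then_zero = b"abc" + bytes([0])
late_zero = b"a" * 1024 + bytes([0])
print(looks_binary(text_then_zero))
print(looks_binary(late_zero))
```

A zero byte that first appears after the initial 1024 bytes does not cause the
file to be identified as binary under this rule.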
| 122 | . |
| 123 | . |
| 124 | .SH "BINARY ZEROS IN PATTERNS" |
| 125 | .rs |
| 126 | .sp |
| 127 | Patterns passed from the command line are strings that are terminated by a |
| 128 | binary zero, so cannot contain internal zeros. However, patterns that are read |
| 129 | from a file via the \fB-f\fP option may contain binary zeros. |
| 130 | . |
| 131 | . |
| 132 | .SH OPTIONS |
| 133 | .rs |
| 134 | .sp |
| 135 | The order in which some of the options appear can affect the output. For |
| 136 | example, both the \fB-H\fP and \fB-l\fP options affect the printing of file |
| 137 | names. Whichever comes later in the command line will be the one that takes |
| 138 | effect. Similarly, except where noted below, if an option is given twice, the |
| 139 | later setting is used. Numerical values for options may be followed by K or M, |
| 140 | to signify multiplication by 1024 or 1024*1024 respectively. |
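.P
The K and M suffixes can be illustrated with a minimal sketch. The function
name is hypothetical, and only the upper case suffixes documented above are
handled:

```python
def parse_option_number(text):
    """Convert a numerical option value such as '20K' or '1M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)

print(parse_option_number("20K"))
print(parse_option_number("1M"))
```

The two calls correspond to the default buffer sizes of 20KiB and 1MiB
mentioned earlier, that is, 20480 and 1048576 bytes.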
| 141 | .TP 10 |
| 142 | \fB--\fP |
| 143 | This terminates the list of options. It is useful if the next item on the |
| 144 | command line starts with a hyphen but is not an option. This allows for the |
| 145 | processing of patterns and file names that start with hyphens. |
| 146 | .TP |
| 147 | \fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP |
| 148 | Output up to \fInumber\fP lines of context after each matching line. Fewer |
| 149 | lines are output if the next match or the end of the file is reached, or if the |
| 150 | processing buffer size has been set too small. If file names and/or line |
| 151 | numbers are being output, a hyphen separator is used instead of a colon for the |
| 152 | context lines. A line containing "--" is output between each group of lines, |
| 153 | unless they are in fact contiguous in the input file. The value of \fInumber\fP |
| 154 | is expected to be relatively small. When \fB-c\fP is used, \fB-A\fP is ignored. |
| 155 | .TP |
| 156 | \fB-a\fP, \fB--text\fP |
| 157 | Treat binary files as text. This is equivalent to |
| 158 | \fB--binary-files\fP=\fItext\fP. |
| 159 | .TP |
| 160 | \fB--allow-lookaround-bsk\fP |
| 161 | PCRE2 now forbids the use of \eK in lookarounds by default, in line with Perl. |
| 162 | This option causes \fBpcre2grep\fP to set the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK |
| 163 | option, which enables this somewhat dangerous usage. |
| 164 | .TP |
| 165 | \fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP |
| 166 | Output up to \fInumber\fP lines of context before each matching line. Fewer |
| 167 | lines are output if the previous match or the start of the file is within |
| 168 | \fInumber\fP lines, or if the processing buffer size has been set too small. If |
| 169 | file names and/or line numbers are being output, a hyphen separator is used |
| 170 | instead of a colon for the context lines. A line containing "--" is output |
| 171 | between each group of lines, unless they are in fact contiguous in the input |
| 172 | file. The value of \fInumber\fP is expected to be relatively small. When |
| 173 | \fB-c\fP is used, \fB-B\fP is ignored. |
| 174 | .TP |
| 175 | \fB--binary-files=\fP\fIword\fP |
| 176 | Specify how binary files are to be processed. If the word is "binary" (the |
| 177 | default), pattern matching is performed on binary files, but the only output is |
| 178 | "Binary file <name> matches" when a match succeeds. If the word is "text", |
| 179 | which is equivalent to the \fB-a\fP or \fB--text\fP option, binary files are |
| 180 | processed in the same way as any other file. In this case, when a match |
| 181 | succeeds, the output may be binary garbage, which can have nasty effects if |
| 182 | sent to a terminal. If the word is "without-match", which is equivalent to the |
| 183 | \fB-I\fP option, binary files are not processed at all; they are assumed not to |
| 184 | be of interest and are skipped without causing any output or affecting the |
| 185 | return code. |
| 186 | .TP |
| 187 | \fB--buffer-size=\fP\fInumber\fP |
| 188 | Set the parameter that controls how much memory is obtained at the start of |
| 189 | processing for buffering files that are being scanned. See also |
| 190 | \fB--max-buffer-size\fP below. |
| 191 | .TP |
| 192 | \fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP |
| 193 | Output \fInumber\fP lines of context both before and after each matching line. |
| 194 | This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value. |
| 195 | .TP |
| 196 | \fB-c\fP, \fB--count\fP |
| 197 | Do not output lines from the files that are being scanned; instead output the |
| 198 | number of lines that would have been shown, either because they matched, or, if |
| 199 | \fB-v\fP is set, because they failed to match. By default, this count is |
| 200 | exactly the same as the number of lines that would have been output, but if the |
| 201 | \fB-M\fP (multiline) option is used (without \fB-v\fP), there may be more |
| 202 | suppressed lines than the count (that is, the number of matches). |
| 203 | .sp |
| 204 | If no lines are selected, the number zero is output. If several files are |
| 205 | being scanned, a count is output for each of them and the \fB-t\fP option can |
| 206 | be used to cause a total to be output at the end. However, if the |
| 207 | \fB--files-with-matches\fP option is also used, only those files whose counts |
| 208 | are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP, |
| 209 | \fB-B\fP, and \fB-C\fP options are ignored. |
| 210 | .TP |
| 211 | \fB--colour\fP, \fB--color\fP |
| 212 | If this option is given without any data, it is equivalent to "--colour=auto". |
| 213 | If data is required, it must be given in the same shell item, separated by an |
| 214 | equals sign. |
| 215 | .TP |
| 216 | \fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP |
| 217 | This option specifies under what circumstances the parts of a line that matched |
| 218 | a pattern should be coloured in the output. By default, the output is not |
| 219 | coloured. The value (which is optional, see above) may be "never", "always", or |
| 220 | "auto". In the last case, colouring happens only if the standard output is |
| 221 | connected to a terminal. More resources are used when colouring is enabled, |
| 222 | because \fBpcre2grep\fP has to search for all possible matches in a line, not |
| 223 | just one, in order to colour them all. |
| 224 | .sp |
| 225 | The colour that is used can be specified by setting one of the environment |
| 226 | variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or |
| 227 | PCREGREP_COLOR, which are checked in that order. If none of these are set, |
| 228 | \fBpcre2grep\fP looks for GREP_COLORS or GREP_COLOR (in that order). The value |
| 229 | of the variable should be a string of two numbers, separated by a semicolon, |
| 230 | except in the case of GREP_COLORS, which must start with "ms=" or "mt=" |
| 231 | followed by two semicolon-separated colours, terminated by the end of the |
| 232 | string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is |
| 233 | ignored, and GREP_COLOR is checked. |
| 234 | .sp |
| 235 | If the string obtained from one of the above variables contains any characters |
| 236 | other than semicolon or digits, the setting is ignored and the default colour |
| 237 | is used. The string is copied directly into the control string for setting |
| 238 | colour on a terminal, so it is your responsibility to ensure that the values |
| 239 | make sense. If no relevant environment variable is set, the default is "1;31", |
| 240 | which gives red. |
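.sp
The lookup order described above can be sketched as follows. This is a model
of the documented rules only: the function name is hypothetical, and the
environment is passed in as a dictionary so that the example is
self-contained.

```python
DEFAULT_COLOUR = "1;31"

def resolve_colour(env):
    """Pick the colour string from a dict of environment variables."""
    value = None
    # The pcre2grep-specific variables are checked first, in this order.
    for name in ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR"):
        value = env.get(name)
        if value is not None:
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # The colours end at a colon or at the end of the string.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is ignored, so fall back to GREP_COLOR.
            value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons means the default is used.
    if value is None or not all(c in "0123456789;" for c in value):
        return DEFAULT_COLOUR
    return value

print(resolve_colour({"GREP_COLORS": "ms=01;32:mc=01;31"}))
print(resolve_colour({"PCREGREP_COLOR": "red"}))
```

The first call yields "01;32"; the second falls back to the default because
the value contains characters other than digits and semicolons.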
| 241 | .TP |
| 242 | \fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP |
| 243 | If an input path is not a regular file or a directory, "action" specifies how |
| 244 | it is to be processed. Valid values are "read" (the default) or "skip" |
| 245 | (silently skip the path). |
| 246 | .TP |
| 247 | \fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP |
| 248 | If an input path is a directory, "action" specifies how it is to be processed. |
| 249 | Valid values are "read" (the default in non-Windows environments, for |
| 250 | compatibility with GNU grep), "recurse" (equivalent to the \fB-r\fP option), or |
| 251 | "skip" (silently skip the path, the default in Windows environments). In the |
| 252 | "read" case, directories are read as if they were ordinary files. In some |
| 253 | operating systems the effect of reading a directory like this is an immediate |
| 254 | end-of-file; in others it may provoke an error. |
| 255 | .TP |
| 256 | \fB--depth-limit\fP=\fInumber\fP |
| 257 | See \fB--match-limit\fP below. |
| 258 | .TP |
| 259 | \fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP |
| 260 | Specify a pattern to be matched. This option can be used multiple times in |
| 261 | order to specify several patterns. It can also be used as a way of specifying a |
| 262 | single pattern that starts with a hyphen. When \fB-e\fP is used, no argument |
| 263 | pattern is taken from the command line; all arguments are treated as file |
| 264 | names. There is no limit to the number of patterns. They are applied to each |
| 265 | line in the order in which they are defined until one matches. |
| 266 | .sp |
| 267 | If \fB-f\fP is used with \fB-e\fP, the command line patterns are matched first, |
| 268 | followed by the patterns from the file(s), independent of the order in which |
| 269 | these options are specified. Note that multiple use of \fB-e\fP is not the same |
| 270 | as a single pattern with alternatives. For example, X|Y finds the first |
| 271 | character in a line that is X or Y, whereas if the two patterns are given |
| 272 | separately, with X first, \fBpcre2grep\fP finds X if it is present, even if it |
| 273 | follows Y in the line. It finds Y only if there is no X in the line. This |
| 274 | matters only if you are using \fB-o\fP or \fB--colo(u)r\fP to show the part(s) |
| 275 | of the line that matched. |
| 276 | .TP |
| 277 | \fB--exclude\fP=\fIpattern\fP |
| 278 | Files (but not directories) whose names match the pattern are skipped without |
| 279 | being processed. This applies to all files, whether listed on the command line, |
| 280 | obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a |
| 281 | PCRE2 regular expression, and is matched against the final component of the |
| 282 | file name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do |
| 283 | not apply to this pattern. The option may be given any number of times in order |
| 284 | to specify multiple patterns. If a file name matches both an \fB--include\fP |
| 285 | and an \fB--exclude\fP pattern, it is excluded. There is no short form for this |
| 286 | option. |
| 287 | .TP |
| 288 | \fB--exclude-from=\fP\fIfilename\fP |
| 289 | Treat each non-empty line of the file as the data for an \fB--exclude\fP |
| 290 | option. What constitutes a newline when reading the file is the operating |
| 291 | system's default. The \fB--newline\fP option has no effect on this option. This |
| 292 | option may be given more than once in order to specify a number of files to |
| 293 | read. |
| 294 | .TP |
| 295 | \fB--exclude-dir\fP=\fIpattern\fP |
| 296 | Directories whose names match the pattern are skipped without being processed, |
| 297 | whatever the setting of the \fB--recursive\fP option. This applies to all |
| 298 | directories, whether listed on the command line, obtained from |
| 299 | \fB--file-list\fP, or by scanning a parent directory. The pattern is a PCRE2 |
| 300 | regular expression, and is matched against the final component of the directory |
| 301 | name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not |
| 302 | apply to this pattern. The option may be given any number of times in order to |
| 303 | specify more than one pattern. If a directory matches both \fB--include-dir\fP |
| 304 | and \fB--exclude-dir\fP, it is excluded. There is no short form for this |
| 305 | option. |
| 306 | .TP |
| 307 | \fB-F\fP, \fB--fixed-strings\fP |
| 308 | Interpret each data-matching pattern as a list of fixed strings, separated by |
| 309 | newlines, instead of as a regular expression. What constitutes a newline for |
| 310 | this purpose is controlled by the \fB--newline\fP option. The \fB-w\fP (match |
| 311 | as a word) and \fB-x\fP (match whole line) options can be used with \fB-F\fP. |
| 312 | They apply to each of the fixed strings. A line is selected if any of the fixed |
| 313 | strings are found in it (subject to \fB-w\fP or \fB-x\fP, if present). This |
| 314 | option applies only to the patterns that are matched against the contents of |
| 315 | files; it does not apply to patterns specified by any of the \fB--include\fP or |
| 316 | \fB--exclude\fP options. |
| 317 | .TP |
| 318 | \fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP |
| 319 | Read patterns from the file, one per line, and match them against each line of |
| 320 | input. As is the case with patterns on the command line, no delimiters should |
| 321 | be used. What constitutes a newline when reading the file is the operating |
| 322 | system's default interpretation of \en. The \fB--newline\fP option has no |
| 323 | effect on this option. Trailing white space is removed from each line, and |
| 324 | blank lines are ignored. An empty file contains no patterns and therefore |
| 325 | matches nothing. Patterns read from a file in this way may contain binary |
| 326 | zeros, which are treated as ordinary data characters. See also the comments |
| 327 | about multiple patterns versus a single pattern with alternatives in the |
| 328 | description of \fB-e\fP above. |
| 329 | .sp |
| 330 | If this option is given more than once, all the specified files are read. A |
| 331 | data line is output if any of the patterns match it. A file name can be given |
| 332 | as "-" to refer to the standard input. When \fB-f\fP is used, patterns |
| 333 | specified on the command line using \fB-e\fP may also be present; they are |
| 334 | tested before the file's patterns. However, no other pattern is taken from the |
| 335 | command line; all arguments are treated as the names of paths to be searched. |
| 336 | .TP |
| 337 | \fB--file-list\fP=\fIfilename\fP |
| 338 | Read a list of files and/or directories that are to be scanned from the given |
| 339 | file, one per line. What constitutes a newline when reading the file is the |
| 340 | operating system's default. Trailing white space is removed from each line, and |
| 341 | blank lines are ignored. These paths are processed before any that are listed |
| 342 | on the command line. The file name can be given as "-" to refer to the standard |
| 343 | input. If \fB--file\fP and \fB--file-list\fP are both specified as "-", |
| 344 | patterns are read first. This is useful only when the standard input is a |
| 345 | terminal, from which further lines (the list of files) can be read after an |
| 346 | end-of-file indication. If this option is given more than once, all the |
| 347 | specified files are read. |
| 348 | .TP |
| 349 | \fB--file-offsets\fP |
| 350 | Instead of showing lines or parts of lines that match, show each match as an |
| 351 | offset from the start of the file and a length, separated by a comma. In this |
| 352 | mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP |
| 353 | options are ignored. If there is more than one match in a line, each of them is |
| 354 | shown separately. This option is mutually exclusive with \fB--output\fP, |
| 355 | \fB--line-offsets\fP, and \fB--only-matching\fP. |
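.sp
As an illustration of the offset and length pairs, the following sketch
computes them for a small in-memory sample. It assumes single-byte characters
and LF line endings, uses Python's \fBre\fP module rather than PCRE2, and the
function name is hypothetical.

```python
import re

NEWLINE = chr(10)  # line terminator assumed to be LF

def file_offsets(pattern, text):
    """Return 'offset,length' strings for every match, scanning line by line."""
    rx = re.compile(pattern)
    results = []
    line_start = 0
    for line in text.split(NEWLINE):
        for m in rx.finditer(line):
            if m.end() > m.start():
                # The offset is counted from the start of the whole input.
                results.append("%d,%d" % (line_start + m.start(),
                                          m.end() - m.start()))
        line_start += len(line) + 1  # step over the line and its terminator
    return results

sample = NEWLINE.join(["abc", "xabcxabc", ""])
print(file_offsets("abc", sample))
```

The second line of the sample contains two matches, and each one is reported
separately.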
| 356 | .TP |
| 357 | \fB-H\fP, \fB--with-filename\fP |
| 358 | Force the inclusion of the file name at the start of output lines when |
| 359 | searching a single file. By default, the file name is not shown in this case. |
| 360 | For matching lines, the file name is followed by a colon; for context lines, a |
| 361 | hyphen separator is used. If a line number is also being output, it follows the |
| 362 | file name. When the \fB-M\fP option causes a pattern to match more than one |
| 363 | line, only the first is preceded by the file name. This option overrides any |
| 364 | previous \fB-h\fP, \fB-l\fP, or \fB-L\fP options. |
| 365 | .TP |
| 366 | \fB-h\fP, \fB--no-filename\fP |
| 367 | Suppress the output of file names when searching multiple files. By default, |
| 368 | file names are shown when multiple files are searched. For matching lines, the |
| 369 | file name is followed by a colon; for context lines, a hyphen separator is used. |
| 370 | If a line number is also being output, it follows the file name. This option |
| 371 | overrides any previous \fB-H\fP, \fB-L\fP, or \fB-l\fP options. |
| 372 | .TP |
| 373 | \fB--heap-limit\fP=\fInumber\fP |
| 374 | See \fB--match-limit\fP below. |
| 375 | .TP |
| 376 | \fB--help\fP |
| 377 | Output a help message, giving brief details of the command options and file |
| 378 | type support, and then exit. Anything else on the command line is |
| 379 | ignored. |
| 380 | .TP |
| 381 | \fB-I\fP |
| 382 | Ignore binary files. This is equivalent to |
| 383 | \fB--binary-files\fP=\fIwithout-match\fP. |
| 384 | .TP |
| 385 | \fB-i\fP, \fB--ignore-case\fP |
| 386 | Ignore upper/lower case distinctions during comparisons. |
| 387 | .TP |
| 388 | \fB--include\fP=\fIpattern\fP |
| 389 | If any \fB--include\fP patterns are specified, the only files that are |
| 390 | processed are those whose names match one of the patterns and do not match an |
| 391 | \fB--exclude\fP pattern. This option does not affect directories, but it |
| 392 | applies to all files, whether listed on the command line, obtained from |
| 393 | \fB--file-list\fP, or by scanning a directory. The pattern is a PCRE2 regular |
| 394 | expression, and is matched against the final component of the file name, not |
| 395 | the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not apply to |
| 396 | this pattern. The option may be given any number of times. If a file name |
| 397 | matches both an \fB--include\fP and an \fB--exclude\fP pattern, it is excluded. |
| 398 | There is no short form for this option. |
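.sp
The interaction between \fB--include\fP and \fB--exclude\fP can be sketched as
below. This is a model of the rules stated above, not the actual
implementation: the function name is hypothetical, and Python's \fBre\fP
module stands in for PCRE2.

```python
import os
import re

def should_process(path, includes, excludes):
    """Decide whether a file is processed, given include/exclude patterns."""
    # Patterns are matched against the final path component only.
    name = os.path.basename(path)
    # A name that matches any exclude pattern is always skipped.
    if any(re.search(p, name) for p in excludes):
        return False
    # If include patterns exist, the name must match at least one of them.
    if includes:
        return any(re.search(p, name) for p in includes)
    return True

print(should_process("src/main.c", ["[.]c$"], []))
print(should_process("src/test_main.c", ["[.]c$"], ["^test_"]))
```

The second file matches both an include and an exclude pattern, so it is
excluded.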
| 399 | .TP |
| 400 | \fB--include-from=\fP\fIfilename\fP |
| 401 | Treat each non-empty line of the file as the data for an \fB--include\fP |
| 402 | option. What constitutes a newline for this purpose is the operating system's |
| 403 | default. The \fB--newline\fP option has no effect on this option. This option |
| 404 | may be given any number of times; all the files are read. |
| 405 | .TP |
| 406 | \fB--include-dir\fP=\fIpattern\fP |
| 407 | If any \fB--include-dir\fP patterns are specified, the only directories that |
| 408 | are processed are those whose names match one of the patterns and do not match |
| 409 | an \fB--exclude-dir\fP pattern. This applies to all directories, whether listed |
| 410 | on the command line, obtained from \fB--file-list\fP, or by scanning a parent |
| 411 | directory. The pattern is a PCRE2 regular expression, and is matched against |
| 412 | the final component of the directory name, not the entire path. The \fB-F\fP, |
| 413 | \fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be |
| 414 | given any number of times. If a directory matches both \fB--include-dir\fP and |
| 415 | \fB--exclude-dir\fP, it is excluded. There is no short form for this option. |
| 416 | .TP |
| 417 | \fB-L\fP, \fB--files-without-match\fP |
| 418 | Instead of outputting lines from the files, just output the names of the files |
| 419 | that do not contain any lines that would have been output. Each file name is |
| 420 | output once, on a separate line. This option overrides any previous \fB-H\fP, |
| 421 | \fB-h\fP, or \fB-l\fP options. |
| 422 | .TP |
| 423 | \fB-l\fP, \fB--files-with-matches\fP |
| 424 | Instead of outputting lines from the files, just output the names of the files |
| 425 | containing lines that would have been output. Each file name is output once, on |
| 426 | a separate line. Searching normally stops as soon as a matching line is found |
| 427 | in a file. However, if the \fB-c\fP (count) option is also used, matching |
| 428 | continues in order to obtain the correct count, and those files that have at |
| 429 | least one match are listed along with their counts. Using this option with |
| 430 | \fB-c\fP is a way of suppressing the listing of files with no matches that |
| 431 | occurs with \fB-c\fP on its own. This option overrides any previous \fB-H\fP, |
| 432 | \fB-h\fP, or \fB-L\fP options. |
| 433 | .TP |
| 434 | \fB--label\fP=\fIname\fP |
| 435 | This option supplies a name to be used for the standard input when file names |
| 436 | are being output. If not supplied, "(standard input)" is used. There is no |
| 437 | short form for this option. |
| 438 | .TP |
| 439 | \fB--line-buffered\fP |
| 440 | When this option is given, non-compressed input is read and processed line by |
| 441 | line, and the output is flushed after each write. By default, input is read in |
| 442 | large chunks, unless \fBpcre2grep\fP can determine that it is reading from a |
| 443 | terminal, which is currently possible only in Unix-like environments or |
| 444 | Windows. Output to a terminal is normally automatically flushed by the operating |
| 445 | system. This option can be useful when the input or output is attached to a |
| 446 | pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data. |
| 447 | However, its use will affect performance, and the \fB-M\fP (multiline) option |
| 448 | ceases to work. When input is from a compressed .gz or .bz2 file, |
| 449 | \fB--line-buffered\fP is ignored. |
| 450 | .TP |
| 451 | \fB--line-offsets\fP |
| 452 | Instead of showing lines or parts of lines that match, show each match as a |
| 453 | line number, the offset from the start of the line, and a length. The line |
| 454 | number is terminated by a colon (as usual; see the \fB-n\fP option), and the |
| 455 | offset and length are separated by a comma. In this mode, no context is shown. |
| 456 | That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are ignored. If there is |
| 457 | more than one match in a line, each of them is shown separately. This option is |
| 458 | mutually exclusive with \fB--output\fP, \fB--file-offsets\fP, and |
| 459 | \fB--only-matching\fP. |
| 460 | .TP |
| 461 | \fB--locale\fP=\fIlocale-name\fP |
| 462 | This option specifies a locale to be used for pattern matching. It overrides |
| 463 | the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no |
| 464 | locale is specified, the PCRE2 library's default (usually the "C" locale) is |
| 465 | used. There is no short form for this option. |
| 466 | .TP |
| 467 | \fB-M\fP, \fB--multiline\fP |
| 468 | Allow patterns to match more than one line. When this option is set, the PCRE2 |
| 469 | library is called in "multiline" mode. This allows a matched string to extend |
| 470 | past the end of a line and continue on one or more subsequent lines. Patterns |
| 471 | used with \fB-M\fP may usefully contain literal newline characters and internal |
| 472 | occurrences of ^ and $ characters. The output for a successful match may |
| 473 | consist of more than one line. The first line is the line in which the match |
| 474 | started, and the last line is the line in which the match ended. If the matched |
| 475 | string ends with a newline sequence, the output ends at the end of that line. |
| 476 | If \fB-v\fP is set, none of the lines in a multi-line match are output. Once a |
| 477 | match has been handled, scanning restarts at the beginning of the line after |
| 478 | the one in which the match ended. |
| 479 | .sp |
| 480 | The newline sequence that separates multiple lines must be matched as part of |
| 481 | the pattern. For example, to find the phrase "regular expression" in a file |
| 482 | where "regular" might be at the end of a line and "expression" at the start of |
| 483 | the next line, you could use this command: |
| 484 | .sp |
| 485 | pcre2grep -M 'regular\es+expression' <file> |
| 486 | .sp |
| 487 | The \es escape sequence matches any white space character, including newlines, |
| 488 | and is followed by + so as to match trailing white space on the first line as |
| 489 | well as possibly handling a two-character newline sequence. |
| 490 | .sp |
| 491 | There is a limit to the number of lines that can be matched, imposed by the way |
| 492 | that \fBpcre2grep\fP buffers the input file as it scans it. With a sufficiently |
| 493 | large processing buffer, this should not be a problem, but the \fB-M\fP option |
| 494 | does not work when input is read line by line (see \fB--line-buffered\fP). |
| 495 | .TP |
| 496 | \fB-m\fP \fInumber\fP, \fB--max-count\fP=\fInumber\fP |
| 497 | Stop processing after finding \fInumber\fP matching lines, or non-matching |
| 498 | lines if \fB-v\fP is also set. Any trailing context lines are output after the |
| 499 | final match. In multiline mode, each multiline match counts as just one line |
| 500 | for this purpose. If this limit is reached when reading the standard input from |
| 501 | a regular file, the file is left positioned just after the last matching line. |
| 502 | If \fB-c\fP is also set, the count that is output is never greater than |
| 503 | \fInumber\fP. This option has no effect if used with \fB-L\fP, \fB-l\fP, or |
| 504 | \fB-q\fP, or when just checking for a match in a binary file. |
| 505 | .TP |
| 506 | \fB--match-limit\fP=\fInumber\fP |
| 507 | Processing some regular expression patterns may take a very long time to search |
| 508 | for all possible matching strings. Others may require a very large amount of |
| 509 | memory. There are three options that set resource limits for matching. |
| 510 | .sp |
| 511 | The \fB--match-limit\fP option provides a means of limiting computing resource |
| 512 | usage when processing patterns that are not going to match, but which have a |
| 513 | very large number of possibilities in their search trees. The classic example |
| 514 | is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a |
| 515 | counter that is incremented each time around its main processing loop. If the |
| 516 | value set by \fB--match-limit\fP is reached, an error occurs. |
| 517 | .sp |
| 518 | The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of |
| 519 | 1024 bytes), the amount of heap memory that may be used for matching. Heap |
| 520 | memory is needed only if matching the pattern requires a significant number of |
| 521 | nested backtracking points to be remembered. This parameter can be set to zero |
| 522 | to forbid the use of heap memory altogether. |
| 523 | .sp |
| 524 | The \fB--depth-limit\fP option limits the depth of nested backtracking points, |
| 525 | which indirectly limits the amount of memory that is used. The amount of memory |
| 526 | needed for each backtracking point depends on the number of capturing |
| 527 | parentheses in the pattern, so the amount of memory that is used before this |
| 528 | limit acts varies from pattern to pattern. This limit is of use only if it is |
| 529 | set smaller than \fB--match-limit\fP. |
| 530 | .sp |
| 531 | There are no short forms for these options. The default limits can be set |
| 532 | when the PCRE2 library is compiled; if they are not specified, the defaults |
| 533 | are very large and so effectively unlimited. |
| 534 | .TP |
| 535 | \fB--max-buffer-size\fP=\fInumber\fP |
| 536 | This limits the expansion of the processing buffer, whose initial size can be |
| 537 | set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no |
| 538 | smaller than the starting buffer size. |
| 539 | .TP |
| 540 | \fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP |
| 541 | Six different conventions for indicating the ends of lines in scanned files are |
| 542 | supported. For example: |
| 543 | .sp |
| 544 | pcre2grep -N CRLF 'some pattern' <file> |
| 545 | .sp |
| 546 | The newline type may be specified in upper, lower, or mixed case. If the |
| 547 | newline type is NUL, lines are separated by binary zero characters. The other |
| 548 | types are the single-character sequences CR (carriage return) and LF |
| 549 | (linefeed), the two-character sequence CRLF, an "anycrlf" type, which |
| 550 | recognizes any of the preceding three types, and an "any" type, for which any |
| 551 | Unicode line ending sequence is assumed to end a line. The Unicode sequences |
| 552 | are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed, |
| 553 | U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
| 554 | (paragraph separator, U+2029). |
| 555 | .sp |
| 556 | When the PCRE2 library is built, a default line-ending sequence is specified. |
| 557 | This is normally the standard sequence for the operating system. Unless |
| 558 | otherwise specified by this option, \fBpcre2grep\fP uses the library's default. |
| 559 | .sp |
| 560 | This option makes it possible to use \fBpcre2grep\fP to scan files that have |
| 561 | come from other environments without having to modify their line endings. If |
| 562 | the data that is being scanned does not agree with the convention set by this |
| 563 | option, \fBpcre2grep\fP may behave in strange ways. Note that this option does |
| 564 | not apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or |
| 565 | \fB--include-from\fP options, which are expected to use the operating system's |
| 566 | standard newline sequence. |
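The six newline types can be illustrated by how each one would split the same data into lines. The following is a rough Python sketch under stated assumptions: the table NEWLINE_PATTERNS and the function split_lines are hypothetical names, and the splitting is done with Python's re module, not by \fBpcre2grep\fP itself.

```python
import re

# One regular expression per newline type, following the description above.
NEWLINE_PATTERNS = {
    "NUL": r"\x00",
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is tried first so that it counts as a single line ending.
    "ANY": r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}


def split_lines(text, newline_type):
    """Split text into lines; the type name is case-insensitive."""
    parts = re.split(NEWLINE_PATTERNS[newline_type.upper()], text)
    if parts and parts[-1] == "":
        parts.pop()  # a terminator on the final line does not start a new line
    return parts


print(split_lines("a\r\nb\nc\r", "anycrlf"))  # prints ['a', 'b', 'c']
print(split_lines("a\r\nb\n", "LF"))          # prints ['a\r', 'b']
```

The second call shows the kind of mismatch the text warns about: CRLF data scanned with the LF convention leaves a stray carriage return at the end of each line.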
| 567 | .TP |
| 568 | \fB-n\fP, \fB--line-number\fP |
| 569 | Precede each output line by its line number in the file, followed by a colon |
| 570 | for matching lines or a hyphen for context lines. If the file name is also |
| 571 | being output, it precedes the line number. When the \fB-M\fP option causes a |
| 572 | pattern to match more than one line, only the first is preceded by its line |
| 573 | number. This option is forced if \fB--line-offsets\fP is used. |
| 574 | .TP |
| 575 | \fB--no-jit\fP |
| 576 | If the PCRE2 library is built with support for just-in-time compiling (which |
| 577 | speeds up matching), \fBpcre2grep\fP automatically makes use of this, unless it |
| 578 | was explicitly disabled at build time. This option can be used to disable the |
| 579 | use of JIT at run time. It is provided for testing and working round problems. |
| 580 | It should never be needed in normal use. |
| 581 | .TP |
| 582 | \fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP |
| 583 | When there is a match, instead of outputting the line that matched, output just |
| 584 | the text specified in this option, followed by an operating-system standard |
| 585 | newline. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, |
| 586 | and \fB-C\fP options are ignored. The \fB--newline\fP option has no effect on |
| 587 | this option, which is mutually exclusive with \fB--only-matching\fP, |
| 588 | \fB--file-offsets\fP, and \fB--line-offsets\fP. However, like |
| 589 | \fB--only-matching\fP, if there is more than one match in a line, each of them |
| 590 | causes a line of output. |
| 591 | .sp |
| 592 | Escape sequences starting with a dollar character may be used to insert the |
| 593 | contents of the matched part of the line and/or captured substrings into the |
| 594 | text. |
| 595 | .sp |
| 596 | $<digits> or ${<digits>} is replaced by the captured substring of the given |
| 597 | decimal number; zero substitutes the whole match. If the number is greater than |
| 598 | the number of capturing substrings, or if the capture is unset, the replacement |
| 599 | is empty. |
| 600 | .sp |
| 601 | $a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by |
| 602 | newline; $r by carriage return; $t by tab; $v by vertical tab. |
| 603 | .sp |
| 604 | $o<digits> or $o{<digits>} is replaced by the character whose code point is the |
| 605 | given octal number. In the first form, up to three octal digits are processed. |
| 606 | When more digits are needed in Unicode mode to specify a wide character, the |
| 607 | second form must be used. |
| 608 | .sp |
| 609 | $x<digits> or $x{<digits>} is replaced by the character represented by the |
| 610 | given hexadecimal number. In the first form, up to two hexadecimal digits are |
| 611 | processed. When more digits are needed in Unicode mode to specify a wide |
| 612 | character, the second form must be used. |
| 613 | .sp |
| 614 | Any other character is substituted by itself. In particular, $$ is replaced by |
| 615 | a single dollar. |
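The dollar escapes described above can be modelled with a small expander. This is a sketch in Python, not the \fBpcre2grep\fP implementation: the names expand_output and read_number are hypothetical, the list "groups" stands in for the captured substrings (index 0 is the whole match, None means unset), and raising ValueError for malformed input is an assumption made for the sketch.

```python
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
DEC, OCT, HEX = "0123456789", "01234567", "0123456789abcdefABCDEF"


def read_number(text, i, digits, limit):
    """Read a braced or bare run of digits starting at text[i]."""
    if i < len(text) and text[i] == "{":
        close = text.find("}", i)
        body = text[i + 1:close] if close != -1 else ""
        if not body or any(c not in digits for c in body):
            raise ValueError("malformed braced number")
        return body, close + 1
    j = i
    while j < len(text) and text[j] in digits and (limit is None or j - i < limit):
        j += 1
    if j == i:
        raise ValueError("expected digits")
    return text[i:j], j


def expand_output(text, groups):
    """Expand $ escapes in text using the captured substrings in groups."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "$":
            out.append(ch)
            continue
        if i >= len(text):
            raise ValueError("dollar not followed by another character")
        nxt = text[i]
        if nxt in DEC or nxt == "{":
            # $<digits> or ${<digits>}: a captured substring, empty if unset.
            num, i = read_number(text, i, DEC, None)
            n = int(num)
            out.append((groups[n] if n < len(groups) else None) or "")
        elif nxt in "ox":
            # $o takes up to three octal digits, $x up to two hex digits,
            # unless the braced form is used.
            base, digits, limit = (8, OCT, 3) if nxt == "o" else (16, HEX, 2)
            num, i = read_number(text, i + 1, digits, limit)
            out.append(chr(int(num, base)))
        else:
            # Named escapes such as $n; any other character stands for itself.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 1
    return "".join(out)


print(repr(expand_output("[$1] ${2}$n", ["abcd", "a", "bcd"])))
# prints '[a] bcd\n'
```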
| 616 | .TP |
| 617 | \fB-o\fP, \fB--only-matching\fP |
| 618 | Show only the part of the line that matched a pattern instead of the whole |
| 619 | line. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and |
| 620 | \fB-C\fP options are ignored. If there is more than one match in a line, each |
| 621 | of them is shown separately, on a separate line of output. If \fB-o\fP is |
| 622 | combined with \fB-v\fP (invert the sense of the match to find non-matching |
| 623 | lines), no output is generated, but the return code is set appropriately. If |
| 624 | the matched portion of the line is empty, nothing is output unless the file |
| 625 | name or line number are being printed, in which case they are shown on an |
| 626 | otherwise empty line. This option is mutually exclusive with \fB--output\fP, |
| 627 | \fB--file-offsets\fP and \fB--line-offsets\fP. |
| 628 | .TP |
| 629 | \fB-o\fP\fInumber\fP, \fB--only-matching\fP=\fInumber\fP |
| 630 | Show only the part of the line that matched the capturing parentheses of the |
| 631 | given number. Up to 50 capturing parentheses are supported by default. This |
| 632 | limit can be changed via the \fB--om-capture\fP option. A pattern may contain |
| 633 | any number of capturing parentheses, but only those whose number is within the |
| 634 | limit can be accessed by \fB-o\fP. An error occurs if the number specified by |
| 635 | \fB-o\fP is greater than the limit. |
| 636 | .sp |
| 637 | -o0 is the same as \fB-o\fP without a number. Because these options can be |
| 638 | given without an argument (see above), if an argument is present, it must be |
| 639 | given in the same shell item, for example, -o3 or --only-matching=2. The |
| 640 | comments given for the non-argument case above also apply to this option. If |
| 641 | the specified capturing parentheses do not exist in the pattern, or were not |
| 642 | set in the match, nothing is output unless the file name or line number are |
| 643 | being output. |
| 644 | .sp |
| 645 | If this option is given multiple times, multiple substrings are output for each |
| 646 | match, in the order the options are given, and all on one line. For example, |
| 647 | -o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and |
| 648 | then 3 again to be output. By default, there is no separator (but see the |
| 649 | \fB--om-separator\fP option below). |
| 650 | .TP |
| 651 | \fB--om-capture\fP=\fInumber\fP |
| 652 | Set the number of capturing parentheses that can be accessed by \fB-o\fP. The |
| 653 | default is 50. |
| 654 | .TP |
| 655 | \fB--om-separator\fP=\fItext\fP |
| 656 | Specify a separating string for multiple occurrences of \fB-o\fP. The default |
| 657 | is an empty string. Separating strings are never coloured. |
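The effect of repeating \fB-o\fP with capture numbers, with and without a separator, can be sketched like this. It is a Python approximation using the re module, not PCRE2; the function only_matching and its parameters are hypothetical names chosen for the illustration.

```python
import re


def only_matching(pattern, line, group_numbers, separator=""):
    """Return one output string per match, built from the requested groups."""
    results = []
    for m in re.finditer(pattern, line):
        parts = []
        for n in group_numbers:
            # A group that does not exist or is unset contributes nothing.
            value = m.group(n) if n <= m.re.groups else None
            parts.append(value or "")
        results.append(separator.join(parts))
    return results


line = "2021-08-31 and 1997-01-02"
pattern = r"(\d+)-(\d+)-(\d+)"
print(only_matching(pattern, line, [3, 1, 3]))
# prints ['31202131', '02199702']
print(only_matching(pattern, line, [3, 1, 3], separator="/"))
# prints ['31/2021/31', '02/1997/02']
```

The two calls correspond roughly to -o3 -o1 -o3 alone and to the same options combined with --om-separator=/.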
| 658 | .TP |
| 659 | \fB-q\fP, \fB--quiet\fP |
| 660 | Work quietly, that is, display nothing except error messages. The exit |
| 661 | status indicates whether or not any matches were found. |
| 662 | .TP |
| 663 | \fB-r\fP, \fB--recursive\fP |
| 664 | If any given path is a directory, recursively scan the files it contains, |
| 665 | taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a |
| 666 | directory is read as a normal file; in some operating systems this gives an |
| 667 | immediate end-of-file. This option is a shorthand for setting the \fB-d\fP |
| 668 | option to "recurse". |
| 669 | .TP |
| 670 | \fB--recursion-limit\fP=\fInumber\fP |
| 671 | This is an obsolete synonym for \fB--depth-limit\fP. See \fB--match-limit\fP |
| 672 | above for details. |
| 673 | .TP |
| 674 | \fB-s\fP, \fB--no-messages\fP |
| 675 | Suppress error messages about non-existent or unreadable files. Such files are |
| 676 | quietly skipped. However, the return code is still 2, even if matches were |
| 677 | found in other files. |
| 678 | .TP |
| 679 | \fB-t\fP, \fB--total-count\fP |
| 680 | This option is useful when scanning more than one file. If used on its own, |
| 681 | \fB-t\fP suppresses all output except for a grand total number of matching |
| 682 | lines (or non-matching lines if \fB-v\fP is used) in all the files. If \fB-t\fP |
| 683 | is used with \fB-c\fP, a grand total is output except when the previous output |
| 684 | is just one line. In other words, it is not output when just one file's count |
| 685 | is listed. If file names are being output, the grand total is preceded by |
| 686 | "TOTAL:". Otherwise, it appears as just another number. The \fB-t\fP option is |
| 687 | ignored when used with \fB-L\fP (list files without matches), because the grand |
| 688 | total would always be zero. |
| 689 | .TP |
| 690 | \fB-u\fP, \fB--utf\fP |
| 691 | Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled |
| 692 | with UTF-8 support. All patterns (including those for any \fB--exclude\fP and |
| 693 | \fB--include\fP options) and all lines that are scanned must be valid strings |
| 694 | of UTF-8 characters. If an invalid UTF-8 string is encountered, an error |
| 695 | occurs. |
| 696 | .TP |
| 697 | \fB-U\fP, \fB--utf-allow-invalid\fP |
| 698 | As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code |
| 699 | unit sequences. These can never form part of any pattern match. Patterns |
| 700 | themselves, however, must still be valid UTF-8 strings. This facility allows |
| 701 | valid UTF-8 strings to be sought within arbitrary byte sequences in executable |
| 702 | or other binary files. For more details about matching in non-valid UTF-8 |
| 703 | strings, see the |
| 704 | .\" HREF |
| 705 | \fBpcre2unicode\fP(3) |
| 706 | .\" |
| 707 | documentation. |
| 708 | .TP |
| 709 | \fB-V\fP, \fB--version\fP |
| 710 | Write the version numbers of \fBpcre2grep\fP and the PCRE2 library to the |
| 711 | standard output and then exit. Anything else on the command line is |
| 712 | ignored. |
| 713 | .TP |
| 714 | \fB-v\fP, \fB--invert-match\fP |
| 715 | Invert the sense of the match, so that lines which do \fInot\fP match any of |
| 716 | the patterns are the ones that are found. When this option is set, options such |
| 717 | as \fB--only-matching\fP and \fB--output\fP, which specify parts of a match |
| 718 | that are to be output, are ignored. |
| 719 | .TP |
| 720 | \fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP |
| 721 | Force the patterns only to match "words". That is, there must be a word |
| 722 | boundary at the start and end of each matched string. This is equivalent to |
| 723 | having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This |
| 724 | option applies only to the patterns that are matched against the contents of |
| 725 | files; it does not apply to patterns specified by any of the \fB--include\fP or |
| 726 | \fB--exclude\fP options. |
| 727 | .TP |
| 728 | \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP |
| 729 | Force the patterns to start matching only at the beginnings of lines, and in |
| 730 | addition, require them to match entire lines. In multiline mode the match may |
| 731 | be more than one line. This is equivalent to having "^(?:" at the start of each |
| 732 | pattern and ")$" at the end. This option applies only to the patterns that are |
| 733 | matched against the contents of files; it does not apply to patterns specified |
| 734 | by any of the \fB--include\fP or \fB--exclude\fP options. |
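The wrapping described for \fB-w\fP and \fB-x\fP can be sketched as two small helpers. This is an illustration in Python's re module, not PCRE2, and the function names are hypothetical. It shows why the non-capturing group matters when a pattern contains a top-level alternation.

```python
import re


def word_regex(pattern):
    """Wrap a pattern the way the -w description states."""
    return r"\b(?:" + pattern + r")\b"


def line_regex(pattern):
    """Wrap a pattern the way the -x description states."""
    return "^(?:" + pattern + ")$"


# With the group, both alternatives must match the entire line.
print(re.search(line_regex("cat|dog"), "hotdog"))      # prints None
# Without the group, "dog$" alone is enough to match.
print(re.search("^cat|dog$", "hotdog") is not None)    # prints True
# A word boundary is required on both sides of the match.
print(re.search(word_regex("cat|dog"), "dogma"))       # prints None
```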
| 735 | . |
| 736 | . |
| 737 | .SH "ENVIRONMENT VARIABLES" |
| 738 | .rs |
| 739 | .sp |
| 740 | The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that |
| 741 | order, for a locale. The first one that is set is used. This can be overridden |
| 742 | by the \fB--locale\fP option. If no locale is set, the PCRE2 library's default |
| 743 | (usually the "C" locale) is used. |
| 744 | . |
| 745 | . |
| 746 | .SH "NEWLINES" |
| 747 | .rs |
| 748 | .sp |
| 749 | The \fB-N\fP (\fB--newline\fP) option allows \fBpcre2grep\fP to scan files with |
| 750 | newline conventions that differ from the default. This option affects only the |
| 751 | way scanned files are processed. It does not affect the interpretation of files |
| 752 | specified by the \fB-f\fP, \fB--file-list\fP, \fB--exclude-from\fP, or |
| 753 | \fB--include-from\fP options. |
| 754 | .P |
| 755 | Any parts of the scanned input files that are written to the standard output |
| 756 | are copied with whatever newline sequences they have in the input. However, if |
| 757 | the final line of a file is output, and it does not end with a newline |
| 758 | sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF |
| 759 | or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a |
| 760 | single NL is used. |
| 761 | .P |
| 762 | The newline setting does not affect the way in which \fBpcre2grep\fP writes |
| 763 | newlines in informational messages to the standard output and error streams. |
| 764 | Under Windows, the standard output is set to be binary, so that "\er\en" at the |
| 765 | ends of output lines that are copied from the input is not converted to |
| 766 | "\er\er\en" by the C I/O library. This means that any messages written to the |
| 767 | standard output must end with "\er\en". For all other operating systems, and |
| 768 | for all messages to the standard error stream, "\en" is used. |
| 769 | . |
| 770 | . |
| 771 | .SH "OPTIONS COMPATIBILITY" |
| 772 | .rs |
| 773 | .sp |
| 774 | Many of the short and long forms of \fBpcre2grep\fP's options are the same |
| 775 | as in the GNU \fBgrep\fP program. Any long option of the form |
| 776 | \fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP |
| 777 | (PCRE2 terminology). However, the \fB--depth-limit\fP, \fB--file-list\fP, |
| 778 | \fB--file-offsets\fP, \fB--heap-limit\fP, \fB--include-dir\fP, |
| 779 | \fB--line-offsets\fP, \fB--locale\fP, \fB--match-limit\fP, \fB-M\fP, |
| 780 | \fB--multiline\fP, \fB-N\fP, \fB--newline\fP, \fB--om-separator\fP, |
| 781 | \fB--output\fP, \fB-u\fP, \fB--utf\fP, \fB-U\fP, and \fB--utf-allow-invalid\fP |
| 782 | options are specific to \fBpcre2grep\fP, as is the use of the |
| 783 | \fB--only-matching\fP option with a capturing parentheses number. |
| 784 | .P |
| 785 | Although most of the common options work the same way, a few are different in |
| 786 | \fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob |
| 787 | for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the |
| 788 | \fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names, |
| 789 | without counts, but \fBpcre2grep\fP gives the counts as well. |
| 790 | . |
| 791 | . |
| 792 | .SH "OPTIONS WITH DATA" |
| 793 | .rs |
| 794 | .sp |
| 795 | There are four different ways in which an option with data can be specified. |
| 796 | If a short form option is used, the data may follow immediately, or (with one |
| 797 | exception) in the next command line item. For example: |
| 798 | .sp |
| 799 | -f/some/file |
| 800 | -f /some/file |
| 801 | .sp |
| 802 | The exception is the \fB-o\fP option, which may appear with or without data. |
| 803 | Because of this, if data is present, it must follow immediately in the same |
| 804 | item, for example -o3. |
| 805 | .P |
| 806 | If a long form option is used, the data may appear in the same command line |
| 807 | item, separated by an equals character, or (with two exceptions) it may appear |
| 808 | in the next command line item. For example: |
| 809 | .sp |
| 810 | --file=/some/file |
| 811 | --file /some/file |
| 812 | .sp |
| 813 | Note, however, that if you want to supply a file name beginning with ~ as data |
| 814 | in a shell command, and have the shell expand ~ to a home directory, you must |
| 815 | separate the file name from the option, because the shell does not treat ~ |
| 816 | specially unless it is at the start of an item. |
| 817 | .P |
| 818 | The exceptions to the above are the \fB--colour\fP (or \fB--color\fP) and |
| 819 | \fB--only-matching\fP options, for which the data is optional. If one of these |
| 820 | options does have data, it must be given in the first form, using an equals |
| 821 | character. Otherwise \fBpcre2grep\fP will assume that it has no data. |
| 822 | . |
| 823 | . |
| 824 | .SH "USING PCRE2'S CALLOUT FACILITY" |
| 825 | .rs |
| 826 | .sp |
| 827 | \fBpcre2grep\fP has, by default, support for calling external programs or |
| 828 | scripts or echoing specific strings during matching by making use of PCRE2's |
| 829 | callout facility. However, this support can be completely or partially disabled |
| 830 | when \fBpcre2grep\fP is built. You can find out whether your binary has support |
| 831 | for callouts by running it with the \fB--help\fP option. If callout support is |
| 832 | completely disabled, all callouts in patterns are ignored by \fBpcre2grep\fP. |
| 833 | If the facility is partially disabled, calling external programs is not |
| 834 | supported, and callouts that request it are ignored. |
| 835 | .P |
| 836 | A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is |
| 837 | either a number or a quoted string (see the |
| 838 | .\" HREF |
| 839 | \fBpcre2callout\fP(3) |
| 840 | .\" |
| 841 | documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP; |
| 842 | only callouts with string arguments are useful. |
| 843 | . |
| 844 | . |
| 845 | .SS "Echoing a specific string" |
| 846 | .rs |
| 847 | .sp |
| 848 | Starting the callout string with a pipe character invokes an echoing facility |
| 849 | that avoids calling an external program or script. This facility is always |
| 850 | available, provided that callouts were not completely disabled when |
| 851 | \fBpcre2grep\fP was built. The rest of the callout string is processed as a |
| 852 | zero-terminated string, which means it should not contain any internal binary |
| 853 | zeros. It is written to the output, having first been passed through the same |
| 854 | escape processing as text from the \fB--output\fP (\fB-O\fP) option (see |
| 855 | above). However, $0 cannot be used to insert a matched substring because the |
| 856 | match is still in progress. Instead, the single character '0' is inserted. Any |
| 857 | syntax errors in the string (for example, a dollar not followed by another |
| 858 | character) cause the callout to be ignored. No terminator is added to the |
| 859 | output string, so if you want a newline, you must include it explicitly using |
| 860 | the escape $n. For example: |
| 861 | .sp |
| 862 | pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file> |
| 863 | .sp |
| 864 | Matching continues normally after the string is output. If you want to see only |
| 865 | the callout output but not any output from an actual match, you should end the |
| 866 | pattern with (*FAIL). |
| 867 | . |
| 868 | . |
| 869 | .SS "Calling external programs or scripts" |
| 870 | .rs |
| 871 | .sp |
| 872 | This facility can be independently disabled when \fBpcre2grep\fP is built. It |
| 873 | is supported for Windows, where a call to \fB_spawnvp()\fP is used, for VMS, |
| 874 | where \fBlib$spawn()\fP is used, and for any Unix-like environment where |
| 875 | \fBfork()\fP and \fBexecv()\fP are available. |
| 876 | .P |
| 877 | If the callout string does not start with a pipe (vertical bar) character, it |
| 878 | is parsed into a list of substrings separated by pipe characters. The first |
| 879 | substring must be an executable name, with the following substrings specifying |
| 880 | arguments: |
| 881 | .sp |
| 882 | executable_name|arg1|arg2|... |
| 883 | .sp |
| 884 | Any substring (including the executable name) may contain escape sequences |
| 885 | started by a dollar character. These are the same as for the \fB--output\fP |
| 886 | (\fB-O\fP) option documented above, except that $0 cannot insert the matched |
| 887 | string because the match is still in progress. Instead, the character '0' |
| 888 | is inserted. If you need a literal dollar or pipe character in any |
| 889 | substring, use $$ or $| respectively. Here is an example: |
| 890 | .sp |
| 891 | echo -e "abcde\en12345" | pcre2grep \e |
| 892 | '(?x)(.)(..(.)) |
| 893 | (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' - |
| 894 | .sp |
| 895 | Output: |
| 896 | .sp |
| 897 | Arg1: [a] [bcd] [d] Arg2: |a| () |
| 898 | abcde |
| 899 | Arg1: [1] [234] [4] Arg2: |1| () |
| 900 | 12345 |
| 901 | .sp |
| 902 | The parameters for the system call that is used to run the program or script |
| 903 | are zero-terminated strings. This means that binary zero characters in the |
| 904 | callout argument will cause premature termination of their substrings, and |
| 905 | therefore should not be present. Any syntax errors in the string (for example, |
| 906 | a dollar not followed by another character) cause the callout to be ignored. |
| 907 | If running the program fails for any reason (including the non-existence of the |
| 908 | executable), a local matching failure occurs and the matcher backtracks in the |
| 909 | normal way. |
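The way a callout string is divided into an executable name and arguments can be sketched as follows. This Python model is an assumption-laden illustration, not \fBpcre2grep\fP's parser: the function split_callout is a hypothetical name, it only performs the splitting step, and it leaves each dollar escape unexpanded in the returned substrings.

```python
def split_callout(callout):
    """Split a callout string on pipes, keeping $-escaped characters intact."""
    if callout.startswith("|"):
        raise ValueError("a leading pipe selects the echo facility instead")
    fields, current, i = [], [], 0
    while i < len(callout):
        ch = callout[i]
        if ch == "$":
            if i + 1 >= len(callout):
                raise ValueError("dollar not followed by another character")
            # Keep the dollar and the next character together, so that
            # $| is not mistaken for a separator.
            current.append(callout[i:i + 2])
            i += 2
        elif ch == "|":
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(split_callout("/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)"))
# prints ['/bin/echo', 'Arg1: [$1] [$2] [$3]', 'Arg2: $|${1}$| ($4)']
```

Applied to the callout string from the example above, the sketch yields the executable name followed by two arguments, with each $| kept inside its argument.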
| 910 | . |
| 911 | . |
| 912 | .SH "MATCHING ERRORS" |
| 913 | .rs |
| 914 | .sp |
| 915 | It is possible to supply a regular expression that takes a very long time to |
| 916 | fail to match certain lines. Such patterns normally involve nested indefinite |
| 917 | repeats, for example: (a+)*\ed when matched against a line of a's with no final |
| 918 | digit. The PCRE2 matching function has a resource limit that causes it to abort |
| 919 | in these circumstances. If this happens, \fBpcre2grep\fP outputs an error |
| 920 | message and the line that caused the problem to the standard error stream. If |
| 921 | there are more than 20 such errors, \fBpcre2grep\fP gives up. |
| 922 | .P |
| 923 | The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the |
| 924 | overall resource limit. There are also other limits that affect the amount of |
| 925 | memory used during matching; see the discussion of \fB--heap-limit\fP and |
| 926 | \fB--depth-limit\fP above. |
| 927 | . |
| 928 | . |
| 929 | .SH DIAGNOSTICS |
| 930 | .rs |
| 931 | .sp |
| 932 | Exit status is 0 if any matches were found, 1 if no matches were found, and 2 |
| 933 | for syntax errors, overlong lines, non-existent or inaccessible files (even if |
| 934 | matches were found in other files) or too many matching errors. Using the |
| 935 | \fB-s\fP option to suppress error messages about inaccessible files does not |
| 936 | affect the return code. |
| 937 | .P |
| 938 | When run under VMS, the return code is placed in the symbol PCRE2GREP_RC |
| 939 | because VMS does not distinguish between exit(0) and exit(1). |
| 940 | . |
| 941 | . |
| 942 | .SH "SEE ALSO" |
| 943 | .rs |
| 944 | .sp |
| 945 | \fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2callout\fP(3), |
| 946 | \fBpcre2unicode\fP(3). |
| 947 | . |
| 948 | . |
| 949 | .SH AUTHOR |
| 950 | .rs |
| 951 | .sp |
| 952 | .nf |
| 953 | Philip Hazel |
| 954 | Retired from University Computing Service |
| 955 | Cambridge, England. |
| 956 | .fi |
| 957 | . |
| 958 | . |
| 959 | .SH REVISION |
| 960 | .rs |
| 961 | .sp |
| 962 | .nf |
| 963 | Last updated: 31 August 2021 |
| 964 | Copyright (c) 1997-2021 University of Cambridge. |
| 965 | .fi |