| .TH PCRE2CONVERT 3 "28 June 2018" "PCRE2 10.32" |
| .SH NAME |
| PCRE2 - Perl-compatible regular expressions (revised API) |
| .SH "EXPERIMENTAL PATTERN CONVERSION FUNCTIONS" |
| .rs |
| .sp |
| This document describes a set of functions that can be used to convert |
| "foreign" patterns into PCRE2 regular expressions. This facility is currently |
| experimental, and may be changed in future releases. Two kinds of pattern, |
| globs and POSIX patterns, are supported. |
| . |
| . |
| .SH "THE CONVERT CONTEXT" |
| .rs |
| .sp |
| .nf |
| .B pcre2_convert_context *pcre2_convert_context_create( |
| .B " pcre2_general_context *\fIgcontext\fP);" |
| .sp |
| .B pcre2_convert_context *pcre2_convert_context_copy( |
| .B " pcre2_convert_context *\fIcvcontext\fP);" |
| .sp |
| .B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP); |
| .sp |
| .B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP, |
| .B " uint32_t \fIescape_char\fP);" |
| .sp |
| .B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP, |
| .B " uint32_t \fIseparator_char\fP);" |
| .fi |
| .sp |
| A convert context is used to hold parameters that affect the way that pattern |
| conversion works. Like all PCRE2 contexts, you need to use a context only if |
| you want to override the defaults. There are the usual create, copy, and free |
| functions. If custom memory management functions are set in a general context |
| that is passed to \fBpcre2_convert_context_create()\fP, they are used for all |
| memory management within the conversion functions. |
| .P |
| There are only two parameters in the convert context at present. Both apply |
| only to glob conversions. The escape character defaults to grave accent under |
| Windows, otherwise backslash. It can be set to zero, meaning no escape |
| character, or to any punctuation character with a code point less than 256. |
| The separator character defaults to backslash under Windows, otherwise forward |
| slash. It can be set to forward slash, backslash, or dot. |
| .P |
| The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if |
| their second argument is invalid. |
| . |
| . |
| .SH "THE CONVERSION FUNCTION" |
| .rs |
| .sp |
| .nf |
| .B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, |
| .B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP," |
| .B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);" |
| .sp |
| .B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP); |
| .fi |
| .sp |
| The first two arguments of \fBpcre2_pattern_convert()\fP define the foreign |
| pattern that is to be converted. The length may be given as |
| PCRE2_ZERO_TERMINATED. The \fBoptions\fP argument defines how the pattern is to |
| be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set. |
| PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid. |
| One or more of the glob options, or one of the following POSIX options must be |
| set to define the type of conversion that is required: |
| .sp |
| PCRE2_CONVERT_GLOB |
| PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR |
| PCRE2_CONVERT_GLOB_NO_STARSTAR |
| PCRE2_CONVERT_POSIX_BASIC |
| PCRE2_CONVERT_POSIX_EXTENDED |
| .sp |
| Details of the conversions are given below. The \fBbuffer\fP and \fBblength\fP |
| arguments define how the output is handled: |
| .P |
| If \fBbuffer\fP is NULL, the function just returns the length of the converted |
| pattern via \fBblength\fP. This is one less than the length of buffer needed, |
| because a terminating zero is always added to the output. |
| .P |
| If \fBbuffer\fP points to a NULL pointer, an output buffer is obtained using |
| the allocator in the context or \fBmalloc()\fP if no context is supplied. A |
| pointer to this buffer is placed in the variable to which \fBbuffer\fP points. |
| When no longer needed the output buffer must be freed by calling |
| \fBpcre2_converted_pattern_free()\fP. If this function is called with a NULL |
| argument, it returns immediately without doing anything. |
| .P |
| If \fBbuffer\fP points to a non-NULL pointer, \fBblength\fP must be set to the |
| actual length of the buffer provided (in code units). |
| .P |
| In all cases, after successful conversion, the variable pointed to by |
| \fBblength\fP is updated to the length actually used (in code units), excluding |
| the terminating zero that is always added. |
| .P |
| If an error occurs, the length (via \fBblength\fP) is set to the offset |
| within the input pattern where the error was detected. Only gross syntax errors |
| are caught; there are plenty of errors that will get passed on for |
| \fBpcre2_compile()\fP to discover. |
| .P |
| The return from \fBpcre2_pattern_convert()\fP is zero on success or a non-zero |
| PCRE2 error code. Note that PCRE2 error codes may be positive or negative: |
| \fBpcre2_compile()\fP uses mostly positive codes and \fBpcre2_match()\fP |
| negative ones; \fBpcre2_convert()\fP uses existing codes of both kinds. A |
| textual error message can be obtained by calling |
| \fBpcre2_get_error_message()\fP. |
| . |
| . |
| .SH "CONVERTING GLOBS" |
| .rs |
| .sp |
| Globs are used to match file names, and consequently have the concept of a |
| "path separator", which defaults to backslash under Windows and forward slash |
| otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not |
| permitted to match separator characters, but the double-star (**) feature |
| (which does match separators) is supported. |
| .P |
| PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to |
| match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with |
| the double-star feature disabled. These options may be given together. |
| . |
| . |
| .SH "CONVERTING POSIX PATTERNS" |
| .rs |
| .sp |
| POSIX defines two kinds of regular expression pattern: basic and extended. |
| These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or |
| PCRE2_CONVERT_POSIX_EXTENDED, respectively. |
| .P |
| In POSIX patterns, backslash is not special in a character class. Unmatched |
| closing parentheses are treated as literals. |
| .P |
| In basic patterns, ? + | {} and () must be escaped to be recognized |
| as metacharacters outside a character class. If the first character in the |
| pattern is * it is treated as a literal. ^ is a metacharacter only at the start |
| of a branch. |
| .P |
| In extended patterns, a backslash not in a character class always |
| makes the next character literal, whatever it is. There are no backreferences. |
| .P |
| Note: POSIX mandates that the longest possible match at the first matching |
| position must be found. This is not what \fBpcre2_match()\fP does; it yields |
| the first match that is found. An application can use \fBpcre2_dfa_match()\fP |
| to find the longest match, but that does not support backreferences (but then |
| neither do POSIX extended patterns). |
| . |
| . |
| .SH AUTHOR |
| .rs |
| .sp |
| .nf |
| Philip Hazel |
| University Computing Service |
| Cambridge, England. |
| .fi |
| . |
| . |
| .SH REVISION |
| .rs |
| .sp |
| .nf |
| Last updated: 28 June 2018 |
| Copyright (c) 1997-2018 University of Cambridge. |
| .fi |