.TH PCRE2SERIALIZE 3 "27 June 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS"
.rs
.sp
.nf
.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
.B "  int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
.B "  pcre2_general_context *\fIgcontext\fP);"
.sp
.B int32_t pcre2_serialize_encode(const pcre2_code **\fIcodes\fP,
.B "  int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
.B "  PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
.sp
.B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
.sp
.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP);
.fi
.sp
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run. However,
if you are using the just-in-time optimization feature, it is not possible to
save and reload the JIT data, because it is position-dependent. The host on
which the patterns are reloaded must be running the same version of PCRE2, with
the same code unit width, and must also have the same endianness, pointer width
and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
reloaded using the 8-bit library.
.P
Note that "serialization" in PCRE2 does not convert compiled patterns to an
abstract format like Java or .NET serialization. The serialized output is
really just a bytecode dump, which is why it can only be reloaded in the same
environment as the one that created it. Hence the restrictions mentioned above.
Applications that are not statically linked with a fixed version of PCRE2 must
be prepared to recompile patterns from their sources, in order to be immune to
PCRE2 upgrades.
.
.
.SH "SECURITY CONCERNS"
.rs
.sp
The facility for saving and restoring compiled patterns is intended for use
within individual applications. As such, the data supplied to
\fBpcre2_serialize_decode()\fP is expected to be trusted data, not data from
arbitrary external sources. There is only some simple consistency checking, not
complete validation of what is being re-loaded. Corrupted data may cause
undefined results. For example, if the length field of a pattern in the
serialized data is corrupted, the deserializing code may read beyond the end of
the byte stream that is passed to it.
.
.
.SH "SAVING COMPILED PATTERNS"
.rs
.sp
Before compiled patterns can be saved they must be serialized, which in PCRE2
means converting the pattern to a stream of bytes. A single byte stream may
contain any number of compiled patterns, but they must all use the same
character tables. A single copy of the tables is included in the byte stream
(its size is 1088 bytes). For more details of character tables, see the
.\" HTML <a href="pcre2api.html#localesupport">
.\" </a>
section on locale support
.\"
in the
.\" HREF
\fBpcre2api\fP
.\"
documentation.
.P
The function \fBpcre2_serialize_encode()\fP creates a serialized byte stream
from a list of compiled patterns. Its first two arguments specify the list,
being a pointer to a vector of pointers to compiled patterns, and the length of
the vector. The third and fourth arguments point to variables which are set to
point to the created byte stream and its length, respectively. The final
argument is a pointer to a general context, which can be used to specify custom
memory management functions. If this argument is NULL, \fBmalloc()\fP is used
to obtain memory for the byte stream. The yield of the function is the number
of serialized patterns, or one of the following negative error codes:
.sp
  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
  PCRE2_ERROR_NOMEMORY     memory allocation failed
  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
.sp
PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or
that a slot in the vector does not point to a compiled pattern.
.P
Once a set of patterns has been serialized you can save the data in any
appropriate manner. Here is sample code that compiles two patterns and writes
them to a file. It assumes that the variable \fIfd\fP refers to a file that is
open for output. The error checking that should be present in a real
application has been omitted for simplicity.
.sp
  int errorcode;
  uint8_t *bytes;
  PCRE2_SIZE erroroffset;
  PCRE2_SIZE bytescount;
  pcre2_code *list_of_codes[2];
  list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
  errorcode = pcre2_serialize_encode((const pcre2_code **)list_of_codes,
    2, &bytes, &bytescount, NULL);
  errorcode = fwrite(bytes, 1, bytescount, fd);
.sp
Note that the serialized data is binary data that may contain any of the 256
possible byte values. On systems that make a distinction between binary and
non-binary data, be sure that the file is opened for binary output.
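.P
A minimal sketch of writing the stream in binary mode follows. It uses only the
standard C library. The helper name \fIsave_bytes\fP and its return convention
are assumptions made for this example.
.sp
.nf
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write size bytes to the file named by path.
   Returns 0 on success and -1 on any error. */
static int save_bytes(const char *path, const uint8_t *bytes, size_t size)
{
  FILE *f;
  size_t written;

  assert(path != NULL && bytes != NULL);
  f = fopen(path, "wb");          /* "b" requests binary mode */
  if (f == NULL) return -1;
  written = fwrite(bytes, 1, size, f);

  /* fclose() flushes buffered data, so its result matters too. */
  if (fclose(f) != 0 || written != size) return -1;
  return 0;
}
```
.fi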
.P
Serializing a set of patterns leaves the original data untouched, so they can
still be used for matching. Their memory must eventually be freed in the usual
way by calling \fBpcre2_code_free()\fP. When you have finished with the byte
stream, it too must be freed by calling \fBpcre2_serialize_free()\fP. If this
function is called with a NULL argument, it returns immediately without doing
anything.
.
.
.SH "RE-USING PRECOMPILED PATTERNS"
.rs
.sp
In order to re-use a set of saved patterns you must first make the serialized
byte stream available in main memory (for example, by reading from a file). The
management of this memory block is up to the application. You can use the
\fBpcre2_serialize_get_number_of_codes()\fP function to find out how many
compiled patterns are in the serialized data without actually decoding the
patterns:
.sp
  uint8_t *bytes = <serialized data>;
  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
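.P
One possible way of getting the byte stream into memory is sketched here,
using only the standard C library. The helper name \fIload_bytes\fP is
hypothetical, and the sketch assumes a regular file whose length fits in a
long integer.
.sp
.nf
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc() block.
   Returns NULL on failure; the caller releases the block with free(). */
static uint8_t *load_bytes(const char *path, size_t *size)
{
  FILE *f;
  long length = 0;
  uint8_t *bytes = NULL;

  assert(path != NULL && size != NULL);
  f = fopen(path, "rb");          /* "b" requests binary mode */
  if (f == NULL) return NULL;

  /* Find the file length by seeking to the end and back. */
  if (fseek(f, 0, SEEK_END) == 0 && (length = ftell(f)) > 0 &&
      fseek(f, 0, SEEK_SET) == 0)
    {
    bytes = malloc((size_t)length);
    if (bytes != NULL &&
        fread(bytes, 1, (size_t)length, f) != (size_t)length)
      {
      free(bytes);
      bytes = NULL;
      }
    }
  fclose(f);
  if (bytes != NULL) *size = (size_t)length;
  return bytes;
}
```
.fi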
.sp
The \fBpcre2_serialize_decode()\fP function reads a byte stream and recreates
the compiled patterns in new memory blocks, setting pointers to them in a
vector. The first two arguments are a pointer to a suitable vector and its
length, and the third argument points to a byte stream. The final argument is a
pointer to a general context, which can be used to specify custom memory
management functions for the decoded patterns. If this argument is NULL,
\fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte
stream is no longer needed and can be discarded.
.sp
  pcre2_code *list_of_codes[2];
  uint8_t *bytes = <serialized data>;
  int32_t number_of_codes =
    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
.sp
If the vector is not large enough for all the patterns in the byte stream, it
is filled with those that fit, and the remainder are ignored. The yield of the
function is the number of decoded patterns, or one of the following negative
error codes:
.sp
  PCRE2_ERROR_BADDATA    second argument is zero or less
  PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
  PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
  PCRE2_ERROR_NOMEMORY   memory allocation failed
  PCRE2_ERROR_NULL       first or third argument is NULL
.sp
PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
on a system with different endianness.
.P
Decoded patterns can be used for matching in the usual way, and must be freed
by calling \fBpcre2_code_free()\fP. However, be aware that there is a potential
race issue if you are using multiple patterns that were decoded from a single
byte stream in a multithreaded application. A single copy of the character
tables is used by all the decoded patterns and a reference count is used to
arrange for its memory to be automatically freed when the last pattern is
freed, but there is no locking on this reference count. Therefore, if you want
to call \fBpcre2_code_free()\fP for these patterns in different threads, you
must arrange your own locking, and ensure that \fBpcre2_code_free()\fP cannot
be called by two threads at the same time.
.P
If a pattern was processed by \fBpcre2_jit_compile()\fP before being
serialized, the JIT data is discarded and so is no longer available after a
save/restore cycle. You can, however, process a restored pattern with
\fBpcre2_jit_compile()\fP if you wish.
.
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 27 June 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi