Change Log for PCRE2
--------------------

Version 10.38 01-October-2021
-----------------------------

1. Fix invalid single character repetition issues in JIT when the repetition
is inside a capturing bracket and the bracket is preceded by character
literals.

2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
This extends the CMake build system to build both static and shared libraries
in one go, builds the static library with PIC, and exposes PCRE2 libraries
using the CMake config files. JWB provided these notes:

- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.

- Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC
  variable. Added PCRE2_STATIC variable to the static build using the
  target_compile_definitions() function.

- Extended the CMake config files.

  - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
    the static and shared libraries.

  - Added the PCRE_STATIC variable to the target compile definitions for the
    import of the static library.

Building static and shared libraries using MSVC results in a name clash of
the libraries. Both static and shared library builds create, for example, the
file pcre2-8.lib. Therefore, I decided to change the static library names by
adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]

3. Increased the minimum release number for CMake to 3.0.0 because anything
older than 2.8.12 is deprecated (it was set to 2.8.5) and causes warnings. Even
3.0.0 is quite old; it was released in 2014.

4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
detecting symlink loops. This is dependent on the availability of realpath(),
which is now tested for in ./configure and CMakeLists.txt.

5. Implemented a modified version of Thomas Tempelmann's patch for faster
case-independent "first code unit" searches for unanchored patterns in 8-bit
mode in the interpreters. Instead of just remembering whether one case matched
or not, it remembers the position of a previous match so as to avoid
unnecessary repeated searching.

6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
However, just in case anybody was relying on the old behaviour, there is an
option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
An option has also been added to pcre2grep to enable this.

7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.

8. There is a loop counter to catch excessively crazy patterns when checking
the lengths of lookbehinds at compile time. This was incorrectly getting reset
whenever a lookahead was processed, leading to some fuzzer-generated patterns
taking a very long time to compile when (?|) was present in the pattern,
because (?|) disables caching of group lengths.


Version 10.37 26-May-2021
-------------------------

1. Change RunGrepTest to use tr instead of sed when testing with binary
zero bytes, because sed varies a lot from system to system and has problems
with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
it broke it for at least one version of Solaris, where tr can't handle binary
zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
RunGrepTest now checks for that command and uses it if found.

2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
with a NULL dereference. I don't think this case could ever occur in practice,
but I have put in a check in order to get rid of the compiler error.

3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
Windows. Patch from email@cs-ware.de fixes bugzilla #2688.

4. Two bugs related to over-large numbers have been fixed so the behaviour is
now the same as Perl.

    (a) A pattern such as /\214748364/ gave an overflow error instead of being
    treated as the octal number \214 followed by literal digits.

    (b) A sequence such as {65536 that has no terminating } so is not a
    quantifier was nevertheless complaining that a quantifier number was too big.

5. A run of autoconf suggested that configure.ac was out-of-date with respect
to the latest autoconf. Running autoupdate made some valid changes, some valid
suggestions, and also some invalid changes, which were fixed by hand. Autoconf
now runs clean and the resulting "configure" seems to work, so I hope nothing
is broken. Later: the requirement for autoconf 2.70 broke some automatic test
robots. It doesn't seem to be necessary: trying a reduction to 2.60.

6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
the answer "bac", whereas Perl and JIT both yield "c". This was because the
effect of \K was not propagating back from the full pattern recursion. Other
recursions such as /(a\K.(?1)*)/ did not have this problem.

7. Restore single character repetition optimization in JIT. Currently fewer
character repetitions are optimized than in 10.34.

8. When the names of the functions in the POSIX wrapper were changed to
pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
names were left in the library so that pre-compiled programs would still work.
However, this has proved troublesome when programs link with several libraries,
some of which use PCRE2 via the POSIX interface while others use a native POSIX
library. For this reason, the POSIX function names are removed in this release.
The macros in pcre2posix.h should ensure that re-compiling fixes any programs
that haven't been compiled since before 10.33.


Version 10.36 04-December-2020
------------------------------

1. Add CET_CFLAGS so that, when Intel CET is enabled, -mshstk is passed to the
compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
invented by PH.

2. Fix infinite loop when a single byte newline is searched in JIT when
invalid utf8 mode is enabled.

3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):

  - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
    lib. This allows differentiation between lib and lib64.
    CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
    pkgconfig file generation.

  - Add the version of PCRE2 to the configuration summary like ./configure
    does.

  - Fix typo: MACTHED_STRING->MATCHED_STRING

4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
#2588):

  - Add escaped double quotes around include directory in CMakeLists.txt to
    allow spaces in directory names.

  - This fixes a cmake error, if the path of the pcre2 source contains a space.

5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
Moreover, these functions come from specific header files, which need to be
specified (and, thankfully, are the same on both the Linux and WinXX
platforms.)

6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.

7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
debug Windows builds using CMake. This also updated configure so that it
generates *.pc files and pcre2-config with the same content, as in the past.

8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
single digit, the code unit beyond d was being read (i.e. there was a read
buffer overflow). Fixes ClusterFuzz 23779.

9. After the rework in r1235, certain character ranges were incorrectly
handled by an optimization in JIT. Furthermore a wrong offset was used to
read a value from a buffer which could lead to memory overread.

10. Unnoticed for many years was the fact that delimiters other than / in the
testinput1 and testinput4 files could cause incorrect behaviour when these
files were processed by perltest.sh. There were several tests that used quotes
as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
All the patterns in testinput1 and testinput4 now use / as their delimiter.
This fixes Bugzilla #2641.

11. Perl has started to give an error for \K within lookarounds (though there
are cases where it doesn't). PCRE2 still allows this, so the tests that include
this case have been moved from test 1 to test 2.

12. Further to 10 above, pcre2test has been updated to detect and grumble if a
delimiter other than / is used after #perltest.

13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
the start of a match was not resetting correctly after a failed match on the
first valid fragment of the subject, possibly causing incorrect "no match"
returns on subsequent fragments. For example, the pattern /A/ failed to match
the subject \xe5A. Fixes Bugzilla #2642.

14. Fixed a bug in character set matching when JIT is enabled and both unicode
scripts and unicode classes are present at the same time.

15. Added GNU grep's -m (aka --max-count) option to pcre2grep.

16. Refactored substitution processing in pcre2grep strings, both for the -O
option and when dealing with callouts. There is now a single function that
handles $ expansion in all cases (instead of multiple copies of almost
identical code). This means that the same escape sequences are available
everywhere, which was not previously the case. At the same time, the escape
sequences $x{...} and $o{...} have been introduced, to allow for characters
whose code points are greater than 255 in Unicode mode.

17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
test for a version of sed that can handle binary zero, instead of assuming that
any Linux version will work. Later: replaced $(...) by `...` because not all
shells recognize the former.

18. Fixed a word boundary check bug in JIT when partial matching is enabled.

19. Fix ARM64 compilation warning in JIT. Patch by Carlo.

20. A bug in the RunTest script meant that if the first part of test 2 failed,
the failure was not reported.

21. Test 2 was failing when run from a directory other than the source
directory. This failure was previously missed in RunTest because of 20 above.
Fixes added to both RunTest and RunTest.bat.

22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
Windows.


Version 10.35 09-May-2020
-------------------------

1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.

2. Fix ARMv5 JIT improper handling of labels right after a constant pool.

3. A JIT bug is fixed which allowed the fields of the compiled pattern to be
read before its existence was checked.

4. Back in the PCRE1 day, capturing groups that contained recursive back
references to themselves were made atomic (version 8.01, change 18) because
after the end of a repeated group, the captured substrings had their values
from the final repetition, not from an earlier repetition that might be the
destination of a backtrack. This feature was documented, and was carried over
into PCRE2. However, it has now been realized that the major refactoring that
was done for 10.30 has made this atomicizing unnecessary, and it is confusing
when users are unaware of it, making some patterns appear not to be working as
expected. Capture values of recursive back references in repeated groups are
now correctly backtracked, so this unnecessary restriction has been removed.

5. Added PCRE2_SUBSTITUTE_LITERAL.

6. Avoid some VS compiler warnings.

7. Added PCRE2_SUBSTITUTE_MATCHED.

8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.

10. Fix *THEN verbs in lookahead assertions in JIT.

11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

12. The JIT stack should be freed when the low-level stack allocation fails.

13. In pcre2grep, if the final line in a scanned file is output but does not
end with a newline sequence, add a newline according to the --newline setting.

14. (?(DEFINE)...) groups were not being handled correctly when checking for
the fixed length of a lookbehind assertion. Such a group within a lookbehind
should be skipped, as it does not contribute to the length of the group.
Instead, the (DEFINE) group was being processed, and if at the end of the
lookbehind, that end was not correctly recognized. Errors such as "lookbehind
assertion is not fixed length" and also "internal error: bad code value in
parsed_skip()" could result.

15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
nested groups for starting code units, in order to avoid stack overflow issues.
If the limit is reached, it just gives up trying for this optimization.

16. The control verb chain list must always be restored when exiting from a
recurse function in JIT.

17. Fix a crash which occurs when the character type of an invalid UTF
character is decoded in JIT.

18. Changes in many areas of the code so that when Unicode is supported and
PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
upper/lower case computations on characters whose code points are greater than
127.

19. The function for checking UTF-16 validity was returning an incorrect offset
for the start of the error when a high surrogate was not followed by a valid
low surrogate. This caused incorrect behaviour, for example when
PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
invalid high surrogate, such as /aa/ matching "\x{d800}aa".
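
As an illustration of the offset discussed in item 19, here is a minimal
sketch of a UTF-16 validity check that reports the index of the unpaired high
surrogate itself rather than the code unit after it. It is not PCRE2's code:
the function name check_utf16() and its return convention are assumptions made
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE2 API. Returns 0 if the buffer is
   valid UTF-16, otherwise -1 with *erroroffset set to the index of the code
   unit at which the invalid sequence starts. */
static int check_utf16(const uint16_t *s, size_t len, size_t *erroroffset)
{
  assert(s != NULL && erroroffset != NULL);
  for (size_t i = 0; i < len; i++)
    {
    uint16_t c = s[i];
    if (c < 0xd800 || c > 0xdfff) continue;   /* Not a surrogate */
    if (c >= 0xdc00)                          /* Isolated low surrogate */
      {
      *erroroffset = i;
      return -1;
      }
    /* c is a high surrogate: a low surrogate must follow it. */
    if (i + 1 >= len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
      {
      *erroroffset = i;   /* Start of the error, not the unit after it */
      return -1;
      }
    i++;                  /* Skip the low surrogate of a valid pair */
    }
  return 0;
}
```

For the three code units 0xd800, 'a', 'a' this sketch reports offset 0, the
position of the unpaired high surrogate.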

20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
could be mis-compiled and therefore not match correctly. This is the example
that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
match "word" because the "move back" value was set to zero.

21. Following a request from a user, some extensions and tidies to the
character tables handling have been done:

    (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
    not installed for public use.

    (b) There is now a -b option for pcre2_dftables, which causes the tables to
    be written in binary. There is also a -help option.

    (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
    application that wants to save tables in binary knows how long they are.

22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
LIST(APPEND...) to allow a setting from the command line to be included.

23. Updated to Unicode 13.0.0.

24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.

25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
warning.

26. Added tests for __attribute__((uninitialized)) to both the configure and
CMake build files, and then applied this attribute to the variable called
stack_frames_vector[] in pcre2_match(). When implemented, this disables
automatic initialization (a facility in clang), which can take time on big
variables.

27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
MACHO_*_VERSIONS settings for CMake builds.

28. Another patch to CMakeLists.txt to check for mkostemp (configure already
does). Patch by Carlo Marcelo Arenas Belon.

29. Check for the existence of memfd_create in both CMake and configure
configurations. Patch by Carlo Marcelo Arenas Belon.

30. Restrict the configuration setting for the SELinux compatible execmem
allocator (change 10.30/44) to Linux and NetBSD.


Version 10.34 21-November-2019
------------------------------

1. The maximum number of capturing subpatterns is 65535 (documented), but no
check on this was ever implemented. This omission has been rectified; it fixes
ClusterFuzz 14376.

2. Improved the invalid utf32 support of the JIT compiler. Now it correctly
detects invalid characters in the 0xd800-0xdfff range.

3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.

4. Add support for matching in invalid UTF strings to the pcre2_match()
interpreter, and integrate with the existing JIT support via the new
PCRE2_MATCH_INVALID_UTF compile-time option.

5. Give more error detail for invalid UTF-8 when detected in pcre2grep.

6. Add support for invalid UTF-8 to pcre2grep.

7. Adjust the limit for "must have" code unit searching, in particular,
increase it substantially for non-anchored patterns.

8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.

9. Some changes to the way the minimum subject length is handled:

    * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
      pcre2test now omits this item instead of showing a value of zero.

    * An incorrect minimum length could be calculated for a pattern that
      contained (*ACCEPT) inside a quantified group whose minimum repetition was
      zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
      of 2. The minimum length scan no longer happens for a pattern that
      contains (*ACCEPT).

    * When no minimum length is set by the normal scan, but a first and/or last
      code unit is recorded, set the minimum to 1 or 2 as appropriate.

    * When a pattern contains multiple groups with the same number, a back
      reference cannot know which one to scan for a minimum length. This used to
      cause the minimum length finder to give up with no result. Now it treats
      such references as not adding to the minimum length (which it should have
      done all along).

    * Furthermore, the above action now happens only if the back reference is to
      a group that exists more than once in a pattern instead of any back
      reference in a pattern with duplicate numbers.

10. A (*MARK) value inside a successful condition was not being returned by the
interpretive matcher (it was returned by JIT). This bug has been mended.

11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
if the pattern had more than 32 capturing parentheses. This is fixed. In
addition (a) the default limit for groups requested by -o<n> has been raised to
50, (b) the new --om-capture option changes the limit, (c) an error is raised
if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
is made possessive and applied to an item in parentheses, because a
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

13. For partial matches, pcre2test was always showing the maximum lookbehind
characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
if the end of the subject was encountered in a lookahead (conditional or
otherwise), an atomic group, or a recursion.

15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.

16. Check for integer overflow when computing lookbehind lengths. Fixes
ClusterFuzz issue 15636.

17. Implemented non-atomic positive lookaround assertions.

18. If a lookbehind contained a lookahead that contained another lookbehind
within it, the nested lookbehind was not correctly processed. For example, if
/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
"b".

19. Implemented pcre2_get_match_data_size().

20. Three alterations to partial matching:

    (a) The definition of a partial match is slightly changed: if a pattern
    contains any lookbehinds, an empty partial match may be given, because this
    is another situation where adding characters to the current subject can
    lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".

    (b) Similarly, if a pattern could match an empty string, an empty partial
    match may be given. Example: /(?![ab]).*/ with subject "ab". This case
    applies only to PCRE2_PARTIAL_HARD.

    (c) An empty string partial hard match can be returned for \z and \Z as it
    is documented that they shouldn't match.

21. A branch that started with (*ACCEPT) was not being recognized as one that
could match an empty string.

22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
char * instead of const uint8_t *, as generated by pcre2_maketables().

23. Upgraded to Unicode 12.1.0.

24. Add -jitfast command line option to pcre2test (to make all the jit options
available directly).

25. Make pcre2test -C show if libreadline or libedit is supported.

26. If the length of one branch of a group exceeded 65535 (the maximum value
that is remembered as a minimum length), the whole group's length was
incorrectly recorded as 65535, leading to incorrect "no match" when start-up
optimizations were in force.

27. The "rightmost consulted character" value was not always correct; in
particular, if a pattern ended with a negative lookahead, characters that were
inspected in that lookahead were not included.

28. Add the pcre2_maketables_free() function.

29. The start-up optimization that looks for a unique initial matching
code unit in the interpretive engines uses memchr() in 8-bit mode. When the
search is caseless, it was doing so inefficiently, which ended up slowing down
the match drastically when the subject was very long. The revised code (a)
remembers if one case is not found, so it never repeats the search for that
case after a bumpalong and (b) when one case has been found, it searches only
up to that position for an earlier occurrence of the other case. This fix
applies to both interpretive pcre2_match() and to pcre2_dfa_match().
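
Point (b) of item 29 can be illustrated with a short sketch. It is not the
PCRE2 source: the function name find_either_case() and its signature are
hypothetical, and only the idea of bounding the second memchr() call is taken
from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first occurrence of either c1
   or c2 (the two cases of one character) in subject[0..len), or NULL. */
static const unsigned char *find_either_case(const unsigned char *subject,
  size_t len, unsigned char c1, unsigned char c2)
{
  assert(subject != NULL);
  const unsigned char *p1 = memchr(subject, c1, len);
  /* If c1 was found, an earlier c2 can only lie before it, so the second
     search is limited to that prefix instead of the whole subject. */
  size_t limit = (p1 == NULL)? len : (size_t)(p1 - subject);
  const unsigned char *p2 = memchr(subject, c2, limit);
  return (p2 != NULL)? p2 : p1;
}
```

Point (a), remembering across bumpalongs that one case is absent, needs state
that outlives a single call and is left out of this sketch.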

30. While scanning to find the minimum length of a group, if any branch has
minimum length zero, there is no need to scan any subsequent branches (a small
compile-time performance improvement).

31. Installed a .gitignore file on a user's suggestion. When using the svn
repository with git (through git svn) this helps keep it tidy.

32. Add underflow check in JIT which may occur when the value of subject
string pointer is close to 0.

33. Arrange for classes such as [Aa] which contain just the two cases of the
same character, to be treated as a single caseless character. This causes the
first and required code unit optimizations to kick in where relevant.

34. Improve the bitmap of starting bytes for positive classes that include wide
characters, but no property types, in UTF-8 mode. Previously, on encountering
such a class, the bits for all bytes greater than \xc4 were set, thus
specifying any character with codepoint >= 0x100. Now the only bits that are
set are for the relevant bytes that start the wide characters. This can give a
noticeable performance improvement.
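
The idea in item 34 can be sketched as follows: for each wide character in a
class, set only the bit for the first byte of its UTF-8 encoding. The function
names and the 32-byte bitmap layout are assumptions made for this
illustration, not PCRE2's internal representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first byte of the UTF-8 encoding of code point c. */
static uint8_t utf8_lead_byte(uint32_t c)
{
  assert(c <= 0x10ffff);
  if (c < 0x80) return (uint8_t)c;
  if (c < 0x800) return (uint8_t)(0xc0 | (c >> 6));
  if (c < 0x10000) return (uint8_t)(0xe0 | (c >> 12));
  return (uint8_t)(0xf0 | (c >> 18));
}

/* Set, in a 256-bit map, the bit for the lead byte of each listed character
   (hypothetical helper; one bit per possible starting byte). */
static void set_start_bits(uint8_t map[32], const uint32_t *chars, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
    uint8_t b = utf8_lead_byte(chars[i]);
    map[b / 8] |= (uint8_t)(1u << (b % 8));
    }
}
```

U+0100 is encoded as the bytes c4 80, which is where the \xc4 mentioned above
comes from. A class containing only U+0100 and U+20AC sets just two bits in
this sketch, for the bytes c4 and e2.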

35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
with a single starting code unit (1 bit) or a caseless single starting code
unit if the two relevant characters are case-partners. This is particularly
relevant to the 8-bit library, though it applies to all. It can give a
performance boost for patterns such as [Ww]ord and (word|WORD). However, this
optimization doesn't happen if there is a "required" code unit of the same
value (because the search for a "required" code unit starts at the match start
for non-unique first code unit patterns, but after a unique first code unit,
and patterns such as a*a need the former action).

36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
after a successful compile, instead of at the start of matching to avoid a
sanitizer complaint (regexec is supposed to be thread safe).

37. Add NEON vectorization to JIT to speed up matching of first character and
pairs of characters on ARM64 CPUs.

38. If a non-ASCII character was the first in a starting assertion in a
caseless match, the "first code unit" optimization did not get the casing
right, and the assertion failed to match a character in the other case if it
did not start with the same code unit.

39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
operation was incorrectly removed in r1136. Reported by Ralf Junker.


Version 10.33 16-April-2019
---------------------------

1. Added "allvector" to pcre2test to make it easy to check the part of the
ovector that shouldn't be changed, in particular after substitute and failed or
partial matches.

2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.

3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
prior to release, fixed a bug that caused a crash if pcre2_substitute() was
called with a NULL match context.

4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments while still exporting the POSIX names for
pre-existing programs that use them. (The Debian alternative names are also
defined as macros, but not documented.)

5. Fix an xclass matching issue in JIT.

6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
characterized by a lower case letter following (* and to simplify coding for
this, the character tables created by pcre2_maketables() were updated to add a
new "is lower case letter" bit. At the same time, the now unused "is
hexadecimal digit" bit was removed. The default tables in
src/pcre2_chartables.c.dist are updated.

8. Implement the new Perl "script run" features (*script_run:...) and
(*atomic_script_run:...) aka (*sr:...) and (*asr:...).

9. Fixed two typos in change 22 for 10.21, which added special handling for
ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports.

10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path. Also, when a match fails, set the subject field in the match data to NULL
for tidiness - none of the substring extractors should reference this after
match failure.

11. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

12. The heap limit checking code in pcre2_dfa_match() could suffer from
overflow if the heap limit was set very large. This could cause incorrect "heap
limit exceeded" errors.
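
One common way to avoid the kind of overflow described in item 12 is to
compare in the limit's own units instead of scaling the limit up to bytes. The
sketch below assumes a limit expressed in kibibytes; the function name and the
types are hypothetical, and this is not necessarily how PCRE2 fixed the
problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: does used_bytes of heap exceed a limit given in
   kibibytes? Computing limit_kib * 1024 in 32 bits can wrap for large
   limits, so the byte count is scaled down instead. */
static bool heap_limit_exceeded(size_t used_bytes, uint32_t limit_kib)
{
  /* Round up, so that a partially used kibibyte counts as used. */
  size_t used_kib = used_bytes / 1024 + (used_bytes % 1024 != 0);
  return used_kib > limit_kib;
}
```

With this arrangement, a very large limit can never turn into a small byte
count, so it cannot trigger a spurious "limit exceeded" result.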
582
58313. Add "kibibytes" to the heap limit output from pcre2test -C to make the
584units clear.
585
58614. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.
587
58815. Updated the VMS-specific code in pcre2test on the advice of a VMS user.
589
59016. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
591pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
592below was unnecessarily complicated, as inttypes.h is a Standard C header,
593which is defined to be a superset of stdint.h. Instead of conditionally
594including stdint.h or inttypes.h, pcre2.h now unconditionally includes
595inttypes.h. This supports environments that do not have stdint.h but do have
596inttypes.h, which are known to exist. A note in the autotools documentation
597says (November 2018) that there are none known that are the other way round.
598
59917. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
600forcibly disable the use of %zu and %td in formatting strings because there is
601at least one version of VMS that claims to be C99 but does not support these
602modifiers.
603
60418. Added --disable-pcre2grep-callout-fork, which restricts the callout support
605in pcre2grep to the inbuilt echo facility. This may be useful in environments
606that do not support fork().
607
60819. Fix two instances of <= 0 being applied to unsigned integers (the VMS
609compiler complains).

20. Added "fork" support for VMS to pcre2grep, for running an external program
via a string callout.

21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.

22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
followed by ^ it was not recognized as anchored.

23. The RunGrepTest script used to cut out the test of NUL characters for
Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
systems can't either. I've inverted the test so that only those OS that are
known to work (currently only Linux) try to run this test.

24. Some tests in RunGrepTest appended to testtrygrep from two different file
descriptors instead of redirecting stderr to stdout. This worked on Linux, but
it was reported not to on other systems, causing the tests to fail.

25. In the RunTest script, make the test for stack setting use the same value
for the stack as it needs for -bigstack.

26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.

26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
which are valid in character classes, but not as the end of ranges, were being
treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
error at the *start* of a range). Now an "invalid range" error is given
independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known
escape sequences such as \eX when they appeared invalidly in a character class.
Now the option applies only to unrecognized or malformed escape sequences.

28. Fix word boundary in JIT compiler. Patch by Mike Munday.

29. The pcre2_dfa_match() function was incorrectly handling conditional version
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result.

30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does. There was a small bug in this new code, found by
ClusterFuzz 12950, fixed before release.

31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
construct.

32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
from auto-anchoring if \p{Any}* starts a pattern.

33. Compile invalid UTF check in JIT test when only pcre32 is enabled.

34. For some time now, CMake has been warning about the setting of policy
CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
removed in a future version. A request for CMake expertise on the list produced
no result, so I have now hacked CMakeLists.txt along the lines of some changes
I found on the Internet. The new code no longer needs the policy setting, and
it appears to work fine on Linux.

35. Setting --enable-jit=auto for an out-of-tree build failed because the
source directory wasn't in the search path for AC_TRY_COMPILE always. Patch
from Ross Burton.

36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
Patch by Guillem Jover.

37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
warnings were reported.
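
A sketch of the style of expression that item 37 moves to, not taken from the
PCRE2 sources. The item does not say which warnings were reported; one common
cause is that the constant 1 has type int, so a shift such as 1<<31 overflows a
32-bit int, whereas 1u<<31 is well defined. The macro BIT() and the function
set_bit() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: builds a single-bit mask from an unsigned constant, so
   that the shift is done in unsigned arithmetic for any n below 32. */
#define BIT(n) ((uint32_t)1u << (n))

/* Returns flags with bit n set. */
uint32_t set_bit(uint32_t flags, unsigned int n)
{
  assert(n < 32);   /* shifting by the width of the type or more is undefined */
  return flags | BIT(n);
}
```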

38. Using the clang compiler with sanitizing options causes runtime complaints
about truncation for statements such as x = ~x when x is an 8-bit value; it
seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
gets rid of the warnings. There were also two missing casts in pcre2test.
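
The behaviour in item 38 follows from C's integer promotions: an 8-bit operand
is promoted to int before ~ is applied, so the result has bits set above the
low eight and must be truncated when stored back. The following sketch (with
hypothetical function names, and not the actual PCRE2 code) shows the
replacement form and checks that it gives the same low eight bits.

```c
#include <assert.h>
#include <stdint.h>

/* Inverts all eight bits of x. The value of 255 ^ x always fits in 8 bits,
   whereas ~x is computed as an int and has to be truncated afterwards. */
uint8_t invert8(uint8_t x)
{
  return (uint8_t)(255 ^ x);
}

/* Confirms that both forms agree for every possible 8-bit value. */
void check_invert8(void)
{
  unsigned int i;
  for (i = 0; i < 256; i++)
    assert(invert8((uint8_t)i) == (uint8_t)~(uint8_t)i);
}
```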


Version 10.32 10-September-2018
-------------------------------

1. When matching using the REG_STARTEND feature of the POSIX API with a
non-zero starting offset, unset capturing groups with lower numbers than a
group that did capture something were not being correctly returned as "unset"
(that is, with offset values of -1).

2. When matching using the POSIX API, pcre2test used to omit listing unset
groups altogether. Now it shows those that come before any actual captures as
"<unset>", as happens for non-POSIX matching.

3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
whatever the build configuration was. It now correctly says "\R matches all
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
specified. Similarly, running "pcre2test -C bsr" never produced the result
ANY.

4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
multi-code-unit characters caused bad behaviour and possibly a crash. This
issue was fixed for other kinds of repeat in release 10.20 by change 19, but
repeating character classes were overlooked.

5. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.

7. Added --enable-jit=auto support to configure.ac.

8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
modes for the benefit of m68k, where pointers can be 16-bit aligned. The
dummies force 32-bit alignment and this ensures that the structure is a
multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
architectures, alignment requirements take care of this automatically.

9. When returning an error from pcre2_pattern_convert(), ensure the error
offset is set zero for early errors.

10. A number of patches for Windows support from Daniel Richard G:

  (a) List of error numbers in Runtest.bat corrected (it was not the same as in
      Runtest).

  (b) pcre2grep snprintf() workaround as used elsewhere in the tree.

  (c) Support for non-C99 snprintf() that returns -1 in the overflow case.

11. Minor tidy of pcre2_dfa_match() code.

12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
use the stack for local workspace and local ovectors. Instead, an initial block
of stack is reserved, but if this is insufficient, heap memory is used. The
heap limit parameter now applies to pcre2_dfa_match().

13. If a "find limits" test of DFA matching in pcre2test resulted in too many
matches for the ovector, no matches were displayed.

14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
EOF. The test looks to have come from a fuzzer.

15. If PCRE2 was built with a default match limit a lot greater than the
default default of 10 000 000, some JIT tests of the match limit no longer
failed. All such tests now set 10 000 000 as the upper limit.

16. Another Windows related patch for pcre2grep to ensure that WIN32 is
undefined under Cygwin.

17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
include whichever exists (stdint preferred) instead of unconditionally
including stdint. This makes life easier for old and non-standard systems.

18. Further changes to improve portability, especially to old and/or
non-standard systems:

  (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
      and use \0 not \x00 for binary zero.

  (b) Avoid the use of C++ (i.e. BCPL) // comments.

  (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
      these now, if using MSVC or a standard C before C99, %lu is used with a
      cast if necessary.
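
The parameterization in 18(c) can be pictured along the following lines. This
is only a sketch: the macro names SIZE_FMT and SIZE_CAST and the function
format_length() are hypothetical, and the real pcre2test code may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macros: choose the conversion for size_t at compile time.
   MSVC and compilers older than C99 get %lu with a cast to unsigned long;
   everything else uses %zu with no cast. */
#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || \
    __STDC_VERSION__ < 199901L
#define SIZE_FMT  "lu"
#define SIZE_CAST (unsigned long)
#else
#define SIZE_FMT  "zu"
#define SIZE_CAST
#endif

/* Formats a length as decimal digits into buf, which must be large enough,
   and returns the number of characters written. */
int format_length(char *buf, size_t length)
{
  assert(buf != NULL);
  return sprintf(buf, "%" SIZE_FMT, SIZE_CAST length);
}
```

Keeping the choice in one pair of macros means that each call site is written
once and works with either conversion.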

19. Applied a contributed patch to CMakeLists.txt to increase the stack size
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
the standard set of tests.

20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
it is given with the "replace" modifier.

21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.

22. When checking to see if a lookbehind is of fixed length, lookaheads were
correctly ignored, but quantifiers on lookaheads were not being ignored,
leading to an incorrect "lookbehind assertion is not fixed length" error.

23. The VERSION condition test was reading fractional PCRE2 version numbers
such as the 04 in 10.04 incorrectly and hence giving wrong results.

24. Updated to Unicode version 11.0.0. As well as the usual addition of new
scripts and characters, this involved re-jigging the grapheme break property
algorithm because Unicode has changed the way emojis are handled.

25. Fixed an obscure bug that struck when there were two atomic groups not
separated by something with a backtracking point. There could be an incorrect
backtrack into the first of the atomic groups. A complicated example is
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
shouldn't find a MARK (because it is in an atomic group), but it did.

26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
a list of modifiers for all subsequent patterns - only those that the script
recognizes are meaningful; (2) #subject lines can be used to set or unset a
default "mark" modifier; (3) Unsupported #command lines give a warning when
they are ignored; (4) Mark data is output only if the "mark" modifier is
present.

27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

28. A (*MARK) name was not being passed back for positive assertions that were
terminated by (*ACCEPT).

29. Add support for \N{U+dddd}, but only in Unicode mode.

30. Add support for (?^) for unsetting all imnsx options.

31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
code point was less than 256 and that were recognized by the lookup table
generated by pcre2_maketables(), which uses isspace() to identify white space.
Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl.
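
A test for the additional characters listed in item 31 might look like the
sketch below. The function name is hypothetical and this is not the PCRE2
implementation; it covers only the five code points named above and assumes
that ASCII white space is recognized separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 for the "Pattern White Space" characters
   that item 31 adds, and 0 for anything else. */
int is_extra_pattern_white_space(uint32_t c)
{
  assert(c <= 0x10ffff);   /* the sketch assumes a valid Unicode code point */
  switch (c)
    {
    case 0x0085:   /* NEXT LINE */
    case 0x200e:   /* LEFT-TO-RIGHT MARK */
    case 0x200f:   /* RIGHT-TO-LEFT MARK */
    case 0x2028:   /* LINE SEPARATOR */
    case 0x2029:   /* PARAGRAPH SEPARATOR */
      return 1;
    default:
      return 0;
    }
}
```

Note that U+00A0 (no-break space) is not in the list, so it is not discarded.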

32. In certain circumstances, option settings within patterns were not being
correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly
matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the
end of its group during the parse process, but without another setting such as
(?m) the compile phase got it right.) This bug was introduced by the
refactoring in release 10.23.

33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to
define memmove() as a function call to bcopy(). This hasn't been tested for a
long time because in pcre2test the result of memmove() was being used, whereas
bcopy() doesn't return a result. This feature is now refactored always to call
an emulation function when there is no memmove(). The emulation makes use of
bcopy() when available.
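
For illustration, an emulation function of the kind mentioned in item 33 could
be written as below. This is a sketch under assumptions: the name
emulated_memmove() is hypothetical, and for portability it copies byte by byte
rather than calling bcopy(). The point is that, unlike bcopy(), it returns the
destination pointer, as memmove() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for memmove() on systems that lack it. It copies
   upwards or downwards according to how the two regions are placed, so that
   overlapping copies give the right result, and returns dest. */
void *emulated_memmove(void *dest, const void *src, size_t n)
{
  unsigned char *d = (unsigned char *)dest;
  const unsigned char *s = (const unsigned char *)src;

  assert(n == 0 || (dest != NULL && src != NULL));
  if (d < s)
    {
    while (n-- > 0) *d++ = *s++;       /* copy upwards */
    }
  else if (d > s)
    {
    d += n;
    s += n;
    while (n-- > 0) *(--d) = *(--s);   /* copy downwards */
    }
  return dest;
}
```

Comparing pointers into unrelated objects is not strictly portable C, but it is
the usual approach for a replacement of this kind.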

34. When serializing a pattern, set the memctl, executable_jit, and tables
fields (that is, all the fields that contain pointers) to zeros so that the
result of serializing is always the same. These fields are re-set when the
pattern is deserialized.

35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
not used in PCRE2.

37. Tidied up unnecessarily complicated macros used in the escapes table.

38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
from distribution tarballs, owing to a typo in Makefile.am which had
testoutput8-16-3 twice. Now fixed.

39. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored. Demonstrated by test patterns
such as /(?(1)^())b/ or /(?(?=^))b/.

40. A repeated conditional subpattern that could match an empty string was
always assumed to be unanchored. Now it is checked just like any other
repeated conditional subpattern, and can be found to be anchored if the minimum
quantifier is one or more. I can't see much use for a repeated anchored
pattern, but the behaviour is now consistent.

41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
(for an event that could never occur but you had to have external information
to know that).

42. If before the first match in a file that was being searched by pcre2grep
there was a line that was sufficiently long to cause the input buffer to be
expanded, the variable holding the location of the end of the previous match
was being adjusted incorrectly, and could cause an overflow warning from a code
sanitizer. However, as the value is used only to print pending "after" lines
when the next match is reached (and there are no such lines in this case) this
bug could do no damage.


Version 10.31 12-February-2018
------------------------------

1. Fix typo (missing ]) in VMS code in pcre2test.c.

2. Replace the replicated code for matching extended Unicode grapheme sequences
(which got a lot more complicated by change 10.30/49) by a single subroutine
that is called by both pcre2_match() and pcre2_dfa_match().

3. Add idempotent guard to pcre2_internal.h.

4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
defined (e.g. by --enable-never-backslash-C).

6. Defined public names for all the pcre2_compile() error numbers, and used
the public names in pcre2_convert.c.

7. Fixed a small memory leak in pcre2test (convert contexts).

8. Added two casts to compile.c and one to match.c to avoid compiler warnings.

9. Added code to pcre2grep when compiled under VMS to set the symbol
PCRE2GREP_RC to the exit status, because VMS does not distinguish between
exit(0) and exit(1).

10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
about a bad option only if the following argument item does not start with a
hyphen.

11. pcre2grep was truncating components of file names to 128 characters when
processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.

12. When an assertion contained (*ACCEPT) it caused all open capturing groups
to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
misbehaviour for subsequent references to groups that started outside the
assertion. ACCEPT in an assertion now closes only those groups that were
started within that assertion. Fixes oss-fuzz issues 3852 and 3891.

13. Multiline matching in pcre2grep was misbehaving if the pattern matched
within a line, and then matched again at the end of the line and over into
subsequent lines. Behaviour was different with and without colouring, and
sometimes context lines were incorrectly printed and/or line endings were lost.
All these issues should now be fixed.

14. If --line-buffered was specified for pcre2grep when input was from a
compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
ignored for compressed files.)

15. Although pcre2_jit_match checks whether the pattern is compiled
in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.

16. The line number and related variables such as match counts in pcre2grep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.
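
The effect of the type change in item 16 can be seen in a small sketch such as
the following, which is not the pcre2grep code; add_line_count() is a
hypothetical name. An unsigned long counter can pass 2147483647 without signed
overflow, and where long is 64 bits it has far more headroom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the newline characters in a buffer and adds
   them to a running total held as an unsigned long. */
unsigned long add_line_count(unsigned long total, const char *buf, size_t len)
{
  size_t i;
  assert(buf != NULL || len == 0);
  for (i = 0; i < len; i++)
    if (buf[i] == '\n') total++;
  return total;
}
```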

17. If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.

18. Characters in a leading positive assertion are considered for recording a
first character of a match when the rest of the pattern does not provide one.
However, a character in a non-assertive group within a leading assertion such
as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
infelicity rather than an outright bug, because it did not affect the result of
a match, just its speed. (In fact, in this case, the starting 'a' was
subsequently picked up in the study.)

19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
instead of "RRETURN" saves unwinding the backtracks in these cases (only one
didn't).

20. Allocate a single callout block on the stack at the start of pcre2_match()
and set its never-changing fields once only. Do the same for pcre2_dfa_match().

21. Save the extra compile options (set in the compile context) with the
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.

22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts
if the callout_extra subject modifier is set. These bits are provided to help
with tracking how a backtracking match is proceeding.

23. Updated the pcre2demo.c demonstration program, which was missing the extra
code for -g that handles the case when \K in an assertion causes the match to
end at the original start point. Also arranged for it to detect when \K causes
the end of a match to be before its start.

24. Similar to 23 above, strange things (including loops) could happen in
pcre2grep when \K was used in an assertion when --colour was used or in
multiline mode. The "end at original start point" bug is fixed, and if the end
point is found to be before the start point, they are swapped.

25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT
matching (both pcre2_match() and pcre2_dfa_match()) and the matched string
started with the first code unit of a newline sequence, matching failed because
it was not tried at the newline.

26. Code for giving up a non-partial match after failing to find a starting
code unit anywhere in the subject was missing when searching for one of a
number of code units (the bitmap case) in both pcre2_match() and
pcre2_dfa_match(). This was a missing optimization rather than a bug.

27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a
pointer argument rather than a code unit value. This should not have affected
the generated code.

28. The JIT compiler has been updated.

29. Avoid pointer overflow for unset captures in pcre2_substring_list_get().
This could not actually cause a crash because it was always used in a memcpy()
call with zero length.

30. Some internal structures have a variable-length ovector[] as their last
element. Their actual memory is obtained dynamically, giving an ovector of
appropriate length. However, they are defined in the structure as
ovector[NUMBER], where NUMBER is large so that array bound checkers don't
grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing
groups, making the ovector larger than this. The number has been increased to
131072, which allows for the maximum number of captures (65535) plus the
overall match. This fixes oss-fuzz issue 5415.
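
The idiom described in item 30 can be sketched as follows. The structure and
function names are hypothetical and much simpler than the real internal
structures. The last member is declared with the large bound, but only as much
memory as is needed is obtained; each capture needs a pair of offsets, which is
where 131072 = 2 * (65535 + 1) comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OVECTOR_DECLARED 131072   /* 2 * (65535 captures + overall match) */

/* Hypothetical structure whose final member is declared with a large fixed
   bound but is allocated with only as many elements as are required. */
typedef struct match_block {
  unsigned int pair_count;             /* number of offset pairs in use */
  size_t ovector[OVECTOR_DECLARED];    /* really 2 * pair_count elements */
} match_block;

/* Obtains a block that can hold pair_count offset pairs, or returns NULL. */
match_block *match_block_create(unsigned int pair_count)
{
  match_block *mb;
  assert(pair_count >= 1 && 2 * (size_t)pair_count <= OVECTOR_DECLARED);
  mb = malloc(offsetof(match_block, ovector) +
              2 * (size_t)pair_count * sizeof(size_t));
  if (mb != NULL) mb->pair_count = pair_count;
  return mb;
}
```

In C99 and later a flexible array member is the standard way to express this;
the declared bound in the sketch mirrors the older idiom that the item
describes.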

31. Auto-possessification at the end of a capturing group was dependent on what
follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused
incorrect behaviour when the group was called recursively from elsewhere in the
pattern where something different might follow. This bug is an unforeseen
consequence of change #1 for 10.30 - the implementation of backtracking into
recursions. Iterators at the ends of capturing groups are no longer considered
for auto-possessification if the pattern contains any recursions. Fixes
Bugzilla #2232.


Version 10.30 14-August-2017
----------------------------

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
run no slower than the old code.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository. These
bugs were never in fully released code, but are noted here for the record.

  (a) If a pattern had fewer capturing parentheses than the ovector supplied in
      the match data block, a memory error (detectable by ASAN) occurred after
      a match, because the external block was being set from non-existent
      internal ovector fields. Fixes oss-fuzz issue 781.

  (b) A pattern with very many capturing parentheses (when the internal frame
      size was greater than the initial frame vector on the stack) caused a
      crash. A vector on the heap is now set up at the start of matching if the
      vector on the stack is not big enough to handle at least 10 frames.
      Fixes oss-fuzz issue 783.

  (c) Handling of (*VERB)s in recursions was wrong in some cases.

  (d) Captures in negative assertions that were used as conditions were not
      happening if the assertion matched via (*ACCEPT).

  (e) Mark values were not being passed out of recursions.

  (f) Refactor some code in do_callout() to avoid picky compiler warnings about
      negative indices. Fixes oss-fuzz issue 1454.

  (g) Similarly refactor the way the variable length ovector is addressed for
      similar reasons. Fixes oss-fuzz issue 1465.

2. Now that pcre2_match() no longer uses recursive function calls (see above),
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
available for backwards compatibility.

3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:

  (a) Check for malloc failures when getting memory for the ovector (POSIX) or
      the match data block (non-POSIX).

4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
for a character with a code point greater than 0x10ffff (the Unicode maximum)
caused a crash.

5. If a lookbehind assertion that contained a back reference to a group
appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.

6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
pcre2test to use it to output the frame size when the "framesize" modifier is
given.

7. Reworked the recursive pattern matching in the JIT compiler to follow the
interpreter changes.

8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
empty string with zero_terminate. This was a bug in pcre2test, not the library.

9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
of the section that is compiled when Unix-style directory scanning is
available, and into a new section that is always compiled for Windows.

10. In pcre2test, explicitly close the file after an error during serialization
or deserialization (the "load" or "save" commands).

11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.

12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
a NULL pattern pointer when Unicode support is available.

13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow. This was a bug in
pcre2test.

14. The alternative matching function, pcre2_dfa_match() misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.

15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
matching to find the minimum value for this limit.

16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
pattern was compiled.

17. Changes to the pcre2test "memory" modifier on a subject line. These apply
only to pcre2_match():

  (a) Warn if null_context is set on both pattern and subject, because the
      memory details cannot then be shown.

  (b) Remember (up to a certain number of) memory allocations and their
      lengths, and list only the lengths, so as to be system-independent.
      (In practice, the new interpreter never has more than 2 blocks allocated
      simultaneously.)

18. Make pcre2test detect an error return from pcre2_get_error_message(), give
a message, and abandon the run (this would have detected #13 above).

19. Implemented PCRE2_ENDANCHORED.

20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
the --output=text (-O) option and the inbuilt callout echo.

21. Extend auto-anchoring etc. to ignore groups with a zero quantifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
anchored.

22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
the find_limits modifier is set.

23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
particular when it is serialized.
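
The technique in item 23 can be illustrated with a much smaller structure.
This is a sketch only: compiled_block and compiled_block_create() are
hypothetical, and the layout of the real pcre2_real_code structure is not
reproduced here. The last 8 bytes are cleared before the fields are set, so
that padding at the end of the structure holds a defined value when the bytes
are later copied as a whole.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure that may end with compiler-inserted padding. */
typedef struct compiled_block {
  void *tables;
  unsigned int flags;
  unsigned char last_unit;   /* padding may follow this member */
} compiled_block;

/* Obtains a block, clears its final 8 bytes, and then sets the fields.
   Returns NULL if no memory is available. */
compiled_block *compiled_block_create(unsigned int flags)
{
  compiled_block *cb = malloc(sizeof(compiled_block));
  if (cb == NULL) return NULL;
  assert(sizeof(compiled_block) >= 8);
  memset((char *)cb + sizeof(compiled_block) - 8, 0, 8);
  cb->tables = NULL;
  cb->flags = flags;
  cb->last_unit = 0;
  return cb;
}
```

This deals only with padding at the end of the structure; padding between
members, if there were any, would need separate treatment.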
1156
115724. Remove a redundant line of code left in accidentally a long time ago.
1158
115925. Remove a duplication typo in pcre2_tables.c
1160
116126. Correct an incorrect cast in pcre2_valid_utf.c
1162
116327. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
1164tests to improve coverage.
1165
116628. Some fixes/tidies as a result of looking at Coverity Scan output:
1167
1168 (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
1169 (b) Added some casts to avoid "suspicious implicit sign extension".
1170 (c) Resource leaks in pcre2test in rare error cases.
1171 (d) Avoid warning for never-use case OP_TABLE_LENGTH which is just a fudge
1172 for checking at compile time that tables are the right size.
1173 (e) Add missing "fall through" comment.
1174
117529. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.
1176
117730. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.
1178
117931. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
1180pcre2test, a crash could occur.
1181
118232. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
1183that all the tests can run with clang's sanitizing options.
1184
118533. Implement extra compile options in the compile context and add the first
1186one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.
1187
118834. Implement newline type PCRE2_NEWLINE_NUL.
1189
119035. A lookbehind assertion that had a zero-length branch caused undefined
1191behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859.
1192
119336. The match limit value now also applies to pcre2_dfa_match() as there are
1194patterns that can use up a lot of resources without necessarily recursing very
1195deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.
1196
119737. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
1198
119938. Fix returned offsets from regexec() when REG_STARTEND is used with a
1200starting offset greater than zero.
1201
120239. Implement REG_PEND (GNU extension) for the POSIX wrapper.
1203
120440. Implement the subject_literal modifier in pcre2test, and allow jitstack on
1205pattern lines.
1206
120741. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.
1208
120942. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
1210of pcre2grep.
1211
121243. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
1213PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:
1214
1215 (a) The -F option did not work for fixed strings containing \E.
1216 (b) The -w option did not work for patterns with multiple branches.
1217
121844. Added configuration options for the SELinux compatible execmem allocator in
1219JIT.
1220
122145. Increased the limit for searching for a "must be present" code unit in
1222subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
1223much faster.
1224
122546. Arrange for anchored patterns to record and use "first code unit" data,
1226because this can give a fast "no match" without searching for a "required code
1227unit". Previously only non-anchored patterns did this.
1228
122947. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
1230
123148. Add the callout_no_where modifier to pcre2test.
1232
123349. Update extended grapheme breaking rules to the latest set that are in
1234Unicode Standard Annex #29.
1235
123650. Added experimental foreign pattern conversion facilities
1237(pcre2_pattern_convert() and friends).
1238
123951. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE
1240is defined in a system header in cygwin. Also modified some of the #ifdefs in
1241pcre2grep related to Windows and Cygwin support.
1242
124352. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a
1244character class is the last character in the class, Perl does not give a
1245warning. PCRE2 now also treats this as a literal.
1246
124753. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was
1248not doing so for [\d-X] (and similar escapes), as is documented.
1249
125054. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.
1251
125255. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in
1253pcre2_compile() which could never actually trigger (code should have been cut
1254out when Unicode support is disabled).


Version 10.23 14-February-2017
------------------------------

1. Extended pcre2test with the utf8_input modifier so that it is able to
generate all possible 16-bit and 32-bit code unit values in non-UTF modes.

2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without
PCRE2_UCP set, a negative character type such as \D in a positive class should
cause all characters greater than 255 to match, whatever else is in the class.
There was a bug that caused this not to happen if a Unicode property item was
added to such a class, for example [\D\P{Nd}] or [\W\pL].

3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed, including:

  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
      of giving an invalid quantifier error.

  (b) {0} can now be used after a group in a lookbehind assertion; previously
      this caused an "assertion is not fixed length" error.

  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with
      the name "DEFINE" exists. PCRE2 now does likewise.

  (d) A recursion condition test such as (?(R2)...) must now refer to an
      existing subpattern.

  (e) A conditional recursion test such as (?(R)...) misbehaved if there was a
      group whose name began with "R".

  (f) When testing zero-terminated patterns under valgrind, the terminating
      zero is now marked "no access". This catches bugs that would otherwise
      show up only with non-zero-terminated patterns.

  (g) A hyphen appearing immediately after a POSIX character class (for example
      /[[:ascii:]-z]/) now generates an error. Perl does accept this as a
      literal, but gives a warning, so it seems best to fail it in PCRE.

  (h) An empty \Q\E sequence may appear after a callout that precedes an
      assertion condition (it is, of course, ignored).

One effect of the refactoring is that some error numbers and messages have
changed, and the pattern offset given for compiling errors is not always the
right-most character that has been read. In particular, for a variable-length
lookbehind assertion it now points to the start of the assertion. Another
change is that when a callout appears before a group, the "length of next
pattern item" that is passed now just gives the length of the opening
parenthesis item, not the length of the whole group. A length of zero is now
given only for a callout at the end of the pattern. Automatic callouts are no
longer inserted before and after explicit callouts in the pattern.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository. Many
of the bugs were discovered by fuzz testing. Several of them were related to
the change from assuming a zero-terminated pattern (which previously had
required non-zero terminated strings to be copied). These bugs were never in
fully released code, but are noted here for the record.

  (a) An overall recursion such as (?0) inside a lookbehind assertion was not
      being diagnosed as an error.

  (b) In utf mode, the length of a *MARK (or other verb) name was being checked
      in characters instead of code units, which could lead to bad code being
      compiled, leading to unpredictable behaviour.

  (c) In extended /x mode, characters whose code was greater than 255 caused
      a lookup outside one of the global tables. A similar bug existed for wide
      characters in *VERB names.

  (d) The amount of memory needed for a compiled pattern was miscalculated if a
      lookbehind contained more than one toplevel branch and the first branch
      was of length zero.

  (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero-
      terminated pattern, if a # comment ran on to the end of the pattern, one
      or more code units past the end were being read.

  (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g.
      "{2,2") could cause reading beyond the pattern.

  (g) When reading a callout string, if the end delimiter was at the end of the
      pattern one further code unit was read.

  (h) An unterminated number after \g' could cause reading beyond the pattern.

  (i) An insufficient memory size was being computed for compiling with
      PCRE2_AUTO_CALLOUT.

  (j) A conditional group with an assertion condition used more memory than was
      allowed for it during parsing, so too many of them could therefore
      overrun a buffer.

  (k) If parsing a pattern exactly filled the buffer, the internal test for
      overrun did not check when the final META_END item was added.

  (l) If a lookbehind contained a subroutine call, and the called group
      contained an option setting such as (?s), and the PCRE2_ANCHORED option
      was set, unpredictable behaviour could occur. The underlying bug was
      incorrect code and insufficient checking while searching for the end of
      the called subroutine in the parsed pattern.

  (m) Quantifiers following (*VERB)s were not being diagnosed as errors.

  (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and
      PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour.

  (o) If \Q was preceded by a quantified item, and the following \E was
      followed by '?' or '+', and there was at least one literal character
      between them, an internal error "unexpected repeat" occurred (example:
      /.+\QX\E+/).

  (p) A buffer overflow could occur while sorting the names in the group name
      list (depending on the order in which the names were seen).

  (q) A conditional group that started with a callout was not doing the right
      check for a following assertion, leading to compiling bad code. Example:
      /(?(C'XX))?!XX/

  (r) If a character whose code point was greater than 0xffff appeared within
      a lookbehind that was within another lookbehind, the calculation of the
      lookbehind length went wrong and could provoke an internal error.

  (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
      an internal error. Now the hyphen is treated as a literal.
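
The distinction behind item (b) above, characters versus code units, can be
shown with a small sketch for UTF-8. The two function names are hypothetical
and the input is assumed to be valid, zero-terminated UTF-8. A length limit
that protects a buffer has to be applied to the first count, not the second.

```c
#include <stddef.h>

/* Number of code units (bytes) in a zero-terminated UTF-8 string. */
static size_t utf8_code_units(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}

/* Number of characters: count every byte that is not a continuation byte
   (continuation bytes have the bit pattern 10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_characters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    return n;
}
```

For a name containing one two-byte character among ASCII letters, the two
functions differ by one, and the gap grows with every wide character.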

4. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course, be of fixed length.

5. pcre2test has been upgraded so that, when run under valgrind with valgrind
support enabled, reading past the end of the pattern is detected, both when
compiling and during callout processing.

6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.
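
How a relative reference maps to an absolute group number can be sketched as
follows. The function and its parameters are hypothetical, and the sketch
assumes the usual reading of the syntax: \g{-1} is the most recently opened
group and \g{+1} is the next group to be opened.

```c
/* Resolve a relative group reference to an absolute group number.
   "groups_opened" is the number of capturing groups whose opening
   parenthesis has been seen so far; "relative" is the signed number from
   \g{-n} or \g{+n}. Returns 0 when the reference is invalid (relative is
   zero, or it points before group 1). */
static int resolve_relative_group(int groups_opened, int relative)
{
    int absolute;
    if (relative == 0) return 0;
    if (relative < 0)
        absolute = groups_opened + relative + 1;  /* \g{-1}: last opened */
    else
        absolute = groups_opened + relative;      /* \g{+1}: next to open */
    return (absolute >= 1) ? absolute : 0;
}
```

With two groups already opened, \g{-1} resolves to group 2 and \g{+2} to
group 4. Whether group 4 actually exists can only be checked once the whole
pattern has been scanned.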

7. Automatic callouts are no longer generated before and after callouts in the
pattern.

8. When pcre2test was outputting information from a callout, the caret
indicator for the current position in the subject line was incorrect if it was
after an escape sequence for a character whose code point was greater than
\x{ff}.

9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be
PCRE2_STATIC_RUNTIME). Fix from David Gaussmann.

10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer
expansion when long lines are encountered. Original patch by Dmitry
Cherniachenko.

11. If pcre2grep was compiled with JIT support, but the library was compiled
without it (something that neither ./configure nor CMake allow, but it can be
done by editing config.h), pcre2grep was giving a JIT error. Now it detects
this situation and does not try to use JIT.

12. Added some "const" qualifiers to variables in pcre2grep.

13. Added Dmitry Cherniachenko's patch for colouring output in Windows
(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment
variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found.

14. Add the -t (grand total) option to pcre2grep.

15. A number of bugs have been mended relating to match start-up optimizations
when the first thing in a pattern is a positive lookahead. These all applied
only when PCRE2_NO_START_OPTIMIZE was *not* set:

  (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
      both an initial 'X' and a following 'X'.
  (b) Some patterns starting with an assertion that started with .* were
      incorrectly optimized as having to match at the start of the subject or
      after a newline. There are cases where this is not true, for example,
      (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
      start with spaces. Starting .* in an assertion is no longer taken as an
      indication of matching at the start (or after a newline).

16. The "offset" modifier in pcre2test was not being ignored (as documented)
when the POSIX API was in use.

17. Added --enable-fuzz-support to "configure", causing a non-installed
library containing a test function that can be called by fuzzers to be
compiled. A non-installed binary to run the test function locally, called
pcre2fuzzcheck, is also compiled.

18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and
which started with .* inside a positive lookahead was incorrectly being
compiled as implicitly anchored.

19. Removed all instances of "register" declarations, as they are considered
obsolete these days and in any case had become very haphazard.

20. Add strerror() to pcre2test for failed file opening.

21. Make pcre2test -C list valgrind support when it is enabled.

22. Add the use_length modifier to pcre2test.

23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and
'copy' modifiers.

24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it
is apparently needed there as well as in the function definitions. (Why did
nobody ask for this in PCRE1?)

25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to
PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more
standards-compliant and unique.

26. pcre2-config --libs-posix was listing -lpcre2posix instead of
-lpcre2-posix. Also, the CMake build process was building the library with the
wrong name.

27. In pcre2test, give some offset information for errors in hex patterns.
This uses the C99 formatting sequence %td, except for MSVC which doesn't
support it - %lu is used instead.
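
A minimal sketch of the technique in item 27 follows. The macro names
PTR_FORM and PTR_CAST and the function format_offset() are hypothetical, and
the message text is made up for illustration. The point is only that the
format specifier and the matching cast are chosen together at compile time.

```c
#include <stddef.h>
#include <stdio.h>

/* %td is the C99 conversion for ptrdiff_t. Where it is unavailable (MSVC, as
   item 27 notes), fall back to %lu with an explicit cast. */
#ifdef _MSC_VER
#define PTR_FORM "lu"
#define PTR_CAST(x) ((unsigned long)(x))
#else
#define PTR_FORM "td"
#define PTR_CAST(x) (x)
#endif

/* Write "offset N" into buf, where N is the distance of p from start.
   Returns whatever snprintf() returns. */
static int format_offset(char *buf, size_t size, const char *start,
                         const char *p)
{
    ptrdiff_t offset = p - start;
    return snprintf(buf, size, "offset %" PTR_FORM, PTR_CAST(offset));
}
```

Keeping the specifier and the cast in adjacent macros makes it hard to change
one without the other.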

28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
pcre2test for testing it.

29. Fix small memory leak in pcre2test.

30. Fix out-of-bounds read for partial matching of /./ against an empty string
when the newline type is CRLF.

31. Fix a bug in pcre2test that caused a crash when a locale was set either in
the current pattern or a previous one and a wide character was matched.

32. The appearance of \p, \P, or \X in a substitution string when
PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL
dereference).

33. If the starting offset was specified as greater than the subject length in
a call to pcre2_substitute() an out-of-bounds memory reference could occur.

34. When PCRE2 was compiled to use the heap instead of the stack for recursive
calls to match(), a repeated minimizing caseless back reference, or a
maximizing one where the two cases had different numbers of code units,
followed by a caseful back reference, could lose the caselessness of the first
repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
but didn't).

35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
matching length and just records zero. Typically this happens when there are
too many nested or recursive back references. If the limit was reached in
certain recursive cases it failed to be triggered and an internal error could
be the result.

36. The pcre2_dfa_match() function now takes note of the recursion limit for
the internal recursive calls that are used for lookrounds and recursions within
the pattern.

37. More refactoring has got rid of the internal could_be_empty_branch()
function (around 400 lines of code, including comments) by keeping track of
could-be-emptiness as the pattern is compiled instead of scanning compiled
groups. (This would have been much harder before the refactoring of #3 above.)
This lifts a restriction on the number of branches in a group (more than about
1100 would give "pattern is too complicated").

38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
auto_callout".

39. In a library with Unicode support, incorrect data was compiled for a
pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
characters to match (for example, /[\s[:^ascii:]]/).

40. The callout_error modifier has been added to pcre2test to make it possible
to return PCRE2_ERROR_CALLOUT from a callout.

41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
"<esc>[00m".

42. The limit in the auto-possessification code that was intended to catch
overly-complicated patterns and not spend too much time auto-possessifying was
being reset too often, resulting in very long compile times for some patterns.
Now such patterns are no longer completely auto-possessified.

43. Applied Jason Hood's revised patch for RunTest.bat.

44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.

45. Minor cosmetic fix to pcre2test: move a variable that is not used under
Windows into the "not Windows" code.

46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
some of the code:

  * normalised the Windows condition by ensuring WIN32 is defined;
  * enables the callout feature under Windows;
  * adds globbing (Microsoft's implementation expands quoted args),
    using a tweaked opendirectory;
  * implements the is_*_tty functions for Windows;
  * --color=always will write the ANSI sequences to file;
  * add sequences 4 (underline works on Win10) and 5 (blink as bright
    background, relatively standard on DOS/Win);
  * remove the (char *) casts for the now-const strings;
  * remove GREP_COLOUR (grep's command line allowed the 'u', but not
    the environment), parsing GREP_COLORS instead;
  * uses the current colour if not set, rather than black;
  * add print_match for the undefined case;
  * fixes a typo.

In addition, colour settings containing anything other than digits and
semicolon are ignored, and the colour controls are no longer output for empty
strings.
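
The check described in the last paragraph of item 46 can be sketched as
follows. The function name is hypothetical and this is not pcre2grep's code.
Rejecting an empty or NULL setting is an extra assumption of the sketch.

```c
#include <stddef.h>

/* Returns 1 if the colour setting consists only of digits and semicolons
   (for example "1;31"), otherwise 0. An empty or NULL setting is rejected
   as well, which is an assumption made for this sketch. */
static int colour_setting_is_valid(const char *s)
{
    if (s == NULL || *s == '\0') return 0;
    for (; *s != '\0'; s++)
        if ((*s < '0' || *s > '9') && *s != ';') return 0;
    return 1;
}
```

A caller would fall back to its default colour whenever this returns 0, so
that arbitrary text from the environment never reaches the terminal inside an
escape sequence.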

47. Detecting patterns that are too large inside the length-measuring loop
saves processing ridiculously long patterns to their end.

48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
just wastes time. In the UTF case it can also produce redundant entries in
XCLASS lists caused by characters with multiple other cases and pairs of
characters in the same "not-x" sublists.

49. A pattern such as /(?=(a\K))/ can report the end of the match being before
its start; pcre2test was not handling this correctly when using the POSIX
interface (it was OK with the native interface).

50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
continue to work, falling back to interpretation if anything goes wrong with
JIT.

51. Applied patches from Christian Persch to configure.ac to make use of the
AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
modules.

52. Minor fixes to pcre2grep from Jason Hood:
  * fixed some spacing;
  * Windows doesn't usually use single quotes, so I've added a define
    to use appropriate quotes [in an example];
  * LC_ALL was displayed as "LCC_ALL";
  * numbers 11, 12 & 13 should end in "th";
  * use double quotes in usage message.

53. When autopossessifying, skip empty branches without recursion, to reduce
stack usage for the benefit of clang with -fsanitize=address, which uses huge
stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553.

54. A pattern with very many explicit back references to a group that is a long
way from the start of the pattern could take a long time to compile because
searching for the referenced group in order to find the minimum length was
being done repeatedly. Now up to 128 group minimum lengths are cached and the
attempt to find a minimum length is abandoned if there is a back reference to a
group whose number is greater than 128. (In that case, the pattern is so
complicated that this optimization probably isn't worth it.) This fixes
oss-fuzz issue 557.

55. Issue 38 for 10.22 below was not correctly fixed. If pcre2grep in multiline
mode with --only-matching matched several lines, it restarted scanning at the
next line instead of moving on to the end of the matched string, which can be
several lines after the start.

56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line
with updates to the non-Windows version.



Version 10.22 29-July-2016
--------------------------

1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3
to fix problems with running the tests under Windows.

2. Implemented a facility for quoting literal characters within hexadecimal
patterns in pcre2test, to make it easier to create patterns with just a few
non-printing characters.

3. Binary zeros are not supported in pcre2test input files. It now detects them
and gives an error.

4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to
smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so
that it matches only unknown objects.

5. Updated the maintenance script maint/ManyConfigTests to make it easier to
select individual groups of tests.

6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option
used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this
disables the use of back references (and subroutine calls), which are supported
by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no
longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch
and pmatch when regexec() is called.

7. Because of 6 above, pcre2test has been modified with a new modifier called
posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture
modifier had this effect. That option is now ignored when the POSIX API is in
use.

8. Minor tidies to the pcre2demo.c sample program, including more comments
about its 8-bit-ness.

9. Detect unmatched closing parentheses and give the error in the pre-scan
instead of later. Previously the pre-scan carried on and could give a
misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a
message about invalid duplicate group names.

10. It has happened that pcre2test was accidentally linked with another POSIX
regex library instead of libpcre2-posix. In this situation, a call to regcomp()
(in the other library) may succeed, returning zero, but of course putting its
own data into the regex_t block. In one example the re_pcre2_code field was
left as NULL, which made pcre2test think it had not got a compiled POSIX regex,
so it treated the next line as another pattern line, resulting in a confusing
error message. A check has been added to pcre2test to see if the data returned
from a successful call of regcomp() are valid for PCRE2's regcomp(). If they
are not, an error message is output and the pcre2test run is abandoned. The
message points out the possibility of a mis-linking. Hopefully this will avoid
some head-scratching the next time this happens.

11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcre2test to output a very large number of spaces when the
callout was taken, making the program appear to loop.

12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).

13. Detect missing closing parentheses during the pre-pass for group
identification.

14. Changed some integer variable types and put in a number of casts, following
a report of compiler warnings from Visual Studio 2013 and a few tests with
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a
binary zero in the final byte. Bug #1801 seems to show that other versions do
not do this, leading to bad output from pcre2test when it was checking for
buffer overflow. It no longer assumes a binary zero at the end of a too-small
regerror() buffer.
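
One defensive way to cope with the difference described in item 16 is to
write the terminator explicitly instead of relying on the library. The sketch
below is only an illustration: format_error() and its message layout are
hypothetical, and it is not the code used by regerror() or pcre2test.

```c
#include <stddef.h>
#include <stdio.h>

/* Format an error message into buf and report whether it was truncated.
   The terminator is written explicitly rather than trusting the library.
   Returns 1 if the message did not fit, otherwise 0. */
static int format_error(char *buf, size_t size, const char *text, int offset)
{
    int needed;
    if (size == 0) return 1;
    needed = snprintf(buf, size, "%s at offset %d", text, offset);
    buf[size - 1] = '\0';          /* never rely on implicit termination */
    return needed < 0 || (size_t)needed >= size;
}
```

A negative return from the formatting function is treated as truncation
because some older, non-C99 variants report a too-small buffer that way.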

17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not
actually affect anything, by sheer luck.

18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect
"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for
older MSVC compilers. This has been done both in src/pcre2_internal.h for most
of the library, and also in src/pcre2posix.c, which no longer includes
pcre2_internal.h (see 24 below).

19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation. Subsequently applied Chris Wilson's second patch, putting
the first patch under a new option instead of being unconditional when
PCRE_STATIC is set.

20. Updated pcre2grep to set stdout as binary when run under Windows, so as not
to convert \r\n at the ends of reflected lines into \r\r\n. This required
ensuring that other output that is written to stdout (e.g. file names) uses the
appropriate line terminator: \r\n for Windows, \n otherwise.

21. When a line is too long for pcre2grep's internal buffer, show the maximum
length in the error message.

22. Added support for string callouts to pcre2grep (Zoltan's patch with PH
additions).

23. RunTest.bat was missing a "set type" line for test 22.

24. The pcre2posix.c file was including pcre2_internal.h, and using some
"private" knowledge of the data structures. This is unnecessary; the code has
been re-factored and no longer includes pcre2_internal.h.

25. A race condition in JIT, reported by Mozilla, is fixed.

26. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.

27. Minor code refactor to avoid "left shift of negative number" warning.

28. Add a bit more sanity checking to pcre2_serialize_decode() and document
that it expects trusted data.

29. Fix typo in pcre2_jit_test.c.

30. Due to an oversight, pcre2grep was not making use of JIT when available.
This is now fixed.

31. The RunGrepTest script is updated to use the valgrind suppressions file
when testing with JIT under valgrind (compare 10.21/51 below). The suppressions
file is updated so that it is now the same as for PCRE1: it suppresses the
Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled
code). Also changed smc-check=all to smc-check=all-non-file as was done for
RunTest (see 4 above).

32. Implemented the PCRE2_NO_JIT option for pcre2_match().

33. Fix typo that gave a compiler error when JIT not supported.

34. Fix comment describing the returns from find_fixedlength().

35. Fix potential negative index in pcre2test.

36. Calls to pcre2_get_error_message() with error numbers that are never
returned by PCRE2 functions were returning empty strings. Now the error code
PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to
show the texts for given error numbers (i.e. to call pcre2_get_error_message()
and display what it returns) and a few representative error codes are now
checked in RunTest.

37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in
pcre2_match.c, in anticipation that this is needed for the same reason it was
recently added to pcrecpp.cc in PCRE1.

38. Using -o with -M in pcre2grep could cause unnecessary repeated output when
the match extended over a line boundary, as it tried to find more matches "on
the same line" - but it was already over the end.

39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it
to the same code as '.' when PCRE2_DOTALL is set).

40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
if it fails when running the interpreter with a 16MiB stack (and if changing
the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

43. Detect integer overflow in pcre2test pattern and data repetition counts.
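
A common way to detect this kind of overflow while reading a decimal count is
to test before multiplying, as in the sketch below. The function name is
hypothetical and this is not pcre2test's code.

```c
#include <limits.h>

/* Parse an unsigned decimal repetition count from s into *result.
   Returns 1 on success, 0 if there are no leading digits or the value would
   overflow an int. Parsing stops at the first non-digit. The overflow test
   is made before the multiplication, so no overflow ever occurs. */
static int parse_repeat_count(const char *s, int *result)
{
    int value = 0;
    if (*s < '0' || *s > '9') return 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int digit = *s - '0';
        if (value > (INT_MAX - digit) / 10) return 0;   /* would overflow */
        value = value * 10 + digit;
    }
    *result = value;
    return 1;
}
```

Checking after the arithmetic would be too late, because signed integer
overflow is undefined behaviour in C.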

44. In pcre2test, ignore "allcaptures" after DFA matching.

45. Fix unaligned accesses on x86. Patch by Marc Mutz.

46. Fix some more clang compiler warnings.


Version 10.21 12-January-2016
-----------------------------

1. Improve matching speed of patterns starting with + or * in JIT.

2. Use memchr() to find the first character in an unanchored match in 8-bit
mode in the interpreter. This gives a significant speed improvement.

3. Removed a redundant copy of the opcode_possessify table in the
pcre2_auto_possessify.c source.

4. Fix typos in dftables.c for z/OS.

5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that
processing them could involve a buffer overflow if the following character was
an opening parenthesis.

6. Change 36 for 10.20 also introduced a bug in processing this pattern:
/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK)
setting (which (*:0) is), then (?x) did not get unset at the end of its group
during the scan for named groups, and hence the external # was incorrectly
treated as a comment and the invalid (?' at the end of the pattern was not
diagnosed. This caused a buffer overflow during the real compile. This bug was
discovered by Karl Skomski with the LLVM fuzzer.

7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its
own source module to avoid a circular dependency between src/pcre2_compile.c
and src/pcre2_study.c.

8. A callout with a string argument containing an opening square bracket, for
example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer
overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer.

9. The handling of callouts during the pre-pass for named group identification
has been tightened up.

10. The quantifier {1} can be ignored, whether greedy, non-greedy, or
possessive. This is a very minor optimization.

11. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.

12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian
Persch).

13. An empty comment (?#) in a pattern was incorrectly processed and could
provoke a buffer overflow. This bug was discovered by Karl Skomski with the
LLVM fuzzer.

14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.

15. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]. The
first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.

16. Pathological patterns containing many nested occurrences of [: caused
pcre2_compile() to run for a very long time. This bug was found by the LLVM
fuzzer.

17. A missing closing parenthesis for a callout with a string argument was not
being diagnosed, possibly leading to a buffer overflow. This bug was found by
the LLVM fuzzer.

18. A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty string.

19. If (?R was followed by - or + incorrect behaviour happened instead of a
diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer.

20. Another bug that was introduced by change 36 for 10.20: conditional groups
whose condition was an assertion preceded by an explicit callout with a string
argument might be incorrectly processed, especially if the string contained \Q.
This bug was discovered by Karl Skomski with the LLVM fuzzer.

21. Compiling PCRE2 with the sanitize options of clang showed up a number of
very pedantic coding infelicities and a buffer overflow while checking a UTF-8
string if the final multi-byte UTF-8 character was truncated.

22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.
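
The reason item 22 is needed is that EBCDIC letters are not contiguous: the
lower case letters occupy three separate runs of code points. The sketch
below shows the idea for lower case only. Both function names are
hypothetical and this is not the library's implementation. Upper case letters
follow the same layout at 0xC1-0xC9, 0xD1-0xD9 and 0xE2-0xE9.

```c
/* EBCDIC lower case letters occupy three separate runs of code points. */
static int ebcdic_is_lower_letter(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89) ||   /* a-i */
           (c >= 0x91 && c <= 0x99) ||   /* j-r */
           (c >= 0xA2 && c <= 0xA9);     /* s-z */
}

/* Does code point c match the class range lo-hi, where lo and hi are both
   lower case EBCDIC letters? Non-letters that fall inside the numeric range
   are excluded. */
static int ebcdic_letter_range_matches(unsigned int lo, unsigned int hi,
                                       unsigned int c)
{
    return c >= lo && c <= hi && ebcdic_is_lower_letter(c);
}
```

With lo = 0x81 (a) and hi = 0xA9 (z), the code point 0x91 (j) matches, but
0x8A and 0xA1, which lie in the gaps between the runs, do not.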
1875
187623. Finding the minimum matching length of complex patterns with back
1877references and/or recursions can take a long time. There is now a cut-off that
1878gives up trying to find a minimum length when things get too complex.
1879
188024. An optimization has been added that speeds up finding the minimum matching
1881length for patterns containing repeated capturing groups or recursions.
1882
188325. If a pattern contained a back reference to a group whose number was
1884duplicated as a result of appearing in a (?|...) group, the computation of the
1885minimum matching length gave a wrong result, which could cause incorrect "no
1886match" errors. For such patterns, a minimum matching length cannot at present
1887be computed.
1888
188926. Added a check for integer overflow in conditions (?(<digits>) and
1890(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
1891fuzzer.

27. Fixed an issue when \p{Any} inside an xclass did not read the current
character.

28. If pcre2grep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer.

30. The way recursive references such as (?3) are compiled has been re-written
because the old way was the cause of many issues. Now, conversion of the group
number into a pattern offset does not happen until the pattern has been
completely compiled. This does mean that detection of all infinitely looping
recursions is postponed till match time. In the past, some easy ones were
detected at compile time. This re-writing was done in response to yet another
bug found by the LLVM fuzzer.

31. A test for a back reference to a non-existent group was missing for items
such as \987. This caused incorrect code to be compiled. This issue was found
by Karl Skomski with a custom LLVM fuzzer.

32. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.

33. Improve the performance of starting single character repetitions in JIT.

34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.

35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now
give the right offset instead of zero.

36. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.

37. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

38. A bug which was introduced by the single character repetition optimization
was fixed.

39. Match limit check added to recursion. This issue was found by Karl Skomski
with a custom LLVM fuzzer.

40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look
only at the part of the subject that is relevant when the starting offset is
non-zero.

41. Improve first character match in JIT with SSE2 on x86.

42. Fix two assertion fails in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.

43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy
III).

44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added
test (there are now 20 in total).

45. Fixed a corner case of range optimization in JIT.

46. Add the ${*MARK} facility to pcre2_substitute().

47. Modifier lists in pcre2test were splitting at spaces without the required
commas.

48. Implemented PCRE2_ALT_VERBNAMES.

49. Fixed two issues in JIT. These were found by Karl Skomski with a custom
LLVM fuzzer.

50. The pcre2test program has been extended by adding the #newline_default
command. This has made it possible to run the standard tests when PCRE2 is
compiled with either CR or CRLF as the default newline convention. As part of
this work, the new command was added to several test files and the testing
scripts were modified. The pcre2grep tests can now also be run when there is no
LF in the default newline convention.

51. The RunTest script has been modified so that, when JIT is used and valgrind
is specified, a valgrind suppressions file is set up to ignore "Invalid read of
size 16" errors because these are false positives when the hardware supports
the SSE2 instruction set.

52. It is now possible to have comment lines amid the subject strings in
pcre2test (and perltest.sh) input.

53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit().

54. Add the null_context modifier to pcre2test so that calling pcre2_compile()
and the matching functions with NULL contexts can be tested.

55. Implemented PCRE2_SUBSTITUTE_EXTENDED.

56. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.

57. Fixed integer overflow for patterns whose minimum matching length is very,
very large.

58. Implemented --never-backslash-C.

59. Change 55 above introduced a bug by which certain patterns provoked the
erroneous error "\ at end of pattern".

60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling
errors or other strange effects if compiled in UCP mode. Found with libFuzzer
and AddressSanitizer.

61. Whitespace at the end of a pcre2test pattern line caused a spurious error
message if there were only single-character modifiers. It should be ignored.

62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results
or segmentation errors for some patterns. Found with libFuzzer and
AddressSanitizer.

63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer
overflow.

64. Improve error message for overly-complicated patterns.

65. Implemented an optional replication feature for patterns in pcre2test, to
make it easier to test long repetitive patterns. The tests for 63 above are
converted to use the new feature.

66. In the POSIX wrapper, if regerror() was given too small a buffer, it could
misbehave.
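
From the caller's side, a small buffer can be avoided altogether by asking
regerror() for the required size first. The sketch below uses the POSIX
interface from the system <regex.h>; PCRE2's wrapper (pcre2posix.h) provides
functions of the same names, and it is assumed here, not verified, that the
same idiom applies to it. The helper name is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the full message for a
   regcomp()/regexec() error code, or NULL if allocation fails. The caller
   frees the result. */
static char *regex_error_message(int errcode, const regex_t *preg)
{
  assert(errcode != 0);
  /* With a zero-sized buffer regerror() only reports the size it needs,
     including the terminating NUL. */
  size_t needed = regerror(errcode, preg, NULL, 0);
  char *msg = malloc(needed);
  if (msg != NULL) (void)regerror(errcode, preg, msg, needed);
  return msg;
}
```

When a fixed buffer is used instead, POSIX specifies that the message is
truncated to fit and NUL-terminated, and that the return value is still the
size needed for the complete message.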

67. In pcre2_substitute() in UTF mode, the UTF validity check on the
replacement string was happening before the length setting when the replacement
string was zero-terminated.

68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the
second and subsequent calls to pcre2_match().

69. There was no check for integer overflow for a replacement group number in
pcre2_substitute(). An added check for a number greater than the largest group
number in the pattern means this is not now needed.

70. The PCRE2-specific VERSION condition didn't work correctly if only one
digit was given after the decimal point, or if more than two digits were given.
It now works with one or two digits, and gives a compile time error if more are
given.

71. In pcre2_substitute() there was the possibility of reading one code unit
beyond the end of the replacement string.

72. The code for checking a subject's UTF-32 validity for a pattern with a
lookbehind involved an out-of-bounds pointer, which could potentially cause
trouble in some environments.

73. The maximum lookbehind length was incorrectly calculated for patterns such
as /(?<=(a)(?-1))x/ which have a recursion within a backreference.

74. Give an error if a lookbehind assertion is longer than 65535 code units.

75. Give an error in pcre2_substitute() if a match ends before it starts (as a
result of the use of \K).

76. Check the length of subpattern names and the names in (*MARK:xx) etc.
dynamically to avoid the possibility of integer overflow.

77. Implement pcre2_set_max_pattern_length() so that programs can restrict the
size of patterns that they are prepared to handle.

78. (*NO_AUTO_POSSESS) was not working.

79. Adding group information caching improves the speed of compiling when
checking whether a group has a fixed length and/or could match an empty string,
especially when recursion or subroutine calls are involved. However, this
cannot be used when (?| is present in the pattern because the same number may
be used for groups of different sizes. To catch runaway patterns in this
situation, counts have been introduced to the functions that scan for empty
branches or compute fixed lengths.

80. Allow for the possibility of the size of the nest_save structure not being
a factor of the size of the compiling workspace (it currently is).

81. Check for integer overflow in minimum length calculation and cap it at
65535.
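
One way to combine an overflow check with such a cap is a saturating
accumulation, sketched below. The helper and macro names are hypothetical and
the arithmetic is only an illustration of the idea, not the code in the
library.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_LENGTH_CAP 65535u   /* cap named in the entry above */

/* Hypothetical helper, not PCRE2 code: add the minimum length of `count`
   repetitions of an item of minimum length `item` to `total`, saturating at
   MIN_LENGTH_CAP. The division check happens before the multiplication, so
   no intermediate value can wrap around. */
static uint32_t add_min_length(uint32_t total, uint32_t item, uint32_t count)
{
  assert(total <= MIN_LENGTH_CAP);
  if (item == 0 || count == 0) return total;
  if (item > (MIN_LENGTH_CAP - total) / count) return MIN_LENGTH_CAP;
  return total + item * count;
}
```

With this shape, add_min_length(0, 3, 4) gives 12, while a product that would
exceed the cap, such as 70000 repetitions of a 70000-unit item, gives 65535.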

82. Small optimizations in code for finding the minimum matching length.

83. Lock out configuring for EBCDIC with non-8-bit libraries.

84. Test for error code <= 0 in regerror().

85. Check for too many replacements (more than INT_MAX) in pcre2_substitute().

86. Avoid the possibility of computing with an out-of-bounds pointer (though
not dereferencing it) while handling lookbehind assertions.

87. Failure to get memory for the match data in regcomp() is now given as a
regcomp() error instead of waiting for regexec() to pick it up.

88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid
newline sequence.

89. Paranoid check in regcomp() for bad error code from pcre2_compile().

90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well
as for link size 2.

91. Document that JIT has a limit on pattern size, and give more information
about JIT compile failures in pcre2test.

92. Implement PCRE2_INFO_HASBACKSLASHC.

93. Re-arrange valgrind support code in pcre2test to avoid spurious reports
with JIT (possibly caused by SSE2?).

94. Support offset_limit in JIT.

95. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in UCP
mode. The POSIX class got lost, but only if the single character followed it.

96. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.

97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all
characters with code points greater than 255 are in the class. When a Unicode
property was also in the class (if PCRE2_UCP is set, escapes such as \w are
turned into Unicode properties), wide characters were not correctly handled,
and could fail to match.

98. In pcre2test, make the "startoffset" modifier a synonym of "offset",
because it sets the "startoffset" parameter for pcre2_match().

99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre2_compile()
misbehaved. This bug was found by the LLVM fuzzer.

100. The error for an invalid UTF pattern string always gave the code unit
offset as zero instead of where the invalidity was found.

101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.

102. Similar to 99 above, if an isolated \E was present between an item and its
qualifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This bug
was found by the LLVM fuzzer.

103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.

104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep.

105. An empty \Q\E sequence between an item and its qualifier caused
pcre2_compile() to misbehave when auto callouts were enabled. This bug
was found by the LLVM fuzzer.

106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or
other verb "name" ended with whitespace immediately before the closing
parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when
both those options were set.

107. In a number of places pcre2_compile() was not handling NULL characters
correctly, and pcre2test with the "bincode" modifier was not always correctly
displaying fields containing NULLS:

    (a) Within /x extended #-comments
    (b) Within the "name" part of (*MARK) and other *verbs
    (c) Within the text argument of a callout

108. If a pattern that was compiled with PCRE2_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre2_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix
for this bug means that a setting of any of the (?imsxJU) options at the start
of a pattern is no longer transferred to the options that are returned by
PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have
changed when the effects of those options were all moved to compile time.

109. An escaped closing parenthesis in the "name" part of a (*verb) when
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.

110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
possible to test it.

111. "Harden" pcre2test against ridiculously large values in modifiers and
command line arguments.

112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

113. Fix printing of *MARK names that contain binary zeroes in pcre2test.


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments have been added.

2. Assertion code generator in JIT has been optimized.

3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
pcre2_compile() function read past the end of the input before diagnosing an
error. This bug was discovered by the LLVM fuzzer.

4. Implemented pcre2_callout_enumerate().

5. Fix JIT compilation of conditional blocks whose assertion is converted to
(*FAIL). E.g: /(?(?!))/.

6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.

7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.

8. For some types of pattern, for example /Z*(|d*){216}/, the
auto-possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization. This infelicity was discovered by the LLVM fuzzer.

9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
class such as \S in non-UCP mode, explicit wide characters (> 255) can be
ignored because \S ensures they are all in the class. The code for doing this
was interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered by
the LLVM fuzzer.

10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.

11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled, leading
to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.

12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after (?(?<
for the ! or = that would indicate a lookbehind assertion. This bug was
discovered by the LLVM fuzzer.

13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was discovered
by the LLVM fuzzer.

14. Negative relative recursive references such as (?-7) to non-existent
subpatterns were not being diagnosed and could lead to unpredictable behaviour.
This bug was discovered by the LLVM fuzzer.

15. The bug fixed in 14 was due to an integer variable that was unsigned when
it should have been signed. Some other "int" variables, having been checked,
have either been changed to uint32_t or commented as "must be signed".

16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.

17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcre2grep loop.

18. There was a similar problem to 17 in pcre2test for global matches, though
the code there did catch the loop.

19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over the
repeated \X did not stop, but carried on past the start of the subject, causing
reference to random memory and/or a segfault. There were also some other cases
where backtracking after \C could crash. This set of bugs was discovered by the
LLVM fuzzer.

20. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern, for
example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
been implemented. This infelicity was discovered by the LLVM fuzzer.

21. Implemented PCRE2_NEVER_BACKSLASH_C.

22. The feature for string replication in pcre2test could read from freed
memory if the replication required a buffer to be extended, and it was not
working properly in 16-bit and 32-bit modes. This issue was discovered by a
fuzzer: see http://lcamtuf.coredump.cx/afl/.

23. Added the PCRE2_ALT_CIRCUMFLEX option.

24. Adjust the treatment of \8 and \9 to be the same as the current Perl
behaviour.

25. Static linking against the PCRE2 library using the pkg-config module was
failing on missing pthread symbols.

26. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
compile correct code, leading to undefined behaviour or an internally detected
error. This bug was discovered by the LLVM fuzzer.

27. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were involved.
For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
discovered by the LLVM fuzzer.

28. A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name. This
bug was discovered by the LLVM fuzzer.

29. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
This bug was discovered by the LLVM fuzzer.

30. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.

31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
as an int; fixed by writing it as 1u).
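
The difference can be seen in a very small example. The helper below is
hypothetical and only illustrates the general point about shifting a signed
versus an unsigned one; it is not taken from the PCRE2 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with only bit n set, for n in 0..31.
   Writing `1 << 31` shifts a signed int into its sign bit, which is undefined
   behaviour where int is 32 bits and is what -fsanitize=undefined reports.
   Shifting an unsigned 32-bit one is well defined for every n below 32. */
static uint32_t bit_mask(unsigned int n)
{
  assert(n < 32);
  return (uint32_t)1u << n;
}
```

Here bit_mask(31) yields 0x80000000 without relying on any behaviour that the
C standard leaves undefined.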

32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 is used.

33. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

34. Give an error for an empty subpattern name such as (?'').

35. Make pcre2test give an error if a pattern that follows #forbid_utf contains
\P, \p, or \X.

36. The way named subpatterns are handled has been refactored. There is now a
pre-pass over the regex which does nothing other than identify named
subpatterns and count the total captures. This means that information about
named patterns is known before the rest of the compile. In particular, it means
that forward references can be checked as they are encountered. Previously, the
code for handling forward references was contorted and led to several errors in
computing the memory requirements for some patterns, leading to buffer
overflows.

37. There was no check for integer overflow in subroutine calls such as (?123).

38. The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.

39. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty string
itself. For example: /^(?:(?(1)x|)+)+$()/.

40. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.

41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.

42. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.

43. Single character repetition in JIT has been improved. 20-30% speedup
was achieved on certain patterns.

44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.

45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
an error (correctly) when used outside a class, but did not give an error
within a class.

46. \h within a class was incorrectly compiled in EBCDIC environments.

47. JIT should return with error when the compiled pattern requires
more stack space than the maximum.

48. Fixed a memory leak in pcre2grep when a locale is set.


Version 10.10 06-March-2015
---------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is another kind of
back reference, but it was not setting the highest backreference number. This
mattered only if pcre2_match() was called with an ovector that was too small to
hold the capture, and there was no other kind of back reference (a situation
which is probably quite rare). The effect of the bug was that the condition was
always treated as FALSE when the capture could not be consulted, leading to
incorrect behaviour by pcre2_match(). This bug has been fixed.

2. Functions for serialization and deserialization of sets of compiled patterns
have been added.

3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
excess code units at the end of the data block that may occasionally occur if
the code for calculating the size over-estimates. This change stops the
serialization code copying uninitialized data, to which valgrind objects. The
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
include the general overhead. This has been corrected.

4. All code units in every slot in the table of group names are now set, again
in order to avoid accessing uninitialized data when serializing.

5. The (*NO_JIT) feature is implemented.

6. If a bug that caused pcre2_compile() to use more memory than allocated was
triggered when using valgrind, the code in (3) above passed a stupidly large
value to valgrind. This caused a crash instead of an "internal error" return.

7. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.

8. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is, zero).

9. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had unlimited
repetition and could match an empty string, a segfault was likely. The pattern
(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
quantified, but not if they are being used as conditions, so the above pattern
is faulted by Perl. PCRE2 has now been changed so that it also rejects such
patterns.

10. The error message for an invalid quantifier has been changed from "nothing
to repeat" to "quantifier does not follow a repeatable item".

11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
scanning the compiled pattern in subsequent auto-possessification can get out
of step and lead to an unknown opcode. Previously this could have caused an
infinite loop. Now it generates an "internal error" error. This is a tidyup,
not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
undefined outcome.

12. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.

13. The locale test (RunTest 3) has been upgraded. It now checks that a locale
that is found in the output of "locale -a" can actually be set by pcre2test
before it is accepted. Previously, in an environment where a locale was listed
but would not set (an example does exist), the test would "pass" without
actually doing anything. Also the fr_CA locale has been added to the list of
locales that can be used.

14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
capturing group number without parentheses, the last character was incorrectly
literally included at the end of the replacement string.

15. A possessive capturing group such as (a)*+ with a minimum repeat of zero
failed to allow the zero-repeat case if pcre2_match() was called with an
ovector too small to capture the group.

16. Improved error message in pcre2test when setting the stack size (-S) fails.

17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
transfer from PCRE1, meaning that CMake configuration failed if "build tests"
was selected. (2) The file src/pcre2_serialize.c had not been added to the list
of PCRE2 sources, which caused a failure to build pcre2test.

18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
only on Windows.

19. Use binary input when reading back saved serialized patterns in pcre2test.

20. Added RunTest.bat for running the tests under Windows.

21. "make distclean" was not removing config.h, a file that may be created for
use with CMake.

22. A pattern such as "((?2){0,1999}())?", which has a group containing a
forward reference repeated a large (but limited) number of times within a
repeated outer group that has a zero minimum quantifier, caused incorrect code
to be compiled, leading to the error "internal error: previously-checked
referenced subpattern not found" when an incorrect memory address was read.
This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)

23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
CVE-2015-2326 was given to this.)

24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix existing
and future issues, size computation has been eliminated from the code and
replaced by on-demand memory allocation.

25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with more
than one other case, caused incorrect behaviour when compiled in UTF mode. In
that example, the range a-j was left out of the class.


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.
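
The differing lengths are easy to confirm with a small stand-alone UTF-8
encoder. The function below is a hypothetical illustration written for this
note, not part of PCRE2, and it does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not PCRE2 code: encode code point cp as UTF-8 into
   out[] (which must hold at least 4 bytes) and return the number of bytes
   written. */
static size_t utf8_encode(uint32_t cp, uint8_t *out)
{
  assert(out != NULL && cp <= 0x10FFFF);
  if (cp < 0x80)
    {
    out[0] = (uint8_t)cp;
    return 1;
    }
  if (cp < 0x800)
    {
    out[0] = (uint8_t)(0xC0 | (cp >> 6));
    out[1] = (uint8_t)(0x80 | (cp & 0x3F));
    return 2;
    }
  if (cp < 0x10000)
    {
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
    }
  out[0] = (uint8_t)(0xF0 | (cp >> 18));
  out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (uint8_t)(0x80 | (cp & 0x3F));
  return 4;
}
```

U+023A encodes as the two bytes C8 BA and U+2C65 as the three bytes E2 B1 A5,
so stepping back over one caseless repetition is not a fixed number of code
units.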

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion used as a condition was quantified with a minimum of zero
(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
occur.

10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.

****