| 1 | Change Log for PCRE2 |
| 2 | -------------------- |
| 3 | |
| 4 | |
| 5 | Version 10.40 15-April-2022 |
| 6 | --------------------------- |
| 7 | |
| 8 | 1. Merged patch from @carenas (GitHub #35, 7db87842) to fix pcre2grep incorrect |
| 9 | handling of multiple passes. |
| 10 | |
| 11 | 2. Merged patch from @carenas (GitHub #36, dae47509) to fix portability issue |
| 12 | in pcre2grep with buffered fseek(stdin). |
| 13 | |
| 14 | 3. Merged patch from @carenas (GitHub #37, acc520924) to fix tests when -S is |
| 15 | not supported. |
| 16 | |
| 17 | 4. Revert an unintended change in JIT repeat detection. |
| 18 | |
| 19 | 5. Merged patch from @carenas (GitHub #52, b037bfa1) to fix build on GNU Hurd. |
| 20 | |
| 21 | 6. Merged documentation and comments patches from @carenas (GitHub #47). |
| 22 | |
| 23 | 7. Merged patch from @carenas (GitHub #49) to remove obsolete JFriedl test code |
| 24 | from pcre2grep. |
| 25 | |
| 26 | 8. Merged patch from @carenas (GitHub #48) to fix CMake install issue #46. |
| 27 | |
| 28 | 9. Merged patch from @carenas (GitHub #53) fixing NULL checks in matching and |
| 29 | substituting. |
| 30 | |
| 31 | 10. Add null_subject and null_replacement modifiers to pcre2test. |
| 32 | |
| 33 | 11. Add check for NULL subject to POSIX regexec() function. |
| 34 | |
| 35 | 12. Add check for NULL replacement to pcre2_substitute(). |
| 36 | |
| 37 | 13. For the subject arguments of pcre2_match(), pcre2_dfa_match(), and |
| 38 | pcre2_substitute(), and the replacement argument of the latter, if the pointer |
| 39 | is NULL and the length is zero, treat as an empty string. Apparently a number |
| 40 | of applications treat NULL/0 in this way. |
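The NULL/0 rule in item 13 can be sketched as a small pre-check. This is an illustration only: normalize_subject() is a hypothetical helper, not a PCRE2 function, and the -1 error return is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE2 API: if *subject is NULL and
   length is zero, point it at an empty string; NULL with a non-zero length
   is reported as an error. Returns 0 on success, -1 on error. */
static int normalize_subject(const char **subject, size_t length)
{
  assert(subject != NULL);
  if (*subject == NULL) {
    if (length != 0) return -1;   /* NULL with data length: reject */
    *subject = "";                /* NULL/0: treat as an empty string */
  }
  return 0;
}
```

A caller would run this check on the subject (and on the replacement, for substitution) before any matching work starts.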
| 41 | |
| 42 | 14. Added support for Bidi_Class and a number of binary Unicode properties, |
| 43 | including Bidi_Control. |
| 44 | |
| 45 | 15. Fix some minor issues raised by clang sanitize. |
| 46 | |
| 47 | 16. Very minor code speed up for maximizing character property matches. |
| 48 | |
| 49 | 17. A number of changes to script matching for \p and \P: |
| 50 | |
| 51 | (a) Script extensions for a character are now coded as a bitmap instead of |
| 52 | a list of script numbers, which should be faster and does not need a |
| 53 | loop. |
| 54 | |
| 55 | (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms |
| 56 | sc and scx). |
| 57 | |
| 58 | (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being |
| 59 | the same as \p{scx:scriptname} because this change happened in Perl at |
| 60 | release 5.26. |
| 61 | |
| 62 | (d) The standard Unicode 4-letter abbreviations for script names are now |
| 63 | recognized. |
| 64 | |
| 65 | (e) In accordance with Unicode and Perl's "loose matching" rules, spaces, |
| 66 | hyphens, and underscores are ignored in property names, which are then |
| 67 | matched independent of case. |
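The loose matching described in 17(e) can be sketched as a comparison that skips spaces, hyphens, and underscores and folds ASCII case. The function names are hypothetical and this is not the code PCRE2 uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True for the characters that loose matching skips. */
static bool is_ignored(char c)
{
  return c == ' ' || c == '-' || c == '_';
}

/* Compare two property names, ignoring spaces, hyphens, underscores, and
   ASCII case. */
static bool loose_name_equal(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  for (;;) {
    while (is_ignored(*a)) a++;
    while (is_ignored(*b)) b++;
    if (*a == '\0' || *b == '\0') return *a == *b;   /* both must end */
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return false;
    a++;
    b++;
  }
}
```

Under this rule "Script_Extensions" and "script extensions" name the same property, while "sc" and "scx" remain distinct.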
| 68 | |
| 69 | 18. The Python scripts in the maint directory have been refactored. There are |
| 70 | now three scripts that generate pcre2_ucd.c, pcre2_ucp.h, and pcre2_ucptables.c |
| 71 | (which is #included by pcre2_tables.c). The data lists that used to be |
| 72 | duplicated are now held in a single common Python module. |
| 73 | |
| 74 | 19. On CHERI, and thus Arm's Morello prototype, pointers are represented as |
| 75 | hardware capabilities, which consist of both an integer address and additional |
| 76 | metadata, meaning they are twice the size of the platform's size_t type, i.e. |
| 77 | 16 bytes on a 64-bit system. The ovector member of heapframe happens to only be |
| 78 | 8 byte aligned, and so computing frame_size ended up with a multiple of 8 but |
| 79 | not 16. Whilst the first frame was always suitably aligned, this then |
| 80 | misaligned the frame that follows, resulting in an alignment fault when storing |
| 81 | a pointer to Fecode at the start of match. Patch to fix this issue by Jessica |
| 82 | Clarke PR#72. |
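The alignment problem in item 19 comes down to rounding a frame size up to a multiple of the pointer alignment. A minimal sketch of that rounding follows. It is not the actual patch, and round_up() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align. The bit trick requires
   align to be a power of two. */
static size_t round_up(size_t size, size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}
```

With 16-byte pointers, a frame size of 24 (a multiple of 8 but not of 16) becomes 32, so the frame that follows starts on a 16-byte boundary.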
| 83 | |
| 84 | 20. Added -LP and -LS listing options to pcre2test. |
| 85 | |
| 86 | 21. A user discovered that the library names in CMakeLists.txt for MSVC |
| 87 | debugger (PDB) files were incorrect - perhaps never tried for PCRE2? |
| 88 | |
| 89 | 22. An item such as [Aa] is optimized into a caseless single character match. |
| 90 | When this was quantified (e.g. [Aa]{2}) and was also the last literal item in a |
| 91 | pattern, the optimizing "must be present for a match" character check was not |
| 92 | being flagged as caseless, causing some matches that should have succeeded to |
| 93 | fail. |
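To see why the missing caseless flag in item 22 mattered, consider a simplified "required character" pre-check. This sketch uses hypothetical names and is much simpler than the real optimization.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check: does the subject contain the required character?
   When caseless is false, only the exact character counts. */
static bool has_required_char(const char *subject, size_t length,
                              char required, bool caseless)
{
  size_t i;

  assert(subject != NULL || length == 0);
  for (i = 0; i < length; i++) {
    char c = subject[i];
    if (c == required) return true;
    if (caseless &&
        tolower((unsigned char)c) == tolower((unsigned char)required))
      return true;
  }
  return false;
}
```

For [Aa]{2} against "AA" with 'a' as the required character, the caseful check rejects the subject before matching starts, which is the kind of false "no match" the item describes.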
| 94 | |
| 95 | 23. Fixed a Unicode property matching issue in JIT. The character was not |
| 96 | fully read in caseless matching. |
| 97 | |
| 98 | 24. Fixed an issue affecting recursions in JIT caused by duplicated data |
| 99 | transfers. |
| 100 | |
| 101 | 25. Merged patch from @carenas (GitHub #96) which fixes some problems with |
| 102 | pcre2test and readline/readedit: |
| 103 | |
| 104 | * Use the right header for libedit in FreeBSD with autoconf |
| 105 | * Really allow libedit with cmake |
| 106 | * Avoid using readline headers with libedit |
| 107 | |
| 108 | |
Elliott Hughes | 16619d6 | 2021-10-29 12:10:38 -0700 | [diff] [blame] | 109 | Version 10.39 29-October-2021 |
| 110 | ----------------------------- |
| 111 | |
| 112 | 1. Fix incorrect detection of alternatives in first character search in JIT. |
| 113 | |
| 114 | 2. Merged patch from @carenas (GitHub #28): |
| 115 | |
| 116 | Visual Studio 2013 includes support for %zu and %td, so let newer |
| 117 | versions of it avoid the fallback, and while at it, make sure that |
| 118 | the first check is for DISABLE_PERCENT_ZT so it will be always |
| 119 | honoured if chosen. |
| 120 | |
| 121 | ptrdiff_t is signed, so use a signed type instead, and make sure |
| 122 | that an appropriate width is chosen if pointers are 64bit wide and |
| 123 | long is not (ex: Windows 64bit). |
| 124 | |
| 125 | IMHO removing the cast (and therefore the possibility of truncation) |
| 126 | makes the code cleaner and the fallback is likely portable enough |
| 127 | with all 64-bit POSIX systems doing LP64 except for Windows. |
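For reference, the C99 length modifiers mentioned in this patch are %zu for size_t and %td for ptrdiff_t. A minimal sketch, with a hypothetical function name and output format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a length and a pointer difference with the C99 length modifiers:
   %zu for size_t (unsigned) and %td for ptrdiff_t (signed). Returns the
   number of characters that snprintf() reports. */
static int format_sizes(char *buf, size_t bufsize, size_t length,
                        ptrdiff_t diff)
{
  assert(buf != NULL && bufsize > 0);
  return snprintf(buf, bufsize, "len=%zu diff=%td", length, diff);
}
```

Because both modifiers match the argument types exactly, no cast is needed and no truncation can occur on platforms where long is narrower than a pointer.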
| 128 | |
| 129 | 3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0. |
| 130 | |
| 131 | 4. Merged patch from @carenas (GitHub #30): |
| 132 | |
| 133 | * Cleanup: remove references to no longer used stdint.h |
| 134 | |
| 135 | Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h |
| 136 | (simplification) and remove the now unnecessary inclusion in |
| 137 | pcre2_internal.h., 2018-11-14), stdint.h is no longer used. |
| 138 | |
| 139 | Remove checks for it in autotools and CMake and document better the expected |
| 140 | build failures for systems that might have stdint.h (C99) and not inttypes.h |
| 141 | (from POSIX), like old Windows. |
| 142 | |
| 143 | * Cleanup: remove detection for inttypes.h which is a hard dependency |
| 144 | |
| 145 | CMake checks for standard headers are not meant to be used for hard |
| 146 | dependencies, so they would prevent a possible fallback from working. |
| 147 | |
| 148 | Alternatively, the header could be checked to make the configuration fail |
| 149 | instead of breaking the build, but that was punted, as it was missing anyway |
| 150 | from autotools. |
| 151 | |
| 152 | 5. Merged patch from @carenas (GitHub #32): |
| 153 | |
| 154 | * jit: allow building with ancient MSVC versions |
| 155 | |
| 156 | Visual Studio older than 2013 fails to build with JIT enabled, because it is |
| 157 | unable to parse non C89 compatible syntax, with mixed declarations and code. |
| 158 | While most recent compilers wouldn't even report this as a warning since it |
| 159 | is valid C99, it could be also made visible by adding to gcc/clang the |
| 160 | -Wdeclaration-after-statement flag at build time. |
| 161 | |
| 162 | Move the code below the affected definitions. |
| 163 | |
| 164 | * pcre2grep: avoid mixing declarations with code |
| 165 | |
| 166 | Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep, |
| 167 | 2021-08-28), code will fail to build in a strict C89 compiler. |
| 168 | |
| 169 | Reformat slightly to make it C89 compatible again. |
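As a reminder of what C89 compatibility means here, every declaration in a block has to come before the first statement. A small sketch with a hypothetical function, unrelated to the actual pcre2grep code:

```c
#include <assert.h>
#include <stddef.h>

/* C89 style: all declarations come before the first statement in a block. */
static long sum_positive(const int *values, size_t count)
{
  long total = 0;   /* declarations first */
  size_t i;

  assert(values != NULL || count == 0);   /* statements follow */
  for (i = 0; i < count; i++) {
    if (values[i] > 0) total += values[i];
  }
  return total;
}
```

Declaring i inside the for statement, or after the assert, would be valid C99 but is rejected by a strict C89 compiler.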
| 170 | |
| 171 | |
| 172 | Version 10.38 01-October-2021 |
| 173 | ----------------------------- |
| 174 | |
| 175 | 1. Fix invalid single character repetition issues in JIT when the repetition |
| 176 | is inside a capturing bracket and the bracket is preceded by character |
| 177 | literals. |
| 178 | |
| 179 | 2. Installed revised CMake configuration files provided by Jan-Willem Blokland. |
| 180 | This extends the CMake build system to build both static and shared libraries |
| 181 | in one go, builds the static library with PIC, and exposes PCRE2 libraries |
| 182 | using the CMake config files. JWB provided these notes: |
| 183 | |
| 184 | - Introduced CMake variable BUILD_STATIC_LIBS to build the static library. |
| 185 | |
| 186 | - Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC |
| 187 | variable. Added PCRE2_STATIC variable to the static build using the |
| 188 | target_compile_definitions() function. |
| 189 | |
| 190 | - Extended the CMake config files. |
| 191 | |
| 192 | - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between |
| 193 | the static and shared libraries. |
| 194 | |
| 195 | - Added the PCRE_STATIC variable to the target compile definitions for the |
| 196 | import of the static library. |
| 197 | |
| 198 | Building static and shared libraries using MSVC results in a name clash of |
| 199 | the libraries. Both static and shared library builds create, for example, the |
| 200 | file pcre2-8.lib. Therefore, I decided to change the static library names by |
| 201 | adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib. |
| 202 | [Comment by PH: this is MSVC-specific. It doesn't happen on Linux.] |
| 203 | |
| 204 | 3. Increased the minimum release number for CMake to 3.0.0 because older than |
| 205 | 2.8.12 is deprecated (it was set to 2.8.5) and causes warnings. Even 3.0.0 is |
| 206 | quite old; it was released in 2014. |
| 207 | |
| 208 | 4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for |
| 209 | detecting symlink loops. This is dependent on the availability of realpath(), |
| 210 | which is now tested for in ./configure and CMakeLists.txt. |
| 211 | |
| 212 | 5. Implemented a modified version of Thomas Tempelmann's patch for faster |
| 213 | case-independent "first code unit" searches for unanchored patterns in 8-bit |
| 214 | mode in the interpreters. Instead of just remembering whether one case matched |
| 215 | or not, it remembers the position of a previous match so as to avoid |
| 216 | unnecessary repeated searching. |
| 217 | |
| 218 | 6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default. |
| 219 | However, just in case anybody was relying on the old behaviour, there is an |
| 220 | option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour. |
| 221 | An option has also been added to pcre2grep to enable this. |
| 222 | |
| 223 | 7. Re-enable a JIT optimization which was unintentionally disabled in 10.35. |
| 224 | |
| 225 | 8. There is a loop counter to catch excessively crazy patterns when checking |
| 226 | the lengths of lookbehinds at compile time. This was incorrectly getting reset |
| 227 | whenever a lookahead was processed, leading to some fuzzer-generated patterns |
| 228 | taking a very long time to compile when (?|) was present in the pattern, |
| 229 | because (?|) disables caching of group lengths. |
| 230 | |
| 231 | |
| 232 | Version 10.37 26-May-2021 |
| 233 | ------------------------- |
| 234 | |
| 235 | 1. Change RunGrepTest to use tr instead of sed when testing with binary |
| 236 | zero bytes, because sed varies a lot from system to system and has problems |
| 237 | with binary zeros. This is from Bugzilla #2681. Patch from Jeremie |
| 238 | Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later: |
| 239 | it broke it for at least one version of Solaris, where tr can't handle binary |
| 240 | zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so |
| 241 | RunGrepTest now checks for that command and uses it if found. |
| 242 | |
| 243 | 2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem |
| 244 | with a NULL dereference. I don't think this case could ever occur in practice, |
| 245 | but I have put in a check in order to get rid of the compiler error. |
| 246 | |
| 247 | 3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on |
| 248 | Windows. Patch from email@cs-ware.de fixes bugzilla #2688. |
| 249 | |
| 250 | 4. Two bugs related to over-large numbers have been fixed so the behaviour is |
| 251 | now the same as Perl. |
| 252 | |
| 253 | (a) A pattern such as /\214748364/ gave an overflow error instead of being |
| 254 | treated as the octal number \214 followed by literal digits. |
| 255 | |
| 256 | (b) A sequence such as {65536 that has no terminating } so is not a |
| 257 | quantifier was nevertheless complaining that a quantifier number was too big. |
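The octal reading in 4(a) can be sketched as "take at most three octal digits, leave the rest as literals". The real rules for a backslash followed by digits also involve back references. This sketch covers only the octal fallback, and parse_octal_escape() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Read at most three octal digits from s and return their value; *consumed
   receives the number of digits used. Any following digits stay literal. */
static unsigned int parse_octal_escape(const char *s, size_t *consumed)
{
  unsigned int value = 0;
  size_t n = 0;

  assert(s != NULL && consumed != NULL);
  while (n < 3 && s[n] >= '0' && s[n] <= '7') {
    value = value * 8 + (unsigned int)(s[n] - '0');
    n++;
  }
  *consumed = n;
  return value;
}
```

Applied to the digits of \214748364, this consumes "214" (octal 214, decimal 140) and leaves "748364" to be treated as literal characters.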
| 258 | |
| 259 | 5. A run of autoconf suggested that configure.ac was out-of-date with respect |
| 260 | to the latest autoconf. Running autoupdate made some valid changes, some valid |
| 261 | suggestions, and also some invalid changes, which were fixed by hand. Autoconf |
| 262 | now runs clean and the resulting "configure" seems to work, so I hope nothing |
| 263 | is broken. Later: the requirement for autoconf 2.70 broke some automatic test |
| 264 | robots. It doesn't seem to be necessary: trying a reduction to 2.60. |
| 265 | |
| 266 | 6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave |
| 267 | the answer "bac", whereas Perl and JIT both yield "c". This was because the |
| 268 | effect of \K was not propagating back from the full pattern recursion. Other |
| 269 | recursions such as /(a\K.(?1)*)/ did not have this problem. |
| 270 | |
| 271 | 7. Restore single character repetition optimization in JIT. Currently fewer |
| 272 | character repetitions are optimized than in 10.34. |
| 273 | |
| 274 | 8. When the names of the functions in the POSIX wrapper were changed to |
| 275 | pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original |
| 276 | names were left in the library so that pre-compiled programs would still work. |
| 277 | However, this has proved troublesome when programs link with several libraries, |
| 278 | some of which use PCRE2 via the POSIX interface while others use a native POSIX |
| 279 | library. For this reason, the POSIX function names are removed in this release. |
| 280 | The macros in pcre2posix.h should ensure that re-compiling fixes any programs |
| 281 | that haven't been compiled since before 10.33. |
| 282 | |
| 283 | |
| 284 | Version 10.36 04-December-2020 |
| 285 | ------------------------------ |
| 286 | |
| 287 | 1. Add CET_CFLAGS so that when Intel CET is enabled, pass -mshstk to |
| 288 | compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for |
| 289 | Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt |
| 290 | invented by PH. |
| 291 | |
| 292 | 2. Fix infinite loop when a single byte newline is searched in JIT when |
| 293 | invalid utf8 mode is enabled. |
| 294 | |
| 295 | 3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584): |
| 296 | |
| 297 | - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded |
| 298 | lib. This allows differentiation between lib and lib64. |
| 299 | CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for |
| 300 | pkgconfig file generation. |
| 301 | |
| 302 | - Add the version of PCRE2 to the configuration summary like ./configure |
| 303 | does. |
| 304 | |
| 305 | - Fix typo: MACTHED_STRING->MATCHED_STRING |
| 306 | |
| 307 | 4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla |
| 308 | #2588): |
| 309 | |
| 310 | - Add escaped double quotes around include directory in CMakeLists.txt to |
| 311 | allow spaces in directory names. |
| 312 | |
| 313 | - This fixes a cmake error, if the path of the pcre2 source contains a space. |
| 314 | |
| 315 | 5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's |
| 316 | documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS. |
| 317 | Moreover, these functions come from specific header files, which need to be |
| 318 | specified (and, thankfully, are the same on both the Linux and WinXX |
| 319 | platforms.) |
| 320 | |
| 321 | 6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c. |
| 322 | |
| 323 | 7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for |
| 324 | debug Windows builds using CMake. This also updated configure so that it |
| 325 | generates *.pc files and pcre2-config with the same content, as in the past. |
| 326 | |
| 327 | 8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a |
| 328 | single digit, the code unit beyond d was being read (i.e. there was a read |
| 329 | buffer overflow). Fixes ClusterFuzz 23779. |
| 330 | |
| 331 | 9. After the rework in r1235, certain character ranges were incorrectly |
| 332 | handled by an optimization in JIT. Furthermore a wrong offset was used to |
| 333 | read a value from a buffer which could lead to memory overread. |
| 334 | |
| 335 | 10. Unnoticed for many years was the fact that delimiters other than / in the |
| 336 | testinput1 and testinput4 files could cause incorrect behaviour when these |
| 337 | files were processed by perltest.sh. There were several tests that used quotes |
| 338 | as delimiters, and it was just luck that they didn't go wrong with perltest.sh. |
| 339 | All the patterns in testinput1 and testinput4 now use / as their delimiter. |
| 340 | This fixes Bugzilla #2641. |
| 341 | |
| 342 | 11. Perl has started to give an error for \K within lookarounds (though there |
| 343 | are cases where it doesn't). PCRE2 still allows this, so the tests that include |
| 344 | this case have been moved from test 1 to test 2. |
| 345 | |
| 346 | 12. Further to 10 above, pcre2test has been updated to detect and grumble if a |
| 347 | delimiter other than / is used after #perltest. |
| 348 | |
| 349 | 13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS |
| 350 | was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding |
| 351 | the start of a match was not resetting correctly after a failed match on the |
| 352 | first valid fragment of the subject, possibly causing incorrect "no match" |
| 353 | returns on subsequent fragments. For example, the pattern /A/ failed to match |
| 354 | the subject \xe5A. Fixes Bugzilla #2642. |
| 355 | |
| 356 | 14. Fixed a bug in character set matching when JIT is enabled and both unicode |
| 357 | scripts and unicode classes are present at the same time. |
| 358 | |
| 359 | 15. Added GNU grep's -m (aka --max-count) option to pcre2grep. |
| 360 | |
| 361 | 16. Refactored substitution processing in pcre2grep strings, both for the -O |
| 362 | option and when dealing with callouts. There is now a single function that |
| 363 | handles $ expansion in all cases (instead of multiple copies of almost |
| 364 | identical code). This means that the same escape sequences are available |
| 365 | everywhere, which was not previously the case. At the same time, the escape |
| 366 | sequences $x{...} and $o{...} have been introduced, to allow for characters |
| 367 | whose code points are greater than 255 in Unicode mode. |
| 368 | |
| 369 | 17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit |
| 370 | test for a version of sed that can handle binary zero, instead of assuming that |
| 371 | any Linux version will work. Later: replaced $(...) by `...` because not all |
| 372 | shells recognize the former. |
| 373 | |
| 374 | 18. Fixed a word boundary check bug in JIT when partial matching is enabled. |
| 375 | |
| 376 | 19. Fix ARM64 compilation warning in JIT. Patch by Carlo. |
| 377 | |
| 378 | 20. A bug in the RunTest script meant that if the first part of test 2 failed, |
| 379 | the failure was not reported. |
| 380 | |
| 381 | 21. Test 2 was failing when run from a directory other than the source |
| 382 | directory. This failure was previously missed in RunTest because of 20 above. |
| 383 | Fixes added to both RunTest and RunTest.bat. |
| 384 | |
| 385 | 22. Patch to CMakeLists.txt from Daniel to fix problem with testing under |
| 386 | Windows. |
| 387 | |
| 388 | |
| 389 | Version 10.35 09-May-2020 |
| 390 | ------------------------- |
| 391 | |
| 392 | 1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT. |
| 393 | |
| 394 | 2. Fix ARMv5 JIT improper handling of labels right after a constant pool. |
| 395 | |
| 396 | 3. Fixed a JIT bug that allowed the fields of the compiled pattern to be |
| 397 | read before its existence was checked. |
| 398 | |
| 399 | 4. Back in the PCRE1 day, capturing groups that contained recursive back |
| 400 | references to themselves were made atomic (version 8.01, change 18) because |
| 401 | after the end of a repeated group, the captured substrings had their values from |
| 402 | the final repetition, not from an earlier repetition that might be the |
| 403 | destination of a backtrack. This feature was documented, and was carried over |
| 404 | into PCRE2. However, it has now been realized that the major refactoring that |
| 405 | was done for 10.30 has made this atomicizing unnecessary, and it is confusing |
| 406 | when users are unaware of it, making some patterns appear not to be working as |
| 407 | expected. Capture values of recursive back references in repeated groups are |
| 408 | now correctly backtracked, so this unnecessary restriction has been removed. |
| 409 | |
| 410 | 5. Added PCRE2_SUBSTITUTE_LITERAL. |
| 411 | |
| 412 | 6. Avoid some VS compiler warnings. |
| 413 | |
| 414 | 7. Added PCRE2_SUBSTITUTE_MATCHED. |
| 415 | |
| 416 | 8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another |
| 417 | regex engine. The Perl regex folks are aware of this usage and have made a note |
| 418 | about it. |
| 419 | |
| 420 | 9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to |
| 421 | 1, believing that repeating an assertion is pointless. However, if a positive |
| 422 | assertion contains capturing groups, repetition can be useful. In any case, an |
| 423 | assertion could always be wrapped in a repeated group. The only restriction |
| 424 | that is now imposed is that an unlimited maximum is changed to one more than |
| 425 | the minimum. |
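The remaining restriction in item 9 is a one-line rule. In the sketch below, REPEAT_UNLIMITED and assertion_repeat_max() are hypothetical names chosen for illustration and say nothing about how PCRE2 represents quantifiers internally.

```c
#include <assert.h>
#include <limits.h>

#define REPEAT_UNLIMITED UINT_MAX   /* hypothetical sentinel: "no maximum" */

/* Apply the rule from item 9: an unlimited maximum on a repeated assertion
   becomes min + 1; finite maxima are left alone. */
static unsigned int assertion_repeat_max(unsigned int min, unsigned int max)
{
  assert(min <= max && min < REPEAT_UNLIMITED);
  return (max == REPEAT_UNLIMITED) ? min + 1 : max;
}
```

So an assertion quantified with {2,} behaves as if written {2,3}, while {1,5} is unchanged.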
| 426 | |
| 427 | 10. Fix *THEN verbs in lookahead assertions in JIT. |
| 428 | |
| 429 | 11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY. |
| 430 | |
| 431 | 12. The JIT stack should be freed when the low-level stack allocation fails. |
| 432 | |
| 433 | 13. In pcre2grep, if the final line in a scanned file is output but does not |
| 434 | end with a newline sequence, add a newline according to the --newline setting. |
| 435 | |
| 436 | 14. (?(DEFINE)...) groups were not being handled correctly when checking for |
| 437 | the fixed length of a lookbehind assertion. Such a group within a lookbehind |
| 438 | should be skipped, as it does not contribute to the length of the group. |
| 439 | Instead, the (DEFINE) group was being processed, and if it was at the end of |
| 440 | the lookbehind, that end was not correctly recognized. Errors such as "lookbehind |
| 441 | assertion is not fixed length" and also "internal error: bad code value in |
| 442 | parsed_skip()" could result. |
| 443 | |
| 444 | 15. Put a limit of 1000 on recursive calls in pcre2_study() when searching |
| 445 | nested groups for starting code units, in order to avoid stack overflow issues. |
| 446 | If the limit is reached, it just gives up trying for this optimization. |
| 447 | |
| 448 | 16. The control verb chain list must always be restored when exiting from a |
| 449 | recurse function in JIT. |
| 450 | |
| 451 | 17. Fix a crash which occurs when the character type of an invalid UTF |
| 452 | character is decoded in JIT. |
| 453 | |
| 454 | 18. Changes in many areas of the code so that when Unicode is supported and |
| 455 | PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for |
| 456 | upper/lower case computations on characters whose code points are greater than |
| 457 | 127. |
| 458 | |
| 459 | 19. The function for checking UTF-16 validity was returning an incorrect offset |
| 460 | for the start of the error when a high surrogate was not followed by a valid |
| 461 | low surrogate. This caused incorrect behaviour, for example when |
| 462 | PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the |
| 463 | invalid high surrogate, such as /aa/ matching "\x{d800}aa". |
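A simplified UTF-16 validity scan shows which offset is at stake in item 19. This sketch assumes the error should be reported at the unpaired high surrogate itself. It is not the PCRE2 checking function, and first_invalid_utf16() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first ill-formed code unit, or -1 if the string
   is valid UTF-16. An unpaired high surrogate is reported at its own offset
   (an assumption of this sketch). */
static ptrdiff_t first_invalid_utf16(const uint16_t *s, size_t length)
{
  size_t i = 0;

  assert(s != NULL || length == 0);
  while (i < length) {
    uint16_t c = s[i];
    if (c >= 0xd800 && c <= 0xdbff) {            /* high surrogate */
      if (i + 1 >= length || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
        return (ptrdiff_t)i;                     /* no low surrogate follows */
      i += 2;                                    /* skip the valid pair */
    } else if (c >= 0xdc00 && c <= 0xdfff) {     /* isolated low surrogate */
      return (ptrdiff_t)i;
    } else {
      i++;
    }
  }
  return -1;
}
```

For the subject "\x{d800}aa" the scan reports offset 0, so the valid fragment "aa" starts at offset 1 and is still available for matching.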
| 464 | |
| 465 | 20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern |
| 466 | could be mis-compiled and therefore not match correctly. This is the example |
| 467 | that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to |
| 468 | match "word" because the "move back" value was set to zero. |
| 469 | |
| 470 | 21. Following a request from a user, some extensions and tidies to the |
| 471 | character tables handling have been done: |
| 472 | |
| 473 | (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still |
| 474 | not installed for public use. |
| 475 | |
| 476 | (b) There is now a -b option for pcre2_dftables, which causes the tables to |
| 477 | be written in binary. There is also a -help option. |
| 478 | |
| 479 | (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an |
| 480 | application that wants to save tables in binary knows how long they are. |
| 481 | |
| 482 | 22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to |
| 483 | LIST(APPEND...) to allow a setting from the command line to be included. |
| 484 | |
| 485 | 23. Updated to Unicode 13.0.0. |
| 486 | |
| 487 | 24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo. |
| 488 | |
| 489 | 25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler |
| 490 | warning. |
| 491 | |
| 492 | 26. Added tests for __attribute__((uninitialized)) to both the configure and |
| 493 | CMake build files, and then applied this attribute to the variable called |
| 494 | stack_frames_vector[] in pcre2_match(). When implemented, this disables |
| 495 | automatic initialization (a facility in clang), which can take time on big |
| 496 | variables. |
| 497 | |
| 498 | 27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for |
| 499 | pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the |
| 500 | MACHO_*_VERSIONS settings for CMake builds. |
| 501 | |
| 502 | 28. Another patch to CMakeLists.txt to check for mkostemp (configure already |
| 503 | does). Patch by Carlo Marcelo Arenas Belon. |
| 504 | |
| 505 | 29. Check for the existence of memfd_create in both CMake and configure |
| 506 | configurations. Patch by Carlo Marcelo Arenas Belon. |
| 507 | |
| 508 | 30. Restrict the configuration setting for the SELinux compatible execmem |
| 509 | allocator (change 10.30/44) to Linux and NetBSD. |
| 510 | |
| 511 | |
| 512 | Version 10.34 21-November-2019 |
| 513 | ------------------------------ |
| 514 | |
| 515 | 1. The maximum number of capturing subpatterns is 65535 (documented), but no |
| 516 | check on this was ever implemented. This omission has been rectified; it fixes |
| 517 | ClusterFuzz 14376. |
| 518 | |
| 519 | 2. Improved the invalid utf32 support of the JIT compiler. Now it correctly |
| 520 | detects invalid characters in the 0xd800-0xdfff range. |
| 521 | |
| 522 | 3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string. |
| 523 | |
| 524 | 4. Add support for matching in invalid UTF strings to the pcre2_match() |
| 525 | interpreter, and integrate with the existing JIT support via the new |
| 526 | PCRE2_MATCH_INVALID_UTF compile-time option. |
| 527 | |
| 528 | 5. Give more error detail for invalid UTF-8 when detected in pcre2grep. |
| 529 | |
| 530 | 6. Add support for invalid UTF-8 to pcre2grep. |
| 531 | |
| 532 | 7. Adjust the limit for "must have" code unit searching, in particular, |
| 533 | increase it substantially for non-anchored patterns. |
| 534 | |
| 535 | 8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero |
| 536 | minimum is potentially useful. |
| 537 | |
| 538 | 9. Some changes to the way the minimum subject length is handled: |
| 539 | |
| 540 | * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed; |
| 541 | pcre2test now omits this item instead of showing a value of zero. |
| 542 | |
| 543 | * An incorrect minimum length could be calculated for a pattern that |
| 544 | contained (*ACCEPT) inside a quantified group whose minimum repetition was |
| 545 | zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum |
| 546 | of 2. The minimum length scan no longer happens for a pattern that |
| 547 | contains (*ACCEPT). |
| 548 | |
| 549 | * When no minimum length is set by the normal scan, but a first and/or last |
| 550 | code unit is recorded, set the minimum to 1 or 2 as appropriate. |
| 551 | |
| 552 | * When a pattern contains multiple groups with the same number, a back |
| 553 | reference cannot know which one to scan for a minimum length. This used to |
| 554 | cause the minimum length finder to give up with no result. Now it treats |
| 555 | such references as not adding to the minimum length (which it should have |
| 556 | done all along). |
| 557 | |
| 558 | * Furthermore, the above action now happens only if the back reference is to |
| 559 | a group that exists more than once in a pattern instead of any back |
| 560 | reference in a pattern with duplicate numbers. |
| 561 | |
| 562 | 10. A (*MARK) value inside a successful condition was not being returned by the |
| 563 | interpretive matcher (it was returned by JIT). This bug has been mended. |
| 564 | |
| 565 | 11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work |
| 566 | if the pattern had more than 32 capturing parentheses. This is fixed. In |
| 567 | addition (a) the default limit for groups requested by -o<n> has been raised to |
| 568 | 50, (b) the new --om-capture option changes the limit, (c) an error is raised |
| 569 | if -o asks for a group that is above the limit. |
| 570 | |
| 571 | 12. The quantifier {1} was always being ignored, but this is incorrect when it |
| 572 | is made possessive and applied to an item in parentheses, because a |
| 573 | parenthesized item may contain multiple branches or other backtracking points, |
| 574 | for example /(a|ab){1}+c/ or /(a+){1}+a/. |
| 575 | |
| 576 | 13. For partial matches, pcre2test was always showing the maximum lookbehind |
| 577 | characters, flagged with "<", which is misleading when the lookbehind didn't |
| 578 | actually look behind the start (because it was later in the pattern). Showing |
| 579 | all consulted preceding characters for partial matches is now controlled by the |
| 580 | existing "allusedtext" modifier and, as for complete matches, this facility is |
| 581 | available only for non-JIT matching, because JIT does not maintain the first |
| 582 | and last consulted characters. |
| 583 | |
| 584 | 14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match |
| 585 | if the end of the subject was encountered in a lookahead (conditional or |
| 586 | otherwise), an atomic group, or a recursion. |
| 587 | |
| 588 | 15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero. |
| 589 | |
| 590 | 16. Check for integer overflow when computing lookbehind lengths. Fixes |
| 591 | Clusterfuzz issue 15636. |
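A guard of this general kind can be sketched as below. This is a minimal illustration, not the PCRE2 source: the function name checked_add_length and the use of uint32_t for lengths are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not PCRE2 code): add two lengths, reporting overflow
   instead of letting the sum wrap around. */
bool checked_add_length(uint32_t a, uint32_t b, uint32_t *sum)
{
  assert(sum != NULL);
  if (a > UINT32_MAX - b) return false;   /* a + b would exceed UINT32_MAX */
  *sum = a + b;
  return true;
}
```

A caller would treat a false return as a compile-time error for the pattern rather than continuing with a wrapped length.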
| 592 | |
| 593 | 17. Implemented non-atomic positive lookaround assertions. |
| 594 | |
| 595 | 18. If a lookbehind contained a lookahead that contained another lookbehind |
| 596 | within it, the nested lookbehind was not correctly processed. For example, if |
| 597 | /(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching |
| 598 | "b". |
| 599 | |
| 600 | 19. Implemented pcre2_get_match_data_size(). |
| 601 | |
| 602 | 20. Three alterations to partial matching: |
| 603 | |
| 604 | (a) The definition of a partial match is slightly changed: if a pattern |
| 605 | contains any lookbehinds, an empty partial match may be given, because this |
| 606 | is another situation where adding characters to the current subject can |
| 607 | lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab". |
| 608 | |
| 609 | (b) Similarly, if a pattern could match an empty string, an empty partial |
| 610 | match may be given. Example: /(?![ab]).*/ with subject "ab". This case |
| 611 | applies only to PCRE2_PARTIAL_HARD. |
| 612 | |
| 613 | (c) An empty string partial hard match can be returned for \z and \Z as it |
| 614 | is documented that they shouldn't match. |
| 615 | |
| 616 | 21. A branch that started with (*ACCEPT) was not being recognized as one that |
| 617 | could match an empty string. |
| 618 | |
| 619 | 22. Corrected pcre2_set_character_tables() tables data type: was const unsigned |
| 620 | char * instead of const uint8_t *, as generated by pcre2_maketables(). |
| 621 | |
| 622 | 23. Upgraded to Unicode 12.1.0. |
| 623 | |
| 624 | 24. Add -jitfast command line option to pcre2test (to make all the jit options |
| 625 | available directly). |
| 626 | |
| 627 | 25. Make pcre2test -C show if libreadline or libedit is supported. |
| 628 | |
| 629 | 26. If the length of one branch of a group exceeded 65535 (the maximum value |
| 630 | that is remembered as a minimum length), the whole group's length was |
| 631 | incorrectly recorded as 65535, leading to incorrect "no match" when start-up |
| 632 | optimizations were in force. |
| 633 | |
| 634 | 27. The "rightmost consulted character" value was not always correct; in |
| 635 | particular, if a pattern ended with a negative lookahead, characters that were |
| 636 | inspected in that lookahead were not included. |
| 637 | |
| 638 | 28. Add the pcre2_maketables_free() function. |
| 639 | |
| 640 | 29. The start-up optimization that looks for a unique initial matching |
| 641 | code unit in the interpretive engines uses memchr() in 8-bit mode. When the |
| 642 | search is caseless, it was doing so inefficiently, which ended up slowing down |
| 643 | the match drastically when the subject was very long. The revised code (a) |
| 644 | remembers if one case is not found, so it never repeats the search for that |
| 645 | case after a bumpalong and (b) when one case has been found, it searches only |
| 646 | up to that position for an earlier occurrence of the other case. This fix |
| 647 | applies to both interpretive pcre2_match() and to pcre2_dfa_match(). |
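The two ideas in (a) and (b) can be sketched with plain memchr() calls. This is an illustration under stated assumptions, not the code in pcre2_match.c: the caseless_state structure and the find_caseless() function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical search state: records a case that is known to be absent from
   the rest of the subject, so later calls can skip searching for it. */
typedef struct {
  bool lower_absent;
  bool upper_absent;
} caseless_state;

/* Return a pointer to the first occurrence of either case in [start, end),
   or NULL if neither occurs. */
const unsigned char *find_caseless(const unsigned char *start,
  const unsigned char *end, unsigned char lower, unsigned char upper,
  caseless_state *st)
{
  const unsigned char *p1 = NULL;
  const unsigned char *p2 = NULL;
  size_t len;

  assert(start <= end);
  len = (size_t)(end - start);

  if (!st->lower_absent)
    {
    p1 = memchr(start, lower, len);
    if (p1 == NULL) st->lower_absent = true;   /* (a) never search again */
    }

  if (!st->upper_absent)
    {
    /* (b) if the lower case form was found, only an earlier upper case
       occurrence matters, so the second search stops at that position. */
    size_t limit = (p1 != NULL) ? (size_t)(p1 - start) : len;
    p2 = memchr(start, upper, limit);
    /* Absence can be recorded only if the whole remainder was searched. */
    if (p2 == NULL && p1 == NULL) st->upper_absent = true;
    }

  return (p2 != NULL) ? p2 : p1;
}
```

Because the start position only ever moves forward after a bumpalong, a case that is absent from the current remainder stays absent, which is what makes the remembered flags safe to reuse.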
| 648 | |
| 649 | 30. While scanning to find the minimum length of a group, if any branch has |
| 650 | minimum length zero, there is no need to scan any subsequent branches (a small |
| 651 | compile-time performance improvement). |
| 652 | |
| 653 | 31. Installed a .gitignore file on a user's suggestion. When using the svn |
| 654 | repository with git (through git svn) this helps keep it tidy. |
| 655 | |
| 656 | 32. Add underflow check in JIT which may occur when the value of subject |
| 657 | string pointer is close to 0. |
| 658 | |
| 659 | 33. Arrange for classes such as [Aa] which contain just the two cases of the |
| 660 | same character, to be treated as a single caseless character. This causes the |
| 661 | first and required code unit optimizations to kick in where relevant. |
| 662 | |
| 663 | 34. Improve the bitmap of starting bytes for positive classes that include wide |
| 664 | characters, but no property types, in UTF-8 mode. Previously, on encountering |
| 665 | such a class, the bits for all bytes greater than \xc4 were set, thus |
| 666 | specifying any character with codepoint >= 0x100. Now the only bits that are |
| 667 | set are for the relevant bytes that start the wide characters. This can give a |
| 668 | noticeable performance improvement. |
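As a rough sketch of the idea, the following computes which UTF-8 lead bytes can start a character in a given code point range and sets only those bits. The function names and the 32-byte map layout are assumptions for illustration, not the PCRE2 internals.

```c
#include <assert.h>
#include <stdint.h>

/* First byte of the UTF-8 encoding of code point c (c <= 0x10FFFF). */
uint8_t utf8_lead_byte(uint32_t c)
{
  if (c < 0x80) return (uint8_t)c;
  if (c < 0x800) return (uint8_t)(0xC0 | (c >> 6));
  if (c < 0x10000) return (uint8_t)(0xE0 | (c >> 12));
  return (uint8_t)(0xF0 | (c >> 18));
}

/* Set, in a 256-bit map, the bit for every lead byte that can start a
   character in the inclusive range [first, last]. */
void set_start_bits(uint8_t map[32], uint32_t first, uint32_t last)
{
  uint32_t c = first;

  assert(first <= last && last <= 0x10FFFF);
  for (;;)
    {
    uint8_t b = utf8_lead_byte(c);
    uint32_t step, next;

    map[b >> 3] |= (uint8_t)(1u << (b & 7));

    /* Number of consecutive code points that share one lead byte. */
    step = (c < 0x80) ? 1 : (c < 0x800) ? 0x40 :
           (c < 0x10000) ? 0x1000 : 0x40000;
    next = (c | (step - 1)) + 1;   /* first code point with a new lead byte */
    if (next > last) break;
    c = next;
    }
}
```

For a class covering U+0100 to U+017F, this sets only the bits for bytes 0xC4 and 0xC5.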
| 669 | |
| 670 | 35. If the bitmap of starting code units contains only 1 or 2 bits, replace it |
| 671 | with a single starting code unit (1 bit) or a caseless single starting code |
| 672 | unit if the two relevant characters are case-partners. This is particularly |
| 673 | relevant to the 8-bit library, though it applies to all. It can give a |
| 674 | performance boost for patterns such as [Ww]ord and (word|WORD). However, this |
| 675 | optimization doesn't happen if there is a "required" code unit of the same |
| 676 | value (because the search for a "required" code unit starts at the match start |
| 677 | for non-unique first code unit patterns, but after a unique first code unit, |
| 678 | and patterns such as a*a need the former action). |
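The bitmap reduction can be sketched for the 8-bit case as follows. The enum and function names are hypothetical, the case-partner test uses the C library's tolower() and toupper() rather than PCRE2's own tables, and the exception for a "required" code unit of the same value is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { START_BITMAP, START_SINGLE, START_CASELESS } start_kind;

/* Inspect a 256-bit starting code unit map. If exactly one bit is set, or
   exactly two bits that are case partners, report a single starting code
   unit in *unit; otherwise the bitmap must be kept. */
start_kind classify_start_map(const uint8_t map[32], uint8_t *unit)
{
  int found[2] = { 0, 0 };
  int count = 0;
  int c;

  assert(unit != NULL);
  for (c = 0; c < 256; c++)
    {
    if ((map[c >> 3] & (1u << (c & 7))) == 0) continue;
    if (count == 2) return START_BITMAP;      /* three or more bits set */
    found[count++] = c;
    }

  if (count == 1)
    {
    *unit = (uint8_t)found[0];
    return START_SINGLE;
    }

  if (count == 2 &&
      (tolower(found[0]) == found[1] || toupper(found[0]) == found[1]))
    {
    *unit = (uint8_t)found[0];
    return START_CASELESS;
    }

  return START_BITMAP;   /* empty map, or two unrelated code units */
}
```

With the bits for 'W' and 'w' set, as for [Ww]ord, the sketch reports a caseless single code unit; with 'a' and 'b' set it keeps the bitmap.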
| 679 | |
| 680 | 36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately |
| 681 | after a successful compile, instead of at the start of matching to avoid a |
| 682 | sanitizer complaint (regexec is supposed to be thread safe). |
| 683 | |
| 684 | 37. Add NEON vectorization to JIT to speed up matching of first character and |
| 685 | pairs of characters on ARM64 CPUs. |
| 686 | |
| 687 | 38. If a non-ASCII character was the first in a starting assertion in a |
| 688 | caseless match, the "first code unit" optimization did not get the casing |
| 689 | right, and the assertion failed to match a character in the other case if it |
| 690 | did not start with the same code unit. |
| 691 | |
| 692 | 39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking |
| 693 | operation was incorrectly removed in r1136. Reported by Ralf Junker. |
| 694 | |
| 695 | |
| 696 | Version 10.33 16-April-2019 |
| 697 | --------------------------- |
| 698 | |
| 699 | 1. Added "allvector" to pcre2test to make it easy to check the part of the |
| 700 | ovector that shouldn't be changed, in particular after substitute and failed or |
| 701 | partial matches. |
| 702 | |
| 703 | 2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has |
| 704 | a fixed quantifier greater than 1. This issue was found by Yunho Kim. |
| 705 | |
| 706 | 3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but |
| 707 | prior to release, fixed a bug that caused a crash if pcre2_substitute() was |
| 708 | called with a NULL match context. |
| 709 | |
| 710 | 4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper |
| 711 | functions that use the standard POSIX names. However, in pcre2posix.h the POSIX |
| 712 | names are defined as macros. This should help avoid linking with the wrong |
| 713 | library in some environments while still exporting the POSIX names for |
| 714 | pre-existing programs that use them. (The Debian alternative names are also |
| 715 | defined as macros, but not documented.) |
| 716 | |
| 717 | 5. Fix an xclass matching issue in JIT. |
| 718 | |
| 719 | 6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315). |
| 720 | |
| 721 | 7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and |
| 722 | lookaround assertions, for example, (*pla:...) and (*atomic:...). These are |
| 723 | characterized by a lower case letter following (* and to simplify coding for |
| 724 | this, the character tables created by pcre2_maketables() were updated to add a |
| 725 | new "is lower case letter" bit. At the same time, the now unused "is |
| 726 | hexadecimal digit" bit was removed. The default tables in |
| 727 | src/pcre2_chartables.c.dist are updated. |
| 728 | |
| 729 | 8. Implement the new Perl "script run" features (*script_run:...) and |
| 730 | (*atomic_script_run:...) aka (*sr:...) and (*asr:...). |
| 731 | |
| 732 | 9. Fixed two typos in change 22 for 10.21, which added special handling for |
| 733 | ranges such as a-z in EBCDIC environments. The original code probably never |
| 734 | worked, though there were no bug reports. |
| 735 | |
| 736 | 10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via |
| 737 | pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast |
| 738 | path. Also, when a match fails, set the subject field in the match data to NULL |
| 739 | for tidiness - none of the substring extractors should reference this after |
| 740 | match failure. |
| 741 | |
| 742 | 11. If a pattern started with a subroutine call that had a quantifier with a |
| 743 | minimum of zero, an incorrect "match must start with this character" could be |
| 744 | recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to |
| 745 | be the first character of a match. |
| 746 | |
| 747 | 12. The heap limit checking code in pcre2_dfa_match() could suffer from |
| 748 | overflow if the heap limit was set very large. This could cause incorrect "heap |
| 749 | limit exceeded" errors. |
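One way such an overflow can arise, and one way to avoid it, is sketched below. This assumes the limit is held in kibibytes in a 32-bit value and converted to bytes for comparison; the function names are hypothetical and the actual code in pcre2_dfa_match() may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Overflow-prone variant, kept for comparison only: the product is reduced
   to 32 bits, so a limit of 4 GiB or more wraps to a small value. */
bool exceeds_limit_wrapping(size_t used_bytes, uint32_t limit_kib)
{
  return used_bytes > (uint32_t)(limit_kib * 1024u);
}

/* Overflow-free variant: the multiplication is done in 64 bits, where the
   largest possible product (2^32 * 2^10) still fits. */
bool exceeds_limit(size_t used_bytes, uint32_t limit_kib)
{
  return (uint64_t)used_bytes > (uint64_t)limit_kib * 1024u;
}
```

With a limit of 0x400000 kibibytes (4 GiB), the wrapping variant reports that a single byte exceeds the limit, while the 64-bit variant does not.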
| 750 | |
| 751 | 13. Add "kibibytes" to the heap limit output from pcre2test -C to make the |
| 752 | units clear. |
| 753 | |
| 754 | 14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness. |
| 755 | |
| 756 | 15. Updated the VMS-specific code in pcre2test on the advice of a VMS user. |
| 757 | |
| 758 | 16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from |
| 759 | pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32 |
| 760 | below was unnecessarily complicated, as inttypes.h is a Standard C header, |
| 761 | which is defined to be a superset of stdint.h. Instead of conditionally |
| 762 | including stdint.h or inttypes.h, pcre2.h now unconditionally includes |
| 763 | inttypes.h. This supports environments that do not have stdint.h but do have |
| 764 | inttypes.h, which are known to exist. A note in the autotools documentation |
| 765 | says (November 2018) that there are none known that are the other way round. |
| 766 | |
| 767 | 17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to |
| 768 | forcibly disable the use of %zu and %td in formatting strings because there is |
| 769 | at least one version of VMS that claims to be C99 but does not support these |
| 770 | modifiers. |
| 771 | |
| 772 | 18. Added --disable-pcre2grep-callout-fork, which restricts the callout support |
| 773 | in pcre2grep to the inbuilt echo facility. This may be useful in environments |
| 774 | that do not support fork(). |
| 775 | |
| 776 | 19. Fix two instances of <= 0 being applied to unsigned integers (the VMS |
| 777 | compiler complains). |
| 778 | |
| 779 | 20. Added "fork" support for VMS to pcre2grep, for running an external program |
| 780 | via a string callout. |
| 781 | |
| 782 | 21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel. |
| 783 | |
| 784 | 22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN) |
| 785 | followed by ^ it was not recognized as anchored. |
| 786 | |
| 787 | 23. The RunGrepTest script used to cut out the test of NUL characters for |
| 788 | Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD |
| 789 | systems can't either. I've inverted the test so that only those OS that are |
| 790 | known to work (currently only Linux) try to run this test. |
| 791 | |
| 792 | 24. Some tests in RunGrepTest appended to testtrygrep from two different file |
| 793 | descriptors instead of redirecting stderr to stdout. This worked on Linux, but |
| 794 | it was reported not to on other systems, causing the tests to fail. |
| 795 | |
| 796 | 25. In the RunTest script, make the test for stack setting use the same value |
| 797 | for the stack as it needs for -bigstack. |
| 798 | |
| 799 | 26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning. |
| 800 | |
| 801 | 26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s |
| 802 | which are valid in character classes, but not as the end of ranges, were being |
| 803 | treated as literals. An example is [_-\s] (but not [\s-_] because that gave an |
| 804 | error at the *start* of a range). Now an "invalid range" error is given |
| 805 | independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. |
| 806 | |
| 807 | 27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known |
| 808 | escape sequences such as \eX when they appeared invalidly in a character class. |
| 809 | Now the option applies only to unrecognized or malformed escape sequences. |
| 810 | |
| 811 | 28. Fix word boundary in JIT compiler. Patch by Mike Munday. |
| 812 | |
| 813 | 29. The pcre2_dfa_match() function was incorrectly handling conditional version |
| 814 | tests such as (?(VERSION>=0)...) when the version test was true. Incorrect |
| 815 | processing or a crash could result. |
| 816 | |
| 817 | 30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group |
| 818 | names, as Perl does. There was a small bug in this new code, found by |
| 819 | ClusterFuzz 12950, fixed before release. |
| 820 | |
| 821 | 31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh} |
| 822 | construct. |
| 823 | |
| 824 | 32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits |
| 825 | from auto-anchoring if \p{Any}* starts a pattern. |
| 826 | |
| 827 | 33. Compile invalid UTF check in JIT test when only pcre32 is enabled. |
| 828 | |
| 829 | 34. For some time now, CMake has been warning about the setting of policy |
| 830 | CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be |
| 831 | removed in a future version. A request for CMake expertise on the list produced |
| 832 | no result, so I have now hacked CMakeLists.txt along the lines of some changes |
| 833 | I found on the Internet. The new code no longer needs the policy setting, and |
| 834 | it appears to work fine on Linux. |
| 835 | |
| 836 | 35. Setting --enable-jit=auto for an out-of-tree build failed because the |
| 837 | source directory wasn't always in the search path for AC_TRY_COMPILE. Patch |
| 838 | from Ross Burton. |
| 839 | |
| 840 | 36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available. |
| 841 | Patch by Guillem Jover. |
| 842 | |
| 843 | 37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler |
| 844 | warnings were reported. |
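The entry does not say which warnings were reported. One common reason to prefer the unsigned form is that 1 is a signed int, so shifting it into the sign bit is not well defined, whereas 1u shifts as an unsigned value. A small sketch with hypothetical option names:

```c
#include <stdint.h>

/* Hypothetical option bits, for illustration only. */
#define OPT_LOW   (1u << 10)
#define OPT_HIGH  (1u << 31)   /* 1 << 31 would overflow a 32-bit signed int */

/* Combine an option bit with an existing unsigned option word. */
uint32_t set_option(uint32_t options, uint32_t bit)
{
  return options | bit;
}
```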
| 845 | |
| 846 | 38. Using the clang compiler with sanitizing options causes runtime complaints |
| 847 | about truncation for statements such as x = ~x when x is an 8-bit value; it |
| 848 | seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x |
| 849 | gets rid of the warnings. There were also two missing casts in pcre2test. |
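The behaviour follows from C's integer promotions: an 8-bit operand is promoted to int before ~ is applied, so the result has its high bits set. The sketch below shows both forms; the function names are made up for the example.

```c
#include <stdint.h>

/* ~x is evaluated after x has been promoted to int, so for any 8-bit x the
   result is a negative int that no longer fits in 8 bits. */
int complement_promoted(uint8_t x)
{
  return ~x;
}

/* 255 ^ x flips the same eight bits but stays within 0..255, so storing it
   back into an 8-bit variable loses nothing. */
uint8_t complement_8bit(uint8_t x)
{
  return (uint8_t)(255 ^ x);
}
```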
| 850 | |
| 851 | |
| 852 | Version 10.32 10-September-2018 |
| 853 | ------------------------------- |
| 854 | |
| 855 | 1. When matching using the REG_STARTEND feature of the POSIX API with a |
| 856 | non-zero starting offset, unset capturing groups with lower numbers than a |
| 857 | group that did capture something were not being correctly returned as "unset" |
| 858 | (that is, with offset values of -1). |
| 859 | |
| 860 | 2. When matching using the POSIX API, pcre2test used to omit listing unset |
| 861 | groups altogether. Now it shows those that come before any actual captures as |
| 862 | "<unset>", as happens for non-POSIX matching. |
| 863 | |
| 864 | 3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only", |
| 865 | whatever the build configuration was. It now correctly says "\R matches all |
| 866 | Unicode newlines" in the default case when --enable-bsr-anycrlf has not been |
| 867 | specified. Similarly, running "pcre2test -C bsr" never produced the result |
| 868 | ANY. |
| 869 | |
| 870 | 4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing |
| 871 | multi-code-unit characters caused bad behaviour and possibly a crash. This |
| 872 | issue was fixed for other kinds of repeat in release 10.20 by change 19, but |
| 873 | repeating character classes were overlooked. |
| 874 | |
| 875 | 5. pcre2grep now supports the inclusion of binary zeros in patterns that are |
| 876 | read from files via the -f option. |
| 877 | |
| 878 | 6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2. |
| 879 | |
| 880 | 7. Added --enable-jit=auto support to configure.ac. |
| 881 | |
| 882 | 8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit |
| 883 | modes for the benefit of m68k, where pointers can be 16-bit aligned. The |
| 884 | dummies force 32-bit alignment and this ensures that the structure is a |
| 885 | multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other |
| 886 | architectures, alignment requirements take care of this automatically. |
| 887 | |
| 888 | 9. When returning an error from pcre2_pattern_convert(), ensure the error |
| 889 | offset is set zero for early errors. |
| 890 | |
| 891 | 10. A number of patches for Windows support from Daniel Richard G: |
| 892 | |
| 893 | (a) List of error numbers in Runtest.bat corrected (it was not the same as in |
| 894 | Runtest). |
| 895 | |
| 896 | (b) pcre2grep snprintf() workaround as used elsewhere in the tree. |
| 897 | |
| 898 | (c) Support for non-C99 snprintf() that returns -1 in the overflow case. |
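The difference in (c) matters to any caller that checks for truncation: C99 returns the length that would have been written, while some older implementations return -1. A wrapper that copes with both might look like this sketch; format_checked() is a hypothetical name, not a function in the PCRE2 tree.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: format into buf and report whether the whole result
   fitted. Both overflow conventions are treated as "did not fit". */
bool format_checked(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(buf != NULL);
  if (size == 0) return false;

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);

  buf[size - 1] = '\0';   /* some pre-C99 versions do not terminate */

  /* C99: n is the untruncated length. Older versions: n may be -1. */
  return n >= 0 && (size_t)n < size;
}
```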
| 899 | |
| 900 | 11. Minor tidy of pcre2_dfa_match() code. |
| 901 | |
| 902 | 12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer |
| 903 | use the stack for local workspace and local ovectors. Instead, an initial block |
| 904 | of stack is reserved, but if this is insufficient, heap memory is used. The |
| 905 | heap limit parameter now applies to pcre2_dfa_match(). |
| 906 | |
| 907 | 13. If a "find limits" test of DFA matching in pcre2test resulted in too many |
| 908 | matches for the ovector, no matches were displayed. |
| 909 | |
| 910 | 14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as |
| 911 | EOF. The test looks to have come from a fuzzer. |
| 912 | |
| 913 | 15. If PCRE2 was built with a default match limit a lot greater than the |
| 914 | default default of 10 000 000, some JIT tests of the match limit no longer |
| 915 | failed. All such tests now set 10 000 000 as the upper limit. |
| 916 | |
| 917 | 16. Another Windows related patch for pcre2grep to ensure that WIN32 is |
| 918 | undefined under Cygwin. |
| 919 | |
| 920 | 17. Test for the presence of stdint.h and inttypes.h in configure and CMake and |
| 921 | include whichever exists (stdint preferred) instead of unconditionally |
| 922 | including stdint. This makes life easier for old and non-standard systems. |
| 923 | |
| 924 | 18. Further changes to improve portability, especially to old and/or |
| 925 | non-standard systems: |
| 926 | |
| 927 | (a) Put all printf arguments in RunGrepTest into single, not double, quotes, |
| 928 | and use \0 not \x00 for binary zero. |
| 929 | |
| 930 | (b) Avoid the use of C++ (i.e. BCPL) // comments. |
| 931 | |
| 932 | (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of |
| 933 | these now, if using MSVC or a standard C before C99, %lu is used with a |
| 934 | cast if necessary. |
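The parameterization in (c) can be sketched with a pair of macros chosen at compile time. The macro and function names below are illustrative rather than quoted from pcre2test.c. Note that on platforms where unsigned long is narrower than size_t, the fallback cast can truncate large values.

```c
#include <stddef.h>
#include <stdio.h>

/* Use %zu where C99 formatting is available, otherwise %lu with a cast. */
#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || \
    __STDC_VERSION__ < 199901L
#define FMT_SIZE "lu"
#define CAST_SIZE (unsigned long)
#else
#define FMT_SIZE "zu"
#define CAST_SIZE
#endif

/* Format a size_t value into buf; returns what snprintf() returns. */
int format_size(char *buf, size_t buflen, size_t value)
{
  return snprintf(buf, buflen, "%" FMT_SIZE, CAST_SIZE value);
}
```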
| 935 | |
| 936 | 19. Applied a contributed patch to CMakeLists.txt to increase the stack size |
| 937 | when linking pcre2test with MSVC. This gets rid of a stack overflow error in |
| 938 | the standard set of tests. |
| 939 | |
| 940 | 20. Output a warning in pcre2test when ignoring the "altglobal" modifier when |
| 941 | it is given with the "replace" modifier. |
| 942 | |
| 943 | 21. In both pcre2test and pcre2_substitute(), with global matching, a pattern |
| 944 | that matched an empty string, but never at the starting match offset, was not |
| 945 | handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such |
| 946 | a pattern. Because \G is in a lookbehind assertion, there has to be a |
| 947 | "bumpalong" before there can be a match. The automatic "advance by one |
| 948 | character after an empty string match" rule is therefore inappropriate. A more |
| 949 | complicated algorithm has now been implemented. |
| 950 | |
| 951 | 22. When checking to see if a lookbehind is of fixed length, lookaheads were |
| 952 | correctly ignored, but quantifiers on lookaheads were not being ignored, leading |
| 953 | to an incorrect "lookbehind assertion is not fixed length" error. |
| 954 | |
| 955 | 23. The VERSION condition test was reading fractional PCRE2 version numbers |
| 956 | such as the 04 in 10.04 incorrectly and hence giving wrong results. |
| 957 | |
| 958 | 24. Updated to Unicode version 11.0.0. As well as the usual addition of new |
| 959 | scripts and characters, this involved re-jigging the grapheme break property |
| 960 | algorithm because Unicode has changed the way emojis are handled. |
| 961 | |
| 962 | 25. Fixed an obscure bug that struck when there were two atomic groups not |
| 963 | separated by something with a backtracking point. There could be an incorrect |
| 964 | backtrack into the first of the atomic groups. A complicated example is |
| 965 | /(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP |
| 966 | shouldn't find a MARK (because it is in an atomic group), but it did. |
| 967 | |
| 968 | 26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set |
| 969 | a list of modifiers for all subsequent patterns - only those that the script |
| 970 | recognizes are meaningful; (2) #subject lines can be used to set or unset a |
| 971 | default "mark" modifier; (3) Unsupported #command lines give a warning when |
| 972 | they are ignored; (4) Mark data is output only if the "mark" modifier is |
| 973 | present. |
| 974 | |
| 975 | 27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported. |
| 976 | |
| 977 | 28. A (*MARK) name was not being passed back for positive assertions that were |
| 978 | terminated by (*ACCEPT). |
| 979 | |
| 980 | 29. Add support for \N{U+dddd}, but only in Unicode mode. |
| 981 | |
| 982 | 30. Add support for (?^) for unsetting all imnsx options. |
| 983 | |
| 984 | 31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose |
| 985 | code point was less than 256 and that were recognized by the lookup table |
| 986 | generated by pcre2_maketables(), which uses isspace() to identify white space. |
| 987 | Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085, |
| 988 | U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by |
| 989 | Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl. |
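The full Unicode "Pattern White Space" set is small enough to test directly. The function below is a standalone sketch with a made-up name. It hard-codes the ASCII white space characters instead of consulting the isspace()-derived lookup table described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if code point c has the Unicode Pattern_White_Space property:
   U+0009..U+000D, U+0020, U+0085, U+200E, U+200F, U+2028, U+2029. */
bool is_pattern_white_space(uint32_t c)
{
  return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085 ||
         c == 0x200E || c == 0x200F || c == 0x2028 || c == 0x2029;
}
```

Characters such as U+00A0 (no-break space) are white space in the general sense but are not in this set.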
| 990 | |
| 991 | 32. In certain circumstances, option settings within patterns were not being |
| 992 | correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly |
| 993 | matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the |
| 994 | end of its group during the parse process, but without another setting such as |
| 995 | (?m) the compile phase got it right.) This bug was introduced by the |
| 996 | refactoring in release 10.23. |
| 997 | |
| 998 | 33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to |
| 999 | define memmove() as a function call to bcopy(). This hasn't been tested for a |
| 1000 | long time because in pcre2test the result of memmove() was being used, whereas |
| 1001 | bcopy() doesn't return a result. This feature is now refactored always to call |
| 1002 | an emulation function when there is no memmove(). The emulation makes use of |
| 1003 | bcopy() when available. |
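An emulation function of this kind has to return the destination pointer, which bcopy() does not. The sketch below uses a plain byte loop instead of bcopy() so that it is self-contained; the name emulated_memmove() is hypothetical and this is not the code in the PCRE2 tree.

```c
#include <stddef.h>

/* Hypothetical stand-in for memmove(): copies n bytes between possibly
   overlapping regions and returns dest, as memmove() does. */
void *emulated_memmove(void *dest, const void *src, size_t n)
{
  unsigned char *d = dest;
  const unsigned char *s = src;

  if (d < s)
    {
    while (n-- > 0) *d++ = *s++;      /* forward copy is safe */
    }
  else if (d > s)
    {
    d += n;
    s += n;
    while (n-- > 0) *--d = *--s;      /* copy backwards to protect overlap */
    }

  return dest;
}
```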
| 1004 | |
| 1005 | 34. When serializing a pattern, set the memctl, executable_jit, and tables |
| 1006 | fields (that is, all the fields that contain pointers) to zeros so that the |
| 1007 | result of serializing is always the same. These fields are re-set when the |
| 1008 | pattern is deserialized. |
| 1009 | |
| 1010 | 35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated |
| 1011 | negative class with no characters less than 0x100 followed by a positive class |
| 1012 | with only characters less than 0x100, the first class was incorrectly being |
| 1013 | auto-possessified, causing incorrect match failures. |
| 1014 | |
| 1015 | 36. Removed the character type bit ctype_meta, which dates from PCRE1 and is |
| 1016 | not used in PCRE2. |
| 1017 | |
| 1018 | 37. Tidied up unnecessarily complicated macros used in the escapes table. |
| 1019 | |
| 1020 | 38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted |
| 1021 | from distribution tarballs, owing to a typo in Makefile.am which had |
| 1022 | testoutput8-16-3 twice. Now fixed. |
| 1023 | |
| 1024 | 39. If the only branch in a conditional subpattern was anchored, the whole |
| 1025 | subpattern was treated as anchored, when it should not have been, since the |
| 1026 | assumed empty second branch cannot be anchored. Demonstrated by test patterns |
| 1027 | such as /(?(1)^())b/ or /(?(?=^))b/. |
| 1028 | |
| 1029 | 40. A repeated conditional subpattern that could match an empty string was |
| 1030 | always assumed to be unanchored. Now it is checked just like any other |
| 1031 | repeated conditional subpattern, and can be found to be anchored if the minimum |
| 1032 | quantifier is one or more. I can't see much use for a repeated anchored |
| 1033 | pattern, but the behaviour is now consistent. |
| 1034 | |
| 1035 | 41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint |
| 1036 | (for an event that could never occur but you had to have external information |
| 1037 | to know that). |
| 1038 | |
| 1039 | 42. If before the first match in a file that was being searched by pcre2grep |
| 1040 | there was a line that was sufficiently long to cause the input buffer to be |
| 1041 | expanded, the variable holding the location of the end of the previous match |
| 1042 | was being adjusted incorrectly, and could cause an overflow warning from a code |
| 1043 | sanitizer. However, as the value is used only to print pending "after" lines |
| 1044 | when the next match is reached (and there are no such lines in this case) this |
| 1045 | bug could do no damage. |
| 1046 | |
| 1047 | |
| 1048 | Version 10.31 12-February-2018 |
| 1049 | ------------------------------ |
| 1050 | |
| 1051 | 1. Fix typo (missing ]) in VMS code in pcre2test.c. |
| 1052 | |
| 1053 | 2. Replace the replicated code for matching extended Unicode grapheme sequences |
| 1054 | (which got a lot more complicated by change 10.30/49) by a single subroutine |
| 1055 | that is called by both pcre2_match() and pcre2_dfa_match(). |
| 1056 | |
| 1057 | 3. Add idempotent guard to pcre2_internal.h. |
| 1058 | |
| 1059 | 4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and |
| 1060 | PCRE2_CONFIG_COMPILED_WIDTHS. |
| 1061 | |
| 1062 | 5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is |
| 1063 | defined (e.g. by --enable-never-backslash-C). |
| 1064 | |
| 1065 | 6. Defined public names for all the pcre2_compile() error numbers, and used |
| 1066 | the public names in pcre2_convert.c. |
| 1067 | |
| 1068 | 7. Fixed a small memory leak in pcre2test (convert contexts). |
| 1069 | |
| 1070 | 8. Added two casts to compile.c and one to match.c to avoid compiler warnings. |
| 1071 | |
| 1072 | 9. Added code to pcre2grep when compiled under VMS to set the symbol |
| 1073 | PCRE2GREP_RC to the exit status, because VMS does not distinguish between |
| 1074 | exit(0) and exit(1). |
| 1075 | |
| 1076 | 10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain |
| 1077 | about a bad option only if the following argument item does not start with a |
| 1078 | hyphen. |
| 1079 | |
| 1080 | 11. pcre2grep was truncating components of file names to 128 characters when |
| 1081 | processing files with the -r option, and also (some very odd code) truncating |
| 1082 | path names to 512 characters. There is now a check on the absolute length of |
| 1083 | full path file names, which may be up to 2047 characters long. |
| 1084 | |
| 1085 | 12. When an assertion contained (*ACCEPT) it caused all open capturing groups |
| 1086 | to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to |
| 1087 | misbehaviour for subsequent references to groups that started outside the |
| 1088 | assertion. ACCEPT in an assertion now closes only those groups that were |
| 1089 | started within that assertion. Fixes oss-fuzz issues 3852 and 3891. |
| 1090 | |
| 1091 | 13. Multiline matching in pcre2grep was misbehaving if the pattern matched |
| 1092 | within a line, and then matched again at the end of the line and over into |
| 1093 | subsequent lines. Behaviour was different with and without colouring, and |
| 1094 | sometimes context lines were incorrectly printed and/or line endings were lost. |
| 1095 | All these issues should now be fixed. |
| 1096 | |
| 1097 | 14. If --line-buffered was specified for pcre2grep when input was from a |
| 1098 | compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be |
| 1099 | ignored for compressed files.) |
| 1100 | |
| 1101 | 15. Although pcre2_jit_match checks whether the pattern is compiled |
| 1102 | in a given mode, it was also expected that at least one mode is available. |
| 1103 | This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION |
| 1104 | when the pattern is not optimized by JIT at all. |
| 1105 | |
| 1106 | 16. The line number and related variables such as match counts in pcre2grep |
| 1107 | were all int variables, causing overflow when files with more than 2147483647 |
| 1108 | lines were processed (assuming 32-bit ints). They have all been changed to |
| 1109 | unsigned long ints. |
| 1110 | |
| 1111 | 17. If a backreference with a minimum repeat count of zero was first in a |
| 1112 | pattern, apart from assertions, an incorrect first matching character could be |
| 1113 | recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set |
| 1114 | as the first character of a match. |
| 1115 | |
| 1116 | 18. Characters in a leading positive assertion are considered for recording a |
| 1117 | first character of a match when the rest of the pattern does not provide one. |
| 1118 | However, a character in a non-assertive group within a leading assertion such |
| 1119 | as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an |
| 1120 | infelicity rather than an outright bug, because it did not affect the result of |
| 1121 | a match, just its speed. (In fact, in this case, the starting 'a' was |
| 1122 | subsequently picked up in the study.) |
| 1123 | |
| 1124 | 19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return" |
| 1125 | instead of "RRETURN" saves unwinding the backtracks in these cases (only one |
| 1126 | didn't). |
| 1127 | |
| 1128 | 20. Allocate a single callout block on the stack at the start of pcre2_match() |
| 1129 | and set its never-changing fields once only. Do the same for pcre2_dfa_match(). |
| 1130 | |
| 1131 | 21. Save the extra compile options (set in the compile context) with the |
| 1132 | compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS |
| 1133 | to retrieve them, and update pcre2test to show them. |
| 1134 | |
| 1135 | 22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new |
| 1136 | field callout_flags in callout blocks. The bits are set by pcre2_match(), but |
| 1137 | not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts |
| 1138 | if the callout_extra subject modifier is set. These bits are provided to help |
| 1139 | with tracking how a backtracking match is proceeding. |
| 1140 | |
| 1141 | 23. Updated the pcre2demo.c demonstration program, which was missing the extra |
| 1142 | code for -g that handles the case when \K in an assertion causes the match to |
| 1143 | end at the original start point. Also arranged for it to detect when \K causes |
| 1144 | the end of a match to be before its start. |
| 1145 | |
| 1146 | 24. Similar to 23 above, strange things (including loops) could happen in |
| 1147 | pcre2grep when \K was used in an assertion when --colour was used or in |
| 1148 | multiline mode. The "end at original start point" bug is fixed, and if the end |
| 1149 | point is found to be before the start point, they are swapped. |
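
The swap described in item 24 might look something like the sketch below. The function name and the use of size_t offsets are assumptions made for illustration; the actual pcre2grep code is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: make sure a reported match range satisfies
   start <= end, swapping the two offsets if \K has left the end point
   before the start point. */
static void normalise_match_range(size_t *start, size_t *end)
{
  assert(start != NULL && end != NULL);
  if (*end < *start)
    {
    size_t tmp = *start;
    *start = *end;
    *end = tmp;
    }
}
```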
| 1150 | |
| 1151 | 25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT |
| 1152 | matching (both pcre2_match() and pcre2_dfa_match()) and the matched string |
| 1153 | started with the first code unit of a newline sequence, matching failed because |
| 1154 | it was not tried at the newline. |
| 1155 | |
| 1156 | 26. Code for giving up a non-partial match after failing to find a starting |
| 1157 | code unit anywhere in the subject was missing when searching for one of a |
| 1158 | number of code units (the bitmap case) in both pcre2_match() and |
| 1159 | pcre2_dfa_match(). This was a missing optimization rather than a bug. |
| 1160 | |
| 1161 | 27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a |
| 1162 | pointer argument rather than a code unit value. This should not have affected |
| 1163 | the generated code. |
| 1164 | |
| 1165 | 28. The JIT compiler has been updated. |
| 1166 | |
| 1167 | 29. Avoid pointer overflow for unset captures in pcre2_substring_list_get(). |
| 1168 | This could not actually cause a crash because it was always used in a memcpy() |
| 1169 | call with zero length. |
| 1170 | |
| 1171 | 30. Some internal structures have a variable-length ovector[] as their last |
| 1172 | element. Their actual memory is obtained dynamically, giving an ovector of |
| 1173 | appropriate length. However, they are defined in the structure as |
| 1174 | ovector[NUMBER], where NUMBER is large so that array bound checkers don't |
| 1175 | grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing |
| 1176 | groups, making the ovector larger than this. The number has been increased to |
| 1177 | 131072, which allows for the maximum number of captures (65535) plus the |
| 1178 | overall match. This fixes oss-fuzz issue 5415. |
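
The arithmetic behind the figures in item 30 can be checked with a small sketch. Each capture needs a pair of ovector slots, and one extra pair holds the overall match. The structure below is hypothetical, and it uses a C99 flexible array member rather than the fixed ovector[NUMBER] declaration that the item describes, purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CAPTURES 65535u   /* maximum number of captures, per item 30 */

/* Hypothetical structure for illustration only; the real internal
   structures declare ovector[NUMBER] with a large fixed NUMBER. */
typedef struct {
  size_t pair_count;
  size_t ovector[];           /* C99 flexible array member */
} demo_block;

/* Two slots (start, end) per capture, plus one pair for the whole match. */
static size_t ovector_slots(size_t captures)
{
  return 2 * (captures + 1);
}

/* Allocate a block whose ovector is just large enough. */
static demo_block *demo_block_create(size_t captures)
{
  demo_block *b;
  assert(captures <= MAX_CAPTURES);
  b = malloc(sizeof(demo_block) + ovector_slots(captures) * sizeof(size_t));
  if (b != NULL) b->pair_count = captures + 1;
  return b;
}
```

With these definitions, ovector_slots(65535) is 131072, the new value of NUMBER, and ovector_slots(5000) is 10002, which already exceeds the old value of 10000.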
| 1179 | |
| 1180 | 31. Auto-possessification at the end of a capturing group was dependent on what |
| 1181 | follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused |
| 1182 | incorrect behaviour when the group was called recursively from elsewhere in the |
| 1183 | pattern where something different might follow. This bug is an unforeseen |
| 1184 | consequence of change #1 for 10.30 - the implementation of backtracking into |
| 1185 | recursions. Iterators at the ends of capturing groups are no longer considered |
| 1186 | for auto-possessification if the pattern contains any recursions. Fixes |
| 1187 | Bugzilla #2232. |
| 1188 | |
| 1189 | |
| 1190 | Version 10.30 14-August-2017 |
| 1191 | ---------------------------- |
| 1192 | |
| 1193 | 1. The main interpreter, pcre2_match(), has been refactored into a new version |
| 1194 | that does not use recursive function calls (and therefore the stack) for |
| 1195 | remembering backtracking positions. This makes --disable-stack-for-recursion a |
| 1196 | NOOP. The new implementation allows backtracking into recursive group calls in |
| 1197 | patterns, making it more compatible with Perl, and also fixes some other |
| 1198 | hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because |
| 1199 | the old code had a number of fudges to try to reduce stack usage. It seems to |
| 1200 | run no slower than the old code. |
| 1201 | |
| 1202 | A number of bugs in the refactored code were subsequently fixed during testing |
| 1203 | before release, but after the code was made available in the repository. These |
| 1204 | bugs were never in fully released code, but are noted here for the record. |
| 1205 | |
| 1206 | (a) If a pattern had fewer capturing parentheses than the ovector supplied in |
| 1207 | the match data block, a memory error (detectable by ASAN) occurred after |
| 1208 | a match, because the external block was being set from non-existent |
| 1209 | internal ovector fields. Fixes oss-fuzz issue 781. |
| 1210 | |
| 1211 | (b) A pattern with very many capturing parentheses (when the internal frame |
| 1212 | size was greater than the initial frame vector on the stack) caused a |
| 1213 | crash. A vector on the heap is now set up at the start of matching if the |
| 1214 | vector on the stack is not big enough to handle at least 10 frames. |
| 1215 | Fixes oss-fuzz issue 783. |
| 1216 | |
| 1217 | (c) Handling of (*VERB)s in recursions was wrong in some cases. |
| 1218 | |
| 1219 | (d) Captures in negative assertions that were used as conditions were not |
| 1220 | happening if the assertion matched via (*ACCEPT). |
| 1221 | |
| 1222 | (e) Mark values were not being passed out of recursions. |
| 1223 | |
| 1224 | (f) Refactor some code in do_callout() to avoid picky compiler warnings about |
| 1225 | negative indices. Fixes oss-fuzz issue 1454. |
| 1226 | |
| 1227 | (g) Similarly refactor the way the variable length ovector is addressed for |
| 1228 | similar reasons. Fixes oss-fuzz issue 1465. |
| 1229 | |
| 1230 | 2. Now that pcre2_match() no longer uses recursive function calls (see above), |
| 1231 | the "match limit recursion" value seems misnamed. It still exists, and limits |
| 1232 | the depth of tree that is searched. To avoid future confusion, it has been |
| 1233 | renamed as "depth limit" in all relevant places (--with-depth-limit, |
| 1234 | (*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still |
| 1235 | available for backwards compatibility. |
| 1236 | |
| 1237 | 3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers: |
| 1238 | |
| 1239 | (a) Check for malloc failures when getting memory for the ovector (POSIX) or |
| 1240 | the match data block (non-POSIX). |
| 1241 | |
| 1242 | 4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property |
| 1243 | for a character with a code point greater than 0x10ffff (the Unicode maximum) |
| 1244 | caused a crash. |
| 1245 | |
| 1246 | 5. If a lookbehind assertion that contained a back reference to a group |
| 1247 | appearing later in the pattern was compiled with the PCRE2_ANCHORED option, |
| 1248 | undefined actions (often a segmentation fault) could occur, depending on what |
| 1249 | other options were set. An example assertion is (?<!\1(abc)) where the |
| 1250 | reference \1 precedes the group (abc). This fixes oss-fuzz issue 865. |
| 1251 | |
| 1252 | 6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for |
| 1253 | pcre2test to use it to output the frame size when the "framesize" modifier is |
| 1254 | given. |
| 1255 | |
| 1256 | 7. Reworked the recursive pattern matching in the JIT compiler to follow the |
| 1257 | interpreter changes. |
| 1258 | |
| 1259 | 8. When the zero_terminate modifier was specified on a pcre2test subject line |
| 1260 | for global matching, unpredictable things could happen. For example, in UTF-8 |
| 1261 | mode, the pattern //g,zero_terminate read random memory when matched against an |
| 1262 | empty string with zero_terminate. This was a bug in pcre2test, not the library. |
| 1263 | |
| 1264 | 9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out |
| 1265 | of the section that is compiled when Unix-style directory scanning is |
| 1266 | available, and into a new section that is always compiled for Windows. |
| 1267 | |
| 1268 | 10. In pcre2test, explicitly close the file after an error during serialization |
| 1269 | or deserialization (the "load" or "save" commands). |
| 1270 | |
| 1271 | 11. Fix memory leak in pcre2_serialize_decode() when the input is invalid. |
| 1272 | |
| 1273 | 12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with |
| 1274 | a NULL pattern pointer when Unicode support is available. |
| 1275 | |
| 1276 | 13. When the 32-bit library was being tested by pcre2test, error messages that |
| 1277 | were longer than 64 code units could cause a buffer overflow. This was a bug in |
| 1278 | pcre2test. |
| 1279 | |
| 1280 | 14. The alternative matching function, pcre2_dfa_match(), misbehaved if it |
| 1281 | encountered a character class with a possessive repeat, for example [a-f]{3}+. |
| 1282 | |
| 1283 | 15. The depth (formerly recursion) limit now applies to DFA matching (as |
| 1284 | of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA |
| 1285 | matching to find the minimum value for this limit. |
| 1286 | |
| 1287 | 16. Since 10.21, if pcre2_match() was called with a null context, default |
| 1288 | memory allocation functions were used instead of whatever was used when the |
| 1289 | pattern was compiled. |
| 1290 | |
| 1291 | 17. Changes to the pcre2test "memory" modifier on a subject line. These apply |
| 1292 | only to pcre2_match(): |
| 1293 | |
| 1294 | (a) Warn if null_context is set on both pattern and subject, because the |
| 1295 | memory details cannot then be shown. |
| 1296 | |
| 1297 | (b) Remember (up to a certain number of) memory allocations and their |
| 1298 | lengths, and list only the lengths, so as to be system-independent. |
| 1299 | (In practice, the new interpreter never has more than 2 blocks allocated |
| 1300 | simultaneously.) |
| 1301 | |
| 1302 | 18. Make pcre2test detect an error return from pcre2_get_error_message(), give |
| 1303 | a message, and abandon the run (this would have detected #13 above). |
| 1304 | |
| 1305 | 19. Implemented PCRE2_ENDANCHORED. |
| 1306 | |
| 1307 | 20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement |
| 1308 | the --output=text (-O) option and the inbuilt callout echo. |
| 1309 | |
| 1310 | 21. Extend auto-anchoring etc. to ignore groups with a zero qualifier and |
| 1311 | single-branch conditions with a false condition (e.g. DEFINE) at the start of a |
| 1312 | branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as |
| 1313 | anchored. |
| 1314 | |
| 1315 | 22. Added an explicit limit on the amount of heap used by pcre2_match(), set by |
| 1316 | pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). Upgraded pcre2test to show the |
| 1317 | heap limit along with other pattern information, and to find the minimum when |
| 1318 | the find_limits modifier is set. |
| 1319 | |
| 1320 | 23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled |
| 1321 | pattern is set up so as to initialize any padding the compiler might have |
| 1322 | included. This avoids valgrind warnings when a compiled pattern is copied, in |
| 1323 | particular when it is serialized. |
| 1324 | |
| 1325 | 24. Remove a redundant line of code left in accidentally a long time ago. |
| 1326 | |
| 1327 | 25. Remove a duplication typo in pcre2_tables.c. |
| 1328 | |
| 1329 | 26. Correct an incorrect cast in pcre2_valid_utf.c. |
| 1330 | |
| 1331 | 27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the |
| 1332 | tests to improve coverage. |
| 1333 | |
| 1334 | 28. Some fixes/tidies as a result of looking at Coverity Scan output: |
| 1335 | |
| 1336 | (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c. |
| 1337 | (b) Added some casts to avoid "suspicious implicit sign extension". |
| 1338 | (c) Resource leaks in pcre2test in rare error cases. |
| 1339 | (d) Avoid warning for never-used case OP_TABLE_LENGTH which is just a fudge |
| 1340 | for checking at compile time that tables are the right size. |
| 1341 | (e) Add missing "fall through" comment. |
| 1342 | |
| 1343 | 29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features. |
| 1344 | |
| 1345 | 30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this. |
| 1346 | |
| 1347 | 31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in |
| 1348 | pcre2test, a crash could occur. |
| 1349 | |
| 1350 | 32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so |
| 1351 | that all the tests can run with clang's sanitizing options. |
| 1352 | |
| 1353 | 33. Implement extra compile options in the compile context and add the first |
| 1354 | one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. |
| 1355 | |
| 1356 | 34. Implement newline type PCRE2_NEWLINE_NUL. |
| 1357 | |
| 1358 | 35. A lookbehind assertion that had a zero-length branch caused undefined |
| 1359 | behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859. |
| 1360 | |
| 1361 | 36. The match limit value now also applies to pcre2_dfa_match() as there are |
| 1362 | patterns that can use up a lot of resources without necessarily recursing very |
| 1363 | deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761. |
| 1364 | |
| 1365 | 37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. |
| 1366 | |
| 1367 | 38. Fix returned offsets from regexec() when REG_STARTEND is used with a |
| 1368 | starting offset greater than zero. |
| 1369 | |
| 1370 | 39. Implement REG_PEND (GNU extension) for the POSIX wrapper. |
| 1371 | |
| 1372 | 40. Implement the subject_literal modifier in pcre2test, and allow jitstack on |
| 1373 | pattern lines. |
| 1374 | |
| 1375 | 41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC. |
| 1376 | |
| 1377 | 42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit |
| 1378 | of pcre2grep. |
| 1379 | |
| 1380 | 43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL, |
| 1381 | PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs: |
| 1382 | |
| 1383 | (a) The -F option did not work for fixed strings containing \E. |
| 1384 | (b) The -w option did not work for patterns with multiple branches. |
| 1385 | |
| 1386 | 44. Added configuration options for the SELinux compatible execmem allocator in |
| 1387 | JIT. |
| 1388 | |
| 1389 | 45. Increased the limit for searching for a "must be present" code unit in |
| 1390 | subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are |
| 1391 | much faster. |
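
A bounded check of this kind could be sketched as below. This rests on an assumption that item 45 does not spell out: that the limit means subjects at or above it are not scanned at all and the check is simply skipped. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_CU_MAX_SEARCH 2000   /* hypothetical name for the 8-bit limit */

/* Hypothetical helper. Returns 0 only when a scan proves that the required
   byte is absent, so that a match can be ruled out early. Returns 1 if the
   byte is found, or if the subject is too long for the scan to be worth
   doing (assumption: long subjects skip the check entirely). */
static int required_byte_may_be_present(const unsigned char *subject,
  size_t length, unsigned char required)
{
  assert(subject != NULL || length == 0);
  if (length == 0) return 0;                    /* nothing to find */
  if (length >= REQ_CU_MAX_SEARCH) return 1;    /* too long: skip the check */
  return memchr(subject, required, length) != NULL;
}
```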
| 1392 | |
| 1393 | 46. Arrange for anchored patterns to record and use "first code unit" data, |
| 1394 | because this can give a fast "no match" without searching for a "required code |
| 1395 | unit". Previously only non-anchored patterns did this. |
| 1396 | |
| 1397 | 47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0. |
| 1398 | |
| 1399 | 48. Add the callout_no_where modifier to pcre2test. |
| 1400 | |
| 1401 | 49. Update extended grapheme breaking rules to the latest set that are in |
| 1402 | Unicode Standard Annex #29. |
| 1403 | |
| 1404 | 50. Added experimental foreign pattern conversion facilities |
| 1405 | (pcre2_pattern_convert() and friends). |
| 1406 | |
| 1407 | 51. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE |
| 1408 | is defined in a system header in cygwin. Also modified some of the #ifdefs in |
| 1409 | pcre2grep related to Windows and Cygwin support. |
| 1410 | |
| 1411 | 52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a |
| 1412 | character class is the last character in the class, Perl does not give a |
| 1413 | warning. PCRE2 now also treats this as a literal. |
| 1414 | |
| 1415 | 53. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was |
| 1416 | not doing so for [\d-X] (and similar escapes), as is documented. |
| 1417 | |
| 1418 | 54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard. |
| 1419 | |
| 1420 | 55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in |
| 1421 | pcre2_compile() which could never actually trigger (code should have been cut |
| 1422 | out when Unicode support is disabled). |
| 1423 | |
| 1424 | |
| 1425 | Version 10.23 14-February-2017 |
| 1426 | ------------------------------ |
| 1427 | |
| 1428 | 1. Extended pcre2test with the utf8_input modifier so that it is able to |
| 1429 | generate all possible 16-bit and 32-bit code unit values in non-UTF modes. |
| 1430 | |
| 1431 | 2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without |
| 1432 | PCRE2_UCP set, a negative character type such as \D in a positive class should |
| 1433 | cause all characters greater than 255 to match, whatever else is in the class. |
| 1434 | There was a bug that caused this not to happen if a Unicode property item was |
| 1435 | added to such a class, for example [\D\P{Nd}] or [\W\pL]. |
| 1436 | |
| 1437 | 3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax |
| 1438 | checking is now done in the pre-pass that identifies capturing groups. This has |
| 1439 | reduced the amount of duplication and made the code tidier. While doing this, |
| 1440 | some minor bugs and Perl incompatibilities were fixed, including: |
| 1441 | |
| 1442 | (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead |
| 1443 | of giving an invalid quantifier error. |
| 1444 | |
| 1445 | (b) {0} can now be used after a group in a lookbehind assertion; previously |
| 1446 | this caused an "assertion is not fixed length" error. |
| 1447 | |
| 1448 | (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with |
| 1449 | the name "DEFINE" exists. PCRE2 now does likewise. |
| 1450 | |
| 1451 | (d) A recursion condition test such as (?(R2)...) must now refer to an |
| 1452 | existing subpattern. |
| 1453 | |
| 1454 | (e) A conditional recursion test such as (?(R)...) misbehaved if there was a |
| 1455 | group whose name began with "R". |
| 1456 | |
| 1457 | (f) When testing zero-terminated patterns under valgrind, the terminating |
| 1458 | zero is now marked "no access". This catches bugs that would otherwise |
| 1459 | show up only with non-zero-terminated patterns. |
| 1460 | |
| 1461 | (g) A hyphen appearing immediately after a POSIX character class (for example |
| 1462 | /[[:ascii:]-z]/) now generates an error. Perl does accept this as a |
| 1463 | literal, but gives a warning, so it seems best to fail it in PCRE. |
| 1464 | |
| 1465 | (h) An empty \Q\E sequence may appear after a callout that precedes an |
| 1466 | assertion condition (it is, of course, ignored). |
| 1467 | |
| 1468 | One effect of the refactoring is that some error numbers and messages have |
| 1469 | changed, and the pattern offset given for compiling errors is not always the |
| 1470 | right-most character that has been read. In particular, for a variable-length |
| 1471 | lookbehind assertion it now points to the start of the assertion. Another |
| 1472 | change is that when a callout appears before a group, the "length of next |
| 1473 | pattern item" that is passed now just gives the length of the opening |
| 1474 | parenthesis item, not the length of the whole group. A length of zero is now |
| 1475 | given only for a callout at the end of the pattern. Automatic callouts are no |
| 1476 | longer inserted before and after explicit callouts in the pattern. |
| 1477 | |
| 1478 | A number of bugs in the refactored code were subsequently fixed during testing |
| 1479 | before release, but after the code was made available in the repository. Many |
| 1480 | of the bugs were discovered by fuzz testing. Several of them were related to |
| 1481 | the change from assuming a zero-terminated pattern (which previously had |
| 1482 | required non-zero terminated strings to be copied). These bugs were never in |
| 1483 | fully released code, but are noted here for the record. |
| 1484 | |
| 1485 | (a) An overall recursion such as (?0) inside a lookbehind assertion was not |
| 1486 | being diagnosed as an error. |
| 1487 | |
| 1488 | (b) In utf mode, the length of a *MARK (or other verb) name was being checked |
| 1489 | in characters instead of code units, which could lead to bad code being |
| 1490 | compiled, leading to unpredictable behaviour. |
| 1491 | |
| 1492 | (c) In extended /x mode, characters whose code was greater than 255 caused |
| 1493 | a lookup outside one of the global tables. A similar bug existed for wide |
| 1494 | characters in *VERB names. |
| 1495 | |
| 1496 | (d) The amount of memory needed for a compiled pattern was miscalculated if a |
| 1497 | lookbehind contained more than one toplevel branch and the first branch |
| 1498 | was of length zero. |
| 1499 | |
| 1500 | (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero- |
| 1501 | terminated pattern, if a # comment ran on to the end of the pattern, one |
| 1502 | or more code units past the end were being read. |
| 1503 | |
| 1504 | (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g. |
| 1505 | "{2,2") could cause reading beyond the pattern. |
| 1506 | |
| 1507 | (g) When reading a callout string, if the end delimiter was at the end of the |
| 1508 | pattern one further code unit was read. |
| 1509 | |
| 1510 | (h) An unterminated number after \g' could cause reading beyond the pattern. |
| 1511 | |
| 1512 | (i) An insufficient memory size was being computed for compiling with |
| 1513 | PCRE2_AUTO_CALLOUT. |
| 1514 | |
| 1515 | (j) A conditional group with an assertion condition used more memory than was |
| 1516 | allowed for it during parsing, so too many of them could therefore |
| 1517 | overrun a buffer. |
| 1518 | |
| 1519 | (k) If parsing a pattern exactly filled the buffer, the internal test for |
| 1520 | overrun did not check when the final META_END item was added. |
| 1521 | |
| 1522 | (l) If a lookbehind contained a subroutine call, and the called group |
| 1523 | contained an option setting such as (?s), and the PCRE2_ANCHORED option |
| 1524 | was set, unpredictable behaviour could occur. The underlying bug was |
| 1525 | incorrect code and insufficient checking while searching for the end of |
| 1526 | the called subroutine in the parsed pattern. |
| 1527 | |
| 1528 | (m) Quantifiers following (*VERB)s were not being diagnosed as errors. |
| 1529 | |
| 1530 | (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and |
| 1531 | PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour. |
| 1532 | |
| 1533 | (o) If \Q was preceded by a quantified item, and the following \E was |
| 1534 | followed by '?' or '+', and there was at least one literal character |
| 1535 | between them, an internal error "unexpected repeat" occurred (example: |
| 1536 | /.+\QX\E+/). |
| 1537 | |
| 1538 | (p) A buffer overflow could occur while sorting the names in the group name |
| 1539 | list (depending on the order in which the names were seen). |
| 1540 | |
| 1541 | (q) A conditional group that started with a callout was not doing the right |
| 1542 | check for a following assertion, leading to compiling bad code. Example: |
| 1543 | /(?(C'XX))?!XX/ |
| 1544 | |
| 1545 | (r) If a character whose code point was greater than 0xffff appeared within |
| 1546 | a lookbehind that was within another lookbehind, the calculation of the |
| 1547 | lookbehind length went wrong and could provoke an internal error. |
| 1548 | |
| 1549 | (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused |
| 1550 | an internal error. Now the hyphen is treated as a literal. |
| 1551 | |
| 1552 | 4. Back references are now permitted in lookbehind assertions when there are |
| 1553 | no duplicated group numbers (that is, (?| has not been used), and, if the |
| 1554 | reference is by name, there is only one group of that name. The referenced |
| 1555 | group must, of course, be of fixed length. |
| 1556 | |
| 1557 | 5. pcre2test has been upgraded so that, when run under valgrind with valgrind |
| 1558 | support enabled, reading past the end of the pattern is detected, both when |
| 1559 | compiling and during callout processing. |
| 1560 | |
| 1561 | 6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back |
| 1562 | reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does |
| 1563 | not recognize this syntax. |
| 1564 | |
| 1565 | 7. Automatic callouts are no longer generated before and after callouts in the |
| 1566 | pattern. |
| 1567 | |
| 1568 | 8. When pcre2test was outputting information from a callout, the caret indicator |
| 1569 | for the current position in the subject line was incorrect if it was after an |
| 1570 | escape sequence for a character whose code point was greater than \x{ff}. |
| 1571 | |
| 1572 | 9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be |
| 1573 | PCRE2_STATIC_RUNTIME). Fix from David Gaussmann. |
| 1574 | |
| 1575 | 10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer |
| 1576 | expansion when long lines are encountered. Original patch by Dmitry |
| 1577 | Cherniachenko. |
| 1578 | |
| 1579 | 11. If pcre2grep was compiled with JIT support, but the library was compiled |
| 1580 | without it (something that neither ./configure nor CMake allows, but it can be |
| 1581 | done by editing config.h), pcre2grep was giving a JIT error. Now it detects |
| 1582 | this situation and does not try to use JIT. |
| 1583 | |
| 1584 | 12. Added some "const" qualifiers to variables in pcre2grep. |
| 1585 | |
| 1586 | 13. Added Dmitry Cherniachenko's patch for colouring output in Windows |
| 1587 | (untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment |
| 1588 | variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found. |
| 1589 | |
| 1590 | 14. Add the -t (grand total) option to pcre2grep. |
| 1591 | |
| 1592 | 15. A number of bugs have been mended relating to match start-up optimizations |
| 1593 | when the first thing in a pattern is a positive lookahead. These all applied |
| 1594 | only when PCRE2_NO_START_OPTIMIZE was *not* set: |
| 1595 | |
| 1596 | (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed |
| 1597 | both an initial 'X' and a following 'X'. |
| 1598 | (b) Some patterns starting with an assertion that started with .* were |
| 1599 | incorrectly optimized as having to match at the start of the subject or |
| 1600 | after a newline. There are cases where this is not true, for example, |
| 1601 | (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that |
| 1602 | start with spaces. Starting .* in an assertion is no longer taken as an |
| 1603 | indication of matching at the start (or after a newline). |
| 1604 | |
| 1605 | 16. The "offset" modifier in pcre2test was not being ignored (as documented) |
| 1606 | when the POSIX API was in use. |
| 1607 | |
| 1608 | 17. Added --enable-fuzz-support to "configure", causing a non-installed |
| 1609 | library containing a test function that can be called by fuzzers to be |
| 1610 | compiled. A non-installed binary to run the test function locally, called |
| 1611 | pcre2fuzzcheck, is also compiled. |
| 1612 | |
| 1613 | 18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and |
| 1614 | which started with .* inside a positive lookahead was incorrectly being |
| 1615 | compiled as implicitly anchored. |
| 1616 | |
| 1617 | 19. Removed all instances of "register" declarations, as they are considered |
| 1618 | obsolete these days and in any case had become very haphazard. |
| 1619 | |
| 1620 | 20. Add strerror() to pcre2test for failed file opening. |
| 1621 | |
| 1622 | 21. Make pcre2test -C list valgrind support when it is enabled. |
| 1623 | |
| 1624 | 22. Add the use_length modifier to pcre2test. |
| 1625 | |
| 1626 | 23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and |
| 1627 | 'copy' modifiers. |
| 1628 | |
| 1629 | 24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it |
| 1630 | is apparently needed there as well as in the function definitions. (Why did |
| 1631 | nobody ask for this in PCRE1?) |
| 1632 | |
| 1633 | 25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to |
| 1634 | PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more |
| 1635 | standards-compliant and unique. |
| 1636 | |
| 1637 | 26. pcre2-config --libs-posix was listing -lpcre2posix instead of |
| 1638 | -lpcre2-posix. Also, the CMake build process was building the library with the |
| 1639 | wrong name. |
| 1640 | |
| 1641 | 27. In pcre2test, give some offset information for errors in hex patterns. |
| 1642 | This uses the C99 formatting sequence %td, except for MSVC which doesn't |
| 1643 | support it - %lu is used instead. |
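
The formatting choice in item 27 can be illustrated with a minimal sketch. The function name and message text are hypothetical, and the _MSC_VER test is an assumption about how the MSVC case might be selected. It reflects the situation described in the item and not necessarily current MSVC behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the offset of an error within a pattern.
   %td is the C99 conversion for ptrdiff_t; the fallback casts the value
   to unsigned long and uses %lu, as item 27 describes for MSVC. */
static int format_error_offset(char *out, size_t outsize,
  const char *pattern_start, const char *error_at)
{
  ptrdiff_t offset;
  assert(out != NULL && error_at >= pattern_start);
  offset = error_at - pattern_start;
#ifdef _MSC_VER
  return snprintf(out, outsize, "error at offset %lu", (unsigned long)offset);
#else
  return snprintf(out, outsize, "error at offset %td", offset);
#endif
}
```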
| 1644 | |
| 1645 | 28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to |
| 1646 | pcre2test for testing it. |
| 1647 | |
| 1648 | 29. Fix small memory leak in pcre2test. |
| 1649 | |
| 1650 | 30. Fix out-of-bounds read for partial matching of /./ against an empty string |
| 1651 | when the newline type is CRLF. |
| 1652 | |
| 1653 | 31. Fix a bug in pcre2test that caused a crash when a locale was set either in |
| 1654 | the current pattern or a previous one and a wide character was matched. |
| 1655 | |
| 1656 | 32. The appearance of \p, \P, or \X in a substitution string when |
| 1657 | PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL |
| 1658 | dereference). |
| 1659 | |
| 1660 | 33. If the starting offset was specified as greater than the subject length in |
| 1661 | a call to pcre2_substitute() an out-of-bounds memory reference could occur. |
| 1662 | |
| 1663 | 34. When PCRE2 was compiled to use the heap instead of the stack for recursive |
| 1664 | calls to match(), a repeated minimizing caseless back reference, or a |
| 1665 | maximizing one where the two cases had different numbers of code units, |
| 1666 | followed by a caseful back reference, could lose the caselessness of the first |
| 1667 | repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX |
| 1668 | but didn't). |
| 1669 | |
| 1670 | 35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum |
| 1671 | matching length and just records zero. Typically this happens when there are |
| 1672 | too many nested or recursive back references. If the limit was reached in |
| 1673 | certain recursive cases it failed to be triggered and an internal error could |
| 1674 | be the result. |
| 1675 | |
| 1676 | 36. The pcre2_dfa_match() function now takes note of the recursion limit for |
| 1677 | the internal recursive calls that are used for lookrounds and recursions within |
| 1678 | the pattern. |
| 1679 | |
| 1680 | 37. More refactoring has got rid of the internal could_be_empty_branch() |
| 1681 | function (around 400 lines of code, including comments) by keeping track of |
| 1682 | could-be-emptiness as the pattern is compiled instead of scanning compiled |
| 1683 | groups. (This would have been much harder before the refactoring of #3 above.) |
| 1684 | This lifts a restriction on the number of branches in a group (more than about |
| 1685 | 1100 would give "pattern is too complicated"). |
| 1686 | |
| 1687 | 38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern |
| 1688 | auto_callout". |
| 1689 | |
| 1690 | 39. In a library with Unicode support, incorrect data was compiled for a |
| 1691 | pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide |
| 1692 | characters to match (for example, /[\s[:^ascii:]]/). |
| 1693 | |
| 1694 | 40. The callout_error modifier has been added to pcre2test to make it possible |
| 1695 | to return PCRE2_ERROR_CALLOUT from a callout. |
| 1696 | |
| 1697 | 41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of |
| 1698 | "<esc>[00m". |
| 1699 | |
| 1700 | 42. The limit in the auto-possessification code that was intended to catch |
| 1701 | overly-complicated patterns and not spend too much time auto-possessifying was |
| 1702 | being reset too often, resulting in very long compile times for some patterns. |
| 1703 | Now such patterns are no longer completely auto-possessified. |
| 1704 | |
| 1705 | 43. Applied Jason Hood's revised patch for RunTest.bat. |
| 1706 | |
| 1707 | 44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood. |
| 1708 | |
| 1709 | 45. Minor cosmetic fix to pcre2test: move a variable that is not used under |
| 1710 | Windows into the "not Windows" code. |
| 1711 | |
| 1712 | 46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy |
| 1713 | some of the code: |
| 1714 | |
| 1715 | * normalised the Windows condition by ensuring WIN32 is defined; |
| 1716 | * enables the callout feature under Windows; |
| 1717 | * adds globbing (Microsoft's implementation expands quoted args), |
| 1718 | using a tweaked opendirectory; |
| 1719 | * implements the is_*_tty functions for Windows; |
| 1720 | * --color=always will write the ANSI sequences to file; |
| 1721 | * add sequences 4 (underline works on Win10) and 5 (blink as bright |
| 1722 | background, relatively standard on DOS/Win); |
| 1723 | * remove the (char *) casts for the now-const strings; |
| 1724 | * remove GREP_COLOUR (grep's command line allowed the 'u', but not |
| 1725 | the environment), parsing GREP_COLORS instead; |
| 1726 | * uses the current colour if not set, rather than black; |
| 1727 | * add print_match for the undefined case; |
| 1728 | * fixes a typo. |
| 1729 | |
| 1730 | In addition, colour settings containing anything other than digits and |
| 1731 | semicolon are ignored, and the colour controls are no longer output for empty |
| 1732 | strings. |
| 1733 | |
| 1734 | 47. Detecting patterns that are too large inside the length-measuring loop |
| 1735 | saves processing ridiculously long patterns to their end. |
| 1736 | |
| 1737 | 48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it |
| 1738 | just wastes time. In the UTF case it can also produce redundant entries in |
| 1739 | XCLASS lists caused by characters with multiple other cases and pairs of |
| 1740 | characters in the same "not-x" sublists. |
| 1741 | |
| 1742 | 49. A pattern such as /(?=(a\K))/ can report the end of the match being before |
| 1743 | its start; pcre2test was not handling this correctly when using the POSIX |
| 1744 | interface (it was OK with the native interface). |
| 1745 | |
| 1746 | 50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will |
| 1747 | continue to work, falling back to interpretation if anything goes wrong with |
| 1748 | JIT. |
| 1749 | |
| 1750 | 51. Applied patches from Christian Persch to configure.ac to make use of the |
| 1751 | AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT |
| 1752 | modules. |
| 1753 | |
| 1754 | 52. Minor fixes to pcre2grep from Jason Hood: |
| 1755 | * fixed some spacing; |
| 1756 | * Windows doesn't usually use single quotes, so I've added a define |
| 1757 | to use appropriate quotes [in an example]; |
| 1758 | * LC_ALL was displayed as "LCC_ALL"; |
| 1759 | * numbers 11, 12 & 13 should end in "th"; |
| 1760 | * use double quotes in usage message. |
| 1761 | |
| 1762 | 53. When autopossessifying, skip empty branches without recursion, to reduce |
| 1763 | stack usage for the benefit of clang with -fsanitize=address, which uses huge |
| 1764 | stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553. |
| 1765 | |
| 1766 | 54. A pattern with very many explicit back references to a group that is a long |
| 1767 | way from the start of the pattern could take a long time to compile because |
| 1768 | searching for the referenced group in order to find the minimum length was |
| 1769 | being done repeatedly. Now up to 128 group minimum lengths are cached and the |
| 1770 | attempt to find a minimum length is abandoned if there is a back reference to a |
| 1771 | group whose number is greater than 128. (In that case, the pattern is so |
| 1772 | complicated that this optimization probably isn't worth it.) This fixes |
| 1773 | oss-fuzz issue 557. |
| 1774 | |
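The caching described in item 54 can be sketched as follows. This is a minimal illustration, not the library's code: the names minlen_cache, cache_init, slow_group_minlen and group_minlen are hypothetical, and slow_group_minlen is only a stand-in for the expensive search through the pattern. The limit of 128 is the one quoted in the entry.

```c
#include <assert.h>

#define MAX_CACHED_GROUP 128   /* limit quoted in the entry above */
#define NOT_CACHED (-1)
#define GIVE_UP    (-2)

typedef struct {
  int minlen[MAX_CACHED_GROUP + 1];  /* indexed by group number, 1..128 */
  int scans;                         /* how many slow scans were needed */
} minlen_cache;

void cache_init(minlen_cache *c)
{
  int i;
  for (i = 0; i <= MAX_CACHED_GROUP; i++) c->minlen[i] = NOT_CACHED;
  c->scans = 0;
}

/* Hypothetical stand-in for the expensive search through the pattern. */
int slow_group_minlen(int group)
{
  return group * 2;
}

/* Return the cached minimum length for a group, computing it at most once.
   Groups numbered above the cache size make the caller give up entirely. */
int group_minlen(minlen_cache *c, int group)
{
  assert(group >= 1);
  if (group > MAX_CACHED_GROUP) return GIVE_UP;
  if (c->minlen[group] == NOT_CACHED)
    {
    c->minlen[group] = slow_group_minlen(group);
    c->scans++;
    }
  return c->minlen[group];
}
```

Repeated back references to the same group then cost one scan instead of one scan per reference.
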
| 1775 | 55. Issue 32 for 10.22 below was not correctly fixed. If pcre2grep in multiline |
| 1776 | mode with --only-matching matched several lines, it restarted scanning at the |
| 1777 | next line instead of moving on to the end of the matched string, which can be |
| 1778 | several lines after the start. |
| 1779 | |
| 1780 | 56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line |
| 1781 | with updates to the non-Windows version. |
| 1782 | |
| 1783 | |
| 1784 | |
| 1785 | Version 10.22 29-July-2016 |
| 1786 | -------------------------- |
| 1787 | |
| 1788 | 1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3 |
| 1789 | to fix problems with running the tests under Windows. |
| 1790 | |
| 1791 | 2. Implemented a facility for quoting literal characters within hexadecimal |
| 1792 | patterns in pcre2test, to make it easier to create patterns with just a few |
| 1793 | non-printing characters. |
| 1794 | |
| 1795 | 3. Binary zeros are not supported in pcre2test input files. It now detects them |
| 1796 | and gives an error. |
| 1797 | |
| 1798 | 4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to |
| 1799 | smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so |
| 1800 | that it matches only unknown objects. |
| 1801 | |
| 1802 | 5. Updated the maintenance script maint/ManyConfigTests to make it easier to |
| 1803 | select individual groups of tests. |
| 1804 | |
| 1805 | 6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option |
| 1806 | used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this |
| 1807 | disables the use of back references (and subroutine calls), which are supported |
| 1808 | by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no |
| 1809 | longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch |
| 1810 | and pmatch when regexec() is called. |
| 1811 | |
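The behaviour that item 6 aligns with can be tried against a system's own <regex.h>. The sketch below assumes a POSIX regex implementation and uses POSIX basic syntax, where a back reference is written \(a\)\1; with PCRE2's wrapper the same pattern would be written (a)\1. The helper name nosub_match is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile with REG_NOSUB and report whether the subject matches.
   Returns 1 for a match, 0 for no match, -1 if compilation fails. */
int nosub_match(const char *pattern, const char *subject)
{
  regex_t re;
  int rc;
  assert(pattern != NULL && subject != NULL);
  if (regcomp(&re, pattern, REG_NOSUB) != 0) return -1;
  /* With REG_NOSUB, nmatch and pmatch are ignored, so pass 0 and NULL. */
  rc = regexec(&re, subject, 0, NULL, 0);
  regfree(&re);
  return (rc == 0) ? 1 : 0;
}
```

The point is that REG_NOSUB only suppresses the reporting of captured substrings; a back reference inside the pattern is still expected to work.
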
| 1812 | 7. Because of 6 above, pcre2test has been modified with a new modifier called |
| 1813 | posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture |
| 1814 | modifier had this effect. That option is now ignored when the POSIX API is in |
| 1815 | use. |
| 1816 | |
| 1817 | 8. Minor tidies to the pcre2demo.c sample program, including more comments |
| 1818 | about its 8-bit-ness. |
| 1819 | |
| 1820 | 9. Detect unmatched closing parentheses and give the error in the pre-scan |
| 1821 | instead of later. Previously the pre-scan carried on and could give a |
| 1822 | misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a |
| 1823 | message about invalid duplicate group names. |
| 1824 | |
| 1825 | 10. It has happened that pcre2test was accidentally linked with another POSIX |
| 1826 | regex library instead of libpcre2-posix. In this situation, a call to regcomp() |
| 1827 | (in the other library) may succeed, returning zero, but of course putting its |
| 1828 | own data into the regex_t block. In one example the re_pcre2_code field was |
| 1829 | left as NULL, which made pcre2test think it had not got a compiled POSIX regex, |
| 1830 | so it treated the next line as another pattern line, resulting in a confusing |
| 1831 | error message. A check has been added to pcre2test to see if the data returned |
| 1832 | from a successful call of regcomp() are valid for PCRE2's regcomp(). If they |
| 1833 | are not, an error message is output and the pcre2test run is abandoned. The |
| 1834 | message points out the possibility of a mis-linking. Hopefully this will avoid |
| 1835 | some head-scratching the next time this happens. |
| 1836 | |
| 1837 | 11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind |
| 1838 | assertion, caused pcre2test to output a very large number of spaces when the |
| 1839 | callout was taken, making the program appear to loop. |
| 1840 | |
| 1841 | 12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply |
| 1842 | nested set of parentheses of sufficient size caused an overflow of the |
| 1843 | compiling workspace (which was diagnosed, but of course is not desirable). |
| 1844 | |
| 1845 | 13. Detect missing closing parentheses during the pre-pass for group |
| 1846 | identification. |
| 1847 | |
| 1848 | 14. Changed some integer variable types and put in a number of casts, following |
| 1849 | a report of compiler warnings from Visual Studio 2013 and a few tests with |
| 1850 | gcc's -Wconversion (which still throws up a lot). |
| 1851 | |
| 1852 | 15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test |
| 1853 | for testing it. |
| 1854 | |
| 1855 | 16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of |
| 1856 | regerror(). When the error buffer is too small, my version of snprintf() puts a |
| 1857 | binary zero in the final byte. Bug #1801 seems to show that other versions do |
| 1858 | not do this, leading to bad output from pcre2test when it was checking for |
| 1859 | buffer overflow. It no longer assumes a binary zero at the end of a too-small |
| 1860 | regerror() buffer. |
| 1861 | |
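A defensive pattern for the situation in item 16 is to terminate the buffer explicitly rather than rely on snprintf() to do it. The sketch below is an illustration only; format_message is a hypothetical helper and is not part of PCRE2 or pcre2test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<msg> (code <n>)" into buf. Returns 1 if the text was truncated
   (or could not be formatted), 0 if it fitted completely. */
int format_message(char *buf, size_t size, const char *msg, int code)
{
  int n;
  assert(buf != NULL && msg != NULL);
  if (size == 0) return 1;
  n = snprintf(buf, size, "%s (code %d)", msg, code);
  buf[size - 1] = '\0';   /* do not assume snprintf() wrote the final zero */
  return n < 0 || (size_t)n >= size;
}
```

Truncation is detected from the return value, not by inspecting the final byte of the buffer.
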
| 1862 | 17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not |
| 1863 | actually affect anything, by sheer luck. |
| 1864 | |
| 1865 | 18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect |
| 1866 | "const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for |
| 1867 | older MSVC compilers. This has been done both in src/pcre2_internal.h for most |
| 1868 | of the library, and also in src/pcre2posix.c, which no longer includes |
| 1869 | pcre2_internal.h (see 24 below). |
| 1870 | |
| 1871 | 19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC |
| 1872 | static compilation. Subsequently applied Chris Wilson's second patch, putting |
| 1873 | the first patch under a new option instead of being unconditional when |
| 1874 | PCRE_STATIC is set. |
| 1875 | |
| 1876 | 20. Updated pcre2grep to set stdout as binary when run under Windows, so as not |
| 1877 | to convert \r\n at the ends of reflected lines into \r\r\n. This required |
| 1878 | ensuring that other output that is written to stdout (e.g. file names) uses the |
| 1879 | appropriate line terminator: \r\n for Windows, \n otherwise. |
| 1880 | |
| 1881 | 21. When a line is too long for pcre2grep's internal buffer, show the maximum |
| 1882 | length in the error message. |
| 1883 | |
| 1884 | 22. Added support for string callouts to pcre2grep (Zoltan's patch with PH |
| 1885 | additions). |
| 1886 | |
| 1887 | 23. RunTest.bat was missing a "set type" line for test 22. |
| 1888 | |
| 1889 | 24. The pcre2posix.c file was including pcre2_internal.h, and using some |
| 1890 | "private" knowledge of the data structures. This is unnecessary; the code has |
| 1891 | been re-factored and no longer includes pcre2_internal.h. |
| 1892 | |
| 1893 | 25. A race condition in JIT, reported by Mozilla, has been fixed. |
| 1894 | |
| 1895 | 26. Minor code refactor to avoid "array subscript is below array bounds" |
| 1896 | compiler warning. |
| 1897 | |
| 1898 | 27. Minor code refactor to avoid "left shift of negative number" warning. |
| 1899 | |
| 1900 | 28. Add a bit more sanity checking to pcre2_serialize_decode() and document |
| 1901 | that it expects trusted data. |
| 1902 | |
| 1903 | 29. Fix typo in pcre2_jit_test.c. |
| 1904 | |
| 1905 | 30. Due to an oversight, pcre2grep was not making use of JIT when available. |
| 1906 | This is now fixed. |
| 1907 | |
| 1908 | 31. The RunGrepTest script is updated to use the valgrind suppressions file |
| 1909 | when testing with JIT under valgrind (compare 10.21/51 below). The suppressions |
| 1910 | file is updated so that it is now the same as for PCRE1: it suppresses the |
| 1911 | Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled |
| 1912 | code). Also changed smc-check=all to smc-check=all-non-file as was done for |
| 1913 | RunTest (see 4 above). |
| 1914 | |
| 1915 | 32. Implemented the PCRE2_NO_JIT option for pcre2_match(). |
| 1916 | |
| 1917 | 33. Fix typo that gave a compiler error when JIT not supported. |
| 1918 | |
| 1919 | 34. Fix comment describing the returns from find_fixedlength(). |
| 1920 | |
| 1921 | 35. Fix potential negative index in pcre2test. |
| 1922 | |
| 1923 | 36. Calls to pcre2_get_error_message() with error numbers that are never |
| 1924 | returned by PCRE2 functions were returning empty strings. Now the error code |
| 1925 | PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to |
| 1926 | show the texts for given error numbers (i.e. to call pcre2_get_error_message() |
| 1927 | and display what it returns) and a few representative error codes are now |
| 1928 | checked in RunTest. |
| 1929 | |
| 1930 | 37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in |
| 1931 | pcre2_match.c, in anticipation that this is needed for the same reason it was |
| 1932 | recently added to pcrecpp.cc in PCRE1. |
| 1933 | |
| 1934 | 38. Using -o with -M in pcre2grep could cause unnecessary repeated output when |
| 1935 | the match extended over a line boundary, as it tried to find more matches "on |
| 1936 | the same line" - but it was already over the end. |
| 1937 | |
| 1938 | 39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it |
| 1939 | to the same code as '.' when PCRE2_DOTALL is set). |
| 1940 | |
| 1941 | 40. Fix two clang compiler warnings in pcre2test when only one code unit width |
| 1942 | is supported. |
| 1943 | |
| 1944 | 41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack |
| 1945 | if it fails when running the interpreter with a 16MiB stack (and if changing |
| 1946 | the stack size via pcre2test is possible). This avoids having to manually set a |
| 1947 | large stack size when testing with clang. |
| 1948 | |
| 1949 | 42. Fix register overwrite in JIT when SSE2 acceleration is enabled. |
| 1950 | |
| 1951 | 43. Detect integer overflow in pcre2test pattern and data repetition counts. |
| 1952 | |
| 1953 | 44. In pcre2test, ignore "allcaptures" after DFA matching. |
| 1954 | |
| 1955 | 45. Fix unaligned accesses on x86. Patch by Marc Mutz. |
| 1956 | |
| 1957 | 46. Fix some more clang compiler warnings. |
| 1958 | |
| 1959 | |
| 1960 | Version 10.21 12-January-2016 |
| 1961 | ----------------------------- |
| 1962 | |
| 1963 | 1. Improve matching speed of patterns starting with + or * in JIT. |
| 1964 | |
| 1965 | 2. Use memchr() to find the first character in an unanchored match in 8-bit |
| 1966 | mode in the interpreter. This gives a significant speed improvement. |
| 1967 | |
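The idea in item 2 is to let memchr() skip quickly to positions where a match could start. The sketch below applies the same idea to a plain literal search over bytes; it is an illustration under that simplification, and find_literal is a hypothetical name, not a PCRE2 function.

```c
#include <assert.h>
#include <string.h>

/* Return the offset of the first occurrence of needle in subject, or -1.
   memchr() locates candidate start positions by the needle's first byte. */
long find_literal(const char *subject, size_t slen,
                  const char *needle, size_t nlen)
{
  const char *p = subject;
  const char *end = subject + slen;
  assert(subject != NULL && needle != NULL);
  if (nlen == 0) return 0;
  if (nlen > slen) return -1;
  while (p <= end - nlen)
    {
    /* Only positions that leave room for the whole needle are searched. */
    p = memchr(p, (unsigned char)needle[0], (size_t)(end - p) - nlen + 1);
    if (p == NULL) return -1;
    if (memcmp(p, needle, nlen) == 0) return (long)(p - subject);
    p++;
    }
  return -1;
}
```
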
| 1968 | 3. Removed a redundant copy of the opcode_possessify table in the |
| 1969 | pcre2_auto_possessify.c source. |
| 1970 | |
| 1971 | 4. Fix typos in dftables.c for z/OS. |
| 1972 | |
| 1973 | 5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that |
| 1974 | processing them could involve a buffer overflow if the following character was |
| 1975 | an opening parenthesis. |
| 1976 | |
| 1977 | 6. Change 36 for 10.20 also introduced a bug in processing this pattern: |
| 1978 | /((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK) |
| 1979 | setting (which (*:0) is), then (?x) did not get unset at the end of its group |
| 1980 | during the scan for named groups, and hence the external # was incorrectly |
| 1981 | treated as a comment and the invalid (?' at the end of the pattern was not |
| 1982 | diagnosed. This caused a buffer overflow during the real compile. This bug was |
| 1983 | discovered by Karl Skomski with the LLVM fuzzer. |
| 1984 | |
| 1985 | 7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its |
| 1986 | own source module to avoid a circular dependency between src/pcre2_compile.c |
| 1987 | and src/pcre2_study.c. |
| 1988 | |
| 1989 | 8. A callout with a string argument containing an opening square bracket, for |
| 1990 | example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer |
| 1991 | overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer. |
| 1992 | |
| 1993 | 9. The handling of callouts during the pre-pass for named group identification |
| 1994 | has been tightened up. |
| 1995 | |
| 1996 | 10. The quantifier {1} can be ignored, whether greedy, non-greedy, or |
| 1997 | possessive. This is a very minor optimization. |
| 1998 | |
| 1999 | 11. A possessively repeated conditional group that could match an empty string, |
| 2000 | for example, /(?(R))*+/, was incorrectly compiled. |
| 2001 | |
| 2002 | 12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian |
| 2003 | Persch). |
| 2004 | |
| 2005 | 13. An empty comment (?#) in a pattern was incorrectly processed and could |
| 2006 | provoke a buffer overflow. This bug was discovered by Karl Skomski with the |
| 2007 | LLVM fuzzer. |
| 2008 | |
| 2009 | 14. Fix infinite recursion in the JIT compiler when certain patterns such as |
| 2010 | /(?:|a|){100}x/ are analysed. |
| 2011 | |
| 2012 | 15. Some patterns with character classes involving [: and \\ were incorrectly |
| 2013 | compiled and could cause reading from uninitialized memory or an incorrect |
| 2014 | error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]/. The |
| 2015 | first of these bugs was discovered by Karl Skomski with the LLVM fuzzer. |
| 2016 | |
| 2017 | 16. Pathological patterns containing many nested occurrences of [: caused |
| 2018 | pcre2_compile() to run for a very long time. This bug was found by the LLVM |
| 2019 | fuzzer. |
| 2020 | |
| 2021 | 17. A missing closing parenthesis for a callout with a string argument was not |
| 2022 | being diagnosed, possibly leading to a buffer overflow. This bug was found by |
| 2023 | the LLVM fuzzer. |
| 2024 | |
| 2025 | 18. A conditional group with only one branch has an implicit empty alternative |
| 2026 | branch and must therefore be treated as potentially matching an empty string. |
| 2027 | |
| 2028 | 19. If (?R was followed by - or +, incorrect behaviour happened instead of a |
| 2029 | diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer. |
| 2030 | |
| 2031 | 20. Another bug that was introduced by change 36 for 10.20: conditional groups |
| 2032 | whose condition was an assertion preceded by an explicit callout with a string |
| 2033 | argument might be incorrectly processed, especially if the string contained \Q. |
| 2034 | This bug was discovered by Karl Skomski with the LLVM fuzzer. |
| 2035 | |
| 2036 | 21. Compiling PCRE2 with the sanitize options of clang showed up a number of |
| 2037 | very pedantic coding infelicities and a buffer overflow while checking a UTF-8 |
| 2038 | string if the final multi-byte UTF-8 character was truncated. |
| 2039 | |
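The truncation problem in item 21 is avoided by checking how many bytes remain before reading any continuation bytes. The sketch below is a simplified structural check only: it does not reject overlong encodings, surrogates, or code points above U+10FFFF, and utf8_first_error is a hypothetical name, not the library's validity checker.

```c
#include <assert.h>
#include <stddef.h>

/* Return -1 if s[0..len) is structurally well-formed UTF-8, otherwise the
   offset of the first byte of the offending character. */
long utf8_first_error(const unsigned char *s, size_t len)
{
  size_t i = 0;
  assert(s != NULL || len == 0);
  while (i < len)
    {
    unsigned char c = s[i];
    size_t extra, k;
    if (c < 0x80) extra = 0;
    else if ((c & 0xE0) == 0xC0) extra = 1;
    else if ((c & 0xF0) == 0xE0) extra = 2;
    else if ((c & 0xF8) == 0xF0) extra = 3;
    else return (long)i;           /* stray continuation or bad lead byte */
    /* A truncated final character is caught here, before any read. */
    if (extra > len - i - 1) return (long)i;
    for (k = 1; k <= extra; k++)
      if ((s[i + k] & 0xC0) != 0x80) return (long)i;
    i += extra + 1;
    }
  return -1;
}
```
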
| 2040 | 22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a |
| 2041 | class, where both values are literal letters in the same case, omit the |
| 2042 | non-letter EBCDIC code points within the range. |
| 2043 | |
| 2044 | 23. Finding the minimum matching length of complex patterns with back |
| 2045 | references and/or recursions can take a long time. There is now a cut-off that |
| 2046 | gives up trying to find a minimum length when things get too complex. |
| 2047 | |
| 2048 | 24. An optimization has been added that speeds up finding the minimum matching |
| 2049 | length for patterns containing repeated capturing groups or recursions. |
| 2050 | |
| 2051 | 25. If a pattern contained a back reference to a group whose number was |
| 2052 | duplicated as a result of appearing in a (?|...) group, the computation of the |
| 2053 | minimum matching length gave a wrong result, which could cause incorrect "no |
| 2054 | match" errors. For such patterns, a minimum matching length cannot at present |
| 2055 | be computed. |
| 2056 | |
| 2057 | 26. Added a check for integer overflow in conditions (?(<digits>) and |
| 2058 | (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM |
| 2059 | fuzzer. |
| 2060 | |
| 2061 | 27. Fixed an issue when \p{Any} inside an xclass did not read the current |
| 2062 | character. |
| 2063 | |
| 2064 | 28. If pcre2grep was given the -q option with -c or -l, or when handling a |
| 2065 | binary file, it incorrectly wrote output to stdout. |
| 2066 | |
| 2067 | 29. The JIT compiler did not restore the control verb head in case of *THEN |
| 2068 | control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer. |
| 2069 | |
| 2070 | 30. The way recursive references such as (?3) are compiled has been re-written |
| 2071 | because the old way was the cause of many issues. Now, conversion of the group |
| 2072 | number into a pattern offset does not happen until the pattern has been |
| 2073 | completely compiled. This does mean that detection of all infinitely looping |
| 2074 | recursions is postponed till match time. In the past, some easy ones were |
| 2075 | detected at compile time. This re-writing was done in response to yet another |
| 2076 | bug found by the LLVM fuzzer. |
| 2077 | |
| 2078 | 31. A test for a back reference to a non-existent group was missing for items |
| 2079 | such as \987. This caused incorrect code to be compiled. This issue was found |
| 2080 | by Karl Skomski with a custom LLVM fuzzer. |
| 2081 | |
| 2082 | 32. Error messages for syntax errors following \g and \k were giving inaccurate |
| 2083 | offsets in the pattern. |
| 2084 | |
| 2085 | 33. Improve the performance of starting single character repetitions in JIT. |
| 2086 | |
| 2087 | 34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0. |
| 2088 | |
| 2089 | 35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now |
| 2090 | give the right offset instead of zero. |
| 2091 | |
| 2092 | 36. The JIT compiler should not check repeats after a {0,1} repeat byte code. |
| 2093 | This issue was found by Karl Skomski with a custom LLVM fuzzer. |
| 2094 | |
| 2095 | 37. The JIT compiler should restore the control chain for empty possessive |
| 2096 | repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer. |
| 2097 | |
| 2098 | 38. A bug which was introduced by the single character repetition optimization |
| 2099 | was fixed. |
| 2100 | |
| 2101 | 39. Match limit check added to recursion. This issue was found by Karl Skomski |
| 2102 | with a custom LLVM fuzzer. |
| 2103 | |
| 2104 | 40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look |
| 2105 | only at the part of the subject that is relevant when the starting offset is |
| 2106 | non-zero. |
| 2107 | |
| 2108 | 41. Improve first character match in JIT with SSE2 on x86. |
| 2109 | |
| 2110 | 42. Fix two assertion fails in JIT. These issues were found by Karl Skomski |
| 2111 | with a custom LLVM fuzzer. |
| 2112 | |
| 2113 | 43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy |
| 2114 | III). |
| 2115 | |
| 2116 | 44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added |
| 2117 | test (there are now 20 in total). |
| 2118 | |
| 2119 | 45. Fixed a corner case of range optimization in JIT. |
| 2120 | |
| 2121 | 46. Add the ${*MARK} facility to pcre2_substitute(). |
| 2122 | |
| 2123 | 47. Modifier lists in pcre2test were splitting at spaces without the required |
| 2124 | commas. |
| 2125 | |
| 2126 | 48. Implemented PCRE2_ALT_VERBNAMES. |
| 2127 | |
| 2128 | 49. Fixed two issues in JIT. These were found by Karl Skomski with a custom |
| 2129 | LLVM fuzzer. |
| 2130 | |
| 2131 | 50. The pcre2test program has been extended by adding the #newline_default |
| 2132 | command. This has made it possible to run the standard tests when PCRE2 is |
| 2133 | compiled with either CR or CRLF as the default newline convention. As part of |
| 2134 | this work, the new command was added to several test files and the testing |
| 2135 | scripts were modified. The pcre2grep tests can now also be run when there is no |
| 2136 | LF in the default newline convention. |
| 2137 | |
| 2138 | 51. The RunTest script has been modified so that, when JIT is used and valgrind |
| 2139 | is specified, a valgrind suppressions file is set up to ignore "Invalid read of |
| 2140 | size 16" errors because these are false positives when the hardware supports |
| 2141 | the SSE2 instruction set. |
| 2142 | |
| 2143 | 52. It is now possible to have comment lines amid the subject strings in |
| 2144 | pcre2test (and perltest.sh) input. |
| 2145 | |
| 2146 | 53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit(). |
| 2147 | |
| 2148 | 54. Add the null_context modifier to pcre2test so that calling pcre2_compile() |
| 2149 | and the matching functions with NULL contexts can be tested. |
| 2150 | |
| 2151 | 55. Implemented PCRE2_SUBSTITUTE_EXTENDED. |
| 2152 | |
| 2153 | 56. In a character class such as [\W\p{Any}] where both a negative-type escape |
| 2154 | ("not a word character") and a property escape were present, the property |
| 2155 | escape was being ignored. |
| 2156 | |
| 2157 | 57. Fixed integer overflow for patterns whose minimum matching length is very, |
| 2158 | very large. |
| 2159 | |
| 2160 | 58. Implemented --never-backslash-C. |
| 2161 | |
| 2162 | 59. Change 55 above introduced a bug by which certain patterns provoked the |
| 2163 | erroneous error "\ at end of pattern". |
| 2164 | |
| 2165 | 60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling |
| 2166 | errors or other strange effects if compiled in UCP mode. Found with libFuzzer |
| 2167 | and AddressSanitizer. |
| 2168 | |
| 2169 | 61. Whitespace at the end of a pcre2test pattern line caused a spurious error |
| 2170 | message if there were only single-character modifiers. It should be ignored. |
| 2171 | |
| 2172 | 62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results |
| 2173 | or segmentation errors for some patterns. Found with libFuzzer and |
| 2174 | AddressSanitizer. |
| 2175 | |
| 2176 | 63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer |
| 2177 | overflow. |
| 2178 | |
| 2179 | 64. Improve error message for overly-complicated patterns. |
| 2180 | |
| 2181 | 65. Implemented an optional replication feature for patterns in pcre2test, to |
| 2182 | make it easier to test long repetitive patterns. The tests for 63 above are |
| 2183 | converted to use the new feature. |
| 2184 | |
| 2185 | 66. In the POSIX wrapper, if regerror() was given too small a buffer, it could |
| 2186 | misbehave. |
| 2187 | |
| 2188 | 67. In pcre2_substitute() in UTF mode, the UTF validity check on the |
| 2189 | replacement string was happening before the length setting when the replacement |
| 2190 | string was zero-terminated. |
| 2191 | |
| 2192 | 68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the |
| 2193 | second and subsequent calls to pcre2_match(). |
| 2194 | |
| 2195 | 69. There was no check for integer overflow for a replacement group number in |
| 2196 | pcre2_substitute(). An added check for a number greater than the largest group |
| 2197 | number in the pattern means this is not now needed. |
| 2198 | |
| 2199 | 70. The PCRE2-specific VERSION condition didn't work correctly if only one |
| 2200 | digit was given after the decimal point, or if more than two digits were given. |
| 2201 | It now works with one or two digits, and gives a compile time error if more are |
| 2202 | given. |
| 2203 | |
| 2204 | 71. In pcre2_substitute() there was the possibility of reading one code unit |
| 2205 | beyond the end of the replacement string. |
| 2206 | |
| 2207 | 72. The code for checking a subject's UTF-32 validity for a pattern with a |
| 2208 | lookbehind involved an out-of-bounds pointer, which could potentially cause |
| 2209 | trouble in some environments. |
| 2210 | |
| 2211 | 73. The maximum lookbehind length was incorrectly calculated for patterns such |
| 2212 | as /(?<=(a)(?-1))x/ which have a recursion within a backreference. |
| 2213 | |
| 2214 | 74. Give an error if a lookbehind assertion is longer than 65535 code units. |
| 2215 | |
| 2216 | 75. Give an error in pcre2_substitute() if a match ends before it starts (as a |
| 2217 | result of the use of \K). |
| 2218 | |
| 2219 | 76. Check the length of subpattern names and the names in (*MARK:xx) etc. |
| 2220 | dynamically to avoid the possibility of integer overflow. |
| 2221 | |
| 2222 | 77. Implement pcre2_set_max_pattern_length() so that programs can restrict the |
| 2223 | size of patterns that they are prepared to handle. |
| 2224 | |
| 2225 | 78. (*NO_AUTO_POSSESS) was not working. |
| 2226 | |
| 2227 | 79. Adding group information caching improves the speed of compiling when |
| 2228 | checking whether a group has a fixed length and/or could match an empty string, |
| 2229 | especially when recursion or subroutine calls are involved. However, this |
| 2230 | cannot be used when (?| is present in the pattern because the same number may |
| 2231 | be used for groups of different sizes. To catch runaway patterns in this |
| 2232 | situation, counts have been introduced to the functions that scan for empty |
| 2233 | branches or compute fixed lengths. |
| 2234 | |
| 2235 | 80. Allow for the possibility of the size of the nest_save structure not being |
| 2236 | a factor of the size of the compiling workspace (it currently is). |
| 2237 | |
| 2238 | 81. Check for integer overflow in minimum length calculation and cap it at |
| 2239 | 65535. |
| 2240 | |
| 2241 | 82. Small optimizations in code for finding the minimum matching length. |
| 2242 | |
| 2243 | 83. Lock out configuring for EBCDIC with non-8-bit libraries. |
| 2244 | |
| 2245 | 84. Test for error code <= 0 in regerror(). |
| 2246 | |
| 2247 | 85. Check for too many replacements (more than INT_MAX) in pcre2_substitute(). |
| 2248 | |
| 2249 | 86. Avoid the possibility of computing with an out-of-bounds pointer (though |
| 2250 | not dereferencing it) while handling lookbehind assertions. |
| 2251 | |
| 2252 | 87. Failure to get memory for the match data in regcomp() is now given as a |
| 2253 | regcomp() error instead of waiting for regexec() to pick it up. |
| 2254 | |
| 2255 | 88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid |
| 2256 | newline sequence. |
| 2257 | |
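The concern in item 88 arises when the search position is advanced by one code unit, for example after an empty match: if that unit is the CR of a CRLF pair, the pair must be skipped as a whole. The sketch below shows the idea for 8-bit, non-UTF subjects only; next_start and its crlf_is_newline flag are hypothetical and do not reflect the internals of pcre2_substitute().

```c
#include <assert.h>
#include <stddef.h>

/* Return the next position to try after pos, without splitting CRLF when
   CRLF is a valid newline sequence. */
size_t next_start(const char *s, size_t len, size_t pos, int crlf_is_newline)
{
  assert(s != NULL);
  if (pos >= len) return len;
  if (crlf_is_newline && s[pos] == '\r' &&
      pos + 1 < len && s[pos + 1] == '\n')
    return pos + 2;                  /* step over the CRLF pair as a unit */
  return pos + 1;
}
```
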
| 2258 | 89. Paranoid check in regcomp() for bad error code from pcre2_compile(). |
| 2259 | |
| 2260 | 90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well |
| 2261 | as for link size 2. |
| 2262 | |
| 2263 | 91. Document that JIT has a limit on pattern size, and give more information |
| 2264 | about JIT compile failures in pcre2test. |
| 2265 | |
| 2266 | 92. Implement PCRE2_INFO_HASBACKSLASHC. |
| 2267 | |
| 2268 | 93. Re-arrange valgrind support code in pcre2test to avoid spurious reports |
| 2269 | with JIT (possibly caused by SSE2?). |
| 2270 | |
| 2271 | 94. Support offset_limit in JIT. |
| 2272 | |
| 2273 | 95. A sequence such as [[:punct:]b], that is, a POSIX character class followed |
| 2274 | by a single ASCII character in a class item, was incorrectly compiled in UCP |
| 2275 | mode. The POSIX class got lost, but only if the single character followed it. |
| 2276 | |
| 2277 | 96. [:punct:] in UCP mode was matching some characters in the range 128-255 |
| 2278 | that should not have been matched. |
| 2279 | |
| 2280 | 97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all |
| 2281 | characters with code points greater than 255 are in the class. When a Unicode |
| 2282 | property was also in the class (if PCRE2_UCP is set, escapes such as \w are |
| 2283 | turned into Unicode properties), wide characters were not correctly handled, |
| 2284 | and could fail to match. |
| 2285 | |
| 2286 | 98. In pcre2test, make the "startoffset" modifier a synonym of "offset", |
| 2287 | because it sets the "startoffset" parameter for pcre2_match(). |
| 2288 | |
| 2289 | 99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between |
| 2290 | an item and its qualifier (for example, A(?#comment)?B) pcre2_compile() |
| 2291 | misbehaved. This bug was found by the LLVM fuzzer. |
| 2292 | |
| 2293 | 100. The error for an invalid UTF pattern string always gave the code unit |
| 2294 | offset as zero instead of where the invalidity was found. |
| 2295 | |
| 2296 | 101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not |
| 2297 | working correctly in UCP mode. |
| 2298 | |
| 2299 | 102. Similar to 99 above, if an isolated \E was present between an item and its |
| 2300 | qualifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This bug |
| 2301 | was found by the LLVM fuzzer. |
| 2302 | |
| 2303 | 103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND |
| 2304 | was set when the pmatch argument was NULL. It now returns REG_INVARG. |
| 2305 | |
| 2306 | 104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep. |
| 2307 | |
| 2308 | 105. An empty \Q\E sequence between an item and its qualifier caused |
| 2309 | pcre2_compile() to misbehave when auto callouts were enabled. This bug |
| 2310 | was found by the LLVM fuzzer. |
| 2311 | |
| 2312 | 106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or |
| 2313 | other verb "name" ended with whitespace immediately before the closing |
| 2314 | parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when |
| 2315 | both those options were set. |
| 2316 | |
| 2317 | 107. In a number of places pcre2_compile() was not handling NULL characters |
| 2318 | correctly, and pcre2test with the "bincode" modifier was not always correctly |
| 2319 | displaying fields containing NULLS: |
| 2320 | |
| 2321 | (a) Within /x extended #-comments |
| 2322 | (b) Within the "name" part of (*MARK) and other *verbs |
| 2323 | (c) Within the text argument of a callout |
| 2324 | |
| 2325 | 108. If a pattern that was compiled with PCRE2_EXTENDED started with white |
| 2326 | space or a #-type comment that was followed by (?-x), which turns off |
| 2327 | PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again, |
| 2328 | pcre2_compile() assumed that (?-x) applied to the whole pattern and |
| 2329 | consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix |
| 2330 | for this bug means that a setting of any of the (?imsxJU) options at the start |
| 2331 | of a pattern is no longer transferred to the options that are returned by |
| 2332 | PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have |
| 2333 | changed when the effects of those options were all moved to compile time. |
| 2334 | |
| 2335 | 109. An escaped closing parenthesis in the "name" part of a (*verb) when |
| 2336 | PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug |
| 2337 | was found by the LLVM fuzzer. |
| 2338 | |
| 2339 | 110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it |
| 2340 | possible to test it. |
| 2341 | |
| 2342 | 111. "Harden" pcre2test against ridiculously large values in modifiers and |
| 2343 | command line arguments. |
| 2344 | |
| 2345 | 112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and |
| 2346 | PCRE2_SUBSTITUTE_OVERFLOW_LENGTH. |
| 2347 | |
| 2348 | 113. Fix printing of *MARK names that contain binary zeroes in pcre2test. |
| 2349 | |
| 2350 | |
| 2351 | Version 10.20 30-June-2015 |
| 2352 | -------------------------- |
| 2353 | |
| 2354 | 1. Callouts with string arguments have been added. |
| 2355 | |
| 2356 | 2. Assertion code generator in JIT has been optimized. |
| 2357 | |
| 2358 | 3. The invalid pattern (?(?C) has a missing assertion condition at the end. The |
| 2359 | pcre2_compile() function read past the end of the input before diagnosing an |
| 2360 | error. This bug was discovered by the LLVM fuzzer. |
| 2361 | |
| 2362 | 4. Implemented pcre2_callout_enumerate(). |
| 2363 | |
| 2364 | 5. Fix JIT compilation of conditional blocks whose assertion is converted to |
| 2365 | (*FAIL). E.g: /(?(?!))/. |
| 2366 | |
| 2367 | 6. The pattern /(?(?!)^)/ caused references to random memory. This bug was |
| 2368 | discovered by the LLVM fuzzer. |
| 2369 | |
| 2370 | 7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly |
| 2371 | when this assertion was used as a condition, for example (?(?!)a|b). In |
| 2372 | pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect |
| 2373 | error about an unsupported item. |
| 2374 | |
| 2375 | 8. For some types of pattern, for example /Z*(|d*){216}/, the auto- |
| 2376 | possessification code could take exponential time to complete. A recursion |
| 2377 | depth limit of 1000 has been imposed to limit the resources used by this |
| 2378 | optimization. This infelicity was discovered by the LLVM fuzzer. |
| 2379 | |
| 2380 | 9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class |
| 2381 | such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored |
| 2382 | because \S ensures they are all in the class. The code for doing this was |
| 2383 | interacting badly with the code for computing the amount of space needed to |
| 2384 | compile the pattern, leading to a buffer overflow. This bug was discovered by |
| 2385 | the LLVM fuzzer. |
| 2386 | |
| 2387 | 10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside |
| 2388 | other kinds of group caused stack overflow at compile time. This bug was |
| 2389 | discovered by the LLVM fuzzer. |
| 2390 | |
| 2391 | 11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment |
| 2392 | between a subroutine call and its quantifier was incorrectly compiled, leading |
| 2393 | to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer. |
| 2394 | |
| 2395 | 12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an |
| 2396 | assertion after (?(. The code was failing to check the character after (?(?< |
| 2397 | for the ! or = that would indicate a lookbehind assertion. This bug was |
| 2398 | discovered by the LLVM fuzzer. |
| 2399 | |
| 2400 | 13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with |
| 2401 | a fixed maximum following a group that contains a subroutine reference was |
| 2402 | incorrectly compiled and could trigger buffer overflow. This bug was discovered |
| 2403 | by the LLVM fuzzer. |
| 2404 | |
| 2405 | 14. Negative relative recursive references such as (?-7) to non-existent |
| 2406 | subpatterns were not being diagnosed and could lead to unpredictable behaviour. |
| 2407 | This bug was discovered by the LLVM fuzzer. |
| 2408 | |
| 2409 | 15. The bug fixed in 14 was due to an integer variable that was unsigned when |
| 2410 | it should have been signed. Some other "int" variables, having been checked, |
| 2411 | have either been changed to uint32_t or commented as "must be signed". |
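The signed/unsigned problem described in items 14 and 15 can be sketched as follows. This is an illustration under assumptions, not the actual compile code: the function and parameter names are hypothetical, and the bound of 65535 open groups is assumed for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Resolve a negative relative reference such as (?-7). 'open_groups' is the
   number of capturing groups opened so far; 'rel' is the value of the digits
   (7 for (?-7)). Returns the absolute group number, or -1 if the reference
   points before the first group. All names here are hypothetical. */
static int resolve_relative(uint32_t open_groups, uint32_t rel)
{
  assert(open_groups <= 65535u);  /* assumed upper bound for this sketch */

  /* Do the subtraction in a signed type wide enough for both operands.
     In uint32_t, 3 - 7 + 1 wraps to 4294967293, and a "<= 0" test on that
     result can never succeed. */
  int64_t target = (int64_t)open_groups - (int64_t)rel + 1;
  if (target <= 0) return -1;
  return (int)target;
}
```

With three groups open, (?-1) resolves to group 3 and (?-3) to group 1, while (?-4) and (?-7) are rejected.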
| 2412 | |
| 2413 | 16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1))) |
| 2414 | caused a stack overflow instead of the diagnosis of a non-fixed length |
| 2415 | lookbehind assertion. This bug was discovered by the LLVM fuzzer. |
| 2416 | |
| 2417 | 17. The use of \K in a positive lookbehind assertion in a non-anchored pattern |
| 2418 | (e.g. /(?<=\Ka)/) could make pcre2grep loop. |
| 2419 | |
| 2420 | 18. There was a similar problem to 17 in pcre2test for global matches, though |
| 2421 | the code there did catch the loop. |
| 2422 | |
| 2423 | 19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*), |
| 2424 | and a subsequent item in the pattern caused a non-match, backtracking over the |
| 2425 | repeated \X did not stop, but carried on past the start of the subject, causing |
| 2426 | reference to random memory and/or a segfault. There were also some other cases |
| 2427 | where backtracking after \C could crash. This set of bugs was discovered by the |
| 2428 | LLVM fuzzer. |
| 2429 | |
| 2430 | 20. The function for finding the minimum length of a matching string could take |
| 2431 | a very long time if mutual recursion was present many times in a pattern, for |
| 2432 | example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has |
| 2433 | been implemented. This infelicity was discovered by the LLVM fuzzer. |
| 2434 | |
| 2435 | 21. Implemented PCRE2_NEVER_BACKSLASH_C. |
| 2436 | |
| 2437 | 22. The feature for string replication in pcre2test could read from freed |
| 2438 | memory if the replication required a buffer to be extended, and it was not |
| 2439 | working properly in 16-bit and 32-bit modes. This issue was discovered by a |
| 2440 | fuzzer: see http://lcamtuf.coredump.cx/afl/. |
| 2441 | |
| 2442 | 23. Added the PCRE2_ALT_CIRCUMFLEX option. |
| 2443 | |
| 2444 | 24. Adjust the treatment of \8 and \9 to be the same as the current Perl |
| 2445 | behaviour. |
| 2446 | |
| 2447 | 25. Static linking against the PCRE2 library using the pkg-config module was |
| 2448 | failing on missing pthread symbols. |
| 2449 | |
| 2450 | 26. If a group that contained a recursive back reference also contained a |
| 2451 | forward reference subroutine call followed by a non-forward-reference |
| 2452 | subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to |
| 2453 | compile correct code, leading to undefined behaviour or an internally detected |
| 2454 | error. This bug was discovered by the LLVM fuzzer. |
| 2455 | |
| 2456 | 27. Quantification of certain items (e.g. atomic back references) could cause |
| 2457 | incorrect code to be compiled when recursive forward references were involved. |
| 2458 | For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was |
| 2459 | discovered by the LLVM fuzzer. |
| 2460 | |
| 2461 | 28. A repeated conditional group whose condition was a reference by name caused |
| 2462 | a buffer overflow if there was more than one group with the given name. This |
| 2463 | bug was discovered by the LLVM fuzzer. |
| 2464 | |
| 2465 | 29. A recursive back reference by name within a group that had the same name as |
| 2466 | another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/. |
| 2467 | This bug was discovered by the LLVM fuzzer. |
| 2468 | |
| 2469 | 30. A forward reference by name to a group whose number is the same as the |
| 2470 | current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a |
| 2471 | buffer overflow at compile time. This bug was discovered by the LLVM fuzzer. |
| 2472 | |
| 2473 | 31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1 |
| 2474 | as an int; fixed by writing it as 1u). |
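A minimal sketch of the fix in item 31, assuming a 32-bit int. The helper name is hypothetical and is not taken from the PCRE2 source.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with only bit 'n' set, for n in 0..31. The literal 1u is
   unsigned, so shifting it by 31 is well defined; plain 1 is a signed int,
   and 1 << 31 overflows it where int is 32 bits wide. */
static uint32_t bit_mask(unsigned int n)
{
  assert(n < 32);
  return (uint32_t)1u << n;
}
```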
| 2475 | |
| 2476 | 32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives |
| 2477 | a warning for "fileno" unless -std=gnu99 is used. |
| 2478 | |
| 2479 | 33. A lookbehind assertion within a set of mutually recursive subpatterns could |
| 2480 | provoke a buffer overflow. This bug was discovered by the LLVM fuzzer. |
| 2481 | |
| 2482 | 34. Give an error for an empty subpattern name such as (?''). |
| 2483 | |
| 2484 | 35. Make pcre2test give an error if a pattern that follows #forbid_utf contains |
| 2485 | \P, \p, or \X. |
| 2486 | |
| 2487 | 36. The way named subpatterns are handled has been refactored. There is now a |
| 2488 | pre-pass over the regex which does nothing other than identify named |
| 2489 | subpatterns and count the total captures. This means that information about |
| 2490 | named patterns is known before the rest of the compile. In particular, it means |
| 2491 | that forward references can be checked as they are encountered. Previously, the |
| 2492 | code for handling forward references was contorted and led to several errors in |
| 2493 | computing the memory requirements for some patterns, leading to buffer |
| 2494 | overflows. |
| 2495 | |
| 2496 | 37. There was no check for integer overflow in subroutine calls such as (?123). |
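One way to add the kind of check that item 37 describes is to test each digit before it is accumulated. The sketch below is an assumption about the general technique, not the code that was committed; the function name and return codes are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse a run of decimal digits at *pp into *out, advancing *pp past them.
   Returns 0 on success, or -1 if there are no digits or the value would
   exceed INT_MAX. Names and return codes are hypothetical. */
static int parse_group_number(const char **pp, int *out)
{
  assert(pp != NULL && *pp != NULL && out != NULL);
  const char *p = *pp;
  int n = 0;

  if (*p < '0' || *p > '9') return -1;
  while (*p >= '0' && *p <= '9')
    {
    int digit = *p++ - '0';
    /* Test before multiplying: n * 10 + digit must not exceed INT_MAX. */
    if (n > (INT_MAX - digit) / 10) return -1;
    n = n * 10 + digit;
    }

  *pp = p;
  *out = n;
  return 0;
}
```

On failure neither *pp nor *out is modified, so the caller can report the error at the start of the number.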
| 2497 | |
| 2498 | 38. The table entry for \l in EBCDIC environments was incorrect, leading to its |
| 2499 | being treated as a literal 'l' instead of causing an error. |
| 2500 | |
| 2501 | 39. If a non-capturing group containing a conditional group that could match |
| 2502 | an empty string was repeated, it was not identified as matching an empty string |
| 2503 | itself. For example: /^(?:(?(1)x|)+)+$()/. |
| 2504 | |
| 2505 | 40. In an EBCDIC environment, pcre2test was mishandling the escape sequences |
| 2506 | \a and \e in test subject lines. |
| 2507 | |
| 2508 | 41. In an EBCDIC environment, \a in a pattern was converted to the ASCII |
| 2509 | instead of the EBCDIC value. |
| 2510 | |
| 2511 | 42. The handling of \c in an EBCDIC environment has been revised so that it is |
| 2512 | now compatible with the specification in Perl's perlebcdic page. |
| 2513 | |
| 2514 | 43. Single character repetition in JIT has been improved. 20-30% speedup |
| 2515 | was achieved on certain patterns. |
| 2516 | |
| 2517 | 44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in |
| 2518 | ASCII/Unicode. This has now been added to the list of characters that are |
| 2519 | recognized as white space in EBCDIC. |
| 2520 | |
| 2521 | 45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave |
| 2522 | an error (correctly) when used outside a class, but did not give an error |
| 2523 | within a class. |
| 2524 | |
| 2525 | 46. \h within a class was incorrectly compiled in EBCDIC environments. |
| 2526 | |
| 2527 | 47. JIT should return with error when the compiled pattern requires |
| 2528 | more stack space than the maximum. |
| 2529 | |
| 2530 | 48. Fixed a memory leak in pcre2grep when a locale is set. |
| 2531 | |
| 2532 | |
| 2533 | Version 10.10 06-March-2015 |
| 2534 | --------------------------- |
| 2535 | |
| 2536 | 1. When a pattern is compiled, it remembers the highest back reference so that |
| 2537 | when matching, if the ovector is too small, extra memory can be obtained to |
| 2538 | use instead. A conditional subpattern whose condition is a check on a capture |
| 2539 | having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is |
| 2540 | another kind of back reference, but it was not setting the highest |
| 2541 | backreference number. This mattered only if pcre2_match() was called with an |
| 2542 | ovector that was too small to hold the capture, and there was no other kind of |
| 2543 | back reference (a situation which is probably quite rare). The effect of the |
| 2544 | bug was that the condition was always treated as FALSE when the capture could |
| 2545 | not be consulted, leading to incorrect behaviour by pcre2_match(). This bug |
| 2546 | has been fixed. |
| 2547 | |
| 2548 | 2. Functions for serialization and deserialization of sets of compiled patterns |
| 2549 | have been added. |
| 2550 | |
| 2551 | 3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove |
| 2552 | excess code units at the end of the data block that may occasionally occur if |
| 2553 | the code for calculating the size over-estimates. This change stops the |
| 2554 | serialization code copying uninitialized data, to which valgrind objects. The |
| 2555 | documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not |
| 2556 | include the general overhead. This has been corrected. |
| 2557 | |
| 2558 | 4. All code units in every slot in the table of group names are now set, again |
| 2559 | in order to avoid accessing uninitialized data when serializing. |
| 2560 | |
| 2561 | 5. The (*NO_JIT) feature is implemented. |
| 2562 | |
| 2563 | 6. If a bug that caused pcre2_compile() to use more memory than allocated was |
| 2564 | triggered when using valgrind, the code in (3) above passed a stupidly large |
| 2565 | value to valgrind. This caused a crash instead of an "internal error" return. |
| 2566 | |
| 2567 | 7. A reference to a duplicated named group (either a back reference or a test |
| 2568 | for being set in a conditional) that occurred in a part of the pattern where |
| 2569 | PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern |
| 2570 | to be incorrectly calculated, leading to overwriting. |
| 2571 | |
| 2572 | 8. A mutually recursive set of back references such as (\2)(\1) caused a |
| 2573 | segfault at compile time (while trying to find the minimum matching length). |
| 2574 | The infinite loop is now broken (with the minimum length unset, that is, zero). |
| 2575 | |
| 2576 | 9. If an assertion that was used as a condition was quantified with a minimum |
| 2577 | of zero, matching went wrong. In particular, if the whole group had unlimited |
| 2578 | repetition and could match an empty string, a segfault was likely. The pattern |
| 2579 | (?(?=0)?)+ is an example that caused this. Perl allows assertions to be |
| 2580 | quantified, but not if they are being used as conditions, so the above pattern |
| 2581 | is faulted by Perl. PCRE2 has now been changed so that it also rejects such |
| 2582 | patterns. |
| 2583 | |
| 2584 | 10. The error message for an invalid quantifier has been changed from "nothing |
| 2585 | to repeat" to "quantifier does not follow a repeatable item". |
| 2586 | |
| 2587 | 11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but |
| 2588 | scanning the compiled pattern in subsequent auto-possessification can get out |
| 2589 | of step and lead to an unknown opcode. Previously this could have caused an |
| 2590 | infinite loop. Now it generates an "internal error" error. This is a tidyup, |
| 2591 | not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an |
| 2592 | undefined outcome. |
| 2593 | |
| 2594 | 12. A UTF pattern containing a "not" match of a non-ASCII character and a |
| 2595 | subroutine reference could loop at compile time. Example: /[^\xff]((?1))/. |
| 2596 | |
| 2597 | 13. The locale test (RunTest 3) has been upgraded. It now checks that a locale |
| 2598 | that is found in the output of "locale -a" can actually be set by pcre2test |
| 2599 | before it is accepted. Previously, in an environment where a locale was listed |
| 2600 | but would not set (an example does exist), the test would "pass" without |
| 2601 | actually doing anything. Also the fr_CA locale has been added to the list of |
| 2602 | locales that can be used. |
| 2603 | |
| 2604 | 14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a |
| 2605 | capturing group number without parentheses, the last character was incorrectly |
| 2606 | literally included at the end of the replacement string. |
| 2607 | |
| 2608 | 15. A possessive capturing group such as (a)*+ with a minimum repeat of zero |
| 2609 | failed to allow the zero-repeat case if pcre2_match() was called with an |
| 2610 | ovector too small to capture the group. |
| 2611 | |
| 2612 | 16. Improved error message in pcre2test when setting the stack size (-S) fails. |
| 2613 | |
| 2614 | 17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the |
| 2615 | transfer from PCRE1, meaning that CMake configuration failed if "build tests" |
| 2616 | was selected. (2) The file src/pcre2_serialize.c had not been added to the list |
| 2617 | of PCRE2 sources, which caused a failure to build pcre2test. |
| 2618 | |
| 2619 | 18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems |
| 2620 | only on Windows. |
| 2621 | |
| 2622 | 19. Use binary input when reading back saved serialized patterns in pcre2test. |
| 2623 | |
| 2624 | 20. Added RunTest.bat for running the tests under Windows. |
| 2625 | |
| 2626 | 21. "make distclean" was not removing config.h, a file that may be created for |
| 2627 | use with CMake. |
| 2628 | |
| 2629 | 22. A pattern such as "((?2){0,1999}())?", which has a group containing a |
| 2630 | forward reference repeated a large (but limited) number of times within a |
| 2631 | repeated outer group that has a zero minimum quantifier, caused incorrect code |
| 2632 | to be compiled, leading to the error "internal error: previously-checked |
| 2633 | referenced subpattern not found" when an incorrect memory address was read. |
| 2634 | This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's |
| 2635 | FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.) |
| 2636 | |
| 2637 | 23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine |
| 2638 | call within a group that also contained a recursive back reference caused |
| 2639 | incorrect code to be compiled. This bug was reported as "heap overflow", |
| 2640 | discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015: |
| 2641 | CVE-2015-2326 was given to this.) |
| 2642 | |
| 2643 | 24. Computing the size of the JIT read-only data in advance has been a source |
| 2644 | of various issues, and new ones still appear, unfortunately. To fix |
| 2645 | existing and future issues, size computation is eliminated from the code, |
| 2646 | and replaced by on-demand memory allocation. |
| 2647 | |
| 2648 | 25. A pattern such as /(?i)[A-`]/, where characters in the other case are |
| 2649 | adjacent to the end of the range, and the range contained characters with more |
| 2650 | than one other case, caused incorrect behaviour when compiled in UTF mode. In |
| 2651 | that example, the range a-j was left out of the class. |
| 2652 | |
| 2653 | |
| 2654 | Version 10.00 05-January-2015 |
| 2655 | ----------------------------- |
| 2656 | |
| 2657 | Version 10.00 is the first release of PCRE2, a revised API for the PCRE |
| 2658 | library. Changes prior to 10.00 are logged in the ChangeLog file for the old |
| 2659 | API, up to item 20 for release 8.36. |
| 2660 | |
| 2661 | The code of the library was heavily revised as part of the new API |
| 2662 | implementation. Details of each and every modification were not individually |
| 2663 | logged. In addition to the API changes, the following changes were made. They |
| 2664 | are either new functionality, or bug fixes and other noticeable changes of |
| 2665 | behaviour that were implemented after the code had been forked. |
| 2666 | |
| 2667 | 1. Including Unicode support at build time is now enabled by default, but it |
| 2668 | can optionally be disabled. It is not enabled by default at run time (no |
| 2669 | change). |
| 2670 | |
| 2671 | 2. The test program, now called pcre2test, was re-specified and almost |
| 2672 | completely re-written. Its input is not compatible with input for pcretest. |
| 2673 | |
| 2674 | 3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the |
| 2675 | PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is |
| 2676 | matched by that pattern. |
| 2677 | |
| 2678 | 4. For the benefit of those who use PCRE2 via some other application, that is, |
| 2679 | not writing the function calls themselves, it is possible to check the PCRE2 |
| 2680 | version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a |
| 2681 | string such as "yesno". |
| 2682 | |
| 2683 | 5. There are case-equivalent Unicode characters whose encodings use different |
| 2684 | numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is |
| 2685 | theoretically possible for this to happen in UTF-16 too.) If a backreference to |
| 2686 | a group containing one of these characters was greedily repeated, and during |
| 2687 | the match a backtrack occurred, the subject might be backtracked by the wrong |
| 2688 | number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly |
| 2689 | (and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should |
| 2690 | capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. |
| 2691 | Incorrect backtracking meant that group 2 captured only the last two bytes. |
| 2692 | This bug has been fixed; the new code is slower, but it is used only when the |
| 2693 | strings matched by the repetition are not all the same length. |
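To make the length mismatch in item 5 concrete, the sketch below encodes the two case-equivalent code points as UTF-8. The encoder is a generic illustration with a hypothetical name, not a PCRE2 function, and it covers only code points below U+10000.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a code point below U+10000 as UTF-8 into buf (at least 3 bytes)
   and return the number of bytes written. Surrogates and code points of
   U+10000 and above are left out of this sketch. */
static int utf8_encode_bmp(uint32_t cp, unsigned char *buf)
{
  assert(cp < 0x10000u);
  if (cp < 0x80u)
    {
    buf[0] = (unsigned char)cp;
    return 1;
    }
  if (cp < 0x800u)
    {
    buf[0] = (unsigned char)(0xC0u | (cp >> 6));
    buf[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
    return 2;
    }
  buf[0] = (unsigned char)(0xE0u | (cp >> 12));
  buf[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
  buf[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
  return 3;
}
```

U+023A encodes as the two bytes C8 BA, whereas U+2C65 encodes as the three bytes E2 B1 A5, so stepping back by the length of the first match when backtracking over a later repetition lands in the middle of a character.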
| 2694 | |
| 2695 | 6. A pattern such as /()a/ was not setting the "first character must be 'a'" |
| 2696 | information. This applied to any pattern with a group that matched no |
| 2697 | characters, for example: /(?:(?=.)|(?<!x))a/. |
| 2698 | |
| 2699 | 7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for |
| 2700 | those parentheses to be closed with whatever has been captured so far. However, |
| 2701 | it was failing to mark any other groups between the highest capture so far and |
| 2702 | the current group as "unset". Thus, the ovector for those groups contained |
| 2703 | whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when |
| 2704 | matched against "abcd". |
| 2705 | |
| 2706 | 8. The pcre2_substitute() function has been implemented. |
| 2707 | |
| 2708 | 9. If an assertion used as a condition was quantified with a minimum of zero |
| 2709 | (an odd thing to do, but it happened), SIGSEGV or other misbehaviour could |
| 2710 | occur. |
| 2711 | |
| 2712 | 10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented. |
| 2713 | |
| 2714 | **** |