| 1 | News about PCRE2 releases |
| 2 | ------------------------- |
| 3 | |
| 4 | |
| 5 | Version 10.40 15-April-2022 |
| 6 | --------------------------- |
| 7 | |
| 8 | This is mostly a bug-fixing and code-tidying release. However, there are some |
| 9 | extensions to Unicode property handling: |
| 10 | |
| 11 | * Added support for Bidi_Class and a number of binary Unicode properties, |
| 12 | including Bidi_Control. |
| 13 | |
| 14 | * A number of changes to script matching for \p and \P: |
| 15 | |
| 16 | (a) Script extensions for a character are now coded as a bitmap instead of |
| 17 | a list of script numbers, which should be faster and does not need a |
| 18 | loop. |
| 19 | |
| 20 | (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms |
| 21 | sc and scx). |
| 22 | |
| 23 | (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being |
| 24 | the same as \p{scx:scriptname} because this change happened in Perl at |
| 25 | release 5.26. |
| 26 | |
| 27 | (d) The standard Unicode 4-letter abbreviations for script names are now |
| 28 | recognized. |
| 29 | |
| 30 | (e) In accordance with Unicode and Perl's "loose matching" rules, spaces, |
| 31 | hyphens, and underscores are ignored in property names, which are then |
| 32 | matched independent of case. |
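Item (e) above can be pictured as a normalization step applied to both the name in the pattern and the names in the property table before they are compared. The following is a minimal sketch of that idea only; the function name `loose_key` is invented for this illustration and is not part of PCRE2.

```python
def loose_key(name):
    """Drop spaces, hyphens, and underscores, then ignore case."""
    stripped = "".join(ch for ch in name if ch not in " -_")
    return stripped.lower()


def same_property(a, b):
    """Two property names are treated as equal if their loose keys match."""
    return loose_key(a) == loose_key(b)
```

Under this sketch, spellings such as `Script_Extensions`, `script extensions`, and `SCRIPT-EXTENSIONS` all reduce to the same key.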
| 33 | |
| 34 | As always, see ChangeLog for a list of all changes (also the Git log). |
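To illustrate item (a) above, here is a small sketch contrasting a list of script numbers with a bitmap. The script numbering and the sample character data are toy values invented for this example; they are not PCRE2's internal tables or real Unicode data.

```python
# Toy script numbering, invented for this sketch (not PCRE2's internal values).
SCRIPTS = ["Common", "Latin", "Han", "Hiragana", "Katakana"]
SCRIPT_BIT = {name: 1 << i for i, name in enumerate(SCRIPTS)}


def to_bitmap(script_names):
    """Fold a list of script names into one integer, one bit per script."""
    bitmap = 0
    for name in script_names:
        bitmap |= SCRIPT_BIT[name]
    return bitmap


def in_list(script, extension_list):
    """List-style check: walk the script entries one by one."""
    for candidate in extension_list:
        if candidate == script:
            return True
    return False


def in_bitmap(script, extension_bitmap):
    """Bitmap-style check: a single AND, no loop."""
    return (extension_bitmap & SCRIPT_BIT[script]) != 0


# Hypothetical character whose script extensions cover three scripts.
toy_extensions = ["Han", "Hiragana", "Katakana"]
toy_bitmap = to_bitmap(toy_extensions)
```

Both checks give the same answer for every script; the bitmap form simply replaces the per-character loop with one bitwise test.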
| 35 | |
| 36 | |
| 37 | Version 10.39 29-October-2021 |
| 38 | ----------------------------- |
| 39 | |
| 40 | This release is happening soon after 10.38 because the bug fix is important. |
| 41 | |
| 42 | 1. Fix incorrect detection of alternatives in first character search in JIT. |
| 43 | |
| 44 | 2. Update to Unicode 14.0.0. |
| 45 | |
| 46 | 3. Some code cleanups (see ChangeLog). |
| 47 | |
| 48 | |
| 49 | Version 10.38 01-October-2021 |
| 50 | ----------------------------- |
| 51 | |
| 52 | As well as some bug fixes and tidies (as always, see ChangeLog for details), |
| 53 | the documentation is updated to list the new URLs, following the move of the |
| 54 | source repository to GitHub and the mailing list to Google Groups. |
| 55 | |
| 56 | * The CMake build system can now build both static and shared libraries in one |
| 57 | go. |
| 58 | |
| 59 | * Following Perl's lead, \K is now locked out in lookaround assertions by |
| 60 | default, but an option is provided to re-enable the previous behaviour. |
| 61 | |
| 62 | |
| 63 | Version 10.37 26-May-2021 |
| 64 | ------------------------- |
| 65 | |
| 66 | A few more bug fixes and tidies. The only change of real note is the removal of |
| 67 | the actual POSIX names regcomp etc. from the POSIX wrapper library because |
| 68 | these have caused issues for some applications (see 10.33 #2 below). |
| 69 | |
| 70 | |
| 71 | Version 10.36 04-December-2020 |
| 72 | ------------------------------ |
| 73 | |
| 74 | Again, mainly bug fixes and tidies. The only enhancements are the addition of |
| 75 | GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the |
| 76 | handling of substitution strings for both -O and callouts in pcre2grep, with |
| 77 | the addition of $x{...} and $o{...} to allow for characters whose code points |
| 78 | are greater than 255 in Unicode mode. |
| 79 | |
| 80 | NOTE: there is an outstanding issue with JIT support for macOS on arm64 |
| 81 | hardware. For details, please see Bugzilla issue #2618. |
| 82 | |
| 83 | |
| 84 | Version 10.35 15-April-2020 |
| 85 | --------------------------- |
| 86 | |
| 87 | Bugfixes, tidies, and a few new enhancements. |
| 88 | |
| 89 | 1. Capturing groups that contain recursive backreferences to themselves are no |
| 90 | longer automatically atomic, because the restriction is no longer necessary |
| 91 | as a result of the 10.30 restructuring. |
| 92 | |
| 93 | 2. Several new options for pcre2_substitute(). |
| 94 | |
| 95 | 3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode |
| 96 | character properties are used for upper/lower case computations on characters |
| 97 | whose code points are greater than 127. |
| 98 | |
| 99 | 4. The character tables (for low-valued characters) can now more easily be |
| 100 | saved and restored in binary. |
| 101 | |
| 102 | 5. Updated to Unicode 13.0.0. |
| 103 | |
| 104 | |
| 105 | Version 10.34 21-November-2019 |
| 106 | ------------------------------ |
| 107 | |
| 108 | Another release with a few enhancements as well as bugfixes and tidies. The |
| 109 | main new features are: |
| 110 | |
| 111 | 1. There is now some support for matching in invalid UTF strings. |
| 112 | |
| 113 | 2. Non-atomic positive lookarounds are implemented in the pcre2_match() |
| 114 | interpreter, but not in JIT. |
| 115 | |
| 116 | 3. Added two new functions: pcre2_get_match_data_size() and |
| 117 | pcre2_maketables_free(). |
| 118 | |
| 119 | 4. Upgraded to Unicode 12.1.0. |
| 120 | |
| 121 | |
| 122 | Version 10.33 16-April-2019 |
| 123 | --------------------------- |
| 124 | |
| 125 | Yet more bugfixes, tidies, and a few enhancements, summarized here (see |
| 126 | ChangeLog for the full list): |
| 127 | |
| 128 | 1. Callouts from pcre2_substitute() are now available. |
| 129 | |
| 130 | 2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper |
| 131 | functions that use the standard POSIX names. However, in pcre2posix.h the POSIX |
| 132 | names are defined as macros. This should help avoid linking with the wrong |
| 133 | library in some environments, while still exporting the POSIX names for |
| 134 | pre-existing programs that use them. |
| 135 | |
| 136 | 3. Some new options: |
| 137 | |
| 138 | (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n. |
| 139 | |
| 140 | (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...} |
| 141 | construct. |
| 142 | |
| 143 | (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be |
| 144 | made, instead of just remembering a pointer. |
| 145 | |
| 146 | 4. Some new Perl features: |
| 147 | |
| 148 | (a) Perl 5.28's experimental alphabetic names for atomic groups and |
| 149 | lookaround assertions, for example, (*pla:...) and (*atomic:...). |
| 150 | |
| 151 | (b) The new Perl "script run" features (*script_run:...) and |
| 152 | (*atomic_script_run:...) aka (*sr:...) and (*asr:...). |
| 153 | |
| 154 | (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in |
| 155 | capture group names. |
| 156 | |
| 157 | 5. --disable-percent-zt disables the use of %zu and %td in formatting strings |
| 158 | in pcre2test. They were already automatically disabled for VC and older C |
| 159 | compilers. |
| 160 | |
| 161 | 6. Some changes related to callouts in pcre2grep: |
| 162 | |
| 163 | (a) Support for running an external program under VMS has been added, in |
| 164 | addition to Windows and fork() support. |
| 165 | |
| 166 | (b) --disable-pcre2grep-callout-fork restricts the callout support in |
| 167 | pcre2grep to the inbuilt echo facility. |
| 168 | |
| 169 | |
| 170 | Version 10.32 10-September-2018 |
| 171 | ------------------------------- |
| 172 | |
| 173 | This is another mainly bugfix and tidying release with a few minor |
| 174 | enhancements. These are the main ones: |
| 175 | |
| 176 | 1. pcre2grep now supports the inclusion of binary zeros in patterns that are |
| 177 | read from files via the -f option. |
| 178 | |
| 179 | 2. ./configure now supports --enable-jit=auto, which automatically enables JIT |
| 180 | if the hardware supports it. |
| 181 | |
| 182 | 3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for |
| 183 | local workspace and local ovectors. Instead, an initial block of stack is |
| 184 | reserved, but if this is insufficient, heap memory is used. The heap limit |
| 185 | parameter now applies to pcre2_dfa_match(). |
| 186 | |
| 187 | 4. Updated to Unicode version 11.0.0. |
| 188 | |
| 189 | 5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported. |
| 190 | |
| 191 | 6. Added support for \N{U+dddd}, but only in Unicode mode. |
| 192 | |
| 193 | 7. Added support for (?^) to unset all imnsx options. |
| 194 | |
| 195 | |
| 196 | Version 10.31 12-February-2018 |
| 197 | ------------------------------ |
| 198 | |
| 199 | This is mainly a bugfix and tidying release (see ChangeLog for full details). |
| 200 | However, there are some minor enhancements. |
| 201 | |
| 202 | 1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and |
| 203 | PCRE2_CONFIG_COMPILED_WIDTHS. |
| 204 | |
| 205 | 2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the |
| 206 | extra compile time options. |
| 207 | |
| 208 | 3. There are now public names for all the pcre2_compile() error numbers. |
| 209 | |
| 210 | 4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new |
| 211 | field callout_flags in callout blocks. |
| 212 | |
| 213 | |
| 214 | Version 10.30 14-August-2017 |
| 215 | ---------------------------- |
| 216 | |
| 217 | The full list of changes that includes bugfixes and tidies is, as always, in |
| 218 | ChangeLog. These are the most important new features: |
| 219 | |
| 220 | 1. The main interpreter, pcre2_match(), has been refactored into a new version |
| 221 | that does not use recursive function calls (and therefore the system stack) for |
| 222 | remembering backtracking positions. This makes --disable-stack-for-recursion a |
| 223 | NOOP. The new implementation allows backtracking into recursive group calls in |
| 224 | patterns, making it more compatible with Perl, and also fixes some other |
| 225 | previously hard-to-do issues. For patterns that have a lot of backtracking, the |
| 226 | heap is now used, and there is an explicit limit on the amount, settable by |
| 227 | pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained, |
| 228 | but is renamed as "depth limit" (though the old names remain for |
| 229 | compatibility). |
| 230 | |
| 231 | There is also a change in the way callouts from pcre2_match() are handled. The |
| 232 | offset_vector field in the callout block is no longer a pointer to the |
| 233 | actual ovector that was passed to the matching function in the match data |
| 234 | block. Instead it points to an internal ovector of a size large enough to hold |
| 235 | all possible captured substrings in the pattern. |
| 236 | |
| 237 | 2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at |
| 238 | the end of the subject. |
| 239 | |
| 240 | 3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and |
| 241 | pcre2test is upgraded to support it. Setting within the pattern by (?xx) is |
| 242 | also supported. |
| 243 | |
| 244 | 4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this. |
| 245 | |
| 246 | 5. Additional compile options in the compile context are now available, and the |
| 247 | first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and |
| 248 | PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. |
| 249 | |
| 250 | 6. The newline type PCRE2_NEWLINE_NUL is now available. |
| 251 | |
| 252 | 7. The match limit value now also applies to pcre2_dfa_match() as there are |
| 253 | patterns that can use up a lot of resources without necessarily recursing very |
| 254 | deeply. |
| 255 | |
| 256 | 8. The option REG_PEND (a GNU extension) is now available for the POSIX |
| 257 | wrapper. Also there is a new option PCRE2_LITERAL which is used to support |
| 258 | REG_NOSPEC. |
| 259 | |
| 260 | 9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the |
| 261 | benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented |
| 262 | using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This |
| 263 | is tidier and also fixes some bugs. |
| 264 | |
| 265 | 10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0. |
| 266 | |
| 267 | 11. There are some experimental functions for converting foreign patterns |
| 268 | (globs and POSIX patterns) into PCRE2 patterns. |
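The general technique behind item 1 above (keeping backtracking positions on an explicit, size-limited stack instead of the call stack) can be sketched with a toy matcher. This is not PCRE2's implementation: the pattern language here supports only literal characters, `.`, and a greedy `*`, and the names `full_match`, `HeapLimitExceeded`, and `heap_limit` are invented for this illustration. The limit counts saved positions purely for simplicity.

```python
class HeapLimitExceeded(Exception):
    """Raised when the explicit backtrack stack grows past the limit."""


def parse(pattern):
    """Split a toy pattern into (char, starred) tokens; '.' matches any char."""
    tokens, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def full_match(pattern, subject, heap_limit=1000):
    """Match the whole subject, storing backtrack points on an explicit stack."""
    tokens = parse(pattern)
    stack = [(0, 0)]  # (token index, subject index) pairs still to be tried
    while stack:
        ti, si = stack.pop()
        while True:
            if ti == len(tokens):
                if si == len(subject):
                    return True
                break  # pattern used up too early: backtrack
            ch, starred = tokens[ti]
            fits = si < len(subject) and ch in (".", subject[si])
            if starred and fits:
                # Greedy: save the "stop repeating here" alternative, then consume.
                if len(stack) >= heap_limit:
                    raise HeapLimitExceeded("backtrack stack limit reached")
                stack.append((ti + 1, si))
                si += 1
            elif starred:
                ti += 1  # zero further repetitions possible
            elif fits:
                ti += 1
                si += 1
            else:
                break  # mismatch: backtrack
    return False
```

For example, `full_match("a*a", "aaa")` succeeds only after backtracking one repetition, and a small `heap_limit` turns runaway backtracking into an explicit error rather than a call-stack overflow.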
| 269 | |
| 270 | |
| 271 | Version 10.23 14-February-2017 |
| 272 | ------------------------------ |
| 273 | |
| 274 | 1. ChangeLog has the details of a lot of bug fixes and tidies. |
| 275 | |
| 276 | 2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax |
| 277 | checking is now done in the pre-pass that identifies capturing groups. This has |
| 278 | reduced the amount of duplication and made the code tidier. While doing this, |
| 279 | some minor bugs and Perl incompatibilities were fixed (see ChangeLog for |
| 280 | details). |
| 281 | |
| 282 | 3. Back references are now permitted in lookbehind assertions when there are |
| 283 | no duplicated group numbers (that is, (?| has not been used), and, if the |
| 284 | reference is by name, there is only one group of that name. The referenced |
| 285 | group must, of course, be of fixed length. |
| 286 | |
| 287 | 4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back |
| 288 | reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does |
| 289 | not recognize this syntax. |
| 290 | |
| 291 | 5. pcre2grep now automatically expands its buffer up to a maximum set by |
| 292 | --max-buffer-size. |
| 293 | |
| 294 | 6. The -t option (grand total) has been added to pcre2grep. |
| 295 | |
| 296 | 7. A new function called pcre2_code_copy_with_tables() exists to copy a |
| 297 | compiled pattern along with a private copy of the character tables that it |
| 298 | uses. |
| 299 | |
| 300 | 8. A user supplied a number of patches to upgrade pcre2grep under Windows and |
| 301 | tidy the code. |
| 302 | |
| 303 | 9. Several updates have been made to pcre2test and test scripts (see |
| 304 | ChangeLog). |
| 305 | |
| 306 | |
| 307 | Version 10.22 29-July-2016 |
| 308 | -------------------------- |
| 309 | |
| 310 | 1. ChangeLog has the details of a number of bug fixes. |
| 311 | |
| 312 | 2. The POSIX wrapper function regcomp() did not previously support back references |
| 313 | and subroutine calls if called with the REG_NOSUB option. It now does. |
| 314 | |
| 315 | 3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled |
| 316 | pattern. |
| 317 | |
| 318 | 4. Support for string callouts is added to pcre2grep. |
| 319 | |
| 320 | 5. Added the PCRE2_NO_JIT option to pcre2_match(). |
| 321 | |
| 322 | 6. The pcre2_get_error_message() function now returns with a negative error |
| 323 | code if the error number it is given is unknown. |
| 324 | |
| 325 | 7. Several updates have been made to pcre2test and test scripts (see |
| 326 | ChangeLog). |
| 327 | |
| 328 | |
| 329 | Version 10.21 12-January-2016 |
| 330 | ----------------------------- |
| 331 | |
| 332 | 1. Many bugs have been fixed. A large number of them were provoked only by very |
| 333 | strange pattern input, and were discovered by fuzzers. Some others were |
| 334 | discovered by code auditing. See ChangeLog for details. |
| 335 | |
| 336 | 2. The Unicode tables have been updated to Unicode version 8.0.0. |
| 337 | |
| 338 | 3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a |
| 339 | class, where both values are literal letters in the same case, omit the |
| 340 | non-letter EBCDIC code points within the range. |
| 341 | |
| 342 | 4. There have been a number of enhancements to the pcre2_substitute() function, |
| 343 | giving more flexibility to replacement facilities. It is now also possible to |
| 344 | cause the function to return the needed buffer size if the one given is too |
| 345 | small. |
| 346 | |
| 347 | 5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such |
| 348 | as (*THEN:name) to be processed for backslashes and to take note of |
| 349 | PCRE2_EXTENDED. |
| 350 | |
| 351 | 6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a |
| 352 | pattern uses \C, and --never-backslash-C makes it possible to compile a version |
| 353 | of PCRE2 in which the use of \C is always forbidden. |
| 354 | |
| 355 | 7. A limit to the length of pattern that can be handled can now be set by |
| 356 | calling pcre2_set_max_pattern_length(). |
| 357 | |
| 358 | 8. When matching an unanchored pattern, a match can be required to begin within |
| 359 | a given number of code units after the start of the subject by calling |
| 360 | pcre2_set_offset_limit(). |
| 361 | |
| 362 | 9. The pcre2test program has been extended to test new facilities, and it can |
| 363 | now run the tests when LF on its own is not a valid newline sequence. |
| 364 | |
| 365 | 10. The RunTest script has also been updated to enable more tests to be run. |
| 366 | |
| 367 | 11. There have been some minor performance enhancements. |
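The reason for item 3 above is that the basic letters are not contiguous in EBCDIC, so a raw code point range from a to z sweeps up other characters. The sketch below shows this using Python's `cp037` codec; the choice of that particular EBCDIC code page is an assumption made for the example, and the filtering is only an illustration of the effect, not PCRE2's code.

```python
import string

CODEC = "cp037"  # one EBCDIC code page available in Python (assumed here)

lo = "a".encode(CODEC)[0]
hi = "z".encode(CODEC)[0]

# Every code point from 'a' to 'z' in EBCDIC order, decoded back to text.
raw_range = [bytes([b]).decode(CODEC) for b in range(lo, hi + 1)]

# Keep only the basic lowercase letters, dropping the gaps between them.
letters_only = [c for c in raw_range if c in string.ascii_lowercase]
```

Here `raw_range` holds more than 26 characters, while `letters_only` is exactly the lowercase alphabet.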
| 368 | |
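The effect of the offset limit in item 8 of the 10.21 list can be approximated with Python's `re` module, which has no such option of its own. The helper `search_with_offset_limit` is hypothetical, and the sketch assumes that a match starting exactly at the limit is still allowed. It reproduces only the result: the search here is not cut short, and Python string indices count code points rather than code units.

```python
import re


def search_with_offset_limit(pattern, subject, offset_limit):
    """Return the leftmost match only if it starts at or before offset_limit."""
    m = re.search(pattern, subject)
    if m is not None and m.start() > offset_limit:
        # The leftmost match starts too late, so no acceptable match exists.
        return None
    return m
```

Because `re.search` returns the leftmost match, rejecting it when it starts past the limit is enough: no earlier match can exist.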
| 369 | |
| 370 | Version 10.20 30-June-2015 |
| 371 | -------------------------- |
| 372 | |
| 373 | 1. Callouts with string arguments and the pcre2_callout_enumerate() function |
| 374 | have been implemented. |
| 375 | |
| 376 | 2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added. |
| 377 | |
| 378 | 3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a |
| 379 | subject in multiline mode. |
| 380 | |
| 381 | 4. The way named subpatterns are handled has been refactored. The previous |
| 382 | approach had several bugs. |
| 383 | |
| 384 | 5. The handling of \c in EBCDIC environments has been changed to conform to the |
| 385 | perlebcdic document. This is an incompatible change. |
| 386 | |
| 387 | 6. Bugs have been mended, many of them discovered by fuzzers. |
| 388 | |
| 389 | |
| 390 | Version 10.10 06-March-2015 |
| 391 | --------------------------- |
| 392 | |
| 393 | 1. Serialization and de-serialization functions have been added to the API, |
| 394 | making it possible to save and restore sets of compiled patterns, though |
| 395 | restoration must be done in the same environment that was used for compilation. |
| 396 | |
| 397 | 2. The (*NO_JIT) feature has been added; this makes it possible for a pattern |
| 398 | creator to specify that JIT is not to be used. |
| 399 | |
| 400 | 3. A number of bugs have been fixed. In particular, bugs that caused building |
| 401 | on Windows using CMake to fail have been mended. |
| 402 | |
| 403 | |
| 404 | Version 10.00 05-January-2015 |
| 405 | ----------------------------- |
| 406 | |
| 407 | Version 10.00 is the first release of PCRE2, a revised API for the PCRE |
| 408 | library. Changes prior to 10.00 are logged in the ChangeLog file for the old |
| 409 | API, up to item 20 for release 8.36. New programs are recommended to use the |
| 410 | new library. Programs that use the original (PCRE1) API will need changing |
| 411 | before linking with the new library. |
| 412 | |
| 413 | **** |