Upgrade pcre to pcre2-10.40

Test: make
Change-Id: I5d7243566be5bc6b8e2a5eaf08dec2d08a565f84
diff --git a/doc/html/README.txt b/doc/html/README.txt
index 67e46b4..7896944 100644
--- a/doc/html/README.txt
+++ b/doc/html/README.txt
@@ -114,12 +114,18 @@
 The following instructions assume the use of the widely used "configure; make;
 make install" (autotools) process.
 
-To build PCRE2 on system that supports autotools, first run the "configure"
-command from the PCRE2 distribution directory, with your current directory set
+If you have downloaded and unpacked a PCRE2 release tarball, run the
+"configure" command from the PCRE2 directory, with your current directory set
 to the directory where you want the files to be created. This command is a
 standard GNU "autoconf" configuration script, for which generic instructions
 are supplied in the file INSTALL.
 
+The files in the GitHub repository do not contain "configure". If you have
+downloaded the PCRE2 source files from GitHub, before you can run "configure"
+you must run the shell script called autogen.sh. This runs a number of
+autotools to create a "configure" script (you must of course have the autotools
+commands installed in order to do this).
+
 Most commonly, people build PCRE2 within its own distribution directory, and in
 this case, on many systems, just running "./configure" is sufficient. However,
 the usual methods of changing standard defaults are available. For example:
@@ -188,10 +194,10 @@
 
   As well as supporting UTF strings, Unicode support includes support for the
   \P, \p, and \X sequences that recognize Unicode character properties.
-  However, only the basic two-letter properties such as Lu are supported.
-  Escape sequences such as \d and \w in patterns do not by default make use of
-  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
-  or starting a pattern with (*UCP).
+  However, only a subset of Unicode properties are supported; see the
+  pcre2pattern man page for details. Escape sequences such as \d and \w in
+  patterns do not by default make use of Unicode properties, but can be made to
+  do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).
 
 . You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
   of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
@@ -411,7 +417,7 @@
 . Makefile             the makefile that builds the library
 . src/config.h         build-time configuration options for the library
 . src/pcre2.h          the public PCRE2 header file
-. pcre2-config          script that shows the building settings such as CFLAGS
+. pcre2-config         script that shows the building settings such as CFLAGS
                          that were set for "configure"
 . libpcre2-8.pc        )
 . libpcre2-16.pc       ) data for the pkg-config command
@@ -571,9 +577,9 @@
 Making new tarballs
 -------------------
 
-The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
-The command "make distcheck" does the same, but then does a trial build of the
-new distribution to ensure that it works.
+The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
+zip formats. The command "make distcheck" does the same, but then does a trial
+build of the new distribution to ensure that it works.
 
 If you have modified any of the man page sources in the doc directory, you
 should first run the PrepareRelease script before making a distribution. This
@@ -602,13 +608,13 @@
 
 Many (but not all) of the tests that are not skipped are run twice if JIT
 support is available. On the second run, JIT compilation is forced. This
-testing can be suppressed by putting "nojit" on the RunTest command line.
+testing can be suppressed by putting "-nojit" on the RunTest command line.
 
 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
 libraries that are enabled. If you want to run just one set of tests, call
 RunTest with either the -8, -16 or -32 option.
 
-If valgrind is installed, you can run the tests under it by putting "valgrind"
+If valgrind is installed, you can run the tests under it by putting "-valgrind"
 on the RunTest command line. To run pcre2test on just one or more specific test
 files, give their numbers as arguments to RunTest, for example:
 
@@ -905,4 +911,4 @@
 Philip Hazel
 Email local part: Philip.Hazel
 Email domain: gmail.com
-Last updated: 29 October 2021
+Last updated: 15 April 2022
diff --git a/doc/html/pcre2_jit_stack_create.html b/doc/html/pcre2_jit_stack_create.html
index 6200d17..548947c 100644
--- a/doc/html/pcre2_jit_stack_create.html
+++ b/doc/html/pcre2_jit_stack_create.html
@@ -34,7 +34,8 @@
 <b>pcre2_jit_stack_assign()</b> to associate the stack with a compiled pattern,
 which can then be processed by <b>pcre2_match()</b> or <b>pcre2_jit_match()</b>.
 A maximum stack size of 512KiB to 1MiB should be more than enough for any
-pattern. For more details, see the
+pattern. If the stack couldn't be allocated or the values passed were not
+reasonable, NULL will be returned. For more details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 page.
 </P>
diff --git a/doc/html/pcre2_set_compile_extra_options.html b/doc/html/pcre2_set_compile_extra_options.html
index b1c0a11..2f2bf61 100644
--- a/doc/html/pcre2_set_compile_extra_options.html
+++ b/doc/html/pcre2_set_compile_extra_options.html
@@ -30,8 +30,8 @@
 housed in a compile context. It completely replaces all the bits. The extra
 options are:
 <pre>
-  PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK     Allow \K in lookarounds PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{df800} to \x{dfff}
-                                         in UTF-8 and UTF-32 modes
+  PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK     Allow \K in lookarounds
+  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
   PCRE2_EXTRA_ALT_BSUX                 Extended alternate \u, \U, and \x handling
   PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as a literal following character
   PCRE2_EXTRA_ESCAPED_CR_IS_LF         Interpret \r as \n
diff --git a/doc/html/pcre2_substitute.html b/doc/html/pcre2_substitute.html
index 10b2267..abf0a70 100644
--- a/doc/html/pcre2_substitute.html
+++ b/doc/html/pcre2_substitute.html
@@ -68,29 +68,29 @@
 The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
 zero-terminated strings. The options are:
 <pre>
-  PCRE2_ANCHORED             Match only at the first position
-  PCRE2_ENDANCHORED          Pattern can match only at end of subject
-  PCRE2_NOTBOL               Subject is not the beginning of a line
-  PCRE2_NOTEOL               Subject is not the end of a line
-  PCRE2_NOTEMPTY             An empty string is not a valid match
-  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the subject is not a valid match
-  PCRE2_NO_JIT               Do not use JIT matching
-  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement for UTF validity (only relevant if
-                              PCRE2_UTF was set at compile time)
-  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
-  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
-  PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
-  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
-  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_ANCHORED                     Match only at the first position
+  PCRE2_ENDANCHORED                  Match only at end of subject
+  PCRE2_NOTBOL                       Subject is not the beginning of a line
+  PCRE2_NOTEOL                       Subject is not the end of a line
+  PCRE2_NOTEMPTY                     An empty string is not a valid match
+  PCRE2_NOTEMPTY_ATSTART             An empty string at the start of the subject is not a valid match
+  PCRE2_NO_JIT                       Do not use JIT matching
+  PCRE2_NO_UTF_CHECK                 Do not check for UTF validity in the subject or replacement
+                                      (only relevant if PCRE2_UTF was set at compile time)
+  PCRE2_SUBSTITUTE_EXTENDED          Do extended replacement processing
+  PCRE2_SUBSTITUTE_GLOBAL            Replace all occurrences in the subject
+  PCRE2_SUBSTITUTE_LITERAL           The replacement string is literal
+  PCRE2_SUBSTITUTE_MATCHED           Use pre-existing match data for first match
+  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH   If overflow, compute needed length
   PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s)
-  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
-  PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
+  PCRE2_SUBSTITUTE_UNKNOWN_UNSET     Treat unknown group as unset
+  PCRE2_SUBSTITUTE_UNSET_EMPTY       Simple unset insert = empty string
 </pre>
 If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
 PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
 </P>
 <P>
-If PCRE2_SUBSTITUTE_MATCHED is set, <i>match_data</i> must be non-zero; its
+If PCRE2_SUBSTITUTE_MATCHED is set, <i>match_data</i> must be non-NULL; its
 contents must be the result of a call to <b>pcre2_match()</b> using the same
 pattern and subject.
 </P>
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html
index e2237e7..047e242 100644
--- a/doc/html/pcre2api.html
+++ b/doc/html/pcre2api.html
@@ -1845,7 +1845,7 @@
 </P>
 <P>
 Note that this option can also be passed to <b>pcre2_match()</b> and
-<b>pcre_dfa_match()</b>, to suppress UTF validity checking of the subject
+<b>pcre2_dfa_match()</b>, to suppress UTF validity checking of the subject
 string.
 </P>
 <P>
@@ -2055,8 +2055,8 @@
 \d.
 </P>
 <P>
-When PCRE2 is built with Unicode support (the default), the Unicode properties
-of all characters can be tested with \p and \P, or, alternatively, the
+When PCRE2 is built with Unicode support (the default), certain Unicode
+character properties can be tested with \p and \P, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \w and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@@ -2316,7 +2316,7 @@
   PCRE2_INFO_LASTCODETYPE
 </pre>
 Returns 1 if there is a rightmost literal code unit that must exist in any
-matched string, other than at its start. The third argument should  point to a
+matched string, other than at its start. The third argument should point to a
 <b>uint32_t</b> variable. If there is no such value, 0 is returned. When 1 is
 returned, the code unit value itself can be retrieved using
 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
@@ -2640,7 +2640,9 @@
 <i>startoffset</i>. The length and offset are in code units, not characters.
 That is, they are in bytes for the 8-bit library, 16-bit code units for the
 16-bit library, and 32-bit code units for the 32-bit library, whether or not
-UTF processing is enabled.
+UTF processing is enabled. As a special case, if <i>subject</i> is NULL and
+<i>length</i> is zero, the subject is assumed to be an empty string. If
+<i>length</i> is non-zero, an error occurs if <i>subject</i> is NULL.
 </P>
 <P>
 If <i>startoffset</i> is greater than the length of the subject,
@@ -3394,12 +3396,17 @@
 <P>
 This function optionally calls <b>pcre2_match()</b> and then makes a copy of the
 subject string in <i>outputbuffer</i>, replacing parts that were matched with
-the <i>replacement</i> string, whose length is supplied in <b>rlength</b>. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
-option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
-replacement string(s). The default action is to perform just one replacement if
-the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below).
+the <i>replacement</i> string, whose length is supplied in <b>rlength</b>, which
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As a
+special case, if <i>replacement</i> is NULL and <i>rlength</i> is zero, the
+replacement is assumed to be an empty string. If <i>rlength</i> is non-zero, an
+error occurs if <i>replacement</i> is NULL.
+</P>
+<P>
+There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
+the replacement string(s). The default action is to perform just one
+replacement if the pattern matches, but there is an option that requests
+multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below).
 </P>
 <P>
 If successful, <b>pcre2_substitute()</b> returns the number of substitutions
@@ -3433,12 +3440,12 @@
 As well as the usual options for <b>pcre2_match()</b>, a number of additional
 options can be set in the <i>options</i> argument of <b>pcre2_substitute()</b>.
 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
-<i>match_data</i> block must be provided, and it must have been used for an
-external call to <b>pcre2_match()</b>. The data in the <i>match_data</i> block
-(return code, offset vector) is used for the first substitution instead of
-calling <b>pcre2_match()</b> from within <b>pcre2_substitute()</b>. This allows
-an application to check for a match before choosing to substitute, without
-having to repeat the match.
+<i>match_data</i> block must be provided, and it must have already been used for
+an external call to <b>pcre2_match()</b> with the same pattern and subject
+arguments. The data in the <i>match_data</i> block (return code, offset vector)
+is then used for the first substitution instead of calling <b>pcre2_match()</b>
+from within <b>pcre2_substitute()</b>. This allows an application to check for a
+match before choosing to substitute, without having to repeat the match.
 </P>
 <P>
 The contents of the externally supplied match data block are not changed when
@@ -3583,7 +3590,7 @@
 terminating a \Q quoted sequence) reverts to no case forcing. The sequences
 \u and \l force the next character (if it is a letter) to upper or lower
 case, respectively, and then the state automatically reverts to no case
-forcing. Case forcing applies to all inserted  characters, including those from
+forcing. Case forcing applies to all inserted characters, including those from
 capture groups and letters within \Q...\E quoted sequences. If either
 PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode
 properties are used for case forcing characters whose code points are greater
@@ -3655,7 +3662,9 @@
 </P>
 <P>
 PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
-<i>match_data</i> argument is NULL.
+<i>match_data</i> argument is NULL or if the <i>subject</i> or <i>replacement</i>
+arguments are NULL. For backward compatibility reasons an exception is made for
+the <i>replacement</i> argument if the <i>rlength</i> argument is also 0.
 </P>
 <P>
 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
@@ -3810,12 +3819,13 @@
 <P>
 The function <b>pcre2_dfa_match()</b> is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once (not counting lookaround assertions), and does not backtrack.
-This has different characteristics to the normal algorithm, and is not
-compatible with Perl. Some of the features of PCRE2 patterns are not supported.
-Nevertheless, there are times when this kind of matching can be useful. For a
-discussion of the two matching algorithms, and a list of features that
-<b>pcre2_dfa_match()</b> does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack
+(except when processing lookaround assertions). This has different
+characteristics to the normal algorithm, and is not compatible with Perl. Some
+of the features of PCRE2 patterns are not supported. Nevertheless, there are
+times when this kind of matching can be useful. For a discussion of the two
+matching algorithms, and a list of features that <b>pcre2_dfa_match()</b> does
+not support, see the
 <a href="pcre2matching.html"><b>pcre2matching</b></a>
 documentation.
 </P>
@@ -3850,7 +3860,7 @@
 </PRE>
 </P>
 <br><b>
-Option bits for <b>pcre_dfa_match()</b>
+Option bits for <b>pcre2_dfa_match()</b>
 </b><br>
 <P>
 The unused bits of the <i>options</i> argument for <b>pcre2_dfa_match()</b> must
@@ -4008,7 +4018,7 @@
 </P>
 <br><a name="SEC42" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 14 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
diff --git a/doc/html/pcre2build.html b/doc/html/pcre2build.html
index a1c2e95..0d12155 100644
--- a/doc/html/pcre2build.html
+++ b/doc/html/pcre2build.html
@@ -142,8 +142,9 @@
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \P, \p,
-and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
-supported. Details are given in the
+and \X. Only the general category properties such as <i>Lu</i> and <i>Nd</i>,
+script names, and some bi-directional properties are supported. Details are
+given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation.
 </P>
@@ -307,7 +308,7 @@
 for --with-match-limit. You can set a lower default limit by adding, for
 example,
 <pre>
-  --with-match-limit_depth=10000
+  --with-match-limit-depth=10000
 </pre>
 to the <b>configure</b> command. This value can be overridden at run time. This
 depth limit indirectly limits the amount of heap memory that is used, but
@@ -615,9 +616,9 @@
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 March 2020
+Last updated: 08 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2compat.html b/doc/html/pcre2compat.html
index eb82694..5f390c1 100644
--- a/doc/html/pcre2compat.html
+++ b/doc/html/pcre2compat.html
@@ -18,33 +18,41 @@
 <P>
 This document describes some of the differences in the ways that PCRE2 and Perl
 handle regular expressions. The differences described here are with respect to
-Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the
+Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the
 information may at times be out of date.
 </P>
 <P>
-1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
+1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
+behaviour of the '.' metacharacter differs from Perl. In PCRE2, '.' matches the
+next character unless it is the start of a newline sequence. This means that,
+if the newline setting is CR, CRLF, or NUL, '.' will match the code point LF
+(0x0A) in ASCII/Unicode environments, and NL (either 0x15 or 0x25) when using
+EBCDIC. In Perl, '.' appears never to match LF, even when 0x0A is not a newline
+indicator.
+</P>
+<P>
+2. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
 have are given in the
 <a href="pcre2unicode.html"><b>pcre2unicode</b></a>
 page.
 </P>
 <P>
-2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
+3. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
 they do not mean what you might think. For example, (?!a){3} does not assert
 that the next three characters are not "a". It just asserts that the next
 character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
-for example, \b* (but not \b{3}, though oddly it does allow ^{3}), but these
-do not seem to have any use. PCRE2 does not allow any kind of quantifier on
-non-lookaround assertions.
+for example, \b*, but these do not seem to have any use. PCRE2 does not allow
+any kind of quantifier on non-lookaround assertions.
 </P>
 <P>
-3. Capture groups that occur inside negative lookaround assertions are counted,
+4. Capture groups that occur inside negative lookaround assertions are counted,
 but their entries in the offsets vector are set only when a negative assertion
 is a condition that has a matching branch (that is, the condition is false).
 Perl may set such capture groups in other circumstances.
 </P>
 <P>
-4. The following Perl escape sequences are not supported: \F, \l, \L, \u,
+5. The following Perl escape sequences are not supported: \F, \l, \L, \u,
 \U, and \N when followed by a character name. \N on its own, matching a
 non-newline character, and \N{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
@@ -55,26 +63,26 @@
 interprets them.
 </P>
 <P>
-5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
+6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \p and \P are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
-is limited. See the
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
+(surrogate) property, but in PCRE2 its use is limited. See the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 documentation for details. The long synonyms for property names that Perl
 supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted
 to prefix any of these properties with "Is".
 </P>
 <P>
-6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
+7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
 in between are treated as literals. However, this is slightly different from
 Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
-they cause variable interpolation (but of course PCRE2 does not have
-variables). Also, Perl does "double-quotish backslash interpolation" on any
-backslashes between \Q and \E which, its documentation says, "may lead to
-confusing results". PCRE2 treats a backslash between \Q and \E just like any
-other character. Note the following examples:
+they cause variable interpolation (PCRE2 does not have variables). Also, Perl
+does "double-quotish backslash interpolation" on any backslashes between \Q
+and \E which, its documentation says, "may lead to confusing results". PCRE2
+treats a backslash between \Q and \E just like any other character. Note the
+following examples:
 <pre>
     Pattern            PCRE2 matches     Perl matches
 
@@ -88,19 +96,19 @@
 by both PCRE2 and Perl.
 </P>
 <P>
-7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
+8. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
 constructions. However, PCRE2 does have a "callout" feature, which allows an
 external function to be called during pattern matching. See the
 <a href="pcre2callout.html"><b>pcre2callout</b></a>
 documentation for details.
 </P>
 <P>
-8. Subroutine calls (whether recursive or not) were treated as atomic groups up
+9. Subroutine calls (whether recursive or not) were treated as atomic groups up
 to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
 into subroutine calls is now supported, as in Perl.
 </P>
 <P>
-9. In PCRE2, if any of the backtracking control verbs are used in a group that
+10. In PCRE2, if any of the backtracking control verbs are used in a group that
 is called as a subroutine (whether or not recursively), their effect is
 confined to that group; it does not extend to the surrounding pattern. This is
 not always the case in Perl. In particular, if (*THEN) is present in a group
@@ -109,20 +117,20 @@
 processed as anchored at the point where they are tested.
 </P>
 <P>
-10. If a pattern contains more than one backtracking control verb, the first
+11. If a pattern contains more than one backtracking control verb, the first
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
 same as PCRE2, but there are cases where it differs.
 </P>
 <P>
-11. There are some differences that are concerned with the settings of captured
+12. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 "b".
 </P>
 <P>
-12. PCRE2's handling of duplicate capture group numbers and names is not as
+13. PCRE2's handling of duplicate capture group numbers and names is not as
 general as Perl's. This is a consequence of the fact the PCRE2 works internally
 just with numbers, using an external table to translate between numbers and
 names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)), where the two
@@ -132,42 +140,43 @@
 number 1. To avoid this confusing situation, an error is given at compile time.
 </P>
 <P>
-13. Perl used to recognize comments in some places that PCRE2 does not, for
+14. Perl used to recognize comments in some places that PCRE2 does not, for
 example, between the ( and ? at the start of a group. If the /x modifier is
 set, Perl allowed white space between ( and ? though the latest Perls give an
 error (for a while it was just deprecated). There may still be some cases where
 Perl behaves differently.
 </P>
 <P>
-14. Perl, when in warning mode, gives warnings for character classes such as
+15. Perl, when in warning mode, gives warnings for character classes such as
 [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
 warning features, so it gives an error in these cases because they are almost
 certainly user mistakes.
 </P>
 <P>
-15. In PCRE2, the upper/lower case character properties Lu and Ll are not
+16. In PCRE2, the upper/lower case character properties Lu and Ll are not
 affected when case-independent matching is specified. For example, \p{Lu}
 always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.32), \p{Lu} and \p{Ll} match all
+in the release at the time of writing (5.34), \p{Lu} and \p{Ll} match all
 letters, regardless of case, when case independence is specified.
 </P>
 <P>
-16. From release 5.32.0, Perl locks out the use of \K in lookaround
+17. From release 5.32.0, Perl locks out the use of \K in lookaround
 assertions. From release 10.38 PCRE2 does the same by default. However, there
 is an option for re-enabling the previous behaviour. When this option is set,
 \K is acted on when it occurs in positive assertions, but is ignored in
 negative assertions.
 </P>
 <P>
-17. PCRE2 provides some extensions to the Perl regular expression facilities.
+18. PCRE2 provides some extensions to the Perl regular expression facilities.
 Perl 5.10 included new features that were not in earlier versions of Perl, some
 of which (such as named parentheses) were in PCRE2 for some time before. This
-list is with respect to Perl 5.32:
+list is with respect to Perl 5.34:
 <br>
 <br>
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
 each alternative toplevel branch of a lookbehind assertion can match a
-different length of string. Perl requires them all to have the same length.
+different length of string. Perl used to require them all to have the same
+length, but the latest version has some variable length support.
 <br>
 <br>
 (b) From PCRE2 10.23, backreferences to groups of fixed length are supported
@@ -221,12 +230,12 @@
 lookarounds are atomic.
 </P>
 <P>
-18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
+19. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
 modifier restricts /i case-insensitive matching to pure ascii, ignoring Unicode
 rules. This separation cannot be represented with PCRE2_UCP.
 </P>
 <P>
-19. Perl has different limits than PCRE2. See the
+20. Perl has different limits than PCRE2. See the
 <a href="pcre2limit.html"><b>pcre2limit</b></a>
 documentation for details. Perl went with 5.10 from recursion to iteration
 keeping the intermediate matches on the heap, which is ~10% slower but does not
@@ -248,7 +257,7 @@
 REVISION
 </b><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 <br>
 Copyright &copy; 1997-2021 University of Cambridge.
 <br>
diff --git a/doc/html/pcre2jit.html b/doc/html/pcre2jit.html
index e73a229..d89fa23 100644
--- a/doc/html/pcre2jit.html
+++ b/doc/html/pcre2jit.html
@@ -269,11 +269,11 @@
 for currently suspended match(es).
 </P>
 <P>
-In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
-each thread so that the application is thread-safe.
+In a multithread application, if you do not specify a JIT stack, or if you
+assign or pass back NULL from a callback, that is thread-safe, because each
+thread has its own machine stack. However, if you assign or pass back a
+non-NULL JIT stack, this must be a different stack for each thread so that the
+application is thread-safe.
 </P>
 <P>
 Strictly speaking, even more is allowed. You can assign the same non-NULL stack
@@ -382,8 +382,8 @@
 <b>void pcre2_jit_free_unused_memory(pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <P>
-The JIT executable allocator does not free all memory when it is possible.
-It expects new allocations, and keeps some free memory around to improve
+The JIT executable allocator does not free all memory when it is possible. It
+expects new allocations, and keeps some free memory around to improve
 allocation speed. However, in low memory conditions, it might be better to free
 all possible memory. You can cause this to happen by calling
 pcre2_jit_free_unused_memory(). Its argument is a general context, for custom
@@ -442,10 +442,10 @@
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
-the subject pointer is NULL, an immediate error is given. Also, unless
-PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the
-interests of speed, these checks do not happen on the JIT fast path, and if
-invalid data is passed, the result is undefined.
+the subject pointer is NULL but the length is non-zero, an immediate error is
+given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
+for validity. In the interests of speed, these checks do not happen on the JIT
+fast path, and if invalid data is passed, the result is undefined.
 </P>
 <P>
 Bypassing the sanity checks and the <b>pcre2_match()</b> wrapping can give
@@ -466,9 +466,9 @@
 </P>
 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 May 2019
+Last updated: 30 November 2021
 <br>
-Copyright &copy; 1997-2019 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2pattern.html b/doc/html/pcre2pattern.html
index 9c2d66c..2c24301 100644
--- a/doc/html/pcre2pattern.html
+++ b/doc/html/pcre2pattern.html
@@ -534,7 +534,7 @@
   \0113  is a tab followed by the character "3"
   \113   might be a backreference, otherwise the character with octal code 113
   \377   might be a backreference, otherwise the value 255 (decimal)
-  \81    is always a backreference .sp
+  \81    is always a backreference
 </pre>
 Note that octal values of 100 or greater that are specified using this syntax
 must not be introduced by a leading zero, because no more than three octal
@@ -776,199 +776,62 @@
 sequences are of course limited to testing characters whose code points are
 less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
 greater than 0x10ffff (the Unicode limit) may be encountered. These are all
-treated as being in the Unknown script and with an unassigned type. The extra
-escape sequences are:
+treated as being in the Unknown script and with an unassigned type.
+</P>
+<P>
+Matching characters by Unicode property is not fast, because PCRE2 has to do a
+multistage table lookup in order to find a character's property. That is why
+the traditional escape sequences such as \d and \w do not use Unicode
+properties in PCRE2 by default, though you can make them do so by setting the
+PCRE2_UCP option or by starting the pattern with (*UCP).
+</P>
+<P>
+The extra escape sequences that provide property support are:
 <pre>
   \p{<i>xx</i>}   a character with the <i>xx</i> property
   \P{<i>xx</i>}   a character without the <i>xx</i> property
   \X       a Unicode extended grapheme cluster
 </pre>
-The property names represented by <i>xx</i> above are case-sensitive. There is
-support for Unicode script names, Unicode general category properties, "Any",
-which matches any character (including newline), and some special PCRE2
-properties (described in the
-<a href="#extraprops">next section).</a>
-Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
-Note that \P{Any} does not match any characters, so always causes a match
-failure.
+The property names represented by <i>xx</i> above are not case-sensitive, and in
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
+underscores are ignored. There is support for Unicode script names, Unicode
+general category properties, "Any", which matches any character (including
+newline), Bidi_Class, a number of binary (yes/no) properties, and some special
+PCRE2 properties (described
+<a href="#extraprops">below).</a>
+Certain other Perl properties such as "InMusicalSymbols" are not supported by
+PCRE2. Note that \P{Any} does not match any characters, so always causes a
+match failure.
+</P>
+<br><b>
+Script properties for \p and \P
+</b><br>
+<P>
+There are three different syntax forms for matching a script. Each Unicode
+character has a basic script and, optionally, a list of other scripts ("Script
+Extensions") with which it is commonly used. Using the Adlam script as an
+example, \p{sc:Adlam} matches characters whose basic script is Adlam, whereas
+\p{scx:Adlam} matches, in addition, characters that have Adlam in their
+extensions list. The full names "script" and "script extensions" for the
+property types are recognized, and an equals sign is an alternative to the
+colon. If a script name is given without a property type, for example,
+\p{Adlam}, it is treated as \p{scx:Adlam}. Perl changed to this
+interpretation at release 5.26 and PCRE2 changed at release 10.40.
 </P>
 <P>
-Sets of Unicode characters are defined as belonging to certain scripts. A
-character from one of these sets can be matched using a script name. For
-example:
-<pre>
-  \p{Greek}
-  \P{Han}
-</pre>
 Unassigned characters (and in non-UTF 32-bit mode, characters with code points
 greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
 part of an identified script are lumped together as "Common". The current list
-of scripts is:
+of recognized script names and their 4-character abbreviations can be obtained
+by running this command:
+<pre>
+  pcre2test -LS
+
+</PRE>
 </P>
-<P>
-Adlam,
-Ahom,
-Anatolian_Hieroglyphs,
-Arabic,
-Armenian,
-Avestan,
-Balinese,
-Bamum,
-Bassa_Vah,
-Batak,
-Bengali,
-Bhaiksuki,
-Bopomofo,
-Brahmi,
-Braille,
-Buginese,
-Buhid,
-Canadian_Aboriginal,
-Carian,
-Caucasian_Albanian,
-Chakma,
-Cham,
-Cherokee,
-Chorasmian,
-Common,
-Coptic,
-Cuneiform,
-Cypriot,
-Cypro_Minoan,
-Cyrillic,
-Deseret,
-Devanagari,
-Dives_Akuru,
-Dogra,
-Duployan,
-Egyptian_Hieroglyphs,
-Elbasan,
-Elymaic,
-Ethiopic,
-Georgian,
-Glagolitic,
-Gothic,
-Grantha,
-Greek,
-Gujarati,
-Gunjala_Gondi,
-Gurmukhi,
-Han,
-Hangul,
-Hanifi_Rohingya,
-Hanunoo,
-Hatran,
-Hebrew,
-Hiragana,
-Imperial_Aramaic,
-Inherited,
-Inscriptional_Pahlavi,
-Inscriptional_Parthian,
-Javanese,
-Kaithi,
-Kannada,
-Katakana,
-Kayah_Li,
-Kharoshthi,
-Khitan_Small_Script,
-Khmer,
-Khojki,
-Khudawadi,
-Lao,
-Latin,
-Lepcha,
-Limbu,
-Linear_A,
-Linear_B,
-Lisu,
-Lycian,
-Lydian,
-Mahajani,
-Makasar,
-Malayalam,
-Mandaic,
-Manichaean,
-Marchen,
-Masaram_Gondi,
-Medefaidrin,
-Meetei_Mayek,
-Mende_Kikakui,
-Meroitic_Cursive,
-Meroitic_Hieroglyphs,
-Miao,
-Modi,
-Mongolian,
-Mro,
-Multani,
-Myanmar,
-Nabataean,
-Nandinagari,
-New_Tai_Lue,
-Newa,
-Nko,
-Nushu,
-Nyakeng_Puachue_Hmong,
-Ogham,
-Ol_Chiki,
-Old_Hungarian,
-Old_Italic,
-Old_North_Arabian,
-Old_Permic,
-Old_Persian,
-Old_Sogdian,
-Old_South_Arabian,
-Old_Turkic,
-Old_Uyghur,
-Oriya,
-Osage,
-Osmanya,
-Pahawh_Hmong,
-Palmyrene,
-Pau_Cin_Hau,
-Phags_Pa,
-Phoenician,
-Psalter_Pahlavi,
-Rejang,
-Runic,
-Samaritan,
-Saurashtra,
-Sharada,
-Shavian,
-Siddham,
-SignWriting,
-Sinhala,
-Sogdian,
-Sora_Sompeng,
-Soyombo,
-Sundanese,
-Syloti_Nagri,
-Syriac,
-Tagalog,
-Tagbanwa,
-Tai_Le,
-Tai_Tham,
-Tai_Viet,
-Takri,
-Tamil,
-Tangsa,
-Tangut,
-Telugu,
-Thaana,
-Thai,
-Tibetan,
-Tifinagh,
-Tirhuta,
-Toto,
-Ugaritic,
-Unknown,
-Vai,
-Vithkuqi,
-Wancho,
-Warang_Citi,
-Yezidi,
-Yi,
-Zanabazar_Square.
-</P>
+<br><b>
+The general category property for \p and \P
+</b><br>
 <P>
 Each character has exactly one Unicode general category property, specified by
 a two-letter abbreviation. For compatibility with Perl, negation can be
@@ -1030,9 +893,9 @@
   Zp    Paragraph separator
   Zs    Space separator
 </pre>
-The special property L& is also supported: it matches a character that has
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
-a modifier or "other".
+The special property LC, which has the synonym L&, is also supported: it
+matches a character that has the Lu, Ll, or Lt property, in other words, a
+letter that is not classified as a modifier or "other".
 </P>
 <P>
 The Cs (Surrogate) property applies only to characters whose code points are in
@@ -1059,12 +922,54 @@
 example, \p{Lu} always matches only upper case letters. This is different from
 the behaviour of current versions of Perl.
 </P>
+<br><b>
+Binary (yes/no) properties for \p and \P
+</b><br>
 <P>
-Matching characters by Unicode property is not fast, because PCRE2 has to do a
-multistage table lookup in order to find a character's property. That is why
-the traditional escape sequences such as \d and \w do not use Unicode
-properties in PCRE2 by default, though you can make them do so by setting the
-PCRE2_UCP option or by starting the pattern with (*UCP).
+Unicode defines a number of binary properties, that is, properties whose only
+values are true or false. You can obtain a list of those that are recognized by
+\p and \P, along with their abbreviations, by running this command:
+<pre>
+  pcre2test -LP
+
+</PRE>
+</P>
+<br><b>
+The Bidi_Class property for \p and \P
+</b><br>
+<P>
+<pre>
+  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
+  \p{BC:&#60;class&#62;}           matches a character with the given class
+</pre>
+The recognized classes are:
+<pre>
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+</pre>
+An equals sign may be used instead of a colon. The class names are
+case-insensitive; only the short names listed above are recognized.
 </P>
 <br><b>
 Extended grapheme clusters
@@ -1341,15 +1246,17 @@
 <P>
 Outside a character class, a dot in the pattern matches any one character in
 the subject string except (by default) a character that signifies the end of a
-line.
+line. One or more characters may be specified as line terminators (see
+<a href="#newlines">"Newline conventions"</a>
+above).
 </P>
 <P>
-When a line ending is defined as a single character, dot never matches that
-character; when the two-character sequence CRLF is used, dot does not match CR
-if it is immediately followed by LF, but otherwise it matches all characters
-(including isolated CRs and LFs). When any Unicode line endings are being
-recognized, dot does not match CR or LF or any of the other line ending
-characters.
+Dot never matches a single line-ending character. When the two-character
+sequence CRLF is the only line ending, dot does not match CR if it is
+immediately followed by LF, but otherwise it matches all characters (including
+isolated CRs and LFs). When ANYCRLF is selected for line endings, no occurrences
+of CR or LF match dot. When all Unicode line endings are being recognized, dot
+does not match CR or LF or any of the other line ending characters.
 </P>
 <P>
 The behaviour of dot with regard to newlines can be changed. If the
@@ -2180,10 +2087,10 @@
 <pre>
   (*atomic:\d+)foo
 </pre>
-This kind of parenthesized group "locks up" the  part of the pattern it
-contains once it has matched, and a failure further into the pattern is
-prevented from backtracking into it. Backtracking past it to previous items,
-however, works as normal.
+This kind of parenthesized group "locks up" the part of the pattern it contains
+once it has matched, and a failure further into the pattern is prevented from
+backtracking into it. Backtracking past it to previous items, however, works as
+normal.
 </P>
 <P>
 An alternative description is that a group of this type matches exactly the
@@ -3859,9 +3766,9 @@
 </P>
 <br><a name="SEC32" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 12 January 2022
 <br>
-Copyright &copy; 1997-2021 University of Cambridge.
+Copyright &copy; 1997-2022 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
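Reviewer note on the pcre2pattern.html dot hunk above: the reworded rules for what dot matches under each newline convention can be modelled in a few lines. This is a minimal sketch, not PCRE2 code. The function name `dot_matches`, the string labels for the conventions, and the set of characters used for the "all Unicode line endings" case (CR, LF, VT, FF, NEL, LS, PS) are assumptions made for illustration, and the sketch covers only the default behaviour (no dotall option).

```python
# Illustrative model of the documented dot rules; this is not PCRE2 source.
CR, LF = "\r", "\n"

# Assumed set for the "all Unicode line endings" case: LF, VT, FF, CR, NEL, LS, PS.
UNICODE_LINE_ENDINGS = {"\n", "\x0b", "\x0c", "\r", "\x85", "\u2028", "\u2029"}


def dot_matches(subject: str, i: int, newline: str) -> bool:
    """Return True if a default-mode dot would match subject[i]."""
    ch = subject[i]
    if newline == "CR":
        return ch != CR
    if newline == "LF":
        return ch != LF
    if newline == "NUL":
        return ch != "\0"
    if newline == "CRLF":
        # Only a CR immediately followed by LF is refused;
        # isolated CRs and LFs still match.
        return not (ch == CR and subject[i + 1 : i + 2] == LF)
    if newline == "ANYCRLF":
        return ch not in (CR, LF)
    if newline == "ANY":
        return ch not in UNICODE_LINE_ENDINGS
    raise ValueError(f"unknown newline convention: {newline}")
```

For example, `dot_matches("a\rb", 1, "CRLF")` is true because the CR is isolated, while the same call with `"ANYCRLF"` is false.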
diff --git a/doc/html/pcre2serialize.html b/doc/html/pcre2serialize.html
index 18a8d7f..df4098e 100644
--- a/doc/html/pcre2serialize.html
+++ b/doc/html/pcre2serialize.html
@@ -23,12 +23,12 @@
 <br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
 <P>
 <b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
-<b>  int32_t <i>number_of_codes</i>, const uint32_t *<i>bytes</i>,</b>
+<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
 <b>  pcre2_general_context *<i>gcontext</i>);</b>
 <br>
 <br>
-<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
-<b>  int32_t <i>number_of_codes</i>, uint32_t **<i>serialized_bytes</i>,</b>
+<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
+<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
 <b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
 <br>
 <br>
@@ -154,7 +154,6 @@
 <b>malloc()</b> and <b>free()</b> are used. After deserialization, the byte
 stream is no longer needed and can be discarded.
 <pre>
-  int32_t number_of_codes;
   pcre2_code *list_of_codes[2];
   uint8_t *bytes = &#60;serialized data&#62;;
   int32_t number_of_codes =
diff --git a/doc/html/pcre2syntax.html b/doc/html/pcre2syntax.html
index 735eb69..8364c52 100644
--- a/doc/html/pcre2syntax.html
+++ b/doc/html/pcre2syntax.html
@@ -19,29 +19,31 @@
 <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
 <li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
-<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
-<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
-<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
-<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
-<li><a name="TOC11" href="#SEC11">REPORTED MATCH POINT SETTING</a>
-<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
-<li><a name="TOC13" href="#SEC13">CAPTURING</a>
-<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
-<li><a name="TOC15" href="#SEC15">COMMENT</a>
-<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
-<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
-<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
-<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
-<li><a name="TOC20" href="#SEC20">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
-<li><a name="TOC21" href="#SEC21">SCRIPT RUNS</a>
-<li><a name="TOC22" href="#SEC22">BACKREFERENCES</a>
-<li><a name="TOC23" href="#SEC23">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
-<li><a name="TOC24" href="#SEC24">CONDITIONAL PATTERNS</a>
-<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
-<li><a name="TOC26" href="#SEC26">CALLOUTS</a>
-<li><a name="TOC27" href="#SEC27">SEE ALSO</a>
-<li><a name="TOC28" href="#SEC28">AUTHOR</a>
-<li><a name="TOC29" href="#SEC29">REVISION</a>
+<li><a name="TOC7" href="#SEC7">BINARY PROPERTIES FOR \p AND \P</a>
+<li><a name="TOC8" href="#SEC8">SCRIPT MATCHING WITH \p AND \P</a>
+<li><a name="TOC9" href="#SEC9">THE BIDI_CLASS PROPERTY FOR \p AND \P</a>
+<li><a name="TOC10" href="#SEC10">CHARACTER CLASSES</a>
+<li><a name="TOC11" href="#SEC11">QUANTIFIERS</a>
+<li><a name="TOC12" href="#SEC12">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC13" href="#SEC13">REPORTED MATCH POINT SETTING</a>
+<li><a name="TOC14" href="#SEC14">ALTERNATION</a>
+<li><a name="TOC15" href="#SEC15">CAPTURING</a>
+<li><a name="TOC16" href="#SEC16">ATOMIC GROUPS</a>
+<li><a name="TOC17" href="#SEC17">COMMENT</a>
+<li><a name="TOC18" href="#SEC18">OPTION SETTING</a>
+<li><a name="TOC19" href="#SEC19">NEWLINE CONVENTION</a>
+<li><a name="TOC20" href="#SEC20">WHAT \R MATCHES</a>
+<li><a name="TOC21" href="#SEC21">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC22" href="#SEC22">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
+<li><a name="TOC23" href="#SEC23">SCRIPT RUNS</a>
+<li><a name="TOC24" href="#SEC24">BACKREFERENCES</a>
+<li><a name="TOC25" href="#SEC25">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC26" href="#SEC26">CONDITIONAL PATTERNS</a>
+<li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a>
+<li><a name="TOC28" href="#SEC28">CALLOUTS</a>
+<li><a name="TOC29" href="#SEC29">SEE ALSO</a>
+<li><a name="TOC30" href="#SEC30">AUTHOR</a>
+<li><a name="TOC31" href="#SEC31">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
 <P>
@@ -136,6 +138,11 @@
 sequences is changed to use Unicode properties and they match many more
 characters.
 </P>
+<P>
+Property descriptions in \p and \P are matched caselessly; hyphens,
+underscores, and white space are ignored, in accordance with Unicode's "loose
+matching" rules.
+</P>
 <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
 <P>
 <pre>
@@ -152,6 +159,7 @@
   Lo         Other letter
   Lt         Title case letter
   Lu         Upper case letter
+  Lc         Ll, Lu, or Lt
   L&         Ll, Lu, or Lt
 
   M          Mark
@@ -198,171 +206,58 @@
 Perl and POSIX space are now the same. Perl added VT to its space character set
 at release 5.18.
 </P>
-<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<br><a name="SEC7" href="#TOC1">BINARY PROPERTIES FOR \p AND \P</a><br>
 <P>
-Adlam,
-Ahom,
-Anatolian_Hieroglyphs,
-Arabic,
-Armenian,
-Avestan,
-Balinese,
-Bamum,
-Bassa_Vah,
-Batak,
-Bengali,
-Bhaiksuki,
-Bopomofo,
-Brahmi,
-Braille,
-Buginese,
-Buhid,
-Canadian_Aboriginal,
-Carian,
-Caucasian_Albanian,
-Chakma,
-Cham,
-Cherokee,
-Chorasmian,
-Common,
-Coptic,
-Cuneiform,
-Cypriot,
-Cypro_Minoan,
-Cyrillic,
-Deseret,
-Devanagari,
-Dives_Akuru,
-Dogra,
-Duployan,
-Egyptian_Hieroglyphs,
-Elbasan,
-Elymaic,
-Ethiopic,
-Georgian,
-Glagolitic,
-Gothic,
-Grantha,
-Greek,
-Gujarati,
-Gunjala_Gondi,
-Gurmukhi,
-Han,
-Hangul,
-Hanifi_Rohingya,
-Hanunoo,
-Hatran,
-Hebrew,
-Hiragana,
-Imperial_Aramaic,
-Inherited,
-Inscriptional_Pahlavi,
-Inscriptional_Parthian,
-Javanese,
-Kaithi,
-Kannada,
-Katakana,
-Kayah_Li,
-Kharoshthi,
-Khitan_Small_Script,
-Khmer,
-Khojki,
-Khudawadi,
-Lao,
-Latin,
-Lepcha,
-Limbu,
-Linear_A,
-Linear_B,
-Lisu,
-Lycian,
-Lydian,
-Mahajani,
-Makasar,
-Malayalam,
-Mandaic,
-Manichaean,
-Marchen,
-Masaram_Gondi,
-Medefaidrin,
-Meetei_Mayek,
-Mende_Kikakui,
-Meroitic_Cursive,
-Meroitic_Hieroglyphs,
-Miao,
-Modi,
-Mongolian,
-Mro,
-Multani,
-Myanmar,
-Nabataean,
-Nandinagari,
-New_Tai_Lue,
-Newa,
-Nko,
-Nushu,
-Nyakeng_Puachue_Hmong,
-Ogham,
-Ol_Chiki,
-Old_Hungarian,
-Old_Italic,
-Old_North_Arabian,
-Old_Permic,
-Old_Persian,
-Old_Sogdian,
-Old_South_Arabian,
-Old_Turkic,
-Old_Uyghur,
-Oriya,
-Osage,
-Osmanya,
-Pahawh_Hmong,
-Palmyrene,
-Pau_Cin_Hau,
-Phags_Pa,
-Phoenician,
-Psalter_Pahlavi,
-Rejang,
-Runic,
-Samaritan,
-Saurashtra,
-Sharada,
-Shavian,
-Siddham,
-SignWriting,
-Sinhala,
-Sogdian,
-Sora_Sompeng,
-Soyombo,
-Sundanese,
-Syloti_Nagri,
-Syriac,
-Tagalog,
-Tagbanwa,
-Tai_Le,
-Tai_Tham,
-Tai_Viet,
-Takri,
-Tamil,
-Tangsa,
-Tangut,
-Telugu,
-Thaana,
-Thai,
-Tibetan,
-Tifinagh,
-Tirhuta,
-Toto,
-Ugaritic,
-Vai,
-Vithkuqi,
-Wancho,
-Warang_Citi,
-Yezidi,
-Yi,
-Zanabazar_Square.
+Unicode defines a number of binary properties, that is, properties whose only
+values are true or false. You can obtain a list of those that are recognized by
+\p and \P, along with their abbreviations, by running this command:
+<pre>
+  pcre2test -LP
+</PRE>
 </P>
-<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
+<br><a name="SEC8" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
+<P>
+Many script names and their 4-letter abbreviations are recognized in
+\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
+course). You can obtain a list of these scripts by running this command:
+<pre>
+  pcre2test -LS
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">THE BIDI_CLASS PROPERTY FOR \p AND \P</a><br>
+<P>
+<pre>
+  \p{Bidi_Class:&#60;class&#62;}   matches a character with the given class
+  \p{BC:&#60;class&#62;}           matches a character with the given class
+</pre>
+The recognized classes are:
+<pre>
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">CHARACTER CLASSES</a><br>
 <P>
 <pre>
   [...]       positive character class
@@ -390,7 +285,7 @@
 but some of them use Unicode properties if PCRE2_UCP is set. You can use
 \Q...\E inside a character class.
 </P>
-<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
+<br><a name="SEC11" href="#TOC1">QUANTIFIERS</a><br>
 <P>
 <pre>
   ?           0 or 1, greedy
@@ -411,7 +306,7 @@
   {n,}?       n or more, lazy
 </PRE>
 </P>
-<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<br><a name="SEC12" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
 <P>
 <pre>
   \b          word boundary
@@ -429,7 +324,7 @@
   \G          first matching position in subject
 </PRE>
 </P>
-<br><a name="SEC11" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
+<br><a name="SEC13" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
 <P>
 <pre>
   \K          set reported start of match
@@ -439,13 +334,13 @@
 option is set, the previous behaviour is re-enabled. When this option is set,
 \K is honoured in positive assertions, but ignored in negative ones.
 </P>
-<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
+<br><a name="SEC14" href="#TOC1">ALTERNATION</a><br>
 <P>
 <pre>
   expr|expr|expr...
 </PRE>
 </P>
-<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
+<br><a name="SEC15" href="#TOC1">CAPTURING</a><br>
 <P>
 <pre>
   (...)           capture group
@@ -460,20 +355,20 @@
 in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
 both cases, a name must not start with a digit.
 </P>
-<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
+<br><a name="SEC16" href="#TOC1">ATOMIC GROUPS</a><br>
 <P>
 <pre>
   (?&#62;...)         atomic non-capture group
   (*atomic:...)   atomic non-capture group
 </PRE>
 </P>
-<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
+<br><a name="SEC17" href="#TOC1">COMMENT</a><br>
 <P>
 <pre>
   (?#....)        comment (not nestable)
 </PRE>
 </P>
-<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
+<br><a name="SEC18" href="#TOC1">OPTION SETTING</a><br>
 <P>
 Changes of these options within a group are automatically cancelled at the end
 of the group.
@@ -518,7 +413,7 @@
 application can lock out the use of (*UTF) and (*UCP) by setting the
 PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
 </P>
-<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<br><a name="SEC19" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 settings with a similar syntax.
@@ -531,7 +426,7 @@
   (*NUL)          the NUL character (binary zero)
 </PRE>
 </P>
-<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC20" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 These are recognized only at the very start of the pattern or after option
 setting with a similar syntax.
@@ -540,7 +435,7 @@
   (*BSR_UNICODE)  any Unicode newline sequence
 </PRE>
 </P>
-<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<br><a name="SEC21" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
 <P>
 <pre>
   (?=...)                     )
@@ -561,7 +456,7 @@
 </pre>
 Each top-level branch of a lookbehind must be of a fixed length.
 </P>
-<br><a name="SEC20" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
+<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
 <P>
 These assertions are specific to PCRE2 and are not Perl-compatible.
 <pre>
@@ -574,7 +469,7 @@
   (*non_atomic_positive_lookbehind:...)  )
 </PRE>
 </P>
-<br><a name="SEC21" href="#TOC1">SCRIPT RUNS</a><br>
+<br><a name="SEC23" href="#TOC1">SCRIPT RUNS</a><br>
 <P>
 <pre>
   (*script_run:...)           ) script run, can be backtracked into
@@ -584,7 +479,7 @@
   (*asr:...)                  )
 </PRE>
 </P>
-<br><a name="SEC22" href="#TOC1">BACKREFERENCES</a><br>
+<br><a name="SEC24" href="#TOC1">BACKREFERENCES</a><br>
 <P>
 <pre>
   \n              reference by number (can be ambiguous)
@@ -601,7 +496,7 @@
   (?P=name)       reference by name (Python)
 </PRE>
 </P>
-<br><a name="SEC23" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<br><a name="SEC25" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
 <P>
 <pre>
   (?R)            recurse whole pattern
@@ -620,7 +515,7 @@
   \g'-n'          call subroutine by relative number (PCRE2 extension)
 </PRE>
 </P>
-<br><a name="SEC24" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<br><a name="SEC26" href="#TOC1">CONDITIONAL PATTERNS</a><br>
 <P>
 <pre>
   (?(condition)yes-pattern)
@@ -643,7 +538,7 @@
 conditions or recursion tests. Such a condition is interpreted as a reference
 condition if the relevant named group exists.
 </P>
-<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
 All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
 name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
@@ -670,7 +565,7 @@
 The effect of one of these verbs in a group called as a subroutine is confined
 to the subroutine call.
 </P>
-<br><a name="SEC26" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC28" href="#TOC1">CALLOUTS</a><br>
 <P>
 <pre>
   (?C)            callout (assumed number 0)
@@ -681,12 +576,12 @@
 start and the end), and the starting delimiter { matched with the ending
 delimiter }. To encode the ending delimiter within the string, double it.
 </P>
-<br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC29" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
 <b>pcre2matching</b>(3), <b>pcre2</b>(3).
 </P>
-<br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC30" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@@ -695,11 +590,11 @@
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC29" href="#TOC1">REVISION</a><br>
+<br><a name="SEC31" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 12 January 2022
 <br>
-Copyright &copy; 1997-2021 University of Cambridge.
+Copyright &copy; 1997-2022 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
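Reviewer note on the "loose matching" paragraph added to pcre2syntax.html above: property names in \p and \P are now compared caselessly, with hyphens, underscores, and white space ignored. A rough way to picture this is as a normalization step applied to both names before comparison. The helper `loose_key` below is hypothetical and only sketches that idea; it is not how PCRE2 implements the lookup.

```python
import re


def loose_key(name: str) -> str:
    """Fold case and drop white space, hyphens, and underscores."""
    return re.sub(r"[\s_-]", "", name).lower()


def same_property(a: str, b: str) -> bool:
    """Two spellings name the same property if their loose keys agree."""
    return loose_key(a) == loose_key(b)
```

Under this model, `same_property("Bidi_Class", "bidi class")` is true, so a pattern written with either spelling would refer to the same property.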
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html
index 3ee51cd..373e5df 100644
--- a/doc/html/pcre2test.html
+++ b/doc/html/pcre2test.html
@@ -78,7 +78,7 @@
 </P>
 <P>
 In the rest of this document, the names of library functions and structures
-are given in generic form, for example, <b>pcre_compile()</b>. The actual
+are given in generic form, for example, <b>pcre2_compile()</b>. The actual
 names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 <a name="inputencoding"></a></P>
 <br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
@@ -253,7 +253,19 @@
 <b>-LM</b>
 List modifiers: write a list of available pattern and subject modifiers to the
 standard output, then exit with zero exit code. All other options are ignored.
-If both -C and -LM are present, whichever is first is recognized.
+If both -C and any -Lx options are present, whichever is first is recognized.
+</P>
+<P>
+<b>-LP</b>
+List properties: write a list of recognized Unicode properties to the standard
+output, then exit with zero exit code. All other options are ignored. If both
+-C and any -Lx options are present, whichever is first is recognized.
+</P>
+<P>
+<b>-LS</b>
+List scripts: write a list of recognized Unicode script names to the standard
+output, then exit with zero exit code. All other options are ignored. If both
+-C and any -Lx options are present, whichever is first is recognized.
 </P>
 <P>
 <b>-pattern</b> <i>modifier-list</i>
@@ -1239,6 +1251,8 @@
       match_limit=&#60;n&#62;            set a match limit
       memory                     show heap memory usage
       null_context               match with a NULL context
+      null_replacement           substitute with NULL replacement
+      null_subject               match with NULL subject
       offset=&#60;n&#62;                 set starting offset
       offset_limit=&#60;n&#62;           set offset limit
       ovector=&#60;n&#62;                set size of output vector
@@ -1668,7 +1682,7 @@
 passing the replacement string as zero-terminated.
 </P>
 <br><b>
-Passing a NULL context
+Passing a NULL context, subject, or replacement
 </b><br>
 <P>
 Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
@@ -1678,6 +1692,11 @@
 case (they use default values). This modifier cannot be used with the
 <b>find_limits</b> or <b>substitute_callout</b> modifiers.
 </P>
+<P>
+Similarly, for testing purposes, if the <b>null_subject</b> or
+<b>null_replacement</b> modifier is set, the subject or replacement string
+pointers are passed as NULL, respectively, to the relevant functions.
+</P>
 <br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
 <P>
 By default, <b>pcre2test</b> uses the standard PCRE2 matching function,
@@ -2122,9 +2141,9 @@
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 30 August 2021
+Last updated: 12 January 2022
 <br>
-Copyright &copy; 1997-2021 University of Cambridge.
+Copyright &copy; 1997-2022 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/html/pcre2unicode.html b/doc/html/pcre2unicode.html
index 76ca6ea..a0d4270 100644
--- a/doc/html/pcre2unicode.html
+++ b/doc/html/pcre2unicode.html
@@ -50,17 +50,18 @@
 <P>
 When PCRE2 is built with Unicode support, the escape sequences \p{..},
 \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting.
-The Unicode properties that can be tested are limited to the general category
-properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
-L&. Full lists are given in the
+The Unicode properties that can be tested are a subset of those that Perl
+supports. Currently they are limited to the general category properties such as
+Lu for an upper case letter or Nd for a decimal number, the Unicode script
+names such as Arabic or Han, Bidi_Class, Bidi_Control, and the derived
+properties Any and LC (synonym L&). Full lists are given in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
 and
 <a href="pcre2syntax.html"><b>pcre2syntax</b></a>
-documentation. Only the short names for properties are supported. For example,
-\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
-Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
-compatibility with Perl 5.6. PCRE2 does not support this.
+documentation. In general, only the short names for properties are supported.
+For example, \p{L} matches a letter. Its longer synonym, \p{Letter}, is not
+supported. Furthermore, in Perl, many properties may optionally be prefixed by
+"Is", for compatibility with Perl 5.6. PCRE2 does not support this.
 </P>
 <br><b>
 WIDE CHARACTERS AND UTF MODES
@@ -477,7 +478,7 @@
 <P>
 Philip Hazel
 <br>
-University Computing Service
+Retired from University Computing Service
 <br>
 Cambridge, England.
 <br>
@@ -486,9 +487,9 @@
 REVISION
 </b><br>
 <P>
-Last updated: 23 February 2020
+Last updated: 22 December 2021
 <br>
-Copyright &copy; 1997-2020 University of Cambridge.
+Copyright &copy; 1997-2021 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
diff --git a/doc/pcre2.txt b/doc/pcre2.txt
index dde66a1..641a1f9 100644
--- a/doc/pcre2.txt
+++ b/doc/pcre2.txt
@@ -1815,7 +1815,7 @@
        to crash or loop.
 
        Note  that  this  option  can  also  be  passed  to  pcre2_match()  and
-       pcre_dfa_match(),  to  suppress  UTF  validity  checking of the subject
+       pcre2_dfa_match(),  to  suppress  UTF  validity checking of the subject
        string.
 
        Note also that setting PCRE2_NO_UTF_CHECK at compile time does not dis-
@@ -2012,13 +2012,13 @@
        code  points  are  less than 256. By default, higher-valued code points
        never match escapes such as \w or \d.
 
-       When PCRE2 is built with Unicode support  (the  default),  the  Unicode
-       properties of all characters can be tested with \p and \P, or, alterna-
-       tively, the PCRE2_UCP option can be set when  a  pattern  is  compiled;
-       this  causes  \w and friends to use Unicode property support instead of
-       the built-in tables.  PCRE2_UCP also causes upper/lower  casing  opera-
-       tions  on  characters  with code points greater than 127 to use Unicode
-       properties. These effects apply even when PCRE2_UTF is not set.
+       When PCRE2 is built with Unicode support (the default), certain Unicode
+       character  properties  can be tested with \p and \P, or, alternatively,
+       the PCRE2_UCP option can be set when a pattern is compiled; this causes
+       \w  and friends to use Unicode property support instead of the built-in
+       tables.  PCRE2_UCP also causes upper/lower casing operations on charac-
+       ters with code points greater than 127 to use Unicode properties. These
+       effects apply even when PCRE2_UTF is not set.
 
        The use of locales with Unicode is discouraged.  If  you  are  handling
        characters  with  code  points  greater than 127, you should either use
@@ -2579,7 +2579,9 @@
        and  offset  are  in  code units, not characters.  That is, they are in
        bytes for the 8-bit library, 16-bit code units for the 16-bit  library,
        and  32-bit  code units for the 32-bit library, whether or not UTF pro-
-       cessing is enabled.
+       cessing is enabled. As a special case, if subject is NULL and length is
+       zero,  the  subject is assumed to be an empty string. If length is non-
+       zero, an error occurs if subject is NULL.
 
        If startoffset is greater than the length of the subject, pcre2_match()
        returns  PCRE2_ERROR_BADOFFSET.  When  the starting offset is zero, the
@@ -3280,8 +3282,12 @@
 
        This  function  optionally calls pcre2_match() and then makes a copy of
        the subject string in outputbuffer, replacing parts that  were  matched
-       with  the replacement string, whose length is supplied in rlength. This
-       can be given as PCRE2_ZERO_TERMINATED  for  a  zero-terminated  string.
+       with the replacement string, whose length is supplied in rlength, which
+       can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string.  As
+       a  special  case,  if  replacement is NULL and rlength is zero, the re-
+       placement is assumed to be an empty string. If rlength is non-zero,  an
+       error occurs if replacement is NULL.
+
        There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re-
        turn just the replacement string(s). The default action is  to  perform
        just  one  replacement  if  the pattern matches, but there is an option
@@ -3315,89 +3321,90 @@
        As  well as the usual options for pcre2_match(), a number of additional
        options can be set in the options argument of pcre2_substitute().   One
        such  option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
-       match_data block must be provided, and it must have been  used  for  an
-       external  call  to pcre2_match(). The data in the match_data block (re-
-       turn code, offset vector) is used for the first substitution instead of
-       calling  pcre2_match()  from  within pcre2_substitute(). This allows an
-       application to check for a match before choosing to substitute, without
-       having to repeat the match.
+       match_data block must be provided, and it must have already  been  used
+       for an external call to pcre2_match() with the same pattern and subject
+       arguments. The data in the match_data block (return code,  offset  vec-
+       tor)  is  then  used  for  the  first  substitution  instead of calling
+       pcre2_match() from within pcre2_substitute(). This allows  an  applica-
+       tion to check for a match before choosing to substitute, without having
+       to repeat the match.
 
-       The  contents  of  the  externally  supplied  match  data block are not
-       changed  when  PCRE2_SUBSTITUTE_MATCHED  is   set.   If   PCRE2_SUBSTI-
-       TUTE_GLOBAL  is  also set, pcre2_match() is called after the first sub-
-       stitution to check for further matches, but this is done using  an  in-
-       ternally  obtained  match  data block, thus always leaving the external
+       The contents of the  externally  supplied  match  data  block  are  not
+       changed   when   PCRE2_SUBSTITUTE_MATCHED   is  set.  If  PCRE2_SUBSTI-
+       TUTE_GLOBAL is also set, pcre2_match() is called after the  first  sub-
+       stitution  to  check for further matches, but this is done using an in-
+       ternally obtained match data block, thus always  leaving  the  external
        block unchanged.
 
-       The code argument is not used for matching before the  first  substitu-
-       tion  when  PCRE2_SUBSTITUTE_MATCHED  is  set, but it must be provided,
-       even when PCRE2_SUBSTITUTE_GLOBAL is not set, because it  contains  in-
+       The  code  argument is not used for matching before the first substitu-
+       tion when PCRE2_SUBSTITUTE_MATCHED is set, but  it  must  be  provided,
+       even  when  PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains in-
        formation such as the UTF setting and the number of capturing parenthe-
        ses in the pattern.
 
-       The default action of pcre2_substitute() is to return  a  copy  of  the
+       The  default  action  of  pcre2_substitute() is to return a copy of the
        subject string with matched substrings replaced. However, if PCRE2_SUB-
-       STITUTE_REPLACEMENT_ONLY is set, only the  replacement  substrings  are
+       STITUTE_REPLACEMENT_ONLY  is  set,  only the replacement substrings are
        returned. In the global case, multiple replacements are concatenated in
-       the output buffer. Substitution callouts (see below)  can  be  used  to
+       the  output  buffer.  Substitution  callouts (see below) can be used to
        separate them if necessary.
 
-       The  outlengthptr  argument of pcre2_substitute() must point to a vari-
-       able that contains the length, in code units, of the output buffer.  If
-       the  function is successful, the value is updated to contain the length
-       in code units of the new string, excluding the trailing  zero  that  is
+       The outlengthptr argument of pcre2_substitute() must point to  a  vari-
+       able  that contains the length, in code units, of the output buffer. If
+       the function is successful, the value is updated to contain the  length
+       in  code  units  of the new string, excluding the trailing zero that is
        automatically added.
 
-       If  the  function is not successful, the value set via outlengthptr de-
-       pends on the type of  error.  For  syntax  errors  in  the  replacement
+       If the function is not successful, the value set via  outlengthptr  de-
+       pends  on  the  type  of  error.  For  syntax errors in the replacement
        string, the value is the offset in the replacement string where the er-
-       ror was detected. For other errors, the value  is  PCRE2_UNSET  by  de-
+       ror  was  detected.  For  other errors, the value is PCRE2_UNSET by de-
        fault. This includes the case of the output buffer being too small, un-
        less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
 
-       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when  the  output
+       PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  changes  what happens when the output
        buffer is too small. The default action is to return PCRE2_ERROR_NOMEM-
-       ORY immediately. If this option  is  set,  however,  pcre2_substitute()
+       ORY  immediately.  If  this  option is set, however, pcre2_substitute()
        continues to go through the motions of matching and substituting (with-
-       out, of course, writing anything) in order to compute the size of  buf-
-       fer  that  is  needed.  This  value is passed back via the outlengthptr
-       variable, with  the  result  of  the  function  still  being  PCRE2_ER-
+       out,  of course, writing anything) in order to compute the size of buf-
+       fer that is needed. This value is  passed  back  via  the  outlengthptr
+       variable,  with  the  result  of  the  function  still  being PCRE2_ER-
        ROR_NOMEMORY.
 
-       Passing  a  buffer  size  of zero is a permitted way of finding out how
-       much memory is needed for given substitution. However, this  does  mean
+       Passing a buffer size of zero is a permitted way  of  finding  out  how
+       much memory is needed for a given substitution. However, this does mean
        that the entire operation is carried out twice. Depending on the appli-
-       cation, it may be more efficient to allocate a large  buffer  and  free
-       the   excess   afterwards,   instead  of  using  PCRE2_SUBSTITUTE_OVER-
+       cation,  it  may  be more efficient to allocate a large buffer and free
+       the  excess  afterwards,  instead   of   using   PCRE2_SUBSTITUTE_OVER-
        FLOW_LENGTH.
 
-       The replacement string, which is interpreted as a  UTF  string  in  UTF
-       mode,  is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An
+       The  replacement  string,  which  is interpreted as a UTF string in UTF
+       mode, is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set.  An
        invalid UTF replacement string causes an immediate return with the rel-
        evant UTF error code.
 
-       If  PCRE2_SUBSTITUTE_LITERAL  is set, the replacement string is not in-
+       If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is  not  in-
        terpreted in any way. By default, however, a dollar character is an es-
-       cape  character  that can specify the insertion of characters from cap-
-       ture groups and names from (*MARK) or other control verbs in  the  pat-
+       cape character that can specify the insertion of characters  from  cap-
+       ture  groups  and names from (*MARK) or other control verbs in the pat-
        tern. The following forms are always recognized:
 
          $$                  insert a dollar character
          $<n> or ${<n>}      insert the contents of group <n>
          $*MARK or ${*MARK}  insert a control verb name
 
-       Either  a  group  number  or  a  group name can be given for <n>. Curly
-       brackets are required only if the following character would  be  inter-
+       Either a group number or a group name  can  be  given  for  <n>.  Curly
+       brackets  are  required only if the following character would be inter-
        preted as part of the number or name. The number may be zero to include
-       the entire matched string.   For  example,  if  the  pattern  a(b)c  is
-       matched  with "=abc=" and the replacement string "+$1$0$1+", the result
+       the  entire  matched  string.   For  example,  if  the pattern a(b)c is
+       matched with "=abc=" and the replacement string "+$1$0$1+", the  result
        is "=+babcb+=".
 
-       $*MARK inserts the name from the last encountered backtracking  control
-       verb  on the matching path that has a name. (*MARK) must always include
-       a name, but the other verbs need not.  For  example,  in  the  case  of
+       $*MARK  inserts the name from the last encountered backtracking control
+       verb on the matching path that has a name. (*MARK) must always  include
+       a  name,  but  the  other  verbs  need not. For example, in the case of
        (*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B)
-       the relevant name is "B". This facility can be used to  perform  simple
+       the  relevant  name is "B". This facility can be used to perform simple
        simultaneous substitutions, as this pcre2test example shows:
 
          /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
@@ -3405,15 +3412,15 @@
           2: pear orange
 
        PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject
-       string, replacing every matching substring. If this option is not  set,
-       only  the  first matching substring is replaced. The search for matches
-       takes place in the original subject string (that is, previous  replace-
-       ments  do  not  affect  it).  Iteration is implemented by advancing the
-       startoffset value for each search, which is always  passed  the  entire
+       string,  replacing every matching substring. If this option is not set,
+       only the first matching substring is replaced. The search  for  matches
+       takes  place in the original subject string (that is, previous replace-
+       ments do not affect it).  Iteration is  implemented  by  advancing  the
+       startoffset  value  for  each search, which is always passed the entire
        subject string. If an offset limit is set in the match context, search-
        ing stops when that limit is reached.
 
-       You can restrict the effect of a global substitution to  a  portion  of
+       You  can  restrict  the effect of a global substitution to a portion of
        the subject string by setting either or both of startoffset and an off-
        set limit. Here is a pcre2test example:
 
@@ -3421,73 +3428,73 @@
          ABC ABC ABC ABC\=offset=3,offset_limit=12
           2: ABC A!C A!C ABC
 
-       When continuing with global substitutions after  matching  a  substring
+       When  continuing  with  global substitutions after matching a substring
        with zero length, an attempt to find a non-empty match at the same off-
        set is performed.  If this is not successful, the offset is advanced by
        one character except when CRLF is a valid newline sequence and the next
-       two characters are CR, LF. In this case, the offset is advanced by  two
+       two  characters are CR, LF. In this case, the offset is advanced by two
        characters.
 
        PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that
        do not appear in the pattern to be treated as unset groups. This option
-       should  be used with care, because it means that a typo in a group name
+       should be used with care, because it means that a typo in a group  name
        or number no longer causes the PCRE2_ERROR_NOSUBSTRING error.
 
        PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un-
-       known  groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated
-       as empty strings when inserted as described above. If  this  option  is
+       known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be  treated
+       as  empty  strings  when inserted as described above. If this option is
        not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN-
-       SET error. This option does not  influence  the  extended  substitution
+       SET  error.  This  option  does not influence the extended substitution
        syntax described below.
 
-       PCRE2_SUBSTITUTE_EXTENDED  causes extra processing to be applied to the
-       replacement string. Without this option, only the dollar  character  is
-       special,  and  only  the  group insertion forms listed above are valid.
+       PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to  the
+       replacement  string.  Without this option, only the dollar character is
+       special, and only the group insertion forms  listed  above  are  valid.
        When PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
 
-       Firstly, backslash in a replacement string is interpreted as an  escape
+       Firstly,  backslash in a replacement string is interpreted as an escape
        character. The usual forms such as \n or \x{ddd} can be used to specify
-       particular character codes, and backslash followed by any  non-alphanu-
-       meric  character  quotes  that character. Extended quoting can be coded
+       particular  character codes, and backslash followed by any non-alphanu-
+       meric character quotes that character. Extended quoting  can  be  coded
        using \Q...\E, exactly as in pattern strings.
 
-       There are also four escape sequences for forcing the case  of  inserted
-       letters.   The  insertion  mechanism has three states: no case forcing,
+       There  are  also four escape sequences for forcing the case of inserted
+       letters.  The insertion mechanism has three states:  no  case  forcing,
        force upper case, and force lower case. The escape sequences change the
        current state: \U and \L change to upper or lower case forcing, respec-
-       tively, and \E (when not terminating a \Q quoted sequence)  reverts  to
-       no  case  forcing. The sequences \u and \l force the next character (if
-       it is a letter) to upper or lower  case,  respectively,  and  then  the
+       tively,  and  \E (when not terminating a \Q quoted sequence) reverts to
+       no case forcing. The sequences \u and \l force the next  character  (if
+       it  is  a  letter)  to  upper or lower case, respectively, and then the
        state automatically reverts to no case forcing. Case forcing applies to
-       all inserted  characters, including those from capture groups and  let-
-       ters  within \Q...\E quoted sequences. If either PCRE2_UTF or PCRE2_UCP
-       was set when the pattern was compiled, Unicode properties are used  for
+       all  inserted  characters, including those from capture groups and let-
+       ters within \Q...\E quoted sequences. If either PCRE2_UTF or  PCRE2_UCP
+       was  set when the pattern was compiled, Unicode properties are used for
        case forcing characters whose code points are greater than 127.
 
        Note that case forcing sequences such as \U...\E do not nest. For exam-
-       ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc";  the  final
-       \E  has  no  effect.  Note  also  that the PCRE2_ALT_BSUX and PCRE2_EX-
+       ple,  the  result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final
+       \E has no effect. Note  also  that  the  PCRE2_ALT_BSUX  and  PCRE2_EX-
        TRA_ALT_BSUX options do not apply to replacement strings.
 
-       The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to  add  more
-       flexibility  to  capture  group  substitution. The syntax is similar to
+       The  second  effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
+       flexibility to capture group substitution. The  syntax  is  similar  to
        that used by Bash:
 
          ${<n>:-<string>}
          ${<n>:+<string1>:<string2>}
 
-       As before, <n> may be a group number or a name. The first  form  speci-
-       fies  a  default  value. If group <n> is set, its value is inserted; if
-       not, <string> is expanded and the  result  inserted.  The  second  form
-       specifies  strings that are expanded and inserted when group <n> is set
-       or unset, respectively. The first form is just a  convenient  shorthand
+       As  before,  <n> may be a group number or a name. The first form speci-
+       fies a default value. If group <n> is set, its value  is  inserted;  if
+       not,  <string>  is  expanded  and  the result inserted. The second form
+       specifies strings that are expanded and inserted when group <n> is  set
+       or  unset,  respectively. The first form is just a convenient shorthand
        for
 
          ${<n>:+${<n>}:<string>}
 
-       Backslash  can  be  used to escape colons and closing curly brackets in
-       the replacement strings. A change of the case forcing  state  within  a
-       replacement  string  remains  in  force  afterwards,  as  shown in this
+       Backslash can be used to escape colons and closing  curly  brackets  in
+       the  replacement  strings.  A change of the case forcing state within a
+       replacement string remains  in  force  afterwards,  as  shown  in  this
        pcre2test example:
 
          /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
@@ -3496,8 +3503,8 @@
              somebody
           1: HELLO
 
-       The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these  extended
-       substitutions.  However,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un-
+       The  PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended
+       substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does  cause  un-
        known groups in the extended syntax forms to be treated as unset.
 
        If  PCRE2_SUBSTITUTE_LITERAL  is  set,  PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
@@ -3506,37 +3513,39 @@
 
    Substitution errors
 
-       In the event of an error, pcre2_substitute() returns a  negative  error
-       code.  Except for PCRE2_ERROR_NOMATCH (which is never returned), errors
+       In  the  event of an error, pcre2_substitute() returns a negative error
+       code. Except for PCRE2_ERROR_NOMATCH (which is never returned),  errors
        from pcre2_match() are passed straight back.
 
        PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser-
        tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
 
        PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ-
-       ing an unknown substring when  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  is  set)
-       when  the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN-
+       ing  an  unknown  substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set)
+       when the simple (non-extended) syntax is used and  PCRE2_SUBSTITUTE_UN-
        SET_EMPTY is not set.
 
-       PCRE2_ERROR_NOMEMORY is returned  if  the  output  buffer  is  not  big
+       PCRE2_ERROR_NOMEMORY  is  returned  if  the  output  buffer  is not big
        enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size
-       of buffer that is needed is returned via outlengthptr. Note  that  this
+       of  buffer  that is needed is returned via outlengthptr. Note that this
        does not happen by default.
 
        PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
-       match_data argument is NULL.
+       match_data  argument is NULL or if the subject or replacement arguments
+       are NULL. For backward compatibility reasons an exception is  made  for
+       the replacement argument if the rlength argument is also 0.
 
-       PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax  errors  in
-       the  replacement  string,  with  more particular errors being PCRE2_ER-
+       PCRE2_ERROR_BADREPLACEMENT  is  used for miscellaneous syntax errors in
+       the replacement string, with more  particular  errors  being  PCRE2_ER-
        ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE
-       (closing  curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION (syntax
-       error in extended group substitution),  and  PCRE2_ERROR_BADSUBSPATTERN
+       (closing curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION  (syntax
+       error  in  extended group substitution), and PCRE2_ERROR_BADSUBSPATTERN
        (the pattern match ended before it started or the match started earlier
-       than the current position in the subject, which can  happen  if  \K  is
+       than  the  current  position  in the subject, which can happen if \K is
        used in an assertion).
 
        As for all PCRE2 errors, a text message that describes the error can be
-       obtained by calling the pcre2_get_error_message()  function  (see  "Ob-
+       obtained  by  calling  the pcre2_get_error_message() function (see "Ob-
        taining a textual error message" above).
 
    Substitution callouts
@@ -3545,15 +3554,15 @@
          int (*callout_function)(pcre2_substitute_callout_block *, void *),
          void *callout_data);
 
-       The  pcre2_set_substitution_callout() function can be used to specify a
-       callout function for pcre2_substitute(). This information is passed  in
+       The pcre2_set_substitution_callout() function can be used to specify  a
+       callout  function for pcre2_substitute(). This information is passed in
        a match context. The callout function is called after each substitution
        has been processed, but it can cause the replacement not to happen. The
-       callout  function is not called for simulated substitutions that happen
+       callout function is not called for simulated substitutions that  happen
        as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
 
        The first argument of the callout function is a pointer to a substitute
-       callout  block structure, which contains the following fields, not nec-
+       callout block structure, which contains the following fields, not  nec-
        essarily in this order:
 
          uint32_t    version;
@@ -3564,34 +3573,34 @@
          uint32_t    oveccount;
          PCRE2_SIZE  output_offsets[2];
 
-       The version field contains the version number of the block format.  The
-       current  version  is  0.  The version number will increase in future if
-       more fields are added, but the intention is never to remove any of  the
+       The  version field contains the version number of the block format. The
+       current version is 0. The version number will  increase  in  future  if
+       more  fields are added, but the intention is never to remove any of the
        existing fields.
 
        The subscount field is the number of the current match. It is 1 for the
        first callout, 2 for the second, and so on. The input and output point-
        ers are copies of the values passed to pcre2_substitute().
 
-       The  ovector  field points to the ovector, which contains the result of
+       The ovector field points to the ovector, which contains the  result  of
        the most recent match. The oveccount field contains the number of pairs
        that are set in the ovector, and is always greater than zero.
 
-       The  output_offsets  vector  contains the offsets of the replacement in
-       the output string. This has already been processed for dollar  and  (if
+       The output_offsets vector contains the offsets of  the  replacement  in
+       the  output  string. This has already been processed for dollar and (if
        requested) backslash substitutions as described above.
 
-       The  second  argument  of  the  callout function is the value passed as
-       callout_data when the function was registered. The  value  returned  by
+       The second argument of the callout function  is  the  value  passed  as
+       callout_data  when  the  function was registered. The value returned by
        the callout function is interpreted as follows:
 
-       If  the  value is zero, the replacement is accepted, and, if PCRE2_SUB-
-       STITUTE_GLOBAL is set, processing continues with a search for the  next
-       match.  If  the  value  is not zero, the current replacement is not ac-
-       cepted. If the value is greater than zero,  processing  continues  when
-       PCRE2_SUBSTITUTE_GLOBAL  is set. Otherwise (the value is less than zero
-       or PCRE2_SUBSTITUTE_GLOBAL is not set), the the rest of  the  input  is
-       copied  to the output and the call to pcre2_substitute() exits, return-
+       If the value is zero, the replacement is accepted, and,  if  PCRE2_SUB-
+       STITUTE_GLOBAL  is set, processing continues with a search for the next
+       match. If the value is not zero, the current  replacement  is  not  ac-
+       cepted.  If  the  value is greater than zero, processing continues when
+       PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than  zero
+       or  PCRE2_SUBSTITUTE_GLOBAL  is  not  set),  the  rest  of the input is
+       copied to the output and the call to pcre2_substitute() exits,  return-
        ing the number of matches so far.
 
 
@@ -3600,56 +3609,56 @@
        int pcre2_substring_nametable_scan(const pcre2_code *code,
          PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last);
 
-       When a pattern is compiled with the PCRE2_DUPNAMES  option,  names  for
-       capture  groups  are not required to be unique. Duplicate names are al-
-       ways allowed for groups with the same number, created by using the  (?|
+       When  a  pattern  is compiled with the PCRE2_DUPNAMES option, names for
+       capture groups are not required to be unique. Duplicate names  are  al-
+       ways  allowed for groups with the same number, created by using the (?|
        feature. Indeed, if such groups are named, they are required to use the
        same names.
 
-       Normally, patterns that use duplicate names are such that  in  any  one
-       match,  only  one of each set of identically-named groups participates.
+       Normally,  patterns  that  use duplicate names are such that in any one
+       match, only one of each set of identically-named  groups  participates.
        An example is shown in the pcre2pattern documentation.
 
-       When  duplicates   are   present,   pcre2_substring_copy_byname()   and
-       pcre2_substring_get_byname()  return  the first substring corresponding
-       to the given name that is set. Only if none are set is  PCRE2_ERROR_UN-
-       SET  is  returned.  The pcre2_substring_number_from_name() function re-
-       turns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are  duplicate
+       When   duplicates   are   present,   pcre2_substring_copy_byname()  and
+       pcre2_substring_get_byname() return the first  substring  corresponding
+       to  the given name that is set. Only if none are set is PCRE2_ERROR_UN-
+       SET  returned.  The  pcre2_substring_number_from_name()  function   re-
+       turns  the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate
        names.
 
-       If  you want to get full details of all captured substrings for a given
-       name, you must use the pcre2_substring_nametable_scan()  function.  The
-       first  argument is the compiled pattern, and the second is the name. If
-       the third and fourth arguments are NULL, the function returns  a  group
+       If you want to get full details of all captured substrings for a  given
+       name,  you  must use the pcre2_substring_nametable_scan() function. The
+       first argument is the compiled pattern, and the second is the name.  If
+       the  third  and fourth arguments are NULL, the function returns a group
        number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise.
 
        When the third and fourth arguments are not NULL, they must be pointers
-       to variables that are updated by the function. After it has  run,  they
+       to  variables  that are updated by the function. After it has run, they
        point to the first and last entries in the name-to-number table for the
-       given name, and the function returns the length of each entry  in  code
-       units.  In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are
+       given  name,  and the function returns the length of each entry in code
+       units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there  are
        no entries for the given name.
 
        The format of the name table is described above in the section entitled
-       Information  about  a  pattern.  Given all the relevant entries for the
-       name, you can extract each of their numbers,  and  hence  the  captured
+       Information about a pattern. Given all the  relevant  entries  for  the
+       name,  you  can  extract  each of their numbers, and hence the captured
        data.
 
 
 FINDING ALL POSSIBLE MATCHES AT ONE POSITION
 
-       The  traditional  matching  function  uses a similar algorithm to Perl,
-       which stops when it finds the first match at a given point in the  sub-
+       The traditional matching function uses a  similar  algorithm  to  Perl,
+       which  stops when it finds the first match at a given point in the sub-
        ject. If you want to find all possible matches, or the longest possible
-       match at a given position,  consider  using  the  alternative  matching
-       function  (see  below) instead. If you cannot use the alternative func-
+       match  at  a  given  position,  consider using the alternative matching
+       function (see below) instead. If you cannot use the  alternative  func-
        tion, you can kludge it up by making use of the callout facility, which
        is described in the pcre2callout documentation.
 
        What you have to do is to insert a callout right at the end of the pat-
-       tern.  When your callout function is called, extract and save the  cur-
-       rent  matched  substring.  Then return 1, which forces pcre2_match() to
-       backtrack and try other alternatives. Ultimately, when it runs  out  of
+       tern.   When your callout function is called, extract and save the cur-
+       rent matched substring. Then return 1, which  forces  pcre2_match()  to
+       backtrack  and  try other alternatives. Ultimately, when it runs out of
        matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
 
 
@@ -3661,15 +3670,16 @@
          pcre2_match_context *mcontext,
          int *workspace, PCRE2_SIZE wscount);
 
-       The  function  pcre2_dfa_match()  is  called  to match a subject string
-       against a compiled pattern, using a matching algorithm that  scans  the
+       The function pcre2_dfa_match() is called  to  match  a  subject  string
+       against  a  compiled pattern, using a matching algorithm that scans the
        subject string just once (not counting lookaround assertions), and does
-       not backtrack.  This has different characteristics to the normal  algo-
-       rithm,  and  is not compatible with Perl. Some of the features of PCRE2
-       patterns are not supported.  Nevertheless, there are  times  when  this
-       kind  of  matching  can be useful. For a discussion of the two matching
-       algorithms, and a list of features that pcre2_dfa_match() does not sup-
-       port, see the pcre2matching documentation.
+       not  backtrack (except when processing lookaround assertions). This has
+       different characteristics to the normal algorithm, and is not  compati-
+       ble  with  Perl.  Some  of  the features of PCRE2 patterns are not sup-
+       ported. Nevertheless, there are times when this kind of matching can be
+       useful.  For a discussion of the two matching algorithms, and a list of
+       features that pcre2_dfa_match() does not support, see the pcre2matching
+       documentation.
 
        The  arguments  for  the pcre2_dfa_match() function are the same as for
        pcre2_match(), plus two extras. The ovector within the match data block
@@ -3698,7 +3708,7 @@
            wspace,         /* working space vector */
            20);            /* number of elements (NOT size in bytes) */
 
-   Option bits for pcre_dfa_match()
+   Option bits for pcre2_dfa_match()
 
        The  unused  bits of the options argument for pcre2_dfa_match() must be
        zero.  The  only   bits   that   may   be   set   are   PCRE2_ANCHORED,
@@ -3848,7 +3858,7 @@
 
 REVISION
 
-       Last updated: 30 August 2021
+       Last updated: 14 December 2021
        Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@@ -3961,8 +3971,8 @@
        0x10ffff  in  the  strings that they handle. Unicode support also gives
        access to the Unicode properties of characters, using  pattern  escapes
        such as \P, \p, and \X. Only the general category properties such as Lu
-       and Nd are supported. Details are given in the pcre2pattern  documenta-
-       tion.
+       and Nd, script names, and some bi-directional properties are supported.
+       Details are given in the pcre2pattern documentation.
 
        Pattern escapes such as \d and \w do not by default make use of Unicode
        properties. The application can request that they  do  by  setting  the
@@ -4128,7 +4138,7 @@
        for --with-match-limit. You can set a lower default  limit  by  adding,
        for example,
 
-         --with-match-limit_depth=10000
+         --with-match-limit-depth=10000
 
        to  the  configure  command.  This value can be overridden at run time.
        This depth limit indirectly limits the amount of heap  memory  that  is
@@ -4444,8 +4454,8 @@
 
 REVISION
 
-       Last updated: 20 March 2020
-       Copyright (c) 1997-2020 University of Cambridge.
+       Last updated: 08 December 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@@ -4890,57 +4900,64 @@
 
        This  document describes some of the differences in the ways that PCRE2
        and Perl handle regular expressions. The differences described here are
-       with  respect  to  Perl  version 5.32.0, but as both Perl and PCRE2 are
+       with  respect  to  Perl  version 5.34.0, but as both Perl and PCRE2 are
        continually changing, the information may at times be out of date.
 
-       1. PCRE2 has only a subset of Perl's Unicode support. Details  of  what
+       1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier)  is  not  set,
+       the behaviour of the '.' metacharacter differs from Perl. In PCRE2, '.'
+       matches the next character unless it is the  start  of  a  newline  se-
+       quence.  This  means  that, if the newline setting is CR, CRLF, or NUL,
+       '.' will match the code point LF (0x0A) in ASCII/Unicode  environments,
+       and  NL  (either  0x15 or 0x25) when using EBCDIC. In Perl, '.' appears
+       never to match LF, even when 0x0A is not a newline indicator.
+
+       2. PCRE2 has only a subset of Perl's Unicode support. Details  of  what
        it does have are given in the pcre2unicode page.
 
-       2.  Like  Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
+       3.  Like  Perl, PCRE2 allows repeat quantifiers on parenthesized asser-
        tions, but they do not mean what you might think. For example, (?!a){3}
        does not assert that the next three characters are not "a". It just as-
        serts that the next character is not "a"  three  times  (in  principle;
        PCRE2  optimizes this to run the assertion just once). Perl allows some
-       repeat quantifiers on other  assertions,  for  example,  \b*  (but  not
-       \b{3},  though oddly it does allow ^{3}), but these do not seem to have
-       any use. PCRE2 does not allow any kind of quantifier on  non-lookaround
-       assertions.
+       repeat quantifiers on other assertions, for example, \b*, but these  do
+       not  seem  to have any use. PCRE2 does not allow any kind of quantifier
+       on non-lookaround assertions.
 
-       3.  Capture groups that occur inside negative lookaround assertions are
-       counted, but their entries in the offsets vector are set  only  when  a
-       negative  assertion is a condition that has a matching branch (that is,
-       the condition is false).  Perl may set such  capture  groups  in  other
+       4. Capture groups that occur inside negative lookaround assertions  are
+       counted,  but  their  entries in the offsets vector are set only when a
+       negative assertion is a condition that has a matching branch (that  is,
+       the  condition  is  false).   Perl may set such capture groups in other
        circumstances.
 
-       4.  The  following Perl escape sequences are not supported: \F, \l, \L,
+       5. The following Perl escape sequences are not supported: \F,  \l,  \L,
        \u, \U, and \N when followed by a character name. \N on its own, match-
-       ing  a  non-newline  character, and \N{U+dd..}, matching a Unicode code
-       point, are supported. The escapes that modify  the  case  of  following
-       letters  are  implemented by Perl's general string-handling and are not
+       ing a non-newline character, and \N{U+dd..}, matching  a  Unicode  code
+       point,  are  supported.  The  escapes that modify the case of following
+       letters are implemented by Perl's general string-handling and  are  not
        part of its pattern matching engine. If any of these are encountered by
-       PCRE2,  an  error  is  generated  by default. However, if either of the
-       PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U  and  \u  are
+       PCRE2, an error is generated by default.  However,  if  either  of  the
+       PCRE2_ALT_BSUX  or  PCRE2_EXTRA_ALT_BSUX  options is set, \U and \u are
        interpreted as ECMAScript interprets them.
 
-       5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
+       6. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
        is built with Unicode support (the default). The properties that can be
-       tested  with  \p  and \P are limited to the general category properties
-       such as Lu and Nd, script names such as Greek or Han, and  the  derived
-       properties  Any and L&.  Both PCRE2 and Perl support the Cs (surrogate)
-       property, but in PCRE2 its use is limited. See the  pcre2pattern  docu-
-       mentation  for  details. The long synonyms for property names that Perl
-       supports (such as \p{Letter}) are not supported by  PCRE2,  nor  is  it
-       permitted to prefix any of these properties with "Is".
+       tested with \p and \P are limited to the  general  category  properties
+       such  as  Lu  and  Nd,  script  names such as Greek or Han, Bidi_Class,
+       Bidi_Control, and the derived properties Any and LC (synonym L&).  Both
+       PCRE2  and  Perl  support the Cs (surrogate) property, but in PCRE2 its
+       use is limited. See the pcre2pattern  documentation  for  details.  The
+       long  synonyms  for  property names that Perl supports (such as \p{Let-
+       ter}) are not supported by PCRE2, nor is it permitted to prefix any  of
+       these properties with "Is".
 
-       6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
+       7. PCRE2 supports the \Q...\E escape for quoting substrings. Characters
        in between are treated as literals. However, this is slightly different
        from  Perl  in  that  $  and  @ are also handled as literals inside the
-       quotes. In Perl, they cause variable interpolation (but of course PCRE2
-       does not have variables). Also, Perl does "double-quotish backslash in-
-       terpolation" on any backslashes between \Q and \E which, its documenta-
-       tion  says,  "may  lead to confusing results". PCRE2 treats a backslash
-       between \Q and \E just like any other character. Note the following ex-
-       amples:
+       quotes. In Perl, they cause variable interpolation (PCRE2 does not have
+       variables). Also, Perl does "double-quotish backslash interpolation" on
+       any backslashes between \Q and \E which, its documentation  says,  "may
+       lead  to confusing results". PCRE2 treats a backslash between \Q and \E
+       just like any other character. Note the following examples:
 
            Pattern            PCRE2 matches     Perl matches
 
@@ -4951,81 +4968,82 @@
            \QA\B\E            A\B               A\B
            \Q\\E              \                 \\E
 
-       The  \Q...\E  sequence  is recognized both inside and outside character
+       The \Q...\E sequence is recognized both inside  and  outside  character
        classes by both PCRE2 and Perl.
 
-       7.  Fairly  obviously,  PCRE2  does  not  support  the  (?{code})   and
+       8.   Fairly  obviously,  PCRE2  does  not  support  the  (?{code})  and
        (??{code}) constructions. However, PCRE2 does have a "callout" feature,
        which allows an external function to be called during pattern matching.
        See the pcre2callout documentation for details.
 
-       8.  Subroutine  calls (whether recursive or not) were treated as atomic
-       groups up to PCRE2 release 10.23, but from release 10.30 this  changed,
+       9. Subroutine calls (whether recursive or not) were treated  as  atomic
+       groups  up to PCRE2 release 10.23, but from release 10.30 this changed,
        and backtracking into subroutine calls is now supported, as in Perl.
 
-       9.  In  PCRE2,  if  any of the backtracking control verbs are used in a
-       group that is called as a  subroutine  (whether  or  not  recursively),
-       their  effect is confined to that group; it does not extend to the sur-
-       rounding pattern. This is not always the case in Perl.  In  particular,
-       if  (*THEN)  is  present in a group that is called as a subroutine, its
+       10. In PCRE2, if any of the backtracking control verbs are  used  in  a
+       group  that  is  called  as  a subroutine (whether or not recursively),
+       their effect is confined to that group; it does not extend to the  sur-
+       rounding  pattern.  This is not always the case in Perl. In particular,
+       if (*THEN) is present in a group that is called as  a  subroutine,  its
        action is limited to that group, even if the group does not contain any
-       |  characters.  Note  that such groups are processed as anchored at the
+       | characters. Note that such groups are processed as  anchored  at  the
        point where they are tested.
 
-       10. If a pattern contains more than one backtracking control verb,  the
-       first  one  that  is backtracked onto acts. For example, in the pattern
-       A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but  a  failure
+       11.  If a pattern contains more than one backtracking control verb, the
+       first one that is backtracked onto acts. For example,  in  the  pattern
+       A(*COMMIT)B(*PRUNE)C  a  failure in B triggers (*COMMIT), but a failure
        in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
        it is the same as PCRE2, but there are cases where it differs.
 
-       11. There are some differences that are concerned with the settings  of
-       captured  strings  when  part  of  a  pattern is repeated. For example,
-       matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves  $2  un-
+       12.  There are some differences that are concerned with the settings of
+       captured strings when part of  a  pattern  is  repeated.  For  example,
+       matching  "aba"  against the pattern /^(a(b)?)+$/ in Perl leaves $2 un-
        set, but in PCRE2 it is set to "b".
 
-       12.  PCRE2's  handling  of duplicate capture group numbers and names is
-       not as general as Perl's. This is a consequence of the fact  the  PCRE2
-       works  internally  just with numbers, using an external table to trans-
-       late between numbers and  names.  In  particular,  a  pattern  such  as
-       (?|(?<a>A)|(?<b>B)),  where the two capture groups have the same number
-       but different names, is not supported, and causes an error  at  compile
+       13. PCRE2's handling of duplicate capture group numbers  and  names  is
+       not  as  general as Perl's. This is a consequence of the fact that PCRE2
+       works internally just with numbers, using an external table  to  trans-
+       late  between  numbers  and  names.  In  particular,  a pattern such as
+       (?|(?<a>A)|(?<b>B)), where the two capture groups have the same  number
+       but  different  names, is not supported, and causes an error at compile
        time. If it were allowed, it would not be possible to distinguish which
-       group matched, because both names map to capture  group  number  1.  To
+       group  matched,  because  both  names map to capture group number 1. To
        avoid this confusing situation, an error is given at compile time.
 
-       13. Perl used to recognize comments in some places that PCRE2 does not,
-       for example, between the ( and ? at the start of a  group.  If  the  /x
-       modifier  is  set,  Perl allowed white space between ( and ? though the
-       latest Perls give an error (for a while it was just deprecated).  There
+       14. Perl used to recognize comments in some places that PCRE2 does not,
+       for  example,  between  the  ( and ? at the start of a group. If the /x
+       modifier is set, Perl allowed white space between ( and  ?  though  the
+       latest  Perls give an error (for a while it was just deprecated). There
        may still be some cases where Perl behaves differently.
 
-       14.  Perl,  when  in warning mode, gives warnings for character classes
-       such as [A-\d] or [a-[:digit:]]. It then treats the hyphens  as  liter-
+       15. Perl, when in warning mode, gives warnings  for  character  classes
+       such  as  [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter-
        als. PCRE2 has no warning features, so it gives an error in these cases
        because they are almost certainly user mistakes.
 
-       15. In PCRE2, the upper/lower case character properties Lu and  Ll  are
-       not  affected when case-independent matching is specified. For example,
+       16.  In  PCRE2, the upper/lower case character properties Lu and Ll are
+       not affected when case-independent matching is specified. For  example,
        \p{Lu} always matches an upper case letter. I think Perl has changed in
-       this  respect; in the release at the time of writing (5.32), \p{Lu} and
+       this respect; in the release at the time of writing (5.34), \p{Lu}  and
        \p{Ll} match all letters, regardless of case, when case independence is
        specified.
 
-       16. From release 5.32.0, Perl locks out the use of \K in lookaround as-
-       sertions. From release 10.38 PCRE2 does the same by  default.  However,
-       there  is  an  option for re-enabling the previous behaviour. When this
-       option is set, \K is acted on when it occurs  in  positive  assertions,
+       17. From release 5.32.0, Perl locks out the use of \K in lookaround as-
+       sertions.  From  release 10.38 PCRE2 does the same by default. However,
+       there is an option for re-enabling the previous  behaviour.  When  this
+       option  is  set,  \K is acted on when it occurs in positive assertions,
        but is ignored in negative assertions.
 
-       17.  PCRE2  provides some extensions to the Perl regular expression fa-
-       cilities.  Perl 5.10 included new features that  were  not  in  earlier
-       versions  of  Perl,  some  of which (such as named parentheses) were in
-       PCRE2 for some time before. This list is with respect to Perl 5.32:
+       18. PCRE2 provides some extensions to the Perl regular  expression  fa-
+       cilities.   Perl  5.10  included  new features that were not in earlier
+       versions of Perl, some of which (such as  named  parentheses)  were  in
+       PCRE2 for some time before. This list is with respect to Perl 5.34:
 
-       (a) Although lookbehind assertions in PCRE2  must  match  fixed  length
+       (a)  Although  lookbehind  assertions  in PCRE2 must match fixed length
        strings, each alternative toplevel branch of a lookbehind assertion can
-       match a different length of string. Perl requires them all to have  the
-       same length.
+       match  a  different  length of string. Perl used to require them all to
+       have the same length, but the latest version has some  variable  length
+       support.
 
        (b) From PCRE2 10.23, backreferences to groups of fixed length are sup-
        ported in lookbehinds, provided that there is no possibility of  refer-
@@ -5067,12 +5085,12 @@
        an extension to the lookaround facilities. The default, Perl-compatible
        lookarounds are atomic.
 
-       18.  The  Perl  /a modifier restricts /d numbers to pure ascii, and the
+       19.  The  Perl  /a modifier restricts \d numbers to pure ascii, and the
        /aa modifier restricts /i case-insensitive matching to pure ascii,  ig-
        noring  Unicode  rules.  This  separation  cannot  be  represented with
        PCRE2_UCP.
 
-       19. Perl has different limits than PCRE2. See the pcre2limit documenta-
+       20. Perl has different limits than PCRE2. See the pcre2limit documenta-
        tion for details. Perl went with 5.10 from recursion to iteration keep-
        ing the intermediate matches on the heap, which is ~10% slower but does
        not  fall into any stack-overflow limit. PCRE2 made a similar change at
@@ -5089,7 +5107,7 @@
 
 REVISION
 
-       Last updated: 30 August 2021
+       Last updated: 08 December 2021
        Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
@@ -5434,7 +5452,7 @@
        void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext);
 
        The JIT executable allocator does not free all memory when it is possi-
-       ble.  It expects new allocations, and keeps some free memory around  to
+       ble. It expects new allocations, and keeps some free memory  around  to
        improve  allocation  speed. However, in low memory conditions, it might
        be better to free all possible memory. You can cause this to happen  by
        calling  pcre2_jit_free_unused_memory(). Its argument is a general con-
@@ -5492,12 +5510,13 @@
 
        When  you call pcre2_match(), as well as testing for invalid options, a
        number of other sanity checks are performed on the arguments. For exam-
-       ple, if the subject pointer is NULL, an immediate error is given. Also,
-       unless PCRE2_NO_UTF_CHECK is set, a UTF subject string  is  tested  for
-       validity.  In the interests of speed, these checks do not happen on the
-       JIT fast path, and if invalid data is passed, the result is undefined.
+       ple,  if the subject pointer is NULL but the length is non-zero, an im-
+       mediate error is given. Also, unless PCRE2_NO_UTF_CHECK is set,  a  UTF
+       subject string is tested for validity. In the interests of speed, these
+       checks do not happen on the JIT fast  path,  and  if  invalid  data  is
+       passed, the result is undefined.
 
-       Bypassing the sanity checks and the  pcre2_match()  wrapping  can  give
+       Bypassing  the  sanity  checks  and the pcre2_match() wrapping can give
        speedups of more than 10%.
 
 
@@ -5515,8 +5534,8 @@
 
 REVISION
 
-       Last updated: 23 May 2019
-       Copyright (c) 1997-2019 University of Cambridge.
+       Last updated: 30 November 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@@ -6870,68 +6889,65 @@
        ters  whose code points are less than U+0100 and U+10000, respectively.
        In 32-bit non-UTF mode, code points greater than 0x10ffff (the  Unicode
        limit)  may  be  encountered. These are all treated as being in the Un-
-       known script and with an unassigned type. The  extra  escape  sequences
-       are:
+       known script and with an unassigned type.
+
+       Matching characters by Unicode property is not fast, because PCRE2  has
+       to  do  a  multistage table lookup in order to find a character's prop-
+       erty. That is why the traditional escape sequences such as \d and \w do
+       not  use  Unicode  properties  in PCRE2 by default, though you can make
+       them do so by setting the PCRE2_UCP option or by starting  the  pattern
+       with (*UCP).
+
+       The extra escape sequences that provide property support are:
 
          \p{xx}   a character with the xx property
          \P{xx}   a character without the xx property
          \X       a Unicode extended grapheme cluster
 
-       The property names represented by xx above are case-sensitive. There is
-       support for Unicode script names, Unicode general category  properties,
-       "Any",  which  matches any character (including newline), and some spe-
-       cial PCRE2 properties (described in  the  next  section).   Other  Perl
-       properties such as "InMusicalSymbols" are not supported by PCRE2.  Note
-       that \P{Any} does not match any characters, so always  causes  a  match
-       failure.
+       The  property names represented by xx above are not case-sensitive, and
+       in accordance with Unicode's "loose matching" rules,  spaces,  hyphens,
+       and underscores are ignored. There is support for Unicode script names,
+       Unicode general category properties, "Any", which matches any character
+       (including  newline),  Bidi_Class,  a number of binary (yes/no) proper-
+       ties, and some special PCRE2  properties  (described  below).   Certain
+       other  Perl  properties such as "InMusicalSymbols" are not supported by
+       PCRE2. Note that \P{Any} does  not  match  any  characters,  so  always
+       causes a match failure.
 
-       Sets of Unicode characters are defined as belonging to certain scripts.
-       A character from one of these sets can be matched using a script  name.
-       For example:
+   Script properties for \p and \P
 
-         \p{Greek}
-         \P{Han}
+       There are three different syntax forms for matching a script. Each Uni-
+       code character has a basic script and,  optionally,  a  list  of  other
+       scripts ("Script Extensions") with which it is commonly used. Using the
+       Adlam script as an example, \p{sc:Adlam} matches characters whose basic
+       script is Adlam, whereas \p{scx:Adlam} matches, in addition, characters
+       that have Adlam in their extensions list. The full names  "script"  and
+       "script extensions" for the property types are recognized, and an equals
+       sign is an alternative to the colon. If a script name is given  without
+       a  property  type,  for example, \p{Adlam}, it is treated as \p{scx:Ad-
+       lam}. Perl changed to this interpretation at  release  5.26  and  PCRE2
+       changed at release 10.40.
 
        Unassigned characters (and in non-UTF 32-bit mode, characters with code
        points greater than 0x10FFFF) are assigned the "Unknown" script. Others
        that  are not part of an identified script are lumped together as "Com-
-       mon". The current list of scripts is:
+       mon". The current list of recognized script names and their 4-character
+       abbreviations can be obtained by running this command:
 
-       Adlam, Ahom, Anatolian_Hieroglyphs, Arabic,  Armenian,  Avestan,  Bali-
-       nese,  Bamum,  Bassa_Vah,  Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi,
-       Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Caucasian_Alba-
-       nian,  Chakma,  Cham,  Cherokee, Chorasmian, Common, Coptic, Cuneiform,
-       Cypriot, Cypro_Minoan, Cyrillic, Deseret, Devanagari, Dives_Akuru,  Do-
-       gra,  Duployan, Egyptian_Hieroglyphs, Elbasan, Elymaic, Ethiopic, Geor-
-       gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gunjala_Gondi, Gur-
-       mukhi, Han, Hangul, Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, Hiragana,
-       Imperial_Aramaic,    Inherited,     Inscriptional_Pahlavi,     Inscrip-
-       tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,
-       Kharoshthi, Khitan_Small_Script, Khmer, Khojki, Khudawadi, Lao,  Latin,
-       Lepcha,  Limbu,  Linear_A,  Linear_B,  Lisu,  Lycian, Lydian, Mahajani,
-       Makasar, Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi,  Mede-
-       faidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, Meroitic_Hiero-
-       glyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar, Nabataean, Nandi-
-       nagari,  New_Tai_Lue,  Newa,  Nko, Nushu, Nyakeng_Puachue_Hmong, Ogham,
-       Ol_Chiki,  Old_Hungarian,  Old_Italic,  Old_North_Arabian,  Old_Permic,
-       Old_Persian,  Old_Sogdian,  Old_South_Arabian,  Old_Turkic, Old_Uyghur,
-       Oriya, Osage, Osmanya, Pahawh_Hmong, Palmyrene, Pau_Cin_Hau,  Phags_Pa,
-       Phoenician,  Psalter_Pahlavi,  Rejang,  Runic,  Samaritan,  Saurashtra,
-       Sharada, Shavian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng,
-       Soyombo,  Sundanese,  Syloti_Nagri,  Syriac, Tagalog, Tagbanwa, Tai_Le,
-       Tai_Tham, Tai_Viet, Takri, Tamil, Tangsa, Tangut, Telugu, Thaana, Thai,
-       Tibetan,  Tifinagh,  Tirhuta,  Toto,  Ugaritic, Unknown, Vai, Vithkuqi,
-       Wancho, Warang_Citi, Yezidi, Yi, Zanabazar_Square.
+         pcre2test -LS
+
+
+   The general category property for \p and \P
 
        Each character has exactly one Unicode general category property, spec-
-       ified  by a two-letter abbreviation. For compatibility with Perl, nega-
-       tion can be specified by including a  circumflex  between  the  opening
-       brace  and  the  property  name.  For  example,  \p{^Lu} is the same as
+       ified by a two-letter abbreviation. For compatibility with Perl,  nega-
+       tion  can  be  specified  by including a circumflex between the opening
+       brace and the property name.  For  example,  \p{^Lu}  is  the  same  as
        \P{Lu}.
 
        If only one letter is specified with \p or \P, it includes all the gen-
-       eral  category properties that start with that letter. In this case, in
-       the absence of negation, the curly brackets in the escape sequence  are
+       eral category properties that start with that letter. In this case,  in
+       the  absence of negation, the curly brackets in the escape sequence are
        optional; these two examples have the same effect:
 
          \p{L}
@@ -6983,36 +6999,73 @@
          Zp    Paragraph separator
          Zs    Space separator
 
-       The  special property L& is also supported: it matches a character that
-       has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
-       classified as a modifier or "other".
+       The special property LC, which has the synonym L&, is  also  supported:
+       it  matches  a  character that has the Lu, Ll, or Lt property, in other
+       words, a letter that is not classified as a modifier or "other".
 
-       The  Cs  (Surrogate)  property  applies  only  to characters whose code
-       points are in the range U+D800 to U+DFFF. These characters are no  dif-
-       ferent  to any other character when PCRE2 is not in UTF mode (using the
-       16-bit or 32-bit library).  However, they  are  not  valid  in  Unicode
+       The Cs (Surrogate) property  applies  only  to  characters  whose  code
+       points  are in the range U+D800 to U+DFFF. These characters are no dif-
+       ferent to any other character when PCRE2 is not in UTF mode (using  the
+       16-bit  or  32-bit  library).   However,  they are not valid in Unicode
        strings and so cannot be tested by PCRE2 in UTF mode, unless UTF valid-
-       ity  checking  has   been   turned   off   (see   the   discussion   of
+       ity   checking   has   been   turned   off   (see   the  discussion  of
        PCRE2_NO_UTF_CHECK in the pcre2api page).
 
-       The  long  synonyms  for  property  names  that  Perl supports (such as
-       \p{Letter}) are not supported by PCRE2, nor is it permitted  to  prefix
+       The long synonyms for  property  names  that  Perl  supports  (such  as
+       \p{Letter})  are  not supported by PCRE2, nor is it permitted to prefix
        any of these properties with "Is".
 
        No character that is in the Unicode table has the Cn (unassigned) prop-
        erty.  Instead, this property is assumed for any code point that is not
        in the Unicode table.
 
-       Specifying  caseless  matching  does not affect these escape sequences.
-       For example, \p{Lu} always matches only upper  case  letters.  This  is
+       Specifying caseless matching does not affect  these  escape  sequences.
+       For  example,  \p{Lu}  always  matches only upper case letters. This is
        different from the behaviour of current versions of Perl.
 
-       Matching  characters by Unicode property is not fast, because PCRE2 has
-       to do a multistage table lookup in order to find  a  character's  prop-
-       erty. That is why the traditional escape sequences such as \d and \w do
-       not use Unicode properties in PCRE2 by default,  though  you  can  make
-       them  do  so by setting the PCRE2_UCP option or by starting the pattern
-       with (*UCP).
+   Binary (yes/no) properties for \p and \P
+
+       Unicode defines a number of  binary  properties,  that  is,  properties
+       whose  only  values  are  true or false. You can obtain a list of those
+       that are recognized by \p and \P, along with  their  abbreviations,  by
+       running this command:
+
+         pcre2test -LP
+
+
+   The Bidi_Class property for \p and \P
+
+         \p{Bidi_Class:<class>}   matches a character with the given class
+         \p{BC:<class>}           matches a character with the given class
+
+       The recognized classes are:
+
+         AL          Arabic letter
+         AN          Arabic number
+         B           paragraph separator
+         BN          boundary neutral
+         CS          common separator
+         EN          European number
+         ES          European separator
+         ET          European terminator
+         FSI         first strong isolate
+         L           left-to-right
+         LRE         left-to-right embedding
+         LRI         left-to-right isolate
+         LRO         left-to-right override
+         NSM         non-spacing mark
+         ON          other neutral
+         PDF         pop directional format
+         PDI         pop directional isolate
+         R           right-to-left
+         RLE         right-to-left embedding
+         RLI         right-to-left isolate
+         RLO         right-to-left override
+         S           segment separator
+         WS          white space
+
+       An  equals  sign  may  be  used instead of a colon. The class names are
+       case-insensitive; only the short names listed above are recognized.
 
    Extended grapheme clusters
 
@@ -7267,14 +7320,16 @@
 
        Outside a character class, a dot in the pattern matches any one charac-
        ter  in  the subject string except (by default) a character that signi-
-       fies the end of a line.
+       fies the end of a line. One or more characters may be specified as line
+       terminators (see "Newline conventions" above).
 
-       When a line ending is defined as a single character, dot never  matches
-       that  character; when the two-character sequence CRLF is used, dot does
-       not match CR if it is immediately followed  by  LF,  but  otherwise  it
-       matches  all characters (including isolated CRs and LFs). When any Uni-
-       code line endings are being recognized, dot does not match CR or LF  or
-       any of the other line ending characters.
+       Dot  never matches a single line-ending character. When the two-charac-
+       ter sequence CRLF is the only line ending, dot does not match CR if  it
+       is  immediately followed by LF, but otherwise it matches all characters
+       (including isolated CRs and LFs). When ANYCRLF  is  selected  for  line
+       endings, no  occurrences  of  CR or LF match dot. When all Unicode line
+       endings are being recognized, dot does not match CR or LF or any of the
+       other line ending characters.
 
        The  behaviour  of  dot  with regard to newlines can be changed. If the
        PCRE2_DOTALL option is set, a dot matches any  one  character,  without
@@ -8068,7 +8123,7 @@
 
          (*atomic:\d+)foo
 
-       This kind of parenthesized group "locks up" the  part of the pattern it
+       This  kind of parenthesized group "locks up" the part of the pattern it
        contains once it has matched, and a failure further into the pattern is
        prevented  from  backtracking into it. Backtracking past it to previous
        items, however, works as normal.
@@ -9640,8 +9695,8 @@
 
 REVISION
 
-       Last updated: 30 August 2021
-       Copyright (c) 1997-2021 University of Cambridge.
+       Last updated: 12 January 2022
+       Copyright (c) 1997-2022 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@@ -10312,11 +10367,11 @@
 SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS
 
        int32_t pcre2_serialize_decode(pcre2_code **codes,
-         int32_t number_of_codes, const uint32_t *bytes,
+         int32_t number_of_codes, const uint8_t *bytes,
          pcre2_general_context *gcontext);
 
-       int32_t pcre2_serialize_encode(pcre2_code **codes,
-         int32_t number_of_codes, uint32_t **serialized_bytes,
+       int32_t pcre2_serialize_encode(const pcre2_code **codes,
+         int32_t number_of_codes, uint8_t **serialized_bytes,
          PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);
 
        void pcre2_serialize_free(uint8_t *bytes);
@@ -10440,7 +10495,6 @@
        If this argument is NULL, malloc() and free() are used. After deserial-
        ization, the byte stream is no longer needed and can be discarded.
 
-         int32_t number_of_codes;
          pcre2_code *list_of_codes[2];
          uint8_t *bytes = <serialized data>;
          int32_t number_of_codes =
@@ -10588,6 +10642,10 @@
        iour of these escape sequences is changed to use Unicode properties and
        they match many more characters.
 
+       Property descriptions in \p and \P are matched caselessly; hyphens, un-
+       derscores,  and  white  space are ignored, in accordance with Unicode's
+       "loose matching" rules.
+
 
 GENERAL CATEGORY PROPERTIES FOR \p and \P
 
@@ -10604,6 +10662,7 @@
          Lo         Other letter
          Lt         Title case letter
          Lu         Upper case letter
+         Lc         Ll, Lu, or Lt
          L&         Ll, Lu, or Lt
 
          M          Mark
@@ -10650,33 +10709,56 @@
        acter set at release 5.18.
 
 
-SCRIPT NAMES FOR \p AND \P
+BINARY PROPERTIES FOR \p AND \P
 
-       Adlam,  Ahom,  Anatolian_Hieroglyphs,  Arabic, Armenian, Avestan, Bali-
-       nese, Bamum, Bassa_Vah, Batak, Bengali,  Bhaiksuki,  Bopomofo,  Brahmi,
-       Braille,  Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba-
-       nian, Chakma, Cham, Cherokee, Chorasmian,  Common,  Coptic,  Cuneiform,
-       Cypriot,  Cypro_Minoan, Cyrillic, Deseret, Devanagari, Dives_Akuru, Do-
-       gra, Duployan, Egyptian_Hieroglyphs, Elbasan, Elymaic, Ethiopic,  Geor-
-       gian, Glagolitic, Gothic, Grantha, Greek, Gujarati, Gunjala_Gondi, Gur-
-       mukhi, Han, Hangul, Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, Hiragana,
-       Imperial_Aramaic,     Inherited,     Inscriptional_Pahlavi,    Inscrip-
-       tional_Parthian,  Javanese,  Kaithi,   Kannada,   Katakana,   Kayah_Li,
-       Kharoshthi,  Khitan_Small_Script, Khmer, Khojki, Khudawadi, Lao, Latin,
-       Lepcha, Limbu, Linear_A,  Linear_B,  Lisu,  Lycian,  Lydian,  Mahajani,
-       Makasar,  Malayalam, Mandaic, Manichaean, Marchen, Masaram_Gondi, Mede-
-       faidrin, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, Meroitic_Hiero-
-       glyphs, Miao, Modi, Mongolian, Mro, Multani, Myanmar, Nabataean, Nandi-
-       nagari, New_Tai_Lue, Newa, Nko,  Nushu,  Nyakeng_Puachue_Hmong,  Ogham,
-       Ol_Chiki,  Old_Hungarian,  Old_Italic,  Old_North_Arabian,  Old_Permic,
-       Old_Persian, Old_Sogdian,  Old_South_Arabian,  Old_Turkic,  Old_Uyghur,
-       Oriya,  Osage, Osmanya, Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa,
-       Phoenician,  Psalter_Pahlavi,  Rejang,  Runic,  Samaritan,  Saurashtra,
-       Sharada, Shavian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng,
-       Soyombo, Sundanese, Syloti_Nagri, Syriac,  Tagalog,  Tagbanwa,  Tai_Le,
-       Tai_Tham, Tai_Viet, Takri, Tamil, Tangsa, Tangut, Telugu, Thaana, Thai,
-       Tibetan, Tifinagh, Tirhuta,  Toto,  Ugaritic,  Vai,  Vithkuqi,  Wancho,
-       Warang_Citi, Yezidi, Yi, Zanabazar_Square.
+       Unicode  defines  a  number  of  binary properties, that is, properties
+       whose only values are true or false. You can obtain  a  list  of  those
+       that  are  recognized  by \p and \P, along with their abbreviations, by
+       running this command:
+
+         pcre2test -LP
+
+
+SCRIPT MATCHING WITH \p AND \P
+
+       Many script names and their 4-letter abbreviations  are  recognized  in
+       \p{sc:...}  or  \p{scx:...} items, or on their own with \p (and also \P
+       of course). You can obtain a list of these scripts by running this com-
+       mand:
+
+         pcre2test -LS
+
+
+THE BIDI_CLASS PROPERTY FOR \p AND \P
+
+         \p{Bidi_Class:<class>}   matches a character with the given class
+         \p{BC:<class>}           matches a character with the given class
+
+       The recognized classes are:
+
+         AL          Arabic letter
+         AN          Arabic number
+         B           paragraph separator
+         BN          boundary neutral
+         CS          common separator
+         EN          European number
+         ES          European separator
+         ET          European terminator
+         FSI         first strong isolate
+         L           left-to-right
+         LRE         left-to-right embedding
+         LRI         left-to-right isolate
+         LRO         left-to-right override
+         NSM         non-spacing mark
+         ON          other neutral
+         PDF         pop directional format
+         PDI         pop directional isolate
+         R           right-to-left
+         RLE         right-to-left embedding
+         RLI         right-to-left isolate
+         RLO         right-to-left override
+         S           segment separator
+         WS          white space
 
 
 CHARACTER CLASSES
@@ -11008,8 +11090,8 @@
 
 REVISION
 
-       Last updated: 30 August 2021
-       Copyright (c) 1997-2021 University of Cambridge.
+       Last updated: 12 January 2022
+       Copyright (c) 1997-2022 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
@@ -11051,15 +11133,17 @@
 
        When  PCRE2 is built with Unicode support, the escape sequences \p{..},
        \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF set-
-       ting.   The  Unicode  properties  that can be tested are limited to the
-       general category properties such as Lu for an upper case letter  or  Nd
-       for  a  decimal number, the Unicode script names such as Arabic or Han,
-       and the derived properties Any and L&. Full  lists  are  given  in  the
-       pcre2pattern  and  pcre2syntax  documentation. Only the short names for
-       properties are supported. For example, \p{L} matches a letter. Its Perl
-       synonym,  \p{Letter},  is  not  supported.   Furthermore, in Perl, many
-       properties may optionally be prefixed by "Is", for  compatibility  with
-       Perl 5.6. PCRE2 does not support this.
+       ting.   The Unicode properties that can be tested are a subset of those
+       that Perl supports. Currently they are limited to the general  category
+       properties such as Lu for an upper case letter or Nd for a decimal num-
+       ber, the Unicode script  names  such  as  Arabic  or  Han,  Bidi_Class,
+       Bidi_Control,  and the derived properties Any and LC (synonym L&). Full
+       lists are given in the pcre2pattern and pcre2syntax  documentation.  In
+       general,  only the short names for properties are supported.  For exam-
+       ple, \p{L} matches a letter. Its longer  synonym,  \p{Letter},  is  not
+       supported. Furthermore, in Perl, many properties may optionally be pre-
+       fixed by "Is", for compatibility with Perl 5.6. PCRE2 does not  support
+       this.
 
 
 WIDE CHARACTERS AND UTF MODES
@@ -11437,14 +11521,14 @@
 AUTHOR
 
        Philip Hazel
-       University Computing Service
+       Retired from University Computing Service
        Cambridge, England.
 
 
 REVISION
 
-       Last updated: 23 February 2020
-       Copyright (c) 1997-2020 University of Cambridge.
+       Last updated: 22 December 2021
+       Copyright (c) 1997-2021 University of Cambridge.
 ------------------------------------------------------------------------------
 
 
diff --git a/doc/pcre2_jit_stack_create.3 b/doc/pcre2_jit_stack_create.3
index f0b29f0..d332b72 100644
--- a/doc/pcre2_jit_stack_create.3
+++ b/doc/pcre2_jit_stack_create.3
@@ -22,7 +22,8 @@
 \fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern,
 which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP.
 A maximum stack size of 512KiB to 1MiB should be more than enough for any
-pattern. For more details, see the
+pattern. If the stack couldn't be allocated or the values passed were not
+reasonable, NULL will be returned. For more details, see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
diff --git a/doc/pcre2_set_compile_extra_options.3 b/doc/pcre2_set_compile_extra_options.3
index 58cefe5..0dcc8de 100644
--- a/doc/pcre2_set_compile_extra_options.3
+++ b/doc/pcre2_set_compile_extra_options.3
@@ -18,9 +18,9 @@
 housed in a compile context. It completely replaces all the bits. The extra
 options are:
 .sp
-.\" JOIN
   PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK     Allow \eK in lookarounds
-  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \ex{df800} to \ex{dfff}
+.\" JOIN
+  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \ex{d800} to \ex{dfff}
                                          in UTF-8 and UTF-32 modes
 .\" JOIN
   PCRE2_EXTRA_ALT_BSUX                 Extended alternate \eu, \eU, and
diff --git a/doc/pcre2_substitute.3 b/doc/pcre2_substitute.3
index cceb784..7ee4b6a 100644
--- a/doc/pcre2_substitute.3
+++ b/doc/pcre2_substitute.3
@@ -55,32 +55,42 @@
 The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for
 zero-terminated strings. The options are:
 .sp
-  PCRE2_ANCHORED             Match only at the first position
-  PCRE2_ENDANCHORED          Pattern can match only at end of subject
-  PCRE2_NOTBOL               Subject is not the beginning of a line
-  PCRE2_NOTEOL               Subject is not the end of a line
-  PCRE2_NOTEMPTY             An empty string is not a valid match
+  PCRE2_ANCHORED                     Match only at the first position
+  PCRE2_ENDANCHORED                  Match only at end of subject
 .\" JOIN
-  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the
-                              subject is not a valid match
-  PCRE2_NO_JIT               Do not use JIT matching
+  PCRE2_NOTBOL                       Subject is not the beginning of a
+                                      line
+  PCRE2_NOTEOL                       Subject is not the end of a line
 .\" JOIN
-  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement
-                              for UTF validity (only relevant if
-                              PCRE2_UTF was set at compile time)
-  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
-  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
-  PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
-  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
-  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_NOTEMPTY                     An empty string is not a
+                                      valid match
+.\" JOIN
+  PCRE2_NOTEMPTY_ATSTART             An empty string at the start of
+                                      the subject is not a valid match
+  PCRE2_NO_JIT                       Do not use JIT matching
+.\" JOIN
+  PCRE2_NO_UTF_CHECK                 Do not check for UTF validity in
+                                      the subject or replacement
+.\" JOIN
+                                      (only relevant if PCRE2_UTF was
+                                      set at compile time)
+  PCRE2_SUBSTITUTE_EXTENDED          Do extended replacement processing
+.\" JOIN
+  PCRE2_SUBSTITUTE_GLOBAL            Replace all occurrences in the
+                                      subject
+  PCRE2_SUBSTITUTE_LITERAL           The replacement string is literal
+.\" JOIN
+  PCRE2_SUBSTITUTE_MATCHED           Use pre-existing match data for
+                                      first match
+  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH   If overflow, compute needed length
   PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s)
-  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
-  PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
+  PCRE2_SUBSTITUTE_UNKNOWN_UNSET     Treat unknown group as unset
+  PCRE2_SUBSTITUTE_UNSET_EMPTY       Simple unset insert = empty string
 .sp
 If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED,
 PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored.
 .P
-If PCRE2_SUBSTITUTE_MATCHED is set, \fImatch_data\fP must be non-zero; its
+If PCRE2_SUBSTITUTE_MATCHED is set, \fImatch_data\fP must be non-NULL; its
 contents must be the result of a call to \fBpcre2_match()\fP using the same
 pattern and subject.
 .P
diff --git a/doc/pcre2api.3 b/doc/pcre2api.3
index 1ad6e26..edde3db 100644
--- a/doc/pcre2api.3
+++ b/doc/pcre2api.3
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2API 3 "14 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -1794,7 +1794,7 @@
 undefined. It may cause your program to crash or loop.
 .P
 Note that this option can also be passed to \fBpcre2_match()\fP and
-\fBpcre_dfa_match()\fP, to suppress UTF validity checking of the subject
+\fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject
 string.
 .P
 Note also that setting PCRE2_NO_UTF_CHECK at compile time does not disable the
@@ -2015,8 +2015,8 @@
 256. By default, higher-valued code points never match escapes such as \ew or
 \ed.
 .P
-When PCRE2 is built with Unicode support (the default), the Unicode properties
-of all characters can be tested with \ep and \eP, or, alternatively, the
+When PCRE2 is built with Unicode support (the default), certain Unicode
+character properties can be tested with \ep and \eP, or, alternatively, the
 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
 friends to use Unicode property support instead of the built-in tables.
 PCRE2_UCP also causes upper/lower casing operations on characters with code
@@ -2279,7 +2279,7 @@
   PCRE2_INFO_LASTCODETYPE
 .sp
 Returns 1 if there is a rightmost literal code unit that must exist in any
-matched string, other than at its start. The third argument should  point to a
+matched string, other than at its start. The third argument should point to a
 \fBuint32_t\fP variable. If there is no such value, 0 is returned. When 1 is
 returned, the code unit value itself can be retrieved using
 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
@@ -2624,7 +2624,9 @@
 \fIstartoffset\fP. The length and offset are in code units, not characters.
 That is, they are in bytes for the 8-bit library, 16-bit code units for the
 16-bit library, and 32-bit code units for the 32-bit library, whether or not
-UTF processing is enabled.
+UTF processing is enabled. As a special case, if \fIsubject\fP is NULL and
+\fIlength\fP is zero, the subject is assumed to be an empty string. If
+\fIlength\fP is non-zero, an error occurs if \fIsubject\fP is NULL.
 .P
 If \fIstartoffset\fP is greater than the length of the subject,
 \fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is
@@ -3413,12 +3415,16 @@
 .P
 This function optionally calls \fBpcre2_match()\fP and then makes a copy of the
 subject string in \fIoutputbuffer\fP, replacing parts that were matched with
-the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
-can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
-option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
-replacement string(s). The default action is to perform just one replacement if
-the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below).
+the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP, which
+can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. As a
+special case, if \fIreplacement\fP is NULL and \fIrlength\fP is zero, the
+replacement is assumed to be an empty string. If \fIrlength\fP is non-zero, an
+error occurs if \fIreplacement\fP is NULL.
+.P
+There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
+the replacement string(s). The default action is to perform just one
+replacement if the pattern matches, but there is an option that requests
+multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL below).
 .P
 If successful, \fBpcre2_substitute()\fP returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is never
@@ -3447,12 +3453,12 @@
 As well as the usual options for \fBpcre2_match()\fP, a number of additional
 options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
-\fImatch_data\fP block must be provided, and it must have been used for an
-external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block
-(return code, offset vector) is used for the first substitution instead of
-calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
-an application to check for a match before choosing to substitute, without
-having to repeat the match.
+\fImatch_data\fP block must be provided, and it must have already been used for
+an external call to \fBpcre2_match()\fP with the same pattern and subject
+arguments. The data in the \fImatch_data\fP block (return code, offset vector)
+is then used for the first substitution instead of calling \fBpcre2_match()\fP
+from within \fBpcre2_substitute()\fP. This allows an application to check for a
+match before choosing to substitute, without having to repeat the match.
 .P
 The contents of the externally supplied match data block are not changed when
 PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
@@ -3584,7 +3590,7 @@
 terminating a \eQ quoted sequence) reverts to no case forcing. The sequences
 \eu and \el force the next character (if it is a letter) to upper or lower
 case, respectively, and then the state automatically reverts to no case
-forcing. Case forcing applies to all inserted  characters, including those from
+forcing. Case forcing applies to all inserted characters, including those from
 capture groups and letters within \eQ...\eE quoted sequences. If either
 PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode
 properties are used for case forcing characters whose code points are greater
@@ -3649,7 +3655,9 @@
 default.
 .P
 PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
-\fImatch_data\fP argument is NULL.
+\fImatch_data\fP argument is NULL or if the \fIsubject\fP or \fIreplacement\fP
+arguments are NULL. For backward compatibility reasons an exception is made for
+the \fIreplacement\fP argument if the \fIrlength\fP argument is also 0.
 .P
 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
 replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE
@@ -3811,12 +3819,13 @@
 .P
 The function \fBpcre2_dfa_match()\fP is called to match a subject string
 against a compiled pattern, using a matching algorithm that scans the subject
-string just once (not counting lookaround assertions), and does not backtrack.
-This has different characteristics to the normal algorithm, and is not
-compatible with Perl. Some of the features of PCRE2 patterns are not supported.
-Nevertheless, there are times when this kind of matching can be useful. For a
-discussion of the two matching algorithms, and a list of features that
-\fBpcre2_dfa_match()\fP does not support, see the
+string just once (not counting lookaround assertions), and does not backtrack
+(except when processing lookaround assertions). This has different
+characteristics to the normal algorithm, and is not compatible with Perl. Some
+of the features of PCRE2 patterns are not supported. Nevertheless, there are
+times when this kind of matching can be useful. For a discussion of the two
+matching algorithms, and a list of features that \fBpcre2_dfa_match()\fP does
+not support, see the
 .\" HREF
 \fBpcre2matching\fP
 .\"
@@ -3848,7 +3857,7 @@
     wspace,         /* working space vector */
     20);            /* number of elements (NOT size in bytes) */
 .
-.SS "Option bits for \fBpcre_dfa_match()\fP"
+.SS "Option bits for \fBpcre2_dfa_match()\fP"
 .rs
 .sp
 The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must
@@ -4016,6 +4025,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
+Last updated: 14 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
diff --git a/doc/pcre2build.3 b/doc/pcre2build.3
index 60931bf..5fca3dc 100644
--- a/doc/pcre2build.3
+++ b/doc/pcre2build.3
@@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "20 March 2020" "PCRE2 10.35"
+.TH PCRE2BUILD 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@@ -122,8 +122,9 @@
 UTF support allows the libraries to process character code points up to
 0x10ffff in the strings that they handle. Unicode support also gives access to
 the Unicode properties of characters, using pattern escapes such as \eP, \ep,
-and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are
-supported. Details are given in the
+and \eX. Only the general category properties such as \fILu\fP and \fINd\fP,
+script names, and some bi-directional properties are supported. Details are
+given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -302,7 +303,7 @@
 for --with-match-limit. You can set a lower default limit by adding, for
 example,
 .sp
-  --with-match-limit_depth=10000
+  --with-match-limit-depth=10000
 .sp
 to the \fBconfigure\fP command. This value can be overridden at run time. This
 depth limit indirectly limits the amount of heap memory that is used, but
@@ -633,6 +634,6 @@
 .rs
 .sp
 .nf
-Last updated: 20 March 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 08 December 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
diff --git a/doc/pcre2compat.3 b/doc/pcre2compat.3
index 311d6eb..8333d3e 100644
--- a/doc/pcre2compat.3
+++ b/doc/pcre2compat.3
@@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2COMPAT 3 "08 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@@ -6,31 +6,38 @@
 .sp
 This document describes some of the differences in the ways that PCRE2 and Perl
 handle regular expressions. The differences described here are with respect to
-Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the
+Perl version 5.34.0, but as both Perl and PCRE2 are continually changing, the
 information may at times be out of date.
 .P
-1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
+1. When PCRE2_DOTALL (equivalent to Perl's /s qualifier) is not set, the
+behaviour of the '.' metacharacter differs from Perl. In PCRE2, '.' matches the
+next character unless it is the start of a newline sequence. This means that,
+if the newline setting is CR, CRLF, or NUL, '.' will match the code point LF
+(0x0A) in ASCII/Unicode environments, and NL (either 0x15 or 0x25) when using
+EBCDIC. In Perl, '.' appears never to match LF, even when 0x0A is not a newline
+indicator.
+.P
+2. PCRE2 has only a subset of Perl's Unicode support. Details of what it does
 have are given in the
 .\" HREF
 \fBpcre2unicode\fP
 .\"
 page.
 .P
-2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
+3. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but
 they do not mean what you might think. For example, (?!a){3} does not assert
 that the next three characters are not "a". It just asserts that the next
 character is not "a" three times (in principle; PCRE2 optimizes this to run the
 assertion just once). Perl allows some repeat quantifiers on other assertions,
-for example, \eb* (but not \eb{3}, though oddly it does allow ^{3}), but these
-do not seem to have any use. PCRE2 does not allow any kind of quantifier on
-non-lookaround assertions.
+for example, \eb*, but these do not seem to have any use. PCRE2 does not allow
+any kind of quantifier on non-lookaround assertions.
 .P
-3. Capture groups that occur inside negative lookaround assertions are counted,
+4. Capture groups that occur inside negative lookaround assertions are counted,
 but their entries in the offsets vector are set only when a negative assertion
 is a condition that has a matching branch (that is, the condition is false).
 Perl may set such capture groups in other circumstances.
 .P
-4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
+5. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu,
 \eU, and \eN when followed by a character name. \eN on its own, matching a
 non-newline character, and \eN{U+dd..}, matching a Unicode code point, are
 supported. The escapes that modify the case of following letters are
@@ -40,12 +47,12 @@
 PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript
 interprets them.
 .P
-5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
+6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is
 built with Unicode support (the default). The properties that can be tested
 with \ep and \eP are limited to the general category properties such as Lu and
-Nd, script names such as Greek or Han, and the derived properties Any and L&.
-Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use
-is limited. See the
+Nd, script names such as Greek or Han, Bidi_Class, Bidi_Control, and the
+derived properties Any and LC (synonym L&). Both PCRE2 and Perl support the Cs
+(surrogate) property, but in PCRE2 its use is limited. See the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -53,14 +60,14 @@
 supports (such as \ep{Letter}) are not supported by PCRE2, nor is it permitted
 to prefix any of these properties with "Is".
 .P
-6. PCRE2 supports the \eQ...\eE escape for quoting substrings. Characters
+7. PCRE2 supports the \eQ...\eE escape for quoting substrings. Characters
 in between are treated as literals. However, this is slightly different from
 Perl in that $ and @ are also handled as literals inside the quotes. In Perl,
-they cause variable interpolation (but of course PCRE2 does not have
-variables). Also, Perl does "double-quotish backslash interpolation" on any
-backslashes between \eQ and \eE which, its documentation says, "may lead to
-confusing results". PCRE2 treats a backslash between \eQ and \eE just like any
-other character. Note the following examples:
+they cause variable interpolation (PCRE2 does not have variables). Also, Perl
+does "double-quotish backslash interpolation" on any backslashes between \eQ
+and \eE which, its documentation says, "may lead to confusing results". PCRE2
+treats a backslash between \eQ and \eE just like any other character. Note the
+following examples:
 .sp
     Pattern            PCRE2 matches     Perl matches
 .sp
@@ -75,7 +82,7 @@
 The \eQ...\eE sequence is recognized both inside and outside character classes
 by both PCRE2 and Perl.
 .P
-7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
+8. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code})
 constructions. However, PCRE2 does have a "callout" feature, which allows an
 external function to be called during pattern matching. See the
 .\" HREF
@@ -83,11 +90,11 @@
 .\"
 documentation for details.
 .P
-8. Subroutine calls (whether recursive or not) were treated as atomic groups up
+9. Subroutine calls (whether recursive or not) were treated as atomic groups up
 to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking
 into subroutine calls is now supported, as in Perl.
 .P
-9. In PCRE2, if any of the backtracking control verbs are used in a group that
+10. In PCRE2, if any of the backtracking control verbs are used in a group that
 is called as a subroutine (whether or not recursively), their effect is
 confined to that group; it does not extend to the surrounding pattern. This is
 not always the case in Perl. In particular, if (*THEN) is present in a group
@@ -95,18 +102,18 @@
 the group does not contain any | characters. Note that such groups are
 processed as anchored at the point where they are tested.
 .P
-10. If a pattern contains more than one backtracking control verb, the first
+11. If a pattern contains more than one backtracking control verb, the first
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
 same as PCRE2, but there are cases where it differs.
 .P
-11. There are some differences that are concerned with the settings of captured
+12. There are some differences that are concerned with the settings of captured
 strings when part of a pattern is repeated. For example, matching "aba" against
 the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 "b".
 .P
-12. PCRE2's handling of duplicate capture group numbers and names is not as
+13. PCRE2's handling of duplicate capture group numbers and names is not as
 general as Perl's. This is a consequence of the fact the PCRE2 works internally
 just with numbers, using an external table to translate between numbers and
 names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two
@@ -115,37 +122,38 @@
 to distinguish which group matched, because both names map to capture group
 number 1. To avoid this confusing situation, an error is given at compile time.
 .P
-13. Perl used to recognize comments in some places that PCRE2 does not, for
+14. Perl used to recognize comments in some places that PCRE2 does not, for
 example, between the ( and ? at the start of a group. If the /x modifier is
 set, Perl allowed white space between ( and ? though the latest Perls give an
 error (for a while it was just deprecated). There may still be some cases where
 Perl behaves differently.
 .P
-14. Perl, when in warning mode, gives warnings for character classes such as
+15. Perl, when in warning mode, gives warnings for character classes such as
 [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
 warning features, so it gives an error in these cases because they are almost
 certainly user mistakes.
 .P
-15. In PCRE2, the upper/lower case character properties Lu and Ll are not
+16. In PCRE2, the upper/lower case character properties Lu and Ll are not
 affected when case-independent matching is specified. For example, \ep{Lu}
 always matches an upper case letter. I think Perl has changed in this respect;
-in the release at the time of writing (5.32), \ep{Lu} and \ep{Ll} match all
+in the release at the time of writing (5.34), \ep{Lu} and \ep{Ll} match all
 letters, regardless of case, when case independence is specified.
 .P
-16. From release 5.32.0, Perl locks out the use of \eK in lookaround
+17. From release 5.32.0, Perl locks out the use of \eK in lookaround
 assertions. From release 10.38 PCRE2 does the same by default. However, there
 is an option for re-enabling the previous behaviour. When this option is set,
 \eK is acted on when it occurs in positive assertions, but is ignored in
 negative assertions.
 .P
-17. PCRE2 provides some extensions to the Perl regular expression facilities.
+18. PCRE2 provides some extensions to the Perl regular expression facilities.
 Perl 5.10 included new features that were not in earlier versions of Perl, some
 of which (such as named parentheses) were in PCRE2 for some time before. This
-list is with respect to Perl 5.32:
+list is with respect to Perl 5.34:
 .sp
 (a) Although lookbehind assertions in PCRE2 must match fixed length strings,
 each alternative toplevel branch of a lookbehind assertion can match a
-different length of string. Perl requires them all to have the same length.
+different length of string. Perl used to require them all to have the same
+length, but the latest version has some variable length support.
 .sp
 (b) From PCRE2 10.23, backreferences to groups of fixed length are supported
 in lookbehinds, provided that there is no possibility of referencing a
@@ -186,11 +194,11 @@
 extension to the lookaround facilities. The default, Perl-compatible
 lookarounds are atomic.
 .P
-18. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
+19. The Perl /a modifier restricts /d numbers to pure ascii, and the /aa
 modifier restricts /i case-insensitive matching to pure ascii, ignoring Unicode
 rules. This separation cannot be represented with PCRE2_UCP.
 .P
-19. Perl has different limits than PCRE2. See the
+20. Perl has different limits than PCRE2. See the
 .\" HREF
 \fBpcre2limit\fP
 .\"
@@ -214,6 +222,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
+Last updated: 08 December 2021
 Copyright (c) 1997-2021 University of Cambridge.
 .fi
diff --git a/doc/pcre2jit.3 b/doc/pcre2jit.3
index 9b77550..f0b3b15 100644
--- a/doc/pcre2jit.3
+++ b/doc/pcre2jit.3
@@ -1,4 +1,4 @@
-.TH PCRE2JIT 3 "23 May 2019" "PCRE2 10.34"
+.TH PCRE2JIT 3 "30 November 2021" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@@ -251,11 +251,11 @@
 starts another match, that match must use a different JIT stack to the one used
 for currently suspended match(es).
 .P
-In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
-each thread so that the application is thread-safe.
+In a multithread application, if you do not specify a JIT stack, or if you
+assign or pass back NULL from a callback, that is thread-safe, because each
+thread has its own machine stack. However, if you assign or pass back a
+non-NULL JIT stack, this must be a different stack for each thread so that the
+application is thread-safe.
 .P
 Strictly speaking, even more is allowed. You can assign the same non-NULL stack
 to a match context that is used by any number of patterns, as long as they are
@@ -355,8 +355,8 @@
 .B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP);
 .fi
 .P
-The JIT executable allocator does not free all memory when it is possible.
-It expects new allocations, and keeps some free memory around to improve
+The JIT executable allocator does not free all memory when it is possible. It
+expects new allocations, and keeps some free memory around to improve
 allocation speed. However, in low memory conditions, it might be better to free
 all possible memory. You can cause this to happen by calling
 pcre2_jit_free_unused_memory(). Its argument is a general context, for custom
@@ -416,10 +416,10 @@
 .P
 When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
-the subject pointer is NULL, an immediate error is given. Also, unless
-PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the
-interests of speed, these checks do not happen on the JIT fast path, and if
-invalid data is passed, the result is undefined.
+the subject pointer is NULL but the length is non-zero, an immediate error is
+given. Also, unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested
+for validity. In the interests of speed, these checks do not happen on the JIT
+fast path, and if invalid data is passed, the result is undefined.
 .P
 Bypassing the sanity checks and the \fBpcre2_match()\fP wrapping can give
 speedups of more than 10%.
@@ -445,6 +445,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 May 2019
-Copyright (c) 1997-2019 University of Cambridge.
+Last updated: 30 November 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi
diff --git a/doc/pcre2pattern.3 b/doc/pcre2pattern.3
index 627f229..3088ec0 100644
--- a/doc/pcre2pattern.3
+++ b/doc/pcre2pattern.3
@@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "3o0 August 2021" "PCRE2 10.38"
+.TH PCRE2PATTERN 3 "12 January 2022" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -509,7 +509,6 @@
 .\" JOIN
   \e377   might be a backreference, otherwise
             the value 255 (decimal)
-.\" JOIN
   \e81    is always a backreference
 .sp
 Note that octal values of 100 or greater that are specified using this syntax
@@ -773,200 +772,64 @@
 sequences are of course limited to testing characters whose code points are
 less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points
 greater than 0x10ffff (the Unicode limit) may be encountered. These are all
-treated as being in the Unknown script and with an unassigned type. The extra
-escape sequences are:
+treated as being in the Unknown script and with an unassigned type.
+.P
+Matching characters by Unicode property is not fast, because PCRE2 has to do a
+multistage table lookup in order to find a character's property. That is why
+the traditional escape sequences such as \ed and \ew do not use Unicode
+properties in PCRE2 by default, though you can make them do so by setting the
+PCRE2_UCP option or by starting the pattern with (*UCP).
+.P
+The extra escape sequences that provide property support are:
 .sp
   \ep{\fIxx\fP}   a character with the \fIxx\fP property
   \eP{\fIxx\fP}   a character without the \fIxx\fP property
   \eX       a Unicode extended grapheme cluster
 .sp
-The property names represented by \fIxx\fP above are case-sensitive. There is
-support for Unicode script names, Unicode general category properties, "Any",
-which matches any character (including newline), and some special PCRE2
-properties (described in the
+The property names represented by \fIxx\fP above are not case-sensitive, and in
+accordance with Unicode's "loose matching" rules, spaces, hyphens, and
+underscores are ignored. There is support for Unicode script names, Unicode
+general category properties, "Any", which matches any character (including
+newline), Bidi_Class, a number of binary (yes/no) properties, and some special
+PCRE2 properties (described
 .\" HTML <a href="#extraprops">
 .\" </a>
-next section).
+below).
 .\"
-Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2.
-Note that \eP{Any} does not match any characters, so always causes a match
-failure.
+Certain other Perl properties such as "InMusicalSymbols" are not supported by
+PCRE2. Note that \eP{Any} does not match any characters, so always causes a
+match failure.
+.
+.
+.
+.SS "Script properties for \ep and \eP"
+.rs
+.sp
+There are three different syntax forms for matching a script. Each Unicode
+character has a basic script and, optionally, a list of other scripts ("Script
+Extensions") with which it is commonly used. Using the Adlam script as an
+example, \ep{sc:Adlam} matches characters whose basic script is Adlam, whereas
+\ep{scx:Adlam} matches, in addition, characters that have Adlam in their
+extensions list. The full names "script" and "script extensions" for the
+property types are recognized, and an equals sign is an alternative to the
+colon. If a script name is given without a property type, for example,
+\ep{Adlam}, it is treated as \ep{scx:Adlam}. Perl changed to this
+interpretation at release 5.26 and PCRE2 changed at release 10.40.
 .P
-Sets of Unicode characters are defined as belonging to certain scripts. A
-character from one of these sets can be matched using a script name. For
-example:
-.sp
-  \ep{Greek}
-  \eP{Han}
-.sp
 Unassigned characters (and in non-UTF 32-bit mode, characters with code points
 greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not
 part of an identified script are lumped together as "Common". The current list
-of scripts is:
-.P
-Adlam,
-Ahom,
-Anatolian_Hieroglyphs,
-Arabic,
-Armenian,
-Avestan,
-Balinese,
-Bamum,
-Bassa_Vah,
-Batak,
-Bengali,
-Bhaiksuki,
-Bopomofo,
-Brahmi,
-Braille,
-Buginese,
-Buhid,
-Canadian_Aboriginal,
-Carian,
-Caucasian_Albanian,
-Chakma,
-Cham,
-Cherokee,
-Chorasmian,
-Common,
-Coptic,
-Cuneiform,
-Cypriot,
-Cypro_Minoan,
-Cyrillic,
-Deseret,
-Devanagari,
-Dives_Akuru,
-Dogra,
-Duployan,
-Egyptian_Hieroglyphs,
-Elbasan,
-Elymaic,
-Ethiopic,
-Georgian,
-Glagolitic,
-Gothic,
-Grantha,
-Greek,
-Gujarati,
-Gunjala_Gondi,
-Gurmukhi,
-Han,
-Hangul,
-Hanifi_Rohingya,
-Hanunoo,
-Hatran,
-Hebrew,
-Hiragana,
-Imperial_Aramaic,
-Inherited,
-Inscriptional_Pahlavi,
-Inscriptional_Parthian,
-Javanese,
-Kaithi,
-Kannada,
-Katakana,
-Kayah_Li,
-Kharoshthi,
-Khitan_Small_Script,
-Khmer,
-Khojki,
-Khudawadi,
-Lao,
-Latin,
-Lepcha,
-Limbu,
-Linear_A,
-Linear_B,
-Lisu,
-Lycian,
-Lydian,
-Mahajani,
-Makasar,
-Malayalam,
-Mandaic,
-Manichaean,
-Marchen,
-Masaram_Gondi,
-Medefaidrin,
-Meetei_Mayek,
-Mende_Kikakui,
-Meroitic_Cursive,
-Meroitic_Hieroglyphs,
-Miao,
-Modi,
-Mongolian,
-Mro,
-Multani,
-Myanmar,
-Nabataean,
-Nandinagari,
-New_Tai_Lue,
-Newa,
-Nko,
-Nushu,
-Nyakeng_Puachue_Hmong,
-Ogham,
-Ol_Chiki,
-Old_Hungarian,
-Old_Italic,
-Old_North_Arabian,
-Old_Permic,
-Old_Persian,
-Old_Sogdian,
-Old_South_Arabian,
-Old_Turkic,
-Old_Uyghur,
-Oriya,
-Osage,
-Osmanya,
-Pahawh_Hmong,
-Palmyrene,
-Pau_Cin_Hau,
-Phags_Pa,
-Phoenician,
-Psalter_Pahlavi,
-Rejang,
-Runic,
-Samaritan,
-Saurashtra,
-Sharada,
-Shavian,
-Siddham,
-SignWriting,
-Sinhala,
-Sogdian,
-Sora_Sompeng,
-Soyombo,
-Sundanese,
-Syloti_Nagri,
-Syriac,
-Tagalog,
-Tagbanwa,
-Tai_Le,
-Tai_Tham,
-Tai_Viet,
-Takri,
-Tamil,
-Tangsa,
-Tangut,
-Telugu,
-Thaana,
-Thai,
-Tibetan,
-Tifinagh,
-Tirhuta,
-Toto,
-Ugaritic,
-Unknown,
-Vai,
-Vithkuqi,
-Wancho,
-Warang_Citi,
-Yezidi,
-Yi,
-Zanabazar_Square.
-.P
+of recognized script names and their 4-character abbreviations can be obtained
+by running this command:
+.sp
+  pcre2test -LS
+.sp
+.
+.
+.
+.SS "The general category property for \ep and \eP"
+.rs
+.sp
 Each character has exactly one Unicode general category property, specified by
 a two-letter abbreviation. For compatibility with Perl, negation can be
 specified by including a circumflex between the opening brace and the property
@@ -1026,9 +889,9 @@
   Zp    Paragraph separator
   Zs    Space separator
 .sp
-The special property L& is also supported: it matches a character that has
-the Lu, Ll, or Lt property, in other words, a letter that is not classified as
-a modifier or "other".
+The special property LC, which has the synonym L&, is also supported: it
+matches a character that has the Lu, Ll, or Lt property, in other words, a
+letter that is not classified as a modifier or "other".
 .P
 The Cs (Surrogate) property applies only to characters whose code points are in
 the range U+D800 to U+DFFF. These characters are no different to any other
@@ -1052,12 +915,53 @@
 Specifying caseless matching does not affect these escape sequences. For
 example, \ep{Lu} always matches only upper case letters. This is different from
 the behaviour of current versions of Perl.
-.P
-Matching characters by Unicode property is not fast, because PCRE2 has to do a
-multistage table lookup in order to find a character's property. That is why
-the traditional escape sequences such as \ed and \ew do not use Unicode
-properties in PCRE2 by default, though you can make them do so by setting the
-PCRE2_UCP option or by starting the pattern with (*UCP).
+.
+.
+.SS "Binary (yes/no) properties for \ep and \eP"
+.rs
+.sp
+Unicode defines a number of binary properties, that is, properties whose only
+values are true or false. You can obtain a list of those that are recognized by
+\ep and \eP, along with their abbreviations, by running this command:
+.sp
+  pcre2test -LP
+.sp
+.
+.
+.SS "The Bidi_Class property for \ep and \eP"
+.rs
+.sp
+  \ep{Bidi_Class:<class>}   matches a character with the given class
+  \ep{BC:<class>}           matches a character with the given class
+.sp
+The recognized classes are:
+.sp
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+.sp
+An equals sign may be used instead of a colon. The class names are
+case-insensitive; only the short names listed above are recognized.
 .
 .
 .SS Extended grapheme clusters
@@ -1336,14 +1240,19 @@
 .sp
 Outside a character class, a dot in the pattern matches any one character in
 the subject string except (by default) a character that signifies the end of a
-line.
+line. One or more characters may be specified as line terminators (see
+.\" HTML <a href="#newlines">
+.\" </a>
+"Newline conventions"
+.\"
+above).
 .P
-When a line ending is defined as a single character, dot never matches that
-character; when the two-character sequence CRLF is used, dot does not match CR
-if it is immediately followed by LF, but otherwise it matches all characters
-(including isolated CRs and LFs). When any Unicode line endings are being
-recognized, dot does not match CR or LF or any of the other line ending
-characters.
+Dot never matches a single line-ending character. When the two-character
+sequence CRLF is the only line ending, dot does not match CR if it is
+immediately followed by LF, but otherwise it matches all characters (including
+isolated CRs and LFs). When ANYCRLF is selected for line endings, no occurrences
+of CR or LF match dot. When all Unicode line endings are being recognized, dot
+does not match CR or LF or any of the other line ending characters.
 .P
 The behaviour of dot with regard to newlines can be changed. If the
 PCRE2_DOTALL option is set, a dot matches any one character, without exception.
@@ -2186,10 +2095,10 @@
 .sp
   (*atomic:\ed+)foo
 .sp
-This kind of parenthesized group "locks up" the  part of the pattern it
-contains once it has matched, and a failure further into the pattern is
-prevented from backtracking into it. Backtracking past it to previous items,
-however, works as normal.
+This kind of parenthesized group "locks up" the part of the pattern it contains
+once it has matched, and a failure further into the pattern is prevented from
+backtracking into it. Backtracking past it to previous items, however, works as
+normal.
 .P
 An alternative description is that a group of this type matches exactly the
 string of characters that an identical standalone pattern would match, if
@@ -3905,6 +3814,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
-Copyright (c) 1997-2021 University of Cambridge.
+Last updated: 12 January 2022
+Copyright (c) 1997-2022 University of Cambridge.
 .fi
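Note on the pcre2pattern.3 changes above: the LC property (Lu, Ll, or Lt) and
the Bidi_Class short names can be cross-checked against Python's standard
unicodedata module. This is only an illustrative sketch, not PCRE2 code. The
helper names is_cased_letter and bidi_class are hypothetical, and Python's
Unicode tables may be a different Unicode version from the one PCRE2 was built
with, so results for recently added characters can differ.

```python
import unicodedata

# Assumption: LC (synonym L&) is modelled as membership in {Lu, Ll, Lt},
# following the description in the man page text above.
CASED_LETTER = {"Lu", "Ll", "Lt"}


def is_cased_letter(ch: str) -> bool:
    """Return True if ch has general category Lu, Ll, or Lt."""
    return unicodedata.category(ch) in CASED_LETTER


def bidi_class(ch: str) -> str:
    """Return the short Bidi_Class name (for example "L", "R", "AL").

    Python returns an empty string when no class is recorded for ch.
    """
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for ch in ("A", "a", "\u01c5", "\u02b0", "\u05d0", "\u0627", "1", " "):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch),
              is_cased_letter(ch), bidi_class(ch))
```

A pattern item such as \ep{BC:AL} is expected to accept the characters for
which bidi_class() returns "AL", subject to the Unicode version caveat above.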
diff --git a/doc/pcre2serialize.3 b/doc/pcre2serialize.3
index 85aee9b..987bc3a 100644
--- a/doc/pcre2serialize.3
+++ b/doc/pcre2serialize.3
@@ -6,11 +6,11 @@
 .sp
 .nf
 .B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
-.B "  int32_t \fInumber_of_codes\fP, const uint32_t *\fIbytes\fP,"
+.B "  int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
 .B "  pcre2_general_context *\fIgcontext\fP);"
 .sp
-.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
-.B "  int32_t \fInumber_of_codes\fP, uint32_t **\fIserialized_bytes\fP,"
+.B int32_t pcre2_serialize_encode(const pcre2_code **\fIcodes\fP,
+.B "  int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
 .B "  PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
 .sp
 .B void pcre2_serialize_free(uint8_t *\fIbytes\fP);
@@ -141,7 +141,6 @@
 \fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte
 stream is no longer needed and can be discarded.
 .sp
-  int32_t number_of_codes;
   pcre2_code *list_of_codes[2];
   uint8_t *bytes = <serialized data>;
   int32_t number_of_codes =
diff --git a/doc/pcre2syntax.3 b/doc/pcre2syntax.3
index 937c817..c0a496f 100644
--- a/doc/pcre2syntax.3
+++ b/doc/pcre2syntax.3
@@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "30 August 2021" "PCRE2 10.38"
+.TH PCRE2SYNTAX 3 "12 January 2022" "PCRE2 10.40"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@@ -102,6 +102,10 @@
 128-255. If the PCRE2_UCP option is set, the behaviour of these escape
 sequences is changed to use Unicode properties and they match many more
 characters.
+.P
+Property descriptions in \ep and \eP are matched caselessly; hyphens,
+underscores, and white space are ignored, in accordance with Unicode's "loose
+matching" rules.
 .
 .
 .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
@@ -120,6 +124,7 @@
   Lo         Other letter
   Lt         Title case letter
   Lu         Upper case letter
+  Lc         Ll, Lu, or Lt
   L&         Ll, Lu, or Lt
 .sp
   M          Mark
@@ -167,170 +172,59 @@
 at release 5.18.
 .
 .
-.SH "SCRIPT NAMES FOR \ep AND \eP"
+.SH "BINARY PROPERTIES FOR \ep AND \eP"
 .rs
 .sp
-Adlam,
-Ahom,
-Anatolian_Hieroglyphs,
-Arabic,
-Armenian,
-Avestan,
-Balinese,
-Bamum,
-Bassa_Vah,
-Batak,
-Bengali,
-Bhaiksuki,
-Bopomofo,
-Brahmi,
-Braille,
-Buginese,
-Buhid,
-Canadian_Aboriginal,
-Carian,
-Caucasian_Albanian,
-Chakma,
-Cham,
-Cherokee,
-Chorasmian,
-Common,
-Coptic,
-Cuneiform,
-Cypriot,
-Cypro_Minoan,
-Cyrillic,
-Deseret,
-Devanagari,
-Dives_Akuru,
-Dogra,
-Duployan,
-Egyptian_Hieroglyphs,
-Elbasan,
-Elymaic,
-Ethiopic,
-Georgian,
-Glagolitic,
-Gothic,
-Grantha,
-Greek,
-Gujarati,
-Gunjala_Gondi,
-Gurmukhi,
-Han,
-Hangul,
-Hanifi_Rohingya,
-Hanunoo,
-Hatran,
-Hebrew,
-Hiragana,
-Imperial_Aramaic,
-Inherited,
-Inscriptional_Pahlavi,
-Inscriptional_Parthian,
-Javanese,
-Kaithi,
-Kannada,
-Katakana,
-Kayah_Li,
-Kharoshthi,
-Khitan_Small_Script,
-Khmer,
-Khojki,
-Khudawadi,
-Lao,
-Latin,
-Lepcha,
-Limbu,
-Linear_A,
-Linear_B,
-Lisu,
-Lycian,
-Lydian,
-Mahajani,
-Makasar,
-Malayalam,
-Mandaic,
-Manichaean,
-Marchen,
-Masaram_Gondi,
-Medefaidrin,
-Meetei_Mayek,
-Mende_Kikakui,
-Meroitic_Cursive,
-Meroitic_Hieroglyphs,
-Miao,
-Modi,
-Mongolian,
-Mro,
-Multani,
-Myanmar,
-Nabataean,
-Nandinagari,
-New_Tai_Lue,
-Newa,
-Nko,
-Nushu,
-Nyakeng_Puachue_Hmong,
-Ogham,
-Ol_Chiki,
-Old_Hungarian,
-Old_Italic,
-Old_North_Arabian,
-Old_Permic,
-Old_Persian,
-Old_Sogdian,
-Old_South_Arabian,
-Old_Turkic,
-Old_Uyghur,
-Oriya,
-Osage,
-Osmanya,
-Pahawh_Hmong,
-Palmyrene,
-Pau_Cin_Hau,
-Phags_Pa,
-Phoenician,
-Psalter_Pahlavi,
-Rejang,
-Runic,
-Samaritan,
-Saurashtra,
-Sharada,
-Shavian,
-Siddham,
-SignWriting,
-Sinhala,
-Sogdian,
-Sora_Sompeng,
-Soyombo,
-Sundanese,
-Syloti_Nagri,
-Syriac,
-Tagalog,
-Tagbanwa,
-Tai_Le,
-Tai_Tham,
-Tai_Viet,
-Takri,
-Tamil,
-Tangsa,
-Tangut,
-Telugu,
-Thaana,
-Thai,
-Tibetan,
-Tifinagh,
-Tirhuta,
-Toto,
-Ugaritic,
-Vai,
-Vithkuqi,
-Wancho,
-Warang_Citi,
-Yezidi,
-Yi,
-Zanabazar_Square.
+Unicode defines a number of binary properties, that is, properties whose only
+values are true or false. You can obtain a list of those that are recognized by
+\ep and \eP, along with their abbreviations, by running this command:
+.sp
+  pcre2test -LP
+.
+.
+.
+.SH "SCRIPT MATCHING WITH \ep AND \eP"
+.rs
+.sp
+Many script names and their 4-letter abbreviations are recognized in
+\ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of
+course). You can obtain a list of these scripts by running this command:
+.sp
+  pcre2test -LS
+.
+.
+.
+.SH "THE BIDI_CLASS PROPERTY FOR \ep AND \eP"
+.rs
+.sp
+  \ep{Bidi_Class:<class>}   matches a character with the given class
+  \ep{BC:<class>}           matches a character with the given class
+.sp
+The recognized classes are:
+.sp
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
 .
 .
 .SH "CHARACTER CLASSES"
@@ -684,6 +578,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
-Copyright (c) 1997-2021 University of Cambridge.
+Last updated: 12 January 2022
+Copyright (c) 1997-2022 University of Cambridge.
 .fi
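Note on the "loose matching" paragraph added to pcre2syntax.3 above: the rule
(caseless comparison, with hyphens, underscores, and white space ignored) can
be sketched as a simple name normalization. This is a minimal illustration in
Python and makes no claim about how PCRE2 implements the lookup; the function
names loose_key and same_property are hypothetical.

```python
def loose_key(name: str) -> str:
    """Normalize a property name for loose comparison.

    Lower-cases the name and drops hyphens, underscores, and white space.
    """
    return "".join(
        c for c in name.lower() if not (c.isspace() or c in "-_")
    )


def same_property(a: str, b: str) -> bool:
    """Return True if two spellings name the same property loosely."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    print(loose_key("Bidi_Class"))
    print(same_property("Old-Italic", "old italic"))
    print(same_property("Greek", "Han"))
```

Under this rule, spellings such as Bidi_Class, bidi-class, and BIDI CLASS
would all be treated as the same property name.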
diff --git a/doc/pcre2test.1 b/doc/pcre2test.1
index d98e974..d374f3e 100644
--- a/doc/pcre2test.1
+++ b/doc/pcre2test.1
@@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "30 August 2021" "PCRE 10.38"
+.TH PCRE2TEST 1 "12 January 2022" "PCRE 10.40"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@@ -47,7 +47,7 @@
 to 8-bit code units for output.
 .P
 In the rest of this document, the names of library functions and structures
-are given in generic form, for example, \fBpcre_compile()\fP. The actual
+are given in generic form, for example, \fBpcre2_compile()\fP. The actual
 names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 .
 .
@@ -211,7 +211,17 @@
 \fB-LM\fP
 List modifiers: write a list of available pattern and subject modifiers to the
 standard output, then exit with zero exit code. All other options are ignored.
-If both -C and -LM are present, whichever is first is recognized.
+If both -C and any -Lx options are present, whichever is first is recognized.
+.TP 10
+\fB-LP\fP
+List properties: write a list of recognized Unicode properties to the standard
+output, then exit with zero exit code. All other options are ignored. If both
+-C and any -Lx options are present, whichever is first is recognized.
+.TP 10
+\fB-LS\fP
+List scripts: write a list of recognized Unicode script names to the standard
+output, then exit with zero exit code. All other options are ignored. If both
+-C and any -Lx options are present, whichever is first is recognized.
 .TP 10
 \fB-pattern\fP \fImodifier-list\fP
 Behave as if each pattern line contains the given modifiers.
@@ -1206,6 +1216,8 @@
       match_limit=<n>            set a match limit
       memory                     show heap memory usage
       null_context               match with a NULL context
+      null_replacement           substitute with NULL replacement
+      null_subject               match with NULL subject
       offset=<n>                 set starting offset
       offset_limit=<n>           set offset limit
       ovector=<n>                set size of output vector
@@ -1629,7 +1641,7 @@
 passing the replacement string as zero-terminated.
 .
 .
-.SS "Passing a NULL context"
+.SS "Passing a NULL context, subject, or replacement"
 .rs
 .sp
 Normally, \fBpcre2test\fP passes a context block to \fBpcre2_match()\fP,
@@ -1638,6 +1650,10 @@
 testing that the matching and substitution functions behave correctly in this
 case (they use default values). This modifier cannot be used with the
 \fBfind_limits\fP or \fBsubstitute_callout\fP modifiers.
+.P
+Similarly, for testing purposes, if the \fBnull_subject\fP or
+\fBnull_replacement\fP modifier is set, the subject or replacement string
+pointers are passed as NULL, respectively, to the relevant functions.
 .
 .
 .SH "THE ALTERNATIVE MATCHING FUNCTION"
@@ -2103,6 +2119,6 @@
 .rs
 .sp
 .nf
-Last updated: 30 August 2021
-Copyright (c) 1997-2021 University of Cambridge.
+Last updated: 12 January 2022
+Copyright (c) 1997-2022 University of Cambridge.
 .fi
diff --git a/doc/pcre2test.txt b/doc/pcre2test.txt
index 217bed5..ed7dd20 100644
--- a/doc/pcre2test.txt
+++ b/doc/pcre2test.txt
@@ -44,7 +44,7 @@
        output.
 
        In the rest of this document, the names of library functions and struc-
-       tures  are  given in generic form, for example, pcre_compile(). The ac-
+       tures  are given in generic form, for example, pcre2_compile(). The ac-
        tual names used in the libraries have a suffix _8, _16, or _32, as  ap-
        propriate.
 
@@ -197,7 +197,17 @@
 
        -LM       List modifiers: write a list of available pattern and subject
                  modifiers to the standard output, then exit  with  zero  exit
-                 code.  All other options are ignored.  If both -C and -LM are
+                 code.  All other options are ignored.  If both -C and any -Lx
+                 options are present, whichever is first is recognized.
+
+       -LP       List properties: write a list of recognized  Unicode  proper-
+                 ties  to  the standard output, then exit with zero exit code.
+                 All other options are ignored. If both -C and any -Lx options
+                 are present, whichever is first is recognized.
+
+       -LS       List scripts: write a list of recognized Unicode script names
+                 to the standard output, then exit with zero  exit  code.  All
+                 other options are ignored. If both -C and any -Lx options are
                  present, whichever is first is recognized.
 
        -pattern modifier-list
@@ -1111,6 +1121,8 @@
              match_limit=<n>            set a match limit
              memory                     show heap memory usage
              null_context               match with a NULL context
+             null_replacement           substitute with NULL replacement
+             null_subject               match with NULL subject
              offset=<n>                 set starting offset
              offset_limit=<n>           set offset limit
              ovector=<n>                set size of output vector
@@ -1499,7 +1511,7 @@
        When testing pcre2_substitute(), this modifier also has the  effect  of
        passing the replacement string as zero-terminated.
 
-   Passing a NULL context
+   Passing a NULL context, subject, or replacement
 
        Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
        pcre2_dfa_match(), pcre2_jit_match()  or  pcre2_substitute().   If  the
@@ -1508,6 +1520,10 @@
        in  this  case  (they use default values). This modifier cannot be used
        with the find_limits or substitute_callout modifiers.
 
+       Similarly, for testing purposes, if the null_subject  or  null_replace-
+       ment  modifier  is  set, the subject or replacement string pointers are
+       passed as NULL, respectively, to the relevant functions.
+
 
 THE ALTERNATIVE MATCHING FUNCTION
 
@@ -1933,5 +1949,5 @@
 
 REVISION
 
-       Last updated: 30 August 2021
-       Copyright (c) 1997-2021 University of Cambridge.
+       Last updated: 12 January 2022
+       Copyright (c) 1997-2022 University of Cambridge.
diff --git a/doc/pcre2unicode.3 b/doc/pcre2unicode.3
index 055a4ce..e7e37a3 100644
--- a/doc/pcre2unicode.3
+++ b/doc/pcre2unicode.3
@@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "23 February 2020" "PCRE2 10.35"
+.TH PCRE2UNICODE 3 "22 December 2021" "PCRE2 10.40"
 .SH NAME
 PCRE - Perl-compatible regular expressions (revised API)
 .SH "UNICODE AND UTF SUPPORT"
@@ -40,10 +40,11 @@
 .sp
 When PCRE2 is built with Unicode support, the escape sequences \ep{..},
 \eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting.
-The Unicode properties that can be tested are limited to the general category
-properties such as Lu for an upper case letter or Nd for a decimal number, the
-Unicode script names such as Arabic or Han, and the derived properties Any and
-L&. Full lists are given in the
+The Unicode properties that can be tested are a subset of those that Perl
+supports. Currently they are limited to the general category properties such as
+Lu for an upper case letter or Nd for a decimal number, the Unicode script
+names such as Arabic or Han, Bidi_Class, Bidi_Control, and the derived
+properties Any and LC (synonym L&). Full lists are given in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
@@ -51,10 +52,10 @@
 .\" HREF
 \fBpcre2syntax\fP
 .\"
-documentation. Only the short names for properties are supported. For example,
-\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
-Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
-compatibility with Perl 5.6. PCRE2 does not support this.
+documentation. In general, only the short names for properties are supported.
+For example, \ep{L} matches a letter. Its longer synonym, \ep{Letter}, is not
+supported. Furthermore, in Perl, many properties may optionally be prefixed by
+"Is", for compatibility with Perl 5.6. PCRE2 does not support this.
 .
 .
 .SH "WIDE CHARACTERS AND UTF MODES"
@@ -448,7 +449,7 @@
 .sp
 .nf
 Philip Hazel
-University Computing Service
+Retired from University Computing Service
 Cambridge, England.
 .fi
 .
@@ -457,6 +458,6 @@
 .rs
 .sp
 .nf
-Last updated: 23 February 2020
-Copyright (c) 1997-2020 University of Cambridge.
+Last updated: 22 December 2021
+Copyright (c) 1997-2021 University of Cambridge.
 .fi