Make the Android fast-path UTF-8 decoder follow the Unicode Standard and the W3C Encoding standard.

Since OpenJDK 8 (JDK-7096080), the UTF-8 decoder in the RI strictly follows
the Unicode Standard. In essence, it
1. rejects 3-byte surrogates and 6-byte surrogate pairs (CESU-8 sequences), and
2. treats an ill-formed sequence, e.g. an encoded surrogate, as individual ill-formed bytes.

This change updates Android's fast-path UTF-8 decoder to
- follow the Unicode Standard,
- behave more like the RI (OpenJDK 8), and
- behave consistently with java.nio.charset.CharsetDecoder.
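A minimal sketch, assuming a desktop JDK (the RI), of what "consistent behavior" means in practice: the same bytes decoded through the String constructor and through an explicit java.nio.charset.CharsetDecoder set to CodingErrorAction.REPLACE should yield identical strings. The helper class Utf8PathComparison is hypothetical. The sample "41 C0 AF 41 F4 80 80 41" contains an invalid lead byte, a stray continuation byte and a truncated 4-byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing two UTF-8 decoding paths on a desktop JDK.
final class Utf8PathComparison {
    private Utf8PathComparison() {}

    // String constructor path: malformed input is implicitly replaced with U+FFFD.
    static String viaStringConstructor(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // java.nio path: an explicit CharsetDecoder configured to replace malformed input.
    static String viaCharsetDecoder(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode(ByteBuffer) declares this checked exception.
            throw new IllegalStateException(e);
        }
    }
}
```

On the RI both calls return "A\uFFFD\uFFFDA\uFFFDA" for the sample above; this change aims for the same agreement on Android.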

This change implements the UTF-8 decoder recommended by the W3C Encoding standard:
https://www.w3.org/TR/encoding/#utf-8-decoder
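The W3C algorithm can be sketched in Java as below. This is an illustrative transcription of the linked spec in replacement mode, not the libcore fast-path code; the class name W3cUtf8Decoder is hypothetical. The decoder counts the continuation bytes still needed and narrows the range accepted for the first continuation byte after E0, ED, F0 and F4, which is what rejects overlong forms, encoded surrogates and code points above U+10FFFF.

```java
// Hypothetical sketch of the W3C Encoding standard UTF-8 decoder, replacement mode.
final class W3cUtf8Decoder {
    private W3cUtf8Decoder() {}

    static String decode(byte[] input) {
        StringBuilder out = new StringBuilder(input.length);
        int codePoint = 0;
        int bytesSeen = 0;
        int bytesNeeded = 0;
        int lower = 0x80; // lower boundary for the next continuation byte
        int upper = 0xBF; // upper boundary for the next continuation byte

        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            if (bytesNeeded == 0) {
                i++;
                if (b <= 0x7F) {
                    out.append((char) b);
                } else if (b >= 0xC2 && b <= 0xDF) {
                    bytesNeeded = 1;
                    codePoint = b & 0x1F;
                } else if (b >= 0xE0 && b <= 0xEF) {
                    if (b == 0xE0) lower = 0xA0; // rejects overlong 3-byte forms
                    if (b == 0xED) upper = 0x9F; // rejects encoded surrogates U+D800..U+DFFF
                    bytesNeeded = 2;
                    codePoint = b & 0x0F;
                } else if (b >= 0xF0 && b <= 0xF4) {
                    if (b == 0xF0) lower = 0x90; // rejects overlong 4-byte forms
                    if (b == 0xF4) upper = 0x8F; // rejects code points above U+10FFFF
                    bytesNeeded = 3;
                    codePoint = b & 0x07;
                } else {
                    // 0x80..0xC1 and 0xF5..0xFF can never start a well-formed sequence.
                    out.append('\uFFFD');
                }
                continue;
            }
            if (b < lower || b > upper) {
                // The bytes consumed so far form one maximal subpart: emit a single U+FFFD,
                // reset the state, and reprocess this byte (i is deliberately not advanced).
                codePoint = 0;
                bytesNeeded = 0;
                bytesSeen = 0;
                lower = 0x80;
                upper = 0xBF;
                out.append('\uFFFD');
                continue;
            }
            i++;
            lower = 0x80;
            upper = 0xBF;
            codePoint = (codePoint << 6) | (b & 0x3F);
            bytesSeen++;
            if (bytesSeen == bytesNeeded) {
                out.appendCodePoint(codePoint);
                codePoint = 0;
                bytesNeeded = 0;
                bytesSeen = 0;
            }
        }
        if (bytesNeeded != 0) {
            // Input ended inside a sequence: the truncated tail becomes one U+FFFD.
            out.append('\uFFFD');
        }
        return out.toString();
    }
}
```

Because an out-of-range byte is reprocessed instead of consumed, each maximal subpart yields exactly one U+FFFD; for example, decode(new byte[] {0x41, (byte) 0xC0, (byte) 0xAF, 0x41}) returns "A\uFFFD\uFFFDA".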

Behavior changes in the fast-path UTF-8 decoder:
- No longer decodes Modified UTF-8 or CESU-8 sequences.
  -- If an app needs to decode a Modified UTF-8 / CESU-8 sequence, it can use
     the public API DataInputStream.readUTF or the JNI function NewStringUTF.
     See the example in StringTest.decodeModifiedUTF8.
- Treats overlong sequences as ill-formed.
  For example, the byte sequence "C0 B1" is an overlong form of the character '1' (U+0031).
- Treats encoded surrogates (U+D800..U+DFFF) as ill-formed.
- Replaces each maximal subpart of an ill-formed sequence with a single U+FFFD.
  For example, in the byte sequence "41 C0 AF 41 F4 80 80 41", the maximal subparts are
  "C0", "AF", and "F4 80 80". "F4 80 80" is an initial subsequence of the well-formed
  "F4 80 80 80", whereas "C0" cannot start any well-formed code unit sequence, so "C0"
  and "AF" are separate maximal subparts. Thus, the output is "A\ufffd\ufffdA\ufffdA".
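For apps that relied on the old lenient decoding, a minimal sketch of the DataInputStream.readUTF route mentioned above. readUTF expects a 2-byte big-endian length prefix, so the hypothetical helper ModifiedUtf8Decoding.decode adds one, which limits payloads to 65535 bytes. The sample encodes U+0000 as "C0 80" and U+1F600 as the CESU-8 surrogate pair "ED A0 BD ED B8 80", both of which the updated fast-path decoder treats as ill-formed.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: decodes a raw Modified UTF-8 / CESU-8 payload via DataInputStream.readUTF.
final class ModifiedUtf8Decoding {
    private ModifiedUtf8Decoding() {}

    static String decode(byte[] modifiedUtf8) {
        // readUTF first reads an unsigned 16-bit big-endian length, capping payloads at 65535 bytes.
        if (modifiedUtf8.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too long for readUTF: " + modifiedUtf8.length);
        }
        byte[] framed = new byte[modifiedUtf8.length + 2];
        framed[0] = (byte) (modifiedUtf8.length >>> 8);
        framed[1] = (byte) modifiedUtf8.length;
        System.arraycopy(modifiedUtf8, 0, framed, 2, modifiedUtf8.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            // Includes UTFDataFormatException for bytes that are not valid Modified UTF-8.
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that readUTF also rejects 4-byte UTF-8 sequences, since Modified UTF-8 never uses them, so it is not a drop-in replacement for standard UTF-8 decoding.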

Test changes:
- CharsetEncoder2Test.testUtf8Encoding: a UTF-8-encoded surrogate is now treated as invalid.
- X500PrincipalTest.testValidDN: an overlong sequence is now treated as invalid.
  In my testing, Android Conscrypt (and BoringSSL) has rejected certificates with
  such an overlong sequence in the CN since OC MR1, so there is little use case for
  creating an X500Principal with an overlong UTF-8 sequence.
  The RI doesn't pass this test either.
  Context: As I understand it, certificates and X.500 principals are stored
  in ASN.1 format. The RFCs cited in X500Principal don't prohibit overlong UTF-8
  sequences, but the newer RFC 5280 (X.509) and RFC 3629 (UTF-8) explicitly
  prohibit them.

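Performance change:
The performance of the fast-path decoder is broadly similar before and after the change.
time_new_String_BString and the ASCII and 2-byte benchmarks are 1-9% faster, while
time_bmp3 is about 4% slower and time_supplementary about 11% slower. Each result
below is a single measurement (min = max).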

=== Before the change ===
CharsetBenchmark
Experiment {instrument=runtime, benchmarkMethod=time_new_String_BString, vm=default, parameters={length=10000, name=UTF-8}}
  Results:
    runtime(ns): min=574795.84, 1st qu.=574795.84, median=574795.84, mean=574795.84, 3rd qu.=574795.84, max=574795.84

CharsetUtf8Benchmark
Trial Report (1 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_ascii, vm=default, parameters={}}
  Results:
    runtime(ns): min=58290943.00, 1st qu.=58290943.00, median=58290943.00, mean=58290943.00, 3rd qu.=58290943.00, max=58290943.00
Trial Report (2 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_bmp2, vm=default, parameters={}}
  Results:
    runtime(ns): min=77581414.00, 1st qu.=77581414.00, median=77581414.00, mean=77581414.00, 3rd qu.=77581414.00, max=77581414.00
Trial Report (3 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_bmp3, vm=default, parameters={}}
  Results:
    runtime(ns): min=57457297.00, 1st qu.=57457297.00, median=57457297.00, mean=57457297.00, 3rd qu.=57457297.00, max=57457297.00
Trial Report (4 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_supplementary, vm=default, parameters={}}
  Results:
    runtime(ns): min=60723183.00, 1st qu.=60723183.00, median=60723183.00, mean=60723183.00, 3rd qu.=60723183.00, max=60723183.00

=== After the change ===
CharsetBenchmark
Experiment {instrument=runtime, benchmarkMethod=time_new_String_BString, vm=default, parameters={length=10000, name=UTF-8}}
  Results:
    runtime(ns): min=523638.25, 1st qu.=523638.25, median=523638.25, mean=523638.25, 3rd qu.=523638.25, max=523638.25

CharsetUtf8Benchmark
Trial Report (1 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_ascii, vm=default, parameters={}}
  Results:
    runtime(ns): min=57101725.00, 1st qu.=57101725.00, median=57101725.00, mean=57101725.00, 3rd qu.=57101725.00, max=57101725.00
Trial Report (2 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_bmp2, vm=default, parameters={}}
  Results:
    runtime(ns): min=76573080.00, 1st qu.=76573080.00, median=76573080.00, mean=76573080.00, 3rd qu.=76573080.00, max=76573080.00
Trial Report (3 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_bmp3, vm=default, parameters={}}
  Results:
    runtime(ns): min=59655214.00, 1st qu.=59655214.00, median=59655214.00, mean=59655214.00, 3rd qu.=59655214.00, max=59655214.00
Trial Report (4 of 4):
  Experiment {instrument=runtime, benchmarkMethod=time_supplementary, vm=default, parameters={}}
  Results:
    runtime(ns): min=67283548.00, 1st qu.=67283548.00, median=67283548.00, mean=67283548.00, 3rd qu.=67283548.00, max=67283548.00

Test: cts-tradefed run cts-dev -m CtsLibcoreTestCases
Test: cts-tradefed run cts-dev -m CtsLibcoreOjTestCases
Bug: 69599767
Bug: 70511691
Change-Id: I2c3e84808b19c969905813f6654ba552b6745354
6 files changed