internal/encoding/text: replace use of regular expression in decoding
Improve performance by replacing use of regular expressions with direct
parsing code.
Compared to latest version:
name old time/op new time/op delta
Text/Unmarshal/google_message1_proto2-4 21.8µs ± 5% 14.0µs ± 9% -35.69% (p=0.000 n=10+9)
Text/Unmarshal/google_message1_proto3-4 19.6µs ± 4% 13.8µs ±10% -29.47% (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4 13.4ms ± 4% 4.9ms ± 4% -63.44% (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4 13.8µs ± 2% 14.1µs ± 4% +2.42% (p=0.011 n=9+10)
Text/Marshal/google_message1_proto3-4 11.6µs ± 2% 11.8µs ± 8% ~ (p=0.573 n=8+10)
Text/Marshal/google_message2-4 8.01ms ±48% 5.97ms ± 5% -25.44% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Text/Unmarshal/google_message1_proto2-4 13.0kB ± 0% 12.6kB ± 0% -3.40% (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4 13.0kB ± 0% 12.5kB ± 0% -3.50% (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4 5.67MB ± 0% 5.50MB ± 0% -3.13% (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4 12.0kB ± 0% 12.1kB ± 0% +0.02% (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4 11.7kB ± 0% 11.7kB ± 0% +0.01% (p=0.000 n=10+10)
Text/Marshal/google_message2-4 5.68MB ± 0% 5.68MB ± 0% +0.01% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Text/Unmarshal/google_message1_proto2-4 142 ± 0% 142 ± 0% ~ (all equal)
Text/Unmarshal/google_message1_proto3-4 156 ± 0% 156 ± 0% ~ (all equal)
Text/Unmarshal/google_message2-4 70.1k ± 0% 65.4k ± 0% -6.76% (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4 91.0 ± 0% 91.0 ± 0% ~ (all equal)
Text/Marshal/google_message1_proto3-4 80.0 ± 0% 80.0 ± 0% ~ (all equal)
Text/Marshal/google_message2-4 36.4k ± 0% 36.4k ± 0% ~ (all equal)
Change-Id: Ia5d3c16e9e33961aae03bac0d53fcfc5b1943d2a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/173360
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
diff --git a/internal/encoding/text/encode.go b/internal/encoding/text/encode.go
index 982037b..1f49761 100644
--- a/internal/encoding/text/encode.go
+++ b/internal/encoding/text/encode.go
@@ -6,6 +6,7 @@
import (
"bytes"
+ "regexp"
"strings"
"google.golang.org/protobuf/internal/detrand"
@@ -142,6 +143,12 @@
return nil
}
+// This expression is more liberal than ConsumeAnyTypeUrl in C++.
+// However, the C++ parser does not handle many legal URL strings.
+// The Go implementation is more liberal to be backwards compatible with
+// the historical Go implementation which was overly liberal (and buggy).
+var urlRegexp = regexp.MustCompile(`^[-_a-zA-Z0-9]+([./][-_a-zA-Z0-9]+)*`)
+
func (p *encoder) marshalKey(v Value) error {
switch v.Type() {
case String: