goprotobuf: Change Size implementation to use the same code structure as Marshal (encode).

The new Size is much faster (2x-4x) and makes zero allocations.
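
For illustration only, here is a minimal sketch of why a size pass structured like the encoder does not need to allocate: it walks the same fields but only adds up byte counts instead of writing into a buffer. The helper names sizeVarint and sizeStringField are hypothetical and are not taken from this change; the arithmetic follows the standard protobuf wire format (varint tag, varint length prefix, payload).

```go
package main

import "fmt"

// sizeVarint returns the number of bytes needed to encode x as a
// base-128 varint. Hypothetical helper for illustration; it is not
// copied from the goprotobuf source.
func sizeVarint(x uint64) int {
	n := 1
	for x >= 0x80 {
		x >>= 7
		n++
	}
	return n
}

// sizeStringField returns the encoded size of a length-delimited
// string field: tag + length prefix + payload bytes. Nothing is
// written anywhere, so no buffer is allocated.
func sizeStringField(fieldNum int, s string) int {
	tag := uint64(fieldNum)<<3 | 2 // wire type 2 = length-delimited
	return sizeVarint(tag) + sizeVarint(uint64(len(s))) + len(s)
}

func main() {
	fmt.Println(sizeVarint(300))             // 2
	fmt.Println(sizeStringField(1, "hello")) // 7
}
```

An encoder for the same field would emit those three parts in the same order, which is what makes it natural for the two code paths to share one structure.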

R=r
CC=golang-dev
https://codereview.appspot.com/14430057
diff --git a/proto/lib.go b/proto/lib.go
index fa6fe22..5d5e345 100644
--- a/proto/lib.go
+++ b/proto/lib.go
@@ -223,6 +223,7 @@
 	Decode  uint64 // number of decodes
 	Chit    uint64 // number of cache hits
 	Cmiss   uint64 // number of cache misses
+	Size    uint64 // number of sizes
 }
 
 // Set to true to enable stats collection.