| commit | 953599543e6936b7d7285ecab7273b26da98b97f | |
|---|---|---|
| author | Carl Mastrangelo <notcarl@google.com> | Fri Jul 15 20:00:44 2016 -0700 |
| committer | Carl Mastrangelo <notcarl@google.com> | Mon Jul 18 09:50:47 2016 -0700 |
| tree | 2339a3db1cf7096492d9acccfd7ae277d2ccb4fb | |
| parent | 8f3173617c2e825558c415555e1259ae2fa2e1cf | |
netty: reduce contention in WriteQueue

WriteQueue uses LinkedBlockingQueue, which has stronger synchronization semantics than we need. It also requires that we batch reads from it in order to get reasonable performance. After profiling the delay between writing to LBQ and reading from it, there was a ~10us delay.

This change switches to ConcurrentLinkedQueue as the underlying queue and removes the batching of reads. Using CLQ with batching is slightly slower. Benchmarks show favorable numbers for both latency and throughput. Each of the following results was run several times.

Before:

```
Benchmark                         (direct)  (transport)  Mode    Cnt     Score        Error    Units
TransportBenchmark.unaryCall1024  true      NETTY        sample  321575  124185.027 ± 406.112  ns/op
TransportBenchmark.unaryCall1024  false     NETTY        sample  237400  168232.991 ± 548.043  ns/op
```

After:

```
Benchmark                         (direct)  (transport)  Mode    Cnt     Score        Error    Units
TransportBenchmark.unaryCall1024  true      NETTY        sample  354773  112552.339 ± 362.471  ns/op
TransportBenchmark.unaryCall1024  false     NETTY        sample  263297  151660.490 ± 507.463  ns/op
```

QPS with 10 outstanding RPCs per channel:

Before:

```
Channels:                      4
Outstanding RPCs per Channel:  10
Server Payload Size:           0
Client Payload Size:           0
50%ile Latency (in micros):    396
90%ile Latency (in micros):    680
95%ile Latency (in micros):    838
99%ile Latency (in micros):    1476
99.9%ile Latency (in micros):  5231
Maximum Latency (in micros):   43327
QPS:                           85761
```

After:

```
Channels:                      4
Outstanding RPCs per Channel:  10
Server Payload Size:           0
Client Payload Size:           0
50%ile Latency (in micros):    384
90%ile Latency (in micros):    612
95%ile Latency (in micros):    725
99%ile Latency (in micros):    1080
99.9%ile Latency (in micros):  3107
Maximum Latency (in micros):   30447
QPS:                           93353
```

The results are even better under heavy load. QPS with 100 outstanding RPCs per channel:

Before:

```
Channels:                      4
Outstanding RPCs per Channel:  100
Server Payload Size:           0
Client Payload Size:           0
50%ile Latency (in micros):    2735
90%ile Latency (in micros):    5051
95%ile Latency (in micros):    6219
99%ile Latency (in micros):    9271
99.9%ile Latency (in micros):  13759
Maximum Latency (in micros):   44831
QPS:                           125775
```

After:

```
Channels:                      4
Outstanding RPCs per Channel:  100
Server Payload Size:           0
Client Payload Size:           0
50%ile Latency (in micros):    2697
90%ile Latency (in micros):    4639
95%ile Latency (in micros):    5539
99%ile Latency (in micros):    7931
99.9%ile Latency (in micros):  12335
Maximum Latency (in micros):   61823
QPS:                           131904
```
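The queueing pattern the commit describes, a lock-free queue drained by a single task on the event loop, looks roughly like the sketch below. This is an illustrative reconstruction under stated assumptions, not the actual WriteQueue source; every name other than ConcurrentLinkedQueue is hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the technique in the commit message: producers
// enqueue onto a lock-free queue, and at most one drain task runs on the
// event loop at a time. No batching of reads, per the commit.
final class WriteQueueSketch {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<Runnable>();
  private final AtomicBoolean drainScheduled = new AtomicBoolean();
  private final Executor eventLoop;

  WriteQueueSketch(Executor eventLoop) {
    this.eventLoop = eventLoop;
  }

  /** Called from any thread; the hot path takes no lock. */
  void enqueue(Runnable write) {
    queue.add(write);
    // Schedule a drain only if one is not already pending.
    if (drainScheduled.compareAndSet(false, true)) {
      eventLoop.execute(new Runnable() {
        @Override
        public void run() {
          drain();
        }
      });
    }
  }

  /** Runs on the event loop. */
  private void drain() {
    // Clear the flag before polling so a producer that enqueues during
    // the drain can schedule a follow-up drain; at worst the follow-up
    // finds an empty queue.
    drainScheduled.set(false);
    Runnable write;
    while ((write = queue.poll()) != null) {
      write.run();
    }
  }
}
```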
gRPC-Java works with JDK 6. TLS usage typically requires Java 8, or the Play Services Dynamic Security Provider on Android. Please see the Security Readme.
Download the JARs. Or for Maven with non-Android, add to your pom.xml:

```xml
<dependency>
  <groupId>io.grpc</groupId>
  <artifactId>grpc-netty</artifactId>
  <version>0.14.0</version>
</dependency>
<dependency>
  <groupId>io.grpc</groupId>
  <artifactId>grpc-protobuf</artifactId>
  <version>0.14.0</version>
</dependency>
<dependency>
  <groupId>io.grpc</groupId>
  <artifactId>grpc-stub</artifactId>
  <version>0.14.0</version>
</dependency>
```
Or for Gradle with non-Android, add to your dependencies:

```gradle
compile 'io.grpc:grpc-netty:0.14.0'
compile 'io.grpc:grpc-protobuf:0.14.0'
compile 'io.grpc:grpc-stub:0.14.0'
```
For Android client, use grpc-okhttp instead of grpc-netty, and grpc-protobuf-nano or grpc-protobuf-lite instead of grpc-protobuf:

```gradle
compile 'io.grpc:grpc-okhttp:0.14.0'
compile 'io.grpc:grpc-protobuf-nano:0.14.0'
compile 'io.grpc:grpc-stub:0.14.0'
```
Development snapshots are available in Sonatype's snapshot repository.
For protobuf-based codegen, you can put your proto files in the src/main/proto and src/test/proto directories along with an appropriate plugin.
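For illustration, here is what a minimal file under src/main/proto might look like; the Greeter service, package, and message names are invented for this example and are not part of the gRPC distribution.

```proto
// src/main/proto/helloworld.proto -- hypothetical example definition.
syntax = "proto3";

option java_package = "io.grpc.examples.helloworld";

package helloworld;

// A trivial unary-call service; codegen produces a GreeterGrpc class
// with stubs for it.
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}
```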
For protobuf-based codegen integrated with the Maven build system, you can use protobuf-maven-plugin:
```xml
<build>
  <extensions>
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.4.1.Final</version>
    </extension>
  </extensions>
  <plugins>
    <plugin>
      <groupId>org.xolstice.maven.plugins</groupId>
      <artifactId>protobuf-maven-plugin</artifactId>
      <version>0.5.0</version>
      <configuration>
        <!--
          The version of protoc must match protobuf-java. If you don't depend on
          protobuf-java directly, you will be transitively depending on the
          protobuf-java version that grpc depends on.
        -->
        <protocArtifact>com.google.protobuf:protoc:3.0.0-beta-2:exe:${os.detected.classifier}</protocArtifact>
        <pluginId>grpc-java</pluginId>
        <pluginArtifact>io.grpc:protoc-gen-grpc-java:0.14.0:exe:${os.detected.classifier}</pluginArtifact>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>compile-custom</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```
For protobuf-based codegen integrated with the Gradle build system, you can use protobuf-gradle-plugin:
```gradle
apply plugin: 'java'
apply plugin: 'com.google.protobuf'

buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    // ASSUMES GRADLE 2.12 OR HIGHER. Use plugin version 0.7.5 with earlier
    // gradle versions
    classpath 'com.google.protobuf:protobuf-gradle-plugin:0.7.7'
  }
}

protobuf {
  protoc {
    // The version of protoc must match protobuf-java. If you don't depend on
    // protobuf-java directly, you will be transitively depending on the
    // protobuf-java version that grpc depends on.
    artifact = "com.google.protobuf:protoc:3.0.0-beta-2"
  }
  plugins {
    grpc {
      artifact = 'io.grpc:protoc-gen-grpc-java:0.14.0'
    }
  }
  generateProtoTasks {
    all()*.plugins {
      grpc {}
    }
  }
}
```
If you are making changes to gRPC-Java, see the compiling instructions.
Here's a quick readers' guide to the code to help folks get started. At a high level there are three distinct layers to the library: Stub, Channel & Transport.
The Stub layer is what is exposed to most developers and provides type-safe bindings to whatever datamodel/IDL/interface you are adapting. gRPC comes with a plugin to the protocol-buffers compiler that generates Stub interfaces out of .proto files, but bindings to other datamodels/IDLs should be trivial to add and are welcome.
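Using the hypothetical Greeter service sketched earlier, a client built on the generated stub looks roughly like the following. GreeterGrpc and its nested stub classes follow the generator's naming conventions but are shown here as an assumption, and exact builder options may differ by version.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloReply;
import io.grpc.examples.helloworld.HelloRequest;

public class HelloClient {
  public static void main(String[] args) throws Exception {
    // Channel creation belongs to the Channel layer; the stub wraps it.
    ManagedChannel channel = ManagedChannelBuilder
        .forAddress("localhost", 50051)
        .usePlaintext(true)  // no TLS, for local testing only
        .build();

    // The generated blocking stub gives a type-safe, synchronous binding.
    GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    HelloReply reply = stub.sayHello(
        HelloRequest.newBuilder().setName("world").build());
    System.out.println(reply.getMessage());

    channel.shutdown();
  }
}
```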
The Channel layer is an abstraction over Transport handling that is suitable for interception/decoration and exposes more behavior to the application than the Stub layer. This layer is intended to make it easy for application frameworks to address cross-cutting concerns such as logging, monitoring, and auth. Flow-control is also exposed at this layer to allow more sophisticated applications to interact with it directly.
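A minimal sketch of interception at this layer, assuming the ClientInterceptor API: a logging interceptor that records each method name before delegating to the underlying channel.

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientInterceptors;
import io.grpc.MethodDescriptor;
import java.util.logging.Logger;

// A cross-cutting logging concern implemented as a ClientInterceptor
// and layered onto an existing Channel.
public class LoggingInterceptor implements ClientInterceptor {
  private static final Logger logger =
      Logger.getLogger(LoggingInterceptor.class.getName());

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    // Log the fully qualified method name, then delegate to the
    // underlying channel unchanged.
    logger.info("Calling: " + method.getFullMethodName());
    return next.newCall(method, callOptions);
  }

  // Wrap an existing channel; stubs built on the returned Channel
  // will pass through the interceptor.
  public static Channel wrap(Channel channel) {
    return ClientInterceptors.intercept(channel, new LoggingInterceptor());
  }
}
```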
The Transport layer does the heavy lifting of putting and taking bytes off the wire. The interfaces to it are abstract just enough to allow plugging in of different implementations. Transports are modeled as Stream factories. The variation in interface between a server Stream and a client Stream exists to codify their differing semantics for cancellation and error reporting.
Note that the transport layer API is considered internal to gRPC and has weaker API guarantees than the core API under the io.grpc package.
gRPC comes with three Transport implementations:

1. The Netty-based transport is the main transport implementation, based on Netty. It is for both the client and the server.
2. The OkHttp-based transport is a lightweight transport based on OkHttp. It is mainly for use on Android, and is client-only.
3. The inProcess transport is for when a server is in the same process as the client. It is useful for testing.
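As a sketch of what selecting a transport looks like in client code, assuming the NettyChannelBuilder and OkHttpChannelBuilder entry points from the grpc-netty and grpc-okhttp artifacts (defaults shown; real code would configure TLS or plaintext explicitly):

```java
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;
import io.grpc.okhttp.OkHttpChannelBuilder;

public class TransportChoice {
  public static void main(String[] args) {
    // Netty transport: the usual choice for non-Android clients
    // (and the only server-capable transport of the two).
    ManagedChannel nettyChannel =
        NettyChannelBuilder.forAddress("localhost", 50051).build();

    // OkHttp transport: the lightweight client-only transport,
    // mainly for Android.
    ManagedChannel okhttpChannel =
        OkHttpChannelBuilder.forAddress("localhost", 50051).build();

    nettyChannel.shutdown();
    okhttpChannel.shutdown();
  }
}
```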
Tests showing how these layers are composed to execute calls using protobuf messages can be found in the interop-testing module: https://github.com/google/grpc-java/tree/master/interop-testing/src/main/java/io/grpc/testing/integration