README.txt


This Poller class demonstrates access to poll(2) functionality in Java.

Requires the Solaris production (native threads) JDK 1.2 or later; currently
the C code compiles only on Solaris (SPARC and Intel).

Poller.java is the class; Poller.c is the supporting JNI code.

PollingServer.java is a sample application which uses the Poller class
to multiplex sockets.

SimpleServer.java is the functional equivalent that does not multiplex;
instead it dedicates a separate thread to each client connection.

Client.java is a sample application to drive against either server.

To build the Poller class and client/server demo:

    javac PollingServer.java Client.java
    javah Poller
    cc -G -o libpoller.so -I ${JAVA_HOME}/include \
       -I ${JAVA_HOME}/include/solaris Poller.c

You will need to set the environment variable LD_LIBRARY_PATH so that it
includes the directory containing libpoller.so.
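
For example, if libpoller.so was built in the current directory (csh and
sh/ksh syntax respectively; adjust the path to wherever you put the library):

    setenv LD_LIBRARY_PATH `pwd`
    LD_LIBRARY_PATH=`pwd`; export LD_LIBRARY_PATH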

To use the client/server demo, bump up your file descriptor limit to handle
the number of connections you want (root access is needed to go beyond 1024).
For information on changing your file descriptor limit, type "man limit".
If you are using Solaris 2.6 or later, a regression in loopback read()
performance may hit you at low numbers of connections, so run the client
on another machine.
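
For example, to allow 8192 descriptors per process (csh "limit" built-in
shown first, sh/ksh "ulimit" second; pick a value large enough for the
connection count you intend to test):

    limit descriptors 8192
    ulimit -n 8192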

Basics of Poller class usage:
run "javadoc Poller" or see Poller.java for more details.

{
    Poller Mux = new Poller(65535);  // allow it to contain 64K IO objects

    int fd1 = Mux.add(socket1, Poller.POLLIN);
    ...
    int fdN = Mux.add(socketN, Poller.POLLIN);

    int[]   fds     = new int[100];
    short[] revents = new short[100];

    // wait (up to timeout) for events on any of the registered objects
    int numEvents = Mux.waitMultiple(100, fds, revents, timeout);

    for (int i = 0; i < numEvents; i++) {
        /*
         * Probably need a more sophisticated mapping scheme than this!
         */
        if (fds[i] == fd1) {
            System.out.println("Got data on socket1");
            socket1.getInputStream().read(byteArray);
            // Do something based upon state of fd1 connection
        }
        ...
    }
}

Poller class implementation notes:

Currently the add(), remove(), isMember(), and waitMultiple() methods
are all synchronized for each Poller object. If one thread is blocked in
pObj.waitMultiple(), another thread calling pObj.add(fd) will block
until waitMultiple() returns. There is no mechanism provided for
interrupting waitMultiple(), as one would typically expect a ServerSocket
to be in the list being waited on (see PollingServer.java).

One might also need to interrupt waitMultiple() to remove()
fds/sockets, in which case one could create a pipe or loopback localhost
connection (at the level of PollingServer) and use a write() to that
connection to interrupt. Or, better, one could queue up deletions
until the next return of waitMultiple(). Or one could implement an
interrupt mechanism in the JNI C code using a pipe(), and expose that
at the Java level.
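
As an illustration only, a loopback wake-up helper might look roughly like
the following. This class is hypothetical and is not part of the Poller
class or of this demo; it assumes only the Poller.add()/Poller.POLLIN usage
shown in the example above.

import java.io.*;
import java.net.*;

class PollerWakeup {
    private final Socket writeEnd;   // written to by other threads
    private final Socket readEnd;    // registered with the Poller
    private final int wakeupFd;      // value returned by Poller.add()

    PollerWakeup(Poller mux) throws Exception {
        // Loopback connection: connect to our own ephemeral-port listener.
        ServerSocket ss = new ServerSocket(0, 1,
                              InetAddress.getByName("127.0.0.1"));
        writeEnd = new Socket(ss.getInetAddress(), ss.getLocalPort());
        readEnd  = ss.accept();
        ss.close();
        wakeupFd = mux.add(readEnd, Poller.POLLIN);
    }

    int fd() { return wakeupFd; }

    // Called by any thread that wants waitMultiple() to return soon,
    // e.g. so that queued remove() calls can be processed.
    void interrupt() throws IOException {
        OutputStream out = writeEnd.getOutputStream();
        out.write(0);
        out.flush();
    }

    // Called from the poll loop when wakeupFd is reported with POLLIN:
    // discard the wake-up byte(s) so the fd goes quiet again.
    void drain() throws IOException {
        InputStream in = readEnd.getInputStream();
        in.skip(in.available());
    }
}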

If frequent deletions/re-additions of socks/fds are to be done with
very large sets of monitored fds, the Solaris 7 kernel poll cache will
likely perform poorly without some tuning. One could differentiate
between deleted (no longer cared for) fds/socks and those that are
merely being disabled while data is processed on their behalf. In
that case, re-enabling a disabled fd/sock could put it back in its
original position in the poll array, thereby improving the kernel
cache performance. This would best be done in Poller.c. Of course
this is not necessary for optimal /dev/poll performance.

Caution: the next paragraph gets a little technical for the
benefit of those who already understand poll()ing fairly well. Others
may choose to skip ahead to the notes on the demo server.

An optimal solution for frequent enabling/disabling of socks/fds
could involve a separately synchronized structure of "async"
operations. Using a simple array (0..64k) containing the action
(ADD, ENABLE, DISABLE, NONE), the events, and the index into the poll
array, and having nativeWait() wake up in the poll() call periodically
to process these async operations, I was able to speed up performance
of the PollingServer by a factor of 2x at 8000 connections. Of course
much of that gain was from the fact that I could (with the advent of
an asyncAdd() method) move the accept() loop into a separate thread
from the main poll() loop, and avoid the overhead of calling poll()
with up to 7999 fds just for an accept.

In implementing the async Disable/Enable, a further large optimization
was to auto-disable fds with events available (before returning from
nativeWait()), so I could just call asyncEnable(fd) after processing
(read()ing) the available data. This removed the need for the
inefficient gang-scheduling that the attached PollingServer uses. In
order to separately synchronize the async structure, yet still be able
to operate on it from within nativeWait(), synchronization had to be
done at the C level here. Due to the new complexities this introduced,
as well as the fact that it was tuned specifically for Solaris 7 poll()
improvements (not /dev/poll), this extra logic was left out of this demo.
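
For illustration, the bookkeeping structure described above might look
something like the following at the Java level. This sketch is hypothetical:
none of it is in the shipped Poller class, and in the experiment described
the synchronization actually lived in the JNI C code so that nativeWait()
could drain the table while inside poll(); plain Java synchronization is
used here only to keep the sketch self-contained.

class AsyncOps {
    static final byte NONE = 0, ADD = 1, ENABLE = 2, DISABLE = 3;

    private final byte[]  action;     // pending action, indexed by fd
    private final short[] events;     // events to poll for (e.g. POLLIN)
    private final int[]   pollIndex;  // slot in the native poll array

    AsyncOps(int maxFd) {
        action    = new byte[maxFd + 1];
        events    = new short[maxFd + 1];
        pollIndex = new int[maxFd + 1];
    }

    // Queued by the accept() thread; applied the next time the poll()
    // loop wakes up, so poll() need not be re-entered with thousands of
    // fds just to pick up one new connection.
    synchronized void asyncAdd(int fd, short ev) {
        action[fd] = ADD;
        events[fd] = ev;
    }

    // Re-arm an fd that was auto-disabled when its event was returned,
    // once the available data has been read and processed.
    synchronized void asyncEnable(int fd) {
        action[fd] = ENABLE;
    }

    synchronized void asyncDisable(int fd) {
        action[fd] = DISABLE;
    }

    // Drained periodically by the wait loop: apply each pending entry
    // to the real poll set, then reset its slot to NONE.
    synchronized void drainTo(Poller mux) {
        // ... apply ADD/ENABLE/DISABLE entries to mux, clear to NONE ...
    }
}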


Client/Server Demo Notes:

Do not run the sample client/server with a high number of connections
unless you have a lot of free memory on your machine; the demo is very
resource intensive and can saturate the CPU and lock you out of CDE
(much more so SimpleServer than PollingServer).

Different OS versions will behave very differently with respect to poll()
performance (or the existence of /dev/poll) but, generally, real-world
applications "hit the wall" much earlier when a separate thread is used to
handle each client connection. Issues of thread synchronization and locking
granularity become performance killers. There is some overhead associated
with multiplexing, such as keeping track of the state of each connection; as
the number of connections gets very large, however, this overhead is more
than made up for by the reduced synchronization overhead.

As an example, running the servers on a Solaris 7 PC (2 x Pentium II-350
CPUs) with 1 GB RAM, and the client on an Ultra-2, I got the following
times (shorter is better):

    1000 connections:
        PollingServer took 11 seconds
        SimpleServer  took 12 seconds

    4000 connections:
        PollingServer took 20 seconds
        SimpleServer  took 37 seconds

    8000 connections:
        PollingServer took 39 seconds
        SimpleServer  took 1 minute 48 seconds

This demo is not, however, meant to be taken as proof
that multiplexing with the Poller class will gain you performance; this
code is actually heavily biased towards the non-polling server, as
very little synchronization is done and most of the overhead is in the
kernel IO for both servers. Use of multiplexing may be helpful in
many, but certainly not all, circumstances.

Benchmarking a major Java server application which can run either
in a thread-per-client mode or using the new Poller class showed that
Poller provided a 253% improvement in throughput at a moderate load, as
well as a 300% improvement in peak capacity. It also yielded a 21%
smaller memory footprint at the lower load level.

Finally, there is code in Poller.c to take advantage of /dev/poll
on OS versions that have that device; however, DEVPOLL must be defined
when compiling Poller.c (and it must be compiled on a machine with
/usr/include/sys/devpoll.h) to use it. Code compiled with DEVPOLL
turned on will still work on machines that don't have kernel support for
the device, as it will fall back to using poll() in those cases.
Currently /dev/poll does not correctly return an error if you attempt
to remove() an object that was never added, but this should be fixed
in an upcoming /dev/poll patch. The binary as shipped is not built with
/dev/poll support, as our build machine does not have devpoll.h.
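
For example, on a machine that does have /usr/include/sys/devpoll.h, the
build line shown earlier becomes:

    cc -DDEVPOLL -G -o libpoller.so -I ${JAVA_HOME}/include \
       -I ${JAVA_HOME}/include/solaris Poller.c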