README.txt
This Poller class demonstrates access to poll(2) functionality in Java.
Requires the Solaris production (native threads) JDK 1.2 or later; currently
the C code compiles only on Solaris (SPARC and Intel).
Poller.java is the class, Poller.c is the supporting JNI code.
PollingServer.java is a sample application which uses the Poller class
to multiplex sockets.
SimpleServer.java is the functional equivalent that does not multiplex
but instead uses a separate thread to handle each client connection.
Client.java is a sample application to drive against either server.
To build the Poller class and client/server demo :
    javac PollingServer.java Client.java
    javah Poller
    cc -G -o libpoller.so -I ${JAVA_HOME}/include \
        -I ${JAVA_HOME}/include/solaris Poller.c
You will need to set the environment variable LD_LIBRARY_PATH so that it
includes the directory containing libpoller.so.
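For example, assuming libpoller.so was built in the current directory,
something like the following (csh syntax) would work; adjust for your shell :
    setenv LD_LIBRARY_PATH `pwd`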
To use the client/server demo, bump up your file descriptor limit to handle
the number of connections you want (root access is needed to go beyond 1024).
For information on changing your file descriptor limit, type "man limit".
If you are using Solaris 2.6 or later, a regression in loopback read()
performance may hit you at low numbers of connections, so run the client
on another machine.
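For example, under csh something like the following raises the limit for the
current shell (8192 is just an example value; run as root for values above
1024) :
    limit descriptors 8192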
BASICs of Poller class usage :
run "javadoc Poller" or see Poller.java for more details.
{
    Poller Mux = new Poller(65535); // allow it to contain 64K IO objects

    int fd1 = Mux.add(socket1, Poller.POLLIN);
    ...
    int fdN = Mux.add(socketN, Poller.POLLIN);

    int[]   fds     = new int[100];
    short[] revents = new short[100];

    int numEvents = Mux.waitMultiple(100, fds, revents, timeout);

    for (int i = 0; i < numEvents; i++) {
        /*
         * Probably need a more sophisticated mapping scheme than this!
         */
        if (fds[i] == fd1) {
            System.out.println("Got data on socket1");
            socket1.getInputStream().read(byteArray);
            // Do something based upon the state of the fd1 connection
        }
        ...
    }
}
Poller class implementation notes :
Currently all add(), remove(), isMember(), and waitMultiple() methods
are synchronized for each Poller object. If one thread is blocked in
pObj.waitMultiple(), another thread calling pObj.add(fd) will block
until waitMultiple() returns. No mechanism is provided to interrupt
waitMultiple(), as one would normally expect a ServerSocket to be among
the objects waited on (see PollingServer.java).
One might also need to interrupt waitMultiple() to remove()
fds/sockets, in which case one could create a Pipe or loopback localhost
connection (at the level of PollingServer) and use a write() to that
connection to interrupt. Or, better, one could queue up deletions
until the next return of waitMultiple(). Or one could implement an
interrupt mechanism in the JNI C code using a pipe(), and expose that
at the Java level.
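As an illustration of the "queue up deletions until the next return of
waitMultiple()" idea, here is a minimal sketch. The DeferredPoller class, its
method names, and the parameter types are hypothetical and not part of this
demo; check Poller.java for the exact remove() and waitMultiple() signatures.

    import java.util.ArrayList;
    import java.util.List;

    class DeferredPoller {
        private final Poller poller;
        private final List pendingRemovals = new ArrayList(); // sockets to drop

        DeferredPoller(Poller poller) { this.poller = poller; }

        // May be called from any thread; it never touches the Poller itself,
        // so it never has to wait for a thread blocked in waitMultiple().
        synchronized void requestRemove(Object socket) {
            pendingRemovals.add(socket);
        }

        // Called only by the thread that owns the poll loop.
        int waitAndReap(int max, int[] fds, short[] revents, long timeout)
                throws Exception {
            int n = poller.waitMultiple(max, fds, revents, timeout);
            synchronized (this) {
                for (int i = 0; i < pendingRemovals.size(); i++) {
                    // Assumes remove() takes the object originally add()ed.
                    poller.remove(pendingRemovals.get(i));
                }
                pendingRemovals.clear();
            }
            return n;
        }
    }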
If frequent deletions/re-additions of socks/fds are to be done with
very large sets of monitored fds, the Solaris 7 kernel cache will
likely perform poorly without some tuning. One could differentiate
between deleted (no longer cared for) fds/socks and those that are
merely being disabled while data is processed on their behalf. In
that case, re-enabling a disabled fd/sock could put it back in its
original position in the poll array, thereby improving kernel
cache performance. This would best be done in Poller.c. Of course
this is not necessary for optimal /dev/poll performance.
Caution...the next paragraph gets a little technical for the
benefit of those who already understand poll()ing fairly well. Others
may choose to skip over it to read notes on the demo server.
An optimal solution for frequent enabling/disabling of socks/fds
could involve a separately synchronized structure of "async"
operations. Using a simple array (0..64k) containing the action
(ADD, ENABLE, DISABLE, NONE), the events, and the index into the poll
array, and having nativeWait() wake up in the poll() call periodically
to process these async operations, I was able to speed up performance
of the PollingServer by a factor of two at 8000 connections. Of course
much of that gain was from the fact that I could (with the advent of
an asyncAdd() method) move the accept() loop into a separate thread
from the main poll() loop, and avoid the overhead of calling poll()
with up to 7999 fds just for an accept. In implementing the async
Disable/Enable, a further large optimization was to auto-disable fds
with events available (before return from nativeWait()), so I could
just call asyncEnable(fd) after processing (read()ing) the available
data. This removed the need for the inefficient gang-scheduling the
attached PollingServer uses. In order to separately synchronize the
async structure, yet still be able to operate on it from within
nativeWait(), synchronization had to be done at the C level here. Due
to the new complexities this introduced, as well as the fact that it
was tuned specifically for Solaris 7 poll() improvements (not
/dev/poll), this extra logic was left out of this demo.
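For illustration only, the per-fd "async operation" table described above
might look something like this at the Java level. In the experiment it
actually lived in Poller.c with its own C-level locking, and none of these
names exist in the shipped demo.

    class AsyncOpTable {
        static final byte NONE = 0, ADD = 1, ENABLE = 2, DISABLE = 3;

        private final byte[]  action;    // pending action, indexed by fd
        private final short[] events;    // events to (re)arm for ADD/ENABLE
        private final int[]   pollIndex; // fd's slot in the native poll array

        AsyncOpTable(int maxFd) {
            action    = new byte[maxFd];
            events    = new short[maxFd];
            pollIndex = new int[maxFd];
        }

        // Called from, e.g., a separate accept() thread; cheap, and it never
        // has to wait for the thread blocked in poll().
        synchronized void asyncAdd(int fd, short ev) {
            action[fd] = ADD;
            events[fd] = ev;
        }

        synchronized void asyncEnable(int fd)  { action[fd] = ENABLE; }
        synchronized void asyncDisable(int fd) { action[fd] = DISABLE; }

        // The poll loop (nativeWait() in the experiment) would wake up
        // periodically, call something like this for each fd, and apply the
        // returned action to the poll array (the entry is reset to NONE here).
        synchronized byte takeAction(int fd) {
            byte a = action[fd];
            action[fd] = NONE;
            return a;
        }
    }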
Client/Server Demo Notes :
Do not run the sample client/server with high numbers of connections
unless you have a lot of free memory on your machine, as the demo can
saturate the CPU and lock you out of CDE simply by virtue of its
resource-intensive nature (much more so SimpleServer than PollingServer).
Different OS versions will behave very differently with regard to poll()
performance (or the availability of /dev/poll) but, generally, real-world
applications
"hit the wall" much earlier when a separate thread is used to handle
each client connection. Issues of thread synchronization and locking
granularity become performance killers. There is some overhead associated
with multiplexing, such as keeping track of the state of each connection; as
the number of connections gets very large, however, this overhead is more
than made up for by the reduced synchronization overhead.
As an example, running the servers on a Solaris 7 PC (2 x Pentium II-350
CPUs) with 1 GB RAM, and the client on an Ultra-2, I got the following
times (shorter is better) :
1000 connections :
    PollingServer took 11 seconds
    SimpleServer took 12 seconds
4000 connections :
    PollingServer took 20 seconds
    SimpleServer took 37 seconds
8000 connections :
    PollingServer took 39 seconds
    SimpleServer took 1 minute 48 seconds
This demo is not, however, meant to be considered some form of proof
that multiplexing with the Poller class will gain you performance; this
code is actually very heavily biased towards the non-polling server as
very little synchronization is done, and most of the overhead is in the
kernel IO for both servers. Use of multiplexing may be helpful in
many, but certainly not all, circumstances.
Benchmarking a major Java server application that can run either in a
thread-per-client mode or using the new Poller class showed that Poller
provided a 253% improvement in throughput at a moderate load, as well as
a 300% improvement in peak capacity. It also yielded a 21% smaller memory
footprint at the lower load level.
Finally, there is code in Poller.c to take advantage of /dev/poll
on OS versions that have that device; however, DEVPOLL must be defined
when compiling Poller.c (and it must be compiled on a machine with
/usr/include/sys/devpoll.h) to use it. Code compiled with DEVPOLL
turned on will work on machines that don't have kernel support for
the device, as it will fall back to using poll() in those cases.
Currently /dev/poll does not correctly return an error if you attempt
to remove() an object that was never added, but this should be fixed
in an upcoming /dev/poll patch. The binary as shipped is not built with
/dev/poll support as our build machine does not have devpoll.h.
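For example, on a machine that does have /usr/include/sys/devpoll.h, the
build line above might become :
    cc -G -DDEVPOLL -o libpoller.so -I ${JAVA_HOME}/include \
        -I ${JAVA_HOME}/include/solaris Poller.c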