replace per-header mallocs with single malloc 3-level struct

This big patch replaces the per-header malloc / realloc
approach used until now with a single three-level struct
that gets malloc'd during the header union phase and freed
in one go when we transition to a different union phase.

It's more expensive in that we malloc a bit more than 4Kbytes,
but it's a lot cheaper in terms of the number of mallocs and frees
and in heap fragmentation; there are no reallocs and nothing to
configure.  It also moves from arrays of pointers (8 bytes each on
x86_64) to unsigned short offsets into the data array (2 bytes on
all platforms).

The three levels all live in one struct:

 - array indexed by the header enum, pointing to first "fragment" index
	(i.e., header type to fragment lookup, or 0 for none)

 - array of fragments, enough for 2 x the number of known headers
	(fragment array... note that fragments can point to a "next"
	fragment if the same header is spread across multiple entries)

 - linear char array where the known header payload gets written
	(fragments point into null-terminated strings stored in here,
	only the known header content is stored)

An HTTP header can legally be split over multiple header lines of the
same name, whose values should be concatenated.  This scheme does not
store them concatenated linearly, but uses a linked list in the fragment
structs to chain them.  There are APIs to get the total length and to
copy out a linear, concatenated version to a buffer.
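
As a rough illustration of the layout and the length / copy-out helpers
described above, here is a minimal sketch.  All names in it (hdr_store,
frag, hdr_add, hdr_total_length, hdr_copy, the HDR_* enum) are
hypothetical and are not the identifiers used in the patch, and the
copy-out joins fragments with no separator, which is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical header enum and limit, for illustration only */
enum { HDR_GET_URI, HDR_HOST, HDR_PROTOCOL, HDR_COUNT };
#define MAX_HEADER_LEN 1024

struct frag {
	unsigned short offset;	/* start of payload in data[] */
	unsigned short len;	/* payload length, excluding the NUL */
	unsigned char next;	/* next fragment of same header, 0 = none */
};

struct hdr_store {
	unsigned short pos;		/* next free byte in data[] */
	unsigned char next_frag;	/* next free slot in frags[] */
	unsigned char first[HDR_COUNT];	/* header -> first frag, 0 = none */
	struct frag frags[HDR_COUNT * 2]; /* slot 0 unused: 0 means "none" */
	char data[MAX_HEADER_LEN];	/* NUL-terminated payloads */
};

static void hdr_init(struct hdr_store *s)
{
	memset(s, 0, sizeof(*s));
	s->next_frag = 1;
}

/* Store one header payload as a new fragment; returns 0, or -1 if full */
static int hdr_add(struct hdr_store *s, int h, const char *val)
{
	size_t len = strlen(val);
	unsigned char n = s->next_frag, i;

	assert(h >= 0 && h < HDR_COUNT);
	if (n >= HDR_COUNT * 2 || s->pos + len + 1 > sizeof(s->data))
		return -1;

	s->frags[n].offset = s->pos;
	s->frags[n].len = (unsigned short)len;
	s->frags[n].next = 0;
	memcpy(s->data + s->pos, val, len + 1);
	s->pos += (unsigned short)(len + 1);
	s->next_frag++;

	if (!s->first[h]) {
		s->first[h] = n;
		return 0;
	}
	/* same header seen again: append to the end of its chain */
	for (i = s->first[h]; s->frags[i].next; i = s->frags[i].next)
		;
	s->frags[i].next = n;
	return 0;
}

/* Total payload length of header h across all of its fragments */
static int hdr_total_length(const struct hdr_store *s, int h)
{
	int total = 0;
	unsigned char i;

	for (i = s->first[h]; i; i = s->frags[i].next)
		total += s->frags[i].len;
	return total;
}

/* Copy the concatenated payload to dest; returns length, or -1 if too big */
static int hdr_copy(const struct hdr_store *s, char *dest, int dest_len, int h)
{
	int total = hdr_total_length(s, h), at = 0;
	unsigned char i;

	if (total >= dest_len)
		return -1;
	for (i = s->first[h]; i; i = s->frags[i].next) {
		memcpy(dest + at, s->data + s->frags[i].offset, s->frags[i].len);
		at += s->frags[i].len;
	}
	dest[at] = '\0';
	return total;
}
```

In this sketch, adding "chat" and later ",superchat" under the same
header index leaves two chained fragments in data[], and hdr_copy()
writes them out as one linear string.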

Signed-off-by: Andy Green <andy.green@linaro.org>
diff --git a/changelog b/changelog
index a6e0b58..442ab6a 100644
--- a/changelog
+++ b/changelog
@@ -54,21 +54,30 @@
 	By correctly setting this, you can save a lot of memory when your
 	protocol has small frames (see the test server and client sources).
 
+ - LWS_MAX_HEADER_LEN now defaults to 1024 and is the total amount of known
+	header payload lws can cope with, including the GET URL, origin,
+	etc.  Headers not understood by lws are ignored and their payload
+	is not included in this.
+
 
 User api removals
 -----------------
 
-The configuration-time option MAX_USER_RX_BUFFER has been replaced by a
-buffer size chosen per-protocol.  For compatibility, there's a default of
-4096 rx buffer, but user code should set the appropriate size for the
-protocol frames.
+ - The configuration-time option MAX_USER_RX_BUFFER has been replaced by a
+	buffer size chosen per-protocol.  For compatibility, there's a default
+	of 4096 rx buffer, but user code should set the appropriate size for
+	the protocol frames.
+
+ - LWS_INITIAL_HDR_ALLOC and LWS_ADDITIONAL_HDR_ALLOC are no longer needed
+	and have been removed.  There's a new header management scheme that
+	stores headers in a much more compact way.
 
 
 New features
 ------------
 
  - Cmake project file added, aimed initially at Windows support: this replaces
-the visual studio project files that were in the tree until now.
+	the visual studio project files that were in the tree until now.
 
  - PATH_MAX or MAX_PATH no longer needed
 
@@ -84,6 +93,15 @@
 	below the threshold, so it's removed.  Veto the compression extension
 	in your user callback if you will typically have very small frames.
 
+ - There are many memory usage improvements, both a reduction in malloc/
+	realloc and architectural changes.  A websocket connection now
+	consumes only 296 bytes with SSL or 272 bytes without on x86_64.
+	During header processing an additional 1262 bytes is allocated in a
+	single malloc, but is freed when the websocket connection starts.
+	The RX frame buffer defined by the protocol in user
+	code is also allocated per connection; this represents the largest
+	frame you can receive atomically in that protocol.
+
 
 v1.1-chrome26-firefox18
 =======================