Improve thread-->arena assignment.

Rather than blindly assigning threads to arenas in round-robin fashion,
choose the lowest-numbered arena that currently has the smallest number
of threads assigned to it.
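
The selection rule described above can be sketched roughly as follows.
This is an illustrative sketch, not the code from this commit: the
function name least_loaded_arena and the flat nthreads array are
assumptions made for the example, and it ignores locking and arenas
that have not been initialized yet.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of this patch): return the index of the
 * lowest-numbered arena that has the fewest threads assigned to it.
 * nthreads[i] is the number of threads currently assigned to arena i.
 */
static unsigned
least_loaded_arena(const unsigned *nthreads, unsigned narenas)
{
	unsigned i, choose = 0;

	assert(narenas > 0);
	for (i = 1; i < narenas; i++) {
		/* Strict '<' keeps the lowest index when counts tie. */
		if (nthreads[i] < nthreads[choose])
			choose = i;
	}
	return (choose);
}
```

Compared with round-robin, this keeps the per-arena thread counts
balanced as threads exit, at the cost of a linear scan each time a
thread is assigned.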

Add the "stats.arenas.<i>.nthreads" mallctl.
diff --git a/jemalloc/src/arena.c b/jemalloc/src/arena.c
index a1fa2a3..022f9ec 100644
--- a/jemalloc/src/arena.c
+++ b/jemalloc/src/arena.c
@@ -2175,6 +2175,7 @@
 	arena_bin_t *bin;
 
 	arena->ind = ind;
+	arena->nthreads = 0;
 
 	if (malloc_mutex_init(&arena->lock))
 		return (true);