Make perf ring buffer size configurable

As discussed in #966, this PR makes the size of the ring buffer used to send
data to userspace configurable. It changes the Python, Lua and C++ APIs to
expose this knob.

It also defaults the buffer size to a larger value (64 pages per CPU, an 8x
increase) for several tools which produce a lot of output, as well as making it
configurable in `trace` via a `-b` flag.
diff --git a/tools/cpuunclaimed.py b/tools/cpuunclaimed.py
index 9624b50..3998f9f 100755
--- a/tools/cpuunclaimed.py
+++ b/tools/cpuunclaimed.py
@@ -205,7 +205,7 @@
 trigger = int(0.8 * (1000000000 / frequency))
 
 # read events
-b["events"].open_perf_buffer(print_event)
+b["events"].open_perf_buffer(print_event, page_cnt=64)
 while 1:
     # allow some buffering by calling sleep(), to reduce the context switch
     # rate and lower overhead.