Make perf ring buffer size configurable

As discussed in #966, this PR makes the size of the perf ring buffer used to send
event data to userspace configurable. It changes the Python, Lua and C++ APIs to
expose this knob.

It also raises the default buffer size to 64 pages per CPU (an 8x increase over
the previous default of 8 pages) for several tools that produce a lot of output,
and makes the size configurable in `trace` via a new `-b` flag.
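A minimal sketch of how a `-b` style flag could be parsed and validated with `argparse`. The helper names `power_of_two_pages` and `build_parser`, the `dest="buffer_pages"` name and the default of 64 are assumptions for illustration, not the actual `trace` implementation. The power-of-two check reflects the perf ring buffer requirement that the number of data pages be a power of two.

```python
import argparse


def power_of_two_pages(value: str) -> int:
    """argparse type (hypothetical helper): accept only positive powers of two."""
    try:
        n = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    # n & (n - 1) == 0 holds only for powers of two (and zero, excluded by n > 0)
    if n <= 0 or n & (n - 1) != 0:
        raise argparse.ArgumentTypeError(f"{n} is not a positive power of two")
    return n


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a trace-like tool (sketch, not the real trace CLI)."""
    parser = argparse.ArgumentParser(description="trace-like tool (sketch)")
    # Assumed default of 64 pages, mirroring the new default for high-output tools
    parser.add_argument(
        "-b",
        dest="buffer_pages",
        type=power_of_two_pages,
        default=64,
        help="perf ring buffer size in pages per CPU (default: %(default)d)",
    )
    return parser
```

In a BCC tool the parsed value would then be forwarded to the Python API, e.g. `b["events"].open_perf_buffer(print_event, page_cnt=args.buffer_pages)`. Rejecting bad values at parse time gives the user a clear error instead of a failure later when the buffer is opened.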
diff --git a/tools/biosnoop.py b/tools/biosnoop.py
index aa8a077..3d77e52 100755
--- a/tools/biosnoop.py
+++ b/tools/biosnoop.py
@@ -182,6 +182,6 @@
     start_ts = 1
 
 # loop with callback to print_event
-b["events"].open_perf_buffer(print_event)
+b["events"].open_perf_buffer(print_event, page_cnt=64)
 while 1:
     b.kprobe_poll()
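To put the 8x increase in perspective, here is a rough estimate of the memory a single perf buffer maps across all CPUs. It assumes the perf mmap layout described in `perf_event_open(2)`: each per-CPU mapping is one metadata page followed by `page_cnt` data pages. The function name `perf_buffer_bytes` is hypothetical.

```python
import mmap


def perf_buffer_bytes(page_cnt: int, ncpus: int, page_size: int = mmap.PAGESIZE) -> int:
    """Approximate bytes mapped for one perf buffer across all CPUs (sketch).

    Each per-CPU mapping holds page_cnt data pages plus one metadata page.
    """
    return (page_cnt + 1) * page_size * ncpus
```

For example, on a 4-CPU machine with 4 KiB pages this gives 147,456 bytes at the old default of 8 pages and 1,064,960 bytes at `page_cnt=64`, so the larger default costs roughly 1 MiB per buffer on such a machine in exchange for fewer lost events.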