trace, argdist: -I switch for trace and miscellaneous fixes (#761)

* trace: Additional include files support

Similarly to `argdist`, `trace` now has a `-I` option for adding
include files that can be used in filter and print expressions.
This also required a slight modification to `argdist`'s syntax
for consistency: where previously we would allow `-I header1 header2`,
we now require `-I header1 -I header2`, so that there is no ambiguity
about which argument is a header file and which is a probe for `trace`.

This is very unlikely to break anyone: I haven't seen the `-I` option
used at all, let alone with multiple headers.

Also made sure the man and example pages are up to date.
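
For illustration, here is a minimal standalone sketch of how a repeatable
`-I` option stays unambiguous next to positional probes. It mirrors the
`action="append"` approach in the diff but is not the tool's actual code:
`build_parser`, `generate_includes`, the `trace-sketch` program name, and
the header and probe values are all made up for the example.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the two arguments relevant to this change.
    parser = argparse.ArgumentParser(prog="trace-sketch")
    # action="append" collects one header per -I occurrence, so each -I
    # consumes exactly one value and never swallows a probe.
    parser.add_argument("-I", "--include", action="append",
                        metavar="header",
                        help="additional header files to include")
    parser.add_argument("probes", metavar="probe", nargs="+",
                        help="probe specifier")
    return parser


def generate_includes(headers):
    # args.include is None when -I was never given, hence the "or []".
    return "".join("#include <%s>\n" % h for h in (headers or []))


# Illustrative command line: two headers followed by one probe.
args = build_parser().parse_args(
    ["-I", "net/sock.h", "-I", "linux/fs.h", "p::do_sys_open"])
print(args.include)   # ['net/sock.h', 'linux/fs.h']
print(args.probes)    # ['p::do_sys_open']
print(generate_includes(args.include), end="")
```

With `nargs="+"` on `-I` instead, the parser could not tell where the
header list ends and the probe list begins, which is the mixup the new
syntax avoids.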

* argdist: Update -C and -H switches for consistency

This commit updates `argdist`'s `-H` and `-C` switches for consistency
with the `-I` switch and `trace`'s switches. Specifically, each probe
needs an explicit `-C` or `-H` specifier in front of it. This also
allows safe and understandable mixing of histogram and counting probes,
for example:

```
argdist -C 'p:c:write()' -H 'p::vfs__write(int fd, const void *buf, size_t size):size_t:size#write sizes'
```
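
As a rough sketch of why the explicit specifiers make mixing safe: if
both switches are declared with `action="append"`, every probe string is
tied to exactly one switch. The snippet below is an assumption about the
shape of such a parser, not `argdist`'s real code; `parse_specifiers`, the
`dest` names, and the `"freq"`/`"hist"` tags are hypothetical.

```python
import argparse


def parse_specifiers(argv):
    # Hypothetical parser with only the two switches discussed here.
    parser = argparse.ArgumentParser(prog="argdist-sketch")
    parser.add_argument("-C", "--count", action="append",
                        dest="counts", metavar="specifier",
                        help="probe specifier to count")
    parser.add_argument("-H", "--histogram", action="append",
                        dest="hists", metavar="specifier",
                        help="probe specifier to histogram")
    args = parser.parse_args(argv)
    # Tag each specifier with the kind of switch that introduced it.
    tagged = [("freq", s) for s in (args.counts or [])]
    tagged += [("hist", s) for s in (args.hists or [])]
    return tagged


# The command line from the example above, split into argv form.
specs = parse_specifiers([
    "-C", "p:c:write()",
    "-H", "p::vfs__write(int fd, const void *buf, size_t size)"
          ":size_t:size#write sizes",
])
for kind, spec in specs:
    print(kind, spec)
```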

* trace: Fix stack trace support for tracepoints

Tracepoint probes don't have a `ctx` argument; it's called `args`
instead. The recently-added stack trace support code didn't take
this into account, and consequently didn't work for tracepoints.
This commit fixes the issue, so we can now do things like
`trace -K t:block:block_rq_complete`.
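
The gist of the fix can be sketched as a standalone function that picks
the context name before formatting the `get_stackid` calls. This is a
simplified stand-in for the probe code in the diff below, not a copy of
it: `generate_stack_trace` and the `"stacks"` table name are invented for
the example.

```python
def generate_stack_trace(probe_type, stacks_name,
                         user_stack=False, kernel_stack=False):
    # Tracepoint probes ("t") expose their context as "args";
    # other probe types use "ctx".
    ctx_name = "args" if probe_type == "t" else "ctx"
    stack_trace = ""
    if user_stack:
        stack_trace += """
        __data.user_stack_id = %s.get_stackid(
          %s, BPF_F_REUSE_STACKID | BPF_F_USER_STACK
        );""" % (stacks_name, ctx_name)
    if kernel_stack:
        stack_trace += """
        __data.kernel_stack_id = %s.get_stackid(
          %s, BPF_F_REUSE_STACKID
        );""" % (stacks_name, ctx_name)
    return stack_trace


# Tracepoint probe with -K: the generated C passes "args".
print(generate_stack_trace("t", "stacks", kernel_stack=True))
# Kprobe with -U: the generated C passes "ctx".
print(generate_stack_trace("p", "stacks", user_stack=True))
```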
diff --git a/tools/trace.py b/tools/trace.py
index 6915fc0..d6aef8d 100755
--- a/tools/trace.py
+++ b/tools/trace.py
@@ -4,7 +4,7 @@
 #               parameters, with an optional filter.
 #
 # USAGE: trace [-h] [-p PID] [-v] [-Z STRING_SIZE] [-S] [-M MAX_EVENTS] [-o]
-#              probe [probe ...]
+#              [-K] [-U] [-I header] probe [probe ...]
 #
 # Licensed under the Apache License, Version 2.0 (the "License")
 # Copyright (C) 2016 Sasha Goldshtein.
@@ -362,18 +362,6 @@
                 for i, expr in enumerate(self.values):
                         data_fields += self._generate_field_assign(i)
 
-                stack_trace = ""
-                if self.user_stack:
-                        stack_trace += """
-        __data.user_stack_id = %s.get_stackid(
-          ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK
-        );""" % self.stacks_name
-                if self.kernel_stack:
-                        stack_trace += """
-        __data.kernel_stack_id = %s.get_stackid(
-          ctx, BPF_F_REUSE_STACKID
-        );""" % self.stacks_name
-
                 if self.probe_type == "t":
                         heading = "TRACEPOINT_PROBE(%s, %s)" % \
                                   (self.tp_category, self.tp_event)
@@ -381,6 +369,19 @@
                 else:
                         heading = "int %s(%s)" % (self.probe_name, signature)
                         ctx_name = "ctx"
+
+                stack_trace = ""
+                if self.user_stack:
+                        stack_trace += """
+        __data.user_stack_id = %s.get_stackid(
+          %s, BPF_F_REUSE_STACKID | BPF_F_USER_STACK
+        );""" % (self.stacks_name, ctx_name)
+                if self.kernel_stack:
+                        stack_trace += """
+        __data.kernel_stack_id = %s.get_stackid(
+          %s, BPF_F_REUSE_STACKID
+        );""" % (self.stacks_name, ctx_name)
+
                 text = heading + """
 {
         %s
@@ -551,10 +552,13 @@
                   help="use relative time from first traced message")
                 parser.add_argument("-K", "--kernel-stack", action="store_true",
                   help="output kernel stack trace")
-                parser.add_argument("-U", "--user_stack", action="store_true",
+                parser.add_argument("-U", "--user-stack", action="store_true",
                   help="output user stack trace")
                 parser.add_argument(metavar="probe", dest="probes", nargs="+",
                   help="probe specifier (see examples)")
+                parser.add_argument("-I", "--include", action="append",
+                  metavar="header",
+                  help="additional header files to include in the BPF program")
                 self.args = parser.parse_args()
 
         def _create_probes(self):
@@ -571,6 +575,8 @@
 #include <linux/sched.h>        /* For TASK_COMM_LEN */
 
 """
+                for include in (self.args.include or []):
+                        self.program += "#include <%s>\n" % include
                 self.program += BPF.generate_auto_includes(
                         map(lambda p: p.raw_probe, self.probes))
                 for probe in self.probes: