Fix argdist, trace, tplist to use the libbcc USDT support (#698)

* Allow argdist to enable USDT probes without a pid

Previously, argdist passed only the pid to the USDT
class, so USDT probes could not be enabled from the
binary path alone. A probe that has no semaphore can
be enabled uniformly for all processes, and that is
now supported.
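
A minimal sketch of the pid-or-path selection that the
revised `USDT` constructor performs. `new_from_pid` and
`new_from_path` are hypothetical stand-ins for
`lib.bcc_usdt_new_frompid` and `lib.bcc_usdt_new_frompath`
so that the logic runs without libbcc. The sketch raises
`ValueError` where the patch raises a plain `Exception`.
A pid of `-1`, the value trace uses for "no pid", falls
through to the path:

```python
def new_from_pid(pid):
    # Hypothetical stand-in for lib.bcc_usdt_new_frompid(pid).
    return ("pid", pid)


def new_from_path(path):
    # Hypothetical stand-in for lib.bcc_usdt_new_frompath(path).
    return ("path", path)


def make_context(pid=None, path=None):
    """Prefer a real pid; otherwise fall back to the binary path."""
    if pid and pid != -1:  # None and -1 both mean "no pid"
        return new_from_pid(pid)
    elif path:
        return new_from_path(path)
    raise ValueError("either a pid or a binary path must be specified")


print(make_context(pid=-1, path="/home/vagrant/basic_usdt"))
# prints ('path', '/home/vagrant/basic_usdt')
```

When a pid is given it wins over the path, which matches
argdist passing both `path=self.library` and `pid=self.pid`.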

* Reintroduce USDT support into tplist

To print USDT probe information, tplist needs an API
to return the probe data, including the number of
arguments and locations for each probe. This commit
introduces this API, called bcc_usdt_foreach, and
invokes it from the revised tplist implementation.
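
The callback pattern behind `bcc_usdt_foreach` can be
exercised from Python with ctypes alone. The structure and
callback type below mirror the ones this commit adds to
bcc_usdt.h and libbcc.py; `fake_foreach` is a hypothetical
stand-in for `lib.bcc_usdt_foreach`, so this is a sketch of
the calling convention rather than bcc code:

```python
import ctypes as ct


class bcc_usdt(ct.Structure):
    # Same field order and types as struct bcc_usdt in bcc_usdt.h.
    _fields_ = [
        ("provider", ct.c_char_p),
        ("name", ct.c_char_p),
        ("bin_path", ct.c_char_p),
        ("semaphore", ct.c_ulonglong),
        ("num_locations", ct.c_int),
        ("num_arguments", ct.c_int),
    ]


# Matches: typedef void (*bcc_usdt_cb)(struct bcc_usdt *);
USDT_CB = ct.CFUNCTYPE(None, ct.POINTER(bcc_usdt))


def fake_foreach(probes, callback):
    # Hypothetical stand-in for lib.bcc_usdt_foreach: calls the
    # callback once per probe with a pointer to its bcc_usdt struct.
    for info in probes:
        callback(ct.pointer(info))


def enumerate_probes(probes):
    found = []

    def _add_probe(probe):
        info = probe.contents
        # Reading a c_char_p field returns a copied Python string.
        found.append((info.provider, info.name,
                      info.num_locations, info.num_arguments))

    # Keep the ctypes callback object referenced for the whole call.
    cb = USDT_CB(_add_probe)
    fake_foreach(probes, cb)
    return found


probes = [bcc_usdt(b"basic_usdt", b"loop_iter", b"/home/vagrant/basic_usdt",
                   0x601036, 2, 2)]
print(enumerate_probes(probes))
```

Because reading a `c_char_p` field copies the string into a
Python string (`bytes` on Python 3), the collected values
stay usable after the callback returns, which is what lets
`USDT.enumerate_probes` keep `USDTProbe` objects around.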

The output is not identical to that of the original
tplist, which could also print each probe location's
argument information. That information is not strictly
required by users of the argdist and trace tools, so
it is omitted for now.

* Fix trace.py tracepoint support

The import of the Perf class was missing from
tracepoint.py, which caused failures when trace
enabled kernel tracepoints.

* trace: Native bcc USDT support

trace now works again by using the new bcc USDT support
instead of the home-grown Python USDT parser. This
required an additional change in the BPF Python API
to allow multiple USDT context objects to be passed to
the constructor in order to support multiple USDT
probes in a single invocation of trace. Otherwise, the
USDT-related code in trace is greatly simplified and
uses the `bpf_usdt_readarg` macro to obtain probe
argument values.
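
To illustrate the code generation, here is a sketch of a
helper that turns an `argN` expression into the declaration
plus `bpf_usdt_readarg` call that trace and argdist now
prepend to a field assignment. `usdt_arg_preamble` is a
hypothetical name. Unlike the patch, which checks
`expr[0:3] == "arg"` and takes the single character
`expr[3]` as the index, it uses a regex so that only whole
`argN` expressions (including multi-digit indices) match.
It zero-initializes the variable, as the argdist version
does:

```python
import re

# Matches bare USDT argument references such as "arg1" or "arg12".
_ARG_RE = re.compile(r"^arg(\d+)$")


def usdt_arg_preamble(expr):
    """Return C code that declares and fills an argN variable, or ""."""
    m = _ARG_RE.match(expr.strip())
    if not m:
        return ""  # not a plain argN reference; nothing to emit
    name = m.group(0)   # e.g. "arg2"
    index = m.group(1)  # e.g. "2"
    return ("        u64 %s = 0;\n"
            "        bpf_usdt_readarg(%s, ctx, &%s);\n") % (name, index, name)


print(usdt_arg_preamble("arg2"), end="")
```

An expression such as `arg1 + 1` yields no preamble here,
whereas the prefix check in the patch would emit a
declaration for it.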

One minor inconvenience introduced by the bcc USDT
API is that a USDT probe with multiple locations in a
shared object *must* have a pid specified to be
enabled, even if it has no associated semaphore. The
reason is that the bcc USDT code determines which
location invoked the probe by inspecting `ctx->ip`.
For a shared object, that mapping can only be made
when a specific process is available, because only
the process shows where the shared object was loaded.
This limitation did not exist previously, because the
Python USDT reader did not look at `ctx->ip`; instead,
it generated separate code for each probe location,
identified by an incrementing id. The impact is small:
some probes cannot be enabled without a process id,
which is almost always desired for USDT probes anyway.
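
The reasoning above can be illustrated with a small,
hypothetical model; it is not the code bcc generates. In
this model each location is recorded as an address within
the shared object, and at run time `ctx->ip` equals that
address plus the object's load bias in the traced process.
Without a pid the bias is unknown, so a probe with several
locations cannot be disambiguated, while a probe with a
single location needs no disambiguation at all:

```python
def resolve_location(ip, locations, load_bias=None):
    """Return the index of the probe location that ip corresponds to.

    locations: probe addresses as recorded in the shared object.
    load_bias: where the object is loaded in the traced process, or
               None when no pid was given and the bias is unknown.
    """
    if len(locations) == 1:
        # Only one location: no need to look at ip at all.
        return 0
    if load_bias is None:
        # Cannot map ctx->ip back to a location without a process.
        raise LookupError("pid required: probe has multiple locations")
    for index, addr in enumerate(locations):
        if load_bias + addr == ip:
            return index
    return None  # ip does not belong to this probe


locs = [0x550, 0x56f]
print(resolve_location(0x7f000000056f, locs, load_bias=0x7f0000000000))
# prints 1
```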

argdist has not yet been retrofitted with support for
multiple USDT probes, and needs to be updated in a
separate commit.

* argdist: Support multiple USDT probes

argdist now supports multiple USDT probes, as it did
before the transition to the native bcc USDT support.
This requires aggregating the USDT objects from each
probe and passing them together to the BPF constructor
when the probes are initialized and attached.
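
The aggregation can be sketched with hypothetical stand-ins
(`FakeContext`, `FakeProbe`, and `build_program` are
illustrative names, not bcc APIs). Probes that are not USDT
probes have no context and are skipped. Each remaining
context's generated text is prepended to the program, and
its uprobes are attached afterwards, in the same order as
the revised `BPF.__init__`:

```python
class FakeContext(object):
    """Hypothetical stand-in for bcc's USDT class."""

    def __init__(self, text):
        self.text = text
        self.attached_to = None

    def get_text(self):
        return self.text

    def attach_uprobes(self, module):
        # The real method attaches one uprobe per enabled location.
        self.attached_to = module


class FakeProbe(object):
    """Hypothetical stand-in for an argdist Probe."""

    def __init__(self, usdt_ctx=None):
        self.usdt_ctx = usdt_ctx  # None for non-USDT probes


def build_program(text, usdt_contexts):
    """Prepend each context's generated text, then attach its uprobes."""
    for ctx in usdt_contexts:
        usdt_text = ctx.get_text()
        if usdt_text is None:
            raise RuntimeError("can't generate USDT probe arguments")
        text = usdt_text + text
    module = {"source": text}  # placeholder for the compiled BPF module
    for ctx in usdt_contexts:
        ctx.attach_uprobes(module)
    return module


probes = [FakeProbe(FakeContext("#define A 1\n")),
          FakeProbe(),
          FakeProbe(FakeContext("#define B 2\n"))]
usdt_contexts = [probe.usdt_ctx for probe in probes if probe.usdt_ctx]
module = build_program("int probe0(void *ctx) { return 0; }\n",
                       usdt_contexts)
```

Since each context's text is prepended in turn, the last
context's definitions end up first in the final program.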

Also add a more descriptive exception message to the
USDT class when it fails to enable a probe.
diff --git a/src/cc/bcc_usdt.h b/src/cc/bcc_usdt.h
index 7148b10..b7df011 100644
--- a/src/cc/bcc_usdt.h
+++ b/src/cc/bcc_usdt.h
@@ -26,6 +26,18 @@
 void *bcc_usdt_new_frompath(const char *path);
 void bcc_usdt_close(void *usdt);
 
+struct bcc_usdt {
+    const char *provider;
+    const char *name;
+    const char *bin_path;
+    uint64_t semaphore;
+    int num_locations;
+    int num_arguments;
+};
+
+typedef void (*bcc_usdt_cb)(struct bcc_usdt *);
+void bcc_usdt_foreach(void *usdt, bcc_usdt_cb callback);
+
 int bcc_usdt_enable_probe(void *, const char *, const char *);
 const char *bcc_usdt_genargs(void *);
 
diff --git a/src/cc/usdt.cc b/src/cc/usdt.cc
index a469eea..3bc4940 100644
--- a/src/cc/usdt.cc
+++ b/src/cc/usdt.cc
@@ -24,6 +24,7 @@
 #include "bcc_proc.h"
 #include "usdt.h"
 #include "vendor/tinyformat.hpp"
+#include "bcc_usdt.h"
 
 namespace USDT {
 
@@ -255,6 +256,19 @@
   return p && p->enable(fn_name);
 }
 
+void Context::each(each_cb callback) {
+  for (const auto &probe : probes_) {
+    struct bcc_usdt info = {0};
+    info.provider = probe->provider().c_str();
+    info.bin_path = probe->bin_path().c_str();
+    info.name = probe->name().c_str();
+    info.semaphore = probe->semaphore();
+    info.num_locations = probe->num_locations();
+    info.num_arguments = probe->num_arguments();
+    callback(&info);
+  }
+}
+
 void Context::each_uprobe(each_uprobe_cb callback) {
   for (auto &p : probes_) {
     if (!p->enabled())
@@ -288,7 +302,6 @@
 }
 
 extern "C" {
-#include "bcc_usdt.h"
 
 void *bcc_usdt_new_frompid(int pid) {
   USDT::Context *ctx = new USDT::Context(pid);
@@ -331,6 +344,11 @@
   return storage_.c_str();
 }
 
+void bcc_usdt_foreach(void *usdt, bcc_usdt_cb callback) {
+  USDT::Context *ctx = static_cast<USDT::Context *>(usdt);
+  ctx->each(callback);
+}
+
 void bcc_usdt_foreach_uprobe(void *usdt, bcc_usdt_uprobe_cb callback) {
   USDT::Context *ctx = static_cast<USDT::Context *>(usdt);
   ctx->each_uprobe(callback);
diff --git a/src/cc/usdt.h b/src/cc/usdt.h
index e67d89c..621f554 100644
--- a/src/cc/usdt.h
+++ b/src/cc/usdt.h
@@ -23,6 +23,8 @@
 #include "syms.h"
 #include "vendor/optional.hpp"
 
+struct bcc_usdt;
+
 namespace USDT {
 
 using std::experimental::optional;
@@ -148,6 +150,7 @@
 
   size_t num_locations() const { return locations_.size(); }
   size_t num_arguments() const { return locations_.front().arguments_.size(); }
+  uint64_t semaphore()   const { return semaphore_; }
 
   uint64_t address(size_t n = 0) const { return locations_[n].address_; }
   bool usdt_getarg(std::ostream &stream);
@@ -194,6 +197,9 @@
   bool enable_probe(const std::string &probe_name, const std::string &fn_name);
   bool generate_usdt_args(std::ostream &stream);
 
+  typedef void (*each_cb)(struct bcc_usdt *);
+  void each(each_cb callback);
+
   typedef void (*each_uprobe_cb)(const char *, const char *, uint64_t, int);
   void each_uprobe(each_uprobe_cb callback);
 };
diff --git a/src/python/bcc/__init__.py b/src/python/bcc/__init__.py
index 2ed0223..35d293d 100644
--- a/src/python/bcc/__init__.py
+++ b/src/python/bcc/__init__.py
@@ -149,7 +149,7 @@
         return None
 
     def __init__(self, src_file="", hdr_file="", text=None, cb=None, debug=0,
-            cflags=[], usdt=None):
+            cflags=[], usdt_contexts=[]):
         """Create a a new BPF module with the given source code.
 
         Note:
@@ -179,7 +179,15 @@
         self.tables = {}
         cflags_array = (ct.c_char_p * len(cflags))()
         for i, s in enumerate(cflags): cflags_array[i] = s.encode("ascii")
-        if usdt and text: text = usdt.get_text() + text
+        if text:
+            for usdt_context in usdt_contexts:
+                usdt_text = usdt_context.get_text()
+                if usdt_text is None:
+                    raise Exception("can't generate USDT probe arguments; " +
+                                    "possible cause is missing pid when a " +
+                                    "probe in a shared object has multiple " +
+                                    "locations")
+                text = usdt_text + text
 
         if text:
             self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"),
@@ -197,7 +205,8 @@
         if not self.module:
             raise Exception("Failed to compile BPF module %s" % src_file)
 
-        if usdt: usdt.attach_uprobes(self)
+        for usdt_context in usdt_contexts:
+            usdt_context.attach_uprobes(self)
 
         # If any "kprobe__" or "tracepoint__" prefixed functions were defined,
         # they will be loaded and attached here.
diff --git a/src/python/bcc/libbcc.py b/src/python/bcc/libbcc.py
index 257a83d..7273ee9 100644
--- a/src/python/bcc/libbcc.py
+++ b/src/python/bcc/libbcc.py
@@ -157,7 +157,23 @@
 lib.bcc_usdt_genargs.restype = ct.c_char_p
 lib.bcc_usdt_genargs.argtypes = [ct.c_void_p]
 
-_USDT_CB = ct.CFUNCTYPE(None, ct.c_char_p, ct.c_char_p, ct.c_ulonglong, ct.c_int)
+class bcc_usdt(ct.Structure):
+    _fields_ = [
+            ('provider', ct.c_char_p),
+            ('name', ct.c_char_p),
+            ('bin_path', ct.c_char_p),
+            ('semaphore', ct.c_ulonglong),
+            ('num_locations', ct.c_int),
+            ('num_arguments', ct.c_int),
+        ]
+
+_USDT_CB = ct.CFUNCTYPE(None, ct.POINTER(bcc_usdt))
+
+lib.bcc_usdt_foreach.restype = None
+lib.bcc_usdt_foreach.argtypes = [ct.c_void_p, _USDT_CB]
+
+_USDT_PROBE_CB = ct.CFUNCTYPE(None, ct.c_char_p, ct.c_char_p,
+                              ct.c_ulonglong, ct.c_int)
 
 lib.bcc_usdt_foreach_uprobe.restype = None
-lib.bcc_usdt_foreach_uprobe.argtypes = [ct.c_void_p, _USDT_CB]
+lib.bcc_usdt_foreach_uprobe.argtypes = [ct.c_void_p, _USDT_PROBE_CB]
diff --git a/src/python/bcc/tracepoint.py b/src/python/bcc/tracepoint.py
index 412c681..31abaaa 100644
--- a/src/python/bcc/tracepoint.py
+++ b/src/python/bcc/tracepoint.py
@@ -16,6 +16,7 @@
 import multiprocessing
 import os
 import re
+from .perf import Perf
 
 class Tracepoint(object):
         enabled_tracepoints = []
diff --git a/src/python/bcc/usdt.py b/src/python/bcc/usdt.py
index 98d87b8..3af28a4 100644
--- a/src/python/bcc/usdt.py
+++ b/src/python/bcc/usdt.py
@@ -12,34 +12,67 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from .libbcc import lib, _USDT_CB
+from .libbcc import lib, _USDT_CB, _USDT_PROBE_CB
+
+class USDTProbe(object):
+    def __init__(self, usdt):
+        self.provider = usdt.provider
+        self.name = usdt.name
+        self.bin_path = usdt.bin_path
+        self.semaphore = usdt.semaphore
+        self.num_locations = usdt.num_locations
+        self.num_arguments = usdt.num_arguments
+
+    def __str__(self):
+        return "%s %s:%s [sema 0x%x]\n  %d location(s)\n  %d argument(s)" % \
+               (self.bin_path, self.provider, self.name, self.semaphore,
+                self.num_locations, self.num_arguments)
+
+    def short_name(self):
+        return "%s:%s" % (self.provider, self.name)
 
 class USDT(object):
     def __init__(self, pid=None, path=None):
-        if pid:
+        if pid and pid != -1:
             self.pid = pid
             self.context = lib.bcc_usdt_new_frompid(pid)
             if self.context == None:
-                raise Exception("USDT failed to instrument PID %d" % pid) 
+                raise Exception("USDT failed to instrument PID %d" % pid)
         elif path:
             self.path = path
             self.context = lib.bcc_usdt_new_frompath(path)
             if self.context == None:
-                raise Exception("USDT failed to instrument path %s" % path) 
+                raise Exception("USDT failed to instrument path %s" % path)
+        else:
+            raise Exception("either a pid or a binary path must be specified")
 
     def enable_probe(self, probe, fn_name):
         if lib.bcc_usdt_enable_probe(self.context, probe, fn_name) != 0:
-            raise Exception("failed to enable probe '%s'" % probe)
+            raise Exception(("failed to enable probe '%s'; a possible cause " +
+                            "is that the probe requires a pid to enable") %
+                            probe)
 
     def get_text(self):
         return lib.bcc_usdt_genargs(self.context)
 
+    def enumerate_probes(self):
+        probes = []
+        def _add_probe(probe):
+            probes.append(USDTProbe(probe.contents))
+
+        lib.bcc_usdt_foreach(self.context, _USDT_CB(_add_probe))
+        return probes
+
+    # This is called by the BPF module's __init__ when it realizes that there
+    # is a USDT context and probes need to be attached.
     def attach_uprobes(self, bpf):
         probes = []
         def _add_probe(binpath, fn_name, addr, pid):
             probes.append((binpath, fn_name, addr, pid))
 
-        lib.bcc_usdt_foreach_uprobe(self.context, _USDT_CB(_add_probe))
+        lib.bcc_usdt_foreach_uprobe(self.context, _USDT_PROBE_CB(_add_probe))
 
         for (binpath, fn_name, addr, pid) in probes:
-            bpf.attach_uprobe(name=binpath, fn_name=fn_name, addr=addr, pid=pid)
+            bpf.attach_uprobe(name=binpath, fn_name=fn_name,
+                              addr=addr, pid=pid)
+
diff --git a/tools/argdist.py b/tools/argdist.py
index fa9276c..bc8aff3 100755
--- a/tools/argdist.py
+++ b/tools/argdist.py
@@ -175,8 +175,9 @@
                         self._bail("no exprs specified")
                 self.exprs = exprs.split(',')
 
-        def __init__(self, bpf, type, specifier):
-                self.pid = bpf.args.pid
+        def __init__(self, tool, type, specifier):
+                self.usdt_ctx = None
+                self.pid = tool.args.pid
                 self.raw_spec = specifier
                 self._validate_specifier()
 
@@ -200,8 +201,7 @@
                         self.library = parts[1]
                         self.probe_func_name = "%s_probe%d" % \
                                 (self.function, Probe.next_probe_index)
-                        bpf.enable_usdt_probe(self.function,
-                                        fn_name=self.probe_func_name)
+                        self._enable_usdt_probe()
                 else:
                         self.library = parts[1]
                 self.is_user = len(self.library) > 0
@@ -242,8 +242,10 @@
                         (self.function, Probe.next_probe_index)
                 Probe.next_probe_index += 1
 
-        def close(self):
-                pass
+        def _enable_usdt_probe(self):
+                self.usdt_ctx = USDT(path=self.library, pid=self.pid)
+                self.usdt_ctx.enable_probe(
+                        self.function, self.probe_func_name)
 
         def _substitute_exprs(self):
                 def repl(expr):
@@ -262,12 +264,17 @@
                 else:
                         return "%s v%d;\n" % (self.expr_types[i], i)
 
+        def _generate_usdt_arg_assignment(self, i):
+                expr = self.exprs[i]
+                if self.probe_type == "u" and expr[0:3] == "arg":
+                        return ("        u64 %s = 0;\n" +
+                                "        bpf_usdt_readarg(%s, ctx, &%s);\n") % \
+                                (expr, expr[3], expr)
+                else:
+                        return ""
+
         def _generate_field_assignment(self, i):
-                text = ""
-                if self.probe_type == "u" and self.exprs[i][0:3] == "arg":
-                    text = ("        u64 %s;\n" + 
-                           "        bpf_usdt_readarg(%s, ctx, &%s);\n") % \
-                           (self.exprs[i], self.exprs[i][3], self.exprs[i])
+                text = self._generate_usdt_arg_assignment(i)
                 if self._is_string(self.expr_types[i]):
                         return (text + "        bpf_probe_read(&__key.v%d.s," +
                                 " sizeof(__key.v%d.s), (void *)%s);\n") % \
@@ -291,8 +298,9 @@
 
         def _generate_key_assignment(self):
                 if self.type == "hist":
-                        return "%s __key = %s;\n" % \
-                                (self.expr_types[0], self.exprs[0])
+                        return self._generate_usdt_arg_assignment(0) + \
+                               ("%s __key = %s;\n" % \
+                                (self.expr_types[0], self.exprs[0]))
                 else:
                         text = "struct %s_key_t __key = {};\n" % \
                                 self.probe_hash_name
@@ -590,11 +598,6 @@
                         print("at least one specifier is required")
                         exit()
 
-        def enable_usdt_probe(self, probe_name, fn_name):
-                if not self.usdt_ctx:
-                        self.usdt_ctx = USDT(pid=self.args.pid)
-                self.usdt_ctx.enable_probe(probe_name, fn_name)
-
         def _generate_program(self):
                 bpf_source = """
 struct __string_t { char s[%d]; };
@@ -610,9 +613,13 @@
                 for probe in self.probes:
                         bpf_source += probe.generate_text()
                 if self.args.verbose:
-                        if self.usdt_ctx: print(self.usdt_ctx.get_text())
+                        for text in [probe.usdt_ctx.get_text() \
+                                     for probe in self.probes if probe.usdt_ctx]:
+                            print(text)
                         print(bpf_source)
-                self.bpf = BPF(text=bpf_source, usdt=self.usdt_ctx)
+                usdt_contexts = [probe.usdt_ctx
+                                 for probe in self.probes if probe.usdt_ctx]
+                self.bpf = BPF(text=bpf_source, usdt_contexts=usdt_contexts)
 
         def _attach(self):
                 Tracepoint.attach(self.bpf)
@@ -637,12 +644,6 @@
                            count_so_far >= self.args.count:
                                 exit()
 
-        def _close_probes(self):
-                for probe in self.probes:
-                        probe.close()
-                        if self.args.verbose:
-                                print("closed probe: " + str(probe))
-
         def run(self):
                 try:
                         self._create_probes()
@@ -654,7 +655,6 @@
                                 traceback.print_exc()
                         elif sys.exc_info()[0] is not SystemExit:
                                 print(sys.exc_info()[1])
-                self._close_probes()
 
 if __name__ == "__main__":
         Tool().run()
diff --git a/tools/tplist.py b/tools/tplist.py
index ff00744..2572041 100755
--- a/tools/tplist.py
+++ b/tools/tplist.py
@@ -13,7 +13,7 @@
 import re
 import sys
 
-from bcc import USDTReader
+from bcc import USDT
 
 trace_root = "/sys/kernel/debug/tracing"
 event_root = os.path.join(trace_root, "events")
@@ -21,7 +21,7 @@
 parser = argparse.ArgumentParser(description=
                 "Display kernel tracepoints or USDT probes and their formats.",
                 formatter_class=argparse.RawDescriptionHelpFormatter)
-parser.add_argument("-p", "--pid", type=int, default=-1, help=
+parser.add_argument("-p", "--pid", type=int, default=None, help=
                 "List USDT probes in the specified process")
 parser.add_argument("-l", "--lib", default="", help=
                 "List USDT probes in the specified library or executable")
@@ -65,23 +65,23 @@
                                 print_tpoint(category, event)
 
 def print_usdt(pid, lib):
-        reader = USDTReader(bin_path=lib, pid=pid)
+        reader = USDT(path=lib, pid=pid)
         probes_seen = []
-        for probe in reader.probes:
-                probe_name = "%s:%s" % (probe.provider, probe.name)
+        for probe in reader.enumerate_probes():
+                probe_name = probe.short_name()
                 if not args.filter or fnmatch.fnmatch(probe_name, args.filter):
                         if probe_name in probes_seen:
                                 continue
                         probes_seen.append(probe_name)
                         if args.variables:
-                                print(probe.display_verbose())
+                                print(probe)
                         else:
                                 print("%s %s:%s" % (probe.bin_path,
-                                        probe.provider, probe.name))
+                                                    probe.provider, probe.name))
 
 if __name__ == "__main__":
         try:
-                if args.pid != -1 or args.lib != "":
+                if args.pid or args.lib != "":
                         print_usdt(args.pid, args.lib)
                 else:
                         print_tracepoints()
diff --git a/tools/tplist_example.txt b/tools/tplist_example.txt
index dfa13e2..7beb9b2 100644
--- a/tools/tplist_example.txt
+++ b/tools/tplist_example.txt
@@ -17,25 +17,18 @@
 /home/vagrant/basic_usdt basic_usdt:loop_iter
 /home/vagrant/basic_usdt basic_usdt:end_main
 
-The loop_iter probe sounds interesting. What are the locations of that
-probe, and which variables are available?
+The loop_iter probe sounds interesting. How many arguments are available?
 
 $ tplist '*loop_iter' -l basic_usdt -v
 /home/vagrant/basic_usdt basic_usdt:loop_iter [sema 0x601036]
-  location 0x400550 raw args: -4@$42 8@%rax
-    4   signed bytes @ constant 42
-    8 unsigned bytes @ register %rax
-  location 0x40056f raw args: 8@-8(%rbp) 8@%rax
-    8 unsigned bytes @ -8(%rbp)
-    8 unsigned bytes @ register %rax
+  2 location(s)
+  2 argument(s)
 
 This output indicates that the loop_iter probe is used in two locations
-in the basic_usdt executable. The first location passes a constant value,
-42, to the probe. The second location passes a variable value located at
-an offset from the %rbp register. Don't worry -- you don't have to trace
-the register values yourself. The argdist and trace tools understand the
-probe format and can print out the arguments automatically -- you can
-refer to them as arg1, arg2, and so on.
+in the basic_usdt executable, and that it has two arguments. Fortunately,
+the argdist and trace tools understand the probe format and can print out
+the arguments automatically -- you can refer to them as arg1, arg2, and
+so on.
 
 Try to explore with some common libraries on your system and see if they
 contain UDST probes. Here are two examples you might find interesting:
diff --git a/tools/trace.py b/tools/trace.py
index d4b604b..d37c60d 100755
--- a/tools/trace.py
+++ b/tools/trace.py
@@ -59,6 +59,7 @@
                 cls.pid = args.pid or -1
 
         def __init__(self, probe, string_size):
+                self.usdt = None
                 self.raw_probe = probe
                 self.string_size = string_size
                 Probe.probe_count += 1
@@ -145,30 +146,15 @@
                         # We will discover the USDT provider by matching on
                         # the USDT name in the specified library
                         self._find_usdt_probe()
-                        self._enable_usdt_probe()
                 else:
                         self.library = parts[1]
                         self.function = parts[2]
 
-        def _enable_usdt_probe(self):
-                if self.usdt.need_enable():
-                        if Probe.pid == -1:
-                                self._bail("probe needs pid to enable")
-                        self.usdt.enable(Probe.pid)
-
-        def _disable_usdt_probe(self):
-                if self.probe_type == "u" and self.usdt.need_enable():
-                        self.usdt.disable(Probe.pid)
-
-        def close(self):
-                self._disable_usdt_probe()
-
         def _find_usdt_probe(self):
-                reader = USDTReader(bin_path=self.library)
-                for probe in reader.probes:
+                self.usdt = USDT(path=self.library, pid=Probe.pid)
+                for probe in self.usdt.enumerate_probes():
                         if probe.name == self.usdt_name:
-                                self.usdt = probe
-                                return
+                                return # Found it, will enable later
                 self._bail("unrecognized USDT probe %s" % self.usdt_name)
 
         def _parse_filter(self, filt):
@@ -219,7 +205,8 @@
         def _replace_args(self, expr):
                 for alias, replacement in Probe.aliases.items():
                         # For USDT probes, we replace argN values with the
-                        # actual arguments for that probe.
+                        # actual arguments for that probe obtained using special
+                        # bpf_readarg_N macros emitted at BPF construction.
                         if alias.startswith("arg") and self.probe_type == "u":
                                 continue
                         expr = expr.replace(alias, replacement)
@@ -294,15 +281,21 @@
 
         def _generate_field_assign(self, idx):
                 field_type = self.types[idx]
-                expr = self.values[idx]
+                expr = self.values[idx].strip()
+                text = ""
+                if self.probe_type == "u" and expr[0:3] == "arg":
+                        text = ("        u64 %s;\n" +
+                                "        bpf_usdt_readarg(%s, ctx, &%s);\n") % \
+                                (expr, expr[3], expr)
+
                 if field_type == "s":
-                        return """
+                        return text + """
         if (%s != 0) {
                 bpf_probe_read(&__data.v%d, sizeof(__data.v%d), (void *)%s);
         }
 """                     % (expr, idx, idx, expr)
                 if field_type in Probe.fmt_types:
-                        return "        __data.v%d = (%s)%s;\n" % \
+                        return text + "        __data.v%d = (%s)%s;\n" % \
                                         (idx, Probe.c_type[field_type], expr)
                 self._bail("unrecognized field type %s" % field_type)
 
@@ -324,23 +317,17 @@
                         pid_filter = ""
 
                 prefix = ""
-                qualifier = ""
                 signature = "struct pt_regs *ctx"
                 if self.probe_type == "t":
                         data_decl += self.tp.generate_struct()
                         prefix = self.tp.generate_get_struct()
-                elif self.probe_type == "u":
-                        signature += ", int __loc_id"
-                        prefix = self.usdt.generate_usdt_cases(
-                                pid=Probe.pid if Probe.pid != -1 else None)
-                        qualifier = "static inline"
 
                 data_fields = ""
                 for i, expr in enumerate(self.values):
                         data_fields += self._generate_field_assign(i)
 
                 text = """
-%s int %s(%s)
+int %s(%s)
 {
         %s
         %s
@@ -355,15 +342,10 @@
         return 0;
 }
 """
-                text = text % (qualifier, self.probe_name, signature,
+                text = text % (self.probe_name, signature,
                                pid_filter, prefix, self.filter,
                                self.struct_name, data_fields, self.events_name)
 
-                if self.probe_type == "u":
-                        self.usdt_thunk_names = []
-                        text += self.usdt.generate_usdt_thunks(
-                                        self.probe_name, self.usdt_thunk_names)
-
                 return data_decl + "\n" + text
 
         @classmethod
@@ -421,11 +403,7 @@
                         self._bail("unable to find library %s" % self.library)
 
                 if self.probe_type == "u":
-                        for i, location in enumerate(self.usdt.locations):
-                                bpf.attach_uprobe(name=libpath,
-                                        addr=location.address,
-                                        fn_name=self.usdt_thunk_names[i],
-                                        pid=Probe.pid)
+                        pass  # Already attached by the BPF constructor
                 elif self.probe_type == "r":
                         bpf.attach_uretprobe(name=libpath,
                                              sym=self.function,
@@ -511,7 +489,16 @@
                         print(self.program)
 
         def _attach_probes(self):
-                self.bpf = BPF(text=self.program)
+                usdt_contexts = []
+                for probe in self.probes:
+                    if probe.usdt:
+                        # USDT probes must be enabled before the BPF object
+                        # is initialized, because that's where the actual
+                        # uprobe is being attached.
+                        probe.usdt.enable_probe(
+                                probe.usdt_name, probe.probe_name)
+                        usdt_contexts.append(probe.usdt)
+                self.bpf = BPF(text=self.program, usdt_contexts=usdt_contexts)
                 Tracepoint.attach(self.bpf)
                 for probe in self.probes:
                         if self.args.verbose:
@@ -530,12 +517,6 @@
                 while True:
                         self.bpf.kprobe_poll()
 
-        def _close_probes(self):
-                for probe in self.probes:
-                        probe.close()
-                        if self.args.verbose:
-                                print("closed probe: " + str(probe))
-
         def run(self):
                 try:
                         self._create_probes()
@@ -547,7 +528,6 @@
                                 traceback.print_exc()
                         elif sys.exc_info()[0] is not SystemExit:
                                 print(sys.exc_info()[1])
-                self._close_probes()
 
 if __name__ == "__main__":
        Tool().run()