{f,m}_bpf: allow for sharing maps

This larger work addresses one of the bigger remaining issues in tc's
eBPF frontend, namely allowing for persistent file descriptors.
Whenever tc parses the ELF object and extracts and loads maps into the
kernel, the resulting file descriptors are out of reach once the tc
instance exits. That is, for simple (unnested) programs that contain
one or more maps, the kernel holds a reference and the maps live on
inside the kernel until the program holding them is unloaded, but they
are out of reach for user space. This is even worse with (possibly
multiply nested) tail calls.

For this issue, we introduced the concept of an agent that can receive
the set of file descriptors from the tc instance creating them, in
order to further inspect/update map data for a specific use case.
However, while that is tied more towards specific applications, it
still doesn't easily allow for sharing maps across multiple tc
instances, and it would require a daemon running in the background.
For example, when a map should be shared by two eBPF programs, one
attached to ingress and one to egress, this currently doesn't work
with the tc frontend.

This work solves exactly that: if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within a
single object across different program sections (PIN_OBJECT_NS),
without losing the file descriptor set. To make that happen, we use
eBPF object pinning, introduced in kernel commit b2197755b263 ("bpf:
add support for persistent maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:

 - classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

 - classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its
value on ingress. Had no sharing (PIN_NONE) been chosen, the map value
would of course be 0, since two separate map instances would have been
created:

  [...]
  <idle>-0  [002] ..s. 38264.788234: : map val: 4
  <idle>-0  [002] ..s. 38264.788919: : map val: 4
  <idle>-0  [002] ..s. 38264.789599: : map val: 5
  [...]

Thus, if both sections reference the pinned map(s) in question, tc
takes care of fetching the appropriate file descriptor.

The patch has been tested extensively on both the classifier and
action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

commit: 32e93fb7f66d55d597b52ec3b10fd44a47784114
author: Daniel Borkmann <daniel@iogearbox.net> Fri Nov 13 00:39:29 2015 +0100
committer: Stephen Hemminger <shemming@brocade.com> Mon Nov 23 16:10:44 2015 -0800
tree: 592eebfbb71b163ac5da3d95669093a674264d6b
parent: e149d4e84384f88965ce43a6390acf7ba356187c
diff --git a/tc/tc_bpf.h b/tc/tc_bpf.h
index 2ad8812..dea3c3b 100644
--- a/tc/tc_bpf.h
+++ b/tc/tc_bpf.h
@@ -13,61 +13,56 @@
 #ifndef _TC_BPF_H_
 #define _TC_BPF_H_ 1
 
-#include <linux/filter.h>
 #include <linux/netlink.h>
-#include <linux/rtnetlink.h>
 #include <linux/bpf.h>
-#include <sys/syscall.h>
-#include <errno.h>
-#include <stdio.h>
-#include <stdint.h>
+#include <linux/magic.h>
 
 #include "utils.h"
 #include "bpf_scm.h"
 
+enum {
+	BPF_NLA_OPS_LEN = 0,
+	BPF_NLA_OPS,
+	BPF_NLA_FD,
+	BPF_NLA_NAME,
+	__BPF_NLA_MAX,
+};
+
+#define BPF_NLA_MAX	__BPF_NLA_MAX
+
 #define BPF_ENV_UDS	"TC_BPF_UDS"
+#define BPF_ENV_MNT	"TC_BPF_MNT"
+#define BPF_ENV_NOLOG	"TC_BPF_NOLOG"
 
-int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
-		     char **bpf_string, bool *need_release,
-		     const char separator);
-int bpf_parse_ops(int argc, char **argv, struct sock_filter *bpf_ops,
-		  bool from_file);
-void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
+#ifndef BPF_FS_MAGIC
+# define BPF_FS_MAGIC	0xcafe4a11
+#endif
 
+#define BPF_DIR_MNT	"/sys/fs/bpf"
+
+#define BPF_DIR_TC	"tc"
+#define BPF_DIR_GLOBALS	"globals"
+
+#ifndef TRACEFS_MAGIC
+# define TRACEFS_MAGIC	0x74726163
+#endif
+
+#define TRACE_DIR_MNT	"/sys/kernel/tracing"
+
+int bpf_trace_pipe(void);
 const char *bpf_default_section(const enum bpf_prog_type type);
 
-#ifdef HAVE_ELF
-int bpf_open_object(const char *path, enum bpf_prog_type type,
-		    const char *sec, bool verbose);
+int bpf_parse_common(int *ptr_argc, char ***ptr_argv, const int *nla_tbl,
+		     enum bpf_prog_type type, const char **ptr_object,
+		     const char **ptr_uds_name, struct nlmsghdr *n);
 
+void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len);
+
+#ifdef HAVE_ELF
 int bpf_send_map_fds(const char *path, const char *obj);
 int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
 		     unsigned int entries);
-
-static inline __u64 bpf_ptr_to_u64(const void *ptr)
-{
-	return (__u64) (unsigned long) ptr;
-}
-
-static inline int bpf(int cmd, union bpf_attr *attr, unsigned int size)
-{
-#ifdef __NR_bpf
-	return syscall(__NR_bpf, cmd, attr, size);
 #else
-	fprintf(stderr, "No bpf syscall, kernel headers too old?\n");
-	errno = ENOSYS;
-	return -1;
-#endif
-}
-#else
-static inline int bpf_open_object(const char *path, enum bpf_prog_type type,
-				  const char *sec, bool verbose)
-{
-	fprintf(stderr, "No ELF library support compiled in.\n");
-	errno = ENOSYS;
-	return -1;
-}
-
 static inline int bpf_send_map_fds(const char *path, const char *obj)
 {
 	return 0;