.. Source: Documentation/bpf/bpf_design_QA.rst (kernel/msm-4.19)

==============
BPF Design Q&A
==============

BPF extensibility and applicability to networking, tracing, security
in the Linux kernel and several user space implementations of the BPF
virtual machine led to a number of misunderstandings about what BPF actually is.
This short Q&A is an attempt to address that and outline a direction
of where BPF is heading long term.

.. contents::
    :local:
    :depth: 3

Questions and Answers
=====================

Q: Is BPF a generic instruction set similar to x64 and arm64?
-------------------------------------------------------------
A: NO.

Q: Is BPF a generic virtual machine?
-------------------------------------
A: NO.

BPF is a generic instruction set with C calling convention.
-----------------------------------------------------------

Q: Why was C calling convention chosen?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A: Because BPF programs are designed to run in the Linux kernel,
which is written in C. Hence BPF defines an instruction set compatible
with the two most used architectures, x64 and arm64 (and takes into
consideration important quirks of other architectures), and
defines a calling convention that is compatible with the C calling
convention of the Linux kernel on those architectures.

Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as return value.

Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set
(unlike the x64 ISA, which allows msft, cdecl and other conventions).

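As a rough illustration of the convention described in the answers above,
the following C sketch models a helper call the way an interpreter could
perform it: arguments are taken from R1-R5 and the single result lands in R0.
The names struct bpf_regs, helper_fn, call_helper() and sum5() are
hypothetical and exist only for this example; they are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical model of the BPF register file: R0..R10. */
struct bpf_regs {
	uint64_t r[11];
};

/* A helper takes at most five 64-bit arguments and returns one value. */
typedef uint64_t (*helper_fn)(uint64_t, uint64_t, uint64_t,
			      uint64_t, uint64_t);

/* Pass R1-R5 as arguments and store the single return value in R0. */
static void call_helper(struct bpf_regs *regs, helper_fn fn)
{
	regs->r[0] = fn(regs->r[1], regs->r[2], regs->r[3],
			regs->r[4], regs->r[5]);
}

/* Example helper used only to exercise call_helper(). */
static uint64_t sum5(uint64_t a, uint64_t b, uint64_t c,
		     uint64_t d, uint64_t e)
{
	return a + b + c + d + e;
}
```

Because the model has exactly one result slot and five argument slots, a
sixth argument or a second return value has nowhere to go without changing
the convention itself.
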
Q: Can BPF programs access instruction pointer or return address?
-----------------------------------------------------------------
A: NO.

Q: Can BPF programs access stack pointer?
------------------------------------------
A: NO.

Only the frame pointer (register R10) is accessible.
From the compiler's point of view it is necessary to have a stack pointer.
For example, LLVM defines register R11 as the stack pointer in its
BPF backend, but it makes sure that generated code never uses it.

Q: Does C-calling convention diminish possible use cases?
-----------------------------------------------------------
A: YES.

BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets the kernel call into
BPF programs and programs call kernel helpers with zero overhead,
as if all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.

Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------
A: Soft yes.

At least for now, until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read-only sections and all other normal constructs
that C code can produce.

Q: Can loops be supported in a safe way?
----------------------------------------
A: It's not clear yet.

BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in fewer than 4096 instructions.

Instruction level questions
---------------------------

Q: LD_ABS and LD_IND instructions vs C code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How come LD_ABS and LD_IND instructions are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?

A: This is an artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.

Q: BPF instructions mapping not one-to-one to native CPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: It seems not all BPF instructions are one-to-one to native CPU.
For example, why are BPF_JNE and other compare and jumps not CPU-like?

A: This was necessary to avoid introducing flags into the ISA, which are
impossible to make generic and efficient across CPU architectures.

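To make the "no flags" point concrete, here is a minimal C sketch of a fused
compare-and-jump: the comparison and the branch decision happen in a single
step, so no condition flags have to be kept between instructions. The
function jne_next_pc() and its parameters are hypothetical and only model
the idea; this is not the kernel interpreter.

```c
#include <stdint.h>

/*
 * Hypothetical model of a "jump if not equal" instruction.
 * Returns the index of the next instruction to execute.
 */
static int jne_next_pc(int pc, uint64_t dst, uint64_t src, int16_t off)
{
	if (dst != src)
		return pc + 1 + off;	/* taken: offset is relative to pc + 1 */
	return pc + 1;			/* not taken: fall through */
}
```

A CPU with a flags register would split this into a compare that sets flags
and a branch that reads them; a JIT for such a CPU can still emit that pair
from the single fused instruction.
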
Q: Why doesn't BPF_DIV instruction map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we had picked a one-to-one relationship to x64, it would have
been more complicated to support on arm64 and other archs. Also, it
needs a div-by-zero runtime check.

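The runtime check mentioned above can be pictured as a guard in front of the
native divide. This is a minimal sketch and not the kernel's implementation:
checked_udiv() is a hypothetical name, and what the runtime does when the
divisor is zero is deliberately left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical guarded unsigned divide.
 * Returns false and leaves *result untouched when src is zero.
 */
static bool checked_udiv(uint64_t dst, uint64_t src, uint64_t *result)
{
	if (src == 0)
		return false;	/* caller decides how to bail out */
	*result = dst / src;
	return true;
}
```

The guard is what keeps a bare hardware divide, which traps on a zero divisor
on some architectures, from being a direct translation of BPF_DIV.
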
Q: Why is there no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. LLVM errors out in such a case and
prints a suggestion to use unsigned divide instead.

Q: Why does BPF have implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because architectures like sparc have register windows, and in general
there are enough subtle differences between architectures that a naive
"store return address into stack" approach won't work. Another reason is
that BPF has to be safe from division by zero (and the legacy exception path
of the LD_ABS insn). Those instructions need to invoke the epilogue and
return implicitly.

Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because classic BPF didn't have them and BPF authors felt that a compiler
workaround would be acceptable. It turned out that programs lose performance
due to the lack of these compare instructions, so they were added.
These two instructions are a perfect example of what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have a one-to-one mapping to HW instructions
will not be accepted.

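One way to picture the compiler workaround mentioned above: without a
"less than" jump, the test a < b has to be expressed as b > a using the
existing "greater than" jump, which can cost extra register moves. The sketch
below only illustrates that equivalence for unsigned comparisons; cond_jgt()
and cond_jlt_via_jgt() are hypothetical names, not LLVM or kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition tested by an unsigned "jump if greater than". */
static bool cond_jgt(uint64_t dst, uint64_t src)
{
	return dst > src;
}

/* Workaround: "dst < src" expressed by swapping the operands. */
static bool cond_jlt_via_jgt(uint64_t dst, uint64_t src)
{
	return cond_jgt(src, dst);
}
```

With a native "less than" jump the swap, and any moves needed to get the
operands into the right registers, disappear.
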
Q: BPF 32-bit subregister requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of BPF
registers, which makes BPF an inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?

A: NO. The first step to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. The second step
is to teach the verifier to mark operations where zeroing the upper bits
is unnecessary. Then JITs can take advantage of those markings and
drastically reduce the size of generated code and improve performance.

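The zeroing requirement from the question can be modelled in a few lines of
C. This is a sketch of the semantics only: alu32_add() is a hypothetical
name, and the 64-bit values stand in for BPF registers.

```c
#include <stdint.h>

/* Model of a 32-bit ALU add operating on 64-bit registers. */
static uint64_t alu32_add(uint64_t dst, uint64_t src)
{
	uint32_t lo = (uint32_t)dst + (uint32_t)src;	/* wraps modulo 2^32 */

	return (uint64_t)lo;	/* upper 32 bits of the result are zero */
}
```

On a 64-bit CPU the zero extension usually comes for free; on a 32-bit CPU
the JIT has to emit an extra store of zero for the upper half unless it
knows that half is never read, which is the marking the answer refers to.
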
Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, the set of helper
functions and their arguments, and recognized return codes are all part
of the ABI. However, when tracing programs use the bpf_probe_read() helper
to walk kernel internal data structures and compile with kernel
internal headers, these accesses can and will break with newer
kernels. The union bpf_attr -> kern_version is checked at load time
to prevent accidentally loading kprobe-based BPF programs written
for a different kernel. Networking programs don't do the kern_version check.

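A minimal sketch of what such a load-time check could look like, assuming the
version is packed the same way as the kernel's KERNEL_VERSION(a, b, c) macro
and that the check is an exact match. KVER() and kprobe_version_ok() are
hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing, mirroring KERNEL_VERSION(a, b, c). */
#define KVER(a, b, c) \
	(((uint32_t)(a) << 16) + ((uint32_t)(b) << 8) + (uint32_t)(c))

/* Hypothetical check: accept the program only on an exact version match. */
static bool kprobe_version_ok(uint32_t prog_kern_version,
			      uint32_t running_version)
{
	return prog_kern_version == running_version;
}
```

Under these assumptions a kprobe program built against 4.14.0 would be
rejected on a 4.19.0 kernel, while a networking program would never reach
this check.
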
Q: How much stack space does a BPF program use?
--------------------------------------------------
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used,
and both the interpreter and most JITed code consume only the necessary amount.

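Since the stack is reached only through the frame pointer R10, the 512-byte
limit can be pictured as a bounds check on negative offsets from R10. The
sketch below is an illustration under that assumption; STACK_LIMIT and
stack_access_ok() are hypothetical names, not the verifier's code.

```c
#include <stdbool.h>

#define STACK_LIMIT 512	/* per-program stack limit stated above */

/*
 * Hypothetical check for an access of 'size' bytes at R10 + off.
 * Valid accesses lie entirely within [R10 - STACK_LIMIT, R10).
 */
static bool stack_access_ok(int off, int size)
{
	if (size <= 0)
		return false;
	if (off >= 0 || off < -STACK_LIMIT)
		return false;	/* at or above R10, or below the limit */
	return off + size <= 0;	/* must not run past the frame pointer */
}
```

For example, an 8-byte access at offset -8 or -512 is accepted, while offset
0, offset -520, or an 8-byte access at offset -4 is rejected.
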
Q: Can BPF be offloaded to HW?
------------------------------
A: YES. BPF HW offload is supported by the NFP driver.

Q: Does classic BPF interpreter still exist?
--------------------------------------------
A: NO. Classic BPF programs are converted into extended BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
A: NO.

Tracing BPF programs can read arbitrary memory with the bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.

Q: Can BPF overwrite arbitrary user memory?
-------------------------------------------
A: Sort-of.

Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such a
program is loaded the kernel will print a warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.

Q: bpf_trace_printk() helper warning
------------------------------------
Q: When the bpf_trace_printk() helper is used, the kernel prints a nasty
warning message. Why is that?

A: This is done to nudge program authors towards better interfaces when
programs need to pass data to user space. For example, bpf_perf_event_output()
can be used to efficiently stream data via the perf ring buffer, and
BPF maps can be used for asynchronous data sharing between kernel
and user space. bpf_trace_printk() should only be used for debugging.

Q: New functionality via kernel modules?
----------------------------------------
Q: Can BPF functionality such as new program or map types, new
helpers, etc. be added out of kernel module code?

A: NO.