.TH MINIJAIL0 "5" "July 2011" "Chromium OS" "User Commands"
.SH NAME
minijail0 \- sandbox a process
.SH DESCRIPTION
.PP
Runs PROGRAM inside a sandbox. See \fBminijail0\fR(1) for details.
.SH EXAMPLES

Safely switch from root to nobody while dropping all capabilities and
inheriting any groups from nobody:

  # minijail0 -c 0 -G -u nobody /usr/bin/whoami
  nobody

Run in a PID and VFS namespace without superuser capabilities (but still
as root) and with a private view of /proc:

  # minijail0 -p -v -r -c 0 /bin/ps
    PID TTY           TIME CMD
      1 pts/0     00:00:00 minijail0
      2 pts/0     00:00:00 ps

Run a process with a seccomp filter policy at reduced privileges:

  # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\
              /bin/cat /proc/self/seccomp_filter
  ...
.SH SECCOMP_FILTER POLICY
The policy file supplied to the \fB-S\fR argument supports the following syntax:

  \fB<syscall_name>\fR:\fB<ftrace filter policy>\fR
  \fB<syscall_number>\fR:\fB<ftrace filter policy>\fR
  \fB<empty line>\fR
  \fB# any single line comment\fR

Long lines may be broken up using \\ at the end.

A policy that emulates \fBseccomp\fR(2) in mode 1 may look like:

  read: 1
  write: 1
  sigreturn: 1
  exit: 1

The "1" acts as a wildcard and allows any use of the mentioned system
call. More advanced filtering is possible if your kernel supports
CONFIG_FTRACE_SYSCALLS. For example, we can allow a process to open any
file read-only and to mmap with PROT_READ only:

  # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
  open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
  mmap2: arg2 == 0x1
  munmap: 1
  close: 1

The supported arguments may be found by reviewing the system call
prototypes in the Linux kernel source code. Be aware that any
non-numeric comparison may be subject to time-of-check-time-of-use
attacks and cannot be considered safe.

\fBexecve\fR may only be used when invoking with CAP_SYS_ADMIN privileges.

In order to promote reusability, policy files can include other policy files
using the following syntax:

  \fB@include /absolute/path/to/file.policy\fR
  \fB@include ./path/relative/to/CWD/file.policy\fR

Inclusion is limited to a single level (i.e. files that are \fB@include\fRd
cannot themselves \fB@include\fR more files), since deeper nesting makes the
policies harder to understand.
.SH SECCOMP_FILTER SYNTAX
More formally, the expression after the colon can be an expression in
Disjunctive Normal Form (DNF): a disjunction ("or", \fI||\fR) of
conjunctions ("and", \fI&&\fR) of atoms.
.SS "Atom Syntax"
Atoms are of the form \fIarg{DNUM} {OP} {VAL}\fR where:
.IP
\[bu] \fIDNUM\fR is a decimal number

\[bu] \fIOP\fR is an unsigned comparison operator:
\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, \fI>=\fR, \fI&\fR (flags set),
or \fIin\fR (inclusion)

\[bu] \fIVAL\fR is a constant expression. It can be a named constant (like
\fBO_RDONLY\fR), a number (octal, decimal, or hexadecimal), a mask of constants
separated by \fI|\fR, or a parenthesized constant expression. Constant
expressions can also be prefixed with the bitwise complement operator \fI~\fR
to produce their complement.
.RE

\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, and \fI>=\fR compare the
argument with \fIVAL\fR as unsigned integers.

\fI&\fR tests whether a flag is set, for example, O_CLOEXEC for
.BR open (2):

  open: arg1 & O_CLOEXEC

Minijail supports most common named constants, like O_RDONLY.
It's preferable to use named constants rather than numeric values as not all
architectures use the same numeric value.

When the possible combinations of allowed flags grow, specifying them all can
be cumbersome.
This is where the \fIin\fR operator comes in handy.
The system call will be allowed iff the flags set in the argument are included
(as a set) in the flags in the policy:

  mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS

This will allow \fBmmap\fR(2) as long as \fIarg3\fR (flags) has any combination
of MAP_PRIVATE and MAP_ANONYMOUS, but nothing else. One common use of this is
to restrict \fBmmap\fR(2) / \fBmprotect\fR(2) to only allow write^exec
mappings:

  mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
  mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
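.PP
As an illustrative sketch only, several atoms can be combined into one rule in
DNF. The system call and constants below are chosen for illustration and assume
that the Minijail build in use knows these names:

  socket: arg0 == AF_UNIX && arg1 == SOCK_STREAM || \\
          arg0 == AF_INET && arg1 == SOCK_DGRAM

Each \fI&&\fR group is one conjunction; the call is allowed if either group
matches.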
.SS "Return Values"

By default, blocked syscalls cause the process to be killed.
The \fIreturn {NUM}\fR syntax can be used to force a specific errno to be
returned instead.

  read: return EBADF

This expression will block the \fBread\fR(2) syscall, make it return -1, and set
\fBerrno\fR to EBADF (9 on x86 platforms).

An expression can also include an optional \fIreturn <errno>\fR clause,
separated by a semicolon:

  read: arg0 == 0; return EBADF

That is, if the first argument to read is 0, then allow the syscall;
else, block the syscall, return -1, and set \fBerrno\fR to EBADF.
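.PP
The return clause can follow a longer expression as well. A hypothetical rule,
given purely as an illustration (it assumes the named constants are known to
the Minijail build in use), might read:

  ioctl: arg1 == TCGETS || arg1 == TIOCGWINSZ; return ENOTTY

This would allow the two listed requests and make any other \fBioctl\fR(2)
fail with \fBerrno\fR set to ENOTTY instead of killing the process.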
.SH SECCOMP_FILTER POLICY WRITING

Determining policy for seccomp_filter can be time-consuming. System
calls are often named in arch-specific or legacy-tainted ways, e.g.,
geteuid versus geteuid32. On process death due to a seccomp filter
rule, the offending system call number will be supplied with a best
guess of the ABI-defined name. This information may be used to produce
working baseline policies. However, if the process being contained has
a fairly tight working domain, using \fBtools/generate_seccomp_policy.py\fR
with the output of \fBstrace -f -e raw=all <program>\fR can generate the list
of system calls that are needed. Note that when using libminijail or minijail
with preloading, supporting initial process setup calls will not be required.
Be conservative.

It's also possible to analyze the binary, checking all non-dead
functions to determine whether any of them issue system calls. There is
no active implementation for this, but something like
code.google.com/p/seccompsandbox is one possible runtime variant.
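.PP
As a rough sketch of the strace-based workflow described above (the file names
\fBtrace.log\fR and \fBprogram.policy\fR are placeholders, and the exact
invocation of the script is an assumption that may differ between Minijail
versions):

  # strace -f -e raw=all -o trace.log <program>
  # tools/generate_seccomp_policy.py trace.log > program.policy

Review the generated policy before use and remove any calls the contained
process should not need.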
.SH AUTHOR
The Chromium OS Authors <chromiumos-dev@chromium.org>
.SH COPYRIGHT
Copyright \(co 2011 The Chromium OS Authors
License BSD-like.
.SH "SEE ALSO"
.BR minijail0 (1)