.TH MINIJAIL0 "1" "July 2011" "Chromium OS" "User Commands"
.SH NAME
minijail0 \- sandbox a process
.SH DESCRIPTION
.PP
Runs PROGRAM inside a sandbox. See \fBminijail\fR(1) for details.
.SH EXAMPLES

Safely switch from root to nobody while dropping all capabilities and
inheriting any groups from nobody:

  # minijail0 -c 0 -G -u nobody /usr/bin/whoami
  nobody

Run in a PID and VFS namespace without superuser capabilities (but still
as root) and with a private view of /proc:

  # minijail0 -p -v -r -c 0 /bin/ps
    PID TTY          TIME CMD
      1 pts/0    00:00:00 minijail0
      2 pts/0    00:00:00 ps

Run a process with a seccomp filter policy at reduced privileges:

  # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\
      /bin/cat /proc/self/seccomp_filter
  ...

.SH SECCOMP_FILTER POLICY
The policy file supplied to the \fB-S\fR argument supports the following syntax:

  \fB<syscall_name>\fR:\fB<ftrace filter policy>\fR
  \fB<syscall_number>\fR:\fB<ftrace filter policy>\fR
  \fB<empty line>\fR
  \fB# any single line comment\fR

Long lines may be broken up using \\ at the end.

A policy that emulates \fBseccomp\fR(2) in mode 1 may look like:
  read: 1
  write: 1
  sigreturn: 1
  exit: 1

The "1" acts as a wildcard and allows any use of the mentioned system
call. More advanced filtering is possible if your kernel supports
CONFIG_FTRACE_SYSCALLS. For example, we can allow a process to open any
file read-only and to mmap with PROT_READ only:

  # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
  open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
  mmap2: arg2 == 0x1
  munmap: 1
  close: 1

The supported arguments may be found by reviewing the system call
prototypes in the Linux kernel source code. Be aware that any
non-numeric comparison may be subject to time-of-check-time-of-use
attacks and cannot be considered safe.
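
The numeric values in the open rule above can be derived from the flag
constants. The following Python sketch is for illustration only and is not
part of Minijail. The octal values are assumptions taken from 32-bit x86
Linux headers; other architectures may use different numbers, and
flag_combinations is a hypothetical helper name.

```python
from itertools import combinations

# Assumed flag values from 32-bit x86 Linux headers; other
# architectures may define different numbers.
O_RDONLY = 0o0
O_NONBLOCK = 0o4000
O_LARGEFILE = 0o100000


def flag_combinations(base, optional):
    """Return every value formed by OR-ing base with a subset of optional."""
    values = set()
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            value = base
            for flag in subset:
                value |= flag
            values.add(value)
    return sorted(values)


allowed = flag_combinations(O_RDONLY, [O_NONBLOCK, O_LARGEFILE])
# Print the values as a chain of equality atoms for an open rule.
print(" || ".join(f"arg1 == {value}" for value in allowed))
```

Under these assumptions the sketch yields the same four values as the open
rule above, listed in ascending order.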

\fBexecve\fR may only be used when invoking with CAP_SYS_ADMIN privileges.

In order to promote reusability, policy files can include other policy files
using the following syntax:

  \fB@include /absolute/path/to/file.policy\fR
  \fB@include ./path/relative/to/CWD/file.policy\fR

Inclusion is limited to a single level (i.e. files that are \fB@include\fRd
cannot themselves \fB@include\fR more files), since deeper nesting makes
policies harder to understand.
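
The single-level rule can be modelled as follows. This Python sketch does
not reproduce Minijail's parser; expand_policy, read_file, and the file
paths are hypothetical names used only to illustrate how one level of
inclusion is expanded and a nested include is rejected.

```python
def expand_policy(path, read_file, depth=0):
    """Expand @include lines one level deep and reject nested includes.

    read_file is a hypothetical callable that maps a path to file text.
    """
    lines = []
    for line in read_file(path).splitlines():
        stripped = line.strip()
        if stripped.startswith("@include"):
            if depth >= 1:
                raise ValueError(f"{path}: nested @include is not allowed")
            included = stripped[len("@include"):].strip()
            lines.extend(expand_policy(included, read_file, depth + 1))
        else:
            lines.append(line)
    return lines


# In-memory stand-ins for policy files (hypothetical paths).
files = {
    "/etc/app.policy": "@include /etc/common.policy" + chr(10) + "open: 1",
    "/etc/common.policy": "read: 1" + chr(10) + "write: 1",
}
expanded = expand_policy("/etc/app.policy", files.__getitem__)
for line in expanded:
    print(line)
```

In this model the included rules are placed where the include line stood,
so the expanded policy lists read and write before open.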

.SH SECCOMP_FILTER SYNTAX
More formally, the expression after the colon can be an expression in
Disjunctive Normal Form (DNF): a disjunction ("or", \fI||\fR) of
conjunctions ("and", \fI&&\fR) of atoms.
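
As an illustration of how a DNF expression is evaluated, the Python sketch
below models an expression as a list of conjunctions, each a list of atoms.
The tuple representation, the evaluate_dnf name, and the sample rule are
assumptions made for this sketch and are not Minijail's implementation;
only the plain comparison operators are modelled.

```python
import operator

# Comparison operators modelled in this sketch.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate_dnf(expression, args):
    """Return True if any conjunction has all of its atoms satisfied.

    expression is a list of conjunctions; each conjunction is a list of
    (argument index, operator, value) atoms.
    """
    return any(
        all(OPS[op](args[index], value) for index, op, value in conjunction)
        for conjunction in expression
    )


# Hypothetical rule: arg0 == 1 && arg2 <= 4096 || arg0 == 2
rule = [[(0, "==", 1), (2, "<=", 4096)], [(0, "==", 2)]]
print(evaluate_dnf(rule, [1, 0, 512]))
print(evaluate_dnf(rule, [3, 0, 512]))
```

The call is allowed as soon as one conjunction holds, which is why the
second alternative does not need to repeat the size check.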

.SS "Atom Syntax"
Atoms are of the form \fIarg{DNUM} {OP} {VAL}\fR where:
.IP
\[bu] \fIDNUM\fR is a decimal number giving the zero-based index of the
system call argument

\[bu] \fIOP\fR is an unsigned comparison operator:
\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, \fI>=\fR, \fI&\fR (flags set),
or \fIin\fR (inclusion)

\[bu] \fIVAL\fR is a constant expression. It can be a named constant (like
\fBO_RDONLY\fR), a number (octal, decimal, or hexadecimal), a mask of constants
separated by \fI|\fR, or a parenthesized constant expression. Constant
expressions can also be prefixed with the bitwise complement operator \fI~\fR
to produce their complement.
.RE

\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, and \fI>=\fR are the usual
unsigned comparisons.

\fI&\fR will test for a flag being set, for example, O_RDONLY for
.BR open (2):

  open: arg1 & O_RDONLY

Minijail supports most common named constants, like O_RDONLY.
It's preferable to use named constants rather than numeric values as not all
architectures use the same numeric value.

When the possible combinations of allowed flags grow, specifying them all can
be cumbersome.
This is where the \fIin\fR operator comes in handy.
The system call will be allowed if and only if the flags set in the argument
are included (as a set) in the flags in the policy:

  mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS

This will allow \fBmmap\fR(2) as long as \fIarg3\fR (flags) has any combination
of MAP_PRIVATE and MAP_ANONYMOUS, but nothing else. One common use of this is
to restrict \fBmmap\fR(2) / \fBmprotect\fR(2) so that a mapping may be
writable or executable, but never both (write^exec):

  mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
  mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
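
The set-inclusion test described above can be modelled in a few lines. The
Python sketch below is an illustration of the described semantics, not
Minijail's filter code; flags_in and write_xor_exec are hypothetical names,
arguments are modelled as unsigned 64-bit values, and the PROT_* numbers are
the Linux values written out explicitly.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # model arguments as unsigned 64-bit values

# Linux protection flag values, written out so the sketch is portable.
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4


def flags_in(arg, allowed):
    """Model "arg in allowed": no bit is set in arg outside of allowed."""
    return (arg & ~allowed & MASK64) == 0


def write_xor_exec(prot):
    """Model the rule: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE."""
    no_exec = flags_in(prot, ~PROT_EXEC & MASK64)
    no_write = flags_in(prot, ~PROT_WRITE & MASK64)
    return no_exec or no_write


print(write_xor_exec(PROT_READ | PROT_WRITE))
print(write_xor_exec(PROT_READ | PROT_EXEC))
print(write_xor_exec(PROT_WRITE | PROT_EXEC))
```

Each alternative accepts any flags except the complemented one, so the
disjunction rejects only requests that set both PROT_WRITE and PROT_EXEC.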

.SS "Return Values"

By default, blocked syscalls cause the process to be killed.
The \fIreturn {NUM}\fR syntax can be used to force a specific errno to be
returned instead.

  read: return EBADF

This expression will block the \fBread\fR(2) syscall, make it return -1, and set
\fBerrno\fR to EBADF (9 on x86 platforms).

An expression can also include an optional \fIreturn <errno>\fR clause,
separated by a semicolon:

  read: arg0 == 0; return EBADF

That is, if the first argument to read is 0, then allow the syscall;
otherwise, block the syscall, return -1, and set \fBerrno\fR to EBADF.
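
The three possible outcomes described in this section can be summarized as
a small decision function. This Python sketch only models the behaviour
described above; decide, read_rule, and the outcome labels are hypothetical
and do not correspond to Minijail internals.

```python
import errno


def decide(condition_met, fallback_errno=None):
    """Model the outcome of one policy line for one system call.

    condition_met: whether the expression before the semicolon matched.
    fallback_errno: errno named in the return clause, or None if absent.
    """
    if condition_met:
        return ("allow", None)
    if fallback_errno is not None:
        return ("errno", fallback_errno)
    return ("kill", None)


def read_rule(fd):
    """Model the line: read: arg0 == 0; return EBADF"""
    return decide(fd == 0, errno.EBADF)


print(read_rule(0))
print(read_rule(3))
print(decide(False))
```

The last call models a rule without a return clause, where a non-matching
call falls back to the default of killing the process.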

.SH SECCOMP_FILTER POLICY WRITING

Determining policy for seccomp_filter can be time-consuming. System
calls are often named in architecture-specific or legacy ways,
e.g. geteuid versus geteuid32. On process death due to a seccomp filter
rule, the offending system call number will be supplied with a best
guess of the ABI-defined name. This information may be used to produce
working baseline policies. However, if the process being contained has
a fairly tight working domain, using \fBtools/generate_seccomp_policy.py\fR
with the output of \fBstrace -f -e raw=all <program>\fR can generate the list
of system calls that are needed. Note that when using libminijail or minijail
with preloading, supporting initial process setup calls will not be required.
Be conservative.
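
To show the idea behind deriving a baseline from a trace, the Python sketch
below collects system call names from strace-style lines and emits a
wildcard rule for each. It is a simplified stand-in written for this page,
not the logic of \fBtools/generate_seccomp_policy.py\fR; the line format it
accepts and the sample trace are assumptions.

```python
def syscall_name(line):
    """Return the syscall name at the start of a trace line, or None."""
    if "(" not in line:
        return None
    words = line.split("(", 1)[0].split()
    if not words:
        return None
    # The last word before the parenthesis skips a "[pid N]" prefix.
    name = words[-1]
    return name if name.isidentifier() else None


def baseline_policy(trace_lines):
    """Return sorted "name: 1" rules for each syscall seen in the trace."""
    names = set()
    for line in trace_lines:
        name = syscall_name(line)
        if name is not None:
            names.add(name)
    return [f"{name}: 1" for name in sorted(names)]


# Hypothetical trace excerpt in the style of strace -f -e raw=all.
trace = [
    "read(0x3, 0x7ffc1000, 0x1000) = 0x20",
    "[pid 4242] write(0x1, 0x7ffc2000, 0x20) = 0x20",
    "+++ exited with 0 +++",
    "read(0x3, 0x7ffc1000, 0x1000) = 0",
]
for rule in baseline_policy(trace):
    print(rule)
```

A baseline produced this way allows every observed call unconditionally, so
it should be tightened by hand with argument filters where possible.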

It's also possible to analyze the binary, checking for all non-dead
functions and determining whether any of them issue system calls. There is
no active implementation for this, but something like
code.google.com/p/seccompsandbox is one possible runtime variant.

.SH AUTHOR
The Chromium OS Authors <chromiumos-dev@chromium.org>
.SH COPYRIGHT
Copyright \(co 2011 The Chromium OS Authors
License BSD-like.
.SH "SEE ALSO"
\fBminijail\fR(1)