| Device Whitelist Controller |
| |
| 1. Description |
| |
| Implement a cgroup to track and enforce open and mknod restrictions |
| on device files. A device cgroup associates a device access |
| whitelist with each cgroup. A whitelist entry has 4 fields: type, |
| major, minor, and access. 'type' is a (all), c (char), or b (block); |
| 'a' means the entry applies to all types and all major and minor |
| numbers. Major and minor are each either an integer or * for all. |
| Access is a composition of r (read), w (write), and m (mknod). |
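| |
| As a rough illustration of the entry format, the following shell sketch |
| checks whether a string looks like a whitelist entry. The function name |
| is_valid_entry and its e_* variables are made up for this example; the |
| kernel does its own parsing and may accept or reject more than this does. |
| |
```shell
# Hypothetical helper, not part of the kernel interface: returns 0 if "$1"
# looks like a whitelist entry "<type> <major>:<minor> <access>".
is_valid_entry() {
    # A bare "a" is accepted as shorthand for all devices.
    [ "$1" = "a" ] && return 0

    # read splits on whitespace without expanding "*" as a glob.
    read -r e_type e_dev e_access e_extra <<EOF
$1
EOF
    # Exactly three words are required.
    [ -n "$e_access" ] && [ -z "$e_extra" ] || return 1

    # type: a, b or c
    case "$e_type" in
        a|b|c) ;;
        *) return 1 ;;
    esac

    # major:minor, each a decimal integer or "*"
    e_major=${e_dev%%:*}
    e_minor=${e_dev#*:}
    [ "$e_dev" = "$e_major:$e_minor" ] || return 1
    for e_num in "$e_major" "$e_minor"; do
        case "$e_num" in
            '*') ;;
            ''|*[!0-9]*) return 1 ;;
        esac
    done

    # access: a non-empty combination of r, w and m
    case "$e_access" in
        *[!rwm]*) return 1 ;;
    esac
    return 0
}
```
| |
| For example, is_valid_entry 'c 1:3 mr' succeeds, while |
| is_valid_entry 'c 1:3 rx' fails because x is not an access letter. |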
| |
| The root device cgroup starts with rwm to 'all'. A child device |
| cgroup gets a copy of the parent. Administrators can then remove |
| devices from the whitelist or add new entries. A child cgroup can |
| never receive a device access which is denied by its parent. |
| |
| 2. User Interface |
| |
| An entry is added using devices.allow, and removed using |
| devices.deny. For instance, |
| |
| echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow |
| |
| allows cgroup 1 to read and mknod the device usually known as |
| /dev/null. Doing |
| |
| echo a > /sys/fs/cgroup/1/devices.deny |
| |
| will remove the default 'a *:* rwm' entry. Doing |
| |
| echo a > /sys/fs/cgroup/1/devices.allow |
| |
| will add the 'a *:* rwm' entry to the whitelist. |
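| |
| Putting the two files together, a cgroup can be reduced to a short list |
| of devices by first denying everything and then allowing entries one at |
| a time. The helper name set_whitelist below is hypothetical. The sketch |
| assumes the caller holds CAP_SYS_ADMIN and that the cgroup has no |
| children, since writing 'a' is not possible otherwise. |
| |
```shell
# Hypothetical helper: start from an empty whitelist and allow only the
# entries given as arguments.  $1 is the cgroup directory.
set_whitelist() {
    cgdir=$1
    shift
    # Remove the default 'a *:* rwm' entry first.
    echo a > "$cgdir/devices.deny" || return 1
    # Write one entry per echo, as in the examples above.
    for entry in "$@"; do
        echo "$entry" > "$cgdir/devices.allow" || return 1
    done
}
```
| |
| Under those assumptions, set_whitelist /sys/fs/cgroup/1 'c 1:3 mr' |
| should leave cgroup 1 able to read and mknod /dev/null and nothing else. |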
| |
| 3. Security |
| |
| Any task can move itself between cgroups. This clearly won't |
| suffice, but we can decide the best way to adequately restrict |
| movement as people get some experience with this. We may just want |
| to require CAP_SYS_ADMIN, which at least is a separate bit from |
| CAP_MKNOD. We may want to just refuse moving to a cgroup which |
| isn't a descendant of the current one. Or we may want to use |
| CAP_MAC_ADMIN, since we really are trying to lock down root. |
| |
| CAP_SYS_ADMIN is needed to modify the whitelist or move another |
| task to a new cgroup. (Again, we'll probably want to change that.) |
| |
| A cgroup may not be granted more permissions than the cgroup's |
| parent has. |
| |
| 4. Hierarchy |
| |
| Device cgroups maintain the hierarchy by making sure a cgroup never has |
| more access permissions than its parent. Every time an entry is written to |
| a cgroup's devices.deny file, all its children will have that entry removed |
| from their whitelist and all the locally set whitelist entries will be |
| re-evaluated. If one of the locally set whitelist entries would provide |
| more access than the cgroup's parent has, it is removed from the whitelist. |
| |
| Example: |
|       A |
|      / \ |
|         B |
| |
|     group        behavior        exceptions |
|     A            allow           "b 8:* rwm", "c 116:1 rw" |
|     B            deny            "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm" |
| |
| If a device is denied in group A: |
| # echo "c 116:* r" > A/devices.deny |
| it'll propagate down and after revalidating B's entries, the whitelist entry |
| "c 116:2 rwm" will be removed: |
| |
|     group        whitelist entries                        denied devices |
|     A            all                                      "b 8:* rwm", "c 116:* rw" |
|     B            "c 1:3 rwm", "b 3:* rwm"                 all the rest |
| |
| If the parent's exceptions change so that local exceptions are no longer |
| allowed, those local exceptions are deleted. |
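| |
| The revalidation in this example can be modelled in a few lines of |
| shell. This is a simplified sketch, not the kernel's algorithm: it |
| assumes that a child's entry is dropped whenever it overlaps the entry |
| denied in the parent, and it ignores the 'a' type. The function names |
| field_match, entries_overlap and surviving_entries are hypothetical. |
| |
```shell
# Two major or minor fields match if they are equal or either is "*".
field_match() {
    [ "$1" = "$2" ] || [ "$1" = '*' ] || [ "$2" = '*' ]
}

# Succeeds if entries "$1" and "$2" can describe the same device and share
# at least one access letter.
entries_overlap() {
    read -r t1 d1 a1 <<EOF
$1
EOF
    read -r t2 d2 a2 <<EOF
$2
EOF
    [ "$t1" = "$t2" ] || return 1
    field_match "${d1%%:*}" "${d2%%:*}" || return 1   # major
    field_match "${d1#*:}" "${d2#*:}" || return 1     # minor
    for letter in r w m; do
        case "$a1" in
            *"$letter"*)
                case "$a2" in
                    *"$letter"*) return 0 ;;
                esac ;;
        esac
    done
    return 1
}

# Print the child entries (arguments 2..n) that survive after "$1" is
# denied in the parent.
surviving_entries() {
    denied=$1
    shift
    for entry in "$@"; do
        entries_overlap "$entry" "$denied" || echo "$entry"
    done
}

# B's entries from the example, after "c 116:* r" is denied in A.
surviving_entries "c 116:* r" "c 1:3 rwm" "c 116:2 rwm" "b 3:* rwm"
```
| |
| The call prints "c 1:3 rwm" and "b 3:* rwm", matching B's whitelist in |
| the table above. |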
| |
| Notice that new whitelist entries will not be propagated: |
|       A |
|      / \ |
|         B |
| |
|     group        whitelist entries                        denied devices |
|     A            "c 1:3 rwm", "c 1:5 r"                   all the rest |
|     B            "c 1:3 rwm", "c 1:5 r"                   all the rest |
| |
| when adding "c *:3 rwm": |
| # echo "c *:3 rwm" >A/devices.allow |
| |
| the result: |
|     group        whitelist entries                        denied devices |
|     A            "c *:3 rwm", "c 1:5 r"                   all the rest |
|     B            "c 1:3 rwm", "c 1:5 r"                   all the rest |
| |
| but now it'll be possible to add new entries to B: |
| # echo "c 2:3 rwm" >B/devices.allow |
| # echo "c 50:3 r" >B/devices.allow |
| or even |
| # echo "c *:3 rwm" >B/devices.allow |
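| |
| Why these writes to B now succeed can be illustrated with a small |
| coverage check. This is a sketch under simplifying assumptions, not |
| kernel code: it compares a child entry against a single parent entry |
| and ignores the 'a' type. The name entry_covered_by is hypothetical. |
| |
```shell
# Succeeds if the child entry "$1" is fully covered by the parent
# whitelist entry "$2".
entry_covered_by() {
    read -r c_type c_dev c_access <<EOF
$1
EOF
    read -r p_type p_dev p_access <<EOF
$2
EOF
    [ "$c_type" = "$p_type" ] || return 1

    # The parent's major and minor must be "*" or equal to the child's.
    p_major=${p_dev%%:*}
    c_major=${c_dev%%:*}
    p_minor=${p_dev#*:}
    c_minor=${c_dev#*:}
    [ "$p_major" = '*' ] || [ "$p_major" = "$c_major" ] || return 1
    [ "$p_minor" = '*' ] || [ "$p_minor" = "$c_minor" ] || return 1

    # Every access letter the child asks for must be granted by the parent.
    for letter in r w m; do
        case "$c_access" in
            *"$letter"*)
                case "$p_access" in
                    *"$letter"*) ;;
                    *) return 1 ;;
                esac ;;
        esac
    done
    return 0
}
```
| |
| With A's old entry, entry_covered_by "c 2:3 rwm" "c 1:3 rwm" fails. With |
| the new one, entry_covered_by "c 2:3 rwm" "c *:3 rwm" succeeds, as do |
| the checks for "c 50:3 r" and "c *:3 rwm". |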
| |
| Allowing or denying all by writing 'a' to devices.allow or devices.deny will |
| not be possible once the device cgroup has children. |
| |
| 4.1 Hierarchy (internal implementation) |
| |
| Device cgroups are implemented internally using a behavior (ALLOW or DENY) |
| and a list of exceptions. The internal state is controlled using the same |
| user interface to preserve compatibility with the previous whitelist-only |
| implementation. Removal or addition of exceptions that reduces access to |
| devices is propagated down the hierarchy. |
| For every propagated exception, the effective rules are re-evaluated based |
| on the parent's current access rules. |