The cgroup freezer is useful to batch job management systems which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells:

    $ echo $$
    16644
    $ bash
    $ echo $$
    16690

    From a second, unrelated bash shell:
    $ kill -SIGSTOP 16690
    $ kill -SIGCONT 16690

    <at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

Another example of a program which catches and responds to these
signals is gdb. In fact, any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The cgroup freezer is hierarchical. Freezing a cgroup freezes all
tasks belonging to the cgroup and all its descendant cgroups. Each
cgroup has its own state (self-state) and the state inherited from the
parent (parent-state). Iff both states are THAWED, the cgroup is
THAWED.
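The rule above can be modeled in a few lines of shell. This is a minimal sketch, not kernel code: `freezer_effective` is a hypothetical helper, and it assumes that a cgroup's parent-state is freezing whenever any ancestor's self-state is FROZEN. It cannot tell FREEZING from FROZEN, since that depends on whether every task has actually stopped.

```shell
#!/bin/sh
# Hypothetical model of the freezer's hierarchical state (not a kernel API).
# Usage: freezer_effective SELF [ANCESTOR_SELF...]
#   Each argument is a 0/1 self-state flag (1 = FROZEN was written):
#   first the cgroup itself, then its ancestors, nearest parent first.
# Prints THAWED, or FREEZING for "freezing" (the kernel would report either
# FREEZING or FROZEN, depending on whether all tasks have stopped yet).
freezer_effective() {
    self=$1
    shift
    # parent-state: freezing if any ancestor has asked to freeze
    parent=0
    for anc in "$@"; do
        if [ "$anc" = 1 ]; then
            parent=1
        fi
    done
    # THAWED iff both the self-state and the parent-state are THAWED
    if [ "$self" = 0 ] && [ "$parent" = 0 ]; then
        echo THAWED
    else
        echo FREEZING
    fi
}
```

For example, `freezer_effective 0 0 1` (the cgroup and its parent thawed, the grandparent frozen) prints FREEZING: the cgroup's own self-state is THAWED, but the inherited parent-state keeps it freezing.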

The following cgroupfs files are created by the cgroup freezer.

* freezer.state: Read-write.

  When read, returns the effective state of the cgroup: "THAWED",
  "FREEZING" or "FROZEN". This is the combination of the self- and
  parent-states: if either is freezing, the cgroup is freezing
  (FREEZING or FROZEN).

  A FREEZING cgroup transitions into the FROZEN state when all tasks
  belonging to the cgroup and its descendants become frozen. Note that
  a cgroup reverts from FROZEN to FREEZING when a new task is added
  to the cgroup or one of its descendant cgroups, and stays FREEZING
  until the new task is frozen.

  When written, sets the self-state of the cgroup. Two values are
  allowed: "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
  if not already freezing, enters the FREEZING state along with all its
  descendant cgroups.

  If THAWED is written, the self-state of the cgroup is changed to
  THAWED. Note that the effective state may not change to THAWED if
  the parent-state is still freezing. If a cgroup's effective state
  becomes THAWED, all its descendants which are freezing because of
  the cgroup also leave the freezing state.

* freezer.self_freezing: Read-only.

  Shows the self-state: 0 if the self-state is THAWED, otherwise 1.
  This value is 1 iff the last write to freezer.state was "FROZEN".

* freezer.parent_freezing: Read-only.

  Shows the parent-state: 0 if none of the cgroup's ancestors is
  freezing, otherwise 1.

The root cgroup is non-freezable, and the above interface files don't
exist in it.
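Since the effective state does not say where a freeze comes from, it can help to read all three files together, for example to explain why writing THAWED did not thaw a cgroup. A minimal sketch, assuming a cgroup v1 freezer hierarchy and a non-root cgroup directory such as /sys/fs/cgroup/freezer/0; `freezer_describe` is a hypothetical helper, not part of any existing tool:

```shell
#!/bin/sh
# Hypothetical helper: report a freezer cgroup's effective state together
# with the reason, using freezer.state, freezer.self_freezing and
# freezer.parent_freezing. DIR must be a non-root freezer cgroup directory.
freezer_describe() {
    dir=$1
    state=$(cat "$dir/freezer.state") || return 1
    self=$(cat "$dir/freezer.self_freezing") || return 1
    parent=$(cat "$dir/freezer.parent_freezing") || return 1
    # Combine the self-state and parent-state flags into an explanation.
    case "$self$parent" in
        00) why="neither the cgroup nor any ancestor is freezing" ;;
        10) why="self-state is FROZEN" ;;
        01) why="an ancestor is freezing; writing THAWED here will not thaw it" ;;
        11) why="self-state is FROZEN and an ancestor is freezing" ;;
        *)  why="unexpected flag values" ;;
    esac
    echo "$state: $why"
}
```

Reading the three files separately is not atomic, so on a cgroup whose state is changing the output is only a snapshot.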

* Examples of usage:

    # mkdir /sys/fs/cgroup/freezer
    # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
    # mkdir /sys/fs/cgroup/freezer/0
    # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks

  To get the status of the freezer cgroup:

    # cat /sys/fs/cgroup/freezer/0/freezer.state
    THAWED

  To freeze all tasks in the container:

    # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    FREEZING
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    FROZEN

  To unfreeze all tasks in the container:

    # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
    # cat /sys/fs/cgroup/freezer/0/freezer.state
    THAWED

This is the basic mechanism, which should do the right thing for user
space tasks in a simple scenario.
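The freeze, poll and thaw steps shown in the examples can be combined into a small wrapper for the checkpointing use case: freeze the cgroup, wait until it reports FROZEN, run a command while the tasks are quiescent, then thaw. This is a hedged sketch, not an existing tool: `freezer_run_frozen` and its TIMEOUT argument (whole seconds) are hypothetical, it needs root, and it assumes a cgroup v1 freezer directory such as /sys/fs/cgroup/freezer/0.

```shell
#!/bin/sh
# Hypothetical helper: freezer_run_frozen DIR TIMEOUT CMD [ARGS...]
# Freeze the freezer cgroup DIR, wait up to TIMEOUT seconds for it to
# report FROZEN, run CMD while its tasks are quiescent, then write THAWED.
# Returns CMD's exit status, or 1 if freezing failed or timed out.
freezer_run_frozen() {
    dir=$1
    timeout=$2
    shift 2
    echo FROZEN > "$dir/freezer.state" || return 1
    # FREEZING is transient: poll until every task in the subtree is frozen.
    while [ "$(cat "$dir/freezer.state")" != FROZEN ]; do
        if [ "$timeout" -le 0 ]; then
            # Give up and undo our freeze request.
            echo THAWED > "$dir/freezer.state"
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    "$@"
    status=$?
    # Thaw even if CMD failed.
    echo THAWED > "$dir/freezer.state"
    return "$status"
}
```

For example, `freezer_run_frozen /sys/fs/cgroup/freezer/0 10 cat /proc/$some_pid/status` would read a task's status while its cgroup is frozen. The wrapper always writes THAWED at the end, so it also thaws a cgroup that was already frozen before the call; and, as with any write of THAWED, the tasks stay frozen if an ancestor is still freezing.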