The cgroup freezer is useful to batch job management systems which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells:

	$ echo $$
	16644
	$ bash
	$ echo $$
	16690

	From a second, unrelated bash shell:
	$ kill -SIGSTOP 16690
	$ kill -SIGCONT 16690

	<at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.
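The point that SIGCONT can be caught by the task can be illustrated with a
small sketch. The helper name sigcont_is_catchable is hypothetical and is
not part of the freezer interface. It only assumes a POSIX sh with the trap
and kill builtins: a child shell installs a SIGCONT handler and then sends
the signal to itself.

```shell
#!/bin/sh
# Illustrative sketch: show that SIGCONT is visible to the task receiving it.
# sigcont_is_catchable is a hypothetical helper name used for this example.
sigcont_is_catchable() {
	# Run a child shell that installs a SIGCONT handler and then sends
	# the signal to itself. Inside the single quotes, $$ expands in the
	# child, so it is the child's own PID. The trailing ":" gives the
	# shell a command boundary at which to run the pending trap.
	sh -c 'trap "echo caught SIGCONT" CONT; kill -CONT $$; :'
}

sigcont_is_catchable
```

A task that runs such a handler can change its behaviour whenever it is
resumed with SIGCONT, which is why the signal pair is not transparent.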

Another example of a program which catches and responds to these
signals is gdb. In fact, any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
Reading will return the current state.

Note that freezer.state doesn't exist in the root cgroup, which means the
root cgroup is non-freezable.
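The freezer.state interface described above could be wrapped in a pair of
small shell helpers. This is a minimal sketch, not an official tool: the
function names set_freezer_state and get_freezer_state and the cgroup
directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: helpers that write to and read from a cgroup's freezer.state.
# The function names and the directory argument are illustrative assumptions.

# set_freezer_state DIR STATE: write FROZEN or THAWED to DIR/freezer.state.
set_freezer_state() {
	cgroup_dir=$1
	new_state=$2
	case $new_state in
	FROZEN|THAWED) ;;
	*)
		# "FREEZING" can be read back but must not be written.
		echo "state must be FROZEN or THAWED" >&2
		return 2
		;;
	esac
	# The root cgroup has no freezer.state file and cannot be frozen.
	if [ ! -e "$cgroup_dir/freezer.state" ]; then
		echo "$cgroup_dir: no freezer.state (root cgroup?)" >&2
		return 1
	fi
	echo "$new_state" > "$cgroup_dir/freezer.state"
}

# get_freezer_state DIR: print the current state of the cgroup.
get_freezer_state() {
	cat "$1/freezer.state"
}
```

With the freezer hierarchy mounted at a path such as /containers, a call
like "set_freezer_state /containers/0 FROZEN" followed by
"get_freezer_state /containers/0" mirrors the echo and cat commands used
elsewhere in this document.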

* Examples of usage:

	# mkdir /containers
	# mount -t cgroup -ofreezer freezer /containers
	# mkdir /containers/0
	# echo $some_pid > /containers/0/tasks

To get the status of the freezer subsystem:

	# cat /containers/0/freezer.state
	THAWED

To freeze all tasks in the container:

	# echo FROZEN > /containers/0/freezer.state
	# cat /containers/0/freezer.state
	FREEZING
	# cat /containers/0/freezer.state
	FROZEN

To unfreeze all tasks in the container:

	# echo THAWED > /containers/0/freezer.state
	# cat /containers/0/freezer.state
	THAWED

This is the basic mechanism which should do the right thing for userspace
tasks in a simple scenario.

It's important to note that freezing can be incomplete. In that case we return
EBUSY. This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

	1) Userspace cancels the freezing operation by writing "THAWED" to
	   the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
	   the freezer.state file (writing "FREEZING" is not legal
	   and returns EINVAL)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
	   state disappear from the cgroup's set of tasks.
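
The retry and cancel options listed above can be combined in a small loop.
The following is a sketch under stated assumptions rather than a recommended
policy: the function name freeze_with_retry, its arguments, and the default
retry count and delay are all hypothetical.

```shell
#!/bin/sh
# Sketch only: retry a freeze until the cgroup reports FROZEN or we give up.
# freeze_with_retry and its arguments are hypothetical names for illustration.

# freeze_with_retry DIR [TRIES] [DELAY]
# Returns 0 once DIR/freezer.state reads FROZEN. After TRIES unsuccessful
# attempts it cancels the freeze by writing THAWED and returns 1.
freeze_with_retry() {
	state_file=$1/freezer.state
	tries=${2:-5}
	delay=${3:-1}
	while [ "$tries" -gt 0 ]; do
		# The write may fail with EBUSY, so its error is ignored here;
		# the read below tells us whether the cgroup is FROZEN or
		# still FREEZING.
		echo FROZEN > "$state_file" 2>/dev/null || :
		if [ "$(cat "$state_file")" = "FROZEN" ]; then
			return 0
		fi
		tries=$((tries - 1))
		sleep "$delay"
	done
	# Option 1 from the list above: cancel the partially completed freeze.
	echo THAWED > "$state_file"
	return 1
}
```

A caller might run "freeze_with_retry /containers/0 10 1" and treat a
non-zero return as a sign that some task could not be frozen in time.
Cancelling on failure is one design choice; a caller could instead leave
the cgroup in the "FREEZING" state and wait for the blocking tasks to leave.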