| Per-task statistics interface |
| ----------------------------- |
| |
| |
| Taskstats is a netlink-based interface for sending per-task and |
| per-process statistics from the kernel to userspace. |
| |
| Taskstats was designed for the following benefits: |
| |
| - efficiently provide statistics during the lifetime of a task and on its exit |
| - unified interface for multiple accounting subsystems |
| - extensibility for use by future accounting patches |
| |
| Terminology |
| ----------- |
| |
| "pid", "tid" and "task" are used interchangeably and refer to the standard |
| Linux task defined by struct task_struct. Per-pid stats are the same as |
| per-task stats. |
| |
| "tgid", "process" and "thread group" are used interchangeably and refer to the |
| tasks that share an mm_struct i.e. the traditional Unix process. Despite the |
| use of tgid, there is no special treatment for the task that is thread group |
| leader - a process is deemed alive as long as it has any task belonging to it. |
| |
| Usage |
| ----- |
| |
| To get statistics during a task's lifetime, userspace opens a unicast netlink |
| socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid. |
| The response contains statistics for a task (if pid is specified) or the sum of |
| statistics for all tasks of the process (if tgid is specified). |
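As a rough sketch of the unicast query, the C function below builds a TASKSTATS_CMD_GET request for a single pid in a caller-supplied buffer, using the macros and constants from the uapi headers <linux/netlink.h>, <linux/genetlink.h> and <linux/taskstats.h>. It assumes that the dynamic generic netlink family id for TASKSTATS_GENL_NAME has already been resolved through the generic netlink controller and is passed in as family_id. The name build_pid_query is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/taskstats.h>

/*
 * Build a TASKSTATS_CMD_GET request for one pid in buf (which must be
 * 4-byte aligned, e.g. a uint32_t array). family_id is assumed to be the
 * generic netlink id already resolved for TASKSTATS_GENL_NAME.
 * Returns the message length, or 0 if buf is too small.
 */
static size_t build_pid_query(void *buf, size_t cap, uint16_t family_id,
                              uint32_t pid, uint32_t seq)
{
    size_t attr_len = NLA_HDRLEN + sizeof(pid);          /* nlattr + u32 */
    size_t total = NLMSG_LENGTH(GENL_HDRLEN + NLA_ALIGN(attr_len));

    if (cap < total)
        return 0;
    memset(buf, 0, total);

    /* netlink header: addressed to the taskstats genetlink family */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = (uint32_t)total;
    nlh->nlmsg_type = family_id;
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    /* generic netlink header: which taskstats command is requested */
    struct genlmsghdr *genl = NLMSG_DATA(nlh);
    genl->cmd = TASKSTATS_CMD_GET;
    genl->version = TASKSTATS_GENL_VERSION;

    /* taskstats payload: a single TASKSTATS_CMD_ATTR_PID attribute */
    struct nlattr *na = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    na->nla_type = TASKSTATS_CMD_ATTR_PID;
    na->nla_len = (uint16_t)attr_len;
    memcpy((char *)na + NLA_HDRLEN, &pid, sizeof(pid));

    return total;
}
```

The resulting buffer would then be sent with send() on a socket created with socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC). A whole process is queried the same way, using TASKSTATS_CMD_ATTR_TGID instead.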
| |
| To obtain statistics for tasks which are exiting, userspace opens a multicast |
| netlink socket. Each time a task exits, two records are sent by the kernel to |
| each listener on the multicast socket. The first is the exiting task's per-pid |
| statistics and the second is the sum for all tasks of the process to which the |
| task belongs (the task does not need to be the thread group leader). The need for |
| per-tgid stats to be sent for each exiting task is explained in the per-tgid |
| stats section below. |
| |
| |
| Interface |
| --------- |
| |
| The user-kernel interface is encapsulated in include/linux/taskstats.h. |
| |
| To avoid this documentation becoming obsolete as the interface evolves, only |
| an outline of the current version is given. taskstats.h always takes |
| precedence over the description here. |
| |
| struct taskstats is the common accounting structure for both per-pid and |
| per-tgid data. It is versioned and can be extended by each accounting subsystem |
| that is added to the kernel. The fields and their semantics are defined in the |
| taskstats.h file. |
| |
| The data exchanged between user and kernel space is a netlink message belonging |
| to the NETLINK_GENERIC family and using the netlink attributes interface. |
| The messages have the following format: |
| |
| +----------+- - -+-------------+-------------------+ |
| | nlmsghdr | Pad | genlmsghdr | taskstats payload | |
| +----------+- - -+-------------+-------------------+ |
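To make the layout above concrete, the helper below (a hypothetical name, not part of any kernel API) locates the taskstats payload in one received message. NLMSG_HDRLEN is the aligned size of struct nlmsghdr, so it already accounts for the "Pad" field, and GENL_HDRLEN is the aligned size of struct genlmsghdr. The sketch does not check nlmsg_type for NLMSG_ERROR replies, which a real tool would do first.

```c
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Return a pointer to the first taskstats attribute of one received
 * message and store the payload length in *payload_len, or return NULL
 * if nlmsg_len is too small to hold both headers.
 */
static const struct nlattr *taskstats_payload(const struct nlmsghdr *nlh,
                                              size_t *payload_len)
{
    /* NLMSG_HDRLEN is sizeof(struct nlmsghdr) rounded up to 4 bytes,
     * so it already covers the "Pad" shown in the diagram. */
    size_t hdrs = NLMSG_HDRLEN + GENL_HDRLEN;

    if (nlh->nlmsg_len < hdrs)
        return NULL;
    *payload_len = nlh->nlmsg_len - hdrs;
    return (const struct nlattr *)((const char *)nlh + hdrs);
}
```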
| |
| |
| The taskstats payload is one of the following three kinds: |
| |
| 1. Commands: Sent from user to kernel. The payload is one attribute, of type |
| TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute |
| payload. The pid/tgid denotes the task/process for which userspace wants |
| statistics. |
| |
| 2. Response for a command: sent from the kernel in response to a userspace |
| command. The payload is a series of three attributes of the following types: |
| |
| a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload but indicating |
| that a pid/tgid will be followed by some stats. |
| |
| b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats |
| are being returned. |
| |
| c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The |
| same structure is used for both per-pid and per-tgid stats. |
| |
| 3. New message sent by the kernel whenever a task exits. The payload consists of |
| a series of attributes of the following types: |
| |
| a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats |
| b) TASKSTATS_TYPE_PID: contains exiting task's pid |
| c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats |
| d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats |
| e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs |
| f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process |
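A userspace parser for the replies and exit messages described above can treat the payload as a stream of netlink attributes. The sketch below (hypothetical names parse_taskstats, walk, put_attr and struct ts_record) collects each pid/tgid together with a pointer to the STATS payload that follows it and silently skips attribute types it does not recognize. It descends into the AGGR attributes, so it works whether the PID/TGID and STATS attributes follow the AGGR attribute as siblings, as described above, or are nested inside it, which is how recent versions of kernel/taskstats.c build them. put_attr is included only to construct test input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/taskstats.h>

/* One pid or tgid found in the payload, plus the STATS attribute after it. */
struct ts_record {
    int is_tgid;          /* 1 for TASKSTATS_TYPE_TGID, 0 for _PID */
    uint32_t id;          /* the pid or tgid value */
    const void *stats;    /* start of the struct taskstats bytes, or NULL */
    size_t stats_len;
};

static int walk(const unsigned char *p, size_t len,
                struct ts_record *out, int max, int n)
{
    while (len >= NLA_HDRLEN) {
        struct nlattr na;
        memcpy(&na, p, sizeof na);              /* avoid unaligned access */
        if (na.nla_len < NLA_HDRLEN || na.nla_len > len)
            return -1;                          /* malformed attribute */

        const unsigned char *data = p + NLA_HDRLEN;
        size_t dlen = na.nla_len - NLA_HDRLEN;
        int type = na.nla_type & NLA_TYPE_MASK;

        switch (type) {
        case TASKSTATS_TYPE_AGGR_PID:
        case TASKSTATS_TYPE_AGGR_TGID:
            /* Descend; an empty AGGR payload simply yields nothing. */
            n = walk(data, dlen, out, max, n);
            if (n < 0)
                return -1;
            break;
        case TASKSTATS_TYPE_PID:
        case TASKSTATS_TYPE_TGID:
            if (n == max || dlen < sizeof(uint32_t))
                return -1;                      /* out[] full or bad id */
            out[n].is_tgid = (type == TASKSTATS_TYPE_TGID);
            memcpy(&out[n].id, data, sizeof(uint32_t));
            out[n].stats = NULL;
            out[n].stats_len = 0;
            n++;
            break;
        case TASKSTATS_TYPE_STATS:
            if (n > 0) {                        /* belongs to latest id */
                out[n - 1].stats = data;
                out[n - 1].stats_len = dlen;
            }
            break;
        default:
            break;      /* unknown or padding attribute: ignore it */
        }

        size_t step = NLA_ALIGN(na.nla_len);
        if (step >= len)
            break;
        p += step;
        len -= step;
    }
    return n;
}

/* Returns the number of records stored in out[], or -1 on error. */
static int parse_taskstats(const void *payload, size_t len,
                           struct ts_record *out, int max)
{
    return walk(payload, len, out, max, 0);
}

/* Test helper: append one attribute at off and return the aligned end.
 * With data == NULL only the header is written, which lets a caller wrap
 * attributes already placed right after the header (nesting). */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *data, uint16_t dlen)
{
    struct nlattr na = { .nla_len = (uint16_t)(NLA_HDRLEN + dlen),
                         .nla_type = type };
    memcpy(buf + off, &na, sizeof na);
    if (data != NULL && dlen > 0)
        memcpy(buf + off + NLA_HDRLEN, data, dlen);
    return off + NLA_ALIGN(na.nla_len);
}
```

Because the walker ignores any type it does not know, it keeps working if a newer kernel adds attributes to these messages.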
| |
| |
| per-tgid stats |
| -------------- |
| |
| Taskstats provides per-process stats, in addition to per-task stats, since |
| resource management is often done at a process granularity and aggregating task |
| stats in userspace alone is inefficient and potentially inaccurate (due to lack |
| of atomicity). |
| |
| However, maintaining per-process stats, in addition to per-task stats, within the |
| kernel has space and time overheads. Hence the taskstats implementation |
| dynamically sums up the per-task stats for each task belonging to a process |
| whenever per-process stats are needed. |
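The on-demand summation can be pictured with the following simplified model. struct task_rec and sum_for_tgid are hypothetical; the two counters borrow delay-accounting field names from struct taskstats, but this is not the real structure, and the function only sums the records it is handed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down per-task record. The two counters borrow
 * names from struct taskstats, but this is not the real layout. */
struct task_rec {
    uint32_t pid;
    uint32_t tgid;
    uint64_t cpu_delay_total;
    uint64_t blkio_delay_total;
};

/* Build a per-process total on demand by adding up the records of every
 * task whose tgid matches, instead of keeping a running per-process copy. */
static struct task_rec sum_for_tgid(const struct task_rec *tasks, size_t n,
                                    uint32_t tgid)
{
    struct task_rec sum = { .pid = 0, .tgid = tgid };

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].tgid != tgid)
            continue;
        sum.cpu_delay_total += tasks[i].cpu_delay_total;
        sum.blkio_delay_total += tasks[i].blkio_delay_total;
    }
    return sum;
}
```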
| |
| Not maintaining per-tgid stats creates a problem when userspace is interested |
| in getting these stats when the process dies, i.e. when the last thread of |
| a process exits. At that point no tasks of the process remain whose stats |
| could be summed, so the kernel cannot simply return an aggregated per-process |
| statistic. |
| |
| The approach taken by taskstats is to return the per-tgid stats *each* time |
| a task exits, in addition to the per-pid stats for that task. Userspace can |
| maintain task<->process mappings and use them to maintain the per-process stats |
| in userspace, updating the aggregate appropriately as the tasks of a process |
| exit. |
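One way userspace might apply this approach is sketched below. The table, the live-task counter and the single cpu_total field are hypothetical simplifications. The sketch assumes that userspace learns about each task of a process (for example from /proc) before that task exits, and that each per-tgid record is cumulative, already including tasks that exited earlier, so the newest record supersedes older ones.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64               /* hypothetical fixed table size */

/* Hypothetical per-process bookkeeping kept by a userspace tool. */
struct proc_entry {
    int in_use;
    uint32_t tgid;
    unsigned live_tasks;           /* tasks of this process not yet exited */
    uint64_t cpu_total;            /* latest per-tgid aggregate received */
};

struct proc_table {
    struct proc_entry e[MAX_PROCS];
};

static struct proc_entry *lookup(struct proc_table *t, uint32_t tgid,
                                 int create)
{
    struct proc_entry *free_slot = NULL;

    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (t->e[i].in_use && t->e[i].tgid == tgid)
            return &t->e[i];
        if (!t->e[i].in_use && free_slot == NULL)
            free_slot = &t->e[i];
    }
    if (create && free_slot != NULL) {
        *free_slot = (struct proc_entry){ .in_use = 1, .tgid = tgid };
        return free_slot;
    }
    return NULL;
}

/* Called when userspace learns that a task belongs to process tgid. */
static void task_seen(struct proc_table *t, uint32_t tgid)
{
    struct proc_entry *p = lookup(t, tgid, 1);
    if (p != NULL)
        p->live_tasks++;
}

/*
 * Called for each exit message. tgid_total is taken from the per-tgid
 * record. Returns 1 and sets *final once the last known task of the
 * process has exited; otherwise returns 0.
 */
static int task_exited(struct proc_table *t, uint32_t tgid,
                       uint64_t tgid_total, uint64_t *final)
{
    struct proc_entry *p = lookup(t, tgid, 0);
    if (p == NULL)
        return 0;                  /* a process this tool never saw */

    p->cpu_total = tgid_total;     /* assumed cumulative: newest wins */
    if (p->live_tasks > 0)
        p->live_tasks--;
    if (p->live_tasks > 0)
        return 0;

    *final = p->cpu_total;         /* process is gone: report and forget */
    p->in_use = 0;
    return 1;
}
```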
| |
| Extending taskstats |
| ------------------- |
| |
| There are two ways to extend the taskstats interface to export more |
| per-task/process stats as patches to collect them get added to the kernel |
| in the future: |
| |
| 1. Adding more fields to the end of the existing struct taskstats. Backward |
| compatibility is ensured by the version number within the |
| structure. Userspace will use only the fields of the struct that correspond |
| to the version it is using. |
| |
| 2. Defining separate statistic structs and using the netlink attributes |
| interface to return them. Since userspace processes each netlink attribute |
| independently, it can always ignore attributes whose type it does not |
| understand (because it is using an older version of the interface). |
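Approach 1 relies on the version field at the start of the structure. The sketch below shows how a tool built against a given version might copy a STATS payload, ignoring trailing fields sent by a newer kernel and leaving fields unknown to an older kernel zeroed. struct demo_stats, DEMO_STATS_VERSION and read_demo_stats are hypothetical stand-ins; a real tool would use struct taskstats and TASKSTATS_VERSION from <linux/taskstats.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STATS_VERSION 2   /* version this hypothetical tool knows */

/* Hypothetical versioned stats struct: version 1 had only field_a,
 * version 2 appended field_b at the end. */
struct demo_stats {
    uint16_t version;          /* always the first member */
    uint64_t field_a;          /* present since version 1 */
    uint64_t field_b;          /* appended in version 2 */
};

/*
 * Copy a received stats payload into out. Bytes beyond the fields this
 * tool knows (sent by a newer kernel) are ignored, and fields the sender
 * did not have are left zero. Returns the version whose fields are valid
 * in out, or -1 if the payload cannot even hold the version member.
 */
static int read_demo_stats(const void *payload, size_t len,
                           struct demo_stats *out)
{
    memset(out, 0, sizeof *out);
    if (len < sizeof out->version)
        return -1;
    memcpy(out, payload, len < sizeof *out ? len : sizeof *out);

    int v = out->version < DEMO_STATS_VERSION ? out->version
                                              : DEMO_STATS_VERSION;
    if (v < 2)
        out->field_b = 0;      /* not defined by an older sender */
    return v;
}
```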
| |
| |
| Choosing between 1. and 2. is a matter of trading off flexibility and |
| overhead. If only a few fields need to be added, then 1. is the preferable |
| path since the kernel and userspace don't need to incur the overhead of |
| processing new netlink attributes. But if the new fields expand the existing |
| struct too much, requiring disparate userspace accounting utilities to |
| unnecessarily receive large structures whose fields are of no interest, then |
| returning the new fields in a separate structure, in its own netlink attribute |
| (approach 2), would be worthwhile. |
| |
| ---- |