User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".

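Before mounting, it can be useful to confirm that the flag bits named
above are present. The following is a minimal sketch, not part of the
kernel interface: the helper name has_cpu_flag and its optional file
argument are assumptions made for illustration.

```shell
# has_cpu_flag FLAG [FILE]
# Succeed if FLAG appears as a whole word on a "flags" line of FILE
# (default: /proc/cpuinfo). Hypothetical helper, for illustration only.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Report each flag this document mentions.
for f in rdt cat_l3 cdp_l3; do
    if has_cpu_flag "$f"; then
        echo "$f: present"
    else
        echo "$f: absent"
    fi
done
```

The word match (-w) keeps a flag such as "cat_l3" from matching a longer
flag name that merely contains it.
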
To use the feature, mount the file system:

        # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

Mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.


Resource groups
---------------
Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

There are three files associated with each group:

"tasks": A list of tasks that belong to this group. Tasks can be
        added to a group by writing the task ID to the "tasks" file
        (which will automatically remove them from the previous
        group to which they belonged). New tasks created by fork(2)
        and clone(2) are added to the same group as their parent.
        If a pid is not in any sub-partition, it is in the root
        partition (i.e. the default partition).

"cpus": A bitmask of logical CPUs assigned to this group. Writing
        a new mask can add/remove CPUs from this group. Added CPUs
        are removed from their previous group. Removed ones are
        given to the default (root) group. You cannot remove CPUs
        from the default group.

"schemata": A list of all the resources available to this group.
        Each resource has its own line and format - see below for
        details.

When a task is running, the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
for that group is used.

2) Else if the task belongs to the default group, but is running on a
CPU that is assigned to some specific group, then the schemata for
the CPU's group is used.

3) Otherwise the schemata for the default group is used.

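The three rules above amount to a simple precedence order. As a minimal
sketch (the function name effective_group and the use of the word
"default" to stand for the root group are assumptions of this example,
not kernel interfaces), the selection can be modelled as:

```shell
# effective_group TASK_GROUP CPU_GROUP
# Print the name of the group whose schemata applies to a task.
# TASK_GROUP is the group listing the task in its "tasks" file,
# CPU_GROUP is the group owning the CPU the task is running on.
# "default" stands for the root group in this sketch.
effective_group() {
    task_group=$1
    cpu_group=$2
    if [ "$task_group" != "default" ]; then
        echo "$task_group"       # rule 1: the task's own group wins
    elif [ "$cpu_group" != "default" ]; then
        echo "$cpu_group"        # rule 2: otherwise the CPU's group
    else
        echo "default"           # rule 3: otherwise the default group
    fi
}

effective_group p0 p1            # prints "p0"
effective_group default p1       # prints "p1"
effective_group default default  # prints "default"
```
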

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, and multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence; there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

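To see which logical CPUs share a cache, the "id" files can be listed
next to the cache level. This is a sketch only: the helper name
list_cache_ids and its optional directory argument are assumptions for
illustration, and it relies on the "level" file that sits beside "id"
in each cache index directory.

```shell
# list_cache_ids [CPU_DIR]
# Print one line "cpuN L<level> id=<id>" for every cache index that
# exposes an "id" file under CPU_DIR (default: /sys/devices/system/cpu).
list_cache_ids() {
    base=${1:-/sys/devices/system/cpu}
    for idfile in "$base"/cpu*/cache/index*/id; do
        [ -r "$idfile" ] || continue     # glob matched nothing readable
        index_dir=${idfile%/id}          # .../cpuN/cache/indexM
        cpu=${index_dir%/cache/*}        # .../cpuN
        cpu=${cpu##*/}                   # cpuN
        level=$(cat "$index_dir/level" 2>/dev/null || echo "?")
        echo "$cpu L$level id=$(cat "$idfile")"
    done
}

list_cache_ids
```

CPUs that print the same id at the same level share that cache, so they
share one entry in a schemata line.
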
Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

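The contiguity rule and the four-way split above can be checked with
shell arithmetic. This is a sketch under stated assumptions: the helper
names cbm_is_contiguous and cbm_split are made up for this example, and
masks are passed as hex digits without the "0x" prefix, the same way
they are written to a schemata file.

```shell
# cbm_is_contiguous HEXMASK
# Succeed if the mask is non-zero and its '1' bits form a single block.
cbm_is_contiguous() {
    mask=$(( 0x$1 ))
    [ "$mask" -ne 0 ] || return 1
    low=$(( mask & -mask ))              # lowest set bit
    # Adding the lowest bit carries through a contiguous block, leaving
    # no bit in common with the original mask.
    [ $(( (mask + low) & mask )) -eq 0 ]
}

# cbm_split BITS PARTS
# Print PARTS equal, non-overlapping hex masks covering a BITS-wide
# mask, lowest part first. BITS must be divisible by PARTS.
cbm_split() {
    bits=$1
    parts=$2
    width=$(( bits / parts ))
    i=0
    while [ "$i" -lt "$parts" ]; do
        printf '%x\n' $(( ((1 << width) - 1) << (i * width) ))
        i=$(( i + 1 ))
    done
}

cbm_split 20 4    # prints 1f, 3e0, 7c00 and f8000, one per line
```
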

L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

        L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

        L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
        L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 details
----------
L2 cache does not support code and data prioritization, so the
schemata format is always:

        L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

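All three formats share the same "<cache_id>=<cbm>" list after the
resource name, so one small parser covers them. The helper below is a
hypothetical illustration (the name schemata_get is not a kernel or
resctrl tool); it assumes a line exactly in the format shown above.

```shell
# schemata_get LINE CACHE_ID
# Print the mask assigned to CACHE_ID in one schemata line such as
# "L3:0=3;1=c". Fails if the cache ID is not listed.
schemata_get() {
    line=$1
    want=$2
    entries=${line#*:}                   # drop the resource name
    old_ifs=$IFS
    IFS=';'                              # entries are ';' separated
    for entry in $entries; do
        if [ "${entry%%=*}" = "$want" ]; then
            IFS=$old_ifs
            echo "${entry#*=}"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

schemata_get "L3:0=3;1=c" 1              # prints "c"
schemata_get "L3data:0=ff;1=f0" 0        # prints "ff"
```
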
Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks:

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

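The percentages quoted above follow from counting the bits set in each
mask. A minimal sketch, assuming a hypothetical helper cbm_percent that
takes the mask in hex and the width of the full mask in bits:

```shell
# cbm_percent HEXMASK BITS
# Print the share of the cache covered by HEXMASK, as a whole
# percentage of a BITS-wide mask.
cbm_percent() {
    mask=$(( 0x$1 ))
    bits=$2
    count=0
    while [ "$mask" -ne 0 ]; do
        count=$(( count + (mask & 1) ))  # add the lowest bit
        mask=$(( mask >> 1 ))
    done
    echo $(( count * 100 / bits ))
}

cbm_percent 3 4    # prints 50 (group p0 and p1 on cache ID 0)
cbm_percent c 4    # prints 50 (group p0 on cache ID 1)
cbm_percent f 4    # prints 100 (default group)
```
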
Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real-time tasks run on socket 0 of a 2-socket, dual-core machine:
pid=1234 on processor 0 and pid=5678 on processor 1. To avoid noisy
neighbors, each of the two real-time tasks exclusively occupies one
quarter of the L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff;1=fffff" > schemata

Next we make a resource group for our first real-time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real-time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 1 1234

Ditto for the second real-time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2 5678

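Exclusive use depends on the three socket 0 masks having no bit in
common. As a quick sanity check (a sketch only; cbm_overlap is a
hypothetical helper, not a resctrl tool), the masks can be compared
pairwise:

```shell
# cbm_overlap A B
# Succeed if the two hex masks share at least one bit.
cbm_overlap() {
    [ $(( 0x$1 & 0x$2 )) -ne 0 ]
}

# Socket 0 masks of the default group, p0 and p1 from this example.
for pair in "3ff f8000" "3ff 7c00" "f8000 7c00"; do
    set -- $pair
    if cbm_overlap "$1" "$2"; then
        echo "$1 and $2 overlap"
    else
        echo "$1 and $2 are disjoint"
    fi
done
```

All three pairs are reported as disjoint, whereas on socket 1 every
group uses fffff and so shares the whole cache.
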
Example 3
---------

A single socket system which has real-time tasks running on cores 4-7 and
a non real-time workload assigned to cores 0-3. The real-time tasks share
text and data, so a per-task association is not required. Because the tasks
interact with the kernel, it is also desired that the kernel on these cores
shares L3 with the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff" > schemata

Next we make a resource group for our real-time cores and give
it access to the "top" 50% of the cache on socket 0.

# mkdir p0
# echo "L3:0=ffc00;" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache.

# echo F0 > p0/cpus
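
The value written to "cpus" is a hex bitmask with one bit per logical
CPU, bit 0 being CPU 0. A minimal sketch for deriving the mask of a CPU
range, assuming a hypothetical helper cpu_range_mask and CPU numbers
small enough for shell integer arithmetic:

```shell
# cpu_range_mask FIRST LAST
# Print the hex bitmask with bits FIRST..LAST (inclusive) set.
cpu_range_mask() {
    first=$1
    last=$2
    printf '%x\n' $(( ((1 << (last - first + 1)) - 1) << first ))
}

cpu_range_mask 4 7    # prints f0 (the real-time cores)
cpu_range_mask 0 3    # prints f (the remaining cores)
```
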