User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".

To use the feature mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.


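Before mounting, it can help to confirm that the CPU advertises the
relevant flag. The following is a minimal sketch, not part of the kernel
interface: the function name has_cpu_flag is made up for illustration,
and the optional second argument exists only so the check can be run
against a saved copy of /proc/cpuinfo.

```shell
# has_cpu_flag FLAG [CPUINFO]: succeed if FLAG appears as a whole word on a
# "flags" line. CPUINFO defaults to /proc/cpuinfo. (Hypothetical helper.)
has_cpu_flag() {
    awk -v flag="$1" '
        /^flags/ { for (i = 1; i <= NF; i++) if ($i == flag) found = 1 }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}"
}
```

For instance, "has_cpu_flag cdp_l3" could be tested before passing
"-o cdp" to mount.
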
Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names. Each subdirectory contains the
following files:

"num_closids":  The number of CLOSIDs which are valid for this
                resource. The kernel uses the smallest number of
                CLOSIDs of all enabled resources as the limit.

"cbm_mask":     The bitmask which is valid for this resource. This
                mask is equivalent to 100%.

"min_cbm_bits": The minimum number of consecutive bits which must be
                set when writing a mask.


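Because the smallest "num_closids" value across resources acts as the
limit, a script may want to compute it. A minimal sketch follows; the
helper name closid_limit is an assumption for illustration, and the
directory is passed as an argument (normally /sys/fs/resctrl/info).

```shell
# closid_limit INFO_DIR: print the smallest num_closids value found in the
# per-resource subdirectories of INFO_DIR. (Hypothetical helper.)
closid_limit() (
    min=
    for f in "$1"/*/num_closids; do
        [ -r "$f" ] || continue
        n=$(cat "$f")
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then
            min=$n
        fi
    done
    # Fails without output when no resource directory was found.
    [ -n "$min" ] && echo "$min"
)
```
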
Resource groups
---------------
Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

There are three files associated with each group:

"tasks": A list of tasks that belong to this group. Tasks can be
        added to a group by writing the task ID to the "tasks" file
        (which will automatically remove them from the previous
        group to which they belonged). New tasks created by fork(2)
        and clone(2) are added to the same group as their parent.
        If a pid is not in any sub partition, it is in the root
        partition (i.e. default partition).

"cpus": A bitmask of logical CPUs assigned to this group. Writing
        a new mask can add/remove CPUs from this group. Added CPUs
        are removed from their previous group. Removed ones are
        given to the default (root) group. You cannot remove CPUs
        from the default group.

"schemata": A list of all the resources available to this group.
        Each resource has its own line and format - see below for
        details.

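The value written to "cpus" is a hexadecimal bitmask in which bit N
stands for logical CPU N. As a sketch, the mask for a contiguous CPU
range can be computed with shell arithmetic. The helper name
cpu_range_mask is made up for this illustration, and it only handles
CPU numbers that fit in the shell's integer width (typically below 63).

```shell
# cpu_range_mask FIRST LAST: print the hex mask with bits FIRST..LAST set.
# (Hypothetical helper; limited to the shell's integer width.)
cpu_range_mask() {
    printf '%x\n' $(( ((1 << ($2 - $1 + 1)) - 1) << $1 ))
}
```

With this helper, "cpu_range_mask 4 7" prints f0 and
"cpu_range_mask 0 3" prints f.
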
When a task is running the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
for that group is used.

2) Else if the task belongs to the default group, but is running on a
CPU that is assigned to some specific group, then the schemata for
the CPU's group is used.

3) Otherwise the schemata for the default group is used.


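The three rules amount to a simple precedence: the task's group first,
then the CPU's group, then the default. The sketch below restates that
decision in shell. It is only a model of the rules above, not kernel
code; the name effective_group and the literal "default" used for the
root group are assumptions made for the illustration.

```shell
# effective_group TASK_GROUP CPU_GROUP: print the group whose schemata
# applies. "default" stands for the root group. (Hypothetical model.)
effective_group() {
    if [ "$1" != default ]; then
        echo "$1"        # rule 1: task is in a non-default group
    elif [ "$2" != default ]; then
        echo "$2"        # rule 2: CPU is assigned to a specific group
    else
        echo default     # rule 3: fall back to the default group
    fi
}
```
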
Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

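To make the line structure concrete, here is a minimal sketch that
splits one schemata line into its resource name and per-instance
values. It assumes the "name:id=value;id=value" layout used by the
cache resources described in this document; the function name
parse_schemata_line is hypothetical.

```shell
# parse_schemata_line LINE: print "resource instance_id value" for each
# instance on the line. (Hypothetical helper.)
parse_schemata_line() (
    set -f                       # no pathname expansion while splitting
    res=${1%%:*}                 # text before the first ':'
    IFS=';'                      # instances are separated by ';'
    for pair in ${1#*:}; do
        [ -n "$pair" ] || continue
        echo "$res ${pair%%=*} ${pair#*=}"
    done
)
```
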
Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, or multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence; there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

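A small loop can collect these IDs for every CPU. The sketch below
also reads the "level" file that sits next to "id" in each index
directory so the cache level is shown. The function name
list_cache_ids is made up, and the root directory is a parameter only
so the loop can be tried on a copy of the sysfs tree.

```shell
# list_cache_ids [CPU_ROOT]: print "cpuN L<level> id=<id>" for every cache
# index directory that exposes an "id" file. (Hypothetical helper.)
list_cache_ids() (
    root=${1:-/sys/devices/system/cpu}
    for idfile in "$root"/cpu[0-9]*/cache/index*/id; do
        [ -r "$idfile" ] || continue
        dir=${idfile%/id}        # .../cpuN/cache/indexM
        cpu=${dir%/cache/*}      # .../cpuN
        cpu=${cpu##*/}           # cpuN
        echo "$cpu L$(cat "$dir/level") id=$(cat "$idfile")"
    done
)
```
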
Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.


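Both the contiguity rule and the equal-parts split can be checked with
plain shell arithmetic. The following is a sketch under the assumption
that masks fit in the shell's integer width; the function names
is_contiguous_mask and equal_partitions are invented for this example.

```shell
# is_contiguous_mask MASK: succeed if MASK is non-zero and all of its '1'
# bits form one contiguous block. (Hypothetical helper.)
is_contiguous_mask() (
    m=$(( $1 ))
    [ "$m" -ne 0 ] || exit 1
    # Shift out trailing zero bits.
    while [ $(( m & 1 )) -eq 0 ]; do m=$(( m >> 1 )); done
    # A solid run of ones, plus one, is a power of two.
    [ $(( m & (m + 1) )) -eq 0 ]
)

# equal_partitions NBITS PARTS: print PARTS non-overlapping masks, each
# NBITS/PARTS bits wide, lowest part first. (Hypothetical helper.)
equal_partitions() (
    width=$(( $1 / $2 ))
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '0x%x\n' $(( ((1 << width) - 1) << (i * width) ))
        i=$(( i + 1 ))
    done
)
```

"equal_partitions 20 4" prints 0x1f, 0x3e0, 0x7c00 and 0xf8000, one per
line, matching the masks listed above.
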
L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

        L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

        L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
        L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 details
----------
L2 cache does not support code and data prioritization, so the
schemata format is always:

        L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

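All of these formats share one shape: a resource name, a colon, and
"cache_id=cbm" pairs joined by semicolons. A minimal sketch for
assembling such a line is shown below. The helper name schemata_line is
hypothetical; it only builds the string and does not write to any
schemata file.

```shell
# schemata_line RESOURCE ID=CBM [ID=CBM ...]: print one schemata line.
# (Hypothetical helper; it formats text only.)
schemata_line() (
    res=$1
    shift
    IFS=';'                      # "$*" joins arguments with the first IFS char
    printf '%s:%s\n' "$res" "$*"
)
```

For example, "schemata_line L3data 0=3ff 1=fffff" prints
"L3data:0=3ff;1=fffff".
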
Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks:

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

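The percentages quoted here follow from counting the bits set in a mask
relative to the full mask. As a sketch (the function names popcount and
mask_percent are invented for this example, and integer division is
used, so results are rounded down):

```shell
# popcount VALUE: print the number of '1' bits in VALUE.
popcount() (
    v=$(( $1 ))
    c=0
    while [ "$v" -ne 0 ]; do
        c=$(( c + (v & 1) ))
        v=$(( v >> 1 ))
    done
    echo "$c"
)

# mask_percent MASK FULL_MASK: print the share of FULL_MASK covered by MASK,
# as a whole-number percentage. (Hypothetical helper.)
mask_percent() {
    echo $(( 100 * $(popcount "$1") / $(popcount "$2") ))
}
```

"mask_percent 0x3 0xf" and "mask_percent 0xc 0xf" both print 50.
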
Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real-time tasks run on socket 0 of a 2-socket, dual-core machine:
pid=1234 on processor 0 and pid=5678 on processor 1. To avoid noisy
neighbors, each of the two real-time tasks exclusively occupies one quarter
of L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff;1=fffff" > schemata

Next we make a resource group for our first real time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 1 1234

Ditto for the second real time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2 5678

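Exclusive use in this example depends on the three masks for cache ID 0
(3ff, f8000 and 7c00) sharing no bits. A quick check can be sketched as
follows; the function name masks_overlap is an assumption made for the
illustration.

```shell
# masks_overlap A B: succeed if the two masks share at least one bit.
# (Hypothetical helper.)
masks_overlap() {
    [ $(( ($1) & ($2) )) -ne 0 ]
}
```

Here "masks_overlap 0x3ff 0xf8000", "masks_overlap 0x3ff 0x7c00" and
"masks_overlap 0xf8000 0x7c00" all fail, which confirms that the default
group and the two real-time groups use disjoint parts of the cache on
socket 0.
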
Example 3
---------

A single socket system which has real-time tasks running on cores 4-7 and
non real-time workload assigned to cores 0-3. The real-time tasks share text
and data, so a per task association is not required and due to interaction
with the kernel it's desired that the kernel on these cores shares L3 with
the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff" > schemata

Next we make a resource group for our real time cores and give
it access to the "top" 50% of the cache on socket 0.

# mkdir p0
# echo "L3:0=ffc00;" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache.

# echo F0 > p0/cpus
214# echo C0 > p0/cpus