User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".

To use the feature mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

Mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.


Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names. Each subdirectory contains the
following files:

"num_closids":  The number of CLOSIDs which are valid for this
                resource. The kernel uses the smallest number of
                CLOSIDs of all enabled resources as the limit.

"cbm_mask":     The bitmask which is valid for this resource. This
                mask is equivalent to 100%.

"min_cbm_bits": The minimum number of consecutive bits which must be
                set when writing a mask.


Resource groups
---------------
Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

There are three files associated with each group:

"tasks": A list of tasks that belong to this group. Tasks can be
        added to a group by writing the task ID to the "tasks" file
        (which will automatically remove them from the previous
        group to which they belonged). New tasks created by fork(2)
        and clone(2) are added to the same group as their parent.
        A pid that is not in any sub-partition is in the root
        partition (i.e. the default partition).

"cpus": A bitmask of logical CPUs assigned to this group. Writing
        a new mask can add/remove CPUs from this group. Added CPUs
        are removed from their previous group. Removed ones are
        given to the default (root) group. You cannot remove CPUs
        from the default group.

"schemata": A list of all the resources available to this group.
        Each resource has its own line and format - see below for
        details.

When a task is running the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
for that group is used.

2) Else if the task belongs to the default group, but is running on a
CPU that is assigned to some specific group, then the schemata for
the CPU's group is used.

3) Otherwise the schemata for the default group is used.
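
The three rules above can be sketched as a small selection function. This
is only a user-space model for illustration: the function name, the use of
integer group numbers and the convention that 0 denotes the default group
are assumptions, not part of the resctrl interface.

```c
/* Hypothetical convention: group number 0 is the default (root) group. */
#define RDT_DEFAULT_GROUP 0

/*
 * Return the group whose schemata applies to a running task.
 * task_group: group whose "tasks" file lists the task
 * cpu_group:  group whose "cpus" mask contains the current CPU
 */
static int rdt_effective_group(int task_group, int cpu_group)
{
	/* Rule 1: a task in a non-default group uses that group. */
	if (task_group != RDT_DEFAULT_GROUP)
		return task_group;

	/* Rule 2: a default-group task follows the group of its CPU. */
	if (cpu_group != RDT_DEFAULT_GROUP)
		return cpu_group;

	/* Rule 3: otherwise the default group applies. */
	return RDT_DEFAULT_GROUP;
}
```

Note that the task's own group always wins over the CPU's group; the CPU
assignment only matters for tasks left in the default group.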


Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, or multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical CPUs sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence; there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
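
The contiguity requirement and the equal partitioning can be checked with
a few lines of C. This is a minimal sketch: both helper names are
hypothetical, masks are assumed to fit in 32 bits, and an empty mask is
treated as invalid here as an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if all '1' bits in cbm form a single contiguous block. */
static bool cbm_is_contiguous(uint32_t cbm)
{
	if (cbm == 0)
		return false;	/* assumption: an empty mask is not valid */

	/* Drop trailing zeros; a contiguous run then has the form 2^n - 1. */
	while ((cbm & 1) == 0)
		cbm >>= 1;

	return (cbm & (cbm + 1)) == 0;
}

/*
 * Mask for part 'idx' (counted from the low bits) when a cache with
 * 'nbits' mask bits is split into 'parts' equal pieces.
 * Assumes nbits <= 32, nbits divisible by parts, and idx < parts.
 */
static uint32_t cbm_partition(unsigned int nbits, unsigned int parts,
			      unsigned int idx)
{
	unsigned int width = nbits / parts;
	uint32_t block = (width >= 32) ? UINT32_MAX
				       : ((UINT32_C(1) << width) - 1);

	return block << (idx * width);
}
```

For a 20-bit mask split four ways, cbm_partition(20, 4, idx) yields the
masks listed above for idx 0 to 3.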


L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

	L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
	L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 details
----------
L2 cache does not support code and data prioritization, so the
schemata format is always:

	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
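
A program that writes schemata lines has to produce exactly this layout.
The following sketch builds one such line in a buffer; the function name
and its parameters are hypothetical, and masks are printed as hexadecimal
without a "0x" prefix to match the examples in this document.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write "<name>:<id0>=<cbm>;<id1>=<cbm>;..." into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int schemata_format(char *buf, size_t len, const char *name,
			   const int *ids, const uint32_t *cbms, size_t n)
{
	size_t off, i;
	int ret;

	ret = snprintf(buf, len, "%s:", name);
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	off = (size_t)ret;

	for (i = 0; i < n; i++) {
		/* Separate entries with ';' but do not emit a leading one. */
		ret = snprintf(buf + off, len - off, "%s%d=%x",
			       i ? ";" : "", ids[i], (unsigned int)cbms[i]);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}
	return (int)off;
}
```

The resulting string can then be written to a group's "schemata" file
with an ordinary write(2).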

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.

# cat schemata
L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
# echo "L3DATA:2=3c0;" > schemata
# cat schemata
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
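
The partial-update behaviour shown above can be modelled in user space as
follows. This is a sketch of the semantics only, not the kernel's parser:
the function name is hypothetical, it handles just the part of the line
after the resource name and colon, and on a parse error the entries
processed before the error remain changed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Apply "id=cbm;id=cbm;..." to cbms[], where ids[i] is the cache ID
 * that cbms[i] belongs to. Cache IDs not mentioned keep their value.
 * Returns 0 on success, -1 on a malformed entry or unknown cache ID.
 */
static int schemata_apply(const char *vals, const int *ids,
			  uint32_t *cbms, size_t n)
{
	while (*vals) {
		char *end;
		unsigned long cbm;
		long id;
		size_t i;

		id = strtol(vals, &end, 10);	/* decimal cache ID */
		if (end == vals || *end != '=')
			return -1;
		vals = end + 1;

		cbm = strtoul(vals, &end, 16);	/* hexadecimal mask */
		if (end == vals)
			return -1;

		for (i = 0; i < n && ids[i] != id; i++)
			;
		if (i == n)
			return -1;
		cbms[i] = (uint32_t)cbm;

		/* An entry ends with ';' or with the end of the string. */
		if (*end == ';')
			end++;
		else if (*end != '\0')
			return -1;
		vals = end;
	}
	return 0;
}
```

Applying "2=3c0;" to four domains that all hold fffff changes only the
entry for cache ID 2, as in the L3DATA line of the transcript above.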

Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running
on processor 1, share socket 0 of a 2-socket, dual-core machine. To avoid
noisy neighbors, each of the two real-time tasks exclusively occupies one
quarter of the L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff;1=fffff" > schemata

Next we make a resource group for our first real-time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real-time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 0 1234

Ditto for the second real-time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 1 5678

Example 3
---------

A single socket system which has real-time tasks running on cores 4-7 and
a non real-time workload assigned to cores 0-3. The real-time tasks share
text and data, so a per-task association is not required. Because the
tasks interact with the kernel, it is desirable that the kernel on these
cores shares L3 with the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff" > schemata

Next we make a resource group for our real-time cores and give
it access to the "top" 50% of the cache on socket 0.

# mkdir p0
# echo "L3:0=ffc00;" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache.

# echo F0 > p0/cpus
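
The value written to "cpus" is a bitmask with one bit per logical CPU,
bit N standing for CPU N. As a minimal sketch (the helper name is
hypothetical and at most 64 CPUs are assumed), a contiguous CPU range can
be turned into such a mask like this:

```c
#include <stdint.h>

/*
 * Bitmask with bits first..last (inclusive) set.
 * Assumes first <= last and last < 64.
 */
static uint64_t cpu_range_mask(unsigned int first, unsigned int last)
{
	unsigned int width = last - first + 1;
	uint64_t block = (width >= 64) ? UINT64_MAX
				       : ((UINT64_C(1) << width) - 1);

	return block << first;
}
```

For CPUs 4-7 this gives 0xf0, and for CPUs 0-3 it gives 0xf. Printing the
result with "%llx" (after a cast to unsigned long long) produces the
string to write.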

4) Locking between applications

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the cbmmasks from each directory
  2. Find a contiguous set of bits in the global CBM bitmask that is clear
     in all of the directory cbmmasks
  3. Create a new directory
  4. Write the bits found in step 2 to the new directory's "schemata" file
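
Step 2 of this procedure can be sketched as a first-fit search. The
function name and parameters are hypothetical; 'full' stands for the
value read from info/<resource>/cbm_mask and 'used' for the masks
collected in step 1. The sketch takes no lock itself, so it does nothing
to prevent the race described next.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Find 'want' contiguous bits inside 'full' that are clear in every
 * mask of used[]. Returns the lowest such mask, or 0 if none exists.
 */
static uint32_t cbm_find_free(uint32_t full, const uint32_t *used,
			      size_t n, unsigned int want)
{
	uint32_t busy = 0, block;
	unsigned int shift;
	size_t i;

	if (want == 0 || want > 32)
		return 0;

	/* A bit is busy if any directory's mask has it set. */
	for (i = 0; i < n; i++)
		busy |= used[i];

	block = (want == 32) ? UINT32_MAX : ((UINT32_C(1) << want) - 1);
	for (shift = 0; shift + want <= 32; shift++) {
		uint32_t cand = block << shift;

		/* Must lie inside the valid mask and touch no busy bit. */
		if ((cand & full) == cand && (cand & busy) == 0)
			return cand;
	}
	return 0;
}
```

With a 20-bit mask and existing masks 3ff and f8000, a request for five
bits returns 7c00, while a request for six bits fails.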

If two applications attempt to allocate space concurrently then they can
end up allocating the same bits so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrlfs and to avoid the problem
above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a shell
script command.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) Release the lock with flock(LOCK_UN)

Example with bash:

# Atomically read directory structure
$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

# Read directory contents and create new subdirectory

$ cat create-dir.sh
# "function-of" is a placeholder for a command that picks a free mask
find /sys/fs/resctrl/ > output.txt
mask=$(function-of output.txt)
mkdir /sys/fs/resctrl/newres/
echo "$mask" > /sys/fs/resctrl/newres/schemata

$ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C:

/*
 * Example code to take advisory locks
 * before accessing resctrl filesystem
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_RDONLY | O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);

	return 0;
}
342}