User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig option and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".

To use the feature, mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

Mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.


Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names. Each subdirectory contains the
following files:

"num_closids":  The number of CLOSIDs which are valid for this
                resource. The kernel uses the smallest number of
                CLOSIDs of all enabled resources as the limit.

"cbm_mask":     The bitmask which is valid for this resource. This
                mask is equivalent to 100%.

"min_cbm_bits": The minimum number of consecutive bits which must be
                set when writing a mask.


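Since the effective limit is the minimum of "num_closids" over all
enabled resources, a script would take the smallest of the values it
reads. A purely illustrative sketch (the values are passed in rather
than read from info/*/num_closids, and the helper name is made up):

```shell
# Hypothetical helper: the kernel uses the smallest num_closids of
# all enabled resources as the limit, i.e. the minimum of the values.
min_closids() {
    min=$1; shift
    for v in "$@"; do
        [ "$v" -lt "$min" ] && min=$v
    done
    echo "$min"
}

min_closids 16 4 8   # prints "4"
```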
Resource groups
---------------
Resource groups are represented as directories in the resctrl file
system. The default group is the root directory. Other groups may be
created as desired by the system administrator using the "mkdir(1)"
command, and removed using "rmdir(1)".

There are several files associated with each group:

"tasks":     A list of tasks that belong to this group. Tasks can be
             added to a group by writing the task ID to the "tasks" file
             (which will automatically remove them from the previous
             group to which they belonged). New tasks created by fork(2)
             and clone(2) are added to the same group as their parent.
             A task that is not in any other group belongs to the root
             (i.e. default) group.

"cpus":      A bitmask of logical CPUs assigned to this group. Writing
             a new mask can add/remove CPUs from this group. Added CPUs
             are removed from their previous group. Removed ones are
             given to the default (root) group. You cannot remove CPUs
             from the default group.

"cpus_list": One or more ranges of logical CPUs assigned to this
             group. The same rules apply as for the "cpus" file.

"schemata":  A list of all the resources available to this group.
             Each resource has its own line and format - see below for
             details.

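The "cpus" and "cpus_list" files describe the same assignment in two
notations. As an illustration (not part of the kernel interface, and
the helper name is made up), a range list can be converted to the
bitmask notation like this:

```shell
# Hypothetical helper: convert a "cpus_list" style range string such
# as "0-3,8" into the hex bitmask notation used by the "cpus" file.
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=','
    for part in $1; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;
            *)   lo=$part; hi=$part ;;
        esac
        c=$lo
        while [ "$c" -le "$hi" ]; do
            mask=$(( mask | (1 << c) ))
            c=$((c + 1))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

cpulist_to_mask 0-3,8   # prints "10f": bits 0-3 plus bit 8
```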
When a task is running the following rules define which resources
are available to it:

1) If the task is a member of a non-default group, then the schemata
for that group is used.

2) Else if the task belongs to the default group, but is running on a
CPU that is assigned to some specific group, then the schemata for
the CPU's group is used.

3) Otherwise the schemata for the default group is used.

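The three rules above can be sketched as a small decision function.
This is purely illustrative; the group names and the "default" marker
are assumptions, not part of the interface:

```shell
# Hypothetical sketch of the resolution rules: given the task's group
# and the group of the CPU it runs on ("default" meaning the root
# group), print the group whose schemata applies.
effective_group() {
    task_grp=$1; cpu_grp=$2
    if [ "$task_grp" != default ]; then
        echo "$task_grp"        # rule 1: task's own group wins
    elif [ "$cpu_grp" != default ]; then
        echo "$cpu_grp"         # rule 2: CPU's group
    else
        echo default            # rule 3: default group
    fi
}

effective_group default p0   # prints "p0"
```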

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, or multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical CPUs sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps). To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each CPU model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

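Both points can be sketched in shell. These are hypothetical helper
names, for illustration only: one checks that a mask is contiguous as
the hardware requires, the other computes k equal contiguous
partitions of an n-bit cache:

```shell
# Hypothetical helper: check that a hex CBM has its '1' bits in one
# contiguous block, as x86 hardware requires. Prints "ok" or "bad".
cbm_contiguous() {
    m=$((0x$1))
    [ "$m" -eq 0 ] && { echo bad; return; }
    # strip trailing zero bits
    while [ $((m & 1)) -eq 0 ]; do m=$((m >> 1)); done
    # with trailing zeros stripped, a contiguous mask is 2^k - 1,
    # so m & (m + 1) must be zero
    if [ $(( m & (m + 1) )) -eq 0 ]; then echo ok; else echo bad; fi
}

# Hypothetical helper: split an n-bit cache into k equal contiguous
# partitions and print each partition's mask in hex (assumes n is
# divisible by k).
partition_masks() {
    n=$1; k=$2
    per=$((n / k))
    base=$(( (1 << per) - 1 ))
    i=0
    while [ "$i" -lt "$k" ]; do
        printf '0x%x\n' $(( base << (i * per) ))
        i=$((i + 1))
    done
}

cbm_contiguous 6        # prints "ok"
cbm_contiguous 5        # prints "bad"
partition_masks 20 4    # prints 0x1f 0x3e0 0x7c00 0xf8000, one per line
```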

L3 details (code and data prioritization disabled)
--------------------------------------------------
With CDP disabled the L3 schemata format is:

 L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 details (CDP enabled via mount option to resctrl)
----------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

 L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 details
----------
L2 cache does not support code and data prioritization, so the
schemata format is always:

 L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.

# cat schemata
L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
# echo "L3DATA:2=3c0;" > schemata
# cat schemata
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

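For scripting, one schemata line can be split into the resource name
and per-domain pairs. This is just an illustrative sketch; the helper
name is made up:

```shell
# Hypothetical helper: split one schemata line such as
# "L3DATA:0=fffff;1=3c0" into the resource name and per-domain
# "<cache_id> <cbm>" pairs, one per output line.
parse_schemata_line() {
    line=$1
    echo "resource ${line%%:*}"
    rest=${line#*:}
    old_ifs=$IFS
    IFS=';'
    for dom in $rest; do
        echo "${dom%%=*} ${dom#*=}"
    done
    IFS=$old_ifs
}

parse_schemata_line "L3DATA:0=fffff;1=3c0"
# prints:
#   resource L3DATA
#   0 fffff
#   1 3c0
```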
Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% of cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running
on processor 1, on socket 0 of a two-socket, dual-core machine. To avoid
noisy neighbors, each of the two real-time tasks exclusively occupies one
quarter of the L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff;1=fffff" > schemata

Next we make a resource group for our first real-time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real-time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 1 1234

Ditto for the second real-time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2 5678

Example 3
---------

A single socket system which has real-time tasks running on cores 4-7
and a non real-time workload assigned to cores 0-3. The real-time tasks
share text and data, so a per-task association is not required and due
to interaction with the kernel it's desired that the kernel on these
cores shares L3 with the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 cannot be used by ordinary tasks:

# echo "L3:0=3ff" > schemata

Next we make a resource group for our real-time cores and give
it access to the "top" 50% of the cache on socket 0.

# mkdir p0
# echo "L3:0=ffc00;" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. Cores 4-7
correspond to bits 4-7 of the CPU mask, i.e. f0:

# echo f0 > p0/cpus

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the CBMs from each directory's "schemata" file
  2. Find a contiguous set of bits in the global CBM bitmask that is
     clear in every one of the directory CBMs
  3. Create a new directory
  4. Write the bits found in step 2 to the new directory's "schemata"
     file

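Step 2 can be sketched as follows. This is illustrative only;
"find_free_cbm" is a made-up helper that takes the full-cache mask,
the bitwise OR of all CBMs already in use, and the number of bits
wanted:

```shell
# Hypothetical helper: given the full-cache mask, the union (bitwise
# OR) of all CBMs already in use, and a desired width n, find the
# lowest contiguous run of n clear bits and print it in hex, or
# "none" if no such run exists.
find_free_cbm() {
    full=$((0x$1)); used=$((0x$2)); n=$3
    cand=$(( (1 << n) - 1 ))        # lowest n-bit contiguous mask
    while [ "$cand" -le "$full" ]; do
        if [ $(( cand & used )) -eq 0 ]; then
            printf '%x\n' "$cand"
            return 0
        fi
        cand=$((cand << 1))         # slide the window up one bit
    done
    echo none
    return 1
}

find_free_cbm fffff 00fff 4   # prints "f000"
```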
If two applications attempt to allocate space concurrently then they can
end up allocating the same bits so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrl filesystem and to avoid
the problem above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a
shell command (flock(1)).

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) Release the lock with flock(LOCK_UN)

Example with bash:

# Atomically read directory structure
$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

# Read directory contents and create new subdirectory

$ cat create-dir.sh
find /sys/fs/resctrl/ > output.txt
mask=$(function-of output.txt)   # compute a free CBM; details omitted
mkdir /sys/fs/resctrl/newres/
echo "$mask" > /sys/fs/resctrl/newres/schemata

$ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C:

/*
 * Example code to take advisory locks
 * before accessing the resctrl filesystem
 */
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);
	return 0;
}