User Interface for Resource Allocation in Intel Resource Director Technology

Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@intel.com>
Tony Luck <tony.luck@intel.com>
Vikas Shivappa <vikas.shivappa@intel.com>

This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3".

To use the feature mount the file system:

 # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl

mount options are:

"cdp": Enable code/data prioritization in L3 cache allocations.

RDT features are orthogonal. A particular system may support only
monitoring, only control, or both monitoring and control.

The mount succeeds if either allocation or monitoring is present, but
only those files and directories supported by the system will be created.
For more details on the behavior of the interface during monitoring
and allocation, see the "Resource alloc and monitor groups" section.

Info directory
--------------

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.

Each subdirectory contains the following files with respect to
allocation:

Cache resource (L3/L2) subdirectories contain the following files
related to allocation:

"num_closids":   The number of CLOSIDs which are valid for this
                 resource. The kernel uses the smallest number of
                 CLOSIDs of all enabled resources as the limit.

"cbm_mask":      The bitmask which is valid for this resource.
                 This mask is equivalent to 100%.

"min_cbm_bits":  The minimum number of consecutive bits which
                 must be set when writing a mask.

The memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:

"min_bandwidth":  The minimum memory bandwidth percentage which
                  the user can request.

"bandwidth_gran": The granularity in which the memory bandwidth
                  percentage is allocated. The allocated
                  b/w percentage is rounded off to the next
                  control step available on the hardware. The
                  available bandwidth control steps are:
                  min_bandwidth + N * bandwidth_gran.

"delay_linear":   Indicates if the delay scale is linear or
                  non-linear. This field is purely informational.

If RDT monitoring is available there will be an "L3_MON" directory
with the following files:

"num_rmids":      The number of RMIDs available. This is the
                  upper bound for how many "CTRL_MON" + "MON"
                  groups can be created.

"mon_features":   Lists the monitoring events if
                  monitoring is enabled for the resource.

"max_threshold_occupancy":
                  Read/write file provides the largest value (in
                  bytes) at which a previously used LLC_occupancy
                  counter can be considered for re-use.


Resource alloc and monitor groups
---------------------------------

Resource groups are represented as directories in the resctrl file
system. The default group is the root directory which, immediately
after mounting, owns all the tasks and cpus in the system and can make
full use of all resources.

On a system with RDT control features additional directories can be
created in the root directory that specify different amounts of each
resource (see "schemata" below). The root and these additional top level
directories are referred to as "CTRL_MON" groups below.

On a system with RDT monitoring the root directory and other top level
directories contain a directory named "mon_groups" in which additional
directories can be created to monitor subsets of tasks in the CTRL_MON
group that is their ancestor. These are called "MON" groups in the rest
of this document.

Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.

All groups contain the following files:

"tasks":
        Reading this file shows the list of all tasks that belong to
        this group. Writing a task id to the file will add a task to the
        group. If the group is a CTRL_MON group the task is removed from
        whichever previous CTRL_MON group owned the task and also from
        any MON group that owned the task. If the group is a MON group,
        then the task must already belong to the CTRL_MON parent of this
        group. The task is removed from any previous MON group.


"cpus":
        Reading this file shows a bitmask of the logical CPUs owned by
        this group. Writing a mask to this file will add and remove
        CPUs to/from this group. As with the tasks file a hierarchy is
        maintained where MON groups may only include CPUs owned by the
        parent CTRL_MON group.


"cpus_list":
        Just like "cpus", only using ranges of CPUs instead of bitmasks.


When control is enabled all CTRL_MON groups will also contain:

"schemata":
        A list of all the resources available to this group.
        Each resource has its own line and format - see below for details.

When monitoring is enabled all MON groups will also contain:

"mon_data":
        This contains a set of files organized by L3 domain and by
        RDT event. E.g. on a system with two L3 domains there will
        be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
        directories has one file per event (e.g. "llc_occupancy",
        "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
        files provide a read out of the current value of the event for
        all tasks in the group. In CTRL_MON groups these files provide
        the sum for all tasks in the CTRL_MON group and all tasks in
        MON groups. Please see the example section for more details on usage.

Resource allocation rules
-------------------------
When a task is running the following rules define which resources are
available to it:

1) If the task is a member of a non-default group, then the schemata
   for that group is used.

2) Else if the task belongs to the default group, but is running on a
   CPU that is assigned to some specific group, then the schemata for the
   CPU's group is used.

3) Otherwise the schemata for the default group is used.

Resource monitoring rules
-------------------------
1) If a task is a member of a MON group, or non-default CTRL_MON group
   then RDT events for the task will be reported in that group.

2) If a task is a member of the default CTRL_MON group, but is running
   on a CPU that is assigned to some specific group, then the RDT events
   for the task will be reported in that group.

3) Otherwise RDT events for the task will be reported in the root level
   "mon_data" group.


Notes on cache occupancy monitoring and control
-----------------------------------------------
When moving a task from one group to another you should remember that
this only affects *new* cache allocations by the task. E.g. you may have
a task in a monitor group showing 3 MB of cache occupancy. If you move
it to a new group and immediately check the occupancy of the old and new
groups you will likely see that the old group is still showing 3 MB and
the new group zero. When the task accesses locations still in cache from
before the move, the h/w does not update any counters. On a busy system
you will likely see the occupancy in the old group go down as cache lines
are evicted and re-used while the occupancy in the new group rises as
the task accesses memory and loads into the cache are counted based on
membership in the new group.

The same applies to cache allocation control. Moving a task to a group
with a smaller cache partition will not evict any cache lines. The
process may continue to use them from the old partition.

Hardware uses a CLOSid (Class of service ID) and an RMID (Resource
monitoring ID) to identify a control group and a monitoring group
respectively. Each resource group is mapped to these IDs based on the
kind of group. The number of CLOSids and RMIDs is limited by the
hardware, hence the creation of a "CTRL_MON" directory may fail if we
run out of either CLOSids or RMIDs, and creation of a "MON" group may
fail if we run out of RMIDs.

max_threshold_occupancy - generic concepts
------------------------------------------

Note that an RMID, once freed, may not be immediately available for use
as the RMID is still tagged in the cache lines of its previous user.
Hence such RMIDs are placed on a limbo list and checked periodically to
see if their cache occupancy has gone down. If at some point the system
has many limbo RMIDs that are not yet ready to be used, the user may see
an -EBUSY during mkdir.

max_threshold_occupancy is a user configurable value to determine the
occupancy at which an RMID can be freed.

Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, or multiple cores could share an L2 cache. So
instead of using "socket" or "core" to define the set of logical cpus
sharing a resource we use a "Cache ID". At a given cache level this will
be a unique number across the whole system (but it isn't guaranteed to
be a contiguous sequence, there may be gaps). To find the ID for each
logical CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

Memory bandwidth (b/w) percentage
---------------------------------
For the memory b/w resource, the user controls the resource by
indicating the percentage of total memory b/w.

The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the cpu model and can
be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are rounded
to the next control step available on the hardware.

The bandwidth throttling is a core specific mechanism on some Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
sharing a core will result in both threads being throttled to use the
low bandwidth.

L3 schemata file details (code and data prioritization disabled)
----------------------------------------------------------------
With CDP disabled the L3 schemata format is:

        L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 schemata file details (CDP enabled via mount option to resctrl)
------------------------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this:

        L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
        L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 schemata file details
------------------------
L2 cache does not support code and data prioritization, so the
schemata format is always:

        L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

Memory b/w Allocation details
-----------------------------

The memory b/w domain is the L3 cache.

        MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change. E.g.

# cat schemata
L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
# echo "L3DATA:2=3c0;" > schemata
# cat schemata
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Examples for RDT allocation usage:

Example 1
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks, a minimum b/w of 10% and a memory bandwidth
granularity of 10%:

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
# echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Similarly, tasks that are under the control of group "p0" may use a
maximum memory b/w of 50% on socket 0 and 50% on socket 1.
Tasks in group "p1" may also use 50% memory b/w on both sockets.
Note that unlike cache masks, memory b/w cannot specify whether these
allocations can overlap or not. Each allocation specifies the maximum
b/w that the group may use and the system admin can configure the b/w
accordingly.

Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.

Two real time tasks, pid=1234 running on processor 0 and pid=5678 running
on processor 1 on socket 0 of a two-socket, dual-core machine. To avoid
noisy neighbors, each of the two real-time tasks exclusively occupies one
quarter of L3 cache on socket 0.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
ordinary tasks:

# echo -e "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata

Next we make a resource group for our first real time task and give
it access to the "top" 25% of the cache on socket 0.

# mkdir p0
# echo "L3:0=f8000;1=fffff" > p0/schemata

Finally we move our first real time task into this resource group. We
also use taskset(1) to ensure the task always runs on a dedicated CPU
on socket 0. Most uses of resource groups will also constrain which
processors tasks run on.

# echo 1234 > p0/tasks
# taskset -cp 1 1234

Ditto for the second real time task (with the remaining 25% of cache):

# mkdir p1
# echo "L3:0=7c00;1=fffff" > p1/schemata
# echo 5678 > p1/tasks
# taskset -cp 2 5678

For the same 2 socket system with the memory b/w resource and CAT L3 the
schemata would look like this (assuming min_bandwidth is 10 and
bandwidth_gran is 10):

For our first real time task this would request 20% memory b/w on socket
0.

# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

For our second real time task this would request another 20% memory b/w
on socket 0.

# echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata

Example 3
---------

A single socket system which has real-time tasks running on cores 4-7
and a non real-time workload assigned to cores 0-3. The real-time tasks
share text and data, so a per task association is not required and due
to interaction with the kernel it's desired that the kernel on these
cores shares L3 with the tasks.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl

First we reset the schemata for the default group so that the "upper"
50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
cannot be used by ordinary tasks:

# echo -e "L3:0=3ff\nMB:0=50" > schemata

Next we make a resource group for our real time cores and give it access
to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
socket 0.

# mkdir p0
# echo -e "L3:0=ffc00\nMB:0=50" > p0/schemata

Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. They should
also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
siblings and only the real time threads are scheduled on the cores 4-7.

# echo F0 > p0/cpus

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

As an example, the allocation of an exclusive reservation of L3 cache
involves:

  1. Read the cbmmasks from each directory
  2. Find a contiguous set of bits in the global CBM bitmask that is clear
     in all of the directory cbmmasks
  3. Create a new directory
  4. Set the bits found in step 2 in the new directory "schemata" file

If two applications attempt to allocate space concurrently then they can
end up allocating the same bits so the reservations are shared instead of
exclusive.

To coordinate atomic operations on the resctrlfs and to avoid the problem
above, the following locking procedure is recommended:

Locking is based on flock, which is available in libc and also as a shell
script command.

Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) flock(LOCK_UN)

Example with bash:

# Atomically read directory structure
$ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl

# Read directory contents and create new subdirectory

$ cat create-dir.sh
find /sys/fs/resctrl/ > output.txt
mask=$(function-of output.txt)   # placeholder: compute a free mask from output.txt
mkdir /sys/fs/resctrl/newres/
echo "$mask" > /sys/fs/resctrl/newres/schemata

$ flock /sys/fs/resctrl/ ./create-dir.sh

Example with C:

/*
 * Example code to take advisory locks
 * before accessing resctrl filesystem
 */
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

void resctrl_take_shared_lock(int fd)
{
	int ret;

	/* take shared lock on resctrl filesystem */
	ret = flock(fd, LOCK_SH);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_take_exclusive_lock(int fd)
{
	int ret;

	/* take exclusive lock on resctrl filesystem */
	ret = flock(fd, LOCK_EX);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

void resctrl_release_lock(int fd)
{
	int ret;

	/* release lock on resctrl filesystem */
	ret = flock(fd, LOCK_UN);
	if (ret) {
		perror("flock");
		exit(-1);
	}
}

int main(void)
{
	int fd;

	fd = open("/sys/fs/resctrl", O_RDONLY | O_DIRECTORY);
	if (fd == -1) {
		perror("open");
		exit(-1);
	}
	resctrl_take_shared_lock(fd);
	/* code to read directory contents */
	resctrl_release_lock(fd);

	resctrl_take_exclusive_lock(fd);
	/* code to read and write directory contents */
	resctrl_release_lock(fd);
	return 0;
}

Examples for RDT Monitoring along with allocation usage:

Reading monitored data
----------------------
Reading an event file (for ex: mon_data/mon_L3_00/llc_occupancy) would
show the current snapshot of LLC occupancy of the corresponding MON
group or CTRL_MON group.


Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
---------
On a two socket machine (one L3 cache per socket) with just four bits
for cache bit masks

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
# echo 5678 > p1/tasks
# echo 5679 > p1/tasks

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").

Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

Create monitor groups and assign a subset of tasks to each monitor group.

# cd /sys/fs/resctrl/p1/mon_groups
# mkdir m11 m12
# echo 5678 > m11/tasks
# echo 5679 > m12/tasks

Fetch data (data shown in bytes)

# cat m11/mon_data/mon_L3_00/llc_occupancy
16234000
# cat m11/mon_data/mon_L3_01/llc_occupancy
14789000
# cat m12/mon_data/mon_L3_00/llc_occupancy
16789000

The parent CTRL_MON group shows the aggregated data.

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
31234000

Example 2 (Monitor a task from its creation)
---------
On a two socket machine (one L3 cache per socket)

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1

An RMID is allocated to the group once it is created and hence the <cmd>
below is monitored from its creation.

# echo $$ > /sys/fs/resctrl/p1/tasks
# <cmd>

Fetch the data

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
31789000

Example 3 (Monitor without CAT support or before creating CAT groups)
---------

Assume a system like HSW that has only CQM and no CAT support. In this
case resctrl will still mount but cannot create CTRL_MON directories.
But the user can create different MON groups within the root group and
thereby monitor all tasks including kernel threads.

This can also be used to profile jobs' cache size footprint before being
able to allocate them to different allocation groups.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir mon_groups/m01
# mkdir mon_groups/m02

# echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
# echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks

Monitor the groups separately and also get per domain data. From the
output below it is apparent that the tasks are mostly doing work on
domain (socket) 0.

# cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_00/llc_occupancy
31234000
# cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_01/llc_occupancy
34555
# cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_00/llc_occupancy
31234000
# cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_01/llc_occupancy
32789


Example 4 (Monitor real time tasks)
-----------------------------------

A single socket system which has real time tasks running on cores 4-7
and non real time tasks on other cpus. We want to monitor the cache
occupancy of the real time threads on these cores.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p1

Move the cpus 4-7 over to p1
# echo f0 > p1/cpus

View the llc occupancy snapshot

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
11234000