Parav Pandit, commit 9c1e67f, 2017-01-10 00:02:15 +0000

RDMA Controller
----------------

Contents
--------

1. Overview
  1-1. What is RDMA controller?
  1-2. Why RDMA controller needed?
  1-3. How is RDMA controller implemented?
2. Usage Examples

1. Overview

1-1. What is RDMA controller?
-----------------------------

RDMA controller allows the user to limit the RDMA/IB specific resources that a
given set of processes can use. These processes are grouped using the RDMA
controller.

RDMA controller defines two resources which can be limited for the processes
of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently user space applications can easily take away all the rdma verb
specific resources such as AH, CQ, QP, MR etc. As a result, other applications
in other cgroups or kernel space ULPs may not even get a chance to allocate any
rdma resources. This can lead to service unavailability.

Therefore an RDMA controller is needed through which the resource consumption
of processes can be limited. Through this controller, different rdma
resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

RDMA cgroup allows limit configuration of resources. Rdma cgroup maintains
resource accounting per cgroup, per device using a resource pool structure.
Each such resource pool is limited to at most 64 resources by rdma cgroup,
which can be extended later if required.

This resource pool object is linked to the cgroup css. Typically there
are 0 to 4 resource pool instances per cgroup, per device in most use cases,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no
known use case or requirement for such a configuration either.

Since RDMA resources can be allocated from any process and can be freed by any
of the child processes which share the address space, rdma resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of rdma resources. Linking resources to the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
the caller. The same rdma cgroup should be passed while uncharging the
resource. This also allows a process migrated with active RDMA resources to
charge new resources to its new owner cgroup. It also allows a resource of a
process that has migrated to a new cgroup to be uncharged from the cgroup it
was previously charged to, even though that is not a primary use case.

Resource pool object is created in the following situations.
(a) User sets the limit and no previous resource pool exists for the device
of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
charge the resource. The pool is created so that the resource is uncharged
correctly when applications start running without limits and limits are
enforced later on; otherwise the usage count would drop below zero during
uncharging.

Resource pool is destroyed if all the resource limits are set to max and
the last resource is deallocated.

User should set all the limits to the max value if it intends to
remove/unconfigure the resource pool for a particular device.

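The pool lifecycle described above can be pictured with a small shell sketch.
It is a toy model of a single resource pool (one cgroup, one device, one
resource type) and not the kernel implementation. All names in it
(pool_exists, set_limit, try_charge, uncharge, maybe_destroy_pool) are
hypothetical, and destroying the pool from set_limit when nothing is charged
is an assumption made for illustration.

```shell
# Toy model of one resource pool; hypothetical names, not kernel code.
pool_exists=0      # 1 while the pool object exists
pool_limit=max     # "max" means no limit is configured
pool_usage=0       # number of currently charged resources

# Destroy the pool once the limit is back at max and nothing is charged.
maybe_destroy_pool() {
    if [ "$pool_limit" = "max" ] && [ "$pool_usage" -eq 0 ]; then
        pool_exists=0
    fi
}

# Case (a): the user sets a limit; the pool is created if it is missing.
# Assumption: setting the limit to max with no usage removes the pool.
set_limit() {
    pool_exists=1
    pool_limit=$1
    maybe_destroy_pool
}

# Case (b): the stack charges a resource; the pool is created even when
# no limit is configured, so that usage is tracked from the start.
try_charge() {
    pool_exists=1
    if [ "$pool_limit" != "max" ] && [ "$pool_usage" -ge "$pool_limit" ]; then
        return 1   # over the limit, charge refused
    fi
    pool_usage=$((pool_usage + 1))
}

# Release one resource; this may be the last one keeping the pool alive.
uncharge() {
    pool_usage=$((pool_usage - 1))
    maybe_destroy_pool
}
```

Because try_charge counts usage even while the limit is max, a limit that is
set later still sees the resources that were allocated earlier, and uncharge
never has to subtract from a counter that was not incremented.
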
IB stack honors the limits enforced by the rdma controller. When an application
queries the maximum resource limits of an IB device, it returns the minimum of
what is configured by the user for a given cgroup and what is supported by
the IB device.

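The "minimum of both" rule can be written down as a short shell function. This
is only an illustration of the arithmetic; effective_limit is a hypothetical
helper name, and the device maximum passed to it is assumed to be a plain
number obtained elsewhere.

```shell
# Hypothetical helper: effective_limit <cgroup_limit> <device_max>
# Prints the limit an application would see: the smaller of the cgroup
# limit and the device maximum. A cgroup limit of "max" means that the
# cgroup imposes no limit, so the device maximum applies.
effective_limit() {
    cg_limit=$1
    dev_max=$2
    if [ "$cg_limit" = "max" ] || [ "$cg_limit" -gt "$dev_max" ]; then
        echo "$dev_max"
    else
        echo "$cg_limit"
    fi
}
```

For example, effective_limit 2000 65536 prints 2000, while
effective_limit max 65536 prints 65536.
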
Following resources can be accounted by rdma controller.
  hca_handle     Maximum number of HCA Handles
  hca_object     Maximum number of HCA Objects

2. Usage Examples
-----------------

(a) Configure resource limit:
echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit:
cat /sys/fs/cgroup/rdma/2/rdma.max
#Output:
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage:
cat /sys/fs/cgroup/rdma/2/rdma.current
#Output:
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit:
echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
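
When scripting against these files, a single value can be extracted from the
"device key=value ..." lines shown in the outputs above. The following is a
minimal sketch; rdma_value is a hypothetical helper name, and it assumes the
file format is exactly the one printed in examples (b) and (c).

```shell
# Hypothetical helper: rdma_value <file> <device> <resource>
# Prints the value of <resource> on the line that starts with <device>,
# or nothing if the device or the resource is not listed.
rdma_value() {
    awk -v dev="$2" -v res="$3" '
        $1 == dev {
            # Fields 2..NF have the form key=value.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == res) {
                    print kv[2]
                    exit
                }
            }
        }' "$1"
}
```

With the output shown in (c),
rdma_value /sys/fs/cgroup/rdma/2/rdma.current ocrdma1 hca_object
would print 23.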