REDUCING OS JITTER DUE TO PER-CPU KTHREADS
(Paul E. McKenney, 49717cb, 2013-04-11 08:07:11 -0700)

This document lists per-CPU kthreads in the Linux kernel and presents
options to control their OS jitter. Note that non-per-CPU kthreads are
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
them to a "housekeeping" CPU dedicated to such work.


REFERENCES

o   Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.

o   Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs.

o   man taskset: Using the taskset command to bind tasks to sets
    of CPUs.

o   man sched_setaffinity: Using the sched_setaffinity() system
    call to bind tasks to sets of CPUs.

o   /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
    writing "0" to offline and "1" to online.

o   In order to locate kernel-generated OS jitter on CPU N:

        cd /sys/kernel/debug/tracing
        echo 1 > max_graph_depth # Increase the "1" for more detail
        echo function_graph > current_tracer
        # run workload
        cat per_cpu/cpuN/trace


KTHREADS

Name: ehca_comp/%u
Purpose: Periodically process Infiniband-related work.
To reduce its OS jitter, do any of the following:
1.  Don't use eHCA Infiniband hardware, instead choosing hardware
    that does not require per-CPU kthreads. This will prevent these
    kthreads from being created in the first place. (This will
    work for most people, as this hardware, though important, is
    relatively old and is produced in relatively low unit volumes.)
2.  Do all eHCA-Infiniband-related work on other CPUs, including
    interrupts.
3.  Rework the eHCA driver so that its per-CPU kthreads are
    provisioned only on selected CPUs.


Name: irq/%d-%s
Purpose: Handle threaded interrupts.
To reduce its OS jitter, do the following:
1.  Use irq affinity to force the irq threads to execute on
    some other CPU.
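
One way to apply irq affinity from a shell is sketched below. It is a minimal example that assumes root privileges and a kernel providing /proc/irq/N/smp_affinity_list. The function name steer_all_irqs and the PROC_ROOT override are hypothetical. PROC_ROOT exists only so that the function can be tried against a scratch directory. An irq thread follows the affinity of its interrupt, so moving the interrupt also moves the irq/%d-%s kthread. Some interrupts, such as per-CPU timer interrupts, refuse affinity changes. The function reports those failures and skips them.

```shell
# Sketch only: steer every movable IRQ, and hence its irq/%d-%s
# thread, to the housekeeping CPUs.
# Assumptions: run as root; $1 is a CPU list such as "0" or "0-1";
# PROC_ROOT is a hypothetical override that defaults to /proc.
steer_all_irqs() {
    hk=$1
    proc=${PROC_ROOT:-/proc}
    for f in "$proc"/irq/*/smp_affinity_list; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # Write from a subshell so that both open and write errors are
        # silenced; report the IRQ number and keep going.
        if ! (echo "$hk" > "$f") 2>/dev/null; then
            irq=${f%/smp_affinity_list}
            echo "could not move IRQ ${irq##*/}" >&2
        fi
    done
}
```

For example, "steer_all_irqs 0" would direct all movable interrupts to CPU 0. Interrupts registered later use /proc/irq/default_smp_affinity instead. That file takes a hexadecimal mask rather than a CPU list.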

Name: kcmtpd_ctr_%d
Purpose: Handle Bluetooth work.
To reduce its OS jitter, do one of the following:
1.  Don't use Bluetooth, in which case these kthreads won't be
    created in the first place.
2.  Use irq affinity to force Bluetooth-related interrupts to
    occur on some other CPU and furthermore initiate all
    Bluetooth activity on some other CPU.

Name: ksoftirqd/%u
Purpose: Execute softirq handlers when threaded or when under heavy load.
To reduce its OS jitter, each softirq vector must be handled
separately as follows:
TIMER_SOFTIRQ: Do all of the following:
1.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle, for example, by avoiding system calls and by forcing
    both kernel threads and interrupts to execute elsewhere.
2.  Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force
    the CPU offline, then bring it back online. This forces
    recurring timers to migrate elsewhere. If you are concerned
    with multiple CPUs, force them all offline before bringing the
    first one back online. Once you have onlined the CPUs in question,
    do not offline any other CPUs, because doing so could force the
    timer back onto one of the CPUs in question.
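
The offline-then-online sequence above could be scripted roughly as follows. This sketch assumes root privileges on a CONFIG_HOTPLUG_CPU=y kernel. The names cycle_cpus and SYSFS_ROOT are hypothetical. The function offlines all listed CPUs before bringing any of them back online, which matches the ordering described above.

```shell
# Sketch only: cycle the given CPUs offline and back online so that
# recurring timers migrate elsewhere. Every CPU is offlined before
# any is onlined again.
# Assumptions: run as root after boot on a CONFIG_HOTPLUG_CPU=y kernel;
# arguments are CPU numbers; SYSFS_ROOT is a hypothetical override
# that defaults to /sys.
cycle_cpus() {
    cpudir=${SYSFS_ROOT:-/sys}/devices/system/cpu
    for cpu in "$@"; do
        echo 0 > "$cpudir/cpu$cpu/online" || return 1
    done
    for cpu in "$@"; do
        echo 1 > "$cpudir/cpu$cpu/online" || return 1
    done
}
```

For example, "cycle_cpus 2 3" cycles CPUs 2 and 3. The function stops at the first failed write, which can leave earlier CPUs offline. Check their online files afterwards. On some systems, CPU 0 cannot be taken offline at all.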
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following:
1.  Force networking interrupts onto other CPUs.
2.  Initiate any network I/O on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
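
Before networking interrupts can be forced onto other CPUs, they have to be identified. The sketch below assumes root privileges and that the device's interrupts appear by name in /proc/interrupts, for example after a NIC such as "eth0". The names steer_irqs_matching and PROC_ROOT are hypothetical. The pattern is matched against the whole /proc/interrupts line, so it should name a device rather than a number. The same approach would apply to block-device interrupts.

```shell
# Sketch only: move the IRQs whose /proc/interrupts line matches a
# pattern (for example a NIC name) onto housekeeping CPUs.
# Assumptions: run as root; $1 is an awk regular expression, $2 a CPU
# list; PROC_ROOT is a hypothetical override that defaults to /proc.
steer_irqs_matching() {
    pat=$1 hk=$2
    proc=${PROC_ROOT:-/proc}
    # Numbered IRQ lines start with "NN:"; summary lines such as "NMI:"
    # fail the numeric test and are skipped.
    irqs=$(awk -v pat="$pat" '
        $1 ~ /^[0-9]+:$/ && $0 ~ pat { sub(/:$/, "", $1); print $1 }
    ' "$proc/interrupts")
    for irq in $irqs; do
        echo "$hk" > "$proc/irq/$irq/smp_affinity_list" ||
            echo "could not move IRQ $irq" >&2
    done
}
```

For example, "steer_irqs_matching eth0 0-1" would move the eth0 interrupts to CPUs 0 and 1. You can then start network I/O itself on the housekeeping CPUs, for example with "taskset -c".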
BLOCK_SOFTIRQ: Do all of the following:
1.  Force block-device interrupts onto some other CPU.
2.  Initiate any block I/O on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
BLOCK_IOPOLL_SOFTIRQ: Do all of the following:
1.  Force block-device interrupts onto some other CPU.
2.  Initiate any block I/O and block-I/O polling on other CPUs.
3.  Once your application has started, prevent CPU-hotplug operations
    from being initiated from tasks that might run on the CPU to
    be de-jittered. (It is OK to force this CPU offline and then
    bring it back online before you start your application.)
TASKLET_SOFTIRQ: Do one or more of the following:
1.  Avoid use of drivers that use tasklets. (Such drivers will contain
    calls to things like tasklet_schedule().)
2.  Convert all drivers that you must use from tasklets to workqueues.
3.  Force interrupts for drivers using tasklets onto other CPUs,
    and also do I/O involving these drivers on other CPUs.
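
A rough way to check whether a driver uses tasklets is to search its source for tasklet_schedule() calls. The sketch below assumes a kernel source tree and a grep that supports --include, as GNU and BSD grep do. The name find_tasklet_users is hypothetical. A text match also hits comments and misses tasklets scheduled through wrapper functions. Treat the output as a starting point for review.

```shell
# Sketch only: list C files under a source tree that call
# tasklet_schedule(), as a first pass at spotting drivers that raise
# TASKLET_SOFTIRQ. Matches in comments are reported too.
# Assumption: $1 is a kernel source subtree such as drivers/net.
find_tasklet_users() {
    grep -rl --include='*.c' 'tasklet_schedule(' "${1:-.}" | sort
}
```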
SCHED_SOFTIRQ: Do all of the following:
1.  Avoid sending scheduler IPIs to the CPU to be de-jittered,
    for example, ensure that at most one runnable kthread is present
    on that CPU. If a thread that expects to run on the de-jittered
    CPU awakens, the scheduler will send an IPI that can result in
    a subsequent SCHED_SOFTIRQ.
2.  Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
    CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
    to be de-jittered is marked as an adaptive-ticks CPU using the
    "nohz_full=" boot parameter. This reduces the number of
    scheduler-clock interrupts that the de-jittered CPU receives,
    minimizing its chances of being selected to do the load balancing
    work that runs in SCHED_SOFTIRQ context.
3.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle, for example, by avoiding system calls and by
    forcing both kernel threads and interrupts to execute elsewhere.
    This further reduces the number of scheduler-clock interrupts
    received by the de-jittered CPU.
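
You can check the build options above against a kernel's .config with a sketch such as the following. The helper check_kconfig is hypothetical. It looks for exact lines, so an option recorded as "# CONFIG_... is not set" counts as missing.

```shell
# Sketch only: confirm that a kernel .config enables every option
# passed in, reporting each one that is missing.
# Assumption: $1 is the path to the .config used to build the kernel;
# the remaining arguments are exact lines such as CONFIG_NO_HZ_FULL=y.
check_kconfig() {
    cfg=$1; shift
    status=0
    for opt in "$@"; do
        if ! grep -qxF "$opt" "$cfg"; then
            echo "missing: $opt" >&2
            status=1
        fi
    done
    return $status
}
```

For example, "check_kconfig .config CONFIG_RCU_NOCB_CPU=y CONFIG_RCU_NOCB_CPU_ALL=y CONFIG_NO_HZ_FULL=y". The "nohz_full=" boot parameter is not part of the build configuration. Check it on the running system instead.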
HRTIMER_SOFTIRQ: Do all of the following:
1.  To the extent possible, keep the CPU out of the kernel when it
    is non-idle. For example, avoid system calls and force both
    kernel threads and interrupts to execute elsewhere.
2.  Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the
    CPU offline, then bring it back online. This forces recurring
    timers to migrate elsewhere. If you are concerned with multiple
    CPUs, force them all offline before bringing the first one
    back online. Once you have onlined the CPUs in question, do not
    offline any other CPUs, because doing so could force the timer
    back onto one of the CPUs in question.
RCU_SOFTIRQ: Do at least one of the following:
1.  Offload callbacks and keep the CPU in either dyntick-idle or
    adaptive-ticks state by doing all of the following:
    a.  Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
        CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
        to be de-jittered is marked as an adaptive-ticks CPU using
        the "nohz_full=" boot parameter. Bind the rcuo kthreads
        to housekeeping CPUs, which can tolerate OS jitter.
    b.  To the extent possible, keep the CPU out of the kernel
        when it is non-idle, for example, by avoiding system
        calls and by forcing both kernel threads and interrupts
        to execute elsewhere.
2.  Enable RCU to do its processing remotely via dyntick-idle by
    doing all of the following:
    a.  Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
    b.  Ensure that the CPU goes idle frequently, allowing other
        CPUs to detect that it has passed through an RCU quiescent
        state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
        userspace execution also allows other CPUs to detect that
        the CPU in question has passed through a quiescent state.
    c.  To the extent possible, keep the CPU out of the kernel
        when it is non-idle, for example, by avoiding system
        calls and by forcing both kernel threads and interrupts
        to execute elsewhere.
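
To confirm that the CPU to be de-jittered was actually marked as an adaptive-ticks CPU, you can inspect the running kernel's command line. This sketch assumes the standard /proc/cmdline file. The names nohz_full_cpus and PROC_ROOT are hypothetical. The function prints the CPU list given to "nohz_full=" and returns failure if the parameter is absent.

```shell
# Sketch only: print the CPU list passed via "nohz_full=" on the
# kernel command line; fail if the parameter was not supplied.
# If the parameter appears more than once, each value is printed.
# Assumption: PROC_ROOT is a hypothetical override that defaults to /proc.
nohz_full_cpus() {
    list=$(tr ' ' '\n' < "${PROC_ROOT:-/proc}/cmdline" |
           sed -n 's/^nohz_full=//p')
    [ -n "$list" ] || return 1
    echo "$list"
}
```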

Name: rcuc/%u
Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
To reduce its OS jitter, do at least one of the following:
1.  Build the kernel with CONFIG_PREEMPT=n. This prevents these
    kthreads from being created in the first place, and also obviates
    the need for RCU priority boosting. This approach is feasible
    for workloads that do not require high degrees of responsiveness.
2.  Build the kernel with CONFIG_RCU_BOOST=n. This prevents these
    kthreads from being created in the first place. This approach
    is feasible only if your workload never requires RCU priority
    boosting, for example, if you ensure frequent idle time on all
    CPUs that might execute within the kernel.
3.  Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y,
    which offloads all RCU callbacks to kthreads that can be moved
    off of CPUs susceptible to OS jitter. This approach prevents the
    rcuc/%u kthreads from having any work to do, so that they are
    never awakened.
4.  Ensure that the CPU never enters the kernel, and, in particular,
    avoid initiating any CPU hotplug operations on this CPU. This is
    another way of preventing any callbacks from being queued on the
    CPU, again preventing the rcuc/%u kthreads from having any work
    to do.

Name: rcuob/%d, rcuop/%d, and rcuos/%d
Purpose: Offload RCU callbacks from the corresponding CPU.
To reduce their OS jitter, do at least one of the following:
1.  Use affinity, cgroups, or some other mechanism to force these
    kthreads to execute on some other CPU.
2.  Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
    kthreads from being created in the first place. However, please
    note that this will not eliminate OS jitter, but will instead
    shift it to RCU_SOFTIRQ.
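
Before the rcuo kthreads can be moved, you need their PIDs. The sketch below scans /proc/PID/comm for command names that match an extended regular expression. The names kthread_pids and PROC_ROOT are hypothetical. The pgrep command, which matches against process names by default, is an alternative.

```shell
# Sketch only: print the PIDs of tasks whose command name matches an
# extended regular expression, such as '^rcuo' for the rcuo kthreads.
# Tasks that exit during the scan are silently skipped.
# Assumption: PROC_ROOT is a hypothetical override that defaults to /proc.
kthread_pids() {
    for comm in "${PROC_ROOT:-/proc}"/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        if grep -qE "$1" "$comm" 2>/dev/null; then
            pid=${comm%/comm}
            echo "${pid##*/}"
        fi
    done
}
```

For example, the following loop (run as root) would bind all rcuob, rcuop, and rcuos kthreads to housekeeping CPU 0:

    for pid in $(kthread_pids '^rcuo'); do taskset -pc 0 "$pid"; done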

Name: watchdog/%u
Purpose: Detect software lockups on each CPU.
To reduce its OS jitter, do at least one of the following:
1.  Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
    kthreads from being created in the first place.
2.  Echo a zero into /proc/sys/kernel/watchdog to disable the
    watchdog timer.
3.  Echo a large number into /proc/sys/kernel/watchdog_thresh in
    order to reduce the frequency of OS jitter due to the watchdog
    timer down to a level that is acceptable for your workload.
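
The two run-time options above can be wrapped as in this sketch, which assumes root privileges. The names quiet_watchdog and PROC_ROOT are hypothetical. With no argument, the function disables the watchdog. With a number, it writes that value to watchdog_thresh.

```shell
# Sketch only: reduce watchdog jitter at run time. With no argument
# the watchdog is disabled; otherwise watchdog_thresh is set to $1.
# Assumptions: run as root; PROC_ROOT is a hypothetical override that
# defaults to /proc.
quiet_watchdog() {
    sysdir=${PROC_ROOT:-/proc}/sys/kernel
    if [ $# -eq 0 ]; then
        echo 0 > "$sysdir/watchdog"
    else
        echo "$1" > "$sysdir/watchdog_thresh"
    fi
}
```

The sysctl command gives the same effect, for example "sysctl -w kernel.watchdog=0". The kernel rejects watchdog_thresh values above its upper bound. In that case the function returns a nonzero status, so read the file back to confirm the setting.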