RCU Torture Test Operation


CONFIG_RCU_TORTURE_TEST

The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
implementations. It creates an rcutorture kernel module that can
be loaded to run a torture test. The test periodically outputs
status messages via printk(), which can be examined via the dmesg
command (perhaps grepping for "torture"). The test is started
when the module is loaded, and stops when the module is unloaded.

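Before loading the module, it can help to confirm how the kernel was
configured. The following is a minimal sketch, assuming that the kernel
configuration is available as a text file; the helper name
rcu_torture_config_state is hypothetical, and the config-file location
shown in the example comment is a common distribution convention rather
than anything rcutorture provides.

```shell
#!/bin/sh
# Hypothetical helper: report how CONFIG_RCU_TORTURE_TEST is set in the
# kernel config file given as $1.  Prints "module", "builtin", or "disabled".
rcu_torture_config_state() {
    config_file=$1
    if grep -q '^CONFIG_RCU_TORTURE_TEST=m$' "$config_file"; then
        echo module
    elif grep -q '^CONFIG_RCU_TORTURE_TEST=y$' "$config_file"; then
        echo builtin
    else
        echo disabled
    fi
}

# Example (the path is an assumption and varies between distributions):
#     rcu_torture_config_state "/boot/config-$(uname -r)"
```

A result of "module" means that the modprobe-based usage described in
this document applies directly.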

MODULE PARAMETERS

This module has the following parameters:

fqs_duration    Duration (in microseconds) of artificially induced bursts
                of force_quiescent_state() invocations. In RCU
                implementations having force_quiescent_state(), these
                bursts help force races between forcing a given grace
                period and that grace period ending on its own.

fqs_holdoff     Holdoff time (in microseconds) between consecutive calls
                to force_quiescent_state() within a burst.

fqs_stutter     Wait time (in seconds) between consecutive bursts
                of calls to force_quiescent_state().

gp_normal       Make the fake writers use normal synchronous grace-period
                primitives.

gp_exp          Make the fake writers use expedited synchronous grace-period
                primitives. If both gp_normal and gp_exp are set, or
                if neither gp_normal nor gp_exp is set, then randomly
                choose the primitive so that about 50% are normal and
                50% expedited. By default, neither is set, which
                gives the best overall test coverage.

irqreader       Causes RCU readers to be invoked from irq level. This is
                currently done via timers. Defaults to "1" for variants
                of RCU that permit this. (Or, more accurately, variants
                of RCU that do -not- permit this know to ignore this
                variable.)

n_barrier_cbs   If this is nonzero, RCU barrier testing will be conducted,
                in which case n_barrier_cbs specifies the number of
                RCU callbacks (and corresponding kthreads) to use for
                this testing. The value cannot be negative. If you
                specify this to be non-zero when torture_type indicates a
                synchronous RCU implementation (one for which a member of
                the synchronize_rcu() rather than the call_rcu() family is
                used -- see the documentation for torture_type below), an
                error will be reported and no testing will be carried out.

nfakewriters    This is the number of RCU fake writer threads to run. Fake
                writer threads repeatedly use the synchronous "wait for
                current readers" function of the interface selected by
                torture_type, with a delay between calls to allow for various
                different numbers of writers running in parallel.
                nfakewriters defaults to 4, which provides enough parallelism
                to trigger special cases caused by multiple writers, such as
                the synchronize_srcu() early return optimization.

nreaders        This is the number of RCU reading threads supported.
                The default is twice the number of CPUs. Why twice?
                To properly exercise RCU implementations with preemptible
                read-side critical sections.

onoff_interval
                The number of seconds between each attempt to execute a
                randomly selected CPU-hotplug operation. Defaults to
                zero, which disables CPU hotplugging. In HOTPLUG_CPU=n
                kernels, rcutorture will silently refuse to do any
                CPU-hotplug operations regardless of what value is
                specified for onoff_interval.

onoff_holdoff   The number of seconds to wait until starting CPU-hotplug
                operations. This would normally only be used when
                rcutorture was built into the kernel and started
                automatically at boot time, in which case it is useful
                in order to avoid confusing boot-time code with CPUs
                coming and going.

shuffle_interval
                The number of seconds to keep the test threads affinitied
                to a particular subset of the CPUs. Defaults to 3 seconds.
                Used in conjunction with test_no_idle_hz.

shutdown_secs   The number of seconds to run the test before terminating
                the test and powering off the system. The default is
                zero, which disables test termination and system shutdown.
                This capability is useful for automated testing.

stall_cpu       The number of seconds that a CPU should be stalled while
                within both an rcu_read_lock() and a preempt_disable().
                This stall happens only once per rcutorture run.
                If you need multiple stalls, use modprobe and rmmod to
                repeatedly run rcutorture. The default for stall_cpu
                is zero, which prevents rcutorture from stalling a CPU.

                Note that attempts to rmmod rcutorture while the stall
                is ongoing will hang, so be careful what value you
                choose for this module parameter! In addition, too-large
                values for stall_cpu might well induce failures and
                warnings in other parts of the kernel. You have been
                warned!

stall_cpu_holdoff
                The number of seconds to wait after rcutorture starts
                before stalling a CPU. Defaults to 10 seconds.

stat_interval   The number of seconds between output of torture
                statistics (via printk()). Regardless of the interval,
                statistics are printed when the module is unloaded.
                Setting the interval to zero causes the statistics to
                be printed -only- when the module is unloaded, and this
                is the default.

stutter         The length of time to run the test before pausing for this
                same period of time. Defaults to "stutter=5", so as
                to run and pause for (roughly) five-second intervals.
                Specifying "stutter=0" causes the test to run continuously
                without pausing, which is the old default behavior.

test_boost      Whether or not to test the ability of RCU to do priority
                boosting. Defaults to "test_boost=1", which performs
                RCU priority-inversion testing only if the selected
                RCU implementation supports priority boosting. Specifying
                "test_boost=0" never performs RCU priority-inversion
                testing. Specifying "test_boost=2" performs RCU
                priority-inversion testing even if the selected RCU
                implementation does not support RCU priority boosting,
                which can be used to test rcutorture's ability to
                carry out RCU priority-inversion testing.

test_boost_interval
                The number of seconds in an RCU priority-inversion test
                cycle. Defaults to "test_boost_interval=7". It is
                usually wise for this value to be relatively prime to
                the value selected for "stutter".

test_boost_duration
                The number of seconds to do RCU priority-inversion testing
                within any given "test_boost_interval". Defaults to
                "test_boost_duration=4".

test_no_idle_hz Whether or not to test the ability of RCU to operate in
                a kernel that disables the scheduling-clock interrupt to
                idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
                Defaults to omitting this test.

torture_type    The type of RCU to test, with string values as follows:

                "rcu":  rcu_read_lock(), rcu_read_unlock() and call_rcu(),
                        along with expedited, synchronous, and polling
                        variants.

                "rcu_bh": rcu_read_lock_bh(), rcu_read_unlock_bh(), and
                        call_rcu_bh(), along with expedited and synchronous
                        variants.

                "rcu_busted": This tests an intentionally incorrect version
                        of RCU in order to help test rcutorture itself.

                "srcu": srcu_read_lock(), srcu_read_unlock() and
                        call_srcu(), along with expedited and
                        synchronous variants.

                "sched": preempt_disable(), preempt_enable(), and
                        call_rcu_sched(), along with expedited,
                        synchronous, and polling variants.

                "tasks": voluntary context switch and call_rcu_tasks(),
                        along with expedited and synchronous variants.

                Defaults to "rcu".

verbose         Enable debug printk()s. Default is disabled.

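Module parameters are passed as name=value pairs on the modprobe command
line. The sketch below shows one way to assemble such a command for a
priority-boost run while checking the advice that test_boost_interval be
relatively prime to stutter. The function names gcd and
rcutorture_boost_cmd are hypothetical, and the command is only printed,
not executed.

```shell
#!/bin/sh
# Greatest common divisor of two positive integers (Euclid's algorithm).
gcd() {
    a=$1
    b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Hypothetical helper: print the modprobe command for a boost-testing run.
# $1 is the stutter value and $2 the test_boost_interval value, both
# assumed to be positive.  A warning goes to stderr if the two values
# share a common factor.
rcutorture_boost_cmd() {
    stutter=$1
    boost_interval=$2
    if [ "$(gcd "$stutter" "$boost_interval")" -ne 1 ]; then
        echo "warning: test_boost_interval=$boost_interval is not relatively prime to stutter=$stutter" >&2
    fi
    echo "modprobe rcutorture stutter=$stutter test_boost=1 test_boost_interval=$boost_interval"
}

# Example: print the command for the documented defaults.
#     rcutorture_boost_cmd 5 7
```

The documented defaults of stutter=5 and test_boost_interval=7 pass this
check, whereas a pair such as 4 and 6 triggers the warning.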

OUTPUT

The statistics output is as follows:

rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0
rcu-torture: Reader Batch: 727877838 17003 0 0 0 0 0 0 0 0 0
rcu-torture: Free-Block Circulation: 155440 155440 155440 155440 155440 155440 155440 155440 155440 155440 0
rcu-torture:--- End of test: SUCCESS: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4

The command "dmesg | grep torture:" will extract this information on
most systems. On more esoteric configurations, it may be necessary to
use other commands to access the output of the printk()s used by
the RCU torture test. The printk()s use KERN_ALERT, so they should
be evident. ;-)

The first and last lines show the rcutorture module parameters, and the
last line shows either "SUCCESS" or "FAILURE", based on rcutorture's
automatic determination as to whether RCU operated correctly.

The entries are as follows:

o       "rtc": The hexadecimal address of the structure currently visible
        to readers.

o       "ver": The number of times since boot that the RCU writer task
        has changed the structure visible to readers.

o       "tfle": If non-zero, indicates that the "torture freelist"
        containing structures to be placed into the "rtc" area is empty.
        This condition is important, since it can fool you into thinking
        that RCU is working when it is not. :-/

o       "rta": Number of structures allocated from the torture freelist.

o       "rtaf": Number of allocations from the torture freelist that have
        failed due to the list being empty. It is not unusual for this
        to be non-zero, but it is bad for it to be a large fraction of
        the value indicated by "rta".

o       "rtf": Number of frees into the torture freelist.

o       "rtmbe": A non-zero value indicates that rcutorture believes that
        rcu_assign_pointer() and rcu_dereference() are not working
        correctly. This value should be zero.

o       "rtbe": A non-zero value indicates that one of the rcu_barrier()
        family of functions is not working correctly.

o       "rtbke": rcutorture was unable to create the real-time kthreads
        used to force RCU priority inversion. This value should be zero.

o       "rtbre": Although rcutorture successfully created the kthreads
        used to force RCU priority inversion, it was unable to set them
        to the real-time priority level of 1. This value should be zero.

o       "rtbf": The number of times that RCU priority boosting failed
        to resolve RCU priority inversion.

o       "rtb": The number of times that rcutorture attempted to force
        an RCU priority inversion condition. If you are testing RCU
        priority boosting via the "test_boost" module parameter, this
        value should be non-zero.

o       "nt": The number of times rcutorture ran RCU read-side code from
        within a timer handler. This value should be non-zero only
        if you specified the "irqreader" module parameter.

o       "Reader Pipe": Histogram of "ages" of structures seen by readers.
        If any entries past the first two are non-zero, RCU is broken.
        And rcutorture prints the error flag string "!!!" to make sure
        you notice. The age of a newly allocated structure is zero,
        it becomes one when removed from reader visibility, and is
        incremented once per grace period subsequently -- and is freed
        after passing through (RCU_TORTURE_PIPE_LEN-2) grace periods.

        The output displayed above was taken from a correctly working
        RCU. If you want to see what it looks like when broken, break
        it yourself. ;-)

o       "Reader Batch": Another histogram of "ages" of structures seen
        by readers, but in terms of counter flips (or batches) rather
        than in terms of grace periods. The legal number of non-zero
        entries is again two. The reason for this separate view is that
        it is sometimes easier to get the third entry to show up in the
        "Reader Batch" list than in the "Reader Pipe" list.

o       "Free-Block Circulation": Shows the number of torture structures
        that have reached a given point in the pipeline. The first element
        should closely correspond to the number of structures allocated,
        the second to the number that have been removed from reader view,
        and all but the last remaining to the corresponding number of
        passes through a grace period. The last entry should be zero,
        as it is only incremented if a torture structure's counter
        somehow gets incremented farther than it should.

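The histogram rules above lend themselves to a mechanical check. The
following is a minimal sketch, assuming that the statistics lines have
already been captured as text (for example, from dmesg) and are supplied
on standard input. The function name check_rcutorture_histograms is
hypothetical. Labels are located by searching each line, so any prefix
ahead of them is tolerated.

```shell
#!/bin/sh
# Hypothetical checker: read rcutorture statistics lines on stdin and print
# one line per violated histogram rule.  The exit status is 0 only if no
# rule was violated.
check_rcutorture_histograms() {
    awk '
    # Index of the field that follows the word "label", or 0 if absent.
    function after(label,    i) {
        for (i = 1; i <= NF; i++)
            if ($i == label)
                return i + 1
        return 0
    }
    # Report every non-zero count past the first two histogram entries.
    function check_tail(name, start,    i) {
        for (i = start + 2; i <= NF; i++)
            if ($i != 0) {
                printf "%s: age %d count is %s, expected 0\n", name, i - start, $i
                bad = 1
            }
    }
    /Reader Pipe:/  { check_tail("Reader Pipe", after("Pipe:")) }
    /Reader Batch:/ { check_tail("Reader Batch", after("Batch:")) }
    /Free-Block Circulation:/ {
        # Only the final entry is required to be zero.
        if ($NF != 0) {
            printf "Free-Block Circulation: last entry is %s, expected 0\n", $NF
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }
    '
}

# Example:
#     dmesg | grep torture: | check_rcutorture_histograms
```

Fed the sample output shown earlier, this prints nothing and returns
success, since only the first two entries of each histogram are non-zero.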
Different implementations of RCU can provide implementation-specific
additional information. For example, SRCU provides the following
additional line:

        srcu-torture: per-CPU(idx=1): 0(0,1) 1(0,1) 2(0,0) 3(0,1)

This line shows the per-CPU counter state. The numbers in parentheses are
the values of the "old" and "current" counters for the corresponding CPU.
The "idx" value maps the "old" and "current" values to the underlying
array, and is useful for debugging.

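When there are many CPUs, it can be easier to read this line as a table.
The sketch below assumes the exact layout shown in the sample line above,
namely CPU(old,current) fields following the per-CPU(idx=...) label. The
function name srcu_percpu_counters is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read srcu-torture output on stdin and print one
# "cpu old current" triple per line for each per-CPU counter field.
srcu_percpu_counters() {
    awk '/per-CPU\(idx=/ {
        seen = 0
        for (i = 1; i <= NF; i++) {
            if (seen) {
                # Each field looks like CPU(old,current).
                split($i, part, /[(,)]/)
                print part[1], part[2], part[3]
            }
            # Counter fields start after the per-CPU(idx=N): label.
            if ($i ~ /^per-CPU\(idx=/)
                seen = 1
        }
    }'
}

# Example:
#     dmesg | grep srcu-torture: | srcu_percpu_counters
```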

USAGE

The following script may be used to torture RCU:

        #!/bin/sh

        modprobe rcutorture
        sleep 3600
        rmmod rcutorture
        dmesg | grep torture:

The output can be manually inspected for the error flag of "!!!".
One could of course create a more elaborate script that automatically
checked for such errors. The "rmmod" command forces a "SUCCESS",
"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first
two are self-explanatory, while the last indicates that while there
were no RCU failures, CPU-hotplug problems were detected.
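
As one possible shape for such a more elaborate script, the sketch below
reduces captured log text to a single verdict. It is an illustration under
stated assumptions rather than a definitive checker: the function name
rcutorture_verdict is hypothetical, and "RCU_HOTPLUG" is assumed to appear
in the same position on the "End of test" line as "SUCCESS" does in the
sample output.

```shell
#!/bin/sh
# Hypothetical checker: read kernel log text on stdin and summarize the
# rcutorture run.  Prints one of FAILURE, RCU_HOTPLUG, SUCCESS, or UNKNOWN.
rcutorture_verdict() {
    # Keep only the torture-test lines; tolerate there being none.
    log=$(grep 'torture:' || true)
    case $log in
    *'!!!'*|*'End of test: FAILURE'*)
        # Either the error flag or an explicit failure indication.
        echo FAILURE ;;
    *'End of test: RCU_HOTPLUG'*)
        # Assumed format, see the text above.
        echo RCU_HOTPLUG ;;
    *'End of test: SUCCESS'*)
        echo SUCCESS ;;
    *)
        echo UNKNOWN ;;
    esac
}

# Example, following the script above (needs root and the rcutorture module):
#     modprobe rcutorture; sleep 3600; rmmod rcutorture
#     dmesg | rcutorture_verdict
```

Note that the kernel log may still hold lines from earlier runs, in which
case an old failure would also be reported by this check.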