Using RCU to Protect Dynamic NMI Handlers


Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers. This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
work in "arch/x86/oprofile/nmi_timer_int.c" and in
"arch/x86/kernel/traps.c".

The relevant pieces of code are listed below, each followed by a
brief explanation.

	static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
	{
		return 0;
	}

The dummy_nmi_callback() function is a "dummy" NMI handler that does
nothing, but returns zero, thus saying that it did nothing, allowing
the NMI handler to take the default machine-specific action.

	static nmi_callback_t nmi_callback = dummy_nmi_callback;

This nmi_callback variable is a global function pointer to the current
NMI handler.

	void do_nmi(struct pt_regs *regs, long error_code)
	{
		int cpu;

		nmi_enter();

		cpu = smp_processor_id();
		++nmi_count(cpu);

		if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
			default_do_nmi(regs);

		nmi_exit();
	}

The do_nmi() function processes each NMI. It first calls nmi_enter(),
which disables preemption in the same way that a hardware irq would,
then increments the per-CPU count of NMIs. It then invokes the NMI
handler stored in the nmi_callback function pointer. If this handler
returns zero, do_nmi() invokes the default_do_nmi() function to handle
a machine-specific NMI. Finally, nmi_exit() restores preemption.
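
The dispatch pattern in do_nmi() can be tried out in userspace. The
following is only a sketch, not kernel code: it assumes C11 atomics in
place of rcu_dereference_sched(), using memory_order_acquire as a
conservative stand-in. The names event_callback_t, current_callback,
claiming_callback(), dispatch_event() and default_calls are hypothetical
and do not appear in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for nmi_callback_t (the kernel type also
 * takes a struct pt_regs pointer). */
typedef int (*event_callback_t)(int cpu);

/* Counts how often the fallback path ran; stands in for the work
 * that default_do_nmi() would do. */
int default_calls;

/* Like dummy_nmi_callback(): does nothing and reports "not handled". */
int dummy_callback(int cpu)
{
	(void)cpu;
	return 0;
}

/* Hypothetical handler that claims every event. */
int claiming_callback(int cpu)
{
	(void)cpu;
	return 1;
}

/* Always points at some handler, so the dispatcher needs no NULL test. */
_Atomic(event_callback_t) current_callback = dummy_callback;

/* Analog of do_nmi(): fetch the pointer once, call through it, and
 * take the default action only if the handler returned zero. */
void dispatch_event(int cpu)
{
	event_callback_t cb;

	cb = atomic_load_explicit(&current_callback, memory_order_acquire);
	assert(cb != NULL);	/* documents the invariant above */
	if (!cb(cpu))
		default_calls++;
}
```

With the dummy handler installed, each call to dispatch_event() takes
the default path. Once a handler that returns nonzero is stored into
current_callback, the default path is skipped.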

Strictly speaking, rcu_dereference_sched() is not needed here, since
this code runs only on i386, which does not require it. However, in
practice it is a good documentation aid, particularly for anyone
attempting to do something similar on Alpha or on systems with
aggressively optimizing compilers.

Quick Quiz: Why might the rcu_dereference_sched() be necessary on Alpha,
            given that the code referenced by the pointer is read-only?


Back to the discussion of NMI and RCU...

	void set_nmi_callback(nmi_callback_t callback)
	{
		rcu_assign_pointer(nmi_callback, callback);
	}

The set_nmi_callback() function registers an NMI handler. Note that any
data that is to be used by the callback must be initialized -before-
the call to set_nmi_callback(). On architectures that do not order
writes, the rcu_assign_pointer() ensures that the NMI handler sees the
initialized values.
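
The initialize-then-publish ordering can be sketched in userspace as
follows. This is an analogy under stated assumptions rather than kernel
code: a C11 release store plays the role of rcu_assign_pointer(), and
an acquire load plays the role of rcu_dereference_sched(). The names
struct handler_data, my_data, threshold_callback(), set_callback(),
register_threshold_handler() and dispatch_event() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for nmi_callback_t. */
typedef int (*event_callback_t)(int cpu);

/* Hypothetical handler-private data; must be valid before the
 * handler becomes visible to dispatchers. */
struct handler_data {
	int threshold;
};

struct handler_data my_data;

int dummy_callback(int cpu)
{
	(void)cpu;
	return 0;
}

/* Handler that depends on my_data having been initialized. */
int threshold_callback(int cpu)
{
	return cpu >= my_data.threshold;
}

_Atomic(event_callback_t) current_callback = dummy_callback;

/* Analog of set_nmi_callback(): the release store keeps all earlier
 * writes ordered before the pointer becomes visible. */
void set_callback(event_callback_t cb)
{
	assert(cb != NULL);
	atomic_store_explicit(&current_callback, cb, memory_order_release);
}

/* Step 1: initialize the data.  Step 2: publish the handler. */
void register_threshold_handler(int threshold)
{
	my_data.threshold = threshold;
	set_callback(threshold_callback);
}

/* Dispatcher: the acquire load pairs with the release store above,
 * so a caller that sees the new pointer also sees the new data. */
int dispatch_event(int cpu)
{
	event_callback_t cb;

	cb = atomic_load_explicit(&current_callback, memory_order_acquire);
	return cb(cpu);
}
```

Swapping the two statements in register_threshold_handler() would
recreate the bug described above: a dispatcher could run
threshold_callback() against a threshold that has not yet been set.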

	void unset_nmi_callback(void)
	{
		rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
	}

This function unregisters an NMI handler, restoring the original
dummy_nmi_callback(). However, there may well be an NMI handler
currently executing on some other CPU. We therefore cannot free
up any data structures used by the old NMI handler until execution
of it completes on all other CPUs.

One way to accomplish this is via synchronize_sched(), perhaps as
follows:

	unset_nmi_callback();
	synchronize_sched();
	kfree(my_nmi_data);

This works because synchronize_sched() blocks until all CPUs complete
any preemption-disabled segments of code that they were executing.
Since NMI handlers disable preemption, synchronize_sched() is guaranteed
not to return until all ongoing NMI handlers exit. It is therefore safe
to free up the handler's data as soon as synchronize_sched() returns.
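
The unregister, wait, then free sequence can also be imitated in
userspace. The sketch below is a deliberately crude analogy and not how
synchronize_sched() is implemented: it assumes a single sequentially
consistent in-flight counter in place of a real grace period, and
unlike a real grace period its wait loop can be starved by a steady
stream of new dispatchers. All names in it (in_flight, my_data,
counting_callback(), register_counting_handler(),
unregister_counting_handler(), dispatch_event()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for nmi_callback_t. */
typedef int (*event_callback_t)(int cpu);

/* Hypothetical handler data, allocated when the handler is registered. */
struct handler_data {
	atomic_int hits;
};

struct handler_data *my_data;	/* dereferenced only by counting_callback() */
atomic_int in_flight;		/* dispatchers currently inside a handler */

int dummy_callback(int cpu)
{
	(void)cpu;
	return 0;
}

int counting_callback(int cpu)
{
	(void)cpu;
	atomic_fetch_add(&my_data->hits, 1);
	return 1;
}

_Atomic(event_callback_t) current_callback = dummy_callback;

int dispatch_event(int cpu)
{
	event_callback_t cb;
	int handled;

	atomic_fetch_add(&in_flight, 1);	/* rough analog of nmi_enter() */
	cb = atomic_load(&current_callback);
	assert(cb != NULL);
	handled = cb(cpu);
	atomic_fetch_sub(&in_flight, 1);	/* rough analog of nmi_exit() */
	return handled;
}

int register_counting_handler(void)
{
	my_data = malloc(sizeof(*my_data));
	if (!my_data)
		return -1;
	atomic_init(&my_data->hits, 0);		/* initialize before publishing */
	atomic_store(&current_callback, counting_callback);
	return 0;
}

void unregister_counting_handler(void)
{
	/* Analog of unset_nmi_callback(). */
	atomic_store(&current_callback, dummy_callback);
	/* Crude stand-in for synchronize_sched(): wait for every
	 * dispatcher that might still see the old handler to leave. */
	while (atomic_load(&in_flight) != 0)
		;
	/* Analog of kfree(my_nmi_data). */
	free(my_data);
	my_data = NULL;
}
```

Because the dispatcher increments in_flight before it loads the
pointer, any dispatcher that the wait loop does not see is guaranteed
to load dummy_callback() rather than the handler being removed.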

Important note: for this to work, the architecture in question must
invoke nmi_enter() and nmi_exit() on NMI entry and exit, respectively.


Answer to Quick Quiz

	Why might the rcu_dereference_sched() be necessary on Alpha, given
	that the code referenced by the pointer is read-only?

	Answer: The caller of set_nmi_callback() might well have
		initialized some data that is to be used by the new NMI
		handler. In this case, the rcu_dereference_sched() would
		be needed, because otherwise a CPU that received an NMI
		just after the new handler was set might see the pointer
		to the new NMI handler, but the old, not-yet-initialized
		version of the handler's data.

		This same sad story can happen on other CPUs when using
		a compiler with aggressive pointer-value speculation
		optimizations.

		More importantly, the rcu_dereference_sched() makes it
		clear to someone reading the code that the pointer is
		being protected by RCU-sched.