Kernel mode NEON
================

TL;DR summary
-------------
* Use only NEON instructions, or VFP instructions that don't rely on support
  code
* Isolate your NEON code in a separate compilation unit, and compile it with
  '-mfpu=neon -mfloat-abi=softfp'
* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
  NEON code
* Don't sleep in your NEON code, and be aware that it will be executed with
  preemption disabled


Introduction
------------
It is possible to use NEON instructions (and in some cases, VFP instructions) in
code that runs in kernel mode. However, for performance reasons, the NEON/VFP
register file is not preserved and restored at every context switch or taken
exception like the normal register file is, so some manual intervention is
required. Furthermore, special care is required for code that may sleep [i.e.,
may call schedule()], as NEON or VFP instructions will be executed in a
non-preemptible section for reasons outlined below.


Lazy preserve and restore
-------------------------
The NEON/VFP register file is managed using lazy preserve (on UP systems) and
lazy restore (on both SMP and UP systems). This means that the register file is
kept 'live', and is only preserved and restored when multiple tasks are
contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
another core). Lazy restore is implemented by disabling the NEON/VFP unit after
every context switch, so that the next NEON/VFP instruction to be issued traps,
allowing the kernel to step in and perform the restore if necessary.

Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
it is required to do an 'eager' preserve of the NEON/VFP register file, and
enable the NEON/VFP unit explicitly so no exceptions are generated on first
subsequent use. This is handled by the function kernel_neon_begin(), which
should be called before any kernel mode NEON or VFP instructions are issued.
Likewise, the NEON/VFP unit should be disabled again after use to make sure user
mode will hit the lazy restore trap upon next use. This is handled by the
function kernel_neon_end().
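
The bracketing described above can be sketched as follows. This is only an
illustration, not code from the kernel tree: example_xor() and
example_neon_xor() are hypothetical names, and the #else branch contains
user-space stand-ins (assumptions) that exist solely so the listing can be
compiled and exercised outside the kernel. In a real build, example_neon_xor()
would live in its own compilation unit.

```c
/* Glue code sketch: bracket a call into NEON code. */
#ifdef __KERNEL__
#include <asm/neon.h>		/* kernel_neon_begin(), kernel_neon_end() */
#include <linux/types.h>

/* Hypothetical routine, defined in a separate unit built with -mfpu=neon. */
void example_neon_xor(u8 *dst, const u8 *src, size_t len);
#else
/* User-space stand-ins (assumptions), only so the sketch builds anywhere. */
#include <stddef.h>
#include <stdint.h>
typedef uint8_t u8;

static int neon_enabled;	/* models "NEON/VFP unit enabled for kernel use" */

static void kernel_neon_begin(void) { neon_enabled = 1; }
static void kernel_neon_end(void)   { neon_enabled = 0; }

/* Plain C stand-in for the NEON routine; does nothing if called unbracketed. */
static void example_neon_xor(u8 *dst, const u8 *src, size_t len)
{
	size_t i;

	if (!neon_enabled)
		return;
	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}
#endif

/* Hypothetical entry point; must not be called from interrupt context. */
void example_xor(u8 *dst, const u8 *src, size_t len)
{
	kernel_neon_begin();	/* eager preserve, then enable the NEON/VFP unit */
	example_neon_xor(dst, src, len);
	kernel_neon_end();	/* disable again so user mode traps on next use */
}
```

Callers of example_xor() need not know that NEON is involved; the only
constraint they inherit is the calling context.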


Interruptions in kernel mode
----------------------------
For reasons of performance and simplicity, it was decided that there shall be no
preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
implies that interruptions of a kernel mode NEON section can only be allowed if
they are guaranteed not to touch the NEON/VFP registers. For this reason, the
following rules and restrictions apply in the kernel:
* NEON/VFP code is not allowed in interrupt context;
* NEON/VFP code is not allowed to sleep;
* NEON/VFP code is executed with preemption disabled.

If latency is a concern, it is possible to put back-to-back calls to
kernel_neon_end() and kernel_neon_begin() in places in your code where none of
the NEON registers are live. (Additional calls to kernel_neon_begin() should be
reasonably cheap if no context switch occurred in the meantime.)
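
One way to apply the latency advice above is to process a large buffer in
bounded chunks and reopen the NEON section between chunks. The following is a
sketch under assumptions: example_xor_chunked() and example_neon_xor() are
hypothetical names, EXAMPLE_CHUNK is an arbitrary illustrative value, and the
#else branch holds user-space stand-ins that only make the listing buildable
outside the kernel.

```c
#ifdef __KERNEL__
#include <asm/neon.h>		/* kernel_neon_begin(), kernel_neon_end() */
#include <linux/types.h>

/* Hypothetical routine, defined in a separate unit built with -mfpu=neon. */
void example_neon_xor(u8 *dst, const u8 *src, size_t len);
#else
/* User-space stand-ins (assumptions), only so the sketch builds anywhere. */
#include <stddef.h>
#include <stdint.h>
typedef uint8_t u8;

static unsigned int begin_calls, end_calls;

static void kernel_neon_begin(void) { begin_calls++; }
static void kernel_neon_end(void)   { end_calls++; }

static void example_neon_xor(u8 *dst, const u8 *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}
#endif

#define EXAMPLE_CHUNK 4096	/* arbitrary bound on work per NEON section */

void example_xor_chunked(u8 *dst, const u8 *src, size_t len)
{
	if (!len)
		return;

	kernel_neon_begin();
	for (;;) {
		size_t n = len < EXAMPLE_CHUNK ? len : EXAMPLE_CHUNK;

		example_neon_xor(dst, src, n);
		dst += n;
		src += n;
		len -= n;
		if (!len)
			break;

		/* No NEON register is live here, so preemption may happen. */
		kernel_neon_end();
		kernel_neon_begin();
	}
	kernel_neon_end();
}
```

This only works if the NEON routine keeps no state in NEON registers across
calls; anything that must survive a chunk boundary has to be written back to
memory first.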


VFP and support code
--------------------
Earlier versions of VFP (prior to version 3) rely on software support for things
such as IEEE-754 compliant underflow handling. When the VFP unit needs such
software assistance, it signals the kernel by raising an undefined instruction
exception. The kernel responds by inspecting the VFP control registers and the
current instruction and arguments, and emulates the instruction in software.

Such software assistance is currently not implemented for VFP instructions
executed in kernel mode. If such a condition is encountered, the kernel will
fail and generate an OOPS.


Separating NEON code from ordinary code
---------------------------------------
The compiler is not aware of the special significance of kernel_neon_begin() and
kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
between calls to these respective functions. Furthermore, GCC may generate NEON
instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
kernel is currently compiled at -O2, future changes may result in NEON/VFP
instructions appearing in unexpected places if no special care is taken.

Therefore, the recommended and only supported way of using NEON/VFP in the
kernel is by adhering to the following rules:
* isolate the NEON code in a separate compilation unit and compile it with
  '-mfpu=neon -mfloat-abi=softfp';
* issue the calls to kernel_neon_begin() and kernel_neon_end(), as well as the
  calls into the unit containing the NEON code, from a compilation unit which
  is *not* built with the GCC flag '-mfpu=neon' set.

As the kernel is compiled with '-msoft-float', the above will guarantee that
both NEON and VFP instructions will only ever appear in designated compilation
units at any optimization level.


NEON assembler
--------------
NEON assembler is supported with no additional caveats as long as the rules
above are followed.


NEON code generated by GCC
--------------------------
The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
parallelism, and generates NEON code from ordinary C source code. This is fully
supported as long as the rules above are followed.
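
A NEON compilation unit that relies on the vectorizer could look like the
sketch below. It is an assumption-laden illustration: the function name
example_neon_xor_words() is hypothetical, the unit is assumed to be built with
'-mfpu=neon -mfloat-abi=softfp -ftree-vectorize' (for instance through a
per-object CFLAGS line in the Makefile), and whether GCC actually emits NEON
instructions for the loop depends on the compiler version and options. The
preprocessor check merely catches a kernel build of this unit without NEON
enabled.

```c
/* NEON compilation unit sketch: plain C, left to GCC's vectorizer. */
#if defined(__KERNEL__) && !defined(__ARM_NEON__)
#error "build this unit with '-mfpu=neon -mfloat-abi=softfp'"
#endif

#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stddef.h>		/* size_t, for builds outside the kernel */
#endif

/*
 * Hypothetical routine: dst[i] ^= src[i] for nwords machine words.
 * The __restrict qualifiers tell the compiler the buffers do not overlap,
 * which makes the loop a straightforward candidate for vectorization.
 * Callers must bracket the call with kernel_neon_begin()/kernel_neon_end().
 */
void example_neon_xor_words(unsigned long *__restrict dst,
			    const unsigned long *__restrict src,
			    size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		dst[i] ^= src[i];
}
```

Because the source is ordinary C, the same file also builds and behaves
identically without NEON, which makes it easy to test outside the kernel.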


NEON intrinsics
---------------
NEON intrinsics are also supported. However, as code using NEON intrinsics
relies on the GCC header <arm_neon.h> (which #includes <stdint.h>), you should
observe the following in addition to the rules above:
* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
  does not supply);
* Include <arm_neon.h> last, or at least after <linux/types.h>.
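
As an illustration of the include order and of a small intrinsics routine,
consider the sketch below. The function name example_neon_add_u32() and the
EXAMPLE_HAVE_NEON macro are hypothetical; the scalar fallback and the
non-kernel include branch are there only so the listing also builds on
machines without NEON. In a kernel build the unit is assumed to be compiled
with '-ffreestanding -mfpu=neon -mfloat-abi=softfp'.

```c
#ifdef __KERNEL__
#include <linux/types.h>	/* kernel types first */
#else
#include <stddef.h>
#include <stdint.h>
#endif

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>		/* last, after <linux/types.h> */
#define EXAMPLE_HAVE_NEON 1
#endif

/*
 * Hypothetical routine: dst[i] += src[i] for n 32-bit elements.
 * Callers must bracket the call with kernel_neon_begin()/kernel_neon_end().
 */
void example_neon_add_u32(uint32_t *dst, const uint32_t *src, size_t n)
{
	size_t i = 0;

#ifdef EXAMPLE_HAVE_NEON
	/* Four 32-bit lanes per iteration. */
	for (; i + 4 <= n; i += 4) {
		uint32x4_t a = vld1q_u32(dst + i);
		uint32x4_t b = vld1q_u32(src + i);

		vst1q_u32(dst + i, vaddq_u32(a, b));
	}
#endif
	/* Scalar tail (and the whole loop when built without NEON). */
	for (; i < n; i++)
		dst[i] += src[i];
}
```

Integer intrinsics such as these avoid the VFP support code issue entirely,
since no floating point operation is involved.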