Booting AArch64 Linux
=====================

Author: Will Deacon <will.deacon@arm.com>
Date : 07 September 2012

This document is based on the ARM booting document by Russell King and
is relevant to all public releases of the AArch64 Linux kernel.

The AArch64 exception model is made up of a number of exception levels
(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
counterpart. EL2 is the hypervisor level and exists only in non-secure
mode. EL3 is the highest priority level and exists only in secure mode.

For the purposes of this document, we will use the term `boot loader'
simply to define all software that executes on the CPU(s) before control
is passed to the Linux kernel. This may include secure monitor and
hypervisor code, or it may just be a handful of instructions for
preparing a minimal boot environment.

Essentially, the boot loader should provide (as a minimum) the
following:

1. Set up and initialise the RAM
2. Set up the device tree
3. Decompress the kernel image
4. Call the kernel image


1. Set up and initialise RAM
----------------------------

Requirement: MANDATORY

The boot loader is expected to find and initialise all RAM that the
kernel will use for volatile data storage in the system. It performs
this in a machine dependent manner. (It may use internal algorithms
to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)


2. Set up the device tree
-------------------------

Requirement: MANDATORY

The device tree blob (dtb) must be placed on an 8-byte boundary within
the first 512 megabytes from the start of the kernel image and must not
cross a 2-megabyte boundary. This is to allow the kernel to map the
blob using a single section mapping in the initial page tables.
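
The placement rules above can be checked mechanically before the kernel
is entered. The following is a minimal sketch, not taken from any real
boot loader: the function name dtb_placement_ok() and its parameters are
hypothetical, and it assumes that "within the first 512 megabytes" means
the whole blob lies at or above the start of the kernel image and ends
no more than 512MB past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M   (UINT64_C(2) << 20)
#define SZ_512M (UINT64_C(512) << 20)

/*
 * Hypothetical helper: returns true if a dtb of dtb_size bytes at
 * physical address dtb_addr satisfies the placement rules, given the
 * physical address at which the kernel image starts.
 */
static bool dtb_placement_ok(uint64_t image_start, uint64_t dtb_addr,
                             uint64_t dtb_size)
{
    /* A blob larger than 2MB cannot avoid crossing a 2MB boundary. */
    if (dtb_size == 0 || dtb_size > SZ_2M)
        return false;

    /* The blob must start on an 8-byte boundary. */
    if (dtb_addr & 7)
        return false;

    /* Assumption: the blob sits at or after the start of the image. */
    if (dtb_addr < image_start)
        return false;

    /* The whole blob must end within 512MB of the image start. */
    if (dtb_addr - image_start > SZ_512M - dtb_size)
        return false;

    /* First and last byte must fall in the same 2MB-aligned block. */
    return (dtb_addr / SZ_2M) == ((dtb_addr + dtb_size - 1) / SZ_2M);
}
```

A boot loader that picks the dtb address itself may find it simpler to
place the blob at the start of a free 2MB-aligned block, which satisfies
the alignment and boundary rules by construction.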


3. Decompress the kernel image
------------------------------

Requirement: OPTIONAL

The AArch64 kernel does not currently provide a decompressor and
therefore requires decompression (gzip etc.) to be performed by the boot
loader if a compressed Image target (e.g. Image.gz) is used. For
bootloaders that do not implement this requirement, the uncompressed
Image target is available instead.
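
A boot loader that accepts both Image and Image.gz needs some way to
tell them apart. One possible approach, sketched below, is to look for
the two gzip identification bytes defined by RFC 1952. The helper name
payload_is_gzip() is hypothetical, and other compression formats would
need their own checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: a gzip member starts with the identification
 * bytes 0x1f 0x8b (RFC 1952). Anything else is treated here as a
 * candidate uncompressed Image.
 */
static bool payload_is_gzip(const uint8_t *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}
```

Since the first bytes of an uncompressed Image are executable code, a
cautious loader might first test for the Image header magic described in
section 4 and only then fall back to this check.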


4. Call the kernel image
------------------------

Requirement: MANDATORY

The decompressed kernel image contains a 64-byte header as follows:

  u32 code0;              /* Executable code */
  u32 code1;              /* Executable code */
  u64 text_offset;        /* Image load offset, little endian */
  u64 image_size;         /* Effective Image size, little endian */
  u64 flags;              /* kernel flags, little endian */
  u64 res2 = 0;           /* reserved */
  u64 res3 = 0;           /* reserved */
  u64 res4 = 0;           /* reserved */
  u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */
  u32 res5;               /* reserved (used for PE COFF offset) */


Header notes:

- As of v3.17, all fields are little endian unless stated otherwise.

- code0/code1 are responsible for branching to stext.

- When booting through EFI, code0/code1 are initially skipped.
  res5 is an offset to the PE header and the PE header has the EFI
  entry point (efi_stub_entry). When the stub has done its work, it
  jumps to code0 to resume the normal boot process.

- Prior to v3.17, the endianness of text_offset was not specified. In
  these cases image_size is zero and text_offset is 0x80000 in the
  endianness of the kernel. Where image_size is non-zero, image_size is
  little-endian and must be respected. Where image_size is zero,
  text_offset can be assumed to be 0x80000.

- The flags field (introduced in v3.17) is a little-endian 64-bit field
  composed as follows:
  Bit 0:     Kernel endianness. 1 if BE, 0 if LE.
  Bits 1-63: Reserved.

- When image_size is zero, a bootloader should attempt to keep as much
  memory as possible free for use by the kernel immediately after the
  end of the kernel image. The amount of space required will vary
  depending on selected features, and is effectively unbounded.
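
The header notes above translate into a small amount of decoding logic.
The sketch below reads the fields byte by byte so that it behaves the
same on little-endian and big-endian hosts. The struct
arm64_image_info and the function arm64_parse_header() are hypothetical
names chosen for illustration and are not kernel interfaces. The byte
offsets follow from the field sizes in the header listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARM64_IMAGE_HEADER_SIZE   64u
#define ARM64_IMAGE_MAGIC         0x644d5241u      /* "ARM\x64" */
#define ARM64_DEFAULT_TEXT_OFFSET UINT64_C(0x80000)

/* Hypothetical result type, not a kernel structure. */
struct arm64_image_info {
    uint64_t text_offset;   /* offset to apply from the 2MB-aligned base */
    uint64_t image_size;    /* 0 means a pre-v3.17 header */
    bool     endian_known;  /* false when the flags field predates v3.17 */
    bool     big_endian;    /* valid only if endian_known is true */
};

/* Decode little-endian values one byte at a time (host independent). */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

static bool arm64_parse_header(const uint8_t *hdr, size_t len,
                               struct arm64_image_info *out)
{
    assert(hdr != NULL && out != NULL);

    /* magic lives at byte offset 56 */
    if (len < ARM64_IMAGE_HEADER_SIZE ||
        get_le32(hdr + 56) != ARM64_IMAGE_MAGIC)
        return false;

    out->image_size = get_le64(hdr + 16);
    if (out->image_size == 0) {
        /* Pre-v3.17: text_offset endianness unspecified, assume 0x80000. */
        out->text_offset = ARM64_DEFAULT_TEXT_OFFSET;
        out->endian_known = false;
        out->big_endian = false;
    } else {
        out->text_offset = get_le64(hdr + 8);
        out->endian_known = true;
        out->big_endian = (get_le64(hdr + 24) & 1) != 0; /* flags bit 0 */
    }
    return true;
}
```

When image_size comes back as zero, the caller has no size to reserve
and should fall back to keeping as much memory as possible free after
the image, as described in the last note.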

The Image must be placed text_offset bytes from a 2MB aligned base
address near the start of usable system RAM and called there. Memory
below that base address is currently unusable by Linux, and therefore it
is strongly recommended that this location is the start of system RAM.
At least image_size bytes from the start of the image must be free for
use by the kernel.
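
As a worked example of the placement rule, the sketch below derives a
load address from the start of RAM and checks that image_size bytes are
available. The function names and the convention that ram_end is the
first address past usable RAM are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)

/*
 * Hypothetical helper: round the start of usable RAM up to a 2MB
 * boundary and add text_offset to obtain the Image load address.
 */
static uint64_t image_load_addr(uint64_t ram_start, uint64_t text_offset)
{
    uint64_t base = (ram_start + SZ_2M - 1) & ~(SZ_2M - 1);

    return base + text_offset;
}

/*
 * Hypothetical helper: true if image_size bytes starting at load_addr
 * fit below ram_end (exclusive).
 */
static bool image_fits(uint64_t load_addr, uint64_t image_size,
                       uint64_t ram_end)
{
    return load_addr <= ram_end && image_size <= ram_end - load_addr;
}
```

With RAM starting at 0x80000000 and a text_offset of 0x80000, this gives
a load address of 0x80080000. If RAM started at 0x80100000 instead, the
base would round up to 0x80200000 and the memory below it would be
unusable by Linux, which is why starting RAM on a 2MB boundary is
preferable.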

Any memory described to the kernel (even that below the 2MB aligned base
address) which is not marked as reserved from the kernel (e.g. with a
memreserve region in the device tree) will be considered as available to
the kernel.

Before jumping into the kernel, the following conditions must be met:

- Quiesce all DMA capable devices so that memory does not get
  corrupted by bogus network packets or disk data. This will save
  you many hours of debug.

- Primary CPU general-purpose register settings
  x0 = physical address of device tree blob (dtb) in system RAM.
  x1 = 0 (reserved for future use)
  x2 = 0 (reserved for future use)
  x3 = 0 (reserved for future use)

- CPU mode
  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
  IRQ and FIQ).
  The CPU must be in either EL2 (RECOMMENDED in order to have access to
  the virtualisation extensions) or non-secure EL1.

- Caches, MMUs
  The MMU must be off.
  Instruction cache may be on or off.
  The address range corresponding to the loaded kernel image must be
  cleaned to the PoC. In the presence of a system cache or other
  coherent masters with caches enabled, this will typically require
  cache maintenance by VA rather than set/way operations.
  System caches which respect the architected cache maintenance by VA
  operations must be configured and may be enabled.
  System caches which do not respect architected cache maintenance by VA
  operations (not recommended) must be configured and disabled.

- Architected timers
  CNTFRQ must be programmed with the timer frequency and CNTVOFF must
  be programmed with a consistent value on all CPUs. If entering the
  kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
  available.

- Coherency
  All CPUs to be booted by the kernel must be part of the same coherency
  domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
  initialisation to enable the receiving of maintenance operations on
  each CPU.

- System registers
  All writable architected system registers at the exception level where
  the kernel image will be entered must be initialised by software at a
  higher exception level to prevent execution in an UNKNOWN state.

  For systems with a GICv3 interrupt controller:
  - If EL3 is present:
    ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
    ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
  - If the kernel is entered at EL1:
    ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1.
    ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level.

The boot loader is expected to enter the kernel on each CPU in the
following manner:

- The primary CPU must jump directly to the first instruction of the
  kernel image. The device tree blob passed by this CPU must contain
  an 'enable-method' property for each cpu node. The supported
  enable-methods are described below.

  It is expected that the bootloader will generate these device tree
  properties and insert them into the blob prior to kernel entry.

- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr'
  property in their cpu node. This property identifies a
  naturally-aligned 64-bit zero-initialised memory location.

  These CPUs should spin outside of the kernel in a reserved area of
  memory (communicated to the kernel by a /memreserve/ region in the
  device tree) polling their cpu-release-addr location, which must be
  contained in the reserved region. A wfe instruction may be inserted
  to reduce the overhead of the busy-loop and a sev will be issued by
  the primary CPU. When a read of the location pointed to by the
  cpu-release-addr returns a non-zero value, the CPU must jump to this
  value. The value will be written as a single 64-bit little-endian
  value, so CPUs must convert the read value to their native endianness
  before jumping to it.
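
The spin-table protocol above can be sketched in C as follows. This is
an illustration under stated assumptions rather than real boot loader
code: the function names are hypothetical, the wfe instruction is only
emitted when compiling for AArch64, and the final jump to the returned
address is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert a 64-bit value that was stored in memory as little-endian
 * into the host's native representation. Works on either endianness
 * by decoding the in-memory bytes explicitly.
 */
static uint64_t le64_to_native(uint64_t raw)
{
    uint8_t b[8];
    uint64_t v = 0;
    int i;

    memcpy(b, &raw, sizeof b);
    for (i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Wait for an event on AArch64; a no-op elsewhere (plain busy loop). */
static void cpu_relax_wfe(void)
{
#if defined(__aarch64__)
    __asm__ volatile("wfe" ::: "memory");
#endif
}

/*
 * Hypothetical secondary-CPU loop: poll the cpu-release-addr location
 * with single 64-bit reads until it becomes non-zero, then return the
 * entry address in native endianness.
 */
static uint64_t spin_table_wait(const volatile uint64_t *cpu_release_addr)
{
    uint64_t raw;

    assert(cpu_release_addr != NULL);
    while ((raw = *cpu_release_addr) == 0)
        cpu_relax_wfe();
    return le64_to_native(raw);
}
```

The zero test is done on the raw value because zero is zero in either
byte order, so the conversion is only needed once a release address has
actually been written.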

- CPUs with a "psci" enable-method should remain outside of
  the kernel (i.e. outside of the regions of memory described to the
  kernel in the memory node, or in a reserved area of memory described
  to the kernel by a /memreserve/ region in the device tree). The
  kernel will issue CPU_ON calls as described in ARM document number ARM
  DEN 0022A ("Power State Coordination Interface System Software on ARM
  processors") to bring CPUs into the kernel.

  The device tree should contain a 'psci' node, as described in
  Documentation/devicetree/bindings/arm/psci.txt.

- Secondary CPU general-purpose register settings
  x0 = 0 (reserved for future use)
  x1 = 0 (reserved for future use)
  x2 = 0 (reserved for future use)
  x3 = 0 (reserved for future use)