Memory management for CRIS/MMU
------------------------------
HISTORY:

$Log: README.mm,v $
Revision 1.1 2001/12/17 13:59:27 bjornw
Initial revision

Revision 1.1 2000/07/10 16:25:21 bjornw
Initial revision

Revision 1.4 2000/01/17 02:31:59 bjornw
Added discussion of paging and VM.

Revision 1.3 1999/12/03 16:43:23 hp
Blurb about that the 3.5G-limitation is not a MMU limitation

Revision 1.2 1999/12/03 16:04:21 hp
Picky comment about not mapping the first page

Revision 1.1 1999/12/03 15:41:30 bjornw
First version of CRIS/MMU memory layout specification.

------------------------------

See the ETRAX-NG HSDD for reference.

We use a page size of 8 kbytes, as opposed to the i386 page size of 4 kbytes.

The MMU can, apart from the normal mapping of pages, also do a top-level
segmentation of the kernel memory space. We use this feature to avoid having
to use page tables to map the physical memory into the kernel's address
space. We also use it to keep the user-mode virtual mapping in the same
map during kernel mode, so that the kernel can easily access the corresponding
user-mode process's data.

As a comparison, Linux/i386 2.0 puts the kernel and physical RAM at
address 0, overlapping with the user-mode virtual space, so that descriptor
registers are needed for each memory access to specify which MMU space to
map through. That changed in 2.2, putting the kernel/physical RAM at
0xc0000000, to co-exist with the user-mode mapping. We will do something
quite similar, but with the additional complexity of having to map the
internal chip I/O registers and the flash memory area (including SRAM
and peripheral chip-selects).
The kernel-mode segmentation map:

        ------------------------                ------------------------
FFFFFFFF|                      | => cached      |                      |
        |    kernel seg_f      |    flash       |                      |
F0000000|______________________|                |                      |
EFFFFFFF|                      | => uncached    |                      |
        |    kernel seg_e      |    flash       |                      |
E0000000|______________________|                |        DRAM          |
DFFFFFFF|                      |  paged to any  |      Un-cached       |
        |    kernel seg_d      |    =======>    |                      |
D0000000|______________________|                |                      |
CFFFFFFF|                      |                |                      |
        |    kernel seg_c      |==\             |                      |
C0000000|______________________|   \            |______________________|
BFFFFFFF|                      |  uncached      |                      |
        |    kernel seg_b      |=====\=========>|      Registers       |
B0000000|______________________|      \c        |______________________|
AFFFFFFF|                      |       \a       |                      |
        |                      |        \c      | FLASH/SRAM/Peripheral|
        |                      |         \h     |______________________|
        |                      |          \e    |                      |
        |                      |           \d   |                      |
        | kernel seg_0 - seg_a |            \==>|         DRAM         |
        |                      |                |        Cached        |
        |                      |  paged to any  |                      |
        |                      |    =======>    |______________________|
        |                      |                |                      |
        |                      |                |        Illegal       |
        |                      |                |______________________|
        |                      |                |                      |
        |                      |                | FLASH/SRAM/Peripheral|
00000000|______________________|                |______________________|

In user mode it looks the same, except that only the space 0-AFFFFFFF is
available. Therefore, in this model, the virtual address space per process
is limited to 0xb0000000 bytes (minus 8192 bytes, since the first page,
0..8191, is never mapped, in order to trap NULL references).

It also means that the total physical RAM that can be mapped is 256 MB,
the size of one segment (kseg_c above). More RAM can be mapped by choosing
a different segmentation and shrinking the user-mode memory space.
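The two limits above follow directly from the segment size. As a minimal
sketch of that arithmetic (the macro and function names are ours, chosen for
illustration, and are not taken from the kernel sources):

```c
#include <stdint.h>

#define PAGE_SIZE      8192u            /* CRIS page size, from the text */
#define SEG_SHIFT      28               /* 16 segments of 2^28 bytes each */
#define SEG_SIZE       (1u << SEG_SHIFT)
#define USER_SEGMENTS  11u              /* seg_0 .. seg_a */

/* Bytes of virtual address space a process can use: all user segments,
 * minus the never-mapped first page that traps NULL references. */
static uint32_t user_space_bytes(void)
{
        return USER_SEGMENTS * SEG_SIZE - PAGE_SIZE;
}

/* Megabytes of physical RAM reachable through n linearly mapped segments. */
static uint32_t linear_ram_mbytes(unsigned n)
{
        return (n * SEG_SIZE) >> 20;
}
```

With this layout, user_space_bytes() is 0xafffe000 and linear_ram_mbytes(1)
is 256.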

The MMU can map all 4 GB in user mode, but doing that would mean that a
few extra instructions would be needed for each access to user-mode
memory.

The kernel needs access to both cached and uncached flash. Uncached access is
necessary because of the special write/erase sequences. Also, the
peripheral chip-selects are decoded from that region.

The kernel also needs its own virtual memory space. That is kseg_d. It
is used by the vmalloc() kernel function to allocate virtually contiguous
chunks of memory that the normal kmalloc physical RAM allocator cannot
provide.

The setting of the actual MMU control registers to use this layout would
be something like this:

R_MMU_KSEG = ( ( seg_f, seg  ) |   // Flash cached
               ( seg_e, seg  ) |   // Flash uncached
               ( seg_d, page ) |   // kernel vmalloc area
               ( seg_c, seg  ) |   // kernel linear segment
               ( seg_b, seg  ) |   // uncached on-chip registers
               ( seg_a, page ) |
               ( seg_9, page ) |
               ( seg_8, page ) |
               ( seg_7, page ) |
               ( seg_6, page ) |
               ( seg_5, page ) |
               ( seg_4, page ) |
               ( seg_3, page ) |
               ( seg_2, page ) |
               ( seg_1, page ) |
               ( seg_0, page ) );

R_MMU_KBASE_HI = ( ( base_f, 0x0 ) |   // flash/sram/periph cached
                   ( base_e, 0x8 ) |   // flash/sram/periph uncached
                   ( base_d, 0x0 ) |   // don't care
                   ( base_c, 0x4 ) |   // physical RAM cached area
                   ( base_b, 0xb ) |   // uncached on-chip registers
                   ( base_a, 0x0 ) |   // don't care
                   ( base_9, 0x0 ) |   // don't care
                   ( base_8, 0x0 ) );  // don't care

R_MMU_KBASE_LO = ( ( base_7, 0x0 ) |   // don't care
                   ( base_6, 0x0 ) |   // don't care
                   ( base_5, 0x0 ) |   // don't care
                   ( base_4, 0x0 ) |   // don't care
                   ( base_3, 0x0 ) |   // don't care
                   ( base_2, 0x0 ) |   // don't care
                   ( base_1, 0x0 ) |   // don't care
                   ( base_0, 0x0 ) );  // don't care

NOTE: while setting up the MMU, we run in a non-mapped mode in the DRAM (0x40
segment) and need to set up seg_4 as a unity mapping, so that we don't get
a fault before we have had time to jump into the real kernel segment (0xc0). This
is done temporarily in head.S, and fixed by the kernel later in paging_init.
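To make the effect of the segment registers concrete, here is a small
user-space model of the lookup. It assumes that for a segment marked "seg"
the 4-bit base value simply replaces the top four bits of the virtual
address; that reading matches the 0xc0 -> 0x40 example in the note above,
but the HSDD should be checked for the exact hardware behaviour. The names
kbase, kseg and kseg_translate are hypothetical, and the temporary seg_4
unity mapping is not modelled.

```c
#include <stdint.h>

#define SEG_SHIFT  28
#define SEG_MASK   0x0fffffffu

/* Base nibbles for seg_0..seg_f, copied from the R_MMU_KBASE_* values
 * in the listing; unlisted entries are the "don't care" zeros. */
static const uint8_t kbase[16] = {
        [0xb] = 0xb,    /* uncached on-chip registers */
        [0xc] = 0x4,    /* cached physical RAM */
        [0xe] = 0x8,    /* flash/sram/periph uncached */
        [0xf] = 0x0,    /* flash/sram/periph cached */
};

/* Bit n set: seg_n is linearly mapped ("seg"). Clear: it is paged. */
static const uint16_t kseg =
        (1u << 0xb) | (1u << 0xc) | (1u << 0xe) | (1u << 0xf);

/* Returns 1 and stores the physical address if virt lies in a linearly
 * mapped segment; returns 0 if it must go through the page tables. */
static int kseg_translate(uint32_t virt, uint32_t *phys)
{
        unsigned seg = virt >> SEG_SHIFT;

        if (!(kseg & (1u << seg)))
                return 0;
        *phys = ((uint32_t)kbase[seg] << SEG_SHIFT) | (virt & SEG_MASK);
        return 1;
}
```

Under these assumptions a kernel address such as 0xc0001000 maps to physical
0x40001000, while an address in kseg_d (the vmalloc area) is reported as
paged.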


Paging - PTEs, PMDs and PGDs
----------------------------

[ References: asm/pgtable.h, asm/page.h, asm/mmu.h ]

The paging mechanism uses virtual addresses to split a process's memory space into
pages, a page being the smallest unit that can be freely remapped in memory. On
Linux/CRIS, a page is 8192 bytes (for technical reasons not equal to 4096 as in
most other 32-bit architectures). It would be inefficient to let a virtual memory
mapping be controlled by one long table of page mappings, so it is broken down into
a 2-level structure with a Page Directory containing pointers to Page Tables, which
each map up to 2048 pages (8192 / sizeof(void *)). Linux can actually
handle 3-level structures as well, with a Page Middle Directory in between, but
in many cases this is folded into a two-level structure by excluding the Middle
Directory.
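The sizes involved can be checked with a few lines of C. This is only a
sketch of the arithmetic: the macro names mirror the usual kernel ones, but
the definitions here are written for a host build (a 32-bit pointer is
modelled with uint32_t), and the two helper functions are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT    13
#define PAGE_SIZE     (1u << PAGE_SHIFT)                 /* 8192 bytes */
#define PTRS_PER_PTE  (PAGE_SIZE / sizeof(uint32_t))     /* 4-byte pointers */
#define PTRS_PER_PGD  256u                               /* upper 8 address bits */

/* Virtual memory covered by one fully populated Page Table. */
static uint64_t bytes_per_page_table(void)
{
        return (uint64_t)PTRS_PER_PTE * PAGE_SIZE;
}

/* Virtual memory covered by the whole Page Directory. */
static uint64_t bytes_per_page_directory(void)
{
        return bytes_per_page_table() * PTRS_PER_PGD;
}
```

One Page Table covers 2048 * 8192 bytes = 16 MB, and 256 directory entries
of 16 MB each cover the full 4 GB address space.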

We'll take a look at how an address is translated while we discuss how it's handled
in the Linux kernel.

The example address is 0xd004000c; in binary this is:

31       23       15       7        0
11010000 00000100 00000000 00001100

|______| |__________||____________|
  PGD        PTE       page offset

Given the top-level Page Directory, the offset in that directory is calculated
using the upper 8 bits:

static inline pgd_t * pgd_offset(struct mm_struct * mm, unsigned long address)
{
        return mm->pgd + (address >> PGDIR_SHIFT);
}

PGDIR_SHIFT is the log2 of the amount of memory an entry in the PGD can map; in our
case it is 24, corresponding to 16 MB. This means that each entry in the PGD
corresponds to 16 MB of virtual memory.

The pgd_t from our example will therefore be entry number 208 (0xd0) in mm->pgd.

Since the Middle Directory does not exist, it is a unity mapping:

static inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
{
        return (pmd_t *) dir;
}

The Page Table provides the final lookup by using bits 13 to 23 as index:

static inline pte_t * pte_offset(pmd_t * dir, unsigned long address)
{
        return (pte_t *) pmd_page(*dir) + ((address >> PAGE_SHIFT) &
                                           (PTRS_PER_PTE - 1));
}

PAGE_SHIFT is the log2 of the size of a page; 13 in our case. PTRS_PER_PTE is
the number of pointers that fit in a Page Table (2048) and is used to mask off the
PGD part of the address.

The so-far unused bits 0 to 12 are used to index linearly inside a page.
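Putting the three steps together, the split of a virtual address can be
reproduced outside the kernel. The sketch below uses the shift and mask
values given above; struct va_parts and split_address are hypothetical
names introduced only for this illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT    13
#define PGDIR_SHIFT   24
#define PTRS_PER_PTE  2048u

/* The three fields of a 32-bit CRIS virtual address. */
struct va_parts {
        unsigned pgd_index;     /* bits 24..31: entry in the Page Directory */
        unsigned pte_index;     /* bits 13..23: entry in the Page Table */
        unsigned offset;        /* bits 0..12: byte offset inside the page */
};

static struct va_parts split_address(uint32_t address)
{
        struct va_parts p;

        p.pgd_index = address >> PGDIR_SHIFT;
        p.pte_index = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
        p.offset    = address & ((1u << PAGE_SHIFT) - 1);
        return p;
}
```

For the example address 0xd004000c this gives PGD index 208 (0xd0), PTE
index 32 and page offset 12.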

The VM system
-------------

The kernel's own page directory is swapper_pg_dir, which is cleared in paging_init
and contains the kernel's virtual mappings (the kernel itself is not paged - it
is mapped linearly using kseg_c as described above). Architectures without
kernel segments, like the i386, need to set up swapper_pg_dir directly in head.S
to map the kernel itself. swapper_pg_dir is pointed to by init_mm.pgd as the
init-task's PGD.

To see what support functions are used to set up a page table, let's look at the
kernel's internal paged memory system, vmalloc/vfree.

void * vmalloc(unsigned long size)

The vmalloc system keeps a paged segment in kernel space at 0xd0000000. What
happens first is that a virtual address chunk is allocated to the request using
get_vm_area(size). After that, physical RAM pages are allocated and put into
the kernel's page table using alloc_area_pages(addr, size).

static int alloc_area_pages(unsigned long address, unsigned long size)

First the PGD entry is found using init_mm.pgd. This is passed to
alloc_area_pmd (remember the 3->2 folding). It uses pte_alloc_kernel to
check if the PGD entry points anywhere - if not, a page table page is
allocated and the PGD entry updated. Then the alloc_area_pte function is
used just like alloc_area_pmd to check which page table entry is desired,
and a physical page is allocated and the table entry updated. All of this
is repeated at the top level until the entire address range specified has
been mapped.
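The walk described above can be imitated in user space to see its shape. The
following is a toy model and not the kernel's code: the toy_* types and
functions are hypothetical, malloc/calloc stand in for the kernel's page
allocators, the three kernel helpers are collapsed into one loop, and the
range is assumed to be page aligned and not to wrap past 0xffffffff.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT    13
#define PAGE_SIZE     (1u << PAGE_SHIFT)
#define PGDIR_SHIFT   24
#define PTRS_PER_PTE  2048u
#define PTRS_PER_PGD  256u

typedef struct { void *page; } toy_pte;         /* one mapped page */
typedef struct { toy_pte *table; } toy_pgd;     /* one Page Table */

static toy_pgd toy_pg_dir[PTRS_PER_PGD];        /* stands in for swapper_pg_dir */

/* Map [address, address + size) one page at a time. Returns the number of
 * pages newly mapped, or -1 if an allocation fails. */
static long toy_alloc_area_pages(uint32_t address, uint32_t size)
{
        uint32_t end = address + size;
        long mapped = 0;

        for (; address < end; address += PAGE_SIZE) {
                toy_pgd *dir = &toy_pg_dir[address >> PGDIR_SHIFT];
                toy_pte *pte;

                if (!dir->table) {
                        /* PGD entry points nowhere: allocate a page table. */
                        dir->table = calloc(PTRS_PER_PTE, sizeof(toy_pte));
                        if (!dir->table)
                                return -1;
                }
                pte = &dir->table[(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
                if (!pte->page) {
                        /* Back the virtual page with a "physical" page. */
                        pte->page = malloc(PAGE_SIZE);
                        if (!pte->page)
                                return -1;
                        mapped++;
                }
        }
        return mapped;
}

/* Number of Page Tables that have been allocated so far. */
static unsigned toy_tables_in_use(void)
{
        unsigned i, n = 0;

        for (i = 0; i < PTRS_PER_PGD; i++)
                if (toy_pg_dir[i].table)
                        n++;
        return n;
}
```

Mapping four pages starting at 0xd0ffc000 crosses a 16 MB boundary, so the
model ends up with two page tables (PGD entries 0xd0 and 0xd1), which is the
"repeated at the top level" case from the description.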