Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux vm.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
cpu is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communications. Schemes
for this range from replicating kernel text and read-only data
across nodes to housing all the data structures that key components
of the kernel need in memory on the node that uses them.

Currently, all the NUMA support does is provide efficient handling
of widely discontiguous physical memory, so architectures which
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.
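
To make the idea of widely discontiguous physical memory concrete, here
is a minimal user-space sketch, not kernel code. Every name in it
(struct node_range, sketch_ranges, sketch_pfn_to_nid, sketch_local_index)
and the example layout are assumptions made for illustration. It shows
how a page frame number could be resolved to a node, with the hole
between the ranges belonging to no node at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one node's physical range, in page frames. */
struct node_range {
    unsigned long start_pfn;   /* first page frame owned by the node */
    unsigned long nr_pages;    /* number of contiguous frames in the range */
};

/* Assumed example layout: two populated ranges separated by a large hole. */
static const struct node_range sketch_ranges[] = {
    { 0x00000UL, 0x8000UL },   /* node 0: frames 0x00000 .. 0x07fff */
    { 0x40000UL, 0x8000UL },   /* node 1: frames 0x40000 .. 0x47fff */
};

#define SKETCH_NR_NODES (sizeof(sketch_ranges) / sizeof(sketch_ranges[0]))

/* Return the node that owns pfn, or -1 if pfn falls in a hole. */
int sketch_pfn_to_nid(unsigned long pfn)
{
    for (size_t nid = 0; nid < SKETCH_NR_NODES; nid++) {
        const struct node_range *r = &sketch_ranges[nid];

        if (pfn >= r->start_pfn && pfn - r->start_pfn < r->nr_pages)
            return (int)nid;
    }
    return -1;
}

/* Index of pfn inside its node's own per-node page array. */
unsigned long sketch_local_index(int nid, unsigned long pfn)
{
    /* The caller must pass the node that really owns the frame. */
    assert(sketch_pfn_to_nid(pfn) == nid);
    return pfn - sketch_ranges[nid].start_pfn;
}
```

Because each node indexes its frames relative to its own start_pfn, no
per-page bookkeeping has to be kept for the hole. That is why a layout
like this suits a non-NUMA machine with a sparse address space just as
well as a NUMA one.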

The initial port includes NUMAizing the bootmem allocator code by
encapsulating all the pieces of information into a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform which uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.
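
As an illustration of what encapsulating the allocator state per node
buys, the following is a small user-space sketch. It is not the kernel's
bootmem_data_t or its API: struct sketch_bootmem, sketch_alloc_node and
the first-fit policy are all assumptions chosen to keep the example
short. The point is only that once the state lives in a per-node
structure, the caller decides which node an early allocation comes from.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGES_PER_NODE 64

/* Hypothetical per-node boot allocator state (not bootmem_data_t). */
struct sketch_bootmem {
    unsigned long start_pfn;                     /* first frame on this node */
    unsigned char used[SKETCH_PAGES_PER_NODE];   /* 1 = frame is reserved */
};

/*
 * First-fit allocation of npages contiguous frames from one node.
 * Returns the first pfn of the run, or ULONG_MAX if the node cannot
 * satisfy the request.
 */
unsigned long sketch_alloc_node(struct sketch_bootmem *b, size_t npages)
{
    assert(b != NULL);
    if (npages == 0 || npages > SKETCH_PAGES_PER_NODE)
        return ULONG_MAX;

    for (size_t i = 0; i + npages <= SKETCH_PAGES_PER_NODE; i++) {
        size_t run = 0;

        /* Count free frames starting at i, up to the requested size. */
        while (run < npages && !b->used[i + run])
            run++;

        if (run == npages) {
            for (size_t j = 0; j < npages; j++)
                b->used[i + j] = 1;
            return b->start_pfn + i;
        }
        i += run;   /* skip past the used frame that ended the run */
    }
    return ULONG_MAX;
}
```

A caller that wants a structure placed on a particular node passes that
node's state, for example sketch_alloc_node(&node1_state, 4), where
node1_state is a hypothetical struct sketch_bootmem set up for node 1.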

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables like low_on_memory, nr_free_pages, etc. into the pg_data_t.
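
The uniformity described above can be sketched roughly as follows. This
is a user-space illustration under stated assumptions, not the kernel's
definitions: struct sketch_pgdat, SKETCH_NUMA, SKETCH_NODE_DATA and the
sketch_* functions are hypothetical stand-ins, and the real pg_data_t
carries far more state. Built without SKETCH_NUMA, the code has a single
statically allocated descriptor, yet the generic loop is written the
same way for both configurations.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-node descriptor. */
struct sketch_pgdat {
    int node_id;
    unsigned long nr_free_pages;   /* example of a per-node counter */
};

#ifdef SKETCH_NUMA
#define SKETCH_MAX_NODES 4
static struct sketch_pgdat sketch_node_data[SKETCH_MAX_NODES];
static int sketch_nr_online = SKETCH_MAX_NODES;
#define SKETCH_NODE_DATA(nid) (&sketch_node_data[(nid)])
#else
/* UMA: one statically allocated descriptor, always node 0. */
static struct sketch_pgdat sketch_contig_page_data = { 0, 0 };
static int sketch_nr_online = 1;
#define SKETCH_NODE_DATA(nid) ((void)(nid), &sketch_contig_page_data)
#endif

/* Defined for every configuration, so callers need no #ifdef. */
int sketch_num_online_nodes(void)
{
    return sketch_nr_online;
}

/* Free pages on one node; nid must name an online node. */
unsigned long sketch_node_free(int nid)
{
    assert(nid >= 0 && nid < sketch_num_online_nodes());
    return SKETCH_NODE_DATA(nid)->nr_free_pages;
}

/* Generic code: identical source for NUMA and UMA builds. */
unsigned long sketch_total_free(void)
{
    unsigned long total = 0;

    for (int nid = 0; nid < sketch_num_online_nodes(); nid++)
        total += sketch_node_free(nid);
    return total;
}
```

Moving a global counter into the per-node structure, as suggested for
nr_free_pages, then only changes where the field lives; code such as
sketch_total_free() keeps the same shape.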

The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner. This will be changed to
do a concentric circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node has been
added, so that drivers can make the call and not worry about whether
they are running on a NUMA or UMA platform.
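
The round-robin policy can be pictured with the toy model below. It is a
sketch under assumptions, not the kernel's allocator: the per-node free
counts, sketch_alloc_on_node and sketch_alloc_round_robin are invented
for illustration, and sketch_alloc_on_node is not alloc_pages_node, only
a stand-in for a call that targets one specific node.

```c
#include <assert.h>

#define SKETCH_NR_NODES 3

/* Assumed free page counts per node; node 1 starts out empty. */
static unsigned long sketch_free_pages[SKETCH_NR_NODES] = { 2, 0, 1 };

/* Node at which the next round-robin search starts. */
static int sketch_next_node;

/* Take one page from node nid; returns nid, or -1 if the node is empty. */
int sketch_alloc_on_node(int nid)
{
    assert(nid >= 0 && nid < SKETCH_NR_NODES);
    if (sketch_free_pages[nid] == 0)
        return -1;
    sketch_free_pages[nid]--;
    return nid;
}

/*
 * Round-robin: begin at the rotating cursor and try each node once.
 * Returns the node that supplied the page, or -1 if all are empty.
 */
int sketch_alloc_round_robin(void)
{
    for (int tries = 0; tries < SKETCH_NR_NODES; tries++) {
        int nid = (sketch_next_node + tries) % SKETCH_NR_NODES;

        if (sketch_alloc_on_node(nid) >= 0) {
            /* Advance the cursor so the next call starts elsewhere. */
            sketch_next_node = (nid + 1) % SKETCH_NR_NODES;
            return nid;
        }
    }
    return -1;
}
```

With the counts above, successive calls to sketch_alloc_round_robin()
are served by node 0, node 2 and node 0 again, after which every node is
empty and the call fails. A search that starts from the current node and
moves outward would replace the rotating cursor with an ordering by
distance from the caller's node.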