                 MEMORY ATTRIBUTE ALIASING ON IA-64

                          Bjorn Helgaas
                      <bjorn.helgaas@hp.com>
                           May 4, 2006


MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry. The ones of most interest to the Linux
    kernel are:

        WB      Write-back (cacheable)
        UC      Uncacheable
        WC      Write-coalescing

    System memory typically uses the WB attribute. The UC attribute is
    used for memory-mapped I/O devices. Like UC, the WC attribute is
    uncacheable, but writes may be delayed and combined to improve
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].
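
    As a minimal sketch, the rule can be written as a predicate over a
    pair of attributes. The enum and function names are hypothetical,
    not the kernel's own definitions, and the model treats WB as the
    only cacheable attribute of the three listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the attributes listed above (not kernel names). */
enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/* Of these three, only WB is cacheable; UC and WC are uncacheable. */
bool attr_is_cacheable(enum mem_attr a)
{
	return a == ATTR_WB;
}

/*
 * Two mappings of the same page alias in the forbidden way when exactly
 * one of them is cacheable. This encodes only the cacheable/uncacheable
 * rule stated above; it makes no claim about mixing UC and WC.
 */
bool attrs_alias(enum mem_attr a, enum mem_attr b)
{
	return attr_is_cacheable(a) != attr_is_cacheable(b);
}
```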

    The design of the chipset determines which attributes are supported
    on which regions of the address space. For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region. At boot time, the kernel uses
    the EFI GetMemoryMap() interface. ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space. Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.
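
    A lookup in such a table might look like the following sketch. The
    attribute bit values and the 4 KiB EFI page size come from the UEFI
    specification; the structure is a simplified stand-in for a real EFI
    memory descriptor (which also carries a type and a virtual address),
    and efi_lookup_attrs() is a hypothetical helper, not the kernel's
    interface. A return value of 0 models an address, such as unreported
    MMIO, that the map does not cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC	0x1ULL
#define EFI_MEMORY_WC	0x2ULL
#define EFI_MEMORY_WB	0x8ULL
#define EFI_PAGE_SHIFT	12	/* EFI pages are 4 KiB */

/* Simplified descriptor: only the fields this sketch needs. */
struct efi_desc {
	uint64_t phys_addr;	/* start of the region */
	uint64_t num_pages;	/* length in 4 KiB EFI pages */
	uint64_t attribute;	/* EFI_MEMORY_* bits the region supports */
};

/* Hypothetical helper: attribute bits of the descriptor covering addr,
 * or 0 if no descriptor covers it (e.g. MMIO omitted by firmware). */
uint64_t efi_lookup_attrs(const struct efi_desc *map, size_t n,
			  uint64_t addr)
{
	assert(map != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].phys_addr;
		uint64_t end = start + (map[i].num_pages << EFI_PAGE_SHIFT);

		if (addr >= start && addr < end)
			return map[i].attribute;
	}
	return 0;
}
```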

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap. Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules." Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions. This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.
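
    The granule test can be sketched as a coverage check over the EFI
    map. This is a simplified model under stated assumptions: the map is
    sorted by address with no overlapping entries, the structure is a
    reduced EFI descriptor, and range_is_wb() and granule_is_wb() are
    hypothetical helpers rather than kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_UC	0x1ULL	/* UEFI attribute bits */
#define EFI_MEMORY_WB	0x8ULL
#define EFI_PAGE_SHIFT	12

struct efi_desc {
	uint64_t phys_addr;
	uint64_t num_pages;	/* 4 KiB EFI pages */
	uint64_t attribute;
};

/* True if every byte of [start, end) lies in a descriptor that supports
 * WB. Assumes the map is sorted by address and non-overlapping. */
bool range_is_wb(const struct efi_desc *map, size_t n,
		 uint64_t start, uint64_t end)
{
	uint64_t next = start;	/* first byte not yet known to be WB */

	for (size_t i = 0; i < n && next < end; i++) {
		uint64_t d_start = map[i].phys_addr;
		uint64_t d_end = d_start + (map[i].num_pages << EFI_PAGE_SHIFT);

		if (d_end <= next)
			continue;	/* entirely below the unchecked part */
		if (d_start > next || !(map[i].attribute & EFI_MEMORY_WB))
			return false;	/* a hole, or a region without WB */
		next = d_end;
	}
	return next >= end;
}

/* A cacheable identity mapping is safe only if the whole granule that
 * contains addr supports WB. granule_size must be a power of two. */
bool granule_is_wb(const struct efi_desc *map, size_t n,
		   uint64_t addr, uint64_t granule_size)
{
	assert(granule_size && (granule_size & (granule_size - 1)) == 0);
	uint64_t base = addr & ~(granule_size - 1);

	return range_is_wb(map, n, base, base + granule_size);
}
```

    Applying the same check to a 16K or 64K page rather than a 16MB
    granule succeeds far more often, because only the smaller range has
    to be homogeneous: with a UC hole at 0xA0000, the 16MB granule at 0
    fails, while a 16K page at 0x4000 passes.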

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software. This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages. The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

        This uses remap_pfn_range(), which creates user mappings. These
        mappings may be either WB or UC. If the region being mapped
        happens to be in kern_memmap, meaning that it may also be mapped
        by a kernel identity mapping, the user mapping must use the same
        attribute as the kernel mapping.

        If the region is not in kern_memmap, the user mapping should use
        an attribute reported as being supported in the EFI memory map.

        Since the EFI memory map does not describe MMIO on some
        machines, this should use an uncacheable mapping as a fallback.
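
        The decision just described can be sketched as follows. The
        region_info structure and devmem_mmap_attr() are hypothetical.
        Preferring WB when EFI reports both WB and UC is an assumption
        borrowed from the legacy_mem discussion later in this document;
        this paragraph itself only asks for a supported attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC	0x1ULL
#define EFI_MEMORY_WB	0x8ULL

/* Hypothetical model: the two attributes /dev/mem mmap can produce. */
enum mem_attr { ATTR_WB, ATTR_UC };

/* Hypothetical summary of what is known about the region being mapped. */
struct region_info {
	bool in_kern_memmap;	/* also covered by a kernel identity mapping */
	enum mem_attr kern_attr;	/* attribute of that mapping, if any */
	uint64_t efi_attrs;	/* EFI attribute bits, 0 if not reported */
};

enum mem_attr devmem_mmap_attr(const struct region_info *r)
{
	assert(r != NULL);

	/* Match the kernel identity mapping so the two cannot alias. */
	if (r->in_kern_memmap)
		return r->kern_attr;

	/* Use an attribute EFI reports; preferring WB is an assumption. */
	if (r->efi_attrs & EFI_MEMORY_WB)
		return ATTR_WB;

	/* Fall back to UC: EFI reports no WB support, or the region
	 * (e.g. unreported MMIO) is missing from the map entirely. */
	return ATTR_UC;
}
```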

    mmap of /sys/class/pci_bus/.../legacy_mem

        This is very similar to mmap of /dev/mem, except that legacy_mem
        only allows mmap of the one megabyte "legacy MMIO" area for a
        specific PCI bus. Typically this is the first megabyte of
        physical address space, but it may be different on machines with
        several VGA devices.

        "X" uses this to access VGA frame buffers. Using legacy_mem
        rather than /dev/mem allows multiple instances of X to talk to
        different VGA cards.

        The /dev/mem mmap constraints apply.

    mmap of /proc/bus/pci/.../??.?

        This is an MMIO mmap of PCI functions, which may additionally be
        requested with the WC attribute.

        If WC is requested, and the region in kern_memmap is either WC
        or UC, and the EFI memory map designates the region as WC, then
        the WC mapping is allowed.

        Otherwise, the user mapping must use the same attribute as the
        kernel mapping.
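
        A sketch of this rule, assuming that "designates the region as
        WC" means the EFI attribute bits include EFI_MEMORY_WC. The
        function pci_mmap_attr() and its parameters are hypothetical;
        kern_attr stands for whatever attribute the kernel mapping uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_UC	0x1ULL
#define EFI_MEMORY_WC	0x2ULL
#define EFI_MEMORY_WB	0x8ULL

/* Hypothetical model of the attributes (not kernel names). */
enum mem_attr { ATTR_WB, ATTR_UC, ATTR_WC };

/*
 * want_wc:   the caller asked for a write-coalescing mapping
 * kern_attr: attribute of the kernel's mapping of this region
 * efi_attrs: EFI attribute bits reported for the region
 */
enum mem_attr pci_mmap_attr(bool want_wc, enum mem_attr kern_attr,
			    uint64_t efi_attrs)
{
	bool kern_uncached = (kern_attr == ATTR_WC || kern_attr == ATTR_UC);

	if (want_wc && kern_uncached && (efi_attrs & EFI_MEMORY_WC))
		return ATTR_WC;

	/* Otherwise match the kernel mapping exactly. */
	return kern_attr;
}
```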

    read/write of /dev/mem

        This uses copy_to_user() (for read) and copy_from_user() (for
        write), which implicitly use a kernel identity mapping. This is
        safe for regions in kern_memmap.

        There may be corner cases of things that are not in kern_memmap,
        but could be accessed this way. For example, registers in MMIO
        space are not in kern_memmap, but could be accessed with a UC
        mapping. This would not cause attribute aliasing. But
        registers typically can be accessed only with four-byte or
        eight-byte accesses, and these copy routines don't allow any
        control over the access size, so this would be dangerous.

    ioremap()

        This returns a mapping for use inside the kernel.

        If the region is in kern_memmap, we should use the attribute
        specified there.

        If the EFI memory map reports that the entire granule supports
        WB, we should use that (granules that are partially reserved
        or occupied by firmware do not appear in kern_memmap).

        If the granule contains non-WB memory, but we can cover the
        region safely with kernel page table mappings, we can use
        ioremap_page_range() as most other architectures do.

        Failing all of the above, we have to fall back to a UC mapping.
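
        The order of these checks can be sketched as a small decision
        function. Its inputs would come from kern_memmap and the EFI
        memory map; the structure, the enum, and choose_ioremap() are
        hypothetical. The sketch also assumes that covering the region
        "safely" with page tables means EFI reports WB for the requested
        region itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the four outcomes described above. */
enum ioremap_kind {
	IOREMAP_KERN_ATTR,	/* use the attribute from kern_memmap */
	IOREMAP_WB_GRANULE,	/* the whole granule supports WB */
	IOREMAP_WB_PAGES,	/* WB kernel page tables (ioremap_page_range()) */
	IOREMAP_UC,		/* last resort */
};

/* Hypothetical summary of what is known about the region. */
struct ioremap_facts {
	bool in_kern_memmap;
	bool granule_all_wb;	/* EFI: entire granule supports WB */
	bool region_all_wb;	/* EFI: the requested region supports WB */
};

enum ioremap_kind choose_ioremap(const struct ioremap_facts *f)
{
	assert(f != NULL);

	if (f->in_kern_memmap)
		return IOREMAP_KERN_ATTR;
	if (f->granule_all_wb)
		return IOREMAP_WB_GRANULE;
	if (f->region_all_wb)
		return IOREMAP_WB_PAGES;
	return IOREMAP_UC;
}
```

        For the VGA ROM at 0xC0000 discussed later, region_all_wb would
        be true while granule_all_wb is false, so the sketch picks WB
        page-table mappings rather than a UC granule.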

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

        The EFI memory map may not report these MMIO regions.

        These must be allowed so that X will work. This means that
        when the EFI memory map is incomplete, every /dev/mem mmap must
        succeed. It may create either WB or UC user mappings, depending
        on whether the region is in kern_memmap or the EFI memory map.

    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled

        The EFI memory map reports the following attributes:
          0x00000-0x9FFFF   WB only
          0xA0000-0xBFFFF   UC only (VGA frame buffer)
          0xC0000-0xFFFFF   WB only

        This mmap is done with user pages, not kernel identity mappings,
        so it is safe to use WB mappings.

        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
        which uses a granule-sized UC mapping. This granule will cover some
        WB-only memory, but since UC is non-speculative, the processor will
        never generate an uncacheable reference to the WB-only areas unless
        the driver explicitly touches them.

    mmap of 0x0-0xFFFFF legacy_mem by "X"

        If the EFI memory map reports that the entire range supports the
        same attributes, we can allow the mmap (and we will prefer WB if
        supported, as is the case with HP sx[12]000 machines with VGA
        disabled).

        If EFI reports the range as partly WB and partly UC (as on sx[12]000
        machines with VGA enabled), we must fail the mmap because there's no
        safe attribute to use.

        If EFI reports some of the range but not all (as on Intel firmware
        that doesn't report the VGA frame buffer at all), we should fail the
        mmap and force the user to map just the specific region of interest.
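
        The three rules above can be sketched as an intersection of EFI
        attribute bits across the requested range. This is a simplified
        model, not the kernel's code: it assumes a sorted,
        non-overlapping map of reduced EFI descriptors, uses half-open
        [start, end) ranges, and common_attrs() and legacy_mem_mmap()
        are hypothetical helpers. A hole anywhere in the range, or
        attributes with no bit in common, makes the mmap fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_UC	0x1ULL	/* UEFI attribute bits */
#define EFI_MEMORY_WB	0x8ULL
#define EFI_PAGE_SHIFT	12

struct efi_desc {
	uint64_t phys_addr;
	uint64_t num_pages;	/* 4 KiB EFI pages */
	uint64_t attribute;
};

enum legacy_result { LEGACY_FAIL, LEGACY_WB, LEGACY_UC };

/* Attribute bits shared by every descriptor covering [start, end), or 0
 * if any part of the range is missing from the map. */
uint64_t common_attrs(const struct efi_desc *map, size_t n,
		      uint64_t start, uint64_t end)
{
	uint64_t next = start, common = ~0ULL;

	for (size_t i = 0; i < n && next < end; i++) {
		uint64_t d_start = map[i].phys_addr;
		uint64_t d_end = d_start + (map[i].num_pages << EFI_PAGE_SHIFT);

		if (d_end <= next)
			continue;
		if (d_start > next)
			return 0;	/* part of the range is unreported */
		common &= map[i].attribute;
		next = d_end;
	}
	return next >= end ? common : 0;
}

enum legacy_result legacy_mem_mmap(const struct efi_desc *map, size_t n,
				   uint64_t start, uint64_t end)
{
	assert(start < end);
	uint64_t attrs = common_attrs(map, n, start, end);

	if (attrs & EFI_MEMORY_WB)
		return LEGACY_WB;	/* prefer WB when the whole range has it */
	if (attrs & EFI_MEMORY_UC)
		return LEGACY_UC;
	return LEGACY_FAIL;		/* mixed WB/UC, or partly unreported */
}
```

        With the sx1000 VGA-enabled map from the "hwinfo" case, mapping
        all of 0x0-0xFFFFF fails because WB-only and UC-only regions share
        no attribute, while mapping just 0xA0000-0xBFFFF yields UC.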

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

        The EFI memory map reports the following attributes:
          0x00000-0xFFFFF   WB only (no VGA MMIO hole)

        This is a special case of the previous case, and the mmap should
        fail for the same reason as above.

    read of /sys/devices/.../rom

        For VGA devices, this may cause an ioremap() of 0xC0000. This
        used to be done with a UC mapping, because the VGA frame buffer
        at 0xA0000 prevents use of a WB granule. The UC mapping causes
        an MCA on HP sx[12]000 chipsets.

        We should use WB page table mappings to avoid covering the VGA
        frame buffer.

NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.