An overview of memory management in QEMU:

I. RAM Management:
==================

I.1. RAM Address space:
-----------------------

All pages of virtual RAM used by QEMU at runtime are allocated from
contiguous blocks in a specific abstract "RAM address space".
|ram_addr_t| is the type of block addresses in this space.

A single block of contiguous RAM is allocated with 'qemu_ram_alloc()', which
takes a size in bytes and allocates the pages through mmap() in the QEMU
host process. It also sets up the corresponding KVM / Xen / HAX mappings,
depending on each accelerator's specific needs.

Each block has a name, which is used for snapshot support.

'qemu_ram_alloc_from_ptr()' can also be used to allocate a new RAM
block by passing its content explicitly (this can be useful for ROM
pages).
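The following minimal C sketch makes the allocation model above more concrete. It models a RAM address space made of named, contiguous blocks. It is not QEMU's code. |ToyRamBlock|, |toy_ram_alloc()|, |toy_ram_alloc_from_ptr()| and |toy_find_block()| are hypothetical names. calloc() stands in for mmap(), no KVM / Xen / HAX mappings are created, and blocks are assumed to be placed back to back, like in the Android/x86 example layout given below.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ram_addr_t. */
typedef uint64_t toy_ram_addr_t;

/* One named block of contiguous RAM (not QEMU's real block structure). */
typedef struct {
    char name[32];          /* QEMU uses block names for snapshot support */
    toy_ram_addr_t offset;  /* start of the block in the RAM address space */
    size_t length;          /* block size in bytes */
    uint8_t *host;          /* backing memory in the host process */
} ToyRamBlock;

#define TOY_MAX_BLOCKS 16

static ToyRamBlock toy_blocks[TOY_MAX_BLOCKS];
static int toy_num_blocks;
static toy_ram_addr_t toy_next_offset;  /* first unused RAM address */

/* Reserve the next 'size' bytes of the RAM address space. If 'host' is
 * non-NULL it provides the block's content (like qemu_ram_alloc_from_ptr());
 * otherwise zeroed memory is allocated (calloc() stands in for mmap()). */
static toy_ram_addr_t toy_ram_alloc_from_ptr(const char *name, size_t size,
                                             void *host)
{
    assert(toy_num_blocks < TOY_MAX_BLOCKS);
    ToyRamBlock *b = &toy_blocks[toy_num_blocks];
    b->host = host ? host : calloc(1, size);
    if (b->host == NULL)
        abort();
    snprintf(b->name, sizeof(b->name), "%s", name);
    b->offset = toy_next_offset;
    b->length = size;
    toy_next_offset += size;   /* simple bump allocation, no holes reused */
    toy_num_blocks++;
    return b->offset;
}

/* Allocate a new, zero-filled block (the qemu_ram_alloc() analogue). */
static toy_ram_addr_t toy_ram_alloc(const char *name, size_t size)
{
    return toy_ram_alloc_from_ptr(name, size, NULL);
}

/* Look a block up by name, as snapshot code might do. */
static const ToyRamBlock *toy_find_block(const char *name)
{
    for (int i = 0; i < toy_num_blocks; i++)
        if (strcmp(toy_blocks[i].name, name) == 0)
            return &toy_blocks[i];
    return NULL;
}
```

If you allocate a "pc.ram" block first and then two ROM-like blocks, you get consecutive offsets, which mirrors the shape of the example layout.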

'qemu_get_ram_ptr()' translates a 'ram_addr_t' into the corresponding
address in the QEMU host process. 'qemu_ram_addr_from_host()' does the
opposite (i.e. it translates a host address into a ram_addr_t if possible,
or returns an error).
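Both translations can be sketched as a linear search over the block list. This is an illustrative assumption, not QEMU's actual lookup. The |ToyRamBlock| type and the |toy_get_ram_ptr()| / |toy_ram_addr_from_host()| functions are hypothetical. The error case is modelled as a NULL pointer or a -1 return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_ram_addr_t;   /* hypothetical stand-in for ram_addr_t */

typedef struct {
    toy_ram_addr_t offset;  /* start of the block in the RAM address space */
    size_t length;          /* block size in bytes */
    uint8_t *host;          /* backing memory in the host process */
} ToyRamBlock;

/* RAM address -> host pointer; NULL if no block contains 'addr'. */
static void *toy_get_ram_ptr(const ToyRamBlock *blocks, int n,
                             toy_ram_addr_t addr)
{
    for (int i = 0; i < n; i++) {
        if (addr >= blocks[i].offset &&
            addr - blocks[i].offset < blocks[i].length)
            return blocks[i].host + (addr - blocks[i].offset);
    }
    return NULL;
}

/* Host pointer -> RAM address. Returns 0 on success, or -1 (the error case)
 * if the pointer is not inside any block. Comparing integers avoids
 * relational comparisons between pointers to unrelated objects. */
static int toy_ram_addr_from_host(const ToyRamBlock *blocks, int n,
                                  const void *ptr, toy_ram_addr_t *out)
{
    uintptr_t p = (uintptr_t)ptr;
    for (int i = 0; i < n; i++) {
        uintptr_t base = (uintptr_t)blocks[i].host;
        if (p >= base && p - base < blocks[i].length) {
            *out = blocks[i].offset + (toy_ram_addr_t)(p - base);
            return 0;
        }
    }
    return -1;
}
```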

Note that ram_addr_t addresses are an internal implementation detail of
QEMU, i.e. the virtual CPU never sees their values directly; it relies
instead on addresses in its virtual physical address space, described
in section II. below.

As an example, when emulating an Android/x86 virtual device, the following
RAM address space layout is used:

  0x0000_0000 ... 0x1000_0000  "pc.ram"
  0x1000_0000 ... 0x1002_0000  "bios.bin"
  0x1002_0000 ... 0x1004_0000  "pc.rom"


I.2. RAM Dirty tracking:
------------------------

QEMU also associates an 8-bit 'dirty' bitmap with each RAM page. The
main idea is that whenever a page is written to, the value 0xff is
written to the page's 'dirty' bitmap. Various clients can later inspect
the flags they care about and clear them. For example:

  VGA_DIRTY_FLAG (0x1) is typically used by framebuffer drivers to detect
  which pages of video RAM were touched since the latest VSYNC. The driver
  typically copies the pixel values to the real QEMU output, then clears
  the VGA_DIRTY_FLAG bits. This avoids needless copies when nothing
  changed in the framebuffer.

  MIGRATION_DIRTY_FLAG (0x8) is used to track modified RAM pages during
  live migration (i.e. moving a QEMU virtual machine from one host to
  another).

  CODE_DIRTY_FLAG (0x2) is more specialized: it is used to support
  self-modifying code properly. More on this later.
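Here is a minimal sketch of the per-page dirty byte. It assumes 4 KiB pages and a flat array holding one byte per RAM page. The flag values are the ones listed above. |toy_dirty|, |toy_note_write()|, |toy_is_dirty()| and |toy_clear_dirty()| are hypothetical names. The extra handling QEMU needs for CODE_DIRTY_FLAG is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as listed in the text. */
#define VGA_DIRTY_FLAG       0x01
#define CODE_DIRTY_FLAG      0x02
#define MIGRATION_DIRTY_FLAG 0x08

#define TOY_PAGE_BITS 12   /* assumed 4 KiB pages */
#define TOY_NUM_PAGES 16   /* size of the toy RAM address space, in pages */

/* One 8-bit dirty bitmap per RAM page. */
static uint8_t toy_dirty[TOY_NUM_PAGES];

/* Called on every write: mark the page dirty for all clients at once. */
static void toy_note_write(uint64_t ram_addr)
{
    assert((ram_addr >> TOY_PAGE_BITS) < TOY_NUM_PAGES);
    toy_dirty[ram_addr >> TOY_PAGE_BITS] = 0xff;
}

/* Has the page been written since this client last cleared its flag? */
static int toy_is_dirty(uint64_t ram_addr, uint8_t flag)
{
    assert((ram_addr >> TOY_PAGE_BITS) < TOY_NUM_PAGES);
    return (toy_dirty[ram_addr >> TOY_PAGE_BITS] & flag) != 0;
}

/* Clear one client's flag on the pages in [start, end), leaving the other
 * clients' bits untouched. 'start' and 'end' are assumed page-aligned. */
static void toy_clear_dirty(uint64_t start, uint64_t end, uint8_t flag)
{
    for (uint64_t page = start >> TOY_PAGE_BITS;
         page < (end >> TOY_PAGE_BITS) && page < TOY_NUM_PAGES; page++)
        toy_dirty[page] &= (uint8_t)~flag;
}
```

Every write sets all eight bits, but each client clears only its own flag. So after a framebuffer driver consumes a page and clears VGA_DIRTY_FLAG, the page is still visible to live migration, which sees MIGRATION_DIRTY_FLAG set.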


II. The physical address space:
===============================

This is the address space that the virtual CPU can read from / write to.
|hwaddr| is the type of addresses in this space, which is decomposed
into 'pages'. Each page in the address space is either unassigned or
mapped to a specific kind of memory region.

See |phys_page_find()| and |phys_page_find_alloc()| in translate-all.c for
the implementation details.
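As a rough illustration of a physical page lookup, the sketch below keeps one |phys_offset| value (see section II.2.) per guest page in a flat array. Unmapped pages report IO_MEM_UNASSIGNED. The names are hypothetical, and the sketch assumes a 4 KiB page size and a 3-bit |flags| field. The real code in translate-all.c likely uses a sparse, multi-level table rather than a flat array.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12                  /* assumed 4 KiB pages */
#define TOY_IO_MEM_UNASSIGNED (2u << 3)   /* mem_type_index=2, 3 flag bits assumed */
#define TOY_NUM_PAGES 256                 /* toy 1 MiB physical address space */

/* One phys_offset per guest-physical page. */
static uint64_t toy_phys_map[TOY_NUM_PAGES];

/* Start with every page unassigned (0 would mean "RAM at ram_addr 0"). */
static void toy_phys_map_init(void)
{
    for (int i = 0; i < TOY_NUM_PAGES; i++)
        toy_phys_map[i] = TOY_IO_MEM_UNASSIGNED;
}

/* Map the page containing 'hwaddr' to 'phys_offset'. */
static void toy_map_page(uint64_t hwaddr, uint64_t phys_offset)
{
    assert((hwaddr >> TOY_PAGE_BITS) < TOY_NUM_PAGES);
    toy_phys_map[hwaddr >> TOY_PAGE_BITS] = phys_offset;
}

/* Rough analogue of a phys_page_find() lookup; addresses beyond the
 * toy map are reported as unassigned. */
static uint64_t toy_phys_page_find(uint64_t hwaddr)
{
    uint64_t index = hwaddr >> TOY_PAGE_BITS;
    return index < TOY_NUM_PAGES ? toy_phys_map[index] : TOY_IO_MEM_UNASSIGNED;
}
```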


II.1. Memory region types:
--------------------------

There are several memory region types:

  - Regions of RAM pages.
  - Regions of ROM pages (similar to RAM, but cannot be written to).
  - Regions of I/O pages, used to communicate with virtual hardware.

Virtual devices can register a new I/O region type by calling
|cpu_register_io_memory()|. This function allows them to provide
callbacks that will be invoked every time the virtual CPU reads from
or writes to any page of the corresponding type.
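The sketch below shows the registration idea under stated assumptions. The callback signatures, |toy_register_io_memory()| and the demo device are hypothetical and do not match QEMU's real prototypes. Indices 0 to 3 stay free for the reserved types described below. Unlike the real function, this toy returns the bare |mem_type_index| rather than a full memory region type value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback signatures (QEMU's real ones differ). */
typedef uint32_t (*ToyReadFunc)(void *opaque, uint64_t addr);
typedef void (*ToyWriteFunc)(void *opaque, uint64_t addr, uint32_t value);

typedef struct {
    ToyReadFunc read_cb;
    ToyWriteFunc write_cb;
    void *opaque;            /* device state handed back to the callbacks */
} ToyIoRegion;

#define TOY_FIRST_DEVICE_INDEX 4   /* 0..3: RAM, ROM, UNASSIGNED, NOTDIRTY */
#define TOY_MAX_IO_TYPES 32

static ToyIoRegion toy_io_types[TOY_MAX_IO_TYPES];
static int toy_next_io_index = TOY_FIRST_DEVICE_INDEX;

/* Register a new I/O region type; returns its index, or -1 if full. */
static int toy_register_io_memory(ToyReadFunc read_cb, ToyWriteFunc write_cb,
                                  void *opaque)
{
    if (toy_next_io_index >= TOY_MAX_IO_TYPES)
        return -1;
    toy_io_types[toy_next_io_index] = (ToyIoRegion){ read_cb, write_cb, opaque };
    return toy_next_io_index++;
}

/* Dispatch a CPU access to the callbacks of the page's region type. */
static uint32_t toy_io_read(int index, uint64_t addr)
{
    return toy_io_types[index].read_cb(toy_io_types[index].opaque, addr);
}

static void toy_io_write(int index, uint64_t addr, uint32_t value)
{
    toy_io_types[index].write_cb(toy_io_types[index].opaque, addr, value);
}

/* Example device: a single 32-bit register stored in 'opaque'. */
static uint32_t demo_reg_read(void *opaque, uint64_t addr)
{
    (void)addr;
    return *(uint32_t *)opaque;
}

static void demo_reg_write(void *opaque, uint64_t addr, uint32_t value)
{
    (void)addr;
    *(uint32_t *)opaque = value;
}
```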

The memory region type of a given page is encoded using PAGE_BITS bits
in the following format:

  +-------------------------------+
  |   mem_type_index    |  flags  |
  +-------------------------------+

where |mem_type_index| is a unique value identifying a given memory
region type, and |flags| is a 3-bit bitmap used to store flags that are
only relevant for I/O pages.
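The packing can be expressed as a pair of shifts and masks. This minimal sketch assumes 4 KiB pages (PAGE_BITS = 12). It places the 3-bit |flags| field in the lowest bits, as the diagram suggests, with |mem_type_index| above it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS  12   /* assumed page size: 4 KiB */
#define TOY_FLAG_BITS   3   /* width of the |flags| field */
#define TOY_FLAGS_MASK ((1u << TOY_FLAG_BITS) - 1)

/* Pack an index and flags into a PAGE_BITS-wide memory region type. */
static uint32_t toy_mem_type(uint32_t index, uint32_t flags)
{
    assert(flags <= TOY_FLAGS_MASK);
    assert(index < (1u << (TOY_PAGE_BITS - TOY_FLAG_BITS)));
    return (index << TOY_FLAG_BITS) | flags;
}

/* Recover the two fields from a packed value. */
static uint32_t toy_mem_type_index(uint32_t mem_type)
{
    return mem_type >> TOY_FLAG_BITS;
}

static uint32_t toy_mem_type_flags(uint32_t mem_type)
{
    return mem_type & TOY_FLAGS_MASK;
}
```

Under these assumptions, a PAGE_BITS-wide value leaves 12 - 3 = 9 bits for |mem_type_index|, i.e. at most 512 distinct region types.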

The following memory region type values are important:

  IO_MEM_RAM (mem_type_index=0, flags=0):
    Used for regular RAM pages. Its value is all zeroes on purpose.

  IO_MEM_ROM (mem_type_index=1, flags=0):
    Used for ROM pages.

  IO_MEM_UNASSIGNED (mem_type_index=2, flags=0):
    Used to identify unassigned pages of the physical address space.

  IO_MEM_NOTDIRTY (mem_type_index=3, flags=0):
    Used to implement tracking of dirty RAM pages. This is essentially
    used for RAM pages that have not been written to yet.

Any mem_type_index value of 4 or higher corresponds to a device-specific
I/O memory region type (i.e. one with custom read/write callbacks and a
corresponding 'opaque' value), and can also use the following bits
in |flags|:

  IO_MEM_ROMD (0x1):
    Used for ROM-like I/O pages, i.e. they are backed by a page from
    the RAM address space, but writing to them triggers a device-specific
    write callback (instead of being ignored or faulting the CPU).

  IO_MEM_SUBPAGE (0x2):
    Used to indicate that not all addresses in this page map to the same
    I/O region type / callbacks.

  IO_MEM_SUBWIDTH (0x4):
    Probably obsolete. Set to indicate that the corresponding I/O region
    type doesn't support reading/writing values of all possible sizes
    (1, 2 and 4 bytes). It seems never to be used by the current code.

Note that cpu_register_io_memory() returns a new memory region type value.
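The reserved values above, and the rule that flag bits only matter for device-specific types, can be sketched as follows. The |TOY_*| macros and the predicate functions are hypothetical. They assume the |flags| field occupies the lowest 3 bits, as in the format diagram of this section, so IO_MEM_ROM, for instance, has the value 8.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FLAG_BITS 3
#define TOY_MEM(index)        ((uint32_t)(index) << TOY_FLAG_BITS)
#define TOY_IO_MEM_RAM        TOY_MEM(0)
#define TOY_IO_MEM_ROM        TOY_MEM(1)
#define TOY_IO_MEM_UNASSIGNED TOY_MEM(2)
#define TOY_IO_MEM_NOTDIRTY   TOY_MEM(3)

#define TOY_IO_MEM_ROMD       0x1u
#define TOY_IO_MEM_SUBPAGE    0x2u
#define TOY_IO_MEM_SUBWIDTH   0x4u

/* Indices 0..3 are the reserved types; 4 and above are device types. */
static int toy_is_device_io(uint32_t mem_type)
{
    return (mem_type >> TOY_FLAG_BITS) >= 4;
}

/* Flag bits only carry meaning for device-specific I/O types. */
static int toy_has_flag(uint32_t mem_type, uint32_t flag)
{
    return toy_is_device_io(mem_type) && (mem_type & flag) != 0;
}
```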

II.2. Physical address map:
---------------------------

For each assigned page in the physical address space, QEMU maintains
two values:

  |phys_offset|, a combination of a RAM address and a memory region type.

  |region_offset|, an optional offset into the region backing the
  page. This is only useful for I/O pages.

The |phys_offset| value has several encodings that require further
clarification:

  - Generally speaking, a phys_offset value is decomposed into
    the following bit fields:

    +-----------------------------------------------------+
    |            high_addr            |     mem_type      |
    +-----------------------------------------------------+

    where |mem_type| is a PAGE_BITS-wide memory region type as described
    previously, and |high_addr| may contain the high bits of a
    ram_addr_t address for RAM-backed pages.
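Splitting a |phys_offset| into its two fields only needs a mask. In this minimal sketch, the helper names are hypothetical and 4 KiB pages are assumed. |high_addr| keeps its original bit positions instead of being shifted down. That is consistent with the RAM case described below, where phys_offset can be passed directly to qemu_get_ram_ptr().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12   /* assumed 4 KiB pages */
#define TOY_PAGE_OFFSET_MASK (((uint64_t)1 << TOY_PAGE_BITS) - 1)

/* Build a phys_offset from a page-aligned RAM address and a mem_type. */
static uint64_t toy_make_phys_offset(uint64_t ram_addr, uint32_t mem_type)
{
    assert((ram_addr & TOY_PAGE_OFFSET_MASK) == 0);
    assert(mem_type <= TOY_PAGE_OFFSET_MASK);
    return ram_addr | mem_type;
}

/* high_addr stays in place, so it is directly usable as a RAM address. */
static uint64_t toy_high_addr(uint64_t phys_offset)
{
    return phys_offset & ~TOY_PAGE_OFFSET_MASK;
}

/* mem_type occupies the low PAGE_BITS bits. */
static uint32_t toy_mem_type_of(uint64_t phys_offset)
{
    return (uint32_t)(phys_offset & TOY_PAGE_OFFSET_MASK);
}
```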

More specifically:

  - Unassigned pages always have the special value IO_MEM_UNASSIGNED
    (high_addr=0, mem_type=IO_MEM_UNASSIGNED).

  - RAM pages have mem_type=0 (i.e. IO_MEM_RAM), while high_addr holds
    the high bits of the corresponding ram_addr_t. Hence, a simple call to
    qemu_get_ram_ptr(phys_offset) will return the corresponding
    address in host QEMU memory.

    This is the reason why IO_MEM_RAM is always 0:

    RAM page phys_offset value:
    +-----------------------------------------------------+
    |            high_addr            |         0         |
    +-----------------------------------------------------+

  - ROM pages are like RAM pages, but have mem_type=IO_MEM_ROM.
    QEMU ensures that writing to such a page is a no-op, except on
    some target architectures (like Sparc) where it may cause a CPU fault.

    ROM page phys_offset value:
    +-----------------------------------------------------+
    |            high_addr            |    IO_MEM_ROM     |
    +-----------------------------------------------------+

  - Dirty RAM page tracking is implemented by using special
    phys_offset values with mem_type=IO_MEM_NOTDIRTY. Note that these
    values do not appear directly in the physical page map, but in
    the CPU TLB cache (explained later).

    Non-dirty RAM page phys_offset value (CPU TLB cache only):
    +-----------------------------------------------------+
    |            high_addr            |  IO_MEM_NOTDIRTY  |
    +-----------------------------------------------------+

  - Other pages are I/O pages, and their high_addr value will
    be 0 / ignored:

    I/O page phys_offset value:
    +----------------------------------------------------------+
    |                0                | mem_type_index | flags |
    +----------------------------------------------------------+

    Note that when reading from or writing to I/O pages, the lowest
    PAGE_BITS bits of the corresponding hwaddr value will be added
    to the page's |region_offset| value. This new address is passed
    to the read/write callback as the 'i/o address' for the operation.

  - As a special exception, if the I/O page's IO_MEM_ROMD flag is
    set, then high_addr is not 0 but holds the high bits of the
    ram_addr_t that backs the page's contents on reads. On write
    operations, though, the I/O region type's write callback is called
    instead.

    ROMD I/O page phys_offset value:
    +----------------------------------------------------------+
    |            high_addr            | mem_type_index | flags |
    +----------------------------------------------------------+

    Note that |region_offset| is ignored when reading from such pages;
    it is only used when writing to the I/O page.
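The sketch below combines the rules above to show how a one-byte read or write might be dispatched from a page's |phys_offset| and |region_offset|. Every name here is hypothetical: |ToyPhysPage|, |toy_read8()|, |toy_write8()|, the single recording device and the 4-page toy RAM. The sketch assumes 4 KiB pages and 3 flag bits. It treats ROM writes as no-ops, ignoring targets that fault, and leaves unassigned and IO_MEM_NOTDIRTY pages out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12   /* assumed 4 KiB pages */
#define TOY_PAGE_OFFSET_MASK (((uint64_t)1 << TOY_PAGE_BITS) - 1)
#define TOY_FLAG_BITS 3    /* assumed width of the flags field */
#define TOY_IO_MEM_RAM        (0u << TOY_FLAG_BITS)
#define TOY_IO_MEM_ROM        (1u << TOY_FLAG_BITS)
#define TOY_IO_MEM_UNASSIGNED (2u << TOY_FLAG_BITS)
#define TOY_IO_MEM_NOTDIRTY   (3u << TOY_FLAG_BITS)
#define TOY_IO_MEM_ROMD       0x1u

/* Toy RAM address space: 4 pages, indexed directly by RAM address. */
static uint8_t toy_ram[4 << TOY_PAGE_BITS];

/* A single hypothetical device that records the last i/o access. */
static uint64_t dev_last_addr;
static uint8_t dev_last_value;
static uint8_t dev_read(uint64_t io_addr) { dev_last_addr = io_addr; return 0x5a; }
static void dev_write(uint64_t io_addr, uint8_t value)
{
    dev_last_addr = io_addr;
    dev_last_value = value;
}

typedef struct {
    uint64_t phys_offset;
    uint64_t region_offset;
} ToyPhysPage;

static uint8_t toy_read8(const ToyPhysPage *pg, uint64_t hwaddr)
{
    uint64_t in_page = hwaddr & TOY_PAGE_OFFSET_MASK;
    uint32_t mem_type = (uint32_t)(pg->phys_offset & TOY_PAGE_OFFSET_MASK);
    uint64_t high_addr = pg->phys_offset & ~TOY_PAGE_OFFSET_MASK;

    /* Unassigned and NOTDIRTY pages are not modelled in this sketch. */
    assert(mem_type != TOY_IO_MEM_UNASSIGNED && mem_type != TOY_IO_MEM_NOTDIRTY);
    /* RAM, ROM and ROMD pages read from the RAM address space and ignore
     * region_offset (the reserved types never have bit 0 set). */
    if (mem_type == TOY_IO_MEM_RAM || mem_type == TOY_IO_MEM_ROM ||
        (mem_type & TOY_IO_MEM_ROMD)) {
        assert(high_addr + in_page < sizeof(toy_ram));
        return toy_ram[high_addr + in_page];
    }
    /* Plain I/O: the device sees region_offset + offset within the page. */
    return dev_read(pg->region_offset + in_page);
}

static void toy_write8(const ToyPhysPage *pg, uint64_t hwaddr, uint8_t value)
{
    uint64_t in_page = hwaddr & TOY_PAGE_OFFSET_MASK;
    uint32_t mem_type = (uint32_t)(pg->phys_offset & TOY_PAGE_OFFSET_MASK);
    uint64_t high_addr = pg->phys_offset & ~TOY_PAGE_OFFSET_MASK;

    assert(mem_type != TOY_IO_MEM_UNASSIGNED && mem_type != TOY_IO_MEM_NOTDIRTY);
    if (mem_type == TOY_IO_MEM_RAM) {
        assert(high_addr + in_page < sizeof(toy_ram));
        toy_ram[high_addr + in_page] = value;
        return;
    }
    if (mem_type == TOY_IO_MEM_ROM)
        return;   /* ignored here; some targets would fault instead */
    /* Plain I/O and ROMD pages both go to the device's write callback. */
    dev_write(pg->region_offset + in_page, value);
}
```

A ROMD page behaves like ROM on reads but like plain I/O on writes. That is why its |region_offset| only comes into play on the write path.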