/*P:900
 * This is the Switcher: code which sits at 0xFFC00000 (or 0xFFE00000) astride
 * both the Host and Guest to do the low-level Guest<->Host switch. It is as
 * simple as it can be made, but it's naturally very specific to x86.
 *
 * You have now completed Preparation. If this has whetted your appetite; if you
 * are feeling invigorated and refreshed then the next, more challenging stage
 * can be found in "make Guest".
 :*/

/*M:012
 * Lguest is meant to be simple: my rule of thumb is that 1% more LOC must
 * gain at least 1% more performance. Since neither LOC nor performance can be
 * measured beforehand, it generally means implementing a feature then deciding
 * if it's worth it. And once it's implemented, who can say no?
 *
 * This is why I haven't implemented this idea myself. I want to, but I
 * haven't. You could, though.
 *
 * The main place where lguest performance sucks is Guest page faulting. When
 * a Guest userspace process hits an unmapped page we switch back to the Host,
 * walk the page tables, find it's not mapped, switch back to the Guest page
 * fault handler, which calls a hypercall to set the page table entry, then
 * finally returns to userspace. That's two round-trips.
 *
 * If we had a small walker in the Switcher, we could quickly check the Guest
 * page table and if the page isn't mapped, immediately reflect the fault back
 * into the Guest. This means the Switcher would have to know the top of the
 * Guest page table and the page fault handler address.
 *
 * For simplicity, the Guest should only handle the case where the privilege
 * level of the fault is 3 and probably only not present or write faults. It
 * should also detect recursive faults, and hand the original fault to the
 * Host (which is actually really easy).
 *
 * Two questions remain. Would the performance gain outweigh the complexity?
 * And who would write the verse documenting it?
:*/

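/*
 * A rough sketch of the check the M:012 note imagines, written in C for
 * readability (a real walker would have to live in the Switcher's assembler).
 * Everything named here is hypothetical: should_reflect_fault(), the flat
 * guest_mem array standing in for Guest memory, and the constants. It assumes
 * two-level, non-PAE i386 page tables with 4 KiB pages, ignores large pages
 * and the recursive-fault case, and only answers "would the Guest's own
 * tables have faulted here?".
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u  // entry maps something
#define PTE_RW      0x2u  // entry allows writes
#define ERR_WRITE   0x2u  // page-fault error code: the access was a write
#define ERR_USER    0x4u  // page-fault error code: fault came from privilege level 3

// guest_mem models Guest-physical memory as 32-bit words (assumption for the
// sketch); pgdir_phys is the Guest-physical address of the top-level table.
bool should_reflect_fault(const uint32_t *guest_mem, uint32_t pgdir_phys,
			  uint32_t vaddr, uint32_t errcode)
{
	// Only user-mode faults are handled here; anything else goes to the Host.
	if (!(errcode & ERR_USER))
		return false;

	// Top level: bits 31..22 of the address pick one of 1024 entries.
	uint32_t pde = guest_mem[pgdir_phys / 4 + (vaddr >> 22)];
	if (!(pde & PTE_PRESENT))
		return true;

	// Second level: bits 21..12 pick the entry in the table the pde names.
	uint32_t pte = guest_mem[(pde & ~0xFFFu) / 4 + ((vaddr >> 12) & 0x3FFu)];
	if (!(pte & PTE_PRESENT))
		return true;

	// Present, so only a write to a read-only page is the Guest's problem.
	return (errcode & ERR_WRITE) && !(pte & PTE_RW);
}
```
 *
 * A true result means the fault could be reflected straight into the Guest;
 * false means the Host still has to look (for example to fill in the shadow
 * page tables).
 */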
/*M:011
 * Lguest64 handles NMI. This gave me NMI envy (until I looked at their
 * code). It's worth doing though, since it would let us use oprofile in the
 * Host when a Guest is running.
:*/

/*S:100
 * Welcome to the Switcher itself!
 *
 * This file contains the low-level code which changes the CPU to run the Guest
 * code, and returns to the Host when something happens. Understand this, and
 * you understand the heart of our journey.
 *
 * Because this is in assembler rather than C, our tale switches from prose to
 * verse. First I tried limericks:
 *
 *	There once was an eax reg,
 *	To which our pointer was fed,
 *	It needed an add,
 *	Which asm-offsets.h had
 *	But this limerick is hurting my head.
 *
 * Next I tried haikus, but fitting the required reference to the seasons in
 * every stanza was quickly becoming tiresome:
 *
 *	The %eax reg
 *	Holds "struct lguest_pages" now:
 *	Cherry blossoms fall.
 *
 * Then I started with Heroic Verse, but the rhyming requirement leeched away
 * the content density and led to some uniquely awful oblique rhymes:
 *
 *	These constants are coming from struct offsets
 *	For use within the asm switcher text.
 *
 * Finally, I settled for something between heroic hexameter, and normal prose
 * with inappropriate linebreaks. Anyway, it aint no Shakespeare.
 */

// Not all kernel headers work from assembler
// But these ones are needed: the ENTRY() define
// And constants extracted from struct offsets
// To avoid magic numbers and breakage:
// Should they change the compiler can't save us
// Down here in the depths of assembler code.
#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/page.h>
#include <asm/segment.h>
#include <asm/lguest.h>

// We mark the start of the code to copy
// It's placed in .text tho it's never run here
// You'll see the trick macro at the end
// Which interleaves data and text to effect.
.text
ENTRY(start_switcher_text)

// When we reach switch_to_guest we have just left
// The safe and comforting shores of C code
// %eax has the "struct lguest_pages" to use
// Where we save state and still see it from the Guest
// And %ebx holds the Guest shadow pagetable:
// Once set we have truly left Host behind.
ENTRY(switch_to_guest)
	// We told gcc all its regs could fade,
	// Clobbered by our journey into the Guest
	// We could have saved them, if we tried
	// But time is our master and cycles count.

	// Segment registers must be saved for the Host
	// We push them on the Host stack for later
	pushl	%es
	pushl	%ds
	pushl	%gs
	pushl	%fs
	// But the compiler is fickle, and heeds
	// No warning of %ebp clobbers
	// When frame pointers are used. That register
	// Must be saved and restored or chaos strikes.
	pushl	%ebp
	// The Host's stack is done, now save it away
	// In our "struct lguest_pages" at offset
	// Distilled into asm-offsets.h
	movl	%esp, LGUEST_PAGES_host_sp(%eax)

	// All saved and there's now five steps before us:
	// Stack, GDT, IDT, TSS
	// Then last of all the page tables are flipped.

	// Yet beware that our stack pointer must be
	// Always valid lest an NMI hits
	// %edx does the duty here as we juggle
	// %eax is lguest_pages: our stack lies within.
	movl	%eax, %edx
	addl	$LGUEST_PAGES_regs, %edx
	movl	%edx, %esp

	// The Guest's GDT we so carefully
	// Placed in the "struct lguest_pages" before
	lgdt	LGUEST_PAGES_guest_gdt_desc(%eax)

	// The Guest's IDT we did partially
	// Copy to "struct lguest_pages" as well.
	lidt	LGUEST_PAGES_guest_idt_desc(%eax)

	// The TSS entry which controls traps
	// Must be loaded up with "ltr" now:
	// The GDT entry that TSS uses
	// Changes type when we load it: damn Intel!
	// For after we switch over our page tables
	// That entry will be read-only: we'd crash.
	movl	$(GDT_ENTRY_TSS*8), %edx
	ltr	%dx

	// Look back now, before we take this last step!
	// The Host's TSS entry was also marked used;
	// Let's clear it again for our return.
	// The GDT descriptor of the Host
	// Points to the table after two "size" bytes
	movl	(LGUEST_PAGES_host_gdt_desc+2)(%eax), %edx
	// Clear "used" from type field (byte 5, bit 2)
	andb	$0xFD, (GDT_ENTRY_TSS*8 + 5)(%edx)

	// Once our page table's switched, the Guest is live!
	// The Host fades as we run this final step.
	// Our "struct lguest_pages" is now read-only.
	movl	%ebx, %cr3

	// The page table change did one tricky thing:
	// The Guest's register page has been mapped
	// Writable under our %esp (stack) --
	// We can simply pop off all Guest regs.
	popl	%eax
	popl	%ebx
	popl	%ecx
	popl	%edx
	popl	%esi
	popl	%edi
	popl	%ebp
	popl	%gs
	popl	%fs
	popl	%ds
	popl	%es

	// Near the base of the stack lurk two strange fields
	// Which we fill as we exit the Guest
	// These are the trap number and its error
	// We can simply step past them on our way.
	addl	$8, %esp

	// The last five stack slots hold return address
	// And everything needed to switch privilege
	// From Switcher's level 0 to Guest's 1,
	// And the stack where the Guest had last left it.
	// Interrupts are turned back on: we are Guest.
	iret

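/*
 * The pops, the "addl $8" and the final iret in switch_to_guest walk one
 * contiguous block of memory. The C struct below is a sketch of that block,
 * reconstructed from the pop order in this file and not copied from
 * asm/lguest.h; the name regs_sketch and the two helper functions are
 * made up for illustration.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct regs_sketch {
	// Popped one by one, in this order, by switch_to_guest.
	uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
	uint32_t gs, fs, ds, es;
	// Stepped over with "addl $8, %esp".
	uint32_t trapnum, errcode;
	// Consumed by iret when it drops to a less privileged level.
	uint32_t eip, cs, eflags, esp, ss;
};

// Bytes consumed by the eleven popl instructions.
size_t popped_bytes(void)
{
	return offsetof(struct regs_sketch, trapnum);
}

// Offset at which %esp must sit when iret executes.
size_t iret_frame_offset(void)
{
	return offsetof(struct regs_sketch, eip);
}
```
 *
 * The gap between the two offsets is the 8 bytes skipped by the addl, and
 * what remains after iret_frame_offset() is the five-slot frame the verse
 * above describes.
 */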
// We tread two paths to switch back to the Host
// Yet both must save Guest state and restore Host
// So we put the routine in a macro.
#define SWITCH_TO_HOST \
	/* We save the Guest state: all registers first \
	 * Laid out just as "struct lguest_regs" defines */ \
	pushl	%es; \
	pushl	%ds; \
	pushl	%fs; \
	pushl	%gs; \
	pushl	%ebp; \
	pushl	%edi; \
	pushl	%esi; \
	pushl	%edx; \
	pushl	%ecx; \
	pushl	%ebx; \
	pushl	%eax; \
	/* Our stack and our code are using segments \
	 * Set in the TSS and IDT \
	 * Yet if we were to touch data we'd use \
	 * Whatever data segment the Guest had. \
	 * Load the lguest ds segment for now. */ \
	movl	$(LGUEST_DS), %eax; \
	movl	%eax, %ds; \
	/* So where are we? Which CPU, which struct? \
	 * The stack is our clue: our TSS starts \
	 * It at the end of "struct lguest_pages". \
	 * Or we may have stumbled while restoring \
	 * Our Guest segment regs while in switch_to_guest, \
	 * The fault pushed atop that part-unwound stack. \
	 * If we round the stack down to the page start \
	 * We're at the start of "struct lguest_pages". */ \
	movl	%esp, %eax; \
	andl	$(~((1 << PAGE_SHIFT) - 1)), %eax; \
	/* Save our trap number: the switch will obscure it \
	 * (In the Host the Guest regs are not mapped here) \
	 * %ebx holds it safe for deliver_to_host */ \
	movl	LGUEST_PAGES_regs_trapnum(%eax), %ebx; \
	/* The Host GDT, IDT and stack! \
	 * All these lie safely hidden from the Guest: \
	 * We must return to the Host page tables \
	 * (Hence that was saved in struct lguest_pages) */ \
	movl	LGUEST_PAGES_host_cr3(%eax), %edx; \
	movl	%edx, %cr3; \
	/* As before, when we looked back at the Host \
	 * As we left and marked TSS unused \
	 * So must we now for the Guest left behind. */ \
	andb	$0xFD, (LGUEST_PAGES_guest_gdt+GDT_ENTRY_TSS*8+5)(%eax); \
	/* Switch to Host's GDT, IDT. */ \
	lgdt	LGUEST_PAGES_host_gdt_desc(%eax); \
	lidt	LGUEST_PAGES_host_idt_desc(%eax); \
	/* Restore the Host's stack where its saved regs lie */ \
	movl	LGUEST_PAGES_host_sp(%eax), %esp; \
	/* Last the TSS: our Host is returned */ \
	movl	$(GDT_ENTRY_TSS*8), %edx; \
	ltr	%dx; \
	/* Restore now the regs saved right at the first. */ \
	popl	%ebp; \
	popl	%fs; \
	popl	%gs; \
	popl	%ds; \
	popl	%es

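/*
 * SWITCH_TO_HOST finds "struct lguest_pages" by rounding %esp down to a page
 * boundary. GAS gives "<<" higher precedence than binary "-", so the mask in
 * that andl is (1 << PAGE_SHIFT) - 1 however it is parenthesised; C reads the
 * unparenthesised spelling differently. The sketch below shows both readings.
 * PAGE_SHIFT_SKETCH (assumed to be 12, i.e. 4 KiB pages) and both function
 * names are invented for the example.
 *
```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12  // assumption: 4 KiB pages, as on i386

// What the Switcher computes: clear the low PAGE_SHIFT bits of the address.
uint32_t page_start(uint32_t addr)
{
	return addr & ~((UINT32_C(1) << PAGE_SHIFT_SKETCH) - 1);
}

// What C precedence would make of "1 << PAGE_SHIFT - 1": only one bit
// (bit PAGE_SHIFT - 1) is cleared, which is not a page mask at all.
uint32_t c_precedence_reading(uint32_t addr)
{
	return addr & ~(UINT32_C(1) << (PAGE_SHIFT_SKETCH - 1));
}
```
 *
 * For a stack pointer such as 0xFFC01F3C the first function yields the page
 * start 0xFFC01000, while the second merely clears bit 11.
 */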
// The first path is trod when the Guest has trapped:
// (Which trap it was has been pushed on the stack).
// We need only switch back, and the Host will decode
// Why we came home, and what needs to be done.
return_to_host:
	SWITCH_TO_HOST
	iret

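/*
 * Both directions of the switch clear the "used" mark on a TSS descriptor
 * with "andb $0xFD" on byte 5 of its eight-byte GDT entry. On x86 that byte
 * is the access byte, and for a 32-bit TSS its low type nibble is 0x9 when
 * available and 0xB once "ltr" has marked it busy. A small C model of the
 * operation, with invented helper names:
 *
```c
#include <assert.h>
#include <stdint.h>

#define TSS_BUSY_BIT 0x02u  // bit 1 of the access byte (descriptor byte 5)

// Same effect as: andb $0xFD, 5(descriptor)
void clear_tss_busy(uint8_t desc[8])
{
	desc[5] &= 0xFD;
}

// Non-zero when ltr would refuse this descriptor as already in use.
int tss_is_busy(const uint8_t desc[8])
{
	return (desc[5] & TSS_BUSY_BIT) != 0;
}
```
 *
 * An access byte of 0x8B (present, busy TSS) becomes 0x89 (present,
 * available TSS), so the next "ltr" on that entry can succeed.
 */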
// We are led to the second path like so:
// An interrupt, with some cause external
// Has ajerked us rudely from the Guest's code
// Again we must return home to the Host
deliver_to_host:
	SWITCH_TO_HOST
	// But now we must go home via that place
	// Where that interrupt was supposed to go
	// Had we not been ensconced, running the Guest.
	// Here we see the trickiness of run_guest_once():
	// The Host stack is formed like an interrupt
	// With EIP, CS and EFLAGS layered.
	// Interrupt handlers end with "iret"
	// And that will take us home at long long last.

	// But first we must find the handler to call!
	// The IDT descriptor for the Host
	// Has two bytes for size, and four for address:
	// %edx will hold it for us for now.
	movl	(LGUEST_PAGES_host_idt_desc+2)(%eax), %edx
	// We now know the table address we need,
	// And saved the trap's number inside %ebx.
	// Yet the pointer to the handler is smeared
	// Across the bits of the table entry.
	// What oracle can tell us how to extract
	// From such a convoluted encoding?
	// I consulted gcc, and it gave
	// These instructions, which I gladly credit:
	leal	(%edx,%ebx,8), %eax
	movzwl	(%eax),%edx
	movl	4(%eax), %eax
	xorw	%ax, %ax
	orl	%eax, %edx
	// Now the address of the handler's in %edx
	// We call it now: its "iret" drops us home.
	jmp	*%edx

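/*
 * The five instructions that end deliver_to_host reassemble a handler
 * address from an eight-byte IDT gate: the low 16 bits of the offset live in
 * bytes 0-1 and the high 16 bits in bytes 6-7. A C rendering, as a sketch
 * only (gate_handler() is an invented name, and the table is read byte by
 * byte so the example does not depend on the machine running it):
 *
```c
#include <assert.h>
#include <stdint.h>

uint32_t gate_handler(const uint8_t *idt, uint32_t trapnum)
{
	// leal (%edx,%ebx,8), %eax : each gate is eight bytes long
	const uint8_t *gate = idt + trapnum * 8;

	// movzwl (%eax), %edx : low half of the offset, zero-extended
	uint32_t low = (uint32_t)gate[0] | ((uint32_t)gate[1] << 8);

	// movl 4(%eax), %eax ; xorw %ax, %ax : keep only the top 16 bits
	uint32_t high = ((uint32_t)gate[6] << 16) | ((uint32_t)gate[7] << 24);

	// orl %eax, %edx : glue the halves together
	return high | low;
}
```
 *
 * Bytes 2-3 (the code segment selector) and bytes 4-5 (flags and type) are
 * ignored, just as the assembler discards them.
 */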
// Every interrupt can come to us here
// But we must truly tell each apart.
// They number two hundred and fifty six
// And each must land in a different spot,
// Push its number on stack, and join the stream.

// And worse, a mere seven of the traps stand apart
// And push on their stack an addition:
// An error number, thirty two bits long
// So we punish the other two forty-nine
// And make them push a zero so they match.

// Yet two fifty six entries is long
// And all will look most the same as the last
// So we create a macro which can make
// As many entries as we need to fill.

// Note the change to .data then .text:
// We plant the address of each entry
// Into a (data) table for the Host
// To know where each Guest interrupt should go.
.macro IRQ_STUB N TARGET
	.data; .long 1f; .text; 1:
	// Trap eight, ten through fourteen and seventeen
	// Supply an error number. Else zero.
	.if (\N <> 8) && (\N < 10 || \N > 14) && (\N <> 17)
	pushl	$0
	.endif
	pushl	$\N
	jmp	\TARGET
	ALIGN
.endm

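/*
 * The .if condition in IRQ_STUB decides which stubs push a dummy zero. Its
 * complement, the set of traps for which the CPU itself pushes an error
 * code, can be written as a C predicate and counted. The function names are
 * illustrative only.
 *
```c
#include <assert.h>
#include <stdbool.h>

// True for the vectors IRQ_STUB leaves alone: 8, 10 through 14, and 17.
bool trap_pushes_error_code(unsigned n)
{
	return n == 8 || (n >= 10 && n <= 14) || n == 17;
}

// How many of the 256 vectors supply their own error code.
unsigned count_error_code_traps(void)
{
	unsigned count = 0;
	for (unsigned n = 0; n < 256; n++)
		if (trap_pushes_error_code(n))
			count++;
	return count;
}
```
 *
 * Every other vector gets the extra "pushl $0", so all stubs leave the same
 * two words (error code, then trap number) on the stack.
 */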
// This macro creates numerous entries
// Using GAS macros which out-power C's.
.macro IRQ_STUBS FIRST LAST TARGET
	irq=\FIRST
	.rept \LAST-\FIRST+1
	IRQ_STUB irq \TARGET
	irq=irq+1
	.endr
.endm

// Here's the marker for our pointer table
// Laid in the data section just before
// Each macro places the address of code
// Forming an array: each one points to text
// Which handles interrupt in its turn.
.data
.global default_idt_entries
default_idt_entries:
.text
	// The first two traps go straight back to the Host
	IRQ_STUBS 0 1 return_to_host
	// We'll say nothing, yet, about NMI
	IRQ_STUB 2 handle_nmi
	// Other traps also return to the Host
	IRQ_STUBS 3 31 return_to_host
	// All interrupts go via their handlers
	IRQ_STUBS 32 127 deliver_to_host
	// 'Cept system calls coming from userspace
	// Are to go to the Guest, never the Host.
	IRQ_STUB 128 return_to_host
	IRQ_STUBS 129 255 deliver_to_host

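/*
 * The five IRQ_STUB/IRQ_STUBS lines above amount to a routing table from
 * vector number to one of three labels. Expressed as a C function (the enum
 * and the function name are made up for this sketch):
 *
```c
#include <assert.h>

enum stub_target { RETURN_TO_HOST, HANDLE_NMI, DELIVER_TO_HOST };

// Which label the stub for a given vector (0..255) jumps to.
enum stub_target stub_target_for(unsigned vector)
{
	if (vector == 2)
		return HANDLE_NMI;        // the NMI gets its own tiny handler
	if (vector < 32 || vector == 128)
		return RETURN_TO_HOST;    // traps, and the system call vector
	return DELIVER_TO_HOST;           // external interrupts
}

// Number of vectors that are routed through the Host's own handlers.
unsigned count_delivered(void)
{
	unsigned count = 0;
	for (unsigned v = 0; v < 256; v++)
		if (stub_target_for(v) == DELIVER_TO_HOST)
			count++;
	return count;
}
```
 *
 * That leaves 223 vectors going via deliver_to_host, 32 via return_to_host
 * and one, the NMI, via handle_nmi.
 */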
// The NMI, what a fabulous beast
// Which swoops in and stops us no matter that
// We're suspended between heaven and hell,
// (Or more likely between the Host and Guest)
// When in it comes! We are dazed and confused
// So we do the simplest thing which one can.
// Though we've pushed the trap number and zero
// We discard them, return, and hope we live.
handle_nmi:
	addl	$8, %esp
	iret

// We are done; all that's left is Mastery
// And "make Mastery" is a journey long
// Designed to make your fingers itch to code.

// Here ends the text, the file and poem.
ENTRY(end_switcher_text)