[PATCH] uml: TLB operation batching

This adds VM operation batching to skas0.  Rather than context-switching to
and from the userspace stub for each address space change, we write a number
of operations to the stub data page and invoke a different stub that loops
over them, executing them all in one go.

The operations are stored as [ system call number, arg1, arg2, ... ] tuples.

A set is terminated by a system call number of 0.  Single operations, such as
those generated by page faults, are still handled the old way, since that is
slightly more efficient.

For a kernel build, about a quarter of the operations are issued as part of a
set.  These sets averaged ~100 operations in length, so for that quarter, one
stub invocation covers ~100 operations instead of one, greatly reducing the
context-switching overhead.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/arch/um/sys-x86_64/stub.S b/arch/um/sys-x86_64/stub.S
index 31c1492..957f2ef 100644
--- a/arch/um/sys-x86_64/stub.S
+++ b/arch/um/sys-x86_64/stub.S
@@ -13,3 +13,24 @@
 	or	%rcx, %rbx
 	movq	%rax, (%rbx)
 	int3
+
+	.globl batch_syscall_stub
+batch_syscall_stub:
+	movq	$(UML_CONFIG_STUB_DATA >> 32), %rbx
+	salq	$32, %rbx
+	movq	$(UML_CONFIG_STUB_DATA & 0xffffffff), %rcx
+	or	%rcx, %rbx
+	movq	%rbx, %rsp
+again:	pop	%rax
+	cmpq	$0, %rax
+	jz	done
+	pop	%rdi
+	pop	%rsi
+	pop	%rdx
+	pop	%r10
+	pop	%r8
+	pop	%r9
+	syscall
+	mov	%rax, (%rbx)
+	jmp	again
+done:	int3
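
For reference, the new stub builds the 64-bit address of the stub data page
from its two 32-bit halves, points %rsp at the page, and pops each tuple
directly into the x86_64 system call registers (%rdi, %rsi, %rdx, %r10, %r8,
%r9 is the syscall argument order).  After each syscall it stores the return
value at the base of the page, still held in %rbx, so the host can inspect
the last result once the stub traps back via int3; a popped syscall number of
0 ends the loop.  The page it consumes can be pictured as a sequence of the
following (the struct name is illustrative, not from the patch; the stub
itself just pops raw words):

	struct stub_op {
		long syscall;	/* popped into %rax; 0 terminates the set */
		long args[6];	/* popped into %rdi, %rsi, %rdx, %r10, %r8, %r9 */
	};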