x86: atomic64: Reduce size of functions

cmpxchg8b is a huge instruction in terms of register footprint,
we almost never want to inline it, not even within the same
code module.

GCC 4.3 still messes up for two functions, under-judging the
true cost of this instruction - so annotate two key functions
to reduce the bloat:

arch/x86/lib/atomic64_32.o:

   text	   data	    bss	    dec	    hex	filename
   1763	      0	      0	   1763	    6e3	atomic64_32.o.before
    435	      0	      0	    435	    1b3	atomic64_32.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 6195962..a910238 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -4,7 +4,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/atomic.h>
 
-static inline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
+static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
 {
 	u32 low = new;
 	u32 high = new >> 32;
@@ -74,7 +74,7 @@
  *
  * Atomically adds @delta to @ptr and returns @delta + *@ptr
  */
-u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
+noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
 {
 	/*
 	 * Try first with a (probably incorrect) assumption about