[SPARC64]: Inline membar()'s again.

Since GCC has to emit a call and a delay slot to the
out-of-line "membar" routines in arch/sparc64/lib/mb.S
it is much better to just do the necessary predicted
branch inline instead as:

	ba,pt	%xcc, 1f
	 membar	#whatever
1:

instead of the current:

	call	membar_foo
	 dslot

because this way GCC is not required to allocate a stack
frame if the function can be a leaf function.

This also makes this bug fix easier to backport to 2.4.x

Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/lib/Makefile b/arch/sparc64/lib/Makefile
index 6201f10..40dbeec 100644
--- a/arch/sparc64/lib/Makefile
+++ b/arch/sparc64/lib/Makefile
@@ -12,7 +12,7 @@
 	 U1memcpy.o U1copy_from_user.o U1copy_to_user.o \
 	 U3memcpy.o U3copy_from_user.o U3copy_to_user.o U3patch.o \
 	 copy_in_user.o user_fixup.o memmove.o \
-	 mcount.o ipcsum.o rwsem.o xor.o find_bit.o delay.o mb.o
+	 mcount.o ipcsum.o rwsem.o xor.o find_bit.o delay.o
 
 lib-$(CONFIG_DEBUG_SPINLOCK) += debuglocks.o
 lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o