[SPARC64]: Fix missing fold at end of checksums.

Both csum_partial() and the csum_partial_copy*() family of routines
forget to do a final fold on the computed checksum value on sparc64.
So do the standard Sparc "add + set condition codes, add carry"
sequence, then make sure the high 32-bits of the return value are
clear.

Based upon some excellent detective work and debugging done by
Richard Braun and Samuel Thibault.

Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/lib/csum_copy.S b/arch/sparc64/lib/csum_copy.S
index 71af488..e566c77 100644
--- a/arch/sparc64/lib/csum_copy.S
+++ b/arch/sparc64/lib/csum_copy.S
@@ -221,11 +221,12 @@
 	sll		%g1, 8, %g1
 	or		%o5, %g1, %o4
 
-1:	add		%o3, %o4, %o3
+1:	addcc		%o3, %o4, %o3
+	addc		%g0, %o3, %o3
 
 70:
 	retl
-	 mov		%o3, %o0
+	 srl		%o3, 0, %o0
 
 95:	mov		0, GLOBAL_SPARE
 	brlez,pn	%o2, 4f