crypto: x86/sha1 - fix stack alignment of AVX2 variant

The AVX2 implementation might waste up to a page of stack memory because
of a wrong alignment calculation. This will, in the worst case, increase
the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way too big
for a kernel function. Even worse, it might also allocate *fewer* bytes
than needed if the stack pointer is already aligned because in that case
the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards,
not downwards.
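
Concretely (a worked example, not part of the patch): if the function
is entered with the low bits of %rsp equal to 0xff8, the old sequence
computes

	and	$(0x1000-1), %rbx	# %rbx = 0xff8
	sub	$(8+32), %rbx		# %rbx = 0xfd0 (4048 bytes)
	sub	%rbx, %rsp		# %rsp drops by almost a whole page

whereas with the low bits equal to 0x000, %rbx ends up as -40 and the
final 'sub %rbx, %rsp' moves %rsp 40 bytes *up*, into the caller's
frame.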

Fix these issues by simplifying the alignment calculation to use the
32 byte alignment that is actually needed.
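
With a 32 byte requirement the whole calculation reduces to a single
mask; as a sketch of the rounding (the low bits are illustrative):

	0x...ff8 & ~0x1f = 0x...fe0

i.e. %rsp is rounded down to the next 32 byte boundary, wasting at most
31 bytes instead of up to a page. The original stack pointer is kept in
%rbx, pushed, and restored on exit via 'pop %rsp'.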

Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index 4f34854..bacac22 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -636,9 +636,7 @@
 
 	/* Align stack */
 	mov	%rsp, %rbx
-	and	$(0x1000-1), %rbx
-	sub	$(8+32), %rbx
-	sub	%rbx, %rsp
+	and	$~(0x20-1), %rsp
 	push	%rbx
 	sub	$RESERVE_STACK, %rsp
 
@@ -665,8 +663,7 @@
 	avx2_zeroupper
 
 	add	$RESERVE_STACK, %rsp
-	pop	%rbx
-	add	%rbx, %rsp
+	pop	%rsp
 
 	pop	%r15
 	pop	%r14