slub: Do not cross cacheline boundaries for very small objects

SLUB should pack even small objects nicely into cachelines if that is what
has been asked for. Use the same algorithm as SLAB for this.

The effect of this patch for a system with a cacheline size of 64
bytes is that the 24 byte sized slab caches will now put exactly
2 objects into a cacheline instead of 3 with some overlap into
the next cacheline. This reduces the object density in a 4k slab
from 170 to 128 objects (same as SLAB).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
diff --git a/mm/slub.c b/mm/slub.c
index a96e11c..96d63eb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1856,12 +1856,15 @@
 	 * The hardware cache alignment cannot override the specified
 	 * alignment though. If that is greater then use it.
 	 */
-	if ((flags & SLAB_HWCACHE_ALIGN) &&
-			size > cache_line_size() / 2)
-		return max_t(unsigned long, align, cache_line_size());
+	if (flags & SLAB_HWCACHE_ALIGN) {
+		unsigned long ralign = cache_line_size();
+		while (size <= ralign / 2)
+			ralign /= 2;
+		align = max(align, ralign);
+	}
 
 	if (align < ARCH_SLAB_MINALIGN)
-		return ARCH_SLAB_MINALIGN;
+		align = ARCH_SLAB_MINALIGN;
 
 	return ALIGN(align, sizeof(void *));
 }