net: better pcpu data alignment

Tunnels can force an alignment of their percpu data to reduce number of
cache lines used in fast path, or read in .ndo_get_stats()

percpu_alloc() is a very fine grained allocator, so any small hole will
be used anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index a7a1860..cec0938 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -91,7 +91,7 @@
 	unsigned long	rx_bytes;
 	unsigned long	tx_packets;
 	unsigned long	tx_bytes;
-};
+} __attribute__((aligned(4*sizeof(unsigned long))));
 
 static struct net_device_stats *ipip6_get_stats(struct net_device *dev)
 {