vlan: Precise RX stats accounting

With multi queue devices, its possible that several cpus call
vlan RX routines simultaneously for the same vlan device.

We update RX stats counter without any locking, so we can
get slightly wrong counters.

One possible fix is to use percpu counters, to get precise
accounting and also get guarantee of no cache line ping pongs
between cpus.

Note: this adds 16 bytes (32 bytes on 64bit arches) of percpu
data per vlan device.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 971d375..e75a2f3 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -31,7 +31,7 @@
 int vlan_hwaccel_do_receive(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
-	struct net_device_stats *stats;
+	struct vlan_rx_stats     *rx_stats;
 
 	skb->dev = vlan_dev_info(dev)->real_dev;
 	netif_nit_deliver(skb);
@@ -40,15 +40,17 @@
 	skb->priority = vlan_get_ingress_priority(dev, skb->vlan_tci);
 	skb->vlan_tci = 0;
 
-	stats = &dev->stats;
-	stats->rx_packets++;
-	stats->rx_bytes += skb->len;
+	rx_stats = per_cpu_ptr(vlan_dev_info(dev)->vlan_rx_stats,
+			       smp_processor_id());
+
+	rx_stats->rx_packets++;
+	rx_stats->rx_bytes += skb->len;
 
 	switch (skb->pkt_type) {
 	case PACKET_BROADCAST:
 		break;
 	case PACKET_MULTICAST:
-		stats->multicast++;
+		rx_stats->multicast++;
 		break;
 	case PACKET_OTHERHOST:
 		/* Our lower layer thinks this is not local, let's make sure.