net: output path optimizations

1) Avoid dirtying neighbour's confirmed field.

  TCP workloads hits this cache line for each incoming ACK.
  Lets write n->confirmed only if there is a jiffie change.

2) Optimize neigh_hh_output() for the common Ethernet case, were
   hh_len is less than 16 bytes. Replace the memcpy() call
   by two inlined 64bit load/stores on x86_64.

Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)

24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.

before : 2.247s, 2.235s, 2.247s, 2.318s
after  : 1.884s, 1.905s, 1.891s, 1.895s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 files changed