openvswitch: optimize flow compare and mask functions

Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, The original patch only optimizes for architectures
support unaligned machine word access. This patch optimizes for all
architectures.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2 files changed