openvswitch: use percpu flow stats
Instead of using flow stats per NUMA node, use it per CPU. When using
megaflows, the stats lock can be a bottleneck in scalability.
On a E5-2690 12-core system, usual throughput went from ~4Mpps to
~15Mpps when forwarding between two 40GbE ports with a single flow
configured on the datapath.
This has been tested on a system with possible CPUs 0-7,16-23. After
module removal, there were no corruption on the slab cache.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Cc: pravin shelar <pshelar@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 156a302..ae783f5 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -178,14 +178,14 @@
struct hlist_node node[2];
u32 hash;
} flow_table, ufid_table;
- int stats_last_writer; /* NUMA-node id of the last writer on
+ int stats_last_writer; /* CPU id of the last writer on
* 'stats[0]'.
*/
struct sw_flow_key key;
struct sw_flow_id id;
struct sw_flow_mask *mask;
struct sw_flow_actions __rcu *sf_acts;
- struct flow_stats __rcu *stats[]; /* One for each NUMA node. First one
+ struct flow_stats __rcu *stats[]; /* One for each CPU. First one
* is allocated at flow creation time,
* the rest are allocated on demand
* while holding the 'stats[0].lock'.