MIPS: Collect FPU emulator statistics per-CPU.

On SMP systems, the collection of statistics can cause cache line
bouncing in the lines associated with the counters.  Also there are
races incrementing the counters on multiple CPUs.

To fix both problems, we collect the statistics in per-CPU variables,
and add them up in the debugfs read operation.

As a test I ran the LTP float_bessel test on a 12 CPU Octeon system.

Without CONFIG_DEBUG_FS :             2602 seconds.
With CONFIG_DEBUG_FS:                 2640 seconds.
With non-cpu-local atomic statistics: 14569 seconds.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
3 files changed