dmaengine: mv_xor: optimize performance by using a subset of the XOR channels

Due to how async_tx behaves internally, having more XOR channels than
CPUs is actually hurting performance more than it improves it, because
memcpy requests get scheduled on a different channel than the XOR
requests, but async_tx will still wait for the completion of the
memcpy requests before scheduling the XOR requests.

It is in fact more efficient to have at most one channel per CPU,
which this patch implements by limiting the number of channels per
engine, and the number of engines registered depending on the number
of availables CPUs.

Marvell platforms are currently available in one CPU, two CPUs and
four CPUs configurations:

 - in the configurations with one CPU, only one channel from one
   engine is used.

 - in the configurations with two CPUs, only one channel from each
   engine is used (they are two XOR engines)

 - in the configurations with four CPUs, both channels of both engines
   are used.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
1 file changed