iommu: arm-smmu: Limit maximum batch size while holding spinlocks

Previously, io-pgtable-arm.c was enhanced to loop through an entire
sg_table in a single call, avoiding the overhead of repeatedly
traversing back and forth between the upper-layer APIs in iommu.c and
the low-level APIs in io-pgtable-arm.c.

However, with the CPU running at its minimum frequency and with
~14 MB sg_tables, the irqsoff tracer reported latencies of over 1 ms.
Target a maximum latency of 500 us under these conditions by splitting
sg_tables into smaller pieces, so that the spinlock is only held while
one piece is being processed.
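The splitting idea can be illustrated with a small userspace model in C.
This is a sketch, not the driver code: fake_lock, map_one_segment() and
map_sg_batched() are hypothetical names, the "lock" is a counter standing
in for the driver's spinlock, and max_batch stands in for whatever limit
the patch actually uses (this message does not state its value).

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the driver's spinlock: it only records
     * whether it is held and how many times it was acquired. */
    struct fake_lock {
        int held;
        unsigned int acquisitions;
    };

    static void fake_lock_acquire(struct fake_lock *l)
    {
        assert(!l->held);
        l->held = 1;
        l->acquisitions++;
    }

    static void fake_lock_release(struct fake_lock *l)
    {
        assert(l->held);
        l->held = 0;
    }

    /* Hypothetical per-segment step; in the driver this would be the
     * io-pgtable-arm.c page-table update. Here it just counts bytes. */
    static void map_one_segment(size_t *mapped, size_t len)
    {
        *mapped += len;
    }

    /*
     * Map an array of segment lengths, dropping the lock once the bytes
     * mapped under the current hold would exceed max_batch. At least one
     * segment is mapped per hold so the loop always makes progress.
     */
    static size_t map_sg_batched(const size_t *seg_len, size_t nents,
                                 size_t max_batch, struct fake_lock *lock)
    {
        size_t mapped = 0;
        size_t i = 0;

        while (i < nents) {
            size_t batch = 0;

            fake_lock_acquire(lock);
            do {
                map_one_segment(&mapped, seg_len[i]);
                batch += seg_len[i];
                i++;
            } while (i < nents && batch + seg_len[i] <= max_batch);
            fake_lock_release(lock);
            /* In a real kernel, pending interrupts could run here. */
        }
        return mapped;
    }

With sixteen 1 MiB segments and max_batch set to 4 MiB, the lock is taken
four times instead of once. The extra lock round-trips are one plausible
source of the small slowdown in the measurements below. Note that this
sketch never splits a single segment, so one oversized segment can still
exceed the target hold time.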

This change is expected to cost some peak performance, mainly for
larger buffers:
Before (8998):
    size        iommu_map_sg      iommu_unmap
      4K            5.672 us        10.743 us
     64K            6.690 us        10.684 us
      2M           48.981 us        19.038 us
     12M          259.648 us       154.106 us
     20M          429.331 us       158.832 us

After:
    size        iommu_map_sg      iommu_unmap
      4K            5.731 us        10.578 us
     64K            6.923 us        10.691 us
      2M           53.107 us        19.278 us
     12M          291.640 us       153.895 us
     20M          477.925 us       158.704 us

Before (sdm845):
(average over 10 iterations)
    size        iommu_map_sg      iommu_unmap
      4K           16.750 us         9.302 us
     64K           18.229 us         9.349 us
      2M           90.364 us        19.864 us
     12M          477.432 us        33.161 us
     20M          774.515 us        43.656 us

After:
(average over 10 iterations)
    size        iommu_map_sg      iommu_unmap
      4K           16.614 us         9.364 us
     64K           18.187 us         9.380 us
      2M           96.494 us        20.036 us
     12M          504.958 us        44.541 us
     20M          838.952 us        44.583 us
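To put the tables in perspective, the relative change per row can be
computed directly from the measured times. A minimal C helper for that;
pct_change() and max_pct_change() are illustrative names, not part of
the patch:

    #include <stddef.h>

    /* Relative change in percent from a "before" to an "after" timing.
     * Positive values mean the operation became slower. */
    static double pct_change(double before_us, double after_us)
    {
        return (after_us - before_us) / before_us * 100.0;
    }

    /* Largest relative change across n rows; requires n >= 1. */
    static double max_pct_change(const double *before_us,
                                 const double *after_us, size_t n)
    {
        double worst = pct_change(before_us[0], after_us[0]);

        for (size_t i = 1; i < n; i++) {
            double p = pct_change(before_us[i], after_us[i]);

            if (p > worst)
                worst = p;
        }
        return worst;
    }

Fed the 8998 iommu_map_sg columns, max_pct_change() returns roughly
12.3%, at the 12M row; the 20M row alone is about 11.3%.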

Change-Id: I18cf0e86b4de7183c06684129a835a8806263193
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>