arm: add single instruction fma

vfma is available in the vfpv4 fpu and above, the ACLE standard feature
test for double precision hardware fma support is
  __ARM_FEATURE_FMA && __ARM_FP&8
we need further checks to work around clang bugs (fixed in clang >=7.0)
  && !__SOFTFP__
because __ARM_FP is defined even with -mfloat-abi=soft
  && !BROKEN_VFP_ASM
to disable the single precision code when inline asm handling is broken.

For runtime selection the HWCAP_ARM_VFPv4 hwcap flag can be used, but
that requires further work.
2 files changed