math: new pow

from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc

The underflow exception is signaled if the result is in the subnormal
range even if the result is exact.

code size change: +3421 bytes.
benchmark on x86_64 before, after, speedup:

-Os:
   pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x
    pow latency: 144.37 ns/call 54.75 ns/call 2.64x
-O3:
   pow rthruput:  98.91 ns/call 32.79 ns/call 3.02x
    pow latency: 138.74 ns/call 53.78 ns/call 2.58x
4 files changed