Implement single-precision round intrinsic in x86

Rationale:
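X86 does not provide a direct instruction for the
required rounding, and NaN and large positive inputs
must be handled specially as well: the x86
float-to-int conversion yields 0x80000000 (the
"integer indefinite" value) for both, whereas round
must return 0 and the maximum int, respectively.
This CL generates code that correctly implements SP
round in a reasonably efficient manner.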
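For reference, the results the generated code has to match can be modeled in
portable C++. This is a sketch, not the code the CL emits. It assumes the
target semantics are those of Java's Math.round(float): round half up, NaN
gives 0, and out-of-range values saturate to the int32 limits. The function
name RoundFloatReference is hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference model of single-precision round (assumed to follow
// Java Math.round(float) semantics); not the x86 code generated by this CL.
int32_t RoundFloatReference(float in) {
  if (std::isnan(in)) {
    return 0;  // NaN must give 0, not the 0x80000000 "integer indefinite".
  }
  float result = std::floor(in);
  // Round half up by comparing the fractional part with 0.5f instead of
  // computing floor(in + 0.5f), which wrongly yields 1 for 0.49999997f.
  if (in - result >= 0.5f) {
    result += 1.0f;
  }
  // Saturate: 2^31 and above do not fit in an int32 (large positive case).
  if (result >= 2147483648.0f) {
    return std::numeric_limits<int32_t>::max();
  }
  // Large negative values (and -infinity) clamp to the minimum int.
  if (result <= -2147483648.0f) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(result);
}
```

Note that only the NaN and large positive cases differ from what a plain
truncating conversion returns; large negative inputs already map to
0x80000000, which is the minimum int.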

Test: 580-checker-round

BUG=26327751

Change-Id: Ic5f4d9cff9c27c855a8ad577c51ed3ed37fb60cd