[X86] Heuristic to selectively build Newton-Raphson SQRT estimation

On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
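
For readers unfamiliar with the software path that this heuristic avoids, here is a minimal portable sketch of "RSQRT followed by Newton-Raphson refinement". It is not the code the compiler emits. `coarse_rsqrt`, `refine_rsqrt` and `nr_sqrt` are hypothetical names, and the coarse estimate only imitates a low-precision hardware instruction by truncating the mantissa to 12 bits, which is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal-sqrt estimate: compute the
// exact value, then keep only the top 12 mantissa bits (assumed precision).
static float coarse_rsqrt(float x) {
  float y = 1.0f / std::sqrt(x);
  std::uint32_t bits;
  std::memcpy(&bits, &y, sizeof bits);
  bits &= 0xFFFFF800u;  // clear the low 11 of the 23 mantissa bits
  std::memcpy(&y, &bits, sizeof y);
  return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - x:
//   y1 = y0 * (1.5 - 0.5 * x * y0^2)
static float refine_rsqrt(float x, float y0) {
  return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// sqrt(x) = x * rsqrt(x). This sketch only handles x > 0.
static float nr_sqrt(float x) {
  assert(x > 0.0f);
  return x * refine_rsqrt(x, coarse_rsqrt(x));
}
```

The refined path costs an estimate plus several dependent multiplies and a subtract, which is why a single hardware SQRT can win on latency for scalar code.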

commit: f679530ba18023d29765bde397fa77048bf17985
author: Nikolai Bozhenov <nikolai.bozhenov@intel.com> Thu Aug 04 12:47:28 2016 +0000
committer: Nikolai Bozhenov <nikolai.bozhenov@intel.com> Thu Aug 04 12:47:28 2016 +0000
tree: 26d32ee662bbb6f153eb39b81350d1d6859cd044
parent: 8950cead7f2032d4dee6b17be4eb4c6b5d755403
diff --git a/llvm/lib/Target/X86/X86Subtarget.h b/llvm/lib/Target/X86/X86Subtarget.h
index a274b79..c1f862d 100644
--- a/llvm/lib/Target/X86/X86Subtarget.h
+++ b/llvm/lib/Target/X86/X86Subtarget.h

@@ -199,6 +199,14 @@
   /// of a YMM register without clearing the upper part.
   bool HasFastPartialYMMWrite;
 
+  /// True if hardware SQRTSS instruction is at least as fast (latency) as
+  /// RSQRTSS followed by a Newton-Raphson iteration.
+  bool HasFastScalarFSQRT;
+
+  /// True if hardware SQRTPS/VSQRTPS instructions are at least as fast
+  /// (throughput) as RSQRTPS/VRSQRTPS followed by a Newton-Raphson iteration.
+  bool HasFastVectorFSQRT;
+
   /// True if 8-bit divisions are significantly faster than
   /// 32-bit divisions and should be used when possible.
   bool HasSlowDivide32;
@@ -434,6 +442,8 @@
   bool hasCmpxchg16b() const { return HasCmpxchg16b; }
   bool useLeaForSP() const { return UseLeaForSP; }
   bool hasFastPartialYMMWrite() const { return HasFastPartialYMMWrite; }
+  bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
+  bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
   bool hasSlowDivide32() const { return HasSlowDivide32; }
   bool hasSlowDivide64() const { return HasSlowDivide64; }
   bool padShortFunctions() const { return PadShortFunctions; }
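
The two accessors added in this hunk could be consulted along the following lines. This is a standalone sketch, not the lowering code from the patch. `SqrtTuning` and `preferHardwareSqrt` are hypothetical names, and the flag settings in the examples simply restate the commit message (big cores: scalar only; Skylake: scalar and vector).

```cpp
// Hypothetical stand-in for the two subtarget flags added in this diff.
struct SqrtTuning {
  bool HasFastScalarFSQRT;  // hardware scalar SQRT wins on latency
  bool HasFastVectorFSQRT;  // hardware vector SQRT wins on throughput
};

// Returns true when the hardware SQRT instruction should be kept instead of
// building the RSQRT + Newton-Raphson sequence.
constexpr bool preferHardwareSqrt(const SqrtTuning &T, bool IsVector) {
  return IsVector ? T.HasFastVectorFSQRT : T.HasFastScalarFSQRT;
}

// Settings as described in the commit message (illustrative values only).
constexpr SqrtTuning BigCore{true, false};  // scalar NR disabled
constexpr SqrtTuning Skylake{true, true};   // NR disabled completely

static_assert(preferHardwareSqrt(BigCore, /*IsVector=*/false), "scalar: SQRT");
static_assert(!preferHardwareSqrt(BigCore, /*IsVector=*/true), "vector: NR");
static_assert(preferHardwareSqrt(Skylake, /*IsVector=*/true), "vector: SQRT");
```

Keeping the scalar and vector decisions in separate flags matches the latency-versus-throughput split described in the commit message.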