Issue #8692: Improve performance of math.factorial:

(1) use a different algorithm that roughly halves the total number of
    multiplications required and results in more balanced multiplications
(2) use a lookup table for small arguments
(3) fast accumulation of products in C integer arithmetic rather than
    PyLong arithmetic when possible.

Typical speedup, from unscientific testing on a 64-bit laptop, is 4.5x
to 6.5x for arguments in the range 100 - 10000.

Patch by Daniel Stutzbach; extensive reviews by Alexander Belopolsky.
diff --git a/Misc/NEWS b/Misc/NEWS
index 3da54ab..a4a6938 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -1132,6 +1132,12 @@
 Extension Modules
 -----------------
 
+- Issue #8692: Optimize math.factorial: replace the previous naive
+  algorithm with an improved 'binary-split' algorithm that uses fewer
+  multiplications and allows many of the multiplications to be
+  performed using plain C integer arithmetic instead of PyLong
+  arithmetic.  Also uses a lookup table for small arguments.
+
 - Issue #8674: Fixed a number of incorrect or undefined-behaviour-inducing
   overflow checks in the audioop module.