Issue #5512: speed up the long division algorithm for Python longs.
The basic algorithm remains the same; the most significant speedups
come from the following three changes:

  (1) normalize by shifting instead of multiplying and dividing.
  (2) the old algorithm usually did an unnecessary extra iteration of
      the outer loop; remove this.  As a special case, this means that
      long divisions with a single-digit result run twice as fast as
      before.
  (3) make the inner loop much tighter.
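
As a rough illustration of change (1), the following Python sketch
normalizes by shifting.  It is not the C code in x_divrem: the helper
names are hypothetical, SHIFT = 15 assumes the base 2**15 digit size,
and the built-in divmod stands in for the digit-by-digit loop.

```python
SHIFT = 15  # assumed digit size in bits (base 2**15)
BASE = 1 << SHIFT


def top_digit(n):
    """Return the most significant base-2**SHIFT digit of n (n > 0)."""
    ndigits = (n.bit_length() + SHIFT - 1) // SHIFT
    return n >> (SHIFT * (ndigits - 1))


def normalize_by_shift(a, b):
    """Shift a and b left by d bits so that b's top digit has its
    high bit set.  Hypothetical helper; requires b > 0."""
    d = SHIFT - top_digit(b).bit_length()
    return a << d, b << d, d


def divrem_normalized(a, b):
    """Divide a by b (a >= 0, b > 0) via the shifted operands."""
    an, bn, d = normalize_by_shift(a, b)
    # divmod stands in for the schoolbook inner loop here.
    q, r = divmod(an, bn)
    # The quotient is unchanged by the scaling; the remainder is
    # scaled by 2**d, so shift it back down.
    return q, r >> d
```

Because both operands are scaled by the same power of two, undoing
the normalization is a single right shift of the remainder, with no
multiplication or division by a scale factor.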

Various benchmarks show speedups of between 50% and 150% for long
integer divisions and modulo operations.
diff --git a/Misc/NEWS b/Misc/NEWS
index 17bc214..e96ec17 100644
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -12,6 +12,10 @@
 Core and Builtins
 -----------------
 
+- Issue #5512: Rewrite PyLong long division algorithm (x_divrem) to
+  improve its performance.  Long divisions and remainder operations
+  are now between 50% and 150% faster.
+
 - Issue #4258: Make it possible to use base 2**30 instead of base
   2**15 for the internal representation of integers, for performance
   reasons.  Base 2**30 is enabled by default on 64-bit machines.  Add