Sped the usual case for sorting by calling PyObject_RichCompareBool
directly when no comparison function is specified.  This saves a layer
of function call on every compare then.  Measured speedups:

 i    2**i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   32768  12.5%   0.0%   0.0% 100.0%   0.0%  50.0% 100.0% 100.0% -50.0%
16   65536   8.7%   0.0%   0.0%   0.0%   0.0%   0.0%  12.5%   0.0%   0.0%
17  131072   8.0%  25.0%   0.0%  25.0%   0.0%  14.3%   5.9%   0.0%   0.0%
18  262144   6.3% -10.0%  12.5%  11.1%   0.0%   6.3%   5.6%  12.5%   0.0%
19  524288   5.3%   5.9%   0.0%   5.6%   0.0%   5.9%   5.4%   0.0%   2.9%
20 1048576   5.3%   2.9%   2.9%   5.1%   2.8%   1.3%   5.9%   2.9%   4.2%

The best indicators are those that take significant time (larger i), and
where sort doesn't do very few compares (so *sort and ~sort benefit most
reliably).  The large numbers are due to roundoff noise combined with
platform variability; e.g., the 14.3% speedup for %sort at i=17 reflects
a printed elapsed time of 0.18 seconds falling to 0.17, but a change in
the last digit isn't really meaningful (indeed, if it really took 0.175
seconds, one electron having a lazy nanosecond could shift it to either
value <wink>).  Similarly the 25% at 3sort i=17 was a meaningless change
from 0.05 to 0.04.  However, almost all the "meaningless changes" were
in the same direction, which is good.  The before-and-after times for
*sort are clearest:

before after
  0.18  0.16
  0.25  0.23
  0.54  0.50
  1.18  1.11
  2.57  2.44
  5.58  5.30
1 file changed