Small optimizations for list_slice() and list_extend_internal().

* Using addition instead of substraction on array indices allows the
  compiler to use a fast addressing mode.  Saves about 10%.

* Using PyTuple_GET_ITEM and PyList_SET_ITEM is about 7% faster than
  PySequenceFast_GET_ITEM which has to make a list check on every pass.
1 file changed