Fix MIPS DSPr2 4:2:0 upsample bug with small images

The DSPr2 code was erroneously comparing the residual (t9, width & 0xF)
with the end pointer (t4, out + width) instead of with the width itself
(a1).  This produced incorrect results for any image whose output width
was less than 16.  The other small changes (ulw to lw and removal of
the nop) are just easy optimizations around this code.
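
For illustration only, the intent of the corrected comparison can be
sketched in C.  This is a hypothetical model, not a translation of the
DSPr2 assembly: the function names, the 16-pixel block size, and the
post-tested (do/while) loop shape are assumptions made for the sketch.
It shows why the check has to be "residual == width": that equality
holds exactly when the width is less than 16, in which case the
unrolled loop must not run at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model; names are illustrative and do not come from the
 * DSPr2 source. */

/* Nonzero when the whole row is residual, i.e. out_width < 16. */
static int row_is_all_residual(uint32_t out_width)
{
    uint32_t residual = out_width & 0xF;   /* models t9 */
    return residual == out_width;          /* compare with width (a1) */
}

/* Doubles each input pixel horizontally into out[0 .. out_width). */
static void upsample_row_h2(const uint8_t *in, uint8_t *out,
                            uint32_t out_width)
{
    uint32_t x = 0;

    if (!row_is_all_residual(out_width)) {
        uint32_t block_end = out_width & ~(uint32_t)0xF;
        /* Assumed post-tested loop: it always writes at least 16
         * pixels, so it must be skipped for rows narrower than 16. */
        do {
            for (uint32_t i = 0; i < 16; i++)
                out[x + i] = in[(x + i) / 2];
            x += 16;
        } while (x < block_end);
    }

    /* Residual pixels, one at a time. */
    for (; x < out_width; x++)
        out[x] = in[x / 2];
}
```

In this model, comparing the residual with a pointer value instead of
the width would make the guard effectively never true, so a row
narrower than 16 pixels would still enter the 16-pixel loop and write
past the end of the output buffer.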

This issue caused a buffer overrun and subsequent segfault on images
whose scaled output height was 1 pixel and whose scaled output width was
< 16 pixels.  Note that the "plain" (non-fancy and non-merged) upsample
routine, which was affected by this bug, is normally used only when
decompressing a non-YCbCr JPEG image, but it is also used when
decompressing a single-row image (because the other upsampling
algorithms require at least two rows).

Closes #16.
diff --git a/ChangeLog.txt b/ChangeLog.txt
index 69e1262..fb996c1 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -21,6 +21,13 @@
 structure members into a single 64-bit register, and this exposed the ABI
 conformance issue.
 
+[4] Fixed a bug in the MIPS DSPr2 4:2:0 "plain" (non-fancy and non-merged)
+upsampling routine that caused a buffer overflow (and subsequent segfault) when
+decompressing a 4:2:0 JPEG image whose scaled output width was less than 16
+pixels.  The "plain" upsampling routines are normally only used when
+decompressing a non-YCbCr JPEG image, but they are also used when decompressing
+a JPEG image whose scaled output height is 1.
+
 
 1.4.1
 =====