Here's a quick cleanup of md5sum. Executive summary: smaller and faster.

On my machine, for a 2.2 GB file of random bytes, the timings with
warm cache are:
toybox before:     11.4 seconds
toybox after:      8.3 seconds
GNU md5sum:        3.9 seconds
openssl dgst -md5: 3.5 seconds

This is clearly better than before (3x openssl), but still slow (2x openssl).

I suspect there is more low-hanging fruit to be had by eliminating the
memcpy in hash_update (maybe not too much - hash_update accounts for
about 4% of total runtime versus 92% for md5_transform according to
perf - but this would also help sha1sum).

make bloatcheck on x86_64 gcc 4.8.2 -Os:
name                                           old       new      delta
-----------------------------------------------------------------------
 md5rot                                          0        64         64
 md5_transform                                 365       223       -142
-----------------------------------------------------------------------
                                                                    -78 total

Rationale for the changes:

Move definition of 'rol' up so it can be used in md5_transform. This
is purely cosmetic; it expands to exactly the same code.

Put rotation counts in a lookup table instead of calculating them on
the fly. This is mostly a wash size-wise, +5 bytes total, but
worthwhile for readability and speed.

Instead of accessing the state array using a rotating index (the
variable formerly known as 'a'), access the state with constant
offsets and rotate the contents of the array instead.  This is the big
win - it eliminates all the crazy memory addressing math inside the
loop.
1 file changed