commit | f867d556dd8525fe6ff0d22a34249528e590f994 | |
---|---|---|
author | Christophe Leroy <christophe.leroy@c-s.fr> | Tue Sep 22 16:34:32 2015 +0200 |
committer | Scott Wood <oss@buserror.net> | Fri Mar 04 23:03:45 2016 -0600 |
tree | 32ebba9cfc1b00d1f394b480d5cfab443382864e | |
parent | 48821a34b1bdc5d89505cb814b3f7c166940f200 | |

powerpc32: optimise csum_partial() loop

On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
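
To illustrate the unrolling idea in the message above, here is a minimal C sketch. It is not the patch itself, which modifies PowerPC assembly. The function names (`fold64`, `csum_words_simple`, `csum_words_unrolled`), the 64-bit accumulator, and the unroll factor of four are all assumptions chosen for illustration. The intent is only to show how grouping several loads ahead of the adds can hide a 2-cycle load latency, and how one branch per four words spreads the taken-branch cost.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit accumulator into 32 bits with end-around carry
 * (ones' complement addition). Two rounds are enough for any input. */
static uint32_t fold64(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Hypothetical baseline: one load, one add and one taken branch per word. */
static uint32_t csum_words_simple(const uint32_t *p, size_t n, uint32_t sum)
{
    uint64_t acc = sum;

    for (size_t i = 0; i < n; i++)
        acc += p[i];
    return fold64(acc);
}

/* Hypothetical unrolled variant: four independent loads are written before
 * the adds that consume them, and the loop branch is taken once per four
 * words. Whether the compiler keeps this schedule is not guaranteed. */
static uint32_t csum_words_unrolled(const uint32_t *p, size_t n, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t a = p[i];
        uint32_t b = p[i + 1];
        uint32_t c = p[i + 2];
        uint32_t d = p[i + 3];

        acc += a;
        acc += b;
        acc += c;
        acc += d;
    }
    /* Handle the remaining 0 to 3 words. */
    for (; i < n; i++)
        acc += p[i];
    return fold64(acc);
}
```

Both functions return the same value for the same input, so the unrolled form changes only the instruction schedule. The actual gain depends on the target core and on how the code is compiled.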