Merge "TF-A Aarch32: optimise memcpy4()" into integration