7846fa876820fde373b4402e5d1cf3d24f06d11f - platform/external/mesa3d

commit	7846fa876820fde373b4402e5d1cf3d24f06d11f
author	Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	Tue May 10 00:49:39 2016 +0200
committer	Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	Thu May 26 22:07:04 2016 +0200
tree	e4f26b31ddd08e953c79bf253bf9e8b51b36132d
parent	c49e68dc4bcc14cac529d1e3be5fe0090ed4d146

radeonsi: Add offchip buffer address calculation.

Instead of creating a memory area per patch and per vertex, we put the
same attribute of every vertex & patch together. Most loads and stores
access the same attribute across all lanes, only for different patches
and vertices.

For the TCS this results in tightly packed data for 4-component stores.
For the TES this is not the case, as within a patch the loads often
also access the same vertex. However, if there are < 4 vertices/patch,
this still results in a reduction of the number of cache lines. In the
LDS situation we only do better than the worst case if the data per
patch is < 64 bytes, which due to the tessellation factors is pretty
much never.

We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both a VGPR
offset and an index, so we have a problem storing indirectly indexed
outputs. This can be solved by temporarily storing arrays in LDS and
then copying them, but I don't think that is worth the effort. The
difference in VALU cycles that hardware swizzling gives is about 0.2%
of total busy cycles. That is without handling the array case.

I chose to group by attribute rather than by component, as the
components of an attribute are often accessed together and the
software swizzling takes VALU cycles for calculating offsets.

v2: - Rename functions to get_tcs_tes_buffer_address.
    - Multiply by 16 as late as possible.
    - Use tgsi_full_src_register_from_dst.
    - Remove some bad comments.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
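
The attribute-major layout described above can be illustrated with a
small host-side sketch. This is not the code from this commit (the
driver emits the calculation as LLVM IR); the struct, its field names
and both helper functions are hypothetical. It assumes every attribute
slot is one 16-byte vec4 and that per-patch attributes live in a
separate region starting at an assumed patch_data_offset. The slot
index is computed first and multiplied by 16 as the last step.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one attribute slot: a vec4 of 32-bit components. */
#define ATTR_SLOT_BYTES 16u

/* Hypothetical description of the off-chip buffer; names are illustrative. */
struct offchip_layout {
    uint32_t num_patches;        /* patches sharing the buffer */
    uint32_t vertices_per_patch; /* control points per patch */
    uint32_t patch_data_offset;  /* assumed byte offset of per-patch data */
};

/* Byte offset of a per-vertex attribute. All vertices of all patches
 * that use the same attribute index are stored next to each other. */
static uint32_t per_vertex_attr_offset(const struct offchip_layout *l,
                                       uint32_t rel_patch_id,
                                       uint32_t vertex_index,
                                       uint32_t attr_index)
{
    assert(rel_patch_id < l->num_patches);
    assert(vertex_index < l->vertices_per_patch);

    uint32_t total_vertices = l->num_patches * l->vertices_per_patch;
    uint32_t vertex_slot = rel_patch_id * l->vertices_per_patch + vertex_index;
    uint32_t slot = attr_index * total_vertices + vertex_slot;

    /* Convert from slots to bytes only at the very end. */
    return slot * ATTR_SLOT_BYTES;
}

/* Byte offset of a per-patch attribute. The same attribute of every
 * patch is stored contiguously inside the per-patch region. */
static uint32_t per_patch_attr_offset(const struct offchip_layout *l,
                                      uint32_t rel_patch_id,
                                      uint32_t attr_index)
{
    assert(rel_patch_id < l->num_patches);

    uint32_t slot = attr_index * l->num_patches + rel_patch_id;
    return l->patch_data_offset + slot * ATTR_SLOT_BYTES;
}
```

With this layout, lanes that handle consecutive vertices and read the
same attribute touch consecutive 16-byte slots, which is the packing
the message describes for 4-component stores in the TCS.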