gallivm: add load/store scratch support.

Scratch space is per-thread, so allocate the scratch size
multiplied by the vector width, and add a per-thread base
offset to each load/store.

This is needed for the OpenCL private memory space.
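
The addressing scheme described above can be modelled in plain C
as a rough sketch. This is not the gallivm code itself, which
emits LLVM IR; the names VECTOR_WIDTH, struct scratch and the
scratch_* helpers are hypothetical, and the lane-major layout
(each lane owning one contiguous slice) is an assumption made
for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VECTOR_WIDTH 8 /* assumed number of SIMD lanes (threads) */

/* Hypothetical model of a scratch buffer shared by all lanes. */
struct scratch {
    uint8_t *mem;             /* size_per_thread * VECTOR_WIDTH bytes */
    uint32_t size_per_thread; /* scratch size requested by the shader */
};

/* Allocate the per-thread size times the vector width. */
static int scratch_init(struct scratch *s, uint32_t size_per_thread)
{
    s->mem = calloc((size_t)size_per_thread * VECTOR_WIDTH, 1);
    s->size_per_thread = size_per_thread;
    return s->mem ? 0 : -1;
}

/* Per-thread base offset plus the offset the shader asked for. */
static uint32_t scratch_addr(const struct scratch *s,
                             uint32_t lane, uint32_t offset)
{
    assert(lane < VECTOR_WIDTH);
    assert(offset < s->size_per_thread);
    return lane * s->size_per_thread + offset;
}

/* Store a 32-bit value into one lane's slice of the buffer. */
static void scratch_store32(struct scratch *s, uint32_t lane,
                            uint32_t offset, uint32_t value)
{
    memcpy(s->mem + scratch_addr(s, lane, offset), &value, sizeof value);
}

/* Load a 32-bit value back from one lane's slice of the buffer. */
static uint32_t scratch_load32(const struct scratch *s, uint32_t lane,
                               uint32_t offset)
{
    uint32_t value;
    memcpy(&value, s->mem + scratch_addr(s, lane, offset), sizeof value);
    return value;
}
```

In this model every lane can use the same shader-visible offset
without clobbering its neighbours, because the base offset moves
each access into that lane's own slice.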

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7304>
4 files changed