commit | 09de60fc9332106d0d83bb6b5fef7d74f11801ec | |
---|---|---|
author | Juan Zhao <juan.j.zhao@intel.com> | Tue Jun 30 05:39:55 2015 +0000 |
committer | Juan Zhao <juan.j.zhao@intel.com> | Tue Jun 30 17:47:55 2015 +0800 |
tree | eeada7374ccb9db0881c2709291cd8e93385ac87 | |
parent | fd89162db7843cd5d1ed5fba37cfa6093d1dc861 | |
bilateral performance tuning: SLM method

Use SLM (shared local memory): each work-group loads its input into shared local memory and then waits on a barrier, so that the work-items share the read bandwidth. Also use `#pragma unroll` instead of unrolling the loop by hand.

Performance now reaches 68fps on ivybridge GT2, which has 24 EUs, and stays at 13fps on baytraili.

Signed-off-by: Juan Zhao <juan.j.zhao@intel.com>