Implement optimized kernel kickoff for T30

bug 7190126
~3x performance gain from using lightweight intrinsics

Change-Id: I6cf001a2790f228efe252e0083e1915bd6373416