Speed up draw-call-bound apps with index range cache

Previously, we looped through the entire index
buffer (often 10^3-10^4+ items) on every draw
call to find the min/max vertex index.
According to systrace, this scan often consumes
half or more of the time taken per draw call.

The min/max vertex indices are required if:
- we are in an "immediate array" mode, where an
  array is passed as the "offset" argument of
  glDrawElements (i.e., index buffer 0 is bound)
- we are validating (dEQP tests, or debugging draw
  calls where a vertex buffer out-of-bounds access
  is in question)
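
For reference, the per-draw scan described above amounts to
roughly the following. This is a minimal sketch, not the code
in this CL: the IndexRange struct and scanIndexRange function
are hypothetical names, and 16-bit indices are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; not a name taken from this CL.
struct IndexRange {
    uint32_t minIndex;
    uint32_t maxIndex;
};

// Linear scan over `count` 16-bit indices: O(count) work on every call.
inline IndexRange scanIndexRange(const uint16_t* indices, size_t count) {
    assert(indices != nullptr && count > 0);
    IndexRange r = {indices[0], indices[0]};
    for (size_t i = 1; i < count; ++i) {
        r.minIndex = std::min<uint32_t>(r.minIndex, indices[i]);
        r.maxIndex = std::max<uint32_t>(r.maxIndex, indices[i]);
    }
    return r;
}
```

The cost grows linearly with the number of indices, which is
why repeating it on every draw call is expensive.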

ANGLE uses the concept of an "index range cache"
in order to avoid recalculating index ranges
that are known already.

This CL incorporates the IndexRangeCache class
from ANGLE, greatly improving glDrawElements run time
by making it not depend on the size of the index buffer.
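
The idea can be sketched as below. This is a simplified
illustration under assumptions, not ANGLE's IndexRangeCache:
the class name, the key layout (index size, byte offset,
count) and the method signatures are chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Simplified sketch; names and key layout are assumptions, not ANGLE's code.
class SimpleIndexRangeCache {
    // Key fields: index size in bytes, byte offset into the buffer, index count.
    typedef std::tuple<size_t, size_t, size_t> Key;

public:
    struct Range {
        uint32_t minIndex;
        uint32_t maxIndex;
    };

    // Remember the min/max for `count` indices of `indexSize` bytes at `offset`.
    void addRange(size_t indexSize, size_t offset, size_t count, Range r) {
        mRanges[Key(indexSize, offset, count)] = r;
    }

    // Returns true and fills `out` if this exact draw range was seen before.
    bool findRange(size_t indexSize, size_t offset, size_t count, Range* out) const {
        std::map<Key, Range>::const_iterator it =
            mRanges.find(Key(indexSize, offset, count));
        if (it == mRanges.end()) return false;
        *out = it->second;
        return true;
    }

    // Drop every entry whose bytes overlap [offset, offset + size).
    void invalidateRange(size_t offset, size_t size) {
        const size_t end = offset + size;
        for (std::map<Key, Range>::iterator it = mRanges.begin();
             it != mRanges.end();) {
            const size_t entryStart = std::get<1>(it->first);
            const size_t entryEnd =
                entryStart + std::get<0>(it->first) * std::get<2>(it->first);
            if (entryStart < end && offset < entryEnd) {
                it = mRanges.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    std::map<Key, Range> mRanges;
};
```

A draw call first tries findRange and only falls back to a
full scan (followed by addRange) on a miss. Any write into the
buffer must call invalidateRange for the bytes it touched,
which is what the invalidateRange call added after the memcpy
in GLSharedGroup.cpp does.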

It also makes a further small tweak: flushing
every two draw calls instead of every draw call
lowers pipe overhead and gains about 1 FPS.

The performance improvement: ~10-20% FPS on
non-draw-call-limited GPUs.

Linux, Quadro K2200: Antutu v6: ~35->~40 FPS

No dEQP GLES2 or EGL regressions were found.

Change-Id: I29be0f405c6d3e3257e212912c6af6c6f3e12fa7
diff --git a/shared/OpenglCodecCommon/GLSharedGroup.cpp b/shared/OpenglCodecCommon/GLSharedGroup.cpp
index b079b6d..1f7c629 100755
--- a/shared/OpenglCodecCommon/GLSharedGroup.cpp
+++ b/shared/OpenglCodecCommon/GLSharedGroup.cpp
@@ -278,6 +278,8 @@
 
     //it's safe to update now
     memcpy((char*)buf->m_fixedBuffer.ptr() + offset, data, size);
+
+    buf->m_indexRangeCache.invalidateRange((size_t)offset, (size_t)size);
     return GL_NO_ERROR;
 }