Split VAO dirty bits to speed iteration.

Using > 64 bits (we had over 90) would use a much slower dirty bit
iteration. Speed this up by splitting the dirty bits into two levels.
The first top level only has a single dirty bit per attrib, per
binding, and one bit for the element array buffer. The next level has
separate dirty bits for attribs and bindings.

The D3D11 back-end doesn't actually care about individual dirty bits
of attribs or bindings, since it resets entire attributes at a time,
but the GL back-end only refreshes the necessary info.

Improves the score of a simple state change microbenchmark by 15% on
the D3D11 and GL back-ends with a no-op driver. Real-world impact will
be smaller.

Also includes a test suppression for an NVIDIA bug that surfaced when
we changed the order of that GL commands were sent to the driver.

BUG=angleproject:2389

Change-Id: If8d5e5eb0b27e2a77e20535e33626183d372d311
Reviewed-on: https://chromium-review.googlesource.com/556799
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
diff --git a/src/libANGLE/renderer/VertexArrayImpl.h b/src/libANGLE/renderer/VertexArrayImpl.h
index e48cc53..76728f9 100644
--- a/src/libANGLE/renderer/VertexArrayImpl.h
+++ b/src/libANGLE/renderer/VertexArrayImpl.h
@@ -21,7 +21,10 @@
 {
   public:
     VertexArrayImpl(const gl::VertexArrayState &state) : mState(state) {}
-    virtual void syncState(const gl::Context *context, const gl::VertexArray::DirtyBits &dirtyBits)
+    virtual void syncState(const gl::Context *context,
+                           const gl::VertexArray::DirtyBits &dirtyBits,
+                           const gl::VertexArray::DirtyAttribBitsArray &attribBits,
+                           const gl::VertexArray::DirtyBindingBitsArray &bindingBits)
     {
     }
 
@@ -32,6 +35,6 @@
     const gl::VertexArrayState &mState;
 };
 
-}
+}  // namespace rx
 
 #endif // LIBANGLE_RENDERER_VERTEXARRAYIMPL_H_