Add sk_parallel_for()

This should be a drop-in replacement for most for-loops to make them run in parallel:
   for (int i = 0; i < N; i++) { code... }
   ~~~>
   sk_parallel_for(N, [&](int i) { code... });

This is just syntax sugar over SkTaskGroup to make this use case really easy to write.
There's no more overhead that we weren't already forced to add using an interface like batch(),
and no extra heap allocations.

I've replaced 3 uses of SkTaskGroup with sk_parallel_for:
  1) My unit tests for SkOnce.
  2) Cary's path fuzzer.
  3) SkMultiPictureDraw.
Performance should be the same.  Please compare left and right for readability. :)

BUG=skia:

No public API changes.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1184373003
diff --git a/tests/SkpSkGrTest.cpp b/tests/SkpSkGrTest.cpp
index 2c3c5b7..212b0f6 100644
--- a/tests/SkpSkGrTest.cpp
+++ b/tests/SkpSkGrTest.cpp
@@ -1,6 +1,9 @@
-#if !SK_SUPPORT_GPU
-#error "GPU support required"
-#endif
+/*
+ * Copyright 2013 Google Inc.
+ *
+ * Use of this source code is governed by a BSD-style license that can be
+ * found in the LICENSE file.
+ */
 
 #include "GrContext.h"
 #include "GrContextFactory.h"
@@ -27,6 +30,10 @@
 #include "SkTime.h"
 #include "Test.h"
 
+#if !SK_SUPPORT_GPU
+#error "GPU support required"
+#endif
+
 #ifdef SK_BUILD_FOR_WIN
     #define PATH_SLASH "\\"
     #define IN_DIR "D:\\9-30-13\\"
@@ -162,10 +169,11 @@
 }
 
 void SkpSkGrThreadedTestRunner::render() {
-    SkTaskGroup tg;
-    for (int index = 0; index < fRunnables.count(); ++ index) {
-        tg.add(fRunnables[index]);
-    }
+    // TODO: we don't really need to be using SkRunnables here anymore.
+    // We can just write the code we'd run right in the for loop.
+    sk_parallel_for(fRunnables.count(), [&](int i) {
+        fRunnables[i]->run();
+    });
 }
 
 ////////////////////////////////////////////////