Run the first bench for 1000ms to warm up nanobench if FLAGS_ms < 1000.

Otherwise, the first few benches' measurements will be inaccurate.

For example, without this CL, the first few measurements are:
  337ns, 566µs, 1000µs, ... without "--ms 1000" arg
  211ns, 285µs,  874µs, ... with "--ms 1000" arg

With this CL, the first few measurements are:
  195ns, 296µs, 1.03ms, ... without "--ms 1000" arg
  204ns, 280µs,  859µs, ... with "--ms 1000" arg

In the example above, without this CL, the first two measurements
differ by more than 50% depending on whether "--ms 1000" is passed.
I think that is why I keep passing "--ms 1000" locally. But the
warm-up is only needed for the first bench, so applying "--ms 1000"
to every following bench wastes time.
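
For illustration only, the warm-up idea can be sketched outside of
nanobench as below. This is a minimal standalone sketch, not the code
in this CL: now_ms_sketch(), warm_up() and needs_warm_up() are
hypothetical names, and std::chrono::steady_clock stands in for
nanobench's own now_ms().

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not nanobench's now_ms()): milliseconds on a monotonic clock.
static double now_ms_sketch() {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical predicate mirroring the condition in this CL: warm up only
// before the first bench, and only when the per-bench budget is under 1000ms.
static bool needs_warm_up(int runs, int flags_ms) {
    return runs == 0 && flags_ms < 1000;
}

// Calls `work` repeatedly until at least `warmup_ms` milliseconds have passed.
// The do/while shape means `work` always runs at least once.
// Returns the number of calls made; the results of those calls are discarded.
static int warm_up(const std::function<void()>& work, double warmup_ms) {
    assert(warmup_ms >= 0);
    int calls = 0;
    const double stop = now_ms_sketch() + warmup_ms;
    do {
        work();
        ++calls;
    } while (now_ms_sketch() < stop);
    return calls;
}
```

A caller would check needs_warm_up(runs, FLAGS_ms) once per bench and,
if it returns true, call warm_up() with the bench body and 1000 before
collecting any samples.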

Bug: skia:
Change-Id: I1924ba3ff9185ed89aeda72794fafd1fe6625eef
Reviewed-on: https://skia-review.googlesource.com/49742
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
diff --git a/bench/nanobench.cpp b/bench/nanobench.cpp
index d784439..0458330 100644
--- a/bench/nanobench.cpp
+++ b/bench/nanobench.cpp
@@ -1284,6 +1284,15 @@
                 ? setup_gpu_bench(target, bench.get(), maxFrameLag)
                 : setup_cpu_bench(overhead, target, bench.get());
 
+            if (runs == 0 && FLAGS_ms < 1000) {
+                // Run the first bench for 1000ms to warm up the nanobench if FLAGS_ms < 1000.
+                // Otherwise, the first few benches' measurements will be inaccurate.
+                auto stop = now_ms() + 1000;
+                do {
+                    time(loops, bench.get(), target);
+                } while (now_ms() < stop);
+            }
+
             if (FLAGS_ms) {
                 samples.reset();
                 auto stop = now_ms() + FLAGS_ms;