Fix skewed latencies for rate IOPS

- When rate_iops is specified, fio periodically calls usleep() to limit IOPS.

- Before calling usleep(), fio always waits for all pending I/O to complete.

- Every I/O completed during that wait is erroneously logged with the
  latency of the slowest I/O in the batch instead of its own latency.

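- With QD=8 and a mixed R:W=33:66 workload, up to 7 samples may be logged
  with a false latency while waiting for all I/O to complete, which skews
  the reported latency distribution.

- Fix this by reaping one completion at a time until nothing is in flight,
  so that each completion is timestamped separately.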

Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/io_u.c b/io_u.c
index e474b48..7941a6d 100644
--- a/io_u.c
+++ b/io_u.c
@@ -531,10 +531,10 @@
 	 * io's that have been actually submitted to an async engine,
 	 * and cur_depth is meaningless for sync engines.
 	 */
-	if (td->io_u_in_flight) {
+	while (td->io_u_in_flight) {
 		int fio_unused ret;
 
-		ret = io_u_queued_complete(td, td->io_u_in_flight, NULL);
+		ret = io_u_queued_complete(td, 1, NULL);
 	}
 
 	fio_gettime(&t, NULL);