Fix rate option with iodepth > 1

The rate option currently doesn't work when used with the libaio
engine. The current math takes t2 (when the I/O completed) minus t1
(when the io_u unit was created) as the time the I/O took, and
calculates the bandwidth for the rate check from that. This works for
the sync engine, as there is only one I/O in progress at a time. With
the libaio engine, when multiple I/Os are queued, the same interval
(t1 to t2) could be attributed to other I/Os as well, so the actual
bandwidth is higher than the calculated one.

I have a patch, but it is more brute force: I take the total bytes
read/written, divide by the time since I/Os started to calculate the
bandwidth, and decide from that how much time needs to be spent
sleeping (if any). This is a little more heavyweight than the previous
math. I think there are probably simpler/cleaner solutions than this,
but this is the current patch I have for it.
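
As a rough illustration of the new calculation, here is a minimal
stand-alone sketch. It is not fio code: the helper name and parameters
are hypothetical, and it assumes rate_nsec_cycle holds the time budget
per byte, in nanoseconds, at the target rate.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone helper, not part of fio.
 *
 * total_bytes:   bytes completed in one direction since the job started
 * nsec_per_byte: time budget per byte at the target rate (the assumed
 *                meaning of rate_nsec_cycle)
 * elapsed_usec:  wall-clock time since the job started
 *
 * Returns how many microseconds the job is ahead of its budget. A
 * positive value is time to sleep; zero or negative means no sleep.
 * Note: the multiplication can overflow for very large inputs.
 */
static int64_t pending_usleep(uint64_t total_bytes, uint64_t nsec_per_byte,
			      uint64_t elapsed_usec)
{
	/* Time the transfer should have taken at the target rate. */
	uint64_t budget_usec = (total_bytes * nsec_per_byte) / 1000;

	return (int64_t) budget_usec - (int64_t) elapsed_usec;
}
```

With a target of 1,000,000 bytes/s (1000 ns per byte), 4,000,000 bytes
completed after 1 second gives a budget of 4,000,000 usec, so the
helper returns 3,000,000 usec to sleep. With 500,000 bytes after 1
second it returns -500,000: the job is behind the target and should
not sleep. Because only totals are used, the result does not depend on
how many I/Os were in flight at once.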

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
diff --git a/io_u.c b/io_u.c
index 4be958d..1845d3b 100644
--- a/io_u.c
+++ b/io_u.c
@@ -984,6 +984,7 @@
 	if (!io_u->error) {
 		unsigned int bytes = io_u->buflen - io_u->resid;
 		const enum fio_ddir idx = io_u->ddir;
+		const enum fio_ddir odx = io_u->ddir ^ 1;
 		int ret;
 
 		td->io_blocks[idx]++;
@@ -992,15 +993,10 @@
 
 		if (ramp_time_over(td)) {
 			unsigned long uninitialized_var(lusec);
-			unsigned long uninitialized_var(rusec);
 
 			if (!td->o.disable_clat || !td->o.disable_bw)
 				lusec = utime_since(&io_u->issue_time,
 							&icd->time);
-			if (__should_check_rate(td, idx) ||
-			    __should_check_rate(td, idx ^ 1))
-				rusec = utime_since(&io_u->start_time,
-							&icd->time);
 
 			if (!td->o.disable_clat) {
 				add_clat_sample(td, idx, lusec, bytes);
@@ -1009,11 +1005,16 @@
 			if (!td->o.disable_bw)
 				add_bw_sample(td, idx, bytes, &icd->time);
 			if (__should_check_rate(td, idx)) {
-				td->rate_pending_usleep[idx] +=
-					(long) td->rate_usec_cycle[idx] - rusec;
+				td->rate_pending_usleep[idx] =
+					((td->this_io_bytes[idx] *
+					  td->rate_nsec_cycle[idx]) / 1000 -
+					 utime_since_now(&td->start));
 			}
 			if (__should_check_rate(td, idx ^ 1))
-				td->rate_pending_usleep[idx ^ 1] -= rusec;
+				td->rate_pending_usleep[odx] =
+					((td->this_io_bytes[odx] *
+					  td->rate_nsec_cycle[odx]) / 1000 -
+					 utime_since_now(&td->start));
 		}
 
 		if (td_write(td) && idx == DDIR_WRITE &&