Improve performance of Burst executions

Prior to this CL, the Burst object operated in one of two modes:
(1) "non-blocking" mode, where the Burst controller and server both
    constantly poll whether data is available in the FMQ. This approach
    is good because it results in very low IPC latency times, but has
    the potential to waste power because the CPU is constantly doing
    work.
(2) "blocking" mode, where the Burst controller and server both wait on
    a futex whenever they attempt to read data from the FMQ. This
    approach is good because it saves power (the thread is idle), but
    results in higher IPC latency times because the thread must be
    awoken before it can continue to retrieve the data.

This CL fuses the two approaches for better performance. Specifically,
the FMQ consumer will poll/spin for a period of time to see if the data
is available. If the data becomes available in this time, it
immediately retrieves it and continues processing. If the data does not
become available within this time, the consumer waits on the futex
until the data becomes available in order to save power.
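
The spin-then-block pattern described above can be sketched in portable
C++ as follows. This is only an illustration under stated assumptions:
HybridMailbox is a hypothetical single-producer, single-consumer
stand-in for the FMQ, and a std::condition_variable takes the place of
the futex-backed EventFlag. It is not the code in this CL.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical one-slot mailbox (not the FMQ API): one producer, one consumer.
class HybridMailbox {
  public:
    explicit HybridMailbox(std::chrono::microseconds pollingWindow)
        : kPollingWindow(pollingWindow) {
        assert(pollingWindow.count() >= 0);
    }

    // Producer: publish the value, then wake a consumer that may be blocked.
    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mValue = value;
            mReady.store(true, std::memory_order_release);
        }
        mCondition.notify_one();
    }

    // Consumer: spin for at most kPollingWindow, then fall back to blocking.
    int receive() {
        const auto deadline = std::chrono::steady_clock::now() + kPollingWindow;
        while (std::chrono::steady_clock::now() < deadline) {
            if (mReady.load(std::memory_order_acquire)) {
                std::lock_guard<std::mutex> lock(mMutex);
                return consumeLocked();
            }
        }
        // Polling window elapsed (or was zero): sleep until the producer signals.
        std::unique_lock<std::mutex> lock(mMutex);
        mCondition.wait(lock, [this] { return mReady.load(std::memory_order_acquire); });
        return consumeLocked();
    }

  private:
    // Must be called with mMutex held.
    int consumeLocked() {
        mReady.store(false, std::memory_order_relaxed);
        return mValue;
    }

    const std::chrono::microseconds kPollingWindow;
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::atomic<bool> mReady{false};
    int mValue = 0;
};
```

With a 0us window the loop body never runs, so the consumer goes
straight to the blocking wait. The producer always signals after
publishing so that a blocked consumer is woken regardless of which path
it took.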

As a result, Burst achieves very low IPC latency when the driver
service finishes within the polling window. When the driver service
runs longer, IPC latency is similar to that of the synchronous
execution path.

By default, the polling window is 0us when the execution preference is
power saving and 50us otherwise. The window can be overridden through a
system property by running either of the following:
  adb shell setprop debug.nn.burst-controller-polling-window <microseconds>
  adb shell setprop debug.nn.sample-driver-burst-polling-window <microseconds>
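
The defaults and override described above could be resolved along these
lines. This is a rough sketch, not the code in this CL:
pollingWindowFrom is a hypothetical helper, `raw` stands in for the
property value as a string (empty when the property is unset), and the
one-second cap is an assumption made here only to guard against
overflow.

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical helper: choose the polling window from a raw property string.
std::chrono::microseconds pollingWindowFrom(const std::string& raw, bool preferLowPower) {
    // Defaults from the description: 0us for power saving, 50us otherwise.
    const std::chrono::microseconds fallback(preferLowPower ? 0 : 50);
    if (raw.empty()) return fallback;

    constexpr uint64_t kMaxMicros = 1000000;  // assumed cap, not from the CL
    uint64_t micros = 0;
    for (const char c : raw) {
        // Reject anything that is not a plain decimal number.
        if (c < '0' || c > '9') return fallback;
        micros = micros * 10 + static_cast<uint64_t>(c - '0');
        if (micros > kMaxMicros) return fallback;
    }
    return std::chrono::microseconds(static_cast<int64_t>(micros));
}
```

An explicit value takes precedence over the execution preference, so
setting a property to 200 yields a 200us window even for power-saving
executions in this sketch.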

This change also adds the missing includes flagged by the IWYU
repohook.

Bug: 132073143
Test: mma
Test: NeuralNetworksTest_static
Test: VtsHalNeuralnetworksV1_*TargetTest
Test: inspected logcat and confirmed that both spinning- and futex-based
    waiting schemes were used for both the ExecutionBurstController and
    ExecutionBurstServer

Change-Id: I120e0b24c7236105d75d93696dde8deddd0e3507
diff --git a/nn/common/ExecutionBurstServer.cpp b/nn/common/ExecutionBurstServer.cpp
index 74bc340..ec935da 100644
--- a/nn/common/ExecutionBurstServer.cpp
+++ b/nn/common/ExecutionBurstServer.cpp
@@ -20,9 +20,14 @@
 
 #include <android-base/logging.h>
 
+#include <algorithm>
 #include <cstring>
 #include <limits>
 #include <map>
+#include <memory>
+#include <tuple>
+#include <utility>
+#include <vector>
 
 #include "Tracing.h"
 
@@ -31,6 +36,8 @@
 
 using namespace hal;
 
+using hardware::MQDescriptorSync;
+
 constexpr Timing kNoTiming = {std::numeric_limits<uint64_t>::max(),
                               std::numeric_limits<uint64_t>::max()};
 
@@ -298,20 +305,27 @@
 // RequestChannelReceiver methods
 
 std::unique_ptr<RequestChannelReceiver> RequestChannelReceiver::create(
-        const FmqRequestDescriptor& requestChannel) {
+        const FmqRequestDescriptor& requestChannel, std::chrono::microseconds pollingTimeWindow) {
     std::unique_ptr<FmqRequestChannel> fmqRequestChannel =
             std::make_unique<FmqRequestChannel>(requestChannel);
+
     if (!fmqRequestChannel->isValid()) {
         LOG(ERROR) << "Unable to create RequestChannelReceiver";
         return nullptr;
     }
-    const bool blocking = fmqRequestChannel->getEventFlagWord() != nullptr;
-    return std::make_unique<RequestChannelReceiver>(std::move(fmqRequestChannel), blocking);
+    if (fmqRequestChannel->getEventFlagWord() == nullptr) {
+        LOG(ERROR)
+                << "RequestChannelReceiver::create was passed an MQDescriptor without an EventFlag";
+        return nullptr;
+    }
+
+    return std::make_unique<RequestChannelReceiver>(std::move(fmqRequestChannel),
+                                                    pollingTimeWindow);
 }
 
 RequestChannelReceiver::RequestChannelReceiver(std::unique_ptr<FmqRequestChannel> fmqRequestChannel,
-                                               bool blocking)
-    : mFmqRequestChannel(std::move(fmqRequestChannel)), mBlocking(blocking) {}
+                                               std::chrono::microseconds pollingTimeWindow)
+    : mFmqRequestChannel(std::move(fmqRequestChannel)), kPollingTimeWindow(pollingTimeWindow) {}
 
 std::optional<std::tuple<Request, std::vector<int32_t>, MeasureTiming>>
 RequestChannelReceiver::getBlocking() {
@@ -328,17 +342,15 @@
 
     // force unblock
     // ExecutionBurstServer is by default waiting on a request packet. If the
-    // client process destroys its burst object, the server will still be
-    // waiting on the futex (assuming mBlocking is true). This force unblock
-    // wakes up any thread waiting on the futex.
-    if (mBlocking) {
-        // TODO: look for a different/better way to signal/notify the futex to
-        // wake up any thread waiting on it
-        FmqRequestDatum datum;
-        datum.packetInformation({/*.packetSize=*/0, /*.numberOfInputOperands=*/0,
-                                 /*.numberOfOutputOperands=*/0, /*.numberOfPools=*/0});
-        mFmqRequestChannel->writeBlocking(&datum, 1);
-    }
+    // client process destroys its burst object, the server may still be waiting
+    // on the futex. This force unblock wakes up any thread waiting on the
+    // futex.
+    // TODO: look for a different/better way to signal/notify the futex to wake
+    // up any thread waiting on it
+    FmqRequestDatum datum;
+    datum.packetInformation({/*.packetSize=*/0, /*.numberOfInputOperands=*/0,
+                             /*.numberOfOutputOperands=*/0, /*.numberOfPools=*/0});
+    mFmqRequestChannel->writeBlocking(&datum, 1);
 }
 
 std::optional<std::vector<FmqRequestDatum>> RequestChannelReceiver::getPacketBlocking() {
@@ -348,17 +360,53 @@
         return std::nullopt;
     }
 
-    // wait for request packet and read first element of request packet
-    FmqRequestDatum datum;
-    bool success = false;
-    if (mBlocking) {
-        success = mFmqRequestChannel->readBlocking(&datum, 1);
-    } else {
-        while ((success = !mTeardown.load(std::memory_order_relaxed)) &&
-               !mFmqRequestChannel->read(&datum, 1)) {
+    // First spend time polling if results are available in FMQ instead of
+    // waiting on the futex. Polling is more responsive (yielding lower
+    // latencies), but can take up more power, so only poll for a limited period
+    // of time.
+
+    auto& getCurrentTime = std::chrono::high_resolution_clock::now;
+    const auto timeToStopPolling = getCurrentTime() + kPollingTimeWindow;
+
+    while (getCurrentTime() < timeToStopPolling) {
+        // if class is being torn down, immediately return
+        if (mTeardown.load(std::memory_order_relaxed)) {
+            return std::nullopt;
+        }
+
+        // Check if data is available. If it is, immediately retrieve it and
+        // return.
+        const size_t available = mFmqRequestChannel->availableToRead();
+        if (available > 0) {
+            // This is the first point when we know an execution is occurring,
+            // so begin to collect systraces. Note that a similar systrace does
+            // not exist at the corresponding point in
+            // ResultChannelReceiver::getPacketBlocking because the execution is
+            // already in flight.
+            NNTRACE_FULL(NNTRACE_LAYER_IPC, NNTRACE_PHASE_EXECUTION,
+                         "ExecutionBurstServer getting packet");
+            std::vector<FmqRequestDatum> packet(available);
+            const bool success = mFmqRequestChannel->read(packet.data(), available);
+            if (!success) {
+                LOG(ERROR) << "Error receiving packet";
+                return std::nullopt;
+            }
+            return std::make_optional(std::move(packet));
         }
     }
 
+    // If we get to this point, we either stopped polling because it was taking
+    // too long or polling was not allowed. Instead, perform a blocking call
+    // which uses a futex to save power.
+
+    // wait for request packet and read first element of request packet
+    FmqRequestDatum datum;
+    bool success = mFmqRequestChannel->readBlocking(&datum, 1);
+
+    // This is the first point when we know an execution is occurring, so begin
+    // to collect systraces. Note that a similar systrace does not exist at the
+    // corresponding point in ResultChannelReceiver::getPacketBlocking because
+    // the execution is already in flight.
     NNTRACE_FULL(NNTRACE_LAYER_IPC, NNTRACE_PHASE_EXECUTION, "ExecutionBurstServer getting packet");
 
     // retrieve remaining elements
@@ -393,17 +441,21 @@
         const FmqResultDescriptor& resultChannel) {
     std::unique_ptr<FmqResultChannel> fmqResultChannel =
             std::make_unique<FmqResultChannel>(resultChannel);
+
     if (!fmqResultChannel->isValid()) {
         LOG(ERROR) << "Unable to create RequestChannelSender";
         return nullptr;
     }
-    const bool blocking = fmqResultChannel->getEventFlagWord() != nullptr;
-    return std::make_unique<ResultChannelSender>(std::move(fmqResultChannel), blocking);
+    if (fmqResultChannel->getEventFlagWord() == nullptr) {
+        LOG(ERROR) << "ResultChannelSender::create was passed an MQDescriptor without an EventFlag";
+        return nullptr;
+    }
+
+    return std::make_unique<ResultChannelSender>(std::move(fmqResultChannel));
 }
 
-ResultChannelSender::ResultChannelSender(std::unique_ptr<FmqResultChannel> fmqResultChannel,
-                                         bool blocking)
-    : mFmqResultChannel(std::move(fmqResultChannel)), mBlocking(blocking) {}
+ResultChannelSender::ResultChannelSender(std::unique_ptr<FmqResultChannel> fmqResultChannel)
+    : mFmqResultChannel(std::move(fmqResultChannel)) {}
 
 bool ResultChannelSender::send(ErrorStatus errorStatus,
                                const std::vector<OutputShape>& outputShapes, Timing timing) {
@@ -417,18 +469,15 @@
                 << "ResultChannelSender::sendPacket -- packet size exceeds size available in FMQ";
         const std::vector<FmqResultDatum> errorPacket =
                 serialize(ErrorStatus::GENERAL_FAILURE, {}, kNoTiming);
-        if (mBlocking) {
-            return mFmqResultChannel->writeBlocking(errorPacket.data(), errorPacket.size());
-        } else {
-            return mFmqResultChannel->write(errorPacket.data(), errorPacket.size());
-        }
+
+        // Always send the packet with "blocking" because this signals the futex
+        // and unblocks the consumer if it is waiting on the futex.
+        return mFmqResultChannel->writeBlocking(errorPacket.data(), errorPacket.size());
     }
 
-    if (mBlocking) {
-        return mFmqResultChannel->writeBlocking(packet.data(), packet.size());
-    } else {
-        return mFmqResultChannel->write(packet.data(), packet.size());
-    }
+    // Always send the packet with "blocking" because this signals the futex and
+    // unblocks the consumer if it is waiting on the futex.
+    return mFmqResultChannel->writeBlocking(packet.data(), packet.size());
 }
 
 // ExecutionBurstServer methods
@@ -436,7 +485,8 @@
 sp<ExecutionBurstServer> ExecutionBurstServer::create(
         const sp<IBurstCallback>& callback, const MQDescriptorSync<FmqRequestDatum>& requestChannel,
         const MQDescriptorSync<FmqResultDatum>& resultChannel,
-        std::shared_ptr<IBurstExecutorWithCache> executorWithCache) {
+        std::shared_ptr<IBurstExecutorWithCache> executorWithCache,
+        std::chrono::microseconds pollingTimeWindow) {
     // check inputs
     if (callback == nullptr || executorWithCache == nullptr) {
         LOG(ERROR) << "ExecutionBurstServer::create passed a nullptr";
@@ -445,7 +495,7 @@
 
     // create FMQ objects
     std::unique_ptr<RequestChannelReceiver> requestChannelReceiver =
-            RequestChannelReceiver::create(requestChannel);
+            RequestChannelReceiver::create(requestChannel, pollingTimeWindow);
     std::unique_ptr<ResultChannelSender> resultChannelSender =
             ResultChannelSender::create(resultChannel);
 
@@ -462,7 +512,8 @@
 
 sp<ExecutionBurstServer> ExecutionBurstServer::create(
         const sp<IBurstCallback>& callback, const MQDescriptorSync<FmqRequestDatum>& requestChannel,
-        const MQDescriptorSync<FmqResultDatum>& resultChannel, IPreparedModel* preparedModel) {
+        const MQDescriptorSync<FmqResultDatum>& resultChannel, IPreparedModel* preparedModel,
+        std::chrono::microseconds pollingTimeWindow) {
     // check relevant input
     if (preparedModel == nullptr) {
         LOG(ERROR) << "ExecutionBurstServer::create passed a nullptr";
@@ -475,7 +526,7 @@
 
     // make and return context
     return ExecutionBurstServer::create(callback, requestChannel, resultChannel,
-                                        preparedModelAdapter);
+                                        preparedModelAdapter, pollingTimeWindow);
 }
 
 ExecutionBurstServer::ExecutionBurstServer(