Implement ftrace flushing

This CL finally implements flushing of ftrace data and
wires it up to the Flush() API of the tracing service.
The major change is in CpuReader, the multi-threaded
class that reads ftrace data.
The core scheduling algorithm is unchanged (see
//docs/ftrace.md); each worker still cycles through:
[wait for command] -> [splice blocking] -> [splice nonblock]
However, this CL lets the FtraceController interrupt the
splice and switch the workers to read() mode for flushes,
because read() is the only way to get partially filled
pages out of the ftrace buffer.
Even when in read() mode, the scheduling algorithm is still
the same, just s/splice/read/.
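To make the splice/read symmetry concrete, here is a minimal, single-threaded model of one worker cycle. It is a sketch under stated assumptions, not the CpuReader code: KernelBuffer, SpliceOnePage(), ReadUpToOnePage(), RunCycle() and the 4096-byte page size are hypothetical, and nothing in the model actually blocks. It only shows why a regular cycle leaves a partial page behind while a flush cycle picks it up.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model only: all names here are hypothetical, not Perfetto's.
constexpr size_t kPageSize = 4096;  // assumed page size

enum class Cmd { kRun, kFlush };

struct KernelBuffer {
  size_t unread_bytes;  // bytes written by the kernel, not yet consumed
};

// splice() can only move whole pages out of the ftrace buffer.
size_t SpliceOnePage(KernelBuffer* buf) {
  if (buf->unread_bytes < kPageSize) return 0;  // real reader: block/EAGAIN
  buf->unread_bytes -= kPageSize;
  return kPageSize;
}

// read() can also return a partially filled page.
size_t ReadUpToOnePage(KernelBuffer* buf) {
  size_t n = buf->unread_bytes < kPageSize ? buf->unread_bytes : kPageSize;
  buf->unread_bytes -= n;
  return n;
}

// One worker cycle after [wait for command]. The same loop runs for both
// commands; a flush just swaps splice() for read(). This model never
// blocks: a transfer returns 0 where the real reader would block or get
// EAGAIN, which ends the cycle.
size_t RunCycle(Cmd cmd, KernelBuffer* buf) {
  assert(buf != nullptr);
  size_t (*transfer)(KernelBuffer*) =
      cmd == Cmd::kFlush ? &ReadUpToOnePage : &SpliceOnePage;
  size_t total = 0;
  for (size_t n = transfer(buf); n > 0; n = transfer(buf)) total += n;
  return total;
}
```

For example, with two full pages plus 100 bytes buffered, a kRun cycle moves 8192 bytes and leaves 100 behind; a following kFlush cycle moves the remaining 100.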
There are several caveats, described in detail in
b/120188810. In short, once we switch to read() for a
flush, we must wait for the ftrace read pointer to become
page-aligned again before switching back to splice().
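The page-alignment rule can be modeled as a small offset tracker. This is a hypothetical sketch: ReadModeTracker and the fixed 4096-byte page size are assumptions, and it deliberately ignores the kernel-side subtleties tracked in b/120188810. It only captures the decision "stay in read() mode until the consumer offset is back on a page boundary".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: tracks the consumer offset into one CPU's ftrace
// buffer and decides whether splice() may be used again after a
// read()-based flush. Assumes a fixed 4 KiB page size.
constexpr uint64_t kPageSize = 4096;

class ReadModeTracker {
 public:
  // Called whenever |bytes| are consumed via read() (e.g. during a flush).
  void OnRead(uint64_t bytes) { offset_ += bytes; }

  // splice() moves whole pages only, so it preserves page alignment.
  void OnSplice(uint64_t pages) {
    assert(CanSplice());
    offset_ += pages * kPageSize;
  }

  // After a partial-page read(), stay in read() mode until the read
  // pointer lands on a page boundary again.
  bool CanSplice() const { return offset_ % kPageSize == 0; }

 private:
  uint64_t offset_ = 0;
};
```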
Furthermore, this CL gets rid of the internal pipe that moved
data between the worker thread and the main thread and switches
to the recently introduced page pool. We still have an internal
pipe (for splice), but it is used entirely within the worker thread.
Exposing the pipe to the main thread had two drawbacks:
1) It limited the transfer bandwidth between the worker
   thread and the main thread, making large ftrace buffer
   sizes useless. Before, even if one specified a buffer size
   of 1GB, we would read at most one pipe buffer per read cycle.
2) It hit another kernel bug around non-blocking splice
   (b/119805587).
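The bandwidth point can be illustrated with a mutex-protected page list that the worker fills and the main thread drains in a single call. SimplePagePool, Commit() and TakeAll() are hypothetical names for this sketch; they are not the API of the page pool this CL switches to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a page pool shared between a
// worker thread (producer) and the main thread (consumer).
constexpr size_t kPageSize = 4096;  // assumed page size
using Page = std::vector<uint8_t>;

class SimplePagePool {
 public:
  // Worker thread: publish a page it has filled via splice() or read().
  void Commit(Page page) {
    assert(page.size() <= kPageSize);
    std::lock_guard<std::mutex> lock(mutex_);
    committed_.push_back(std::move(page));
  }

  // Main thread: take every committed page in one go. Unlike draining a
  // pipe, the amount handed over per cycle is not capped by a pipe buffer.
  std::vector<Page> TakeAll() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<Page> out;
    out.swap(committed_);
    return out;
  }

 private:
  std::mutex mutex_;
  std::vector<Page> committed_;
};
```

By comparison, a pipe holds only a bounded amount of data at a time (64 KiB by default on Linux, see pipe(7)), so a design that funnels every page through it is capped at that much per drain.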

Test: perfetto_integrationtests --gtest_filter=PerfettoTest.VeryLargePackets
Bug: 73886018
Change-Id: I4a6e82b284971a3765b42959904f5402095e4c4e
diff --git a/src/traced/probes/ftrace/ftrace_controller.h b/src/traced/probes/ftrace/ftrace_controller.h
index b1845ce..5201e34 100644
--- a/src/traced/probes/ftrace/ftrace_controller.h
+++ b/src/traced/probes/ftrace/ftrace_controller.h
@@ -17,14 +17,13 @@
 #ifndef SRC_TRACED_PROBES_FTRACE_FTRACE_CONTROLLER_H_
 #define SRC_TRACED_PROBES_FTRACE_FTRACE_CONTROLLER_H_
 
+#include <stdint.h>
 #include <unistd.h>
 
 #include <bitset>
-#include <condition_variable>
 #include <functional>
 #include <map>
 #include <memory>
-#include <mutex>
 #include <set>
 #include <string>
 
@@ -32,7 +31,9 @@
 #include "perfetto/base/task_runner.h"
 #include "perfetto/base/utils.h"
 #include "perfetto/base/weak_ptr.h"
+#include "perfetto/tracing/core/basic_types.h"
 #include "src/traced/probes/ftrace/ftrace_config.h"
+#include "src/traced/probes/ftrace/ftrace_thread_sync.h"
 
 namespace perfetto {
 
@@ -61,6 +62,10 @@
   static std::unique_ptr<FtraceController> Create(base::TaskRunner*, Observer*);
   virtual ~FtraceController();
 
+  // These two methods are called by CpuReader(s) from their worker threads.
+  static void OnCpuReaderRead(size_t cpu, int generation, FtraceThreadSync*);
+  static void OnCpuReaderFlush(size_t cpu, int generation, FtraceThreadSync*);
+
   void DisableAllEvents();
   void WriteTraceMarker(const std::string& s);
   void ClearTrace();
@@ -69,6 +74,11 @@
   bool StartDataSource(FtraceDataSource*);
   void RemoveDataSource(FtraceDataSource*);
 
+  // Force a read of the ftrace buffers, including kernel buffer pages that
+  // are not full. Will call OnFtraceFlushComplete() on all
+  // |started_data_sources_| once all workers have flushed (or timed out).
+  void Flush(FlushRequestID);
+
   void DumpFtraceStats(FtraceStats*);
 
   base::WeakPtr<FtraceController> GetWeakPtr() {
@@ -95,36 +105,28 @@
   FtraceController(const FtraceController&) = delete;
   FtraceController& operator=(const FtraceController&) = delete;
 
-  // Called on a worker thread when |cpu| has at least one page of data
-  // available for reading.
-  void OnDataAvailable(base::WeakPtr<FtraceController>,
-                       size_t generation,
-                       size_t cpu,
-                       uint32_t drain_period_ms);
-
-  static void DrainCPUs(base::WeakPtr<FtraceController>, size_t generation);
-  static void UnblockReaders(const base::WeakPtr<FtraceController>&);
+  void OnFlushTimeout(FlushRequestID);
+  void DrainCPUs(int generation);
+  void UnblockReaders();
+  void NotifyFlushCompleteToStartedDataSources(FlushRequestID);
+  void IssueThreadSyncCmd(FtraceThreadSync::Cmd,
+                          std::unique_lock<std::mutex> = {});
 
   uint32_t GetDrainPeriodMs();
 
   void StartIfNeeded();
   void StopIfNeeded();
 
-  // Begin lock-protected members.
-  std::mutex lock_;
-  std::condition_variable data_drained_;
-  std::bitset<base::kMaxCpus> cpus_to_drain_;
-  bool listening_for_raw_trace_data_ = false;
-  // End lock-protected members.
-
   base::TaskRunner* const task_runner_;
   Observer* const observer_;
+  FtraceThreadSync thread_sync_;
   std::unique_ptr<FtraceProcfs> ftrace_procfs_;
   std::unique_ptr<ProtoTranslationTable> table_;
   std::unique_ptr<FtraceConfigMuxer> ftrace_config_muxer_;
-  size_t generation_ = 0;
+  int generation_ = 0;
+  FlushRequestID cur_flush_request_id_ = 0;
   bool atrace_running_ = false;
-  std::map<size_t, std::unique_ptr<CpuReader>> cpu_readers_;
+  std::vector<std::unique_ptr<CpuReader>> cpu_readers_;
   std::set<FtraceDataSource*> data_sources_;
   std::set<FtraceDataSource*> started_data_sources_;
   base::WeakPtrFactory<FtraceController> weak_factory_;  // Keep last.