Add PyPerf, example of profiling Python using BPF (#2239)
This tool attaches a BPF program to CPU perf events for profiling. The BPF program understands CPython's internal data structures and can therefore walk the actual Python stack trace, as opposed to the stack trace of the CPython runtime itself, which is what we would normally get with Linux perf.
To use the tool, just run the PyPerf binary:
Use -d / --duration to specify the intended profiling duration in milliseconds. The default is 1000 ms.
Use -c / --sample-rate to specify the intended profiling sample rate, same as the -c argument of Linux perf. The default is 1e6.
You can also use -v / --verbose to set the logging verbosity to 1 or 2 for more detailed information during profiling.
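To make the defaults above concrete, here is a minimal sketch of how these three options could be parsed. It is not the parsing code from this patch: ProfileOptions and parseOptions are hypothetical names, and the sketch assumes every flag (including -v) is followed by an integer value.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical option container; the defaults mirror the ones stated above.
struct ProfileOptions {
  int64_t durationMs = 1000;     // -d / --duration, in milliseconds
  int64_t sampleRate = 1000000;  // -c / --sample-rate (1e6)
  int verbosity = 0;             // -v / --verbose, 1 or 2 for more detail
};

// Parses "<flag> <integer>" pairs. Returns false on an unknown flag,
// a missing value, or a value that is not a plain base-10 integer.
// Assumption: each flag takes exactly one integer argument.
bool parseOptions(const std::vector<std::string>& args, ProfileOptions& out) {
  for (std::size_t i = 0; i < args.size(); i += 2) {
    if (i + 1 >= args.size()) {
      return false;  // flag without a value
    }
    const std::string& flag = args[i];
    const std::string& value = args[i + 1];

    char* end = nullptr;
    long long parsed = std::strtoll(value.c_str(), &end, 10);
    if (end == value.c_str() || *end != '\0') {
      return false;  // value is empty or has trailing non-digits
    }

    if (flag == "-d" || flag == "--duration") {
      out.durationMs = parsed;
    } else if (flag == "-c" || flag == "--sample-rate") {
      out.sampleRate = parsed;
    } else if (flag == "-v" || flag == "--verbose") {
      out.verbosity = static_cast<int>(parsed);
    } else {
      return false;  // unknown flag
    }
  }
  return true;
}
```

With a helper like this, the parsed values could be passed on as profile(opts.sampleRate, opts.durationMs), matching the signature declared in PyPerfUtil.h.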
The tool is a prototype at this point and is by no means mature. It currently has the following limitations:
It only runs on the CPU cycles event.
It only works on Python 3.6 at this point. In fact, all Python versions from 3.0 to 3.6 should work; I just need to verify them and change the constant values. Python 3.7, however, changed some internal data structures, so the actual parsing logic needs to be updated.
It currently hard-codes the offsets of Python's internal data structures. It would be better to take a dependency on python-devel and get the offsets directly from the header files.
The output is pretty horrible. There is no de-duplication of identical stacks, and we always output the GIL state and thread state as raw integer values. I will need to work on prettifying the output and making better sense of the enum values.
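As a rough idea of what de-duplication could look like, the sketch below counts how often each distinct stack occurs. It is only an illustration under assumptions, not code from this patch: StackIds and countStacks are hypothetical names, and a stack is taken to be the vector of int32_t symbol IDs stored in Sample::pyStackIds.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical alias: one stack is the sequence of symbol IDs of a sample.
using StackIds = std::vector<int32_t>;

// Hypothetical helper: groups identical stacks and counts their occurrences.
// std::map works here because std::vector provides lexicographic operator<.
std::map<StackIds, uint64_t> countStacks(const std::vector<StackIds>& stacks) {
  std::map<StackIds, uint64_t> counts;
  for (const auto& stack : stacks) {
    ++counts[stack];  // value-initialized to 0 on first insertion
  }
  return counts;
}
```

The output could then print each distinct stack once together with its count, instead of once per sample.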
Landing it in the C++ examples for now; once it's mature enough I will move it to tools/.
diff --git a/examples/cpp/pyperf/PyPerfUtil.h b/examples/cpp/pyperf/PyPerfUtil.h
new file mode 100644
index 0000000..3e69a29
--- /dev/null
+++ b/examples/cpp/pyperf/PyPerfUtil.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) Facebook, Inc.
+ * Licensed under the Apache License, Version 2.0 (the "License")
+ */
+
+#pragma once
+
+#include <string>
+#include <vector>
+
+#include <linux/perf_event.h>
+#include <sys/types.h>
+
+#include "BPF.h"
+#include "PyPerfType.h"
+
+namespace ebpf {
+namespace pyperf {
+
+class PyPerfUtil {
+ public:
+ enum class PyPerfResult : int {
+ SUCCESS = 0,
+ INIT_FAIL,
+ PERF_BUF_OPEN_FAIL,
+ NO_INIT,
+ EVENT_ATTACH_FAIL,
+ EVENT_DETACH_FAIL
+ };
+
+ struct Sample {
+ pid_t pid;
+ pid_t tid;
+ std::string comm;
+ uint8_t threadStateMatch;
+ uint8_t gilState;
+ uint8_t pthreadIDMatch;
+ uint8_t stackStatus;
+ std::vector<int32_t> pyStackIds;
+
+ explicit Sample(const Event* raw, int rawSize)
+ : pid(raw->pid),
+ tid(raw->tid),
+ comm(raw->comm),
+ threadStateMatch(raw->thread_state_match),
+ gilState(raw->gil_state),
+ pthreadIDMatch(raw->pthread_id_match),
+ stackStatus(raw->stack_status),
+ pyStackIds(raw->stack, raw->stack + raw->stack_len) {}
+ };
+
+ // init must be invoked exactly once before invoking profile
+ PyPerfResult init();
+
+ PyPerfResult profile(int64_t sampleRate, int64_t durationMs);
+
+ private:
+ uint32_t lostSymbols_ = 0, totalSamples_ = 0, lostSamples_ = 0, truncatedStack_ = 0;
+
+ ebpf::BPF bpf_{0, nullptr, false, "", true};
+ std::vector<Sample> samples_;
+ bool initCompleted_{false};
+
+ void handleSample(const void* data, int dataSize);
+ void handleLostSamples(int lostCnt);
+ friend void handleLostSamplesCallback(void*, uint64_t);
+ friend void handleSampleCallback(void*, void*, int);
+
+ std::string getSymbolName(Symbol& sym) const;
+
+ bool tryTargetPid(int pid, PidData& data);
+};
+} // namespace pyperf
+} // namespace ebpf