|  | =================== | 
|  | Debugging with XRay | 
|  | =================== | 
|  |  | 
|  | This document shows an example of how you would go about analyzing applications | 
|  | built with XRay instrumentation. Here we will attempt to debug ``llc`` | 
|  | compiling some sample LLVM IR generated by Clang. | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | Building with XRay | 
|  | ------------------ | 
|  |  | 
|  | To debug an application with XRay instrumentation, we need to build it with a | 
|  | Clang that supports the ``-fxray-instrument`` option. See `XRay <XRay.html>`_ | 
|  | for more technical details of how XRay works for background information. | 
|  |  | 
|  | In our example, we need to add ``-fxray-instrument`` to the list of flags | 
|  | passed to Clang when building a binary. Note that we need to link with Clang as | 
|  | well to get the XRay runtime linked in appropriately. For building ``llc`` with | 
|  | XRay, we do something similar below for our LLVM build: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ mkdir -p llvm-build && cd llvm-build | 
|  | # Assume that the LLVM sources are at ../llvm | 
|  | $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \ | 
|  | -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \ | 
|  | # Once this finishes, we should build llc | 
|  | $ ninja llc | 
|  |  | 
|  |  | 
|  | To verify that we have an XRay instrumented binary, we can use ``objdump`` to | 
|  | look for the ``xray_instr_map`` section. | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ objdump -h -j xray_instr_map ./bin/llc | 
|  | ./bin/llc:     file format elf64-x86-64 | 
|  |  | 
|  | Sections: | 
|  | Idx Name          Size      VMA               LMA               File off  Algn | 
|  | 14 xray_instr_map 00002fc0  00000000041516c6  00000000041516c6  03d516c6  2**0 | 
|  | CONTENTS, ALLOC, LOAD, READONLY, DATA | 
|  |  | 
|  | Getting Traces | 
|  | -------------- | 
|  |  | 
|  | By default, XRay does not write out the trace files or patch the application | 
|  | before main starts. If we run ``llc`` it should work like a normally built | 
|  | binary. If we want to get a full trace of the application's operations (of the | 
|  | functions we do end up instrumenting with XRay) then we need to enable XRay | 
|  | at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment | 
|  | variable. | 
|  |  | 
|  | :: | 
|  |  | 
|  | # The following doesn't create an XRay trace by default. | 
|  | $ ./bin/llc input.ll | 
|  |  | 
|  | # We need to set the XRAY_OPTIONS to enable some features. | 
|  | $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll | 
|  | ==69819==XRay: Log file in 'xray-log.llc.m35qPB' | 
|  |  | 
|  | At this point we now have an XRay trace we can start analysing. | 
|  |  | 
|  | The ``llvm-xray`` Tool | 
|  | ---------------------- | 
|  |  | 
|  | Having a trace then allows us to do basic accounting of the functions that were | 
|  | instrumented, and how much time we're spending in parts of the code. To make | 
|  | sense of this data, we use the ``llvm-xray`` tool which has a few subcommands | 
|  | to help us understand our trace. | 
|  |  | 
|  | One of the things we can do is to get an accounting of the functions that have | 
|  | been instrumented. We can see an example accounting with ``llvm-xray account``: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc | 
|  | Functions with latencies: 29 | 
|  | funcid      count [      min,       med,       90p,       99p,       max]       sum  function | 
|  | 187        360 [ 0.000000,  0.000001,  0.000014,  0.000032,  0.000075]  0.001596  LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier() | 
|  | 85        130 [ 0.000000,  0.000000,  0.000018,  0.000023,  0.000156]  0.000799  X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*) | 
|  | 138        130 [ 0.000000,  0.000000,  0.000017,  0.000155,  0.000155]  0.000774  SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) | 
|  | 188        103 [ 0.000000,  0.000000,  0.000003,  0.000123,  0.000214]  0.000737  LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*) | 
|  | 88          1 [ 0.000562,  0.000562,  0.000562,  0.000562,  0.000562]  0.000562  X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&) | 
|  | 125        102 [ 0.000001,  0.000003,  0.000010,  0.000017,  0.000049]  0.000471  Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&) | 
|  | 90          8 [ 0.000023,  0.000035,  0.000106,  0.000106,  0.000106]  0.000342  X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const | 
|  | 124         32 [ 0.000003,  0.000007,  0.000016,  0.000041,  0.000041]  0.000310  Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&) | 
|  | 123          1 [ 0.000302,  0.000302,  0.000302,  0.000302,  0.000302]  0.000302  LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl() | 
|  | 139         46 [ 0.000000,  0.000002,  0.000006,  0.000008,  0.000019]  0.000138  TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const | 
|  |  | 
|  | This shows us that for our input file, ``llc`` spent the most cumulative time | 
|  | in the lexer (a total of 1 millisecond). If we wanted for example to work with | 
|  | this data in a spreadsheet, we can output the results as CSV using the | 
|  | ``-format=csv`` option to the command for further analysis. | 
|  |  | 
|  | If we want to get a textual representation of the raw trace we can use the | 
|  | ``llvm-xray convert`` tool to get YAML output. The first few lines of that | 
|  | output for an example trace would look like the following: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB | 
|  | --- | 
|  | header: | 
|  | version:         1 | 
|  | type:            0 | 
|  | constant-tsc:    true | 
|  | nonstop-tsc:     true | 
|  | cycle-frequency: 2601000000 | 
|  | records: | 
|  | - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 } | 
|  | - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 } | 
|  | - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 } | 
|  | - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 } | 
|  | - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 } | 
|  | - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 } | 
|  | - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 } | 
|  |  | 
|  | Controlling Fidelity | 
|  | -------------------- | 
|  |  | 
|  | So far in our examples, we haven't been getting full coverage of the functions | 
|  | we have in the binary. To get that, we need to modify the compiler flags so | 
|  | that we can instrument more (if not all) the functions we have in the binary. | 
|  | We have two options for doing that, and we explore both of these below. | 
|  |  | 
|  | Instruction Threshold | 
|  | ````````````````````` | 
|  |  | 
|  | The first "blunt" way of doing this is by setting the minimum threshold for | 
|  | function bodies to 1. We can do that with the | 
|  | ``-fxray-instruction-threshold=N`` flag when building our binary. We rebuild | 
|  | ``llc`` with this option and observe the results: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ rm CMakeCache.txt | 
|  | $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \ | 
|  | -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \ | 
|  | -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1" | 
|  | $ ninja llc | 
|  | $ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll | 
|  | ==69819==XRay: Log file in 'xray-log.llc.5rqxkU' | 
|  |  | 
|  | $ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc | 
|  | Functions with latencies: 36652 | 
|  | funcid      count [      min,       med,       90p,       99p,       max]       sum  function | 
|  | 75          1 [ 0.672368,  0.672368,  0.672368,  0.672368,  0.672368]  0.672368  llc.cpp:271:0: main | 
|  | 78          1 [ 0.626455,  0.626455,  0.626455,  0.626455,  0.626455]  0.626455  llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&) | 
|  | 139617          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&) | 
|  | 139610          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&) | 
|  | 139612          1 [ 0.470948,  0.470948,  0.470948,  0.470948,  0.470948]  0.470948  LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) | 
|  | 139607          2 [ 0.147345,  0.315994,  0.315994,  0.315994,  0.315994]  0.463340  LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&) | 
|  | 139605         21 [ 0.000002,  0.000002,  0.102593,  0.213336,  0.213336]  0.463331  LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&) | 
|  | 139563      26096 [ 0.000002,  0.000002,  0.000037,  0.000063,  0.000215]  0.225708  LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool) | 
|  | 108055        188 [ 0.000002,  0.000120,  0.001375,  0.004523,  0.062624]  0.159279  MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) | 
|  | 62635         22 [ 0.000041,  0.000046,  0.000050,  0.126744,  0.126744]  0.127715  X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const | 
|  |  | 
|  |  | 
|  | Instrumentation Attributes | 
|  | `````````````````````````` | 
|  |  | 
|  | The other way is to use configuration files for selecting which functions | 
|  | should always be instrumented by the compiler. This gives us a way of ensuring | 
|  | that certain functions are either always or never instrumented by not having to | 
|  | add the attribute to the source. | 
|  |  | 
|  | To use this feature, you can define one file for the functions to always | 
|  | instrument, and another for functions to never instrument. The format of these | 
|  | files are exactly the same as the SanitizerLists files that control similar | 
|  | things for the sanitizer implementations. For example: | 
|  |  | 
|  | :: | 
|  |  | 
|  | # xray-attr-list.txt | 
|  | # always instrument functions that match the following filters: | 
|  | [always] | 
|  | fun:main | 
|  |  | 
|  | # never instrument functions that match the following filters: | 
|  | [never] | 
|  | fun:__cxx_* | 
|  |  | 
|  | Given the file above we can re-build by providing it to the | 
|  | ``-fxray-attr-list=`` flag to clang. You can have multiple files, each defining | 
|  | different sets of attribute sets, to be combined into a single list by clang. | 
|  |  | 
|  | The XRay stack tool | 
|  | ------------------- | 
|  |  | 
|  | Given a trace, and optionally an instrumentation map, the ``llvm-xray stack`` | 
|  | command can be used to analyze a call stack graph constructed from the function | 
|  | call timeline. | 
|  |  | 
|  | The way to use the command is to output the top stacks by call count and time spent. | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc | 
|  |  | 
|  | Unique Stacks: 3069 | 
|  | Top 10 Stacks by leaf sum: | 
|  |  | 
|  | Sum: 9633790 | 
|  | lvl   function                                                            count              sum | 
|  | #0    main                                                                    1         58421550 | 
|  | #1    compileModule(char**, llvm::LLVMContext&)                               1         51440360 | 
|  | #2    llvm::legacy::PassManagerImpl::run(llvm::Module&)                       1         40535375 | 
|  | #3    llvm::FPPassManager::runOnModule(llvm::Module&)                         2         39337525 | 
|  | #4    llvm::FPPassManager::runOnFunction(llvm::Function&)                     6         39331465 | 
|  | #5    llvm::PMDataManager::verifyPreservedAnalysis(llvm::Pass*)             399         16628590 | 
|  | #6    llvm::PMTopLevelManager::findAnalysisPass(void const*)               4584         15155600 | 
|  | #7    llvm::PMDataManager::findAnalysisPass(void const*, bool)            32088          9633790 | 
|  |  | 
|  | ..etc.. | 
|  |  | 
|  | In the default mode, identical stacks on different threads are independently | 
|  | aggregated. In a multithreaded program, you may end up having identical call | 
|  | stacks fill your list of top calls. | 
|  |  | 
|  | To address this, you may specify the ``-aggregate-threads`` or | 
|  | ``-per-thread-stacks`` flags. ``-per-thread-stacks`` treats the thread id as an | 
|  | implicit root in each call stack tree, while ``-aggregate-threads`` combines | 
|  | identical stacks from all threads. | 
|  |  | 
|  | Flame Graph Generation | 
|  | ---------------------- | 
|  |  | 
|  | The ``llvm-xray stack`` tool may also be used to generate flamegraphs for | 
|  | visualizing your instrumented invocations. The tool does not generate the graphs | 
|  | themselves, but instead generates a format that can be used with Brendan Gregg's | 
|  | FlameGraph tool, currently available on `github | 
|  | <https://github.com/brendangregg/FlameGraph>`_. | 
|  |  | 
|  | To generate output for a flamegraph, a few more options are necessary. | 
|  |  | 
|  | - ``-all-stacks`` - Emits all of the stacks. | 
|  | - ``-stack-format`` - Choose the flamegraph output format 'flame'. | 
|  | - ``-aggregation-type`` - Choose the metric to graph. | 
|  |  | 
|  | You may pipe the command output directly to the flamegraph tool to obtain an | 
|  | svg file. | 
|  |  | 
|  | :: | 
|  |  | 
|  | $llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc -stack-format=flame -aggregation-type=time -all-stacks | \ | 
|  | /path/to/FlameGraph/flamegraph.pl > flamegraph.svg | 
|  |  | 
|  | If you open the svg in a browser, mouse events allow exploring the call stacks. | 
|  |  | 
|  | Chrome Trace Viewer Visualization | 
|  | --------------------------------- | 
|  |  | 
|  | We can also generate a trace which can be loaded by the Chrome Trace Viewer | 
|  | from the same generated trace: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ llvm-xray convert -symbolize -instr_map=./bin/llc \ | 
|  | -output_format=trace_event xray-log.llc.5rqxkU \ | 
|  | | gzip > llc-trace.txt.gz | 
|  |  | 
|  | From a Chrome browser, navigating to ``chrome:///tracing`` allows us to load | 
|  | the ``sample-trace.txt.gz`` file to visualize the execution trace. | 
|  |  | 
|  | Further Exploration | 
|  | ------------------- | 
|  |  | 
|  | The ``llvm-xray`` tool has a few other subcommands that are in various stages | 
|  | of being developed. One interesting subcommand that can highlight a few | 
|  | interesting things is the ``graph`` subcommand. Given for example the following | 
|  | toy program that we build with XRay instrumentation, we can see how the | 
|  | generated graph may be a helpful indicator of where time is being spent for the | 
|  | application. | 
|  |  | 
|  | .. code-block:: c++ | 
|  |  | 
|  | // sample.cc | 
|  | #include <iostream> | 
|  | #include <thread> | 
|  |  | 
|  | [[clang::xray_always_instrument]] void f() { | 
|  | std::cerr << '.'; | 
|  | } | 
|  |  | 
|  | [[clang::xray_always_instrument]] void g() { | 
|  | for (int i = 0; i < 1 << 10; ++i) { | 
|  | std::cerr << '-'; | 
|  | } | 
|  | } | 
|  |  | 
|  | int main(int argc, char* argv[]) { | 
|  | std::thread t1([] { | 
|  | for (int i = 0; i < 1 << 10; ++i) | 
|  | f(); | 
|  | }); | 
|  | std::thread t2([] { | 
|  | g(); | 
|  | }); | 
|  | t1.join(); | 
|  | t2.join(); | 
|  | std::cerr << '\n'; | 
|  | } | 
|  |  | 
|  | We then build the above with XRay instrumentation: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1 | 
|  | $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./sample | 
|  |  | 
|  | We can then explore the graph rendering of the trace generated by this sample | 
|  | application. We assume you have the graphviz toosl available in your system, | 
|  | including both ``unflatten`` and ``dot``. If you prefer rendering or exploring | 
|  | the graph using another tool, then that should be feasible as well. ``llvm-xray | 
|  | graph`` will create DOT format graphs which should be usable in most graph | 
|  | rendering applications. One example invocation of the ``llvm-xray graph`` | 
|  | command should yield some interesting insights to the workings of C++ | 
|  | applications: | 
|  |  | 
|  | :: | 
|  |  | 
|  | $ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \ | 
|  | | unflatten -f -l10 | dot -Tsvg -o sample.svg | 
|  |  | 
|  |  | 
|  | Next Steps | 
|  | ---------- | 
|  |  | 
|  | If you have some interesting analyses you'd like to implement as part of the | 
|  | llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list. | 
|  | The following are some ideas to inspire you in getting involved and potentially | 
|  | making things better. | 
|  |  | 
|  | - Implement a query/filtering library that allows for finding patterns in the | 
|  | XRay traces. | 
|  | - Collecting function call stacks and how often they're encountered in the | 
|  | XRay trace. | 
|  |  | 
|  |  |