<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | |
<html xmlns="http://www.w3.org/1999/xhtml"> | |
<head> | |
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> | |
<link href="style.css" rel="stylesheet" type="text/css" /> | |
<title>Symbolicating with LLDB</title> | |
</head> | |
<body> | |
<div class="www_title"> | |
The <strong>LLDB</strong> Debugger | |
</div> | |
<div id="container"> | |
<div id="content"> | |
<!--#include virtual="sidebar.incl"--> | |
<div id="middle"> | |
<div class="post"> | |
<h1 class="postheader">Manual Symbolication with LLDB</h1> | |
<div class="postcontent"> | |
<p>LLDB is separated into a shared library that contains the core of the debugger, | |
and a driver that implements debugging and a command interpreter. LLDB can be | |
used to symbolicate your crash logs and can often provide more information than | |
other symbolication programs: | |
</p> | |
<ul> | |
<li>Inlined functions</li> | |
<li>Variables that are in scope for an address, along with their locations</li> | |
</ul> | |
<p>The simplest form of symbolication is to load an executable:</p> | |
<code><pre><tt><b>(lldb)</b> target create --no-dependents --arch x86_64 /tmp/a.out | |
</tt></pre></code> | |
<p>We use the "--no-dependents" flag with the "target create" command so | |
that we don't load all of the dependent shared libraries from the current | |
system. When we symbolicate, we are often symbolicating a binary that | |
was running on another system, and even though the main executable might | |
reference shared libraries in "/usr/lib", we often don't want to load | |
the versions on the current computer.</p> | |
<p>Using the "image list" command will show us a list of all shared libraries | |
associated with the current target. As expected, we currently only have a single | |
binary: | |
</p> | |
<code><pre><tt><b>(lldb)</b> image list | |
[ 0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out | |
/tmp/a.out.dSYM/Contents/Resources/DWARF/a.out | |
</tt></pre></code> | |
<p>Now we can look up an address:</p> | |
<code><pre><tt><b>(lldb)</b> image lookup --address 0x100000aa3 | |
Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131) | |
Summary: a.out`main + 67 at main.c:13 | |
</tt></pre></code> | |
<p>Since we haven't specified a slide or any load addresses for individual sections | |
in the binary, the address that we use here is a <b>file</b> address. A <b>file</b> | |
address refers to a virtual address as defined by each object file. | |
</p> | |
<p>If we didn't use the "--no-dependents" option with "target create", we would | |
have loaded all dependent shared libraries:</p>
<code><pre><tt><b>(lldb)</b> image list | |
[ 0] 73431214-6B76-3489-9557-5075F03E36B4 0x0000000100000000 /tmp/a.out | |
/tmp/a.out.dSYM/Contents/Resources/DWARF/a.out | |
[ 1] 8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B 0x0000000000000000 /usr/lib/system/libsystem_c.dylib | |
[ 2] 62AA0B84-188A-348B-8F9E-3E2DB08DB93C 0x0000000000000000 /usr/lib/system/libsystem_dnssd.dylib | |
[ 3] C0535565-35D1-31A7-A744-63D9F10F12A4 0x0000000000000000 /usr/lib/system/libsystem_kernel.dylib | |
... | |
</tt></pre></code> | |
<p>Now if we do a lookup using a <b>file</b> address, this can result in multiple | |
matches since most shared libraries have a virtual address space that starts at zero:</p> | |
<code><pre><tt><b>(lldb)</b> image lookup -a 0x1000 | |
Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096) | |
Address: libsystem_c.dylib[0x0000000000001000] (libsystem_c.dylib.__TEXT.__text + 928) | |
Summary: libsystem_c.dylib`mcount + 9 | |
Address: libsystem_dnssd.dylib[0x0000000000001000] (libsystem_dnssd.dylib.__TEXT.__text + 456) | |
Summary: libsystem_dnssd.dylib`ConvertHeaderBytes + 38 | |
Address: libsystem_kernel.dylib[0x0000000000001000] (libsystem_kernel.dylib.__TEXT.__text + 1116) | |
Summary: libsystem_kernel.dylib`clock_get_time + 102 | |
... | |
</tt></pre></code> | |
<p>To avoid getting multiple file address matches, you can specify the | |
<b>name</b> of the shared library to limit the search:</p> | |
<code><pre><tt><b>(lldb)</b> image lookup -a 0x1000 <b>a.out</b> | |
Address: a.out[0x0000000000001000] (a.out.__PAGEZERO + 4096) | |
</tt></pre></code> | |
</div> | |
<div class="postfooter"></div> | |
</div> | |
<div class="post"> | |
<h1 class="postheader">Defining Load Addresses for Sections</h1> | |
<div class="postcontent"> | |
<p>When symbolicating your crash logs, it can be tedious to convert every
crash log address into a file address. To avoid the conversion, you can set
the load address for the sections of the modules in your target.
Once you set any section load address, lookups will switch to using
<b>load</b> addresses. You can slide all sections in the executable by the same amount,
or set the <b>load</b> address for individual sections. The
"target modules load --slide" command allows us to set the <b>load</b> address for
all sections.</p>
<p>Below is an example of sliding all sections in <b>a.out</b> by adding 0x123000 to each section's <b>file</b> address:</p> | |
<code><pre><tt><b>(lldb)</b> target create --no-dependents --arch x86_64 /tmp/a.out | |
<b>(lldb)</b> target modules load --file a.out --slide 0x123000 | |
</tt></pre></code> | |
<p>It is often much easier to specify the actual load location of each section by name.
Crash logs on Mac OS X have a <b>Binary Images</b> section that specifies
the address of the __TEXT segment for each binary. Specifying a slide
requires that you first find the original (<b>file</b>) address for the __TEXT
segment and subtract it from the load address.
If you specify the
address of the __TEXT segment with "target modules load <i>section</i> <i>address</i>", you don't need to do any calculations. To specify
the load addresses of sections we can specify one or more section name + address pairs
in the "target modules load" command:</p>
<code><pre><tt><b>(lldb)</b> target create --no-dependents --arch x86_64 /tmp/a.out | |
<b>(lldb)</b> target modules load --file a.out __TEXT 0x100123000 | |
</tt></pre></code> | |
<p>We specified that the <b>__TEXT</b> section is loaded at 0x100123000.
Now that we have defined where sections have been loaded in our target,
any lookups we do will use <b>load</b> addresses. We no longer have to
do any math on the addresses in the crash log backtraces and can use the
raw addresses directly:</p>
<code><pre><tt><b>(lldb)</b> image lookup --address 0x100123aa3 | |
Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 131) | |
Summary: a.out`main + 67 at main.c:13 | |
</tt></pre></code> | |
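<p>The relationship between the slide, the <b>file</b> address and the <b>load</b> address is plain
arithmetic. The sketch below works it through with the values from this example. The helper names are
illustrative only, and the __TEXT file address of 0x100000000 is an assumption inferred from the
"image list" output shown earlier:</p>

```python
# Values taken from the example above; the helper names are illustrative only.
TEXT_FILE_ADDR = 0x100000000  # assumed __TEXT address as recorded in the object file
TEXT_LOAD_ADDR = 0x100123000  # __TEXT address passed to "target modules load"


def compute_slide(load_addr, file_addr):
    """Return the offset that was added to every section when the image was loaded."""
    return load_addr - file_addr


def load_to_file_address(load_addr, slide):
    """Convert a raw crash log (load) address back into a file address."""
    return load_addr - slide


slide = compute_slide(TEXT_LOAD_ADDR, TEXT_FILE_ADDR)
file_addr = load_to_file_address(0x100123aa3, slide)
print(hex(slide))      # 0x123000
print(hex(file_addr))  # 0x100000aa3
```

<p>Once the section load addresses are set, LLDB performs this conversion for you, which is why the
raw address can be passed to "image lookup" unchanged.</p>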
</div> | |
<div class="postfooter"></div> | |
</div> | |
<div class="post"> | |
<h1 class="postheader">Loading Multiple Executables</h1> | |
<div class="postcontent"> | |
<p>You often have more than one executable involved when you need to symbolicate | |
a crash log. When this happens, you create a target for the main executable | |
or one of the shared libraries, then add more modules to the target using the
"target modules add" command.</p>
<p>Let's say we have a Darwin crash log that contains the following images:</p>
<code><pre><tt>Binary Images: | |
<font color=blue>0x100000000</font> - 0x100000ff7 &lt;A866975B-CA1E-3649-98D0-6C5FAA444ECF&gt; /tmp/a.out
<font color=green>0x7fff83f32000</font> - 0x7fff83ffefe7 &lt;8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B&gt; /usr/lib/system/libsystem_c.dylib
<font color=red>0x7fff883db000</font> - 0x7fff883e3ff7 &lt;62AA0B84-188A-348B-8F9E-3E2DB08DB93C&gt; /usr/lib/system/libsystem_dnssd.dylib
<font color=purple>0x7fff8c0dc000</font> - 0x7fff8c0f7ff7 &lt;C0535565-35D1-31A7-A744-63D9F10F12A4&gt; /usr/lib/system/libsystem_kernel.dylib
</tt></pre></code> | |
<p>First we create the target using the main executable and then add any extra shared libraries we want:</p> | |
<code><pre><tt><b>(lldb)</b> target create --no-dependents --arch x86_64 /tmp/a.out | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_c.dylib | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_dnssd.dylib | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_kernel.dylib | |
</tt></pre></code> | |
<p>If you have debug symbols in standalone files, such as dSYM files on Mac OS X, you can specify their paths using the <b>--symfile</b> option for the "target create" (recent LLDB releases only) and "target modules add" commands:</p> | |
<code><pre><tt><b>(lldb)</b> target create --no-dependents --arch x86_64 /tmp/a.out <b>--symfile /tmp/a.out.dSYM</b> | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_c.dylib <b>--symfile /build/server/a/libsystem_c.dylib.dSYM</b> | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_dnssd.dylib <b>--symfile /build/server/b/libsystem_dnssd.dylib.dSYM</b> | |
<b>(lldb)</b> target modules add /usr/lib/system/libsystem_kernel.dylib <b>--symfile /build/server/c/libsystem_kernel.dylib.dSYM</b> | |
</tt></pre></code> | |
<p>Then we set the load addresses for each __TEXT section (note the colors of the load addresses above and below) using the first address from the Binary Images section for each image:</p> | |
<code><pre><tt><b>(lldb)</b> target modules load --file a.out <font color=blue>0x100000000</font> | |
<b>(lldb)</b> target modules load --file libsystem_c.dylib <font color=green>0x7fff83f32000</font> | |
<b>(lldb)</b> target modules load --file libsystem_dnssd.dylib <font color=red>0x7fff883db000</font> | |
<b>(lldb)</b> target modules load --file libsystem_kernel.dylib <font color=purple>0x7fff8c0dc000</font> | |
</tt></pre></code> | |
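<p>Typing these commands by hand gets tedious for longer image lists. The following is a minimal sketch
that turns Binary Images lines into the equivalent LLDB commands. It assumes the simplified line layout
shown in this example (real crash logs may carry extra name and version fields), and the function names
are hypothetical helpers, not part of LLDB:</p>

```python
import os
import re

# Sample "Binary Images" lines copied from the crash log excerpt above.
BINARY_IMAGES = """\
0x100000000 - 0x100000ff7 <A866975B-CA1E-3649-98D0-6C5FAA444ECF> /tmp/a.out
0x7fff83f32000 - 0x7fff83ffefe7 <8CBCF9B9-EBB7-365E-A3FF-2F3850763C6B> /usr/lib/system/libsystem_c.dylib
"""

# Assumed layout: start - end <UUID> path
IMAGE_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+<([0-9A-Fa-f-]+)>\s+(.+?)\s*$")


def parse_binary_images(text):
    """Return (start, end, uuid, path) tuples, skipping lines that do not match."""
    images = []
    for line in text.splitlines():
        match = IMAGE_RE.match(line)
        if match:
            start, end, uuid, path = match.groups()
            images.append((int(start, 16), int(end, 16), uuid, path))
    return images


def lldb_commands(images):
    """Build setup commands; the first image is treated as the main executable."""
    commands = []
    for index, (start, _end, _uuid, path) in enumerate(images):
        if index == 0:
            commands.append("target create --no-dependents --arch x86_64 %s" % path)
        else:
            commands.append("target modules add %s" % path)
        commands.append("target modules load --file %s __TEXT 0x%x"
                        % (os.path.basename(path), start))
    return commands


for command in lldb_commands(parse_binary_images(BINARY_IMAGES)):
    print(command)
```

<p>The printed lines can be pasted into an LLDB session or saved to a file and run with "command source".</p>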
<p>Now any stack backtraces that haven't been symbolicated can be symbolicated using "image lookup" | |
with the raw backtrace addresses.</p> | |
<p>Given the following raw backtrace:</p> | |
<code><pre><tt>Thread 0 Crashed:: Dispatch queue: com.apple.main-thread | |
0 libsystem_kernel.dylib 0x00007fff8a1e6d46 __kill + 10 | |
1 libsystem_c.dylib 0x00007fff84597df0 abort + 177 | |
2 libsystem_c.dylib 0x00007fff84598e2a __assert_rtn + 146 | |
3 a.out 0x0000000100000f46 main + 70 | |
4 libdyld.dylib 0x00007fff8c4197e1 start + 1 | |
</tt></pre></code> | |
<p>We can now symbolicate the <b>load</b> addresses:</p>
<code><pre><tt><b>(lldb)</b> image lookup -a 0x00007fff8a1e6d46 | |
<b>(lldb)</b> image lookup -a 0x00007fff84597df0 | |
<b>(lldb)</b> image lookup -a 0x00007fff84598e2a | |
<b>(lldb)</b> image lookup -a 0x0000000100000f46 | |
</tt></pre></code> | |
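<p>The lookup commands can also be generated from the backtrace text. This is a rough sketch that assumes
each frame line has the layout shown above (frame index, image name without spaces, load address, then
the partial symbol text). The <b>lookup_commands</b> helper is hypothetical:</p>

```python
import re

# Frames copied from the raw backtrace above.
BACKTRACE = """\
0 libsystem_kernel.dylib 0x00007fff8a1e6d46 __kill + 10
1 libsystem_c.dylib 0x00007fff84597df0 abort + 177
2 libsystem_c.dylib 0x00007fff84598e2a __assert_rtn + 146
3 a.out 0x0000000100000f46 main + 70
"""

# Assumed frame layout: index, image name, load address, remaining text.
FRAME_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s")


def lookup_commands(backtrace):
    """Return one "image lookup" command per frame line that matches FRAME_RE."""
    commands = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Keep the address text exactly as it appears in the crash log.
            commands.append("image lookup -a %s" % match.group(3))
    return commands


for command in lookup_commands(BACKTRACE):
    print(command)
```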
</div> | |
<div class="postfooter"></div> | |
</div> | |
<div class="post"> | |
<h1 class="postheader">Getting Variable Information</h1> | |
<div class="postcontent"> | |
<p>If you add the --verbose flag to the "image lookup --address" command,
you can get verbose information, which often includes the locations
of some of your local variables:</p>
<code><pre><tt><b>(lldb)</b> image lookup --address 0x100123aa3 --verbose
Address: a.out[0x0000000100000aa3] (a.out.__TEXT.__text + 110) | |
Summary: a.out`main + 50 at main.c:13 | |
Module: file = "/tmp/a.out", arch = "x86_64" | |
CompileUnit: id = {0x00000000}, file = "/tmp/main.c", language = "ISO C:1999" | |
Function: id = {0x0000004f}, name = "main", range = [0x0000000100000bc0-0x0000000100000dc9) | |
FuncType: id = {0x0000004f}, decl = main.c:9, clang_type = "int (int, const char **, const char **, const char **)" | |
Blocks: id = {0x0000004f}, range = [0x100000bc0-0x100000dc9) | |
id = {0x000000ae}, range = [0x100000bf2-0x100000dc4) | |
LineEntry: [0x0000000100000bf2-0x0000000100000bfa): /tmp/main.c:13:23 | |
Symbol: id = {0x00000004}, range = [0x0000000100000bc0-0x0000000100000dc9), name="main" | |
Variable: id = {0x000000bf}, name = "path", type= "char [1024]", location = DW_OP_fbreg(-1072), decl = main.c:28 | |
Variable: id = {0x00000072}, name = "argc", type= "int", <b>location = r13</b>, decl = main.c:8 | |
Variable: id = {0x00000081}, name = "argv", type= "const char **", <b>location = r12</b>, decl = main.c:8 | |
Variable: id = {0x00000090}, name = "envp", type= "const char **", <b>location = r15</b>, decl = main.c:8 | |
Variable: id = {0x0000009f}, name = "aapl", type= "const char **", <b>location = rbx</b>, decl = main.c:8 | |
</tt></pre></code> | |
<p>The interesting part is the list of variables. These are
the parameters and local variables that are in scope for the address that
was specified. The register locations of these variables are shown in bold
above. Crash logs often have register information for the first frame in each
stack, and being able to reconstruct one or more local variables can often
help you decipher more information from a crash log than you normally would be
able to. Note that this is really only useful for the first frame, and only if
your crash logs have register information for your threads.</p>
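<p>As an illustration of how a register location can be combined with the register state in a crash log,
here is a small sketch. The register dump below is hypothetical (the values are made up for
illustration), and only variables whose location is a plain register are handled. A location such as
DW_OP_fbreg(-1072) would need the frame base and the stack memory, which a crash log usually does not provide:</p>

```python
import re

# Hypothetical register dump; the values are made up for illustration only.
THREAD_STATE = """\
  rbx: 0x00007fff5fbffb08  r12: 0x00007fff5fbffad8
  r13: 0x0000000000000001  r15: 0x00007fff5fbffae8
"""

# Register locations reported by "image lookup --verbose" in the example above.
VARIABLE_LOCATIONS = {"argc": "r13", "argv": "r12", "envp": "r15", "aapl": "rbx"}

REGISTER_RE = re.compile(r"(\w+):\s+(0x[0-9a-fA-F]+)")


def parse_registers(text):
    """Map register names to integer values for every "name: 0x..." pair found."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}


def variable_values(locations, registers):
    """Return the value of each variable whose register appears in the dump."""
    return {var: registers[reg] for var, reg in locations.items() if reg in registers}


values = variable_values(VARIABLE_LOCATIONS, parse_registers(THREAD_STATE))
for name in sorted(values):
    print("%s = 0x%x" % (name, values[name]))
```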
</div> | |
<div class="postfooter"></div> | |
</div> | |
<div class="post"> | |
<h1 class="postheader">Using Python API to Symbolicate</h1> | |
<div class="postcontent"> | |
<p>All of the commands above can be done through the Python script bridge. The code below
will recreate the target and add the three shared libraries that we added in the Darwin
crash log example above:</p>
<code><pre><tt><font color=green># Run this inside LLDB's embedded script interpreter, where lldb.debugger is set</font>
import lldb

triple = "x86_64-apple-macosx"
platform_name = None
add_dependents = False
target = lldb.debugger.CreateTarget("/tmp/a.out", triple, platform_name, add_dependents, lldb.SBError())
if target:
    <font color=green># Get the executable module</font>
    module = target.GetModuleAtIndex(0)
    target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x100000000)
    module = target.AddModule("/usr/lib/system/libsystem_c.dylib", triple, None, "/build/server/a/libsystem_c.dylib.dSYM")
    target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff83f32000)
    module = target.AddModule("/usr/lib/system/libsystem_dnssd.dylib", triple, None, "/build/server/b/libsystem_dnssd.dylib.dSYM")
    target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff883db000)
    module = target.AddModule("/usr/lib/system/libsystem_kernel.dylib", triple, None, "/build/server/c/libsystem_kernel.dylib.dSYM")
    target.SetSectionLoadAddress(module.FindSection("__TEXT"), 0x7fff8c0dc000)
    load_addr = 0x00007fff8a1e6d46
    <font color=green># so_addr is a section offset address, or a lldb.SBAddress object</font>
    so_addr = target.ResolveLoadAddress(load_addr)
    <font color=green># Get a symbol context for the section offset address which includes
    # a module, compile unit, function, block, line entry, and symbol</font>
    sym_ctx = so_addr.GetSymbolContext(lldb.eSymbolContextEverything)
    print(sym_ctx)
</tt></pre></code>
</div> | |
<div class="postfooter"></div> | |
</div> | |
<div class="post"> | |
<h1 class="postheader">Using the Built-in Python Module to Symbolicate</h1>
<div class="postcontent"> | |
<p>LLDB includes a module in the <b>lldb</b> package named <b>lldb.utils.symbolication</b>.
This module contains a number of functions and classes that simplify the symbolication
process. Its main classes are:</p>
<ul> | |
<li>lldb.utils.symbolication.Address</li> | |
<li>lldb.utils.symbolication.Section</li> | |
<li>lldb.utils.symbolication.Image</li> | |
<li>lldb.utils.symbolication.Symbolicator</li> | |
</ul> | |
<h2>lldb.utils.symbolication.Address</h2> | |
<p>This class represents an address that will be symbolicated. It will cache any information | |
that has been looked up: module, compile unit, function, block, line entry, symbol. | |
It does this by having a lldb.SBSymbolContext as a member variable. | |
</p> | |
<h2>lldb.utils.symbolication.Section</h2> | |
<p>This class represents a section that might get loaded in a <b>lldb.utils.symbolication.Image</b>. | |
It has helper functions that allow you to set it from text that might have been extracted from | |
a crash log file. | |
</p> | |
<h2>lldb.utils.symbolication.Image</h2> | |
<p>This class represents a module that might get loaded into the target we use for symbolication.
It contains the executable path, an optional symbol file path, the triple, and the list of sections that will need to be loaded
if we choose to ask the target to load this image. Many of these objects will never be loaded
into the target unless they are needed for symbolication. You often have a crash log that has
100 to 200 different shared libraries loaded, but your crash log stack backtraces only use a few
of these shared libraries. Only the images that contain stack backtrace addresses need to be loaded
in the target in order to symbolicate.
</p> | |
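<p>The idea of loading only the images that are actually needed can be sketched in a few lines. This is
not the code used by <b>lldb.utils.symbolication</b>, just an illustration of the selection step. The image
ranges come from the Binary Images example earlier on this page, the end addresses are assumed to be
inclusive, and the second frame address is a hypothetical value chosen to fall inside libsystem_c.dylib:</p>

```python
# (start, end, path) ranges copied from the Binary Images example.
IMAGES = [
    (0x100000000, 0x100000ff7, "/tmp/a.out"),
    (0x7fff83f32000, 0x7fff83ffefe7, "/usr/lib/system/libsystem_c.dylib"),
    (0x7fff883db000, 0x7fff883e3ff7, "/usr/lib/system/libsystem_dnssd.dylib"),
    (0x7fff8c0dc000, 0x7fff8c0f7ff7, "/usr/lib/system/libsystem_kernel.dylib"),
]

# 0x100000f46 appears in the backtrace example; 0x7fff83f33000 is hypothetical.
FRAME_ADDRESSES = [0x100000f46, 0x7fff83f33000]


def images_needed(images, frame_addresses):
    """Return the paths of images whose address range contains a frame address."""
    needed = []
    for start, end, path in images:
        # The end address is treated as inclusive (an assumption about the log format).
        if any(start <= addr <= end for addr in frame_addresses):
            needed.append(path)
    return needed


needed = images_needed(IMAGES, FRAME_ADDRESSES)
for path in needed:
    print(path)
```

<p>Only the two images that contain a frame address are selected. The other two never need to be added to the target.</p>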
<p>Subclasses of this class will want to override the <b>locate_module_and_debug_symbols</b> method:</p>
<code><pre><tt>class CustomImage(lldb.utils.symbolication.Image):
    def locate_module_and_debug_symbols(self):
        <font color=green># Locate the module and symbol file given the info found in the crash log</font>
        pass  <font color=green># placeholder body</font>
</tt></pre></code>
<p>Overriding this method allows clients to find the correct executable module and symbol files as they might reside on a build server.</p>
<h2>lldb.utils.symbolication.Symbolicator</h2> | |
<p>This class coordinates the symbolication process by loading only the <b>lldb.utils.symbolication.Image</b> | |
instances that need to be loaded in order to symbolicate a supplied address.
</p> | |
<h2>lldb.macosx.crashlog</h2> | |
<p><b>lldb.macosx.crashlog</b> is a package distributed with Mac OS X builds of LLDB that subclasses the above classes.
This module parses the information in Darwin crash logs and creates symbolication objects that
represent the images, the sections and the thread frames for the backtraces. It then uses the functions
in lldb.utils.symbolication to symbolicate the crash logs.</p>
<p> | |
This module installs a new "crashlog" command into the lldb command interpreter so that you can use | |
it to parse and symbolicate Mac OS X crash logs:</p> | |
<code><pre><tt><b>(lldb)</b> command script import lldb.macosx.crashlog | |
"crashlog" and "save_crashlog" command installed, use the "--help" option for detailed help | |
<b>(lldb)</b> crashlog /tmp/crash.log | |
... | |
</tt></pre></code> | |
<p>The command that is installed has built-in help that shows the
options that can be used when symbolicating:</p>
<code><pre><tt><b>(lldb)</b> crashlog --help
Usage: crashlog [options] &lt;FILE&gt; [FILE ...]
Symbolicate one or more darwin crash log files to provide source file and line | |
information, inlined stack frames back to the concrete functions, and | |
disassemble the location of the crash for the first frame of the crashed | |
thread. If this script is imported into the LLDB command interpreter, a | |
"crashlog" command will be added to the interpreter for use at the LLDB | |
command line. After a crash log has been parsed and symbolicated, a target | |
will have been created that has all of the shared libraries loaded at the load | |
addresses found in the crash log file. This allows you to explore the program | |
as if it were stopped at the locations described in the crash log and | |
functions can be disassembled and lookups can be performed using the | |
addresses found in the crash log. | |
Options: | |
-h, --help show this help message and exit | |
-v, --verbose display verbose debug info | |
-g, --debug display verbose debug logging | |
-a, --load-all load all executable images, not just the images found | |
in the crashed stack frames | |
--images show image list | |
--debug-delay=NSEC pause for NSEC seconds for debugger | |
-c, --crashed-only only symbolicate the crashed thread | |
-d DISASSEMBLE_DEPTH, --disasm-depth=DISASSEMBLE_DEPTH | |
set the depth in stack frames that should be | |
disassembled (default is 1) | |
-D, --disasm-all enabled disassembly of frames on all threads (not just | |
the crashed thread) | |
-B DISASSEMBLE_BEFORE, --disasm-before=DISASSEMBLE_BEFORE | |
the number of instructions to disassemble before the | |
frame PC | |
-A DISASSEMBLE_AFTER, --disasm-after=DISASSEMBLE_AFTER | |
the number of instructions to disassemble after the | |
frame PC | |
-C NLINES, --source-context=NLINES | |
show NLINES source lines of source context (default = | |
4) | |
--source-frames=NFRAMES | |
show source for NFRAMES (default = 4) | |
--source-all show source for all threads, not just the crashed | |
thread | |
-i, --interactive parse all crash logs and enter interactive mode | |
</tt></pre></code> | |
<p>The source for the "symbolication" and "crashlog" modules is available in SVN:</p>
<ul> | |
<li><a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/python/symbolication.py">symbolication.py</a></li> | |
<li><a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/python/crashlog.py">crashlog.py</a></li> | |
</ul> | |
</div> | |
<div class="postfooter"></div> | |
</div> | |
</div> | |
</div>
</div>
</body> | |
</html> |