<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | |
<html xmlns="http://www.w3.org/1999/xhtml"> | |
<head> | |
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> | |
<link href="style.css" rel="stylesheet" type="text/css" /> | |
<title>LLDB Example - Python Scripting to Debug a Problem</title> | |
</head> | |
<body> | |
<div class="www_title"> | |
Example - Using Scripting and Python to Debug in LLDB | |
</div> | |
<div id="container"> | |
<div id="content"> | |
<!--#include virtual="sidebar.incl"--> | |
<div id="middle"> | |
<div class="post"> | |
<h1 class ="postheader">Introduction</h1> | |
<div class="postcontent"> | |
<p>LLDB has been structured from the beginning to be scriptable in two ways:
a Unix Python session can initiate and run a debug session non-interactively
using LLDB, and within the LLDB debugger itself, Python scripts can help with
many tasks, including inspecting program data, iterating over containers, and
determining whether a breakpoint should stop execution or continue. This
document shows how to do some of these things by working through an example:
using Python scripting to find a bug in a program that searches for text in a
large binary tree.</p>
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">The Test Program and Input</h1> | |
<div class="postcontent"> | |
<p>We have a simple C program (dictionary.c) that reads in a text file and
stores all the words from the file in a binary search tree, sorted
alphabetically. It then enters a loop, prompting the user for a word, searching
for the word in the tree (using binary search), and reporting to the user
whether or not it found the word in the tree.</p> | |
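<p>For readers who want to experiment with the data structure itself, here is a
minimal Python model of the behavior described above: build a binary search tree
of words, then answer lookups by binary search. It is a sketch under our own
assumptions, not a translation of dictionary.c; the names <code>TreeNode</code>,
<code>insert</code> and <code>find_word</code> here are hypothetical, and the real
program's input handling (punctuation, duplicates) may differ.</p>

```python
class TreeNode:
    """Hypothetical Python counterpart of the C tree_node struct."""

    def __init__(self, word):
        self.word = word
        self.left = None
        self.right = None


def insert(root, word):
    """Insert word below root and return the (possibly new) root.

    Python's < on str compares character codes, like strcmp on ASCII
    text, so "Romeo" sorts before "dramatis". Duplicate words are
    ignored, which is an assumption about dictionary.c.
    """
    if root is None:
        return TreeNode(word)
    if word < root.word:
        root.left = insert(root.left, word)
    elif word > root.word:
        root.right = insert(root.right, word)
    return root


def find_word(root, word):
    """Binary search: follow one root-to-leaf path looking for word."""
    while root is not None:
        if word == root.word:
            return True
        root = root.left if word < root.word else root.right
    return False


root = None
for w in "two households both alike in dignity".split():
    root = insert(root, w)

print(find_word(root, "dignity"))  # True
print(find_word(root, "Verona"))   # False
```

<p>The recursive <code>insert</code> is fine for a short sentence; with a text as
long as the play, input that happens to arrive in sorted order would build a very
deep, list-like tree and could exceed Python's default recursion limit, so a
loop-based insert would be safer there.</p>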
<p>The input text file we are using to test our program contains the text for | |
William Shakespeare's famous tragedy "Romeo and Juliet".</p> | |
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">The Bug</h1> | |
<div class="postcontent"> | |
<p>When we run our program, we find a problem. Although it successfully finds
some of the words we would expect to find, such as "love" or "sun", it fails to
find the word "Romeo", which MUST be in the input text file:</p>
<code>
% ./dictionary Romeo-and-Juliet.txt<br> | |
Dictionary loaded.<br> | |
Enter search word: love<br> | |
Yes!<br> | |
Enter search word: sun<br> | |
Yes!<br> | |
Enter search word: Romeo<br> | |
No!<br> | |
Enter search word: ^D<br> | |
%<br> | |
</code> | |
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Is the word in our tree: Using Depth First Search</h1> | |
<div class="postcontent"> | |
<p>Our first job is to determine whether the word "Romeo" actually got inserted
into the tree. Since "Romeo and Juliet" has thousands of words, trying to
examine our binary search tree by hand is completely impractical, so we will
write a Python script to search the tree for us. We will write a recursive
function, which we call DFS, that searches the tree for a word while keeping
track of the path from the root of the tree to the current node. At each node
it compares the search word with the node's word and descends left or right,
so it follows a single path down the tree rather than visiting every node. If
it finds the word, it returns the path from the root to the node containing the
word. This is what our DFS function looks like in Python, with line numbers
added for easy reference in later explanations:</p>
<code> | |
<pre><tt> | |
1: def DFS (root, word, cur_path): | |
2: root_word_ptr = root.GetChildMemberWithName ("word") | |
3: left_child_ptr = root.GetChildMemberWithName ("left") | |
4: right_child_ptr = root.GetChildMemberWithName ("right") | |
5: root_word = root_word_ptr.GetSummary() | |
6: end = len (root_word) - 1 | |
7: if root_word[0] == '"' and root_word[end] == '"': | |
8: root_word = root_word[1:end] | |
9: end = len (root_word) - 1 | |
10: if root_word[0] == '\'' and root_word[end] == '\'': | |
11: root_word = root_word[1:end] | |
12: if root_word == word: | |
13: return cur_path | |
14: elif word < root_word: | |
15:         if left_child_ptr.GetValueAsUnsigned() == 0:
16:             return ""
17:         else:
18:             cur_path = cur_path + "L"
19:             return DFS (left_child_ptr, word, cur_path)
20:     else:
21:         if right_child_ptr.GetValueAsUnsigned() == 0:
22: return "" | |
23: else: | |
24: cur_path = cur_path + "R" | |
25: return DFS (right_child_ptr, word, cur_path) | |
</tt></pre> | |
</code> | |
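<p>Because DFS only calls a handful of methods on the values it is given, one way
to try out its logic without a running process is to pass it stand-in objects. In
the sketch below, <code>FakeValue</code> and <code>FakeString</code> are
hypothetical test doubles, not part of the LLDB API: they mimic
<code>GetChildMemberWithName()</code>, <code>GetSummary()</code> (returning the
string wrapped in double quotes, as LLDB shows C strings) and
<code>GetValueAsUnsigned()</code>, which this unnumbered copy of DFS uses as its
NULL-pointer test.</p>

```python
class FakeString:
    """Hypothetical stand-in for the SBValue of a const char * member."""

    def __init__(self, text):
        self._text = text

    def GetSummary(self):
        # LLDB's summary of a C string includes the surrounding double quotes.
        return '"%s"' % self._text


class FakeValue:
    """Hypothetical stand-in for an SBValue holding a tree_node pointer."""

    def __init__(self, word=None, left=None, right=None):
        self._word = word          # word=None models a NULL pointer
        self._left = left
        self._right = right

    def GetChildMemberWithName(self, name):
        if name == "word":
            return FakeString(self._word)
        child = self._left if name == "left" else self._right
        return child if child is not None else FakeValue()

    def GetValueAsUnsigned(self):
        # A real node gets a non-zero "address"; a NULL pointer reads as 0.
        return 0 if self._word is None else id(self)


def DFS(root, word, cur_path):
    """Same search as the numbered DFS, without line numbers."""
    root_word = root.GetChildMemberWithName("word").GetSummary()
    left_child_ptr = root.GetChildMemberWithName("left")
    right_child_ptr = root.GetChildMemberWithName("right")
    # Strip one layer of surrounding double quotes, then of single quotes.
    for quote in ('"', "'"):
        if len(root_word) >= 2 and root_word[0] == quote and root_word[-1] == quote:
            root_word = root_word[1:-1]
    if root_word == word:
        return cur_path
    if word < root_word:
        if left_child_ptr.GetValueAsUnsigned() == 0:
            return ""
        return DFS(left_child_ptr, word, cur_path + "L")
    if right_child_ptr.GetValueAsUnsigned() == 0:
        return ""
    return DFS(right_child_ptr, word, cur_path + "R")


# A tiny hand-built tree: "Juliet" sorts before "dramatis", "Romeo" after "Juliet".
tree = FakeValue("dramatis", left=FakeValue("Juliet", right=FakeValue("Romeo")))
print(repr(DFS(tree, "Romeo", "")))   # 'LR'
print(repr(DFS(tree, "Tybalt", "")))  # ''
```

<p>Note that DFS returns an empty string both when the word is missing and when
it sits at the root, so a caller that needs to tell those cases apart has to check
the root's word separately.</p>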
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader"><a name="accessing-variables">Accessing &amp; Manipulating <strong>Program</strong> Variables in Python</a>
</h1> | |
<div class="postcontent"> | |
<p>Before we can call any Python function on any of our program's variables, we
need to get the variable into a form that Python can access. To show how to do
this, we will look at the parameters of the DFS function. The first parameter
is a node in our binary search tree, put into a Python variable. The second
parameter is the word we are searching for (a string), and the third parameter
is a string representing the path from the root of the tree to our current
node.</p>
<p>The most interesting parameter is the first one, the Python variable that
needs to contain a node in our search tree. How can we take a variable out of
our program and put it into a Python variable? What kind of Python variable
will it be? The answer is to use the LLDB API functions provided as part of
the LLDB Python module. When Python runs inside LLDB, LLDB automatically gives
us our current frame object as a Python variable, "lldb.frame". This variable
has the type "SBFrame" (see the LLDB API for more information about SBFrame
objects). One of the things we can do with a frame object is to ask it to find
and return one of its local variables. We will call the API function
"FindVariable" on the lldb.frame object to give us our dictionary variable as
a Python variable:</p>
<code> | |
root = lldb.frame.FindVariable ("dictionary") | |
</code> | |
<p>The line above, executed in the Python script interpreter in LLDB, asks the | |
current frame to find the variable named "dictionary" and return it. We then | |
store the returned value in the Python variable named "root". This answers the | |
question of HOW to get the variable, but it still doesn't explain WHAT actually | |
gets put into "root". If you examine the LLDB API, you will find that the | |
SBFrame method "FindVariable" returns an object of type SBValue. SBValue | |
objects are used, among other things, to wrap up program variables and values. | |
The SBValue class defines many useful methods for getting information and child
values out of SBValues. For complete information, see
the header file <a href="http://llvm.org/svn/llvm-project/lldb/trunk/include/lldb/API/SBValue.h">SBValue.h</a>. The | |
SBValue methods that we use in our DFS function are | |
<code>GetChildMemberWithName()</code>, | |
<code>GetSummary()</code>, and <code>GetValue()</code>.</p> | |
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Explaining Depth First Search Script in Detail</h1> | |
<div class="postcontent"> | |
<p><strong>"DFS" Overview.</strong> Before diving into the details of this
code, it helps to have a high-level overview of what it does. The nodes in our
binary search tree have type <code>tree_node *</code>, where
<code>tree_node</code> is defined as:</p>
<code> | |
<pre><tt>typedef struct tree_node | |
{ | |
const char *word; | |
struct tree_node *left; | |
struct tree_node *right; | |
} tree_node;</tt></pre></code> | |
<p>Lines 2-11 of DFS extract data from the current tree node and prepare for
the search; lines 12-25 are the search itself. Lines 2-4 of our DFS function
get the <code>word</code>, <code>left</code> and <code>right</code> fields out
of the current node and store them in Python variables. Since
<code>root_word_ptr</code> is a pointer to our word, and we want the actual
word, line 5 calls <code>GetSummary()</code> to get a string containing the
characters it points to. Since <code>GetSummary()</code> adds quotes around its
result, lines 6-11 strip the surrounding quotes off the word.</p>
<p>Line 12 checks whether the word in the current node is the one we are
searching for. If so, we are done, and line 13 returns the current path.
Otherwise, line 14 checks whether we should go left (the search word comes
before the current word). If we decide to go left, line 15 checks whether the
left child pointer is NULL. If it is, then the word is not in this tree and we
return an empty path (line 16). Otherwise, we add an "L" to the end of our
current path string to indicate that we are going left (line 18), and then
recurse on the left child (line 19). Lines 20-25 are the same as lines 14-19,
except that they go right rather than left.</p>
<p>One other note: typing something as long as our DFS function directly into
the interpreter can be difficult, as a single typing mistake means starting
all over. Therefore we recommend doing as we have done: writing your
longer, more complicated script functions in a separate file (in this case | |
tree_utils.py) and then importing it into your LLDB Python interpreter.</p> | |
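<p>Importing works only if Python can find the file: an <code>import</code>
statement searches the directories listed in <code>sys.path</code>. The
LLDB-independent sketch below shows the mechanism by writing a placeholder module
named <code>tree_utils_demo</code> (a hypothetical name, with a hypothetical
<code>greet</code> function standing in for DFS) into a temporary directory,
adding that directory to <code>sys.path</code>, and importing the module by name
without the ".py" extension.</p>

```python
import importlib
import os
import sys
import tempfile

# Placeholder module body; a real tree_utils.py would hold DFS and friends.
MODULE_SOURCE = '''
def greet(name):
    return "hello, " + name
'''

script_dir = tempfile.mkdtemp()          # stands in for your scripts directory
with open(os.path.join(script_dir, "tree_utils_demo.py"), "w") as f:
    f.write(MODULE_SOURCE)

# Python only finds modules in directories listed on sys.path.
if script_dir not in sys.path:
    sys.path.append(script_dir)

importlib.invalidate_caches()            # let the import system see the new file
import tree_utils_demo                   # note: no ".py" extension

print(tree_utils_demo.greet("Romeo"))    # hello, Romeo
```

<p>If you edit the file during a debugging session,
<code>importlib.reload(tree_utils)</code> re-executes it (in Python 2 the built-in
<code>reload</code> does the same). Newer LLDB releases also offer the
<code>command script import</code> command, which accepts a path to the .py
file.</p>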
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Seeing the DFS Script in Action</h1> | |
<div class="postcontent"> | |
<p>At this point we are ready to use the DFS function to see if the word "Romeo" | |
is in our tree or not. To actually use it in LLDB on our dictionary program, | |
you would do something like this:</p> | |
<code> | |
% <strong>lldb</strong><br> | |
(lldb) <strong>process attach -n "dictionary"</strong><br> | |
Architecture set to: x86_64.<br> | |
Process 521 stopped<br> | |
* thread #1: tid = 0x2c03, 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8, stop reason = signal SIGSTOP<br> | |
frame #0: 0x00007fff86c8bea0 libSystem.B.dylib`read$NOCANCEL + 8<br> | |
(lldb) <strong>breakpoint set -n find_word</strong><br> | |
Breakpoint created: 1: name = 'find_word', locations = 1, resolved = 1<br> | |
(lldb) <strong>continue</strong><br> | |
Process 521 resuming<br> | |
Process 521 stopped<br> | |
* thread #1: tid = 0x2c03, 0x0000000100001830 dictionary`find_word + 16 <br> | |
at dictionary.c:105, stop reason = breakpoint 1.1<br> | |
frame #0: 0x0000000100001830 dictionary`find_word + 16 at dictionary.c:105<br> | |
102 int<br> | |
103 find_word (tree_node *dictionary, char *word)<br> | |
104 {<br> | |
-> 105 if (!word || !dictionary)<br> | |
106 return 0;<br> | |
107 <br> | |
108 int compare_value = strcmp (word, dictionary->word);<br> | |
(lldb) <strong>script</strong><br> | |
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.<br> | |
>>> <strong>import tree_utils</strong><br> | |
>>> <strong>root = lldb.frame.FindVariable ("dictionary")</strong><br> | |
>>> <strong>current_path = ""</strong><br> | |
>>> <strong>path = tree_utils.DFS (root, "Romeo", current_path)</strong><br> | |
>>> <strong>print(path)</strong><br>
LLRRL<br> | |
>>> <strong>^D</strong><br> | |
(lldb) <br> | |
</code> | |
<p>The first part of the session above shows starting LLDB, attaching to the dictionary
program, and stopping at a breakpoint in the find_word function. The interesting part
(as far as this example is concerned) begins when we enter the | |
<code>script</code> command and drop into the embedded interactive Python | |
interpreter. We will go over this Python code line by line. The first line</p> | |
<code> | |
import tree_utils | |
</code> | |
<p>imports the file where we wrote our DFS function, tree_utils.py, into Python.
Notice that to import the file we leave off the ".py" extension, and that the
file must be in a directory listed on Python's module search path,
<code>sys.path</code>. We can now call any function in that file by giving it
the prefix "tree_utils.", so that Python knows where to look for the function.
The line</p>
<code> | |
root = lldb.frame.FindVariable ("dictionary") | |
</code> | |
<p>gets our program variable "dictionary" (which contains the binary search | |
tree) and puts it into the Python variable "root". See | |
<a href="#accessing-variables">Accessing &amp; Manipulating Program Variables in Python</a>
above for more details about how this works. The next line is</p> | |
<code> | |
current_path = "" | |
</code> | |
<p>This line initializes <code>current_path</code>, the path from the root of the
tree to our current node. Since we are starting at the root of the tree, our
current path starts as an empty string. As we go right and left through the
tree, the DFS function will append an 'R' or an 'L' to the current path, as
appropriate. The line</p>
<code> | |
path = tree_utils.DFS (root, "Romeo", current_path) | |
</code> | |
<p>calls our DFS function (prefixing it with the module name so that Python can | |
find it). We pass in our binary tree stored in the variable <code>root</code>, | |
the word we are searching for, and our current path. We assign whatever path | |
the DFS function returns to the Python variable <code>path</code>.</p> | |
<p>Finally, we want to see if the word was found or not, and if so we want to | |
see the path through the tree to the word. So we do</p> | |
<code> | |
print(path)
</code> | |
<p>From this we can see that the word "Romeo" was indeed found in the tree, and | |
the path from the root of the tree to the node containing "Romeo" is | |
left-left-right-right-left.</p> | |
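<p>Because each character of the path names one pointer to follow, a path can be
turned mechanically into an expression for LLDB's <code>expr</code> command,
which is handy for double-checking what DFS found. The helper below,
<code>path_to_expr</code>, is a hypothetical addition of ours, not a function in
tree_utils.py.</p>

```python
def path_to_expr(var_name, path):
    """Turn a DFS path such as "LR" into a C expression for LLDB's expr command.

    Each 'L' becomes ->left and each 'R' becomes ->right; the final
    ->word names the string stored at the node the path leads to.
    """
    steps = {"L": "->left", "R": "->right"}
    bad = set(path) - set(steps)
    if bad:
        raise ValueError("unexpected path characters: %s" % "".join(sorted(bad)))
    return var_name + "".join(steps[c] for c in path) + "->word"


print(path_to_expr("dictionary", "LLRRL"))
# dictionary->left->left->right->right->left->word
```

<p>Inside LLDB's script interpreter, the resulting string could be passed to
<code>lldb.frame.EvaluateExpression()</code>, or simply pasted after
<code>expr</code> at the <code>(lldb)</code> prompt.</p>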
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">What next? Using Breakpoint Command Scripts...</h1> | |
<div class="postcontent"> | |
<p>We are halfway to figuring out what the problem is. We know the word we are | |
looking for is in the binary tree, and we know exactly where it is in the binary | |
tree. Now we need to figure out why our binary search algorithm is not finding | |
the word. We will do this using breakpoint command scripts.</p> | |
<p>The idea is as follows. The binary search algorithm has two main decision | |
points: the decision to follow the right branch and the decision to follow
the left branch. We will set a breakpoint at each of these decision points, and
attach a Python breakpoint command script to each breakpoint. The breakpoint | |
commands will use the global <code>path</code> Python variable that we got from | |
our DFS function. Each time one of these decision breakpoints is hit, the script | |
will compare the actual decision with the decision the front of the | |
<code>path</code> variable says should be made (the first character of the | |
path). If the actual decision and the path agree, then the front character is | |
stripped off the path, and execution is resumed. In this case the user never | |
even sees the breakpoint being hit. But if the decision differs from what the | |
path says it should be, then the script prints out a message and does NOT resume | |
execution, leaving the user sitting at the first point where a wrong decision is | |
being made.</p> | |
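<p>Stripped of the LLDB plumbing, the two breakpoint commands implement one small
algorithm: consume the expected path one character per decision and stop at the
first disagreement. The pure-Python sketch below models that logic;
<code>first_wrong_decision</code> and the string of actual decisions it takes (one
'L' or 'R' per breakpoint hit) are hypothetical stand-ins for what the breakpoints
observe.</p>

```python
def first_wrong_decision(expected_path, actual_decisions):
    """Return (index, actual, expected) for the first mismatch, or None.

    expected_path:    'L'/'R' string produced by DFS.
    actual_decisions: 'L'/'R' choices the program made, in order.
    expected is None when the program kept descending after the path ran out.
    """
    remaining = expected_path
    for index, actual in enumerate(actual_decisions):
        if not remaining:
            return (index, actual, None)
        if actual != remaining[0]:
            return (index, actual, remaining[0])  # stop here, as the script does
        remaining = remaining[1:]                  # agree: strip the front, keep going
    return None


print(first_wrong_decision("LLRRL", "LLRRL"))  # None
print(first_wrong_decision("LLRRL", "R"))      # (0, 'R', 'L')
```

<p>The second call corresponds to the session shown later on this page, where the
very first decision went right while the path began with 'L'.</p>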
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Side Note: Python Breakpoint Command Scripts are NOT What They Seem</h1> | |
<div class="postcontent">
<p>What do we mean by that? When you enter a Python breakpoint command in LLDB,
it appears that you are entering one or more plain lines of Python. BUT LLDB
then takes what you entered and wraps it into a Python FUNCTION (just as if you
had written a <code>def</code> statement). It automatically gives the function
an obscure, unique, hard-to-stumble-across function name, and gives it two
parameters: <code>frame</code> and <code>bp_loc</code>. When the breakpoint gets
hit, LLDB wraps up the frame object where the breakpoint was hit, and the
breakpoint location object for the breakpoint that was hit, and puts them into
Python variables for you. It then calls the Python function that was created
for the breakpoint command, and passes in the frame and breakpoint location
objects.</p>
<p>So, being practical, what does this mean for you when you write your Python
breakpoint commands? It means that there are two things you need to keep in
mind: 1. If you want to access any Python variables created outside your script,
<strong>you must declare such variables to be global</strong>. If you do not,
Python treats any of them that your script assigns to as local variables of the
generated function: reading such a variable before assigning it raises an
<code>UnboundLocalError</code>, and assigning it leaves the global unchanged.
2. <strong>All Python breakpoint command scripts automatically have a
<code>frame</code> and a <code>bp_loc</code> variable.</strong> The variables are pre-loaded by LLDB
with the correct context for the breakpoint. You do not have to use these | |
variables, but they are there if you want them.</p> | |
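<p>The first point is plain Python scoping rather than anything specific to LLDB,
so it can be demonstrated outside the debugger. The two functions below are
hypothetical; they mimic the shape of our breakpoint scripts, which both read
<code>path</code> and assign to it.</p>

```python
path = "LLRRL"   # module-level, like the path computed at the script prompt


def step_without_global():
    # Because path is assigned in this function, Python treats it as local
    # here, so reading path[0] fails before any local value exists.
    if path[0] == "L":
        path = path[1:]


def step_with_global():
    global path   # refer to the module-level path instead
    if path[0] == "L":
        path = path[1:]


try:
    step_without_global()
    failed = False
except UnboundLocalError:
    failed = True

step_with_global()
print(failed, path)   # True LRRL
```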
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">The Decision Point Breakpoint Commands</h1> | |
<div class="postcontent"> | |
<p>This is what the Python breakpoint command script would look like for the | |
decision to go right:</p>
<code><pre><tt> | |
global path | |
if path[0] == 'R': | |
path = path[1:] | |
thread = frame.GetThread() | |
process = thread.GetProcess() | |
process.Continue() | |
else: | |
    print("Here is the problem; going right, should go left!")
</tt></pre></code> | |
<p>Just as a reminder, LLDB is going to take this script and wrap it up in a | |
function, like this:</p> | |
<code><pre><tt> | |
def some_unique_and_obscure_function_name (frame, bp_loc): | |
global path | |
if path[0] == 'R': | |
path = path[1:] | |
thread = frame.GetThread() | |
process = thread.GetProcess() | |
process.Continue() | |
else: | |
        print("Here is the problem; going right, should go left!")
</tt></pre></code> | |
<p>LLDB will call the function, passing in the correct frame and breakpoint | |
location whenever the breakpoint gets hit. There are several things to notice | |
about this function. The first one is that we are accessing and updating a | |
piece of state (the <code>path</code> variable), and actually conditioning our | |
behavior based upon this variable. Since the variable was defined outside of | |
our script (and therefore outside of the corresponding function) we need to tell | |
Python that we are accessing a global variable. That is what the first line of | |
the script does. Next we check where the path says we should go and compare it to | |
our decision (recall that we are at the breakpoint for the decision to go | |
right). If the path agrees with our decision, then we strip the first character | |
off of the path.</p> | |
<p>Since the decision matched the path, we want to resume execution. To do this | |
we make use of the <code>frame</code> parameter that LLDB guarantees will be | |
there for us. We use LLDB API functions to get the current thread from the | |
current frame, and then to get the process from the thread. Once we have the | |
process, we tell it to resume execution (using the <code>Continue()</code> API | |
function).</p> | |
<p>If the decision to go right does not agree with the path, then we do not | |
resume execution. We leave the process stopped at the breakpoint (by doing nothing),
and we print an informational message telling the user we have found the | |
problem, and what the problem is.</p> | |
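<p>Since LLDB simply calls the generated function with a frame and a breakpoint
location, the go-right script can be exercised as an ordinary function before it
is attached to a breakpoint. In the sketch below, <code>FakeFrame</code>,
<code>FakeThread</code> and <code>FakeProcess</code> are hypothetical test doubles
that only provide the <code>GetThread()</code>, <code>GetProcess()</code> and
<code>Continue()</code> calls the script uses, and <code>go_right_command</code>
is our own name for the wrapped function.</p>

```python
class FakeProcess:
    """Records whether the script asked the process to resume."""

    def __init__(self):
        self.continued = False

    def Continue(self):
        self.continued = True


class FakeThread:
    def __init__(self, process):
        self._process = process

    def GetProcess(self):
        return self._process


class FakeFrame:
    def __init__(self):
        self.process = FakeProcess()

    def GetThread(self):
        return FakeThread(self.process)


path = "RL"   # hypothetical remaining path: go right, then left


def go_right_command(frame, bp_loc):
    """The go-right breakpoint script, wrapped the way LLDB wraps it."""
    global path
    if path[0] == 'R':
        path = path[1:]
        thread = frame.GetThread()
        process = thread.GetProcess()
        process.Continue()
    else:
        print("Here is the problem; going right, should go left!")


first = FakeFrame()
go_right_command(first, None)     # path starts with 'R': strip it and resume
second = FakeFrame()
go_right_command(second, None)    # path now starts with 'L': report and stay stopped
print(path, first.process.continued, second.process.continued)  # L True False
```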
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Actually Using the Breakpoint Commands</h1> | |
<div class="postcontent"> | |
<p>Now we will look at what happens when we actually use these breakpoint | |
commands on our program. Doing a <code>source list -n find_word</code> shows | |
us the function containing our two decision points. Looking at the code below, | |
we see that we want to set our breakpoints on lines 113 and 115:</p> | |
<code><pre><tt> | |
(lldb) source list -n find_word | |
File: /Volumes/Data/HD2/carolinetice/Desktop/LLDB-Web-Examples/dictionary.c. | |
101 | |
102 int | |
103 find_word (tree_node *dictionary, char *word) | |
104 { | |
105 if (!word || !dictionary) | |
106 return 0; | |
107 | |
108 int compare_value = strcmp (word, dictionary->word); | |
109 | |
110 if (compare_value == 0) | |
111 return 1; | |
112 else if (compare_value < 0) | |
113 return find_word (dictionary->left, word); | |
114 else | |
115 return find_word (dictionary->right, word); | |
116 } | |
117 | |
</tt></pre></code> | |
<p>So, we set our breakpoints, enter our breakpoint command scripts, and see | |
what happens:</p>
<code><pre><tt> | |
(lldb) breakpoint set -l 113 | |
Breakpoint created: 2: file ='dictionary.c', line = 113, locations = 1, resolved = 1 | |
(lldb) breakpoint set -l 115 | |
Breakpoint created: 3: file ='dictionary.c', line = 115, locations = 1, resolved = 1 | |
(lldb) breakpoint command add -s python 2 | |
Enter your Python command(s). Type 'DONE' to end. | |
> global path | |
> if (path[0] == 'L'): | |
> path = path[1:] | |
> thread = frame.GetThread() | |
> process = thread.GetProcess() | |
> process.Continue() | |
> else: | |
>     print("Here is the problem. Going left, should go right!")
> DONE | |
(lldb) breakpoint command add -s python 3 | |
Enter your Python command(s). Type 'DONE' to end. | |
> global path | |
> if (path[0] == 'R'): | |
> path = path[1:] | |
> thread = frame.GetThread() | |
> process = thread.GetProcess() | |
> process.Continue() | |
> else: | |
>     print("Here is the problem. Going right, should go left!")
> DONE | |
(lldb) continue | |
Process 696 resuming | |
Here is the problem. Going right, should go left! | |
Process 696 stopped | |
* thread #1: tid = 0x2d03, 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115, stop reason = breakpoint 3.1 | |
frame #0: 0x000000010000189f dictionary`find_word + 127 at dictionary.c:115 | |
112 else if (compare_value < 0) | |
113 return find_word (dictionary->left, word); | |
114 else | |
-> 115 return find_word (dictionary->right, word); | |
116 } | |
117 | |
118 void | |
(lldb) | |
</tt></pre></code> | |
<p>After setting our breakpoints, adding our breakpoint commands and continuing, | |
we run for a little bit and then hit one of our breakpoints, printing out the | |
error message from the breakpoint command. Apparently, at this point in the
tree our search algorithm decided to go right, but our path says the node we
want is to the left. Examining the word at the node where we stopped, and our | |
search word, we see:</p> | |
<code> | |
(lldb) expr dictionary->word<br> | |
(const char *) $1 = 0x0000000100100080 "dramatis"<br> | |
(lldb) expr word<br> | |
(char *) $2 = 0x00007fff5fbff108 "romeo"<br> | |
</code> | |
<p>So the word at our current node is "dramatis", and the word we are searching | |
for is "romeo". "romeo" comes after "dramatis" alphabetically, so it seems like | |
going right would be the correct decision. Let's ask Python what it thinks the | |
path from the current node to our word is:</p> | |
<code> | |
(lldb) script print(path)<br>
LLRRL<br> | |
</code> | |
<p>According to Python we need to go left-left-right-right-left from our current | |
node to find the word we are looking for. Let's double check our tree, and see | |
what word it has at that node:</p> | |
<code> | |
(lldb) expr dictionary->left->left->right->right->left->word<br> | |
(const char *) $4 = 0x0000000100100880 "Romeo"<br> | |
</code> | |
<p>So the word we are searching for is "romeo" and the word at our DFS location | |
is "Romeo". Aha! One starts with an uppercase letter and the other with a lowercase one: we seem to have
a case conversion problem somewhere in our program (we do).</p> | |
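<p>Why did a lowercase search word send the search the wrong way?
<code>strcmp</code> compares character codes, and in ASCII every uppercase letter
(65 to 90) sorts before every lowercase letter (97 to 122). Python's comparison of
<code>str</code> values orders ASCII text the same way, so a few lines of Python,
using a hypothetical helper <code>direction</code>, reproduce the decision made at
the "dramatis" node.</p>

```python
def direction(search_word, node_word):
    """Which way a strcmp-style binary search turns at a node."""
    if search_word == node_word:
        return "found"
    return "L" if search_word < node_word else "R"


# Character codes: uppercase ASCII letters all sort before lowercase ones.
print(ord("R"), ord("d"), ord("r"))     # 82 100 114
print(direction("Romeo", "dramatis"))   # L: the way the tree was built
print(direction("romeo", "dramatis"))   # R: the way the search went
```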
<p>This is the end of our example on how you might use Python scripting in LLDB | |
to help you find bugs in your program.</p> | |
</div> | |
<div class="postfooter"></div> | |
<div class="post"> | |
<h1 class ="postheader">Source Files for The Example</h1> | |
<div class="postcontent">
<p>The complete code for the dictionary program (with the case-conversion bug),
the DFS function, and other Python script examples (tree_utils.py) used in this
example are available via the following links:</p>
<a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/tree_utils.py">tree_utils.py</a> - Example Python functions using LLDB's API, including DFS<br>
<a href="http://llvm.org/svn/llvm-project/lldb/trunk/examples/scripting/dictionary.c">dictionary.c</a> - Sample dictionary program, with bug<br>
<p>The text for "Romeo and Juliet" can be obtained from Project Gutenberg
(http://www.gutenberg.org).</p>
</div>
<div class="postfooter"></div>
</div>
</div> | |
</div> | |
</div> | |
</body> | |
</html> |