blob: 20a6b859e1a010e2e065f71d8fafa9402e73c0ba [file] [log] [blame]
Bill Wendling7d623452015-03-18 13:36:07 -07001====
2YAPF
3====
4
Bill Wendlingfb8ab382015-03-18 20:24:14 -07005.. image:: https://travis-ci.org/google/yapf.svg?branch=master
6 :target: https://travis-ci.org/google/yapf
7 :alt: Build status
8
Bill Wendling7d623452015-03-18 13:36:07 -07009Introduction
10============
11
12Most of the current formatters for Python -- e.g., autopep8, and pep8ify -- are
13made to remove lint errors from code. This has some obvious limitations. For
14instance, code that conforms to the PEP 8 guidelines may not be reformatted.
15But it doesn't mean that the code looks good.
16
17YAPF takes a different approach. It's based off of 'clang-format', developed by
18Daniel Jasper. In essence, the algorithm takes the code and reformats it to the
19best formatting that conforms to the style guide, even if the original code
20didn't violate the style guide.
21
22The ultimate goal is that the code YAPF produces is as good as the code that a
23programmer would write if they were following the style guide.
24
Bill Wendling52e04112015-03-18 20:42:26 -070025.. note::
26
27 YAPF is not an official Google product (experimental or otherwise), it is
28 just code that happens to be owned by Google.
29
Bill Wendling7d623452015-03-18 13:36:07 -070030.. contents::
31
32Installation
33============
34
35From source directory::
36
37 $ sudo python ./setup.py install
38
39
40Usage
41=====
42
Bill Wendlingfa22c892015-03-18 13:42:25 -070043Options::
Bill Wendling7d623452015-03-18 13:36:07 -070044
Bill Wendlingfa22c892015-03-18 13:42:25 -070045 usage: __main__.py [-h] [-d | -i] [-l START-END | -r] ...
Bill Wendling7d623452015-03-18 13:36:07 -070046
Bill Wendlingfa22c892015-03-18 13:42:25 -070047 Formatter for Python code.
Bill Wendling7d623452015-03-18 13:36:07 -070048
Bill Wendlingfa22c892015-03-18 13:42:25 -070049 positional arguments:
50 files
51
52 optional arguments:
53 -h, --help show this help message and exit
54 -d, --diff print the diff for the fixed source
55 -i, --in-place make changes to files in place
56 -l START-END, --lines START-END
57 range of lines to reformat, one-based
58 -r, --recursive run recursively over directories
Bill Wendling7d623452015-03-18 13:36:07 -070059
60
61Why Not Improve Existing Tools?
62===============================
63
64We wanted to use clang-format's reformatting algorithm. It's very powerful and
65designed to come up with the best formatting possible. Existing tools were
66created with different goals in mind, and would require extensive modifications
67to convert to using clang-format's algorithm.
68
69
70Can I Use YAPF In My Program?
71=============================
72
73Please do! YAPF was designed to be used as a library as well as a command line
74tool. This means that a tool or IDE plugin is free to use YAPF.
75
76
77Gory Details
78============
79
80Algorithm Design
81----------------
82
83The main data structure in YAPF is the UnwrappedLine object. It holds a list of
84FormatTokens, that we would want to place on a single line if there were no
85column limit. An exception being a comment in the middle of an expression
86statement will force the line to be formatted on more than one line. The
87formatter works on one UnwrappedLine object at a time.
88
89An UnwrappedLine typically won't affect the formatting of lines before or after
90it. There is a part of the algorithm that may join two or more UnwrappedLines
91into one line. For instance, an if-then statement with a short body can be
Bill Wendlingfa22c892015-03-18 13:42:25 -070092placed on a single line::
Bill Wendling7d623452015-03-18 13:36:07 -070093
94 if a == 42: continue
95
96YAPF's formatting algorithm creates a weighted tree that acts as the solution
97space for the algorithm. Each node in the tree represents the result of a
98formatting decision --- i.e., whether to split or not to split before a token.
99Each formatting decision has a cost associated with it. Therefore, the cost is
100realized on the edge between two nodes. (In reality, the weighted tree doesn't
101have separate edge objects, so the cost resides on the nodes themselves.)
102
103For example, take the following Python code snippet. For the sake of this
104example, assume that line (1) violates the column limit restriction and needs to
105be reformatted.
106
Bill Wendlingfa22c892015-03-18 13:42:25 -0700107.. code-block:: python
Bill Wendling7d623452015-03-18 13:36:07 -0700108
Bill Wendlingfa22c892015-03-18 13:42:25 -0700109 def xxxxxxxxxxx(aaaaaaaaaaaa, bbbbbbbbb, cccccccc, dddddddd, eeeeee): # 1
110 pass # 2
Bill Wendling7d623452015-03-18 13:36:07 -0700111
112For line (1), the algorithm will build a tree where each node (a
113FormattingDecisionState object) is the state of the line at that token given the
114decision to split before the token or not. Note: the FormatDecisionState objects
115are copied by value so each node in the graph is unique and a change in one
116doesn't affect other nodes.
117
Bill Wendlingfa22c892015-03-18 13:42:25 -0700118Heuristics are used to determine the costs of splitting or not splitting.
119Because a node holds the state of the tree up to a token's insertion, it can
120easily determine if a splitting decision will violate one of the style
Bill Wendling7d623452015-03-18 13:36:07 -0700121requirements. For instance, the heuristic is able to apply an extra penalty to
122the edge when not splitting between the previous token and the one being added.
123
124There are some instances where we will never want to split the line, because
125doing so will always be detrimental (i.e., it will require a backslash-newline,
126which is very rarely desirable). For line (1), we will never want to split the
127first three tokens: 'def', 'xxxxxxxxxxx', and '('. Nor will we want to split
128between the ')' and the ':' at the end. These regions are said to be
129"unbreakable." This is reflected in the tree by there not being a 'split'
130decision (left hand branch) within the unbreakable region.
131
132Now that we have the tree, we determine what the "best" formatting is by finding
133the path through the tree with the lowest cost.
134
135And that's it!