blob: 7d472125a850a4cdd2ec4d428aabd7e08b21864c [file] [log] [blame]
Bill Wendling7d623452015-03-18 13:36:07 -07001====
2YAPF
3====
4
Bill Wendlingfb8ab382015-03-18 20:24:14 -07005.. image:: https://travis-ci.org/google/yapf.svg?branch=master
6 :target: https://travis-ci.org/google/yapf
7 :alt: Build status
8
Bill Wendling7d623452015-03-18 13:36:07 -07009Introduction
10============
11
12Most of the current formatters for Python -- e.g., autopep8, and pep8ify -- are
13made to remove lint errors from code. This has some obvious limitations. For
14instance, code that conforms to the PEP 8 guidelines may not be reformatted.
15But it doesn't mean that the code looks good.
16
17YAPF takes a different approach. It's based off of 'clang-format', developed by
18Daniel Jasper. In essence, the algorithm takes the code and reformats it to the
19best formatting that conforms to the style guide, even if the original code
20didn't violate the style guide.
21
22The ultimate goal is that the code YAPF produces is as good as the code that a
23programmer would write if they were following the style guide.
24
Bill Wendling52e04112015-03-18 20:42:26 -070025.. note::
26
27 YAPF is not an official Google product (experimental or otherwise), it is
28 just code that happens to be owned by Google.
29
Bill Wendling7d623452015-03-18 13:36:07 -070030.. contents::
31
32Installation
33============
34
Eli Bendersky78b616e2015-03-19 05:17:31 -070035YAPF currently requires Python 2.7; support for Python 3.x is planned in the
36near future.
37
38To install YAPF from the source directory::
Bill Wendling7d623452015-03-18 13:36:07 -070039
40 $ sudo python ./setup.py install
41
Bill Wendling7d623452015-03-18 13:36:07 -070042Usage
43=====
44
Bill Wendlingfa22c892015-03-18 13:42:25 -070045Options::
Bill Wendling7d623452015-03-18 13:36:07 -070046
Bill Wendlingfa22c892015-03-18 13:42:25 -070047 usage: __main__.py [-h] [-d | -i] [-l START-END | -r] ...
Bill Wendling7d623452015-03-18 13:36:07 -070048
Bill Wendlingfa22c892015-03-18 13:42:25 -070049 Formatter for Python code.
Bill Wendling7d623452015-03-18 13:36:07 -070050
Bill Wendlingfa22c892015-03-18 13:42:25 -070051 positional arguments:
52 files
53
54 optional arguments:
55 -h, --help show this help message and exit
56 -d, --diff print the diff for the fixed source
57 -i, --in-place make changes to files in place
58 -l START-END, --lines START-END
59 range of lines to reformat, one-based
60 -r, --recursive run recursively over directories
Bill Wendling7d623452015-03-18 13:36:07 -070061
62
63Why Not Improve Existing Tools?
64===============================
65
66We wanted to use clang-format's reformatting algorithm. It's very powerful and
67designed to come up with the best formatting possible. Existing tools were
68created with different goals in mind, and would require extensive modifications
69to convert to using clang-format's algorithm.
70
71
72Can I Use YAPF In My Program?
73=============================
74
75Please do! YAPF was designed to be used as a library as well as a command line
76tool. This means that a tool or IDE plugin is free to use YAPF.
77
78
79Gory Details
80============
81
82Algorithm Design
83----------------
84
85The main data structure in YAPF is the UnwrappedLine object. It holds a list of
86FormatTokens, that we would want to place on a single line if there were no
87column limit. An exception being a comment in the middle of an expression
88statement will force the line to be formatted on more than one line. The
89formatter works on one UnwrappedLine object at a time.
90
91An UnwrappedLine typically won't affect the formatting of lines before or after
92it. There is a part of the algorithm that may join two or more UnwrappedLines
93into one line. For instance, an if-then statement with a short body can be
Bill Wendlingfa22c892015-03-18 13:42:25 -070094placed on a single line::
Bill Wendling7d623452015-03-18 13:36:07 -070095
96 if a == 42: continue
97
98YAPF's formatting algorithm creates a weighted tree that acts as the solution
99space for the algorithm. Each node in the tree represents the result of a
100formatting decision --- i.e., whether to split or not to split before a token.
101Each formatting decision has a cost associated with it. Therefore, the cost is
102realized on the edge between two nodes. (In reality, the weighted tree doesn't
103have separate edge objects, so the cost resides on the nodes themselves.)
104
105For example, take the following Python code snippet. For the sake of this
106example, assume that line (1) violates the column limit restriction and needs to
107be reformatted.
108
Bill Wendlingfa22c892015-03-18 13:42:25 -0700109.. code-block:: python
Bill Wendling7d623452015-03-18 13:36:07 -0700110
Bill Wendlingfa22c892015-03-18 13:42:25 -0700111 def xxxxxxxxxxx(aaaaaaaaaaaa, bbbbbbbbb, cccccccc, dddddddd, eeeeee): # 1
112 pass # 2
Bill Wendling7d623452015-03-18 13:36:07 -0700113
114For line (1), the algorithm will build a tree where each node (a
115FormattingDecisionState object) is the state of the line at that token given the
116decision to split before the token or not. Note: the FormatDecisionState objects
117are copied by value so each node in the graph is unique and a change in one
118doesn't affect other nodes.
119
Bill Wendlingfa22c892015-03-18 13:42:25 -0700120Heuristics are used to determine the costs of splitting or not splitting.
121Because a node holds the state of the tree up to a token's insertion, it can
122easily determine if a splitting decision will violate one of the style
Bill Wendling7d623452015-03-18 13:36:07 -0700123requirements. For instance, the heuristic is able to apply an extra penalty to
124the edge when not splitting between the previous token and the one being added.
125
126There are some instances where we will never want to split the line, because
127doing so will always be detrimental (i.e., it will require a backslash-newline,
128which is very rarely desirable). For line (1), we will never want to split the
129first three tokens: 'def', 'xxxxxxxxxxx', and '('. Nor will we want to split
130between the ')' and the ':' at the end. These regions are said to be
131"unbreakable." This is reflected in the tree by there not being a 'split'
132decision (left hand branch) within the unbreakable region.
133
134Now that we have the tree, we determine what the "best" formatting is by finding
135the path through the tree with the lowest cost.
136
137And that's it!