blob: 000b54e75584dff8782aa93da04a35512ce7b401 [file] [log] [blame]
Bill Wendling7d623452015-03-18 13:36:07 -07001====
2YAPF
3====
4
Bill Wendlingfb8ab382015-03-18 20:24:14 -07005.. image:: https://travis-ci.org/google/yapf.svg?branch=master
6 :target: https://travis-ci.org/google/yapf
7 :alt: Build status
8
Bill Wendling7d623452015-03-18 13:36:07 -07009Introduction
10============
11
12Most of the current formatters for Python -- e.g., autopep8, and pep8ify -- are
13made to remove lint errors from code. This has some obvious limitations. For
14instance, code that conforms to the PEP 8 guidelines may not be reformatted.
15But it doesn't mean that the code looks good.
16
17YAPF takes a different approach. It's based off of 'clang-format', developed by
18Daniel Jasper. In essence, the algorithm takes the code and reformats it to the
19best formatting that conforms to the style guide, even if the original code
20didn't violate the style guide.
21
22The ultimate goal is that the code YAPF produces is as good as the code that a
23programmer would write if they were following the style guide.
24
25.. contents::
26
27Installation
28============
29
30From source directory::
31
32 $ sudo python ./setup.py install
33
34
35Usage
36=====
37
Bill Wendlingfa22c892015-03-18 13:42:25 -070038Options::
Bill Wendling7d623452015-03-18 13:36:07 -070039
Bill Wendlingfa22c892015-03-18 13:42:25 -070040 usage: __main__.py [-h] [-d | -i] [-l START-END | -r] ...
Bill Wendling7d623452015-03-18 13:36:07 -070041
Bill Wendlingfa22c892015-03-18 13:42:25 -070042 Formatter for Python code.
Bill Wendling7d623452015-03-18 13:36:07 -070043
Bill Wendlingfa22c892015-03-18 13:42:25 -070044 positional arguments:
45 files
46
47 optional arguments:
48 -h, --help show this help message and exit
49 -d, --diff print the diff for the fixed source
50 -i, --in-place make changes to files in place
51 -l START-END, --lines START-END
52 range of lines to reformat, one-based
53 -r, --recursive run recursively over directories
Bill Wendling7d623452015-03-18 13:36:07 -070054
55
56Why Not Improve Existing Tools?
57===============================
58
59We wanted to use clang-format's reformatting algorithm. It's very powerful and
60designed to come up with the best formatting possible. Existing tools were
61created with different goals in mind, and would require extensive modifications
62to convert to using clang-format's algorithm.
63
64
65Can I Use YAPF In My Program?
66=============================
67
68Please do! YAPF was designed to be used as a library as well as a command line
69tool. This means that a tool or IDE plugin is free to use YAPF.
70
71
72Gory Details
73============
74
75Algorithm Design
76----------------
77
78The main data structure in YAPF is the UnwrappedLine object. It holds a list of
79FormatTokens, that we would want to place on a single line if there were no
80column limit. An exception being a comment in the middle of an expression
81statement will force the line to be formatted on more than one line. The
82formatter works on one UnwrappedLine object at a time.
83
84An UnwrappedLine typically won't affect the formatting of lines before or after
85it. There is a part of the algorithm that may join two or more UnwrappedLines
86into one line. For instance, an if-then statement with a short body can be
Bill Wendlingfa22c892015-03-18 13:42:25 -070087placed on a single line::
Bill Wendling7d623452015-03-18 13:36:07 -070088
89 if a == 42: continue
90
91YAPF's formatting algorithm creates a weighted tree that acts as the solution
92space for the algorithm. Each node in the tree represents the result of a
93formatting decision --- i.e., whether to split or not to split before a token.
94Each formatting decision has a cost associated with it. Therefore, the cost is
95realized on the edge between two nodes. (In reality, the weighted tree doesn't
96have separate edge objects, so the cost resides on the nodes themselves.)
97
98For example, take the following Python code snippet. For the sake of this
99example, assume that line (1) violates the column limit restriction and needs to
100be reformatted.
101
Bill Wendlingfa22c892015-03-18 13:42:25 -0700102.. code-block:: python
Bill Wendling7d623452015-03-18 13:36:07 -0700103
Bill Wendlingfa22c892015-03-18 13:42:25 -0700104 def xxxxxxxxxxx(aaaaaaaaaaaa, bbbbbbbbb, cccccccc, dddddddd, eeeeee): # 1
105 pass # 2
Bill Wendling7d623452015-03-18 13:36:07 -0700106
107For line (1), the algorithm will build a tree where each node (a
108FormattingDecisionState object) is the state of the line at that token given the
109decision to split before the token or not. Note: the FormatDecisionState objects
110are copied by value so each node in the graph is unique and a change in one
111doesn't affect other nodes.
112
Bill Wendlingfa22c892015-03-18 13:42:25 -0700113Heuristics are used to determine the costs of splitting or not splitting.
114Because a node holds the state of the tree up to a token's insertion, it can
115easily determine if a splitting decision will violate one of the style
Bill Wendling7d623452015-03-18 13:36:07 -0700116requirements. For instance, the heuristic is able to apply an extra penalty to
117the edge when not splitting between the previous token and the one being added.
118
119There are some instances where we will never want to split the line, because
120doing so will always be detrimental (i.e., it will require a backslash-newline,
121which is very rarely desirable). For line (1), we will never want to split the
122first three tokens: 'def', 'xxxxxxxxxxx', and '('. Nor will we want to split
123between the ')' and the ':' at the end. These regions are said to be
124"unbreakable." This is reflected in the tree by there not being a 'split'
125decision (left hand branch) within the unbreakable region.
126
127Now that we have the tree, we determine what the "best" formatting is by finding
128the path through the tree with the lowest cost.
129
130And that's it!