.. _pyporting-howto:

*********************************
Porting Python 2 Code to Python 3
*********************************

:author: Brett Cannon

.. topic:: Abstract

   With Python 3 being the future of Python while Python 2 is still in active
   use, it is good to have your project available for both major releases of
   Python. This guide is meant to help you choose which strategy works best
   for your project to support both Python 2 & 3 along with how to execute
   that strategy.

   If you are looking to port an extension module instead of pure Python code,
   please see http://docs.python.org/py3k/howto/cporting.html .


Choosing a Strategy
===================
When a project decides that it's time to support both Python 2 & 3, it needs to
choose how to go about accomplishing that goal. Which strategy you go with will
depend on how large the project's existing codebase is and how much divergence
you want between your Python 2 codebase and your Python 3 one (e.g., starting a
new version with Python 3).

If your project is brand-new or does not have a large codebase, then you may
want to consider writing/porting :ref:`all of your code for Python 3
and use 3to2 <use_3to2>` to port your code for Python 2.

If your project has a pre-existing Python 2 codebase and you would like Python
3 support to start off a new branch or version of your project, then you will
most likely want to :ref:`port using 2to3 <use_2to3>`. This will allow you to
port your Python 2 code to Python 3 in a semi-automated fashion and begin to
maintain it separately from your Python 2 code. This approach can also work if
your codebase is small and/or simple enough for the translation to occur
quickly.

Finally, if you want to maintain Python 2 and Python 3 versions of your project
simultaneously and with no differences, then you can write :ref:`Python 2/3
source-compatible code <use_same_source>`. While the code is not quite as
idiomatic as it would be if written just for Python 3 or produced by automating
the port from Python 2, it does make it easier to continue to do rapid
development regardless of what major version of Python you are developing
against at the time.

Regardless of which approach you choose, porting is probably not as hard or
time-consuming as you might initially think. You can also tackle the problem
piecemeal, as a good portion of porting is simply updating your code to follow
current best practices in a Python 2/3 compatible way.


Universal Bits of Advice
------------------------
Regardless of what strategy you pick, there are a few things you should
consider.

One is to make sure you have a robust test suite. You need to make sure
everything continues to work, just like when you support a new minor version of
Python. This means making sure your test suite is thorough and is ported
properly between Python 2 & 3. You will also most likely want to use something
like tox_ to automate testing between both a Python 2 and Python 3 VM.

Two, once your project has Python 3 support, make sure to add the proper
classifier on the Cheeseshop_ (PyPI_). To have your project listed as Python 3
compatible it must have the
`Python 3 classifier <http://pypi.python.org/pypi?:action=browse&c=533>`_
(from
http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/)::

    setup(
        name='Your Library',
        version='1.0',
        classifiers=[
            # make sure to use :: Python *and* :: Python :: 3 so
            # that pypi can list the package on the python 3 page
            'Programming Language :: Python',
            'Programming Language :: Python :: 3'
        ],
        packages=['yourlibrary'],
        # make sure to add custom_fixers to the MANIFEST.in
        include_package_data=True,
        # ...
    )


Doing so will cause your project to show up in the
`Python 3 packages list
<http://pypi.python.org/pypi?:action=browse&c=533&show=all>`_. You will know
you set the classifier properly as visiting your project page on the Cheeseshop
will show a Python 3 logo in the upper-left corner of the page.

Three, the six_ project provides a library which helps iron out differences
between Python 2 & 3. If you find there is a recurring sticking point in your
translation or maintenance of code, consider using a source-compatible solution
relying on six. If you have to create your own Python 2/3 compatible solution,
you can use ``sys.version_info[0] >= 3`` as a guard.
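
As a rough illustration of such a guard, the sketch below picks the text type
for the running interpreter. The names ``PY3``, ``text_type`` and ``is_text()``
are invented for this example; they are not part of six or the standard
library.

```python
import sys

# True on Python 3 and later; this is the guard expression from the text.
PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str        # text is ``str`` on Python 3
else:
    text_type = unicode    # noqa: F821 -- this branch only runs on Python 2


def is_text(value):
    """Return True if ``value`` is a text string rather than bytes."""
    return isinstance(value, text_type)
```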

Four, read all the approaches. Just because a bit of advice is presented under
one approach does not mean it cannot apply to the other strategies.

Five, drop support for older Python versions if possible. While not a
requirement, `Python 2.5`_ introduced a lot of useful syntax and libraries
which have become idiomatic in Python 3. `Python 2.6`_ introduced future
statements which make compatibility much easier if you are going from Python 2
to 3. `Python 2.7`_ continues the trend in the stdlib. So choose the newest
version of Python which you believe can be your minimum supported version and
work from there.


.. _tox: http://codespeak.net/tox/
.. _Cheeseshop:
.. _PyPI: http://pypi.python.org/
.. _six: http://packages.python.org/six
.. _Python 2.7: http://www.python.org/2.7.x
.. _Python 2.6: http://www.python.org/2.6.x
.. _Python 2.5: http://www.python.org/2.5.x
.. _Python 2.4: http://www.python.org/2.4.x


.. _use_3to2:

Python 3 and 3to2
=================
If you are starting a new project or your codebase is small enough, you may
want to consider writing your code for Python 3 and backporting to Python 2
using 3to2_. Thanks to Python 3 being more strict about things than Python 2
(e.g., bytes vs. strings), the source translation can be easier and more
straightforward than from Python 2 to 3. Plus it gives you more direct
experience developing in Python 3 which, since it is the future of Python, is a
good thing long-term.

A drawback of this approach is that 3to2 is a third-party project. This means
that the Python core developers (and thus this guide) can make no promises
about how well 3to2 works at any time. There is nothing to suggest, though,
that 3to2 is not a high-quality project.


.. _3to2: https://bitbucket.org/amentajo/lib3to2/overview


.. _use_2to3:

Python 2 and 2to3
=================
Included with Python since 2.6, the 2to3_ tool (and :mod:`lib2to3` module)
helps with porting Python 2 to Python 3 by performing various source
translations. This is a perfect solution for projects which wish to branch
their Python 3 code from their Python 2 codebase and maintain them as
independent codebases. You can even begin preparing to use this approach today
by writing future-compatible Python code which works cleanly in Python 2 in
conjunction with 2to3; all steps outlined below will work with Python 2 code up
to the point when the actual use of 2to3 occurs.

Use of 2to3 as an on-demand translation step at install time is also possible,
preventing the need to maintain a separate Python 3 codebase, but this approach
does come with some drawbacks. While users will only have to pay the
translation cost once at installation, you as a developer will need to pay the
cost regularly during development. If your codebase is sufficiently large then
the translation step ends up acting like a compilation step, robbing you of the
rapid development process you are used to with Python. Obviously the time
required to translate a project will vary, so do an experimental translation to
see how long it takes, then evaluate whether you prefer this approach compared
to using :ref:`use_same_source` or simply keeping a separate Python 3 codebase.

Below are the typical steps taken by a project which uses a 2to3-based approach
to supporting Python 2 & 3.


Support Python 2.7
------------------
As a first step, make sure that your project is compatible with `Python 2.7`_.
This is just good to do as Python 2.7 is the last release of Python 2 and thus
will be used for a rather long time. It also allows for use of the ``-3`` flag
to Python to help discover places in your code which 2to3 cannot handle but are
known to cause issues.

Try to Support Python 2.6 and Newer Only
----------------------------------------
While not possible for all projects, if you can support `Python 2.6`_ and newer
**only**, your life will be much easier. Various future statements, stdlib
additions, etc. exist only in Python 2.6 and later which greatly assist in
porting to Python 3. But if your project must keep support for `Python 2.5`_
(or even `Python 2.4`_) then it is still possible to port to Python 3.

Below are the benefits you gain if you only have to support Python 2.6 and
newer. Some of these options are personal choice while others are
**strongly** recommended (the ones that are more for personal choice are
labeled as such). If you continue to support older versions of Python then you
at least need to watch out for situations that these solutions fix.


``from __future__ import division``
'''''''''''''''''''''''''''''''''''
While the exact same outcome can be had by using the ``-Qnew`` argument to
Python, using this future statement lifts the requirement that your users use
the flag to get the expected behavior of division in Python 3 (e.g., ``1/2 ==
0.5; 1//2 == 0``).
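
A minimal sketch of the effect, assuming the future statement appears at the
top of the module as Python requires. The variable names are only
illustrative.

```python
from __future__ import division

half = 1 / 2        # true division, even when run on Python 2
floored = 1 // 2    # floor division has to be spelled explicitly
```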


``from __future__ import absolute_import``
''''''''''''''''''''''''''''''''''''''''''
Implicit relative imports (e.g., importing ``spam.bacon`` from within
``spam.eggs`` with the statement ``import bacon``) do not work in Python 3.
This future statement moves away from that and allows the use of explicit
relative imports (e.g., ``from . import bacon``).


``from __future__ import print_function``
'''''''''''''''''''''''''''''''''''''''''
This is a personal choice. 2to3 handles the translation from the print
statement to the print function rather well so this is an optional step. This
future statement does help, though, with getting used to typing
``print('Hello, World')`` instead of ``print 'Hello, World'``.


``from __future__ import unicode_literals``
'''''''''''''''''''''''''''''''''''''''''''
Another personal choice. You can always mark what you want to be a (unicode)
string with a ``u`` prefix to get the same effect. But regardless of whether
you use this future statement or not, you **must** make sure you know exactly
which Python 2 strings you want to be bytes, and which are to be strings. This
means you should, **at minimum**, mark all strings that are meant to be text
strings with a ``u`` prefix if you do not use this future statement.
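
A small sketch of how literals behave once the future statement is in effect,
again assuming it sits at the top of the module. ``text_type`` is a name chosen
for this example only.

```python
from __future__ import unicode_literals

text = 'spam'      # unprefixed literals are text on both Python 2 and 3
data = b'spam'     # bytes still have to be marked explicitly

# ``unicode`` on Python 2, ``str`` on Python 3.
text_type = type('')
```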


Bytes literals
''''''''''''''
This is a **very** important one. The ability to prefix Python 2 strings that
are meant to contain bytes with a ``b`` prefix helps to very clearly delineate
what is and is not a Python 3 string. When you run 2to3 on code, all Python 2
strings become Python 3 strings **unless** they are prefixed with ``b``.

There are some differences between bytes literals in Python 2 and those in
Python 3 thanks to the bytes type just being an alias to ``str`` in Python 2.
Probably the biggest "gotcha" is that indexing results in different values. In
Python 2, the value of ``b'py'[1]`` is ``'y'``, while in Python 3 it's ``121``.
You can avoid this disparity by always slicing at the size of a single element:
``b'py'[1:2]`` is ``'y'`` in Python 2 and ``b'y'`` in Python 3 (i.e., close
enough).
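
The slicing trick can be sketched as follows. ``iter_single_bytes()`` is a
hypothetical helper written for this example, not something provided by Python
or six.

```python
data = b'py'

# A slice one element wide is a length-1 bytes object on Python 2 and 3,
# whereas ``data[1]`` is a string on Python 2 and an integer on Python 3.
second = data[1:2]


def iter_single_bytes(value):
    """Yield each element of ``value`` as a length-1 bytes object."""
    for index in range(len(value)):
        yield value[index:index + 1]
```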

You cannot concatenate bytes and strings in Python 3. But since Python 2 has
bytes aliased to ``str``, it will succeed: ``b'a' + u'b'`` works in Python 2,
but ``b'a' + 'b'`` in Python 3 is a :exc:`TypeError`. A similar issue also
comes about when doing comparisons between bytes and strings.
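
One way to sidestep the problem is to decode bytes explicitly before combining
them with text. The helper below is only a sketch: ``join_text()`` is an
invented name, and UTF-8 is an assumed default encoding.

```python
def join_text(prefix, suffix, encoding='utf-8'):
    """Concatenate two values as text, decoding any bytes first."""
    if isinstance(prefix, bytes):
        prefix = prefix.decode(encoding)
    if isinstance(suffix, bytes):
        suffix = suffix.decode(encoding)
    return prefix + suffix
```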


:mod:`io` Module
''''''''''''''''
A file opened with the built-in ``open()`` function in Python 2 always returns
Python 2 strings when read, never unicode strings. This is problematic as a
file opened with Python 3's :func:`open` returns strings if it is not opened as
binary and bytes if it is.

To help with compatibility, use :func:`io.open` instead of the built-in
``open()``. Since :func:`io.open` is essentially the same function in both
Python 2 and Python 3 it will help iron out any issues that might arise.
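
A short sketch of the text/bytes split with :func:`io.open`, using a temporary
file. The UTF-8 encoding and the sample text are assumptions made for the
example.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text mode: text (unicode) strings go in and come back out.
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(u'caf\xe9')
    with io.open(path, 'r', encoding='utf-8') as handle:
        text = handle.read()
    # Binary mode: the same file is read back as bytes.
    with io.open(path, 'rb') as handle:
        raw = handle.read()
finally:
    os.remove(path)
```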


Handle Common "Gotchas"
-----------------------
There are a few things that consistently come up as sticking points for people:
things which 2to3 cannot handle automatically, or which can easily be done in
Python 2 to help modernize your code.


Subclass ``object``
'''''''''''''''''''
New-style classes have been around since Python 2.2. You need to make sure you
are subclassing from ``object`` to avoid odd edge cases involving method
resolution order, etc. This continues to be totally valid in Python 3 (although
unneeded as all classes implicitly inherit from ``object``).
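
For illustration, with hypothetical class names: subclassing ``object``
explicitly gives the class a ``__mro__`` attribute on Python 2 as well, which
classic classes lack.

```python
class Ham(object):
    """Explicitly new-style on Python 2; same meaning on Python 3."""


class Eggs(Ham):
    pass


# New-style classes expose their method resolution order.
mro_names = [cls.__name__ for cls in Eggs.__mro__]
```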


Deal With the Bytes/String Dichotomy
''''''''''''''''''''''''''''''''''''
One of the biggest issues people have when porting code to Python 3 is handling
the bytes/string dichotomy. Because Python 2 allowed the ``str`` type to hold
textual data, people have over the years been rather loose in their delineation
of which ``str`` instances held text compared to bytes. In Python 3 you cannot
be so care-free anymore and need to properly handle the difference. The key to
handling this issue is to make sure that **every** string literal in your
Python 2 code is either syntactically or functionally marked as either bytes or
text data. After this is done you then need to make sure your APIs are designed
to either handle a specific type or made to be properly polymorphic.


Mark Up Python 2 String Literals
********************************

The first thing you must do is designate every single string literal in Python
2 as either textual or bytes data. If you are only supporting Python 2.6 or
newer, this can be accomplished by marking bytes literals with a ``b`` prefix
and then designating textual data with a ``u`` prefix or using the
``unicode_literals`` future statement.

If your project supports versions of Python pre-dating 2.6, then you should use
the six_ project and its ``b()`` function to denote bytes literals. For text
literals you can either use six's ``u()`` function or use a ``u`` prefix.


306
307Decide what APIs Will Accept
308****************************
309In Python 2 it was very easy to accidentally create an API that accepted both
310bytes and textual data. But in Python 3, thanks to the more strict handling of
311disparate types, this loose usage of bytes and text together tends to fail.
312
313Take the dict ``{b'a': 'bytes', u'a': 'text'}`` in Python 2.6. It creates the
314dict ``{u'a': 'text'}`` since ``b'a' == u'a'``. But in Python 3 the equivalent
315dict creates ``{b'a': 'bytes', 'a': 'text'}``, i.e., no lost data. Similar
316issues can crop up when transitioning Python 2 code to Python 3.
317
318This means you need to choose what an API is going to accept and create and
319consistently stick to that API in both Python 2 and 3.
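
As one possible shape for such an API, the sketch below settles on text keys
and decodes bytes at the boundary. ``to_text()`` and ``Registry`` are
hypothetical names, and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding='utf-8'):
    """Return ``value`` as text, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


class Registry(object):
    """Hypothetical mapping-like API that stores text keys only."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[to_text(key)] = value

    def get(self, key):
        return self._items[to_text(key)]
```

Because every key passes through ``to_text()``, ``b'a'`` and ``u'a'`` refer to
the same entry on both major versions instead of only on Python 2.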


``__str__()``/``__unicode__()``
'''''''''''''''''''''''''''''''
In Python 2, objects can specify both a string and unicode representation of
themselves. In Python 3, though, there is only a string representation. This
becomes an issue as people can inadvertently do things in their ``__str__()``
methods which have unpredictable results (e.g., infinite recursion if you
happen to use the ``unicode(self).encode('utf8')`` idiom as the body of your
``__str__()`` method).

There are two ways to solve this issue. One is to use a custom 2to3 fixer. The
blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
specifies how to do this. That will allow 2to3 to change all instances of ``def
__unicode__(self): ...`` to ``def __str__(self): ...``. This does require that
you define your ``__str__()`` method in Python 2 before your ``__unicode__()``
method.

The other option is to use a mixin class. This allows you to only define a
``__unicode__()`` method for your class and let the mixin derive
``__str__()`` for you (code from
http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/)::

    import sys

    class UnicodeMixin(object):

        """Mixin class to handle defining the proper __str__/__unicode__
        methods in Python 2 or 3."""

        if sys.version_info[0] >= 3:  # Python 3
            def __str__(self):
                return self.__unicode__()
        else:  # Python 2
            def __str__(self):
                return self.__unicode__().encode('utf8')


    class Spam(UnicodeMixin):

        def __unicode__(self):
            return u'spam-spam-bacon-spam'  # 2to3 will remove the 'u' prefix


Specify when opening a file as binary
'''''''''''''''''''''''''''''''''''''
Unless you have been working on Windows, there is a chance you have not always
bothered to add the ``b`` mode when opening a file (e.g., ``


Use :func:`codecs.open`
'''''''''''''''''''''''
If you are not able to limit your Python 2 compatibility to 2.6 or newer (and
thus get to use :func:`io.open`), then you should make sure you use
:func:`codecs.open` over the built-in ``open()`` function. This will make sure
that you get back unicode strings in Python 2 when reading in text and an
instance of ``str`` when dealing with bytes.
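
A sketch of reading text back through :func:`codecs.open`, written with
``try``/``finally`` rather than ``with`` so that it also suits Python versions
older than 2.6. The UTF-8 encoding and sample text are assumptions.

```python
import codecs
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    handle = codecs.open(path, 'w', encoding='utf-8')
    try:
        handle.write(u'caf\xe9')
    finally:
        handle.close()

    # With an encoding given, reads return text (unicode) strings.
    handle = codecs.open(path, 'r', encoding='utf-8')
    try:
        text = handle.read()
    finally:
        handle.close()
finally:
    os.remove(path)
```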


Don't Index on Exceptions
'''''''''''''''''''''''''
In Python 2, the following worked::

    >>> exc = Exception(1, 2, 3)
    >>> exc.args[1]
    2
    >>> exc[1]  # Python 2 only!
    2

But in Python 3, indexing directly off of an exception is an error. You need to
make sure to only index on the :attr:`BaseException.args` attribute which is a
sequence containing all arguments passed to the :meth:`__init__` method.

Even better is to use documented attributes the exception provides.
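
For instance, :exc:`OSError` documents ``errno`` and ``strerror`` attributes,
so neither form of indexing is needed. The values below are sample data chosen
for the illustration.

```python
exc = OSError(2, 'No such file or directory')

code = exc.args[0]        # index the ``args`` tuple, never the exception
number = exc.errno        # better: a documented attribute
message = exc.strerror    # likewise documented on OSError
```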


Don't use ``__getslice__`` & Friends
''''''''''''''''''''''''''''''''''''
``__getslice__()`` and its relatives have been deprecated for a while, and
Python 3 finally drops support for them. Move completely over to
:meth:`__getitem__` and friends.
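
A minimal sketch of handling slices in :meth:`__getitem__`; ``Window`` is a
hypothetical wrapper class used only for illustration.

```python
class Window(object):
    """Hypothetical sequence wrapper that supports indexing and slicing."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Slices arrive here as ``slice`` objects, so no ``__getslice__()``
        # is needed.
        if isinstance(index, slice):
            return Window(self._items[index])
        return self._items[index]
```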


Stop Using :mod:`doctest`
'''''''''''''''''''''''''
While 2to3 tries to port doctests properly, it's a rather tough thing to do. It
is probably best to simply convert your critical doctests to :mod:`unittest`.


Eliminate ``-3`` Warnings
-------------------------
When you run your application's test suite, run it using the ``-3`` flag passed
to Python. This will cause various warnings to be raised during execution about
things that 2to3 cannot handle automatically (e.g., modules that have been
removed). Try to eliminate those warnings to make your code even more portable
to Python 3.


Run 2to3
--------
Once you have made your Python 2 code future-compatible with Python 3, it's
time to use 2to3_ to actually port your code.


Manually
''''''''
To manually convert source code using 2to3_, you use the ``2to3`` script that
is installed with Python 2.6 and later::

    2to3 <directory or file to convert>

This will cause 2to3 to write out a diff with all of the fixers applied for the
converted source code. If you would like 2to3 to go ahead and apply the changes
you can pass it the ``-w`` flag::

    2to3 -w <stuff to convert>

There are other flags available to control exactly which fixers are applied,
etc.


During Installation
'''''''''''''''''''
When a user installs your project for Python 3, you can have either
:mod:`distutils` or Distribute_ run 2to3_ on your behalf.
For distutils, use the following idiom::

    from distutils.core import setup

    try:  # Python 3
        from distutils.command.build_py import build_py_2to3 as build_py
    except ImportError:  # Python 2
        from distutils.command.build_py import build_py

    setup(cmdclass={'build_py': build_py},
          # ...
          )

For Distribute::

    from setuptools import setup

    setup(use_2to3=True,
          # ...
          )

This will allow you to not have to distribute a separate Python 3 version of
your project. It does require, though, that when you perform development you
at least build your project and use the built Python 3 source for testing.


466
467Verify & Test
468-------------
469At this point you should (hopefully) have your project converted in such a way
470that it works in Python 3. Verify it by running your unit tests and making sure
471nothing has gone awry. If you miss something then figure out how to fix it in
472Python 3, backport to your Python 2 code, and run your code through 2to3 again
473to verify the fix transforms properly.
474
475
476.. _2to3: http://docs.python.org/py3k/library/2to3.html
477.. _Distribute: http://packages.python.org/distribute/
478
479
.. _use_same_source:

Python 2/3 Compatible Source
============================
While it may seem counter-intuitive, you can write Python code which is
source-compatible between Python 2 & 3. It does lead to code that is not
entirely idiomatic Python (e.g., having to extract the currently raised
exception from ``sys.exc_info()[1]``), but it can be run under Python 2
**and** Python 3 without using 2to3_ as a translation step. This allows you to
continue to have a rapid development process regardless of whether you are
developing under Python 2 or Python 3. Whether this approach or using
:ref:`use_2to3` works best for you will be a per-project decision.

To get a complete idea of what issues you will need to deal with, see the
`What's New in Python 3.0`_. Others have reorganized the data in other formats
such as http://docs.pythonsprints.com/python3_porting/py-porting.html .

The following are some steps to take to try to support both Python 2 & 3 from
the same source code.


.. _What's New in Python 3.0: http://docs.python.org/release/3.0/whatsnew/3.0.html


Follow The Steps for Using 2to3_ (sans 2to3)
--------------------------------------------
All of the steps outlined in how to
:ref:`port Python 2 code with 2to3 <use_2to3>` apply
to creating a Python 2/3 codebase. This includes trying to only support Python
2.6 or newer (the :mod:`__future__` statements work in Python 3 without issue),
eliminating warnings that are triggered by ``-3``, etc.

Essentially you should cover all of the steps short of running 2to3 itself.


Use six_
--------
The six_ project contains many things to help you write portable Python code.
You should make sure to read its documentation from beginning to end and use
any and all features it provides. That way you will minimize any mistakes you
might make in writing cross-version code.


522
523Capturing the Currently Raised Exception
524----------------------------------------
525One change between Python 2 and 3 that will require changing how you code is
526accessing the currently raised exception. In Python 2 the syntax to access the
527current exception is::
528
529 try:
530 raise Exception()
531 except Exception, exc:
532 # Current exception is 'exc'
533 pass
534
535This syntax changed in Python 3 to::
536
537 try:
538 raise Exception()
539 except Exception as exc:
540 # Current exception is 'exc'
541 pass
542
543Because of this syntax change you must change to capturing the current
544exception to::
545
546 try:
547 raise Exception()
548 except Exception:
549 import sys
550 exc = sys.exc_info()[1]
551 # Current exception is 'exc'
552 pass
553
554You can get more information about the raised exception from
555:func:`sys.exc_info` than simply the current exception instance, but you most
556likely don't need it. One very key point to understand, though, is **do not
557save the traceback to a variable without deleting it**! Because tracebacks
558contain references to the current executing frame you will inadvertently create
559a circular reference, prevent everything in the frame from being garbage
560collected. This can be a massive memory leak if you are not careful. Simply
561index into the returned value from :func:`sys.version_info` instead of
562assigning the tuple it returns to a variable.
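
Put together, a source-compatible handler might look like the sketch below.
``parse_int()`` is a hypothetical function invented for this example. It
indexes the tuple directly, so the traceback is never bound to a local name.

```python
import sys


def parse_int(text):
    """Return ``(value, error)``; exactly one of the two is None."""
    try:
        return int(text), None
    except ValueError:
        # Take only the exception instance; the traceback in slot 2 is
        # never assigned to a variable.
        exc = sys.exc_info()[1]
        return None, str(exc)
```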


Other Resources
===============
The authors of the following blog posts and wiki pages deserve special thanks
for making public their tips for porting Python 2 code to Python 3 (and thus
helping provide information for this document):

* http://docs.pythonsprints.com/python3_porting/py-porting.html
* http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/
* http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html
* http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
* http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/
* http://wiki.python.org/moin/PortingPythonToPy3k

If you feel there is something missing from this document that should be added,
please email the python-porting_ mailing list.

.. _python-porting: http://mail.python.org/mailman/listinfo/python-porting