Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 1 | .. _tut-fp-issues: |
| 2 | |
| 3 | ************************************************** |
| 4 | Floating Point Arithmetic: Issues and Limitations |
| 5 | ************************************************** |
| 6 | |
| 7 | .. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net> |
| 8 | |
| 9 | |
| 10 | Floating-point numbers are represented in computer hardware as base 2 (binary) |
| 11 | fractions. For example, the decimal fraction :: |
| 12 | |
| 13 | 0.125 |
| 14 | |
| 15 | has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction :: |
| 16 | |
| 17 | 0.001 |
| 18 | |
| 19 | has value 0/2 + 0/4 + 1/8. These two fractions have identical values, the only |
| 20 | real difference being that the first is written in base 10 fractional notation, |
| 21 | and the second in base 2. |
| 22 | |
| 23 | Unfortunately, most decimal fractions cannot be represented exactly as binary |
| 24 | fractions. A consequence is that, in general, the decimal floating-point |
| 25 | numbers you enter are only approximated by the binary floating-point numbers |
| 26 | actually stored in the machine. |
| 27 | |
| 28 | The problem is easier to understand at first in base 10. Consider the fraction |
| 29 | 1/3. You can approximate that as a base 10 fraction:: |
| 30 | |
| 31 | 0.3 |
| 32 | |
| 33 | or, better, :: |
| 34 | |
| 35 | 0.33 |
| 36 | |
| 37 | or, better, :: |
| 38 | |
| 39 | 0.333 |
| 40 | |
| 41 | and so on. No matter how many digits you're willing to write down, the result |
| 42 | will never be exactly 1/3, but will be an increasingly better approximation of |
| 43 | 1/3. |
| 44 | |
| 45 | In the same way, no matter how many base 2 digits you're willing to use, the |
| 46 | decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base |
| 47 | 2, 1/10 is the infinitely repeating fraction :: |
| 48 | |
| 49 | 0.0001100110011001100110011001100110011001100110011... |
| 50 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 51 | Stop at any finite number of bits, and you get an approximation. |
| 52 | |
| 53 | On a typical machine running Python, there are 53 bits of precision available |
| 54 | for a Python float, so the value stored internally when you enter the decimal |
| 55 | number ``0.1`` is the binary fraction :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 56 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 57 | 0.00011001100110011001100110011001100110011001100110011010 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 58 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 59 | which is close to, but not exactly equal to, 1/10. |
| 60 | |
| 61 | It's easy to forget that the stored value is an approximation to the original |
| 62 | decimal fraction, because of the way that floats are displayed at the |
| 63 | interpreter prompt. Python only prints a decimal approximation to the true |
| 64 | decimal value of the binary approximation stored by the machine. If Python |
| 65 | were to print the true decimal value of the binary approximation stored for |
| 66 | 0.1, it would have to display :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 67 | |
| 68 | >>> 0.1 |
| 69 | 0.1000000000000000055511151231257827021181583404541015625 |
| 70 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 71 | That is more digits than most people find useful, so Python keeps the number |
| 72 | of digits manageable by displaying a rounded value instead :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 73 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 74 | >>> 0.1 |
| 75 | 0.1 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 76 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 77 | It's important to realize that this is, in a real sense, an illusion: the value |
| 78 | in the machine is not exactly 1/10, you're simply rounding the *display* of the |
| 79 | true machine value. This fact becomes apparent as soon as you try to do |
| 80 | arithmetic with these values :: |
| 81 | |
| 82 | >>> 0.1 + 0.2 |
| 83 | 0.30000000000000004 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 84 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 85 | Note that this is in the very nature of binary floating-point: this is not a |
| 86 | bug in Python, and it is not a bug in your code either. You'll see the same |
| 87 | kind of thing in all languages that support your hardware's floating-point |
| 88 | arithmetic (although some languages may not *display* the difference by |
| 89 | default, or in all output modes). |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 90 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 91 | Other surprises follow from this one. For example, if you try to round the |
| 92 | value 2.675 to two decimal places, you get this :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 93 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 94 | >>> round(2.675, 2) |
| 95 | 2.67 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 96 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 97 | The documentation for the built-in :func:`round` function says that it rounds |
| 98 | to the nearest value, rounding ties away from zero. Since the decimal fraction |
| 99 | 2.675 is exactly halfway between 2.67 and 2.68, you might expect the result |
| 100 | here to be (a binary approximation to) 2.68. It's not, because when the |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 101 | decimal string ``2.675`` is converted to a binary floating-point number, it's |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 102 | again replaced with a binary approximation, whose exact value is :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 103 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 104 | 2.67499999999999982236431605997495353221893310546875 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 105 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 106 | Since this approximation is slightly closer to 2.67 than to 2.68, it's rounded |
| 107 | down. |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 108 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 109 | If you're in a situation where you care which way your decimal halfway-cases |
| 110 | are rounded, you should consider using the :mod:`decimal` module. |
| 111 | Incidentally, the :mod:`decimal` module also provides a nice way to "see" the |
| 112 | exact value that's stored in any particular Python float :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 113 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 114 | >>> from decimal import Decimal |
| 115 | >>> Decimal(2.675) |
| 116 | Decimal('2.67499999999999982236431605997495353221893310546875') |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 117 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 118 | Another consequence is that since 0.1 is not exactly 1/10, summing ten values |
| 119 | of 0.1 may not yield exactly 1.0, either:: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 120 | |
| 121 | >>> sum = 0.0 |
| 122 | >>> for i in range(10): |
| 123 | ... sum += 0.1 |
| 124 | ... |
| 125 | >>> sum |
Mark Dickinson | 6b87f11 | 2009-11-24 14:27:02 +0000 | [diff] [blame] | 126 | 0.9999999999999999 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 127 | |
| 128 | Binary floating-point arithmetic holds many surprises like this. The problem |
| 129 | with "0.1" is explained in precise detail below, in the "Representation Error" |
| 130 | section. See `The Perils of Floating Point <http://www.lahey.com/float.htm>`_ |
| 131 | for a more complete account of other common surprises. |
| 132 | |
| 133 | As that says near the end, "there are no easy answers." Still, don't be unduly |
| 134 | wary of floating-point! The errors in Python float operations are inherited |
| 135 | from the floating-point hardware, and on most machines are on the order of no |
| 136 | more than 1 part in 2\*\*53 per operation. That's more than adequate for most |
| 137 | tasks, but you do need to keep in mind that it's not decimal arithmetic, and |
| 138 | that every float operation can suffer a new rounding error. |
| 139 | |
| 140 | While pathological cases do exist, for most casual use of floating-point |
| 141 | arithmetic you'll see the result you expect in the end if you simply round the |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 142 | display of your final results to the number of decimal digits you expect. For |
| 143 | fine control over how a float is displayed see the :meth:`str.format` method's |
| 144 | format specifiers in :ref:`formatstrings`. |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 145 | |
| 146 | |
| 147 | .. _tut-fp-error: |
| 148 | |
| 149 | Representation Error |
| 150 | ==================== |
| 151 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 152 | This section explains the "0.1" example in detail, and shows how you can |
| 153 | perform an exact analysis of cases like this yourself. Basic familiarity with |
| 154 | binary floating-point representation is assumed. |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 155 | |
| 156 | :dfn:`Representation error` refers to the fact that some (most, actually) |
| 157 | decimal fractions cannot be represented exactly as binary (base 2) fractions. |
| 158 | This is the chief reason why Python (or Perl, C, C++, Java, Fortran, and many |
| 159 | others) often won't display the exact decimal number you expect:: |
| 160 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 161 | >>> 0.1 + 0.2 |
| 162 | 0.30000000000000004 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 163 | |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 164 | Why is that? 1/10 and 2/10 are not exactly representable as a binary |
| 165 | fraction. Almost all machines today (July 2010) use IEEE-754 floating point |
| 166 | arithmetic, and almost all platforms map Python floats to IEEE-754 "double |
| 167 | precision". 754 doubles contain 53 bits of precision, so on input the computer |
| 168 | strives to convert 0.1 to the closest fraction it can of the form *J*/2**\ *N* |
| 169 | where *J* is an integer containing exactly 53 bits. Rewriting :: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 170 | |
| 171 | 1 / 10 ~= J / (2**N) |
| 172 | |
| 173 | as :: |
| 174 | |
| 175 | J ~= 2**N / 10 |
| 176 | |
| 177 | and recalling that *J* has exactly 53 bits (is ``>= 2**52`` but ``< 2**53``), |
| 178 | the best value for *N* is 56:: |
| 179 | |
| 180 | >>> 2**52 |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 181 | 4503599627370496 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 182 | >>> 2**53 |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 183 | 9007199254740992 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 184 | >>> 2**56/10 |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 185 | 7205759403792793 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 186 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 187 | That is, 56 is the only value for *N* that leaves *J* with exactly 53 bits. |
| 188 | The best possible value for *J* is then that quotient rounded:: |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 189 | |
| 190 | >>> q, r = divmod(2**56, 10) |
| 191 | >>> r |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 192 | 6 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 193 | |
| 194 | Since the remainder is more than half of 10, the best approximation is obtained |
| 195 | by rounding up:: |
| 196 | |
| 197 | >>> q+1 |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 198 | 7205759403792794 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 199 | |
| 200 | Therefore the best possible approximation to 1/10 in 754 double precision is |
| 201 | that over 2\*\*56, or :: |
| 202 | |
| 203 | 7205759403792794 / 72057594037927936 |
| 204 | |
| 205 | Note that since we rounded up, this is actually a little bit larger than 1/10; |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 206 | if we had not rounded up, the quotient would have been a little bit smaller |
| 207 | than 1/10. But in no case can it be *exactly* 1/10! |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 208 | |
| 209 | So the computer never "sees" 1/10: what it sees is the exact fraction given |
| 210 | above, the best 754 double approximation it can get:: |
| 211 | |
| 212 | >>> .1 * 2**56 |
| 213 | 7205759403792794.0 |
| 214 | |
| 215 | If we multiply that fraction by 10\*\*30, we can see the (truncated) value of |
| 216 | its 30 most significant decimal digits:: |
| 217 | |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 218 | >>> 7205759403792794 * 10**30 // 2**56 |
Georg Brandl | 8ec7f65 | 2007-08-15 14:28:01 +0000 | [diff] [blame] | 219 | 100000000000000005551115123125L |
| 220 | |
| 221 | meaning that the exact number stored in the computer is approximately equal to |
Mark Dickinson | d5d3256 | 2010-07-30 12:58:44 +0000 | [diff] [blame] | 222 | the decimal value 0.100000000000000005551115123125. In versions prior to |
| 223 | Python 2.7 and Python 3.1, Python rounded this value to 17 significant digits, |
Mark Dickinson | 33e5935 | 2010-08-04 21:44:47 +0000 | [diff] [blame] | 224 | giving '0.10000000000000001'. In current versions, Python displays a value |
| 225 | based on the shortest decimal fraction that rounds correctly back to the true |
| 226 | binary value, resulting simply in '0.1'. |