.. _tut-fp-issues:

**************************************************
Floating Point Arithmetic: Issues and Limitations
**************************************************

.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>


Floating-point numbers are represented in computer hardware as base 2 (binary)
fractions. For example, the decimal fraction ::

   0.125

has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction ::

   0.001

has value 0/2 + 0/4 + 1/8. These two fractions have identical values, the only
real difference being that the first is written in base 10 fractional notation,
and the second in base 2.

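The equality of the two notations can be checked with exact rational
arithmetic. This is a small sketch using the standard :mod:`fractions` module;
the variable names are chosen for this illustration only.

```python
from fractions import Fraction

# Decimal 0.125: digits weighted by negative powers of ten.
decimal_value = Fraction(1, 10) + Fraction(2, 100) + Fraction(5, 1000)

# Binary 0.001: digits weighted by negative powers of two.
binary_value = Fraction(0, 2) + Fraction(0, 4) + Fraction(1, 8)

print(decimal_value == binary_value)   # True: both are exactly 1/8
print((0.125).as_integer_ratio())      # (1, 8): this float is stored exactly
```
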
Unfortunately, most decimal fractions cannot be represented exactly as binary
fractions. A consequence is that, in general, the decimal floating-point
numbers you enter are only approximated by the binary floating-point numbers
actually stored in the machine.

The problem is easier to understand at first in base 10. Consider the fraction
1/3. You can approximate that as a base 10 fraction::

   0.3

or, better, ::

   0.33

or, better, ::

   0.333

and so on. No matter how many digits you're willing to write down, the result
will never be exactly 1/3, but will be an increasingly better approximation of
1/3.

In the same way, no matter how many base 2 digits you're willing to use, the
decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base
2, 1/10 is the infinitely repeating fraction ::

   0.0001100110011001100110011001100110011001100110011...

Stop at any finite number of bits, and you get an approximation. On a typical
machine, there are 53 bits of precision available, so the value stored
internally is the binary fraction ::

   0.00011001100110011001100110011001100110011001100110011010

which is close to, but not exactly equal to, 1/10.

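One way to watch the repeating pattern emerge is to generate the binary digits
of 1/10 with exact rational arithmetic. The following is a minimal sketch;
``binary_digits`` is a hypothetical helper written for this illustration, not
a standard library function.

```python
from fractions import Fraction

def binary_digits(value, count):
    """Return the first `count` binary digits of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)      # 1 if the doubled value reached 1, else 0
        digits.append(str(bit))
        value -= bit          # keep only the fractional part
    return "".join(digits)

# After the leading "000", the group "1100" repeats forever.
print("0." + binary_digits(Fraction(1, 10), 32))
```

Because the remainder never reaches zero, the loop would keep producing digits
for any ``count`` you ask for.
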
It's easy to forget that the stored value is an approximation to the original
decimal fraction, because of the way that floats are displayed at the
interpreter prompt. Python only prints a decimal approximation to the true
decimal value of the binary approximation stored by the machine. If Python
were to print the true decimal value of the binary approximation stored for
0.1, it would have to display ::

   >>> 0.1
   0.1000000000000000055511151231257827021181583404541015625

That is more digits than most people find useful, so Python keeps the number
of digits manageable by displaying a rounded value instead ::

   >>> 0.1
   0.1

It's important to realize that this is, in a real sense, an illusion: the value
in the machine is not exactly 1/10; you're simply rounding the *display* of the
true machine value. This fact becomes apparent as soon as you try to do
arithmetic with these values ::

   >>> 0.1 + 0.2
   0.30000000000000004

Note that this is in the very nature of binary floating-point: this is not a bug
in Python, and it is not a bug in your code either. You'll see the same kind of
thing in all languages that support your hardware's floating-point arithmetic
(although some languages may not *display* the difference by default, or in all
output modes).

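A practical consequence is that exact equality tests on floats can fail
unexpectedly. One common workaround, sketched here under assumptions, is to
compare with a tolerance. ``nearly_equal`` is a hypothetical helper, and the
tolerance ``1e-9`` is an arbitrary choice for this illustration; a suitable
value depends on your application.

```python
a = 0.1 + 0.2
b = 0.3

print(a == b)    # False: the two floats are not the same binary value

def nearly_equal(x, y, rel_tol=1e-9):
    """Return True if x and y agree to within a relative tolerance."""
    return abs(x - y) <= rel_tol * max(abs(x), abs(y))

print(nearly_equal(a, b))    # True
```
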
Other surprises follow from this one. For example, if you try to round the value
2.675 to two decimal places, you get this ::

   >>> round(2.675, 2)
   2.67

The documentation for the built-in :func:`round` function says that it rounds
to the nearest value, rounding ties away from zero. Since the decimal fraction
2.675 is exactly halfway between 2.67 and 2.68, you might expect the result
here to be (a binary approximation to) 2.68. It's not, because when the
decimal literal ``2.675`` is converted to a binary floating-point number, it's
again replaced with a binary approximation, whose exact value is ::

   2.67499999999999982236431605997495353221893310546875

Since this approximation is slightly closer to 2.67 than to 2.68, it's rounded
down.

If you're in a situation where you care which way your decimal halfway-cases
are rounded, you should consider using the :mod:`decimal` module.
Incidentally, the :mod:`decimal` module also provides a nice way to "see" the
exact value that's stored in any particular Python float ::

   >>> from decimal import Decimal
   >>> Decimal(2.675)
   Decimal('2.67499999999999982236431605997495353221893310546875')

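As a sketch of how the :mod:`decimal` module can control halfway cases, the
example below builds the value from a string, so that it is exactly 2.675, and
then rounds it with an explicit rounding mode. The choice of
``ROUND_HALF_UP`` is an assumption made for this illustration; the module
offers other modes as well.

```python
from decimal import Decimal, ROUND_HALF_UP

# Constructing from a string avoids the binary approximation entirely.
value = Decimal("2.675")

# Round to two decimal places, sending exact ties away from zero.
rounded = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)    # 2.68
```
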
Another consequence is that since 0.1 is not exactly 1/10, summing ten values of
0.1 may not yield exactly 1.0, either::

   >>> total = 0.0
   >>> for i in range(10):
   ...     total += 0.1
   ...
   >>> total
   0.9999999999999999

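When the accumulated error of a long sum matters, :func:`math.fsum` may help:
it tracks the intermediate rounding errors and rounds only once at the end.
A brief sketch:

```python
import math

values = [0.1] * 10

print(sum(values) == 1.0)         # False: each addition rounds
print(math.fsum(values) == 1.0)   # True: only the final result is rounded
```

Note that the inputs are still binary approximations of 1/10;
:func:`math.fsum` only avoids the extra error introduced by the additions.
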
Binary floating-point arithmetic holds many surprises like this. The problem
with "0.1" is explained in precise detail below, in the "Representation Error"
section. See `The Perils of Floating Point <http://www.lahey.com/float.htm>`_
for a more complete account of other common surprises.

As that article says near the end, "there are no easy answers." Still, don't be
unduly wary of floating-point! The errors in Python float operations are
inherited from the floating-point hardware, and on most machines are on the
order of no more than 1 part in 2\*\*53 per operation. That's more than
adequate for most tasks, but you do need to keep in mind that it's not decimal
arithmetic, and that every float operation can suffer a new rounding error.

While pathological cases do exist, for most casual use of floating-point
arithmetic you'll see the result you expect in the end if you simply round the
display of your final results to the number of decimal digits you expect.
:func:`str` usually suffices, and for finer control see the :meth:`str.format`
method's format specifiers in :ref:`formatstrings`.

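For instance, rounding only the *display* of a result might look like the
following sketch. The format specifications ``.2f`` and ``.3f`` are arbitrary
choices for this illustration.

```python
value = 0.1 + 0.2

print(repr(value))                # 0.30000000000000004
print(format(value, ".2f"))       # 0.30
print("{0:.3f}".format(value))    # 0.300
```

The stored value is unchanged; only the string shown to the user is rounded.
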

.. _tut-fp-error:

Representation Error
====================

This section explains the "0.1" example in detail, and shows how you can perform
an exact analysis of cases like this yourself. Basic familiarity with binary
floating-point representation is assumed.

:dfn:`Representation error` refers to the fact that some (most, actually)
decimal fractions cannot be represented exactly as binary (base 2) fractions.
This is the chief reason why Python (or Perl, C, C++, Java, Fortran, and many
others) often won't display the exact decimal number you expect::

   >>> 0.1 + 0.2
   0.30000000000000004

Why is that? 1/10 and 2/10 are not exactly representable as binary
fractions. Almost all machines today (July 2010) use IEEE-754 floating point
arithmetic, and almost all platforms map Python floats to IEEE-754 "double
precision". 754 doubles contain 53 bits of precision, so on input the computer
strives to convert 0.1 to the closest fraction it can of the form *J*/2**\ *N*
where *J* is an integer containing exactly 53 bits. Rewriting ::

   1 / 10 ~= J / (2**N)

as ::

   J ~= 2**N / 10

and recalling that *J* has exactly 53 bits (is ``>= 2**52`` but ``< 2**53``),
the best value for *N* is 56::

   >>> 2**52
   4503599627370496L
   >>> 2**53
   9007199254740992L
   >>> 2**56/10
   7205759403792793L

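The search for *N* can also be done mechanically. The sketch below tries a
range of exponents (the bounds 40 and 70 are arbitrary choices for this
illustration) and keeps those for which the integer quotient falls in the
53-bit range. It uses ``//`` so that the division is an integer division on
both Python 2 and Python 3.

```python
# Collect every N for which 2**N // 10 has exactly 53 bits,
# that is, lies in the half-open range [2**52, 2**53).
candidates = []
for n in range(40, 70):
    j = 2**n // 10
    if 2**52 <= j < 2**53:
        candidates.append(n)

print(candidates)    # [56]
```
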
That is, 56 is the only value for *N* that leaves *J* with exactly 53 bits. The
best possible value for *J* is then that quotient rounded::

   >>> q, r = divmod(2**56, 10)
   >>> r
   6L

Since the remainder is more than half of 10, the best approximation is obtained
by rounding up::

   >>> q+1
   7205759403792794L

Therefore the best possible approximation to 1/10 in 754 double precision is
that value divided by 2\*\*56, or ::

   7205759403792794 / 72057594037927936

Note that since we rounded up, this is actually a little bit larger than 1/10;
if we had not rounded up, the quotient would have been a little bit smaller than
1/10. But in no case can it be *exactly* 1/10!

So the computer never "sees" 1/10: what it sees is the exact fraction given
above, the best 754 double approximation it can get::

   >>> .1 * 2**56
   7205759403792794.0

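The same check can be made with exact rational arithmetic, which also shows
the size of the error. This is a sketch using the :mod:`fractions` module and
assumes a platform whose floats are IEEE-754 doubles, as described above.

```python
from fractions import Fraction

approx = Fraction(7205759403792794, 2**56)

# Fraction(0.1) converts the stored float exactly, with no further rounding.
print(Fraction(0.1) == approx)     # True: this is the value actually stored
print(approx > Fraction(1, 10))    # True: slightly larger than 1/10

# The representation error, as an exact fraction.
error = approx - Fraction(1, 10)
print(error == Fraction(4, 10 * 2**56))    # True
```
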
If we multiply that fraction by 10\*\*30, we can see the (truncated) value of
its 30 most significant decimal digits::

   >>> 7205759403792794 * 10**30 / 2**56
   100000000000000005551115123125L

meaning that the exact number stored in the computer is approximately equal to
the decimal value 0.100000000000000005551115123125. In versions prior to
Python 2.7 and Python 3.1, Python rounded this value to 17 significant digits,
giving '0.10000000000000001'. In current versions, Python displays a value based
on the shortest decimal fraction that rounds correctly back to the true binary
value, resulting simply in '0.1'.
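
Both display styles can be reproduced explicitly. The sketch below assumes
Python 2.7, Python 3.1, or later, where :func:`repr` produces the shortest
string that converts back to the same float.

```python
x = 0.1

print("%.17g" % x)            # 0.10000000000000001 (17 significant digits)
print(repr(x))                # 0.1
print(float(repr(x)) == x)    # True: the short form round-trips exactly
```
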