MATLAB Language: Common mistakes and errors: Be aware of floating-point inaccuracy


Floating-point numbers cannot represent all real numbers. This is known as floating point inaccuracy.

There are infinitely many real numbers, and some of them have infinitely long expansions (e.g. π), so representing them all exactly would require an infinite amount of memory. To deal with this, the IEEE 754 standard defines a finite representation for storing "real" numbers in a computer. In short, it describes how computers store such numbers using a sign, an exponent and a mantissa:

floatnum = sign * 2^exponent * mantissa

With a limited number of bits for each of these, only finite precision can be achieved. The smaller the number, the smaller the gap between adjacent representable numbers (and vice versa!). You can try your real numbers in this online demo.
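The way the gap grows with magnitude can be inspected directly with the built-in eps function, which returns the distance from a number to the next larger double. The following is only a small sketch; the variable names are illustrative and not part of the original text.

```matlab
% Spacing between adjacent doubles at different magnitudes.
gapNearOne   = eps(1);      % equals 2^-52 for double precision
gapNearLarge = eps(1e10);   % larger numbers have a larger gap
gapNearSmall = eps(1e-10);  % smaller numbers have a smaller gap

fprintf('eps(1)     = %g\n', gapNearOne);
fprintf('eps(1e10)  = %g\n', gapNearLarge);
fprintf('eps(1e-10) = %g\n', gapNearSmall);

% The literal 0.1 is stored as the nearest representable double,
% which becomes visible when more digits are printed.
fprintf('%.20f\n', 0.1);
```

Any real number that falls inside one of these gaps is rounded to a neighboring double, which is where the inaccuracy comes from.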

Be aware of this behavior, and avoid testing floating-point numbers for exact equality or relying on such tests as stopping conditions in loops. The two examples below show what can go wrong:

Examples: Floating point comparison done WRONG:

>> 0.1 + 0.1 + 0.1  == 0.3

ans =

  logical

   0

Comparing floating-point numbers with == as in the preceding example is poor practice. Instead, take the absolute value of their difference and compare it to a (small) tolerance level.

Below is another example, where a floating-point number is used as the stopping condition in a while loop:

k = 0.1;
while k <= 0.3 
  disp(num2str(k)); % Show the value used in this iteration.
  k = k + 0.1;
end

% --- Output: ---
0.1
0.2

It misses the last expected iteration (0.3 <= 0.3), because the accumulated value of k ends up slightly greater than 0.3.
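One possible way to avoid the missed iteration is to drive the loop with an integer counter and derive the floating-point value from it, since small integers are represented exactly. This is only a sketch of that idea; numSteps and visited are illustrative names, not part of the original example.

```matlab
% Count with integers, then compute the floating-point value from the counter.
numSteps = 3;
visited = zeros(1, numSteps);   % record each value of k for inspection

for n = 1:numSteps
  k = n / 10;                   % rounded once per step, no accumulated error
  visited(n) = k;
  disp(num2str(k));
end
```

Here the loop body runs exactly numSteps times regardless of rounding, because the stopping condition no longer depends on a floating-point comparison.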

Example: Floating point comparison done RIGHT:

x = 0.1 + 0.1 + 0.1;
y = 0.3;
tolerance = 1e-10; % A "good enough" tolerance for this case.

if ( abs( x - y ) <= tolerance )
  disp('x == y');
else
  disp('x ~= y');
end

% --- Output: ---
x == y

Several things to note:

  • As expected, now x and y are treated as equivalent.
  • In the example above, the tolerance was chosen arbitrarily, so the value might not be suitable for all cases (especially when working with much smaller numbers). The bound can be chosen more intelligently with the eps function, i.e. N*eps(max(x,y)), where N is some problem-specific number. A reasonable choice for N, which is also permissive enough, is 1E2 (even though N=1 would also suffice in the problem above).
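The eps-based bound from the last point could be sketched as follows. The helper isClose is a hypothetical name introduced here for illustration, and it uses max(abs(a), abs(b)) rather than max(x,y) on the assumption that negative inputs should be handled by magnitude.

```matlab
% Hypothetical helper: tolerance scaled to the magnitude of the operands.
isClose = @(a, b, N) abs(a - b) <= N * eps(max(abs(a), abs(b)));

x = 0.1 + 0.1 + 0.1;
y = 0.3;
sameNearOne = isClose(x, y, 1e2);        % true: x and y differ only by rounding

% With much smaller numbers, a fixed tolerance can be too permissive.
p = 1e-12;
q = 2e-12;
fixedSaysEqual  = abs(p - q) <= 1e-10;   % true, although q is twice p
scaledSaysEqual = isClose(p, q, 1e2);    % false: the bound shrinks with p and q
```

Scaling the tolerance this way keeps the comparison meaningful across magnitudes, while N still has to be chosen to suit the problem.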

Further reading:

See these questions for more information about floating point inaccuracy: