Representation error

Published

2023-08-01

Representation error of numeric types

Representation error occurs when we try to represent, using a finite number of bits or digits, a number that cannot be represented exactly in the system chosen. For example, in our familiar decimal system:

number   decimal representation   representation error
1        1                        0
1/3      0.3333333333333333       0.0000000000000000333\ldots
1/7      0.1428571428571428       0.0000000000000000571428\ldots
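The errors in this table can be computed exactly with Python's fractions module. Here's a quick sketch; the 16-digit approximations are written as Fraction strings:

```python
from fractions import Fraction

# Compare each fraction with its 16-digit decimal approximation and
# compute the representation error exactly with rational arithmetic.
for numerator, denominator, approx in [
    (1, 1, Fraction("1")),
    (1, 3, Fraction("0.3333333333333333")),
    (1, 7, Fraction("0.1428571428571428")),
]:
    exact = Fraction(numerator, denominator)
    error = exact - approx
    print(f"{numerator}/{denominator}: error = {float(error):.20f}")
```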

Natural numbers, integers, rational numbers, and real numbers

You probably know that the set of all natural numbers

\mathbb{N} = \{0, 1, 2, 3, \ldots\}

is infinite.

From there it’s not a great leap to see that the set of all integers

\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}

is infinite too.

The set of all rational numbers, \mathbb{Q}, and the set of all real numbers, \mathbb{R}, are also infinite.

This fact—that these sets are of infinite size—has implications for numeric representation and numeric calculations on computers.

When we work with computers, numbers are given integer or floating-point representations. For example, in Python, we have distinct types, int and float, for holding integer and floating-point numbers, respectively.

I won’t get into too much detail about how these are represented in binary, but here’s a little bit of information.

Integers

Representation of integers is relatively straightforward. Integers are represented as binary numbers, with one bit set aside for the sign. So, 12345 would be represented in 16 bits as 00110000 00111001. That’s

0 \times 2^{15} + 0 \times 2^{14} + 1 \times 2^{13} + 1 \times 2^{12}

+ \; 0 \times 2^{11} + 0 \times 2^{10} + 0 \times 2^{9} + 0 \times 2^{8}

+ \; 0 \times 2^{7} + 0 \times 2^{6} + 1 \times 2^{5} + 1 \times 2^{4}

+ \; 1 \times 2^{3} + 0 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0}

This works out to

8192 + 4096 + 32 + 16 + 8 + 1 = 12345.
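We can check this expansion in Python; formatting with `016b` is just one way to get a 16-bit pattern:

```python
# A quick check of the binary expansion above, assuming a 16-bit pattern.
n = 12345
bits = f"{n:016b}"
print(bits)   # 0011000000111001

# Sum the powers of two whose bits are set; this should recover n.
total = sum(2**power for power, bit in enumerate(reversed(bits)) if bit == "1")
print(total)  # 12345
assert total == 8192 + 4096 + 32 + 16 + 8 + 1 == 12345
```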

Negative integers are a little different. If you’re curious about this, see the Wikipedia article on Two’s Complement.
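Here’s a small sketch of the idea, assuming a 16-bit width: masking a negative Python integer with 0xFFFF yields its two’s-complement bit pattern.

```python
# Sketch of two's complement, assuming a 16-bit width. Masking with
# 0xFFFF keeps the low 16 bits, which for a negative Python int is
# exactly its 16-bit two's-complement representation.
def twos_complement_16(n):
    return n & 0xFFFF

print(f"{twos_complement_16(-1):016b}")      # 1111111111111111
print(f"{twos_complement_16(-12345):016b}")  # invert the bits of 12345, add 1
```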

Floating-point numbers and IEEE 754

Floating-point numbers are a little tricky. Stop and think for a minute: How would you represent floating-point numbers? (It’s not as straightforward as you might think.)

Floating-point numbers are represented using the IEEE 754 standard (IEEE stands for “Institute of Electrical and Electronics Engineers”).1 There are three parts to this representation: the sign, the exponent, and the fraction (also called the mantissa or significand), all of which must be expressed in binary. IEEE 754 uses either 32 or 64 bits for representing floating-point numbers; Python’s float uses 64. The issue with representation lies in the fact that there’s a fixed number of bits available: one bit for the sign of a number, eleven bits for the exponent (eight in the 32-bit format), and the rest for the fractional portion. With a finite number of bits, there’s a finite number of values that can be represented without error.
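If you’re curious, you can pull a Python float apart with the standard struct module. This sketch assumes the common 64-bit (binary64) layout described above: one sign bit, eleven exponent bits, and fifty-two fraction bits.

```python
import struct

# Decompose a Python float into its IEEE 754 binary64 fields:
# 1 sign bit, 11 exponent bits, 52 fraction bits.
def decompose(x):
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

sign, exponent, fraction = decompose(0.1)
print(sign, exponent, fraction)

# For normal numbers, the value is
# (-1)**sign * (1 + fraction / 2**52) * 2**(exponent - 1023):
print((-1) ** sign * (1 + fraction / 2**52) * 2 ** (exponent - 1023))  # 0.1
```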

Examples of representation error

Now we have some idea of how integers and floating-point numbers are represented in a computer. Consider this: We have some fixed number of bits set aside for these representations.2 So we have a limited number of bits we can use to represent numbers on a computer. Do you see the problem?

The set of all integers, \mathbb{Z}, is infinite. The set of all real numbers, \mathbb{R}, is infinite. Any computer we can manufacture is finite. Now do you see the problem?

There exist infinitely more integers, and infinitely more real numbers, than we can represent in any system with a fixed number of bits. Let that soak in.

For any given machine or finite representation scheme, there are infinitely many numbers that cannot be represented in that system! This means that many numbers are represented by an approximation only.

Let’s return to the example of 1/3 in our decimal system. We can never write down enough digits to the right of the decimal point so that we have the exact value of 1/3.

0.333333333333333333333\ldots

No matter how far we extend this expansion, the value will only be an approximation of 1/3. However, the fact that its decimal expansion is non-terminating is determined by the choice of base (10).

What if we were to represent this in base 3? There, one third is simply 0.1, which is easy to represent exactly!
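One way to see this is to expand a fraction digit by digit in any base. The helper below, digits_base, is hypothetical (not a library function), but the arithmetic is the standard repeated-multiplication method:

```python
from fractions import Fraction

# Expand a fraction in an arbitrary base by repeated multiplication:
# each round, the integer part of (value * base) is the next digit.
def digits_base(value, base, n_digits):
    digits = []
    for _ in range(n_digits):
        value *= base
        digit = int(value)
        digits.append(str(digit))
        value -= digit
    return "0." + "".join(digits)

print(digits_base(Fraction(1, 3), 3, 6))   # 0.100000  (terminates in base 3)
print(digits_base(Fraction(1, 3), 10, 6))  # 0.333333  (repeats in base 10)
```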

Of course our computers use binary, and so in that system (base 2) there are some numbers that can be represented accurately, and an infinite number that can only be approximated.

Here’s the canonical example, in Python:

>>> 0.1
0.1
>>> 0.2
0.2
>>> 0.1 + 0.2
0.30000000000000004

Wait! What? Yup, something strange is going on: Python rounds values when displaying them in the shell. Here’s proof:

>>> print(f'{0.1:.56f}')
0.10000000000000000555111512312578270211815834045410156250
>>> print(f'{0.2:.56f}')
0.20000000000000001110223024625156540423631668090820312500
>>> print(f'{0.1 + 0.2:.56f}')
0.30000000000000004440892098500626161694526672363281250000

The last, 0.1 + 0.2, is an example of representation error accumulating to the point that it is no longer hidden by Python’s automatic rounding, hence

>>> 0.1 + 0.2
0.30000000000000004

Remember, in binary we work only with powers of two. So there’s no way to represent these numbers exactly with a fixed number of bits.
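The practical consequence: don’t compare floats with ==. The standard library’s math.isclose provides a tolerance-based comparison instead:

```python
import math

# Because 0.1 + 0.2 is not exactly 0.3, compare floats with a
# tolerance rather than with ==.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```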

What’s the point?

  1. The subset of real numbers that can be accurately represented within a given positional system depends on the base chosen (1/3 cannot be represented without error in the decimal system, but it can be in base 3).
  2. It’s important that we understand that no finite machine can represent all real numbers without error.
  3. Most numbers that we provide to the computer and which the computer provides to us in the form of answers are only approximations.
  4. Perhaps most important from a practical standpoint, representation error can accumulate with repeated calculations.
  5. Understanding representation error can prevent you from chasing bugs when none exist.
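Point 4 is easy to demonstrate: summing 0.1 repeatedly lets representation error accumulate, while math.fsum tracks the accumulated error and corrects for it.

```python
import math

# Representation error accumulates: adding 0.1 a thousand times does
# not give exactly 100.0.
total = 0.0
for _ in range(1000):
    total += 0.1
print(total)                    # not exactly 100.0

# math.fsum tracks intermediate error and returns an accurate sum.
print(math.fsum([0.1] * 1000))  # 100.0
```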


Original author: Clayton Cafiero < [given name] DOT [surname] AT uvm DOT edu >

No generative AI was used in producing this material. This was written the old-fashioned way.

This material is for free use under either the GNU Free Documentation License or the Creative Commons Attribution-ShareAlike 3.0 United States License (take your pick).

Footnotes

  1. For more, see: https://en.wikipedia.org/wiki/IEEE_754

  2. That’s not entirely true for integers in Python, but it’s reasonable to think of it this way for the purpose at hand.