The standard defines five basic formats that are
named for their numeric base and the number of bits used in their
interchange encoding. There are three binary floating-point basic
formats (encoded with 32, 64 or 128 bits) and two decimal
floating-point basic formats (encoded with 64 or 128 bits).
The precision of each basic binary format
is one bit more than the width of its stored significand field. The extra bit of
precision comes from an implied (hidden) leading 1 bit. A floating-point
number is typically normalized so that the most significant bit of its
significand is a one. If the leading bit is known to be one, then it need not
be encoded in the interchange format.
|Name||Common name||Base||Digits||E min||E max||Decimal
Decimal digits is digits × log10(base); this gives the approximate precision in decimal.
Decimal E max is Emax × log10(base); this gives the approximate maximum exponent in decimal.
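The two conversions above can be sketched in a few lines of Python. This is an illustration only: the function name `decimal_equivalents` and the dictionary `BASIC_FORMATS` are invented for the example, and the (base, digits, Emax) triples are the values quoted elsewhere in this article for the five basic formats.

```python
import math

def decimal_equivalents(base, digits, emax):
    """Return (approximate decimal digits, approximate decimal E max)."""
    scale = math.log10(base)  # log10 of the base; exactly 1.0 for base 10
    return digits * scale, emax * scale

# (base, digits of precision, Emax) for the five basic formats.
BASIC_FORMATS = {
    "binary32": (2, 24, 127),
    "binary64": (2, 53, 1023),
    "binary128": (2, 113, 16383),
    "decimal64": (10, 16, 384),
    "decimal128": (10, 34, 6144),
}

for name, params in BASIC_FORMATS.items():
    dec_digits, dec_emax = decimal_equivalents(*params)
    print(f"{name:>10}: {dec_digits:6.2f} decimal digits, decimal E max {dec_emax:8.2f}")
```

For the decimal formats the conversion factor is 1, so the decimal figures equal the native ones; for the binary formats they are approximations only.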
Floating-point numbers are typically packed into
a computer datum as the sign bit, the exponent field, and the
significand (mantissa), from left to right.
For the IEEE 754 binary formats (basic and extended) that have extant hardware implementations, the bits are apportioned as follows:
||Sign||Exponent||Significand||Total bits||Exponent bias||Bits precision||Number of expressible
|Half (IEEE 754-2008)||2
While the exponent can be positive or negative, in binary formats it is stored as an unsigned number that has a fixed "bias" added to it. Values of all 0s in this field are reserved for the zeros and subnormal numbers; values of all 1s are reserved for the infinities and NaNs. The exponent range for normalized numbers is [−126, 127] for single precision, [−1022, 1023] for double, or [−16382, 16383] for quad. Normalized numbers exclude subnormal values, zeros, infinities, and NaNs.
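As a rough illustration of the biased exponent and the reserved field values, the following Python sketch splits a single-precision value into its fields and classifies it. The function name `classify_binary32` is hypothetical, and the field widths (1 sign bit, 8 exponent bits, 23 stored significand bits) and the bias of 127 are assumed from the usual single-precision layout rather than taken from the text above.

```python
import struct

def classify_binary32(x):
    """Return (sign, unbiased exponent or None, kind) for x rounded to binary32."""
    # Reinterpret the 4-byte single-precision encoding as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF   # stored (biased) exponent, assumed 8 bits
    frac = bits & 0x7FFFFF            # stored significand bits, assumed 23 bits
    bias = 127                        # assumed single-precision bias

    if exp_field == 0:                # all 0s: zeros and subnormals
        kind = "zero" if frac == 0 else "subnormal"
    elif exp_field == 0xFF:           # all 1s: infinities and NaNs
        kind = "infinity" if frac == 0 else "NaN"
    else:
        kind = "normal"

    exponent = exp_field - bias if kind == "normal" else None
    return sign, exponent, kind

for value in (1.0, -0.0, 2.0**-126, 2.0**-127, float("inf"), float("nan")):
    print(value, classify_binary32(value))
```

Under these assumptions the smallest normal exponent comes out as −126 and the largest as 127, matching the range quoted for single precision.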
In the IEEE binary interchange formats the
leading 1 bit of a normalized significand is not actually stored in
the computer datum. It is called the "hidden" or "implicit" bit.
Because of this, single precision format actually has a significand
with 24 bits of precision, double precision format has 53, and quad
precision format has 113.
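One way to observe the effect of the hidden bit is to check which integers survive a round trip through a given format. The sketch below is only an illustration: the helper names `round_trip_binary32` and `fits_in_binary32` are made up for the example, and it assumes that Python's `float` is an IEEE 754 double, which holds on common platforms but is not guaranteed by the language.

```python
import struct

def round_trip_binary32(x):
    """Round x to the nearest single-precision value and return it as a float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fits_in_binary32(n):
    """True if the integer n survives a round trip through single precision."""
    return round_trip_binary32(float(n)) == n

# 2**24 - 1 needs 24 significant bits, 2**24 + 1 needs 25.
print(fits_in_binary32(2**24 - 1))
print(fits_in_binary32(2**24 + 1))

# The same experiment for double precision: 53 bits fit, 54 do not.
print(float(2**53 - 1) == 2**53 - 1)
print(float(2**53 + 1) == 2**53 + 1)
```

The 24-bit and 53-bit integers are preserved exactly, while the values that need one more bit are rounded to a neighbouring representable number.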
Multiple forms of floating-point representation
are possible, and IEEE 754-2008 permits both the "Exponent +
Significand" form and the "Decimal Representation" number encoding.
Like the binary floating-point formats, the number is divided into a sign, an exponent, and a significand. Unlike binary floating-point, numbers are not necessarily normalized; values with few significant digits have multiple possible representations: 1×10^2 = 0.1×10^3 = 0.01×10^4, etc.
When the significand is zero, the exponent can be
any value at all.
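Python's `decimal` module can serve as a rough illustration of such multiple representations. It implements the General Decimal Arithmetic model rather than a fixed-width IEEE 754 decimal format, and it stores the coefficient as an integer, so the equivalent forms show up as 1E2, 10E1, and 100 rather than with fractional significands. The variable names below are invented for the example.

```python
from decimal import Decimal

# Three representations of the same value, differing in coefficient and exponent.
cohort = [Decimal("1E2"), Decimal("10E1"), Decimal("100")]

for d in cohort:
    t = d.as_tuple()
    print(f"{str(d):>6}: digits={t.digits}, exponent={t.exponent}")

# Numerically they compare equal even though the stored forms differ.
print(cohort[0] == cohort[1] == cohort[2])

# A zero coefficient can carry any exponent.
zeros = [Decimal("0E5"), Decimal("0E-3")]
print(zeros[0] == zeros[1], [z.as_tuple().exponent for z in zeros])
```

Comparison treats all members as the same number, while `as_tuple()` exposes the distinct coefficient and exponent pairs.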
|decimal32||decimal64||decimal128||decimal(32k)||Format|
|1||1||1||1||Sign field (bits)|
|5||5||5||5||Combination field (bits)|
|6||8||12||w = 2×k + 4||Exponent continuation field (bits)|
|20||50||110||t = 30×k − 10||Coefficient continuation field (bits)|
|32||64||128||32×k||Total size (bits)|
|7||16||34||p = 3×t/10 + 1 = 9×k − 2||Coefficient size (decimal digits)|
|192||768||12288||3×2^w = 48×4^k||Exponent range|
|96||384||6144||Emax = 3×2^(w−1)||Largest value is 9.99...×10^Emax|
|−95||−383||−6143||Emin = 1 − Emax||Smallest normalized value is 1.00...×10^Emin|
|−101||−398||−6176||Etiny = 2 − p − Emax||Smallest non-zero value is 1×10^Etiny|
The exponent ranges were chosen so that the range available to normalized values is approximately symmetrical. Since this cannot be done exactly with an even number of possible exponent values, the extra value was given to Emax.
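The general-case formulas in the table can be checked against the listed values with a short Python sketch. The function name `decimal_interchange_parameters` and the dictionary keys are illustrative choices; the formulas themselves are the ones given in the table, with k being the total width divided by 32.

```python
def decimal_interchange_parameters(k):
    """Derive the table's parameters for a decimal format of 32*k bits."""
    w = 2 * k + 4              # exponent continuation field (bits)
    t = 30 * k - 10            # coefficient continuation field (bits)
    p = 3 * t // 10 + 1        # coefficient size (decimal digits), equals 9*k - 2
    emax = 3 * 2 ** (w - 1)
    return {
        "total_bits": 1 + 5 + w + t,   # sign + combination + continuation fields
        "w": w,
        "t": t,
        "p": p,
        "exponent_range": 3 * 2 ** w,
        "emax": emax,
        "emin": 1 - emax,
        "etiny": 2 - p - emax,
    }

for k in (1, 2, 4):  # 32-, 64- and 128-bit formats
    print(32 * k, decimal_interchange_parameters(k))
```

The number of exponent values (3×2^w) is even, which is why the split around zero cannot be exactly symmetrical.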
Two different representations are defined:
Both alternatives provide exactly the same range of representable values.
The most significant two bits of the exponent are
limited to the range of 0–2, and the most significant 4 bits of the
significand are limited to the range of 0–9. The 30 possible
combinations are encoded in a 5-bit field, along with special forms
for infinity and NaN.
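A possible decoding of the 5-bit combination field is sketched below in Python. The exact bit layout is not given in the text; the sketch assumes the layout commonly described for the densely packed decimal encoding, in which a leading digit of 8 or 9 is signalled by the two top bits being 11. The function name `decode_combination_field` and the returned tuple shape are hypothetical.

```python
def decode_combination_field(g):
    """Decode a 5-bit combination field into (kind, exponent MSBs, leading digit).

    Assumed layout (not stated in the article):
      ab cde -> exponent MSBs = ab, leading digit = cde (0-7), when ab != 11
      11 cd e -> exponent MSBs = cd, leading digit = 8 + e,    when cd != 11
      11110  -> infinity
      11111  -> NaN
    """
    if not 0 <= g <= 0b11111:
        raise ValueError("combination field must fit in 5 bits")
    if g == 0b11110:
        return ("infinity", None, None)
    if g == 0b11111:
        return ("NaN", None, None)
    if g >> 3 != 0b11:
        return ("finite", g >> 3, g & 0b111)          # digits 0-7
    return ("finite", (g >> 1) & 0b11, 8 + (g & 1))   # digits 8-9

finite = [decode_combination_field(g) for g in range(32)]
finite = [f for f in finite if f[0] == "finite"]
print(len(finite))  # number of finite (exponent MSBs, leading digit) pairs
```

Under this assumed layout, the 30 finite codes cover every pairing of exponent bits 0–2 with a leading digit 0–9 exactly once, and the two remaining codes are the special forms.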
An extended precision format extends a basic format by using more precision and more exponent range. An extendable precision format allows the user to specify the precision and exponent range. An implementation may use whatever internal representation it chooses for such formats; all that needs to be defined is its set of parameters (b, p, and emax). These parameters uniquely describe the set of finite numbers (combinations of sign, significand, and exponent for the given radix) that it can represent.
The standard does not require an implementation to support extended or extendable precision formats.
For an extended format with a precision between two basic formats, the exponent range must be as great as that of the next wider basic format. So, for instance, a 64-bit extended precision binary number must have an 'emax' of at least 16383. The x87 80-bit extended format meets this requirement.
Interchange formats are intended for the exchange of floating-point data using a fixed-length bit-string for a given format.
For the exchange of binary floating-point
numbers, interchange formats of length 16 bits, 32 bits, 64 bits, and
any multiple of 32 bits ≥128 are defined.
The 16-bit format is intended for the exchange or storage of small numbers (e.g., for graphics).
For the exchange of decimal floating-point numbers, interchange formats of any multiple of 32 bits are defined.
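For the three narrowest binary interchange widths, the fixed-length bit strings can be inspected with Python's `struct` module, whose `e`, `f`, and `d` format codes pack 16-, 32-, and 64-bit IEEE 754 binary values. This is a minimal sketch; the names `FORMATS` and `to_interchange_bits` are invented for the example, and the wider binary formats and the decimal formats are not covered by `struct`.

```python
import struct

# struct format codes for the binary interchange widths that struct supports.
FORMATS = {"binary16": ">e", "binary32": ">f", "binary64": ">d"}

def to_interchange_bits(x, name):
    """Return (width in bits, big-endian hex string) of x encoded in the named format."""
    raw = struct.pack(FORMATS[name], x)
    return len(raw) * 8, raw.hex()

for name in FORMATS:
    print(name, to_interchange_bits(1.5, name))
```

Each format always yields the same number of bits regardless of the value, which is what makes the encodings suitable for exchange between systems.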
The encoding scheme for the decimal interchange formats similarly encodes the sign, exponent, and significand, but the scheme uses a more complex approach to allow the significand to be encoded as a compressed sequence of decimal digits (using densely packed decimal) or as a binary integer. In either case the set of numbers (combinations of sign, significand, and exponent) that may be encoded is identical, and signalling NaNs have a unique encoding (and the same set of possible payloads).