The standard defines five basic formats that are
named for their numeric base and the number of bits used in their
interchange encoding. There are three binary floating-point basic
formats (encoded with 32, 64 or 128 bits) and two decimal
floating-point basic formats (encoded with 64 or 128 bits).
The binary32 and binary64 formats are the single and double formats of IEEE 754-1985. A conforming implementation must fully implement at least one of the basic formats.
The precision of each basic binary format
is one bit more than the width of its stored significand field. The extra bit of
precision comes from an implied (hidden) leading 1 bit. A typical
floating-point number is normalized so that the most significant bit
of its significand is a one. Because that leading bit is known to be one, it need not
be encoded in the interchange format.
Name  Common name  Base  Digits  E min  E max  Decimal digits  Decimal E max  Notes
binary16  Half precision  2  10+1  −14  +15  3.31  4.51  IEEE
binary24  Float24  2  -  -  -  -  -  Proposed
binary32  Single precision  2  23+1  −126  +127  7.22  38.23  IEEE
binary40  Float40  2  -  -  -  -  -  Proposed
binary64  Double precision  2  52+1  −1022  +1023  15.95  307.95  IEEE
binary80  Float80  2  -  -  -  -  -  Proposed
binary128  Quadruple precision  2  112+1  −16382  +16383  34.02  4931.77  IEEE
Decimal digits is digits × log_{10} base, which gives
an approximate precision in decimal.
Decimal E max is Emax × log_{10} base, which gives the maximum exponent in decimal.
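The two conversions can be checked numerically. The following is a minimal sketch; the function name `decimal_equivalents` and the `formats` dictionary are illustrative only, and the digit counts include the hidden bit (for example 23+1 = 24 for binary32).

```python
import math

def decimal_equivalents(digits, emax, base=2):
    """Return (decimal digits, decimal E max) for a format.

    digits: precision in base-`base` digits, hidden bit included
    emax:   maximum exponent in base `base`
    """
    scale = math.log10(base)
    return digits * scale, emax * scale

# (precision, E max) pairs taken from the table of binary formats
formats = {
    "binary32": (24, 127),
    "binary64": (53, 1023),
    "binary128": (113, 16383),
}

for name, (digits, emax) in formats.items():
    dec_digits, dec_emax = decimal_equivalents(digits, emax)
    print(f"{name}: {dec_digits:.2f} decimal digits, decimal E max {dec_emax:.2f}")
```

Rounded to two places, the results for binary32 and binary64 agree with the tabulated values.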
Floating-point numbers are typically packed into
a computer datum as the sign bit, the exponent field, and the
significand (mantissa), from left to right.
For the IEEE 754 binary formats (basic and extended) that have extant
hardware implementations, the bits are apportioned as follows:
Type (common name)  Total bytes  Sign bits  Exponent bits  Significand bits  Total bits (bytes × 8)  Exponent bias  Bits precision  Number of expressible decimal digits  Official status  Notes
Half (IEEE 754-2008)  2  1  5  10  16  15  11  ~3.3  IEEE754  Experimental
Float24  3  1  5  18  24  15  19  -  Proposed  -
Single  4  1  8  23  32  127  24  ~7.2  IEEE754  -
Float40  5  1  10  29  40  511 (provisional)  -  -  Proposed  -
Double  8  1  11  52  64  1023  53  ~15.9  IEEE754  Also IBM
Float72  9  1  11  61  72  1023 (provisional)  -  -  Proposed  -
Double extended  10  1  15  64  80  16383  64  ~19.2  IEEE754  Also IBM
Quad  16  1  15  112  128  16383  113  ~34.0  IEEE754  Also IBM
While the exponent can be positive or negative, in binary formats it
is stored as an unsigned number that has a fixed "bias" added to it.
A field of all 0s is reserved for the zeros and subnormal
numbers; a field of all 1s is reserved for the infinities and
NaNs. The exponent range for normalized numbers is [−126, 127] for
single precision, [−1022, 1023] for double, or [−16382, 16383] for
quad. Normalized numbers exclude subnormal values, zeros, infinities,
and NaNs.
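The field layout and the reserved exponent values can be inspected directly. This is a sketch that assumes Python's `float` is an IEEE 754 binary64 value, which holds on mainstream platforms; the helper names `decode_binary64` and `classify` are hypothetical.

```python
import struct

BIAS = 1023  # exponent bias of binary64

def decode_binary64(x):
    """Split a float into (sign, biased exponent, stored significand bits)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF   # 11-bit exponent field
    fraction = bits & ((1 << 52) - 1)        # 52-bit significand field
    return sign, biased_exponent, fraction

def classify(x):
    """Name the class of value selected by the exponent field."""
    _, e, f = decode_binary64(x)
    if e == 0:                               # all 0s: zero or subnormal
        return "zero" if f == 0 else "subnormal"
    if e == 0x7FF:                           # all 1s: infinity or NaN
        return "infinity" if f == 0 else "NaN"
    return "normal"

for value in (1.0, -2.0, 0.0, 5e-324, float("inf")):
    sign, e, f = decode_binary64(value)
    print(value, classify(value), "sign", sign, "unbiased exponent", e - BIAS)
```

For normal numbers, subtracting the bias from the stored field recovers an exponent in [−1022, 1023]; the printed "unbiased exponent" is not meaningful for the reserved field values.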
In the IEEE binary interchange formats the
leading 1 bit of a normalized significand is not actually stored in
the computer datum. It is called the "hidden" or "implicit" bit.
Because of this, single precision format actually has a significand
with 24 bits of precision, double precision format has 53, and quad
has 113.
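The effect of the hidden bit can be seen by rebuilding a value from its stored fields. The sketch below handles normal binary64 numbers only and again assumes Python's `float` is binary64; `reconstruct` is a hypothetical helper, and exact rational arithmetic is used so that no rounding hides the result.

```python
import struct
import sys
from fractions import Fraction

def reconstruct(x):
    """Rebuild a normal binary64 value exactly, restoring the hidden 1 bit."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = -1 if bits >> 63 else 1
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the bias
    stored = bits & ((1 << 52) - 1)            # the 52 bits actually stored
    significand = (1 << 52) | stored           # 53 bits with the hidden 1
    return sign * Fraction(significand, 1 << 52) * Fraction(2) ** exponent

# The reported precision counts the hidden bit as well.
precision = sys.float_info.mant_dig

# 2**precision + 1 needs one more bit than is available, so it is rounded.
rounded = float(2**precision + 1)

print(precision, reconstruct(0.1) == Fraction(0.1), rounded == 2**precision + 1)
```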
Multiple forms of floating-point representation
are possible, and IEEE 754-2008 permits both the "Exponent +
Significand" form and the "Decimal Representation" number encoding
forms.
Like the binary floating-point formats, a decimal floating-point number is divided into a
sign, an exponent, and a significand. Unlike binary floating-point,
numbers are not necessarily normalized; values with few significant
digits have
multiple possible representations: 1×10^{2}=0.1×10^{3}=0.01×10^{4},
etc.
When the significand is zero, the exponent can be
any value at all.
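Python's `decimal` module can illustrate these multiple representations. It follows the General Decimal Arithmetic specification rather than being a bit-level IEEE 754 implementation, and it stores an integer coefficient, so the sketch below writes the equal values as 100×10^{0}, 10×10^{1} and 1×10^{2}; treat it as an analogy under those assumptions.

```python
from decimal import Decimal

# Three representations of the same value with different exponents.
a = Decimal("100")    # coefficient 100, exponent 0
b = Decimal("10E1")   # coefficient 10,  exponent 1
c = Decimal("1E2")    # coefficient 1,   exponent 2

same_value = (a == b == c)
distinct_forms = {d.as_tuple() for d in (a, b, c)}

# A zero coefficient may be paired with any exponent.
zeros = [Decimal("0E-5"), Decimal("0"), Decimal("0E7")]

print(same_value, len(distinct_forms))
for z in zeros:
    print(z.as_tuple())
```

The three values compare equal, yet `as_tuple()` shows three different (coefficient, exponent) pairs.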
decimal32  decimal64  decimal128  decimal(32k)  Format 

1  1  1  1  Sign field (bits) 
5  5  5  5  Combination field (bits) 
6  8  12  w = 2×k + 4  Exponent continuation field (bits) 
20  50  110  t = 30×k−10  Coefficient continuation field (bits) 
32  64  128  32×k  Total size (bits) 
7  16  34  p = 3×t/10+1 = 9×k−2  Coefficient size (decimal digits) 
192  768  12288  3×2^{w} = 48×4^{k}  Exponent range 
96  384  6144  Emax = 3×2^{w−1}  Largest value is 9.99...×10^{Emax} 
−95  −383  −6143  Emin = 1−Emax  Smallest normalized value is 1.00...×10^{Emin} 
−101  −398  −6176  Etiny = 2−p−Emax  Smallest nonzero value is 1×10^{Etiny} 
The exponent ranges were chosen so that the range available to
normalized values is approximately symmetrical. Since this cannot be
done exactly with an even number of possible exponent values, the
extra value was given to Emax.
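The formulas in the decimal(32k) column can be evaluated to reproduce the other three columns. A minimal sketch follows; the function name `decimal_params` and the dictionary keys are illustrative.

```python
def decimal_params(k):
    """Evaluate the decimal(32k) formulas from the table for a given k."""
    w = 2 * k + 4                 # exponent continuation field (bits)
    t = 30 * k - 10               # coefficient continuation field (bits)
    p = 3 * t // 10 + 1           # coefficient size (decimal digits)
    emax = 3 * 2 ** (w - 1)
    emin = 1 - emax
    etiny = 2 - p - emax
    return {
        "total_bits": 1 + 5 + w + t,   # sign + combination + continuation fields
        "w": w,
        "t": t,
        "p": p,
        "exponent_range": 3 * 2 ** w,
        "emax": emax,
        "emin": emin,
        "etiny": etiny,
    }

for k in (1, 2, 4):               # decimal32, decimal64, decimal128
    print(f"decimal{32 * k}:", decimal_params(k))
```

Because Emin = 1 − Emax, there is one more non-negative exponent than negative exponent, which is the extra value given to Emax.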
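Two different representations of the significand are defined: one encodes it as a compressed sequence of decimal digits (densely packed decimal), the other as a binary integer.
Both alternatives provide exactly the same range of representable values.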
The most significant two bits of the exponent are
limited to the range 0 to 2, and the most significant 4 bits of the
significand are limited to the range 0 to 9. The 30 possible
combinations are encoded in a 5-bit field, along with special forms
for infinity and NaN.
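The counting argument (3 × 10 = 30 finite combinations plus special forms, fitting in 2^{5} = 32 codes) can be made concrete. The bit assignment below is the one commonly described for the densely packed decimal encoding; it is not spelled out in this text, so treat the exact layout as an assumption. `decode_combination` is a hypothetical helper.

```python
def decode_combination(field):
    """Decode a 5-bit combination field.

    Returns ("finite", exponent_msbs, leading_digit), ("infinity",) or ("nan",).
    """
    if not 0 <= field < 32:
        raise ValueError("combination field is 5 bits wide")
    if field >> 3 != 0b11:
        # Pattern ab cde: exponent MSBs are ab (0 to 2), leading digit is 0cde (0 to 7).
        return ("finite", field >> 3, field & 0b111)
    if (field >> 1) & 0b11 != 0b11:
        # Pattern 11 cd e: exponent MSBs are cd (0 to 2), leading digit is 100e (8 or 9).
        return ("finite", (field >> 1) & 0b11, 0b1000 | (field & 1))
    # Patterns 11110 and 11111 are the special forms.
    return ("infinity",) if field == 0b11110 else ("nan",)

decoded = [decode_combination(f) for f in range(32)]
finite = {d for d in decoded if d[0] == "finite"}
print(len(finite), "finite combinations out of", len(decoded), "codes")
```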
The standard specifies extended and extendable
precision formats, which are recommended for allowing a greater
precision than that provided by the basic formats.
An extended precision format extends a basic format by using more precision and more exponent range. An extendable precision format allows the user to specify the precision and exponent range. An implementation may use whatever internal representation it chooses for such formats; all that needs to be defined are its parameters (b, p, and emax). These parameters uniquely describe the set of finite numbers (combinations of sign, significand, and exponent for the given radix) that it can represent.
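To illustrate how the three parameters pin down the finite numbers, the sketch below derives the extreme finite magnitudes from (b, p, emax) alone. It assumes emin = 1 − emax, as in the tables of this article, and the function name `finite_range` is hypothetical. Exact rational arithmetic avoids overflow and rounding.

```python
import sys
from fractions import Fraction

def finite_range(b, p, emax):
    """Return (largest finite, smallest normal, smallest subnormal) magnitudes."""
    emin = 1 - emax                                   # assumed relation
    # p digits all equal to b-1, scaled by b**emax
    largest = (b - Fraction(b) ** (1 - p)) * Fraction(b) ** emax
    smallest_normal = Fraction(b) ** emin
    # lowest digit position of a number with exponent emin
    smallest_subnormal = Fraction(b) ** (emin - p + 1)
    return largest, smallest_normal, smallest_subnormal

# binary64: b = 2, p = 53, emax = 1023
largest, smallest_normal, smallest_subnormal = finite_range(2, 53, 1023)
print(largest == Fraction(sys.float_info.max))
print(smallest_normal == Fraction(sys.float_info.min))
```

The same function applied to decimal64 (b = 10, p = 16, emax = 384) gives a smallest nonzero magnitude of 10^{−398}, matching Etiny in the decimal table.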
The standard does not require an implementation to support extended or extendable precision formats.
The standard recommends that languages provide a method of specifying p and emax for each supported base b.
The standard recommends that languages and implementations support an extended format which has a greater precision than the largest basic format supported for each radix b.
For an extended format with a precision between two basic formats, the exponent range must be as great as that of the next wider basic format. So, for instance, a 64-bit extended precision binary number must have an 'emax' of at least 16383. The x87 80-bit extended format meets this requirement.
Interchange formats are intended for the exchange of floating-point data using a fixed-length bit string for a given format.
For the exchange of binary floating-point
numbers, interchange formats of length 16 bits, 32 bits, 64 bits, and
any multiple of 32 bits ≥128 are defined.
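For widths of 128 bits and above, the field sizes follow from the total width k. The formulas in this sketch are the ones usually quoted from the IEEE 754-2008 table of binary interchange format parameters; they do not appear in this text, so treat them as an assumption to check against the standard. The function name `binary_interchange_params` is hypothetical.

```python
import math

def binary_interchange_params(k):
    """Return (exponent bits w, precision p, emax) for a binary{k} format, k >= 128."""
    if k < 128 or k % 32 != 0:
        raise ValueError("k must be a multiple of 32 that is at least 128")
    w = round(4 * math.log2(k)) - 13   # exponent field width (assumed formula)
    t = k - w - 1                      # stored significand bits (1 bit is the sign)
    p = t + 1                          # precision, counting the hidden bit
    emax = 2 ** (w - 1) - 1
    return w, p, emax

for k in (128, 256):
    print(f"binary{k}:", binary_interchange_params(k))
```

For k = 128 this gives a 15-bit exponent field, 113 bits of precision and emax = 16383, in agreement with the quad entries in the tables.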
The 16-bit format is intended for the exchange or storage of small numbers (e.g., for graphics).
For the exchange of decimal floating-point numbers, interchange formats of any multiple of 32 bits are defined. The encoding scheme for the decimal interchange formats similarly encodes the sign, exponent, and significand, but the scheme uses a more complex approach to allow the significand to be encoded as a compressed sequence of decimal digits (using densely packed decimal) or as a binary integer. In either case the set of numbers (combinations of sign, significand, and exponent) that may be encoded is identical, and signalling NaNs have a unique encoding (and the same set of possible payloads).
Hardware
Created by  Initial idea  Initial version  Current version  Last revision  Revision state
Max Power  15 August 2010  12 April 2013  14 June 2014  Remove decimal format table  Initial