Copyright (C) 1994, Digital Equipment Corp.

The generic interface
Float provides access to the floating-point
operations required or recommended by the IEEE floating-point
standard. Consult the standard to resolve any fine points in the
specification of the procedures. Non-IEEE implementations that
have values similar to NaNs and infinities should explain how those
values behave in an implementation guide. (NaN is an IEEE term
whose informal meaning is ``not a number''.)
GENERIC INTERFACE Float(R);
IMPORT FloatMode;
TYPE T = R.T;
PROCEDURE Scalb(x: T; n: INTEGER): T RAISES {FloatMode.Trap};
Return $\hbox{x} \cdot 2^{\hbox{n}}$.
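As a rough illustration outside Modula-3, the scaling that Scalb describes can be sketched in Python. The name `scalb` is hypothetical, and mapping it onto `math.ldexp` for IEEE doubles is an assumption made for illustration, not part of this interface.

```python
import math

def scalb(x: float, n: int) -> float:
    """Hypothetical analog of Scalb: return x * 2**n.

    math.ldexp adjusts the exponent directly, so no intermediate
    power of two has to be formed. On overflow Python raises
    OverflowError, loosely comparable to FloatMode.Trap.
    """
    return math.ldexp(x, n)

print(scalb(3.0, 4))
print(scalb(0.75, 2))
```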
PROCEDURE Logb(x: T): T RAISES {FloatMode.Trap};
Return the exponent of x. More precisely, return the unique integer $n$ such that the ratio $\hbox{ABS(x)} / \hbox{Base}^{n}$ is in the half-open interval [1..Base), unless x is denormalized, in which case return the minimum exponent value for T.
PROCEDURE ILogb(x: T): INTEGER;
Like Logb, but returns an integer, never raises an exception, and always returns the $n$ such that $\hbox{ABS(x)} / \hbox{Base}^{n}$ is in the half-open interval [1..Base), even for denormalized numbers. Special cases: it returns FIRST(INTEGER) when x = 0.0, LAST(INTEGER) when x is plus or minus infinity, and zero when x is NaN.
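The difference between Logb and ILogb on denormalized numbers can be sketched in Python for IEEE doubles (Base = 2). The names `ilogb`, `logb`, `INT_FIRST`, `INT_LAST`, and `MIN_EXP` are hypothetical, a 64-bit INTEGER is assumed, and `logb` is only defined here for finite nonzero arguments because the text above does not say what Logb returns otherwise.

```python
import math
import sys

# Stand-ins for FIRST(INTEGER) and LAST(INTEGER); a 64-bit INTEGER is assumed.
INT_FIRST = -2**63
INT_LAST = 2**63 - 1

# Exponent of the smallest normalized double, 2**-1022.
MIN_EXP = sys.float_info.min_exp - 1

def ilogb(x: float) -> int:
    """Hypothetical analog of ILogb for doubles."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT_LAST
    if x == 0.0:
        return INT_FIRST
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, also for denormals,
    # so |x| / 2**(e - 1) lies in [1, 2).
    _, e = math.frexp(x)
    return e - 1

def logb(x: float) -> float:
    """Hypothetical analog of Logb for finite nonzero doubles.

    Denormalized inputs report the minimum exponent instead of
    their true exponent.
    """
    return float(max(ilogb(x), MIN_EXP))

tiny = 5e-324  # smallest positive denormalized double
print(ilogb(tiny), logb(tiny))
```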
PROCEDURE NextAfter(x, y: T): T RAISES {FloatMode.Trap};
Return the next representable neighbor of x in the direction towards y. If x = y, return x.
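For IEEE doubles, stepping to a neighbor amounts to adding or subtracting one in the integer view of the bit pattern. The following Python sketch shows that idea; the name `next_after` is hypothetical, the treatment of NaN arguments is an assumption, and no traps or flags are modeled.

```python
import math
import struct

def next_after(x: float, y: float) -> float:
    """Hypothetical analog of NextAfter for IEEE doubles."""
    if math.isnan(x) or math.isnan(y):
        return x + y            # assumption: propagate NaN
    if x == y:
        return x
    if x == 0.0:
        # Step off zero onto the smallest denormal, on y's side.
        return math.copysign(5e-324, y)
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Incrementing the bit pattern moves away from zero,
    # decrementing moves toward zero, for either sign of x.
    if (y > x) == (x > 0.0):
        bits += 1
    else:
        bits -= 1
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

print(next_after(1.0, 2.0) - 1.0)
```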
PROCEDURE CopySign(x, y: T): T;
Return x with the sign of y.
PROCEDURE Finite(x: T): BOOLEAN;
Return TRUE if x is strictly between minus infinity and plus infinity. This always returns TRUE on non-IEEE implementations.
PROCEDURE IsNaN(x: T): BOOLEAN;
Return FALSE if x represents a numerical (possibly infinite) value, and TRUE if x does not represent a numerical value. For example, on IEEE implementations, returns TRUE if x is a NaN, FALSE otherwise.
\index{NaN (not a number)}
PROCEDURE Sign(x: T): [0..1];
Return the sign bit of x. For non-IEEE implementations, this is the same as ORD(x < 0); for IEEE implementations, Sign(-0) = 1 and Sign(+0) = 0.
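The distinction between the sign bit and an ordinary comparison shows up only at zero. A small Python sketch for IEEE doubles, with hypothetical names `sign` and `copy_sign` standing in for Sign and CopySign:

```python
import math

def copy_sign(x: float, y: float) -> float:
    """Hypothetical analog of CopySign: x with the sign of y."""
    return math.copysign(x, y)

def sign(x: float) -> int:
    """Hypothetical analog of Sign: 1 if the sign bit is set, else 0.

    Comparing x with 0.0 cannot see the sign of a zero, so the
    sign is first copied onto 1.0 and then tested.
    """
    return 1 if math.copysign(1.0, x) < 0.0 else 0

print(sign(-0.0), sign(0.0), -0.0 < 0.0)
```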
PROCEDURE Differs(x, y: T): BOOLEAN;
Return (x < y OR y < x). Thus, for IEEE implementations, Differs(NaN, x) is always FALSE; for non-IEEE implementations, Differs(x, y) is the same as x # y.
PROCEDURE Unordered(x, y: T): BOOLEAN;
Return NOT (x <= y OR y <= x). Thus, for IEEE implementations, Unordered(NaN, x) is always TRUE; for non-IEEE implementations, Unordered(x, y) is always FALSE.
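Both predicates follow directly from the fact that every ordered comparison involving a NaN is false. A Python sketch for IEEE doubles, with hypothetical names `differs` and `unordered`:

```python
import math

def differs(x: float, y: float) -> bool:
    """Hypothetical analog of Differs: x < y OR y < x."""
    return x < y or y < x

def unordered(x: float, y: float) -> bool:
    """Hypothetical analog of Unordered: NOT (x <= y OR y <= x)."""
    return not (x <= y or y <= x)

nan = math.nan
print(differs(nan, 1.0), unordered(nan, 1.0))
print(differs(1.0, 2.0), unordered(1.0, 2.0))
```

Note that `differs(nan, x)` being false does not mean the values are equal; a NaN neither differs from nor equals anything in this sense, which is why Unordered exists as a separate test.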
PROCEDURE Sqrt(x: T): T RAISES {FloatMode.Trap};
Return the square root of x. This must be correctly rounded if FloatMode.IEEE is TRUE.
TYPE IEEEClass =
{SignalingNaN, QuietNaN, Infinity, Normal, Denormal, Zero};
PROCEDURE Class(x: T): IEEEClass;
Return the IEEE number class containing x. On non-IEEE systems, the result will be Normal or Zero.
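For IEEE doubles, the class can be read off the exponent and fraction fields of the bit pattern. The Python sketch below mirrors the IEEEClass enumeration; the name `classify` is hypothetical, and treating the top fraction bit as the quiet bit is an assumption that holds on common hardware but is not guaranteed everywhere.

```python
import enum
import struct

class IEEEClass(enum.Enum):
    SignalingNaN = 0
    QuietNaN = 1
    Infinity = 2
    Normal = 3
    Denormal = 4
    Zero = 5

def classify(x: float) -> IEEEClass:
    """Hypothetical analog of Class for IEEE doubles."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)        # 52-bit fraction field
    if exponent == 0x7FF:
        if fraction == 0:
            return IEEEClass.Infinity
        # Assumption: a set top fraction bit marks a quiet NaN.
        if fraction >> 51:
            return IEEEClass.QuietNaN
        return IEEEClass.SignalingNaN
    if exponent == 0:
        return IEEEClass.Zero if fraction == 0 else IEEEClass.Denormal
    return IEEEClass.Normal

print(classify(1.0), classify(5e-324), classify(-0.0))
```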
PROCEDURE FromDecimal(
sign: [0..1];
READONLY digits: ARRAY OF [0..9];
exp: INTEGER): T RAISES {FloatMode.Trap};
Convert from floating-decimal to type T. \index{floating-point!conversion from decimal} \index{decimal conversion!to floating-point}
Let F denote the nonnegative, floating-decimal number
digits[0] . digits[1] ... digits[LAST(digits)] * 10^exp
= sum(i, digits[i] * 10^(exp - i))
The result of FromDecimal is the number (-1)^sign * F, rounded
to a value of type T.
The procedure FromDecimal is a floating-point operation, just
like + and *, in the sense that it rounds its ideal result
correctly, observing the current rounding mode, and it sets flags
and raises traps by the usual rules. On IEEE implementations, it
returns minus zero when F is sufficiently small and sign=1.
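The definition of F and the minus-zero case can be sketched in Python with exact rational arithmetic. The name `from_decimal` is hypothetical. The sketch models only round-to-nearest with ties to even, which is what Python's conversion from an exact fraction provides; the current rounding mode, flags, and traps described above are not modeled.

```python
from fractions import Fraction
from typing import Sequence

def from_decimal(sign: int, digits: Sequence[int], exp: int) -> float:
    """Hypothetical analog of FromDecimal for IEEE doubles.

    F = digits[0] . digits[1] ... * 10^exp
      = sum(i, digits[i] * 10^(exp - i)), computed exactly.
    """
    f = sum(Fraction(d) * Fraction(10) ** (exp - i)
            for i, d in enumerate(digits))
    # float() of an exact fraction rounds once, to nearest even.
    value = float(Fraction(f))
    # Negating 0.0 yields -0.0, matching the minus-zero rule.
    return -value if sign == 1 else value

print(from_decimal(0, [1, 2, 5], 0))
print(from_decimal(1, [1], -400))
```

Computing F exactly before rounding is what keeps the conversion to a single rounding step; accumulating the digits in floating point would round at every step.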
TYPE DecimalApprox = RECORD
class: IEEEClass;
sign: [0..1];
len: [1..R.MaxSignifDigits];
digits: ARRAY[0..R.MaxSignifDigits-1] OF [0..9];
exp: INTEGER;
errorSign: [-1..1]
END;
PROCEDURE ToDecimal(x: T): DecimalApprox;
Convert from type T to floating-decimal. \index{floating-point!conversion to decimal} \index{decimal conversion!from floating-point}
Let D denote ToDecimal(x). Then, D.class = Class(x) and
D.sign = Sign(x). The other fields are defined only when
D.class is either Normal or Denormal. In those cases, the
values D.len, D.digits[0] through D.digits[D.len-1], and
D.exp encode a floating-decimal number F with the property that
(-1)^D.sign * F approximates x in a sense discussed below. The
encoding is such that
F = digits[0] . digits[1] ... digits[len - 1] * 10^exp
= sum(i, digits[i] * 10^(exp - i))
and
ABS(x) = F * (1 + errorSign * epsilon)
where epsilon is small and positive. In particular, D.errorSign
is +1, 0, or -1 according as ABS(x) is larger than, equal
to, or smaller than F.
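The encoding of F and the meaning of errorSign can be sketched in Python for finite, nonzero IEEE doubles. The names `DecimalApprox`, `to_decimal`, and `error_sign` are hypothetical Python renderings, the `class` and `len` fields are omitted, and the digits are taken from Python's `repr`, which is only one possible choice of F: it assumes round-to-nearest and does not model other rounding modes.

```python
import math
from decimal import Decimal
from fractions import Fraction
from typing import NamedTuple, Tuple

class DecimalApprox(NamedTuple):
    sign: int
    digits: Tuple[int, ...]
    exp: int
    error_sign: int

def to_decimal(x: float) -> DecimalApprox:
    """Sketch for finite, nonzero x only (class Normal or Denormal)."""
    sign = 1 if math.copysign(1.0, x) < 0.0 else 0
    # repr yields digits that convert back to x under round-to-nearest;
    # normalize() strips trailing zeros such as the one in "100.0".
    _, digits, exponent = Decimal(repr(abs(x))).normalize().as_tuple()
    # Shift the exponent so the decimal point follows digits[0].
    exp = exponent + len(digits) - 1
    f = sum(Fraction(d) * Fraction(10) ** (exp - i)
            for i, d in enumerate(digits))
    exact = Fraction(abs(x))
    # +1, 0, or -1 as ABS(x) is larger than, equal to, or smaller than F.
    error_sign = (exact > f) - (exact < f)
    return DecimalApprox(sign, digits, exp, error_sign)

print(to_decimal(0.1))
print(to_decimal(-1.25))
```

For 0.1 the stored double is slightly larger than the decimal 0.1, so the sketch reports a positive error sign; 1.25 is exactly representable, so its error sign is zero.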
The current rounding mode determines the sense in which the
floating-decimal number (-1)^sign * F approximates x, but in a
slightly subtle way. Define the opposite of a directed rounding
mode by reversing the direction, as follows:
Opp(TowardPlusInfinity) := TowardMinusInfinity
Opp(TowardMinusInfinity) := TowardPlusInfinity
Opp(TowardZero) := AwayFromZero
Note that AwayFromZero isn't actually a rounding mode, but it is
clear what it would mean if it were. For all other rounding modes
M, we define Opp(M) = M. If the current rounding mode is M,
the call ToDecimal(x) returns a floating-decimal number that
FromDecimal would convert, under rounding mode Opp(M), back to
x. Among all such numbers, the returned value has as few digits
as possible. This implies that both D.digits[0] and
D.digits[D.len-1] are nonzero. If there is a tie for having the
fewest digits, the tying number closest to x wins. If there is
also a tie for being closest to x, it must be a two-way tie and
the number whose last digit is even wins.
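The role of the opposite rounding mode can be checked on a small case with exact arithmetic. In the Python sketch below, `OPP`, `round_exact`, and the mode labels are hypothetical names for illustration; `round_exact` plays the part of FromDecimal under a given mode for IEEE doubles, ignores overflow, and relies on `math.nextafter`, which needs Python 3.9 or later.

```python
import math
from fractions import Fraction

# Opposite modes as defined above; "Nearest" stands for all other modes.
OPP = {
    "TowardPlusInfinity": "TowardMinusInfinity",
    "TowardMinusInfinity": "TowardPlusInfinity",
    "TowardZero": "AwayFromZero",
    "Nearest": "Nearest",
}

def round_exact(value: Fraction, mode: str) -> float:
    """Round an exact nonzero rational to a double under the given mode."""
    near = float(value)                      # nearest, ties to even
    if mode == "Nearest" or Fraction(near) == value:
        return near
    # Find the two doubles that bracket the exact value.
    if Fraction(near) < value:
        below, above = near, math.nextafter(near, math.inf)
    else:
        below, above = math.nextafter(near, -math.inf), near
    if mode == "TowardMinusInfinity":
        return below
    if mode == "TowardPlusInfinity":
        return above
    toward_zero, away = (below, above) if value > 0 else (above, below)
    return toward_zero if mode == "TowardZero" else away

x = 0.1  # the stored double is slightly larger than decimal 0.1
short = Fraction("0.1")
longer = Fraction("0.10000000000000001")
print(round_exact(short, OPP["TowardZero"]) == x)
print(round_exact(short, OPP["TowardPlusInfinity"]) == x)
print(round_exact(longer, OPP["TowardPlusInfinity"]) == x)
```

Under TowardZero, the short decimal 0.1 lies at or below x and converts back to x when rounded away from zero, so it is an acceptable result. Under TowardPlusInfinity the decimal must lie at or above x, so 0.1 does not convert back and a longer digit string is needed.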
Unlike FromDecimal, ToDecimal never sets a FloatMode.Flag and
never raises FloatMode.Trap.
The idea of converting to decimal by retaining just as many digits
as are necessary to convert back to binary exactly was popularized
by Guy L.~Steele Jr.\ and Jon L White~\cite{Steele}. David M.~Gay
pointed out the importance, in this context, of demanding that the
conversion to binary handle mid-point cases by a known
rule~\cite{Gay}. For example, in IEEE double precision, the
floating-decimal number 1e23 is precisely halfway between two
adjacent floating-binary numbers. If conversion to binary were
allowed to go either way in such a mid-point case, conversion to
decimal would have to avoid producing the simple number 1e23,
producing instead either 1.0000000000000001e23 or
9.999999999999999e22. We believe the idea of combining the
Steele/White style of automatic precision control with directed
rounding by using opposite rounding modes, as above, is new with
Lyle Ramshaw.
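The mid-point claim about 1e23 can be verified with integer arithmetic. The Python sketch below uses the factorization 10^23 = 5^23 * 2^23 and assumes IEEE double precision with 53 significant bits; all variable names are illustrative.

```python
from fractions import Fraction

significand = 5 ** 23                 # 10**23 == 5**23 * 2**23
print(significand.bit_length())       # bits needed for the odd factor
print(significand % 2)                # odd, so it cannot drop a bit

# An odd 54-bit significand sits exactly between the two even
# 54-bit integers around it, each of which fits in 53 bits.
lower = (significand - 1) * 2 ** 23
upper = (significand + 1) * 2 ** 23
print(upper - 10 ** 23 == 10 ** 23 - lower)

# Python resolves the tie by a known rule (ties to even), so the
# short string 1e23 still identifies one particular double.
nearest = float("1e23")
print(Fraction(nearest) == lower, repr(nearest))
```

Because the tie is resolved by a fixed rule, the one-digit string still converts back to the same double, which is the property the paragraph above attributes to Gay's observation.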
END Float.