The present invention relates to the field of testing and evaluating mathematical functions, for example, those contained in mathematical libraries that may be supplied as part of computer language compilers or operating systems.
Mathematical libraries are extensively used in computer software to implement common mathematical functions. These mathematical libraries are often supplied as part of language compilers or operating systems and implement a range of mathematical functions, from division, multiplication, squares, powers, sin, cos and tan to more complex mathematical operations. The mathematical libraries provide computer programmers with a convenient mechanism for executing a variety of mathematical functions: the programmer simply calls the desired mathematical function and provides the appropriate input argument for each variable of the mathematical function (e.g. division, represented as x/y, is a two-variable mathematical function requiring two input arguments). The mathematical function in turn returns a result. However, the accuracy of the result and the speed at which it is produced depend on the mathematical library selected and on how that particular mathematical function is implemented within the mathematical library. The results provided by many mathematical functions, and in particular floating-point based mathematical functions, are typically only approximations of the actual result. Mathematical functions produce approximations rather than precise results either because the exact result is not representable in the floating-point number system, or because the computations required to calculate the exact result would be extremely costly in processing time. As such, trade-offs are often made in the design of mathematical libraries and mathematical functions, and they must therefore be selected according to the performance and accuracy required by the target application.
Mathematical libraries are typically furnished with limited information, if any, on the speed and accuracy of their mathematical functions. Even when this information is available, there are no consistent baselines against which the mathematical libraries are assessed. For example, the system or processor on which the mathematical library is tested may not adequately reflect the performance that will result on the target system or processor on which it will eventually be used. In addition, the methods by which the performance data is collected are not standardized. The common method used in the art to evaluate the performance of a mathematical function is to select a few thousand random arguments over a limited test interval (e.g. three thousand uniformly distributed random arguments between 0 and 10). However, this can result in few small numbers being selected in the test interval, particularly when dealing with a floating-point number system. The resulting evaluation of the mathematical function, based on uniformly distributed numbers, does not accurately reflect the density of floating-point numbers relative to each exponent. It can therefore be difficult for a programmer to evaluate the performance of a mathematical library based upon the limited information available. The ability to assess mathematical libraries and their mathematical functions in a consistent manner, and to provide confidence in the coverage of the testing, is an area that requires improvement.
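The shortcoming of uniform sampling can be illustrated with a short sketch (Python; the interval and sample count mirror the example above, and the seed is arbitrary):

```python
import random

# Draw uniformly distributed test arguments over (0, 10), as in the
# conventional approach described above.
rng = random.Random(0)
samples = [rng.uniform(0.0, 10.0) for _ in range(3000)]

# Count how many samples fall below successively smaller magnitudes.
for bound in (1.0, 1e-3, 1e-6, 1e-9):
    hits = sum(1 for s in samples if s < bound)
    print(f"samples below {bound:g}: {hits}")
```

On average only about 0.3 of the 3,000 samples fall below 10^-3, even though roughly half of all positive normal double-precision values lie below that bound, so the small exponents go essentially untested.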
In an exemplary embodiment, the present invention provides a system and method for systematic random testing of single/double-precision, one/two-variable or scalar/vector mathematical functions against a known higher-precision reference mathematical library or against the result of an arbitrary-precision mathematical package. The systematic random testing of a mathematical function is performed across the entire floating-point range, or a test interval defined therein, including denormalized numbers, with random test arguments appropriately distributed over the test interval. By ensuring the density of coverage of random test arguments across the test interval against a statistical model and identifying the exception behavior of the mathematical function, an accurate picture of the performance and accuracy of the mathematical function can be obtained.
In accordance with one aspect of the present invention, there is provided a system for testing and evaluating the accuracy of a mathematical function for use in computer software in relation to a reference mathematical function over a test interval, the system comprising: an argument generation module for generating a piecewise uniform and overall exponential distribution of random test arguments for each floating-point exponent within the test interval; a computation module interfacing with the mathematical function to obtain a first set of results and with the reference mathematical function to obtain a second set of results, based on the random test arguments generated by the argument generation module; and an evaluation module for comparing the first set of results with the second set of results to provide an indication of the accuracy of the mathematical function.
In accordance with another aspect of the present invention, there is provided a method of testing and evaluating the accuracy of a mathematical function relative to a known reference mathematical function over a test interval, the method comprising: (a) generating a set of piecewise uniformly and exponentially distributed random test arguments for each floating-point exponent within the test interval; (b) obtaining a first set of results from the mathematical function using the random test arguments; (c) obtaining a second set of results from the reference mathematical function using the random test arguments; and (d) comparing said first set of results to said second set of results to determine the accuracy of the mathematical function.
Testing a mathematical function ideally involves exhaustively testing the mathematical function across all possible test arguments. Although this is feasible for some single-precision (e.g. 32-bit) floating-point mathematical functions, which admit roughly 2^32 (about four billion) distinct arguments, for double-precision (e.g. 64-bit) functions the roughly 2^64 (about 1.8×10^19) distinct arguments make exhaustive testing practically infeasible in terms of the time and processing required; for two-variable functions, these counts are squared.
If the test interval 100 includes denormalized numbers (IEEE 754 Standard for Binary Floating-Point Arithmetic, incorporated herein by reference), they are equally represented by considering each order of magnitude of denormalized numbers to have an exponent value below that of the smallest normal exponent in the floating-point number system.
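A minimal sketch of such per-exponent argument generation follows (Python; the function name, seed, and exponent parameters are illustrative assumptions, not part of the claimed system). Drawing a fixed number of fractions uniformly from [0.5, 1) for each binary exponent yields a distribution that is piecewise uniform within each exponent and exponential overall, matching the density of floating-point numbers:

```python
import math
import random

def generate_arguments(min_exp, max_exp, per_exponent, seed=1):
    """Draw `per_exponent` random test arguments for each binary exponent
    in [min_exp, max_exp]: piecewise uniform within each exponent,
    exponentially distributed overall."""
    rng = random.Random(seed)
    args = []
    for exp in range(min_exp, max_exp + 1):
        for _ in range(per_exponent):
            fraction = rng.uniform(0.5, 1.0)
            # Scale the fraction into the binade [2**exp, 2**(exp + 1)).
            args.append(math.ldexp(fraction, exp + 1))
    return args
```

For double precision, passing exponents below −1022 produces denormalized numbers, which, as described above, are treated as additional exponents below the smallest normal exponent.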
As part of the argument generation module 110, analysis of the coverage of the randomly generated test arguments across the test interval 100 is performed by comparison to known statistical models. For large n, the statistical distribution of the maximum gap g between two adjacent values, when n−1 values are uniformly randomly chosen between 0 and 1, can be approximated by the cumulative distribution defined in Equation 1.

P(g ≤ x) = (1 − e^(−nx))^n    (EQ. 1)

The peak of the corresponding probability density can be shown to occur at x = (log n)/n, so the typical maximum gap can be expected to be (log n)/n.
For n−1 uniformly spaced (not random) numbers between 0 and 1, the gap is 1/n, which is a lower bound for the corresponding value for random numbers. Therefore, the scaled gap size obtained by multiplying the gap size by n is a useful measure of the coverage of the interval by the random test arguments. For large n, the distribution of Eq. 1 predicts that the typical scaled maximum gap size should behave like log n.
To apply this model to the random arguments chosen by the test environment, the interval (0,1) in the model is mapped to the floating-point fraction range (0.5,1).
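The coverage check might be sketched as follows (Python; the sample size and seed are illustrative). The fractions of the random arguments are sorted, the largest gap is scaled by n, and the result is compared with the log n predicted by the model:

```python
import math
import random

def scaled_max_gap(fractions):
    """Scaled maximum gap between random fractions, after mapping the
    floating-point fraction range (0.5, 1) onto the model interval (0, 1)."""
    mapped = sorted((f - 0.5) * 2.0 for f in fractions)
    pts = [0.0] + mapped + [1.0]
    n = len(pts) - 1  # n gaps separate the n - 1 interior points
    max_gap = max(b - a for a, b in zip(pts, pts[1:]))
    return n * max_gap

rng = random.Random(2)
fracs = [rng.uniform(0.5, 1.0) for _ in range(9999)]
# The model predicts a typical scaled maximum gap near log(10000), about 9.2.
print(scaled_max_gap(fracs))
```

A scaled maximum gap far above log n would flag a problem in the argument generation or an insufficient testing density.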
The random test arguments generated by the argument generation module 110 are passed to and processed by a computation module 120. The computation module 120 calls a test mathematical function, such as test mathematical function N (FNC N) 140, from the mathematical library 130, and provides the random test arguments for each variable of FNC N 140. The calculated result from FNC N 140 is returned to the computation module 120 and provided to an evaluation module 170. Likewise, the same arguments are provided by the computation module 120 to a reference mathematical function N (REF FNC N) 160 from a reference mathematical library 150, and the results are provided to the evaluation module 170. The random test arguments are provided either individually in a loop (for a scalar mathematical function) or as a vector (in the case of a vector mathematical function). The time for a number of repetitions of this calling process is measured and divided by the number of calls, providing a time per call metric. The number of repetitions is adjusted dynamically to ensure that the total time measured is large enough to reduce errors caused by the inherent noise of the timing process.
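A sketch of such a timing loop (Python; `time.perf_counter` and the 0.1-second threshold are illustrative assumptions, not specified by the system):

```python
import math
import time

def time_per_call(func, args, min_total=0.1):
    """Average time per call of `func` over `args`. The repetition count
    is doubled until the measured total exceeds `min_total` seconds, so
    that the inherent noise of the timing process is amortized."""
    reps = 1
    while True:
        start = time.perf_counter()
        for _ in range(reps):
            for a in args:
                func(a)
        elapsed = time.perf_counter() - start
        if elapsed >= min_total:
            return elapsed / (reps * len(args))
        reps *= 2  # dynamically adjust the number of repetitions

print(time_per_call(math.sin, [0.1 * i for i in range(1000)]))
```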
The reference mathematical library 150 and REF FNC N 160 are chosen to provide results of known higher accuracy than the mathematical library 130 and FNC N 140. For example, REF FNC N 160 would be from a quad-precision reference mathematical library 150 while FNC N 140 would be from a double-precision mathematical library 130. Alternatively, the system 90 provides an interface (not shown) to a symbolic/numeric package, such as Mathematica™, allowing REF FNC N 160 to be computed to any required precision when no higher-precision reference library is available. For example, trigonometric mathematical functions with large arguments, whose accuracy depends on accurate range reduction beyond the capability of even quad-precision arithmetic, would require such an interface to evaluate FNC N 140.
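As an illustration only, the freely available mpmath Python package can play the role of such an arbitrary-precision reference (the function and digit count below are assumptions for the sketch, not the system's actual interface):

```python
import mpmath

def reference_sin(x, digits=50):
    """Evaluate sin(x) to `digits` decimal digits and round the result
    to double precision. The high working precision keeps the internal
    range reduction accurate even for large arguments."""
    with mpmath.workdps(digits):
        return float(mpmath.sin(mpmath.mpf(x)))
```

Because the intermediate computation carries 50 decimal digits, the rounded double can serve as a correctly rounded reference value except in pathologically hard-to-round cases.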
The values returned by the test mathematical function N 140 and the reference mathematical function N 160 are compared by the evaluation module 170. In addition, the exception behavior for inputs such as ±∞ and NaN (Not a Number; see the IEEE 754 standard previously referenced) is also computed by the computation module 120 for FNC N 140 and REF FNC N 160. If one of FNC N 140 or REF FNC N 160 returns an exception and the other does not, or returns a different exception, this information is stored and provided to the evaluation module 170 as well.
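This exception comparison might be sketched as follows (Python; the tag names and the set of caught exceptions are illustrative assumptions):

```python
import math

SPECIAL_INPUTS = [-math.inf, math.inf, math.nan]

def exception_tag(func, x):
    """Reduce the behavior of `func` on a special input to a simple tag."""
    try:
        result = func(x)
    except (ValueError, OverflowError) as exc:
        return type(exc).__name__
    if math.isnan(result):
        return "NaN"
    if math.isinf(result):
        return "INF"
    return "finite"

def compare_exception_behavior(test_func, ref_func):
    """Record the special inputs on which the two functions disagree."""
    mismatches = {}
    for x in SPECIAL_INPUTS:
        t, r = exception_tag(test_func, x), exception_tag(ref_func, x)
        if t != r:
            mismatches[x] = (t, r)
    return mismatches
```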
The evaluation module 170 determines, on an individual floating-point exponent basis, parameters such as the argument(s) at which the maximum observed error occurred (e.g. in both decimal and hex), the maximum absolute error in units in last place (ulps), the maximum root-mean-squared (rms) error in ulps and the time per call metric as described above. The results of the accuracy testing, timing, and exception behavior analysis are presented in a report 180. An example of a condensed report is shown in
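The per-argument ulp error underlying these statistics can be computed as in the following sketch (Python 3.9+; `test_results` and `ref_results` are assumed to hold the first and second sets of results described above, with exception cases already set aside):

```python
import math

def ulp_error(test_value, ref_value):
    """Error of `test_value`, in units in the last place (ulps) of the
    double-precision reference value."""
    if math.isnan(test_value) or math.isnan(ref_value):
        return float("nan")
    # math.ulp gives the spacing of doubles at the reference magnitude.
    return abs(test_value - ref_value) / math.ulp(ref_value)

errors = [ulp_error(t, r) for t, r in zip(test_results, ref_results)]
max_err = max(errors)
rms_err = math.sqrt(sum(e * e for e in errors) / len(errors))
```

Grouping the errors by the exponent of the argument then yields the per-exponent maxima and rms values reported by the evaluation module 170.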
The test mathematical function that is to be evaluated, such as FNC N 140, is then called by the computation module 120, using the random test arguments generated by the argument generation module 110 for each variable of FNC N 140, at a compute test function step 210. Likewise, the reference mathematical function N 160 is called with the same random test arguments by the computation module 120 at the compute reference function step 220. Steps 210 and 220 also include the collection by the computation module 120 of the results from the mathematical functions (FNC N 140 and REF FNC N 160) and the storage of performance statistics, as previously discussed, for evaluation by the evaluation module 170. The method 190 continues to an evaluation of the results at step 230, which is executed by the evaluation module 170. From step 230 a report can be produced summarizing the results of the method 190, a condensed example of which is shown in
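Combining the illustrative helpers sketched above, the overall flow of method 190 might be expressed as follows (a hedged sketch; `test_func` and `ref_func` stand in for FNC N 140 and REF FNC N 160):

```python
import math

def run_test(test_func, ref_func, min_exp, max_exp, per_exponent=100):
    """End-to-end sketch of method 190 using the helpers defined above."""
    # Argument generation (argument generation module 110).
    args = generate_arguments(min_exp, max_exp, per_exponent)
    # Compute test function (step 210) and reference function (step 220).
    test_results = [test_func(a) for a in args]
    ref_results = [ref_func(a) for a in args]
    # Evaluate the results (step 230).
    errors = [ulp_error(t, r) for t, r in zip(test_results, ref_results)]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return max(errors), rms
```

For example, `run_test(math.sin, reference_sin, -10, 10)` would compare the platform's sin against the mpmath-based reference sketched earlier over exponents −10 through 10.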
The report shown in
The following columns have one line per floating-point exponent, and represent a summary of the results for all the test arguments with that exponent. Since this example is for a 2-variable mathematical function, the lines refer to all possible 2nd arguments, with the 1st argument having the given sign and exponent. (For a 1-variable mathematical function, the report would look similar, except that the “x2” columns would not appear.)
The columns of the report shown in
The [flags] column contains the exception information. The characters to the left of the “/” refer to the test mathematical function, and those to the right to the reference mathematical function. The meanings of the symbols are:
The last three lines in the list are for the exception arguments −INF, +INF, and NaN. The line below the “overall statistics” label is a summary for the entire test. The union of all the exception flags and the location and size of the maximum overall error are shown. Following these are the minimum and maximum arguments tested and the number of arguments tested (for verification purposes), the average and maximum scaled maximum gap between random arguments (for each variable) and the deviation of this from the statistical model as calculated by the argument generation module 110, the minimum and maximum signed ulp errors, and an error histogram. This histogram gives the percentage of the test arguments for which the result was correctly rounded (ulp error <= 0.5), between 0.5 and 1, between 1 and 2, between 2 and 10, or greater than 10. Finally, there are the minimum, average, and maximum time per call. Calls resulting in exceptions are listed separately.
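The histogram bucketing can be sketched directly from those boundaries (Python; `errors` is assumed to hold the per-argument ulp errors computed earlier):

```python
def error_histogram(errors):
    """Percentage of results in each ulp-error bucket of the report:
    correctly rounded (<= 0.5), (0.5, 1], (1, 2], (2, 10], and > 10."""
    bounds = [0.5, 1.0, 2.0, 10.0]
    counts = [0] * (len(bounds) + 1)
    for e in errors:
        for i, b in enumerate(bounds):
            if e <= b:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return [100.0 * c / len(errors) for c in counts]
```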
The hardware elements of a computer system used to implement the present invention are shown in
In summary, an exemplary embodiment of the invention comprises a system and method for systematic random testing of single/double-precision, one/two-variable, scalar/vector mathematical functions. Another exemplary embodiment of the invention comprises a system and method for systematic random testing of mathematical functions across the entire floating-point range, or a test interval defined therein, including denormalized numbers, with random test arguments appropriately distributed over the test interval. Further, in an exemplary embodiment of the invention, by ensuring the density of coverage of random test arguments across the test interval and identifying the exception behavior of a mathematical function, an accurate picture of the performance and accuracy of the mathematical function can be obtained.
Further, in another exemplary embodiment of the present invention, by ensuring that the distribution of random test arguments matches the piecewise-uniform/exponential distribution of floating-point numbers and verifying that the density of coverage of these random arguments is consistent with the density achieved by a statistical model, reliable, representative results can be achieved. A density of randomly generated test arguments adapted to the exponential distribution of floating-point numbers is provided so that no arguments are over- or under-represented in the test set. In addition, a further exemplary embodiment of the invention can provide statistical analysis of the gap between generated test arguments, which serves several purposes: it enables the identification of any potential problems in the test argument generation, provides feedback to guide the choice of testing density, and increases confidence in the coverage of the test interval provided by the test arguments.
The results of the test mathematical function can be compared against the results of a reference function from a known higher-precision reference mathematical library, or against the result of an arbitrary-precision mathematical package such as the Mathematica™ software by Wolfram Research Inc., ensuring the accuracy of the end result. In an embodiment of the invention, the flexibility to interface with such a package through a standard reference mathematical function interface is provided. This interface is utilized when sufficiently accurate reference mathematical functions would otherwise not be available. This situation can exist when no longer-precision mathematical function is available in a mathematical library (that is, double-precision when testing single-precision mathematical functions, or quad-precision when testing double-precision mathematical functions). It can also occur when a longer-precision mathematical function is available but its accuracy on certain sensitive arguments is unknown or suspect.
Priority application: Number 2,452,274; Date: Dec 2003; Country: CA; Kind: national.