The present disclosure relates generally to mathematical approximation, and more particularly, to an approximation of non-linear functions in fixed-point arithmetic using look-up tables.
Computing a non-linear function ƒ(x) in hardware or embedded systems can be very complex and resource intensive. Typically, a Taylor series expansion is used to approximate a non-linear function. However, approximation of a non-linear function ƒ(x) using Taylor series expansion may be computationally inefficient, as such approximation may require significant memory and processing time. There is currently a need for techniques to calculate an arbitrary nonlinear function more efficiently in hardware in which such techniques provide increased accuracy of the computation of the nonlinear function while reducing memory usage and/or processing time.
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
One or more of the embodiments of the present disclosure may be used to calculate nonlinear functions more accurately and efficiently in hardware using look-up tables (LUTs) and interpolation or extrapolation. Determining the value of a nonlinear function ƒ(x) for any value x may require time and/or memory space. For certain applications, aspects of the present disclosure may reduce the computation time and/or memory requirements for calculating certain nonlinear functions.
In an aspect of the disclosure, a method, a computer-readable medium, and an apparatus are provided. The apparatus may be a processor. The processor computes a non-linear function ƒ(x) for an input variable x, where ƒ(x)=g(y(x),z(x)). The processor determines an integer n by determining a position of a most significant bit (MSB) of an input variable x. In addition, the processor determines a value for y(x) based on a first look-up table and the determined integer n. In addition, the processor determines a value for z(x) based on n and the input variable x, and based on a second look-up table. Further, the processor computes ƒ(x) based on the determined values for y(x) and z(x).
To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
Several aspects of mathematical approximation methods will now be presented with reference to various systems. These systems and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, components, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
By way of example, an element, or any portion of an element, or any combination of elements may be implemented as a “processing system” that includes one or more processors. Examples of processors include microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
Accordingly, in one or more example embodiments, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
One or more of the embodiments of the present disclosure may be used to calculate nonlinear functions more accurately and efficiently in hardware using look-up tables (LUTs) and interpolation or extrapolation. Determining the value of a nonlinear function ƒ(x) for any value x may require time and/or memory space. For certain applications, aspects of the present disclosure may reduce the computation time and/or memory requirements for calculating certain nonlinear functions. By way of example and not limitation, computation of ƒ(x)=tanh(x), useful as an activation function in a neural network, will be used to illustrate various aspects of the disclosure.
Typically, a Taylor series expansion is used to approximate the non-linear function ƒ(x). However, the Taylor polynomials may only provide accurate approximations over a finite range of input values. Also, approximation of non-linear function ƒ(x) using Taylor series expansion may be computationally inefficient, as such approximation may require significant memory and processing time. Alternatively, LUTs containing precomputed results of the non-linear function ƒ(x) in conjunction with interpolation/extrapolation using the precomputed results of ƒ(x) from the LUTs may be used to approximate the non-linear function ƒ(x). However, the results of the interpolation/extrapolation may not provide sufficient accuracy. In an aspect, spline interpolation may be used to approximate the non-linear function ƒ(x). However, any other interpolation techniques, for example, rational interpolation, multivariate interpolation, or the like, can also be used to approximate the non-linear function ƒ(x) in fixed point.
In an aspect, spline interpolation of a non-linear function ƒ(x) may employ a low-degree polynomial for each segment of a plurality of segments used to approximate the non-linear function ƒ(x). For a given nonlinear function ƒ(x), a larger input range of x may require more segments to approximate the nonlinear function with a certain accuracy than when the nonlinear function is approximated over a smaller input range of x. In spline interpolation, each segment may be approximated by a polynomial. The polynomials of the segments may be chosen such that the polynomials fit smoothly together. In an aspect, cubic splines (polynomials of degree three), which may be used to approximate any function, may be utilized in the spline interpolation.
In one aspect, the approximation of a non-linear function ƒ(x) with a large input range using spline interpolation may require a large number of splines/segments to cover the large input range and may use a LUT to store precomputed values of the coefficients of the spline polynomial that approximates the nonlinear function ƒ(x) on each spline/segment. Therefore, in such approximation, the use of a LUT may require a large memory space to store the LUT. In an aspect, to reduce memory space, one or more contiguous segments, in which the value of the non-linear function ƒ(x) does not change frequently, may be combined into a larger segment represented by a single spline polynomial to reduce the total number of segments, which may reduce the memory requirements. In an aspect, the boundaries of segments may be restricted to powers of 2, instead of using segments of equal size or of arbitrary sizes. When the boundaries of segments occur at powers of 2, the segment that contains the value of the input variable x of the non-linear function ƒ(x) may be identified at a lower computation cost based on a fixed point representation, e.g., binary representation, of the input variable x for the non-linear function ƒ(x).
In one aspect, the LUTs corresponding to the exponentially spaced segments (e.g., the boundaries of segments restricted to powers of 2) of the non-linear function ƒ(x) may allow a reduced complexity look-up of the segment that includes the value of the input variable x. Utilizing exponentially spaced segments may also allow for higher precision calculation of the nonlinear function ƒ(x) for a certain range of the input and lower precision calculation of the nonlinear function ƒ(x) for other values of the input. For example, for activation functions in neural networks (e.g., sigmoid or tanh), the most interesting region may be close to 0. In a neural network, the activation function forces the result of a transfer function into a finite range, e.g., (−1, 1) for tanh.
One or more of the embodiments of the present disclosure may be used to calculate nonlinear functions more accurately and efficiently in hardware using LUTs and spline interpolation or extrapolation.
In one aspect, the non-linear function ƒ(x) may be partitioned into several ranges of input values/segments/splines, where each segment may be approximated by a polynomial. In one embodiment, the segment boundaries of the non-linear function ƒ(x) may be pre-defined and stored in the memory 220. The polynomials of the segments, used to interpolate between the predefined values of ƒ(x), may be chosen such that the polynomials fit smoothly together. In an aspect, cubic splines (polynomials of degree three) may be utilized in the spline interpolation. The coefficients of the cubic polynomial approximating the nonlinear function ƒ(x) on each segment are stored in a corresponding LUT.
In some embodiments, the segments of the non-linear function ƒ(x) may be determined based on the nature of the non-linear function, the range of the input variable x of interest, whether the accuracy may be reduced for certain ranges of x where the non-linear function ƒ(x) is of less interest to the application, or the like. For example, in a neural network, an activation function may be represented by ƒ(x)=tanh(x) where the most interesting part of the function is in a range around zero, as the range 0<x<1 of the non-linear function tanh(x) has more variation of the function values and thus may need higher accuracy of calculation. On the other hand, for values of x greater than 4, the value of the function tanh(x) is approximately 1. Thus, the function ƒ(x)=tanh(x) for x≥0 may be represented by three segments as follows:
Segment #1: 0≤x<1 (330)
Segment #2: 1≤x<4 (340)
Segment #3: 4≤x<∞ (350)
Once the segments of the non-linear function ƒ(x) are determined, the processor 210 identifies the segment of the non-linear function ƒ(x)=tanh(x) that includes the value of the input variable x. For example, to identify the segment of the non-linear function ƒ(x)=tanh(x) based on x, the processor 210 may determine the location of the leading 1 of the binary representation of the input variable x=10.10110 (from the MSB side). The location of the leading 1 determines the segment of the non-linear function ƒ(x) in which the value of the input variable x belongs, which in this case is Segment #2 (340). For example, in the current exemplary embodiment, the binary representation of x=10.10110 (360) in the fixed point binary domain can be written in decimal as x=2.6875, which falls within Segment #2 (1≤x<4) (340).
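By way of a non-limiting illustration, the leading-1 test described above may be sketched in Python as follows. The fixed-point format (an unsigned integer `raw` with `FRAC_BITS` fractional bits) and the name `segment_index` are assumptions made for this sketch only and do not appear in the disclosure.

```python
FRAC_BITS = 5  # assumed format: x = raw / 2**FRAC_BITS, e.g. 10.10110 in binary


def segment_index(raw: int, frac_bits: int = FRAC_BITS) -> int:
    """Return 1, 2 or 3 for the segments [0,1), [1,4), [4,inf) of a non-negative x."""
    if raw < 0:
        raise ValueError("this sketch handles x >= 0 only")
    # Position of the leading 1 relative to the binary point:
    # 0 is the ones place, 1 the twos place, negative values are fractional bits.
    t = raw.bit_length() - 1 - frac_bits
    if raw == 0 or t < 0:
        return 1          # no bit set at or above the ones place: 0 <= x < 1
    if t < 2:
        return 2          # leading 1 in the ones or twos place: 1 <= x < 4
    return 3              # leading 1 at the fours place or higher: x >= 4


x_raw = 0b1010110  # the bits of x = 10.10110
print(x_raw / 2**FRAC_BITS, segment_index(x_raw))  # prints: 2.6875 2
```

Because every boundary is a power of 2, the segment is found from the bit position alone, without comparing x against stored boundary values.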
Next, the processor 210 calculates the result of approximation of ƒ(x)=tanh(x) for the input variable x=10.10110 (360), using the spline polynomial corresponding to the Segment #2 (340) and the LUT containing precomputed values of the coefficients of the spline polynomial for each spline/segment of the nonlinear function ƒ(x)=tanh(x). For example, if the spline polynomial representing Segment #2 (340) in which the value of the input variable x belongs is ax^3+bx^2+cx+d, the LUT may contain the values of the coefficients a, b, c, and d. The processor 210 may calculate the result of the approximation of ƒ(x)=tanh(x) for x=10.10110 (360) using the value of x in conjunction with the values of the coefficients a, b, c, and d from the equation ax^3+bx^2+cx+d.
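A minimal sketch of evaluating a per-segment cubic from a coefficient LUT is given below. The disclosure does not state how the coefficients a, b, c, and d are obtained; this sketch assumes, purely for illustration, a cubic Hermite fit that matches tanh and its slope at the two ends of each bounded segment, a constant 1 on the unbounded segment, and a polynomial expressed in the local offset u = x − (segment start) rather than in x itself. The names `hermite_coeffs`, `COEFF_LUT`, and `tanh_approx` are hypothetical.

```python
import math

SEG_START = [0.0, 1.0, 4.0]  # lower bounds of Segments #1, #2, #3


def hermite_coeffs(x0, x1):
    """Coefficients (a, b, c, d) of a*u^3 + b*u^2 + c*u + d with u = x - x0,
    matching tanh and its derivative at x0 and x1 (an assumed fitting method)."""
    h = x1 - x0
    f0, f1 = math.tanh(x0), math.tanh(x1)
    m0, m1 = 1.0 - f0 * f0, 1.0 - f1 * f1  # d/dx tanh(x) = 1 - tanh(x)^2
    delta = (f1 - f0) / h
    a = (m0 + m1 - 2.0 * delta) / (h * h)
    b = (3.0 * delta - 2.0 * m0 - m1) / h
    return (a, b, m0, f0)


# One LUT row per segment; the last segment is approximated by the constant 1.
COEFF_LUT = [hermite_coeffs(0.0, 1.0), hermite_coeffs(1.0, 4.0), (0.0, 0.0, 0.0, 1.0)]


def tanh_approx(x):
    """Approximate tanh(x) for x >= 0 from the three-row coefficient LUT."""
    if x < 0.0:
        raise ValueError("this sketch handles x >= 0 only")
    seg = 0 if x < 1.0 else (1 if x < 4.0 else 2)
    a, b, c, d = COEFF_LUT[seg]
    u = x - SEG_START[seg]
    return ((a * u + b) * u + c) * u + d  # Horner's rule: three multiplies, three adds


print(tanh_approx(2.6875), math.tanh(2.6875))
```

With only one cubic per segment the accuracy is coarse, on the order of a few hundredths on Segment #2 under this fitting assumption; a practical table would likely use better-fitted coefficients or further subdivision.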
Although, any non-linear function can be accurately approximated using the concept of exponentially spaced segments for splines, the concept as disclosed in the embodiment of
Another aspect of the present disclosure may calculate a nonlinear function via spline reuse. For certain classes of functions, e.g., when computing ƒ(x)=x^β or ƒ(x)=log x, polynomial approximations for the entire input range may not be necessary. Instead, splines defined for a smaller input range may be reused for the entire input range. Such nonlinear functions may be decomposed by rewriting x=y·z. With such a decomposition, a spline defined for a small range may be reused to approximate the nonlinear function over the entire input range of x. In one embodiment of the spline reuse approximation method, the spline segments may not be exponentially spaced.
For example, in one embodiment, in order to approximate (470) the non-linear function ƒ(x)=x^β (where β is a constant and x>0) using the concept of spline reuse, decomposition may be performed on the function ƒ(x)=x^β by rewriting x=y·z, so that x^β=(y·z)^β=y^β·z^β. Therefore, x^β can be determined by determining 450 the values of y^β and z^β. In this example, y=2^n (445); therefore, y^β=(2^n)^β (n∈I) and z∈[1.0, 2.0). In this case, y is an integer that is a power of 2 and the value of z is a real number between the values of 1 and 2. Therefore, the value of y can be determined 445 by determining 460 the value of n. Once n is determined 460, the processor 410 determines 440 the value of z from the value of input variable x 435. The above decomposition may be useful to approximate x^β, since this decomposition only uses two LUTs, one for y^β and another for z^β. The LUT for z^β is over a limited input range, which may reduce memory space and thus improve processing time. In one embodiment, z^β can be computed using cubic spline approximations. For example, for z∈[1.0, 2.0), there may be four equally spaced splines:
1.0≤z<1.25
1.25≤z<1.5
1.5≤z<1.75
1.75≤z<2.0
The LUT 494 for z^β may contain precomputed values of the coefficients of the cubic polynomial for each segment (e.g., the values of the coefficients a, b, c, and d from the equation ax^3+bx^2+cx+d).
In this exemplary configuration for the non-linear function ƒ(x)=x^β, the LUTs for y^β and z^β for different β values may be pre-defined and stored in the memory 420. In some configurations, the value of β may be received by the memory 420. The processor 410 may access the LUTs for y^β and z^β for the received β value from the memory 420 to perform the approximation using the concept of spline reuse.
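As an illustrative sketch only, selecting which of the four equally spaced segments contains z may be reduced to reading bits. The sketch assumes z is held as an unsigned fixed-point integer `z_raw` with `frac_bits` fractional bits and 1≤z<2; the name `z_segment` and this format are assumptions, not part of the disclosure.

```python
def z_segment(z_raw: int, frac_bits: int, seg_bits: int = 2) -> int:
    """Index (0..3) of the quarter-width segment of [1, 2) that contains z.

    z = z_raw / 2**frac_bits with 1 <= z < 2, so the integer part is always 1
    and the top `seg_bits` fractional bits give the segment directly.
    """
    frac = z_raw & ((1 << frac_bits) - 1)   # drop the integer bit
    return frac >> (frac_bits - seg_bits)   # keep the leading fractional bits


# z = 1.16796875 is 1.00101011 in binary (8 fractional bits).
print(z_segment(0b100101011, 8))  # prints: 0, i.e. the segment 1.0 <= z < 1.25
```

The returned index can then be used to address the row of coefficients a, b, c, and d for that segment.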
Once n and z (490) are computed by the processor 410, the processor 410 determines 445 the value of y using the value of n=4, since y=2^n=2^4=16 (in decimal). Next, the processor 410 retrieves the value of y^β for n=4 from a LUT 492 for y^β and the value of z^β for z=1.16796875 from a LUT 494 for z^β. The LUT 492 for y^β only contains a finite range of values because y is an integer that is a power of 2. The LUT 494 for z^β can be reused; that is, only one LUT is needed over a finite range for z^β. Following are two exemplary LUTs, one for y^β and one for z^β, for β=2.
The LUT 492 for y^β is needed to cover a wide enough dynamic range of the input x 435. The linear increase of the LUT size corresponds to the exponential increase of the input range. Since in this case y=2^n, the LUT is exact and no approximation is needed to compute y^β. The LUT 494 for z^β provides an approximation to the exact value of z^β for all z∈[1.0, 2.0).
For the decomposition x=y·z, x^β=y^β·z^β, the processor 410 computes 470 the value of x^β by computing y^β·z^β for a particular value of β (e.g., β=2). In particular, the processor 410 computes 450 the value of y^β using the LUT 492 of y^β and the processor 410 computes 450 the value of z^β using the LUT 494 of z^β. The processor 410 then multiplies the computed values of y^β and z^β to compute 470 the value of x^β.
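The two-table computation described above may be sketched as follows, by way of example and not limitation. Several details are assumptions of this sketch rather than statements of the disclosure: floating-point arithmetic is used in place of fixed point, `math.frexp` is used to obtain n and z, the z-table coefficients are produced by a cubic Hermite fit on each of the four segments, the y-table covers n from −8 to 15, and the names `build_z_lut`, `build_y_lut`, and `pow_beta` are hypothetical.

```python
import math

BETA = 2.0   # example exponent
N_SEG = 4    # four equally spaced segments on [1, 2)


def build_z_lut(beta, n_seg=N_SEG):
    """Per-segment cubic coefficients (a, b, c, d) in u = z - z0 for z**beta."""
    lut, h = [], 1.0 / n_seg
    for i in range(n_seg):
        z0, z1 = 1.0 + i * h, 1.0 + (i + 1) * h
        f0, f1 = z0 ** beta, z1 ** beta
        m0, m1 = beta * z0 ** (beta - 1.0), beta * z1 ** (beta - 1.0)
        delta = (f1 - f0) / h
        a = (m0 + m1 - 2.0 * delta) / (h * h)
        b = (3.0 * delta - 2.0 * m0 - m1) / h
        lut.append((a, b, m0, f0))
    return lut


def build_y_lut(beta, n_min=-8, n_max=15):
    """Exact table n -> (2**n)**beta; one entry per power of 2 in the input range."""
    return {n: 2.0 ** (n * beta) for n in range(n_min, n_max + 1)}


Z_LUT = build_z_lut(BETA)
Y_LUT = build_y_lut(BETA)


def pow_beta(x):
    """Approximate x**BETA for x > 0 as y**BETA * z**BETA, x = 2**n * z, 1 <= z < 2."""
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    n, z = e - 1, 2.0 * m         # rescale so that 1 <= z < 2
    seg = int((z - 1.0) * N_SEG)  # which quarter of [1, 2) holds z
    a, b, c, d = Z_LUT[seg]
    u = z - (1.0 + seg / N_SEG)
    z_beta = ((a * u + b) * u + c) * u + d
    return Y_LUT[n] * z_beta      # the y-table is exact; only z_beta is approximate


print(pow_beta(18.6875), 18.6875 ** BETA)
```

For x=18.6875 the sketch uses n=4 and z=1.16796875, matching the example above. The 24-entry y-table spans inputs from 2^−8 up to (but not including) 2^16, while the same four z-segments are reused for every n.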
A similar decomposition technique may be used to compute the value of ƒ(x)=log x by rewriting x=y·z. Therefore, log x=log y+log z. Additionally, the value of log x may be calculated by determining the value of y and z using the above described technique, and using two LUTs, one LUT for log y and another LUT for log z. Again, log z may also be approximated using splines.
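A comparable sketch for the logarithm is shown below, taking base 2 so that log2 y = n needs no table at all. As a simplification that is not taken from the disclosure, log2 z is approximated by a piecewise-linear (degree-one) interpolation over a 17-entry table rather than by cubic splines; the table size `K` and the name `log2_approx` are assumptions.

```python
import math

K = 16  # assumed number of equal sub-intervals on [1, 2)
LOG2_Z_LUT = [math.log2(1.0 + i / K) for i in range(K + 1)]


def log2_approx(x):
    """Approximate log2(x) for x > 0 as n + log2(z), with x = 2**n * z and 1 <= z < 2."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    n, z = e - 1, 2.0 * m     # log2(y) = log2(2**n) = n exactly
    pos = (z - 1.0) * K
    i = int(pos)              # table interval containing z
    w = pos - i               # fractional position inside the interval
    log2_z = LOG2_Z_LUT[i] + w * (LOG2_Z_LUT[i + 1] - LOG2_Z_LUT[i])
    return n + log2_z


print(log2_approx(18.6875), math.log2(18.6875))
```

Here the sum n + log2 z plays the role that the product y^β·z^β plays for the power function.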
In another aspect for computing a nonlinear function by spline reuse and decomposition, an alternative decomposition of the input variable x may be used during evaluation of the function x^β (where β is a constant, and x>0) by adjusting the range of z. As such, y^β=(2^n)^β (n=pm, m∈I) and z∈[1.0, 2.0^p), p∈I. In this case, the value of p=3, which may be predefined and stored in the memory 420. As such, y is an integer that is a power of 8, since the value of n is a multiple of 3 (p=3), and the value of z is a real number between the values of 1 and 8. The value of p may be chosen based on the shape of the x^β curve. For example, for a small β value (e.g., 0.001), a large p value may be chosen to cover a wide range of z^β values.
In this exemplary configuration for the non-linear function ƒ(x)=x^β, the LUTs for y^β and z^β for different β values may be pre-defined and stored in the memory 420. In some configurations, the value of β may be received by the memory 420. The processor 410 may access the LUTs for y^β and z^β for the received β value from the memory 420 to perform the approximation using the concept of spline reuse.
In this case, t>0 if the leading 1 (480) is to the left of the decimal point 485, and t<0 if the leading 1 (480) is to the right of the decimal point 485. In the current example, for the input variable x=10010.10110 (435)
Once n and z are computed by the processor 410, the processor 410 determines 445 the value of y using the value of n=3, since y=2^n=2^3=8 (in decimal). Next, the processor 410 computes 450 the value of y^β for n=3 from a LUT 496 for y^β and the value of z^β for z=2.3359375 (decimal) from a LUT 498 for z^β. The LUT 496 for y^β only contains a finite range of values because y is an integer that is a power of 8. The LUT 498 for z^β also contains a finite range of values because z is a real number between the values of 1 and 8. The LUT 498 for z^β can be reused; that is, only one LUT is needed over a finite range for z^β. Following are two exemplary LUTs, one for y^β and one for z^β, for β=2.
The LUT 496 for y^β may cover a wide enough dynamic range of the input. The linear increase of the LUT size corresponds to the exponential increase of the input range. Since in this case y=2^n=2^3=8, no approximation is needed for the LUT 496 for y^β. The LUT 498 for z^β provides an approximation to the exact value of z^β for all z∈[1.0, 2.0^p).
For the decomposition x=y·z, x^β=y^β·z^β, the processor 410 computes 470 the value of x^β by computing y^β·z^β for a particular value of β (e.g., β=2). In particular, the processor 410 computes 450 the value of y^β using the LUT 496 of y^β and the processor 410 computes 450 the value of z^β using the LUT 498 of z^β. The processor 410 then multiplies the computed values of y^β and z^β to compute the value of x^β. A similar decomposition technique may be used to compute the value of ƒ(x)=log x by rewriting x=y·z. Therefore, log x=log y+log z. Additionally, the value of log x may be calculated by determining the value of y and z using the above described technique, and using two LUTs, one LUT for log y and another LUT for log z. Again, log z may also be approximated by using splines.
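The decomposition for a general p may be sketched on the fixed-point bit pattern as follows. This sketch assumes that n is the largest multiple of p not exceeding t, where t is the position of the leading 1 relative to the binary point; that rule reproduces the values n=3 for p=3 and n=4 for p=1 given in the examples, but the rule as coded, the integer format (`raw`, `frac_bits`), and the name `decompose` are assumptions.

```python
def decompose(raw: int, frac_bits: int, p: int = 1):
    """Split x = raw / 2**frac_bits (x > 0) into (n, z) with x = 2**n * z,
    n a multiple of p, and 1 <= z < 2**p."""
    if raw <= 0 or p < 1:
        raise ValueError("x must be positive and p >= 1")
    t = raw.bit_length() - 1 - frac_bits  # leading-1 position relative to the point
    n = p * (t // p)                      # floor division also handles t < 0
    # Moving the binary point n places to the left leaves the bits unchanged
    # and only rescales their weight.
    z = raw / 2 ** (frac_bits + n)
    return n, z


x_raw = 0b1001010110                # the bits of x = 10010.10110 (18.6875)
print(decompose(x_raw, 5, p=3))     # prints: (3, 2.3359375)
print(decompose(x_raw, 5, p=1))     # prints: (4, 1.16796875)
```

A larger p shortens the y-table, since only every p-th power of 2 needs an entry, at the cost of a z-table that must cover the wider range from 1 to 2^p.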
In one or more embodiments, spline reuse can be used for efficient computations of functions such as x^β or log x. For example, spline reuse method of
At 502, the processor determines an integer n (e.g., at 460 as shown in
For example, assume x=10010.10110, as discussed with respect to
At 504, the processor determines a value for y(x) (e.g., y^β as in the method of
For example, referring to the LUT 492 for n=4 of
As another example, referring to the LUT 496 for n=3 of
At 506, the processor determines a value for z(x) based on n and the input variable x, and based on a second look-up table. In one configuration, the second LUT (e.g., LUT 498 of z^β of
For example, referring to LUT 494 as illustrated in
As another example, referring to LUT 498 as illustrated in
At 508, the processor computes ƒ(x) based on the determined values for y(x) and z(x).
For example, as shown in
In one configuration, in order to determine the integer n (at 502), at 510 the processor determines a position of a decimal point (e.g. 485 in
For example, as shown with respect to
Next, at 512, the processor determines the position of the MSB of the binary representation of the input variable x. In one configuration, the position of the MSB of the binary representation of the input variable x is a position of a leading 1 in the binary representation of the input variable x (e.g., the position of leading 1 480 in input x 435 in
For example, as shown with respect to
At 514 the processor determines a number t (e.g., as discussed with respect to
For example, as discussed with respect to
Next, at 516, the processor determines n as
n = p·⌊t/p⌋,
where p≥1 and is an integer.
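As a worked illustration, and assuming that n is taken as the largest multiple of p not exceeding t (an assumption that agrees with the values n=4 and n=3 used in the examples of this description), the input x=10010.10110 (18.6875 in decimal) with t=4 gives:

```latex
\begin{align*}
p = 1:&\quad n = 1\cdot\left\lfloor \tfrac{4}{1} \right\rfloor = 4, \quad y = 2^{4} = 16, \quad z = \frac{18.6875}{16} = 1.16796875 \in [1, 2) \\
p = 3:&\quad n = 3\cdot\left\lfloor \tfrac{4}{3} \right\rfloor = 3, \quad y = 2^{3} = 8, \quad z = \frac{18.6875}{8} = 2.3359375 \in [1, 8)
\end{align*}
```

In both cases y·z recovers x, and z falls inside the range covered by the second look-up table.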
For example, as shown in
In one configuration, in order to determine a value for z(x) based on n and the input variable x (at 506), at 518, the processor moves a decimal point in the binary representation of the input variable x by n positions to the left (e.g., at 440 at
For example, as shown with respect to
Next, at 520, the processor looks up z(x) in the second look-up table (e.g., LUT 498 of z^β of
For example, referring to LUT 498 as illustrated in
The apparatus 602 further includes a Y determination component 606 that determines a value for y(x) based on a first LUT and the determined integer n. In one configuration, the first LUT may contain different values for y(x) and can be pre-computed by the Y determination component 606 and stored in a memory in the Y determination component 606. In one embodiment, y(x)=2^(nβ), where β is a constant and β∈ℝ, where ℝ is the set of real numbers. In such case, the first LUT provides a mapping between at least one of n or 2^n, and 2^(nβ), and the Y determination component 606 determines the value for y(x) by determining the value for 2^(nβ) associated with the at least one of n or 2^n.
The apparatus 602 also includes a Z determination component 608 that determines a value for z(x) based on n and the input variable x, and based on a second LUT. In one configuration, the second LUT may contain different values for z(x) and can be pre-computed by the Z determination component 608 and stored in a memory in the Z determination component 608. In one configuration, the value for z(x) is determined based on a binary representation of the input variable x. In some configurations, z(x) is a function of z, where z(x)=z^β, z∈[1.0, 2.0^p), and p is an integer. In such case, the second LUT provides a mapping between z and z^β, and the Z determination component 608 determines the value for z(x) by determining the value for z^β associated with z.
Additionally, the apparatus 602 also includes an F determination component 610 that computes the non-linear function ƒ(x) based on the determined values for y(x) and z(x).
In one configuration, in order to determine the integer n, the n determination component 604 determines a position of a decimal point in the binary representation of the input variable x. Next, the n determination component 604 determines the position of the MSB of the binary representation of the input variable x. In one configuration, the position of the MSB of the binary representation of the input variable x is a position of a leading 1 in the binary representation of the input variable x. Next, the n determination component 604 determines a number t as being a number of numeral digits between the position of the MSB and the position of the decimal point. The n determination component 604 then determines the integer n as
n = p·⌊t/p⌋,
where p≥1 and is an integer.
In one configuration, in order to determine a value for z(x) based on n and the input variable x, the Z determination component 608 moves a decimal point in the binary representation of the input variable x by n positions to the left to determine z. Next, the Z determination component 608 looks up z(x) in the second look-up table based on the determined z.
The apparatus 602 may include additional components that perform each of the blocks of the algorithm in the aforementioned flowchart 500 of
The processor 712 is responsible for general processing, including the execution of software stored on the computer-readable medium/memory 714. The software, when executed by the processor 712, causes the processing system 716 to perform the various functions described supra for any particular apparatus. The computer-readable medium/memory 714 may also be used for storing data that is manipulated by the processor 712 when executing software. The processing system 716 further includes at least one of the components, the n determination component 704, the Y determination component 706, the Z determination component 708 and the F determination component 710. The components may be software components running in the processor 712, resident/stored in the computer readable medium/memory 714, one or more hardware components coupled to the processor 712, or some combination thereof.
In one configuration, the apparatus 602/602′ for computing a non-linear function ƒ(x) for an input variable x, where ƒ(x)=g(y(x),z(x)), includes means for determining an integer n by determining a position of a most significant bit (MSB) of an input variable x; means for determining a value for y(x) based on a first look-up table and the determined integer n; means for determining a value for z(x) based on n and the input variable x, and based on a second look-up table; and means for computing ƒ(x) based on the determined values for y(x) and z(x). The apparatus 602/602′ further includes means for receiving a value for the input variable x via an input device.
In one configuration, the position of the MSB of the input variable x is the position of the MSB of a binary representation of the input variable x. In some configurations, the value for z(x) is determined based on a binary representation of the input variable x. In one configuration, the position of the MSB of the binary representation of the input variable x is a position of a leading 1 in the binary representation of the input variable x.
In one configuration, the means for determining the integer n is further configured to determine a position of a decimal point in the binary representation of the input variable x; determine the position of the MSB of the binary representation of the input variable x; and determine a number t as being a number of numeral digits between the position of the MSB and the position of the decimal point.
In one configuration, the means for determining the integer n is further configured to determine n as
n = p·⌊t/p⌋,
where p≥1 and is an integer, z(x) is a function of z, and z∈[1.0, 2.0^p). In some configurations, y(x)=2^(nβ), where n=pm, m is an integer, β is a constant, and β∈ℝ, where ℝ is the set of real numbers.
In one configuration, the means for determining the value for z(x) based on n is further configured to move a decimal point in the binary representation of the input variable x by n positions to the left to determine z; and look up z(x) in the second look-up table based on the determined z.
In one configuration, ƒ(x)=y(x)*z(x), and the non-linear function ƒ(x) is equal to x^β, where x>0, β is a constant, and β∈ℝ, where ℝ is the set of real numbers. In some other configurations, y(x)=2^(nβ). In one configuration, the first look-up table provides a mapping between at least one of n or 2^n, and 2^(nβ), and the means for determining the value for y(x) is further configured to determine the value for 2^(nβ) associated with the at least one of n or 2^n. In some configurations, z(x)=z^β, z∈[1.0, 2.0^p), and p is an integer. In one configuration, the second look-up table provides a mapping between z and z^β, and the means for determining the value for z(x) is further configured to determine the value for z^β associated with z. In some configurations, the input variable x is a positive real number. In some other configurations, ƒ(x)=y(x)+z(x), and the non-linear function ƒ(x) is equal to log₂ x, where x>0.
The aforementioned means may be one or more of the aforementioned components of the apparatus 602/602′ and/or the processing system 716 of the apparatus 602/602′ configured to perform the functions recited by the aforementioned means.
It is understood that the specific order or hierarchy of blocks in the processes/flowcharts disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes/flowcharts may be rearranged. Further, some blocks may be combined or omitted. The accompanying method claims present elements of the various blocks in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term “some” refers to one or more. Combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
The words “module,” “mechanism,” “element,” “device,” and the like may not be a substitute for the word “means.” As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase “means for.”
Number | Date | Country | |
---|---|---|---|
20180060278 A1 | Mar 2018 | US |