The present application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2012-182344 filed on Aug. 21, 2012, with the Japanese Patent Office, the entire contents of which are incorporated herein by reference.
The disclosures herein relate to an arithmetic circuit, a processor, and a division method.
Among the four arithmetic operations with respect to binary coded decimals, the division operation is a low-speed operation that involves a greater number of operation cycles than do the other arithmetic operations. In general, a high-precision division operation obtains a partial quotient and an intermediate remainder by use of restoring division. Generation of such an intermediate remainder becomes a critical factor. In basic restoring division, subtracting a divisor from an intermediate remainder is repeated. The fact that the arithmetic result has become negative leads to a conclusion that too many subtractions have been performed, so that the result obtained prior to the subtraction in the instant cycle is used as a partial quotient.
In the following, a procedure of restoring division will be described. In the following description, a dividend and an intermediate remainder will not be distinguished from each other, and will collectively be referred to as an intermediate remainder. At the beginning, an intermediate remainder, a divisor, and a partial quotient are supplied from an intermediate remainder register, a divisor register, and a partial quotient register, respectively. In the first subtraction loop, the partial quotient is zero. The following processes are performed in the first and subsequent subtraction loops. First, the partial quotient is counted up. Next, a subtraction circuit subtracts the divisor from the intermediate remainder to produce a subtraction result and a carry-out bit. In the case of the carry-out bit being 1 (indicating that the result is a positive number), the subtraction result is stored in the intermediate remainder register, and the partial quotient counted up at the beginning of the current subtraction loop is stored in the partial quotient register, followed by proceeding to the next subtraction loop. In the case of the carry-out bit being 0 (indicating that the result is a negative number), the value stored in the intermediate remainder register (i.e., the value in existence prior to the subtraction in the current subtraction loop) is retained in the intermediate remainder register, and the value stored in the partial quotient register (i.e., the value in existence prior to counting up in the current subtraction loop) is retained in the partial quotient register. The procedure then comes to a halt. The value of the intermediate remainder register and the value of the partial quotient register at this moment are the final result values of the intermediate remainder and the partial quotient, respectively.
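The loop described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the circuit itself: the function name `restoring_digit` and the loop counter are hypothetical, plain integers stand in for the registers, and a carry-out bit of 1 is taken to cover a subtraction result of zero as well as a positive result.

```python
def restoring_digit(remainder, divisor):
    """Model one quotient digit of basic restoring division.

    Assumes 0 <= remainder < 10 * divisor, so the digit is in 0..9.
    Returns (partial_quotient, intermediate_remainder, loops).
    """
    quotient = 0          # partial quotient register starts at zero
    loops = 0             # illustrative counter, not part of the described circuit
    while True:
        loops += 1
        counted_up = quotient + 1          # the partial quotient is counted up first
        diff = remainder - divisor         # subtraction circuit
        carry_out = 1 if diff >= 0 else 0  # assumption: zero counts as non-negative
        if carry_out == 0:
            # Too many subtractions: keep the values held before this loop and halt.
            return quotient, remainder, loops
        remainder, quotient = diff, counted_up


if __name__ == "__main__":
    print(restoring_digit(47, 5))
```

For a digit value of 9 the model runs ten loops, nine successful subtractions plus the one that detects the overshoot.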
In this manner, restoring division involves repeating the process of subtracting a divisor from an intermediate remainder until the intermediate remainder becomes negative in order to produce a partial quotient and an intermediate remainder. In the case of decimal numbers, a one-digit quotient can assume any value in a range of 0 to 9, so that subtraction operations may be repeated up to ten times. Such a procedure is repeated until the quotients for all the digits are obtained. As a result, the latency of an arithmetic device for performing division may increase considerably.
The problem of basic restoring division is that the number of repeated subtraction loops for generating an intermediate remainder and a partial quotient is large. A common approach to obviating this problem may calculate one or more N-th multiples (N: integer) of the divisor in advance, and may then subtract these N-th multiples of the divisor from the intermediate remainder, respectively, followed by categorizing the results.
For example, a known method calculates first, second, and fifth multiples of a divisor in advance (see Patent Document 1, for example). In the first subtraction operation, the fifth multiple of the divisor is subtracted from the intermediate remainder. When the result is a negative number, this fact indicates that subtracting the fifth multiple of the divisor is excessive. It is thus concluded that the one-digit quotient is in the range of 0 to 4. Otherwise, it is concluded that the one-digit quotient is in the range of 5 to 9. In this manner, restoring division may be performed in a coarse fashion by using one or more N-th multiples of a divisor to narrow the range of values that the quotient can assume in the next cycle, thereby reducing the number of loops performed for generating a partial quotient and an intermediate remainder. According to the disclosed algorithm, the final result can be obtained by performing loops up to four times (Patent Document 1).
Instead of using one subtracter (see Patent Document 1), a plurality of subtracters may be used to obtain the results of subtractions with regard to two or more N-th multiples of a divisor at the same time. This serves to further enhance the speed. In an extreme example, the first through ninth multiples of a divisor may be prepared in advance, and nine subtracters may be utilized to produce all the results in only one loop. Alternatively, the first, second, third and sixth multiples of a divisor may be prepared in advance, and two subtracter circuits may be used to produce the results (see Patent Document 2, for example).
Another known method predicts a partial quotient and an intermediate remainder from the states of a dividend and a divisor in addition to the above-noted speed enhancement achieved by subtracting one or more N-th multiples of a divisor. For example, circuits may be configured to check, at the time of performing the second subtraction, the intermediate remainder and the states of upper order digits of the third multiple of a divisor, thereby selecting an N-th multiple of a divisor used in the second subtraction (Patent Document 2, for example). Speed enhancement may also be achieved by adding a quotient predicting circuit capable of predicting a partial quotient with an error margin of 1 or less based on the states of the intermediate remainder and the divisor and also by adding a circuit for correcting such an error (Patent Document 3, for example).
In the speed enhancement achieved by use of two or more N-th multiples of a divisor, there is a tradeoff between an increase in circuit size and the number of loops. When a division operation that uses a small number of subtracters is desirable due to hardware constraints, the number of cycles performed to obtain results becomes large. Further, an increase resulting from the addition of circuits is a bottleneck in the speed enhancement achieved by quotient prediction. When a control circuit is embedded in the loop that produces a partial quotient and an intermediate remainder, the number of logic stages in the loop is increased. High operating frequency implementation in such a case is difficult although the latency is improved by the reduction in the number of loops.
Even when quotient prediction and quotient correction are performed at high speed, the presence of a large number of remainder types and/or the use of an arithmetic circuit for multiplying a fixed number of 3N give rise to a problem (Patent Documents 2 and 3, for example). In a decimal-number arithmetic unit, the arithmetic circuit for multiplying a fixed number of 3N cannot be implemented without using an adder. The following three methods are conceivable to achieve this goal.
(1) An adder is added immediately before an adder.
(2) Shared use with a subtracter is made.
(3) The sixth multiple of a divisor is generated prior to a loop, and is kept in a register.
The use of the method (1) causes the number of logic stages to be increased by a number equal to the number of adders, thereby imposing a negative effect on the delay. The use of the method (2) involves adding one cycle for generating a partial quotient and an intermediate remainder, and also complicates a control procedure. The use of the method (3) involves adding a register having a width equal to the width of a divisor, which gives rise to a problem of circuit area size.
Further, since quotient prediction involves a heavy logic operation, performing quotient prediction and subtraction simultaneously within one cycle is difficult in the case of high operating frequency. In such a case, the operation cycle may be divided, thereby posing a risk of deteriorating latency.
In the case of Patent Document 2, an intermediate remainder and the two upper digits of the third multiple of a divisor are compared as a method of quotient prediction. Since a comparator is generally implemented by use of an adder, this arrangement involves the use of an additional two-digit adder. In the case of high operating frequency, there is also a risk of deteriorating latency.
According to an aspect of the embodiment, an arithmetic circuit for performing division based on restoring division includes an intermediate remainder register configured to store an intermediate remainder, a quotient prediction circuit configured to perform, based on information about two most significant digits of the intermediate remainder and a most significant digit of a divisor, quotient prediction having lower precision than a highest precision obtainable from the information, thereby generating a prediction result, a fixed-value multiplication circuit configured to output one or more N-th (N: integer) multiples of the divisor selected in response to the prediction result generated by the quotient prediction circuit, one or more subtracters configured to subtract, from the intermediate remainder, the one or more N-th multiples of the divisor output from the fixed-value multiplication circuit, and a partial quotient calculating circuit configured to obtain a partial quotient in response to one or more carry-out bits of one or more subtractions performed by the one or more subtracters.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the following, embodiments of the invention will be described with reference to the accompanying drawings.
When division is performed by use of one or more N-th multiples of a divisor, the number of subtraction loops involved varies depending on the number of adders. When the optimal N-th multiples of a divisor are used, the number of subtraction loops involved is represented by the following formula, in which digits below the decimal point are rounded up to obtain B.

$$B = \lceil \log_{a+1} A \rceil$$
Here, “a” represents the number of adders, “A” represents the possible range of partial quotients (i.e., the number of quotient candidates), and “B” represents the maximum number of subtractions performed to obtain the partial quotient.
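The relation among a, A, and B can be checked numerically with a small helper. The following is a minimal sketch; the function name `max_subtraction_loops` is hypothetical, and the rounded-up logarithm is evaluated with integer arithmetic (the smallest B for which (a+1) to the power B is at least A) to avoid floating-point rounding.

```python
def max_subtraction_loops(adders, candidates):
    """Return B, the rounded-up base-(a+1) logarithm of A.

    adders     -- a, the number of adders (subtracters) working in parallel
    candidates -- A, the number of possible partial quotients
    """
    base = adders + 1
    loops = 0
    reach = 1                # number of candidates resolvable in `loops` loops
    while reach < candidates:
        reach *= base        # each loop splits the candidate range into a+1 parts
        loops += 1
    return loops


if __name__ == "__main__":
    for a, A in [(1, 10), (1, 8), (2, 10), (2, 9), (9, 10)]:
        print(a, A, max_subtraction_loops(a, A))
```

With one adder, reducing the candidates from 10 to 8 lowers B from 4 to 3; with two adders, reducing them from 10 to 9 lowers B from 3 to 2.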
Assuming that the number of adders is 1, the number of quotient candidates in the initial state may be reduced to 8 by performing some preprocessing such as quotient prediction. In such a case, the number of loops can be reduced to 3 from 4, which is the number of loops performed when the number of quotient candidates is 10. Assuming that the number of adders is 2, the number of quotient candidates in the initial state may be reduced to 9 by performing some preprocessing such as quotient prediction. In such a case, as is understood from the table of
In order to reduce the possible number of quotients from 10 to m (m: an integer smaller than 10), the 10 quotient candidates may be divided into a plurality of groups each including no more than m quotients, and, then, one group may be identified by use of quotient prediction. Each group is a sub-set of the set that contains all the possible values (i.e., 10 values) of quotients. In order to reduce the possible number of quotients from 10 to 9, for example, the 10 quotient candidates may be divided into two groups each including no more than 9 quotients, and, then, one group may be identified by use of quotient prediction. In so doing, the two groups may have an overlap with each other. Namely, the two groups may include one or more identical quotients.
What this means is that sufficient processing speed enhancement may be achieved by performing coarse quotient prediction without the need for performing high precision quotient prediction as disclosed in Patent Document 3. Coarse quotient prediction may be performed by using information about the two most significant digits of an intermediate remainder and the most significant digit of a divisor.
In the case of the number of adders being 2, for example, it suffices to reduce the possible number of quotients obtained by quotient prediction to 9 or less. There is thus no need to use all the data contained in the table of
In order to perform lower-precision quotient prediction, i.e., coarse quotient prediction, the table of
In this manner, division is made to obtain two groups each including 9 or less quotients. When quotient prediction as will be described later is performed to identify one of the groups, the number of quotient candidates in the initial state is reduced to at most 9. With this arrangement, in the case of two adders being used, as is understood from the table of
In the table of
In the example of grouping made by the boundary 10 described above, the number of elements in each group (i.e., the number of quotients included in each group) is 8 or less. With this arrangement, thus, also in the case of one adder being used, as is understood from the table of
The coarse quotient prediction disclosed herein performs, based on the information about the two most significant digits of a dividend (i.e., intermediate remainder) and the most significant digit of a divisor, quotient prediction having lower precision than the highest precision that is obtainable from such information. This coarse quotient prediction is not limited to the use of a particular number of adders or a particular number of loops. This coarse quotient prediction does not identify a quotient range specified at an intersection between the row and the column that are identified by use of the two most significant digits of a dividend and the most significant digit of a divisor, but rather identifies a group of a plurality of intersections between rows and columns. Further, this coarse quotient prediction may be performed by using only part of the bits that make up the two most significant digits of a dividend (i.e., intermediate remainder) and the most significant digit of a divisor, for example.
In
In step S1, the intermediate remainder R and the divisor DIVs are provided as inputs. In step S2, quotient prediction is made. This quotient prediction is performed by identifying one of the group (including quotients of 0 to 7) on the upper side of the boundary in the table of
In step S3, a select signal is generated such that a process in step S4 is performed when the possible range of quotients is 4 to 9, and such that a process in step S9 is performed when the possible range of quotients is 0 to 7.
In step S4, the intermediate remainder and the divisor are supplied to the first and second subtracters. The first subtracter subtracts the fifth multiple of the divisor from the intermediate remainder R to produce the intermediate remainder R1 and the carry-out bit CO1. The second subtracter subtracts the eighth multiple of the divisor from the intermediate remainder R to produce the intermediate remainder R2 and the carry-out bit CO2.
In step S5, values to be set to the intermediate remainder R and the partial quotient Q are selected in response to a combination of the carry-out bits CO1 and CO2 of the first and second respective subtracters. When CO1 and CO2 are 0 and 0, respectively, in step S6, the value of the intermediate remainder R is left unchanged, and the partial quotient Q is set equal to 4. When CO1 and CO2 are 1 and 0, respectively, in step S7, the intermediate remainder R is set equal to the intermediate remainder R1, and the partial quotient Q is set equal to 5. When CO1 and CO2 are 1 and 1, respectively, in step S8, the intermediate remainder R is set equal to the intermediate remainder R2, and the partial quotient Q is set equal to 8.
In step S9, the intermediate remainder and the divisor are supplied to the first subtracter and the second subtracter. The first subtracter subtracts the second multiple of the divisor from the intermediate remainder R to produce the intermediate remainder R1 and the carry-out bit CO1. The second subtracter subtracts the fifth multiple of the divisor from the intermediate remainder R to produce the intermediate remainder R2 and the carry-out bit CO2.
In step S10, values to be set to the intermediate remainder R and the partial quotient Q are selected in response to a combination of the carry-out bits CO1 and CO2 of the first and second respective subtracters. When CO1 and CO2 are 0 and 0, respectively, in step S11, the value of the intermediate remainder R is left unchanged, and the partial quotient Q is set equal to 0. When CO1 and CO2 are 1 and 0, respectively, in step S12, the intermediate remainder R is set equal to the intermediate remainder R1, and the partial quotient Q is set equal to 2. When CO1 and CO2 are 1 and 1, respectively, in step S13, the intermediate remainder R is set equal to the intermediate remainder R2, and the partial quotient Q is set equal to 5.
The first subtraction loop is performed as described above. Subsequently, the second subtraction loop as will be described below is performed.
When the quotient prediction indicates a range of 4 to 9, and when CO1 and CO2 in the first subtraction loop are 0 and 0, respectively, in step S14, the fourth multiple of the divisor is subtracted from the intermediate remainder R, and the result of the subtraction is used as the intermediate remainder R.
In conditions other than the condition that the quotient prediction indicates a range of 4 to 9 and CO1 and CO2 in the first subtraction loop are 0 and 0, respectively, in step S15, the intermediate remainder and the divisor are applied to the first subtracter and the second subtracter. The first subtracter subtracts the divisor from the intermediate remainder R to produce the intermediate remainder R1 and the carry-out bit CO1. The second subtracter subtracts the second multiple of the divisor from the intermediate remainder R to produce the intermediate remainder R2 and the carry-out bit CO2.
In step S16, values to be set to the intermediate remainder R and the partial quotient Q are selected in response to a combination of the carry-out bits CO1 and CO2 of the first and second respective subtracters. When CO1 and CO2 are 0 and 0, respectively, in step S17, the value of the intermediate remainder R is left unchanged, and the partial quotient Q is also left unchanged. When CO1 and CO2 are 1 and 0, respectively, in step S18, the intermediate remainder R is set equal to the intermediate remainder R1, and the partial quotient Q is increased by 1. When CO1 and CO2 are 1 and 1, respectively, in step S19, the intermediate remainder R is set equal to the intermediate remainder R2, and the partial quotient Q is increased by 2.
In final step S20, the intermediate remainder R and the partial quotient Q are output. With the above-noted procedure, the intermediate remainder R and the partial quotient Q are obtained by performing two subtraction loops.
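Steps S1 through S20 can be modeled in software as a rough check of the two-loop procedure. The sketch below rests on several assumptions: the function name `two_loop_digit` is hypothetical, plain integers stand in for binary coded decimal registers, a carry-out bit of 1 is modeled as a non-negative subtraction result, and the quotient prediction is passed in as a flag (`pre_high`) because the prediction table itself is not reproduced here.

```python
def two_loop_digit(r, d, pre_high):
    """Model steps S1-S20 for one quotient digit with two subtracters.

    r        -- intermediate remainder R, with 0 <= r < 10 * d
    d        -- divisor
    pre_high -- assumed prediction flag: True for the range 4..9,
                False for the range 0..7
    Returns (partial quotient Q, intermediate remainder R).
    """
    # First subtraction loop (S4-S8 or S9-S13).
    m1, m2 = (5, 8) if pre_high else (2, 5)
    co1 = 1 if r - m1 * d >= 0 else 0
    co2 = 1 if r - m2 * d >= 0 else 0
    if co2:                      # CO1, CO2 = 1, 1
        r, q = r - m2 * d, m2
    elif co1:                    # CO1, CO2 = 1, 0
        r, q = r - m1 * d, m1
    else:                        # CO1, CO2 = 0, 0: R is left unchanged
        q = 4 if pre_high else 0

    # Second subtraction loop.
    if pre_high and not co1:
        r -= 4 * d               # S14: the quotient is known to be exactly 4
    else:                        # S15-S19
        co1 = 1 if r - d >= 0 else 0
        co2 = 1 if r - 2 * d >= 0 else 0
        if co2:
            r, q = r - 2 * d, q + 2
        elif co1:
            r, q = r - d, q + 1
    return q, r                  # S20


if __name__ == "__main__":
    print(two_loop_digit(47, 5, True))
    print(two_loop_digit(13, 5, False))
```

The result is correct only when the flag is consistent with the true digit, that is, when the true quotient lies in the predicted group. Digits 4 through 7 belong to both groups, so either flag value yields the same result for them.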
The above-noted computer system is an example of an information processing apparatus utilizing a CPU (central processing unit), and is used to implement hardware for performing arithmetic on Oracle-numbers. In the processor 110, the cache memory system is implemented as having a multilayer structure in which the primary cache unit 113 and the secondary cache unit 112 are provided. Specifically, the secondary cache unit 112 that can be accessed faster than the main memory is situated between the primary cache unit 113 and the main memory (i.e., the memory 111). With this arrangement, the frequency of access to the main memory upon the occurrence of cache misses in the primary cache unit 113 is reduced, thereby lowering cache-miss penalty.
The control unit (instruction control unit) 114 issues an instruction fetch address and an instruction fetch request to a primary instruction cache 113A to fetch an instruction from this instruction fetch address. The control unit 114 controls the arithmetic unit 115 in accordance with the decode results of the fetched instruction (e.g., division instruction) to execute the fetched instruction. The arithmetic controlling unit 117 operates under the control of the control unit 114 to supply data to be processed from the register 116 to the arithmetic device 118 and to store processed data in the register 116 at a specified register location. Further, the arithmetic controlling unit 117 specifies the type of arithmetic performed by the arithmetic device 118. Moreover, the arithmetic controlling unit 117 specifies an address to be accessed to perform a load instruction or a store instruction with respect to this address in the primary cache unit 113. Data read from the specified address by the load instruction is stored in the register 116 at a specified register location. Data stored at a specified location in the register 116 is written to the specified address by the store instruction. The arithmetic circuit 119A of the divider 119 included in the arithmetic device 118 serves to calculate a partial quotient and an intermediate remainder, and may be a circuit that can produce results with two adders by use of two loops based on the coarse quotient prediction that was previously described.
The fixed-value multiplication circuit 127 generates the second multiple of the divisor, the fourth multiple of the divisor, the fifth multiple of the divisor, and the eighth multiple of the divisor. Among the multiples of a binary coded decimal number, these N-th multiples of a divisor (i.e., N=2, 4, 5, 8) can be generated by use of simpler logic than the logic for generating other multiples.
In the second-multiple circuit, doubling the value of each digit will result in the value of each digit being an even number when carry propagation is ignored. As a result, the carry propagated from the lower digit can be accommodated in the least significant bit of each digit. It follows that there is no need to take into account successive carry propagations. When calculating the value of a digit of interest, only the value of this digit and the value of the next lower digit may be taken into account. Accordingly, a circuit for calculating a second multiple can be implemented as a combinatorial logic circuit based on a truth table that defines input values and output values. A circuit implemented in such a manner can calculate a second multiple faster than an adder calculating a second multiple.
In the case of a fourth-multiple circuit and an eighth-multiple circuit, two carry bits may be generated under some circumstances. Because of this, a circuit cannot be designed based on a single-digit truth table as described above. Since the second-multiple circuit can be implemented by a simple combinatorial logic circuit, a fourth-multiple circuit may be implemented by connecting two second-multiple circuits in series, and an eighth-multiple circuit may be implemented by connecting three second-multiple circuits in series.
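The second-multiple logic and its serial connection for the fourth and eighth multiples can be illustrated with a digit-wise model. This is a sketch under assumptions: the names `bcd_double`, `times4`, `times8`, `to_digits`, and `from_digits` are hypothetical, and a list of decimal digits (least significant first) stands in for the binary coded decimal signal lines.

```python
def to_digits(n):
    """Decimal digits of n, least significant first."""
    return [int(c) for c in reversed(str(n))]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def bcd_double(digits):
    """Second multiple: each output digit depends only on the digit of
    interest and the next lower digit, so no carry chain is needed."""
    out = []
    lower = 0                                # digit below the least significant one
    for d in digits + [0]:                   # one extra digit for the top carry
        carry_in = 1 if lower >= 5 else 0    # the lower digit doubles to 10 or more
        out.append((2 * d) % 10 + carry_in)  # even value plus carry never exceeds 9
        lower = d
    return out


def times4(digits):
    """Fourth multiple: two second-multiple stages in series."""
    return bcd_double(bcd_double(digits))


def times8(digits):
    """Eighth multiple: three second-multiple stages in series."""
    return bcd_double(bcd_double(bcd_double(digits)))


if __name__ == "__main__":
    print(from_digits(bcd_double(to_digits(789))))
    print(from_digits(times8(to_digits(789))))
```

Because the doubled digit is always even, adding the incoming carry can never produce a second carry, which is why the per-digit truth table suffices for the second multiple but not directly for the fourth or eighth.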
In the case of a fifth-multiple circuit, an outcome of multiplying an input number by 10 may be divided by 2. This process can be implemented as follows. An input number is shifted to the left by four bits so as to perform 10-fold multiplication. 10 times the input number obtained in this manner is then shifted to the right by one bit so as to perform a halving process. This one-bit right shift operation produces a correct result (i.e., ½ of the input) when every bit “1” moves within the same digit. When a bit “1” moves from the n+1-th digit to the n-th digit, the value generated by the bit “1” moving from the n+1-th digit to the n-th digit is equal to 8 (1000 in binary). Half of the bit “1” in the n+1-th digit is equal to 5 in the n-th digit, so that the value “8” generated by the bit “1” moving from the n+1-th digit to the n-th digit is desirably converted into 5. In consideration of the above, when the most significant bit is 1 in any given digit, the most significant bit is changed to “0”, and 5 is added to this digit. When a one-bit right shift operation is performed as a halving process, the three lower bits of each digit can only assume a value in a range of 0 to 4. Adding 5 as described above does not end up generating a carry-out bit. Accordingly, a circuit for calculating a fifth multiple can be implemented as a combinatorial logic circuit based on a truth table that defines input values and output values. A circuit implemented in such a manner can calculate a fifth multiple faster than an adder calculating a fifth multiple.
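The shift-and-correct procedure for the fifth multiple can be modeled on a packed binary coded decimal value (four bits per digit). The following is a minimal sketch, assuming hypothetical helper names `to_bcd`, `from_bcd`, and `bcd_times5`, with a Python integer standing in for the digit lines.

```python
def to_bcd(n):
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    return int(str(n), 16)


def from_bcd(bcd):
    """Unpack a BCD value whose nibbles are all in the range 0..9."""
    return int(format(bcd, "x"))


def bcd_times5(bcd, ndigits):
    """Fifth multiple of an ndigits-digit BCD value without using an adder chain."""
    shifted = (bcd << 4) >> 1            # times 10 (4-bit left shift), then halve
    out = 0
    for i in range(ndigits + 1):         # the result has at most one more digit
        nibble = (shifted >> (4 * i)) & 0xF
        if nibble & 0x8:                 # a bit "1" arrived from the upper digit
            nibble = (nibble & 0x7) + 5  # clear the MSB and add 5; 4 + 5 <= 9
        out |= nibble << (4 * i)
    return out


if __name__ == "__main__":
    print(from_bcd(bcd_times5(to_bcd(99), 2)))
```

Each output digit depends only on bits of two adjacent input digits, so the correction is purely local and no carry propagates between digits.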
Referring to
In the following, the operation of the arithmetic circuit illustrated in
In
The fixed-value multiplication circuit 127 supplies the fifth multiple of a divisor to the subtracter 128 in the case of the fifth-multiple selecting signal sel×5 being 1, and supplies an original divisor (the first multiple of a divisor) to the subtracter 128 in the case of the fifth-multiple selecting signal sel×5 being 0. The fixed-value multiplication circuit 127 supplies the second multiple of a divisor to the subtracter 129 when the fourth-multiple selecting signal sel×4 and the eighth-multiple selecting signal sel×8 are 0 and 0, respectively. The fixed-value multiplication circuit 127 supplies the fourth multiple of a divisor to the subtracter 129 when the fourth-multiple selecting signal sel×4 and the eighth-multiple selecting signal sel×8 are 1 and 0, respectively. The fixed-value multiplication circuit 127 supplies the eighth multiple of a divisor to the subtracter 129 when the eighth-multiple selecting signal sel×8 is 1.
The multiple selecting circuit 126 sets the fifth-multiple selecting signal sel×5 equal to 1 in the case of the subtraction count check signal “cycle” being 0. The multiple selecting circuit 126 outputs the supplied fourth-multiple selecting signal sel×4 without any change. The multiple selecting circuit 126 sets the eighth-multiple selecting signal sel×8 equal to 1 when the subtraction count check signal “cycle” and the quotient prediction signal preQ are 0 and 1, respectively.
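The selection behavior of the multiple selecting circuit 126 and the fixed-value multiplication circuit 127 can be summarized as two small functions. This is an illustrative sketch only: the function names are hypothetical, the select signals are modeled as integers 0 and 1, and the signals sel×5 and sel×8 are assumed to be 0 in every case for which the description does not state that they are set to 1.

```python
def multiple_select(cycle, pre_q, sel_x4):
    """Model of the multiple selecting circuit 126.

    cycle  -- subtraction count check signal (0 in the first loop)
    pre_q  -- quotient prediction signal preQ
    sel_x4 -- fourth-multiple selecting signal, passed through unchanged
    Returns (sel_x5, sel_x4, sel_x8).
    """
    sel_x5 = 1 if cycle == 0 else 0                   # assumption: 0 otherwise
    sel_x8 = 1 if (cycle == 0 and pre_q == 1) else 0  # assumption: 0 otherwise
    return sel_x5, sel_x4, sel_x8


def fixed_value_multiples(sel_x5, sel_x4, sel_x8):
    """Model of the fixed-value multiplication circuit 127.

    Returns the multiples N supplied to subtracter 128 and subtracter 129.
    """
    to_128 = 5 if sel_x5 else 1
    if sel_x8:
        to_129 = 8
    elif sel_x4:
        to_129 = 4
    else:
        to_129 = 2
    return to_128, to_129


if __name__ == "__main__":
    for cycle, pre_q, sel_x4 in [(0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 1)]:
        signals = multiple_select(cycle, pre_q, sel_x4)
        print(cycle, pre_q, sel_x4, fixed_value_multiples(*signals))
```

Under these assumptions, the first loop uses the fifth multiple together with either the eighth or the second multiple, and the second loop uses the first multiple together with either the second or the fourth multiple.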
Referring to
In
Referring to
When the fourth-multiple selecting signal sel×4 is 1, the partial quotient Q is output from the partial quotient calculating circuit 131 through the OR gate 206. The output partial quotient Q is supplied to and stored in the partial quotient register 133. When the subtraction count check signal “cycle” is 0, the first partial quotient as defined above is output from the partial quotient calculating circuit 131 through the OR gate 206. The output partial quotient Q is supplied to and stored in the partial quotient register 133. When the fourth-multiple selecting signal sel×4 and the subtraction count check signal “cycle” are 0 and 1, respectively, the second partial quotient as defined above is output from the partial quotient calculating circuit 131 through the OR gate 206. The output partial quotient Q is supplied to and stored in the partial quotient register 133.
The arithmetic circuit 119A illustrated in
According to at least one embodiment, an arithmetic circuit is provided that utilizes an efficient circuit configuration to reduce the number of subtraction loops in restoring division.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment(s) of the present inventions have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2012-182344 | Aug 2012 | JP | national |