The invention relates to binary multipliers, and more particularly, to a multiplier to be incorporated into a built-in self-test device.
More and more integrated circuits are equipped with built-in self-test devices, also called BIST devices. Such devices are often used in the test phase in the production of integrated circuits. They are designed to assist the test equipment in testing specific functionalities and to reduce the testing time.
Use of a binary multiplier is frequently needed in a BIST device.
The result R is obtained by adding the binary numbers formed by the juxtaposed terms of the columns. A column is completed by a 0 when there is no term. The result thus comprises 7 bits R0 to R6 corresponding respectively to the columns, and an additional most significant bit R7 receiving any carry digit of the addition.
Many multiplier structures, which favor calculation speed, such as the Dadda multiplier or Wallace tree multiplier, are designed to perform the operations shown in
It is thus desirable to provide a particularly small multiplier with sufficient performance for use in a BIST device. This desire may be addressed by a multiplier of a binary number A of n bits by a binary number B of p bits, configured to add each term AiBj with a left shift by i+j bits, where Ai is the bit of weight i of number A, and Bj the bit of weight j of the number B, with i varying between 0 and n−1, and j varying between 0 and p−1. The multiplier comprises a first counter associated with the number A, configured to count modulo n, and paced by a clock. A second counter is associated with the number B, paced by the clock. The multiplier may also include means or circuitry for sequentially producing the terms AiBj by taking the contents of the first and second counters respectively as weights i and j. The multiplier may also include means or circuitry for shifting the content of one of the first and second counters by an increment when the other counter has achieved a revolution.
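By way of illustration only, the sum of shifted terms set out above may be modelled in software. The following Python sketch is not part of the described circuit; the function name multiply_by_terms and its parameter names are assumptions chosen for the example.

```python
def multiply_by_terms(a: int, b: int, n: int, p: int) -> int:
    """Sum every term Ai*Bj, each shifted left by i+j bits.

    a is treated as an n-bit number and b as a p-bit number.
    """
    result = 0
    for i in range(n):                # weight i of number A
        for j in range(p):            # weight j of number B
            ai = (a >> i) & 1
            bj = (b >> j) & 1
            result += (ai & bj) << (i + j)
    return result
```

The order in which the pairs (i, j) are visited does not affect the sum, which is the property the two-counter arrangement relies on.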
According to an embodiment, the multiplier comprises, to add the terms AiBj, a programmable increment counter of n+p bits, paced by the clock and receiving the current term AiBj as a setpoint to program an increment of 2^(i+j). The programmable increment counter may include a series of cascade-connected flip-flops such that if the programming input of rank i+j is active, the flip-flop of rank i+j toggles systematically, and any flip-flop of rank k>i+j toggles if the flip-flops of ranks between i+j and k−1 are in an active state.
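The toggling rule stated above may be checked with a short behavioural model. The Python sketch below is an illustration under assumptions: the names clock_edge, to_bits and from_bits are hypothetical, and the flip-flops are represented as a list of bits with rank 0 first.

```python
def to_bits(value, width):
    """Return the bits of value as a list, rank 0 first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Rebuild an integer from a list of bits, rank 0 first."""
    return sum(bit << k for k, bit in enumerate(bits))


def clock_edge(bits, rank):
    """Apply one clock edge with the programming input of the given rank active.

    The flip-flop of that rank toggles systematically; a higher flip-flop k
    toggles only if all flip-flops from rank to k-1 were at 1.
    """
    new_bits = list(bits)
    toggle = True                      # the programmed rank always toggles
    for k in range(rank, len(bits)):
        if toggle:
            new_bits[k] ^= 1
        toggle = toggle and bits[k] == 1
    return new_bits
```

Under this model, one edge with rank r active adds 2^r to the counter content, modulo the counter capacity.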
According to an embodiment, the second counter may be configured to count modulo p+1, and the multiplier may include means or circuitry for excluding the current term AiBj when the second counter contains j=p. The second counter may be configured to count modulo p, and the multiplier may include means or circuitry for incrementing the first counter by an increment greater than or equal to 2 upon each revolution of the second counter.
According to an embodiment, the first counter may be configured to count up and the second counter may be configured to count down. According to an embodiment, the means or circuitry for producing the terms AiBj may comprise two multiplexers respectively receiving the numbers A and B, and whose select commands are respectively supplied by the first and second counters. An AND gate may combine the outputs of the multiplexers.
According to an embodiment, the means or circuitry for excluding the term AiBj may comprise a gate configured to cancel the term when the second counter contains p. According to an embodiment, to exclude the term AiBj, the multiplexer associated with number B may comprise a (p+1)-th input selectable by the content (j) of the second counter, and which receives the value 0.
a is a schematic diagram of an asynchronous programmable increment counter in accordance with an embodiment of the present invention.
b is a schematic diagram of a synchronous programmable increment counter in accordance with an embodiment of the present invention.
In a BIST device, a high calculation speed of the multiplier may be relatively unimportant. In most situations, it is possible to merely provide a serial multiplier producing a result in at least n^2 clock cycles, where n is the size of the multiplicands. Even though existing serial structures are particularly simple, the aim is to further reduce the silicon surface area occupied.
The multiplicands A and B are supplied to respective multiplexers MUX, of n bits to 1 and of p bits to 1. The select inputs of the multiplexers respectively receive the weights i and j of the bits Ai and Bj to be selected in the numbers A and B, where i varies between 0 and n−1, and j varies normally between 0 and p−1. The selected bits Ai and Bj are combined by an AND gate 10 to produce the current term AiBj.
The weight i used to select the bit Ai is supplied by a counter CNT-A configured to count modulo n, while the weight j used to select the bit Bj is supplied by a counter CNT-B configured to count modulo p+1. The counters CNT-A and CNT-B are paced by a system clock CK and their initial content can be programmed by initialization lines INIT.
It is noted, with this configuration, that the weight j can reach the value p, which is a value not corresponding to any bit of number B, since the maximum weight of the bits of number B may be p−1. However, this possibility is authorized and, when it occurs, the value of the current term AiBj is ignored. To ignore such a term, given the structure of the multiplier, it is sufficient to cancel it. This is done, for example, using an AND gate 12 which combines the output of gate 10 and a masking signal MK, active when the counter CNT-B contains value p.
In the example represented, n=p=4. The counter CNT-A is a counter on 2 bits, while the counter CNT-B is a counter on 3 bits. The counter CNT-B can theoretically count between 0 and 7, but may be configured in practice to count between 0 and 4. The masking signal MK is then the value of the most significant bit MSB of the counter CNT-B. This most significant bit may furthermore not be part of the value j supplied to the multiplexer, as represented.
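The masking arrangement of the example n=p=4 may be sketched as follows. This Python model is illustrative only; the name masked_term is hypothetical, and it is assumed that the masking signal MK acts on gate 12 in inverted form so that the term is cancelled when MK is at 1.

```python
def masked_term(a: int, b: int, cnt_a: int, cnt_b: int) -> int:
    """Term AiBj for n = p = 4, with a 3-bit counter CNT-B counting 0 to 4."""
    mk = (cnt_b >> 2) & 1        # masking signal MK: most significant bit of CNT-B
    j = cnt_b & 0b11             # only the two low bits reach the multiplexer
    ai = (a >> cnt_a) & 1        # multiplexer associated with A
    bj = (b >> j) & 1            # multiplexer associated with B
    term = ai & bj               # AND gate 10
    return term & (1 - mk)       # gate 12 (assumed inverted MK) cancels the term
```

When CNT-B contains 4, the two low bits select B0, but the term is forced to 0 whatever the operands.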
According to an alternative, to ignore the term AiBj when the weight j is not within the limits, the multiplexer associated with the number B can comprise, as represented in dotted lines, five inputs instead of four, and the fifth input, that is selected when j=4, receives the value 0 constantly. In this case, as represented in dotted lines, the most significant bit MSB of the counter CNT-B is part of the value j supplied to the multiplexer.
Generally speaking, for a number B of p bits and a counter CNT-B counting modulo p+1, the multiplexer associated with the number B comprises p+1 inputs, the last of which, selected when j=p, receives 0 constantly. The terms AiBj produced sequentially by the gate 10 are supplied to a 1-to-7 demultiplexer DMUX. The select input of the demultiplexer receives the output of an adder 14 that supplies the sum of the weights i and j of the current term AiBj, these weights being respectively supplied by the counters CNT-A and CNT-B.
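The multiplexer with p+1 inputs and the routing of the term by the demultiplexer may be modelled as below. This is a sketch under assumptions: the names mux_b and route_term are hypothetical, and a select value beyond the last output line is assumed to drive no line, which is harmless since the term is then 0.

```python
def mux_b(b: int, j: int, p: int) -> int:
    """Multiplexer on B with p+1 inputs; the last input is tied to 0."""
    inputs = [(b >> k) & 1 for k in range(p)] + [0]
    return inputs[j]


def route_term(term: int, i: int, j: int, lines: int = 7) -> list:
    """Demultiplexer DMUX: place the term on the output line of rank i+j."""
    outputs = [0] * lines
    select = i + j               # adder 14
    if select < lines:           # assumption: out-of-range select drives nothing
        outputs[select] = term
    return outputs
```

For instance, a term at 1 with i=2 and j=3 appears on the line of rank 5 only.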
Therefore, the output line of the demultiplexer towards which the current term AiBj is directed is representative of the left shift to be performed to sum the term, i.e. of the rank of the column in which the term is arranged according to the representation in
The summation of the terms AiBj is done using a programmable increment counter CNT-R, paced by the clock CK. The increments are programmable by powers of 2. The size of the counter in bits is n+p, that of the result R to be produced, i.e. eight bits R0 to R7 in the example. The counter comprises n+p−1=7 increment programming inputs, respectively for the bits R0 to R6, receiving the outputs of the demultiplexer DMUX.
The operation of the counter CNT-R is as follows. When a programming input of rank k is on 1, the value formed by the bits of rank k and above is incremented by 1 upon the next active edge of the clock CK, which amounts to incrementing the counter by 2^k. In theory, several programming inputs can be on 1, but in the structure in
Upon the first clock cycle CK, the term A0B0 is added to the position 0 of the counter CNT-R. Upon the second cycle, the counter CNT-B contains the value 4, which is not within the limits. The term A1B4, not being defined, is cancelled and the content of the counter CNT-R remains unchanged. Upon the next two cycles, the terms A2B3 and A3B2 are successively added to the position 5 of the counter CNT-R, and so on.
After 20 cycles, all the terms AiBj have been added to the corresponding positions of the counter CNT-R, which then contains the result R of the multiplication. It can be seen that the weights i and j of the terms AiBj vary in no particular order during the clock cycles. This is unimportant, as addition is commutative: the final result is the same provided that the terms are taken into account with the proper shift and are all scanned. These properties are thus used to generate the sequence of the terms AiBj in a relatively simple way using two simple counters counting continuously.
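The 20-cycle sequence described above may be reproduced with the following Python sketch, in which the counter CNT-A counts up modulo n and the counter CNT-B counts down modulo p+1, both starting from 0. The function name serial_multiply is hypothetical, and the model assumes that n and p+1 have no common factor, as in the example n=p=4.

```python
def serial_multiply(a: int, b: int, n: int = 4, p: int = 4) -> int:
    """Accumulate the terms AiBj over n*(p+1) clock cycles."""
    cnt_a, cnt_b, cnt_r = 0, 0, 0
    for _ in range(n * (p + 1)):
        if cnt_b < p:                              # j = p is a phantom cycle
            term = ((a >> cnt_a) & 1) & ((b >> cnt_b) & 1)
            cnt_r += term << (cnt_a + cnt_b)       # increment of 2^(i+j)
        cnt_a = (cnt_a + 1) % n                    # CNT-A counts up modulo n
        cnt_b = (cnt_b - 1) % (p + 1)              # CNT-B counts down modulo p+1
    return cnt_r
```

The first pairs produced are (0, 0), (1, 4), (2, 3) and (3, 2), in agreement with the cycles described above.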
The price of this relative simplicity is the generation of phantom cycles corresponding to the phases in which the counter CNT-B is not within the limits. In the example in
By starting the counters from the values 2 and 3, i.e. by starting just after the first phantom cycle, as represented by a range F1, the multiplication is performed in 19 cycles instead of 20. After the 19th cycle, instead of ending on the values 1 and 4, corresponding to a phantom cycle, the counters are forced to their starting values 2 and 3. Each consecutive multiplication is then also performed in 19 cycles.
Generally speaking, the proportion of phantom cycles is 1 for each revolution of the counter CNT-B, or of the counter that has the greatest number of cycles to count. Therefore, this proportion decreases when the number of bits of the multiplicands increases. The proportion can be further decreased by selecting the starting values of the counters just after a phantom cycle, which enables the number of phantom cycles per multiplication phase to be reduced by 1.
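The effect of the starting values on the number of phantom cycles may be verified by enumerating the pairs (i, j). The sketch below is illustrative; the names schedule and phantom_count are hypothetical, and the counters are those of the example (CNT-A counting up modulo 4, CNT-B counting down modulo 5).

```python
def schedule(start_a: int, start_b: int, cycles: int, n: int = 4, p: int = 4) -> list:
    """List the pairs (i, j) seen over the given number of clock cycles."""
    pairs = []
    i, j = start_a, start_b
    for _ in range(cycles):
        pairs.append((i, j))
        i = (i + 1) % n
        j = (j - 1) % (p + 1)
    return pairs


def phantom_count(pairs: list, p: int = 4) -> int:
    """Count the cycles in which CNT-B is outside the limits (j = p)."""
    return sum(1 for _, j in pairs if j == p)
```

Starting from (0, 0), 20 cycles are needed and 4 of them are phantom cycles; starting from (2, 3), the same 16 useful pairs are scanned in 19 cycles with 3 phantom cycles.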
Therefore, upon the first clock cycle, the counters CNT-A and CNT-B have just been initialized to their default values 0 and 4. The first cycle is a phantom cycle, since the value 4 is not within the limits. The terms AiBj are then produced according to a sequence different from
In the example in
Upon the first cycle, the counters have been reset and each contain the value 0. During the first four clock cycles, the contents of the counters CNT-A and CNT-B vary identically. Upon the fifth cycle, a phantom cycle, the counter CNT-A overflows and comes back to 0, while the counter CNT-B reaches its maximum value 4, to come back to 0 upon the sixth cycle.
The position in the result of the terms AiBj changes upon each cycle. This enables two counters of the same nature to be used starting from their reset value.
The multiplication is over at the 19th cycle, and the 20th cycle is a phantom cycle. This phantom cycle can thus be removed in consecutive multiplications by forcing the counters to reset at the end of each 19th cycle.
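The variant in which both counters count up from their reset value may be sketched as follows. The function name multiply_counting_up is hypothetical; the cycle numbers returned are 1-based so that they can be compared with the cycles mentioned above.

```python
def multiply_counting_up(a: int, b: int, n: int = 4, p: int = 4, cycles: int = 19):
    """Return the product and the list of phantom cycle numbers."""
    cnt_a = cnt_b = 0                  # both counters start from reset
    cnt_r = 0
    phantom_cycles = []
    for cycle in range(1, cycles + 1):
        if cnt_b == p:                 # CNT-B outside the limits
            phantom_cycles.append(cycle)
        else:
            term = ((a >> cnt_a) & 1) & ((b >> cnt_b) & 1)
            cnt_r += term << (cnt_a + cnt_b)
        cnt_a = (cnt_a + 1) % n        # CNT-A counts up modulo n
        cnt_b = (cnt_b + 1) % (p + 1)  # CNT-B counts up modulo p+1
    return cnt_r, phantom_cycles
```

With 19 cycles the phantom cycles are the 5th, 10th and 15th; a 20th cycle would only add a further phantom cycle.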
a schematically represents one embodiment of a programmable increment counter CNT-R. This embodiment is shown based on an asynchronous counter architecture.
An asynchronous counter comprises a series of cascade-connected T flip-flops, the clock input of each flip-flop receiving the Q/ inverted output of the previous flip-flop. The clock input of the first flip-flop receives the counting clock, and each flip-flop receives the value 1 at its toggle input T. With this configuration, each flip-flop toggles every time the Q/ output of the previous flip-flop has a rising edge, i.e. when the previous flip-flop toggles from 1 to 0.
In
The selection commands of the multiplexers are the increment programming inputs connected to the outputs of the demultiplexer DMUX in
When a programming input of rank k is on 1, the multiplexer directs the clock CK towards the flip-flop Rk. Therefore, if only this programming input is on 1, the others being on 0, the flip-flops Rk to R7 act like an (8−k)-bit counter that is incremented upon the next falling edge of the clock CK, while the flip-flops R0 to Rk−1 are fixed.
b schematically represents one embodiment of a programmable increment counter with a synchronous structure. A synchronous counter comprises a series of T flip-flops paced simultaneously by the same clock CR. The toggle input T of each flip-flop is connected to be active only if all the lower-ranking flip-flops are on 1. This is achieved by a series of cascade-connected AND gates 15. Each AND gate controls the T input of an associated flip-flop and combines the Q output of the previous flip-flop with the output of the previous AND gate. The T input of the first flip-flop normally receives the value 1 continuously, and the AND gate preceding the second flip-flop is thus optional.
To make the increment programmable, an OR gate is provided between each AND gate 15 and the T input of the associated flip-flop. A first input of the OR gate receives the output of the AND gate, and a second input of the OR gate forms an increment programming input, the rank of which is the one of the associated flip-flop. With this configuration, when a programming input of rank k is put to 1, the OR gate forces to 1 the T input of the flip-flop of rank k. The flip-flop then toggles upon each clock edge independently of the states of the lower-ranking flip-flops, incrementing the content of the counter by 2^k upon each cycle.
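A gate-level model of this synchronous structure may look as follows. It is a sketch resting on two assumptions that the text does not spell out at this point: each AND gate 15 is taken to receive the T input (OR gate output) of the previous stage, so that the carry chain restarts at the programmed rank, and the T input of the first flip-flop is taken to be driven by the programming input of rank 0. The name sync_step is hypothetical, and a single programming input is assumed to be active at a time.

```python
def sync_step(q: list, prog: list) -> list:
    """One clock edge of the synchronous programmable increment counter.

    q and prog are lists of bits with rank 0 first.
    """
    t = [0] * len(q)
    t[0] = prog[0]                        # first flip-flop (assumption, see lead-in)
    for k in range(1, len(q)):
        and_out = q[k - 1] & t[k - 1]     # AND gate 15 (assumed fed by previous T input)
        t[k] = and_out | prog[k]          # OR gate with the programming input of rank k
    return [qk ^ tk for qk, tk in zip(q, t)]   # each T flip-flop toggles when T = 1
```

With only the programming input of rank k at 1, one edge adds 2^k to the counter content, the flip-flops below rank k being left unchanged.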
The T input of the first flip-flop directly receives the programming input of rank 0. The diagrams in
In the previous examples, the content of the counter CNT-B is shifted upon each revolution due to the fact that it counts one step more per revolution. The same effect could be obtained with a counter CNT-B that counts modulo p instead of p+1, but that is stopped for one cycle after each revolution. In this case, in
The shifting by one step between the counters upon each revolution is obtained in the following manner, for example. Every time the counter CNT-B reaches its maximum value p−1, the counter CNT-A is incremented by 2 instead of 1 upon the next cycle for a single cycle. The phantom cycle existing in the diagram in
In fact, it may not be necessary to set the threshold to p−1 to trigger the one-step shift. It is sufficient for the counter CNT-B to have completed a revolution, which can start at any value between 0 and p−1.
Furthermore, the increment used for the shift can be different from 2. The set of values that the increment can take depends on the properties of the operands. When the operands have an equal number of bits, the increment can take as a value any whole number Ic ranging between 2 and p−1, such that Ic−1 and p are prime to each other. The possible values of the increment Ic are summarized in the following table for examples of operand sizes.
Those skilled in the art will be able to find definition sets for the increment Ic when the operands have different sizes.
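For operands of equal size, the stated condition on the increment Ic may be evaluated, and checked against a brute-force scan of the pairs (i, j), with the Python sketch below. The names allowed_increments and covers_all_pairs are hypothetical, and the scan assumes that the increment Ic is applied when the counter CNT-B contains p−1.

```python
from math import gcd


def allowed_increments(p: int) -> list:
    """Values Ic between 2 and p-1 such that Ic-1 and p are coprime."""
    return [ic for ic in range(2, p) if gcd(ic - 1, p) == 1]


def covers_all_pairs(p: int, ic: int) -> bool:
    """Check by simulation that p*p cycles visit every pair (i, j) once."""
    seen = set()
    i = j = 0
    for _ in range(p * p):
        seen.add((i, j))
        i = (i + (ic if j == p - 1 else 1)) % p   # shifted once per revolution
        j = (j + 1) % p
    return len(seen) == p * p
```

For p=4, the condition leaves only the increment 2, since an increment of 3 shifts the counters by two steps per revolution and scans only half of the pairs.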
The counter CNT-B is configured to count up modulo p instead of p+1. The masking circuit shown in
The counter CNT-A is a programmable increment counter. It could have the same structure as that of
A comparator 16 observes the output j of the counter CNT-B. When j is equal to p−1 (3 in the example), the comparator programs the increment +2 of the counter CNT-A. When j is different from p−1, i.e. in all the other cases, the comparator programs the increment +1 of the counter CNT-A.
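The operation of the comparator 16 may be illustrated by the following Python sketch for operands of equal size. The function name multiply_without_phantom is hypothetical; both counters are assumed to start from 0.

```python
def multiply_without_phantom(a: int, b: int, n: int = 4, p: int = 4) -> int:
    """Multiply in n*p cycles, shifting CNT-A once per revolution of CNT-B.

    The model is checked here for equal operand sizes (n = p).
    """
    cnt_a = cnt_b = cnt_r = 0
    for _ in range(n * p):
        term = ((a >> cnt_a) & 1) & ((b >> cnt_b) & 1)
        cnt_r += term << (cnt_a + cnt_b)
        increment = 2 if cnt_b == p - 1 else 1    # comparator 16
        cnt_a = (cnt_a + increment) % n
        cnt_b = (cnt_b + 1) % p                   # CNT-B counts up modulo p
    return cnt_r
```

For n=p=4, the 16 pairs (i, j) are scanned in 16 cycles, none of which is a phantom cycle.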
The serial multiplier embodiments described herein do a multiplication in n×p cycles, where n and p are the respective sizes of the operands A and B. When the sizes are equal and a power of 2, it may be possible to divide the number of cycles by 2 by increasing the surface area of the circuit according to an acceptable compromise.
To do so, provision is made to produce two distinct simultaneous sequences of terms AiBj and to sum them two by two. The first sequence, noted AiBj, is produced directly by the counters CNT-A and CNT-B, and the second sequence, noted Ai/Bj/ is produced by the complements of the counters CNT-A and CNT-B. In addition, the manner of shifting the counter CNT-A is changed so that all the terms to be summed are produced without any duplications or missing terms.
The diagram indicates the resulting sequences for the values i+j, i/+j/, and the complements CNT-A/, CNT-B/, of the counters CNT-A and CNT-B. It can be seen that all the combinations of the pairs (i, j) are represented over 8 cycles in the values taken by the counters or their complements.
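One possible pair of sequences having this property for n=p=4 may be generated as below. The shift rule used, an increment of 2 for the counter CNT-A whenever the counter CNT-B contains 1 or 3 and of 1 otherwise, is an assumption of this sketch rather than a statement of the diagram; the name paired_schedule is hypothetical.

```python
def paired_schedule():
    """Return the direct and complemented sequences of pairs for n = p = 4."""
    direct, mirrored = [], []
    i = j = 0
    for _ in range(8):                        # 16 terms produced two at a time
        direct.append((i, j))
        mirrored.append((3 - i, 3 - j))       # complements of the 2-bit counters
        increment = 2 if j in (1, 3) else 1   # assumed shift rule for CNT-A
        i = (i + increment) % 4
        j = (j + 1) % 4
    return direct, mirrored
```

Under this assumption, the two sequences together contain each of the 16 pairs (i, j) exactly once.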
The demultiplexers DMUX, DMUX′ are respectively controlled by the values i+j and i/+j/, produced respectively by the adder 14 and an adder 14′ receiving the complements of the contents of the counters CNT-A and CNT-B. The sum of the two numbers formed by the outputs of the demultiplexers is used to program the increment of the counter CNT-R. In the example, two 6-bit numbers are summed, which may require, in theory, a comprehensive 6-bit adder. In reality, given the special structure used, it transpires that a simple OR gate is sufficient to add the most significant bits and the least significant bits. Indeed, as shown in the diagram in
It is for the central rank, 3 in the example, that two bits which can both be on 1 are summed. To do so, a half adder formed, as represented, by an exclusive-OR (XOR) gate and an AND gate, each receiving the outputs of rank 3 of the two demultiplexers, is provided. The XOR gate supplies the bit of rank 3 of the sum and the AND gate supplies a carry digit. The bit of rank 4 of the sum is formed by an OR of the outputs of rank 4 of the two demultiplexers, followed by an OR with the carry digit. There is no other carry digit to be taken into account as, if the half adder produces a carry digit, this means that all the outputs of the demultiplexers, except those of rank 3, are zero.
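The reduced adder described above may be checked against an ordinary addition with the following Python sketch. The name merged_increment and its parameters are hypothetical; the demultiplexer outputs are modelled on the ranks 0 to 6, the second term being routed to the rank 6−s when the first is routed to the rank s.

```python
def merged_increment(s: int, term: int, term_c: int) -> int:
    """Combine the two demultiplexer outputs into one increment value.

    s is the rank i+j of the first term; the complemented term has rank 6-s.
    """
    d1 = [0] * 7
    d2 = [0] * 7
    d1[s] = term
    d2[6 - s] = term_c
    out = [x | y for x, y in zip(d1, d2)]   # simple OR gates on all ranks
    out[3] = d1[3] ^ d2[3]                  # XOR gate: bit of rank 3 of the sum
    carry = d1[3] & d2[3]                   # AND gate: carry digit
    out[4] = out[4] | carry                 # OR of rank 4 with the carry digit
    return sum(bit << k for k, bit in enumerate(out))
```

The only case where two bits of the same rank can both be at 1 is s=3, which is why a single half adder is sufficient.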
The comparator 16, which determines the value of the increment to be applied to the counter CNT-A, is configured to compare the content of the counter CNT-B with the values 1 and 3, i.e. the median value and the maximum value of the content of the counter. With this structure, the increment programmed for the counter CNT-R can be of any type, and different from a power of 2. In other words, several increment programming inputs can be simultaneously on 1. The counter structures in
In addition, a logic cell 18 is associated with each flip-flop to both “request” a toggle from the next cell through an AND gate 20 and decide to toggle the current flip-flop through an XOR gate 22, the output of which is connected to the T input of the flip-flop. A toggle “request” is equivalent, in a conventional synchronous counter, to the indication that all the lower-ranking flip-flops are at “1” and thus that the next flip-flop must toggle. In the structure of
The cell 18 is configured to toggle the associated flip-flop upon the next clock edge CK, if one of the following conditions is met:
a) The programming input is at 0 and the previous cell requests the toggle of the flip-flop (case of the conventional synchronous counter).
b) The programming input is at 1, and the previous cell does not request the toggle of the flip-flop (case of the counter in
This functionality is obtained using the XOR gate 22, a first input of which is connected to the programming input, and a second input of which receives the output of the AND gate 20 of the previous cell.
The cell 18 is further configured to freeze the flip-flop if one of the following conditions is met:
c) The programming input is at 0 and the previous cell does not request the toggle of the flip-flop (case of the conventional synchronous counter).
d) The programming input is at 1, and the previous cell requests the toggle of the flip-flop. In other words, if the toggle is “requested” twice, the flip-flop does not toggle, which in fact amounts to toggling the flip-flop twice. This occurs when a lower-ranking flip-flop has its programming input at 1.
This functionality is also obtained using the XOR gate 22. A toggle request is produced by the AND gate 20 in the following cases:
e) In case a) when the flip-flop contains 1.
f) In case b) when the flip-flop contains 1.
g) In case d), independently of the content of the flip-flop. It is assumed that the flip-flop has toggled twice even though it was frozen. If it contains 0, this means that the flip-flop transitioned to 1 (virtually) and that a corresponding toggle request must be propagated.
The functions e) and f) are obtained by an OR gate 24 combining the toggle request of the previous cell and the programming input. A first input of the AND gate 20 receives the output of gate 24, and a second input receives the Q output of the flip-flop.
The function g) is obtained by an AND gate 26 combining the toggle request of the previous cell and the programming input. An OR gate 28 is inserted between the Q output of the flip-flop and gate 20, and forces the second input of gate 20 to 1 when the output of gate 26 is active.
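The behaviour of the cell 18, as defined by the gates 20 to 28, may be modelled as follows. The Python sketch is illustrative; the names cell_18 and counter_step are hypothetical, and the toggle request entering the first cell is assumed to be 0.

```python
def cell_18(q: int, prog: int, req_in: int):
    """Return the next flip-flop state and the toggle request to the next cell."""
    t = prog ^ req_in            # XOR gate 22 drives the T input
    g24 = req_in | prog          # OR gate 24
    g26 = req_in & prog          # AND gate 26
    g28 = q | g26                # OR gate 28
    req_out = g24 & g28          # AND gate 20
    return q ^ t, req_out        # a T flip-flop toggles when T = 1


def counter_step(q_bits: list, prog_bits: list) -> list:
    """One clock edge of the counter; both lists have rank 0 first."""
    req = 0                      # assumption: no request enters the first cell
    new_bits = []
    for q, prog in zip(q_bits, prog_bits):
        new_q, req = cell_18(q, prog, req)
        new_bits.append(new_q)
    return new_bits
```

In this model each cell behaves like one stage of a ripple-carry adder, so that a clock edge adds the value present on the programming inputs to the counter content, whatever the number of inputs at 1.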
Number | Date | Country | Kind |
---|---|---|---|
1254421 | May 2012 | FR | national |
Number | Date | Country | |
---|---|---|---|
20130304787 A1 | Nov 2013 | US |