This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2021-203643, filed on Dec. 15, 2021; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a calculation system.
In a calculation system, there is a case where predetermined arithmetic operations are performed and a plurality of signals are generated as arithmetic operation results. In the calculation system, it is desirable that the plurality of signals thus generated should be usable efficiently.
In general, according to one embodiment, there is provided a calculation system including a plurality of multiplying elements, a plurality of adding elements, a first processing circuit, and a second processing circuit. The plurality of multiplying elements are arrayed to form a plurality of rows and a plurality of columns, and are configured to multiply a plurality of first signals by respective weights to generate a plurality of calculation results. The plurality of adding elements are configured to calculate a sum of calculation results in each column among the plurality of calculation results to generate a plurality of second signals individually for the plurality of columns. The first processing circuit is configured to receive the plurality of second signals generated by the adding elements, and to extract values corresponding to certain second signals among the plurality of second signals. The second processing circuit includes a plurality of address circuits corresponding to the plurality of second signals, and is configured to selectively enable address circuits corresponding to the certain second signals among the plurality of address circuits.
Exemplary embodiments of a calculation system will be explained below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments.
A calculation system 1 according to a first embodiment includes, for example, a circuit that performs part of the processing of a neural network. As illustrated in
The calculation system 1 illustrated in
The memory array MA includes a plurality of memories M(j, i) to M(j+3, i+3). In the memory array MA, a plurality of memories M(j, i) to M(j+3, i+3) are arranged, in a matrix format, at the positions where the plurality of word lines WLj to WLj+3 intersect with the plurality of bit lines BLi to BLi+3 (each of “i” and “j” is an integer greater than or equal to 1). Here,
Each of the memories M(j, i) to M(j+3, i+3) has one end connected to a word line WL and the other end connected to a bit line BL. Each of the memories M(j, i) to M(j+3, i+3) is a resistance change type memory, for example, in which the resistance state can be set to a resistance value according to the corresponding one of weights Wj, i to Wj+3, i+3. The resistance values of the respective memories M(j, i) to M(j+3, i+3) can be set to, for example, 1/Wj, i to 1/Wj+3, i+3. Each of the memories M(j, i) to M(j+3, i+3) functions as a multiplying element that multiplies a received signal by the corresponding one of the weights Wj, i to Wj+3, i+3, and generates a signal of the multiplication result. In each of the memories M(j, i) to M(j+3, i+3), the voltage X of the word line WL is applied to one end, and, in accordance with the voltage X of the word line WL and the set weight W, a current is caused to flow through the bit line BL as a multiplication result. The currents of the memories M in each column are added up on the bit line BL and form a current Y as an addition result. That is, each of the bit lines BLi to BLi+3 functions as an adding element that adds up signals from a plurality of memories M arranged in the column direction.
For example, as shown by dotted line arrows in
Alternatively, although not illustrated, the voltage Xj of the word line WLj of the (j)th row is applied to one end of the memory M(j, i+3), and a current Xj×Wj, i+3 is caused to flow from the other end of the memory M(j, i+3) to the bit line BLi+3 of the (i+3)th column. The voltage Xj+3 of the word line WLj+3 of the (j+3)th row is applied to one end of the memory M(j+3, i+3), and a current Xj+3×Wj+3, i+3 is caused to flow from the other end of the memory M(j+3, i+3) to the bit line BLi+3 of the (i+3)th column. The currents Xj×Wj, i+3 to Xj+3×Wj+3, i+3 are added up on the bit line BLi+3, and become, as an addition result, a current Yi+3 (= Xj×Wj, i+3 + Xj+1×Wj+1, i+3 + Xj+2×Wj+2, i+3 + Xj+3×Wj+3, i+3).
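The product-sum operation performed by the memory array MA can be illustrated numerically. The following Python sketch is only a behavioral model under stated assumptions: the function name column_currents and the sample voltages and weights are hypothetical, and ideal elements are assumed, so that each memory contributes a current X×W and each bit line simply sums the currents of its column.

```python
def column_currents(x, w):
    """Return Y[i] = sum over rows j of x[j] * w[j][i] for every column i.

    x: word line voltages, one per row (hypothetical sample values).
    w: weights, w[j][i] for row j and column i (hypothetical sample values).
    """
    n_rows = len(w)
    n_cols = len(w[0])
    if len(x) != n_rows:
        raise ValueError("one word line voltage is needed per row")
    # Each memory multiplies (x[j] * w[j][i]); each bit line adds its column.
    return [sum(x[j] * w[j][i] for j in range(n_rows)) for i in range(n_cols)]


# Hypothetical 4x4 example: four word lines, four bit lines.
x = [1.0, 0.5, 0.0, 2.0]
w = [
    [1, 0, 2, 1],
    [0, 2, 0, 1],
    [3, 1, 1, 0],
    [1, 1, 0, 2],
]
y = column_currents(x, w)
print(y)  # [3.0, 3.0, 2.0, 5.5]
```

In this model the last entry of y plays the role of the current Yi+3, that is, the sum of the four products along the (i+3)th column.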
The processing circuit 2 is supplied with the currents Yi to Yi+3 through the bit lines BLi to BLi+3. The currents Yi to Yi+3 correspond to voltages Vi to Vi+3 accumulated at the input nodes of the processing circuit 2. Each of the voltages Vi to Vi+3 is an analog signal representing a product-sum operation result for each column. The processing circuit 2 AD-converts the analog signal (voltage V) of each column to a digital signal. The processing circuit 2 extracts the digital signals of the upper K values from the digital signals of the plurality of columns, and generates a plurality of upper flag values. The plurality of upper flag values correspond to the digital signals of the plurality of columns. Each of the upper flag values indicates whether the corresponding signal is among the upper K values.
The address solution circuit 3 includes a plurality of address circuits that correspond to a plurality of columns. Each address circuit is configured to output an address signal. The address signal indicates the address of the corresponding column. The address solution circuit 3 obtains a plurality of upper flag values from the processing circuit 2. In accordance with the plurality of upper flag values, the address solution circuit 3 selectively enables the address circuits corresponding to the upper K digital signals among the plurality of address circuits. The address solution circuit 3 causes address signals to be sequentially output from the K enabled address circuits.
As a result, for the upper K digital signals, the address solution can be performed by processing in K cycles. Therefore, the address solution can be performed more efficiently as compared with a case where the address solution is performed by processing in cycles corresponding to the number of columns.
Next, an explanation will be given of the configuration of the processing circuit 2 with reference to
The processing circuit 2 performs a plurality of SAR (successive approximation register) type AD-conversion operations in parallel on the signals that have been received from the plurality of bit lines BL and correspond to the product-sum operation results of the plurality of columns, while searching for the upper K signals among the signals. The processing circuit 2 includes a plurality of local circuits 21-i, 21-(i+1), etc., a global circuit 22, and a controller 23. The global circuit 22 includes a DAC 221 for the global SAR and a parallel counter 222.
For simplification,
The local circuit 21 for each column includes a comparator 211 and a logic circuit 212. The logic circuit 212 includes an AND gate 213, a flip-flop 214, and an AND gate 215. The local circuits 21 for the respective columns have the same configuration.
The comparator 211 compares an input signal Vi, Vi+1 with a global reference signal VDAC supplied by the DAC 221 for the global SAR. The comparator 211 outputs a local signal yi, yi+1, which has been binarized (L/H or 0/1), as a comparison result in accordance with a clock CLK1. The comparator 211 receives a disable signal DISABLEi, DISABLEi+1 from the logic circuit 212. The comparator 211 is disabled in accordance with the disable signal DISABLEi, DISABLEi+1.
The AND gate 213 calculates a logical product between a logically inverted signal of the local signal yi, yi+1 and a global signal TOP_K, and outputs the calculation result to the flip-flop 214. The AND gate 215 calculates a logical product between an upper flag MAXi, MAXi+1 and a clock CLK2, and outputs the calculation result to the flip-flop 214 as a clock signal. The flip-flop 214 receives the calculation result of the AND gate 213 at a data input terminal D and receives the calculation result of the AND gate 215 at a clock input terminal.
The flip-flop 214 outputs, from an inversion output terminal nQ, an upper flag MAXi, MAXi+1, which indicates whether a signal Vi, Vi+1 of the corresponding column can be a signal among the upper K values, among the signals V of the plurality of columns input to the processing circuit 2. The flip-flop 214 may be a latch circuit. The flip-flop 214 outputs, from a non-inversion output terminal Q, a disable signal DISABLEi, DISABLEi+1 for disabling the comparator 211 when the signal Vi, Vi+1 of the corresponding column cannot be a signal among the upper K values. The disable signal DISABLEi, DISABLEi+1 can be used to limit (power gating) the power to be consumed by a plurality of comparators 211 during a sequential comparison (SAR) operation by the processing circuit 2.
The flip-flop 214 operates in synchronization with a clock signal output from the AND gate 215. The AND gate 215 is provided to restrict the clock CLK2 (clock gating) so as to prevent the state of the disable signal DISABLEi, DISABLEi+1 from changing when the upper flag MAXi, MAXi+1 is at the L level. That is, when the upper flag MAX is at the L level, the flip-flop 214 of a column, in which the comparator 211 has been disabled, keeps the disable signal DISABLE at the H level and keeps the upper flag MAX at the L level, in accordance with the clock signal received by the clock input terminal being fixed at the L level.
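The behavior of the logic circuit 212 described above can be summarized by a small behavioral model. The following Python sketch is an illustration only: the class name LocalLogic and the method name clk2_edge are hypothetical, and the flip-flop 214 is assumed to start with the upper flag MAX at the H level and the disable signal DISABLE at the L level.

```python
class LocalLogic:
    """Behavioral model of the logic circuit 212 of one column (hypothetical names)."""

    def __init__(self):
        # Assumed initial state: column not yet excluded.
        self.disable = 0   # non-inversion output Q of the flip-flop 214
        self.max_flag = 1  # inversion output nQ of the flip-flop 214

    def clk2_edge(self, y, top_k):
        """Apply one pulse of the clock CLK2 with local signal y and global signal TOP_K."""
        # AND gate 215: the clock reaches the flip-flop only while MAX is at the H level.
        if self.max_flag:
            # AND gate 213: D = (NOT y) AND TOP_K.
            d = (1 - y) & top_k
            self.disable = d
            self.max_flag = 1 - d
        # With MAX at the L level the clock is gated and the state is held.
        return self.max_flag, self.disable


col = LocalLogic()
print(col.clk2_edge(y=1, top_k=1))  # (1, 0): column stays a candidate
print(col.clk2_edge(y=0, top_k=0))  # (1, 0): determination suspended
print(col.clk2_edge(y=0, top_k=1))  # (0, 1): column excluded, comparator disabled
print(col.clk2_edge(y=1, top_k=1))  # (0, 1): state held by clock gating
```

The last call shows the effect of the clock gating: once the upper flag has fallen to the L level, later inputs no longer change the state.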
The inversion input terminals (−) of the comparators 211 for the respective columns are driven in parallel by the DAC 221 for the global SAR, and the signals V are sequentially processed from the most significant bit (MSB) to the least significant bit (LSB) in accordance with the SAR algorithm. The DAC 221 for the global SAR includes a global SAR register 221a and a global DAC 221b. The global SAR register 221a is a shift register including registers at a plurality of stages, in which the input value and the value at each stage are shifted in synchronization with the clock CLK1. The global SAR register 221a is configured to store “1” as the initial value in the register at the top stage at the time of its startup. The global DAC 221b receives the value at each stage of the shift register, performs DA conversion thereon, and outputs the converted analog voltage as the global reference signal VDAC.
Here, the controller 23 may be a local controller provided individually for each column, or may be a global controller provided for the respective columns in common. In
The input that determines the SAR transition in the global SAR register 221a is driven by the parallel counter 222.
The parallel counter 222 counts how many outputs among the local signals yi, yi+1, etc. output from the comparators 211 of the respective columns are at the H level (or 1), in each DA conversion cycle, and outputs the global signal TOP_K to the AND gate 213 for each column and the global SAR register 221a in accordance with the counted value. When the global signal TOP_K is supplied, the global SAR register 221a stores the value of the global signal TOP_K in the register at the first stage, and shifts the value held in the register at each stage.
In searching for the upper K values, the parallel counter 222 outputs the global signal TOP_K=H level (or 1) when the counted value is greater than or equal to K, and outputs the global signal TOP_K=L level (or 0) when the counted value is smaller than K. The concrete configuration of the parallel counter 222 may be implemented by a digital circuit or may be implemented by an analog circuit.
For example, the processing circuit 2 is operated as illustrated in
At timing t1, the DAC 221 for the global SAR sets the global reference voltage VDAC=VREF/2. The comparators 211 for the respective columns (the 0th column to the 7th column) compare the signals V0 to V7 with the global reference voltage VDAC=VREF/2, and output local signals (y0, y1, y2, y3, y4, y5, y6, y7)=(0, 1, 0, 1, 0, 1, 1, 1) as comparison results.
At timing t2, the parallel counter 222 counts the number of local signals whose value is 1, and causes the global signal TOP_K to transition from the L level (or 0) to the H level (or 1) since the counted value=5 is greater than or equal to K=4.
At timing t3, the logic circuit 212 for each of the 0th, 2nd, and 4th columns (i=0, 2, 4) changes the upper flag MAXi from H (or 1) to L (or 0) and changes the disable signal DISABLEi from L (or 0) to H (or 1), in light of the state where the signal Vi cannot be one of the upper K because the local signal yi=0 and the global signal TOP_K=1. Consequently, as shown by a dotted line in the waveform of the signal Vi, the comparator 211 for each of the 0th, 2nd, and 4th columns receives the disable signal DISABLEi=H (or 1) and is disabled in operation. Therefore, the power consumption of the comparator 211 is stopped (power gating).
On the other hand, the logic circuit 212 for each of the 1st, 3rd, and 5th to 7th columns (i=1, 3, 5 to 7) keeps the upper flag at MAXi=H (or 1) and keeps the disable signal at DISABLEi=L (or 0), in light of the state where the signal Vi can be one of the upper K because the local signal yi=1 and the global signal TOP_K=1.
At timing t4, the DAC 221 for the global SAR sets the global reference voltage VDAC=3VREF/4. Each of the comparators 211 for the 1st, 3rd, 5th, and 6th columns (i=1, 3, 5, 6) outputs the local signal yi=0, and, at timing t5, the global signal TOP_K=0 is made. This means that the number of signals higher than the global reference signal VDAC is smaller than K. In this case, since it is not possible to determine which signal of a plurality of signals V1, V3, V5 to V7 belongs to the upper K signals, the determination is suspended.
At timing t6, the DAC 221 for the global SAR sets the global reference voltage VDAC=5VREF/8. Although the comparator 211 for each of the 3rd and 6th columns (i=3, 6) outputs the local signal yi=1, as the comparator 211 for each of the 1st and 5th columns (i=1, 5) outputs the local signal yi=0, the global signal TOP_K=0 is kept. This means that the number of signals higher than the global reference signal VDAC is still smaller than K. Also in this case, the determination is kept suspended.
At timing t7, the DAC 221 for the global SAR sets the global reference voltage VDAC=9VREF/16. The comparators 211 for the respective columns (the 1st, 3rd, and 5th to 7th columns) not disabled at this time compare the signals V1, V3, and V5 to V7 with the global reference voltage VDAC=9VREF/16, and output local signals (y1, y3, y5, y6, y7)=(1, 1, 0, 1, 1) as comparison results.
At timing t8, the parallel counter 222 counts the number of local signals whose value is 1, and causes the global signal TOP_K to transition from the L level (or 0) to the H level (or 1) since the counted value=4 is greater than or equal to K=4.
At timing t9, the logic circuit 212 for the 5th column (i=5) changes the upper flag MAXi from H (or 1) to L (or 0) and changes the disable signal DISABLEi from L (or 0) to H (or 1), in light of the state where the signal Vi cannot be one of the upper K because the local signal yi=0 and the global signal TOP_K=1. Consequently, as shown by a dotted line in the waveform of the signal Vi, the comparator 211 for the 5th column receives the disable signal DISABLEi=H (or 1) and is disabled in operation. Therefore, the power consumption of the comparator 211 is stopped (power gating).
On the other hand, the logic circuit 212 for each of the 1st, 3rd, 6th, and 7th columns (i=1, 3, 6, 7) keeps the upper flag at MAXi=H (or 1) and keeps the disable signal at DISABLEi=L (or 0), in light of the state where the signal Vi can be one of the upper K because the local signal yi=1 and the global signal TOP_K=1.
When B denotes the number of bits according to the precision of conversion, the processing circuit 2 outputs, at timing t10 after B=4 cycles, the upper flags (MAX0, MAX1, MAX2, MAX3, MAX4, MAX5, MAX6, MAX7)=(0, 1, 0, 1, 0, 0, 1, 1) as the result of searching for the upper K values to the address solution circuit 3. In this example, it is illustrated that, as the searching result, the signals V1, V3, V6, and V7 of the 1st, 3rd, 6th, and 7th columns have been found as the upper K values among the signals V0 to V7 of the 0th column to the 7th column.
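The search sequence from timing t1 to timing t10 can be reproduced with a short behavioral simulation. The following Python sketch is an illustration under stated assumptions, not the circuit itself: the function name search_upper_k is hypothetical, the input voltages are hypothetical values chosen only so that the comparator outputs match those described above, VREF is normalized to 1, and the global SAR register is modeled as keeping a trial bit when TOP_K=1 and clearing it when TOP_K=0.

```python
def search_upper_k(v, k, bits, vref=1.0):
    """Return (upper flags MAX, list of VDAC values) for a global SAR search."""
    max_flag = [1] * len(v)  # every column starts as a candidate
    code = 0                 # bits kept so far in the global SAR register
    vdac_history = []
    for b in range(bits - 1, -1, -1):        # from the MSB to the LSB
        trial = code | (1 << b)
        vdac = vref * trial / (1 << bits)    # output of the global DAC
        vdac_history.append(vdac)
        # Comparators: disabled columns (MAX = 0) output 0.
        y = [1 if m and vi > vdac else 0 for m, vi in zip(max_flag, v)]
        # Parallel counter: TOP_K = 1 when at least K local signals are 1.
        top_k = 1 if sum(y) >= k else 0
        if top_k:
            # Columns with y = 0 cannot be among the upper K.
            max_flag = [m & yi for m, yi in zip(max_flag, y)]
            code = trial                     # keep the trial bit
        # When TOP_K = 0 the determination is suspended and the bit is cleared.
    return max_flag, vdac_history


# Hypothetical voltages V0 to V7 in units of VREF/16.
v = [x / 16 for x in (3, 9.5, 5, 11, 2, 8.5, 10.5, 13)]
flags, history = search_upper_k(v, k=4, bits=4)
print(history)  # [0.5, 0.75, 0.625, 0.5625]
print(flags)    # [0, 1, 0, 1, 0, 0, 1, 1]
```

With these assumed inputs, the reference voltage follows the sequence VREF/2, 3VREF/4, 5VREF/8, and 9VREF/16, and the resulting upper flags agree with the values output at timing t10.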
The address solution circuit 3 illustrated in
The address solution circuit 3 may be configured as illustrated in
As illustrated in
The plurality of address circuits 31-i to 31-(i+3) respectively correspond to the plurality of columns (the (i)th to (i+3)th columns) of the memory array MA. The shift register 33 includes a plurality of register circuits 32-i to 32-(i+3). The plurality of register circuits 32-i to 32-(i+3) are connected in series between an input node 33a and an output node 33b. Each register circuit 32 includes a register 321. The register 321 may be formed of a flip-flop. The plurality of register circuits 32-i to 32-(i+3) respectively correspond to the plurality of address circuits 31-i to 31-(i+3).
The global circuit 36 receives the clock CLK from the outside, and generates a clock CLK_TOPK, an enable signal TOPK_EN, and a pulse TOPK_START in accordance with the clock CLK. In synchronization with the clock CLK, the global circuit 36 sets the enable signal TOPK_EN to the active level and supplies this signal to the output circuit 34. The output circuit 34 is activated in accordance with the enable signal TOPK_EN being at the active level, and comes into a state ready to output signals present on the address bus addr<0:7> as address signals addr.
In response to the enable signal TOPK_EN changing to the active level, the global circuit 36 supplies the pulse TOPK_START to the register circuit 32-i at the top of the shift register 33. The global circuit 36 logically inverts the clock CLK to generate the clock CLK_TOPK, and supplies the clock CLK_TOPK to the register 321 of each register circuit 32.
The shift register 33 may be reconfigured in accordance with the upper flags MAXi of the plurality of columns. The address solution circuit 3 connects the register 321 between the input node 32a and the output node 32b, in each of the register circuits 32 corresponding to the upper K upper flag values among the plurality of register circuits 32-i to 32-(i+3). The address solution circuit 3 bypasses the register 321 between the input node 32a and the output node 32b, in each of the remaining register circuits 32. With this arrangement, the address solution circuit 3 reconfigures the shift register 33.
The reconfigured shift register 33 receives, as its input, the single pulse TOPK_START. The shift register 33 transmits the pulse TOPK_START by sequentially shifting this pulse between the registers 321 corresponding to the upper K upper flag values. Correspondingly, the address circuits 31 corresponding to the upper K upper flag values among the plurality of address circuits 31-i to 31-(i+3) are selectively and sequentially enabled. Consequently, the address signals are sequentially output from the address circuits 31 corresponding to the upper K upper flag values to the address bus addr<0:7>. That is, the shift register 33 may be reconfigured such that the address solution for the K address signals corresponding to the upper K signals can be performed in K cycles. Therefore, the address solution circuit 3 scales with the number K of upper values, and may perform the address solution for the upper K values in K cycles.
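The effect of bypassing the registers 321 can be checked with a simple behavioral model. The following Python sketch is an illustration only: the function name resolve_addresses is hypothetical, the column index is used in place of the stored address signal, and a bypassed register circuit is assumed to pass the pulse through without delay.

```python
def resolve_addresses(max_flags):
    """Return the column addresses emitted cycle by cycle for the given upper flags."""
    # Registers that remain in the chain after bypassing the columns with MAX = 0.
    chain = [col for col, flag in enumerate(max_flags) if flag]
    state = [0] * len(chain)  # contents of the remaining registers
    emitted = []
    pulse_in = 1              # single start pulse, one clock period wide
    while True:
        # One clock: shift the pulse by one stage of the reconfigured register.
        state = ([pulse_in] + state)[:len(chain)]
        pulse_in = 0
        if not any(state):
            break             # the pulse has left the final stage
        # The register holding the pulse enables its address circuit.
        emitted.append(chain[state.index(1)])
    return emitted


addresses = resolve_addresses([0, 1, 0, 1, 0, 0, 1, 1])
print(addresses)       # [1, 3, 6, 7]
print(len(addresses))  # 4 cycles for K = 4, instead of 8 for all columns
```

The number of clock cycles that produce an address equals the number of flags at level 1, which is the point of reconfiguring the shift register rather than scanning every column.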
The respective register circuits 32 are switchable between a first connection state and a second connection state in accordance with the upper flags MAXi to MAXi+3 and the inversion upper flags MAXi
Upon reception of the pulse TOPK_START from the final register circuit 32-(i+3), the transfer detection circuit 35 generates a pulse TOPK_nSTOP indicating that the transfer of the pulse TOPK_START in the shift register 33 is completed, and supplies the pulse TOPK_nSTOP to the global circuit 36. The transfer detection circuit 35 may be formed of a flip-flop. Upon reception of the pulse TOPK_nSTOP, the global circuit 36 sets the enable signal TOPK_EN to the non-active level, and supplies this signal to the output circuit 34. The output circuit 34 is deactivated in accordance with the enable signal TOPK_EN being at the non-active level, and comes into a state not to output signals present on the address bus addr<0:7>.
Each of the plurality of address circuits 31-i to 31-(i+3) can store an address signal fixedly, and may be configured using a hard-wired circuit. Each address circuit 31 includes a storage circuit 311 and an enable circuit 312.
The storage circuit 311 stores the address signal. The storage circuit 311 may store the address signal fixedly.
The enable circuit 312 can enable or disable the storage circuit 311 in accordance with the corresponding one of the upper flag values MAXi to MAXi+3 and the connection state of the corresponding register circuit 32. When the storage circuit 311 is enabled, a state is formed to output the address signal from the storage circuit 311 to the address bus addr<0:7>. When the storage circuit 311 is disabled, a state is formed not to output the address signal from the storage circuit 311 to the address bus addr<0:7>. The state may be a state where the outputs are set to high impedance, disconnecting the storage circuit 311 from the address bus addr<0:7>.
For example, when storing an 8-bit address, the storage circuit 311 for the (i)th column may include a hard-wired circuit 311a, as illustrated in
The plurality of lines nB0 to nB7, the plurality of tri-state inverters IV0 to IV7, and the plurality of lines B0 to B7 correspond to each other. The line nB0, the tri-state inverter IV0, and the line B0 correspond to the LSB of the address, and the line nB7, the tri-state inverter IV7, and the line B7 correspond to the MSB of the address. Each of the common line E and the common line nE is shared by the plurality of tri-state inverters IV0 to IV7. The plurality of lines B0 to B7 respectively correspond to a plurality of address lines addr<0> to addr<7> contained in the address bus.
Each of the lines nB0 to nB7 is connected by hard-wired connection to a fixed potential according to the address signal. The example of
Each of the tri-state inverters IV0 to IV7 may be configured as illustrated in
The tri-state inverter IV includes an NMOS transistor NM1, an NMOS transistor NM2, a PMOS transistor PM1, and a PMOS transistor PM2. The NMOS transistor NM1 and the PMOS transistor PM1 are inverter-connected and share a common input node N1 and a common output node N2. The NMOS transistor NM2 and the PMOS transistor PM2 are inserted to the output node N2 of the inverter connection as a switch for activating/deactivating the inverter connection.
The NMOS transistor NM1 is connected between the ground potential and the NMOS transistor NM2. The NMOS transistor NM2 is connected between the NMOS transistor NM1 and the PMOS transistor PM2. The PMOS transistor PM2 is connected between the NMOS transistor NM2 and the PMOS transistor PM1. The PMOS transistor PM1 is connected between the PMOS transistor PM2 and the power supply potential Vdd.
In the NMOS transistor NM1, the source is connected to the ground potential, the drain is connected to the NMOS transistor NM2, and the gate is connected to the line nB via the input node N1.
In the NMOS transistor NM2, the source is connected to the NMOS transistor NM1, the drain is connected to the line B via the output node N2, and the gate is connected to the line E.
In the PMOS transistor PM1, the source is connected to the power supply potential, the drain is connected to the PMOS transistor PM2, and the gate is connected to the line nB via the input node N1.
In the PMOS transistor PM2, the source is connected to the PMOS transistor PM1, the drain is connected to the line B via the output node N2, and the gate is connected to the line nE.
Each of the tri-state inverters IV0 to IV7 is activated and operates as an inverter, when the level of the common line E is set to the active level (for example, the H level), and the common line nE is set to the active level (for example, the L level). In the example illustrated in
Each of the tri-state inverters IV0 to IV7 is deactivated and stops the operation as the inverter, when the level of the common line E is set to the non-active level (for example, the L level), and the common line nE is set to the non-active level (for example, the H level). In the example illustrated in
Returning to
The register circuit 32 for the (i)th column includes, in addition to the register 321, a signal line L1, a bypass line L2, a switch 322, a switch 323, a switch 324, and a switch 325. The register 321-i may be formed of a flip-flop. The switch 322, the switch 323, and the switch 324 are examples of switching elements, and are formed of transistors, for example.
The signal line L1 and the bypass line L2 are connected in parallel to each other between the input node 32a and the output node 32b. The signal line L1 includes a first end connected to the input node 32a and a second end connected to the output node 32b. The bypass line L2 includes a first end connected to the input node 32a and a second end connected to the output node 32b.
The switch 322 is arranged on the signal line L1, and is turned on/off in accordance with the upper flag MAXi. The switch 322 is turned on and activates a part of the signal line L1, when the upper flag MAXi is at the active level (the H level or level “1”). The switch 322 is turned off and deactivates the part of the signal line L1, when the upper flag MAXi is at the non-active level (the L level or level “0”). The switch 322 includes a first end connected to the input node 32a and a second end connected to a data input node D of the register 321-i.
The switch 323 is arranged on the signal line L1, and is turned on/off in accordance with the upper flag MAXi. The switch 323 is turned on and activates a part of the signal line L1, when the upper flag MAXi is at the active level (the H level or level “1”). The switch 323 is turned off and deactivates the part of the signal line L1, when the upper flag MAXi is at the non-active level (the L level or level “0”). The switch 323 includes a first end connected to an output node Q of the register 321-i and a second end connected to the output node 32b.
The switch 324 is arranged on the bypass line L2, and is turned on/off in accordance with an inversion upper flag MAXi
The switch 325 is arranged between the signal line L1 and the ground potential, and is turned on/off in accordance with the inversion upper flag MAXi
For example, the address solution circuit 3 is operated as illustrated in
At timing t11, in synchronization with the clock CLK, the global circuit 36 (see
In synchronization with the clock CLK, the global circuit 36 generates a single pulse TOPK_START with a pulse width of one period of the clock CLK, and supplies this pulse to the register circuit 32-0 at the top of the shift register 33.
At this time, as illustrated in
In accordance with the upper flag MAX0=0 and the inversion upper flag MAX0
In accordance with the upper flag MAX1=1 and the inversion upper flag MAX1
In accordance with the upper flag MAX2=0 and the inversion upper flag MAX2
In accordance with the upper flag MAX3=1 and the inversion upper flag MAX3
In accordance with the upper flag MAX4=0 and the inversion upper flag MAX4
In accordance with the upper flag MAX5=0 and the inversion upper flag MAX5
In accordance with the upper flag MAX6=1 and the inversion upper flag MAX6
In accordance with the upper flag MAX7=1 and the inversion upper flag MAX7
In the example of
At timing t12 illustrated in
At timing t13 illustrated in
Correspondingly, as illustrated in
Together with the above, the select signal SEL3 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-3 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-3 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the upper flag MAX3 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(1, 1, 0, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000011” in binary notation and “3” in decimal notation.
At timing t14 illustrated in
Correspondingly, as illustrated in
Together with the above, the select signal SEL6 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-6 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-6 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the upper flag MAX6 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(0, 1, 1, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000110” in binary notation and “6” in decimal notation.
At timing t15 illustrated in
Correspondingly, as illustrated in
Together with the above, the select signal SEL7 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-7 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-7 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the upper flag MAX7 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(1, 1, 1, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000111” in binary notation and “7” in decimal notation.
At timing t16 illustrated in
Correspondingly, the select signal SEL7 changes from 1 to 0, and the enable circuit 312 of the address circuit 31-7 sets the common line E to the non-active level (for example, the L level) and sets the common line nE to the non-active level (for example, the H level). Consequently, the storage circuit 311 of the address circuit 31-7 is deactivated. Together with this, the transfer detection circuit 35 supplies the output that has become 1 to the global circuit 36 as the pulse TOPK_nSTOP.
In accordance with the pulse TOPK_nSTOP, at timing t17, the global circuit 36 causes the enable signal TOPK_EN to transition from the active level (for example, the H level) to the non-active level (for example, the L level), and supplies this signal to the output circuit 34, in synchronization with the clock CLK. The output circuit 34 comes into a state not to output signals present on the address bus addr<0:7>. Consequently, the outputting of address signals from the address solution circuit 3 to the address bus addr<0:7> is completed.
At timing t18, the output of the transfer detection circuit 35 changes from 1 to 0, and notification from the transfer detection circuit 35 to the global circuit 36 is completed.
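The address values output at timings t13 to t15 can be related to the hard-wired lines by a small model of the storage circuit 311. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, each line nB is assumed to be tied to the logical inverse of the corresponding address bit (inferred from the inverting operation of the tri-state inverters and the output levels given above), and the high-impedance state is represented by None.

```python
def hardwire_nb(address, width=8):
    """Levels of the lines nB0..nB(width-1); index 0 is the LSB (assumed inverse coding)."""
    return [1 - ((address >> b) & 1) for b in range(width)]


def tri_state_outputs(nb_lines, e):
    """Levels of the lines B0..B7 for the common line E (nE is its complement)."""
    if e:
        # Activated: each tri-state inverter drives B = NOT nB.
        return [1 - nb for nb in nb_lines]
    # Deactivated: outputs are high impedance, modeled as None.
    return [None] * len(nb_lines)


def bus_value(b_lines):
    """Decode the address on the bus, or None when the lines are not driven."""
    if any(b is None for b in b_lines):
        return None
    return sum(bit << i for i, bit in enumerate(b_lines))


b3 = tri_state_outputs(hardwire_nb(3), e=1)
print(b3)             # [1, 1, 0, 0, 0, 0, 0, 0]
print(bus_value(b3))  # 3
print(bus_value(tri_state_outputs(hardwire_nb(6), e=1)))  # 6
print(bus_value(tri_state_outputs(hardwire_nb(7), e=0)))  # None (disconnected)
```

Because only one address circuit is enabled at a time, the shared address bus in this model never has two drivers, which is consistent with the sequential enabling by the shift register 33.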
The address solution in the calculation system 1 is performed in a sequence of the following (1) to (3).
(1) In the array of a plurality of memory elements that form a plurality of rows and a plurality of columns, a product-sum operation is performed by calculating the product between the word line voltage input to each row and the weight of each memory element, and adding up these products as the bit line current of each column. The processing circuit 2 processes the signals of the product-sum operation results on a plurality of columns, and generates a plurality of upper flags MAX corresponding to the plurality of columns. Among the product-sum operation results of the plurality of columns, the processing circuit 2 sets the upper flags corresponding to the upper K product-sum operation results to MAX=1, and sets the remaining upper flags to MAX=0. The upper flags MAX of the plurality of columns are supplied from the processing circuit 2 to the address solution circuit 3.
(2) In accordance with the upper flags of the plurality of columns, the address solution circuit 3 performs address solution to the signals of the upper K product-sum operation results among the product-sum operation results of the plurality of columns. That is, among a plurality of register circuits that correspond to the plurality of columns and each include a register, the address solution circuit 3 connects the register between the input and output nodes, in each register circuit corresponding to the upper flag MAX=1, and bypasses the register between the input and output nodes, in each register circuit corresponding to the upper flag MAX=0. Consequently, the address solution circuit 3 selectively connects K registers 321 corresponding to the upper flag MAX=1, among the registers of the plurality of register circuits, between the input and output nodes of the shift register 33, and thereby reconfigures the shift register 33 as a K bit shift register.
(3) The address solution circuit 3 propagates a pulse of one bit sequentially to the registers 321 at K stages in the reconfigured shift register 33, and, in accordance with this, sequentially and selectively enables K address circuits corresponding to the upper K among the plurality of address circuits corresponding to the plurality of columns. In each enabled address circuit, address signals stored by, for example, hard-wired connection are output to the address bus. Consequently, the address values corresponding to the upper K product-sum operation results are sequentially output, and the address solution in K cycles is realized.
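Steps (1) to (3) can be sketched at a purely functional level as follows. This is a minimal software illustration, not the analog circuit: the input values, the weight matrix, the function names, and the tie-breaking rule (lower column index first) are all assumptions made for the example.

```python
def product_sum(inputs, weights):
    """Step (1): column-wise product-sum; weights[r][c] is the weight at row r, column c."""
    columns = len(weights[0])
    return [sum(x * row[c] for x, row in zip(inputs, weights)) for c in range(columns)]


def upper_flags(sums, k):
    """Step (1): MAX=1 for the k largest sums, MAX=0 otherwise.

    Ties are broken by lower column index (an assumption of this sketch).
    """
    ranked = sorted(range(len(sums)), key=lambda c: (-sums[c], c))
    chosen = set(ranked[:k])
    return [1 if c in chosen else 0 for c in range(len(sums))]


def address_solution(flags):
    """Steps (2) and (3): one address per cycle, in column order, for each flag equal to 1."""
    return [c for c, flag in enumerate(flags) if flag]


# Hypothetical inputs (one per row) and weights (2 rows x 8 columns).
inputs = [2, 1]
weights = [
    [3, 1, 4, 1, 5, 9, 2, 6],
    [2, 7, 1, 8, 2, 8, 1, 8],
]
sums = product_sum(inputs, weights)
flags = upper_flags(sums, k=4)
print(sums)                     # [8, 9, 9, 10, 12, 26, 5, 20]
print(flags)                    # [0, 0, 0, 1, 1, 1, 0, 1]
print(address_solution(flags))  # [3, 4, 5, 7]
```

The list returned by `address_solution` has exactly K entries, which corresponds to the K cycles of step (3).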
As described above, according to the first embodiment, in the address solution circuit 3 of the calculation system 1, the plurality of address circuits 31 are provided corresponding to the plurality of columns of the memory array MA. Further, in accordance with the upper flags of the plurality of columns, the address circuits 31 corresponding to the upper K product-sum operation results, among the product-sum operation results of the plurality of columns, are selectively and sequentially enabled and caused to output address values. Consequently, the address solution for the upper K signals can be performed by an operation in cycles corresponding to K (smaller than or equal to the number of columns), and thus the address solution for the upper K signals can be performed more efficiently. Therefore, for example, when the upper K of a plurality of signals output from the plurality of columns of the memory array MA are required for use, the plurality of signals can be efficiently utilized.
For example, when the address solution for the upper K signals is performed by linear search, the columns are selected one at a time in sequence. The upper flag of each selected column is checked: the address value is output when the upper flag MAX=1, and is not output when the upper flag MAX=0. Because this processing is performed for every column, the address solution takes a number of cycles corresponding to the number of columns.
In contrast, according to the first embodiment, the address solution for the upper K signals can be performed by an operation in cycles corresponding to K, which is smaller than or equal to the number of columns. Therefore, the address solution for the upper K signals can be performed more efficiently.
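The difference in cycle count between the two approaches can be illustrated with a small Python sketch. The column count of 64, the flag positions, and the function names are hypothetical; the sketch counts one cycle per visited column for linear search and one cycle per flagged column for the reconfigured shift register.

```python
def linear_search_cycles(flags):
    """Visit every column once; emit the address only where the flag is 1."""
    cycles = 0
    addresses = []
    for column, flag in enumerate(flags):
        cycles += 1
        if flag:
            addresses.append(column)
    return addresses, cycles


def shift_register_cycles(flags):
    """Only flagged columns keep a register stage, so one cycle per flagged column."""
    addresses = [column for column, flag in enumerate(flags) if flag]
    return addresses, len(addresses)


# Hypothetical case: 64 columns, K=4 flagged columns.
flags = [0] * 64
for column in (3, 17, 40, 63):
    flags[column] = 1

print(linear_search_cycles(flags))   # ([3, 17, 40, 63], 64)
print(shift_register_cycles(flags))  # ([3, 17, 40, 63], 4)
```

Both functions return the same addresses; only the number of cycles differs.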
Further, according to the first embodiment, in the address solution circuit 3 of the calculation system 1, each of the plurality of address circuits 31 stores the address value in a hard-wired configuration, and the shift register 33 is reconfigured such that K registers are selectively used in accordance with the upper flags of the plurality of columns. Consequently, the shift register 33 can be reconfigured such that the address solution for the K address signals corresponding to the upper K signals can be performed in K cycles. Therefore, the address solution circuit 3 can be configured in a scalable manner with respect to the number K of upper signals. That is, the address solution for the upper K signals can be performed more scalably. It follows that the circuit design for the address solution circuit 3 can be facilitated, and the area of the address solution circuit 3 can be reduced.
Next, a calculation system 401 according to a second embodiment will be described. Hereinafter, the explanation will focus mainly on the parts that differ from the first embodiment.
In the first embodiment, the configuration and operation for address solution have been illustrated for the upper K signals. On the other hand, in the second embodiment, the configuration and operation for address solution will be illustrated for the lower K signals.
The calculation system 401 includes a processing circuit 402 and an address solution circuit 403 in place of the processing circuit 2 and the address solution circuit 3 (see
The local circuit 421-i, 421-(i+1) in each column includes a comparator 4211, which is in a state where the two input terminals of the comparator 211 (see
In a global circuit 422, a parallel counter 4222 counts how many outputs among the local signals yi, yi+1, etc. output from the comparators 4211 of the respective columns are at the H level (or 1), in each DA conversion cycle, and outputs a global signal BOT_K to an AND gate 213 for each column and a global SAR register 4221a in accordance with the counted value. In searching for the lower K values, the parallel counter 4222 outputs the global signal BOT_K=H level (or 1) when the counted value is greater than or equal to K, and outputs the global signal BOT_K=L level (or 0) when the counted value is smaller than K. When the global signal BOT_K is supplied, the global SAR register 4221a stores a value obtained by logically inverting the value of the global signal BOT_K, in the register at the first stage, and shifts the value held in the register at each stage.
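The behavior of the parallel counter 4222 and the global SAR register 4221a described above can be sketched in Python as follows. This is a behavioral illustration under stated assumptions: the register is modeled as a four-stage list with the first stage at index 0, the local-signal patterns are hypothetical, and the function names do not come from the description.

```python
def parallel_counter(local_signals, k):
    """Return BOT_K: 1 when at least k local signals are 1, otherwise 0."""
    return 1 if sum(local_signals) >= k else 0


def sar_register_step(stages, bot_k):
    """Shift every stage by one position and load NOT BOT_K into the first stage."""
    return [1 - bot_k] + stages[:-1]


K = 4
stages = [0, 0, 0, 0]  # assumed four-stage register, first stage at index 0

# Hypothetical local-signal patterns for two consecutive DA conversion cycles.
for pattern in [(1, 1, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 0, 1, 0, 0)]:
    bot_k = parallel_counter(pattern, K)
    stages = sar_register_step(stages, bot_k)
    print(sum(pattern), bot_k, stages)
# 2 0 [1, 0, 0, 0]
# 5 1 [0, 1, 0, 0]
```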
With this configuration, the processing circuit 402 searches for the lower K values while performing AD-conversion operations of the SAR-type, as illustrated in
The comparator 4211 in each column receives the signal Vi at the inversion input terminal (−), and receives the global reference voltage VDAC at the non-inversion input terminal (+). Accordingly, when the signal Vi is lower than the global reference voltage VDAC, the comparator 4211 outputs the local signal yi=1, and, when the signal Vi is higher than the global reference voltage VDAC, the comparator 4211 outputs the local signal yi=0.
For example, at timing t31, the DAC 4221 for the global SAR sets the global reference voltage VDAC=VREF/2. The comparators 4211 for the respective columns compare the signals V0 to V7 with the reference voltage VDAC=VREF/2, and output local signals (y0, y1, y2, y3, y4, y5, y6, y7)=(1, 0, 1, 0, 1, 0, 0, 0) as comparison results. Correspondingly, the parallel counter 4222 counts the number of local signals yi whose value is 1. Since the counted value=3 is smaller than K=4, at timing t32, the global signal BOT_K changes to the L level (or 0), and a global signal inversion signal BOT_K
At timing t33, the DAC 4221 for the global SAR sets the global reference voltage VDAC=3VREF/4. The comparators 4211 for the respective columns compare the signals V0 to V7 with the reference voltage VDAC=3VREF/4, and output local signals (y0, y1, y2, y3, y4, y5, y6, y7)=(1, 1, 1, 1, 1, 1, 1, 0) as comparison results. The parallel counter 4222 counts the number of local signals yi whose value is 1. Since the counted value=7 is greater than or equal to K=4, at timing t34, the global signal BOT_K changes to the H level (or 1), and the global signal inversion signal BOT_K
At timing t35, in the local circuit 421 for each of the 0th to 6th columns (i=0 to 6), since the comparator 4211 outputs the local signal yi=1, while the global signal BOT_K=1, the flip-flop 214 keeps its output in the original state. That is, the flip-flop 214 for each of the 0th to 6th columns keeps the lower flag at MINi=H (or 1) and keeps the disable signal at DISABLEi=L (or 0). On the other hand, in the local circuit 421 for the 7th column (i=7), since the comparator 4211 outputs the local signal yi=0, while the global signal BOT_K=1, the flip-flop 214 changes its output from the original state. That is, the flip-flop 214 for the 7th column changes the lower flag from MINi=H (or 1) to MINi=L (or 0) and changes the disable signal from DISABLEi=L (or 0) to DISABLEi=H (or 1).
Consequently, at timing t35 and thereafter, the comparator 4211 for the 7th column (i=7) receives the disable signal DISABLEi=H (or 1) and is disabled in operation. Therefore, as shown by a dotted line in the Vi waveform in
At timing t36, the DAC 4221 for the global SAR sets the global reference voltage VDAC=5VREF/8. The comparators 4211 for the respective columns (the 0th to 6th columns) not disabled at this time compare the signals V0 to V6 with the global reference voltage VDAC=5VREF/8, and output local signals (y0, y1, y2, y3, y4, y5, y6)=(1, 1, 1, 0, 1, 1, 0) as comparison results. Then, the parallel counter 4222 counts the number of local signals whose value is 1. Since the counted value=5 is greater than or equal to K=4, the global signal BOT_K is kept at the H level (or 1), and the global signal inversion signal BOT_K
At timing t37, in the local circuit 421 for each of the 0th to 2nd, 4th, and 5th columns (i=0 to 2, 4, 5), since the comparator 4211 outputs the local signal yi=1, while the global signal BOT_K=1, the flip-flop 214 keeps its output in the original state. That is, the flip-flop 214 for each of the 0th to 2nd, 4th, and 5th columns keeps the lower flag at MINi=H (or 1) and keeps the disable signal at DISABLEi=L (or 0). On the other hand, in the local circuit 421 for each of the 3rd and 6th columns (i=3, 6), since the comparator 4211 outputs the local signal yi=0, while the global signal BOT_K=1, the flip-flop 214 changes its output from the original state. That is, the flip-flop 214 for each of the 3rd and 6th columns changes the lower flag from MINi=H (or 1) to MINi=L (or 0) and changes the disable signal from DISABLEi=L (or 0) to DISABLEi=H (or 1).
Consequently, at timing t37 and thereafter, the comparator 4211 for each of the 3rd and 6th columns (i=3, 6) receives the disable signal DISABLEi=H (or 1) and is disabled in operation. Therefore, as shown by a dotted line in the Vi waveform in
At timing t38, the DAC 4221 for the global SAR sets the global reference voltage VDAC=9VREF/16. The comparators 4211 for the respective columns (the 0th to 2nd, 4th, and 5th columns) not disabled at this time compare the signals V0 to V2, V4, and V5 with the global reference voltage VDAC=9VREF/16, and output local signals (y0, y1, y2, y4, y5)=(1, 0, 1, 1, 1) as comparison results. Then, the parallel counter 4222 counts the number of local signals whose value is 1. Since the counted value=4 is greater than or equal to K=4, the global signal BOT_K is kept at the H level (or 1), and the global signal inversion signal BOT_K
At timing t39, in the local circuit 421 for each of the 0th, 2nd, 4th, and 5th columns (i=0, 2, 4, 5), since the comparator 4211 outputs the local signal yi=1, while the global signal BOT_K=1, the flip-flop 214 keeps its output in the original state. That is, the flip-flop 214 for each of the 0th, 2nd, 4th, and 5th columns keeps the lower flag at MINi=H (or 1) and keeps the disable signal at DISABLEi=L (or 0). On the other hand, in the local circuit 421 for the 1st column (i=1), since the comparator 4211 outputs the local signal yi=0, while the global signal BOT_K=1, the flip-flop 214 changes its output from the original state. That is, the flip-flop 214 for the 1st column changes the lower flag from MINi=H (or 1) to MINi=L (or 0) and changes the disable signal from DISABLEi=L (or 0) to DISABLEi=H (or 1).
Consequently, at timing t39 and thereafter, the comparator 4211 for the 1st column (i=1) receives the disable signal DISABLEi=H (or 1) and is disabled in operation. Therefore, as shown by a dotted line in the Vi waveform in
When B denotes the number of bits according to the precision of conversion, the processing circuit 402 outputs, after B=4 cycles (that is, once the lower flags have settled at timing t39), the lower flags (MIN0, MIN1, MIN2, MIN3, MIN4, MIN5, MIN6, MIN7)=(1, 0, 1, 0, 1, 1, 0, 0) as the result of searching for the lower K values. In this example, the search finds the signals V0, V2, V4, and V5 of the 0th, 2nd, 4th, and 5th columns as the lower K values among the signals V0 to V7 of the 0th to 7th columns.
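The lower-K search walked through above can be reproduced with a short behavioral model in Python. Several points are assumptions of this sketch rather than statements from the description: the column voltages (given as fractions of VREF) are hypothetical values chosen so that the comparator outputs match the walkthrough, a disabled comparator is assumed to contribute 0 to the count, and the reference voltage is updated with the conventional SAR trial-bit rule.

```python
def search_lower_k(voltages, k, bits):
    """Behavioral model of a SAR-style lower-K search.

    voltages: column signals Vi as fractions of VREF (hypothetical values).
    Returns the lower flags MINi and a per-cycle trace (VDAC/VREF, count, BOT_K).
    """
    n = len(voltages)
    min_flags = [1] * n   # lower flags MINi start at 1
    disabled = [0] * n    # DISABLEi start at 0
    code = 0              # bits of the reference code kept so far
    trace = []
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)
        vdac = trial / (1 << bits)
        # yi = 1 when Vi is lower than VDAC; disabled columns are assumed to output 0.
        y = [0 if disabled[i] else int(voltages[i] < vdac) for i in range(n)]
        bot_k = 1 if sum(y) >= k else 0
        if bot_k:
            # At least k candidates lie below VDAC: drop the columns at or above it.
            for i in range(n):
                if not y[i]:
                    min_flags[i] = 0
                    disabled[i] = 1
        else:
            # Fewer than k candidates: keep the trial bit (NOT BOT_K = 1).
            code = trial
        trace.append((vdac, sum(y), bot_k))
    return min_flags, trace


# Hypothetical voltages V0..V7 in units of VREF.
voltages = [0.2, 0.6, 0.3, 0.7, 0.4, 0.53, 0.65, 0.9]
flags, trace = search_lower_k(voltages, k=4, bits=4)
print(trace)  # [(0.5, 3, 0), (0.75, 7, 1), (0.625, 5, 1), (0.5625, 4, 1)]
print(flags)  # [1, 0, 1, 0, 1, 1, 0, 0]
```

With these assumed voltages, the reference sequence VREF/2, 3VREF/4, 5VREF/8, 9VREF/16, the counted values 3, 7, 5, 4, and the final lower flags agree with the values given in the description.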
The address solution circuit 403 illustrated in
The address solution circuit 403 may be configured as illustrated in
As illustrated in
The shift register 33 can be reconfigured in accordance with the lower flags of the plurality of columns. The address solution circuit 403 connects the register 321 between the input node 32a and the output node 32b, in each of the register circuits 32 corresponding to the lower K lower flag values among the plurality of register circuits 32-i to 32-(i+3). The address solution circuit 403 bypasses the register 321 between the input node 32a and the output node 32b, in each of the remaining register circuits 32. With this arrangement, the address solution circuit 403 reconfigures the shift register 33.
The shift register 33 thus reconfigured receives the pulse BOTK_START, which is supplied as a single pulse. The shift register 33 transmits the pulse BOTK_START by sequentially shifting this pulse between the registers 321 corresponding to the lower K lower flag values. Correspondingly, the address circuits 31 corresponding to the lower K lower flag values among the plurality of address circuits 31-i to 31-(i+3) are selectively and sequentially enabled. Consequently, the address signals are sequentially output from the address circuits 31 corresponding to the lower K lower flag values to the address bus addr<0:7>. That is, the shift register 33 can be reconfigured such that the address solution for the K address signals corresponding to the lower K signals can be performed in K cycles. Therefore, the address solution circuit 403 can be configured in a scalable manner with respect to the number K of lower signals.
For example, the address solution circuit 403 is operated as illustrated in
At timing t41, in synchronization with the clock CLK, the global circuit 436 (see
In synchronization with the clock CLK, the global circuit 436 generates a single pulse BOTK_START with a pulse width of one period of the clock CLK, and supplies this pulse to the register circuit 32-0 at the top of the shift register 33.
At this time, the shift register 33 is reconfigured in accordance with the lower flags (MIN0, MIN1, MIN2, MIN3, MIN4, MIN5, MIN6, MIN7)=(1, 0, 1, 0, 1, 1, 0, 0). For example, the address solution circuit 403 forms connection between the input node 32a and the output node 32b through the register 321 in each of the register circuits 32-0, 32-2, 32-4, and 32-5 (see
At timing t42, in accordance with a rising edge of the clock CLK_BOTK, the pulse BOTK_START is held in the register 321-0 at the top of the reconfigured shift register 33, and the output of the register 321-0 changes from 0 to 1. Correspondingly, the select signal SEL0 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-0 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-0 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the lower flag MIN0 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(0, 0, 0, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000000” in binary notation and “0” in decimal notation.
At timing t43, in accordance with a rising edge of the clock CLK_BOTK, the pulse BOTK_START is shifted from the top register 321-0 to the second register 321-2 in the reconfigured shift register 33. Thus, the output of the register 321-0 changes from 1 to 0, and the output of the register 321-2 changes from 0 to 1.
Correspondingly, the select signal SEL0 changes from 1 to 0, and the enable circuit 312 of the address circuit 31-0 sets the common line E to the non-active level (for example, the L level) and sets the common line nE to the non-active level (for example, the H level). Consequently, the storage circuit 311 of the address circuit 31-0 is deactivated.
Together with the above, the select signal SEL2 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-2 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-2 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the lower flag MIN2 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(0, 1, 0, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000010” in binary notation and “2” in decimal notation.
At timing t44, in accordance with a rising edge of the clock CLK_BOTK, the pulse BOTK_START is shifted from the second register 321-2 to the third register 321-4 in the reconfigured shift register 33. Thus, the output of the register 321-2 changes from 1 to 0, and the output of the register 321-4 changes from 0 to 1.
Correspondingly, the select signal SEL2 changes from 1 to 0, and the enable circuit 312 of the address circuit 31-2 sets the common line E to the non-active level (for example, the L level) and sets the common line nE to the non-active level (for example, the H level). Consequently, the storage circuit 311 of the address circuit 31-2 is deactivated.
Together with the above, the select signal SEL4 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-4 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-4 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the lower flag MIN4 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(0, 0, 1, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000100” in binary notation and “4” in decimal notation.
At timing t45, in accordance with a rising edge of the clock CLK_BOTK, the pulse BOTK_START is shifted from the third register 321-4 to the final register 321-5 in the reconfigured shift register 33. Thus, the output of the register 321-4 changes from 1 to 0, and the output of the register 321-5 changes from 0 to 1.
Correspondingly, the select signal SEL4 changes from 1 to 0, and the enable circuit 312 of the address circuit 31-4 sets the common line E to the non-active level (for example, the L level) and sets the common line nE to the non-active level (for example, the H level). Consequently, the storage circuit 311 of the address circuit 31-4 is deactivated.
Together with the above, the select signal SEL5 changes from 0 to 1, and the enable circuit 312 of the address circuit 31-5 sets the common line E to the active level (for example, the H level) and sets the common line nE to the active level (for example, the L level). Consequently, the storage circuit 311 of the address circuit 31-5 is activated. The storage circuit 311 outputs an address value of the signals corresponding to the lower flag MIN5 to the address bus addr<0:7>. The lines B0 to B7 respectively output levels of (B0, B1, B2, B3, B4, B5, B6, B7)=(1, 0, 1, 0, 0, 0, 0, 0) as address signals to the corresponding address lines addr. In this example, since B0 corresponds to LSB and B7 corresponds to MSB, the address value is “00000101” in binary notation and “5” in decimal notation.
At timing t46, in accordance with a rising edge of the clock CLK_BOTK, the pulse BOTK_START is transferred from the final register 321-5 of the shift register 33 to the transfer detection circuit 35. Thus, the output of the register 321-5 changes from 1 to 0, and the output of the transfer detection circuit 35 changes from 0 to 1.
Correspondingly, the select signal SEL5 changes from 1 to 0, and the enable circuit 312 of the address circuit 31-5 sets the common line E to the non-active level (for example, the L level) and sets the common line nE to the non-active level (for example, the H level). Consequently, the storage circuit 311 of the address circuit 31-5 is deactivated. Together with this, the transfer detection circuit 35 supplies the output that has become 1 to the global circuit 436 as the pulse BOTK_nSTOP.
In accordance with the pulse BOTK_nSTOP, at timing t47, the global circuit 436 causes the enable signal BOTK_EN to transition from the active level (for example, the H level) to the non-active level (for example, the L level), and supplies this signal to the output circuit 34, in synchronization with the clock CLK. The output circuit 34 then enters a state in which it does not output the signals present on the address bus addr<0:7>. Consequently, the outputting of address signals from the address solution circuit 403 to the address bus addr<0:7> is completed.
At timing t48, the output of the transfer detection circuit 35 changes from 1 to 0, and notification from the transfer detection circuit 35 to the global circuit 436 is completed.
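The clock-by-clock sequence from timing t42 to timing t46 can be traced with the following Python sketch of the reconfigured shift register. It is a behavioral model under assumptions: each register circuit either latches its input node (flag 1) or is bypassed (flag 0), the transfer detection circuit is modeled as one additional clocked stage after the last register circuit, and the function name is hypothetical.

```python
def trace_address_solution(flags):
    """Trace a one-bit start pulse through the reconfigured shift register.

    Returns one (select_signals, address_or_None, stop) tuple per clock edge.
    """
    n = len(flags)
    registers = [0] * n   # outputs of the registers 321-0 .. 321-(n-1)
    start = 1             # start pulse, high for exactly one clock period
    trace = []
    while True:
        next_registers = [0] * n
        signal = start    # value arriving at the input node of stage 0
        for i in range(n):
            if flags[i]:
                next_registers[i] = signal   # connected stage latches its input
                signal = registers[i]        # and passes on its previous output
            # A bypassed stage forwards the incoming signal unchanged.
        stop = signal     # value latched by the transfer detection stage
        registers = next_registers
        start = 0
        address = registers.index(1) if 1 in registers else None
        trace.append((tuple(registers), address, stop))
        if stop:
            return trace


lower_flags = (1, 0, 1, 0, 1, 1, 0, 0)
for select, address, stop in trace_address_solution(lower_flags):
    print(select, address, stop)
# (1, 0, 0, 0, 0, 0, 0, 0) 0 0
# (0, 0, 1, 0, 0, 0, 0, 0) 2 0
# (0, 0, 0, 0, 1, 0, 0, 0) 4 0
# (0, 0, 0, 0, 0, 1, 0, 0) 5 0
# (0, 0, 0, 0, 0, 0, 0, 0) None 1
```

The first four clock edges enable the address circuits for columns 0, 2, 4, and 5 one at a time, and the fifth edge raises the stop indication, which corresponds to K=4 address cycles followed by the completion notification.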
As described above, according to the second embodiment, the address solution for the lower K signals can be performed by an operation in cycles corresponding to K smaller than or equal to the number of columns, and thus the address solution for the lower K signals can be performed more efficiently. Therefore, for example, when the lower K of a plurality of signals output from the plurality of columns of the memory array MA are required for use, the plurality of signals can be efficiently utilized.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2021-203643 | Dec 2021 | JP | national |