The present disclosure relates to an electronic device design technology, and in particular, to a static random-access memory (SRAM) cell for high-speed content-addressable memory (CAM) and in-memory Boolean logic operations.
The surge of data-intensive applications such as artificial intelligence places an ever-increasing demand on high-throughput, energy-efficient computing architectures. In the traditional von Neumann architecture, however, data must be transferred back and forth between the memory and the computing unit, which limits data throughput and incurs large energy overheads [1]. To address this bottleneck, in-memory computing (IMC) architectures have been proposed, which reduce data transfers by performing computation directly inside the memory. Recently, different types of memory have been explored for implementing an efficient IMC system, including SRAM, dynamic random access memory (DRAM), resistive random access memory (RRAM), spin-transfer torque magnetic random access memory (STT-MRAM), and flash memory (Flash).
Many IMC SRAMs with different cell structures have been proposed, such as 6T [2], standard 8T [3], 9T [4], and 10T [5]. By using massively parallel bit lines, an SRAM can perform high-throughput, efficient logic, arithmetic, and matrix computations. In [3], an analog-based IMC SRAM has been proposed to perform multiply-and-accumulate (MAC)/dot-product computations, but it only supports specific fault-tolerant applications such as a convolutional neural network (CNN). In addition, such analog designs require expensive digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) to convert analog voltages. Digital-based IMC SRAM is a promising alternative: it can perform precise bit-wise computations and has a wider range of applications. In [2], basic CAM operations and Boolean logic operations have been implemented in 6T/8T SRAMs by activating a plurality of word lines. Building on basic Boolean operations, addition/multiplication has been implemented in [6], and complex applications such as advanced encryption standard (AES) and CNN algorithms have been successfully executed.
However, when a plurality of word lines are activated simultaneously, both the analog-based IMC SRAM and the digital-based IMC SRAM encounter read disturbance due to shared read and write paths, which may corrupt stored data. To address the read disturbance, a hierarchical 6T SRAM design [7] and an interleaved structure [8] are proposed to avoid the read disturbance at the architectural level, but they both have rigid restrictions on data allocation and thus are not suitable for CAM applications. Other auxiliary schemes for the 6T SRAM, including weak word line drive [2] and interleaved word line activation [4], severely reduce the access speed. The standard 8T SRAM has also been explored to achieve IMC without read disturbance [9], but it causes performance degradation due to a low read margin. The 9T [4] and 10T [5] SRAMs with decoupling differential ports are reliable, but they bring large area overheads. In general, the previous solutions designed to solve the read disturbance problem of the IMC SRAM all lead to a reduction in speed or additional overheads in area.
In view of the read disturbance problem of existing high-speed SRAMs, an SRAM cell for high-speed CAM and in-memory Boolean logic operations is proposed. The cell mitigates the read disturbance problem of an IMC SRAM and ensures stable, high-speed execution of SRAM, in-memory CAM, and in-memory logic operations.
The technical solution of the present disclosure is as follows: An SRAM cell for high-speed CAM and in-memory Boolean logic operations includes a standard 6T-SRAM and two additional PMOS access transistors, where read word lines of the two PMOS access transistors P1 and P2 are RWLR and RWLL respectively, and under the control of the two PMOS access transistors, a differential read port RBL/
Preferably, work states of NMOS access transistors N1 and N2 of the standard 6T-SRAM and the two additional PMOS access transistors P1 and P2 are as follows:
and a truth table corresponding to port voltages is as follows:
The beneficial effects of the present disclosure are as follows: The SRAM cell for high-speed CAM and in-memory Boolean logic operations in the present disclosure optimizes IMC of an SRAM, is compatible with commercial CMOS technology, and can leverage the large number of existing on-chip SRAM caches.
The present disclosure will be described in detail in conjunction with the accompanying drawings and specific embodiments. The embodiments are implemented on the premise of the technical solutions of the present disclosure. The following presents detailed implementations and specific operation processes. The protection scope of the present disclosure, however, is not limited to the following embodiments.
The SRAM may be configured as a reliable high-speed BCAM or TCAM, or as a computational unit that performs Boolean logic functions. The 8T SRAM cell is implemented in 28 nm CMOS technology and occupies the same area as a standard 8T cell. As verified by post-simulation, a 16 Kb SRAM module operates at 2.7 GHz, which is significantly faster than the previous designs.
With the additional read ports (that is, RBLs), the proposed 8T SRAM may be configured as a cell that performs SRAM, CAM, and in-memory logic operations. Table 1 is a detailed truth table for the different operations. Table 2 lists the operating modes of the four access transistors. To perform the SRAM function, only WL is activated to perform a write or normal read operation. To perform the CAM function, the read word lines RWLLs and RWLRs of the PMOS access transistors P1 and P2 are used to input the search data. For example, if the search data is 1, the RWLL is pulled low to GND and the RWLR is pulled high to VDD. To perform Boolean logic operations, the read word lines RWLLs and RWLRs corresponding to P1 and P2 are selected.
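The search-data encoding described above can be sketched as a small behavioural model in Python. The function names (`encode_search_bit`, `pmos_on`, `encode_search_word`) and the use of 0/1 to stand for GND/VDD are illustrative assumptions and do not appear in the disclosure; only the mapping itself (search data 1 pulls RWLL low and RWLR high, and the reverse for 0) is taken from the text.

```python
# Assumed logic-level encoding for illustration: 0 = GND, 1 = VDD.
GND, VDD = 0, 1


def encode_search_bit(bit):
    """Return the (RWLL, RWLR) levels for a single search bit."""
    if bit not in (0, 1):
        raise ValueError("search bit must be 0 or 1")
    # Search data 1: RWLL low, RWLR high. Search data 0: the opposite.
    return (GND, VDD) if bit == 1 else (VDD, GND)


def pmos_on(gate_level):
    """A PMOS access transistor conducts when its gate is driven low."""
    return gate_level == GND


def encode_search_word(bits):
    """Encode a whole search word, one (RWLL, RWLR) pair per driven row."""
    return [encode_search_bit(b) for b in bits]
```

For instance, `encode_search_word([1, 0])` yields one pair per row, and exactly one of the two PMOS access transistors is enabled in each row.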
1) BCAM operation
To support CAM operations, the RWLs are divided into RWLR and RWLL. The data to be searched is stored in columns, and the search data is compared against all columns by driving the row word lines (that is, the RWLRs or RWLLs). If the input data is “0”, the RWLRs are driven low to turn on the right PMOS access transistor, and the RWLLs are driven high to turn off the left PMOS access transistor. If the input data is “1”, the opposite applies.
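As a rough illustration of the column-wise search, the following Python sketch models each column as a stored word and reports a match only when neither of two per-column mismatch flags is raised (a NOR of the two flags). It is purely behavioural: the name `bcam_search`, the boolean flags standing in for sense-amplifier outputs, and the assignment of mismatches to the "left" or "right" flag are assumptions made for illustration, not the circuit of the disclosure.

```python
def bcam_search(columns, key):
    """Compare `key` against every stored column word.

    columns: list of stored words, each a list of 0/1 bits (one word per column).
    key:     list of 0/1 search bits, one per driven row.
    Returns a list of booleans, True where the column matches the key.
    """
    matches = []
    for word in columns:
        if len(word) != len(key):
            raise ValueError("stored word and key must have the same length")
        # Assumption: rows driven with search bit 1 enable the left port,
        # rows driven with search bit 0 enable the right port, and any
        # disagreement on an enabled port raises that port's flag.
        flag_left = any(k == 1 and s != k for s, k in zip(word, key))
        flag_right = any(k == 0 and s != k for s, k in zip(word, key))
        # NOR of the two flags: match only if neither port saw a mismatch.
        matches.append(not (flag_left or flag_right))
    return matches
```

Because every column is evaluated against the same driven rows, the model mirrors the idea that one search compares the key with all stored words at once.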
For each column, a pair of single-ended sense amplifiers (SAs) are used to detect BL behavior, and a NOR gate connects the two SAs to generate a match or mismatch signal. In the case of a mismatch, as shown by the second bit in the first column in
2) TCAM operation
The sensing scheme is the same as that of the BCAM. For each stored word, a search result can be generated by performing a NOR operation on the outputs of the first SA and the fourth SA.
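For readers less familiar with ternary matching, the sketch below shows only the generic TCAM match rule in Python: a stored digit may be 0, 1, or "don't care", and a don't-care digit matches either search value. The string representation, the `"X"` symbol, and the name `tcam_search` are assumptions for illustration; the sketch does not model how the disclosure encodes a ternary digit in cells or how the sense amplifiers are wired.

```python
def tcam_search(entries, key):
    """Return one boolean per stored entry: True if the entry matches `key`.

    entries: list of strings over '0', '1', 'X' (assumed don't-care symbol).
    key:     string over '0', '1'.
    """
    results = []
    for entry in entries:
        if len(entry) != len(key):
            raise ValueError("entry and key must have the same length")
        # A stored 'X' matches any search bit; otherwise digits must be equal.
        results.append(all(s == "X" or s == k for s, k in zip(entry, key)))
    return results
```

With this rule, an entry such as `"1X0"` matches both `"100"` and `"110"`.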
In the case of a match, the bit lines are not precharged, as shown by the first two columns in
3) Multiple-operand compound logical operations are useful in many applications, such as Hamming codes. By utilizing the two read ports of the proposed 8T SRAM, four words can be accessed simultaneously in one cycle to perform a compound logical operation. As shown in
The above-mentioned examples only express several implementations of the present disclosure, and the descriptions thereof are relatively specific and detailed, but they should not be thereby interpreted as limiting the scope of the present disclosure. It should be noted that those of ordinary skill in the art can further make several variations and improvements without departing from the idea of the present disclosure, but such variations and improvements shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
202110520255.0 | May 2021 | CN | national |
This application is the national phase entry of International Application No. PCT/CN2021/119515, filed on Sep. 22, 2021, which is based upon and claims priority to Chinese Patent Application No. 202110520255.0, filed on May 13, 2021, the entire contents of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/119515 | 9/22/2021 | WO |