COMPUTE-IN-MEMORY DEVICES, NEURAL NETWORK ACCELERATORS, AND ELECTRONIC DEVICES

CROSS-REFERENCE TO RELATED APPLICATION

The present application is based upon and claims the benefit of priorities of Chinese Patent Application No. 202210769401.8, filed on Jun. 30, 2022; Chinese Patent Application No. 202211599037.1, filed on Dec. 12, 2022; Chinese Patent Application No. 202211728423.6, filed on Dec. 30, 2022; and Chinese Patent Application No. 202310259263.3, filed on Mar. 9, 2023, the entire contents of all of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the fields of storage technology and integrated circuit technology, and particularly to compute-in-memory devices, neural network accelerators, and electronic devices.

BACKGROUND

With the wide application of technologies such as artificial intelligence, the demand for data processing is increasing rapidly, but the computing performance and power consumption of modern computing systems based on the von Neumann architecture are limited by the movement of data between the storage part and the computing part. Compute-in-memory attempts to solve this problem by reducing data movement, thereby reducing energy consumption and improving performance. Conventional storage cells are optimized for storage without considering the cost of computing. Therefore, a compute-in-memory system that can be applied on a large scale needs to combine the computing part with the storage part and provide a convenient interface, with consideration of cost, power consumption, performance, scalability, and stability.

Because of the advantages of Static Random-Access Memory (SRAM), such as fast read/write speed, low read/write energy consumption, good durability, and mature technology, many compute-in-memory designs based on SRAM have emerged in recent years. Some existing designs use foundry-provided SRAM cells comprising 6 transistors (6T). Due to the more compact layout and smaller transistors used by foundries, this SRAM cell has a smaller cell area than non-6T SRAM cells. However, when 6T SRAM cells are used for computing, the swing range of the bit line voltage is limited, which leads to problems such as read disturbance and an insufficient signal margin. Therefore, some designs choose to sacrifice some area efficiency and use SRAM cells comprising more transistors as the computing unit.

The area of compute cells has a large impact on the overall cost of a chip and is thus a very important issue in edge artificial intelligence devices. In actual industrial applications, area efficiency is one of the core indicators of compute-in-memory design. Even with the adoption of foundry-provided 6T SRAM as the compute cell, compute-in-memory designs still face the problem of inadequate area efficiency. For large neural networks, SRAM-based compute-in-memory designs face difficulties storing all weights on the chip, thus needing to repeatedly read weight values from dynamic random access memory (DRAM), causing huge energy consumption and delay. Therefore, it is necessary to further improve the area efficiency of SRAM-based compute-in-memory.

As shown in FIG. 17, traditional von Neumann architecture separates data storage and computation to maintain computational universality and control flexibility. The frequent data movement between memory and processors has become a critical bottleneck limiting the performance metrics such as latency, energy efficiency, throughput, and scalability of mainstream computing platforms such as CPUs and GPUs. Therefore, traditional computing platforms have lagged behind the computational requirements of cutting-edge applications. This situation results in many computing tasks that cannot be completed directly at the edge, requiring data transmission to cloud computing platforms for processing. This not only increases communication overhead but also contradicts the requirements of users in terms of real-time performance, interactivity, and security and privacy.

To address the aforementioned issues, compute-in-memory is a viable solution. As shown in FIG. 17, it attempts to integrate storage and computational units to reduce unnecessary data movement, thereby reducing latency and power consumption, improving overall system performance, and enabling large-scale deployment of edge computing platforms. Over the past decades, mainstream memories have been improved only for storage requirements and lack circuit optimization for computation. Therefore, designing a compute-in-memory system of a certain scale requires comprehensive consideration of cost, power consumption, performance, scalability, and reliability, as well as the design of appropriate data interfaces, to achieve an organic integration of computation and storage units.

Currently, the compute-in-memory research conducted by academia and industry mainly relies on devices such as Static Random-Access Memory (SRAM) and emerging non-volatile memories (e-NVM) represented by Resistive Random-Access Memory. SRAM has the advantages of fast read/write speed, low read/write power consumption, good durability, and mature manufacturing processes. However, the significant drawbacks of using SRAM units for compute-in-memory are high static power consumption and low area efficiency. In addition, there is a trade-off between noise margin and storage density in SRAM units, and read disturbance problems limit the voltage swing of the bitlines. To ensure system reliability, many design schemes have abandoned the compact 6T SRAM structure and introduced more transistors as shown in FIG. 18, further reducing the area efficiency of SRAM. Compared to SRAM, compute-in-memory units implemented using emerging non-volatile memories have advantages such as high area efficiency and low power consumption. However, they suffer from limitations such as immature manufacturing processes, significant device mismatch, slow read/write speeds, and poor durability, making them unsuitable for building large-scale compute-in-memory systems in the short term.

According to the specific implementation method, compute-in-memory can be divided into two categories: digital compute-in-memory and analog compute-in-memory. The advantage of digital compute-in-memory lies in its ability to maintain accuracy on par with traditional computing platforms, while its improvement in computational efficiency is relatively limited. The advantage of analog compute-in-memory lies in its higher energy efficiency, but it has drawbacks such as a certain accuracy loss and additional data conversion overhead. Analog compute-in-memory can be further divided into charge-domain computing and current-domain computing. The former is based on the principle of charge conservation and performs weighted accumulation operations based on capacitance values. It has the advantage of high matching accuracy but introduces delay costs and area overhead due to the capacitors. The latter is based on Kirchhoff's current law and performs weighted accumulation operations by summing currents. It has the advantage of fast operation speed but relatively lower accuracy.
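By way of non-limiting illustration, the two analog approaches described above may be modeled in idealized form as follows. This is a minimal sketch only: the function names and all numeric values are hypothetical and are not taken from the present disclosure, and parasitic effects and device mismatch are ignored.

```python
def charge_domain_accumulate(voltages, capacitances):
    """Ideal charge sharing between capacitors precharged to `voltages`.

    Charge conservation gives V_out = sum(C_i * V_i) / sum(C_i), so each
    capacitance acts as the weight of its voltage.
    """
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)


def current_domain_accumulate(currents):
    """Kirchhoff's current law: branch currents entering a shared node add."""
    return sum(currents)


# Hypothetical example: four equal capacitors, two of them charged to 1.0 V.
shared_voltage = charge_domain_accumulate([1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

In the charge-domain model the result is a ratio of capacitances, which is consistent with the high matching accuracy noted above; in the current-domain model the result depends directly on the absolute value of each branch current.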

In the design of compute-in-memory circuits, the area efficiency of the computational unit is a key metric. On one hand, the area of the computational unit directly determines the manufacturing cost and commercial potential of the chip. On the other hand, due to the large number of parameters trained by mainstream neural network algorithms, compute-in-memory chips require frequent data interaction with Dynamic Random-Access Memory (DRAM) during the execution of AI algorithms, introducing additional delays and energy consumption. Therefore, improving the area efficiency of compute-in-memory can reduce the number of DRAM accesses, contributing to the overall performance improvement of the system.

Content-addressable memory (CAM) is a type of memory with search functionality. Its characteristic is that when performing a search operation, input data is used as the search content, the contents stored in the memory are matched with the input data, and the address of the matching data in the memory is outputted as the search result.

The characteristic of content-based search in CAM enables it to support the needs of some key applications. In computer networks, a routing table is a collection of data stored in a router or networked computer. The routing table stores paths pointing to specific network addresses, and the search for the routing table is a content-based search method as described earlier. Traditional search methods such as linear search and hash table search are software search methods based on random access memory (RAM), which are slow and cannot meet the search requirements of high-speed real-time communication systems. However, the hardware search method based on CAM completes the search within one clock cycle by querying all data in the memory in the same clock cycle, making the search speed unaffected by the amount of routing table data. The average search speed is much faster than that of the RAM-based search method, which significantly improves network transmission speed and network performance.
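By way of non-limiting illustration, the difference between a RAM-based software search and a CAM search may be sketched behaviorally as follows. The function names and the table contents are hypothetical; the sketch models only the result of each search, not its timing.

```python
def ram_linear_search(table, key):
    """Software search over RAM: entries are compared one at a time."""
    for address, entry in enumerate(table):
        if entry == key:
            return address
    return None


def cam_search(table, key):
    """Behavioral model of a CAM lookup.

    In hardware, every stored entry is compared with the key in the same
    clock cycle; the list below stands in for the per-entry match signals.
    """
    match_signals = [entry == key for entry in table]
    return [address for address, hit in enumerate(match_signals) if hit]


# Hypothetical routing-table contents.
routing_table = ["net-a", "net-b", "net-c"]
```

The linear search takes a number of comparisons that grows with the table size, whereas the CAM model returns the matching addresses from a single parallel comparison.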

Currently, the common CAM in academia is mainly based on storage units composed of Static Random-Access Memory (SRAM) cells, Resistive Random-Access Memory (ReRAM) cells, ferroelectric field-effect transistors (FeFETs), and so on. These storage units are usually composed of multiple devices, with a large area and high power consumption. Since the area determines the manufacturing cost of the CAM chip, and power consumption affects the application value of the CAM chip, it is necessary to reduce the complexity of CAM storage units, thereby improving the area efficiency and reducing the power consumption of CAM.

To improve overall system throughput and energy efficiency, a compute-in-memory architecture has been proposed to eliminate the “memory wall” bottleneck in von Neumann architectures. Compute-in-memory integrates the memory and processing units, enabling computations to be performed within the memory units themselves, reducing the additional consumption associated with data movement. This makes compute-in-memory architectures highly promising for deploying machine learning models on edge devices.

Compute-in-memory shares similar computing principles among different types of memory devices, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Flash memory, Resistive Random-Access Memory (ReRAM), and other emerging memory technologies. Owing to its mature manufacturing process, durability, flexibility in read/write operations, and fast access speed, SRAM is currently the mainstream choice for compute-in-memory architectures. However, using 6T SRAM for compute-in-memory faces challenges such as read disturbance, due to the limited voltage swing range of the bit-lines during 6T SRAM cell computation. Therefore, to improve the performance of SRAM-based compute-in-memory, some circuits utilize SRAM configurations with more transistors, such as 8T SRAM, 10T SRAM, and 12T SRAM.

To enable the widespread adoption of AI applications on edge devices, the area efficiency of the storage units in compute-in-memory devices is a crucial consideration during design. However, due to the low storage density characteristics of SRAM, even with the smallest 6T SRAM cells employed on the chip, large-scale neural networks still face challenges in complete deployment on a chip. This leads to the need for frequent reading of weight values from off-chip memory during computations, resulting in significant data transfer energy consumption and latency. Therefore, improving the area efficiency of compute-in-memory devices becomes a core optimization objective.

SUMMARY

According to a first aspect of the present disclosure, a compute-in-memory apparatus is provided, and the apparatus comprises:

- a computing array, which comprises a plurality of computing modules, wherein each computing module comprises at least one storage cell, a reset switch, and a capacitor, the storage cell comprises at least one storage switch, wherein:
- the storage switch comprises a storage control terminal, a storage detection terminal, and a storage terminal, the storage terminal is connected to a data storage line to receive a storage state voltage and the associated information, and the storage control terminal is connected to a control word-line to receive a control voltage to adjust the impedance characteristic between the storage detection terminal and the storage terminal;
- the reset switch comprises a reset control terminal, a reset detection terminal, and a reset terminal, and the reset control terminal is connected to a control word-line to receive a reset voltage to adjust the impedance characteristic between the reset detection terminal and the reset terminal;
- the reset terminal is connected to a reset state voltage line to receive a reset state voltage, the reset detection terminal and a first terminal of the capacitor are connected to an output terminal of the storage cell, and a second terminal of the capacitor is connected to a computing bit-line;
- a control module, which is connected to the computing array to control the computing array to perform at least one of a store operation, a read operation and a compute operation.

In an embodiment of the present disclosure, the storage cell comprises a first storage switch and a second storage switch, wherein the storage detection terminals of both the first storage switch and the second storage switch are connected to the output terminal of the storage cell.

In an embodiment of the present disclosure, the computing module additionally comprises a selection switch. The selection switch comprises a selection control terminal, a first detection terminal, and a second detection terminal, wherein,

- the selection control terminal is connected to a control bit line to receive a control voltage to adjust the impedance characteristic between the first detection terminal and the second detection terminal;
- the first detection terminal is connected to the output terminal of the storage cell, and the second detection terminal is connected to each storage detection terminal.

In an embodiment of the present disclosure, the storage cell comprises a first selection switch, a second selection switch, a third storage switch, a fourth storage switch, a fifth storage switch, and a sixth storage switch,

- the first detection terminals of the first selection switch and the second selection switch are connected to the output terminal of the storage cell, the second detection terminal of the first selection switch is connected to the storage detection terminals of the third storage switch and the fourth storage switch, the selection control terminal of the first selection switch is connected to a first control bit line, and the selection control terminal of the second selection switch is connected to a second control bit line,
- the storage control terminals of the third storage switch and the fifth storage switch are connected to a second control word-line, and the storage terminals of both the third storage switch and the fourth storage switch are connected to a first data storage line. The storage control terminals of both the fourth storage switch and the sixth storage switch are connected to a third control bit line, and the storage terminals of both the fifth storage switch and the sixth storage switch are connected to a second data storage line.

In an embodiment of the present disclosure, the compute operation includes a multiply-and-accumulate operation. The control module is additionally used to:

- activate the storage control terminal of the storage switch of the storage cell of the target computing module, whereby a logic AND operation is performed on the information carried by the control word-line connected to the storage control terminal of the activated storage switch and the information associated with the storage state voltage of the storage terminal of the activated storage switch;
- obtain a result of the multiply-and-accumulate operation through the computing bit-line.
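By way of non-limiting illustration, the multiply-and-accumulate operation described above may be modeled digitally as follows. This minimal sketch assumes binary inputs and binary stored values, and treats the computing bit-line as an ideal sum of the per-module AND results; the function name is hypothetical, and the analog charge behavior of the capacitors is not modeled.

```python
def mac_on_bit_line(word_line_bits, stored_bits):
    """Return the multiply-and-accumulate value for one computing bit-line.

    Each computing module contributes (input AND stored value); the shared
    bit-line is modeled as an ideal sum of those contributions.
    """
    if len(word_line_bits) != len(stored_bits):
        raise ValueError("one stored bit is expected per word-line input")
    return sum(x & w for x, w in zip(word_line_bits, stored_bits))


# Hypothetical example: four computing modules on one computing bit-line.
result = mac_on_bit_line([1, 0, 1, 1], [1, 1, 0, 1])
```

For one-bit operands the logic AND is identical to multiplication, which is why the accumulated AND results correspond to a multiply-and-accumulate result.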

In an embodiment of the present disclosure, the compute operation includes a logic AND operation. The control module is additionally used to:

- activate both the reset control terminal of the reset switch of a target computing module and the computing bit-line, thereby forming a voltage difference between the two terminals of the capacitor of the target computing module;
- turn off the reset control terminal and set the computing bit-line into floating state;
- input a set of operands for the logic AND operation via the reset control terminal, the control word-line connected to the storage control terminal of the storage switch, and the data storage line connected to the storage terminal of the storage switch;
- obtain a result of the logic AND operation on the set of operands through the computing bit-line.

In an embodiment of the present disclosure, within each column of the computing array, the control bit lines and computing bit-lines of at least one computing module are connected; within each row of the computing array, the control word-lines, data storage lines and reset state voltage lines of at least one computing module are connected.

In an embodiment of the present disclosure, the compute operation includes a multiply-and-accumulate operation. The control module is additionally used to:

- control one or more columns of computing modules of the computing array to perform the multiply-and-accumulate operation, and/or control some or all of the computing modules connected to the same computing bit-line to perform the multiply-and-accumulate operation.

In an embodiment of the present disclosure, the control module is additionally used to:

- control the computing modules connected to different computing bit-lines to perform a pipelined compute operation.

In an embodiment of the present disclosure, the control module is additionally used to:

- control each of the control word-lines, control bit lines, computing bit-lines, data storage lines, and reset state voltage lines to be grounded, whereby the computing array enters an idle mode.

According to a second aspect of the present disclosure, a neural network accelerator is provided. The neural network accelerator comprises at least one neural network module, the neural network module comprises at least one original convolutional layer, and the original convolutional layer comprises a backbone layer that has fixed weights and a branch layer that has adjustable weights. The backbone layer comprises one or more convolutional layers, and the branch layer at least comprises a first branch convolutional layer, a second branch convolutional layer, and a third branch convolutional layer, which are sequentially connected. The input channel number of the first branch convolutional layer is equal to that of the backbone layer, the output channel number of the third branch convolutional layer is equal to that of the backbone layer, the input channel number of the second branch convolution layer is smaller than that of the backbone layer, and the output channel number of the second branch convolution layer is smaller than that of the backbone layer,

- wherein, the backbone layer and the convolutional layers of the branch layer are implemented using the compute-in-memory apparatus.

In an embodiment of the present disclosure, the backbone layer and the first branch convolutional layer are used to receive an input to the neural network module, and an output of the neural network module is obtained by aggregating the output of the backbone layer and the output of the third branch convolutional layer.

In an embodiment of the present disclosure, in a training process of the neural network accelerator, the weights of each backbone layer are fixed, and/or the weight gradient of the backbone layer is zero in a back-propagation stage of the training process, and the weights of each branch layer are adjusted by gradient descent.
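By way of non-limiting illustration, the channel relationships between the backbone layer and the branch layer may be sketched as follows. The sketch makes several assumptions that are not stated in the present disclosure: all kernels are 1x1 and are applied at a single pixel, the second branch convolutional layer keeps the reduced channel count, and aggregation is element-wise addition. All class and function names are hypothetical, and only the forward path and parameter counts are modeled, not training.

```python
import random


def matvec(weights, x):
    """Apply a 1x1 convolution at one pixel: (out x in) matrix times channel vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_weights(out_channels, in_channels, rng):
    return [[rng.uniform(-1, 1) for _ in range(in_channels)] for _ in range(out_channels)]


class BackboneBranchLayer:
    """Hypothetical model of one original convolutional layer."""

    def __init__(self, in_ch, out_ch, reduced_ch, seed=0):
        if reduced_ch >= in_ch or reduced_ch >= out_ch:
            raise ValueError("the second branch layer must be narrower than the backbone")
        rng = random.Random(seed)
        self.backbone = random_weights(out_ch, in_ch, rng)  # fixed weights
        self.branch = [                                       # adjustable weights
            random_weights(reduced_ch, in_ch, rng),           # first branch layer
            random_weights(reduced_ch, reduced_ch, rng),      # second branch layer
            random_weights(out_ch, reduced_ch, rng),          # third branch layer
        ]

    def forward(self, x):
        main = matvec(self.backbone, x)
        side = x
        for weights in self.branch:
            side = matvec(weights, side)
        # Aggregation by element-wise addition is an assumption of this sketch.
        return [m + s for m, s in zip(main, side)]

    def adjustable_parameter_count(self):
        return sum(len(w) * len(w[0]) for w in self.branch)

    def fixed_parameter_count(self):
        return len(self.backbone) * len(self.backbone[0])


layer = BackboneBranchLayer(in_ch=8, out_ch=8, reduced_ch=2)
output = layer.forward([1.0] * 8)
```

With these hypothetical sizes the branch layer holds 36 adjustable weights against 64 fixed backbone weights, which illustrates how the narrower second branch convolutional layer limits the number of weights that must be adjusted during training.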

According to a third aspect of the present disclosure, an electronic device is provided, which includes the compute-in-memory apparatus, or includes the neural network accelerator.

The compute-in-memory apparatus of embodiments of the present disclosure comprises a computing array and a control module. The computing array comprises a plurality of computing modules, wherein each computing module comprises at least one storage cell, a reset switch Qf, and a capacitor. The storage cell comprises at least one storage switch, wherein: the storage switch comprises a storage control terminal, a storage detection terminal, and a storage terminal; the storage terminal is connected to a data storage line to receive a storage state voltage and the information associated with the storage state voltage; and the storage control terminal is connected to a control word-line to receive a control voltage to adjust the impedance characteristic between the storage detection terminal and the storage terminal. The reset switch Qf comprises a reset control terminal, a reset detection terminal, and a reset terminal; the reset control terminal is connected to a control word-line to receive a reset voltage to adjust the impedance characteristic between the reset detection terminal and the reset terminal, and the reset terminal is connected to a reset state voltage line to receive a reset state voltage. The reset detection terminal and a first terminal of the capacitor are connected to the output terminal of the at least one storage cell, and a second terminal of the capacitor is connected to the computing bit-line. The control module controls the computing array to perform at least one of a store operation, a read operation, and a compute operation. The storage cell of embodiments of the present disclosure may include as few as one storage switch, whereas the storage cells of the related art include at least six transistors; therefore, embodiments of the present disclosure have higher area efficiency. In addition, embodiments of the present disclosure store data through storage state voltages, and a single storage switch can realize multi-bit storage, which further improves area efficiency and significantly reduces the power consumption of data access and compute-in-memory.

According to a fourth aspect of the present disclosure, a compute-in-memory apparatus is provided, the apparatus comprises: a plurality of read-only memory devices, a plurality of current sources, and a control module, wherein:

- each storage terminal of the read-only memory devices is connected to one end of the corresponding current source, and the other end of the current source is used to receive a preset voltage for data storage;
- the control terminal of each read-only memory device is connected to the corresponding control word line to receive the data to be computed;
- the output terminal of each read-only memory device is used to output the result data through a compute bit line;
- the control module is used to select the corresponding read-only memory devices for operation through the control word line and output the result data, wherein the operation includes multiplication and data read operations, if the operation is a multiplication operation, then the result data is the product of the stored data corresponding to the preset voltage received by the current source and the data to be computed input through the control word line; if the operation is a data read operation, the result data is the stored data corresponding to the preset voltage received by the current source.

In an embodiment of the present disclosure, the apparatus comprises a plurality of compute modules, wherein each compute module comprises a read-only memory device; the output terminal of the read-only memory device is directly connected to the corresponding compute bit line to output the result data.

In an embodiment of the present disclosure, the apparatus comprises a plurality of compute modules, wherein each compute module comprises two or more read-only memory devices and at least one selection device; the control terminals of the selection devices are connected to the corresponding control bit lines, the input terminals of the selection devices are connected to the output terminals of the read-only memory devices, and the output terminals of the selection devices are connected to the corresponding compute bit lines to output the result data.

In an embodiment of the present disclosure, the read-only memory devices in each compute module are arranged in a multi-row and multi-column layout, the control terminals of each row of read-only memory devices are connected to the same control word line, the output terminals of each column of read-only memory devices are connected to the same selection devices, and the output terminals of each selection device are connected to the same compute bit line.

In an embodiment of the present disclosure, a plurality of compute modules are combined into a multi-row and multi-column layout through electrical connections, wherein the control word lines of the compute modules in the same row are electrically connected, the control bit lines of the compute modules in the same column are electrically connected, the compute bit lines of the compute modules in the same column are electrically connected, and the compute bit lines are connected to the power supply voltage through selection devices.

In an embodiment of the present disclosure, the operation further includes a multiply-and-accumulate operation, and the result data includes the multiply-and-accumulate result; the control module is further used to:

- input a group of data through the control word lines, activate the control terminals of the corresponding selection devices through the control bit lines to select the compute modules participating in the multiply-and-accumulate operation, use the control word lines to select the read-only memory devices participating in the computation in each compute module, perform a multiplication operation on the input data and the stored data corresponding to the preset voltage received by the current sources in the compute modules, accumulate the multiplication results obtained in each compute module and output them in the form of a current through the compute bit lines to obtain the multiply-and-accumulate result.
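By way of non-limiting illustration, the multiply-and-accumulate operation on one compute bit line may be modeled behaviorally as follows. The sketch simplifies each compute module to a single selected column of read-only memory devices and represents stored data and bit-line current as plain numbers; the function name, the data layout, and the example values are hypothetical.

```python
def rom_multiply_accumulate(inputs, stored, selected_modules):
    """Model the current summed on one compute bit line.

    inputs[m][r]     -- data applied on control word line r of compute module m
    stored[m][r]     -- data stored by the read-only memory device on that row
    selected_modules -- modules enabled through their control bit lines
    """
    bit_line_current = 0
    for m in selected_modules:
        for x, w in zip(inputs[m], stored[m]):
            bit_line_current += x * w  # per-device multiplication
    return bit_line_current


# Hypothetical example: two compute modules sharing one compute bit line.
example_inputs = [[1, 0], [1, 1]]
example_stored = [[3, 2], [1, 4]]
```

Modules that are not selected through their control bit lines contribute nothing to the sum, which corresponds to their selection devices remaining off.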

In an embodiment of the present disclosure, the control module is further used to adjust the signal timing of the control word lines and control bit lines to achieve pipelining operations between different compute bit lines.

In an embodiment of the present disclosure, the control module is further used to control the compute modules to enter a working mode or an idle mode, wherein

- in the working mode, the read-only memory devices in the compute modules perform the operations,
- in the idle mode, no current flows inside the compute modules.

According to a fifth aspect of the present disclosure, a neural network accelerator is provided, wherein the neural network accelerator comprises the compute-in-memory apparatus.

According to a sixth aspect of the present disclosure, an electronic device is provided, wherein the electronic device comprises the neural network accelerator.

According to a seventh aspect of the present disclosure, a content-addressable storage device is provided, including:

- a plurality of storage units, each storage unit including a read-only memory device and a capacitor, wherein the read-only memory device includes a first input terminal, a second input terminal, and an output terminal, and the output terminal of the read-only memory device is connected to a first terminal of the capacitor; the read-only memory device stores data based on the connection relationship between the first input terminal, the second input terminal, and the output terminal: when the first input terminal of the read-only memory device is connected to the output terminal, the read-only memory device stores first data, and when the second input terminal of the read-only memory device is connected to the output terminal, the read-only memory device stores second data, the voltage levels of the first data and the second data being different;
- a control module, connected to each storage unit, which is used to control the voltages of the first and second input terminals of the read-only memory device in each storage unit so as to perform a required operation,
- wherein the operation result of the required operation is determined based on the voltage of a second terminal of the capacitor.

In one possible implementation, the required operation includes an XNOR logic operation, and controlling the voltages of the first and second input terminals of the read-only memory device to perform the required operation includes:

- Grounding the first and second input terminals of the read-only memory device, and discharging the capacitor;
- Floating the first and second input terminals of the read-only memory device, and leaving the output terminal floating electrically;
- Applying the signals to the first and second input terminals of the read-only memory device according to the input data, and executing the XNOR logic operation between the input data and the stored data with the device.

The required operation's result is then determined based on the voltage of the second terminal of the capacitor, obtaining the XNOR logic operation result between the input data and the stored data.
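By way of non-limiting illustration, the XNOR logic operation may be modeled at the logic level as follows. The sketch assumes that a stored bit of 1 corresponds to the output terminal being connected to the first input terminal, that a stored bit of 0 corresponds to a connection to the second input terminal, and that the input bit and its complement are applied to the first and second input terminals respectively. These encodings and the function names are hypothetical, and the capacitor is not modeled.

```python
def rom_output(stored_bit, first_input, second_input):
    """The output terminal follows whichever input terminal it is wired to."""
    return first_input if stored_bit == 1 else second_input


def xnor_with_stored(stored_bit, input_bit):
    """XNOR of the input bit and the stored bit.

    Assumed encoding: the input bit drives the first input terminal and
    its complement drives the second input terminal.
    """
    return rom_output(stored_bit, input_bit, 1 - input_bit)


# Truth table over all combinations of stored bit and input bit.
truth_table = {(s, b): xnor_with_stored(s, b) for s in (0, 1) for b in (0, 1)}
```

Under these assumptions the output is high exactly when the input bit equals the stored bit, which is the XNOR relationship.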

In one possible implementation, the control module is connected to the first input terminal of the read-only memory device via a first bitline, and to the second input terminal of the read-only memory device via a second bitline. The control module is also connected to the second terminal of the capacitor through a matching line. Multiple storage units are combined into a layout of multiple rows and columns through electrical connections, wherein the matching lines of some or all storage units in the same row are connected, and the first bitlines and the second bitlines of some or all storage units in the same column are respectively connected.

In one possible implementation, the required operation includes a content-addressing operation, and controlling the voltages of the first and second input terminals of the read-only memory device of each storage unit to perform the required operation includes:

- Grounding the first and second input terminals of each read-only memory device in one or more rows of storage units, and discharging the corresponding capacitors;
- Floating the first and second input terminals of each read-only memory device in the one or more rows of storage units, and leaving the corresponding matching lines floating;
- Applying signals to the first and second input terminals of the read-only memory devices in the one or more rows of storage units according to the input data, to perform the content-addressing operation on the one or more rows. The operation result of the required operation is determined by the voltages of the second terminals of the capacitors: the addressing result for the input data and the stored data is obtained from the voltage of the matching line of each activated row of storage units.
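By way of non-limiting illustration, the content-addressing operation over several rows may be modeled as follows. The sketch assumes equal capacitors, so that the matching line level of a row is proportional to the number of storage units whose stored bit equals the corresponding input bit, and it treats a row as a match only when every bit agrees. The function names, this detection rule, and the example contents are hypothetical.

```python
def matching_line_level(stored_row, search_word):
    """Fraction of storage units in one row whose stored bit equals the input bit."""
    hits = sum(1 for s, b in zip(stored_row, search_word) if s == b)
    return hits / len(stored_row)


def content_address(stored_rows, search_word):
    """Return the addresses of rows whose matching line indicates a full match."""
    return [
        address
        for address, row in enumerate(stored_rows)
        if matching_line_level(row, search_word) == 1.0
    ]


# Hypothetical stored contents: three rows of three bits each.
stored_rows = [[1, 0, 1], [0, 0, 1], [1, 0, 1]]
matches = content_address(stored_rows, [1, 0, 1])
```

Because every activated row is evaluated against the same input data, the addresses of all matching rows are obtained from one search.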

In one possible implementation, the required operation includes a content-addressing operation, and controlling the voltages of the first and second input terminals of the read-only memory device of each storage unit to perform the required operation includes:

- Grounding the first and second input terminals of each read-only memory device in the storage units connected to the same matching line, and discharging each capacitor of the storage units connected to the same matching line;
- Floating the first and second input terminals of each read-only memory device in the storage units connected to the same matching line, and leaving the matching line electrically floating;
- Applying signals to the first and second input terminals of the read-only memory devices in the storage units connected to the same matching line according to the input data, to perform the content-addressing operation. The operation result of the required operation is determined by the voltages of the second terminals of the capacitors: the addressing result for the input data and the stored data is obtained from the voltage of the matching line.

In one possible implementation, the required operation also includes a read operation to retrieve the data stored in the storage unit. The control module is also used to: maintain complementary voltage levels at the first and second input terminals, distinguish whether the output terminal is connected to the first or second input terminal based on the magnitude of the output voltage, and thereby obtain the stored data.
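
The read decision described above can be sketched as a simple comparison. The voltage values, the midpoint threshold, the function name, and the mapping of "connected to the first input terminal" to a stored 0 are assumptions made for illustration; the source does not specify them.

```python
def read_stored_bit(output_voltage: float,
                    first_input_voltage: float = 1.0,
                    second_input_voltage: float = 0.0) -> int:
    """Infer the stored bit from the output voltage of one ROM device.

    The two input terminals are held at complementary levels. The output
    follows whichever input it is connected to, so comparing the output with
    the midpoint of the two levels tells which terminal it is tied to.
    Assumed encoding (hypothetical): tied to the first input means 0,
    tied to the second input means 1.
    """
    if first_input_voltage == second_input_voltage:
        raise ValueError("input terminals must be held at complementary levels")
    threshold = (first_input_voltage + second_input_voltage) / 2
    output_is_high = output_voltage > threshold
    first_is_high = first_input_voltage > second_input_voltage
    connected_to_first = output_is_high == first_is_high
    return 0 if connected_to_first else 1
```

For example, with the default levels an output near 1.0 V is read as connected to the first input terminal, and an output near 0 V as connected to the second.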

In one possible implementation, the control module is also used to control at least one storage unit to operate in a working mode or an idle mode, wherein a storage unit in the working mode performs the required operation, and the first input terminal, the second input terminal, and the output terminal of the read-only memory device of a storage unit in the idle mode are set to a low level.

According to an eighth aspect of this disclosure, a memory is provided, which includes the content-addressable storage device as described above.

According to a ninth aspect of this disclosure, an electronic device is provided, which includes the memory as described above.

The implementations of the various aspects disclosed herein realize content-addressable storage devices based on read-only memory devices, which can improve the area efficiency of the content-addressable storage device and reduce or even eliminate unnecessary energy consumption caused by memory access, thereby reducing power consumption.

According to a tenth aspect of the present disclosure, a high-density CiM device based on read-only memory (ROM) devices is provided, characterized by the following components:

- multiple computing modules, each of which includes at least one ROM device, multiple selection devices, electrical sources, data storage lines, control word-lines, computing bit-lines, and data selection control lines, wherein:
- the control terminal of the ROM device is connected to the corresponding control word-line to receive a control signal; the two data terminals of the ROM device are connected to different data storage lines; the data storage lines are connected to the electrical source and to the computing bit-lines through the selection devices; the data state of each data storage line is represented as a current or a voltage, depending on the type of the electrical source; and the control terminal of each selection device is connected to the corresponding data selection control line to receive a data selection control signal, which enables the corresponding selection device;
- a control module connected to the control word-lines and the data selection control lines of each computing module, and used to select the desired ROM devices and selection devices through the control word-lines and the data selection control lines and to output the resulting data through the computing bit-lines.

In an embodiment of the present disclosure, the selection devices include data selection devices, which consist of upper data selection devices and lower data selection devices. The upper data selection devices include a first upper data selection device and a second upper data selection device, and the lower data selection devices include a first lower data selection device and a second lower data selection device. The data selection control lines include a first data selection control line and a second data selection control line, and the data storage lines include first data storage lines and second data storage lines, wherein:

- the input terminals of both the first lower data selection device and the second lower data selection device are connected to the electrical source;
- the output terminal of the first lower data selection device is connected to the input terminal of the second upper data selection device through the first data storage line;
- the output terminal of the second lower data selection device is connected to the input terminal of the first upper data selection device through the second data storage line;
- the control terminal of the first lower data selection device and the control terminal of the first upper data selection device are both connected to the first data selection control line;
- the control terminal of the second lower data selection device and the control terminal of the second upper data selection device are both connected to the second data selection control line;
- the output terminals of the first upper data selection device and the second upper data selection device are connected to the computing bit-lines;
- the first data terminal and the second data terminal of each ROM device are respectively connected to the corresponding first data storage line and second data storage line.
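
The cross-coupled connections listed above can be checked with a small connectivity model of a single ROM device and its four data selection devices. This is a sketch under assumptions: every device is treated as an ideal switch, and the ROM device is assumed to conduct between its two data terminals only when its word-line is asserted and it has been programmed to conduct. The node names and the function name are hypothetical.

```python
from collections import deque


def source_reaches_bitline(sel1: bool, sel2: bool,
                           word_line: bool, rom_conducts: bool) -> bool:
    """Return True if a closed path exists from the source to the bit-line."""
    # (node_a, node_b, closed?) for each ideal switch in the model.
    switches = [
        ("source", "dsl1", sel1),   # first lower data selection device
        ("source", "dsl2", sel2),   # second lower data selection device
        ("dsl2", "bitline", sel1),  # first upper device, fed by the second line
        ("dsl1", "bitline", sel2),  # second upper device, fed by the first line
        ("dsl1", "dsl2", word_line and rom_conducts),  # ROM device (assumed model)
    ]
    adjacency = {}
    for a, b, closed in switches:
        if closed:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    # Breadth-first search from the source over closed switches only.
    seen = {"source"}
    queue = deque(["source"])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return "bitline" in seen
```

With only one data selection control line asserted, the only path from the source to the computing bit-line runs through the ROM device, in one direction or the other, so the bit-line reflects the ROM device's state. In this model, asserting both control lines at once would bypass the ROM device, which suggests that the two lines are meant to be driven one at a time.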

In an embodiment of the present disclosure, the selection devices further include column selection devices, wherein:

- the control terminal of each column selection device is connected to a control line to receive a column selection signal; each column selection device is connected in series with a lower data selection device; the input terminal of each such series-connected combination is connected to the electrical source, and its output terminal is connected to the corresponding data storage line.

In an embodiment of the present disclosure, the electrical source is multiple current sources or multiple voltage sources. If the electrical source is multiple voltage sources, the computing module further includes a reset switch and a capacitor: the control terminal of the reset switch is connected to a reset control word-line to receive a reset signal; the reset terminal of the reset switch receives a reset state voltage; the reset detection terminal of the reset switch and the first terminal of the capacitor are connected to the output terminal of the upper data selection device; and the second terminal of the capacitor is connected to the computing bit-line.

In an embodiment of the present disclosure, the ROM devices in the computing module are arranged in a layout of multiple rows and columns, wherein:

- the control terminals of the ROM devices in each row are connected to the same control word-line; the two data terminals of each ROM device in each column are respectively connected to the corresponding first data storage line and second data storage line; and the output terminal of each upper data selection device is connected to the corresponding computing bit-line, either through a capacitor or directly, so that the result is output through the computing bit-line.

In an embodiment of the present disclosure, the first data storage line connected to each ROM device in the (K+1)th column is the second data storage line connected to each ROM device in the Kth column, where K is an integer.
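
The line sharing between adjacent columns can be illustrated with a short indexing sketch. The 1-based numbering of columns and data storage lines, and the function names, are hypothetical conventions chosen for illustration.

```python
from typing import Tuple


def data_storage_lines_for_column(k: int) -> Tuple[int, int]:
    """Return (first_line, second_line) indices for 1-based column k.

    Column k uses line k as its first data storage line and line k + 1 as
    its second, so the second line of column k is the first line of k + 1.
    """
    if k < 1:
        raise ValueError("column index must be at least 1")
    return (k, k + 1)


def total_data_storage_lines(num_columns: int, shared: bool = True) -> int:
    """Count data storage lines with or without sharing between columns."""
    if num_columns < 1:
        raise ValueError("need at least one column")
    return num_columns + 1 if shared else 2 * num_columns
```

Under this numbering, 8 columns need 9 shared data storage lines instead of 16 separate ones, which is consistent with the high-density goal stated for this aspect.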