This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2022-0088942 filed on Jul. 19, 2022, and Korean Patent Application No. 10-2022-0143480 filed on Nov. 1, 2022, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
The following description relates to a memory device with in-memory computing (IMC).
A vector-matrix multiplication operation, which is also known as a multiply-accumulate (MAC) operation, may be central to the performance of applications in various technical fields. For example, the MAC operation may be performed for machine learning and authentication of a multi-layer neural network. An input signal may be considered to form an input vector and may be data of images, byte streams, or other datasets to be processed by a neural network, for example. The input signal may be multiplied by a weight of an input layer of a neural network, for example, and an output vector may be obtained from an accumulated MAC operation result. The output vector may be provided as an input vector for a subsequent layer of the neural network. The MAC operation may be iteratively performed in a sequence of layers, and the processing performance of the neural network may thus be determined mainly by the performance of the MAC operation. The MAC operation may be implemented through in-memory computing (IMC).
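As a minimal sketch of the layer-by-layer MAC flow described above, the following Python snippet computes an output vector as a weighted sum of an input vector and provides it as the input vector of a subsequent layer. The function name `mac_layer` and the example weights are hypothetical and chosen only for illustration, and no activation function is modeled.

```python
from typing import List

def mac_layer(inputs: List[int], weights: List[List[int]]) -> List[int]:
    """Return one output per row of `weights`: the sum of input * weight products."""
    return [sum(i * w for i, w in zip(inputs, row)) for row in weights]

# Hypothetical two-layer example (illustrative values only).
layer1_weights = [[1, 0, 1], [0, 1, 1]]   # 2 nodes, 3 inputs each
layer2_weights = [[1, 1]]                 # 1 node, 2 inputs
x = [1, 1, 0]
hidden = mac_layer(x, layer1_weights)     # output vector of the first layer
out = mac_layer(hidden, layer2_weights)   # reused as the input vector of the next layer
```

Because every layer repeats the same multiply-and-accumulate pattern, the cost of this inner loop may dominate the overall processing time.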
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, a memory device includes a multiplying cell including a memory cell including a pair of inverters including a first inverter and a second inverter, each inverter including an input and an output, wherein the input of the first inverter is connected to the output of the second inverter at a first end of the pair of inverters, and wherein the output of the first inverter is connected to the input of the second inverter at a second end of the pair of inverters, a first transistor connected to the first end of the pair of inverters, and a second transistor connected to the second end of the pair of inverters, in which a value is stored, and a switching element connected to an output end of the memory cell, the switching element configured to perform switching in response to an input value and output a signal corresponding to a multiplication result between the input value and the stored value.
The switching element may be configured to, when connected between a supply voltage and the output end of the memory cell: be turned off in response to a logic value of one being received as the input value, and be turned on in response to a logic value of zero being received as the input value.
The switching element may be configured as a pull-up transistor configured to receive the input value at a gate terminal.
The first transistor and the second transistor may each be an N-type metal-oxide-semiconductor (NMOS) transistor, and the pull-up transistor may be a P-type metal-oxide-semiconductor (PMOS) transistor.
The memory device may be configured to select one operation from between a first operation and a second operation and perform the selected operation, wherein the first operation may include driving a voltage at an output end of the pull-up transistor to a supply voltage in response to a voltage less than the supply voltage being applied through a word line in some multiplication operations in a series of multiplication operations, and outputting each time a multiplication operation result according to an input supplied to the memory device, and the second operation may include driving a voltage at the output end of the pull-up transistor to the supply voltage in a pre-charge phase for each multiplication operation, and performing a multiplication operation in an evaluation phase.
The memory device may be further configured to select the one operation from between the first operation and the second operation based on either an operating frequency of the memory device or a leakage.
The memory device may further include an adder connected to an output end of the multiplying cell and configured to add an inverse value of a signal output from the multiplying cell.
The memory device may further include a global bit line and a switch for a read operation or a write operation on the weight of the memory cell through access to the memory cell of the multiplying cell.
The multiplying cell may include memory cells connected to the same pull-up transistor.
The memory device may further include an input/word line driver configured to select, from among the memory cells, a memory cell to be used for a target multiplication operation.
The input/word line driver may include a decoding circuit configured to decode an input value provided to the multiplying cell from an input signal and from a signal designating the memory cell to be used for the target multiplication operation.
The memory device may be further configured to activate a word line connected to a memory cell storing a value corresponding to a target operation among memory cells included in one multiplication cell, and deactivate a word line connected to a memory cell, among the memory cells, other than the memory cell of the activated word line.
The memory device may be further configured to select a first memory cell from among the memory cells for a first operation among a plurality of operations and output a signal corresponding to a multiplication result through the same pull-up transistor, and select a second memory cell from among the memory cells for a second operation among the plurality of operations and output a signal corresponding to a multiplication result through the same pull-up transistor.
The memory device may further include multiplying cells including the multiplying cell, and may be configured to perform a multiplication operation in each of the multiplying cells in parallel with other multiplying cells, and add, in the same adder, outputs of multiplying cells connected to the same column line among the plurality of multiplying cells.
The multiplying cell may be connected to a pair of local bit lines, a first memory cell among memory cells included in the multiplying cell may be connected to a first local bit line, and a second memory cell among the plurality of memory cells may be connected to a second local bit line.
The first memory cell may be connected to the first local bit line and may have a value corresponding to a weight of a neural network, and the second memory cell connected to the second local bit line may have an inverse value of the weight.
The memory device may further include an accumulator configured to store an output of an adder configured to add multiplication results of the multiplying cell, and accumulate results of the adding.
The memory device may further include an output register configured to store a final multiplication operation result output from the accumulator.
The memory device may be further configured to, when receiving an input signal corresponding to a last bit of a single bit or multiple bits, store an accumulator operation result for the input signal in an output register.
The memory device may further include a memory controller configured to control the multiplying cell, an input/word line driver, a read/write circuit, an adder, an accumulator, and an output register.
The memory device may be further configured to, in response to either a preset period having elapsed or a multiplication operation using another memory cell being performed in each multiplying cell, perform an operation for a pre-charge on an output end of a pull-up transistor.
In one general aspect, a method of operating a memory device includes receiving an input value through a word line by a memory cell including two inverters connected to each other in opposite directions relative to each other, and two transistors connected to both ends of the two inverters, receiving the input value at a gate terminal by a pull-up transistor connected to an output end of the memory cell, and outputting, from an output end of the pull-up transistor, a signal corresponding to a multiplication result between the input value and a weight stored in the memory cell.
In one general aspect, a memory device includes a pull-up transistor having a gate and connected to an output line, and a memory cell including a pair of inverters connected to each other at their respective ends in opposite directions such that the pair of inverters has a first end and a second end, and a cell transistor having a gate and connected to the first end of the pair of inverters and to the output line, and in response to an input having the same logic value being applied to the gate of the pull-up transistor and the gate of the cell transistor, configured to output, to the output line, a logic value corresponding to a binary multiplication result between the input and a binary value stored in the memory cell.
The logic value corresponding to the binary multiplication result may be a NAND result.
The pull-up transistor may be a P-type metal-oxide-semiconductor (PMOS) transistor, and the cell transistor may be an N-type metal-oxide-semiconductor (NMOS) transistor.
The multiplication result may be output every clock cycle.
The multiplication result may be output only every two clock cycles.
The cell transistor may be a first cell transistor, and the memory cell may further include a second cell transistor having a gate and connected to the second end of the pair of inverters, wherein an input having the same logic value is applied to the gate of the second cell transistor.
The output line may be a first output line, and the memory device may further include a second output line.
The cell transistor may be a first cell transistor, and the memory cell may further include a second cell transistor having a gate and connected to the other end of the pair of inverters and to the second output line.
The pull-up transistor may be a first pull-up transistor, and the memory device may further include a second pull-up transistor connected to the second output line.
The memory cell may be one of multiple memory cells connected to the first output line and the second output line.
The memory cell may be one of multiple memory cells connected to the output line.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same or like elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application.
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof.
Throughout the specification, when a component or element is described as being “connected to,” “coupled to,” or “joined to” another component or element, it may be directly “connected to,” “coupled to,” or “joined to” the other component or element, or there may reasonably be one or more other components or elements intervening therebetween. When a component or element is described as being “directly connected to,” “directly coupled to,” or “directly joined to” another component or element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein. The use of the term “may” herein with respect to an example or embodiment, e.g., as to what an example or embodiment may include or implement, means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
Hereinafter, examples will be described in detail with reference to the accompanying drawings. When describing the examples with reference to the accompanying drawings, like reference numerals refer to like components and a repeated description related thereto is omitted.
In computing devices that use the von Neumann architecture, performance and power efficiency may be limited by frequent data movements between an operator portion (e.g., a main processor) and a memory portion. IMC, which is a computer architecture for performing computation operations (e.g., MAC operations) directly on data in a memory in which the data is stored, may reduce the frequency of data movements between a processor 120 and an IMC memory device 110 and may increase power efficiency. In an IMC system 100, the processor 120 may input data to be computed into the IMC memory device 110, and the IMC memory device 110 may perform an operation (or computation) by itself on the data. The processor 120 may read a result of the operation from the IMC memory device 110. Accordingly, data transmission during such a computation process may be minimized.
For example, the IMC system 100 may perform a MAC operation that is frequently used in an artificial intelligence (AI) algorithm and in various other kinds of operations. As illustrated in
In Equation 1, O_t denotes an output to a t-th node, I_m denotes an m-th input, and W_{t,m} denotes a weight to be applied to the m-th input to be input to the t-th node. O_t, which is an output of a node or a node value of the node, may be calculated as a weighted sum of the input I_m and the weight W_{t,m}. Here, m may be greater than or equal to zero (0) and less than or equal to M−1, and t may be greater than or equal to 0 and less than or equal to T−1. M denotes the number of nodes of a previous layer connected to one node of a current layer (the current layer being a target to be computed) and T denotes the number of nodes of the current layer. According to an embodiment, the IMC memory device 110 of the IMC system 100 may perform the MAC operation described above with input data provided to the IMC memory device 110 serving as one operand and with data stored in the IMC memory device 110 as another operand (e.g., weight data). The IMC memory device 110 may also be referred to as a resistive memory device, a memory array, or an IMC device.
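Based on the symbol definitions above, the weighted sum may be written as follows. This form is inferred from the surrounding description of the symbols and index ranges, and is offered as a reconstruction rather than a verbatim copy of the original equation.

```latex
O_t = \sum_{m=0}^{M-1} I_m \cdot W_{t,m}, \qquad 0 \le t \le T-1
```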
IMC devices may be classified into analog IMC devices and digital IMC devices. Analog IMC devices may perform a MAC operation in an analog domain including a current, a charge, or a time domain. Digital IMC devices may perform a MAC operation using a logic circuit, for example. Digital IMC may be readily implemented by advanced processing and exhibit a desirable performance. According to an embodiment, the memory device 110 may have a static random-access memory (SRAM) unit for storing a bit, which may include a plurality of transistors (e.g., six transistors). The SRAM unit including six transistors may also be referred to as a 6T SRAM. The SRAM unit may store data as a logic value of 0 or 1 and may thus not require domain transformation. For example, the memory device 110 may include a multiplying cell in which a pull-up transistor and a memory cell (e.g., an SRAM cell) are combined. The multiplying cell may include multiple memory cells connected to one pull-up transistor, and thus the memory array of the memory device 110 may be implemented with a smaller number of transistors. Accordingly, the memory device 110 may have hardware with improved area efficiency and power efficiency by the multiplying cell. The memory device 110 is not limited to being used for a MAC operation, and the memory device 110 may be used to drive various algorithms that include memory storage and multiplication operations. A computing structure in which the memory device 110 directly performs an operation within its memory without an external data movement is described below.
According to an embodiment, a memory device 200 (e.g., the memory device 110 of
The input/word line driver 220 may transmit, to the multiplying cell 210, input data on which an operation is to be performed. The input/word line driver 220 may generate a pull-up signal and a word line signal to be applied to a memory cell of each multiplying cell 210 and a pull-up transistor. The pull-up signal and the word line signal may each be a signal that is determined based on an input value of input data, and will be described later with reference to
For example, when a weight is a multi-bit value, output lines corresponding to the number of bits for representing the weight may be grouped. The grouped output lines may be collectively referred to as an output line group. For example, in a case of an X-bit weight, X output lines may be grouped, and the grouped X output lines may output sums of multiplication results between an input value and the X-bit weight. In this example, X may be an integer greater than or equal to 2. For example, a first output line among the X output lines grouped into one group may output a multiplication result between a weight bit value corresponding to a least significant bit (LSB) of the weight and an input bit value. Similarly, an x-th output line may output a multiplication result between a weight bit value at an (x−1)-th bit position from the LSB and an input bit value. In this example, x may be an integer greater than or equal to 2 and less than or equal to X. In this example, an accumulator circuit 241 may apply, to the sum of results output from each output line of the same output line group, bit shifting by the bit position corresponding to that output line, and may accumulate the bit-shifted values to output a final MAC operation result.
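The grouping of output lines for a multi-bit weight may be sketched as follows. This is a behavioral illustration only, assuming unsigned weights and single-bit inputs; the names `weight_bits` and `grouped_mac` are hypothetical and do not appear in the description.

```python
from typing import List

def weight_bits(weight: int, num_bits: int) -> List[int]:
    """Split an unsigned weight into bits, LSB first (index 0 = first output line)."""
    return [(weight >> k) & 1 for k in range(num_bits)]

def grouped_mac(input_bits: List[int], weights: List[int], num_bits: int) -> int:
    """Combine per-output-line sums for single-bit inputs and X-bit weights."""
    total = 0
    for k in range(num_bits):                      # k-th output line of the group
        # Sum of 1-bit products on this output line.
        line_sum = sum(i & weight_bits(w, num_bits)[k]
                       for i, w in zip(input_bits, weights))
        total += line_sum << k                     # shift by the line's bit position
    return total

# Illustrative values: three inputs, 2-bit weights.
result = grouped_mac([1, 0, 1], [3, 2, 1], num_bits=2)
```

Under these assumptions, the shifted and accumulated value equals the ordinary sum of input-weight products, which is the property the output line group relies on.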
Also, when one multiplying cell 210 includes multiple memory cells, the input/word line driver 220 may select a memory cell storing a weight to be applied to received input data. The input/word line driver 220 may use a decoding unit (e.g., a decoding circuit) to extract a value indicating the memory cell storing the weight to be applied to the input data. Operation of a structure in which the multiplying cell 210 includes a plurality of memory cells is described with reference to
According to an embodiment, the multiplying cell 210 may perform a multiplication operation between a received input value and a weight stored in a memory cell. The multiplying cell 210 may output a signal corresponding to a multiplication result, through a structure in which the memory cell, a pull-up transistor, a word line WL, and a pull-up line PU are connected. For example, as described with reference to
The adder 230 may have an input connected to an output end of the multiplying cell 210. The output end of the multiplying cell 210 may correspond to an output line. The output end of the multiplying cell 210 may be connected to one output line. The adder 230 may add an inverse value of a signal output from the multiplying cell 210. The adder 230 may add multiplication results of multiplying cells 210 connected to the same output line. The adder 230 may be implemented as a full adder, a half adder, and/or a flip-flop, and may be implemented as an adder tree circuit. In addition, as described above, an output result of the multiplying cell 210 may be a NAND result value, and thus the adder 230 may be implemented with the inclusion of an inverting function or an inverter (logical negation) for inverting the output result of each multiplying cell 210. The adder 230 may add inverted values (results) outputted by respective multiplying cells 210. The adder 230 may transmit a result of adding a plurality of multiplication results to the accumulator circuit 241. The adder 230 may be disposed on each output line. For example, when there are T output lines, T adders may be respectively disposed. In this example, T multiplication result sum values may be transmitted from the T adders to the accumulator circuit 241.
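The inversion performed by the adder 230 before summation may be sketched as follows. The snippet is a logic-level illustration, assuming each multiplying cell on an output line contributes a 1-bit NAND value; the names `nand` and `column_sum` are hypothetical.

```python
from typing import List

def nand(a: int, b: int) -> int:
    """1-bit NAND, the value assumed to appear at the output of a multiplying cell."""
    return 1 - (a & b)

def column_sum(input_bits: List[int], weight_bits: List[int]) -> int:
    """Invert each NAND output to recover the AND (1-bit product), then add."""
    nand_outputs = [nand(i, w) for i, w in zip(input_bits, weight_bits)]
    return sum(1 - n for n in nand_outputs)

# Illustrative values: four multiplying cells on one output line.
total = column_sum([1, 1, 0, 1], [1, 0, 1, 1])
```

Inverting each NAND value yields the 1-bit product of the input and the weight, so the sum counts the positions where both are 1.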
The outputter 240 may include the accumulator circuit 241 and an output register 242. The accumulator circuit 241 may output a final MAC operation result by combining results.
The accumulator circuit 241 (e.g., an accumulator) may store an output of the adder 230 (which adds multiplication results of multiplying cells 210) and may accumulate results of the adding. For example, when the input/word line driver 220 receives multi-bit input data (e.g., streamed to the memory device 200), the input/word line driver 220 may sequentially transmit a bit value for each bit position to each multiplying cell 210. Thus, each multiplying cell 210 may output a multiplication result value of a corresponding bit position. The adder 230 may transmit a result of adding multiplication result values of a corresponding bit position to the accumulator circuit 241. The accumulator circuit 241 may perform bit shifting on the adding result for the corresponding bit position. The accumulator circuit 241 may combine the bit-shifted adding result with an adding result for a subsequent bit position and may thereby obtain an accumulated result of the multiplication results across the bit positions. As described later, when the input/word line driver 220 receives single-bit input data, bit shifting may not be required, and thus the accumulator circuit 241 may transmit the adding result of the adder 230 immediately to the output register 242.
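The shift-and-accumulate behavior for multi-bit input data may be sketched as follows. The description does not fix the order in which bit positions are streamed, so this sketch assumes unsigned inputs streamed from the most significant bit first and 1-bit weights; the function name `bit_serial_mac` is hypothetical.

```python
from typing import List

def bit_serial_mac(inputs: List[int], weights: List[int], input_bits: int) -> int:
    """Accumulate per-bit-position sums of unsigned inputs, MSB first (assumed order)."""
    acc = 0
    for pos in reversed(range(input_bits)):        # stream MSB -> LSB
        # Adder output for this bit position: sum of (input bit AND 1-bit weight).
        partial = sum(((x >> pos) & 1) & w for x, w in zip(inputs, weights))
        acc = (acc << 1) + partial                 # shift previous result, then combine
    return acc

# Illustrative values: three 3-bit inputs and 1-bit weights.
mac_result = bit_serial_mac([5, 3, 6], [1, 0, 1], input_bits=3)
```

With a single-bit input the loop runs once and no effective shift occurs, consistent with the adding result being passed on directly.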
The output register 242 may store a final multiplication operation result (e.g., a MAC result) output from the accumulator circuit 241. The final multiplication operation result (e.g., the MAC result) stored in the output register 242 may be read by the processor to be used for other operations. For example, when the memory device 200 is capable of performing only a MAC operation corresponding to some of the layers of a neural network at a time, a MAC result stored in the output register 242 may be transmitted to the input/word line driver 220 for an operation of a subsequent layer. The input/word line driver 220 of the memory device 200 may select a memory cell in which a weight set corresponding to the subsequent layer is stored and may then perform a multiplication operation.
The weight set may be a set of weights by which an input is multiplied in one MAC operation. That is, the weight set and the input may be operands of the MAC operation. For example, the weight set may be a set of connection weights between nodes in one layer and nodes in another layer in a neural network. However, the weight set is not limited to a set of connection weights between nodes in a neural network, and a different weight set may be used for each task. Moreover, application of the memory device 200 is not limited to any particular type of input or stored data. For example, when a first weight set is required in a MAC operation for a first task, the memory device 200 may select a memory cell in which a weight included in the first weight set is stored from among memory cells included in a multiplying cell 210. Similarly, when a second weight set is required in a MAC operation for a second task, the memory device 200 may select a memory cell in which a weight included in the second weight set is stored.
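The selection of a memory cell according to the required weight set may be sketched behaviorally as follows. The class `MultiplyingCell`, its method `multiply`, and the stored values are hypothetical and model only the logical effect of activating one word line while the others remain deactivated.

```python
from typing import List

class MultiplyingCell:
    """Behavioral model: several stored weight bits sharing one output line."""

    def __init__(self, stored_bits: List[int]) -> None:
        self.stored_bits = stored_bits             # one bit per memory cell

    def multiply(self, input_bit: int, selected: int) -> int:
        """Return input AND the weight of the selected memory cell.

        Only the word line of `selected` is treated as active; the other
        memory cells are ignored, mirroring deactivated word lines.
        """
        return input_bit & self.stored_bits[selected]

# Hypothetical cell holding one bit from each of two weight sets.
cell = MultiplyingCell([1, 0])
first_task = cell.multiply(1, selected=0)   # uses the first weight set
second_task = cell.multiply(1, selected=1)  # uses the second weight set
```

In this model, switching tasks changes only the selected index, which reflects how several weight sets may share the same output path.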
The read/write circuit 280 may read and write data of a memory cell included in a multiplying cell 210. The data of the memory cell may include, for example, a weight by which an input value is to be multiplied in a MAC operation. The read/write circuit 280 may access the memory cell of the multiplying cell 210 through a global bit line (e.g., a GBL and a GBLB as shown in
The memory controller 290 may control the multiplying cells 210, the input/word line driver 220, the read/write circuit 280, the adders 230, the accumulator circuit 241, and the output register 242.
The memory device 200 may be implemented as a neural network device, an IMC circuit, and/or a MAC circuit or device. The memory device 200 may include area-efficient SRAM multiplying cells for IMC. The memory device 200 may receive an input value through a word line, and may output a signal (e.g., a NAND result signal) corresponding to a multiplication result between the input value and a weight stored in a 6T SRAM memory cell through a bit line. The memory device 200 may perform functions of a controller and a multiplier with a smaller number of transistors.
According to an embodiment, a multiplying cell 310 may perform a multiplication operation between an input value and a weight previously set/stored in a memory cell 311. Each multiplying cell 310 may include a memory cell 311 and a switching element 319 (e.g., a pull-up transistor). Each multiplying cell 310 may be connected to two local bit lines (e.g., an LBL and an LBLB), and one switching element 319 may be disposed on at least one of the two local bit lines. For example, each multiplying cell 310 may include only one switching element 319 on one of the two local bit lines. In the examples illustrated in
According to an embodiment, the memory cell 311 may have a set/stored weight. The memory cell 311 may selectively provide a weight-based signal to an output line in response to an input value. For example, when receiving a first logic value (e.g., a logic value of 0 or L) through a word line, the memory cell 311 may be disconnected from the output line. When receiving a second logic value (e.g., a logic value of 1 or H) through the word line, the memory cell 311 may provide a weight-based signal (e.g., a signal indicating an inverse value (QB) of a logic value of a set/stored weight) to the output line.
The memory cell 311 may include two inverters INV1 and INV2 and a cell transistor (e.g., a first transistor TR1). The cell transistor may have a gate and may be connected to one end of the pair of inverters INV1 and INV2 and to the output line. Two transistors (e.g., cell transistors) may be connected to both ends of the two inverters INV1 and INV2. For example, the pair of inverters INV1 and INV2 may be connected in opposite directions. A memory device may include multiple memory cells connected to the output line.
The inverters INV1 and INV2 may be paired at respective ends thereof. The first transistor TR1 (e.g., a first cell transistor) may be connected to one end of the pair of inverters INV1 and INV2. A second transistor TR2 (e.g., a second cell transistor) may be connected to the other end of the pair of inverters INV1 and INV2. The memory cell 311 may be configured with six transistors including the two inverters INV1 and INV2, the first transistor TR1, and the second transistor TR2. The memory cell 311 may be an SRAM implemented with six transistors. The value QB, obtained by inverting the weight, may be set at one end of the pair of inverters INV1 and INV2. The weight may be set at the other end of the pair of inverters INV1 and INV2 in the memory cell 311. Gate terminals of the first transistor TR1 and the second transistor TR2 may be connected to a word line WLm. One end of the first transistor TR1 may be connected to the first local bit line LBLB, and the other end of the first transistor TR1 may be connected to the pair of inverters INV1 and INV2. One end of the second transistor TR2 may be connected to the second local bit line LBL, and the other end of the second transistor TR2 may be connected to the pair of inverters INV1 and INV2. The cell transistors (e.g., the first transistor TR1 and the second transistor TR2) may each be an N-type metal-oxide-semiconductor (NMOS) transistor. An input having the same logic value may be applied to a gate of a pull-up transistor, a gate of the first cell transistor, and a gate of the second cell transistor. The first cell transistor may be connected to the first output line (e.g., the first local bit line LBLB), and the second cell transistor may be connected to the second output line (e.g., the second local bit line LBL).
The switching element 319 may be connected to an output end Nout of the memory cell 311. The switching element 319 may output a signal corresponding to a multiplication result between an input value and a weight by performing switching in response to the input value. The switching element 319 may be connected between a supply voltage VDD and the output end Nout of the memory cell 311. The switching element 319 may be turned off when receiving a logic value of 1 as the input value. The switching element 319 may be turned on when receiving a logic value of 0 as the input value. For example, the switching element 319 may include a pull-up transistor capable of receiving an input value at a gate terminal. Examples of the switching element 319 as being the pull-up transistor are mainly described herein.
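The switching behavior described above may be summarized with the following logic-level sketch. It assumes ideal logic levels (VDD as 1) and that the same input value is applied to the pull-up transistor and the word line; the function name `multiplying_cell_output` is hypothetical.

```python
def multiplying_cell_output(input_bit: int, weight: int) -> int:
    """Logic level on the output line for a 1-bit input and a stored weight."""
    if input_bit == 0:
        # Pull-up on, memory cell disconnected: the line is driven to VDD (1).
        return 1
    # Pull-up off, cell connected: the line takes QB, the inverse of the weight.
    return 1 - weight

# Enumerate all input/weight combinations.
truth_table = {(i, w): multiplying_cell_output(i, w) for i in (0, 1) for w in (0, 1)}
```

The output is 0 only when both the input and the weight are 1, which matches a NAND of the two values.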
The pull-up transistor may have a gate and may be connected to an output line. Also, in the examples of
According to an embodiment, as an input (a same logic value) is applied to a gate of a pull-up transistor and to a gate of a cell transistor, the memory device (e.g., the multiplying cell 310) may output, to an output line, a logic value corresponding to a binary multiplication result of a binary weight set/stored in the memory cell 311 and the input. The logic value corresponding to the binary multiplication result may be determined as a NAND logic output. For example, the multiplying cell 310 may operate as illustrated in the truth table illustrated in
In the examples of
In the examples of
However, in the multiplying cell 310 operating as illustrated in
For example, signals in an inverse relationship may appear in the second local bit line LBL and the first local bit line LBLB. An example in which the same logic value is applied to the pull-up line PU and the activated word line WLm is mainly described herein.
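As a non-limiting illustration, the NAND behavior of a single multiplying cell may be sketched as a behavioral model in Python. The function names are hypothetical and the model ignores analog effects such as leakage and threshold-voltage drop. It assumes that an input of 0 turns the pull-up transistor on and the cell transistor off, and that an input of 1 turns the pull-up transistor off so that the output line follows the inverted weight QB.

```python
def multiplying_cell_output(input_bit: int, weight_bit: int) -> int:
    """Logic level on the output line of one multiplying cell (behavioral sketch)."""
    if input_bit not in (0, 1) or weight_bit not in (0, 1):
        raise ValueError("input_bit and weight_bit must each be 0 or 1")
    if input_bit == 0:
        # Pull-up transistor on, cell transistor off: output driven to VDD (logic 1).
        return 1
    # Pull-up transistor off, cell transistor on: output follows QB (inverted weight).
    return 1 - weight_bit


def product_from_output(output_bit: int) -> int:
    """Recover the binary product by inverting the NAND output."""
    return 1 - output_bit


if __name__ == "__main__":
    for a in (0, 1):
        for w in (0, 1):
            out = multiplying_cell_output(a, w)
            print(f"input={a} weight={w} output={out} product={product_from_output(out)}")
```

In this sketch, inverting the output yields the binary product of the input and the weight for all four input combinations.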
According to an embodiment, a memory device (e.g., the memory device 200 of
The first operation 410 may be an operation of outputting a multiplication operation result every time (every clock/CLK cycle) according to a supplied input. The first operation 410 may include an operation of driving a voltage at an output end of the pull-up transistor to the supply voltage as a voltage (e.g., 0V) sufficiently lower than the supply voltage is applied through the word line WL in some of a series of multiplication operations. That is, the voltage at the output end may be initialized to the supply voltage. The multiplying cell of the memory device may receive an input signal (e.g., an input value) on which an operation is to be performed every clock cycle through the word line WL. The multiplying cell may output a multiplication operation result between the input value and a weight stored in a node Q.
For example, in a state of M1, when the input value received through the word line WL is 1 and the weight of node Q is 0, the multiplying cell may maintain the supply voltage VDD on a local bit line LBLB. This is because when there is no leakage current (or when a leakage current is less than or equal to a threshold value) a voltage of the local bit line LBLB may be maintained at the supply voltage VDD without being dropped. In a state of M2, the input value of the word line WL is 0, and thus the local bit line LBLB may be driven toward the supply voltage VDD. Even when a slight leakage current occurs in the state of M1, the voltage of the local bit line LBLB may be restored due to the driving in the state of M2. When the input becomes 1 again in a state of M3, similar to the state of M1, the multiplying cell may maintain the supply voltage VDD on the local bit line LBLB. Thus, unless the leakage is large, the multiplying cell may substantially correctly output, as a voltage (e.g., 0 or VDD) corresponding to a logic value, a result of a multiplication of all input bit values and weight bit values to the local bit line LBLB through an output end.
For example, the memory device may perform an operation for pre-charging on the output end of the pull-up transistor in response to either a case where a predetermined period has elapsed or a case where a multiplication operation using another memory cell is performed in each multiplying cell. During the operating time, if an input value of 0 is not received through the word line WL and through the pull-up line PU, and if a voltage is not driven to the supply voltage on the local bit line LBLB, the voltage of the local bit line LBLB may be gradually reduced by an amount of voltage that may be up to VDD−VTH. The memory device may periodically perform an initialization operation (e.g., an operation of applying a voltage of 0 to the word line WL) such that the voltage of an output end of a multiplier is maintained at the supply voltage.
The second operation 420 may be an operation of driving a voltage of the output end of the pull-up transistor to the supply voltage in a pre-charge phase P for each multiplication operation and performing a multiplication operation in an evaluation phase E (as opposed to every clock cycle as in the first operation 410). For example, in the second operation 420, a first clock cycle may be used for the pre-charge phase P and a next clock cycle may be used for the evaluation phase E. An operation in the evaluation phase E may be the same as the first operation 410. The memory device may forcibly set the voltage of the word line WL to 0 during a corresponding clock cycle in the pre-charge phase P. That is, the memory device may drive, to the supply voltage VDD, the voltage of the local bit line LBLB to which the output end of the multiplying cell is connected. Thereafter, the memory device may perform an operation by transmitting an input value to the word line WL in the evaluation phase E. The second operation 420 may be used in a structure in which a large leakage current occurs due to a circuit structure and layout or in a circuit using a clock cycle of a frequency lower than a threshold value.
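As a non-limiting illustration, the difference between the two operations in terms of what is driven on the word line per clock cycle may be sketched as follows. The function name and the "P"/"E" labels are hypothetical. The sketch assumes one clock cycle per phase and labels every cycle of the first operation as an evaluation cycle.

```python
from typing import List, Tuple


def word_line_schedule(input_bits: List[int], two_phase: bool) -> List[Tuple[str, int]]:
    """Return (phase, word_line_value) per clock cycle.

    two_phase=False models the first operation: one evaluation cycle per input.
    two_phase=True models the second operation: a pre-charge cycle with the
    word line at 0 precedes each evaluation cycle.
    """
    schedule: List[Tuple[str, int]] = []
    for bit in input_bits:
        if bit not in (0, 1):
            raise ValueError("input bits must be 0 or 1")
        if two_phase:
            schedule.append(("P", 0))  # pre-charge: word line held at 0
        schedule.append(("E", bit))    # evaluation: input value on the word line
    return schedule


if __name__ == "__main__":
    print(word_line_schedule([1, 0, 1], two_phase=False))
    print(word_line_schedule([1, 0, 1], two_phase=True))
```

In this sketch, the second operation uses twice as many clock cycles as the first operation for the same sequence of inputs.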
The memory device may selectively determine and use an operation option in an advantageous manner according to a situation. For example, the memory device may select the first operation 410 or the second operation 420 of the memory device based on an operating frequency of the memory device or a leakage. The memory device may perform the second operation 420 when the operating frequency is less than a threshold frequency, and perform the first operation 410 when the operating frequency is greater than or equal to the threshold frequency. The memory device may perform the second operation 420 when the leakage is greater than a threshold value and perform the first operation 410 when the leakage is less than or equal to the threshold value. The memory device may further include a circuit for monitoring the foregoing operating frequency or leakage current, and a memory controller of the memory device, an input/word line driver, or an external processor may determine which of the operating modes is in effect.
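As a non-limiting illustration, the selection between the first operation 410 and the second operation 420 may be sketched as follows. The names, the units, and the threshold values are hypothetical. Combining the frequency criterion and the leakage criterion with a logical OR is an assumption of this sketch, since the two criteria are described separately.

```python
from enum import Enum


class Operation(Enum):
    FIRST = 410   # multiplication result output every clock cycle
    SECOND = 420  # pre-charge phase followed by evaluation phase


def select_operation(freq_hz: float, leakage: float,
                     freq_threshold_hz: float, leakage_threshold: float) -> Operation:
    """Pick an operation from the monitored operating frequency and leakage.

    Assumption of this sketch: either criterion alone is enough to select
    the second operation.
    """
    if freq_hz < freq_threshold_hz or leakage > leakage_threshold:
        return Operation.SECOND
    return Operation.FIRST


if __name__ == "__main__":
    # Hypothetical thresholds: 10 MHz and 1.0 (arbitrary leakage units).
    print(select_operation(1e6, 0.2, 1e7, 1.0))  # low frequency
    print(select_operation(1e8, 0.2, 1e7, 1.0))  # high frequency, low leakage
```

A memory controller, an input/word line driver, or an external processor may evaluate such a condition using values reported by a monitoring circuit.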
According to an embodiment, a memory device (e.g., the memory device 200 of
In addition, the memory device may further include a global bit line (e.g., GBL and GBLB) and a switch SW for at least one of a read operation or a write operation on the weight of the memory cell through access to the memory cell of the multiplying cell 510. The global bit line (e.g., GBL and GBLB) may be connected to a first transistor and a second transistor of the multiplying cell 510 via the switch SW. GBLB indicates a global bit line bar (as in a crossbar construction). The global bit line (e.g., GBL and GBLB) may be connected to a read/write circuit 580. For example, the memory device may turn on switches SW disposed at both ends of a memory cell that is a target of a read operation or a write operation. The memory device may access a corresponding switched-on memory cell by activating a word line connected to the memory cell. The memory device may read a weight value recorded in the memory cell or may change and/or set the weight value of the memory cell through the read/write circuit 580.
Hereinafter, a structure that may improve area efficiency (computation/storage per unit of chip area) as a plurality of memory cells is connected within one multiplying cell 510 to share one pull-up transistor will be described with reference to
According to an embodiment, a multiplying cell 610 may be implemented in a structure in which a plurality of memory cells 611 share the same multiplication circuit (i.e., store bits for a same multiplying cell 610). For example, at least one multiplying cell 610 may include memory cells 611 connected to the same pull-up transistor 619 of the multiplying cell 610. The pull-up transistor 619 may be connected to output ends of the respective memory cells 611 at the same node and on the same local bit line.
The input/word line driver 620 may select a memory cell 611 of a multiplying cell 610 to be used for a target multiplication operation from among the plurality of memory cells 611 of the multiplying cell 610. The input/word line driver 620 may include a decoding circuit. The decoding circuit may decode an input value provided to the multiplying cell 610 from an input signal and a signal appointing/selecting the memory cell 611 among the memory cells 611 included in the multiplying cell 610 to be used for the target multiplication operation. For example, in the example of
For example, the input/word line driver 620 may apply an m-th input value INm to an m-th pull-up line PUm and an i-th word line WLm,i in the multiplying cell 610 in response to an m-th input. A remaining word line WLm,k may be deactivated. As illustrated in a timing diagram, the m-th multiplying cell 610 may output a multiplication result Pm,i between the input value received through the i-th word line and a weight of the i-th memory cell 611 through a shared pull-up transistor 619 on a local bit line. That is, the multiplying cell 610 may output the multiplication result Pm,i between the m-th input value INm and the i-th weight Qm,i.
For example, the truth table of
According to an embodiment, the memory device may selectively activate a memory cell corresponding to each operation while sequentially performing a plurality of operations. That is, memory cells in a multiplying cell may be activated sequentially for respectively corresponding operations. When M multiplying cells are arranged on one output line and each of the multiplying cells includes N memory cells, a total number of memory cells may be M×N. For each operation, one memory cell may be selected from each of the M multiplying cells, and thus the memory device may select M memory cells from among the M×N memory cells. For a first operation among a plurality of operations, the memory device may select a first memory cell from among a plurality of memory cells (for each of the M multiplying cells) and output a signal corresponding to a multiplication result through the same pull-up transistor 619. For a second operation among the plurality of operations, the memory device may select a second memory cell among the plurality of memory cells and output a signal corresponding to a multiplication result through the same pull-up transistor 619.
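As a non-limiting illustration, the selection of one memory cell per multiplying cell for each operation may be sketched as follows. The function name and the data layout are hypothetical. The sketch assumes that the same memory cell index is selected in each of the M multiplying cells, that the selected word line carries the input value while the remaining word lines carry 0, and that the shared pull-up transistor is on when the input value is 0.

```python
from typing import List


def select_and_multiply(inputs: List[int], weights: List[List[int]], selected: int) -> List[int]:
    """Return the NAND output of each of the M multiplying cells.

    inputs:   M input bits, one per multiplying cell.
    weights:  M x N weight bits, weights[m][k] stored in memory cell k of cell m.
    selected: index of the memory cell used for this operation.
    """
    outputs = []
    for in_bit, cell_weights in zip(inputs, weights):
        if not 0 <= selected < len(cell_weights):
            raise IndexError("selected memory cell index out of range")
        # The selected word line carries the input; remaining word lines are 0.
        word_lines = [in_bit if k == selected else 0 for k in range(len(cell_weights))]
        if in_bit == 0:
            out = 1  # shared pull-up transistor on, all word lines 0
        else:
            active = word_lines.index(1)      # exactly one word line is active
            out = 1 - cell_weights[active]    # output follows the inverted weight
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    w = [[1, 0], [1, 1], [0, 1]]  # M = 3 multiplying cells, N = 2 memory cells each
    print(select_and_multiply([1, 0, 1], w, selected=0))  # first operation
    print(select_and_multiply([1, 0, 1], w, selected=1))  # second operation
```

In this sketch, M memory cells out of the M×N memory cells contribute to each operation, and the remaining memory cells are deactivated.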
For example, referring to
According to an embodiment, a memory device may include multiplying cells including a multiplying cell 710. For example, the multiplying cells may be arranged in an array structure. The multiplying cells may be arranged along a plurality of output lines and a plurality of word lines. As illustrated in
For example, multiplying cells connected to the same word line may receive the same input value INm. Each of the multiplying cells may perform a multiplication operation in parallel with each of the other multiplying cells. The memory device may add outputs of multiplying cells connected to the same column line (e.g., the same output line) among the multiplying cells, in the same adder 730. One multiplying cell and another multiplying cell may output their multiplication results in parallel with each other. In one multiplying cell (e.g., the multiplying cell 710), a multiplication operation based on one memory cell may be performed. That is, for example, when each multiplying cell 710 includes N memory cells, the input/word line driver 720 may select one memory cell from among the N memory cells every cycle. When M multiplying cells are connected to an output line, M multiplication operations may be performed in parallel. When there are T output lines, M×T multiplication operations may be performed in parallel in the memory array of the memory device. Since results of the M multiplication operations connected to the same output line are added, an outputter 740 may generate T accumulated output values.
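As a non-limiting illustration, the parallel operation of an array with M word lines and T output lines may be sketched as follows. The function name and the data layout are hypothetical. The sketch assumes one already-selected weight bit per multiplying cell and models each adder as summing the inverted NAND outputs of its output line.

```python
from typing import List


def array_mac(inputs: List[int], weights: List[List[int]]) -> List[int]:
    """Return T column sums for an M x T array of multiplying cells.

    inputs:  M input bits, inputs[m] shared by all cells on word line m.
    weights: M x T selected weight bits, weights[m][t] on output line t.
    """
    if not weights or len(weights) != len(inputs):
        raise ValueError("weights must have one row per input")
    num_output_lines = len(weights[0])
    sums = [0] * num_output_lines
    for in_bit, row in zip(inputs, weights):
        if len(row) != num_output_lines:
            raise ValueError("all rows must have the same number of output lines")
        for t, w_bit in enumerate(row):
            nand_out = 1 - (in_bit & w_bit)  # signal on the local bit line
            sums[t] += 1 - nand_out          # adder adds the inverted NAND result
    return sums


if __name__ == "__main__":
    # M = 3 word lines, T = 2 output lines: 6 multiplications per cycle.
    print(array_mac([1, 1, 0], [[1, 0], [1, 1], [1, 1]]))
```

In this sketch, each of the T returned values corresponds to the sum produced by the adder of one output line.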
In the memory device illustrated in
As illustrated on the right side of
According to an embodiment, a multiplying cell 810 may be connected to a pair of local bit lines. The multiplying cell 810 may output a multiplication result based on a first memory cell 811 (selected among a plurality of memory cells of the multiplying cell 810) to a first local bit line 850R, and output a multiplication result based on a second memory cell 812 to a second local bit line 850L. In the example of
The memory device may include a first pull-up transistor 819-R for outputting the multiplication result to the first local bit line 850R (e.g., a first output line) and may also include a second pull-up transistor 819-L for outputting the multiplication result to the second local bit line 850L (e.g., a second output line). Accordingly, the first memory cell 811 connected to the first local bit line 850R may have a value corresponding to a weight. The second memory cell 812 connected to the second local bit line 850L may have an inverse value of the weight. The memory device may include a plurality of memory cells connected to the first output line and the second output line.
In an adder, the multiplication result output through the first local bit line 850R of the first memory cell 811 may be added to the multiplication result output through the second local bit line 850L of the second memory cell 812. That is, even in the same multiplying cell, multiplication results of memory cells connected to different local bit lines may be added in the adder. A structure illustrated in
For example, the memory device may output a result of a multiplication operation based on a memory cell having an even-numbered weight index/location in the multiplying cell 810 to the first local bit line 850R, and output a result of a multiplication operation based on a memory cell having an odd-numbered weight index/location to the second local bit line 850L. However, a method of setting a weight is not limited to the foregoing example. Although the number of memory cells connected to the first local bit line 850R and the number of memory cells connected to the second local bit line 850L are described herein as being the same in one multiplying cell 810 for a symmetrical structure, examples are not limited thereto. For example, depending on design, the number of memory cells connected to each local bit line may vary.
According to an embodiment, the memory device may simultaneously perform multiplications on a first weight Qm,i and a second weight Qm,j with respect to the same input value INm within one multiplying cell 810. The input/word line driver 820 may apply a logic value of the input value INm to a pull-up line PUm, a second word line LWLm,j, and a first word line RWLm,i, all at once. The input/word line driver 820 may apply a logic value of 0 to all remaining word lines. A first multiplication result RP and a second multiplication result LP may be simultaneously output respectively from the first local bit line 850R and the second local bit line 850L. The structure illustrated in
A multiplying cell 910 illustrated in
The multiplication results of the local bit lines may be individually transmitted to an adder 930. For example, as illustrated in
In operation 1010, a memory device may transmit an input value to a multiplying cell. For example, a memory cell may receive the input value through a word line. As described above, the memory cell may have two inverters connected (paired ends) in opposite directions and two transistors connected to the paired ends of the two inverters, respectively. A pull-up transistor connected to an output end of the memory cell may receive the input value at a gate terminal.
In operation 1020, the multiplying cell of the memory device may output a signal corresponding to a multiplication result. For example, the memory device may output a signal corresponding to the multiplication result between the input value and a weight stored in the memory cell from an output end of the pull-up transistor. According to the truth table illustrated in
In operation 1101, a memory device may manage data in a memory array. For example, the memory device may set/store a weight (or any data to serve as an operand for an IMC operation such as a MAC operation) for each memory cell of the memory array, using a read/write circuit. A processor external to the memory device may instruct the memory device with data to be written and an address of the memory cell for which the weight is to be set/stored.
In operation 1102, the memory device may determine whether to initiate a MAC operation. For example, when receiving an input value that is a target or operand of the MAC operation, the memory device may initiate the MAC operation.
Subsequently, in operation 1010, the memory device may transmit the input value to a multiplying cell. For example, in operation 1111, the memory device may transmit an input signal and a weight set address to an input/word line driver. The external processor may also transmit, to the memory device, the input signal and the weight set address (e.g., a signal indicating an i-th memory cell among memory cells included in the multiplying cell). In operation 1112, the input/word line driver may generate a control signal. For example, the input/word line driver may decode the input signal and the weight set address, and apply a logic value equal to the input value to a pull-up line PUm and a word line WLm,i. The input/word line driver may apply a logic value of 0 to remaining word lines.
In operation 1120, the memory device may output a signal corresponding to a multiplication result of a memory cell selected from within the multiplying cell. For example, each multiplying cell may output a signal (e.g., a NAND result value) corresponding to a multiplication result between an input value INm and a weight Qm,i of the selected memory cell to a local bit line. Outputs of a plurality of multiplying cells connected to the same output line may be transmitted to an adder of the corresponding output line.
In operation 1130, the adder may perform a sum operation on multiplication results. As described above, the adder may receive a NAND result and may add an inverse value obtained by inverting the NAND result. The adder may transmit the added multiplication result values to an accumulator.
In operation 1140, the accumulator may accumulate a result of adding the multiplication results. As described later, in the case of a multi-bit input value, the accumulator may perform bit shifting according to a corresponding bit position and accumulate a multiplication result for a subsequent bit position.
In operation 1150, the memory device may determine whether the input value on which the multiplication operation is performed is a last bit. For example, when performing an operation on the last bit, the memory device may transmit an output of the accumulator to an output register. In the case of a single-bit input value, the accumulation may not be needed, and thus the accumulator may bypass the multiplication result to the output register. When a current input bit value is not the last bit, the memory device may perform the same operation on an input bit value of the subsequent bit position. When the multiplication result is output from the adder, the memory device may perform bit shifting on a previously stored accumulation result through the accumulator, add the shifted result to the current multiplication result, and store the sum in the accumulator again to accumulate the result.
In operation 1160, the memory device may store the accumulated result in the output register. For example, when receiving an input signal corresponding to the last bit of a single bit or multiple bits, the memory device may store, in the output register, a result of an operation of the accumulator for the input signal.
In operation 1170, the memory device may initialize the accumulator and at operation 1180 the process may end when the MAC operation is completed.
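As a non-limiting illustration, the flow of operations 1120 through 1170 for one output line may be sketched as follows. The function name is hypothetical. The sketch assumes unsigned multi-bit input values processed from the most significant bit first, single-bit weights, and a left shift by one bit position of the previously accumulated result before each addition.

```python
from typing import List


def bit_serial_mac(inputs: List[int], weights: List[int], n_bits: int) -> int:
    """Accumulate sum(inputs[m] * weights[m]) one input bit position per cycle."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if any(x < 0 or x >= (1 << n_bits) for x in inputs):
        raise ValueError("each input must fit in n_bits as an unsigned value")
    if any(w not in (0, 1) for w in weights):
        raise ValueError("weights must be 0 or 1")

    accumulator = 0
    for pos in range(n_bits - 1, -1, -1):  # assumed order: MSB first
        # Operation 1120: each multiplying cell outputs a NAND result.
        nand_outputs = [1 - (((x >> pos) & 1) & w) for x, w in zip(inputs, weights)]
        # Operation 1130: the adder adds the inverted NAND results.
        partial_sum = sum(1 - out for out in nand_outputs)
        # Operation 1140: shift the stored result, then add the current sum.
        accumulator = (accumulator << 1) + partial_sum

    output_register = accumulator  # operation 1160: store after the last bit
    accumulator = 0                # operation 1170: initialize the accumulator
    return output_register


if __name__ == "__main__":
    print(bit_serial_mac([3, 5, 2], [1, 0, 1], n_bits=3))
```

In this sketch, a single-bit input value (n_bits equal to 1) passes the adder result to the output register without any shifting.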
According to an embodiment, the memory device may reduce the total number of transistors required for implementing a multiplication function by 30% or more, compared to a device embodying a 128 Kb crossbar array structure with 10 or 12 transistors.
According to an embodiment, an electronic device 1200 may include a high-density (HD) IMC macro 1210, a central processing unit (CPU) 1220, a random-access memory (RAM) 1230, a logic block 1240, and a high-efficiency (HE) IMC macro 1250.
The HD IMC macro 1210 may be a memory macro unit in which multiplying cells described above with reference to
The CPU 1220 may include a high-speed (HS) IMC macro 1221. The HS IMC macro 1221 may have a high throughput and operating speed and may represent a cell structure of a register file type.
The RAM 1230 may include a memory to be used as a system memory.
The logic block 1240 may include a logic circuit to be used for various logic operations.
The HE IMC macro 1250 may have a high energy efficiency and may operate at a low supply voltage.
According to an embodiment, the electronic device 1200 may be implemented as a dedicated hardware accelerator for an artificial intelligence (AI) algorithm (e.g., face recognition).
While embodiments are described herein as operating on neural network data such as weight data and input data inputted to a neural network, the embodiments of memory devices described herein are not limited to such applications. The IMC memory device features described herein can be used with any type of stored data or input data.
The computing apparatuses, the electronic devices, the processors, the memories, the image sensors, the displays, the information output system and hardware, the storage devices, and other apparatuses, devices, units, modules, and components described herein with respect to
The methods illustrated in
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, Blu-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Therefore, in addition to the above disclosure, the scope of the disclosure may also be defined by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0088942 | Jul 2022 | KR | national |
10-2022-0143480 | Nov 2022 | KR | national |