Recent developments in the field of artificial intelligence have resulted in various products and/or applications, including, but not limited to, speech recognition, image processing, machine learning, natural language processing, or the like. Such products and/or applications often use neural networks to process large amounts of data for learning, training, cognitive computing, or the like. Memory devices configured to perform computing-in-memory (CIM) operations (also referred to herein as CIM memory devices) are usable in neural network applications, as well as other applications. A CIM memory device includes a memory array configured to store weight data and/or input data to be used together in one or more CIM operations.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components, values, operations, materials, arrangements, or the like, are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Other components, values, operations, materials, arrangements, or the like, are contemplated. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. Source/drain(s) may refer to a source or a drain, individually or collectively dependent upon the context.
Further, spatially relative terms, such as “beneath,” “below,” “lower,” “above,” “upper” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly.
In certain situations, a CIM operation involves an analog signal obtained by performing a digital-to-analog conversion (DAC) operation to convert digital data into the analog signal. In some embodiments, a subthreshold region of access transistors in memory cells of a memory array is used in a DAC operation. For example, various gate voltages lower than a threshold voltage of the access transistors are supplied by a control circuit to gates of the access transistors. In response to the corresponding gate voltage, an individual current corresponding to a datum stored in each memory cell and the corresponding gate voltage is permitted to flow through each memory cell. The individual currents of the memory cells are collected on a bit line, and a summation of the individual currents is performed to obtain an analog signal. As a result, digital data stored in the memory cells are converted to the analog signal. In some embodiments, the described DAC operation is performed without requiring or involving one or more separate DAC circuits (i.e., digital-to-analog converters). In at least one embodiment, this is an improvement in at least one of power consumption, chip area, or design simplicity, over other approaches which use or require separate DAC circuits to perform DAC operations.
In some embodiments, besides the described DAC operation, the memory array is configured to perform other operations, such as read operations, program operations (or write operations), without requiring changes to arrangements of bit lines and/or word lines in the memory array.
In some embodiments, considering that the subthreshold region of the access transistors is temperature-dependent, a temperature sensor is provided to detect a temperature of the memory array during operation. Based on the detected temperature and predetermined calibration data, the gate voltages supplied to the access transistors during a DAC operation are adjusted, to ensure accuracy of the DAC operation in one or more embodiments.
In some embodiments, the memory array comprises a specific region configured to perform DAC operations. Such a specific region is sometimes referred to as a DAC region. Digital data to be converted are stored in a different region, sometimes referred to as a data storage region, of the memory array, and are copied to the DAC region to be converted to analog signals. As a result, it is possible in one or more embodiments to avoid data disturb in the data storage region, and/or to simplify the described temperature-dependent adjustment. In some embodiments, one or more devices, methods, operations, advantages described herein are applicable or achievable in applications other than CIM applications.
The memory device 100 comprises a memory macro 110 and a memory controller 120. The memory macro 110 comprises a memory array 112 of memory cells MC, a bit line (BL) selection circuit 115, a current summation circuit 117, and a sensing circuit 119. In some embodiments, the memory macro 110 further comprises one or more computation circuits configured to perform one or more CIM operations. An example of a computation circuit comprises a Multiply-Accumulate circuit (MAC). Other computation circuit configurations are within the scopes of various embodiments. In at least one embodiment, for an application other than a CIM application, computation circuits are omitted in the memory device 100. The memory controller 120 comprises a word line driver 122, and a control logic 123. The memory controller 120 is sometimes referred to as a control circuit. In some embodiments, one or more elements of the memory controller 120 are included in the memory macro 110, and/or one or more elements (except the memory array 112) of the memory macro 110 are included in the memory controller 120.
A macro has a reusable configuration and is usable in various types or designs of IC devices. In some embodiments, the macro is understood in the context of an analogy to the architectural hierarchy of modular programming in which subroutines/procedures are called by a main program (or by other subroutines) to carry out a given computational function. In this context, an IC device uses the macro to perform one or more given functions. Accordingly, in this context and in terms of architectural hierarchy, the IC device is analogous to the main program and the macro is analogous to subroutines/procedures. In some embodiments, the macro is a soft macro. In some embodiments, the macro is a hard macro. In some embodiments, the macro is a soft macro which is described digitally in register-transfer level (RTL) code. In some embodiments, synthesis, placement and routing have yet to be performed on the macro such that the soft macro can be synthesized, placed and routed for a variety of process nodes. In some embodiments, the macro is a hard macro which is described digitally in a binary file format (e.g., Graphic Database System II (GDSII) stream format), where the binary file format represents planar geometric shapes, text labels, other information and the like of one or more layout-diagrams of the macro in hierarchical form. In some embodiments, synthesis, placement and routing have been performed on the macro such that the hard macro is specific to a particular process node.
A memory macro is a macro comprising memory cells which are addressable to permit data to be written to or read from the memory cells. In some embodiments, a memory macro further comprises circuitry configured to provide access to the memory cells and/or to perform a further function associated with the memory cells. For example, one or more weight buffers (not shown), one or more logic circuits (not shown) and one or more computation circuits (not shown) form circuitry configured to provide a CIM function associated with the memory cells MC in the memory macro 110. In at least one embodiment, a memory macro configured to provide a CIM function is referred to as a CIM macro. The described macro configuration is an example. Other configurations are within the scopes of various embodiments.
The memory cells MC are arranged in a plurality of columns and rows of the memory array 112. The memory controller 120 is electrically coupled to the memory cells MC and configured to control operations of the memory cells MC including, but not limited to, a read operation, a write operation, a DAC operation, a CIM operation, or the like.
The memory array 112 comprises a plurality of word lines (also referred to as “address lines”) WL0, WL1 to WLr extending along a row direction (i.e., the horizontal direction in
An example DRAM configuration 113 for each memory cell MC is shown in
In the example configuration in
The word line driver 122 is coupled to the memory array 112 via the word lines WL. The word line driver 122 is configured to decode a row address of the memory cell MC selected to be accessed in an access operation. The word line driver 122 is sometimes referred to as a word line decoder. The word line driver 122 is configured to supply a voltage to the selected word line WL corresponding to the decoded row address, and a different voltage to the other, unselected word lines WL. In at least one embodiment, the word line driver 122 comprises one or more driving circuits or inverters.
A bit line driver (not shown) is coupled to the memory array 112 via the bit lines BL. In some embodiments, the bit line driver is part of the memory macro 110 and/or is coupled to the bit lines BL through the BL selection circuit 115. The bit line driver is configured to decode a column address of the memory cell MC selected to be accessed in an access operation. The bit line driver is sometimes referred to as a bit line decoder. The bit line driver is configured to supply a voltage to the selected bit line BL corresponding to the decoded column address, and a different voltage to the other, unselected bit lines BL. In at least one embodiment, the bit line driver comprises one or more driving circuits or inverters. In some embodiments, the memory controller 120 further comprises a source line driver (not shown) coupled to the memory cells MC via source lines (not shown). In one or more embodiments, one or more of the word line driver 122, the bit line driver, and the source line driver are part of circuitry referred to as a read/write driver or a read/write decoder.
The control logic 123 is an example of one or more sub-controllers included in the memory controller 120, and configured to control other components and various operations in the memory device 100. In the example configuration in
The BL selection circuit 115 is configured to selectively couple one or more of the bit lines BL to one of the current summation circuit 117 and the sensing circuit 119. In some embodiments, the BL selection circuit 115 is configured to switch among the current summation circuit 117, the sensing circuit 119 and a bit line driver. In at least one embodiment, the BL selection circuit 115 is configured to have a switched state in which one or more of the bit lines BL are not coupled to any of the current summation circuit 117, the sensing circuit 119, and a bit line driver. The BL selection circuit 115 is coupled to the memory controller 120 which is configured to output a control signal Se1 to the BL selection circuit 115 to control switching of the BL selection circuit 115. In one or more embodiments, the BL selection circuit 115 comprises a switch, a transistor, a multiplexer, or the like. The memory controller 120 is configured to supply the control signal Se1 to a gate or a control terminal/pin/input of the BL selection circuit 115.
The current summation circuit 117 is configured to perform a summation of a bit line current on a bit line coupled to the current summation circuit 117 by the BL selection circuit 115, in a DAC operation as described herein. In some embodiments, the current summation circuit 117 comprises an integrator circuit. An example integrator circuit is described with respect to
The sensing circuit 119 is configured to perform a read operation, when coupled to a bit line by the BL selection circuit 115. In some embodiments, the sensing circuit 119 comprises a sense amplifier configured to determine a datum stored in a selected memory cell MC based on a read current on the bit line coupled to the selected memory cell MC and the sense amplifier. In at least one embodiment, the sensing circuit 119 further comprises a buffer for temporarily storing the read datum. Example buffers include, but are not limited to, registers, memory cells, or other circuit elements configured for data storage. Other configurations of the sensing circuit 119 and/or buffers are within the scopes of various embodiments. The described memory circuit configuration is an example, and other memory circuit configurations are within the scopes of various embodiments.
In the current-voltage characteristic 200, the horizontal axis indicates a voltage between the gate and the source of the access transistor T, also referred to herein as a gate voltage. The gate voltage is labelled as Vgs, and is shown in the linear scale in
When the gate voltage Vgs of the access transistor T is at or above a threshold voltage Vth, the access transistor T is turned ON. When the gate voltage Vgs of the access transistor T is below the threshold voltage Vth, the access transistor T is turned OFF. A current value of the channel current Ids at the threshold voltage Vth is a threshold voltage current Ith. The current-voltage characteristic 200 has a subthreshold region 210 below the threshold voltage Vth. In the subthreshold region 210, although the access transistor T is turned OFF, there is a small amount of the channel current Ids flowing through the access transistor T. Such a small amount of the channel current Ids is sometimes referred to as a leakage current. In a non-limiting example, the leakage current in the subthreshold region 210 is 1 μA (uA, or microampere, or 10⁻⁶ A) and below.
The subthreshold region 210 of the current-voltage characteristic 200 comprises a linear region 220 in which the channel current Ids (in the logarithmic scale) increases linearly with an increase of the gate voltage Vgs. The actual current value (in the linear scale) of the channel current Ids in the linear region 220 increases exponentially with the increase of the gate voltage Vgs. The linear region 220 varies from one transistor to another transistor, depending on various factors including, but not limited to, sizes, materials, manufacturing processes, or the like, of the transistors. In the example configuration in
In some embodiments, the linear region 220 in the subthreshold region 210 of the access transistor T is used to perform a DAC operation. For example, as shown in
In some embodiments, V0, V1, V2, V3, or the like, are predetermined such that each leakage current among I0, I1, I2, I3, or the like, is k times greater or smaller than another leakage current among I0, I1, I2, I3, or the like. For example, I1=k×I0, I2=k×I1, I3=k×I2, or the like. In some embodiments, k is 2, resulting in I1=2×I0, I2=4×I0, I3=8×I0, or the like. This value of k=2 is used for an example DAC operation described with respect to
In some embodiments, the same, predetermined voltage difference b between V0 and V1, between V1 and V2, between V2 and V3, or the like, results in I1=k×I0, I2=k×I1, I3=k×I2, or the like. In at least one embodiment, a voltage value of the voltage difference b between adjacent gate voltages among V0, V1, V2, V3, or the like, that results in a specific value of k between adjacent leakage currents among I0, I1, I2, I3, or the like, corresponds to a slope of the linear region 220 in the current-voltage characteristic 200, and is sometimes referred to as the slope of the linear region 220 or the subthreshold slope. The described numbers of four gate voltages V0, V1, V2, V3, and four corresponding leakage currents I0, I1, I2, I3 are examples. Other numbers of gate voltages and corresponding leakage currents are within the scopes of various embodiments.
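The relationship between the voltage difference b and the current ratio k can be illustrated with a minimal Python sketch. It assumes an idealized exponential subthreshold model, Ids = I0×exp((Vgs−V0)/(n×kT/q)), which is a common textbook approximation and is not stated in the description above. The slope factor `N_FACTOR`, the temperature, the starting voltage `V0`, and the starting current `I0` are hypothetical values chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def voltage_step(k: float, n: float, temp_kelvin: float) -> float:
    """Voltage difference b that multiplies the leakage current by k
    under the assumed exponential subthreshold model."""
    return n * thermal_voltage(temp_kelvin) * math.log(k)


def leakage_current(vgs: float, v0: float, i0: float,
                    n: float, temp_kelvin: float) -> float:
    """Assumed model: Ids = i0 * exp((vgs - v0) / (n * kT/q))."""
    return i0 * math.exp((vgs - v0) / (n * thermal_voltage(temp_kelvin)))


# Hypothetical device parameters (assumptions, not taken from the description).
N_FACTOR = 1.5   # subthreshold slope factor
TEMP_K = 300.0   # operating temperature in kelvin
V0 = 0.10        # lowest gate voltage in volts
I0 = 1e-9        # leakage current at V0 in amperes

b = voltage_step(2.0, N_FACTOR, TEMP_K)
voltages = [V0 + i * b for i in range(4)]            # V0, V1, V2, V3
currents = [leakage_current(v, V0, I0, N_FACTOR, TEMP_K) for v in voltages]
ratios = [currents[i + 1] / currents[i] for i in range(3)]

print(f"b = {b * 1e3:.2f} mV")
print("adjacent current ratios:", [round(r, 6) for r in ratios])
```

Under this assumed model, equally spaced gate voltages yield a geometric sequence of leakage currents, which is why a single voltage difference b is enough to fix the ratio k between adjacent currents.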
In at least one embodiment, the current-voltage characteristic 200 is predetermined by measuring current values of the channel current Ids at different voltage values of the gate voltage Vgs for an actual transistor. In some embodiments, the current-voltage characteristic 200 is predetermined by a simulation executed by a computer system. In some embodiments, based on the predetermined current-voltage characteristic, especially based on a linear region in the subthreshold region of the current-voltage characteristic, V0, V1, V2, V3, or the like, and corresponding I0, I1, I2, I3, or the like, are predetermined by one or more of actual measurements, simulation results, interpolation, extrapolation, or the like. A process of predetermining a set of V0, V1, V2, V3, or the like, and corresponding I0, I1, I2, I3, or the like, for a transistor is sometimes referred to as a calibration process. In some embodiments, a database is developed in advance and stores various sets of V0, V1, V2, V3, or the like, and corresponding I0, I1, I2, I3, or the like, for different transistors which differ from each other in one or more of sizes, materials, manufacturing processes, or the like. Such a database is stored in a non-transitory computer-readable storage medium and is consulted, e.g., during a design stage of a memory device, to configure a control circuit of the memory device to control a DAC operation in a memory array of the memory device, as described herein.
The memory device 300 in
In the example configuration in
To convert the data 330 to an analog signal, the bit line BL0 is biased to a predetermined voltage, e.g., VSS or 0 V. Different word line voltages V0-V3 are supplied, e.g., by a word line driver (not shown) of the memory device 300 and through the corresponding word lines WL0-WL3, to the gates of the corresponding access transistors T0-T3. The word line voltages V0-V3 correspond to gate voltages V0-V3 described with respect to
IBL=I0×Bit0+I1×Bit1+I2×Bit2+I3×Bit3.
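The summation above can be sketched in a few lines of Python. The sketch assumes the k=2 weighting described earlier, so that the individual current for bit i is I0×2^i, and it uses an arbitrary unit current. The function name `bit_line_current` and the LSB-first bit ordering are illustrative choices, not part of the described device.

```python
def bit_line_current(bits_lsb_first, unit_current=1.0, k=2.0):
    """Sum the individual cell currents on one bit line.

    bits_lsb_first[i] is Bit i (0 or 1); the cell storing Bit i is
    assumed to contribute unit_current * k**i when the bit is 1.
    """
    return sum(unit_current * (k ** i) * bit
               for i, bit in enumerate(bits_lsb_first))


def to_bits_lsb_first(code, width=4):
    """Split an integer code into its bits, Bit0 first."""
    return [(code >> i) & 1 for i in range(width)]


# With k = 2, every 4-bit code maps to a current equal to code * I0.
currents = {code: bit_line_current(to_bits_lsb_first(code))
            for code in range(16)}

print(currents[0b1101])  # Bit3=1, Bit2=1, Bit1=0, Bit0=1
```

With these assumptions the bit line current is proportional to the integer value of the stored code, which is the behavior expected of a binary-weighted DAC.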
In a non-limiting example, the data 330 include binary “1101”, i.e., Bit3 is logic “1”, Bit2 is logic “1”, Bit1 is logic “0”, and Bit0 is logic “1”. For Bit3, Bit2, Bit0 being logic “1”, the corresponding capacitors C3, C2, C0 are charged, and charged voltages of the charged capacitors C3, C2, C0 cause the corresponding individual currents I3 (i.e., I3×Bit3=I3×1=I3), I2 (i.e., I2×Bit2=I2×1=I2), I0 (i.e., I0×Bit0=I0×1=I0) to flow to the bit line BL0. For Bit1 being logic “0”, the corresponding capacitor C1 is not charged, and the corresponding individual current is zero (i.e., I1×Bit1=I1×0=0). As a result, the bit line current IBL=I3+I2+I0. In some embodiments with k=2 as described with respect to
The bit line current IBL is an example of an analog signal to which the data 330 are converted in the described DAC operation. In at least one embodiment, the bit line current IBL is considered a result of the DAC operation and/or is usable directly for further processing. In the example configuration in
In some embodiments, the described DAC operation involves a number of memory cells equal to the number of bits in the digital data to be converted, without requiring a separate DAC circuit. This is an improvement over other approaches which, besides the memory cells storing the digital data to be converted, also require one or more separate DAC circuits. In some situations, such separate DAC circuits include an additional array of memory cells that occupies a significant chip area. For example, to convert 4 bits of data, e.g., Bit0-Bit3 as described above, the separate DAC circuits, or DAC array, in accordance with other approaches require at least 8 memory cells for Bit3, 4 memory cells for Bit2, 2 memory cells for Bit1, and one memory cell for Bit0. Thus, a total of 15 (i.e., 8+4+2+1=15) additional memory cells is required in accordance with the other approaches, besides the four memory cells storing the digital data to be converted. In addition, the routing of memory cells in the DAC circuits or DAC array in accordance with other approaches is different from that of a memory array, such as the memory array 112. This different routing requires additional efforts in the designing and/or manufacturing processes. Such additional memory cells, DAC array, or additional designing and/or manufacturing efforts are not required in a memory device configured to perform the described DAC operation in accordance with one or more embodiments. As a result, it is possible in one or more embodiments to reduce the circuit complexity, power consumption and chip area.
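The cell-count comparison above can be reproduced with a short Python sketch. Extending the 8+4+2+1 pattern to other bit widths is an assumption made here for illustration; the description gives only the 4-bit case.

```python
def additional_cells_separate_dac(num_bits: int) -> int:
    """Additional cells in a binary-weighted DAC array, assuming
    bit i needs 2**i cells (the 8+4+2+1 pattern for 4 bits)."""
    return sum(2 ** i for i in range(num_bits))


def additional_cells_in_memory_dac(num_bits: int) -> int:
    """The described DAC operation reuses the cells that already store
    the data, so no additional cells are needed."""
    return 0


for width in (4, 8):
    print(width,
          additional_cells_separate_dac(width),
          additional_cells_in_memory_dac(width))
```

Under the assumed pattern, the additional cell count of the other approaches grows as 2^n−1 with the number of bits n, while the described operation adds none.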
To perform a DAC operation in accordance with some embodiments, it is sufficient to predetermine, for access transistors of memory cells in a memory device, a current-voltage characteristic and/or its subthreshold region and/or various gate voltages (e.g., V0-V3) in the subthreshold region, and to configure a control circuit of the memory device to supply the predetermined gate voltages (e.g., V0-V3) to the access transistors in a DAC operation, as described. A re-arrangement of bit lines and/or word lines is not required in one or more embodiments. In some embodiments, a BL selection circuit already exists for switching between a sensing circuit and a bit line driver, and therefore, it is sufficient to make a simple change to the configuration of the BL selection circuit to additionally switch to a current summation circuit. In at least one embodiment, the current summation circuit comprises an integrator circuit which, if not already included in the memory device, occupies a much smaller area than separate DAC circuits required by the other approaches. Thus, a DAC operation in accordance with some embodiments requires minimal changes to an existing memory device design and even reduces complexity of the memory device, e.g., by omitting, or not requiring, separate DAC circuits.
In the example in
As the access transistor T0 is turned ON, the pre-charged voltage on the bit line BL0 changes in accordance with the charging state of the capacitor C0, i.e., the datum stored in the memory cell MC0. For example, when the memory cell MC0 stores logic “1”, the capacitor C0 is charged with a charged voltage VDD between its terminals. As a result, the pre-charged voltage on the bit line BL0 is increased by the charged voltage from VDD/2 toward VDD. In this process, the capacitor C0 loses at least part of its charge. In the sensing circuit 319, a sense amplifier coupled to the bit line BL0 detects and amplifies the voltage increase on the bit line BL0, and outputs a read signal Qr having a voltage VDD indicating that the datum, or bit, read from the memory cell MC0 is logic “1”. In this amplifying process, the sense amplifier also supplies VDD to the bit line BL0 to restore the capacitor C0 back to the charged state with the charged voltage VDD between its terminals, i.e., to rewrite the read out logic “1” back to the memory cell MC0.
When the memory cell MC0 stores logic “0”, the capacitor C0 is not charged, or is discharged, with no charged voltage between its terminals. As a result, the pre-charged voltage on the bit line BL0 is decreased from VDD/2 toward VSS. In this process, the capacitor C0 is partly charged. In the sensing circuit 319, the sense amplifier coupled to the bit line BL0 detects and amplifies the voltage decrease on the bit line BL0, and outputs a read signal Qr having a voltage VSS indicating that the datum, or bit, read from the memory cell MC0 is logic “0”. In this amplifying process, the sense amplifier also supplies VSS to the bit line BL0 to discharge any charges accumulated in the capacitor C0 due to the read operation, and to restore the capacitor C0 back to the discharged state with no charged voltage between its terminals, i.e., to rewrite the read out logic “0” back to the memory cell MC0. In some embodiments, the described read operation is performed periodically for all memory cells, not to output data from the memory cells, but to refresh the data stored therein. A reason is that capacitors in memory cells potentially lose their charges, and stored data, over time.
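The direction of the bit line voltage change in the described read operation follows from charge sharing between the cell capacitor and the bit line. The Python sketch below models that step with ideal capacitors. The capacitance values `C_BIT_LINE` and `C_CELL`, the supply values, and the function names are hypothetical and chosen only for illustration.

```python
VDD = 1.0   # volts (assumed supply)
VSS = 0.0   # volts

# Hypothetical capacitances in farads; not taken from the description.
C_BIT_LINE = 100e-15
C_CELL = 20e-15


def bit_line_after_sharing(v_cell: float,
                           v_precharge: float = VDD / 2) -> float:
    """Bit line voltage after the access transistor connects the cell
    capacitor to the pre-charged bit line (ideal charge conservation)."""
    total_charge = C_BIT_LINE * v_precharge + C_CELL * v_cell
    return total_charge / (C_BIT_LINE + C_CELL)


def sense(v_bit_line: float, v_reference: float = VDD / 2) -> int:
    """Idealized sense amplifier: compare against the pre-charge level."""
    return 1 if v_bit_line > v_reference else 0


for stored_bit, v_cell in ((1, VDD), (0, VSS)):
    v_bl = bit_line_after_sharing(v_cell)
    print(stored_bit, round(v_bl, 4), sense(v_bl))
```

In this idealized model a charged cell pulls the bit line above VDD/2 and a discharged cell pulls it below, and the sense amplifier resolves that small difference to a full VDD or VSS level.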
In a write operation of the memory cell MC0, a write circuit or a bit line driver is coupled, by the BL selection circuit 315, to the corresponding bit line BL0. The access voltage Va is supplied to the gate of the access transistor T0 to turn ON the access transistor T0. To write logic “1” to the memory cell MC0, the write circuit or bit line driver supplies VDD to the bit line BL0 to charge the corresponding capacitor C0 to the charged voltage VDD between its terminals. To write logic “0” to the memory cell MC0, the bit line BL0 is grounded to discharge any charges in the capacitor C0, and bring the capacitor C0 to the discharged state with no charged voltage between its terminals.
In some embodiments, read operations and write operations in the memory device 300 are not affected by the ability/functionality of the memory device 300 to also perform DAC operations. As a result, it is possible in one or more embodiments to convert, with minimal efforts, a design of an existing memory device into one configured to perform DAC operations in accordance with some embodiments, without changes to the functionality, e.g., read operations and write operations, of the existing memory device.
In some situations, at least one of the current-voltage characteristic 200, the subthreshold region 210, or the linear region 220 described with respect to
Compared with the memory device 100, the memory device 400 further comprises a temperature sensor 423 and a storage circuit 433. An example temperature sensor is a bandgap temperature sensor including bipolar junction transistors (BJTs), and further circuitry such as current sources, an operational amplifier, a voltage adder, and a control logic. An example BJT is described with respect to
The temperature sensor 423 is configured to detect a temperature of the memory device 400 in operation. In the example configuration in
The temperature sensor 423 is coupled to the memory controller 120, e.g., to the control logic 123, to provide the detected temperature to the control logic 123 in operation of the memory device 400. The control logic 123 is configured to adjust one or more of the predetermined word line voltages (e.g., V0-V3) based on the temperature detected by the temperature sensor 423, and control the word line driver 122 to supply the adjusted word line voltages to the gates of the access transistors in a DAC operation. For example, it is possible in one or more embodiments that V0 (i.e., the lowest predetermined word line voltage) is not adjusted; however, the other word line voltages V1-V3 are adjusted based on the detected temperature. In a further example, all of the word line voltages V0-V3 are adjusted. In at least one embodiment where the same voltage difference b is between adjacent word line voltages among V0-V3, as described herein, the control logic 123 is configured to adjust one or more of the word line voltages V0-V3, by adjusting the voltage difference b based on the detected temperature. As described herein, the voltage difference b corresponds to the slope of the linear region 220 in the subthreshold region 210 of the current-voltage characteristic 200, and adjusting the voltage difference b corresponds to adjusting the slope of the linear region 220. In some embodiments, the control logic 123 is configured to perform the adjustment of one or more of the word line voltages V0-V3 and/or the voltage difference b so that the set of corresponding leakage currents I0-I3 remains unchanged, or substantially unchanged, as the operational temperature of the memory device 400 varies. 
In some embodiments, the leakage currents I0-I3 are considered substantially unchanged when any changes of one or more of the leakage currents I0-I3, due to the described temperature dependence and as a result of the described adjustment, are sufficiently small to not affect results of DAC operations.
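Why the voltage difference b needs adjusting with temperature can be illustrated with a minimal Python sketch. It assumes the idealized exponential subthreshold model, in which the current ratio for a step b is exp(b/(n×kT/q)); this model, the slope factor `N_FACTOR`, and the two temperatures are assumptions for illustration. The sketch addresses only the ratio between adjacent leakage currents, not the absolute value of I0, which in a real transistor also depends on temperature.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def step_for_ratio(k: float, n: float, temp_kelvin: float) -> float:
    """Voltage difference b giving current ratio k at a temperature."""
    return n * thermal_voltage(temp_kelvin) * math.log(k)


def ratio_for_step(b: float, n: float, temp_kelvin: float) -> float:
    """Current ratio produced by a voltage difference b at a temperature."""
    return math.exp(b / (n * thermal_voltage(temp_kelvin)))


N_FACTOR = 1.5   # hypothetical subthreshold slope factor
T_CAL = 300.0    # calibration temperature in kelvin (assumed)
T_HOT = 350.0    # operating temperature in kelvin (assumed)

b_cal = step_for_ratio(2.0, N_FACTOR, T_CAL)
ratio_unadjusted = ratio_for_step(b_cal, N_FACTOR, T_HOT)

b_hot = step_for_ratio(2.0, N_FACTOR, T_HOT)
ratio_adjusted = ratio_for_step(b_hot, N_FACTOR, T_HOT)

print(f"unadjusted ratio at {T_HOT} K: {ratio_unadjusted:.4f}")
print(f"adjusted ratio at {T_HOT} K:   {ratio_adjusted:.4f}")
```

Under this model a step calibrated at the lower temperature gives a ratio below 2 at the higher temperature, and enlarging b in proportion to absolute temperature restores the intended ratio.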
In the example configuration in
In some embodiments, the temperature values and corresponding voltage data in the look-up table 434 are determined in advance, e.g., by actual measurements of channel currents (or leakage currents) at different gate voltages and different operational temperatures for an actual transistor. In some embodiments, the temperature values and voltage data in the look-up table 434 are predetermined by a simulation executed by a computer system. In at least one embodiment, based on the actual measurements and/or simulation results, additional temperature values and corresponding voltage data are predetermined by interpolation, extrapolation, or the like. In some embodiments, the predetermined temperature values and corresponding voltage data of the look-up table 434 are hard-wired, e.g., by a circuit designer in the storage circuit 433. In at least one embodiment, at least a part, or a whole, of the temperature values and corresponding voltage data of the look-up table 434 is provided or updated from an external device, e.g., by an operator of the memory device 400 or of a computer system including the memory device 400, through an I/O circuit of the memory device 400, to the storage circuit 433. The described process of predetermining a relationship between different voltage data and corresponding different temperature values is sometimes referred to as a temperature compensation process.
The described voltage data comprising different values of the voltage difference b constitute an example. Other voltage data are within the scopes of various embodiments. For example, in one or more embodiments, the voltage data include different sets of word line voltages V0-V3, or the like, each set corresponding to one of the temperature values t0, t1, t2, or the like. The described look-up table as a way to present the predetermined relationship between different voltage data and corresponding different temperature values is an example. Other manners for presenting the predetermined relationship are within the scopes of various embodiments. For example, in one or more embodiments, the relationship is expressed by at least one formula or function of the operational temperature. The formula or function is developed in advance based on actual measurements and/or simulation results as described herein, and is stored in the control logic 123. Upon receiving a detected temperature from the temperature sensor 423, the control logic 123 is configured to calculate a corresponding value of the voltage difference b or a corresponding set of word line voltages, e.g., V0-V3, by inputting the detected temperature into the stored formulas or functions. In some embodiments, by adjusting the word line voltages supplied to the access transistors in a DAC operation based on the operational temperature of the memory device, it is possible to ensure accuracy of the DAC operation. One or more advantages described herein are also achievable by the memory device 400, in accordance with some embodiments.
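The look-up-table approach can be sketched in Python as follows. The table entries, the choice of linear interpolation between entries, the clamping at the table ends, and the names `LOOKUP_TABLE`, `voltage_difference_for`, and `word_line_voltages` are all hypothetical; the description states only that temperature values are associated with voltage data and that interpolation or extrapolation may be used.

```python
from bisect import bisect_left

# Hypothetical calibration data: (temperature in deg C, voltage difference b in V).
LOOKUP_TABLE = [
    (-40.0, 0.0215),
    (25.0, 0.0270),
    (85.0, 0.0325),
    (125.0, 0.0360),
]


def voltage_difference_for(temp_c: float, table=LOOKUP_TABLE) -> float:
    """Return b for a detected temperature, interpolating linearly
    between table entries and clamping outside the table range."""
    temps = [t for t, _ in table]
    if temp_c <= temps[0]:
        return table[0][1]
    if temp_c >= temps[-1]:
        return table[-1][1]
    upper = bisect_left(temps, temp_c)
    t_lo, b_lo = table[upper - 1]
    t_hi, b_hi = table[upper]
    fraction = (temp_c - t_lo) / (t_hi - t_lo)
    return b_lo + fraction * (b_hi - b_lo)


def word_line_voltages(v0: float, temp_c: float, count: int = 4):
    """Word line voltages V0..V(count-1) for a detected temperature,
    keeping V0 fixed and adjusting only the spacing b."""
    b = voltage_difference_for(temp_c)
    return [v0 + i * b for i in range(count)]


print(word_line_voltages(0.10, 55.0))
```

Keeping V0 fixed and adjusting only the spacing corresponds to one of the options described above; a table storing a full set of word line voltages per temperature would follow the same lookup pattern. Clamping is chosen here for simplicity, and extrapolation beyond the table is an equally valid alternative.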
As described with respect to
The memory device 500A comprises a source memory array 532, and a DAC memory array 542. In some embodiments, the source memory array 532 and the DAC memory array 542 are regions of a larger memory array. In at least one embodiment, the source memory array 532 and the DAC memory array 542 are separate memory arrays, e.g., in different memory banks. In each of the memory arrays in
The source memory array 532 is configured to store digital source data 535 to be converted to analog signals. The source memory array 532 is configured with a sensing circuit 539, which is configured to perform read operations to read the source data 535 from the source memory array 532 in a manner similar to the read operation described with respect to
The DAC memory array 542 is configured to perform DAC operations. In at least one embodiment, the DAC memory array 542 corresponds to the memory array 112, and is configured with a BL selection circuit, a current summation circuit, and/or a sensing circuit corresponding to the BL selection circuit 115, current summation circuit 117, sensing circuit 119. The DAC memory array 542 is further configured with a bit line driver 548 which is configured to perform write operations to write data to be converted to the DAC memory array 542 in a manner similar to the write operation described with respect to
In some embodiments, when the source data 535 are to be converted into analog signals, a DAC operation is not directly performed in the source memory array 532. Instead, the source data 535 are copied to the DAC memory array 542 before the DAC operation is performed at the DAC memory array 542, as described with respect to
In at least one embodiment, by performing DAC operations on copied data, rather than source data, it is possible to prevent the source data from being disturbed by the DAC operation.
In some situations, due to process variations, it is possible that the subthreshold slopes of access transistors in different regions of a memory device, or a memory array thereof, are different from one another. The potentially different subthreshold slopes in different regions complicate the calibration process of determining in advance the word line voltages to be used in DAC operations, because the different regions potentially require different sets of predetermined word line voltages. In some embodiments, by performing DAC operations in a specific region, e.g., the DAC memory array 542, it is sufficient to predetermine the word line voltages to be used in DAC operations for just the specific region, thereby simplifying the calibration process.
In some embodiments where an adjustment to compensate for temperature-dependent effects is to be made, the temperature compensation process of determining a relationship between different voltage data and corresponding different temperature values as described with respect to
In some embodiments, the specific region where DAC operations are to be performed includes a single row or column of memory cells, e.g., the memory cells (including the memory cells 543, 544) coupled to a bit line 550 in the DAC memory array 542. In at least one embodiment, the specific region where DAC operations are to be performed includes a few rows or columns of memory cells, each row or column similar to the row or column corresponding to the bit line 550. In at least one embodiment, by configuring the specific region where DAC operations are to be performed as one or a few rows/columns, it is possible to further simplify the calibration process and/or the temperature compensation process.
The memory device 500B comprises a source memory array 562, and a DAC memory array 572. The source memory array 562 and the DAC memory array 572 are regions of a larger memory array, and share a set of bit lines 576 and a sensing circuit 579. The source memory array 562 is configured to store digital source data 565 to be converted to analog signals, similarly to the source memory array 532. The DAC memory array 572 is configured to perform DAC operations, similarly to the DAC memory array 542. In some embodiments, when the source data 565 are to be converted into analog signals, a DAC operation is not directly performed in the source memory array 562. Instead, the source data 565 are first copied to the DAC memory array 572 where the DAC operation is to be performed later, as described with respect to
A manner of data copying (or data duplication) in the memory device 500B, in one or more embodiments, is different from that described with respect to the memory device 500A. Specifically, as described with respect to
For example, as described with respect to
In at least one embodiment, the memory device 500B permits a large amount of data to be quickly copied from a source memory array to a DAC memory array. This arrangement is advantageous in one or more embodiments where a large amount of data, e.g., in multiple rows or columns of memory cells, are to be converted to analog signals. One or more advantages described herein with respect to the memory device 500A are achievable by the memory device 500B, in accordance with some embodiments.
In some embodiments, both data copying configurations described with respect to the memory device 500A and memory device 500B are implementable in a single memory device. For example, the data copying configuration of the memory device 500A is performed when a source memory array and a DAC memory array do not share a common set of bit lines, and the data copying configuration of the memory device 500B is performed when a source memory array and a DAC memory array share a common set of bit lines.
The memory device 600A comprises a substrate 640, at least one transistor 650 over the substrate 640, an interconnect structure 660 over the transistor 650 and the substrate 640, and a metal-insulator-metal (MIM) structure 670 over the transistor 650 and the substrate 640. The MIM structure 670 comprises a capacitor coupled to the transistor 650 to form a memory cell having the 1T1C configuration as described herein. The transistor 650 is an example of an access transistor as described herein. The transistor 650 also serves as an example of transistors constituting various circuits in the memory device 600A including, but not limited to, BL selection circuits, current summation circuits, sensing circuits, a memory controller with components as described with respect to
In some embodiments, the substrate 640 is a semiconductor substrate. N-type and P-type dopants are added to the substrate to correspondingly form N wells 651, 652, and P wells (not shown). In some embodiments, isolation structures are formed between adjacent P wells and N wells. For simplicity, several features such as P wells and isolation structures are omitted from
The transistor 650 comprises a gate and source/drains. The N wells 651, 652 configure the source/drains of the transistor 650. The gate of the transistor 650 comprises a stack of gate dielectric layers 653, 654, and a gate electrode 655. In at least one embodiment, the transistor 650 comprises a single gate dielectric layer instead of multiple gate dielectric layers. Example materials of the gate dielectric layer or layers include HfO2, ZrO2, or the like. Example materials of the gate electrode 655 include polysilicon, metal, or the like.
The memory device 600A further comprises contact structures configured to electrically couple the transistor 650 to other circuitry in the memory device 600A. The contact structures comprise source/drain (metal-to-device, or MD) contacts 656, 657 correspondingly over and in electrical contact with the source/drains 651, 652. The contact structures further comprise various vias. For example, a via-to-gate (VG) via 645 is over and in electrical contact with the gate electrode 655, and is configured to couple the gate electrode 655 to a word line WL in the interconnect structure 660. Via-to-device (VD) vias 658, 659 are correspondingly over and in electrical contact with the MD contacts 656, 657. The VD via 658 is configured to couple the source/drain 651 to the capacitor in the MIM structure 670, as described herein. The VD via 659 is configured to couple the source/drain 652 to a bit line BL in the interconnect structure 660.
The interconnect structure 660 comprises a plurality of metal layers M0, M1, . . . and a plurality of via layers VIA0, VIA1, . . . arranged alternatingly in a thickness direction, i.e., a Z direction, of the substrate 640. The interconnect structure 660 further comprises various interlayer dielectric (ILD) layers (not shown) in which the metal layers and via layers are embedded. The M0 layer, i.e., metal-zero (M0) layer, is the lowermost metal layer immediately over and in electrical contact with the VD and VG vias, and is schematically illustrated in the drawings with the label “M0.” The M1 layer is the metal layer immediately over the M0 layer. The interconnect structure 660 further comprises other metal layers sequentially stacked over the M1 layer, which are schematically illustrated in the drawings with the corresponding labels such as “M5,” “M6,” and “M7.” The interconnect structure 660 also comprises via layers arranged between, and electrically coupling, successive metal layers. A via layer VIAn is arranged between, and electrically couples, the Mn layer and the Mn+1 layer, where n is an integer from zero and up. For example, a via-zero (VIA0) layer is the lowermost via layer, which is arranged between, and electrically couples, the M0 layer and the M1 layer. Several via layers are schematically illustrated in the drawings with corresponding labels such as “VIA5” and “VIA6.” The metal layers and via layers of the interconnect structure 660 are configured to electrically couple various elements or circuits of the memory device 600A with each other, and with external circuitry. Although the M7 layer is illustrated in
In the example configuration in
The MIM structure 670 is arranged over the M6 layer and comprises a multilayer structure. In the example configuration in
In the example configuration in
The described configuration of the transistor 650 is an example. Various transistor configurations are within the scopes of various embodiments, including, but not limited to, metal oxide semiconductor field effect transistors (MOSFET), complementary metal oxide semiconductors (CMOS) transistors, P-channel metal-oxide semiconductors (PMOS), N-channel metal-oxide semiconductors (NMOS), bipolar junction transistors (BJT), high voltage transistors, high frequency transistors, P-channel and/or N-channel field effect transistors (PFETs/NFETs), FinFETs, planar MOS transistors with raised source/drains, nanosheet FETs, nanowire FETs, or the like. The described capacitor configuration as a MIM structure is an example. Other capacitor configurations, such as MOS capacitor, trench capacitor, or the like, are within the scopes of various embodiments.
In the example configuration in
A structure below the M0 layer and including the transistor 650 is manufactured by front-end-of-line (FEOL) processing, and is sometimes referred to as an FEOL structure. For example, the transistor 650 is an FEOL transistor. A structure including the M0 layer and above is manufactured by back-end-of-line (BEOL) processing, and is sometimes referred to as a BEOL structure. For example, the MIM structure 670 is a BEOL capacitor.
In the memory device 600A, access transistors of memory cells are FEOL transistors which occupy a chip area on the substrate 640 which could otherwise be configured to form other circuitry. In some embodiments, access transistors of memory cells are BEOL transistors which do not occupy chip areas on the substrate 640, thereby freeing up chip areas on the substrate 640 for other circuitry, as described with respect to
Compared to the memory device 600A, the memory device 600B comprises BEOL transistors each coupled to a BEOL capacitor to form a BEOL memory cell. For example, a transistor 680 in the memory device 600B is a BEOL transistor that is coupled to the BEOL capacitor in the MIM structure 670 to form a BEOL memory cell. In the example configuration in
The transistor 680 comprises a metal oxide layer 681, a gate dielectric layer 682 lining an opening formed in a dielectric layer over the metal oxide layer 681, and a gate electrode 683 filled in a remainder of the opening. A portion of the metal oxide layer 681 under the gate electrode 683 defines a channel of the transistor 680. Portions of the metal oxide layer 681 on opposite sides of the channel define source/drains 684, 685. Contact structures 686, 687 are correspondingly over and in electrical contact with the source/drains 684, 685. Vias 663, 688, 689 are correspondingly over and in electrical contact with the contact structure 686, gate electrode 683, contact structure 687. The via 663 is configured to couple the source/drain 684 to the capacitor in the MIM structure 670, as described with respect to
In the example configuration in
As described herein, the memory device 600B with BEOL access transistors, i.e., BEOL memory cells and BEOL memory arrays, makes it possible to free up additional chip area for other circuitry. In at least one embodiment, one or more advantages described herein are achievable in the memory device 600B.
The memory device 700 comprises a DAC memory array 710, a memory array 720, and a Multiply-Accumulate circuit (MAC) 730. In some embodiments, the DAC memory array 710 and memory array 720 are regions of a larger memory array. In at least one embodiment, DAC memory array 710 and memory array 720 are separate memory arrays, e.g., in different memory banks. In each of the memory arrays in
The DAC memory array 710 is configured to perform DAC operations on first digital data stored therein. The DAC memory array 710 is configured with a current summation circuit 717 which outputs analog signals 718 corresponding to the first digital data to the MAC 730. In some embodiments, the DAC memory array 710 and current summation circuit 717 correspond to the memory array 112 and current summation circuit 117, and are configured to perform DAC operations as described with respect to
The memory array 720 is configured to store second digital data to be operated on, together with the first digital data, in a CIM operation. The memory array 720 is configured with a sensing circuit 729 which reads the second digital data from the memory array 720 and outputs the read second digital data 728 to the MAC 730. In some embodiments, the memory array 720 and sensing circuit 729 are configured to perform read operations as described with respect to
The MAC 730 is an example of a computation circuit configured to perform a CIM operation on the first digital data having been converted to analog signals 718 and the read second digital data 728, and to output a result of the CIM operation as output 733. Examples of CIM operations include, but are not limited to, mathematical operations, logical operations, combination thereof, or the like. In some embodiments, the MAC 730 is configured to multiply each of the analog signals 718 with a corresponding row of the read second digital data 728, and perform accumulation or summation of the multiplication results to obtain the output 733. For example, the MAC 730 is configured to multiply, among the analog signals 718, an analog signal corresponding to data in a column 711 of the DAC memory array 710 with data in a row 721 of the memory array 720. The MAC 730 is next configured to multiply, among the analog signals 718, an analog signal corresponding to data in a column 712 of the DAC memory array 710 with data in a row 722 of the memory array 720. The described multiplication is similarly repeated for remaining columns of the DAC memory array 710 and remaining corresponding rows of the memory array 720. The MAC 730 is further configured to perform accumulation or summation of the multiplication results, to obtain the output 733.
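As a non-limiting illustration, the described multiplication and accumulation may be modeled in software as follows. The function name and the numeric values are hypothetical. The sketch assumes one analog signal per row of the read second digital data and assumes that the multiplication results are accumulated per bit position of the rows; other accumulation schemes are within the scopes of various embodiments.

```python
def multiply_accumulate(analog_signals, digital_rows):
    """Multiply each analog signal by its row of bits, then sum per bit position.

    analog_signals: one value per column of the DAC memory array.
    digital_rows:   one list of bits per corresponding row of the memory array.
    The per-position accumulation is an assumption made for this sketch.
    """
    if len(analog_signals) != len(digital_rows):
        raise ValueError("one analog signal is needed per row of digital data")
    if not digital_rows:
        return []
    totals = [0.0] * len(digital_rows[0])
    for signal, row in zip(analog_signals, digital_rows):
        for j, bit in enumerate(row):
            totals[j] += signal * bit  # multiplication followed by accumulation
    return totals


# Hypothetical values: three analog signals and three 2-bit rows.
output = multiply_accumulate([1.0, 2.0, 0.5], [[1, 0], [1, 1], [0, 1]])
# output == [3.0, 2.5]
```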
In at least one embodiment, the output 733 comprises one or more analog signals. In at least one embodiment, the analog signals in the output 733 are directly supplied to a further MAC for a further CIM operation. In some embodiments, the memory device 700 further comprises one or more analog-to-digital converters (ADCs) configured to convert the analog signals in the output 733 to digital data for output or for further processing, such as another CIM operation, or the like. Example ADCs include, but are not limited to, logics, integrated circuits, comparators, counters, registers, combinations thereof, or the like. In some embodiments, the MAC 730 comprises one or more of accumulators, multipliers, adders, or the like. Example accumulators include, but are not limited to, resistors, capacitors, integrator circuits, operational amplifiers, combinations thereof, or the like. Example multipliers include, but are not limited to, NOR gates, AND gates, any other logic gates, combinations of logic gates, or the like. Example adders include, but are not limited to, full adders, half adders, or the like. In some embodiments, the adders are coupled to each other to form an adder tree having multiple stages. Other MAC configurations are within the scopes of various embodiments.
In some embodiments, one of the first and second digital data comprises weight data, and the other of the first and second digital data comprises input data. For example, in
The memory device 800A comprises memory macros 802, 804, 806, 808 and memory controller 820. In some embodiments, one or more of the memory macros 802, 804, 806, 808 correspond to the memory macro 110, and/or the memory controller 820 corresponds to the memory controller 120. In the example configuration in
The memory macros 802, 804, 806, 808 are coupled to each other in sequence, with output data of a preceding memory macro being input data for a subsequent memory macro. For example, input data DIN are input into the memory macro 802. The memory macro 802 performs one or more CIM operations based on the input data DIN and weight data stored in the memory macro 802, and generates output data DOUT2 as results of the CIM operations. The output data DOUT2 are supplied as input data DIN4 of the memory macro 804. The memory macro 804 performs one or more CIM operations based on the input data DIN4 and weight data stored in the memory macro 804, and generates output data DOUT4 as results of the CIM operations. The output data DOUT4 are supplied as input data DIN6 of the memory macro 806. The memory macro 806 performs one or more CIM operations based on the input data DIN6 and weight data stored in the memory macro 806, and generates output data DOUT6 as results of the CIM operations. The output data DOUT6 are supplied as input data DIN8 of the memory macro 808. The memory macro 808 performs one or more CIM operations based on the input data DIN8 and weight data stored in the memory macro 808, and generates output data DOUT as results of the CIM operations. One or more of the input data DIN, DIN4, DIN6, DIN8 correspond to the input data described with respect to
The neural network 800B comprises a plurality of layers A-E each comprising a plurality of nodes (or neurons). The nodes in successive layers of the neural network 800B are connected with each other by a matrix or array of connections. For example, the nodes in layers A and B are connected with each other by connections in a matrix 812, the nodes in layers B and C are connected with each other by connections in a matrix 814, the nodes in layers C and D are connected with each other by connections in a matrix 816, and the nodes in layers D and E are connected with each other by connections in a matrix 818. Layer A is an input layer configured to receive input data 811. The input data 811 propagate through the neural network 800B, from one layer to the next layer via the corresponding matrix of connections between the layers. As the data propagate through the neural network 800B, the data undergo one or more computations, and are output as output data 819 from layer E which is an output layer of the neural network 800B. Layers B, C, D between input layer A and output layer E are sometimes referred to as hidden or intermediate layers. The number of layers, number of matrices of connections, and number of nodes in each layer in
In some embodiments, the matrices 812, 814, 816, 818 are correspondingly implemented by the memory macros 802, 804, 806, 808, the input data 811 correspond to the input data DIN, and the output data 819 correspond to the output data DOUT. Specifically, in the matrix 812, a connection between a node in layer A and another node in layer B has a corresponding weight. For example, a connection between node A1 and node B1 has a weight W(A1,B1) which corresponds to a weight value stored, e.g., in a row or a column of a memory array of the memory macro 802. The memory macros 804, 806, 808 are configured in a similar manner. The weight data in one or more of the memory macros 802, 804, 806, 808 are updated, e.g., by a processor and through the memory controller 820, as machine learning is performed using the neural network 800B. One or more advantages described herein are achievable in the neural network 800B implemented in whole or in part by one or more memory macros and/or memory devices in accordance with some embodiments.
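As a non-limiting illustration, the sequential coupling of memory macros, each implementing one matrix of connection weights, may be modeled in software as follows. The function names and the weight values are hypothetical, and the sketch assumes that each macro computes only a weighted sum of its input data; any further computation performed between layers is omitted.

```python
def macro(inputs, weights):
    """Model one memory macro; weights[i][j] links input node i to output node j."""
    n_out = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_out)
    ]


def run_chain(din, weight_matrices):
    """Feed the output data of each macro to the next macro as its input data."""
    data = din
    for weights in weight_matrices:
        data = macro(data, weights)
    return data


# Hypothetical weights: a 2-node layer, a 3-node layer, and a 1-node layer.
W_FIRST = [[1, 0, 1], [0, 1, 1]]
W_SECOND = [[1], [1], [1]]
dout = run_chain([1, 2], [W_FIRST, W_SECOND])
# dout == [6]
```

Updating an entry of a weight matrix in this sketch corresponds to updating the weight data stored in the corresponding memory macro as machine learning is performed.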
The IC device 800C comprises one or more hardware processors 832, and one or more memory devices 834 coupled to the processors 832 by one or more buses 836. In some embodiments, the IC device 800C comprises one or more further circuits including, but not limited to, cellular transceiver, global positioning system (GPS) receiver, network interface circuitry for one or more of Wi-Fi, USB, Bluetooth, or the like. Examples of the processors 832 include, but are not limited to, a central processing unit (CPU), a multi-core CPU, a neural processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), other programmable logic devices, a multimedia processor, an image signal processor (ISP), or the like. Examples of the memory devices 834 include one or more memory devices and/or memory macros described herein. In at least one embodiment, each of the processors 832 is coupled to a corresponding memory device among the memory devices 834.
In some embodiments, the memory devices 834 are CIM memory devices, and various computations are performed in the memory devices, which reduces the computing workload of the corresponding processor 832, reduces memory access time, and improves performance. In at least one embodiment, the IC device 800C is a system-on-a-chip (SOC). In at least one embodiment, one or more advantages described herein are achievable by the IC device 800C.
The integrator circuit 800D comprises an operational amplifier 855 and a capacitor 856. An input 851 of the operational amplifier 855 is electrically coupled, through a BL selection circuit (not shown), to a bit line BL0 to receive the bit line current IBL, as described herein. A further input 854 of the operational amplifier 855 is grounded. The capacitor 856 is electrically coupled between the input 851 and an output 852 of the operational amplifier 855. The operational amplifier 855 and the capacitor 856 are configured to integrate the bit line current IBL over time to generate the analog signal SDAC. The described configuration of the integrator circuit 800D is an example. Other integrator circuit configurations or current summation circuit configurations are within the scopes of various embodiments.
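As a non-limiting illustration, the integration of the bit line current over time may be approximated numerically as follows. The sketch assumes an ideal operational amplifier with the input 851 acting as an inverting input, so that the output voltage changes by the negative of the accumulated charge divided by the capacitance; the sign depends on the chosen current direction. The function name, the sample values, and the capacitance are hypothetical.

```python
def integrate_bit_line_current(samples_a, dt_s, capacitance_f, v_start=0.0):
    """Ideal integrator output after accumulating equally spaced current samples.

    Uses a rectangular sum: V_out = v_start - (1 / C) * sum(I * dt).
    The ideal, inverting behavior is an assumption made for this sketch.
    """
    charge_c = sum(i * dt_s for i in samples_a)
    return v_start - charge_c / capacitance_f


# Hypothetical values: 1 uA held for 1 us (ten 0.1 us samples) into 1 pF.
s_dac = integrate_bit_line_current([1e-6] * 10, 1e-7, 1e-12)
# s_dac is approximately -1.0 V
```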
At operation 915, a plurality of memory cells of a memory device is accessed, by supplying, to gates of access transistors of the plurality of memory cells, corresponding different gate voltages lower than a threshold voltage of the access transistors. In some embodiments, the different gate voltages are in a linear region of a current-voltage characteristic, to permit an individual current to flow through each memory cell. The individual current corresponds to a datum stored in each memory cell and the corresponding gate voltage. For example, as described with respect to
At operation 920, a computation is performed based on individual currents flowing through the plurality of memory cells in response to the corresponding different gate voltages. For example, as described with respect to
At operation 925, prior to operation 915, the gate voltage to be supplied to each memory cell is adjusted based on a temperature of the memory device. For example, as described with respect to
At operation 930, prior to operation 915, data to be converted to one or more analog signals are copied from a plurality of further memory cells to the plurality of memory cells where a DAC operation is to be performed. For example, as described with respect to
The described methods and algorithms include example operations, but they are not necessarily required to be performed in the order shown. Operations may be added, replaced, reordered, and/or eliminated as appropriate, in accordance with the spirit and scope of embodiments of the disclosure. Embodiments that combine different features and/or different embodiments are within the scope of the disclosure and will be apparent to those of ordinary skill in the art after reviewing this disclosure.
In some embodiments, a memory device comprises a plurality of word lines, a bit line, a memory array, a control circuit, and a current summation circuit. The memory array comprises a plurality of memory cells coupled to the bit line. Each of the plurality of memory cells comprises an access transistor coupled to a corresponding word line among the plurality of word lines. The control circuit is configured to supply, correspondingly through the plurality of word lines, a plurality of word line voltages, different from each other, to the access transistors. The current summation circuit is configured to be coupled to the bit line, and to detect a bit line current on the bit line.
In some embodiments, a memory device comprises a plurality of word lines, a bit line, a memory array, and a control circuit. The memory array comprises a plurality of memory cells coupled to the bit line. Each of the plurality of memory cells comprises an access transistor coupled to a corresponding word line among the plurality of word lines. The control circuit is configured to, in a digital-to-analog conversion (DAC) operation, supply a plurality of word line voltages correspondingly through the plurality of word lines to the access transistors of the plurality of memory cells, the plurality of word line voltages lower than a threshold voltage of the access transistors. The control circuit is further configured to, in a read operation of a selected memory cell among the plurality of memory cells, supply a read voltage through the corresponding word line to the access transistor of the selected memory cell, the read voltage equal to or higher than the threshold voltage.
In some embodiments, a method comprises a computing-in-memory (CIM) operation of a memory device. The CIM operation comprises accessing a plurality of memory cells of the memory device by supplying, to gates of access transistors of the plurality of memory cells, corresponding different gate voltages lower than a threshold voltage of the access transistors. The CIM operation further comprises performing a computation based on individual currents flowing through the plurality of memory cells in response to the corresponding different gate voltages.
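As a non-limiting illustration, the accessing of memory cells with different gate voltages lower than the threshold voltage, and the summation of the resulting individual currents, may be modeled in software as follows. The exponential subthreshold current model, the parameter values, the assumption that a cell storing a zero conducts no current, and the spacing of the gate voltages so that each successive cell conducts twice the current of the preceding cell are all assumptions made only for this sketch.

```python
import math


def cell_current(bit, v_gate, i0=1e-9, slope=0.080, v_threshold=0.4):
    """Subthreshold current (A) of one cell; assumed zero when the bit is 0.

    i0 is the assumed current at v_gate = 0 V, and slope is the assumed
    subthreshold slope in volts per decade of current.
    """
    if v_gate >= v_threshold:
        raise ValueError("gate voltage must stay below the threshold voltage")
    return bit * i0 * 10.0 ** (v_gate / slope)


def bit_line_current(bits, gate_voltages):
    """Sum the individual cell currents on a shared bit line."""
    return sum(cell_current(b, v) for b, v in zip(bits, gate_voltages))


# Assumed spacing: each successive gate voltage doubles the cell current.
STEP = 0.080 * math.log10(2.0)
GATE_VOLTAGES = [k * STEP for k in range(4)]  # four different gate voltages

# Bits listed least-significant first: 1 + 4 + 8 = 13 units of i0.
i_bl = bit_line_current([1, 0, 1, 1], GATE_VOLTAGES)
```

Under these assumptions, the summed current is proportional to the binary value of the stored bits, which is one way the individual currents can serve as a basis for a subsequent computation.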
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.
This application claims the benefit of U.S. Provisional Application No. 63/615,378, filed Dec. 28, 2023, which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63615378 | Dec 2023 | US