The invention generally relates to the field of integrated circuits and more particularly to a memory subsystem in a CNN based digital Integrated Circuit (IC) for Artificial Intelligence (AI).
Artificial Intelligence (AI) is defined as intelligence exhibited by machines (e.g., computers, processors, etc.). Intelligence means the ability to acquire and apply knowledge and skills. Many different approaches have been tried and tested in AI research since the 1960s. One of the more promising techniques is based on Cellular Neural Networks or Cellular Nonlinear Networks (CNN). CNN have been applied to many different fields and problems including, but not limited to, image processing, speech recognition, etc. However, most of the prior art CNN approaches are either based on software solutions (e.g., Convolutional Neural Networks, Recurrent Neural Networks, etc.) or based on hardware that is designed for other purposes (e.g., graphic processing, general computation, etc.). As a result, prior CNN approaches are too slow in terms of computational speed and/or too expensive, and are therefore impractical for processing large amounts of imagery data. The imagery data can be from any two-dimensional signals (e.g., a still photo, a picture, a frame of a video stream, etc.).
For a CNN based IC for artificial intelligence, data must be provided as close as possible to the CNN processing logic. In addition, different characteristics of data may be required. For example, in image processing, filter coefficients and imagery data have different requirements. Filter coefficients need to be validly stored for a long time, while the imagery data are written and read more often.
This section is for the purpose of summarizing some aspects of the invention and to briefly introduce some preferred embodiments. Simplifications or omissions in this section as well as in the abstract and the title herein may be made to avoid obscuring the purpose of the section. Such simplifications or omissions are not intended to limit the scope of the invention.
For a CNN processing unit serving as either a mobile co-processor or a servo co-processor that processes large amounts of input signals (e.g., imagery data, voice data, etc.), processing-in-memory or memory-in-processor is the most promising approach. Low power consumption, fast read/write speed, and memory that is highly distributed on the same silicon are the three major requirements.
According to one aspect, a CNN (Cellular Neural Networks or Cellular Nonlinear Networks) based digital Integrated Circuit for artificial intelligence contains multiple CNN processing units. Each CNN processing unit contains CNN logic circuits operatively coupled to a memory subsystem having first and second memories. The first memory contains magnetic random access memory (MRAM) cells for storing weights (e.g., filter coefficients) while the second memory is for storing input signals (e.g., imagery data). The first memory may store one-time-programming weights. The memory subsystem may contain a third memory that contains MRAM cells for storing one-time-programming data for security purposes. The second memory contains MRAM cells or static random access memory cells. Each MRAM cell contains a Spin-Orbit-Torque (SOT) based magnetic tunnel junction (MTJ) element.
Other objects, features, and advantages of the invention will become apparent upon examining the following detailed description of an embodiment thereof, taken in conjunction with the attached drawings.
These and other features, aspects, and advantages of the invention will be better understood with regard to the following description, appended claims, and accompanying drawings as follows:
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will become obvious to those skilled in the art that the invention may be practiced without these specific details. The descriptions and representations herein are the common means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, and components have not been described in detail to avoid unnecessarily obscuring aspects of the invention.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams or circuits representing one or more embodiments of the invention does not inherently indicate any particular order nor imply any limitations in the invention. As used herein, the terms “top”, “bottom”, “upper”, “lower”, “vertical”, “horizontal”, “planar”, “parallel”, “anti-parallel”, “perpendicular”, “plan”, and “elevation” are intended to provide relative positions for the purposes of description, and are not intended to designate an absolute frame of reference. Additionally, the terms “MTJ element” and “MTJ bit” are interchangeable.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Embodiments of the invention are discussed herein with reference to
Referring first to
The IC 100 is implemented as a digital semi-conductor chip (e.g., a silicon substrate) and contains a controller 110, and a plurality of CNN processing units 102a-102b operatively coupled to at least one input/output (I/O) data bus 120. Controller 110 is configured to control various operations of the CNN processing units 102a-102b, which are connected in a loop with a clock-skew circuit (e.g., D flip-flop).
In one embodiment, the digital integrated circuit 100 is extendable and scalable. For example, multiple copies of the digital integrated circuit 100 can be implemented on a single semi-conductor chip.
All of the CNN processing units are identical. For simplicity of illustration, a function block diagram of an example CNN processing unit 200 is shown in
Each CNN processing unit 200 contains CNN logic circuits 202, which are operatively coupled to an embedded memory subsystem 210. In other words, the memories of the embedded memory subsystem 210 and the CNN logic circuits 202 are located on the same digital semi-conductor chip. In one embodiment, CNN logic circuits 202 are for performing convolution operations of input signals with filter coefficients (or weights). In one embodiment, the input signals are imagery data. In another embodiment, the input signals are converted voice data.
Memory subsystem 210 is made of a first memory 212 and a second memory 214. The first memory 212 is for data that must be stored with a higher retention rate than data in the second memory 214. The second memory 214 is for data that requires higher endurance under balanced read and write operations than data in the first memory 212. In one embodiment, the first memory 212 is for storing weights (e.g., filter coefficients) while the second memory 214 is for storing input signals (e.g., imagery data in an image processing application).
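For illustration only, the convolution of input signals with filter coefficients can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the circuit described herein: the 3x3 filter size, the absence of padding, and the name `convolve_3x3` are hypothetical choices made for the example. As is common in CNN practice, the filter is applied without flipping (i.e., as a sliding sum of products).

```python
from typing import List

Matrix = List[List[float]]


def convolve_3x3(image: Matrix, weights: Matrix) -> Matrix:
    """Slide a 3x3 filter over the image (no padding, no kernel flip).

    The 3x3 size is an assumption for this sketch; the text above only
    says that input signals are convolved with filter coefficients.
    """
    rows, cols = len(image), len(image[0])
    out: Matrix = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0.0
            # Sum of products between the 3x3 window and the weights.
            for i in range(3):
                for j in range(3):
                    acc += image[r + i][c + j] * weights[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

In this sketch the `weights` argument plays the role of the filter coefficients and `image` plays the role of the imagery data.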
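The division of roles between the two memories can be illustrated with a small software model. This is a hedged sketch only: the class names `MemoryBank` and `MemorySubsystem`, the helper `run_frames`, and the use of a write counter as a stand-in for wear are all assumptions introduced for the example. The cell-type check reflects the embodiments described herein, in which the first memory contains MRAM cells and the second memory contains MRAM or SRAM cells.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryBank:
    """One on-chip memory; cell_type is "MRAM" or "SRAM" (hypothetical model)."""
    cell_type: str
    cells: Dict[int, float] = field(default_factory=dict)
    writes: int = 0  # crude stand-in for wear on the cells

    def write(self, addr: int, value: float) -> None:
        self.cells[addr] = value
        self.writes += 1

    def read(self, addr: int) -> float:
        return self.cells[addr]


@dataclass
class MemorySubsystem:
    first: MemoryBank   # weights: written rarely, must be retained
    second: MemoryBank  # input signals: rewritten for every input

    def __post_init__(self) -> None:
        if self.first.cell_type != "MRAM":
            raise ValueError("first memory is described as MRAM")
        if self.second.cell_type not in ("MRAM", "SRAM"):
            raise ValueError("second memory is described as MRAM or SRAM")


def run_frames(mem: MemorySubsystem, weights: List[float],
               frames: List[List[float]]) -> None:
    """Load the weights once, then overwrite the input buffer per frame."""
    for addr, w in enumerate(weights):
        mem.first.write(addr, w)
    for frame in frames:
        for addr, x in enumerate(frame):
            mem.second.write(addr, x)
```

Running the model over several frames shows the asymmetry that motivates the split: the first memory sees one write per weight, while the second memory is rewritten for every frame.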
In one embodiment, the first memory 212 contains a first group of magnetic random access memory (MRAM) cells. The second memory 214 contains a second group of magnetic random access memory cells. Each of the magnetic random access memory cells contains a Spin-Orbit-Torque (SOT) based magnetic tunnel junction (MTJ) element.
A schematic diagram of 3-terminal structure of an example SOT based MTJ element 310 is shown in
The free layer 312 is in contact with a bottom nonmagnetic layer 311 made of heavy metal (e.g., Platinum (Pt), Tantalum (Ta), etc.). When injecting an electric current in the bottom nonmagnetic layer 311, Spin-Orbit coupling leads to a perpendicular spin current induced by the spin Hall and Rashba-like effects, which is transferred to the magnetization creating a spin torque and inducing magnetization reversal.
Each SOT based MTJ element 310 contains three terminals (i.e., terminal-1 331, terminal-2 332 and terminal-3 333) for facilitating separate paths shown in
An SOT based MTJ element of the first memory 212 can have a diameter 521 in the range of 20-500 nm (nanometers). The corresponding range for the second memory 214 is 20-200 nm. In general, it is easier to fabricate larger size SOT based MTJ elements than smaller ones.
Furthermore, the order of layers in the example SOT based MTJ elements 310 can be reversed to achieve the same purpose.
In another embodiment, the first memory 212 contains a group of magnetic random access memory (MRAM) cells. The second memory 214 contains a group of static random access memory (SRAM) cells. Each of the magnetic random access memory cells contains a Spin-Orbit-Torque (SOT) based magnetic tunnel junction (MTJ) element.
Referring back to
In one embodiment, both first and second memories 232-234 are made of MRAM cells with SOT based MTJ elements. In another embodiment, the second memory 234 contains a group of SRAM cells instead of MRAM cells.
A further embodiment shown in
In one embodiment, all three memories 251-253 are made of MRAM cells with SOT based MTJ elements. In another embodiment, the second memory 252 is made of SRAM cells instead of MRAM cells.
OTP refers to data being written to memory only one time (e.g., substantially permanent once written). For an MRAM cell, OTP can be performed at many stages during fabrication of a CNN based digital IC: at wafer level, at chip level, or after soldering. For example, a specific application such as face recognition requires a particular set of filter coefficients, which can be permanently written to an IC (i.e., first memory 232 in
OTP can also be performed after fabrication, during use. For example, a specific pattern unique to any application and any user is created and programmed (i.e., written) to the OTP memory in an initialization procedure or at first use. In one embodiment, one user can write a particular set of filter coefficients to an IC (i.e., first memory 232 in
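The write-once behavior of OTP memory can be modeled in software as shown below. This is a minimal sketch under stated assumptions: the names `OTPMemory`, `OTPWriteError` and `program` are hypothetical, and the model captures only the logical rule that a cell is written one time, not the physical mechanism by which an MRAM cell is made permanent.

```python
class OTPWriteError(Exception):
    """Raised when an already-programmed cell is written again."""


class OTPMemory:
    """Hypothetical write-once memory: each cell accepts a single write."""

    def __init__(self, size: int) -> None:
        # None marks a cell that has not been programmed yet.
        self._cells = [None] * size

    def program(self, addr: int, value: float) -> None:
        if self._cells[addr] is not None:
            raise OTPWriteError(f"cell {addr} is already programmed")
        self._cells[addr] = value

    def read(self, addr: int):
        return self._cells[addr]
```

For example, a set of filter coefficients could be passed to `program` once during an initialization procedure; any later attempt to overwrite the same address raises `OTPWriteError`, while `read` remains available.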
To break down the oxide barrier layer of an SOT based MTJ element for creating OTP memory, a number of techniques may be used as follows:
Although the invention has been described with reference to specific embodiments thereof, these embodiments are merely illustrative of, and not restrictive of, the invention. Various modifications or changes to the specifically disclosed exemplary embodiments will be suggested to persons skilled in the art. For example, whereas image processing has been shown and described as an example usage of the CNN based digital IC, other applications may be used, for example, voice recognition. Sound waves can be recorded and converted into a series of digital images (e.g., using fast Fourier Transform), whose features in turn can be extracted using a CNN based digital IC. Furthermore, whereas two groups of different sized MTJ elements have been shown and described for the at least two groups, any number of groups of different sized MTJ elements may be used for achieving the same, for example, three groups. Additionally, whereas the order of the layers in example SOT based MTJ elements has been shown and described in one particular pattern, other patterns may be used for achieving the same, for example, the order of the fixed or pinned layer and the free layer can be reversed. In summary, the scope of the invention should not be restricted to the specific exemplary embodiments disclosed herein, and all modifications that are readily suggested to those of ordinary skill in the art should be included within the spirit and purview of this application and scope of the appended claims.
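As a hedged illustration of the voice example above, recorded samples can be converted into a two-dimensional, image-like array by taking a Fourier transform of successive frames. The sketch below uses a plain discrete Fourier transform rather than a fast Fourier Transform, to stay self-contained; the names `dft_magnitudes` and `spectrogram`, the non-overlapping framing, and the frame size are assumptions made for the example.

```python
import cmath
import math
from typing import List


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2 for one frame (plain DFT, not an FFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
    return mags


def spectrogram(samples: List[float], frame_size: int) -> List[List[float]]:
    """Split samples into non-overlapping frames; one output row per frame."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [dft_magnitudes(f) for f in frames]
```

Each row of the result corresponds to one time frame and each column to one frequency bin, so the array can be treated as imagery data.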
This application is a continuation-in-part (CIP) of a co-pending U.S. patent application Ser. No. 15/477,263, entitled “Embedded Memory Subsystems For A CNN Based Processing Unit And Methods Of Making” filed on Apr. 3, 2017. This application is also a CIP of a co-pending U.S. patent application Ser. No. 15/498,378, entitled “Buffer Memory Architecture For A CNN Based Processing Unit And Creation Methods Thereof” filed on Apr. 26, 2017. This application is also a CIP of a co-pending U.S. patent application Ser. No. 15/591,069, entitled “MLC BASED MAGNETIC RANDOM ACCESS MEMORY USED IN CNN BASED DIGITAL IC FOR AI” filed on May 9, 2017. All of which are hereby incorporated by reference in their entirety for all purposes.
Number | Date | Country |
---|---|---|
20180285723 A1 | Oct 2018 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15591069 | May 2017 | US |
Child | 15729616 | | US |
Parent | 15498378 | Apr 2017 | US |
Child | 15591069 | | US |
Parent | 15477263 | Apr 2017 | US |
Child | 15498378 | | US |