The present invention relates to the field of integrated circuits, and more particularly to configurable processors (also known as configurable gate arrays).
A conventional processor uses logic-based computing (LBC), which carries out computation primarily with logic circuits (e.g. an XOR circuit). Logic circuits are suitable for arithmetic functions, whose operations involve only basic arithmetic operations. The basic arithmetic operations consist of addition “+”, subtraction “−” and multiplication “*” only, which can be easily implemented by logic circuits. However, logic circuits are not suitable for non-arithmetic functions, which cannot be expressed in terms of a finite number of arithmetic operations. Exemplary non-arithmetic functions include transcendental functions and special functions. Non-arithmetic functions are computationally hard and their hardware implementation has been a major challenge. Unless indicated otherwise, the term “mathematical functions” refers only to non-arithmetic functions in this specification.
A complex mathematical function is a mathematical function with multiple independent variables (an independent variable is also known as an input variable or an argument). It can be expressed as a combination of basic mathematical functions. A basic mathematical function is a mathematical function with a single independent variable. Exemplary basic mathematical functions include basic transcendental functions, such as exponential function (exp), logarithmic function (log), trigonometric functions (sin, cos, tan, atan) and others.
On the conventional processor, the basic mathematical functions which can be calculated by hardware (i.e. hardware computing) are referred to as built-in mathematical functions. Because different mathematical functions are implemented with different logic circuits, the hardware implementation of the built-in mathematical functions is highly customized. Due to limited resources on a processor die, only a small number of the built-in mathematical functions can be implemented by hardware. For example, only 7 built-in mathematical functions (i.e. CBRT, EXP, LN, SIN, COS, TAN, ATAN) are implemented by hardware on an Intel IA-64 processor (see Harrison et al., “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4, 1999, page 6).
Because the hardware implementation of mathematical functions is difficult, most mathematical functions are implemented by software. On the conventional processor, all complex mathematical functions (even most basic mathematical functions) are implemented by software. As software computing is more complicated than hardware computing, calculation of complex mathematical functions is slow and inefficient. It is highly desirable to realize hardware computing for complex mathematical functions. It is even more desirable to realize configurable hardware computing, i.e. to use the same set of hardware to implement a large set of complex mathematical functions.
A configurable processor is a semi-custom integrated circuit designed to be configured by a customer after manufacturing. It is also referred to as configurable electrical circuit, configurable gate array, field-programmable gate array (FPGA), complex programmable logic device (CPLD), or other names. U.S. Pat. No. 4,870,302 issued to Freeman on Sep. 26, 1989 (hereinafter referred to as Freeman) discloses a configurable electrical circuit. It contains an array of configurable logic elements (CLE's, also known as configurable logic blocks) and a hierarchy of configurable interconnects (CIT's, also known as programmable interconnects) that allow the configurable logic elements to be wired together per customer's desire. Each CLE in the array is in itself capable of realizing any one of a plurality of logic functions (e.g. shift, logic NOT, logic AND, logic OR, logic NOR, logic NAND, logic XOR, arithmetic addition “+”, arithmetic subtraction “−”, etc.) depending upon a first configuration signal. Each CIT can selectively couple or de-couple interconnect lines depending upon a second configuration signal.
In the configurable electrical circuit of Freeman, fixed computing elements are used to implement basic mathematical functions. These fixed computing elements are portions of hard blocks which are not configurable, i.e. the circuits implementing these mathematical functions are fixedly connected and are not subject to change by programming. As is the case with the conventional processor, the fixed computing elements can implement only a small number of mathematical functions. This limits further applications of the configurable electrical circuit. To overcome these difficulties, the present invention expands the original concept of Freeman from configurable logic to configurable computing.
It is a principal object of the present invention to implement configurable computing.
It is a further object of the present invention to provide a configurable processor to customize not only logic functions, but also mathematical functions.
It is a further object of the present invention to improve computational complexity.
It is a further object of the present invention to improve computational density.
It is a further object of the present invention to shorten the time-to-market.
It is a further object of the present invention to reduce the physical size of the configurable processor.
It is a further object of the present invention to lower the cost of the configurable processor.
It is a further object of the present invention to provide a paradigm shift for scientific computing.
It is a further object of the present invention to realize rapid and efficient modeling and simulation.
In accordance with these and other objects of the present invention, the present invention discloses a configurable processor.
The present invention discloses a configurable processor. It comprises at least an array of configurable computing elements (CCE's). Each CCE comprises at least a three-dimensional (3-D) memory (3D-M) array; an arithmetic logic circuit (ALC); and, a plurality of inter-storage-processor (ISP) connections communicatively coupling them. The 3D-M array stores a look-up table (LUT) of a mathematical function, while the ALC performs arithmetic operations on selected data from the LUT.
The preferred configurable processor comprises a semiconductor substrate, which is single-crystalline. The ALC and at least a portion of the peripheral circuit of the 3D-M are disposed on the semiconductor substrate. On the other hand, the memory cells of the 3D-M array are not disposed on the semiconductor substrate. In fact, they are neither in contact with nor interposed therebetween by any semiconductor substrate. Hence, the ALC and the portion of the peripheral circuit of the 3D-M array comprise at least one single-crystalline semiconductor material, while the memory cells of the 3D-M array do not comprise any single-crystalline semiconductor material.
The usage of the CCE includes two stages: a configuration stage and a computing stage. In the configuration stage, the LUT of a desired mathematical function is loaded into the memory cells of the 3D-M array. In the computing stage, selected data of the LUT for the desired mathematical function is read out from the memory cells of the 3D-M array, upon which further computation is performed.
Preferably, the 3D-M array is a 3-D non-volatile memory (3D-NVM) array, which keeps the data stored therein for long term even when power goes off. Depending on the number of programmings that can be performed on the 3D-NVM array, the preferred configurable processor can be categorized into one-time-configurable processor and re-configurable processor. By using a 3-D one-time-programmable memory (3D-OTP) array, the LUT can be loaded once. This type of the configurable processor is referred to as one-time-configurable processor. On the other hand, by using a 3-D multiple-time-programmable memory (3D-MTP, or 3-D rewritable memory) array, the LUT can be loaded multiple times. Accordingly, the CCE is a re-configurable computing element (re-CCE) and this type of the configurable processor is referred to as re-configurable processor.
Besides CCE's, the preferred configurable processor further comprises at least an array of configurable logic elements (CLE's) and/or at least an array of configurable interconnects (CIT's). With CLE's and CIT's, the preferred configurable processor can be used to implement complex mathematical functions. A complex mathematical function is first decomposed into a combination of basic mathematical functions. Each basic mathematical function is realized by programming an associated CCE. The complex mathematical function is then realized by programming the appropriate CLE's and CIT's.
The present invention realizes hardware computing of complex mathematical functions. Compared with software computing, hardware computing is much faster and more efficient. Because the LUT's are used as a primary means to implement mathematical functions, this type of computing is referred to as memory-based computing (MBC). Although arithmetic operations are still performed, the MBC starts from a larger LUT and therefore only needs to calculate a polynomial of a lower order. For the MBC, the fraction of computation done by the LUT is significantly larger than that done by the ALC.
The advantages of MBC over logic-based computing (LBC) are configurability and generality. Because the LUT's of different mathematical functions can be loaded into the 3D-M array, the preferred configurable processor can be configured to implement different mathematical functions. In addition, with hundreds of gigabits to store the LUT's (a 3D-XPoint die stores 128 Gb), the types of the mathematical functions that can be implemented by the preferred configurable processor are essentially boundless.
The preferred configurable processor 100 takes two forms—singlet and doublet. In a preferred configurable-processor singlet, the 3D-M array and the ALC are monolithically integrated into a single configurable-processor die. On the other hand, in a preferred configurable-processor doublet, the 3D-M array and the ALC are disposed onto two separate dice—a 3D-M die and a processing die bonded face-to-face.
For either configurable-processor die or configurable-processor doublet, the 3D-M array and the ALC substantially overlap. In addition, because they do not penetrate through any semiconductor substrate, the ISP-connections are short, small and numerous. Given also that the 3D-NVM cells are much smaller than the RAM cells (4F² vs. ~100F², where F is the minimum feature size), the preferred CCE is much smaller than prior art. Hence, the preferred configurable processor contains a massive number of CCE's. In one preferred embodiment, the preferred configurable processor contains at least one thousand CCE's. In another preferred embodiment, the preferred configurable processor contains at least ten thousand CCE's. As a result, the preferred configurable processor is computationally powerful, i.e. it can achieve massive parallelism, great computational complexity, and/or large computational density.
Accordingly, the present invention discloses a configurable processor, comprising a single-crystalline semiconductor substrate and an array of configurable computing elements (CCE's), each of said CCE's comprising: at least a three-dimensional memory (3D-M) array including memory cells for storing at least a portion of a look-up table (LUT) of a mathematical function, wherein said memory cells are neither in contact with nor interposed therebetween by any semiconductor substrate including said single-crystalline semiconductor substrate; and, said memory cells do not comprise any single-crystalline semiconductor material; an arithmetic logic circuit (ALC) and at least a portion of a peripheral circuit of said 3D-M array disposed on said single-crystalline semiconductor substrate, wherein said ALC performs at least one arithmetic operation on selected data of said LUT; said ALC and said portion of said peripheral circuit are communicatively coupled; and, said ALC and said portion of said peripheral circuit comprise at least a single-crystalline semiconductor material; a plurality of inter-storage-processor (ISP) connections for communicatively coupling said memory cells and said portion of said peripheral circuit, wherein said ISP-connections do not penetrate through any semiconductor substrate including said single-crystalline semiconductor substrate; and, said memory cells and said ALC at least partially overlap.
It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been shown exaggerated or reduced in size for the sake of clarity and convenience in the drawings. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments. Singular form is used to refer to both singular and plural forms. The symbol “/” means a relationship of “and” or “or”. The terms “singlet” and “die” are used interchangeably. Furthermore, the terms “program” and “write” are used interchangeably.
As used herein, the term “mathematical functions” refers to non-arithmetic mathematical functions, i.e. the mathematical functions that cannot be expressed in terms of a finite number of arithmetic operations. In other words, the mathematical functions involve more operations than arithmetic operations performable by the arithmetic logic circuit (ALC). The term “memory” is used to mean a semiconductor memory and the term “memory array” is used in its broadest sense to mean a collection of all memory cells sharing at least one address line. The term “look-up table (LUT)” could refer to either look-up table per se, or the memory circuit used to store the look-up table, depending on the context. The phrase “communicatively coupled” is used in its broadest sense to mean any coupling whereby electrical signals may be passed from one element to another element.
As used herein, the phrase “a circuit on a substrate” is used in its broadest sense to mean that at least some of its active elements (e.g. transistors) or portions thereof (e.g. channel of the MOSFET) are formed in the substrate, even though the interconnects coupling the active elements (e.g. transistors) and other portions of the active elements are formed above the substrate. The phrase “a circuit above a substrate” is used in its broadest sense to mean that all active elements are disposed above the substrate and they are not in contact with the substrate. The phrase “memory cells are interposed therebetween by a semiconductor substrate” means that a semiconductor substrate separates the memory cells; in other words, there is a semiconductor substrate between the memory cells. The phrase “memory cells are not interposed therebetween by any semiconductor substrate” means that no semiconductor substrate separates the memory cells; in other words, there is no semiconductor substrate between the memory cells.
As used herein, the phrases “a circuit made of single-crystalline semiconductor material” and “a circuit comprising at least a single-crystalline semiconductor material” mean that at least a key portion (e.g. channel) of the active elements (e.g. transistors) is formed in a single-crystalline semiconductor substrate (or, film). The phrases “a circuit made of non-single-crystalline semiconductor material”, “a circuit comprising non-single-crystalline semiconductor materials” and “a circuit does not comprise any single-crystalline semiconductor material” mean that all key portions (e.g. channel/gate/source/drain) of the active elements (e.g. transistors) are formed in a non-single-crystalline (e.g. poly-crystalline, micro-crystalline or amorphous) semiconductor film and do not comprise any single-crystalline semiconductor material.
As used herein, the phrases “diode”, “steering element”, “steering device”, “selector”, “selecting element”, “selecting device”, “selection element” and “selection device”, all have the same meaning. They are used in their broadest sense to mean a diode-like device whose resistance at the read voltage is substantially lower than that when the applied voltage has a magnitude smaller than or a polarity opposite to that of the read voltage.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
Referring now to
Referring now to
Referring now to
The circuit implementation of mathematical functions is much more complicated than the circuit implementation of logic functions. The LUT stored in the CCE 100ij includes numerical values related to a mathematical function, whereas the LUT stored in a configurable logic element (CLE) of the configurable electrical circuit (Freeman) includes only logic values of a logic function. Numerical values are denoted by a large number of bits. For example, a half-precision floating-point number comprises 16 bits; a single-precision floating-point number comprises 32 bits; a double-precision floating-point number comprises 64 bits. In comparison, the logic values can be denoted by a single bit and have only two values, i.e. “true” and “false”. Accordingly, the LUT size in the CCE 100ij is substantially larger than that in the CLE.
In an LUT for a mathematical function, the numerical values include the functional values of the mathematical function. When the input variable of a mathematical function comprises a larger number of bits, the LUT size could become excessively large. For example, an LUT including the functional values of a single-precision mathematical function (32-bit input and 32-bit output) needs 2^32×32 bits = 128 Gb. To reduce the LUT size, Taylor-series (or other polynomial expansion) calculation is preferably used. To be more specific, the LUT not only includes the functional values, but also includes the derivative values of a mathematical function, e.g. the first-order derivative values, the second-order derivative values, and so on. To perform the Taylor-series calculation, the CCE 100ij further comprises at least an adder and a multiplier.
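The size of an LUT-only implementation can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; the helper name full_lut_bits is hypothetical and does not appear in this specification. It counts one output word for every possible input word.

```python
def full_lut_bits(input_bits: int, output_bits: int) -> int:
    """Bits needed to tabulate one output word for every possible input word."""
    return (2 ** input_bits) * output_bits

MEBI = 2 ** 20  # 1 Mb, counted in binary units
GIBI = 2 ** 30  # 1 Gb, counted in binary units

half = full_lut_bits(16, 16)    # half-precision input and output
single = full_lut_bits(32, 32)  # single-precision input and output

print(half // MEBI, "Mb")    # LUT-only size for a half-precision function
print(single // GIBI, "Gb")  # LUT-only size for a single-precision function
```

Each additional input bit doubles the table, which is why a polynomial correction applied to a coarser table is preferred for wider inputs.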
By combining the LUT with polynomial interpolation, a high precision can be achieved without using an excessively large LUT. In the above embodiment, a single-precision function can be realized using a total of 4 Mb LUT (2 Mb for the functional values, and 2 Mb for the first-order derivative values) in conjunction with a first-order Taylor series calculation. This is significantly less than the LUT-only approach (4 Mb vs. 128 Gb).
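A first-order Taylor calculation of this kind may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of the CCE 100ij: the input is assumed to be a 32-bit unsigned fixed-point number x representing u = x/2^32 in [0, 1); the upper 16 bits select an LUT entry and the lower 16 bits form the offset; sin is chosen arbitrarily as the function; and all names (F_LUT, D_LUT, lut_sin, lut_bits) are hypothetical. Handling of floating-point exponents and range reduction is omitted.

```python
import math
from array import array

INDEX_BITS = 16            # upper input bits select an LUT entry (assumption)
OFFSET_BITS = 16           # lower input bits feed the Taylor correction
N = 1 << INDEX_BITS
STEP = 1.0 / N             # spacing of the tabulated points on [0, 1)

# Configuration stage: load functional values and first-order derivative
# values as 32-bit words (array typecode "f" stores single-precision floats).
F_LUT = array("f", (math.sin(i * STEP) for i in range(N)))
D_LUT = array("f", (math.cos(i * STEP) for i in range(N)))

def lut_sin(x: int) -> float:
    """Approximate sin(u), u = x / 2**32, for a 32-bit unsigned input x."""
    idx = x >> OFFSET_BITS                                   # LUT address
    offset = (x & ((1 << OFFSET_BITS) - 1)) / (1 << 32)      # u - u_idx
    return F_LUT[idx] + D_LUT[idx] * offset                  # one multiply, one add

def lut_bits(table: array) -> int:
    """Storage occupied by a table, in bits."""
    return len(table) * table.itemsize * 8

x = 0x9ABCDEF0
error = abs(lut_sin(x) - math.sin(x / 2 ** 32))
total_mb = (lut_bits(F_LUT) + lut_bits(D_LUT)) // 2 ** 20
```

With 2^16 entries of 32 bits each, each table occupies 2 Mb, so the two tables together occupy the 4 Mb stated above; the computing stage needs only one multiplication and one addition per evaluation.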
Besides elementary functions (including algebraic functions and transcendental functions), the preferred CCE 100ij can be used to implement non-elementary functions such as special functions. Special functions can be defined by means of power series, generating functions, infinite products, repeated differentiation, integral representation, differential difference, integral, and functional equations, trigonometric series, or other series in orthogonal functions. Important examples of special functions are gamma function, beta function, hyper-geometric functions, confluent hyper-geometric functions, Bessel functions, Legendre functions, parabolic cylinder functions, integral sine, integral cosine, incomplete gamma function, incomplete beta function, probability integrals, various classes of orthogonal polynomials, elliptic functions, elliptic integrals, Lame functions, Mathieu functions, Riemann zeta function, automorphic functions, and others. Direct hardware implementations of special functions using the CCE 100ij will simplify computing and promote their applications in scientific computing.
Preferably, the 3D-M array is a 3-D non-volatile memory (3D-NVM) array, which keeps the data stored therein for long term even when power goes off. Depending on the number of programmings that can be performed on the 3D-NVM array, the preferred configurable processor 100 can be categorized into one-time-configurable processor and re-configurable processor. By using a 3-D one-time-programmable memory (3D-OTP) array, the LUT can be loaded once. This type of the configurable processor is referred to as one-time-configurable processor. On the other hand, by using a 3-D multiple-time-programmable memory (3D-MTP, or 3-D rewritable memory) array, the LUT can be loaded multiple times. Accordingly, the CCE is a re-configurable computing element (re-CCE) and this type of the configurable processor is referred to as re-configurable processor.
Referring now to
Being reconfigurable, the re-CCE 100ij can realize a second mathematical function during the second usage cycle 660, which includes a second configuration stage 650 and a second computing stage 670. During the second usage cycle 660, the first LUT is erased from the 3D-M array 170 first. Then a second LUT of a second mathematical function is loaded into the 3D-M array 170 during the second configuration stage 650. Later selected data from the second LUT are read out to calculate the second mathematical function during the second computing stage 670. The re-CCE 100ij is particularly suitable for single-instruction-multiple-data (SIMD)-type of data processing. Once the LUT's of the mathematical functions (considered as part of the instruction) are loaded into the 3D-M arrays 170 in the configuration stage, a large amount of data can be fed into the re-CCE 100ij and processed at high speed. SIMD has many applications, e.g. vector processing in image processing, massively parallel processing in scientific computing.
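The two-stage usage cycle and the distinction between one-time-configurable and re-configurable elements can be modeled in software. The Python sketch below is a toy model only: the class CCE, its methods, the 1024-entry table and the input range [0, 1) are all assumptions made for illustration, and the LUT read-out is simplified to a direct look-up without polynomial correction.

```python
import math
from typing import Callable, List, Optional

class CCE:
    """Toy model of a configurable computing element (hypothetical API)."""

    def __init__(self, entries: int = 1024, rewritable: bool = True):
        self.entries = entries
        self.rewritable = rewritable          # True: 3D-MTP, False: 3D-OTP
        self.lut: Optional[List[float]] = None

    def configure(self, func: Callable[[float], float]) -> None:
        """Configuration stage: load the LUT of func sampled on [0, 1)."""
        if self.lut is not None and not self.rewritable:
            raise RuntimeError("a 3D-OTP array can be loaded only once")
        # Replacing the list models erasing the old LUT, then loading the new one.
        self.lut = [func(i / self.entries) for i in range(self.entries)]

    def compute(self, data: List[float]) -> List[float]:
        """Computing stage: read one LUT entry per input value in [0, 1)."""
        if self.lut is None:
            raise RuntimeError("the CCE has not been configured")
        return [self.lut[int(u * self.entries)] for u in data]

cce = CCE()                                # models a re-CCE
cce.configure(math.exp)                    # first usage cycle: configuration
first = cce.compute([0.0, 0.25, 0.5])      # first usage cycle: computing
cce.configure(math.sin)                    # second usage cycle: erase and reload
second = cce.compute([0.0, 0.25, 0.5])     # same data stream, new function
```

Once configured, the same LUT serves every element of the data stream, which mirrors the SIMD-type processing described above.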
Referring now to
Referring now to
For this instantiation, the configurable channel 310 is configured in such a way that the inputs a, b, c, d associated with four independent variables of the complex mathematical function e=a·sin(b)+c·cos(d) are coupled to the inputs of the CCE's 100AA-100AD, respectively. Furthermore, the CCE 100AA is configured to realize the function log( ) whose result log(a) is sent to a first input of the CLE 200A. The CCE 100AB is configured to realize the function log[sin( )], whose result log[sin(b)] is sent to a second input of the CLE 200A. The CLE 200A is configured to realize arithmetic addition “+”, whose result log(a)+log[sin(b)] is sent to the CCE 100BA. The CCE 100BA is configured to realize the function exp( ), whose result exp{log(a)+log[sin(b)]}=a·sin(b) is sent to a first input of the CLE 200BA. Similarly, through proper configurations, the results of the CCE's 100AC, 100AD, the CLE 200AC, and the CCE 100BC can be sent to a second input of the CLE 200BA. The CLE 200BA is configured to realize arithmetic addition “+”, whose result a·sin(b)+c·cos(d) is sent to the output e. Apparently, by changing its configuration, the first preferred configurable processor 100 can realize other complex mathematical functions.
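The data flow of this instantiation can be traced in software. In the Python sketch below, each function stands in for one configured element; the function names are hypothetical, and the second branch (log(c), log[cos(d)], addition, exp) is assumed by analogy with the first branch, since the text describes it only as "similarly" configured. Because each product is formed through logarithms, the sketch assumes a>0, c>0, sin(b)>0 and cos(d)>0.

```python
import math

# Each function models one configured element (names are illustrative only).
def cce_log(x: float) -> float:          # e.g. CCE 100AA (and, by assumption, 100AC)
    return math.log(x)

def cce_log_sin(x: float) -> float:      # e.g. CCE 100AB
    return math.log(math.sin(x))

def cce_log_cos(x: float) -> float:      # assumed configuration of CCE 100AD
    return math.log(math.cos(x))

def cce_exp(x: float) -> float:          # e.g. CCE 100BA (and, by assumption, 100BC)
    return math.exp(x)

def cle_add(x: float, y: float) -> float:  # a CLE configured for addition
    return x + y

def evaluate(a: float, b: float, c: float, d: float) -> float:
    """e = a*sin(b) + c*cos(d), built only from log, exp and addition."""
    left = cce_exp(cle_add(cce_log(a), cce_log_sin(b)))    # a*sin(b)
    right = cce_exp(cle_add(cce_log(c), cce_log_cos(d)))   # c*cos(d)
    return cle_add(left, right)

result = evaluate(2.0, 0.5, 3.0, 0.3)
reference = 2.0 * math.sin(0.5) + 3.0 * math.cos(0.3)
```

The multiplication is replaced by an addition in the logarithmic domain, so the CLE's only need to perform addition while the CCE's supply the non-arithmetic functions.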
Referring now to
Referring now to
The preferred configurable-processor die 100 comprises only a single semiconductor substrate 0. Since the semiconductor substrate 0 is single-crystalline, the ALC's 180 comprise at least a single-crystalline semiconductor material. On the other hand, since they are neither in contact with nor interposed therebetween by any semiconductor substrate, the memory cells of the 3D-M arrays 170 do not comprise any single-crystalline semiconductor material.
The 3D-M arrays 170 are preferably 3-D non-volatile memory (3D-NVM) arrays, which keep the data stored therein for long term even when power goes off. Compared with a volatile memory (e.g. SRAM, DRAM), the memory cell of a 3D-NVM is much smaller. For example, the cell size of a three-dimensional read-only memory (3D-ROM, referring to U.S. Pat. No. 5,385,396) is only 4F², whereas the cell size of an SRAM is ~100F² (F is the minimum feature size).
Based on its physical structure, the 3D-M can be categorized into horizontal 3D-M (3D-MH) and vertical 3D-M (3D-MV). In a 3D-MH, all address lines are horizontal. The memory cells form a plurality of horizontal memory levels which are vertically stacked above each other. A well-known 3D-MH is 3D-XPoint. In a 3D-MV, at least one set of the address lines is vertical. The memory cells form a plurality of vertical memory strings which are placed side-by-side on/above the substrate. A well-known 3D-MV is 3D-NAND. In general, the 3D-MH (e.g. 3D-XPoint) is faster, while the 3D-MV (e.g. 3D-NAND) is denser.
In the present invention, the 3D-NVM array 170 is preferably a 3-D writable memory (3D-W), whose memory cells are electrically programmable. Based on the number of programmings allowed, the 3D-W can be further categorized into three-dimensional one-time-programmable memory (3D-OTP) and three-dimensional multiple-time-programmable memory (3D-MTP, including rewritable). Common 3D-MTP includes 3D-XPoint and 3D-NAND. Other 3D-MTP's include memristor, resistive random-access memory (RRAM or ReRAM), phase-change memory (PCM), programmable metallization cell (PMC) memory, conductive-bridging random-access memory (CBRAM), and the like.
In
The memory levels 16A, 16B are communicatively coupled with the substrate circuit 0K through contact vias 1av, 3av, which collectively form the ISP-connections 160. The contact vias 1av, 3av comprise a plurality of vias, each of which is communicatively coupled with the vias above or below. Not penetrating through any semiconductor substrate including the single-crystalline semiconductor substrate 0, the ISP-connections 160 are short, small and numerous.
The 3D-MH array 170 in
In
The preferred 3D-MV array 170 in
The preferred 3D-MV array 170 in
To minimize interference between memory cells, a diode or a diode-like device is preferably formed between the word line 15 and the bit line 19. In a first preferred embodiment, the programmable layer 13 acts as a diode. In a second preferred embodiment, this diode is formed by depositing an extra diode layer on the sidewall of the memory well (not shown in this figure). In a third preferred embodiment, this diode is formed naturally between the word line 15 and the bit line 19, i.e. to form a built-in junction (e.g. P-N junction, or Schottky junction). More details on the built-in diode are disclosed in U.S. patent application Ser. No. 16/137,512, filed on Sep. 20, 2018.
Not penetrating through any semiconductor substrate, the ISP-connections 160 (e.g. contact vias 1av, 3av) are short, small and numerous. To be more specific, the length of the contact vias 1av, 3av is on the order of one micrometer, e.g. ranging from ⅓ micrometers to 3 micrometers. Apparently, short contact vias 1av, 3av can be made small in size. In general, the size of the contact vias 1av, 3av is equal to or twice as much as the width of the address lines. For example, the size of the contact vias (e.g. 1av, 3av) is smaller than 100 nanometers. With small contact vias (e.g. 1av, 3av), more contact vias can be formed in each CCE 100ij. For example, a single CCE 100ij could comprise at least one thousand contact vias; and, a single configurable-processor die 100 could comprise at least one million contact vias. Thus, the preferred configurable-processor die 100 can achieve a large bandwidth between 3D-M array 170 and ALC 180.
The small contact vias (e.g. 1av, 3av) and the small 3D-M cells (~4F², e.g. 7aa, 18ay) lead to a small CCE 100ij. Accordingly, the preferred configurable-processor die 100 comprises a massive number of CCE's 100AA-100BD. In one example, the preferred configurable-processor die 100 comprises at least one thousand CCE's. In another example, the preferred configurable-processor die 100 comprises at least ten thousand CCE's. As a result, the preferred configurable processor 100 is computationally powerful, i.e. it can achieve massive parallelism, great computational complexity, and/or large computational density.
Referring now to
In the preferred configurable-processor doublet 100 of
The preferred 3D-M die 100a in
The 3D-M arrays 170 are stacked above the substrate circuit 0Ka. The 3D-M arrays 170 include eight address-line layers 0a1a-0a8a. Each address-line layer (e.g. 0a1a) comprises a plurality of address lines on a same physical plane. The address-line layers 0a1a-0a8a form eight memory levels. Since they are formed above the first semiconductor substrate 0M and neither in contact with nor interposed therebetween by any semiconductor substrate, the memory cells (e.g. 18ay-18hy) of the 3D-M arrays 170 do not comprise any single-crystalline semiconductor material.
The preferred processing die 100b in
In the preferred configurable-processor doublet 100, the 3D-M die 100a comprises substantially more back-end-of-line (BEOL) layers (including all interconnect layers and all address-line layers) than the processing die 100b. For example, the 3D-M die 100a in
Since the 3D-M die 100a and the processing die 100b are face-to-face bonded and not separated by any semiconductor substrate, the ISP-connections 160 (e.g. micro-bumps 160x of
Referring now to
The embodiment of
The embodiment of
The embodiment of
In
For either configurable-processor singlet 100 or configurable-processor doublet 100, the 3D-M array 170 and the ALC 180 substantially overlap. In addition, because they do not penetrate through any semiconductor substrate, the ISP-connections 160 are short, small and numerous. Given also that the 3D-NVM cells are much smaller than the RAM cells, the preferred CCE 100ij is much smaller than prior art. Hence, the preferred configurable processor 100 contains a massive number of CCE's. In one preferred embodiment, the preferred configurable processor 100 contains at least one thousand CCE's. In another preferred embodiment, the preferred configurable processor 100 contains at least ten thousand CCE's. As a result, the preferred configurable processor 100 is computationally powerful, i.e. it can achieve massive parallelism, great computational complexity, and/or large computational density.
Because it can implement significantly more built-in mathematical functions than prior art (ten thousand vs. ten), the preferred configurable processor 100 will provide a paradigm shift in scientific computing. Scientific computing uses advanced computing capabilities to advance human understanding and solve engineering problems. It has wide applications in computational mathematics, computational physics, computational chemistry, computational biology, computational engineering, computational economics, computational finance and other computational fields.
The prevailing framework of scientific computing comprises three layers: a foundation layer, a function layer and a modeling layer. The foundation layer includes built-in mathematical functions that can be implemented by hardware. The function layer includes mathematical functions that cannot be implemented by hardware. The modeling layer includes mathematical models, which are the mathematical descriptions of the input-output characteristics of a system component within a system under simulation.
The conventional processor supports very few (~ten) built-in mathematical functions. This small set of built-in mathematical functions can be implemented by hardware and constitute the foundation layer of scientific computing. On the other hand, the mathematical functions in the function layer and the mathematical models in the modeling layer are both implemented by software. The function layer involves one software-decomposition step: mathematical functions are decomposed into combinations of built-in mathematical functions by software, before these built-in mathematical functions and the associated arithmetic operations are calculated by hardware. The modeling layer involves two software-decomposition steps: the mathematical models are first decomposed into combinations of mathematical functions; then the mathematical functions are further decomposed into combinations of built-in mathematical functions. Apparently, the software-implemented functions (e.g. mathematical functions, mathematical models) run much slower and less efficiently than the hardware-implemented functions (i.e. built-in mathematical functions), and extra software-decomposition steps (e.g. for mathematical models) would make these performance gaps even more pronounced.
To illustrate how computationally intensive a mathematical model could be,
Significantly more built-in mathematical functions shall flatten the prevailing framework of scientific computing (including the foundation, function and modeling layers). The hardware-implemented functions, which were previously available only to the foundation layer, now become available to the function layer and the modeling layer. Not only can mathematical functions in the function layer be directly realized by hardware, but mathematical models in the modeling layer can also be directly described by hardware. In the function layer, mathematical functions can be realized by a function-by-LUT method, i.e. the functional values are calculated by reading the LUT plus polynomial interpolation. In the modeling layer, mathematical models can be described by a model-by-LUT method, i.e. the input-output characteristics of the system component are modeled by reading the LUT plus polynomial interpolation. This would lead to a paradigm shift for scientific computing.
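As a rough software analogue of the function-by-LUT method, the sketch below tabulates a function on a uniform grid and evaluates it by reading two neighbouring entries and interpolating linearly (a first-order polynomial). The grid range, the number of entries and the helper names `build_lut`, `lut_eval` and `exp_by_lut` are assumptions chosen for illustration; the specification does not prescribe them, and a hardware LUT would be organized differently.

```python
import math


def build_lut(func, x_min, x_max, n_points):
    """Sample func on a uniform grid; the list plays the role of the LUT."""
    step = (x_max - x_min) / (n_points - 1)
    values = [func(x_min + i * step) for i in range(n_points)]
    return values, step


def lut_eval(values, x_min, step, x):
    """Read two neighbouring LUT entries and interpolate linearly."""
    pos = (x - x_min) / step
    i = min(max(int(pos), 0), len(values) - 2)  # clamp to a valid interval
    t = pos - i
    return values[i] + t * (values[i + 1] - values[i])


# Illustrative choice: exp on [0, 1] with 257 entries.
X_MIN, X_MAX, N_POINTS = 0.0, 1.0, 257
LUT, STEP = build_lut(math.exp, X_MIN, X_MAX, N_POINTS)


def exp_by_lut(x):
    """Evaluate exp(x) for x in [0, 1] from the LUT instead of computing it."""
    return lut_eval(LUT, X_MIN, STEP, x)


if __name__ == "__main__":
    x = 0.3
    print(exp_by_lut(x), math.exp(x))
```

With this grid the linear-interpolation error is bounded by h²/8 times the maximum second derivative, which is below 1e-5 here; a finer grid or a higher-order polynomial would trade LUT size against precision.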
Referring now to
The 3D-M array 170U could store different forms of mathematical models. In one case, the mathematical model data stored in the 3D-M array 170U are raw measurement data, i.e. the measured input-output characteristics of the transistor 24. One example is the measured drain current vs. the applied gate-source voltage (ID-VGS) characteristics. In another case, the mathematical model data stored in the 3D-M array 170U are smoothed measurement data. The raw measurement data could be smoothed using a purely mathematical method (e.g. a best-fit model), or the smoothing process can be aided by a physical transistor model (e.g. a BSIM4 V3.0 transistor model). In a third case, the mathematical model data stored in the 3D-M array 170U include not only the measured data, but also their derivative values. For example, the LUT data include not only the drain-current values of the transistor 24 (e.g. the ID-VGS characteristics), but also its transconductance values (e.g. the Gm-VGS characteristics). With derivative values, polynomial interpolation can be used to improve the modeling precision using a reasonably-sized LUT.
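To show why storing derivative values helps, the following sketch compares linear interpolation with cubic Hermite interpolation on a coarse LUT that holds both ID and Gm. Cubic Hermite interpolation is one possible polynomial scheme selected here for illustration; the specification does not name a particular one. The square-law function `toy_transistor`, its parameters and the grid are hypothetical stand-ins for measured data.

```python
def toy_transistor(vgs, k=2e-3, vth=0.5):
    """Hypothetical square-law stand-in for measured data: returns (ID, Gm)."""
    vov = max(vgs - vth, 0.0)
    return 0.5 * k * vov * vov, k * vov


# Coarse LUT: (ID, Gm) pairs on a uniform VGS grid from 0 V to 2 V.
V_MIN, STEP, N_POINTS = 0.0, 0.25, 9
MODEL_LUT = [toy_transistor(V_MIN + i * STEP) for i in range(N_POINTS)]


def _locate(vgs):
    """Return the interval index and the fractional position inside it."""
    pos = (vgs - V_MIN) / STEP
    i = min(max(int(pos), 0), N_POINTS - 2)
    return i, pos - i


def id_linear(vgs):
    """Interpolate ID from stored values only."""
    i, t = _locate(vgs)
    y0, y1 = MODEL_LUT[i][0], MODEL_LUT[i + 1][0]
    return y0 + t * (y1 - y0)


def id_hermite(vgs):
    """Interpolate ID from stored values and derivatives (cubic Hermite)."""
    i, t = _locate(vgs)
    (y0, g0), (y1, g1) = MODEL_LUT[i], MODEL_LUT[i + 1]
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y0 + h10 * STEP * g0 + h01 * y1 + h11 * STEP * g1


if __name__ == "__main__":
    v = 0.625
    print(id_linear(v), id_hermite(v), toy_transistor(v)[0])
```

On this toy data the Hermite result matches the underlying curve to floating-point precision with only nine grid points, whereas linear interpolation on the same grid is visibly less accurate. Real measured data would not be reproduced exactly, but the same mechanism lets a smaller LUT reach a given precision.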
Model-by-LUT offers many advantages over function-by-LUT. By skipping two software-decomposition steps (from mathematical models to mathematical functions, and from mathematical functions to built-in mathematical functions), model-by-LUT saves substantial modeling time and energy. Moreover, model-by-LUT may need less LUT space than function-by-LUT. In theory, mapping a mathematical function into an LUT requires an infinite space. In reality, mapping a mathematical model of a real-life system component into an LUT requires only a finite space. To be more specific, because a transistor model (e.g. BSIM4 V3.0) has hundreds of model parameters, calculating the intermediate functions of the transistor model requires extremely large LUTs. However, if function-by-LUT is skipped (namely, skipping the transistor models and the associated intermediate functions), the transistor behaviors can be described using only three parameters (the gate-source voltage VGS, the drain-source voltage VDS, and the body-source voltage VBS). Describing the mathematical models of the transistor 24 therefore requires relatively small LUTs.
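The LUT-size argument can be made concrete with a small calculation. The grid resolution (64 points per axis) and the word width (4 bytes per value) are illustrative assumptions, not figures from this specification; the helper names are likewise hypothetical.

```python
def lut_entries(points_per_axis: int, n_inputs: int) -> int:
    """Number of entries in a dense LUT sampled uniformly on every input."""
    return points_per_axis ** n_inputs


def lut_bytes(points_per_axis: int, n_inputs: int,
              values_per_entry: int = 1, bytes_per_value: int = 4) -> int:
    """Storage needed for the LUT under the assumed word width."""
    return (lut_entries(points_per_axis, n_inputs)
            * values_per_entry * bytes_per_value)


if __name__ == "__main__":
    # Three inputs (VGS, VDS, VBS), 64 grid points per input.
    print(lut_entries(64, 3), lut_bytes(64, 3))
    # Storing a derivative alongside each value doubles the storage.
    print(lut_bytes(64, 3, values_per_entry=2))
    # The entry count grows exponentially with the number of inputs.
    print(lut_entries(64, 6))
```

Under these assumptions a three-input table occupies 1 MiB, while doubling the number of inputs multiplies the entry count by 64³. This is why keeping the LUT indexed by only three terminal voltages keeps it relatively small.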
While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth herein. The invention, therefore, is not to be limited except in the spirit of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
201610083747.7 | Feb 2016 | CN | national |
201610125227.8 | Mar 2016 | CN | national |
201610260845.3 | Apr 2016 | CN | national |
201610289592.2 | May 2016 | CN | national |
201610307102.7 | May 2016 | CN | national |
201710122749.7 | Mar 2017 | CN | national |
201710126067.3 | Mar 2017 | CN | national |
201710237780.5 | Apr 2017 | CN | national |
201710980620.X | Oct 2017 | CN | national |
201710980779.1 | Oct 2017 | CN | national |
201710980813.5 | Oct 2017 | CN | national |
201710980826.2 | Oct 2017 | CN | national |
201710980827.7 | Oct 2017 | CN | national |
201710989881.8 | Oct 2017 | CN | national |
201710989885.6 | Oct 2017 | CN | national |
201710989901.1 | Oct 2017 | CN | national |
201811506212.1 | Dec 2018 | CN | national |
201811508130.0 | Dec 2018 | CN | national |
201811520357.7 | Dec 2018 | CN | national |
201811527885.5 | Dec 2018 | CN | national |
201811527911.4 | Dec 2018 | CN | national |
201811528014.5 | Dec 2018 | CN | national |
201811546476.X | Dec 2018 | CN | national |
201811546592.1 | Dec 2018 | CN | national |
201910002944.5 | Jan 2019 | CN | national |
201910029523.1 | Jan 2019 | CN | national |
This application is a division of U.S. patent application Ser. No. 16/693,370, filed Nov. 24, 2019, which is a continuation-in-part of the following U.S. Patent Applications (A)-(D):
(A) U.S. patent application Ser. No. 16/186,571, filed Nov. 11, 2018, now U.S. Pat. No. 10,700,686, issued Jun. 30, 2020, which is a continuation-in-part of U.S. patent application Ser. No. 16/059,023, filed Aug. 8, 2018, now U.S. Pat. No. 10,312,917, issued Jun. 4, 2019, which is a continuation-in-part of the following U.S. Patent Applications (A1)-(A4):
(A1) U.S. patent application Ser. No. 15/793,912, filed Oct. 25, 2017, now U.S. Pat. No. 10,075,168, issued Sep. 11, 2018, which is a continuation of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,021, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018;
(A2) U.S. patent application Ser. No. 15/793,968, filed Oct. 25, 2017, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,021, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018;
(A3) U.S. patent application Ser. No. 15/793,927, filed Oct. 25, 2017, now U.S. Pat. No. 10,075,169, issued Sep. 11, 2018, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,021, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018;
(A4) U.S. patent application Ser. No. 15/793,933, filed Oct. 25, 2017, now U.S. Pat. No. 10,141,939, issued Nov. 27, 2018, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,021, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
(B) U.S. patent application Ser. No. 16/055,170, filed Aug. 6, 2018, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 15/793,912, filed Oct. 25, 2017, now U.S. Pat. No. 10,075,168, issued Sep. 11, 2018, which is a continuation of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,021, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018.
(C) U.S. patent application Ser. No. 15/487,366, filed Apr. 13, 2017, now U.S. Pat. No. 10,763,861, issued Sep. 1, 2020.
(D) U.S. patent application Ser. No. 16/249,021, filed Jan. 16, 2019.
These patent applications claim priorities from the following Chinese Patent Applications: 1) Chinese Patent Application No. 201610083747.7, filed Feb. 13, 2016; 2) Chinese Patent Application No. 201610125227.8, filed Mar. 5, 2016; 3) Chinese Patent Application No. 201610260845.3, filed Apr. 22, 2016; 4) Chinese Patent Application No. 201610289592.2, filed May 2, 2016; 5) Chinese Patent Application No. 201610307102.7, filed May 10, 2016; 6) Chinese Patent Application No. 201710122749.7, filed Mar. 3, 2017; 7) Chinese Patent Application No. 201710126067.3, filed Mar. 6, 2017; 8) Chinese Patent Application No. 201710237780.5, filed Apr. 12, 2017; 9) Chinese Patent Application No. 201710980620.X, filed Oct. 19, 2017; 10) Chinese Patent Application No. 201710980779.1, filed Oct. 20, 2017; 11) Chinese Patent Application No. 201710980813.5, filed Oct. 20, 2017; 12) Chinese Patent Application No. 201710980826.2, filed Oct. 20, 2017; 13) Chinese Patent Application No. 201710980827.7, filed Oct. 20, 2017; 14) Chinese Patent Application No. 201710989881.8, filed Oct. 23, 2017; 15) Chinese Patent Application No. 201710989885.6, filed Oct. 23, 2017; 16) Chinese Patent Application No. 201710989901.1, filed Oct. 23, 2017; 17) Chinese Patent Application No. 201811506212.1, filed Dec. 10, 2018; 18) Chinese Patent Application No. 201811508130.0, filed Dec. 11, 2018; 19) Chinese Patent Application No. 201811520357.7, filed Dec. 12, 2018; 20) Chinese Patent Application No. 201811527885.5, filed Dec. 13, 2018; 21) Chinese Patent Application No. 201811527911.4, filed Dec. 13, 2018; 22) Chinese Patent Application No. 201811528014.5, filed Dec. 14, 2018; 23) Chinese Patent Application No. 201811546476.X, filed Dec. 15, 2018; 24) Chinese Patent Application No. 201811546592.1, filed Dec. 15, 2018; 25) Chinese Patent Application No. 201910002944.5, filed Jan. 2, 2019; 26) Chinese Patent Application No. 201910029523.1, filed Jan. 13, 2019, in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
Number | Name | Date | Kind |
---|---|---|---|
4870302 | Freeman | Sep 1989 | A |
5046038 | Briggs et al. | Sep 1991 | A |
5060182 | Briggs et al. | Oct 1991 | A |
5604499 | Miyagoshi et al. | Feb 1997 | A |
5835396 | Zhang | Nov 1998 | A |
5901274 | Oh | May 1999 | A |
5954787 | Eun | Sep 1999 | A |
6181355 | Brethour et al. | Jan 2001 | B1 |
6263470 | Hung et al. | Jul 2001 | B1 |
7028247 | Lee | Apr 2006 | B2 |
7206410 | Bertoni et al. | Apr 2007 | B2 |
7366748 | Tang et al. | Apr 2008 | B1 |
7472149 | Endo | Dec 2008 | B2 |
7512647 | Wilson et al. | Mar 2009 | B2 |
7574468 | Rayala | Apr 2009 | B1 |
7539927 | Lee et al. | May 2009 | B2 |
7558812 | Padalia et al. | Jul 2009 | B1 |
7634524 | Okutani et al. | Dec 2009 | B2 |
7917559 | Redgrave | Mar 2011 | B2 |
7962543 | Schulte et al. | Jun 2011 | B2 |
8203564 | Jiao et al. | Jun 2012 | B2 |
8438522 | Frederick et al. | May 2013 | B1 |
8487948 | Kai et al. | Jul 2013 | B2 |
9015452 | Dasgupta | Apr 2015 | B2 |
9196752 | Baskaran et al. | Nov 2015 | B2 |
9207910 | Azadet et al. | Dec 2015 | B2 |
9225501 | Azadet | Dec 2015 | B2 |
9465580 | Pineiro et al. | Oct 2016 | B2 |
9606796 | Lee et al. | Mar 2017 | B2 |
10848158 | Zhang | Nov 2020 | B2 |
20040044710 | Harrison et al. | Mar 2004 | A1 |
20050071401 | Clifton | Mar 2005 | A1 |
20060106905 | Chren, Jr. | May 2006 | A1 |
20120068229 | Bemanian et al. | Mar 2012 | A1 |
20120248595 | Or-Bach et al. | Oct 2012 | A1 |
20130185345 | Tsadik et al. | Jul 2013 | A1 |
20140067889 | Mortensen | Mar 2014 | A1 |
20140222883 | Pineiro et al. | Aug 2014 | A1 |
20140300500 | San et al. | Oct 2014 | A1 |
20160173101 | Gao et al. | Jun 2016 | A1 |
Entry |
---|
Harrison et al., “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4, 1999. |
Paul et al., “Reconfigurable Computing Using Content Addressable Memory for Improved Performance and Resource Usage”, Design Automation Conference (DAC), pp. 786-791, 2008. |
Karam et al., “Emerging Trends in Design and Applications of Memory-Based Computing and Content-Addressable Memories”, Proceedings of the IEEE, vol. 103, issue 8, pp. 1311-1330, 2015. |
Kim et al., “Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory)”, IEEE Transactions on Computers, vol. 64, No. 1, pp. 112-125, Jan. 2015. |
Number | Date | Country | |
---|---|---|---|
20210083670 A1 | Mar 2021 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16693370 | Nov 2019 | US |
Child | 17065632 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15450049 | Mar 2017 | US |
Child | 15793912 | | US |
Parent | 15450049 | Mar 2017 | US |
Child | 15793912 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16186571 | Nov 2018 | US |
Child | 16693370 | | US |
Parent | 16059023 | Aug 2018 | US |
Child | 16186571 | | US |
Parent | 15793912 | Oct 2017 | US |
Child | 16059023 | | US |
Parent | 15450017 | Mar 2017 | US |
Child | 15450049 | | US |
Parent | 15793968 | Oct 2017 | US |
Child | 16059023 | | US |
Parent | 15450049 | Mar 2017 | US |
Child | 15793968 | | US |
Parent | 15450017 | Mar 2017 | US |
Child | 15450049 | | US |
Parent | 15793927 | Oct 2017 | US |
Child | 16059023 | | US |
Parent | 15450049 | Mar 2017 | US |
Child | 15793927 | | US |
Parent | 15450017 | Mar 2017 | US |
Child | 15450049 | | US |
Parent | 15793933 | Oct 2017 | US |
Child | 16059023 | | US |
Parent | 15450049 | Mar 2017 | US |
Child | 15793933 | | US |
Parent | 15450017 | Mar 2017 | US |
Child | 15450049 | | US |
Parent | 16055170 | Aug 2018 | US |
Child | 16693370 | | US |
Parent | 15793912 | Oct 2017 | US |
Child | 16055170 | | US |
Parent | 15450017 | Mar 2017 | US |
Child | 15450049 | | US |
Parent | 15487366 | Apr 2017 | US |
Child | 16693370 | | US |
Parent | 16249021 | Jan 2019 | US |
Child | 15487366 | | US |