The present invention relates to the field of integrated circuits, and more particularly to configurable gate arrays.
Conventional processors use logic-based computation (LBC), which carries out computation primarily with logic circuits (e.g. XOR circuits). Logic circuits are suitable for arithmetic functions, whose operations involve only the basic arithmetic operations, i.e. addition, subtraction and multiplication. However, logic circuits are not suitable for non-arithmetic functions, whose operations involve more than addition, subtraction and multiplication. Exemplary non-arithmetic functions include transcendental functions and special functions. Non-arithmetic functions are computationally hard, and their hardware implementation has been a major challenge. Throughout the present invention, the term “math functions” is limited to non-arithmetic functions.
A complex math function is a non-arithmetic function with multiple independent variables (an independent variable is also known as an input variable or argument). It can be expressed as a combination of basic functions. A basic function is a non-arithmetic function with a single independent variable. Exemplary basic functions include the basic transcendental functions, such as the exponential function (exp), the logarithmic function (log), and the trigonometric functions (sin, cos, tan, atan), among others.
On a conventional processor, the basic functions that can be calculated by hardware (i.e. hardware computing) are referred to as built-in functions. Because different math functions are implemented with different logic circuits, the hardware implementation of built-in functions is highly customized. Due to limited resources on a processor die, only a small number of built-in functions can be implemented by hardware. For example, only 7 built-in functions (i.e. CBRT, EXP, LN, SIN, COS, TAN, ATAN) are implemented by hardware on an Intel IA-64 processor (see Harrison et al., “The Computation of Transcendental Functions on the IA-64 Architecture”, Intel Technology Journal, Q4 1999, page 6).
Because hardware implementation of even basic functions is difficult, software computing has been the commonly accepted practice. On a conventional processor, all complex math functions, and even most basic functions, are calculated by software. Because software computing is more complicated than hardware computing, calculation of complex math functions is slow and inefficient. It is highly desirable to realize hardware computing for complex math functions, and even more desirable to realize configurable hardware computing, i.e. to use the same set of hardware to implement a large set of complex math functions.
A configurable gate array is a semi-custom integrated circuit designed to be configured by a customer after manufacturing. It is also referred to as a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), or by other names. U.S. Pat. No. 4,870,302, issued to Freeman on Sep. 26, 1989 (hereinafter referred to as Freeman), discloses a configurable gate array. It contains an array of configurable logic elements (also known as configurable logic blocks) and a hierarchy of configurable interconnects (also known as programmable interconnects) that allow the configurable logic elements to be wired together as the customer desires. Each configurable logic element in the array is itself capable of realizing any one of a plurality of logic functions (e.g. shift, logic NOT, logic AND, logic OR, logic NOR, logic NAND, logic XOR, arithmetic addition “+”, arithmetic subtraction “−”, etc.) depending upon a first configuration signal. Each configurable interconnect can selectively couple or de-couple interconnect lines depending upon a second configuration signal.
In the conventional configurable gate array, fixed computing elements are used to implement basic functions. These fixed computing elements are portions of hard blocks that are not configurable, i.e. the circuits implementing these math functions are fixedly connected and cannot be changed by programming. This limits further application of the configurable gate array. To overcome these difficulties, the present invention expands the original concept of the configurable gate array by making the fixed computing elements configurable. In other words, besides configurable logic elements, the configurable gate array comprises configurable computing elements, each of which can realize any one of a plurality of math functions.
It is a principal object of the present invention to extend the concept of the configurable gate array from logic computation to math computation.
It is a further object of the present invention to provide a configurable computing array to customize not only logic functions, but also math functions.
It is a further object of the present invention to provide a configurable computing array with a small physical size.
It is a further object of the present invention to provide a configurable computing array with a fast computational speed.
It is a further object of the present invention to provide a configurable computing array with a short time-to-market.
It is a further object of the present invention to provide a configurable computing array with good manufacturability.
It is a further object of the present invention to provide a configurable computing array with a lower manufacturing cost.
In accordance with these and other objects of the present invention, the present invention discloses a configurable computing array.
The present invention discloses a configurable computing array. It comprises at least an array of configurable interconnects, at least an array of configurable logic elements and at least an array of configurable computing elements. Each configurable computing element comprises at least a programmable memory, which can be loaded with a look-up table (LUT) for a math function. Because the memory is programmable, the set of math functions that can be realized by the configurable computing element is essentially unlimited.
The usage cycle of the configurable computing element comprises two stages: a configuration stage and a computation stage. In the configuration stage, the LUT for a desired math function is loaded into the programmable memory. In the computation stage, a selected portion of the LUT for the desired math function is read out from the programmable memory. When the programmable memory is rewritable, a configurable computing element can be re-configured to realize different math functions at different times.
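For illustration only, this two-stage usage cycle can be modeled in software. The following Python sketch is a behavioral model under simplifying assumptions (a 256-entry LUT, nearest-entry lookup, and hypothetical class and method names); it is not the disclosed circuit, but it shows how the same programmable memory is configured with one LUT and later re-configured with another.

```python
import math

# Behavioral sketch only: models a configurable computing element whose
# programmable memory holds an LUT. Names and sizes are illustrative.
class ConfigurableComputingElement:
    def __init__(self, entries=256):
        self.entries = entries
        self.lut = None            # programmable memory, empty until configured

    def configure(self, func, x_min, x_max):
        """Configuration stage: load the LUT for the desired math function."""
        step = (x_max - x_min) / (self.entries - 1)
        self.lut = [func(x_min + i * step) for i in range(self.entries)]
        self.x_min, self.step = x_min, step

    def compute(self, x):
        """Computation stage: read out the selected portion of the LUT."""
        index = round((x - self.x_min) / self.step)
        return self.lut[index]

cce = ConfigurableComputingElement()
cce.configure(math.sin, 0.0, math.pi / 2)   # realize sin
print(cce.compute(0.5))                     # approximately sin(0.5)
cce.configure(math.exp, 0.0, 1.0)           # re-configure: same memory, new math function
print(cce.compute(0.5))                     # approximately exp(0.5)
```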
Besides configurable computing elements, the preferred configurable computing array further comprises configurable logic elements and configurable interconnects. During operation, a complex math function is first decomposed into a combination of basic functions. Each basic function is realized by programming an associated configurable computing element. The complex math function is then realized by programming the appropriate configurable logic elements and configurable interconnects.
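As a purely illustrative example of such a decomposition (the particular function is hypothetical and not taken from the disclosure), the complex math function f(x, y) = x^y can be rewritten as exp(y·ln x): the basic functions ln and exp would each be realized by an LUT-based configurable computing element, while the multiplication and the routing between them would be handled by configurable logic elements and configurable interconnects. A minimal Python sketch of the decomposition:

```python
import math

# Stand-ins for LUT-based configurable computing elements (illustrative only;
# a real element would read these values out of its programmable memory).
def basic_ln(x):
    return math.log(x)

def basic_exp(x):
    return math.exp(x)

def complex_pow(x, y):
    # The multiply and the wiring between the two basic functions correspond
    # to the configurable logic elements and configurable interconnects.
    return basic_exp(y * basic_ln(x))

print(complex_pow(2.0, 10.0))   # 1024.0, within floating-point error
```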
By using arrays of configurable computing elements, configurable logic elements and configurable interconnects, the present invention implements hardware computing of complex math functions. Compared with software computing, hardware computing is much faster and more efficient. Moreover, because LUTs are used as the primary means to implement math functions, this type of computing is memory-based computing (MBC). The chief advantages of MBC over LBC are configurability and generality. By loading the LUTs of different math functions into the programmable memory at different times, a single programmable memory can be used to implement a large set of basic functions, thus realizing configurable computing.
Accordingly, the present invention discloses a configurable computing array, comprising: at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first look-up table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function; whereby said configurable computing array realizes a math function by programming said configurable logic elements and said configurable computing elements, wherein said math function is a combination of at least said first and second math functions.
The present invention further discloses another configurable computing array, comprising: at least an array of configurable interconnects including a configurable interconnect, wherein said configurable interconnect selectively realizes an interconnect from an interconnect library; and at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first look-up table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function; whereby said configurable computing array realizes a math function by programming said configurable interconnects and said configurable computing elements, wherein said math function is a combination of at least said first and second math functions.
The present invention further discloses yet another configurable computing array, comprising: at least an array of configurable interconnects including a configurable interconnect, wherein said configurable interconnect selectively realizes an interconnect from an interconnect library; at least an array of configurable logic elements including a configurable logic element, wherein said configurable logic element selectively realizes a logic function from a logic library; and at least an array of configurable computing elements including first and second configurable computing elements, wherein said first configurable computing element comprises a first memory for storing a first look-up table (LUT) for a first math function; and, said second configurable computing element comprises a second memory for storing a second LUT for a second math function; whereby said configurable computing array realizes a math function by programming said configurable interconnects, said configurable logic elements and said configurable computing elements, wherein said math function is a combination of at least said first and second math functions.
It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been exaggerated or reduced in size for the sake of clarity and convenience. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments. In the present invention, the terms “write”, “program” and “configure” have similar meanings and are used interchangeably. The symbol “/” means a relationship of “and” or “or”.
Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
Referring now to
The implementation of math functions is much more complicated than the implementation of logic functions. The LUT stored in the configurable computing element 100 includes numerical values related to a math function, whereas the LUT stored in a configurable logic element of the conventional configurable gate array includes logic values of a logic function. Numerical values are denoted by a large number of bits. For example, a half-precision floating-point number comprises 16 bits; a single-precision floating-point number comprises 32 bits; a double-precision floating-point number comprises 64 bits. In comparison, the logic values can be denoted by a single bit and have only two values, i.e. “true” and “false”. Accordingly, the LUT size in the configurable computing element 100 is substantially larger than that in the configurable logic element.
In an LUT, the numerical values related to a math function include the functional values of the math function. When the input variable of a math function comprises a large number of bits, the LUT size could become excessively large. For example, an LUT including the functional values of a double-precision math function needs 2^64×64 ≈ 10^21 bits. To reduce the LUT size, Taylor-series (or other polynomial expansion) calculation is preferably used. To be more specific, the LUT not only includes the functional values, but also includes the derivative values of a math function, e.g. the first-order derivative values, the second-order derivative values, and so on. To perform the Taylor-series calculation, the configurable computing element 100 further comprises at least an adder and a multiplier. More details on the Taylor-series implementation of math functions are disclosed in co-pending U.S. patent application Ser. No. 15/487,366, filed Apr. 13, 2017.
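A behavioral sketch of this Taylor-series approach is given below; it is a software model under illustrative assumptions (a 64-entry grid, sin as the stored function, and a first-order correction only), not the circuit of the co-pending application. The LUT stores the functional value and the first-order derivative value at each grid point x0, and the adder and multiplier evaluate f(x) ≈ f(x0) + f'(x0)·(x − x0).

```python
import math

# Illustrative parameters: 64 grid points over [0, pi/2] for f = sin (f' = cos).
ENTRIES = 64
X_MIN, X_MAX = 0.0, math.pi / 2
STEP = (X_MAX - X_MIN) / (ENTRIES - 1)

# Configuration stage: the LUT holds (functional value, first-order derivative).
lut = [(math.sin(X_MIN + i * STEP), math.cos(X_MIN + i * STEP))
       for i in range(ENTRIES)]

def compute(x):
    i = int(round((x - X_MIN) / STEP))
    x0 = X_MIN + i * STEP
    f0, d0 = lut[i]                  # read out the selected portion of the LUT
    return f0 + d0 * (x - x0)        # one multiplier and one adder

print(compute(0.7), math.sin(0.7))   # close agreement
```

Even with only 64 stored pairs, the first-order correction gives far better accuracy than a direct table of functional values of the same size, which illustrates why storing derivative values shrinks the required LUT dramatically.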
Referring now to
Referring now to
Referring now to
Referring now to
The preferred configurable computing array 400 can be constructed in many ways. In one preferred embodiment, the preferred configurable computing array 400 is a single-level configurable computing array, wherein the configurable computing elements 100 and the configurable logic elements 200 are disposed on a same physical level. Accordingly, the present invention discloses a preferred single-level configurable computing array.
Alternatively, the preferred configurable computing array 400 is a multi-level configurable computing array, wherein the configurable computing elements 100 and the configurable logic elements 200 are disposed on different physical levels. To be more specific, the memory cells of the configurable computing elements 100 are disposed on at least a memory level, the transistors of the configurable logic elements 200 are disposed on at least a logic level, and the memory level and the logic level are different physical levels. In one preferred example, both the memory cells and the transistors are disposed on the same side of a same semiconductor substrate, but the memory cells are stacked above the transistors (
Compared with the single-level configurable computing array, the multi-level configurable computing array offers many advantages. First of all, because the memory cells are disposed on a separate memory level(s), the memory level(s) can be dedicated to LUT storage. As a result, the memory level(s) has a high storage density and therefore can be used to store a large LUT (for better precision) or more LUTs (for more math functions). Secondly, because they are formed on a separate logic level, the configurable logic elements have a small footprint. This leads to a smaller die size. Thirdly, because the configurable computing elements are disposed above (or below) the configurable logic elements, the connections coupling the configurable computing elements and the configurable logic elements are relatively short. This leads to a higher speed.
Referring now to
Based on the orientation of the memory cells, the 3D-M can be categorized into horizontal 3D-M (3D-MH) and vertical 3D-M (3D-Mv). In a 3D-MH, all address lines are horizontal and the memory cells form a plurality of horizontal memory levels which are vertically stacked above one another. A well-known 3D-MH is 3D-XPoint. In a 3D-Mv, at least one set of the address lines is vertical and the memory cells form a plurality of vertical memory strings which are placed side-by-side on/above the substrate. A well-known 3D-Mv is 3D-NAND. In general, the 3D-MH (e.g. 3D-XPoint) is faster, while the 3D-Mv (e.g. 3D-NAND) is denser.
The preferred 3D-M in
The 3D-M cell 1aa comprises a programmable layer 12 and a diode layer 14. The programmable layer 12 could be an OTP layer (e.g. an antifuse layer, used for the 3D-OTP) or an MTP layer (e.g. a phase-change layer, used for the 3D-MTP). The diode layer 14 (also referred to as a selector layer, a quasi-conduction layer, or by other names) is broadly interpreted as any layer whose resistance at the read voltage is substantially lower than when the applied voltage has a magnitude smaller than, or a polarity opposite to, that of the read voltage. The diode could be a semiconductor diode (e.g. a p-i-n silicon diode) or a metal-oxide (e.g. TiO2) diode. In some embodiments, the programmable layer 12 and the diode layer 14 are merged into a single layer.
The preferred 3D-Mv array in
To minimize interference between memory cells, a diode is formed between the word line and the bit line. This diode may be formed by the programmable layer 21, which could itself have the electrical characteristics of a diode. Alternatively, this diode may be formed by depositing an extra diode layer on the sidewall of the memory well (not shown in this figure). As a third option, this diode may be formed naturally between the word line and the bit line, i.e. as a built-in junction, e.g. a P-N junction or a Schottky junction.
The preferred 3D-Mv array in
In the preferred embodiments of
Referring now to
This type of integration, i.e. forming the configurable logic elements 100AA-100BB and the configurable computing elements 200AA-200BB on different sides of the substrate, is referred to as two-sided integration. Two-sided integration can improve computational density and computational complexity. With the conventional 2-D integration, the die size of the configurable computing array is the sum of those of the configurable computing elements and the configurable logic elements. With the two-sided integration, the configurable computing elements are moved from alongside the configurable logic elements to the other side of the substrate. This leads to a smaller die size and a higher computational density. In addition, because the memory transistors in the configurable computing elements and the logic transistors in the configurable logic elements are formed on different sides of the substrate, their manufacturing processes can be optimized separately.
Referring now to
The configurable computing-array package 400 in
The configurable computing-array package 400 in
Although their active elements are disposed in a 3-D space, the configurable computing die 100W and the configurable logic die 200W are separate dice. Accordingly, this type of integration is generally referred to as 2.5-D integration. The 2.5-D integration surpasses the conventional 2-D integration (i.e. the single-level configurable computing array) in many aspects. First of all, the footprint of a conventional 2-D integrated configurable computing array is roughly equal to the sum of those of the configurable computing elements, the configurable logic elements and the configurable interconnects. On the other hand, because the 2.5-D integration moves the configurable computing elements from alongside the configurable logic die to above it, the configurable computing-array package 400 becomes smaller and computationally more powerful. Secondly, because they are physically close and coupled by a large number of inter-die connections 160, the configurable computing die 100W and the configurable logic die 200W have a larger communication bandwidth than the conventional 2-D integrated configurable computing array. Thirdly, the 2.5-D integration benefits the manufacturing process. Because the configurable computing die 100W and the configurable logic die 200W are separate dice, the memory transistors in the configurable computing die 100W and the logic transistors in the configurable logic die 200W are formed on separate semiconductor substrates. Consequently, their manufacturing processes can be individually optimized.
The preferred embodiments of the present invention are field-programmable computing-array (FPCA) packages. For an FPCA package, all manufacturing processes of the configurable computing die and the configurable logic die are finished in the factory. The function of the FPCA package can be electrically defined in the field of use. The concept of the FPCA package can be extended to a mask-programmed computing-array (MPCA) package. For an MPCA package, the wafers containing the configurable computing elements and/or the wafers containing the configurable logic elements are prefabricated and stockpiled. However, certain interconnects on these wafers are not fabricated until the function of the MPCA package is finally defined.
While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth herein. The invention, therefore, is not to be limited except in the spirit of the appended claims.
Number | Date | Country | Kind
---|---|---|---|
201610125227 | Mar 2016 | CN | national
201610307102 | May 2016 | CN | national
201710122749 | Mar 2017 | CN | national
201710126067 | Mar 2017 | CN | national
201710980620 | Oct 2017 | CN | national
201710996864 | Oct 2017 | CN | national
201710980779 | Oct 2017 | CN | national
201710980813 | Oct 2017 | CN | national
201710980817 | Oct 2017 | CN | national
201710980826 | Oct 2017 | CN | national
201710980827 | Oct 2017 | CN | national
201710980967 | Oct 2017 | CN | national
201710980989 | Oct 2017 | CN | national
201710981043 | Oct 2017 | CN | national
201710998652 | Oct 2017 | CN | national
201710989881 | Oct 2017 | CN | national
201710989901 | Oct 2017 | CN | national
This application is a continuation-in-part of U.S. patent application Ser. No. 16/059,023, filed Aug. 8, 2018, which is a continuation-in-part of U.S. Patent Applications (A)-(D): (A) U.S. patent application Ser. No. 15/793,912, filed Oct. 25, 2017, now U.S. Pat. No. 10,075,168, issued Sep. 11, 2018; (B) U.S. patent application Ser. No. 15/793,968, filed Oct. 25, 2017; (C) U.S. patent application Ser. No. 15/793,927, filed Oct. 25, 2017, now U.S. Pat. No. 10,075,169, issued Sep. 11, 2018; (D) U.S. patent application Ser. No. 15/793,933, filed Oct. 25, 2017, now U.S. Pat. No. 10,141,939, issued Nov. 27, 2018. U.S. Patent Applications (A)-(D) are continuations-in-part of U.S. patent application Ser. No. 15/450,049, filed Mar. 6, 2017, now U.S. Pat. No. 9,838,031, issued Dec. 5, 2017, which is a continuation-in-part of U.S. patent application Ser. No. 15/450,017, filed Mar. 5, 2017, now U.S. Pat. No. 9,948,306, issued Apr. 17, 2018. These patent applications claim priority to Chinese Patent Application No. 201610125227.8, filed Mar. 5, 2016; Chinese Patent Application No. 201610307102.7, filed May 10, 2016; Chinese Patent Application No. 201710122749.7, filed Mar. 3, 2017; Chinese Patent Application No. 201710126067.3, filed Mar. 6, 2017; Chinese Patent Application No. 201710980620.X, filed Oct. 19, 2017; Chinese Patent Application No. 201710996864.7, filed Oct. 19, 2017; Chinese Patent Application No. 201710998652.2, filed Oct. 20, 2017; Chinese Patent Application No. 201710980817.3, filed Oct. 20, 2017; Chinese Patent Application No. 201710980779.1, filed Oct. 20, 2017; Chinese Patent Application No. 201710980813.5, filed Oct. 20, 2017; Chinese Patent Application No. 201710980826.2, filed Oct. 20, 2017; Chinese Patent Application No. 201710980967.4, filed Oct. 20, 2017; Chinese Patent Application No. 201710981043.6, filed Oct. 20, 2017; Chinese Patent Application No. 201710980989.0, filed Oct. 20, 2017; Chinese Patent Application No. 201710980827.7, filed Oct. 20, 2017; Chinese Patent Application No. 201710989881.8, filed Oct. 23, 2017; Chinese Patent Application No. 201710989901.1, filed Oct. 23, 2017, in the State Intellectual Property Office of the People's Republic of China (CN), the disclosures of which are incorporated herein by reference in their entireties.
Number | Name | Date | Kind |
---|---|---|---|
4870302 | Freeman | Sep 1989 | A |
5046038 | Briggs et al. | Sep 1991 | A |
5835396 | Zhang | Nov 1998 | A |
5954787 | Eun | Sep 1999 | A |
7472149 | Endo | Dec 2008 | B2 |
7512647 | Wilson et al. | Mar 2009 | B2 |
8564070 | Zhang | Oct 2013 | B2 |
9207910 | Azadet et al. | Dec 2015 | B2 |
9225501 | Azadet | Dec 2015 | B2 |
10141939 | Zhang | Nov 2018 | B2 |
Paul et al., “Reconfigurable Computing Using Content Addressable Memory for Improved Performance and Resource Usage”, Design Automation Conference (DAC), pp. 786-791, 2008.
Karam et al., “Emerging Trends in Design and Applications of Memory-Based Computing and Content-Addressable Memories”, Proceedings of the IEEE, vol. 103, issue 8, pp. 1311-1330, 2015.
Number | Date | Country
---|---|---|
20190158095 A1 | May 2019 | US
Relation | Number | Date | Country
---|---|---|---|
Parent | 15450049 | Mar 2017 | US
Child | 15793912 | | US
Relation | Number | Date | Country
---|---|---|---|
Parent | 16059023 | Aug 2018 | US
Child | 16186571 | | US
Parent | 15793912 | Oct 2017 | US
Child | 16059023 | | US
Parent | 15450017 | Mar 2017 | US
Child | 15450049 | | US
Parent | 15793968 | Oct 2017 | US
Child | 16059023 | Aug 2018 | US
Parent | 15450049 | Mar 2017 | US
Child | 15793968 | | US
Parent | 15450017 | Mar 2017 | US
Child | 15450049 | | US
Parent | 15793927 | Oct 2017 | US
Child | 16059023 | Aug 2018 | US
Parent | 15450049 | Mar 2017 | US
Child | 15793927 | | US
Parent | 15450017 | Mar 2017 | US
Child | 15450049 | | US
Parent | 15793933 | Oct 2017 | US
Child | 16059023 | Aug 2018 | US
Parent | 15450049 | Mar 2017 | US
Child | 15793933 | | US
Parent | 15450017 | Mar 2017 | US
Child | 15450049 | | US