The present disclosure generally relates to storage of data structures, and specifically relates to storage of data structures having multiple read ports.
Data structures, such as look-up tables, may be used in many applications to perform a function on received input data. For example, an arithmetic logic unit (ALU) may perform an operation on a received input value by looking up the value in a look-up table and returning a corresponding output value.
In some cases, such as in single instruction multiple data (SIMD) applications, it may be desirable to perform the same operation on different sets of input data in parallel. As such, multiple ALUs or other circuits may need to be able to access the data contained within the look-up table in parallel.
A memory structure having multiple read ports may be used to allow for parallel access by multiple ALUs or other processing devices to a common data structure, such as a look-up table. The memory structure may be constructed using a plurality of memory structures having fewer read ports.
A memory structure having 2^m read ports allowing for concurrent access to n data entries can be constructed using three memory structures (e.g., sub-structures) each having 2^(m-1) read ports. The three memory structures include a first structure providing access to a first half of the n data entries (n/2 entries), a second structure providing access to a second half of the n data entries (n/2 entries), and a difference structure providing access to difference data between the first and second halves of the n data entries (n/2 entries). Each of the 2^m ports may be connected to a respective port of each of the 2^(m-1)-port data structures, such that a port may access data from the first half of the n data entries by accessing the first structure or by accessing both the difference structure and the second structure to reconstruct the data stored by the first structure. Similarly, a port may access data from the second half of the n data entries by accessing the second structure or by accessing both the difference structure and the first structure to reconstruct the data stored by the second structure.
As such, a 2-port memory structure for accessing n data entries can be constructed using three 1-port memory structures, each storing n/2 data entries. Similarly, a 2^m-port memory structure for accessing n data entries can be constructed using multiple 1-port memory structures, storing a total of (3/2)^m*n entries.
The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
A data structure such as a look-up table may be used by an arithmetic logic unit (ALU) or other circuit to perform an operation on received input data. In many parallel processing applications, such as single instruction multiple data (SIMD) applications, multiple ALUs may need to access the data structure in parallel. As such, it is desirable for the data structure to be implemented on a memory structure (e.g., a random access memory (RAM) or read only memory (ROM)) having multiple read ports. In addition, although the present disclosure refers primarily to ALUs as reading data from the data structure via one or more read ports, in other embodiments, any other type of circuit or consumer may read data from the data structure via one or more read ports.
The ALUs 102 may be part of a SIMD or other parallel processor, where each ALU 102 is configured to perform the same arithmetic operation on different sets of input data. For example, each ALU 102 receives a respective set of input data, and performs one or more look-ups on the data structure stored in the memory structure 100 to produce a respective set of output data based upon the function associated with the data structure. In order for the ALUs 102 to operate in parallel, the plurality of ALUs 102 may need to be able to concurrently access the data structure on the memory structure 100. For example,
In some embodiments, a memory structure having multiple read ports, such as the memory structure 100, may be constructed using memory structures having fewer read ports. For example, the memory structure 100 may be constructed from a plurality of memory structures, each having a single read port.
One way to allow multiple ALUs to access any of the data of the structure 200 in parallel is to duplicate the data (e.g., entries [0] through [n−1]) across multiple single read port memory structures. For example, duplicating the data of the structure 200 across a second single read port memory structure may be done to construct a combined memory structure with two read ports, each of which can be independently accessed by a different ALU, providing access to any of the data in the original structure. However, this configuration also doubles the amount of memory needed to store the data, as each of the entries [0] through [n−1] will be duplicated across both single port memory structures. Using this type of configuration, in order to build a memory structure having n entries accessible over 2^m read ports, a total of n*2^m entries will need to be stored, where m is a positive integer.
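The storage cost of plain duplication can be compared with the (3/2)^m*n figure given earlier for the lower/upper/difference construction. The following is a minimal sketch for illustration only; the function names `duplicated_entries` and `difference_entries` are hypothetical and do not appear in the disclosure, and the sketch assumes n is divisible by 2^m.

```python
def duplicated_entries(n: int, m: int) -> int:
    """Entries stored when the n-entry table is copied once per read port (2^m copies)."""
    return n * 2**m


def difference_entries(n: int, m: int) -> int:
    """Entries stored by the lower/upper/difference construction: (3/2)^m * n.

    Assumes n is divisible by 2^m so that every level splits evenly.
    """
    if n % 2**m:
        raise ValueError("n must be divisible by 2^m")
    return n * 3**m // 2**m


if __name__ == "__main__":
    n = 1024  # hypothetical table size
    for m in range(1, 5):
        print(f"{2**m:>2} ports: duplicated={duplicated_entries(n, m)}, "
              f"difference-based={difference_entries(n, m)}")
```

Under these assumptions, the duplicated count doubles with each additional level, while the difference-based count grows by a factor of 3/2.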
2-Port Memory Structure
The 2-port memory structure 300 comprises a first 1-port memory structure 305A storing a table containing a first half of the data entries [0] through [n−1], and a second 1-port memory structure 305B storing a table containing a second half of the entries [0] through [n−1]. For ease of explanation, the first half of the data entries may also be referred to as the “lower” half (e.g., entries [0] through [n/2−1]), while the second half may also be referred to as the “upper” half (e.g., entries [n/2] through [n−1]). As such, the first structure 305A may be referred to as the “lower structure”, while the second structure 305B may be referred to as the “upper structure.”
In addition to the lower and upper structures 305A and 305B, the 2-port structure 300 further comprises a third 1-port structure 310 (hereinafter referred to as the “difference structure”) storing n/2 entries, each indicating whether differences exist between a corresponding entry of the lower structure and a corresponding entry of the upper structure. For example, the difference structure may store entries indicating differences between entry [0] of the lower structure and entry [n/2] of the upper structure, between entries [1] and [n/2+1], and so forth. The difference may be determined using any function that allows for the value of a data entry of the lower or upper half to be determined using only the value of the corresponding difference and data entry of the opposite half. For example, in some embodiments, the entries in the difference structure are generated from an exclusive-or (XOR) of corresponding entries of the lower and upper structures. As such, the value of a particular data entry of the lower half can be determined using the corresponding upper half data entry and XOR value, without needing to access the lower structure. In other embodiments, reversible functions other than XOR can be used to calculate the difference entries.
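The XOR-based difference structure can be sketched in software as follows. This is an illustrative model only, assuming integer-valued entries and an even number of entries; the helper names `build_difference_tables` and `recover` are hypothetical.

```python
def build_difference_tables(entries):
    """Split entries into lower and upper halves and derive the XOR difference table."""
    if len(entries) % 2:
        raise ValueError("number of entries must be even")
    half = len(entries) // 2
    lower = list(entries[:half])          # entries [0] .. [n/2-1]
    upper = list(entries[half:])          # entries [n/2] .. [n-1]
    diff = [lo ^ up for lo, up in zip(lower, upper)]
    return lower, upper, diff


def recover(opposite_value, diff_value):
    """Rebuild an entry of one half from the opposite half and the difference entry."""
    return opposite_value ^ diff_value


if __name__ == "__main__":
    table = [0x3A, 0x7F, 0x00, 0xC4, 0x15, 0x7F, 0x99, 0x01]  # hypothetical data
    lower, upper, diff = build_difference_tables(table)
    i = 2
    # Entry [i] of the lower half, obtained without touching the lower table.
    print(recover(upper[i], diff[i]) == lower[i])
    # Entry [n/2 + i] of the upper half, obtained without touching the upper table.
    print(recover(lower[i], diff[i]) == upper[i])
```

XOR is convenient here because it is its own inverse, so the same `recover` operation works in both directions.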
The access circuit 315 comprises a circuit that maps the read ports of the lower structure 305A, upper structure 305B, and difference structure 310 to the two different read ports 320A and 320B (which may be referred to as the lower and upper read ports, respectively). Each of the read ports 320 is configured to receive read requests specifying read addresses of one or more entries to be read. The access circuit 315 comprises, for each of the read ports 320, a multiplexer (MUX) 325 and a difference calculation circuit 330. Each difference calculation circuit 330 is configured to receive data of corresponding entries from the difference structure 310 and one of the lower or upper structures 305A/B, to calculate the value of a corresponding entry from the remaining upper or lower structure 305B/A, such as by implementing an XOR operation or other reversible function. For example, any entry in the upper structure 305B (e.g., entry [n/2]) can be determined from an XOR of the corresponding entries of the lower structure 305A and the difference structure 310 (e.g., entry [0] and entry([0] XOR [n/2])). As such, a particular read port may provide data corresponding to entries of the upper structure 305B, even if the upper structure 305B is unavailable (e.g., due to being accessed by the other read port), by combining data retrieved from the lower structure 305A and the difference structure 310. Similarly, data entries of the lower structure 305A can be determined by accessing the upper structure 305B and the difference structure 310, when the lower structure 305A is unavailable.
In some embodiments, the difference circuits 330 comprise a first difference circuit 330A configured to determine entry values for the upper structure 305B using the lower structure 305A and the difference structure 310, and a second difference circuit 330B configured to determine entry values for the lower structure 305A using the upper structure 305B and the difference structure 310. The first and second difference circuits 330A/B may be referred to as the lower and upper difference circuits respectively.
The MUXes 325 comprise a lower MUX 325A and an upper MUX 325B, each configured to select between the lower structure 305A (for when the read request requests an address from the lower half of the stored entries), the upper structure 305B (for when the read request requests an address from the upper half of the stored entries), and the output of one of the difference circuits 330A or 330B, and to provide the selected output to a respective read port 320A/B. For example, the lower read port 320A receives an output of the lower MUX 325A, which is connected to the difference circuit 330A, while the upper read port 320B receives an output of the upper MUX 325B, which is connected to the difference circuit 330B.
In some embodiments, a conflict control circuit 335 uses a priority scheme to determine how each of the read ports 320A and 320B is able to access the data entries stored by the structures 305A, 305B, and 310. The conflict control circuit 335 is configured to receive addresses from the read ports corresponding to received read requests, and performs conflict resolution between any concurrently received requests by controlling the MUXes 325A/B to select the structure from which each read port 320A/B receives data.
For example, as discussed above, the read ports 320 may be designated as a lower read port 320A and an upper read port 320B. The lower read port 320A has “priority” to the lower structure 305A. As such, the conflict control circuit 335 configures the MUX 325A such that all requests through the lower read port 320A for entries in the lower structure 305A are read directly from the lower structure 305A. Similarly, the upper read port 320B has “priority” to the upper structure 305B, such that all requests through the upper read port 320B for entries in the upper structure 305B are read directly from the upper structure 305B. In addition, the conflict control circuit 335 may configure the MUXes 325A/B such that each read port 320A/B may read directly from the lower or upper structure 305A/B to which it does not have priority whenever the other read port has not received a concurrent read request to read data from the same structure. However, if both the lower read port 320A and upper read port 320B receive concurrent requests to read one or more entries from the upper structure 305B, then the conflict control circuit 335 configures the MUX 325A such that the lower read port 320A reads from the output of the difference calculation circuit 330A instead, which determines the values of the requested entries of the upper structure 305B using the corresponding entries of the lower structure 305A and the difference structure 310. Similarly, if the lower and upper read ports 320A and 320B receive concurrent requests to read one or more entries from the lower structure 305A, the conflict control circuit 335 configures the MUX 325B to cause the upper read port 320B to read from the output of the difference calculation circuit 330B.
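The priority scheme described above can be modeled behaviorally as follows. This is a minimal software sketch, not a description of the circuit itself: the class `TwoPortTable` and its method names are hypothetical, XOR is assumed as the difference function, and both ports are assumed to issue a request in every cycle. Each result is returned with the names of the 1-port structures it touched, so that it can be checked that no structure is read twice in one cycle.

```python
class TwoPortTable:
    """Behavioral model of a 2-port table built from three 1-port tables."""

    def __init__(self, entries):
        if len(entries) % 2:
            raise ValueError("number of entries must be even")
        self.half = len(entries) // 2
        self.lower = list(entries[: self.half])
        self.upper = list(entries[self.half:])
        self.diff = [lo ^ up for lo, up in zip(self.lower, self.upper)]

    def _direct(self, is_upper, idx):
        """Read straight from the structure that holds the requested entry."""
        if is_upper:
            return self.upper[idx], ("upper",)
        return self.lower[idx], ("lower",)

    def read_pair(self, addr_a, addr_b):
        """Serve addr_a on the lower read port and addr_b on the upper read port.

        Returns ((value_a, sources_a), (value_b, sources_b)).
        """
        a_upper, a_idx = divmod(addr_a, self.half)
        b_upper, b_idx = divmod(addr_b, self.half)
        if a_upper and b_upper:
            # Conflict on the upper structure: the upper port has priority,
            # so the lower port reconstructs from lower XOR difference.
            a = (self.lower[a_idx] ^ self.diff[a_idx], ("lower", "diff"))
            b = self._direct(True, b_idx)
        elif not a_upper and not b_upper:
            # Conflict on the lower structure: the lower port has priority,
            # so the upper port reconstructs from upper XOR difference.
            a = self._direct(False, a_idx)
            b = (self.upper[b_idx] ^ self.diff[b_idx], ("upper", "diff"))
        else:
            # No conflict: both ports read their targets directly.
            a = self._direct(a_upper, a_idx)
            b = self._direct(b_upper, b_idx)
        return a, b


if __name__ == "__main__":
    table = TwoPortTable([10, 11, 12, 13, 20, 21, 22, 23])  # hypothetical data
    print(table.read_pair(5, 6))  # both requests target the upper half
```

In this model the two ports never touch the same 1-port structure in the same cycle, which is the property the conflict control circuit is described as enforcing.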
Although
2^m-Port Memory Structure
The construction of 2-port memory structures using 1-port memory structures discussed above may be extrapolated to assemble structures with additional numbers of available read ports (e.g., 2^m read ports).
For purpose of discussion, the data entries stored by the table implemented on the 4-port memory structure 400 are divided into data subsets “A”, “B”, “C”, and “D”, each corresponding to a quarter of the total data entries of the 4-port structure 400.
The first 2-port memory structure 405A comprises three 1-port memory structures comprising a lower structure 415A storing a table containing the data subset “A”, an upper structure 415B storing a table containing the data subset “B”, and a difference structure 415C storing a table indicating differences between data subsets “A” and “B” (e.g., “A⊕B”). Similarly, the second 2-port memory structure 405B comprises a lower structure 420A storing a table containing the data subset “C”, an upper structure 420B storing a table containing the data subset “D”, and a difference memory 420C storing a table indicating differences between data subsets “C” and “D” (e.g., “C⊕D”). As such, the first 2-port memory structure 405A and the second 2-port memory structure 405B may function as a 2-port lower structure and a 2-port upper structure for the 4-port memory structure 400. The third 2-port memory structure 410 functions as a 2-port difference structure between the first and second 2-port memory structures 405A and 405B, comprising a lower structure 425A storing a table indicating differences between data subsets “A” and “C” (e.g., “A⊕C”), an upper structure 425B storing a table indicating differences between data subsets “B” and “D” (e.g., “B⊕D”), and a difference structure 425C storing a table indicating differences between all four data subsets (e.g., “(A⊕C)⊕(B⊕D)”). As illustrated in
Each of the 2-port memory structures 405A, 405B, and 410 also comprises a respective access circuit 430, hereinafter referred to as sub-access circuits 430 (e.g., sub-access circuits 430A, 430B, and 430C), which may be substantially similar in structure to the access circuit 315 illustrated in
Each port of each of the three sub-access circuits 430 connects to an access circuit 435. For example, the first access circuit 435A connects to the lower read port of each sub-access circuit 430, while the second access circuit 435B connects to the upper read port of each sub-access circuit 430. Each access circuit 435A may have a structure substantially similar to the access circuit 315 of
As illustrated in
Using the construction described above, a read port of a 4-port memory structure is able to access a particular data entry (e.g., a data entry of the data subset “A”) using one of four different methods, allowing for all four read ports of the memory structure to access the data entry in parallel. Using the first method 502, a read port may access the data subset “A” through the 1-port memory structure storing the table containing the data subset “A” (e.g., structure 415A illustrated in
The techniques and construction described above can be further extrapolated to construct memory structures having 2^m read ports.
Each of the 2^(m-1) ports of the 2^(m-1)-port memory structures maps to an access circuit 615 (e.g., access circuits 615-1 through 615-2^(m-1)). For example, a first port of each of the three 2^(m-1)-port memory structures maps to a first access circuit 615-1, a second port of each of the three 2^(m-1)-port memory structures maps to a second access circuit 615-2, and so forth up to the access circuit 615-2^(m-1).
Each access circuit 615 contains two read ports 625 (e.g., a lower read port and an upper read port, such as 625-1 and 625-2 . . . 625-(2^m−1) and 625-2^m), each able to access either of the 2^(m-1)-port lower and upper structures 605A/B directly, or to determine the value of data entries of the lower or upper structure using the difference structure 610 and the opposite structure. For example, each access circuit may be configured such that its lower read port is always able to access the lower sub-table 605A directly but uses the lower sub-table 605A and the difference sub-table 610 to determine values of entries in the upper sub-table 605B when the respective upper read port needs to concurrently access the upper sub-table 605B. Similarly, the upper read port is always able to access the upper sub-table 605B, but uses the upper sub-table 605B and the difference sub-table 610 to determine values of entries in the lower sub-table 605A when the lower read port is to concurrently access the lower sub-table 605A.
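The recursive arrangement can be sketched as a behavioral model in which a 2^m-port table is assembled from three 2^(m-1)-port tables, down to 1-port memories. This is an illustrative sketch under stated assumptions rather than the disclosed circuit: the class names `OnePort` and `MultiPort` are hypothetical, XOR is assumed as the difference function, the number of entries is assumed to be divisible by 2^m, ports 2j and 2j+1 are assumed to share access circuit j, and `None` denotes an idle port.

```python
class OnePort:
    """A 1-port memory: accepts exactly one request (or None) per cycle."""

    def __init__(self, entries):
        self.data = list(entries)

    def read(self, requests):
        (addr,) = requests  # raises if more than one port is presented
        return [None if addr is None else self.data[addr]]


class MultiPort:
    """A 2^m-port table built from lower, upper, and difference sub-structures."""

    def __init__(self, entries, m):
        if m < 1 or len(entries) % 2**m:
            raise ValueError("need m >= 1 and len(entries) divisible by 2^m")
        self.ports = 2**m
        self.half = len(entries) // 2
        lo, up = entries[: self.half], entries[self.half:]
        build = OnePort if m == 1 else (lambda e: MultiPort(e, m - 1))
        self.sub = {
            "lower": build(lo),
            "upper": build(up),
            "diff": build([a ^ b for a, b in zip(lo, up)]),
        }

    def _target(self, addr):
        return "lower" if addr < self.half else "upper"

    def read(self, requests):
        """requests holds one address (or None) per read port, all served in one cycle."""
        circuits = self.ports // 2
        sub_req = {name: [None] * circuits for name in self.sub}
        recipe = [()] * self.ports  # sub-structures whose outputs each port XORs
        for j in range(circuits):
            for port, home in ((2 * j, "lower"), (2 * j + 1, "upper")):
                addr = requests[port]
                if addr is None:
                    continue
                other = requests[port ^ 1]  # sibling port on the same access circuit
                target = self._target(addr)
                contested = other is not None and self._target(other) == target
                # A port that loses a conflict rebuilds the entry from its home
                # structure and the difference structure.
                recipe[port] = (home, "diff") if contested and target != home else (target,)
                for name in recipe[port]:
                    assert sub_req[name][j] is None  # each sub-port is used at most once
                    sub_req[name][j] = addr % self.half
        outs = {name: s.read(sub_req[name]) for name, s in self.sub.items()}
        result = []
        for port in range(self.ports):
            if not recipe[port]:
                result.append(None)
                continue
            value = 0
            for name in recipe[port]:
                value ^= outs[name][port // 2]
            result.append(value)
        return result


if __name__ == "__main__":
    data = [7, 1, 4, 9, 3, 8, 2, 6]  # hypothetical table contents
    four_port = MultiPort(data, 2)
    # All four ports request the same entry in the same cycle.
    print(four_port.read([3, 3, 3, 3]))
```

Because `OnePort.read` accepts only a single request, any schedule that would require two concurrent reads of the same 1-port memory would fail in this model, which makes it a convenient check on the conflict-free property.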
Thus, as illustrated in
Although the techniques described herein primarily discuss constructing memory structures having multiple read ports using 1-port memory structures, it is understood that in other embodiments, memory structures having more than one read port (e.g., 2 read ports, 3 read ports, etc.) may be used to construct memory structures having additional read ports. For example, three memory structures each having k read ports can be used to construct a memory structure having up to 2k read ports using the configuration described above.
In addition, constructed memory structures are not necessarily limited to 2^m ports. For example, if a particular level of the memory structure contains a number of ports that is not a power of two, a subsequent level may also have a number of ports that is not a power of two (e.g., three 3-port memory structures can be used to construct a 6-port memory structure). In addition, fewer access circuits may be used at a given level, reducing the total number of available read ports. For example, referring to
Although the above examples illustrate each level of a multiple read port memory structure constructed using two sub-structures storing subsets of data (e.g., lower and upper halves) and a difference sub-structure, it is understood that in other embodiments, different numbers of sub-structures may be used. For example, in some embodiments, the data entries may be divided between three sub-structures and a difference sub-structure. Instead of the difference corresponding to an XOR operation, a different operation (such as addition mod 3) may be used. In some embodiments, an access circuit may be configured to connect and control access to a plurality of the sub-structures to more than two ports.
Writing Data
A multiple read port memory structure (e.g., the 2^m-port memory structure 600 described in
When writing data into a 2^m-port memory structure, a recursive writing process is used such that the data to be written is reflected in all levels of the structure. For example, referring to the configuration illustrated in
Additional Configuration Information
The foregoing description of the embodiments of the disclosure has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments of the disclosure may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Embodiments of the disclosure may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.
This application is a continuation of U.S. application Ser. No. 16/132,196 filed on Sep. 14, 2018, which claims a benefit, and priority, under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 62/559,390, titled “Data Structures with Multiple Read Ports,” filed on Sep. 15, 2017, which is hereby incorporated by reference in its entirety.
| Number | Name | Date | Kind |
|---|---|---|---|
| 4670856 | Nishino et al. | Jun 1987 | A |
| 5058001 | Li | Oct 1991 | A |
| 5146543 | Vassiliadis et al. | Sep 1992 | A |
| 5179702 | Spix et al. | Jan 1993 | A |
| 5333279 | Dunning | Jul 1994 | A |
| 5379440 | Kelly et al. | Jan 1995 | A |
| 5488729 | VeQesna et al. | Jan 1996 | A |
| 5541914 | Krishnamoorthy et al. | Jul 1996 | A |
| 5590083 | Pinkham et al. | Dec 1996 | A |
| 5594915 | Atalla | Jan 1997 | A |
| 5794062 | Baxter | Aug 1998 | A |
| 5796745 | Adams et al. | Aug 1998 | A |
| 5842034 | Bolstad et al. | Nov 1998 | A |
| 5889413 | Bauer | Mar 1999 | A |
| 5898881 | Miura et al. | Apr 1999 | A |
| 5958041 | Petolino, Jr. et al. | Sep 1999 | A |
| 6181164 | Miller | Jan 2001 | B1 |
| 6243808 | Wang | Jun 2001 | B1 |
| 6279057 | Westby | Aug 2001 | B1 |
| 6298162 | Sutha et al. | Oct 2001 | B1 |
| 6681316 | Clermidy et al. | Jan 2004 | B1 |
| 6712313 | Zoppitelli et al. | Mar 2004 | B2 |
| 6988181 | Saulsbury et al. | Jan 2006 | B2 |
| 7015913 | Lindholm et al. | Mar 2006 | B1 |
| 7181484 | Stribaek et al. | Feb 2007 | B2 |
| 7236995 | Hinds | Jun 2007 | B2 |
| 7272730 | Acquaviva et al. | Sep 2007 | B1 |
| 7339941 | Twomey | Mar 2008 | B2 |
| 7421559 | Yadav | Sep 2008 | B1 |
| 7640528 | Baeckler | Dec 2009 | B1 |
| 7805392 | Steele et al. | Sep 2010 | B1 |
| 7861060 | Nickolls et al. | Dec 2010 | B1 |
| 7912889 | Juffa et al. | Mar 2011 | B1 |
| 7965725 | Langevin et al. | Jun 2011 | B2 |
| 8038539 | Stamps et al. | Oct 2011 | B2 |
| 8089959 | Szymanski | Jan 2012 | B2 |
| 8250555 | Lee et al. | Aug 2012 | B1 |
| 8286172 | Chakradhar et al. | Oct 2012 | B2 |
| 8345540 | Rollins | Jan 2013 | B2 |
| 8370280 | Lin et al. | Feb 2013 | B1 |
| 8407167 | Abts et al. | Mar 2013 | B1 |
| 8583895 | Jacobs et al. | Nov 2013 | B2 |
| 8655937 | Vanderspek | Feb 2014 | B1 |
| 8689202 | Braun et al. | Apr 2014 | B1 |
| 8830993 | Dublin et al. | Sep 2014 | B1 |
| 8850262 | Cardinell et al. | Sep 2014 | B2 |
| 8989220 | Scrobohaci et al. | Mar 2015 | B2 |
| 9009660 | Griffin et al. | Apr 2015 | B1 |
| 9146747 | Moloney et al. | Sep 2015 | B2 |
| 9388862 | Lidak | Jul 2016 | B2 |
| 9432298 | Smith | Aug 2016 | B1 |
| 9442757 | Munshi et al. | Sep 2016 | B2 |
| 9535869 | Zheng | Jan 2017 | B2 |
| 9639490 | Blankenship et al. | May 2017 | B2 |
| 9672188 | Vorbach | Jun 2017 | B2 |
| 9690938 | Saxe et al. | Jun 2017 | B1 |
| 9691019 | Gulland et al. | Jun 2017 | B1 |
| 9697463 | Ross et al. | Jul 2017 | B2 |
| 9710265 | Temam et al. | Jul 2017 | B1 |
| 9710748 | Ross et al. | Jul 2017 | B2 |
| 9723317 | Hattori | Aug 2017 | B2 |
| 9805303 | Ross et al. | Oct 2017 | B2 |
| 10167800 | Chuna et al. | Jan 2019 | B1 |
| 10175980 | Temam et al. | Jan 2019 | B2 |
| 10235735 | Venkatesh et al. | Mar 2019 | B2 |
| 10320390 | Ross | Jun 2019 | B1 |
| 10489680 | Aliabadi et al. | Nov 2019 | B2 |
| 10521488 | Ross et al. | Dec 2019 | B1 |
| 10754621 | Thorson | Aug 2020 | B2 |
| 10776110 | Pearce et al. | Sep 2020 | B2 |
| 10936569 | Baskaran et al. | Mar 2021 | B1 |
| 11086623 | Valentine et al. | Aug 2021 | B2 |
| 20010051860 | Copeland et al. | Dec 2001 | A1 |
| 20020060796 | Kanno et al. | May 2002 | A1 |
| 20020103961 | Ayukawa et al. | Aug 2002 | A1 |
| 20030095547 | Schofield | May 2003 | A1 |
| 20030206527 | Yim | Nov 2003 | A1 |
| 20040078555 | Porten et al. | Apr 2004 | A1 |
| 20040150543 | Wang et al. | Aug 2004 | A1 |
| 20040215679 | Beaumont | Oct 2004 | A1 |
| 20050125594 | Mattausch et al. | Jun 2005 | A1 |
| 20050278505 | Lim et al. | Dec 2005 | A1 |
| 20060161338 | Sohn et al. | Jul 2006 | A1 |
| 20060179207 | Eisen et al. | Aug 2006 | A1 |
| 20060190519 | Stribaek et al. | Aug 2006 | A1 |
| 20060225061 | Ludwig et al. | Oct 2006 | A1 |
| 20070124732 | Lia et al. | May 2007 | A1 |
| 20080126761 | Fontenot et al. | May 2008 | A1 |
| 20080209181 | Petkov et al. | Aug 2008 | A1 |
| 20080244135 | Akesson et al. | Oct 2008 | A1 |
| 20080301354 | Bekooij | Dec 2008 | A1 |
| 20090138534 | Lee et al. | May 2009 | A1 |
| 20090150621 | Lee | Jun 2009 | A1 |
| 20110022791 | Iyer et al. | Jan 2011 | A1 |
| 20110173258 | Arimilli et al. | Jul 2011 | A1 |
| 20110273459 | Letellier et al. | Nov 2011 | A1 |
| 20110320698 | Wang | Dec 2011 | A1 |
| 20120072699 | Vorbach et al. | Mar 2012 | A1 |
| 20120127818 | Levy et al. | May 2012 | A1 |
| 20120159507 | Kwon et al. | Jun 2012 | A1 |
| 20120240185 | Kapoor et al. | Sep 2012 | A1 |
| 20120275545 | Utsunomiya et al. | Nov 2012 | A1 |
| 20120303933 | Manet et al. | Nov 2012 | A1 |
| 20120317065 | Bernstein et al. | Dec 2012 | A1 |
| 20120331197 | Campbell et al. | Dec 2012 | A1 |
| 20130010636 | Regula | Jan 2013 | A1 |
| 20130070588 | Steele et al. | Mar 2013 | A1 |
| 20130212277 | Bodik et al. | Aug 2013 | A1 |
| 20140047211 | Fleischer et al. | Feb 2014 | A1 |
| 20140115301 | Sanghai et al. | Apr 2014 | A1 |
| 20140181171 | Dourbal | Jun 2014 | A1 |
| 20140201755 | Munshi et al. | Jul 2014 | A1 |
| 20140281284 | Block et al. | Sep 2014 | A1 |
| 20150046678 | Moloney et al. | Feb 2015 | A1 |
| 20150378639 | Chien et al. | Dec 2015 | A1 |
| 20150379429 | Lee et al. | Dec 2015 | A1 |
| 20160062947 | Chetlur et al. | Mar 2016 | A1 |
| 20160246506 | Hebig | Aug 2016 | A1 |
| 20160328158 | Bromberg | Nov 2016 | A1 |
| 20160337484 | Tola | Nov 2016 | A1 |
| 20160342892 | Ross | Nov 2016 | A1 |
| 20160342893 | Ross et al. | Nov 2016 | A1 |
| 20160371093 | Chang | Dec 2016 | A1 |
| 20170032281 | Hsu | Feb 2017 | A1 |
| 20170063609 | Philip et al. | Mar 2017 | A1 |
| 20170085475 | Cheng et al. | Mar 2017 | A1 |
| 20170103316 | Ross et al. | Apr 2017 | A1 |
| 20170139677 | Lutz et al. | May 2017 | A1 |
| 20170168990 | Kernert et al. | Jun 2017 | A1 |
| 20170177352 | Quid-Ahmed-Vail | Jun 2017 | A1 |
| 20170220719 | Elrabaa et al. | Aug 2017 | A1 |
| 20170331881 | Chandramouli et al. | Nov 2017 | A1 |
| 20170347109 | Hendry et al. | Nov 2017 | A1 |
| 20170372202 | Ginsburg et al. | Dec 2017 | A1 |
| 20180046903 | Yao et al. | Feb 2018 | A1 |
| 20180046907 | Ross et al. | Feb 2018 | A1 |
| 20180075338 | Gokmen | Mar 2018 | A1 |
| 20180121196 | Temam et al. | May 2018 | A1 |
| 20180121796 | Deisher et al. | May 2018 | A1 |
| 20180145850 | Tam et al. | May 2018 | A1 |
| 20180157966 | Henry et al. | Jun 2018 | A1 |
| 20180191537 | Xiong et al. | Jul 2018 | A1 |
| 20180198730 | Cook et al. | Jul 2018 | A1 |
| 20180247190 | Chuna et al. | Aug 2018 | A1 |
| 20180267932 | Zhu | Sep 2018 | A1 |
| 20180314671 | Zhang et al. | Nov 2018 | A1 |
| 20180315157 | Quid-Ahmed-Vail et al. | Nov 2018 | A1 |
| 20180329479 | Meixner | Nov 2018 | A1 |
| 20180357019 | Karr | Dec 2018 | A1 |
| 20190089619 | Yeager et al. | Mar 2019 | A1 |
| 20190206454 | Ross et al. | Jul 2019 | A1 |
| 20190244080 | Li et al. | Aug 2019 | A1 |
| 20190303147 | Brewer | Oct 2019 | A1 |
| 20190311243 | Whatmough et al. | Oct 2019 | A1 |
| 20190370645 | Lee et al. | Dec 2019 | A1 |
| 20200117993 | Martinez-Canales et al. | Apr 2020 | A1 |
| 20200192701 | Horowitz et al. | Jun 2020 | A1 |
| 20200285605 | Nam | Sep 2020 | A1 |
| Number | Date | Country |
|---|---|---|
| 0940012 | Apr 2002 | EP |
| 3 343 463 | Jul 2018 | EP |
| 2017-062781 | Mar 2017 | JP |
| 200926033 | Jun 2009 | TW |
| 201706871 | Feb 2017 | TW |
| 201706917 | Feb 2017 | TW |
| 201732560 | Sep 2017 | TW |
| 201804320 | Feb 2018 | TW |
| 201810538 | Mar 2018 | TW |
| 2016186826 | Nov 2016 | WO |
| Entry |
|---|
| Taiwanese Intellectual Property Office, Office Action, TW Patent Application No. 108109969, dated Feb. 14, 2020, 12 pages (with concise explanation of relevance). |
| PCT International Search Report and Written Opinion. PCT Application No. PCT/US2019/022357, dated Nov. 7, 2019, 11 pages. |
| United States Office Action, U.S. Appl. No. 16/132,196, dated Dec. 8, 2020, 13 pages. |
| United States Office Action, U.S. Appl. No. 16/132,196, filed May 20, 2020, 22 pages. |
| United States Office Action, U.S. Appl. No. 16/132,196, filed Dec. 11, 2019, 20 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/132,243 dated Dec. 31, 2019, 28 bages. |
| Notice of Allowance received for U.S. Appl. No. 16/132,243 dated Jun. 22, 2021, 47 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/132,243 dated Sep. 30, 2021, 42 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/132,243 dated Dec. 15, 2021, 19 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,966 dated Feb. 8, 2021, 45 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,966 dated Jun. 21, 2021, 28 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,966 dated Oct. 15, 2021, 30 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,966 dated Jan. 5, 2022, 18 pages. |
| Communication Pursuant to Article 94(3) EPC received for European Patent Application Serial No. 19827878.0 dated May 22, 2023, 5 pages. |
| Decision to Grant received for Japanese Patent Application Serial No. 2021-527941 dated Mar. 28, 2023, 5 pages (Including English Translation). |
| Written Decision on Registration received for Korean Patent Application Serial No. KR20217012323 dated Apr. 24, 2023, 12 pages (Including English Translation). |
| Wikipedia, “Reduced instruction set computer,” Last edited Jan. 14, 2021, pp. 1-10, [Online] [Retrieved Jan. 20, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Reduced_instruction_set_computer>. |
| Wikipedia, “SIMD,” Last edited Dec. 18, 2020, pp. 1-10, [Online] [Retrieved Jan. 22, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/SIMD>. |
| Wikipedia, “Tensor,” Last edited Jan. 10, 2021, pp. 1-20, [Online] [Retrieved Jan. 15, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Tensor>. |
| Yang et al., “Fast subword permutation instructions based on butterfly network,” Proceedings of SPIE, Media Processor 2000, Jan. 27-28, 2000, pp. 80-86. |
| Office Action received for Taiwan Patent Application Serial No. 108131334 dated Jun. 30, 2022, 6 pages (Including English Translation). |
| Final Office Action received for U.S. Appl. No. 16/951,938 dated Feb. 4, 2022, 23 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/951,938 dated Aug. 17, 2021, 32 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/932,632 dated May 19, 2021, 24 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/928,958 dated Sep. 21, 2021, 19 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/928,958 dated Jul. 23, 2021, 19 pages. |
| Final Office Action received for U.S. Appl. No. 16/928,958 dated Jun. 4, 2021, 18 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/928,958 dated Apr. 12, 2021, 27 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/526,936 dated Jul. 1, 2022, 27 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/277,817 dated May 20, 2020, 37 pages. |
| Final Office Action received for U.S. Appl. No. 16/243,768 dated Apr. 26, 2021, 26 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/243,768 dated Sep. 1, 2020, 22 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/528,609 dated Jan. 4, 2023, 26 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/532,694 dated Jan. 19, 2023, 27 pages. |
| Groq, Inc. “The Challenge of Batch Size 1: Groq Adds Responsiveness to Inference Performance” White Paper, Apr. 2020, pp. 1-7. |
| Office Action received for Indian Patent Application Serial No. 202247031762 dated Sep. 20, 2022, 6 pages. |
| Lethin, R.A. et al., “How VLIW Almost Disappeared—and Then Proliferated,” IEEE Solid-State Circuits Magazine, vol. 1, No. 3, Aug. 7, 2009, pp. 15-23. |
| Mercaldi et al. “Instruction Scheduling for a Tiled Dataflow Architecture,” ACM SIGARCH Computer Architecture News, vol. 34, No. 5, Oct. 20, 2006, pp. 141-150. |
| Sotiropoulos, A. et al. “Enhancing the Performance of Tiled Loop Execution onto Clusters Using Memory Mapped Network Interfaces and Pipelined Schedules,” IPDPS, Apr. 15, 2002, pp. 1-9. |
| Southard, D. “Tensor Streaming Architecture Delivers Unmatched Performance for Compute-Intensive Workloads” Groq White Paper, Nov. 18, 2019, pp. 1-7. |
| Non Final Office Action received for U.S. Appl. No. 17/684,337 dated Feb. 14, 2023, 45 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/104,465 dated Nov. 12, 2021, 40 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/132,196 dated Apr. 30, 2021, 35 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/243,768 dated May 21, 2021, 30 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/582,895 dated Apr. 6, 2023, 32 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/951,938 dated Dec. 23, 2022, 33 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/132,102 dated Jul. 1, 2021, 26 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,916 dated Sep. 20, 2021, 28 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,922 dated Aug. 27, 2021, 25 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/526,936 dated Oct. 13, 2022, 23 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/528,609 dated Jan. 30, 2023, 27 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/117,763 dated Oct. 24, 2019, 17 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/532,694 dated Feb. 10, 2023, 27 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/932,632 dated Sep. 9, 2021, 25 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/277,817 dated Sep. 30, 2020, 34 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/928,958 dated Dec. 17, 2021, 16 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/117,763 dated Apr. 14, 2020, 17 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/117,763 dated Jun. 8, 2020, 5 pages. |
| Notice of Intent to Grant for European Patent Application Serial No. 19765954.3 dated Feb. 17, 2023, 74 pages. |
| Notice of Intent to Grant for European Patent Application No. 19765954.3 dated Oct. 17, 2022, 41 pages. |
| Communication Pursuant to Article 94(3) EPC received for European Patent Application Serial No. 19765954.3 dated Feb. 23, 2022, 8 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/105,976 dated Feb. 3, 2022, 28 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/684,337 dated Apr. 13, 2023, 50 pages. |
| Sotiropoulos et al., “Enhancing the Performance of Tiled Loop Execution onto Clusters Using Memory Mapped Network Interfaces and Pipelined Schedules,” 2002, citation 1 page. |
| Notice of Allowance received for U.S. Appl. No. 17/697,201 dated Feb. 23, 2023, 37 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/697,201 dated Mar. 7, 2023, 4 pages. |
| International Search Report and Written Opinion received for International PCT Application Serial No. PCT/US2019/062303 dated Mar. 25, 2020, 14 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/686,864 dated Jun. 1, 2021, 22 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/686,866 dated Sep. 23, 2021, 25 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/686,858 dated Jan. 25, 2022, 32 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/519,425 dated Jan. 26, 2023, 17 pages. |
| Bustamam et al. “Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA And ELLPACK-R Sparse Format,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, No. 3, Mar. 22, 2012, pp. 679-692. |
| Bouaziz et al., “Parallel Long Short-Term Memory for Multi-Stream Classification,” IEEE Spoken Language Technology Workshop, Dec. 13-16, 2016, pp. 218-223. |
| Fuchs et al., “Parallel Vectors Criteria for Unsteady Flow Vortices,” IEEE Transactions on Visualization and Computer Graphics, vol. 14, No. 3, May-Jun. 2008, pp. 615-626. |
| Gelder et al., “Using PVsolve to Analyze and Locate Positions of Parallel Vectors,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, No. 4, Jul.-Aug. 2009, pp. 682-695. |
| Gil-Cacho et al., “Nonlinear Acoustic Echo Cancellation Based On A Parallel-Cascade Kernel Affine Projection Algorithm,” IEEE International Conference on Acoustics, Speech and Signal Processing, Mar. 25-30, 2012, pp. 33-36. |
| Japan Patent Office, Office Action, Japanese Patent Application No. 2021-527941 dated Dec. 20, 2022, 11 pages (Including English Translation). |
| Request for the Submission of an Opinion received for Korean Patent Application Serial No. 10-2021-7012323 dated Aug. 29, 2022, 10 pages. |
| Rodrigues et al., “SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors,” 4th Workshop on Programming Models for SIMD/Vector Processing, Feb. 2018, pp. 1-8. |
| Suh et al., “A Performance Analysis of PIM, Stream Processing, and Tiled Processing on Memory-Intensive Signal Processing Kernels,” 30th Annual International Symposium on Computer Architecture, Jun. 2003, pp. 410-421. |
| Office Action received for Taiwan Patent Application Serial No. 108142039 dated Jan. 3, 2023, 28 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/686,870 dated May 27, 2022, 61 pages. |
| Final Office Action received for U.S. Appl. No. 16/686,858 dated Jun. 29, 2022, 23 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/519,425 dated Mar. 15, 2023, 25 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,858 dated Aug. 3, 2022, 25 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,864 dated Jul. 29, 2021, 14 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,866 dated Dec. 7, 2021, 13 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,870 dated Aug. 17, 2022, 54 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,870 dated Aug. 24, 2022, 5 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/686,870 dated Oct. 25, 2022, 5 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/203,214 dated Mar. 15, 2023, 52 pages. |
| Dey et al., “Fast Integer Multiplication Using Modular Arithmetic”, The proceedings of the 40th ACM Symposium on Theory of Computing, 2008, 7 pages. |
| Lopes et al., “A fused hybrid floating point and fixed point dot-product for FPGAs”, International symposium on Applied reconfigurable computing, ARC 2010, pp. 157-168. |
| Haidar et al., “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers”, SC18, Nov. 11-16, 2018, Dallas, USA, 12 pages. |
| Abts et al., “Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads,” 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, May 2020, pp. 145-158. |
| Chang, W. “Computer Organization,” CSC137, Sacramento State University, Spring Semester 2020, pp. 1-70. |
| De et al., “Fast Integer Multiplication Using Modular Arithmetic,” SIAM Journal on Computing, vol. 42, No. 2, Apr. 18, 2013, pp. 1-18. |
| Groq, “Groq Announces World's First Architecture Capable of 1,000,000,000,000,000 Operations per Second on a Single Chip,” Nov. 14, 2019, 3 pages, [Online] [Retrieved on Jan. 12, 2021] Retrieved from the Internet <URL: https://www.prnewswire.com/news-releases/groq-announces-worlds-first-architecture-capable-of-1-000-000-000-000-000-operations-per-second-on-a-single-chip-300958743.html>. |
| Hu et al., “On-Chip Instruction Generation for Cross-Layer CNN Accelerator on FPGA,” 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Jul. 2019, pp. 7-12. |
| Johnson, J. “Making floating point math highly efficient for AI hardware,” Nov. 8, 2018, 9 pages, [Online] [Retrieved on Jan. 20, 2021] Retrieved from the Internet <URL: https://engineering.fb.com/2018/11/08/ai-research/floating-point-math/>. |
| Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit,” ISCA '17, Jun. 2017, pp. 1-12. |
| Narksith et al., “Switch adjusting on hierarchical shuffle-exchange networks for all-to-all personalized exchange,” The 2013 10th International Joint Conference on Computer Science and Software Engineering, May 29-31, 2013, pp. 121-126. |
| International Search Report and Written Opinion received for PCT Application Serial No. PCT/US20/62241 dated Feb. 11, 2021, 7 pages. |
| International Search Report and Written Opinion received for PCT Application Serial No. PCT/US2019/048568 dated Nov. 20, 2019, 10 pages. |
| International Search Report and Written Opinion received for PCT Application Serial No. PCT/US2019/068767 dated Mar. 17, 2020, 10 pages. |
| Ren et al., “Permutation Capability of Optical Cantor Network”, IEEE, Dec. 2007, pp. 398-403. |
| Final Office Action received for U.S. Appl. No. 16/132,243 dated Aug. 10, 2020, 32 pages. |
| Non Final Office Action received for U.S. Appl. No. 16/132,243 dated Dec. 31, 2019, 18 pages. |
| Non Final Office Action received for U.S. Appl. No. 17/105,976 dated Sep. 30, 2021, 37 pages. |
| Waksman, A. “A Permutation Network,” Journal of the Association for Computing Machinery, vol. 15, No. 1, Jan. 1968, pp. 159-163. |
| Wang et al., “Hera: A Reconfigurable and Mixed-Mode Parallel Computing Engine on Platform FPGAs,” Department of Electrical and Computer Engineering, Jan. 2004, pp. 1-6. |
| Wikipedia, “Complex instruction set computer,” Last edited Dec. 27, 2020, pp. 1-4, [Online] [Retrieved Jan. 20, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Complex_instruction_set_computer>. |
| Wikipedia, “Harvard architecture,” Last edited Mar. 4, 2020, pp. 1-4, [Online] [Retrieved Jan. 20, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Harvard_architecture>. |
| Wikipedia, “Instruction pipelining,” Last edited Jan. 14, 2021, pp. 1-8, [Online] [Retrieved Jan. 8, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Instruction_pipelining>. |
| Wikipedia, “Parallel computing,” Last edited Jan. 16, 2021, pp. 1-21, [Online] [Retrieved Jan. 22, 2021] Retrieved from the Internet <URL: https://en.wikipedia.org/wiki/Parallel_computing>. |
| Notice of Allowance received for U.S. Appl. No. 17/203,214 dated Jul. 19, 2023, 50 pages. |
| Non Final Office Action received for U.S. Appl. No. 18/083,388 dated Jul. 14, 2023, 50 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/684,337 dated Jul. 3, 2023, 91 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/519,425 dated Jun. 20, 2023, 60 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/397,158 dated Aug. 23, 2023, 78 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/951,938 dated Sep. 5, 2023, 81 pages. |
| Notice of Allowance received for U.S. Appl. No. 18/083,388 dated Aug. 31, 2023, 25 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/582,895 dated Aug. 16, 2023, 40 pages. |
| Decision to Grant a Patent received for European Patent Application Serial No. 19765954.3 dated Jun. 29, 2023, 2 pages. |
| Office Action received for Taiwan Patent Application Serial No. 11220743060 dated Aug. 1, 2023, 4 pages. |
| Office Action received for Chinese Patent Application Serial No. 201880006508.9 dated Jul. 19, 2023, 7 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/203,214 dated Aug. 16, 2023, 5 pages. |
| Notice of Allowance received for U.S. Appl. No. 17/582,895 dated Oct. 4, 2023, 12 pages. |
| Notice of Allowance received for U.S. Appl. No. 16/951,938 dated Sep. 27, 2023, 102 pages. |
| Notice of Allowance received for U.S. Appl. No. 18/083,388 dated Oct. 4, 2023, 10 pages. |
| First Office Action received for Chinese Patent Application Serial No. 201980074328.9 dated Aug. 14, 2023, 6 pages (Including English Translation). |
| Number | Date | Country |
|---|---|---|
| 20220101896 A1 | Mar 2022 | US |
| Number | Date | Country |
|---|---|---|
| 62559390 | Sep 2017 | US |
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 16132196 | Sep 2018 | US |
| Child | 17397158 | | US |