The present disclosure relates generally to the field of data storage, and, more specifically, to the field of First-in First-out (FIFO) memory.
In integrated circuits designed to process data, First-in First-out (FIFO) memories are typically used to store data between processing stages. In hardware form, a FIFO primarily consists of a set of read and write pointers, storage, and control logic. The storage may be based on RAM cells, on built-in logic storage such as flip-flops or latches, or on any other suitable form of storage elements.
In multithreaded processing systems, it is important that data for each thread can be accessed independently from data for other threads, because during multithreaded processing each thread may execute at a different rate, and data may be pushed into or drained from the FIFOs at different rates. Conventionally, a FIFO memory in such a system is typically dedicated to buffering data for a single processing thread to avoid interference between threads. However, using multiple FIFOs inevitably increases die area, which runs counter to the area optimization goals of IC designs.
Another issue with FIFO designs is that conventional FIFO generators that produce synthesizable code optimize for area rather than power. RAM arrays provide the benefit of large storage capacity. However, they are usually placed on the boundary of the IC partition, typically far away from the logic that uses the storage. Thus, data transmission to and from the RAM cells consumes relatively high dynamic power through the long interconnect paths from the using logic to the RAM cells. Also, a RAM array itself may have long internal paths, further contributing to high dynamic power consumption. In contrast, flip-flops or latches are built into or near the using logic and have relatively short interconnect and internal paths. They thus provide the advantages of lower dynamic power consumption and faster speed, but are usually not used for large-volume storage because they consume large areas and are expensive.
Medium to large FIFOs usually use RAM cells to provide large capacities for worst-case storage requirements. However, it is observed that these FIFOs are often empty or near empty for much of the time, e.g., 70% of the working time.
It would be advantageous to provide a multithreaded FIFO memory that consumes reduced dynamic power as well as reduced die area. It would also be advantageous to provide a generator that is capable of producing synthesizable code for such a FIFO memory.
Accordingly, embodiments of the present disclosure provide an asymmetric FIFO memory that comprises a built-in logic storage array, e.g. latches or flip-flops, and a RAM array, and that consumes reduced dynamic power. The RAM may be located along the IC boundaries while the latch array is disposed close to the logic that uses the FIFO. Embodiments of the present disclosure advantageously include a mechanism to alternate data entry between the two constituent arrays (a built-in logic storage array and a RAM array) with maximized usage of the built-in logic storage array and without complicating the logic that uses the FIFO. Each of the constituent arrays is divided into a plurality of sections, each section responsible for buffering data for a respective thread. Thereby, data for different sections can be independently accessed, and the associated using logic may use the FIFO memory as if there were a separate FIFO for each thread.
In one embodiment of the present disclosure, a FIFO operable to buffer data for multiple threads comprises an input to receive data, a first memory block comprising a plurality of sequential logic storage units, a second memory block comprising a plurality of RAM cells, control logic, and an output configured to drain data. Each of the first and the second memory blocks is divided into a plurality of sections. Thus, each memory block comprises a section designated to buffer data for a respective thread. The control logic is configured to control the usage of the FIFO. Data for a respective thread are pushed into a corresponding section of the first memory block while that section contains vacancies. When it becomes full, data for the same thread are pushed to the corresponding section of the second memory block during a corresponding spill-over period until the corresponding section of the first memory block becomes empty; the corresponding section of the first memory block is then used for buffering subsequent incoming data for this thread. Data for this thread are drained in accordance with their entry order. The storage capacity assigned to a respective section in either memory block may be reconfigurable when the FIFO memory is empty. The first memory block may comprise a plurality of latches or flip-flops or a combination thereof. The first memory block may comprise less storage capacity than the second memory block. The first memory block may comprise one or more latch arrays, each designated to a respective thread. A section of the second memory block may comprise two spill regions. During a corresponding spill-over period, data may be pushed to the first spill region until the corresponding section of the first memory block is empty.
Upon the corresponding section of the first memory block becoming full, and when the first spill region is not empty, data are pushed to the second spill region until the corresponding section of the first memory block is empty.
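For illustration purposes only, the push-and-spill behavior summarized above may be modeled in software. The following Python sketch is an illustrative model, not the disclosed hardware; the class and method names (`ThreadSections`, `push`, `pop`) are assumptions introduced here. It shows one thread's section pair: the first (latch) section receives data while it has vacancies, the second (RAM) section receives data during a spill-over period, and data drain strictly in entry order.

```python
from collections import deque

class ThreadSections:
    """Illustrative software model of one thread's latch-array section
    plus RAM spill section in an asymmetric FIFO (names assumed)."""

    def __init__(self, latch_capacity, ram_capacity):
        self.latch = deque()      # models the latch/flip-flop section
        self.ram = deque()        # models the RAM spill section
        self.latch_cap = latch_capacity
        self.ram_cap = ram_capacity
        self.spilling = False     # True during a spill-over period
        self.order = deque()      # records entry order across both stores

    def push(self, item):
        # The latch section has priority while it has vacancies and no
        # spill-over period is in progress.
        if not self.spilling and len(self.latch) < self.latch_cap:
            self.latch.append(item)
            self.order.append('latch')
            if len(self.latch) == self.latch_cap:
                self.spilling = True      # section full: spill-over begins
        elif len(self.ram) < self.ram_cap:
            self.ram.append(item)         # spill to the RAM section
            self.order.append('ram')
        else:
            raise OverflowError("both sections full for this thread")

    def pop(self):
        # Data drain strictly in entry order, regardless of which
        # physical store holds them.
        src = self.order.popleft()
        item = (self.latch if src == 'latch' else self.ram).popleft()
        if self.spilling and not self.latch:
            self.spilling = False         # section empty: spill-over ends
        return item
```

In this model, once the latch section drains empty, new data for the thread go back to the latch section even though older spilled data still reside in the RAM section; the entry-order record keeps the drain order correct, which in hardware motivates the two-spill-region scheme described below in the disclosure.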
In another embodiment of the present disclosure, a method for buffering data for multiple execution threads using a FIFO memory comprises: a) buffering data for a first execution thread to and from a first segment of a first memory block while the first segment has vacancies; b) upon the first segment of the first memory block becoming full, buffering data for the first execution thread to a first spill region of a first portion of a second memory block during a corresponding spill-over period; c) during the corresponding spill-over period, buffering data for the first execution thread to the first spill region of the first portion of the second memory block until the first segment of the first memory block becomes empty; and d) during b) and c), draining data for the first execution thread out of the first and second memory blocks in accordance with a storage order thereof. The method may further comprise buffering data for the first execution thread to a second spill region of the second memory block during the corresponding spill-over period upon the first segment of the first memory block being full and the first spill region of the first portion being not empty.
In another embodiment of the present disclosure, a computer readable non-transient storage medium stores instructions for causing a processor to produce synthesizable code representing a FIFO memory by performing the operations of: 1) obtaining configuration input from a user; 2) generating synthesizable code representing an input operable to receive data; 3) generating synthesizable code representing a first memory block comprising a plurality of memory segments, each comprising a plurality of latches and/or flip-flops; 4) generating synthesizable code representing a second memory block comprising a plurality of RAM sections, each comprising a plurality of RAM cells; and 5) generating synthesizable code representing logic coupled with and configurable to control usage of the first memory block and the second memory block. The logic is operable to perform a method comprising: a) buffering data for a respective execution thread to and from a respective memory segment of the first memory block while the respective memory segment has vacancies; b) upon the respective memory segment of the first memory block becoming full, buffering data for the respective execution thread to and from a first spill region of a respective RAM section of the second memory block until the respective memory segment of the first memory block becomes empty; and c) draining data for the respective execution thread out of the first and second memory blocks in accordance with a storage order thereof. The logic may further buffer data for a thread to a second spill region of the RAM section upon the memory segment of the first memory block becoming full and when the first spill region is not empty. The first spill region and the second spill region may merge upon the first spill region becoming empty.
This summary contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
Embodiments of the present invention will be better understood from a reading of the following detailed description, taken in conjunction with the accompanying drawing figures in which like reference characters designate like elements and in which:
FIGS. 6a, 6b, 6c, 6d, 6e, 6f, 6g, and 6h are state diagrams illustrating a sequence of exemplary data buffering processes in an asymmetric multithreaded FIFO memory in accordance with an embodiment of the present disclosure.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. The drawings showing embodiments of the invention are semi-diagrammatic and not to scale and, particularly, some of the dimensions are for the clarity of presentation and are shown exaggerated in the drawing Figures. Similarly, although the views in the drawings for the ease of description generally show similar orientations, this depiction in the Figures is arbitrary for the most part. Generally, the invention can be operated in any orientation.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “executing” or “storing” or “rendering” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices. When a component appears in several embodiments, the use of the same reference numeral signifies that the component is the same component as illustrated in the original embodiment.
As shown in
As described further below, in accordance with embodiments of the present disclosure, a designated section of the latch array 102 can be used predominantly to buffer data for a particular thread when available to store data, and a section designated to the same thread in the RAM array 103 can be used during spill-over periods when the designated section in the latch array 102 is full. Thus, controlled by the FIFO control logic 108, data flow from a certain thread remains in the sections designated to this thread in all the associated arrays, namely 102 and 103, while the availability of sections designated for other threads may be irrelevant. The control logic 108 may make this sharing between the RAM array 103 and the latch array 102 completely transparent to the using logic 110.
Although for purposes of illustration the asymmetric multithreaded FIFO 101 is described as containing only one latch array and one RAM array, there is no limit on the number of arrays of either type that the FIFO 101 may comprise in implementing this disclosure. In some embodiments, the latch array 102 can be replaced with a flip-flop array or combined with a flip-flop array, as flip-flops share the advantage of consuming relatively low dynamic power.
Each storage array, 202 and 203, is divided into two sections, T0 section and T1 section, with each section responsible for buffering data for a respective thread. A section of the latch array 202 can be used predominantly to buffer data for a particular thread when available to store data, and a RAM section designated to the same thread can be used during spill-over periods for data from this thread when the corresponding latch array section is full.
In some embodiments, each section in the FIFO 200 may have a different storage capacity, e.g., depending on the estimated data flow for a corresponding thread. In some embodiments, the addresses allocated for each thread may be reconfigurable by a user. In some embodiments, this reconfiguration may be permitted only when both arrays, 202 and 203, are empty. In some other embodiments, the addresses allocated for each thread may be dynamically changed.
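For illustration purposes only, the embodiment that permits reconfiguration only when the arrays are empty may be sketched as a simple software guard. The function and parameter names below are assumptions introduced here, not the disclosed logic.

```python
def reconfigure_capacities(sections, new_capacities):
    """Illustrative guard (names assumed): per-thread storage capacities
    may be changed only while every section of the FIFO is empty, as in
    the embodiment that permits reconfiguration only when both arrays
    are empty."""
    if any(len(entries) > 0 for entries in sections.values()):
        raise RuntimeError("reconfiguration requires the FIFO to be empty")
    return dict(new_capacities)
```

A hardware implementation would gate the address-map update on the empty status flags of both arrays; the software guard above expresses the same precondition.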
The FIFO control logic 208 may operate to combine the latch array 202 and the RAM array 203 into an asymmetric multithreaded FIFO memory 200 which buffers the data for two threads of the using logic 210. Incoming data can be received at the input 212 sequentially, pushed and stored into appropriate sections in accordance with the method described above. The FIFO control logic 208 disclosed herein can be implemented as a circuit, an application program or a combination of both.
The stored data from a respective thread can later be read, or consumed, by that thread of the using logic 210 through the output 222 and the MUX 206 in the same order as they were pushed in. The sections designated for a respective thread have a counter separate from those of other sections. Thereby, data for each thread can be accessed independently, as if each thread had a dedicated FIFO memory.
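For illustration purposes only, the per-thread independence described above may be modeled as follows. The class and method names (`MultithreadFIFO`, `push`, `pop`) are assumptions introduced here; each thread ID maps to its own section with its own occupancy, so operations for one thread never disturb another.

```python
from collections import deque

class MultithreadFIFO:
    """Illustrative model of independent per-thread access: each thread
    ID maps to its own section with its own counter, so each thread
    behaves as if it had a dedicated FIFO (names assumed)."""

    def __init__(self, capacity_per_thread):
        self.sections = {tid: deque() for tid in capacity_per_thread}
        self.capacity = dict(capacity_per_thread)

    def push(self, tid, item):
        if len(self.sections[tid]) >= self.capacity[tid]:
            raise OverflowError("section for thread %s is full" % tid)
        self.sections[tid].append(item)

    def pop(self, tid):
        # FIFO order is maintained per thread, independently of others.
        return self.sections[tid].popleft()
```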
Internally, the FIFO control logic 208 can operate to manage and control the flow of data associated with a certain thread and allocate incoming data to a corresponding section in either the latch array 202 or the RAM array 203 based on the status of the two arrays. The control logic 208 may identify the source thread of incoming data by a thread ID associated with the data. Data for T0 remain in the T0 section 215 in the latch array 202 and the T0 section 213 in the RAM array 203. Likewise, data for T1 remain in T1 sections 216 and 214. To make efficient use of the latch array 202, which typically consumes much less dynamic power than the RAM array 203, each section of the latch array 202 is assigned higher use priority to receive data than the corresponding section in the RAM array 203.
The allocation of data among the sections inside the asymmetric FIFO 200 can be transparent to the using logic 210, as well as to any other logic that communicates with the asymmetric FIFO 200. Moreover, the using logic 210 may only see a FIFO memory 200 having one input and one output, without regard to the asymmetric FIFO's 200 internal allocation mechanism.
In some embodiments, a section in a RAM array may be configured as only one spill region. Once a section for a thread in the latch array 202 is full, subsequent incoming data from that thread are pushed into the RAM array section. For example, once the T0 section 215 in the latch array 202 is full, data from thread T0 are pushed into the T0 section 213 of the RAM array 203 until the latch array section 215 is empty again. In some embodiments, especially those that employ less complicated FIFO control logic to keep track of the ordering of data for economy, data writing operations for a particular thread are put off until the spill region in the RAM array is determined to be completely drained. Thus, in order to further maximize the usage of the latch array, two or more spill regions can be allowed in the RAM array.
In the illustrated embodiment, each section in the RAM array, 213 and 214, is further divided into two spill regions. Data from, e.g., thread T0 are pushed into the T0 section 215 in the latch array 202 until it is full. Subsequent T0 data are pushed into T0 spill region 1 in the RAM array 203 until the T0 section 215 in the latch array 202 is empty. When the section 215 is full again and spill region 1 of the T0 section is not empty, subsequent data are pushed into spill region 2 of the T0 section. Thereby, the FIFO memory, as well as the latch array, is used more frequently and efficiently.
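For illustration purposes only, the two-spill-region scheme may be modeled in software as follows. This is an illustrative sketch, not the disclosed circuit; the class and attribute names are assumptions. Region 1 receives the first spill-over; if the latch section fills again while region 1 still holds data, region 2 receives the next spill-over; once the latch section empties, region 2 is renamed region 1, as in the merge described later in the disclosure.

```python
from collections import deque

class TwoSpillSection:
    """Illustrative model of one thread's latch section with a RAM
    section split into two spill regions (names assumed)."""

    def __init__(self, latch_cap, spill_cap):
        self.latch = deque()
        self.spill1 = deque()
        self.spill2 = deque()
        self.latch_cap = latch_cap
        self.spill_cap = spill_cap    # per-region capacity (illustrative)
        self.spilling = False
        self.active = None            # region receiving the current spill-over
        self.order = deque()          # entry order across all three stores

    def push(self, item):
        if not self.spilling and len(self.latch) < self.latch_cap:
            self.latch.append(item)
            self.order.append('latch')
            if len(self.latch) == self.latch_cap:
                self.spilling = True
                # Use region 2 if region 1 is still draining.
                self.active = 's2' if self.spill1 else 's1'
        else:
            region = self.spill2 if self.active == 's2' else self.spill1
            if len(region) >= self.spill_cap:
                raise OverflowError("active spill region full")
            region.append(item)
            self.order.append(self.active)

    def pop(self):
        src = self.order.popleft()
        store = {'latch': self.latch, 's1': self.spill1, 's2': self.spill2}[src]
        item = store.popleft()
        if self.spilling and not self.latch:
            self.spilling = False
            # An empty latch section implies region 1 has drained (entry
            # order), so region 2 becomes region 1.
            if not self.spill1 and self.spill2:
                self.spill1, self.spill2 = self.spill2, self.spill1
                self.order = deque('s1' if t == 's2' else t for t in self.order)
        return item
```

Because a new latch batch can be accepted while region 1 is still draining, the latch array is reused more often than in the single-spill-region case, which is the efficiency gain the illustrated embodiment describes.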
In some other embodiments, a section of a RAM array can have two or even more spill regions. In some embodiments, the spill regions can be fixed partitions of a RAM array section, while in others they can be dynamic partitions of a section of the RAM array 203.
In some embodiments, as with the RAM array, one or more spill regions can be created in a latch array section. In the illustrated embodiments in
For purposes of practicing the embodiments of the present disclosure, there are no particular requirements on the width or the depth of each array or each section of an individual array, nor are there limitations on the number of threads associated with the multithreaded FIFO memory.
In some embodiments, the asymmetric multithreaded FIFO memory 200 can be configured to be a programmable FIFO capable of performing specific operations on the buffered data before the data are output. Further, in some embodiments, the asymmetric FIFO memory 200 can be configured as a synchronous memory.
In some embodiments, the multithreaded FIFO memory may comprise more than one set of read and write ports, with each set responsible for data ingress and egress for a separate thread or section. In this manner, data for different threads may be accessed concurrently as well as independently. For example, at a certain moment, the T0 section in the latch array can be read while the T1 section in the RAM array is being read.
An empty status may be declared for an array when either no data have been stored in it or the stored data have all been consumed by a using circuit. In some other embodiments, the latch array can be defined as empty when a certain number of entries have been consumed. A full status may be declared when every entry has been written to and not yet consumed.
Upon the Ti section of the latch array becoming empty at 512, incoming Ti data are pushed to the Ti section of the latch array until the section becomes full, as shown in 504. If the Ti sections in both arrays are full, as determined in 507 and 508, the FIFO has no vacancies and incoming data have to wait until a space is freed. In the event that the Ti section of the RAM array becomes full at 507 while the Ti section of the latch array has vacancies, a spill region 2 that is not part of an already existing spill region 1 is created in the Ti section of the latch array at 513. Incoming data are pushed into the spill region 2 of the Ti section of the latch array. As with the spill regions of the RAM array, spill region 2 of the latch section expands as data are consumed from spill region 1 of the latch array Ti section, i.e., as spill region 1 shrinks. Once spill region 2 of the latch array Ti section is full at 515, spill region 2 becomes spill region 1 at 516. At this point, since the latch array Ti section has become full, data are pushed into the Ti section of the RAM array if it has vacancies, as evaluated at 517. The foregoing steps are then repeated.
Since spill region 1 of the Ti section of the RAM array may have to drain completely before the Ti section of the latch array can drain again, due to the entry ordering of the data, once the Ti section of the latch array is determined to be empty, in some embodiments it can be determined automatically that spill region 1 of the Ti section is empty, and accordingly spill region 2 can become spill region 1. In other words, the assertion of an empty section of the latch array operates to trigger the merge of the spill regions.
In this embodiment, only one spill region in a section remains active to receive data at a time. Data can be pushed into the corresponding section of the latch array as soon as it is determined to be empty. However, in some other embodiments, more than one spill region can be active in one section at a time.
FIGS. 6a-6h are state diagrams illustrating a sequence of exemplary data buffering processes in an asymmetric multithreaded FIFO memory in accordance with an embodiment of the present disclosure. The latch array and the RAM array in
Both arrays start as empty. In
Since the T1 data stored in the latch array entered the asymmetric FIFO before those stored in the spill region at this point, the T1 section of the latch array is drained before the T1 section of the RAM array, as shown in
When T1 section of the latch array is full again,
The asymmetric FIFO memory, as well as the associated circuitry disclosed herein, can be produced automatically by a generator that emits synthesizable code in a hardware description language, such as VHDL, Verilog, or other hardware description languages known to those skilled in the art.
The generator program comprises components that are used to produce corresponding components of synthesizable code, such as a RAM generator, an input port code generator, an output port interface code generator, and a FIFO generator (not shown). When executed by a CPU, the storage code generator produces synthesizable storage code. Generally, the RAM generator code is used to synthesize or instantiate the storage resources within the asymmetric FIFO memory, e.g. flip-flops, latches, RAM, or the like. The FIFO generator provides an option to manually divide the asymmetric FIFO into a normal RAM array, for example for the lower addresses, and a sequential-circuit or latch array for the upper addresses, in accordance with the embodiments disclosed herein. The FIFO generator can lay down a RAM wrapper that looks like a normal RAM but transparently handles the data flow among the constituent storage units within the FIFO, as discussed in the embodiments of the present disclosure.
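For illustration purposes only, a generator component of the kind described above may be sketched as a program that emits a parameterized hardware-description module skeleton. The function name, port names, and module structure below are assumptions introduced here; they are not the generator output of the disclosure.

```python
def generate_fifo_wrapper(name, width, latch_depth, ram_depth, num_threads):
    """Illustrative sketch only: emits a Verilog module skeleton for an
    asymmetric multithreaded FIFO wrapper (port names assumed)."""
    # Width of the thread ID input, enough to address all threads.
    tid_bits = max(1, (num_threads - 1).bit_length())
    return f"""\
// auto-generated asymmetric multithreaded FIFO wrapper (illustrative)
// {latch_depth} latch entries + {ram_depth} RAM entries per thread
module {name} #(
  parameter WIDTH = {width}
) (
  input                   clk,
  input                   rst_n,
  input                   push,
  input  [{tid_bits - 1}:0] thread_id,
  input  [WIDTH-1:0]      data_in,
  input                   pop,
  output [WIDTH-1:0]      data_out,
  output                  empty,
  output                  full
);
  // instantiations of the latch array, RAM array, and control logic
  // would be emitted here by the corresponding sub-generators
endmodule
"""
```

The wrapper presents a single push/pop interface with a thread ID, consistent with the disclosed scheme in which the internal allocation between the latch array and the RAM array is transparent to the using logic.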
Table 1 is an exemplary synthesizable code of one multithreaded FIFO instance in accordance with an embodiment of the present disclosure.
Although certain preferred embodiments and methods have been disclosed herein, it will be apparent from the foregoing disclosure to those skilled in the art that variations and modifications of such embodiments and methods may be made without departing from the spirit and scope of the invention. It is intended that the invention shall be limited only to the extent required by the appended claims and the rules and principles of applicable law.
The present disclosure is related to: the co-pending patent application titled, “ASYMMETRIC FIFO MEMORY,” filed on Nov. 8, 2012 and Ser. No. 13/672,596; a U.S. patent titled, “MULTI-THREAD FIFO MEMORY GENERATOR,” U.S. Pat. No. 7,630,389; and a U.S. patent titled, “MULTITHREADED FIFO MEMORY WITH SPECULATIVE READ AND WRITE CAPABILITY,” U.S. Pat. No. 8,201,172. The foregoing related patent application and patents are herein incorporated by reference.