The present application claims priority of Italian Application No. TO2010A000994 filed Dec. 14, 2010, which is incorporated herein in its entirety by reference.
The disclosure relates to techniques for controlling the operation of memories.
In various embodiments, the disclosure can refer to buffer memories of a FIFO (First-In First-Out) type.
In the context of the application outlined previously, an example structure is defined as a “circular FIFO micro-architecture” according to the arrangement represented schematically in
Referring to a purposely simple model in order not to render the treatment excessively complex, a structure comprising three memory locations 101, 102, 103 may be considered, in which the input data are written with a write interface 104 under the control of an input or write pointer (write ptr) and are then read at output with a read interface 105, resorting to a read pointer (read ptr).
Hence, in the case of the structure represented in
In this solution the input data, coming from the write interface, can be guided to any of the memory locations with the write pointer that chooses the location in which to store the input data. This means that the single lane of the datapath presents a so-called “fanout” (i.e., a number of terminal ports to be driven) that is rather high, determined by the number of locations that are to be driven.
An even worse situation arises in the case of the control signal (referred to as “write enable”), which derives in a combinational way from the signals for control of interface flow (write request, write grant). When there is a write operation, said control signal must in fact drive not only the locations of the FIFO buffer, for enabling the one that has been selected by the write pointer, but also all the functional multiplexers that select each bit of the location (the number of which depends evidently upon the size of the memory location).
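Purely by way of non-limiting illustration, the circular arrangement discussed above may be sketched as a behavioral model in Python. The names `CircularFifo` and `write_enable_loads` are hypothetical and do not appear in the present description. The load count (one enable per location plus one multiplexer select per stored bit) is a rough assumption derived from the preceding paragraph, not a measured figure.

```python
class CircularFifo:
    """Behavioral sketch of a circular FIFO: the write pointer selects the location to load."""

    def __init__(self, depth=3):
        self.locs = [None] * depth   # e.g. locations 101, 102, 103
        self.wr_ptr = 0              # write pointer (write ptr)
        self.rd_ptr = 0              # read pointer (read ptr)
        self.count = 0               # data written and not yet read

    def write(self, datum):
        if self.count == len(self.locs):
            return False             # FIFO full: the write is not granted
        self.locs[self.wr_ptr] = datum
        self.wr_ptr = (self.wr_ptr + 1) % len(self.locs)
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None              # FIFO empty: nothing to read
        datum = self.locs[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % len(self.locs)
        self.count -= 1
        return datum


def write_enable_loads(depth, width):
    """Assumed load model: one enable per location plus one mux select per stored bit."""
    return depth + depth * width
```

Under this assumed model, the number of loads on the write-enable signal grows with both the number of locations and the size in bits of each location, which is the dependence discussed above.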
From a design and implementation standpoint, high fanout values imply considerable capacitive loads on the paths of the data and control signals (datapath/control), which degrades their timing properties, for example the transition time.
As a remedy to possible violations, the tools for design and synthesis of digital circuits (to which this function is normally entrusted) use a technique consisting of automatic building of a tree network with technological cells having buffer functions for generating a number of copies of the same input signal and dividing the overall fanout between them.
This solution is schematically presented in
In the implementation step, if recourse is had to the solution referred to herein, on the write signals a delay is introduced due to their propagation through the elementary buffer cells provided for solving the problem of fanout. It is, in other words, a sort of implementation delay that must be distinguished from the delay introduced by the functional logic and that is due to the cells inserted as a remedy to the possible violations of the design rules.
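The growth of this implementation delay with the fanout may be illustrated with a short estimate. The sketch below is a simplification under stated assumptions: a balanced tree, a uniform limit on the number of loads driven by any cell (including the signal source), and the hypothetical function name `buffer_tree_levels`. Actual synthesis tools apply far more detailed electrical models.

```python
def buffer_tree_levels(fanout, max_loads_per_cell):
    """Levels of buffer cells needed so that no cell drives more than max_loads_per_cell loads.

    Assumes a balanced tree in which the source is subject to the same limit.
    """
    if fanout < 1 or max_loads_per_cell < 2:
        raise ValueError("fanout must be >= 1 and max_loads_per_cell >= 2")
    levels = 0
    reachable = max_loads_per_cell          # loads the source can drive directly
    while reachable < fanout:
        reachable *= max_loads_per_cell     # each added level multiplies the reach
        levels += 1
    return levels
```

Each additional level contributes one buffer-cell delay to the path of the signal, so that under these assumptions the delay grows roughly with the logarithm of the fanout.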
For example, the aforesaid single write location can be the location “0”, designated by 101 in the figures, even though as single write location it would be possible to choose a different location.
As may be appreciated from a direct comparison of
The structure of
Accordingly, it is envisaged to update the read pointer thus selecting (according to the FIFO logic) the “oldest” data that have not yet been read. This operating mode enables a correct propagation of the data to the reading side, without any loss of information.
In brief, the solution to which
writing the data D1, D2 at input to the memory in a single write location (the location 101, in the example considered herein) from among the plurality of locations 101, 102, 103 of the memory; and
making the (single) write location 101 available for writing an input datum with a shift of the datum previously written in said single write location 101 to another of the locations 101, 102, 103 of the memory.
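By way of non-limiting illustration, the two steps listed above may be sketched as follows in Python. The class name `ShiftOnWriteFifo` is hypothetical, and the read pointer is modeled, as an assumption, as the index of the oldest datum not yet read.

```python
class ShiftOnWriteFifo:
    """Sketch of a single-write-location FIFO in which each write also shifts the stored data."""

    def __init__(self, depth=3):
        self.locs = [None] * depth   # locs[0] plays the role of the write location 101
        self.count = 0               # data written and not yet read

    def write(self, datum):
        if self.count == len(self.locs):
            return False             # FIFO full: the write is not granted
        # In the same cycle, the write enable drives every location:
        # the stored data shift up by one position and the new datum is loaded.
        for i in range(len(self.locs) - 1, 0, -1):
            self.locs[i] = self.locs[i - 1]
        self.locs[0] = datum
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None
        datum = self.locs[self.count - 1]   # read pointer: oldest unread datum
        self.count -= 1
        return datum
```

In this sketch the shift and the load are both triggered by the write itself, so the write-enable signal still reaches every location of the memory.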
The corresponding implementation, schematically illustrated in
The solution of
Once again, at the level of design and synthesis of the circuit, it is necessary to insert a tree structure of the buffer cells as a remedy to the violations of fanout, which has an adverse impact on the timing performance of the corresponding micro-architecture as regards the control part of the write interface.
This situation is highlighted in
A solution such as the one illustrated in
From what has been outlined previously, it emerges that FIFO buffer structures of the type described above, with flow-control capacities, do not operate in an ideal way, above all when the critical aspects of timing are considered. These aspects do not normally appear evident at an architectural level, when the allocation and sizing of the memory locations are defined according to the design targets. They emerge, instead, during the implementation step, when design-dependent variables (fanout, capacitive load, transition time, propagation delay) are taken into account. As a result, the dimensions of the FIFO buffer (expressed in terms of number of locations and size in bits of the individual location) can produce misalignments between the timing performance achieved in practice and the performance expected in the design stage.
The object of the invention is to provide a solution that will be able to overcome said drawbacks.
According to the invention, said object is achieved thanks to a method having the characteristics recalled in a specific way in the ensuing claims. The invention also regards a corresponding system, as well as a computer program product (which can be used, for example, to provide a model for purposes of evaluation of the behavior and of validation in terms of performance) that can be loaded into the memory of at least one computer and comprises portions of software code that are able to implement the steps of the method when the product is run on at least one computer. As used herein, reference to such a computer program product is understood as being equivalent to reference to a computer-readable means containing instructions for control of the processing system for co-ordinating implementation of the method according to the invention. The reference to “at least one computer” is evidently meant to highlight the possibility of the present invention being implemented in a modular and/or distributed form.
The claims form an integral part of the technical teaching provided herein in relation to the invention.
In various embodiments, the disclosure enables a fast write operation of a FIFO buffer to be obtained in a way independent of the number of the locations comprised in the FIFO structure, which enables the desired performance to be achieved irrespective of the dimensions of the component and of the applicational requirements, without any conditioning as regards the timing performance of the component, as may occur in solutions such as the ones described previously.
In various embodiments, it is possible to reduce the impact on the timing performance at the level of instantiation of FIFO buffers, for example, at the interfaces of blocks or of so-called IPs (Intellectual Property), for example ones of a synchronous type and at high frequency, which exchange data with one another on the basis of flow-control protocols.
In various embodiments, this can apply in contexts in which a pure function of pipelining of the data is not sufficient to satisfy the requirements of maximum traffic bandwidth.
In various embodiments, it is possible to solve the problem of the dependence of the timing performance upon the number and the size of the locations of the FIFO structure, which are parameters that are liable both to affect the fanout of the interface-control signals, with the risk of adding capacitive loads, and to give rise to violations at the level of transition times.
In various embodiments, it is possible to eliminate the aforesaid dependence upon the number of the locations of the FIFO structure by modifying the modalities of control of the shift operations in fast-write structures of a current type.
In various embodiments, it is possible to eliminate the dependence of a combinational type between the signals that govern the shifts and the signals that govern the write operation.
In various embodiments, this result can be achieved by eliminating the effect described previously as regards the buffer networks, causing timing of the FIFO buffers not to change with the number of the locations.
In various embodiments, it is thus possible to choose the optimal dimensions of the FIFO buffers according to the applicational requirements without this entailing timing problems, with the same area occupation or even with a lower area occupation.
Various embodiments can be applied in all contexts that use memory components (such as buffers) with a configurable FIFO structure and embedded flow-control management, having write interfaces that handle control signals and data under time constraints that are very stringent with respect to the reference timing cycle.
Various embodiments are suited for being used, for example, as the input stage of synchronous interfaces of IPs or components, within so-called Systems on Chip (SoCs) or within application-specific integrated circuits (ASICs), ensuring a functional uncoupling in addition to the improvement in terms of timing performance. This enables an easier and more efficient integration, without thereby limiting the IP or the fastest component.
Various embodiments are suited for being used as element of a complex re-timing stage in components of the pipeline type where there are required mechanisms of flow control in order to guarantee the performance in terms of traffic over the entire passband, cases in which the simple re-timing stages prove inadequate.
Various embodiments are suited for being used as interconnect IPs for the implementation at the level of Network on Chip (NoC) and of conversion components for protocol or bus-size conversions.
In brief, various embodiments can afford, at least in part, the following advantages:
elimination of the dependence of a combinational type between the control signals of a write-enable type generated in write interfaces with critical characteristics in terms of timing and the actions of updating of the locations of the FIFO buffer (said updating actions, in the solutions discussed in the introductory part of the present description, being performed via an enable function of a selective type or else via a shift);
possibility of the tree networks of buffer cells envisaged for the write-enable signal in order to remedy the violations at the fanout level being controlled with tools for design/synthesis of the circuits and depending only upon the size of the respective location of the FIFO buffer (i.e., upon the number of bits of each location);
timing performance on the data and control signals of the write interfaces rendered independent of the number of locations of the FIFO buffer;
easier integration of the IPs and of the components, which is due both to timing performance rendered less critical and to the fact that changes in the configuration of the dimensions of the FIFO buffer do not have any effect at the timing level; and
possibility of reducing the requirements in terms of optimization at the level of synthesis of the circuit and of back-end tools in order to adapt the timing constraints on the interfaces where these buffers are used, a fact that results in benefits at the level of area occupation and of management of the connections from the design standpoint, in particular as regards the runtime of the corresponding tools.
The basic principles that underlie various embodiments can be summarized as follows:
identification of a way for eliminating the combinational functional dependence between the operations of write and shift in fast-write FIFO micro-architectures (thus providing an intra-cycle uncoupling between the write and shift operations);
recognition that the shift operation on the locations of the FIFO buffer, which aims at preventing overwriting of the data previously written in the write location, need not be performed only when a new write operation takes place (i.e., that there is no need to adopt the intra-cycle combinational approach “shift when overwrite”);
each write operation on the FIFO structure at a given clock cycle (e.g., N) scheduling a shift operation on the next clock cycle (e.g., N+1) without waiting for the subsequent write operation (hence adopting an inter-cycle sequential approach of the “write then shift” type).
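The “write then shift” principle summarized above may be sketched, purely by way of non-limiting example, as a single clock-cycle update in Python. The function name `clock_cycle` and the flag `shift_scheduled` are hypothetical. The sketch assumes that location 0 is the write location, that the FIFO never fills up, and that a scheduled shift and a new write may take effect on the same clock edge (register semantics). The read side is omitted.

```python
def clock_cycle(locs, shift_scheduled, wr_en, din=None):
    """Return (new_locs, shift_scheduled_next) after one clock cycle.

    Assumptions: locs[0] is the single write location; the FIFO is never full.
    """
    new_locs = list(locs)
    if shift_scheduled:
        # Shift scheduled by the write of the previous cycle: each datum is
        # copied into the next location (the source location is not cleared).
        for i in range(len(locs) - 1, 0, -1):
            new_locs[i] = locs[i - 1]
    if wr_en:
        new_locs[0] = din            # the write touches only the write location
    return new_locs, wr_en           # a write at cycle N schedules a shift at N+1
```

Starting from an empty three-location structure, a write of D1 followed by an idle cycle leaves D1 duplicated in the first two locations, so that a subsequent write of D2 overwrites only the duplicate held in the write location.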
The invention will now be described, purely by way of non-limiting example, with reference to the annexed drawings, in which:
In the ensuing description, various specific details are illustrated aimed at an in-depth understanding of the embodiments. The embodiments may be provided without one or more of the specific details, or with other methods, components, materials, etc. In other cases, known structures, materials, or operations are not shown or described in detail to prevent the various aspects of the embodiments from being obscured.
Reference to “an embodiment” or “one embodiment” in the framework of the present description is meant to indicate that a particular configuration, structure, or characteristic described in relation to the embodiment is comprised in at least one embodiment. Hence, phrases such as “in an embodiment” or “in one embodiment” that may be present in various points of this description do not necessarily refer to one and the same embodiment. In addition, particular conformations, structures, or characteristics can be combined in any adequate way in one or more embodiments.
The references used herein are only provided for convenience and hence do not define the sphere of protection or the scope of the embodiments.
In
Once again, reference will be made herein by way of non-limiting example, to write operations that are performed (only) in the first location (location “0”) designated by 101. From this fact it follows that the operations of shift which will be described more fully in what follows occur in the form of shift-up operations.
Of course, the choice of the location “0” as single write location and of the shift-up operation is provided purely by way of example in no way limiting the scope of the description. Various embodiments could in fact use as write location a location different from the location “0” designated by 101, and the operations of shift described could assume the nature of both shift-up and shift-down operations. The fact of referring to the location “0” and to shift-up operations presents the advantage of simplifying the illustration of embodiments provided by way of non-limiting example.
For the same considerations set forth above, these choices provided by way of example present the advantage of driving the reading side through a controlled read pointer (read ptr) of a simple type structured so as to choose for reading the oldest data of the FIFO structure that have not yet been read.
Consistently with the scheme set out previously, also the representation provided by way of example in
Once again, it is recalled that said representation is provided purely by way of example, in so far as loading of the data could also occur in a different location, with the consequence that the operations of shift could occur both upwards and downwards.
The representation of
As has already been said, in solutions corresponding to those of
Instead, in various embodiments such as the ones considered here (on the left in
This procedure can be viewed as a sort of look-ahead approach to prevent any possible overwriting of the memory locations, based, however, on the observation that there is no need to wait for the condition of overwriting before freeing the write location.
In particular, the comparison of
In particular, parts-a) and -b) of
Part-b) of
In this way, in the example of embodiment considered here, the location 101 (where, once again by way of example, the write operations are assumed to occur) is available at the next cycle N+3 (part-c) of
In the example here considered, the shift operation represented on the left in part-c) of
Even though the location 101 is not in itself cleared, the next operation of writing of the datum D2 in the memory location 101 can occur freely, without any risk of loss of information, in so far as the datum D2 overwrites a datum that is no longer of interest: the same datum has been shifted into the location 102 by duplicating it, i.e., by copying it in said location.
From part-d) of
Part-e) on the left in
Once again it is assumed by way of example that the memory location 101 can be made available for writing a new datum without the need for it being cleared, it being possible simply to store in the memory location 101 an old datum that has been by now duplicated and consequently can be overwritten by a new input datum without loss of information.
It will be appreciated, instead, that in the solutions operating according to the criteria illustrated with reference to
In brief, in the solutions according to
In various embodiments, this control mode on the write side can be co-ordinated with the read operations so as to maintain consistency of the data available at output.
In various embodiments, it is hence possible to eliminate the combinational dependence between write and shift operations.
A result of the foregoing is that, in various embodiments, the write-enable control signal arriving on the write interface does not directly drive updating of all the locations of the buffer, but only of the location 101 requested for writing, in so far as the shift operation can be controlled by a finite-state machine 108 associated with the FIFO structure, to which this task is entrusted.
This characteristic is better highlighted in
Whereas the buffer network 104a of
In this regard, it is once again possible to compare the solution of
The scheme of
At the same time, the network 104a that manages the shift operations can be reduced in effect to a pair of cells subjected to control of a finite-state machine 108, which, on the one hand, schedules the shift and, on the other, updates the control state. In this way, at the level of design and synthesis tools, in order to take into account the violations at the transition level, it is possible to adopt a smaller buffer-tree structure, which reduces the delay of propagation that affects the control signal.
In various embodiments, the finite-state machine 108 can drive the locations 102, 103 different from the write location 101 thus managing their updating via shift operations, uncoupling the cycle from the write operation and hence from the corresponding interface.
In fact, the corresponding fanout is loaded on the shift signal controlled by the finite-state machine 108. The tree network, albeit still present, has the entire clock cycle available for compensating the delay of propagation that it introduces; in other words, the shift signal controlled by the finite-state machine 108 does not present critical aspects of timing as in the case of the interface signal.
In this way, the control of writing is rendered independent of the number of locations present in the FIFO structure (once again it is recalled that the reference to three locations 101, 102, 103 is an example that is extremely simplified for clarity of illustration) ensuring that the performance in terms of timing will be maintained in a practically identical way also in the presence of different FIFO configurations.
In various embodiments, the finite-state machine 108 can undertake also control of the level of filling of the FIFO structure.
When the last write operation possible is executed (which is considered as being the last possible since there remains available for writing only the location in which the write operation is normally carried out—hence only the location 101, in the example here considered—in so far as all the other locations contain valid data that have not yet been read), the finite-state machine 108 would in fact schedule a shift operation that cannot take place in the next cycle in so far as the FIFO structure is full.
In various embodiments, the finite-state machine 108 can be thus configured in such a way as to have, with respect to a homologous finite-state machine that could be used in solutions of the same type as the ones represented in
In said conditions, as soon as there is a reading operation (the only admissible operation when the FIFO structure is completely full), the finite-state machine 108 is able to update the locations and the read pointer accordingly, taking into account the fact that there was a shift operation pending starting from the previous write operation.
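The handling of a pending shift when the FIFO structure is full may be sketched, by way of non-limiting example, with the cycle-based Python model below. The class name `WriteThenShiftFifo`, the `pending` flag and the `write_grant` property are hypothetical. The model assumes that location 0 is the write location, that shift, write and read may take effect on the same clock edge, and that the read pointer is derived from the number of unread data.

```python
class WriteThenShiftFifo:
    """Cycle-based sketch: a write at cycle N schedules a shift at cycle N+1."""

    def __init__(self, depth=3):
        self.locs = [None] * depth
        self.count = 0          # data written and not yet read
        self.pending = False    # a shift is scheduled and not yet executed

    @property
    def write_grant(self):
        return self.count < len(self.locs)

    def cycle(self, wr_en=False, din=None, rd_en=False):
        n = len(self.locs)
        do_write = wr_en and self.count < n
        do_read = rd_en and self.count > 0
        # Read pointer: index of the oldest datum not yet read.
        rd_ptr = self.count - 1 if self.pending else self.count
        dout = self.locs[rd_ptr] if do_read else None
        # With the FIFO full, the scheduled shift waits for a read.
        do_shift = self.pending and (self.count < n or do_read)
        if do_shift:
            for i in range(n - 1, 0, -1):
                self.locs[i] = self.locs[i - 1]
        if do_write:
            self.locs[0] = din
        self.count += int(do_write) - int(do_read)
        if do_write:
            self.pending = True
        elif do_shift:
            self.pending = False
        return dout
```

In this sketch, after three consecutive writes into a three-location structure the shift remains pending and the write grant is withdrawn. The first read returns the oldest datum and, on the same cycle, lets the pending shift take place.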
In various embodiments, this mechanism can require only an additional bit to cause the finite-state machine 108 to be able to support a three-state implementation (the finite-state machines that can be used in solutions such as that of
The scheme of
These states are basically three, i.e., an inactivity or idle state 200, an active state 202, which governs actuation of the shift, and a wait state 204, which takes into account, in the presence of a FIFO structure completely full of data, as described previously, the fact that following upon the last possible writing operation there is a shift pending.
The write-enable signal (designated in the legend of
The read-enable signal (rd_en) determines the transition from the shift-pending state 204 to the idle state 200, updating the state when there is a reading operation with the FIFO structure full, thus enabling the shift.
All three states of the finite-state machine 108 control updating of the read pointer so as to maintain the consistency between the data written and the data read, with passage, according to the applicable condition, to the states 200, 202 or 204 when there are read events and to the state 202 when there are write events.
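By way of non-limiting illustration, one possible reading of the three-state behavior described above may be sketched as a next-state function in Python. The names `State` and `next_state`, and the inputs `count` (data not yet read) and `depth` (number of locations), are hypothetical. The choice of entering the wait state directly when a write fills the FIFO structure is an assumption made for this sketch and is not stated in these terms in the present description.

```python
from enum import Enum


class State(Enum):
    IDLE = 200            # no shift scheduled
    SHIFT = 202           # a shift is executed in the current cycle
    SHIFT_PENDING = 204   # FIFO full: the scheduled shift waits for a read


def next_state(state, wr_en, rd_en, count, depth):
    """Return (next_state, do_shift, new_count) for one clock cycle."""
    do_write = wr_en and count < depth
    do_read = rd_en and count > 0
    # The shift runs in state 202, or in state 204 as soon as a read occurs.
    do_shift = state is State.SHIFT or (state is State.SHIFT_PENDING and do_read)
    new_count = count + int(do_write) - int(do_read)
    if do_write:
        # Assumption: a write that fills the FIFO leaves its shift pending.
        nxt = State.SHIFT_PENDING if new_count == depth else State.SHIFT
    elif state is State.SHIFT_PENDING and not do_read:
        nxt = State.SHIFT_PENDING
    else:
        nxt = State.IDLE
    return nxt, do_shift, new_count
```

For example, in this sketch a read event in the state 204 with a full three-location structure returns the machine to the state 200 and enables the shift in the same cycle.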
In various embodiments, the finite-state machine 108 can also perform the task of controlling generation of the grants to the write interface, enabling the operations of writing according to the state of the FIFO structure.
The block diagram of
The block diagram of
The first entity 110 supervises management of the grants to the subsequent writing operation, intervening accordingly, for example, on a flip-flop 114 that generates at output the write-grant signal proper.
The entity 112 supervises management of the read pointers, intervening on a set of logic elements such as, for example, flip-flops 116, which generate the signal corresponding to the read pointer (read ptr), which is also sent back to the finite-state machine 108 for the reasons illustrated previously. The logic 112 moreover supervises, for example via another flip-flop 118, generation of the read requests.
Tests conducted by the present applicant show, for example, that in the case of operation at a rate of 400 MHz in a context such as the one schematically represented in the figures of the annexed drawings, the worst violation in terms of timing (the so-called Worst Negative Slack, WNS) is, in various embodiments, altogether independent of the FIFO configuration.
Instead, in solutions such as those of
In particular, it may be noted that, by doubling the number of locations of the FIFO in solutions such as those represented in
In various embodiments the improvement of the timing performance has a positive effect also from the standpoint of area occupation, since various embodiments require tree networks of buffer cells that are far less critical for driving the control signals with high fanout, hence with a reduced recourse to solutions for compensation and optimization of the timings, such as logic parallelism and duplication.
Of course, without prejudice to the principle of the invention, the details of implementation and the embodiments may vary, even significantly, with respect to what is described and illustrated herein purely by way of example, without thereby departing from the scope of the present invention, as defined in the ensuing claims.
Number | Date | Country | Kind |
---|---|---|---|
TO10A0994 | Dec 2010 | IT | national |
Number | Date | Country | |
---|---|---|---|
20120311276 A1 | Dec 2012 | US |