The invention relates to the technical field of electronic design automation (EDA), and in particular to a method and a system for emulating an IC design with an FPGA, and a storage medium.
Integrated circuit design (IC design) is a design flow that targets an IC or a very-large-scale IC. IC design includes application-specific integrated circuit (ASIC) design, an ASIC being an IC designed and manufactured according to the requirements of specific users and the needs of specific electronic systems. After the IC design is completed, it enters the tape-out stage. Tape-out is very costly, and a single problem can cause the tape-out to fail. To reduce this risk, the software and hardware in the IC must be fully verified before tape-out, so that problems that are difficult to find during the design process are discovered in time and the design is adjusted accordingly. Common verification approaches include simulation and emulation. Emulation refers to mapping the hardware actually used in the user's chip design onto a hardware platform and running it there. Emulation is mostly based on field programmable gate array (FPGA) chips. An FPGA chip comprises components such as lookup tables (LUTs) and registers, and different functions can be realized by configuring the corresponding components through software.
Because the physical structures of an IC and an FPGA differ, with an IC design based on a standard cell library and an FPGA based on macro-cell modules (lookup tables) provided by the manufacturer, an IC design must undergo a certain conversion before it can be ported to the FPGA for verification. The biggest difference between the cores of an IC and an FPGA chip is the clock structure. The clock structure in an IC includes at least one clock tree, and each clock tree has a clock tree structure composed of a primary clock and a plurality of generated clocks. In an IC design before wiring design, circuit delay causes glitches in the generated clocks produced by combinational logic operations. If a generated clock with glitches is connected to the clock input of a sequential cell, the sequential cell, whose clock input is sensitive to glitches, is mistakenly triggered to sample the signal at its data input, so that the data sample output at its data output is erroneous. In IC design, the glitch problem can be solved by strict wiring design, which controls the delay of the input signals of the combinational logic and thereby suppresses glitches. However, if the IC design is applied directly on an FPGA, which is a pre-wired semi-custom circuit, glitches cannot be controlled by adjusting wiring length to control delay, and the erroneous data samples that glitches cause at the outputs of sequential cells cannot be avoided.
It is an object of the claimed invention to overcome some or all of these shortcomings.
For this, according to the first aspect, the present invention provides a method for emulating an IC design with an FPGA, comprising the steps of:
Wherein, S2.1 includes converting a sequential cell in the plurality of sequential cells to a node, including: i) identifying on the sequential cell: a user clock by which the sequential cell is driven; a data input; a user-clock input and a data output, wherein the user clock is connected to the user-clock input; ii) converting the sequential cell into a node, including: defining a logic path having a pair of endpoints, wherein: the user-clock input constitutes one of the pair of endpoints; and the data output constitutes the other one of the pair of endpoints; defining a node on the logic path between the pair of endpoints; and defining another logic path and defining a node on the other logic path between its pair of endpoints.
Wherein, S3.1 includes 3.1.1) labeling on the directed graph at least a node as group A and at least another node as group B; 3.1.2) modifying the nodes that are labeled as group A; and 3.1.3) modifying the nodes that are labeled as group B.
Further, S3.1.2 includes i) configuring the clock model[CA], which includes a data input[CA], a data output[CA], a user enable[CA] and a clock input[CA].
Wherein, S3.2.2 includes i) configuring a phase shift for the primary clock′ to which a node being labeled as group B is connected, including: i.1) identifying on the path a set of nodes that are labeled as group B, which includes a node[#M] and a node[#N]; and i.2) letting F(x−tM)=F(x−tN), wherein: tM>tN; 1≤N<M≤K; F(x−tM) represents a waveform of the primary clock′ to which the node[#M] is connected; tM is a phase shift; F(x−tN) represents a waveform of the primary clock′ to which the node[#N] is connected; and tN is a phase shift; ii) configuring a phase shift for the primary clock′ to which a node being labeled as group A is connected, including: ii.1) identifying on the path a set of nodes that are labeled as group A, which includes a node[#U]; ii.2) when 1<U<K: identifying a nearest set of nodes to the node[#U] that are labeled as group B, which includes a node[#R] and a node[#S], wherein: R<U<S; and letting exactly one of the two propositions be true: F(x−tU)=F(x−tR), wherein tU≥tR; and F(x−tU)=F(x−tS), wherein tS≥tU; ii.3) when U=1: identifying a nearest set of nodes to the node[#1] that are labeled as group B, which includes a node[#S]; and letting F(x−tU)=F(x−tS), wherein tS≥tU; and ii.4) when U=K: identifying a nearest set of nodes to the node[#K] that are labeled as group B, which includes node[#R]; and letting F(x−tU)=F(x−tR), wherein tU≥tR, wherein: F(x−tR) represents a waveform of the primary clock′ to which the node[#R] is connected; tR indicates the waveform's phase; F(x−tU) represents a waveform of the primary clock′ to which the node[#U] is connected; tU indicates the waveform's phase; F(x−tS) represents a waveform of the primary clock′ to which the node[#S] is connected; and tS indicates the waveform's phase.
According to the second aspect, the present invention provides a system for emulating an IC design with an FPGA, comprising a processor and a computer-readable storage medium in communication with the processor, wherein: the system implements the method in claim 1 when the processor executes a program in the computer-readable storage medium.
According to the third aspect, the present invention provides a non-transitory computer-readable storage medium in which at least one instruction or at least one program is stored, wherein: the at least one instruction or the at least one program is loadable and executable by a processor to implement the method for emulating an IC design with an FPGA.
The present invention has at least the following beneficial effects:
The present invention provides a method for emulating an IC design with an FPGA, comprising the steps of: extracting a directed graph by identifying logic paths of the sequential cells in an IC design, and substituting clock models for the sequential cells corresponding to nodes in the directed graph, wherein, compared with the ports of a sequential cell in the IC design, a user enable is added to the external ports of a clock model; by means of a user clock connected to the user enable of a clock model and a primary clock′ connected to the clock input of the clock model, the clock model[CA] reproduces the function of the sequential cell in the IC design by controlling the outputting time of a data sample, and the clock model[CB] reproduces the function of the sequential cell in the IC design by controlling the data sampling time; and configuring phase shifts for the primary clocks′ by which the clock model[CA] and the clock model[CB] are driven, to ensure normal operation of the circuit. The primary clocks′ are directly connected to the clock inputs of the clock model[CA] and the clock model[CB] and contain no glitch. The user clock is connected to the user enables of the clock model[CA] and the clock model[CB], and the user enable is insensitive to glitches in the user clock, so that the invention solves the technical problem of sampling error caused by glitches in the user clock while preserving the functions of the original sequential cells.
Other advantages, objectives and features of the present invention are in non-limiting description in at least one specific example for describing a method and a system for emulating an IC design with an FPGA and a storage medium and according to the figures, wherein:
In an IC design before wiring design, glitches are present in the generated clocks generated by combinatorial logic operations due to circuit delay.
To better understand where glitches come from, an AND gate circuit with two inputs is taken as an example for description, referring to
In order to solve the sampling error caused by feeding a glitch-containing generated clock to the clock input of a sequential cell, while still allowing the sequential cells of the IC design to be applied on an FPGA, the sequential cells in the IC design are extracted, and a logic path is then extracted from each sequential cell, the path from the clock input to the data output of a sequential cell being taken as a logic path. A node is defined on the logic path. The flow direction from the data output of the current node to the clock input of the next node is taken as a directed edge, and a directed graph is obtained by connecting the nodes with such edges. The directed graph includes a root node, a plurality of leaf nodes, and a plurality of intermediate nodes between the root node and the plurality of leaf nodes. At least one path is present between the root node and each leaf node. The root node in the directed graph corresponds to the root node in a clock tree. The user clock by which the root node is driven is a primary clock, and the user clocks by which the other nodes are driven are generated clocks. The logic paths in all sequential cells of a clock tree in an IC design, and the relationships among those logic paths, are characterized by the nodes in the directed graph and the connections among the nodes.
After the directed graph is extracted, the sequential cell represented by each node in the directed graph needs to be substituted with a clock model in the FPGA having an equivalent logic function, the clock models including clock models[CA] and clock models[CB]. In order to solve the glitch problem, a new primary clock′ is introduced. The primary clock′ is connected directly to the clock input of the clock model[CA] or the clock model[CB]. There is no combinational logic between the primary clock′ and that clock input, so no delay is introduced and there is no glitch at the clock inputs of the clock model[CA] and the clock model[CB], which solves the glitch problem. The user clock is connected to the clock model[CA] or the clock model[CB] as a user enable. The user clock at the user enable needs to meet the setup time and hold time requirements, wherein the setup time is the time for which the data must remain stable before an active edge of the primary clock′ arrives, and the hold time is the time for which the data must remain stable after the active edge of the primary clock′ arrives. A glitch generally lasts only a short time, cannot meet the setup time and hold time requirements, and is therefore shielded, so that the glitch is no longer harmful and does not affect data sampling; the user enable is insensitive to glitches. The clock model[CA] and the clock model[CB] are driven by the primary clock′ in coordination with the user clock for data sampling, so as to reproduce the function of the sequential cells in the IC design. The invention provides two types of clock models.
The clock model[CA] emulates the function of sequential cells in an IC design by controlling a user clock to drive the clock model[CA] to output a data sample from a data output, and the clock model[CB] emulates the function of sequential cells in an IC design by controlling a user clock to sample and output data at a data input in cooperation with the primary clock′. Finally, an appropriate phase shift is configured to the primary clock′ such that the modified circuit can be in normal operation.
Based on this, the invention provides the following three examples. In the scheme provided in example 1, at least one sequential cell is substituted with a clock model[CA] and at least another sequential cell is substituted with a clock model[CB]. In the scheme provided in example 2, all sequential cells are substituted with clock models[CA]. In the scheme provided in example 3, all sequential cells are substituted with clock models[CB].
Referring to
Preferably, the primary clock is a clock input source of an IC design, and usually represents a physical clock.
Preferably, the subcircuit is a combinational logic circuit, a sequential logic circuit, or a hybrid circuit of a combinational logic circuit and a sequential logic circuit. As an example, the subcircuit can be a frequency divider, a gated clock or a multiplexer, wherein the frequency divider is a sequential logic circuit, and the gated clock and the multiplexer are combinational logic circuits.
Wherein, the generated clocks are derived from the primary clock or from a previous-level generated clock after being processed by the subcircuit. Different generated clocks are derived from the primary clock through different subcircuits, next-level generated clocks can in turn be derived from those generated clocks through further subcircuits, and so on, so that the primary clock and the topology of the generated clocks form a clock tree structure. The generated clocks are connected to the clock inputs of a plurality of sequential cells to drive the sequential cells for data sampling.
Wherein, the sequential cell comprises a user-clock input, a data input and a data output. When an active edge is occurring on a user clock connected to the user-clock input, the sequential cell is triggered to sample data signal from the data input, and data sample is output by the data output. The active edge of the user clock is used as driving signal to drive the sequential cell for sampling. Under the action of the driving signal, the data input to the sequential cell is sampled and output to obtain the generated clocks. The data sample output from the data output of a sequential cell can be used as clock input signal of the clock input of another sequential cell, and can also be used as data input signal of the data input of another sequential cell.
Wherein, the sequential cells are connected by cascade connection or series connection, referring to
Preferably, the sequential cell is configured to be a latch, a flip-flop, a register, a shift register or a memory. Other combined units having the same function in the prior art also fall within the protection scope of the present invention.
It should be noted that, an IC design includes a combinational logic circuit and a sequential logic circuit. The problem to be solved in the present invention is error data sampling due to glitches present in a clock input signal inputted to a sequential cell. In order to solve the problem, it is necessary to extract sequential cells for processing.
Further, S2.1 further includes: 2.1.1) converting a sequential cell in the plurality of sequential cells to a node, including: i) identifying on the sequential cell: a user clock by which the sequential cell is driven; a data input; a user-clock input and a data output, wherein the user clock is connected to the user-clock input; ii) converting the sequential cell into a node, including: defining a logic path having a pair of endpoints, wherein: the user-clock input constitutes one of the pair of endpoints; and the data output constitutes the other one of the pair of endpoints; defining a node on the logic path between the pair of endpoints; and 2.1.2) converting the rest of the plurality of sequential cells to nodes in view of the step of converting a sequential cell in the plurality of sequential cells to a node. By identifying a logic path and defining a node on it, each sequential cell is abstracted into a node, and only the topological relationship among the related sequential cells is extracted, so as to avoid interference from other irrelevant structures.
Further, the step for connecting an edge of the nodes in S2.2 includes: connecting the nodes with a directed edge having flowing direction from a data output of a node to a clock input of a next node. A directed graph is obtained by connecting nodes while ignoring the combinatorial logic among the sequential cells.
Preferably, the storage method of the directed graph is an adjacency matrix or an adjacency list.
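As a non-limiting illustration of S2.1 and S2.2, the following Python sketch stores the directed graph as an adjacency list. The input format (a mapping from each sequential cell to the cell whose data output drives its user-clock input), the function name `build_directed_graph` and the cell names are assumptions made for this example only; the combinational logic between cells is ignored, as described above.

```python
def build_directed_graph(clock_driver):
    """Build an adjacency list: node -> list of next nodes.

    clock_driver maps each sequential cell (node) to the cell whose data
    output feeds its user-clock input, or to None when the cell is driven
    by the primary clock (hypothetical input format).
    """
    graph = {cell: [] for cell in clock_driver}
    for cell, driver in clock_driver.items():
        if driver is not None:
            # Directed edge: data output of `driver` -> clock input of `cell`.
            graph[driver].append(cell)
    return graph


def find_roots(clock_driver):
    """Nodes driven directly by the primary clock."""
    return [cell for cell, driver in clock_driver.items() if driver is None]


# Hypothetical four-cell example.
clock_driver = {"ff1": None, "ff2": "ff1", "ff3": "ff1", "ff4": "ff2"}
graph = build_directed_graph(clock_driver)
roots = find_roots(clock_driver)
```

In this example `ff1` is the only root node, and `ff3` and `ff4` are leaf nodes because their adjacency lists are empty.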
As an example, to better understand the present invention, a clock tree having one part composed of eight sequential cells is adopted for example, referring to
S3) configuring the nodes on the directed graph, wherein the directed graph includes exactly one root node and a plurality of leaf nodes. It should be noted that the user clock connected to the root node is configured to be a primary clock, and the user clocks connected to all other nodes in the directed graph are generated clocks.
Further, referring to
Further, S3.1 includes: 3.1.1) labeling a plurality of nodes on the directed graph as group A and group B; 3.1.2) modifying the nodes that are labeled as group A; and 3.1.3) modifying the nodes that are labeled as group B.
It should be noted that, to solve the technical problem addressed by the present invention, the present invention provides two clock models that are functionally capable of equivalently replacing sequential cells in an IC design, and applies these clock models to solve the technical problem, wherein the nodes labeled as group A are substituted with clock models[CA], and the nodes labeled as group B are substituted with clock models[CB]. The clock model[CA] reproduces the function of the sequential cell in the IC design by controlling the outputting time of a data sample, and the clock model[CB] reproduces the function of the sequential cell in the IC design by controlling the data sampling time.
Preferably, in S3.1.1, grouping can be performed in view of user designation, or in view of random designation or in other manners.
Wherein, S3.1.2 further includes the steps of: i) configuring the clock model[CA], which includes a data input[CA], a data output[CA], a user enable[CA] and a clock input[CA].
Preferably, the clock frequency of the primary clock′ is twice that of the primary clock.
Referring to
The primary clock′ is directly connected to the clock input[CA] of the clock model[CA]. There is no combinational logic between the primary clock′ and the clock input[CA] of the clock model[CA], so no delay is introduced at the clock input[CA] and no glitches are present there. The glitch-containing user clock is connected to the user enable[CA] of the clock model[CA]. The user clock connected to the user enable[CA] needs to meet the setup time and hold time requirements, wherein the setup time is the time for which the data must remain stable before an active edge of the primary clock′ arrives, and the hold time is the time for which the data must remain stable after the active edge of the primary clock′ arrives. A glitch generally lasts only a short time, cannot meet the setup time and hold time requirements, and is therefore shielded, so that a glitch in the user clock at the user enable[CA] of the clock model[CA] is no longer harmful and does not affect data sampling. The glitch problem is thus solved by the clock model[CA].
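To illustrate why a short glitch at the user enable is shielded, the following Python sketch uses a simplified discrete-time model, which is an assumption of this example rather than part of the described circuit. The user clock is a list of 0/1 values over fine time steps, and the enable is only observed at the time steps where an active edge of the primary clock′ occurs. The waveform, the edge positions and the name `sample_enable` are hypothetical, and real setup and hold behavior (including metastability) is not modeled.

```python
def sample_enable(user_clock, edge_times):
    """Return the user-clock value seen at each primary clock' active edge."""
    return [user_clock[t] for t in edge_times]


# Hypothetical waveform: a one-step glitch at t=2, then a real rising edge at t=8.
user_clock = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
# Assumed active edges of the primary clock', one every four time steps.
edge_times = [0, 4, 8, 12]

enable_seen = sample_enable(user_clock, edge_times)
```

In this model the glitch at t=2 falls between two active edges of the primary clock′ and never appears in `enable_seen`, whereas the genuine transition at t=8 does.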
Preferably, referring to
Wherein, the clock detector[E1] is used for detecting whether active edge occurs or not; the sampler[CAs] is used for sampling signals from the data input[CA] at each active edge of the primary clock′ to obtain data sample for a period of the primary clock′ led by the active edge; when an N-th active edge is occurring on the user clock[A]: obtaining a data sample[N] for a contemporary period of the primary clock′; and outputting to the data output[CA] the data sample[N] until an N+1-th active edge occurs on the user clock[A]; and when the N+1-th active edge is occurring on the user clock[A]: obtaining a data sample[N+1] for a contemporary period of the primary clock′; and outputting to the data output[CA] the data sample[N+1] until an N+2-th active edge occurs on the user clock[A], wherein the active edge of the user clock output by the clock detector[E1] is used for driving the sampler[CAs] to output corresponding data sample.
Further, preferably, referring to
Wherein, the multiplexer is configured to be a combinational logic circuit with a plurality of data inputs and single data output. The multiplexer with a plurality of data inputs is a multi-channel digital switch, and can select one data input signal from the plurality of data inputs according to different signals of a signal selection and output to a common data output.
Wherein, the first state holder[Re1] is used for realizing sampling of data signal from the data input[CA] when each active edge of the primary clock′ occurs to obtain data sample. The second state holder[Re2] is used for realizing sampling of data signal from the data input[CA] when each active edge of the primary clock′ occurs to obtain data sample, wherein the primary clock′ is greater than the user clock in clock frequencies, and the first state holder[Re1] and the second state holder[Re2] are connected to the same one primary clock′. Two kinds of data samples are obtained at each active edge of the primary clock′, including: the data sample obtained at contemporary period of the primary clock′ and outputted from the first state holder[Re1], and the data sample obtained at previous one period of the primary clock′ and outputted from the second state holder[Re2]. The multiplexer[MUX1] is used for selectively outputting the data sample of first state holder[Re1] when the user clock[A] is active and outputting the data sample of the second state holder[Re2] when the user clock[A] is inactive. 
That is, when the i-th active edge of primary clock′ arrives, the first state holder[Re1] samples the input data to obtain data sample 1, if the user clock[A] is active at this time, and the multiplexer[MUX1] is driven to select the data sample of the first state holder[Re1] to output the data 1; when the (i+1)-th active edge of primary clock′ arrives, the first state holder[Re1] continues to sample the input data to obtain the new data sample 2, but at this time the second state holder[Re2] samples the data 1 output by the multiplexer[MUX1] to obtain data 1, if the user clock[A] is inactive at this time, the multiplexer[MUX1] is driven to select the data sample of the second state holder[Re2] to output data 1, and when the (i+2)-th active edge of primary clock′ arrives, if the user clock is active, the multiplexer[MUX1] is driven to output the new data sample 2, otherwise continue to output data 1, so as to achieve that: when the N-th active edge is occurring on the user clock[A]: the data sample N of the primary clock′ in the contemporary period is obtained and output to the data output[CA] until the (N+1)-th active edge is occurring on the user clock[A]; and when the (N+1)-th active edge is occurring on the user clock[A]: the data sample N+1 of the primary clock′ in the contemporary period is obtained and output to the data output[CA] until the (N+2)-th active edge is occurring on the user clock[A].
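The behavior described above can be sketched with a cycle-based Python model, in which one loop iteration corresponds to one active edge of the primary clock′. This is a simplified illustration under stated assumptions: the list `user_active` stands for the signal that drives the multiplexer[MUX1] at each edge, both state holders are assumed to reset to 0, and the function name is hypothetical.

```python
def simulate_clock_model_ca(data_in, user_active, reset_value=0):
    """Cycle-based sketch of clock model[CA] (Re1, Re2 and MUX1).

    data_in[k]     : value at data input[CA] at the k-th primary clock' edge
    user_active[k] : True when the user clock[A] is active at that edge
    """
    out = reset_value
    outputs = []
    for d, active in zip(data_in, user_active):
        re2 = out  # Re2 captures the previous MUX1 output
        re1 = d    # Re1 captures the data input at every edge
        # MUX1 selects Re1 when the user clock is active, else Re2.
        out = re1 if active else re2
        outputs.append(out)
    return outputs


ca_outputs = simulate_clock_model_ca([1, 2, 3, 4], [True, False, True, False])
```

With these inputs the output holds the value 1 until the next active user clock, and then switches to 3, which matches the hold-until-next-active-edge behavior stated above.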
Further, preferably, referring to
Wherein, the NOT gate circuit[A0] is used for realizing that the logic output from the signal output[A0] is opposite to the logic at the signal input[A0]. The AND gate circuit[A1] is used for realizing that: the data output[A1] outputs high logic when the first data input[A1] and the second data input[A1] are at high logic state simultaneously; otherwise, the data output[A1] outputs low logic.
Preferably, the combinational circuit of the NOT gate circuit[A0] and the AND gate circuit[A1] can be replaced by a lookup table with the same truth table, and various digital combinational circuits can realize a lookup table with the same truth table. The lookup table is a structure used to implement a truth table in which all possible logic combinations and their corresponding logic results have been written in advance. The corresponding logic result is obtained according to the logic combination at the input of the lookup table.
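As a non-limiting illustration of a lookup table, the following Python sketch writes every input combination and its result into a table in advance and then reads results back by input combination. The specific wiring, in which the NOT gate output feeds one input of the AND gate, is an assumption of this example, since the actual connection is defined by the referenced figure; the names `build_lut` and `not_then_and` are hypothetical.

```python
from itertools import product


def build_lut(func, n_inputs):
    """Precompute the truth table of `func` for all 0/1 input combinations."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}


def not_then_and(a, b):
    # Assumed wiring: NOT gate on input a, then AND with input b.
    return (1 - a) & b


lut = build_lut(not_then_and, 2)
result = lut[(0, 1)]  # look up the result for a=0, b=1
```

Any other combinational circuit with the same truth table would produce the same table and could therefore be substituted.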
Preferably, an active edge of a user clock is configured to be a rising edge or a falling edge.
Preferably, the clock model[CA] and the clock model[CB] are both configured to be triggered by a rising edge of the primary clock′, or are both configured to be triggered by a falling edge; alternatively, the clock model[CA] is configured to be triggered by a rising edge and the clock model[CB] by a falling edge, or the clock model[CA] is configured to be triggered by a falling edge and the clock model[CB] by a rising edge. When the active edge is configured to be a rising edge, the clock detector is one that detects a rising edge, such as the clock detector[E1] in
The clock model[CA] not only reproduces the function of the sequential cell; owing to its internal structure, it also performs data sampling directly after an active edge of the primary clock′ arrives and only selectively outputs the corresponding sampling results, so that the sampling time of the clock model[CA] is not delayed.
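As a non-limiting illustration of detecting a configurable active edge, the following Python sketch compares each sample of a user clock with the previous sample. It is a behavioral sketch only and does not represent the internal structure of the clock detector; the function name, the `edge` parameter and the assumed initial level are hypothetical.

```python
def detect_active_edges(samples, edge="rising", initial=0):
    """Flag the positions where an active edge occurs in a 0/1 sample list."""
    if edge not in ("rising", "falling"):
        raise ValueError("edge must be 'rising' or 'falling'")
    prev = initial  # assumed level before the first sample
    flags = []
    for cur in samples:
        if edge == "rising":
            flags.append(prev == 0 and cur == 1)
        else:
            flags.append(prev == 1 and cur == 0)
        prev = cur
    return flags


rising = detect_active_edges([0, 1, 1, 0, 1], edge="rising")
falling = detect_active_edges([0, 1, 1, 0, 1], edge="falling")
```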
Preferably, when an enable of a sequential cell is connected to a circuit, referring to
Preferably, the first state holder[Re1] is configured to be a register or a latch, and the second state holder[Re2] is configured to be a register or a latch.
Wherein, S3.1.3 further includes the steps of:
It should be noted that the external ports of the clock model[CA] and the clock model[CB] are the same, so that the substitution method is similar to the method shown in
The primary clock′ is directly connected to the clock input[CB] of the clock model[CB], and the glitch-containing user clock is connected to the user enable[CB] of the clock model[CB]. The clock model[CB] solves the technical problem on the same principle as the clock model[CA]: there is no glitch in the primary clock′ connected to the clock input[CB], the glitch-containing user clock is connected to the glitch-insensitive user enable[CB], and the glitch does not affect data sampling, so that the clock model[CB] solves the glitch problem.
Preferably, referring to
The function of the clock detector[E2] is the same as that of the clock detector[E1], the internal structure of the clock detector[E2] is the same as that of the clock detector[E1], and here detailed description is avoided.
Wherein, the sampler[CBs] is used for: when an active edge is occurring on the user clock[B] and an active edge is occurring on the primary clock′, sampling a signal from the data input[CB] to obtain a data sample for the period of the primary clock′ led by the active edge; and when no active edge occurs on the user clock[B] or no active edge occurs on the primary clock′, sampling nothing from the data input[CB] and outputting nothing to the data output[CB].
Circuits for realizing the sampler[CBs] can be configured in various ways. The examples in the present invention provide two variants.
Preferably, the sampler[CBs] is configured to be a register[ERe1] having an enable. The clock model[CB] has a simple internal structure, whereas the clock model[CA] is realized by a combination of more hardware. Compared with the clock model[CA], the clock model[CB] saves a considerable amount of hardware resources, so that the clock model[CA] and the clock model[CB] can be used together to balance hardware resource usage.
Preferably, the clock model[CB] includes a register[ERe1] and a clock detector[E2], wherein the register[ERe1] is configured to be a register having an enable, and the output of the clock detector[E2] is connected to the enable of the register.
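The register-with-enable variant can be sketched with a cycle-based Python model, in which one loop iteration corresponds to one active edge of the primary clock′. This is an illustration under stated assumptions: `user_edge[k]` stands for the output of the clock detector[E2] at the k-th edge, the register is assumed to reset to 0, and "outputting nothing" is modeled as the register keeping its previous value, which is the usual behavior of an enabled register but is an interpretation made for this example.

```python
def simulate_clock_model_cb(data_in, user_edge, reset_value=0):
    """Cycle-based sketch of clock model[CB] as a register with an enable."""
    q = reset_value
    outputs = []
    for d, enabled in zip(data_in, user_edge):
        if enabled:
            # Sample the data input only when the detector reports
            # an active edge of the user clock at this primary clock' edge.
            q = d
        outputs.append(q)
    return outputs


cb_outputs = simulate_clock_model_cb([7, 8, 9], [False, True, False])
```

Here the register ignores the value 7, captures 8 when the enable is asserted, and keeps 8 afterwards.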
Preferably, referring to
Preferably, the fourth state holder[Re4] is configured to be a register or a latch.
Preferably, the clock model[CB] also can be functional module composed of other structures, referring to
The third state holder[Re3] is configured to be a register or a latch.
Preferably, when an enable of a sequential cell is connected to a circuit, referring to
Preferably, an active edge is configured to be a rising edge or a falling edge, wherein the active edge is the active edge of the user clock and/or the active edge of the primary clock′.
In an IC design, there are anchor points in the combinational logics among sequential cells. The anchor point is the generated clock, which is output by the sequential cell, processed by the combinational logic circuit and then processed by a plurality of branch output ports, and each branch output port is called as an anchor point, and each anchor point is connected to the clock input of the corresponding clock model. Since the user clocks connected to the same anchor point are exactly the same, the results of the user clock active edges of the same user clock output through the clock detector of different clock models are the same, thereby saving hardware resources. The clock models connected to the same anchor point are configured to share a same clock detector to detect an active edge of a user clock. As an example, in order to better understand the anchor point in the present invention, referring to
Combination of common clock detectors includes the following three examples.
Preferably, referring to
Referring to
Wherein, the substituted circuit comprises a p-th clock model[CA] and a g-th clock model[CB], which are series-connected, and the user enable[CA] of the p-th clock model[CA] and the user enable[CB] of the g-th clock model[CB] are connected to a same user clock. The series relationship is that the data output of the previous-level clock model is connected to the data input of the next-level clock model. The modified circuit also comprises a v-th clock model[CB] and a q-th clock model[CB], which are series-connected, and the user enable[CB] of the v-th clock model[CB] and the user enable[CB] of the q-th clock model[CB] are connected to a same user clock. The v-th clock model[CB] and the q-th clock model[CB] also can be connected to a same primary clock′. The problem of timing sequence disorder caused by clock models[CB] used in cascade structure can be solved by using two clock models[CA].
Preferably, an algorithm for identifying the path includes depth first search (DFS) or breadth first search (BFS). Other algorithms for identifying the path in the prior art also fall within the protection scope of the present invention.
It should be noted that, there is only one root node and one leaf node on a path, and the path is led by the root node and terminated by the leaf node.
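As a non-limiting illustration of identifying paths by depth first search, the following Python sketch enumerates every path from the root node to a leaf node of a directed graph stored as an adjacency list. The adjacency-list format, the node names and the function name are assumptions of this example, and the graph is assumed to be acyclic.

```python
def root_to_leaf_paths(graph, root):
    """Enumerate all root-to-leaf paths with an iterative depth first search.

    graph maps each node to the list of its next nodes; a node with an
    empty list is a leaf node.
    """
    paths = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        children = graph.get(node, [])
        if not children:
            paths.append(path)  # reached a leaf: one complete path
        # Push in reverse so that children are visited in listed order.
        for child in reversed(children):
            stack.append((child, path + [child]))
    return paths


example_graph = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": [], "n4": []}
paths = root_to_leaf_paths(example_graph, "n1")
```

Each returned path starts at the root node and ends at exactly one leaf node, consistent with the note above.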
As an example, referring to
Wherein, whether a clock model[CB] outputs a data sample is controlled by both a user clock and a primary clock′, while whether a clock model[CA] outputs a data sample is controlled by a user clock alone. A given active edge of the user clock reaches the different nodes in a path at successive times. The clock model[CA] in the path outputs a data sample directly when an active edge occurs on the user clock, while the clock model[CB] in the path needs to wait for an active edge on the primary clock′ before outputting a data sample. For the clock model[CB], the data sample is transmitted along the direction of the path: the node[#N] with the smaller number outputs the data sample first, and the node[#M] with the larger number outputs the data sample later, so the arrival order of the active edges of the primary clock′ should be configured to preserve the outputting sequence of the data sample, thereby ensuring normal operation of the circuit. The node[#M] can obtain a correct data sample only when the time at which it outputs the data sample lags behind that of the node[#N]. Therefore, the phase tM of the primary clock′ connected at the node[#M] with the larger number in the path must be configured to lag behind the phase tN of the primary clock′ connected at the node[#N] with the smaller number in the path, so as to ensure that the correct data sample can be obtained. Further, when the phase difference between tM and tN is equal to or larger than the time for transmitting the data sample from the node[#N] to the node[#M], a correct data sample for a sample period of the primary clock′ can be obtained by sampling at the time when the data reaches the node[#M], or by sampling after waiting for a short time difference.
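The ordering requirement tM>tN for nodes labeled as group B can be checked with a short Python sketch. The representation of a path as a list of labels and a parallel list of phase shifts, and the function name, are assumptions of this example; positions in the lists follow the direction of the path from the root node to the leaf node.

```python
def group_b_phases_valid(labels, phases):
    """Check that phase shifts of group-B nodes strictly increase along a path.

    labels : list of "A" or "B", one per node in path order
    phases : phase shift of the primary clock' at each node, same order
    """
    b_phases = [t for label, t in zip(labels, phases) if label == "B"]
    # Every later group-B node must lag behind every earlier one;
    # for a sequence this is equivalent to checking neighbouring pairs.
    return all(later > earlier for earlier, later in zip(b_phases, b_phases[1:]))


ok = group_b_phases_valid(["B", "A", "B", "B"], [0.0, 0.0, 1.0, 2.0])
bad = group_b_phases_valid(["B", "A", "B", "B"], [0.0, 0.0, 1.0, 1.0])
```

Nodes labeled as group A are skipped by this check, because their output time is not controlled by the primary clock′.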
As an example, referring to
Wherein, whether a clock model[CA] outputs a data sample is not controlled by a primary clock′, so that the clock model[CA] can be driven by a primary clock′ having the same phase as that of any clock model[CB] closest to the clock model[CA]. For the node[#U] and the node[#R], whether the clock model[CA] corresponding to the node[#U] outputs a data sample is not controlled by the primary clock′, so that the phase of the primary clock′ connected at the node[#U] can be equal to that of the primary clock′ connected at the node[#R] or the node[#S]. As shown in
Preferably, τ=tS−tU, satisfying: t0<τ<T/2, or satisfying: t0<τ<T, wherein t0 is a minimum threshold of τ.
Preferably, t0 is a preset offset time threshold.
Preferably, t0 is the time lag, in the clock model[CB], between the moment an active edge of the user clock is inputted into the clock detector and the moment that active edge is outputted from the clock detector, so as to ensure that the offset time τ falls within the time range of the active rising edge outputted by the clock detector.
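The constraint on the offset time τ can be expressed as a simple range check. The following is a minimal sketch under stated assumptions: the function name `offset_is_valid` and the flag `half_period_bound` (selecting between the bound T/2 and the bound T) are hypothetical names introduced here for illustration.

```python
def offset_is_valid(t_s: float, t_u: float, period: float, t0: float,
                    half_period_bound: bool = True) -> bool:
    """Return True when tau = tS - tU satisfies t0 < tau < T/2 (or t0 < tau < T)."""
    tau = t_s - t_u
    upper = period / 2 if half_period_bound else period
    return t0 < tau < upper
```

With T=10 and t0=0.5, an offset τ=2 satisfies both bounds, whereas an offset τ=6 satisfies only the looser bound t0<τ<T.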
Understandably, referring to
when U=1: identifying a nearest set of nodes to the node[#1] that are labeled as group B, which includes a node[#S]; and letting F(x−tU)=F(x−tS), wherein tS≥tU, that is, when the root node in a path is inserted with a clock model[CA], it is necessary to satisfy that the phase of the primary clock′ at the node[#S] closest to the root node is equal to or lags behind the phase of the primary clock′ at the root node, therefore, Δt=tS−tU≥0 should be satisfied.
It should be noted that, the phase relationship between the clock model[CA] and the clock model[CB] in the path is defined in Sii.3 and Sii.4 when the root node or leaf node is inserted with a clock model[CA], ensuring a correct data sample can be obtained at a node of the plurality of nodes in the path.
Wherein, F(x−tR) represents a waveform of the primary clock′ to which the node[#R] is connected; tR indicates the waveform's phase; F(x−tU) represents a waveform of the primary clock′ to which the node[#U] is connected; tU indicates the waveform's phase; F(x−tS) represents a waveform of the primary clock′ to which the node[#S] is connected; and tS indicates the waveform's phase.
Preferably, a process for obtaining F(x−tM) includes: obtaining a clock signal[Tc]; performing X frequency division on the clock signal[Tc] to obtain a frequency-divided signal, which is F(x); and delaying the frequency-divided signal by the time tM to obtain F(x−tM). The waveforms of the primary clocks' at the other nodes are obtained by the same method. Understandably, the clock frequency of the frequency-divided signal is 1/X times that of the clock signal[Tc]. The clock signals of the first clock substituting model and the second clock substituting model can be obtained by frequency division of the clock signal, so as to further reduce the number of introduced clock domains, wherein the clock signal[Tc] can be a crystal signal.
Preferably, a method for X frequency division of the clock signal[Tc] includes: inputting the clock signal[Tc] into a frequency divider to obtain a frequency-divided signal, wherein X is larger than 0.
Preferably, a method for delaying the frequency-divided signal by a time t includes: inputting the frequency-divided signal into a delay circuit.
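The divide-then-delay process can be illustrated with a sampled-signal sketch. This is a behavioral illustration only, not the disclosed circuit: the function names `divide_clock` and `delay`, the representation of signals as lists of 0/1 samples, and the restriction to an even integer X are all assumptions made for this example.

```python
from typing import List


def divide_clock(tc: List[int], x: int) -> List[int]:
    """Divide a sampled clock by an even integer x (assumed toggle-style divider).

    The output toggles after every x // 2 rising edges of tc, so one output
    period spans x input periods.
    """
    if x < 2 or x % 2 != 0:
        raise ValueError("this sketch only handles even x >= 2")
    out, level, count, prev = [], 0, 0, 0
    for sample in tc:
        if sample == 1 and prev == 0:  # rising edge of the input clock
            count += 1
            if count == x // 2:
                level ^= 1
                count = 0
        prev = sample
        out.append(level)
    return out


def delay(signal: List[int], n: int) -> List[int]:
    """Delay a sampled signal by n samples, padding the start with 0."""
    if n < 0:
        raise ValueError("delay must be non-negative")
    n = min(n, len(signal))
    return [0] * n + signal[:len(signal) - n]
```

For instance, dividing the sampled clock 1,0,1,0,1,0,1,0 by X=2 yields 1,1,0,0,1,1,0,0, and delaying that result by one sample yields 0,1,1,0,0,1,1,0, which plays the role of F(x−t) for a delay of one sample.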
The clock tree contains clock domains and data domains. X clock models[CA] are contained in a clock domain, and Y hybrid clock models are contained in a data domain. The Y hybrid clock models include Y1 clock models[CA] and Y2 clock models[CB], wherein Y=Y1+Y2. The clock model[CA] is a clock model whose function is completely equivalent to that of a register[FF], and the clock model[CB] is a clock model having a clock delay relative to the clock model[CA]. The primary clock′[mc2] by which the clock model[CB] is driven has the same clock period T as the primary clock′[mc1] by which the clock model[CA] is driven; the primary clock′[mc2] has an offset time τ with respect to the primary clock′[mc1], t0<τ<T/2, and t0 is a minimum threshold for τ; wherein a data signal from the data output of an i-th clock model[CA] in a clock domain is a clock input signal of an (i+1)-th clock model[CA], and a data signal from the data output of a j-th hybrid clock model is a data input signal of a (j+1)-th clock substituting model.
It should be noted that the number of the clock models[CB] is larger than that of the clock models[CA]. In actual circuits, the number of sequential cells in a data domain is far larger than the number of sequential cells in a clock domain, so that more clock models[CB] can be used after substitution, thereby saving resource consumption of the system.
For the clock models[CB] in the data domain, when the data signal from the data output of the j-th sequential cell is connected to the data input of the g-th sequential cell, the j-th sequential cell and the g-th sequential cell are both substituted with clock models[CB]. For two clock models[CB] connected in series, the timing sequences of the two clock models[CB] are exactly the same when the primary clocks' connected to the two are the same, thereby avoiding the problem of timing sequence disorder and saving resource consumption of the system.
Preferably, the method further includes:
Preferably, in a data domain, an r-th clock model[CB] and an (r+1)-th clock model[CB] are connected in series, the (r+1)-th clock model[CB] is connected in series with a q-th clock model[CA], and the clock signal between the (r+1)-th clock model[CB] and the q-th clock model[CA] has an offset time t. A method for shifting the offset time t includes: substituting the (r+1)-th clock model[CB] with a clock model[CA] such that the offset time t is shifted between the (r+1)-th clock model[CB] and the q-th clock model[CA]. By shifting the offset time through adjusting the types of the clock substituting models, the maximum clock frequency of the system is further increased, thereby improving the performance of the system.
It should be noted that all clock models[CA] in the clock domain are within the same clock domain, and the fewer clock domains are introduced, the fewer BUFG resources are required, which further reduces resource consumption of the system. The whole system can be started up and paused simultaneously by controlling one primary clock′.
It should be noted that: the primary clocks' that drive all clock models[CB] in group B are the same.
In conclusion, an embodiment of the present invention provides a method for emulating an IC design with an FPGA, comprising the steps of: extracting a directed graph by identifying logic paths of the sequential cells in an IC design, and substituting a clock model[CA] or a clock model[CB] for a sequential cell corresponding to a node in the directed graph; adding a user enable to an external port of the clock model[CA] or the clock model[CB] compared to the port of a sequential cell in the IC design, through cooperation of a user clock to which a clock model[CA] or a user enable of a clock model[CA] is connected and a primary clock′ to which a clock input is connected, a clock model[CA] reproducing the function of the sequential cell in the IC design by controlling outputting time of a data sample, and the clock model[CB] reproducing the function of the sequential cell in the IC design by controlling data sampling time; and configuring phase shifts for primary clocks' by which the clock model[CA] and the clock model[CB] are driven to ensure normal operation of a circuit. The primary clocks' are directly connected to the clock inputs of the clock model[CA] and the clock model[CB] and contain no glitch, the user clock is connected to user enables of the clock model[CA] and the clock model[CB], but the user enable is non-sensitive to glitch-containing user clock, so that the invention can solve the technical problem of sampling error due to presence of the user clock while ensuring the function of the original sequential cells.
Compared with example 2 and example 3, example 1 has the effect of balancing two problems: the large hardware resource consumption caused by using only clock models[CA], and the large number of clock domains caused by using only clock models[CB].
A method for emulating an IC design with an FPGA is provided in example 2. The method comprises the steps of:
Wherein, S2.1 for converting the plurality of sequential cells to a plurality of nodes includes: converting a sequential cell in the plurality of sequential cells to a node, including: identifying a user clock by which the sequential cell is driven; identifying a plurality of nodes on the sequential cells and constructing the directed graph by connecting the plurality of nodes; wherein identifying a plurality of nodes on the sequential cells includes: a data input; a user-clock input and a data output, and the user clock is connected to the user-clock input; converting the sequential cell into a node, including: defining a logic path having a pair of endpoints, wherein: the user-clock input constitutes one of the pair of endpoints; and the data output constitutes the other one of the pair of endpoints; defining a node on the logic path between the pair of endpoints, and defining another logic path and defining a node on the other logic path between its pair of endpoints; and converting the rest of the plurality of sequential cells to nodes.
Wherein, S3.1 includes: 3.1.1) labeling a plurality of nodes on the directed graph all as group A; 3.1.2) modifying the nodes that are labeled as group A.
Wherein, S3.1.2 further includes i) configuring the clock model[CA], which includes a data input[CA], a data output[CA], a user enable[CA] and a clock input[CA]; ii) modifying a node[A] in the plurality of nodes being labeled as group A; and iii) modifying the rest of nodes[A] in the plurality of nodes being labeled as group A.
Wherein, S3.1.2.ii includes: ii.1) identifying a logic path by which the node[A] is defined in view of S2, wherein: an endpoint of the logic path is located at a user-clock input[A]; the other endpoint of the logic path is located at a data output[A]; and a user clock[A] is connected to the user-clock input[A].
Wherein, S3.2 includes:
Preferably, S3.1 further includes: configuring for the clock model[CA] a clock detector[E1] and a sampler[CAs]; configuring for the clock detector[E1] a user enable[E1], a clock input[E1] and an active-edge output[E1]; configuring for the sampler[CAs] a data input[CAs], an enable[CAs], a clock input[CAs] and a data output[CAs]; connecting the user enable[E1] to the user enable[CA]; connecting the clock input[E1] and the clock input[CAs] to the clock input[CA]; connecting the active-edge output[E1] to the enable[CAs]; connecting the data input[CAs] to the data input[CA]; and connecting the data output[CAs] to the data output[CA].
Preferably, wherein, S3.1 further includes: configuring for the sampler[CAs] a first state holder[Re1], a second state holder[Re2] and a multiplexer[MUX1]; configuring for the first state holder[Re1] a clock input[Re1], a data input[Re1] and a data output[Re1]; configuring for the second state holder[Re2] a clock input[Re2], a data input[Re2] and a data output[Re2]; configuring for the multiplexer[MUX1] a first data input[MUX1], a second data input[MUX1], a signal selection[MUX1] and a data output[MUX1]; connecting the data input[Re1] to the data input[CAs]; connecting the data output[MUX1] to the data output[CAs]; connecting the clock input[Re1] and the clock input[Re2] to the clock input[CAs]; connecting the data output[Re1] to the first data input[MUX1]; connecting the data input[Re2] to the data output[MUX1]; connecting the data output[Re2] to the second data input[MUX1]; and connecting the enable[CAs] to the signal selection[MUX1].
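The interconnection of the clock detector[E1], the state holders[Re1] and [Re2], and the multiplexer[MUX1] can be illustrated with a cycle-level behavioral sketch, in which one call to `tick` stands for one active edge of the primary clock′. Several points are assumptions made for illustration and are not stated in the text above: the class name `ClockModelCA`, the detection of a rising edge by comparing the current and previous sampled levels of the user clock, and the multiplexer polarity (selecting the first data input when the enable is 1).

```python
class ClockModelCA:
    """Behavioral sketch of clock model[CA]; one tick() = one primary clock' edge."""

    def __init__(self) -> None:
        self.prev_user = 0  # last sampled level of the user clock (detector state)
        self.re1 = 0        # first state holder[Re1]
        self.re2 = 0        # second state holder[Re2]
        self.q = 0          # data output[MUX1], i.e. data output[CA]

    def tick(self, d: int, user_clock: int) -> int:
        # Both state holders capture on the primary clock' edge:
        # Re1 takes the data input, Re2 takes the previous multiplexer output.
        self.re1 = d
        self.re2 = self.q
        # Assumed clock detector: flag a rising edge of the user clock.
        active_edge = 1 if (user_clock == 1 and self.prev_user == 0) else 0
        self.prev_user = user_clock
        # Assumed polarity: select Re1 on an active edge, otherwise hold via Re2.
        self.q = self.re1 if active_edge else self.re2
        return self.q
```

In this sketch the output follows the data input only in a tick where a rising edge of the user clock is seen and otherwise holds its value, which mirrors the behavior of an edge-triggered register while the user clock is used only as a sampled enable rather than as a clock.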
Preferably, wherein, in S1, the plurality of sequential cells include: a sequential cell[A1] and a sequential cell[A2]; in S2, on the directed graph: the sequential cell[A1] is converted to a node[A1]; and the sequential cell[A2] is converted to a node[A2]; in S3.1.1, the node[A1] and the node[A2] are both labeled as group A; in S3.1.2: a clock model[CA1] is inserted at the node[A1]; and the clock model[CA1] is configured to include a data input[CA1], a data output[CA1], a user enable[CA1] and a clock input[CA1]; a clock model[CA2] is inserted at the node[A2]; and the clock model[CA2] is configured to include a data input[CA2], a data output[CA2], a user enable[CA2] and a clock input[CA2]; S3.1 further includes: configuring for the clock model[CA1] a sampler[CA1s]; configuring for the sampler[CA1s] a data input[CA1s], an enable[CA1s], a clock input[CA1s] and a data output[CA1s]; configuring for the clock model[CA2] a sampler[CA2s]; configuring for the sampler[CA2s] a data input[CA2s], an enable[CA2s], a clock input[CA2s] and a data output[CA2s]; identifying an anchor to which the user enable[CA1] is connected; and if the user enable[CA2] is connected to the anchor: configuring jointly for the clock model[CA1] and for the clock model[CA2] a clock detector[E12]; configuring for the clock detector[E12] a user enable[E12], a clock input[E12] and an active-edge output[E12]; connecting the user enable[E12] to the anchor, the user enable[CA1] and the user enable[CA2]; connecting the clock input[E12] to the clock input[CA1s], the clock input[CA1], the clock input[CA2s] and the clock input[CA2]; connecting the active-edge output[E12] to the enable[CA1s] and to the enable[CA2s]; connecting the data input[CA1s] to the data input[CA1]; connecting the data output[CA1s] to the data output[CA1]; connecting the data input[CA2s] to the data input[CA2]; and connecting the data output[CA2s] to the data output[CA2].
Example 2 has the benefit that the phase shifts of all clock models[CA] can be the same; when the phase shifts are the same, all clock models[CA] are located in the same clock domain, which not only saves BUFG hardware resources but also allows the circuit to be started up or paused simultaneously.
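The sharing of one clock detector among clock models whose user enables are connected to the same anchor amounts to grouping the clock models by anchor. The following is a hypothetical sketch: the function name `share_clock_detectors` and the string identifiers used for clock models and anchors are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def share_clock_detectors(user_enable_anchor: Dict[str, str]) -> Dict[str, List[str]]:
    """Group clock models by the anchor their user enable is connected to.

    Each resulting group can be served by a single, jointly configured
    clock detector instead of one detector per clock model.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for model, anchor in user_enable_anchor.items():
        groups[anchor].append(model)
    return dict(groups)
```

For example, if the user enables of two clock models named "CA1" and "CA2" are both connected to an anchor named "clk_a", the two fall into one group and can share one clock detector, while a third clock model connected to a different anchor keeps its own detector.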
A method for emulating an IC design with an FPGA is provided in example 3. The method comprises the steps of:
Wherein, S2.1 for converting the plurality of sequential cells to a plurality of nodes includes: converting a sequential cell in the plurality of sequential cells to a node, including: identifying a user clock by which the sequential cell is driven; identifying a plurality of nodes on the sequential cells and constructing the directed graph by connecting the plurality of nodes; wherein identifying a plurality of nodes on the sequential cells includes: a data input; a user-clock input and a data output, and the user clock is connected to the user-clock input; converting the sequential cell into a node, including: defining a logic path having a pair of endpoints, wherein: the user-clock input constitutes one of the pair of endpoints; and the data output constitutes the other one of the pair of endpoints; defining a node on the logic path between the pair of endpoints; and defining another logic path and defining a node on the other logic path between its pair of endpoints; and converting the rest of the plurality of sequential cells to nodes.
Wherein, S3.1 includes: 3.1.1) labeling a plurality of nodes on the directed graph all as group B; 3.1.2) modifying the nodes that are labeled as group B.
Wherein, S3.1.2 includes i) configuring the clock model[CB], which includes a data input[CB], a data output[CB], a user enable[CB] and a clock input[CB]; ii) modifying a node[B] in the plurality of nodes being labeled as group B; and iii) modifying the rest of nodes[B] in the plurality of nodes being labeled as group B.
Wherein, S3.1.2.ii includes:
Wherein, S3.2 includes: 3.2.1) identifying a path on the directed graph and configuring a phase shift for a primary clock′ to which a node on the path is connected; 3.2.2) numbering a total of K nodes on the path, wherein the total of K nodes include: a node[#1], a node[#N], a node[#M] and a node[#K], wherein the node[#1] is the root node, and the node[#K] is one of the leaf nodes, wherein: 1≤N≤M≤K; 3.2.3) letting F(x−tM)=F(x−tN), wherein tM>tN, wherein: F(x−tM) represents a waveform of the primary clock′ to which the node[#M] is connected, and tM indicates a phase shift; and F(x−tN) represents a waveform of the primary clock′ to which the node[#N] is connected, and tN indicates a phase shift; and 3.2.4) identifying another path on the directed graph and configuring a phase shift for a primary clock′ to which a node on the other path is connected.
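One simple way to obtain phase shifts that grow with the node number along a path is to use a uniform increment per node. This is a sketch under an assumption only: the text above requires tM>tN but does not prescribe a uniform step, and the names `assign_phase_shifts` and `step` are hypothetical.

```python
from typing import Dict, Sequence


def assign_phase_shifts(path: Sequence[str], step: float) -> Dict[str, float]:
    """Assign a phase shift to each node on a path, root node first.

    The node numbered k (1-based) receives the phase (k - 1) * step, so a
    node with a larger number always lags a node with a smaller number.
    """
    if step <= 0:
        raise ValueError("step must be positive so that later nodes lag earlier ones")
    return {node: index * step for index, node in enumerate(path)}
```

For a path of three nodes and a step of 0.5 time units, the root node receives the phase 0.0 and the following nodes receive 0.5 and 1.0 respectively.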
Preferably, wherein, S3.1 further includes: configuring for the clock model[CB] a clock detector[E2] and a sampler[CBs]; configuring for the clock detector[E2] a user enable[E2], a clock input[E2] and an active-edge output[E2]; configuring for the sampler[CBs] a clock input[CBs], an enable[CBs], a data input[CBs] and a data output[CBs]; connecting the data input[CBs] to the data input[CB]; connecting the data output[CBs] to the data output[CB]; connecting the clock input[E2] and the clock input[CBs] to the clock input[CB]; connecting the user enable[E2] to the user enable[CB]; and connecting the active-edge output[E2] to the enable[CBs].
Preferably, the sampler[CBs] is configured to be a register having an enable.
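A clock model[CB] built from a clock detector[E2] and a register having an enable can be illustrated with a cycle-level behavioral sketch, in which one call to `tick` stands for one active edge of the primary clock′. The class name `ClockModelCB`, the rising-edge detection scheme, and the assumption that the active-edge output is registered (so that the sampler acts on it at the following primary clock′ edge) are illustrative assumptions and are not stated in the text above.

```python
class ClockModelCB:
    """Behavioral sketch of clock model[CB]; one tick() = one primary clock' edge."""

    def __init__(self) -> None:
        self.prev_user = 0  # last sampled level of the user clock (detector state)
        self.enable = 0     # active-edge output[E2], assumed to be registered
        self.q = 0          # data output[CBs], i.e. data output[CB]

    def tick(self, d: int, user_clock: int) -> int:
        # Register with enable: capture the data input only if the detector
        # flagged an active edge at the previous primary clock' edge.
        if self.enable:
            self.q = d
        # Assumed clock detector: flag a rising edge of the user clock.
        self.enable = 1 if (user_clock == 1 and self.prev_user == 0) else 0
        self.prev_user = user_clock
        return self.q
```

Under these assumptions the output changes one primary clock′ edge after the rising edge of the user clock is observed, which is one possible reading of the clock delay that the clock model[CB] has relative to the clock model[CA].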
Preferably, wherein, in S1, the plurality of sequential cells include: a sequential cell[B1] and a sequential cell[B2]; in S2, on the directed graph: the sequential cell[B1] is converted to a node[B1], and the sequential cell[B2] is converted to a node[B2]; in S3.1.1, the node[B1] and the node[B2] are both labeled as group B; in S3.1.2: a clock model[CB1] is inserted at the node[B1]; the clock model[CB1] is configured to include a data input[CB1], a data output[CB1], a user enable[CB1] and a clock input[CB1]; a clock model[CB2] is inserted at the node[B2]; and the clock model[CB2] is configured to include a data input[CB2], a data output[CB2], a user enable[CB2] and a clock input[CB2]; S3.1 further includes: configuring for the clock model[CB1] a sampler[CB1s]; configuring for the sampler[CB1s] a data input[CB1s], an enable[CB1s], a clock input[CB1s] and a data output[CB1s]; configuring for the clock model[CB2] a sampler[CB2s]; configuring for the sampler[CB2s] a data input[CB2s], an enable[CB2s], a clock input[CB2s] and a data output[CB2s]; identifying an anchor to which the user enable[CB1] is connected; and if the user enable[CB2] is connected to the anchor: configuring jointly for the clock model[CB1] and for the clock model[CB2] a clock detector[E12]; configuring for the clock detector[E12] a user enable[E12], a clock input[E12] and an active-edge output[E12]; connecting the user enable[E12] to the anchor, the user enable[CB1] and the user enable[CB2]; connecting the clock input[E12] to the clock input[CB1s], the clock input[CB1], the clock input[CB2s] and the clock input[CB2]; connecting the active-edge output[E12] to the enable[CB1s] and to the enable[CB2s]; connecting the data input[CB1s] to the data input[CB1]; connecting the data output[CB1s] to the data output[CB1]; connecting the data input[CB2s] to the data input[CB2]; and connecting the data output[CB2s] to the data output[CB2].
Example 3 has the benefits that: since the clock model[CB] has simple structure, compared with the example 1 and the example 2, the example 3 can greatly reduce consumption of FPGA hardware resources.
Based on the same inventive conception as the methods in embodiments, an embodiment of the present invention also provides a system for emulating an IC design with an FPGA, comprising a processor and a computer-readable storage medium in communication with the processor, wherein: the system implements the method for emulating an IC design with an FPGA and provided in any one embodiment aforementioned when the processor executes a program in the computer-readable storage medium.
Wherein, the processor can comprise one or more processing cores, such as a 4-core processor, a 12-core processor, etc. The processor can be implemented in at least one hardware form of a digital signal processor (DSP), a field programmable gate array (FPGA) and a programmable logic array (PLA). The processor can also include a host processor and a coprocessor. The host processor is a processor used to process data in the awake state, also known as a CPU. The coprocessor is a low-power processor used to process data in standby mode. In some embodiments, the processor may be integrated with a graphics processing unit (GPU), which is responsible for rendering and drawing the content required by the display. In some embodiments, the processor may also include an artificial intelligence (AI) processor for processing computational operations concerning machine learning.
Wherein, the computer-readable storage medium is a memory device in a computer device, and is used to store information such as computer-readable instructions, data structures, program modules or other data. Understandably, the storage medium herein can include a built-in storage medium in the computer device, and, of course, can include an extended storage medium supported by the computer device. The storage medium provides a storage space, in which one or more computer instructions loadable and executable by the processor are also stored. These computer instructions may be one or more computer programs (including program code). It should be noted that the storage medium herein can be a high-speed RAM memory, and can also be a non-volatile memory. For example, the storage medium comprises RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state memories, CD-ROM, digital video disc (DVD) or other optical memories, tape cartridge, tape, or other magnetic memory devices, etc. Of course, those skilled in the art know that computer storage media are not limited to the aforementioned.
Based on the same inventive conception as the methods in embodiments, an embodiment of the present invention also provides a non-transitory computer-readable storage medium in which at least one instruction or at least one program is stored, wherein: the at least one instruction or the at least one program is loadable and executable by a processor to implement the method for emulating an IC design with an FPGA and provided in any one embodiment aforementioned, wherein the method for emulating an IC design with an FPGA is described in detail in the method embodiments, and here detailed description is avoided.
Number | Date | Country | Kind |
---|---|---|---|
202211018303.7 | Aug 2022 | CN | national |
202310658622.2 | Jun 2023 | CN | national |
The application is the U.S. national phase of PCT/CN2023/109188 filed Jul. 25, 2023, which claims the benefit of CN202211018303.7 filed Aug. 24, 2022 and CN202310658622.2 filed Jun. 5, 2023, each of which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/109188 | 7/25/2023 | WO |