The present invention relates generally to integrated circuits (“ICs”), and more particularly to synchronizing multiple simulators used to model an IC system.
Programmable logic devices (PLDs) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated random access memory blocks (BRAM), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay lock loops (DLLs), and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
Another type of PLD is the Complex Programmable Logic Device, or CPLD. A CPLD includes two or more “function blocks” connected together and to input/output (I/O) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (PLAs) and Programmable Array Logic (PAL) devices. In some CPLDs, configuration data is stored on-chip in non-volatile memory. In other CPLDs, configuration data is stored off-chip in non-volatile memory, then downloaded to volatile memory as part of an initial configuration sequence.
For all of these programmable logic devices (PLDs), the functionality of the device is controlled by data bits provided to the device for that purpose. The data bits can be stored in volatile memory (e.g., static memory cells, as in FPGAs and some CPLDs), in non-volatile memory (e.g., FLASH memory, as in some CPLDs), or in any other type of memory cell.
The programming used to configure an FPGA or other PLD is often very complex. It is common to use a modeling system to simulate the operation of the programming to evaluate how a physical FPGA will operate when used in a system, such as a system on a chip (“SoC”). In some systems, a PLD interfaces with or includes functional blocks. For example, an FPGA includes an embedded processor operating at a first clock speed, and an I/O interfacing peripheral and a customized computation peripheral (such as a digital processing or image processing filter) operating at a different clock speed. Multiple simulators are integrated into the modeling system to simulate the different functional blocks. In yet other instances, the PLDs themselves are used in the simulation as emulators. In this case, a portion of a design physically runs on a PLD while the rest of the design is simulated by the simulators running on a host PC. A modeling system interface controls the simulation progress of the software simulators or emulation hardware, and exchanges simulation data between them when needed.
On the one hand, the number of clock cycles that different functional blocks use to process one input data sample may differ vastly. For example, a microprocessor runs an operating system, and requires thousands of clock cycles to manage the operating system services (e.g., interrupt service routines, data transfer to off-chip devices). In comparison, an adder functional block usually needs just one or a few clock cycles to finish the addition/subtraction of one set of data samples.
On the other hand, the different simulators and the emulation hardware may have significantly different simulation speeds. For example, the emulation hardware component can simulate tens of millions of clock cycles per second, while a low-level HDL simulator can only simulate a few thousand clock cycles per second.
Considering the above two factors, the times spent by the simulators integrated inside the modeling system to simulate the IC system are often significantly different.
The functional blocks have precedence relations when processing the input data and need to exchange output data with each other. It is required that the simulators that simulate these functional blocks are synchronized properly during co-simulation. One technique is to use single-step clocking co-simulation.
During single-step clocking co-simulation, a global clock pulse is applied to each simulator or emulation hardware after each simulation step. The functional blocks operate off the global clock pulse, which is usually at least as slow as the slowest simulator clock rate in the co-simulation modeling environment. However, due to the different simulation requirements of the functional blocks, and the different simulation speeds of the integrated simulators, this technique unnecessarily slows the simulators/emulation hardware with fast simulation speeds, and the simulation of functional blocks that require significantly more clock cycles than other functional blocks. Single-step clocking is too slow and may even prove to be impractical for many embedded system developments.
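As a rough illustration of the slowdown described above, the following sketch compares the wall-clock time of lock-step and free-running simulation. The function names and the specific rates are assumptions chosen for illustration and do not come from any particular tool; the only premise is that a lock-step global clock can advance no faster than the slowest co-simulator.

```python
def lockstep_seconds(cycles, rates_hz):
    """Wall-clock seconds to advance all co-simulators `cycles` steps in lock step.

    Assumption: each global step completes only when the slowest
    co-simulator has finished its step, so the slowest rate dominates.
    """
    return cycles / min(rates_hz)


def free_running_seconds(cycles, rate_hz):
    """Wall-clock seconds for one co-simulator to advance `cycles` on its own."""
    return cycles / rate_hz


# Hypothetical rates: emulation hardware versus a low-level HDL simulator.
EMULATION_HZ = 20_000_000
HDL_HZ = 2_000

if __name__ == "__main__":
    n = 1_000_000
    print(lockstep_seconds(n, [EMULATION_HZ, HDL_HZ]))
    print(free_running_seconds(n, EMULATION_HZ))
```

With these assumed rates, one million lock-step cycles take 500 seconds of wall-clock time, whereas the emulation hardware running alone would need about 0.05 seconds for the same number of cycles.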
An integrated circuit modeling co-simulation synchronization system is presented here. The system allows the various components of a co-simulation (the integrated circuit hardware, the functional blocks programmatically configured in the hardware, the hardware simulated in a host computer, the simulated functional blocks configured in the simulated hardware, and other components) to work without being slowed to the pace of the slowest clock speed represented in the simulation.
A high-level integrated circuit (“IC”) modeling system includes multiple simulators and/or emulation hardware. Each of the simulators and emulation hardware models a subset of the functional blocks that compose an IC system, and operates according to initial simulation operating conditions. A co-simulation synchronization interface is configured to automatically change at least one of the initial simulation operating conditions to a triggered operating condition in response to a user-identified triggering signal.
As used herein, the term “high-level” means that a modeling environment uses values (data) in a particular format, such as fixed point, floating point, or integer values. When operating in a high-level modeling environment, the users do not have to concern themselves with how the bit values are transmitted or operated upon in the modeling blocks. The translation and propagation between the high-level values of interest to the user and low-level (e.g., bit level or HDL) values handled by the functional blocks are automatic and transparent to the users. Further information on high-level modeling environments is found in commonly owned U.S. Pat. No. 7,110,935, entitled METHOD AND SYSTEM FOR MODELING AND AUTOMATICALLY GENERATING AN ELECTRONIC DESIGN FROM A SYSTEM LEVEL ENVIRONMENT, issued Sep. 19, 2006 to Hwang et al., the disclosure of which is hereby incorporated in its entirety by reference for all purposes.
The user-identified signals (e.g., p_select, opb_addr, sg2fsl_write) are used to describe triggers that automatically change one or more operating conditions of one or more co-simulators in the high-level modeling environment 100. When the values of the user-identified signals satisfy a triggering condition, a triggering signal is sent to the interface 102 to dynamically change the IC system simulation without stopping (pausing) for a real-time user input. The interface 102 changes an operating condition of at least one co-simulator (see, e.g.,
The co-simulators are multiple different simulators that operate concurrently to model an IC system, such as a Virtual Platform simulator used to model a processor and a simulator (e.g., hand-written C simulation models or HDL simulators) integrated with a system-level design tool, such as SYSTEM GENERATOR (“S
Additional information on using hardware implementations in co-simulation high-level modeling environment is found in commonly-owned U.S. Pat. No. 7,085,976, entitled METHOD AND APPARATUS FOR HARDWARE CO-SIMULATION CLOCKING, issued Aug. 1, 2006 to Shirazi et al., the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
In a particular embodiment, a computer, such as a PC, controls both a hardware implementation of a modeling element (e.g., an IC configured to provide a portion of the simulation) and software simulating another modeling element. In a specific embodiment, a processor runs at a processor speed (clock rate) and a peripheral, such as an adaptive filter, occasionally exchanges data with the processor, such as providing sampled input data to the processor and receiving updated filter coefficients from the processor. The time to simulate the data being processed by the peripheral is much less than the time to simulate the data processing on the processor. The simulation speeds of the processor and the peripheral can be determined using well-known benchmarking techniques; however, the synchronization between co-simulators is often application specific and changes dynamically during simulation depending on the status of the functional blocks and the input data samples. Embodiments of the invention allow a user to define one or more triggers according to the user's application, which can change the synchronization of the simulators dynamically during simulation.
For example, a user can specify a triggering condition based on one or more selected signals. The condition p_select==1, for instance, indicates that at a specific simulation cycle the value 1 appears at output port p_select. When p_select goes HIGH, it indicates that the peripheral contains the data requested by the processor. On each simulation clock cycle of the integrated co-simulators, these triggering conditions are evaluated in their order of appearance in the interface. If a triggering condition evaluates to TRUE, it is deemed satisfied (i.e., a triggering signal is generated).
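The per-cycle evaluation described above can be sketched as follows. This is a minimal illustration, not the interface's actual implementation: the `Trigger` class, the `evaluate_triggers` function, and the trigger names `p_select_trig` and `opb_select_trig` are hypothetical, and signal values are assumed to be available as a simple name-to-value mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signals = Dict[str, int]  # assumed representation: signal name -> sampled value


@dataclass
class Trigger:
    name: str                              # hypothetical label for the condition
    condition: Callable[[Signals], bool]   # user-specified triggering condition


def evaluate_triggers(triggers: List[Trigger], signals: Signals) -> List[str]:
    """Evaluate conditions in their order of appearance for one clock cycle.

    Returns the names of the satisfied conditions; each returned name
    stands for one generated triggering signal.
    """
    return [t.name for t in triggers if t.condition(signals)]


# Conditions listed in the order the user entered them (illustrative only).
TRIGGERS = [
    Trigger("p_select_trig", lambda s: s.get("p_select", 0) == 1),
    Trigger("opb_select_trig", lambda s: s.get("opb_select", 0) == 1),
]

if __name__ == "__main__":
    print(evaluate_triggers(TRIGGERS, {"p_select": 1, "opb_select": 0}))
```

Keeping the conditions in a list preserves the order of appearance, so a caller that acts only on the first satisfied condition gets a deterministic priority.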
When a triggering condition is satisfied during simulation, a triggering signal is sent to the synchronization interface that performs an associated operation on one or more of the co-simulators. The operations include co-simulation synchronization modes that the high-level modeling environment would enter when the triggering conditions are satisfied, and the duration of the triggering conditions. Other user-defined operations can also be specified through the co-simulation interface. For example, it is possible by associating a triggering condition with operations to enable or disable a subset of triggering conditions so that the synchronization interface can be self-adjusted in some specific simulation context.
During co-simulation IC system modeling, the interface automatically adjusts the synchronization of the co-simulators based on the user-specified triggering conditions and the associated operations. When a triggering condition is satisfied, the associated operation is performed, which puts the high-level modeling environment into a specific co-simulation mode, enables or disables one or more triggering conditions, or performs some other operation specified by the user. If no triggering condition is satisfied or if the conditions that are triggered time out, the interface typically brings the co-simulators into a default co-simulation synchronization mode.
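One way to picture this behavior is as a small state machine that enters a triggered mode for a fixed number of cycles and then falls back to the default mode. The sketch below is an assumption-laden illustration: the `SyncInterface` class, the mode strings, and the rule that the triggering cycle counts as the first cycle of the duration are all hypothetical choices, not details given in the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signals = Dict[str, int]


@dataclass
class Trigger:
    condition: Callable[[Signals], bool]  # user-specified triggering condition
    mode: str                             # co-simulation mode to enter
    duration: int                         # cycles to stay in that mode


class SyncInterface:
    """Hypothetical model of the mode selection done on every clock cycle."""

    def __init__(self, triggers: List[Trigger], default_mode: str = "free_running"):
        self.triggers = triggers
        self.default_mode = default_mode
        self.mode = default_mode
        self.remaining = 0  # cycles left before the triggered mode times out

    def step(self, signals: Signals) -> str:
        """Return the co-simulation mode to use for this clock cycle."""
        # The first satisfied condition (in order of appearance) wins.
        fired = next((t for t in self.triggers if t.condition(signals)), None)
        if fired is not None:
            self.mode, self.remaining = fired.mode, fired.duration
        if self.remaining == 0:
            # Nothing triggered, or the triggered mode timed out.
            self.mode = self.default_mode
        else:
            self.remaining -= 1
        return self.mode


if __name__ == "__main__":
    iface = SyncInterface([Trigger(lambda s: s["p_select"] == 1, "single_step", 2)])
    for value in [0, 1, 0, 0]:
        print(iface.step({"p_select": value}))
```

In this sketch a trigger that fires again while its mode is active simply restarts the duration count; a real interface could equally ignore the repeat or extend the term.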
For example, if the default condition is that for a specific amount of actual execution time of an IC system, the time to simulate the data processing on a processor is 500 times longer than that of a peripheral being co-simulated, the default clocking ratio is 500:1. The simulator of the processor is synchronized to the simulator of the peripheral at this default clocking ratio if the processor and the peripheral do not need to communicate with each other. A co-simulation interface controls the clocking of the processor and the peripheral, either directly by generating clock signals, by coupling one or more external clock sources to the elements being modeled, or by controlling an on-chip clock circuit in a hardware implementation of the IC system being modeled. If a triggering condition is satisfied, such as a p_select==1 being sent to the interface, the interface places both the processor and the peripheral into a single-step clocking mode, the co-simulation mode associated with this triggering condition. That is, both the processor and the peripheral advance one operation on each clock cycle (i.e., a co-simulator clock ratio of 1:1). Conventional modeling systems using transactional modeling lose clock cycle accountability, which can lead to inaccuracies when correlating test data (test vector) input to test data output, and hence inaccurate output if data boundary transfer modeling becomes inaccurate. In comparison, using the co-simulation interface, the data boundary transfer is modeled on a cycle-accurate basis. Users can observe whether the hand-shaking protocols between different functional blocks are properly implemented so that the data is able to transfer between them. The single-step clocking mode is maintained for the number of clock cycles specified by the duration associated with the triggering condition, or until another triggering signal indicating the end of the triggering is received.
Then, the interface returns the co-simulators to the default 500:1 clock ratio, or another co-simulation state specified by the new triggered condition.
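The effect of switching between the default ratio and the 1:1 single-step ratio can be tallied with a short sketch. It assumes one reading of the 500:1 ratio, namely that the processor simulator receives 500 clock cycles for every peripheral clock cycle in the default mode; the function name `cycles_issued` and the mode strings are hypothetical.

```python
def cycles_issued(modes, default_ratio=500):
    """Count clock cycles issued to each co-simulator over a run.

    `modes` holds one entry per peripheral clock cycle.
    Assumption: in the default mode the processor simulator receives
    `default_ratio` clock cycles for every peripheral clock cycle; in
    single-step mode both receive exactly one cycle (ratio 1:1).
    """
    processor = peripheral = 0
    for mode in modes:
        if mode == "single_step":
            processor += 1
        else:
            processor += default_ratio
        peripheral += 1
    return processor, peripheral


if __name__ == "__main__":
    # Default, then two single-stepped cycles for a data transfer, then default.
    run = ["default", "single_step", "single_step", "default"]
    print(cycles_issued(run))
```

For the four-entry run shown, the processor simulator is issued 500 + 1 + 1 + 500 = 1002 cycles and the peripheral simulator 4, so only the two transfer cycles pay the cost of cycle-accurate lock step.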
In
Based on the above triggers and triggering conditions, corresponding triggers are pushed into the underlying software simulators and hardware co-simulation platforms.
During simulation, the interface automatically synchronizes the simulators based on these user-specified triggering conditions and the associated operations. When a triggering condition is satisfied, the associated operation is performed, which puts the simulation into a specific co-simulation mode, and/or enables/disables the triggering conditions. Note that if none of the triggering conditions is satisfied or if the triggering period has elapsed, the proposed interface brings the integrated co-simulators into a default co-simulation synchronization mode, which in a particular embodiment is a free running mode.
Three lines coming out of the M
Triggering condition sg2sl_read==1 indicates that the simulated peripheral acknowledges the successful acceptance of one word of data from the simulated M
Triggering condition sg2sl_write==1 indicates that the simulated peripheral has data to send to the simulated M
With the above triggering conditions and associated operations, the synchronization interface 306 allows the V
Note that while the data can be transmitted from one component to the other using the above triggering conditions, the data appears at different times in a cycle-accurate simulation. The users therefore need to implement hand-shaking mechanisms to ensure that the data is properly accepted by the other components.
Consider a design in which a M
As shown in
The opb_select is selected into the proposed interface and is used to create triggering condition opb_selectr_trig. When opb_select==1, the synchronization interface brings both the V
In many cases, the software development on embedded processors requires running an operating system. While the operating systems have hundreds of thousands of execution clock cycles, the co-processing in the peripheral developed in the S
The embedded processor may read data from the S
For example, the M
There are two corresponding FROM FIFO blocks 414, 416 that accept data from the TO FIFO blocks 410, 412. The P
For example, when the P
These data would be further transmitted to the pair of FROM FIFO blocks 414, 416 and finally arrive at the data logger and profiler modeling block 422 running on the host PC processor system 419. From the data logger and profiling block 422, users can obtain profiling information about the P
A triggering condition fifo_nonempty_trig, shown in the Table 4, below, is created in the proposed interface. When the user uses the S
The major advantage of the above profiling scheme is that the profiling data can be transmitted from the FPGA device to the host PC 419 using a simple triggering condition. Only two small FIFOs are required on the FPGA device 418 while a large amount of profiling data is stored using the abundant memory available on the host PC 419. The amount of data that can be recorded and stored by the profiling system is not constrained by the limited on-chip and off-chip memory on the FPGA device 418.
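The division of labor between a small on-chip FIFO and a large host-side log can be sketched as below. This is a simplified model under stated assumptions: a single FIFO stands in for the pair of shared FIFOs, one word moves to the host per single-stepped cycle, and the `ProfilingLink` class with its method names is hypothetical rather than part of the described system.

```python
from collections import deque


class ProfilingLink:
    """Hypothetical model of a shared FIFO drained by a non-empty trigger."""

    def __init__(self, depth=4):
        self.depth = depth    # small on-chip FIFO capacity (assumed value)
        self.fifo = deque()   # device-side half of the shared FIFO
        self.host_log = []    # host-side data logger, effectively unbounded

    def record(self, word):
        """Device side: push one profiling word; return False if the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word)
        return True

    def fifo_nonempty(self):
        """Triggering condition: the shared FIFO holds profiling data."""
        return len(self.fifo) > 0

    def service(self):
        """Interface side: on a trigger, single-step and move one word to the host."""
        if self.fifo_nonempty():
            self.host_log.append(self.fifo.popleft())
            return "single_step"
        return "free_running"


if __name__ == "__main__":
    link = ProfilingLink(depth=2)
    for word in range(10):
        link.record(word)
        link.service()
    print(len(link.host_log), link.service())
```

Because the host drains the FIFO as soon as it becomes non-empty, the host log in this model grows well past the FIFO depth, which mirrors the point that the amount of recorded data is not bounded by on-chip memory.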
One limitation of the above technique is that it would affect the functioning of hardware peripherals, which can be sensitive to clock switching and single-stepping. Examples of such peripherals include the serial universal asynchronous receiver/transmitter (“UART”) peripherals. Such problems are avoided by ensuring that the FPGA device runs with a free-running clock (i.e., the half portions of the shared FIFOs on hardware do not contain any profiling data) when these peripherals are in operation. This can be accomplished by checking the status of the shared FIFO and ensuring that the pair of shared FIFOs is empty before interacting with these hardware peripherals.
The co-simulation interface automatically changes (i.e., dynamically controls) at least one of the first simulation operating condition and the second simulation operating condition, such as simulator clocking speed or clocking mode (e.g., free-running or single-step) of at least one co-simulator, in response to a user-selected signal in the high-level IC system modeling environment, and changes the initial simulation operating condition to a triggered operating condition in response to a user-selected triggering signal. In the case of hardware co-simulation, the co-simulation interface also changes the co-simulation interface between the PLD and the host PC through techniques such as dynamic partial reconfiguration. The high-level IC system modeling environment uses the first co-simulator and the second co-simulator to model the IC system in a first condition (step 504). In a particular embodiment, the first co-simulator is running at the first default clocking speed and the second co-simulator is running at the second default clocking speed.
A user-selected signal is provided as a triggering condition to the co-simulation interface (step 506) and the co-simulation interface automatically changes the high-level IC system modeling environment to a second condition (“triggered operating condition”) for a selected term (step 508). In a particular embodiment, the co-simulation interface changes a clocking speed of at least one of the first co-simulator and the second co-simulator. In another embodiment, the co-simulation interface changes the first and second co-simulators from free-running clocking modes to a single-step clocking mode.
After the selected term, the co-simulation interface takes the high-level IC system modeling environment out of the second condition (step 510). In a particular embodiment, the selected term is defined by a user-specified limit, such as a number of clock cycles. In an alternative embodiment, the selected term is defined by a second user-selected trigger, such as a flag indicating a full buffer, bus availability, or the execution of a processor running into a code region of interest, etc. In a particular embodiment, the co-simulation interface re-establishes the first condition of the high-level IC system modeling environment after completion of a selected term or duration.
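The two ways of ending the selected term (a user-specified cycle limit or a second user-selected trigger) can be sketched as follows. The function `triggered_cycles`, its parameters, and the signal names `start` and `full` are hypothetical; the sketch also assumes the cycle on which the stop trigger appears is already outside the second condition.

```python
def triggered_cycles(trace, start, stop=None, limit=None):
    """Mark the cycles during which the environment is in the second condition.

    trace: iterable of per-cycle signal mappings.
    start: predicate for the triggering condition that begins the term.
    stop:  optional predicate for a second trigger that ends the term.
    limit: optional maximum number of cycles for the term.
    """
    active, used, flags = False, 0, []
    for signals in trace:
        if not active and start(signals):
            active, used = True, 0       # enter the triggered condition
        elif active and stop is not None and stop(signals):
            active = False               # second trigger ends the term
        flags.append(active)
        if active:
            used += 1
            if limit is not None and used >= limit:
                active = False           # cycle limit reached
    return flags


if __name__ == "__main__":
    trace = [{"start": 0, "full": 0}, {"start": 1, "full": 0},
             {"start": 0, "full": 0}, {"start": 0, "full": 1},
             {"start": 0, "full": 0}]
    print(triggered_cycles(trace, lambda s: s["start"] == 1, limit=2))
    print(triggered_cycles(trace, lambda s: s["start"] == 1,
                           stop=lambda s: s["full"] == 1))
```

Passing both `stop` and `limit` ends the term on whichever occurs first, which is one plausible way to combine the two embodiments.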
In a further embodiment using a hardware implementation as a co-simulator in the high-level IC system modeling environment, the performance of the hardware implementation is profiled using a triggering signal provided to the co-simulator interface and data capture. In a particular further embodiment, an FPGA device is a hardware co-simulator. A triggering signal, such as a buffer status flag, is provided to the co-simulation interface, which switches the FPGA device from a free-running clocking condition to a single-step clocking condition, and simulates for one clock cycle. Each occurrence of the triggering condition is logged on a FIFO buffer of the FPGA device and transferred to a data logger and profiler running on a computer. In a particular embodiment, the co-simulation interface includes both the customized hardware synchronization circuits and the software programs running on the computer.
The FPGA architecture includes a large number of different programmable tiles including multi-gigabit transceivers (MGTs 601), configurable logic blocks (CLBs 602), random access memory blocks (BRAMs 603), input/output blocks (IOBs 604), configuration and clocking logic (CONFIG/CLOCKS 605), digital signal processing blocks (DSPs 606), specialized input/output blocks (I/O 607) (e.g., configuration ports and clock ports), and other programmable logic 608 such as digital clock managers, analog-to-digital converters, system monitoring logic, and so forth. Some FPGAs also include dedicated processor blocks (PROC 610).
In some FPGAs, each programmable tile includes a programmable interconnect element (INT 611) having standardized connections to and from a corresponding interconnect element in each adjacent tile. Therefore, the programmable interconnect elements taken together implement the programmable interconnect structure for the illustrated FPGA. The programmable interconnect element (INT 611) also includes the connections to and from the programmable logic element within the same tile, as shown by the examples included at the top of
For example, a CLB 602 can include a configurable logic element (CLE 612) that can be programmed to implement user logic plus a single programmable interconnect element (INT 611). A BRAM 603 can include a BRAM logic element (BRL 613) in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured embodiment, a BRAM tile has the same height as four CLBs, but other numbers (e.g., five) can also be used. A DSP tile 606 can include a DSP logic element (DSPL 614) in addition to an appropriate number of programmable interconnect elements. An IOB 604 can include, for example, two instances of an input/output logic element (IOL 615) in addition to one instance of the programmable interconnect element (INT 611). As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 615 are manufactured using metal layered above the various illustrated logic blocks, and typically are not confined to the area of the input/output logic element 615. In the pictured embodiment, a columnar area near the center of the die is used for configuration, clock, and other control logic.
Some FPGAs utilizing the architecture illustrated in
Note that
While the present invention has been described in connection with specific embodiments, variations of these embodiments will be obvious to those of ordinary skill in the art. For example, although specific embodiments are described in terms of specific co-simulators and physical implementations of co-simulation interfaces, alternative embodiments apply. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.
Number | Name | Date | Kind |
---|---|---|---|
5768567 | Klein et al. | Jun 1998 | A |
5771370 | Klein | Jun 1998 | A |
5960181 | Sanadidi et al. | Sep 1999 | A |
5960182 | Matsuoka et al. | Sep 1999 | A |
5987243 | Aihara | Nov 1999 | A |
6009256 | Tseng et al. | Dec 1999 | A |
6182247 | Herrmann et al. | Jan 2001 | B1 |
6212489 | Klein et al. | Apr 2001 | B1 |
6230114 | Hellestrand et al. | May 2001 | B1 |
6263302 | Hellestrand et al. | Jul 2001 | B1 |
6356862 | Bailey | Mar 2002 | B2 |
6366530 | Sluiter et al. | Apr 2002 | B1 |
6389379 | Lin et al. | May 2002 | B1 |
6389558 | Herrmann et al. | May 2002 | B1 |
6453450 | Walter | Sep 2002 | B1 |
6470481 | Brouhard et al. | Oct 2002 | B2 |
6480954 | Trimberger et al. | Nov 2002 | B2 |
6570592 | Sajdak et al. | May 2003 | B1 |
6598178 | Yee et al. | Jul 2003 | B1 |
6624829 | Beck et al. | Sep 2003 | B1 |
6707474 | Beck et al. | Mar 2004 | B1 |
6718294 | Bortfeld | Apr 2004 | B1 |
6751583 | Clarke et al. | Jun 2004 | B1 |
7085976 | Shirazi et al. | Aug 2006 | B1 |
7089175 | Nemecek et al. | Aug 2006 | B1 |
7110935 | Hwang et al. | Sep 2006 | B1 |
7143369 | Milne | Nov 2006 | B1 |
7260517 | Bailey et al. | Aug 2007 | B2 |
7287178 | Milne et al. | Oct 2007 | B1 |
7346481 | Ballagh et al. | Mar 2008 | B1 |
7366650 | Nightingale et al. | Apr 2008 | B2 |
7366651 | Milne et al. | Apr 2008 | B1 |
7401333 | Vandeweerd | Jul 2008 | B2 |
7437280 | Ballagh et al. | Oct 2008 | B1 |
7475288 | Multhaup et al. | Jan 2009 | B2 |
7478027 | Ban | Jan 2009 | B2 |
7539953 | Seng et al. | May 2009 | B1 |
7574635 | Alfke | Aug 2009 | B1 |
7673265 | Akiba et al. | Mar 2010 | B2 |
7707019 | Ballagh et al. | Apr 2010 | B1 |
7937259 | Chan et al. | May 2011 | B1 |
7970597 | Lin et al. | Jun 2011 | B2 |
8001409 | Osborn et al. | Aug 2011 | B2 |
8145467 | Ou et al. | Mar 2012 | B1 |
20020019969 | Hellestrand et al. | Feb 2002 | A1 |
20020059054 | Bade et al. | May 2002 | A1 |
20020120909 | Brouhard et al. | Aug 2002 | A1 |
20030105620 | Bowen | Jun 2003 | A1 |
20030171908 | Schilp et al. | Sep 2003 | A1 |
20030188278 | Carrie | Oct 2003 | A1 |
20040181385 | Milne et al. | Sep 2004 | A1 |
20040215438 | Lumpkin et al. | Oct 2004 | A1 |
20040260528 | Ballagh et al. | Dec 2004 | A1 |
20050060133 | Schuppe | Mar 2005 | A1 |
20050071602 | Niell et al. | Mar 2005 | A1 |
20060224372 | Ban | Oct 2006 | A1 |
20070028144 | Graham et al. | Feb 2007 | A1 |
20090083682 | Akiba et al. | Mar 2009 | A1 |
20090132224 | Reichor et al. | May 2009 | A1 |
20100077118 | Blackwell et al. | Mar 2010 | A1 |