The disclosure generally relates to co-simulation of circuit designs.
Simulating a circuit design may involve separating the design into parts that are simulated on different simulation platforms. A co-simulation can run a testbench program on a computer system (a “software platform”), such as a desktop, server, or other type of computer system, and emulate parts of the circuit design (the design-under-test or “DUT”) on a hardware platform, such as a platform having one or more programmable logic devices.
Emulating part of the design on a hardware co-simulation platform reduces simulation time relative to simulating the design on a computer system. However, existing approaches to partitioning a design for co-simulation are inefficient because of the overhead involved in exchanging data between the hardware and software platforms during co-simulation.
A disclosed method includes generating configuration data by a design tool to implement circuitry for emulation of a design-under-test (DUT) on programmable logic of a system-on-chip (SoC). The method includes generating testbench executable code from testbench source code by the design tool. The testbench executable code is configured to generate stimuli to the circuitry on the programmable logic. The method includes configuring a processor of the SoC to execute the testbench executable code and the programmable logic to implement the circuitry for emulation of the DUT.
A disclosed system includes one or more computer processors configured to execute program code and a memory arrangement coupled to the one or more computer processors. The memory arrangement is configured with instructions of a design tool that when executed by the one or more computer processors cause the one or more computer processors to perform operations that include generating configuration data to implement circuitry for emulation of a design-under-test (DUT) on programmable logic of a system-on-chip (SoC). The operations include generating testbench executable code, and the testbench executable code is configured to generate stimuli to the circuitry on the programmable logic. The operations include configuring a processor of the SoC to execute the testbench executable code and the programmable logic to implement the circuitry for emulation of the DUT.
A disclosed system-on-chip (SoC) includes programmable logic circuitry configured to emulate a design-under-test (DUT) and a processor coupled to the programmable logic circuitry. The processor is configured to execute testbench executable code that generates stimuli to the DUT.
Other features will be recognized from consideration of the Detailed Description and Claims, which follow.
Various aspects and features of the method and system will become apparent upon review of the following detailed description and upon reference to the drawings in which:
In the following description, numerous specific details are set forth to describe specific examples presented herein. It should be apparent, however, to one skilled in the art, that one or more other examples and/or variations of these examples, all of which are non-limiting, may be practiced without all the specific details given below. In other instances, well known features have not been described in detail so as not to obscure the description of the examples herein. For ease of illustration, the same reference numerals may be used in different diagrams to refer to the same elements or additional instances of the same element.
The disclosed methods and systems reduce co-simulation time by reducing the overhead involved in exchanging data between the testbench code and the emulated circuit. The disclosed methods and systems generate executable program code for a testbench and configuration data for emulating a circuit design. The methods and systems configure an SoC, which has a processor and programmable logic, to execute the testbench code and emulate the circuit design (or a portion thereof). The overhead is reduced because the data is exchanged between circuits within the SoC rather than through input/output (“I/O”) channels that span the hardware and software platforms. The term “software domain” may be used to refer to C/C++/SystemC and/or behavioral hardware description language (“HDL”) testbench code that is constructed to provide stimuli to the DUT and that executes on a computer processor. The term “hardware domain” may be used to refer to register transfer level (“RTL”) code that specifies the DUT and can be synthesized for emulation on logic gates of programmable logic circuitry.
The exemplary emulation/prototyping platform includes three devices 106, 108, and 110 in addition to the SoC 104. The additional devices on the emulation/prototyping platform can be field programmable gate array (“FPGA”) devices, application specific integrated circuit (“ASIC”) devices, or additional SoCs. For emulation, some vendors provide ASICs in the form of multiple custom processors communicatively coupled in a grid. The RTL code is compiled, and the processors emulate gate-level behavior by executing the compiled code. The ASICs are tailored to emulate the circuit specified by the RTL and provide features such as probing signals and/or capturing waveforms. Other vendors provide FPGA devices that can be configured to emulate gate-level behavior based on an RTL design. For prototyping, FPGA devices are generally preferred.
The SoC 104 includes a processing subsystem (“PS”) 105, which can include multiple processors, and a programmable logic subsystem that includes FPGA circuitry 107. The processing subsystem supports a co-simulation that can execute various testbench configurations that range from bare metal code compiled from C/C++ source code to a testbench reliant on an operating system (e.g., Linux) executing on the processor(s).
A workstation can provide a user interface 130 for configuring and/or interacting with the emulation/prototyping platform at runtime.
The testbench source code 112 and DUT source code 114 can be input to a design tool for generating the testbench executable code and configuration data for DUT emulation, as shown by block 116. The design tool compiles the testbench source code for one or more processors of the SoC, as shown by block 118. At block 120, the design tool adds instrumentation code to the testbench executable code to provide an interface between the testbench code and the emulation circuitry.
The design tool generates testbench executable code at block 122. If the SoC has multiple processors, the design tool can partition the testbench source code into multiple partitions and compile the partitions for execution on the processors. For example, testbench source code having functions c_test1( ), c_test2( ), c_test3( ), . . . , can be partitioned and compiled such that the executable code of c_test1( ) is executed on one processor, the executable code of c_test2( ) is executed on another processor, the executable code of c_test3( ) is executed on yet another processor, and so on. The design tool also generates separate instrumentation code for each of the processors. Partitioning the testbench enables testbench functions to execute in parallel with one another, in addition to executing in parallel with the DUT.
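The disclosure does not say how the design tool maps testbench functions to processors. As a minimal sketch, assuming a simple round-robin policy, the C fragment below assigns functions such as c_test1( ), c_test2( ), and c_test3( ) to processor indices. The type `tb_partition` and the function `partition_testbench` are hypothetical, and generation of the per-processor instrumentation code is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one testbench function (e.g., c_test1). */
typedef struct {
    const char *name; /* testbench function name */
    int processor;    /* processor index assigned by the partitioner */
} tb_partition;

/* Assumed round-robin policy: function i runs on processor
 * i % num_processors. Returns 0 on success, -1 on bad arguments. */
int partition_testbench(tb_partition *parts, size_t count, int num_processors)
{
    if (parts == NULL || num_processors <= 0)
        return -1;
    for (size_t i = 0; i < count; i++)
        parts[i].processor = (int)(i % (size_t)num_processors);
    return 0;
}
```

With three processors, c_test1( ) through c_test3( ) land on processors 0, 1, and 2, and a fourth function wraps back to processor 0. A real partitioner might instead weigh expected run time or data dependencies between functions.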
At block 124, the design tool can partition the DUT and synthesize the design into a netlist. For a large design that cannot be emulated on a single device, the design tool can partition the design for emulation across multiple devices. The design tool can generate time division multiplexing (“TDM”) logic at various I/O boundaries as known in the art.
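Time division multiplexing lets several logical signals that cross a device boundary share fewer physical wires by sending them in successive time slots. The sketch below is a generic bit-level model of that idea, not the logic the design tool emits; the helper names, the limits of 64 signals and 32 wires, and the slot ordering (slot `s` carries signals `s*W` through `s*W + W - 1`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of time slots needed to carry num_signals over num_wires.
 * Precondition (assumed): num_wires > 0. */
unsigned tdm_ratio(unsigned num_signals, unsigned num_wires)
{
    return (num_signals + num_wires - 1) / num_wires;
}

/* Transmit side: bit i of `signals` is logical signal i (num_signals <= 64,
 * num_wires <= 32). Returns the wire values driven during time slot `slot`. */
uint32_t tdm_send(uint64_t signals, unsigned num_signals,
                  unsigned num_wires, unsigned slot)
{
    uint32_t wires = 0;
    for (unsigned w = 0; w < num_wires; w++) {
        unsigned idx = slot * num_wires + w;
        if (idx < num_signals && ((signals >> idx) & 1u))
            wires |= 1u << w;
    }
    return wires;
}

/* Receive side: merge the wire values captured in `slot` into the
 * reconstructed signal vector and return the updated vector. */
uint64_t tdm_receive(uint64_t signals, uint32_t wires, unsigned num_signals,
                     unsigned num_wires, unsigned slot)
{
    for (unsigned w = 0; w < num_wires; w++) {
        unsigned idx = slot * num_wires + w;
        if (idx >= num_signals)
            break; /* last slot may be only partially used */
        if ((wires >> w) & 1u)
            signals |= (uint64_t)1 << idx;
        else
            signals &= ~((uint64_t)1 << idx);
    }
    return signals;
}
```

For 10 signals over 4 wires, `tdm_ratio` gives 3 slots, so every emulated cycle needs three transfers across the boundary; this is the trade-off between pin count and emulation speed that TDM logic makes.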
In addition to partitioning, the design tool generates instrumentation logic at block 126. The instrumentation logic is added to the netlist to implement an instrumentation interface that communicates with the testbench executable code.
Based on the instrumented netlist, at block 128 the design tool generates configuration data to implement circuitry to emulate the DUT on the emulation device(s). Depending on the type of device, generating the configuration data can include place-and-route along with bitstream generation for FPGA circuitry, or generation of executable code for ASIC devices.
To configure the emulation/prototyping platform for co-simulation, the design tool configures a processor(s) of the SoC 104 to execute the testbench executable code, including the instrumentation code, and configures one or more of the devices 106, 108, and 110 to implement circuitry of the instrumentation interface and circuitry to emulate the DUT.
The design tool can provide a user interface 130 for configuration of the platform 102. By invoking functions of the design tool, a user can instantiate testbench executable code on the SoC for execution by a processor and configure programmable logic of the SoC. In addition, the user can commence the co-simulation and optionally view runtime behavior. The design tool can be communicatively coupled to the emulation/prototyping platform via PCIe or Ethernet.
In an exemplary SoC, the processing subsystem can be communicatively coupled to the programmable logic subsystem by an Advanced eXtensible Interface (“AXI”) bus. The testbench instrumentation code 208 and instrumentation interface 212 support communication between the testbench executable code and the DUT circuitry 210.
During co-simulation runtime, the condition check logic 214 of the instrumentation interface monitors DUT signals to determine when a test condition(s), which was specified in the DUT source code, has been satisfied. In the example code 114 of
In combination with logic for condition checking, logic 214 includes data packing logic. In response to satisfaction of the condition, the data packing logic combines signal values in registers (not shown). The signal values are those specified in the DUT source code, which in the example are “addr” and “data” specified as arguments to the c_func function call. The AXI Master FSM writes data from the registers to allocated addresses in memory (not shown; see
In response to receiving the interrupt, the processing subsystem reads the vector address associated with the interrupt and redirects execution (jumps) to the (instrumented/modified) function, c_func. When c_func executes, the data is output, for example by displaying the data on a universal asynchronous receiver-transmitter (“UART”) console or through an operating system (e.g., Linux). To input stimuli to the DUT 210, the testbench executable code 206 writes data to AXI slave registers 220 via the instrumentation code 208. In response to data written to the AXI slave registers, the data unpacking and control logic 222 issues a stop-clk signal to the clock control logic 216, which issues a stop-clks signal to the DUT 210. The data unpacking logic unpacks the data from the register(s) and generates signals on the associated signal lines for input to the DUT 210.
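The hardware-to-software path described above (condition check, data packing, write by the AXI master FSM, and interrupt) can be modeled in software to reason about the protocol. The following C fragment is a behavioral sketch under stated assumptions: the test condition is taken to be the `valid` signal at a rising clock edge, the two-word argument layout is invented for illustration, and `instr_iface` and `instr_on_posedge` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word offsets in the reserved memory region that the AXI
 * master FSM writes to. */
enum { ARG_ADDR = 0, ARG_DATA = 1, ARG_WORDS = 2 };

typedef struct {
    uint32_t mem[ARG_WORDS]; /* stands in for the reserved memory region */
    bool irq0;               /* stands in for the IRQ-0 line */
} instr_iface;

/* Called once per modeled rising clock edge. If the (assumed) test
 * condition holds, pack the argument signals, write them to the reserved
 * region, and raise the interrupt. Returns true if the interrupt fired. */
bool instr_on_posedge(instr_iface *ifc, bool valid, uint32_t addr, uint32_t data)
{
    if (!valid)
        return false;          /* condition not satisfied: nothing to send */
    ifc->mem[ARG_ADDR] = addr; /* data packing + AXI master write */
    ifc->mem[ARG_DATA] = data;
    ifc->irq0 = true;          /* interrupt the processing subsystem */
    return true;
}
```

In the described hardware these steps are performed by logic 214 and the AXI Master FSM rather than by code; the model only captures their ordering.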
Upon completing execution of the function, the processing subsystem issues a resume-clks signal to a general purpose I/O pin, which is connected to the clock control logic 216. In response, the clock control logic deasserts the stop-clks signal to the DUT, which enables oscillation of the clock signal(s) to drive the DUT.
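The stop/resume handshake among the data unpacking and control logic 222, the processing subsystem, and the clock control logic 216 behaves like a two-event state machine. This C sketch models it with hypothetical names and plain AND gating of the clock; actual hardware would need glitch-free clock gating, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Behavioral model of the clock control logic (names are hypothetical). */
typedef struct {
    bool stop_clks; /* when true, DUT clocks are held off */
} clock_ctrl;

/* Stop request from the data unpacking and control logic, issued when the
 * testbench writes stimulus to the AXI slave registers. */
void clk_on_stop_request(clock_ctrl *cc) { cc->stop_clks = true; }

/* Resume-clks GPIO from the processing subsystem after the testbench
 * function completes. */
void clk_on_resume_gpio(clock_ctrl *cc) { cc->stop_clks = false; }

/* Clock level seen by the DUT for a given free-running clock level. */
bool dut_clock(const clock_ctrl *cc, bool free_running_clk)
{
    return free_running_clk && !cc->stop_clks;
}
```

Because the DUT clock is held while the testbench runs, the DUT observes no cycles between the stimulus write and the resume, so software execution time does not perturb the emulated behavior.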
Examples 1, 2, and 3 that follow show RTL code, testbench source code, and instrumentation source code generated based on the RTL and testbench source code. The instrumentation code invokes the testbench code by interacting with circuitry of the emulated DUT specified by the RTL code.
Example 1 shows a module in Verilog that defines a circuit to be emulated by an SoC. The “always” block calls “c_test” after every valid transaction (at each positive edge of “clk” at which the “valid” signal is asserted to 1), obtains the next address to be issued from “c_test,” and places that address in register “next_addr.”
Example 2 shows the source code of the function “c_test.” The function prints the address and data output by module “top,” and generates a new address that is returned as input to “top.”
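Because the text of Example 2 is not reproduced here, the following is only a hypothetical body for a function with the behavior just described: print the address and data from module “top” and return a new address. The 4-byte stride and the `ADDR_LIMIT` wrap-around are assumptions, and the address is returned as a `uint32_t` to match the description of a value returned to “top.”

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical address range used for wrap-around. */
#define ADDR_LIMIT 0x1000u

/* Hypothetical c_test: report the transaction observed on "top" and
 * return the next address to be driven into next_addr. */
uint32_t c_test(uint32_t addr, uint32_t wdata)
{
    printf("addr=0x%08" PRIx32 " wdata=0x%08" PRIx32 "\n", addr, wdata);
    return (addr + 4u) % ADDR_LIMIT; /* assumed sequential 4-byte stride */
}
```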
The source code in Example 3 shows the instrumentation code generated by the design tool based on compilation of the RTL code and testbench source code of Examples 1 and 2. The instrumentation code includes “main” and “irq0_isr” functions.
The RTL specification “import “DPI-C” function void c_test(bit[31:0] addr, bit[31:0] wdata)” provides the design tool with the name of the function to be invoked from “irq0_isr” and the arguments to be passed to it. The address at which the arguments (addr and wdata) are stored by AXI Master FSM 218 and eventually read by “irq0_isr” can be determined based on reserved address space allocated in the on-chip memory (e.g.,
The function “irq0_isr” is the interrupt service routine, and “irq0” denotes the IRQ-0 pin through which the AXI master FSM of the DUT instrumentation circuitry 212 signals the processing subsystem 202 to trigger c_test when the checked condition on the DUT signals is satisfied. At power-on, execution of “main” registers “irq0_isr” as the interrupt service routine for the irq0 pin.
Upon issue of IRQ-0 by the DUT instrumentation circuitry 212, “irq0_isr” is invoked and reads the arguments written by the DUT instrumentation circuitry into the specified memory addresses. After reading the arguments, “irq0_isr” invokes “c_test”. Upon completion of execution of “c_test,” the return value is stored at the address specified for the “next_addr” signal in the DUT instrumentation circuitry (AXI slave register 220). For example, address 0x8000_0XXXX is reserved for the software instrumentation 208 to store return values for hardware DUT signals. Finally, “irq0_isr” triggers the hardware clock control logic 216 to resume the clock signals to the DUT 210.
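The registration and interrupt flow described above can be sketched in C as follows. This is not the generated code of Example 3: to keep it runnable on a host, the reserved memory region, the AXI slave register, and the resume-clks GPIO are modeled as ordinary `volatile` variables instead of fixed memory-mapped addresses, and `register_isr`, `raise_irq`, the argument layout, and the body of `c_test` are hypothetical stand-ins for the platform's interrupt-controller API and the user's testbench.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-ins for memory-mapped locations (hypothetical layout). */
volatile uint32_t arg_region[2];   /* [0] = addr, [1] = wdata, written by HW */
volatile uint32_t slave_next_addr; /* models AXI slave register for next_addr */
volatile uint32_t resume_gpio;     /* models the resume-clks GPIO pin */

/* Stand-in for the user's testbench function (hypothetical body). */
static uint32_t c_test(uint32_t addr, uint32_t wdata)
{
    (void)wdata;
    return addr + 4u;
}

/* Interrupt service routine for IRQ-0. */
void irq0_isr(void)
{
    uint32_t addr = arg_region[0];         /* read arguments packed by HW */
    uint32_t wdata = arg_region[1];
    slave_next_addr = c_test(addr, wdata); /* return value drives next_addr */
    resume_gpio = 1u;                      /* tell clock control to resume */
}

/* Minimal vector table; real firmware would use the platform's
 * interrupt-controller driver instead. */
typedef void (*isr_fn)(void);
static isr_fn vector_table[1];

void register_isr(unsigned irq, isr_fn fn)
{
    if (irq < sizeof vector_table / sizeof vector_table[0])
        vector_table[irq] = fn;
}

/* Model of the processor taking an interrupt: jump to the handler. */
void raise_irq(unsigned irq)
{
    if (irq < sizeof vector_table / sizeof vector_table[0] && vector_table[irq] != NULL)
        vector_table[irq]();
}
```

In this model, `main` would call `register_isr(0, irq0_isr)` at power-on; afterwards each IRQ-0 runs c_test on the packed arguments, stores its result for `next_addr`, and signals the clock control logic to resume.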
An OS image can be booted on the processor, and, through the instrumentation code, the testbench code can be executed with support from libraries (e.g., C/C++ libraries) and the OS. This software stack covers use cases that rely on system calls, such as “AVIPs” (accelerated verification intellectual properties from CADENCE®), in which the software portion of an AVIP executes on a processor and the hardware portion of the AVIP is implemented on an FPGA.
In generating the instrumentation code, the design tool generates code that provides an interface to SystemC library functions for communicating the stimuli to the instrumentation interface circuitry through the operating system. The SoC can be configured to boot the OS on a processor at emulation runtime.
Memory and storage arrangement 820 includes one or more physical memory devices such as, for example, a local memory (not shown) and a persistent storage device (not shown). Local memory refers to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. Persistent storage can be implemented as a hard disk drive (HDD), a solid state drive (SSD), or other persistent data storage device. System 800 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code and data in order to reduce the number of times program code and data must be retrieved from local memory and persistent storage during execution.
Input/output (I/O) devices such as user input device(s) 830 and a display device 835 may be optionally coupled to system 800. The I/O devices may be coupled to system 800 either directly or through intervening I/O controllers. A network adapter 845 also can be coupled to system 800 in order to couple system 800 to other systems, computer systems, remote printers, and/or remote storage devices through intervening private or public networks. Modems, cable modems, Ethernet cards, and wireless transceivers are examples of different types of network adapter 845 that can be used with system 800.
Memory and storage arrangement 820 may store an EDA application 850. EDA application 850, being implemented in the form of executable program code, is executed by processor(s) 805. As such, EDA application 850 is considered part of system 800. System 800, while executing EDA application 850, receives and operates on input code 801. Input code 801 includes the testbench source code and DUT specification. In one aspect, system 800 compiles the testbench code and performs a design flow on the DUT code. The design flow can include synthesis, mapping, placement, routing, and generating configuration data for programmable logic. System 800 generates an executable testbench, instrumentation code, and configuration data as output code 860.
EDA application 850, input code 801, output code 860, and any data items used, generated, and/or operated upon by EDA application 850 are functional data structures that impart functionality when employed as part of system 800 or when such elements, including derivations and/or modifications thereof, are loaded into an IC such as a programmable IC causing implementation and/or configuration of a circuit design within the programmable IC.
Referring to the PS 902, each of the processing units includes one or more central processing units (CPUs) and associated circuits, such as memories, interrupt controllers, direct memory access (DMA) controllers, memory management units (MMUs), floating point units (FPUs), and the like. The interconnect 916 includes various switches, busses, communication links, and the like configured to interconnect the processing units, as well as interconnect the other components in the PS 902 to the processing units.
The OCM 914 includes one or more RAM modules, which can be distributed throughout the PS 902. For example, the OCM 914 can include battery backed RAM (BBRAM), tightly coupled memory (TCM), and the like. The memory controller 910 can include a DRAM interface for accessing external DRAM. The peripherals 908, 915 can include one or more components that provide an interface to the PS 902. For example, the peripherals can include a graphics processing unit (GPU), a display interface (e.g., DisplayPort, high-definition multimedia interface (HDMI) port, etc.), universal serial bus (USB) ports, Ethernet ports, universal asynchronous receiver-transmitter (UART) ports, serial peripheral interface (SPI) ports, general purpose I/O (GPIO) ports, serial advanced technology attachment (SATA) ports, PCIe ports, and the like. The peripherals 915 can be coupled to the MIO 913. The peripherals 908 can be coupled to the transceivers 907. The transceivers 907 can include serializer/deserializer (SERDES) circuits, MGTs, and the like.
Various logic may be implemented as circuitry to carry out one or more of the operations and activities described herein and/or shown in the figures. In these contexts, a circuit or circuitry may be referred to using terms such as “logic,” “module,” “engine,” “generator,” or “block.” It should be understood that elements labeled by these terms are all circuits that carry out one or more of the operations/activities. In certain implementations, a programmable circuit is one or more computer circuits programmed to execute a set (or sets) of instructions stored in a ROM or RAM and/or operate according to configuration data stored in a configuration memory.
Though aspects and features may in some cases be described in individual figures, it will be appreciated that features from one figure can be combined with features of another figure even though the combination is not explicitly shown or explicitly described as a combination.
The methods and system are thought to be applicable to a variety of systems for co-simulation. Other aspects and features will be apparent to those skilled in the art from consideration of the specification. It is intended that the specification and drawings be considered as examples only, with a true scope of the invention being indicated by the following claims.