The present invention relates generally to hardware simulation and, more specifically, to high-speed, object-oriented hardware simulations.
Electronic hardware design is typically performed using register transfer level (RTL) descriptions of the device being designed. Hardware description languages such as Verilog allow hardware designers to describe the electronic devices that they are designing, and to have those descriptions synthesized into a form that can be fabricated.
The process of producing electronic devices is time-consuming and expensive. As a result, various simulation systems have been developed to permit hardware designs to be verified prior to actually producing an electronic device. Typically, a description of an electronic device is exercised using a simulator. The simulator generally includes a simulation kernel that runs the simulation either in software, or using simulation hardware, which typically consists of a collection of programmable logic devices or specially designed processing units. Use of simulation for the purpose of verifying hardware designs is a regular part of the hardware design cycle.
Many current hardware designs are intended to be used extensively in conjunction with software applications. Due to the slow speed of many current simulators, it may be necessary to delay much of the design and testing of such software until after early versions of the actual hardware become available. As a result, software development may not be possible until relatively late in the design cycle, potentially causing significant delays in bringing some electronic devices to market.
In view of the above, it is desirable to create high-speed simulations of the system so that software developers may begin working on applications while the hardware engineers are still designing the actual implementation. Some systems have, in fact, been developed to offer operating speeds sufficient to permit software testing. In other words, software developers can simulate the behavior of the modeled hardware in response to their code. Reaching such simulation speeds, however, generally requires operating trade-offs. For example, a high-speed simulation may not fully model the functionality of the hardware, perhaps abstracting components to the point of being accurate in terms of interface only. As a result, such a simulation is limited in its reflection of how the system—software and hardware—will eventually run. To improve modeling accuracy, as the hardware components are developed, simulations representing closer approximations of the actual devices may be introduced. But again, due to the trade-off between capability and speed, such simulations generally run slowly and consequently limit the efficiency with which hardware and software may be co-designed.
The best way to model is to accurately reproduce the timing requirements of the system being simulated. Faithfully reproducing all timing in simulation ensures correct modeling. For example, all simulated devices may be bound to a system clock, and with each clock cycle executed explicitly; in this way, no device will lose synchronization with other devices, and premature device execution can be prevented. Unfortunately, the price of this accuracy is slow execution due to the need to process every clock cycle.
The present invention increases the speed and versatility of hardware simulations. In general, the invention represents hardware components as executable objects that not only may be tested and run individually to simulate the behavior of a modeled hardware device, but which can be organized into a multi-object circuit simulating device behaviors and interactions among them. The various devices respect each other's timing requirements, so the simulation remains cycle-accurate, but do not require the simulation to explicitly execute each clock cycle in order to maintain overall timing integrity. Using the invention, a designer may define objects to model the behavior of hypothetical or actual devices, and then determine their interconnections and interactions. The invention ensures consistent timing and proper execution without explicitly executing each clock cycle.
In accordance with the invention, software objects are interconnected in a manner that prevents race conditions between the objects (i.e., between the devices that the objects model) and accommodates inconsistent timing requirements. Moreover, the invention permits designers to transparently distribute object interconnections among multiple processors or computers, e.g., via a network. To accomplish these objectives, the invention introduces “interconnection objects” that act as intermediaries between communicating devices, ensuring that data does not pass between or among devices until an appropriate state has been reached. The interconnection devices may also execute integrity checks to prevent transmission of illegal values, as well as resolution and/or other functions on incoming and/or outgoing data.
Accordingly, in a first aspect, the invention comprises a method for simulating hardware parallelism. In accordance with the method, a plurality of hardware objects, each representing at least a portion of a hardware device, are provided, as is an interconnection object. The interconnection object is in communication with one or more first hardware objects (e.g., objects producing an output) and one or more second hardware objects (e.g., objects expecting, as input, the output of the first hardware object(s)). For simplicity, the ensuing discussion will focus on the simple case of a single first hardware object, a single second hardware object, and a single interconnection object, it being understood that pluralities of hardware objects are contemplated as well.
The interconnection object comprises a source variable and a destination variable. In accordance with the invention, the interconnection object's source variable receives output data from the first hardware object; the output data is intended for receipt by the second hardware object. Based at least in part on receipt of an update command by the interconnection object, the destination variable is set equal to the source variable, and the data from the destination variable is provided to the second hardware object as input.
The first and second hardware objects may be the same object or different objects. The output data may be based at least in part on a plurality of values supplied by the first hardware object, on a random function, and/or on a resolution function. The source variable may receive output data from one or a plurality of hardware objects.
The update command may be received after a signal transition, e.g., a clock pulse, a reset, and/or an arbitrary function. The first and second hardware objects may reside within separate computational processes, may be executed by different processors, and indeed, may reside on different computers. The source variable may contain one or a plurality of source data values, and the interconnection object may be configured to detect illegal source data values and prevent their storage in the source variable. The source data values may, for example, represent states of one or more pins of the hardware device corresponding to the first hardware object. For example, the first hardware object may represent one or more buses, in which case the source data values comprise one or more states of the bus(es), and the interconnection object may implement a resolution function to accommodate different bus types. Alternatively, the source data values may correspond to one or more states of one or more control signals coming from the first hardware object.
Similarly, the destination variable may contain a plurality of destination data values, and the interconnection object may be configured to detect illegal values and prevent their storage in the destination variable. The destination data values may comprise states of one or more pins of the second hardware object, which may be, for example, one or more buses. Once again, the interconnection object may implement a resolution function to accommodate different bus types. The destination data values may comprise a plurality of states of one or more control signals going to the second hardware object.
In another aspect, the invention comprises an apparatus for simulating hardware parallelism. The apparatus comprises first and second hardware objects, each representing at least a portion of a design of a hardware device, and an interconnection object in communication with the first and second hardware objects. The interconnection object includes a source variable and a destination variable and is responsive to an update command. The interconnection object is configured to receive, in the source variable, output data from the first hardware object intended for receipt by the second hardware object. It is also configured to set the destination variable equal to the source variable based at least in part on receipt of the update command, and to thereupon provide data from the destination variable to the second hardware object as input.
The interconnection object may comprise means for processing the output data based at least in part on a randomized function and/or a resolution function. The hardware objects may reside within separate computational processes, may be executed by different processors, and may even reside on different computers. The source and/or destination variables may be configured to accommodate a plurality of data values. The interconnection object may comprise means for detecting illegal source data values and preventing their storage in the source variable, as well as means for executing a resolution function to accommodate, for example, different bus types. The destination data values may, for example, comprise a plurality of states of one or more buses, in which case the interconnection object may comprise means for executing a resolution function to accommodate different bus types. The destination data values may comprise one or more of states of a control signal going to the second hardware object.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating the principles of the invention by way of example only.
The foregoing and other objects, features, and advantages of the present invention, as well as the invention itself, will be more fully understood from the following description of various embodiments, when read together with the accompanying drawings, in which:
In brief overview,
In one embodiment, the method 100 begins by providing a system-level model (STEP 102) such as a SystemC design environment. The system-level model, written in a software language such as, but not limited to, SystemC, emulates a physical system at a high level. In a simple example, a system-level model may represent a hand-held calculator, with functions for adding, subtracting, multiplying and dividing. Initially, the calculator model may implement a function such as adding by taking in two parameters and utilizing the native “+” implementation provided by the programming language. Using high-level methods to emulate functionality is advantageous in terms of performance, but does not reflect the way a real system would behave. To emulate actual system behavior, it is necessary to model the steps performed by a real calculator. The parameters would be put into physical registers within the system, a binary addition would be performed on the registers, the result would be put on a data bus, and the output would be read from the bus and displayed on a screen. While emulating each register and bus of a calculator is fairly simple, emulating every component of a system such as a desktop computer is a far more complex task not amenable to real-time modeling. Therefore, the system-level model is divided into functional blocks (STEP 104) of code representing the higher-level hardware components of the system, so that each component may be developed independently from the rest of the system.
Once the system-level model is divided into functional blocks (STEP 104), application programming interfaces (APIs) to those blocks are provided. The APIs mimic the way a physical system would interact with the hardware device being modeled. Using the calculator example, the physical calculator may have an adder component that has two sets of data-in pins and one set of data-out pins. The physical calculator would place the parameters on the adder's data-in pins and on the next system clock cycle, check the data-out pins for the result (though it should be noted that the addition step may be performed asynchronously). The binary addition step is performed by the adder component. Like the physical calculator, the calculator model may have an adder functional block that takes in two input parameters and presents one output parameter. The model would pass in the parameters to be added and on the next simulated clock cycle it would read the output parameter. This behavior mimics the way the physical calculator's components interact. In a physical system, components are generally not aware of the implementation specifics of other components; they only “see” the other components' input and output pins.
Communication between functional blocks defined within the system-level model is trivial; the system developer has direct access to a functional block's inputs and outputs through native APIs (i.e., APIs specifically associated with the functional block and consistent with other APIs used with the system-level model) or address pointers. It is desirable, however, to allow the system-level model to also interact with functional blocks written outside the system (“hardware objects”) as if they were natively defined, i.e., written expressly for interaction with the system environment. Developing hardware objects outside the scope of a specific system allows developers to reuse objects they have created for other systems, to use programming languages with which they are already comfortable, or even to incorporate proprietary hardware objects for which they may not have the source code. These hardware objects may be written using any of a number of programming languages such as, but not limited to, Verilog, HDL, C, C++, SystemC, Java, or low-level assembly. The objects may be source code or object code that was compiled using a compiler such as the SPEEDCompiler program supplied by Carbon Design Systems, Inc., Waltham, Mass. However, because such reused objects are not native to the system-level model, and the system therefore is not configured to interact with them directly (e.g., their values or pointers are not natively defined with respect to the system-level model), a mapping layer or “wrapper” is provided (STEP 106) to enable the system-level model to communicate with non-native hardware objects. The wrapper provides a defined interface, generalized with respect to the hardware device being simulated, with which the system-level model—i.e., other objects defined within the system-level model or aspects of the model itself—may interact while hiding the details of declaring and instantiating the objects, as well as facilitating any communications that may flow from one object to another. Beneficially, this allows the developer to swap hardware object files during the compile (STEP 108) or linking (STEP 110) step in favor of more efficient or more complete implementations. For example, a system-level model emulating a desktop computer may examine the value on the data-out pins of a soundcard. An object provided by a first vendor may refer to the pointer representing the data-out pins as sndCard.d_out. An implementation of the same object provided by a second vendor may refer to the same pins using a pointer named soundCard.dataOut. To swap the objects in a system that does not utilize a wrapper, the system-level model code would need to be changed to import, declare, and instantiate the correct object instance and to call the appropriate variable. Instead, one embodiment of the present invention allows the system to interact with wrappers in a standard, unchanging manner and let each wrapper declare the correct object, instantiate it, and map the inputs and outputs from the system to the correct hardware object variables. With reference to
The mapping that the wrapper creates (“mapping layer”) typically has several modules that facilitate object creation and communication: the declaration module 144, the instantiation module 146, the sensitization module 148, the initialization module 150, the execution module 152, and the output scheduling module 154. It is understood that the following description pertains, in reference to the steps of instantiation, initialization and execution, to run-time behavior of a hardware object and a system-level model. All steps may be coded before the compilation step of the method 100, but the interactions described pertaining to the instantiation, initialization, and execution of the object, preferably occur at run-time. The first step performed by the mapping layer is declaration, though as one skilled in the art is aware, declaration, instantiation, and initialization may take place in any order and/or the steps (or aspects thereof) may interleave depending on the developer's implementation style and practices.
In one embodiment of the present invention, a wrapper 130ML begins object declaration by importing a library that defines the necessary classes or data structures that represent the hardware object 130. The library contains a template of what the object will be, defining inputs and outputs (including a pin-level interface 160) as well as functions and methods, e.g., constructors, which create objects from templates, and entry methods, which provide system-level access to an object, accessible to a calling object or environment. The wrapper 130ML will use this template to create “handles” that facilitate access to the object, e.g., a pointer to an address in memory, to the hardware object and/or to its components for a calling system to access once the object is instantiated. Because the object, its variables and methods are shielded from the system-level model 125 by the wrapper 130ML, the wrapper 130ML will use the handles to pass data between the system 125 and the object 130, reading from and writing to the handles as appropriate. For example, to simulate a FIFO buffer, a handle is declared for the buffer itself, its reset pin, its push clock pin, and its data-in pins. In some embodiments, the wrapper provides a one-to-one mapping of inputs and outputs. For instance, using the FIFO example, a single-bit port of the hardware device such as the reset or push clock may each be represented as a single Boolean variable. In other embodiments, the wrapper may use a one-to-many mapping, a many-to-one mapping, or a many-to-many mapping. Multiple single bit ports, such as a set of data-in pins on the FIFO, may be mapped to a single unsigned integer value (with the lowest significant digit, in binary representation, corresponding to the first pin of the data-in set of pins). The wrapper generally performs these translations via mapping functions. For example, an input that is presented in an 8-bit representation at the system level may be converted to a 32-bit representation at the hardware object level by running the 8-bit number through a 32-bit adder. Though the mapping is still considered one-to-one, the input is translated into a format the hardware object can accommodate. Handles typically represent an input or output for the hardware object, but in some embodiments, a handle is declared to access a waveform of the signals that flow through the hardware object. Such a waveform allows for generation of a human-readable graph of what data went into and out of the hardware object at what time and may be used for performance measurements and hardware design decisions. This pin-to-pin mapping is commonly referred to as API mapping and is generally cycle accurate and clock-bound.
Once declaration is complete, the hardware object may be instantiated by the instantiation module 146. Instantiation takes the template provided by the declaration module and creates a blank hardware object in memory. The object and its components, such as the input and output variables, now exist in memory but are not yet “hooked in” to the inputs and outputs of the local variables of the mapping layer. The system-level model, the wrapper, and the hardware object all exist in memory, but system-level model may not communicate with the hardware object's components, and vice versa, yet. The initialization module 150 obtains, from the object that was instantiated, pointers to its internal variables representing the pins and methods to be exposed, assigning them to the local variables and methods, respectively, of the wrapper. Once this has been completed, the system-level model may access the hardware object via the wrapper. The hardware object may raise events to the system-level model through the wrapper as well.
Before a hardware object is executed, it is sensitized to changes on its inputs via the sensitization module 148. Sensitization involves making the system-level model aware of every change to a hardware object's inputs that will result in the changing of one of its outputs. For example, if a new value placed in the push clock variable of a FIFO object causes the object to place data into its data-out variable, then the system-level model is “sensitive” to the change of the hardware object's push clock. The collection of signals that influence object output is termed a “sensitivity list.” The wrapper 130ML makes the system-level model 125 aware of the hardware object's sensitivity list by passing the sensitive pins of the pin-level interface 160 to the system-level model 125 and registering those pins with the system-level model. In some embodiments, the system-level model's execution kernel, when it attempts to put values into the pin variables, will raise an event that will “wake up” the hardware object 130 to the forthcoming changes to its input pins. Typical signals to which an object is sensitive to include changes to its clock pin, changes to asynchronous reset pins it may have, or changes to inputs which cause changes to the object's output pins, yet do not require the toggling of a clock or a reset. In any of these instances, and others, the sensitivity list may be level sensitive as opposed to edge sensitive.
Once the object 130 is instantiated, sensitized, and initialized, the object 130 may be executed via the wrapper 130ML by signals from the system-level model 125 (i.e., signals produced by other objects in accordance with the system-level design or from other system-level components). The system-level model 125 communicates with the wrapper 130ML as if it were communicating with a hardware device, placing inputs into the wrapper's input variables as if they were the pins of the physical object. The wrapper checks for changes to the input variables defined in the sensitivity list and if there are changes, the wrapper passes the inputs to the corresponding handles of the instantiated hardware object's components. The hardware object executes and places output data in its output variables. The wrapper then copies the data from the handles of the object's output variables to its output variables, thereby returning output data from the simulated hardware to the system-level model at the expected output pins (via the pin-level interface 160).
A more detailed view of object organization is shown in
The interaction between an object's wrapper and the system-level model may follow the boundaries of the system's clock(s), operating in the one-cycle-to-one-cycle mode described above, or the two may utilize an transaction-based interaction model. In a transaction-based simulation, the system-level model calls the wrapper only when necessary, skipping potentially thousands of “ticks” (each of which represents an absolute measure of system time not necessarily coinciding with a clock cycle) at a time. A non-cycle-accurate system is useful when writing higher-level application software or hardware drivers. For example, rather than being required to set every individual pin required to a complete transaction, which may iterate over several clock cycles, a system may instead simply call a busobject.write( ) method and pass in an array representing the value to be written. This step, known as “abstract mapping,” effectively takes an abstract concept such as a write command and turns it into a series of transactions and pin interactions that the object-calling system need not execute directly. The system therefore is not bogged down calculating its state for every clock cycle if nothing significant is occurring. Instead, the system is allowed to jump to the points in the system/hardware interaction that are useful to the developer.
In an abstract mapping scenario, an arrangement similar to the one above is used, i.e., a system-level model interacts through a wrapper with a hardware object. However, because the system issues high-level abstract commands while the hardware object is expecting low-level changes to its pins, translation objects or methods are employed to facilitate communication. With reference to
An abstract function such a write operation is, at the implementation level, composed of a series of pin state changes. For example, a physical hardware component, before filling a data bus, may first request permission to write to the bus. It may do this on its first clock cycle (read from a clock pin). Permission to write may not be granted on the next clock cycle but may be granted on, for example, the third, at which point the hardware actually writes data to the bus pins. Lastly a write acknowledgement may be returned on the fourth cycle. In the API-mapping approach, the system-level model 125 iterates through each clock cycle, computing the entire state for each object on each cycle—even though, as in this example, not every cycle is relevant to the operation of the hardware component in question. In abstract mapping, the system-level model 125 may issue a bus.write( ) command and jump ahead four clock cycles to the next point in the simulation relevant to that command, i.e., the point where that value is written to the bus, or later still, e.g., to a point where execution of the command is relevant to the simulation as a whole (such as when the write data is actually used). Because abstraction mapping does not necessarily depend on a system clock, yet typically needs an internal notion of time, the mapping layer 130ML may include a control object 177 to determine when to advance to the next point in the transaction and in the system-to-object interaction timeline. Aside from pin-level or abstract interactions that model system/hardware object behavior, hardware objects may expose to the system, through the wrapper, an object API 178 comprising methods that relate specifically to the object as a piece of software. Such routines may be, but are not limited to, execution routines, diagnostics, garbage-collection routines, destructors, or other methods that may not relate to modeling system/hardware interactions. Coordinating transactions within the abstract mapping is discussed below.
The overall execution flow in an abstract-mapping regime is shown in
Though software typically processes methods and function calls sequentially, hardware often executes events in parallel. It may be necessary for certain hardware operations to take place before others can validly take place (e.g., “race conditions” described below in connection with
In some situations, two hardware objects are involved, e.g., the output 212 of the first hardware object 2021 provides input 214 for the second hardware object 2022. In other situations, only a single object is involved, i.e., the output of the object is additionally used as an input to the object. Still other situations involve multiple hardware objects, each with multiple inputs and outputs. In any of these situations, data is not transferred directly between objects; instead, output data on the pins of a hardware object is copied to the inputs of the interconnection object 204, and the interconnection object 204 stores this output until transfer is appropriate. Output data 222 may be in any form produced by a hardware object. It may be, but is not limited to, a single value (e.g., simulating a single pin 215) or an array of values (e.g., from a single object or multiple objects); a series of values (e.g., bits) for a given period of time (e.g., a multitude of bus states for a given bus 216 for a specified interval); one or more control states (e.g., 1, 0, X or Z) for a given control signal 218; a series of bits from one or more simulated hardware pins representing a single state from each of one or more buses for a given point in time; and/or a single state from each of one or more control signals for a given hardware object.
The interconnection object 204 generally contains one or more source variable(s) 220, or placeholders in memory, to store data relevant to the interaction between hardware objects. These source variables serve as holding points for data that flows from one component or series of components to another. Output data 222, which may originate from multiple hardware objects (e.g., the objects 2021, 2022 as shown), en route to the source variable(s) 220 of the interconnection object 204, may also be processed through one or more functions. In one embodiment, one function is a resolution function 224 which may, for example, select one output data value from a group of competing data values using specified criteria. Examples of such functions are an AND function or an XOR bitmask. In another embodiment, one function is a random value function 226. Examples of the random function 226 include assigning a random value based on a system call, using a preset value, or randomly choosing between the competing values. In yet another embodiment, a resolution function accommodates multiple drivers for a single signal or bus 228, such as a bus that is expected to have “noise” values on it (e.g., a modem's data-in bus). As the interconnection object 204 receives the output data, validity checks 230 may be performed thereon to avoid storing illegal values (e.g., a clock signal having a value that is neither zero nor one). Any illegal values may be discarded (as indicated at 232), ignored, or output for diagnostic purposes. After receiving the output data and excluding illegal values, the source variable 220 stores the output data.
After the output data is stored in the source variable 220, the interconnection object 204 receives an update command 210 at the end of the current “time” indicating that the current time in a clock-bound system or the current transaction in a transaction-bound system is complete (or nearly so). The update command is generally issued before the next signal transition 234 occurs, which may be, but is not limited to, a clock pulse 236, a reset 238, or the result of an arbitrary function 240 such as a “slow” serial bus or a network packet delay emulator. An arbitrary function 240 typically includes cycle time as an independent variable. In some circumstances, the update command may be received immediately after the output data is stored. In other circumstances, the command may be received after one or more other hardware objects are executed. Waiting for an update command to flow data, rather than propagating data immediately between components, allows the system to correctly model certain behaviors while respecting hardware parallelism, e.g., avoiding “race conditions.” An example of a race condition is shown in
Once the update command 210 is received, the interconnection object 204 next copies data from the source variable 220 to the destination variable 242. Delaying the copying operation until the update command 210 is received allows hardware objects to use the current state of the simulated hardware up to the very last iteration or operation of the system before the system time or state is advanced. The destination variable 242 is generally similar to the source variable 220. The destination variable may contain, for example, a single value; an array of values; a series of values (e.g., bits) intended to correspond to a simulated hardware pin 244, such as multiple bus states 246 for a given bus over a period of time; multiple states for a single control signal 248 going to a hardware object; a series of bits intended for multiple simulated hardware pins for a single point in time, e.g., a single state from each of a multitude of buses; or a single state from each of a plurality of control signals going to a hardware object. As the data from the source variable 220 is copied to the destination variable 242, validity checks 250 may be performed on the incoming data so as not to store any illegal values. One such check may be a resolution function to accommodate multiple drivers for a single signal or bus 252 such as WAND or WOR buses. Any illegal values may be kept in a separate memory for diagnostic purposes or may be discarded (as indicated at 254). A valid value or values is (are) stored in the destination variable(s) 248 of the interconnect object 204.
After the copy is made from the source variable 220 to the destination variable 242, the second hardware object 2022 receives (as indicated at 208) the value(s) in the destination variable(s) 242 as input 214. Again, the objects 2021, 2022 may be the same object or different objects (or multiple objects). Though
Although interconnection objects avoid problems of parallelism and inconsistent timing, even clock-bound hardware objects may not be synchronized to a system-wide clock; indeed, to increase simulation speed it is desirable to avoid unnecessary cycle executions and instead confine transaction processing to meaningful operations. This may be accomplished as illustrated in
Referring to
The update objects 302 are generally in communication with one or more hardware objects 304. The hardware objects 304, which are responsive to communications from the update objects 302, are also in communication with, in some embodiments, transactor objects 328 that perform various abstract functions (e.g. read( ) and write( ) as described above). The communications sent by the update objects 302 and transactors 328 to the hardware objects 304 may be, but are not limited to, method calls, functions, or changes to the objects' input pins.
The master object 306 generally is in communication with both the update objects 302 and the hardware objects 304 and generally provides overall control. Referring to
Refer now to
From the foregoing, it will be appreciated that the systems and methods provided by the invention afford an efficient method for integrating a hard device represented in software into a system-level simulation, a method for communicating between hardware objects, and a method of control the execution of the objects and the communications between them.
One skilled in the art will realize the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein. Scope of the invention is thus indicated by the appended claims, rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.