The present invention relates, in general, to electronic design automation and electronic system level design automation for integrated circuits and applications and, more particularly, to an algorithmic electronic system level method, system and software for integrated application development for and design and simulation of integrated circuitry.
Electronic Design Automation (“EDA”) and Electronic System Level (“ESL”) design and simulation tool suites for integrated circuits (“ICs”) have evolved for a wide variety of architecture platforms, such as for embedded microprocessors, digital signal processors (“DSPs”), and application-specific integrated circuits (“ASICs”). In many instances, such design tool suites provide for acceleration of some computationally intensive tasks in custom hardware, with execution control and performance of other tasks retained in an embedded, instruction-based processor.
Many of the prior art EDA design and simulation tools have been designed to optimize gate-level performance in an IC and verify functionality at this detailed hardware level. These EDA tool suites, however, have been unable to integrate this level of verification with system level designs and requirements, for testing and verifying algorithmic performance and power and control specifications, for example.
In addition, prior art EDA and ESL design and simulation tool suites have generally been inapplicable to data flow processing architectures or data streaming architectures, which are designed to execute whenever input data exists and provide corresponding output data. Such data flow architectures have typically been difficult to design and model because typical data flow models, while accounting for data input and output, have insufficient control information for execution control and further fail to account for memory requirements, movements and flows. In addition, such prior art data flow models do not provide sufficient interface information or provide incompatible interfaces, so that one dataflow element cannot be connected automatically to another dataflow element. Indeed, prior art design and simulation tools instead assume infinite memory availability for data flow modeling. In addition, current design and simulation tool suites do not provide for self-contained, data-flow based task modules, which may be utilized for implementing more than one algorithm.
Traditional ESL design platforms have been unable to design efficient architectures without significant knowledge of the algorithms which will run on those architectures. Software (such as C, C++ or assembly code) may be considered merely a simulation model for a given architecture using Turing methods. As a consequence, a need remains for an ESL design platform which can incorporate optimized algorithms to create high quality IC systems which meet, if not surpass, performance and power requirements.
Prior art EDA and ESL design and simulation tool suites also have not provided an integrated environment for both architecture design (including data flow architecture design) and application development. In addition, prior art EDA and ESL design and simulation tool suites have not provided for functional simulation of algorithms concurrent with hardware simulations of the performance of the algorithm on the actual target IC. In prior art EDA and ESL design, separate sets of “test benches” are required and are created multiple times during the course of a design cycle.
As a consequence, a need remains for a design and simulation tool flow which can integrate both control flow and memory flow with data flow, and utilize such an integrated view to simulate and model computational elements which will implement a selected algorithm on an IC. Such a design and simulation platform should generate appropriate control and memory requirements, and provide a common platform for application development, using a modular and integrated data flow model having both control and memory flow and a modular, well-defined interface. A design and simulation platform should also provide an integrated solution, allowing an application developer to perform both a functional simulation of an algorithm or program and to concurrently perform a hardware simulation of the algorithm based upon the target architecture. Such a design and simulation tool suite should also provide for mapping of the algorithm directly to the target IC architecture, with the provision of a resulting compilation of the algorithm for the target IC architecture.
The exemplary embodiments of the invention provide an Algorithmic Electronic System Level (Algorithmic ESL or “AESL”) design and simulation platform, embodied as a system, methodology and software. The exemplary embodiments incorporate algorithmic representations into both application development and hardware development, providing a significant advance over current methodologies of hardware and software co-design.
Algorithmic representations are utilized as part of hardware (IC) design, and provide integrated modules for use in application development, functional verification and hardware verification. In exemplary embodiments, algorithmic representations may then be represented rather automatically in software or dataflow, functionally verified, and may then be mapped, simulated and verified concurrently with the target IC architecture. In addition, the models generated as part of the hardware verification process may then be utilized directly by a compiler for generation of corresponding code or netlists for performance of the algorithm on the target IC architecture.
Algorithmic representations are utilized as part of IC (hardware) design, utilizing an instruction (or control or compute primitive) and memory-based modeling platform. This platform provides an integrated “flow transform” which has a combined data flow representation, control representation, a memory representation, and an interface representation. The flow transform is architecture neutral. Each flow transform is also interface neutral, having a well-defined but generic interface, allowing a plurality of flow transforms to be interconnected (via memory interconnect for modeling) to define an algorithm. The instruction (or control) and memory-based modeling platform is also utilized to generate hardware descriptions, such as in a concurrent modeling language or system such as SystemC descriptions, which may then be modeled utilizing an integrated, system modeling and simulation platform, such as a SystemC modeling platform.
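As a purely illustrative sketch, and not the data model of the platform itself, a flow transform combining the four representations described above might be captured as a simple record. All names used here (`Port`, `FlowTransform`, `can_connect`, and the individual fields) are hypothetical, as is the assumption that two flow transforms may be interconnected when the facing interface descriptions are identical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Port:
    """Generic interface description (hypothetical fields): type name and bit width."""
    dtype: str
    width: int


@dataclass
class FlowTransform:
    """Hypothetical record bundling the four representations of one task."""
    name: str
    compute: Callable[[int], int]  # data flow: what the task computes
    fire_threshold: int            # control: operands required before the task fires
    buffer_depth: int              # memory: storage the task requires
    in_port: Port                  # interface: input side
    out_port: Port                 # interface: output side


def can_connect(producer: FlowTransform, consumer: FlowTransform) -> bool:
    """Assumed rule: transforms connect when the facing ports are identical."""
    return producer.out_port == consumer.in_port


scale = FlowTransform("scale", lambda x: 2 * x, 1, 4, Port("int", 16), Port("int", 16))
offset = FlowTransform("offset", lambda x: x + 1, 1, 4, Port("int", 16), Port("int", 32))

print(can_connect(scale, offset))  # int/16 output feeding an int/16 input
print(can_connect(offset, scale))  # int/32 output feeding an int/16 input
```

Keeping the interface description separate from the compute, control and memory fields is one way to let the same record be reused in more than one algorithm, since only the ports need to be examined when transforms are interconnected.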
In addition, using the inventive and integrated Algorithmic ESL design platform, an application developer may rely upon all of these various detailed functional and behavioral models and work at a higher level of abstraction, with all of the information from the various detailed views “rolled-up” or integrated into these higher, more abstract levels. In addition, as may be necessary or desirable, the application designer may also “drill-down” into the more detailed views and simulations, particularly to select among alternative architectures and implementations. When the application has been completed, the application may also be compiled directly for operation on the selected IC architecture.
A first exemplary method embodiment, for developing and simulating an integrated circuit architecture, comprises: (a) inputting an algorithm using an instruction language or computational primitive having control information; (b) decomposing the algorithm to a plurality of tasks having a first selected abstraction level; (c) for each task of the plurality of tasks, determining and combining data flow, control flow, and memory flow to form a flow transform of a corresponding plurality of flow transforms; (d) connecting the plurality of flow transforms using an interconnect between each flow transform to provide an algorithm representation; and (e) simulating the connected flow transforms.
The simulation step (e) may generate computation data paths, computation control, data flow interfaces, and memory requirements and statistics. The interconnect may be at least one of the following: a memory, a first-in first-out (FIFO) memory, a buffer, a circular buffer, a constant value, a switch, or a bus. In addition, the method may also include generating a hardware description of a plurality of computational elements comprising the plurality of flow transforms, wherein the hardware description is SystemC, Verilog, or VHDL.
In exemplary embodiments, the decomposition step (b) is hierarchical and preserves control information, either as part of the flow transform or separate from the flow transform. Also in exemplary embodiments, the simulation step (e) generates control bits for control of computational elements selected to implement a corresponding flow transform; may also generate the number and type of computational elements utilized to implement a corresponding flow transform; and also may generate a plurality of quantitative measures, the plurality of quantitative measures including time spent by data operands in interconnect and time spent by data operands in a compute path. The inputting step (a) may further comprise inputting a power, cycle, latency, or size requirement (P3 requirement), while the simulation step (e) may generate a plurality of quantitative measures (P3), such as power dissipation, integrated circuit size, and cycles utilized.
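By way of a minimal sketch only, the connecting and simulating steps may be pictured as a cycle-stepped loop over tasks chained by bounded FIFO interconnects, which also accumulates quantitative measures of the kind mentioned above. The function name `simulate`, the representation of each task as a (function, latency) pair, the FIFO depth, and the statistics keys are all assumptions made for illustration and are not taken from the platform.

```python
from collections import deque


def simulate(stages, inputs, fifo_depth=2):
    """Cycle-stepped simulation of tasks chained by bounded FIFOs.

    stages: list of (function, latency_in_cycles) pairs, latency >= 1 (assumed form).
    Returns the outputs plus simple statistics gathered during the run.
    """
    n = len(stages)
    fifos = [deque() for _ in range(n)]  # fifos[i] is the input FIFO of stage i
    busy = [None] * n                    # [result, cycles_left] while a stage computes
    pending = deque(inputs)
    outputs = []
    stats = {"cycles": 0, "fifo_wait": 0, "compute": 0, "peak_fifo": 0}
    while len(outputs) < len(inputs):
        stats["cycles"] += 1
        # Visit stages from sink to source so downstream space frees up first.
        for i in reversed(range(n)):
            fn, latency = stages[i]
            if busy[i] is None and fifos[i]:
                busy[i] = [fn(fifos[i].popleft()), latency]
            if busy[i] is not None:
                if busy[i][1] > 0:
                    busy[i][1] -= 1
                    stats["compute"] += 1      # operand spent this cycle in a compute path
                if busy[i][1] == 0:
                    if i == n - 1:
                        outputs.append(busy[i][0])
                        busy[i] = None
                    elif len(fifos[i + 1]) < fifo_depth:
                        fifos[i + 1].append(busy[i][0])
                        busy[i] = None         # otherwise stall until space appears
        if pending and len(fifos[0]) < fifo_depth:
            fifos[0].append(pending.popleft())
        occupancy = sum(len(f) for f in fifos)
        stats["fifo_wait"] += occupancy        # operand-cycles spent in interconnect
        stats["peak_fifo"] = max(stats["peak_fifo"], occupancy)
    return outputs, stats


stages = [(lambda x: x * 2, 1), (lambda x: x + 1, 2)]
outputs, stats = simulate(stages, [1, 2, 3])
print(outputs)  # [3, 5, 7]
print(stats)
```

In this sketch the peak FIFO occupancy stands in for a memory requirement, so the model does not assume unlimited memory: a producer stalls whenever the FIFO it feeds is full.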
In another exemplary embodiment, a computer-implemented method for developing and simulating an integrated circuit architecture, comprises: (a) determining at least one task corresponding to an algorithm; (b) for the at least one task, determining data flow, control flow, and memory flow to form a flow transform; (c) providing a corresponding interconnect for input to and output from the flow transform; and (d) using a processing device, simulating the flow transform having the memory interconnect. The simulation step (d) may further comprise at least one of the following simulations: individually simulating data flow, individually simulating control flow, individually simulating memory flow, or simulating any selected combination of data flow, control flow, or memory flow.
In exemplary embodiments, the method may also include inputting an algorithm using an instruction language or computational primitive having control information and interface information; extracting parallel computation capability; and hierarchically decomposing the algorithm to form a plurality of tasks having a first selected abstraction level, the plurality of tasks including the at least one task. The interface information may be at least one of the following: a data type, a data width, an amount or number of bytes, a latency, a delay. In addition, the method may also include generating control bits for control of computational elements selected to implement a corresponding flow transform.
In another exemplary embodiment, a system for developing and simulating an integrated circuit architecture comprises: an interface to receive an algorithm having control information; a memory; and a processor coupled to the interface and to the memory, the processor adapted to simulate a plurality of flow transforms connected using a memory interconnect to represent the algorithm, at least one flow transform of the plurality of flow transforms comprising data flow, control flow, and memory flow of a corresponding task of the algorithm.
In another exemplary embodiment, a machine-readable medium storing instructions for developing and simulating an integrated circuit architecture comprises: a first program construct for determining at least one task corresponding to an algorithm; a second program construct for determining data flow, control flow, and memory flow to form a flow transform for the at least one task; a third program construct for providing a corresponding memory interconnect for input to and output from the flow transform; and a fourth program construct for simulating the flow transform having the memory interconnect.
In exemplary embodiments, the machine-readable medium may also include a fifth program construct for inputting an algorithm using an instruction language having control information; a sixth program construct for hierarchically decomposing the algorithm to form a plurality of tasks having a first selected abstraction level, the plurality of tasks including the at least one task; a seventh program construct for generating a hardware description of a plurality of computational elements comprising the plurality of flow transforms, wherein the hardware description is SystemC, Verilog, or VHDL, and for generating control bits for control of computational elements selected to implement a corresponding flow transform.
In another exemplary embodiment, a method for developing and simulating an integrated circuit architecture comprises: inputting an algorithm having control information and inputting a power or performance requirement; hierarchically decomposing the algorithm to a plurality of tasks having a first selected abstraction level; for each task of the plurality of tasks, determining and combining data flow, control flow, and memory flow to form a flow transform of a corresponding plurality of flow transforms; connecting the plurality of flow transforms using a first-in first-out memory interconnect between each flow transform to provide an algorithm representation; simulating the connected flow transforms; generating a hardware description of a plurality of computational elements comprising the plurality of flow transforms; modeling the plurality of computational elements; and generating control bits for control of computational elements selected to implement a corresponding flow transform.
In an exemplary embodiment, a computer-implemented method for electronic system level design and verification is also provided. An exemplary method comprises: (a) receiving an application as design input; (b) performing a first functional simulation of the application to provide a functional application model; (c) verifying the functional application model; (d) providing the verified functional application model in a hardware simulation compatible format; (e) performing a second functional simulation using the verified functional application model in the hardware simulation compatible format and using an integrated circuit architecture model to provide a functional architecture model; and (f) comparing the functional architecture model with the verified functional application model. The exemplary method may also include generating a plurality of cycle-accurate, transactional-accurate, or functionally-accurate computational element models, generally in the hardware simulation compatible format; and incorporating the plurality of cycle-accurate, transactional-accurate, or functionally-accurate computational element models into the integrated circuit architecture model.
In exemplary embodiments, the step (a) of receiving the application may also further comprise: receiving a plurality of architecture definition files; receiving a plurality of dataflow diagrams; and receiving performance specifications. In addition, the step (d) of providing the verified functional model may also further comprise: providing the verified functional application model as an application netlist of computational elements and interconnections. In exemplary embodiments, the method may also include verifying the functional architecture model; and using the verified functional architecture model, compiling the application to an integrated circuit architecture represented by the integrated circuit architecture model.
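The comparison between the verified functional application model and the functional architecture model can be illustrated, under stated assumptions, by driving both with the same stimulus and listing any points of disagreement. The example algorithm (a 3-tap moving sum), the 16-bit accumulator width, and the function names are all hypothetical and chosen only to make the sketch concrete.

```python
def functional_model(samples):
    """Reference behaviour of the application: a 3-tap moving sum (assumed example)."""
    return [sum(samples[max(0, i - 2): i + 1]) for i in range(len(samples))]


def architecture_model(samples, width=16):
    """The same algorithm on a target with a `width`-bit accumulator (assumed)."""
    mask = (1 << width) - 1
    out, window = [], [0, 0, 0]
    for s in samples:
        window = window[1:] + [s]
        out.append(sum(window) & mask)  # wrap-around models the finite datapath
    return out


def compare(reference, candidate):
    """Return the indices at which the two models disagree."""
    return [i for i, (r, c) in enumerate(zip(reference, candidate)) if r != c]


stimulus = [1, 2, 3, 4, 5]
mismatches = compare(functional_model(stimulus), architecture_model(stimulus))
print("verified" if not mismatches else f"mismatch at {mismatches}")
```

Because a single stimulus feeds both models, no second test bench is written for the hardware simulation. A stimulus such as `[40000, 40000]` would overflow the assumed 16-bit accumulator, and the comparison would then report the disagreement at index 1.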
In another exemplary embodiment, a computing system for algorithmic electronic system level design comprises: a plurality of databases, a first database of the plurality of databases adapted to store a plurality of functional models, a second database of the plurality of databases adapted to store a plurality of computational element models, and a third database of the plurality of databases adapted to store a plurality of hardware definition representations; an application design processor coupled to the first database, the application design processor adapted to perform a first functional simulation of an algorithm using a plurality of computational element architecture definitions to generate a first selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; a control and memory modeling processor coupled to the second database, the control and memory modeling processor adapted to generate a plurality of flow transforms from the algorithm and to convert the plurality of flow transforms into the plurality of computational element models; and a system simulation processor coupled to the second database and the third database, the system simulation processor adapted to convert the plurality of computational element models into the plurality of hardware definition representations and to perform a second functional simulation of the algorithm using the plurality of computational element models corresponding to the first selection and the corresponding control code.
In exemplary embodiments, the control and memory modeling processor may be further adapted to generate the plurality of flow transforms from the algorithm coded in an instruction-based language, and may also combine data flow, control flow, and memory flow information to generate a flow transform of the plurality of flow transforms. The system simulation processor may be further adapted to generate a cycle-accurate computational element model of the plurality of computational element models which further comprises control information for configuration of a configurable computational element.
In another exemplary embodiment, a system for electronic system level design and verification comprises: a first processor adapted to receive an application as design input, perform a first functional simulation of the application to provide a functional application model, verify the functional application model, and provide the verified functional application model in a hardware simulation compatible format; and a second processor coupled to the first processor, the second processor adapted to perform a second functional simulation using the verified functional application model in the hardware simulation compatible format and using an integrated circuit architecture model to provide a functional architecture model. In exemplary embodiments, the system may also include a third processor coupled to the first processor and to the second processor, the third processor adapted to determine a plurality of architecture definition files and to provide the plurality of architecture definition files as input to the first processor.
In exemplary embodiments, the second processor may be further adapted to generate a plurality of cycle-accurate computational element models in the hardware simulation compatible format and to incorporate the plurality of cycle-accurate computational element models into the integrated circuit architecture model. The first processor may also be further adapted to provide the verified functional application model as an application netlist of computational elements and interconnections; and to verify the functional architecture model. In exemplary embodiments, the system may also include a fourth processor coupled to the second processor, the fourth processor adapted to use the verified functional architecture model to compile the application to an integrated circuit architecture represented by the integrated circuit architecture model.
In another exemplary embodiment, a system for algorithmic electronic system level design comprises: an interface for receiving an algorithmic description; a memory adapted to store a plurality of computational element architecture definitions and a plurality of cycle-accurate computational element models; and a processor coupled to the memory and to the interface, the processor adapted to perform a first functional simulation of the algorithm using the plurality of computational element architecture definitions to generate a first selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; and to perform a second functional simulation of the algorithm using a plurality of cycle-accurate computational element models corresponding to the first selection and the corresponding control code.
In exemplary embodiments, the algorithm is defined by a plurality of interconnected dataflow diagrams. The processor may be further adapted to map the plurality of interconnected dataflow diagrams to a corresponding plurality of computational elements; and generate an interconnection among the corresponding plurality of computational elements as defined by the plurality of interconnected dataflow diagrams. Also, the processor may be further adapted to convert the algorithm into a plurality of flow transforms, and to combine data flow, control flow, and memory flow information to generate a flow transform of the plurality of flow transforms.
In exemplary embodiments, the processor may be further adapted to generate a cycle-accurate computational element model of the plurality of cycle-accurate computational element models which further comprises control information for configuration of a configurable computational element. The processor also may be further adapted to perform the second functional simulation utilizing a plurality of integrated circuit architecture models, the plurality of models comprising at least two of the following models: an interconnect model, a memory model, an input and output model, a clocking model, and an integrated circuit operating system model.
In another exemplary embodiment, the processor is further adapted to perform a third functional simulation using the plurality of computational element architecture definitions to generate a second selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; to perform a fourth functional simulation of the algorithm using a plurality of cycle-accurate computational element models corresponding to the second selection and the corresponding control code; and to compare the second functional simulation and fourth functional simulation.
In exemplary embodiments, the processor may be further adapted to perform the first and second functional simulations at a plurality of levels of abstraction. In addition, the processor may be further adapted to roll-up a plurality of parameters from each level of abstraction to the next higher level of abstraction.
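One way to picture the roll-up of parameters from each level of abstraction to the next higher level is a recursive aggregation over a design hierarchy. The sketch below is only illustrative: the dictionary layout, the parameter names `power_mw` and `cycles`, the numeric values, and the assumption that both parameters simply add (sequential execution of the children) are all hypothetical.

```python
def roll_up(node):
    """Aggregate power and cycle estimates from child levels into this level.

    Assumes, for illustration only, that power adds across children and that
    children execute sequentially, so that cycles add as well.
    """
    children = node.get("children", [])
    if not children:
        return {"power_mw": node["power_mw"], "cycles": node["cycles"]}
    totals = [roll_up(child) for child in children]
    return {
        "power_mw": sum(t["power_mw"] for t in totals),
        "cycles": sum(t["cycles"] for t in totals),
    }


# Illustrative values only; these are not measurements.
design = {
    "name": "receiver",
    "children": [
        {"name": "filter", "children": [
            {"name": "mac", "power_mw": 3, "cycles": 40},
            {"name": "delay_line", "power_mw": 1, "cycles": 10},
        ]},
        {"name": "demod", "power_mw": 2, "cycles": 25},
    ],
}

print(roll_up(design))                 # {'power_mw': 6, 'cycles': 75}
print(roll_up(design["children"][0]))  # drill-down into the "filter" level
```

Calling `roll_up` on an inner node corresponds to the drill-down view, while calling it on the root yields the rolled-up figures for the most abstract level.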
In another exemplary embodiment, a system for algorithmic electronic system level design comprises: a plurality of databases, a first database of the plurality of databases adapted to store a plurality of computational element architecture definitions, a second database of the plurality of databases adapted to store a plurality of cycle-accurate computational element models, and a third database of the plurality of databases adapted to store a hardware definition representation of the plurality of cycle-accurate computational element models; and a processor coupled to the plurality of databases, the processor adapted to perform a first functional simulation of an algorithm using the plurality of computational element architecture definitions to generate a first selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; and to perform a second functional simulation of the algorithm using a plurality of cycle-accurate computational element models corresponding to the first selection and the corresponding control code.
In another exemplary embodiment, a computer-implemented method for algorithmic electronic system level design and simulation comprises: (a) inputting an algorithm; (b) providing a plurality of computational element architecture definitions; (c) functionally simulating the algorithm using the plurality of computational element architecture definitions; (d) generating from the functional simulation a first selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; and (e) functionally simulating the algorithm using a plurality of cycle-accurate computational element models corresponding to the first selection and the corresponding control code.
The algorithm may be defined by a plurality of interconnected dataflow diagrams. The functional simulation step (c) may further comprise: mapping the plurality of interconnected dataflow diagrams to a corresponding plurality of computational elements; and generating an interconnection among the corresponding plurality of computational elements as defined by the plurality of interconnected dataflow diagrams.
In exemplary embodiments, the method may also include (d1) generating from the functional simulation a second selection of a plurality of computational elements and corresponding control code for an implementation of the algorithm; (e1) functionally simulating the algorithm using a plurality of cycle-accurate computational element models corresponding to the second selection and the corresponding control code; and (f1) comparing the functional simulations using the first selection and the second selection.
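The comparison of simulations using the first and second selections might, for example, be reduced to choosing among candidate results against a requirement. The following is a sketch under assumptions: the selection rule (lowest power among the candidates meeting a cycle budget), the function name `pick_selection`, and the numeric results are hypothetical and serve only to illustrate the comparison step.

```python
from typing import Dict, Optional


def pick_selection(results: Dict[str, Dict[str, int]], max_cycles: int) -> Optional[str]:
    """Return the lowest-power selection that meets the cycle budget, if any."""
    feasible = {k: v for k, v in results.items() if v["cycles"] <= max_cycles}
    if not feasible:
        return None
    return min(feasible, key=lambda k: feasible[k]["power_mw"])


# Illustrative simulation results only; these are not measurements.
results = {
    "first_selection": {"cycles": 120, "power_mw": 9},
    "second_selection": {"cycles": 90, "power_mw": 14},
}

print(pick_selection(results, max_cycles=100))  # second_selection
print(pick_selection(results, max_cycles=150))  # first_selection
```

Under a tight cycle budget only the faster selection qualifies, whereas a relaxed budget lets the lower-power selection be chosen, which shows how a single requirement can change the preferred architecture.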
In another exemplary embodiment, a machine-readable medium storing instructions for electronic system level design and verification comprises: a first program construct for receiving an application as design input and receiving a plurality of architecture definition files, the plurality of architecture definition files having been determined from control and memory-based integrated circuit modeling; a second program construct for performing a first functional simulation of the application to provide a functional application model; a third program construct for verifying the functional application model; a fourth program construct for providing the verified functional application model in a hardware simulation compatible format; a fifth program construct for performing a second functional simulation using the verified functional application model in the hardware simulation compatible format and using an integrated circuit architecture model to provide a functional architecture model; and a sixth program construct for comparing the functional architecture model with the verified functional application model.
In exemplary embodiments, the machine-readable medium may also include a seventh program construct for generating a plurality of cycle-accurate, transactional-accurate, or functionally-accurate computational element models; an eighth program construct for incorporating the plurality of cycle-accurate, transactional-accurate, or functionally-accurate computational element models into the integrated circuit architecture model; a ninth program construct for providing the verified functional application model as an application netlist of computational elements and interconnections; a tenth program construct for verifying the functional architecture model; and/or an eleventh program construct for compiling the application, using the verified functional architecture model, to an integrated circuit architecture represented by the integrated circuit architecture model.
These and additional embodiments are discussed in greater detail below. Numerous other advantages and features of the present invention will become readily apparent from the following detailed description of the invention and the embodiments thereof, from the claims and from the accompanying drawings.
The objects, features and advantages of the present invention will be more readily appreciated upon reference to the following disclosure when considered in conjunction with the accompanying drawings and examples which form a portion of the specification, wherein like reference numerals are used to identify identical components in the various views, in which:
While the present invention is susceptible of embodiment in many different forms, there are shown in the drawings and will be described herein in detail specific examples and embodiments thereof, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific examples and embodiments illustrated, and that numerous variations or modifications from the described embodiments may be possible and are considered equivalent.
Similarly, data output from the apparatus 50 may be provided to any of a plurality of output devices such as an electronic display 40, such as a CRT, plasma or LCD display, or a printer (e.g., a laser or inkjet printer) (not separately illustrated), for example. In addition, output may also be provided in the form of electronic data through network 45 or machine-readable media 30, such as to transmit to another location or a remote location.
As illustrated in
The memory 65, which may include a data repository (or database) 70, may be embodied in any number of forms, including within any computer or other machine-readable data storage medium, memory device or other storage or communication device for storage or communication of information such as computer-readable instructions, data structures, program modules or other data, currently known or which becomes available in the future, including, but not limited to, a magnetic hard drive, an optical drive, a magnetic disk or tape drive, a hard disk drive, other machine-readable storage or memory media such as a floppy disk, a CDROM, a CD-RW, digital versatile disk (DVD) or other optical memory, a memory integrated circuit (“IC”), or memory portion of an integrated circuit (such as the resident memory within a processor IC), whether volatile or non-volatile, whether removable or non-removable, including without limitation RAM, FLASH, DRAM, SDRAM, SRAM, MRAM, FeRAM, ROM, EPROM or E2PROM, or any other type of memory, storage medium, or data storage apparatus or circuit, which is known or which becomes known, depending upon the selected embodiment. In addition, such computer readable media includes any form of communication media which embodies computer readable instructions, data structures, program modules or other data in a data signal or modulated signal, such as an electromagnetic or optical carrier wave or other transport mechanism, including any information delivery media, which may encode data or other information in a signal, wired or wirelessly, including electromagnetic, optical, acoustic, RF or infrared signals, and so on. The memory 65 is adapted to store various programs or instructions (of the software of the present invention) and database tables, discussed below.
The apparatus 50 further includes one or more processors 55, adapted to perform the functionality discussed below. As the term processor is used herein, a processor 55 may include use of a single integrated circuit (“IC”), or may include use of a plurality of integrated circuits or other components connected, arranged or grouped together, such as microprocessors, digital signal processors (“DSPs”), parallel processors, multiple core processors, custom ICs, application specific integrated circuits (“ASICs”), field programmable gate arrays (“FPGAs”), adaptive computing ICs, associated memory (such as RAM, DRAM and ROM), and other ICs and components. As a consequence, as used herein, the term processor should be understood to equivalently mean and include a single IC, or arrangement of custom ICs, ASICs, processors, microprocessors, controllers, FPGAs, adaptive computing ICs, or some other grouping of integrated circuits which perform the functions discussed below, with associated memory, such as microprocessor memory or additional RAM, DRAM, SDRAM, SRAM, MRAM, ROM, FLASH, EPROM or E2PROM. A processor (such as processor 55), with its associated memory, may be adapted or configured (via programming, FPGA interconnection, or hard-wiring) to perform the methodology of the invention, as discussed below. For example, the methodology may be programmed and stored, in a processor 55 with its associated memory (and/or memory 65) and other equivalent components, as a set of program instructions or other code (or equivalent configuration or other program) for subsequent execution when the processor is operative (i.e., powered on and functioning). Equivalently, when the processor 55 may be implemented in whole or part as FPGAs, custom ICs and/or ASICs, the FPGAs, custom ICs or ASICs also may be designed, configured and/or hard-wired to implement the methodology of the invention.
For example, the processor 55 may be implemented as an arrangement of microprocessors, DSPs and/or ASICs, collectively referred to as a “processor”, which are respectively programmed, designed, adapted or configured to implement the methodology of the invention, in conjunction with one or more databases (70) or memory 65.
As indicated above, the processor 55 is programmed, using software and data structures of the invention, for example, to perform the methodology of the present invention. As a consequence, the system and method of the present invention may be embodied as software which provides such programming or other instructions, such as a set of instructions and/or metadata embodied within a computer readable medium, discussed above. In addition, metadata may also be utilized to define the various data structures of database 70, such as to store the various color management models and calibrations discussed below.
More generally, the system, methods, apparatus and programs of the present invention may be embodied in any number of forms, such as within any type of apparatus (computer or server) 50, within a processor 55, within a computer network, within an adaptive computing device, or within any other form of computing or other system used to create or contain source code, including the various processors and computer readable media mentioned above. Such source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information). The software, source code or metadata of the present invention may be embodied as any type of code, such as C, C++, SystemC, LISA, XML, Java, Brew, SQL and its variations (e.g., SQL 99 or proprietary versions of SQL), DB2, Oracle, or any other type of programming language which performs the functionality discussed herein, including various hardware definition or hardware modeling languages (e.g., Verilog, VHDL, RTL) and resulting database files (e.g., GDSII). As a consequence, a “construct”, “program construct”, “software construct” or “software”, as used equivalently herein, means and refers to any programming language, of any kind, with any syntax or signatures, which provides or can be interpreted to provide the associated functionality or methodology specified (when instantiated or loaded into a processor or computer and executed, including the apparatus 50 or processor 55, for example). For example, various versions of the software may be embodied using the instruction set architecture language LISA.
The software, metadata, or other source code of the present invention and any resulting bit file (object code, database, or configuration bit sequence) may be embodied within any tangible storage medium, such as any of the computer or other machine-readable data storage media, as computer-readable instructions, data structures, program modules or other data, such as discussed above with respect to the memory 65, e.g., a floppy disk, a CDROM, a CD-RW, a DVD, a magnetic hard drive, an optical drive, or any other type of data storage apparatus or medium, as mentioned above.
In addition, while the present invention is frequently illustrated with respect to simulation and modeling systems available from selected vendors, it should be understood that any simulation, modeling and IC architecture design systems can be utilized with and are within the scope of the present invention.
The exemplary embodiments of the present invention may be referred to as Algorithmic ESL (“AESL”) and divided into two categories, an architecture design platform and an application design platform. The architecture design platform is illustrated primarily with reference to
Next, in step 105, any parallel computation capability is extracted, such as through unrolling loops, duplication of processing elements in parallel, other parallel instantiations, and other methods known to those of skill in the field. In accordance with the present invention, the algorithm or other program is then hierarchically decomposed into a plurality of tasks and subtasks, which may be represented by processing or functional blocks, to a selected level of granularity, step 110. This parallel extraction and decomposition may be performed by a processor 55 or other component of system 10, typically by executing parsing and unroll programs, for example and without limitation.
In exemplary embodiments, each level of decomposition may be displayed (via display 40) to the user/designer as a separate view, with clicking (via pointing device 25) on a processor 210 or co-processor (215, 220) resulting in opening a more detailed view (at the next, more detailed level of decomposition), until the level of the most highly detailed view being utilized. Conversely, as utilized in the various simulations and verifications discussed below, the more detailed views and more concrete decompositions may be rolled back up into the less detailed views and more abstract blocks (220, 215 and 210), with associated details automatically incorporated or subsumed within the more abstract level, such as simulated or modeled timing and delay statistics, discussed below. For example, the more detailed, concrete computational elements and functional blocks (e.g., co-processors 220) may be rigorously modeled and tested, with all associated timing, latency, power and other parameters determined. Such parameters will already be integrated for subsequent modeling (such as for implementation of other algorithms), so design and verification of subsequent designs do not need to repeat such detailed modeling, with all such parameters already embedded in the component models. An exemplary decomposition for a portion of a H.264 decoder is also discussed below with reference to
The decomposition to the various co-processor (215, 220) and computational elements 225 may be accomplished by a processor 55, such as by mapping parsed functionality to a library of co-processors (215, 220) and computational elements 225 stored in a memory 65 (or database 70). Such libraries may be provided by a design tool vendor, may be input by the user/designer, or may be created by the methodology described herein.
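By way of illustration only, and without limitation, the mapping of parsed functionality to a library of computational elements, and the roll-up of detailed parameters into more abstract blocks, may be sketched as follows. The class name `Block`, the library contents, the latency and power figures, and the assumption that child blocks execute sequentially (so that latencies add) are all hypothetical and are not taken from any particular embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """One node of a hypothetical decomposition hierarchy."""
    name: str
    latency_cycles: int = 0   # own latency; nonzero only for leaf elements here
    power_mw: float = 0.0     # own power; nonzero only for leaf elements here
    children: List["Block"] = field(default_factory=list)

    def rolled_up(self) -> Tuple[int, float]:
        """Return (latency, power) with all child details subsumed.

        Assumption: children run one after another, so latencies add.
        """
        latency, power = self.latency_cycles, self.power_mw
        for child in self.children:
            child_latency, child_power = child.rolled_up()
            latency += child_latency
            power += child_power
        return latency, power


# Hypothetical library of already-characterized computational elements.
LIBRARY = {
    "multiply": Block("multiplier", latency_cycles=3, power_mw=1.5),
    "add": Block("adder", latency_cycles=1, power_mw=0.5),
}


def map_to_library(operations: List[str]) -> List[Block]:
    """Map parsed operation names to library elements."""
    return [LIBRARY[op] for op in operations]


# A co-processor built from library elements, and a processor built from two of them.
mac = Block("mac_coprocessor", children=map_to_library(["multiply", "add"]))
processor = Block("processor", children=[mac, mac])

print(mac.rolled_up())        # (4, 2.0)
print(processor.rolled_up())  # (8, 4.0)
```

In this sketch the designer working at the `processor` level sees only the rolled-up figures, while the leaf parameters remain available for a more detailed view.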
Referring again to
This well-defined, generic interface facilitates coupling of such flow transforms in virtually any order by a designer or other user, without requiring specific knowledge of the inner workings or details of the flow transform itself. The well-defined data, control and memory interface (as input and output from any selected flow transform) allows a plurality of flow transforms to be connected together as building blocks to implement any selected algorithm, analogously to creating a chain by coupling one link after another. Such implementations may then be (iteratively) tested, as described below. In addition, the resulting architectural elements utilized to implement such flow transforms may also be manipulated as building blocks to instantiate any selected algorithm in an IC, such as an adaptive IC allowing such interconnection through a programmable or adaptive interconnect among computational elements.
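As a non-limiting sketch of such coupling, a flow transform may be pictured as an object exposing the same data, control and memory interface regardless of its internal function, so that any sequence of transforms can be chained. The names `FlowTransform`, `chain` and `memory_budget`, the single `enabled` control flag, and the `memory_words` figure are hypothetical simplifications introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FlowTransform:
    """Hypothetical building block with a uniform interface."""
    name: str
    compute: Callable[[List[int]], List[int]]  # data flow: input list to output list
    enabled: bool = True                       # control flow: one illustrative enable flag
    memory_words: int = 0                      # memory flow: working storage required

    def run(self, data: List[int]) -> List[int]:
        # A disabled transform passes its input through unchanged.
        return self.compute(data) if self.enabled else data


def chain(transforms: List[FlowTransform], data: List[int]) -> List[int]:
    """Couple transforms one after another, like links in a chain."""
    for transform in transforms:
        data = transform.run(data)
    return data


def memory_budget(transforms: List[FlowTransform]) -> int:
    """Total working storage of the enabled transforms (finite, not assumed infinite)."""
    return sum(t.memory_words for t in transforms if t.enabled)


scale = FlowTransform("scale", lambda xs: [2 * x for x in xs], memory_words=4)
offset = FlowTransform("offset", lambda xs: [x + 1 for x in xs], memory_words=2)

print(chain([scale, offset], [1, 2, 3]))  # [3, 5, 7]
print(chain([offset, scale], [1, 2, 3]))  # [4, 6, 8]
print(memory_budget([scale, offset]))     # 6
```

Because every transform presents the same interface, the two orderings above are both valid connections; only the computed result differs.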
As illustrated in
For example, as used herein, functionally-accurate implies providing a correct result, without regard to order, e.g., a+b+c=result. Similarly, transactionally-accurate includes functionally accurate, with additional ordering, such as a+b=temp and temp+c=result, and cycle-accurate implies an accurate data ordering based on timing (clock cycles), such as time 0: a; time 3: b; time 7: temp=a+b; time 12: c; time 20: result=temp+c.
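The three levels of accuracy may be illustrated, as a minimal sketch only, by modeling the same three-operand addition in three ways. The function names and the tuple representations are hypothetical; the cycle numbers are those of the example in the text.

```python
def functional(a, b, c):
    """Functionally accurate: only the final value matters, not the order."""
    return a + b + c


def transactional(a, b, c):
    """Transactionally accurate: the ordered intermediate transactions are kept."""
    temp = a + b
    result = temp + c
    return [("temp", temp), ("result", result)]


def cycle_accurate(a, b, c):
    """Cycle accurate: every event carries the clock cycle at which it occurs."""
    temp = a + b
    result = temp + c
    return [
        (0, "a", a),
        (3, "b", b),
        (7, "temp", temp),
        (12, "c", c),
        (20, "result", result),
    ]


print(functional(2, 3, 4))      # 9
print(transactional(2, 3, 4))   # [('temp', 5), ('result', 9)]
print(cycle_accurate(2, 3, 4))  # last event is (20, 'result', 9)
```

All three models agree on the final value; each successive model simply retains more ordering or timing information.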
As a consequence, the hierarchical processing block decomposition of the present invention preserves data flow information, control flow information, and memory flow information, which is combined into a “flow transform” (step 120,
Referring again to
Following the methodology of the present invention, an instruction-based programming language may be utilized to architect (and not just model) a non-instruction based system, such as a data flow system IC architecture. The simulation and modeling using the flow transforms can create a “netlist” of computational elements for design of the IC, and the designer can then determine if more elements or a different mix of elements should be utilized to improve performance, or decrease IC area or power dissipation, for example. The creation and preservation of memory flow information, such as register usage, provides memory and interconnect requirements. The present invention also preserves control instructions, which is generally unavailable in the prior art for data flow architecture environments. A combined flow transform is provided, integrating data flow, control flow, and memory flow. The various flow transforms which are generated and correspond to an algorithmic task or function, in turn, may be combined in any of a plurality of ways to express an algorithm as data flow, yet preserving any needed control and memory information as integral blocks. In addition, as discussed below, the creation and modeling of a flow transform in accordance with the present invention can be combined with a larger design tool flow for creation of adaptive computing IC architectures.
The Algorithmic ESL system 500 may generally be divided into 2 portions, an architecture design platform (illustrated in
The architecture design platform, as discussed above with reference to
The system modeling and simulation platform (540) may be implemented utilizing a wide variety of platforms available from various vendors. The system modeling and simulation platform (540) provides a common platform to link and integrate algorithmic (application) development with hardware development, and to provide corresponding simulation and verification, among other functionality. In an exemplary embodiment, SystemC has been selected to provide this common platform (as the system modeling and simulation platform (540)) to link, as a single framework, an application and system design platform 520 and the instruction (or control) and memory-based modeling platform (510). Platforms provided by other vendors, such as the SPW and LISATek platforms, have then been modified by providing SystemC conduits, for the corresponding information to be converted and/or exported into the common SystemC platform. In an exemplary embodiment, a ConvergenC platform from CoWare has been utilized, while an OSCI SystemC modeling platform could be utilized equivalently. Other platforms and non-SystemC platforms may be utilized equivalently. For such alternative embodiments, rather than providing SystemC-compatible descriptions and files, the application and system design platform 520 and the instruction (or control) and memory-based modeling platform (510) should be adapted to provide compatible descriptions and files suitable for the selected system modeling and simulation platform (540), such as a Cadence modeling platform. The Algorithmic ESL system 500 simply requires that the outputs of the application and system design platform 520 and instruction (or control) and memory-based modeling platform (510) be provided in, or be capable of being converted into, a format which is usable by the system modeling and simulation platform (540), such as to provide the sophisticated level of interactivity and abstraction available with the Algorithmic ESL system 500.
The application and system design platform 520 is utilized by a system or application designer to create and model applications for operation on a selected architecture, generally interactively with the system modeling and simulation platform 540 (which may be running in the background). As mentioned above, the system or application designer does not need to interact directly with or have knowledge of the system modeling and simulation platform 540. The application and system design platform 520 receives the “design intent” of the application as inputs, generally in the form of architectural definitions 570 (such as macrolibraries, IC libraries to implement specific functions (e.g., DCTs, FFTs, DAGs, DMAs), computational elements existing on the IC, contexts for implementations of configurable architectures, and other types of instructions (e.g., C or C++ code)), graphical data flow diagrams 575 representing a selected or given algorithm, and P3 and/or R3 specifications 580. Transparently to the user/designer, the application and system design platform 520 also receives input from the instruction (or control) and memory-based modeling platform (510), such as the CA and TA computational element models 555 and the P3 and/or R3 statistics 565.
The application and system design platform 520 then performs functional simulations of the application (or any portions thereof, such as for testing of application modules or components), providing functional models which can be evaluated by the system designer. On the basis of these results, the application or system designer may then modify the application, repeat the functional simulations, and continue with this iterative process until the functional model has been verified to the required level of performance and to meet other specified requirements. A satisfactory application functional model is then provided (typically as a database) to the system modeling and simulation platform 540, for simulation and modeling of the application (or algorithm) on the target IC architecture.
For example, the application and system design platform 520 then provides various selectable outputs, such as computational element compositions files 585 (the number and type of computational elements to implement the algorithm), any P3 and/or R3 constraints 590 for the given algorithm, and computational element code 595 (such as design XML which may be mapped to interconnect the various computational elements, or contexts utilized to configure adaptive or configurable computational elements). These outputs, in turn, are utilized by the system modeling and simulation platform (540) to provide functional and/or behavioral simulation and modeling of the application (or algorithm) on the target IC architecture, to provide an IC functional model, and to provide corresponding feedback, generally iteratively, to the designer via the application and system design platform 520, allowing the designer to modify and refine the algorithm based on performance statistics (515) and other parameters. Typically, the system modeling and simulation platform 540 is adapted to compare the application functional model with the IC functional model, and to provide the corresponding results back to the application or system designer.
In addition, the functional and behavioral simulation and modeling of the application on the target IC provided by the system modeling and simulation platform 540 may be incremental or modular. For example, as one aspect of an application is prepared, such as a DCT or FFT module, that module may be ported into the system modeling and simulation platform 540, which will provide a corresponding portion (module) of the functional IC model. This process may occur in the background, while the system or application designer continues to work with the application and system design platform 520. This incremental and concurrent approach is one of the features of the Algorithmic ESL system 500 that helps to significantly decrease development time cycles and time to market.
Another important result of the integrated Algorithmic ESL system 500 is that the functional IC model generated for each such module or component provides both verification and performance results which may then be utilized by the other platforms (520, 510) and integrated directly, without repeating those modeling and computation steps. In addition, these results are then automatically embedded (rolled-up) in the overall models, allowing the designer to work at a more abstract level, yet simultaneously allowing the designer to drill-down as needed into these more concrete details.
As part of the application functional testing, the application and system design platform 520 can simulate and test various data traffic scenarios and test cases, verify computational element designs, and test interconnect traffic patterns, control flow patterns, etc. The application and system design platform 520 may also do this at various levels of abstraction and views (as provided via the instruction (or control) and memory-based modeling platform (510) discussed above), including the abstractions of the data flow, control flow, and memory flow, and any other abstractions of the memory hierarchy itself, such as identifying multiple waypoints which exercise the memory subsystems. This ability to abstract and model a memory architecture as part of a data flow architecture and, indeed, as part of any embedded processing environment, is one of the many novel features of the present invention.
For example, instead of generating thousands of lines of C code, an algorithm may be captured in SPW (application and system design platform 520), followed by opening ports of the memory subsystems, and exporting the information into SystemC. The system modeling and simulation platform (540) may then connect to the memory subsystems and run the application, providing data traffic, memory flow information, and all other parameters and statistics utilized by those of skill in the field. Different versions of an algorithm may also be iteratively tested in this way, such as by simulating one solution with a first mix of computational elements, and comparing this to a simulation utilizing a second mix of computational elements performing the same algorithm. In addition, the use of the various levels of functional and architectural abstraction allows a designer to drill-down to increased detail as needed and to roll-up to a higher level of abstraction, allowing rapid design and development cycles.
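The comparison of a first mix of computational elements with a second mix performing the same algorithm may be sketched, purely for illustration, with a simple analytic estimate in place of a full simulation. The per-operation latencies, relative areas, workload counts, and the assumption that operations of one type are spread evenly over the available units of that type (with the operation types processed one after another) are all hypothetical.

```python
import math

# Hypothetical characterization data for two kinds of computational element.
LATENCY_CYCLES = {"mul": 3, "add": 1}
RELATIVE_AREA = {"mul": 5.0, "add": 1.0}


def estimate(workload, mix):
    """Return (cycles, area) for running `workload` on the element `mix`.

    Assumption: operations of a type are divided evenly over the units of
    that type, and the operation types are processed one after another.
    """
    cycles = sum(
        math.ceil(count / mix[op]) * LATENCY_CYCLES[op]
        for op, count in workload.items()
    )
    area = sum(RELATIVE_AREA[op] * units for op, units in mix.items())
    return cycles, area


workload = {"mul": 64, "add": 63}    # e.g., a 64-term sum of products
first_mix = {"mul": 2, "add": 2}
second_mix = {"mul": 4, "add": 1}

print(estimate(workload, first_mix))   # (128, 12.0)
print(estimate(workload, second_mix))  # (111, 21.0)
```

In this toy comparison the second mix is faster but larger, which is the kind of objective, numeric trade-off between performance and IC area that the designer is expected to weigh.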
Similarly, the SystemC framework implemented with the system modeling and simulation platform (540) can also model interconnect at different levels of abstraction and using different types and mixes of interconnect, such as switches, multiplexers, or routers. The interconnect can be modeled at these various levels, providing a simulation framework to form conclusions and make decisions based on objective, numeric evaluations.
The resulting simulation models, from both the application and system design platform 520 and the system modeling and simulation platform 540, are also scalable, utilizing the various levels of abstraction. For example, initial functional simulations using the application and system design platform 520 may be run rapidly at a high level of abstraction, providing greater performance without requiring hardware emulation or hardware prototypes. In addition, higher accuracy and a more detailed analysis are provided utilizing the less abstract, more detailed and concrete levels illustrated, such as the block and elemental levels 265 and 270 illustrated in
The application and system design platform 520 may be implemented utilizing an algorithmic programming language platform, such as platforms available from various vendors, with the inventive modifications and features of the Algorithmic ESL system 500, such as a Signal Processing Workstation (SPW) available from CoWare or Cadence, or other platforms such as those provided by MathWorks Simulink. A myriad of other equivalent platforms may be utilized, with the additional functionality described herein, and all such platforms are within the scope of the present invention.
Using the format-compatible database generated by the application and system design platform 520, the system modeling and simulation platform (540) generates a functional IC model of a version of the system or the final system (505), namely, a version based on the operation of the application on the target IC architecture, based on simulation and verification of computational elements, interconnect, memory subsystems, support models (such as clocking and I/O), with any hardware operating system (hardware OS) running on the model of the IC, and other IC parameters as used in the EDA and ESL fields, and utilizing the inputs provided from the instruction (or control) and memory-based modeling platform (510). As mentioned above, the system modeling and simulation platform (540) provides a unifying platform for both applications and architecture, such as linking SPW and SystemC, and linking LISATek and SystemC, for example.
This interaction between the application and system design platform 520 and the system modeling and simulation platform (540) allows rapid prototyping and comparisons by the designer of a plurality of versions, at different levels of simulation and verification, to allow rapid decisions for design trade-offs such as IC size and performance. In addition, the application and system design platform 520 can be utilized in conjunction with the instruction (or control) and memory-based modeling platform (510), such as to create an architecture with more or fewer computational elements or a different mix of computational elements. Also, the application and system design platform 520 is utilized to create any code (contexts, control, assembly or other programs) to operate the resulting IC for implementation of the selected algorithm, not just for design and functional simulation.
For example, various applications may be created to run on different IC platforms, such as those with different mixes of computational elements, using application and system design platform 520. These functional simulations and models (e.g., in database 605 of
The Algorithmic ESL illustrated in
The Algorithmic ESL also has particular application to the design and simulation of configurable and reconfigurable IC architectures. In such architectures, computational elements may be configured, through control bits (representing contexts or other types of control information), to perform multiple operations. In addition, the interconnect connecting a plurality of computational elements is also programmable or configurable, allowing a plurality of ways of connecting the computational elements for execution of a particular function or algorithm. The ability of the instruction (or control) and memory-based modeling platform (510) to create a flow transform, which includes not only data flow but also the memory flow and control information (for configuring the operations of the computational elements), is invaluable for implementing any selected algorithm. These architectures (with their corresponding configurations or contexts) may then be encapsulated as separate library elements in SystemC (or another RTL, VHDL or other compatible format utilized in the common platform), allowing rapid assembly into functional blocks for simulation and verification by the system modeling and simulation platform 540. These architectures may also be provided as libraries (architecture definition files 570) and CA and TA computational element models 555 for use directly in application development (with application and system design platform 520) and system modeling (with system modeling and simulation platform 540).
As a consequence, the Algorithmic ESL system 500, 600 of the present invention provides an integrated application, IC design, and IC and application simulation and modeling solution, integrating algorithmic development with software and hardware design and implementation. In the illustrated embodiments, an application may be functionally modeled, further modeled using the target IC architecture, and compiled to that architecture, all using a single, integrated framework with full communication capability between and among the composite design and simulation platforms (510, 540, 520).
The Algorithmic ESL of the present invention also provides multiple levels and abstractions of simulation and modeling. At one level, represented by functional models database 605, functional simulation is provided, without regard to particular IC architectural effects. At other levels, simulation and modeling is provided for computational elements and different platforms, incorporating any selected IC parameters. At yet another level, complete device gate-level characteristics may be included, such as transistor-level parasitics, to provide functional and architectural simulation and modeling. In addition, each of these various levels may be back-annotated or fed back into other simulation and modeling levels, to provide further IC refinements and to roll-up more detailed simulations into the higher level, more abstract simulations and views. Of particular importance, an application designer does not need to perform verification at a detailed level, as that information is already embedded in the models utilized and generated via the instruction (or control) and memory-based modeling platform (510) and system modeling and simulation platform (540). The Algorithmic ESL system 500 allows applications and other software to be captured at a high level in application and system design platform 520, yet concurrently mapped to, modeled, and compiled on the target architecture. At the same time, parameterization and control (such as for P3 requirements) is available to the system designer, allowing high-level trade-offs for modeling and to guide the system compiler 650.
When the functional application model has been verified in step 715, the method proceeds to step 725, and provides the verified functional application model in a hardware simulation compatible format, such as SystemC, RTL, Verilog, or VHDL, also typically by the application and system design platform 520. In an exemplary embodiment, the verified functional application model is provided as an application netlist of computational elements and interconnections. Next, in step 730, a second functional simulation is performed using the verified functional application model in the hardware simulation compatible format and using an integrated circuit architecture model to provide a functional architecture model, typically by the system modeling and simulation platform (540). The functional architecture model is compared with the verified functional application model, step 735. Through these comparisons and other evaluations, the functional architecture model may be verified, step 740, and using the verified functional architecture model, the application may be compiled to an integrated circuit architecture represented by the integrated circuit architecture model, step 745, and the method may end, return step 750. When the functional architecture model is not verified in step 740, the method returns to step 720 and iterates, typically interactively with the system or application designer, until a satisfactory functional architecture model is verified, as discussed above.
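The iteration through steps 715 to 745 may be sketched as a simple loop, offered only as an illustration under stated assumptions. Every function below is a hypothetical stand-in: the application is reduced to a single gain value, the architecture model to a saturating multiplier, step 720 is assumed to be a revision of the application, and the automatic `revise` function stands in for what would typically be an interactive change by the designer.

```python
def simulate_application(app):
    """First functional simulation, without architectural effects (toy model)."""
    return {"gain": app["gain"], "outputs": [app["gain"] * x for x in (1, 2, 3)]}


def application_verified(model, limit=12):
    """Step 715: check the functional application model against a requirement."""
    return max(model["outputs"]) <= limit


def revise(app):
    """Assumed step 720: modify the application (here, simply lower the gain)."""
    return {"gain": app["gain"] - 1}


def export_netlist(model):
    """Step 725: provide the verified model as a netlist of computational elements."""
    return {"elements": ["multiplier"], "gain": model["gain"]}


def simulate_on_architecture(netlist, saturation=10):
    """Step 730: second simulation on a hypothetical saturating architecture model."""
    return [min(netlist["gain"] * x, saturation) for x in (1, 2, 3)]


def design_flow(app, max_iterations=10):
    for _ in range(max_iterations):
        app_model = simulate_application(app)
        if not application_verified(app_model):           # step 715 fails
            app = revise(app)                             # assumed step 720
            continue
        netlist = export_netlist(app_model)               # step 725
        arch_outputs = simulate_on_architecture(netlist)  # step 730
        if arch_outputs == app_model["outputs"]:          # steps 735 and 740
            return netlist                                # ready to compile, step 745
        app = revise(app)                                 # return to step 720
    raise RuntimeError("no verified architecture model within the iteration limit")


final_netlist = design_flow({"gain": 5})
print(final_netlist)  # {'elements': ['multiplier'], 'gain': 3}
```

Starting from a gain of 5, the first pass fails application verification, the second passes it but diverges from the architecture model because of saturation, and the third yields matching models, so the netlist returned corresponds to a gain of 3.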
Also as discussed above, the methodology may include generating a plurality of cycle-accurate computational element models; and incorporating the plurality of cycle-accurate computational element models into the integrated circuit architecture model. The plurality of cycle-accurate computational element models are generated in the hardware simulation compatible format, to facilitate use in the common platform. In addition, receiving the application may further comprise: receiving a plurality of architecture definition files; receiving a plurality of dataflow diagrams; and receiving performance specifications.
In addition, the methodology illustrated in
The inventive Algorithmic ESL also provides a fully integrated solution. It allows an application to be captured and developed at an abstract level. It further allows it to be modeled and verified at abstract levels, compared using different architectures and hardware versions, and finally compiled to a selected architecture, all within the same design and development tool suite.
While the invention is particularly illustrated and described with reference to exemplary embodiments, it will be understood by those skilled in the art that numerous variations and modifications in form, details, and applications may be made therein without departing from the spirit and scope of the novel concept of the invention. Some of these various alternative implementations are noted in the text. It is to be understood that no limitation with respect to the specific methods, systems, software and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.
This application is related to and claims priority to U.S. patent application Ser. No. ______, filed concurrently herewith, inventor Bhaskar Kota, entitled “Flow Transform For Integrated Circuit Design And Simulation Having Combined Data Flow, Control Flow, And Memory Flow Views”, which is commonly assigned herewith, the contents of which are incorporated herein by reference, and with priority claimed for all commonly disclosed subject matter.