Profiling of software and circuit designs utilizing data operation analyses

Description

CROSS REFERENCE TO A RELATED APPLICATION

This application is related to Paul L. Master et al., U.S. patent application Ser. No. 09/815,122, entitled “Adaptive Integrated Circuitry With Heterogeneous And Reconfigurable Matrices Of Diverse And Adaptive Computational Units Having Fixed, Application Specific Computational Elements”, filed Mar. 22, 2001, now U.S. Pat. No. 6,836,839, commonly assigned to QuickSilver Technology, Inc., and incorporated by reference herein, with priority claimed for all commonly disclosed subject matter (the “related application”).

FIELD OF THE INVENTION

The present invention relates in general to profiling of software and circuit designs for performance analyses and, more particularly, to profiling both software and reconfigurable and adaptive circuit designs utilizing data parameters, including data dynamics and other data operational statistics.

BACKGROUND OF THE INVENTION

Software or other computing programs, such as programs expressed in C and C++ code, have been profiled in the prior art, as a method of determining performance of the program as executed, generally using criteria such as estimated power consumption, speed of execution, code size, integrated circuit (IC) area utilized in execution, and other performance measures. Such current profiling techniques, as a consequence, have been confined largely to the processor (microprocessor) computing environment, for example, to identify algorithms which may be separately accelerated in an application specific integrated circuit (ASIC), or to provide statistics on processor or program performance.

Current profiling techniques are generally statistical or intrusive. In statistical profiling, an interrupt is generated, which then allows the capture of various register contents or counters. This type of profiling then provides statistics, such as how often the program executes a particular algorithm or routine. One widely used hardware profiler, for example, requires the user to stop the central processing unit (CPU) during program execution, and use special debugging registers to generate a profile.

Other existing profiling techniques are typically intrusive. In this method, extra lines of programming code are actually inserted periodically into the program code to be profiled. As these inserted code segments are called, hard counts may be generated, reflecting usage of a corresponding algorithm or routine.

Both statistical profiling and intrusive profiling have significant limitations. For example, depending upon the granularity or degree to which code has been inserted or interrupts generated, both methodologies may typically miss or overlook code features between such points of intrusion or interrupt.

In addition, measures of power and performance based upon such current statistical or intrusive profiling may be significantly inaccurate. Such power and performance measures are typically based upon various underlying assumptions, such as data pipeline length, and exhibit strong data dependencies, such as depending upon the sequence of logic 1s and 0s (i.e., high and low voltages) within a particular data stream. In addition, such power and performance measures also depend significantly upon program dynamics, such that statistical or intrusive profiling often provides inaccurate results compared to actual performance of the program. As a consequence, because current profiling techniques do not account for data issues and concerns, they tend to be significantly inaccurate.

Finally, the existing profilers can measure program performance in known computing architectures or processor architectures only; no profilers exist for profiling program execution for an integrated circuit that is reconfigurable or adaptive. In the reconfigurable hardware environment, the combination of hardware computational units, their interconnections, the proximity of data to these computation units, and the algorithms to be performed by the circuit, each contribute to overall efficiency of execution. Existing profiling tools do not address the impact of each of these variables.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram illustrating an exemplary method of data profiling in accordance with the present invention.

FIG. 2 is a block diagram illustrating an exemplary apparatus embodiment, referred to as the ACE architecture, in accordance with the invention of the related application.

FIG. 3 is a block diagram illustrating a reconfigurable matrix, a plurality of computation units, and a plurality of computational elements of the ACE architecture, in accordance with the invention of the related application.

FIG. 4 is a flow diagram illustrating an iterative profiling process for code selection and circuit design in accordance with the present invention.

FIG. 5 is a graphical diagram visually illustrating a data structure embodiment in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

While the present invention is susceptible of embodiment in many different forms, there are shown in the drawings and will be described herein in detail specific embodiments thereof, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated.

The present invention, referred to herein as the “profiler” or the “profiling tool”, evaluates or profiles both existing (or legacy) program code (or software) and new forms of program code on the basis of a plurality of data parameters. Such a profiling tool, in accordance with the present invention, provides this profiling based upon data parameters (or data metrics) such as data location (for static data), data type, data size (input and output), data source and destination locations (for dynamic data), data pipeline length, locality of reference, distance of data movement, speed of data movement, data access frequency, number of data load/stores, degree of cache, register or memory usage, data persistence, corresponding algorithmic element, and corresponding hardware location for the algorithmic element. The profiler of the invention may also provide other measurements of resource utilization, such as memory throughput, execution time and frequency, power consumed, number of instructions utilized, and so on. The profiling tool of the present invention is especially useful in performing actual circuit design and implementation, particularly in the adaptive computing environment.

The profiler of the present invention was developed as a component in a suite of development tools for designing adaptive and reconfigurable hardware and attendant configuration information (which themselves are the subjects of the related application and other pending patent applications). This reconfigurable hardware, referred to as an adaptive computing engine (“ACE”) architecture, and the other development tools are briefly discussed with reference to FIGS. 2 and 3 and are used to establish a hardware context for the present invention.

The present invention is a method, system, tangible medium storing machine readable software, and data structure for profiling computing programs, communication programs, and other program code or other software, with respect to a plurality of data parameters. These programs would typically operate or be run in a processing environment, such as a microprocessor or digital signal processor, or operate as embedded in custom hardware such as an ASIC. In an exemplary embodiment of the invention, the profiler analyzes the code based upon the plurality of data parameters, and outputs the results in any of various forms, such as a data structure. Such data structures may be embodied in a plurality of forms, such as electronically in a memory, or in a visually perceptible form, such as in a graph, spreadsheet, table, or array.

It should be noted that terms such as “program”, “code”, “program code”, or “software” are used interchangeably herein and are to be construed broadly to cover and include any type of programming language which has been arranged or ordered such that, when executed, a particular function is performed. For example, the code to be profiled may be a computing program, such as software or operating systems used with a computer or workstation, or a communication program, such as the International Telecommunication Union (ITU) programs or code for cellular, CDMA, GSM, or 3G communications, including legacy ITU code which often is the actual promulgated definition of a communication function to be performed to be compliant with an ITU standard. Other code types may include the code utilized in the Institute of Electrical and Electronics Engineers (IEEE) standards, such as IEEE 802.11 for wireless LANs, or code which is under development or which is to be developed, such as code for software defined radios, and so on.

The profiler of the invention is of particular use in transforming this program code for use in an adaptive and reconfigurable computing environment by, for example, selecting algorithmic elements from the code for performance on various computational elements of the adaptive computing architecture and determining the locations of these computational elements within the ACE architecture based on, among other things, the profiled data parameter results (also referred to as measured data parameters which are combined to form data parameter comparative results). In another exemplary embodiment of the invention, the profiler accepts, as inputs, various hardware descriptions of the adaptive computing architecture, and program code or software which would otherwise typically be run in a processor-based environment. The profiler analyzes the code based upon the plurality of data parameters, and using the profiled data parameter results, selects portions of the profiled code for conversion into a form so that its corresponding functions or algorithmic elements may be executed on the ACE architecture directly by corresponding selected computational elements. These profiling statistics are calculated iteratively, as portions of code are identified for execution of corresponding functions in the adaptive computing architecture, and subsequently removed from the software code. The process is repeated until optimal performance of the ACE architecture (with its adapting configuration) is achieved.
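The iterative selection described above may be illustrated by the following minimal sketch. It assumes, purely for illustration, that each code portion already has a profiled cost when run in software and a profiled cost when executed on computational elements; the function name `partition`, the cost tables, and the greedy stopping rule are hypothetical and are not taken from the present disclosure.

```python
def partition(portions, software_cost, hardware_cost):
    """Greedily move code portions to hardware while doing so lowers cost.

    Each pass selects the remaining portion with the largest saving,
    assigns it to hardware, and removes it from the software code.
    The loop stops when no remaining portion yields a positive saving.
    """
    software = set(portions)
    hardware = []
    while software:
        # sorted() makes the choice deterministic if two savings are equal
        best = max(sorted(software),
                   key=lambda p: software_cost[p] - hardware_cost[p])
        if software_cost[best] - hardware_cost[best] <= 0:
            break
        hardware.append(best)
        software.remove(best)
    return hardware, sorted(software)


# Hypothetical costs in arbitrary comparative units.
SOFTWARE_COST = {"fir": 100, "viterbi": 80, "ui": 5}
HARDWARE_COST = {"fir": 10, "viterbi": 20, "ui": 9}

to_hardware, left_in_software = partition(
    ["fir", "viterbi", "ui"], SOFTWARE_COST, HARDWARE_COST)
```

In this sketch the costs are fixed lookups; in practice the remaining code might be profiled again after each removal, since removing one portion may change the data movement of the others.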

Another unique feature of the present invention is a “self-profiling” capability in the adaptive computing architecture. As discussed in greater detail below, the adaptive computing engine (“ACE”) architecture is configurable and reconfigurable, with actual input connections and output connections between and among constituent computational elements being changeable, in real time, to perform different functions as needed, to provide the overall operating mode of the adaptive computing architecture. For example, computational elements such as adders and multipliers, which were performing a discrete cosine transformation (DCT), may be reconfigured to perform a fast Fourier transformation (FFT). As a consequence of this reconfiguration capability, a profiler may be included within the ACE, with the profiler operating upon the actual circuit design or structure of the ACE as it is operating. As the ACE operates, it may determine that, based upon the profiled data parameter results, it should change or modify its configuration for performance of one or more functions or operations. For example, based upon profiled data parameter results such as distance of data movement, the ACE may reconfigure itself by providing additional data memory in closer proximity to the area of its circuit performing a corresponding calculation.

The profiler of the invention utilizes the plurality of data parameters as one form of measurement of performance, indicative of resource utilization, speed of operation, power utilization, and so on. Such performance may be evaluated on one or more levels, such as “coarse grain” performance metrics at the program language function level, and “fine grain” statistics at the CPU instruction level or other hardware level.

In the various exemplary embodiments, the plurality of data parameters may include one or more of the following parameters, in addition to other forms of data measurement:

- data location (for static data), such as a memory or register location;
- data type, such as input data, intermediate calculation data, output data, other forms of operand data, and so on;
- data size (input and output), such as number of bits, bandwidth required (bus or interconnect width), which may also be a function of or otherwise related to data type;
- data source and destination locations (for dynamic data), such as memory or register locations;
- data pipeline length, such as how long a data stream is;
- locality of reference, such as how far the data is from the computing elements which will process or use it;
- distance of data movement (for dynamic data), such as a distance between various caches, registers, or other memory locations, and/or the distance the data moves between or among the lines of code being profiled;
- speed of data movement (for dynamic data), namely, how fast was the data able to be moved from a source location to a computing element and/or to a destination location, such as the number of computation cycles utilized in the data transfer;
- data access frequency, such as how often was the data accessed;
- data loads and stores (load/stores) into registers, caches or other memory;
- degree of cache, register or memory usage; and
- data persistence, such as how long did the data remain in use, either unchanged or as (repeatedly) modified and used.
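As one possible way to hold the parameters listed above, the following sketch groups them into a single record per data item. The class name `MeasuredDataParameters`, the field names, and the units are assumptions made for illustration; the present disclosure names the parameters but does not prescribe a layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeasuredDataParameters:
    """Profiled data parameter results for one data item (hypothetical layout)."""
    data_type: str                               # "input", "intermediate", "output", ...
    data_size_bits: int                          # data size as a number of bits
    data_location: Optional[str] = None          # static data: memory or register location
    source_location: Optional[str] = None        # dynamic data: where the data comes from
    destination_location: Optional[str] = None   # dynamic data: where the data goes
    pipeline_length: int = 0                     # length of the data stream
    locality_of_reference: float = 0.0           # distance from the computing element
    distance_of_movement: float = 0.0            # dynamic data only
    speed_of_movement_cycles: int = 0            # cycles used by the transfer
    access_frequency: int = 0                    # number of accesses
    load_stores: int = 0                         # number of loads and stores
    memory_usage: float = 0.0                    # degree of cache, register or memory usage
    persistence_cycles: int = 0                  # how long the data remained in use

    def is_dynamic(self) -> bool:
        """Dynamic data moves, so it has both a source and a destination."""
        return (self.source_location is not None
                and self.destination_location is not None)


# A filter coefficient that stays in one register, and a sample that moves.
coefficient = MeasuredDataParameters("input", 16, data_location="reg3",
                                     access_frequency=1024)
sample = MeasuredDataParameters("input", 16, source_location="mem0",
                                destination_location="reg1",
                                distance_of_movement=2.0, load_stores=1024)
```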

In addition, the measured data parameters, as profiled data parameter results, may be combined in various ways, such as by a weighted function, to produce an overall, comparative result (referred to as a data parameter comparative result), defining a new unit of measure referred to as a “data operational unit”, as discussed in greater detail below. Various sets of profiled data parameter results may be generated, with corresponding data operational units, based upon various hardware architectures, based upon corresponding algorithms or algorithmic elements, and based upon various input data sets. Using the comparative data operational units, an optimal architecture may be selected, with a corresponding set of optimal profiled data parameter results.
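The weighted combination mentioned above may be sketched as follows. A plain weighted sum is assumed, together with the convention that a lower result is better; the function name `data_operational_units`, the weights, and the measured values are all hypothetical and serve only to show how two candidate architectures could be compared.

```python
def data_operational_units(measured, weights):
    """Combine measured data parameters into one comparative value.

    A weighted sum is assumed; parameters without a weight are ignored.
    """
    return sum(weights[name] * value
               for name, value in measured.items() if name in weights)


# Hypothetical weights and profiled results, for illustration only.
WEIGHTS = {"distance_of_movement": 2.0, "load_stores": 1.0, "access_frequency": 0.5}
PROFILES = {
    "architecture_a": {"distance_of_movement": 4, "load_stores": 10, "access_frequency": 20},
    "architecture_b": {"distance_of_movement": 1, "load_stores": 12, "access_frequency": 20},
}

scores = {name: data_operational_units(measured, WEIGHTS)
          for name, measured in PROFILES.items()}
best = min(scores, key=scores.get)  # assumes a lower score is better
```

With these values, the second architecture scores lower because its shorter data movement outweighs its two additional load/stores.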

FIG. 1 is a flow diagram illustrating an exemplary method of data profiling in accordance with the present invention. The method begins, start step 5, with the selection of a program or other code for data parameter profiling, step 10, and the selection of an input operand data set (if any), to be utilized by the program or other code, step 15. Steps 10 and 15 are not order-dependent, that is, they may occur in either order, as is true of most of the method steps below. The method then determines and identifies one or more portions of the code in which data may remain static, that is, the data need not move from one memory location to another during processing or other code execution, such as from one register or memory to another memory or register, step 20. When data which may remain static has been determined or identified in step 20, the method proceeds to step 25, to evaluate the selected code portion using one or more of the plurality of data parameters, providing a corresponding plurality of measured data parameters. When the code portion does not utilize static data, the method may proceed to step 40.

In step 25, not all of the plurality of data parameters may be applicable. For example, the data parameters for source and destination locations would be inapplicable, as the data used in this code portion does not move to a destination location. Data parameters which are applicable, among others, include data location, data type, data size (input and output), data pipeline length, locality of reference, data access frequency, number of data load/stores, degree of cache, register or memory usage, and data persistence. In the event that data location may be determined as a code location rather than an IC location, the code location may also be mapped to an IC location, as part of step 25 or as a separate step.

Also for static data, the profiling methodology determines the corresponding algorithmic element (or function or operation) involving the static data, step 30, such as a multiply, divide, add, subtract, accumulate, multiply-and-accumulate, and so on. In addition, the method determines an applicable hardware location, if any, for execution or performance of the algorithmic element, step 35, such as a location of a computational element for an ACE embodiment, or a location within a microprocessor.

Following step 35, or when there is no data which may remain static determined or identified in step 20, the method proceeds to evaluate the selected code portion using the plurality of data parameters for dynamic data, that is, data which does move from one location to another during processing or other execution, step 40. When dynamic data has been identified or determined in step 40, the profiling methodology evaluates the selected code portion using one or more of the plurality of data parameters, step 45, such as determining the source (from) and destination (to) locations of the data, data type, data size (input and output), data pipeline length, locality of reference, distance of data movement, speed of data movement, data access frequency, number of data load/stores, degree of cache, register or memory usage, and data persistence, and provides a corresponding plurality of measured data parameters. As mentioned above, if the source and destination locations are determined as locations within the code, they also may be mapped to source and destination locations within an IC, in this or another step. For the particular identified dynamic data for the selected code portion, the profiling methodology determines the corresponding algorithmic element or function involving the dynamic data, step 50, also such as a multiply, divide, add, subtract, accumulate, multiply-and-accumulate, and so on. In addition, the method determines an applicable hardware location, if any, for execution or performance of the algorithmic element, step 55, such as a location of a computational element for an ACE embodiment, or a location within a microprocessor.

Following step 55, or when there is no dynamic data determined or identified in step 40, the method proceeds to step 60, and determines whether there is any remaining code (or program) for profiling. When there is remaining code in step 60, the method returns to step 20, and continues to iterate until there is no code remaining for profiling. In lieu of returning to step 20, the method may also return to step 15, if one or more additional input operand data sets are needed, and then continue to iterate until there is no code remaining for profiling.

Not separately illustrated in FIG. 1, a program may also be profiled for resource utilization, based upon the plurality of data parameters, independently of any assumption of an underlying hardware utilized for execution, with all locations and distances based upon code locations, and generally with a more limited set of data parameters used in the evaluation.

When there is no code remaining for profiling in step 60, the methodology generates the complete results of the data profiling, such as the measured data parameters (also referred to as profiled data parameter results), and preferably also the data parameter comparative results (derived from the measured data parameters, as discussed below), and provides these results in a data structure form, step 65. In step 65, the method generates a selected data structure or other representation of the profiled data parameter results, such as a graphical or tabular representation, a spreadsheet, a multidimensional array, a database, a data array stored in a memory or other machine-readable medium, or another form of data structure. As indicated above, the measured data parameters (or profiled data parameter results) may be combined to form data parameter comparative results which are expressed in data operational units, as comparative, numerical values. An exemplary data structure, as a visually perceptible structure, is illustrated as a two-dimensional array in FIG. 5. Following such result generation, the method may end, return step 70.

With regard to the methodology illustrated in FIG. 1, it should be noted that many of the steps are order independent and may occur in any order, without departing from the spirit and scope of the present invention. For example, the dynamic and static data determinations, and their corresponding steps, may occur in a wide variety of orders and in a wide variety of ways. Similarly, the generation of results may also occur in a wide variety of ways and in a wide variety of orders, such as following each iteration, or periodically, rather than at the end of the profiling process. All such variations are considered equivalent to the method illustrated in FIG. 1.
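The control flow of FIG. 1 may be summarized by the following minimal sketch. It assumes that each code portion has already been reduced to a dictionary listing its static and dynamic data items, its algorithmic element, and an optional hardware location; this representation, and the names `profile`, `measure`, and `DYNAMIC_ONLY`, are hypothetical. The step numbers in the comments refer to FIG. 1.

```python
# Parameters that apply only to data which moves (see steps 25 and 45).
DYNAMIC_ONLY = {"source", "destination", "distance_of_movement", "speed_of_movement"}


def measure(datum, kind):
    """Steps 25 and 45: keep only the parameters applicable to this kind of data."""
    if kind == "static":
        return {k: v for k, v in datum.items() if k not in DYNAMIC_ONLY}
    return dict(datum)


def profile(code_portions):
    """Iterate over code portions (step 60) and collect results (step 65)."""
    results = []
    for portion in code_portions:
        for kind in ("static", "dynamic"):               # steps 20 and 40
            for datum in portion.get(kind, []):
                results.append({
                    "portion": portion["name"],
                    "kind": kind,
                    "measured": measure(datum, kind),
                    "algorithmic_element": portion["op"],      # steps 30 and 50
                    "hardware_location": portion.get("hw"),    # steps 35 and 55
                })
    return results


# Hypothetical input: two code portions with pre-extracted data items.
portions = [
    {"name": "fir_tap", "op": "multiply-and-accumulate", "hw": "node_3",
     "static": [{"location": "reg3", "size_bits": 16, "source": None}],
     "dynamic": [{"source": "mem0", "destination": "reg1", "size_bits": 16}]},
    {"name": "scale", "op": "multiply",
     "dynamic": [{"source": "reg1", "destination": "mem8", "size_bits": 16}]},
]
results = profile(portions)
```

The returned list of rows is one simple form of the data structure of step 65; the same rows could equally be written to a table, spreadsheet, or array.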

The methodology of the present invention is particularly suitable for adaptation of existing or legacy code, such as C or C++ code, for the adaptive computing architecture. In addition, the profiling of the present invention is also suitable for new forms of code or programming, including code based upon programming languages designed for the adaptive computing engine.

FIG. 2 is a block diagram illustrating an adaptive computing engine (ACE) 100 of the invention of the related application, which is preferably embodied as an integrated circuit, or as a portion of an integrated circuit having other, additional components. (The ACE 100 is also described in detail in the related application.) In the exemplary embodiment, and as discussed in greater detail below, the ACE 100 includes one or more reconfigurable matrices (or nodes) 150, such as matrices 150A through 150N as illustrated, and a matrix interconnection network (MIN) 110. Also in the exemplary embodiment, and as discussed in detail below, one or more of the matrices 150, such as matrices 150A and 150B, are configured for functionality as a controller 120, while other matrices, such as matrices 150C and 150D, are configured for functionality as a memory 140. While illustrated as separate matrices 150A through 150D, it should be noted that these control and memory functionalities may be, and preferably are, distributed across a plurality of matrices 150 having additional functions to, for example, avoid any processing or memory “bottlenecks” or other limitations. The various matrices 150 and matrix interconnection network 110 may also be implemented together as fractal subunits, which may be scaled from a few nodes to thousands of nodes. Depending upon the selected embodiment, a processor (such as a microprocessor or digital signal processor (DSP)) may be included with the ACE 100 in a larger apparatus or system embodiment.

A significant departure from the prior art, the ACE 100 does not utilize traditional (and typically separate) data, direct memory access (DMA), random access, configuration and instruction busses for signaling and other transmission between and among the reconfigurable matrices 150, the controller 120, and the memory 140, or for other input/output (“I/O”) functionality. Rather, data, control and configuration information are transmitted between and among these matrix 150 elements, utilizing the matrix interconnection network 110, which may be configured and reconfigured, to provide any given connection between and among the reconfigurable matrices 150, including those matrices 150 configured as the controller 120 and the memory 140, as discussed in greater detail below.

It should also be noted that once configured, the MIN 110 also and effectively functions as a memory, directly providing the interconnections for particular functions, until and unless it is reconfigured. In addition, such configuration and reconfiguration may occur in advance of the use of a particular function or operation, and/or may occur in real-time or at a slower rate, namely, in advance of, during or concurrently with the use of the particular function or operation. Such configuration and reconfiguration, moreover, may be occurring in a distributed fashion without disruption of function or operation, with computational elements in one location being configured while other computational elements (having been previously configured) are concurrently performing their designated function.

The matrices 150 configured to function as memory 140 may be implemented in any desired or preferred way, utilizing computational elements (discussed below) of fixed memory elements, and may be included within the ACE 100 or incorporated within another IC or portion of an IC. When the memory 140 is included within the ACE 100, it may be comprised of computational elements which are low power consumption random access memory (RAM), but also may be comprised of computational elements of any other form of memory, such as flash, DRAM, SRAM, SDRAM, MRAM, FeRAM, ROM, EPROM or E²PROM. As mentioned, this memory functionality may also be distributed across multiple matrices 150, and may be temporally embedded, at any given time, as a particular MIN 110 configuration. In addition, the memory 140 may also include DMA engines, not separately illustrated.

The controller 120 may be implemented, using matrices 150A and 150B configured as adaptive finite state machines, as a reduced instruction set (“RISC”) processor, controller or other device or IC capable of performing the two types of functionality discussed below. The first control functionality, referred to as “kernel” control, is illustrated as kernel controller (“KARC”) of matrix 150A, and the second control functionality, referred to as “matrix” control, is illustrated as matrix controller (“MARC”) of matrix 150B.

The matrix interconnection network 110 of FIG. 2, and its subset interconnection networks separately illustrated in FIG. 3 (Boolean interconnection network 210, data interconnection network 240, and interconnect 220), collectively and generally referred to herein as “interconnect”, “interconnection(s)”, “interconnection network(s)” or MIN, provide selectable (or switchable) connections between and among the controller 120, the memory 140, the various matrices 150, and the computational units 200 and computational elements 250 discussed below, providing the physical basis for the configuration and reconfiguration referred to herein, in response to and under the control of configuration signaling generally referred to herein as “configuration information”. In addition, the various interconnection networks (110, 210, 240 and 220) provide selectable, routable or switchable data, input, output, control and configuration paths, between and among the controller 120, the memory 140, the various matrices 150, and the computational units 200 and computational elements 250, in lieu of any form of traditional or separate input/output busses, data busses, DMA, RAM, configuration and instruction busses. In other words, the configuration information is utilized to select or switch various connections between or among computational elements 250 and, in so doing, configures or reconfigures the computational elements 250 to perform different functions, operations, or algorithmic elements.
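The idea that configuration information selects connections among fixed computational elements, and thereby determines the function performed, may be illustrated in software as follows. This is only an analogy under stated assumptions: the two fixed elements are modeled as ordinary functions, and the configuration format (an ordered list of output name, element name, and input names) and all identifiers are hypothetical.

```python
import operator

# Fixed computational elements; their behavior never changes.
ELEMENTS = {"adder": operator.add, "multiplier": operator.mul}


def run_configuration(config, inputs):
    """Evaluate fixed elements wired together as the configuration directs.

    config is an ordered list of (output_name, element_name, (input_a, input_b)).
    The value of the last output named in config is returned.
    """
    signals = dict(inputs)
    for output, element, (a, b) in config:
        signals[output] = ELEMENTS[element](signals[a], signals[b])
    return signals[config[-1][0]]


# Two configurations of the same two elements (hypothetical wiring).
MULTIPLY_ACCUMULATE = [("p", "multiplier", ("a", "b")),
                       ("y", "adder", ("p", "acc"))]
SUM_THEN_SCALE = [("s", "adder", ("a", "b")),
                  ("y", "multiplier", ("s", "acc"))]

inputs = {"a": 2, "b": 3, "acc": 4}
mac_result = run_configuration(MULTIPLY_ACCUMULATE, inputs)   # 2 * 3 + 4
scaled_result = run_configuration(SUM_THEN_SCALE, inputs)     # (2 + 3) * 4
```

The same adder and multiplier produce different results solely because the connections between them differ, which is the role the configuration information plays for the computational elements 250.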

The various matrices or nodes 150 are reconfigurable and heterogeneous, namely, in general, and depending upon the desired configuration: reconfigurable matrix 150A is generally different from reconfigurable matrices 150B through 150N; reconfigurable matrix 150B is generally different from reconfigurable matrices 150A and 150C through 150N; reconfigurable matrix 150C is generally different from reconfigurable matrices 150A, 150B and 150D through 150N, and so on. The various reconfigurable matrices 150 each generally contain a different or varied mix of adaptive and reconfigurable computational (or computation) units (200, FIG. 3); the computational units 200, in turn, generally contain a different or varied mix of fixed, application specific computational elements (250, FIG. 3), which may be adaptively connected, configured and reconfigured in various ways to perform varied functions, through the various interconnection networks. In addition to varied internal configurations and reconfigurations, the various matrices 150 may be connected, configured and reconfigured at a higher level, with respect to each of the other matrices 150, through the matrix interconnection network 110.

The ACE architecture utilizes a plurality of fixed and differing computational elements, such as (without limitation) correlators, multipliers, complex multipliers, adders, demodulators, interconnection elements, routing elements, combiners, finite state machine elements, reduced instruction set (RISC) processing elements, bit manipulation elements, input/output (I/O) and other interface elements, and the lower-level “building blocks” which form these units, which may be configured and reconfigured, in response to configuration information, to form the functional blocks (computational units and matrices) which may be needed, at any given or selected time, to perform higher-level functions and, ultimately, to execute or perform the selected operating mode, such as to perform wireless communication functionality, including channel acquisition, voice transmission, multimedia and other data processing. Through the varying levels of interconnect, corresponding algorithms are then implemented, at any given time, through the configuration and reconfiguration of fixed computational elements (250), namely, implemented within hardware which has been optimized and configured for efficiency, i.e., a “machine” is configured in real-time which is optimized to perform the particular algorithm.

Next, the ACE architecture also utilizes a tight coupling (or interdigitation) of data and configuration (or other control) information, within one, effectively continuous stream of information. This coupling or commingling of data and configuration information, referred to as “silverware” or as a “silverware” module, is the subject of another patent application. For purposes of the present invention, however, it is sufficient to note that this coupling of data and configuration information into one information (or bit) stream, which may be continuous or divided into packets, helps to enable real-time reconfigurability of the ACE 100, without a need for the (often unused) multiple, overlaying networks of hardware interconnections of the prior art. For example, as an analogy, a first configuration of computational elements at a first period of time, as the hardware to execute a corresponding algorithm during or after that first period of time, may be viewed or conceptualized as a hardware analog of “calling” a subroutine in software which may perform the same algorithm. As a consequence, once the configuration of the computational elements has occurred (i.e., is in place), as directed by (a first subset of) the configuration information, the data for use in the algorithm is immediately available as part of the silverware module. The same computational elements may then be reconfigured for a second period of time, as directed by second configuration information (i.e., a second subset of configuration information), for execution of a second, different algorithm, also utilizing immediately available data. The immediacy of the data, for use in the configured computational elements, provides a one or two clock cycle hardware analog to the many separate software steps of determining a memory address, fetching stored data from the addressed registers, and performing the various operations on the data. 
This has the further result of additional efficiency, as the configured computational elements may execute, in comparatively few clock cycles, an algorithm which may require orders of magnitude more clock cycles for execution if called as a subroutine in a conventional microprocessor or digital signal processor (“DSP”).

This use of silverware modules, as a commingling of data and configuration information, in conjunction with the reconfigurability of a plurality of heterogeneous and fixed computational elements 250 to form adaptive, different and heterogeneous computation units 200 and matrices 150, enables the ACE 100 architecture to have multiple and different modes of operation. For example, when included within a hand-held device, given a corresponding silverware module, the ACE 100 may have various and different operating modes as a cellular or other mobile telephone, a music player, a pager, a personal digital assistant, and other new or existing functionalities. In addition, these operating modes may change based upon the physical location of the device. For example, while configured for a first operating mode, using a first set of configuration information, as a CDMA mobile telephone for use in the United States, the ACE 100 may be reconfigured using a second set of configuration information for an operating mode as a GSM mobile telephone for use in Europe.

Referring again to FIG. 2, the functions of the controller 120 (preferably matrix (KARC) 150A and matrix (MARC) 150B, configured as finite state machines) may be explained with reference to a silverware module, namely, the tight coupling of data and configuration information within a single stream of information, with reference to multiple potential modes of operation, with reference to the reconfigurable matrices 150, and with reference to the reconfigurable computation units 200 and the computational elements 250 illustrated in FIG. 3. As indicated above, through a silverware module, the ACE 100 may be configured or reconfigured to perform a new or additional function, such as an upgrade to a new technology standard or the addition of an entirely new function, such as the addition of a music function to a mobile communication device. Such a silverware module may be stored in the matrices 150 of memory 140, or may be input from an external (wired or wireless) source through, for example, matrix interconnection network 110. In the exemplary embodiment, one of the plurality of matrices 150 may be configured to decrypt such a module and verify its validity, for security purposes. Next, prior to any configuration or reconfiguration of existing ACE 100 resources, the controller 120, through the matrix (KARC) 150A, checks and verifies that the configuration or reconfiguration may occur without adversely affecting any pre-existing functionality, such as whether the addition of music functionality would adversely affect pre-existing mobile communications functionality. In the exemplary embodiment, the system requirements for such configuration or reconfiguration are included within the silverware module or configuration information, for use by the matrix (KARC) 150A in performing this evaluative function. 
If the configuration or reconfiguration may occur without such adverse effects, the silverware module is allowed to load into the matrices 150 (of memory 140), with the matrix (KARC) 150A setting up the DMA engines within the matrices 150C and 150D of the memory 140 (or other stand-alone DMA engines of a conventional memory). If the configuration or reconfiguration would or may have such adverse effects, the matrix (KARC) 150A does not allow the new module to be incorporated within the ACE 100.

Continuing to refer to FIG. 2, the matrix (MARC) 150B manages the scheduling of matrix 150 resources, clocking, and the timing of any corresponding data, to synchronize any configuration or reconfiguration of the various computational elements 250 and computation units 200 with any corresponding input data and output data. In the exemplary embodiment, timing or other clocking information is also included within a silverware module or, more generally, within configuration information, to allow the matrix (MARC) 150B through the various interconnection networks to direct a reconfiguration of the various matrices 150 in time for the reconfiguration to occur before corresponding data has appeared at any inputs of the various reconfigured computation units 200. In addition, the matrix (MARC) 150B may also perform any residual processing which has not been accelerated within any of the various matrices 150. As a consequence, the matrix (MARC) 150B may be viewed as a control unit which “calls” the configurations and reconfigurations of the matrices 150, computation units 200 and computational elements 250, in real-time, in synchronization with any corresponding data to be utilized by these various reconfigurable hardware units, and which performs any residual or other control processing. Other matrices 150 may also include this control functionality, with any given matrix 150 capable of calling and controlling a configuration and reconfiguration of other matrices 150.

FIG. 3 is a block diagram illustrating, in greater detail, a reconfigurable matrix 150 with a plurality of computation units 200 (illustrated as computation units 200A through 200N), and a plurality of computational elements 250 (illustrated as computational elements 250A through 250Z), and provides additional illustration of many exemplary types of computational elements 250. As illustrated in FIG. 3, any matrix 150 generally includes a matrix controller 230, a plurality of computation (or computational) units 200, and as logical or conceptual subsets or portions of the matrix interconnect network 110, a data interconnect network 240 and a Boolean interconnect network 210. As mentioned above, in the exemplary embodiment, at increasing “depths” within the ACE 100 architecture, the interconnect networks become increasingly rich, for greater levels of adaptability and reconfiguration. The Boolean interconnect network 210, also as mentioned above, provides the reconfiguration and data interconnection capability between and among the various computation units 200, and is preferably small (i.e., only a few bits wide), while the data interconnect network 240 provides the reconfiguration and data interconnection capability for data input and output between and among the various computation units 200, and is preferably comparatively large (i.e., many bits wide). It should be noted, however, that while conceptually divided into reconfiguration and data capabilities, any given physical portion of the matrix interconnection network 110, at any given time, may be operating as either the Boolean interconnect network 210, the data interconnect network 240, the lower level interconnect 220 (between and among the various computational elements 250), or other input, output, configuration, or connection functionality.

Continuing to refer to FIG. 3, included within a computation unit 200 are a plurality of computational elements 250, illustrated as computational elements 250A through 250Z (individually and collectively referred to as computational elements 250), and additional interconnect 220. The interconnect 220 provides the reconfigurable interconnection capability and input/output paths between and among the various computational elements 250. As indicated above, each of the various computational elements 250 consists of dedicated, application specific hardware designed to perform a given task or range of tasks, resulting in a plurality of different, fixed computational elements 250. Utilizing the interconnect 220, the fixed computational elements 250 may be reconfigurably connected together into adaptive and varied computational units 200, which also may be further reconfigured and interconnected, to execute an algorithm or other function, at any given time, utilizing the interconnect 220, the Boolean network 210, and the matrix interconnection network 110. While illustrated with effectively two levels of interconnect (for configuring computational elements 250 into computational units 200, and in turn, into matrices 150), for ease of explanation, it should be understood that the interconnect, and corresponding configuration, may extend to many additional levels within the ACE 100. For example, utilizing a tree concept, with the fixed computational elements analogous to leaves, a plurality of levels of interconnection and adaptation are available, analogous to twigs, branches, boughs, limbs, trunks, and so on, without limitation.

In the exemplary embodiment, the various computational elements 250 are designed and grouped together, into the various adaptive and reconfigurable computation units 200. In addition to computational elements 250 which are designed to execute a particular algorithm or function, such as multiplication, correlation, clocking, synchronization, queuing, sampling, or addition, other types of computational elements 250 are also utilized in the exemplary embodiment. As illustrated in FIG. 3, computational elements 250A and 250B implement memory, to provide local memory elements for any given calculation or processing function (compared to the more “remote” memory 140), thereby decreasing the distance and time required for data movement. In addition, computational elements 250I, 250J, 250K and 250L are configured to implement finite state machines, to provide local processing capability (compared to the more “remote” matrix (MARC) 150B), especially suitable for complicated control processing.

With the various types of different computational elements 250 which may be available, depending upon the desired functionality of the ACE 100, the computation units 200 may be loosely categorized. A first category of computation units 200 includes computational elements 250 performing linear operations, such as multiplication, addition, finite impulse response filtering, clocking, synchronization, and so on. A second category of computation units 200 includes computational elements 250 performing non-linear operations, such as discrete cosine transformation, trigonometric calculations, and complex multiplications. A third type of computation unit 200 implements a finite state machine, such as computation unit 200C as illustrated in FIG. 3, particularly useful for complicated control sequences, dynamic scheduling, and input/output management, while a fourth type may implement memory and memory management, such as computation unit 200A as illustrated in FIG. 3. Lastly, a fifth type of computation unit 200 may be included to perform bit-level manipulation, such as for encryption, decryption, channel coding, Viterbi decoding, and packet and protocol processing (such as Internet Protocol processing). In addition, another (sixth) type of computation unit 200 may be utilized to extend or continue any of these concepts, such as bit-level manipulation or finite state machine manipulations, to increasingly lower levels within the ACE 100 architecture.

In the exemplary embodiment, in addition to control from other matrices or nodes 150, a matrix controller 230 may also be included or distributed within any given matrix 150, also to provide greater locality of reference and control of any reconfiguration processes and any corresponding data manipulations. For example, once a reconfiguration of computational elements 250 has occurred within any given computation unit 200, the matrix controller 230 may direct that that particular instantiation (or configuration) remain intact for a certain period of time to, for example, continue repetitive data processing for a given application.

The profiling methodology of the present invention is also utilized in both the design and the implementation of ACE 100 circuitry. For many applications, IC functionality is already defined and existing as C or C++ code. In other circumstances, standards and algorithms for various technologies are defined and described as C or C++ code. In accordance with the present invention, this existing or legacy code is profiled based on the data parameters mentioned above. Based upon the results of this profiling, the ACE 100 circuitry, and corresponding configuration information, may be determined, preferably in an iterative fashion, as discussed below. For example, a determination may be made that certain data should be static in an ACE 100 implementation, with appropriate hardware (computational elements) configured and reconfigured around the static data, for the performance of the selected algorithm. In other circumstances, also for example, memory elements may be configured adjacent to other computational elements, to provide a very close distance (or locality of reference) between a data source location and a data destination location.

FIG. 4 is a flow diagram illustrating the iterative profiling process for code selection and adaptive computing architecture design in accordance with the present invention. The process begins, start step 300, with input into the profiler of a plurality of hardware architecture descriptions for a corresponding plurality of computational elements 250 (step 305), with these descriptions typically defined using any selected hardware description language (e.g., Matlab and any number of equivalent languages); input into the profiler of selected program code or program(s), such as C or C++ code, to perform one or more specific, selected functions, operations or algorithms (step 310), such as wireless communication algorithms; and input of any operand data sets or any input parameters or settings (step 315). (As referred to herein, the profiler is an apparatus, system or other embodiment which is capable of performing the method illustrated in FIG. 1, such as a computer or workstation, by way of example and not of limitation.) Profiling is performed on a code portion, using the data parameters and other criteria, which produces measured data parameters, data parameter comparative results, and other statistics (step 320), and one or more corresponding algorithmic elements (of the code portion) are determined or selected for execution in an ACE 100 (step 325). For example, such code may be selected because of its measured data parameters, such as because of poor resource usage, or because of high power consumption, such that greater performance may be achieved through an ACE 100 implementation. The method then selects one or more computational elements, and determines a configuration of the computational elements which will perform the algorithmic element (of the selected or candidate code), step 330. When there is remaining code to be profiled, step 335, the method returns to step 320, and iterates until all of the input program code has been profiled.

When there is no remaining code to be profiled in step 335, the method proceeds to step 340, and generates an adaptive computing (or ACE) architecture from the selected computational elements and determines the overall configuration information needed for the adaptive computing (or ACE) architecture to perform the algorithms of the input program code. In an initial iteration, this process results in a “first generation” ACE 100 architecture (or a subpart computational unit 200) and corresponding configuration information, with “next generation” architectures and configuration information generated with subsequent iterations.

The adaptive computing architecture may then be profiled as various algorithmic elements are performed, based upon the data parameters and other criteria, with the generation of another (or next) set of profiled data parameter results (or measured data parameters), step 345. If this set of profiled data parameter results is not optimal or acceptable in step 350, the method returns to step 340 and iterates, generating a next generation adaptive computing (or ACE) architecture and corresponding configuration information, followed again by profiling and generating another set of profiled data parameter results, step 345.

The determination of acceptability or optimality in step 350 may be performed in any number of ways, such as by using a predetermined criterion of optimality, such as a particular level or score expressed in data operational units, or by comparing the various sets of profiled data parameter results after repeated iterations, and selecting the one or more of the better or best adaptive computing (or ACE) architectures of those generated in repeated iterations of step 340. (It should be noted that optimization is used herein in a very broad sense, to mean and include merely desired or acceptable for one or more purposes, such as for a selected operating mode of a plurality of operating modes, for example, and not just meaning “most” desired or favorable.)

When optimal or acceptable results have been obtained in step 350, the method outputs the comparatively optimal, selected adaptive computing (or ACE) architecture and corresponding configuration information, step 355, and outputs the corresponding profiled data parameter results (such as in the form of a data structure), step 360, and the method may end, step 365. It should be noted that there may be a plurality of optimal adaptive computing (or ACE) architectures produced, depending upon any number of factors or constraints imposed; for example, the architecture may have several operating modes, with any given architecture better for one of the operating modes compared to another operating mode.
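By way of illustration only, the generate, profile and evaluate loop of steps 340 through 360 may be sketched as follows. This is a minimal sketch under stated assumptions, not the profiler itself: the names `Architecture`, `design_iteratively`, `generate`, `profile` and `acceptable` are hypothetical, the score is assumed to be a single number in which lower is better, and a generation cap is added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Architecture:
    """Hypothetical stand-in for a generated architecture plus its configuration."""
    generation: int
    elements: Tuple[str, ...]  # names of the selected computational elements


def design_iteratively(
    generate: Callable[[int], Architecture],   # stands in for step 340
    profile: Callable[[Architecture], float],  # stands in for step 345
    acceptable: Callable[[float], bool],       # stands in for step 350
    max_generations: int = 10,                 # assumption: cap on iterations
) -> Tuple[Architecture, float, List[float]]:
    """Iterate until a score is acceptable; return the best architecture seen."""
    history: List[float] = []
    best_arch, best_score = None, None
    for generation in range(1, max_generations + 1):
        arch = generate(generation)
        score = profile(arch)
        history.append(score)
        # Assumption: a lower score is better.
        if best_score is None or score < best_score:
            best_arch, best_score = arch, score
        if acceptable(score):
            break
    return best_arch, best_score, history


# Toy stand-ins so the sketch runs end to end.
arch, score, history = design_iteratively(
    generate=lambda g: Architecture(g, ("multiplier",) * g),
    profile=lambda a: 100.0 / a.generation,
    acceptable=lambda s: s <= 30.0,
)
print(arch.generation, score, history)
```

Keeping the full history of scores corresponds to the alternative mentioned above, in which the sets of results from repeated iterations are compared and the better or best architecture is selected.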

In the exemplary embodiment, where execution of the algorithmic elements of the program may be optimized through the ACE 100 architecture and corresponding configuration information, the program code may have one or more of the following characteristics: (1) frequently executed code, which may be more appropriately implemented as hardware (i.e., computational elements 250); (2) sequentially executed programming code that could be distributed for parallel execution across computational units 200; (3) inappropriate data typing resulting in wasted memory resources; (4) unnecessary data movement, such that the data should remain static with logic (computational elements 250) configured and reconfigured around the static data; and (5) a distant locality of reference for data, such that source and destination locations for data are comparatively far apart, or a data location “distant” from the computation units 200 or computational elements 250 acting upon the data. Other code or portions of code, such as those only utilized on a single occasion or those involving complicated control sequences, may be performed on various computational units 200 configured as finite state machines, as discussed above, or performed utilizing a separate or additional processor, depending upon selection of an overall system embodiment which may include the adaptive computing (or ACE) architecture with other processor components.

In the exemplary embodiment, the significant measurements of the profiler are those made for the various data parameters discussed above, such as data movement, size, speed and location, and resource utilization, including memory utilization (e.g., frequency of access, load/stores, and so on). Power (consumption and/or dissipation) and other performance metrics may also be assigned to these data parameter measurements, or measured based upon other statistics, providing additional information for use in IC design and evaluation.

As discussed above, the profiling methodology of the present invention may be utilized to evaluate program code on a variety of levels, such as on a code or program level, where the plurality of data parameters are applied to what data does within the program itself (e.g., at a “C” or “C++” function level). The profiling methodology of the present invention may be utilized to evaluate program code on a processor level, where the plurality of data parameters are applied to what data does within a microprocessor or DSP, for example, and may provide a useful point of comparison with profiling of the code on an adaptive computing (or ACE) architecture. The profiled data parameter results from these two types of profiling applications may be referred to, respectively, as “coarse grain” statistics or performance metrics (for the code level), and comparatively lower-level “fine grain” statistics or performance metrics (for the processor or architecture level).

In evaluating program code, the profiler may also measure resource utilization and execution speed using additional metrics or parameters. In the exemplary embodiment, in addition to measurements for the data parameters, other coarse grain measurements include (without limitation): (1) function call metrics, an architecture-independent quantity of how many times a function is executed in the operation; (2) function execution time, an architecture-independent quantity of time in seconds and relative percentage of execution time a function requires to complete; (3) function execution time, including child functions invoked, also architecture-independent and expressed in absolute time in seconds and relative percentage of execution time; (4) memory consumption of the function, measured in bytes, an architecture-independent metric calculated for a complete program, as well as for each component function; and (5) memory throughput, as average throughput, expressed in bytes per second for execution of the complete program and each component function.
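As a hedged illustration of coarse grain measurements (1) and (3), function call counts and execution time including child functions may be collected with a simple wrapper. The sketch below is an assumption about one possible instrumentation approach; the names `profiled`, `call_counts`, `inclusive_seconds`, `multiply_accumulate` and `run_filter` are hypothetical and do not appear in the profiler described herein.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)          # metric (1): number of calls per function
inclusive_seconds = defaultdict(float)  # metric (3): time including child functions


def profiled(fn):
    """Wrap a function so each call is counted and timed (child calls included)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            inclusive_seconds[fn.__name__] += time.perf_counter() - start
    return wrapper


@profiled
def multiply_accumulate(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


@profiled
def run_filter(samples, taps):
    # One multiply-accumulate per sliding window over the samples.
    n = len(taps)
    return [multiply_accumulate(samples[i:i + n], taps)
            for i in range(len(samples) - n + 1)]


result = run_filter([1, 2, 3, 4, 5], [1, 0, -1])
total = inclusive_seconds["run_filter"] or 1.0  # avoid dividing by zero
for name in call_counts:
    share = 100.0 * inclusive_seconds[name] / total
    print(f"{name}: calls={call_counts[name]}, share of run_filter time={share:.1f}%")
```

The timings vary from run to run; only the call counts are deterministic. Exclusive time (metric (2)) could be derived by subtracting the time of child functions, which this sketch does not attempt.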

In the exemplary embodiment, in addition to measurements for the data parameters, other fine grain measurements determine CPU instruction level activity, defined as the number of machine instructions (e.g., add, multiply, load) executed during program execution. The measurements include the number of times each instruction is executed, globally and per function, classified by type (arithmetic, memory, program control, etc.), such as add, multiply, subtract, divide, clear, move, load, swap, jump, and branch. For example, an instruction call profile for some function f₁ on some dataset d may indicate that exactly 100 additions, 50 multiplications and 25 divisions are required. This metric is architecture-independent, meaning that the same number and type of instructions will be performed by function f₁ regardless of the processor architecture. These additional fine grain measurements may or may not be applicable to an adaptive computing (or ACE) architecture.
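The instruction call profile in the example above may be tallied, as a minimal sketch, from a trace of (function, instruction) pairs. The trace format, the function `instruction_profile` and the table `INSTRUCTION_CLASS` are assumptions made for illustration; the classes themselves (arithmetic, memory, program control) are those named above.

```python
from collections import Counter

# Hypothetical instruction-to-class table; only the class names come from the text.
INSTRUCTION_CLASS = {
    "add": "arithmetic", "subtract": "arithmetic",
    "multiply": "arithmetic", "divide": "arithmetic",
    "clear": "memory", "move": "memory", "load": "memory", "swap": "memory",
    "jump": "program control", "branch": "program control",
}


def instruction_profile(trace):
    """Count instructions globally, per function, and per class.

    trace: iterable of (function_name, instruction) pairs (assumed format).
    """
    global_counts = Counter()
    per_function = {}
    per_class = Counter()
    for function_name, instruction in trace:
        global_counts[instruction] += 1
        per_function.setdefault(function_name, Counter())[instruction] += 1
        per_class[INSTRUCTION_CLASS.get(instruction, "other")] += 1
    return global_counts, per_function, per_class


# The f1 example from the text: 100 additions, 50 multiplications, 25 divisions.
trace = ([("f1", "add")] * 100
         + [("f1", "multiply")] * 50
         + [("f1", "divide")] * 25)
global_counts, per_function, per_class = instruction_profile(trace)
print(dict(per_function["f1"]))
print(dict(per_class))
```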

In addition to reporting statistics for a function, the profiler can furthermore distinguish between function “call” and function “execution”. During function call, some overhead work occurs, such as parameter passing and context switching, that can be distinguished from the actual operations the function was designed to do. Simple functions, when they are executed a great number of times, may exhibit large execution times in calling the function. One programming remedy to reduce excessive overhead expenditure is to embed the function's program code within the driver, or master program that invokes the function many times. This is called “inlining” the code. Profiler statistics may be used in the determination of whether inlining should occur. The adaptive computing (or ACE) architecture may be designed, with corresponding configuration information, to similarly reduce (if not eliminate) such overhead.

Often program code executes differently depending upon input operand data or other input parameters passed to the function (of either the code or the hardware). In the exemplary embodiment, as an option, several profiles of function performance may be created based upon different input operand data or input parameter values. Similarly, the size or content of different input data sets (or other operand data) may affect performance. The profiler of the present invention can measure performance with several input data sets, and report worst case, best case, as well as average metrics.
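For illustration, reporting best case, worst case and average metrics over several input data sets may be sketched as below. The function `summarize`, the data set names and the cycle counts are hypothetical, and the sketch assumes a metric for which a lower value is better.

```python
from statistics import mean


def summarize(metric_by_data_set):
    """Return best, worst and average of one metric across input data sets.

    Assumption: lower values are better (e.g., clock cycles or bytes moved).
    """
    values = list(metric_by_data_set.values())
    if not values:
        raise ValueError("at least one input data set is required")
    return {"best": min(values), "worst": max(values), "average": mean(values)}


# Hypothetical cycle counts for one function on three input data sets.
cycles = {"input_data_set_1": 120, "input_data_set_2": 300, "input_data_set_3": 180}
summary = summarize(cycles)
print(summary)
```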

As indicated above, one of the truly novel and important features of the present invention is the calculation or creation of an overall, comparative performance metric, referred to as a data parameter comparative result, in a new measurement unit referred to herein as a “data operational unit”, based upon the measured data parameters (profiled data parameter results), for any given architecture or program, for example. Various sets of measured data parameters (profiled data parameter results) may be generated, with corresponding data operational units, based upon various hardware architectures, based upon corresponding algorithms or algorithmic elements, and based upon various input data sets. Using the comparative data operational units, an optimal architecture may be selected, with a corresponding set of optimal profiled data parameter results.

In the exemplary embodiment, for each set of profiled data parameter results, the profiler may calculate an overall, data parameter comparative result in “data operational units” as a measure, among other things, of data handling efficiency, power consumption, operating speed, and so on. In accordance with the present invention, a data operational unit is a unit measure, which may be represented as a real number, which in general is calculated or created as a result of a selected combination (such as a weighted sum or product) of one or more of the measured data parameters (for a selected plurality of the data parameters), with or without weighting or other biasing, to form a data parameter comparative result in data operational units. A practitioner skilled in the art, depending upon the purposes of the application, could use myriad different methods and types of calculations and combinations, selecting different data parameter measurements of the profiled data parameter results, and potentially biasing each data parameter differently.

For example, for a selected embodiment, distance between memory locations for data movement, and distance between the data location and where it will be processed (locality of reference), may be among the more significant data parameters, as shorter distances and closer localities of reference may be indicative of faster operation. As a consequence, in determining a data parameter comparative result, expressed in data operational units, distance and locality of reference may be provided with increased weighting or biasing compared to, for example, access frequency or data persistence. Continuing with the example, the data parameter comparative result may be determined as a weighted (α, β, γ, δ) sum, such as α(distance)+β(locality)+γ(frequency)+δ(persistence), where in this case α, β >> γ, δ.
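A minimal sketch of such a weighted sum follows. The weight values, the measured values and the names `WEIGHTS` and `comparative_result` are assumptions chosen only to illustrate distance and locality being weighted well above frequency and persistence; the sketch also assumes that each measurement has already been converted to a comparable, dimensionless form and that a lower result is better.

```python
# Hypothetical weights: distance and locality dominate frequency and persistence.
WEIGHTS = {"distance": 10.0, "locality": 10.0, "frequency": 1.0, "persistence": 1.0}


def comparative_result(measured, weights=WEIGHTS):
    """Weighted sum of measured data parameters, in 'data operational units'."""
    return sum(weights[name] * measured[name] for name in weights)


# Two hypothetical candidate architectures profiled on the same code.
arch_a = {"distance": 4.0, "locality": 3.0, "frequency": 20.0, "persistence": 5.0}
arch_b = {"distance": 1.0, "locality": 1.0, "frequency": 25.0, "persistence": 5.0}

score_a = comparative_result(arch_a)  # 10*4 + 10*3 + 20 + 5
score_b = comparative_result(arch_b)  # 10*1 + 10*1 + 25 + 5
print(score_a, score_b)
```

With these illustrative numbers, the second candidate scores lower despite its higher access frequency, because its shorter distances carry the larger weights.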

It should be noted that, as discussed in greater detail below, the measurements themselves comprising the profiled data parameter results should be made, modified or converted into a form to allow such combination into an overall, comparative result expressed in data operational units. Such conversion also may be accomplished through a weighting mechanism, as discussed above.

In an alternative embodiment, the underlying, profiled data parameter results, for each data parameter, are measured and converted directly into data operational units, which are then further combined to form an overall data parameter comparative result.

The measurements of the data parameters themselves, to form the measured data parameters (also referred to as profiled data parameter results) may also be made in relative or absolute (empirical) measures, such as for data movement and locality of reference distances. Other measures, such as speed or access frequency, may be determined or estimated as a number of clock cycles, and translated into a power measurement for an amount of power that a given movement of data will consume.
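As one hedged illustration of translating a clock cycle count into a power-related measure, the sketch below assumes a simple linear model: the time spent is the cycle count divided by the clock frequency, and the energy consumed is that time multiplied by an average power draw. The model, the function name `movement_energy_joules` and the numeric values are assumptions, not measurements or a method stated herein.

```python
def movement_energy_joules(clock_cycles, clock_hz, average_power_watts):
    """Estimate energy for a data movement under an assumed linear model.

    time (s)   = clock_cycles / clock_hz
    energy (J) = time * average_power_watts
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return (clock_cycles / clock_hz) * average_power_watts


# Hypothetical numbers: 1000 cycles at 100 MHz while drawing 0.5 W on average.
energy = movement_energy_joules(1000, 100e6, 0.5)
print(f"{energy:.2e} J")
```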

As mentioned above, a small “locality of reference” is highly desirable in computing, and particularly so in the reconfigurable environment, where traditional data movement to computational units may be supplanted by positioning or creating computational units positioned closer to persistent data.

Absolute measures may be utilized to determine data distances (such as the distance between source and destination locations), based upon a known architecture, such as the distance between primary and tertiary caches. Another method to measure distances among reconfigurable matrices is relativistic or comparative, using comparative measures based upon a potentially changing hardware topology, which may be more appropriate for the adaptive computing engine.
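One possible comparative measure, offered only as a sketch, is a hop count over the current interconnect topology, recomputed whenever the topology changes. The graph representation, the node names and the function `hop_distance` are hypothetical; the reference numerals in the node names are used only as labels.

```python
from collections import deque


def hop_distance(topology, source, destination):
    """Breadth-first search hop count between two nodes; None if unreachable."""
    if source == destination:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor == destination:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Hypothetical topology as an adjacency list.
topology = {
    "memory_140": ["matrix_150"],
    "matrix_150": ["memory_140", "unit_200A", "unit_200B"],
    "unit_200A": ["matrix_150", "element_250A"],
    "unit_200B": ["matrix_150"],
    "element_250A": ["unit_200A"],
}

remote = hop_distance(topology, "memory_140", "element_250A")
local = hop_distance(topology, "unit_200A", "element_250A")
print(remote, local)
```

Under this measure, data held in a memory element configured within the same computation unit is closer to the computational element acting upon it than data held in the more remote memory.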

In addition to embodiment within an adaptive computing architecture such as the ACE 100, the profiler may be embodied in any number of forms, such as within a computer, within a workstation, or within any other form of computing or other system used to profile program code. The profiler may be embodied as any type of software, such as C, C++, C#, Java, or any other type of programming language, including as configuration information (as a form of software) to direct a configuration within an adaptive computing architecture to perform the various profiling functions. The profiler may be embodied within any tangible storage medium, such as within a memory or storage device for use by a computer, a workstation, any other machine-readable medium or form, or any other storage form or medium for use in a computing system to profile program code. Such storage medium, memory or other storage devices may be any type of memory device, memory integrated circuit (“IC”), or memory portion of an integrated circuit (such as the resident memory within a processor IC or ACE 100), including without limitation RAM, FLASH, DRAM, SRAM, SDRAM, MRAM, FeRAM, ROM, EPROM or E²PROM, or any other type of memory, storage medium, or data storage apparatus or circuit, depending upon the selected embodiment. For example, without limitation, a tangible medium storing computer or machine readable software, or other machine-readable medium, is interpreted broadly and may include a floppy disk, a CDROM, a CD-RW, a magnetic hard drive, an optical drive, a quantum computing storage medium or device, a transmitted electromagnetic signal (e.g., a computer data signal embodied in a carrier wave used in internet downloading), or any other type of data storage apparatus or medium, and may have a static embodiment (such as in a memory or storage device) or may have a dynamic embodiment (such as a transmitted electrical signal).

The data parameter comparative results and the measured data parameters may be stored, transmitted, or displayed in the form of a data structure embodied in any tangible medium, data signal or other carrier wave. Such a data structure, for example, may be an array of a plurality of fields stored in a form of memory or in a data storage device, such as the various forms of memory and other storage media and devices discussed above. In addition, such a data structure may also be displayed or illustrated, or converted into a form suitable for such display or illustration. For example, the display of the measured data parameters may be multidimensional and illustrated via any form or type of visual display, such as a video or holographic display, or may be displayed in two dimensions as a graphical or tabular display. For purposes of example, FIG. 5 is a graphical diagram visually illustrating an exemplary data structure embodiment 400 in accordance with the present invention.

Referring to FIG. 5, the exemplary data structure 400 comprises a plurality of fields, a first field 410, a second field 420, a third field 430, and a fourth field 440. The first field 410 comprises an identification or designation of the functions of the program code to be profiled, such as multiply, divide, add, and so on, as determined during profiling, with an identification or designation of a corresponding input data set. As illustrated as an example, first field 410 provides a plurality of functions with two input data sets, designated “function/input data set”, such as function_1/input_data_set_1, function_1/input_data_set_2, function_2/input_data_set_1, function_2/input_data_set_2, and so on. The second field 420 comprises a designation or identification of a plurality of data parameters, in any ordering, and as illustrated includes data location (for static data), data size (input and output), data type, data source and destination locations (for dynamic data), data pipeline length, locality of reference, distance of data movement, speed of data movement, data persistence, data access frequency, number of data load/stores, degree of cache, and register or memory usage. In this exemplary embodiment, the second field 420 also includes designation of a corresponding algorithmic element, corresponding hardware location for the algorithmic element, memory throughput, execution time and frequency, power consumed, and number of instructions utilized.

The third field 430 provides a listing of the measured data parameters (also referred to as profiled data parameter results), for each data parameter of the plurality of data parameters (of field 420), and for each function and input data set (of field 410), and is formed as the profiler is operating. As the profiler operates, it determines which function (of field 410) is occurring in the program code, and with the input data set, performs a measurement or determination of the plurality of data parameters, providing the measured data parameters of field 430. These profiled data parameter results may then be combined, in various forms, to provide one or more data parameter comparative results of a fourth field 440. As illustrated in the fourth field 440, data parameter comparative results are provided for each function and input data set (of field 410), with an overall data parameter comparative result provided for the entire program or architecture being profiled.
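
As a rough illustration of data structure 400, the four fields can be modeled as a table keyed by function and input data set. The sketch below is hypothetical: the class name `ProfileTable`, the parameter names, the sample measurements, and the weighted-sum rule used to form the comparative results are assumptions made for illustration only. The description itself leaves open how the measured data parameters are combined.

```python
from dataclasses import dataclass, field

# Field 420: names of the data parameters being measured (illustrative subset).
PARAMETERS = ("data_size_bytes", "distance_of_data_movement", "load_stores")


@dataclass
class ProfileTable:
    """Hypothetical in-memory model of exemplary data structure 400."""

    # Field 430: measured values, keyed by the (function, input data set)
    # pairs of field 410.
    measured: dict = field(default_factory=dict)

    def record(self, function, input_set, **values):
        """Store one row of measured data parameters (field 430)."""
        unknown = set(values) - set(PARAMETERS)
        if unknown:
            raise ValueError(f"unknown data parameters: {sorted(unknown)}")
        self.measured[(function, input_set)] = dict(values)

    def comparative(self, weights):
        """Field 440: one comparative result per row plus an overall result.

        A weighted sum is assumed here purely for illustration.
        """
        per_row = {
            key: sum(weights[name] * value for name, value in row.items())
            for key, row in self.measured.items()
        }
        return per_row, sum(per_row.values())


table = ProfileTable()
table.record("function_1", "input_data_set_1",
             data_size_bytes=64, distance_of_data_movement=3, load_stores=10)
table.record("function_2", "input_data_set_1",
             data_size_bytes=128, distance_of_data_movement=1, load_stores=4)

# Assumed weights; any other combining rule could be substituted.
weights = {"data_size_bytes": 0.5,
           "distance_of_data_movement": 2.0,
           "load_stores": 1.0}
per_row, overall = table.comparative(weights)
```

Keying the measurements by function and input data set mirrors the "function/input data set" designation of field 410, so the same function profiled with a different input data set occupies its own row.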

In the exemplary embodiment, the profiler is repeatedly run, beginning with the entire program or code set to be incorporated into an ACE architecture, with a target hardware configuration and subsequent modifications, until additional iterations indicate diminishing returns of further acceleration and/or an optimal ACE architecture (with configuration information) is determined. As adjustments are made, certain functions are removed from the code completely, with corresponding algorithmic elements being performed by the computational units 200 of the ACE 100 with its configuration information. Other code, if any, which may not become part of the ACE 100 with configuration information, may be maintained as separate code for separate execution within a processor or within an ACE configured as a processor.
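
The repeated profiling described above can be sketched as a simple loop that stops once a further modification yields too little improvement. This is a minimal sketch under stated assumptions: the function names, the relative-gain stopping rule, the `min_gain` threshold, and the toy cost model are all hypothetical and are not taken from the description.

```python
def refine_until_diminishing(profile, modify, config, min_gain=0.05, max_iters=20):
    """Re-profile and modify a configuration until gains fall below min_gain.

    profile(config) returns a cost (lower is better) and modify(config)
    returns a candidate configuration. Both callables are hypothetical.
    """
    cost = profile(config)
    history = [cost]
    for _ in range(max_iters):
        candidate = modify(config)
        new_cost = profile(candidate)
        # Relative improvement of the candidate over the current configuration.
        gain = (cost - new_cost) / cost if cost else 0.0
        if gain < min_gain:
            break  # diminishing returns: keep the current configuration
        config, cost = candidate, new_cost
        history.append(cost)
    return config, history


# Toy stand-ins: the "configuration" is the number of functions moved out of
# the code and into hardware, and each move helps less than the previous one.
def toy_profile(moved):
    return 100.0 / (1 + moved)


def toy_modify(moved):
    return moved + 1


best, history = refine_until_diminishing(toy_profile, toy_modify, 0, min_gain=0.3)
```

With this toy model the loop accepts two modifications and rejects the third, since the third would improve the cost by only a quarter.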

In another exemplary embodiment, the profiler may also reside within the ACE 100 itself, with the profiler operating upon the actual circuit design or structure of the ACE as it is operating. The ACE 100 is capable of refining and adjusting its own configurations and reconfigurations, in the field, without outside intervention, through modification of the configuration information for any given function or operation. As the ACE operates, based upon the profiled data parameter results, the ACE 100 may determine that it should change or modify its configuration for performance of one or more functions or operations. For example, based upon profiled data parameter results such as distance of data movement, the ACE may reconfigure itself by providing additional data memory in closer proximity to the area of its circuit performing a corresponding calculation, store data in new locations, modify data types, and so on.
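
The self-modification example above, in which data memory is provided closer to the circuit area performing a calculation, might be sketched as follows. The grid coordinates, the use of Manhattan distance as a stand-in for the profiled distance of data movement, the threshold, and the function names are all assumptions made for illustration. The description does not specify how the ACE makes this decision.

```python
def manhattan(a, b):
    """Distance between two (row, column) sites on an assumed grid layout."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def maybe_relocate_memory(compute_site, memory_site, free_sites, threshold=2):
    """Return the memory site to use after one self-profiling step.

    If the profiled distance is within the threshold, or no free site is
    closer, the current site is kept and the configuration is unchanged.
    """
    current = manhattan(compute_site, memory_site)
    if current <= threshold or not free_sites:
        return memory_site
    candidate = min(free_sites, key=lambda site: manhattan(compute_site, site))
    if manhattan(compute_site, candidate) < current:
        return candidate
    return memory_site


# The memory is 7 steps from the computing area, so the nearest free site is chosen.
new_site = maybe_relocate_memory((0, 0), (4, 3), [(1, 0), (2, 2), (5, 5)])
# The memory is already within the threshold, so nothing changes.
same_site = maybe_relocate_memory((0, 0), (1, 1), [(0, 1)])
```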

Numerous advantages of the present invention may be readily apparent. The profiling tool of the invention evaluates both program code and hardware architecture based upon a plurality of data parameters, such as data movement, size and speed. Both existing (or legacy) code and new forms of code are profiled, within a variety of reconfigurable hardware environments or typical processing environments. The profiler of the invention provides profiling information based upon data parameters or metrics such as data location (for static data), data type, data size (input and output), data source and destination locations (for dynamic data), data pipeline length, locality of reference, distance of data movement, speed of data movement, data access frequency, number of data load/stores, degree of cache, register or memory usage, data persistence, corresponding algorithmic element, and corresponding hardware location for the algorithmic element. The profiler of the invention may also provide other measurements of resource utilization, such as memory throughput, execution time and frequency, power consumed, number of instructions utilized, and so on.

The various exemplary embodiments of the profiler of the invention provide unique advantages, such as use in actual circuit design and implementation. The various exemplary embodiments also provide for profiling of actual circuit designs, and self-modification of adaptive or reconfigurable circuitry through self-profiling.

The present invention also provides a novel unit of measure, a “data operational unit”, for use in providing data parameter comparative results. This comparative measure may be utilized to provide direct comparison of otherwise incomparable or incongruent objects, such as allowing direct comparison of a computing program with a hardware architecture (which performs the algorithms of the program).

From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.

Claims

1. A computer-implemented method for generating a reconfigurable architecture for a hardware adaptive computing engine (ACE) having a set of one or more matrices, each matrix comprising a set of one or more computation units, each computation unit comprising a set of one or more computational elements, and the reconfigurable architecture being reconfigurable in real time when ACE configuration code is executed, the method comprising: profiling ACE configuration code to make measurements of a plurality of data parameters, wherein the code is executable and embodies a plurality of algorithmic elements, and wherein the code, when executed, causes a first function to be performed and a second function to be performed; based on the plurality of data parameters measured, selecting which of the algorithmic elements of the code are to be implemented in the reconfigurable architecture for the first function and the second function; receiving a plurality of hardware architecture descriptions of the sets of matrices, computation units and computational elements; based on the hardware architecture descriptions and the selected algorithmic elements, selecting one or more computational elements; selecting an interconnection network for causing the selected one or more computational elements to be connected together in a first architecture configuration in real time for performing the first function, and switching, when the ACE configuration code is executing, the interconnection network for causing the selected one or more computational elements to be connected together in a second architecture configuration for performing the second function, the switching including changing the connections among the computational elements based on the profiling to cause the computational elements to be connected in a first architecture configuration for performing the first function and cause the computational elements to be connected in a second, different architecture configuration for performing the second function.
2. The method of claim 1, further comprising: profiling of the reconfigurable architecture to obtain performance results; and iteratively performing the selecting one or more computational elements, the switching the interconnection network, and the profiling until the results are optimal.
3. The method of claim 2, wherein the optimal results are determined based on comparing the data parameters measured from profiling each generation of the reconfigurable architecture for the hardware adaptive computing engine.
4. The method of claim 2, wherein the optimal results are determined based on comparing the data parameters measured from profiling the reconfigurable architecture for the hardware adaptive computing engine to a predetermined criterion.
5. The method of claim 1, wherein the code comprises a first program to perform the first function and a second, different program to perform the second function.
6. The method of claim 1, wherein the code comprises one program having a first code portion to perform the first function and a second code portion to perform the second function.
7. The method of claim 1, wherein the reconfigurable architecture for the hardware adaptive computing engine is embodied in a single integrated circuit.
8. The method of claim 1, wherein the selecting an interconnection network further comprises: determining a hardware location in the hardware adaptive computing engine for respective ones of the selected one or more computational elements.
9. The method of claim 1, further comprising: based on the plurality of data parameters measured, determining power consumption for each algorithmic element of the plurality of algorithmic elements.
10. The method of claim 1, further comprising providing reconfiguration information to connect the selected one or more computational elements together in the first and second configurations.
11. A computer-implemented method for modifying the configuration of a reconfigurable architecture of a hardware adaptive computing engine (ACE) in real time when ACE configuration code is executed, the adaptive computing engine having a set of one or more matrices, each matrix comprising a set of one or more computation units, each computation unit comprising a set of one or more computational elements, the method comprising: profiling ACE configuration code to determine a plurality of data parameters, wherein the code is executable and embodies a plurality of algorithmic elements, and wherein the code, when executed, causes a first function to be performed and a second function to be performed; based on the plurality of data parameters measured, selecting which of the algorithmic elements of the code are to be implemented in the reconfigurable architecture for the first function and the second function; reading a plurality of hardware architecture descriptions of the sets of matrices, computation units and computational elements; based on the hardware architecture descriptions and the selected algorithmic elements, selecting one or more computational elements; switching, when the ACE configuration code is executing, the interconnection network for causing the selected one or more computational elements to be connected together in a second architecture configuration for performing the second function, the switching including changing the connections among the computational elements based on the profiling to cause the computational elements to be connected in a first architecture configuration for performing the first function and cause the computational elements to be connected in a second, different architecture configuration for performing the second function.
12. The method of claim 11, wherein the modifying further comprises: iteratively profiling the modified architecture configuration and changing the connections among the computational elements to obtain optimal performance results.
13. The method of claim 12, wherein the optimal results are determined based on comparing data parameters measured from profiling each of the configurations of the interconnections.
14. The method of claim 12, wherein the optimal results are determined based on comparing data parameters measured from profiling each modified architecture configuration.
15. The method of claim 11, wherein the code comprises a first program to perform a first function and a second, different program to perform a second function.
16. The method of claim 11, wherein the code comprises one program having a first code portion to perform a first function and a second code portion to perform a second function.
17. The method of claim 11, wherein the data parameters are a plurality of static and dynamic data parameters comprising at least one of the following data parameters: locality of reference parameter; data location for static data; data type; input data size; output data size; data source location; data destination location; data pipeline length; distance of data movement; speed of data movement; data access frequency; number of data load/stores; cache usage; register usage; memory usage, and data persistence.
18. An integrated circuit having a reconfigurable architecture, the integrated circuit being reconfigurable in real time when configuration code is executed, the integrated circuit comprising: a profiler for profiling configuration code to make measurements of a plurality of data parameters, wherein the code is executable to perform a first function of the code and a second function of the code and embodies a plurality of algorithmic elements; a plurality of computational elements; control logic for: a) selecting the algorithmic elements of the code that are to be implemented in the reconfigurable architecture for the first function and the second function based on the plurality of data parameters measured, b) receiving a plurality of hardware architecture descriptions of the sets of matrices, computation units and computational elements, and c) based on the hardware architecture descriptions and the selected algorithmic elements, selecting one or more computational elements; and a reconfigurable interconnection network using real time execution of the configuration code for selectively connecting together the plurality of computational elements in a first configuration associated with the first function of the code, by switching, when the configuration code is executing, the interconnection network and causing the selected one or more computational elements to be connected together in a second architecture configuration in real time for performing the second function, the switching including changing connections among the plurality of computational elements based on the profiling to cause the plurality of computational elements to be connected in a second, different configuration for performing the second function of the code.
19. The integrated circuit of claim 18, wherein the control logic iteratively selects configurations based on the profiling to thereby achieve optimal results.
20. The integrated circuit of claim 19, wherein the control logic determines the optimal results based on comparing data parameters measured from profiling each of the configurations.
21. The integrated circuit of claim 19, wherein the control logic determines the optimal results based on comparing data parameters measured from profiling the configuration to a predetermined criterion.
22. The integrated circuit of claim 18, wherein the code comprises a first program to perform the first function and a second, different program to perform the second function.
23. The integrated circuit of claim 18, wherein the code comprises one program having a first code portion to perform the first function and a second code portion to perform the second function.
24. The integrated circuit of claim 18, further comprising providing reconfiguration information to selectively connect together the plurality of computational elements in the first and second configurations.

Related Publications (1)

	Number	Date	Country
	20040093589 A1	May 2004	US
