IBM ® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
1. Field of the Invention
Exemplary embodiments relate to a method, apparatus, and computer program product for integrated circuit (IC) function verification, and more particularly, to integrating fairness, performance, and livelock assessment into multiple stages of logic development processes using a loop manager with comparative parallel looping.
2. Description of Background
In large scale, multi-processor cache coherent systems, there exists a significant potential for livelock scenarios in which forward progress in a system is impeded due to one or more processors in the system being unfairly locked out or starved. This lack of fairness can lead to machine check hard errors as one or more threads within the system are unable to complete an operation within a specified period of time. However, in other situations, this type of problem can lead to degraded performance that often goes undetected until the logic is integrated into system-level or performance testing.
In the past, timeout constraints were used in the simulation environment to detect fairness, performance, and livelock problems that would cause the first type of failure identified above (timeouts in the system). However, because the number of commands that can be executed within a given simulation is limited and all drivers in the environment will eventually stop issuing commands, the commands or drivers that are being unfairly locked out or starved will eventually complete and will very often not cause a timeout in simulation. Therefore, timeout constraints did not identify fairness, performance, and livelock problems that caused timeouts in more complicated environments, nor did they identify problems that would cause the second type of failure, degraded performance, in which timeouts do not always occur but overall system performance is adversely impacted. In addition, in the past, there has been no means of easily integrating livelock, performance, and fairness testing into simulation environments to effectively verify and tune the livelock prevention circuitry built into the hardware and to ensure that the threshold counters within this logic are initialized with proper values.
There is a need for the detection and elimination of these types of problems early in the logic development process during simulation and early hardware bringup. There is also a need for a means to effectively tune and verify livelock prevention circuitry in the logic design.
In accordance with exemplary embodiments, a method is provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. Multiple loop macros are generated, where the multiple loop macros respectively correspond to multiple processor threads, and where the multiple loop macros are parallel comparative loop macros. The multiple processor threads for the multiple loop macros are executed, where the multiple processor threads are executed to access a common resource. A forward performance of each of the multiple processor threads is verified. The forward performance of each of the multiple processor threads is compared with each other. It is determined whether any of the multiple processor threads fails to meet a minimum loop count or a minimum loop time. It is determined whether any of the multiple processor threads exceeds a maximum loop count or a maximum loop time. It is recognized whether fairness is maintained during the execution of the multiple processor threads.
In accordance with exemplary embodiments, an apparatus is provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. The apparatus includes memory, a processor functionally coupled to the memory, a manager configured to manage multiple loop macros, and multiple bus functional models, each respectively associated with the multiple loop macros. The multiple bus functional models respectively execute the multiple loop macros as the manager monitors the execution. The manager is configured to verify a forward performance of each of the multiple loop macros being executed by the plurality of bus functional models and to compare the forward performance of each of the plurality of loop macros with each other. The manager is configured to determine whether any of the multiple loop macros fails to meet a minimum loop count or a minimum loop time and to determine whether any of the multiple loop macros exceeds a maximum loop count or a maximum loop time. Also, the manager is configured to recognize whether fairness is maintained during the execution of the multiple loop macros.
In accordance with exemplary embodiments, a computer program product tangibly embodied on a computer readable medium is provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. The computer program product includes instructions for causing a computer to execute the above method.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features of exemplary embodiments are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
The detailed description explains exemplary embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.
Exemplary embodiments provide a mechanism for integrating fairness, performance, and livelock assessment into multiple stages of the logic development process. Rather than using timeout constraints, as were used in the past, exemplary embodiments may use a loop manager with comparative parallel looping within the simulation and bringup environments both to identify and eliminate fairness, performance, and livelock problems and to tune and verify livelock circuitry in the hardware to more effectively address these problems. The advantages of this approach include (1) being able to detect fairness, performance, and livelock problems that would lead to degraded system performance and/or timeouts early in the logic development process and (2) being able to verify and tune livelock prevention circuitry in the hardware early in the logic development process. A further advantage provided by this disclosure is a mechanism whereby the detection and verification can be easily integrated into a simulation testcase language, parameter file, or software exerciser used in, e.g., lab bringup.
In accordance with exemplary embodiments, parallel comparative loop macros may be used to specify a sequence of commands that can be executed by a specific processor for a specified number of times, for a specified number of clock cycles, or until a specified trigger event occurs. The loop macro is integrated into the simulation stimulus language or random parameter definitions and can be introduced into the simulation environment through the use of a testcase port or parameter file in accordance with exemplary embodiments. The loop macro may also be incorporated into a lab bringup software exerciser as a subroutine or function call. A central loop manager manages all loop macros in the environment. Each bus functional model (BFM) that is capable of interpreting and generating loop macros is registered with the central loop manager at the beginning of simulation. In addition, each loop macro is registered with the central loop manager in the simulation environment and can be specified with a group loop ID to enable the loop macro to be compared with other loop macros specified with the same group loop ID. The central loop manager performs relative comparisons between loops assigned a common GROUP_LOOP_ID. The GROUP_LOOP_ID is defined in the loop parameter syntax illustrated in TABLES 1, 2, 3, 4A, and 4B of
The loop macro 100 includes a list of commands 130 (PROC_1, PROC_2, PROC_n) that will be executed as part of the loop (i.e., executing reads/writes/requests for cacheline ownership, etc.). Each one of the lists 130 (PROC_1, PROC_2, PROC_n) includes a list of commands to execute for a single processor. The LOOP_END identifier 140 closes the loop and defines the end of the loop macro 100. For example, for loop macros that are generated from testcase files 120, the LOOP_END identifier 140 may be supplied in the testcase file 120 and will indicate the end of the loop macro. For loop macros that are generated from random parameter files 115, the loop is closed according to the loop parameters used to construct the loop (e.g., supplied in the parameter file 115). For example, if the random parameter file 115 indicated that a loop with 15 instructions was to be generated, the loop would close after the generation of the 15th instruction.
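As a non-limiting illustration, the two ways of closing a loop macro described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class LoopMacro, the functions from_testcase and from_parameters, and the command mnemonics in COMMANDS are hypothetical names chosen for illustration and are not taken from the tables or figures.

```python
import random
from dataclasses import dataclass, field

LOOP_END = "LOOP_END"  # closing identifier named in the description

# Hypothetical command mnemonics, assumed for illustration only.
COMMANDS = ["READ", "WRITE", "REQ_OWNERSHIP"]


@dataclass
class LoopMacro:
    processor: str                       # e.g. "PROC_1"
    commands: list = field(default_factory=list)
    closed: bool = False


def from_testcase(processor, lines):
    """Build a loop macro from testcase lines; LOOP_END closes the loop."""
    macro = LoopMacro(processor)
    for line in lines:
        if line.strip() == LOOP_END:
            macro.closed = True
            break
        macro.commands.append(line.strip())
    return macro


def from_parameters(processor, num_instructions, seed=0):
    """Build a random loop macro; it closes after num_instructions commands."""
    rng = random.Random(seed)
    macro = LoopMacro(processor)
    for _ in range(num_instructions):
        macro.commands.append(rng.choice(COMMANDS))
    macro.closed = True  # closed by the loop parameters, not by LOOP_END
    return macro
```

In this sketch, a call such as from_parameters("PROC_2", 15) yields a closed loop of exactly 15 commands, mirroring the 15-instruction example above.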
As non-limiting examples of commands that may be utilized in the commands 130,
As illustrated in TABLES 1-4B, the loop macros can be time-based, count-based, or trigger event-based loop macros (see description of LOOP_TYPE in TABLE 1). In addition, by creating separate loops for all processor threads in the simulation environment and using the MIN_LOOP_COUNT, MAX_LOOP_COUNT, MIN_LOOP_TIME, and MAX_LOOP_TIME parameters, the forward progress of each processor thread can be verified. For example, if processor loops are set up for each processor thread as shown in
As illustrated in
When one of the loop macros with a specific GROUP_LOOP_ID reaches the GLOBAL_EXIT_NUM_ITER value, the loop manager 300 can verify whether all other loops with the same GROUP_LOOP_ID have reached the MIN_LOOP_COUNT number of iterations, and the loop manager 300 can optionally direct all loops with this GROUP_LOOP_ID to terminate at the completion of the current iteration. A benefit of having the loop manager 300 configured to terminate all loops at the current iteration is that relative performance comparisons would only be valid as long as all processor threads were executing and competing for the same shared resource. For example, if the loops were configured to assess relative performance and fairness, the point at which one of the loops has reached a maximum loop count (and therefore has stopped execution) would be the point at which this relative evaluation could be made. After this point, the system would no longer include maximum contention with respect to the shared resources and there would be no need to continue the execution of the remaining loops.
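As a non-limiting illustration, the group exit check described above may be sketched as follows. The function name check_group_exit and the dictionary representation of per-loop iteration counts are assumptions made for this sketch; only the parameter names GLOBAL_EXIT_NUM_ITER and MIN_LOOP_COUNT come from the description.

```python
def check_group_exit(iterations, global_exit_num_iter, min_loop_count):
    """Evaluate one GROUP_LOOP_ID.

    iterations maps a loop name to its completed iteration count.
    Returns (exit_reached, starved), where starved lists the loops that
    are below MIN_LOOP_COUNT at the moment the exit value is reached.
    """
    exit_reached = any(n >= global_exit_num_iter for n in iterations.values())
    if not exit_reached:
        # No loop has reached GLOBAL_EXIT_NUM_ITER yet; nothing to evaluate.
        return False, []
    starved = sorted(name for name, n in iterations.items()
                     if n < min_loop_count)
    return True, starved
```

A non-empty list of starved loops would indicate, under these assumptions, that at least one processor thread failed to make the minimum forward progress while the group was still contending for the shared resource.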
The loop manager 300 may also use the MAX_ITER_GAP parameter to indicate the maximum discrepancy between current total iterations that can exist between all loops with a common Group ID. The MAX_ITER_GAP parameter may be checked each time a loop macro in the group completes or as each loop macro for BFM 310 and BFM 320 informs the loop manager 300 that an iteration has completed. For example, an iteration may be defined as the completion of one loop count.
The loop manager 300 monitors all loops with common Group IDs and verifies fairness by ensuring that the number of iterations processed by each loop does not vary by more than the MAX_ITER_GAP value. This enables livelock, fairness, and performance assessment to be performed on the fly for randomly generated environments.
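As a non-limiting illustration, the MAX_ITER_GAP check may be sketched as a monitor that is notified each time a loop completes an iteration. The class name FairnessMonitor and the method iteration_done are hypothetical; the sketch assumes that the gap is the difference between the largest and smallest iteration counts in the group.

```python
class FairnessMonitor:
    """Tracks iteration counts for loops sharing one Group ID."""

    def __init__(self, loop_names, max_iter_gap):
        self.max_iter_gap = max_iter_gap
        self.iterations = {name: 0 for name in loop_names}

    def iteration_done(self, name):
        """Record one completed iteration; return True if fairness holds."""
        self.iterations[name] += 1
        counts = self.iterations.values()
        gap = max(counts) - min(counts)  # assumed definition of the gap
        return gap <= self.max_iter_gap
```

With a MAX_ITER_GAP of 2 and two loops, the third consecutive iteration reported by the same loop would be flagged, because the other loop has not completed any iteration in the meantime.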
The loop macros are executed by the respective BFMs at 415. The MAX_ITER_GAP parameter may be checked each time a loop macro in the group completes and/or as each loop macro informs the loop manager 300 that an iteration has completed at 420. The loop manager 300 monitors all loops with common Group IDs and verifies fairness by ensuring that the number of iterations processed by each loop does not vary by more than the MAX_ITER_GAP value at 425. The loop manager 300 may determine whether one of the loops with the specific GROUP_LOOP_ID has reached the GLOBAL_EXIT_NUM_ITER value (or any other predefined parameter) at 430. If the loop macros have not reached the GLOBAL_EXIT_NUM_ITER value, the process returns to operation 415. In response to one of the loops with a specific GROUP_LOOP_ID reaching the GLOBAL_EXIT_NUM_ITER value (or any other predefined parameter), the loop manager 300 may verify whether all other loops with the same GROUP_LOOP_ID have reached the MIN_LOOP_COUNT number of iterations or have satisfied other loop criteria (MAX_LOOP_COUNT, MIN_LOOP_TIME, etc.) at 435. The loop manager 300 may direct all loops with this GROUP_LOOP_ID to terminate at the completion of the current iteration at 440.
The loop manager 300 may also use the MAX_ITER_GAP parameter to indicate the maximum discrepancy between current total iterations that can exist between all loops with a common Group ID. Further, in response to one of the loop macros meeting a specific constraint (such as GLOBAL_EXIT_NUM_ITER), the loop manager 300 can stop the other loops. As discussed herein, this enables livelock, fairness, and performance assessment to be performed on the fly for randomly generated environments.
After all associated loop macros have quiesced, the loop manager 300 will inform the BFM that is requesting a HOT_PLUG operation that it can proceed with the HOT_PLUG command at 530. Following the execution of the HOT_PLUG operation, the BFM will re-register with the loop manager 300 at 535, and the loop manager 300 will inform all BFMs having associated loop macros to continue execution at 540. Exemplary embodiments allow for the HOT_PLUG process, which is a common system requirement in which a system can be temporarily quiesced and reconfigured with different memory or operative characteristics. Measuring relative fairness and livelock in the midst of this type of operation is provided in accordance with exemplary embodiments.
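As a non-limiting illustration, the ordering of operations 530, 535, and 540 may be sketched as follows. The class HotPlugManager, its method names, and its event log are hypothetical. The quiesce step shown before operation 530 is an assumption about how the loops reach the quiesced state; it is not taken from the description of operations 530 through 540.

```python
class HotPlugManager:
    """Toy loop manager that sequences a HOT_PLUG request."""

    def __init__(self):
        self.running = {}  # BFM name -> True while its loop macro executes
        self.log = []      # ordered record of events, for inspection

    def register(self, bfm):
        self.running[bfm] = True
        self.log.append(("register", bfm))

    def hot_plug(self, requester, operation):
        # Assumed preceding step: quiesce every registered loop macro.
        for bfm in self.running:
            self.running[bfm] = False
        self.log.append(("quiesced", tuple(sorted(self.running))))
        # 530: tell the requesting BFM that it may proceed.
        self.log.append(("proceed", requester))
        operation()
        # 535: the requesting BFM re-registers with the manager.
        self.register(requester)
        # 540: all BFMs with loop macros continue execution.
        for bfm in self.running:
            self.running[bfm] = True
        self.log.append(("continue", tuple(sorted(self.running))))
```

The callable passed as operation stands in for the HOT_PLUG command itself; in this sketch it runs only while every registered loop is marked as not running.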
As shown by
The loop macro 620 functions of the loop command operation 630 may include among others: loop initialize functions to initialize the parameters of the loop, loop update functions to update the loop manager 610 as the loop progresses, loop abort functions to prematurely terminate the loop, loop start/stop functions to start and stop the loop, loop pattern functions to set up the type of loop pattern to follow, and a loop error handler to handle errors during loop operation.
The loop manager 610 contains the loop queue 640 or list of active loops in the simulation environment as well as associated functions for managing this loop queue 640. The associated functions for managing this loop queue 640 may include among others: register/unregister functions to register both BFMs and loop macros (such as loop macro 620) associated with BFMs, lock/unlock functions to allow BFMs to have sole access to specific resources (such as the resource 330) or addresses in the environment for a period of time, query functions to query the status of active loops, and management functions to manage the execution (e.g., quiesce, abort, generate) of loops in the environment.
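As a non-limiting illustration, the loop queue and its associated functions may be sketched as follows. The class LoopQueueManager, its method names, and the status strings are hypothetical; the sketch covers only register/unregister, lock/unlock, query, and a single management function (quiesce).

```python
class LoopQueueManager:
    """Toy loop manager holding a queue of active loops."""

    def __init__(self):
        self.bfms = set()
        self.loop_queue = {}  # loop id -> {"bfm": name, "status": str}
        self.locks = {}       # resource or address -> owning BFM

    def register_bfm(self, bfm):
        self.bfms.add(bfm)

    def register_loop(self, loop_id, bfm):
        if bfm not in self.bfms:
            raise KeyError(f"BFM {bfm!r} is not registered")
        self.loop_queue[loop_id] = {"bfm": bfm, "status": "active"}

    def unregister_loop(self, loop_id):
        self.loop_queue.pop(loop_id, None)

    def lock(self, resource, bfm):
        """Grant sole access; return True if bfm now owns the resource."""
        return self.locks.setdefault(resource, bfm) == bfm

    def unlock(self, resource, bfm):
        if self.locks.get(resource) == bfm:
            del self.locks[resource]

    def query(self, loop_id):
        """Return the status of a loop, or None if it is not queued."""
        entry = self.loop_queue.get(loop_id)
        return entry["status"] if entry else None

    def quiesce(self, loop_id):
        self.loop_queue[loop_id]["status"] = "quiesced"
```

A dictionary keyed by loop identifier is used here for simplicity; an ordered queue would serve equally well if the order of registration matters.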
In accordance with exemplary embodiments, by integrating this looping mechanism within the simulation environment through the use of parameter files or testcase files as shown in
In addition, by integrating the looping mechanism and loop manager described herein within a subroutine of a software exerciser during lab bringup, livelock verification can be performed quickly and fairness and performance can be assessed early.
The plurality of processor threads for the plurality of loop macros are executed, where the plurality of processor threads are executed to access a common resource at 1210. A forward performance of each of the plurality of processor threads is verified at 1220. The forward performance of each of the plurality of processor threads is compared with each other at 1230. It is determined whether any of the plurality of processor threads fails to meet a minimum loop count or a minimum loop time at 1240. It is determined whether any of the plurality of processor threads exceeds a maximum loop count or a maximum loop time at 1250. It is recognized whether fairness is maintained during the execution of the plurality of processor threads at 1260.
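As a non-limiting illustration, the determinations at 1240 and 1250 may be sketched as a bounds check over per-thread results. The class LoopResult and the function check_bounds are hypothetical, and the sketch assumes that loop time is measured in clock cycles and that a thread fails a minimum when its value is strictly below it and exceeds a maximum when its value is strictly above it.

```python
from dataclasses import dataclass


@dataclass
class LoopResult:
    thread: str
    loop_count: int  # completed iterations
    loop_time: int   # elapsed time, assumed to be in clock cycles


def check_bounds(results, min_count, max_count, min_time, max_time):
    """Return (thread, violated parameter) pairs for every violation."""
    violations = []
    for r in results:
        if r.loop_count < min_count:
            violations.append((r.thread, "MIN_LOOP_COUNT"))
        if r.loop_count > max_count:
            violations.append((r.thread, "MAX_LOOP_COUNT"))
        if r.loop_time < min_time:
            violations.append((r.thread, "MIN_LOOP_TIME"))
        if r.loop_time > max_time:
            violations.append((r.thread, "MAX_LOOP_TIME"))
    return violations
```

An empty result would indicate, under these assumptions, that every processor thread stayed within the configured count and time limits.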
A loop manager directs a plurality of bus functional models to respectively generate the plurality of loop macros in accordance with predefined parameters. Also, the plurality of loop macros may be generated by receiving an input of the plurality of loop macros.
Generally, in terms of hardware architecture, the computer 1300 may include one or more processors 1310, memory 1320, and one or more input and/or output (I/O) devices 1370 that are communicatively coupled via a local interface (not shown). The local interface can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface may have additional elements, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
The processor 1310 is a hardware device for executing software that can be stored in the memory 1320. The processor 1310 can be virtually any custom made or commercially available processor, a central processing unit (CPU), a digital signal processor (DSP), or an auxiliary processor among several processors associated with the computer 1300, and the processor 1310 may be a semiconductor based microprocessor (in the form of a microchip) or a macroprocessor.
The memory 1320 can include any one or combination of volatile memory elements (e.g., random access memory (RAM), such as dynamic random access memory (DRAM), static random access memory (SRAM), etc.) and nonvolatile memory elements (e.g., ROM, erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), tape, compact disc read only memory (CD-ROM), disk, diskette, cartridge, cassette or the like, etc.). Moreover, the memory 1320 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 1320 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 1310.
The software in the memory 1320 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. The software in the memory 1320 includes a suitable operating system (O/S) 1350, compiler 1340, source code 1330, and an application 1360 (which may be one or more applications) of the exemplary embodiments. As illustrated, the application 1360 comprises numerous functional components for implementing the features and operations of the exemplary embodiments. The application 1360 of the computer 1300 may represent various applications (loop macros or processor threads), but the application 1360 is not meant to be a limitation.
The operating system 1350 may control the execution of other computer programs, and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.
The application 1360 may be a source program, executable program (object code), script, or any other entity comprising a set of instructions to be performed. When the application 1360 is a source program, the program is usually translated via a compiler (such as the compiler 1340), assembler, interpreter, or the like, which may or may not be included within the memory 1320, so as to operate properly in connection with the O/S 1350. Furthermore, the application 1360 can be written in (a) an object oriented programming language, which has classes of data and methods, or (b) a procedural programming language, which has routines, subroutines, and/or functions, for example but not limited to, C, C++, C#, Pascal, BASIC, API calls, HTML, XHTML, XML, ASP scripts, FORTRAN, COBOL, Perl, Java, ADA, .NET, and the like.
The I/O devices 1370 may include input devices such as, for example but not limited to, a mouse, keyboard, scanner, microphone, camera, etc. Furthermore, the I/O devices 1370 may also include output devices, for example but not limited to, a printer, display, etc. Finally, the I/O devices 1370 may further include devices that communicate both inputs and outputs, for instance but not limited to, a NIC or modulator/demodulator (for accessing remote devices, other files, devices, systems, or a network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, etc. The I/O devices 1370 also include components for communicating over various networks, such as the Internet or an intranet.
When the computer 1300 is in operation, the processor 1310 is configured to execute software stored within the memory 1320, to communicate data to and from the memory 1320, and to generally control operations of the computer 1300 pursuant to the software. The application 1360 and the O/S 1350 are read, in whole or in part, by the processor 1310, perhaps buffered within the processor 1310, and then executed.
When the application 1360 is implemented in software it should be noted that the application 1360 can be stored on virtually any computer readable medium for use by or in connection with any computer related system or method. In the context of this document, a computer readable medium may be an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by or in connection with a computer related system or method.
The application 1360 can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic or optical), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc memory (CDROM, CD R/W) (optical). Note that the computer-readable medium could even be paper or another suitable medium, upon which the program is printed or punched, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In exemplary embodiments, where the application 1360 is implemented in hardware, the application 1360 can be implemented with any one or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
It is understood that the computer 1300 includes non-limiting examples of software and hardware components that may be included in various devices and systems discussed herein, and it is understood that additional software and hardware components may be included in the various devices and systems discussed in exemplary embodiments.
The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.
As one example, one or more aspects of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer usable media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the capabilities of the present invention. The article of manufacture can be included as a part of a computer system or sold separately.
Additionally, at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform the capabilities of the present invention can be provided.
The flow diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations are considered a part of the claimed invention.
While exemplary embodiments of the invention have been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.