This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2005-114775, filed on Apr. 12, 2005, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a multicore model simulator.
2. Description of the Related Art
Recently, built-in processors have been shifting to multicore designs, as have general-purpose CPUs (central processing units) for personal computers. In order to shorten the development period of system LSIs, which are becoming more and more complicated, it is important to perform co-design of hardware and software from an early stage of design. However, existing simulators cannot handle multicores and, in addition, cannot obtain a sufficient simulation speed. Development of a high-speed software/hardware cooperative simulator is therefore a challenge.
When a multi-master configuration such as multicores is simulated with an instruction level simulator (ISS: Instruction Set Simulator), the simulation time increases to the simulation time of one core multiplied by the number of cores. For example, suppose that a processor with a configuration of N cores is simulated and that the programs executed by the individual processors are the same. When the time taken for one core to execute the program is H seconds, the simulation time of the simulator is N×H seconds, because the simulation time is the total time of the N processors.
Besides, Japanese Patent Application Laid-open No. 2001-318805 describes a test method and a test system for verifying software of a built-in system by building a simulator that simulates a hardware configuration of the built-in system on a computer and by using the simulator.
Besides, Japanese Patent Application No. 2004-110812 describes a method and a structure which are capable of effectively mapping memory/address designation of a certain multiprocessing system when emulating by using a virtual memory address designation of another multiprocessing system.
At present, a simulator is used for logic verification such as expected value generation in LSI design, and at the same time for program development of applications and the like. Besides, a simulator is also provided to end users for their own program development. Since a simulator with a high simulation speed contributes significantly to reducing the development period, high-speed simulators are required.
An object of the present invention is to realize a high-speed multicore model simulator.
According to one aspect of the present invention, a multicore model simulator having a plurality of threads, and a plurality of core models that execute the aforesaid plurality of threads is provided.
For example, when the programs executed on the individual processor core models PE0 and PE1 are the same, the processing performance of one processor core model at this time is set as Z [MIPS]. When the time taken for one processor core model to execute the program is set as H seconds, all the processing can be finished in H seconds, which is the processing time of one processor core model, because the N processor core models perform execution in parallel. The processing performance of the multiprocessor core model simulator is Z×N. This embodiment is effective in the computer with the simulation execution environment of the multiprocessor cores 901 and 902 as shown in
The processor core models PE0 and PE1 are synchronized SNC with each other every predetermined number of execution instructions (the number of run steps) of the threads 102 and 103. Note that they may be synchronized SNC every predetermined number of cycles.
Next, the reason for maintaining synchronization between the processor core models PE0 and PE1 will be described. The multiprocessor cores 901 and 902, which are each constructed by an actual one chip, are synchronized at the same clock intervals or at intervals of a constant multiple of a clock. When an interrupt occurs in the processor cores 901 and 902 at a certain timing, processing differs depending on where in the program under execution the interrupt occurs. If the multiprocessor core model cannot correctly realize a time base and the number of instruction steps, the multiprocessor core model cannot debug a user program executed on the processor core models PE0 and PE1. Besides, if the multiprocessor core model cannot count the number of cycles, it cannot perform accurate simulation. As a result, synchronization between the processor cores or between a master and a slave is a very important technique for a multiprocessor core model. In the first place, a multithread program is not generally synchronized between threads. Namely, it is asynchronous. Threads are primarily asynchronous as described above, and the structure of synchronizing them by using an algorithm called a barrier in
In
Next, the processor core model PE0 executes the instructions of a predetermined number I1 of execution instructions of the thread 102, and the processor core model PE1 executes the instructions of the predetermined number I1 of execution instructions of the thread 103. For example, when the processor core model PE0 terminates the execution earlier at a timing t2, it goes into the waiting state by the wait function 501. Next, when the processor core model PE1 terminates the execution at a timing t3, it goes into the waiting state by the wait function 501.
When the processor core models PE0 and PE1 finish the execution, the wait function 501 synchronizes the processor core models PE0 and PE1 to awaken both of them from the waiting state. The processor core model PE0 executes the following instructions of the predetermined number I1 of execution instructions of the thread 102, and the processor core model PE1 executes the following instructions of the predetermined number I1 of execution instructions of the thread 103.
Next, for example, when the processor core model PE1 terminates the execution earlier, it goes into the waiting state by the wait function 501. Next, when the processor core model PE0 terminates the execution at a timing t6, it goes into the waiting state by the wait function 501.
When the processor core models PE0 and PE1 terminate execution, the wait function 501 synchronizes the processor core models PE0 and PE1 at a timing t7, and awakens both of them from the waiting state. The processor core model PE0 executes the following instructions of the thread 102, and the processor core model PE1 executes the following instructions of the thread 103.
Next, when the processor core models PE0 and PE1 terminate execution of all the instructions at a timing t8, the main thread 301 awakes from the sleep state, and returns to the processing of the main thread 301.
When either the processor core model PE0 or PE1 finishes the processing of the predetermined number of execution instructions first, it goes into the waiting state, and the other one of the processor core model PE0 or PE1 releases it. As a result, in the case of three or more processor core models, synchronization can be maintained by the same operation. When traced, the processor core models PE0 and PE1 completely perform parallel operations. By this structure of synchronization, the structure of synchronizing every predetermined number of execution instructions can be realized.
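The barrier-style synchronization described above can be sketched as follows. This is a minimal illustration rather than the simulator's actual code: Python's `threading.Barrier` stands in for the wait function 501, and `RUN_STEPS`, `TOTAL_STEPS`, `run_core_model` and `simulate` are hypothetical names and values chosen for the example. Note also that threads in standard CPython illustrate the synchronization structure only, not the parallel speedup.

```python
import threading

RUN_STEPS = 3    # hypothetical value for the predetermined number I1
TOTAL_STEPS = 9  # hypothetical program length of each thread


def run_core_model(name, barrier, log, lock):
    """Execute a thread in units of RUN_STEPS instructions, waiting after each unit."""
    executed = 0
    while executed < TOTAL_STEPS:
        for _ in range(RUN_STEPS):
            executed += 1  # stand-in for simulating one instruction
        with lock:
            log.append((name, executed))
        # Whichever core model arrives first waits here; the last arrival
        # releases all of them, so the same code works for three or more.
        barrier.wait()


def simulate(core_names=("PE0", "PE1")):
    barrier = threading.Barrier(len(core_names))
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_core_model, args=(n, barrier, log, lock))
        for n in core_names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread sleeps until every core model has finished
    return log
```

Because each core model records its progress before waiting, no core model can start its next unit until all of them have completed the current one.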
The main thread 401 generates the threads 104a and 105a at the first loop processing, the processor core model PE0 executes the thread 104a and the processor core model PE1 executes the thread 105a. The threads 104a and 105a are executed in parallel, and when execution of both of them terminates, the process returns to the processing of the main thread 401. By the processing of the main thread 401, synchronization SNC is achieved when the above described processing of the threads executed in parallel terminates.
Next, the main thread 401 generates the threads 104b and 105b by the second loop processing, the processor core model PE0 executes the thread 104b, and the processor core model PE1 executes the thread 105b. The threads 104b and 105b are executed in parallel, and when execution of both of them terminates, the process returns to the processing of the main thread 401, and the synchronization SNC is performed.
Next, the main thread 401 generates the threads 104c and 105c by the third loop processing, the processor core model PE0 executes the thread 104c, and the processor core model PE1 executes the thread 105c. The threads 104c and 105c are executed in parallel, and when execution of both of them terminates, the process returns to the processing of the main thread 401, and the synchronization SNC is performed.
When the same processing is repeated thereafter and the processing of the last ones of the thread groups 104 and 105 is performed, the loop processing of the main thread 401 terminates. As described above, the processor core model PE0 executes the thread group 104, and the processor core model PE1 executes the thread group 105. A plurality of processor core models PE0 and PE1 execute a plurality of thread groups 104 and 105 in parallel. As a result, speeding up of simulation of the simulator can be realized. In this embodiment, N of processor core models perform execution in parallel as in the first embodiment, and therefore, all the processing can be finished in the H seconds which is the processing time corresponding to one processor core model. The processing performance of the multiprocessor core simulator is Z×N.
As described above, in this embodiment, a thread is made every predetermined number I1 of execution instructions in each of the processor core models PE0 and PE1, and the processor core models PE0 and PE1 are synchronized every predetermined number I1 of execution instructions. The plurality of processor core models PE0 and PE1 are synchronized in the main thread 401 every time the threads executed in parallel terminate. The main thread 401 generates a thread for each of the processor core models PE0 and PE1. The unit of generation is a set of the predetermined number I1 of execution instructions of each of the processor core models PE0 and PE1. By executing instructions unit by unit in this manner, synchronization is enabled in the main thread 401. The predetermined number I1 of execution instructions is a parameter and may be any number of one or more. When such synchronization is adopted, the advantage that debugging of a program is easy is provided.
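The main-thread synchronization of this embodiment can be sketched as a fork-join loop. The sketch below is an illustration under assumptions, not the actual implementation: `RUN_STEPS` is a hypothetical value for I1, each "instruction" is simply a number added to an accumulator, and `execute_unit` and `main_thread` are hypothetical names.

```python
import threading

RUN_STEPS = 4  # hypothetical value for the predetermined number I1


def execute_unit(state, program, start):
    """Thread body: execute at most RUN_STEPS instructions beginning at `start`."""
    for instr in program[start:start + RUN_STEPS]:
        state["acc"] += instr  # stand-in for simulating one instruction
        state["steps"] += 1


def main_thread(programs):
    """Generate one thread per core model per unit and synchronize by joining."""
    states = {name: {"acc": 0, "steps": 0} for name in programs}
    sync_points = []
    longest = max(len(p) for p in programs.values())
    for start in range(0, longest, RUN_STEPS):  # loop processing of the main thread
        threads = [
            threading.Thread(target=execute_unit, args=(states[name], prog, start))
            for name, prog in programs.items()
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # synchronization point: all threads of this unit have ended
        sync_points.append({name: s["steps"] for name, s in states.items()})
    return states, sync_points
```

Since every thread touches only the state of its own core model, the join in the main thread is the only synchronization needed, which is what makes this variant comparatively easy to debug.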
This embodiment shows a simulator example of SoC (System-on-Chip) having the hardware model HW which becomes a master other than the processor core models, and by maintaining the synchronization SNC every predetermined number I1 of execution instructions, the multiprocessor core model and the operation model of SoC can be realized.
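As a rough illustration of how a hardware model acting as a master can share the same synchronization as a processor core model, the sketch below gives both models a common `step()` method and joins them in the main thread after each unit. The classes `ProcessorCoreModel` and `HardwareModel`, the tick counter inside the hardware model, and the value of `RUN_STEPS` are all assumptions made for the example.

```python
import threading

RUN_STEPS = 2  # hypothetical value for the predetermined number I1


class ProcessorCoreModel:
    """Hypothetical processor core model that only advances a program counter."""

    def __init__(self):
        self.pc = 0

    def step(self):
        self.pc += 1


class HardwareModel:
    """Hypothetical hardware master, for example a timer block counting ticks."""

    def __init__(self):
        self.ticks = 0

    def step(self):
        self.ticks += 1


def run_unit(model):
    for _ in range(RUN_STEPS):
        model.step()


def simulate(models, units):
    """Run every model in its own thread and synchronize after each unit."""
    for _ in range(units):
        threads = [threading.Thread(target=run_unit, args=(m,)) for m in models]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # synchronization SNC between core models and the hardware model
```

For example, `simulate([ProcessorCoreModel(), HardwareModel()], units=3)` advances both models by the same number of steps, with a synchronization point after every unit.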
As described above, according to the first to the third embodiments, the multicore model simulator having a plurality of core models which execute a plurality of threads is provided. In the first and second embodiments, the plurality of core models are a plurality of processor core models. Besides, in this embodiment, the plurality of core models include both the processor core model and the hardware core model. Besides, the plurality of core models may be a plurality of hardware core models.
In this embodiment, the example in which the hardware model HW is applied to the second embodiment is explained, but the hardware model HW can be similarly applied to the first embodiment.
Debugging is the processing of finding and eliminating an error (bug) of a computer program. The debugger 701 is software (a computer program) which aids the operations of finding a bug and correcting it. For example, it can stop each of the processor core models PE0 to PEN at an arbitrary execution instruction, or stop execution instruction by instruction, and monitor the internal state of the stopped processor core model.
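The two ways of stopping a core model mentioned above, stopping at a chosen instruction and stopping instruction by instruction, can be illustrated with a toy model. The sketch below is not the debugger 701 itself: `ToyCoreModel`, `ToyDebugger`, and the accumulator-style "instructions" are hypothetical and exist only for this example.

```python
class ToyCoreModel:
    """Hypothetical core model: each instruction adds a value to an accumulator."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.acc = 0

    def step(self):
        """Execute one instruction; return False once the program has ended."""
        if self.pc >= len(self.program):
            return False
        self.acc += self.program[self.pc]
        self.pc += 1
        return True


class ToyDebugger:
    def __init__(self, core):
        self.core = core
        self.breakpoints = set()

    def state(self):
        """Internal state of the stopped core model that the debugger exposes."""
        return {"pc": self.core.pc, "acc": self.core.acc}

    def single_step(self):
        """Stop after exactly one instruction."""
        self.core.step()
        return self.state()

    def run(self):
        """Run until the next breakpoint or the end of the program."""
        while self.core.step():
            if self.core.pc in self.breakpoints:
                break
        return self.state()
```

A typical session sets a breakpoint, calls `run()` to reach it, inspects the returned state, and then continues with `single_step()` or another `run()`.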
The debugger 701 of this embodiment can be applied to the second and the third embodiments in addition to the first embodiment.
The multi-debugger 801 of this embodiment can be applied to the second and the third embodiments in addition to the first embodiment.
As described above, in the first to the fifth embodiments, the multithreaded multicore model simulator can be realized. Single processors have been the mainstream in the computer (computing machine) environment so far, and simulators of multiprocessor core models executed in this environment are single-threaded. Multithreaded multicore model simulators have not been developed since they are difficult to develop. The first to the fifth embodiments adopt the synchronous control method which facilitates development, and therefore, multithreaded (parallel programming) multicore model simulators (including SoC simulators) can be realized. End users are also expected to be able to use a multiprocessor personal computer environment as a matter of course in the near future, and therefore, superiority of the above described embodiments is high.
The above described embodiments are high-speed simulators which can simulate built-in type multiprocessors. The simulators are capable of high-speed simulation of multiprocessors according to the basic principle that one core is operated by a unit of one thread. However, in order to provide the performance of the simulators, the computer environment of the multi-CPU capable of executing multithreads has to be utilized. Under the condition that the number of threads which can be executed in parallel is the number of CPUs of the multiprocessor model or more, high-speed execution is possible.
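The effect of the host CPU count on simulation time can be illustrated with a simple idealized model. The functions below are a sketch under stated assumptions: the serial case uses the N×H figure from the description of the related art, while the handling of a host with fewer CPUs than core models (threads executed in ceil(N / host CPUs) rounds, with no synchronization overhead) is an assumption introduced for the example and not a result from this document.

```python
import math


def serial_time(n_cores, h_seconds):
    """Single-thread simulator: core models are processed one after another."""
    return n_cores * h_seconds


def parallel_time(n_cores, h_seconds, host_cpus):
    """Multithreaded simulator under the idealized round-based assumption."""
    rounds = math.ceil(n_cores / host_cpus)  # assumption, see lead-in
    return rounds * h_seconds


def effective_mips(n_cores, z_mips, host_cpus):
    """Aggregate performance; equals Z×N when host_cpus >= n_cores."""
    speedup = serial_time(n_cores, 1.0) / parallel_time(n_cores, 1.0, host_cpus)
    return z_mips * speedup
```

With N = 4 and H = 10 seconds, the model gives 40 seconds for the single-thread case and 10 seconds when the host has four or more CPUs, matching the Z×N performance stated for the embodiments. On a real host, `os.cpu_count()` can be used to check how many CPUs are available.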
Core models have to be synchronized. When one core model executes one thread, it is necessary to synchronize the threads of the core models. In the first embodiment, the threaded core models are synchronized by utilizing the barrier model. In the second embodiment, synchronization between the core models is controlled in the main thread.
In realization of core model simulators of a processor and SoC, synchronization matters. In the above described embodiments, the mechanism of synchronization of the master block and/or the slave block of the processor core model, hardware core model and the like and high-speed simulation using it are provided. Thereby, multiprogramming becomes possible on the simulator, and the simulators of the multicore and the multi-master/slave which adopt the multiprocessor and SoC can be realized.
When the multiprocessor or SoC model is executed by a conventional single-thread program, the required processing time is the total of the processing times of all the core models or hardware core models which become masters, as if they were connected serially. According to the above described embodiments, the core models and the hardware core models are arranged in parallel, and processing performance which does not depend on their number can be realized. In a parallel program, the mechanism of synchronization is generally very difficult to implement, but in the second embodiment, the mechanism of synchronization is coded in the main thread, and thereby, synchronization is enabled by the code. Thereby, synchronization is also maintained while debugging the program running on the core model. Since the core models perform execution in parallel, the processing speed becomes a multiple of the number of core models as compared with the prior art.
Since the simulator is capable of high-speed execution, it can be used for purposes such as architecture specification study, logic verification such as expected value generation, and firmware development, and it contributes significantly to reduction in the system LSI development period. Besides, this simulator can be applied to simulation of a system LSI including a multiprocessor.
This embodiment can be realized by execution of a program by a computer. Means for supplying the program to a computer, for example, a computer readable recording medium such as a CD-ROM recording such a program, or a transmission medium such as the Internet which transmits such a program, can also be applied as an embodiment of the present invention. Besides, a computer program product such as a computer readable recording medium recording the above described program can also be applied as an embodiment of the present invention. The above described program, recording medium, transmission medium and computer program product are included in the scope of the present invention. As the recording medium, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a nonvolatile memory card, a ROM and the like can be used.
A plurality of core models execute a plurality of threads, and thereby, the high-speed multicore model simulator can be realized.
The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
Number | Date | Country | Kind |
---|---|---|---|
2005-114775 | Apr 2005 | JP | national |