The present invention relates to a simulation apparatus that performs a simulation.
There is a simulation apparatus that performs a simulation to develop and verify a system composed of hardware (hereinafter referred to as HW) including a plurality of central processing units (hereinafter referred to as multiple CPUs) or a plurality of cores (hereinafter referred to as multiple cores) and software (hereinafter referred to as SW) that runs on the HW. The simulation apparatus concurrently operates a HW model which describes the HW on a system to be verified (hereinafter referred to as a target system) in a C-based system level design language and a target code which is the SW running on the multiple CPUs or cores to be a target, thereby verifying the operation.
The CPU core 0 model 6001 in the simulation apparatus 6000 is implemented by an instruction set simulator (hereinafter referred to as an ISS), which executes the target code. The ISS executes the target code by converting the target code into an instruction code (hereinafter referred to as a host code) for a host machine or host CPU on which the simulation apparatus 6000 operates.
The simulation apparatus 6000 executes the target code of the CPU core 0 model 6001 and the CPU core 1 model 6002 while synchronizing the two. The simulation apparatus 6000 synchronizes the CPU core 0 model 6001 and the CPU core 1 model 6002 on the basis of time on the host machine or host CPU (such time will be hereinafter referred to as host CPU time). The simulation apparatus 6000 switches execution of the target code between the CPU core 0 model 6001 and the CPU core 1 model 6002 when a certain period of the host CPU time elapses.
When the run time of the simulation apparatus 6000 is viewed in terms of the host CPU time, the CPU core 0 model 6001 and the CPU core 1 model 6002 each execute processing until a certain period of time elapses in the host CPU time, whereby the processing times of the CPU core 0 model 6001 and the CPU core 1 model 6002 appear to be equal to each other.
On the other hand, in terms of the target CPU time, the target CPU times of the CPU core 0 model 6001 and the CPU core 1 model 6002 are not necessarily equal due to the operating frequency, instruction execution cycle, and processing details of each of the CPU core 0 model 6001 and the CPU core 1 model 6002. The CPU core 0 model 6001 and the CPU core 1 model 6002 are thus synchronized at a timing different from that of the target system including the multiple cores or multiple CPUs, thereby causing a problem that the operation of the target system cannot be simulated accurately.
Patent Literature 1 discloses a technique for performing a simulation on a plurality of CPU core models that execute a plurality of threads while synchronizing the plurality of CPU core models. Patent Literature 1 synchronizes the plurality of CPU core models by concurrently executing the plurality of CPU core models while allocating one thread to each of the plurality of CPU core models, and causing the CPU core models to shift to a standby state after execution of a predetermined number of execution instructions in the thread. In other words, Patent Literature 1 runs one thread in one CPU core model and shifts the CPU core model to the standby state after execution of the predetermined number of execution instructions to synchronize the CPU core models with the instructions of the target CPU, thereby solving the above problem and simulating the target system including the multiple cores.
Patent Literature 1: JP 4717492 B2
In the conventional technique described in Patent Literature 1, the plurality of CPU core models synchronize with one another by shifting to the standby state after each has executed the same number of instructions. The conventional technique is thus inapplicable to a case where the cores execute different numbers of instructions in different time units, a case where the cores are run with different operating frequencies, and the like, and therefore cannot be applied to a model that runs the cores with different execution accuracies. That is, the conventional technique has difficulty in maintaining accuracy of synchronization among the multiple CPUs or multiple cores and in performing an accurate performance evaluation while executing the multiple CPUs or multiple cores with different execution accuracies.
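To make this limitation concrete, the following minimal sketch computes the target CPU time each core has reached when it enters the standby state after a common number of instructions. The instruction quota, operating frequencies, cycles-per-instruction values, and the function name are illustrative assumptions, not values taken from Patent Literature 1.

```python
def target_time_at_sync(instructions: int, cycles_per_instruction: float,
                        frequency_hz: float) -> float:
    """Target CPU time (seconds) consumed by executing `instructions` instructions."""
    return instructions * cycles_per_instruction / frequency_hz

# Hypothetical cores: same instruction quota, different operating frequencies.
QUOTA = 1000
core0_time = target_time_at_sync(QUOTA, cycles_per_instruction=1.0, frequency_hz=100e6)
core1_time = target_time_at_sync(QUOTA, cycles_per_instruction=1.0, frequency_hz=50e6)

# Both cores reach the synchronization point after the same instruction count,
# yet their target CPU times at that point differ by this skew.
skew = core1_time - core0_time
print(f"core 0: {core0_time:.2e} s, core 1: {core1_time:.2e} s, skew: {skew:.2e} s")
```

Under these assumed values the slower core is already further along in target CPU time at every synchronization point, which is the kind of mismatch the paragraph above describes.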
The present invention has been made in order to solve the above problems, and an object of the present invention is to provide a multi-core simulation apparatus that ensures accuracy of synchronization among multiple CPUs or multiple cores and performs an accurate performance evaluation even when executing the multiple CPUs or multiple cores with different execution accuracies in simulating a system including the multiple CPUs or multiple cores.
A simulation apparatus of a multi-core model according to the present invention includes: a plurality of processor core models to each execute an instruction being input; a processing time calculator to calculate time at which each of the plurality of processor core models executes the instruction as processing time; a scheduler to select a processor core model to be executed next from among the plurality of processor core models on the basis of the processing time calculated by the processing time calculator; and an overall time holding unit to hold processing time of the entire simulation apparatus determined from the processing time calculated by the processing time calculator, wherein the processor core model selected by the scheduler executes a next instruction in accordance with a direction from the scheduler.
The simulation apparatus according to the present invention can maintain the accuracy of synchronization among the multiple CPUs or multiple cores and perform an accurate performance evaluation while executing the multiple CPUs or multiple cores with different execution accuracies in simulating the system including the multiple CPUs or multiple cores.
A simulation apparatus 1000 illustrated in
Moreover, the CPU model 2000 includes a CPU core 0 model 2001, a CPU core 1 model 2002, an instruction memory model 2003, a processing time calculator 2004, a processing time calculator 2005, and a scheduler 2006. The CPU core 0 model 2001 and the CPU core 1 model 2002 are each a functional model simulating a CPU core in a target system. The instruction memory model 2003 is a functional model storing the program 4000 of a SW model. The processing time calculator 2004 and the processing time calculator 2005 calculate processing times of the CPU core 0 model 2001 and the CPU core 1 model 2002, respectively. The scheduler 2006 controls execution of the CPU core 0 model 2001 and the CPU core 1 model 2002 from the processing times calculated by the processing time calculator 2004 and the processing time calculator 2005.
Moreover, the CPU core 0 model 2001 includes an instruction execution unit 101 and an instruction input controller 102, while the CPU core 1 model 2002 includes an instruction execution unit 201 and an instruction input controller 202. The instruction input controllers 102 and 202 control input of instructions executed by the CPU core 0 model 2001 and the CPU core 1 model 2002 on the basis of the settings of the execution accuracy setting 0 2100 and the execution accuracy setting 1 2200, respectively. The instruction execution units 101 and 201 execute instructions input from the instruction input controllers 102 and 202.
Moreover, the HW model 2400 includes a CPU bus model 2401, a memory model 2402, an external I/O model 2403, and a peripheral device model 2404.
Each of the models is a functional model described in a programming language. These models are modeled using a high-level language such as C language, but the HW model 2400 may be described in a HW description language such as a hardware description language (hereinafter referred to as an HDL) or the like. Note that the configuration of hardware actually implementing the functional models such as the HW model 2400 described in the programming language will be described later with reference to
The execution accuracy setting 0 2100 sets the accuracy of processing execution for the CPU core 0 model 2001, and the execution accuracy setting 1 2200 sets the accuracy of processing execution for the CPU core 1 model 2002. A processing execution step of each core is set in a corresponding one of the execution accuracy setting 0 2100 and the execution accuracy setting 1 2200. The processing execution step is set as any one of the number of execution instructions, the number of cycles, and the processing time period.
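The three kinds of processing execution step can be sketched as a small data structure plus a fetch loop that stops once the configured budget is used up. This is a minimal sketch rather than the implementation of the embodiment: the names `ExecutionAccuracy`, `Instruction`, and `fetch_unit`, the per-instruction `cycles` field, and the `frequency_hz` parameter are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Instruction:
    name: str
    cycles: int  # assumed cost of the instruction in target CPU cycles

@dataclass(frozen=True)
class ExecutionAccuracy:
    kind: str     # "instructions", "cycles", or "time"
    value: float  # budget expressed in the unit named by `kind`

def fetch_unit(program: List[Instruction], pc: int, setting: ExecutionAccuracy,
               frequency_hz: float = 1.0) -> List[Instruction]:
    """Collect instructions starting at `pc` until one execution unit's budget is reached."""
    if setting.kind not in ("instructions", "cycles", "time"):
        raise ValueError(f"unknown execution accuracy kind: {setting.kind}")
    unit: List[Instruction] = []
    used = 0.0
    while pc < len(program) and used < setting.value:
        instr = program[pc]
        if setting.kind == "instructions":
            used += 1
        elif setting.kind == "cycles":
            used += instr.cycles
        else:  # "time": convert cycles to seconds with the assumed frequency
            used += instr.cycles / frequency_hz
        unit.append(instr)
        pc += 1
    return unit

program = [Instruction("add", 1), Instruction("load", 3), Instruction("mul", 2),
           Instruction("store", 3), Instruction("sub", 1)]
by_count = fetch_unit(program, 0, ExecutionAccuracy("instructions", 4))
by_cycles = fetch_unit(program, 0, ExecutionAccuracy("cycles", 4))
print([i.name for i in by_count], [i.name for i in by_cycles])
```

In this sketch an instruction is never split, so a unit bounded by cycles or time may exceed its budget by part of its last instruction. Whether the embodiment rounds up or down at that boundary is not stated here.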
First, in step 800 of
Next, in step 810 of
In step 820 of
Next, in step 830 of
Next, in step 840 of
Next, in step 850 of
In
First, it is assumed that the execution accuracy setting 0 2100 sets “four instructions” and the execution accuracy setting 1 2200 sets “11 instructions”. Moreover, T1 indicates the processing time period of one instruction for the purpose of description. Processings 0A to 0D are the processings performed by the CPU core 0 model 2001 with the execution accuracy setting 0 2100, and processings 1A to 1C are the processings performed by the CPU core 1 model 2002 with the execution accuracy setting 1 2200. Thus, the processings 0A to 0D each include four instructions, and the processings 1A to 1B each include 11 instructions.
Moreover, in an operation preceding the operation of
At the loop count=1 in
At the loop count=2 in
At the loop count=3 in
At the loop count=4 in
At the loop count=5 in
In the next loop, the CPU core 1 model 2002 selected at the loop count=5 executes the processing if not all the instructions are completed by the processing 0D. The simulation apparatus 1000 ends the operation if all the instructions are completed by the processing 0D.
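The selection rule behind this example can be reproduced with a short loop. The sketch below rests on stated assumptions: both cores start at processing time 0, a tie is resolved in favor of the lower core index, every instruction takes T1 = 1 time unit, core 0 has 16 instructions in total and core 1 has 33, and the overall time is the processing time of the core the scheduler selects. The state left by the preceding operation is not reproduced, so the trace is not guaranteed to match the figure loop for loop.

```python
T1 = 1  # assumed processing time period of one instruction (arbitrary unit)

def simulate(unit_sizes, total_instructions, t1=T1):
    """Advance the core with the least processing time, one execution unit per loop.

    unit_sizes: instructions per execution unit for each core (execution accuracy setting)
    total_instructions: assumed number of instructions each core must execute
    Returns a list of (loop count, selected core, overall time, core times after execution).
    """
    times = [0] * len(unit_sizes)         # per-core processing time
    remaining = list(total_instructions)  # instructions left per core
    trace = []
    loop = 0
    while any(r > 0 for r in remaining):
        loop += 1
        # Scheduler: among unfinished cores, select the one with the least elapsed time.
        candidates = [c for c, r in enumerate(remaining) if r > 0]
        core = min(candidates, key=lambda c: (times[c], c))
        # Overall time holding unit: hold the selected core's processing time.
        overall = times[core]
        # Selected core executes one unit, or whatever is left of its program.
        executed = min(unit_sizes[core], remaining[core])
        remaining[core] -= executed
        # Processing time calculator: accumulate the unit's execution period.
        times[core] += executed * t1
        trace.append((loop, core, overall, tuple(times)))
    return trace

# Core 0 runs units of 4 instructions, core 1 runs units of 11 instructions.
for loop, core, overall, times in simulate([4, 11], [16, 33]):
    print(f"loop={loop} selected core {core} overall={overall} times={times}")
```

Because the core that is furthest behind always runs next, the gap between the two processing times in this sketch stays bounded by the larger execution unit, even though the units differ in size.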
The CPU 300 is an example of a processor to control the entire simulation apparatus and execute a program. The memory 301 includes a nonvolatile memory such as the ROM or HDD, which stores programs such as a boot program and a program representing the functional models illustrated in the first to third embodiments, and the RAM, which is used as a work area of the CPU 300 or the like.
The communication I/F 302 is connected to a network and allows the simulation apparatus to be controlled via the network. The network may be a wide area network (WAN) such as an Internet Protocol Virtual Private Network (IP-VPN), a wide area LAN, or an asynchronous transfer mode (ATM) network, or the Internet. The LAN, WAN, and the Internet are examples of the network. The disk drive 303 is a device that controls read and write of data from/to a disk. The I/F 304 is a device that connects devices other than the one illustrated in
As described above, in the simulation of the system including the multiple CPUs or multiple cores, the multiple CPUs or cores can be synchronized accurately and at high speed while the CPUs or cores are operated with different execution accuracies. Moreover, the use of the present simulation apparatus enables an accurate performance evaluation of the system including the multiple CPUs or multiple cores.
As described above, the simulation apparatus 1000 of the multi-core model according to the first embodiment includes: the plurality of processor core models represented by the CPU core 0 model 2001 and the CPU core 1 model 2002 that execute the instructions being input; the processing time calculators 2004 and 2005 that calculate the time at which each of the plurality of processor core models executes the instruction as the processing time; the scheduler 2006 that selects a processor core model to be executed next from the plurality of processor core models represented by the CPU core 0 model 2001 and the CPU core 1 model 2002 on the basis of the processing time calculated by the processing time calculators 2004 and 2005; and the overall time holding unit 2300 that holds the processing time of the entire apparatus determined from the processing time calculated by the processing time calculators 2004 and 2005. The processor core model selected by the scheduler 2006 executes a next instruction in accordance with the direction from the scheduler 2006. With such a configuration, the simulation apparatus 1000 of the multi-core model can maintain the accuracy of synchronization among the multiple CPUs or multiple cores and perform an accurate performance evaluation while executing the multiple CPUs or multiple cores with different execution accuracies in simulating the system including the multiple CPUs or multiple cores.
Moreover, in the simulation apparatus 1000 of the multi-core model according to the first embodiment, the scheduler 2006 includes: the core time determining unit 61 which is a determining unit that determines a processor core model with the least lapse of time among the plurality of processor core models on the basis of the processing time calculated by the processing time calculators 2004 and 2005; and the execution core selecting unit 62 which is a selecting unit that selects the processor core model determined by the core time determining unit 61 as a processor core model to be executed next. Such a configuration can perform control to advance the time of the processor core model with the least lapse of time and increase the accuracy of synchronization among the plurality of processor core models by reducing the difference in the processing times among the plurality of processor core models.
Moreover, in the simulation apparatus 1000 of the multi-core model according to the first embodiment, the CPU core 0 model 2001 and the CPU core 1 model 2002 being the processor core models include the instruction input controllers 102 and 202 that generate the host codes from the instructions being input, and the instruction execution units 101 and 201 that execute the host codes generated by the instruction input controllers 102 and 202, respectively. Such a configuration can convert the instruction being input with the set accuracy into the host code and execute the instruction by the set processing unit.
The simulation apparatus 1000 of the multi-core model according to the first embodiment further includes an execution accuracy setting unit represented by the execution accuracy setting 0 2100 and the execution accuracy setting 1 2200 that set the execution accuracy of the corresponding CPU core 0 model 2001 and CPU core 1 model 2002 being the plurality of processor core models. The instruction input controllers 102 and 202 each generate the host code from the instruction being input on the basis of the execution accuracy set by the execution accuracy setting unit. Such a configuration can individually set the accuracy for the CPU core 0 model 2001 and the CPU core 1 model 2002 being the plurality of processor core models.
Moreover, the execution accuracy set in the simulation apparatus 1000 of the multi-core model according to the first embodiment is any one of the number of instructions, the number of cycles, the processing time period, and the type of the instruction. Such a configuration can set the unit of execution of the instruction being input on the basis of the number of instructions, the number of cycles, the processing time period, or the type of the instruction.
Moreover, in the simulation apparatus 1000 of the multi-core model according to the first embodiment, the processing time calculators 2004 and 2005 each include the processing time period acquiring unit 41 that acquires the execution processing time period of the instruction executed by the processor core model, and the processing time calculating unit 44 that calculates, with the start of the simulation as the starting point, the time at which the processor core model executes the instruction as the processing time on the basis of the execution processing time period acquired by the processing time period acquiring unit 41. Such a configuration can measure the execution processing time period of the instruction executed by the processor core model for each execution unit and calculate an appropriate processing time for each execution unit.
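A processing time calculator of this kind can be sketched as a lookup of per-instruction processing time periods followed by accumulation from the start of the simulation. The class name, the contents of the period table, and the default period are hypothetical; no concrete values are given in this description.

```python
from typing import Dict, Iterable

class ProcessingTimeCalculator:
    """Accumulates the processing time of one core model, measured from simulation start."""

    def __init__(self, period_table: Dict[str, int], default_period: int = 1):
        self.period_table = period_table      # assumed: instruction name -> processing time period
        self.default_period = default_period  # assumed fallback for unlisted instructions
        self.processing_time = 0              # time 0 is the start of the simulation

    def acquire_period(self, executed: Iterable[str]) -> int:
        """Execution processing time period of one execution unit."""
        return sum(self.period_table.get(name, self.default_period) for name in executed)

    def update(self, executed: Iterable[str]) -> int:
        """Add the period of the unit just executed and return the new processing time."""
        self.processing_time += self.acquire_period(executed)
        return self.processing_time

calc = ProcessingTimeCalculator({"load": 3, "store": 3, "mul": 2})
calc.update(["add", "load", "mul", "store"])  # first execution unit
calc.update(["sub", "add"])                   # second execution unit
print(calc.processing_time)
```

Keeping one calculator per core, as with the processing time calculators 2004 and 2005, lets each core's processing time advance by a different amount per execution unit.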
Moreover, in the simulation apparatus 1000 of the multi-core model according to the first embodiment, the overall time holding unit 2300 holds the processing time of the processor core model selected by the scheduler 2006 as the processing time of the entire apparatus. Such a configuration allows the processing time of the entire apparatus to be synchronized with the processing time of the processor core model selected by the scheduler 2006.
The first embodiment mainly illustrates the configuration in the case where the number of instructions is set as the execution accuracy setting 0, whereas the present embodiment illustrates a configuration in the case where the processing time period or the number of cycles is set as the execution accuracy setting 0.
The instruction input controller 102 illustrated in
The operation of the simulation apparatus according to the second embodiment will be described with reference to
In step 800 of
In the flowchart illustrated in
The way a multi-core simulation is performed by the simulation apparatus of the second embodiment is similar to that in
As described above, the simulation apparatus according to the second embodiment performs the simulation of the system including the multiple CPUs or multiple cores such that the multiple CPUs or cores can be synchronized accurately and at high speed while the CPUs or cores are operated with different execution accuracies without limiting threads executed by the multiple CPUs or cores. Moreover, the use of the present simulation apparatus enables an accurate performance evaluation of the system including the multiple CPUs or multiple cores.
Note that the first and second embodiments may include one thread in the program 4000 or have a multi-thread configuration including a plurality of programs. The configurations described in the first and second embodiments can also be applied to the case where the program 4000 includes multiple threads.
The first and second embodiments receive the execution accuracy setting value such as the number of instructions, the processing time period, or the number of cycles from outside the core 0 model 2001 or the core 1 model 2002 to control the processing performed by each core on the basis of the accuracy setting. On the other hand, the present embodiment illustrates a configuration in which one branch included in a program is treated as one unit, that is, each branch instruction delimits a unit of execution, independently of the setting from outside the core 0 model 2001 or the core 1 model 2002.
The operation of the simulation apparatus according to the second embodiment illustrated in
In step 800 of
The operations in steps 81 to 83 are similar to those of the first embodiment. In step 155, the acquired instruction controlling unit 16 determines whether the instruction acquired in step 83 is a branch instruction or a jump instruction so that the instruction acquiring unit 10 stops acquiring instructions if the acquired instruction is the branch instruction or jump instruction, or acquires a next instruction if the acquired instruction is not the branch instruction or jump instruction. The operations in step 86 and the flow after step 810 are similar to those of the first embodiment.
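The stopping rule of step 155 can be sketched as a fetch loop that closes an execution unit as soon as a branch or jump instruction has been acquired. The mnemonic set `BRANCH_OR_JUMP` and the function name are assumptions for illustration; a real instruction set simulator would decode opcodes rather than compare strings.

```python
from typing import List, Tuple

BRANCH_OR_JUMP = {"beq", "bne", "jmp", "call", "ret"}  # assumed mnemonics

def fetch_until_branch(program: List[str], pc: int) -> Tuple[List[str], int]:
    """Acquire instructions from `pc` up to and including the next branch or jump.

    Returns the acquired unit and the index of the instruction that follows it
    in program order (the branch target itself is not resolved in this sketch).
    """
    unit: List[str] = []
    while pc < len(program):
        instr = program[pc]
        unit.append(instr)
        pc += 1
        if instr in BRANCH_OR_JUMP:
            break  # a branch or jump instruction closes the execution unit
    return unit, pc

program = ["add", "load", "beq", "mul", "store", "jmp", "sub"]
unit, next_pc = fetch_until_branch(program, 0)
print(unit, next_pc)
```

With this rule the size of each unit follows the structure of the program itself, so no instruction count, cycle count, or time period has to be supplied from outside the core model.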
As described above, in the simulation apparatus 1000 of the multi-core model according to the third embodiment, the processor core models such as the CPU core 0 model 2001 and the CPU core 1 model 2002 execute the instruction being input with the branch instruction as one unit. Such a configuration allows the processor core models such as the CPU core 0 model 2001, the CPU core 1 model 2002, and the like to execute the processing while determining the execution accuracy independently of the setting from the outside.
10: instruction acquiring unit, 11: number of acquired instructions counting unit, 12: number of acquired instructions controlling unit, 14: host code generating unit, 15: next address generating unit, 16: acquired instruction controlling unit, 17: instruction processing time period information, 41: processing time period acquiring unit, 42: instruction processing time period information, 43: instruction execution checking unit, 44: processing time calculating unit, 45: processing time holding unit, 46: execution instruction information, 47: execution completion information, 61: core time determining unit, 62: execution core selecting unit, 101, 201: instruction execution unit, 102, 202: instruction input controller, 300: CPU, 301: memory (Hard Disk Drive (HDD)/Random Access Memory (RAM)/Read Only Memory (ROM)), 302: communication interface (I/F), 303: disk drive (Compact Disc (CD)/Digital Versatile Disc (DVD)/Floppy Disk (FD)), 304: I/F (Peripheral Component Interconnect (PCI)/Universal Serial Bus (USB)), 305: display, 306: mouse, 307: keyboard, 308: printer, 309: bus, 1000: simulation apparatus, 2000: CPU model, 2001: CPU core 0 model, 2002: CPU core 1 model, 2003: instruction memory model, 2004: processing time calculator, 2005: processing time calculator, 2006: scheduler, 2100: execution accuracy setting 0, 2200: execution accuracy setting 1, 2300: overall time holding unit, 2400: HW model, 2401: CPU bus model, 2402: memory model, 2403: external I/O model, 2404: peripheral device model, 4000: program, 6000: simulation apparatus, 6001: CPU core 0 model, 6002: CPU core 1 model, 6003: CPU bus model, 6004: external I/O model, 6005: peripheral model, 6006: memory model, 7000: SW model.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/056198 | 3/1/2016 | WO | 00 |