This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2011-157111, filed on Jul. 15, 2011; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a program generating apparatus, a method of generating a program, and a medium.
Recently, multiprocessors each having a memory group in which a plurality of memories is configured to be hierarchically connected have been developed. In a case where software is produced in a cross development environment for the multiprocessor as a target processor, an operation of estimating the performance acquired when the produced software is executed by the target processor is necessary. Here, the performance represents a processing time required for the execution.
FIG. 7 is a diagram illustrating a specific example of the input source code;
In general, according to an embodiment, a program generating apparatus includes a cross-compiling unit, a processing time calculating unit, a source code converting unit, and a self-compiling unit. The cross-compiling unit generates an instruction string for each basic block based on a source code and specifies instructions, which are included in the instruction string, performing a memory access. The processing time calculating unit calculates a processing time of the instruction string for each basic block and generates memory access information, which is used for identifying an access destination of the memory access, for each of the specified instructions. The source code converting unit inserts a first code, which adds the processing time of the basic block to an accumulated processing time variable of an executed thread of the basic block, and a second code, which calculates the processing time required for the memory access based on the memory access information and adds the calculated processing time to the accumulated processing time variable of an executed thread of the memory access, into the source code. The self-compiling unit generates a performance estimating program outputting the accumulated processing time variable of the thread executed last based on the source code after the insertion of the codes.
Exemplary embodiments of a program generating apparatus, a method of generating a program, and a medium will be explained below in detail with reference to the accompanying drawings. The present invention is not limited to the following embodiments.
A first embodiment of the invention is a program generating apparatus that can acquire an estimation of the processing time required for a case where software is executed on a target machine in an environment in which the target machine is not present.
Here, a target machine is assumed to have a multiprocessor architecture in which a plurality of processors is connected to a memory group having a hierarchical configuration. In addition, a memory is a device that maintains digital data processed by a processor. The processor included in the target machine is a calculation device that executes intrinsic instructions and will be referred to as a target processor. In addition, a device that includes a processor other than the target processor, a memory, an input/output device, and the like and generates a program (performance estimating program) used for estimating the performance of a program for the target machine or acquires a processing time by executing the generated performance estimating program will be referred to as a host machine, and a calculation device included in the host machine will be referred to as a host processor.
Hereinafter, each one of the L1 caches 2a to 2d, the L2 caches 3a and 3b, the L3 cache 4, and the main memory 5 that are included in the target machine will be referred to as an individual memory, and a memory system formed by connecting the L1 caches 2a to 2d, the L2 caches 3a and 3b, the L3 cache 4, and the main memory 5 will be referred to as a memory group.
In addition, it is assumed that the source code of software can be converted into a machine instruction string without any loss in the meaning by using a compiler as a conversion device. The converted machine instruction string will be referred to as a program. In addition, to compile a source code into a host machine-dedicated program will be referred to as self-compiling, and to compile a source code into a target machine-dedicated program will be referred to as cross-compiling. A program generated by self-compiling will be referred to as a host program, and a program generated by cross-compiling will be referred to as a target machine program. In addition, the compiled software includes an instruction string that is executable on the target processor, and this instruction string will be referred to as a thread. The program is a set of the threads. Here, software having parallelism represents software in which a program generated through compiling has threads that can be simultaneously executed by mutually different processors, and the threads operate in cooperation with each other.
As illustrated in
In addition, in the first embodiment, although the input source code 1000 is assumed to be data that is described in a high-level language, the input source code 1000 may be data described in any way as long as it can be converted into a machine instruction string without any loss in the meaning by using the compiler.
The thread scheduler 20 has a function of managing an accumulated processing time kept by each thread. The accumulated processing time kept by the thread is a sum of processing times required for the execution of an instruction on the target machine. In addition, the thread scheduler 20 has a function of receiving an inquiry on whether the currently executed thread can execute a process affecting the other threads and permitting the execution when the process can be executed or temporarily stopping the thread until the process can be executed when the process cannot be executed. Accordingly, the occurrence of a contradiction that a future process affects the past when the thread is executed on the target machine can be prevented.
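The accumulated-time bookkeeping and the execution check described above can be sketched as follows. This is a minimal illustration only; the class and method names (`ThreadScheduler`, `add_time`, `can_proceed`) are assumptions and are not prescribed by the embodiment.

```python
# Hypothetical sketch of the thread scheduler's accumulated-time management.
class ThreadScheduler:
    def __init__(self):
        # thread id -> accumulated processing time on the target machine (cycles)
        self.accumulated = {}

    def add_time(self, thread_id, cycles):
        # Add the processing time required on the target machine to the
        # accumulated processing time kept by the thread.
        self.accumulated[thread_id] = self.accumulated.get(thread_id, 0) + cycles

    def can_proceed(self, thread_id, affected_ids):
        # A thread may execute a process affecting other threads only while
        # its accumulated time is the shortest among the affected threads;
        # otherwise it must be stopped temporarily, so that a "future"
        # process never affects another thread's "past".
        mine = self.accumulated.get(thread_id, 0)
        return all(mine <= self.accumulated.get(t, 0) for t in affected_ids)
```

Under this sketch, a thread with the larger accumulated time is simply refused until the others catch up.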
In a case where an instruction for an access to the memory model 30 is executed, the memory model 30 has a function of receiving the memory access information as an input, returning another thread affected by the access, and returning a processing time required for the access. Here, the memory model 30 is acquired by modeling the memory group included in the target machine. Digital data transmitted to the memory group is maintained at a unique place in accordance with positional information (address). The memory access information is information that includes an access symbol, an access size, and an access type. The access type is either Read or Write. The access symbol represents a symbol that represents a variable in the source code.
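The memory access information described above (access symbol, access size, access type) could be encoded, for example, as a small record type. The field names below are illustrative assumptions, not taken from the embodiment.

```python
from dataclasses import dataclass

# Illustrative encoding of the memory access information passed to the
# memory model 30: access symbol, access size, and access type.
@dataclass
class MemoryAccessInfo:
    access_symbol: str   # symbol representing a variable in the source code
    access_size: int     # number of bytes accessed
    access_type: str     # either "Read" or "Write"
```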
The generator program 53 stored in the ROM 52 is loaded into a program storing area of the RAM 51 and is executed by the host processor 50, whereby the host machine realizes the function of the generator 10. The input source code 1000, for example, is input from an external storage device that is not illustrated in the figure. The host processor 50 processes the input source code 1000 in accordance with the generator program 53 loaded into the RAM 51 and outputs the performance estimating program 1001. The output destination of the performance estimating program 1001 may be the RAM 51 or an external storage device.
The thread scheduler program 54 and the memory model program 55 stored in the ROM 52 are loaded into the program storing area of the RAM 51 and are executed by the host processor 50, whereby the host machine realizes the functions of the thread scheduler 20 and the memory model 30. In the state in which the thread scheduler program 54 and the memory model program 55 are loaded in the RAM 51, the performance estimating program 1001 is loaded into the program storing area of the RAM 51 and is executed by the host processor 50. When an instruction for calling the function of the thread scheduler 20, which is included in the performance estimating program 1001, is executed, the host processor 50 moves the control to the thread scheduler program 54. In addition, when an instruction for calling the function of the memory model 30 is executed, the host processor 50 moves the control to the memory model program 55.
In addition, the host machine realizing the generator 10 and the host machine realizing the environment in which the performance estimating program 1001 is executed may be different from each other.
The generator 10 includes an analysis preliminary information adding unit 101, a cross-compiling unit 102, a target processor instruction executing information generating unit (processing time calculating unit) 103, a source code converting unit 104, an analysis process generating unit 105, and a self-compiling unit 106. The functions and the operations of such constituent elements will be described in detail with reference to
First, an input source code 1000 is input to the generator 10 in step S1.
The analysis preliminary information adding unit 101 generates an analysis preliminary information-added source code 1003 by inserting analysis preliminary information into the input source code 1000 in step S2. The analysis preliminary information is information that is used for classifying a code row group configuring the input source code 1000 for each basic block and specifying a code row that performs a process affecting the outside of the target processors 1a to 1d. In the first embodiment, a memory access corresponds to the process affecting the outside of the target processors 1a to 1d. Hereinafter, out of the analysis preliminary information, the analysis preliminary information used for classifying the code row group for each basic block and the analysis preliminary information used for specifying the code row that performs a process affecting the outside of the target processors 1a to 1d may be distinguishably referred to as first analysis preliminary information and second analysis preliminary information, respectively.
In the analysis preliminary information-added source code 1003 illustrated in
In addition, _API3 and _API5 correspond to the second analysis preliminary information. Here, "z=a[1]" represents that information positioned at an address acquired by adding one unit address to an address represented by symbol a is read, and the read value is substituted into z. In addition, one unit address is the data size allocated to the symbol.
Next, the cross-compiling unit 102 performs cross-compiling of the analysis preliminary information-added source code 1003 and outputs a target processor instruction string 1004 that is an instruction string for a target machine in step S3. Here, the cross-compiling unit 102 classifies and generates an instruction string for a target machine for each basic block and specifies an instruction, which is included in the above-described generated instruction string, for performing a memory access. More specifically, when the analysis preliminary information-added source code 1003 is sequentially converted into an instruction string, the cross-compiling unit 102 generates a description that represents the beginning of a basic block from the code row including the first analysis preliminary information and generates a description specifying an instruction for performing a memory access from a code row including the second analysis preliminary information.
Out of such descriptions, the descriptions of "_API1:", "_API2:", "_API4:", and "_API6:" are generated in correspondence with the first analysis preliminary information and are generated at the beginning of each basic block that configures the instruction string acquired by cross-compiling the input source code 1000. According to the insertion positions of the descriptions generated in correspondence with the first analysis preliminary information, the instruction string acquired by cross-compiling the input source code 1000 is classified into four basic blocks. In other words, it can be understood that a mov instruction and a bnez instruction are executed in the basic block represented by _API1. Similarly, it can be understood that lw, mov, and bra instructions are executed in the basic block represented by _API2, lw and mov instructions are executed in the basic block represented by _API4, and two mov instructions and add3 and ret instructions are executed in the basic block represented by _API6.
Here, the mov instruction is an instruction for substituting the value of the second argument into the first argument, and the add3 instruction is an instruction for substituting a sum of values of the second and third arguments into the first argument. In addition, the bnez instruction is an instruction for jumping to the label of the second argument unless the value of the first argument is zero, and the bra instruction is an instruction for unconditionally jumping to the label of a designated destination. Furthermore, the lw instruction is an instruction accompanying a memory access and is an instruction for reading 4 byte data from a corresponding address and substituting the read data into a register. In arguments, a symbol row starting from “$” represents a register, a symbol row starting from a number represents a numeric value, and a symbol row starting from an alphabet represents a variable. Hereinafter, the descriptions generated in correspondence with the first analysis preliminary information may be referred to as basic block specifying information.
In addition, the basic blocks represented by _API2 and _API4 include the lw instructions that accompany memory accesses. The descriptions of "_API3: func, line=3, column=8" and "_API5: func, line=5, column=8" are generated in correspondence with the second analysis preliminary information and are generated in such basic blocks including instructions for performing memory accesses. Such a description includes a description that specifies the position, in the input source code 1000, at which the instruction accompanying the memory access is executed. Hereinafter, the description generated in correspondence with the second analysis preliminary information may be referred to as memory access instruction specifying information.
Next, a target processor instruction executing information generating unit 103 generates target processor instruction executing information 1005 based on the target processor instruction string 1004 in step S4. The target processor instruction executing information 1005 is configured by a processing time required for the execution of a target processor instruction and the memory access information.
The target processor instruction executing information generating unit 103 calculates a processing time required for the execution of a target processor instruction based on the control flow and the basic block. The target processor instruction executing information generating unit 103 can recognize an instruction that is executed when the basic block is processed based on the basic block specifying information that is inserted into the target processor instruction string 1004. The processing time of an instruction that does not accompany a memory access is predetermined. Accordingly, when instructions to be executed and the number of the instructions are given, the target processor instruction executing information generating unit 103 can calculate a processing time required for the execution by using the target processor.
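Because the processing time of an instruction that does not accompany a memory access is predetermined, the per-basic-block calculation amounts to summing per-instruction cycle counts. The following sketch illustrates this under assumed cycle costs; the actual values depend on the target processor and are not given by the embodiment.

```python
# Hypothetical cycle counts for instructions that do not accompany a
# memory access (illustrative values only).
CYCLES = {"mov": 1, "add3": 1, "bnez": 2, "bra": 2, "ret": 2}

def basic_block_time(instructions):
    # Sum the predetermined cycle counts. Instructions accompanying a
    # memory access (e.g. lw) are excluded here: their time depends on
    # the access destination and is resolved later by the memory model.
    return sum(CYCLES[op] for op in instructions if op in CYCLES)
```

For example, under these assumed costs the basic block represented by _API6 (mov, mov, add3, ret) takes 1+1+1+2 = 5 cycles, with the lw time of other blocks left to the memory model.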
On the other hand, the processing time required for an instruction accompanying a memory access is acquired by adding a processing time depending on the individual memory model of the access destination to the processing time of the instruction. The target processor instruction executing information generating unit 103 inserts memory access information into a place at which the processing time required for an instruction accompanying a memory access is described. The target processor instruction executing information generating unit 103 can recognize whether or not an instruction accompanying a memory access is included in the basic block by referring to the memory access instruction specifying information. In addition, the memory access information is information used for identifying the access destination of the memory access and includes an access symbol, an access size, and an access type. The processing time that depends on the individual memory model of the access destination is not included in the memory access information.
As above, the target processor instruction executing information generating unit 103 calculates a processing time required for a process not affecting the outside of the target processors 1a to 1d, in other words, the process executed within the target processors 1a to 1d for each basic block based on the instruction string for each basic block and generates memory access information used for identifying the access destination of the memory access for each specified instruction.
Next, the source code converting unit 104 converts the analysis preliminary information added to the analysis preliminary information-added source code 1003 into an analysis API based on the target processor instruction executing information 1005 and outputs an analysis API-added source code 1006 in step S5. Here, the conversion of the analysis preliminary information into the analysis API is performed such that the processing time for each basic block, which is described in the target processor instruction executing information 1005, is accumulated when the analysis API-added source code 1006 is self-compiled and executed on the host machine. In addition, an instruction accompanying a memory access is converted so as to issue a request to the memory model 30.
Next, the analysis process generating unit 105 generates a first target instruction executing analyzing process 1007 and a second target instruction executing analyzing process 1008 that are executed by being called from the analysis API in step S6. The first target instruction executing analyzing process 1007 and the second target instruction executing analyzing process 1008 configure an analysis API library 1009.
The first target instruction executing analyzing process 1007 is called from _PROC and is a process of updating the accumulated processing time for each thread which is managed by the thread scheduler 20. The first target instruction executing analyzing process 1007 inquires of the thread scheduler 20 the accumulated processing time of the thread that is currently executed, adds the processing time necessary for the target machine to the acquired accumulated processing time, and passes the added accumulated processing time to the thread scheduler 20. In other words, when a basic block is executed, the first target instruction executing analyzing process 1007 adds the processing time of the executed basic block, which is calculated by the target processor instruction executing information generating unit 103, to the accumulated processing time of the thread that performs the basic block.
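The inquire-add-pass-back sequence of the process called from _PROC can be sketched as below. The scheduler is modeled here as a plain dictionary from thread id to accumulated time; the function name `proc` is illustrative only.

```python
# Sketch of the first target instruction executing analyzing process:
# inquire the accumulated time of the currently executed thread, add the
# basic block's processing time, and pass the sum back to the scheduler
# (here a plain dict mapping thread id -> accumulated time).
def proc(accumulated, thread_id, block_time):
    accumulated[thread_id] = accumulated.get(thread_id, 0) + block_time
    return accumulated[thread_id]
```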
The second target instruction executing analyzing process 1008 is called from _MREAD and is a process of issuing a request for inquiring the memory model 30 of a processing time necessary for an access.
The management of data respectively maintained by the individual memory models 30a to 30g corresponding to caches may be performed, for example, by using a reuse-distance model. According to the reuse-distance model, a reuse-distance stack representing a memory access sequence of each individual memory is defined, and, when there is an access to an individual memory, information (for example, a combination of an address and a size) used for identifying data of the access destination is stacked in the reuse-distance stack of the individual memory.
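The reuse-distance bookkeeping described above can be sketched as follows. In this illustrative sketch, the reuse distance of an access is the number of distinct data items touched since the last access to the same item, and an access is treated as a hit when that distance fits within an assumed cache capacity; the function name and hit criterion are assumptions, not details of the embodiment.

```python
# Minimal reuse-distance stack: the list is ordered by recency, with the
# most recently used item at the end. Each element stands for the
# identifying information of one piece of data (e.g. an address-size pair).
def access(stack, item, capacity):
    if item in stack:
        # Reuse distance = number of distinct items above this one.
        distance = len(stack) - 1 - stack.index(item)
        stack.remove(item)
        hit = distance < capacity  # fits in a cache of `capacity` items
    else:
        hit = False  # first touch is always a miss
    stack.append(item)  # move (or place) the item on top of the stack
    return hit
```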
When an individual memory model maintaining the access target data is specified, the memory model 30 determines whether or not there is a plurality of target processors that can access the specified individual memory model in step S22. In a case where there is a plurality of the accessible target processors (Yes in S22), a list of the accessible target processors is output in step S23. On the other hand, in a case where there is only one accessible target processor (No in step S22), the memory model 30 outputs a notification indicating that the target thread does not affect the other threads in step S24. After step S23 or S24, the memory model 30 ends the operation performed when the effect investigating request is received.
In a case where the list of the accessible target processors is returned from the memory model 30, the target thread affects the other threads (Yes in S11), and accordingly, in the second target instruction executing analyzing process 1008, as represented in the third row illustrated in
In a case where the accumulated processing time of the target thread is not the shortest (No in S32), the thread scheduler 20 stops the execution of the target thread in step S33 and determines whether or not the execution of the thread of which the accumulated processing time is the shortest is stopped in step S34. In a case where the execution of the thread of which the accumulated processing time is the shortest is stopped (Yes in S34), the thread scheduler 20 restarts the execution of the thread of which the accumulated processing time is the shortest in step S35. Then, the thread scheduler 20 waits for the restart of the execution of the target thread in step S36.
Here, in order to allow a notified thread to access the individual memory model that the target thread tries to access, the thread scheduler 20 individually performs the process of steps S31 to S35 for each of the notified threads. Accordingly, while the execution of the target thread is stopped in step S33 and kept waiting, by performing the process of steps S31 to S35 for each of the notified threads, the accumulated processing time of the target thread eventually becomes the shortest among all the notified threads and the target thread. In this state, when one of the notified threads tries to access the individual memory model, the execution of the target thread is restarted according to the process of step S35 performed for that notified thread. After the execution of the target thread is restarted, the thread scheduler 20 permits the execution of the memory access of the target thread in step S37. On the other hand, in a case where the execution of the thread of which the accumulated processing time is the shortest is not stopped (No in step S34), the thread scheduler 20 skips the process of step S35.
In a case where the accumulated processing time of the target thread is the shortest of all the threads notified of in step S23 (Yes in S32), the execution of the memory access of the target thread is permitted in step S37, and the operation ends.
As above, the thread scheduler 20 performs scheduling of the target thread and the other affected threads such that the accumulated processing time on the target thread side is shorter than that of the other affected threads.
Here, the execution of the target thread is described to be restarted according to the process of step S35 performed for another notified thread. However, instead of the process of steps S34 to S36, it may be configured to determine whether or not the accumulated processing time of the target thread is the shortest. In a case where the accumulated processing time of the target thread is the shortest, the execution of the target thread is restarted. In a case where the accumulated processing time of the target thread is not the shortest, it is determined again whether or not the accumulated processing time of the target thread is the shortest.
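The alternative restart check described above can be sketched in a single predicate: the stopped target thread is restarted as soon as its own accumulated time becomes the shortest. The function name is illustrative only.

```python
# Sketch of the alternative check replacing steps S34 to S36: the target
# thread may restart when its accumulated time is the shortest of all
# threads. `accumulated` maps thread id -> accumulated processing time.
def may_restart(accumulated, target):
    return accumulated[target] == min(accumulated.values())
```

A scheduler using this variant would simply re-evaluate `may_restart` after each update to any thread's accumulated time.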
After the scheduling is performed, the second target instruction executing analyzing process 1008 issues a memory access request to the memory model 30 and acquires the processing time required to access the individual memory model of the access destination in step S13.
As above, in the second target instruction executing analyzing process 1008, when the basic block performing a memory access is executed, scheduling between the thread executing the basic block that performs the memory access and the thread having the same access destination is performed based on the memory access information, the processing time required for the memory access of the basic block performing the memory access is calculated, and the calculated processing time required for the memory access is added to the accumulated processing time of the thread executing the basic block that performs the memory access.
Thereafter, in the second target instruction executing analyzing process 1008, the first target instruction executing analyzing process 1007 is called, the processing time required for the memory access, which is output in step S41, is added to the accumulated processing time in step S14, and the operation ends. In a case where the target thread does not affect the other threads (No in step S11), in the second target instruction executing analyzing process 1008, the process of step S12 is skipped.
Here, the reuse-distance model is described to be used so as to determine a cache hit/miss by using the memory model 30 or so as to calculate the processing time required for a memory access. However, the method of calculating the processing time required for a memory access is not limited thereto. For example, in a case where, for any access, a read can be processed in one cycle and a write can be processed in two cycles in terms of the number of cycles of the target processor, by performing a process as illustrated in
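The simplified fixed-cycle alternative mentioned above reduces to a constant lookup per access type. The sketch below assumes, as in the example, one cycle per read and two cycles per write regardless of the access destination.

```python
# Sketch of a fixed-cycle memory model: the processing time of a memory
# access depends only on its access type, not on the access destination.
def access_time(access_type):
    return {"Read": 1, "Write": 2}[access_type]
```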
After the process of S5, the self-compiling unit 106 performs self-compiling of the analysis API-added source code 1006, links the analysis API library in the process of the self-compiling, and outputs the performance estimating program 1001 in S6.
By executing the performance estimating program 1001 in a host machine in which the thread scheduler 20 and the memory model 30 are installed in S7, a user can acquire an estimation value 1002 of the processing time required when the target program acquired by cross-compiling the input source code 1000 is executed by the target processor. When executed in the host machine in which the thread scheduler 20 and the memory model 30 are installed, the performance estimating program 1001 estimates the accumulated processing time of the thread that is executed last and outputs the estimated accumulated processing time as the value 1002.
In addition, in the description presented above, although the analysis preliminary information has been described as a description for calling the functional-type API, any description may be used as long as it is in the form that can be interpreted by the cross-compiling unit 102 and the source code converting unit 104.
Furthermore, the analysis API library 1009 may be prepared in advance.
As described above, according to the first embodiment of the present invention, the cross-compiling unit 102 cross-compiles the source code (input source code 1000) of software whose target machine is a computer including a plurality of target processors 1a to 1d and a memory group (the L1 caches 2a to 2d, the L2 caches 3a and 3b, the L3 cache 4, and the main memory 5) accessed by the plurality of target processors 1a to 1d, thereby classifying and generating an instruction string for the target machine for each basic block, and specifies instructions, which are included in the generated instruction string, performing a memory access. The target processor instruction executing information generating unit 103 calculates, for each basic block, the processing time required for the process not affecting the outside of the target processors 1a to 1d based on the instruction string generated for each basic block, and generates the memory access information used for identifying the access destination of the memory access for each specified instruction. The source code converting unit 104 inserts, into a corresponding place in the input source code 1000, _PROC, which is a code that adds the processing time of an executed basic block, calculated by the processing time calculating unit, to the accumulated processing time of the thread executing the basic block when the basic block is executed, and _MREAD, which is a code that, when a basic block performing a memory access is executed, performs scheduling between the thread executing that basic block and a thread having the same access destination based on the memory access information, calculates the processing time required for the memory access, and adds the calculated processing time to the accumulated processing time of the thread executing the basic block. The self-compiling unit 106 performs self-compiling of the source code (the analysis API-added source code 1006) after the insertion of the codes and generates a performance estimating program 1001 that outputs the accumulated processing time of the thread that has been completed last. Accordingly, the processing time of the software can be estimated without preparing the target machine. Since the processing time of the software can be estimated without preparing a target machine, the performance evaluation of the software can be performed even in a case where the target machine is in the designing process, whereby the operation of estimating the processing time when software developed for the multiprocessor is executed can be performed efficiently.
In addition, the evaluation of the performance of the software may be considered to be performed based on the performance ratio between the host machine and the target machine (Comparative Example 1). However, in Comparative Example 1, a difference in the execution time based on the difference in the architecture of the host machine and the target machine is not considered, and accordingly, in a case where the execution time is markedly different from each other based on the difference in the architecture, there is a problem in that the precision markedly decreases. In contrast to this, according to the first embodiment, the performance estimating program 1001 can calculate the processing time when the software is executed by the target machine with the memory architecture or the processor architecture being considered, whereby the processing time of the software dedicated to the target machine can be estimated more precisely than Comparative Example 1.
Furthermore, while a method (Comparative Example 2) of performing the evaluation of software dedicated to the target machine by using an instruction simulator implemented by software may be considered, the performance estimating program 1001 calculates the processing time in a target machine based on instructions generated through cross-compiling, and accordingly, an instruction executed by the target processor is not simulated, whereby the processing time of software dedicated to the target machine can be estimated at a speed higher than that of Comparative Example 2.
In addition, the host machine further includes the thread scheduler 20 that manages the accumulated processing time for each thread executed by processors included in the target machine, and the first target instruction executing analyzing process 1007 called by _PROC calls the thread scheduler 20 while using the processing time calculated by the target processor instruction executing information generating unit 103 as an argument, and the processing time passed as the argument is configured to be added to the accumulated processing time of the executed thread, whereby a user can change the thread scheduler 20 in accordance with the architecture of the target processors 1a to 1d.
Furthermore, the memory group included in the target machine includes a plurality of individual memories (the L1 caches 2a to 2d, the L2 caches 3a and 3b, the L3 cache 4, and the main memory 5) that form a hierarchical structure. In the second target instruction executing analyzing process 1008 called by _MREAD, while a plurality of threads having the same individual memory as the access destination is treated as threads having the same access destination, it may instead be configured such that the individual memories are divided into a plurality of areas, and threads accessing the same divided area are treated as threads having the same access destination.
In addition, the thread scheduler 20 causes a memory access performed by a basic block to wait until the accumulated processing time of the executing thread is the shortest among the plurality of threads having the same individual memory as the access destination, and returns the waiting time. Accordingly, in the second target instruction executing analyzing process 1008 called by _MREAD, the thread scheduler 20 is called, and the waiting time returned from the called thread scheduler 20 is added to the accumulated processing time of the executed thread, whereby a contradiction in which a future process affects a past process can be prevented.
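One way to model this waiting behavior is to track, per shared individual memory, the cycle at which the memory next becomes free, so that accesses are serviced in accumulated-time order. The sketch below assumes a fixed access latency and two threads sharing one memory; all names and constants are hypothetical:

```python
# Hypothetical model of memory-access scheduling for one shared
# individual memory (e.g. an L2 cache shared by two threads).
ACCESS_LATENCY = 10      # assumed access cost in target cycles
accum = {0: 100, 1: 100}  # per-thread accumulated processing time
mem_free_at = 0           # cycle at which the shared memory is next free

def scheduler_mem_access(thread_id):
    """Serialize accesses to the shared memory in accumulated-time
    order and return the waiting time, as the thread scheduler does."""
    global mem_free_at
    start = max(accum[thread_id], mem_free_at)
    wait = start - accum[thread_id]
    mem_free_at = start + ACCESS_LATENCY
    return wait

def mread(thread_id):
    """Models the second target instruction executing analyzing process
    called via _MREAD: the waiting time returned by the scheduler is
    added to the executing thread's accumulated time along with the
    access latency itself."""
    accum[thread_id] += scheduler_mem_access(thread_id) + ACCESS_LATENCY

mread(0)  # the memory is then busy until cycle 110
mread(1)  # must wait 10 cycles before its access is serviced
print(accum)  # {0: 110, 1: 120}
```

Because the second access cannot start before the memory is released, no thread ever observes the effect of an access that is still in its simulated future.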
The target machine illustrated in
The external hardware model 40 is acquired by modeling the external hardware 6 and has a function of returning, in a case where an instruction used for driving the external hardware 6 is executed on the target processor, the processing time necessary for performing the process by using the external hardware 6 in terms of the number of cycles of the target processor.
The analysis process generating unit 105 generates, in addition to the first target instruction executing analyzing process 1007 and the second target instruction executing analyzing process 1008, a third target instruction executing analyzing process 1010 that is called by HWE1_EXEC, and generates an analysis API library 1009 by combining the first target instruction executing analyzing process 1007, the second target instruction executing analyzing process 1008, and the third target instruction executing analyzing process 1010.
In the third target instruction executing analyzing process 1010, a process of adding the time output by the external hardware model executing process 1011 to the accumulated processing time of the target thread is performed (the sixth row in
As above, in the second embodiment, the source code converting unit 104 inserts the third target instruction executing analyzing process 1010, in which the processing time required to drive the external hardware 6 when the basic block driving the external hardware 6 is executed is added to the accumulated processing time of the thread that executes that basic block, into a corresponding place in the input source code 1000.
As described above, according to the second embodiment of the present invention, the target machine includes the external hardware 6 that is driven by the target processors 1a and 1b, the cross-compiling unit 102 specifies instructions, included in the generated instruction string, used for driving the external hardware 6, the target processor instruction executing information generating unit 103 generates the external hardware access information used for identifying the specified instructions based on the specified instructions that drive the external hardware 6, and the source code converting unit 104 inserts the third target instruction executing analyzing process 1010, in which the processing time required to drive the external hardware 6 when the basic block driving the external hardware 6 is executed is added to the accumulated processing time of the thread executing that basic block, into a corresponding place in the input source code 1000. Accordingly, even in a case where the target machine includes the external hardware 6, the performance estimating program 1001 that can estimate the performance of the software dedicated to the target machine can be generated.
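In its simplest form, the third analyzing process reduces to querying the external hardware model for a cycle count and charging it to the driving thread. The sketch below uses hypothetical names and a fixed model cost for illustration:

```python
accum = {0: 1000}      # driving thread's accumulated time in target cycles
HW_EXEC_CYCLES = 500   # assumed cost returned by the external hardware model

def hw_model_execute():
    """External hardware model: returns the processing time of the
    modeled device in target-processor cycles."""
    return HW_EXEC_CYCLES

def hwe1_exec(thread_id):
    """Models the third target instruction executing analyzing process
    called via HWE1_EXEC: the time output by the external hardware
    model executing process is added to the accumulated time of the
    thread that drives the hardware."""
    accum[thread_id] += hw_model_execute()

hwe1_exec(0)
print(accum[0])  # 1500
```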
In a third embodiment, a case will be described in which the external hardware 6 is connected to a memory. In the example illustrated in
In a case where the external hardware 6 is connected to the individual memory, the external hardware model executing process 1011 is performed as illustrated in
In other words, in the third target instruction executing analyzing process 1010, a processing time required for a memory access from the external hardware 6 is calculated by performing scheduling between the external hardware 6 and the threads having the same access destination, and the processing time, which is calculated as described above, required for the memory access from the external hardware 6 and the processing time required to drive the external hardware 6 are added to the accumulated processing time of the thread that executes the basic block driving the external hardware 6. Accordingly, even in a case where the external hardware 6 performs a memory access, the software dedicated to the target machine can be evaluated.
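Scheduling between the external hardware 6 and the threads having the same access destination can be modeled by treating the hardware as one more contender for the shared memory. In the hypothetical sketch below, the driving thread is charged both the hardware's execution time and the memory-access time incurred on the hardware's behalf; all names and constants are assumptions:

```python
ACCESS_LATENCY = 10    # assumed memory-access cost in target cycles
HW_EXEC_CYCLES = 500   # assumed execution cost of the external hardware
HW = "hw"              # extra contender slot for the external hardware
accum = {0: 100, HW: 0}
mem_free_at = 0        # cycle at which the shared memory is next free

def mem_access(contender):
    """Serialize one access to the shared memory for any contender
    (a processor thread or the external hardware); return the total
    cycles spent, i.e. waiting time plus the access itself."""
    global mem_free_at
    start = max(accum[contender], mem_free_at)
    wait = start - accum[contender]
    mem_free_at = start + ACCESS_LATENCY
    accum[contender] += wait + ACCESS_LATENCY
    return wait + ACCESS_LATENCY

def hwe1_exec(thread_id):
    """When the driven hardware itself accesses memory, charge the
    driving thread both the hardware's execution time and the memory
    time incurred on the hardware's behalf."""
    accum[HW] = accum[thread_id]      # the hardware starts "now"
    mem_time = mem_access(HW)
    accum[thread_id] += HW_EXEC_CYCLES + mem_time

hwe1_exec(0)
print(accum[0])  # 610
```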
As described above, according to the first to third embodiments, the processing time of software can be estimated without preparing a target machine, and the processing time of the software can be estimated at a speed higher than that of a case where an instruction simulation is performed. Accordingly, an operation of estimating the processing time when software developed so as to be dedicated to multiprocessors is executed can be performed efficiently.
In addition, it may be configured such that the analysis process generating unit 105 receives the target processor instruction executing information 1005 of the first to third embodiments as an input and generates the necessary processes from among the first target instruction executing analyzing process 1007, the second target instruction executing analyzing process 1008, and the third target instruction executing analyzing process 1010.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2011-157111 | Jul 2011 | JP | national |