This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2017-193294, filed on Oct. 3, 2017, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to an information processing apparatus, an information processing method, and a recording medium in which a program is recorded.
In a target code pre-transformation method, target code is transformed into host code before execution.
The related art is disclosed in Japanese Laid-open Patent Publication Nos. 2012-159936 and 7-84799.
According to an aspect of the embodiments, an information processing apparatus includes: a memory; and a processor coupled to the memory, the processor is configured to: acquire, by analyzing a program, a first address of the memory at which a memory access instruction in the program is stored, and a second address of the memory to be accessed by the memory access instruction; and generate first information indicating a correspondence between the first address and the second address.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Target code is decomposed into, for example, basic blocks, each of which is a minimum unit of a sequence of instructions in which a branch instruction or an entry from a branch instruction does not take place. The instructions in the target code are parsed for each basic block, so that read instructions from registers, write instructions to the registers, read instructions from a memory, write instructions to the memory, and arithmetic and logical instructions are detected. In this case, a dependency graph is generated in which a dependency relationship of a value to be loaded to a certain register on a value in another register or a memory content is represented together with nodes and edges concerning instructions. A memory reference table is used every time a read or write access to the memory takes place. In the memory reference table, a content read from the memory and a content written to the memory are respectively associated with address values. By linking the dependency graph and the memory reference table with each other, all possible address values as jump destination addresses of branch instructions are listed as entry points in the course of pre-transformation of the branch instructions.
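The memory reference table described above may be sketched, for illustration, as follows. This is a minimal model only: the instruction format, function names, and trace representation are hypothetical stand-ins, not the method of the cited publications.

```python
# Hypothetical sketch of a memory reference table: every memory read or
# write is recorded against its address value, so that values read from
# memory and later used as branch targets can be listed as entry points.

def build_memory_reference_table(trace):
    """trace: list of (op, address, value) tuples, op in {'R', 'W'}."""
    table = {}  # address -> list of (op, value) in program order
    for op, address, value in trace:
        table.setdefault(address, []).append((op, value))
    return table

def possible_entry_points(trace, branch_addresses):
    """List every value read from an address that feeds a branch
    instruction; such values are candidate jump destinations."""
    table = build_memory_reference_table(trace)
    entries = set()
    for address in branch_addresses:
        for op, value in table.get(address, []):
            if op == 'R':
                entries.add(value)
    return sorted(entries)
```

Linking this table with the dependency graph would then allow the enumeration of all possible jump destination addresses during pre-transformation.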
For example, compiling generates an object program with high performance for a computer having a cache memory. Occurrences of cache contention between memory references in an input program are detected.
A processor is able to perform processing by running software. In order to make the processing faster than software execution allows, the processing may instead be performed by hardware, for example, a field-programmable gate array (FPGA). To develop the FPGA, a hardware designer first has to understand the processing contents of the software and then create a hardware operation description aiming at the hardware architecture, which may be difficult to accomplish.
For example, it may be desirable to provide a technique of assisting creation of hardware operation description aiming at hardware architecture.
The CPU 102 performs data processing or computations, and controls the constituent elements coupled to the CPU 102 via the bus 101. The ROM 103 stores a startup program. The CPU 102 starts operating by executing the startup program in the ROM 103. The external storage device 108 stores a program containing a compiler 201, a dynamic analysis tool 202, and a memory access data analysis tool 203 illustrated in
The present embodiment may be implemented with the computer running a program. In addition, a computer-readable recording medium in which the aforementioned program is recorded and a computer program product of the aforementioned program or the like may be applied as embodiments of the present disclosure. Examples usable as the recording medium include a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a non-volatile memory card, a ROM, and so on.
Next, an accelerator is explained. The processor is capable of performing various kinds of processes by executing the program. The processing of the program is performed at relatively low speed. In the case where a large volume of data is to be processed at high speed, the processing of the program executed by the processor is disadvantageous in the aspects of processing speed and power consumption. In this case, an accelerator may be used, such as general-purpose computing on graphics processing units (GPGPU) or an FPGA. The accelerator is hardware and is advantageous in the aspects of processing speed and power consumption. The designer classifies desired processing into processing to be executed by the accelerator and processing to be executed by the processor according to the processing contents and processing purposes. Then, the designer writes the processing for the accelerator as source code in a description language for hardware design (for example, the C language if a high-level synthesis tool is used). Lastly, the designer accomplishes development of the accelerator based on the source code by using the high-level synthesis tool and so on.
In the case where an FPGA is employed to implement the accelerator, the development of the hardware circuit involves a large number of man-hours. Use of a high-level synthesis tool allows designing with high-level language description at a high level of abstraction and may therefore reduce the number of man-hours for the development of the hardware circuit. Even in this case, however, in order to generate an excellent circuit, the designer has to create the source code in the high-level language description for the high-level synthesis while aiming at the hardware architecture.
If ordinary high-level language description is given directly to the high-level synthesis tool, the tool often generates a circuit with low performance or a circuit too large to be laid out on an FPGA. To avoid this, the hardware designer desirably first understands the processing contents of the source code to be implemented as the accelerator, and then newly writes high-level language description aiming at the hardware architecture. It takes a certain number of man-hours for the designer to understand the processing contents of the source code to be processed by the processor. The quality of the source code in the high-level language description suited to the accelerator depends on the designer's degree of understanding of the source code to be processed by the processor and on the designer's skill level. In view of this, the present embodiment is intended to provide the information processing apparatus 100 capable of assisting creation of source code aiming at the hardware architecture.
In
Running the dynamic analysis tool 202, the CPU 102 inputs the executable file 212, the input data file 213, and the analysis range specifying file 214 and generates memory access data 215. The input data file 213 is a file containing input data for the executable file 212 and may be omitted. For example, input data is described on line 24 in the source file 211 in
Running the dynamic analysis tool 202, the CPU 102 parses the executable file 212 and generates the memory access data 215 by acquiring the time 401, the instruction address 402, the memory address 403, and the type 404 correspondent with an access of each variable name in the source file 211 specified in the analysis range specifying file 214. The memory access data 215 contains information indicating each association among the time 401, the instruction address 402, the memory address 403, and the type 404. Note that the time 401 and the type 404 may be omitted. The detailed processing of the dynamic analysis tool 202 will be described later with reference to
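One record of the memory access data 215 may be sketched, for illustration, as follows. The field names mirror the items 401 to 404 in the text; the record layout and the filtering helper are hypothetical, not part of the embodiment itself.

```python
# Hypothetical sketch of one record in the memory access data 215.
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryAccess:
    time: int                 # 401: when the access occurred
    instruction_address: int  # 402: address of the accessing instruction
    memory_address: int       # 403: memory address that was accessed
    access_type: str          # 404: 'R' (read) or 'W' (write)

def accesses_in_range(records, start, end):
    """Keep only accesses that fall inside a variable's memory area."""
    return [r for r in records if start <= r.memory_address < end]
```

As the text notes, the time 401 and the type 404 may be omitted; in that case the corresponding fields would simply be dropped from such a record.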
In
Meanwhile, running the grouping module 205, the CPU 102 performs grouping of the multiple memory address nodes 521 to 526 in the graph structure 500 in
The edge 511 is an edge directed from the memory address node 541 to the instruction address node 531 and represents the read instruction R. The edge 512 is an edge directed from the instruction address node 532 to the memory address node 541 and represents the write instruction W. The edge 513 is an edge directed from the memory address node 542 to the instruction address node 531 and represents the read instruction R. The edge 514 is an edge directed from the instruction address node 532 to the memory address node 542 and represents the write instruction W. The edge 515 is an edge directed from the memory address node 543 to the instruction address node 531 and represents the read instruction R. The edge 516 is an edge directed from the instruction address node 532 to the memory address node 543 and represents the write instruction W.
The grouping module 205 groups together the two instruction address nodes 613 and 614 in the graph structure 601 to generate one instruction address node 615 in the graph structure 602. The graph structure 602 contains instruction address nodes 611, 612, and 615, edges 621 to 624, and memory address nodes 631 to 633. The edge 623 is an edge directed from the instruction address node 615 to the memory address node 633 and represents the write instruction W. The edge 624 is an edge directed from the memory address node 633 to the instruction address node 615 and represents the read instruction R.
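The graph structures described above are bipartite directed graphs: a read instruction R is an edge from a memory address node to an instruction address node, and a write instruction W is an edge in the opposite direction. Grouping replaces several nodes by one node and redirects their edges. A minimal sketch of that merge step, with purely illustrative node names, might look as follows.

```python
# Hypothetical sketch of node grouping: redirect every edge touching the
# merged nodes to a single new node, dropping duplicate edges that
# result from the merge.  Edges are (source, destination, kind) triples,
# kind being 'R' (read) or 'W' (write).

def merge_nodes(edges, nodes_to_merge, merged_name):
    def rename(node):
        return merged_name if node in nodes_to_merge else node
    return {(rename(src), rename(dst), kind) for src, dst, kind in edges}
```

For example, merging the two instruction address nodes 613 and 614 into node 615, as in the text, redirects the write edge 623 and the read edge 624 to the single node 615.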
Next, an example of ten types of grouping processes is explained. The first to sixth grouping processes are processes of grouping instruction address nodes as illustrated in
The first grouping process performs grouping by instruction address. The grouping module 205 groups multiple instruction address nodes 501, 503, and 505 representing the same instruction address into one instruction address node 531 as illustrated in
The second grouping process performs grouping by source file. The grouping module 205 groups instruction address nodes representing multiple instruction addresses contained in the same source file among multiple source files 211 into one instruction address node.
The third grouping process performs grouping by function. The grouping module 205 groups instruction address nodes representing instruction addresses correspondent with source code lines contained in the same function main or ft01 in the source file 211 in
The fourth grouping process performs grouping by code block. The grouping module 205 groups instruction address nodes representing instruction addresses correspondent with source code lines contained in the same code block among the code blocks 301 to 303 in the source file 211 in
The fifth grouping process performs grouping by loop process iteration. In reference to the time 401 in the memory access data 215 in
The sixth grouping process performs grouping by set of multiple functions including a certain function and a function to be called by the certain function. The grouping module 205 groups an instruction address node representing an instruction address correspondent with the source code contained in a first function (for example, main) and an instruction address node representing an instruction address correspondent with the source code contained in a second function (for example, ft01) to be called by the first function in the source file 211 in
The seventh grouping process performs grouping by memory address. The grouping module 205 groups multiple memory address nodes 521 and 522 representing the same memory address into one memory address node 541 as illustrated in
The eighth grouping process performs grouping by variable. The grouping module 205 groups memory address nodes representing multiple memory addresses contained in a memory area correspondent with each variable in the source file 211 into one memory address node. For example, the array variable mem1 in
The ninth grouping process performs grouping by set of memory accesses made by instructions consecutively executed. In reference to the time 401 in the memory access data 215 in
In the tenth grouping process, memory areas dynamically allocated are taken into account. The grouping module 205 groups memory address nodes representing different memory addresses for the same variable name (for example, in, out, mem1, or the like) in the source file 211 in
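As one concrete illustration of the processes above, the eighth grouping process (grouping by variable) may be sketched as follows. The mapping function and the address ranges are hypothetical; in the embodiment, the variable areas would come from the debug information in the executable file 212.

```python
# Hypothetical sketch of grouping by variable: every memory address that
# lies inside the memory area of a variable is mapped to a single node
# named after that variable; addresses outside any known area keep
# their own node.

def group_by_variable(addresses, variable_areas):
    """variable_areas: dict of name -> (start, end) half-open ranges."""
    mapping = {}
    for addr in addresses:
        for name, (start, end) in variable_areas.items():
            if start <= addr < end:
                mapping[addr] = name
                break
        else:
            mapping[addr] = addr
    return mapping
```

The other grouping processes differ only in the key used for merging (instruction address, source file, function, code block, iteration, or time-adjacent access sets).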
In
Running the grouping module 205, the CPU 102 groups instruction address nodes for each code block in the source file 211 based on the debug information in the executable file 212 to generate the instruction address nodes 701 to 705. The instruction address node 701 is formed by grouping together the instruction address nodes for the code block on the lines 22 to 25 in the function main in
In addition, running the grouping module 205, the CPU 102 groups memory address nodes for each variable name in the source file 211 based on the debug information in the executable file 212 to generate the memory address nodes 721 to 723. The memory address node 721 is formed by grouping the memory address nodes for the variable name in
The edge 711 is an edge directed from the instruction address node 701 to the memory address node 721 and represents the write instruction W. The edge 712 is an edge directed from the memory address node 721 to the instruction address node 702 and represents the read instruction R. The edge 713 is an edge directed from the instruction address node 702 to the memory address node 722 and represents the write instruction W. The edge 714 is an edge directed from the instruction address node 703 to the memory address node 722 and represents the write instruction W. The edge 715 is an edge directed from the memory address node 722 to the instruction address node 703 and represents the read instruction R. The edge 716 is an edge directed from the memory address node 722 to the instruction address node 704 and represents the read instruction R. The edge 717 is an edge directed from the instruction address node 704 to the memory address node 723 and represents the write instruction W. The edge 718 is an edge directed from the memory address node 723 to the instruction address node 705 and represents the read instruction R.
Moreover, running the labeling module 206, the CPU 102 refers to the source file 211 in
Specifically, the CPU 102 assigns a label of “symbol (function name)/code block name/line number” in the source file 211 to each of the instruction address nodes 701 to 705. The instruction address node 701 is assigned the label “main/STMT/22”, which indicates that the function name is main, the code block is a statement (STMT), and the start line number is 22. The instruction address node 702 is assigned the label “ft01/for/8”, which indicates that the function name is ft01, the code block is a for statement, and the start line number is 8. The instruction address node 703 is assigned the label “ft01/for/12”, which indicates that the function name is ft01, the code block is a for statement, and the start line number is 12. The instruction address node 704 is assigned the label “ft01/for/15”, which indicates that the function name is ft01, the code block is a for statement, and the start line number is 15. The instruction address node 705 is assigned the label “main/for/27”, which indicates that the function name is main, the code block is a for statement, and the start line number is 27.
Then, the labeling module 206 assigns a label of “variable name” in the source file 211 to each of the memory address nodes 721 to 723. The memory address node 721 is assigned the label “in”, which indicates that the variable name is in. The memory address node 722 is assigned the label “mem1”, which indicates that the variable name is mem1. The memory address node 723 is assigned the label “out”, which indicates that the variable name is out.
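The labeling of the instruction address nodes may be sketched as a simple lookup, shown below for illustration. The lookup table here is a hypothetical stand-in for the debug information; the addresses and entries are invented.

```python
# Hypothetical sketch of the labeling step: each instruction address
# node receives a "function/code-block/line" label recovered from the
# debug information of the executable file.

def label_instruction_node(debug_info, instruction_address):
    """debug_info: dict of instruction address -> (function, block, line)."""
    function, block, line = debug_info[instruction_address]
    return f"{function}/{block}/{line}"
```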
Next, running the output module 207 in
With reference to the graph structure 216, the designer may relatively easily rewrite the hardware behavioral description aiming at the hardware architecture. For example, with reference to the graph structure 216, the designer may find processes executable in parallel, and create hardware behavioral description to cause parallel processing of the processes thus found. The accelerator may achieve speed-up of processing by performing parallel processing. The parallel processing includes data-level parallel processing and task-level parallel processing. The data-level parallel processing corresponds to single instruction, multiple data (SIMD) processing. The task-level parallel processing corresponds to parallel processing of multiple pipelines.
Line 6: int mem1[8], mem2[8];
Line 13: mem2[j] = mem1[j] * 5 / t0;
Line 16: out[j] = mem2[j] + 1;
When the corrected source code is applied as the source file 211 in
The high-level synthesis tool may transform the hardware behavioral description in the high-level language to a hardware description language (HDL) file in order to develop an accelerator. However, when pointer variables are used in the behavioral description in the high-level language, the high-level synthesis tool may fail to synthesize circuits capable of efficient processing. When a value of a pointer variable is used as an argument to call a function, the variable name may change in some cases. For this reason, by just looking at the source code, it is difficult to immediately judge whether or not pointer variables point to the same memory area. Meanwhile, by referring to the graph structure 216, the designer may easily know that variables even having different variable names actually point to the same memory area. This is useful at an early stage of planning circuit architecture.
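The aliasing check the designer performs by looking at the graph structure 216 amounts to asking whether the memory areas traced for two variable names overlap. A minimal sketch of that test, with invented address ranges, follows.

```python
# Hypothetical sketch of the aliasing check: two variable names point to
# the same memory if the half-open address ranges of their traced
# accesses overlap, even when pointer arguments hide this in the source.

def aliases(area_a, area_b):
    """area: (start, end) half-open range of traced memory addresses."""
    return area_a[0] < area_b[1] and area_b[0] < area_a[1]
```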
In the work of task division and memory division conducted at the stage of planning circuit architecture, the hardware designer may obtain information as hints for the work from the graph structure 216, and thereby achieve a reduction in man-hours.
In step S901, the dynamic analysis tool 202 loads the executable file (program under analysis) 212, the input data file 213, and the analysis range specifying file 214 from the external storage device 108.
Here, the dynamic analysis tool 202 has a function similar to that of the software debugger GDB. Next, in step S902, by referring to the debug information in the executable file 212 and the source file 211, the dynamic analysis tool 202 sets a first breakpoint at a location immediately after a memory is allocated to a variable (analysis range) specified by the analysis range specifying file 214. For example, in the case where the analysis range is the variable mem1, the dynamic analysis tool 202 sets the first breakpoint at the location in the program of the executable file 212 immediately after a memory is allocated to the variable mem1 on line 6 in the source file 211 in
In addition, by referring to the debug information in the executable file 212 and the source file 211, the dynamic analysis tool 202 sets a second breakpoint at a location immediately before the memory allocated to a variable (analysis range) specified by the analysis range specifying file 214 is released. For example, in the case where the analysis range is the variable mem1, the dynamic analysis tool 202 sets the second breakpoint at the location in the program of the executable file 212 immediately before the memory allocated to the variable mem1 is released at the end of the function ft01 on the line 18 in the source file 211 in
Next, in step S903, the dynamic analysis tool 202 starts execution of the executable file (program under analysis) 212.
If the dynamic analysis tool 202 detects the processing of the executable file (program under analysis) 212 reaching the first breakpoint in step S1001, the dynamic analysis tool 202 advances to step S1002. In step S1002, the dynamic analysis tool 202 sets, as a memory access monitor area, the start address to the end address of the memory area allocated to the variable at the analysis range. Next, in step S1003, the dynamic analysis tool 202 restarts the execution of the executable file (program under analysis) 212.
If the dynamic analysis tool 202 detects the processing of the executable file (program under analysis) 212 reaching the second breakpoint in step S1011, the dynamic analysis tool 202 advances to step S1012. In step S1012, the dynamic analysis tool 202 releases the setting of the memory access monitor area related to the second breakpoint. Next, in step S1013, the dynamic analysis tool 202 restarts the execution of the executable file (program under analysis) 212.
If the dynamic analysis tool 202 detects that a memory access to the memory access monitor area is carried out by the processing of the executable file (program under analysis) 212 in step S1101, the dynamic analysis tool 202 advances to step S1102. In step S1102, the dynamic analysis tool 202 records the memory access data 215 into the external storage device 108 according to the detected memory access, the memory access data 215 containing the time 401, the instruction address 402 that performs the memory access, the accessed memory address 403, and the type 404 of the memory access (read or write). Next, in step S1103, the dynamic analysis tool 202 restarts the execution of the executable file (program under analysis) 212.
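The recording logic of steps S1002, S1012, and S1101 to S1102 may be modeled, for illustration, as follows. This is only a simplified model of the monitor state machine; the class name and method names are hypothetical, and a real implementation would rely on watchpoints or a CPU emulator.

```python
# Simplified model of the memory access monitor: accesses inside the
# monitor area are appended to the memory access data as
# (time, instruction address, memory address, type) records.

class MemoryAccessMonitor:
    def __init__(self):
        self.area = None      # (start, end) or None when not monitoring
        self.records = []     # the memory access data 215

    def set_area(self, start, end):       # step S1002 (first breakpoint)
        self.area = (start, end)

    def release_area(self):               # step S1012 (second breakpoint)
        self.area = None

    def on_access(self, time, insn_addr, mem_addr, kind):  # S1101-S1102
        if self.area and self.area[0] <= mem_addr < self.area[1]:
            self.records.append((time, insn_addr, mem_addr, kind))
```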
The aforementioned memory access monitoring performed by the dynamic analysis tool 202 may be carried out by any of several methods. One method is to use the CPU 102 having a function called a watch point, which generates an interrupt when a memory access to a designated address is performed. When an interrupt is generated by the watch point, the memory access data 215 can be recorded by an interrupt-handling program. Another method is to use a step-execution function of the CPU 102 to execute instructions one by one. In this case, when a memory access instruction is found, whether or not the memory access instruction accesses the memory access monitor area is checked, and the memory access data 215 is recorded if the memory access monitor area is accessed.
The memory access monitoring by the dynamic analysis tool 202 may also be carried out by software. The program is executed on a CPU emulator, a memory access to the monitor area by the program is detected by the CPU emulator, and the memory access data 215 is recorded. For example, the dynamic analysis tool 202 may be implemented by combining the tool VALGRIND, a popular software tool for detecting memory-related bugs, configured to detect memory accesses, with the software debugger GDB.
In step S1201, the memory access data analysis tool 203 transforms the memory access data 215 in
The processing in steps S1202 to S1207 is executed by the memory access data analysis tool 203 running the grouping module 205. First, in step S1202, the memory access data analysis tool 203 enumerates grouping processes F0, F1, . . . based on the aforementioned first to tenth grouping processes. Even when there are only ten types of grouping processes, for example, a huge number of grouping processes F are enumerated because there are a plurality of targets to which each type of grouping process is to be applied.
Next, in step S1203, the memory access data analysis tool 203 obtains node decrease numbers D0, D1, . . . for the respective grouping processes F0, F1, . . . , where a node decrease number represents the number of nodes removed in going from the graph structure 500 before the grouping to the graph structure after the grouping.
Next, in step S1204, the memory access data analysis tool 203 sorts the grouping processes F0, F1, . . . in ascending order of the node decrease numbers D0, D1, . . . to generate a list FL. The sorting order is not limited to this. More generally, an evaluation function E gives an evaluation value to each of the grouping processes F0, F1, . . . ; the memory access data analysis tool 203 calculates E(F0)=D0, E(F1)=D1, . . . as the evaluation values and sorts the grouping processes F0, F1, . . . in ascending order of the evaluation values to generate the list FL.
Next, in step S1205, the memory access data analysis tool 203 judges whether or not the total node number of the instruction address nodes and memory address nodes in the current graph structure is equal to or less than the designated number of nodes N. The memory access data analysis tool 203 advances to step S1208 if the total node number is equal to or less than the designated number of nodes N, or advances to step S1206 if the total node number is more than the designated number of nodes N.
In step S1206, the memory access data analysis tool 203 judges whether the list FL is empty or not. The memory access data analysis tool 203 advances to step S1207 if the list FL is not empty, or displays an error and terminates the processing in
In step S1207, the memory access data analysis tool 203 takes out the grouping process from the top of the list FL, deletes the taken-out grouping process from the list FL, and performs the taken-out grouping process on the current graph structure to generate the graph structure thus grouped.
Thereafter, the memory access data analysis tool 203 returns to step S1205 and iterates the above processing until the total node number in the current graph structure becomes equal to or less than the designated number of nodes N. In other words, until the total number of instruction address nodes and memory address nodes in the current graph structure becomes equal to or less than the designated number of nodes N, the memory access data analysis tool 203 applies the multiple types of grouping processes in ascending order of the number of nodes each process removes.
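The loop of steps S1202 to S1207 may be condensed, for illustration, into the following sketch. The `decrease` and `apply` callables and the process objects are hypothetical stand-ins for the enumerated grouping processes F0, F1, . . . .

```python
# Hypothetical sketch of steps S1202-S1207: candidate grouping processes
# are scored by how many nodes they would remove, sorted in ascending
# order of that decrease, and applied one by one until the graph has at
# most N nodes; an error is raised if the candidates run out first.

def reduce_graph(graph, processes, n, count_nodes):
    """processes: objects with .decrease(graph) and .apply(graph)."""
    fl = sorted(processes, key=lambda f: f.decrease(graph))  # S1202-S1204
    while count_nodes(graph) > n:                            # S1205
        if not fl:                                           # S1206: error
            raise RuntimeError("cannot reduce graph to the designated size")
        f = fl.pop(0)                                        # S1207
        graph = f.apply(graph)
    return graph
```

Sorting in ascending order applies the least destructive grouping first, so the graph retains as much detail as the node budget N allows.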
In step S1208, running the labeling module 206, the memory access data analysis tool 203 generates the graph structure 216 by referring to the source file 211 and assigning labels related to the source file 211 to the instruction address nodes and the memory address nodes in the graph structure grouped by the grouping module 205.
Next, in step S1209, running the output module 207, the memory access data analysis tool 203 outputs the graph structure 216, in which the labels are assigned by the labeling module 206, to the output device 107. The output device 107 displays or prints out the graph structure 216 in a form easily understandable by humans.
As described above, the information processing apparatus 100 is capable of presenting the graph structure 216 to the designer and thereby assisting the designer to create the hardware behavioral description aiming at the hardware architecture.
All the foregoing embodiments are described as just specific examples for carrying out the present disclosure, and the technical scope of the present disclosure is not to be interpreted in a manner limited by these embodiments. In other words, the present disclosure may be carried out in various ways without departing from the technical idea of the present disclosure or the main features thereof.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2017-193294 | Oct 2017 | JP | national |