The present invention relates to a high-level synthesis device, a high-level synthesis method, and a high-level synthesis program to automatically generate a register-transfer level hardware description language (HDL) from a behavioral description in a programming language.
Conventionally, large scale integration (LSI) circuits have been designed in a hardware description language, such as Verilog-HDL or VHDL. However, as integrated circuits have increased in size in recent years, design in a hardware description language has come to require an enormous amount of design description and a tremendous amount of design time; hence, improvement in design productivity is sought. One technique to improve design productivity is high-level synthesis, which automatically synthesizes a register-transfer level circuit description from a behavioral description. In high-level synthesis, design is performed in a high-level language, such as the C language, the C++ language or the SystemC language, with a higher level of abstraction than a hardware description language, and a hardware description language is automatically generated by a high-level synthesis tool. High-level synthesis makes it possible to reduce both the amount of design description and the design time.
In a technique disclosed in Patent Literature 1, a behavioral-level description is separated into N stage descriptions, and a timing is adjusted in a scheduling unit so that pipeline processing of input/output and operations among the N stage descriptions are performed. Then, in the technique disclosed in Patent Literature 1, a hardware description language is generated so that stage circuits for each of the N stage descriptions, and a state control circuit to control possible 2N−1 stage states of a semiconductor integrated circuit are generated. In this manner, Patent Literature 1 discloses a behavioral synthesis method to realize a high-speed pipelined circuit.
Patent Literature 1: JP 2010-086310 A
There is a problem that the technique disclosed in Patent Literature 1 cannot be applied to a behavioral description of a circuit to perform a repeat arithmetic process to repeat an arithmetic process, wherein output of the arithmetic process is used as input to the next arithmetic process.
The present invention is aimed at providing a high-level synthesis device to generate a hardware description language with high processing performance, by enabling pipeline processing, even when a behavioral description of a circuit to perform a repeat arithmetic process to repeat an arithmetic process, wherein output of the arithmetic process is used as input to the next arithmetic process, is used as input.
A high-level synthesis device according to one aspect of the present invention includes a control data flow graph (CDFG) change unit to obtain, as a first CDFG, a CDFG representing a repeat arithmetic process to repeat an arithmetic process, the repeat arithmetic process using an output of the arithmetic process as an input to a next arithmetic process, and to change the first CDFG into a second CDFG to perform the repeat arithmetic process represented by the first CDFG through pipeline processing.
A high-level synthesis device according to the present invention includes a control data flow graph (CDFG) change unit to obtain, as a first CDFG, a CDFG representing a repeat arithmetic process to repeat an arithmetic process, in which output of the arithmetic process is used as input to the next arithmetic process, and to change the first CDFG into a second CDFG to execute the repeat arithmetic process represented by the first CDFG through pipeline processing; hence, there is an effect that the repeat arithmetic process can be pipelined.
A configuration of a high-level synthesis device 100 according to the present embodiment will be discussed using
In the present embodiment, the high-level synthesis device 100 is a computer. The high-level synthesis device 100 is equipped with hardware components such as a processor 910, a storage device 920, an input interface 930 and an output interface 940. The storage device 920 includes a memory 921 and an auxiliary storage device 922.
The high-level synthesis device 100 is equipped with, as a functional configuration, a CDFG generation unit 110, a scheduling unit 120, a pipeline judgment unit 150, a CDFG change unit 160, a binding unit 130, an RTL generation unit 140 and a storage unit 170.
In the following explanation, the CDFG generation unit 110, the scheduling unit 120, the pipeline judgment unit 150, the CDFG change unit 160, the binding unit 130 and the RTL generation unit 140 in the high-level synthesis device 100 are collectively called a high-level synthesis unit 101 as well. Further, in the following explanation, the functions of the CDFG generation unit 110, the scheduling unit 120, the pipeline judgment unit 150, the CDFG change unit 160, the binding unit 130 and the RTL generation unit 140 in the high-level synthesis device 100 are referred to as functions of “units” of the high-level synthesis device 100.
The functions of the “units” of the high-level synthesis device 100 are realized by software.
Further, the storage unit 170 is realized by the storage device 920. The storage unit 170 stores a source code 171, synthesis restriction information 172, circuit information 173 and RTL 174. Further, the storage unit 170 stores information such as the first CDFG 111 generated by the CDFG generation unit 110, control cycle information 121 and a scheduling result 122 generated by the scheduling unit 120, and the second CDFG 112 generated by the CDFG change unit 160.
The processor 910 is connected to other hardware components via a signal line to control the other hardware components.
The processor 910 is an integrated circuit (IC) to perform processing. The processor 910 is, as a specific example, a central processing unit (CPU).
The storage device 920 includes the memory 921 and the auxiliary storage device 922. The auxiliary storage device 922 is, as a specific example, a read only memory (ROM), a flash memory, or a hard disk drive (HDD). The memory 921 is, as a specific example, a random access memory (RAM). In the present embodiment, the storage unit 170 is realized by the memory 921. The storage unit 170 may be realized by the auxiliary storage device 922, or may be realized by the memory 921 and the auxiliary storage device 922. A realization method of the storage unit 170 is arbitrary.
The input interface 930 is a port whereto an input device such as a mouse, a keyboard, or a touch panel is connected. The input interface 930 is, as a specific example, a USB terminal. The input interface 930 may be a port whereto a local area network (LAN) is connected.
The output interface 940 is a port whereto a cable of a display apparatus such as a display device is connected. The output interface 940 is, as a specific example, a USB terminal or a high definition multimedia interface (HDMI) (registered trademark) terminal. The display device is, as a specific example, a liquid crystal display (LCD). The output interface 940 may be connected to an output device, such as a printer device.
The auxiliary storage device 922 stores a program to realize the functions of the “units.” The program is loaded into the memory 921, read into the processor 910, and executed by the processor 910. The auxiliary storage device 922 also stores an operating system (OS). At least a part of the OS is loaded into the memory 921, and the processor 910 executes the program to realize the functions of the “units” while executing the OS.
The high-level synthesis device 100 may be equipped with only one processor 910, or may be equipped with a plurality of processors 910. The plurality of processors 910 may cooperatively execute the program to realize the functions of the “units.”
The information, data, signal values and variable values indicating the results of the processing by the functions of “units” are stored in the memory 921, the auxiliary storage device 922, or a register or a cache memory in the processor 910. The arrows connecting each unit and the storage unit 170 in
The program to realize the functions of the “units” may be stored in a portable recording medium such as a magnetic disk, a flexible disk, an optical disc, a compact disc, a Blu-ray (registered trademark) disc, or a digital versatile disc (DVD).
Note that the program to realize the functions of the “units” is also called a high-level synthesis program 520. The high-level synthesis program 520 is a program to realize the function described as the “units.” Further, what is called a high-level synthesis program product is a storage medium and a storage device wherein the high-level synthesis program 520 is recorded, into which a computer-readable program is loaded, irrespective of the form as it appears.
Next, a high-level synthesis technique as a premise of the present embodiment will be described.
The high-level synthesis device 100x is a configuration which is obtained by removing the pipeline judgment unit 150 and the CDFG change unit 160 from the configuration of the high-level synthesis device 100 according to the present embodiment described in
The high-level synthesis unit 101x performs high-level synthesis by using the source code 171, the synthesis restriction information 172 and the circuit information 173 as input, and outputs the RTL 174.
The RTL 174 is an example of a hardware description language.
The source code 171 is a behavioral description that describes, in a high-level language such as the C language, the C++ language or the SystemC language, the operations of a circuit that is the subject of high-level synthesis. The source code 171 is input via the input interface 930 from the input device, and stored in the storage unit 170.
The synthesis restriction information 172 includes information on the circuit that is the subject of high-level synthesis, such as a circuit size, a resource amount, a timing restriction, a clock frequency, and a unit to be pipelined. The synthesis restriction information 172 is input via the input interface 930 from the input device, and stored in the storage unit 170.
The circuit information 173 includes information such as the size and delay information, etc. of an arithmetic unit, a register, a memory unit, etc. provided in an LSI whereon a circuit after high-level synthesis is mounted. The circuit information 173 is input via the input interface 930 from the input device, and stored in the storage unit 170.
The RTL 174 is a circuit description wherein a circuit structure is written in a hardware description language. The circuit description describes circuit behavior as a combination of signal flows between registers and logical operations.
The circuit description is also referred to as a structural description of a circuit.
An outline of the high-level synthesis process S100x being the operation of the high-level synthesis device 100x in
In the CDFG generation process S110, the CDFG generation unit 110 performs syntax analysis of the source code 171, analyzes the control structure and data dependencies, and generates a first control data flow graph (CDFG) 111. The first CDFG 111 is a graph representing a control flow and a data flow. The data flow is represented by nodes indicating arithmetic operations, nodes indicating variables, and edges joining one node to another. The CDFG generation unit 110 deletes redundant operation nodes. Further, in order to generate a structural description of a circuit with improved performance and reduced area, the CDFG generation unit 110 performs deletion of unnecessary processing, elimination of common subexpressions, constant propagation and constant folding, loop unrolling to increase parallelism, and the like. The first CDFG 111 will be described below in detail.
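Folding of constant sub-expressions, one of the optimizations listed above, can be sketched on a small expression tree. The following is only an illustration under stated assumptions, not the processing of the CDFG generation unit 110 itself: the tuple encoding of nodes, the `OPS` table and the function name `fold_constants` are hypothetical.

```python
import operator

# Hypothetical operator table for the illustration.
OPS = {"+": operator.add, "*": operator.mul}


def fold_constants(node):
    """Fold constant sub-expressions in a small expression tree.

    A node is a number (constant), a string (variable name), or a
    tuple (op, left, right) representing an operation node.
    """
    if not isinstance(node, tuple):
        return node  # constants and variables are left as they are
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    # When both operands are constants, replace the operation node
    # with its computed value.
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)
```

For instance, `fold_constants(("+", "x", ("*", 2, 3)))` replaces the multiplication node with the constant 6 and leaves the addition with the variable `x` in place.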
Next, in the scheduling process S120x, the scheduling unit 120x determines a control cycle necessary for performing processing indicated by each node inside the first CDFG 111, and outputs the control cycle as control cycle information 121. The scheduling unit 120x determines the control cycle based on a clock frequency set in the synthesis restriction information 172, and delay information of an arithmetic unit, a register, a memory unit, etc. set in the circuit information 173. At this time, the scheduling unit 120x tries the control cycle wherein a repeat process included in the first CDFG 111 is pipelined. When the processing cannot be performed in the control cycle tried, the scheduling unit 120x tries another method, and determines a control cycle. The scheduling unit 120x outputs the control cycle information 121 including the control cycle as a scheduling result 122.
Next, in the binding process S130, the binding unit 130 assigns hardware resources, such as hardware storage resources and hardware arithmetic resources, to a circuit based on the control cycle information 121. The binding unit 130 analyzes the lifetimes of the hardware resources from the control cycle information 121. Based on the analysis result, the binding unit 130 shares hardware by assigning a single hardware resource to uses whose lifetimes do not overlap, among hardware resources capable of the same processing. The binding unit 130 outputs the assignment result of the hardware resources to the circuit as a binding result.
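The sharing of resources whose lifetimes do not overlap can be sketched as a greedy interval assignment. This is a minimal illustration rather than the actual algorithm of the binding unit 130; the `(name, start, end)` lifetime tuples and the function name `bind_registers` are hypothetical.

```python
def bind_registers(lifetimes):
    """Assign values to shared registers.

    lifetimes: list of (name, start_cycle, end_cycle), end inclusive.
    Returns a dict mapping each name to a register index. Values whose
    lifetimes do not overlap may receive the same register.
    """
    free_at = []  # free_at[r] is the first cycle at which register r is free
    binding = {}
    # Process values in order of the cycle in which their lifetime starts.
    for name, start, end in sorted(lifetimes, key=lambda t: t[1]):
        for reg, free in enumerate(free_at):
            if free <= start:  # register r is no longer in use: share it
                free_at[reg] = end + 1
                binding[name] = reg
                break
        else:
            # No existing register is free, so allocate a new one.
            free_at.append(end + 1)
            binding[name] = len(free_at) - 1
    return binding
```

With lifetimes `a` in cycles 0 to 1, `c` in cycles 1 to 2 and `b` in cycles 2 to 3, `a` and `b` share one register while `c` needs a second one.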
Lastly, in the RTL generation process S140, the RTL generation unit 140 generates a control circuit necessary for realizing the control cycle information 121 and the binding result. Then, the RTL generation unit 140 outputs the RTL 174, which is a register-transfer level description of the control circuit together with a data path to which the hardware resources obtained by the binding unit 130 are connected.
Next, the high-level synthesis technique being a premise of the present embodiment will be described using specific examples.
The source code 171 illustrated in
The source code 171 illustrated in
As illustrated in
In the variable swapping process 302, the exponent part of the input variable A300 and the exponent part of the input variable B301 are compared in magnitude by a comparison 310, and the variable to be processed by the digit matching process 303 is selected by a switch 311. When the exponent part of the input variable B301 is larger than the exponent part of the input variable A300, the mantissa of the input variable A300 is passed to the digit matching process 303 as the subject of digit matching, and the mantissa of the input variable B301 is passed to the digit matching process 303 as not requiring digit matching. When the exponent part of the input variable B301 is smaller than the exponent part of the input variable A300, the mantissa of the input variable B301 is passed to the digit matching process 303 as the subject of digit matching, and the mantissa of the input variable A300 is passed to the digit matching process 303 as not requiring digit matching.
In the digit matching process 303, the mantissa passed from the variable swapping process 302 as the subject of digit matching is shifted to the right by a shifter 313 so that its digits match those of the mantissa passed from the variable swapping process 302 as not requiring digit matching. The digit-matched mantissa is passed to the addition process 304. The shift amount for digit matching is calculated by a subtraction 312 as the difference between the exponent part of the input variable A300 and the exponent part of the input variable B301.
Further, the mantissa passed from the variable swapping process 302 as not requiring digit matching is passed unchanged to the addition process 304.
In the addition process 304, the sum of the two digit-matched variables passed from the digit matching process 303 is obtained and output to the rounding process 305. Note that when the signs of the input variable A300 and the input variable B301 are the same, addition is performed; when the signs are different, subtraction is performed.
In the rounding process 305, the addition result passed from the addition process 304 is rounded to an approximate value in order to normalize it in accordance with a standard such as IEEE 754, and the rounded value is output as an operation result 306.
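The four processing steps described above can be sketched on a toy floating-point format. This is a simplified illustration under stated assumptions, not the circuit of the embodiment: a value is a hypothetical `(sign, exponent, mantissa)` tuple with an integer mantissa, `MANT_BITS` is an arbitrary width, and the rounding step truncates rather than applying an IEEE 754 rounding mode.

```python
MANT_BITS = 8  # hypothetical mantissa width of the toy format


def fp_add(a, b):
    """Add two toy floats given as (sign, exponent, mantissa) tuples.

    The value of a tuple is (-1)**sign * mantissa * 2**exponent.
    """
    # Variable swapping: `small` is the operand with the smaller exponent,
    # i.e. the subject of digit matching.
    big, small = (a, b) if a[1] >= b[1] else (b, a)

    # Digit matching: shift the mantissa of `small` right by the exponent
    # difference (bits shifted out are lost).
    shift = big[1] - small[1]
    small_mant = small[2] >> shift
    exponent = big[1]

    # Addition: add when the signs match, subtract when they differ.
    if big[0] == small[0]:
        sign, mant = big[0], big[2] + small_mant
    elif big[2] >= small_mant:
        sign, mant = big[0], big[2] - small_mant
    else:
        sign, mant = small[0], small_mant - big[2]

    # Rounding: renormalize to MANT_BITS bits (truncation in this sketch).
    while mant >= (1 << MANT_BITS):
        mant >>= 1
        exponent += 1
    return (sign, exponent, mant)
```

For example, `fp_add((0, 1, 3), (0, 0, 4))` adds 6 and 4: the mantissa 4 is shifted right by one to match exponent 1, and the result `(0, 1, 5)` represents 10.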
When the total value of the floating points as illustrated in
As described above, floating-point addition requires many processing steps and takes longer to calculate than integer addition. If the whole series of processing steps were performed in one clock cycle, the clock frequency would become extremely low; hence, a circuit is generally designed so that each processing step is performed in a different clock cycle.
A loop 400 indicates a loop count in the repeat process illustrated in
In the processing for each clock cycle of the processing 402 and the processing 403, variable swapping A0 and variable swapping A1 in
The processing cycles of the arithmetic process in one loop is four cycles in
However, in
In
In
However, in
This concludes the explanation of the high-level synthesis technique being the premise of the present embodiment.
Next, an operation of the high-level synthesis device 100 according to the present embodiment will be described.
The processing of the high-level synthesis process S100 by a high-level synthesis method 510 and the high-level synthesis program 520 of the high-level synthesis device 100 according to the present embodiment will be schematically described using
In the high-level synthesis process S100 illustrated in
In the following, the source code 171 describes a behavior of a repeat arithmetic process to repeat an arithmetic process, wherein output of the arithmetic process is used as input to the next arithmetic process.
Further, the first CDFG 111 is a CDFG representing a repeat arithmetic process to repeat an arithmetic process, wherein output of the arithmetic process is used as input to the next arithmetic process. Specifically, the first CDFG 111 is generated from the source code 171 by the CDFG generation unit 110.
Further, in the following, pipelining of the first CDFG 111 means making it possible to perform the repeat arithmetic process represented by the first CDFG 111 through pipeline processing.
In the scheduling process S120, processing to output a scheduling result 122 is added to the scheduling process S120x.
In the scheduling process S120, the scheduling unit 120 outputs a scheduling result 122 for a case wherein the repeat arithmetic process represented by the first CDFG 111 is performed through pipeline processing. Specifically, the scheduling unit 120 outputs the scheduling result 122, which includes information indicating that the processing cannot be realized in a control cycle that performs pipeline processing, a data hazard variable, and the fact that the number of processing cycles of the pipeline is four. The data hazard variable is a variable for which a data hazard occurs in a case wherein the repeat arithmetic process represented by the first CDFG 111 is performed through pipeline processing. The number of processing cycles of the pipeline is the number of processing cycles of the arithmetic process.
In the pipeline judgment process S150, the pipeline judgment unit 150 judges, based on the scheduling result 122 output from the scheduling process S120, whether the repeat arithmetic process represented by the first CDFG 111 can be performed through pipeline processing. The judgment is made based on the data hazard variable included in the scheduling result 122. That is, the pipeline judgment unit 150 judges whether pipelining of the repeat arithmetic process is possible by changing the first CDFG 111.
When it is judged that pipelining of the first CDFG 111 is possible, the processing proceeds to the CDFG change process S160.
When it is judged that pipelining of the first CDFG 111 is impossible, the processing proceeds to the binding process S130.
The pipeline judgment process S150 will be described below in detail.
In the CDFG change process S160, the CDFG change unit 160 changes the first CDFG 111, and generates a second CDFG 112 after the change. The CDFG change unit 160 obtains the first CDFG 111 representing the repeat arithmetic process, and changes the first CDFG 111 into the second CDFG 112, in which the repeat arithmetic process is performed through pipeline processing. The CDFG change unit 160 inputs the second CDFG 112 after the change to the scheduling process S120.
The CDFG change process S160 will be described below in detail.
Next, the high-level synthesis process S100 according to the present embodiment will be described further in detail.
<CDFG Generation Process S110>
The CDFG generation process S110 is processing to generate the first CDFG 111 from the source code 171, as mentioned above.
In
A condition judgment DFG 701 represents control of condition judgment, which indicates performing an arithmetic process in a case of “i<N,” and completing an arithmetic process in a case of “else” (other).
The arithmetic process DFG 702 is a DFG of an arithmetic process, which performs an addition process of floating points illustrated in
The condition update DFG 703 is a DFG that updates the variable ‘i’ used for the loop condition judgment; ‘i’ is incremented by one in every loop.
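Read together, these parts of the first CDFG 111 correspond to a simple accumulation loop. The sketch below is a reconstruction for illustration only, since the figure is not reproduced here: the variable names `res_d` and `in_d` follow those used later in this description, while the function name `repeat_add`, the initial value of `res_d` and the use of the list length as N are assumptions.

```python
def repeat_add(in_d):
    """Accumulate the elements of in_d, one addition per loop."""
    n = len(in_d)   # loop bound N (assumed to be the input length)
    i = 0           # initial setting 700
    res_d = 0.0     # assumed initial value of the output variable
    while i < n:    # condition judgment 701: continue while i < N
        # Arithmetic process 702: the output res_d of one loop is the
        # input of the next loop.
        res_d = res_d + in_d[i]
        i = i + 1   # condition update 703
    return res_d
```

Because each addition needs the `res_d` produced by the previous loop, the next loop cannot start its addition until the previous one has finished.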
<Scheduling Process S120>
In the scheduling process S120, the scheduling unit 120 determines a control cycle necessary for performing processing indicated in each node inside the first CDFG 111.
When the first CDFG 111 in
As mentioned above, the scheduling unit 120 tries the control cycle wherein the repeat arithmetic process 790 included in the first CDFG 111 is pipelined. Specifically, the scheduling unit 120 tries the control cycle wherein pipeline processing is performed at the timing illustrated in
When the processing cannot be performed in the control cycle tried, the scheduling unit 120 tries another method, and determines a control cycle. Specifically, in a case of the pipeline processing illustrated in
The scheduling unit 120 outputs control cycle information 121 as a scheduling result. Specifically, when it is determined the control cycle wherein the processing is performed at the timing illustrated in
Further, the scheduling unit 120 outputs information indicating that the processing cannot be performed in the control cycle tried as a scheduling result 122. Specifically, the scheduling unit 120 outputs the scheduling result 122 including that the processing cannot be realized in the control cycle to perform pipeline processing, a data hazard variable for which a data hazard occurs, and a processing cycle of a pipeline.
When the control cycle of the pipeline processing illustrated in
<Pipeline Judgment Process S150>
In the pipeline judgment process S150, the pipeline judgment unit 150 judges whether pipelining of the first CDFG 111 is possible based on the scheduling result 122 notified from the scheduling unit 120. When it is judged that pipelining of the first CDFG 111 is unnecessary or impossible, the pipeline judgment unit 150 outputs the control cycle information 121 output by the scheduling unit 120 to the binding unit 130. When it is judged that pipelining is possible, the pipeline judgment unit 150 notifies the CDFG change unit 160 of the scheduling result 122 notified from the scheduling unit 120, and orders change of the first CDFG 111.
In a step S151, the pipeline judgment unit 150 judges whether a data hazard occurs and pipelining fails based on the scheduling result 122. Specifically, the pipeline judgment unit 150 judges whether a data hazard occurs and pipelining fails from a “trial result of pipelining” column and a “data hazard variable” column in the scheduling result 122. In the example of
In the step S152, based on the scheduling result 122, the pipeline judgment unit 150 judges whether the only data hazard variables are those whose data hazard arises from using output variables of the previous arithmetic process (i.e., the previous loop) as input variables for the next arithmetic process. When this holds, no data hazard occurs that depends on the operation order of the plurality of operation nodes included in the arithmetic process. Specifically, the pipeline judgment unit 150 compares the variables set in the “data hazard variable” column in the scheduling result 122 with the first CDFG 111, and judges whether those variables are used only as the output variables of the previous arithmetic process and as the input variables of the next arithmetic process. When the pipeline judgment unit 150 detects that the data hazard in pipeline processing is caused by inputting the output variables of the previous loop, and that no data hazard depending on the operation order of the operation nodes occurs, the procedure proceeds to a step S153. In the other cases, the procedure proceeds to the step S154.
In the step S153, the pipeline judgment unit 150 judges that pipelining of the first CDFG 111 is possible. When it is judged that pipelining is possible, the pipeline judgment unit 150 notifies the CDFG change unit 160 of the scheduling result 122 notified from the scheduling unit 120, and orders change of the first CDFG 111.
In the step S154, the pipeline judgment unit 150 judges that pipelining of the first CDFG 111 is unnecessary or impossible. When it is judged that pipelining is unnecessary or impossible, the pipeline judgment unit 150 outputs the control cycle information 121 output from the scheduling unit 120 to the binding unit 130.
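The judgment in steps S151 through S154 can be sketched as follows. This is an illustrative reading of the steps, not the implementation of the pipeline judgment unit 150: the `SchedulingResult` class, its field names, and the `loop_carried_only` argument are hypothetical, and the sketch assumes that the flow goes to S154 whenever S151 does not find a pipelining failure caused by a data hazard.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulingResult:
    """Hypothetical container for the scheduling result 122."""
    pipelining_failed: bool                # "trial result of pipelining" column
    hazard_variables: set = field(default_factory=set)  # "data hazard variable" column
    processing_cycles: int = 0             # processing cycles of the pipeline


def can_pipeline_by_change(result, loop_carried_only):
    """Return True when the flow should proceed to the CDFG change (S153).

    loop_carried_only: hypothetical set of variables of the first CDFG that
    are used only as the output of one loop and the input of the next loop.
    """
    # S151: did pipelining fail because a data hazard occurred?
    if not (result.pipelining_failed and result.hazard_variables):
        return False  # S154: pipelining is unnecessary or impossible
    # S152: is every data hazard variable a loop-carried output/input only?
    if result.hazard_variables <= loop_carried_only:
        return True   # S153: pipelining is possible by changing the CDFG
    return False      # S154
```

A result with `hazard_variables={"res_d"}` and a failed pipelining trial would lead to S153 when `res_d` is the only loop-carried variable; a hazard on any other variable leads to S154.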
<CDFG Change Process S160>
In the CDFG change process S160, the CDFG change unit 160 changes the first CDFG 111 to the second CDFG 112 wherein the repeat arithmetic process 790 represented by the first CDFG 111 is performed through pipeline processing. When it is judged that the repeat arithmetic process 790 represented by the first CDFG 111 can be performed through pipeline processing by the pipeline judgment unit 150, the CDFG change unit 160 changes the first CDFG 111 to the second CDFG 112.
In other words, the CDFG change unit 160 changes the first CDFG 111 generated by the CDFG generation unit 110 so that it can be realized through pipeline processing whose number of cycles equals the number of processing cycles of the arithmetic process (loop processing). That is, the CDFG change unit 160 changes the first CDFG 111 to the second CDFG 112 so that it can be realized through pipeline processing of four cycles, which is the number of processing cycles of the arithmetic process (loop processing).
The CDFG change unit 160 changes the first CDFG 111 to the second CDFG 112 based on the loop count of the repeat arithmetic process 790, and the processing cycles of the arithmetic process.
The CDFG change unit 160 divides, in the first CDFG 111, the repeat arithmetic process 790 into as many repeat arithmetic sub-processes as the number of processing cycles. Then, the CDFG change unit 160 changes the first CDFG 111 into the second CDFG 112 representing the first arithmetic process 804, which performs these repeat arithmetic sub-processes, and the second arithmetic process 814, which performs an arithmetic process 812 by using each output of the repeat arithmetic sub-processes as input.
The first arithmetic process 804 can be performed through pipeline processing. The first arithmetic process 804 is also called the first repeat arithmetic process. The second arithmetic process 814 can be performed through pipeline processing. Here, the second arithmetic process 814 can be also performed through time-division processing. The second arithmetic process 814 is also called the second repeat arithmetic process.
In the second CDFG 112 in
The first point is that the initial setting 700 of the first CDFG 111 is changed to an initial setting 800 in the second CDFG 112.
The second point is that the arithmetic process 702 of the first CDFG 111 is changed to an arithmetic process 802 in the second CDFG 112.
The third point is that the second arithmetic process 814 composed of an initial setting 810, a condition judgment 811, an arithmetic process 812 and a loop condition variable update 813 is added in the second CDFG 112.
In the second CDFG 112 of
In a step S161, the CDFG change unit 160 changes the first CDFG 111 so that the output variable “res_d” of the arithmetic process 702 is arrayed into as many elements as the number of processing cycles of the arithmetic process (pipeline processing). In the present embodiment, since the number of cycles of the arithmetic process (pipeline processing) is four, the CDFG change unit 160 arrays the output variable into “res_d1[0]” through “res_d1[3]” as in the arithmetic process 802, and assigns “res_d1[i%4]” as the acquisition source and the save destination of the operation result, so that “res_d1[0]” through “res_d1[3]” are used in turn according to the loop count.
In a step S162, the CDFG change unit 160 changes the first CDFG 111 so as to set the initial values of the arrayed output variables “res_d1[0]” through “res_d1[3].” Specifically, the CDFG change unit 160 adds the settings “res_d1[0]=0,” “res_d1[1]=0,” “res_d1[2]=0” and “res_d1[3]=0” to the first CDFG 111, as in the initial setting 800.
In a step S163, the CDFG change unit 160 adds the second arithmetic process 814. The CDFG of the second arithmetic process 814 to be added is the same as the first CDFG 111 before the change, except that the input variables of the arithmetic process are the output of the first arithmetic process 804, and that the number of repetitions of the arithmetic process 812 is the number of cycles of the arithmetic process (pipeline processing).
Specifically, the CDFG change unit 160 first reproduces the initial setting 700 and generates an initial setting 810. Next, the CDFG change unit 160 changes the repeat condition “i<N” of the condition judgment 701 to “i<4”, and generates a condition judgment 811. Next, the CDFG change unit 160 changes the input variables “in_d” of the arithmetic process 702 to “res_d1[i]”, and generates an arithmetic process 812. Lastly, the CDFG change unit 160 reproduces the loop condition variable update 703, and generates a loop condition variable update 813.
As described above, the CDFG change unit 160 divides the repeat arithmetic process 790 into four repeat arithmetic sub-processes, four being the number of processing cycles, by arraying the output variable of the arithmetic process into as many elements as the number of processing cycles. Each of the four repeat arithmetic sub-processes is the arithmetic process 802 that takes “res_d1[i%4]” and “in_d[i]” as input and outputs “res_d1[i%4]”. The four repeat arithmetic sub-processes can be performed through pipeline processing. Then, the execution results of the four repeat arithmetic sub-processes are output to the second arithmetic process 814, which performs the arithmetic process 812.
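The behavior represented by the second CDFG 112 can be sketched as two loops. This is an illustrative reading of steps S161 through S163 and not generated output of the device: the function name `repeat_add_pipelined` and the use of the list length as N are assumptions, while `res_d1`, `in_d` and the reference numerals in the comments follow the description above.

```python
def repeat_add_pipelined(in_d):
    """Accumulate in_d through four independent partial sums."""
    n = len(in_d)                      # loop bound N (assumed)
    res_d1 = [0.0, 0.0, 0.0, 0.0]      # initial setting 800
    i = 0
    while i < n:                       # first arithmetic process 804
        # Arithmetic process 802: loop i reads and writes res_d1[i % 4],
        # which was last written four loops earlier, so consecutive
        # loops no longer depend on each other.
        res_d1[i % 4] = res_d1[i % 4] + in_d[i]
        i = i + 1
    res_d = 0.0                        # initial setting 810
    i = 0
    while i < 4:                       # condition judgment 811
        res_d = res_d + res_d1[i]      # arithmetic process 812
        i = i + 1                      # loop condition variable update 813
    return res_d
```

Note that the additions are performed in a different order from the original loop, so for general floating-point inputs the result may differ in the last bits because floating-point addition is not associative.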
This concludes the explanation of the high-level synthesis process S100 according to the present embodiment.
A formula 50 represents the first CDFG 111 illustrated in
A circuit diagram 60 represents a circuit generated from the first CDFG 111 illustrated in
In the circuit diagram 60, since the arithmetic processing circuit 601 cannot be pipelined, the arithmetic processing circuit 601 operates through time-division processing.
Meanwhile, in the circuit diagram 61, an arithmetic processing circuit 611 corresponds to the first arithmetic process 804 in
In
Further, the present embodiment describes the case wherein the cycle number of the pipeline processing is four; however, the present embodiment can also be applied to a case wherein the cycle number of the pipeline processing is other than four. The CDFG may be changed in such a way that, in the first arithmetic process, the operation result is stored in an array whose number of elements equals the cycle number of the pipeline processing, and, in the second arithmetic process, the operation is performed by using the elements of that array as input.
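As an illustration of a cycle number other than four, the following C sketch takes the cycle number as a parameter. The function name `sum_arrayed_cycles`, the parameter `cycles`, and the bound `MAX_CYCLES` are hypothetical and chosen only for this sketch; in an actual synthesis flow the cycle number would be fixed by the scheduling result rather than passed at run time.

```c
#include <stddef.h>

#define MAX_CYCLES 16  /* hypothetical upper bound chosen for this sketch */

/* Accumulate in_d[0..n-1] using an output variable arrayed into
   `cycles` elements, then combine the elements in a second loop. */
float sum_arrayed_cycles(const float *in_d, size_t n, size_t cycles)
{
    float res_d1[MAX_CYCLES];

    /* Keep the cycle number inside the bounds of the local array. */
    if (cycles < 1) {
        cycles = 1;
    }
    if (cycles > MAX_CYCLES) {
        cycles = MAX_CYCLES;
    }

    /* Initial setting of the arrayed output variable. */
    for (size_t k = 0; k < cycles; k++) {
        res_d1[k] = 0.0f;
    }

    /* First arithmetic process: store results in `cycles` array elements. */
    for (size_t i = 0; i < n; i++) {
        res_d1[i % cycles] = res_d1[i % cycles] + in_d[i];
    }

    /* Second arithmetic process: use the array elements as input. */
    float res_d = 0.0f;
    for (size_t i = 0; i < cycles; i++) {
        res_d = res_d + res_d1[i];
    }
    return res_d;
}
```

With `cycles` equal to one, the sketch reduces to the form before the change, in which every iteration depends on the previous one.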
Further, in the present embodiment, addition of floating-point numbers is taken as an example of an arithmetic process; however, the arithmetic process targeted by the present embodiment is not limited to addition of floating-point numbers. In the first arithmetic process 804 in
Further, the high-level synthesis device 100 may include a communication device, and receive the source code 171, the synthesis restriction information 172 and the circuit information 173 via the communication device. Further, the high-level synthesis device 100 may transmit the RTL 174 via the communication device. In this case, the communication device includes a receiver and a transmitter. Specifically, the communication device is a communication chip or a network interface card (NIC). The communication device functions as a communication unit to communicate data. The receiver functions as a receiving unit to receive data, and the transmitter functions as a transmitting unit to transmit data.
Further, in the present embodiment, the functions of the “units” of the high-level synthesis device 100 are realized by software; however, as a variation, the functions of the “units” of the high-level synthesis device 100 may be realized by hardware components.
A configuration of a high-level synthesis device 100y according to a variation of the present embodiment will be described using
The processing circuit 909 is a dedicated electronic circuit for realizing the functions of the “units” described above and the storage unit 170. Specifically, the processing circuit 909 is a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, a logic IC, a gate array (GA), an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
The functions of the “units” may be realized by one processing circuit 909 or may be realized dispersedly by a plurality of processing circuits 909.
As another variation, the functions of the high-level synthesis device 100 may be realized by combination of software and hardware. That is, a part of the functions of the high-level synthesis device 100 may be realized by dedicated hardware, and the rest of the functions may be realized by software.
The processor 910, the storage device 920 and the processing circuit 909 are collectively referred to as “processing circuitry.” That is, the functions of the “units” and the storage unit 170 are realized by the processing circuitry even when the configuration of the high-level synthesis device 100 is any of the configurations as illustrated in
The “units” may be replaced with “steps,” “procedures” or “processing.” Further, the functions of the “units” may be realized by firmware.
As described above, the high-level synthesis device 100 according to the present embodiment includes the CDFG change unit to change CDFGs. The CDFG change unit changes a CDFG in such a manner that a repeat arithmetic process, which repeats an arithmetic process using its output variables as the next input variables, can be performed through pipeline processing. Thus, even such a repeat arithmetic process can be pipelined, and an appropriate operation result can be obtained. Further, an RTL description with high processing performance (the product of processing latency and clock cycle) can be generated even for a circuit in which the previous result is referred to as input to processing within one loop, as described above.
Further, the high-level synthesis device 100 according to the present embodiment includes the pipeline judgment unit to judge whether a repeat arithmetic process can be performed through pipeline processing based on a scheduling result notified from the scheduling unit. Since the CDFG change unit can change a CDFG only when the pipeline judgment unit judges that pipeline processing is possible, the CDFG can be changed efficiently while unnecessary processing is omitted.
Further, since the high-level synthesis device 100 according to the present embodiment determines a change method of a CDFG according to the cycle number of pipeline processing, the CDFG can be changed using the original CDFG.
The embodiment of the present invention is described above; however, any one or any arbitrary combination of what are described as the “units” in the explanation of the embodiment may be adopted. That is, the functional blocks of the high-level synthesis device are arbitrary as long as the functional blocks can realize the functions described in the above embodiment. The high-level synthesis device may be configured by any combination of, or any arbitrary block configuration of, those functional blocks. Further, the high-level synthesis device need not be one device, but may be a high-level synthesis system configured by a plurality of devices.
Further, a plurality of parts of the embodiment may be combined and implemented. Otherwise, the embodiment may be partially implemented. Additionally, the embodiment may be partially or as a whole implemented in any combined manner.
Note that the embodiment described above is essentially a preferable example and is not intended to limit the scope of the present invention, its applications or its uses; various alterations can be made as needed.
50, 51: formula; 60, 61: circuit diagram; 100, 100x, 100y: high-level synthesis device; 101, 101x: high-level synthesis unit; 110: CDFG generation unit; 111: CDFG; 120, 120x: scheduling unit; 121: control cycle information; 122: scheduling result; 130: binding unit; 140: RTL generation unit; 150: pipeline judgment unit; 160: CDFG change unit; 112: second CDFG; 170: storage unit; 171: source code; 172: synthesis restriction information; 173: circuit information; 174: RTL; 221: trial result; 222: data hazard variable; 223: processing cycle; 300: input variable A; 301: input variable B; 302: variable swapping process; 303: digit matching process; 304: addition process; 305: rounding process; 306: operation result; 310: comparison; 311: switch; 312: subtraction; 313: shifter; 400, 500: loop; 401, 501: cycle; 403, 403, 502, 503: processing; 510: high-level synthesis method; 520: high-level synthesis program; 601, 611, 613: arithmetic processing circuit; 700, 800, 810: initial setting; 701, 811: condition judgment; 702, 802, 812: arithmetic process; 703, 803, 813: loop condition variable update; 790: repeat arithmetic process; 804: first arithmetic process; 814: second arithmetic process; 909: processing circuit; 910: processor; 920: storage device; 921: memory; 922: auxiliary storage device; 930: input interface; 940: output interface; S100, S100x: high-level synthesis process; S110: CDFG generation process; S120, S120x: scheduling process; S130: binding process; S140: RTL generation process; S150: pipeline judgment process; S160: CDFG change process
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/058445 | 3/17/2016 | WO | 00 |