The present application claims priority from Japanese patent application No. 2007-56491 filed on Mar. 7, 2007, the content of which is hereby incorporated by reference into this application.
The present invention relates to a data processing system comprising, as shared resources, a plurality of arithmetic circuits such as a floating-point processing circuit and a digital signal processing arithmetic circuit which receive operation commands to operate, and relates to a technology effectively applied to, for example, a single chip microcomputer of a multiprocessor core.
A technology of effectively using the operation resources of a multiprocessor system is described in Patent Document 1 (International Publication No. WO 2002/061591 Pamphlet). This technology adopts an interface circuit in a data processing system, the interface circuit allowing other data processing systems to be coupled, as a bus master, to an internal bus of the data processing system, and allows peripheral resources coupled with the internal bus of the data processing system to be directly used by other external data processing systems.
The inventors investigated a configuration in which one processor core of a multiprocessor system also distributes commands to the arithmetic circuits of the other processor cores, so that the arithmetic circuits of its own and of the other processor cores operate in parallel. According to this investigation, as can be inferred from Patent Document 1, one processor core can share the operation resources of another processor core, but any conflict over the operation resources between the processor cores must be avoided. However, the inventors found that exclusive arbitration of the use of operation resources alone is not sufficient to promote efficient use of sharable operation resources. Unless the shared operation resources can be used by priority with a simple procedure, it is not easy to operate the arithmetic circuits of its own and of other processor cores in parallel by distributing operation commands to the other arithmetic circuits.
It is an object of the present invention to provide a data processing system in which arithmetic circuits which are shared resources can be used by priority with a simple procedure.
It is another object of the present invention to provide a data processing system in which one central processing unit can cause a plurality of arithmetic circuits to easily operate in parallel by distributing operation commands to the arithmetic circuits which are shared resources.
The above and further objects and novel features of the present invention will be apparent from the description in this specification and the accompanying drawings.
The outline of a typical one of inventions disclosed in this application will be briefly described below.
In a data processing system comprising central processing units and a plurality of arithmetic circuits, wherein the central processing units are able to supply a command to one arithmetic circuit based on one fetched instruction and supply a command to another arithmetic circuit based on another fetched instruction, a memory circuit is provided which is used to store first information indicating which arithmetic circuit is executing a command, and second information indicating which central processing unit has reserved the arithmetic circuit for execution of the next command. When operation commands are distributed to the arithmetic circuits which are shared resources, it can be determined by referring to the first information of the memory circuit whether the arithmetic circuits are already executing commands, so that any conflict among the arithmetic circuits can be easily avoided. When the arithmetic circuits are already executing commands, reserving the arithmetic circuits for execution of the next commands using the second information of the memory circuit makes it possible, after the execution, to assign operation commands quickly to the arithmetic circuits and cause them to execute the commands.
The effects obtained by typical ones among the inventions disclosed in this application will be briefly described below.
Namely, the arithmetic circuits which are shared resources can be used by priority with a simple procedure to perform data processing.
Further, one central processing unit can cause a plurality of arithmetic circuits to easily operate in parallel by distributing operation commands to the arithmetic circuits which are shared resources.
First, an outline of typical embodiments of the present invention disclosed in this application will be described. The reference numerals and symbols in the figures which are referred to in parentheses in the outline description of the typical embodiments merely exemplify what is included in the concepts of the components to which they are attached.
[1] A data processing system according to a typical embodiment of the present invention includes a plurality of central processing units (CPU0, CPU1), a plurality of arithmetic circuits (FPU0, FPU1) capable of executing a command supplied from the central processing units, and a memory circuit (BREG, RREG, BREG0, BREG1, RREG0, RREG1, IREG0, and IREG1). The central processing units are able to supply a command to one arithmetic circuit based on one fetched instruction and supply a command to another arithmetic circuit based on another fetched instruction. The memory circuit is used to store first information (BF0, BF1) indicating which arithmetic circuit is executing the command and second information (RF0 and RF1, or RF0_A, RF1_A, RF0_B, and RF1_B) indicating which central processing unit has reserved the arithmetic circuit for execution of the next command. Thus, when commands are distributed to the arithmetic circuits which are shared resources, it can be determined by referring to the first information of the memory circuit whether the arithmetic circuit is already executing a command, so that any conflict among the arithmetic circuits can be easily avoided. When the arithmetic circuit is already executing a command, reserving the arithmetic circuit for execution of the next command using the second information of the memory circuit makes it possible, after the execution, to assign operation commands quickly to the arithmetic circuits and cause them to execute the commands.
In one concrete embodiment, the central processing unit causes one arithmetic circuit assigned thereto to execute a first command, and determines, when using other arithmetic circuit assigned to other central processing unit, whether or not the other arithmetic circuit is under command execution by referring to the first information. The central processing unit supplies a second command to the other arithmetic circuit when the other arithmetic circuit is not under command execution, and determines, when the other arithmetic circuit is under command execution, whether or not the other arithmetic circuit has been reserved for command execution by referring to the second information. The central processing unit reserves the other arithmetic circuit when the other arithmetic circuit has not been reserved, supplies the second command to the other arithmetic circuit when the command execution of the other arithmetic circuit has finished before the one arithmetic circuit finishes execution of the first command, and supplies the second command to the one arithmetic circuit when the other arithmetic circuit is still under command execution when the one arithmetic circuit has finished execution of the first command. According to the above procedure, when executing a plurality of operation instructions, the central processing units are able to issue a command to the arithmetic circuits efficiently according to reserved or non-reserved states of the arithmetic circuits to cause the arithmetic circuits to execute the operation instructions.
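The procedure above can be sketched as a small software model. The following Python sketch is illustrative only and is not the circuit implementation: the class `SharedFlags` and the functions `try_dispatch_second` and `resolve_second` are hypothetical names. Two behaviors are assumptions not stated in the text: falling back to waiting for the CPU's own circuit when the other circuit is both busy and already reserved, and cancelling the reservation when the second command is finally issued to the CPU's own circuit.

```python
from dataclasses import dataclass, field


@dataclass
class SharedFlags:
    """Hypothetical model of the memory circuit for two arithmetic circuits."""
    # First information: True while the circuit is executing a command.
    busy: dict = field(default_factory=lambda: {0: False, 1: False})
    # Second information: index of the reserving CPU, or None.
    reserved_by: dict = field(default_factory=lambda: {0: None, 1: None})


def try_dispatch_second(flags, cpu, other):
    """Called while the CPU's own circuit executes the first command."""
    if not flags.busy[other]:
        flags.busy[other] = True           # idle: supply the second command now
        return "issued_to_other"
    if flags.reserved_by[other] is None:
        flags.reserved_by[other] = cpu     # busy but unreserved: reserve it
        return "reserved_other"
    return "wait_for_own"                  # assumption: busy and already reserved


def resolve_second(flags, cpu, own, other):
    """Pick the circuit for the second command once one of them is free."""
    if flags.reserved_by[other] == cpu and not flags.busy[other]:
        flags.reserved_by[other] = None    # cancel the reservation on issue
        flags.busy[other] = True
        return other                       # the other circuit finished first
    if not flags.busy[own]:
        if flags.reserved_by[other] == cpu:
            flags.reserved_by[other] = None  # assumption: drop unused reservation
        flags.busy[own] = True
        return own                         # the own circuit finished first
    return None                            # both still executing: keep waiting
```

In this model a CPU calls `try_dispatch_second` once after issuing its first command and, if it obtained a reservation, calls `resolve_second` each time a busy flag is cleared.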
In another concrete embodiment, the arithmetic circuit is an accelerator such as a floating-point processing circuit or a digital signal processing arithmetic circuit. The loads of the central processing unit can be reduced and the efficiency of data processing can be increased.
In still another concrete embodiment, when the arithmetic circuit has finished operations according to a supplied operation command, the arithmetic circuit updates the first information so as to indicate that it is not under command execution. The state of the arithmetic circuit can thus be reflected in the first information more promptly than in the case where the central processing unit updates the first information.
In another concrete embodiment, the data processing system further includes a plurality of arithmetic buses (FPUB0, FPUB1) which are individually coupled with the respective arithmetic circuits, and are commonly coupled with the central processing units. Bus conflicts which arise when the central processing units transfer operation commands to the arithmetic circuits and obtain the results of operation of the arithmetic circuits can be reduced.
In still another concrete embodiment, the memory circuit is commonly coupled with the arithmetic bus. Bus conflicts which arise when the central processing units refer to the memory circuit and the arithmetic circuits operate the memory circuit can be reduced.
In still another concrete embodiment, the data processing system further includes a comparison circuit coupled with the arithmetic bus. One input of the comparison circuit is coupled with one arithmetic bus, and the other input of the comparison circuit is coupled with the other arithmetic bus. The operation results of the floating-point processing circuits can be input to the comparison circuit through the operation buses through the central processing units, and can be compared by the comparison circuit. Thus, in a case of executing two operation instructions, comparing the results of the operations, and then executing instructions using the comparison result, the number of steps of executing the instructions can be reduced. Furthermore, a command according to one operation instruction can be supplied to the two arithmetic circuits so that the arithmetic circuits operate individually, and the results of the operations can be compared by the comparison circuit, so that it also becomes possible to assure higher reliability than usual for the results of operation by the arithmetic circuits. For example, by providing an interrupt controller (INTC) which receives the comparison result of the comparison circuit as one interrupt factor, when the comparison indicates a mismatch, re-execution of an operation instruction, failure verification processing for the arithmetic circuits, and the like can be performed according to the interrupt processing program of the interrupt controller.
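The redundant-execution use of the comparison circuit can be illustrated with a minimal Python sketch. All names here (`bits_equal`, `run_redundant`, `healthy_fpu`, `faulty_fpu`, and the `raise_interrupt` callback standing in for the interrupt request to INTC) are hypothetical, and the bit-for-bit comparison of 64-bit values is an assumption about how such a comparator might behave.

```python
import struct


def bits_equal(x: float, y: float) -> bool:
    """Compare two doubles bit for bit, as a comparator on two buses might."""
    return struct.pack(">d", x) == struct.pack(">d", y)


def run_redundant(operation, a, b, fpu0, fpu1, raise_interrupt):
    """Issue one operation to both circuits and compare the two results."""
    r0 = fpu0(operation, a, b)
    r1 = fpu1(operation, a, b)
    if bits_equal(r0, r1):
        return r0                      # results agree: accept the value
    raise_interrupt(operation, a, b)   # mismatch: hand over to the handler
    return None


def healthy_fpu(operation, a, b):
    """Stand-in for a correctly working arithmetic circuit."""
    return a + b if operation == "fadd" else a * b


def faulty_fpu(operation, a, b):
    """Stand-in for a circuit that returns a corrupted result."""
    return healthy_fpu(operation, a, b) + 1.0
```

The handler passed as `raise_interrupt` would be the place for re-execution or failure verification; the sketch only records that a mismatch occurred.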
[2] A data processing system according to an embodiment in another aspect includes a plurality of central processing units (CPU0, CPU1), a plurality of arithmetic circuits (FPU0, FPU1) capable of executing a command supplied from the central processing units, and a memory circuit. The central processing unit can supply a command to one arithmetic circuit based on one fetched instruction and supply a command to another arithmetic circuit based on another fetched instruction. The memory circuit is used to store first information (BF0, BF1) indicating which arithmetic circuit is executing the command and second information (RF0_A, RF1_A) indicating whether the arithmetic circuit has been reserved for execution of the next command. Thus, when commands are distributed to the arithmetic circuits which are shared resources, it can be determined by referring to the first information of the memory circuit whether the arithmetic circuit is already executing a command, so that any conflict between the arithmetic circuits can be easily avoided. When the arithmetic circuit is already executing a command, the arithmetic circuit is reserved for execution of the next command using the second information of the memory circuit, and thereby after the execution, commands can be assigned quickly to the arithmetic circuit for execution of the commands.
In one concrete embodiment, the central processing unit causes one arithmetic circuit assigned thereto to execute a first command, and determines whether or not the other arithmetic circuit is under command execution, when using the other arithmetic circuit assigned to the other central processing unit, by referring to the first information. The central processing unit supplies a second command to the other arithmetic circuit when the other arithmetic circuit is not under command execution, and determines whether or not the other arithmetic circuit has been reserved for command execution, when the other arithmetic circuit is under command execution, by referring to the second information. The central processing unit reserves the other arithmetic circuit when the other arithmetic circuit has not been reserved by any of the central processing units, supplies the second command to the other arithmetic circuit when the command execution of the other arithmetic circuit has finished before the one arithmetic circuit finishes execution of the first command, and supplies the second command to the one arithmetic circuit when the other arithmetic circuit is still under command execution when the one arithmetic circuit has finished execution of the first command. According to the above procedure, when executing a plurality of operation instructions, the central processing unit can issue commands to the arithmetic circuits efficiently according to reserved or non-reserved states of the arithmetic circuits to cause the arithmetic circuits to execute the operation instructions.
The central processing unit has an internal memory circuit for storing information indicating which arithmetic circuit has been reserved for operation. According to this configuration, when the central processing unit confirms its own reservation, the central processing unit does not need to refer to an external memory circuit. When information capable of indicating which central processing unit has reserved the arithmetic circuit for execution of the next command is employed as the second information, the central processing unit needs to refer to the second information to confirm its own operation reservation.
[3] A data processing system according to an embodiment in another aspect includes a plurality of processor cores (PCORE0, PCORE1), a first register (BREG), and a second register (RREG). Each of the processor cores has an arithmetic circuit (FPU0, FPU1) which operates upon receiving an operation command from its own processor core or from another processor core. The first register is used to store information (BF0, BF1) indicating whether each of the arithmetic circuits is in use, and can be accessed by the processor cores. The second register is used to store information (RF0, RF1) indicating by which of the processor cores each of the arithmetic circuits has been reserved for next use, and can be accessed by the processor cores. Thus, when a processor core distributes commands to the arithmetic circuit of the other processor core, which is a shared resource, it can be determined by referring to the first register whether the arithmetic circuit of the other processor core is already executing a command, so that any conflict between the arithmetic circuits can be easily avoided. When the arithmetic circuit of the other processor core is already executing a command, the arithmetic circuit of the other processor core is reserved for execution of the next command using the second register, and thereby after the execution, the command can be assigned quickly to the arithmetic circuit of the other processor core for execution.
In a concrete embodiment, the processor core refers to the first register, when using the arithmetic circuit of the other processor core, to determine whether the arithmetic circuit of the other processor core is used; supplies a command to the arithmetic circuit of the other processor core when the arithmetic circuit of the other processor core is not used; determines whether or not the arithmetic circuit of the other processor core has been reserved for use when the arithmetic circuit of the other processor core is used, by referring to the second register; reserves the arithmetic circuit of the other processor core when the arithmetic circuit of the other processor core has not been reserved; and supplies a command to the reserved arithmetic circuit when the reserved arithmetic circuit has become available before the arithmetic circuit of its own becomes available. According to the above procedure, when executing a plurality of operation instructions, one processor core can issue a command to the arithmetic circuit of its own and other processor cores efficiently according to reserved or non-reserved states of the arithmetic circuits to cause the arithmetic circuits to execute the operation instructions.
The embodiments will be described in more detail.
The processor core PCORE0 includes a central processing unit CPU0, a work memory MEM0, a floating-point processing circuit FPU0 which is an example of an arithmetic circuit, and a cache memory CACHE0. The central processing unit CPU0, the work memory MEM0, and the cache memory CACHE0 are commonly coupled with a CPU bus CPUB0. Likewise, the processor core PCORE1 includes a central processing unit CPU1, a work memory MEM1, a floating-point processing circuit FPU1 which is an example of an arithmetic circuit, and a cache memory CACHE1. The central processing unit CPU1, the work memory MEM1, and the cache memory CACHE1 are commonly coupled with a CPU bus CPUB1.
The cache memories CACHE0 and CACHE1 are coupled with the peripheral bus PRPHB, and the external memory EXMEM is used as a primary storage of the cache memories CACHE0 and CACHE1.
The central processing units CPU0 and CPU1 are commonly coupled with the FPU buses FPUB0 and FPUB1, and the floating-point processing circuits FPU0 and FPU1 are individually coupled with the FPU buses FPUB0 and FPUB1, respectively.
The central processing units CPU0 and CPU1 execute fetched instructions. An instruction set of the data processing system DPRCS1 includes central processing unit instructions (CPU instructions) and floating-point processing circuit instructions (FPU instructions). The central processing unit CPU0 or CPU1 executes a CPU instruction when it has fetched the CPU instruction, and issues an operation command corresponding to the FPU instruction when it has fetched the FPU instruction. Each of the floating-point processing circuits FPU0 and FPU1 has a command register in which an operation command is set by the central processing unit CPU0 or CPU1. Although not particularly limited, when an operand necessary for execution of an FPU instruction must be obtained by memory access, the central processing unit CPU0 or CPU1 performs the memory access and sets the operand into the data register of FPU0 or FPU1. When the central processing unit CPU0 or CPU1 has fetched an FPU instruction, it is able to set an operation command indicated by the FPU instruction in either of the floating-point processing circuits FPU0 and FPU1. As memory circuits which are referred to for this control, a busy register BREG and a reservation register RREG are commonly coupled with the FPU buses FPUB0 and FPUB1.
The busy register BREG is used to store 1-bit busy flags (first information) BF0 and BF1 indicating whether the floating-point processing circuits FPU0 and FPU1, respectively, are executing an operation command. The busy flag BF0 corresponds to the floating-point processing circuit FPU0, and the busy flag BF1 corresponds to the floating-point processing circuit FPU1. Each of the busy flags indicates, in a set state, that an operation command is being executed, and indicates, in a reset state, that an operation command is not being executed. Although not particularly limited, the busy flag BF0 or BF1 is set by the central processing unit CPU0 or CPU1 when the central processing unit CPU0 or CPU1 supplies an operation command to the floating-point processing circuit FPU0 or FPU1, and is reset by the floating-point processing circuit FPU0 or FPU1 when the floating-point processing circuit FPU0 or FPU1 has finished executing the operation command.
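The set and reset behavior of the busy flags can be modeled in a few lines of Python. This is a minimal sketch: the class name `BusyRegister`, its method names, and the placement of BF0 at bit 0 and BF1 at bit 1 are assumptions made for illustration, since the text does not specify a bit layout.

```python
class BusyRegister:
    """Hypothetical model of BREG holding the 1-bit flags BF0 and BF1."""

    def __init__(self):
        self.value = 0  # both flags reset: neither circuit is executing

    def set_busy(self, fpu: int) -> None:
        """Done by a CPU when it supplies an operation command to the FPU."""
        self.value |= 1 << fpu

    def clear_busy(self, fpu: int) -> None:
        """Done by the FPU itself when it has finished the operation command."""
        self.value &= ~(1 << fpu)

    def is_busy(self, fpu: int) -> bool:
        """Read by a CPU before distributing a command to the FPU."""
        return bool((self.value >> fpu) & 1)
```

Letting the circuit clear its own flag mirrors the division of roles described above, where the CPU sets the flag on issue and the circuit resets it on completion.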
The reservation register RREG is used to store two-bit reservation flags (second information) RF0 and RF1 indicating which of the central processing units CPU0 and CPU1 has reserved the floating-point processing circuits FPU0 and FPU1, respectively, for execution of the next operation command. The reservation flag RF0 corresponds to the floating-point processing circuit FPU0, and the reservation flag RF1 corresponds to the floating-point processing circuit FPU1. In the reservation flags, the value of “00” means that the floating-point processing circuit has not been reserved, the value of “10” means that the floating-point processing circuit has been reserved by the central processing unit CPU0, and the value of “11” means that the floating-point processing circuit has been reserved by the central processing unit CPU1. Reservation setting for the reservation flag RF0 or RF1 is performed by the central processing unit CPU0 or CPU1, which cancels the reservation in parallel with setting an operation command in the reserved floating-point processing circuit FPU0 or FPU1.
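The two-bit encoding of the reservation flags can be read as a valid bit followed by a CPU number. The Python sketch below models it under that reading. The names `encode`, `decode`, and `ReservationRegister` are hypothetical, packing RF0 into bits 1:0 and RF1 into bits 3:2 is an assumed layout, and treating the undefined value "01" as "not reserved" is also an assumption.

```python
def encode(cpu: int) -> int:
    """Return "10" for CPU0 and "11" for CPU1 (upper bit marks a reservation)."""
    if cpu not in (0, 1):
        raise ValueError("only CPU0 and CPU1 are modeled")
    return 0b10 | cpu


def decode(flag: int):
    """Return the reserving CPU number, or None when the flag is not reserved."""
    if not flag & 0b10:
        return None  # "00"; the undefined "01" is also treated as unreserved
    return flag & 0b01


class ReservationRegister:
    """Hypothetical model of RREG holding the 2-bit flags RF0 and RF1."""

    def __init__(self):
        self.value = 0

    def reserved_by(self, fpu: int):
        return decode((self.value >> (2 * fpu)) & 0b11)

    def cancel(self, fpu: int) -> None:
        """Done by the reserving CPU in parallel with setting its command."""
        self.value &= ~(0b11 << (2 * fpu))

    def reserve(self, fpu: int, cpu: int) -> bool:
        """Reserve the FPU for the CPU; fail if it is already reserved."""
        if self.reserved_by(fpu) is not None:
            return False
        self.cancel(fpu)  # clear the field before writing the new value
        self.value |= encode(cpu) << (2 * fpu)
        return True
```

A CPU that holds a reservation would call `cancel` at the moment it writes its operation command to the reserved circuit.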
In the data processing system DPRCS1, when operation commands are distributed to the floating-point processing circuits FPU0 and FPU1 which are shared resources, it can be determined by referring to the busy register BREG whether the floating-point processing circuit FPU0 or FPU1 is already executing a command, so that any conflict between operational indications for the floating-point processing circuits FPU0 and FPU1 can be easily avoided. When the floating-point processing circuit FPU0 or FPU1 is already executing a command, the floating-point processing circuit is reserved for execution of the next operation command using the reservation register RREG, and thereby after the floating-point processing circuit which is executing an operation has finished the operation, an operation command can be assigned fast to the floating-point processing circuit to cause it to execute the operation command. Thus, when one central processing unit has fetched a plurality of FPU instructions, it is able to issue operation commands to the floating-point processing circuits efficiently according to reserved or non-reserved states of the floating-point processing circuits to cause the floating-point processing circuits to execute operations.
On the other hand, when a plurality of FPU instructions that cause a register conflict with one another can be assigned to FPU0 in succession, it is most efficient for FPU0 to execute those FPU instructions in succession. It is therefore recommended that the central processing unit CPU0 cause FPU0 to execute the first instruction and reserve FPU0 in the reservation register RREG so that FPU0 also executes the subsequent FPU instruction. For example, when the first and second floating-point adding instructions cause a register conflict, the first and second floating-point adding instructions are assigned to FPU0. Furthermore, when the first and fourth floating-point adding instructions cause a register conflict, the first and fourth floating-point adding instructions are assigned to FPU0, and the second and third floating-point adding instructions are assigned to FPU1.
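One way to reproduce the assignments in these examples is to group instructions that conflict with one another and place each group on a single circuit. The Python sketch below is only an illustration of that idea, not the assignment logic of the embodiment. The function names, the representation of an instruction as a `(destination, source, source)` tuple of register names, the definition of a conflict (one instruction writes a register that the other reads or writes), and the rule of giving each group to the least-loaded circuit are all assumptions.

```python
def conflicts(a, b):
    """Assumed rule: one instruction writes a register the other one touches."""
    return a[0] in b or b[0] in a


def assign_fpus(instructions, n_fpus=2):
    """Return, for each instruction, the index of the FPU it is assigned to."""
    parent = list(range(len(instructions)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge conflicting instructions; the smallest index stays the group root.
    for i in range(len(instructions)):
        for j in range(i):
            if conflicts(instructions[i], instructions[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(len(instructions)):
        groups.setdefault(find(i), []).append(i)

    load = [0] * n_fpus
    assignment = [None] * len(instructions)
    for root in sorted(groups):            # groups in order of first appearance
        target = load.index(min(load))     # least-loaded FPU, lowest index first
        for i in groups[root]:
            assignment[i] = target
        load[target] += len(groups[root])
    return assignment
```

With four additions in which only the first and fourth share a register, this sketch places the first and fourth on FPU0 and the second and third on FPU1, matching the example in the text.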
By controlling resource assignment as described above, the processing of saving the register contents of the shared resources to a memory and loading them back onto the shared resources can be omitted, and thereby a reduction in processing efficiency and an increase in power consumption caused by increased bus traffic can be suppressed. By such instruction assignment using the reservation register RREG, the central processing units CPU0 and CPU1 can efficiently use the floating-point processing circuits FPU0 and FPU1, which are shared resources that execute instructions independently.
In the data processing system DPRCS2 in
The busy register BREG0 has the above busy flag BF0, and the reservation register RREG0 has the above reservation flag RF0. The central processing unit CPU1 has a busy register BREG1 and a reservation register RREG1. The busy register BREG1 has the above busy flag BF1, and the reservation register RREG1 has the above reservation flag RF1. The significances of the flags BF0, RF0, BF1, and RF1 are basically equivalent to those of the data processing system DPRCS1 shown in
Up to this point, the present invention made by the inventors has been concretely described based on the embodiments. However, it is needless to say that the present invention is not limited to them, and various modifications can be made thereto without departing from the gist of it.
For example, the numbers of processor cores, central processing units, and floating-point processing circuits may be three or more. The arithmetic circuits are not limited to floating-point processing circuits, and may be any appropriate circuits performing operational processing under control of central processing units, such as coding and decoding circuits, image processing circuits, or speech processing circuits. The memory which is used as a primary storage of the cache memories may be an external memory coupled to the outside of the data processing system implemented as a semiconductor integrated circuit. Each of the processor cores need not have a cache memory, and may have an address conversion buffer used for virtual storage. The present invention can be widely applied to data processing systems in which a plurality of arithmetic circuits can be used as operation resources for one central processing unit. The data processing system of the present invention is not limited to a single-chip one, and may be a multi-chip one.
Number | Date | Country | Kind |
---|---|---|---|
2007-056491 | Mar 2007 | JP | national |