1. Field of the Invention
The present invention relates to a co-processor interface (CPIF). More particularly, the present invention relates to a high performance de-coupled CPIF.
2. Description of the Related Art
A co-processor (COP) is a specialized processor usually adopted to perform and accelerate special operations, such as floating-point calculation and crypto operations, for a main processor (MP). Examples of COPs include graphics processing units (GPUs) and digital signal processors (DSPs). In general, the COP and the MP are connected by a dedicated CPIF. Through the CPIF, the MP dispatches COP instructions to the COP, passes data and early flush events to the COP, receives status reports from the COP, and notifies the COP of the final decision of whether to commit or flush all non-commitment COP instructions.
When the width of a native data type of the COP is different from the width of a native data type of the MP, the endian of the data must be taken into consideration for data transfer between the MP and the COP. One traditional solution is to process the data in software according to the data endian of the MP and the COP. Generally, the software swaps or changes the locations of data in a register. However, the performance of software is relatively low compared to that of hardware. Another traditional solution is to provide a global signal that passes the endian from the MP to the COP, so that the COP processes the data according to the endian automatically. However, when the MP changes its endian because of, for example, switching to another process, it is not efficient to synchronize the change of endian in the MP with the global signal. Yet another traditional solution is to provide a control bit in the COP that indicates its endian and guides the hardware to process the data accordingly. Similarly, when the COP changes its endian for some reason, it is not efficient to synchronize the change of endian in the COP with the control bit.
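The software approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a 32-bit native MP word and a 64-bit COP register, and the convention that under big endian the first word in memory occupies the upper half of the register. The function names are hypothetical and do not come from any particular instruction set.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit native MP word


def swap_words64(value):
    """Swap the two 32-bit halves of a 64-bit register image."""
    low = value & MASK32
    high = (value >> 32) & MASK32
    return (low << 32) | high


def pack_pair(first, second, big_endian):
    """Pack two 32-bit words, given in memory order, into one 64-bit value.

    Assumed convention: under big endian the first word occupies the
    upper half; under little endian it occupies the lower half.
    """
    if big_endian:
        return ((first & MASK32) << 32) | (second & MASK32)
    return ((second & MASK32) << 32) | (first & MASK32)
```

Every transfer that crosses the width boundary pays for these extra shift and mask operations, which is the software overhead the paragraph above refers to.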
When the MP finds out that a branch prediction is unsuccessful and a COP instruction has to be flushed as a result, the MP passes an early flush event to the COP so that the COP can flush the COP instruction out of the pipeline of the COP. Conventionally, a CPIF has only one early flush interface (EFI) to transmit the early flush events from the MP to the COP. The performance degrades when many early flush events are generated in different pipeline stages of the MP, because some early flush events may take too long to pass through the queue of the single EFI and reach the COP, and consequently block the COP instructions waiting for the flush-or-no-flush verdict of those early flush events.
The MP must wait for the status reports for some COP instructions. A status report arriving at the MP too late may hinder the execution flow of a COP instruction and degrade the performance.
A CPIF may be designed in coupled or de-coupled form. A coupled CPIF is a pipeline-dependent interface. In other words, a coupled CPIF is specialized and optimized for a specific pipeline architecture. Each signal transferred by the coupled CPIF is implemented in specific pipeline stages of both the COP and the MP. The coupled CPIF features high performance but is neither scalable nor portable.
On the other hand, a de-coupled CPIF is a pipeline-independent interface. The signals transferred by the de-coupled CPIF need not be implemented in specific pipeline stages of the COP or the MP. The de-coupled CPIF is highly scalable and portable.
Accordingly, the present invention is directed to a de-coupled CPIF, which provides a straightforward and high-performance solution to handle data endian between an MP and a COP.
The present invention is also directed to a de-coupled CPIF, which divides the status report provided by a COP into an early status report and a late status report. The de-coupled CPIF may disable the late status report in order to improve the performance. The present invention is also directed to a de-coupled CPIF, which provides multiple EFIs in order to improve the performance of the processing of early flush events.
The present invention is also directed to a de-coupled CPIF, which combines all the functions and features of the aforementioned de-coupled CPIFs in order to improve the performance of the handling of data endian, status reports and early flush events between an MP and a COP.
According to an embodiment of the present invention, a de-coupled CPIF is provided. The de-coupled CPIF handles the execution flow of a COP instruction. An MP dispatches the COP instruction to a COP and the de-coupled CPIF includes a plurality of signal interfaces transmitting a first signal group, a second signal group and a third signal group included in the execution flow of the COP instruction between the MP and the COP. The MP uses the first signal group to dispatch the COP instruction to the COP. The second signal group is used to transfer data corresponding to the COP instruction between the MP and the COP. The MP uses the third signal group to notify the COP of whether to commit the COP instruction or to flush all non-commitment COP instructions in all pipeline stages of the COP. The endian information of the data is provided by the MP or the COP through the signal interfaces.
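The commit-or-flush behavior described above can be modeled with a small sketch. It assumes that COP instructions are committed in dispatch order and identifies each instruction by a hypothetical integer tag. The class and method names are illustrative only, and the data transfer of the second signal group is omitted.

```python
class CopCommitModel:
    """Toy model of the dispatch and commit/flush signal groups."""

    def __init__(self):
        self.pending = []    # dispatched but not yet committed, oldest first
        self.committed = []  # instructions the MP has committed

    def dispatch(self, tag):
        """First signal group: the MP dispatches a COP instruction."""
        self.pending.append(tag)

    def commit(self):
        """Third signal group, commit case: commit the oldest pending one."""
        tag = self.pending.pop(0)
        self.committed.append(tag)
        return tag

    def flush(self):
        """Third signal group, flush case: drop every non-committed one."""
        flushed, self.pending = self.pending, []
        return flushed
```

For example, after dispatching tags 1, 2 and 3, a commit followed by a flush leaves only instruction 1 committed and discards instructions 2 and 3.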
According to another embodiment of the present invention, another de-coupled CPIF is provided. The de-coupled CPIF includes a plurality of signal interfaces transmitting a first signal group, a second signal group, a third signal group and a fourth signal group included in the execution flow of the COP instruction between the MP and the COP. The MP uses the first signal group to dispatch the COP instruction to the COP. The COP uses the second signal group to provide an early status report to the MP and the COP uses the third signal group to provide a late status report to the MP. The early status report is provided before the late status report. The MP uses the fourth signal group to notify the COP of whether to commit the COP instruction or to flush all non-commitment COP instructions in all pipeline stages of the COP.
According to another embodiment of the present invention, another de-coupled CPIF is provided. The de-coupled CPIF includes one or a plurality of EFIs coupled between at least one stage of a pipeline of the MP and at least one stage of a pipeline of the COP. The EFIs transmit at least one signal group included in the execution flow of the COP instruction between the MP and the COP. The MP uses the signal group to pass early flush events to the COP. Each early flush event notifies the COP either that a COP instruction passes the corresponding EFI or that all COP instructions which do not pass the corresponding EFI are to be flushed.
According to another embodiment of the present invention, another de-coupled CPIF is provided. This de-coupled CPIF includes all the functions and features of the aforementioned de-coupled CPIFs and features all the advantages and effects of the aforementioned de-coupled CPIFs.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
A COP instruction may trigger an early status report and a late status report from the COP that executes the COP instruction. An early status report corresponding to a COP instruction is always generated and provided to the MP no later than the late status report corresponding to the same COP instruction. Both the early status report and the late status report notify the MP whether an abnormal status, such as an error, exception or trap, that can affect the execution flow of the corresponding COP instruction occurs in the COP during the execution of the COP instruction. The late status report is generated in the last stage of the pipeline of the COP where an abnormal status can happen, while the early status report may be generated in any stage of the pipeline of the COP, including the last stage.
In this embodiment of the present invention, the COP provides the early status reports and the late status reports as traps to the MP. In this scenario, traps are preferable to interrupts because traps can enter the MP directly, whereas interrupts must first pass through an external interrupt controller before reaching the MP.
A signal group is a set of signals used by the MP and the COP for handshaking according to predetermined interface protocols.
In this embodiment, when a COP instruction requires endian information to resolve the order of data, the MP 110 provides the endian information of the data through the signal interfaces 140 to the COP 150. The MP 110 may incorporate the endian information into the signal group inst_dispatch or the signal group m2c_data in order to transfer the endian information to the COP 150. In some other embodiments of the present invention, the endian information may be provided by the COP 150 to the MP 110. The COP 150 may use the signal group c2m_data to provide the endian information. Alternatively, the execution flow of COP instructions may include another individual signal group transmitted between the MP 110 and the COP 150. The MP 110 or the COP 150 may use the individual signal group to transfer the endian information between the MP 110 and the COP 150.
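As a rough illustration of carrying the endian information together with the dispatched instruction, the sketch below models the signal group inst_dispatch as a record with an endian flag. The field names `opcode` and `big_endian` and the function `cop_load_operand` are assumptions made for this example. The source does not specify the layout of the signal group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstDispatch:
    """Hypothetical contents of the inst_dispatch signal group."""
    opcode: int
    big_endian: bool  # endian information supplied by the MP (assumed field)


def cop_load_operand(dispatch, raw):
    """Interpret the raw operand bytes using the endian sent with dispatch."""
    byteorder = "big" if dispatch.big_endian else "little"
    return int.from_bytes(raw, byteorder)
```

Because the endian travels with each instruction, the COP in this model needs neither a global endian signal nor a local control bit to be kept in sync when the MP switches endian.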
The signal interfaces 140 of the de-coupled CPIF 130 may include a plurality of EFIs coupled between at least one stage of a pipeline of the MP 110 and at least one stage of a pipeline of the COP 150. The EFIs may transmit at least one signal group included in the execution flow of the COP instruction between the MP 110 and the COP 150. The MP 110 may use the signal group to pass early flush events to the COP 150. Each early flush event notifies the COP 150 either that a COP instruction passes the corresponding EFI or that all COP instructions which do not pass the corresponding EFI are to be flushed. In order to improve performance, the de-coupled CPIF 130 may pass early flush events of the COP instruction from the MP 110 to the COP 150 as soon as the early flush events are generated in the pipeline stages of the MP 110.
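One possible reading of the pass-or-flush semantics of multiple EFIs is sketched below. It assumes that the EFIs are numbered from 0 and are passed in order, that each COP instruction records how many EFIs it has passed, and that a pass event on an EFI applies to the oldest instruction waiting at that EFI. The class and attribute names are hypothetical.

```python
class CopEfiModel:
    """Toy model of a COP receiving early flush events on several EFIs."""

    def __init__(self, num_efis):
        self.num_efis = num_efis
        self.in_flight = []  # [tag, efis_passed] pairs, oldest first

    def dispatch(self, tag):
        self.in_flight.append([tag, 0])

    def on_event(self, efi_index, flush):
        """Handle one early flush event and return the flushed tags."""
        if flush:
            # Flush every instruction that has not passed this EFI.
            flushed = [t for t, n in self.in_flight if n <= efi_index]
            self.in_flight = [e for e in self.in_flight if e[1] > efi_index]
            return flushed
        # No flush: the oldest instruction waiting at this EFI passes it.
        for entry in self.in_flight:
            if entry[1] == efi_index:
                entry[1] += 1
                break
        return []
```

In this model each EFI delivers its verdict independently, so an event generated at a late MP stage does not have to queue behind events generated at earlier stages.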
In an embodiment of the present invention, the EFIs may be coupled between a plurality of predetermined stages of the pipeline of the MP and a plurality of predetermined stages of the pipeline of the COP. Each EFI may pass an early flush event from a different one of the predetermined stages of the pipeline of the MP to a different one of the predetermined stages of the pipeline of the COP. For example, in
When a COP instruction enters any one of the pipeline stages 551, 552 and 553 of the COP 550, before entering the next pipeline stage, the COP instruction must wait for the flush-or-no-flush verdict of the corresponding early flush event from the corresponding pipeline stage of the MP 510. The EFIs in
In another embodiment of the present invention, a particular EFI may be coupled between a single predetermined stage of the pipeline of the MP and a particular one of a plurality of predetermined stages of the pipeline of the COP to pass the early flush event from the predetermined stage of the pipeline of the MP to the particular predetermined stage of the pipeline of the COP. Each of the other EFIs may be coupled to a different one of the other predetermined stages of the pipeline of the COP to provide a dummy early flush event indicating no flush to the corresponding predetermined stage of the pipeline of the COP. For example, in
In the example of
In another embodiment of the present invention, the EFIs may be coupled between a plurality of predetermined stages of the pipeline of the MP and a single predetermined stage of the pipeline of the COP. The de-coupled CPIF connecting the MP and the COP may collect an early flush event from each of the predetermined stages of the pipeline of the MP and then provide a summary event to the predetermined stage of the pipeline of the COP according to the early flush events collected from the MP. For example, in
As mentioned above, the COP may provide an early status report and a late status report to the MP in response to a COP instruction to indicate whether or not an abnormal status occurs during the execution of the COP instruction. In some circumstances, the late status report may be disabled to improve performance. For example, the COP may disable the late status report when it is known in advance that the COP instruction does not generate any abnormal status for the late status report, when the generation of the abnormal status is disabled, or when the abnormal status generated by the COP instruction is too trivial to affect the execution flow of the COP instruction. When the late status report is disabled in this way, the performance of the MP can be higher because the MP does not have to wait idly for the late status report.
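The effect of disabling the late status report on the MP can be sketched as a small decision function. The encoding used here is an assumption for illustration: `None` means a report has not arrived yet, `True` means it signals an abnormal status, and `False` means it signals normal execution. The sketch also assumes that the MP acts on an abnormal early status report without waiting for the late one, which the description above does not state explicitly.

```python
def mp_decision(early, late, late_enabled):
    """Return 'wait', 'abnormal' or 'proceed' for one COP instruction.

    early, late: None (not yet arrived), True (abnormal) or False (normal).
    late_enabled: whether the late status report is enabled.
    """
    if early is None:
        return "wait"      # the early report is always awaited
    if early:
        return "abnormal"  # assumed: handled without waiting for the late one
    if not late_enabled:
        return "proceed"   # late report disabled: no further waiting
    if late is None:
        return "wait"
    return "abnormal" if late else "proceed"
```

With the late status report enabled, a normal early report still leaves the MP waiting. With it disabled, the same early report lets the MP proceed at once.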
The embodiment of
The de-coupled CPIF in the embodiments above transfers the endian information along with the corresponding COP instruction to the COP so that the order of data can be properly arranged. The aforementioned de-coupled CPIF provides multiple EFIs so that early flush events generated in different pipeline stages of the MP can be transferred to the COP simultaneously without blocking the execution of any COP instruction. The aforementioned de-coupled CPIF can disable late status reports to improve the performance of the MP. In summary, the present invention provides a de-coupled CPIF that is scalable and portable and features high performance.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.