The invention relates generally to embedded microprocessor architecture and more specifically to systems and methods for selectively decoupling an extended instruction pipeline from a main pipeline in a microprocessor-based system.
Processor extension logic is utilized to extend a microprocessor's capability. Typically, this logic operates in parallel with, and is accessible by, the main processor pipeline. It is often used to perform specific, repetitive, computationally intensive functions, thereby freeing up the main processor pipeline.
In conventional microprocessors, there are essentially two types of parallel pipeline architectures: tightly coupled and loosely coupled, or decoupled. In the former, instructions are fetched and executed serially in the main processor pipeline. If an instruction is one to be processed by the extension logic, it is sent to that logic. Because every instruction originates from the main pipeline, the two pipelines are said to be tightly coupled. This limits the degree of concurrency exploitable between the pipelines.
In the second architecture, the parallel instruction pipeline containing the extension logic is capable of fetching and executing its own instructions and hence maximizing concurrency. However, control and synchronization between the two pipelines becomes difficult when programming a processor having such a decoupled architecture. Thus, there exists a need for a parallel pipeline architecture that can fully exploit the advantages of parallelism without suffering from the design complexity of loosely or completely decoupled pipelines.
Accordingly, at least one embodiment of the invention provides a microprocessor architecture. The microprocessor architecture according to this embodiment comprises a first processor instruction pipeline, comprising a front end portion and a rear portion, a second processor instruction pipeline, comprising a front end portion and a rear portion, and an instruction queue coupling the first and second instruction pipelines between their respective front end and rear portions.
Another embodiment of the invention provides a method of dynamically decoupling a parallel extended processor pipeline from a main processor pipeline. The method according to this embodiment comprises sending an instruction from the main processor pipeline to the parallel extended processor pipeline instructing the parallel extended processor pipeline to operate autonomously, operating the parallel extended processor pipeline autonomously, storing subsequent instructions from the main processor pipeline to the parallel extended processor pipeline in an instruction queue, executing an instruction with the parallel extended processor pipeline to cease autonomous execution, and thereafter executing instructions supplied by the main processor pipeline in the queue.
Still a further embodiment of the invention provides a method of performing dynamically controlled parallel instruction processing in a microprocessor. The method according to this embodiment comprises fetching and executing instructions with a main processor pipeline, sending instructions from the main processor pipeline to a parallel extended processor pipeline via an instruction queue coupling the two pipelines, and, if the instruction is an instruction to be executed by the parallel extended pipeline, executing that instruction with the parallel extended pipeline; otherwise, if the instruction is an instruction instructing the parallel extended pipeline to begin autonomous execution, thereafter fetching and executing instructions autonomously with the parallel extended pipeline independent of the main pipeline's instruction fetches, and storing instructions from the main pipeline for the parallel extended pipeline in the instruction queue until autonomous processing has ceased.
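As an illustrative sketch only, the dispatch behavior described above can be modeled in software. The class below is a toy model, not the claimed hardware; the mnemonics vrun and vendrec are taken from the detailed description later in this text, while the method names and queue mechanics are simplifying assumptions:

```python
from collections import deque

VRUN, VENDREC = "vrun", "vendrec"  # mnemonics named later in this description

class ExtendedPipeline:
    """Toy software model of the extended pipeline and its coupling queue."""

    def __init__(self, local_code):
        self.queue = deque()          # instruction queue coupling the pipelines
        self.local_code = local_code  # instruction sequence in local code memory
        self.autonomous = False
        self.executed = []            # order in which instructions complete

    def issue(self, insn):
        """Main pipeline sends an instruction toward the extended pipeline."""
        if insn == VRUN:
            self.autonomous = True    # switch to fetching from local memory
        elif self.autonomous:
            self.queue.append(insn)   # buffer while the pipeline runs alone
        else:
            self.executed.append(insn)

    def step_local(self):
        """Fetch/execute autonomously until vendrec, then drain the queue."""
        for insn in self.local_code:
            if insn == VENDREC:
                self.autonomous = False
                break
            self.executed.append(insn)
        while self.queue:             # resume instructions queued meanwhile
            self.executed.append(self.queue.popleft())
```

For example, issuing the sequence vadd, vrun, vsub against local code vmac, vmac, vendrec completes the instructions in the order vadd, vmac, vmac, vsub: the vsub issued during autonomous operation waits in the queue until vendrec ends autonomous execution.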
These and other embodiments and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.
The following description is intended to convey a thorough understanding of the embodiments described by providing a number of specific embodiments and details involving microprocessor architecture and systems and methods for selectively decoupling an extended instruction pipeline from a main instruction pipeline. It should be appreciated, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
Referring now to
In various embodiments, a single instruction issued by the processor pipeline 12 may cause up to 16 16-bit elements to be operated on in parallel through the use of the 128-bit data path 55 in the media engine 50. In various embodiments, the SIMD engine 50 utilizes closely coupled memory units. In various embodiments, the SIMD data memory 52 (SDM) is a 128-bit wide data memory that provides low latency access to and from the 128-bit vector register file 51. The SDM contents are transferable to and from system main memory via a DMA unit 54 thereby freeing up the processor core 10 and the SIMD core 50. In various embodiments, a SIMD code memory 56 (SCM) allows the SIMD unit to fetch instructions from a localized code memory, allowing the SIMD pipeline to dynamically decouple from the processor core 10 resulting in truly parallel operation between the processor core and SIMD media engine as will be discussed in greater detail in the context of
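The lane arithmetic implied by a 128-bit data path operating on 16 16-bit elements can be sketched as follows. This is a minimal software illustration of the SIMD concept only; the function name and the use of Python integers to hold a 128-bit vector are assumptions for the sketch, not features of the media engine:

```python
LANES, LANE_BITS = 16, 16          # 16 x 16-bit elements in a 128-bit path
MASK = (1 << LANE_BITS) - 1        # 0xFFFF, one lane's worth of bits

def simd_add(a, b):
    """Lane-wise 16-bit add of two 128-bit vectors held as Python ints.

    Each lane wraps around independently (modulo 2**16), so a carry out of
    one lane never disturbs its neighbor, mirroring SIMD lane isolation.
    """
    out = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane = ((a >> shift) + (b >> shift)) & MASK  # per-lane wraparound
        out |= lane << shift
    return out
```

A single call touches all 16 lanes, which is the source of the parallelism attributed above to one issued instruction.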
Therefore, in various embodiments, the microprocessor architecture will permit the processor-based system 5 to operate in both closely coupled and decoupled modes of operation. In the closely coupled mode of operation, the SIMD program code fetch is exclusively handled by the main processor core 10. In the decoupled mode of operation, the SIMD pipeline 50 executes code fetched from a local memory 56 independent of the processor core 10. The processor core 10 may therefore instruct the SIMD pipeline 50 to execute autonomously in this de-coupled mode, for example, to perform multimedia tasks such as audio processing, entropy encoding/decoding, discrete cosine transforms (DCTs) and inverse DCTs, motion compensation and de-block filtering.
Referring now to
Extending a general-purpose microprocessor with application specific extension instructions can often add significant length to the instruction pipeline. In the pipeline of
By using the single pipeline front-end to fetch and issue all instructions, the processor pipeline of
Referring now to
Therefore, instead of trying to get what effectively becomes two independent processors to work together as in the pipeline depicted in
In addition to efficient operation, another advantage of this architecture is that during debugging, such as, for example, instruction stepping, the two parallel threads can be forced to be serialized such that the CPU front portion 145 will not issue any instruction after issuing vrun to the extension pipeline until the latter fetches and executes the vendrec instruction. In various embodiments, this will give the programmer the view of a single program thread that has the same functional behavior as the parallel program when executed normally, and hence will greatly simplify the task of debugging.
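The serialized debug schedule described above can be sketched as a scheduling function. This is an assumed software analogy only; the tuple-based trace format and the function name are invented for the sketch:

```python
def run_debug(main_program, local_code):
    """Serialized (debug) schedule: after vrun is issued, no further
    main-pipeline instruction issues until the extension pipeline has
    fetched and executed vendrec from its local code memory."""
    trace = []
    for insn in main_program:
        if insn == "vrun":
            for ext_insn in local_code:   # extension runs to vendrec
                if ext_insn == "vendrec":
                    break
                trace.append(("ext", ext_insn))
        else:
            trace.append(("main", insn))
    return trace
```

Because the extension program runs to completion inside the vrun step, the trace is identical to that of a single serial thread, which is the single-thread view the debugging programmer sees.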
Another advantage of the processor pipeline containing a parallel extendible pipeline that can be dynamically coupled and decoupled is the ability to use two separate clock domains. In low power applications, it is often necessary to run specific parts of the integrated circuit at varying clock frequencies, in order to reduce and/or minimize power consumption. Using dynamic decoupling, the front end portion 145 of the main pipeline can utilize an operating clock frequency different from that of the parallel pipeline 165 of stages D1-D4 with the primary clock partitioning occurring naturally at the queue 155 labeled as Q in the
Referring now to
Referring now to
As discussed above, in the microprocessor architecture according to the various embodiments of the invention, a main processor pipeline is extended through a dynamically coupled parallel SIMD instruction pipeline. In various embodiments, the main processor pipeline may issue instructions to the extended pipeline through an instruction queue that effectively decouples the extended pipeline. In various embodiments, the extended SIMD pipeline is also able to run prerecorded macros that are stored in a local SIMD instruction memory so that a single macro instruction sent to the SIMD pipeline via the queue allows many pre-determined instructions to be executed as discussed in commonly assigned U.S. patent applications XX/XXX,XXX titled, “Systems and Methods for Recording Instructions Sequences in a Microprocessor Having a Dynamically Decoupleable Extended Instruction Pipeline,” filed concurrently herewith, the disclosure of which is hereby incorporated by reference in its entirety. This architecture, among other things, allows the SIMD media engine (the extended pipeline) to operate in parallel with the primary pipeline (processor core) and allows the processor core to operate far in advance of the parallel SIMD pipeline.
One consideration of using an instruction queue to decouple the extended SIMD pipeline from the processor core (main pipeline) is that it becomes possible for the processor core to issue too many instructions causing the queue to become full. When the main processor pipeline can no longer issue instructions to the queue, the pipeline will have to stall until the queue frees up a slot for the instruction that caused the pipeline to stall. Pipeline stalls have a negative effect on overall system performance. In this case in particular, a pipeline stall means that the processor core will stop being able to operate in parallel, therefore negating the gains derived from the dynamically decoupled extended parallel SIMD pipeline.
Therefore, in order to prevent the main processor pipeline from issuing instructions to the queue when it is full, thereby causing the main pipeline to stall, in various embodiments, the SIMD pipeline queue uses condition codes to notify the processor pipeline of the condition of the queue. In various embodiments, the SIMD queue sets a condition code of QF, for queue nearly full, whenever fewer than a predetermined number of empty slots remain in the queue. In various embodiments, this number may be 16. However, in various embodiments, the number may be different than 16. In various embodiments, the SIMD queue sets a condition code of QNF, as the opposite of QF, when more than the predetermined number of slots remain available.
In various embodiments, rather than using several instructions to load these status values and test the value before branching on the test result, two conditional branch instructions using these condition codes directly test for such conditions, thereby reducing the number of instructions required to perform this task. In various embodiments, these instructions will only branch when the condition code used is set. In various embodiments, these instructions may have the mnemonic "BQF" for branch when queue is nearly full and "BQNF" for branch when queue is not nearly full. Such condition codes make the queue full status an integral part of the main processor programming model and make it possible for software to make frequent, light-weight, intelligent decisions to maximize overall performance. These condition codes are maintained by the queue itself based on the queue's status. The instructions that check the condition codes are branch instructions specified to test those particular codes. In various embodiments of the invention, checking of the condition code is done by placing condition-code-checking branch instructions where necessary, such as before issuing any instructions to the extended pipeline. Thus, the condition codes provide an easy mechanism for preventing main pipeline stalls caused by trying to issue instructions to a full queue.
These two conditional branch instructions allow the main processor pipeline to regularly check the status of the queue before issuing more instructions into the extended SIMD pipeline queue. The main processor core can use these instructions to avoid stalling the processor when the queue is full or nearly full, and branch to another task that does not involve the SIMD engine until these queue conditions change. Therefore, these instructions provide the processor with an effective and relatively low overhead means of scheduling work load on the available resources while preventing main pipeline stalls.
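The queue-maintained condition codes and the branch-before-issue pattern can be sketched in software as follows. The threshold of 16 empty slots comes from the description above; the queue capacity of 64 and all names here are assumptions made for the sketch:

```python
from collections import deque

QUEUE_CAPACITY = 64       # assumed total size; not specified in the text
NEARLY_FULL_SLOTS = 16    # the predetermined threshold from the description

class SIMDQueue:
    """Instruction queue that maintains its own QF/QNF condition codes."""

    def __init__(self):
        self.slots = deque()

    @property
    def qf(self):
        """QF is set when fewer than NEARLY_FULL_SLOTS empty slots remain."""
        return (QUEUE_CAPACITY - len(self.slots)) < NEARLY_FULL_SLOTS

    @property
    def qnf(self):
        """QNF is the opposite of QF."""
        return not self.qf

def issue_or_defer(queue, insn, other_task):
    """Model of a BQF check before issuing: branch to other work rather
    than stall when the queue is nearly full."""
    if queue.qf:              # BQF: branch when queue is nearly full
        other_task()          # run a task that does not involve the queue
        return False
    queue.slots.append(insn)
    return True
```

The key point the sketch captures is that the issuing side tests a ready-made condition code instead of loading and comparing a status value, so the stall-avoidance check costs a single branch.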
The embodiments of the present inventions are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to systems and methods for dynamically decoupling a parallel pipeline in a microprocessor-based system having a main instruction pipeline and an extended instruction pipeline, the principles herein are equally applicable to other aspects of microprocessor design and function. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although some of the embodiments of the present invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
This application claims priority to U.S. Provisional Patent Application No. 60/721,108 titled “SIMD Architecture and Associated Systems and Methods,” filed Sep. 28, 2005, the disclosure of which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60/721,108 | Sep. 28, 2005 | US