The present disclosure relates generally to the field of multi-processor systems and, more specifically, to techniques for communicating instructions to a slave processor in a multi-processor system having a master processor and pipelined slave processors.
In complex computer systems, a common workload is often distributed among, and performed in parallel by, a plurality of processors. A multi-processor system typically includes a master processor administering a plurality of pipelined (i.e., connected in series) processors or co-processors, which are collectively referred to herein as slave processors. Such multi-processor systems may be used, for example, for processing large amounts of video data or rendering graphics, among other computationally intensive applications.
In the multi-processor system, however, the master processor and each of the slave processors may operate using instructions (i.e., commands) and data formatted in their native and, as such, different programming languages. Conventionally, instructions forwarded by the master processor downstream to a respective slave processor are decoded by each intermediate slave processor in its native programming language, re-coded in the programming language of the next downstream intermediate slave processor, and then forwarded to that processor.
Such cycles of decoding the received instructions and, after re-coding them in the native programming language of the downstream intermediate slave processor, forwarding them downstream continue until the instructions reach an intended, or destined, slave processor. At the destined slave processor, the received instructions are decoded in the native programming language of that processor and executed.
The complexity of such a multi-step routine for communicating instructions to slave processors adversely affects overall performance of the multi-processor system and, in particular, limits design flexibility and command throughput of the system. Despite the considerable efforts in the art devoted to increasing efficiency of communicating instructions from the master processor to the pipelined slave processors, further improvements would be desirable.
There is therefore a need in the art for techniques to efficiently implement communication of instructions to pipelined slave processors in multi-processor systems.
Techniques for communicating instructions to slave processors in a multi-processor system having a master processor and pipelined slave processors are described herein. In an embodiment, the master processor generates a pass-through command having a header block and a payload block that includes instructions to a destined slave processor. The header block is coded using a computer language understood by the pipelined slave processors, and the payload block is coded in a computer language understood by the destined slave processor. The master processor forwards the pass-through command to an outermost one of the pipelined slave processors and then the pass-through command is re-transmitted, without recoding, by intermediate (i.e., non-destined) slave processors until the pass-through command reaches the destined slave processor, which executes the instructions.
In one design, the system uses the inventive method to perform at least one of processing video data or rendering graphics.
Various aspects and embodiments of the invention are described in further detail below.
The Summary is neither intended nor should it be construed as being representative of the full extent and scope of the present invention. These and additional aspects will become more readily apparent from the detailed description, particularly when taken together with the appended drawings.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures, except that suffixes may be added, when appropriate, to differentiate such elements. The images in the drawings are simplified for illustrative purposes and are not depicted to scale. It is contemplated that features or steps of one embodiment may be beneficially incorporated in other embodiments without further recitation.
The appended drawings illustrate exemplary embodiments of the invention and, as such, should not be considered as limiting the scope of the invention that may admit to other equally effective embodiments.
The term “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.
Referring to the figures,
In one exemplary embodiment, the system 100 is a portion of a graphics processing unit (GPU) of a wireless communication apparatus, such as a cellular phone, a video game console, a personal digital assistant (PDA), a laptop computer, an audio/video-enabled device, and the like.
The GPU may be compliant, for example, with the document “OpenVG Specification, Version 1.0,” Jul. 28, 2005, which is publicly available. This document is a standard for 2-D vector graphics suitable for handheld and mobile devices, such as cellular phones and the other wireless communication apparatuses referred to above.
In the depicted embodiment, the system 100 illustratively includes a master processor 110 and a plurality 101 of pipelined slave processors 1201-120K, which are connected using respective system interfaces 1261-126K, where K is an integer and K>2. In one embodiment, each of the system interfaces 1261-126K includes a data bus, an address bus, and a command bus (none of which is shown). The master processor 110 and each of the slave processors 1201-120K may contain sub-processors, memories, peripheral devices, support circuits, and like elements, which, for brevity, are collectively shown herein as modules 111 and 1211-121K, respectively.
The master processor 110 and pipelined slave processors 1201-120K may be formed on a single integrated circuit (IC) such as, for example, a system-on-chip (SoC) device. Alternatively, the master processor 110 and at least one of the slave processors 1201-120K may be formed on separate ICs.
In operation, the master processor 110 controls and, optionally, monitors data processing at the slave processors 1201-120K. The master processor 110 and each of the slave processors 1201-120K may operate using different formats (i.e., computer languages) for generating or executing internal instructions or internal data exchanges.
The master processor 110 comprises an input/output (I/O) module 118 including an input buffer (IB) 112 and an output buffer (OB) 114. Correspondingly, each of the slave processors 1201-120K comprises a respective input/output (I/O) module 128 including an input buffer 122 and an output buffer 124. In operation, the I/O modules 118 and 1281-128K facilitate information exchanges within the system 100 or to/from the system 100.
Using interface 102, the input buffer 112 of the master processor 110 may be connected to at least one of a remote processor, a network, or a user controls means, which are collectively shown as a means 104. Similarly, using interface 107, the output buffer 124K of the slave processor 120K may be connected to another remote processor, network, or user controls means, which are collectively shown as a means 106.
In the system 100, via a respective bi-directional system interface 126, an output buffer 124 of a preceding (i.e., upstream) slave processor 120 is connected to an input buffer 122 of the adjacent downstream slave processor, thus forming the plurality 101 of pipelined slave processors 1201-120K. For example, an input buffer 1222 of a slave processor 1202 is connected, via a system interface 1262, to an output buffer 1241 of a slave processor 1201, and an output buffer 1242 of the slave processor 1202 is connected, via a system interface 1263, to an input buffer 1223 of a slave processor 1203 (not shown).
In one embodiment, the output buffer 114 of the master processor 110 is connected, via a system interface 1261, to an input buffer of an outermost slave processor of the plurality 101 of the pipelined slave processors 120, i.e., to an input buffer 1221 of the slave processor 1201. In operation, the master processor 110 administers control over the slave processors 1201-120K by generating and transmitting instructions to a respective slave processor. A slave processor, which is an intended recipient of these instructions, is hereafter referred to as a destined slave processor. The instructions are transmitted, via the system interface 1261, from the output buffer 114 of the master processor 110 to an input buffer 1221 of the outermost slave processor 1201.
To reach the destined slave processor, the instructions should be received and then re-transmitted, or forwarded, downstream by one or more intermediate upstream slave processors, i.e., the slave processors disposed between the master processor and the destined slave processor. Herein, the terms “to forward” and “to re-transmit” are used interchangeably.
More specifically, via the respective system interface, the instructions from an output buffer of an upstream slave processor are forwarded to an input buffer of the respective downstream slave processor (i.e., forwarded in a direction illustrated using arrow 103), which then similarly forwards the instructions further downstream until they reach the destined slave processor.
For example, when the destined slave processor is the slave processor 1203, the slave processor 1201, via the system interface 1262, forwards the instructions downstream to the slave processor 1202, which then re-transmits the instructions to the destined slave processor 1203, where the instructions are executed.
Referring to
In one embodiment, the pass-through command 200 includes a header block 210 and a payload block 220. The header block 210 is coded using a computer language that is understood by all pipelined slave processors 1201-120K of the plurality 101. Herein, the term “computer language” is collectively used in reference to programming languages and formats for instructions and data used by the master and slave processors.
In one exemplary embodiment, the header block 210 includes data modules 202, 204, and 206. In alternate embodiments (not shown), contents of the data modules 202, 204, and 206 may form a single data module or contents of any two of these modules may be included in one data module.
A data module 202 contains information identifying the pass-through command 200 (i.e., an ID of the pass-through command) among other commands of the master processor 110. A data module 204 contains information identifying the destined slave processor (e.g., an address of the destined slave processor), and a data module 206 contains information regarding a bit length (expressed, for example, in units of bytes) of the payload block 220. In an alternate embodiment (not shown), in the header block 210, the data module 206 may precede the data module 204.
The payload block 220 is coded using a computer language that is understood by the destined slave processor and includes at least one data module 222 comprising an instruction generated by the master processor 110 for execution by the respective destined slave processor (data modules 2221-222N are shown, where N is an integer and N>1).
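By way of illustration only, the layout of the header block 210 and payload block 220 described above may be sketched as follows. The field widths, the byte order, the ID value 0x7E, and the names HEADER, PASS_THROUGH_ID, encode_command, decode_header, and payload_of are assumptions made for this sketch and are not taken from the disclosure.

```python
import struct

# Hypothetical wire layout (an assumption, not specified in the disclosure):
#   data module 202: 1-byte command ID
#   data module 204: 1-byte address of the destined slave processor
#   data module 206: 2-byte big-endian payload length, in bytes
HEADER = struct.Struct(">BBH")
PASS_THROUGH_ID = 0x7E  # hypothetical ID value for the pass-through command


def encode_command(dest: int, payload: bytes) -> bytes:
    """Prefix an opaque payload block with a header block."""
    return HEADER.pack(PASS_THROUGH_ID, dest, len(payload)) + payload


def decode_header(command: bytes) -> tuple:
    """Read only the header block; the payload bytes are not interpreted."""
    return HEADER.unpack_from(command, 0)


def payload_of(command: bytes) -> bytes:
    """Slice out the payload block using the length from data module 206."""
    _, _, length = decode_header(command)
    return command[HEADER.size:HEADER.size + length]
```

In this sketch, a slave processor needs only decode_header to learn whether a command is addressed to it; the payload remains an uninterpreted byte string for every processor other than the destined one.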
In further embodiments, the pass-through command 200 may instruct the destined slave processor to confirm the receipt or execution of the command by sending a pre-determined message upstream to the master processor 110 (i.e., in a direction illustrated using arrow 105). For example, to efficiently communicate such a message to the master processor 110, the pass-through command 200 may instruct the destined slave processor (i) to replace, in the data module 204 of the received pass-through command, information identifying the destined slave processor with information identifying the master processor 110, (ii) to include the pre-determined message in the payload block 220, and (iii) to forward the modified (i.e., reply) pass-through command to an adjacent upstream slave processor.
At step 310, the master processor 110 generates the pass-through command 200. Illustratively, step 310 comprises sub-steps 312, 314, 316, and 318. In the depicted embodiment, during sub-steps 312, 314, and 316, the master processor 110 generates the header block 210 of the pass-through command 200 and, during sub-step 318, the master processor generates the payload block 220 of the pass-through command.
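The reply modification described above may be sketched, purely as an illustration, as shown below. The class PassThroughCommand, the function make_reply, and the constant MASTER_ADDRESS (including its value) are hypothetical names introduced for this sketch. Item (iii), forwarding the reply upstream, is left to the caller.

```python
from dataclasses import dataclass, replace

MASTER_ADDRESS = 0  # hypothetical address identifying master processor 110


@dataclass(frozen=True)
class PassThroughCommand:
    command_id: int   # data module 202
    destination: int  # data module 204
    payload: bytes    # payload block 220 (its length, module 206, is implied)


def make_reply(received: PassThroughCommand, message: bytes) -> PassThroughCommand:
    """Build a reply command from a received pass-through command."""
    # (i) swap the destination for the master's address and
    # (ii) place the pre-determined message in the payload block.
    return replace(received, destination=MASTER_ADDRESS, payload=message)
```

Because the reply reuses the same header layout, upstream slave processors could, under the assumptions of this sketch, relay it toward the master processor in the same way they relay downstream commands.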
At sub-step 312, the master processor 110 generates the data module 202 of the header block 210. The data module 202 contains information identifying the pass-through command 200, and this information is coded using a computer language understood by each one of the slave processors 1201-120K. At sub-step 314, the master processor 110 generates the data module 204 of the header block 210. The data module 204 contains information identifying the destined slave processor (e.g., slave processor 120K) and instructions for intermediate non-destined slave processors disposed between the master processor and the destined slave processor (i.e., slave processors 1201-120K-1).
In particular, the data module 204 contains a request for the non-destined slave processors to forward, without decoding, the pass-through command 200 downstream to the destined slave processor. In one embodiment, the non-destined slave processors are instructed to copy the pass-through command from an input buffer of a respective non-destined slave processor to an output buffer of that processor. Content of the data module 204 is coded using a computer language understood by each one of the slave processors 1201-120K.
At sub-step 316, the master processor 110 generates the data module 206 of the header block 210. The data module 206 contains information identifying a bit length of the payload block 220 of the pass-through command 200. Similar to the contents of the data modules 202 and 204, this information is coded using a computer language understood by each one of the slave processors 1201-120K. At sub-step 318, the master processor 110 generates the payload block 220 of the pass-through command 200. The payload block 220 contains at least one instruction 222 for the destined slave processor. Contents of the payload block 220 (i.e., instructions 2221-222N) are coded using a computer language understood by the destined slave processor (e.g., slave processor 120K).
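Sub-steps 312 through 318 may be sketched as follows, under stated assumptions. The table NATIVE_ENCODERS stands in for the differing native computer languages of the slave processors; its two encodings, the field widths, the ID value, and the name generate_command are all hypothetical. In this sketch the payload is encoded first, because data module 206 needs its length; the disclosure does not prescribe that ordering.

```python
import struct
from typing import Callable, Dict, List

PASS_THROUGH_ID = 0x7E  # hypothetical ID value for the pass-through command

# Hypothetical per-slave encoders, keyed by slave address. Each one stands in
# for the native computer language of one destined slave processor.
NATIVE_ENCODERS: Dict[int, Callable[[List[str]], bytes]] = {
    2: lambda instrs: ";".join(instrs).encode("ascii"),
    3: lambda instrs: b"".join(
        len(i).to_bytes(1, "big") + i.encode("ascii") for i in instrs
    ),
}


def generate_command(dest: int, instructions: List[str]) -> bytes:
    """Generate a pass-through command for the slave at address `dest`."""
    payload = NATIVE_ENCODERS[dest](instructions)     # sub-step 318
    module_202 = struct.pack(">B", PASS_THROUGH_ID)   # sub-step 312
    module_204 = struct.pack(">B", dest)              # sub-step 314
    module_206 = struct.pack(">H", len(payload))      # sub-step 316
    # Concatenate header block and payload block into one command.
    return module_202 + module_204 + module_206 + payload
```

Only the header fields use a common encoding; the payload bytes differ depending on which slave processor is destined, which is the property the pass-through command relies on.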
At step 320, the master processor 110 assembles the pass-through command 200 and transmits the command to the outermost slave processor (e.g., slave processor 1201).
At step 330, when the outermost slave processor is the destined slave processor, that processor executes instructions contained in the payload block 220 of the command. Conversely, when the outermost slave processor is not the destined slave processor, the outermost slave processor forwards (i.e., re-transmits) the pass-through command 200 downstream to the adjacent slave processor (i.e., slave processor 1202), which, unless that slave processor is the destined slave processor, forwards the command to the next downstream slave processor (i.e., slave processor 1203). Such cycles of re-transmitting the pass-through command 200 continue until the command reaches the destined slave processor. In one embodiment, during step 330, the received pass-through command 200 is copied, without recoding, from an input buffer of a recipient non-destined slave processor to an output buffer of that processor.
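The forwarding behavior described above may be modeled, as a minimal sketch only, with each slave processor holding an input buffer and an output buffer. The class SlaveProcessor, the function run_pipeline, the 4-byte header size, and the position of the destination field are assumptions of this sketch; execution of the instructions is modeled simply as recording the payload bytes.

```python
from collections import deque
from typing import Deque, List

HEADER_SIZE = 4  # hypothetical: 1-byte ID, 1-byte destination, 2-byte length


class SlaveProcessor:
    def __init__(self, address: int) -> None:
        self.address = address
        self.input_buffer: Deque[bytes] = deque()
        self.output_buffer: Deque[bytes] = deque()
        self.executed: List[bytes] = []

    def step(self) -> None:
        command = self.input_buffer.popleft()
        destination = command[1]  # only a header field is inspected
        if destination == self.address:
            # Destined slave: "execution" is modeled as recording the payload.
            self.executed.append(command[HEADER_SIZE:])
        else:
            # Non-destined slave: copy the bytes unchanged, without recoding.
            self.output_buffer.append(command)


def run_pipeline(slaves: List[SlaveProcessor], command: bytes) -> None:
    """Deliver `command` to the outermost slave and relay it downstream."""
    slaves[0].input_buffer.append(command)
    for index, slave in enumerate(slaves):
        slave.step()
        if not slave.output_buffer or index + 1 == len(slaves):
            return  # executed here, or no further downstream processor
        # The system interface moves the command to the next input buffer.
        slaves[index + 1].input_buffer.append(slave.output_buffer.popleft())
```

For example, with three slave processors at hypothetical addresses 1, 2, and 3, calling run_pipeline with a command whose destination field is 3 leaves the payload recorded only at the third processor, while the first two merely copy the command from input buffer to output buffer.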
At step 340, the pass-through command 200 reaches the destined slave processor, which executes the instructions contained in the payload block 220 of the command. In one embodiment, such instructions may include a request from the master processor 110 to confirm the receipt or execution of the pass-through command. As discussed above in reference to
In exemplary embodiments, the method 300 may be implemented in hardware, software, firmware, or any combination thereof in a form of a computer program product comprising one or more computer-executable instructions. When implemented in software, the computer program product may be stored on or transmitted using a computer-readable medium, which includes computer storage medium and computer communication medium.
The term “computer storage medium” refers herein to any medium adapted for storing the instructions that cause the computer to execute the method. By way of example, and not limitation, the computer storage medium may comprise solid-state memory devices, including electronic memory devices (e.g., RAM, ROM, EEPROM, and the like), optical memory devices (e.g., compact discs (CD), digital versatile discs (DVD), and the like), or magnetic memory devices (e.g., hard drives, flash drives, tape drives, and the like), or other memory devices adapted to store the computer program product, or a combination of such memory devices.
The term “computer communication medium” refers herein to any physical interface adapted to transmit the computer program product from one place to another using, for example, a modulated carrier wave, an optical signal, a DC or AC current, and like means. By way of example, and not limitation, the computer communication medium may comprise twisted wire pairs, printed or flat cables, coaxial cables, fiber-optic cables, digital subscriber lines (DSL), or other wired, wireless, or optical serial or parallel interfaces, or a combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
This application claims the benefit of U.S. Provisional Application No. 60/896,497, filed Mar. 23, 2007, the entire content of which is hereby incorporated by reference.
Number | Date | Country
---|---|---
60896497 | Mar 2007 | US