The invention relates to a data processing system, comprising at least a first processing element and a second processing element for processing a stream of data objects, the first processing element being arranged to pass data objects from the stream of data objects to the second processing element, wherein the first and the second processing element are arranged for execution of an application, the application comprising a set of tasks, and wherein the first and the second processing element are arranged to be responsive to the receipt of a unique identifier.
The invention further relates to a method of controlling a data processing system, the data processing system comprising at least a first processing element and a second processing element for processing a stream of data objects, wherein the first processing element is arranged to pass data objects from the stream of data objects to the second processing element, and wherein the first and the second processing element are arranged for execution of an application, the application comprising a set of tasks, the method of controlling comprising the step of recognizing a unique identifier by one of the first and the second processing element.
A multiple processing element architecture for high-performance, data-dependent media processing, for example high-definition MPEG decoding, is known. Media processing applications can be specified as a set of concurrently executing tasks that exchange information solely by unidirectional streams of data. G. Kahn introduced a formal model of such applications in 1974, “The Semantics of a Simple Language for Parallel Programming”, Proc. of the IFIP Congress 1974, Aug. 5-10, Stockholm, Sweden, North-Holland Publ. Co., 1974, pp. 471-475, followed by an operational description by Kahn and MacQueen in 1977, “Coroutines and Networks of Parallel Processes”, Information Processing 77, B. Gilchrist (Ed.), North-Holland Publ., 1977, pp. 993-998. This formal model is commonly referred to as a Kahn Process Network.
An application is defined as a set of concurrently executable tasks. Information can only be exchanged between tasks by unidirectional streams of data. Tasks may communicate only deterministically, by means of read and write actions on predefined data streams. The data streams are buffered on the basis of FIFO behavior. Due to this buffering, two tasks communicating through a stream do not have to synchronize on individual read or write actions.
In stream processing, successive operations on a stream of data are performed by different processing elements. For example, a first stream might consist of the pixel values of an image, which are processed by a first processing element to produce a second stream of Discrete Cosine Transformation (DCT) coefficients for 8×8 blocks of pixels. A second processing element might process the blocks of DCT coefficients to produce a stream of blocks of selected and compressed coefficients, one for each block of DCT coefficients.
The data streams in the network are buffered. Each buffer is realized as a FIFO, with precisely one writer and one or more readers. Due to this buffering, the writer and readers do not need to mutually synchronize individual read and write actions on the channel. Reading from a channel with insufficient data available causes the reading task to stall. The processing elements can be dedicated hardware function units which are only weakly programmable. All processing elements run in parallel and execute their own thread of control. Together they execute a Kahn-style application, where each task is mapped onto a single processing element. The processing elements allow multi-tasking, i.e. multiple tasks can be mapped onto a single processing element.
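By way of illustration only, the following minimal C sketch models such a buffered channel with exactly one writer, where an attempt to read with insufficient data available stalls the reading task; all type and function names are illustrative assumptions, not part of the described system.

```c
/* Minimal sketch (hypothetical names) of a Kahn-style channel: a FIFO
 * buffer with one writer and one reader, where insufficient data stalls
 * the reading task and a full buffer stalls the writing task. */
#include <stdio.h>
#include <string.h>

#define FIFO_CAPACITY 16

typedef struct {
    int data[FIFO_CAPACITY];
    int head;    /* next element to read     */
    int tail;    /* next free slot to write  */
    int count;   /* number of buffered data objects */
} fifo_t;

typedef enum { OK, STALL } status_t;

/* Writer side: append one data object; stall if the buffer is full. */
static status_t fifo_write(fifo_t *f, int value)
{
    if (f->count == FIFO_CAPACITY)
        return STALL;                  /* writing task must be suspended */
    f->data[f->tail] = value;
    f->tail = (f->tail + 1) % FIFO_CAPACITY;
    f->count++;
    return OK;
}

/* Reader side: remove one data object; stall if no data is available. */
static status_t fifo_read(fifo_t *f, int *value)
{
    if (f->count == 0)
        return STALL;                  /* reading task must be suspended */
    *value = f->data[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return OK;
}

int main(void)
{
    fifo_t channel;
    memset(&channel, 0, sizeof channel);

    int v;
    /* Reading from an empty channel stalls the reader. */
    printf("read on empty: %s\n",
           fifo_read(&channel, &v) == STALL ? "stall" : "ok");

    /* Writer and reader never synchronize on individual actions. */
    fifo_write(&channel, 42);
    if (fifo_read(&channel, &v) == OK)
        printf("read: %d\n", v);
    return 0;
}
```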
As the state and progress of the overall application is distributed in time and space, application management faces problems with application reconfiguration, analyzing application progress, as well as debugging. Especially with multitasking processing elements that dynamically schedule their tasks, the global application is difficult to control. Unsolicited events may occur which call for an application mode change. Analyzing overall application progress is of continuous concern in systems with data-dependent processing and real-time requirements. In addition, debugging applications on multiprocessor systems with multitasking processing elements requires the ability to set breakpoints per task. Intruding on running tasks for mode changes requires measures comparable to those needed for setting task breakpoints.
U.S. Pat. No. 6,457,116 describes an apparatus for providing local control of processing elements in a network of processing elements. The processing elements are joined in a complete array by means of several interconnect structures. Each interconnect structure forms an independent network, but the networks do join at input switches of the processing elements. The network structure is an H-tree network structure with a single source and multiple receivers in which individual processing elements may be written to. This configuration network is the mechanism by which the configuration memories of the processing elements are programmed and by which the configuration data are communicated. The configuration network is arranged so that the receivers receive the broadcast within the same clock cycle. A processing element is configured to store a number of configuration memory contexts, and the selected configuration memory context controls the processing element. Each processing element in the networked array of processing elements has an assigned physical identification. Data is transmitted to at least one of the processing elements of the array, the data comprising control data, configuration data, an address mask, and a destination identification. The transmitted address mask is applied to the physical identification and to the destination identification. The masked physical identification and the masked destination identification are compared, and if they match, at least one of the number of processing elements is manipulated in response to the transmitted data. Manipulation comprises selecting one of the number of configuration memory contexts to control the functioning of the processing element. U.S. Pat. No. 6,108,760 describes a comparable apparatus for position-independent reconfiguration in a network of processing elements. Manipulation comprises programming a processing element with at least one configuration memory context.
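By way of illustration, the masked-identification comparison described for this prior-art configuration network can be sketched as follows; the identifier widths and names are assumptions chosen for the example.

```c
/* Sketch of the masked-identification match of the prior-art
 * configuration network: names and widths are illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Returns true when this processing element must act on the broadcast. */
static bool matches_broadcast(uint16_t physical_id,
                              uint16_t destination_id,
                              uint16_t address_mask)
{
    /* The mask is applied to both identifications before comparison,
     * so a single broadcast can address a whole group of elements. */
    return (physical_id & address_mask) == (destination_id & address_mask);
}

int main(void)
{
    /* With mask 0xFFF0, elements 0x0120..0x012F all match destination 0x0123. */
    printf("%d\n", matches_broadcast(0x0125, 0x0123, 0xFFF0)); /* 1 */
    printf("%d\n", matches_broadcast(0x0145, 0x0123, 0xFFF0)); /* 0 */
    return 0;
}
```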
It is a disadvantage of the prior-art data processing system that the reconfiguration is performed at a specific moment in time. For example, in the case of a pipelined network of processing elements, reconfiguring at a specific moment in time means that the data integrity within the pipelined network can no longer be guaranteed.
An object of the invention is to provide a generic solution for global application control in a Kahn-style data processing system.
This object is achieved with a data processing system of the kind set forth, characterized in that the stream of data objects further comprises the unique identifier, and that the first processing element is further arranged to pass the unique identifier to the second processing element.
Passing the unique identifier in the data processing system from one processing element to the other as an element in the ordered stream of data, allows global application control at a unique location in the data space, as opposed to at a single point in time. For example, application reconfiguration or individual task reconfiguration can be performed, while maintaining the pipelined processing as well as maintaining integrity of the data in the stream of data objects. As a result the overall performance of the data processing system is increased, since termination and restart of the execution of the application can be avoided.
An embodiment of the data processing system according to the invention is characterized in that at least one of the processing elements is arranged to insert the unique identifier into the stream of data objects. In case the application is ready for reconfiguration, or a breakpoint should be introduced, one of the existing processing elements is capable of inserting the unique identifier into the data stream, without requiring any additional measures.
An embodiment of the data processing system according to the invention is characterized in that at least one task of the set of tasks is arranged to have a programmable identifier, wherein a corresponding processing element of the first and the second processing elements is arranged to compare the programmable identifier with the unique identifier. The purpose of the programmable identifier is to allow a response to a specific unique identifier that is passed through via the data stream. Responding to a unique identifier is programmed per task, so that each task can respond in an individual way. In this way, the programmable identifier allows selecting a task that should be reconfigured, in case of a multitasking processing element. A match between the programmable identifier and the unique identifier for a running task means that the task is ready for reconfiguration. The comparison results in a match when these two identifiers have the same value, or, for instance, when the programmable identifier has a reserved value that always enforces a match.
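By way of illustration only, a minimal sketch of such a per-task comparison is given below; the reserved values MATCH_ANY and MATCH_NONE are hypothetical encodings of a value that always enforces a match and of a value that never matches.

```c
/* Sketch of the per-task identifier comparison; the reserved values
 * MATCH_ANY and MATCH_NONE are hypothetical encodings only. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MATCH_ANY  0xFFFFFFFFu  /* always enforces a match          */
#define MATCH_NONE 0xFFFFFFFEu  /* never matches a unique identifier */

typedef struct {
    uint32_t programmable_id;   /* programmed per task */
} task_t;

static bool id_matches(const task_t *task, uint32_t unique_id)
{
    if (task->programmable_id == MATCH_ANY)  return true;
    if (task->programmable_id == MATCH_NONE) return false;
    return task->programmable_id == unique_id;
}

int main(void)
{
    task_t t1 = { .programmable_id = 7 };
    task_t t2 = { .programmable_id = MATCH_ANY };
    printf("%d %d %d\n",
           id_matches(&t1, 7),    /* 1: same value      */
           id_matches(&t1, 8),    /* 0: different value */
           id_matches(&t2, 8));   /* 1: reserved value  */
    return 0;
}
```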
An embodiment of the data processing system according to the invention is characterized in that at least one processing element of the first and second processing elements is arranged to pause a corresponding task of the set of tasks, upon a match between the programmable identifier and the unique identifier. An advantage of this embodiment is that the execution of one or more tasks is suspended at a well-defined point in the data space.
At a later moment in time, reconfiguration of the application can take place, without the tasks involved in the reconfiguration having progressed further along their respective execution paths in the meantime.
An embodiment of the data processing system according to the invention is characterized in that at least one processing element of the first and second processing elements is arranged to generate an interrupt signal upon a match between the programmable identifier and the unique identifier. By generating an interrupt signal, the corresponding processing element can signal that a task is ready for reconfiguration, or the interrupt signal can be used to determine the progress of the task execution.
An embodiment of the data processing system according to the invention is characterized in that the data processing system further comprises a control processing element, wherein the control processing element is arranged to reconfigure the application, in response to the interrupt signal. The information needed for task reconfiguration is not related to the unique identifier, allowing the mechanism of forwarding and matching of the unique identifier to be independent of the task functionality. As a result, this mechanism can be implemented in a reusable hardware or software component.
An embodiment of the data processing system according to the invention is characterized in that the stream of data objects comprises a plurality of packets, the plurality of packets arranged to store data objects, and a dedicated packet, the dedicated packet arranged to store the unique identifier. The processing elements identify the dedicated packets, for example based on their packet header, and forward these packets unmodified without disrupting the stream of data objects.
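By way of illustration, one possible packet layout distinguishing ordinary data packets from the dedicated packet is sketched below; the header encoding and the field names are assumptions for the example, not the claimed format.

```c
/* Sketch of a possible packet layout: ordinary data packets versus the
 * dedicated packet carrying the unique identifier. Names are assumptions. */
#include <stdbool.h>
#include <stdint.h>

enum packet_type {
    PKT_DATA        = 0,   /* carries data objects of the stream      */
    PKT_LOCATION_ID = 1    /* dedicated packet carrying the unique ID */
};

typedef struct {
    uint8_t  type;         /* packet header: identifies dedicated packets */
    uint16_t length;       /* variable-length payload in bytes            */
} packet_header_t;

typedef struct {
    packet_header_t header;
    union {
        uint8_t  data[256];     /* data objects      */
        uint32_t location_id;   /* unique identifier */
    } payload;
} packet_t;

/* A processing element inspects only the header to recognize the
 * dedicated packet, and forwards it unmodified on its output stream. */
static bool is_location_id_packet(const packet_t *p)
{
    return p->header.type == PKT_LOCATION_ID;
}

int main(void)
{
    packet_t p = { .header  = { .type = PKT_LOCATION_ID, .length = 4 },
                   .payload = { .location_id = 99 } };
    return is_location_id_packet(&p) ? 0 : 1;
}
```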
According to the invention a method of controlling a data processing system is characterized in that the method of controlling further comprises the following steps: inserting the unique identifier into the stream of data objects, and passing the unique identifier from the first processing element to the second processing element. This method allows run-time reconfiguration, while maintaining data integrity of the application running on the data processing system. Besides reconfiguration, the unique identifier can also be used to define debug breakpoints and to determine application latency.
Further embodiments of the data processing system and the method of controlling a data processing system are described in the dependent claims.
The shells SP, SA, SB and SC comprise a reading/writing unit for data transport, a synchronization unit and a task-switching unit. The shells SP, SA, SB and SC communicate with the associated (co)processor on a master/slave basis, wherein the (co)processor acts as the master. Accordingly, the shells SP, SA, SB and SC are initialized by a request from the corresponding (co)processor. Preferably, the communication between the corresponding (co)processor and the shells SP, SA, SB and SC is implemented by a request-acknowledge handshake mechanism in order to hand over argument values and wait for requested values to return. Therefore the communication is blocking, i.e. the respective thread of control waits for its completion. The functionality of the shells SP, SA, SB and SC can be implemented in software and/or in hardware.
The reading/writing unit preferably implements two different operations, namely a read operation enabling the processor CPU and the coprocessors ProcA, ProcB and ProcC to read data objects from the memory MEM, and a write operation enabling the processor CPU and the coprocessors ProcA, ProcB and ProcC to write data objects into the memory MEM.
The synchronization unit implements two operations for synchronization to handle local blocking conditions occurring at an attempt to read from an empty FIFO or to write to a full FIFO, respectively.
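By way of illustration, the two synchronization operations could take the following form; the names sync_claim_data and sync_claim_space are assumptions, chosen only to show how a local blocking condition is detected before a read or write proceeds.

```c
/* Sketch of two synchronization operations a shell could offer (names are
 * assumptions): they only report whether the local blocking condition
 * holds, so the calling (co)processor can suspend or switch the task. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int capacity;   /* total FIFO slots           */
    int filled;     /* slots holding data objects */
} fifo_state_t;

/* Read side: are at least `n` data objects available? */
static bool sync_claim_data(const fifo_state_t *f, int n)
{
    return f->filled >= n;
}

/* Write side: are at least `n` empty slots available? */
static bool sync_claim_space(const fifo_state_t *f, int n)
{
    return (f->capacity - f->filled) >= n;
}

int main(void)
{
    fifo_state_t f = { .capacity = 8, .filled = 8 };
    if (!sync_claim_space(&f, 1))
        printf("write to full FIFO: local blocking condition, task stalls\n");
    if (sync_claim_data(&f, 1))
        printf("read may proceed\n");
    return 0;
}
```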
The system architecture according to
The processor CPU comprises a control processor, for controlling the data processing system. The stream of data objects comprises a plurality of data packets that hold the data. For efficient packetization of the data, variable length packets are used on the data streams.
During execution of an application, a task may have to be dynamically added to or removed from the application graph, as shown in
The processor CPU and the coprocessors ProcA, ProcB and ProcC parse the incoming data stream and are capable of recognizing location ID packets. Upon recognition of a location ID packet, the processor CPU and coprocessors ProcA, ProcB and ProcC forward the location ID packet to their output data streams. Upon receiving a location ID packet, the processor CPU and coprocessor ProcA, ProcB and ProcC also pass the payload of the packet, i.e. the location ID, to their corresponding shell SP, SA, SB and SC via the corresponding interface IP, IA, IB and IC, together with an identifier of the corresponding task, i.e. a task ID, that is currently being executed. Upon reception of a location ID and a task ID from the processor CPU or coprocessor ProcA, ProcB and ProcC, the corresponding shell SP, SA, SB and SC compares the received location ID with the programmed location ID for the task having said task ID. Upon a match, the shell SP, SA, SB and SC suspends further processing of said task by sending a signal to the corresponding processor CPU or coprocessor ProcA, ProcB and ProcC, and also sends an interrupt signal to the control processor.
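By way of illustration only, the following sketch shows how a shell could handle a reported location ID for the active task: compare it against the programmed location ID and, upon a match, suspend the task and raise an interrupt towards the control processor. All structures and names are illustrative assumptions.

```c
/* Sketch (hypothetical structures and names) of the shell-side handling
 * of a location ID reported by the (co)processor for the active task. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_TASKS 4

typedef struct {
    uint32_t programmed_location_id;
    bool     suspended;
} task_entry_t;

typedef struct {
    task_entry_t task_table[MAX_TASKS];
} shell_t;

/* Placeholder for the interrupt line towards the control processor. */
static void raise_interrupt(int task_id, uint32_t location_id)
{
    printf("interrupt: task %d matched location ID %u\n", task_id, location_id);
}

/* Called when the (co)processor parses a location ID packet while
 * executing task `task_id`; the packet itself is forwarded unchanged. */
static void shell_report_location(shell_t *shell, int task_id, uint32_t location_id)
{
    task_entry_t *t = &shell->task_table[task_id];
    if (t->programmed_location_id == location_id) {
        t->suspended = true;           /* signal the (co)processor to pause */
        raise_interrupt(task_id, location_id);
    }
}

int main(void)
{
    shell_t shell = { .task_table = { [1] = { .programmed_location_id = 99 } } };
    shell_report_location(&shell, 1, 99);   /* match: suspend and interrupt */
    shell_report_location(&shell, 1, 42);   /* no match: task continues     */
    return 0;
}
```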
Subsequently, the control processor can analyze or reconfigure the local task state under software control. After reconfiguration, the control processor instructs the shell SP, SA, SB and SC to arrange resumption of said task by the corresponding processor CPU or coprocessor ProcA, ProcB and ProcC.
The processor CPU and the coprocessors ProcA, ProcB and ProcC are capable of generating location ID packets and inserting these into the stream of data objects. Typically, these location ID packets are inserted into the data stream only at predefined locations, for example at the end of an MPEG frame. The processor CPU and the coprocessors ProcA, ProcB and ProcC are instructed to insert such a location ID packet into the data stream by the control processor, or indirectly by the corresponding shell SP, SA, SB and SC upon a request for a new task from the (co)processor.
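By way of illustration, inserting a location ID packet at such a predefined position can be sketched as follows; the frame granularity and the emit functions are assumptions for the example.

```c
/* Sketch of inserting a location ID packet at a predefined position in
 * the output stream, e.g. at the end of a frame; names are assumptions. */
#include <stdint.h>
#include <stdio.h>

static void emit_data_packet(uint32_t frame, int index)
{
    printf("data packet: frame %u, block %d\n", frame, index);
}

static void emit_location_id_packet(uint32_t location_id)
{
    printf("location ID packet: %u\n", location_id);
}

/* Produce one frame of data objects, then insert the location ID packet
 * so that any reconfiguration occurs at a frame boundary in the data space. */
static void produce_frame(uint32_t frame, int blocks, uint32_t location_id)
{
    for (int i = 0; i < blocks; i++)
        emit_data_packet(frame, i);
    emit_location_id_packet(location_id);   /* only at this predefined point */
}

int main(void)
{
    produce_frame(0, 3, 1001);
    return 0;
}
```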
Referring again to
The concept of location IDs allows run-time application reconfiguration while maintaining data integrity of the application graph in a multiprocessor system. As a result, no termination and restart of the execution of the application is required, increasing the overall performance of the data processing system. Application reconfiguration can entail, for example: changing parameters of one or more tasks, modifying buffer sizes or their location in memory, modifying the task interconnect structures, modifying the mapping of tasks to (co)processors, instantiating and connecting more tasks or buffers, or removing and disconnecting tasks or buffers. For a multi-tasking processor, the processor can continue processing other tasks while reconfiguration takes place. Two special programmable location ID values are reserved to match any received location ID or none at all.
In different embodiments, the concept of location IDs can be used for analyzing overall application progress or for debugging applications on multiprocessor systems with multi-tasking processing elements. The location IDs allow debug breakpoints to be set per task at unique positions in the data processing, or can be used to determine application latency. When analyzing the overall application progress, it is not required to pause the active task; generating only an interrupt signal, for example, will suffice. Using that interrupt signal the progress can be determined.
The information needed for task reconfiguration is not part of the location ID packet, which allows the mechanism of forwarding the location ID packet, matching the location ID with the programmed location ID, and signaling an interrupt to the control processor, to be independent of the task functionality. As a result, the implementation of said mechanism can be done by means of a reusable hardware or software component.
In different embodiments, the processor CPU or coprocessors ProcA, ProcB and ProcC are arranged to store the value of the encountered location ID, as well as the result of the match between the location ID and the programmed location ID in their corresponding shell SP, SA, SB and SC. In another embodiment, the processor CPU or coprocessors ProcA, ProcB and ProcC are arranged to store only the result of the match between the location ID and the programmed location ID in their corresponding shell SP, SA, SB and SC. Instead of receiving an interrupt signal, the control processor itself investigates the result of the match at a later moment in time, for example by means of a polling mechanism. In yet another embodiment, the processor CPU or coprocessors ProcA, ProcB and ProcC are arranged to store only the value of the encountered location ID. The stored value of the location ID can indicate if this location ID has already passed the corresponding (co)processor, via the data stream.
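By way of illustration, the polling variant could be sketched as follows; the stored fields and function names are assumptions.

```c
/* Sketch of the polling alternative: the shell only stores the match result
 * (and optionally the encountered location ID), and the control processor
 * inspects it at a later moment in time. Names are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     match_seen;         /* result of the location ID comparison */
    uint32_t last_location_id;   /* value of the encountered location ID */
} shell_status_t;

/* Shell side: record what was encountered; no interrupt is generated. */
static void record_location(shell_status_t *s, uint32_t location_id, bool matched)
{
    s->last_location_id = location_id;
    s->match_seen = s->match_seen || matched;
}

/* Control processor side: polled later instead of reacting to an interrupt. */
static bool poll_match(const shell_status_t *s)
{
    return s->match_seen;
}

int main(void)
{
    shell_status_t status = { false, 0 };
    record_location(&status, 7, false);
    record_location(&status, 99, true);
    printf("match pending: %d (last location ID %u)\n",
           poll_match(&status), status.last_location_id);
    return 0;
}
```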
The task-switching unit of shells SP, SA, SB and SC is responsible for selecting tasks to execute on the corresponding processor CPU or coprocessor ProcA, ProcB and ProcC. The interfaces IP, IA, IB and IC implement the mechanism for receiving and matching the location ID detected by the processor CPU and coprocessor ProcA, ProcB and ProcC, respectively. Implementation of said mechanism can be done by means of a so-called Report interface. The processor CPU and coprocessor ProcA, ProcB and ProcC can report messages to the task-switching unit of shells SP, SA, SB and SC via the corresponding interface IP, IA, IB and IC by calling Report (task_id, report_type, report_id). The task_id corresponds to the task ID of the active task from which the Report request is issued, report_type indicates either an error condition or a valid location ID, and report_id contains the location ID. The task tables 401, 403, 405 and 407 in the corresponding shells SP, SA, SB and SC further comprise five programmable fields for each task, not shown in
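By way of illustration only, a minimal sketch of the Report call and one possible dispatch in the task-switching unit is given below; the enum values and the task-table lookup are assumptions, not the claimed implementation.

```c
/* Sketch of the Report interface named in the text; the enum encoding and
 * the dispatch into the task table are assumptions about one possible shape. */
#include <stdint.h>
#include <stdio.h>

typedef enum {
    REPORT_ERROR       = 0,   /* error condition reported by the (co)processor */
    REPORT_LOCATION_ID = 1    /* report_id carries a valid location ID         */
} report_type_t;

#define MAX_TASKS 4

/* Per-task programmed location IDs held in the shell's task table. */
static uint32_t programmed_location_id[MAX_TASKS] = { 0, 99, 0, 0 };

/* Report(task_id, report_type, report_id): called by the (co)processor via
 * its interface towards the task-switching unit of the shell. */
static void Report(int task_id, report_type_t report_type, uint32_t report_id)
{
    if (report_type == REPORT_ERROR) {
        printf("task %d reported an error\n", task_id);
        return;
    }
    if (report_id == programmed_location_id[task_id])
        printf("task %d: location ID %u matched, suspend and interrupt\n",
               task_id, report_id);
}

int main(void)
{
    Report(1, REPORT_LOCATION_ID, 99);   /* match for task 1 */
    Report(2, REPORT_ERROR, 0);          /* error report     */
    return 0;
}
```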
In some embodiments, the processor CPU and the coprocessors ProcA, ProcB and ProcC do not forward the location ID packet to all of their output data streams. The intention of location ID packets in a flow of data packets through the entire application graph is that each task checks the location ID value once against its programmed location ID values, if any, in its corresponding shell SP, SA, SB and SC, in order to allow task reconfiguration. In some cases the application graph contains cycles, and then it should be avoided that these location ID packets keep circulating within the application graph. Such continuous circulation of location ID packets in an application graph can be avoided by not forwarding the location ID packets to all of the output data streams of the set of tasks, as will be clear to the person skilled in the art.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind
--- | --- | --- | ---
03100484.9 | Feb. 2003 | EP | regional

Filing Document | Filing Date | Country | Kind | 371(c) Date
--- | --- | --- | --- | ---
PCT/IB04/50124 | Feb. 18, 2004 | WO |  | Mar. 2, 2006