Certain embodiments of the invention relate to accessing data. More specifically, certain embodiments of the invention relate to a synchronized control scheme in a parallel multi-client two-way handshake system.
Advances in compression techniques for audio-visual information have resulted in cost-effective and widespread recording, storage, and/or transfer of movies, video, and/or music content over a wide range of media. The Moving Picture Experts Group (MPEG) family of standards is among the most commonly used digital compressed formats. A major advantage of MPEG compared to other video and audio coding formats is that MPEG-generated files tend to be much smaller for the same quality. This is because MPEG uses sophisticated compression techniques. However, MPEG compression may be lossy and, in some instances, it may distort the video content. In this regard, the more the video is compressed, that is, the higher the compression ratio, the less the reconstructed video retains of the original information. Some examples of MPEG video distortion are loss of textures, details, and/or edges. MPEG compression may also result in ringing on sharper edges and/or discontinuities on block edges. Because MPEG compression techniques are based on defining blocks of video image samples for processing, MPEG compression may also result in visible “macroblocking” due to bit errors. In MPEG, a macroblock is an area covered by a 16×16 array of luma samples in a video image. Luma may refer to a component of the video image that represents brightness. Moreover, noise due to quantization operations, as well as aliasing and/or temporal effects, may all result from the use of MPEG compression operations.
When MPEG video compression results in loss of detail in the video image it is said to “blur” the video image. In this regard, operations that are utilized to reduce compression-based blur are generally called image enhancement operations. When MPEG video compression results in added distortion on the video image it is said to produce “artifacts” on the video image. For example, the term “mosquito noise” may refer to MPEG artifacts that may be caused by the quantization of high spatial frequency components in the image. In another example, the term “block noise” may refer to MPEG artifacts that may be caused by the quantization of low spatial frequency information in the image. Block noise may appear as edges on 8×8 blocks and may give the appearance of a mosaic or tiling pattern on the video image.
Some systems attempt to remove video noise. However, such systems may comprise a separate data buffer for each of the clients that processes the video data. The redundancy of these video buffers may be expensive in terms of chip layout area and power consumption. The various clients may produce processed video data that may be used by other clients and/or combined to create a single output. In order to blend the video data, all of the various video data streams must be synchronized. Decentralized synchronization may be complex and may require significant coordination. As video processing systems grow larger, the problems related to chip layout area, power consumption, and synchronization of the various video streams may be exacerbated.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
A system and/or method is provided for a synchronized control scheme in a parallel multi-client two-way handshake system, substantially as shown in and/or described in connection with at least one of the figures, and as set forth more completely in the claims.
Various advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Certain embodiments of the invention may be found in a method and system for a synchronized control scheme in a parallel multi-client two-way handshake system. Various aspects of the invention may be utilized for processing video data and may comprise processing pixels by a plurality of data processing units using at least one shared buffer. The pixels may be communicated to the plurality of data processing units using a centralized and synchronized flow control mechanism. Pixel accept signals may be utilized to communicate the pixels from the shared buffer to the data processing unit without using a ready signal. Each pixel accept signal may correspond to a pixel. The pixel accept signal may be generated based on an accept signal from a subsequent pipeline stage in the shared buffer to a present pipeline stage in the shared buffer. A generated control signal from the shared buffer to the data processing unit may be used for centralized and synchronized data flow control. A delay may be generated that delays generation of the control signal to handle boundary conditions during processing.
The processed output pixels generated from the data processing units may be blended. The flow of the pixels may be pipelined by a plurality of pipeline stages within the shared buffer. An accept signal may be communicated from a subsequent pipeline stage to a present pipeline stage and a ready signal may be communicated from a present pipeline stage to a subsequent pipeline stage for the pipelining.
The VB RCV 102 may comprise suitable logic, circuitry, and/or code that may be adapted to receive MPEG-coded images in a format that is in accordance with the bus protocol supported by the video bus (VB). The VB RCV 102 may also be adapted to convert the received MPEG-coded video images into a different format for transfer to the line stores block 104. The line stores block 104 may comprise suitable logic, circuitry, and/or code that may be adapted to convert raster-scanned luma data from a current MPEG-coded video image into parallel lines of luma data. The line stores block 104 may be adapted to operate in a high definition (HD) mode or in a standard definition (SD) mode. Moreover, the line stores block 104 may also be adapted to convert and delay-match the raster-scanned chroma information into a single parallel line. The pixel buffer 106 may comprise suitable logic, circuitry, and/or code that may be adapted to store luma information corresponding to a plurality of pixels from the parallel lines of luma data generated by the line stores block 104. For example, the pixel buffer 106 may be implemented as a shift register. The pixel buffer 106 may be common to the BV MNR block 114, the MNR filter 116, the horizontal BNR block 108, and the vertical BNR block 110 to reduce, for example, chip layout area.
The BV MNR block 114 may comprise suitable logic, circuitry, and/or code that may be adapted to determine a block variance parameter for image blocks of the current video image. The BV MNR block 114 may utilize luma information from the pixel buffer 106 and/or other processing parameters. The temporary storage block 118 may comprise suitable logic, circuitry, and/or code that may be adapted to store temporary values determined by the BV MNR block 114. The MNR filter 116 may comprise suitable logic, circuitry, and/or code that may be adapted to determine a local variance parameter based on a portion of the image block being processed and to filter the portion of the image block being processed in accordance with the local variance parameter. The MNR filter 116 may also be adapted to determine an MNR difference parameter that may be utilized to reduce mosquito noise artifacts.
The HBNR block 108 may comprise suitable logic, circuitry, and/or code that may be adapted to determine a horizontal block noise reduction difference parameter for a current horizontal edge. The VBNR block 110 may comprise suitable logic, circuitry, and/or code that may be adapted to determine a vertical block noise reduction difference parameter for a current vertical edge.
The combiner 112 may comprise suitable logic, circuitry, and/or code that may be adapted to combine the original luma value of an image block pixel from the pixel buffer 106 with a luma value that results from the filtering operation performed by the MNR filter 116. The chroma delay 120 may comprise suitable logic, circuitry, and/or code that may be adapted to delay the transfer of chroma pixel information in the chroma data line to the VB XMT 122 to substantially match the time at which the luma data generated by the combiner 112 is transferred to the VB XMT 122. The VB XMT 122 may comprise suitable logic, circuitry, and/or code that may be adapted to assemble noise-reduced MPEG-coded video images into a format that is in accordance with the bus protocol supported by the VB.
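The combining of an original luma value with its filtered counterpart may be pictured, purely for illustration, as a weighted blend. The specification above does not define the combiner's arithmetic, so the function name and the weight parameter below are assumptions, not the specified behavior of the combiner 112.

```python
# Hypothetical sketch of combining an original luma sample with its
# filtered counterpart; the weighted-average form and the "weight"
# parameter are illustrative assumptions only.
def blend(original, filtered, weight):
    """Return a luma value between the original (weight=0.0) and the
    fully filtered value (weight=1.0)."""
    return round((1.0 - weight) * original + weight * filtered)

print(blend(120, 100, 0.25))  # -> 115
```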
The pipeline delay blocks 206, 212, 214 and 218 may comprise suitable logic, circuitry, and/or code that may be adapted to synchronously delay video data in order that the various video data may be correctly aligned with each other. The pipeline delay blocks 206, 212, 214 and 218 may be similar to the pixel buffer 106 or the chroma delay block 120 (
There is also shown the various two-way handshake signals between the various blocks that may indicate whether the transmitting block is ready to transmit new data and whether the receiving block is ready to receive the new data. The handshaking may be referred to as ready-accept handshaking. The i_ready ready signal and the i_data data signal may be communicated by a video handling block, for example, the VB receiver 102 (
For example, the distribute block 202 may assert a ready signal to the processing block 204 when it has data that can be transmitted to the processing block 204. The processing block 204 may have an accept signal deasserted until it is ready to process the new data. The processing block 204 may then assert the accept signal to the distribute block 202 when it has accepted the new data. When the distribute block 202 receives the asserted accept signal from the processing block 204, it may keep the ready signal asserted if it has new data to send. Otherwise, it may deassert the ready signal until it has new data to send to the processing block 204. In this manner, by asserting and deasserting the ready signal and the accept signal, the distribute block 202 may communicate data to the processing block 204.
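The ready-accept exchange described above may be sketched in software as follows. The function and signal names are illustrative assumptions; the sketch only demonstrates that a transfer occurs exactly on cycles where ready and accept are both asserted, with the transmitter holding its data otherwise.

```python
def simulate_handshake(data_items, accept_pattern):
    """Transfer one item per cycle in which ready and accept are both
    asserted; the transmitter holds its data while accept is low."""
    received = []
    idx = 0  # index of the next item the transmitter offers
    for accept in accept_pattern:
        ready = idx < len(data_items)  # transmitter has data to send
        if ready and accept:           # handshake completes this cycle
            received.append(data_items[idx])
            idx += 1
    return received

# The receiver stalls on cycles 2 and 4, so the transmitter holds each
# item until the corresponding accept arrives.
print(simulate_handshake(['a', 'b', 'c'],
                         [True, False, True, False, True]))  # -> ['a', 'b', 'c']
```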
This illustration may indicate parallel processing of video data where the video data is processed in a plurality of video paths and the three video paths are combined at the end of processing of all three video paths. Video data may be received by the distribute block 202, and the distribute block 202 may communicate the video data to be processed to the three video paths. A first video path may comprise process blocks 204 and 208, and the pipeline delay block 206. A second video path may comprise the pipeline delay block 212. A third video path may comprise the pipeline delays 214 and 218, and the processing block 216. The processed video data from the three video paths may be communicated to the merge_and_blend block 210, and that block may output a single video signal, for example, the o_data video signal.
The video paths may be synchronized with one another when their data are communicated to the merge_and_blend block 210. In this manner, the video data from the plurality of video paths may be merged correctly. The synchronization may be provided by appropriate delays in the processing blocks and in the pipeline delay blocks. However, since the ready-accept handshaking may occur independently between any two blocks, assuring synchronization among the various video paths at the merge_and_blend block may be very complex. Each processing block in a video path may be considered to be a client. The various blocks may operate synchronously by utilizing a pre-determined clock signal or clock signals.
The merge blocks 310 and 330 may comprise suitable logic, circuitry, and/or code that may be adapted to synchronize the ready and accept handshaking signals among three or more video handling blocks. The blend block 324 may comprise suitable logic, circuitry, and/or code that may be adapted to receive various inputs of video data and combine the received video data into one stream of video data. The ready-accept handshaking may be as described with respect to
This illustration may show parallel processing of video data where the video data is processed and blended as soon as the processing is finished by a client. Video data may be received by the distribute block 302, and the distribute block 302 may communicate the video data to be processed to the pipeline delay block 316. The pipeline delay block 316 may communicate delayed video data to the processing block 304 and to the pipeline delay block 312 for further processing. The output signal of the processing block 304 may be communicated to the processing block 306. The processed output data from the processing block 306 may be communicated to the merge_and_blend block 308.
The pipeline delay block 312 may communicate delayed video data to the distributive block 314. The distributive block 314 may communicate video data to the pipeline delay block 318. The pipeline delay block 318 may communicate its output to the distributive block 320 and to the processing block 328. The distributive block 320 may communicate its output to the pipeline delay block 322, which may communicate its output to the blend block 324. The processing block 328 may also communicate its output to the blend block 324, and the blend block 324 may have an output that is blended video signal of the two inputs communicated from the pipeline delay block 322 and the processing block 328. The output of the blend block 324 may be communicated to the processing block 306 and to the pipeline delay block 326. The output of the pipeline delay block 326 may also be communicated to the processing block 306, and to the merge_and_blend block 308. The output of the merge_and_blend block 308 may be the video data signal o_data.
The distribute block 302 may handshake with the processing block 304 and the pipeline delay block 316. The merge block 310 may synchronize the ready-accept signals among the processing block 304 and the pipeline delay blocks 312 and 316. The distributive blocks 314 and 320 may handshake with the processing block 306. The distributive block 314 may also handshake with the pipeline delay block 318. The pipeline delay block 318 may also handshake with the processing block 328. The distributive block 320 may also handshake with the pipeline delay block 322. The merge block 330 may synchronize the ready-accept signals among the blend block 324, the pipeline delay block 326, and the processing block 328. The processing block 306 and the pipeline delay block 326 may handshake with the merge_and_blend block 308. The various blocks may operate synchronously by utilizing a pre-determined clock signal or clock signals.
The processing blocks 404, 406, and 408 may comprise suitable logic, circuitry, and/or code that may be adapted to process video data, and output the processed video data with appropriate delay in a synchronous manner. The processing block 404, 406, or 408 may be, for example, similar to the BV MNR block 114, the horizontal BNR block 108, or the vertical BNR block 110 (
The plurality of video signals video_n may comprise pixels of video data at different positions in the pipeline delay blocks. For example, the processing block 404 may process pixels at positions 5 and 13 in a horizontal line of video. In this regard, the pixels at positions 5 and 13 may comprise the video signals video_n. Similarly, the plurality of pixel accept signals accept_n may correlate to the pixels in the video signals video_n. If the video signals comprise pixels at positions 5 and 13, the plurality of pixel accept signals accept_n may correspond to the pixels at positions 5 and 13. When a pixel accept signal is asserted, the corresponding pixel may be accepted as a valid pixel.
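The correspondence between tapped pixel positions and pixel accept signals may be modeled as follows. The class and method names are illustrative assumptions; the sketch shows a shared shift-register buffer exposing taps at fixed positions (here 5 and 13), where each tap's accept is asserted only once a valid pixel has reached that position.

```python
class SharedPixelBuffer:
    """Toy model of a shared pipeline buffer with taps at fixed pixel
    positions; names and structure are illustrative assumptions."""

    def __init__(self, depth, taps):
        self.stages = [None] * depth  # stages PL0..PL(depth-1)
        self.taps = taps              # pixel positions of interest

    def shift_in(self, pixel):
        # New pixels enter at position 0 and shift toward higher positions.
        self.stages = [pixel] + self.stages[:-1]

    def read_taps(self):
        # Per tap: (pixel value, pixel accept). The accept for a tap is
        # asserted only once a valid pixel has reached that position.
        return {p: (self.stages[p], self.stages[p] is not None)
                for p in self.taps}

buf = SharedPixelBuffer(depth=15, taps=[5, 13])
for px in range(14):       # shift in pixels 0..13 of a video line
    buf.shift_in(px)
print(buf.read_taps())     # -> {5: (8, True), 13: (0, True)}
```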
The various blocks may utilize ready-accept handshaking to transfer video data. The ready-accept handshaking may be similar to the ready-accept handshaking described with respect to
In operation, the pipeline delay block 402 may accept data and shift the data synchronously. Appropriate accept signals may be asserted to the processing block 404. The processing block 404 may process the appropriate pixels and communicate the output to the processing block 406. The pipeline delay block 402 may communicate the appropriate pixel accept signals to the processing block 408. The processing block 408 may process the pixels and communicate the output to the blend block 410. The blend block 410 may blend the video output of the processing block 408 with the video output communicated by the pipeline delay block 402. The resulting video output may be communicated to the pipeline delay block 412.
Appropriate pixel accept signals corresponding to the desired pixel positions in the pipeline delay block 412 may be communicated to the processing block 406. The processing block 406 may process the video and communicate the processed output to the blend block 414. The pipeline delay block 412 may utilize ready-accept handshaking to communicate its output to the blend block 414. The blend block 414 may blend the video data communicated by the processing block 406 and the pipeline delay block 412 to generate an output video signal o_data. The various blocks may operate synchronously by utilizing a pre-determined clock signal or clock signals.
The generated pixels may be blended with the corresponding original pixels in, for example, the pipeline delay blocks 402 or 412 (
A present pipeline stage may communicate to a subsequent pipeline stage a ready signal that may be asserted to indicate that new data may be available for the subsequent pipeline stage. The subsequent pipeline stage may communicate to the present pipeline stage an accept signal that may be asserted to indicate that it has accepted the new data. In this manner, each of the pipeline stages PL0 . . . PL14 in the control pipeline 602 may communicate via the ready-accept handshaking signals with a previous pipeline stage and a subsequent pipeline stage to control the flow of data in the shared buffer 604. For example, the pipeline stage PL3 may communicate an asserted ready signal to the subsequent pipeline stage PL4 when it has accepted new data L3 and C3 in the luma pixel buffer 606 and the chroma pixel buffer 608, respectively, from the previous pipeline stage PL2. The subsequent pipeline stage PL4 may accept the data from the pipeline stage PL3 and may assert the accept signal to indicate to the present pipeline stage that the data has been accepted. Accordingly, a pipeline stage may accept new data when it is provided by the previous pipeline stage and when it is ready to accept the new data.
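The stage-to-stage ready-accept control described above may be sketched as a one-clock update function. The function and variable names are illustrative assumptions; the sketch demonstrates the key property that a stage accepts new data when it is empty or when its own data is accepted downstream, so a stall at the output end propagates backward through every full stage.

```python
def step(stages, new_item, sink_accept):
    """Advance the control pipeline by one clock (illustrative model;
    stage and signal names are assumptions).

    stages[i]   -- data held by pipeline stage PLi (None = empty)
    new_item    -- data offered to the first stage PL0
    sink_accept -- accept signal from the consumer of the last stage
    Returns (next_stages, delivered_item, pl0_accepted)."""
    n = len(stages)
    accept = [False] * n
    # A stage accepts new data when it is empty or when its own data is
    # accepted downstream this cycle, so accept propagates backward.
    accept[n - 1] = stages[n - 1] is None or sink_accept
    for i in range(n - 2, -1, -1):
        accept[i] = stages[i] is None or accept[i + 1]
    delivered = stages[n - 1] if sink_accept and stages[n - 1] is not None else None
    nxt = list(stages)
    for i in range(n - 1, 0, -1):
        if accept[i]:
            nxt[i] = stages[i - 1]
    if accept[0]:
        nxt[0] = new_item
    return nxt, delivered, accept[0]

# Fill a 3-stage pipeline, stall the sink, and observe that once every
# stage is full the stall propagates all the way back to PL0.
stages = [None, None, None]
for item, sink in [('a', True), ('b', True), ('c', False)]:
    stages, out, took = step(stages, item, sink)
stages, out, took = step(stages, 'd', False)
print(stages, took)   # -> ['c', 'b', 'a'] False  (pipeline full, PL0 stalled)
```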
There is also shown a plurality of pixel accept signals p_accept_0 . . . p_accept_14 and a plurality of corresponding pixels pixel_0 . . . pixel_14. All, or a subset, of these pixel accept signals may be communicated to a processing block, for example, the processing block 408 (
Although only luma and chroma pixels may have been shown in this figure, the invention need not be so limited. For example, the data path may also include phase information for the video pixels.
In operation, the pixel processing blocks may have as inputs specific pixels from the common data buffer shown in
In this manner, the pixel values stored in the pixel storage blocks 704, 708 and 712 may be synchronized with the appropriate pixels shifted in to the pipeline stages. Accordingly, a blend block, for example, the blend block 410 or 414 (
The repeat condition signal (repeat_condition) may be asserted, for example, at a boundary condition such as at a beginning of a video line or at the end of a video line. For example, a client such as the processing block 408 (
Although only luma and chroma pixels may have been shown in this figure, the invention need not be so limited. For example, the data path may also include phase information for the video pixels.
Referring to
If, however, the repeat condition signal (repeat_condition) is asserted, although the ready signal from the pipeline stage PL4 may be asserted in step 930, the ready signal to the pipeline stage PL5 may be deasserted. This will effectively keep the pipeline stage PL5 from accepting new data. Furthermore, since the pipeline stage PL5 has not accepted data, it will not assert the accept signal to the pipeline stage PL4 in step 940. This may prevent pipeline stages previous to PL5 from accepting new data. In this manner, the same data may be kept for the pipeline stage PL5 as long as that data is required for processing at the PL5 pipeline stage. When normal pipeline shifting resumes, the repeat condition signal (repeat_condition) may be deasserted. This may allow PL5 to accept data, and allow assertion of accept signal to the pipeline stage PL4 in step 940.
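The effect of the repeat condition may be sketched as follows. This is a self-contained, illustrative model with assumed names: asserting a repeat condition at a stage gates off the ready signal into that stage, so the stage and all earlier stages hold their data, while later stages continue shifting and re-sample (repeat) the held stage's value.

```python
def step_with_repeat(stages, new_item, sink_accept, repeat_at=None):
    """One clock of an illustrative ready-accept control pipeline with an
    optional repeat condition asserted at stage index repeat_at. Names
    and structure are assumptions, not the specified implementation."""
    n = len(stages)
    accept = [False] * n
    accept[n - 1] = stages[n - 1] is None or sink_accept
    for i in range(n - 2, -1, -1):
        accept[i] = stages[i] is None or accept[i + 1]
    if repeat_at is not None:
        # repeat_condition: the ready signal into the held stage is gated
        # off, so it does not accept new data and does not assert accept
        # upstream, freezing every earlier stage as well.
        for i in range(repeat_at + 1):
            accept[i] = False
    delivered = stages[n - 1] if sink_accept and stages[n - 1] is not None else None
    nxt = list(stages)
    for i in range(n - 1, 0, -1):
        if accept[i]:
            nxt[i] = stages[i - 1]
    if accept[0]:
        nxt[0] = new_item
    return nxt, delivered

# Hold stage PL1 ('c'): earlier stages freeze, downstream stages keep
# shifting, and PL2 re-samples PL1's value, repeating 'c'.
stages, out = step_with_repeat(['d', 'c', 'b', 'a'], 'e', True, repeat_at=1)
print(stages, out)   # -> ['d', 'c', 'c', 'b'] a
```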
When a subsequent pipeline stage, for example, the pipeline stage PL4, has accepted data from the present pipeline stage, for example, the pipeline stage PL3, the present pipeline stage PL3 may accept data from the previous pipeline stage, for example, the pipeline stage PL2, regardless of the accept signal input from the subsequent pipeline stage PL4. However, if the subsequent pipeline stage PL4 has not accepted data from the present pipeline stage PL3, then the present pipeline stage PL3 may not accept data from the previous pipeline stage PL2 until the subsequent pipeline stage PL4 indicates that it has accepted data from the present pipeline stage PL3 by asserting the accept signal to the present pipeline stage PL3. At times, this may mean that in order for the first pipeline stage PL0 to accept new data, each subsequent pipeline stage may have accepted data from an immediately previous pipeline stage.
Additionally, since the accept signal may be propagated from the highest position pipeline stage, for example, pipeline stage PL14, to the lowest position pipeline stage, for example, PL0, there may be a limit to the number of pipeline stages that may be cascaded for a given clock period. For example, if the number of pipeline stages in a pipeline delay block is limited to eight by a clock period, then the pipeline delay block illustrated in
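The cascading arithmetic implied above may be sketched briefly. The function name and the simple ceiling-division rule are assumptions for illustration: if the combinational accept path limits each block to a maximum number of stages per clock period, a longer delay line may be split into cascaded blocks.

```python
import math

def blocks_needed(total_stages, max_stages_per_block):
    """Number of cascaded pipeline delay blocks required when the
    combinational accept path limits each block's depth."""
    return math.ceil(total_stages / max_stages_per_block)

# e.g. a 15-stage buffer (PL0..PL14) with an 8-stage-per-clock limit:
print(blocks_needed(15, 8))  # -> 2
```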
Usage of a shared buffer and a synchronized, central control mechanism, for example, in the pipeline delay blocks 402 and 412 (
Although embodiments of the invention may have used video processing as an example, the invention need not be so limited. Embodiments of the invention may be used for other purposes, such as audio processing or digital signal processing, where data may be processed by a plurality of data processing blocks.
Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
This application makes reference to: U.S. patent application Ser. No. 11/083,597 filed Mar. 18, 2005; U.S. patent application Ser. No. 11/087,491 filed Mar. 22, 2005; U.S. patent application Ser. No. 11/090,642 filed Mar. 25, 2005; U.S. patent application Ser. No. 11/089,788 filed Mar. 25, 2005; and U.S. patent application Ser. No. ______ (Attorney Docket No. 16628US01) filed May 31, 2005. The above stated applications are hereby incorporated herein by reference in their entirety.