1. Field of the Invention
This application is related to data processing systems and more particularly to pipelined data processing systems.
2. Description of the Related Art
A typical video data processing system includes a video system on a chip (SoC) integrated circuit including multiple video processing blocks and related hardware. The video SoC receives compressed video data and decompresses (i.e., decodes, uncompresses, or expands) the compressed video data to recover uncompressed (i.e., raw) video data. The video SoC writes the uncompressed video data to a buffer or a system memory for subsequent use by one or more video processing blocks. The one or more video processing blocks retrieve the uncompressed video data from the buffer or system memory and may write processed, uncompressed video data to another buffer or other portion of system memory. In general, a still video image or frame includes R×C pixels (e.g., 1920×1080 pixels for an exemplary high-definition video screen) and each pixel may be represented by multiple bytes of data. A video processing block reads a frame, or portions of a frame of video data from a buffer or the system memory, processes the video data, and, in some cases, writes the processed video data to another buffer or back to the system memory.
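The amount of data in a single frame follows directly from the pixel count and the number of bytes per pixel. The following is a minimal sketch of that arithmetic; the function name `frame_size_bytes` and the value of three bytes per pixel are assumptions made for illustration and are not taken from the description above.

```python
def frame_size_bytes(columns, rows, bytes_per_pixel):
    """Return the storage needed for one uncompressed frame, in bytes."""
    return columns * rows * bytes_per_pixel


# Exemplary high-definition frame; 3 bytes per pixel is an assumed value.
hd_frame_bytes = frame_size_bytes(1920, 1080, 3)
print(hd_frame_bytes)  # 6220800
```

Under that assumption, one high-definition frame occupies roughly 6.2 million bytes, which illustrates why frames are staged in a buffer or system memory rather than held inside a processing block.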
In at least one embodiment of the invention, an apparatus includes a first data processor configured to communicate first data and handshake information to a non-coherent memory system. The apparatus includes a second data processor coupled in a pipeline with the first data processor and configured to execute in parallel with the first data processor. The second data processor is configured to read the first data from the non-coherent memory system in response to receiving an indicator from the non-coherent memory system based on the handshake information. The apparatus may include the non-coherent memory system. The non-coherent memory system may include a memory controller configured to receive the first data and the handshake information, the memory controller being configured to provide the indicator in response to the first data being available for a read. The memory controller may be configured to provide the indicator to the second data processor in response to the first data being committed to the memory system. The indicator may be based on a size of the write, a write start indicator, or a write finish indicator. The memory controller may be configured to write the first data out of order to the non-coherent memory system. The first data processor may write the first data to the non-coherent memory system in a first order and the second data processor may read the first data from the non-coherent memory system in a second order.
In at least one embodiment of the invention, a method includes writing first data to a non-coherent memory system. The first data is received from a first processor in a pipeline of processors executing in parallel. The method includes providing handshake information to the non-coherent memory system. The method includes detecting an indicator by a second processor of the pipeline of processors, the indicator being based on the handshake information and indicating that the first data is available for a read. The method includes reading the first data from the non-coherent memory system in response to detecting the indicator. The method may include storing the first data in the non-coherent memory system and receiving the handshake information from the first processor. The method may include generating the indicator based on the handshake information and providing the indicator to the second processor. The first processor may write the first data to the non-coherent memory system in a first order and the second processor may read the first data from the non-coherent memory system in a second order.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
Referring to
Due to the large quantity of data involved, only small quantities of video data may be available to a particular video processor circuit at a particular time. Only an individual frame or a portion of an individual frame may be available for access by a particular video processor from frame buffer 114 or SoC memory controller 116. System-on-a-chip memory controller 116 reads the video data from system memory and stores it in frame buffer 114 for processing and, in some cases, SoC memory controller 116 writes processed data back to memory 104. Video SoC 102 may include a front-end display subsystem that receives video data and generates uncompressed and/or processed video data in a form usable by the back-end subsystem. Typical front-end display subsystem operations include decoding, decompression, format conversion, noise reduction (e.g., temporal, spatial, and mosquito noise reduction) and other interface operations for video data having different formats (e.g., multiple streams). Back-end display subsystem 120 delivers the uncompressed video data to a display device (e.g., video display 122, projector, or other electronic device).
Referring to
For example, where the number of fundamental blocks that span a line of a frame of the video image is N, each row of a fundamental block includes a line portion of pixels forming 1/Nth of a line of the frame of the video image. Video processor 106 may operate on the video data in a non-linear manner, i.e., not line-by-line of the frame of the video image. In at least one embodiment, video processor 106 operates on fundamental blocks of the frame of the video image, and provides the uncompressed video data in a tiled format (i.e., fundamental block by fundamental block of uncompressed video data). In at least one embodiment, video processor 106 writes one fundamental block at a time, from left-to-right, top-to-bottom of a frame of a video image, with pixels within the block being written in a linear order. However, note that each fundamental block may include video data corresponding to multiple lines. In addition, note that tiling formats and fundamental block sizes may vary with different high-compression rate video compression techniques and decoders compliant with different video compression standards.
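The difference between a tiled write order and a line-by-line order can be sketched as follows. This is an illustrative model only: the function names, the coordinate convention (x as column, y as line), and the small frame and block dimensions in the example are assumptions, not values from the description.

```python
def tiled_write_order(frame_width, frame_height, block_width, block_height):
    """Yield (x, y) pixel coordinates block by block.

    Blocks are visited left-to-right, top-to-bottom; pixels inside each
    block are visited in linear (row-major) order.
    """
    for block_top in range(0, frame_height, block_height):
        for block_left in range(0, frame_width, block_width):
            for y in range(block_top, min(block_top + block_height, frame_height)):
                for x in range(block_left, min(block_left + block_width, frame_width)):
                    yield (x, y)


def linear_order(frame_width, frame_height):
    """Yield (x, y) pixel coordinates line by line."""
    for y in range(frame_height):
        for x in range(frame_width):
            yield (x, y)


# Hypothetical 4x2 frame split into two 2x2 blocks (N = 2 blocks per line).
tiled = list(tiled_write_order(4, 2, 2, 2))
linear = list(linear_order(4, 2))
print(tiled)
print(linear)
```

In this small example each block row carries half of a frame line, so the first full line of the frame is complete only after part of the second block has been written. Both orders cover the same pixels; only the sequence differs.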
Referring to
Referring back to
Referring to
Referring to
A technique for synchronizing execution units of a pipelined system with a non-coherent system memory relies on the memory subsystem to provide a handshake signal to the consumer execution unit, rather than the producer execution unit. Referring to
Referring to
In at least one embodiment, producer execution unit 602 provides the handshake information to memory controller 608 after providing the last word of the information to be written. That handshake information may include a write to a particular location in memory system 604 that is dedicated to flagging the end of a buffer write. In at least one embodiment, the handshake information is communicated as data embedded in the buffer data, at or near the end of the buffer data. For example, the handshake information may include a code word that has a value that does not naturally occur in video data. When memory controller 608 writes that code word to a buffer in storage 610, memory controller 608 recognizes the code word and generates ready indicator 618 based thereon. In at least one embodiment, the handshake information includes a length of data to be written to the memory. Memory controller 608 uses that length to determine an ending address for the data buffer. When that ending address is written, memory controller 608 generates ready indicator 618.
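Two of the handshake forms described above, an embedded code word and a length from which an ending address is derived, can be modeled in a few lines. This is a sketch under stated assumptions: the class name `CodeWordController`, the helper `end_address`, the marker value `END_CODE_WORD`, and word-granular addressing are all hypothetical and chosen only for illustration.

```python
# Hypothetical marker value, assumed never to occur in the buffer payload.
END_CODE_WORD = 0xFFFFFFFF


class CodeWordController:
    """Toy memory controller that asserts 'ready' on committing the code word."""

    def __init__(self):
        self.storage = {}   # address -> word, stands in for the storage array
        self.ready = False  # stands in for the ready indicator

    def write(self, address, word):
        self.storage[address] = word
        if word == END_CODE_WORD:
            self.ready = True


def end_address(start_address, length_words):
    """Ending address of a buffer, assuming one address per word."""
    return start_address + length_words - 1


controller = CodeWordController()
for address, word in enumerate([0x10, 0x20, 0x30]):
    controller.write(address, word)
print(controller.ready)           # False: code word not yet written
controller.write(3, END_CODE_WORD)
print(controller.ready)           # True
print(hex(end_address(0x100, 16)))  # 0x10f
```

The code-word form needs no side channel, at the cost of reserving a value in the data stream; the length form keeps the data stream untouched but requires the controller to track addresses.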
For example, memory controller 608 includes one or more counters, comparators, or other logic that determines that the data has been committed to storage 610 based on a start address, finish address, total amount of data, a finish count, or other information included in handshake information 612. In at least one embodiment, handshake information 612 includes a total number of words being written to memory. A counter, or other logic in memory controller 608, may increment for each word committed to storage 610. When the counter value equals the total number of words specified by producer execution unit 602, memory controller 608 generates ready indicator 618. In at least one embodiment, memory controller 608 uses the total number of words to compute an end address and compares the computed end address to an address being written. When those values are equal, or memory controller 608 otherwise detects that producer execution unit 602 has completed a write to memory, memory controller 608 generates ready indicator 618. Memory system 604 provides ready indicator 618, an indication that the data committed to memory system 604 is available, to consumer execution unit 606 (710).
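The counter-based completion check can be sketched as below. The class name `CountingController` and its method names are hypothetical; the sketch assumes each word is committed exactly once and that the total word count arrives before the first commit.

```python
class CountingController:
    """Toy memory controller that counts committed words against a total."""

    def __init__(self):
        self.storage = {}           # address -> word
        self.expected_words = None  # total supplied in the handshake
        self.committed = 0          # words committed so far
        self.ready = False          # stands in for the ready indicator

    def handshake(self, total_words):
        """Record the total number of words the producer will write."""
        self.expected_words = total_words
        self.committed = 0
        self.ready = False

    def commit(self, address, word):
        """Commit one word; assert 'ready' when the count reaches the total."""
        self.storage[address] = word
        self.committed += 1
        if self.expected_words is not None and self.committed == self.expected_words:
            self.ready = True


controller = CountingController()
controller.handshake(4)
# Words may be committed out of order; only the count matters here.
for address in (3, 0, 2):
    controller.commit(address, address * 10)
print(controller.ready)  # False: three of four words committed
controller.commit(1, 10)
print(controller.ready)  # True
```

Because the check depends on the number of committed words rather than on which address was written last, it remains valid when the controller commits words out of order.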
In at least one embodiment, consumer execution unit 606 is a general purpose processor or digital signal processing unit and ready indicator 618 includes a signal coupled to an interrupt input of consumer execution unit 606. When consumer execution unit 606 detects ready indicator 618, consumer execution unit 606 triggers an interrupt and an associated interrupt service routine performs a particular set of operations including issuing a read request to particular locations of memory system 604 (712). The interrupt input may include a vectored interrupt, indicating a particular interrupt service routine corresponding to a particular function corresponding to a particular producer execution unit 602 (712). In at least one embodiment, the indicator includes one or more bits written to a particular location that is being polled by consumer execution unit 606. In response to detecting the indicator, consumer execution unit 606 issues a memory request that reads particular locations of memory system 604 and clears the polling location or interrupt (712). In at least one embodiment, consumer execution unit 606 is an application specific processing circuit and ready indicator 618 triggers a reset of consumer execution unit 606. In response to the reset, consumer execution unit 606 performs its specific function, which includes reading associated locations in memory system 604 and processing those data according to the application (712). In response to the indicator, consumer execution unit 606 processes the appropriate data. The technique maintains coherency between pipelined execution units and otherwise non-coherent buffers or system memory regardless of whether writes and reads are performed using disparate formats.
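The overall flow, in which the memory system rather than the producer signals the consumer, can be modeled in software with two threads. This is only an analogy under stated assumptions: the names `MemorySystem`, `producer`, `consumer`, and `run_pipeline` are hypothetical, a `threading.Event` stands in for the ready indicator, and the producer is assumed to write in reverse address order while the consumer reads in ascending order.

```python
import threading


class MemorySystem:
    """Toy memory system that signals the consumer once all words are committed."""

    def __init__(self, total_words):
        self._storage = {}
        self._remaining = total_words
        self._lock = threading.Lock()
        self.ready = threading.Event()  # stands in for the ready indicator

    def write(self, address, word):
        with self._lock:
            self._storage[address] = word
            self._remaining -= 1
            if self._remaining == 0:
                self.ready.set()

    def read(self, address):
        with self._lock:
            return self._storage[address]


def producer(memory, words):
    # First order: highest address first.
    for address in reversed(range(len(words))):
        memory.write(address, words[address])


def consumer(memory, count, result):
    memory.ready.wait()   # block until the memory system signals availability
    for address in range(count):  # second order: ascending addresses
        result.append(memory.read(address))
    memory.ready.clear()  # clear the indicator after servicing it


def run_pipeline(words):
    memory = MemorySystem(len(words))
    result = []
    consumer_thread = threading.Thread(target=consumer, args=(memory, len(words), result))
    producer_thread = threading.Thread(target=producer, args=(memory, words))
    consumer_thread.start()
    producer_thread.start()
    producer_thread.join()
    consumer_thread.join()
    return result


print(run_pipeline([10, 20, 30, 40]))  # [10, 20, 30, 40]
```

The consumer never observes a partially written buffer, even though it starts first and reads in a different order than the producer writes, because it waits on the signal raised by the memory model. The sketch assumes a non-empty buffer.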
Thus techniques for synchronizing memory accesses of pipelined execution units with a non-coherent memory structure have been described. Structures described herein may be implemented using software executing on a processor (which includes firmware) or by a combination of software and hardware. Software, as described herein, may be encoded in at least one tangible computer readable medium. As referred to herein, a tangible computer-readable medium includes at least a disk, tape, or other magnetic, optical, or electronic storage medium.
While circuits and physical structures have been generally presumed in describing embodiments of the invention, it is well recognized that in modern semiconductor design and fabrication, physical structures and circuits may be embodied in computer-readable descriptive form suitable for use in subsequent design, simulation, test or fabrication stages. Structures and functionality presented as discrete components in the exemplary configurations may be implemented as a combined structure or component. Various embodiments of the invention are contemplated to include circuits, systems of circuits, related methods, and tangible computer-readable medium having encodings thereon (e.g., VHSIC Hardware Description Language (VHDL), Verilog, GDSII data, Electronic Design Interchange Format (EDIF), and/or Gerber file) of such circuits, systems, and methods, all as described herein, and as defined in the appended claims. In addition, the computer-readable media may store instructions as well as data that can be used to implement the invention. The instructions/data may be related to hardware, software, firmware or combinations thereof.
The description of the invention set forth herein is illustrative, and is not intended to limit the scope of the invention as set forth in the following claims. For example, while the invention has been described in embodiments that process video data having a particular format, one of skill in the art will appreciate that the teachings herein can be utilized with pipelined processing modules that process other types of data having other formats. Variations and modifications of the embodiments disclosed herein may be made based on the description set forth herein, without departing from the scope and spirit of the invention as set forth in the following claims.
This application claims benefit under 35 U.S.C. §119(e) of provisional application 62/159,658 filed May 11, 2015, entitled “MEMORY SUBSYSTEM SYNCHRONIZATION PRIMITIVES”, naming Brian Lee as inventor, which application is incorporated herein by reference.
Number | Date | Country
---|---|---
62/159,658 | May 2015 | US