The present invention generally relates to video processing and more specifically to systems and methods for classifying, encoding, decoding and transmitting video content based upon regions of interest.
The amount of data required to store video can be reduced using video encoding. A number of standards have been developed to facilitate the encoding and sharing of video. H.264 is a block-oriented, motion-compensation-based codec standard developed by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T) together with the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). H.264 includes a number of features that generally allow it to encode video effectively and provide more flexibility for applications in a wide variety of network environments.
Among the many features of H.264 is the ability to divide an image into slice groups that define regions of the image. Each slice group can in turn be divided into several slices, each of which is a sequence of macroblocks. A macroblock is a basic unit of block based image compression: a still image or video frame is partitioned into macroblocks, each covering a block of pixels (16×16 luma samples in H.264). These macroblocks can be processed in a scan order, such as left to right and top to bottom. Also, each slice can be decoded independently.
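The relationship between a frame, its macroblocks and its slice groups can be sketched as follows. This is a minimal illustration only: the function names, the rectangular foreground region and the two-group layout are assumptions made for the example and are not taken from the H.264 syntax or from any particular encoder.

```python
MB_SIZE = 16  # assumed macroblock size in luma samples


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macroblocks needed to cover a frame."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def slice_group_map(cols, rows, roi_rect):
    """Build a map in raster-scan order (left to right, top to bottom).

    roi_rect is (left, top, right, bottom) in macroblock units, inclusive.
    Macroblocks inside the rectangle go to slice group 0, all others to 1.
    """
    left, top, right, bottom = roi_rect
    return [
        0 if left <= x <= right and top <= y <= bottom else 1
        for y in range(rows)
        for x in range(cols)
    ]


def macroblocks_in_group(mb_map, group):
    """List the raster-scan addresses of the macroblocks in one slice group."""
    return [addr for addr, g in enumerate(mb_map) if g == group]
```

For a hypothetical 64 by 48 pixel frame, macroblock_grid yields a grid of 4 columns and 3 rows, and the map lists one slice group index per macroblock in scan order.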
Systems and methods in accordance with embodiments of the invention encode regions of interest within video frames to reduce errors within the regions of interest. One embodiment includes a processor configured by an encoder application, where the encoder application configures the processor to: identify at least one region of interest within a frame of video; assign at least one importance value to a plurality of regions within the frame, where a higher importance value is assigned to identified regions of interest; and apply a first error propagation reduction process to at least one region assigned a first importance value and a second error propagation reduction process to at least one region assigned a second importance value.
In a further embodiment, the first importance value is higher than the second importance value.
In another embodiment, the first error propagation reduction process is more computationally intensive than the second error propagation reduction process.
In a still further embodiment, the first error propagation reduction process is an adaptive intra refresh encoding process.
In still another embodiment, the second error propagation reduction process involves performing no additional error propagation reduction processing.
In a yet further embodiment, the encoder application configures the processor to encode each video frame as a set of slice groups and to assign at least one importance value to each slice group.
In yet another embodiment, the encoder application configures the processor to group the slice groups in each frame based upon the importance values assigned to the slice groups.
In a further embodiment again, the encoder application configures the processor to assign importance values based upon user input.
In another embodiment again, the encoder application configures the processor to automatically assign importance values using an automated region of interest detection process.
A further additional embodiment includes identifying at least one region of interest within a frame of video using a source encoder, assigning at least one importance value to a plurality of regions within the frame using the source encoder, where a higher importance value is assigned to identified regions of interest, and applying a first error propagation reduction process to at least one region assigned a first importance value and a second error propagation reduction process to at least one region assigned a second importance value using the source encoder.
In another additional embodiment, the first importance value is higher than the second importance value.
In a still yet further embodiment, the first error propagation reduction process is more computationally intensive than the second error propagation reduction process.
In still yet another embodiment, the first error propagation reduction process comprises an adaptive intra refresh encoding process.
In a still further embodiment again, the second error propagation reduction process comprises performing no additional error propagation reduction processing.
Still another embodiment again includes a processor configured by a decoder application, where the decoder application configures the processor to: receive data including a sequence of encoded video frames; decode the sequence of encoded video frames; apply a first error concealment process when a region of a frame of video has a first importance value; and apply a second error concealment process when a region of a frame of video has a second importance value.
In a still further additional embodiment, the first importance value is higher than the second importance value.
In still another additional embodiment, the first error concealment process is more computationally intensive than the second error concealment process.
In a yet further embodiment again, each video frame is encoded as a set of slice groups and each slice group is assigned at least one importance value.
In yet another embodiment again, each video frame is encoded so that the slice groups are grouped based upon importance value.
In a yet further additional embodiment, the decoder application configures the processor to decode slice groups having higher importance values before slice groups having lower importance values.
In yet another additional embodiment, the first error concealment process includes at least one process selected from the group consisting of an interlayer error concealment process, a temporal error concealment process and a spatial error concealment process.
In a further additional embodiment again, the importance values are included in the encoded video.
In another additional embodiment again, the decoder application configures the processor to assign at least one importance value to regions of the sequence of encoded frames of video.
Another further embodiment includes receiving data including a sequence of encoded video frames using a playback device, decoding the sequence of encoded video frames using the playback device, applying a first error concealment process when a region of a frame of video has a first importance value using the playback device, and applying a second error concealment process when a region of a frame of video has a second importance value using the playback device.
In still another further embodiment, the first importance value is higher than the second importance value.
In yet another further embodiment, the first error concealment process is more computationally intensive than the second error concealment process.
In another further embodiment again, the first error concealment process includes at least one process selected from the group consisting of an interlayer error concealment process, a temporal error concealment process and a spatial error concealment process.
Another further additional embodiment includes a machine readable medium containing processor instructions, where execution of the instructions by a processor causes the processor to perform a process including: identifying at least one region of interest within a frame of video using a source encoder; assigning at least one importance value to a plurality of regions within the frame using a source encoder, where a higher importance value is assigned to identified regions of interest; and applying a first error propagation reduction process to at least one region assigned a first importance value and a second error propagation reduction process to at least one region assigned a second importance value using the source encoder.
In still yet another further embodiment, the machine readable medium is non-volatile memory.
Still another further embodiment again includes a machine readable medium containing processor instructions, where execution of the instructions by a processor causes the processor to perform a process including: receiving data including a sequence of encoded video frames using a playback device; decoding the sequence of encoded video frames using the playback device; applying a first error concealment process when a region of a frame of video has a first importance value using the playback device; and applying a second error concealment process when a region of a frame of video has a second importance value using the playback device.
In still another further additional embodiment, the machine readable medium is non-volatile memory.
Turning now to the drawings, systems and methods for encoding regions of interest within video frames to reduce errors within the regions of interest in accordance with embodiments of the invention are illustrated. The differences between a decoded frame and the original frame that was encoded are typically referred to as errors. These errors can be caused by loss of information during the encoding process and/or loss of information during the transmission of data. In a number of embodiments, different regions within a frame of video are assigned different levels of importance or importance values. Based upon the importance value assigned to each region, a video encoder can assign additional resources to the encoding of the more important regions to reduce the number of errors in those regions when the video frame is decoded. Likewise, a decoder can also assign additional resources to the decoding of the more important regions to conceal any errors that may have been introduced during the transmission process. In many embodiments, the importance values assigned to regions within frames of video determine the treatment of the region throughout the encoding, transmission and decoding of the video frame.
Regions of interest are generally regions within a video frame containing visual information that is important to a viewer. Regions of interest within a frame of video and/or video sequence can be determined manually by a user or automatically by an automated region of interest detection process. In several embodiments, automated detection of regions of interest is performed by identifying moving foreground objects as regions of interest within a sequence of video frames. In many embodiments, higher importance values are assigned to regions of interest relative to background information and/or other portions of the video that are determined to have lower importance to the viewer.
Once importance values are assigned to different regions within a video frame and/or sequence of video frames, encoders in accordance with embodiments of the invention can apply varying levels of error propagation reduction or error resilient encoding to each portion of a video frame based upon the assigned importance values. Error propagation reduction reduces the likelihood that a specific portion of an encoded frame of video will include errors or differences with respect to the original frame of video when decoded. Error propagation reduction can be achieved using various techniques, which are discussed below. In a number of embodiments, adaptive intra refresh is utilized to encode regions having higher importance values. In other embodiments, any of a variety of error propagation reduction processes can be utilized in the encoding of different regions of a frame having different importance values.
Importance values can also be utilized during the decoding of video to perform error concealment. Error concealment is a process that involves reducing the errors in decoded video that result from data loss. Error concealment can be performed in many ways such as (but not limited to) a computationally cheap replacement from a previous frame or using more computationally expensive interlayer, temporal or spatial concealment. In a number of embodiments of the invention, a decoder applies more computationally expensive error concealment processes to regions of a video frame having a high importance value and less computationally expensive error concealment processes to regions having lower importance values.
In many embodiments, portions of a frame of video are transmitted from an encoder to a decoder in an order that prioritizes the regions of video based upon assigned importance value. In principle, the order of importance can be chosen freely and be transmitted with each video frame, which creates additional overhead. However, in many embodiments of the invention, this order would no longer need to be transmitted as it is understood that the more important regions of interest are transmitted earlier. In this way, communication overhead can be reduced as the transmission and decoding order is fixed to send more important macroblocks first, eliminating the need to transmit additional information relating to transmission and decoding order.
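The fixed ordering described above can be sketched as follows, under the assumption that each slice group is represented by a dictionary with hypothetical id, importance and payload fields. The sender emits payloads in descending order of importance and omits any explicit order field, so the receiver can treat position in the stream as the priority rank.

```python
def transmission_order(slice_groups):
    """Sort slice groups by descending importance, breaking ties by id."""
    return sorted(slice_groups, key=lambda g: (-g["importance"], g["id"]))


def packetize(slice_groups):
    """Emit payloads only; no explicit ordering information is sent."""
    return [g["payload"] for g in transmission_order(slice_groups)]


def rank_received(payloads):
    """Receiver side: stream position is the rank (0 = most important)."""
    return list(enumerate(payloads))
```

Breaking ties by slice group id is an assumption of this sketch; any rule works so long as the encoder and decoder agree on it in advance.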
Although certain embodiments are discussed above, there are many additional ways to implement preferential treatment of regions of importance in video processing in accordance with many embodiments of the invention. System architectures that implement preferential treatment of regions of importance in video processing are discussed in greater detail below.
Video encoded in accordance with many embodiments of the invention can be transmitted to playback devices via the Internet. In many instances, data is lost during transmission and performing encoding and decoding based upon importance values assigned to regions of interest can reduce the perceptible impact of data loss during playback of the video on a playback device. A video distribution system in accordance with an embodiment of the invention is illustrated in
Source encoders in accordance with many embodiments of the invention can load an encoder application as machine readable instructions from memory or other storage. A source encoder in accordance with an embodiment of the invention is illustrated in
Similarly, playback devices in accordance with many embodiments of the invention can load a decoder application as machine readable instructions from memory. A playback device in accordance with an embodiment of the invention is illustrated in
Likewise, content distribution servers in accordance with many embodiments of the invention can load a content distribution application as machine readable instructions from memory. A content distribution server in accordance with an embodiment of the invention is illustrated in
Although a video distribution system is described above with respect to a specific source encoder, content distribution server and playback devices, any of a variety of encoding, transmitting or decoding systems can be utilized in the encoding, decoding and transmission of video as appropriate to specific applications in accordance with many embodiments of the invention. Assignment of importance values in accordance with embodiments of the invention is discussed below.
Source encoders in accordance with many embodiments of the invention utilize information concerning the relative importance of different regions of video frames to prioritize the application of error propagation reduction encoding processes to different regions of a video frame during encoding. Important regions can be identified using region of interest detection processes. Each region of interest can be assigned an importance value. In block based encoding, importance values can be assigned to different slice groups corresponding to the regions of interest. Different error propagation reduction processes can then be applied to each slice group based upon the importance value assigned to the slice group.
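A minimal sketch of this assignment is shown below. The SliceGroup class, the numeric importance values and the process labels are hypothetical names introduced for illustration; the choice of adaptive intra refresh for regions of interest and no additional processing for other regions mirrors one of the embodiments summarized above.

```python
from dataclasses import dataclass


@dataclass
class SliceGroup:
    group_id: int
    contains_roi: bool
    importance: int = 0


def assign_importance(groups, roi_value=2, background_value=1):
    """Give slice groups covering a region of interest the higher value."""
    for g in groups:
        g.importance = roi_value if g.contains_roi else background_value
    return groups


def select_process(importance, roi_value=2):
    """Pick an error propagation reduction process from the importance value."""
    return "adaptive_intra_refresh" if importance >= roi_value else "none"


def plan_encoding(groups):
    """Map each slice group id to the process to apply during encoding."""
    return {g.group_id: select_process(g.importance) for g in groups}
```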
A diagram conceptually illustrating a process of determining regions of interest within a video frame and assigning importance values to slice groups within the frame for use during the encoding and decoding of the frame in accordance with an embodiment of the invention is shown in
There are many processes that can be utilized to identify regions of interest in video. Manual processes can be utilized, such as where a user manually tags a region of interest or utilizes a user eye tracking device. Automated processes such as content recognition systems can also be used, such as by defining a region of interest to be an area of greater contextual complexity or movement in a video. Still other automated region of interest processes may define a region of interest through detection of object boundaries or contours that satisfy certain criteria such as size, shape or amount of movement. Although certain region of interest detection processes are discussed above, any of a variety of processes for detecting a region of interest to a user can be utilized in accordance with embodiments of the invention. Error propagation reduction for video in accordance with embodiments of the invention is discussed below.
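One simple automated approach, detection by amount of movement, can be sketched with frame differencing. This is an illustrative assumption rather than the detection process of any particular embodiment: frames are taken to be lists of rows of luma values, and the block size and threshold are arbitrary example parameters.

```python
def moving_blocks(prev, curr, block=16, threshold=10.0):
    """Return the set of (bx, by) block coordinates whose mean absolute
    difference between two frames exceeds the threshold."""
    height, width = len(curr), len(curr[0])
    moving = set()
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total = 0
            count = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            if total / count > threshold:
                moving.add((bx // block, by // block))
    return moving


def bounding_box(blocks):
    """Return (left, top, right, bottom) in block units, or None if empty."""
    if not blocks:
        return None
    xs = [b[0] for b in blocks]
    ys = [b[1] for b in blocks]
    return (min(xs), min(ys), max(xs), max(ys))
```

The bounding box of the moving blocks can then serve as a candidate region of interest to which a higher importance value is assigned.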
Error propagation reduction can be performed on different regions of a video frame based upon the importance values assigned to the regions in accordance with many embodiments of the invention. A process for encoding video by assigning importance values to different regions of video frames and performing error propagation reduction based upon the assigned importance values in accordance with an embodiment of the invention is illustrated in
In many embodiments, resource intensive error propagation reduction is performed upon slice groups with greater degrees of importance. Typically, the ability to perform more computationally intensive processes results in improved error propagation reduction. Error propagation reduction can be performed in many ways in various embodiments, such as increasing the amount of data sent in a slice group so that data loss will be inconsequential to the overall slice group video quality. In many embodiments where block based encoding is used, error propagation reduction can be performed using an adaptive intra refresh encoding process. Adaptive intra refresh is a technique of error propagation reduction that adapts the intra-refresh rate of macroblocks according to factors including video transmission conditions and video content. Intra refreshing allows for a column of intra blocks to move across a video from one side to the other, “refreshing” the frame. Regions of interest can be prioritized using adaptive intra refresh by increasing the intra-refresh rate for regions of interest.
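The prioritization described above can be sketched as a per-macroblock refresh schedule. The formula, parameter names and default values below are assumptions chosen for illustration: the interval between forced intra refreshes shrinks as the importance value or the observed loss rate grows, so macroblocks in regions of interest are intra coded more often.

```python
def refresh_interval(importance, loss_rate, base_interval=32, min_interval=2):
    """Frames between forced intra refreshes of a macroblock.

    importance is assumed to be an integer >= 1 and loss_rate a fraction
    in [0, 1]; higher values of either shorten the interval.
    """
    interval = base_interval / (importance * (1.0 + 10.0 * loss_rate))
    return max(min_interval, int(interval))


def intra_refresh_blocks(frame_index, importance_map, loss_rate):
    """Return addresses of the macroblocks to intra code in this frame.

    importance_map holds one importance value per macroblock address.
    Offsetting by the address staggers refreshes across frames.
    """
    refreshed = []
    for addr, importance in enumerate(importance_map):
        interval = refresh_interval(importance, loss_rate)
        if (frame_index + addr) % interval == 0:
            refreshed.append(addr)
    return refreshed
```

With these example defaults and no packet loss, a macroblock of importance 4 is refreshed every 8 frames while a macroblock of importance 1 is refreshed every 32 frames.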
In a number of embodiments, intra refresh is utilized to perform an error propagation reduction process with respect to regions of the video frame assigned a high importance value. In several embodiments, an error propagation reduction process applied to regions of the video frame assigned a lower importance value can be any process including (but not limited to) a process selected from the group consisting of reference picture identification, gradual decoding refresh, redundant slices, reference picture marking repetition, spare picture signaling, scene information signaling and constrained intra prediction. In many embodiments, no error propagation reduction process is applied to regions of the video frame assigned a low importance value. In certain embodiments, any of a variety of error propagation reduction processes can be applied to regions of the video frame assigned a high importance value and/or a lower importance value as appropriate to the requirements of a specific application.
Although certain error propagation reduction processes are discussed above, many error propagation reduction processes can be utilized in accordance with various embodiments of the invention including (but not limited to) reference picture identification, gradual decoding refresh, redundant slices, reference picture marking repetition, spare picture signaling, scene information signaling and constrained intra prediction. Error concealment for video decoding in accordance with embodiments of the invention is discussed below.
Decoders in accordance with many embodiments of the invention can apply different levels of error concealment based upon the importance value assigned to specific regions of a frame of video. When video data is lost or corrupted during transmission, any frame that is encoded to utilize the missing data is impacted. Error concealment processes attempt to minimize the impact of missing data. Error concealment can be performed in many ways such as (but not limited to) a computationally cheap replacement from a previous frame or using more computationally expensive interlayer, temporal or spatial concealment. Interlayer error concealment utilizes a base layer of a video frame, which is the most fundamental information used to reconstruct a video frame, and enhancement layers, which are other layers that produce more refined information when combined with the base layer. Temporal error concealment utilizes trends in video frames over a sequence of frames to compensate for decoding errors. Likewise, spatial error concealment utilizes trends within a frame to compensate for decoding errors. In several embodiments, the decoder applies more computationally intensive processes to perform error concealment where the missing data is related to a region of video assigned a high importance value. Less computationally intensive error concealment processes can be applied where missing data is related to a region of video assigned a low importance value.
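The selection between concealment processes can be sketched as follows. This is a simplified illustration under stated assumptions: frames are lists of rows of luma values, the lost region is a rectangle given as half-open pixel bounds, the cheap process copies co-located pixels from the previous frame, and the costlier process is a basic spatial interpolation between the rows bordering the lost region. The function names and the importance threshold are hypothetical.

```python
def conceal_copy(prev, curr, region):
    """Cheap concealment: copy co-located pixels from the previous frame."""
    x0, y0, x1, y1 = region
    for y in range(y0, y1):
        curr[y][x0:x1] = prev[y][x0:x1]


def conceal_spatial(curr, region):
    """Costlier concealment: interpolate each column linearly between the
    intact rows just above and just below the lost region."""
    x0, y0, x1, y1 = region
    height = len(curr)
    span = y1 - y0 + 1
    for x in range(x0, x1):
        top = curr[y0 - 1][x] if y0 > 0 else None
        bottom = curr[y1][x] if y1 < height else None
        if top is None and bottom is None:
            continue  # no intact neighbours to interpolate from
        if top is None:
            top = bottom
        if bottom is None:
            bottom = top
        for y in range(y0, y1):
            w = (y - y0 + 1) / span
            curr[y][x] = round((1 - w) * top + w * bottom)


def conceal(prev, curr, region, importance, threshold=2):
    """Apply the costlier process only to regions of high importance."""
    if importance >= threshold:
        conceal_spatial(curr, region)
    else:
        conceal_copy(prev, curr, region)
```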
A process for performing error concealment based upon importance values assigned to different regions of video in accordance with an embodiment of the invention is illustrated in
Importance values assigned to regions of video frames can be utilized in accordance with embodiments of the invention to encode the frames of video so that data related to the regions having the highest importance are transmitted before the data associated with less important regions in a video stream. A process of encoding slice groups so that they are grouped in degree of importance for transmission to a decoder in importance order in accordance with an embodiment of the invention is conceptually illustrated in
A method of encoding video for transmission of video data within each frame based upon order of importance in accordance with an embodiment of the invention is illustrated in
While the above description contains many specific embodiments of the invention, these should not be construed as limitations on the scope of the invention, but rather as an example of one embodiment thereof. It is therefore to be understood that the present invention may be practiced otherwise than specifically described, without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive.