Embodiments of the present invention relate generally to techniques for processing decoded video data and, more particularly, to techniques for generating and adding random noise to mask visual compression artifacts in decoded video data.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present invention, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
As the popularity of mobile and portable electronic devices continues to grow, the demand for network-based digital multimedia has also increased. For example, many portable electronic devices, such as cellular phones and portable media players, are now capable of wirelessly connecting to and communicating through the Internet or through other networks, such as local or wide area networks, allowing a user to download or stream multimedia. However, transfer rates for downloading or streaming media are typically limited by the maximum bandwidth of a particular network, and may be further limited by any other additional network traffic simultaneously occurring on the particular network (e.g., concurrent downloads and transfers by other users).
To provide an example, it is not uncommon for video data having a window size of 320×240 pixels and a frame rate of 15 frames per second (fps) to be encoded at a bit rate of 300 kilobits/second (kb/s). At this bit rate, approximately 128.7 megabytes (MB) is required to represent one hour of video data. Thus, to stream the 128.7 MB of video data in real time, a network must support a consistent bandwidth of at least 300 kb/s, which may be well above the capabilities of some wireless or local networks. Alternatively, if a user decides to download and store a copy of the video data locally on a device, or to temporarily store the video data in memory (e.g., caching or buffering) for playback, the transfer rate for the download is still limited by the maximum network bandwidth. As such, the user may have to wait an excessive length of time for a download to complete before being able to view the video data. Moreover, mobile and portable electronic devices may be limited by the amount of storage space or memory available. Accordingly, downloading and/or storing local copies of very large video files may be impractical for some mobile or portable electronic devices.
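The figures above may be verified with simple arithmetic. The following sketch (assuming binary megabytes, i.e., 1 MB = 1024² bytes) reproduces the approximately 128.7 MB figure:

```python
# One hour of video encoded at 300 kilobits per second.
bit_rate_bps = 300 * 1000                      # 300 kb/s in bits per second
seconds_per_hour = 3600
total_bits = bit_rate_bps * seconds_per_hour   # 1,080,000,000 bits
total_bytes = total_bits / 8                   # 135,000,000 bytes
total_mb = total_bytes / (1024 ** 2)           # binary megabytes
print(round(total_mb, 1))                      # prints 128.7
```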
One method for overcoming the aforementioned drawbacks of streaming media is through video compression, which refers generally to techniques for reducing the quantity of video data used to represent video images while retaining as much of the original video image quality as possible. By compressing video data prior to transmission across a network and subsequently decoding the compressed video data on the receiving mobile or portable device, the total amount of video data transferred is reduced, thereby reducing the bit rate and the bandwidth required to transmit the digital video. For example, one such video compression standard, H.264 (also known as MPEG-4 Part 10), provides a video compression algorithm capable of maintaining high image quality while compressing video data by a factor of more than 30.
Disadvantageously, most video compression standards use lossy data compression techniques, in which data determined by a particular compression algorithm to be of lesser importance to the overall content is discarded, even though its absence may nonetheless be discernible and objectionable to the user. As a result, certain video compression algorithms may introduce visual artifacts into the decoded video stream, which may be distracting to a user when viewing the decoded video data. Such visual artifacts are generally attributable to the latent error in lossy data compression and may appear more frequently as higher video compression rates are used. Moreover, such artifacts are exacerbated when the decoded video images are scaled to larger high definition displays. One solution for reducing the impact of visual artifacts is to introduce random noise into the video stream after the compressed video is decoded. This technique is often referred to as “random dithering.” Although the added noise does not eliminate the visual artifacts, it may reduce the ability of the user to perceive the artifacts, thus rendering them less distracting to the human eye and improving the overall aesthetic appearance of the video images.
Certain aspects of embodiments disclosed herein by way of example are summarized below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of certain forms an invention disclosed and/or claimed herein might take and that these aspects are not intended to limit the scope of any invention disclosed and/or claimed herein. Indeed, any invention disclosed and/or claimed herein may encompass a variety of aspects that may not be set forth below.
The present disclosure generally relates to techniques for introducing random noise into a decoded video stream to reduce or mask the visibility of compression artifacts. In accordance with one embodiment of the invention, an exemplary technique may provide a noise generation system for determining a random noise addend value and combining the noise addend value with received video data. To determine the noise addend value, a random number may be generated and compared with one or more threshold values defining multiple non-overlapping noise distribution threshold ranges, which may be selected based on a noise distribution function. Comparison logic may be further provided to determine a threshold comparison result corresponding to the threshold range in which the random number belongs. Based on the threshold comparison result, a corresponding noise addend value may be selected from a noise addend value range, and subsequently combined with compressed video data (e.g., a pixel), essentially providing a DC offset with respect to the original value of the video data, thus introducing random “dither” noise to the video data. The random number may be generated using a random number generator having a degree of randomness, such that no obvious or noticeable repeating dithering patterns are perceivable in a single frame of video, or across multiple frames over time (e.g., in the temporal domain).
In accordance with another aspect of the present invention, the noise generation system may include a storage device storing multiple noise distribution functions, such that the threshold values and noise distribution threshold ranges may be adaptively adjusted for each frame of video data, for example, by selecting a different noise distribution function for each video image frame. The selection of the noise distribution function may be based on one or more characteristics or properties of the received video data. In accordance with a further aspect of the present invention, the noise addend value range may be configurable to select noise addend values from a first range or from a second, narrower range. The abilities to adaptively adjust the threshold values and threshold ranges, as well as to provide multiple ranges from which the noise addend values may be selected, provide increased flexibility in the distribution of the random noise, thus optimizing the overall masking of visual compression artifacts in the video data.
Various refinements of the features noted above may exist in relation to various aspects of the present disclosure. Further features may also be incorporated in these various aspects as well. These refinements and additional features may exist individually or in any combination. For instance, various features discussed below in relation to one or more of the illustrated embodiments may be incorporated into any of the above-described aspects of the present invention alone or in any combination. Again, the brief summary presented above is intended only to familiarize the reader with certain aspects and contexts of embodiments of the present disclosure without limitation to the claimed subject matter.
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description of certain exemplary embodiments is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
One or more specific embodiments of the present invention will be described below. These described embodiments are only exemplary of the present invention. Additionally, in an effort to provide a concise description of these exemplary embodiments, all features of an actual implementation may not be described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
An exemplary electronic device 10 is illustrated in
In one or more embodiments, the device 10 may allow a user to connect to and communicate through the Internet or through other networks, such as local or wide area networks, for streaming video image data. The device 10, in other embodiments, may also allow users to download and store video data locally for later playback on the device 10. By way of example, the electronic device 10 may be a model of an iPod® or an iPhone®, available from Apple Inc. of Cupertino, Calif.
In certain embodiments, the device 10 may be powered by one or more rechargeable and/or replaceable batteries. Such embodiments may be highly portable, allowing a user to carry the electronic device 10 while traveling, working, exercising, and so forth. In this manner, and depending on the functionalities provided by the electronic device 10, a user may play and view video files (e.g., clips, movies, etc.), listen to music, play games, record video or take pictures, place and receive telephone calls, communicate with other users via a network, control other devices (e.g., via remote control and/or Bluetooth functionality), and so forth while moving freely with the device 10. In addition, device 10 may be sized such that it fits relatively easily into a pocket or a hand of the user. While certain embodiments of the present invention are described with respect to a portable electronic device, the presently disclosed techniques may be applicable to a wide array of other, less portable, electronic devices and systems that are configured to render graphical data, such as a desktop computer, notebook computer, or tablet computer. By way of example, the device 10 may be a model of a Macbook®, Macbook® Pro, Macbook® Air, Mac® Pro, iMac®, Mac Mini®, or iPad®. In further embodiments, the device 10 may be a television set, or a networked digital media receiver, such as Tivo, available from Tivo Inc., or Apple TV®, available from Apple Inc., or may be a television set with the functionality of a networked digital media receiver device integrated therein.
As shown, the electronic device 10 includes an enclosure or housing 12, a display 14, user input structures 16, and input/output connectors 18. The enclosure 12 may be formed from plastic, metal, composite materials, or other suitable materials, or any combination thereof. The enclosure 12 may protect the interior components of the electronic device 10 from physical damage, and may also shield the interior components from electromagnetic interference (EMI).
The display 14 may be a liquid crystal display (LCD), a light emitting diode (LED) based display, an organic light emitting diode (OLED) based display, or any other suitable display. In accordance with certain embodiments of the present invention, the display 14 may display a user interface and various other images, such as logos, avatars, photos, album art, and the like, as depicted by reference numeral 15. Additionally, in one embodiment, the display 14 may include a touch screen through which a user may interact with the user interface. The display may also include various function and/or system indicators to provide feedback to a user, such as power status, call status, memory status, or the like. These indicators may be incorporated into the user interface displayed on the display 14.
As will be understood by those skilled in the art, many display systems are, for various reasons, not capable of displaying or sensing the different color channels at the same site. Therefore, in some embodiments, the display 14 of the device 10 may also include a pixel array divided into single-color regions such as red, blue, and green, such that a “pixel” is made up of 3 sub-pixel components. The sub-pixel components contribute to the displayed or sensed color when viewed at a distance. However, for the purposes of this disclosure, the term “pixel” or the like should be interpreted as meaning either pixel or sub-pixel components of a video image. By way of example, a device providing 8 data bits (256 levels) each for red, blue, and green pixel components may be capable of providing a total of 16,777,216 (256³) color combinations. In one embodiment, the display 14 may be a high-resolution LCD display having 300 or more pixels per inch, such as a model of the Retina Display®, available from Apple Inc.
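The color-combination figure follows directly from the bit depth, as the brief sketch below illustrates:

```python
# 8 bits per sub-pixel component yields 256 intensity levels per channel;
# the red, blue, and green channels combine multiplicatively.
levels_per_channel = 2 ** 8             # 256
total_colors = levels_per_channel ** 3
print(total_colors)                     # prints 16777216
```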
In one embodiment, one or more of the user input structures 16 are configured to control the device 10, such as by controlling a mode of operation, an output level, an output type, etc. For instance, the user input structures 16 may include a button to turn the device 10 on or off. Further, the user input structures 16 may allow a user to interact with the user interface on the display 14. Embodiments of the portable electronic device 10 may include any number of user input structures 16, including buttons, switches, a control pad, a scroll wheel, or any other suitable input structures. The user input structures 16 may work with the user interface displayed on the device 10 to control functions of the device 10 and/or any interfaces or devices connected to or used by the device 10. For example, the user input structures 16 may allow a user to navigate a displayed user interface or to return such a displayed user interface to a default or home screen.
The device 10 also includes various input and output (I/O) ports 18 to allow connection of additional devices. For example, a port 18 may be a headphone jack that provides for the connection of headphones. Additionally, a port 18 may have both input/output capabilities to provide for connection of a headset (e.g., a headphone and microphone combination). Embodiments of the present invention may include any number of input and/or output ports, such as headphone and headset jacks, universal serial bus (USB) ports, IEEE-1394 ports, AC and/or DC power connectors, or a Thunderbolt® port. Further, the device 10 may use the I/O ports 18 to connect to and send or receive data with any other device, such as other portable electronic devices, personal computers, printers, or the like. For example, in one embodiment, the device 10 may connect to a personal computer via a Thunderbolt® connection to send and receive data files, such as media files. Further, in one embodiment, the device 10 may not include an integrated display, and instead may include an external display that is coupled to the device 10 using one of the I/O ports 18, which may include a VGA, DVI, HDMI, Thunderbolt, or DisplayPort interface.
Turning now to
As discussed herein, the user interface 20 may be displayed on the display 14, and may provide a means for a user to interact with the electronic device 10. The user interface may be a textual user interface, a graphical user interface (GUI), or any combination thereof, and may include various layers, windows, screens, templates, elements, or other components that may be displayed in all or in part of the display 14. The user interface 20 may, in certain embodiments, allow a user to interface with displayed interface elements via one or more user input structures 16 and/or via a touch sensitive implementation of the display 14. In such embodiments, the user interface provides interactive functionality, allowing a user to select, either by touch screen or another input structure, from among options displayed on the display 14. Thus the user can operate the device 10 by appropriate interaction with the user interface 20, such as by touching the screen with the user's finger or with a stylus.
The processor(s) 22 may provide the processing capability required to execute the operating system, programs, user interface 20, and any other functions of the device 10. The processor(s) 22 may include one or more microprocessors, such as one or more “general-purpose” microprocessors, one or more special-purpose microprocessors and/or ASICs, or some combination thereof. For example, the processor 22 may include one or more reduced instruction set (RISC) processors and/or x86 processors, as well as graphics processors, video decoders, video processors, image signal processors, and/or related chip sets. By way of example only, the processor(s) 22 may include a model of a system-on-a-chip (SOC) processor available from Apple Inc., such as a model of the A4 or A5 SOC processors.
As noted above, embodiments of the electronic device 10 may also include a memory 24. The memory 24 may include a volatile memory, such as random access memory (RAM), and/or a non-volatile memory, such as read-only memory (ROM). The memory 24 may store a variety of information and may be used for various purposes. For example, the memory 24 may store the firmware for the device 10, such as an operating system, as well as other programs that enable various functions of the device 10, including user interface functions and processor functions. Moreover, the memory 24 may be used for buffering or caching data, such as video image data, during operation of the device 10.
The non-volatile storage 26 of device 10 of the presently illustrated embodiment may include ROM, flash memory, a hard drive, or any other suitable optical, magnetic, or solid-state storage medium, or a combination thereof. The storage 26 may store data files such as media (e.g., music and video files), software (e.g., for implementing functions on device 10), preference information (e.g., media playback preferences), wireless connection information (e.g., information that may enable the device 10 to establish a wireless connection, such as a telephone connection), subscription information (e.g., information that maintains a record of podcasts, television shows, or other media to which a user subscribes), telephone information (e.g., telephone numbers), and any other suitable data. The embodiment illustrated in
The device 10 also includes a power source 30. In one embodiment, the power source 30 may be one or more batteries, such as a Li-Ion battery, which may be user-removable or secured to the housing 12, and which may or may not be rechargeable. Additionally, the power source 30 may include AC power, such as provided by an electrical outlet, and the device 10 may be connected to the power source 30 via the I/O ports 18. Additionally, though not shown in
The device 10 depicted in
The operation of the noise generation system 34 may be better understood through reference to
In the illustrated embodiment, the noise generation system 34 may include a noise generation circuit 40 for processing each received pixel for addition of random noise by determining a noise addend value, which may be greater than zero, less than zero, or equal to zero (e.g., no noise added), and summing the determined noise addend value with the received pixel data. The processed pixel may then be outputted to the display 14 via an output data bus 44. The noise generation system 34 may repeat this operation for each pixel of the decoded video data buffered in the memory 24 until the entire stream of the decoded video data is processed. The procedure for determining the noise addend values will be discussed in further detail below. Further, although not explicitly illustrated in
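By way of illustration only, the per-pixel operation described above may be sketched as follows; the clamping of the result to the valid pixel range is an assumption for the sketch, not a feature expressly described in this embodiment:

```python
def apply_dither(pixel: int, noise_addend: int, bit_depth: int = 8) -> int:
    """Sum a determined noise addend with received pixel data.

    The result is clamped to the valid range for the given bit depth
    (clamping is an illustrative assumption, not a stated requirement).
    """
    max_value = (1 << bit_depth) - 1
    return max(0, min(max_value, pixel + noise_addend))

# A noise addend of zero leaves the pixel unchanged; nonzero addends
# provide a small DC offset with respect to the original pixel value.
print(apply_dither(100, -1))   # prints 99
print(apply_dither(255, 2))    # prints 255 (clamped)
```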
In the illustrated embodiment, the noise generation system 34 may be configured to select a noise distribution function for each frame of the buffered video data. The noise distribution function may be selected from a noise distribution function storage unit 42 containing multiple noise distribution functions. Each noise distribution function may define particular ranges and distribution of noise addend values based on one or more threshold values, which may be applied to the processing of the buffered video data. The added noise may be referred to herein as dither noise. In accordance with embodiments of the present invention, the dither noise may be added at the video source resolution (e.g., the source resolution at which the video data is received by the device 10, prior to any scaling). By way of example only, source resolutions may include any suitable resolution from 480p to 1080p. Additionally, the pixel data to which the dither noise is added may be in YUV 4:2:0 format in one embodiment. As can be appreciated, YUV formats take human visual perception into account when encoding a color image or video. For instance, YUV formats may allow for reduced bandwidth for chrominance components, which may enable transmission errors and/or compression artifacts to be more efficiently masked via human perception.
The noise generation system 34 may further include logic for selecting a noise distribution function based on one or more video characteristics of a particular frame, such as the degree or method of video compression, color range, or amount of spatial or motion detail, just to name a few, in order to provide an optimal distribution of random noise in a particular frame of the decoded video data for masking compression artifacts. The logic for selecting the noise distribution function may be implemented in either the noise generation circuit 40 or the noise distribution function storage 42. In the illustrated embodiment, the noise distribution function storage 42 may be provided by a non-volatile storage device, such as a ROM, flash memory, a hard drive, or any other suitable optical, magnetic, or solid-state storage medium, or a combination thereof. In one embodiment, the noise distribution function storage 42 may be implemented as a standalone non-volatile storage unit separate from the main non-volatile storage 26 of the device 10. In alternate embodiments, the noise distribution function storage 42 may be included as part of the main non-volatile storage 26.
While
The noise generation system 34′ may repeat this operation, processing the decoded video data buffered in the memory 24 in subsequent groups of eight pixels until an entire stream of decoded video data is processed and outputted to the display 14. Further, as discussed above, each of the noise generation circuits 40a-40h may include its own processing unit and memory, and may receive power from the main power source 30 of the device 10. In other embodiments, the noise generation circuits 40a-40h may share a common memory and/or common processing unit.
In the illustrated embodiment, the noise generation system 34′ may also include a noise distribution function storage unit 42 having multiple noise distribution functions stored therein. For each frame of video data processed, the noise generation system 34′ may include logic for selecting a noise distribution function from the noise distribution function storage unit 42 based on one or more video characteristics of the current video frame, such as the degree or method of video compression, color range, or amount of spatial or motion detail, in order to provide the optimal distribution of random noise in a particular frame of the decoded video data for masking compression artifacts. In the illustrated embodiment, the selected noise distribution function may be applied to all the noise generation circuits 40a-40h during the processing of the current video frame. As discussed above, the noise distribution function storage 42 may be implemented as a standalone non-volatile storage unit separate from the main non-volatile storage 26 of the device 10 or, in other embodiments, may be included as part of the main non-volatile storage 26.
In one particular embodiment, the noise generation circuits 40 or 40a-40h may be implemented as a hardware block which may be standalone and separate from a main video processing unit, such as a graphics processing unit (GPU). The hardware block may be capable of processing the video data independently of a GPU, thus allowing more flexible and complex dithering algorithms to be utilized without lags or interruptions in the streamed video. For example, as discussed above, the threshold values and the noise addend values may be dynamically set on a frame by frame basis in accordance with a noise distribution function which may be optimally selected based on one or more image characteristics of the current video frame. Further, because the processing of the video data by a hardware block effectively offloads this task from the GPU, power may be saved, thereby extending the overall battery life in portable and mobile devices, such as the device 10.
The operation of the noise generation circuits 40 and 40a-40h described respectively in
The random number generator 48 is provided for generating a random number. The random number generator 48 may be implemented by any suitable hardware-based random or pseudo-random number generator or via a processing unit adapted for executing a random number generating algorithm. In one embodiment, the random number generator 48 may generate an 8-bit random number ([7:0]) having a range of values: 0-255. However, it should be noted that in alternate embodiments, the random number generator 48 may be implemented to generate a number within a smaller or larger numerical range of values depending on specific implementation and design goals of the device 10. For instance, in one embodiment, the random number generator 48 may include a linear feedback shift register (LFSR), which may be implemented in software and/or hardware. As can be appreciated by those skilled in the art, an LFSR is a shift register whose input bit is a linear function of a previous state and, when properly configured, may provide a high degree of randomness without noticeable or obvious repeating patterns. In the presently disclosed embodiments, the random number generator may be configured such that no noticeable or obvious repeating dithering patterns are present within a single output frame, or in the temporal domain, i.e., over time across multiple frames.
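As one illustrative software sketch of such a generator, the following implements an 8-bit Fibonacci LFSR. The tap positions (8, 6, 5, 4), corresponding to a known maximal-length polynomial, are an assumption for the sketch, as the embodiment does not specify particular taps:

```python
def lfsr8(state: int):
    """8-bit Fibonacci LFSR with XOR taps at bit positions 8, 6, 5, and 4.

    With a maximal-length tap polynomial, the generator cycles through all
    255 nonzero 8-bit states before repeating. Seed with any nonzero value.
    """
    if not 0 < state < 256:
        raise ValueError("seed must be a nonzero 8-bit value")
    while True:
        # The new input bit is a linear (XOR) function of the tapped bits
        # of the previous state.
        bit = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | bit) & 0xFF
        yield state

gen = lfsr8(0x5A)
sequence = [next(gen) for _ in range(255)]
print(len(set(sequence)))      # prints 255 -- full period, no early repeats
```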
The generated random number is outputted by the random number generator 48, as depicted by reference numeral 50, and received by the threshold comparison logic 52. The threshold comparison logic 52 may select one of multiple non-overlapping threshold ranges defined by one or more threshold values. As will be discussed in further detail below, the one or more threshold values which define the non-overlapping threshold ranges may be set in accordance with a selected noise distribution function, which may be adaptively adjusted for each frame of the video data. In one embodiment, the threshold comparison logic 52 compares the random number 50 against the one or more threshold values stored in the threshold comparison logic 52 to determine a threshold range to which the random number 50 belongs. Based on the determined threshold range, a threshold comparison result 54 is outputted from the threshold comparison logic 52 and received by a noise addend determination logic block 56 which may select a noise addend value from multiple noise addend values based upon the received threshold comparison result 54.
Referring now to
Based on the value of R, one of the five threshold ranges defined in Table 1 may be selected and outputted as the threshold comparison result 54, which is received by the noise addend determination logic 56. As discussed above, the noise addend determination logic 56 may select a noise addend value from a range of noise addend values, based upon the threshold comparison result 54. In one embodiment, the noise addend determination logic 56 may select noise addend values from a noise addend value range of +2 to −2 (e.g., +2, +1, 0, −1, and −2) depending on the values of T1, T2, T3, and T4, such that each of the five threshold ranges illustrated in Table 1 corresponds to one of the noise addend values. The following table illustrates one possible implementation of this embodiment:
As illustrated in Table 2, if the threshold comparison result 54 indicates R&lt;T1, the noise addend determination logic 56 may select +2 as the noise addend value. If the threshold comparison result 54 indicates T1≤R&lt;T2, the noise addend value of +1 may be selected. If the threshold comparison result 54 indicates T2≤R≤T3, the zero value noise addend (indicating no change in the pixel) may be selected. Additionally, if the threshold comparison result 54 indicates T3&lt;R≤T4, a noise addend value of −1 may be selected and, similarly, a noise addend value of −2 may be selected if the threshold comparison result 54 indicates T4&lt;R.
To provide one example, Table 3 below illustrates five threshold ranges and demonstrates the selection of a corresponding noise addend value when the threshold values T1, T2, T3, and T4 are set to 50, 100, 150, and 200, respectively.
Based on the values set for T1, T2, T3, and T4, the five threshold ranges are defined as 0-49, 50-99, 100-150, 151-200, and 201-255. Therefore, if the threshold comparison result 54 indicates that the value of R is in the range 0-49, the noise addend determination logic 56 may select +2 as the noise addend value. If the threshold comparison result 54 indicates that the value of R is in the range 50-99, the noise addend value of +1 may be selected. Further, if the threshold comparison result 54 indicates that the value of R is in the range 100-150, the zero value noise addend (indicating no change in the pixel), may be selected. Additionally, if the threshold comparison result 54 indicates that the value of R is in the range 151-200, a noise addend value of −1 may be selected and, similarly, a noise addend value of −2 may be selected if the threshold comparison result 54 indicates that the value of R is in the range 201-255, where 255 is the maximum possible value for R, in accordance with the presently illustrated embodiment.
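The comparisons of Tables 2 and 3 may be sketched in software as follows; this is an illustrative model only, as the embodiment describes a hardware implementation:

```python
def select_noise_addend(r: int, t1: int = 50, t2: int = 100,
                        t3: int = 150, t4: int = 200) -> int:
    """Map a random number R (0-255) to a noise addend per Table 2:
    R<T1 -> +2, T1<=R<T2 -> +1, T2<=R<=T3 -> 0, T3<R<=T4 -> -1, T4<R -> -2.
    Default thresholds are the example values of Table 3.
    """
    if r < t1:
        return 2
    if r < t2:
        return 1
    if r <= t3:
        return 0
    if r <= t4:
        return -1
    return -2

# Boundary checks against the five ranges of Table 3:
print(select_noise_addend(49))    # prints 2  (range 0-49)
print(select_noise_addend(100))   # prints 0  (range 100-150)
print(select_noise_addend(201))   # prints -2 (range 201-255)
```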
The noise addend determination logic 56 may be implemented using any suitable type of selection circuitry. In the illustrated embodiment, the noise addend determination logic 56 is provided by a 5-to-1 multiplexer circuit 76 configured to receive the threshold comparison result 54 as a control input. Based on the control input 54, a noise addend value may be selected from a range (e.g., +2, +1, 0, −1, −2) of possible noise addend values 78 and outputted from the multiplexer 76, as depicted by reference numeral 58. The addition of these noise addend values effectively provides a DC offset with respect to the original value of the video data, thus introducing the random dither noise to the video data.
Further, in the presently illustrated embodiment, in addition to being capable of providing a noise addend in a range of +2 to −2, the threshold comparison logic 52 and noise addend determination logic 56 may also be configurable to provide a noise addend value in a lesser range of +1 to −1, for example. For instance, when a smaller range is preferred, the threshold values may be set such that T1 and T4 are equal to the minimum and maximum values possible for R, respectively. Table 4 below illustrates an example in which the threshold values T1, T2, T3, and T4 are configured to provide the smaller range of noise addend values from +1 to −1:
As discussed above, the value of the random number R may have a range of 0-255. Thus, as illustrated in Table 4, the threshold comparison result R<0 will never occur because the minimum value for R will never be less than 0. Similarly, the threshold comparison result 255<R will also never occur because the maximum value for R will never be greater than 255. Accordingly, the noise addend values +2 and −2 are never selected when the values of T1 and T4 are set to the minimum and maximum values for R, respectively. Therefore, the present configuration effectively yields only three possible threshold ranges, 0-119, 120-180, and 181-255, from which the noise addend values +1, 0, and −1 may be selected, respectively.
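This collapse to three effective ranges may be verified with a short sketch (illustrative Python; the intermediate threshold values T2=120 and T3=180 are inferred from the three ranges stated above and are not given explicitly in Table 4):

```python
def select_noise_addend(r, thresholds):
    """Map r (0-255) to an addend using the five-range scheme."""
    t1, t2, t3, t4 = thresholds
    if r < t1:
        return 2
    elif r < t2:
        return 1
    elif r <= t3:
        return 0
    elif r <= t4:
        return -1
    return -2

# With T1 = 0 and T4 = 255, the comparisons R < 0 and 255 < R can
# never succeed, so the addends +2 and -2 are unreachable and only
# +1, 0, and -1 remain selectable for R in 0-255.
reachable = {select_noise_addend(r, (0, 120, 180, 255)) for r in range(256)}
```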
The flexibility to provide noise addend values from either a range of +2 to −2 or a range of +1 to −1 allows for increased optimization in the masking of visual compression artifacts. As discussed above, the threshold values and noise addend values applied to the processing of video data may be based upon a particular noise distribution function, which, in some embodiments, may be selected from multiple noise distribution functions to optimize the amount of random noise added to each video frame. For example, the noise distribution function may be selected based on certain properties of the video data, such as degree or method of video compression, color range, or amount of spatial or motion detail. Thus, the amount of dithering noise added to each frame may be configurable (e.g., via software). Further, dithering may be disabled on certain frames under certain conditions. For instance, dithering may be disabled if it is determined that the video stream has very few, if any, compression artifacts. Additionally, dithering may be disabled during a freeze-frame or pause mode, in which the user freezes or stops the video at a particular frame. In such cases, the display controller continues to read the same data from a frame buffer, thus displaying the same frame for the duration of the freeze frame. If dithering were enabled in such situations, the image might appear to sparkle due to the changing dithering pattern as the frame repeats. This sparkling effect may be undesirable and, therefore, dithering may be disabled under such conditions.
As illustrated in
The noise distribution function storage unit 42 may be implemented as a standalone non-volatile storage unit separate from the main non-volatile storage 26 of the device 10 or, in other embodiments, may be included as part of the main non-volatile storage 26. Because the noise distribution function storage unit 42 may store and provide access to multiple noise distribution functions, the threshold values defining the selectable threshold ranges and noise addend values may be adaptively adjusted (e.g., by selecting an appropriate noise distribution function) for each frame of video data in order to optimize the masking of visual compression artifacts in the decoded video stream.
In certain embodiments, the selection of the noise distribution function may be based on one or more video image characteristics of the current image frame, such as the degree or method of video compression, color range, or amount of spatial or motion detail, to name a few. For example, when processing a 1 megabit/second heavily compressed video stream, it may be desirable to distribute the threshold values and/or noise addend values such that for each frame processed, approximately ⅓ of the pixels are positively offset (e.g., +1 or +2), approximately ⅓ of the pixels remain unchanged (e.g., 0), and the remaining ⅓ of the pixels are negatively offset (e.g., −1 or −2). For instance, if the random number generator 48 generates numbers in the range of 0-255, as discussed above, then ⅓ of the range, or approximately 85 of the possible 256 values, may be set to correspond to a noise addend value of either +1 or +2 (positive offset). Similarly, another ⅓ of the random number range may be set to correspond to a noise addend value of 0 (no change), and the remaining ⅓ of the random number range may be set to correspond to a noise addend value of either −1 or −2 (negative offset).
Further, within the ranges corresponding to the positive and negative offsets, approximately ⅓ of the values in each range may correspond to a noise addend value of +1 or −1, respectively, whereas the remaining ⅔ of the values within each range may correspond to a noise addend value of +2 or −2, respectively. Table 5 below illustrates one embodiment of the distribution of threshold and noise addend values reflecting the particular noise distribution described above:
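One illustrative way to realize such a distribution is to translate the desired per-addend counts over the 256 possible values of R into cumulative threshold boundaries (a sketch; the function names are hypothetical, and the counts of 57/28/86/28/57 are one rounding of the thirds described above and may differ slightly from Table 5):

```python
from collections import Counter

def thresholds_from_counts(n_p2, n_p1, n_0, n_m1, n_m2):
    """Derive (T1, T2, T3, T4) from the number of R values (out of 256)
    that should map to each addend +2, +1, 0, -1, -2, given the ranges
    R<T1, T1<=R<T2, T2<=R<=T3, T3<R<=T4, T4<R."""
    assert n_p2 + n_p1 + n_0 + n_m1 + n_m2 == 256
    t1 = n_p2               # +2 occupies [0, T1)
    t2 = t1 + n_p1          # +1 occupies [T1, T2)
    t3 = t2 + n_0 - 1       #  0 occupies [T2, T3] (inclusive)
    t4 = t3 + n_m1          # -1 occupies (T3, T4]; -2 takes the rest
    return (t1, t2, t3, t4)

def select_addend(r, t1, t2, t3, t4):
    if r < t1:
        return 2
    if r < t2:
        return 1
    if r <= t3:
        return 0
    if r <= t4:
        return -1
    return -2

# ~1/3 positive, ~1/3 zero, ~1/3 negative; within the positive and
# negative thirds, ~2/3 of the values map to +/-2 and ~1/3 to +/-1.
t1, t2, t3, t4 = thresholds_from_counts(57, 28, 86, 28, 57)
counts = Counter(select_addend(r, t1, t2, t3, t4) for r in range(256))
```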
As will be appreciated, the probability of each noise addend value may depend on the selected noise distribution function, which may be programmable/configurable for each color component. For instance, each color channel in the video data may have a different noise distribution function selected.
The various threshold values and noise addend characteristics described above are meant merely to provide illustrative examples of one possible implementation of the present invention. It should be noted that other embodiments of the present invention need not be limited to the present examples and may utilize additional or fewer thresholds, different threshold values, and a different range of noise addend values depending on specific design goals or constraints. For example, if the video data being processed has a low compression rate, it may be desirable to set the threshold values and noise distribution such that fewer pixels are changed by the noise generation system 34. In one embodiment, the noise distribution functions are configured such that the DC bias on the final output (e.g., a single frame of the video stream) has an average of zero. That is, on average, for every pixel offset by +1 or +2, another pixel is offset by −1 or −2, respectively, so that the average DC bias is zero.
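The zero-average DC bias property may be checked with a brief sketch: it holds whenever the distribution is symmetric, i.e., the count of R values selecting +1 equals that selecting −1, and likewise for +2 and −2 (illustrative Python; the counts shown are hypothetical):

```python
def mean_dc_bias(counts):
    """Expected noise addend for a uniformly random R, given a mapping
    of addend value -> number of R values (out of 256) selecting it."""
    total = sum(counts.values())
    return sum(addend * n for addend, n in counts.items()) / total

# Symmetric counts (+1 matches -1, +2 matches -2) yield zero mean bias;
# shifting weight toward the positive side introduces a nonzero bias.
symmetric = {2: 57, 1: 28, 0: 86, -1: 28, -2: 57}
skewed = {2: 60, 1: 28, 0: 86, -1: 28, -2: 54}
```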
Referring again to
Further, in accordance with aspects of the present disclosure, the dither noise, as determined by the selected noise distribution function for a given frame, is identically distributed but may be independent for each color component for that frame. That is, a different noise distribution function may be selected for each color component. Thus, the distribution of noise for each color component is the same for every pixel of the same color, but may differ between color channels where different noise distribution functions are selected for each color component in a given frame. In some embodiments, the noise distribution function may be modeled based on a Gaussian distribution.
The received pixel data 62 may be processed by summing the received pixel data 62 with the selected noise addend 58 (e.g., the output from the multiplexer circuit 76 of
The illustrated embodiment provides a clamping function 66 to remedy this problem. The clamping function 66 may be configured to compare the processed pixel data 64 with an upper limit value that is less than the maximum of the color range and with a lower limit value that is greater than the minimum of the color range. For example, referring to the above-discussed embodiment, an upper limit value may be less than 255 and a lower limit value may be greater than 0. If the processed pixel data 64 falls within the range defined by the upper and lower limit values, the processed pixel data 64 is outputted to the display 14, as indicated by the output data bus 44. However, if the processed pixel data exceeds the upper limit or falls below the lower limit, the clamping function 66 may normalize the processed pixel data 64 to fall within the range defined by the upper and lower limits to reduce the probability of pixel overrun. In one embodiment, the clamping function 66 may normalize the processed pixel 64 by clipping the value of the processed pixel 64 to be equivalent to the value of the upper or lower limit. By way of example, the presently illustrated clamping function 66 may include an upper limit value of 245 and a lower limit value of 10, such that any processed pixel data 64 having a value higher than the upper limit value is normalized to a value of 245, and such that any processed pixel data 64 having a value lower than the lower limit value is normalized to a value of 10. By clipping or normalizing each processed pixel data 64 in accordance with these limits, the occurrence of pixel overrun is significantly reduced, if not eliminated.
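The clipping behavior of the clamping function 66 may be sketched as follows (illustrative Python; the default limits of 10 and 245 match the example above, and the function name is hypothetical):

```python
def clamp_pixel(value, lower=10, upper=245):
    """Clip a processed pixel value into [lower, upper] to reduce the
    probability of pixel overrun near the ends of the 0-255 range."""
    if value > upper:
        return upper   # e.g., 250 is normalized to 245
    if value < lower:
        return lower   # e.g., 3 is normalized to 10
    return value       # in-range values pass through unchanged
```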
In alternate embodiments, the clamping function 66, rather than clipping the processed pixel data 64, may simply discard the processed pixel data 64 if it falls outside the range defined by the upper and the lower limits and output the original unmodified pixel data 62 to the display 14 instead. Additionally, other embodiments may provide for dynamically adjustable upper and lower limit values for each frame of video data which, like the threshold and noise addend values, may be determined by a noise distribution function selected from the noise distribution function storage unit 42 based upon one or more image characteristics of the current video frame for the optimal distribution of random noise to mask visual compression artifacts in the video stream.
Referring now to
At step 104, a noise distribution function is selected for the current video frame. As discussed above, in some embodiments of the present invention, the noise distribution function may be selected from multiple stored noise distribution functions stored, for example, in the noise distribution function storage unit 42. Additionally, the noise distribution function may be selected to provide an optimal distribution of random noise in a particular frame of the decoded video data for masking compression artifacts, and may be based upon one or more image characteristics of the current frame, such as the degree or method of video compression, color range, or amount of spatial or motion detail. Based on the selected noise distribution function, one or more thresholds and corresponding noise addend values are determined, as illustrated at step 106. As discussed above, based on the threshold values, the noise addend value range may be configurable to provide noise addend values in a range of +2 to −2 or in a range of +1 to −1.
At step 108, a pixel of the decoded video data is received. In order to determine the amount of random noise to add to the received pixel data from step 108, a random number is generated, as illustrated by step 110, and compared to multiple threshold values at step 112. As discussed above, based on the comparison, it can be determined into what threshold range the random number from step 110 falls.
Further, based upon the threshold comparison of step 112, a corresponding noise addend value may be selected at step 114, and summed with the received pixel data (from step 108) at step 116. At step 118, the processed pixel data of step 116 is compared with upper and lower limit values. For example, as discussed above, the upper limit value may be set lower than the maximum color value for the pixel data (e.g., 255 for an 8-bit pixel), and the lower limit value may be set higher than the minimum color value for the pixel data (e.g., 0). A determination is made, as illustrated at step 120, as to whether the processed pixel data falls within the range defined by the upper and lower limit values. If the processed pixel data is within the range, the processed pixel is outputted at step 122, for example, to the display 14. However, if the value of the processed pixel is greater than the upper limit value or less than the lower limit value, the processed pixel is normalized at step 124 to fall within the range defined by the upper and lower limit values. In one embodiment discussed above, normalizing the pixel data may be accomplished by clipping the pixel to the upper limit value if the value of the processed pixel is greater than the upper limit value, or by clipping the processed pixel to the lower limit value if the value of the processed pixel is less than the lower limit value. The normalized pixel data may be outputted at step 126, for example, to the display 14. As discussed above, a normalization process, as illustrated by steps 120, 124 and 126, is desirable to prevent the occurrence of pixel overrun.
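Steps 108 through 126 may be sketched end to end for a single pixel (a simplified illustrative model; the threshold values, clamping limits, and seeded random generator are for demonstration only and are not part of the claimed embodiment):

```python
import random

def process_pixel(pixel, thresholds=(50, 100, 150, 200),
                  lower=10, upper=245, rng=random.Random(0)):
    """Add dither noise to one pixel: generate a random number,
    select a noise addend by threshold comparison, sum it with the
    pixel, then clamp the result into [lower, upper]."""
    r = rng.randrange(256)            # step 110: random number 0-255
    t1, t2, t3, t4 = thresholds
    if r < t1:                        # steps 112-114: select addend
        addend = 2
    elif r < t2:
        addend = 1
    elif r <= t3:
        addend = 0
    elif r <= t4:
        addend = -1
    else:
        addend = -2
    summed = pixel + addend           # step 116: sum with pixel data
    return max(lower, min(upper, summed))   # steps 118-126: clamp
```

For a mid-range pixel the output differs from the input by at most 2; pixels near the extremes of the color range are clipped to the limits, preventing pixel overrun.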
Following the output of the processed pixel data from step 122 or the normalized pixel data from step 126, a determination is made as to whether the outputted pixel is the last pixel of the current video frame, as indicated by step 128. If the current video frame still has remaining pixels to be processed, then the next pixel of the current video frame is received, as indicated by step 130, and the process for determining the random noise to add to the next pixel of the current video frame repeats starting from step 110. If, however, all pixels in the current video frame have been processed, a determination is made at step 132 as to whether there are additional video frames to be processed. If it is determined that there are no more video frames to process for addition of random noise, this generally indicates that the video data has completed processing, and has been displayed and played back (e.g., by display 14) in its entirety. As such, the process ends at step 134. However, if the video data has additional video frames remaining to be processed, the next video frame is received for processing. At step 136, a noise distribution function based on the next video frame is selected, and the process returns to step 106, wherein the threshold values and noise addend values are adjusted in accordance with the newly selected noise distribution function from step 136.
The steps described above with regard to comparing a random number to multiple thresholds and determining a noise addend value based on the threshold comparison may be better understood through reference to
As illustrated at block 150, the random number generated in step 110 of
If, at decision step 154, R is greater than or equal to T1, then R is compared against a second threshold value T2, as indicated at step 160. At decision step 162, if it is determined that R is less than T2, a threshold comparison result indicating a threshold range of T1≦R<T2 is set by step 164. Based on the threshold comparison result of step 164, a noise addend value of +1 is selected at step 166.
Moreover, at decision step 162, if R is greater than or equal to T2, then R is compared against a third threshold value T3, as indicated at step 168. At decision step 170, if it is determined that R is less than or equal to T3, a threshold comparison result indicating a threshold range of T2≦R≦T3 is set by step 172. Based on the threshold comparison result of step 172, a noise addend value of 0, indicating no change to the pixel data received at step 108 of
Finally, if, at decision step 170, R is greater than T3, then R is compared against a fourth threshold value T4, as indicated at step 176. At decision step 178, if it is determined that R is less than or equal to T4, a threshold comparison result indicating a threshold range of T3<R≦T4 is set by step 180. Based on the threshold comparison result of step 180, a noise addend value of −1 is selected at step 182. However, if, at decision step 178, R is determined to be greater than T4, then a threshold comparison result indicating a threshold range of T4<R is set by step 184 and, based on the threshold comparison result of step 184, a noise addend value of −2 is selected at step 186. The selected noise addend value from steps 158, 166, 174, 182, or 186 is subsequently outputted, as indicated by step 188, and summed with the pixel data received at step 116 of
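The cascade of decision steps 150 through 188 may be sketched as follows (illustrative Python; the comments map each branch to the corresponding flowchart step, and the function name is hypothetical):

```python
def select_addend(r, t1, t2, t3, t4):
    """Cascaded threshold comparison mirroring decision steps 154-186."""
    if r < t1:       # step 154: R < T1
        return 2     # step 158: select +2
    if r < t2:       # step 162: T1 <= R < T2
        return 1     # step 166: select +1
    if r <= t3:      # step 170: T2 <= R <= T3
        return 0     # step 174: select 0 (no change)
    if r <= t4:      # step 178: T3 < R <= T4
        return -1    # step 182: select -1
    return -2        # steps 184-186: T4 < R, select -2
```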
While
In general, the device 200 may be configured to decode and present a network display stream originating from an external device. The device 200 may provide support for display streams containing protected content, including content protected by digital copyright protection measures, such as digital rights management (DRM). In one embodiment, such protection measures may include FairPlay®, available from Apple Inc., or High-bandwidth Digital Content Protection (HDCP), available from Intel Corporation of Santa Clara, Calif.
The device 200 may provide a standard H.264, High-Profile, Level 4.1 video decoder 214 capable of providing an output at 1080p resolution and 60 frames per second (fps). In one embodiment, the decoder 214 may be configured to support the H.264 Fidelity Range Extensions, which may support YCbCr 4:2:2 and 4:4:4 formats. The decoder 214 may also support out-of-loop deblocking filtering to improve visual quality and performance by smoothing edges between macroblocks within video frames.
The device 200 may include a processor 216. In one embodiment, the processor may be a model of a reduced instruction set (RISC) processor available from ARM Holdings, PLC, of Cambridge, United Kingdom, such as a model of the ARM7, ARM9, ARM11, or Cortex processors. In one embodiment, the processor 216 may include at least 32 KB of I-cache and D-cache, and may include an ARM Debug Access Port (DAP) controller and an ARM Generic Interrupt Controller (GIC). The processor 216 may also include a digital audio processor 215.
The device 200 also includes memory 202, which may be low-power memory in one embodiment, such as low-power DDR2 supporting a PHY interface. The memory 202 may be configured to support decoding of 1080p video at 60 fps or greater, rotation of 1080p video at 60 fps or greater, and display refresh (with and without overlay) of 1080p video at 60 fps or greater, with headroom to support CPU control of a video decoder 214, audio forwarding functions, and AES (Advanced Encryption Standard) processing. The memory 202 may also support any other desired resolutions/frame rates, e.g., 1080p at 30 fps, 720p at 60 fps, 1080p at 120 fps, etc. The device 200 also includes a memory-to-memory rotation engine, which may be part of a scaler/rotator block 218. The rotation engine may support orientation correction for a decoded video stream at 0, 90, 180, and 270 degrees. The rotation engine 218 may support YCbCr 4:2:0, 2-plane NV12 source and destination pixel formats.
A display interface is also provided as part of the device 200. The display interface includes a display controller 220 that may include the video noise injection logic described above. The video noise injection logic may be part of a display pipeline implemented in the display controller 220, and may help reduce perceived artifacts introduced by video compression, as discussed above. In the illustrated embodiment, the display interface includes an external display interface for connection to an external display (e.g., separate from the device). For instance, the interface may include an HDMI interface 222 configured to support HDCP 223 and output video at 1080p resolution and 60 fps or greater.
The display pipeline also includes a video scaling block (which may be part of block 218). The video scaler may support high-quality upscaling and downscaling. For instance, in one embodiment, the video scaler may include a polyphase filter (having at least 16 phases) with at least 8 horizontal taps and at least 4 vertical taps, supporting scaling of up to 2× in each dimension. Additionally, the display pipeline may support a blend function to combine a fixed-size ABGR8888 UI plane with the decoded/scaled YCbCr 4:2:0, 2-plane video stream. For example, a programmable color space conversion function may be applied to the video display stream so that the blend function may be implemented in RGB. In one embodiment, this color space conversion function may retain 10-bit precision per RGB component. The blend function may be performed with 10-bit precision per RGB component, and the final display pipeline stage may implement a matrix dither function to reduce the RGB pixel depth to 8 bits per component. As a result, the display pipeline presents an RGB888 stream to the HDMI interface in the present embodiment.
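The matrix dither stage may be sketched using a standard 4×4 Bayer (ordered dither) matrix, one common choice for such depth reduction; the embodiment does not specify a particular matrix or algorithm, so this example is purely illustrative:

```python
# Standard 4x4 Bayer ordered-dither matrix, values 0-15.
BAYER_4X4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]

def dither_10_to_8(value10, x, y):
    """Reduce a 10-bit component (0-1023) to 8 bits (0-255) using an
    ordered-dither threshold that depends on pixel position (x, y)."""
    # The two discarded bits span 4 sub-levels; round up when the
    # remainder exceeds a position-dependent threshold, so the average
    # over a 4x4 tile approximates the original 10-bit value.
    base, frac = divmod(value10, 4)             # frac in 0-3
    threshold = BAYER_4X4[y % 4][x % 4] / 4.0   # 0.0 to 3.75
    out = base + (1 if frac > threshold else 0)
    return min(out, 255)                        # clamp at the 8-bit max
```

Because the threshold varies with position, neighboring pixels of the same 10-bit value round in different directions, trading spatial noise for the banding that plain truncation would cause.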
The device 200 may also include a USB host controller. The host controller may be compatible with the EHCI standard and may support controlling devices over HSIC (high-speed inter-chip) or over a conventional USB 2.0 physical layer. Additionally, the device 200 may include an on-the-go (OTG) USB device controller accessed externally via a conventional USB 2.0 interface to support tethered network display applications.
With regard to security requirements, the device 200 may support a secure boot loader, which may be implemented in a secure ROM (e.g., boot ROM 230). The size of the secure ROM depends on the target software. For instance, in one embodiment, the secure boot ROM may provide at least 128 KB of storage for boot code, and the device 200 may include an SRAM that temporarily caches the boot image while the DRAM interface is configured. Further, as discussed above, the device 200 may provide an AES function capable of decoding a stream at up to 100 Mbps. The AES block 228 may be configured to perform functions compatible with certain DRM technologies, such as FairPlay®.
While the above-discussed features of the noise generation circuit 40 have been described primarily with reference to hardware elements, it shall be appreciated by those skilled in the art that the functions carried out by the noise generation circuit 40 are not limited strictly to hardware components. Indeed, alternate embodiments of the foregoing techniques may also be implemented fully in software, such as a computer program including executable code stored on one or more tangible computer-readable media, or via a combination of both hardware and software elements.
Moreover, while the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. Therefore, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.
This application is a Non-Provisional Patent Application of U.S. Provisional Patent Application No. 61/502,194, entitled “Video Noise Injection System and Method”, filed Jun. 28, 2011, which is herein incorporated by reference.