Embodiments of the invention relate to a device with image processing capability for enhancing picture quality.
Modern devices with image display capabilities typically perform image enhancement operations when displaying images. For example, a television may enhance images or videos to be displayed on a screen, and a smartphone may enhance images or videos captured by or displayed on the smartphone. However, a conventional device typically performs image enhancement operations based on algorithms or formulations pre-configured by the device manufacturer. There is limited flexibility in adjusting the algorithms or formulations once the device is in use by a consumer. Thus, there is a need for improving the design of an image processing device to allow more flexibility in picture quality adjustment.
In one embodiment, an image processing circuit is provided in a device. The image processing circuit includes memory to store a training database and a plurality of models; an attribute identification engine to identify an attribute from an input image based on a model stored in the memory; a picture quality (PQ) engine to generate an output image for display by enhancing the input image based on the identified attribute; a data collection module to generate a labeled image based on the input image labeled with the identified attribute, and to add the labeled image to the training database; and a training engine to re-train the model using the training database.
In another embodiment, a method performed by a device for image enhancement is provided. An attribute is identified from an input image based on a model stored in the device. By enhancing the input image based on the identified attribute, an output image for display is generated. A labeled image is generated based on the input image labeled with the identified attribute. The labeled image is added to a training database stored in the device, and the model is re-trained using the training database.
In yet another embodiment, a method performed by a device for image enhancement is provided. A user-identified attribute for an input image is received via a user interface. A labeled image is generated based on the input image labeled with the user-identified attribute. The labeled image is added to a training database, and a model is re-trained using the training database. An output image is generated for display by enhancing the input image based on the model.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
A device including an image processing circuit is described herein. A user may view images (e.g., a video) on a display panel coupled to the image processing circuit. The image processing circuit generates a training database containing the images that are labeled automatically and/or manually. The image processing circuit further uses the training database to re-train one or more models, based on which one or more attributes of the images are identified. A picture quality (PQ) engine enhances the quality of output images by changing certain image values associated with the identified attributes. If a user is not satisfied with the quality of output images shown on the display panel, the user may provide feedback to the device to help re-train the models that were used for generating the identified attributes. Thus, users can tailor the training database and the models according to their viewing experiences and preferences, and, as a result, the device provides flexibility in image quality adjustment.
The image processing circuit 100 includes an input port 135 for receiving an input image 131 and an output port 145 for outputting an output image 141, which, in this example, is the processed image of the input image 131. The output image 141 is sent to a display panel 160 for display. For ease of description, an input image and its corresponding output image are provided as an example. It is understood that the following description is applicable when the image processing circuit 100 receives an image sequence (e.g., a video) as input and generates a corresponding image sequence as output.
The image processing circuit 100 further includes a control module 110 which sends control signals (shown in dotted lines) to control and manage on-device training and inference operations. The control module 110 triggers training operations performed by a training engine 120 to train or re-train models 125 with labeled images from a training database 155. The control module 110 also triggers inference operations performed by an attribute identification engine 130 to identify attributes (i.e., representative characteristics) in the input image 131. In one embodiment, the attribute identification engine 130 may identify the attributes by inference and/or measurement based on one or more models 125. The attribute identified by the attribute identification engine 130 may be a type (e.g., a scene type or an object type), statistical information, or a feature in the image content. For example, the attributes may include a scene type, types of objects in a scene, contrast information (e.g., histogram or statistics), luminance information (e.g., histogram or statistics), edge directions and strength, noise and degree of blur, segmentation information, motion information, etc. In some embodiments, the attribute may be identified using a machine-learning or deep-learning algorithm.
In some embodiments, the image processing circuit 100 may be implemented in a system-on-a-chip (SoC). In some embodiments, the image processing circuit 100 may be implemented in more than one chip in the same electronic device.
In one embodiment, the attribute identification engine 130 may identify multiple attributes from an image (e.g., a scene type as well as contrast information), which are collectively referred to as an attribute set of the image. The attribute identification engine 130 may further generate a confidence level for an identified attribute; e.g., 75% confidence for the nature scene type. A high confidence level (e.g., a confidence level exceeding a threshold) indicates that the identified attribute has a correspondingly high probability of being correctly identified. The attribute identification engine 130 sends the attribute set to the PQ engine 140 and a data collection module 150.
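By way of illustration only, the following Python sketch shows one possible representation of an attribute set with per-attribute confidence levels. The Attribute type, the model.infer interface, and the 0.9 threshold are assumptions made for the sketch, not part of the disclosed circuit.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Attribute:
    name: str          # e.g., "scene_type"
    value: str         # e.g., "nature"
    confidence: float  # 0.0 to 1.0

def identify_attributes(image, models) -> Dict[str, Attribute]:
    # Run every model on the input image and collect the attribute set.
    # `model.attribute_name` and `model.infer(image)` are an assumed interface.
    attribute_set = {}
    for model in models:
        value, confidence = model.infer(image)
        attribute_set[model.attribute_name] = Attribute(
            name=model.attribute_name, value=value, confidence=confidence)
    return attribute_set

def is_high_confidence(attr: Attribute, threshold: float = 0.9) -> bool:
    # An attribute whose confidence exceeds the threshold is treated as
    # reliably identified; the threshold value here is an assumption.
    return attr.confidence >= threshold
```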
In one embodiment, the PQ engine 140 performs image enhancement operations on the input image 131 using image processing algorithms based on the attribute set of the input image 131. Different algorithms may be used for different attributes; e.g., an algorithm for noise reduction, another algorithm for a nature scene, and yet another algorithm for a scene type of food. In some embodiments, the PQ engine 140 may perform one or more of the following operations: de-noising, scaling, contrast adjustment, color adjustment, and sharpness adjustment. For example, the PQ engine 140 may increase the warmth of the image color in a food scene, increase the sharpness in a blurry image, and de-noise in a noisy image. The output of the PQ engine 140 is the output image 141, which is sent to the data collection module 150 and the output port 145.
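Continuing the sketch above, the following illustrates how enhancement operations might be dispatched according to the attribute set. The increase_warmth and box_denoise functions are deliberately simplistic stand-ins for the device's actual enhancement algorithms, and RGB channel order is assumed.

```python
import numpy as np

def increase_warmth(img, gain=1.08):
    # Stand-in for a color-adjustment algorithm: slightly boost the red
    # channel (channel 0, assuming RGB order).
    out = img.astype(np.float32)
    out[..., 0] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)

def box_denoise(img, k=3):
    # Stand-in for a de-noising algorithm: a naive k-by-k box filter.
    pad = k // 2
    padded = np.pad(img.astype(np.float32),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[0], img.shape[1]
    out = np.zeros_like(img, dtype=np.float32)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(np.uint8)

def enhance(image, attribute_set):
    # Select enhancement operations according to the identified attributes
    # (attribute_set maps names to the Attribute type sketched earlier).
    out = image
    scene = attribute_set.get("scene_type")
    if scene is not None and scene.value == "food":
        out = increase_warmth(out)
    noise = attribute_set.get("noise_level")
    if noise is not None and noise.value == "high":
        out = box_denoise(out)
    return out
```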
The data collection module 150 receives the output image 141 from the PQ engine 140, and also receives the input image 131 and the attribute set of the input image 131 from the attribute identification engine 130. In one embodiment, one or more identified attributes in the attribute set may be attached with respective confidence levels.
The data collection module 150 is the part of the image processing circuit 100 that provides labeled images to the training database 155. In a manual labeling approach, the input image 131 is labeled by a user. In an automatic labeling approach, the input image 131 is automatically labeled with identified attributes of high confidence levels. The automatic labeling and the manual labeling approaches are described in further detail below.
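A minimal sketch of the automatic-labeling path follows, reusing the Attribute type from the earlier sketch. The confidence threshold and the dictionary-based database schema are illustrative assumptions.

```python
CONFIDENCE_THRESHOLD = 0.9  # assumed value; the actual threshold is device-specific

def auto_label(input_image, attribute_set, training_database):
    # Keep only attributes identified with high confidence as labels.
    labels = {name: attr.value
              for name, attr in attribute_set.items()
              if attr.confidence >= CONFIDENCE_THRESHOLD}
    if labels:
        training_database.append({"image": input_image,
                                  "attributes": attribute_set,
                                  "labels": labels})
```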
The control module 110 may trigger the training engine 120 to perform training operations to train and/or re-train models 125 with the labeled images from the training database 155. The training operations may be performed periodically or based on events. For example, the training operations may start when the image processing circuit 100 enters a sleep state or an idle state. For an edge device with limited processing resources (e.g., a smart TV, a smartphone, an IoT device, etc.), the models 125 may be initially trained on a server such as a cloud server, and re-trained on the edge device by the training engine 120 based on images or videos viewed on the edge device. The training operations change the weights or parameters in the models 125, such as filter weights in an image filter, kernel weights in a neural network kernel, thresholds, etc. In some embodiments, the training operations may be performed by machine learning, deep learning, or other types of learning operations.
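The event-based triggering might be organized as follows. The device-state values, the sample-count threshold, and the training_engine.retrain call are interfaces assumed for this sketch.

```python
MIN_NEW_SAMPLES = 100  # assumed batch size before re-training is worthwhile

def maybe_retrain(device_state, training_database, trained_count, training_engine):
    # Re-train only when the device is idle or asleep and enough new labeled
    # images have accumulated; `training_engine.retrain` is an assumed API
    # that updates the weights or parameters of the models.
    new_samples = len(training_database) - trained_count
    if device_state in ("idle", "sleep") and new_samples >= MIN_NEW_SAMPLES:
        training_engine.retrain(training_database)
        return len(training_database)
    return trained_count
```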
In one embodiment, the manual labeling may be performed on demand by a user. A user may mark a displayed image as having poor picture quality; e.g., by selecting a button, and the marking action triggers the start of a manual labeling process. Alternatively, a user may request to start a manual labeling process at any time regarding any image attribute. The image processing circuit 100, in response, requests the user to label the displayed image or the corresponding input image with a correct value or type of an attribute, where the “correctness” may be determined from the user's perspective. In one embodiment, the user interface 320 may present the user with a number of selectable values or types to replace the device-identified attribute. Using the scene type as an example, the user interface 320 may present the user with options such as “people”, “food”, “nature”, “landmark” to select as the scene type attribute for an image. The user may select one of the presented types (e.g., people) to indicate the correct scene type attribute for the image. In one embodiment, the user may add a new label such as “animals” to indicate the correct scene type attribute for the image.
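One possible shape of this on-demand relabeling exchange is sketched below. The user_interface.choose call is hypothetical and stands in for whatever selection mechanism the user interface 320 provides.

```python
SCENE_TYPES = ["people", "food", "nature", "landmark"]

def request_correction(user_interface, image, device_value):
    # `user_interface.choose` is a hypothetical call that displays the options
    # and returns the user's selection, or a newly added label such as
    # "animals" when the user supplies a new type.
    choice = user_interface.choose(
        prompt=f"Scene type identified as '{device_value}'. Select the correct type:",
        options=SCENE_TYPES,
        allow_new_label=True)
    return {"image": image, "labels": {"scene_type": choice}}
```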
To improve the training accuracy, the data collection module 150 may retrieve, from the training database 155, multiple sample images that are similar to the user-labeled image with respect to an attribute of interest. In one embodiment, the data collection module 150 includes a sample select circuit 330, which selects sample images from the training database 155 and provides the selected sample images to the user. Each of the selected sample images has a confidence level exceeding a predetermined threshold with respect to the attribute of interest. For example, a sample image may be displayed on the display panel along with a list of selectable values or types of the attribute of interest. A user may label a sample image by selecting a value or type from the list. In some embodiments, a user may add a new value or a new type to the attribute of interest. Using the above example in which the scene type is the attribute of interest, each sample image may be presented with a list of “people”, “food”, “nature”, and “landmark” for the user to select from. Alternatively, the user may add “animals” to the list as a new option for the scene type. The manual labeling process ends when the user has labeled all of the sample images provided by the sample select circuit 330.
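The sample-selection step might be expressed as follows, assuming the same illustrative database schema used in the earlier sketches.

```python
def select_samples(training_database, attribute_name, value,
                   threshold=0.9, limit=10):
    # Return stored images whose device-identified value for the attribute of
    # interest matches `value` with confidence above the threshold.
    samples = []
    for entry in training_database:
        attr = entry["attributes"].get(attribute_name)
        if attr is not None and attr.value == value and attr.confidence >= threshold:
            samples.append(entry)
            if len(samples) == limit:
                break
    return samples
```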
However, a user can determine, from the image 410, that the correct scene type attribute should be “people.” For this image 410, the people scene type is a user-identified attribute that is different from the device-identified attribute of a tree scene type. In this example, the user may select the “people” tab to change the device-identified attribute for the image 410. The user-identified scene type of people becomes a label of the image 410.
After the user labels the image 410 with a corrected attribute, the image processing circuit 100 may present the user with a number of sample images that were previously identified as the people scene type. The user may label these sample images with respect to the scene type to indicate whether or not they were correctly identified as containing the people scene type.
The training engine 120 uses the labeled images from the training database 155 to re-train the models 125. The models 125 may have been trained to detect a feature (e.g., edge directions and strength, segmentation information, motion, etc.) in an image or an image sequence, classify the image content, measure a condition of an image (e.g., contrast, sharpness, brightness, luminance, noise, etc.), etc. The models 125 may be described by mathematical formulations or representations. The models 125 may initially be installed in the image processing circuit 100 and can be re-trained, or refined, with labeled images to learn from the user's image viewing experience on the device.
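As a toy illustration of how a training operation refines model weights, the following performs one gradient-descent update for a linear model; the actual models 125 would be re-trained with full machine-learning or deep-learning procedures.

```python
import numpy as np

def retrain_step(weights, features, target, lr=0.01):
    # One gradient-descent update on squared error for a linear model
    # (prediction = weights . features); a stand-in for the learning
    # operations performed on-device with labeled images.
    pred = float(weights @ features)
    grad = 2.0 * (pred - target) * features
    return weights - lr * grad
```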
The method 500 begins at step 510 with the device identifying an attribute from an input image based on a model stored in the device. At step 520, the device generates an output image for display by enhancing the input image based on the identified attribute. At step 530, the device generates a labeled image based on the input image labeled with the identified attribute. At step 540, the device adds the labeled image to a training database stored in the device. At step 550, the device re-trains the model using the training database. In one embodiment, the model may be re-trained on the device.
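Tying the sketched interfaces together, steps 510 through 550 might map to a single pass over an input image as follows; in practice, the re-training of step 550 would typically be deferred, e.g., to idle time as described above.

```python
def process_image(input_image, models, training_database, training_engine):
    attribute_set = identify_attributes(input_image, models)    # step 510
    output_image = enhance(input_image, attribute_set)          # step 520
    auto_label(input_image, attribute_set, training_database)   # steps 530-540
    training_engine.retrain(training_database)                  # step 550
    return output_image
```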
The CNN accelerator 612 includes hardware components specialized for accelerating neural network operations, such as convolution operations, fully-connected operations, activation, pooling, normalization, and element-wise mathematical computations. In some embodiments, the CNN accelerator 612 includes multiple compute units and memory (e.g., Static Random Access Memory (SRAM)), where each compute unit further includes multipliers and adder circuits, among others, for performing mathematical operations such as multiply-and-accumulate (MAC) operations to accelerate the convolution, activation, pooling, normalization, and other neural network operations. The CNN accelerator 612 may perform fixed-point and floating-point neural network operations. In connection with the picture quality enhancement described herein, the CNN accelerator 612 may perform the training and inference operations described above.
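For illustration, the multiply-and-accumulate loops underlying a 2-D convolution are shown below in serial form; the compute units of the CNN accelerator 612 would execute such MACs in parallel in hardware.

```python
import numpy as np

def conv2d_mac(feature_map, kernel):
    # The nested multiply-and-accumulate (MAC) loops of a valid 2-D
    # convolution, written serially for clarity.
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += feature_map[y + ky, x + kx] * kernel[ky, kx]  # one MAC
            out[y, x] = acc
    return out
```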
The device 600 further includes memory and storage hardware 620 coupled to the processing hardware 610. The memory and storage hardware 620 may include memory devices such as dynamic random access memory (DRAM), SRAM, flash memory, and other non-transitory machine-readable storage media; e.g., volatile or non-volatile memory devices. The memory and storage hardware 620 may further include storage devices, for example, any type of solid-state or magnetic storage device. In one embodiment, the memory and storage hardware 620 may store the models 125 and the training database 155 described above.
The device 600 may also include a display panel 630 to display information such as images, videos, messages, Web pages, games, texts, and other types of text, image, and video data. The images may be labeled by a user via a user interface, such as a keyboard, a touchpad, a touch screen, a mouse, etc. The device 600 may also include audio hardware 640, such as a microphone and a speaker, for receiving and generating sounds. The audio hardware 640 may also provide a user interface for sending and receiving voice commands.
In some embodiments, the device 600 may also include a network interface 650 to connect to a wired and/or wireless network for transmitting and/or receiving voice, digital data and/or media signals. It is understood that the embodiment of the device 600 is simplified for illustration purposes; additional hardware components not described herein may also be included.
The operations of the flow diagram described above have been explained with reference to the exemplary embodiments; however, it should be understood that these operations can be performed by embodiments of the invention other than those specifically discussed herein.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit of U.S. Provisional Application No. 63/016,344 filed on Apr. 28, 2020, the entirety of which is incorporated by reference herein.