This disclosure relates generally to image processing and machine learning systems. More specifically, this disclosure relates to artificial intelligence deep learning for controlling aliasing artifacts.
Depending on three-dimensional (3D) video rendering quality, rendered video such as video game video can include many artifacts. In particular, aliasing artifacts such as jaggy, broken line, or dashed line artifacts are frequent and can cause significant image quality degradation. However, collecting one or more pairs of a degraded image and a high-quality image for training a neural network or other machine learning model is very difficult if a 3D model of the entire game and its rendering system are not available and only rendered 2D images are available. This tends to be the case when the 3D model is available only to the content and/or game provider and the rendering quality control system is not available to client devices.
This disclosure relates to artificial intelligence deep learning for controlling aliasing artifacts.
In a first embodiment, a method includes receiving a degraded image including aliasing artifacts. The method also includes inputting the degraded image to an image enhancement network. The method further includes processing, using the image enhancement network, the degraded image to remove one or more of the aliasing artifacts. In addition, the method includes outputting, by the image enhancement network, a restored high-quality image.
In a second embodiment, an electronic device includes at least one processing device configured to receive a degraded image including aliasing artifacts. The at least one processing device is also configured to input the degraded image to an image enhancement network. The at least one processing device is further configured to process, using the image enhancement network, the degraded image to remove one or more of the aliasing artifacts. In addition, the at least one processing device is configured to output, by the image enhancement network, a restored high-quality image.
In a third embodiment, a non-transitory machine-readable medium contains instructions that when executed cause at least one processor of an electronic device to receive a degraded image including aliasing artifacts. The non-transitory machine-readable medium also contains instructions that when executed cause the at least one processor to input the degraded image to an image enhancement network. The non-transitory machine-readable medium further contains instructions that when executed cause the at least one processor to process, using the image enhancement network, the degraded image to remove one or more of the aliasing artifacts. In addition, the non-transitory machine-readable medium contains instructions that when executed cause the at least one processor to output, by the image enhancement network, a restored high-quality image.
In a fourth embodiment, a method includes obtaining a high-quality image of an environment. The method also includes generating at least one degraded image of the environment by performing an aliasing artifact simulation on the obtained high-quality image. Performing the aliasing artifact simulation includes at least one of performing a broken line artifact simulation to introduce one or more broken line artifacts on one or more objects in the environment of the high-quality image and performing a jaggy artifact simulation to introduce jaggy edges to one or more other objects in the environment of the high-quality image.
In a fifth embodiment, an electronic device includes at least one processing device configured to obtain a high-quality image of an environment. The at least one processing device is also configured to perform an aliasing artifact simulation on the obtained high-quality image in order to generate at least one degraded image of the environment. To perform the aliasing artifact simulation, the at least one processing device is configured to at least one of perform a broken line artifact simulation to introduce one or more broken line artifacts on one or more objects in the environment of the high-quality image and perform a jaggy artifact simulation to introduce jaggy edges to one or more other objects in the environment of the high-quality image.
In a sixth embodiment, a non-transitory machine-readable medium contains instructions that when executed cause at least one processor of an electronic device to obtain a high-quality image of an environment. The non-transitory machine-readable medium also contains instructions that when executed cause the at least one processor to perform an aliasing artifact simulation on the obtained high-quality image in order to generate at least one degraded image of the environment. The instructions that when executed cause the at least one processor to perform the aliasing artifact simulation comprise instructions that when executed cause the at least one processor to at least one of perform a broken line artifact simulation to introduce one or more broken line artifacts on one or more objects in the environment of the high-quality image and perform a jaggy artifact simulation to introduce jaggy edges to one or more other objects in the environment of the high-quality image.
Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms “transmit,” “receive,” and “communicate,” as well as derivatives thereof, encompass both direct and indirect communication. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrase “associated with,” as well as derivatives thereof, means to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms “application” and “program” refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
As used here, terms and phrases such as “have,” “may have,” “include,” or “may include” a feature (like a number, function, operation, or component such as a part) indicate the existence of the feature and do not exclude the existence of other features. Also, as used here, the phrases “A or B,” “at least one of A and/or B,” or “one or more of A and/or B” may include all possible combinations of A and B. For example, “A or B,” “at least one of A and B,” and “at least one of A or B” may indicate all of (1) including at least one A, (2) including at least one B, or (3) including at least one A and at least one B. Further, as used here, the terms “first” and “second” may modify various components regardless of importance and do not limit the components. These terms are only used to distinguish one component from another. For example, a first user device and a second user device may indicate different user devices from each other, regardless of the order or importance of the devices. A first component may be denoted a second component and vice versa without departing from the scope of this disclosure.
It will be understood that, when an element (such as a first element) is referred to as being (operatively or communicatively) “coupled with/to” or “connected with/to” another element (such as a second element), it can be coupled or connected with/to the other element directly or via a third element. In contrast, it will be understood that, when an element (such as a first element) is referred to as being “directly coupled with/to” or “directly connected with/to” another element (such as a second element), no other element (such as a third element) intervenes between the element and the other element.
As used here, the phrase “configured (or set) to” may be interchangeably used with the phrases “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of” depending on the circumstances. The phrase “configured (or set) to” does not essentially mean “specifically designed in hardware to.” Rather, the phrase “configured to” may mean that a device can perform an operation together with another device or parts. For example, the phrase “processor configured (or set) to perform A, B, and C” may mean a generic-purpose processor (such as a CPU or application processor) that may perform the operations by executing one or more software programs stored in a memory device or a dedicated processor (such as an embedded processor) for performing the operations.
The terms and phrases as used here are provided merely to describe some embodiments of this disclosure but not to limit the scope of other embodiments of this disclosure. It is to be understood that the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. All terms and phrases, including technical and scientific terms and phrases, used here have the same meanings as commonly understood by one of ordinary skill in the art to which the embodiments of this disclosure belong. It will be further understood that terms and phrases, such as those defined in commonly-used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined here. In some cases, the terms and phrases defined here may be interpreted to exclude embodiments of this disclosure.
Examples of an “electronic device” according to embodiments of this disclosure may include at least one of a smartphone, a tablet personal computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop computer, a netbook computer, a workstation, a personal digital assistant (PDA), a portable multimedia player (PMP), an MP3 player, a mobile medical device, a camera, or a wearable device (such as smart glasses, a head-mounted device (HMD), electronic clothes, an electronic bracelet, an electronic necklace, an electronic accessory, an electronic tattoo, a smart mirror, or a smart watch). Other examples of an electronic device include a smart home appliance. Examples of the smart home appliance may include at least one of a television, a digital video disc (DVD) player, an audio player, a refrigerator, an air conditioner, a cleaner, an oven, a microwave oven, a washer, a dryer, an air cleaner, a set-top box, a home automation control panel, a security control panel, a TV box (such as SAMSUNG HOMESYNC, APPLETV, or GOOGLE TV), a smart speaker or speaker with an integrated digital assistant (such as SAMSUNG GALAXY HOME, APPLE HOMEPOD, or AMAZON ECHO), a gaming console (such as an XBOX, PLAYSTATION, or NINTENDO), an electronic dictionary, an electronic key, a camcorder, or an electronic picture frame. Still other examples of an electronic device include at least one of various medical devices (such as diverse portable medical measuring devices (like a blood sugar measuring device, a heartbeat measuring device, or a body temperature measuring device), a magnetic resonance angiography (MRA) device, a magnetic resonance imaging (MRI) device, a computed tomography (CT) device, an imaging device, or an ultrasonic device), a navigation device, a global positioning system (GPS) receiver, an event data recorder (EDR), a flight data recorder (FDR), an automotive infotainment device, a sailing electronic device (such as a sailing navigation device or a gyro compass), avionics, security devices, vehicular head units, industrial or home robots, automatic teller machines (ATMs), point of sales (POS) devices, or Internet of Things (IoT) devices (such as a bulb, various sensors, electric or gas meter, sprinkler, fire alarm, thermostat, street light, toaster, fitness equipment, hot water tank, heater, or boiler). Other examples of an electronic device include at least one part of a piece of furniture or building/structure, an electronic board, an electronic signature receiving device, a projector, or various measurement devices (such as devices for measuring water, electricity, gas, or electromagnetic waves). Note that, according to various embodiments of this disclosure, an electronic device may be one or a combination of the above-listed devices. According to some embodiments of this disclosure, the electronic device may be a flexible electronic device. The electronic device disclosed here is not limited to the above-listed devices and may include new electronic devices depending on the development of technology.
In the following description, electronic devices are described with reference to the accompanying drawings, according to various embodiments of this disclosure. As used here, the term “user” may denote a human or another device (such as an artificial intelligent electronic device) using the electronic device.
Definitions for other certain words and phrases may be provided throughout this patent document. Those of ordinary skill in the art should understand that in many if not most instances, such definitions apply to prior as well as future uses of such defined words and phrases.
None of the description in this application should be read as implying that any particular element, step, or function is an essential element that must be included in the claim scope. The scope of patented subject matter is defined only by the claims. Moreover, none of the claims is intended to invoke 35 U.S.C. § 112(f) unless the exact words “means for” are followed by a participle. Use of any other term, including without limitation “mechanism,” “module,” “device,” “unit,” “component,” “element,” “member,” “apparatus,” “machine,” “system,” “processor,” or “controller,” within a claim is understood by the Applicant to refer to structures known to those skilled in the relevant art and is not intended to invoke 35 U.S.C. § 112(f).
For a more complete understanding of this disclosure and its advantages, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
As noted above, depending on three-dimensional (3D) video rendering quality, rendered video such as video game video can include many artifacts. In particular, aliasing artifacts such as jaggy, broken line, or dashed line artifacts are frequent and can cause significant image quality degradation. However, collecting one or more pairs of a degraded image and a high-quality image for training a neural network or other machine learning model is very difficult if a 3D model of the entire game and its rendering system are not available and only rendered 2D images are available. This tends to be the case when the 3D model is available only to the content and/or game provider and the rendering quality control system is not available to client devices.
In some cases, a neural network or other machine learning model may be used to learn how to restore high-quality video from degraded video that includes aliasing. This disclosure provides for training a machine learning model to restore high-quality video from degraded video input(s) having aliasing artifacts. The machine learning model can utilize at least one pair of a degraded aliasing image and a high-quality image, which can be geometrically aligned. As noted above, for a device without access to a 3D model for 3D rendering, the only way to capture a pair of a high-quality image and a degraded image may be to capture the 2D rendered frame with different rendering quality settings. However, there may be geometrical alignment issues in this case. Since a graphics processing unit (GPU) can render only one 2D scene image during game play, the high-quality image and the degraded image may be captured at different times, which causes geometrical alignment problems between the two images.
This disclosure also provides various techniques for generating a pair of a degraded (aliasing) image and a high-quality image to be used in training a machine learning model how to restore high-quality images from degraded images. In various embodiments, these techniques include simulating aliasing artifacts from high-quality images to create simulated degraded images that can be used with their associated high-quality images during training. For this simulation, a set of aliasing images (target images) and a set of high-quality images may be given, but they may not be geometrically aligned and can include different content. Thus, the techniques of this disclosure include processes for simulating the aliasing artifacts from high-quality images so that the simulated aliasing appears similar to the aliasing in the target images. This disclosure also provides for simulating different types of aliasing artifacts, such as simulating broken thin lines and/or simulating jaggy artifacts without edge transitions.
Various embodiments of this disclosure include training a machine learning model to control aliasing artifacts using pair(s) of high-quality and degraded images to teach the machine learning model how to control aliasing artifacts, where a degraded image is generated from a high-quality image and where the degree of artifacts is controlled via an artifact simulation. Various embodiments of this disclosure also include replicating 3D rendering aliasing artifacts without a 3D model and rendering system, such as by providing an artifact simulation to generate 3D rendering artifacts based on 2D high-quality rendered images without a 3D model and a rendering system. Various embodiments of this disclosure also include performing a simulation of broken thin lines using an affine transform (2D rotation), such as by using an affine transform to simulate a broken line from an unbroken line in a 2D image, re-projecting the 2D image to a 3D space, and projecting the image back onto a 2D image grid by applying a rotation matrix to the 2D image followed by a nearest neighbor interpolation and then applying an inverse rotation matrix followed by bi-cubic interpolation. Various embodiments of this disclosure also include performing a simulation of jaggy artifacts without edge transitions, such as by generating an image with a jaggy artifact from a 2D high-quality rendered image without an edge transition, where the degree of the jaggy artifact is controllable.
Note that the various embodiments discussed below can be used in any suitable devices and in any suitable systems. Example devices in which the various embodiments discussed below may be used include various consumer electronic devices, such as smartphones, tablet computers, and televisions. However, it will be understood that the principles of this disclosure may be implemented in any number of other suitable contexts.
According to embodiments of this disclosure, an electronic device 101 is included in the network configuration 100. The electronic device 101 can include at least one of a bus 110, a processor 120, a memory 130, an input/output (I/O) interface 150, a display 160, a communication interface 170, or a sensor 180. In some embodiments, the electronic device 101 may exclude at least one of these components or may add at least one other component. The bus 110 includes a circuit for connecting the components 120-180 with one another and for transferring communications (such as control messages and/or data) between the components.
The processor 120 includes one or more processing devices, such as one or more microprocessors, microcontrollers, digital signal processors (DSPs), application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs). In some embodiments, the processor 120 includes one or more of a central processing unit (CPU), an application processor (AP), a communication processor (CP), or a graphics processing unit (GPU). The processor 120 is able to perform control on at least one of the other components of the electronic device 101 and/or perform an operation or data processing relating to communication or other functions. As described below, the processor 120 may perform various functions related to restoring degraded images to high-quality images using a machine learning model. As also described below, the processor 120 may perform various functions related to generating degraded images for use in training a neural network or other machine learning model. For instance, the processor 120 may process high-quality images from a training set to create the degraded images by simulating aliasing artifacts using the high-quality images.
The memory 130 can include a volatile and/or non-volatile memory. For example, the memory 130 can store commands or data related to at least one other component of the electronic device 101. According to embodiments of this disclosure, the memory 130 can store software and/or a program 140. The program 140 includes, for example, a kernel 141, middleware 143, an application programming interface (API) 145, and/or an application program (or “application”) 147. At least a portion of the kernel 141, middleware 143, or API 145 may be denoted an operating system (OS).
The kernel 141 can control or manage system resources (such as the bus 110, processor 120, or memory 130) used to perform operations or functions implemented in other programs (such as the middleware 143, API 145, or application 147). The kernel 141 provides an interface that allows the middleware 143, the API 145, or the application 147 to access the individual components of the electronic device 101 to control or manage the system resources. The application 147 may include one or more applications for restoring degraded images to high-quality images using a machine learning model. The application 147 may also include one or more applications for generating degraded images for use in training a neural network or other machine learning model. In some embodiments, the one or more applications 147 can perform such training of the neural network or other machine learning model. These functions can be performed by a single application or by multiple applications that each carry out one or more of these functions. The middleware 143 can function as a relay to allow the API 145 or the application 147 to communicate data with the kernel 141, for instance. A plurality of applications 147 can be provided. The middleware 143 is able to control work requests received from the applications 147, such as by allocating the priority of using the system resources of the electronic device 101 (like the bus 110, the processor 120, or the memory 130) to at least one of the plurality of applications 147. The API 145 is an interface allowing the application 147 to control functions provided from the kernel 141 or the middleware 143. For example, the API 145 includes at least one interface or function (such as a command) for filing control, window control, image processing, or text control.
The I/O interface 150 serves as an interface that can, for example, transfer commands or data input from a user or other external devices to other component(s) of the electronic device 101. The I/O interface 150 can also output commands or data received from other component(s) of the electronic device 101 to the user or the other external device.
The display 160 includes, for example, a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a quantum-dot light emitting diode (QLED) display, a microelectromechanical systems (MEMS) display, or an electronic paper display. The display 160 can also be a depth-aware display, such as a multi-focal display. The display 160 is able to display, for example, various contents (such as text, images, videos, icons, or symbols) to the user. The display 160 can include a touchscreen and may receive, for example, a touch, gesture, proximity, or hovering input using an electronic pen or a body portion of the user.
The communication interface 170, for example, is able to set up communication between the electronic device 101 and an external electronic device (such as a first electronic device 102, a second electronic device 104, or a server 106). For example, the communication interface 170 can be connected with a network 162 or 164 through wireless or wired communication to communicate with the external electronic device. The communication interface 170 can be a wired or wireless transceiver or any other component for transmitting and receiving signals, such as images.
The wireless communication is able to use at least one of, for example, WiFi, long term evolution (LTE), long term evolution-advanced (LTE-A), 5th generation wireless system (5G), millimeter-wave or 60 GHz wireless communication, Wireless USB, code division multiple access (CDMA), wideband code division multiple access (WCDMA), universal mobile telecommunication system (UMTS), wireless broadband (WiBro), or global system for mobile communication (GSM), as a communication protocol. The wired connection can include, for example, at least one of a universal serial bus (USB), high definition multimedia interface (HDMI), recommended standard 232 (RS-232), or plain old telephone service (POTS). The network 162 or 164 includes at least one communication network, such as a computer network (like a local area network (LAN) or wide area network (WAN)), Internet, or a telephone network.
The electronic device 101 further includes one or more sensors 180 that can meter a physical quantity or detect an activation state of the electronic device 101 and convert metered or detected information into an electrical signal. For example, one or more sensors 180 can include one or more cameras or other imaging sensors, which may be used to capture images of scenes. The sensor(s) 180 can also include one or more buttons for touch input, one or more microphones, a gesture sensor, a gyroscope or gyro sensor, an air pressure sensor, a magnetic sensor or magnetometer, an acceleration sensor or accelerometer, a grip sensor, a proximity sensor, a color sensor (such as an RGB sensor), a bio-physical sensor, a temperature sensor, a humidity sensor, an illumination sensor, an ultraviolet (UV) sensor, an electromyography (EMG) sensor, an electroencephalogram (EEG) sensor, an electrocardiogram (ECG) sensor, an infrared (IR) sensor, an ultrasound sensor, an iris sensor, or a fingerprint sensor. The sensor(s) 180 can further include an inertial measurement unit, which can include one or more accelerometers, gyroscopes, and other components. In addition, the sensor(s) 180 can include a control circuit for controlling at least one of the sensors included here. Any of these sensor(s) 180 can be located within the electronic device 101.
In some embodiments, the first external electronic device 102 or the second external electronic device 104 can be a wearable device or an electronic device-mountable wearable device (such as an HMD). When the electronic device 101 is mounted in the electronic device 102 (such as the HMD), the electronic device 101 can communicate with the electronic device 102 through the communication interface 170. The electronic device 101 can be directly connected with the electronic device 102 to communicate with the electronic device 102 without involving a separate network. The electronic device 101 can also be an augmented reality wearable device, such as eyeglasses, that includes one or more imaging sensors.
The first and second external electronic devices 102 and 104 and the server 106 each can be a device of the same or a different type from the electronic device 101. According to certain embodiments of this disclosure, the server 106 includes a group of one or more servers. Also, according to certain embodiments of this disclosure, all or some of the operations executed on the electronic device 101 can be executed on another or multiple other electronic devices (such as the electronic devices 102 and 104 or server 106). Further, according to certain embodiments of this disclosure, when the electronic device 101 should perform some function or service automatically or at a request, the electronic device 101, instead of executing the function or service on its own or additionally, can request another device (such as electronic devices 102 and 104 or server 106) to perform at least some functions associated therewith. The other electronic device (such as electronic devices 102 and 104 or server 106) is able to execute the requested functions or additional functions and transfer a result of the execution to the electronic device 101. The electronic device 101 can provide a requested function or service by processing the received result as it is or additionally. To that end, a cloud computing, distributed computing, or client-server computing technique may be used, for example. While
The server 106 can include the same or similar components 110-180 as the electronic device 101 (or a suitable subset thereof). The server 106 can support the electronic device 101 by performing at least one of the operations (or functions) implemented on the electronic device 101. For example, the server 106 can include a processing module or processor that may support the processor 120 implemented in the electronic device 101. As described below, the server 106 may perform various functions related to restoring degraded images to high-quality images using a machine learning model. As also described below, the server 106 may perform various functions related to generating degraded images for use in training a neural network or other machine learning model. For instance, the server 106 may process high-quality images from a training set to create the degraded images by simulating aliasing artifacts using the high-quality images.
Although
As shown in
As shown in
Since the simulated degraded image 206 is generated using the artifact simulation operation 204, this avoids a need to store all the pairs of high-quality images and degraded images for training the image enhancement network 208. Instead, only the high-quality images need to be stored, while the corresponding degraded images can be generated for temporary use during training from the high-quality images. However, it is also possible to store the generated degraded images for later use, if desired. As described in this disclosure, the architecture 200 can control the degree of artifacts introduced during the artifact simulation operation 204, which can make training more efficient and robust. Note that the architecture 200 described above can be used with any desired artifact simulation technique(s) used by the artifact simulation operation 204.
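As a concrete illustration of this arrangement, the following minimal sketch (not part of the original disclosure) pairs stored high-quality images with degraded counterparts generated on the fly. The function names, flat directory layout, and OpenCV-based image loading are illustrative assumptions.

```python
# Minimal sketch of on-the-fly training-pair generation; names and layout are
# illustrative assumptions, not the disclosed implementation.
import os
import cv2

def training_pairs(image_dir, simulate_artifacts):
    """Yield (degraded, high_quality) pairs. Only high-quality images are stored;
    each degraded counterpart is produced on the fly by the supplied
    simulate_artifacts callable (e.g., one of the simulations sketched later)."""
    for name in sorted(os.listdir(image_dir)):
        high = cv2.imread(os.path.join(image_dir, name))
        if high is None:  # skip files that are not readable images
            continue
        yield simulate_artifacts(high), high
```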
Although
As shown in
Although
As described in this disclosure, various types of image artifacts can be introduced to generate degraded images for use during training of a network, such as during the training of the image enhancement network 208 described with respect to
Broken thin line artifacts are a type of aliasing artifact often displayed by game streaming services. Since the images 401 and 501 are rendered with different settings and at different times, they are not geometrically aligned. Therefore, to generate a pair of an aliasing image and a high-quality image, an artifact simulation operation, such as the artifact simulation operation 204, can mimic similar aliasing artifacts of the target image using the high-quality image. The broken line artifact shown in
Although
It will be understood that the artifact simulation architecture 600 can be used as at least part of the artifact simulation operation 204 during training of an image enhancement network, such as described with respect to
The high-quality image 602 is also input to a thin shallow object with smooth background detection operation 608. The operation 608 uses a region detection process for the aliasing artifact. The broken line artifact type can occur when images have very thin objects around a smooth background. Since the angle of the thin object tends to be very shallow, meaning that the thin object is close to a vertical angle or a horizontal angle but not entirely vertical or horizontal, the artifact simulation architecture 600 can generate a detection map 609 (denoted as FThin) using the operation 608. Once the broken line artifact image 605, the jaggy artifact image 607, and the detection map 609 are generated, a blending operation 610 is performed to simulate the broken thin line artifacts in the image by blending the two different images 605 and 607 using the detection map 609. The blending operation 610 produces a degraded artifact image 612 (denoted as Iartifact) that includes the introduced artifacts. As described in this disclosure, the degraded artifact image 612 can be used with the high-quality image 602 during training of a machine learning model. It will be understood that the artifact simulation architecture 600 can be performed any number of times on any number of high-quality images to create a plurality of pairs of high-quality images and degraded images to be used for training.
The following now describes how various operations in the architecture 600 may operate in specific embodiments of this disclosure. The following details are for illustration and explanation only and do not limit the scope of this disclosure to these specific embodiments or details.
In some embodiments, the high-quality image 602 (Ihigh) is received by the architecture 600, and the broken line artifact generation operation 604 is performed on the high-quality image 602 to create the broken line artifact image 605 (IB). In some cases, the mathematical modeling for broken line artifact generation can be derived as follows. To simulate a broken line using a 2D high-quality image, the 3D rendering model can be defined as follows.
Here, I is a 2D image, I3D is a 3D image, S is 3D pixel coordinates, P is a camera matrix, and D is a degradation model. Note that the degradation model D models a sampling process on the 2D image. Using this model, the 2D high-quality image (Ihigh) can be rendered from the 3D image as follows.
Here, xhigh is high-quality image coordinates, and D1 is a degradation model for the high-quality image. Similarly, the low-quality 2D image (broken line image) can be modeled as follows.
Here, Ilow is the low-quality (broken line artifact) 2D image, xlow is low-quality image coordinates, and D2 is a degradation model for the low-quality image. Based on the above, it can be seen that the following equations can define the relationship between xlow and xhigh.
As mentioned above, degradation models (D1 and D2) sample the 2D continuous coordinates (PS) onto a discrete 2D image grid. Since xhigh is the 2D image coordinates of the high-quality rendered image, D3 (=D2 ∘ D1−1) converts xhigh to continuous coordinates first. Simulating broken thin lines from normal thin lines in a 2D image is challenging because these artifacts would occur when the 3D model is projected onto some 2D image planes with inappropriate sampling/interpolation. Since the operation 604 only has access to the high-quality image 602, which already samples/interpolates the projected 3D scenes on the image grid, simulating broken line artifacts becomes a difficult process. This disclosure provides, however, a technique that includes applying sampling/interpolation on the already-sampled 2D high-quality image 602. In some cases, to convert xhigh to continuous coordinates, the broken line artifact generation operation 604 applies an affine transform (H) operation (such as 2D rotation) on xhigh, followed by a sampling/interpolation operation (such as nearest neighbor (NN)) to synthesize the transformed image on a new image grid as shown below.
Here, Mnearest is nearest neighbor interpolation.
Note, however, that the transformed image may have a different geometric alignment with the original high-quality image 602. To align the degraded image with the high-quality image 602 (thus maintaining the same geometrical alignment with the high-quality image), the broken line artifact generation operation 604 can apply an inverse affine transform to the transformed image followed by another interpolation (such as bi-cubic interpolation). When the inverse affine transform is applied, the coordinates are changed back to continuous coordinates. To map the coordinates back onto the 2D image grid again, bi-cubic interpolation can be applied. In some cases, the inverse transform and bicubic interpolation can be expressed as follows.
Here, Mbicubic is bi-cubic interpolation.
Thus, in some embodiments, the broken line artifact image 605 (IB) can be generated using the following operations.
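The disclosed sequence of operations is not reproduced here. As a hedged stand-in, the following minimal sketch implements the rotate-and-resample procedure described above, assuming the affine transform is a small 2D rotation realized with OpenCV; the function name, default angle, and use of cv2.warpAffine are illustrative assumptions rather than the disclosed implementation.

```python
# Hedged sketch: rotate with nearest-neighbor sampling, then rotate back with
# bi-cubic interpolation so the output stays aligned with the input image.
import cv2

def simulate_broken_lines(high, theta_deg=1.5):
    h, w = high.shape[:2]
    H = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta_deg, 1.0)  # affine transform H
    H_inv = cv2.invertAffineTransform(H)                             # inverse transform
    rotated = cv2.warpAffine(high, H, (w, h), flags=cv2.INTER_NEAREST)
    return cv2.warpAffine(rotated, H_inv, (w, h), flags=cv2.INTER_CUBIC)
```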
The degree of the artifact can be controlled by the angle θ (such as when a larger θ generates more broken lines). As described above, to generate the final output degraded artifact image 612, the broken line artifact image 605 is blended in the blending operation 610 using the detection map 609 with a jaggy artifact image 607 that includes general jaggy artifacts introduced into the image. In various embodiments, to create the jaggy artifact image 607, the jaggy artifact generation operation 606 can perform a 2× down-sampling operation on the high-quality image 602 using, for example, nearest neighbor down-sampling, and the jaggy artifact generation operation 606 can then perform a 2× up-sampling on the image, such as using bicubic up-sampling.
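A minimal sketch of this down-sample/up-sample step follows, assuming OpenCV resizing; the function name and the exact interpolation flags are assumptions for illustration.

```python
# Hedged sketch of the general jaggy artifact generation (operation 606).
import cv2

def simulate_jaggy(high):
    h, w = high.shape[:2]
    small = cv2.resize(high, (w // 2, h // 2), interpolation=cv2.INTER_NEAREST)  # 2x NN down-sample
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)              # 2x bi-cubic up-sample
```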
To create the detection map 609, the thin shallow object with smooth background detection operation 608 can perform a thin object detection and a smooth background detection. The thin object detection can include performing, for example, a Canny edge detection (Ecanny), a morphological closing (Ecanny/close), a thin object region (Ethin) detection (which can include subtracting the edge detection from the morphological closing (Esub=Ecanny/close−Ecanny)), and a morphological dilation on the subtracted region (Ethin=dilate(Esub)).
The smooth background (Esb) detection can include counting a number of edge pixels from the Canny edge map (Ecanny) within a window (such as a 9×9 window). If the number of edge pixels is smaller than a threshold, the operation 608 can determine that the current pixel has a smooth background, and the operation 608 can perform a shallow thin object detection with a smooth background (Esh_thin/sb). On the region where a thin object with a smooth background (Ethin/sb=Ethin∩Esb) is present, the operation 608 can additionally determine an orientation of pixel gradients on a grayscale image and determine a number of pixels within a larger window (such as a 15×15 window) that have an orientation gradient close to 0°, 90°, or 180° but not equal to 0°, 90°, or 180°. If the number of pixels satisfying the shallow angle within the larger window is greater than a threshold, the operation 608 keeps the region. Otherwise, the operation 608 removes that region from the detection map 609.
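The sketch below approximates this detection pipeline with OpenCV primitives; the thresholds, kernel sizes, and the omission of the shallow-orientation check on the larger window are simplifying assumptions, and the function name is hypothetical.

```python
# Hedged sketch of the thin-object-on-smooth-background detection (operation 608).
import cv2
import numpy as np

def detect_thin_on_smooth(high, edge_count_thresh=10, win=9):
    gray = cv2.cvtColor(high, cv2.COLOR_BGR2GRAY)
    canny = cv2.Canny(gray, 100, 200)                                # E_canny
    kernel = np.ones((3, 3), np.uint8)
    closed = cv2.morphologyEx(canny, cv2.MORPH_CLOSE, kernel)        # E_canny/close
    sub = cv2.subtract(closed, canny)                                # E_sub
    thin = cv2.dilate(sub, kernel)                                   # E_thin
    # Smooth background: few Canny edge pixels inside a win x win window.
    counts = cv2.boxFilter((canny > 0).astype(np.float32), -1, (win, win), normalize=False)
    smooth = (counts < edge_count_thresh).astype(np.uint8) * 255     # E_sb
    return cv2.bitwise_and(thin, smooth)                             # approximate detection map
```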
In order to combine the artifacts introduced in both the broken line artifact image 605 and the jaggy artifact image 607, the blending operation 610 blends the broken line artifact image 605 (IB) with the jaggy artifact image 607 (IJ) using the detection map (FThin) (such as the shallow thin object with smooth background map (Esh_thin/sb)). In some cases, the blending can be expressed as follows.
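The disclosed expression is not reproduced here. Assuming FThin is a binary (0/1) mask with the same height and width as the two artifact images, a plausible sketch of the blend is a mask-weighted combination of the two images:

```python
# Hedged sketch of the blending operation 610; f_thin is assumed to be a 0/1 mask.
import numpy as np

def blend_artifacts(broken_line_img, jaggy_img, f_thin):
    m = (f_thin > 0)[..., None].astype(broken_line_img.dtype)  # broadcast mask over color channels
    return m * broken_line_img + (1 - m) * jaggy_img           # blended artifact image
```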
The degraded artifact image 612, which is the final blended image (Iartifact), is output by the blending operation 610.
For illustration,
Although
As described in this disclosure, various types of image artifacts can be introduced to generate degraded images for use during training of a network, such as during the training of the image enhancement network 208 described with respect to
When a 3D object with a background is projected onto a 2D image plane, the edges of the object can be well-sampled and interpolated to avoid aliasing. If inappropriate interpolation is applied, jaggy artifacts without edge transitions can occur. Since the input (such as a high-quality 2D image) can already include the appropriate interpolation around edges, the artifact simulation techniques of this disclosure can remove the interpolation. Also, the simulation techniques can control the degree of jaggy artifacts without edge transitions.
Although
It will be understood that the artifact simulation architecture 1000 can be used as at least part of the artifact simulation operation 204 during training of an image enhancement network, such as described with respect to
The following now describes how various operations in the architecture 1000 may operate in specific embodiments of this disclosure. The following details are for illustration and explanation only and do not limit the scope of this disclosure to these specific embodiments or details.
In some embodiments, the edge transition region detection operation 1004 can detect edges on the high-quality image 1002 and dilate the edge regions but can exclude horizontal and vertical edge regions. Also, in some embodiments, the jaggy artifact without transition generation operation 1006 uses the edge detection map 1005 with the input high-quality image 1002 to identify a current pixel in the detection map and check whether the neighboring pixels of the current pixel have colors similar to the current pixel. Note that neighbor pixels that are within the edge detection map 1005 do not have to be considered when checking color similarity. If there is a neighbor pixel that has a color similar to the current pixel, the jaggy artifact without transition generation operation 1006 can replace the current pixel with the neighbor pixel. Since the pixels within an edge transition region are detected, the jaggy artifact without transition generation operation 1006 can remove the edge transition, causing jaggy artifacts. As described in this disclosure, the degraded artifact image 1008 can be used with the high-quality image 1002 during training of a machine learning model. It will be understood that the artifact simulation architecture 1000 can be performed any number of times on any number of high-quality images to create a plurality of pairs of high-quality images and degraded images to be used for training.
In particular embodiments, the edge transition region detection operation 1004 can perform edge transition region detection by performing a Canny edge detection on a grayscale image and a morphological dilation on the detected edge region. The operation 1004 can exclude horizontal and vertical edges by using the same approach as the shallow edge detection described with respect to
In some embodiments, the jaggy artifact without transition generation operation 1006 can generate jaggy artifacts without edge transitions by performing the following operations.
The color similarity check and pixel replacement described above are iterated for each (i, j)th pixel of Ihigh(i, j, c) where Me(i, j) = 1. The degree of artifact can be controlled by the threshold Tmin. The resulting degraded artifact image 1008 is output by the operation 1006.
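To make the procedure concrete, here is a minimal sketch under stated assumptions: OpenCV edge detection, a 5×5 search window, an illustrative value of Tmin, and no exclusion of horizontal/vertical edges. The function name and defaults are not taken from the original disclosure.

```python
# Hedged sketch of the edge-transition detection (operation 1004) and the
# pixel-replacement step (operation 1006).
import cv2
import numpy as np

def simulate_jaggy_without_transition(high, t_min=20, win=5):
    gray = cv2.cvtColor(high, cv2.COLOR_BGR2GRAY)
    edge = cv2.dilate(cv2.Canny(gray, 100, 200), np.ones((3, 3), np.uint8))  # edge transition map
    out = high.astype(np.int16)
    h, w = gray.shape
    r = win // 2
    for i, j in zip(*np.nonzero(edge)):
        cur = out[i, j].copy()
        replaced = False
        # Search the win x win neighborhood for a similar pixel outside the edge map.
        for di in range(-r, r + 1):
            if replaced:
                break
            for dj in range(-r, r + 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < h and 0 <= nj < w and edge[ni, nj] == 0
                        and int(np.abs(out[ni, nj] - cur).max()) < t_min):
                    out[i, j] = out[ni, nj]   # remove the edge transition
                    replaced = True
                    break
    return np.clip(out, 0, 255).astype(np.uint8)
```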
For illustration,
As additional illustration,
Although
After generating the pairs of high-quality images and the simulated degraded images, the image enhancement network can be trained as described in
Although
As shown in
Although
As shown in
At step 1606, the machine learning model is trained using the generated at least one degraded image and the high-quality image of the environment. For example, this may include the processor 120 using the high-quality image and the degraded image as a pair during training of the machine learning model to teach the model how, given a degraded image, to enhance the image in order to obtain a high-quality image. That is, the high-quality image in the image pair acts as a ground truth with respect to the type of output image to be achieved, and the degraded image acts as an example input as to the types of degraded or artifact images the machine learning model will encounter during inferencing.
Since the simulated degraded image can be generated from the high-quality image, this avoids a need to store all the pairs of high-quality images and degraded images for training the image enhancement network. Instead, only the high-quality images need to be stored, while the corresponding degraded images can be generated for temporary use during training from the high-quality images (although this need not be the case). Also, as described in this disclosure, the degree of artifacts introduced during the method 1600 can be controlled, which can make training more efficient and robust. The method 1600 can be used with any desired artifact simulation technique(s). Additionally, the method 1600 can be performed any number of times to create any number of pairs of high-quality images and degraded images for use during training.
Training of the machine learning model can involve updating parameters of the model until an acceptable accuracy level is reached. For example, when a loss calculated using a loss function is larger than desired, the parameters of the model can be adjusted. Once adjusted, training can continue by providing the same or additional training data to the adjusted model, and additional outputs from the model (restored high-quality images) can be compared to the ground truths (input high-quality images from the training set) so that additional losses can be determined using the loss function. Eventually, the model produces more accurate outputs that more closely match the ground truths, and the measured loss becomes less. At some point, the measured loss can drop below a specified threshold, and the initial training of the model can be completed.
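For illustration, a minimal sketch of such a loss-driven update loop is shown below, assuming a PyTorch-style model and a re-iterable loader of tensor batches; the L1 loss, optimizer settings, and stopping threshold are assumptions rather than the disclosed training procedure.

```python
# Hedged sketch of the loss-driven update loop described above; all specific
# choices here are illustrative assumptions.
import torch

def train_until_converged(model, pair_loader, loss_threshold=0.01, max_epochs=100):
    """pair_loader yields (degraded, high_quality) tensor batches, for example
    built on the on-the-fly pair generation sketched earlier."""
    criterion = torch.nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(max_epochs):
        total, batches = 0.0, 0
        for degraded, high in pair_loader:
            restored = model(degraded)        # estimate of the high-quality image
            loss = criterion(restored, high)  # compare against the ground truth
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        if batches and total / batches < loss_threshold:
            break                             # measured loss fell below the target
    return model
```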
Although
As shown in
At step 1714, a general jaggy artifact simulation is performed on the high-quality image by introducing jaggy artifacts via a down-sampling operation and an up-sampling operation to generate a second aliasing artifact image. This may include, for example, the processor 120 of the electronic device 101 performing the jaggy artifact generation operation 606 as described with respect to
Although
As shown in
At step 1806, one or more pixels associated with an edge transition region are identified using the detection map. At step 1808, it is determined whether the one or more pixels have values within a threshold distance of one or more neighboring pixels in the high-quality image. Locating the one or more neighboring pixels can be performed using a window, such as a 5×5 window, on the high-quality image. At step 1810, the identified one or more pixels are replaced with the one or more neighboring pixels and a final degraded image is output. Steps 1806-1810 may include, for example, the processor 120 of the electronic device 101 performing the jaggy artifact without transition generation operation 1006 as described with respect to
Although
It should be noted that the functions shown in or described with respect to
Although this disclosure has been described with reference to various example embodiments, various changes and modifications may be suggested to one skilled in the art. It is intended that this disclosure encompass such changes and modifications as fall within the scope of the appended claims.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/527,921 filed on Jul. 20, 2023. This provisional application is hereby incorporated by reference in its entirety.