The present disclosure relates to an electronic device for obtaining a depth map from a coded image and a method for operating the same. In particular, the present disclosure relates to an electronic device and a method operated by the electronic device for obtaining a coded image by using a phase mask having a coded aperture pattern and obtaining a depth map from the obtained coded image by using an artificial intelligence model.
A deconvolution method is used in related art as a technique for restoring an in-focus clear image from an out-of-focus blurry image. The deconvolution method is a technique for obtaining an out-of-focus distorted image by using a phase mask with a coded aperture that causes chromatic aberration and astigmatism, then obtaining a normal in-focus image from the obtained image according to a point spread function (PSF) indicating the correlation between distortion and depth. A phase mask used in known deconvolution methods is located in a part of a diaphragm of a camera for uniform phase control across fields, and is implemented in the form of a film produced by etching an aperture through a method such as photolithography or the like.
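The correlation between blur and depth that the deconvolution method exploits can be sketched in a few lines. The box-shaped PSF below is a deliberately simplified, hypothetical model (a real coded aperture produces a far more structured PSF); it only illustrates that the blur footprint of a point source grows with depth:

```python
def psf_kernel(depth):
    """Hypothetical box PSF whose radius equals the depth index."""
    n = 2 * depth + 1                 # depth 0 -> single tap (in focus)
    return [1.0 / n] * n

def code_signal(signal, depth):
    """Convolve an in-focus 1-D signal with the depth-dependent PSF."""
    kernel = psf_kernel(depth)
    radius = len(kernel) // 2
    coded = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - radius        # zero-padded at the borders
            if 0 <= k < len(signal):
                acc += w * signal[k]
        coded.append(acc)
    return coded
```

A point source at depth 0 passes through unchanged, while the same source at depth 1 spreads across three samples, so the observed blur encodes the depth.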
A phase mask in related art is located in a part of a diaphragm of a camera and therefore provides the same focus distortion effect for all light entering the diaphragm, resulting in low pixel resolution of a depth value and low accuracy of the depth value. Additionally, because a phase mask in related art is formed by etching an aperture pattern on a film, it is implemented as a passive mask in which the aperture pattern is physically fixed. Therefore, a phase mask in related art cannot accurately measure depth values according to the characteristics of a shooting environment, such as different depth ranges.
According to an aspect of the present disclosure, an electronic device for obtaining a depth map from a coded image is provided. According to an embodiment of the present disclosure, the electronic device may include a lens assembly including at least one lens, an active mask panel configured to change refractive power of light transmitted through the active mask panel based on an electrical driving signal, an image sensor configured to receive light transmitted through the lens assembly and the active mask panel, and at least one processor. The at least one processor may be configured to generate a first phase mask having a first coded aperture pattern in a first area on the active mask panel based on controlling the electrical driving signal applied to the active mask panel. The at least one processor may be configured to obtain a coded image based on light transmitted through the first phase mask, wherein the coded image is phase-modulated, and wherein the light transmitted through the first phase mask is received via the image sensor. The at least one processor may be configured to obtain a depth map corresponding to the coded image by using an artificial intelligence model trained to extract a depth map from a convolution image.
According to another aspect of the present disclosure, a method for obtaining a depth map from a coded image is provided. The method may be performed by at least one processor of an electronic device. The method may include generating a first phase mask having a first coded aperture pattern in a first area based on an electrical driving signal applied to an active mask panel; obtaining a coded image based on light transmitted through the first phase mask, wherein the coded image is phase-modulated; and obtaining a depth map corresponding to the coded image by using an artificial intelligence model trained to extract a depth map from a convolution image.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable medium including one or more instructions that, when executed, cause at least one processor to generate a first phase mask having a first coded aperture pattern in a first area based on an electrical driving signal applied to an active mask panel; obtain a coded image based on light transmitted through the first phase mask, wherein the coded image is phase-modulated; and obtain a depth map corresponding to the coded image by using an artificial intelligence model trained to extract a depth map from a convolution image.
The present disclosure will be easily understood from the following description taken in conjunction with the accompanying drawings in which reference numerals denote structural elements.
The terms used in embodiments of the present specification are general terms that are currently widely used and are selected by taking into account the functions in the present disclosure, but these terms may vary according to the intention of one of ordinary skill in the art, precedent cases, advent of new technologies, etc. Furthermore, specific terms may be arbitrarily selected by the applicant, and in this case, the meaning of the selected terms will be described in detail in the detailed description of a corresponding embodiment. Thus, the terms used herein should be defined not by simple appellations thereof but based on the meaning of the terms together with the overall description of the present disclosure.
Singular expressions used herein are intended to include plural expressions as well unless the context clearly indicates otherwise. All the terms used herein, which include technical or scientific terms, may have the same meaning that is generally understood by a person of ordinary skill in the art to which the present disclosure pertains.
Throughout the present disclosure, when a part “includes” or “comprises” an element, unless there is a particular description contrary thereto, it is understood that the part may further include other elements, not excluding the other elements. Furthermore, terms, such as “portion,” “module,” etc., used herein indicate a unit for processing at least one function or operation, and may be implemented as hardware or software or a combination of hardware and software.
The expression “configured to (or set to)” used herein may be used interchangeably, according to context, with, for example, the expression “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” or “capable of”. The term “configured to (or set to)” may not necessarily mean only “specifically designed to” in terms of hardware. Instead, the expression “a system configured to” may mean, in some contexts, the system being “capable of”, together with other devices or components. For example, the expression “a processor configured to (or set to) perform A, B, and C” may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a general-purpose processor (e.g., a central processing unit (CPU) or an application processor (AP)) capable of performing the corresponding operations by executing one or more software programs stored in a memory.
Furthermore, in the present disclosure, it should be understood that when a component is referred to as being “connected” or “coupled” to another component, the component may be directly connected or coupled to the other component, but may also be connected or coupled to the other component via another intervening component therebetween unless there is a particular description contrary thereto.
It must be understood that while red, green, blue, and depth (RGB-D) images, coded images, and depth maps shown in
An embodiment of the present disclosure will be described more fully hereinafter with reference to the accompanying drawings so that the embodiment may be easily implemented by a person of ordinary skill in the art. However, the present disclosure may be implemented in different forms and should not be construed as being limited to embodiments set forth herein.
Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings.
Referring to
The electronic device 100 may apply an electrical driving signal to the active mask panel 120 to form a coded aperture pattern, and generate a phase mask 122 having the coded aperture pattern. The coded aperture pattern of the phase mask 122 includes a plurality of apertures having different shapes and sizes. The amount of light transmitted varies depending on an extent of opening of the plurality of apertures, and the amount of transmitted light causes focus distortion according to depths of the objects ob1 and ob2. The coded aperture pattern of the phase mask 122 may include apertures patterned such that, when light transmitted through the phase mask 122 is received by the image sensor 130, there is a specific correlation between a degree of distortion of an image and depth values of the objects ob1 and ob2. In an embodiment of the present disclosure, the phase mask 122 may be formed to have a coded aperture pattern of a shape and size that causes the distortion of the image to occur according to a point spread function (PSF) based on the depth values of the objects ob1 and ob2. The coded aperture pattern of the phase mask 122 may induce a phase delay of light by changing a refractive index of the light according to the depth values of the objects ob1 and ob2.
In an embodiment of the present disclosure, the active mask panel 120 may be implemented as an electrically tunable liquid crystal panel that changes an arrangement angle of liquid crystal molecules disposed in a region corresponding to a coded aperture according to an applied voltage value. The electrically tunable liquid crystal panel is a liquid crystal lens that determines whether light is transmitted based on changes in optical properties of liquid crystal, and is configured to locally adjust refractive power of light passing through liquid crystal molecules and modulate a phase of the light. However, the active mask panel 120 is not limited thereto, and may be implemented as a combination of a polarization-selective meta-lens and an active circular polarizer. In an embodiment of the present disclosure, the active mask panel 120 may be disposed between the lens assembly 110 and the image sensor 130 of the camera.
The image sensor 130 may obtain the coded image 10 by receiving light transmitted through the phase mask 122 of the active mask panel 120. The light reflected by the objects ob1 and ob2 reaches the phase mask 122 of the active mask panel 120 through the lens assembly 110 and is phase-modulated by changing a refractive index via the phase mask, and the phase-modulated light is received by a specific pixel on the image sensor 130. The electronic device 100 may obtain the coded image 10 by converting light received via the image sensor 130 into an electrical signal.
The coded image 10 may be an out-of-focus distorted image which is obtained using light with phase modulated according to the depth values of the objects ob1 and ob2. In an embodiment of the present disclosure, the coded image 10 may be a convolution image with focus distortion according to a PSF.
The electronic device 100 may obtain a depth map 20 by inputting the coded image 10 to an artificial intelligence (AI) model 156 and performing inference by using the AI model 156. The AI model 156 may be a deep neural network (DNN) model trained, via a supervised learning technique, by applying a plurality of previously obtained convolution images as input data and applying a plurality of depth maps respectively corresponding to the plurality of convolution images as output ground truth. In an embodiment of the present disclosure, the AI model 156 may be a U-Net, but is not limited thereto. The electronic device 100 may apply the coded image 10 as input data to the trained AI model 156 to obtain the depth map 20 representing depth values for each pixel in the coded image 10.
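The disclosure obtains the depth map 20 with a trained DNN such as a U-Net; as a minimal stand-in for that inference step, the sketch below inverts a known box-shaped PSF analytically, recovering the depth of an isolated point source from the width of its blur footprint. The function name and the PSF model are illustrative assumptions, not the disclosure's network:

```python
def estimate_depth(coded, eps=1e-9):
    """Recover the depth of a single blurred point source.

    Assumes a hypothetical box PSF of radius r (2r + 1 taps), so the
    number of non-zero samples in the coded signal reveals the depth.
    """
    support = [v for v in coded if v > eps]   # non-zero blur footprint
    return (len(support) - 1) // 2            # taps 2r + 1  ->  depth r
```

In the actual system, a learned model performs this inversion per pixel for arbitrary scenes rather than for a single point source.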
A phase mask used in a deconvolution method in related art is located in a part of a diaphragm of a camera, so it provides the same focus distortion effect to all light incident through the diaphragm. As a result, a depth map obtained by the conventional phase mask has problems of low resolution and low accuracy of pixel-wise depth values. Furthermore, because the phase mask in related art is formed by etching apertures in a specific pattern through methods such as photolithography, it is implemented as a passive mask in which an aperture pattern is physically fixed. Therefore, the phase mask in related art has a technical limitation in that it cannot accurately measure depth values according to the characteristics of a shooting environment, e.g., for different depth ranges of the objects ob1 and ob2.
According to an embodiment of the present disclosure, the electronic device 100 may generate the active phase mask 122 by using the active mask panel 120. The active mask panel 120 may change a shape, size, and position of the aperture pattern in the active phase mask 122 according to an electrical driving signal, and the electronic device 100 may obtain the depth map 20 from the coded image 10 obtained using the active phase mask 122. This enables measuring accurate depth values according to the depth ranges of the objects ob1 and ob2. Furthermore, because the active mask panel 120 according to an embodiment of the present disclosure is disposed between the lens assembly 110 and the image sensor 130 of the camera, it is possible to adaptively change the coded aperture pattern of the phase mask 122 according to the objects ob1 and ob2 having specific depths, rather than causing the same focus distortion for all light, thereby obtaining the depth map 20 with improved accuracy and resolution.
The electronic device 100 may be a device that obtains an image of a real-world object captured by using a camera including a lens assembly 110 and an image sensor 130. The electronic device 100 may be implemented as a variety of devices, such as a mobile device, a smartphone, a laptop computer, a desktop, a tablet computer, a wearable device, an e-book terminal, a digital broadcast terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, a camcorder, and the like. In an embodiment of the present disclosure, the electronic device 100 may be an augmented reality device. An augmented reality device is a device capable of realizing ‘augmented reality’ and may include not only eye glasses-shaped augmented reality glasses that are typically worn on a user's face, but also a head mounted display (HMD) apparatus or an augmented reality helmet, which is worn on the head.
Referring to
The components shown in
The lens assembly 110 is a lens optical system including at least one camera lens. In an embodiment of the present disclosure, the lens assembly 110 may include a plurality of optical lenses having different focal lengths, aperture values (or f-numbers) (e.g., f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22, and f/32), diameters, and fields of view (FOVs). Light passing through the lens assembly 110 may be transmitted through the active mask panel 120.
The active mask panel 120 is a transmissive panel capable of variably forming a coded aperture according to an electrical driving signal. In an embodiment of the present disclosure, the active mask panel 120 may be an electrically tunable liquid crystal panel configured to, by changing an arrangement angle of liquid crystal molecules disposed in a region corresponding to the coded aperture according to an applied voltage value, locally adjust a refractive index of light transmitted through the liquid crystal molecules and modulate the phase of light. The electrically tunable liquid crystal panel may be implemented as a transmissive liquid crystal lens that transmits light that has passed through the lens assembly 110. A location of the region where the coded aperture is formed is not fixed on the active mask panel 120 and may be changed. A control voltage applied to the active mask panel 120 may be controlled by the processor 140 and applied to the active mask panel 120 by a voltage control circuit. An embodiment in which the refractive power of a region corresponding to a coded aperture is changed by applying a control voltage is described in detail with reference to
However, the active mask panel 120 is not limited thereto, and may be implemented as a combination of a polarization-selective meta-lens and an active circular polarizer.
The image sensor 130 is an imaging element configured to receive light transmitted through the active mask panel 120, convert luminance or intensity of the received light into an electrical signal, and convert the electrical signal into an image, thereby obtaining a coded image. The image sensor 130 may be implemented as, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor, but is not limited thereto.
The processor 140 may execute one or more instructions or program code stored in the memory 150 and perform functions and/or operations corresponding to the instructions or program code. The processor 140 may be composed of hardware components that perform arithmetic, logic, and input/output (I/O) operations, and signal processing. The processor 140 may consist of at least one of, for example, a CPU, a microprocessor, a graphics processing unit (GPU), an AP, application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), and field programmable gate arrays (FPGAs), but is not limited thereto.
The processor 140 is shown as an element in
In an embodiment of the present disclosure, the processor 140 may be configured as a dedicated hardware chip that performs AI training.
The memory 150 may store instructions, algorithms and program code readable by the processor 140. The memory 150 may include, for example, at least one of a flash memory-type memory, a hard disk-type memory, a multimedia card micro-type memory, a card-type memory (e.g., a Secure Digital (SD) card or an eXtreme Digital (XD) memory), random access memory (RAM), static RAM (SRAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), programmable ROM (PROM), mask ROM, flash ROM, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 150 may store instructions or program code for performing functions or operations of the electronic device 100. In an embodiment of the present disclosure, the memory 150 may store at least one of instructions, algorithms, data structures, program code, and application programs readable by the processor 140. The instructions, algorithms, data structures, and program code stored in the memory 150 may be implemented in programming or scripting languages such as C, C++, Java, assembler, etc.
The memory 150 may store instructions, algorithms, data structures, or program code related to a phase mask generation module 152, a coded image obtaining module 154, and an AI model 156. A “module” included in the memory 150 refers to a unit for processing a function or operation operated by the processor 140, and may be implemented as software such as instructions, algorithms, data structures, or program code.
In the following embodiments, functions and/or operations of the processor 140 may be implemented by the processor 140 executing instructions or program code stored in the memory 150.
The phase mask generation module 152 may be composed of instructions or program code related to an operation and/or a function of generating a phase mask having a coded aperture pattern by controlling an electrical driving signal applied to the active mask panel 120. The processor 140 may execute the instructions or program code related to the phase mask generation module 152 to generate a phase mask having a coded aperture pattern on the active mask panel 120. The coded aperture pattern of the phase mask may include a plurality of apertures having different shapes and sizes. The amount of light transmitted through the phase mask may vary depending on an extent of opening of the plurality of apertures included in the phase mask, and an amount of light transmitted may cause focus distortion according to a depth of an object. The processor 140 may apply an electrical driving signal to the active mask panel 120 and pattern an aperture so that, when the light transmitted through the phase mask is received by the image sensor 130, a specific correlation is present between a degree of distortion of an image and a depth value of the object. In an embodiment of the present disclosure, the processor 140 may control the active mask panel 120 to generate a phase mask having a coded aperture pattern which causes a signal value for each pixel obtained by the image sensor 130 to be distorted or modulated according to a PSF based on a depth value of the object. The coded aperture pattern of the phase mask may be a pattern including a plurality of apertures that induce a phase delay of light by changing a refractive index of the light according to a depth value of the object. The coded aperture pattern of the phase mask is described in detail with reference to
In an embodiment of the present disclosure, the processor 140 may generate a control voltage waveform with a phase modulation profile for generating the phase mask, and control a power supply device (e.g., a battery) to apply the generated control voltage waveform to the active mask panel 120. The shape, size, and position of the plurality of apertures included in the phase mask may be changed based on the control voltage waveform applied according to control by the processor 140.
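The mapping from a target coded aperture pattern to the control voltages applied to the active mask panel can be sketched as follows. The linear voltage model and the 0–5 V range below are assumed placeholders for the panel's real electro-optic response curve, which would in practice be nonlinear and panel-specific:

```python
V_MIN, V_MAX = 0.0, 5.0   # assumed drive-voltage range (volts)

def pattern_to_voltages(pattern):
    """Map an N x M transmittance pattern (values in 0..1) to voltages.

    Assumes, for illustration only, that transmittance responds linearly
    to the applied voltage.
    """
    return [[V_MIN + t * (V_MAX - V_MIN) for t in row] for row in pattern]
```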
The processor 140 may generate at least one phase mask on the active mask panel 120. The processor 140 may generate a plurality of phase masks including different coded aperture patterns on the active mask panel 120. In an embodiment of the present disclosure, the processor 140 may generate a first phase mask in a partial area of the entire region of the active mask panel 120, and generate a second phase mask in another area of the entire region of the active mask panel 120 where the first phase mask is not formed. The first phase mask may include a pattern having a plurality of apertures for obtaining a coded image having a first PSF corresponding to a first depth value, and the second phase mask may include a pattern having a plurality of apertures for obtaining a coded image having a second PSF corresponding to a second depth value. A ‘coded aperture pattern corresponding to a PSF’ means a coded aperture pattern optimized to obtain a highly reliable coded image by maximizing a focus distortion caused by the PSF at a specific depth value. A specific embodiment in which the processor 140 generates a plurality of phase masks on the active mask panel 120 is described in detail with reference to
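Generating the first and second phase masks in disjoint areas of one panel can be sketched as compositing local patterns onto a fully transmissive panel. The shapes, offsets, and transmittance values below are illustrative assumptions:

```python
def compose_masks(shape, regions):
    """Place several coded aperture patterns on one panel.

    `regions` maps (row, col) offsets to 2-D transmittance patterns;
    areas not covered by any pattern stay fully transmissive (1.0).
    """
    panel = [[1.0] * shape[1] for _ in range(shape[0])]
    for (r0, c0), pat in regions.items():
        for i, row in enumerate(pat):
            for j, v in enumerate(row):
                panel[r0 + i][c0 + j] = v
    return panel
```

Each region would correspond to one phase mask optimized for one PSF, e.g., the first phase mask for the first depth value and the second for the second.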
In an embodiment of the present disclosure, the processor 140 may set a region of interest (ROI) among objects to be imaged, obtain a depth value of an object included in the ROI, and generate a phase mask having a coded aperture pattern for obtaining a coded image having a PSF corresponding to the obtained depth value of the object. In an embodiment of the present disclosure, the electronic device 100 may include a user input unit receiving a user input, and the processor 140 may set an ROI based on a user input received via the user input unit. In another embodiment of the present disclosure, the electronic device 100 is implemented as an augmented reality device, and the augmented reality device may include an eye-tracking sensor that detects a gaze point at which the gaze directions of the two eyes of the user converge by tracking gaze directions of two eyes of the user. In this case, the processor 140 may detect a gaze point by using the user's eye-tracking data obtained using the eye-tracking sensor and set an ROI based on a position of the gaze point. The processor 140 may obtain a depth value of an object included in the ROI and generate a phase mask having a coded aperture pattern for obtaining a coded image having a PSF corresponding to the depth value of the object.
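Setting an ROI from a detected gaze point can be sketched as centering a rectangular region on the gaze point and clamping it to the panel bounds. The coordinate conventions and the (row, col) ordering here are assumptions for illustration:

```python
def roi_from_gaze(gaze, roi_size, panel_shape):
    """Center a rectangular ROI on the gaze point, clamped to the panel.

    gaze: (row, col) of the detected gaze point.
    roi_size: (height, width) of the ROI.
    panel_shape: (height, width) of the panel.
    Returns (top, left, height, width).
    """
    h, w = panel_shape
    rh, rw = roi_size
    gy, gx = gaze
    top = min(max(gy - rh // 2, 0), h - rh)    # clamp to panel bounds
    left = min(max(gx - rw // 2, 0), w - rw)
    return top, left, rh, rw
```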
In an embodiment of the present disclosure, the electronic device 100 may further include a low resolution LiDAR sensor for obtaining a depth value of an object. A specific embodiment in which the processor 140 sets an ROI and generates a phase mask for obtaining a coded image having a PSF corresponding to a depth value of an object included in the ROI is described in detail with reference to
In an embodiment of the present disclosure, the processor 140 may generate a coded aperture pattern in an area corresponding to the ROI in the entire region of the active mask panel 120.
The coded image obtaining module 154 is composed of instructions or program code related to an operation and/or a function of obtaining a coded image by receiving, via the image sensor 130, light transmitted through the active mask panel 120. The processor 140 may execute the instructions or program code related to the coded image obtaining module 154 to obtain a coded image. Light related to the object reaches the phase mask of the active mask panel 120 through the lens assembly 110 and is phase-modulated by changing a refractive index via the phase mask, and the phase-modulated light is received by a specific pixel on the image sensor 130. By using the image sensor 130, the processor 140 may receive light transmitted through the phase mask of the active mask panel 120 and convert information about luminance or intensity of the received light into an electrical signal, thereby obtaining a coded image.
In the present disclosure, a ‘coded image’ is an image obtained using light whose phase is modulated according to a depth value of an object, and may be an out-of-focus image with distorted focus. In an embodiment of the present disclosure, the coded image may be a convolution image with focus distortion caused by a PSF.
The AI model 156 may be a model trained to extract a depth map from the convolution image. In an embodiment of the present disclosure, the AI model 156 may be a DNN model trained, via a supervised learning technique, by applying a plurality of previously obtained convolution images as input data and applying a plurality of depth maps respectively corresponding to the plurality of convolution images as output ground truth. ‘Training’ may refer to training a neural network to discover or learn on its own a method of analyzing pieces of input data to the neural network, a method of classifying the pieces of input data, and/or a method of extracting features necessary for generating resultant data from the pieces of input data. In detail, through the training process, the DNN model may optimize weight values inside the neural network by being trained using training data (e.g., the plurality of convolution images and the plurality of depth maps). The DNN model outputs a desired result by processing input data via the neural network having the optimized weight values. A specific embodiment related to training an AI model is described in detail with reference to
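The supervised-learning principle described above — fitting model parameters so that coded inputs map to depth labels — can be shown on a deliberately tiny scale. The toy below fits a linear model by stochastic gradient descent on (blur feature, depth label) pairs; the feature, the data, and the model are assumptions standing in for the disclosure's convolution images, depth maps, and U-Net:

```python
def train(samples, lr=0.01, epochs=500):
    """Fit depth = w * feature + b by stochastic gradient descent.

    samples: list of (blur_feature, depth_label) pairs.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y     # prediction error
            w -= lr * err * x         # gradient step on squared loss
            b -= lr * err
    return w, b
```

The same loop structure — forward pass, loss, gradient update — underlies training the far larger DNN, which optimizes weight values inside the neural network instead of two scalars.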
In an embodiment of the present disclosure, the AI model 156 may be a U-Net. However, the AI model 156 is not limited thereto, and in another embodiment, the AI model 156 may be implemented as one of a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent DNN (BRDNN), and a deep Q-network (DQN). Furthermore, the AI model may be subdivided; for example, a CNN may be subdivided into a deep CNN (DCNN), a capsule neural network (CapsNet), or the like.
The processor 140 may obtain a depth map from the coded image by using the trained AI model 156. In an embodiment of the present disclosure, the processor 140 may obtain a depth map corresponding to the coded image by inputting the coded image obtained via the image sensor 130 to the AI model 156 and performing inference using the AI model 156. A specific embodiment in which the processor 140 obtains a depth map from a coded image by using the AI model 156 is described in detail with reference to
In
In operation S310, the electronic device 100 generates a first phase mask having a first coded aperture pattern by applying an electrical driving signal to an active mask panel. In an embodiment of the present disclosure, the active mask panel may be implemented as an electrically tunable liquid crystal panel that changes an arrangement angle of liquid crystal molecules disposed in a region corresponding to a coded aperture according to an applied voltage value. The electrically tunable liquid crystal panel is a liquid crystal lens that determines whether light is transmitted based on changes in optical properties of liquid crystal, and is configured to locally adjust refractive power of light passing through liquid crystal molecules and modulate the phase of the light. The electronic device 100 may apply an electrical driving signal to the active mask panel and pattern an aperture so that, when light transmitted through the first phase mask is received by the image sensor (e.g., 130 of
In an embodiment of the present disclosure, the electronic device 100 may form a coded aperture pattern locally in a partial area (also referred to as a first partial area) of the entire region of the active mask panel, and the first phase mask may be generated in the partial area of the active mask panel.
In operation S320, the electronic device 100 obtains a phase-modulated coded image by receiving, via the image sensor 130, light transmitted through the first phase mask. Light related to the object reaches the first phase mask and is phase-modulated by changing a refractive index via the first phase mask, and the phase-modulated light is received by a specific pixel on the image sensor 130. By using the image sensor 130, the electronic device 100 may receive light transmitted through the first phase mask of the active mask panel and convert information about luminance or intensity of the received light into an electrical signal, thereby obtaining the coded image.
In an embodiment of the present disclosure, the coded image may be a convolution image with focus distortion caused by a PSF.
In operation S330, the electronic device 100 obtains a depth map corresponding to the coded image by using the AI model (e.g., 156 of
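Operations S310 to S330 can be sketched as one pipeline in which the mask generator, the capture step, and the depth estimator are injected as callables. Every body passed in is an illustrative placeholder for the disclosure's hardware and trained model, not an implementation of them:

```python
def obtain_depth_map(scene, generate_mask, capture, infer_depth):
    """Pipeline sketch of operations S310 - S330."""
    mask = generate_mask()          # S310: form coded aperture pattern
    coded = capture(scene, mask)    # S320: phase-modulated coded image
    return infer_depth(coded)       # S330: AI-model inference
```

In the device, `generate_mask` corresponds to driving the active mask panel, `capture` to the lens assembly, phase mask, and image sensor, and `infer_depth` to the trained AI model 156.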
Referring to
The lens assembly 110 is a lens optical system including a plurality of camera lenses 111 to 115. In the embodiment illustrated in
The active mask panel 120 may be disposed between the lens assembly 110 and the image sensor 130. Light related to an object reaches the phase mask of the active mask panel 120 through the lens assembly 110 and is phase-modulated by changing a refractive index via the phase mask. The phase-modulated light is received by a specific pixel on the image sensor 130. Because the material properties, operation and/or function of the active mask panel 120 are the same as those described with reference to
The image sensor 130 may receive light phase-modulated by the phase mask of the active mask panel 120. The image sensor 130 may obtain a coded image by converting luminance or intensity of the received light into an electrical signal and converting the electrical signal into an image.
A phase mask in related art is located in a part of a diaphragm of a camera, so it provides the same focus distortion effect to all light incident through the diaphragm. Thus, a depth map obtained via the phase mask in related art has problems of low resolution and low accuracy of pixel-wise depth values.
The electronic device 100, according to the embodiment illustrated in
Referring to
The coded aperture pattern of the phase mask 122 may be a pattern including a plurality of apertures that induce a phase delay of light by changing a refractive index of the light according to a depth value of an object. The amount of light transmitted through the phase mask 122 may vary depending on an extent of opening of the plurality of apertures included in the phase mask 122, and the amount of light transmitted may cause focus distortion according to a depth of an object. The processor 140 may apply an electrical driving signal to the active mask panel 120 and pattern an aperture so that, when the light transmitted through the phase mask 122 is received by the image sensor (e.g., 130 of
In an embodiment of the present disclosure, the active mask panel 120 may be an electrically tunable liquid crystal panel configured to form a plurality of apertures in a region corresponding to the coded aperture pattern according to an applied voltage value. In this case, the plurality of apertures included in the coded aperture pattern of the phase mask 122 may be formed when an arrangement angle of liquid crystal molecules included in the active mask panel is changed according to a phase modulation profile of the control voltage waveform applied by the processor (e.g., 140 of
Referring to
The active mask panel 120 may be an electrically tunable liquid crystal panel capable of adjusting a refractive index of light by changing an arrangement angle of liquid crystal molecules 120m based on a control voltage applied via the excitation electrodes 120e from a power supply device VAC. In an embodiment, the active mask panel 120 may include an electro-optic material with a pixel grid. Pixels may be arranged in a matrix of N rows and M columns. Each of the N×M pixels may take any one of a set of possible values (gray levels) independently of all other pixels.
The liquid crystal layer 120l may be an electro-optic layer including a plurality of liquid crystal molecules 120m. The liquid crystal layer 120l may be an electro-optic layer in which properties of liquid crystals are changed by an applied control voltage. In an embodiment, the liquid crystal layer 120l may include a polarization-independent liquid crystal layer (e.g., cholesteric liquid crystals). In the liquid crystal layer 120l, the arrangement angle of the liquid crystal molecules 120m disposed within a specific area in an active region may be changed according to a control voltage applied via the excitation electrodes 120e, so that a refractive index of the specific area may be locally adjusted.
The common electrode 120CE and the excitation electrodes 120e may receive control voltages from the power supply device VAC and apply the supplied control voltages to the liquid crystal layer 120l. The common electrode 120CE may be arranged to contact a first surface 120-1 of the liquid crystal layer 120l.
The excitation electrodes 120e may be arranged to contact a top surface of the transparent film 120P on a second surface 120-2 opposite to the first surface 120-1 of the liquid crystal layer 120l. The excitation electrodes 120e may include first array excitation electrodes and second array excitation electrodes oriented orthogonally along X-axis and Y-axis directions on the top surface of the transparent film 120P. The first array excitation electrodes and the second array excitation electrodes may each include parallel strips of a conductive material extending over the active region. In an embodiment of the present disclosure, the excitation electrodes 120e may be formed of a transparent conductive material such as indium tin oxide (ITO).
A pixel may be defined by an area where strips of the first array excitation electrodes and strips of the second array excitation electrodes overlap. A center-to-center distance between areas defined by strips of the first array excitation electrodes and strips of the second array excitation electrodes may define a pitch of a pixel array, and a width of the strips may define a size of a pixel.
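The pitch and pixel-size relationship above can be sketched numerically. This is an illustrative helper, not from the disclosure; the strip width and gap values are assumptions:

```python
def pixel_grid(n_rows, n_cols, strip_width_um, gap_um):
    """Return (pitch, centers) for pixels defined by overlapping
    electrode strips: a pixel is the overlap of one row strip and one
    column strip, so its size equals the strip width and the array
    pitch is strip width + inter-strip gap."""
    pitch = strip_width_um + gap_um
    centers = [
        (r * pitch + strip_width_um / 2, c * pitch + strip_width_um / 2)
        for r in range(n_rows)
        for c in range(n_cols)
    ]
    return pitch, centers

pitch, centers = pixel_grid(2, 2, strip_width_um=90, gap_um=10)
# pitch is 100 um; the first pixel is centered at (45.0, 45.0)
```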
The processor (e.g., 140 of
The processor 140 may form a phase mask by adjusting the refractive power of a pattern, i.e., a coded aperture pattern, including the plurality of apertures in the active mask panel 120. A specific method, operated by the processor 140, of forming a phase mask having a coded aperture pattern of the active mask panel 120 is described in detail with reference to
Referring to
The plurality of first array excitation electrodes 120e-1 to 120e-5 may be arranged in the X-axis direction, and the plurality of second array excitation electrodes 120e-6 to 120e-10 may be arranged in the Y-axis direction. The plurality of first array excitation electrodes 120e-1 to 120e-5 and the plurality of second array excitation electrodes 120e-6 to 120e-10 may be arranged to be orthogonal to each other.
The plurality of driver terminals 120d for controlling a control voltage applied to the plurality of first array excitation electrodes 120e-1 to 120e-5 from the power supply device VAC may be respectively connected to the plurality of first array excitation electrodes 120e-1 to 120e-5. The plurality of driver terminals 120d for controlling a control voltage applied to the plurality of second array excitation electrodes 120e-6 to 120e-10 from the power supply device VAC may be respectively connected to the plurality of second array excitation electrodes 120e-6 to 120e-10.
A controller 140C may be electrically and/or physically connected to the plurality of driver terminals 120d and the power supply device VAC. In
The controller 140C may control the plurality of driver terminals 120d to control a control voltage applied to the plurality of first array excitation electrodes 120e-1 to 120e-5 and the plurality of second array excitation electrodes 120e-6 to 120e-10, to adjust an arrangement angle of liquid crystal molecules disposed in a specific area. Unlike as illustrated in
The processor 140 may determine position values of a plurality of apertures that induce a signal value for each pixel obtained by the image sensor 130 to be distorted or modulated according to a PSF based on a depth value of an object. In an embodiment of the present disclosure, the processor 140 may calculate position coordinate values for each of the plurality of apertures forming a coded aperture pattern in the entire region of the active mask panel 120, and provide information regarding the calculated position coordinate values to the controller 140C. The controller 140C may determine a target area in which to form the plurality of apertures A1 to An, based on the position coordinate values obtained from the processor 140.
In the embodiment illustrated in
The controller 140C may not only control application or non-application of a control voltage from the power supply device VAC, but may also control a magnitude of the control voltage applied from the power supply device VAC. The controller 140C may adjust the magnitude of the arrangement angle of liquid crystal molecules by controlling the magnitude of the applied control voltage. For example, when the controller 140C applies a control voltage of a first magnitude to the second excitation electrode 120e-2 through the plurality of driver terminals 120d, and applies, to the third excitation electrode 120e-3, a control voltage of a second magnitude that is greater than the first magnitude, the arrangement angle of liquid crystal molecules located in the area where the third excitation electrode 120e-3 is disposed may be adjusted to be greater than the arrangement angle of liquid crystal molecules located in the area where the second excitation electrode 120e-2 is disposed.
That is, by modulating the phase profile of a control voltage applied to the plurality of first array excitation electrodes 120e-1 to 120e-5 and the plurality of second array excitation electrodes 120e-6 to 120e-10 via the plurality of driver terminals 120d, the controller 140C may determine the plurality of apertures A1 to An in which an arrangement angle of the liquid crystal molecules 120m is changed in the entire region of the liquid crystal layer 120l, and form a coded aperture pattern including the plurality of apertures A1 to An.
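The selection of apertures by crossed electrode arrays can be sketched as follows. This is a hedged illustration of one simple driving scheme (an aperture forms where an energized row strip crosses an energized column strip); the electrode indices are hypothetical and the disclosed driver may modulate voltages more finely than this on/off model:

```python
import numpy as np

def aperture_mask(n_rows, n_cols, driven_rows, driven_cols):
    """Boolean map of liquid-crystal pixels switched by the driven
    row/column electrodes."""
    rows = np.zeros(n_rows, dtype=bool)
    cols = np.zeros(n_cols, dtype=bool)
    rows[list(driven_rows)] = True
    cols[list(driven_cols)] = True
    # A pixel responds where an energized row strip crosses an
    # energized column strip: the outer product of the two selections.
    return np.outer(rows, cols)

mask = aperture_mask(5, 5, driven_rows=[1, 3], driven_cols=[2, 4])
# four apertures, at pixels (1,2), (1,4), (3,2), (3,4)
```

Note that with purely passive row/column addressing the apertures form a rectangular grid; arbitrary patterns need per-pixel addressing or time-multiplexed driving.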
In
Referring to
The processor (e.g., 140 of
The convolution image 730 is a simulated image in which per-pixel image values are focus-distorted according to depth values, obtained by performing convolution of the pixels included in the image with the plurality of PSF patterns 720-1 to 720-n corresponding to the depth values. In an embodiment of the present disclosure, the processor 140 may obtain a plurality of convolution images 730 by using a plurality of RGB-D images 700, a plurality of depth map masks 710-1 to 710-n respectively corresponding to the plurality of RGB-D images 700, and a plurality of PSF patterns 720-1 to 720-n corresponding to different depth values. The plurality of RGB-D images 700 may be images previously obtained to generate training data.
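The training-data step above can be sketched as a layered simulation: each depth layer of the image (selected by its depth map mask) is blurred with the PSF for that depth, and the blurred layers are composited. This is an illustrative sketch, not the disclosed implementation; function names and the compositing approximation are assumptions:

```python
import numpy as np

def conv2d_same(img, psf):
    """Minimal same-size 2-D convolution with zero padding."""
    kh, kw = psf.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # psf is assumed centro-symmetric, so this correlation
            # loop equals a true convolution
            out += psf[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def coded_image(img, depth_masks, psfs):
    """Composite per-depth blurred copies: pixels belonging to depth
    layer n (binary mask, cf. depth map masks 710-1 to 710-n) are
    blurred by the PSF pattern for that depth (720-1 to 720-n)."""
    out = np.zeros_like(img, dtype=float)
    for mask, psf in zip(depth_masks, psfs):
        out += mask * conv2d_same(img, psf)
    return out
```

Per-layer masking is a common approximation for depth-dependent blur; it ignores occlusion effects at depth boundaries, which a more careful renderer would handle.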
In operation S720, the electronic device 100 trains the AI model 156 by using the training data. The processor 140 of the electronic device 100 may train the AI model 156, via a supervised learning technique, by applying the plurality of convolution images 730 as input data and a plurality of depth maps 740 respectively corresponding to the plurality of convolution images 730 as output ground truth. The plurality of depth maps 740 may be images previously obtained based on per-pixel depth value data of the RGB-D images 700. In an embodiment of the present disclosure, the AI model 156 may be a DNN model. ‘Training’ may refer to training a neural network to discover or learn on its own a method of analyzing pieces of input data to the neural network, a method of classifying the pieces of input data, and/or a method of extracting features necessary for generating resultant data from the pieces of input data. In detail, through the training process, the DNN model may optimize weight values inside the neural network by being trained using the training data (e.g., the plurality of convolution images and the plurality of depth maps).
In an embodiment of the present disclosure, the AI model 156 may be implemented as a U-Net. The U-Net is a type of DNN model used for pixel-level prediction and may have an encoder-decoder architecture. An encoder in the U-Net consists of the repeated application of two 3×3 convolutions, each of which may be followed by a rectified linear unit (ReLU) and batch normalization. The encoder performs downsampling on an image; in each downsampling operation, a resolution of the image is reduced by ½ via a 2×2 max pooling operation with a stride of 2, and the number of feature channels is doubled. A decoder in the U-Net consists of upsampling of a feature map, a 2×2 convolution that halves the number of channels, and two 3×3 convolutions, each followed by a ReLU and batch normalization. The U-Net may concatenate encoder feature maps with the corresponding decoder feature maps via skip connections, and is trained by using a 1×1 convolution at the final layer with a sigmoid to map each pixel to a depth value.
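The encoder/decoder shape bookkeeping described above can be traced without any deep-learning framework. The depth of four levels and base channel count of 64 below follow the original U-Net convention and are assumptions, not the disclosed configuration:

```python
def unet_shapes(resolution, base_channels=64, depth=4):
    """Trace (resolution, channels) through a U-Net-style model."""
    enc = []
    ch = base_channels
    res = resolution
    for _ in range(depth):
        enc.append((res, ch))   # two 3x3 convs at this scale
        res //= 2               # 2x2 max pool, stride 2: halves resolution
        ch *= 2                 # channels double at each encoder level
    bottleneck = (res, ch)
    # The decoder mirrors the encoder: each 2x2 up-conv doubles the
    # resolution and halves the channels, and the skip connection
    # concatenates the matching encoder feature map before two 3x3 convs.
    dec = list(reversed(enc))
    return enc, bottleneck, dec

enc, mid, dec = unet_shapes(256)
# enc: [(256, 64), (128, 128), (64, 256), (32, 512)]; bottleneck (16, 1024)
```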
Through the above-described method, the AI model 156 (e.g., the U-Net) may be trained to extract the depth map 740 from the convolution image 730.
In the embodiment illustrated in
In the embodiment of
Referring to
In an embodiment of the present disclosure, the electronic device 100 may obtain the coded image 810 with focus distortion by receiving, via the image sensor (e.g., 130 of
The AI model 156 may be a DNN model trained, through the method described with reference to
Referring to
The first phase mask 122 may be formed to have a first coded aperture pattern. The first coded aperture pattern may be a first pattern having a plurality of apertures for obtaining a coded image having a first PSF pattern corresponding to a first depth value. The second phase mask 124 may be formed to have a second coded aperture pattern. The second coded aperture pattern may be a second pattern having a plurality of apertures for obtaining a coded image having a second PSF pattern corresponding to a second depth value different from the first depth value. Here, a pattern for obtaining a coded image having a PSF pattern corresponding to a specific depth value (e.g., the first depth value or the second depth value) refers to a coded aperture pattern optimized to maximize focus distortion caused by the PSF corresponding to the specific depth value and thus obtain a highly reliable coded image.
The positions of the first phase mask 122 and the second phase mask 124 may be changed based on a position of at least one excitation electrode to which a control voltage waveform is applied by the processor 140 from among a plurality of excitation electrodes respectively included in the first array excitation electrodes and the second array excitation electrodes.
In
Referring to
In the embodiment illustrated in
In an embodiment of the present disclosure, operation S1110 may be performed after operation S330 shown in
In operation S1110, the electronic device 100 sets an ROI based on a user input or eye-tracking data. In an embodiment of the present disclosure, the electronic device 100 may include a user input unit for receiving a user input for selecting an ROI including a specific object among real-world objects. For example, the user input unit may be configured as a touch screen that displays a preview image of an object and receives a touch input on a specific area in the preview image. However, the user input unit is not limited thereto, and in another embodiment, the user input unit may include a voice input interface (e.g., a microphone) that receives utterances from the user. The processor 140 of the electronic device (e.g., 100 of
In an embodiment of the present disclosure, the electronic device 100 may be implemented as an augmented reality device and include an eye-tracking sensor that tracks a direction of a user's gaze. By tracking gaze directions of two eyes of the user, the processor 140 of the electronic device 100 may detect a gaze point at which the gaze directions of the two eyes of the user converge. The processor 140 may set an ROI based on a position of the detected gaze point.
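Detecting the convergence point of two gaze directions amounts to finding where two rays (one per eye) come closest in 3-D. The following is an illustrative sketch, not the disclosed eye-tracking algorithm; the eye positions and gaze directions are made-up values:

```python
import numpy as np

def gaze_point(p1, d1, p2, d2):
    """Midpoint of the closest points between two gaze rays
    p1 + t1*d1 and p2 + t2*d2 (closest-point-of-two-lines formula)."""
    p1, d1, p2, d2 = map(np.asarray, (p1, d1, p2, d2))
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b            # approaches 0 for parallel rays
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    return ((p1 + t1 * d1) + (p2 + t2 * d2)) / 2

# eyes 6 cm apart, both looking at a point 1 m straight ahead
pt = gaze_point([-0.03, 0, 0], [0.03, 0, 1],
                [ 0.03, 0, 0], [-0.03, 0, 1])
# pt is (0, 0, 1): the rays converge 1 m in front of the user
```

In practice the two measured rays rarely intersect exactly, which is why the midpoint of the closest approach (rather than a true intersection) is used.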
In operation S1120, the electronic device 100 obtains a depth value of an object corresponding to the ROI. In an embodiment of the present disclosure, the electronic device 100 may further include a low resolution LiDAR sensor. The processor (e.g., 140 of
In operation S1130, the electronic device 100 generates a coded aperture pattern for obtaining a coded image having a PSF pattern corresponding to the depth value of the object included in the ROI. In an embodiment of the present disclosure, the processor 140 of the electronic device 100 may obtain position information of at least one pixel on the image sensor (e.g., 130 of
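Mapping an ROI given in image-sensor pixel coordinates to the corresponding area of the mask panel can be sketched as below. This helper is hypothetical: it assumes the panel and sensor share the optical axis and field of view so a simple proportional scaling applies, whereas a real system would map through the lens geometry:

```python
def roi_to_panel(roi, sensor_wh, panel_wh):
    """Scale an (x0, y0, x1, y1) sensor-pixel ROI to panel-pixel
    coordinates, assuming axis-aligned, co-registered grids."""
    x0, y0, x1, y1 = roi
    sx = panel_wh[0] / sensor_wh[0]
    sy = panel_wh[1] / sensor_wh[1]
    return (round(x0 * sx), round(y0 * sy), round(x1 * sx), round(y1 * sy))

roi_to_panel((100, 200, 300, 400), sensor_wh=(4000, 3000), panel_wh=(400, 300))
# → (10, 20, 30, 40)
```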
When the ROI is set to a plurality of regions, the processor 140 may generate a plurality of coded aperture patterns having different shapes and sizes in partial regions of the entire region of the active mask panel 120. A specific embodiment in which the processor 140 generates a plurality of coded aperture patterns on the active mask panel 120 is described in detail with reference to
Referring to
The electronic device 100 may generate the phase masks 122, 124, 126, and 128 having coded aperture patterns on regions corresponding to the at least one ROI ROI1 to ROI4 in the entire region of the active mask panel 120. The processor (e.g., 140 of
The electronic device 100 may obtain the coded image 1210 by capturing images of the objects ob by using the active mask panel 120 on which the phase masks 122, 124, 126, and 128 are formed. The obtained coded image 1210 may be an image in which areas 1222, 1224, 1226, and 1228 corresponding to the ROIs are focus-distorted according to different PSFs.
Referring to
The electronic device 100 may generate a second phase mask 124 in a partial area of the entire region of the active mask panel 120 corresponding to the second ROI ROI2. The second phase mask 124 may include a coded aperture pattern for obtaining a coded image having a PSF corresponding to a depth value of the object included in the second ROI ROI2. The coded aperture pattern of the second phase mask 124 may include a plurality of apertures having different shapes and sizes than in the coded aperture pattern of the first phase mask 122.
The electronic device 100 may obtain the final coded image 1310 by receiving light transmitted through the first phase mask 122 and the second phase mask 124 of the active mask panel 120 and converting the received light into electrical signals. The final coded image 1310 may be an image in which image values of pixels corresponding to the first ROI ROI1 and the second ROI ROI2 are focus-distorted according to different PSFs.
The electronic device 100 obtains the depth map 1320 corresponding to the final coded image 1310 by inputting the final coded image 1310 as input data to the AI model 156 and performing inference using the AI model 156.
The electronic device 100 according to the embodiment illustrated in
According to an aspect of the present disclosure, an electronic device 100 for obtaining a depth map from a coded image is provided. According to an embodiment of the present disclosure, the electronic device 100 may include a lens assembly 110 including at least one lens, an active mask panel 120 configured to change refractive power of light transmitted through the active mask panel 120 according to an electrical driving signal, an image sensor 130 configured to receive the light transmitted through the lens assembly 110 and the active mask panel 120, and at least one processor 140. The at least one processor 140 may be configured to generate a first phase mask having a first coded aperture pattern on the active mask panel 120 by controlling the electrical driving signal applied to the active mask panel 120. The at least one processor 140 may be configured to obtain a coded image, which is phase-modulated, by receiving, via the image sensor 130, light transmitted through the first phase mask. The at least one processor 140 may be configured to obtain a depth map corresponding to the coded image by using an AI model trained to extract a depth map from a convolution image.
In an embodiment of the present disclosure, the active mask panel 120 may be an electrically tunable liquid crystal panel that, by changing an arrangement angle of liquid crystal molecules disposed in a region corresponding to a coded aperture according to an applied voltage value, locally adjusts refractive power of light transmitted through the liquid crystal molecules and modulates phase of the light.
In an embodiment of the present disclosure, the active mask panel 120 may be disposed between the lens assembly 110 and the image sensor 130 of a camera.
In an embodiment of the present disclosure, the AI model may be a DNN model trained, via supervised learning, by applying a plurality of previously obtained convolution images as input data and applying a plurality of depth maps as output ground truth. The plurality of convolution images may be images obtained by performing convolution of a plurality of RGB-D images with PSF patterns corresponding to pixel-wise depth values of the plurality of RGB-D images.
In an embodiment of the present disclosure, the first phase mask may be formed locally in a partial area of an entire region of the active mask panel 120, and a position of the first phase mask may be changed according to the electrical driving signal.
In an embodiment of the present disclosure, the at least one processor 140 may be configured to generate a second phase mask having a second coded aperture pattern in a partial area of the active mask panel 120.
In an embodiment of the present disclosure, the first coded aperture pattern may be a pattern having a plurality of apertures for obtaining a first coded image having a first PSF corresponding to a first depth value. The second coded aperture pattern may be a pattern having a plurality of apertures for obtaining a second coded image having a second PSF corresponding to a second depth value.
In an embodiment of the present disclosure, the electronic device 100 may further include an eye-tracking sensor configured to obtain gaze information by tracking gaze directions of two eyes of a user. The at least one processor 140 may set an ROI based on eye-tracking data obtained by the eye-tracking sensor. The at least one processor 140 may be configured to obtain a depth value of an object included in the set ROI by using a low resolution LiDAR sensor, and generate the second coded aperture pattern for obtaining a second coded image having a PSF corresponding to the obtained depth value.
In an embodiment of the present disclosure, the at least one processor 140 may be configured to generate the second coded aperture pattern in an area corresponding to the ROI in the entire region of the active mask panel 120.
In an embodiment of the present disclosure, the at least one processor 140 may be configured to obtain a final coded image by receiving, via the image sensor 130, light transmitted through the active mask panel 120 including the first phase mask and the second phase mask. The at least one processor 140 may be configured to input the final coded image to the AI model and obtain a depth map for the final coded image via inference using the AI model.
According to another aspect of the present disclosure, a method, operated by the electronic device 100, of obtaining a depth map from a coded image is provided. In an embodiment of the present disclosure, the method may include generating a first phase mask having a first coded aperture pattern by applying an electrical driving signal to the active mask panel 120 (S310). The method may include obtaining a coded image, which is phase-modulated, by receiving, via the image sensor 130, light transmitted through the first phase mask (S320). The method may include obtaining a depth map corresponding to the coded image by using an AI model trained to extract a depth map from a convolution image (S330).
In an embodiment of the present disclosure, the active mask panel 120 may be an electrically tunable liquid crystal panel that, by changing an arrangement angle of liquid crystal molecules disposed in a region corresponding to a coded aperture according to an applied voltage value, locally adjusts refractive power of light transmitted through the liquid crystal molecules and modulates phase of the light.
In an embodiment of the present disclosure, the method may further include generating training data by obtaining a plurality of convolution images by performing convolution of a plurality of RGB-D images with PSF patterns corresponding to pixel-wise depth values of the plurality of RGB-D images (S710), and training the AI model, via supervised learning, by applying the obtained plurality of convolution images as input data and applying a plurality of depth maps respectively corresponding to the plurality of RGB-D images as output ground truth (S720).
In an embodiment of the present disclosure, the first phase mask may be formed locally in a partial area of an entire region of the active mask panel 120, and a position of the first phase mask may be changed according to the electrical driving signal.
In an embodiment of the present disclosure, the method may further include generating a second phase mask having a second coded aperture pattern in a partial area of the active mask panel 120.
In an embodiment of the present disclosure, the first coded aperture pattern may be a pattern having a plurality of apertures for obtaining a first coded image having a first PSF corresponding to a first depth value. The second coded aperture pattern may be a pattern having a plurality of apertures for obtaining a second coded image having a second PSF corresponding to a second depth value.
In an embodiment of the present disclosure, the method may further include setting an ROI based on a user input or eye-tracking data of a user (S1110), and obtaining a depth value of an object included in the set ROI (S1120). The generating of the second phase mask may include generating the second coded aperture pattern for obtaining a second coded image having a PSF corresponding to the depth value of the object included in the ROI.
In an embodiment of the present disclosure, the generating of the second phase mask may include generating the second coded aperture pattern in an area corresponding to the ROI in the entire region of the active mask panel 120.
In an embodiment of the present disclosure, the method may further include obtaining a final coded image by receiving, via the image sensor 130, light transmitted through the active mask panel 120 including the first phase mask and the second phase mask. The method may further include inputting the final coded image to the AI model and obtaining a depth map for the final coded image via inference using the AI model.
To solve the technical problems described above, according to another aspect of the present disclosure, there is provided a computer program product including a computer-readable storage medium having recorded thereon a program for execution on a computer. The storage medium may include instructions that are readable by the electronic device 100 to generate a first phase mask having a first coded aperture pattern by applying an electrical driving signal to the active mask panel 120, obtain a coded image, which is phase-modulated, by receiving, via the image sensor 130, light transmitted through the first phase mask, and obtain a depth map corresponding to the coded image by using an AI model trained to extract a depth map from a convolution image.
A program executed by the electronic device 100 described in this specification may be implemented as a hardware component, a software component, and/or a combination of the hardware component and the software component. The program may be executed by any system capable of executing computer-readable instructions.
Software may include a computer program, a piece of code, an instruction, or a combination of one or more thereof, and may configure a processing device to operate as desired or may, independently or collectively, instruct the processing device.
The software may be implemented as a computer program including instructions stored in computer-readable storage media. Examples of the computer-readable recording media include magnetic storage media (e.g., ROM, RAM, floppy disks, hard disks, etc.), optical recording media (e.g., compact disc (CD)-ROM and a digital versatile disc (DVD)), etc. The computer-readable recording media may be distributed over computer systems connected through a network so that computer-readable code may be stored and executed in a distributed manner. The media may be readable by a computer, stored in a memory, and executed by a processor.
A computer-readable storage medium may be provided in the form of a non-transitory storage medium. In this regard, the term ‘non-transitory’ only means that the storage medium does not include a signal and is a tangible device, and the term does not differentiate between where data is semi-permanently stored in the storage medium and where the data is temporarily stored in the storage medium. For example, the ‘non-transitory storage medium’ may include a buffer in which data is temporarily stored.
Furthermore, programs according to embodiments disclosed in the present specification may be included in a computer program product when provided. The computer program product may be traded, as a product, between a seller and a buyer.
The computer program product may include a software program and a computer-readable storage medium having stored thereon the software program. For example, the computer program product may include a product (e.g., a downloadable application) in the form of a software program electronically distributed by a manufacturer of the electronic device 100 or through an electronic market (e.g., Samsung Galaxy Store™ and Google Play Store™). For such electronic distribution, at least a part of the software program may be stored in the storage medium or may be temporarily generated. In this case, the storage medium may be a storage medium of a server of a manufacturer of the electronic device 100, a server of the electronic market, or a relay server for temporarily storing the software program.
In a system including the electronic device 100 and/or a server, the computer program product may include a storage medium of the server or a storage medium of the electronic device 100. Alternatively, in a case where there is a third device (e.g., a mobile device or wearable device) communicatively connected to the electronic device 100, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include a software program itself that is transmitted from the electronic device 100 to an electronic device or the third device or that is transmitted from the third device to the electronic device.
In this case, at least one of the electronic device 100 and the third device may execute the computer program product to perform methods according to the disclosed embodiments. Alternatively, one of the electronic device 100 and the third device may execute the computer program product to perform the methods according to the disclosed embodiments in a distributed manner.
For example, the electronic device 100 may execute the computer program product stored in the memory (e.g., 150 of
In another example, the third device may execute the computer program product to control an electronic device communicatively connected to the third device to perform the methods according to the disclosed embodiments.
In a case where the third device executes the computer program product, the third device may download the computer program product from the electronic device 100 and execute the downloaded computer program product. Alternatively, the third device may execute the computer program product that is pre-loaded therein to perform the methods according to the disclosed embodiments.
While the embodiments have been described above with reference to limited examples and figures, it will be understood by those of ordinary skill in the art that various modifications and changes in form and details may be made from the above descriptions. For example, adequate effects may be achieved even when the above-described techniques are performed in a different order than that described above, and/or the aforementioned components such as computer systems or modules are coupled or combined in different forms and modes than those described above or are replaced or supplemented by other components or their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0031647 | Mar 2022 | KR | national |
This application is a continuation of International Application No. PCT/KR2023/000649, filed on Jan. 13, 2023, at the Korean Intellectual Property Office, which claims priority to Korean Patent Application No. 10-2022-0031647, filed on Mar. 14, 2022, at the Korean Intellectual Property Office, the disclosures of which are incorporated herein by reference in their entireties.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/KR2023/000649 | Jan 2023 | WO |
Child | 18885122 | US |