This invention relates to data-efficient photorealistic 3D holography, and in particular to the use of Layered Depth Images and Deep Double Phase Encoding.
A number of techniques, whether used alone or in combination, provide data efficient and/or computation efficient computer-generated holography, examples of which may be implemented on low-power devices such as smartphones and virtual-reality/augmented-reality devices and provide high fidelity holographic images. These techniques include:
In a first aspect, in general, a first method for generating a digital hologram includes accepting a layered representation (an "LDI" representation) of a three-dimensional image and forming the digital hologram from the LDI representation. The LDI representation includes, for each location of a plurality of locations (e.g., x-y pixels) of the hologram, two or more depth values (e.g., z values) each representing spatial coordinates of a point along a line (e.g., a ray along the z-axis) at an intersection of the line and a surface of an object in the three-dimensional image. In at least some examples, the layered representation comprises corresponding two or more RGB+depth ("RGBD") images, where the depth values are not necessarily quantized at a voxel-based resolution of the three-dimensional image. In some examples, the LDI representation is precomputed, for example, using a ray-casting procedure resulting in a fixed-size representation (e.g., a fixed number of RGBD images per video frame) that is stored and retrieved as the images are rendered in holographic form, while in other examples, the LDI representations are determined "on the fly," for example, based on retrieving a stored surface model representation of each three-dimensional image.
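By way of illustration only, the following sketch shows one way such a per-pixel layered representation might be assembled from ray-casting output; the input format, function name, and array shapes are assumptions made for this example and are not taken from the embodiments described below.

```python
import numpy as np

def build_ldi(hits, height, width, num_layers=6):
    """Assemble an LDI: for each pixel, keep up to `num_layers` surface
    intersections (RGB plus a continuous depth) ordered front to back.
    `hits` is an assumed input format: a list of (y, x, rgb, depth) tuples
    produced by a ray caster. Depth stays at floating-point precision; it
    is not snapped to a voxel grid.
    """
    rgb = np.zeros((num_layers, height, width, 3), dtype=np.float32)
    depth = np.full((num_layers, height, width), np.nan, dtype=np.float32)

    per_pixel = {}
    for y, x, color, d in hits:
        per_pixel.setdefault((y, x), []).append((d, color))
    for (y, x), entries in per_pixel.items():
        entries.sort(key=lambda e: e[0])          # nearest intersection first
        for layer, (d, color) in enumerate(entries[:num_layers]):
            rgb[layer, y, x] = color
            depth[layer, y, x] = d
    return rgb, depth                             # N RGBD layers per frame
```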
In a second aspect, in general, a second method for generating a digital hologram includes accepting a layered representation (an "LDI" representation) of a three-dimensional image and using a neural network to generate the digital hologram from the LDI representation. The method includes inputting at least one RGBD layer of the layered representation, and preferably at least two layers (e.g., all layers of the layered representation), into a neural network (e.g., a convolutional neural network, CNN) whose output is used to generate the digital hologram. In some examples, the output of the CNN represents a complex hologram (e.g., an array of representations of magnitude and phase), and the complex hologram is further processed to form a double-phase representation (e.g., an array of pairs of phases, which together represent magnitude and phase) of the hologram.
The method for generating the digital hologram can include one or more of the following features.
The layered representation of the three-dimensional image (or sequential images/frames of a video) is precomputed and stored, and retrieved during generation of the hologram.
The layered representation of the three-dimensional image is computed during the generation of the hologram (e.g., frame by frame for a video).
The layered representation is computed based on a user's direction of view, for example, determined using an eye tracker.
In a third aspect, in general, a method for training (e.g., determining values of configurable parameters/weights of) the neural network used in the second method for generating a digital hologram uses training data that for each training item associates an LDI representation of a three-dimensional image as input and a function of a double-phase hologram representation as output. In some examples, this training method permits end-to-end training of the neural network to match the function of the double-phase representation and a function of a reference hologram corresponding to the LDI representation.
The method for training the neural network can include one or more of the following features.
The reference hologram is generated using the first method for generating a digital hologram from an LDI representation.
The function of the double-phase representation and the function of the reference hologram each comprises a focal stack of images.
The neural network is trained to optimize a match of (e.g., to minimize a loss function between) a focal stack (e.g., a set of images determined at different focal lengths derived from a hologram) derived from the double-phase encoding and a target focal stack corresponding to the input LDI representation. For example, the neural network is configured to generate a complex hologram that is passed through a complex-hologram-to-double-phase encoder. Training then optimizes the neural network to generate the complex hologram best suited for processing through the double-phase encoder.
The neural network includes two parts, for example, two separate CNNs. Training of the first part is based on a matching of a target hologram with an output of the first part of the neural network (e.g., a complex hologram and/or a midpoint hologram), for example, based on the matching of a target focal stack with a focal stack derived from the output of the first part of the neural network.
After the first part is trained, the second part (e.g., a second CNN) is trained to accept an output (e.g., a complex hologram) of the first part (or a transformation of that output) as its input and to generate a pre-processed complex hologram that is passed through a complex-hologram-to-double-phase encoder. The training of the second part is based, for example, on the matching of a target focal stack with a focal stack derived from the double-phase encoding. In some examples, the second part essentially pre-processes a complex hologram before feeding the resulting pre-processed hologram to the complex-hologram-to-double-phase encoder to better eliminate encoding artifacts.
In another aspect, in general, a method for generating a digital hologram using a neural network integrates at least one of vision correction (e.g., astigmatism) and lens aberration correction into the neural network. In these methods, the target hologram (and thereby the functions of the target hologram) incorporates the optical effects that correct the vision or lens effects. The neural network then learns a transformation that essentially integrates the correction thereby avoiding a need for further correction prior to presentation of a holographic image to the user.
In yet another aspect, in general, a holographic imaging system includes a neural network that accepts a layered representation of a three-dimensional image (or video frame sequence) as input and uses the neural network to generate a double-phase hologram that configures a spatial light modulator (SLM). The system includes a light source for passing light via the SLM to yield a holographic image, for example, for presentation to a user. Some examples of the system include a ray tracer for generating the layered representations from three-dimensional (e.g., surface) representations of objects in the image.
Other features and advantages of the invention are apparent from the following description, and from the claims.
Referring to
In some examples, the hologram 130 may represent an amplitude and a phase (e.g., a gain and a phase delay) at each location in the image for each color channel. Such a hologram may be referred to and computationally represented as a "complex" intensity, recognizing that representation of phase and magnitude using complex numbers is a mathematical convenience rather than necessarily corresponding to a physical phenomenon. In some examples, hologram 130 may represent a phase-only hologram in which only the phase (and not the amplitude) is represented for each location in the image. The display 140 may be viewed via a viewer's natural optical system, that is, their eyes, or through a physical or simulated lens system to yield a two-dimensional image as one would produce, for example, using a conventional camera. As described below, in some embodiments, the formation of the complex image takes into account characteristics of the optical system used to view the image, for example, to compensate for certain non-ideal properties of such an optical system. In one such embodiment, the optical system is a user's natural vision system and the non-ideal properties are near- or far-sightedness of the user.
In
Referring to
A first transformation 115 produces a relatively small number (e.g., N=6) of layered depth images (LDI) 120. The nth LDI includes, for each pixel location (x, y), an intensity aₙ(x, y) at that location and a depth dₙ(x, y) at that location (with the depth in general being shared among multiple color channels). Generally, the original volumetric image 110 may be approximated by combining the LDI images, such that the intensity at a voxel (x, y, z) is approximately the sum over n of aₙ(x, y) over the layers for which z ≈ dₙ(x, y).
Note that while the x-y plane may be quantized into pixels, the depth values dₙ(x, y) are not required to be similarly quantized, and therefore the depth resolution may be greater than might be attainable in an explicit three-dimensional voxel representation of the volumetric image (and accordingly, in the expression above, z ≈ dₙ(x, y) represents a rounding to the depth of a voxel from a possibly higher-accuracy depth of the LDI image). Such a higher resolution may be attainable, for example, by direct computation of the LDI during animation rather than explicitly producing an intermediate voxel representation. A variety of techniques may be used to compute the LDI images. For example, computation of LDI images for use in other image presentation tasks is described in Shade et al., "Layered depth images," in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 231-242, 1998, and in L. Bavoil and K. Myers, "Order independent transparency with dual depth peeling," NVIDIA OpenGL SDK (2008). In this holographic application of LDI, sufficient parts of the volumetric image are represented in the LDI such that light paths through the aperture of the display are sufficiently represented (i.e., all, or at least sufficient, rays through an imaging lens should be represented in the LDI images to avoid artifacts in the rendered hologram).
A learned transformation 125, which is discussed in detail below, is applied to the LDI images 120 to yield a double-phase hologram 130. This hologram provides a spatially-varying phase transformation for light transmitted through the display 140 but does not introduce a spatial amplitude variation. Instead, local amplitude variation is achieved by summation of offsetting (e.g., equal magnitude) phase delays; for example, in a complex representation with neighboring phase delays a and b (in radians), the net effect is e^(ja) + e^(jb) = 2 cos((a−b)/2)·e^(j(a+b)/2).
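As a purely numerical check of this identity and of its inverse use in double-phase encoding (illustrative code only, not part of the embodiment):

```python
import numpy as np

# The offsetting-phase identity: two unit phasors sum to a phasor whose
# amplitude depends on their phase difference and whose phase is their mean.
rng = np.random.default_rng(0)
a, b = rng.uniform(-np.pi, np.pi, size=2)
assert np.allclose(np.exp(1j * a) + np.exp(1j * b),
                   2.0 * np.cos((a - b) / 2.0) * np.exp(1j * (a + b) / 2.0))

# Inverse direction: to realize amplitude A and phase p with two phase-only
# pixels, choose the pair of phases p - arccos(A) and p + arccos(A).
A, p = 0.7, 0.3
pair_sum = np.exp(1j * (p - np.arccos(A))) + np.exp(1j * (p + np.arccos(A)))
assert np.allclose(0.5 * pair_sum, A * np.exp(1j * p))
```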
Referring to
Referring to
The output 435 of the first neural network 430 is then processed by a plane translation procedure 440. This plane translation is deterministic and represents a transformation that would be applied to a complex hologram at the mid-plane to yield a complex hologram at the desired display plane for use with the display 140. This transformation is defined by the free-space propagation from the mid-plane to the display plane. This results in a complex hologram 445.
A second learned neural network 450 is applied to the complex hologram 445 to yield a modified hologram 455. This hologram is passed to a fixed (i.e., not learned and/or parameterized) double-phase encoder 460, which uses the amplitude and phase of each pixel value of the input hologram 455 to yield a phase-only representation, for example, at the same pixel resolution as the hologram or optionally at a greater resolution such as twice the pixel density of the hologram (i.e., two phase values per complex hologram value). Each phase value in the double-phase hologram controls one phase-delay element of the display 140.
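For illustration, a minimal sketch of such a checkerboard double-phase encoder follows; variable names and the amplitude normalization are assumptions made for this example rather than the exact behavior of encoder 460.

```python
import numpy as np

def double_phase_encode(hologram):
    """Convert a complex hologram into a phase-only array at the same pixel
    resolution by placing the two members of each phase pair on a
    checkerboard (a simplified sketch of a double-phase encoder).
    """
    amp = np.abs(hologram)
    amp = amp / (amp.max() + 1e-12)            # normalize so arccos is defined
    phs = np.angle(hologram)
    offset = np.arccos(np.clip(amp, 0.0, 1.0))
    h, w = hologram.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    checker = (xx + yy) % 2 == 0
    # Neighboring pixels carry phase ± offset; their sum reproduces amplitude.
    return np.where(checker, phs + offset, phs - offset)
```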
Therefore, in summary, there are two learned (trained) transformations, the first neural network 430 and the second neural network 450, while the remaining transformation steps of plane translation 440 and double-phase encoding are fixed, for example, based on physical principles.
Referring to
Each image in the focal stack 555 is an image (e.g., an RGB image) rendered based on the hologram 545 with a different focus depth. Therefore, different parts of the original volumetric image will be in focus in each of the images of the focal stack. For example, the depth dimension is uniformly sampled to yield corresponding focal images of the focal stack at those sampled depths. These images are computed digitally based on the complex hologram using conventional imaging techniques. In some examples, the depths at which the focal stack images are focused are a fixed set, while in some examples, the depths depend on the content of the volumetric image, while in yet other examples, some of the depths are fixed and some are determined from the image.
Continuing to refer to
Note that the slab stack 525 may optionally be computed “on the fly” such that the required slabs of the stack may be computed during the forming of the target hologram. Such computation may be preferable to avoid having to store the entire slab stack before computing the hologram.
This target hologram 550 is then processed to digitally compute the images of the target focal stack. For example, the angular spectrum method (ASM) may be used, as described in Matsushima, K. et al., "Band-limited angular spectrum method for numerical simulation of free-space propagation in far and near fields," Opt. Express 17(22), 19662-19673 (2009).
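The following sketch illustrates, under simplifying assumptions (no band-limiting of the transfer function, a single color channel, square pixels, and illustrative parameter values), how ASM propagation can be used to digitally render a focal stack from a complex hologram:

```python
import numpy as np

def asm_propagate(field, distance, wavelength=520e-9, pitch=8e-6):
    """Angular spectrum propagation of a complex field over `distance`.
    This is the plain ASM; the band-limited variant cited above further
    clips the transfer function to suppress aliasing at large distances.
    """
    h, w = field.shape
    fy = np.fft.fftfreq(h, d=pitch)[:, None]
    fx = np.fft.fftfreq(w, d=pitch)[None, :]
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * distance) * (arg > 0)   # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def focal_stack(hologram, depths, wavelength=520e-9, pitch=8e-6):
    """Render focal-stack intensities by propagating the hologram to each
    sampled depth and taking the squared magnitude."""
    return [np.abs(asm_propagate(hologram, d, wavelength, pitch)) ** 2
            for d in depths]
```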
Training of the neural networks 430, 450 is performed in two stages. Dividing the training into these two stages may provide advantages of convergence to a better overall solution and/or more rapid convergence to a solution, thereby reducing the amount of computation required.
Referring to
Note that a midpoint hologram computed by the trained first neural network 430 could be used for display, for example, by applying the plane transformation 440 (see
Referring to
The second training stage provides values of the trainable parameters 726 of a second neural network 450. As illustrated in
Note that the double phase encoder embodies the capabilities of the display 140 via which the hologram will be displayed. For example, if phase settings have limited resolution, if pixel resolution is limited, or if a particular approach is used to assign phase pairs that encode pixel amplitude and phase in phase-only pixels, such limitations are represented in the double-phase encoder 460 used in the training. Therefore, the neural network 450 has the possibility of compensating for limitations of the display or of the phase-only encoding process.
The training approach does not have a direct reference against which to assess the quality of the double-phase hologram. Therefore, there is no direct way of defining a loss function according to which to train the neural network 450. Rather, the loss function is defined in terms of transformations of the double-phase hologram.
The transformations of the double-phase hologram that are used to evaluate the loss function begin with processing the phase-only hologram using a complex hologram encoder 770 to return the hologram to a complex hologram form. This transformation is deterministic and makes use of the prediction of how light would be controlled by the pixels of the display. The complex hologram undergoes a plane transformation 780 to yield a mid-plane complex hologram 550. Generally, this mid-plane hologram 550 corresponds to the mid-plane hologram 645 of the first training stage (see
The mid-plane hologram 550 of the second training stage is transformed into a focal stack 785 (referred to as the “post encoding” focal stack) in a process that is essentially the same as that used to convert mid-plane hologram 645 to yield the focal stack 655 (referred to as the “pre-encoding” focal stack) as illustrated in
In examples in which the training takes into account non-ideal viewing optics through which the output hologram 130 is viewed, these non-ideal characteristics are “inverted” or otherwise compensated for in the output double-phase hologram 130. This is accomplished by the transformation of the mid-plane hologram 550 to the focal stack simulating the non-ideal optics characteristics. For example, if the hologram is to be presented to a human viewer 150 with near sight, the focal stack represents essentially the retinal images the viewer would sense rather than what an ideal camera would acquire with different focal lengths and aperture settings.
Having outlined the overall runtime and training approaches, the remainder of this document provides details of specific steps in which these approaches have been evaluated.
To compute a 3D hologram from an LDI, ray casting is performed from each point of the mesh (i.e., each pixel) at the recorded 3D location. Because runtime is ultimately unimportant for dataset synthesis, we use the silhouette-mask layer-based method (SM-LBM), as described in Zhang, H., Cao, L. & Jin, G., "Computer-generated hologram with occlusion effect using layer-based processing," Appl. Opt. 56, F138-F143 (2017), with an ultra-dense depth partition to avoid the mixed use of geometric and wave optics models. SM-LBM was originally proposed to receive a voxel grid input generated by slab-based rendering, which does not scale with increasing depth resolution. To use SM-LBM with an LDI, a non-zero pixel in the LDI is treated as defining a valid 3D point before depth quantization. When the number of depth layers N is determined, each point is projected to its nearest plane, and a silhouette is set at the same spatial location. Denote the complex amplitude distribution of the N-th layer LN∈R
Here, dN is the signed distance from the N-th layer to the hologram plane, where a negative distance denotes a layer behind the hologram plane and vice versa, AN∈x×R
where μ∈R
and the complex amplitude at the N−1 layer is updated by adding the masked complex field
L_{N−1} = C_{N−1}·M_{N−1} + L_{N−1}.
By iterating this process until reaching the first layer, the final complex hologram is obtained by propagating the updated first layer to the hologram plane.
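A compact sketch of this back-to-front sweep is shown below; the layer ordering, mask convention, and propagation distances are assumptions made for this example and may differ from the exact SM-LBM formulation.

```python
import numpy as np

def asm_propagate(field, distance, wavelength=520e-9, pitch=8e-6):
    # Plain angular spectrum propagation (see the earlier sketch).
    h, w = field.shape
    fy = np.fft.fftfreq(h, d=pitch)[:, None]
    fx = np.fft.fftfreq(w, d=pitch)[None, :]
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * distance) * (arg > 0))

def smlbm_sweep(layer_fields, silhouettes, layer_z, hologram_z=0.0):
    """Back-to-front occlusion sweep: layer_fields[0] is the layer nearest
    the hologram plane and layer_fields[-1] the farthest; silhouettes[n] is
    1 where layer n has content.  The mask (1 - silhouette) models blocking
    by the closer layer; the exact mask convention of the specification may
    differ.
    """
    field = layer_fields[-1]
    for n in range(len(layer_fields) - 2, -1, -1):
        # Propagate the accumulated field to the next closer layer, occlude
        # it behind that layer's silhouette, then add that layer's field.
        field = asm_propagate(field, layer_z[n] - layer_z[n + 1])
        field = field * (1.0 - silhouettes[n]) + layer_fields[n]
    # Finally propagate the updated first layer to the hologram plane.
    return asm_propagate(field, hologram_z - layer_z[0])
```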
We further augment SM-LBM with aberration correction at a cost of computational efficiency. Reconsidering the forward propagation of the N-th layer LN, we only process the occlusion of the frontal layers without adding their content, namely removing the second additive term in the previous equation. After processing the occlusion of all frontal layers, we propagate the resulting wavefront back to the starting location of LN to obtain an occlusion-processed L′N. We then perform aberration correction in the frequency domain
L′N = 𝓕⁻¹(𝓕(L′N) ⊙ Φ),
where ∈R
SM-LBM and its aberration-correction variant may be slow due to sequential occlusion processing. To improve the performance, we generate a new dataset with LDIs and SM-LBM holograms, and train a CNN to accelerate inference. Generating this dataset involves setting three significant parameters: the depth of the 3D volume, the number of layers used by LDIs, and the number of layers (depth resolution) used by SM-LBM.
We set the 3D volume depth to be 6 mm under collimated illumination to facilitate quantitative comparison with the publicly available TensorHolo V1 network as described in Shi, L., et al., "Towards real-time photorealistic 3D holography with deep neural networks," Nature 591, 234-239 (2021), and similarly for the random scene configuration. To determine the number of layers for LDIs, we compute the mean peak signal-to-noise ratio (PSNR) and the mean structural similarity index (SSIM) for the amplitude maps of the holograms computed from LDIs with N=1, 2, . . . , 9 layers against the ones computed from LDIs with N=10 layers (after which we observe few valid pixels) over 10 random scenes. The mean SSIM plateaus after N=5, reflecting a diminishing improvement with more layers. Thus, we choose N=5 for this work, but more layers can be used for higher accuracy. Similarly, to determine the number of layers for SM-LBM, we compute the holograms using 2N
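For illustration, the layer-count selection can be scripted along the following lines; the metric functions are from scikit-image, and the data layout (a dictionary of amplitude maps keyed by layer count) is an assumption made for this example.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def layer_count_scores(amp_by_layers, ref_amp):
    """Compare hologram amplitude maps computed from N-layer LDIs against a
    reference (e.g., the N = 10 result) to find where quality plateaus."""
    scores = {}
    for n, amp in sorted(amp_by_layers.items()):
        scores[n] = (
            peak_signal_noise_ratio(ref_amp, amp, data_range=ref_amp.max()),
            structural_similarity(ref_amp, amp, data_range=ref_amp.max()),
        )
    return scores   # choose the smallest N after which SSIM stops improving
```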
Although a CNN can be trained to directly predict an unconstrained 3D phase-only hologram using unsupervised learning, by only forcing the focal stack to match the one produced by the target complex hologram, ablation studies have shown that removing the supervision of the ground truth complex hologram noticeably degrades the image quality, and enforcing the phase-only constraint only worsens the performance. Moreover, direct synthesis of phase-only holograms prevents the use of the midpoint hologram for reducing computational cost, since learning an unconstrained midpoint phase-only hologram does not guarantee a uniform amplitude at the target hologram plane.
As introduced above, a two-stage supervised and unsupervised training is used to overcome these challenges. An insight is to keep using the double phase principle to perform phase-only encoding for retaining the advantage of learning the midpoint hologram, while embedding the encoding process into the end-to-end training pipeline and relegating the CNNs to discover the optimal pre-encoding complex hologram through unsupervised training. We detail the training process below and refer to this neural phase-only conversion method as the deep double phase method (DDPM).
The first stage supervised training trains two versions of the CNN. Both are trained to predict the target midpoint hologram computed from the LDI input, but one receives the full LDI and the other receives only the first layer of the LDI. The latter CNN has the additional job of hallucinating the occluded points close to the depth boundaries and filling in their missing wavefront. It is particularly useful for reducing the rendering overhead and for reproducing real-world scenes captured as RGB-D images, where physically capturing a pixel-aligned LDI is nearly impossible.
Once the CNN excels at this task, we initialize the second stage unsupervised training by applying a chain of operations to the network-predicted midpoint hologram H̃mid = Ãmid exp(i·ϕ̃mid). First, it is propagated to the target hologram plane and pre-processed by a second CNN to yield the pre-encoding target hologram prediction
where doffset is the signed distance from the midpoint hologram to the target hologram plane, Ãtgt-pre is the normalized amplitude, and ãtgt-pre is the scale multiplier. The second CNN serves as a content-adaptive filter to replace the Gaussian blur in AA-DPM. The exponential phase correction term ensures that the phase after propagation is still roughly centered at 0 for all color channels. It is also important to the success of AA-DPM, which minimizes phase wrapping. Next, the standard double phase encoding is applied to obtain a phase-only hologram
P(x, y) = 0.5 ãtgt-pre exp(i(ϕ̃tgt-pre(x, y) − cos⁻¹ Ãtgt-pre(x, y))) when x+y is odd, and
P(x, y) = 0.5 ãtgt-pre exp(i(ϕ̃tgt-pre(x, y) + cos⁻¹ Ãtgt-pre(x, y))) when x+y is even,
and no pre-blurring is applied in contrast to AA-DPM. Third, the phase-only hologram is filtered in the Fourier space to obtain the post-encoding target hologram prediction
H̃tgt-post = 𝓕⁻¹(𝓕(P) ⊙ MFourier),
where MFourier models a circular aperture in the Fourier plane
Here, r is the radius of the aperture in the pixel space. We set it to half of the image resolution, which lets the entire first-order diffraction pass through the physical aperture. Finally, the post-encoding target hologram prediction is propagated back to yield the post-encoding midpoint hologram
H̃mid-post = ASM(H̃tgt-post, doffset).
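The chain of operations just described (double phase encoding of the pre-processed target hologram, filtering through a circular Fourier-plane aperture, and propagation back to the midpoint plane) can be sketched as follows. This NumPy version is illustrative only: in training these steps would be expressed as differentiable operations in the learning framework, and the sign and normalization conventions are assumptions of the example.

```python
import numpy as np

def post_encoding_chain(amp, phase, d_offset, wavelength=520e-9, pitch=8e-6):
    """amp/phase: normalized amplitude and phase of the pre-encoding target
    hologram prediction (the scale multiplier is omitted for simplicity)."""
    h, w = amp.shape

    # Standard double phase encoding on a checkerboard.
    offset = np.arccos(np.clip(amp, 0.0, 1.0))
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sign = np.where((xx + yy) % 2 == 0, 1.0, -1.0)
    p = 0.5 * np.exp(1j * (phase + sign * offset))

    # Circular aperture in the Fourier plane (radius ~ half the resolution).
    cy, cx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    m_fourier = np.fft.ifftshift((cx**2 + cy**2) <= (min(h, w) / 2) ** 2)
    h_tgt_post = np.fft.ifft2(np.fft.fft2(p) * m_fourier)

    # Propagate back over the offset distance with the angular spectrum
    # method (sign convention chosen for the example).
    fy = np.fft.fftfreq(h, d=pitch)[:, None]
    fx = np.fft.fftfreq(w, d=pitch)[None, :]
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    h_mid_post = np.fft.ifft2(np.fft.fft2(h_tgt_post)
                              * np.exp(-1j * kz * d_offset) * (arg > 0))
    return p, h_tgt_post, h_mid_post
```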
By appending these operations, the second stage unsupervised training fine-tunes the CNN prediction using the dynamic focal stack loss calculated between the post-encoding midpoint hologram and the ground truth midpoint hologram, plus a regularization loss on the pre-encoding target hologram phase.
The regularization loss encourages the pre-encoding target hologram phase to be zero mean and to exhibit a small standard deviation. This term minimizes phase wrapping during the double phase encoding, which may not affect the simulated image quality but may degrade the experimental result. Without this loss, the unregulated phase exhibits a large standard deviation and shifts away from zero mean, leading to non-negligible phase wrapping, especially when the maximum phase modulation is limited to 2π.
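An illustrative form of the combined second-stage objective is sketched below; the exact loss terms and weights of the embodiment are not reproduced here.

```python
import numpy as np

def second_stage_loss(fs_pred, fs_target, phase_pre, w_pcp=1.0, w_reg=0.07):
    """Focal-stack data term between the post-encoding prediction and the
    ground truth, plus a regularizer pushing the pre-encoding phase toward
    zero mean and small standard deviation (illustrative form only)."""
    data = np.mean([np.mean(np.abs(p - t)) for p, t in zip(fs_pred, fs_target)])
    reg = np.abs(np.mean(phase_pre)) + np.std(phase_pre)
    return w_pcp * data + w_reg * reg
```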
In the second training stage, direct supervision from the ground truth midpoint hologram is intentionally ablated. This expands the solution space by allowing the CNNs to freely explore the neural filtering to optimally match the ground truth focal stack, which a user ultimately sees. It also facilitates regularization on the pre-encoding target hologram phase to better handle hardware limitations (i.e., a limited range of phase modulation). In practice, the resulting prediction of the post-encoding midpoint hologram phase visually differs from the ground truth, as high-frequency details are attenuated or altered in a spatially-varying and content-adaptive manner to avoid speckle noise. We find that direct supervision, which encourages retention of high-frequency details, negatively impacts speckle elimination.
Collectively, the two-stage training first excels at reproducing the ground truth complex 3D holograms at all levels of detail, then fine-tunes a display-specific CNN for fully automatic speckle-free 3D phase-only hologram synthesis. The second training stage takes fewer iterations to converge, therefore, upon the completion of the first training stage, it is efficient to optimize multiple CNNs for different display configurations. The training process is detailed below.
The CNNs are implemented and trained using TensorFlow 1.15 on an NVIDIA RTX 8000 GPU with the Adam optimizer. The hologram synthesis CNN consists of 30 convolution layers with 24 3×3 kernels per layer. The pre-filtering CNN uses the same architecture, but with only 8 convolution layers and 8 3×3 kernels per layer. When the target hologram coincides with the midpoint hologram, the pre-filtering CNN can be omitted. The learning rate is 0.0001 with an exponential decay rate of β1=0.9 for the first moment and β2=0.99 for the second moment. The first stage training runs for 3000 epochs. The second stage training first pre-trains the pre-filtering CNN for 50 epochs for identity mapping and then for 1000 epochs jointly with the hologram synthesis CNN. The pre-training accelerates the convergence and yields better results. Both versions of the CNN use a batch size of 2, wdata=1.0, wpcp=1.0, wtgt-pcp=0.07, where wdata, wpcp, wtgt-pcp are the weights for the data fidelity loss, the dynamic focal stack loss, and the regularization loss, respectively.
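For reference, the optimizer configuration described above corresponds to the following TensorFlow 1.x setup; the loss construction is replaced by a placeholder so the snippet is self-contained.

```python
import tensorflow as tf  # TensorFlow 1.x API, as in the described setup

w_data, w_pcp, w_tgt_pcp = 1.0, 1.0, 0.07       # loss weights listed above
dummy = tf.Variable(1.0)
loss = w_data * tf.square(dummy)                # placeholder for the real objective
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, beta1=0.9, beta2=0.99)
train_op = optimizer.minimize(loss)
```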
The experimental setup uses a HOLOEYE PLUTO (VIS-014) phase-only LCOS SLM with a resolution of 1920×1080 pixels and a pixel pitch of 8 μm. This SLM provides a refresh rate of 60 Hz (monochrome) with a bit depth of 8 bits. The laser is a FISBA RGBeam single-mode fiber-coupled module with three optically aligned laser diodes at wavelengths of 638, 520, and 450 nm. The diverging beam emitted by the laser is collimated by a 300 mm achromatic doublet (Thorlabs AC254-300-A-ML) and polarized (Thorlabs LPVISE100-A) to match the SLM's functional polarization direction. The beam is directed to the SLM by a beamsplitter (Thorlabs BSW10R), and the SLM is mounted on a linear translation stage (Thorlabs XRN25P/M). When displaying holograms with different relative positions to the 3D volumes, we adjust the linear translation stage to keep the position of the 3D volumes stationary and thus avoid modification of the following imaging optics. The modulated wavefront is imaged by a 125 mm achromat (Thorlabs AC254-125-A-ML) and magnified by a Meade Series 5000 21 mm MWA eyepiece. An aperture is placed at the Fourier plane to block excessive light diffracted by the grating structure and higher-order diffractions. A SONY A7M3 mirrorless full-frame camera paired with a 16-35 mm f/2.8 GM lens is used to photograph the results. A Labjack U3 USB DAQ is used to send field sequential signals and synchronize the display of color-matched phase-only holograms.
Hardware imperfections can cause experimental results to deviate from the idealized simulations. Here we discuss methods to compensate for three sources of error: laser source intensity variation as a Gaussian beam, the SLM's non-linear voltage-to-phase response, and optical aberrations.
To calibrate the laser source intensity variation, we substitute the SLM with a diffuser and capture the reflected beam as a scaling map for adjusting the target amplitude. A 5×5 median filter is applied to the measurements to avoid pepper noise caused by dust on the optical elements. A Gaussian mixture model can be used to fit an analytical model of the resulting scaling map if needed.
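A minimal sketch of this intensity-map cleanup follows; the square-root conversion from measured intensity to an amplitude scale is an assumption of the example.

```python
import numpy as np
from scipy.ndimage import median_filter

def source_scaling_map(captured_beam):
    """Clean a captured image of the diffuser-reflected beam with a 5x5
    median filter (suppressing pepper noise from dust) and convert it to an
    amplitude scaling map for adjusting the target amplitude."""
    cleaned = median_filter(captured_beam.astype(np.float64), size=5)
    return np.sqrt(cleaned / cleaned.max())
```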
For an imprecisely calibrated SLM, the non-linear voltage-to-phase response can severely reduce display contrast, especially for double-phase encoded holograms, since achieving deep black requires offsetting the checkerboard grating accurately by 1π. In many cases, the pixel response is also spatially non-uniform, thus using a global look-up table is often inadequate. Other calibration methods may operate on the change of interference fringe offset or the change of near/far-field diffraction pattern, but they do not produce a per-pixel look-up table (LUT). The present calibration procedure uses double phase encoding to accomplish this goal. Specifically, for every 2-by-2 pixels, we keep the top right and bottom left pixels at 0 as a reference and increase the top left and bottom right pixels jointly from 0 to 255. Without modifying the display layout, we set the camera focus on the SLM and capture the change of intensity for the entire frame. If the phase modulation range for the operating wavelength is greater than or equal to 2π, the intensity of the captured image will decrease to the minimum at a 1π offset, return to the maximum at a 2π offset, and repeat this pattern for every 2π cycle. Denoting the k-th captured image Ik, the absolute angular difference in polar coordinates between a reference pixel and an active pixel set to k is
where Imin(x, y) and Imax(x, y) are the minimal and maximal intensities measured at location (x, y) when sweeping from 0 to 255. Let kmin(x, y) be the frame index associated with the minimal measurement at (x, y); the phase difference is then given by
Experimentally, we take high-resolution measurements (24 megapixels) of the SLM response, downsample to the SLM resolution, perform the aforementioned calculations, and fit a linear generalized additive model (GAM) with a monotonically increasing constraint to obtain a smoothed phase curve for producing a per-pixel LUT. For simplicity, the LUT is directly loaded into the GPU memory for fast inference. To reduce memory consumption, a multi-layer perceptron can be learned and applied as a 1×1 convolution. This in-situ calibration procedure eliminates potential model mismatch between a separate calibration setup and the display setup. The ability to accurately address phase differences results in more accurate color reproduction, i.e., producing deep black by accurately addressing a 1π phase offset.
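One plausible reconstruction of the per-pixel phase recovery, assuming simple two-beam interference between the reference and active pixels (the exact expressions used in the embodiment are abbreviated above), is sketched below.

```python
import numpy as np

def phase_from_sweep(images, k_min):
    """images: stack of shape (256, H, W), one frame per gray level k;
    k_min: (H, W) frame index of the per-pixel intensity minimum.
    Under a two-beam model I_k is proportional to 1 + cos(dphi_k), so the
    absolute phase difference follows from the normalized intensity, and
    k_min disambiguates the branch beyond pi.
    """
    i_min = images.min(axis=0)
    i_max = images.max(axis=0)
    norm = (images - i_min) / np.maximum(i_max - i_min, 1e-9)
    abs_diff = np.arccos(np.clip(2.0 * norm - 1.0, -1.0, 1.0))   # in [0, pi]
    k = np.arange(images.shape[0])[:, None, None]
    phase = np.where(k <= k_min[None], abs_diff, 2.0 * np.pi - abs_diff)
    return phase          # per-pixel phase vs. gray level -> per-pixel LUT
```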
The optical aberrations are corrected using a variant of a technique described in Maimone, A., Georgiou, A. & Kollin, J. S. “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36, 1-16 (2017). Let ϕd∈R
to model system aberrations, where aj
Φd = ATFd = 𝓕(PSFd) = 𝓕(ASM(ϕd, d)), which we use to perform frequency-domain aberration correction for the occlusion-processed layer. Note that this calibration procedure can be performed for different focal distances, and the parameters can be piecewise linearly interpolated.
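An illustrative construction of this correction term follows; treating the fitted aberration phase as a complex field exp(i·ϕd) before propagation, and the propagation conventions, are assumptions of this sketch.

```python
import numpy as np

def aberration_atf(phi_d, distance, wavelength=520e-9, pitch=8e-6):
    """Propagate the aberration phase map phi_d over `distance` with the
    angular spectrum method to obtain a PSF-like field, whose Fourier
    transform serves as the frequency-domain correction term Phi_d.
    Fitting of the aberration coefficients is not shown."""
    h, w = phi_d.shape
    fy = np.fft.fftfreq(h, d=pitch)[:, None]
    fx = np.fft.fftfreq(w, d=pitch)[None, :]
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    psf_d = np.fft.ifft2(np.fft.fft2(np.exp(1j * phi_d))
                         * np.exp(1j * kz * distance) * (arg > 0))
    return np.fft.fft2(psf_d)   # Phi_d, applied as ifft2(fft2(layer) * Phi_d)
```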
For compact setups with strong aberrations, spatially-varying aberration correction is often needed. In this case, we can calibrate the display at multiple points (e.g., 15 points) and update the above procedure by convolving with a spatially-varying PSFd(x, y) calculated by interpolating the nearest measured parameters. Note that this operation can only be performed in the spatial domain and not in the Fourier domain. However, GPUs can accelerate this process, and speed is ultimately not critical for the sake of dataset generation. On the learning side, the CNN needs to receive an additional two-channel image that records the normalized x-y coordinates to learn aberration correction in a spatially-varying manner.
Examples of the approaches described above may be implemented in software, in hardware, or in a combination of software and hardware. Hardware may include application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and the like. Software includes instructions, which are generally stored on non-transitory machine-readable media. Such instructions may be executed by a physical (i.e., circuit-implemented) processor, such as a general purpose computer processor (e.g., a central processing unit, CPU), a special purpose or application-specific processor, or an attached processor, such as a numerical accelerator or graphics processing unit (GPU). The processor may be hosted in a variety of devices. For example, for presentation of three-dimensional content to a user, the hologram generation procedures described above may be executed in whole or in part on a mobile user device, such as a smartphone, a head-mounted display system (e.g., 3D goggles), or as part of a heads-up display (e.g., an in-vehicle display). Some or all of the procedures may be performed at a central computer, for example at a network accessible server in a data center (which may be referred to as being in the "cloud"). Some of the computations may be performed while content is being displayed, while some of the computations may be performed in non-real time before presentation. For example, computation of an LDI representation of an image may be performed before presentation of a three-dimensional video to the user. Training procedures can be performed on different computers than the runtime rendering of content to the user. Incorporating optics corrections into the neural networks may be done ahead of time, for example, prior to fielding a system to a particular user. In other examples, the neural networks are adapted to the characteristics of a particular user by modifying the parameters of the neural networks prior to using them for display to the user.
A number of embodiments of the invention have been described. Nevertheless, it is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the following claims. Accordingly, other embodiments are also within the scope of the following claims. For example, various modifications may be made without departing from the scope of the invention. Additionally, some of the steps described above may be order independent, and thus can be performed in an order different from that described.
This application claims the benefit of: U.S. Provisional Application No. 63/167,441, filed Mar. 29, 2021 and titled “Data-Efficient Photorealistic 3D Holography using Layered Depth Images and Deep Double Phase Encoding”; PCT Application No. PCT/US21/28449, filed Apr. 21, 2021, published as WO/2021/216747 on Oct. 28, 2021, and titled “Real-Time Photorealistic 3D Holography with Deep Neural Networks”; and U.S. Provisional Application No. 63/257,823, filed Oct. 20, 2021 and titled “Data-Efficient Photorealistic 3D Holography,” which are incorporated herein by reference. For United States purposes, this application is a Continuation-in-Part (CIP) of PCT Application No. PCT/US21/28449, filed Apr. 21, 2021, which claims the benefit of U.S. Provisional Applications 63/013,308, filed Apr. 21, 2020, and 63/167,441, filed Mar. 29, 2021, which are each incorporated herein by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2022/021853 | 3/25/2022 | WO |
Number | Date | Country
---|---|---
63/257,823 | Oct. 2021 | US
63/167,441 | Mar. 2021 | US
63/013,308 | Apr. 2020 | US
63/167,441 | Mar. 2021 | US
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2021/028449 | Apr. 2021 | WO
Child | 18284920 | | US