1. Field of the Invention
This disclosure relates generally to multi-focal displays.
2. Description of Related Art
Multi-focal displays (MFDs) typically use rapid temporal and focal modulation of a series of 2-dimensional images to render 3-dimensional (3D) scenes that occupy a certain 3D volume. This series of images is typically focused at parallel planes positioned at different, discrete distances from the viewer. The number of focal planes directly affects the viewers' eye accommodation and 3D perception quality of a displayed scene. If a given 3D scene is continuous in depth, too few planes may make the MFD rendering look piecewise with discontinuities between planes or result in contrast loss. More planes are typically better in terms of perceptual quality, but can be more expensive to implement and often may not be achievable because of practical display limitations including bandwidth and focal modulation speed.
Therefore, an important consideration for MFDs is the focal plane configuration, including the number of focal planes and the location of the focal planes (that is, distances from the viewer). Multi-focal displays typically use focal plane configurations where the number and location of focal planes are fixed. Often, the focal planes are uniformly spaced. This one-size-fits-all approach does not take into account differences in the scenes to be displayed, and the result can be a loss of spatial resolution and perceptual accuracy.
Thus, there is a need for better approaches to determining focal plane configuration.
The present disclosure overcomes the limitations of the prior art by selecting the locations of the focal planes for a multi-focal display, based on an analysis of the scene to be rendered by the multi-focal display. In one example, a distortion metric is defined that measures a distortion between an ideal rendering of a three-dimensional scene versus the rendering by a limited number of focal planes in the multi-focal display. The locations of the focal planes are selected by optimizing the distortion metric. One distortion metric is based on differences between the location of a point in the ideal rendering versus the location of the closest focal planes of the multi-focal display. Another distortion metric is based on differences in the defocus blurring for the ideal rendering versus the rendering by the multi-focal display.
Other aspects include components, devices, systems, improvements, methods, processes, applications, computer-readable media, and other technologies related to any of the above.
Embodiments of the disclosure have other advantages and features which will be more readily apparent from the following detailed description and the appended claims, when taken in conjunction with the accompanying drawings, in which:
The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Introduction
Optional pre-processing module 130 receives data representing the 3D scene to be rendered and adapts it to rendering requirements. For example, pre-processing module 130 may perform functions such as magnifying, cropping and sharpening. Focal plane placement module 140 analyzes the content of the 3D scene and selects the locations of the focal planes based on the content analysis. The selection can also be based on rendering requirements. Scene separation module 150 separates the 3D scene into the constituent 2D images to be rendered. This typically involves depth blending, as will be described below. The content of each 2D image will depend on the focal plane locations. Rendering engine 160 then renders the 2D images onto the display, in coordination with adjustment of the optical element 120 to effect the different focal planes. Additional post-processing can also be performed. For example, smoothing constraints (temporal and/or spatial) may be applied, or occlusion edges may be processed to further improve perceived quality.
Depth Blending
MFD technology can represent a 3D scene by a series of 2D images at different focal planes due to a concept known as depth blending. By illuminating two adjacent focal planes simultaneously, a focus cue may be rendered at any axial distance between the planes. Since the two focal planes lie along a line of sight, the luminance provided by each of the adjacent focal planes determines where the cue will be highest (where the eye perceives the highest visual quality, or where the area under the modulation transfer function (MTF) observed by the eye is highest).
A simple form of luminance weighting used for depth blending is a linear interpolation of the luminance values observed by each pixel for the adjacent focal planes, which we will use as an example although other types of depth blending can also be used. Let wn and wf respectively denote the luminance weights given to the near and far focal planes. These values, which sum to 1 to retain the correct luminance perceived by the eye, are computed as follows:
$w_f = \dfrac{z - z_n}{z_f - z_n}, \qquad w_n = 1 - w_f$
where zn and zf are the locations of the near and far focal planes and z is the actual location of the object in the 3D scene, which is between zn and zf. In this linear formulation, if z=zn (object point at the near focal plane), then wf=0 and wn=1, meaning that all of the luminance is allocated to the near focal plane. Conversely, if z=zf (object at the far focal plane), then wf=1 and wn=0, and all of the luminance is allocated to the far focal plane. For an intermediate position such as z=(zn+zf)/2, then wf= 1/2 and wn= 1/2 so luminance is split between the far and near focal planes. In this way, a virtual object can be rendered at any position z between zn and zf by splitting its luminance between the two images rendered at focal planes zn and zf.
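The linear blending rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation; the function name `blend_weights` and its argument names are hypothetical.

```python
def blend_weights(z, z_near, z_far):
    """Return (w_near, w_far) for an object at z between two focal planes.

    Linear depth blending: the far weight grows linearly from 0 at the
    near plane to 1 at the far plane, and the two weights sum to 1.
    """
    if z_near == z_far:
        raise ValueError("focal planes must be at distinct locations")
    lo, hi = min(z_near, z_far), max(z_near, z_far)
    if not lo <= z <= hi:
        raise ValueError("z must lie between the two focal planes")
    w_far = (z - z_near) / (z_far - z_near)
    w_near = 1.0 - w_far
    return w_near, w_far
```

For example, `blend_weights(1.5, 1.0, 2.0)` splits the luminance evenly between the two planes, matching the midpoint case discussed above.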
Problem Formulation
We first formulate the problem of placement of focal planes based on a given objective function, and then show two examples of different objective functions. The objective function typically is a type of distortion metric that measures a distortion between an ideal rendering of the 3D scene versus the rendering by the MFD.
Let (x,y,z) denote the two transverse dimensions and the axial dimension of the 3D space rendered by the MFD. In practice, what we are typically given are the following quantities:
an N-voxel 3D scene to be projected S={(pn, In), n=1, . . . , N}, where pn=(xn,yn,zn) denotes a vector of 3D coordinates of a 3D point, and In denotes the intensity or color value of that 3D point. These points can be obtained by a 3D camera or generated by a computer graphics engine, for example.
number of available depth planes M
Given these quantities, we want to estimate the following unknown variables:
position of focal planes q=(q1, q2, . . . , qM). Note that the values qm are actually z-coordinates of focal planes and that the focal planes are fronto-parallel to the eye. We use q instead of z to clearly separate the focal plane positions from other z values.
To estimate the best positions of focal planes, we formulate the following optimization problem:
$q^* = \arg\min_{q} D(S, q)$
where the objective function D(S, q) denotes a distortion error metric for representing a 3D scene S on M focal planes positioned at q=(q1, q2, . . . , qM). This can in general be any metric that measures the error relative to a perfect rendering.
Alternately, we can pose the optimization problem such that it finds a solution for focal plane placement that maximizes the quality of the 3D scene rendering Q(S, q):
$q^* = \arg\max_{q} Q(S, q)$
In the following, we show two specific examples of automatic focal plane placement. In the first example, we use an error metric D(S,q) and minimize it to obtain q. In the second example, we use a quality metric Q(S,q) that can be used for focal plane placement. Other distortion metric functions, including other error or quality metrics, can be used as well.
The first example of an objective function can be derived by considering the problem of focal plane placement as a clustering problem. Given the z-coordinates of all 3D data points in a scene, that is, given z1, z2, . . . , zN, we can use the K-means algorithm to find the best placement of M focal planes. In this case, our optimization problem becomes the standard K-means objective:
$q^* = \arg\min_{q} \sum_{n=1}^{N} \min_{m \in \{1, \dots, M\}} (z_n - q_m)^2$
Solving this problem using the K-means algorithm gives a placement of focal planes such that the focal planes used to represent 3D data are close to the actual location of the data. Hence, in most cases this optimization problem will give a solution different from the conventional strategy of uniform focal plane spacing. Note that in the optimization above, instead of distance z in meters, we can also use distance in diopters (inverse meters) or other measures of optical power, in order to account for the decreasing sensitivity of depth perception with increasing distance.
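As a rough illustration of the clustering view, the sketch below runs a one-dimensional K-means (Lloyd's iteration) on scene depths expressed in diopters. It is only a sketch under stated assumptions: the names `meters_to_diopters` and `kmeans_1d` are hypothetical, and the deterministic initialization (centers spread evenly over the sorted data) is one simple choice among many.

```python
def meters_to_diopters(depths_m):
    """Convert distances in meters to optical power in diopters (1/m)."""
    return [1.0 / d for d in depths_m]


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns the k cluster centers, sorted."""
    values = sorted(values)
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    # Assumed initialization: pick centers spread evenly over the sorted data.
    if k == 1:
        centers = [values[0]]
    else:
        centers = [values[(i * (len(values) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each depth to its nearest focal plane candidate.
            nearest = min(range(k), key=lambda m: abs(v - centers[m]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        updated = [sum(c) / len(c) if c else centers[m]
                   for m, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)
```

Calling `kmeans_1d(meters_to_diopters(depths), M)` would then return M candidate focal plane positions in diopters. Like any K-means run, the result depends on the initialization and may be a local optimum.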
Spatial frequencies of the content also impact accommodative response when depth blending is used. For low-frequency stimuli (for example, 4 cycles per degree, or cpd), linear depth blending can drive accommodation relatively accurately between planes. But for high-frequency stimuli (for example, 21 cpd) and broadband stimuli (for example, 0-30 cpd), accommodation is almost always at or near a focal plane no matter how the luminance weights wf, wn are distributed. Therefore, a weighted K-means algorithm can be used to take this spatial frequency dependency into account. For example, if the spatial frequency or spatial gradient value near a point is higher than a threshold, it can be assigned a large weight, otherwise it can be assigned a small weight. Denote
Table 1 below shows the focal plane positions using uniform focal plane spacing, using K-means focal plane spacing and using weighted K-means focal plane spacing.
These focal plane locations are also shown by the arrows above the graph in
K-means is used just as an example. Other clustering techniques can be applied, for example clustering based on Gaussian Mixture Models (GMM) or support vector machines (SVM).
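The weighted variant described above can be sketched as follows. This is a minimal illustration, assuming per-point weights derived from a simple threshold on a local gradient value; the names `threshold_weights` and `weighted_kmeans_1d` and the default weight values (10 and 1) are hypothetical, not values given in this disclosure.

```python
def threshold_weights(gradients, threshold, high=10.0, low=1.0):
    """Large weight where the local gradient exceeds the threshold, small otherwise."""
    return [high if g > threshold else low for g in gradients]


def weighted_kmeans_1d(values, weights, k, iterations=100):
    """K-means on scalars where each point pulls its center in proportion to its weight."""
    pairs = sorted(zip(values, weights))
    if not 1 <= k <= len(pairs):
        raise ValueError("k must be between 1 and the number of values")
    # Assumed initialization: centers spread evenly over the sorted data.
    if k == 1:
        centers = [pairs[0][0]]
    else:
        centers = [pairs[(i * (len(pairs) - 1)) // (k - 1)][0] for i in range(k)]
    for _ in range(iterations):
        sums = [0.0] * k
        totals = [0.0] * k
        for v, w in pairs:
            nearest = min(range(k), key=lambda m: abs(v - centers[m]))
            sums[nearest] += w * v
            totals[nearest] += w
        # Weighted mean per cluster; an empty cluster keeps its previous center.
        updated = [sums[m] / totals[m] if totals[m] > 0 else centers[m]
                   for m in range(k)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)
```

With these weights, points carrying high spatial frequency content draw the focal planes toward their own depths, which is the intended effect when accommodation tends to lock onto a focal plane.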
When a given 3D scene with continuous depth values is displayed on a multi-focal display with a finite number of focal planes, human eyes will perceive it with a certain amount of defocus compared to an ideal continuous 3D rendering. We describe here a model of that defocus, which we then use within our objective function for focal plane placement. Namely, our objective function will place the focal planes such that it maximizes the quality of the 3D scene rendering by minimizing the defocus.
Optical defocus is typically modeled through Fourier optics theory, in a continuous waveform domain. Therefore, assume that a given 3D scene is a set of samples from a continuous 3D function f(x,y,z), where we have that In=f(xn,yn,zn) for n=1, 2, . . . , N given points in our 3D scene. We first provide a Fourier derivation of a human eye's sensitivity to defocus and then use the derived theory to define a quality metric for a given 3D scene.
Let primed coordinates (x′, y′) denote the retinal coordinates. When the eye accommodates to a distance ze, a 2D retinal image g(x′, y′) may be expressed as a convolution of the 3D object with the 3D blur kernel h(x, y, z) evaluated at a distance ze−z, followed by integration along the axial dimension:
g(x′, y′, ze)=∫∫∫f(x, y, z)h(x−x′, y−y′, ze−z)dxdydz. (9)
Note that in the case of in-focus plane-to-plane imaging (ze−z=0), the convolution kernel h reduces to the eye's impulse response. This configuration yields maximum contrast, where contrast is defined in the conventional way in the spatial frequency domain. Deviations from that in-focus imaging result in a reduction in contrast. The severity of the lost contrast depends on the amount of defocus.
To quantify the effects of defocus, we turn to the pupil function of the eye's optical system. For a rotationally-symmetric optical system with focal length F and circular pupil of diameter A, the lens transmittance through the exit pupil is modeled as:
where the pupil function P is given by
In our system, the pupil diameter A may vary between approximately 2 and 8 mm based on lighting conditions. Though the eye is, in general, not rotationally symmetric, we approximate it as such to simplify formulation in this example.
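For reference, a common textbook way to write the lens transmittance and the circular pupil function is sketched below. The thin-lens quadratic phase and its sign, and the hard-edged aperture, are standard Fourier optics conventions assumed here for illustration; the symbol t for the transmittance is hypothetical, and λ denotes the wavelength.

```latex
t(x, y) = \exp\!\left(-\frac{i\pi}{\lambda F}\,(x^2 + y^2)\right) P(x, y),
\qquad
P(x, y) =
\begin{cases}
1, & \sqrt{x^2 + y^2} \le A/2,\\
0, & \text{otherwise.}
\end{cases}
```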
In the presence of aberrations, the wavefront passing through the pupil is conventionally represented by the generalized pupil function G(x, y)=P(x, y)exp(iφ(x, y)), where the aberration function φ is a polynomial according to Seidel or Zernike aberration theory. The defocus aberration is commonly measured by the coefficient w20 of φ. Defocus distortion can alternatively be modeled by including a distortion term θz in the pupil function and defining the pupil function of a system defocused by distance θz in axial dimension as
$P_{\theta}(x, y) = \exp\!\left(\pi i\,(\theta_z/\lambda)\,(x^2 + y^2)\right) P(x, y)$, (11)
where θz=1/z+1/zr−1/F with zr being the distance between the pupil and the retina. The relationship between θz and the conventional defocus aberration coefficient w20 is given by θz=2w20/A². Using this formulation, we can formulate the defocus transfer function, which is the optical transfer function of the defocused system, as the auto-correlation of the pupil function of the defocused system as follows:
Now we replace the defocus distortion distance θz with 1/ze−1/z and define the normalized defocus transfer function (DTF) of the eye as
Optical aberrations of the eye and/or the MFD system can be modeled into the DTF as well.
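To make the auto-correlation idea concrete, the sketch below numerically evaluates a normalized defocus transfer function for a simplified one-dimensional slit pupil. This is an assumption made to keep the example short: the text above uses a circular pupil, and the mapping from pupil-plane shift to spatial frequency is deliberately left out. The function names and parameters are hypothetical.

```python
import cmath


def defocused_pupil(x, theta, wavelength, aperture):
    """1D analogue of the defocused pupil: quadratic phase inside the aperture, zero outside."""
    if abs(x) > aperture / 2:
        return 0j
    return cmath.exp(1j * cmath.pi * (theta / wavelength) * x * x)


def defocus_transfer_1d(shift, theta, wavelength, aperture, samples=2000):
    """Auto-correlation of the defocused pupil at a pupil-plane shift, normalized to 1 at zero shift."""
    dx = aperture / samples
    total = 0j
    for k in range(samples):
        x = -aperture / 2 + (k + 0.5) * dx  # midpoint rule over the aperture
        a = defocused_pupil(x + shift / 2, theta, wavelength, aperture)
        b = defocused_pupil(x - shift / 2, theta, wavelength, aperture)
        total += a * b.conjugate()
    # The zero-shift value equals the aperture width, so divide by it to normalize.
    return total * dx / aperture
```

With zero defocus the magnitude falls off linearly with the shift (the familiar triangular response of a slit aperture), and a nonzero `theta` lowers the magnitude at the same shift, which is the contrast loss the text attributes to defocus.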
The image as formed on the retina is described by the multiplication of the defocus transfer function and the Fourier transform of the function f(u,v,z) describing the object displayed at distance z from the eye by
$\hat{g}(u, v, z, z_e) = \hat{H}(u, v, z, z_e)\,\hat{f}(u, v, z)$. (14)
In an MFD system, we can typically display only a small number of focal planes fast enough to be perceived as simultaneously displayed by the human eye. For the case that two objects are being displayed at two focal planes located at distances q1 and q2 away from the eye, the eye integrates the two objects as imaged through the eye's optical system. That is, it integrates over the light emitted from the two objects after passing through the eye's optical system described by the defocus transfer function. We derive this image formation at the retina plane by the following formula
$\hat{g}_r(u, v, q_1, q_2, z_e) = \hat{H}(u, v, q_1, z_e)\,\hat{f}(u, v, z) + \hat{H}(u, v, q_2, z_e)\,\hat{f}(u, v, z)$. (15)
If linear depth blending is applied to the input scene f(x,y,z), using coefficients w1 and w2, then the Fourier transform of the perceived image on the retina is described by
$\hat{g}_r(u, v, q_1, q_2, z_e) = w_1 \hat{H}(u, v, q_1, z_e)\,\hat{f}(u, v, z) + w_2 \hat{H}(u, v, q_2, z_e)\,\hat{f}(u, v, z)$. (16)
Using this observation, we define the depth-blended defocus transfer function of the entire system as
$\hat{H}_{\mathrm{blend}}(u, v, (q_1, q_2), z_e) = w_1 \hat{H}(u, v, q_1, z_e) + w_2 \hat{H}(u, v, q_2, z_e)$, (17)
We can also generalize this blending function using all display planes q1, . . . , qM to derive an effective or blended transfer function for the multi-focal display as:
$\hat{H}_{\mathrm{blend}}(u, v, q, z_e) = \sum_{m=1}^{M} w_m \hat{H}(u, v, q_m, z_e)$
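The generalization to M planes can be sketched as a weight vector with one entry per focal plane. The sketch assumes that linear blending is applied only between the two planes that bracket the object, so all other weights are zero; that choice, and the names `plane_weights` and `blended_transfer`, are illustrative assumptions rather than the disclosed method.

```python
from bisect import bisect_right


def plane_weights(z, planes):
    """Weights for M focal planes (sorted ascending) for an object at z.

    Only the two planes bracketing z receive nonzero weight (assumption).
    """
    planes = list(planes)
    if planes != sorted(planes) or len(set(planes)) != len(planes):
        raise ValueError("planes must be distinct and sorted in ascending order")
    if not planes[0] <= z <= planes[-1]:
        raise ValueError("z must lie within the range covered by the planes")
    weights = [0.0] * len(planes)
    hi = bisect_right(planes, z)
    if hi == len(planes):
        # z coincides with the last plane: all luminance goes there.
        weights[-1] = 1.0
        return weights
    lo = hi - 1
    w_hi = (z - planes[lo]) / (planes[hi] - planes[lo])
    weights[lo] = 1.0 - w_hi
    weights[hi] = w_hi
    return weights


def blended_transfer(weights, plane_transfers):
    """Weighted sum of per-plane transfer values at one frequency and accommodation."""
    return sum(w * h for w, h in zip(weights, plane_transfers))
```

Here `plane_transfers` stands for the per-plane transfer function values evaluated at a single spatial frequency and accommodation distance; how those values are obtained is outside the scope of this sketch.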
Depth blending drives the accommodation of the eye to a focal plane with a Ĥblend(u, v,q, ze) closest to the ideal DTF curve. We can see from
The eye will accommodate to a distance that maximizes the area under the DTF. However, since that distance depends on the spatial frequency, we further assume that the eye will accommodate to the distance that maximizes a certain quality metric QDM(S,q) based on this defocus measure (area under the DTF). Since this distance varies with each patch, we seek a solution that incorporates all of the patches into a single metric.
In one approach, we partition the displayed image f(x,y,z) into Np patches fi(x,y,zi), i=1, . . . , Np, where zi is a scalar representing the ith patch's mean object distance. Overlapping patches may be used. We may compute each patch's Fourier transform and multiply it with the depth-fused DTF to find the information transferred from a stimulus to the eye according to a placement of focal planes located at q={q1, q2, . . . , qM} and a local stimulus located at distance zo to compute the scalar value βi for each patch:
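$\beta_i(z_i, q) = \int_{u_0}^{u_1} \int_{v_0}^{v_1} \left| \hat{H}_{\mathrm{blend}}(u, v, q, z_i)\, \hat{f}_i(u, v, z_i) \right| \, du \, dv$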
where [u0, u1] and [v0, v1] denote the frequency interval of interest. Other metrics describing the object's information content, such as measures of contrast, entropy, or other transformative metrics could be used to define βi(zi, q) as well.
If we store the metrics from all of the patches into a vector β we can alter the focal plane placement for up to M focal planes. We seek to solve the following optimization problem to find q*, the optimal set of dioptric distances to place the available focal planes:
which can be relaxed or adjusted if not solvable in realistic time.
The resulting entries of q* signify where best to place the set of M focal planes. For example, optimizing 2 focal planes to represent 3 objects clustered about dioptric distances of 1/z1=0.6D, 1/z2=1.5D, and 1/z3=2.0D might result in the optimal focal plane placement of 1/q1=1.1D, 1/q2=1.8D.
The solution for q could begin with an initial guess of uniform focal plane spacing based on the available focal planes. For example, a 6-plane system seeking a workspace between 0 and 3 diopters could start with {0, 0.6, 1.2, 1.8, 2.4, 3.0}D. As the optimization algorithm proceeds through iterations k, the entries of q would change until $|Q_{DM}^{k}(S,q) - Q_{DM}^{k+1}(S,q)| \le \varepsilon$, where ε is a tolerance parameter telling the algorithm when to stop. Extra specifications could also be incorporated into the optimization algorithm to constrain the feasible solution set.
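The iteration described above, starting from uniform spacing and stopping when the quality changes by no more than a tolerance, can be sketched with a simple coordinate search. The search strategy, the step size, and the names `uniform_planes`, `refine_planes` and `toy_quality` are assumptions for illustration; in particular, `toy_quality` is only a stand-in for a real quality metric such as QDM(S, q).

```python
def uniform_planes(lo, hi, count):
    """Initial guess: `count` focal planes uniformly spaced between lo and hi (diopters)."""
    step = (hi - lo) / (count - 1)
    return [lo + k * step for k in range(count)]


def refine_planes(quality, planes, step=0.05, eps=1e-6, max_iter=200):
    """Nudge one plane at a time and keep any move that raises the quality.

    Stops when the quality changes by no more than eps between iterations.
    """
    planes = list(planes)
    best = quality(planes)
    for _ in range(max_iter):
        previous = best
        for m in range(len(planes)):
            for delta in (-step, step):
                trial = list(planes)
                trial[m] += delta
                score = quality(trial)
                if score > best:
                    planes, best = trial, score
        if abs(best - previous) <= eps:
            break
    return planes, best


def toy_quality(planes, depths=(0.6, 1.5, 2.0)):
    """Stand-in quality: negative squared distance from each depth to its nearest plane."""
    return -sum(min((z - q) ** 2 for q in planes) for z in depths)
```

For instance, `refine_planes(toy_quality, uniform_planes(0.0, 3.0, 2))` moves the two planes away from the ends of the workspace toward the object depths. A fixed-step coordinate search is chosen only for brevity; any optimizer with the same stopping rule would fit the description.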
Finally, note that the metric QDM(S, q) quantifies the quality of the rendering of a given 3D scene, with respect to defocus. Therefore, in addition to focal plane placement, this metric can be also used for rendering quality assessment in MFDs.
The eye's accommodation was varied in increments of 0.1D between these two focal planes. The accommodation is between −0.3 and +0.3D, where +0D corresponds to the dioptric midpoint of the focal planes at q1 and q2.
That is, the top left square shows a 9 cpd image as seen when the eye accommodates to −0.3 D. For the top middle square, the eye accommodates to −0.2 D, and so on. The bottom middle and bottom right squares are not used, so they are left blank.
Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples and aspects of the invention. It should be appreciated that the scope of the invention includes other embodiments not discussed in detail above. For example,
In another aspect, in addition to selecting the locations of the renderable volumes, the multi-focal display also selects the number of renderable volumes. In the original example with six focal planes, the multi-focal display might determine the number M of focal planes where M can be up to six. Fewer than the maximum number may be selected for various reasons, for example to reduce power consumption.
Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus of the present invention disclosed herein without departing from the spirit and scope of the invention as defined in the appended claims. Therefore, the scope of the invention should be determined by the appended claims and their legal equivalents.
In alternate embodiments, aspects of the invention are implemented in computer hardware, firmware, software, and/or combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a non-transitory machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. 
Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits) and other forms of hardware.
The term “module” is not meant to be limited to a specific physical form. Depending on the specific application, modules can be implemented as hardware, firmware, software, and/or combinations of these. Furthermore, different modules can share common components or even be implemented by the same components. There may or may not be a clear boundary between different modules.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 62/084,264, “Content-Adaptive Multi-Focal Display,” filed Nov. 25, 2014. The subject matter of all of the foregoing is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62084264 | Nov 2014 | US