The present disclosure relates to image conversion and, more particularly, to conversion of imagery and/or video for three-dimensional (3D) displays, four-dimensional (4D) experiences, next-generation user interfaces (next-gen UIs), virtual reality (VR), augmented reality (AR), mixed reality experiences, interactive experiences, and the like into imagery and/or video suitable for a two-dimensional (2D) display. In some embodiments, a 2D display is configured to generate a 3D-like effect. Throughout the present disclosure, reference to “3D” is not intended to be limiting and includes applications to 3D, 4D, next-gen UI, VR, AR, mixed reality, interactive experience technologies, and the like without limitation. Further, reference to “2D” is not intended to be limiting and includes, for example, applications for displays that may be relatively flat, slightly curved, flexible, multi-faceted, and the like without limitation provided the display utilizes 2D principles for display of images to observers.
Despite recent advances in 3D display technologies, adoption is limited due to a lack of viewer comfort. For instance, special glasses or wearable devices including head-mounted devices (HMDs) are required. However, these glasses and devices are often considered uncomfortable for viewers and cause fatigue and nausea, especially after extended use. Also, 3D viewing using stereo glasses (e.g., color filtering anaglyph or polarized glasses) contributes to undesirable viewer fatigue and cross-talk.
In one conventional approach, multi-view displays are provided that do not require special glasses or HMDs; however, such multi-view displays are limited by a fixed number of views and a requirement that a viewer switch between discrete viewing points. In another conventional approach, directional backlights are used to achieve a 3D effect. However, cross-talk also occurs in these devices. These conventional multi-view and directional backlight displays also tend to generate relatively small and thus undesirable displays.
Further, conventional free-viewpoint displays are problematic since viewers must be at a relatively close distance to observe the 3D effect. As such, conventional free-viewpoint displays are unsuitable for comfortable viewing from a distance, like watching a large display device from a typical viewing distance. Also, special barriers are often required to create multiple views; such barriers require relatively high energy consumption and reduce brightness significantly relative to 2D displays.
As such, a need has arisen for methods and systems that overcome these problems and deliver an improved viewing experience.
Methods, systems, devices, techniques, and articles are described that provide, among other advantages, an easy way to enjoy complex content—that is, 3D, 4D, next-gen UI, VR, AR, mixed reality, and interactive content, and the like—on a 2D display device. The complex content is enjoyed without the discomfort, fatigue, nausea, cross-talk, small format, limited viewpoints, inability to watch at a typical distance, high energy consumption, low brightness, and related problems associated with conventional devices and systems otherwise required to enjoy the complex content. Further, the complex content is accessed without a need to acquire relatively expensive equipment or systems; that is, most consumers may utilize the present methods and systems with a display device they already own. Still further, with popularity exemplified by sites such as Twitch.tv (approximately 140 million unique visitors per month on 100,000+ concurrent channels as of 2022), the present methods and systems are useful for enhancing the viewing experience of an individual, a group, or multiple groups interested in watching others play or interact in complex content environments such as those provided for gaming and eSports.
3D content is converted to a format suitable for display on a 2D device. The conversion process involves various methods of determining values and parameters of 3D input and representing the determined values and parameters for the 2D display in an optimal manner.
A value for an effect implemented to display a 2D representation of a 3D scene on a 2D display device may be determined. The value may be numeric, but is not limited to such. The effect includes any effect that provides a 3D effect or 3D-like effect in the 2D environment. Possible effects include but are not limited to depth, motion, shadow, focus, sharpness, intensity, color, effects derived from the same, combinations of these effects, and the like.
A default set of values and parameters may be determined. The values and parameters may be modeled and then implemented. Artificial intelligence, including neural networks and adversarial networks, may be utilized for training and modeling. User feedback, either of individual users or groups, may be utilized for optimization.
Inputs include 3D input such as 3D rendering data, signals sent to and received from a 3D display device, various gestures and movements performed by the operator of the 3D equipment, position of the 3D equipment, operations performed in the 3D environment, and the like.
Once the 3D display data is received, various values and parameters are extracted and processed to set up the conversion. Information associated with various effects (including depth, motion, shadow, focus, sharpness, intensity, color, effects derived from the same, combinations of these effects, and the like) is converted to a 2D analog of each effect. Non-limiting examples of conversions are disclosed herein.
Given a default or baseline set of values and/or parameters for the 3D effects, the 3D effects may be modified, converted to 2D form, displayed, modeled, and tested with human and/or machine systems for optimization of the 2D replica of the 3D effect. Feedback from the modeling and/or testing results in optimal patterns for various uses, content types, environments, and the like. The optimized 3D-to-2D conversion may be performed in advance and utilized as a default set, optimized periodically, or continuously optimized in real time or near real time (within practical processing limits).
The conversion is optimized with processes focused, for example, on movements, gestures, and actions performed in the 3D environment; determination of depth or distance of objects in the 3D environment relative to the viewpoint; biological realities (e.g., left eye/right eye considerations); and the like. Movement of any body part may be utilized. For instance, hand, finger, eye, and head movements are detected and parameterized. A speed of an alteration of the 3D-to-2D converted display is adjusted and optimized. Movements, gestures, and commands in the 3D environment are converted to pan, tilt, and zoom actions in the 2D environment. In some embodiments, a focus of the user and a region of interest (e.g., related to game play) in the 3D environment are determined. The focus and the region of interest may be determined based on eye movement. The 2D display may be zoomed in to and/or focused on the determined region of interest.
In some embodiments, 3D data for generating the 3D scene on a 3D display device is transmitted by a network to an intermediate processing module and then from the processing module to the 2D display device configured to display the changed display. In other embodiments, such data is directly transmitted to the 2D display for processing and display within a 2D display configured for processing the conversion. The 3D data may include assets, textures, and/or animations.
The 2D display device may be configured to send group feedback and/or user feedback to a server configured to generate 2D data for the changed display. The group feedback and/or user feedback may be used to optimize the changed display.
In some embodiments, a set-top box (STB) is configured with a graphical processing unit configured to generate the changed display.
The neural network module may include a generative adversarial network (GAN) trained to produce the changed display. The GAN may be trained by varying a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, and/or a color parameter. The GAN may include a U-net with at least one resolution level. The U-net may include iterative pooling and/or upsampling. The GAN may couple at least one 3D convolution block with at least one rectified linear unit. Training the GAN may include receiving a subjective score of the changed display from a human observer or a human judge, generating a perceptual score of the changed display based on at least one no-reference perceptual quality metric, and/or generating, with a neural network, a perceptual score comparing the changed display with a reference display.
Various calculations may be employed to improve the conversion. For instance, the rendering data may include a calculation of a color depending on a distance by increasing a saturation with the distance, a calculation of an intensity depending on a distance, and/or a calculation of an extent of a focus depending on a distance. The calculation may be defined for different ranges of depths. The rendering data may include binary variables for optimizing view satisfaction. The binary variables may include: bS, where 1 indicates saturation modification based on depth is enabled and 0 indicates saturation is not modified; bI, where 1 indicates intensity modification based on depth is enabled and 0 indicates intensity is not modified; bC, where 1 indicates focus modification based on depth is enabled and 0 indicates focus is not modified; bMP, where 1 indicates motion parallax is enabled and 0 indicates motion parallax is disabled; bMO, where 1 indicates object motion is enabled and 0 indicates object motion is disabled; and bSH, where 1 indicates object shadow is enabled and 0 indicates object shadow is disabled. The rendering data may further include SbMP, indicating a speed of view change when motion parallax is enabled, and/or SbMO, indicating a speed of object motion when object motion is enabled. A plurality of variables including each of bS, bI, bC, bMP, bMO, bSH, SbMP, and SbMO may be utilized.
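For illustration only, the switches and speeds described above can be gathered into a single parameter container. The class below is a hypothetical sketch (not part of the disclosed system); the field names mirror the variables bS, bI, bC, bMP, bMO, bSH, SbMP, and SbMO.

```python
from dataclasses import dataclass


@dataclass
class RenderParams:
    """Hypothetical container for per-viewer 3D-to-2D rendering parameters."""
    b_s: int = 1     # bS: 1 = saturation modified based on depth, 0 = unmodified
    b_i: int = 1     # bI: 1 = intensity modified based on depth, 0 = unmodified
    b_c: int = 1     # bC: 1 = focus modified based on depth, 0 = unmodified
    b_mp: int = 0    # bMP: 1 = motion parallax enabled, 0 = disabled
    b_mo: int = 0    # bMO: 1 = object motion enabled, 0 = disabled
    b_sh: int = 1    # bSH: 1 = object shadow enabled, 0 = disabled
    s_mp: float = 0.0  # SbMP: speed of view change when motion parallax is enabled
    s_mo: float = 0.0  # SbMO: speed of object motion when object motion is enabled

    def flags(self) -> tuple:
        """Return the six binary switches as a tuple, e.g. for enumeration."""
        return (self.b_s, self.b_i, self.b_c, self.b_mp, self.b_mo, self.b_sh)
```

A defaults instance such as `RenderParams()` could then serve as the baseline set that is later tuned per viewer.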
A method is provided to display a 3D representation of a scene on a 2D display device in a manner to provide a 3D perception to a viewer. An active modification or a passive modification of a view projected on the 2D display device is provided depending on a viewer preference or a viewer interaction. The active modification or the passive modification may include introducing a movement of an object based on the 3D representation of the scene, or a change in a viewpoint based on a free viewpoint video. The active modification may be based on at least one of a gesture made by the viewer, a head movement of the viewer, or an eye movement of the viewer. The passive modification may be based on an automatic movement of the object based on the 3D representation of the scene, or an automatic change in the viewpoint based on the free viewpoint video.
A speed of the movement of the object or a speed of the change in the viewpoint may be controlled by a parameter. The parameter controlling the speed may be learned through an active measurement of viewer satisfaction or a passive measurement of viewer satisfaction.
The 3D perception of the viewer is enhanced by an intensity variation associated with a depth, a color variation associated with a depth, highlighting a shadow, controlling an extent of a focus based on a depth, and/or another factor that facilitates the 3D perception. The factor need not necessarily be an intensity variation associated with a depth, a color variation associated with a depth, highlighting a shadow, or controlling an extent of a focus based on a depth, and may include other types of parameters.
A method to train a neural network to generate a 2D projection enhancing depth perception is provided. A method for learning a viewer preference over a time in order to make a passive modification to a 2D view enhancing a 3D perception is provided.
A system is provided comprising circuitry configured to perform a method including any of the steps noted herein in any suitable combination. A device is configured to perform a method including any of the steps noted herein in any suitable combination. A device is provided comprising means for performing a method including any of the steps noted herein in any suitable combination. A non-transitory, computer-readable medium is provided having non-transitory, computer-readable instructions encoded thereon, that, when executed, perform a method including any of the steps noted herein in any suitable combination.
A system to track the head movement and the eye movement of the viewer to support the active modification of the view projected on the 2D display device is provided. A system to track the gesture or gestures of the viewer to support the active modification to the view projected on the 2D screen is provided. A system to train a neural network to generate a 2D projection enhancing depth perception is provided. A system to actively acquire a ground truth on viewer satisfaction is provided. A system to passively acquire a ground truth on viewer satisfaction is provided. Each system may be configured to perform a method including any of the steps noted herein in any suitable combination.
The present invention is not limited to the combination of the elements as listed herein and may be assembled in any combination of the elements as described herein.
These and other capabilities of the disclosed subject matter will be more fully understood after a review of the following figures, detailed description, and claims.
The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict non-limiting examples and embodiments. These drawings are provided to facilitate an understanding of the concepts disclosed herein and should not be considered limiting of the breadth, scope, or applicability of these concepts. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.
The embodiments herein may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like reference numerals indicate identically or functionally similar elements, of which:
The drawings are intended to depict only typical aspects of the subject matter disclosed herein, and therefore should not be considered as limiting the scope of the disclosure. Those skilled in the art will understand that the structures, systems, devices, and methods specifically described herein and illustrated in the accompanying drawings are non-limiting embodiments and that the scope of the present invention is defined solely by the claims.
A more natural method and system for viewing free-viewpoint video is provided. 3D content is made viewable on a 2D display device, rather than multi-view displays or special 3D display devices. Viewpoint changes of a viewer observing a 2D screen are combined with additional cues to 3D perception through conversions including changes in occlusion, silhouette boundaries, shadows, focus, and color variation. These changes may vary with perceived depth. An ability to see around occluding boundaries provides viewers a perception of 3D. In some embodiments, wearable devices track viewer movements and avoid using cameras to track head or eye movements. Such tracking information may enhance the 3D effect. Gestures may be used instead of or in addition to hand, eye, and/or head movements to effectuate changes during conversion of a 3D scene onto a 2D display. Throughout the present disclosure, a reference to “cue,” “effect,” “value,” or “parameter” is not intended to be limiting and includes factors, parameters, and conversions that contribute to creation of a 3D or 3D-like effect in the 2D environment. Further, the present specification incorporates by reference herein in their entireties the full disclosures of U.S. patent application Ser. Nos. 17/975,049 and 17/975,057, both titled “VISUAL EFFECTS AND CONTENT ENHANCEMENTS FOR VR,” and both filed Oct. 27, 2022. The '049 and '057 applications are directed to, inter alia, systems and methods to enable creation of an enhanced image (e.g., 2D or 3D images, photos, videos) of a view of a VR environment.
The present methods and systems achieve an ability to watch free-viewpoint video on a 2D display screen at a distance by modifying a viewpoint. For instance, when watching a sporting event, a viewing direction is optimally changed for the 3D-to-2D conversion. View modifications are provided without a need for a special remote control or a touch-based control on a smartphone. In some embodiments, gesture-based interaction is performed without cameras or real-time image processing, without a need to learn and properly execute gestures that might not be natural for users, and without a need to make parts of the user's body clearly visible to cameras. Rather, a viewer's preferences may be learned over time, and modifications to the 2D screen may be made automatically based on learned preferences. The systems and methods learn an optimal strategy to create 3D perception on a 2D screen for a viewer. The procedure may start with a default strategy, which is then modified to best suit a specific user's preferences and perception.
A ground truth of viewer satisfaction in response to modifications made on a 2D screen may be obtained directly based on viewer feedback through gestures, voice, and/or adjustments made to wearable devices. Such ground truth may also be obtained indirectly through brain-computer interfaces and electroencephalogram (EEG) devices. Advanced devices like functional magnetic resonance imaging (fMRI) may also be used to obtain ground truth data and to provide an accurate baseline calibration based on a representative viewer group. Optimization and customization may be achieved one viewer at a time observing a 2D screen. Also, optimization and customization for multiple viewers observing a 2D screen at the same time may be achieved by optimizing the average satisfaction of a group of viewers.
Artificial intelligence (AI) and machine learning (ML) may be used to customize the 3D-to-2D conversion. In some embodiments, hand, eye, and/or head movements are tracked, and feedback from the viewer is obtained in response to changes made on the converted display. A combination of cues may be used in addition to hand, eye, and/or head movements to passively modify a 2D view to achieve 3D viewer perception. Mathematical modeling and optimization may be employed to determine optimal combinations of different cues. Direct and indirect measurements of viewer satisfaction may be used to learn the optimal combination of different cues.
The present methods and systems may combine cues, learn and customize cues based on evaluations, and apply mathematical modeling and implementation strategies to the same. While application of a single viewer interacting with digital content is disclosed, the present methods and systems are not limited thereto and are applicable to various viewing group environments, including movie theaters, sports bars, and displays before a group of viewers, e.g., family members and/or friends. In some embodiments, an STB may be provided to continually update a model to deliver content that achieves maximum user satisfaction.
The 3D device 205 may include a 3D rendering module 210 configured to generate 3D rendering data 215, which may be transmitted to a 3D display device 220. An input-output device 225 may be configured to send and receive data to and from the 3D display device 220 and provide feedback to the 3D rendering module 210. The input-output device 225 may be configured to send and receive data to/from external modules (disclosed herein). The 3D device 205 may be configured to send or receive data to/from a 3D-to-2D conversion module 250.
The 3D-to-2D conversion module 250 may be configured to extract at least one 3D effect/value/parameter 230 from the 3D device 205. The effect/value/parameter 230 may include at least one of a movement parameter 231, a depth parameter 232, a motion parameter 233, a shadow parameter 234, a focus parameter 235, a sharpness parameter 236, an intensity parameter 237, a color parameter 238, or an n-th parameter 239 that delivers a 3D or 3D-like effect.
The 3D-to-2D conversion module 250 may be configured to transmit the at least one effect/value/parameter 230 extracted from the 3D device 205 to the 2D device 260. The at least one effect/value/parameter 230 may be converted to a corresponding 2D effect/value/parameter 265. The 2D effect/value/parameter 265 may be transmitted to a 2D rendering module 270, which generates information for display on a 2D display device 275. An input-output device 280 may be configured to send and receive data to and from the 2D display device 275 and provide feedback to the 2D rendering module 270. The input-output device 280 may be configured to send and receive data to/from external modules. The 2D device 260 may be configured to send or receive data to/from the 3D-to-2D conversion module 250, a user feedback module 285, a group feedback module 290, and/or an AI/neural network/training/modeling module 295, one or more of which may be configured to send and receive data to/from the 3D-to-2D conversion module 250. The group feedback module 290 may be configured to receive data from other users. Each of the modules disclosed in summary hereinabove is described in greater detail hereinbelow. Each of the 3D-to-2D conversion module 250, the user feedback module 285, the group feedback module 290, and the AI/neural network/training/modeling module 295 may be provided as a part of the 3D device 205, as an independent module (as shown in
The process 300 may include converting 305 a video frame from a 3D scene into a 2D rendered view 315. The converting 305 of the video frame from the 3D scene may be based initially on default parameters 310. The default parameters 310 may include values associated with parameters that affect depth perception and that are used to generate the 2D rendered view 315. The default parameters 310 may include at least one of the movement parameter 231, the depth parameter 232, the motion parameter 233, the shadow parameter 234, the focus parameter 235, the sharpness parameter 236, the intensity parameter 237, the color parameter 238, or the n-th parameter 239 that delivers the 3D or 3D-like effect. The process 300 may include presenting a viewer 320 with an initial version of the 2D rendered view 315, and the viewer 320 may be prompted to provide feedback 325. The feedback 325 may be utilized to update the default parameters 310, which are output as updated parameters 330. Once sufficient feedback is collected and analyzed, custom parameters 335 for an individual viewer may replace the default parameters 310. See, e.g.,
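The feedback loop of process 300, in which default parameters 310 are refined into custom parameters 335, can be sketched as follows. This is a minimal illustration under assumed interfaces: `get_feedback` is a hypothetical stand-in for prompting the viewer 320, and the numeric rating scale is an assumption.

```python
def optimize_parameters(candidates, get_feedback, rounds=3):
    """Return the candidate parameter set with the highest mean viewer rating.

    candidates: list of parameter sets (e.g. dicts of rendering switches).
    get_feedback: callable mapping a parameter set to a numeric rating
                  (hypothetical stand-in for the viewer-feedback step 325).
    """
    scores = {i: [] for i in range(len(candidates))}
    for _ in range(rounds):
        for i, params in enumerate(candidates):
            # Render with these parameters and collect the viewer's rating.
            scores[i].append(get_feedback(params))
    best = max(scores, key=lambda i: sum(scores[i]) / len(scores[i]))
    return candidates[best]
```

Once enough ratings are collected, the winning candidate would play the role of the custom parameters 335 that replace the defaults for that viewer.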
The motion parameter 233 may include information relating to the conversion of images represented by
The intensity parameter 237 may include information relating to the conversion of images represented by
The color parameter 238 may include information relating to the conversion of images represented by
The shadow parameter 234 may include information relating to the conversion of images represented by
The focus parameter 235 may include information relating to the conversion of images represented by
One or both of the user feedback module 285 and the group feedback module 290 may include software for obtaining user and/or group feedback via a two-alternative forced choice (2AFC) process, screenshots of which are represented by
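As a sketch of how 2AFC responses might be aggregated (the aggregation rule below is an assumption, not taken from the disclosure), each trial shows two rendering variants and records the viewer's pick; per-variant win rates then give a simple preference estimate.

```python
from collections import Counter


def preference_scores(trials):
    """Aggregate two-alternative forced-choice (2AFC) trials.

    trials: list of (variant_a, variant_b, chosen) tuples, where chosen is
            one of the two shown variants.
    Returns a dict mapping each variant to its win rate over trials shown.
    """
    wins, shown = Counter(), Counter()
    for a, b, chosen in trials:
        shown[a] += 1
        shown[b] += 1
        wins[chosen] += 1
    return {v: wins[v] / shown[v] for v in shown}
```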
One or both of the user feedback module 285 and the group feedback module 290 may be operatively connected to wearable devices for passive measurement of user satisfaction. One advantage of passive measurement via wearable device is that viewers do not need to make any subjective decision since the electronic devices mounted on the head measure the satisfaction of the viewer passively. Brain machine interfaces (BMI) may be used to passively obtain feedback on viewer satisfaction. For example, products like the OpenBCI EEG headband kit may be used to passively obtain feedback. Furthermore, these types of BMI devices may be integrated into wearable head-mounted devices and are generally affordable (less than about $300 per unit).
A process 1400 for generating a neural network to create 2D rendered views with depth perception from a 3D scene is provided. Processes of the GAN are performed to create a GAN generator 1420 that may automatically generate 2D rendered views 1430 to give 3D perception to a viewer. The GAN generator 1420 may start with any generic architecture for rendering 2D scenes from 3D descriptions, with additional layers added to support parameter optimization to enhance 3D perception on 2D displays.
A neural network may be trained to produce the optimal 2D rendering for 3D perception of one or a group of viewers. This may be achieved for example by using an adversarial network that may optimize itself by competing with the results of a strategy using a combination of cues to affect 3D perception. Specifically, the GAN generator 1420 in
The process 1400 may include receiving 1405 a set of default parameters. A 3D-to-2D rendering may be converted 1415 based on the set of default parameters. An actual 2D view is generated 1425 based on the 3D-to-2D conversion (the “actual 2D view” may also be referred to as a “2D enhanced view” herein). The actual 2D view may be input into a GAN discriminator 1440. In parallel, the process 1400 may include receiving 1410 random parameters. The GAN generator 1420 may receive the random parameters to generate 1430 a 2D view, which is input into the GAN discriminator 1440. The GAN discriminator 1440 generates discriminator loss 1445 and/or generator loss 1450. That is, through the adversarial process, the GAN generator 1420 may continue to improve until the GAN discriminator 1440 cannot discriminate between the output of the GAN generator 1420 and a customized computer graphics-based 3D-to-2D renderer tuned to enhance 3D perception on 2D displays. At this stage, the GAN generator 1420 may be used as a neural network that enhances 3D perception on 2D displays.
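The adversarial losses 1445 and 1450 can be illustrated schematically. The forms below are the standard minimax GAN losses and are an assumption; the disclosure does not fix a specific loss function. Here `d_real` stands for the discriminator's probability that a view came from the customized 3D-to-2D renderer, and `d_fake` its probability for a generator output.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Discriminator loss 1445: reward D(real) near 1 and D(fake) near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Generator loss 1450 (non-saturating form): reward fooling the
    discriminator, i.e. D(fake) near 1."""
    return -math.log(d_fake)
```

Training alternates between lowering the discriminator loss and lowering the generator loss until the discriminator's outputs approach 0.5, i.e., it can no longer tell the two sources apart.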
Conventional, purely mathematical perceptual metric systems that utilize functions combining different types of visual data do not identify or verify the presence of a superior function to approximate human perception. That is, conventional mathematical functions are limited and depend on the knowledge of the specific person or persons involved in designing the functions and a model. In contrast, neural networks may approximate complex models and functions based on a large number of parameters in a network with many modules and layers.
The neural network 1820 is provided to approximate human perception that would otherwise be difficult to achieve with conventional functions, and is particularly adapted for delivering improved 3D-to-2D conversion. The architecture shown in
The 3D data for generating the 3D scene on a 3D display device may be transmitted 2040 by any suitable communication system (server, network, cloud-based, or otherwise) to the 2D display device configured to display the changed display. The 3D data may include 2045 at least one of assets, textures, animations, combinations of the same, or the like. The 2D display device may be configured 2050 to send the group feedback and/or the user feedback to a device, server, or cloud for further processing. The device, server, or cloud may be configured to generate 2D data for the changed display. An STB may be configured 2055 with a graphical processing unit (GPU). The GPU may be configured to generate the changed display. Other types of devices, including 2D display devices, may be configured with the GPU or processing modules configured to perform the disclosed functions.
The analyzing 2030 may include analyzing 2125, with a derivation module, rendering data based on at least one of the detecting step 2105, the determining step 2110, the determining step 2115, or the determining step 2120 (or any other related prior step). The process 2100 may include training 2130, with a neural network module, a model based on at least one of the detecting step 2105, the determining step 2110, the determining step 2115, or the determining step 2120 (or any other related prior step). The training 2130 may include training 2135 a generative adversarial network to produce the changed display. The generative adversarial network may be trained by varying at least one of the movement parameter 231, the depth parameter 232, the motion parameter 233, the shadow parameter 234, the focus parameter 235, the sharpness parameter 236, the intensity parameter 237, the color parameter 238, or the n-th parameter 239 that delivers the 3D or 3D-like effect.
Gestures of a viewer may be used instead of head and eye movements of the viewer. All other descriptions of the movement capture may be freely combined with gesture capture. In particular, left-right hand movements may be used to simulate pan movements, while up-down hand movements may be used to simulate tilt movements. Likewise, opening of fingers may be used to simulate zoom-in, while closing of fingers may be used to simulate zoom-out.
In the detecting step 2105, the movement may include at least one of detecting 2205 a hand movement, detecting 2210 an eye movement, or detecting 2215 a head movement. A speed of alteration of the changed display may be based 2220 on any of the detected movements described herein. The speed of the alteration of the changed display may be based 2225 on the movement, which is adjusted based on at least one of the determining step 2120, or the analyzing step 2125 (or any other related prior step).
The detected hand movement may include at least one of a left-right hand movement 2230, an up-down hand movement 2240, or an opening-closing fingers movement 2250. The left-right hand movement 2230 may be converted to a pan movement 2235 in the changed display. The up-down hand movement 2240 may be converted to a tilt movement 2245 in the changed display. The opening-closing fingers movement 2250 may be converted to a zoom-in-zoom-out movement 2255 in the changed display.
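The hand-movement conversions 2235, 2245, and 2255 can be sketched as a simple mapping. The normalized displacement inputs and the `speed` scaling (per the speed adjustment of step 2225) are illustrative assumptions; a real tracker would supply these values.

```python
def gesture_to_camera(dx: float, dy: float, finger_spread: float,
                      speed: float = 1.0):
    """Map detected hand motion to (pan, tilt, zoom) deltas in the changed display.

    dx: left-right hand displacement  -> pan movement
    dy: up-down hand displacement     -> tilt movement
    finger_spread: opening (+) or closing (-) of fingers -> zoom in / zoom out
    speed: alteration-speed factor, adjusted per viewer (assumed scalar).
    """
    pan = speed * dx
    tilt = speed * dy
    zoom = speed * finger_spread
    return pan, tilt, zoom
```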
In step 2210, a region of interest may be determined 2260 based on the eye movement. In response to determining 2260 the region of interest, the changed display may be zoomed 2265 to the determined region of interest.
In step 2215, the head movement may include at least one of a left-right head movement 2270, or an up-down head movement 2280. The left-right head movement 2270 may be converted 2275 to a pan movement in the changed display. The up-down head movement 2280 may be converted 2285 to a tilt movement in the changed display.
The depth parameter of step 2110 may include at least one of detecting 2305 a motion parameter, detecting 2320 a shadow parameter, detecting 2330 a focus parameter, detecting 2340 a sharpness parameter, detecting 2350 an intensity parameter, or detecting 2355 a color parameter. The motion parameter may include at least one of detecting 2310 a motion parallax parameter, or detecting 2315 a motion of an object parameter. The shadow parameter may be binary 2325, where 1 corresponds with casting of a shadow by an object, and where 0 corresponds with no casting of the shadow by the object. The focus parameter may be a variable 2335 dependent on the depth parameter. The sharpness parameter may be dependent 2345 on the focus parameter.
To vary colors with distance, the saturation may be increased with distance using Equation (1):
Equation (1) is specifically formulated for this disclosure and is not obtained from existing sources.
Similarly, intensity may be varied with distance using Equation (2), assuming that the maximum intensity is 255:
Equation (2) is specifically formulated for this disclosure and is not obtained from existing sources.
Equation (3) controls the extent of focus depending on distance. It enables nearer objects to be more in focus than distant objects. The variation controlled by Equation (3) is continuous with distance. Alternatively, discrete variation may also be conceived by defining DF(p) based on CF(p) such that discrete values are defined for different ranges of depths.
Note that the sin function in Equations (1) and (2) makes the modification based on depth super-linear in the range 0 to 1. It makes the relative changes more prominent at nearer depths and the changes slow down at greater depths. However, other functions that are modifiable based on some parameters may also be used. For example, consider the function in Equation (4).
The value of g in Equation (4) may be varied based on distance, and the parameters a and b may be chosen to control the super-linear variation of the curves in Equations (1) and (2).
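The behavior described for Equations (1), (2), and (4) can be sketched as follows. The exact equations are not reproduced in this section, so the forms below are assumptions consistent with the description: a sin-based weight on normalized depth that is super-linear on [0, 1] (rising quickly at near depths, flattening at far depths), and a parametric alternative whose parameters a and b control the shape.

```python
import math

def depth_weight(d: float) -> float:
    """sin-based weight on normalized depth d in [0, 1]: changes are more
    prominent at nearer depths and slow down at greater depths."""
    return math.sin(math.pi / 2.0 * d)

def alt_weight(d: float, a: float = 1.0, b: float = 0.5) -> float:
    """Equation (4)-style parametric alternative (form assumed): a and b
    control how super-linear the variation is."""
    return min(1.0, a * d ** b)

def saturation_at(d: float, base_saturation: float) -> float:
    """Increase saturation with distance, in the spirit of Equation (1)."""
    return min(1.0, base_saturation + (1.0 - base_saturation) * depth_weight(d))

def intensity_at(d: float, base_intensity: float,
                 max_intensity: float = 255.0) -> float:
    """Vary intensity with distance, capped at the maximum intensity of 255,
    in the spirit of Equation (2)."""
    return min(max_intensity,
               base_intensity + (max_intensity - base_intensity) * depth_weight(d))
```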
A simple strategy for optimizing view satisfaction may be based on defining binary variables for various factors that affect 3D perception on 2D displays. These binary variables are:
Obtaining optimal parameters for a specific viewer may be reduced to the problem of finding the values of these six binary variables. In other words, the user evaluations would essentially find an allocation of values to six binary bits. For this simple strategy, we need to optimize among 2^6 (i.e., 64) possible combinations and determine the one that maximizes viewer satisfaction. However, in reality other variables need to be considered. For example, for object motion we need to consider the speed at which an object is rotated, for motion parallax the speed at which the viewpoint is changed, for focus different depth ranges may be defined with different focus, and so on. Introduction of these types of sub-parameters and variations may make finding the best parameter values computationally very expensive and collecting viewer feedback very time-consuming. Thus, numerical approximation techniques may be pursued to determine an approximate optimal value of sub-parameters, like SbMP and SbMO.
In a numerical approximation technique, not all possible speeds of rotating an object need to be considered. Instead, based on evaluations for a few initial values the next value to evaluate may be estimated. This process may be continued in an iterative manner until the results do not change significantly from one iteration to the next.
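The two strategies above, exhaustive search over the 2^6 binary combinations and iterative refinement of a continuous sub-parameter, can be sketched as follows. The satisfaction function is a stand-in: in practice each score would come from viewer feedback, not a closed-form expression, and the ternary-search refinement assumes satisfaction is unimodal in the speed.

```python
from itertools import product

def satisfaction(config) -> float:
    """Hypothetical satisfaction model: a weighted sum of the six binary
    cues. The weights are illustrative, not from the disclosure."""
    weights = (0.9, 0.6, 0.8, 0.7, 0.5, 0.4)
    return sum(w * bit for w, bit in zip(weights, config))

# Exhaustive search over all 2**6 = 64 on/off combinations of the six cues.
configs = list(product((0, 1), repeat=6))
best = max(configs, key=satisfaction)

def refine_speed(score, lo=0.0, hi=10.0, tol=1e-3) -> float:
    """Iteratively narrow the interval of candidate speeds until successive
    iterations change little (ternary search on a unimodal score)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if score(m1) < score(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Toy viewer-satisfaction curve peaking at a rotation speed of 3.0.
speed = refine_speed(lambda s: -(s - 3.0) ** 2)
```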
The mathematical modeling and optimization strategy described here is a simplified one in order to convey the overall concept. In general, we may consider all of the binary variables described herein to be enabled (i.e., have a value of 1), and consider functions under each component that control the manner in which parameters such as saturation, intensity, focus, and shadow are modified to provide the best 3D perception to a single viewer or a group of viewers. In this case, there will be parameters in several functions that need to be optimized, and possibly the ability to choose between a class of functions may be considered as well. A further extension to the systems and methods described in this patent would be to use one or more neural networks to learn functions for individual components that affect 3D perception on 2D displays, or even to consider networks that learn functions combining multiple components that affect 3D perception on 2D displays.
As such, again referring to
In accordance with some embodiments of the disclosure, transmission and storage of 3D content to 2D displays is provided. 3D scenes (assets, textures, animations, and the like) may be transmitted directly from the network to a 2D display with rendering and rasterization capability. The 2D display may send user/viewer parameters back to a server that delivers customized video back to the device. In a specific embodiment, an STB may be equipped with a GPU to perform the rendering.
Throughout the present disclosure, determinations, predictions, likelihoods, and the like are determined with one or more predictive models. For example,
The predictive model 2850 receives as input usage data 2830. The predictive model 2850 is based, in some embodiments, on at least one of a usage pattern of the user or media device, a usage pattern of the requesting media device, a usage pattern of the media content item, a usage pattern of the communication system or network, a usage pattern of the profile, or a usage pattern of the media device.
The predictive model 2850 receives as input load-balancing data 2835. The predictive model 2850 is based on at least one of load data of the display device, load data of the requesting media device, load data of the media content item, load data of the communication system or network, load data of the profile, or load data of the media device.
The predictive model 2850 receives as input metadata 2840. The predictive model 2850 is based on at least one of metadata of the streaming service, metadata of the requesting media device, metadata of the media content item, metadata of the communication system or network, metadata of the profile, or metadata of the media device. The metadata includes information of the type represented in the media device manifest.
The predictive model 2850 is trained with data. The training data is developed in some embodiments using one or more data techniques including but not limited to data selection, data sourcing, and data synthesis. The predictive model 2850 is trained in some embodiments with one or more analytical techniques including but not limited to classification and regression trees (CART), discrete choice models, linear regression models, logistic regression, logit versus probit, multinomial logistic regression, multivariate adaptive regression splines, probit regression, regression techniques, survival or duration analysis, and time series models. The predictive model 2850 is trained in some embodiments with one or more machine learning approaches including but not limited to supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and dimensionality reduction. The predictive model 2850 in some embodiments includes regression analysis including analysis of variance (ANOVA), linear regression, logistic regression, ridge regression, and/or time series. The predictive model 2850 in some embodiments includes classification analysis including decision trees and/or neural networks. In
The predictive model 2850 is configured to output a current state 2881, and/or a future state 2883, and/or a determination, a prediction, or a likelihood 2885, and the like.
The current state 2881, and/or the future state 2883, and/or the determination, the prediction, or the likelihood 2885, and the like may be compared 2890 to a predetermined or determined standard. In some embodiments, the standard is satisfied (2890=OK) or rejected (2890=NOT OK). If the standard is satisfied or rejected, the predictive process 2800 outputs at least one of the current state, the future state, the determination, the prediction, or the likelihood to any device or module disclosed herein.
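The gating step 2890 can be sketched minimally as follows. The threshold value and function name are assumptions introduced for illustration; the disclosure specifies only that an output is compared to a predetermined or determined standard.

```python
def compare_to_standard(likelihood: float, threshold: float = 0.5) -> str:
    """Compare a predictive-model output (e.g., a likelihood 2885) against a
    predetermined standard (step 2890)."""
    return "OK" if likelihood >= threshold else "NOT OK"

result = compare_to_standard(0.83)
```

Whether the standard is satisfied or rejected, the resulting state, determination, prediction, or likelihood is then forwarded to the consuming device or module.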
Communication network 2906 may include one or more network systems, such as, without limitation, the Internet, LAN, Wi-Fi, wireless, or other network systems suitable for audio processing applications. The system 2900 of
Computing device 2902 includes control circuitry 2908, display 2910 and input/output (I/O) circuitry 2912. Control circuitry 2908 may be based on any suitable processing circuitry and includes control circuits and memory circuits, which may be disposed on a single integrated circuit or may be discrete components. As referred to herein, processing circuitry should be understood to mean circuitry based on one or more microprocessors, microcontrollers, digital signal processors, programmable logic devices, field-programmable gate arrays (FPGAs), or application-specific integrated circuits (ASICs), etc., and may include a multi-core processor (e.g., dual-core, quad-core, hexa-core, or any suitable number of cores). In some embodiments, processing circuitry may be distributed across multiple separate processors or processing units, for example, multiple of the same type of processing units (e.g., two Intel Core i7 processors) or multiple different processors (e.g., an Intel Core i5 processor and an Intel Core i7 processor). Some control circuits may be implemented in hardware, firmware, or software. Control circuitry 2908 in turn includes communication circuitry 2926, storage 2922 and processing circuitry 2918. Either of control circuitry 2908 and 2934 may be utilized to execute or perform any or all the methods, processes, and outputs of one or more of
In addition to control circuitry 2908 and 2934, computing device 2902 and server 2904 may each include storage (storage 2922, and storage 2938, respectively). Each of storages 2922 and 2938 may be an electronic storage device. As referred to herein, the phrase “electronic storage device” or “storage device” should be understood to mean any device for storing electronic data, computer software, or firmware, such as random-access memory, read-only memory, hard drives, optical drives, digital video disc (DVD) recorders, compact disc (CD) recorders, BLU-RAY disc (BD) recorders, BLU-RAY 3D disc recorders, digital video recorders (DVRs, sometimes called personal video recorders, or PVRs), solid state devices, quantum storage devices, gaming consoles, gaming media, or any other suitable fixed or removable storage devices, and/or any combination of the same. Each of storages 2922 and 2938 may be used to store various types of content, metadata, and/or other types of data. Non-volatile memory may also be used (e.g., to launch a boot-up routine and other instructions). Cloud-based storage may be used to supplement storages 2922 and 2938 or instead of storages 2922 and 2938. In some embodiments, a user profile and messages corresponding to a chain of communication may be stored in one or more of storages 2922 and 2938. Each of storages 2922 and 2938 may be utilized to store commands, for example, such that the stored commands are executed when each of processing circuitries 2918 and 2936, respectively, is prompted through control circuitries 2908 and 2934, respectively. Either of processing circuitries 2918 or 2936 may execute any of the methods, processes, and outputs of one or more of
In some embodiments, control circuitry 2908 and/or 2934 executes instructions for an application stored in memory (e.g., storage 2922 and/or storage 2938). Specifically, control circuitry 2908 and/or 2934 may be instructed by the application to perform the functions discussed herein. In some embodiments, any action performed by control circuitry 2908 and/or 2934 may be based on instructions received from the application. For example, the application may be implemented as software or a set of and/or one or more executable instructions that may be stored in storage 2922 and/or 2938 and executed by control circuitry 2908 and/or 2934. The application may be a client/server application where only a client application resides on computing device 2902, and a server application resides on server 2904.
The application may be implemented using any suitable architecture. For example, it may be a stand-alone application wholly implemented on computing device 2902. In such an approach, instructions for the application are stored locally (e.g., in storage 2922), and data for use by the application is downloaded on a periodic basis (e.g., from an out-of-band feed, from an Internet resource, or using another suitable approach). Control circuitry 2908 may retrieve instructions for the application from storage 2922 and process the instructions to perform the functionality described herein. Based on the processed instructions, control circuitry 2908 may determine a type of action to perform in response to input received from I/O circuitry 2912 or from communication network 2906.
In client/server-based embodiments, control circuitry 2908 may include communication circuitry suitable for communicating with an application server (e.g., server 2904) or other networks or servers. The instructions for carrying out the functionality described herein may be stored on the application server. Communication circuitry may include a cable modem, an Ethernet card, or a wireless modem for communication with other equipment, or any other suitable communication circuitry. Such communication may involve the Internet or any other suitable communication networks or paths (e.g., communication network 2906). In another example of a client/server-based application, control circuitry 2908 runs a web browser that interprets web pages provided by a remote server (e.g., server 2904). For example, the remote server may store the instructions for the application in a storage device.
The remote server may process the stored instructions using circuitry (e.g., control circuitry 2934) and/or generate displays. Computing device 2902 may receive the displays generated by the remote server and may display the content of the displays locally via display 2910. For example, display 2910 may be utilized to present a string of characters. In this way, the processing of the instructions is performed remotely (e.g., by server 2904) while the resulting displays, such as the display windows described elsewhere herein, are provided locally on computing device 2902. Computing device 2902 may receive inputs from the user via input/output circuitry 2912 and transmit those inputs to the remote server for processing and generating the corresponding displays.
Alternatively, computing device 2902 may receive inputs from the user via input/output circuitry 2912 and process and display the received inputs locally, by control circuitry 2908 and display 2910, respectively. For example, input/output circuitry 2912 may correspond to a keyboard and/or a set of and/or one or more speakers/microphones which are used to receive user inputs (e.g., input as displayed in a search bar or a display of
Server 2904 and computing device 2902 may transmit and receive content and data such as media content via communication network 2906. For example, server 2904 may be a media content provider, and computing device 2902 may be a smart television configured to download or stream media content, such as a live news broadcast, from server 2904. Control circuitry 2934, 2908 may send and receive commands, requests, and other suitable data through communication network 2906 using communication circuitry 2932, 2926, respectively. Alternatively, control circuitry 2934, 2908 may communicate directly with each other using communication circuitry 2932, 2926, respectively, avoiding communication network 2906.
It is understood that computing device 2902 is not limited to the embodiments and methods shown and described herein. In nonlimiting examples, computing device 2902 may be a television, a Smart TV, a set-top box, an integrated receiver decoder (IRD) for handling satellite television, a digital storage device, a digital media receiver (DMR), a digital media adapter (DMA), a streaming media device, a DVD player, a DVD recorder, a connected DVD, a local media server, a BLU-RAY player, a BLU-RAY recorder, a personal computer (PC), a laptop computer, a tablet computer, a WebTV box, a personal computer television (PC/TV), a PC media server, a PC media center, a handheld computer, a stationary telephone, a personal digital assistant (PDA), a mobile telephone, a portable video player, a portable music player, a portable gaming machine, a smartphone, or any other device, computing equipment, or wireless device, and/or combination of the same, capable of suitably displaying and manipulating media content.
Computing device 2902 receives user input 2914 at input/output circuitry 2912. For example, computing device 2902 may receive a user input such as a user swipe or user touch. It is understood that computing device 2902 is not limited to the embodiments and methods shown and described herein.
User input 2914 may be received from a user selection-capturing interface that is separate from device 2902, such as a remote-control device, trackpad, or any other suitable user movement-sensitive, audio-sensitive or capture devices, or as part of device 2902, such as a touchscreen of display 2910. Transmission of user input 2914 to computing device 2902 may be accomplished using a wired connection, such as an audio cable, USB cable, Ethernet cable and the like attached to a corresponding input port at a local device, or may be accomplished using a wireless connection, such as Bluetooth, Wi-Fi, WiMAX, GSM, UMTS, CDMA, TDMA, 3G, 4G, 4G LTE, 5G, or any other suitable wireless transmission protocol. Input/output circuitry 2912 may include a physical input port such as a 3.5 mm (0.1378 inch) audio jack, RCA audio jack, USB port, Ethernet port, or any other suitable connection for receiving audio over a wired connection or may include a wireless receiver configured to receive data via Bluetooth, Wi-Fi, WiMAX, GSM, UMTS, CDMA, TDMA, 3G, 4G, 4G LTE, 5G, or other wireless transmission protocols.
Processing circuitry 2918 may receive user input 2914 from input/output circuitry 2912 using communication path 2916. Processing circuitry 2918 may convert or translate the received user input 2914 that may be in the form of audio data, visual data, gestures, or movement to digital signals. In some embodiments, input/output circuitry 2912 performs the translation to digital signals. In some embodiments, processing circuitry 2918 (or processing circuitry 2936, as the case may be) carries out disclosed processes and methods.
Processing circuitry 2918 may provide requests to storage 2922 by communication path 2920. Storage 2922 may provide requested information to processing circuitry 2918 by communication path 2946. Storage 2922 may transfer a request for information to communication circuitry 2926 which may translate or encode the request for information to a format receivable by communication network 2906 before transferring the request for information by communication path 2928. Communication network 2906 may forward the translated or encoded request for information to communication circuitry 2932, by communication path 2930.
At communication circuitry 2932, the translated or encoded request for information, received through communication path 2930, is translated or decoded for processing circuitry 2936, which will provide a response to the request for information based on information available through control circuitry 2934 or storage 2938, or a combination thereof. The response to the request for information is then provided back to communication network 2906 by communication path 2940 in an encoded or translated format such that communication network 2906 forwards the encoded or translated response back to communication circuitry 2926 by communication path 2942.
At communication circuitry 2926, the encoded or translated response to the request for information may be provided directly back to processing circuitry 2918 by communication path 2954 or may be provided to storage 2922 through communication path 2944, which then provides the information to processing circuitry 2918 by communication path 2946. Processing circuitry 2918 may also provide a request for information directly to communication circuitry 2926 through communication path 2952, where storage 2922 responds to an information request (provided through communication path 2920 or 2944) by communication path 2924 or 2946 that storage 2922 does not contain information pertaining to the request from processing circuitry 2918.
Processing circuitry 2918 may process the response to the request received through communication paths 2946 or 2954 and may provide instructions to display 2910 for a notification to be provided to the users through communication path 2948. Display 2910 may incorporate a timer for providing the notification or may rely on inputs through input/output circuitry 2912 from the user, which are forwarded through processing circuitry 2918 through communication path 2948, to determine how long or in what format to provide the notification. When display 2910 determines the display has been completed, a notification may be provided to processing circuitry 2918 through communication path 2950.
The communication paths provided in
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
Although at least one embodiment is described as using a plurality of units or modules to perform a process or processes, it is understood that the process or processes may also be performed by one or a plurality of units or modules. Additionally, it is understood that the term controller/control unit may refer to a hardware device that includes a memory and a processor. The memory may be configured to store the units or the modules and the processor may be specifically configured to execute said units or modules to perform one or more processes which are described herein.
Unless specifically stated or obvious from context, as used herein, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. “About” may be understood as within 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.05%, or 0.01% of the stated value. Unless otherwise clear from the context, all numerical values provided herein are modified by the term “about.”
The use of the terms “first”, “second”, “third”, and so on, herein, is provided to identify structures or operations, without describing an order of structures or operations, and, to the extent the structures or operations are used in an embodiment, the structures may be provided or the operations may be executed in a different order from the stated order unless a specific order is definitely specified in the context.
The methods and/or any instructions for performing any of the embodiments discussed herein may be encoded on computer-readable media. Computer-readable media includes any media capable of storing data. The computer-readable media may be transitory, including, but not limited to, propagating electrical or electromagnetic signals, or may be non-transitory (e.g., a non-transitory, computer-readable medium accessible by an application via control or processing circuitry from storage) including, but not limited to, volatile and non-volatile computer memory or storage devices such as a hard disk, floppy disk, USB drive, DVD, CD, media cards, register memory, processor caches, random access memory (RAM), etc.
The interfaces, processes, and analysis described may, in some embodiments, be performed by an application. The application may be loaded directly onto each device of any of the systems described or may be stored in a remote server or any memory and processing circuitry accessible to each device in the system. The generation of interfaces and analysis there-behind may be performed at a receiving device, a sending device, or some device or processor therebetween.
The systems and processes discussed herein are intended to be illustrative and not limiting. One skilled in the art would appreciate that the actions of the processes discussed herein may be omitted, modified, combined, and/or rearranged, and any additional actions may be performed without departing from the scope of the invention. More generally, the disclosure herein is meant to provide examples and is not limiting. Only the claims that follow are meant to set bounds as to what the present disclosure includes. Furthermore, it should be noted that the features and limitations described in any one embodiment may be applied to any other embodiment herein, and flowcharts or examples relating to one embodiment may be combined with any other embodiment in a suitable manner, done in different orders, or done in parallel. In addition, the systems and methods described herein may be performed in real time. It should also be noted that the systems and/or methods described herein may be applied to, or used in accordance with, other systems and/or methods.
This specification discloses embodiments, which include, but are not limited to, the following items:
Item 1. A method comprising:
Item 2. The method of item 1, comprising at least one of:
Item 3. The method of item 2 comprising:
Item 4. The method of item 2 including at least two of steps a-d.
Item 5. The method of item 2 including each of steps a-d.
Item 6. The method of item 2, wherein the method includes the detecting step a, wherein the movement includes at least one of a hand movement, an eye movement, or a head movement.
Item 7. The method of item 2, wherein a speed of alteration of the changed display is based on the movement.
Item 8. The method of item 7, wherein the speed of the alteration of the changed display based on the movement is adjusted based on at least one of the determining step d, or the analyzing step e.
Item 9. The method of item 6, wherein the movement includes the hand movement,
Item 10. The method of item 6, wherein the movement includes the eye movement, and wherein a region of interest is determined based on the eye movement.
Item 11. The method of item 10, wherein, in response to determining the region of interest, the changed display is zoomed to the determined region of interest.
Item 12. The method of item 6, wherein the movement includes the head movement,
Item 13. The method of item 2, wherein the depth parameter includes at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 14. The method of item 13, wherein the motion parameter includes at least one of a motion parallax parameter, or a motion of an object parameter.
Item 15. The method of item 13, wherein the shadow parameter is binary, where 1 corresponds with casting of a shadow by an object, and where 0 corresponds with no casting of the shadow by the object.
Item 16. The method of item 13, wherein the focus parameter is a variable dependent on the depth parameter.
Item 17. The method of item 13, wherein the sharpness parameter is dependent on the focus parameter.
Item 18. The method of item 2, wherein a default set of parameters for the changed display is based on the group feedback.
Item 19. The method of item 18, wherein the default set of parameters for the changed display is based on the user feedback.
Item 20. The method of item 18, wherein the default set of parameters includes optimization of the depth parameter determined by the determining step b.
Item 21. The method of item 20, wherein the optimization is based on at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 22. The method of item 2, wherein a default set of parameters for the changed display is based on the user feedback.
Item 23. The method of item 22, wherein the default set of parameters includes optimization of the depth parameter determined by the determining step b.
Item 24. The method of item 23, wherein the optimization is based on at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 25. The method of item 2, wherein at least one of the group feedback or the user feedback is obtained with a wearable device.
Item 26. The method of item 25, wherein the wearable device includes a brain machine interface.
Item 27. The method of item 2, wherein the group feedback is aggregated and averaged for at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 28. The method of item 2, wherein at least one of the group feedback or the user feedback is obtained with a remote control device.
Item 29. The method of item 1, wherein 3D data for generating the 3D scene on a 3D display device is transmitted by a network to the 2D display device configured to display the changed display.
Item 30. The method of item 29, wherein the 3D data includes at least one of assets, textures, or animations.
Item 31. The method of item 29, wherein the 2D display device configured to display the changed display is configured to send at least one of the group feedback or the user feedback to a server configured to generate 2D data for the changed display.
Item 32. The method of item 1, wherein a set-top box is configured with a graphical processing unit configured to generate the changed display.
Item 33. The method of item 3, wherein the neural network module includes a generative adversarial network trained to produce the changed display.
Item 34. The method of item 33, wherein the generative adversarial network is trained by varying at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 35. The method of item 33, wherein the generative adversarial network includes a U-net with at least one layer of resolution, and the method comprises iterative pooling and upsampling.
Item 36. The method of item 33 comprising coupling at least one 3D convolution block with at least one rectified linear unit.
Item 37. The method of item 33 comprising receiving a subjective score of the changed display from a human observer or a human judge.
Item 38. The method of item 33 comprising generating a perceptual score of the changed display based on at least one no-reference perceptual quality metric.
Item 39. The method of item 33 comprising generating with a neural network a perceptual score comparing the changed display with a reference display.
Item 40. The method of item 2, wherein the rendering data includes a calculation of a color depending on a distance by increasing a saturation with the distance.
Item 41. The method of item 2, wherein the rendering data includes a calculation of an intensity depending on a distance.
Item 42. The method of item 2, wherein the rendering data includes a calculation of an extent of a focus depending on a distance.
Item 43. The method of item 42, wherein the calculation is defined for different ranges of depths.
Item 44. The method of item 2, wherein the rendering data includes a binary variable for optimizing a view satisfaction.
Item 45. The method of item 44, wherein the binary variable includes at least one of:
Item 46. The method of item 45, wherein the binary variable is a plurality of binary variables including each of bS, bI, bC, bMP, bMO, bSH, SbMP, and SbMO.
Item 47. A system comprising circuitry configured to perform the method of any one of items 1-46.
Item 48. A device configured to perform the method of any one of items 1-46.
Item 49. A device comprising means for performing the steps of the method of any one of items 1-46.
Item 50. A non-transitory, computer-readable medium having non-transitory, computer-readable instructions encoded thereon, that, when executed, perform the method of any one of items 1-46.
Item 51. A system comprising circuitry configured to:
Item 52. The system of item 51, comprising at least one of:
Item 53. The system of item 52 comprising circuitry configured to:
Item 54. The system of item 52 including at least two of steps a-d.
Item 55. The system of item 52 including each of steps a-d.
Item 56. The system of item 52, wherein the system includes the detecting step a, wherein the movement includes at least one of a hand movement, an eye movement, or a head movement.
Item 57. The system of item 52, wherein a speed of alteration of the changed display is based on the movement.
Item 58. The system of item 57, wherein the speed of the alteration of the changed display based on the movement is adjusted based on at least one of the determining step d, or the analyzing step e.
Item 59. The system of item 56, wherein the movement includes the hand movement,
Item 60. The system of item 56, wherein the movement includes the eye movement, and wherein a region of interest is determined based on the eye movement.

Item 61. The system of item 60, wherein, in response to determining the region of interest, the changed display is zoomed to the determined region of interest.
Item 62. The system of item 56, wherein the movement includes the head movement,
Item 63. The system of item 52, wherein the depth parameter includes at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 64. The system of item 63, wherein the motion parameter includes at least one of a motion parallax parameter, or a motion of an object parameter.
Item 65. The system of item 63, wherein the shadow parameter is binary, where 1 corresponds with casting of a shadow by an object, and where 0 corresponds with no casting of the shadow by the object.
Item 66. The system of item 63, wherein the focus parameter is a variable dependent on the depth parameter.
Item 67. The system of item 63, wherein the sharpness parameter is dependent on the focus parameter.
Item 68. The system of item 52, wherein a default set of parameters for the changed display is based on the group feedback.
Item 69. The system of item 68, wherein the default set of parameters for the changed display is based on the user feedback.
Item 70. The system of item 68, wherein the default set of parameters includes optimization of the depth parameter determined by the determining step b.
Item 71. The system of item 70, wherein the optimization is based on at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 72. The system of item 52, wherein a default set of parameters for the changed display is based on the user feedback.
Item 73. The system of item 72, wherein the default set of parameters includes optimization of the depth parameter determined by the determining step b.
Item 74. The system of item 73, wherein the optimization is based on at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 75. The system of item 52, wherein at least one of the group feedback or the user feedback is obtained with a wearable device.
Item 76. The system of item 75, wherein the wearable device includes a brain machine interface.
Item 77. The system of item 52, wherein the group feedback is aggregated and averaged for at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
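Item 77's per-parameter aggregation and averaging of group feedback can be sketched in a few lines. The representation of feedback as per-viewer dictionaries of parameter scores, and the function name `aggregate_feedback`, are assumptions made here for illustration:

```python
def aggregate_feedback(feedbacks):
    """Item 77 style: average each parameter's score across all viewers who
    rated it. `feedbacks` is a list of dicts mapping parameter name -> score."""
    totals, counts = {}, {}
    for fb in feedbacks:
        for param, score in fb.items():
            totals[param] = totals.get(param, 0.0) + score
            counts[param] = counts.get(param, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}
```

Averaging only over viewers who actually rated a given parameter (rather than over all viewers) is one design choice; a deployed system might instead weight recent or more engaged viewers more heavily.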
Item 78. The system of item 52, wherein at least one of the group feedback or the user feedback is obtained with a remote control device.
Item 79. The system of item 51, wherein 3D data for generating the 3D scene on a 3D display device is transmitted by a network to the 2D display device configured to display the changed display.
Item 80. The system of item 79, wherein the 3D data includes at least one of assets, textures, or animations.
Item 81. The system of item 79, wherein the 2D display device configured to display the changed display is configured to send at least one of the group feedback or the user feedback to a server configured to generate 2D data for the changed display.
Item 82. The system of item 51, wherein a set-top box is configured with a graphics processing unit configured to generate the changed display.
Item 83. The system of item 53, wherein the neural network module includes a generative adversarial network trained to produce the changed display.
Item 84. The system of item 83, wherein the generative adversarial network is trained by varying at least one of a motion parameter, a shadow parameter, a focus parameter, a sharpness parameter, an intensity parameter, or a color parameter.
Item 85. The system of item 83, wherein the generative adversarial network includes a U-net with at least one layer of resolution, and the system comprises iterative pooling and upsampling.
Item 86. The system of item 83, wherein the circuitry is configured to couple at least one 3D convolution block with at least one rectified linear unit.
Item 87. The system of item 83, wherein the circuitry is configured to receive a subjective score of the changed display from a human observer or a human judge.
Item 88. The system of item 83, wherein the circuitry is configured to generate a perceptual score of the changed display based on at least one no-reference perceptual quality metric.
Item 89. The system of item 83, wherein the circuitry is configured to generate with a neural network a perceptual score comparing the changed display with a reference display.
Item 90. The system of item 52, wherein the rendering data includes a calculation of a color depending on a distance by increasing a saturation with the distance.
Item 91. The system of item 52, wherein the rendering data includes a calculation of an intensity depending on a distance.
Item 92. The system of item 52, wherein the rendering data includes a calculation of an extent of a focus depending on a distance.
Item 93. The system of item 92, wherein the calculation is defined for different ranges of depths.
Item 94. The system of item 52, wherein the rendering data includes a binary variable for optimizing a view satisfaction.
Item 95. The system of item 94, wherein the binary variable includes at least one of:
Item 96. The system of item 95, wherein the binary variable is a plurality of binary variables including each of bS, bI, bC, bMP, bMO, bSH, SbMP, and SbMO.
Item 97. A method to display a 3D representation of a scene on a 2D display device in a manner to provide a 3D perception to a viewer, comprising:
Item 98. The method according to item 97, wherein a speed of the movement of the object or a speed of the change in the viewpoint is controlled by a parameter.
Item 99. The method according to item 98, wherein the parameter controlling the speed is learned through an active measurement of viewer satisfaction or a passive measurement of viewer satisfaction.
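Items 98 and 99 recite a speed parameter learned from measured viewer satisfaction without fixing a learning rule. A minimal sketch of one such rule, a simple hill-climbing update (the function name, step size, and update scheme are assumptions for illustration, not the claimed method), could be:

```python
def update_speed(speed, satisfaction, prev_satisfaction, step=0.1, direction=1):
    """Hypothetical hill-climbing update for the item 98 speed parameter:
    keep stepping in the direction that improved measured satisfaction
    (active or passive, per item 99); reverse direction when it worsened."""
    if satisfaction < prev_satisfaction:
        direction = -direction
    return speed + direction * step, direction
```

The same update applies whether satisfaction is measured actively (explicit ratings) or passively (e.g., inferred from viewing behavior); only the source of the `satisfaction` signal differs.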
Item 100. The method according to item 97, wherein the 3D perception of the viewer is enhanced by an intensity variation associated with a depth.
Item 101. The method according to item 97, wherein the 3D perception of the viewer is enhanced by a color variation associated with a depth.
Item 102. The method according to item 97, wherein the 3D perception of the viewer is enhanced by highlighting a shadow.
Item 103. The method according to item 97, wherein the 3D perception of the viewer is enhanced by controlling an extent of a focus based on a depth.
Item 104. The method according to item 97, wherein the 3D perception of the viewer is enhanced by a factor that facilitates the 3D perception, wherein the factor is not an intensity variation associated with a depth, a color variation associated with a depth, highlighting a shadow, or controlling an extent of a focus based on a depth.
Item 105. The method according to item 97, wherein the 3D perception of the viewer is enhanced by at least two of an intensity variation associated with a depth, a color variation associated with a depth, highlighting a shadow, or controlling an extent of a focus based on a depth.
Item 106. The method according to item 97, wherein the 3D perception of the viewer is enhanced by each of an intensity variation associated with a depth, a color variation associated with a depth, highlighting a shadow, and controlling an extent of a focus based on a depth.
Item 107. A system to track the head movement and the eye movement of the viewer to support the active modification of the view projected on the 2D display device according to any one of items 1-46 and 97-106.
Item 108. A system to track gestures of the viewer to support the active modification to the view projected on the 2D screen according to any one of items 1-46 and 97-106.
Item 109. A system to train a neural network to generate a 2D projection enhancing depth perception including the method of any one of items 1-46 and 97-106.
Item 110. A method to train a neural network to generate a 2D projection enhancing depth perception including the method of any one of items 1-46 and 97-106.
Item 111. A method for learning a viewer preference over time in order to make a passive modification to a 2D view enhancing a 3D perception.
Item 112. A system to actively acquire a ground truth on viewer satisfaction including the method of any one of items 1-46, 97-106, 110, and 111.
Item 113. A system to passively acquire a ground truth on viewer satisfaction including the method of any one of items 1-46, 97-106, 110, and 111.
While some portions of this disclosure may refer to “convention” or “conventional” examples, any such reference is merely to provide context for the instant disclosure and does not form any admission as to what constitutes the state of the art.
Accordingly, this description is to be taken only by way of example and not to otherwise limit the scope of the embodiments herein. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the embodiments herein.