1. Field
This invention relates generally to light field and 3D image and video processing, more particularly to the preprocessing of data to be used as input for full parallax light field compression and full parallax light field display systems.
2. Background
The following references are cited for the purpose of more clearly describing the present invention, the disclosures of which are hereby incorporated by reference:
The environment around us contains objects that reflect an infinite number of light rays. When a person observes this environment, a subset of these light rays is captured through the eyes and processed by the brain to create visual perception. A light field display tries to recreate a realistic perception of an observed environment by displaying a digitized array of light rays sampled from the data available in the environment being displayed. This digitized array of light rays corresponds to the light field generated by the light field display.
Different light field displays have different light field producing capabilities, and therefore the light field data has to be formatted differently for each display. In addition, the large amount of data required to display light fields, together with the large amount of correlation present in that data, motivates light field compression algorithms. Light field compression algorithms are generally display-hardware dependent, and they can benefit from hardware-specific preprocessing of the light field data.
Prior art light field display systems use inefficient compression algorithms. These algorithms first capture or render the 3D scene data or light field input data; this data is then compressed for transmission within the light field display system, the compressed data is decompressed, and finally the decompressed data is displayed.
With the introduction of new emissive and compressive displays it is now possible to realize full parallax light field displays with wide viewing angle, low power consumption, high refresh rate, high resolution, large depth of field and real time compression/decompression capability. New full parallax light field compression methods have been introduced to take advantage of the inherent correlation in the full parallax light field data very efficiently. These methods can reduce the transmission bandwidth, reduce the power consumption, reduce the processing requirements and achieve real-time encoding and decoding performance.
Prior art methods aim to improve compression performance by preprocessing the input data to adapt the input characteristics to the display compression capabilities. For example, Ref. [3] describes a method that utilizes a preprocessing stage to adapt the input light field to the subsequent block-based compression stage. Since a block-based method was adopted in the compression stage, it is expected that the blocking artifacts introduced by the compression will affect the angular content, compromising the vertical and horizontal parallax. In order to adapt the content to the compression step, the input image is first transformed from elemental images to sub-images (gathering all angular information into one unique image), and then the image is re-sampled so that its dimension is divisible by the block size used by the compression algorithm. The method improves compression performance; nevertheless, it is tailored only to block-based compression approaches and does not exploit the redundancies between the different viewing angles.
In Ref. [1], compression is achieved by encoding and transmitting only a subset of the light field information to the display. A 3D compressive imaging system receives the input data and utilizes the depth information transmitted along with the texture to reconstruct the entire light field. The process of selecting the images to be transmitted depends on the content and location of elements of the scene, and is referred to as the visibility test. The reference imaging elements are selected according to the position of objects relative to the camera location surface; each object is processed in order of its distance from that surface, and closer objects are processed before more distant objects. The visibility test procedure uses a plane representation for the objects and organizes the 3D scene objects in an ordered list. Since the full parallax compressed light field 3D imaging system renders and displays objects from an input 3D database that could contain high level information, such as object descriptions, or low level information, such as simple point clouds, a preprocessing of the input data needs to be performed to extract the information used by the visibility test.
It is therefore the objective of this invention to introduce data preprocessing methods to improve light field compression stages used in the full parallax compressed light field 3D imaging systems. Additional objectives and advantages of this invention will become apparent from the following detailed description of a preferred embodiment thereof that proceeds with reference to the accompanying drawings.
In the following description, like drawing reference numerals are used for like elements, even in different drawings. The matters defined in the description, such as detailed construction and elements, are provided to assist in a comprehensive understanding of the exemplary embodiments. However, the present invention can be practiced without those specifically defined matters. Also, well-known functions or constructions are not described in detail, since they would obscure the invention with unnecessary detail. In order to understand the invention and to see how it may be carried out in practice, a few embodiments of it will now be described, by way of non-limiting example only, with reference to accompanying drawings, in which:
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
In the following description, reference is made to the accompanying drawings, which illustrate several embodiments of the present invention. It is understood that other embodiments may be utilized, and mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present disclosure. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present invention is defined only by the claims of the issued patent.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Spatially relative terms, such as “beneath”, “below”, “lower”, “above”, “upper”, and the like may be used herein for ease of description to describe one element's or feature's relationship to another element(s) or feature(s) as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as “below” or “beneath” other elements or features would then be oriented “above” the other elements or features. Thus, the exemplary term “below” can encompass both an orientation of above and below. The device may be otherwise oriented (e.g., rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising” specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
As shown in
Recently introduced light field display systems, as shown in
The preprocessing methods 401 for efficient full parallax compressed light field 3D display systems 403 described in the present invention can collect, analyze, create, format, store and provide light field input data 201 to be used at specific stages of the compression operation, see
The preprocessing 401 may convert the light field input data 201 from data space to the display space of the light field display hardware. Conversion of the light field input data from data space to display space is needed for the display to be able to show the light field information in compliance with the light field display characteristics and the user (viewer) preferences. When the light field input data 201 is based on camera input, the light field capture space (or coordinates) and the camera space (coordinates) are typically not the same, and the preprocessor needs to be able to convert the data from any camera's (capture) data space to the display space. This is particularly the case when multiple cameras are used to capture the light field and only a portion of the captured light field is included in the viewer preference space.
This data space to display space conversion is done by the preprocessor 401 by analyzing the characteristics of the light field display hardware and, in some embodiments, the user (viewer) preferences. Characteristics of the light field display hardware include, but are not limited to, image processing capabilities, refresh rate, number of hogels and anglets, color gamut, and brightness. Viewer preferences include, but are not limited to, object viewing preferences, interaction preferences, and display preferences.
The preprocessor 401 takes the display characteristics and the user preferences into account and converts the light field input data from data space to display space. For example, if the light field input data consists of mesh objects, then preprocessing analyzes the display characteristics, such as number of hogels, number of anglets and FOV; analyzes the user preferences, such as object placement and viewing preferences; then calculates bounding boxes, motion vectors, etc., and reports this information to the compression and display system. Data space to display space conversion includes data format conversion and motion analysis in addition to coordinate transformation. Data space to display space conversion involves taking into account the position of the light modulation surface (display surface) and the object's position relative to the display surface, in addition to what is learned from compressed rendering regarding the most efficient (compressed) representation of the light field as viewed by the user.
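The coordinate-transformation part of this data space to display space conversion can be sketched as follows. This is a minimal illustration only: the function name, the calibration transform, and the display-volume bounds are assumptions for exposition and not part of any actual display interface described herein.

```python
import numpy as np

def to_display_space(vertices, capture_to_display, display):
    """Illustrative sketch: map object vertices from capture (data) space to
    display space and derive the bounding box reported to later stages.

    vertices: (N, 3) array in capture coordinates
    capture_to_display: 4x4 homogeneous transform (assumed calibration)
    display: dict with the display volume bounds implied by FOV and depth range
    """
    # Promote to homogeneous coordinates and apply the transform.
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    out = (capture_to_display @ homo.T).T
    out = out[:, :3] / out[:, 3:4]

    # Axis-aligned bounding box reported to the compression/display system.
    bbox_min, bbox_max = out.min(axis=0), out.max(axis=0)

    # Flag vertices that fall inside the display volume.
    inside = np.all((out >= display["volume_min"]) &
                    (out <= display["volume_max"]), axis=1)
    return out, (bbox_min, bbox_max), inside
```

The returned bounding box stands in for the per-object information (bounding boxes, placement) that the preprocessor reports to the compression and display system.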
When the preprocessing methods 401 interact with the compressed rendering 302, the preprocessing 401 usually involves preparing and providing data to aid in the visibility test 601 stage of the compressed rendering.
When the preprocessing methods 401 interact with the display matched encoding 303, the display operation may bypass the compressed rendering stage 302, or provide data to aid in the processing of the information that comes from the compressed rendering stage. In the case when the compressed rendering stage 302 is bypassed, preprocessing 401 may provide all the information that is usually reserved for compressed rendering 302 to display matched encoding 303, and may additionally include further information about the display system, settings, and the type of encoding that needs to be performed at the display matched encoding 303. In the case when the compressed rendering stage 302 is not bypassed, the preprocessing can provide further information in the form of expected holes and the best set of residual data to increase the image quality, as well as further information about the display, settings, and the encoding method to be used in display matched encoding 303.
When the preprocessing methods 401 interact with the display of compressed data 304 directly, the preprocessing can affect the operational modes of the display, including but not limited to: adjusting the field of view (FOV), number of anglets, number of hogels, active area, brightness, contrast, color, refresh rate, decoding method and image processing methods in the display. If there is already preprocessed data stored in the display's preferred input format, then this data can bypass compressed rendering 302 and display matched encoding 303 and be directly displayed 304; alternatively, either the compressed rendering stage or the display matched encoding stage, or both, can be bypassed depending on the format of the available light field input data and the operation currently being performed on the display by user interaction 402.
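The bypass decision described above can be sketched as a simple routing rule. The function name and format labels are illustrative assumptions; the stage names follow the reference numerals used in this description.

```python
def route_light_field(input_format, display_preferred_format):
    """Illustrative sketch of the bypass decision: light field data already
    stored in the display's preferred input format skips compressed
    rendering (302) and display matched encoding (303) and is sent directly
    to the display of compressed data (304)."""
    if input_format == display_preferred_format:
        # Preprocessed data in the display's preferred format: direct display.
        return ["display_304"]
    # Otherwise the data passes through the full pipeline.
    return ["compressed_rendering_302",
            "display_matched_encoding_303",
            "display_304"]
```

A fuller version would also consult the current user interaction 402 to decide whether only one of the two intermediate stages can be skipped.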
Interaction of the preprocessing 401 with any of the subsystems in the imaging system as shown in
As stated earlier, feedback is an integral part of the light field display characteristics and the user (viewer) preferences that are used by the preprocessing of the light field input 401. As another example of feedback, the compressed rendering 302 may issue requests to have the preprocessing 401 transfer selected reference hogels to faster storage 505 (
Preprocessing methods of the light field input data can be used for full parallax light field display systems that utilize input images from three types of sources, see
Preprocessing methods of the light field input data can be applied to static or dynamic light fields and would typically be performed on specifically designed specialized hardware. In one embodiment of this invention, preprocessing 401 is applied to convert the light field data 201 from one format, such as LIDAR, to another format, such as mesh data, and store the result in a slow storage medium 504, such as a hard drive with a rotating disk. Then the preprocessing 401 moves a subset of this converted information in slow storage 504 to fast storage 505, such as a solid state drive. The information in 505 can be used by compressed rendering 302 and display matched encoding 303, and it usually would be a larger amount of data than what can be displayed on the light field display. The data that can be immediately displayed on a light field display is stored in the on board memory 506 of the light field display 304. Preprocessing can also interact with the on board memory 506 to receive information about the display and send commands to the display that may be related to display operational modes and applications. Preprocessing 401 makes use of the user interaction data to prepare the display and interact with the data stored in different storage mediums. For example, if a user wants to zoom in, preprocessing would typically move a new set of data from the slow storage 504 to fast storage 505, and then send commands to the on board memory 506 to adjust the display refresh rate and the data display method, such as the method for decompression.
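The three-tier movement of data just described (slow storage 504 to fast storage 505 to on board memory 506) can be sketched as follows. The class and method names are illustrative assumptions, not an actual storage interface of the display system.

```python
class TieredLightFieldStore:
    """Illustrative sketch of the three-tier storage scheme described above:
    slow storage (504) -> fast storage (505) -> display on-board memory (506).
    Data is promoted one tier at a time in response to user interaction."""

    def __init__(self, slow):
        self.slow = dict(slow)   # e.g. rotating hard drive: converted light field
        self.fast = {}           # e.g. solid state drive: superset of displayable data
        self.onboard = {}        # display on-board memory: immediately displayable

    def prefetch(self, keys):
        # Move a predicted working set from slow storage to fast storage.
        for k in keys:
            if k in self.slow:
                self.fast[k] = self.slow[k]

    def display(self, key):
        # Promote data to on-board memory; fall back to slow storage on a miss.
        if key not in self.fast:
            self.prefetch([key])
        self.onboard[key] = self.fast[key]
        return self.onboard[key]
```

On a zoom interaction, for instance, the preprocessor would call `prefetch` with the newly predicted working set before the display requests it.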
Other examples of system performance improvements due to preprocessing with different speed storage devices include user interaction performance improvements and compression operation speed improvements. In one embodiment of the present invention, if a user is interacting with high altitude light field images of a continent in the form of point cloud data and is currently interested in examining the light field images of a specific city (or region of interest), this light field data about the city would be stored in the on board memory 506 of the display system. Predicting that the user may be interested in examining light field images of the neighboring cities, the preprocessing can load information about these neighboring cities into the fast storage system 505 by transferring this data from the slow storage system 504. In another embodiment of this invention, the preprocessing can convert the data in the slow storage system 504 into a display system preferred data format, for example from point cloud data to mesh data, and save it back into the slow storage system 504; this conversion can be performed offline or in real time. In another embodiment of this invention, the preprocessing system can save different levels of detail for the same light field data to enable faster zooming. For example, 1×, 2×, 4×, and 8× zoom data can be created and stored in the slow storage devices 504 and then moved to fast storage 505 and on board memory 506 for display. In these scenarios the data that is stored on the fast storage would be decided by examining the user interaction 402. In another embodiment of this invention, preprocessing would enable priority access to light field input data 201 for the objects closer to the display surface 103 to speed up the visibility test 601, because an object closer to the display surface may require more reference hogels and, therefore, is processed first in the visibility test.
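The precomputation of multiple levels of detail for zooming can be sketched as follows. The function name and the plain strided subsampling are illustrative assumptions standing in for a real resampling filter; the point is only that each zoom level is prepared once and then moved between storage tiers on demand.

```python
import numpy as np

def build_zoom_levels(hogel_array, factors=(1, 2, 4, 8)):
    """Illustrative sketch: precompute reduced levels of detail of the light
    field data (e.g. for 1x, 2x, 4x and 8x zoom) so that zoom interactions
    only move already-prepared data between storage tiers.

    hogel_array: 2D array of per-hogel samples; each level keeps every
    f-th sample as a stand-in for proper resampling.
    """
    levels = {}
    for f in factors:
        levels[f] = hogel_array[::f, ::f].copy()
    return levels
```

In the tiered-storage scheme described above, all levels would be written to slow storage 504, with the level matching the current zoom promoted to fast storage 505 and on board memory 506.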
In a computer generated (CG) capture environment, where computer generated 3D models are used to capture and compress a full parallax light field image, some information would already be known before the rendering process is started. This information includes the location of the models, size of the models, bounding box of the models, capture camera (CG camera) information, motion vectors of the models, and target display information. Such information is beneficial and can be used as a priori information in the compressed rendering operations of the full parallax compressed light field 3D display systems described in patent application Ref. [1].
In one preprocessing method the a priori information could be polled from the computer graphics card, or could be captured through measurements or user interaction devices through wired or wireless means 401.
In another preprocessing method, the a priori information could be supplied as a part of a command, as a communication packet or instruction from another subsystem either working as a master or a slave in a hierarchical imaging system. It could be a part of an input image as instructions on how to process that image in the header information.
In another preprocessing method, within the 3D imaging system the preprocessing method could be performed as a batch process by a specialized graphic processing unit (GPU), or a specialized image processing device prior to the light field rendering or compression operations. In this type of preprocessing, the preprocessed input data would be saved in a file or memory to be used at a later stage.
In another preprocessing method, preprocessing can also be performed in real-time using a specialized hardware system having sufficient processing resources before each rendering or compression stage as new input information becomes available. For example, in an interactive full parallax light field display, as the interaction information 402 becomes available, it can be provided to the preprocessing stage 401 as motion vectors. In this type of preprocessing the preprocessed data can be used immediately in real-time or can be saved for a future use in memory or in a file.
The full parallax light field compression methods described in Ref. [1] combine the rendering and compression stages into one stage called compressed rendering 302. Compressed rendering 302 achieves its efficiencies through the use of a priori known information about the light field. In general such a priori information would include the objects' locations and bounding boxes in the 3D scene. In the compressed rendering method of the full parallax light field compression system described in Ref. [1], a visibility test makes use of such a priori information about the objects in the 3D scene to select the best set of imaging elements (or hogels) to be used as reference.
In order to perform the visibility test, the light field input data must be formatted into a list of 3D planes representing objects, ordered by their distances to the light field modulation surface of the full parallax compressed light field 3D display system.
The preprocessing block 401 receives the light field input data 201, and extracts the information necessary for the visibility test 601 of Ref. [1]. The visibility test 601 will then select the list of imaging elements (or hogels) to be used as reference by utilizing the information extracted from the preprocessing block 401. The rendering block 602 will access the light field input data and render only the elemental images (or hogels) selected by the visibility test 601. The reference texture 603 and depth 604 are generated by the rendering block 602, and then the texture is further filtered by an adaptive texture filter 605 and the depth is converted to disparity 606. The multi-reference depth image based rendering (MR-DIBR) 607 utilizes the disparity and the filtered texture to reconstruct the entire light field texture 608 and disparity 609.
The light field input data 201 can have several different data formats, from high level object directives to low level point cloud data. However, the visibility test 601 only makes use of a high level representation of the light field input data 201. The input used by the visibility test 601 would typically be an ordered list of 3D objects within the light field display volume. In this embodiment such an ordered list of 3D objects would be in reference to the surface of the axis-aligned bounding box closest to the light field modulation (or display) surface. The ordered list of 3D objects is a list of 3D planes representing the 3D objects, ordered by their distances to the light field modulation surface of the full parallax compressed light field 3D display system. A 3D object may be on the same side of the light field modulation surface as the viewer or on the opposite side with the light field modulation surface between the viewer and the 3D object. The ordering of the list is by distance to the light field modulation surface without regard to which side of the light field modulation surface the 3D object is on. In some embodiments, the distance to the light field modulation surface may be represented by a signed number that indicates which side of the light field modulation surface the 3D object is on. In these embodiments the ordering of the list is by the absolute value of the signed distance value.
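The ordering rule just described, in which objects on either side of the light field modulation surface carry a signed distance and the list is sorted by its absolute value, can be sketched as follows. The class and function names are illustrative assumptions and are not taken from Ref. [1].

```python
from dataclasses import dataclass

@dataclass
class ScenePlane:
    """Illustrative stand-in for the 3D plane representing an object in the
    ordered list consumed by the visibility test. signed_distance is measured
    from the light field modulation surface; the sign encodes which side of
    the surface the object lies on."""
    name: str
    signed_distance: float

def order_for_visibility_test(planes):
    # Closer objects are processed first, regardless of which side of the
    # modulation surface they are on, so sort by absolute distance.
    return sorted(planes, key=lambda p: abs(p.signed_distance))
```

Objects in front of and behind the modulation surface thus interleave in the list purely by proximity, which matches the processing order used for selecting reference hogels.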
As illustrated in
For the case of a 3D scene containing multiple objects such as the illustration of
For displaying a dynamic light field 102, as in the case of displaying a live scene that is being captured by any of a light field camera 1201, by an array of 2D cameras 1202, by an array of 3D cameras 1203 (including laser ranging, IR depth capture, or structured light depth sensing), or by an array of light field cameras 1204, see
In one preprocessing method 401 of this invention, where a single light field camera 1201 is used to capture the light field, the preprocessed light field input data can include the maximum number of pixels to capture, specific instructions for certain pixel regions on the camera sensor, and specific instructions for certain micro lens or lenslet groups in the camera lens and the pixels below the camera lens. The preprocessed light field input data can be calculated and stored before image capture, or it can be captured simultaneously with or just before the image capture. When the preprocessing of the light field input data is performed right before the capture, a subsample of the camera pixels can be used to determine rough scene information, such as depth, position, disparity and hogel relevance for the visibility test algorithm.
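The use of a sparse pixel subsample to obtain rough scene information just before capture can be sketched as follows. The function name, the sensor interface, and the unit depth-to-disparity constant are illustrative assumptions, not part of any camera API.

```python
import numpy as np

def rough_scene_stats(depth_readings, stride=16):
    """Illustrative sketch: probe every stride-th pixel of a pre-capture
    depth reading to estimate rough scene information (near/far depth and a
    coarse disparity bound) for use by the visibility test.

    depth_readings: 2D array of per-pixel depth values (assumed sensor output)
    """
    sample = depth_readings[::stride, ::stride]
    near, far = float(sample.min()), float(sample.max())
    # Disparity is inversely proportional to depth; baseline and focal length
    # are folded into an assumed unit constant for this sketch.
    disparity_max = 1.0 / near if near > 0 else float("inf")
    return {"near": near, "far": far, "disparity_max": disparity_max}
```

Only a small fraction of the sensor is read, so the estimate is available before the full capture is triggered.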
In another embodiment of this invention, see
In another embodiment of this invention, see
In another embodiment of this invention, the preprocessing methods of this invention are used within the context of the networked light field photography system of Ref. [2] to enable capture feedback to the cameras used to capture the light field. Ref. [2] describes a networked light field photography method that uses multiple light field and/or conventional cameras to capture a 3D scene simultaneously or over a period of time. The data from cameras in the networked light field photography system which captured the scene earlier in time can be used to generate preprocessed data for the later cameras. This preprocessed light field data can reduce the number of cameras capturing the scene or reduce the pixels captured by each camera, thus reducing the required interface bandwidth from each camera. Similar to the 2D and 3D array capture methods described earlier, networked light field cameras can also be partitioned to achieve different functions.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit pursuant to 35 U.S.C. 119(e) of U.S. Provisional Application No. 62/024,889, filed Jul. 15, 2014, which application is specifically incorporated herein, in its entirety, by reference.