Virtual reality systems can be referred to as immersive multimedia or computer-simulated reality. Such systems replicate an environment that simulates a physical presence in places in the real world or in an imagined world, allowing the user to interact with that world. These systems artificially create sensory experiences, which can include sight, hearing, touch, and smell. Some virtual realities generated by such systems are displayed on a computer screen or with special stereoscopic displays, and some simulations include additional sensory information, such as sound delivered through speakers or headphones targeted toward virtual reality users. Virtual reality also covers remote communication environments that provide a virtual presence of users, with the perception of telepresence or tele-existence, either through standard input devices such as a keyboard and mouse or through multimodal devices such as a wired glove or an omnidirectional treadmill. The simulated environment can be similar to the real world in order to create lifelike experiences, for example in simulations for pilot or combat training, or it can differ significantly from reality, such as in virtual reality games.
This disclosure relates to systems and methods that allow dynamic visual alteration of the appearance of real-world objects in real time for augmented reality and other applications such as live presentations. User interactions can be captured and one or more augmentation images can be dynamically projected onto a physical object in real time, such that the user does not need to wear special glasses or look through a tablet or other display when visualizing projections. The systems and methods can employ depth and/or color sensing to model and track a physical object as it moves in front of a projector and camera system. As an example, a user can virtually paint or render images onto physical objects to change their appearance or to mark up an object for others to visualize. To enable real-time augmentation of physical objects via user renderings, the systems and methods can generate models of the physical object and construct user renderings in real time. Unlike conventional systems, the physical object being projected onto can move dynamically in a spatial region (e.g., a three-dimensional projection space) and thus need not be static while images are projected onto the object. Since the object is acquired and dynamically modeled when present in the system, the system does not require a priori scanning or initialization of the target object. A user can indicate how they want to virtually modify the object using a variety of methods, such as a laser pointer. For instance, the user can move laser pointer light across the object to virtually paint on the object. System models then identify the location of the laser pointer light on the object and update the projected content rendered by the projected light onto the object in real time. The color of the projection can be selected via the same or a different user interface (e.g., touch screen, mouse, keyboard or the like).
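The workflow described above can be summarized as a capture, segment, track, augment, and project loop. The sketch below illustrates that loop in Python; each stage is passed in as a callable because the disclosure does not prescribe any particular API, and the helper roles (capture, segment, track_and_model, detect_input, render, project) are illustrative assumptions rather than components of an actual implementation.

```python
def augmentation_loop(capture, segment, track_and_model, detect_input,
                      render, project, num_frames=1000):
    """Illustrative projection loop; each processing stage is an injected callable."""
    object_model, augmentation = None, []
    for _ in range(num_frames):
        color, depth = capture()                                   # RGB + depth frames
        mask = segment(color, depth)                               # isolate the physical object
        object_model = track_and_model(object_model, depth, mask)  # follow object movement
        stroke = detect_input(color, depth, object_model)          # e.g., laser "painting"
        if stroke is not None:
            augmentation.append(stroke)                            # renderings stay linked to the object
        project(render(object_model, augmentation))                # update projected content
```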
The physical object 110 is interacted with via user interactions (or commands received from an interface). Such user interactions can be captured from images (e.g., RGB images and/or depth images) acquired over time within an interaction space of a processing system such as depicted in the example of
An object model generator 130 is provided to generate an object model from segmented image data captured from the physical object 110. The object model includes data representing location and geometry of the physical object 110. A user input detector 140 generates an augmentation model that includes data representing a graphical image and location information thereof with respect to the physical object 110 in response to the user interaction associated with the physical object. For example, the user input detector can receive image data acquired by a camera for the interaction space in which the physical object has been placed. The interaction space can be defined by or include a three dimensional spatial region corresponding to the field of view of an image capture device (e.g., a camera).
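As a concrete illustration of an interaction space tied to a camera's field of view, the sketch below tests whether a 3D point lies inside a pinhole camera's frustum within a working depth range. The intrinsics, image size, and depth limits are illustrative assumptions, not values from the disclosure.

```python
# Sketch of an "interaction space" test: a point is inside the space when it
# projects within the image bounds and lies within a working depth range.
def in_interaction_space(point, fx=525.0, fy=525.0, cx=320.0, cy=240.0,
                         width=640, height=480, z_min=0.3, z_max=1.5):
    x, y, z = point
    if not (z_min <= z <= z_max):
        return False
    u = fx * x / z + cx        # pinhole projection to pixel coordinates
    v = fy * y / z + cy
    return 0.0 <= u < width and 0.0 <= v < height

print(in_interaction_space((0.05, -0.02, 0.8)))   # -> True
```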
An augmentation mapper 150 maps the augmentation model to the object model such that the graphical image of the augmentation model is spatially linked to the physical object. As shown, output from the augmentation mapper 150 includes spatial linkage data that can be employed by projection blocks (see, e.g.,
A processor 214 executes various functions stored in a memory 220. The memory 220 stores non-transitory computer readable instructions that are executed by the processor 214. The memory 220 includes instructions for an object detector 224 that employs a segmentation engine 226 to generate segmented image data by segmenting two- or three-dimensional images of the physical object 210 from background image pixels. An object position detector/tracker 230 detects a position of the physical object 210 (e.g., in the image space) via movement of the segmented three-dimensional image pixels as the physical object moves from a first position to a second position. An object model generator 234 generates an object model from the segmented three-dimensional image data captured from the physical object 210. The object model includes data (e.g., object geometry or volumetric data) representing the location, geometry, and direction of the physical object as detected by the object position detector 230. As shown, the object model can be geometrically transformed via a geometric transform 234, where the transform may include generating a volumetric or geometric representation of the object model.
A user input detector 236 receives input, representing user interactions, and generates an augmentation model 240 that includes data representing a graphical image and location information thereof with respect to the physical object 210 in response to a user interaction associated with the physical object. The augmentation model 240 defines coordinates of the data representing the graphical image and where the data should be projected onto the physical object. The input representing user interactions can be provided from a user input device 238 and/or from the camera 204 as part of the image data. An augmentation mapper 244 maps the augmentation model 240 to the object model such that the graphical image of the augmentation model is spatially linked to the physical object 210.
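The disclosure does not prescribe a concrete data layout for the augmentation model 240 or for its spatial link to the object model; the sketch below shows one possible representation in which user renderings are stored as 3D strokes in the object's local coordinate frame, so they move with the object. The pose convention (a 4x4 world-from-object transform produced by the tracker) is an assumption.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class AugmentationModel:
    """Hypothetical container: strokes are lists of 3D points in the object's
    local frame, so the graphical image stays spatially linked to the object."""
    strokes: list = field(default_factory=list)
    color: tuple = (255, 0, 0)

def link_to_object(points_world, world_from_object):
    """Express world-space points in the object frame (world_from_object is a
    4x4 homogeneous transform estimated by the position detector/tracker)."""
    object_from_world = np.linalg.inv(world_from_object)
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    return (object_from_world @ homogeneous.T).T[:, :3]

model = AugmentationModel()
model.strokes.append(link_to_object(np.array([[0.1, 0.0, 0.8]]), np.eye(4)))
```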
By way of example, the user input detector 236 can include an extractor 250 to extract user interactions from the input data stream generated by the camera 204. Output from the extractor 250 can be provided to a command interpreter 254, which drives the augmentation model 240. The command interpreter 254 determines whether the user interactions are gestures, such as drawing an image, or global commands, such as changing the color of the physical object 210 or a portion of the physical object. The command interpreter 254 can also receive commands directly from the user input device 238, as shown. Output from the extractor 250 can be processed by a location and three-dimensional transform 260, which determines a location and the coordinates for the user interactions with the physical object 210. For instance, the extractor 250 can utilize a pixel threshold to identify light interactions that are above the light levels detected from the object. Output from the augmentation mapper 244 can be provided to a projection transform 264 that provides graphical image coordinates and image pixel data to project user renderings onto the physical object 210. A projector 270 projects the graphical image from the augmentation mapper 244 and the projection transform 264 onto the physical object 210 based on the spatial linkages established by the augmentation mapper 244.
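One way to realize the command interpreter 254's split between drawing gestures and global commands is a simple dispatch on an event type, as in the sketch below; the event dictionary format is an assumption made for illustration only.

```python
def interpret(event, augmentation):
    """Hypothetical command dispatch: drawing gestures extend the augmentation
    model, while global commands (e.g., a color change) alter it as a whole."""
    if event.get("type") == "draw":             # gesture extracted from the camera stream
        augmentation["strokes"].append(event["points"])
    elif event.get("type") == "set_color":      # global command, e.g., from user input device 238
        augmentation["color"] = event["color"]
    return augmentation

augmentation = {"strokes": [], "color": (255, 0, 0)}
augmentation = interpret({"type": "set_color", "color": (0, 128, 255)}, augmentation)
augmentation = interpret({"type": "draw", "points": [(0.01, 0.02, 0.40)]}, augmentation)
```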
The object position detector 230 detects a position (e.g., location and orientation) of the physical object in real time, such as when it moves from a first position to a second position. The object position detector 230 determines movement of the physical object 210 by utilizing a correspondence and transformation identifying algorithm, such as an iterative closest point (ICP) computation in one example, between a first set of points associated with the first position and a second set of points associated with the second position, by minimizing an error metric between the first set of points and the second set of points. If movement is detected, the object model can be modified based on the detected position and determined movement of the physical object to maintain the spatial link between the augmentation model 240 and the physical object as the physical object moves from the first position to the second position. The object model generator 234 also divides the volumetric representation of the physical object into voxels that define three-dimensional units of image space related to the physical object 210. The object model generator 234 in one example employs a signed distance function to classify each voxel as empty, unseen, or near the surface of a volumetric representation based on a threshold value applied to each voxel, as sketched below. The geometric transform 234 transforms the volumetric representation of the physical object into a geometric representation of the physical object 210. In one example, the geometric transform 234 employs a marching cubes transform to convert the volumetric representation of the physical object 210 into a geometric representation of the physical object. The marching cubes transform creates a surface for each voxel detected near the surface of the physical object by comparing each voxel to a table of voxel-image links that connect voxel types to a predetermined geometric representation of the voxel. When a voxel type has been determined, the portion of the volume where the voxel type has been detected can be assigned to the respective geometric type linked by the voxel type.
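A minimal sketch of the voxel classification just described follows, assuming a truncation threshold of 1 cm and NaN for voxels that have never been observed; both conventions are assumptions rather than requirements of the disclosure.

```python
import numpy as np

# Each voxel stores a signed distance to the nearest observed surface; a
# truncation threshold splits voxels into "near-surface", "empty" (well in
# front of the surface), and "unseen" (behind the surface or never observed).
def classify_voxels(signed_distance, threshold=0.01):
    labels = np.full(signed_distance.shape, "unseen", dtype=object)
    observed = ~np.isnan(signed_distance)
    near = observed & (np.abs(signed_distance) <= threshold)
    empty = observed & (signed_distance > threshold)
    labels[near] = "near_surface"
    labels[empty] = "empty"
    return labels

sdf = np.array([[0.005, 0.20], [-0.004, np.nan]])
print(classify_voxels(sdf))
```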
Input to the system can be received by the 3D camera 204, which generates color and depth image streams of the target area on the physical object 210. An active depth camera can be employed as the camera 204, but passive stereo cameras or other types of sensing technologies are also possible. One processing aspect includes identifying which parts of the view are background objects or surfaces that should be ignored and which parts may be foreground objects to be tracked and reconstructed via the object detector 224 and segmentation engine 226. In one example, the system 200 can assume some initial views of only the background environment without any foreground objects. The object detector 224 can then generate a per-pixel model using the color and depth streams to enable real-time segmentation of foreground and background objects via the segmentation engine 226. Other processing for background removal is possible, such as assuming that a planar surface in the scene comprises the background. Thus, in one example, the background image pixels can be captured from a pre-saved image of the background before image pixels of the physical object appear with the background image. In another example, the background image pixels can be determined dynamically based on ambient sensor data received from the three-dimensional camera 204 before the image pixels of the physical object appear in front of the camera. Motion can also be used to help identify foreground objects. A binary mask can be employed by the segmentation engine 226 to subtract the background image pixels from the image pixels of the physical object 210, as illustrated in the sketch below.
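The background subtraction can be illustrated with a per-pixel depth model: pixels measurably closer to the camera than the pre-saved background are flagged as foreground, yielding the binary mask. The 20 mm tolerance and the millimeter depth units are illustrative assumptions.

```python
import numpy as np

def foreground_mask(depth, background_depth, tolerance_mm=20.0):
    valid = (depth > 0) & (background_depth > 0)          # 0 = no depth reading
    closer = (background_depth - depth) > tolerance_mm    # object in front of background
    return valid & closer                                 # binary mask of object pixels

background = np.full((480, 640), 1000.0)   # flat surface ~1 m away (depth in mm)
frame = background.copy()
frame[200:280, 300:380] = 800.0            # object held ~0.8 m from the camera
mask = foreground_mask(frame, background)
print(mask.sum())                          # number of foreground pixels (80 * 80 = 6400)
```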
The object position detector 230 determines how the object 210 has moved relative to the previous time instance when it was observed. This motion information is utilized to understand how the object is moving relative to the camera 204 so that a 3D model of the object can be built up over time and the viewpoint relative to the object can be determined. An Iterative Closest Point (ICP) algorithm can be utilized in one example for position detection. Given two sets of points in 3D space, one set from a previous view and another set of 3D points for a current view, ICP computes the transformation (rotation and translation) to align the two sets of points. This transformation corresponds to the motion of the object relative to the camera. Other motion estimation computations are also possible. The object model generator 234 generates a 3D model of the object. The object model generator 234 can update the model over time as the camera 204 acquires new images of the object. A Truncated Signed Distance Function (TSDF) technique can be employed to determine the volumetric representation of the object; this technique merges 3D point cloud data into the volumetric model and updates the object model as desired. For rendering the 3D model (such as for re-projection), it is useful to have an explicit geometric representation suitable for computer graphics display. As mentioned above, the geometric transform 234 can employ a marching cubes computation technique, or other methods, to convert from the volumetric representation to a geometric mesh representation.
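The rigid-fit step at the core of ICP can be sketched as follows, under the simplifying assumption that point correspondences between the previous and current views are already known (a full ICP implementation alternates nearest-neighbor matching with this closed-form fit). The sketch uses the standard SVD-based solution that minimizes the sum of squared distances between the two sets.

```python
import numpy as np

def rigid_fit(p, q):
    """Return R, t such that R @ p_i + t best matches q_i in the least-squares sense."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    H = (p - p_mean).T @ (q - q_mean)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Points from a "previous view" and the same points after the object moved.
prev = np.random.rand(100, 3)
true_R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 deg about z
curr = prev @ true_R.T + np.array([0.05, -0.02, 0.10])
R, t = rigid_fit(prev, curr)
print(np.allclose(R, true_R), np.round(t, 3))      # True [ 0.05 -0.02  0.1 ]
```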
The extractor 250 determines where the user is drawing or marking up the object 210. A laser pointer, in one example, can be utilized to draw on the object 210. To identify the laser light in the input stream, the extractor identifies the brightest pixels in the color stream that are above a certain threshold and that match the expected laser color. Once the 2D location of the laser light has been identified, it can be transformed to the appropriate 3D space for updating the augmentation model 240. The updated augmentation maps from the augmentation mapper 244 are mapped into the relative space of the projector 270 via the projection transform 264 (using the 3D model), and the augmentation is projected onto the object 210.
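A sketch of the laser-spot extraction and its lift to 3D is shown below; the brightness and red-dominance thresholds, the camera intrinsics, and the assumption of a red laser are all illustrative values, not parameters taken from the disclosure.

```python
import numpy as np

def find_laser_point(color, depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0,
                     brightness_min=220, red_dominance=60):
    """Find the brightest red-dominant pixel and back-project it to a 3D point."""
    r = color[..., 0].astype(np.int32)
    g = color[..., 1].astype(np.int32)
    b = color[..., 2].astype(np.int32)
    candidate = (r > brightness_min) & (r - g > red_dominance) & (r - b > red_dominance)
    if not candidate.any():
        return None
    brightness = np.where(candidate, r + g + b, -1)
    v, u = np.unravel_index(np.argmax(brightness), brightness.shape)
    z = depth[v, u]
    if z <= 0:
        return None                       # no valid depth at the laser spot
    x = (u - cx) * z / fx                 # back-project pixel to camera space
    y = (v - cy) * z / fy
    return np.array([x, y, z])

color = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 800.0)
color[240, 320] = (255, 40, 40)           # synthetic laser dot at the image center
print(find_laser_point(color, depth))     # -> approximately [0, 0, 800]
```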
In one example, the error metric minimized by the ICP computation described above can be expressed as

E = Σᵢ ‖R·pᵢ + t − qᵢ‖²

where E is the error metric, pᵢ is a point of the first set of points, qᵢ is the corresponding point of the second set of points, and R and t are the rotation and translation that align the first set of points with the second set of points.
In view of the foregoing structural and functional features described above, an example method will be better appreciated with reference to
What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, the invention is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite “a,” “an,” “a first,” or “another” element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term “includes” means includes but is not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.