Virtual reality (VR) provides a simulated environment that can be quite similar to or vastly different from the real world. Instead of viewing a screen in front of them, users are immersed in, and able to interact with, three-dimensional (3D) worlds. Simulating a physical presence in a real or imaginary world is one of the major aspects of virtual reality, and this ability has brought a heightened sensory experience to the gaming industry, among other fields. The technology is new and innovative, providing many different types of experiences that are safe and enjoyable while also allowing easy remote access.
The accompanying drawings illustrate various implementations of the principles described herein and are a part of the specification. The illustrated implementations are merely examples and do not limit the scope of the claims.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the disclosure. It will be apparent, however, to one skilled in the art that examples consistent with the present disclosure may be practiced without these specific details. Reference in the specification to “an implementation,” “an example” or similar language means that a particular feature, structure, or characteristic described in connection with the implementation or example is included in at least that one implementation, but not necessarily in other implementations. The various instances of the phrase “in one implementation” or similar phrases in various places in the specification are not necessarily all referring to the same implementation.
The preceding description has been presented only to illustrate and describe examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching.
Users typically rely on multiple pieces of gesture-tracking hardware to interact within a virtual reality (VR) environment. For example, control devices maneuvered by hand are relied on to track the user's hand movements. Additional control devices may be used for head tracking and eye tracking, and multiple devices may be used to track different parts of the body. None of the current solutions provides accurate full-body tracking.
Furthermore, one or more depth cameras are typically used to generate a model of the user. The depth cameras must be positioned directly in front of the user to capture the range of area in which the user can move. Three-dimensional (3D) data captured by the depth cameras is used to create 3D vectors describing the widths and heights of the user's body structures and their distances from the sensor. Each body part may be characterized as a mathematical vector defining joints and bones of a skeletal model.
Red, green, blue (RGB) cameras in conjunction with a sensor (e.g., a depth camera, a second RGB camera, etc.) may also be used to obtain skeletal models. A color image is expressed in pixel form, and a matrix is used to correlate each pixel in the image to a value that represents the distance of the pixel from the sensor. In this manner, the body of a user playing an on-screen character (e.g., an avatar, etc.) can be tracked, and a skeletal model may be used to model the user as the on-screen character. Current processing requires multiple steps, such as estimations, interpolations, and extrapolations from the data, to produce any meaningful representation, which can be inaccurate and inefficient.
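For illustration only, the pixel-to-distance correlation described above can be sketched as follows. The array shapes and values are hypothetical, and NumPy is an assumed tool rather than one named by this disclosure:

```python
# Minimal sketch of pairing each color pixel with a matrix entry that holds
# its distance from the sensor (all shapes and values are hypothetical).
import numpy as np

h, w = 480, 640
rgb = np.zeros((h, w, 3), dtype=np.uint8)       # color image from the RGB camera
depth = np.full((h, w), 2.5, dtype=np.float32)  # per-pixel distance in meters

# Correlate pixel (row, col) with its distance from the sensor.
row, col = 240, 320
pixel_color = rgb[row, col]
pixel_distance = depth[row, col]
print(f"pixel {(row, col)}: color={pixel_color.tolist()}, "
      f"distance={pixel_distance:.2f} m")
```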
The following provides human interaction in VR by replacing gesture tracking devices with 3D real-time tracking. A single RGB camera may be used to provide a new level of interaction that can completely or nearly completely remove the need for handheld control devices during play time. Instead of a skeletal vector-based model, real-time image mapping provides a 3D surface representation model. The tracking information provided can cover the whole body and provide the necessary input from the user to properly interact in VR with both simple body movements (e.g., waving an arm, lifting a leg, etc.) and complex body movements (e.g., throwing a ball, dance movements, etc.). The user's whole body can be captured, rather than the line-based frames associated with skeletal vector-based models, partial body models, or models that depend on multiple devices. Images captured within a physical environment are used to map one or more 3D surface representation models within a VR environment.
“Virtual reality” (VR), as used herein, is an interactive experience that immerses users in a fully artificial digital environment. The digital environment provides a simulated environment that can be similar to or completely different from the real world. Applications of virtual reality include entertainment (e.g., gaming) and education (e.g., medical or military training).
“Augmented reality” (AR), as used herein, is an interactive experience that overlays virtual objects on the real-world environment. Objects that reside in the real world are enhanced by computer-generated perceptual information that can be constructive (e.g., additive to the natural environment, user, object, etc.) or destructive (e.g., masking of the natural environment, user, object, etc.). In this way, augmented reality alters one's ongoing perception of a real-world environment, whereas virtual reality completely replaces the user's real-world environment with a simulated one. Immersive perceptual information is sometimes combined with supplemental information, such as scores overlaid on a live video feed of a sporting event.
“Mixed reality” (MR), as used herein, denotes the merging of real and virtual worlds to produce new environments and visualizations. Physical and digital objects co-exist and interact in real time. Users can interact with both the real world and the virtual environment. MR takes place not only in the physical world or the virtual world, but is a hybrid of reality and virtual reality, encompassing both augmented reality and augmented virtuality via immersive technology.
The environments can be compared and contrasted to further differentiate them from one another. In MR and AR, virtual objects behave based on the user's perspective in the real world. In VR, virtual objects change their position and size based on the user's perspective in the virtual world. In MR and VR, perfectly rendered virtual objects cannot be distinguished from real objects. In AR, virtual objects can be identified based on their nature and behavior, such as floating text that follows a user. In AR and MR, a combination of computer-generated images and real-life objects is used, whereas in VR the entire environment is computer-generated. In VR, head-mounted displays (HMDs) are used. In AR and MR, HMDs can be used, as can holodisplays, heads-up displays, and other viewing technology.
While reference is made to one or more of the environments throughout the description, it is to be understood that each environment is interchangeable according to principles discussed herein.
A “head-mounted display” (HMD), as used herein, includes a display device worn on the head. It may be part of a helmet, a set of goggles, or another head-supported device. The HMD includes a small display optic in front of one or each eye. Variations of an HMD are also anticipated, such as an optical HMD, which includes a wearable display that can reflect projected images while allowing the user to see through it.
An example VR system for providing real time user interaction according to principles discussed herein includes at least one RGB camera to capture image data of a user's body in a physical environment. A processor receives the image data to formulate a 3D model of the user's body and then maps the 3D model inside a VR environment. An HMD displays the 3D model and movements of the 3D model within the VR environment. In this manner, a host of interactive opportunities awaits the user, embodied as the 3D model, within virtually limitless worlds.
An example method of providing real time user interaction in an AR environment according to principles discussed herein includes the use of 3D tracking to capture image data of the user's body in a physical environment. The method further includes formulating a 3D model of a user's body with the image data, mapping the 3D model inside an AR environment that is visible to a user wearing an HMD, and replicating at least one body movement of the user using the 3D model within the AR environment.
An example non-transitory computer readable medium according to principles discussed herein includes usable program code to, when executed by a processor, use 3D tracking to capture image data of the user's body in a physical environment, formulate a 3D model of a user's body with the image data, map the 3D model inside a VR environment, and replicate at least one body movement of the user using the 3D model within the VR environment. The 3D model and VR environment are visible to a user wearing an HMD.
Turning to FIG. 1, an example VR system is shown in which an RGB camera 104 captures image data of a user 101 within a physical environment.
As shown in FIG. 1, the RGB camera 104 is positioned at an angle relative to the user.
An RGB camera may be placed directly in front of the user but, unlike common image acquisition tracking systems, is not restricted to frontal placement and may instead be angled to capture different sides of the user's body. For example, an RGB camera may be placed in an upper corner of a square room because the image capture being relied on does not require frontal placement. In this way, the user has much more freedom than when using traditional forms of capture. For example, a user's movement of turning around is easily captured with an RGB camera having a range of placements. Also, users standing one in front of the other are not hidden from a corner-placed camera as they would be from the frontally placed cameras that other techniques use.
Other placements of an RGB camera besides corner placements are anticipated. For example, an RGB camera may be strategically positioned directly in front of the user, as with traditional capture, or on the ceiling facing down at the user's body. The RGB camera may also be strategically positioned at a side of the room to provide a side view of the user. Other placements are also anticipated.
Note that additional RGB cameras may also be used to create the 3D models. Furthermore, other types of tracking cameras and devices may be used to create the 3D models. For example, depth cameras, 3D cameras, movement sensors, and other devices that are used in motion capture may be used.
Note that inanimate objects, such as balls and other objects, can also be made into 3D models. The virtual environment can be the same as or different from the original environment. The tracked image data can be manipulated and rendered inside the VR environment to simulate movement that is representative of, or nearly identical to, that of the user 101 or whatever is being tracked.
Users can further benefit from HMDs that enable users to see themselves within the VR environment. Turning to FIG. 2, users 202 and 203 are shown, each represented by a 3D model 232, 223 within a shared VR environment.
A VR backpack or other portable VR device may contain, for example, a processor that processes the image data. A battery may also be contained in the VR backpack to power the VR system for maximum portability. The user may thus carry the entire system on his or her body and freely move from place to place, including from a location served by one RGB camera to another location having a different RGB camera.
As an alternative, the users 202 and 203 may see each other as 3D models 232 and 223 in a VR environment, but not in the same VR environment as each other.
The concepts presented herein can be extended to a virtual educational setting 320 as shown in FIG. 3.
The concepts presented herein can be extended to several applications in VR, such as virtual meeting rooms. For example, the concepts can be applied by having users tracked and mapped into a virtual meeting room with many other virtual attendees. The room can be a real or virtual meeting room mapped into a VR environment that joins all tracked users in the same real or virtual place. Each user can be in a different physical place while attending the meeting but still be virtually present with all other attendees. Such a virtual meeting room is illustrated in the accompanying drawings.
An MR environment is likewise illustrated in the accompanying drawings.
An endless array of possibilities awaits in the creation of a virtual experience. For example, one user may appear as a likeness of themselves while another user appears as a 3D model or as a different type of entity altogether (e.g., an animal, an inanimate object, etc.). Environments may be the same or different for each user, including mixed reality elements that are apparent only to certain users and not others. This allows imaginative play and creative ways of leading the user through a virtual environment.
Turning to FIG. 4, an RGB image 438 of a user and a corresponding representation 439 are shown.
The RGB image 438 may be segmented, or otherwise partitioned, to define common body parts (e.g., arms, legs, clothing area, etc.) 446, or other areas as desired, as represented in FIG. 4.
Different techniques are anticipated for modeling. Object detection and object classification techniques may be used. Object classification is used to correctly label a dominant object in an image. Object detection is used to provide a correct label and location of objects in an image. Object detection may incorporate region of interest (RoI) pooling to match sample points with body parts. A region of interest includes a number of samples within a data set that are identified for a particular purpose, such as a data set that may represent an area of the body (e.g., arm, leg, face, hand, etc.). RoI pooling is an operation widely used in object detection tasks using convolutional neural networks, as sketched below.
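As a non-authoritative sketch, RoI pooling of the kind described above might look as follows using PyTorch and torchvision, an assumed stack not named by this disclosure; the feature-map shape and the candidate boxes around body parts are hypothetical:

```python
import torch
from torchvision.ops import roi_pool

# CNN feature map for one captured frame (hypothetical shape).
features = torch.randn(1, 256, 60, 80)

# Hypothetical candidate regions (x1, y1, x2, y2) in feature-map coordinates,
# e.g., boxes proposed around an arm and a leg.
boxes = [torch.tensor([[10.0, 5.0, 30.0, 40.0],
                       [35.0, 20.0, 55.0, 58.0]])]

# Pool each region to a fixed 7x7 descriptor so downstream layers can match
# sample points with body parts regardless of region size.
pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```

The fixed-size output is what allows regions of arbitrary size to feed a common classification head.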
Image annotation is performed to classify information that is of relevance to an image. For example, segmentation is a form of image annotation that partitions or labels body parts so that an association or correspondence can be made between image pixels and parts of the body. The partitions or labels are used for surface coordinate prediction on 3D models.
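A minimal sketch of such segmentation-style annotation follows; the part labels and regions are hypothetical, and nothing here is prescribed by the disclosure:

```python
# Per-pixel label map associating image pixels with body parts
# (labels and regions are hypothetical).
import numpy as np

PART_LABELS = {0: "background", 1: "torso", 2: "left arm", 3: "right arm",
               4: "left leg", 5: "right leg", 6: "head"}

labels = np.zeros((480, 640), dtype=np.uint8)  # one label per pixel
labels[100:200, 250:390] = 1                   # hypothetical torso region
labels[100:180, 200:250] = 2                   # hypothetical left-arm region

# A binary mask per part supports later surface-coordinate prediction.
torso_mask = labels == 1
print(f"{PART_LABELS[1]} pixels: {int(torso_mask.sum())}")
```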
3D pose estimation techniques may be used to determine the transformation of an object in a 2D image into the 3D object. For example, there exist environments where it is difficult to extract corners or edges from an image. To circumvent these issues, the object is dealt with as a whole through the use of free-form contours. Dense human pose estimation techniques can also be used.
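For context, the classical correspondence-based formulation of this 2D-to-3D transformation can be sketched with OpenCV's solvePnP; the free-form-contour and dense-pose methods mentioned above avoid the explicit corner correspondences this example relies on. All coordinates and camera intrinsics here are hypothetical:

```python
# Classical pose estimation: recover the rotation and translation that map
# known 3D model points onto their observed 2D projections.
import numpy as np
import cv2

object_points = np.array([[0, 0, 0], [0.5, 0, 0], [0.5, 0.5, 0], [0, 0.5, 0]],
                         dtype=np.float32)   # 3D model points (meters)
image_points = np.array([[320, 240], [420, 238], [424, 342], [318, 344]],
                        dtype=np.float32)    # their 2D projections (pixels)
camera_matrix = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]],
                         dtype=np.float32)   # assumed intrinsics
dist_coeffs = np.zeros(4)                    # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
print("rotation (Rodrigues):", rvec.ravel(), "translation:", tvec.ravel())
```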
Regression techniques can be performed to predict surface coordinates on the 3D model, with assignments made so that each pixel within a body part is mapped to a corresponding location on the 3D model.
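One way such a regression might be sketched, assuming PyTorch and hypothetical shapes: a 1x1 convolution predicts a (u, v) surface coordinate for every pixel, which is then read out under a body-part mask:

```python
# Per-pixel (u, v) surface-coordinate regression (hypothetical shapes).
import torch
import torch.nn as nn

features = torch.randn(1, 256, 60, 80)        # CNN features for one frame
uv_head = nn.Conv2d(256, 2, kernel_size=1)    # regresses (u, v) per pixel
uv = torch.sigmoid(uv_head(features))         # normalize coordinates to [0, 1]

part_mask = torch.zeros(1, 1, 60, 80, dtype=torch.bool)
part_mask[..., 20:40, 30:50] = True           # hypothetical arm region

# Gather the (u, v) predictions for every pixel inside the arm mask.
arm_uv = uv.masked_select(part_mask.expand_as(uv)).reshape(2, -1)
print(arm_uv.shape)                           # row 0: u values, row 1: v values
```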
A set of images may be used for correspondence. In one example, a set of six rendered views of the same face but with each face directed at a different angle (e.g., front view, side view, angled view, etc.) may be provided and the body part of the user can be matched with one of the images to help correspond a sample point within the body part with the appropriate location on the face. Each view includes surface coordinates, and the surface coordinates can be used to localize the collected 2D points on the 3D model. This technique along with others can be used so that each pixel of an image can be associated with a unique surface coordinate of a 3D model.
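A toy sketch of this view-matching idea follows, with hypothetical data and a simple normalized-correlation score standing in for whatever matcher an implementation would actually use:

```python
# Match a captured face crop against six rendered reference views; the best
# match selects which view's surface coordinates to use (data is synthetic).
import numpy as np

rng = np.random.default_rng(0)
views = rng.random((6, 64, 64))                 # six rendered views of a face
crop = views[3] + 0.01 * rng.random((64, 64))   # captured crop, closest to view 3

def similarity(a, b):
    """Normalized correlation between two image patches."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

best_view = int(np.argmax([similarity(crop, v) for v in views]))
print("matched view:", best_view)  # that view's surface coordinates are used
```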
Manipulated surfaces or textures using data from the original image or other information can be drawn over the body parts 547, as shown in FIG. 5, to create an avatar or other computerized image. For example, body parts can be manipulated so that they appear different from how they appeared in an original image, changing the body appearance.
Corresponding body parts 446 to a 3D model 506 in real time provides an approach for mapping all human pixels of two-dimensional (2D) RGB images to 3D surface-based models of bodies. Each model may be made of surface coordinates, where each pixel can be associated with a unique surface coordinate.
Textures can be applied to one or more 3D models. A texture may include body parts from the original image or manipulated model surfaces as discussed above. Alternatively, textures may include information from other images and other manipulated model surfaces. Information that is not related to the original image may be applied. Textures allow creative application modeling to enhance the virtual experience.
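As an illustrative sketch (hypothetical data, NumPy assumed), applying a texture through surface coordinates reduces to indexing a texture image with each body pixel's (u, v) value:

```python
# Paint a texture onto body pixels via their per-pixel surface coordinates.
import numpy as np

texture = np.zeros((256, 256, 3), dtype=np.uint8)
texture[:, :128] = (200, 60, 60)                  # hypothetical two-tone texture
texture[:, 128:] = (60, 60, 200)

h, w = 480, 640
uv = np.random.default_rng(1).random((h, w, 2))   # per-pixel (u, v) coordinates
body_mask = np.zeros((h, w), dtype=bool)
body_mask[100:380, 220:420] = True                # hypothetical body region

rows = (uv[..., 1] * 255).astype(int)             # v -> texture row
cols = (uv[..., 0] * 255).astype(int)             # u -> texture column
rendered = np.zeros((h, w, 3), dtype=np.uint8)
rendered[body_mask] = texture[rows[body_mask], cols[body_mask]]
```

The texture image itself may come from the original capture, a manipulated surface, or entirely unrelated artwork, as described above.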
A surface can be broken into many body parts that will be correlated to the body areas found in the original image.
The process can be implemented automatically and in real time to provide a dynamic experience for the user. For example, operations may include handling multiple frames per second on a single GPU while including tens or even hundreds of humans simultaneously.
An HMD 759 is included in the VR system 750 to provide visual information to the user. The information can include a 3D model of the user that has movement associated with movement of the user. The movement of the 3D model may be simultaneous or nearly simultaneous with the movement of the user. Alternatively, movement of the user may be displayed as movements performed by other figures or objects on the HMD. Moreover, the user's movements may be related to more than one figure or object on the display at the same time or at other times throughout a VR experience. Visual information may be available through various displays, such as through a display screen, glasses, wrist device (e.g., wrist watch, bracelet, wristband, etc.), headset, hologram, or other type of visual system. The visual display 759 may be worn by the user (e.g., on head, shoulders, arms) or be independently displayed to the user and not physically supported by the user.
Visual information may include an interaction of the 3D model with one or more virtual objects, where information pertaining to the interaction may be received from an application 760. The example VR system 750 shown includes application 760, which may be a device or software application and which provides a game or other type of program facilitating the user's interaction and 3D model movement within the VR system 750. The application 760 makes use of the various information provided by the RGB camera as well as other information that may be available, such as audio information, information from other cameras or other tracking devices, or information made available by interfacing with other devices or software applications.
Turning to FIG. 8, a flow diagram 860 illustrates an example process of incorporating a mesh model or 3D model into the VR system 750 and will be discussed as it relates to elements of the VR system in FIG. 7.
At block 866, the computing system 754 maps the 3D model to a virtual environment like the ones described in connection with the previous figures.
By obtaining additional images through multiple image acquisition (e.g., video recording, etc.), the computing system 754 is able to show shifts in position of the 3D model that replicate movement of the 3D model corresponding to movement of the user. This is represented by block 868. At block 869, the movement of the 3D model is displayed to the user. While reference is made to the computing system 754, note that the application 760 may be made to perform all or some of the various acts described in the flow diagram 860.
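To summarize the flow diagram in code form, the following is a hedged outline with stub classes; every name is a hypothetical stand-in for acts the disclosure attributes to the computing system 754 or application 760, not a disclosed implementation:

```python
# Outline of flow diagram 860: capture, formulate, map (866), replicate
# movement (868), and display (869). All classes and helpers are stubs.
class Camera:
    def capture_frame(self):
        """Capture image data of the user's body (hypothetical stub)."""
        return "rgb-frame"

class HMD:
    active_frames = 3  # stand-in for "while the HMD is in use"

    def display(self, scene):
        """Block 869: display the 3D model's movement to the user."""
        print("displaying:", scene)

def formulate_model(frame):
    """Formulate a 3D surface model of the user's body from image data."""
    return {"surface": frame, "pose": "initial"}

def run_vr_session(camera, hmd):
    frame = camera.capture_frame()
    model = formulate_model(frame)
    scene = {"environment": "virtual room", "model": model}  # block 866
    for _ in range(hmd.active_frames):
        frame = camera.capture_frame()      # additional images (e.g., video)
        model["pose"] = f"tracked:{frame}"  # block 868: replicate movement
        hmd.display(scene)                  # block 869

run_vr_session(Camera(), HMD())
```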