At least one of the present embodiments generally relates to augmented reality and more particularly to the anchors used for positioning in a virtual environment.
Augmented reality (AR) is a concept and a set of technologies for merging real and virtual elements to produce visualizations where physical and digital objects co-exist and interact in real time. AR visualizations require a means to see augmented virtual elements as part of the physical view. This can be implemented using an augmented reality terminal (AR terminal) equipped with a camera and a display, which captures video from the user’s environment and combines this captured information with virtual elements on a display. Examples of such devices are smartphones, tablets or head-mounted displays. 3D models and animations are the most obvious virtual elements to be visualized in AR. However, AR objects can more generally be any digital information for which spatiality (3D position and orientation in space) gives added value, for example pictures, videos, graphics, text, and audio.
AR visualizations must be seen correctly from different viewpoints, so that when a user changes his/her viewpoint, virtual elements stay or act as if they were part of the physical scene. This requires tracking technologies for deriving 3D properties of the environment to produce AR content and, when viewing the content, for tracking the position of the AR terminal with respect to the environment. The AR terminal’s position can be tracked, for example, by tracking known objects or visual features in the AR terminal’s video stream and/or by using one or more sensors. Before AR objects can be augmented into physical reality, their positions must be defined with respect to the physical environment.
A particular challenge of augmented reality arises when multiple users access the same AR scene and can thus interact together through this virtual environment. Precise and reliable positioning of AR terminals is a critical aspect of an AR system, since it is a prerequisite for a convincing AR experience.
In at least one embodiment, in an augmented reality system, helper data is associated with augmented reality anchors to describe the surroundings of the anchor in the real environment. This makes it possible to verify that positional tracking is correct, in other words, that an augmented reality terminal is localized at the right place in an augmented reality scene. This helper data may be shown on request. Typical examples of helper data are a cropped 2D image or a 3D mesh.
A first aspect of at least one embodiment is directed to a method for creating an anchor for an augmented reality scene, comprising displaying feature points detected while displaying an augmented reality scene, obtaining a selection of at least one feature point, capturing helper data, and creating a new anchor and associating with it the parameters of the selected at least one feature point and the captured helper data.
A second aspect of at least one embodiment is directed to a method for displaying an augmented reality scene on an augmented reality terminal, comprising, when the display of helper data is activated and an augmented reality anchor is detected, obtaining helper data associated with the detected augmented reality anchor and displaying a graphical representation of the helper data.
A third aspect of at least one embodiment is directed to a method for verifying an augmented reality anchor in an augmented reality scene on an augmented reality terminal, the method comprising determining an augmented reality anchor corresponding to at least one feature point detected while displaying an augmented reality scene, obtaining helper data associated with the detected augmented reality anchor, obtaining captured data representative of a real-world scene, comparing the helper data to the captured data and responsively triggering a recovery.
A fourth aspect of at least one embodiment is directed to an apparatus for creating an anchor for an augmented reality scene, comprising a processor configured to display feature points detected while displaying an augmented reality scene, obtain a selection of at least one feature point, capture helper data, and create a new anchor and associate with it the parameters of the selected at least one feature point and the captured helper data.
A fifth aspect of at least one embodiment is directed to an apparatus for displaying an augmented reality scene on an augmented reality terminal, comprising a processor configured to, when the display of helper data is activated and an augmented reality anchor is detected, obtain helper data associated with the detected augmented reality anchor and display a graphical representation of the helper data.
A sixth aspect of at least one embodiment is directed to an apparatus for verifying an augmented reality anchor in an augmented reality scene on an augmented reality terminal, the apparatus comprising a processor configured to determine an augmented reality anchor corresponding to at least one feature point detected while displaying an augmented reality scene, obtain helper data associated with the detected augmented reality anchor, obtain captured data representative of a real-world scene, compare the helper data to the captured data and responsively trigger a recovery.
A seventh aspect of at least one embodiment is directed to an augmented reality system comprising an augmented reality scene, an augmented reality controller and an augmented reality terminal, wherein the augmented reality scene comprises an augmented reality anchor associated with parameters of a feature point of a representation of the augmented reality scene and with helper data representative of the surroundings of the augmented reality anchor.
According to variants of these seven embodiments, the helper data is based on a picture captured when creating the anchor, or is a cropped version of the picture captured when creating the anchor or is based on a three-dimensional mesh captured when creating the anchor.
According to an eighth aspect of at least one embodiment, a computer program comprises program code instructions executable for implementing at least the steps of a method according to one of the first three aspects when executed by a processor.
According to a ninth aspect of at least one embodiment, a non-transitory computer readable medium comprises program code instructions executable for implementing at least the steps of a method according to one of the first three aspects when executed by a processor.
In a collaborative experience using the system of
Determining the position and orientation of a real object in space is known as positional tracking and may be done with the help of sensors. Sensors record the signal from the real object when it moves or is moved, and the corresponding information is analyzed with regard to the overall real environment to determine the position. Different mechanisms can be used for the positional tracking of an AR terminal, including wireless tracking, optical tracking with or without markers, inertial tracking, sensor fusion, acoustic tracking, etc.
In consumer environments, optical tracking is one of the techniques conventionally used for positional tracking. Indeed, typical augmented-reality-capable devices such as smartphones, tablets or head-mounted displays comprise a camera able to provide images of the scene facing the device. Some AR systems use visible markers like QR codes, physically printed and positioned at a known location both in the real scene and in the AR scene, thus enabling a correspondence between virtual and real worlds to be established when these QR codes are detected.
Less intrusive markerless AR systems may use a two-step approach where the AR scene is first modeled to enable positioning in a second step. The modeling may be done, for example, through a capture of a real environment. Feature points are detected from the captured data corresponding to the real environment. A feature point is a trackable 3D point, so it is mandatory that it can be differentiated from its closest points in the current image. With this requirement, it is possible to match it uniquely with a corresponding point in a video sequence corresponding to the captured environment. Therefore, the neighborhood of a feature should be sufficiently different from the neighborhoods obtained after a small displacement. Usually, it is a high-frequency point like a corner. Typical examples of such points are a corner of a table, the junction between the floor and a wall, a knob on a piece of furniture, the border of a frame on a wall, etc. An AR scene may also be modeled instead of captured; in this case, anchors are associated with selected distinctive points in the virtual environment. Then, when using such an AR system, the captured image from an AR terminal is continuously analyzed to recognize the previously determined distinctive points and thus make the correspondence with their position in the virtual environment, thereby allowing the position of the AR terminal to be determined.
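As an illustration only (not part of the described system), the following minimal Python sketch shows how such high-frequency, corner-like feature points could be detected in a camera frame with OpenCV; the use of the ORB detector is an assumption made for this sketch.

```python
# Minimal sketch: detect corner-like feature points in one camera frame.
# ORB is used here as a generic, freely available detector; the actual
# detector of a given AR system may differ.
import cv2

def detect_feature_points(frame_bgr, max_points=500):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_points)
    # Keypoints concentrate on high-frequency details (corners, junctions,
    # borders), which makes them re-identifiable after a small displacement.
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```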
In addition, some AR systems combine the 2D feature points of the captured image with depth information, obtained for example through a time-of-flight sensor, or with motion information, obtained for example from accelerometers, gyroscopes or inertial measurement units based on micromechanical systems.
According to the system described in
In order to minimize the positional tracking computation workload, some AR systems use a subset of selected feature points named anchors. While a typical virtual environment may comprise hundreds or thousands of feature points, anchors are generally predetermined within the AR scene, for example manually selected when building the AR scene. A typical AR scene may comprise around half a dozen anchors, therefore minimizing the computation resources required for positional tracking. An anchor is a virtual object defined by a pose (position and rotation) in a world frame. An anchor is associated with a set of feature points that define a unique signature; the anchor position is consequently very stable and robust. When an anchor has been placed in a zone of an AR scene, the visualization of said zone, when captured by the camera of an AR terminal, leads to an update of the localization in order to correct any drift. In addition, virtual objects of an AR scene are generally attached to anchors to secure their spatial position in the world frame.
Anchors may be defined using ray casting. Feature points are displayed as virtual 3D particles. The user should select a point belonging to a dense set of feature points, as this gives a stronger signature to the area. The pose of the feature point hit by the ray gives the pose of the anchor.
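A minimal sketch of this ray-casting step is given below, assuming the tracking framework already provides the 3D positions of the feature points and the ray cast from the user's selection; all names and the distance threshold are illustrative.

```python
# Minimal sketch: pick the feature point hit by a ray cast from the camera
# through the user's selection. Inputs are assumed to come from the tracking
# framework; the threshold is an illustrative value.
import numpy as np

def pick_anchor_point(ray_origin, ray_direction, feature_points_3d, max_dist=0.05):
    d = ray_direction / np.linalg.norm(ray_direction)
    rel = feature_points_3d - ray_origin            # vectors from ray origin to points
    t = rel @ d                                     # projection of each point onto the ray
    closest_on_ray = ray_origin + np.outer(t, d)    # closest point of the ray, per feature
    dist = np.linalg.norm(feature_points_3d - closest_on_ray, axis=1)
    dist[t < 0] = np.inf                            # ignore points behind the camera
    best = int(np.argmin(dist))
    return best if dist[best] <= max_dist else None # index of the hit feature point, if any
```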
The elements 250A and 250B respectively shown in
The difference between users A and B is that Alice is well positioned within the AR scene. More exactly, the position of the AR terminal 100A is correct while the position of the AR terminal 100B is incorrect. Indeed, the corner 230 corresponds to an anchor defined in the AR scene by Alice and used to position the virtual object 270. Bob is trying to visualize the anchor set by Alice. However, since the corner 230 and the corner 240 are very similar in shape and texture, they comprise very similar feature points and the positional tracking has difficulties differentiating them. Thus, Bob is believed to be at the same place as Alice, i.e. at the corner of the table where the mug is located. However, the animation shown to Bob is not as expected by the AR scene designer, as it should only be shown in proximity to the corner 230.
Although this is a toy example chosen so that the associated figures remain easy to draw, it illustrates the issue of erroneous positioning when using anchors. The situation can be much more complicated in a realistic case where the AR scene comprises dozens of virtual objects and the real environment comprises many physical elements such as furniture.
Embodiments described hereafter have been designed with the foregoing in mind.
In at least one embodiment, it is proposed to associate helper data with augmented reality anchors. Helper data may describe the surroundings of the anchor in the real environment and may allow verification that positional tracking is correct, in other words, that an AR terminal is localized at the right place in an AR scene. This helper data may be shown on request of the user, the AR terminal or the AR controller in order to perform a verification. The disclosure uses the example of a 2D image as helper data, but other types of helper data, such as a 3D mesh or a map showing the position of the anchor within the environment, can be used according to the same principles.
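By way of illustration, a possible (hypothetical) data layout associating helper data with an anchor could look as follows; the field names are assumptions of this sketch, not the actual format of the described system.

```python
# Minimal sketch of an anchor record carrying helper data (illustrative fields).
from dataclasses import dataclass
from typing import Any, Optional
import numpy as np

@dataclass
class Anchor:
    anchor_id: str
    pose: np.ndarray                           # 4x4 pose (rotation + translation) in the world frame
    feature_signature: np.ndarray              # descriptors of the associated feature points
    helper_image: Optional[np.ndarray] = None  # cropped 2D picture of the anchor's surroundings
    helper_mesh: Optional[Any] = None          # alternatively, a (textured) 3D mesh
```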
In at least one embodiment, the verification is done by the user. This requires that the helper data be understandable by the user. In at least one embodiment, the helper data is an image of the real environment captured when creating the anchor. Indeed, such data is easy to capture and simple to understand for the user: he/she may simply compare the helper image visually with the real environment and decide whether the positional tracking is correct. In another embodiment, the verification is done automatically by the AR system, as described further below with regards to
Such a verification process 300 can be requested either by the user himself/herself, by the AR terminal or by the AR controller. A first reason for requesting a verification is that something is wrong within the AR scene or that virtual objects do not mix well with the real environment. A second reason is that the position of the virtual objects to append to the real environment does not comply with the captured environment. A third reason is that the position of an AR terminal is incoherent, for example when multiple AR terminals are detected to have the same position in the real environment, or when an AR terminal is detected as being inside an object (i.e. behind the surface of the mesh).
In at least one embodiment, the helper data is a 2D image of the surroundings of the anchor 500. This 2D image was captured at the creation of the anchor thanks to the built-in camera of the AR terminal. Therefore, the helper image 501 comprises a representation 510 of the table 210 and a representation 520 of the mug 220. These representations allow an easy verification by the user: Alice can simply check that the helper image 501 corresponds to the capture of the real environment (as shown in
In another example embodiment, not illustrated, the helper data is a 3D mesh of the surroundings of the anchor. Such a mesh may be reconstructed by using depth information captured by a depth sensor integrated into the AR terminal or by using other 3D reconstruction techniques, for example based on Structure From Motion (SFM) or Multi-View Stereo (MVS). In addition, the mesh may also be textured using information from the 2D image captured by the camera and thus represent a virtual 3D view of the anchor's surroundings. In this case, the helper data is a 3D textured mesh.
While
In such situation, Bob could simply move around, hoping that the AR system will correctly detect his position. In some situations, he may be obliged to manually force a reset of his localization or even to relaunch the AR application
While
In at least one embodiment, the verification is done automatically by the AR system by computing a distance between the real environment and the helper image. When the distance is smaller than a threshold, the AR system determines that the position is correct and does not interrupt the user experience. The comparison may be done for example at the 2D image level using conventional image processing techniques and algorithms. The comparison may also be done in the 3D space when depth information is available, and the helper data contains such information.
When the helper data is a 2D image, the distance may be computed using well-known algorithms. One embodiment uses a feature detection algorithm like those provided by OpenCV (for example SURF). Features are computed for the image provided with the anchor and for the current frame. Detection is followed by a matching process. To check the success of the operation, a distance criterion between descriptors is applied to the matched elements to filter the result. In an implementation, when the anchor is detected in the field of view of the AR device, the process described above is launched. Precision and recall parameters are evaluated and, according to their values, the presence of the anchor can be validated in a first step.
In a second step, within the set of matched points, the point closest to the center is found, its correspondence in the current frame is obtained, and this point is used as the result of a ray casting. The distance between the feature point hit by the ray and the anchor is then computed. Rotation is not evaluated in this case, and a deviation of a few centimeters is accepted.
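A minimal sketch of the first step (presence validation) is given below. It uses OpenCV with ORB instead of SURF, since SURF is only available in the non-free contrib build; the ratio test stands in for the distance criterion on descriptors, and all thresholds are illustrative assumptions.

```python
# Minimal sketch: validate the presence of the anchor by matching the helper
# image against the current frame (ORB used instead of SURF; thresholds are
# illustrative).
import cv2

def helper_image_matches_frame(helper_img, frame, min_good_matches=25, ratio=0.75):
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(helper_img, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), None)
    if des1 is None or des2 is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        # Keep a match only when it is clearly better than the second-best
        # candidate (ratio test), i.e. a distance criterion on descriptors.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    # The matched keypoint closest to the helper-image center (kp2[m.trainIdx].pt)
    # could then serve as the target of the ray casting described above.
    return len(good) >= min_good_matches
```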
When the helper data is a 3D textured mesh, computing a distance directly in 3D space is feasible but would require heavy computations. A more efficient way is to go through an intermediate step of 2D rendering. As the pose of the mesh is known (the same as the anchor's), the mesh can be rendered from the point of view of the user. The result is a 2D picture, and the process described above can be applied again.
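The following sketch illustrates this intermediate rendering step using pyrender and trimesh purely as example libraries, not those of the described system; intrinsics, poses and image size are assumed inputs, and the camera pose is expected in the OpenGL convention used by pyrender.

```python
# Minimal sketch: render the helper mesh, placed at the anchor's pose, from the
# viewer's current camera pose, to obtain a 2D picture comparable with the frame.
import pyrender
import trimesh

def render_helper_mesh(mesh_path, anchor_pose, camera_pose, fx, fy, cx, cy, w=640, h=480):
    tm = trimesh.load(mesh_path, force='mesh')                   # textured helper mesh
    scene = pyrender.Scene()
    scene.add(pyrender.Mesh.from_trimesh(tm), pose=anchor_pose)  # same pose as the anchor
    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=cx, cy=cy)
    scene.add(camera, pose=camera_pose)                          # viewer's point of view
    scene.add(pyrender.DirectionalLight(intensity=3.0), pose=camera_pose)
    color, _depth = pyrender.OffscreenRenderer(w, h).render(scene)
    return color                                                 # 2D picture of the helper mesh
```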
Other techniques for implementing the automated verification process may be used. For example, deep learning techniques could be used for that purpose, thus operating directly on the 2D images without requiring a feature extraction step.
In at least one embodiment, when the chosen recovery is to relocate the AR anchor, the other AR anchors are also relocated by computing the transform that moves the original anchor position to its new position and applying this transform to the other AR anchors. Since such a modification may have a huge impact, particularly in a multi-user scenario, it is preferred that such an operation be confirmed by the user through a validation.
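A minimal numpy sketch of this relocation, assuming anchor poses are stored as 4x4 homogeneous matrices in the world frame (an assumption of this illustration):

```python
# Minimal sketch: compute the correction that maps the original anchor pose to
# its corrected pose and apply it to every other anchor of the scene.
import numpy as np

def relocate_anchors(old_pose, new_pose, other_anchor_poses):
    correction = new_pose @ np.linalg.inv(old_pose)   # rigid transform: old -> new
    return [correction @ pose for pose in other_anchor_poses]
```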
The processor 801 may be coupled to an input unit 802 configured to convey user interactions. Multiple types of inputs and modalities can be used for that purpose. A physical keypad or a touch-sensitive surface are typical examples of inputs adapted to this usage, although voice control could also be used. In addition, the input unit may also comprise a digital camera able to capture still pictures or video, which are essential for the AR experience.
The processor 801 may be coupled to a display unit 803 configured to output visual data to be displayed on a screen. Multiple types of displays can be used for that purpose such as a liquid crystal display (LCD) or organic light-emitting diode (OLED) display unit. The processor 801 may also be coupled to an audio unit 804 configured to render sound data to be converted into audio waves through an adapted transducer such as a loudspeaker for example.
The processor 801 may be coupled to a communication interface 805 configured to exchange data with external devices. The communication preferably uses a wireless communication standard to provide mobility of the AR terminal, such as LTE communications, Wi-Fi communications, and the like.
The processor 801 may be coupled to a localization unit 806 configured to localize the AR terminal within its environment. The localization unit may integrate a GPS chipset providing longitude and latitude position regarding the current location of the AR Terminal but also other motion sensors such as an accelerometer and/or an e-compass that provide localization services. It will be appreciated that the AR terminal may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 801 may access information from, and store data in, the memory 807, which may comprise multiple types of memory including random access memory (RAM), read-only memory (ROM), a hard disk, a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, or any other type of memory storage device. In other embodiments, the processor 801 may access information from, and store data in, memory that is not physically located on the AR terminal, such as on a server, a home computer or another device.
The processor 801 may receive power from the power source 210 and may be configured to distribute and/or control the power to the other components in the AR terminal 800. The power source 210 may be any suitable device for powering the AR terminal. As examples, the power source 210 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
While
The processor 801 may further be coupled to other peripherals or units not depicted in
As stated above, typical examples of AR terminal are smartphones, tablets, or see-through glasses. However, any device or composition of devices that provides similar functionalities can be used as AR terminal.
The processor 901 may be coupled to a communication interface 902 configured to exchange data with external devices. The communication preferably uses a wireless communication standard to provide mobility of the AR controllers, such as LTE communications, Wi-Fi communications, and the like.
The processor 901 may access information from, and store data in, the memory 903, which may comprise multiple types of memory including random access memory (RAM), read-only memory (ROM), a hard disk, a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, or any other type of memory storage device. In other embodiments, the processor 901 may access information from, and store data in, memory that is not physically located on the AR controller, such as on a server, a home computer or another device. The memory 903 may store the AR scene
The processor 901 may further be coupled to other peripherals or units not depicted in
It will be appreciated that the AR controller 110 may include any sub-combination of the elements described herein while remaining consistent with an embodiment.
Then, in step 1040, the 2D position of the anchor in the image is computed based on a pinhole camera model. To map a 3D point to the image plane, a camera projective model is used as follows:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \, [\, R \mid t \,] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$

where:
- (X, Y, Z) are the coordinates of the 3D point (here, the anchor position) expressed in the world frame,
- [R | t] is the extrinsic matrix describing the camera pose (rotation and translation) of the AR terminal,
- K is the intrinsic camera matrix, with focal lengths f_x, f_y and principal point (c_x, c_y),
- (u, v) are the resulting pixel coordinates in the image plane and s is an arbitrary scale factor.
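For illustration, the same projection can be written with OpenCV as follows; the camera pose (rvec, tvec) and the intrinsic matrix K are assumed to be provided by the tracking, and lens distortion is ignored in this sketch.

```python
# Minimal sketch: project the anchor's 3D world position to pixel coordinates
# with a pinhole model (distortion ignored; inputs assumed from the tracking).
import cv2
import numpy as np

def project_anchor(anchor_xyz, rvec, tvec, K):
    pts_2d, _ = cv2.projectPoints(np.asarray([anchor_xyz], dtype=np.float64),
                                  rvec, tvec, K, None)
    u, v = pts_2d[0, 0]
    return float(u), float(v)   # 2D position of the anchor in the image
```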
In at least one embodiment, the display of the helper data is enhanced by using the pose of the corresponding anchor when available. In this case, the helper data is positioned in the 3D space according to the anchor’s pose. When the helper data is a 3D element, such as a 3D mesh or a full 3D model, the orientation of this 3D element will be set so that it matches the pose of the anchor. When the helper data is a 2D image, the 2D rectangle corresponding to the 2D image is warped so that it is positioned in a plane defined by the anchor’s pose.
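A minimal sketch of this warping is given below, assuming a physical size for the helper rectangle, the anchor pose as a 4x4 matrix, and camera parameters from the tracking (all illustrative assumptions of this sketch):

```python
# Minimal sketch: project the four corners of a rectangle lying in the anchor's
# plane into the current view, then warp the helper image onto them.
import cv2
import numpy as np

def warp_helper_into_view(helper_img, anchor_pose, rvec, tvec, K, size_m=0.3, view_wh=(1280, 720)):
    h, w = helper_img.shape[:2]
    # Corners of the helper rectangle in the anchor's local XY plane (meters).
    local = np.array([[-1, -1, 0], [1, -1, 0], [1, 1, 0], [-1, 1, 0]], dtype=np.float64) * (size_m / 2)
    world = (anchor_pose[:3, :3] @ local.T).T + anchor_pose[:3, 3]   # into world coordinates
    corners_2d, _ = cv2.projectPoints(world, rvec, tvec, K, None)    # into the current image
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, corners_2d.reshape(4, 2).astype(np.float32))
    return cv2.warpPerspective(helper_img, H, view_wh)
```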
In at least one embodiment, the helper image is displayed according to the viewer’s orientation so that it is facing the camera.
In at least one embodiment, the helper image is displayed with semi-transparency (using an alpha channel) in order to see “through” the helper, making the comparison easier.
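For example (illustrative only), such an overlay could be obtained with a simple alpha blend, assuming the helper has already been warped to the size of the camera frame:

```python
# Minimal sketch: blend the (warped) helper image over the camera frame so the
# user can see "through" it; the alpha value is an illustrative choice.
import cv2

def overlay_helper(frame, warped_helper, alpha=0.4):
    # Both images are assumed to have the same size and type.
    return cv2.addWeighted(warped_helper, alpha, frame, 1.0 - alpha, 0)
```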
In at least one embodiment, an AR terminal also includes the functionalities of an AR controller and thus allows standalone operation of an AR scene, while still being compatible with embodiments described herein.
Some AR systems balance the computation workload by performing some of the computations in the AR controller, which is typically a computer or a server. This requires transmitting the information gathered from the AR terminal sensors to the AR controller.
In at least one embodiment, the pose of the AR terminal (the pose of the device when capturing the picture associated with the anchor) given by the server is also used. The user positions his/her camera as close as possible to the provided pose, but this only makes sense if there is no positioning error.
An indication concerning the position of the anchor can be provided to the user so that he/she moves in the right direction when no anchor is in his/her field of view.
In at least one embodiment, when multiple anchors are detected, all of them are displayed. In another embodiment, when multiple anchors are detected, only one of them is displayed. The selection of the anchor to be displayed may be done according to multiple criteria such as shortest distance, best matching viewing angle, etc.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory or optical media storage). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Priority application: 20305836.7, filed Jul. 2020, EP (regional).
International filing: PCT/EP2021/068640, filed 7/6/2021, WO.