At least one of the present embodiments generally relates to augmented reality and more particularly to the generation of a map representing the real environment and the association of this map with an augmented reality scene.
Augmented reality (AR) is a concept and a set of technologies for merging real and virtual elements to produce visualizations where physical and digital objects co-exist and interact in real time. AR visualizations require a means to see augmented virtual elements as a part of the physical view. This can be implemented using an augmented reality terminal (AR terminal) equipped with a camera and a display, which captures video from the user's environment and combines this captured information with virtual elements on a display. Examples of such devices include smartphones, tablets and head-mounted displays. 3D models and animations are the most obvious virtual elements to be visualized in AR. However, AR objects can more generally be any digital information for which spatiality (3D position and orientation in space) gives added value, for example pictures, videos, graphics, text, and audio. AR visualizations can be seen correctly from different viewpoints, so that when a user changes his/her viewpoint, virtual elements stay or act as if they were part of the physical scene. This requires capture and tracking technologies for deriving 3D properties of the environment: to produce AR content by scanning the real environment and, when viewing the content, to track the position of the AR terminal with respect to the environment. The position of AR objects is defined with respect to the physical environment so that AR objects can be augmented into physical reality. The AR terminal's position can be tracked, for example by tracking known objects in the AR terminal's video stream or using one or more sensors. Typically, a known simple object (printed QR code, picture frame) with a known position within the virtual environment is used when starting an AR session to synchronize the localization.
A challenge for users of an augmented reality system is to localize themselves in the augmented environment. Even though AR applications take place in a physically bounded location such as a room, when a user focuses on his/her AR terminal, his/her orientation and perception of the environment can be biased. For example, the visual attention of the user is so focused on the screen of the AR terminal that sometimes she/he does not know where she/he is in the room. This is obviously the case for handheld video pass-through devices such as phones and tablets, but it is also true with head-mounted optical see-through displays, because of their limited field of view. To locate themselves in the real world, users are forced to look up from their screen and look around, which is not very practical. Additionally, in the case of a multi-user application, a user does not necessarily know where the others are located.
Hence it would be useful to display a bird's-eye view of the environment (a kind of map) providing an overview of the entire environment and showing the location of other users of the augmented environment in real time. Such a solution is quite common in games and in VR applications, since these applications are based on a virtual environment that is manually modeled: it is easy to extract a perfect map from such data. It is less commonly used in AR, because AR applications are generally based on a scan of the real environment. This scan allows the virtual scene to be correctly positioned on top of the real environment. The 3D model of the room can be built from a set of photos taken to cover all the elements in the room, using 3D reconstruction methods based, for example, on Structure From Motion (SFM) or Multi-View Stereo (MVS) techniques. However, such reconstructed 3D models are often incomplete, noisy, and badly delimited.
Embodiments described hereafter have been designed with the foregoing in mind.
In at least one embodiment, in an augmented reality system, a map of the real environment is generated from a 3D textured mesh obtained through captured data representing the real environment. Some processing is done on the mesh to remove unnecessary elements and generate the map that comprises a set of 2D pictures: one picture for the ground level and one picture for the other elements of the scene.
The generated map may then be rendered on an AR terminal. The ground and the non-ground content may be rendered independently; then additional elements, such as other users of the AR scene or virtual objects, are localized and represented in the map in real time using a proxy. Finally, the rendering can be adapted to the user's movements and pose, as well as to the devices themselves.
A first aspect of at least one embodiment is directed to a method for creating a map representing an augmented reality scene comprising reconstructing a 3D textured mesh from captured data, splitting the reconstructed 3D textured mesh into a first 3D textured mesh in which data representing the ground of the scene have been removed, and a second 3D textured mesh representing the ground of the scene, and rendering a first picture from a top view of the first 3D textured mesh and a second picture from a top view at a detected ground level, wherein the map comprises the first and the second pictures.
A second aspect of at least one embodiment is directed to an apparatus for creating a map representing an augmented reality scene comprising a processor configured to reconstruct a 3D textured mesh from captured data, split the reconstructed 3D textured mesh into a first 3D textured mesh in which data representing the ground of the scene have been removed, and a second 3D textured mesh representing the ground of the scene, and render a first picture from a top view of the first 3D textured mesh and a second picture from a top view at a detected ground level, wherein the map comprises the first and the second pictures.
In variants of the first and second aspects: the second 3D textured mesh representing the ground of the scene is replaced by a mesh using a polygonal shape based on intersection lines between detected wall planes and a ground plane; the texture of the second 3D textured mesh is determined by an image inpainting process, regenerated using texture synthesis, or filled uniformly with a single color value representing an average color of the original second picture; the rendering is done using an orthographic camera whose parameters are based on the boundaries of the second 3D textured mesh and the pixel size of the first and second pictures; the orthographic camera is positioned at the center of the augmented reality scene, the center being determined based on the boundaries of the second 3D textured mesh; the 3D textured mesh obtained from captured data is cleaned to remove isolated elements; and the 3D textured mesh obtained from captured data is cleaned to remove elements outside the detected wall planes and the ground plane of the second 3D textured mesh.
A third aspect of at least one embodiment is directed to a method for displaying a map representing an augmented reality scene comprising obtaining data representative of an augmented reality scene, a map generated according to the first aspect, information representative of user localization, and data representative of a capture of the real environment, and displaying a representation of the data representative of a capture of the real environment, on which is overlaid a representation of the data representative of an augmented reality scene, on which is overlaid a representation of the map, on which is overlaid a representation of the user localization.
In variants of the third aspect, the size of the map is responsive to user input, and the second picture related to the ground is displayed with a level of transparency.
A fourth aspect of at least one embodiment is directed to an augmented reality system comprising an augmented reality scene, an augmented reality controller, and an augmented reality terminal, wherein a map generated according to the first aspect is associated with the augmented reality scene and displayed by the augmented reality terminal.
According to a fifth aspect of at least one embodiment, a computer program comprising program code instructions executable by a processor is presented, the computer program implementing at least the steps of a method according to the first aspect.
According to a sixth aspect of at least one embodiment, a computer program product which is stored on a non-transitory computer readable medium and comprises program code instructions executable by a processor is presented, the computer program product implementing at least the steps of a method according to the first aspect.
To enjoy the AR scene, users join other users in the shared augmented space using an AR terminal (100A, 100B). The AR terminal displays the virtual objects of the AR scene superimposed on the view of the real-world environment. To ensure consistent interactions with the AR scene, all AR terminals must be continuously localized in the same world frame coordinate system. The AR terminals and the AR controller exchange data through respective communication interfaces 111, 101 coupled to a communication network 150. This network is preferably wireless to provide mobility to the AR terminals.
From a functional point of view, AR terminals 100A, 100B may comprise sensing capabilities using sensors 102 such as cameras, inertial measurement units and various input controls (keys, touch screen, microphone), and display capabilities 104 to render the AR scene to the user. An AR application 103 controls the interactions between the user, the AR scene and the other users.
In a collaborative experience using the system of
Determining the position and orientation of a real object in space is known as positional tracking and may be done with the help of sensors. Sensors record the signal from the real object when it moves or is moved, and the corresponding information is analyzed with regard to the overall real environment to determine the pose. Different mechanisms can be used for the positional tracking of an AR terminal, including wireless tracking, vision-based tracking with or without markers, inertial tracking, sensor fusion, acoustic tracking, etc.
In consumer environments, optical tracking is one of the techniques conventionally used for positional tracking. Indeed, typical augmented-reality-capable devices such as smartphones, tablets or head-mounted displays comprise a camera able to provide images of the scene facing the device. Some AR systems use visible markers like QR codes, physically printed and positioned at a known location both in the real scene and in the AR scene, thus enabling a correspondence between the virtual and real worlds to be established when these QR codes are detected.
Less intrusive markerless AR systems may use a two-step approach in which the AR scene is first modeled to enable positioning in a second step. The modeling may be done, for example, through a capture of a real environment. Feature points are detected from the captured data corresponding to the real environment. A feature point is a trackable 3D point; it must therefore be distinguishable from its closest points in the current image. With this requirement, it is possible to match it uniquely with a corresponding point in a video sequence corresponding to the captured environment. Therefore, the neighborhood of a feature should be sufficiently different from the neighborhoods obtained after a small displacement. Usually, it is a high-frequency point such as a corner. Typical examples of such points are a corner of a table, the junction between the floor and a wall, a knob on a piece of furniture, the border of a frame on a wall, etc. An AR scene may also be modeled instead of captured. In this case, anchors are associated with selected distinctive points in the virtual environment. Then, when using such an AR system, the captured image from an AR terminal is continuously analyzed to recognize the previously determined distinctive points and thus establish the correspondence with their position in the virtual environment, thereby allowing the pose of the AR terminal to be determined.
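For illustration only, corner-like feature points such as those described above can be detected with a standard corner detector; the sketch below uses OpenCV's Shi-Tomasi detector, and the file name and parameter values are arbitrary assumptions rather than part of any described embodiment.

# Illustrative sketch: detecting corner-like feature points in a captured frame.
# The file name and parameter values are arbitrary assumptions for this example.
import cv2

frame = cv2.imread("captured_frame.png")            # hypothetical input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners: points whose neighborhood differs strongly from nearby patches,
# which makes them trackable across small displacements (table corners, frame borders, ...)
corners = cv2.goodFeaturesToTrack(gray, maxCorners=500, qualityLevel=0.01, minDistance=8)

for c in corners.reshape(-1, 2):
    cv2.circle(frame, (int(c[0]), int(c[1])), 3, (0, 255, 0), -1)
cv2.imwrite("features_overlay.png", frame)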
In addition, some AR systems combine the 2D feature points of the captured image with depth information, for example obtained through a time-of-flight sensor, or with motion information, for example obtained from accelerometers, gyroscopes or inertial measurement units based on micromechanical systems.
According to the system described in
In order to minimize the positional tracking computation workload, some AR systems use a subset of selected feature points named anchors. While a typical virtual environment may comprise hundreds or thousands of feature points, anchors are generally predetermined within the AR scene, for example manually selected when building the AR scene. A typical AR scene may comprise around half a dozen anchors, therefore minimizing the computation resources required for positional tracking. An anchor is a virtual object defined by a pose (position and rotation) in a world frame. An anchor is associated with a set of feature points that define a unique signature. When an anchor has been placed in a zone of an AR scene, the visualization of said zone when captured by the camera of an AR terminal will lead to an update of the localization. This is done in order to correct any drift. In addition, virtual objects of an AR scene are generally attached to anchors to secure their spatial position in the world frame.
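For illustration, an anchor could be represented by a data structure such as the following sketch; the field names are assumptions and do not correspond to any specific AR SDK.

# Illustrative data structure for an anchor: a pose in the world frame plus the
# feature-point signature it is associated with. Field names are assumptions.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Anchor:
    position: np.ndarray                 # 3D position in the world frame
    rotation: np.ndarray                 # quaternion (x, y, z, w) in the world frame
    feature_signature: list = field(default_factory=list)  # descriptors of the associated feature points
    attached_objects: list = field(default_factory=list)   # virtual objects secured to this anchor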
Anchors may be defined using raycasting. Feature points are displayed as virtual 3D particles. The user will make sure to select an object belonging to a dense set, which gives a stronger signature to the area. The pose of the feature point hit by the ray gives the pose of the anchor.
The processor 201 may be coupled to an input unit 202 configured to convey user interactions. Multiple types of inputs and modalities can be used for that purpose. A physical keypad or a touch sensitive surface are typical examples of inputs adapted to this usage, although voice control could also be used. In addition, the input unit may also comprise a digital camera able to capture still pictures or video, which are essential for the AR experience.
The processor 201 may be coupled to a display unit 203 configured to output visual data to be displayed on a screen. Multiple types of displays can be used for that purpose such as a liquid crystal display (LCD) or organic light-emitting diode (OLED) display unit. The processor 201 may also be coupled to an audio unit 204 configured to render sound data to be converted into audio waves through an adapted transducer such as a loudspeaker for example.
The processor 201 may be coupled to a communication interface 205 configured to exchange data with external devices. The communication preferably uses a wireless communication standard to provide mobility of the AR terminal, such as LTE communications, Wi-Fi communications, and the like.
The processor 201 may be coupled to a localization unit 206 configured to localize the AR terminal within its environment. The localization unit may integrate a GPS chipset providing longitude and latitude information regarding the current location of the AR terminal, as well as other motion sensors such as an accelerometer and/or an e-compass that provide localization services. It will be appreciated that the AR terminal may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 201 may access information from, and store data in, the memory 207, which may comprise multiple types of memory including random access memory (RAM), read-only memory (ROM), a hard disk, a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, or any other type of memory storage device. In other embodiments, the processor 201 may access information from, and store data in, memory that is not physically located on the AR terminal, such as on a server, a home computer, or another device.
The processor 201 may receive power from the power source 210 and may be configured to distribute and/or control the power to the other components in the AR terminal 200. The power source 210 may be any suitable device for powering the AR terminal. As examples, the power source 210 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
While the figure depicts the processor 201 and the other elements 202 to 208 as separate components, it will be appreciated that these elements may be integrated together in an electronic package or chip. It will be appreciated that the AR Terminal 200 may include any sub-combination of the elements described herein while remaining consistent with an embodiment.
The processor 201 may further be coupled to other peripherals or units not depicted in
As stated above, typical examples of AR terminal are smartphones, tablets, or see-through glasses. However, any device or composition of devices that provides similar functionalities can be used as AR terminal.
The processor 301 may be coupled to a communication interface 302 configured to exchange data with external devices. The communication preferably uses a wireless communication standard to provide mobility of the AR controllers, such as LTE communications, Wi-Fi communications, and the like.
The processor 301 may access information from, and store data in, the memory 303, which may comprise multiple types of memory including random access memory (RAM), read-only memory (ROM), a hard disk, a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, or any other type of memory storage device. In other embodiments, the processor 301 may access information from, and store data in, memory that is not physically located on the AR controller, such as on a server, a home computer, or another device. The memory 303 may store the AR scene, or the AR scene may be stored using an external memory.
The processor 301 may further be coupled to other peripherals or units not depicted in the figure which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals may include keyboard, display, various interfaces such as a universal serial bus (USB) port, a Bluetooth® module, and the like.
It will be appreciated that the AR controller 110 may include any sub-combination of the elements described herein while remaining consistent with an embodiment.
In step 430, the 3D textured mesh is split according to a planar analysis to determine horizontal and vertical planes. The ground plane is determined as being the horizontal plane at the lowest vertical position. The ceiling plane is determined as being the horizontal plane at the highest vertical position. The 3D mesh corresponding to the ceiling is removed. The wall planes are selected among the vertical planes that surround the scene. Ground corners are extracted as the intersection points between the wall planes and the ground plane, and the ground area is determined as being contained between four corners, in other words determining the scene boundaries. At that point, a second cleaning phase may be done by removing all the data located outside the bounded space; indeed, these elements would be behind the walls. The original 3D mesh data corresponding to the ground may also be removed. In addition, to remove the noisy reconstruction around the ground, a margin value is defined so as to remove the data slightly above and below the detected ground plane. A separate mesh for the ground is built using a geometrical shape (generally a quadrilateral) based on the determined corners. As a result, the data comprises two 3D textured meshes: a very simple one for the ground and one for the other elements of the scene, hereafter respectively named the ground mesh and the scene mesh. Examples of scene mesh at this stage of the process are illustrated in
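A condensed sketch of this splitting step is given below, assuming the reconstructed mesh is available as numpy vertex and face arrays and that candidate planes have already been fitted (for example by RANSAC); the thresholds, margin value and function name are illustrative assumptions.

# Illustrative sketch of step 430: classify fitted planes against the gravity
# direction, pick the ground/ceiling planes, and split the mesh accordingly.
# Plane fitting itself (e.g. RANSAC) is assumed to have been done beforehand.
import numpy as np

def split_scene(vertices, faces, planes, up=np.array([0.0, 1.0, 0.0]), margin=0.03):
    # vertices: (N, 3) array, faces: (M, 3) index array,
    # planes: list of (unit_normal, point_on_plane) tuples
    horizontal = [p for p in planes if abs(np.dot(p[0], up)) > 0.9]
    # ground = lowest horizontal plane, ceiling = highest one (heights along 'up')
    heights = [np.dot(pt, up) for _, pt in horizontal]
    ground_h, ceiling_h = min(heights), max(heights)

    v_h = vertices @ up                                  # height of every vertex
    keep_face = np.ones(len(faces), dtype=bool)
    for i, f in enumerate(faces):
        h = v_h[f].mean()
        if h > ceiling_h - margin:                       # drop ceiling data
            keep_face[i] = False
        if abs(h - ground_h) < margin:                   # drop noisy ground data
            keep_face[i] = False
    scene_faces = faces[keep_face]
    return scene_faces, ground_h                         # ground mesh rebuilt separately from corners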
In step 440, the meshes are rendered from a top view to generate 2D images. For that purpose, an orthographic camera is positioned over the scene mesh, pointing toward the ground, centered on the origin (the point with null coordinates) of the 3D textured meshes, and the scale factor of the camera is adjusted so that the rendering covers the whole scene boundaries. This rendering generates two 2D images, one for the scene and one for the ground, as illustrated in
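For illustration only, this top-view rendering can be approximated by an orthographic projection of the textured mesh vertices onto the ground plane; a real implementation would use the renderer of the AR engine or a 3D library. The resolution, color layout and the naive top-most-wins strategy are assumptions of this sketch.

# Illustrative orthographic top view: project vertex colors straight down onto a
# 2D image, ignoring the vertical axis. Resolution and bounds handling are assumptions.
import numpy as np

def render_top_view(vertices, colors, bounds_min, bounds_max, resolution=1024):
    # bounds_min / bounds_max: (x, z) scene boundaries on the ground plane
    span = np.maximum(bounds_max - bounds_min, 1e-6)
    uv = (vertices[:, [0, 2]] - bounds_min) / span       # normalize x/z into [0, 1]
    px = np.clip((uv * (resolution - 1)).astype(int), 0, resolution - 1)

    image = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    order = np.argsort(vertices[:, 1])                   # draw low points first, high points last
    image[px[order, 1], px[order, 0]] = colors[order]    # top-most content wins
    return image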
In step 450, the scene and ground pictures rendered in step 440 may then be adjusted when needed. Indeed, according to one rendering technique, the rendering may cover a lot of unnecessary space depending on the position of the origin of the 3D textured meshes. The ground picture is used as a mask to determine the cropping size, and the sizes of the scene and ground pictures are thus reduced accordingly. Optionally, the ground and scene pictures may be rotated if needed. In at least one embodiment, step 440 comprises an optimal positioning and scaling (and possibly rotation) of the camera over the center of the 3D textured meshes, so that step 450 becomes unnecessary. Indeed, the rendering then directly provides the ground and scene pictures at the best size. This positioning may be done thanks to measurements of the ground corner positions in the 3D textured mesh.
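A minimal sketch of the cropping described for step 450 is given below, assuming the rendered pictures are available as numpy arrays with an empty (black) background; the function name is an assumption.

# Illustrative cropping: use the ground picture as a mask and crop both pictures
# to the bounding box of its non-empty pixels.
import numpy as np

def crop_to_ground(scene_img, ground_img):
    mask = ground_img.sum(axis=2) > 0                    # non-empty ground pixels (3-channel image assumed)
    rows, cols = np.where(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    return scene_img[r0:r1, c0:c1], ground_img[r0:r1, c0:c1]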
In step 460, the AR map comprising the ground and scene pictures is generated. Examples of these pictures are illustrated in
The map generation process 400 may be executed by a standalone AR terminal or by an AR controller in combination with an AR terminal. In a typical implementation, the steps after the scanning are performed on an AR controller to benefit from the better computation resources available on such a device.
After this broad description, the description hereafter details the different steps of the processes to generate and display an AR map.
The analysis uses a gravity direction that may be determined directly using the sensors of the mobile device used to capture the 3D model. For instance, in an example implementation based on an Android platform, a software-based gravity sensor estimates the direction and magnitude of gravity from data provided by the accelerometer and the magnetometer or gyroscope of the device. Moreover, the gravity direction can be indicated interactively by the user in the case where the scene model contains a specific reference object that can be used to re-align the model with respect to the gravity direction. For instance, in a 3D modeling process using a photogrammetry approach, the axes of a coordinate system may be indicated manually within a marker image, where the Y-axis is inverse to the gravity direction. The reconstructed 3D model is then transformed into a user-defined coordinate system, thanks to a reference object generally identified as the origin (the point with null coordinates) in the virtual environment.
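As an illustration of this re-alignment, and assuming SciPy is available, the sketch below rotates the reconstructed vertices so that the measured gravity direction maps onto the negative Y-axis; the function name and the axis convention are assumptions.

# Illustrative re-alignment of a reconstructed model so that the measured gravity
# direction maps onto the -Y axis of the user-defined coordinate system.
import numpy as np
from scipy.spatial.transform import Rotation

def align_to_gravity(vertices, gravity_dir):
    g = gravity_dir / np.linalg.norm(gravity_dir)
    target = np.array([0.0, -1.0, 0.0])                  # Y-axis is inverse to gravity
    rot, _ = Rotation.align_vectors([target], [g])       # rotation taking g onto target
    return vertices @ rot.as_matrix().T                  # apply the rotation to all vertices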
The proposed solution to identify the ground, walls and ceiling will take benefit of the presence of that reference object (or marker), assuming the following constraints:
With the determined direction of gravity, the planar analysis of the scene model can classify the detected planes into horizontal and vertical ones. Thus, the ground plane is determined as the furthest significant horizontal plane along the gravity direction. If it exists, the ceiling plane is determined as the furthest horizontal plane along the inverse gravity direction. The wall planes are selected among the vertical planes that surround the scene.
A further cleaning can then be done to deal with the noisy data and the isolated components. The significant bounding planes of the scene (walls, ground, ceiling) are detected and the data elements located outside these bounding planes are removed.
For example, we assume that the indoor scene captured and reconstructed as illustrated in
In addition, the original ground data is also removed for a better rendering of the AR map and replaced by a separate ground mesh as mentioned earlier. The ceiling data, if any, is also removed. Thus, this step generates one 3D textured mesh for the scene and one (very simple) mesh for the ground.
In complex scenes, the space is not limited to a cuboid. An analysis of the reconstructed mesh allows the detection of cases where the room geometry is more complex than a cuboid. This is done by checking the wall and ground intersections. The detection of the ground or ceiling plane can be realized as described above. Without the assumption of a cuboid scene, the wall planes are selected from the vertical planes so as to bound the scene as well as possible. For instance, the 3D data of vertical planes with an area larger than a threshold are first projected onto the detected ground plane. Then a convex hull can be computed from the projected points, which indicates the boundary of the significant scene data. The wall planes are detected as the set of vertical planes which best fit the convex hull. Thus, adjacent wall planes do not need to be perpendicular and the number of wall planes can be arbitrary (larger than 2). In this case, a polygonal shape based on the intersection lines between the detected wall planes and the ground plane is used for the ground representation. Another problematic situation is when the real environment is not a closed space with obvious walls or when the walls are far away, for example beyond the scanning range of the device. In this case, the significant vertical planes, such as furniture planes, form the boundary of the scene. The extraction of these planes can be controlled by configuring a threshold on the size of the area.
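For the non-cuboid case, the convex hull computation described above could be sketched as follows, assuming the vertical planes are available as point sets with precomputed areas; the data layout and the area threshold are assumptions.

# Illustrative sketch for non-cuboid rooms: project the points of the significant
# vertical planes onto the ground plane and bound them with a 2D convex hull.
import numpy as np
from scipy.spatial import ConvexHull

def scene_boundary(vertical_planes, min_area=0.5):
    # vertical_planes: list of dicts with 'points' (N, 3) and 'area' in square meters
    pts_2d = np.vstack([p["points"][:, [0, 2]]          # drop the height component
                        for p in vertical_planes if p["area"] > min_area])
    hull = ConvexHull(pts_2d)
    return pts_2d[hull.vertices]                         # boundary polygon on the ground plane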
Regarding the ground plane, a corresponding planar shape is built using the corners extracted by intersecting the walls and the ground. The result is generally a quadrilateral or a polygon, so that a simple meshing can be used. This quadrangle or polygon is positioned at the same height as the ground. For the texture, an average color close to the color of the floor can be chosen, simply as the mean or the median color over all original ground data. A synthetic texture based on the captured picture of the ground may also be used for increased realism. The partial texture of the ground from the captured picture can be employed to generate a complete but partially synthetic texture using an image inpainting method. For instance, the quadrangle or polygon is partially mapped with the texture and its fronto-parallel view is synthesized to be used as the input of the image inpainting. The inpainted image is then used as the synthetic texture. A texture synthesis method can also be employed to generate a new texture image from only a small sample of the captured picture of the ground, i.e. by stitching together small patches of this sample until a texture as large as desired is obtained. Alternatively, the synthetic texture can also come from an available floor texture database: for each texture map available in the database, a similarity measure is computed between sample patches of the original ground texture and sample patches of the texture from the database. Such a similarity measure can be based on a combination of color similarity (sum of squared differences for example) and texture similarity (based on Gabor filters for example). The texture from the database with the highest similarity to the original ground texture is retained and cropped to match the desired size. This textured quadrangle or polygon is then used to replace the original reconstructed ground. With this definition of the ground plane, the holes possibly corresponding to non-observed regions and remaining in the ground after the reconstruction process no longer exist and the ground is completely defined.
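As a simple illustration of two of the texture completion alternatives (average color fill, or inpainting of the fronto-parallel ground view), the following sketch uses OpenCV's inpainting function; the mask convention and radius are assumptions.

# Illustrative ground texture completion: either fill with the mean observed color
# or inpaint the unobserved regions of the fronto-parallel ground view.
import cv2
import numpy as np

def complete_ground_texture(ground_view, observed_mask, use_inpainting=True):
    # ground_view: fronto-parallel image of the ground (H, W, 3)
    # observed_mask: uint8 mask, 255 where the ground was actually captured
    if use_inpainting:
        holes = cv2.bitwise_not(observed_mask)           # regions to synthesize
        return cv2.inpaint(ground_view, holes, 5, cv2.INPAINT_TELEA)
    mean_color = ground_view[observed_mask > 0].mean(axis=0).astype(np.uint8)
    filled = np.empty_like(ground_view)
    filled[:] = mean_color                               # uniform fill with the average floor color
    return filled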
The 3D textured mesh of the scene and the 3D textured mesh of the ground are rendered separately, but using the exact same camera setup, thus generating two pictures: one for the scene and one for the ground. The result of the rendering is illustrated in
After this rendering, the obtained pictures are cropped according to the ground picture. The ground picture is used as a mask for cropping. In other words, the unused areas of the ground picture define minimal and maximal values in the horizontal and vertical directions. These values are used as cropping limits for both the scene picture and the ground picture itself, so that only the pixels inside these limits are kept in the resulting pictures. This corresponds to the first part of step 450 of the generation process. An example of the result of this cropping is illustrated in
Transform=T(O→M)*(R)*T(M→O)
Applying this transform to the scene picture and the ground picture results in the final corrected images. These images form the foundation of the AR map.
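The composition of this correction transform can be illustrated with homogeneous 2D matrices as follows; the angle convention and the function name are assumptions.

# Illustrative composition of Transform = T(O->M) * R * T(M->O):
# translate the image center M to the origin, rotate, then translate back.
import numpy as np

def center_rotation(theta_rad, center_xy):
    cx, cy = center_xy
    t_mo = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)   # T(M->O)
    rot = np.array([[np.cos(theta_rad), -np.sin(theta_rad), 0],
                    [np.sin(theta_rad),  np.cos(theta_rad), 0],
                    [0, 0, 1]], dtype=float)                              # R
    t_om = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)     # T(O->M)
    return t_om @ rot @ t_mo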
SF=1/(Max(H,W)/2)
Thus, for a room whose dimensions are 3 by 4 meters, a scale factor of 0.5 will be determined. In a second step, the distance of the corners to the origin point 902 of the 3D textured mesh is determined. The corner with the highest coordinates (Cx, Cy) is then selected; in the example of the figure, this corner is C2. A translation vector 903 is then determined as follows:
Tx=Cx−W/2
Ty=Cy−H/2
Once these parameters have been determined, it is possible to position the camera at the center of the scene 904 using the translation vector 903 and to adjust the scale to the scale factor SF in order to generate an optimal 2D image of the 3D textured mesh. In order to ensure a better distinguishability of the walls, it is preferable to add a safety factor to the scale factor so as to cover some empty space around the scene. For example, if the scene width is 10 meters, the determined scale factor would be 1/(10/2)=0.2. A safety factor of 10% would reduce this value to 0.18, thus effectively covering a greater space roughly equivalent to 11 meters.
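The parameter computation of this second example can be summarized, for illustration only, as follows; the safety factor handling reflects the 10% example above and the function name is an assumption.

# Illustrative computation of the orthographic camera parameters:
# scale factor from the room dimensions, translation from the selected corner.
def camera_parameters(W, H, Cx, Cy, safety=0.10):
    SF = 1.0 / (max(H, W) / 2.0)          # base scale factor: a 3 x 4 m room gives 0.5
    SF *= (1.0 - safety)                  # safety margin to keep some empty space around the walls
    Tx = Cx - W / 2.0
    Ty = Cy - H / 2.0
    return SF, (Tx, Ty)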
Compared to the first example of orthographic projection, the second example of orthographic projection provides a better-quality image, since the full resolution of the camera is used to generate the image and no cropping needs to be done afterwards. It allows the direct generation of the images shown in
This is made possible by the tracking of the AR terminals and by the knowledge of the virtual objects of the AR scene by the AR controller. In a multi-user application, each AR terminal regularly provides its position to the AR controller, which then provides this information to all the other AR terminals. These terminals can then update the positions of the other users on the map using a specific icon or a specific color per user. In the screenshot of
Tracking the AR terminal in the world space allows the system to show the virtual scene from the user's perspective. Indeed, it is possible to know exactly the position and the orientation of the AR terminal in the world frame in real time and to update it accordingly.
The notation for a homogeneous transformation 4×4 matrix T is the following:

T = [ R  t ]
    [ 0  1 ]

where R represents the rotation and t represents the translation.
The pose of the camera C1 of an AR terminal (in the world frame) is the following transform: wTC1, i.e. the transform from the camera frame C1 to the world frame w.
Therefore, it is possible to transmit a 3D vector for the position and a quaternion for the orientation (rotation) of the AR terminal. This position and orientation can be shared amongst users of a common AR scene so that each of them can display the poses of the other users on his/her AR map.
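For illustration, and assuming SciPy is available, the conversion between the 4×4 pose matrix and the transmitted (position, quaternion) pair could look as follows; the function names are assumptions.

# Illustrative conversion between the 4x4 pose matrix wTC1 and the compact
# (position vector, quaternion) representation shared with other users.
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_message(T):
    position = T[:3, 3]                                      # translation t
    quaternion = Rotation.from_matrix(T[:3, :3]).as_quat()   # rotation R as (x, y, z, w)
    return position, quaternion

def message_to_pose(position, quaternion):
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat(quaternion).as_matrix()
    T[:3, 3] = position
    return T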
One corner has to be defined as the reference corner C, for example the bottom left corner. In a numeric example, assume that the position of the user in world frame coordinates is (−0.5, 0.1, 2), that the coordinates of C in world frame coordinates are (−1.5, −1.9, 3), and that one meter is equivalent to 200 pixels (DSF=200).
The coordinates in pixel of the user on the AR map relative to C will be:
(−0.5+1.5)*200=200
(−2+3)*200=200(C is the new reference, we consider X′=X,Y′=−Z)
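The conversion illustrated by this numeric example could be sketched as follows; the function name and the argument order are assumptions.

# Illustrative world-to-map conversion matching the numeric example above:
# subtract the reference corner C, flip the Z axis (Y' = -Z), and scale to pixels.
def world_to_map(user_pos, corner_c, dsf=200):
    x_px = (user_pos[0] - corner_c[0]) * dsf
    y_px = (-user_pos[2] + corner_c[2]) * dsf
    return x_px, y_px

print(world_to_map((-0.5, 0.1, 2.0), (-1.5, -1.9, 3.0)))   # -> (200.0, 200.0)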
For the insertion of the AR map, we define an area with dimensions proportional to those of the final picture, and then fit (with interpolation and filtering) the picture into this area. The canvas settings automatically adapt to the resolution of the screen. This optimizes the resolution of the mini map.
The coordinates of the ground corners are expressed in the world coordinate system at real scale. We deduce a display scale factor from the affine transform which rescales the simple geometric shape formed by the ground corners to the canvas area.
The
In another implementation, a slider allows the zoom level to be directly adjusted to a desired value, and the size of the AR map is updated accordingly.
The centering of the window is constrained by the edges of the mini map as illustrated in
The selection of this user-centered cropping feature is preferably under control of the user.
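The centering constraint mentioned above could be implemented, for illustration only, by clamping the window center inside the map boundaries; the names and the pixel conventions are assumptions.

# Illustrative clamping of the user-centered window so that the cropped area never
# leaves the AR map: the window center is pushed back inside when the user is near an edge.
import numpy as np

def clamp_window(user_px, window_size, map_size):
    # user_px, window_size, map_size: (x, y) pixel pairs in the AR map
    half = np.asarray(window_size) / 2.0
    center = np.clip(np.asarray(user_px, dtype=float),
                     half, np.asarray(map_size) - half)  # keep the window inside the map
    top_left = (center - half).astype(int)
    return top_left, (top_left + np.asarray(window_size)).astype(int)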
Other features not illustrated can further enhance the AR map.
According to at least one embodiment, the AR map is reoriented according to the orientation of the user within the AR scene, so that the top of the map represents the current orientation of the user. While the former description used an AR map with a fixed orientation, having a variable map orientation allows for improved wayfinding. Such a feature is preferably used with a circular AR map instead of the square or rectangular AR map used throughout the description.
According to at least one embodiment, the AR map further displays labels to identify objects of the AR scene. These objects may be determined by a segmentation step of the AR scene that identifies the objects and associates labels with them. These elements can further be stored as parameters of the AR scene.
According to at least one embodiment, the AR controller stores the positions of the users over a period of time. This information is then used to display on the AR map the path followed by the users, for example represented as a trail of dots leading to the icon representing the user. The period of time may be adjusted to display either short-term movements (for example the last five seconds), making the map very dynamic, or long-term movements for tracking all movements within an AR scene. When an identifier is associated with a set of positions, it is possible to know who the corresponding user was and where he/she went.
According to at least one embodiment, the computation workload of the AR terminal is reduced by performing some of the computations in the AR controller, which is typically a computer or a server. This requires transmitting the information gathered from the AR terminal sensors to the AR controller.
According to at least one embodiment, an AR terminal also includes the functionalities of an AR controller and thus allows standalone operation of an AR scene, while still being compatible with the embodiments described herein. In such a single-user application, an on-board map may be used, locally updated with the user position (thanks to a marker for example).
Although the AR map generation process has been described above in a conventional client-server scenario using an AR controller and AR terminals, a peer-to-peer approach is also possible. In such an implementation, all the roles and functionalities described as being on the AR controller would be spread over the set of clients of the current session. Some specific elements would need to be added, though, to manage session and client discovery, the session model and data persistency, as is common in peer-to-peer network based systems.
A mixed approach is also possible, where a first AR terminal operates as a standalone AR system, hosting the AR scene, performing its own localization and enhancing the scene with virtual objects, and switches to a peer-to-peer mode when another AR terminal is detected within the AR scene, further sharing the AR scene and interactions.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory or optical media storage). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
Number | Date | Country | Kind |
---|---|---|---|
20305839.1 | Jul 2020 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/068642 | 7/6/2021 | WO |