XR devices (e.g., Virtual Reality (VR) displays and Augmented Reality (AR) displays) are becoming increasingly affordable and prevalent as consumer and enterprise devices. Using an XR device to its full potential, however, requires substantial practical self-training before the XR device user becomes proficient in the use of an XR environment.
Having an additional person able to access and view, in real time, the stream consumed by an XR device user, and to help guide the XR device user in an XR environment in real time, would considerably ease the XR device user's training. Furthermore, this additional person would be in a position, for instance, to monitor the stream for the occurrence of events that could offend the XR device user, to surveil whether the XR device user is consuming the stream they are supposed to be consuming, to simply enjoy the stream consumed by the XR device user, and/or to act as a commentator on the XR device user's actions. There is thus a need for a device, controlled by the additional person, that is able to seamlessly access a stream rendered by an XR device, more precisely a replica stream representing what the XR device is currently rendering, or to switch between multiple replica streams. Hereinafter, the additional person and the device controlled by the additional person are called the moderator and the supervisory device, respectively.
In one aspect, the disclosure relates to a method for operating a supervisory device to supervise a plurality of XR device users. A problem arises in that a supervisor hoping to assist a user in operating in the XR environment cannot easily obtain a representation of the XR environment which a particular user amongst the plurality of users is currently perceiving. The disclosure tackles this problem by receiving, from a user of the supervisory device, a selection of a location of a particular XR device. The supervisory device responds by retrieving a replica stream associated with the XR device at the selected location, and generating the replica stream for display on the supervisory device.
In some embodiments, the supervisory device generates and displays a map including map symbols representing the XR devices, with an overlay of the replica stream of one or more of the XR devices being placed on the map in association with the map symbols representing the respective XR devices.
In other embodiments, the supervisory device includes a camera apparatus, obtains live video from the camera apparatus, and displays the live video with one or more overlays representing the replica streams of one or more respective XR devices, each overlay being displayed in association with the image of the corresponding XR device in the live video.
The present disclosure relates to methods and systems for operating a supervisory device able to access replica streams representing what multiple extended reality (XR) devices are currently rendering. More particularly, but not exclusively, the present disclosure relates to seamless access, by the supervisory device, to replica streams, each replica stream representing what an XR device of a plurality of XR devices is currently rendering, where the plurality of XR devices and the supervisory device may be located in a given zone.
Methods and systems are thus provided herein for operating a supervisory device. Position data representing a position of each of a plurality of XR devices are received in association with connection information data representing connection information relating to each of the plurality of XR devices, wherein the connection information comprises a source of a replica stream representing, in real time, what is perceived by a user of the XR device. Additionally, mapping data mapping the position data of each XR device to the associated connection information are stored. Furthermore, a selection of a position of an XR device of the plurality of XR devices is received to select an XR device. Moreover, based on the received selection, the connection information associated with the selected XR device is determined using the mapping data. In addition, the replica stream of the selected XR device is retrieved using the determined connection information. Furthermore, the replica stream of the selected XR device is generated for display.
In this way, the supervisory device is able to present the replica stream relating to the XR device selected from the plurality of XR devices, using the position data and the connection information of the selected XR device. The moderator may then train, teach, or surveil an XR device user in real time. The moderator may also control, censor, comment on, describe, or experience the content of a stream consumed by an XR device user in real time.
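By way of a non-limiting illustration only, the following Python sketch shows one possible realization of the mapping data described above, in which each XR device's broadcast position is mapped to its connection information and a selection by position is resolved to the matching connection information; the identifiers (e.g., ReplicaDirectory, lookup_by_position) and the distance tolerance are illustrative assumptions rather than required implementation details.

```python
# Minimal sketch of the mapping and lookup described above; all names are hypothetical.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]  # spatial coordinates (x, y, z)

@dataclass
class ConnectionInfo:
    source_url: str  # source of the replica stream (e.g., server URL)

class ReplicaDirectory:
    """Maps each XR device's broadcast position to its connection information."""

    def __init__(self) -> None:
        self._mapping: Dict[str, Tuple[Position, ConnectionInfo]] = {}

    def update(self, device_id: str, position: Position, info: ConnectionInfo) -> None:
        # Called whenever an XR device broadcasts its position and connection info.
        self._mapping[device_id] = (position, info)

    def lookup_by_position(self, selected: Position, tolerance: float = 0.5) -> Optional[ConnectionInfo]:
        # Return the connection info of the XR device closest to the selected
        # position, provided it lies within the given tolerance (in metres).
        best_id, best_dist = None, float("inf")
        for device_id, (pos, _) in self._mapping.items():
            dist = sum((a - b) ** 2 for a, b in zip(pos, selected)) ** 0.5
            if dist < best_dist:
                best_id, best_dist = device_id, dist
        if best_id is None or best_dist > tolerance:
            return None
        return self._mapping[best_id][1]
```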
As referred to herein, the term ‘stream’ should be understood to mean data (e.g., video data, audio data, or any combination thereof) that are transferred, directly or indirectly, from one device to another device and that may then be presented on the receiving device for consumption as soon as they are received.
As referred to herein, the term ‘replica stream’ relating to an XR device should be understood to mean a stream representing what the XR device is currently rendering.
Each stream is rendered by an XR device of the plurality of XR devices, and the corresponding replica stream may then be transferred, on the moderator's request, to the supervisory device. If the XR device in question is a VR display, the XR device may begin to encode and transmit the VR display-rendered stream to the supervisory device on the moderator's request. If the XR device in question is an AR display, the XR device may generate a camera view (containing, for instance, AR display users) augmented with virtual elements (e.g., a stream, a virtual character) via an AR see-through display, and encode and transmit the camera view augmented with the virtual elements to the supervisory device on the moderator's request.
In some examples, the supervisory device may be in communication with the plurality of XR devices via a single communication network, such as a Wide Area Network (WAN). Each XR device may send a replica stream representing what it is currently rendering to a remote server via the WAN. The supervisory device may utilize the remote server to receive the replica stream from the XR device. The remote server may receive all the replica streams at all times. Alternatively, the remote server may receive the replica streams only on demand due to bandwidth limitations dependent, in part, on the number of XR devices. A stream database connected to the remote server may store the replica streams. Each XR device may broadcast (continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes), over the WAN communication network, position data along with connection information (e.g., URL). The supervisory device receives the broadcasting (continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes), by each XR device, of the position data (e.g., spatial coordinates) and connection information (e.g., URL) over the WAN communication network. To access a replica stream relating to a selected XR device, the supervisory device may make a request, via the WAN, to the remote server. In an example, the request contains the position data and/or connection information (e.g., URL) relating to the selected XR device. The supervisory device then reads the URL relating to the selected XR device in order to find the remote server and fetch the replica stream from the remote server. The supervisory device then presents the replica stream to the moderator. In this case, the supervisory device may seamlessly access multiple replica streams and present them simultaneously, or present only one of the multiple replica streams.
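As a purely illustrative sketch of the WAN flow just described, the following Python code listens for the periodic position/URL broadcasts and then fetches a selected replica stream from the remote server over HTTP; the broadcast port, the JSON message format, and the assumption that the remote server serves the stream at the broadcast URL over HTTP are hypothetical.

```python
# Hedged sketch: collect position/connection-info broadcasts, then fetch a replica stream.
import json
import socket
import urllib.request

BROADCAST_PORT = 50000  # assumed port on which XR devices broadcast

def listen_for_broadcasts(directory, timeout_s: float = 5.0) -> None:
    """Collect position/connection-info broadcasts into a directory dict."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", BROADCAST_PORT))
    sock.settimeout(timeout_s)
    try:
        while True:
            data, _addr = sock.recvfrom(4096)
            msg = json.loads(data)  # assumed format: {"id": ..., "position": [...], "url": ...}
            directory[msg["id"]] = (tuple(msg["position"]), msg["url"])
    except socket.timeout:
        pass  # stop listening after a quiet period
    finally:
        sock.close()

def fetch_replica_stream(url: str) -> bytes:
    """Request the replica stream (or a segment of it) from the remote server."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```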
In some examples, the supervisory device may be in communication with the plurality of XR devices via a single communication network, such as a Local Area Network (LAN). The plurality of XR devices may broadcast (continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes) the position data (e.g., spatial coordinates) and connection information (e.g., URL) over the LAN communication network to which the supervisory device is connected. Once a connection to a local server (connected to the LAN) is requested by the supervisory device, the selected XR device will start an audio encoder encoding the audio and a video encoder encoding the XR device viewport to prepare the replica stream representing what the XR device is currently rendering. The encoded video and audio will be multiplexed and transmitted over the LAN via RTP (Real-time Transport Protocol) to the network address:port (e.g., URL) of the local server relating to the selected XR device. The supervisory device then reads the URL relating to the selected XR device in order to find the local server and fetch the replica stream from the local server. The replica stream relating to the selected XR device is presented via the supervisory device. The use of a LAN to transfer the streams from the plurality of XR devices to the supervisory device offers low latency and high bandwidth, whereas the use of a WAN for this task results in higher latency and lower bandwidth.
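For illustration of the XR-device side of this LAN flow, the sketch below wraps already-encoded, multiplexed audio/video payloads in a minimal RTP fixed header and sends them over UDP to the local server's address:port; the encoders and multiplexer themselves are out of scope, and the server address, payload type, and 90 kHz timestamp clock are assumptions.

```python
# Illustrative sketch of sending an encoded replica stream over RTP/UDP to a local server.
import socket
import struct
import time

def make_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    # RTP fixed header: version 2, no padding/extension/CSRC, marker bit clear.
    byte0 = 0x80
    byte1 = payload_type & 0x7F
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

def send_replica_stream(encoded_frames, server_addr=("192.0.2.10", 5004), ssrc=0x1234):
    """Send already-encoded (multiplexed) frames to the local server via RTP over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq = 0
    for frame in encoded_frames:  # each frame: bytes produced by the A/V encoder/multiplexer
        header = make_rtp_header(seq, timestamp=int(time.time() * 90000), ssrc=ssrc)
        sock.sendto(header + frame, server_addr)
        seq += 1
    sock.close()
```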
In some examples, the supervisory device may be in communication with the plurality of XR devices via a combination of two communication networks. The transfer of a replica stream (representing what an XR device is currently rendering) to the supervisory device may utilize a combination of two communication networks of different types (e.g., WAN and LAN), the combination further utilizing two servers of different types (e.g., a remote server and a local server). For instance, each XR device may broadcast (continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes), over the LAN, the position data along with the connection information (e.g., URL). The connection information (e.g., URL) may, however, refer to the other server, the remote server, which receives the replica streams either at all times or on demand due to bandwidth limitations dependent, in part, on the number of XR devices. A stream database connected to the remote server may store the replica streams. The supervisory device receives the broadcasting, by each XR device, of the position data and WAN-related connection information (e.g., URL) over the LAN, the broadcasting occurring continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes. To access a replica stream relating to a selected XR device, the supervisory device may make a request, via the WAN, to the remote server, the request containing, for instance, the position data and/or the connection information (e.g., URL) of the selected XR device. The supervisory device then reads the URL relating to the selected XR device in order to find the remote server and fetch the replica stream relating to the selected XR device from the remote server. The supervisory device then presents the replica stream to the moderator. In this case, the supervisory device may seamlessly access multiple replica streams and present them simultaneously.
In some examples, each XR device may broadcast, continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes, over a communication network (e.g., WAN or LAN), the position data along with the connection information (e.g., 5G SideLink connection information, URL). The supervisory device receives the broadcasting, by each XR device, of the position data and connection information (e.g., 5G SideLink connection information, URL) over the communication network, the broadcasting occurring continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes. Additionally or alternatively, the supervisory device may be in communication with one of the plurality of XR devices via a direct device-to-device communication based on various technologies, such as Bluetooth or 5G SideLink. The bandwidth and latency limitations inherent to Bluetooth technology may affect the resolution of the stream presented on the supervisory device, such that the stream resolution may be deemed less desirable by the moderator. By contrast, 5G SideLink-based direct device-to-device communication provides low latency and high bandwidth. To access a replica stream relating to a selected XR device, the supervisory device makes a request directly to the selected XR device, the request containing, for example, the position data and/or the 5G SideLink connection information of the selected XR device. The supervisory device connects to the selected XR device using the 5G SideLink connection information. In this case, the moderator's device may connect to a single XR device at a time, and thus present one replica stream at a time.
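As a simplified, non-limiting sketch of the direct device-to-device request, the code below abstracts the 5G SideLink (or Bluetooth) link as an ordinary TCP connection and sends a small JSON request to the selected XR device; the request format and the idea that the XR device then streams its replica back over the same connection are assumptions.

```python
# Very simplified sketch: the direct link is modelled as a plain TCP connection.
import json
import socket

def request_replica_stream_direct(device_addr, selected_position) -> socket.socket:
    """Connect directly to the selected XR device and request its replica stream."""
    sock = socket.create_connection(device_addr, timeout=5.0)
    request = {"type": "replica_stream_request", "position": list(selected_position)}
    sock.sendall(json.dumps(request).encode() + b"\n")
    return sock  # the XR device is assumed to stream its replica over this connection
```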
The point of determining the position data of each XR device in real time is to use those position data (e.g., spatial coordinates) to identify each XR device of the plurality of XR devices, opening up the possibility of distinguishing between XR devices of the plurality of XR devices and of selecting any one of the plurality of XR devices based on the position data of the plurality of XR devices. Additionally, each XR device of the plurality of XR devices may broadcast (continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes) its spatial coordinates along with connection information (e.g., URL, 5G SideLink connection information) used to receive a replica stream. The supervisory device may read the URLs of the plurality of XR devices to access the respective replica streams or connect to a specific XR device via device-to-device communication using the related 5G SideLink connection information. The position data of each XR device of the plurality of XR devices may be determined based on various technologies mentioned below.
In some examples, the supervisory device and the plurality of XR devices may be located in the same zone such that the supervisory device, equipped with sensors, is able to detect the presence of the plurality of XR devices and to locate them in the vicinity of the supervisory device. For instance, the supervisory device may comprise a LIDAR (Light Detection And Ranging) component: the LIDAR component accommodates a laser source able to emit and scan laser pulses across the surface of objects surrounding the supervisory device (e.g., the plurality of XR devices) and to retrieve the laser pulses reflected by the surrounding objects so as to measure distances between the supervisory device and those surrounding objects. The supervisory device is therefore able to locate the detected XR devices using the supervisory device as a reference point (whose own location may be unknown to the supervisory device). Locating the detected XR devices may utilize image recognition and distance measurement techniques.
Additionally or alternatively, each XR device of the plurality of XR devices may comprise a Radio Frequency IDentification (RFID) tag and the supervisory device may comprise an RFID reader in order to determine the position data of each XR device in real time.
Additionally or alternatively, each XR device of the plurality of XR devices may comprise an Inertial Measurement Unit (IMU) comprising two accelerometers and two gyroscopes if the plurality of XR devices are located on the same plane, or three accelerometers and three gyroscopes if the plurality of XR devices are placed on different planes (e.g., at different heights). Each IMU allows for generating motion-related data useful for tracking each XR device.
Additionally or alternatively, each XR device of the plurality of XR devices may comprise a Visual Inertial Odometry (VIO) component, each VIO component comprising a camera and an IMU. The VIO component combines inertial measurements from the IMU with the pose of each XR device from camera images so as to generate motion-related data useful for tracking each XR device.
Additionally or alternatively (following the technology employed in Air Tags®), each XR device of the plurality of XR devices may emit a Bluetooth or ultra-wideband signal to a cloud-based system accessible by the supervisory device in order for the supervisory device to obtain the position data of the plurality of XR devices.
Additionally, some of the aforementioned technologies may be supplemented by SLAM (Simultaneous Localization And Mapping) technology that uses algorithms to generate maps using topological data acquired by sensors. For instance, LIDAR data of the supervisory device, IMU data and camera-generated video data of the plurality of XR devices may be used by SLAM algorithms to build a map representing the real-time locations of the plurality of XR devices.
Additionally or alternatively, each XR device of the plurality of XR devices may be confined to an area (whose location is known) e.g., surrounded by soft fences in order for the XR device users to safely operate in the area and avoid collisions between the XR device users. This setup is especially useful if the XR device employed is a VR display wherein the XR device user is unable to perceive the surrounding real scene.
In some examples, the supervisory device may be able to present at least one replica stream of the plurality of replica streams, each replica stream representing what an XR device of the plurality of XR devices is currently rendering. If there are multiple replica streams to be presented, the supervisory device may present one of the multiple replica streams, several of them at the same time, several of them sequentially, all of them at the same time, or all of them sequentially.
In some examples, the supervisory device may comprise a phone, a tablet, an XR device, a laptop, a computer, or the like. The term ‘presented via the device’ is to be interpreted according to the nature of the supervisory device employed. For instance, if the supervisory device were a phone, tablet, laptop, computer, or VR display, the supervisory device would comprise a display to present, to the moderator, e.g., at least one stream, a map, a set of images (captured by a camera integrated in the supervisory device), or a combination thereof. If the supervisory device were an AR display, the supervisory device would comprise a light engine able to produce an output pupil image (comprising, e.g., at least one stream, a map, or a combination thereof) and a see-through waveguide allowing the moderator to perceive their immediate environment while replicating the output pupil image (of the light engine) across an eyebox in order for one of those replicated output pupil images to reach the moderator's retina. Alternatively, a small movable mirror could be used to steer the output pupil image from the projector towards the moderator's retina (which would utilize a retina tracker). The proportion of the display or of the output pupil image occupied by a stream, together with the resolution of the stream, dictates the level of detail of the stream that is accessible to the moderator.
In some examples, the XR devices of the plurality of XR devices may be identical to or different from each other in terms of design and working principle. The supervisory device may present a replica stream representing what an XR device is currently rendering, irrespective of the type of the XR device.
According to some implementations, position data of the supervisory device are determined. In addition, the position of a particular XR device is based at least in part on the position data of the particular XR device relative to the position data of the supervisory device.
In some examples, the supervisory device may be used as a pointer to conveniently and effectively target, and thus select, any one of the plurality of XR devices so as to present the replica stream relating to the selected XR device. Knowing the spatial coordinates of the plurality of XR devices and the spatial coordinates and orientation (or tilt) of the supervisory device, it is possible to determine whether the extrapolation of the orientation (or tilt) of the supervisory device intersects any one of the plurality of XR devices (or any user wearing an XR device of the plurality of XR devices), and to identify which XR device (or which user wearing an XR device) is targeted by the moderator who controls the position data (e.g., position, orientation) of the supervisory device. This allows the moderator to select an XR device in a very convenient manner and to have the replica stream relating to the selected XR device presented, via the supervisory device, to the moderator.
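A minimal sketch of this pointer-based selection, assuming each XR device (or its wearer) can be approximated by a small sphere around its reported position, is given below: the orientation of the supervisory device is extrapolated as a ray from its position and tested for intersection against each sphere. The function names and the hit radius are hypothetical.

```python
# Sketch: select the XR device whose bounding sphere is first intersected by the pointing ray.
import math
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]

def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def select_targeted_device(
    supervisor_pos: Vec3,
    supervisor_dir: Vec3,          # unit vector derived from the orientation/tilt
    devices: Iterable[Tuple[str, Vec3]],
    radius: float = 0.3,           # assumed "hit" radius around each XR device, in metres
) -> Optional[str]:
    best_id, best_t = None, float("inf")
    for device_id, pos in devices:
        oc = _sub(supervisor_pos, pos)
        b = 2.0 * _dot(supervisor_dir, oc)
        c = _dot(oc, oc) - radius * radius
        disc = b * b - 4.0 * c          # a == 1 for a unit direction
        if disc < 0:
            continue                    # the ray misses this device
        t = (-b - math.sqrt(disc)) / 2.0
        if 0 <= t < best_t:             # nearest intersection in front of the supervisory device
            best_id, best_t = device_id, t
    return best_id
```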
In some examples, the supervisory device may comprise an Inertial Measurement Unit (IMU) comprising two accelerometers and two gyroscopes if the supervisory device and the plurality of XR devices are located on the same plane, or three accelerometers and three gyroscopes if the supervisory device and the plurality of XR devices are placed on different planes (e.g., at different heights). Each IMU allows for generating motion-related data useful for tracking the supervisory device and determining the position data of the supervisory device.
Additionally or alternatively, the supervisory device may comprise a Visual Inertial Odometry (VIO) component comprising a camera and an IMU. The VIO component combines inertial measurements from the IMU with the pose of the supervisory device from camera images so as to generate motion-related data useful for tracking the supervisory device.
Additionally or alternatively (following the technology employed in Air Tags®), the supervisory device may emit a Bluetooth or ultra-wideband signal to a cloud-based system accessible by the supervisory device in order to obtain its own position data.
Additionally, some of the aforementioned technologies may be supplemented by SLAM (Simultaneous Localization And Mapping) technology that uses algorithms to generate maps using topological data acquired by sensors. For instance, IMU data and camera-generated video data of the supervisory device may be used by SLAM algorithms to build a map representing the position of the supervisory device.
According to some implementations, generating the replica stream for display comprises generating for display a view from a viewpoint of the selected XR device augmented with one or more visual elements generated by the selected XR device.
The supervisory device is able to present in real time, to the moderator, what the user of the selected XR device views, including any visual elements generated by the XR device itself. This allows the moderator to have full knowledge of what the XR device user is consuming in real time and to perform any of the aforementioned tasks.
According to some implementations, position data of the supervisory device are determined. Additionally, the supervisory device is operated to generate for display, based on the position data of each XR device and the supervisory device, a map display representing a location of each XR device and a location of the supervisory device. Furthermore, receiving the selection of the position of an XR device further comprises receiving a selection of a representation of the location of the XR device on the map display.
The supervisory device may use the position data of each XR device of the plurality of XR devices and the position data of the supervisory device to build an accurate map representing the locations (e.g., real-time locations) of the plurality of XR devices and the location of the supervisory device, respectively. For instance, the map may be generated by feeding LIDAR data, IMU data and camera-generated video data to SLAM algorithms. The map may be a plan view of the spatial arrangement of the plurality of XR devices and the supervisory device. The moderator is then able to identify each XR device of the plurality of XR devices more easily than if they had to directly look at the spatial arrangement of the plurality of XR devices and the supervisory device.
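By way of illustration only, the following sketch projects the (x, y) components of the position data onto a plan-view map image, yielding pixel coordinates at which a symbol for each XR device and for the supervisory device can be drawn; the zone bounds, the map size, and the identifiers are assumptions, and the actual drawing of symbols is omitted.

```python
# Sketch: convert world positions into plan-view map pixel coordinates for map symbols.
from typing import Dict, Tuple

def to_map_pixels(
    world_xy: Tuple[float, float],
    bounds: Tuple[float, float, float, float],   # (min_x, min_y, max_x, max_y) of the zone
    map_size: Tuple[int, int] = (800, 600),
) -> Tuple[int, int]:
    min_x, min_y, max_x, max_y = bounds
    u = (world_xy[0] - min_x) / (max_x - min_x)
    v = (world_xy[1] - min_y) / (max_y - min_y)
    return int(u * (map_size[0] - 1)), int((1.0 - v) * (map_size[1] - 1))

def build_map_symbols(
    device_positions: Dict[str, Tuple[float, float, float]],
    supervisor_position: Tuple[float, float, float],
    bounds: Tuple[float, float, float, float],
) -> Dict[str, Tuple[int, int]]:
    """Return pixel coordinates for each map symbol (XR devices and supervisory device)."""
    symbols = {dev_id: to_map_pixels(pos[:2], bounds) for dev_id, pos in device_positions.items()}
    symbols["supervisory_device"] = to_map_pixels(supervisor_position[:2], bounds)
    return symbols
```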
According to some implementations, the supervisory device is operated to additionally generate for display, in association with a representation of an XR device on the map display, the replica stream of that XR device.
Hereby, the supervisory device is able to present, to the moderator, a replica stream in a dedicated portion of a map, using for instance Picture-in-Picture (PiP). This makes the selection of an XR device easier.
In this way, it is possible to obtain a map representing the location (e.g., real-time location) of the supervisory device, and the locations (e.g., real-time locations) of the plurality of XR devices, each accompanied by a small floating window presenting the respective stream mapped to the XR device located adjacent to the small floating window. For instance, the map may be generated by feeding LIDAR data, IMU data, and camera-generated video data to SLAM algorithms. This is of great interest for a moderator who has to simultaneously monitor multiple streams, as they can have an overview of all the available streams, which makes the selection by the moderator easier to perform. By selecting one of the small floating windows, only the stream mapped to the XR device located next to that small floating window is presented, via the supervisory device, to the moderator, which allows for the consumption, by the moderator, of the stream at a level of detail higher than if the stream were confined to a small floating window. Additionally or alternatively, the selection may be done by selecting one of the XR device users represented on the map.
According to some implementations, an image of the environment perceived by the supervisory device is generated for display. Additionally, each XR device present in the image of the perceived environment is detected. Furthermore, each detected XR device in the image of the perceived environment is identified based on position data of each detected XR device. Moreover, an overlay representing the replica stream of the identified XR device in association with the XR device as shown in the image of the perceived environment is generated for display, for each identified XR device.
In some examples, the supervisory device may have at least one sensor (e.g., a camera, a LIDAR component) able to detect the presence of XR devices in the field of view of the sensor. The supervisory device may then assess the real-time location of each detected XR device in the field of view of the sensor (knowing the position data of the supervisory device) and cross-reference the position of each detected XR device in the field of view of the sensor with the position data of the plurality of XR devices determined by the above methods so as to identify each detected XR device. For each detected XR device, an overlay of the replica stream relating to the detected XR device is presented next to the detected XR device. The moderator may then select the overlay of their choice so as to have only the stream (presented in the overlay) presented via the supervisory device, which allows for the consumption of the stream at a level of detail higher than if the stream were confined to a small overlay.
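A hedged sketch of this cross-referencing step is shown below: the known positions of the XR devices, assumed to have already been transformed into the camera frame of the supervisory device, are projected into the image with a simple pinhole model and matched to the detector's 2D detections by nearest neighbour; the camera intrinsics, the matching threshold, and the function names are hypothetical.

```python
# Sketch: identify detected XR devices by matching projected known positions to 2D detections.
from typing import Dict, List, Optional, Tuple

def project_pinhole(point_cam: Tuple[float, float, float],
                    fx: float = 800.0, fy: float = 800.0,
                    cx: float = 640.0, cy: float = 360.0) -> Optional[Tuple[float, float]]:
    x, y, z = point_cam
    if z <= 0:
        return None                      # behind the camera
    return (fx * x / z + cx, fy * y / z + cy)

def identify_detections(
    detections: List[Tuple[float, float]],                        # 2D centres from the detector
    device_positions_cam: Dict[str, Tuple[float, float, float]],  # known positions in camera frame
    max_pixel_dist: float = 80.0,
) -> Dict[int, str]:
    """Map each detection index to the identifier of the closest projected XR device."""
    matches: Dict[int, str] = {}
    for i, (u, v) in enumerate(detections):
        best_id, best_d = None, max_pixel_dist
        for dev_id, pos in device_positions_cam.items():
            proj = project_pinhole(pos)
            if proj is None:
                continue
            d = ((proj[0] - u) ** 2 + (proj[1] - v) ** 2) ** 0.5
            if d < best_d:
                best_id, best_d = dev_id, d
        if best_id is not None:
            matches[i] = best_id
    return matches
```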
The moderator may also be able to switch between the multiple replica streams they are to monitor and to consume any of the multiple replica streams at a higher level of detail by selecting one of the multiple replica streams.
According to some implementations, the source is a local server, wherein the local server, the supervisory device and the plurality of XR devices are on the same Local Area Network.
In some examples, the plurality of XR devices may broadcast, continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes, the position data (e.g., spatial coordinates) and connection information (e.g., URL) over a first communication network to which the supervisory device is connected. Once a connection is requested by the supervisory device, the selected XR device will start an audio encoder encoding the audio and a video encoder encoding the XR device viewport. The encoded video and audio will be multiplexed and transmitted over the LAN via RTP (Real-time Transport Protocol) to the network address:port (e.g., URL) of a first server relating to the selected XR device. The supervisory device then reads the URL relating to the selected XR device in order to find the first server and fetch the replica stream relating to the selected XR device. The replica stream relating to the selected XR device is presented via the supervisory device. Communication between the supervisory device and the plurality of XR devices mediated via a Local Area Network (LAN) offers lower latency and higher bandwidth than communication between the supervisory device and the plurality of XR devices implemented via a Wide Area Network.
According to some implementations, the source is a remote server in communication with the supervisory device and the plurality of XR devices via a Wide Area Network.
In some examples, the plurality of XR devices may broadcast, continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes, the position data (e.g., spatial coordinates) and connection information (e.g., URL) over a first communication network to which the supervisory device is connected. Each XR device may transfer a replica stream representing what it is currently rendering, via the first communication network, to a first server belonging to the first communication network (either at all times or on demand due to bandwidth limitations dependent, in part, on the number of XR devices). The supervisory device may then receive the replica stream relating to the selected XR device via the first communication network by making a request to the first server, the request containing, for example, the position data (e.g., spatial coordinates) and/or the connection information (e.g., URL) of the selected XR device. The first communication network and the first server may be a WAN and a remote server, respectively.
According to some implementations, the supervisory device and the plurality of XR devices are on the same Local Area Network when the source is a remote server in communication with the supervisory device and the plurality of XR devices via a Wide Area Network.
In some examples, the plurality of XR devices may broadcast, continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes, the position data (e.g., spatial coordinates) and connection information (e.g., URL) over a first communication network to which the supervisory device and a first server are connected. Each XR device may transfer a replica stream representing what it is currently rendering, via a second communication network different from the first communication network, to a second server connected to the second communication network; the supervisory device may then receive the replica stream relating to the selected XR device via the second communication network by making a request to the second server, the request containing, for instance, the position data (e.g., spatial coordinates) and/or the second communication network-related connection information (e.g., URL) of the selected XR device. In this configuration, the first and second communication networks are a LAN and a WAN, respectively, and the first and second servers are a local server and a remote server, respectively.
According to some implementations, the source is the selected XR device, and the supervisory device and the selected XR device are in direct device-to-device communication with one another.
In some examples, the plurality of XR devices may broadcast, continuously, occasionally, periodically, every 5 to 10 seconds, every 10 to 60 seconds, or every one to two minutes, the position data (e.g., spatial coordinates) and connection information (e.g., URL, 5G SideLink connection information) over a first communication network to which the supervisory device is connected. The supervisory device may then receive the replica stream relating to the selected XR device by making a request to the selected XR device, the request containing, for instance, the position data (e.g., spatial coordinates) and/or the connection information (e.g., 5G SideLink connection information) of the selected XR device. The direct device-to-device communication allows for low latency and high bandwidth.
In another aspect, the disclosure relates to a method for supervising a plurality of users of XR devices. A problem arises in that a supervisor hoping to assist a user in operating in the XR environment cannot easily obtain a representation of the XR environment which a particular user amongst the plurality of users is currently perceiving. This other aspect of the disclosure tackles this problem by having each XR device and a supervisory device independently determine the position of an XR device. Each XR device provides replica stream data in association with its position, and the user of the supervisory device selects an XR device by selecting its position. By tallying the position in the supervisor's selection of an XR device with the position associated with each replica stream, the replica stream of the XR device selected by the supervisor can be provided to the supervisory device.
In some embodiments, the tallying is performed by a third device (e.g., a server) which receives i) the replica stream and the associated position from the XR device, and ii) the position of a selected XR device from the supervisory device. The server may then find the replica stream associated with the position received from the supervisory device and transmit the replica stream to the supervisory device, or may alternatively transmit connection information to the supervisory device indicating a resource (in some examples the server itself) from which the supervisory device can obtain the replica stream.
In other embodiments, the tallying is performed by the XR device. In such embodiments, the supervisory device broadcasts a request for the stream from the XR device at a given position to all the XR devices, and the XR device at the given position responds to the supervisory device with the replica stream or connection information from which the supervisory device can obtain the replica stream.
In yet other embodiments, the tallying is performed by the supervisory device. In such embodiments, each of the XR devices sends its replica stream (or connection information pointing to a resource from which its replica stream can be retrieved) to the supervisory device along with an indication of its position, and the supervisory device then, if required, obtains the replica stream from the resource, and generates the replica stream for display on the supervisory device.
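For the server-based tallying variant described above, the following is a minimal sketch in which a server stores, per XR device, the reported position together with the stream's connection information and resolves a selected position to the closest registered entry; the class name, the storage layout, and the tolerance value are assumptions.

```python
# Sketch of server-side tallying: match the position selected on the supervisory device
# to the position reported by each XR device and return the corresponding stream source.
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]

class TallyServer:
    def __init__(self, tolerance: float = 0.5) -> None:
        self._entries: Dict[str, Tuple[Position, str]] = {}  # device_id -> (position, stream URL)
        self._tolerance = tolerance

    def register(self, device_id: str, position: Position, stream_url: str) -> None:
        # Called when an XR device reports its replica stream and its position.
        self._entries[device_id] = (position, stream_url)

    def resolve(self, selected: Position) -> Optional[str]:
        """Return the stream URL whose reported position best matches the selection."""
        best_url, best_dist = None, self._tolerance
        for position, url in self._entries.values():
            dist = sum((a - b) ** 2 for a, b in zip(position, selected)) ** 0.5
            if dist <= best_dist:
                best_url, best_dist = url, dist
        return best_url
```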
Throughout the detailed description, the supervisory device is called the moderator's apparatus.
Each XR device and each related XR device user are represented by the same numeral. Accordingly, XR device user 110a consumes stream 110b rendered by XR device 110a. Similarly, XR device user 112a consumes stream 112b rendered by XR device 112a. Similarly, XR device user 114a consumes stream 114b rendered by XR device 114a. The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies discussed herein, such as LIDAR, RFID, IMU, VIO, or Bluetooth. Additionally, for instance, LIDAR data, IMU data and camera-generated video data of the moderator's apparatus 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices. The position data (e.g., real-time spatial coordinates) and/or the connection information (e.g., URL, 5G SideLink connection information) of the plurality of XR devices, and the real-time position data of the moderator's apparatus (e.g., spatial coordinates, orientation or tilt of the moderator's apparatus), may be continuously broadcast over communication network 104. The position data of the plurality of XR devices and the moderator's apparatus may be fed to SLAM algorithms so as to build a real-time map representing the real-time locations of the plurality of XR devices and the moderator's apparatus. The SLAM algorithms are run in a SLAM system shared by the plurality of XR devices and the moderator's apparatus.
In a first scenario, communication network 104 and server 106 may be a WAN and a remote server, respectively. Each XR device (e.g., XR device 110a, 112a or 114a) may continuously broadcast, over communication network 104, position data and connection information (e.g., URL). The moderator's apparatus 108b may receive the continuous broadcasting, by each XR device, of the position data and connection information (e.g., URL), over communication network 104. Each stream (e.g., stream 110b, 112b or 114b) rendered by an XR device (e.g., XR device 110a, 112a or 114a) is transferred, via communication network 104, to server 106, either at all times or on demand from the moderator's apparatus 108b (due to bandwidth limitations dictated by the number of XR devices). Each stream (e.g., stream 110b, 112b or 114b) may be stored in the stream database 102 connected to server 106. The moderator's apparatus 108b may select an XR device (e.g., XR device 110a, 112a or 114a) using the position data to identify the selected XR device. The moderator's apparatus 108b may make a request, via communication network 104, to server 106 so as to access the replica stream relating to the selected XR device: the request may comprise the position data and connection information (e.g., URL) of the selected XR device. The moderator's apparatus 108b then connects to the URL of the selected XR device, and the stream (e.g., stream 110b, 112b or 114b) rendered by the selected XR device (e.g., XR device 110a, 112a or 114a) is presented via the moderator's apparatus 108b. In this manner, the moderator's apparatus 108b may seamlessly access multiple replica streams (e.g., 110b, 112b and 114b) and present them simultaneously, or present at least one of the multiple replica streams (e.g., stream 110b, 112b or 114b). By simply having each XR device share the stream it renders with a server via an encoder, consumption is detached from any connectivity hassle with any one of the plurality of XR devices.
In a second scenario, communication network 104 and server 106 may be a LAN and a local server, respectively. Each XR device (e.g., XR device 110a, 112a or 114a) may continuously broadcast, over communication network 104, position data and connection information (e.g., URL). The moderator's apparatus 108b may receive the continuous broadcasting, by each XR device, of the position data and connection information (e.g., URL), over communication network 104. The moderator's apparatus 108b may select an XR device (e.g., XR device 110a, 112a or 114a) using the position data to identify the selected XR device. The moderator's apparatus 108b may make a request, via communication network 104, to server 106 so as to access the replica stream relating to the selected XR device: the request may comprise the position data and connection information (e.g., URL) of the selected XR device. Once a connection is requested by the moderator's apparatus, the selected XR device will start an audio encoder encoding the audio and a video encoder encoding the XR device viewport. The encoded video and audio will be multiplexed and transmitted over the LAN via RTP (Real-time Transport Protocol) to the network address:port (e.g., URL) relating to the selected XR device. The transmission can leverage any low-latency protocol; RTP is one example. WebRTC and SCReAM2 are examples leveraging self-clocked rate adaptation, allowing for video encoder bit rate modifications based on changing network QoS conditions. The moderator's apparatus 108b then connects to the URL of the selected XR device, and the stream (e.g., stream 110b, 112b or 114b) rendered by the selected XR device (e.g., XR device 110a, 112a or 114a) is presented via the moderator's apparatus 108b. In this manner, the moderator's apparatus 108b may seamlessly access multiple replica streams (e.g., 110b, 112b and 114b) and present them simultaneously, or present at least one of the multiple replica streams (e.g., stream 110b, 112b or 114b).
In a third scenario, communication network 104 is in fact two distinct communication networks, a LAN and a WAN, and server 106 is in fact two distinct servers, a local server and a remote server, the local server and the remote server belonging to the LAN and the WAN, respectively. Each XR device (e.g., XR device 110a, 112a or 114a) may continuously broadcast, over the LAN, position data and connection information (e.g., URL). The moderator's apparatus 108b may receive the continuous broadcasting, by each XR device, of the position data and connection information (e.g., URL), over the LAN. Each stream (e.g., stream 110b, 112b or 114b) rendered by an XR device (e.g., XR device 110a, 112a or 114a) is transferred, via the WAN, to the remote server, either at all times or on demand from the moderator's apparatus 108b (due to bandwidth limitations dictated by the number of XR devices). Each stream (e.g., stream 110b, 112b or 114b) may be stored in the stream database 102 connected to server 106. The moderator's apparatus 108b may select an XR device (e.g., XR device 110a, 112a or 114a) using the position data to identify the selected XR device. The moderator's apparatus 108b may make a request, via the WAN, to the remote server so as to access the stream mapped to the selected XR device: the request may comprise the position data and connection information (e.g., URL) of the selected XR device. The moderator's apparatus 108b then connects to the URL of the selected XR device, and the stream (e.g., stream 110b, 112b or 114b) rendered by the selected XR device (e.g., XR device 110a, 112a or 114a) is presented via the moderator's apparatus 108b. In this manner, the moderator's apparatus 108b may seamlessly access multiple replica streams (e.g., 110b, 112b and 114b) and present them simultaneously, or present at least one of the multiple replica streams (e.g., stream 110b, 112b or 114b).
In a fourth scenario, communication network 104 and server 106 may be a LAN and a local server, respectively, or a WAN and a remote server, respectively. Each XR device (e.g., XR device 110a, 112a or 114a) may continuously broadcast, over communication network 104, position data and connection information (e.g., URL, 5G SideLink connection information). The moderator's apparatus 108b may receive the continuous broadcasting, by each XR device, of the position data and 5G SideLink connection information, over communication network 104. The moderator's apparatus 108b may select an XR device (e.g., XR device 110a, 112a or 114a) using the position data to identify the selected device. To access a stream (e.g., stream 110b, 112b or 114b) mapped to a selected XR device (e.g., XR device 110a, 112a or 114a), the moderator's apparatus may make a request, via a 5G SideLink connection (e.g., duplet 108c/110c, 108c/112c or 108c/114c), to the selected XR device (e.g., XR device 110a, 112a or 114a) so as to access the replica stream relating to the selected XR device; the request may comprise the position data and 5G SideLink connection information of the selected XR device. In the context of direct device-to-device communication, the moderator's device 108b connects to the selected XR device (e.g., XR device 110a, 112a or 114a) using the 5G SideLink connection information: a two-way communication path will be established, allowing for both near real-time viewing and interaction between the devices. In this manner, the moderator's device 108b may seamlessly connect to an XR device (e.g., XR device 110a, 112a or 114a) and present the replica stream (e.g., stream 110b, 112b or 114b).
A stream (e.g., stream 110b, 112b or 114b), selected by moderator 108a via the moderator's apparatus 108b, may be presented on the full display of the moderator's apparatus 108b. The selection of the replica stream to be presented may be implemented following different approaches exemplified in the figures described below.
The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies discussed herein, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The spatial coordinates and orientation (or tilt) of moderator's apparatus 108b may be determined using various technologies discussed herein, e.g., IMU, VIO, Bluetooth. Additionally, for instance, LIDAR data (if the moderator's device is equipped with a LIDAR component), IMU data and camera-generated video data of the moderator's device 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices. Knowing the spatial coordinates of XR devices 110a, 112a and 114a, and the spatial coordinates and orientation (or tilt) of the moderator's apparatus 108b, it is possible to determine whether the extrapolation of the orientation (or tilt) of the moderator's apparatus 108b intersects one of the available XR devices (users) 110a, 112a or 114a.
The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The spatial coordinates and orientation (or tilt) of moderator's apparatus 108b may be determined using various technologies mentioned earlier, e.g., IMU, VIO, Bluetooth. Additionally, for instance, LIDAR data, IMU data and camera-generated video data of the moderator's device 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices.
The display of moderator's apparatus 108b shows an image 126 representing the camera-captured scene, i.e., what moderator 108a would see on the display of the moderator's apparatus 108b if moderator 108a were to capture the scene in front of them using the camera of the moderator's apparatus 108b. On the display of the moderator's apparatus 108b, XR device users 110a, 112a and 114a are depicted as XR device user representations 120a, 122a and 124a, respectively. Overlay 120b of stream 110b, overlay 122b of stream 112b and overlay 124b of stream 114b are superimposed on respective portions of image 126 and placed close to XR device user representations 120a, 122a and 124a, respectively. Streams 110b, 112b and 114b are simultaneously presented within overlays 120b, 122b and 124b, respectively. Using cursor 108d, moderator 108a (not shown here) selects XR device user representation 120a or overlay 120b so as to present stream 110b (mapped to XR device 110a) on the full display of the moderator's apparatus 108b, allowing moderator 108a to access a higher level of detail for a better understanding of what XR device user 110a consumes. Having overlays 120b, 122b and 124b within the image of the camera-captured scene, in the vicinity of XR device user representations 120a, 122a and 124a, respectively, allows moderator 108a to have a clear overview of which XR device user is watching which stream and to visually diagnose the status of each XR device user in real time.
The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The spatial coordinates and orientation (or tilt) of the moderator's apparatus 108b may be determined using various technologies mentioned earlier (e.g., IMU, VIO, Bluetooth). Additionally, for instance, LIDAR data, IMU data and camera-generated video data of the moderator's device 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices.
In
The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The spatial coordinates and orientation (or tilt) of the moderator's apparatus 108b may be determined using various technologies mentioned earlier (e.g., IMU, VIO, Bluetooth). Additionally, for instance, LIDAR data, IMU data and camera-generated video data of the moderator's device 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices.
In
The spatial coordinates of XR devices 110a, 112a and 114a may be determined using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The spatial coordinates and orientation (or tilt) of the moderator's apparatus 108b may be determined using various technologies mentioned earlier (e.g., IMU, VIO, Bluetooth). Additionally, for instance, LIDAR data, IMU data and camera-generated video data of the moderator's device 108b, and IMU data and camera-generated video data of XR devices 110a, 112a and 114a (if the XR devices are equipped with VIO technology), may be used by SLAM algorithms to build a map representing the real-time locations of the moderator's apparatus and the plurality of XR devices.
The display of the moderator's apparatus 108b shows a map 138 representing the real-time locations of each XR device user 110a, 112a and 114a and of moderator 108a. In map 138, XR device users 110a, 112a and 114a are depicted as XR device user representations 130a, 132a and 134a, respectively, and moderator 108a is represented as moderator representation 118b. Fixed windows 130b, 132b and 134b are located outside map 138 and arranged in a row 139. Streams 110b, 112b and 114b are simultaneously presented within fixed windows 130b, 132b and 134b, respectively. Using cursor 108d, moderator 108a (not shown here) selects either XR device user representation 130a or fixed window 130b so as to present stream 110b (mapped to XR device 110a) on the full display of the moderator's apparatus 108b, allowing moderator 108a to access a higher level of detail for a better understanding of what XR device user 110a consumes. Having fixed windows 130b, 132b and 134b outside map 138 (in row 139), each associated with XR device user representations 130a, 132a and 134a, respectively, allows moderator 108a to have a clear overview of which XR device user is watching which stream and to visually diagnose the status of each XR device user in real time.
Cloud service 1210 may comprise a communication network, a server and a stream database (not shown).
At 1402, control circuitry provides an apparatus (e.g., moderator's apparatus 108b or 1208b) in communication with a plurality of XR devices (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a and 1206a). This communication may be mediated via a communication network (e.g., communication network 104, 204 or 1210) involving the use of a server, e.g., server 106 or 210 (refer to the first and second scenarios). Alternatively, this communication may be mediated via a combination of two communication networks, each involving a server (refer to the third scenario). Alternatively, this communication may be mediated via a communication network (e.g., communication network 104, 204 or 1210) involving the use of a server (e.g., server 106 or 210), and a direct device-to-device communication, e.g., duplets 108c/110c, 108c/112c or 108c/114c (refer to the fourth scenario).
At 1404, control circuitry receives position data (e.g., spatial coordinates) of the plurality of XR devices (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a and 1206a), wherein each XR device of the plurality of XR devices renders a respective stream (e.g., stream 110b, 112b, 114b, 1202b, 1204b or 1206b). Position data (e.g., spatial coordinates) of XR devices 110a, 112a and 114a may be determined in real time using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The position data of the plurality of XR devices may be fed to SLAM algorithms to build a map representing the real-time locations of the plurality of XR devices.
At 1406, control circuitry maps the position data (e.g., spatial coordinates) of each XR device (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a or 1206a) to the respective stream (e.g., streams 110b, 112b, 114b, 1202b, 1204b or 1206b) each XR device renders.
At 1408, control circuitry selects one of the plurality of XR devices, based on the position data (e.g., spatial coordinates) of the selected XR device (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a or 1206a). There are three main options for selecting one of the plurality of XR devices, some of those options presenting variants (see the routes described below).
At 1410, control circuitry causes the stream (e.g., streams 110b, 112b, 114b, 1202b, 1204b or 1206b) mapped to the selected XR device (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a or 1206a) to be presented via the apparatus (e.g., the moderator's apparatus 108b or 1208b). The presentation of the stream mapped to the selected XR device, via the moderator's apparatus, results from the occurrence of either the first, second, third or fourth scenario mentioned earlier.
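Purely for illustration, steps 1404 to 1410 may be summarized by the following compact sketch from the point of view of the control circuitry; the helper callables (receive_broadcast, get_selection, fetch_stream, display) are hypothetical placeholders for the mechanisms described in the scenarios above.

```python
# Compact, non-limiting sketch of process 1400 on the supervisory side.
def run_process_1400(directory, receive_broadcast, get_selection, fetch_stream, display):
    # 1404: receive position data (and connection info) of the plurality of XR devices.
    for device_id, position, connection_info in receive_broadcast():
        # 1406: map the position data of each XR device to its stream source.
        directory[device_id] = (position, connection_info)

    # 1408: select one of the plurality of XR devices based on its position data.
    selected_position = get_selection()
    selected = min(
        directory.items(),
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1][0], selected_position)),
    )
    device_id, (_position, connection_info) = selected

    # 1410: cause the stream mapped to the selected XR device to be presented.
    display(device_id, fetch_stream(connection_info))
```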
The process 1500 allows for accessing multiple replica streams (e.g., streams 110b, 112b, 114b, 1202b, 1204b or 1206b) from an apparatus (e.g., the moderator's apparatus 108b or 1208b) in order for the moderator (e.g., moderator 108a or 1208a) to consume at least one of the replica streams. The selection of one replica stream may be implemented via three different routes (e.g., from 1508 to 1514, from 1516 through 1518 to 1528, and from 1516 through 1530 to 1540). The moderator's apparatus (e.g., the moderator's apparatus 108b or 1208b) may comprise a phone, a tablet, an XR device, a laptop, a computer or any combination thereof. The moderator's apparatus (e.g., the moderator's apparatus 108b or 1208b) may comprise a display, at least one camera able to capture the surrounding environment, and an IMU.
At 1502, control circuitry provides an apparatus (e.g., moderator's apparatus 108b or 1208b) in communication with a plurality of XR devices (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a and 1206a). This communication may be mediated via a communication network (e.g., communication network 104, 204 or 1210) or via a direct device-to-device communication (e.g., duplets 108c/110c, 108c/112c or 108c/114c). For instance, this communication may be mediated via a communication network (e.g., communication network 104, 204 or 1210) involving the use of a server, e.g., server 106 or 210 (refer to the first and second scenarios). Alternatively, this communication may be mediated via a combination of two communication networks, each involving a server (refer to the third scenario). Alternatively, this communication may be mediated via a communication network (e.g., communication network 104, 204 or 1210) involving the use of a server (e.g., server 106 or 210), and a direct device-to-device communication, e.g., duplets 108c/110c, 108c/112c or 108c/114c (refer to the fourth scenario).
At 1504, control circuitry receives position data (e.g., spatial coordinates) of the plurality of XR devices (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a and 1206a), wherein each XR device of the plurality of XR devices renders a respective stream (e.g., stream 110b, 112b, 114b, 1202b, 1204b or 1206b). Position data (e.g., spatial coordinates) of XR devices 110a, 112a and 114a may be determined in real time using various technologies mentioned earlier, e.g., LIDAR, RFID, IMU, VIO, Bluetooth. The position data of the plurality of XR devices may be fed to SLAM algorithms to build a map representing the real-time locations of the plurality of XR devices.
At 1506, control circuitry maps the position data (e.g., spatial coordinates) of each XR device (e.g., XR devices 110a, 112a, 114a, 1202a, 1204a or 1206a) to the respective stream (e.g., streams 110b, 112b, 114b, 1202b, 1204b or 1206b) each XR device renders.
Two alternatives (i.e. 1508 or 1516) for selecting one of the plurality of XR devices are then available.
At 1512, control circuitry determines whether or not the orientation (or tilt) of the apparatus (e.g., the moderator's apparatus 108b or 1208b) intersects one of the plurality of XR devices. If so, the process proceeds to 1514; otherwise, it returns to 1508/1510.
At 1514, control circuitry causes the stream mapped to the selected XR device to be presented via the apparatus. The presentation of the stream mapped to the selected XR device, via the moderator's apparatus, results from the occurrence of either the first, second, third or fourth scenario mentioned earlier.
At 1516, control circuitry determines the position data (e.g., spatial coordinates, orientation (or tilt)) of the apparatus (e.g., the moderator's apparatus 108b or 1208b).
Two alternatives (i.e. 1518 or 1530) for selecting one of the plurality of XR devices are then available.
At 1518, control circuitry generates for display, via the apparatus (e.g., the moderator's apparatus 108b or 1208b), an environment perceived by the apparatus. The apparatus may be equipped with a camera trained to visually detect XR devices or assisted by image-processing software allowing for capturing the real scene facing the camera of the apparatus.
At 1520, control circuitry detects each XR device present in the perceived environment, using image-processing software able to visually recognize XR devices (or even XR device users) in images captured by the camera.
At 1522, control circuitry identifies each detected XR device, based on the position data of each detected XR device. By cross-referencing the real-time locations of the detected XR devices in the perceived environment with the position data of the plurality of XR devices (both using the position data of the moderator's apparatus as a reference), it is possible to identify each detected XR device relative to the moderator's apparatus.
At 1524, control circuitry generates for display, for each detected XR device, an overlay of the mapped stream, each overlay being located in the perceived environment based on the position data of the detected XR device.
At 1526, control circuitry selects one of the plurality of overlays to select one of the plurality of XR devices, based on an input from the moderator (e.g., moderator 108a or 1208a).
At 1528, control circuitry causes the stream mapped to the selected XR device to be presented via the apparatus. The presentation of the stream mapped to the selected XR device, via the moderator's apparatus, results from the occurrence of either the first, second, third or fourth scenario mentioned earlier.
At 1530, control circuitry generates for display, via the apparatus, a map representing a real-time location of each XR device and a real-time location of the apparatus, based on the position data of each XR device and the position data of the apparatus, respectively.
At 1532, control circuitry generates for display, via the apparatus, an image comprising at least one portion, wherein each of the at least one portions presents one respective stream.
At 1534, control circuitry superimposes the map on the image such that each of the at least one portions is apparent on the map.
At 1536, control circuitry places, on the map, each of the at least one portions based on the real-time location of each XR device mapped to the one respective stream each of the at least one portions presents.
At 1538, control circuitry selects one of the at least one portions of the image to select one of the plurality of XR devices based on an input from the moderator (e.g., moderator 108a or 1208a).
At 1540, control circuitry causes the stream mapped to the selected XR device to be presented via the apparatus. The presentation of the stream mapped to the selected XR device, via the moderator's apparatus, results from the occurrence of either the first, second, third or fourth scenario mentioned earlier.
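As a non-limiting sketch of steps 1530 to 1538, the code below places a small rectangular portion for each stream next to the map symbol of the XR device it is mapped to, and resolves a click inside a portion to the corresponding XR device; the portion size, the offset, and the hit-test are assumptions.

```python
# Sketch: place a stream portion near each map symbol and select a device by clicking a portion.
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def place_stream_portions(
    symbol_pixels: Dict[str, Tuple[int, int]],   # map-symbol pixel position per XR device
    portion_size: Tuple[int, int] = (160, 90),
    offset: Tuple[int, int] = (12, -12),
) -> Dict[str, Rect]:
    w, h = portion_size
    return {
        dev_id: (px + offset[0], py + offset[1], w, h)
        for dev_id, (px, py) in symbol_pixels.items()
    }

def select_by_click(click: Tuple[int, int], portions: Dict[str, Rect]) -> Optional[str]:
    """Return the XR device whose stream portion contains the click, if any (step 1538)."""
    cx, cy = click
    for dev_id, (x, y, w, h) in portions.items():
        if x <= cx <= x + w and y <= cy <= y + h:
            return dev_id
    return None
```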
Irrespective of the way the moderator's apparatus selects an XR device, the connection of the moderator's apparatus to the replica stream relating to the selected XR device enables two-way interactions (e.g., chat and/or voice communications), exclusively between the moderator's apparatus and the selected XR device, in order for the moderator to assist the XR device user and for the XR device user to explain any problem they are facing.
Additionally, the connection of the moderator's apparatus to the replica stream relating to the selected XR device enables one-way interactions from the moderator's apparatus to the XR device. For instance, the moderator's apparatus may be able to generate visual signs (e.g., arrows, road signs), via the XR device, directly on the stream being rendered by the selected XR device, in order to bring the XR device user's attention to a specific object or a specific location. The moderator's apparatus may also be able to take control of the viewpoint of the XR device in order for the moderator to bring the XR device user's attention to a specific place or specific object. Furthermore, the moderator's apparatus may be able to power the XR device off and on.
The processes described above are intended to be illustrative and not limiting. One skilled in the art would appreciate that the steps of the processes discussed herein may be omitted, modified, combined, and/or rearranged, and any additional steps may be performed without departing from the scope of the invention. More generally, the above disclosure is meant to be illustrative and not limiting. Only the claims that follow are meant to set bounds as to what the present invention includes. Furthermore, it should be noted that the features and limitations described in any one example may be applied to any other example herein, and flowcharts or examples relating to one example may be combined with any other example in a suitable manner, done in different orders, or done in parallel. In addition, the systems and methods described herein may be performed in real time. It should also be noted that the systems and/or methods described above may be applied to, or used in accordance with, other systems and/or methods.