This disclosure generally relates to artificial-reality systems.
In particular embodiments, the computing device 1201 may be associated with a user 1203. In particular embodiments, the computing device may be associated with a wearable device such as an HMD 1104 or augmented-reality glasses 1110. In particular embodiments, the computing device 1201 may be any suitable computing device that has one or more interfaces towards a user 1203 and has one or more communication links towards a user device 1205.
In particular embodiments, the computing device 1201 may receive user signals from the user 1203. In particular embodiments, the user signals may comprise voice signals of the user 1203. The voice signals may be received through a microphone associated with the computing device 1201. In particular embodiments, the user signals may comprise a point of gaze of the user 1203. The point of gaze of the user 1203 may be sensed by an eye-tracking module associated with the computing device 1201. In particular embodiments, the user signals may comprise brainwave signals sensed by a brain-computer interface (BCI) associated with the computing device 1201. In particular embodiments, the user signals may comprise any suitable combination of user input, which may comprise voice, gaze, gesture, brainwave, or any other suitable user input that is detectable by the computing device. As an example and not by way of limitation, continuing with a prior example illustrated in
In particular embodiments, the computing device 1201 may determine a user intention based on the received user signals. In order to detect the user intention, the computing device 1201 may first analyze the received user signals and then may determine the user intention based on data that maps the user signals to the user intention. In particular embodiments, the computing device may use a machine-learning model for determining the user intention. As an example and not by way of limitation, continuing with a prior example illustrated in
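As a non-limiting illustration of how received user signals might be mapped to an intention, the following Python sketch fuses hypothetical voice, gaze, gesture, and BCI inputs through a small rule table; a deployed embodiment would replace the table lookup with a trained machine-learning model, and all field names and rules here are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical signal record; the disclosure does not prescribe a specific data model.
@dataclass
class UserSignals:
    transcript: Optional[str] = None      # from the microphone / speech recognizer
    gaze_target: Optional[str] = None     # label of the object reported by the eye tracker
    gesture: Optional[str] = None         # e.g. "point_forward"
    bci_event: Optional[str] = None       # e.g. "select"

# Toy mapping from fused signals to an intention label (stand-in for a learned model).
INTENT_RULES = [
    (lambda s: s.transcript and "door" in s.transcript.lower(), "open_door"),
    (lambda s: s.gaze_target == "refrigerator" and s.bci_event == "select", "open_refrigerator"),
    (lambda s: s.gesture == "point_forward", "move_forward"),
]

def determine_intention(signals: UserSignals) -> Optional[str]:
    """Return the first intention whose rule matches the fused user signals."""
    for rule, intention in INTENT_RULES:
        if rule(signals):
            return intention
    return None

print(determine_intention(UserSignals(gesture="point_forward")))  # -> "move_forward"
```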
In particular embodiments, the computing device 1201 may construct one or more first commands for a user device 1205 based on the determined user intention. The one or more first commands may be commands that are to be executed in order by the user device 1205 to fulfill the determined user intention. In order to construct the one or more first commands for the user device 1205, the computing device 1201 may select, among one or more available user devices 1205, a user device 1205 that needs to perform one or more functions to fulfill the determined user intention. The computing device 1201 may access current status information associated with the selected user device 1205. The computing device 1201 may communicate with the selected user device 1205 to access the current status information associated with the selected user device 1205. The current status information may comprise current environment information surrounding the selected user device 1205 or information associated with a current state of the selected user device 1205. The computing device 1201 may construct the one or more first commands that are to be executed by the selected user device 1205, starting from the current status associated with the selected user device 1205, to fulfill the determined user intention. As an example and not by way of limitation, continuing with a prior example illustrated in
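The following sketch shows, under assumed and simplified status fields (a heading and an obstacle flag), how an ordered command sequence might be planned for a selected device such as a power wheelchair; the command vocabulary is hypothetical and not prescribed by this disclosure.

```python
from dataclasses import dataclass

@dataclass
class DeviceStatus:              # hypothetical status record returned by the user device
    heading_deg: float
    obstacle_ahead: bool

def construct_commands(intention: str, status: DeviceStatus) -> list[dict]:
    """Plan an ordered command sequence from the intention and current device status."""
    if intention == "move_forward":
        commands = []
        if status.obstacle_ahead:
            commands.append({"op": "turn", "degrees": 30})   # route around the obstacle first
        commands.append({"op": "drive", "distance_m": 1.0})
        return commands
    raise ValueError(f"no plan for intention {intention!r}")

print(construct_commands("move_forward", DeviceStatus(heading_deg=0.0, obstacle_ahead=True)))
```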
In particular embodiments, the computing device 1201 may send one of the one or more first commands to the user device 1205. The user device 1205 may comprise a communication module to communicate with the computing device 1201. The user device 1205 may be capable of executing each of the one or more commands upon receiving the command from the computing device 1201. In particular embodiments, the user device may comprise a power wheelchair, a refrigerator, a television, a heating, ventilation, and air conditioning (HVAC) device, or any Internet of Things (IoT) device. As an example and not by way of limitation, continuing with a prior example, the communication module 1350 of the augmented-reality glasses 1410 may send a first command of the one or more commands constructed by the command generation module 1340 to the power wheelchair 1420 through the established secure wireless communication link 1407. The wireless communication interface 1423 of the power wheelchair 1420 may receive the first command from the communication module 1350 of the augmented-reality glasses 1410. The wireless communication interface 1423 may forward the first command to an embedded processing unit. The embedded processing unit may be capable of executing each of the one or more commands generated by the command generation module 1340 of the augmented-reality glasses 1410. Although this disclosure describes sending a command to the user device in a particular manner, this disclosure contemplates sending a command to the user device in any suitable manner.
In particular embodiments, the computing device 1201 may receive status information associated with the user device 1205 from the user device 1205. The status information may be sent by the user device 1205 in response to the one of the one or more first commands. The status information may comprise current environment information surrounding the user device 1205 or information associated with a current state of the user device 1205 upon executing the one of the one or more first commands. In particular embodiments, the computing device 1201 may determine that the one of the one or more first commands has been successfully executed by the user device 1205 based on the status information. The computing device 1201 may send one of the remaining first commands to the user device 1205. As an example and not by way of limitation, continuing with a prior example illustrated in
In particular embodiments, the computing device 1201 may, upon receiving status information from the user device 1205, determine that the environment surrounding the user device 1205 has changed since the one or more first commands were constructed. The computing device 1201 may determine that the state of the user device 1205 has changed since the one or more first commands were constructed. The computing device 1201 may determine that those changes require modifications to the one or more first commands. The computing device 1201 may construct one or more second commands for the user device 1205 based on the determination. The one or more second commands may be updated commands from the one or more first commands based on the received status information. The one or more second commands are to be executed by the user device 1205 to fulfill the determined user intention given the updated status associated with the user device 1205. The computing device 1201 may send one of the one or more second commands to the user device 1205. As an example and not by way of limitation, continuing with a prior example illustrated in
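A minimal sketch of the resulting command loop is shown below: commands are sent one at a time, and if the returned status indicates a changed environment, the remaining first commands are replaced by newly constructed second commands. The `send` interface and the status flags are assumptions for illustration.

```python
def execute_with_feedback(device, first_commands, construct_commands, intention):
    """Send commands one at a time and react to the returned status.

    `device.send(cmd)` is a hypothetical call returning a status dict with "ok" and
    "environment_changed" flags; when the environment has changed, the remaining
    first commands are replaced with freshly constructed second commands."""
    queue = list(first_commands)
    while queue:
        cmd = queue.pop(0)
        status = device.send(cmd)
        if status.get("environment_changed"):
            queue = construct_commands(intention, status)   # the "second commands"
        elif not status.get("ok", False):
            raise RuntimeError(f"device failed to execute {cmd!r}")
    return True
```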
In particular embodiments, a computing device may generate a three-dimensional first-resolution digital map of a geographic area in the real world based on second-resolution observations of the geographic area using a machine-learning model, where the first resolution is higher than the second resolution. In particular embodiments, the second-resolution observations may be two-dimensional images. In particular embodiments, the second-resolution observations may be three-dimensional point clouds. In particular embodiments, the second-resolution observations may be captured by a camera associated with a user device such as augmented-reality glasses or a smartphone. A digital map may comprise a three-dimensional feature layer comprising three-dimensional point clouds and a contextual layer comprising contextual information associated with points in the point clouds. With a digital map, a user device, such as augmented-reality glasses, may be able to tap into the digital map rather than reconstructing the surroundings in real time, which allows a significant reduction in required compute power. Thus, a user device with a less powerful mobile chipset may be able to provide better artificial-reality services to the user. With the digital maps, the user device may provide a teleportation experience to the user. Also, the user may be able to search and share real-time information about the physical world using the user device. Applications of the digital maps may include, but are not limited to, a digital assistant that brings the user information associated with the user's current location in real time, and an overlay that allows the user to anchor virtual content in the real world. For example, a user associated with augmented-reality glasses may get showtimes just by looking at a movie theater's marquee. Previously, generating a high-resolution digital map of an area might require a plurality of high-resolution images capturing the geographic area. This approach requires high computing resources. Furthermore, the digital map generated by this approach may lack contextual information. The systems and methods disclosed in this application allow generating the first-resolution digital map based on the second-resolution images. The generated digital map may comprise contextual information associated with points in the point clouds. Although this disclosure describes generating a three-dimensional high-resolution digital map of a geographic area in the real world based on low-resolution observations of the geographic area using a machine-learning model in a particular manner, this disclosure contemplates generating a three-dimensional high-resolution digital map of a geographic area in the real world based on low-resolution observations of the geographic area using a machine-learning model in any suitable manner.
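As one non-limiting way to represent the two map layers, the following sketch pairs an N x 3 point array (the three-dimensional feature layer) with per-point metadata (the contextual layer); the field names and metadata keys are illustrative only.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class DigitalMap:
    """Two-layer map: a 3-D feature layer (point cloud) plus a contextual layer
    keyed by point index (illustrative structure, not prescribed by the disclosure)."""
    points: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # N x 3 xyz
    context: dict[int, dict] = field(default_factory=dict)                # per-point metadata

    def add_point(self, xyz, **metadata):
        idx = len(self.points)
        self.points = np.vstack([self.points, np.asarray(xyz, dtype=float)])
        if metadata:
            self.context[idx] = metadata
        return idx

m = DigitalMap()
i = m.add_point([1.0, 2.0, 0.5], label="marquee", text="showtimes 7pm / 9pm")
print(m.points.shape, m.context[i]["label"])
```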
In particular embodiments, the computing device may generate one or more high-resolution representations of one or more objects by processing the set of semantic classified low-resolution observations 2201 for the geographic area, camera poses 2203 associated with the low-resolution observations, and the low-resolution map 2205 for the geographic area using a machine-learning model 2210. The machine-learning model 2210 may be a collection of generative continuous models 2210A, 2210B, 2210N. Each of the generative continuous models 2210A, 2210B, and 2210N corresponds to a semantic class of an object in the observations. In particular embodiments, objects detected within the low-resolution observations may be semantically classified. Thus, a semantic classified observation 2201, along with the corresponding camera poses 2203 and the low-resolution map 2205, may be processed through a corresponding generative continuous model within the machine-learning model 2210. The semantic classes may include, but are not limited to, humans, animals, natural landscapes, structures, manufactured items, and furniture. Each of the generative continuous models 2210A, 2210B, and 2210N within the machine-learning model 2210 may be trained separately using respectively prepared training data. Technical details for the generative continuous models 2210A, 2210B, and 2210N can be found in arXiv:2003.10983 (2020), arXiv:1901.05103 (2019), arXiv:1809.05068 (2018), and arXiv:2005.05125 (2020). Although this disclosure describes generating one or more high-resolution representations of one or more objects by processing the set of semantic classified low-resolution observations, camera poses, and low-resolution map in a particular manner, this disclosure contemplates generating one or more high-resolution representations of one or more objects by processing the set of semantic classified low-resolution observations, camera poses, and low-resolution map in any suitable manner.
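The routing of semantic classified observations to class-specific generative models might be organized as in the following sketch, where each model is looked up by the observation's semantic class; the dictionary interface, the observation fields, and the stand-in models are assumptions for illustration.

```python
import numpy as np

def upscale_scene(observations, camera_poses, low_res_map, models):
    """Route each semantically classified observation to the generative model
    registered for its class and collect the high-resolution representations.

    `models` maps a semantic class name to a callable taking
    (observation, camera_pose, low_res_map) and returning a representation."""
    high_res = []
    for obs in observations:                      # each observation carries its semantic class
        model = models.get(obs["semantic_class"])
        if model is None:
            continue                              # no generative model for this class
        high_res.append(model(obs, camera_poses[obs["frame_id"]], low_res_map))
    return high_res

# Toy usage with a stand-in model that just tags its input.
models = {"furniture": lambda o, p, m: {"class": "furniture", "source": o["id"]}}
obs = [{"id": 7, "semantic_class": "furniture", "frame_id": 0}]
print(upscale_scene(obs, camera_poses=[np.eye(4)], low_res_map=None, models=models))
```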
In particular embodiments, the computing device may combine the high-resolution digital representations of the one or more objects identified in the semantic classified low-resolution observations 2201. The computing device may perform a scene level optimization using a scene level optimizer 2220 to create a high-resolution three-dimensional scene 2209. For example, the computing device may optimize the combined representations to fit the low-resolution map 2205. Although this disclosure describes post-inference processes for generating a high-resolution scene in a particular manner, this disclosure contemplates post-inference processes for generating a high-resolution scene in any suitable manner.
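A scene-level optimizer could take many forms; the sketch below is a deliberately reduced, translation-only ICP-style refinement that nudges the combined high-resolution points toward the low-resolution map, standing in for the scene level optimizer 2220 rather than reproducing it.

```python
import numpy as np
from scipy.spatial import cKDTree

def scene_level_optimize(high_res_pts, low_res_map_pts, iters=5):
    """Translation-only, ICP-style refinement: shift the combined high-resolution
    points so that they fit the low-resolution map more closely."""
    pts = np.asarray(high_res_pts, dtype=float).copy()
    map_pts = np.asarray(low_res_map_pts, dtype=float)
    tree = cKDTree(map_pts)
    for _ in range(iters):
        _, idx = tree.query(pts)                   # nearest low-resolution map point per point
        pts += (map_pts[idx] - pts).mean(axis=0)   # single shared translation update
    return pts
```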
In particular embodiments, training the machine-learning model 2210 may comprise training each of the generative continuous models 2210A, 2210B, and 2210N. The computing device may train a plurality of generative continuous models (e.g., using the auto-decoder described in arXiv:1901.05103 (2019)) for different classes of objects (e.g., one model for furniture, another for trees, etc.) using prepared training data for each class. Each generative model may be conditioned on a latent code to represent the manifold of geometry and appearances. A generative model may be a combination of a decoder plus a latent code. Each generative continuous model may employ a different architecture and training scheme to exploit similarities within its class and reduce the capacity needed for the model to generalize to everything. For example, a generative continuous model for humans or animals may be a codec-avatar-like scheme, while a generative continuous model for furniture may be the model described in arXiv:2005.05125 (2020). A generative continuous model for landscapes may utilize procedural synthesis techniques. Although this disclosure describes training a generative continuous model for a semantic class in a particular manner, this disclosure contemplates training a generative continuous model for a semantic class in any suitable manner.
In particular embodiments, a computing device may train a machine-learning model 2210 that comprises a plurality of generative continuous models 2210A, 2210B, and 2210N. The computing device may train each generative continuous model one by one.
In particular embodiments, the computing device may train the high-resolution encoder 2310 and the decoder 2320 using the set of semantic classified high-resolution observations 2301 as training data. The high-resolution encoder 2310 may generate a latent code 2303 for a given semantic classified high-resolution observation 2301. The decoder 2320 may generate a high-resolution three-dimensional representation 2305 for a given latent code 2303. The gradients may be computed using a loss function based on the difference between a ground truth high-resolution three-dimensional representation and the generated high-resolution three-dimensional representation 2305 for each semantic classified high-resolution observation 2301 in the set of training samples. A backpropagation procedure with the computed gradients may be used for training the high-resolution encoder 2310 and the decoder 2320 until a training goal is reached. Although this disclosure describes training the high-resolution encoder and the decoder of an auto-encoder generative continuous model in a particular manner, this disclosure contemplates training the high-resolution encoder and the decoder of an auto-encoder generative continuous model in any suitable manner.
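A minimal PyTorch training-step sketch for the high-resolution encoder 2310 and decoder 2320 is shown below; the flattened observation and representation dimensions, the MLP architectures, and the mean-squared reconstruction loss are assumptions for illustration (real embodiments would use convolutional or implicit networks and task-specific losses).

```python
import torch, torch.nn as nn

# Hypothetical shapes: observations are flattened high-resolution patches and the
# "representation" is a flattened occupancy/SDF sample vector.
enc_hi = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 64))   # encoder 2310
dec    = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2048))   # decoder 2320
opt = torch.optim.Adam(list(enc_hi.parameters()) + list(dec.parameters()), lr=1e-4)

def train_step(hi_res_obs, gt_representation):
    """One gradient step: encode the high-res observation, decode a representation,
    and backpropagate the reconstruction loss against the ground truth."""
    latent = enc_hi(hi_res_obs)            # latent code 2303
    pred   = dec(latent)                   # high-resolution 3-D representation 2305
    loss = nn.functional.mse_loss(pred, gt_representation)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy batch just to show the call signature.
print(train_step(torch.randn(8, 1024), torch.randn(8, 2048)))
```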
In particular embodiments, once the training of the high-resolution encoder 2310 and the decoder 2320 of an auto-encoder generative continuous model finishes, the computing device may train the low-resolution encoder 2330. The computing device may prepare a set of low-resolution observations 2307 respectively corresponding to the set of semantic classified high-resolution observations 2301. The computing device may train the low-resolution encoder 2330 using the prepared set of low-resolution observations 2307. The low-resolution encoder 2330 may generate a latent code 2303 for a given low-resolution observation 2307. The computing device may compute gradients using a loss function based on the difference between the generated latent code 2303 and a latent code 2303 that the high-resolution encoder 2310 generates for the corresponding high-resolution observation 2301. A backpropagation procedure with the computed gradients may be used for training the low-resolution encoder 2330. The details of training an auto-encoder generative continuous model may be found in arXiv:2003.10983 (2020), arXiv:1901.05103 (2019), arXiv:1809.05068 (2018), and arXiv:2005.05125 (2020). Although this disclosure describes training the low-resolution encoder of an auto-encoder generative continuous model in a particular manner, this disclosure contemplates training the low-resolution encoder of an auto-encoder generative continuous model in any suitable manner.
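Training the low-resolution encoder 2330 then reduces to matching its latent codes to those produced by the frozen high-resolution encoder 2310, as in this sketch; the dimensions and the mean-squared objective are assumptions for illustration.

```python
import torch, torch.nn as nn

enc_hi = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 64))   # 2310, trained and frozen
enc_lo = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))    # 2330, being trained
opt = torch.optim.Adam(enc_lo.parameters(), lr=1e-4)

def train_low_res_step(lo_res_obs, hi_res_obs):
    """Match the low-res encoder's latent code to the (frozen) high-res encoder's
    code for the corresponding high-resolution observation."""
    with torch.no_grad():
        target_latent = enc_hi(hi_res_obs)          # latent code 2303 from the trained encoder
    pred_latent = enc_lo(lo_res_obs)
    loss = nn.functional.mse_loss(pred_latent, target_latent)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_low_res_step(torch.randn(8, 256), torch.randn(8, 1024)))
```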
In particular embodiments, the generative continuous model may be an auto-decoder generative continuous model.
In particular embodiments, the computing device may train the auto-decoder generative continuous model. During the training procedure, the plurality of latent codes 2353 and the decoder 2360 may be optimized to generate a high-resolution three-dimensional representation 2355 for a given latent code 2353 representing a shape. The gradients may be computed using a loss function based on the difference between a ground truth high-resolution three-dimensional representation corresponding to a shape in the prepared set of training samples and the generated high-resolution three-dimensional representation 2355 for a given latent code corresponding to the shape. A backpropagation procedure with the computed gradients may be used for training the decoder 2360 and for optimizing the plurality of latent codes 2353. Although this disclosure describes training an auto-decoder generative continuous model in a particular manner, this disclosure contemplates training an auto-decoder generative continuous model in any suitable manner.
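In the auto-decoder setting there is no encoder: a table of per-shape latent codes 2353 is optimized jointly with the decoder 2360, in the spirit of arXiv:1901.05103 (2019). The following sketch uses an embedding table, an MLP decoder, and a mean-squared loss as stand-ins; dimensions and architecture are assumptions for illustration.

```python
import torch, torch.nn as nn

num_shapes, latent_dim = 100, 64
latents = nn.Embedding(num_shapes, latent_dim)                  # latent codes 2353, one per shape
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 2048))  # decoder 2360
opt = torch.optim.Adam(list(latents.parameters()) + list(decoder.parameters()), lr=1e-4)

def train_step(shape_ids, gt_representations):
    """Jointly optimize the decoder and the per-shape latent codes (no encoder)."""
    z = latents(shape_ids)                        # look up the latent code for each shape
    pred = decoder(z)                             # high-resolution representation 2355
    loss = nn.functional.mse_loss(pred, gt_representations)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.tensor([0, 1, 2]), torch.randn(3, 2048)))
```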
In particular embodiments, the computing device may estimate an optimal latent code 2353 for a given semantic classified low-resolution observation when generating high-resolution scenes based on low-resolution observations using the auto-decoder generative continuous model. The estimated optimal latent code 2353 may be provided to the auto-decoder generative continuous model to generate a high-resolution three-dimensional representation. An auto-decoder generative continuous model can be trained with high-resolution training data only, without requiring low-resolution training data. However, low-resolution data can be used for inferring high-resolution three-dimensional representations. The details of training an auto-decoder generative continuous model and inferring high-resolution three-dimensional representations may be found in arXiv:1901.05103 (2019). Although this disclosure describes generating high-resolution three-dimensional representations using an auto-decoder generative continuous model in a particular manner, this disclosure contemplates generating high-resolution three-dimensional representations using an auto-decoder generative continuous model in any suitable manner.
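At inference time, one common way to estimate the optimal latent code is gradient descent against the low-resolution observation while the decoder stays frozen; in the sketch below, the `observe` operator, which maps a decoded representation into the low-resolution domain for comparison, is a hypothetical placeholder, as are the dimensions.

```python
import torch, torch.nn as nn

def infer_latent(decoder, observe, target, latent_dim=64, steps=200, lr=1e-2):
    """Estimate a latent code for a low-resolution observation by gradient descent
    on the (frozen) decoder; `observe` renders the decoder output down to the
    low-resolution domain so it can be compared to `target`."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(observe(decoder(z)), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()              # feed this code to the decoder for the high-res output

decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2048))
z = infer_latent(decoder, observe=lambda x: x[:, :128], target=torch.randn(1, 128))
print(z.shape)
```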
Visual Odometry without Initialization
In particular embodiments, a computing device 3108 may access a first frame 3201 of a video stream captured by a camera associated with the computing device 3108. The computing device 3108 may also access signals 3203 from IMU sensors associated with the camera. As an example and not by way of limitation, an artificial-reality application may run on the computing device 3108. The artificial-reality application may need to construct a map associated with the environment that is being captured by the camera associated with the computing device 3108. A position and/or a pose of the camera may be required to construct the map. Thus, the computing device 3108 may activate the camera associated with the computing device 3108. Frame-to-Frame Tracker 3210 may access a series of image frames 3201 captured by the camera associated with the computing device 3108. The computing device 3108 may also activate IMU sensors associated with the camera. Frame-to-Frame Tracker 3210 may also access real-time signals 3203 from IMU sensors associated with the camera. Although this disclosure describes accessing an image frame and IMU signals in a particular manner, this disclosure contemplates accessing an image frame and IMU signals in any suitable manner.
In particular embodiments, the computing device 3108 may compute bearing vectors 3205 corresponding to tracked features in the first frame. To compute the bearing vectors 3205 corresponding to the tracked features in the first frame, the computing device 3108 may access bearing vectors 3205 corresponding to the tracked features in a previous frame of the first frame. The computing device 3108 may compute bearing vectors 3205 corresponding to the tracked features in the first frame based on the computed bearing vectors 3205 corresponding to the tracked features in the previous frame and an estimated relative pose of the camera corresponding to the first frame with respect to the previous frame. In particular embodiments, epipolar constraints may be used to reduce a search radius for computing the bearing vectors 3205 corresponding to the tracked features in the first frame. As an example and not by way of limitation, continuing with a prior example, Frame-to-Frame Tracker 3210 may compute bearing vectors 3205 corresponding to tracked features in frame t. Frame-to-Frame Tracker 3210 may access computed bearing vectors 3205 corresponding to the tracked features in frame t-1. Frame-to-Frame Tracker 3210 may estimate a relative pose of the camera corresponding to frame t with respect to frame t-1. Frame-to-Frame Tracker 3210 may compute bearing vectors 3205 corresponding to the tracked features in frame t based on the computed bearing vectors 3205 corresponding to the tracked features in frame t-1 and the estimated relative pose of the camera corresponding to frame t with respect to frame t-1. Frame-to-Frame Tracker 3210 may use epipolar constraints to reduce a search radius for computing the bearing vectors 3205 corresponding to the tracked features in frame t. Frame-to-Frame Tracker 3210 may forward the computed bearing vectors 3205 corresponding to the tracked features in frame t to First Frame Pose Estimator 3220. Although this disclosure describes computing bearing vectors corresponding to tracked features in a frame in a particular manner, this disclosure contemplates computing bearing vectors corresponding to tracked features in a frame in any suitable manner.
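As a simplified illustration, bearing vectors may be obtained by unprojecting pixel coordinates with the camera intrinsics, and the previous frame's bearings may be rotated by the estimated relative pose to predict where to search in the current frame; the intrinsics and the omission of the epipolar band itself are simplifications for brevity.

```python
import numpy as np

def pixel_to_bearing(uv, K):
    """Unproject a pixel coordinate into a unit bearing vector using intrinsics K."""
    x = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return x / np.linalg.norm(x)

def predict_bearing(prev_bearing, R_rel):
    """Rotate the previous frame's bearing by the estimated relative rotation to
    predict the feature's bearing in the current frame; the local feature search
    is then restricted to a neighborhood of this prediction."""
    b = R_rel @ prev_bearing
    return b / np.linalg.norm(b)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
b_prev = pixel_to_bearing((400.0, 260.0), K)
print(predict_bearing(b_prev, np.eye(3)))     # identity rotation used only for illustration
```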
In particular embodiments, the relative pose of the camera corresponding to the first frame with respect to the previous frame may be estimated based on signals 3203 from the IMU sensors. As an example and not by way of limitation, continuing with a prior example, Frame-to-Frame Tracker 3210 may estimate the relative pose of the camera corresponding to frame t with respect to frame t-1 based on signals 3203 from the IMU sensors. Although this disclosure describes estimating a relative pose of a camera corresponding to a frame with respect to a previous frame in a particular manner, this disclosure contemplates estimating a relative pose of a camera corresponding to a frame with respect to a previous frame in any suitable manner.
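One common way to obtain such an IMU-based prediction is to integrate the gyroscope's angular-velocity samples between the two frames, as sketched below; camera-IMU extrinsic calibration and gyro-bias handling are omitted for brevity.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: map a rotation vector (axis * angle) to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def integrate_gyro(gyro_samples, dt):
    """Chain small rotations from gyroscope angular-velocity samples (rad/s) taken
    between two frames to get the predicted relative rotation of the camera."""
    R = np.eye(3)
    for omega in gyro_samples:
        R = R @ so3_exp(np.asarray(omega) * dt)
    return R

samples = [(0.0, 0.0, 0.1)] * 10        # yawing at 0.1 rad/s, 10 samples at 100 Hz
print(integrate_gyro(samples, dt=0.01))
```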
In particular embodiments, the computing device 3108 may compute a rotation 3207 and an unscaled translation 3309 of the camera corresponding to the first frame with respect to a previous keyframe. Computing the rotation 3207 and the unscaled translation 3309 of the camera corresponding to the first frame with respect to the previous keyframe may comprise optimizing an objective function of a 3-degree-of-freedom (DoF) rotation and a 2-DoF unit-norm translation. In particular embodiments, the computing device 3108 may minimize the Jacobians of the objective function instead of minimizing the objective function. This approach may make the dimension of the residual equal to the number of unknowns. The computing device 3108 may also improve the results by including the objective function itself in the cost function. The properties of the estimation can be tuned by weighting the Jacobians and the one-dimensional residual differently. As an example and not by way of limitation, the relative pose estimator module 3320 may compute a rotation 3207 and an unscaled translation 3309 of the camera corresponding to frame t with respect to a previous keyframe k, where k<t. The relative pose estimator module 3320 may utilize bearing vectors 3205 corresponding to the tracked features in frame t and bearing vectors 3205 corresponding to the tracked features in frame k for optimizing the objective function. Although this disclosure describes computing a rotation and an unscaled translation of the camera corresponding to the first frame with respect to a previous keyframe in a particular manner, this disclosure contemplates computing a rotation and an unscaled translation of the camera corresponding to the first frame with respect to a previous keyframe in any suitable manner.
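For concreteness, the sketch below optimizes a plain least-squares version of the epipolar objective over a 3-DoF axis-angle rotation and a 2-DoF spherical-angle unit translation; it does not reproduce the Jacobian-minimizing and weighting scheme described above, and SciPy's generic solver stands in for the disclosed estimator.

```python
import numpy as np
from scipy.optimize import least_squares

def so3_exp(w):
    """Rodrigues' formula: rotation vector (axis * angle) -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def unit_translation(phi, psi):
    """2-DoF unit-norm translation parameterized by two spherical angles."""
    return np.array([np.cos(phi) * np.cos(psi), np.cos(phi) * np.sin(psi), np.sin(phi)])

def residuals(params, f_key, f_cur):
    """Epipolar (coplanarity) residuals r_i = f_cur_i . (t x (R f_key_i))."""
    R = so3_exp(params[:3])
    t = unit_translation(params[3], params[4])
    return np.einsum('ij,ij->i', f_cur, np.cross(t, (R @ f_key.T).T))

def estimate_relative_pose(f_key, f_cur, w0=np.zeros(3)):
    """Solve for the 3-DoF rotation and 2-DoF unit translation between a keyframe
    and the current frame from matched unit bearing vectors (N x 3 arrays)."""
    sol = least_squares(residuals, np.concatenate([w0, [0.0, 0.0]]), args=(f_key, f_cur))
    return so3_exp(sol.x[:3]), unit_translation(sol.x[3], sol.x[4])

# Synthetic check: bearings of the same points seen from two poses (P_cur = R P_key + t).
rng = np.random.default_rng(0)
P = rng.uniform(-1, 1, (50, 3)) + [0.0, 0.0, 4.0]
R_true, t_true = so3_exp(np.array([0.0, 0.05, 0.0])), np.array([0.3, 0.0, 0.0])
f_key = P / np.linalg.norm(P, axis=1, keepdims=True)
P2 = (R_true @ P.T).T + t_true
f_cur = P2 / np.linalg.norm(P2, axis=1, keepdims=True)
R_est, t_est = estimate_relative_pose(f_key, f_cur)
print(R_est, t_est)
```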
In particular embodiments, the computing device 3108 may remove outliers by only estimating the direction of the translation vector using a closed-form solution. The inputs to the closed-form solution may be the relative rotation (gyro prediction 3211) and the bearing vectors 3205. Once the outliers are removed, the computing device 3108 may re-estimate the relative transformation using the relative pose estimator module 3320. If a good gyro prediction 3211 is not available, the computing device 3108 may randomly generate a gyro prediction 3211 within a random sample consensus (RANSAC) framework. Although this disclosure describes removing outlier features in a particular manner, this disclosure contemplates removing outlier features in any suitable manner.
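Given a relative rotation (for example, the gyro prediction 3211), the translation direction has a closed form: each correspondence contributes one linear constraint on the unit translation, and the direction is the smallest singular vector of the stacked constraints. The sketch below also derives an inlier mask from the resulting epipolar errors; the threshold is illustrative, and when no gyro prediction is available, a RANSAC loop over sampled rotation hypotheses could wrap these functions.

```python
import numpy as np

def translation_direction(R, f_key, f_cur):
    """Closed-form translation direction given a relative rotation: each
    correspondence gives a linear constraint c_i . t = 0 with
    c_i = (R f_key_i) x f_cur_i, so t is the smallest singular vector of the stack."""
    C = np.cross((R @ f_key.T).T, f_cur)
    _, _, Vt = np.linalg.svd(C)
    return Vt[-1]                       # unit vector, sign-ambiguous

def remove_outliers(R, f_key, f_cur, thresh=1e-2):
    """Flag correspondences whose epipolar error under the closed-form direction
    exceeds a threshold; the remaining inliers are re-fed to the pose estimator."""
    t = translation_direction(R, f_key, f_cur)
    err = np.abs(np.cross((R @ f_key.T).T, f_cur) @ t)
    return err < thresh                 # boolean inlier mask
```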
In particular embodiments, the previous keyframe may be determined based on heuristics by the keyframe heuristics module 3310. In particular embodiments, the keyframe heuristics module 3310 may determine a new keyframe when computing a rotation 3207 and an unscaled translation 3309 of the camera corresponding to a frame with respect to the previous keyframe fails. As an example and not by way of limitation, the relative pose estimator module 3320 may fail to compute a rotation 3207 and an unscaled translation 3309 of the camera corresponding to frame t with respect to the previous keyframe k because the tracked features in the previous keyframe k may not match well to the tracked features in frame t. In such a case, the keyframe heuristics module 3310 may determine a new keyframe k′. In particular embodiments, frame k′ may be a later frame than frame k. In particular embodiments, the keyframe heuristics module 3310 may determine a new keyframe at a regular interval. The regular interval may become short when the camera moves fast, while the regular interval may become long when the camera moves slowly. As an example and not by way of limitation, when the camera moves fast, the probability that a feature in one frame does not appear in another frame becomes higher. Thus, the keyframe heuristics module 3310 may set the regular interval to be short, such that a new keyframe is determined more often. When the camera moves slowly, the keyframe heuristics module 3310 may set the regular interval to be long, such that a new keyframe is determined less often. Although this disclosure describes determining a new keyframe in a particular manner, this disclosure contemplates determining a new keyframe in any suitable manner.
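A compact, hypothetical form of such a heuristic is shown below: a new keyframe is declared either on estimation failure or once a motion-dependent interval has elapsed; the constants are illustrative only.

```python
def need_new_keyframe(pose_estimation_failed, frames_since_keyframe, camera_speed,
                      base_interval=15.0):
    """Declare a new keyframe when relative-pose estimation against the old keyframe
    fails, or when a motion-dependent interval has elapsed (shorter when moving fast)."""
    interval = max(3, int(base_interval / max(camera_speed, 0.1)))
    return pose_estimation_failed or frames_since_keyframe >= interval
```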
In particular embodiments, the computing device 3108 may determine a scaled translation 3209 of the camera corresponding to the first frame with respect to the previous keyframe by computing a scale of the translation. Determining the scale of the translation may comprise minimizing the squared re-projection errors between features of the first frame and features of the previous keyframe re-projected into the first frame, using the estimated depths of the features. A Gauss-Newton algorithm may be used for the minimization. As the depth of the features is not known for the first frame, a constant depth may be assumed. As an example and not by way of limitation, the scale estimator module 3330 may determine a scaled translation of the camera corresponding to frame t with respect to the previous keyframe k. The scale estimator module 3330 may re-project the tracked features in the previous keyframe k into frame t. The scale estimator module 3330 may minimize the squared re-projection errors of the features with estimated depth acquired from a depth estimator module 3340. The depth estimator module 3340 may estimate the depth of features by point filters of a 3D-2D tracker. Although this disclosure describes determining a scaled translation of the camera in a particular manner, this disclosure contemplates determining a scaled translation of the camera in any suitable manner.
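The scale refinement can be reduced to a one-dimensional Gauss-Newton iteration, as in the following sketch: keyframe bearing vectors are back-projected with their estimated (or assumed constant) depths, transferred with the known rotation and unit translation, and the scalar scale is updated from a numerically differentiated re-projection residual. The interfaces and the numerical derivative are simplifications for illustration.

```python
import numpy as np

def project(X, K):
    """Pinhole projection of 3-D points (N x 3) into pixel coordinates (N x 2)."""
    uvw = (K @ X.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def estimate_scale(R, t_unit, f_key, depths, uv_cur, K, s=1.0, iters=10):
    """One-dimensional Gauss-Newton on the squared re-projection error.

    Keyframe bearings f_key with depths are back-projected, transferred into the
    current frame as R*(d*f) + s*t_unit, and projected; only the scalar s is refined."""
    X_key = np.asarray(depths, dtype=float)[:, None] * np.asarray(f_key, dtype=float)
    uv_cur = np.asarray(uv_cur, dtype=float)
    for _ in range(iters):
        r = (uv_cur - project((R @ X_key.T).T + s * t_unit, K)).ravel()
        eps = 1e-6
        r_eps = (uv_cur - project((R @ X_key.T).T + (s + eps) * t_unit, K)).ravel()
        J = (r_eps - r) / eps                     # dr/ds, stacked over all features
        s -= float(J @ r) / float(J @ J)          # Gauss-Newton step for a single unknown
    return s
```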
In particular embodiments, the computing device 3108 may send the rotation 3207 and the scaled translation 3209 of the camera corresponding to the first frame with respect to the previous keyframe to an application utilizing pose information. As an example and not by way of limitation, an artificial-reality application may utilize the pose information. The FFT 3200 may send the rotation 3207 and the scaled translation 3209 of the camera to the artificial-reality application. Although this disclosure describes sending the rotation and the scaled translation of the camera to an application in a particular manner, this disclosure contemplates sending the rotation and the scaled translation of the camera to an application in any suitable manner.
Different computing devices have different advantages. Tradeoffs are made between computing power, battery life, accessibility, and visual range. For example, glasses rank highly in visual range but have lower computing power and battery life than a laptop. The ability to connect multiple devices through a network opens the door to mixing and matching some of these advantages. Running applications (apps) can take up a large amount of computing power and battery life. For this reason, it is desirable to run the apps on a computing device with more system resources, such as a watch, and project the images onto a device that, though it has more limited system resources, is in a better visual range for the user, such as smart glasses. However, the amount of data transfer required to move an image from a watch to glasses over a network is immense, causing delays and excessive power loss. Thus, it would be beneficial to have a method of reducing the amount of data transfer required between these two devices. It also may be desirable to be able to run multiple apps at once in different lines of sight, much like using multiple monitors at a workstation but for use when a person is on the go.
This invention describes systems and processes that enable one mobile device to use the display of another mobile device to display content. For ease of reference and clarity, this disclosure uses the collaboration between a smart watch and a pair of smart glasses as an example to explain the techniques described herein. However, the computing device where the app resides (transferor device) or where the content is displayed (transferee device) may be, for example, a smart watch, smart glasses, a cell phone, a tablet, or a laptop. This invention solves the previously described problem of massive amounts of data transfer by sending instructions to the glasses for forming an image rather than sending the image itself.
In one embodiment, the outputting computing device, such as a smart watch, does the bulk of the computing. An app, such as a fitness app, is run on this device. The user may be wearing a smart watch on her wrist and a pair of smart glasses on her face. While the smart watch has the power to run her apps, in many instances, such as during exercise, it may be inconvenient to have to look down at her watch.
An embodiment of the invention is directed to a method that solves problems associated with large amounts of data transfer and differences in display size between two connected devices. This connection can be through wires or through a variety of wireless means, such as a local area network (LAN) like Wi-Fi, or a personal area network (PAN) using Bluetooth, infrared, Zigbee, or ultrawideband (UWB) technology. Many methods allow for a short-range connection between two or more devices. For example, an individual may own a watch and glasses and wish to use them at the same time in a way that data can be exchanged between them in real time. The devices, such as a watch and glasses, may be different in terms of size, computational power, and display.
For example, a person may be running while wearing a watch and glasses, each being equipped with a computational device that is capable of running and displaying content generated by apps. This individual may run apps primarily on the watch, which has a higher computational capability, storage, or power or thermal capacity. The individual may wish to be able to view one app on the watch while viewing another on the display of the glasses. The user may instruct the watch to send content generated by the second app to the glasses for display. In one embodiment, the user's instruction may cause the CPU of the watch to generate rendering commands for the GPU to render the visual aspects associated with the app. If the app's content is to be shown on the watch display, the rendering command is sent directly to the GPU of the watch. If, however, the user wishes the visual aspects associated with the app to be displayed on the glasses display, the rendering command is sent over the connection to the GPU of the glasses. It is the GPU of the glasses that renders the visual aspects associated with the app. This is different from the naïve method of sending the completed image over the connection to the glasses display. It saves costs associated with data transfer, since the commands (generated instructions) require less data than the rendered image.
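The routing decision can be summarized by the following sketch, in which a compact, serialized rendering command is either submitted to the watch's own GPU queue or framed and sent over the connection to the glasses; the command record, wire format, and address are assumptions for illustration and do not correspond to any specific GPU API.

```python
from dataclasses import dataclass
import json, socket

@dataclass
class RenderCommand:                    # illustrative command record, not a real GPU API
    op: str
    args: dict

def submit_to_local_gpu(cmd: RenderCommand):
    print("local GPU executes", cmd.op)              # stand-in for a real driver call

def dispatch(cmd: RenderCommand, target: str, glasses_addr=("192.168.1.50", 9000)):
    """Route a rendering command: execute it locally on the watch GPU, or serialize it
    and send it over the connection to the glasses, whose own GPU renders the content.
    Sending the compact command uses far less bandwidth than sending rendered frames."""
    if target == "watch":
        submit_to_local_gpu(cmd)
    else:
        payload = json.dumps({"op": cmd.op, "args": cmd.args}).encode()
        with socket.create_connection(glasses_addr, timeout=1.0) as s:
            s.sendall(len(payload).to_bytes(4, "big") + payload)   # length-prefixed frame
```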
Simultaneously or separately, the CPU 4110 on the watch 4101 may generate rendering commands 4112 for the same app that generated command 4113 or for a different app. The app that caused the CPU 4110 to generate command 4112 may be called a background app since it is running in the background and its content will not be shown on the watch 4101. For example, the background app may be one for playing music while the same user is on their run. Moving the content generated by the background app from the watch 4101 to the glasses is done by first sending the rendering commands 4112 for rendering the background app's content to the communication connection on the watch side 4140, which may be a wired or a wireless interface.
Particular embodiments may repeat one or more steps of the method of
Even as AR devices such as smart glasses become more popular, several factors hinder their broader adoption for everyday use. As an example, the amount and size of the electronics, batteries, sensors, and antennas required to implement AR functionalities are often too large to fit within the glasses themselves. But even when some of these electronics are offloaded from the smart glasses to a separate handheld device that communicates wirelessly with the smart glasses, the smart glasses often remain unacceptably bulky and too heavy, hot, or awkward-looking for everyday wear.
Further challenges of smart glasses and accompanying handheld devices include the short battery life and high power consumption of both devices, which may even cause thermal shutdowns of the device(s) during heavy use cases like augmented calling. Battery life may further force a user to carry both the accompanying handheld device as well as their regular cell phone, rather than allowing the cell phone to operate as the handheld device itself. Both devices may also suffer from insufficient thermal dissipation, as attempting to minimize their bulkiness results in devices that do not have enough surface to dissipate heat. Size and weight may be problems; the glasses may be so large that they are non-ubiquitous, and a user may not want to wear them in public. Users with prescription glasses may further need to now carry two pairs of glasses, their regular prescription glasses and their bulky AR smart glasses.
Importantly, separating some functionality from the smart glasses themselves to the separate handheld device introduces several new problems. As an example, the handheld device is frequently carried in a pocket, purse, or backpack. This affects line of sight (LOS) communications, and further impacts radio frequency (RF) performance, since the antennas in the handheld device may be severely loaded and detuned. Additionally, both units may use field of view (FOV) sensors, which take up significant space and are easily occluded during normal operation. These sensors may require the user to raise their hands in front of the glasses for gesture-controlled commands, which may be odd-looking in public. The use of both glasses and a handheld device further burdens the user, as it requires them to carry so many devices (for example, a cell phone, the handheld device, the AR glasses, and potentially separate prescription glasses), especially since the batteries of the AR glasses and the handheld device often do not last for an entire day, eventually rendering two of the devices the user is carrying useless.
Many of these challenges may be avoided with a more ubiquitous, wearable AR system that mimics common, socially acceptable dress.
In particular embodiments, the hat 5210 may also be configured to detachably couple to the pair of glasses 5220, and thus the data bus ring itself is configured to detachably couple to the glasses 5220. As an example, the hat 5210 may include a connector 5307 to connect the AR glasses 5220 to the hat 5210. In particular embodiments, this connector 5307 may be magnetic. When the AR glasses 5220 are physically connected to the hat 5210 by such a connector 5307, wired communication may occur through the connector 5307, rather than relying on wireless connections between the hat 5210 and the glasses 5220. In such an embodiment, this wired connection may reduce the need for several transmitters and may further reduce the amount of battery power consumed by the AR system 5200 over the course of its use. In this embodiment, the glasses may further draw power from the hat, thus reducing, or even eliminating, the need for batteries on the glasses themselves.
The hat 5210 may further include various internal and/or external sensors. As an example, one or more inertial measurement unit (IMU) sensors may be connected to the data bus ring 5301 to capture data of user movement and positioning. Such data may include information concerning direction, acceleration, speed, or positioning of the hat 5210, and these sensors may be either internal or external to the hat 5210. Other internal sensors may be used to capture biological signals, such as electroencephalography (EEG) sensors to detect brain wave signals. In particular embodiments, these brain wave signals may even be used to control the AR system.
The hat 5210 may further include a plurality of external sensors for hand tracking and assessment of a user's surroundings.
This configuration of an AR system 5200 including smart glasses 5220 and a hat 5210 provides numerous advantages. As an example, offloading much of the electronics of the AR system to the hat 5210 may increase the ubiquity and comfort of the AR system. The weight of the glasses 5220 may be reduced, becoming light and small enough to replace prescription glasses (thus providing some users with one less pair of glasses to carry). Including optical sensors on the visor of the hat may provide privacy to the user Veronica Martinez 5230, as her hands do not need to be lifted in front of the glasses 5220 during gestures in order to be captured by the sensors of the AR system. Rather, user gestures may be performed and concealed close to the body in a natural position.
As another example, positioning TX/RX antennas at the edge of the visor may provide sufficient distance and isolation from the user's body and head for maximum performance and protection from RF radiation. These antennas may not be loaded or detuned by body parts, and the fixed distance from the head may eliminate Specific Absorption Rate (SAR) concerns, since the visor may be further from the body than a cell phone during normal usage. Often, handheld devices and wearables like smart watches suffer substantial RF performance reductions due to head, hand, arm, or body occlusion or loading; however, by placing the antennas at the edge of the visor, they may not be loaded by any body parts. Also, enabling the direct, wired connection of the smart glasses 5220 to the hat 5210 through the connector 5307 may eliminate the need for LOS communications, as is required when smart glasses communicate with a handheld unit that may be carried in a pocket or purse. Placing GPS and cellular antennas on a hat rather than an occluded handheld device may result in reduced power consumption and increased battery life, and thermal dissipation for these antennas may not be as great a problem.
Even the hat 5210 itself provides many advantages. As an example, the simple size and volume of the hat 5210 may allow plenty of surface area for thermal dissipation. The position of the hat close to the user's head may allow for new sensors (such as EMG sensors) to be integrated into and seamlessly interact with the AR system. Further, the visor may provide natural shade from the solar glare that often affects optical sensors mounted on the glasses 5220. And when the hat 5210 is removed, the AR system 5200 may be disabled, thus providing the user 5230 and people around the user with an easily controllable and verifiable indication of when the AR system 5200 is operating and detecting their surroundings and biological data. In this case, the AR glasses 5220 may no longer collect or transmit images or sounds surrounding the user 5230 even if the user 5230 continues to wear them (e.g., as prescription glasses), thus preserving her privacy. This disabling of the AR system by removing the hat may also provide an easily verifiable sign to those around the user 5230 that the user's AR system is no longer collecting images or sounds of them.
This disclosure contemplates any suitable number of computer systems 1700. This disclosure contemplates computer system 1700 taking any suitable physical form. As an example and not by way of limitation, computer system 1700 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a server, a tablet computer system, or a combination of two or more of these. Where appropriate, computer system 1700 may include one or more computer systems 1700; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 1700 may perform without substantial spatial or temporal limitation one or more steps of one or more methods described or illustrated herein. As an example and not by way of limitation, one or more computer systems 1700 may perform in real time or in batch mode one or more steps of one or more methods described or illustrated herein. One or more computer systems 1700 may perform at different times or at different locations one or more steps of one or more methods described or illustrated herein, where appropriate.
In particular embodiments, computer system 1700 includes a processor 1702, memory 1704, storage 1706, an input/output (I/O) interface 1708, a communication interface 1710, and a bus 1712. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.
In particular embodiments, processor 1702 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 1702 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 1704, or storage 1706; decode and execute them; and then write one or more results to an internal register, an internal cache, memory 1704, or storage 1706. In particular embodiments, processor 1702 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates processor 1702 including any suitable number of any suitable internal caches, where appropriate. As an example and not by way of limitation, processor 1702 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 1704 or storage 1706, and the instruction caches may speed up retrieval of those instructions by processor 1702. Data in the data caches may be copies of data in memory 1704 or storage 1706 for instructions executing at processor 1702 to operate on; the results of previous instructions executed at processor 1702 for access by subsequent instructions executing at processor 1702 or for writing to memory 1704 or storage 1706; or other suitable data. The data caches may speed up read or write operations by processor 1702. The TLBs may speed up virtual-address translation for processor 1702. In particular embodiments, processor 1702 may include one or more internal registers for data, instructions, or addresses. This disclosure contemplates processor 1702 including any suitable number of any suitable internal registers, where appropriate. Where appropriate, processor 1702 may include one or more arithmetic logic units (ALUs); be a multi-core processor; or include one or more processors 1702. Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.
In particular embodiments, memory 1704 includes main memory for storing instructions for processor 1702 to execute or data for processor 1702 to operate on. As an example and not by way of limitation, computer system 1700 may load instructions from storage 1706 or another source (such as, for example, another computer system 1700) to memory 1704. Processor 1702 may then load the instructions from memory 1704 to an internal register or internal cache. To execute the instructions, processor 1702 may retrieve the instructions from the internal register or internal cache and decode them. During or after execution of the instructions, processor 1702 may write one or more results (which may be intermediate or final results) to the internal register or internal cache. Processor 1702 may then write one or more of those results to memory 1704. In particular embodiments, processor 1702 executes only instructions in one or more internal registers or internal caches or in memory 1704 (as opposed to storage 1706 or elsewhere) and operates only on data in one or more internal registers or internal caches or in memory 1704 (as opposed to storage 1706 or elsewhere). One or more memory buses (which may each include an address bus and a data bus) may couple processor 1702 to memory 1704. Bus 1712 may include one or more memory buses, as described below. In particular embodiments, one or more memory management units (MMUs) reside between processor 1702 and memory 1704 and facilitate accesses to memory 1704 requested by processor 1702. In particular embodiments, memory 1704 includes random access memory (RAM). This RAM may be volatile memory, where appropriate. Where appropriate, this RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, this RAM may be single-ported or multi-ported RAM. This disclosure contemplates any suitable RAM. Memory 1704 may include one or more memories 1704, where appropriate. Although this disclosure describes and illustrates particular memory, this disclosure contemplates any suitable memory.
In particular embodiments, storage 1706 includes mass storage for data or instructions. As an example and not by way of limitation, storage 1706 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Storage 1706 may include removable or non-removable (or fixed) media, where appropriate. Storage 1706 may be internal or external to computer system 1700, where appropriate. In particular embodiments, storage 1706 is non-volatile, solid-state memory. In particular embodiments, storage 1706 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. This disclosure contemplates mass storage 1706 taking any suitable physical form. Storage 1706 may include one or more storage control units facilitating communication between processor 1702 and storage 1706, where appropriate. Where appropriate, storage 1706 may include one or more storages 1706. Although this disclosure describes and illustrates particular storage, this disclosure contemplates any suitable storage.
In particular embodiments, I/O interface 1708 includes hardware, software, or both, providing one or more interfaces for communication between computer system 1700 and one or more I/O devices. Computer system 1700 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and computer system 1700. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. This disclosure contemplates any suitable I/O devices and any suitable I/O interfaces 1708 for them. Where appropriate, I/O interface 1708 may include one or more device or software drivers enabling processor 1702 to drive one or more of these I/O devices. I/O interface 1708 may include one or more I/O interfaces 1708, where appropriate. Although this disclosure describes and illustrates a particular I/O interface, this disclosure contemplates any suitable I/O interface.
In particular embodiments, communication interface 1710 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between computer system 1700 and one or more other computer systems 1700 or one or more networks. As an example and not by way of limitation, communication interface 1710 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 1710 for it. As an example and not by way of limitation, computer system 1700 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, computer system 1700 may communicate with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Computer system 1700 may include any suitable communication interface 1710 for any of these networks, where appropriate. Communication interface 1710 may include one or more communication interfaces 1710, where appropriate. Although this disclosure describes and illustrates a particular communication interface, this disclosure contemplates any suitable communication interface.
In particular embodiments, bus 1712 includes hardware, software, or both coupling components of computer system 1700 to each other. As an example and not by way of limitation, bus 1712 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HYPERTRANSPORT (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus or a combination of two or more of these. Bus 1712 may include one or more buses 1712, where appropriate. Although this disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.
Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such, as for example, field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM-drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these, where appropriate. A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate.
Herein, “or” is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A or B” means “A, B, or both,” unless expressly indicated otherwise or indicated otherwise by context. Moreover, “and” is both joint and several, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A and B” means “A and B, jointly or severally,” unless expressly indicated otherwise or indicated otherwise by context.
The scope of this disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments described or illustrated herein that a person having ordinary skill in the art would comprehend. The scope of this disclosure is not limited to the example embodiments described or illustrated herein. Moreover, although this disclosure describes and illustrates respective embodiments herein as including particular components, elements, feature, functions, operations, or steps, any of these embodiments may include any combination or permutation of any of the components, elements, features, functions, operations, or steps described or illustrated anywhere herein that a person having ordinary skill in the art would comprehend. Furthermore, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative. Additionally, although this disclosure describes or illustrates particular embodiments as providing particular advantages, particular embodiments may provide none, some, or all of these advantages.
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 63/078,811, filed 15 Sep. 2020, U.S. Provisional Patent Application No. 63/078,818, filed 15 Sep. 2020, U.S. Provisional Patent Application No. 63/108,821, filed 2 Nov. 2020, U.S. Provisional Patent Application No. 63/172,001, filed 7 Apr. 2021, and U.S. Provisional Patent Application No. 63/213,063, filed 21 Jun. 2021, which are incorporated herein by reference.
Number | Date | Country
---|---|---
63/078,811 | 15 Sep. 2020 | US
63/078,818 | 15 Sep. 2020 | US
63/108,821 | 2 Nov. 2020 | US
63/172,001 | 7 Apr. 2021 | US
63/213,063 | 21 Jun. 2021 | US