The present disclosure generally relates to displaying a visual representation of audible data based on a region of interest.
Some devices present visual representations of audible information. The visual representations can include a transcript of audible spoken words or of an audio portion of a media content item, in verbatim or in edited form. Some visual representations may include descriptions of non-speech elements. The visual representations need not be hard-coded into the media content item, thereby providing the user an option to view or not view the visual representations. Most devices display the visual representations at a fixed display location.
So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
Various implementations disclosed herein include devices, systems, and methods for displaying a visual representation of audible signal data at a distance that is based on a region of interest. In some implementations, a device includes a display, an audio sensor, a non-transitory memory, and one or more processors coupled with the display, the audio sensor and the non-transitory memory. In various implementations, a method includes presenting a representation of a three-dimensional (3D) environment from a current point-of-view. In some implementations, the method includes identifying a region of interest within the 3D environment. In some implementations, the region of interest is located at a first distance from the current point-of-view. In some implementations, the method includes receiving, via the audio sensor, an audible signal and converting the audible signal to audible signal data. In some implementations, the method includes displaying, on the display, a visual representation of the audible signal data at a second distance from the current point-of-view that is a function of the first distance between the region of interest and the current point-of-view.
In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs. In some implementations, the one or more programs are stored in the non-transitory memory and are executed by the one or more processors. In some implementations, the one or more programs include instructions for performing or causing performance of any of the methods described herein. In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions that, when executed by one or more processors of a device, cause the device to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and means for performing or causing performance of any of the methods described herein.
Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.
A physical environment refers to a physical world that people can sense and/or interact with without aid of electronic devices. The physical environment may include physical features such as a physical surface or a physical object. For example, the physical environment corresponds to a physical park that includes physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment such as through sight, touch, hearing, taste, and smell. In contrast, an extended reality (XR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic device. For example, the XR environment may include augmented reality (AR) content, mixed reality (MR) content, virtual reality (VR) content, and/or the like. With an XR system, a subset of a person's physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner that comports with at least one law of physics. As one example, the XR system may detect head movement and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. As another example, the XR system may detect movement of the electronic device presenting the XR environment (e.g., a mobile phone, a tablet, a laptop, or the like) and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), the XR system may adjust characteristic(s) of graphical content in the XR environment in response to representations of physical motions (e.g., vocal commands).
There are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head mountable systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. A head mountable system may have one or more speaker(s) and an integrated opaque display. Alternatively, a head mountable system may be configured to accept an external opaque display (e.g., a smartphone). The head mountable system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head mountable system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representative of images is directed to a person's eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In some implementations, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person's retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface.
While a device is presenting an XR environment, a user of the device can enable visual representations of data. The visual representations can include a speech transcript of a person in the XR environment. Additionally or alternatively, the visual representations can include a description of the XR environment. For example, the visual representations can include information regarding an object in the XR environment. In previously available devices, the visual representations are displayed at a particular location on a display. For example, the visual representations may be displayed towards a bottom of the display. If the user is focusing on a portion of the XR environment that does not overlap with the location of the visual representations, the user has to shift his/her focus to the location of the visual representations. Repeatedly shifting his/her focus between the location of the visual representations and other portions of the environment may impose a strain on the user's eyes. Moreover, requiring the user to shift his/her focus to the location of the visual representations may detract from a user experience of the device by requiring the user to look away from objects that may interest the user.
Moreover, previously available devices generally display visual representations at a fixed depth. For example, the visual representations may be displayed at a depth that coincides with a point-of-view of the device (e.g., at zero depth from the point-of-view of the device). If the visual representations are displayed at a fixed depth from the point-of-view of the device (e.g., at zero depth from the point-of-view of the device) and the user is gazing at an object that is at a different depth, the user has to repeatedly adjust a depth at which the user is focusing thereby imposing a strain on the user's eyes. Displaying visual representations at a close depth while the user is focusing on an object at a greater depth may impose an even greater strain on a user that is farsighted because the user may not be able to clearly view the visual representations.
The present disclosure provides methods, systems, and/or devices for displaying a visual representation of audio data in an XR environment at a location that is based on a region of interest. While the device is presenting an XR environment from a current point-of-view, the device identifies a region of interest within the XR environment. The region of interest is sometimes located at a particular distance from the current point-of-view. The device can identify the region of interest by determining that a user of the device is gazing at an object that is positioned at a first distance from the current point-of-view of the device. In order to reduce strain on the user's eyes, the device can display visual representations in such a manner that the visual representations appear to be positioned at the same distance as the region of interest. For example, the device can display the visual representations such that the visual representations appear to be positioned at the same distance as the object that the user is gazing at. Since the visual representations are displayed at the same distance as the region of interest, the user does not have to shift his/her focus between different distances thereby reducing strain on the user's eyes.
The distance between the region of interest and the current point-of-view may be referred to as a depth-of-focus since the user is currently focusing at that depth. The device displays the visual representations at a depth that matches the depth-of-focus. Rendering the visual representations at or near the depth-of-focus tends to reduce strain on the user's eyes because the user does not have to adjust his/her eyes to focus at different depths. If the current point-of-view is assigned a depth of zero, the depth-of-focus is a number that is greater than zero. In order to ensure that the user can effectively read the visual representations, the device adjusts a font size of the visual representations based on the depth-of-focus. The font size may be directly related to (e.g., proportional to) the depth-of-focus. As such, as the depth-of-focus increases, the device increases the font size of the visual representations.
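Purely as a non-limiting illustration of this scaling (and not as a required implementation), the following Python sketch grows the rendered font size linearly with the depth-of-focus so that the text subtends roughly the same visual angle at any depth. The baseline depth and baseline point size are assumptions chosen for the example; they are not values taken from this disclosure.

```python
# Illustrative sketch only: scale the caption font size with the depth-of-focus
# so that the text appears to stay at a constant size. The baseline values are
# assumptions, not values specified by this disclosure.

BASELINE_DEPTH_M = 1.0    # depth at which the base font size is defined
BASELINE_FONT_PT = 18.0   # font size used at the baseline depth

def font_size_for_depth(depth_of_focus_m: float) -> float:
    """Return a font size that is directly proportional to the depth-of-focus."""
    depth = max(depth_of_focus_m, 0.1)   # guard against a zero or near-zero depth
    return BASELINE_FONT_PT * (depth / BASELINE_DEPTH_M)

# Example: captions rendered 3 m away are drawn three times larger than at 1 m,
# so they appear to the user to remain at a constant size.
print(font_size_for_depth(3.0))   # 54.0
```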
After identifying the region of interest, the device can display visual representations at a location that overlaps with the region of interest. Displaying the visual representations in the same plane as the depth-of-focus reduces eye strain, for example, because the user does not have to adjust his/her focus between different depths. The device can further reduce eye strain by displaying the visual representations adjacent to the region of interest within the plane. For example, if the region of interest is towards a top-right corner of the plane, the device can display the visual representations near the top-right corner instead of displaying the visual representations at the bottom-center of the plane. Displaying the visual representations adjacent to the region of interest reduces the need for the user to shift his/her gaze between different portions of the plane.
The device can identify the region of interest by localizing a sound that the visual representations correspond to. For example, if the visual representations include a transcript of a person's speech, the device can identify the person in the XR environment and render the visual representations adjacent to the person's face and at the same depth as the person. As another example, if the visual representations include lyrics of a song being played by a media device, the device can identify the media device and render the visual representations adjacent to the media device and at the same depth as the media device.
The device can display visual representations in a head-locked manner so that the user can view the visual representations as the user rotates his/her head. The depth-of-focus may change over time as the user shifts his/her gaze within the environment to objects that are located at different depths. As the depth-of-focus changes, the device displays the visual representations at matching depths. As the device displays the visual representations at different depths, the device changes a font size of the visual representations so that the visual representations appear to be displayed at a constant font size. The device can render the visual representations based on a tiered approach. The device can categorize the depth-of-focus into one of several tiers, and display the visual representations at a depth that corresponds to the tier associated with the depth-of-focus.
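One hypothetical way to realize such a tiered approach is sketched below in Python: the current depth-of-focus is categorized into one of a small number of tiers, and the visual representations are rendered at the depth associated with that tier. The tier boundaries and rendering depths are illustrative assumptions; the disclosure does not specify particular values.

```python
# Illustrative sketch only: categorize the depth-of-focus into one of several
# tiers and display the visual representations at the rendering depth associated
# with that tier. The boundaries and depths below are assumptions.

DEPTH_TIERS = [
    # (maximum depth-of-focus in meters, rendering depth in meters)
    (1.0, 0.75),   # "near" tier
    (3.0, 2.0),    # "mid" tier
    (8.0, 5.0),    # "far" tier
]
FALLBACK_RENDER_DEPTH = 10.0   # used when the focus is beyond the last tier

def rendering_depth(depth_of_focus_m: float) -> float:
    for max_depth, render_depth in DEPTH_TIERS:
        if depth_of_focus_m <= max_depth:
            return render_depth
    return FALLBACK_RENDER_DEPTH

# A user focusing 2.4 m away gets captions rendered at the "mid" tier depth of 2 m.
print(rendering_depth(2.4))   # 2.0
```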
In some implementations, the electronic device 20 includes a handheld computing device that can be held by the user 22. For example, in some implementations, the electronic device includes a smartphone, a tablet, a media player, a laptop, or the like. In some implementations, the electronic device 20 includes a wearable computing device that can be worn by the user 22. For example, in some implementations, the electronic device 20 includes a head-mountable device (HMD) or an electronic watch. In some implementations, the electronic device 20 includes a smartphone or a tablet, and the image sensor 30 includes a rear-facing camera that captures an image of the person 50 when the user 22 points the rear-facing camera towards the person 50. In some implementations, the electronic device 20 includes an HMD, and the image sensor 30 includes a scene-facing camera that captures an image of the person 50 when the user 22 looks at the person 50.
In some implementations, the electronic device 20 determines to generate and display a transcript of the speech 60 in response to a voice characteristic (e.g., an amplitude, a speed and/or a language) of the speech 60 being outside an audible speech range. In some implementations, the electronic device 20 generates and displays visual representations for the speech 60 when an amplitude of the speech 60 is below a threshold amplitude. For example, the electronic device 20 displays a transcript of the speech 60 when the person 50 is speaking too softly for the user 22 to properly hear the speech 60 but loud enough for the audio sensor 40 to detect the speech 60. In some implementations, the electronic device 20 determines to generate and display a transcript of the speech 60 when a speed at which the person 50 is speaking is greater than a threshold speed. In some implementations, the electronic device 20 determines to generate and display a transcript of the speech 60 when a language in which the person 50 is speaking is different from a preferred language of the user 22. In some implementations, the electronic device determines to generate and display visual representations for the physical environment 10 when an ambient sound level of the physical environment 10 is greater than a threshold sound level (e.g., when the physical environment 10 is too loud for the user 22 to hear the speech 60). In some implementations, the electronic device 20 determines to generate and display visual representations for the physical environment 10 in response to determining that the user 22 is audially impaired. In some implementations, the user 22 generally uses a wearable hearing aid device, and the electronic device 20 determines to generate and display a transcript of the speech in response to detecting that the user 22 is currently not wearing the wearable hearing aid device.
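As a non-limiting illustration of the example conditions described above, the Python sketch below combines them into a single predicate that decides whether to generate and display a transcript. All thresholds, field names, and the helper itself are assumptions introduced for illustration only.

```python
# Illustrative sketch only: decide whether to generate and display a transcript
# based on the example conditions described above. All thresholds and field
# names are assumptions, not values from this disclosure.

from dataclasses import dataclass

@dataclass
class SpeechObservation:
    amplitude_db: float        # measured amplitude of the detected speech
    words_per_minute: float    # estimated speaking rate
    language: str              # detected language of the speech
    ambient_db: float          # ambient sound level of the environment

def should_display_transcript(obs: SpeechObservation,
                              preferred_language: str = "en",
                              hearing_aid_detected: bool = True) -> bool:
    too_quiet = obs.amplitude_db < 40.0       # softer than a threshold amplitude
    too_fast = obs.words_per_minute > 180.0   # faster than a threshold speed
    other_language = obs.language != preferred_language
    too_noisy = obs.ambient_db > 70.0         # environment louder than a threshold
    return (too_quiet or too_fast or other_language or too_noisy
            or not hearing_aid_detected)

obs = SpeechObservation(amplitude_db=35.0, words_per_minute=120.0,
                        language="en", ambient_db=55.0)
print(should_display_transcript(obs))   # True: the speech is too quiet to hear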
In various implementations, the electronic device 20 provides the user 22 an option to view a textual description of sounds in the XR environment 110. For example, the electronic device 20 may provide a visual representations option that, when activated, generates and displays visual representations for the XR environment 110. When the user 22 enables visual representations for the XR environment 110, the electronic device 20 generates a transcription of the speech 60 uttered by the person 50. If there are multiple people in the physical environment 10 that are uttering speech, the electronic device 20 can provide the user 22 an option to select which person's speech to transcribe and which person's speech to not transcribe. The electronic device may be associated with a default language. For example, the user 22 may have specified a preferred language during a setup operation. If the speech 60 is in a different language from the preferred language specified during the setup operation, the electronic device 20 can translate the speech 60 into the preferred language and display a transcript of the speech 60 in the preferred language.
In various implementations, the electronic device 20 identifies a region of interest within the XR environment 110. The region of interest 70 corresponds to a volumetric space that the user 22 appears to be interested in. In some implementations, the electronic device 20 identifies the region of interest 70 based on a gaze of the user 22. The electronic device 20 can determine where the user 22 is gazing based on image data captured by a user-facing camera. In some implementations, the electronic device 20 identifies the region of interest 70 based on a location of an audible signal in the XR environment 110. The electronic device 20 identifies the location of the audible signal as the region of interest 70, for example, because the user 22 may be expected to gaze at the source of the audible signal. In the example of
The electronic device 20 selects the second depth 162 for displaying the visual representation 160 based on the first depth 152 of the region of interest 70. In some implementations, the second depth 162 is the same as (e.g., equal to) the first depth 152. In some implementations, the second depth 162 is within a threshold of the first depth 152. In some implementations, the electronic device 20 categorizes the first depth 152 into one of several depth tiers. Each depth tier is associated with a depth for displaying visual representations, and the electronic device 20 selects the second depth 162 by selecting the depth associated with the depth tier of the first depth 152.
In some implementations, the electronic device 20 determines whether the user 22 is visually impaired, and the electronic device 20 selects the second depth 162 based on a visual impairment of the user 22. In some implementations, the electronic device 20 determines that the user 22 is farsighted, and the electronic device 20 selects the second depth 162 such that the second depth 162 matches the first depth 152 or is slightly greater than the first depth 152 in order to reduce eye strain on the user 22. In some implementations, the electronic device 20 determines that the user 22 is nearsighted, and the electronic device 20 selects the second depth 162 such that the second depth 162 is slightly less than the first depth 152 in order to reduce eye strain on the user 22. Adjusting the depth at which the visual representations are displayed may help reduce eye strain if the user 22 is visually impaired and/or if the user 22 has difficulty in adjusting his/her optical focus between different depths.
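A simple way to picture this selection is sketched below in Python: the display depth (the second depth) starts from the depth of the region of interest (the first depth) and is nudged slightly farther for a farsighted user or slightly nearer for a nearsighted user. The offsets are illustrative assumptions; the disclosure does not give numeric values.

```python
# Illustrative sketch only: choose the second depth from the first depth,
# optionally adjusted for an eye condition of the user. The adjustment factors
# are assumptions chosen for illustration.

def select_display_depth(region_depth_m: float, eye_condition: str = "none") -> float:
    if eye_condition == "farsighted":
        # Match the region's depth or place the captions slightly farther away.
        return region_depth_m * 1.05
    if eye_condition == "nearsighted":
        # Place the captions slightly closer than the region of interest.
        return region_depth_m * 0.95
    # Default: display the captions at the same depth as the region of interest.
    return region_depth_m

print(select_display_depth(4.0, "farsighted"))   # 4.2
```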
In various implementations, the data obtainer 210 obtains an image 212 from an image sensor (e.g., the image sensor 30 shown in
In various implementations, the region of interest determiner 220 generates a region of interest indication 222 that indicates a region of interest within the XR environment (e.g., the region of interest 70 shown in
In some implementations, the region of interest determiner 220 generates the region of interest indication 222 based on saliency data 226. In some implementations, the saliency data 226 indicates which part of the image 212 is the most salient part. In such implementations, the region of interest determiner 220 selects the most salient part of the image 212 as the region of interest. In some implementations, the region of interest determiner 220 determines the saliency data 226 for the image 212 based on human-curated saliency data for similar images.
In some implementations, the region of interest determiner 220 includes a machine-learned model that identifies the region of interest. In some implementations, the machine-learned model is trained with a corpus of images for which an operator (e.g., a human operator) has identified the respective regions of interest. In such implementations, the region of interest determiner 220 determines the region of interest by providing the image 212 to the machine-learned model and the machine-learned model generates the region of interest indication 222.
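The paragraphs above describe several signals that the region of interest determiner 220 may draw on. The Python sketch below shows one hypothetical way to combine them, preferring gaze data, then a localized audio source, then the most salient image region; the priority order, data shapes, and helper names are assumptions and are not mandated by this disclosure.

```python
# Illustrative sketch only: pick a region of interest from whichever signal is
# available, in an assumed priority order (gaze, then audio localization, then
# saliency). The data shapes and priority order are assumptions.

from typing import Dict, Optional, Tuple

Point3D = Tuple[float, float, float]

def determine_region_of_interest(gaze_hit: Optional[Point3D],
                                 audio_source: Optional[Point3D],
                                 saliency: Optional[Dict[Point3D, float]]) -> Optional[Point3D]:
    if gaze_hit is not None:
        return gaze_hit          # where the user is gazing
    if audio_source is not None:
        return audio_source      # localized source of the audible signal
    if saliency:
        # Fall back to the most salient location in the scene.
        return max(saliency, key=saliency.get)
    return None

roi = determine_region_of_interest(None, (1.2, 0.1, 3.5), None)
print(roi)   # (1.2, 0.1, 3.5): the localized sound source becomes the region of interest
```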
In various implementations, the content presenter 230 generates a visual representation 232 of the audible signal data 214. In some implementations, the content presenter 230 includes a speech-to-text converter 240 that converts speech represented by the audible signal data 214 to text. In some implementations, the content presenter 230 includes a translation determiner 250 that translates speech represented by the audible signal data 214 from a source language to a target language (e.g., to a preferred language of the user).
In some implementations, the visual representation 232 includes non-verbal elements. In some implementations, the content presenter 230 includes a pose determiner that determines a pose of a person that uttered speech represented by the audible signal data 214. In such implementations, the visual representation 232 can specify the pose of the person. For example, if the pose determiner determines that a pose of the speaker has changed from a seated pose to a standing pose, the visual representation 232 may include text that states “He stands up”.
In various implementations, the content presenter 230 determines a depth for displaying the visual representation 232. The content presenter 230 determines a depth of the region of interest and selects the depth for the visual representation 232 based on the depth of the region of interest. In some implementations, the content presenter 230 displays the visual representation 232 at the same depth as the region of interest. In some implementations, the region of interest is a volumetric region that spans multiple depths. In such implementations, the depth of the visual representation 232 is a function of the multiple depths that the volumetric region spans. For example, the depth of the visual representation 232 may be an average of the depths that the volumetric region spans. As another example, the depth of the visual representation 232 may be set to the smallest or the greatest of the depths that the volumetric region spans.
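For a volumetric region that spans multiple depths, the caption depth can be collapsed to a single value, for example by averaging the spanned depths or by taking the smallest or greatest of them, as described above. The helper below is a hypothetical Python sketch of those alternatives; the strategy names are assumptions.

```python
# Illustrative sketch only: collapse the depth span of a volumetric region of
# interest into a single depth for the visual representation.

from statistics import mean
from typing import Sequence

def caption_depth(region_depths_m: Sequence[float], strategy: str = "average") -> float:
    if strategy == "average":
        return mean(region_depths_m)   # average of the spanned depths
    if strategy == "nearest":
        return min(region_depths_m)    # smallest of the spanned depths
    if strategy == "farthest":
        return max(region_depths_m)    # greatest of the spanned depths
    raise ValueError(f"unknown strategy: {strategy}")

print(caption_depth([2.0, 3.0, 4.0]))              # 3.0
print(caption_depth([2.0, 3.0, 4.0], "nearest"))   # 2.0
```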
In some implementations, the depth of the visual representation 232 is within a threshold of the depth of the region of interest. In some implementations, the content presenter 230 categorizes the depth of the region of interest into one of several depth tiers. Each depth tier may be associated with a different depth for the visual representation 232, and the content presenter 230 displays the visual representation 232 at the depth associated with the depth tier of the region of interest. In various implementations, displaying the visual representation 232 at or near the same depth as the region of interest reduces eye strain by reducing the need for the user to shift his/her gaze between significantly different depths.
In some implementations, the content presenter 230 determines a position for displaying the visual representation 232 in other dimensions in addition to or as an alternative to determining the depth at which the visual representation 232 is to be displayed. In some implementations, the content presenter 230 positions the visual representation 232 such that the visual representation 232 is near (e.g., adjacent to) the region of interest horizontally and/or vertically. In some implementations, displaying the visual representation 232 horizontally and/or vertically near the region of interest further reduces eye strain by reducing the need for the user to shift his/her gaze horizontally and/or vertically between the region of interest and the visual representation 232.
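A hypothetical placement helper is sketched below in Python: it keeps the visual representation at the depth of the region of interest and offsets it slightly to the side and below, so that the text sits adjacent to, rather than on top of, the region of interest. The offsets and coordinate convention are illustrative assumptions.

```python
# Illustrative sketch only: place the visual representation adjacent to the
# region of interest in all three dimensions. The offsets are assumptions.

from typing import Tuple

Point3D = Tuple[float, float, float]   # (horizontal, vertical, depth)

def caption_position(roi_center: Point3D,
                     horizontal_offset: float = 0.25,
                     vertical_offset: float = -0.1) -> Point3D:
    x, y, depth = roi_center
    # Keep the caption at the region's depth, nudged to the right and slightly below.
    return (x + horizontal_offset, y + vertical_offset, depth)

# The caption sits to the right of and slightly below the region, at its depth.
print(caption_position((0.5, 1.6, 3.0)))
```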
In some implementations, the content presenter 230 displays the visual representation 232 at a position that is different from a default position in response to obtaining an indication that displaying the visual representation 232 at the default position may result in undue eye strain. In some implementations, the content presenter 230 has access to health data that indicates an eye condition of the user. The content presenter 230 can display the visual representation 232 at a non-default position to reduce eye strain when the health data indicates that the eye condition of the user is outside an acceptable range. For example, the content presenter 230 can display the visual representation 232 at the same depth as the region of interest when the health data indicates that having to shift focus between different depths will likely result in undue eye strain. As another example, the content presenter 230 can display the visual representation 232 near a horizontal position and/or near a vertical position of the region of interest when the health data indicates that having to shift gaze between the region of interest and a default horizontal position or a default vertical position will likely result in undue eye strain.
In some implementations, the content presenter 230 estimates an eye condition of the user (e.g., a level of tiredness of the user's eye) based on screen time data that indicates an amount of time that the user has viewed a display during a particular time duration. For example, if the screen time data indicates that the user has exceeded his/her average daily screen time by a threshold, then the content presenter 230 displays the visual representation 232 at a position that is based on a position of the region of interest. As an example, if the user has exceeded his/her average daily screen time by a threshold percentage (e.g., 20 percent), the content presenter 230 displays the visual representation 232 at a depth that is based on a depth of the region of interest, at a horizontal position that is based on a horizontal position of the region of interest and/or at a vertical position that is based on a vertical position of the region of interest.
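As a non-limiting illustration of the screen-time heuristic described above, the Python sketch below treats screen time as a proxy for eye fatigue: if the user has exceeded his/her average daily screen time by a threshold percentage (20 percent in the example above), the caption position is anchored to the region of interest rather than to a default position. The field names and helper are hypothetical.

```python
# Illustrative sketch only: use screen time as a proxy for eye fatigue when
# deciding whether to anchor captions to the region of interest.

def use_roi_based_placement(todays_screen_time_min: float,
                            average_daily_screen_time_min: float,
                            threshold_pct: float = 20.0) -> bool:
    limit = average_daily_screen_time_min * (1.0 + threshold_pct / 100.0)
    return todays_screen_time_min > limit

# A user who averages 200 minutes per day and has already viewed 250 minutes
# today exceeds the 20 percent threshold, so captions follow the region of interest.
print(use_roi_based_placement(250.0, 200.0))   # True
```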
As represented by block 310, in various implementations, the method 300 includes presenting a representation of a three-dimensional (3D) environment from a current point-of-view. For example, as shown in
As represented by block 310a, in some implementations, presenting the representation of the 3D environment includes displaying a video pass-through of a physical environment. For example, the electronic device 20 may present the XR environment 110 shown in
As represented by block 310b, in some implementations, presenting the representation of the 3D environment includes displaying a virtual environment that is different from a physical environment of the device. In some implementations, the method 300 includes generating a synthetic environment (e.g., a fictional environment) and displaying the synthetic environment on a display of the device.
As represented by block 320, in various implementations, the method 300 includes identifying a region of interest within the 3D environment. In some implementations, the region of interest is located at a first distance (e.g., a first depth) from the current point-of-view. For example, as shown in
As represented by block 320a, in some implementations, the region of interest includes a representation of a person that is generating the audible signal, and the visual representation includes a transcript that is displayed near the representation of the person. For example, as shown in
As represented by block 320b, in some implementations, identifying the region of interest includes localizing the audible signal to identify a source of the audible signal in the environment, and selecting a position of the source of the audible signal as the region of interest. For example, referring to
As represented by block 320c, in some implementations, identifying the region of interest includes detecting movement of an object in the 3D environment, and selecting a position of the object as the region of interest. As an example, referring to
As represented by block 320d, in some implementations, identifying the region of interest includes identifying the region of interest based on gaze data that indicates a gaze position of a user of the device. For example, as shown in
As represented by block 320e, in some implementations, identifying the region of interest includes identifying the region of interest based on a saliency map of the 3D environment. For example, as shown in
As represented by block 320f, in some implementations, the first distance represents a distance between the device and the region of interest within the 3D environment. For example, referring to
As represented by block 330, in some implementations, the method 300 includes receiving, via the audio sensor, an audible signal and converting the audible signal to audible signal data. For example, as shown in
As represented by block 340, in various implementations, the method 300 includes displaying, on the display, a visual representation of the audible signal data at a second distance from the current point-of-view that is a function of the first distance between the region of interest and the current point-of-view. For example, as shown in
As represented by block 340a, in some implementations, the first distance represents a first depth at which the region of interest is located and the second distance represents a second depth at which the visual representation is displayed. For example, as shown in
As represented by block 340b, in some implementations, the second distance is within a threshold distance of the first distance. For example, in some implementations, the device displays the visual representations at a depth that is near the depth of the region of interest. In some implementations, the device categorizes the first distance into one of several categories. Each category is associated with a depth for displaying visual representations and the device sets the second distance to a depth associated with a category of the first distance. In some implementations, the second distance is the same as the first distance. For example, in some implementations, the device displays visual representations at the same depth as the region of interest.
As represented by block 340c, in some implementations, the visual representation of the audible signal data includes a transcript of speech represented by the audible signal data. For example, as shown in
As represented by block 340d, in some implementations, the visual representation includes text that is displayed at a threshold size. In some implementations, the device adjusts a font size of the visual representation so that the text size appears to stay constant regardless of the depth at which the text is displayed. For example, as shown in
As represented by block 340e, in some implementations, displaying the visual representation includes categorizing the first distance into a first category of a plurality of categories that are associated with respective rendering depths including a first rendering depth associated with the first category, and selecting the first rendering depth as the second distance. More generally, in various implementations, the device categorizes the current depth-of-focus into one of several tiers and the device displays the visual representation at a depth that corresponds to a tier of the depth-of-focus.
As represented by block 340f, in some implementations, the audible signal data corresponds to a sound being generated within the region of interest and the visual representation includes a textual description of the sound being generated within the region of interest. For example, as described in relation to
As represented by block 340g, in some implementations, the method 300 includes identifying a second region of interest that is located at a third distance from the current point-of-view, receiving, via the audio sensor, a second audible signal and converting the second audible signal to second audible signal data, and displaying, on the display, a second visual representation of the second audible signal data at a fourth distance from the current point-of-view that is a function of the third distance between the second region of interest and the current point-of-view. For example, as shown in
In some implementations, the network interface 402 is provided to, among other uses, establish and maintain a metadata tunnel between a cloud hosted network management system and at least one private network including one or more compliant devices. In some implementations, the one or more communication buses 405 include circuitry that interconnects and controls communications between system components. The memory 404 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. The memory 404 optionally includes one or more storage devices remotely located from the one or more CPUs 401. The memory 404 comprises a non-transitory computer readable storage medium.
In some implementations, the memory 404 or the non-transitory computer readable storage medium of the memory 404 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 406, the data obtainer 210, the region of interest determiner 220, the content presenter 230, the speech-to-text converter 240 and the translation determiner 250. In various implementations, the device 400 performs the method 300 shown in
In some implementations, the data obtainer 210 includes instructions 210a, and heuristics and metadata 210b for obtaining an image of an environment (e.g., the image 212 shown in
In some implementations, the region of interest determiner 220 includes instructions 220a, and heuristics and metadata 220b for identifying a region of interest within an environment (e.g., the region of interest 70 shown in
In some implementations, the content presenter 230 includes instructions 230a, and heuristics and metadata 230b for presenting an XR environment (e.g., the XR environment 110 shown in
In some implementations, the speech-to-text converter 240 includes instructions 240a, and heuristics and metadata 240b for converting speech into text (e.g., for converting the speech 60 shown in
In some implementations, the translation determiner 250 includes instructions 250a, and heuristics and metadata 250b for translating speech from a source language to a target language. In some implementations, the translation determiner 250 performs at least some of the operation(s) represented by block 340c in
In some implementations, the one or more I/O devices 408 include an input device for obtaining an input. In some implementations, the input device includes a touchscreen for detecting touch inputs, an image sensor for detecting 3D gestures, and/or a microphone for detecting voice inputs and/or sounds originating in a physical environment of the device (e.g., for detecting the speech 60 shown in
In various implementations, the one or more I/O devices 408 include a video pass-through display which displays at least a portion of a physical environment surrounding the device 400 as an image captured by a camera (e.g., for displaying the XR environment 110 shown in
Implementations described herein contemplate the use of gaze information to present salient points of view and/or salient information. Implementers should consider the extent to which gaze information is collected, analyzed, disclosed, transferred, and/or stored, such that well-established privacy policies and/or privacy practices are respected. These considerations should include the application of practices that are generally recognized as meeting or exceeding industry requirements and/or governmental requirements for maintaining user privacy. The present disclosure also contemplates that the use of a user's gaze information may be limited to what is necessary to implement the described embodiments. For instance, in implementations where a user's device provides processing power, the gaze information may be processed locally at the user's device.
While various aspects of implementations within the scope of the appended claims are described above, it should be apparent that the various features of implementations described above may be embodied in a wide variety of forms and that any specific structure and/or function described above is merely illustrative. Based on the present disclosure one skilled in the art should appreciate that an aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to or other than one or more of the aspects set forth herein.
This application claims the benefit of U.S. Provisional Patent App. No. 63/348,267, filed on Jun. 2, 2022, which is incorporated by reference in its entirety.