This disclosure relates generally to cameras, and in particular to capturing gaze-guided images.
A head mounted device is a wearable electronic device, typically worn on the head of a user. Head mounted devices may include one or more electronic components for use in a variety of applications, such as gaming, aviation, engineering, medicine, entertainment, activity tracking, and so on. Head mounted devices may include one or more displays to present virtual images to a wearer of the head mounted device. When a head mounted device includes a display, it may be referred to as a head mounted display. Head mounted devices may include one or more cameras to facilitate capturing images.
Non-limiting and non-exhaustive embodiments of the invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Embodiments of gaze-guided image capturing are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the techniques described herein can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring certain aspects.
Reference throughout this specification to “one implementation” or “an implementation” means that a particular feature, structure, or characteristic described in connection with the implementation is included in at least one implementation of the present invention. Thus, the appearances of the phrases “in one implementation” or “in an implementation” in various places throughout this specification are not necessarily all referring to the same implementation. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more implementations.
In some implementations of the disclosure, the term “near-eye” may be defined as including an element that is configured to be placed within 50 mm of an eye of a user while a near-eye device is being utilized. Therefore, a “near-eye optical element” or a “near-eye system” would include one or more elements configured to be placed within 50 mm of the eye of the user.
In aspects of this disclosure, visible light may be defined as having a wavelength range of approximately 380 nm-700 nm. Non-visible light may be defined as light having wavelengths that are outside the visible light range, such as ultraviolet light and infrared light. Infrared light, having a wavelength range of approximately 700 nm-1 mm, includes near-infrared light. In aspects of this disclosure, near-infrared light may be defined as having a wavelength range of approximately 700 nm-1.6 μm.
In aspects of this disclosure, the term “transparent” may be defined as having greater than 90% transmission of light. In some aspects, the term “transparent” may be defined as a material having greater than 90% transmission of visible light.
Implementations of devices, systems, and methods of capturing gaze-guided images are disclosed herein. In some implementations of the disclosure, a head mounted device includes an eye-tracking system that determines a gaze direction of an eye of a user of the head mounted device. One or more gaze-guided images are generated, based on the gaze direction, from one or more images captured by one or more cameras of the head mounted device that are configured to image an external environment of the head mounted device.
In an implementation, a head mounted device includes an eye-tracking system, a first image sensor, a second image sensor, and processing logic. The eye-tracking system generates a gaze direction of an eye of a user of the head mounted device. The processing logic receives the gaze direction and selects between the first image sensor and the second image sensor to capture the gaze-guided image. The image sensor having a field of view (FOV) that corresponds to the gaze direction may be selected for capturing the gaze-guided image(s), for example.
An implementation of the disclosure includes a method of operating a head mounted device. A gaze direction of an eye of a user of a head mounted device is determined, and one or more images are captured by a camera of the head mounted device. One or more gaze-guided images are generated from the one or more images based on the gaze direction of the user. In an implementation, generating the gaze-guided images includes digitally cropping one or more of the images. In an implementation, generating the gaze-guided images includes rotating the camera in response to the gaze direction.
Generating gaze-guided images in response to a gaze direction allows users to capture images that are relevant to where they are gazing without requiring additional effort. Additionally, in some implementations, generating gaze-guided images in response to a gaze direction of the user allows one or more cameras to capture images that are focused to the depth at which the user is looking. By way of example, cameras may be focused to a near-field subject (e.g. a flower close to the user) or a far-field subject (e.g. mountains in the distance) in response to the gaze direction determined by the eye-tracking system. These and other implementations are described in more detail in connection with
In addition to image sensors, various other sensors of head mounted device 100 may be configured to capture eye data that is utilized to determine a gaze direction of the eye (or eyes). Ultrasound or light detection and ranging (LIDAR) sensors may be configured in frame 102 to detect a position of an eye of the user by detecting the position of the cornea of the eye, for example. Discrete photodiodes included in frame 102 or optical elements 110A and/or 110B may also be used to detect a position of the eye of the user. Discrete photodiodes may be used to detect “glints” of light reflecting off of the eye, for example. Eye data generated by these various sensors may not necessarily be considered “images” of the eye, yet the eye data may still be used by an eye-tracking system to determine a gaze direction of the eye(s).
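For illustration only, the sketch below shows one heuristic by which a handful of discrete photodiode glint readings could be reduced to a coarse estimate of eye position; the photodiode positions, the normalized readings, and the weighted-centroid approach are assumptions and are not the method prescribed by this disclosure.

```python
import numpy as np

# Hypothetical photodiode positions (millimeters, frame coordinates) and
# normalized glint intensities; both are illustrative values only.
photodiode_positions = np.array([
    [-10.0, 5.0], [10.0, 5.0], [-10.0, -5.0], [10.0, -5.0],
])
glint_readings = np.array([0.9, 0.3, 0.6, 0.2])

def estimate_glint_centroid(positions: np.ndarray, readings: np.ndarray) -> np.ndarray:
    """Intensity-weighted centroid as a crude proxy for cornea position."""
    weights = readings / readings.sum()
    return weights @ positions

print(estimate_glint_centroid(photodiode_positions, glint_readings))
```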
When head mounted device 100 includes a display, it may be considered a head mounted display. Head mounted device 100 may be considered an augmented reality (AR) head mounted display. While
Illumination layer 130A is shown as including a plurality of in-field illuminators 126. In-field illuminators 126 are described as “in-field” because they are in a FOV of a user of the head mounted device 100. In-field illuminators 126 may be in the same FOV in which a user views a display of the head mounted device 100, in an implementation. In-field illuminators 126 may also be in the same FOV in which a user views an external environment of the head mounted device 100 via scene light 191 propagating through near-eye optical elements 110. Scene light 191 is from the external environment of head mounted device 100. While in-field illuminators 126 may introduce minor occlusions into the near-eye optical element 110A, the in-field illuminators 126, as well as their corresponding electrical routing, may be so small as to be unnoticeable or insignificant to a wearer of head mounted device 100. In some implementations, illuminators 126 are not in-field; rather, illuminators 126 could be out-of-field.
As shown in
As shown in
Optically transparent layer 120A is shown as being disposed between the illumination layer 130A and the eyeward side 109 of the near-eye optical element 110A. The optically transparent layer 120A may receive the infrared illumination light emitted by the illumination layer 130A and pass the infrared illumination light to illuminate the eye of the user in an eyebox region of the head mounted device. As mentioned above, the optically transparent layer 120A may also be transparent to visible light, such as scene light 191 received from the environment and/or image light 141 received from the display layer 140A. In some examples, the optically transparent layer 120A has a curvature for focusing light (e.g., display light and/or scene light) to the eye of the user. Thus, the optically transparent layer 120A may, in some examples, be referred to as a lens. In some aspects, the optically transparent layer 120A has a thickness and/or curvature that corresponds to the specifications of a user. In other words, the optically transparent layer 120A may be a prescription lens. However, in other examples, the optically transparent layer 120A may be a non-prescription lens.
Head mounted device 100 includes at least one camera for generating gaze-guided images in response to a gaze direction of the eye(s). In the particular illustrated example of
In
Second camera 293B includes a second image sensor configured to capture second images 295B of an external environment of the head mounted device. The second image sensor has a second FOV 297B and axis 298B illustrates a middle of the second FOV 297B. Axis 298B may correspond to an optical axis of a lens assembly of second camera 293B and axis 298B may intersect a middle of the second image sensor. Second camera 293B is configured to provide second images 295B to processing logic 270.
Third camera 293C includes a third image sensor configured to capture third images 295C of an external environment of the head mounted device. The third image sensor has a third FOV 297C and axis 298C illustrates a middle of the third FOV 297C. Axis 298C may correspond to an optical axis of a lens assembly of third camera 293C and axis 298C may intersect a middle of the third image sensor. Third camera 293C is configured to provide third images 295C to processing logic 270.
Fourth camera 293D includes a fourth image sensor configured to capture fourth images 295D of an external environment of the head mounted device. The fourth image sensor has a fourth FOV 297D and axis 298D illustrates a middle of the fourth FOV 297D. Axis 298D may correspond to an optical axis of a lens assembly of fourth camera 293D and axis 298D may intersect a middle of the fourth image sensor. Fourth camera 293D is configured to provide fourth images 295D to processing logic 270.
Eye-tracking system 260 includes one or more sensors configured to determine a gaze direction of an eye in an eyebox region of a head mounted device. Eye-tracking system 260 may also include digital or analog processing logic to assist in determining/calculating the gaze direction of the eye. Any suitable technique may be used to determine a gaze direction of the eye(s). For example, eye-tracking system 260 may include one or more cameras that image the eye(s) to determine a pupil position, which indicates where the eye is gazing. In another example, “glints” reflecting off the cornea (and/or other portions of the eye) are utilized to determine a position of the eye that is then used to determine the gaze direction. Other sensors described in association with
Eye-tracking system 260 is configured to generate gaze direction data 265 that includes a gaze direction of the eye(s) and provide gaze direction data 265 to processing logic 270. Gaze direction data 265 may include vergence data representative of a focus distance and a direction of where two eyes are focusing. Processing logic 270 is configured to receive gaze direction data 265 from eye-tracking system 260 and select a selected image sensor to capture one or more gaze-guided images based on gaze direction data 265. In the illustrated implementations of
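As a sketch of how vergence data could encode a focus distance, the following illustrative Python computes the near-intersection of two gaze rays (the midpoint of the shortest segment between them) and the resulting focus distance. The coordinate convention, function names, and numerical example are assumptions, not part of this disclosure.

```python
import numpy as np

def vergence_point(o_left, d_left, o_right, d_right):
    """Return the midpoint of the shortest segment between two gaze rays.

    o_* are eye positions (ray origins); d_* are unit gaze directions.
    """
    # Minimize |(o_l + t*d_l) - (o_r + s*d_r)| over ray parameters t, s.
    w = o_left - o_right
    a = d_left @ d_left
    b = d_left @ d_right
    c = d_right @ d_right
    d = d_left @ w
    e = d_right @ w
    denom = a * c - b * b
    if abs(denom) < 1e-9:      # near-parallel rays: gaze at "infinity"
        return None
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    return 0.5 * ((o_left + t * d_left) + (o_right + s * d_right))

# Example: eyes 64 mm apart, both gazing toward a point ~1 m ahead.
o_l = np.array([-0.032, 0.0, 0.0])
o_r = np.array([ 0.032, 0.0, 0.0])
target = np.array([0.0, 0.0, 1.0])
d_l = (target - o_l) / np.linalg.norm(target - o_l)
d_r = (target - o_r) / np.linalg.norm(target - o_r)
p = vergence_point(o_l, d_l, o_r, d_r)
focus_distance = np.linalg.norm(p - 0.5 * (o_l + o_r))
print(p, focus_distance)  # ~[0, 0, 1], ~1.0 m
```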
In an implementation, processing logic 270 selects a particular image sensor for capturing the gaze-guided image(s) based on the gaze direction included in gaze direction data 265. For example, processing logic 270 may select between two or more image sensors to capture the gaze-guided image(s). Selecting the selected image sensor to capture the one or more gaze-guided images may be based on the gaze direction (included in gaze direction data 265) with respect to the FOV of the image sensors.
The FOVs of the image sensors may overlap in some implementations. In
At a subsequent point in time, a gaze direction of the user may change such that gaze vector 262 is representative of a subsequent gaze direction of subsequent gaze direction data 265. Gaze vector 262 may be included in both FOV 297B and FOV 297C. Processing logic 270 may select the image sensor of the camera where the gaze vector (e.g. gaze vector 262) is closest to a middle of the FOV of that image sensor. In the illustrated example, the image sensor of camera 293C may be selected by processing logic 270 as the “subsequent-selected image sensor” to capture gaze-guided images since gaze vector 262 is closer to the middle of FOV 297C (axis 298C) than it is to the middle of FOV 297B (axis 298B). The subsequent-selected image sensor may then generate the gaze-guided images.
At yet another point in time, a gaze direction of the user may change such that gaze vector 261 is representative of the gaze direction of gaze direction data 265. Gaze vector 261 may be included in both FOV 297B and FOV 297C. Processing logic 270 may select the image sensor of the camera where the gaze vector (e.g. gaze vector 261) is closest to a middle of the FOV of that image sensor. In the illustrated example, the image sensor of camera 293B may be selected by processing logic 270 as the “selected image sensor” to capture gaze-guided images since gaze vector 261 is closer to the middle of FOV 297B (axis 298B) than it is to the middle of FOV 297C (axis 298C). In this context, second images 295B captured by camera 293B are stored in memory 280 as gaze-guided images 275.
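A minimal sketch of the selection rule described above: pick the image sensor whose FOV axis (e.g. axes 298A-298D) makes the smallest angle with the gaze vector. The axis values and helper name below are illustrative assumptions.

```python
import numpy as np

def select_camera(gaze_vector: np.ndarray, camera_axes: dict) -> str:
    """Pick the camera whose FOV axis is angularly closest to the gaze."""
    gaze = gaze_vector / np.linalg.norm(gaze_vector)
    best_name, best_angle = None, np.inf
    for name, axis in camera_axes.items():
        axis = axis / np.linalg.norm(axis)
        angle = np.arccos(np.clip(gaze @ axis, -1.0, 1.0))
        if angle < best_angle:
            best_name, best_angle = name, angle
    return best_name

# Axes loosely modeled on axes 298A-298D; the values are assumed.
camera_axes = {
    "293A": np.array([-0.6, 0.0, 0.8]),
    "293B": np.array([-0.2, 0.0, 0.98]),
    "293C": np.array([ 0.2, 0.0, 0.98]),
    "293D": np.array([ 0.6, 0.0, 0.8]),
}
print(select_camera(np.array([0.1, 0.0, 1.0]), camera_axes))  # -> "293C"
```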
Display layer 440 presents virtual images in image light 441 to an eyebox region 401 for viewing by an eye 403. Processing logic 470 is configured to drive virtual images onto display layer 440 to present image light 441 to eyebox region 401. Illumination layer 430 includes light sources 426 configured to illuminate eyebox region 401 with infrared illumination light 427. Illumination layer 430 may include a transparent refractive material that functions as a substrate for light sources 426. Infrared illumination light 427 may be near-infrared illumination light. Eye-tracking system 460 includes a camera configured to directly image eye 403, in the illustrated example of
The camera of eye-tracking system 460 may include a complementary metal-oxide semiconductor (CMOS) image sensor, in some implementations. An infrared filter that passes a narrow-band infrared wavelength may be placed over the image sensor of the camera so that the image sensor is sensitive to the narrow-band infrared wavelength while rejecting visible light and wavelengths outside the narrow-band. Infrared light sources (e.g. light sources 426) such as infrared LEDs or infrared VCSELs that emit the narrow-band wavelength may be oriented to illuminate eye 403 with the narrow-band infrared wavelength.
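For illustration, glints in a narrow-band infrared eye image could be localized by simple thresholding and contour centroids, as in the sketch below (using OpenCV); the threshold value and the overall approach are assumptions, not the disclosed implementation.

```python
import cv2
import numpy as np

def find_glints(ir_frame: np.ndarray, threshold: int = 240) -> list:
    """Return (x, y) centroids of bright corneal reflections ("glints").

    ir_frame is a single-channel 8-bit image from the narrow-band IR
    camera. The threshold is an assumed value; a real system calibrates it.
    """
    _, binary = cv2.threshold(ir_frame, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    glints = []
    for contour in contours:
        m = cv2.moments(contour)
        if m["m00"] > 0:  # skip degenerate contours
            glints.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return glints
```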
In the illustrated implementation of
In some implementations, processing logic 470 may transmit gaze direction data 465 and images 496 to a mobile device 499 or other computing device. Images 496 may include the one or more images 495 received from cameras 493A and 493B. Processing logic 498 of mobile device 499 may then generate the gaze-guided images using any of the techniques of this disclosure. Transmitting the gaze direction data 465 and images 496 to mobile device 499 for generating the gaze-guided images may be advantageous for conserving the power and processing resources of head mounted device 400, for example.
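One possible shape for such an offload path is sketched below; the wire format (length-prefixed JSON header followed by a JPEG payload), the host/port, and the function name are purely illustrative assumptions, since the disclosure does not specify a transport.

```python
import json
import socket

def send_for_processing(host: str, port: int, gaze_direction, jpeg_bytes: bytes):
    """Send gaze direction data and a captured frame to a companion device."""
    header = json.dumps({
        "gaze_direction": list(gaze_direction),  # e.g. a unit gaze vector
        "image_bytes": len(jpeg_bytes),
    }).encode("utf-8")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(len(header).to_bytes(4, "big"))  # 4-byte header length
        sock.sendall(header)
        sock.sendall(jpeg_bytes)
```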
In process block 505, a gaze direction of an eye of a user (of a head mounted device) is determined. The gaze direction may be determined by an eye-tracking system (e.g. eye-tracking system 260 or 460) or by processing logic that receives gaze direction data (e.g. processing logic 270 or 470), for example.
In process block 510, one or more images are captured by at least one camera of the head mounted device.
One or more gaze-guided images are generated in process block 515. The one or more gaze-guided images are based on the gaze direction of the user. Process 500 may return to process block 505 after executing process block 515 to determine a new gaze direction of the eye of the user and repeat process 500 to generate gaze-guided images based on that gaze direction.
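The flow of process 500 might be organized as in the sketch below; every name in it is a placeholder, since the disclosure defines the process blocks rather than an API.

```python
# Minimal control-loop sketch of process 500 (blocks 505, 510, 515).
# eye_tracker, cameras, and generate_gaze_guided_images stand in for the
# eye-tracking system, camera stack, and generation techniques described
# in this disclosure.
def run_gaze_guided_capture(eye_tracker, cameras, generate_gaze_guided_images):
    while True:
        gaze_direction = eye_tracker.determine_gaze()        # process block 505
        images = [camera.capture() for camera in cameras]    # process block 510
        yield generate_gaze_guided_images(images, gaze_direction)  # block 515
        # Loop returns to process block 505 with a new gaze direction.
```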
In an implementation of process 500, the at least one camera of process block 510 is included in a plurality of cameras of the head mounted device and generating the one or more gaze-guided images includes selecting a selected camera among the plurality of cameras of the head mounted device. The selected camera is selected to capture the one or more gaze-guided images based on the gaze direction.
In an implementation of process 500, generating the one or more gaze-guided images includes cropping one or more images to generate the gaze-guided images where the one or more images are cropped in response to the gaze direction with respect to a field of view (FOV) of the at least one camera.
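A hedged sketch of such gaze-directed cropping: map the gaze direction to a pixel location within the camera's FOV and crop a window around it. The linear angle-to-pixel mapping and the crop fraction below are simplifying assumptions (a real implementation would use the lens intrinsics).

```python
import numpy as np

def gaze_crop(image: np.ndarray, gaze_yaw: float, gaze_pitch: float,
              fov_h: float, fov_v: float, crop_frac: float = 0.5) -> np.ndarray:
    """Crop around where the gaze lands in the camera's FOV.

    Angles are in radians; (0, 0) gaze maps to the image center.
    """
    h, w = image.shape[:2]
    # Linear mapping from gaze angle to pixel offset from the center.
    cx = int(w / 2 + (gaze_yaw / fov_h) * w)
    cy = int(h / 2 - (gaze_pitch / fov_v) * h)
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    # Clamp the crop window so it stays inside the image.
    x0 = int(np.clip(cx - cw // 2, 0, w - cw))
    y0 = int(np.clip(cy - ch // 2, 0, h - ch))
    return image[y0:y0 + ch, x0:x0 + cw]
```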
In an implementation of process 500, the at least one camera of process block 510 includes a lens assembly configured to focus image light onto an image sensor of the camera and generating the one or more gaze-guided images includes driving an optical zoom of the lens assembly in response to the gaze direction.
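The zoom-drive step might look like the following sketch; the lens-control method `set_focal_length` and the distance-to-focal-length mapping are hypothetical, since the disclosure does not specify a lens API.

```python
# Hypothetical sketch: drive an optical zoom from a vergence-derived focus
# distance. The mapping (zoom in more for distant subjects) and the focal
# length limits are illustrative assumptions.
def drive_optical_zoom(lens, focus_distance_m: float,
                       min_f_mm: float = 24.0, max_f_mm: float = 70.0):
    # Scale focal length with subject distance, clamped to the lens range.
    f_mm = min(max_f_mm, max(min_f_mm, min_f_mm * focus_distance_m))
    lens.set_focal_length(f_mm)  # assumed lens-control interface
```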
Referring again to
By way of example, a subject such as mountains 241 in
In another implementation of process 500, generating the one or more gaze-guided images includes: (1) identifying a focus distance that corresponds with the gaze direction of the user; and (2) applying filters to the one or more images to generate the gaze-guided images. In an example, a blur filter is applied to the one or more images to blur a foreground of the image (the foreground having a depth less than the focus distance) and/or a background of the image (the background having a depth greater than the focus distance). In this way, the subject that the user may be gazing at is in focus (sharp) in the gaze-guided image.
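A minimal sketch of this filter step, assuming a per-pixel depth map is available (the disclosure does not mandate a particular depth source): pixels whose depth differs from the gaze focus distance by more than a tolerance are replaced with blurred pixels, leaving the gazed-at subject sharp.

```python
import cv2
import numpy as np

def apply_focus_blur(image: np.ndarray, depth_map: np.ndarray,
                     focus_distance: float, tolerance: float = 0.2,
                     blur_sigma: float = 5.0) -> np.ndarray:
    """Blur pixels whose depth differs from the gaze focus distance.

    depth_map has the same height/width as image (e.g. from stereo
    cameras or a depth sensor); tolerance and blur_sigma are assumed values.
    """
    blurred = cv2.GaussianBlur(image, (0, 0), blur_sigma)
    out_of_focus = np.abs(depth_map - focus_distance) > tolerance
    result = image.copy()
    result[out_of_focus] = blurred[out_of_focus]  # keep in-focus pixels sharp
    return result
```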
In yet another implementation of process 500, the at least one camera of process block 510 includes a lens assembly configured to focus image light onto an image sensor of the camera and generating the one or more gaze-guided images includes rotating the image sensor and the lens assembly of the camera in response to the gaze direction. In other words, the camera may be physically rotated to be pointed where the user is gazing.
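For illustration, a gaze vector could be converted to pan/tilt commands for a two-axis camera mount as sketched below; the coordinate convention (x right, y up, z forward) and the gimbal interface are assumptions.

```python
import math

def gaze_to_pan_tilt(gaze_vector):
    """Convert a unit gaze vector to pan/tilt angles (degrees) for a
    hypothetical two-axis camera gimbal."""
    x, y, z = gaze_vector
    pan = math.degrees(math.atan2(x, z))                  # yaw about vertical axis
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))  # pitch above horizon
    return pan, tilt

# Example: gaze slightly right of and above straight ahead.
print(gaze_to_pan_tilt((0.2, 0.1, 0.97)))  # ~ (11.7, 5.8)
```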
Embodiments of the invention may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, e.g., create content in an artificial reality and/or are otherwise used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
The term “processing logic” (e.g. 270 and/or 470) in this disclosure may include one or more processors, microprocessors, multi-core processors, application-specific integrated circuits (ASICs), and/or field-programmable gate arrays (FPGAs) to execute operations disclosed herein. In some embodiments, memories (not illustrated) are integrated into the processing logic to store instructions to execute operations and/or store data. Processing logic may also include analog or digital circuitry to perform the operations in accordance with embodiments of the disclosure.
A “memory” or “memories” (e.g. 280 and/or 475) described in this disclosure may include one or more volatile or non-volatile memory architectures. The “memory” or “memories” may be removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Example memory technologies may include RAM, ROM, EEPROM, flash memory, CD-ROM, digital versatile disks (DVD), high-definition multimedia/data storage disks, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
A network may include any network or network system such as, but not limited to, the following: a peer-to-peer network; a Local Area Network (LAN); a Wide Area Network (WAN); a public network, such as the Internet; a private network; a cellular network; a wireless network; a wired network; a wireless and wired combination network; and a satellite network.
Communication channels may include or be routed through one or more wired or wireless communications utilizing IEEE 802.11 protocols, Bluetooth, SPI (Serial Peripheral Interface), I2C (Inter-Integrated Circuit), USB (Universal Serial Bus), CAN (Controller Area Network), cellular data protocols (e.g. 3G, 4G, LTE, 5G), optical communication networks, Internet Service Providers (ISPs), a peer-to-peer network, a Local Area Network (LAN), a Wide Area Network (WAN), a public network (e.g. “the Internet”), a private network, a satellite network, or otherwise.
A computing device may include a desktop computer, a laptop computer, a tablet, a phablet, a smartphone, a feature phone, a server computer, or otherwise. A server computer may be located remotely in a data center or located locally.
The processes explained above are described in terms of computer software and hardware. The techniques described may constitute machine-executable instructions embodied within a tangible or non-transitory machine (e.g., computer) readable storage medium, that when executed by a machine will cause the machine to perform the operations described. Additionally, the processes may be embodied within hardware, such as an application specific integrated circuit (“ASIC”) or otherwise.
A tangible non-transitory machine-readable storage medium includes any mechanism that provides (i.e., stores) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.). For example, a machine-readable storage medium includes recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.