Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Computing devices such as personal computers, laptop computers, tablet computers, cellular phones, and countless types of Internet-capable devices are increasingly prevalent in numerous aspects of modern life. Over time, the manner in which these devices are providing information to users is becoming more intelligent, more efficient, more intuitive, and/or less obtrusive.
The trend toward miniaturization of computing hardware, peripherals, as well as of sensors, detectors, and image and audio processors, among other technologies, has helped open up a field sometimes referred to as “wearable computing.” In the area of image and visual processing and production, in particular, it has become possible to consider wearable displays that place a very small image display element close enough to a wearer's (or user's) eye(s) such that the displayed image fills or nearly fills the field of view, and appears as a normal sized image, such as might be displayed on a traditional image display device. The relevant technology may be referred to as “near-eye displays.”
Near-eye displays are fundamental components of wearable displays, also sometimes called “head-mounted displays” (HMDs). A head-mounted display places a graphic display or displays close to one or both eyes of a wearer. To generate the images on a display, a computer processing system may be used. Such displays may occupy a wearer's entire field of view, or occupy only part of a wearer's field of view. Further, head-mounted displays may be as small as a pair of glasses or as large as a helmet.
Emerging and anticipated uses of wearable displays include applications in which users interact in real time with an augmented or virtual reality. Such applications can be mission-critical or safety-critical, such as in a public safety or aviation setting. The applications can also be recreational, such as interactive gaming.
In one aspect, a computer-implemented method is provided. A field of view of an environment is provided through a head-mounted display (HMD) of a wearable computing device. The HMD is operable to display a computer-generated image overlaying at least a portion of the view. The wearable computing device can be engaged in an experience sharing session. At least one image of the environment is captured using a camera associated with the wearable computing device. The wearable computing device determines a first portion of the at least one image that corresponds to a region of interest within the field of view. The wearable computing device formats the at least one image such that a second portion of the at least one image is of a lower-bandwidth format than the first portion. The second portion of the at least one image is outside of the portion that corresponds to the region of interest. The wearable computing device transmits the formatted at least one image as part of the experience-sharing session.
In another aspect, a method is provided. A field of view of an environment is provided through a head-mounted display (HMD) of a wearable computing device. The HMD is operable to display a computer-generated image overlaying at least a portion of the view. The wearable computing device can be engaged in an experience sharing session. An instruction specifying audio of interest is received at the wearable computing device. Audio input is received at the wearable computing device via one or more microphones. The wearable computing device determines whether the audio input includes at least part of the audio of interest. In response to determining that the audio input includes the at least part of the audio of interest, the wearable computing device generates an indication of a region of interest associated with the at least part of the audio of interest. The wearable computing device displays the indication of the region of interest as part of the computer-generated image.
In yet another aspect, a method is provided. An experience sharing session is established at a server. The server receives one or more images of a field of view of an environment via the experience sharing session. The server receives an indication of a region of interest within the field of view of the environment via the experience sharing session. A first portion of the one or more images is determined that corresponds to the region of interest. The one or more images are formatted such that a second portion of the one or more images is formatted in a lower-bandwidth format than the first portion. The second portion of the one or more images is outside of the portion that corresponds to the region of interest. The formatted one or more images are transmitted.
In a further aspect, a wearable computing device is provided. The wearable computing device includes a processor and memory. The memory has one or more instructions that, in response to execution by the processor, cause the wearable computing device to perform functions. The functions include: (a) establish an experience sharing session, (b) receive one or more images of a field of view of an environment via the experience sharing session, (c) receive an indication of a region of interest within the field of view of the one or more images via the experience sharing session, (d) determine a first portion of the one or more images that corresponds to the region of interest, (e) format the one or more images such that a second portion of the one or more images is formatted in a lower-bandwidth format than the first portion, where the second portion of the one or more images is outside of the portion that corresponds to the region of interest, and (f) transmit the formatted one or more images.
In yet another aspect, an apparatus is provided. The apparatus includes: (a) means for establishing an experience sharing session, (b) means for receiving one or more images of a field of view of an environment via the experience sharing session, (c) means for receiving an indication of a region of interest within the field of view of the one or more images via the experience sharing session, (d) means for determining a first portion of the one or more images that corresponds to the region of interest, (e) means for formatting the one or more images such that a second portion of the one or more images is formatted in a lower-bandwidth format than the first portion, where the second portion of the one or more images is outside of the portion that corresponds to the region of interest, and (f) means for transmitting the formatted one or more images.
In the figures:
Exemplary methods and systems are described herein. It should be understood that the word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features. The exemplary embodiments described herein are not meant to be limiting. It will be readily understood that certain aspects of the disclosed systems and methods can be arranged and combined in a wide variety of different configurations, all of which are contemplated herein.
In the following detailed description, reference is made to the accompanying figures, which form a part hereof. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.
Experience sharing generally involves a user sharing media that captures their experience with one or more other users. In an exemplary embodiment, a user may use a wearable computing device or another computing device to capture media that conveys the world as they are experiencing it, and then transmit this media to others in order to share their experience. For example, in an experience-sharing session (ESS), a user may share a point-of-view video feed captured by a video camera on a head-mounted display of their wearable computer, along with a real-time audio feed from a microphone of their wearable computer. Many other examples are possible as well.
In an experience-sharing session, the computing device that is sharing a user's experience may be referred to as a “sharing device” or a “sharer,” while the computing device or devices that are receiving real-time media from the sharer may each be referred to as a “viewing device” or a “viewer.” Additionally, the content that is shared by the sharing device during an experience-sharing session may be referred to as a “share.” Further, a computing system that supports an experience-sharing session between a sharer and one or more viewers may be referred to as a “server,” an “ES server,” “server system,” or “supporting server system.”
In some exemplary methods, the sharer may transmit a share in real time to the viewer, allowing the experience to be portrayed as it occurs. In this case, the sharer may also receive and present comments from the viewers. For example, a sharer may share the experience of navigating a hedge maze while receiving help or criticism from viewers. In another embodiment, the server may store a share so that new or original viewers may access the share outside of real time.
A share may include a single type of media content (i.e., a single modality of media), or may include multiple types of media content (i.e., multiple modalities of media). In either case, a share may include a video feed, a three-dimensional (3D) video feed (e.g., video created by two cameras that is combined to create 3D video), an audio feed, a text-based feed, an application-generated feed, and/or other types of media content.
Further, in some embodiments a share may include multiple instances of the same type of media content. For example, in some embodiments, a share may include two or more video feeds. For instance, a share could include a first video feed from a forward-facing camera on a head-mounted display (HMD), and a second video feed from a camera on the HMD that is facing inward towards the wearer's face. As another example, a share could include multiple audio feeds for stereo audio or spatially-localized audio providing surround sound.
In some implementations, a server may allow a viewer to participate in a voice chat that is associated with the experience-sharing session in which they are a viewer. For example, a server may support a voice chat feature that allows viewers and/or the sharer in an experience-sharing session to enter an associated voice-chat session. The viewers and/or the sharer who participate in a voice-chat session may be provided with a real-time audio connection with one another, so that each of those devices can play out the audio from all the other devices in the session. In an exemplary embodiment, the serving system supporting the voice-chat session may sum or mix the audio feeds from all participating viewers and/or the sharer into a combined audio feed that is output to all the participating devices. Further, in such an embodiment, signal processing may be used to minimize noise when audio is not received from a participating device (e.g., when the user of that device is not speaking). Further, when a participant exits the chat room, that participant's audio connection may be disabled. (Note however, that they may still participate in the associated experience-sharing session.) This configuration may help to create the perception of an open audio communication channel.
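For purposes of illustration, one minimal sketch of such server-side audio mixing is shown below, assuming each participant's audio arrives as equal-length frames of floating-point PCM samples; the function names, the energy-based gating, and the threshold value are illustrative assumptions rather than a required implementation.

```python
# Illustrative sketch: sum the audio frames from all participants, gating out
# frames whose short-term energy is below a threshold so that silent (noisy)
# microphones do not degrade the combined feed. Names are hypothetical.
import numpy as np

def is_active(frame, energy_threshold=1e-4):
    """Crude voice-activity check based on the mean-square energy of one frame."""
    return float(np.mean(frame.astype(np.float64) ** 2)) > energy_threshold

def mix_frames(frames, energy_threshold=1e-4):
    """Mix one frame (equal-length float arrays in [-1, 1]) from each participant."""
    active = [f for f in frames if is_active(f, energy_threshold)]
    if not active:
        return np.zeros_like(frames[0])
    mixed = np.sum(active, axis=0) / len(active)   # average to avoid clipping
    return np.clip(mixed, -1.0, 1.0)

# Example: three participants, one of them effectively silent.
rng = np.random.default_rng(0)
speaking = 0.1 * rng.standard_normal(480)          # one 10 ms frame at 48 kHz
silent = 1e-4 * rng.standard_normal(480)           # near the noise floor
combined = mix_frames([speaking, speaking, silent])
```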
In a further aspect, a server could also support a video-chat feature that is associated with an experience-sharing session. For example, some or all of the participants in a video chat could stream a low-resolution video feed. As such, participants in the video chat may be provided with a view of a number of these low-resolution video feeds on the same screen as the video from a sharer, along with a combined audio feed as described above. For instance, low-resolution video feeds from viewers and/or the sharer could be displayed to a participating viewer. Alternatively, the supporting server may determine when a certain participating device is transmitting speech from its user, and update which video or videos are displayed based on which participants are transmitting speech at the given point in time.
In either scenario above, and possibly in other scenarios, viewer video feeds may be formatted to capture the users themselves, so that the users can be seen as they speak. Further, the video from a given viewer or the sharer may be processed to include a text caption including, for example, the name of a given device's user or the location of the device. Other processing may also be applied to video feeds in a video chat session.
In some embodiments, a video chat session may be established that rotates the role of sharer between different participating devices (with those devices that are not designated as the sharer at a given point in time acting as viewers). For example, when a number of wearable computers are involved in a rotating-sharer experience-sharing session, the supporting server system may analyze audio feeds from the participating wearable computers to determine which wearable computer is transmitting audio including the associated user's speech. Accordingly, the server system may select the video from this wearable computer and transmit the video to all the other participating wearable computers. The wearable computer may be de-selected when it is determined that speech is no longer being received from it. Alternatively, the wearable computer may be de-selected after waiting for a predetermined amount of time after it ceases transmission of speech.
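For purposes of illustration, a hypothetical sketch of such rotating-sharer selection follows, assuming that per-participant speech detection is already available; the class name, hold-off interval, and selection policy are illustrative assumptions only.

```python
# Illustrative sketch: the participant whose audio currently carries speech is
# selected as the sharer, and is de-selected only after a hold-off period of
# silence, approximating the predetermined wait described above.
import time

class SharerSelector:
    def __init__(self, holdoff_s=2.0):
        self.holdoff_s = holdoff_s        # seconds of silence before de-selection
        self.current = None               # id of the currently selected sharer
        self.last_speech_time = 0.0

    def update(self, speech_by_participant):
        """speech_by_participant maps participant id -> True if speech is detected now."""
        now = time.monotonic()
        speakers = [pid for pid, speaking in speech_by_participant.items() if speaking]
        if speakers:
            # Keep the current sharer if still speaking; otherwise switch to a speaker.
            if self.current not in speakers:
                self.current = speakers[0]
            self.last_speech_time = now
        elif self.current is not None and now - self.last_speech_time > self.holdoff_s:
            self.current = None           # de-select after the hold-off expires
        return self.current

selector = SharerSelector(holdoff_s=2.0)
selector.update({"hmd-a": False, "hmd-b": True, "hmd-c": False})   # selects "hmd-b"
```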
In a further aspect, the video from some or all the wearable computers that participate in such a video chat session may capture the experience of the user that is wearing the respective wearable computer. Therefore, when a given wearable computer is selected, this wearable computer is acting as the sharer in the experience-sharing session, and all the other wearable computers are acting as viewers. Thus, as different wearable computers are selected, the role of the sharer in the experience-sharing session is passed between these wearable computers. In this scenario, the sharer in the experience-sharing session is updated such that the user who is speaking at a given point in time is sharing what they are seeing with the other users in the session.
In a variation on the above-described video-chat application, when multiple participants are acting as sharers and transmitting a share, individual viewers may be able to select which share they receive, such that different viewers may be concurrently receiving different shares.
In another variation on the above-described video-chat application, the experience-sharing session may have a “directing viewer” that may select which share or shares will be displayed at any given time. This variation may be particularly useful in an application of a multi-sharer experience-sharing session, in which a number of sharers are all transmitting a share related to a certain event. For instance, each member of a football team could be equipped with a helmet-mounted camera. As such, all members of the team could act as sharers in a multi-sharer experience-sharing session by transmitting a real-time video feed from their respective helmet-mounted cameras. A directing viewer could then select which video feeds to display at a given time. For example, at a given point in time, the directing viewer might select a video feed or feeds from a member or members that are involved in a play that is currently taking place.
In a further aspect of such an embodiment, the supporting server system may be configured to resolve conflicts if multiple devices transmit speech from their users simultaneously. Alternatively, the experience-sharing session interface for participants may be configured to display multiple video feeds at once (i.e., to create multiple simultaneous sharers in the experience-sharing session). For instance, if speech is received from multiple participating devices at once, a participating device may divide its display to show the video feeds from some or all of the devices from which speech is simultaneously received.
In a further aspect, a device that participates in an experience-sharing session may store the share or portions of the share for future reference. For example, in a video-chat implementation, a participating device and/or a supporting server system may store the video and/or audio that is shared during the experience-sharing session. As another example, in a video-chat or voice-chat session, a participating device and/or a supporting server system may store a transcript of the audio from the session.
In many instances, users may want to participate in an experience-sharing session via their mobile devices. However, streaming video and other media to mobile devices can be difficult due to bandwidth limitations. Further, users may have bandwidth quotas in their service plans, and thus may wish to conserve their bandwidth usage. For these and/or other reasons, it may be desirable to conserve bandwidth where possible. As such, exemplary methods may take advantage of the ability to identify a region of interest (ROI) in a share, which corresponds to what the sharer is focusing on, and then format the share so as to reduce the bandwidth required for portions of the share other than the ROI. Since viewers are more likely to be interested in what the sharer is focusing on, this type of formatting may help to reduce bandwidth requirements, without significantly impacting a viewer's enjoyment of the session.
For example, to conserve bandwidth, a wearable computing device may transmit the portion of the video that corresponds to the region of interest in a high-resolution format and the remainder of the video in a low-resolution format. In some embodiments, the high-resolution format takes relatively more bandwidth to transmit than the low-resolution format. Thus, the high-resolution format can be considered as a “high-bandwidth format” or “higher-bandwidth format,” while the low-resolution format can be considered as a “low-bandwidth format” or “lower-bandwidth format.” Alternatively, the portion outside of the region of interest might not be transmitted at all. In addition to video, the wearable computing device could capture and transmit an audio stream. The user, a remote viewer, or an automated function may identify a region of interest in the audio stream, such as a particular speaker.
In some embodiments, identifying the ROI, determining a portion of images in the share corresponding to the ROI, and/or formatting the images based on the determined portion can be performed in real-time. The images in the share can be transmitted as video data in real-time.
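For purposes of illustration, a minimal sketch of such ROI-based formatting is shown below, assuming the region of interest is a rectangle in frame coordinates and using the Pillow imaging library; the function name, the (left, top, right, bottom) box convention, and the downscale factor are illustrative assumptions.

```python
# Illustrative sketch: keep the ROI pixels at full resolution and downsample the
# remainder of the frame, yielding a lower-bandwidth representation of the part
# of the frame outside of the region of interest.
from PIL import Image

def format_frame(frame, roi, downscale=4):
    """Return (high_res_roi, low_res_background) for one video frame.

    roi is a (left, top, right, bottom) rectangle in frame coordinates.
    """
    roi_image = frame.crop(roi)                                # full-resolution ROI
    background = frame.resize((frame.width // downscale,       # coarse remainder
                               frame.height // downscale))
    return roi_image, background

frame = Image.new("RGB", (640, 480), (40, 120, 40))            # stand-in camera frame
roi_image, background = format_frame(frame, roi=(200, 150, 360, 270))
```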
ROI functionality may be implemented in many other scenarios as well. For instance, in an experience-sharing session, a sharer might want to point out notable features of the environment. For example, in an experience sharing session during a scuba dive, a sharer might want to point out an interesting fish or coral formation. Additionally or alternatively, a viewer in the experience-sharing session might want to know what the sharer is focusing their attention on. In either case, one technique to point out notable features is to specify a region of interest (ROI) within the environment.
The region of interest could be defined either by the user of the wearable computing device or by one of the remote viewers. Additionally or alternatively, the sharing device may automatically specify the region of interest on behalf of its user, without any explicit instruction from the user. For example, consider a wearable computer that is configured with eye-tracking functionality, which is acting as a sharing device in an experience-sharing session. The wearable computer may use eye-tracking data to determine where its wearer is looking, or in other words, to determine an ROI in the wearer's field of view. A ROI indication may then be inserted into a video portion of the share at a location that corresponds to the ROI in the wearer's field of view. Other examples are also possible.
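For purposes of illustration, a simple sketch of mapping eye-tracking data to an ROI follows, assuming the gaze point is already normalized to, and registered with, the forward-facing camera's field of view; the fixed ROI dimensions and function name are illustrative assumptions.

```python
# Illustrative sketch: expand a normalized gaze point (0..1 in each axis) into a
# fixed-size ROI rectangle in camera-frame pixel coordinates, clamped to the frame.
def gaze_to_roi(gaze_x, gaze_y, frame_w, frame_h, roi_w=160, roi_h=120):
    cx, cy = gaze_x * frame_w, gaze_y * frame_h                 # gaze point in pixels
    left = int(max(0, min(frame_w - roi_w, cx - roi_w / 2)))
    top = int(max(0, min(frame_h - roi_h, cy - roi_h / 2)))
    return (left, top, left + roi_w, top + roi_h)

roi = gaze_to_roi(0.6, 0.4, frame_w=640, frame_h=480)           # (304, 132, 464, 252)
```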
In a further aspect, the region of interest can be one or more specific objects shown in the video, such as the fish or coral formation in the scuba example mentioned above. In another example, the region of interest is delimited by a focus window, such as a square, rectangular, or circular window. The user or remote viewer may be able to adjust the size, shape, and/or location of the focus window, for example, using an interface in which the focus window overlays the video or overlays the wearer's view through the HMD, so that a desired region of interest is selected.
In some embodiments, the wearable computing device can receive request(s) to track object(s) with region(s) of interest during an experience sharing session, automatically track the object(s) during the experience sharing session, and maintain the corresponding region(s) of interest throughout a subsequent portion or the entirety of the experience sharing session. After receiving the request(s) to track objects, the wearable computing device can receive corresponding request(s) to stop tracking object(s) during the experience sharing session, and, in response, delete any corresponding region(s) of interest.
In other embodiments, some or all regions of interest can be annotated with comments or annotations. The comments can appear as an annotation on or near the region of interest in a live or stored video portion of an experience sharing session. The comments can be maintained throughout the experience sharing session, or can fade from view after a pre-determined period of time (e.g., 10-60 seconds after the comment was entered). In particular embodiments, faded comments can be re-displayed upon request.
In an embodiment where a wearable computing device includes an HMD, the HMD may display an indication of the region of interest. For example, if the region of interest is an object, the HMD may display an arrow, an outline, or some other image superimposed on the user's field of view such that the object is indicated. If the region of interest is defined by a focus window, the HMD may display the focus window superimposed on the user's field of view so as to indicate the region of interest.
As shown, wearable computer 100 includes a head-mounted display (HMD) 106, several input sources 134, a data processing system, and a transmitter/receiver 102.
An exemplary set of input sources 134 are shown in
The exemplary data processing system 110 may include a memory system 120, a central processing unit (CPU) 130, an input interface 108, and an audio visual (A/V) processor 104. The memory system 120 may be configured to receive data from the input sources 134 and/or the transmitter/receiver 102. The memory system 120 may also be configured to store received data and then distribute the received data to the CPU 130, the HMD 106, a set of one or more speakers 136, or to a remote device through the transmitter/receiver 102. The CPU 130 may be configured to detect a stream of data in the memory system 120 and control how the memory system distributes the stream of data. The input interface 108 may be configured to process a stream of data from the input sources 134 and then transmit the processed stream of data into the memory system 120. This processing of the stream of data converts a raw signal, coming directly from the input sources 134 or A/V processor 104, into a stream of data that other elements in the wearable computer 100, the viewers 112, and the server 122 can use. The A/V processor 104 may be configured to perform audio and visual processing on one or more audio feeds from one or more microphones 124 and on one or more video feeds from one or more video cameras 114. The CPU 130 may be configured to control the audio and visual processing performed on the one or more audio feeds and the one or more video feeds. Examples of audio and video processing techniques, which may be performed by the A/V processor 104, will be given later.
The transmitter/receiver 102 may be configured to communicate with one or more remote devices through the communication network 132. Each connection made to the network (142, 152A, 152B, 152C, and 162) may be configured to support two-way communication and may be wired or wireless.
The HMD 106 may be configured to display visual objects derived from many types of visual multimedia, including video, text, graphics, pictures, application interfaces, and animations. In some embodiments, one or more speakers 136 may also present audio objects. Some embodiments of an HMD 106 may include a visual processor 116 to store and transmit a visual object to a physical display 126, which actually presents the visual object. The visual processor 116 may also edit the visual object for a variety of purposes. One purpose for editing a visual object may be to synchronize displaying of the visual object with presentation of an audio object to the one or more speakers 136. Another purpose for editing a visual object may be to compress the visual object to reduce load on the display. Still another purpose for editing a visual object may be to correlate displaying of the visual object with other visual objects currently displayed by the HMD 106.
While
In general, it should be understood that any computing system or device described herein may include or have access to memory or data storage, which may take the form of, or include, a non-transitory computer-readable medium having program instructions stored thereon. Additionally, any computing system or device described herein may include or have access to one or more processors. As such, the program instructions stored on such a non-transitory computer-readable medium may be executable by at least one processor to carry out the functionality described herein.
Further, while not discussed in detail, it should be understood that the components of a computing device that serves as a viewing device in an experience-sharing session may be similar to those of a computing device that serves as a sharing device in an experience-sharing session. Further, a viewing device may take the form of any type of networked device capable of providing a media experience (e.g., audio and/or video), such as a television, a game console, and/or a home theater system, among others.
Each of the frame elements 204, 206, and 208 and the extending side-arms 214, 216 may be formed of a solid structure of plastic and/or metal, or may be formed of a hollow structure of similar material so as to allow wiring and component interconnects to be internally routed through the head-mounted device 202. Other materials may be possible as well.
One or more of each of the lens elements 210, 212 may be formed of any material that can suitably display a projected image or graphic. Each of the lens elements 210, 212 may also be sufficiently transparent to allow a user to see through the lens element. Combining these two features of the lens elements may facilitate an augmented reality or heads-up display where the projected image or graphic is superimposed over a real-world view as perceived by the user through the lens elements.
The extending side-arms 214, 216 may each be projections that extend away from the lens-frames 204, 206, respectively, and may be positioned behind a user's ears to secure the head-mounted device 202 to the user. The extending side-arms 214, 216 may further secure the head-mounted device 202 to the user by extending around a rear portion of the user's head. Additionally or alternatively, for example, the system 200 may connect to or be affixed within a head-mounted helmet structure. Other possibilities exist as well.
The system 200 may also include an on-board computing system 218, a video camera 220, a sensor 222, and a finger-operable touch pad 224. The on-board computing system 218 is shown to be positioned on the extending side-arm 214 of the head-mounted device 202; however, the on-board computing system 218 may be provided on other parts of the head-mounted device 202 or may be positioned remote from the head-mounted device 202 (e.g., the on-board computing system 218 could be wire- or wirelessly-connected to the head-mounted device 202). The on-board computing system 218 may include a processor and memory, for example. The on-board computing system 218 may be configured to receive and analyze data from the video camera 220 and the finger-operable touch pad 224 (and possibly from other sensory devices, user interfaces, or both) and generate images for output by the lens elements 210 and 212.
The video camera 220 is shown positioned on the extending side-arm 214 of the head-mounted device 202; however, the video camera 220 may be provided on other parts of the head-mounted device 202. The video camera 220 may be configured to capture images at various resolutions or at different frame rates. Many video cameras with a small form-factor, such as those used in cell phones or webcams, for example, may be incorporated into an example of the system 200.
Further, although
In yet another example, wearable computing device 312 can include an inward-facing camera that tracks the user's eye movements. Thus, the region of interest could be defined based on the user's point of focus, for example, so as to correspond to the area within the user's foveal vision.
Additionally or alternatively, wearable computing device 312 may include one or more inward-facing light sources (e.g., infrared LEDs) and one or more inward-facing receivers such as photodetector(s) that can detect reflections of the inward-facing light sources from the eye. The manner in which beams of light from the inward-facing light sources reflect off the eye may vary depending upon the position of the iris. Accordingly, data collected by the receiver about the reflected beams of light may be used to determine and track the position of the iris, perhaps to determine an eye gaze vector from the back or fovea of the eye through the iris.
The sensor 222 is shown on the extending side-arm 216 of the head-mounted device 202; however, the sensor 222 may be positioned on other parts of the head-mounted device 202. The sensor 222 may include one or more of a gyroscope or an accelerometer, for example. Other sensing devices may be included within, or in addition to, the sensor 222 or other sensing functions may be performed by the sensor 222.
The finger-operable touch pad 224 is shown on the extending side-arm 214 of the head-mounted device 202. However, the finger-operable touch pad 224 may be positioned on other parts of the head-mounted device 202. Also, more than one finger-operable touch pad may be present on the head-mounted device 202. The finger-operable touch pad 224 may be used by a user to input commands. The finger-operable touch pad 224 may sense at least one of a position and a movement of a finger via capacitive sensing, resistance sensing, or a surface acoustic wave process, among other possibilities. The finger-operable touch pad 224 may be capable of sensing finger movement in a direction parallel or planar to the pad surface, in a direction normal to the pad surface, or both, and may also be capable of sensing a level of pressure applied to the pad surface. The finger-operable touch pad 224 may be formed of one or more translucent or transparent insulating layers and one or more translucent or transparent conducting layers. Edges of the finger-operable touch pad 224 may be formed to have a raised, indented, or roughened surface, so as to provide tactile feedback to a user when the user's finger reaches the edge, or other area, of the finger-operable touch pad 224. If more than one finger-operable touch pad is present, each finger-operable touch pad may be operated independently, and may provide a different function.
The lens elements 210, 212 may act as a combiner in a light projection system and may include a coating that reflects the light projected onto them from the projectors 228, 232. In some embodiments, a reflective coating may not be used (e.g., when the projectors 228, 232 are scanning laser devices).
In alternative embodiments, other types of display elements may also be used. For example, the lens elements 210, 212 themselves may include: a transparent or semi-transparent matrix display, such as an electroluminescent display or a liquid crystal display, one or more waveguides for delivering an image to the user's eyes, or other optical elements capable of delivering an in-focus near-to-eye image to the user. A corresponding display driver may be disposed within the frame elements 204, 206 for driving such a matrix display. Alternatively or additionally, a laser or LED source and scanning system could be used to draw a raster display directly onto the retina of one or more of the user's eyes. Other possibilities exist as well.
As shown in
The wearable computing device 272 may include a single lens element 280 that may be coupled to one of the side-arms 273 or the center frame support 274. The lens element 280 may include a display such as the display described with reference to
As described in the previous section and shown in
In some exemplary embodiments a remote server may help reduce the sharer's processing load. In such embodiments, the sharing device may send the share to a remote, cloud-based serving system, which may function to distribute the share to the appropriate viewing devices. As part of a cloud-based implementation, the sharer may communicate with the server through a wireless connection, through a wired connection, or through a network of wireless and/or wired connections. The server may likewise communicate with the one or more viewers through a wireless connection, through a wired connection, or through a network of wireless and/or wired connections. The server may then receive, process, store, and/or transmit both the share from the sharer and comments from the viewers.
An experience-sharing server may process a share in various ways before sending the share to a given viewer. For example, the server may format media components of a share to help adjust for a particular viewer's needs or preferences. For instance, consider a viewer that is participating in an experience-sharing session via a website that uses a specific video format. As such, when the share includes a video, the experience-sharing server may format the video in the specific video format used by the website before transmitting the video to this viewer. In another example, if a viewer is a PDA that can only play audio feeds in a specific audio format, the server may format an audio portion of a share in the specific audio format used by the PDA before transmitting the audio portion to this viewer. Other examples of formatting a share (or a portion of a share) for a given viewer are also possible. Further, in some instances, the ES server may format the same share in a different manner for different viewers in the same experience-sharing session.
Further, in some instances, an experience-sharing server may compress a share or a portion of a share before transmitting the share to a viewer. For instance, if a server receives a high-resolution share, it may be advantageous for the server to compress the share before transmitting it to the one or more viewers. For example, if a connection between a server and a certain viewer runs too slowly for real-time transmission of the high-resolution share, the server may temporally or spatially compress the share and send the compressed share to the viewer. As another example, if a viewer requires a slower frame rate for video feeds, a server may temporally compress a share by removing extra frames before transmitting the share to that viewer. And as another example, the server may be configured to save bandwidth by downsampling a video before sending the stream to a viewer that can only handle a low-resolution image. Additionally or alternatively, the server may be configured to perform pre-processing on the video itself, e.g., by combining multiple video sources into a single feed, or by performing near-real-time transcription (closed captions) and/or translation.
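For purposes of illustration, a hypothetical sketch of such server-side temporal and spatial compression is shown below, using the Pillow imaging library; the function name, the frame-dropping rule, and the resolution limit are illustrative assumptions.

```python
# Illustrative sketch: temporal compression by keeping every Nth frame to match a
# viewer's slower frame rate, and spatial compression by downsampling any frame
# wider than the viewer's maximum width.
from PIL import Image

def adapt_stream(frames, src_fps, dst_fps, max_width):
    step = max(1, round(src_fps / dst_fps))        # keep every Nth frame
    adapted = []
    for frame in frames[::step]:
        if frame.width > max_width:                # downsample for low-resolution viewers
            scale = max_width / frame.width
            frame = frame.resize((max_width, int(frame.height * scale)))
        adapted.append(frame)
    return adapted

frames = [Image.new("RGB", (1280, 720)) for _ in range(30)]             # one second at 30 fps
adapted = adapt_stream(frames, src_fps=30, dst_fps=10, max_width=640)   # 10 frames, 640x360
```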
Yet further, an experiencing-sharing server may decompress a share, which may help to enhance the quality of an experience-sharing session. In some embodiments, to reduce transmission load on the connection between a sharer and a server, the sharer may compress a share before sending the share to the server. If transmission load is less of a concern for the connection between the server and one or more viewers, the server may decompress the share before sending it to the one or more viewers. For example, if a sharer uses a lossy spatial compression algorithm to compress a share before transmitting the share to a server, the server may apply a super-resolution algorithm (an algorithm which estimates sub-pixel motion, increasing the perceived spatial resolution of an image) to decompress the share before transmitting the share to the one or more viewers. In another implementation, the sharer may use a lossless data compression algorithm to compress a share before transmission to the server, and the server may apply a corresponding lossless decompression algorithm to the share so that the share may be usable by the viewer.
As noted above, in order to format a share so as to reduce bandwidth requirements, and/or to enhance the quality of experience sharing, an exemplary method may identify a region of interest in a share. Some techniques for identifying a region of interest may involve using eye-tracking data to determine where a user of a sharing device is looking, specifying this area as a region of interest, and then formatting the share so as to reduce the data size of the portion of the share outside of the region of interest.
Other techniques for identifying the region of interest may involve receiving input from a user that specifies a region of interest within the user's field of view. Once specified, images and/or video can concentrate on the region of interest. For example, images and/or video of an experience sharing session can utilize a higher-resolution portion of the image or video within the region of interest than utilized outside of the region of interest.
Once region of interest 320 is specified, wearable computing device 312 and/or server 122 can generate displays of the environment based on the region of interest. At 300A of
In an experience-sharing session, an image containing both an environment and an outline of a region of interest can be shared with viewers to show interesting features of the environment from the sharer's point of view. For example, the image shown at 300B1 can be shared with viewers of an experience-sharing session.
At 300B2 of
Sub-regions of interest can be specified within regions of interest. At 300B3 of
In some embodiments not pictured, wearable computing device 312 can include one or more external cameras. Each external camera can be partially or completely controlled by wearable computing device 312. For example, an external camera can be moved using servo motors.
As another example, the wearable computing device can be configured to remotely control a remotely-controllable camera to activate/deactivate the external camera, zoom in/zoom out, take single and/or motion pictures, use flashlights, and/or use other functionality of the external camera. In these embodiments, the wearer and/or one or more sharers, either local or remote, can control a position, view angle, zoom, and/or other functionality of the camera, perhaps communicating these controls via an experience-sharing session. The wearable computing device can control multiple cameras; for example, a first camera with a wide field of view and relatively low resolution and a second camera under servo/remote control with a smaller field of view and higher resolution.
Once a region (or sub-region) of interest is specified, media content in the share can be formatted so as to concentrate on the region of interest. For example, images and/or video of an experience sharing session can include one or more “composite images” that utilize a higher-resolution portion of the image or video within the region of interest than utilized outside of the region of interest. These composite images can be generated both to save bandwidth and to draw a viewer's attention to the region of interest. For example, a composite image can be generated from two images: a “ROI image” of the region of interest and an “environmental image,” which is an image of the environment outside of the region of interest representative of the wearer's field of view. In some embodiments, the ROI image can have relatively higher resolution (e.g., use more pixels per inch) than the environmental image.
Some additional bandwidth savings may be obtained by replacing the portion of environmental image 342 that overlaps ROI image 344 with ROI image 344, thus generating composite image 340. Then, by transmitting only composite image 340, the bandwidth required to transmit the portion of environmental image 342 that overlaps ROI image 344 can be saved.
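For purposes of illustration, a minimal sketch of generating such a composite image is shown below, using the Pillow imaging library and assuming a rectangular region of interest; the names, image sizes, and downscale factor are illustrative assumptions.

```python
# Illustrative sketch: scale the low-resolution environmental image back to the
# full frame size, then paste the high-resolution ROI image over it at the ROI
# location, so that only the single composite image need be transmitted.
from PIL import Image

def make_composite(environment_low, roi_image, full_size, roi_box):
    composite = environment_low.resize(full_size)   # coarse background at full size
    composite.paste(roi_image, roi_box[:2])         # sharp ROI at its original position
    return composite

environment_low = Image.new("RGB", (160, 120), (90, 90, 90))   # 1/4-scale environmental image
roi_image = Image.new("RGB", (160, 120), (200, 40, 40))        # full-resolution ROI image
composite = make_composite(environment_low, roi_image,
                           full_size=(640, 480), roi_box=(200, 150, 360, 270))
```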
To further preserve bandwidth, still lower resolution versions of an environmental image can be utilized. At 300E of
For the example of scenario 300, suppose that the size of region of interest 320 as shown in
In some embodiments, identifying the region of interest, determining a first portion in image(s) that correspond to the region of interest and a second portion of the image(s) that is not in the first portion, and/or formatting the image(s) based on the determined portion(s) of the image(s) can be performed in real-time. The formatted image(s) can be transmitted as video data in real-time.
At 300F of
For example, suppose that each grid cell in grid 340 is 100×150 pixels, and so the size of environment 310 is 400×600 pixels. Continuing this example, suppose the respective locations of the upper-left-hand corner and the location of the lower-right-hand corner of region of interest 320 are at pixel locations (108, 200) and (192, 262) of environment 310, with pixel location (0, 0) indicating the upper-left-hand corner of environment 310 and pixel location (400, 600) indicating the lower-right-hand corner of environment 310.
Then, with this additional location information, a receiving device can display region of interest 320 in the relative position captured within environment 310. For example, upon receiving size information for environment 310 of 400×600 pixels, the receiving device can initialize a display area or corresponding stored image of 400×600 pixels to one or more predetermined replacement pixel-color values. Pixel-color values are specified herein as a triple (R, G, B), where R=an amount of red color, G=an amount of green color, and B=an amount of blue color, and with each of R, G, and B specified using a value between 0 (no color added) and 255 (maximum amount of color added).
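For purposes of illustration, a minimal sketch of this receiving-device behavior is shown below, using the Pillow imaging library; the gray replacement pixel-color value and the placeholder ROI content are illustrative assumptions.

```python
# Illustrative sketch: initialize a 400x600 canvas to a predetermined replacement
# pixel-color value, then paste the received ROI image at its reported pixel
# location so that it appears in the same relative position it occupied within
# the environment.
from PIL import Image

ENV_W, ENV_H = 400, 600                     # environment size reported by the sharer
REPLACEMENT_RGB = (128, 128, 128)           # predetermined replacement pixel-color value

canvas = Image.new("RGB", (ENV_W, ENV_H), REPLACEMENT_RGB)

# ROI corners reported as (108, 200) upper-left and (192, 262) lower-right.
roi_box = (108, 200, 192, 262)
roi_image = Image.new("RGB", (roi_box[2] - roi_box[0], roi_box[3] - roi_box[1]), (40, 120, 40))
canvas.paste(roi_image, roi_box[:2])        # place the ROI in its original position
```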
Additional information about the environment can be provided at the cost of relatively small amounts of additional bandwidth. For example, at 300G of
As noted above, a region of interest may also be indicated via explicit user instruction. In particular, a sharing device or a viewing device may receive explicit instructions to select and/or control a certain region of interest. Further, when a sharing device receives an explicit selection of a region of interest, the sharing device may relay the selection to the experience-sharing server, which may then format the share for one or more viewers based on the region of interest. Similarly, when a viewing device receives an explicit selection of a region of interest, the viewing device may relay the selection to the experience-sharing server. In this case, the server may then format the share for the viewing device based on the selected region of interest and/or may indicate the region of interest to the sharing device in the session.
In a further aspect, the explicit instructions may specify parameters to select a region of interest and/or actions to take in association with the selected region of interest. For example, the instructions may indicate to select regions of interest based on features within an environment, to perform searches based on information found within the environment, to show indicators of regions of interest, to change the display of the region of interest and/or environmental image, and/or to change additional display attributes. The instructions may include other parameters and/or specify other actions, without departing from the scope of the invention.
The instructions can be provided to wearable computing device 312 via voice input(s), textual input(s), gesture(s), network interface(s), combinations of these inputs, and by other techniques for providing input to wearable computing device 312. The instructions can be provided as part of an experience sharing session with wearable computing device 312. In particular, the instructions can be provided to a server, such as server 122, to control display of a video feed for the experience sharing session, perhaps provided to a viewer of the experience sharing session. If multiple viewers are watching the experience sharing session, then the server can customize the views of the experience sharing session by receiving explicit instructions to control a region of interest and/or imagery from some or all of the viewers, and carrying out those instructions to control the video feeds sent to the multiple viewers.
Upon receiving instructions 410, wearable computing device 312 can execute the instructions. The first of instructions 410, “Find Objects with Text and Apples,” can be performed by wearable computing device 312 capturing an image of environment 310 and scanning the image for text, such as the word “Canola” shown in environment 310. Upon finding the text “Canola”, wearable computing device 312 can utilize one or more image processing or other techniques to determine object(s) associated with the text “Canola.” For example, wearable computing device 312 can look for boundaries of an object that contains the text “Canola.”
Then, wearable computing device 312 can scan environment 310 for apples. For example, wearable computing device 312 can scan for objects shaped like apples, perform search(es) for image(s) of apples and compare part or all of the resulting images with part or all of the image of environment 310, or use other techniques.
In scenario 400, in response to the “Find Objects with Text and Apples” instruction, wearable computing device 312 has found two objects: (1) a canola oil bottle with the text “Canola” and (2) a basket of apples.
Lens/display 314 has been enlarged in
In scenario 400, wearable computing device 312 then executes the remaining two commands “Search on Text” and “Show Objects with Text and Search Results.” To execute the “Search on Text” command, wearable computing device 312 can generate queries for one or more search engines, search tools, databases, and/or other sources that include the text “Canola.” Upon generating these queries, wearable computing device 312 can communicate the queries as needed, and, in response, receive search results based on the queries.
At 400A3 of
Scenario 400 continues on
Upon receiving instructions 420, wearable computing device 312 can execute the instructions. The first of instructions 420, “Find Bananas,” can be performed by wearable computing device 312 capturing an image of environment 310 and scanning the image for shapes that appear to be bananas. For example, wearable computing device 312 can scan for objects shaped like bananas, perform search(es) for image(s) of bananas and compare part or all of the resulting images with part or all of the image of environment 310, or use other techniques. In scenario 400, wearable computing device 312 finds bananas in environment 310.
In scenario 400, wearable computing device 312 then executes the second instruction of instructions 420: “Show Bananas with Rest of Environment as Gray.” In response, at 400B3 of
Scenario 400 continues on
Upon receiving instructions 430, wearable computing device 312 can execute the instructions. The first of instructions 430, “Find Bananas,” can be performed by wearable computing device 312 as discussed above for 400B2 of
In scenario 400, wearable computing device 312 then executes the “Indicate When Found” and “Show Bananas” instructions of instructions 430. In response, at 400B3 of
Scenario 400 continues on
Wearable computing device 442 and wearable computing device 312 then establish experience sharing session 450 (if not already established). Then, wearable computing device 442 sends instructions 460 to wearable computing device 312. As shown in
In scenarios not shown in the Figures, the wearer of wearable computing device 442 views an experience sharing session shared from wearable computing device 312 via a server, such as server 122. For example, in response to a request to establish an experience sharing session for wearable computing device 442 to view the share generated by wearable computing device 312, the server can provide a full video feed of the experience sharing session 450. Then, the server can receive instructions 460 from wearable computing device 442 to control the video feed, change the video feed based on instructions 460, and provide the changed video feed to wearable computing device 442.
In embodiments not shown in
As an example use of the remote wearable computing device interface, wearable computing device 442 can select the corresponding display of wearable computing device 312 and use a touchpad or other device to generate the text “Hello” within the corresponding display. In response, wearable computing device 442 can send instructions to wearable computing device 312 to display the text “Hello” as indicated in the corresponding display. Many other examples of use of a remote wearable computing device interface are possible as well.
Upon receiving instructions 460, wearable computing device 312 can execute the instructions. The “Find Corn” instruction of instructions 460 can be performed by wearable computing device 312 as discussed above for 400B2 of
In scenario 400, wearable computing device 312 then executes the “Indicate When Found” and “Only Show Corn” instructions of instructions 460. In response, at 400F2 of
In some cases, a viewer or a sharer of an experience sharing session may wish to explicitly request that a region of interest be directed to or surround one or more objects of interest. For example, suppose a viewer of an experience sharing session of a deep-sea dive sees a particular fish and wishes to set the region of interest to surround the particular fish. Then, the viewer can instruct a wearable computing device and/or a server, such as server 122, to generate a region of interest that “snaps to” or exactly or nearly surrounds the particular fish. In some scenarios, the region of interest can stay snapped to the object(s) of interest while the objects move within the environment; e.g., continuing the previous example, the region of interest can move with the particular fish as long as the particular fish remains within the image(s) of the share.
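For purposes of illustration, a hypothetical sketch of keeping a region of interest snapped to a moving object follows, using OpenCV template matching; a deployed system would likely use a more robust object tracker, and the names and synthetic test data are illustrative assumptions.

```python
# Illustrative sketch: locate the object's appearance (a template cropped when the
# ROI was first snapped to the object) in each new frame with normalized
# cross-correlation, and re-center the ROI on the best match.
import cv2
import numpy as np

def snap_roi(frame_gray, template):
    """Return (left, top, right, bottom) of the best template match in the frame."""
    result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    h, w = template.shape
    return (max_loc[0], max_loc[1], max_loc[0] + w, max_loc[1] + h)

rng = np.random.default_rng(0)
frame = rng.integers(0, 50, size=(480, 640), dtype=np.uint8)   # noisy stand-in frame
frame[150:270, 200:360] = 255                                  # stand-in object of interest
template = frame[150:270, 200:360].copy()                      # appearance captured at snap time
roi = snap_roi(frame, template)                                # (200, 150, 360, 270)
```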
At 500B of
The snap-to instruction instructs the wearable computing device to reset the region of interest as specified by a user. For example, region of interest 510 includes portions of a basket of corn, a watermelon and a cotton plant. Upon receiving the “Snap-to Round Object” instruction of instructions 510, wearable computing device 312 can examine region of interest 510 for a “round object” and determine that the portion of the watermelon can be classified as a “round object.”
Supposing that wearable computing device 312 had not found a “round object” within region of interest 510, wearable computing device 312 can expand a search to include all of environment 310. Under this supposition, perhaps wearable computing device 312 would have found one or more of the tomatoes, bowls, grapes, apples, jar top, avocado portions, cabbage, and/or watermelon portion shown within environment 310 as the round objects.
After identifying the watermelon portion within region of interest 510 as the “round object”, wearable computing device 312 can execute the “Show Round Object” and “Identify Round Object” instructions of instructions 510. In response, as shown at 500D of
To execute the “Show Round Object” command, wearable computing device 312 can capture an ROI image for region of interest 530, and display the ROI image as image 532 above prompt 438, as shown in
To execute the “Identify Round Object” command, wearable computing device 312 can generate queries for one or more search engines, search tools, databases, and/or other sources that include the ROI image. In some embodiments, additional information beyond the ROI image can be provided with the queries. Examples of additional information include contextual information about environment 310 such as time, location, etc. and/or identification information provided by the wearer of wearable computing device 312, such as a guess as to the identity of the “round object.” Upon generating these queries, wearable computing device 312 can communicate the queries as needed, and, in response, receive search results based on the queries. Then, wearable computing device 312 can determine the identity of the ROI image based on the search results. As shown at 500D of
At 540B of
The set-ROI instruction instructs the wearable computing device to set a region of interest defined by a point, perhaps arbitrarily chosen, or by an object. For example, environment 542 is shown as a rectangular region with four corners and a center point. Upon receiving the “Set ROI1 at upper left corner” instruction of instructions 550, wearable computing device 312 can define a region of interest ROI1 whose upper-left-hand corner equals the upper-left-hand corner of environment 542. Similarly, if this instruction had been “Set ROI1 at lower right corner,” wearable computing device 312 could define a region of interest ROI1 whose lower-right-hand corner equals the lower-right-hand corner of environment 542.
In some embodiments, a region of interest can be provided to and/or defined on a server, such as the server hosting the experience sharing session. For example, the wearer can send region-of-interest information for a sequence of input images, the region-of-interest information can be information provided by the server, and/or region-of-interest information can be defined by a sharer interested in particular region of the sequence of input images. The region-of-interest information can be sent as metadata for the sequence of input images. For example, each region of interest can be specified as a set of pairs of Cartesian coordinates, where each pair of Cartesian coordinates corresponds to a vertex of a polygon that defines a region of interest within a given image of the sequence of input images. Then, as the input images and/or other information are sent from wearable computing device 312 to the server, the server can apply the region-of-interest information as needed. For example, suppose the images from the wearer are transmitted to the server as a sequence of full video frames and one or more wearer-defined regions of interest transmitted as metadata including pairs of Cartesian coordinates as discussed above. Then, the server can apply one or more wearer-defined regions of interest to the full video frames as needed.
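For purposes of illustration, a minimal sketch of such region-of-interest metadata and its server-side rasterization is shown below, using the Pillow imaging library; the metadata field names and the single rectangular polygon are illustrative assumptions.

```python
# Illustrative sketch: each region of interest travels with a frame as a list of
# (x, y) Cartesian vertex pairs; the server rasterizes the polygon(s) into a mask
# indicating which pixels to keep at full resolution before applying
# region-of-interest compression.
from PIL import Image, ImageDraw

frame_metadata = {
    "frame_index": 0,
    "regions_of_interest": [
        [(108, 200), (192, 200), (192, 262), (108, 262)],   # one rectangular ROI
    ],
}

def roi_mask(size, regions):
    """Return a binary mask (255 inside any ROI polygon, 0 elsewhere)."""
    mask = Image.new("L", size, 0)
    draw = ImageDraw.Draw(mask)
    for polygon in regions:
        draw.polygon(polygon, fill=255)
    return mask

mask = roi_mask((400, 600), frame_metadata["regions_of_interest"])
```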
The region-of-interest compressed video can then be sent to one or more viewers with relatively low bandwidth and/or to viewers who specifically request this compressed video, while other viewer(s) with a suitable amount of bandwidth can receive the sequence of full video frames. Server-based region-of-interest calculations require less computing power from wearable computing devices with sufficient bandwidth and enable flexible delivery of video, e.g., both full video frames and region-of-interest compressed video, in comparison with only region-of-interest compressed video if the region of interest is applied using wearable computing device 312. In still other scenarios, full video frames can be sent to a viewer with suitable bandwidth along with region-of-interest information, perhaps sent as the metadata described above. Then, the viewer can use suitable viewing software to apply none, some, or all of the region-of-interest information in the metadata to the full video frames as desired.
In other scenarios not shown in
The “Set ROI2 at image center” instruction of instructions 550 can instruct wearable computing device 312 to define a region of interest ROI2 that is centered at the center of environment 542. As shown in
In embodiments where wearable computing device 312 can recognize one or more faces in an environment, an instruction such as “Set ROI3 on leftmost face” instruction of instructions 550 can instruct wearable computing device 312 to search an image of environment 542 for faces. At least three faces of people about to exit from an escalator can be recognized in environment 542. In some of these embodiments, facial detection and recognition can be “opt-in” features; i.e., wearable computing device 312 would report detection and/or recognition of faces of persons who have agreed to have their faces detected and/or recognized, and would not report detection and/or recognition of faces of persons who have not so agreed.
After recognizing the three faces, wearable computing device 312 can determine which face is the “leftmost” and set ROI3 to that portion of an image of environment 542. Then, in response to the “Show ROI3” instruction of instructions 540, wearable computing device 312 utilizes lens/display 314 to display captured image 562, corresponding to ROI3, and corresponding prompt 564a.
Upon viewing captured image 562, scenario 500 can continue by receiving additional instruction 564 to “Double size of ROI3 and show.” In response to instruction 564, wearable computing device 312 can display a double-sized processed image 566 and corresponding “ROI3 2×:” prompt 564b, as shown in
In some embodiments not shown in
In other embodiments, facial and/or object detection within a sequence of image frames provided by wearable computing device 312 can be performed by a server, such as the server hosting the experience sharing session. The server can detect faces and/or objects of interest based on requests from one or more sharers and/or the wearer; e.g., the “Set ROI3 on leftmost face” instruction of instructions 550. Once the server has detected faces and/or objects of interest, the server can provide information about location(s) of detected face(s) and/or object(s) to wearable computing device 312, to the wearer, and/or to the one or more sharers.
In still other embodiments, both wearable computing device 312 and a server can cooperate to detect faces and/or objects. For example, wearable computing device 312 can detect faces and/or objects of interest to the wearer, while the server can detect other faces and/or images not specifically requested by the wearer; e.g., wearable computing device 312 performs the facial/object recognition processing requested by instructions such as instructions 550, and the server detects any other object(s) or face(s) requested by the one or more sharers. As the faces and/or objects are detected, wearable computing device 312 and the server can communicate with each other to provide information about detected faces and/or objects.
The resolution of an image, perhaps corresponding to a region of interest, can be increased based on a collection of images. The received collection of images can be treated as a panorama of images of the region of interest. As additional input images are received for the region of interest, images of overlapping sub-regions can be captured several times.
Overlapping images can be used to generate a refined or “processed” image of the region of interest. A super-resolution algorithm can generate the processed image from an initial image using information in the overlapping images. A difference image, as well as differences in position and rotation, between an input image of the overlapping images and the initial image is determined. The difference image can be mapped into a pixel space of the initial image after adjusting for the differences in position and rotation. Then, the processed image can be generated by combining the adjusted difference image and the initial image. To further refine the processed image, the super-resolution algorithm can utilize a previously-generated processed image as the initial image to be combined with an additional, perhaps later-captured, input image to generate a new processed image. Thus, the initial image is progressively refined by the super-resolution algorithm to generate a (final) processed image.
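As an illustrative sketch only, the iterative refinement described above could take a form like the following Python fragment. It assumes the alignment offsets between each overlapping image and the initial image are already known (e.g., from a separate registration step), and it uses a simple translation-only, weighted-difference update in place of a full super-resolution algorithm.

```python
import numpy as np

def refine(initial, overlapping_images, offsets, weight=0.5):
    """Progressively refine an estimate of the region of interest.

    initial: 2-D float array, the current estimate (the "initial image").
    overlapping_images: iterable of 2-D arrays covering the same region.
    offsets: per-image (dy, dx) integer shifts that align each overlapping
             image with `initial` (assumed known from a registration step).
    """
    processed = np.asarray(initial, dtype=float).copy()
    for img, (dy, dx) in zip(overlapping_images, offsets):
        aligned = np.roll(np.asarray(img, dtype=float), shift=(dy, dx), axis=(0, 1))
        difference = aligned - processed      # the "difference image"
        processed += weight * difference      # fold the extra detail back in
        # The refined result serves as the initial image for the next input,
        # so the estimate improves progressively as more images arrive.
    return processed
```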
Also, features can be identified in the overlapping images to generate a “panoramic” or wide-view image. For example, suppose two example images are taken: image1 and image2. Each of image1 and image2 is an image of a separate six-meter wide by four-meter high area, where the widths of the two images overlap by one meter. Then, image1 can be combined with image2 to generate a panoramic image of an eleven-meter wide by four-meter high area by either (i) aligning images image1 and image2 and then combining the aligned images using an average or median of the pixel data from each image or (ii) taking each region in the panoramic image from only one of images image1 or image2. Other techniques for generating panoramic and/or processed images can be used as well or instead.
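Continuing the two-image example, the toy Python snippet below stitches two synthetic six-meter-wide images that overlap by one meter into an eleven-meter-wide panorama by averaging pixel data where they overlap (option (i) above). The pixels-per-meter figure and the constant-valued images are assumptions made purely for illustration.

```python
import numpy as np

# Two synthetic images of 6 m x 4 m areas at an assumed 10 pixels per meter,
# overlapping by 1 m in width, combined into an 11 m wide panorama.
ppm = 10
h, w, overlap = 4 * ppm, 6 * ppm, 1 * ppm      # 40 x 60 pixels, 10-pixel overlap
image1 = np.full((h, w), 100.0)
image2 = np.full((h, w), 200.0)

pano_w = 2 * w - overlap                       # 110 pixels, i.e., 11 meters
panorama = np.zeros((h, pano_w))
counts = np.zeros((h, pano_w))
for img, x0 in ((image1, 0), (image2, w - overlap)):
    panorama[:, x0:x0 + w] += img
    counts[:, x0:x0 + w] += 1
panorama /= counts                             # the 1 m overlap is averaged
```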
Once generated, each processed image can be sent to one or more sharers of an experience sharing session. In some cases, input and/or processed images can be combined as a collection of still images and/or as a video. As such, a relatively high resolution collection of images and/or video can be generated using the captured input images.
Scenario 570 begins at 570A with wearable computing device 312 worn by a wearer during an experience sharing session involving environment 572, which includes a natural gas pump at a bus depot. A region of interest 574 of environment 572 has been identified on a portion of the natural gas pump. Region of interest 574 can be identified by the wearer and/or by one or more sharers of the experience sharing session. In scenario 570, wearable computing device 312 is configured to capture images from a point of view of the wearer using at least one forward-facing camera.
In scenario 570, the wearer for the experience sharing session captures input images for generating processed images of region of interest 574, but does not have access to the processed images. At 570A, prompt 576a and/or capture map 580a can provide feedback to the wearer to ensure suitable input images are captured for processed image generation. Prompt 576a, shown in
Capture map 580a can depict region of interest 574 and show where image(s) need to be captured. As shown at 570
The herein-described prompts, capture maps, and/or processed images can be generated locally, e.g., using wearable computing device 312, and/or remotely. For remote processing, the input images and/or sensor data can be sent from wearable computing device 312 to a server, such as the server hosting the experience sharing session. The server can generate the prompts, capture map, and/or processed images based on the input images, and transmit some or all of these generated items to wearable computing device 312.
Scenario 570 continues with the wearer turning left and walking forward, while images are captured along the way. At 570B of
Scenario 570 continues with region of interest 574 being extended to the right by a right extension area, as shown at 570C of
Gaze up eye 650 shows eye 620 when looking directly upwards. The bottom of pupil 612 for gaze up eye 650 is well above eye X axis 632 and again is centered along eye Y axis 634. Gaze down eye 660 shows eye 620 when looking directly downward. Pupil 612 for gaze down eye 660 is centered slightly above eye X axis 632 and centered on eye Y axis 634.
Gaze right eye 670 shows eye 620 when looking to the right.
Gaze left eye 680 shows eye 620 when looking to the left.
Gaze direction 602 of eye 620 can be determined based on the position of pupil 612 with respect to eye X axis 632 and eye Y axis 634. For example, if pupil 612 is slightly above eye X axis 632 and centered along eye Y axis 634, eye 620 is gazing straight ahead, as shown by gaze ahead eye 640 of
Similarly, gaze direction 602 would have a rightward (+X) component if pupil 612 were to travel further to the left of eye Y axis 634 than indicated by gaze ahead eye 640, and would have a leftward (−X) component if pupil 612 were to travel further to the right of eye Y axis 634 than indicated by gaze ahead eye 640.
At pupil position 718 (shown in grey for clarity in
At pupil position 720, which corresponds to a position of pupil 612 in gaze up eye 650, eye gaze vector 722 is shown pointing in the positive eye Y axis 634 (upward) direction with a zero eye X axis 632 component. At pupil position 724, which corresponds to a position of pupil 612 in gaze down eye 660, eye gaze vector 726 is shown pointing in the negative eye Y axis 634 (downward) direction with a zero eye X axis 632 component.
As shown in
At pupil position 718, corresponding to gaze ahead eye 640, eye gaze vector 732 has a zero eye Y axis 634 component and a positive (outward) Z axis component. As mentioned above, eye gaze vector 732 is (0, 0, Zahead), where Zahead is the value of the Z axis component for this vector. At pupil position 714, corresponding to gaze left eye 680, eye gaze vector 716 has a negative eye X axis 632 component and a positive Z axis component. Thus, eye gaze vector 716 will be (Xleft, 0, Zleft), where Xleft and Zleft are the values of the respective eye X axis 632 and Z axis components for this eye gaze vector, with Xleft<0 and Zleft>0. At pupil position 710, corresponding to gaze right eye 670, eye gaze vector 712 has both positive eye X axis 632 and Z axis components. Thus, eye gaze vector 712 will be (Xright, 0, Zright), where Xright and Zright are values of the respective eye X axis 632 and Z axis components for this eye gaze vector, with Xright>0 and Zright>0. A basis can be generated for transforming an arbitrary pupil position (Px, Py) into an eye gaze vector (X, Y, Z), such as by orthogonalizing some or all eye gaze vectors 712, 716, 722, 726, and 732, where Px and Py are specified in terms of eye X axis 632 and eye Y axis 634, respectively.
Then, wearable computing device 312 can receive an image of an eye of a wearer of wearable computing device 312, determine a pupil position (Px, Py) specified in terms of eye X axis 632 and eye Y axis 634 by analyzing the image, e.g., by comparing the pupil position to the pupil positions of gazing eyes 640, 650, 660, 670, and 680, and use the basis to transform the (Px, Py) values into a corresponding eye gaze vector. In some embodiments, wearable computing device 312 can send the image of the eye(s) of the wearer to a server, such as server 122, for the server to determine the eye gaze vector based on received images of the eye(s) of the wearer.
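One hypothetical way to realize such a transform is to fit a simple affine mapping from calibrated pupil positions to known gaze vectors and then apply it to a newly measured pupil position, as in the Python sketch below. The calibration numbers are invented for illustration, and the least-squares fit merely stands in for the basis/orthogonalization approach described above.

```python
import numpy as np

# Calibration pairs (values invented for illustration): pupil positions
# (Px, Py) observed for the reference gazes, and unit eye gaze vectors.
pupil_positions = np.array([
    [0.00,  0.05],   # gaze ahead (pupil slightly above eye X axis)
    [0.00,  0.40],   # gaze up
    [0.00, -0.30],   # gaze down
    [-0.35, 0.05],   # gaze right (pupil appears left of eye Y axis)
    [0.35,  0.05],   # gaze left  (pupil appears right of eye Y axis)
])
gaze_vectors = np.array([
    [0.0,  0.0, 1.0],
    [0.0,  0.7, 0.7],
    [0.0, -0.7, 0.7],
    [0.7,  0.0, 0.7],
    [-0.7, 0.0, 0.7],
])

# Fit an affine map [Px, Py, 1] -> (X, Y, Z); this stands in for the "basis"
# used to transform an arbitrary pupil position into an eye gaze vector.
A = np.column_stack([pupil_positions, np.ones(len(pupil_positions))])
basis, *_ = np.linalg.lstsq(A, gaze_vectors, rcond=None)

def eye_gaze_vector(px, py):
    v = np.array([px, py, 1.0]) @ basis
    return v / np.linalg.norm(v)               # unit-length eye gaze vector

print(eye_gaze_vector(0.1, 0.2))               # gaze slightly left and up
```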
An eye gaze vector can be combined with a head-tilt vector to determine a gaze direction and perhaps locate a region of interest in an environment.
Head-tilt sensor(s) 754 can be configured to determine a head-tilt vector of a head of wearer 752 corresponding to a vector perpendicular to head axis 764. Head axis 764 is a vector from a top to a base of the head of wearer 752 running through the center of the head of wearer 752. Head tilt vector 762 is a vector perpendicular to head axis 764 that is oriented in the direction of a face of the viewer (e.g., looking outward). In some embodiments, head axis 764 and head tilt vector 762 can run through a fovea of an eye of wearer 752, or through some other location within the head of wearer 752.
One technique is to use one or more accelerometers as head-tilt sensor(s) 754 to determine head axis 764 relative to gravity vector 766. Head tilt vector 762 can be determined by taking a cross product of head axis 764 and the (0, 0, +1) vector, assuming the +Z direction is defined to be looking outward in the determination of head axis 764. Other methods for determining head tilt vector 762 are possible as well. Eye gaze vector 760 can be determined using the techniques discussed above or using other techniques as suitable. Gaze direction 764 can then be determined by performing vector addition of head tilt vector 762 and eye gaze vector 760. In other embodiments, data from head-tilt sensor(s) 754 and/or other data can be sent to a server, such as server 122, to determine head tilt vector 762. In particular embodiments, the server can determine eye gaze vectors, such as eye gaze vector 760, as mentioned above and thus determine gaze direction 764.
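The final combination step can be sketched in a few lines of Python: given a head-tilt vector (e.g., obtained from head-tilt sensor(s) 754 as described above) and an eye gaze vector, the gaze direction is their normalized vector sum. The numeric values below are placeholders rather than measurements from any particular sensor, and the `normalize` helper is an assumed convenience function.

```python
import numpy as np

def normalize(v):
    return np.asarray(v, dtype=float) / np.linalg.norm(v)

def gaze_direction(head_tilt_vector, eye_gaze_vector):
    """Combine a head-tilt vector and an eye gaze vector by vector addition
    and renormalize to obtain an overall gaze direction."""
    return normalize(normalize(head_tilt_vector) + normalize(eye_gaze_vector))

# Placeholder inputs (not sensor measurements): head tilted slightly upward
# and eyes glancing to the right, both expressed in the same frame with +Z
# pointing outward from the wearer's face.
head_tilt = [0.0, 0.2, 1.0]
eye_gaze = [0.3, 0.0, 1.0]
direction = gaze_direction(head_tilt, eye_gaze)
```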
Eye gaze vector 760, head tilt vector 762, and/or gaze direction 764 can then be used to locate features in images of an environment in the direction(s) of these vectors and determine an appropriate region of interest. In scenario 740, gaze direction 764 indicates wearer 752 may be observing airplane 770. Thus, a region of interest 772 surrounding airplane 770 can be indicated using eye gaze vector 760 and/or gaze direction 764 and images of an environment. If the images are taken from a point of view of wearer 752, eye gaze vector 760 specifies a line of sight within the images. Then, wearable computing device 312 and/or a server, such as server 122, can indicate region(s) of interest that surround object(s) along the line of sight.
If images of the environment are taken from a different point of view than the point of view of wearer 752, gaze direction 764 can be used to determine a line of sight within the images, perhaps by projecting gaze direction 764 along a vector specifying the point of view of the images. Then, wearable computing device 312 and/or a server, such as server 122, can indicate region(s) of interest that surround object(s) along the line of sight specified by the projection of gaze direction 764.
Note that the description herein discusses the use of pupil positions, or the position of a pupil of an eye, to determine eye gaze vectors. In some embodiments, pupil positions can be replaced with iris positions, or the position of an iris of the eye, to determine eye gaze vectors.
Moreover, it should be understood that while several eye-tracking techniques are described for illustrative purposes, the type of eye-tracking technique employed should not be construed as limiting. Generally, any eye-tracking technique that is now known or later developed may be employed to partially or completely determine a region of interest, without departing from the scope of the invention.
The above examples have generally dealt with specifying a visual region of interest. However, some embodiments may additionally or alternatively involve auditory regions of interest (e.g., what a user is listening to).
At 800A of
At 800A of
Specifying the term “CARDS” as part of the Use Sound ROI instruction further instructs wearable computing device 810 to specify a sound-based region of interest only after detecting terms related to “CARDS”; that is, sounds are to be screened for terms related to terms found in a sound-based-ROI file or other storage medium accessible by wearable computing device 810 using the name “CARDS.” Example terms in a sound-based-ROI file for “CARDS” could include standard terms used for cards (e.g., “Ace”, “King”, “Queen”, “Jack”), various numbers, and/or card-related jargon (e.g., “hand”, “pair”, “trick”, “face cards”, etc.). As another example, a sound-based-ROI file for patents can include standard terms (e.g., “patent”, “claim”, “specification”), various numbers, and/or jargon (e.g., “file wrapper”, “estoppel”, “102 rejection”, etc.). Many other examples of terminology that can be provided in the sound-based-ROI file to specify a sound-based region of interest are possible as well. In other scenarios, various sounds can be used instead of or along with terms in the sound-based-ROI file; for example, the sound of gears grinding may be added to a sound-based-ROI file as part of terminology and sounds related to auto repair.
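A minimal sketch of such screening, assuming a sound-based-ROI “file” is simply a named set of terms and that a speech-to-text transcript of the detected audio is available, might look like the following Python fragment; the dictionary layout and function name are illustrative assumptions only.

```python
# A sound-based-ROI "file" modeled as a named set of terms (layout assumed).
SOUND_ROI_FILES = {
    "CARDS": {"ace", "king", "queen", "jack", "hand", "pair",
              "trick", "face cards"},
}

def matches_sound_roi(transcript: str, roi_name: str) -> bool:
    """Return True if any term from the named sound-based-ROI file appears
    in a speech-to-text transcript of the detected audio."""
    terms = SOUND_ROI_FILES.get(roi_name, set())
    text = transcript.lower()
    return any(term in text for term in terms)

# Example: an utterance detected near the wearer triggers an ROI indication.
if matches_sound_roi("I'll see your pair of kings", "CARDS"):
    print("generate sound-based region-of-interest indicator")
```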
Scenario 800 continues at 800B1 of
In some embodiments, wearable computing device 810 includes a speech-to-text module, which can be used to convert utterance 850 to text.
Scenario 800 continues at 800C1 of
In other scenarios not shown in
In still other scenarios, a user of wearable computing device 810 can inhibit display of regions of interest and/or indicators from one or more areas or microphones. For example, suppose the user of wearable computing device 810 is attending a play where the user is unfamiliar with the terminology that might be used or does not want to screen the play based on terminology. Further suppose that the user does not want to have regions of interest and/or indicators appear on wearable computing device 810 based on sounds from the audience.
Then, the user of wearable computing device 810 can inhibit wearable computing device 810 from providing regions of interest and/or indicators from microphone(s) and/or area(s) corresponding to microphone(s) most likely to detect sounds from audience members and/or within areas mostly or completely containing audience members. Thus, the user of wearable computing device 810 can use regions of interest and/or indicators to track the sounds primarily made by the cast of the play, and perhaps aid following the plot to enhance the user's enjoyment of the play.
In other embodiments, audio from the microphones 821-827 can be captured and stored. The captured audio can then be transmitted in portions, perhaps corresponding to audio portions as captured by one of microphones 821-827; e.g., a transmitted portion that includes sounds detected by microphone 821, a next portion that includes sounds detected by microphone 822, a third portion that includes sounds detected by microphone 823, and so on. In some embodiments, an “interesting” portion of the captured audio can be transmitted in a first audio format and an “uninteresting” portion of the captured audio can be transmitted in a second audio format. In these embodiments, the interesting portion can correspond to audio of interest or an audio region of interest, such as area 835 in scenario 800 discussed above. In scenario 800, the interesting portion may then include sounds detected by microphone 814, and the first audio format can provide a higher audio volume or fidelity than the second audio format used for the uninteresting portion, such as sounds detected by microphone 827 in scenario 800 discussed above.
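As a rough sketch, splitting captured audio into per-microphone portions and assigning a higher-fidelity format to the portion covering the audio region of interest could look like the following Python fragment. The bitrates, microphone ids, and data layout are assumptions, and real code would re-encode the audio rather than merely tagging it.

```python
def format_audio_portions(portions, interesting_mic, hi_kbps=256, lo_kbps=32):
    """portions: dict mapping microphone id -> captured audio bytes.
    Returns (mic_id, target_bitrate_kbps, audio) tuples ready for encoding
    and transmission; the portion from the microphone covering the audio
    region of interest is assigned the higher-fidelity format."""
    return [(mic_id, hi_kbps if mic_id == interesting_mic else lo_kbps, audio)
            for mic_id, audio in portions.items()]

# Example with placeholder microphone ids and audio payloads.
captured = {821: b"...", 822: b"...", 827: b"..."}
to_send = format_audio_portions(captured, interesting_mic=821)
```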
In still other embodiments, wearable computing device 810 can compress different audio sources based on expected or actual content. For example, the microphone near the wearer's mouth can be associated with and/or use a compression algorithm designed for speech, while an external microphone may use a compression algorithm designed for music or other sounds.
As another example, wearable computing device 810 can test compression algorithms on a sample and utilize the best algorithm based on performance on the sample. That is, wearable computing device 810 can receive a sample of audio from a microphone, compress the sample using two or more compression algorithms, and use the compression algorithm that performs best on the sample for subsequent audio received from the microphone. The wearable computing device 810 can then choose another sample for compression testing and use, either as requested by a wearer of wearable computing device 810, upon power up and subsequent reception of audio signals, after a pre-determined amount of time, after a pre-determined period of silence subsequent to sampling, and/or based on other conditions.
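The selection-by-sample idea can be sketched with general-purpose compressors from the Python standard library, standing in for the audio codecs an actual device would more likely use; “best” is taken here to mean smallest compressed output, though a real implementation might also weigh encoding time and audio quality.

```python
import bz2, lzma, zlib

# Candidate compressors from the standard library, standing in for the audio
# codecs an actual device would more likely use.
CANDIDATES = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

def choose_compressor(sample: bytes):
    """Compress the sample with every candidate and keep whichever yields
    the smallest output (other criteria, such as encode time, could be added)."""
    return min(CANDIDATES.items(), key=lambda item: len(item[1](sample)))

# Test on an initial sample, then reuse the winner for subsequent audio.
name, compress = choose_compressor(b"\x00\x01" * 4000)
later_chunk = compress(b"\x02\x03" * 8000)
```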
Additionally, direct specification of a sound-based region of interest can be performed. In the example shown in
Example methods 900, 1000, and 1100 related to regions of interest are disclosed below.
At block 910, a field of view of an environment is provided through a head-mounted display (HMD) of a wearable computing device. The HMD is operable to display a computer-generated image overlaying at least a portion of the view. The wearable computing device is engaged in an experience sharing session. Views of environments provided by wearable computing devices are discussed above at least in the context of
In some embodiments, the experience sharing session can include an experience sharing session with the wearable computing device and at least a second computing device, such as discussed above at least in the context of
At block 920, at least one image of the real-world environment is captured using a camera on the wearable computing device. Capturing images of the environment is discussed above at least in the context of
In other embodiments, the camera is configured to move with the HMD, such as discussed above at least in the context of
In still other embodiments, the camera is configured to be controlled via the wearable computing device, such as discussed above at least in the context of
At block 930, the wearable computing device determines a first portion of the at least one image that corresponds to a region of interest within the field of view. Determining regions of interest is discussed above at least in the context of
In some embodiments, determining the first portion of the at least one image that corresponds to the region of interest can include receiving an indication of the region of interest from a wearer of the wearable computing device, such as discussed above at least in the context of
In particular of these embodiments, defining the region of interest can be based, at least in part, on an eye movement of the wearer, such as discussed above in the context of
In other of these particular embodiments, defining the region of interest can include: determining a head tilt vector; determining a gaze direction based on the eye gaze vector and the head tilt vector; and determining the region of interest based on the gaze direction.
In still other of these particular embodiments, the wearable computing device can include a photodetector. Then, defining the region of interest can include: determining a location of an iris of an eye of the wearer using the photodetector and determining the eye gaze vector based on the location of the iris of the eye, such as discussed above at least in the context of
In still other embodiments, such as discussed above at least in the context of
In some embodiments, such as discussed above at least in the context of
At block 940, the wearable computing device formats the at least one image such that a second portion of the at least one image is of a lower-bandwidth format than the first portion, such as discussed above at least in the context of
In some embodiments, the second portion corresponds to at least one environmental image, such as discussed above at least in the context of
In further embodiments, determining the first portion of the at least one image that corresponds to the region of interest can include determining the first portion of the at least one image in real time, and formatting the at least one image can include formatting the at least one image in real time.
At block 950, the wearable computing device transmits the formatted at least one image. Transmitting images of the real-world environment using different resolutions is discussed above at least in the context of
In further embodiments, the wearable computing device can display, on the HMD, an indication of the region of interest. Displaying indications of regions of interest is discussed above at least in the context of
In other embodiments of method 900, the wearable computing device can transmit the at least one image of the real-world environment. In some of these other embodiments, the transmitted at least one image can include transmitted video.
In still other embodiments, the region of interest is defined by a focus window such as the rectangular and other-shaped indicators of a region of interest shown in
At block 1020, at least one image of the real-world environment is captured using a camera associated with the wearable computing device. Capturing images of the environment is discussed above at least in the context of
At block 1020, the wearable computing device receives an indication of audio of interest. Receiving indications of the audio of interest is discussed above at least in the context of
At block 1030, the wearable computing device receives audio input via one or more microphones, such as discussed above at least in the context of
At block 1040, the wearable computing device can determine whether the audio input includes at least part of the audio of interest. Determining whether or not audio input includes at least part of audio of interest is discussed above at least in the context of
At block 1050, the wearable computing device can, in response to determining that the audio input includes at least part of the audio of interest, generate an indication of a region of interest associated with the at least part of the audio of interest. Generating indications of regions of interest associated with audio of interest is discussed above at least in the context of
In some embodiments, generating the indication of a region of interest associated with the audio of interest can include: (a) converting the audio input that includes at least part of the audio of interest to text; and (b) generating the indication of the region of interest associated with the at least part of the audio of interest, where the indication includes at least part of the text. Generating indications with text generated from audio is discussed above at least in the context of
At block 1060, the wearable computing device can display an indication of the region of interest as part of the computer-generated image. Displaying indications of regions of interest is discussed above at least in the context of
In some embodiments, the wearable computing device can transmit a first portion of the received audio input in a first audio format and a second portion of the received audio input in a second audio format, where the first portion of the received audio input corresponds to the at least part of the audio of interest, and where the first audio format differs from the second audio format. Transmitting audio input using different audio formats is discussed above at least in the context of
In other embodiments, each of the one or more microphones is associated with an area. In these other embodiments, receiving audio input via the one or more microphones can include receiving the audio input including the at least part of the audio of interest at a first microphone of the one or more microphones, where the first microphone is related to a first area, and where the region of interest is associated with the first area. Receiving audio input via microphones associated with areas is discussed above at least in the context of
In still other embodiments, the wearable computing device can receive additional audio input via the one or more microphones. The wearable computing device can determine whether the additional audio input includes at least part of the audio of interest. In response to determining that the additional audio input includes the at least part of the audio of interest, the wearable computing device can generate an additional indication of an additional region of interest associated with the at least part of the audio of interest, where the additional indication of the additional region of interest differs from the indication of the region of interest. Generating multiple indications of regions of interest is discussed above at least in the context of
At block 1120, the server can receive one or more images of a field of view of an environment via the experience sharing session, such as discussed above in the context of
At block 1130, the server can receive an indication of a region of interest within the field of view of the one or more images via the experience sharing session. Indications of regions of interest are discussed above at least in the context of
In some embodiments, the indication of the region of interest within the field of view of the environment can include one or more eye gaze vectors. In these embodiments, method 1100 can further include the server determining the region of interest within the field of view based on the one or more images of the field of view and the one or more eye gaze vectors.
In other embodiments, the server can receive the indication of the region of interest from a sharer of the experience sharing session.
In particular of these embodiments, the server can receive a plurality of indications of regions of interest from a plurality of sharers. In these embodiments, the server can format a plurality of formatted images, wherein a formatted image for a given sharer can include a first portion and a second portion, the first portion formatted in a high-bandwidth format, and the second portion formatted in a low-bandwidth format, wherein the first portion corresponds to the region of interest indicated by the given sharer. Then, the server can send the formatted image for the given sharer to the given sharer.
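The per-sharer formatting could, for example, reuse a single captured frame and coarsen everything outside each sharer's region of interest before sending, as in the Python sketch below. Rectangular regions of interest, block subsampling as the "low-bandwidth format," and the sharer identifiers are simplifying assumptions made for illustration.

```python
import numpy as np

def format_for_sharer(frame, roi_box, block=8):
    """roi_box = (top, left, bottom, right) in pixel coordinates; pixels
    outside the box are replaced with coarse, block-replicated samples."""
    t, l, b, r = roi_box
    h, w = frame.shape[:2]
    coarse = frame[::block, ::block].repeat(block, axis=0).repeat(block, axis=1)[:h, :w]
    out = coarse.copy()
    out[t:b, l:r] = frame[t:b, l:r]            # keep the ROI at full resolution
    return out

def format_for_all(frame, rois_by_sharer):
    """Produce one formatted image per sharer from a single captured frame."""
    return {sharer: format_for_sharer(frame, roi)
            for sharer, roi in rois_by_sharer.items()}

# Example: two sharers watching the same frame with different ROIs.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
outgoing = format_for_all(frame, {"sharer_a": (50, 50, 250, 350),
                                  "sharer_b": (200, 300, 450, 600)})
```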
At block 1140, the server can determine a first portion of the one or more images that corresponds to the region of interest.
At block 1150, the server can format the one or more images such that a second portion of the one or more images is formatted in a lower-bandwidth format than the first portion. The second portion of the one or more images is outside of the portion that corresponds to the region of interest. Formatting portions of the images using different resolutions or formats is discussed above at least in the context of
Then, at block 1160, the server can transmit the formatted one or more images. In some embodiments, transmitting the one or more images can include transmitting video data as part of the experience-sharing session. The video data can include the formatted one or more images.
Exemplary methods and systems are described herein. It should be understood that the word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features. The exemplary embodiments described herein are not meant to be limiting. It will be readily understood that certain aspects of the disclosed systems and methods can be arranged and combined in a wide variety of different configurations, all of which are contemplated herein.
The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
With respect to any or all of the ladder diagrams, scenarios, and flow charts in the figures and as discussed herein, each block and/or communication may represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, functions described as blocks, transmissions, communications, requests, responses, and/or messages may be executed out of order from that shown or discussed, including substantially concurrent or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or functions may be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts may be combined with one another, in part or in whole.
A block that represents a processing of information may correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a block that represents a processing of information may correspond to a module, a segment, or a portion of program code (including related data). The program code may include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data may be stored on any type of computer readable medium such as a storage device including a disk or hard drive or other storage medium.
The computer readable medium may also include non-transitory computer readable media such as computer-readable media that stores data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media may also include non-transitory computer readable media that stores program code and/or data for longer periods of time, such as secondary or persistent long term storage, like read only memory (ROM), optical or magnetic disks, compact-disc read only memory (CD-ROM), for example. The computer readable media may also be any other volatile or non-volatile storage systems. A computer readable medium may be considered a computer readable storage medium, for example, or a tangible storage device.
Moreover, a block that represents one or more information transmissions may correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions may be between software modules and/or hardware modules in different physical devices.
It should be understood that for situations in which the embodiments discussed herein collect and/or use any personal information about users or information that might relate to personal information of users, the users may be provided with an opportunity to opt in/out of programs or features that involve such personal information (e.g., information about a user's preferences or a user's contributions to social content providers). In addition, certain data may be anonymized in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be anonymized so that no personally identifiable information can be determined for the user and so that any identified user preferences or user interactions are generalized (for example, generalized based on user demographics) rather than associated with a particular user.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
This application is a continuation of U.S. patent application Ser. No. 14/923,232, filed Oct. 26, 2015, which is a continuation of U.S. patent application Ser. No. 13/402,745, filed Feb. 22, 2012, which claims priority to U.S. Patent App. No. 61/510,020, filed Jul. 20, 2011, the contents of all of which are incorporated by reference herein for all purposes.