Many different types of sound-producing electronic systems have been developed, including radios, televisions, digital music players, digital video players, home theater systems, speaker phones, and portable electronic devices. The sounds produced by many electronic systems are non-directional (i.e., the sounds are radiated essentially equally in all directions). Some electronic systems, however, include sound systems that are capable of producing directed sound beams. In one approach, directed sound beams are produced by physically aiming one or more loudspeakers in a selected target direction. In another approach, directed sound beams are produced by a phased array of loudspeakers that is controlled so that the resulting sound beams can be steered, focused, and shaped.
In one approach, a directed acoustic sound system includes a one-panel loudspeaker array that can deliver sound in up to seven separate beams, each of which can be steered and controlled to be either tightly focused or wide. Multi-channel surround sound can be delivered to a listener's position through reflections off the ceiling and walls. The listener's position is determined based on signals transmitted to the system from a remote control unit that is carried by the listener.
In another approach, a directed acoustic sound system includes a disk-shaped parametric loudspeaker that may be mounted on a motorized mounting stand that can be rotated to different positions to account for varying listener positions. The mounting stand may be configured to track the listener automatically by sensing sounds produced by the listener's movements. The directed acoustic sound system also may include a proximity sensor (e.g., ultrasonic, echo, etc.) that detects how far the listener is from the system. The parameters of the loudspeaker may be optimally adjusted based on the detected proximity information.
An interactive directed light/sound system has been proposed that includes a speaker system that can direct sound in a narrow beam. A motorized mount is used to redirect the sound beam to different locations. The system also includes a complex vision processing system that processes image data from one or more video cameras to distinguish moving (or “foreground”) objects from static (or “background”) parts of an interactive area. The vision processing system may be configured to track the location of each foreground object in the interactive area. The system stores information about the physical locations of real and virtual objects within the interactive area to allow users to interact with the virtual objects. In one implementation, a specialized audio stream is delivered to a single person moving around the interactive area. The specialized audio stream may be used to deliver music, private instructions, information, advertisements, or warnings to the person without disturbing others and without the encumbrance of headphones.
Hitherto, directed sound systems have either encumbered the listener with a locating device (e.g., a remote control) to determine the listener's location or have relied on locating methods that cannot readily distinguish persons from other objects without the use of substantial processing resources. In addition, none of the prior directed sound systems is capable of controlling the generation of directed sound beams based on an unobtrusive detection of the listener's attentional state.
In one aspect, the invention features an electronic system that includes a source of an audio signal, a sound projector, an imaging system, and a controller. The sound projector is operable to generate at least one directed sound beam based on the audio signal. The imaging system is operable to capture images of a person in a space adjacent to the sound projector, to process the captured images to identify at least one eye of the person, and to estimate a position of the person in the space based on the identified ones of the person's eyes. The controller is operable to control the generation of the directed sound beam based on the estimated position of the person in the space.
In another aspect, the invention features a method of generating a directed sound beam. In accordance with this inventive method, images of a person in a space are captured. The captured images are processed to identify at least one eye of the person and to estimate a position of the person in the space based on the identified ones of the person's eyes. At least one directed sound beam is generated based on an audio signal and the estimated position of the person in the space.
Other features and advantages of the invention will become apparent from the following description, including the drawings and the claims.
In the following description, like reference numbers are used to identify like elements. Furthermore, the drawings are intended to illustrate major features of exemplary embodiments in a diagrammatic manner. The drawings are not intended to depict every feature of actual embodiments nor relative dimensions of the depicted elements, and are not drawn to scale.
The embodiments that are described in detail below control the generation of directed sound beams in ways that readily distinguish persons from other objects without incorporating substantial processing resources and without encumbering the listener with a locating device (e.g., a remote control). In particular, these embodiments determine a listener's location based on image-based detection of a person's eyes. In addition, these embodiments also are capable of controlling the operational state of an electronic system, including controlling the generation of directed sound beams, based on an unobtrusive detection of the listener's attentional state.
The screen 12 may be any type of video screen or monitor, including a CRT screen and a flat panel screen (e.g., a plasma display screen).
The sound projector 14 may be any type of sound system that is capable of selectively transmitting sound to particular locations in the space 26, including sound systems capable of physically aiming a loudspeaker to selected locations in the space 26 and sound systems capable of virtually aiming sound from an array of loudspeakers to particular locations in the space 26. In the implementation shown in
Each of the one or more A/V sources 16 may be any type of A/V source, including a CD player, a DVD player, a video player, an MP3 player, a broadcast radio, a satellite radio, an internet radio, a video game console, and a cable or satellite set-top box capable of decoding and playing paid audio and video programming.
The imaging system 18 includes an imaging device and an image processing system. The imaging device typically remains fixed in place in an orientation facing the space 26 in front of the sound projector 14. Exemplary imaging devices include remote-controllable digital cameras (e.g., a Kodak DCS760 camera), USB video cameras (or “webcams”), and Firewire/1394 cameras. In some implementations, the imaging device captures images at a rate of 30 fps (frames per second) and a resolution of 320 pixels×240 pixels. The image processing system controls the capture of images by the imaging device. As explained in detail below, the image processing system processes the captured images to identify at least one eye of the person 24 and estimates a position of the person in the space based on the identified ones of the person's eyes. In some implementations, the image processing system also processes the captured images to determine an attentional state of the person 24.
Referring to
Referring to
In some implementations, the imaging device 42 captures the images of the person 24 alternately illuminated by the on-axis light source 44 and the off-axis light source 46. The light from the light sources 44, 46 is emitted in pulses that are synchronized with the frame rate of the imaging device 42. The light pulses may be emitted at a rate equal to the frame rate or in bursts each having a period longer than the frame rate. The wavelengths of the light emitted from the light sources 44, 46 may be the same or different. In some implementations, the light emitted from both light sources is in the infrared or near-infrared wavelength ranges.
Differential reflectivity off the retinas of the person's eyes is dependent upon the angle θ1 between the light source 44 and the axis 50 of the imaging device 42, and the angle θ2 between the light source 46 and the axis 50. In general, a smaller angle θ1 will increase the retinal return (i.e., the intensity of light that is reflected off the back of the person's eye and detected by the imaging device 42). Accordingly, images captured with the person 24 illuminated by the on-axis light source 44 will contain a bright spot corresponding to the person's pupil when the person's eyes are open, whereas images captured with the person 24 illuminated by the off-axis light source 46 will not contain such a bright spot. Therefore, when the person's eyes are open, the difference between the images captured under the on-axis and off-axis illuminations will highlight the pupils of the person's eyes.
For example,
In the illustrated embodiment, the imaging system 18 includes an image processing system 56 that processes the images captured by the imaging device 42 to detect at least one eye of the person 24 (block 58). In other embodiments, the image processing system 56 is incorporated in the receiver 20.
The image processing system 56 detects the person's eyes based on a difference image that is derived from the images that are alternately captured under the different illuminations that are provided by the on-axis and off-axis light sources 44, 46. An exemplary difference image 60, which corresponds to the subtraction of image 54 from the image 52, is shown in
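By way of a non-limiting sketch of the difference-image detection described above, the following Python fragment (assuming NumPy and OpenCV, neither of which is mandated by this description) subtracts the off-axis frame from the on-axis frame, thresholds the residual retinal return, and reports the centroids of pupil-sized bright blobs as candidate eye locations.

```python
import cv2
import numpy as np

def find_pupil_candidates(on_axis_frame, off_axis_frame,
                          threshold=40, min_area=4, max_area=400):
    """Locate candidate pupils from a bright-pupil / dark-pupil frame pair.

    on_axis_frame / off_axis_frame: grayscale images (uint8 NumPy arrays)
    captured under the on-axis and off-axis illumination, respectively.
    Returns a list of (x, y) centroids of bright blobs in the difference image.
    """
    # Subtract the dark-pupil image from the bright-pupil image; the retinal
    # return survives the subtraction while most of the scene cancels out.
    diff = cv2.subtract(on_axis_frame, off_axis_frame)

    # Keep only strong residual highlights (the pupil "bright spots").
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)

    # Find connected bright regions and keep those of plausible pupil size.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        area = cv2.contourArea(c)
        if min_area <= area <= max_area:
            m = cv2.moments(c)
            if m["m00"] > 0:
                candidates.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return candidates
```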
In general, the imaging device 42 and the light sources 44, 46 may be located at any distance from the person 24 within the space 26 so long as the light sources 44, 46 provide sufficient illumination for the imaging device 42 to detect a retinal return along the optical axis 50. In addition, it is noted that this method of eye detection is substantially unaffected by the angle of the person's gaze toward the screen 12. Therefore, the orientation of the head and eyes of the person 24 may move relative to the light sources 44, 46 and the detector 42 without significantly affecting the efficiency and reliability of the eye detection process.
Additional details regarding the construction and operation of the above-described eye detection methods, as well as details regarding alternative methods of detecting the pupil regions of the person's eyes, may be obtained from U.S. Patent Application Publication No. 2004/0170304.
After the image processing system 56 has detected at least one eye of the person 24 (block 58), the image processing system 56 additionally processes the captured images to estimate the position of the person 24 in the space 26 based on the identified ones of the person's eyes (block 62). In some implementations, the image processing system 56 may map the position of the person's eyes in an image of the space 26 to a sound beam direction that can be used by the controller 28 to direct the sound beam 22 to the estimated location of the person 24.
Referring to
The distance y1 may be determined by pre-calibrating the imaging system 18 with at least one predetermined listening location (e.g., one or more predetermined seating locations in the space 26). For example, in one implementation, if the person's right eye appears in the left region 66 of the image 64, the distance y1 is assumed to correspond to a calibrated distance DL. If the person's eye appears in the center region 68 of the image 64, the distance y1 is assumed to correspond to a calibrated distance DC. If the person's eye appears in the right region of the image 64, the distance y1 is assumed to correspond to a calibrated distance DR. Alternatively, the distance y1 may be determined dynamically by using an optical (or acoustic) range finding device to determine the distance between the person 24 and the sound projector 14. The determined distance is mapped to the distance y1 using simple geometry. Other methods of mapping the coordinates of the person's eye in the image 64 to a three-dimensional coordinate system that is anchored to the location of the sound projector 14 also may be used.
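As a rough, assumption-laden illustration of the calibration just described, the following sketch classifies an eye's horizontal pixel coordinate into one of three regions with hypothetical calibrated distances (standing in for DL, DC, and DR) and converts the pixel offset from the image center into a beam steering angle under a simple pinhole-camera model. The field of view, region boundaries, and distance values are illustrative only and are not specified by this description.

```python
import math

IMAGE_WIDTH = 320            # capture resolution noted above
HORIZONTAL_FOV_DEG = 60.0    # assumed camera field of view
CALIBRATED_DISTANCES = {     # hypothetical pre-calibrated distances (meters)
    "left": 2.5,             # stand-in for DL
    "center": 3.0,           # stand-in for DC
    "right": 2.5,            # stand-in for DR
}

def eye_to_beam_target(eye_x):
    """Map an eye's horizontal pixel coordinate to (azimuth_deg, distance_m)."""
    # Classify the eye into one of the three calibrated regions.
    if eye_x < IMAGE_WIDTH / 3:
        region = "left"
    elif eye_x < 2 * IMAGE_WIDTH / 3:
        region = "center"
    else:
        region = "right"
    distance = CALIBRATED_DISTANCES[region]

    # Convert the pixel offset from the image center into an azimuth angle,
    # assuming a pinhole camera aligned with the sound projector axis.
    focal_px = (IMAGE_WIDTH / 2) / math.tan(math.radians(HORIZONTAL_FOV_DEG / 2))
    azimuth = math.degrees(math.atan2(eye_x - IMAGE_WIDTH / 2, focal_px))
    return azimuth, distance
```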
Referring back to
The size of the sound focus region may be determined empirically. In some implementations, the size of the sound focus region is selected to be large enough to encompass all of the eyes that are identified in the space 26, provided the size of the sound focus region does not exceed a predetermined threshold. In some of these implementations, if the sound focus region needed to encompass all of the identified eyes would be larger than the predetermined threshold, multiple sound beams are generated, each focused onto a respective cluster of the eyes that have been identified in the space 26.
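A minimal sketch of this sizing rule, assuming the identified eye positions are available as two-dimensional coordinates in the listening space: compute the extent of the identified eyes, use a single beam if they fit within the allowed focus region, and otherwise split them into two clusters and aim a beam at the centroid of each. The threshold value below is a placeholder for the empirically determined limit.

```python
def plan_sound_beams(eye_positions, max_extent=1.5):
    """Return a list of beam target points covering the identified eyes.

    eye_positions: list of (x, y) coordinates in the listening space (meters).
    max_extent: largest focus-region width a single beam may cover
    (a placeholder; the description says it is determined empirically).
    """
    if not eye_positions:
        return []

    xs = [p[0] for p in eye_positions]
    ys = [p[1] for p in eye_positions]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))

    # One beam is enough if every eye fits inside the allowed focus region.
    if extent <= max_extent:
        return [(sum(xs) / len(xs), sum(ys) / len(ys))]

    # Otherwise split along the wider axis into two clusters and aim a
    # separate beam at the centroid of each cluster.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(eye_positions, key=lambda p: p[axis])
    half = len(ordered) // 2
    clusters = [ordered[:half], ordered[half:]]
    return [
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
        for c in clusters if c
    ]
```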
Referring to
In some of these implementations, the imaging system 18 determines the attentional state of the person 24 based on the identification of the person's pupils in the images captured by the imaging device 42. For example, if the image processing system 56 fails to detect any eyes in the space 26 for longer than a prescribed period, the image processing system 56 may infer that nobody is located within the space 26, or that a person who previously entered the space 26 has left the space 26, has fallen asleep, or is no longer interested in gazing at the screen 12. The image processing system 56 also may determine the focus of the user's attention based on an estimate of the angle of the person's gaze with respect to the entertainment system 10.
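A minimal sketch of this inference, assuming the image processing loop reports once per frame whether any eyes were detected: if no eyes have been seen for longer than a prescribed timeout (the value below is a placeholder, not a specified figure), the person is treated as inattentive or absent.

```python
import time

class AttentionMonitor:
    """Infer a coarse attentional state from per-frame eye-detection results.

    timeout_s is a placeholder for the "prescribed period"; this description
    does not fix a specific value.
    """

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.last_eyes_seen = None

    def update(self, eyes_detected, now=None):
        """Call once per processed frame; returns 'attentive' or 'inattentive'."""
        now = time.monotonic() if now is None else now
        if eyes_detected:
            self.last_eyes_seen = now
            return "attentive"
        if self.last_eyes_seen is None or now - self.last_eyes_seen > self.timeout_s:
            # No eyes seen for longer than the prescribed period: the person
            # may have left, fallen asleep, or stopped gazing at the screen.
            return "inattentive"
        return "attentive"
```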
After the attentional state of the person 24 has been determined (block 72), the controller 28 may change the operational status of one or more components of the entertainment system 10 (block 74). The controller 28 may be programmed to respond to the determined attentional state in many different, configurable ways. For example, the controller 28 may raise or lower the volume of the directed sound beam 22, change the equalization of the directed sound beam 22, change the images that are presented on the screen (e.g., present a predetermined visualization on the screen 12 when the person 24 is determined to be dancing to music), or selectively turn off or turn on one or more components of the entertainment system 10.
Depending on the determined state of wakefulness, the controller 28 may change the operational status of one or more components of the electronic entertainment system 10 (block 78). For example, if the person is determined to have fallen asleep, the controller 28 may lower the volume of the directed sound beam 22 or turn down components of the electronic entertainment system 10 (e.g., place one or more components in a low-power standby mode of operation or a shutdown mode of operation). Alternatively, if the person is determined to have just woken up, the controller 28 may increase the volume of the directed sound beam 22 or return the components of the electronic entertainment system 10 to their operational status before the person 24 fell asleep.
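Such configurable responses might be organized as a simple dispatch on the determined state, as in the sketch below; the state labels and the component control methods (set_volume, set_power_state) are hypothetical names introduced only for illustration and are not part of the described system.

```python
def respond_to_wakefulness(state, sound_projector, components, resume_volume=0.6):
    """Apply a configurable response to the determined state of wakefulness.

    state: one of 'asleep', 'awake', 'absent' (hypothetical labels).
    sound_projector / components: objects exposing hypothetical control calls.
    """
    if state == "asleep":
        # Lower the directed beam and place the other components in standby.
        sound_projector.set_volume(0.1)
        for c in components:
            c.set_power_state("standby")
    elif state == "awake":
        # Restore the volume and the prior operational status.
        sound_projector.set_volume(resume_volume)
        for c in components:
            c.set_power_state("on")
    elif state == "absent":
        for c in components:
            c.set_power_state("off")
```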
In general, the angle of the person's gaze may be determined in any one of a wide variety of different ways. For example, known eye-gaze tracking systems determine the person's gaze angle by tracking the relative positions of the glint and bright-eye reflections from at least one of the person's eyes when illuminated by infrared light.
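As a hedged illustration of the glint/bright-eye approach, under a simple first-order model the displacement between the bright-eye (pupil) center and the corneal glint is approximately proportional to the gaze angle; the calibration gain below is a hypothetical per-user constant, not a value taken from any particular tracking system.

```python
def estimate_gaze_angle(pupil_center, glint_center, gain_deg_per_px=1.2):
    """Roughly estimate horizontal and vertical gaze angles in degrees.

    pupil_center, glint_center: (x, y) pixel coordinates of the bright-eye
    reflection and the corneal glint in the same image.
    gain_deg_per_px: hypothetical per-user calibration constant relating
    pupil-glint displacement to gaze angle.
    """
    dx = pupil_center[0] - glint_center[0]
    dy = pupil_center[1] - glint_center[1]
    return gain_deg_per_px * dx, gain_deg_per_px * dy
```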
Referring to
Referring back to
The controller 28 may use the encoded surround sound data to determine an appropriate modification of the equalization parameters. For example, if the person 24 is determined to be gazing at the right region 86 of the screen 12, the sounds in the right front and right rear channels of the encoded A/V signals might be enhanced; or the sounds in the center, left front and left rear channels might be reduced. If the person 24 is determined to be gazing at the left region 82 of the screen 12, the sounds in the left front and left rear channels of the encoded A/V signals might be enhanced; or the sounds in the center, right front and right rear channels might be reduced. If the person 24 is determined to be gazing at the center region 84 of the screen 12, the sounds in the center channel of the encoded A/V signals might be enhanced; or the sounds in the right front, right rear, left front and left rear channels might be reduced.
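The channel emphasis described above could be realized as a table of per-channel gain multipliers keyed by the gazed-at screen region, as sketched below; the gain values and channel labels are illustrative assumptions rather than prescribed settings.

```python
# Hypothetical per-channel gain multipliers keyed by the screen region the
# person is gazing at (values are illustrative, not prescribed above).
GAZE_EQ_PROFILES = {
    "left":   {"L": 1.4, "R": 1.0, "C": 0.8, "Ls": 1.4, "Rs": 1.0},
    "center": {"L": 0.9, "R": 0.9, "C": 1.4, "Ls": 0.9, "Rs": 0.9},
    "right":  {"L": 1.0, "R": 1.4, "C": 0.8, "Ls": 1.0, "Rs": 1.4},
}

def apply_gaze_equalization(channel_samples, gaze_region):
    """Scale each surround channel according to the person's gaze region.

    channel_samples: dict mapping channel name -> sequence of samples.
    gaze_region: 'left', 'center', or 'right'.
    """
    gains = GAZE_EQ_PROFILES[gaze_region]
    return {
        name: [gains.get(name, 1.0) * s for s in samples]
        for name, samples in channel_samples.items()
    }
```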
The systems and methods described herein are not limited to any particular hardware or software configuration. These systems and methods may be implemented in any computing or processing environment, including in digital electronic circuitry or in computer hardware, firmware, or software.
Other embodiments are within the scope of the claims.
For example, the embodiments were described above in the context of a home theater entertainment system. These embodiments, however, readily may be incorporated in a wide variety of other electronic systems, including broadcast, satellite and internet radio systems, television systems, memory-based video and music playback systems, and video game systems.