The present disclosure relates to techniques for performing sound reproduction preferable for listeners in audiovisual (AV) systems.
Sound propagation varies with the locational relationship between a sound source and a listener and with the environment surrounding them. The listener senses these differences in sound propagation to perceive the location of the sound source and an impression of the environment. For example, when the location of the sound source is fixed in front of the listener, the sound at the left ear when the listener faces to the right, or at the right ear when the listener faces to the left, becomes relatively louder and reaches the external auditory meatus earlier (which causes an interaural level difference and an interaural time difference). In addition, the shape of the auricle alters the frequency characteristics of an incoming sound differently depending on the direction from which the sound arrives. Accordingly, from the characteristics (e.g., frequency characteristics) of the sound received at both ears, and from changes in that sound, the listener can perceive the location of the sound source more clearly.
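The interaural time difference mentioned above can be approximated with a simple far-field, two-point model. The following sketch is illustrative only and is not part of the disclosure; the function name, the assumed inter-ear distance, and the speed of sound are assumptions (real heads add diffraction effects not modeled here).

```python
import math

def interaural_time_difference(azimuth_deg, ear_distance=0.18, speed_of_sound=343.0):
    """Far-field approximation of the interaural time difference (ITD), in seconds.

    azimuth_deg: source direction relative to straight ahead
    (0 = front, 90 = directly to one side).  Simple two-point model:
    ITD = d * sin(azimuth) / c; diffraction around the head is ignored.
    """
    return ear_distance * math.sin(math.radians(azimuth_deg)) / speed_of_sound

# A source directly to one side arrives roughly half a millisecond
# earlier at the near ear under these assumed dimensions.
itd = interaural_time_difference(90.0)
```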
A sound transfer characteristic between a sound source and the entrance of the external auditory meatus is called a head-related transfer function (HRTF), and is known to have a significant influence on sound localization (i.e., a person's ability to identify the origin of a sound). In recent years, AV systems such as home theater systems, which exploit this localization ability to reproduce highly realistic sound with multi-channel (e.g., 5.1 ch or 7.1 ch) loudspeakers, have become widespread among consumers.
In such an AV system, a loudspeaker is generally recommended to face toward a listener at a predetermined location on a circle about the listener. The loudspeaker, however, cannot always be placed at a recommended location because of limitations on, for example, space for installation of the loudspeaker. In this case, the following problem arises.
First, it is difficult to reproduce a sound in the manner intended by the content creator. Specifically, when the location of a loudspeaker differs from the recommended location, the direction of an incoming sound perceived by the listener does not always coincide with the expected direction. This mismatch affects not only the sound produced by that loudspeaker but also its balance with sounds produced by the other loudspeakers. Accordingly, the listener's impression of the sound might differ greatly from that intended by the content creator.
In addition, even when the loudspeaker is placed at the recommended location, a similar problem occurs if the listener does not listen at the recommended location or moves away from it.
To solve these problems, Japanese Patent Publication No. H06-311211 discloses a sound reproduction device including: a location detecting part for detecting the locations of a plurality of loudspeakers and of a viewer in real time; and a control part for outputting sound signals to the loudspeakers. The control part calculates the locational relationship between the viewer and each loudspeaker based on the detection result from the location detecting part, and sets the timing of outputting a sound signal to each loudspeaker from that calculation, thereby controlling the reproduced sound.
Japanese Patent Publication No. 2003-32776 describes a method for controlling a reproduced sound by detecting, with a camera, the direction in which a listener faces or the number of listeners, and switching a filter coefficient for sound image control according to the location of the listener obtained with the camera.
The conventional techniques described above, however, have the following drawbacks.
First, in the technique described in Japanese Patent Publication No. H06-311211, a relative locational relationship between a listener and a loudspeaker is detected, and the timing of outputting a sound signal is controlled based on the detected locational relationship. That is, only the location of the loudspeaker relative to the listener is taken into consideration in controlling sound reproduction. In the technique described in Japanese Patent Publication No. 2003-32776, a reproduced sound is merely controlled according to the location of the listener obtained with the camera.
However, sound reproduction is affected by more than the locational relationship between the listener and the loudspeaker. For example, the orientation of the loudspeaker relative to the listener greatly affects the perception of a sound, because the directional characteristics of the loudspeaker vary with frequency. A loudspeaker is designed so that its frequency characteristics are balanced for a sound received in front of it. Since its directional characteristics vary with frequency, however, when the sound is received at the side or the rear of the loudspeaker, the balance of the frequency characteristics is disturbed, and the loudspeaker fails to exhibit its intended acoustic performance.
Thus, to achieve optimum sound reproduction, the orientation of the loudspeaker relative to the listener also needs to be reflected in the control of sound reproduction. In addition, since the listener may move while listening, it is preferable to acquire information on the orientation of the loudspeaker relative to the listener in real time so as to enable dynamic control.
It is therefore an object of the present disclosure to achieve control of sound reproduction in an AV system in which the orientation of a loudspeaker relative to a listener is dynamically reflected.
In a first aspect of the present disclosure, a camera-equipped loudspeaker includes a loudspeaker body; and a camera united with the loudspeaker body, and configured to capture an image in a direction in which the loudspeaker body outputs a sound.
In this aspect, the camera united with the loudspeaker body can acquire an image in a direction in which the loudspeaker body outputs a sound. From this image, an image processing technique can recognize the location of a listener and detect the orientation of the loudspeaker body relative to the listener. Accordingly, the use of the camera-equipped loudspeaker can achieve control on sound reproduction with the orientation of the loudspeaker relative to the listener dynamically reflected thereon.
In a second aspect of the present disclosure, a signal processor for the camera-equipped loudspeaker of the first aspect includes: a recognition unit configured to receive an image signal output from the camera, recognize a location of a listener from an image shown by the image signal, and detect an orientation of the loudspeaker body relative to the listener based on the recognized location of the listener; and a sound control unit configured to perform signal processing on a given sound signal to generate an output signal, and output the output signal as an acoustic signal to the loudspeaker body.
In this aspect, from an image taken by the camera of the camera-equipped loudspeaker, the recognition unit can recognize the location of the listener and detect the orientation of the loudspeaker body relative to the listener. Accordingly, it is possible to achieve control on sound reproduction with the orientation of the loudspeaker relative to the listener dynamically reflected thereon.
In a third aspect of the present disclosure, an AV system includes: a loudspeaker body; a camera united with the loudspeaker body, and configured to capture an image in a direction in which the loudspeaker body outputs a sound; a recognition unit configured to receive an image signal output from the camera, recognize a location of a listener from an image shown by the image signal, and detect an orientation of the loudspeaker body relative to the listener based on the recognized location of the listener; and a sound control unit configured to perform signal processing on a given sound signal to generate an output signal, and output the output signal as an acoustic signal to the loudspeaker body.
In this aspect, the camera united with the loudspeaker body can acquire an image in a direction in which the loudspeaker body outputs a sound. From this image, the recognition unit can recognize the location of the listener and detect the orientation of the loudspeaker body relative to the listener. Accordingly, it is possible to achieve control on sound reproduction with the orientation of the loudspeaker relative to the listener dynamically reflected thereon.
According to the present disclosure, the use of the camera-equipped loudspeaker can achieve control on sound reproduction with the orientation of the loudspeaker relative to the listener dynamically reflected thereon, thus achieving sound reproduction more appropriate for a listener.
Embodiments of the present disclosure will be described in detail hereinafter with reference to the drawings.
In the signal processor 104, the recognition unit 103 recognizes the location of a listener P1 from an image shown by an image signal output from the camera 112, and based on the recognized listener location, detects the orientation of the loudspeaker body 111 relative to the listener P1. For example, an angle θh formed by the front direction (indicated by a dash-dotted line in
Although
The camera of the camera-equipped loudspeaker is not necessarily placed in the manner as shown in the example of
Alternatively, a plurality of cameras may be provided. This configuration can expand a shooting range, and thereby, the listener is more likely to be within the camera view. In addition, the use of information captured by the plurality of cameras can increase the accuracy in detecting the location of the listener.
Referring now to
Then, the location of the face image IP1 in the horizontal direction of the camera image is obtained. In this embodiment, the center of the face image IP1 is located at a distance a to the left of the center of the camera image (where 0 < a < 1, the full horizontal width of the camera image being taken as 2). Suppose that the angle formed by the front direction (indicated by a dash-dotted line in
θh = γ · a

where a is the normalized distance described above. Viewed another way, this angle θh indicates the horizontal direction of the loudspeaker body 111 relative to the listener P1 (the orientations of the loudspeaker body 111 and the camera 112 being known in advance).
If the face image IP1 is included in the right half of the camera image, the angle θh can also be detected in the same manner. Through the same process, an angle θv in the vertical direction can be detected. The foregoing process allows the recognition unit 103 to detect the orientation of the loudspeaker body relative to the listener P1.
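The angle detection described above can be sketched as follows. This is an illustrative fragment, not part of the disclosure; the function name and the pixel-based parameters are assumptions, and the linear relation θh = γ · a is the approximation used in the text.

```python
def view_angle_to_listener(face_center_px, image_width_px, half_view_angle_deg):
    """Estimate the horizontal angle θh between the camera's optical axis
    and the listener from the face position in the camera image.

    Implements the linear approximation θh = γ · a from the text, where a
    is the face offset from the image center normalized by the half-width,
    and γ is half the camera's horizontal view angle.  Returns a signed
    angle: negative means the listener is to the left of the axis.
    """
    half_width = image_width_px / 2.0
    a = (face_center_px - half_width) / half_width   # normalized offset in [-1, 1]
    return half_view_angle_deg * a

# Face centered a quarter of the image width to the left of center,
# with an assumed 60-degree camera half-angle:
theta_h = view_angle_to_listener(face_center_px=160, image_width_px=640,
                                 half_view_angle_deg=60.0)
# a = (160 - 320) / 320 = -0.5, so θh = -30 degrees.
```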
Then, an example of a method for estimating the distance L between a loudspeaker and the listener P1 will be described with reference to
The head of an actual user does not always have the standard size, and may be larger or smaller than the standard size. Thus, as shown in
The method for estimating the distance L between the loudspeaker and the listener P1 is not limited to the method described above, and may be a method for calculating the distance L based on image information from two cameras whose locations are known, or a method for estimating the distance L based on a focus position at which the listener is detected by auto-focus of a camera.
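The face-size-based estimation of the distance L can be illustrated with a standard pinhole-camera model. This is a sketch under assumed parameter names; the actual embodiment may instead use a calibrated table, and, as noted above, the assumed standard head size introduces an error margin for real users.

```python
def estimate_distance(face_width_px, focal_length_px, assumed_head_width_m=0.16):
    """Pinhole-camera estimate of the loudspeaker-listener distance L.

    face_width_px: width of the detected face image in pixels
    focal_length_px: camera focal length expressed in pixels
    assumed_head_width_m: standard head width assumed by the system;
    actual heads are larger or smaller, so the result is an estimate
    with the error margin discussed in the text.
    """
    return assumed_head_width_m * focal_length_px / face_width_px

# An 80-pixel-wide face seen through an assumed 500-pixel focal length
# puts the listener at about 1 m.
distance_l = estimate_distance(face_width_px=80, focal_length_px=500)
```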
In the manners described above, the recognition unit 103 can detect location information (i.e., the angles θh and θv and the distance L) of the listener P1 using an image signal output from the camera 112. In particular, since the camera 112 is united with the loudspeaker body 111, the location of the listener P1 relative to the loudspeaker body 111 can be easily detected. This configuration can provide more appropriate sound reproduction than that in conventional configurations.
Then, processing in the sound control unit 102 will be described. As illustrated in
First, a method for using direction information θh and θv will be described. Here, the use of the direction information θh and θv for signal processing on a sound signal allows correction of an output signal based on directional characteristics of the loudspeaker body 111. Specifically, in this embodiment, an output signal is corrected based on the directional characteristics of the loudspeaker body 111 according to the orientation of the loudspeaker body 111 relative to the listener P1.
As shown in
To solve these problems, the directional characteristics of the loudspeaker are measured to previously calculate an equalizer for correcting an influence of the directional characteristics, and equalizer processing is performed according to detected direction information θh and θv, i.e., the orientation of the loudspeaker body relative to the listener. This processing enables well-balanced reproduction independent of the orientation of the loudspeaker relative to the listener.
Referring now to
The foregoing description is directed to the directional characteristics on a horizontal plane, but the directional characteristics of a loudspeaker are defined on a sphere surrounding the loudspeaker. Thus, the table shown in
To perform equalizer processing, it is sufficient for the sound control unit 102 to include an analog filter or a digital filter such as an IIR filter and an FIR filter. For example, if a parametric equalizer is used for correction, a Q value (i.e., a value indicating the sharpness of a peak of frequency characteristics) may be set in addition to correction gains.
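As a sketch of the equalizer processing described above, the following fragment computes one band of a parametric (peaking) equalizer using the widely known Audio EQ Cookbook biquad formulas and looks up a correction gain per detected angle. The angle-to-gain table, the frequencies, and all names are hypothetical, not taken from the disclosure; a real system would hold measured values per loudspeaker.

```python
import cmath
import math

def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Biquad peaking-equalizer coefficients (Audio EQ Cookbook form).
    Boosts or cuts by gain_db around f0 with sharpness q."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return b, a

def magnitude_at(b, a, fs, f):
    """|H| of the biquad evaluated on the unit circle at frequency f."""
    z = cmath.exp(-2j * math.pi * f / fs)
    num = b[0] + b[1] * z + b[2] * z * z
    den = a[0] + a[1] * z + a[2] * z * z
    return abs(num / den)

# Hypothetical correction table: boost (dB) applied around 8 kHz for each
# off-axis angle band, compensating high-frequency roll-off off axis.
CORRECTION_DB_BY_ANGLE = {0: 0.0, 30: 2.0, 60: 5.0}

b, a = peaking_eq_coeffs(fs=48000, f0=8000,
                         gain_db=CORRECTION_DB_BY_ANGLE[60], q=1.0)
# The peak gain at f0 equals the requested boost (here 5 dB), while the
# response stays near 0 dB far from f0.
```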
Next, a method for using the distance information L will be described. A sound produced at a point propagates in all directions and attenuates as it propagates; its intensity decreases in inverse proportion to the square of the distance (its sound pressure, in inverse proportion to the distance). For example, as shown in
To prevent this unwanted situation, gain correction is performed on a sound produced by a loudspeaker according to detected distance information L. This gain correction enables well-balanced reproduction even in a case where the distance between the listener and the loudspeaker is not optimum.
The relationship between the distance and the attenuation described here holds in the presence of an ideal point sound source (i.e., a dimensionless nondirectional theoretical sound source) and an ideal free sound field. In practice, the sound source is not a point sound source, i.e., has dimensions and directivity. In addition, a sound field has various reflections, and thus, is not a free sound field. Accordingly, for an actual loudspeaker or actual reproduction environments, correction gains associated with the respective distances as shown in
The correction gain may be set for each frequency. High-frequency components of a sound are known to attenuate more with distance than low-frequency components. Accordingly, if a data table as shown in
Alternatively, correction may be performed by equalizing the sound pressure levels of a plurality of loudspeakers. For example, in a case where loudspeakers are located at distances of r1, r2, and r3, respectively, shown in
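The correction that equalizes the sound pressure levels of loudspeakers at different distances can be sketched as follows. The sketch assumes ideal point sources in a free field (pressure proportional to 1/r), as qualified above; attenuating the nearer loudspeakers rather than boosting the farthest one is an assumed design choice that avoids overflow, not something stated in the disclosure.

```python
import math

def level_matching_gains_db(distances_m):
    """Gains (dB) that equalize the sound-pressure levels of several
    loudspeakers at the listening point, assuming point sources in a
    free field (pressure ∝ 1/r).  The farthest loudspeaker is the 0 dB
    reference; nearer ones are attenuated so that all loudspeakers
    arrive at the same level."""
    r_ref = max(distances_m)
    return [20.0 * math.log10(r / r_ref) for r in distances_m]

# Loudspeakers at 1 m, 2 m, and 4 m: the nearest is attenuated the most
# (about -12 dB), the farthest is left untouched.
gains = level_matching_gains_db([1.0, 2.0, 4.0])
```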
Such correction by the sound control unit 102 according to the angle information θh and θv and the distance information L can achieve better sound reproduction even when a loudspeaker does not face toward the listener, or when its distance from the listener is not optimum.
In this embodiment, correction values for each angle and each distance are obtained as gains for the entire band or each frequency. Alternatively, each correction value may be held as a correction FIR filter to be used for correction. The use of an FIR filter enables phase control so that more accurate correction can be performed.
Then, an example of operation timings of image shooting by the camera 112, detection processing by the recognition unit 103, and correction by the sound control unit 102 will be described.
For example, the camera 112 always takes photographs, and continuously outputs an image signal to the recognition unit 103. The recognition unit 103 always detects the location of a listener from an image signal, and continuously outputs location information on the listener to the sound control unit 102 in real time. The sound control unit 102 receives location information which is output in real time, switches correction processing in real time, and continuously corrects an acoustic signal. In this manner, even when the location of the listener dynamically changes, sound control can follow this change.
In such control, however, the correction processing switches even with a small movement of the listener, in some cases producing a change too small to be audible. Such switching of the correction processing is meaningless in terms of audibility. To avoid it, location information on the listener may be output to the sound control unit 102 only when the recognition unit 103 detects a movement of the listener (e.g., a change in angle or distance) larger than or equal to a predetermined threshold value, for example.
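The threshold-based suppression of imperceptible updates can be sketched as follows. This is an illustrative fragment; the class name and the threshold values are assumptions, not taken from the disclosure.

```python
class LocationGate:
    """Forwards listener-location updates to the sound control unit only
    when the change is large enough to matter audibly.  Threshold values
    are illustrative placeholders."""

    def __init__(self, angle_threshold_deg=5.0, distance_threshold_m=0.2):
        self.angle_threshold = angle_threshold_deg
        self.distance_threshold = distance_threshold_m
        self.last = None  # last forwarded (theta_h, distance)

    def update(self, theta_h, distance):
        """Return the new location if it should trigger re-correction,
        or None if the movement is below both thresholds."""
        if self.last is None:
            self.last = (theta_h, distance)
            return self.last
        d_theta = abs(theta_h - self.last[0])
        d_dist = abs(distance - self.last[1])
        if d_theta >= self.angle_threshold or d_dist >= self.distance_threshold:
            self.last = (theta_h, distance)
            return self.last
        return None
```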
Alternatively, image shooting by the camera 112 and detection processing by the recognition unit 103 may be performed at predetermined time intervals. This operation can reduce a processing load in the system. Alternatively, the recognition unit 103 and the sound control unit 102 may execute processing when a user turns a trigger switch on with, for example, a remote controller. This operation can further reduce a processing load in the system.
Alternatively, the initial value of the location information on a listener may be set in advance, for example by executing a measurement mode included in the system, such that subsequent dynamic correction responding to movement of the listener can be performed using the image signal output from the camera 112.
The correction data table as described in this embodiment is recorded in, for example, a nonvolatile memory in the sound control unit 102.
Since an actual AV system includes a plurality of loudspeakers, application of the technique described here to each of the loudspeakers enables control to be performed on a sound reproduced by the loudspeaker according to the user location.
In the configuration illustrated in
In this embodiment, the array loudspeaker 113 is provided with a camera 112, and in a signal processor 204, a recognition unit 103 detects the orientation of the array loudspeaker 113 relative to a listener. This detection can be achieved in the same manner as in the first embodiment. Then, a sound control unit 202 performs signal processing on a sound signal such that the peak of the directivity of the array loudspeaker 113 is directed to the listener, and outputs acoustic signals to the respective loudspeaker units.
The direction of the peak of the directivity of the array loudspeaker 113 can be easily controlled, for example, with settings of delays and gains to be added to acoustic signals to the respective loudspeaker units. Specifically, to shift the direction of the peak of the directivity slightly to the right, for example, a delay of an acoustic signal to a left loudspeaker unit is reduced and a gain of this acoustic signal is increased so that a sound is output more quickly at a larger volume.
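The delay settings described above correspond to classic delay-and-sum beam steering, which can be sketched as follows for a uniform linear array. This is a simplified model; the parameter names and the sign convention (which end of the array leads) are assumptions, and gain shading, which the text also mentions, is omitted.

```python
import math

def steering_delays_s(num_units, spacing_m, steer_angle_deg, speed_of_sound=343.0):
    """Per-unit delays (seconds) that steer the main lobe of a uniform
    linear array toward steer_angle_deg (0 = broadside, i.e., straight
    ahead).  Classic delay-and-sum beamforming: unit n is delayed by
    n * d * sin(theta) / c, then offset so all delays are non-negative.
    Units on the side away from the steering direction fire earlier,
    as described in the text."""
    raw = [n * spacing_m * math.sin(math.radians(steer_angle_deg)) / speed_of_sound
           for n in range(num_units)]
    offset = min(raw)
    return [t - offset for t in raw]

# Four units at an assumed 10 cm spacing, steered 30 degrees off broadside.
delays = steering_delays_s(num_units=4, spacing_m=0.1, steer_angle_deg=30.0)
```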
In addition, to direct the peak of the directivity of the array loudspeaker 113 to a listener P1 with higher accuracy, a data table for holding, for each angle, an FIR filter coefficient for use in sound control on each loudspeaker unit as shown in
The foregoing description is directed to directivity control on a horizontal plane, but the use of a loudspeaker array in which loudspeaker units are arranged in a vertical direction enables directivity control according to angle information θv in a vertical direction to be achieved in the same manner.
The loudspeaker units may be arranged in a plane. In this case, directivity control according to angle information on each of the horizontal and vertical directions can be achieved.
As in the first embodiment, in control according to distance information L, gain correction according to the distance may be performed on acoustic signals of the respective loudspeaker units.
In the case of using an array loudspeaker, so-called localized reproduction can be performed, and this embodiment may be applied to control of this localized reproduction. In localized reproduction, a sound is reproduced only within a predetermined region, and the sound volume decreases rapidly outside this region. For example, in a case where the camera 112 detects the location of the listener P1 and finds that the listener P1 is outside the expected region, the sound control unit 202 switches a control parameter so that the location of the listener P1 is included in the region of localized reproduction.
In the configuration illustrated in
Control of actually changing the orientation of the loudspeaker as described above may be combined with the correction processing for the directional characteristics of a loudspeaker described in the first embodiment. Specifically, for example, control may be performed such that the correction processing for directional characteristics is employed if the angle information θh and θv indicating the orientation of the loudspeaker body 111 relative to the listener P1 is less than or equal to a predetermined threshold value, and the orientation of the loudspeaker is changed by the movable mechanism 114 if the angle information θh and θv exceeds the threshold value. When the orientation of the loudspeaker deviates greatly from the listener, a large correction gain is needed to correct the directional characteristics. An increased correction gain, however, raises the problem of overflow in the digital signals, and the sound may be distorted because of the upper reproduction gain limit of the loudspeaker itself. Combining the control of this embodiment with the correction of directional characteristics can avoid these problems.
This embodiment is also applicable to the array loudspeaker of the second embodiment. Specifically, the array loudspeaker may be provided in the movable mechanism so that the movable mechanism is controlled to change the orientation of the array loudspeaker. This configuration enables directivity control or control for localized reproduction.
In the configuration illustrated in
In detecting the number of listeners from a camera image, if a plurality of listeners overlap when viewed from the loudspeaker, for example, a plurality of listeners might be recognized as one. In this case, however, control of directional characteristics on the listeners recognized as one causes no serious problems in terms of sound quality. That is, if a plurality of listeners appear to overlap each other, the number of these listeners does not need to be strictly detected, and the processing is simplified accordingly.
The foregoing embodiments have focused mainly on correction of directional characteristics. However, other configurations may be employed: for example, the face direction of a listener as viewed from a loudspeaker, or the distance between the loudspeaker and the listener, may be detected, and the head-related transfer function from the loudspeaker estimated, so that the sound control unit performs control accordingly. The sound control unit holds control parameters in advance according to the face direction and the distance, and switches the control parameter according to the detection result during reproduction. A simple example is correction for the distance from the loudspeaker to the listener: if the distance from one loudspeaker to the listener is smaller than that from another loudspeaker, the timing of producing the sound from the nearer loudspeaker is delayed. This yields the same effect as moving that loudspeaker farther away.
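The delay-based distance correction in the last example can be sketched as follows. The sampling rate and function name are assumptions, and rounding to whole samples is a simplification (a real system might use fractional-delay filtering).

```python
def compensation_delay_samples(distance_m, farthest_m, fs=48000, speed_of_sound=343.0):
    """Number of samples of delay to add to a loudspeaker that is nearer
    than the farthest one, so that sounds from all loudspeakers arrive at
    the listener simultaneously.  This virtually 'extends' the nearer
    loudspeaker's distance, as described in the text."""
    extra_path = farthest_m - distance_m
    return round(extra_path / speed_of_sound * fs)

# A loudspeaker 1 m nearer than the farthest one is delayed by about
# 140 samples at an assumed 48 kHz sampling rate.
delay = compensation_delay_samples(distance_m=2.0, farthest_m=3.0)
```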
The present disclosure can provide sound reproduction more appropriate for a listener in an AV system, and is thus useful for improving sound quality in, for example, home theater equipment.
Foreign priority: Japanese Patent Application No. 2009-048981, filed March 2009 (Japan, national).
This is a continuation of PCT International Application PCT/JP2010/001328 filed on Feb. 26, 2010, which claims priority to Japanese Patent Application No. 2009-048981 filed on Mar. 3, 2009. The disclosures of these applications including the specifications, the drawings, and the claims are hereby incorporated by reference in their entirety.
Related applications: parent, PCT International Application PCT/JP2010/001328, filed Feb. 2010; child, U.S. Application No. 13224632.