Traditionally, surround sound systems are calibrated using a multi-element microphone placed at a sweet spot or default listening position to measure audio signals played by each loudspeaker. The multi-element microphone is usually tethered to an AV receiver or processor by means of a long cable, which could be cumbersome for consumers. Furthermore, when a loudspeaker is moved or a listener is away from the sweet spot, existing calibration methods have no way to detect such changes without a full manual recalibration procedure. It is therefore desirable to have a method and apparatus to calibrate surround sound systems with minimum user intervention.
A brief summary of various exemplary embodiments is presented. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of a preferred exemplary embodiment adequate to allow those of ordinary skill in the art to make and use the inventive concepts will follow in later sections.
Various exemplary embodiments relate to a method, an apparatus and a system for calibrating multichannel surround sound systems. The apparatus may include a speaker, a headphone (over-the-ear, on-ear, or in-ear), a microphone, a computer, a mobile device, a home theater receiver, a television, a Blu-ray (BD) player, a compact disc (CD) player, a digital media player, or the like. The apparatus may be configured to receive an audio signal, process the audio signal and filter the audio signal for output.
Various exemplary embodiments further relate to a method for calibrating a multichannel surround sound system including a soundbar and one or more surround loudspeakers, the method comprising: receiving, by an integrated microphone array, a test signal played at a surround loudspeaker to be calibrated, the integrated microphone array mounted in a relationship to the soundbar; estimating a position of the surround loudspeaker relative to the microphone array; receiving, by the microphone array, a sound from a listener; estimating a position of the listener relative to the microphone array; and performing a spatial calibration to the surround sound system based at least on one of the estimated position of the surround loudspeaker and the estimated position of the listener.
In some embodiments, the microphone array includes two or more microphones. In some embodiments, the position of the surround loudspeaker and the position of the listener each includes a distance and an angle relative to the microphone array, wherein the distance of the loudspeaker is estimated based on a direct component of the received test signal, and wherein the angle of the loudspeaker is estimated using two or more microphones in the microphone array and based on a time difference of arrival (TDOA) of the test signal at the two or more microphones in the microphone array. In some embodiments, the sound from the listener includes the listener's voice or other sound cues made by the listener. In some embodiments, the position of the listener is estimated using three or more microphones in the microphone array. In some embodiments, performing the spatial calibration comprises: adjusting delay and gain of a sound channel for the surround loudspeaker based on the estimated positions of the surround loudspeaker and the listener; and correcting spatial position of the sound channel by panning the sound channel to a desired position based on the estimated positions of the surround loudspeaker and the listener. In some embodiments, performing the spatial calibration comprises panning a sound object to a desired position based on the estimated positions of the surround loudspeaker and the listener.
Various exemplary embodiments further relate to a method comprising: receiving a request to calibrate a multichannel surround sound system including a soundbar with an integrated microphone array and one or more surround loudspeakers; responsive to the request including estimating a position of a surround loudspeaker, playing a test signal at the surround loudspeaker and estimating the position of the surround loudspeaker relative to the microphone array based on the test signal received at the microphone array; responsive to the request including estimating a position of a listener, estimating the position of the listener relative to the microphone array based on a sound of the listener received at the microphone array; and performing a spatial calibration to the multichannel surround sound system based at least on one of the estimated position of the surround loudspeaker and the estimated position of the listener.
Various exemplary embodiments further relate to an apparatus for calibrating a multichannel surround sound system including one or more loudspeakers, the apparatus comprising: a microphone array integrated in a front component of the surround sound system, wherein the integrated microphone array is configured for receiving a test signal played at a loudspeaker to be calibrated, and for receiving a sound from a listener; an estimation module configured for estimating a position of the loudspeaker relative to the microphone array based on the received test signal from the loudspeaker, and for estimating a position of the listener relative to the microphone array based on the received sound from the listener; and a calibration module configured for performing a spatial calibration to the surround sound system based at least on one of the estimated position of the loudspeaker and the estimated position of the listener.
In some embodiments, the front component of the surround sound system is one of a soundbar, a front loudspeaker and an A/V receiver. In some embodiments, the position of the loudspeaker and the position of the listener each includes a distance and an angle relative to the microphone array, wherein the distance of the loudspeaker is estimated based on a direct component of the received test signal, and wherein the angle of the loudspeaker is estimated using two or more microphones in the microphone array and based on a time difference of arrival (TDOA) of the test signal at the two or more microphones in the microphone array. In some embodiments, the position of the listener is estimated using three or more microphones in the microphone array. In some embodiments, performing the spatial calibration comprises: adjusting delay and gain of a sound channel for the loudspeaker based on the estimated positions of the loudspeaker and the listener; and correcting spatial position of the sound channel by panning the sound channel to a desired position based on the estimated positions of the surround loudspeaker and the listener. In some embodiments, performing the spatial calibration comprises panning a sound object to a desired position based on the estimated positions of the surround loudspeaker and the listener.
Various exemplary embodiments further relate to a system for calibrating a multichannel surround sound system including one or more loudspeakers, the system comprising: a microphone array with two or more microphones integrated in a front component of the surround sound system, wherein the microphone array is configured for receiving a test signal played at a loudspeaker to be calibrated and for receiving a sound from a listener; an estimation module configured for estimating a position of the loudspeaker relative to the microphone array based on the received test signal from the loudspeaker, and for estimating a position of the listener relative to the microphone array based on the received sound from the listener; and a calibration module configured for performing a spatial calibration to the surround sound system based at least on one of the estimated position of the loudspeaker and the estimated position of the listener.
In some embodiments, the front component of the surround sound system is one of a soundbar, a front loudspeaker and an A/V receiver.
These and other features and advantages of the various embodiments disclosed herein will be better understood with respect to the following description and drawings, in which like numbers refer to like parts throughout.
The detailed description set forth below in connection with the appended drawings is intended as a description of the presently preferred embodiment of the invention, and is not intended to represent the only form in which the present invention may be constructed or utilized. The description sets forth the functions and the sequence of steps for developing and operating the invention in connection with the illustrated embodiment. It is to be understood, however, that the same or equivalent functions and sequences may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention. It is further understood that relational terms such as first and second, and the like, are used solely to distinguish one entity from another without necessarily requiring or implying any actual such relationship or order between such entities.
The present application concerns a method and apparatus for processing audio signals, which is to say signals representing physical sound. These signals are represented by digital electronic signals. In the discussion which follows, analog waveforms may be shown or discussed to illustrate the concepts; however, it should be understood that typical embodiments of the invention will operate in the context of a time series of digital bytes or words, said bytes or words forming a discrete approximation of an analog signal or (ultimately) a physical sound. The discrete, digital signal corresponds to a digital representation of a periodically sampled audio waveform. As is known in the art, for uniform sampling, the waveform must be sampled at a rate at least sufficient to satisfy the Nyquist sampling theorem for the frequencies of interest. For example, in a typical embodiment a uniform sampling rate of approximately 44.1 thousand samples/second may be used. Higher sampling rates such as 96 kHz may alternatively be used. The quantization scheme and bit resolution should be chosen to satisfy the requirements of a particular application, according to principles well known in the art. The techniques and apparatus of the invention typically would be applied interdependently in a number of channels. For example, they could be used in the context of a "surround" audio system (having more than two channels).
As used herein, a “digital audio signal” or “audio signal” does not describe a mere mathematical abstraction, but instead denotes information embodied in or carried by a physical medium capable of detection by a machine or apparatus. This term includes recorded or transmitted signals, and should be understood to include conveyance by any form of encoding, including pulse code modulation (PCM), but not limited to PCM. Outputs or inputs, or indeed intermediate audio signals may be encoded or compressed by any of various known methods, including MPEG, ATRAC, AC3, or the proprietary methods of DTS, Inc. as described in U.S. Pat. Nos. 5,974,380; 5,978,762; and 6,487,535. Some modification of the calculations may be required to accommodate that particular compression or encoding method, as will be apparent to those with skill in the art.
The present invention may be implemented in a consumer electronics device, such as a Digital Video Disc (DVD) or Blu-ray Disc (BD) player, television (TV) tuner, Compact Disc (CD) player, handheld player, Internet audio/video device, a gaming console, a mobile phone, or the like. A consumer electronic device includes a Central Processing Unit (CPU) or Digital Signal Processor (DSP), which may represent one or more conventional types of such processors, such as an IBM PowerPC, Intel Pentium (x86) processors, and so forth. A Random Access Memory (RAM) temporarily stores results of the data processing operations performed by the CPU or DSP, and is interconnected thereto typically via a dedicated memory channel. The consumer electronic device may also include permanent storage devices such as a hard drive, which are also in communication with the CPU or DSP over an I/O bus. Other types of storage devices, such as tape drives and optical disk drives, may also be connected. A graphics card is also connected to the CPU via a video bus, and transmits signals representative of display data to the display monitor. External peripheral data input devices, such as a keyboard or a mouse, may be connected to the audio reproduction system over a USB port. A USB controller translates data and instructions to and from the CPU for external peripherals connected to the USB port. Additional devices such as printers, microphones, speakers, and the like may be connected to the consumer electronic device.
The consumer electronic device may utilize an operating system having a graphical user interface (GUI), such as WINDOWS from Microsoft Corporation of Redmond, Wash., MAC OS from Apple, Inc. of Cupertino, Calif., various versions of mobile GUIs designed for mobile operating systems such as Android, and so forth. The consumer electronic device may execute one or more computer programs. Generally, the operating system and computer programs are tangibly embodied in a computer-readable medium, e.g. one or more of the fixed and/or removable data storage devices including the hard drive. Both the operating system and the computer programs may be loaded from the aforementioned data storage devices into the RAM for execution by the CPU. The computer programs may comprise instructions which, when read and executed by the CPU, cause the same to perform the steps or features of the present invention.
The present invention may have many different configurations and architectures. Any such configuration or architecture may be readily substituted without departing from the scope of the present invention. A person having ordinary skill in the art will recognize the above described sequences are the most commonly utilized in computer-readable mediums, but there are other existing sequences that may be substituted without departing from the scope of the present invention.
Elements of one embodiment of the present invention may be implemented by hardware, firmware, software or any combination thereof. When implemented as hardware, the audio codec may be employed on one audio signal processor or distributed amongst various processing components. When implemented in software, the elements of an embodiment of the present invention may be the code segments to perform various tasks. The software may include the actual code to carry out the operations described in one embodiment of the invention, or code that may emulate or simulate the operations. The program or code segments can be stored in a processor or machine accessible medium or transmitted by a computer data signal embodied in a carrier wave, or a signal modulated by a carrier, over a transmission medium. The “processor readable or accessible medium” or “machine readable or accessible medium” may include any medium configured to store, transmit, or transfer information.
Examples of the processor readable medium may include an electronic circuit, a semiconductor memory device, a read only memory (ROM), a flash memory, an erasable ROM (EROM), a floppy diskette, a compact disk (CD) ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc. The computer data signal includes any signal that may propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet, Intranet, etc. The machine accessible medium may be embodied in an article of manufacture. The machine accessible medium may include data that, when accessed by a machine, may cause the machine to perform the operation described in the following. The term “data” here refers to any type of information that may be encoded for machine-readable purposes. Therefore, it may include program, code, data, file, etc.
All or part of an embodiment of the invention may be implemented by software. The software may have several modules coupled to one another. A software module may be coupled to another module to receive variables, parameters, arguments, pointers, etc. and/or to generate or pass results, updated variables, pointers, etc. A software module may also be a software driver or interface to interact with the operating system running on the platform. A software module may also be a hardware driver to configure, set up, initialize, send and receive data to and from a hardware device.
One embodiment of the invention may be described as a process which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a block diagram may describe the operations as a sequential process, many of the operations may be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed. A process may correspond to a method, a program, a procedure, etc.
Overview
Embodiments of the present invention provide a method and an apparatus for calibrating multichannel surround sound systems and listener position estimation with minimal user interaction. The apparatus includes a microphone array integrated with an anchoring component of the surround sound system, which is placed at a predictable position. For example, the anchoring component can be a soundbar, a front speaker, or an A/V receiver centrally positioned directly above or below a video screen or TV. The microphone array is positioned inside or on top of the enclosure of the anchoring component such that it faces the other satellite loudspeakers of the surround sound system. The distance and angle of each satellite loudspeaker relative to the microphone array can be estimated by analyzing the inter-microphone gains and delays obtained from test signals. The estimated satellite loudspeaker positions can then be used for spatial calibration of the surround sound system to improve the listening experience even if the loudspeakers are not arranged in a standard surround sound layout.
Furthermore, the microphone array may help locate a listener by 'listening' to his or her voice or other sound cues and analyzing the inter-microphone gains and delays. The listener position can be used to adapt the sweet spot for the surround sound system or for other spatial audio enhancements (e.g. stereo widening). Another application of the integrated microphone array is to measure background noise for adaptive noise compensation. Based on the analysis of the environmental noise, the system volume can be automatically turned up or down to compensate for background noise. In another example, the microphone array may be used to measure the "liveness" or diffuseness of the playback environment. The diffuseness measurement can help in choosing proper post-processing for sound signals in order to maximize a sense of envelopment during playback. In addition to audio applications, the integrated microphone array can also be used as a voice input device for various other applications, such as VOIP and voice-controlled user interfaces.
The advent of DVD, Blu-ray and streaming content has led to the availability of multichannel soundtracks as standard. However, most modern surround sound formats specify ideal loudspeaker placement to properly reproduce such content. Typical consumers who own surround sound systems often cannot set up their loudspeakers in compliance with such specifications for practical reasons, such as room layout or furniture placement. This often results in a mismatch between the content producer's intent and the consumer's spatial audio experience. For example, best practice is to place the loudspeakers along a recommended arrangement circle 130 and for the listener to sit at a sweet spot 121 at the center of the circle.
One solution to this problem, generally known as spatial calibration, typically requires a user to place a microphone array at the default listening position (or sweet spot). By approximating the location of each loudspeaker, the system can spatially reformat a multichannel soundtrack to the actual speaker layout. However, this calibration process can be intimidating or inconvenient for a typical consumer. Another approach to spatial calibration is to install a microphone at each loudspeaker, which can be very expensive. Moreover, when a listener moves away from the sweet spot, existing methods have no way to detect this change, and the listener has to go through the entire calibration process manually by placing the microphone at the new listening position. In contrast, using the integrated microphone array 114 in the soundbar 110, the calibration engine 116 can perform spatial calibration for the loudspeakers as well as estimate the listener's position with minimal user intervention. Since the listener position is estimated automatically, the listening experience can be improved dynamically even when the listener changes position often. The listener can simply give a voice command and recalibration will be performed by the system.
Computer Architecture
Computer 200 is an example of a machine that can serve as the calibration engine 116 in the example room environment 100 for calibrating multichannel surround sound systems, including listener position estimation.
Processor 210 includes one or more central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), radio-frequency integrated circuits (RFICs), or any combination of these. Storage unit 230 comprises a non-transitory computer-readable storage medium 232, including a solid-state memory device, a hard drive, an optical disk, or a magnetic tape. The instructions 235 may also reside, completely or at least partially, within memory 220 or within processor 210's cache memory during execution thereof by computer 200, memory 220 and processor 210 also constituting computer-readable storage media. Instructions 235 may be transmitted or received over network 140 via network interface 260.
Input devices 250 include a keyboard, mouse, track ball, or other types of alphanumeric and pointing devices that can be used to input data into computer 200. The graphics adapter 212 displays images and other information on one or more display devices, such as monitors and projectors (not shown). The network adapter 260 couples the computer 200 to a network, for example, network 140. Some embodiments of the computer 200 have different and/or additional components than those described above.
Calibration Engine
The inclusion of the microphone array 114, placed around the midpoint of the soundbar 110, is all that is necessary for the calibration engine 116 to estimate each surround loudspeaker's position relative to the soundbar. Since the soundbar is usually predictably placed directly above or below the video screen (or TV), the geometry of the measured distance and incident angle can be translated to an absolute position relative to any point in front of that reference soundbar location using simple trigonometric principles.
Generally, a multi-element microphone array with two or more microphones integrated in an anchoring speaker or receiver (e.g., soundbar 110) is capable of measuring incident wave fronts from many directions, especially in the front plane. A two-element (stereo) microphone array is capable of determining the two-dimensional positions of left and right satellite loudspeakers within a 180-degree 'field of view' without ambiguity. The position of a loudspeaker thus determined includes a distance and an angle between the loudspeaker and the integrated microphone array. For localization of a listener in front of the array, a microphone array with at least three elements can be used to determine the distance and angle between the listener and the microphone array. In order to determine spatial information in three dimensions, one more microphone has to be added to the microphone array for estimating both the loudspeaker and listener positions, due to the extra height axis.
In one embodiment, the integrated microphone array may be mounted inside the enclosure of the anchoring component, such as a soundbar, a front speaker or an A/V receiver. Alternatively or in addition, the microphone array may be mounted in other fixed relationships to the anchoring component, such as at the top or bottom, on the left or right side, to the front or back of the enclosure.
In other embodiments, the microphone array integrated in an anchoring component (e.g., soundbar, front channel speakers, or the A/V receiver) of the surround sound system may include different numbers of microphones, and may have configurations other than linear or triangular arrays.
The calibration engine 116 controls the process of loudspeaker and listener position estimations and spatial calibration of the multichannel surround sound systems.
The calibration request receiver 410 receives requests from users or listeners of the surround sound system to perform position estimation and spatial calibration. The calibration requests may come from button-press events on a remote control, menu item selections on a video or TV screen, or voice commands picked up by the microphone array 114, among other means. After receiving a calibration request 405, the calibration request receiver 410 may determine whether to estimate the positions of the loudspeakers, the position of the listener, or both, before passing the request to the position estimator 430. The calibration request receiver 410 may also update the calibration log 420 with information such as the date and time of the received request 405 and the tasks requested.
The position estimator 430 estimates the distance and angle of a loudspeaker relative to the microphone array based on test signals 432 played by the loudspeaker and measurements 434 received at the microphone array.
In one embodiment, the distance between a loudspeaker and a microphone is estimated by playing a test signal and measuring the time of flight (TOF) between the emitting loudspeaker and the receiving microphone. The time delay of the direct component of a measured impulse response can be used for this purpose. The direct component represents the sound signals that travel directly from the emitting loudspeaker to the receiving microphone without any reflections. The impulse response between the loudspeaker and a microphone array element can be obtained by playing a test signal through the loudspeaker under analysis. Test signal choices include a maximum length sequence (MLS), a chirp signal, also known as the logarithmic sine sweep (LSS) signal, or other test tones. The room impulse response can be obtained, for example, by calculating a circular cross-correlation between the captured signal and the MLS input.
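To make the time-of-flight approach concrete, the following is a minimal Python/NumPy sketch. The MLS length, the hypothetical `play_and_record` callback, and the neglect of playback/capture latency are assumptions of this sketch, not details prescribed by the description above.

```python
import numpy as np
from scipy.signal import max_len_seq

FS = 48000        # sampling rate (Hz), matching the simulations described below
C = 342.0         # speed of sound in air (m/s), as used in this description

def measure_distance(play_and_record, nbits=16):
    """Estimate the loudspeaker-to-microphone distance from the direct path of the
    impulse response, obtained by circular cross-correlation with an MLS.
    `play_and_record` is a hypothetical callback that plays the test signal through
    the loudspeaker under analysis and returns one period of the microphone capture."""
    mls, _ = max_len_seq(nbits)            # maximum length sequence, values in {0, 1}
    test = 2.0 * mls - 1.0                 # map to {-1, +1}
    captured = play_and_record(test)       # assumed to be the same length as the MLS period

    # Circular cross-correlation via the frequency domain:
    # r[n] = IFFT( FFT(captured) * conj(FFT(test)) )
    spec = np.fft.rfft(captured) * np.conj(np.fft.rfft(test))
    impulse_response = np.fft.irfft(spec, n=len(test))

    # The direct component is the earliest strong peak; taking the global maximum
    # is adequate when the direct path dominates the reflections.
    direct_idx = int(np.argmax(np.abs(impulse_response)))
    time_of_flight = direct_idx / FS       # seconds (ignores device latency)
    return time_of_flight * C              # meters
```

In practice a loopback reference channel would be used to remove the playback/capture latency before converting the peak index to a distance.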
The MLS test signals captured by a stereo microphone array including two microphone elements can be used to estimate the angle θ of the loudspeaker 108. In one embodiment, the angle is calculated based on one of the most commonly used methods for sound source localization, time difference of arrival (TDOA) estimation. A common solution to the TDOA problem, the generalized cross-correlation (GCC), is represented as:

τ = argmax over τ′ of ∫ W(ω) X1(ω) X2*(ω) e^(jωτ′) dω
where τ is an estimate of the TDOA between the two microphone elements, X1(ω) and X2(ω) are the Fourier transforms of the signals captured by the two microphone elements, and W(ω) is a weighting function.
In GCC-based TDOA estimation, various weighting functions can be adopted, including the maximum likelihood (ML) weighting function and the phase-transform-based weighting function (GCC-PHAT). The GCC-PHAT weighting function is defined as

W_PHAT(ω) = 1 / |X1(ω) X2*(ω)|
The GCC-PHAT method utilizes the phase information exclusively and is found to be more robust in reverberant environments. An alternative weighting function for GCC is the smoothed coherence transform (GCC-SCOT), which can be expressed as

W_SCOT(ω) = 1 / sqrt( P_X1X1(ω) · P_X2X2(ω) )

where P_X1X1(ω) and P_X2X2(ω) are the power spectral densities of the signals captured by the two microphone elements.
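A minimal sketch (Python/NumPy, assumed here) of single-snapshot GCC-based TDOA estimation with the PHAT and SCOT weightings described above; the zero-padding and the periodogram-style spectral density estimates used for SCOT are choices of this sketch.

```python
import numpy as np

def gcc_tdoa(x1, x2, fs, weighting="phat", eps=1e-12):
    """Estimate the TDOA (in seconds) between two microphone captures using the
    generalized cross-correlation with a selectable weighting function."""
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)

    if weighting == "phat":                    # GCC-PHAT: keep phase information only
        w = 1.0 / (np.abs(cross) + eps)
    elif weighting == "scot":                  # GCC-SCOT: 1 / sqrt(Px1x1 * Px2x2)
        w = 1.0 / (np.sqrt((np.abs(X1) ** 2) * (np.abs(X2) ** 2)) + eps)
    else:                                      # plain cross-correlation
        w = 1.0

    r = np.fft.irfft(w * cross, n)             # generalized cross-correlation function
    r = np.roll(r, n // 2)                     # move negative lags to the front
    lag = int(np.argmax(np.abs(r))) - n // 2   # peak location in samples
    return lag / fs
```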
Assuming that the distance between the two microphones is d_m (in meters), the angle θ of the loudspeaker (in radians) can be estimated as

θ = arccos( C·τ / d_m )

where C is the speed of sound in air, which is approximately 342 m/s, and τ is the estimated time delay. Based on the estimated distance d and angle θ, the position estimator 430 can compute the coordinates of the loudspeaker using trigonometry.
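Continuing the sketch, the estimated time delay and direct-path distance can be converted into two-dimensional loudspeaker coordinates under the far-field arccos relation above; the coordinate convention (angle measured from the microphone-array axis) is an assumption of this sketch.

```python
import numpy as np

SPEED_OF_SOUND = 342.0   # m/s, as used in this description

def loudspeaker_coordinates(distance_m, tdoa_s, mic_spacing_m):
    """Convert a measured distance and inter-microphone TDOA into (x, y)
    coordinates relative to the center of the microphone array."""
    # theta = arccos(C * tau / d_m); clip guards against |C * tau| slightly > d_m
    cos_theta = np.clip(SPEED_OF_SOUND * tdoa_s / mic_spacing_m, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return distance_m * np.cos(theta), distance_m * np.sin(theta)

# Example: a speaker 2.0 m away with a TDOA of +0.1 ms across a 7.5 cm array
# x, y = loudspeaker_coordinates(2.0, 1e-4, 0.075)
```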
In testing the performance of the loudspeaker position estimation, simulations have been conducted in which a test input with the source direction changing from 70 to 110 degrees, in one-degree increments, was generated. The sampling rate of the signals was set to 48 kHz. The distance between the two microphone elements was set to 7.5 cm. To avoid spatial aliasing, the maximum frequency processed was limited to less than 2.3 kHz.
In various embodiments, to increase the robustness of the estimation methods, a histogram of all the possible TDOA estimates can be used to select the most likely TDOA in a specified time interval. The average of the interpolated output for the chosen TDOA candidate can then be used to further increase the accuracy of the TDOA estimate. Experiments conducted in a typical office environment with a GCC-SCOT weighting function prove that the algorithm can reliably estimate a loudspeaker's distance and angle. The average error in loudspeaker distance estimation is less than three centimeters.
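A sketch of the histogram-based selection just described, reusing the `gcc_tdoa` helper from the earlier sketch; the framing and the whole-sample bin resolution are assumptions.

```python
import numpy as np

def robust_tdoa(frames_x1, frames_x2, fs):
    """Estimate one TDOA per signal frame, histogram the candidates, and return
    the average of the candidates that fall in the most populated bin."""
    candidates = np.array([gcc_tdoa(a, b, fs) for a, b in zip(frames_x1, frames_x2)])
    bins = np.round(candidates * fs).astype(int)          # quantize to whole-sample lags
    values, counts = np.unique(bins, return_counts=True)
    most_likely = values[np.argmax(counts)]               # most frequent lag bin
    return float(np.mean(candidates[bins == most_likely]))
```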
Most spatial calibration systems require the use of a multi-element microphone placed at an assumed listening position. In practice, a listener often listens to the surround sound system away from the measured listening position. As a result, the listening experience degrades significantly for the listener as the surround system may have reformatted the original content assuming the originally measured position. To correct this, typical calibration systems require the listener to go through another calibration measurement at the new listening position. This is not necessary for the calibration engine 116 since the position estimator 430 can detect a listener's actual listening position using the integrated microphone array 114 without going through the recalibration.
In one embodiment, to ensure that the listener's position is detected only when intended, a key phrase detection can be configured to trigger the listener position estimation process. For example, a listener can say a key phrase such as "DTS Speaker" to activate the process. Other sound cues made by the listener can also be used as input signals to the position estimator 430 for listener position estimation.
Existing methods for microphone-array-based sound source localization include TDOA-based estimation and steered response power (SRP) based estimation. While these methods can be used to localize a sound source in three dimensions, for clarity the following description assumes that the microphone array and the sound source (i.e., the listener) are at the same height. That is, only two-dimensional sound source localization is described; the three-dimensional listener position can be estimated using similar techniques.
In one embodiment, the position estimator 430 adopts the TDOA-based sound source localization for estimating the listener position.
Given the known microphone positions, the listener position S satisfies, for each microphone pair (i, j),

||S − Mi|| − ||S − Mj|| = dij

where dij is the distance difference between microphones Mi and Mj relative to the sound source (i.e., the listener 120), and dij = C·τij, where τij is the TDOA between microphones Mi and Mj and C is the speed of sound in air. Solving this set of equations for S yields the listener position.
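One way (among several) to solve the distance-difference equations above is a brute-force search over candidate listener positions; the grid extent and resolution below are assumptions of this sketch.

```python
import numpy as np

def localize_listener_tdoa(mic_xy, pair_tdoas, c=342.0, extent=4.0, step=0.02):
    """2-D TDOA-based localization: return the grid point whose predicted distance
    differences best match the measured C * tau_ij values.
    mic_xy: (M, 2) microphone coordinates (m); pair_tdoas: {(i, j): tau_ij}."""
    best_pos, best_err = None, np.inf
    for x in np.arange(-extent, extent + step, step):
        for y in np.arange(0.0, extent + step, step):      # search only in front of the array
            dist = np.hypot(mic_xy[:, 0] - x, mic_xy[:, 1] - y)
            err = sum((dist[i] - dist[j] - c * tau) ** 2
                      for (i, j), tau in pair_tdoas.items())
            if err < best_err:
                best_pos, best_err = (x, y), err
    return best_pos
```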
Alternatively, a steered response power (SRP) based estimation algorithm can be implemented by the position estimator 430 to localize the listener's position. In SRP, the output power of a filter-and-sum beamformer, such as a simple delay-and-sum beamformer, is calculated for all possible sound source locations. The position that yields the maximum power is selected as the sound source position. For example, the SRP with phase transform (SRP-PHAT) can be computed as the sum of the GCC over all possible pairs of microphones:

P(S) = Σ over microphone pairs (k, l) of ∫ Wlk(ω) Xl(ω) Xk*(ω) e^(jω(τl − τk)) dω

where τl and τk are the delays from the candidate source location S to microphones Ml and Mk, respectively, and Wlk is a filter weight defined, for the phase transform, as

Wlk(ω) = 1 / |Xl(ω) Xk*(ω)|
The SRP-PHAT method can be applied to both two-dimensional and three-dimensional sound source localization.
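A corresponding sketch of an SRP-PHAT search over two-dimensional candidate positions, assuming synchronized single-frame captures from the array; the candidate grid is supplied by the caller.

```python
import numpy as np

def srp_phat_localize(frames, mic_xy, fs, candidates, c=342.0, eps=1e-12):
    """Return the candidate (x, y) position with the highest steered response power,
    summing the PHAT-weighted GCC over all microphone pairs.
    frames: (M, N) time-domain captures; candidates: iterable of (x, y) points."""
    num_mics, frame_len = frames.shape
    X = np.fft.rfft(frames, axis=1)
    omega = 2.0 * np.pi * np.fft.rfftfreq(frame_len, d=1.0 / fs)
    best_pos, best_power = None, -np.inf
    for x, y in candidates:
        delays = np.hypot(mic_xy[:, 0] - x, mic_xy[:, 1] - y) / c
        power = 0.0
        for k in range(num_mics):
            for l in range(k + 1, num_mics):
                cross = X[l] * np.conj(X[k])
                cross /= np.abs(cross) + eps                     # PHAT weighting
                steering = np.exp(1j * omega * (delays[l] - delays[k]))
                power += np.real(np.sum(cross * steering))       # pair's GCC at this position
        if power > best_power:
            best_pos, best_power = (x, y), power
    return best_pos
```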
Tests have been conducted in a typical office environment, similar to the room environment 100, to evaluate the performance of the TDOA-based method and the SRP-PHAT method.
In one embodiment, the spatial calibrator 440 adjusts the delay and gain of the multichannel audio signals sent to each loudspeaker based on the derived distances from each loudspeaker to the listener. Assume that the distance from the i-th loudspeaker to the listener is d_i, and the maximum distance among the d_i is d_max. The spatial calibrator 440 applies a compensating delay (in samples) to all loudspeakers closer to the listener using the following equation:

delay_i = Rs · (d_max − d_i) / C

where Rs is the sampling rate of the audio signals and C is the speed of sound in air. In addition, since sound intensity at the listening position is in general inversely proportional to the squared distance between the loudspeaker and the listener, the sound level (in dB) can be adjusted for the i-th loudspeaker based on the distance differences by:

G_i = 20 · log10( d_i / d_max )
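A sketch implementing the per-channel delay and level adjustments just described; rounding the delays to whole samples is a choice of this sketch, and the level trim follows the inverse-square relation stated above.

```python
import numpy as np

def channel_compensation(distances_m, fs=48000, c=342.0):
    """Per-loudspeaker compensating delay (samples) and level trim (dB) so that sound
    from every loudspeaker reaches the listener aligned in time and comparable in level."""
    d = np.asarray(distances_m, dtype=float)
    d_max = d.max()
    delays = np.round(fs * (d_max - d) / c).astype(int)   # delay the closer loudspeakers
    gains_db = 20.0 * np.log10(d / d_max)                 # attenuate the closer loudspeakers
    return delays, gains_db

# Example: surrounds 1.8 m and 2.1 m away, soundbar 2.5 m away from the listener
# delays, gains = channel_compensation([2.5, 1.8, 2.1])
```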
In addition to the above-described adjustments to delay and gain, the spatial calibrator 440 can also reformat the spatial information onto the actual layout, for instance when the right surround speaker 108 is not placed at its recommended angle.
There exists a variety of techniques for panning a sound source, such as vector base amplitude panning (VBAP), distance-based amplitude panning (DBAP), and Ambisonics. In VBAP, all the loudspeakers are assumed to be positioned approximately the same distance away from the listener. A sound source is rendered using either two loudspeakers for two-dimensional panning, or three loudspeakers for three-dimensional panning. On the other hand, DBAP has no restrictions on the number of loudspeakers and renders the sound source based on the distances between the loudspeakers and the sound source. The gain for each loudspeaker is calculated independently of the listener's position. If the listening position is known, the performance of DBAP can be improved by adjusting the delays so that the sound from each loudspeaker arrives at the listener at the same time.
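As an illustration of the two-loudspeaker, two-dimensional VBAP case, the following sketch solves for the pair of gains that place a phantom source at a desired angle in the horizontal plane; the power normalization is the usual VBAP convention and the angle convention is an assumption of this sketch.

```python
import numpy as np

def vbap_2d_gains(source_deg, spk1_deg, spk2_deg):
    """Two-loudspeaker 2-D VBAP: solve g1*l1 + g2*l2 = p for the panning gains,
    then normalize so that g1**2 + g2**2 = 1 (constant perceived power)."""
    a_src, a1, a2 = np.radians([source_deg, spk1_deg, spk2_deg])
    p = np.array([np.cos(a_src), np.sin(a_src)])            # unit vector toward the phantom source
    L = np.array([[np.cos(a1), np.cos(a2)],
                  [np.sin(a1), np.sin(a2)]])                 # columns are loudspeaker unit vectors
    g = np.linalg.solve(L, p)                                # unnormalized gain pair
    return g / np.linalg.norm(g)

# Example: a phantom source at 110 degrees rendered by speakers at 90 and 135 degrees
# g1, g2 = vbap_2d_gains(110, 90, 135)
```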
In one embodiment, for channel-based audio content, the spatial calibrator 440 applies spatial correction to loudspeakers that are not placed at the correct angles by using sound panning techniques to create virtual speakers (or phantom sources) at the recommended positions, with the correct angles, based on the actual speaker layout.
In one embodiment, the spatial calibrator 440 provides spatial correction for rendering object-based audio content based on the actual positions of the loudspeakers and the listener. Audio objects are created by associating sound sources with position information, such as location, velocity and the like. Position and trajectory information of audio objects can be defined using two- or three-dimensional coordinates. Using the actual positions of the loudspeakers and the listener, the spatial calibrator 440 can determine which loudspeaker or loudspeakers are used for playing back an object's audio.
When the listener 120 moves away from the sweet spot 121, the calibration problem can be treated as if most loudspeakers in the surround sound system have moved away from the recommended positions. Obviously, the listening experience will be significantly degraded without applying any spatial calibration. For instance, when the soundbar 110 is active, the listener 120 at his or her current position may think the signal only comes from the left element of the speaker array 112 due to distance differences. The delays and gains from all the loudspeakers need to be adjusted. In one embodiment, when the listener 120 changes his or her position, the spatial calibrator 440 uses the new listener position as the new sweet spot, and applies the spatial correction based on each loudspeaker's angular position. In addition to the spatial correction, the spatial calibrator 440 also readjusts the delays and gains for all the loudspeakers.
Tests have been conducted in a listening room similar to the room environment 100 described above.
After the spatial calibrator 440 performs the delay and gain adjustments and spatial correction, the positions and calibration information can be cached and/or recorded in the calibration log 420 for further reference. For example, if a new calibration request 405 is received and the position estimator 430 determines that the positions of the loudspeakers have not changed or the changes are below a predetermined threshold, the spatial calibrator 440 may simply update the calibration log 420 and skip the recalibration process in response to the insignificant position changes. If it is determined that any newly estimated positions match a previous calibration record, the spatial calibrator 440 can conveniently retrieve the previous record from the calibration log 420 and apply the same spatial calibration. In case a recalibration is indeed required, the spatial calibrator 440 may consult the calibration log 420 to determine whether to perform a partial or incremental adjustment or a full recalibration, depending on the calibration history and/or the significance of the changes.
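A sketch of the caching logic around the calibration log 420; the threshold value, record layout, and the `recalibrate` callback are assumptions of this sketch rather than details of the described system.

```python
import math
import time

POSITION_THRESHOLD_M = 0.1          # assumed threshold for an "insignificant" change

def max_displacement(positions_a, positions_b):
    """Largest movement (m) between corresponding (x, y) positions."""
    return max(math.dist(p, q) for p, q in zip(positions_a, positions_b))

def handle_calibration_request(new_positions, calibration_log, recalibrate):
    """Skip, reuse, or rerun calibration depending on how far the newly estimated
    positions are from previously logged ones. `recalibrate` is a callback that
    performs the full spatial calibration and returns its result."""
    if calibration_log:
        latest = calibration_log[-1]
        if max_displacement(new_positions, latest["positions"]) < POSITION_THRESHOLD_M:
            latest["last_checked"] = time.time()            # insignificant change: log only
            return latest["calibration"]
        for record in reversed(calibration_log):            # reuse a matching earlier record
            if max_displacement(new_positions, record["positions"]) < POSITION_THRESHOLD_M:
                return record["calibration"]
    result = recalibrate(new_positions)                     # full (or incremental) recalibration
    calibration_log.append({"positions": list(new_positions),
                            "calibration": result,
                            "last_checked": time.time()})
    return result
```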
Next, the calibration system determines 704 whether to estimate the positions of the loudspeakers in the surround sound system. In one embodiment, the calibration system may have a default configuration for this estimation requirement. For example, estimation is required for initial system setup and not required for recalibration. Alternatively or in addition, the received calibration request may explicitly specify whether or not to perform position estimation, overriding the default configuration. The calibration request may optionally allow the listener to identify which loudspeaker or loudspeakers have been repositioned and thus require position estimation. If so determined, the calibration system continues to perform position estimation for at least one loudspeaker.
For each of the one or more loudspeakers whose positions are to be estimated, the calibration system plays 706 a test signal, and measures 708 the test signal through the integrated microphone array. Based on the measurement, the calibration system estimates 710 the distance and angle of the loudspeaker relative to the microphone array. As described above, the test signal can be a chirp or an MLS signal, and the distance and angle can be estimated using a variety of existing algorithms, such as TDOA and GCC.
After each of the requested loudspeaker positions has been computed, or if no loudspeaker position estimation is required, the calibration system determines 710 whether to estimate the listener's position. Similarly, the listener position estimation may be required for initial setup and/or triggered by changes in the listening position. If the calibration system determines that listener position estimation is to be performed, it measures 712 the sound received by the microphone array from the listener. The sound for position estimation can be the same voice command that invokes the listener position estimation or any other sound cue from the listener. The calibration system then estimates 714 the distance and angle of the listener position relative to the microphone array. Example estimation methods include TDOA and SRP.
After the listener's position has been computed, or if no estimation of the listener position is required, the calibration system performs 716 spatial calibration based on the updated or previously estimated position information of the loudspeakers and the listener. The spatial calibrations include, but are not limited to, adjusting the delay and gain of the signal for each loudspeaker, spatial correction, and accurate sound panning.
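Tying the flow together, a high-level sketch of steps 704 through 716 as described above; the `request` and `system` objects and their method names are hypothetical placeholders for the routines sketched earlier.

```python
def run_calibration(request, system):
    """High-level sketch of the calibration flow (steps 704-716 described above)."""
    speaker_positions = dict(system.known_speaker_positions)

    if request.estimate_speakers:                                   # step 704
        for speaker in request.speakers_to_estimate:
            capture = system.play_and_measure_test_signal(speaker)  # steps 706, 708
            speaker_positions[speaker] = system.estimate_position(capture)  # step 710

    listener_position = system.known_listener_position
    if request.estimate_listener:                                   # listener branch of step 710
        capture = system.measure_listener_sound()                   # step 712
        listener_position = system.estimate_position(capture)       # step 714

    # Step 716: delay/gain adjustment, spatial correction, and sound panning
    return system.spatial_calibrate(speaker_positions, listener_position)
```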
In conclusion, embodiments of the present invention provide a system and a method for spatial calibration of surround sound systems. The calibration system utilizes a microphone array integrated into a component of the surround sound system, such as a center speaker or a soundbar. The integrated microphone array eliminates the need for a listener to manually position a microphone at the assumed listening position. In addition, the calibration system is able to detect the listener's position through his or her voice input. Test results show that the calibration system is capable of accurately detecting the positions of the loudspeakers and the listener. Based on the estimated loudspeaker positions, the system can render a sound source position more accurately. For channel-based input, the calibration system can also perform spatial correction to correct spatial errors due to imperfect loudspeaker setup.
The particulars shown herein are by way of example and for purposes of illustrative discussion of the embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the present invention. In this regard, no attempt is made to show particulars of the present invention in more detail than is necessary for the fundamental understanding of the present invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the present invention may be embodied in practice.
This application claims the benefit of U.S. Provisional Application No. 61/846,478, filed on Jul. 15, 2013, which is incorporated by reference in its entirety.