The invention relates to systems and methods for monitoring audio playback systems, e.g., to monitor status of loudspeakers of an audio playback system and/or to monitor reactions of an audience to an audio program played back by an audio playback system. Typical embodiments are systems and methods for monitoring cinema (movie theater) environments (e.g., to monitor status of loudspeakers employed to render an audio program in such an environment and/or to monitor reactions of an audience to an audiovisual program played back in such an environment).
Typically, during an initial alignment process (in which a set of speakers of an audio playback system is initially calibrated), pink noise (or another stimulus such as a sweep or pseudo-random noise sequence) is played through each speaker of the system and captured by a microphone. The pink noise (or other stimulus), as emitted from each speaker and captured by a “signature” microphone placed on a sidewall/ceiling/in-room, is typically stored for use during subsequent maintenance checks (quality checks). Such a subsequent maintenance check is conventionally performed in the playback system environment (which may be a movie theater) by exhibitor staff when no audience is present, using pink noise rendered through a predetermined sequence of the speakers (whose status is to be monitored) during the check. During the maintenance check, for each speaker sequenced in the playback environment, the microphone captures the pink noise emitted by the loudspeaker, and the maintenance system identifies any difference between the initially measured pink noise (emitted from the speaker and captured during the alignment process) and the pink noise measured during the maintenance check. Any such difference can indicate a change in the set of speakers that has occurred since the initial alignment, such as damage to an individual driver (e.g., woofer, mid-range, or tweeter) in one of the speakers, or a change in a speaker output spectrum (relative to an output spectrum determined in the initial alignment), or a change in polarity of the output of one of the speakers, relative to a polarity determined in the initial alignment (e.g., due to replacement of a speaker). The system can also use loudspeaker-room responses deconvolved from pink-noise measurements for analysis. Further refinements include gating or windowing the time response to isolate the direct sound of the loudspeaker.
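The deconvolution and gating mentioned above can be illustrated with a minimal sketch. It assumes circular convolution, a short toy stimulus in place of pink noise, and a regularized frequency-domain division; the function names (`dft`, `idft`, `deconvolve`, `gate`) and the regularization constant `eps` are illustrative assumptions, not part of the conventional system described.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def deconvolve(captured, stimulus, eps=1e-12):
    """Estimate the loudspeaker-room impulse response h, assuming
    captured = stimulus (circularly) convolved with h.
    eps is an assumed regularizer that avoids division by zero."""
    Y = dft(captured)
    X = dft(stimulus)
    H = [y * x.conjugate() / (abs(x) ** 2 + eps) for y, x in zip(Y, X)]
    return [v.real for v in idft(H)]

def gate(response, n_keep):
    """Keep the first n_keep samples (direct sound) and zero the later reflections."""
    return [v if i < n_keep else 0.0 for i, v in enumerate(response)]

# Toy data (assumed): an 8-sample stimulus and the signal captured through a
# response with a direct path (1.0) and one reflection (0.25) two samples later.
stimulus = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
captured = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]

estimated = deconvolve(captured, stimulus)   # approximately [1, 0, 0.25, 0, ...]
direct_only = gate(estimated, 2)             # reflection at sample 2 removed
```

A practical system would use an FFT and a much longer stimulus; the naive transform is used here only to keep the sketch self-contained.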
However, there are several limitations and disadvantages of such a conventionally implemented maintenance check, including the following: (i) it is time-consuming to run pink noise individually and sequentially through a theater's loudspeakers, and to de-convolve each corresponding loudspeaker-room impulse response from each microphone (typically located on a wall of the theater), especially since a movie theater may have as many as 26 (or more) loudspeakers; and (ii) performing the maintenance check does not aid in promoting the theater's audiovisual system format directly to an audience in the theater.
In some embodiments, the invention is a method for monitoring loudspeakers within an audio playback system (e.g., movie theater) environment. In a typical embodiment in this class, the monitoring method assumes that initial characteristics of the speakers (e.g., a room response for each of the speakers) have been determined at an initial time, and relies on one or more microphones positioned (e.g., on a side wall) within the environment to perform a maintenance check (sometimes referred to herein as a quality check or “QC” or status check) on each of the loudspeakers in the environment to identify whether a change to at least one characteristic of any of the loudspeakers has occurred since the initial time (e.g., since an initial alignment or calibration of the playback system). The status check can be performed periodically (e.g., daily).
In a class of embodiments, trailer-based loudspeaker quality checks (QCs) are performed on the individual loudspeakers of a theater's audio playback system during playback of an audiovisual program (e.g., a movie trailer or other entertaining audiovisual program) to an audience (e.g., before a movie is played to the audience). Since it is contemplated that the audiovisual program is typically a movie trailer, it will often be referred to herein as a “trailer.” In one embodiment, the quality check identifies (for each loudspeaker of the playback system) any difference between a template signal (e.g., a measured initial signal captured by a microphone in response to playback of the trailer's soundtrack by the speaker at an initial time, e.g., during a speaker calibration or alignment process), and a measured signal (sometimes referred to herein as a status signal or “QC” signal) captured by the microphone in response to playback (by the speakers of the playback system) of the trailer's soundtrack during the quality check. In another embodiment, typical loudspeaker-room responses are obtained during the initial calibration step for theater equalization. The trailer signal is then filtered in a processor by the loudspeaker-room responses (which may in turn be filtered with the equalization filter), and summed with another appropriate loudspeaker-room equalized response filtering a corresponding trailer signal. The resulting signal at the output then forms the template signal. The template signal is compared against the captured signal (called the status signal in the following text) when the trailer is rendered in the presence of an audience.
When the trailer includes subject matter which promotes the format of the theater's audiovisual system, a further advantage (to the entity which sells and/or licenses the audiovisual system, as well as to the theater owner) of using such trailer-based loudspeaker QC monitoring is that it incentivizes theater owners to play the trailer to facilitate performance of the quality check while simultaneously providing a significant benefit of promoting (e.g., marketing, and/or increasing audience awareness of) the audiovisual system format.
Typical embodiments of the inventive, trailer-based, loudspeaker quality check method extract individual loudspeaker characteristics from a status signal captured by a microphone during playback of the trailer by all speakers of a playback system during a status check (sometimes referred to herein as a quality check or QC). In typical embodiments, the status signal obtained during the status check is essentially a linear combination of all the room-response convolved loudspeaker output signals (one for each of the loudspeakers which emits sound during playback of the trailer during the status check) at the microphone. Any failure mode detected by the QC by processing of the status signal is typically conveyed to the theater owner and/or used by a decoder of the theater's audio playback system to change a rendering mode in case of loudspeaker failure.
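The linear-combination model described above can be written compactly. The notation is an assumption introduced here for illustration: $x_i(t)$ is the speaker feed for speaker $i$, $h_i(t)$ is the room response from speaker $i$ to the microphone, $N$ is the number of speakers emitting sound, $*$ denotes convolution, and $n(t)$ lumps together anything not produced by the speakers (for example, audience noise), which is one possible reading of "essentially" in the text.

```latex
y(t) \;=\; \sum_{i=1}^{N} \bigl(h_i * x_i\bigr)(t) \;+\; n(t)
```

Under this model, the status check amounts to deciding, for each $i$, whether the term $(h_i * x_i)(t)$ present in $y(t)$ still matches the contribution expected from the initial alignment.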
In some embodiments, the inventive method includes a step of employing a source separation algorithm, a pattern matching algorithm, and/or unique fingerprint extraction from each loudspeaker, to obtain a processed version of the status signal which is indicative of sound emitted from an individual one of the loudspeakers (rather than a linear combination of all the room-response convolved loudspeaker output signals). Typical embodiments, however, implement a cross-correlation/PSD (power spectral density) based approach to monitor status of each individual speaker in the playback environment from a status signal indicative of sound emitted from all the speakers in the environment (without employing a source separation algorithm, a pattern matching algorithm, or unique fingerprint extraction from each speaker).
The inventive method can be performed in home environments as well as in cinema environments, e.g., with the required signal processing of microphone output signals being performed in a home theater device (e.g., an AVR or Blu-ray player that is shipped to the user with the microphone to be employed to perform the method).
Typical embodiments of the invention implement a cross-correlation/power spectral density (PSD) based approach to monitor status of each individual speaker in the playback environment (which is typically a movie theater) from a status signal which is a microphone output signal indicative of sound captured during playback (by all the speakers in the environment) of an audiovisual program. The audiovisual program will be referred to below as a trailer, since it is typically a movie trailer. For example, a class of embodiments of the inventive method includes the steps of:
(a) playing back a trailer whose soundtrack has N channels (which may be speaker channels or object channels), where N is a positive integer (e.g., an integer greater than one), including by emitting sound, determined by the trailer, from a set of N speakers positioned in the playback environment in response to driving each of the speakers with a speaker feed for a different one of the channels of the soundtrack. Typically, the trailer is played back in the presence of an audience in a movie theater;
(b) obtaining audio data indicative of a status signal captured by each microphone of a set of M microphones in the playback environment during emission of the sound in step (a), where M is a positive integer (e.g., M=1 or 2). In typical implementations, the status signal for each microphone is the analog output signal of the microphone during step (a), and the audio data indicative of the status signal are generated by sampling the output signal. Preferably, the audio data are organized into frames having a frame size adequate to obtain sufficient low frequency resolution, and the frame size is preferably sufficient to ensure the presence of content from all channels of the soundtrack in each frame; and
(c) processing the audio data to perform a status check on each speaker of the set of N speakers, including by comparing (e.g., identifying whether a significant difference exists between), for each said speaker and each of at least one microphone in the set of M microphones, the status signal captured by the microphone (said status signal being determined by the audio data obtained in step (b)) and a template signal, wherein the template signal is indicative (e.g., representative) of response of a template microphone to playback by the speaker, in the playback environment at an initial time, of a channel of the soundtrack corresponding to said speaker. Alternatively, the template signal (representing the response at a signature microphone or microphones) can be computed in a processor with a-priori knowledge of the loudspeaker-room responses (equalized or unequalized) from the loudspeaker to the corresponding signature microphone(s). The template microphone is positioned, at the initial time, at at least substantially the same position in the environment as is a corresponding microphone of the set during step (b). Preferably, the template microphone is the corresponding microphone of the set, and is positioned, at the initial time, at the same position in the environment as is said corresponding microphone during step (b). The initial time is a time before performance of step (b), and the template signal for each speaker is typically predetermined in a preliminary operation (e.g., a preliminary speaker alignment process), or is generated before (or during) step (b) from a predetermined room response for the corresponding speaker-microphone pair and the trailer soundtrack.
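The frame-size preference in step (b) can be made concrete with a small sketch. It assumes that "low frequency resolution" is measured by the DFT bin spacing (sample rate divided by frame length) and that power-of-two frame lengths are wanted; the function name `min_frame_size` and the 48 kHz / 3 Hz figures are illustrative assumptions, not values from the disclosure.

```python
def min_frame_size(sample_rate_hz, resolution_hz):
    """Smallest power-of-two frame length whose DFT bin spacing
    (sample_rate_hz / frame_length) does not exceed resolution_hz."""
    if sample_rate_hz <= 0 or resolution_hz <= 0:
        raise ValueError("sample rate and resolution must be positive")
    needed = sample_rate_hz / resolution_hz
    size = 1
    while size < needed:
        size *= 2
    return size

# Assumed example: 48 kHz sampling, bins no wider than 3 Hz.
frame_len = min_frame_size(48000, 3.0)   # 16384 samples, about 0.34 s per frame
```

The second criterion in step (b), that each frame contain content from all channels of the soundtrack, depends on the trailer itself and is not captured by this calculation.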
Step (c) preferably includes an operation of determining a cross-correlation (for each speaker and microphone) of the template signal for said speaker and microphone (or a bandpass filtered version of said template signal) with the status signal for said microphone (or a bandpass filtered version thereof), and identifying a difference (if any significant difference exists) between the template signal and the status signal from a frequency domain representation (e.g., power spectrum) of the cross-correlation. In typical embodiments, step (c) includes an operation (for each speaker and microphone) of applying a bandpass filter to the template signal (for the speaker and microphone) and the status signal (for the microphone), and determining (for each microphone) a cross-correlation of each bandpass filtered template signal for the microphone with the bandpass filtered status signal for the microphone, and identifying a difference (if any significant difference exists) between the template signal and the status signal from a frequency domain representation (e.g., power spectrum) of the cross-correlation.
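The comparison in step (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: bandpass filtering is omitted, the reference is taken to be the power spectrum of the template's autocorrelation, and the names `cross_correlation`, `power_spectrum`, and `spectral_deviation_db` are hypothetical.

```python
import cmath
import math

def cross_correlation(template, status, max_lag):
    """r[lag] = sum_n template[n] * status[n + lag] for lag in -max_lag..max_lag."""
    result = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, t in enumerate(template):
            j = i + lag
            if 0 <= j < len(status):
                acc += t * status[j]
        result.append(acc)
    return result

def power_spectrum(seq):
    """Squared magnitude of a naive DFT of seq."""
    n = len(seq)
    return [abs(sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]

def spectral_deviation_db(template, status, max_lag):
    """Per-bin level, in dB, of the template/status cross-correlation spectrum
    relative to the template autocorrelation spectrum (assumed reference)."""
    ref = power_spectrum(cross_correlation(template, template, max_lag))
    obs = power_spectrum(cross_correlation(template, status, max_lag))
    floor = 1e-12 * max(ref) or 1e-12   # assumed guard against log of zero
    return [10 * math.log10((o + floor) / (r + floor)) for o, r in zip(obs, ref)]

# Toy data (assumed): the status signal is the template at half amplitude,
# imitating a speaker whose output level has dropped.
template = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5, 0.0, -0.25]
status = [0.5 * v for v in template]
deviation_db = spectral_deviation_db(template, status, max_lag=7)
```

Because the cross-correlation is linear in the status signal, halving the status amplitude lowers its power spectrum by about 6 dB in every bin that carries template energy; a deviation confined to some bins would instead point to a change in the speaker's output spectrum.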
This class of embodiments of the method assumes knowledge of the room responses of the loudspeakers (typically obtained during a preliminary operation, e.g., a speaker alignment or calibration operation) and knowledge of the trailer soundtrack. To determine the template signal employed in step (c) for each speaker-microphone pair, the following steps may be performed. The room response (impulse response) of each speaker is determined (e.g., during a preliminary operation) by measuring sound emitted from the speaker with the microphone positioned in the same environment (e.g., room) as the speaker. Then, each channel signal of the trailer soundtrack is convolved with the corresponding impulse response (the impulse response of the speaker which is driven by the speaker feed for the channel) to determine the template signal (for the microphone) for the channel. The template signal (template) for each speaker-microphone pair is a simulated version of the microphone output signal to be expected at the microphone during performance of the monitoring (quality check) method with the speaker emitting sound determined by the corresponding channel of the trailer soundtrack.
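The simulated-template construction just described can be sketched in a few lines. The function names `convolve` and `simulated_templates` and the toy sample values are assumptions for illustration; a practical system would use FFT-based convolution on full-length soundtrack channels.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def simulated_templates(channels, impulse_responses):
    """One template per speaker: channel i of the soundtrack convolved with the
    impulse response measured from speaker i to the microphone."""
    if len(channels) != len(impulse_responses):
        raise ValueError("need exactly one impulse response per channel")
    return [convolve(x, h) for x, h in zip(channels, impulse_responses)]

# Toy data (assumed): two short channels and two short room responses.
channels = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
responses = [[1.0, 0.5], [0.5, 0.25]]
templates = simulated_templates(channels, responses)
```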
Alternatively, the following steps may be performed to determine each template signal employed in step (c) for each speaker-microphone pair. Each speaker is driven by the speaker feed for the corresponding channel of the trailer soundtrack, and the resulting sound is measured (e.g., during a preliminary operation) with the microphone positioned in the same environment (e.g., room) as the speaker. The microphone output signal for each speaker is the template signal for the speaker (and corresponding microphone), and is a template in the sense that it is the output signal to be expected at the microphone during performance of the monitoring (quality check) method with the speaker emitting sound determined by the corresponding channel of the trailer soundtrack.
For each speaker-microphone pair, any significant difference between the template signal for the speaker (which is either a measured or a simulated template), and a measured status signal captured by the microphone in response to the trailer soundtrack during performance of the inventive monitoring method, is indicative of an unexpected change in the loudspeaker's characteristics.
Typical embodiments of the invention monitor the transfer function applied by each loudspeaker to the speaker feed for a channel of an audiovisual program (e.g., a movie trailer) as measured by capturing sound emitted from the loudspeaker using a microphone, and flag when changes occur. Since a typical trailer does not keep only one loudspeaker active at a time for long enough to make a transfer function measurement, some embodiments of the invention employ cross correlation averaging methods to separate the transfer function of each loudspeaker from that of the other loudspeakers in the playback environment. For example, in one such embodiment the inventive method includes steps of: obtaining audio data indicative of a status signal captured by a microphone (e.g., in a movie theater) during playback of a trailer; and processing the audio data to perform a status check on the speakers employed to render the trailer, including by, for each of the speakers, comparing (including by implementing cross correlation averaging) a template signal indicative of response of the microphone to playback of a corresponding channel of the trailer's soundtrack by the speaker at an initial time, and the status signal determined by the audio data. The step of comparing typically includes identifying a difference, if any significant difference exists, between the template signal and the status signal.
The cross correlation averaging (during the step of processing the audio data) typically includes steps of determining a sequence of cross-correlations (for each speaker) of the template signal for said speaker and the microphone (or a bandpass filtered version of said template signal) with the status signal for said microphone (or a bandpass filtered version of the status signal), where each of the cross-correlations is a cross-correlation of a segment (e.g., a frame or sequence of frames) of the template signal for said speaker and the microphone (or a bandpass filtered version of said segment) with a corresponding segment (e.g., a frame or sequence of frames) of the status signal for said microphone (or a bandpass filtered version of said segment), and identifying a difference (if any significant difference exists) between the template signal and the status signal from an average of the cross-correlations.
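The effect of averaging can be seen in a small simulation. This is a sketch under simplifying assumptions: the two channels are independent white noise, the room responses are taken as identity (so the template for speaker 1 is simply its channel), each frame's cross-correlation is normalized by the template energy in that frame, and only non-negative lags are examined. The name `averaged_cross_correlation` and all numeric values are illustrative.

```python
import random

def averaged_cross_correlation(template, status, frame_len, max_lag):
    """Average over frames of the energy-normalized cross-correlation of each
    template frame with the corresponding status segment, for lags 0..max_lag."""
    n_frames = (min(len(template), len(status)) - max_lag) // frame_len
    if n_frames < 1:
        raise ValueError("signals too short for one frame")
    avg = [0.0] * (max_lag + 1)
    for f in range(n_frames):
        start = f * frame_len
        seg = template[start:start + frame_len]
        energy = sum(v * v for v in seg) or 1.0   # avoid dividing by zero on silence
        for lag in range(max_lag + 1):
            acc = sum(seg[i] * status[start + i + lag] for i in range(frame_len))
            avg[lag] += acc / energy / n_frames
    return avg

rng = random.Random(0)                       # fixed seed so the sketch is repeatable
n = 8192 + 8
ch1 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # channel for speaker 1
ch2 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # channel for speaker 2

healthy = [a + b for a, b in zip(ch1, ch2)]  # both speakers emit sound
failed = ch2                                 # speaker 1 is silent

avg_ok = averaged_cross_correlation(ch1, healthy, frame_len=256, max_lag=7)
avg_bad = averaged_cross_correlation(ch1, failed, frame_len=256, max_lag=7)
```

With both speakers working, the average settles near 1 at lag 0, because the contribution of the other channel is uncorrelated with the template and shrinks as more frames are averaged. With speaker 1 silent, the average stays near 0 at every lag, which is the kind of difference the status check is meant to flag.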
In another class of embodiments, the inventive method processes data indicative of the output of at least one microphone to monitor audience reaction (e.g., laughter or applause) to an audiovisual program (e.g., a movie played in a movie theater), and provides the resulting output data (indicative of audience reaction) to interested parties (e.g., studios) as a service (e.g., via a web connected d-cinema server). The output data can inform a studio that a comedy is doing well based on how often and how loud the audience laughs or how a serious film is doing based on whether audience members applaud at the end. The method can provide geographically based feedback (e.g., to studios) which may be used to direct advertising for promotion of a movie.
Typical embodiments in this class implement the following key techniques: (i) separation of playback content (i.e., audio content of the program played back in the presence of the audience) from each audience signal captured by each microphone (during playback of the program in the presence of the audience). Such separation is typically implemented by a processor coupled to receive the output of each microphone; and (ii) content analysis and pattern classification techniques (also typically implemented by a processor coupled to receive the output of each microphone) to discriminate between different audience signals captured by the microphone(s). Separation of playback content from audience input can be achieved by performing a spectral subtraction (for example), where the difference is obtained between the measured signal at each microphone and a sum of filtered versions of the speaker feed signals delivered to the loudspeakers (with the filters being copies of equalized room responses of the speakers measured at the microphone). Thus, a simulated version of the signal expected to be received at the microphone in response to the program alone is subtracted from the actual signal received at the microphone in response to the combined program and audience signal. The filtering can be done with different sampling rates to get better resolution in specific frequency bands.
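The separation step can be illustrated with a magnitude-domain spectral subtraction on a single frame. This is one common form of the technique and is an assumption here, as are the function names and toy signals; in the system described, the simulated frame would be the sum of the speaker feeds filtered by the equalized room responses rather than a signal given directly.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of a naive DFT of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def spectral_subtraction(mic_frame, simulated_frame):
    """Per-bin magnitude of the microphone frame minus that of the simulated
    program-only frame, floored at zero. What remains estimates the audience."""
    mic = magnitude_spectrum(mic_frame)
    sim = magnitude_spectrum(simulated_frame)
    return [max(m - s, 0.0) for m, s in zip(mic, sim)]

# Toy data (assumed): the program is a tone in bin 3, the audience a quieter
# tone in bin 8, and the microphone captures their sum.
N = 32
program = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
audience = [0.5 * math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
mic = [p + a for p, a in zip(program, audience)]

residual = spectral_subtraction(mic, program)
loudest_bin = max(range(N // 2), key=lambda k: residual[k])
```

In this toy case the program tone cancels and the residual is concentrated in the audience's bin; the residual spectrum is what the subsequent content analysis and classification stage would operate on.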
The pattern recognition can utilize supervised or unsupervised clustering/classification techniques.
Aspects of the invention include a system configured (e.g., programmed) to perform any embodiment of the inventive method, and a computer readable medium (e.g., a disc) which stores code for implementing any embodiment of the inventive method.
In some embodiments, the inventive system is or includes at least one microphone (each said microphone being positioned during operation of the system to perform an embodiment of the inventive method to capture sound emitted from a set of speakers to be monitored), and a processor coupled to receive a microphone output signal from each said microphone. Typically the sound is generated during playback of an audiovisual program (e.g., a movie trailer) in the presence of an audience in a room (e.g., a movie theater) by the speakers to be monitored. The processor can be a general or special purpose processor (e.g., an audio digital signal processor), and is programmed with software (or firmware) and/or otherwise configured to perform an embodiment of the inventive method in response to each said microphone output signal. In some embodiments, the inventive system is or includes a general purpose processor, coupled to receive input audio data (e.g., indicative of output of at least one microphone in response to sound emitted from a set of speakers to be monitored). Typically the sound is generated during playback of an audiovisual program (e.g., a movie trailer) in the presence of an audience in a room (e.g., a movie theater) by the speakers to be monitored. The processor is programmed (with appropriate software) to generate (by performing an embodiment of the inventive method) output data in response to the input audio data, such that the output data are indicative of status of the speakers.
Throughout this disclosure, including in the claims, the expression performing an operation “on” signals or data (e.g., filtering, scaling, or transforming the signals or data) is used in a broad sense to denote performing the operation directly on the signals or data, or on processed versions of the signals or data (e.g., on versions of the signals that have undergone preliminary filtering prior to performance of the operation thereon).
Throughout this disclosure including in the claims, the expression “system” is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X−M inputs are received from an external source) may also be referred to as a decoder system.
Throughout this disclosure including in the claims, the following expressions have the following definitions:
speaker and loudspeaker are used synonymously to denote any sound-emitting transducer. This definition includes loudspeakers implemented as multiple transducers (e.g., woofer and tweeter);
speaker feed: an audio signal to be applied directly to a loudspeaker, or an audio signal that is to be applied to an amplifier and loudspeaker in series;
channel (or “audio channel”): a monophonic audio signal;
speaker channel (or “speaker-feed channel”): an audio channel that is associated with a named loudspeaker (at a desired or nominal position), or with a named speaker zone within a defined speaker configuration. A speaker channel is rendered in such a way as to be equivalent to application of the audio signal directly to the named loudspeaker (at the desired or nominal position) or to a speaker in the named speaker zone. The desired position can be static, as is typically the case with physical loudspeakers, or dynamic;
object channel: an audio channel indicative of sound emitted by an audio source (sometimes referred to as an audio “object”). Typically, an object channel determines a parametric audio source description. The source description may determine sound emitted by the source (as a function of time), the apparent position (e.g., 3D spatial coordinates) of the source as a function of time, and optionally also at least one additional parameter (e.g., apparent source size or width) characterizing the source;
audio program: a set of one or more audio channels and optionally also associated metadata that describes a desired spatial audio presentation;
render: the process of converting an audio program into one or more speaker feeds, or the process of converting an audio program into one or more speaker feeds and converting the speaker feed(s) to sound using one or more loudspeakers (in the latter case, the rendering is sometimes referred to herein as rendering “by” the loudspeaker(s)). An audio channel can be trivially rendered (“at” a desired position) by applying the signal directly to a physical loudspeaker at the desired position, or one or more audio channels can be rendered using one of a variety of virtualization (or upmixing) techniques designed to be substantially equivalent (for the listener) to such trivial rendering. In this latter case, each audio channel may be converted to one or more speaker feeds to be applied to loudspeaker(s) in known locations, which are in general (but may not be) different from the desired position, such that sound emitted by the loudspeaker(s) in response to the feed(s) will be perceived as emitting from the desired position. Examples of such virtualization techniques include binaural rendering via headphones (e.g., using Dolby Headphone processing which simulates up to 7.1 channels of surround sound for the headphone wearer) and wave field synthesis. Examples of such upmixing techniques include ones from Dolby (Pro-logic type) or others (e.g., Harman Logic 7, Audyssey DSX, DTS Neo, etc.);
azimuth (or azimuthal angle): the angle, in a horizontal plane, of a source relative to a listener/viewer. Typically, an azimuthal angle of 0 degrees denotes that the source is directly in front of the listener/viewer, and the azimuthal angle increases as the source moves in a counterclockwise direction around the listener/viewer;
elevation (or elevational angle): the angle, in a vertical plane, of a source relative to a listener/viewer. Typically, an elevational angle of 0 degrees denotes that the source is in the same horizontal plane as the listener/viewer, and the elevational angle increases as the source moves upward (in a range from 0 to 90 degrees) relative to the viewer;
L: Left front audio channel. A speaker channel, typically intended to be rendered by a speaker positioned at about 30 degrees azimuth, 0 degrees elevation;
C: Center front audio channel. A speaker channel, typically intended to be rendered by a speaker positioned at about 0 degrees azimuth, 0 degrees elevation;
R: Right front audio channel. A speaker channel, typically intended to be rendered by a speaker positioned at about −30 degrees azimuth, 0 degrees elevation;
Ls: Left surround audio channel. A speaker channel, typically intended to be rendered by a speaker positioned at about 110 degrees azimuth, 0 degrees elevation;
Rs: Right surround audio channel. A speaker channel, typically intended to be rendered by a speaker positioned at about −110 degrees azimuth, 0 degrees elevation; and
Front Channels: speaker channels (of an audio program) associated with the frontal sound stage. Typical front channels are the L and R channels of stereo programs, or the L, C and R channels of surround sound programs. The front channels could also include other channels driving additional loudspeakers (such as in SDDS-type systems having five front loudspeakers). In addition, there could be loudspeakers associated with wide and height channels, surround loudspeakers operating in array mode or as discrete individual loudspeakers, and overhead loudspeakers.
Many embodiments of the present invention are technologically possible. It will be apparent to those of ordinary skill in the art from the present disclosure how to implement them. Embodiments of the inventive system, medium, and method will be described with reference to
In some embodiments, the invention is a method for monitoring loudspeakers within an audio playback system (e.g., movie theater) environment. In a typical embodiment in this class, the monitoring method assumes that initial characteristics of the speakers (e.g., a room response for each of the speakers) have been determined at an initial time, and relies on one or more microphones positioned (e.g., on a side wall) within the environment to perform a maintenance check (sometimes referred to herein as a quality check or “QC” or status check) on each of the loudspeakers in the environment to identify whether one or more of the following events has occurred since the initial time: (i) at least one individual driver (e.g., woofer, mid-range, or tweeter) in any of the loudspeakers is damaged; (ii) there has been a change in a loudspeaker output spectrum (relative to an output spectrum determined in initial calibration of speakers in the environment); and (iii) there has been a change in polarity of the output of a loudspeaker (relative to a polarity determined in initial calibration of speakers in the environment), e.g., due to replacement of a speaker. The quality check can be performed periodically (e.g., daily).
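As an illustration of event (iii), one simple way to test for a polarity change is to look at the sign of the strongest cross-correlation value between the template and the status signal. This test is an assumption introduced for illustration and is not stated in the passage above; the function name `polarity_flipped` and the toy samples are likewise hypothetical.

```python
def polarity_flipped(template, status, max_lag):
    """Return True if the largest-magnitude cross-correlation value between
    template and status, searched over lags -max_lag..max_lag, is negative."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, t in enumerate(template):
            j = i + lag
            if 0 <= j < len(status):
                acc += t * status[j]
        if abs(acc) > abs(best):
            best = acc
    return best < 0

# Toy data (assumed): the same waveform captured with normal and inverted polarity.
template = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0]
inverted = [-v for v in template]

same_polarity = polarity_flipped(template, template, max_lag=2)   # False
swapped = polarity_flipped(template, inverted, max_lag=2)         # True
```

Searching a small range of lags keeps the test tolerant of minor timing offsets between the template and the captured signal.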
In a class of embodiments, trailer-based loudspeaker quality checks (QCs) are performed on the individual loudspeakers of a theater's audio playback system during playback of an audiovisual program (e.g., a movie trailer or other entertaining audiovisual program) to an audience (e.g., before a movie is played to the audience). Since it is contemplated that the audiovisual program is typically a movie trailer, it will often be referred to herein as a “trailer.” The quality check identifies (for each loudspeaker of the playback system) any difference between a template signal (e.g., a measured initial signal captured by a microphone in response to playback of the trailer's soundtrack by the speaker during a speaker calibration or alignment process), and a measured status signal captured by the microphone in response to playback (by the speakers of the playback system) of the trailer's soundtrack during the quality check. When the trailer includes subject matter which promotes the format of the theater's audiovisual system, a further advantage (to the entity which sells and/or licenses the audiovisual system, as well as to the theater owner) of using such trailer-based loudspeaker QC monitoring is that it incentivizes theater owners to play the trailer to facilitate performance of the quality check while simultaneously providing a significant benefit of promoting (e.g., marketing, and/or increasing audience awareness of) the audiovisual system format.
Typical embodiments of the inventive, trailer-based, loudspeaker quality check method extract individual loudspeaker characteristics from a status signal captured by a microphone during playback of the trailer by all speakers of a playback system during a quality check. Although, in any embodiment of the invention, a microphone set comprising two or more microphones could be used (rather than a single microphone) to capture a status signal during a speaker quality check (e.g., by combining the output of individual microphones in the set to generate the status signal), for simplicity the term “microphone” is used herein (to describe and claim the invention) in a broad sense denoting either an individual microphone or a set of two or more microphones whose outputs are combined to determine a signal to be processed in accordance with an embodiment of the inventive method.
In typical embodiments, the status signal obtained during the quality check is essentially a linear combination of all the room-response convolved loudspeaker output signals (one for each of the loudspeakers which emits sound during playback of the trailer during the QC) at the microphone. Any failure mode detected by the QC by processing of the status signal is typically conveyed to the theater owner and/or used by a decoder of the theater's audio playback system to change a rendering mode in case of loudspeaker failure.
In some embodiments, the inventive method includes a step of employing a source separation algorithm, a pattern matching algorithm, and/or unique fingerprint extraction from each loudspeaker, to obtain a processed version of the status signal which is indicative of sound emitted from an individual one of the loudspeakers (rather than a linear combination of all the room-response convolved loudspeaker output signals). Typical embodiments, however, implement a cross-correlation/PSD (power spectral density) based approach to monitor status of each individual speaker in the playback environment from a status signal indicative of sound emitted from all the speakers in the environment (without employing a source separation algorithm, a pattern matching algorithm, or unique fingerprint extraction from each speaker).
The inventive method can be performed in home environments as well as in cinema environments, e.g., with the required signal processing of microphone output signals being performed in a home theater device (e.g., an AVR or Blu-ray player that is shipped to the user with the microphone to be employed to perform the method).
Typical embodiments of the invention implement a cross-correlation/power spectral density (PSD) based approach to monitor status of each individual speaker in the playback environment (which is typically a movie theater) from a status signal which is a microphone output signal (sometimes referred to herein as a QC signal) indicative of sound captured during playback (by all the speakers in the environment) of an audiovisual program. The audiovisual program will be referred to below as a trailer, since it is typically a movie trailer. For example, a class of embodiments of the inventive method includes the steps of:
(a) playing back a trailer whose soundtrack has N channels, where N is a positive integer (e.g., an integer greater than one), including by emitting sound, determined by the trailer, from a set of N speakers positioned in the playback environment, with each of the speakers driven by a speaker feed for a different one of the channels of the soundtrack. Typically, the trailer is played back in the presence of an audience in a movie theater;
(b) obtaining audio data indicative of a status signal captured by each microphone of a set of M microphones in the playback environment during play of the trailer in step (a), where M is a positive integer (e.g., M=1 or 2). In typical implementations, the status signal for each microphone is the analog output signal of the microphone in response to play of the trailer during step (a), and the audio data indicative of the status signal are generated by sampling the output signal. Preferably, the audio data are organized into frames having a frame size adequate to obtain sufficient low frequency resolution, and the frame size is preferably sufficient to ensure the presence of content from all channels of the soundtrack in each frame; and
(c) processing the audio data to perform a status check on each speaker of the set of N speakers, including by comparing (e.g., identifying whether a significant difference exists between), for each said speaker and each of at least one microphone in the set of M microphones, the status signal captured by the microphone (said status signal being determined by the audio data obtained in step (b)) and a template signal, wherein the template signal is indicative (e.g., representative) of response of a template microphone to playback by the speaker, in the playback environment at an initial time, of a channel of the soundtrack corresponding to said speaker. The template microphone is positioned, at the initial time, at at least substantially the same position in the environment as is a corresponding microphone of the set during step (b). Preferably, the template microphone is the corresponding microphone of the set, and is positioned, at the initial time, at the same position in the environment as is said corresponding microphone during step (b). The initial time is a time before performance of step (b), and the template signal for each speaker is typically predetermined in a preliminary operation (e.g., a preliminary speaker alignment process), or is generated before (or during) step (b) from a predetermined room response for the corresponding speaker-microphone pair and the trailer soundtrack. Alternatively, the template signal (representing the response at a signature microphone or microphones) can be computed in a processor with a-priori knowledge of the loudspeaker-room responses (equalized or unequalized) from the loudspeaker to the corresponding signature microphone(s).
Step (c) preferably includes an operation of determining a cross-correlation (for each speaker and microphone) of the template signal for said speaker and microphone (or a bandpass filtered version of said template signal) with the status signal for said microphone (or a bandpass filtered version thereof), and identifying a difference (if any significant difference exists) between the template signal and the status signal from a frequency domain representation (e.g., power spectrum) of the cross-correlation. In typical embodiments, step (c) includes an operation (for each speaker and microphone) of applying a bandpass filter to the template signal (for the speaker and microphone) and the status signal (for the microphone), and determining (for each microphone) a cross-correlation of each bandpass filtered template signal for the microphone with the bandpass filtered status signal for the microphone, and identifying a difference (if any significant difference exists) between the template signal and the status signal from a frequency domain representation (e.g., power spectrum) of the cross-correlation.
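The cross-correlation and its frequency domain representation can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `cross_correlation_spectrum`, the FFT-based computation, and the use of the magnitude of the transformed cross-correlation as the "power spectrum" are all choices made here, since the text does not fix a normalization.

```python
import numpy as np

def cross_correlation_spectrum(template, status):
    """Cross-correlate status with template and return (xcorr, spectrum).

    Hypothetical helper: the FFT length and the magnitude normalization
    are assumptions, not taken from the source text.
    """
    n = len(template) + len(status) - 1
    nfft = 1 << (n - 1).bit_length()        # next power of two >= n, avoids wrap-around
    T = np.fft.rfft(template, nfft)
    S = np.fft.rfft(status, nfft)
    cross_spectrum = S * np.conj(T)         # Fourier transform of the cross-correlation
    xcorr = np.fft.irfft(cross_spectrum, nfft)
    return xcorr, np.abs(cross_spectrum)

# Toy data: the status signal is the template delayed by 5 samples.
rng = np.random.default_rng(0)
template = rng.standard_normal(256)
status = np.concatenate([np.zeros(5), template])[:256]
xcorr, spectrum = cross_correlation_spectrum(template, status)
peak_lag = int(np.argmax(xcorr))            # lag at which the two signals align
```

In this toy case the correlation peak sits at the known delay. In the method described above, the quantity of interest is how the spectrum changes within the pass band when a speaker's response changes.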
This class of embodiments of the method assumes knowledge of the room responses of the loudspeakers (typically obtained during a preliminary operation, e.g., a speaker alignment or calibration operation) including any equalization or other filters, and knowledge of the trailer soundtrack. In addition, knowledge of any other processing related to panning laws and other signals going to the speaker feeds is preferred, so that this processing can be modeled in a cinema processor to obtain a template signal at a signature microphone. To determine the template signal employed in step (c) for each speaker-microphone pair, the following steps may be performed. The room response (impulse response) of each speaker is determined (e.g., during a preliminary operation) by measuring sound emitted from the speaker with the microphone positioned in the same environment (e.g., room) as the speaker. Then, each channel signal of the trailer soundtrack is convolved with the corresponding impulse response (the impulse response of the speaker which is driven by the speaker feed for the channel) to determine the template signal (for the microphone) for the channel. The template signal (template) for each speaker-microphone pair is a simulated version of the microphone output signal to be expected at the microphone during performance of the monitoring (quality check) method with the speaker emitting sound determined by the corresponding channel of the trailer soundtrack.
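The convolution that produces a simulated template can be sketched as follows. The function name `make_template`, the truncation to the channel length, and the toy impulse response are illustrative assumptions. A real room response would come from the preliminary alignment measurement.

```python
import numpy as np

def make_template(channel_signal, room_response):
    """Simulate the microphone signal for one speaker-microphone pair.

    Hypothetical helper: convolves one soundtrack channel with the
    impulse response measured for the speaker that plays that channel.
    """
    # Full linear convolution, truncated to the length of the channel signal.
    return np.convolve(channel_signal, room_response)[: len(channel_signal)]

# Toy data (assumed): a 2-sample propagation delay plus one weaker reflection.
rng = np.random.default_rng(0)
channel = rng.standard_normal(1000)
room_response = np.array([0.0, 0.0, 1.0, 0.5])
template = make_template(channel, room_response)
```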
Alternatively, the following steps may be performed to determine each template signal employed in step (c) for each speaker-microphone pair. Each speaker is driven by the speaker feed for the corresponding channel of the trailer soundtrack, and the resulting sound is measured (e.g., during a preliminary operation) with the microphone positioned in the same environment (e.g., room) as the speaker. The microphone output signal for each speaker is the template signal for the speaker (and corresponding microphone), and is a template in the sense that it is the output signal to be expected at the microphone during performance of the monitoring (quality check) method with the speaker emitting sound determined by the corresponding channel of the trailer soundtrack.
For each speaker-microphone pair, any significant difference between the template signal for the speaker (which is either a measured or a simulated template), and a measured status signal captured by the microphone in response to the trailer soundtrack during performance of the inventive monitoring method, is indicative of an unexpected change in the loudspeaker's characteristics.
We next describe an exemplary embodiment in more detail with reference to
In step 10 of
Then, in step 12 of
Then, in step 14 of
In step 20 of
Then, in step 22 of
Then, in step 24 of
Then, in step 26 of
In step 28, each cross-correlation PSD determined in step 26 is analyzed (e.g., plotted and analyzed) to determine any significant change (in the relevant frequency pass band) in at least one characteristic of any of the speakers (i.e., in any of the room responses that were preliminarily determined in step 10 of
An exemplary embodiment of the method described with reference to
The exemplary method includes the steps of:
(a) playing back a trailer whose soundtrack has three channels (L, C, and R), including by emitting sound determined by the trailer from the left channel speaker (the L speaker), the center channel speaker (the C speaker), and the right channel speaker (the R speaker), where each of the speakers is positioned in the movie theater, and the trailer is played back in the presence of an audience (identified as audience A in
(b) obtaining audio data indicative of a status signal captured by the microphone in the movie theater during playback of the trailer in step (a). The status signal is the analog output signal of the microphone during step (a), and the audio data indicative of the status signal are generated by sampling the output signal. The audio data are organized into frames having a frame size (e.g., a frame size of 16K, i.e., 16,384 = 128² samples per frame) adequate to obtain sufficient low frequency resolution, and sufficient to ensure the presence of content from all three channels of the soundtrack in each frame; and
(c) processing the audio data to perform a status check on the L speaker, the C speaker, and the R speaker, including by identifying, for each said speaker, a difference (if any significant difference exists) between: a template signal indicative of response of the microphone (the same microphone used in step (b), positioned at the same position as is the microphone in step (b)) to play of a corresponding channel of the trailer's soundtrack by the speaker at an initial time, and the status signal determined by the audio data obtained in step (b). The “initial time” is a time before performance of step (b), and the template signal for each speaker is determined from a predetermined room response for each speaker-microphone pair and the trailer soundtrack.
In the exemplary embodiment, step (c) includes an operation of determining (for each speaker) a cross-correlation of a first bandpass filtered version of the template signal for said speaker with a first bandpass filtered version of the status signal, a cross-correlation of a second bandpass filtered version of the template signal for said speaker with a second bandpass filtered version of the status signal, and a cross-correlation of a third bandpass filtered version of the template signal for said speaker with a third bandpass filtered version of the status signal. A difference is identified (if any significant difference exists) between the state of each speaker (during performance of step (b)) and the speaker's state at the initial time, from a frequency domain representation of each of the nine cross-correlations. Alternatively, such difference (if any significant difference exists) is identified by otherwise analyzing the cross-correlations.
A damaged low-frequency driver of the L speaker (to be referred to sometimes as the “Channel 1” speaker) is simulated by applying an elliptic high pass filter (HPF), having a cutoff frequency of fc=600 Hz and stop-band attenuation of 100 dB, to the speaker feed for the Channel 1 speaker during playback of the trailer during step (a). The speaker feeds for the other two channels of the trailer soundtrack are not filtered by the elliptic HPF. This simulates damage only to the low-frequency driver of the Channel 1 speaker. The state of the C speaker (to be referred to sometimes as the “Channel 2” speaker) is assumed to be identical to its state at the initial time, and the state of the R speaker (to be referred to sometimes as the “Channel 3” speaker) is assumed to be identical to its state at the initial time.
The first bandpass filtered version of the template signal for each speaker is generated by filtering the template signal with a first bandpass filter, the first bandpass filtered version of the status signal is generated by filtering the status signal with the first bandpass filter, the second bandpass filtered version of the template signal for each speaker is generated by filtering the template signal with a second bandpass filter, the second bandpass filtered version of the status signal is generated by filtering the status signal with the second bandpass filter, the third bandpass filtered version of the template signal for each speaker is generated by filtering the template signal with a third bandpass filter, and the third bandpass filtered version of the status signal is generated by filtering the status signal with the third bandpass filter.
Each of the bandpass filters is linear-phase and has a length sufficient for adequate transition-band rolloff and good stop-band attenuation, so that three octave bands of the audio data can be analyzed: a first band from 100 to 200 Hz (the pass band of the first bandpass filter), a second band from 150 to 300 Hz (the pass band of the second bandpass filter), and a third band from 1 to 2 kHz (the pass band of the third bandpass filter). The first bandpass filter and the second bandpass filter are linear-phase filters with a group delay of 2K samples. The third bandpass filter has a 512 sample group delay. More generally, these filters can be linear-phase, non-linear-phase, or quasi-linear-phase in the pass band.
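A linear-phase FIR bandpass filter with a prescribed group delay can be sketched with a windowed-sinc design. Several things here are assumptions rather than details from the text: the 48 kHz sample rate, the Hamming window, the reading of "2K samples" as 2048, and the helper names `linear_phase_bandpass` and `gain_at`. A symmetric FIR filter with 2D+1 taps has a group delay of exactly D samples, so a 2048-sample delay corresponds to 4097 taps.

```python
import numpy as np

def linear_phase_bandpass(f_lo, f_hi, fs, group_delay):
    """Windowed-sinc bandpass with symmetric taps (hence linear phase)."""
    num_taps = 2 * group_delay + 1
    n = np.arange(num_taps) - group_delay
    # Ideal bandpass = ideal low-pass at f_hi minus ideal low-pass at f_lo.
    ideal = (2 * f_hi / fs) * np.sinc(2 * f_hi / fs * n) \
          - (2 * f_lo / fs) * np.sinc(2 * f_lo / fs * n)
    return ideal * np.hamming(num_taps)

def gain_at(taps, freq_hz, fs):
    """Magnitude response of an FIR filter at a single frequency."""
    k = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * k / fs)))

fs = 48_000                                   # assumed sample rate
band1 = linear_phase_bandpass(100.0, 200.0, fs, 2048)
band3 = linear_phase_bandpass(1000.0, 2000.0, fs, 512)
```

Because both the template and the status signal pass through the same filter, the filter delay is common to both and does not shift the cross-correlation peak.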
The audio data obtained during step (b) are obtained as follows. Rather than actually measuring sound emitted from the speakers with the microphone, measurement of such sound is simulated by convolving predetermined room responses for each speaker-microphone pair with the trailer soundtrack (with the speaker feed for Channel 1 of the trailer soundtrack distorted with the elliptic HPF).
More specifically, the audio data obtained during step (b) of the exemplary embodiment are generated as follows. The HPF filtered Channel 1 signal generated in step (a) is convolved with the room response of the Channel 1 speaker to determine a convolution indicative of the damaged Channel 1 speaker output that would be measured by microphone 3 during playback by the damaged Channel 1 speaker of Channel 1 of the trailer. The (nonfiltered) speaker feed for Channel 2 of the trailer soundtrack is convolved with the room response of the Channel 2 speaker to determine a convolution indicative of the Channel 2 speaker output that would be measured by microphone 3 during playback by the Channel 2 speaker of Channel 2 of the trailer, and the (nonfiltered) speaker feed for Channel 3 of the trailer soundtrack is convolved with the room response of the Channel 3 speaker to determine a convolution indicative of the Channel 3 speaker output that would be measured by microphone 3 during playback by the Channel 3 speaker of Channel 3 of the trailer. The three resulting convolutions are summed to generate audio data indicative of a status signal which simulates the expected output of microphone 3 during playback by all three speakers (with the Channel 1 speaker having a damaged low-frequency driver) of the trailer.
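The simulation of the status signal can be sketched as below. To keep the example dependent on NumPy alone, a windowed-sinc FIR high-pass filter stands in for the elliptic HPF named in the text. This is a substitution, not the filter used in the exemplary embodiment. The sample rate, the random feeds, the decaying-noise room responses, and the helper names are likewise assumptions.

```python
import numpy as np

def fir_highpass(fc, fs, num_taps=1025):
    """Windowed-sinc high-pass (stand-in for the elliptic HPF in the text)."""
    mid = (num_taps - 1) // 2
    n = np.arange(num_taps) - mid
    lowpass = (2 * fc / fs) * np.sinc(2 * fc / fs * n) * np.hamming(num_taps)
    highpass = -lowpass
    highpass[mid] += 1.0            # unit impulse minus low-pass = high-pass
    return highpass

def simulate_status(feeds, room_responses):
    """Sum of each speaker feed convolved with its room response."""
    length = max(len(f) + len(h) - 1 for f, h in zip(feeds, room_responses))
    status = np.zeros(length)
    for f, h in zip(feeds, room_responses):
        y = np.convolve(f, h)
        status[: len(y)] += y
    return status

# Toy data (assumed): three noise feeds and three decaying-noise room responses.
rng = np.random.default_rng(2)
fs = 48_000
feeds = [rng.standard_normal(4800) for _ in range(3)]
rooms = [rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0) for _ in range(3)]

# Channel 1 is "damaged" by high-pass filtering its feed at 600 Hz.
damaged_feeds = [np.convolve(feeds[0], fir_highpass(600.0, fs)), feeds[1], feeds[2]]
status = simulate_status(damaged_feeds, rooms)
```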
Each of the above-described band-pass filters (the first having a pass band between 100-200 Hz, the second having a pass band between 150-300 Hz, and the third having a pass band between 1-2 kHz) is applied to the audio data generated in step (b), to determine the above-mentioned first bandpass filtered version of the status signal, second bandpass filtered version of the status signal, and third bandpass filtered version of the status signal.
The template signal for the L speaker is determined by convolving the predetermined room response for the L speaker (and microphone 3) with the left channel (channel 1) of the trailer soundtrack. The template signal for the C speaker is determined by convolving the predetermined room response for the C speaker (and microphone 3) with the center channel (channel 2) of the trailer soundtrack. The template signal for the R speaker is determined by convolving the predetermined room response for the R speaker (and microphone 3) with the right channel (channel 3) of the trailer soundtrack.
In the exemplary embodiment, the following correlation analysis is performed in step (c) on the following signals:
the cross-correlation of the first bandpass filtered version of the template signal for the Channel 1 speaker with the first bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 100-200 Hz band of the Channel 1 speaker (of the type generated in step 26 of above-described
the cross-correlation of the second bandpass filtered version of the template signal for the Channel 1 speaker with the second bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 150-300 Hz band of the Channel 1 speaker. This cross-correlation power spectrum, and smoothed version S3 of the power spectrum, are plotted in
the cross-correlation of the third bandpass filtered version of the template signal for the Channel 1 speaker with the third bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 1000-2000 Hz band of the Channel 1 speaker. This cross-correlation power spectrum, and smoothed version S5 of the power spectrum, are plotted in
the cross-correlation of the first bandpass filtered version of the template signal for the Channel 2 speaker with the first bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 100-200 Hz band of the Channel 2 speaker (of the type generated in step 26 of above-described
the cross-correlation of the second bandpass filtered version of the template signal for the Channel 2 speaker with the second bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 150-300 Hz band of the Channel 2 speaker. This cross-correlation power spectrum, and smoothed version S4 of the power spectrum, are plotted in
the cross-correlation of the third bandpass filtered version of the template signal for the Channel 2 speaker with the third bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 1000-2000 Hz band of the Channel 2 speaker. This cross-correlation power spectrum, and smoothed version S6 of the power spectrum, are plotted in
the cross-correlation of the first bandpass filtered version of the template signal for the Channel 3 speaker with the first bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 100-200 Hz band of the Channel 3 speaker (of the type generated in step 26 of above-described
the cross-correlation of the second bandpass filtered version of the template signal for the Channel 3 speaker with the second bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 150-300 Hz band of the Channel 3 speaker. This cross-correlation power spectrum (or a smoothed version of it) is analyzed (e.g., plotted and analyzed) in a manner to be described below. The smoothing performed to generate the smoothed version may be accomplished by fitting a simple fourth-order polynomial to the cross-correlation power spectrum or by any of a variety of other smoothing methods; and
the cross-correlation of the third bandpass filtered version of the template signal for the Channel 3 speaker with the third bandpass filtered version of the status signal. This cross-correlation undergoes a Fourier transform to determine a cross-correlation power spectrum for the 1000-2000 Hz band of the Channel 3 speaker. This cross-correlation power spectrum (or a smoothed version of it) is analyzed (e.g., plotted and analyzed) in a manner to be described below. The smoothing performed to generate the smoothed version may be accomplished by fitting a simple fourth-order polynomial to the cross-correlation power spectrum or by any of a variety of other smoothing methods.
A difference is identified (if any significant difference exists) between the state of each speaker (during performance of step (b)) in each of the three octave-bands, and the speaker's state in each of the three octave-bands at the initial time, from the nine cross-correlation power spectra described above (or a smoothed version of each of them).
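The fourth-order polynomial smoothing mentioned above can be sketched with `numpy.polyfit`. The function name `smooth_spectrum`, the rescaling of the frequency axis to [-1, 1] for numerical conditioning, and the toy spectrum are assumptions made for illustration.

```python
import numpy as np

def smooth_spectrum(freqs_hz, spectrum, order=4):
    """Least-squares polynomial fit of a spectrum within one band."""
    # Rescale frequencies to [-1, 1] so the fit is well conditioned.
    half_span = (freqs_hz.max() - freqs_hz.min()) / 2.0
    x = (freqs_hz - freqs_hz.mean()) / half_span
    coeffs = np.polyfit(x, spectrum, order)
    return np.polyval(coeffs, x)

# Toy data (assumed): a smooth quartic trend with added ripple in the 100-200 Hz band.
freqs = np.linspace(100.0, 200.0, 101)
trend = 1e-6 * (freqs - 150.0) ** 4 - 0.002 * (freqs - 150.0) ** 2
rng = np.random.default_rng(0)
noisy = trend + 0.5 * rng.standard_normal(freqs.size)
smoothed_trend = smooth_spectrum(freqs, trend)
smoothed_noisy = smooth_spectrum(freqs, noisy)
```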
More specifically, consider the smoothed versions S1, S2, S3, S4, S5, and S6, of cross-correlation power spectra which are plotted in
Due to the distortion present in Channel 1 (i.e., the change in status of the Channel 1 speaker, namely the simulated damage to its low frequency driver, during performance of step (b) relative to its status at the initial time), the smoothed cross-correlation power spectra S1, S3, and S5 (of
Since no distortion is present in Channel 2 (i.e., the Channel 2 speaker's status during performance of step (b) is identical to its status at the initial time), the smoothed cross-correlation power spectra S2, S4, and S6 (of
In this context, presence of “significant deviation” from zero amplitude in the relevant frequency band means that the mean or the standard deviation (or each of the mean and the standard deviation) of the amplitude of the relevant smoothed cross-correlation power spectrum is greater than zero (or another metric of the relevant cross-correlation power spectrum differs from zero or another predetermined value) by more than a predetermined threshold for the frequency band. In this context, the difference between the mean (or standard deviation) of the amplitude of the relevant smoothed cross-correlation power spectrum, and a predetermined value (e.g., zero amplitude), is a “metric” of the smoothed cross-correlation power spectrum. Metrics other than standard deviation could be utilized such as spectral deviation, etc. In other embodiments of the invention, some other characteristic of the cross-correlation power spectra obtained in accordance with the invention (or of smoothed versions of them) is employed to assess status of loudspeakers in each frequency band in which the spectra (or smoothed versions of them) include useful information.
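The mean and standard deviation metrics can be sketched as a simple threshold test. The function names, the reference value of zero, and the threshold values in the example are hypothetical. The text leaves the thresholds to be predetermined per frequency band.

```python
import numpy as np

def deviation_metrics(freqs_hz, smoothed, band, reference=0.0):
    """Mean and standard deviation of the deviation from a reference, in-band."""
    in_band = (freqs_hz >= band[0]) & (freqs_hz <= band[1])
    deviation = smoothed[in_band] - reference
    return float(np.mean(deviation)), float(np.std(deviation))

def flags_change(freqs_hz, smoothed, band, mean_threshold, std_threshold):
    """True if either metric exceeds its (assumed) threshold for the band."""
    mean_dev, std_dev = deviation_metrics(freqs_hz, smoothed, band)
    return abs(mean_dev) > mean_threshold or std_dev > std_threshold

# Toy data (assumed): one spectrum with no deviation, one with a 20-unit drop.
freqs = np.linspace(50.0, 400.0, 351)
healthy = np.zeros_like(freqs)
damaged = np.where(freqs < 250.0, -20.0, 0.0)
healthy_flag = flags_change(freqs, healthy, (100.0, 200.0), 3.0, 3.0)
damaged_flag = flags_change(freqs, damaged, (100.0, 200.0), 3.0, 3.0)
```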
Typical embodiments of the invention monitor the transfer function applied by each loudspeaker to the speaker feed for a channel of an audiovisual program (e.g., a movie trailer) as measured by capturing sound emitted from the loudspeaker using a microphone, and flag when changes occur. Since a typical trailer does not keep only one loudspeaker active at a time for long enough to make a transfer function measurement, some embodiments of the invention employ cross correlation averaging methods to separate the transfer function of each loudspeaker from that of the other loudspeakers in the playback environment. For example, in one such embodiment the inventive method includes steps of: obtaining audio data indicative of a status signal captured by a microphone (e.g., in a movie theater) during playback of a trailer; and processing the audio data to perform a status check on the speakers employed to play back the trailer, including by, for each of the speakers, comparing (including by implementing cross correlation averaging) a template signal indicative of response of the microphone to play back of a corresponding channel of the trailer's soundtrack by the speaker at an initial time, and the status signal determined by the audio data. The step of comparing typically includes identifying a difference, if any significant difference exists, between the template signal and the status signal.
The cross correlation averaging (during the step of processing the audio data) typically includes steps of determining a sequence of cross-correlations (for each speaker) of the template signal for said speaker and the microphone (or a bandpass filtered version of said template signal) with the status signal for said microphone (or a bandpass filtered version of the status signal), where each of the cross-correlations is a cross-correlation of a segment (e.g., a frame or sequence of frames) of the template signal for said speaker and the microphone (or a bandpass filtered version of said segment) with a corresponding segment (e.g., a frame or sequence of frames) of the status signal for said microphone (or a bandpass filtered version of said segment), and identifying a difference (if any significant difference exists) between the template signal and the status signal from an average of the cross-correlations.
Cross correlation averaging can be employed because correlated signals add linearly with the number of averages while uncorrelated ones add as the square root of the number of averages. Thus the signal to noise ratio (SNR) improves as the square root of the number of averages. Situations with a large amount of uncorrelated signals compared to the correlated ones require more averages to get a good SNR. The averaging time can be adjusted by comparing the total level at the microphone to what is predicted from the speaker being assessed.
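The square-root improvement can be illustrated numerically. In this sketch the template plays the role of the correlated component and independent noise plays the role of sound from other sources. The frame count, frame length, interference level, and function names are all assumptions.

```python
import numpy as np

def averaged_cross_correlation(template_frames, status_frames):
    """Average the per-frame cross-correlations (frames are rows)."""
    nfft = 2 * template_frames.shape[1]      # zero-pad to avoid circular wrap
    T = np.fft.rfft(template_frames, nfft, axis=1)
    S = np.fft.rfft(status_frames, nfft, axis=1)
    per_frame = np.fft.irfft(S * np.conj(T), nfft, axis=1)
    return per_frame.mean(axis=0)

def peak_to_background(xcorr):
    """Zero-lag peak relative to the RMS level at all other lags."""
    return abs(xcorr[0]) / np.sqrt(np.mean(xcorr[1:] ** 2))

rng = np.random.default_rng(1)
n_frames, frame_len = 64, 512
template = rng.standard_normal((n_frames, frame_len))
interference = 3.0 * rng.standard_normal((n_frames, frame_len))  # uncorrelated
status = template + interference

ratio_4 = peak_to_background(averaged_cross_correlation(template[:4], status[:4]))
ratio_64 = peak_to_background(averaged_cross_correlation(template, status))
```

Going from 4 to 64 averages is a 16-fold increase, so the peak-to-background ratio would be expected to grow by a factor of roughly 4 in this setup.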
It has been proposed to employ cross correlation averaging in adaptive equalization processes (e.g., for Bluetooth headsets). However, before the present invention, it had not been proposed to employ correlated averaging to monitor status of individual loudspeakers in an environment in which multiple loudspeakers are emitting sound simultaneously and a transfer function for each loudspeaker needs to be determined. As long as each loudspeaker produces output signals uncorrelated with those produced by the other loudspeakers, correlated averaging can be used to separate the transfer functions. However, since this may not always be the case, the estimated relative signal levels at the microphone and the degree of correlation between the signals at each loudspeaker can be used to control the averaging process.
For example, in some embodiments, during assessment of the transfer function from one of the speakers to a microphone, when a significant amount of correlated signal energy between other speakers and the speaker being assessed for its transfer function is present, the transfer function estimating process is turned off or slowed. For example, if a 0 dB SNR is required, the transfer function estimating process can be turned off for each speaker-microphone combination when the total estimated acoustic energy at the microphone from the correlated components of all other speakers is comparable to the estimated acoustic energy from the speaker whose transfer function is being estimated. The estimated correlated energy at the microphone can be obtained by determining the correlated energy in the signals feeding each speaker, filtered by the appropriate transfer functions from each speaker to each microphone in question, with these transfer functions typically having been obtained during an initial calibration process. Turning off the estimation process can be done on a frequency band by band basis rather than the whole transfer function at a time.
For example, a status check on each speaker of a set of N speakers can include, for each speaker-microphone pair consisting of one of the speakers and one of a set of M microphones, the steps of:
(d) determining cross-correlation power spectra for the speaker-microphone pair, where each of the cross-correlation power spectra is indicative of a cross-correlation of the speaker feed for the speaker of said speaker-microphone pair and the speaker feed for another one of the set of N speakers;
(e) determining an auto-correlation power spectrum indicative of an auto-correlation of the speaker feed for the speaker of said speaker-microphone pair;
(f) filtering each of the cross-correlation power spectra and the auto-correlation power spectrum with a transfer function indicative of a room response for the speaker-microphone pair, thereby determining filtered cross-correlation power spectra and a filtered auto-correlation power spectrum;
(g) comparing the filtered auto-correlation power spectrum to a root mean square sum of all the filtered cross-correlation power spectra; and
(h) temporarily halting or slowing down the status check for the speaker of the speaker-microphone pair in response to determining that the root mean square sum is comparable to or greater than the filtered auto-correlation power spectrum.
Step (g) can include a step of comparing the filtered auto-correlation power spectrum and the root mean square sum on a frequency band-by-band basis, and step (h) can include a step of temporarily halting or slowing down the status check for the speaker of the speaker-microphone pair in each frequency band in which the root mean square sum is comparable to or greater than the filtered auto-correlation power spectrum.
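Steps (d) through (h) can be sketched as a per-band decision. This is one possible reading, offered with several assumptions: the spectra are averaged over frames so that uncorrelated feeds yield small cross spectra, each feed is filtered by its own speaker-to-microphone transfer function before the spectra are formed, "comparable to or greater than" is implemented as a plain greater-or-equal test, and all names and parameter values are hypothetical.

```python
import numpy as np

def framed_spectra(feeds, frame_len):
    """Windowed FFT of each frame; result has shape (speakers, frames, bins)."""
    n_speakers, n_samples = feeds.shape
    n_frames = n_samples // frame_len
    frames = feeds[:, : n_frames * frame_len].reshape(n_speakers, n_frames, frame_len)
    return np.fft.rfft(frames * np.hanning(frame_len), axis=-1)

def bands_to_pause(feeds, transfer_functions, speaker, frame_len, fs, bands_hz):
    """Per band: True if the status check for `speaker` should be paused."""
    spectra = framed_spectra(feeds, frame_len) * transfer_functions[:, None, :]
    target = spectra[speaker]
    auto = np.mean(np.abs(target) ** 2, axis=0)               # step (e), filtered
    cross = [np.abs(np.mean(target * np.conj(spectra[j]), axis=0))
             for j in range(len(spectra)) if j != speaker]    # step (d), filtered
    rms_sum = np.sqrt(np.sum(np.square(cross), axis=0))       # step (g)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    decisions = []
    for lo, hi in bands_hz:
        in_band = (freqs >= lo) & (freqs < hi)
        decisions.append(bool(rms_sum[in_band].sum() >= auto[in_band].sum()))  # step (h)
    return decisions

rng = np.random.default_rng(3)
fs, frame_len = 48_000, 1024
independent = rng.standard_normal((3, 32 * frame_len))
flat = np.ones((3, frame_len // 2 + 1))        # assumed flat room responses
bands = [(100.0, 1000.0), (1000.0, 4000.0)]
independent_result = bands_to_pause(independent, flat, 0, frame_len, fs, bands)

correlated = independent.copy()
correlated[1] = correlated[0]                  # speaker 1 carries the same feed as speaker 0
correlated_result = bands_to_pause(correlated, flat, 0, frame_len, fs, bands)
```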
In another class of embodiments, the inventive method processes data indicative of the output of at least one microphone to monitor audience reaction (e.g., laughter or applause) to an audiovisual program (e.g., a movie played in a movie theater), and provides the resulting output data (indicative of audience reaction) to interested parties (e.g., studios) as a service (e.g., via a web connected d-cinema server). The output data can inform a studio that a comedy is doing well based on how often and how loud the audience laughs or how a serious film is doing based on whether audience members applaud at the end. The method can provide geographically based feedback (e.g., to studios) which may be used to direct advertising for promotion of a movie.
Typical embodiments in this class implement the following key techniques:
(i) separation of playback content (i.e., audio content of the program played back in the presence of the audience) from audience signals captured by each microphone (during playback of the program in the presence of the audience). Such separation is typically implemented by a processor coupled to receive the output of each microphone and is achieved by knowing the signal to the speaker feeds, knowing the loudspeaker-room responses to each of the “signature” microphones, and performing temporal or spectral subtraction of a filtered signal from the measured signal at the signature microphone, where the filtered signal is computed in a side-chain in the processor by filtering the speaker feed signals with the loudspeaker-room responses. The speaker-feed signals may themselves be filtered versions of the actual arbitrary movie/advertisement/preview content signals, with the associated filtering being done by equalization filters and other processing such as panning; and
(ii) content analysis and pattern classification techniques (also typically implemented by a processor coupled to receive the output of each microphone) to discriminate between different audience signals captured by the microphone(s).
For example, an embodiment in this class is a method for monitoring audience reaction to an audiovisual program played back by a playback system including a set of N speakers in a playback environment, where N is a positive integer, wherein the program has a soundtrack comprising N channels. The method includes steps of: (a) playing back the audiovisual program in the presence of an audience in the playback environment, including by emitting sound, determined by the program, from the speakers of the playback system in response to driving each of the speakers with a speaker feed for a different one of the channels of the soundtrack; (b) obtaining audio data indicative of at least one microphone signal generated by at least one microphone in the playback environment during emission of the sound in step (a); and (c) processing the audio data to extract audience data from said audio data, and analyzing the audience data to determine audience reaction to the program, wherein the audience data are indicative of audience content indicated by the microphone signal, and the audience content comprises sound produced by the audience during playback of the program.
Separation of playback content from audience content can be achieved by performing a spectral subtraction, where the difference is obtained between the measured signal at each microphone and a sum of filtered versions of the speaker feed signals delivered to the loudspeakers (with the filters being copies of equalized room responses of the speakers measured at the microphone). Thus, a simulated version of the signal expected to be received at the microphone in response to the program alone is subtracted from the actual signal received at the microphone in response to the combined program and audience signal. The filtering can be done with different sampling rates to get better resolution in specific frequency bands.
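As a purely illustrative sketch of the spectral subtraction described above, the following Python fragment compares one frame of the measured microphone signal with the corresponding frame of the simulated (program-only) signal in the magnitude domain. The function names `dft` and `spectral_subtract`, the use of a naive DFT, and the flooring of negative differences at zero are assumptions made for illustration; they are not specified in this description.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectral_subtract(mic_frame, simulated_frame):
    """Per-bin magnitude of the measured frame minus that of the simulated frame.

    Negative differences are floored at zero (an assumed convention).
    """
    mic_mag = [abs(c) for c in dft(mic_frame)]
    sim_mag = [abs(c) for c in dft(simulated_frame)]
    return [max(m - s, 0.0) for m, s in zip(mic_mag, sim_mag)]
```

In such a sketch, bins dominated by program content shrink toward zero, while bins that carry only audience sound retain roughly their measured magnitude.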
The pattern recognition can utilize supervised or unsupervised clustering/classification techniques.
With reference to
Step 32 determines audience audio data, indicative of sound produced by the audience during step 30 (referred to as an “audience generated signal” or “audience signal” in
In step 34, time, frequency, or time-frequency tile features are extracted from the audience audio data.
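One possible form of time-frequency tile features, sketched here under stated assumptions, is the energy in a few frequency bands for each short frame of the audience audio data. The frame length, hop size, number of bands, and the function names below are hypothetical choices for illustration only; the description does not prescribe them.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def band_energies(frame, n_bands):
    """Energy in n_bands equal-width bands over DFT bins 0 .. len(frame)//2 - 1.

    Bins left over when the bin count is not a multiple of n_bands are ignored.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(x_k) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def tile_features(samples, frame_len=16, hop=8, n_bands=4):
    """One row of band energies per frame: a coarse time-frequency tiling."""
    return [band_energies(f, n_bands) for f in frame_signal(samples, frame_len, hop)]
```

Each row of the result corresponds to one time frame and each column to one frequency band, so a single entry is one tile.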
After step 34, at least one of steps 36, 38, and 40 is performed (e.g., all of steps 36, 38, and 40 are performed).
In step 36, the type of audience audio data (e.g., a characteristic of audience reaction to the program indicated by the audience audio data) is identified from the tile features determined in step 34, based on probabilistic or deterministic decision boundaries.
In step 38, the type of audience audio data (e.g., a characteristic of audience reaction to the program indicated by the audience audio data) is identified from the tile features determined in step 34, based on unsupervised learning (e.g., clustering).
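As one hedged example of the unsupervised learning mentioned for step 38, the sketch below groups feature vectors with a basic k-means procedure. The choice of k-means, the evenly spaced initialization, and the fixed iteration count are assumptions for illustration; the description only calls for clustering in general.

```python
def kmeans(points, k, iterations=20):
    """Basic k-means on a list of equal-length feature vectors.

    Initial centroids are taken at evenly spaced positions in the input
    (an assumed, deterministic initialization). Returns (labels, centroids).
    """
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

The resulting cluster labels carry no meaning by themselves; associating a cluster with a particular kind of audience reaction would require a separate step.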
In step 40, the type of audience audio data (e.g., a characteristic of audience reaction to the program indicated by the audience audio data) is identified from the tile features determined in step 34, based on supervised learning (e.g., neural networks).
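For step 40, a single perceptron (the simplest neural unit) can serve as a minimal stand-in for the supervised learning mentioned above. The sketch assumes two-class labels (0 or 1) and hand-labeled training feature vectors; the learning rate, epoch count, and function names are hypothetical, and a practical system would likely use a larger network.

```python
def train_perceptron(features, labels, epochs=50, lr=0.1):
    """Train a single perceptron on feature vectors with labels 0 or 1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - (1 if activation > 0 else 0)
            # Weights change only when the current prediction is wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
```

Training of this kind converges only when the two classes are linearly separable in the chosen feature space.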
The
As indicated in
The other elements of block 100 of
The estimated room responses, ĥji(n), for the “j”th microphone can be determined (e.g., during a preliminary operation with no audience present) by measuring sound emitted from the speakers with the microphone positioned in the same environment (e.g., room) as the speakers. The preliminary operation may be an initial alignment process in which the speakers of the audio playback system are initially calibrated. Each such response is an “estimated” response in the sense that it is expected to be similar to the room response (for the relevant microphone-speaker pair) actually existing during performance of the inventive method to monitor audience reaction to an audiovisual program, although it may differ from the room response (for the microphone-speaker pair) actually existing during performance of the inventive method (e.g., due to changes over time to the state of one or more of the microphone, the speaker, and the playback environment that may have occurred since performance of the preliminary operation).
Alternatively, the estimated room responses, ĥji(n), for the “j”th microphone, can be determined by adaptively updating an initially determined set of estimated room responses (e.g., where the initially determined estimated room responses are determined during a preliminary operation with no audience present). The initially determined set of estimated room responses may be determined in an initial alignment process in which the speakers of the audio playback system are initially calibrated.
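The description does not say which adaptation rule is used. As an assumption for illustration, the sketch below uses a normalized least-mean-squares (NLMS) update, a common choice for tracking a slowly changing response, applied to a single speaker-microphone pair. The function name, step size, and regularization constant are hypothetical.

```python
def nlms_update(h_est, feed_history, mic_sample, step=0.5, eps=1e-8):
    """Perform one NLMS update of an estimated room response.

    h_est        -- current estimate of the response taps
    feed_history -- recent speaker feed samples, newest first (x(n), x(n-1), ...)
    mic_sample   -- microphone sample m(n) measured at the same time index
    Returns (updated_estimate, prediction_error).
    """
    predicted = sum(h * x for h, x in zip(h_est, feed_history))
    error = mic_sample - predicted
    norm = sum(x * x for x in feed_history) + eps
    updated = [h + step * error * x / norm for h, x in zip(h_est, feed_history)]
    return updated, error
```

When an audience is present, audience sound appears in the error term and perturbs the estimate, so a practical scheme would probably adapt slowly or pause adaptation during loud audience activity.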
For each value of index n, the output signals of all the ĥji(n) elements of block 100 are summed (in addition elements 102) to generate the estimated program content sample, žj(n), for said value of index n. The current estimated program content sample, žj(n), is asserted to subtraction element 101 in which it is subtracted from a corresponding sample, mj(n), of the microphone output obtained during playback of the program in the presence of the audience whose reactions are to be monitored.
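The filter, sum, and subtract operations just described can be sketched in the time domain as follows, assuming each estimated room response is a finite impulse response and all signals share one sampling rate. The function names are hypothetical; `feeds[i]` stands for the speaker feed of channel i and `responses[i]` for the estimated response from speaker i to the microphone.

```python
def fir_filter(response, feed):
    """Convolve a speaker feed with a room response, truncated to len(feed)."""
    out = []
    for n in range(len(feed)):
        acc = 0.0
        for k, h in enumerate(response):
            if n - k >= 0:
                acc += h * feed[n - k]
        out.append(acc)
    return out


def estimate_audience(mic, feeds, responses):
    """Subtract the estimated program content from the microphone samples.

    Each feed is filtered by its estimated room response, the filtered
    channels are summed per sample, and the sum is subtracted from mic.
    All feeds are assumed to be at least as long as mic.
    """
    filtered = [fir_filter(h, x) for h, x in zip(responses, feeds)]
    program_est = [sum(ch[n] for ch in filtered) for n in range(len(mic))]
    return [m - z for m, z in zip(mic, program_est)]
```

If the estimated responses match the actual responses, the residual equals the audience sound; any mismatch leaves a remainder of program content in the residual.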
More specifically, the room response for the Left speaker, hj1(n), is the “Left” channel speaker response plotted in
Similarly, the room response for the Center speaker, hj2(n), is the “Center” channel speaker response plotted in
Similarly, the room response for the Right speaker, hj3(n), is the “Right” channel speaker response plotted in
To generate the simulated microphone output samples, mj(n), that were asserted to one input of element 101 of
Aspects of the invention include a system configured (e.g., programmed) to perform any embodiment of the inventive method, and a computer readable medium (e.g., a disc) which stores code for implementing any embodiment of the inventive method. For example, such a computer readable medium may be included in processor 2 of
In some embodiments, the inventive system is or includes at least one microphone (e.g., microphone 3 of
In some embodiments of the inventive method, some or all of the steps described herein are performed simultaneously or in a different order than specified in the examples described herein. Although steps are performed in a particular order in some embodiments of the inventive method, some steps may be performed simultaneously or in a different order in other embodiments.
While specific embodiments of the present invention and applications of the invention have been described herein, it will be apparent to those of ordinary skill in the art that many variations on the embodiments and applications described herein are possible without departing from the scope of the invention described and claimed herein. It should be understood that while certain forms of the invention have been shown and described, the invention is not to be limited to the specific embodiments described and shown or the specific methods described.
This application is a divisional of U.S. patent application Ser. No. 14/126,985 filed 17 Dec. 2013, which is a National Phase entry of International Patent Application No. PCT/US2012/044342 filed on 27 Jun. 2012, which claims priority to U.S. Provisional Application No. 61/504,005 filed 1 Jul. 2011; U.S. Provisional Application No. 61/635,934 filed 20 Apr. 2012; and U.S. Provisional Application No. 61/655,292 filed 4 Jun. 2012, all of which are hereby incorporated by reference in entirety for all purposes.
Number | Date | Country | |
---|---|---|---|
20170026766 A1 | Jan 2017 | US |
Number | Date | Country | |
---|---|---|---|
61504005 | Jul 2011 | US | |
61635934 | Apr 2012 | US | |
61655292 | Jun 2012 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 14126985 | US | |
Child | 15282631 | US |