The present Application for Patent is related to the following co-pending U.S. Patent Applications:
Ser. No. 13/280,303 “THREE-DIMENSIONAL SOUND CAPTURING AND REPRODUCING WITH MULTI-MICROPHONES”, filed concurrently herewith, assigned to the assignee hereof; and
Ser. No. 13/280,203 “SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR HEAD TRACKING BASED ON RECORDED SOUND SIGNALS”, filed concurrently herewith, assigned to the assignee hereof.
1. Field
This disclosure relates to audio signal processing.
2. Background
Many activities that were previously performed in quiet office or home environments are being performed today in acoustically variable situations like a car, a street, or a café. For example, a person may desire to communicate with another person using a voice communication channel. The channel may be provided, for example, by a mobile wireless handset or headset, a walkie-talkie, a two-way radio, a car-kit, or another communications device. Consequently, a substantial amount of voice communication is taking place using portable audio sensing devices (e.g., smartphones, handsets, and/or headsets) in highly variable environments. Incorporation of video recording capability into communications devices also presents new opportunities and challenges.
A method of orientation-sensitive recording control according to a general configuration includes indicating, within a portable device and at a first time, that the portable device has a first orientation relative to a gravitational axis and, based on the indication, selecting a first pair among at least three microphone channels of the portable device. This method also includes indicating, within the portable device and at a second time that is different than the first time, that the portable device has a second orientation relative to the gravitational axis that is different than the first orientation and, based on the indication, selecting a second pair among the at least three microphone channels that is different than the first pair. In this method, each of the at least three microphone channels is based on a signal produced by a corresponding one of at least three microphones of the portable device. Computer-readable storage media (e.g., non-transitory media) having tangible features that cause a machine reading the features to perform such a method are also disclosed.
An apparatus for orientation-sensitive recording control according to a general configuration includes means for indicating, at a first time, that a portable device has a first orientation relative to a gravitational axis, and means for selecting a first pair among at least three microphone channels of the portable device, based on said indication that the portable device has the first orientation. This apparatus also includes means for indicating, at a second time that is different than the first time, that the portable device has a second orientation relative to the gravitational axis that is different than the first orientation, and means for selecting a second pair among the at least three microphone channels that is different than the first pair, based on said indication that the portable device has the second orientation. In this apparatus, each of the at least three microphone channels is based on a signal produced by a corresponding one of at least three microphones of the portable device.
An apparatus for orientation-sensitive recording control according to another general configuration includes an orientation sensor configured to indicate, at a first time, that a portable device has a first orientation relative to a gravitational axis, and a microphone channel selector configured to select a first pair among at least three microphone channels of the portable device, based on said indication that the portable device has the first orientation. The orientation sensor is configured to indicate, at a second time that is different than the first time, that the portable device has a second orientation relative to the gravitational axis that is different than the first orientation. The microphone channel selector is configured to select a second pair among the at least three microphone channels that is different than the first pair, based on said indication that the portable device has the second orientation. In this apparatus, each of the at least three microphone channels is based on a signal produced by a corresponding one of at least three microphones of the portable device.
Rapidly growing social network services such as Facebook and Twitter now support the prompt exchange of personal information. At the same time, network speed and storage capacity have grown markedly and already support multimedia data as well as text. In this environment, there is an important need for capturing and reproducing three-dimensional (3D) audio for more realistic and immersive exchange of individual aural experiences.
Multi-microphone-based audio processing algorithms have recently been developed in the context of enhancing speech communication. This disclosure describes several unique features for 3D audio based on a multi-microphone topology.
Unless expressly limited by its context, the term “signal” is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term “generating” is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term “calculating” is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values. Unless expressly limited by its context, the term “obtaining” is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from an external device), and/or retrieving (e.g., from an array of storage elements). Unless expressly limited by its context, the term “selecting” is used to indicate any of its ordinary meanings, such as identifying, indicating, applying, and/or using at least one, and fewer than all, of a set of two or more. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or operations. The term “based on” (as in “A is based on B”) is used to indicate any of its ordinary meanings, including the cases (i) “derived from” (e.g., “B is a precursor of A”), (ii) “based on at least” (e.g., “A is based on at least B”) and, if appropriate in the particular context, (iii) “equal to” (e.g., “A is equal to B”). Similarly, the term “in response to” is used to indicate any of its ordinary meanings, including “in response to at least.”
References to a “location” of a microphone of a multi-microphone audio sensing device indicate the location of the center of an acoustically sensitive face of the microphone, unless otherwise indicated by the context. The term “channel” is used at times to indicate a signal path and at other times to indicate a signal carried by such a path, according to the particular context. Unless otherwise indicated, the term “series” is used to indicate a sequence of two or more items. The term “logarithm” is used to indicate the base-ten logarithm, although extensions of such an operation to other bases are within the scope of this disclosure. The term “frequency component” is used to indicate one among a set of frequencies or frequency bands of a signal, such as a sample of a frequency domain representation of the signal (e.g., as produced by a fast Fourier transform) or a subband of the signal (e.g., a Bark scale or mel scale subband).
Unless indicated otherwise, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). The term “configuration” may be used in reference to a method, apparatus, and/or system as indicated by its particular context. The terms “method,” “process,” “procedure,” and “technique” are used generically and interchangeably unless otherwise indicated by the particular context. The terms “apparatus” and “device” are also used generically and interchangeably unless otherwise indicated by the particular context. The terms “element” and “module” are typically used to indicate a portion of a greater configuration. Unless expressly limited by its context, the term “system” is used herein to indicate any of its ordinary meanings, including “a group of elements that interact to serve a common purpose.” Any incorporation by reference of a portion of a document shall also be understood to incorporate definitions of terms or variables that are referenced within the portion, where such definitions appear elsewhere in the document, as well as any figures referenced in the incorporated portion.
A method as described herein may be configured to process the captured signal as a series of segments. Typical segment lengths range from about five or ten milliseconds to about forty or fifty milliseconds, and the segments may be overlapping (e.g., with adjacent segments overlapping by 25% or 50%) or nonoverlapping. In one particular example, the signal is divided into a series of nonoverlapping segments or “frames”, each having a length of ten milliseconds. A segment as processed by such a method may also be a segment (i.e., a “subframe”) of a larger segment as processed by a different operation, or vice versa.
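The segmentation described above may be sketched as follows. This is a minimal illustration only; the function name `split_into_segments`, its parameters, and the choice to drop trailing samples that do not fill a whole segment are assumptions made for the example and are not part of this disclosure.

```python
def split_into_segments(samples, sample_rate_hz, segment_ms=10.0, overlap=0.0):
    """Split a sequence of samples into fixed-length segments.

    overlap is the fraction of each segment shared with the next one
    (0.0 for nonoverlapping frames, 0.25 or 0.5 for overlapping segments).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    segment_len = int(round(sample_rate_hz * segment_ms / 1000.0))
    hop = max(1, int(round(segment_len * (1.0 - overlap))))
    segments = []
    start = 0
    # Assumption: trailing samples that do not fill a whole segment are dropped.
    while start + segment_len <= len(samples):
        segments.append(samples[start:start + segment_len])
        start += hop
    return segments


signal = list(range(800))  # 100 ms of placeholder samples at 8 kHz
frames = split_into_segments(signal, 8000)                   # 10 ms nonoverlapping frames
overlapped = split_into_segments(signal, 8000, overlap=0.5)  # 50% overlap
```

At a sampling rate of 8 kHz, each ten-millisecond frame in this sketch holds eighty samples.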
A portable audio sensing device may be implemented to have a configurable multi-microphone array geometry. Depending on the use case, different combinations (e.g., pairs) of the microphones of the device may be selected to support spatially selective audio recording in different source directions.
During the operation of a multi-microphone audio sensing device, a microphone array produces a set of microphone channels in which each channel is based on the response of a corresponding one of the microphones to the acoustic environment. One microphone of the array may receive a particular sound more directly than another microphone, such that the corresponding channels differ from one another to provide collectively a more complete representation of the acoustic environment than can be captured using a single microphone.
A spatially selective recording operation may include filtering a multichannel signal, where the gain response of the filter differs according to direction of arrival.
One class of spatially selective filters is beamformers, which include phased arrays, minimum variance distortionless response (MVDR) beamformers, and linearly constrained minimum variance (LCMV) beamformers. Such a filter is typically calculated offline according to a desired direction of the beam pattern but may be calculated and/or adapted online (e.g., based on characteristics of a noise component of the multichannel signal). Another class of spatially selective filters is blind source separation (BSS) filters, which include filters whose coefficients are calculated using independent component analysis (ICA) or independent vector analysis (IVA). A BSS filter is typically trained offline to an initial state and may be further adapted online.
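As one hedged illustration of a gain response that differs according to direction of arrival, the following sketch evaluates the magnitude response of a simple two-microphone delay-and-sum (phased-array) beamformer for a far-field plane wave. The function name, the assumed speed of sound, and the example spacing and frequency are illustrative assumptions; MVDR, LCMV, and BSS filters are not shown.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def delay_and_sum_gain(arrival_deg, steer_deg, spacing_m, freq_hz):
    """Magnitude response of a two-microphone delay-and-sum beamformer.

    Angles are measured from the array axis (0 degrees is endfire).
    """
    # Mismatch between the inter-microphone delay of the arriving wave
    # and the delay that the beamformer compensates for.
    delay_s = spacing_m * (math.cos(math.radians(arrival_deg))
                           - math.cos(math.radians(steer_deg))) / SPEED_OF_SOUND_M_S
    phase = 2.0 * math.pi * freq_hz * delay_s
    return abs(1.0 + cmath.exp(-1j * phase)) / 2.0


# Hypothetical example: 4 cm spacing, 2 kHz component, beam steered to endfire.
on_beam = delay_and_sum_gain(0.0, 0.0, 0.04, 2000.0)
broadside = delay_and_sum_gain(90.0, 0.0, 0.04, 2000.0)
opposite = delay_and_sum_gain(180.0, 0.0, 0.04, 2000.0)
```

Because the response in this sketch depends only on the cosine of the arrival angle, it is the same for all directions that make the same angle with the array axis.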
It may be desirable to configure a recording operation to select among several spatially selective filtering operations according to a desired recording direction. For example, a recording operation may be configured to apply a selected one of two or more beam patterns according to the desired recording direction. In such a case, the recording operation may be configured to select the beam pattern whose direction is closest to the desired recording direction.
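Selection of the beam pattern whose direction is closest to the desired recording direction may be sketched as below. The list of beam directions and the function names are hypothetical, and directions are assumed here to be expressed in degrees within a common plane.

```python
def angular_distance_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def select_beam(beam_directions_deg, desired_deg):
    """Return the index of the beam whose direction is closest to desired_deg."""
    return min(range(len(beam_directions_deg)),
               key=lambda i: angular_distance_deg(beam_directions_deg[i], desired_deg))


beams = [0.0, 90.0, 180.0, 270.0]  # hypothetical precomputed beam directions
chosen = select_beam(beams, 100.0)
```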
Additionally or alternatively, a spatially selective recording operation may be configured to select a beam pattern that has a null beam in a desired direction. Such selection may be desirable for blocking sound components from an interfering source. For example, it may be desired to select a beam pattern according to both its direction (i.e., of the main beam) and the direction of its null beam.
As noted above, a beam pattern is typically symmetrical around the axis of the array. For a case in which the microphones are omnidirectional, therefore, the pickup cones that correspond to the specified ranges of direction may be ambiguous with respect to the front and back of the microphone pair.
It may be desirable to calculate a set of beam patterns offline, to support online selection among the beam patterns. For an example in which the device includes multiple possible array configurations (e.g., multiple possible microphone pairs), it may be desirable to calculate a different set of beam patterns offline for each of two or more of the possible array configurations. However, it is also possible to apply the same beam pattern to different array configurations, as a similar response may be expected if the dimensions of the configurations are the same and the individual responses of the microphones of each array are matched.
A spatially selective filter may be implemented to filter a multichannel signal to produce a desired signal in an output channel. Such a filter may also be implemented to produce a noise estimate in another output channel. A potential advantage of such a noise estimate is that it may include nonstationary noise events from other directions. Single-channel audio processing systems are typically unable to distinguish nonstationary noise that occurs in the same frequencies as the desired signal.
Lens L10 of a camera of handset H100 is also arranged on the rear face, and it is assumed in this case that the effective imaging axis of the device is orthogonal to the plane of touchscreen TS10. Alternative placements of lens L10 and corresponding imaging path arrangements are also possible, such as an effective imaging axis that is parallel to either axis of symmetry of touchscreen TS10. A loudspeaker LS10 is arranged in the top center of the front face near microphone MF10, and two other loudspeakers LS20L, LS20R are also provided (e.g., for speakerphone applications).
Handset H100 may be used for video recording via lens L10, using an internal imaging sensor that captures a sequence of images received via the lens and a video recording module that encodes the image sequence for storage and/or transmission. In this case, a front-back microphone pair can be used to record front and back directions (i.e., to steer beams into and away from the camera point direction). Examples of microphone pairs that may be used as an implementation of array R100 to provide directional recording with respect to a front-back axis include microphones MF30 and MR10, microphones MF30 and MR20, and microphones MF10 and MR10, with left and right direction preferences that may be manually or automatically configured. For directional sound recording with respect to one axis that is orthogonal to the front-back axis, an implementation of array R100 that includes microphone pair MR10 and MR20 is one option. For directional sound recording with respect to another axis that is orthogonal to the front-back axis, an implementation of array R100 that includes microphone pair MF20 and MF30 is another option.
It may be desirable to record audio from a particular direction and/or to suppress audio from a particular direction. For example, it may be desirable to record a desired signal that arrives from the direction of the user of the device (e.g., to support narration of the recorded video sequence by the user), or from the direction of a companion of the user, or from the direction of a performance stage or other desired sound source, while suppressing sound arriving from other directions. Alternatively or additionally, it may be desirable to record audio while suppressing interfering sound arriving from a particular direction, such as a loudspeaker of a public address (PA) system, a television or radio, or a loud spectator at a sporting event.
It may also be desirable to provide robust tracking and maintenance of a sound direction. In such case, it may be desirable to implement the device to maintain a selected recording direction, regardless of the current orientation of the device. Once a preferred recording direction has been specified for a given holding angle of the device, for example, it may be desirable to maintain this direction even if the holding angle of the device subsequently changes.
The response of a spatially selective filter as applied to a pair of microphone channels may be described in terms of an angle relative to the array axis.
When the array axis is horizontal, such selectivity may be used to separate signal components that arrive from different directions in a horizontal plane (i.e., a plane that is orthogonal to the gravitational axis). When the array axis is vertical, however, as shown in
It may be desirable to avoid a loss of spatial directivity in a horizontal plane when the device is rotated between a landscape holding position and a portrait holding position. For example, it may be desirable to use a different microphone pair for recording in the new device orientation such that the desired spatial selectivity in the horizontal plane is maintained. The device may include one or more orientation sensors to detect an orientation of the device. When the device is rotated between landscape and portrait holding positions, for example, it may be desirable to detect such rotation and, in response to the detection, to select the microphone pair whose axis is closest to horizontal, given the current device orientation. Typically the location of each of the orientation sensors within the portable device is fixed.
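One possible way to select the microphone pair whose axis is closest to horizontal is sketched below, assuming that an accelerometer supplies a gravity vector in device coordinates and that the microphone locations are known in the same coordinates. The microphone labels, the layout, and the function names are hypothetical and do not correspond to any particular figure.

```python
import math


def _unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def most_horizontal_pair(mic_positions, candidate_pairs, gravity):
    """Return the pair whose axis has the smallest component along gravity."""
    g = _unit(gravity)

    def vertical_component(pair):
        a, b = (mic_positions[name] for name in pair)
        axis = _unit(tuple(q - p for p, q in zip(a, b)))
        # |cos| of the angle between the pair axis and the gravitational axis.
        return abs(sum(x * y for x, y in zip(axis, g)))

    return min(candidate_pairs, key=vertical_component)


# Hypothetical layout in metres (x: short edge, y: long edge, z: through the screen).
mics = {"A": (0.0, 0.0, 0.0), "B": (0.06, 0.0, 0.0), "C": (0.0, 0.12, 0.0)}
pairs = [("A", "B"), ("A", "C")]
portrait_pair = most_horizontal_pair(mics, pairs, (0.0, -9.81, 0.0))
landscape_pair = most_horizontal_pair(mics, pairs, (-9.81, 0.0, 0.0))
```

In this sketch, a pair is scored by the magnitude of the cosine of the angle between its axis and the gravity vector, so a score near zero indicates a nearly horizontal axis.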
Such preservation of a desired spatial selectivity may be obtained by using one or more orientation sensors (e.g., one or more accelerometers, gyroscopic sensors, and/or magnetic sensors) to track the orientation of the handset in space. Such tracking may be performed according to any such technique known in the art. For example, such tracking may be performed according to a technique that supports rotation of the display image on a typical smartphone when changing between a landscape holding position and a portrait holding position. Descriptions of such techniques may be found, for example, in U.S. Publ. Pat. Appls. Nos. 2007/0032886 A1 (Tsai), entitled “ELECTRONIC APPARATUS CAPABLE OF ADJUSTING DISPLAY DIRECTION AND DISPLAY_DIRECTION ADJUSTING METHOD THEREOF”; 2009/0002218 A1 (Rigazio et al.), entitled “DIRECTION AND HOLDING-STYLE INVARIANT, SYMMETRIC DESIGN, TOUCH AND BUTTON BASED REMOTE USER INTERACTION DEVICE”; 2009/0207184 A1 (Laine et al.), entitled “INFORMATION PRESENTATION BASED ON DISPLAY SCREEN ORIENTATION”; and 2010/0129068 A1 (Binda et al.), entitled “DEVICE AND METHOD FOR DETECTING THE ORIENTATION OF AN ELECTRONIC APPARATUS”. Such adjustment of spatial recording directions based on relative phone orientations may help to maintain a consistent spatial image in the audio recording (e.g., with respect to a contemporaneous video recording).
The indications produced by tasks T110 and T130 may have the form of a measure of an angle relative to the gravitational axis (e.g., in degrees or radians). Such a measure may also be indicated as one within a range of values (e.g., an 8-bit value from 0 to 255). In such cases, tasks T120 and T140 may be configured to compare the corresponding indications to a threshold value (e.g., forty-five degrees or a corresponding value in the range) and to select the channel pair according to a result of the comparison. In another example, the indications produced by tasks T110 and T130 are binary values that have one state when the device is in a portrait holding position and the other state when the device is in a landscape holding position (e.g., “0”, “low”, or “off” and “1”, “high”, or “on”, respectively, or vice versa).
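The threshold comparison described above may be sketched as follows. The mapping of the 8-bit range onto zero to ninety degrees, the function names, and the placeholder pair labels are assumptions made for illustration; which pair corresponds to which side of the threshold depends on the particular device.

```python
def angle_from_8bit(value, full_scale_deg=90.0):
    """Map an 8-bit orientation code (0..255) to an angle in degrees.

    Assumption: code 0 maps to 0 degrees and code 255 to full_scale_deg.
    """
    return value * full_scale_deg / 255.0


def select_pair_by_angle(angle_deg, pair_below, pair_above, threshold_deg=45.0):
    """Select a channel pair by comparing the indicated angle to a threshold."""
    return pair_below if angle_deg < threshold_deg else pair_above


# Hypothetical pair labels; "P1" and "P2" stand for two microphone channel pairs.
first = select_pair_by_angle(angle_from_8bit(20), "P1", "P2")
second = select_pair_by_angle(angle_from_8bit(230), "P1", "P2")
```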
Orientation sensor 100 may include one or more inertial sensors, such as gyroscopes and/or accelerometers. A gyroscope uses principles of angular momentum to detect changes in orientation about an axis or about each of two or three (typically orthogonal) axes (e.g., changes in pitch, roll and/or twist). Examples of gyroscopes, which may be fabricated as micro-electromechanical systems (MEMS) devices, include vibratory gyroscopes. An accelerometer detects acceleration along an axis or along each of two or three (typically orthogonal) axes. An accelerometer may also be fabricated as a MEMS device. It is also possible to combine a gyroscope and an accelerometer into a single sensor. Additionally or alternatively, orientation sensor 100 may include one or more magnetic field sensors (e.g., magnetometers), which measure magnetic field strength along an axis or along each of two or three (typically orthogonal) axes. In one example, a magnetic field sensor is used to indicate an orientation of the device in a plane orthogonal to the gravitational axis.
Apparatus A100 may also be implemented such that no microphone channel is common to both selected pairs.
As described above, sensing a rotation about a line that is orthogonal to the gravitational axis may be used to select a microphone pair that is expected to support a desired spatial selectivity in a horizontal plane. Additionally or alternatively to such selection, it may be desirable to maintain recording selectivity in a desired direction in the horizontal plane as the device is rotated about the gravitational axis.
It may be desirable to configure a spatial processing module to maintain a desired directional selectivity regardless of the current orientation of the device. For example, it may be desirable to configure the spatial processing module to select a beam pattern based on a desired direction and on a current orientation of the device about the gravitational axis.
Spatial processing module 300 may be configured to select a beam pattern based on the orientation indication and on at least one specified direction (e.g., the direction of a desired source and/or the direction of an interfering source). Spatial processing module 300 may also be configured to store a reference orientation (e.g., indicating an orientation of the portable device relative to the second axis at a time when the direction was specified). In such case, spatial processing module 300 may be configured to calculate a difference between the indicated orientation and the reference orientation, to subtract this difference from the specified direction to obtain a target direction, and to select a beam pattern that is directed toward the target direction, given the indicated orientation.
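The calculation of the target direction may be sketched as below, assuming that all angles are expressed in degrees about the same axis and in the same rotational sense. The function name and the wrapping of the result to the range from 0 to 360 degrees are assumptions for illustration.

```python
def target_direction_deg(specified_deg, reference_orientation_deg, indicated_orientation_deg):
    """Direction, relative to the device, that keeps pointing at the specified source.

    The rotation of the device since the direction was specified is
    subtracted from the specified direction, and the result is wrapped
    to the range [0, 360).
    """
    rotation = indicated_orientation_deg - reference_orientation_deg
    return (specified_deg - rotation) % 360.0


# Direction specified at 30 degrees while the device orientation was 100 degrees;
# the device has since rotated to 130 degrees.
target = target_direction_deg(30.0, 100.0, 130.0)
```

A beam pattern directed toward the resulting target direction may then be selected, for example by choosing the available pattern whose direction is closest to it.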
It may also be desirable to select a different microphone pair in response to a rotation around the gravitational axis.
It is possible that a user's hand may occlude one or more of the microphones corresponding to the selected pair and adversely affect a desired spatial response. It may be desirable to configure the recording operation to detect such failure of separation (e.g., by detecting a reduction in the filtered output and/or by comparing the output of the selected beam pattern to the output of another beam pattern in a similar direction) and to select a different pair in response to such detecting. Alternatively, it may be desirable to configure the recording operation to select a different beam pattern in response to such detecting.
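One hedged way to compare the output of the selected beam pattern to the output of another beam pattern in a similar direction is to compare frame energies, as sketched below. The energy-ratio test, the threshold value of 0.25, and the function names are assumptions made for illustration and are not a method prescribed by this disclosure.

```python
def frame_energy(frame):
    """Mean squared value of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def separation_failed(selected_output, alternate_output, ratio_threshold=0.25):
    """Flag a likely failure of separation for one frame.

    Returns True when the energy of the selected beam's output falls well
    below that of another beam in a similar direction (assumed criterion).
    """
    e_alt = frame_energy(alternate_output)
    if e_alt == 0.0:
        return False  # nothing to compare against
    return frame_energy(selected_output) / e_alt < ratio_threshold


occluded = separation_failed([0.1] * 80, [1.0] * 80)
normal = separation_failed([1.0] * 80, [1.0] * 80)
```

In practice such a decision might be smoothed over several frames before a different pair or beam pattern is selected; that refinement is not shown here.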
A user interface may be configured to support selection of a desired audio recording direction in a horizontal plane (e.g., two-dimensional selection), and the device may be configured to maintain this recording direction through rotation about the gravitational axis (i.e., an axis that is normal to the earth's surface).
As noted above, it may also be desirable to record an indication of the orientation of the device (e.g., in a plane orthogonal to the gravitational axis) at the time the selection is made. For example, such an indication may be recorded as an angle with respect to a magnetic axis. Selection of a direction of an interfering source for spatially selective suppression may be performed in a similar manner. It may also be desirable for the user interface module to emphasize that a direction being selected is a direction in a horizontal plane by warping the selection display according to the current inclination of the device with respect to a horizontal plane (e.g., a plane normal to the gravitational axis).
For either two-dimensional (e.g., horizontal) or three-dimensional selection, the user interface may be configured for point-and-click selection. For example, during display on touchscreen TS10 of a video sequence currently being captured via lens L10, the user interface module may implement the selection display as an overlay to prompt the user to move the device to place a target (e.g., a cross or colored dot) on the desired source or at the desired direction, and to click a button switch or touch a selection point on the display when the target is placed appropriately to indicate selection of that direction.
The principles of orientation-sensitive recording as described herein may also be extended to recording applications using head-mounted microphones. In such case, it may be desirable to perform orientation tracking using one or more head-mounted implementations of orientation sensor 100.
It may be desirable for array R100 to perform one or more processing operations on the signals produced by the microphones to produce the microphone channels to be selected (e.g., by microphone channel selector 200).
It may be desirable for array R100 to produce each microphone channel as a digital signal, that is to say, as a sequence of samples. Array R210, for example, includes analog-to-digital converters (ADCs) C10a and C10b that are each arranged to sample the corresponding analog channel. Typical sampling rates for acoustic applications include 8 kHz, 12 kHz, 16 kHz, and other frequencies in the range of from about 8 to about 16 kHz, although sampling rates as high as about 44.1, 48, and 192 kHz may also be used. In this particular example, array R210 also includes digital preprocessing stages P20a and P20b that are each configured to perform one or more preprocessing operations (e.g., echo cancellation, noise reduction, and/or spectral shaping) on the corresponding digitized channel to produce the corresponding microphone channels CM1, CM2. Additionally or in the alternative, digital preprocessing stages P20a and P20b may be implemented to perform a frequency transform (e.g., an FFT or MDCT operation) on the corresponding digitized channel to produce the corresponding microphone channels CM1, CM2 in the corresponding frequency domain. Although
Each microphone of array R100 may have a response that is omnidirectional, bidirectional, or unidirectional (e.g., cardioid). The various types of microphones that may be used in array R100 include (without limitation) piezoelectric microphones, dynamic microphones, and electret microphones. In a device for portable voice communications, such as a handset or headset, the center-to-center spacing between adjacent microphones of array R100 is typically in the range of from about 1.5 cm to about 4.5 cm, although a larger spacing (e.g., up to 10 or 15 cm) is also possible in a device such as a handset or smartphone, and even larger spacings (e.g., up to 20, 25 or 30 cm or more) are possible in a device such as a tablet computer. For a far-field application, the center-to-center spacing between adjacent microphones of array R100 is typically in the range of from about four to ten centimeters, although a larger spacing between at least some of the adjacent microphone pairs (e.g., up to 20, 30, or 40 centimeters or more) is also possible in a device such as a flat-panel television display. The microphones of array R100 may be arranged along a line (with uniform or non-uniform microphone spacing) or, alternatively, such that their centers lie at the vertices of a two-dimensional (e.g., triangular) or three-dimensional shape.
The teachings herein with reference to array R100 may be applied to any combination of microphones of the portable device. For example, any two or more (and possibly all) of the microphones of a device as described herein may be used as an implementation of array R100.
It is expressly noted that the microphones may be implemented more generally as transducers sensitive to radiations or emissions other than sound. In one such example, the microphone pair is implemented as a pair of ultrasonic transducers (e.g., transducers sensitive to acoustic frequencies greater than fifteen, twenty, twenty-five, thirty, forty, or fifty kilohertz or more).
It may be desirable to perform a method as described herein within a portable audio sensing device that has an array R100 of two or more microphones configured to receive acoustic signals. Examples of a portable audio sensing device that may be implemented to include such an array and may be used to perform such a method for audio recording and/or voice communications applications include a telephone handset (e.g., a cellular telephone handset); a wired or wireless headset (e.g., a Bluetooth headset); a handheld audio and/or video recorder; a personal media player configured to record audio and/or video content; a personal digital assistant (PDA) or other handheld computing device; and a notebook computer, laptop computer, netbook computer, tablet computer, or other portable computing device. The class of portable computing devices currently includes devices having names such as laptop computers, notebook computers, netbook computers, ultra-portable computers, tablet computers, mobile Internet devices, smartbooks, and smartphones. Such a device may have a top panel that includes a display screen and a bottom panel that may include a keyboard, wherein the two panels may be connected in a clamshell or other hinged relationship. Such a device may be similarly implemented as a tablet computer that includes a touchscreen display on a top surface.
Chip/chipset CS10 includes a receiver which is configured to receive a radio-frequency (RF) communications signal (e.g., via antenna C40) and to decode and reproduce (e.g., via loudspeaker SP10) an audio signal encoded within the RF signal. Chip/chipset CS10 also includes a transmitter which is configured to encode an audio signal that is based on an output signal produced by apparatus A100 (e.g., the spatially selectively filtered signal) and to transmit an RF communications signal (e.g., via antenna C40) that describes the encoded audio signal. For example, one or more processors of chip/chipset CS10 may be configured to perform a noise reduction operation (e.g., Wiener filtering or spectral subtraction, using a noise reference as described above) on one or more channels of the output signal such that the encoded audio signal is based on the noise-reduced signal. In this example, device D20 also includes a keypad C10 and display C20 to support user control and interaction. It is expressly disclosed that applicability of systems, methods, and apparatus disclosed herein is not limited to the particular examples noted herein.
The methods and apparatus disclosed herein may be applied generally in any transceiving and/or audio sensing application, especially mobile or otherwise portable instances of such applications. For example, the range of configurations disclosed herein includes communications devices that reside in a wireless telephony communication system configured to employ a code-division multiple-access (CDMA) over-the-air interface. Nevertheless, it would be understood by those skilled in the art that a method and apparatus having features as described herein may reside in any of the various communication systems employing a wide range of technologies known to those of skill in the art, such as systems employing Voice over IP (VoIP) over wired and/or wireless (e.g., CDMA, TDMA, FDMA, and/or TD-SCDMA) transmission channels.
It is expressly contemplated and hereby disclosed that communications devices disclosed herein may be adapted for use in networks that are packet-switched (for example, wired and/or wireless networks arranged to carry audio transmissions according to protocols such as VoIP) and/or circuit-switched. It is also expressly contemplated and hereby disclosed that communications devices disclosed herein may be adapted for use in narrowband coding systems (e.g., systems that encode an audio frequency range of about four or five kilohertz) and/or for use in wideband coding systems (e.g., systems that encode audio frequencies greater than five kilohertz), including whole-band wideband coding systems and split-band wideband coding systems.
The foregoing presentation of the described configurations is provided to enable any person skilled in the art to make or use the methods and other structures disclosed herein. The flowcharts, block diagrams, and other structures shown and described herein are examples only, and other variants of these structures are also within the scope of the disclosure. Various modifications to these configurations are possible, and the generic principles presented herein may be applied to other configurations as well. Thus, the present disclosure is not intended to be limited to the configurations shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein, including in the attached claims as filed, which form a part of the original disclosure.
Those of skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Important design requirements for implementation of a configuration as disclosed herein may include minimizing processing delay and/or computational complexity (typically measured in millions of instructions per second or MIPS), especially for computation-intensive applications, such as playback of compressed audio or audiovisual information (e.g., a file or stream encoded according to a compression format, such as one of the examples identified herein) or applications for wideband communications (e.g., voice communications at sampling rates higher than eight kilohertz, such as 12, 16, or 44 kHz).
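As a rough illustration of why higher sampling rates increase the processing burden, the following sketch computes the highest representable audio frequency (half the sampling rate) and the number of samples per frame for the sampling rates mentioned above. The 20-millisecond frame length and the function names are assumptions made only for this example.

```python
def nyquist_hz(sample_rate_hz):
    """Highest audio frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def samples_per_frame(sample_rate_hz, frame_ms=20):
    """Samples in one frame; the 20 ms default is an assumed frame length."""
    return sample_rate_hz * frame_ms // 1000


for rate in (8000, 12000, 16000, 44000):
    print(rate, nyquist_hz(rate), samples_per_frame(rate))
```

Because the number of samples per frame grows linearly with the sampling rate, a per-sample operation at 16 kHz handles twice as many samples per frame as the same operation at 8 kHz.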
Goals of a multi-microphone processing system may include achieving ten to twelve dB in overall noise reduction, preserving voice level and color during movement of a desired speaker, obtaining a perception that the noise has been moved into the background instead of an aggressive noise removal, dereverberation of speech, and/or enabling the option of post-processing for more aggressive noise reduction.
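For reference, a noise reduction figure in decibels may be converted to a linear ratio as sketched below. The function names are illustrative only. A reduction of ten dB corresponds to a tenfold reduction in noise power, and twelve dB to roughly a sixteenfold reduction.

```python
def db_to_power_ratio(db):
    """Linear power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Linear amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)


for reduction_db in (10, 12):
    print(reduction_db,
          round(db_to_power_ratio(reduction_db), 2),
          round(db_to_amplitude_ratio(reduction_db), 2))
```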
The various elements of an implementation of an apparatus as disclosed herein (e.g., apparatus A100, A200, A300, and MF100) may be embodied in any combination of hardware with software, and/or with firmware, that is deemed suitable for the intended application. For example, such elements may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Any two or more, or even all, of these elements may be implemented within the same array or arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips).
One or more elements of the various implementations of the apparatus disclosed herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). Any of the various elements of an implementation of an apparatus as disclosed herein may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions, also called “processors”), and any two or more, or even all, of these elements may be implemented within the same such computer or computers.
A processor or other means for processing as disclosed herein may be fabricated as one or more electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips). Examples of such arrays include fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, DSPs, FPGAs, ASSPs, and ASICs. A processor or other means for processing as disclosed herein may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions) or other processors. It is possible for a processor as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to an orientation-sensitive recording procedure, such as a task relating to another operation of a device or system in which the processor is embedded (e.g., an audio sensing device). It is also possible for part of a method as disclosed herein to be performed by a processor of the audio sensing device and for another part of the method to be performed under the control of one or more other processors.
Those of skill will appreciate that the various illustrative modules, logical blocks, circuits, and tests and other operations described in connection with the configurations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Such modules, logical blocks, circuits, and operations may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC or ASSP, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to produce the configuration as disclosed herein. For example, such a configuration may be implemented at least in part as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a general purpose processor or other digital signal processing unit. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A software module may reside in RAM (random-access memory), ROM (read-only memory), nonvolatile RAM (NVRAM) such as flash RAM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. 
An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
It is noted that the various methods disclosed herein may be performed by an array of logic elements such as a processor, and that the various elements of an apparatus as described herein may be implemented as modules designed to execute on such an array. As used herein, the term “module” or “sub-module” can refer to any method, apparatus, device, unit or computer-readable data storage medium that includes computer instructions (e.g., logical expressions) in software, hardware or firmware form. It is to be understood that multiple modules or systems can be combined into one module or system and one module or system can be separated into multiple modules or systems to perform the same functions. When implemented in software or other computer-executable instructions, the elements of a process are essentially the code segments to perform the related tasks, such as with routines, programs, objects, components, data structures, and the like. The term “software” should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples. The program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link.
The implementations of methods, schemes, and techniques disclosed herein may also be tangibly embodied (for example, in one or more computer-readable media as listed herein) as one or more sets of instructions readable and/or executable by a machine including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The term “computer-readable medium” may include any medium that can store or transfer information, including volatile, nonvolatile, removable and non-removable media. Examples of a computer-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette or other magnetic storage, a CD-ROM/DVD or other optical storage, a hard disk, a fiber optic medium, a radio frequency (RF) link, or any other medium which can be used to store the desired information and which can be accessed. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet or an intranet. In any case, the scope of the present disclosure should not be construed as limited by such embodiments.
Each of the tasks of the methods described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. In a typical application of an implementation of a method as disclosed herein, an array of logic elements (e.g., logic gates) is configured to perform one, more than one, or even all of the various tasks of the method. One or more (possibly all) of the tasks may also be implemented as code (e.g., one or more sets of instructions), embodied in a computer program product (e.g., one or more data storage media such as disks, flash or other nonvolatile memory cards, semiconductor memory chips, etc.), that is readable and/or executable by a machine (e.g., a computer) including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The tasks of an implementation of a method as disclosed herein may also be performed by more than one such array or machine. In these or other implementations, the tasks may be performed within a device for wireless communications such as a cellular telephone or other device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). For example, such a device may include RF circuitry configured to receive and/or transmit encoded frames.
It is expressly disclosed that the various methods disclosed herein may be performed by a portable communications device such as a handset, headset, or personal digital assistant (PDA), and that the various apparatus described herein may be included within such a device. A typical real-time (e.g., online) application is a telephone conversation conducted using such a mobile device.
In one or more exemplary embodiments, the operations described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, such operations may be stored on or transmitted over a computer-readable medium as one or more instructions or code. The term “computer-readable media” includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise an array of storage elements, such as semiconductor memory (which may include without limitation dynamic or static RAM, ROM, EEPROM, and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code, in the form of instructions or data structures, in tangible structures that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, and/or microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology such as infrared, radio, and/or microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray Disc™ (Blu-Ray Disc Association, Universal City, Calif.), where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Combinations of the above should also be included within the scope of computer-readable media.
An acoustic signal processing apparatus as described herein may be incorporated into an electronic device, such as a communications device, that accepts speech input in order to control certain operations or that may otherwise benefit from separation of desired sounds from background noises. Many applications may benefit from enhancing a clear desired sound or separating it from background sounds originating from multiple directions. Such applications may include human-machine interfaces in electronic or computing devices which incorporate capabilities such as voice recognition and detection, speech enhancement and separation, voice-activated control, and the like. It may be desirable to implement such an acoustic signal processing apparatus to be suitable for devices that provide only limited processing capabilities.
The elements of the various implementations of the modules, elements, and devices described herein may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or gates. One or more elements of the various implementations of the apparatus described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs, ASSPs, and ASICs.
It is possible for one or more elements of an implementation of an apparatus as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. It is also possible for one or more elements of an implementation of such an apparatus to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times).
The present Application for Patent claims priority to Provisional Application No. 61/406,396, entitled “THREE-DIMENSIONAL SOUND CAPTURING AND REPRODUCING WITH MULTI-MICROPHONES,” filed Oct. 25, 2010, and assigned to the assignee hereof.
Number | Date | Country |
---|---|---|
20120128175 A1 | May 2012 | US |

Number | Date | Country |
---|---|---|
61406396 | Oct 2010 | US |