The present embodiments relate generally to microphone networks. In particular, the locations of microphones and the filter responses to use for maintaining signal from a target while reducing influence from interference sources are determined.
There has been extensive work on the sensor placement problem using a variety of strategies. In design or selection of already implemented microphone arrays, the position of the selected microphones is solved using various approaches. Simulated annealing may be used to simultaneously optimize both weights and sensor locations on a linear array. Sensor location may be found using convex optimization. A binary variable of a sensor being off, 0, or on, 1, is relaxed by letting the variable instead be in the range of [0, 1]. In another relaxation, the unknown vector is converted to a matrix of 0s and 1s that belong to the class of Stiefel matrices. The relaxation is to a 1-d sphere, and multiple dimensions are found using a greedy algorithm. Objective criteria are optimized using the Kullback-Leibler divergence.
There has also been extensive work on the optimization of filterbanks. For example, a quadrature mirror filterbank is optimized to meet user-given frequency response criteria. The ripple energy and out-of-band energy are minimized using a search algorithm whose success is highly dependent on both the starting point and step size. In another example, analysis filters at the microphones are fixed, and the synthesis filters prior to summation are optimized to achieve the best possible reconstruction given a user-specified integer time delay. The problem is converted to an H∞ problem to take advantage of existing software. In yet another example, a multi-dimensional perfect reconstruction filterbank has both the analysis and synthesis filters as FIR filters of equal length. This non-linear and non-convex constraint is embedded directly into the optimization where the objective function measures the difference between a desired analysis filterbank and the optimized analysis filterbank.
With both placement and filter response criteria, it may be difficult or time consuming to determine microphone placement as well as filter response while still meeting the criteria of both decisions.
By way of introduction, the preferred embodiments described below include methods, systems, and computer readable media for placement of microphones and design of filters in a microphone network. Using filterbanks with multiple sub-channels for each microphone, the design of the filter response is solved simultaneously with placement. By using an objective function that penalizes the number of sub-channels in any solution, only some of many possible sub-channels and corresponding microphones and filters are selected while also solving for the filter responses for the selected sub-channels. For a given target location, the location of the microphones and the filter responses to beamform are optimized.
In a first aspect, a method is provided to place microphones and design filters in a microphone network. Possible locations for the microphones of an array of the microphone network are determined in a region. Two or more sub-channels are assigned for each of the possible locations, and a filter is assigned for each of the sub-channels. For a target source in the region, a sub-set of the possible locations and filter responses for the filters of the sub-channels of the sub-set are solved. The solutions for the sub-set of the possible locations and the filter responses for the sub-set are simultaneous. The filter responses for the sub-set are linked to the microphones at the possible locations of the sub-set.
In a second aspect, a system is provided for placing microphones and designing filters. A processor is configured to determine possible locations for the microphones of an array of the microphone network in a region, assign two or more sub-channels for each of the possible locations and a filter for each of the sub-channels, and, for a target source in the region, solve for a sub-set of the possible locations and filter responses for the filters of the sub-channels of the sub-set. The solutions for the sub-set of the possible locations and the filter responses for the sub-set are simultaneous. A memory is configured to store the filter responses for the sub-set and the possible locations of the sub-set.
In a third aspect, a system is provided to filter microphone signals. A plurality of beamformer channels each include a microphone, a first filter having at least two sub-channels, a communication network connecting output of the sub-channels to second filters, and the second filters configured to filter the outputs of the sub-channels of the first filters where filter responses of the second filters are from a simultaneous solution of location of the microphones and the filter responses. A summer is configured to sum outputs from the beamformer channels.
The present invention is defined by the following claims, and nothing in this section should be taken as a limitation on those claims. Further aspects and advantages of the invention are discussed below in conjunction with the preferred embodiments and may be later claimed independently or in combination.
Given a fixed number of sensors, optimization is used to determine a best possible beam pattern. The placement of the fixed number of sensors is simultaneously solved as part of the optimization. A sensing system may use a large number of N sensors (microphones) placed in multiple dimensions to monitor an acoustic field. Using and/or implementing all the microphones at once is impractical because of the amount of data generated. Instead, a sub-set of D microphones is selected to be active. The D set (i.e., sub-set of N) of microphones that minimizes the largest interference gain at multiple frequencies while monitoring a target of interest is determined. A direct, combinatorial approach (testing all N-choose-D subsets of microphones) is impractical because of the problem size. Instead, a convex optimization induces sparsity through an ℓ1-penalty to determine which subset of microphones to use. Not only is the optimal placement (i.e., location in space) of microphones determined, but how to process the output of each microphone (e.g., in time and/or frequency) is also optimized.
The output of each of the N microphones is processed by an individual multirate filterbank, providing C sub-channels for separately processing the microphone signals. The N processed filterbank outputs are then combined to form one final signal. In this approach, the analysis filters implemented locally to the microphones are fixed, and the optimization is over all the synthesis filters applied to the outputs of the analysis filters. The continuous frequency problem is converted to a discrete frequency approximation that is computationally tractable for the optimization. In this random source/multirate filterbank case, the optimization is over space-time-frequency simultaneously. Both the placement of the microphones and the processing of each microphone's sampled signal are optimized to monitor a target while attenuating other interfering sources.
The audio systems are designed or used to monitor targets in complex environments. Industrial environments may use the audio system. For example, engineering managers are interested in monitoring specific bearings on a wind turbine, car manufacturers are interested in the sound of a specific piston, or train conductors are interested in detecting aberrant sounds in a specific wheel set. The optimization provides for the audio system to monitor specific locations while reducing signal from interference sources at other locations. The audio system may operate where microphones cannot be placed adjacent to the target of interest, where a quiet or interference-free environment does not exist, and/or where the interference sources' location and signature are not known. A large number of interferences with known locations or a small number of interferences with unknown locations may be modeled. In addition, a limited number of microphones is possible due to bandwidth or other constraints. Other environments than industrial may benefit from the audio system, such as medical, acoustic monitoring, sonography, or surveillance.
To make the problem computationally tractable, possible microphone locations are discretized so that there is a finite set of possible microphone locations. Choosing a reduced number of microphone locations from a set of possible microphone locations is a combinatorial problem, and, for even a moderate size problem, the number of possibilities may be overwhelming.
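The size of the combinatorial search can be illustrated with a short sketch. The sizes used here (32 possible locations with 8 active microphones, and 100 possible locations with 10 active microphones) are assumptions chosen only for illustration, and the helper name `count_subsets` is hypothetical.

```python
from math import comb

def count_subsets(num_possible: int, num_selected: int) -> int:
    """Number of distinct ways to choose num_selected locations out of num_possible."""
    return comb(num_possible, num_selected)

# Hypothetical problem sizes; each candidate subset would need its own filter design.
print(count_subsets(32, 8))     # 10518300
print(count_subsets(100, 10))
```

Even for these moderate sizes, evaluating every subset directly is costly, which motivates the sparsity-inducing convex formulation.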
In one embodiment, the p-norm of interference source gains is minimized while both reconstructing perfectly the target source and using a sparse number of sub-channels of the filterbanks of the microphones. In the problem model, there are two types of sources: interferences, I, whose gains are to be attenuated and a single target, whose gain from system processing is to be exactly equal to 1. In other words, the system processes the target source with no distortion, but embodiments allowing for some distortion of the target may be provided. In one example representation, the optimization of the filter responses, G, is represented as:
refers to the product of two frequency domain objects: the propagation of source r to microphone n and the target inversion filter specific to microphone n, D is a signal decimation factor, C is the number of sub-channels per filterbank, and VS is the desired number of active sub-channels. Unfortunately, this is not a convex optimization problem. The set of N·C analysis filters, denoted by F, may be fixed and known where N is the number of microphones and C is the number of sub-channels for each microphone, resulting in:
being a known quantity.
The locations of the sources and microphones are assumed. A separate audio system (i.e., microphone placement or selection and filter responses for the selected microphones) is designed or determined for different target source locations, allowing scanning of a region by sequential or simultaneous application of different audio systems. By fixing or assuming the locations of the sources and the microphones, the cascade or product, denoted by H, of the source propagation and target inversion filters is also known. This image model defines the possible locations for microphones, a sub-set of which are selected in optimization. There are I+1 sources (the I interferences and a single target source) and N microphones. Hence, H has (I+1)·N transfer functions. H may be defined as follows
The optimization is then defined over the N·C synthesis filters, denoted by the filter response G, where G is defined as follows:
To make finding the unknown G computationally tractable, the continuous variable for frequency, w, 0≦w<2π, is discretized with Nf equally spaced frequency points. This is represented as:
To simplify the computations, the number of discretized frequencies, Nf, is treated as an even multiple of D (the down sampling factor). That is:
Nf=M·D (6)
with M a positive even integer. In addition, to vectorize the computations, the indices n (microphone) and l (sub-channel) are combined into a single index, s (for sub-channels), by letting
s=l+nC and S=N·C (7)
Hence, the discretized version of unknown synthesis filters G, Gcompute, has S·Nf unknown complex numbers, that is:
By penalizing non-zero sub-channel synthesis tap coefficients in the optimization problem with an absolute value penalty term, in the spirit of the LASSO algorithm, certain sub-channels are forced to be considered inactive, creating a sparse set of active sub-channels. A sub-channel is considered inactive if the synthesis tap coefficients are zero or close to zero, such as a threshold amount from zero.
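The effect of an absolute value penalty can be sketched with a generic, real-valued LASSO problem solved by iterative soft thresholding. This is not the objective function of the embodiments: the matrix `A`, vector `b`, problem sizes, tolerance, and all function names are assumptions for illustration. The sketch also shows the flat index s=l+nC being mapped back to a (microphone, sub-channel) pair when deciding which sub-channels are active.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (A.T @ (A @ x - b)), step * lam)
    return x

def active_subchannels(x, C, tol=1e-6):
    """Map flat indices s = l + n*C with |x[s]| > tol back to (n, l) pairs."""
    return [(int(s) // C, int(s) % C) for s in np.flatnonzero(np.abs(x) > tol)]

rng = np.random.default_rng(0)
N, C = 8, 2                               # hypothetical: 8 locations, 2 sub-channels each
A = rng.standard_normal((40, N * C))      # stand-in data, not an acoustic model
b = rng.standard_normal(40)
lam_max = np.max(np.abs(A.T @ b))         # at or above this value the LASSO solution is zero

for lam in (0.0, 0.5 * lam_max, lam_max):
    x = ista_lasso(A, b, lam)
    print(lam, active_subchannels(x, C))
```

The tolerance `tol` plays the role of the threshold amount from zero below which a sub-channel is treated as inactive.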
In the optimization, the gain of the interference and source are calculated. Any measure of gain may be used. In one example, the gain is measured as a time-averaged energy assuming a fixed source x. The gain is computed for each of the I+1 sources, xr. The objective function of the optimization includes two terms, the p-norm of interference source gains and the sparse sub-channel penalty term.
The target is at one given location, (3, 2.5) in this example. The audio system is optimized for a target source at this location. The target is an acoustic source of interest. For other target locations, other audio systems are separately optimized. The multiple audio systems may then be used to monitor the room. For example, a scan is performed by applying different audio systems and analyzing the output signals. If the output signal of one audio system has desired characteristics, the target location for that audio system is identified as the location of the target source at that time.
In
The microphones are shown as being in any of 32 possible locations, x, distributed uniformly along the walls of the room. Other numbers of possible locations may be provided, such as tens, hundreds, or thousands. Non-uniform spacing and/or possible locations in the interior may be used. Each possible location, x, represents a location of any number of sub-channels, such as two sub-channels for each location, resulting in 64 total sub-channels.
The optimization may be for selection of existing microphones. For example,
Additional, different, or fewer components may be provided. For example, a server, computer, or processor connects with the output of the summer 18. The output is a combined signal with attenuated interferences and maintenance of the target signal. This output signal may be analyzed by the processor, such as analyzing pitch, frequency distribution, or another characteristic. As another example, a memory is provided for recording the audio signal output by the summer 18.
The beamformer channels 10 each include a microphone 12, an analysis filter 14, a communications path 15, and a synthesis filter 16. Additional, different, or fewer components may be provided. For example, the analysis filter 14 and synthesis filter 16 are combined into one filterbank. As another example, a pre-amplifier and analog-to-digital converter are provided between the microphone 12 and the analysis filter 14. In yet another example, the communications path 15 is not provided, such as where the analysis filter 14 and synthesis filter 16 are located in a same housing or room.
The microphone 12 is a transducer for converting acoustic energy into electrical energy. Piezoelectric, drum, membrane, or other microphones may be used. In other embodiments, other sensors than acoustic sensors are used.
The analysis filter 14 is a finite impulse response filter, but infinite impulse response or other filters may be used. The analysis filter 14 has a fixed frequency response, such as a low pass, high pass, or bandpass frequency response. Discrete hardware or a programmable filter is used to implement the analysis filter 14. In one embodiment, the analysis filter 14 represents the frequency response of any electronics between the microphone 12 and the communications path 15 (e.g., pre-amp, analog-to-digital converter, down sampler, and any filters, such as filtering after conversion and/or down sampling). The design of the microphone 12 and the electronics is used to determine the frequency response of the analysis filter 14, and/or the frequency response is measured.
In one embodiment, the analysis filter 14 includes a decimator for down sampling the output provided to the communications path 15. The data rate from the sampled audio signal of the microphone 12 is reduced for communication to the synthesis filter 16. An up sampler is provided in the synthesis filter 16 to up sample to the original data rate or another data rate. In alternative embodiments, down sampling and/or corresponding up sampling is not used or are provided separately from the filters.
The communications path 15 is a communications network, such as an Ethernet network. TCP/IP network communications are used. The communication network connects the output of the analysis filter 14 to the input of the synthesis filter 16. Alternatively, the communications path 15 is a wired or wireless direct connection between the analysis filter 14 and the synthesis filter 16. Any format for communications may be used.
The synthesis filter 16 is a programmable filter. The weights for one or more taps are programmable to provide different frequency response. A finite impulse or infinite impulse response filter is used. In one embodiment, the synthesis filter 16 is implemented by a processor configured for filtering, such as a general processor of a computer or server, a digital signal processor, or field programmable gate array. In other embodiments, the synthesis filter 16 is implemented as filter hardware, such as an application specific integrated circuit. The synthesis filters 16 of the different channels 10 are implemented by the same or different devices.
The synthesis filter 16 is spaced from the analysis filter 14 by the communications path 15. For example, the synthesis filter 16 is part of a control processor or computer for a building and/or the audio system while the analysis filter 14 is positioned with the microphone 12 in or by the region to be monitored.
The synthesis filters 16 each have an individually programmable frequency response. By using different and/or the same frequency response for different channels 10, the summation of the signals from the different channels may attenuate interference and maintain target sound. The analysis filter 14 may be the same for each channel 10, such as where each channel 10 uses the same electronics before the communications path 15, but may be different. The synthesis filter 16 filters the output of the analysis filter 14 after any down sampling, communication transmission, and up sampling. The frequency response used for the synthesis filter 16 of each channel 10 is determined by simultaneous solution with the location of the microphone 12 and the filter response.
The summer 18 is implemented by the same processor or component as the synthesis filter 16. Alternatively, a separate summer is used, such as a node connecting the outputs of the synthesis filters 16 or a summing device. The summer 18 combines the filtered outputs from the synthesis filters 16. The combination provides an audio signal sampled digitally with attenuated interference and maintained source acoustics. The combination of the location of the microphones 12 and the programmable filter response of the synthesis filters 16 acts to reduce sound from some locations and maintain sound from a desired location within the monitored region. The optimization finds not only the microphone locations but also the corresponding beamforming weights in the form of frequency response or filter tap values. In other words, the optimization places the microphones among a sub-set of the possible locations and offers filter responses to process the sampled output of each of the placed microphones.
This processing scheme operates as a delay-scale-sum beamformer. A chosen delay and amplitude scaling are applied to each of the N microphones, and the resulting N processed signals are summed to give a final output. In the frequency domain, this delay and scaling beamforming weight is represented as simply a scaled complex exponential. Each of the N microphones samples the continuous time signal at an appropriate sampling rate (at or above the Nyquist rate). A Discrete Fourier Transform (DFT) of sufficient length to achieve the needed frequency resolution is then taken on each of the N streams of discrete samples. If the original source signals consist only of a pure tone (i.e., single frequency) and the correct sampling rate and DFT length are chosen, the DFT produces an output of DFT coefficients with only one non-zero entry. For each of the N sets of DFT coefficients, the system multiplies the coefficient at the non-zero frequency bin by the computed beamforming weight. The beamforming weights may vary for each of the N processing streams. A set of weights that can be used to further process the DFT of the input signals is generated. The channel 10 is implemented in the time domain, so the inverse DFT (IDFT) of the DFT coefficients provides the values of the taps of the synthesis filters 16.
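The single-tone case may be sketched as follows. The DFT length, tone bin, amplitudes, and delays are assumed values, and a complex exponential is used as the pure tone so that exactly one DFT coefficient is non-zero. The weights here simply undo the known delay and scaling; they are not the optimized weights of the embodiments.

```python
import numpy as np

L = 64                                      # DFT length (assumed)
k0 = 5                                      # the tone sits exactly on DFT bin k0 (assumed)
t = np.arange(L)
amps = np.array([1.0, 0.8, 0.5, 0.9])       # per-microphone amplitude scaling (assumed)
delays = np.array([0.0, 1.5, 3.0, 4.25])    # per-microphone delay in samples (assumed)
N = len(amps)

# Each microphone observes a scaled, delayed copy of a complex pure tone.
x = amps[:, None] * np.exp(2j * np.pi * k0 * (t[None, :] - delays[:, None]) / L)
X = np.fft.fft(x, axis=1)

# Count DFT coefficients above round-off level for each microphone.
nonzero_bins = np.sum(np.abs(X) > 1e-6 * L, axis=1)

# Beamforming weight: a scaled complex exponential per microphone.
w = np.exp(2j * np.pi * k0 * delays / L) / (N * amps)

# Weight the non-zero bin of each stream, sum across microphones, return to time domain.
Y = np.zeros(L, dtype=complex)
Y[k0] = np.sum(w * X[:, k0])
y = np.fft.ifft(Y)

reference = np.exp(2j * np.pi * k0 * t / L)  # undelayed, unit-amplitude tone
print(nonzero_bins, np.allclose(y, reference))
```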
As used herein, the Discrete Time Fourier Transform (DTFT) of a discrete function x[t], X(w), is defined as:
If z=e^{jw} in the z-transform, the DTFT is:
In the case where the acoustic signals are broadband (e.g., the signals are assumed to be a sum of F narrowband signals), the optimization finds the optimal placement of microphones and also computes beamforming weights for each of the F frequencies of interest. If the original source signals are only of a sum of F pure tones and the correct sampling rate and DFT length are chosen, the DFT transform produces an output of DFT coefficients with only F non-zero entries. The optimization computes beamforming weights for each of the F non-zero entries for each of the N processing streams (i.e., channels 10).
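A sketch of the multi-tone case follows, again with assumed values for the DFT length, tone bins, amplitudes, and delays. One weight is held per channel and per tone, and the IDFT of each channel's weights gives time-domain taps, applied here by circular convolution for simplicity. The helper names `tones` and `circular_filter` are hypothetical.

```python
import numpy as np

L = 64                                   # DFT length (assumed)
tone_bins = [3, 7, 12]                   # F = 3 tones, each on a DFT bin (assumed)
t = np.arange(L)
amps = np.array([1.0, 0.7, 0.4])         # per-channel amplitude scaling (assumed)
delays = np.array([0.0, 2.0, 3.5])       # per-channel delay in samples (assumed)
N = len(amps)

def tones(shift):
    """Sum of F complex tones, each delayed by `shift` samples."""
    return sum(np.exp(2j * np.pi * k * (t - shift) / L) for k in tone_bins)

source = tones(0.0)
x = np.stack([a * tones(d) for a, d in zip(amps, delays)])

# One beamforming weight per channel and per tone; all other bins stay zero.
W = np.zeros((N, L), dtype=complex)
for k in tone_bins:
    W[:, k] = np.exp(2j * np.pi * k * delays / L) / (N * amps)

# The IDFT of each channel's weights gives that channel's time-domain taps.
taps = np.fft.ifft(W, axis=1)

def circular_filter(h, sig):
    """Circular convolution: y[t] = sum_m h[m] * sig[(t - m) mod L]."""
    return sum(h[m] * np.roll(sig, m) for m in range(L))

y = sum(circular_filter(taps[n], x[n]) for n in range(N))
print(np.allclose(y, source))
```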
As represented in
The microphone placement and filter response processing is generalized, providing N multirate filterbanks, each processing the corresponding output of one of the N microphones. Each of the N filterbanks decomposes the discrete input into C sub-channels, resulting in a total of N·C sub-channels. The output of between N and N·C microphones 12 is used without violating bandwidth constraints by selecting sub-channels to process in each filterbank. The placement of the microphones is refined to be placement by sub-channel. By using N microphones and all C sub-channels of each of the N microphones, the bandwidth constraint of N·C sub-channels is fulfilled. If using N·C microphones but only choosing to use one sub-channel of each of the microphones, the bandwidth constraint of N·C·1 sub-channels is also fulfilled. In other words, instead of choosing the placement of N microphones out of a set of P possible microphone locations, a subset of N·C sub-channels to use out of a possible P·C sub-channels is chosen. Where relatively inexpensive microphones are deployed and the bandwidth for transferring the data collected by each microphone is the main expense, this selection may reduce the bandwidth cost.
In multirate filterbanks, each sub-channel is processed by both an analysis and a synthesis filter 14, 16. The analysis filters 14 are fixed to reduce the computational complexity, and, instead, the tap values for the synthesis filters 16 for each of the chosen N·C sub-channels are computed in the solution. Computing the filters for the multirate filterbanks generalizes computing beamforming weights. The DFT and IDFT implementation may be interpreted as the analysis and synthesis filtering respectively. The choice of beamforming weights corresponds to the choice of the synthesis filters.
In the embodiment represented in
The processor 20 is a general processor, server, computer, digital signal processor, field programmable gate array, application specific integrated circuit, analog circuit, digital circuit, combinations thereof, or other now known or later developed device for solving an objective function (see equation (1)). The processor 20 is configured by hardware, firmware, and/or software to solve the objective function.
In one embodiment, the processor 20 is configured to determine possible locations for the microphones of an array of the microphone network in a region. The possible locations correspond to locations of existing microphones. The optimization provides a selection of a sub-set of the existing microphones. Alternatively or additionally, the possible locations correspond to locations where microphones may be installed, such as along a uniform grid in the region to be monitored. The optimization provides a selection of a sub-set of the possible locations for installation of the microphones.
The processor 20 is configured to assign two or more sub-channels for each of the possible locations and a filter for each of the sub-channels. In the modeling for solving the objective function, the possible locations designate not just the physical microphone location, but also the origin of the particular sub-channel. Each possible location has two or more sub-channels for selection or placement, so the optimization may result in one, some, all, or none of the sub-channels for a particular possible location.
The processor 20 is configured to solve an objective function that simultaneously provides for the selection or placement of sub-channels and for the filter response to be used for the synthesis filter for each selected or placed sub-channel. The solution is for a given target source, such as a given target source location in the region to be monitored. Given the assigned sub-channels and corresponding filters for all the possible locations, the processor 20 uses the filter response model 28 and location model 26 represented as terms in the objective function (e.g., see equation 1) to provide an audio system for the given target source (e.g., see
The processor 20 may implement the synthesis filters, so may be configured to apply the optimized filter responses for different channels. The summer may also be implemented by the processor 20. In other embodiments, different devices implement the synthesis filters and the summer, and the processor 20 is used for optimization.
The memory 22 is a database, cache, random access, hard drive, optical, removable, or other memory. The memory 22 is configured by the processor 20 or other processor to store the filter responses for the sub-set of sub-channels and the locations of the sub-set of sub-channels provided by the optimization. Alternatively, the processor 20 transmits the selection or placement and configures the synthesis filters with the filter responses without storage in the memory 22. The memory 22 may store other information, such as information input to and/or created during the optimization.
Alternatively or additionally, the memory 22 is a computer-readable storage device for storing instructions. The instructions, when implemented by the processor 20, cause the processor 20 to solve the objective function. The instructions for implementing the processes, methods, and/or techniques discussed herein are provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, removable media, hard drive or other computer readable storage media. Computer readable storage media include various types of volatile and nonvolatile storage media. The functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of instructions stored in or on computer readable storage media. The functions, acts or tasks are independent of the particular type of instructions set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing and the like.
The optimization part of the method is performed by the system of
The optimization and performance solution may operate for one or both of single and multi-frequency sources. Regardless of the type of interference and/or type of target, the optimizing of both microphone weights and positions simultaneously maintains the target acoustics while attenuating the interference acoustics, at least for a given target source location. Other audio systems may be optimized and performed for other target source locations.
Additional, different, or fewer acts may be provided. For example, acts 46 and 48 are not provided, such as where the method is for optimization without performance using the solution. As another example, acts for communicating and/or controlling are provided. In yet another example, acts 40 and 42 are combined, such as where the sub-channels and filters are provided with the microphones in an image model of placement of the possible locations.
The acts are performed in the order shown (e.g., top to bottom) or other order. For example, act 42 is performed prior to act 40. In another example, acts 40-48 or acts 44-46 are repeated for a same microphone array in a same region, but a different target source location.
In act 40, the possible locations for microphones are determined. An array of a microphone network is to be provided or already exists in a region. The possible locations are actual locations of microphones where only a sub-set are to be used for any given target source location or are locations where microphones may be later placed where only a sub-set of the possible locations are selected for later placing actual microphones. For example, the possible locations correspond to locations that may be included in a design or to locations for an already designed array. The possible locations may be uniformly spaced, but non-uniform spacing may be used. The possible locations are distributed in one, two, or three dimensions.
In act 42, two or more sub-channels are assigned for each of the possible locations. A filter is also assigned for each of the sub-channels. Using the modeling, the processor provides for sub-channels and corresponding synthesis filters for each possible location for microphones. In alternative embodiments, only one sub-channel and corresponding filter sequence (i.e., one channel without frequency division) is provided for each microphone.
Each sub-channel and corresponding filter is for a range of frequencies. The spectrum is divided into two or more ranges, such as low and high frequency sub-channels. Each sub-channel filters for signal content in the assigned frequency range. For each microphone or possible location, the same frequency divisions are used, but different divisions may be used for different possible locations.
Each sub-channel filter may be assigned as a combination of an analysis filter and synthesis filter, such as a fixed analysis filter and a programmable synthesis filter for each sub-channel. For example, an FIR filter with a plurality of taps in a multirate filterbank is assigned to each sub-channel. By linking the filters and the microphone placement, this assignment in the modeling may be used to solve for both placement and filter response simultaneously.
For N microphones, the output of each of the N microphones, after pre-filtering, is processed by an individual filterbank. Each filterbank is implemented as a multi-rate, finite-impulse response (FIR) filterbank, as shown in
The subscript r of xr indexes the I+1 sources, which include I interference sources. It is the overall gain of the interference sources that is to be minimized. The “+1” is for the target source, whose gain is to be exactly 1 or as close to 1 as possible so that the audio system perfectly or closely reconstructs the sound from the target. In the index r, the target source is treated as r=0, so the target source is denoted by x0. The target source is modeled with a recorded or expected signal or is modeled with a broadband (e.g., white noise), narrowband, or single frequency signal.
Referring to
The n-th filterbank receives a sampled input originating from acoustic source xr. Each microphone samples at the same uniform rate, and this rate is sufficient to recover all I+1 sources, each of which is assumed to be bandlimited. The sampled input is given by xr,n[k]=xr,n(kTs), where k is an integer time index and Ts is the sampling period. Referring to
where Gn,l is the transfer function of the synthesis filter of filterbank n's sub-channel l, and Fn,l is the transfer function of the analysis filter of filterbank n's sub-channel l. In short, yr,n is the processed output of filterbank n given an input signal propagating from source xr.
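The signal path through one filterbank (analysis filter, decimation, up sampling, synthesis filter, and summation over sub-channels) may be sketched as follows. The two-sub-channel Haar filter pair and the decimation factor of 2 are assumptions chosen because they are known to reconstruct the input with a one-sample delay; in the embodiments, the synthesis taps would instead come from the optimization. The function name `filterbank` is hypothetical.

```python
import numpy as np

def filterbank(x, analysis, synthesis, D):
    """Analysis filter -> decimate by D -> upsample by D -> synthesis filter,
    summed over all sub-channels of one filterbank."""
    y = np.zeros(len(x))
    for f, g in zip(analysis, synthesis):
        v = np.convolve(x, f)[:len(x)]     # causal analysis filtering at the microphone
        d = v[::D]                         # decimated stream sent over the communications path
        u = np.zeros(len(x))
        u[::D] = d                         # zero-insertion up sampling at the receiver
        y += np.convolve(u, g)[:len(x)]    # synthesis filtering, then sum over sub-channels
    return y

r = 1.0 / np.sqrt(2.0)
analysis = [np.array([r, r]), np.array([r, -r])]     # low and high frequency sub-channels
synthesis = [np.array([r, r]), np.array([-r, r])]    # matching Haar synthesis pair

rng = np.random.default_rng(1)
x = rng.standard_normal(16)                # stand-in for one microphone's sampled signal
y = filterbank(x, analysis, synthesis, D=2)
print(np.allclose(y[1:], x[:-1]))          # output is the input delayed by one sample
```

Setting the synthesis taps of a sub-channel to zero in this sketch removes its contribution, which corresponds to an inactive sub-channel that need not be transmitted.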
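The monitored region is modeled as I+1 acoustic point sources. Given the transfer function Hr,n from source xr to the input of filterbank n, the z-transform of that input is:
$$X_{r,n}(z) = H_{r,n}(z)\,X_r(z) \qquad (12)$$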
Note that the target inversion pre-filter does not vary with source xr but only varies with filterbank n. Ideally, target source x0 enters each of the N filterbanks with only an amplitude scaling and with its original phase. Assuming that the propagation P0,n from target source x0 to microphone n is inverted perfectly by prefilter I0,n, then the cascade is represented as:
where α0,n is a real scalar representing the amplitude change from propagation and ΔHn represents a processing delay.
Referring again to
The solution is optimized for a given target source location. For other target source locations, a different solution may result. A bank of audio systems or separate solutions may be used to scan the region to determine if an expected target is at various target locations. Alternatively, a single audio system is used to monitor for a target at the given target location.
The solution in one model provides coefficients in the frequency or z-transform domain. By converting back to the temporal domain, the values for taps of a FIR synthesis filter may be determined. Alternatively, the model is performed in the time domain, solving for the values of the taps. In yet other embodiments, the filtering is applied in the frequency or z-transform domain, so the filter response in that frequency or z-transform domain is used.
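As one non-limiting illustration of converting back to the temporal domain, the following minimal sketch recovers FIR tap values from a filter response sampled at Nf uniformly spaced frequencies by an inverse discrete Fourier transform. It assumes sampling at w=2πf/Nf and a filter of at most Nf taps. The function names `freq_response` and `taps_from_response` and the tap values are hypothetical and are not taken from the embodiments.

```python
import cmath

def freq_response(taps, n_f):
    """Sample the FIR response H(e^{jw}) at w = 2*pi*f/n_f, f = 0..n_f-1."""
    return [
        sum(h * cmath.exp(-2j * cmath.pi * f * k / n_f) for k, h in enumerate(taps))
        for f in range(n_f)
    ]

def taps_from_response(response):
    """Inverse DFT: recover n_f time-domain taps from n_f frequency samples."""
    n_f = len(response)
    return [
        sum(g * cmath.exp(2j * cmath.pi * f * k / n_f) for f, g in enumerate(response)) / n_f
        for k in range(n_f)
    ]

# Hypothetical real taps, used only to check the round trip.
taps = [0.5, -0.25, 0.125, 0.0]
recovered = taps_from_response(freq_response(taps, 4))
# For a conjugate-symmetric response the imaginary parts vanish up to rounding.
real_taps = [t.real for t in recovered]
```

Because the sampled response of a real filter is conjugate symmetric, the recovered taps are real up to rounding error, so only the real parts are kept.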
The solution is handled as a convex optimization. An objective function is solved. The objective function includes two or more terms. For example, one term is a p-norm of a gain of I interferences from interference sources, and another term is a penalty for the sub-channels. Both terms include consideration of the synthesis or other programmable filter responses, G, and the penalty term selects placement from a sub-set of possible locations for the microphones and corresponding sub-channel origins.
In one embodiment, the objective function includes the p-norm of the gain of the I interferences represented as JI,p, and the other term penalizing active sub-channels represented as JS. Active sub-channels are those sub-channels with non-zero synthesis filter responses. Typical p-norms of interest are p=1, 2, ∞. The severity of the active sub-channel penalty is adjusted by changing a non-negative constant, λ, to weight the active sub-channel term, JS. The larger λ chosen, the more severe the active sub-channel penalty and the fewer number of active sub-channels recovered in the optimization. Conversely, the smaller λ chosen, the less severe the active sub-channel penalty and the greater number active sub-channels recovered by the optimization. By choosing λ equal to 0, the active sub-channel penalty term is eliminated altogether, allowing the use of all N·C sub-channels (i.e., selection of all of the possible locations and sub-channels for each location).
The optimization is over N·C synthesis filter responses where N is the number of microphones and C is the number of sub-channels for each microphone. The set of synthesis filter responses are denoted as G in equations (1, 4, and 8). One example expression of the objective function is:
J(G)=JI,p(G)+λ·JS(G) (14)
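G is a function of the continuous frequency variable, w. To make J(G) computationally tractable, G is discretized, represented as Gcompute, defined in equation (8). Both JI,p and JS are updated using the discretized representation, providing the approximation Jcompute(Gcompute) of the objective function J(G), that is:
$$J_{\mathrm{compute}}(G_{\mathrm{compute}}) = J_{I,p,\mathrm{compute}}(G_{\mathrm{compute}}) + \lambda\cdot J_{S,\mathrm{compute}}(G_{\mathrm{compute}}) \qquad (15)$$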
Other objective functions with different or additional terms may be used.
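As a hedged illustration of how the two terms of the objective function combine, the sketch below evaluates a p-norm of given interference energies plus λ times a sub-channel penalty. The penalty is taken here as the sum of each sub-channel's peak response magnitude over the discrete frequencies, which is one reading of the L1-like penalty described herein. The function names and the numeric values are hypothetical, and the sketch only evaluates the objective; it does not perform the optimization.

```python
def p_norm(values, p):
    """p-norm of a list of values; p may be 1, 2, or float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def subchannel_penalty(responses):
    """Sum over sub-channels of the peak magnitude across discrete frequencies."""
    return sum(max(abs(g) for g in row) for row in responses)

def objective(interference_energies, responses, lam, p=2):
    """J = J_I,p + lam * J_S for given energies and sampled responses."""
    return p_norm(interference_energies, p) + lam * subchannel_penalty(responses)

# Hypothetical inputs: two interferers, two sub-channels, two frequencies each.
energies = [3.0, 4.0]
responses = [[1.0, 0.5], [0.0, 0.0]]
j_no_penalty = objective(energies, responses, lam=0.0, p=2)
j_inf = objective(energies, responses, lam=2.0, p=float("inf"))
```

Setting `lam` to zero removes the penalty, so the objective reduces to the interference term alone.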
In the solution, one term being optimized is JI,p(G), which is provided for minimizing the interference sources. The interference sources may be modeled after expected interference. In other embodiments, the interference is modeled as white noise.
The p-norm of the interference gains is the p-norm of the interference sources' time-averaged energies, that is:
$$J_{I,p}(G) = \left\| \left( \sigma_{y_r}^{2} \right)_{r} \right\|_{p} \qquad (16)$$
where the time-averaged energy σ2yr is given below. Given that H and F are fixed and known from measurement or design for all n, the time-averaged energy only varies with synthesis filter responses G for each n.
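For the case p=∞, JI,p(G) becomes the largest of the interference sources' time-averaged energies:
$$J_{I,\infty}(G) = \max_{r}\, \sigma_{y_r}^{2} \qquad (17)$$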
The value σ2yr is discretized for calculation in the optimization. The discretization is over the frequency, w, as a set of finite, uniformly spaced points (e.g., 16 frequencies) to give a σ2yrcompute, a computationally tractable term. Assume that all the sources have equal variance, σ2x, to further simplify computations of σ2yr.
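The effect of discretizing over frequency can be sketched for a simplified case. The example below assumes a single FIR filter without any rate change, driven by white noise of variance σ2x, which is a simplification and not the multirate expression of the embodiments. The time-averaged output energy is approximated by averaging the squared magnitude response over Nf uniformly spaced frequencies. The function name and tap values are hypothetical.

```python
import cmath

def discretized_energy(taps, n_f, var_x=1.0):
    """Approximate (1/2pi) * integral of var_x*|H(e^{jw})|^2 dw by an n_f-point average."""
    total = 0.0
    for f in range(n_f):
        w = 2.0 * cmath.pi * f / n_f
        h = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(taps))
        total += abs(h) ** 2
    return var_x * total / n_f

# Hypothetical taps; single-rate simplification of the multirate filterbank.
taps = [1.0, -0.5, 0.25]
approx = discretized_energy(taps, 16)
exact = sum(c * c for c in taps)  # Parseval: energy of the impulse response
```

In this simplified setting the uniform average matches the integral whenever Nf is at least the number of taps, which suggests why a modest number of points such as 16 may suffice for short filters.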
One expression of σ2yr is provided as:
where D is a period. By observing that Hn,D×D(ejw) is a diagonal matrix in Qn,D×D(ejw), a scalar expression of Qn,D×D(ejw)d1,d2 results:
where d1 and d2 are the row and column indexes of the matrix. Substituting equation (19) into equation (18) yields:
The integral in equation (20) is approximated as follows:
Substituting equation (21) for the integral in equation (20) yields σ2yrcompute, an approximation of σ2yr that is computationally tractable. This is expressed as:
Assuming that each source xr has the same variance, that is, σ2xr=σ2x for all r, the leading coefficient may be treated as a constant, resulting in equation (22) becoming:
To simplify notation, an equality is defined as:
The value of the product in equation (24) is known by assumption. The summations over n and l inside the magnitude squared are combined into a single summation over s (for sub-channel) by letting:
s=l+nC and S=N·C (25)
Hence, equation (23) becomes:
(26)
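The index combination of equation (25) can be illustrated with a short sketch that flattens a (filterbank, sub-channel) pair into the single index s and recovers the pair again. The function names `to_flat` and `from_flat` and the values of N and C are hypothetical.

```python
def to_flat(n, l, c):
    """Flatten (filterbank n, sub-channel l) into s = l + n*C."""
    return l + n * c

def from_flat(s, c):
    """Recover (n, l) from the flat sub-channel index s."""
    return divmod(s, c)  # (n, l), since 0 <= l < C

N, C = 4, 3          # hypothetical numbers of filterbanks and sub-channels
S = N * C            # total number of sub-channels
flat = [to_flat(n, l, C) for n in range(N) for l in range(C)]
```

Each pair maps to a distinct s in the range 0 to S−1, so one summation over s visits every sub-channel of every filterbank exactly once.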
To efficiently compute equation (26), the equation is rewritten as a product of a row vector, matrix, and column vector, where the row and column vector contain the unknown discretized frequency responses of all S synthesis filters. To begin, the magnitude squared of equation (26) is expanded, and the finite summations are rearranged to get:
using φ for F. In order to reduce the summations over both d1 and f to a single summation over f, Nf is assumed to be an even multiple of D, that is:
$$N_f = M\cdot D \qquad (28)$$
with M a positive even integer. Rewriting the arguments to Gs1 and
φr(s1, s2, f) is M-periodic in f, that is:
φr(s1,s2,f−M)=φr(s1,s2,f) (30)
Equation (29) becomes:
in the relationship ḟ=(f−Md1) mod Nf, ḟ=0, 1, . . . , Nf−1. In addition, the z-transform is 2π-periodic in w for z=ejw, which means (f−Md1) mod Nf may be reindexed in the arguments of Gs1 and
Assuming that the analysis and the synthesis filters' FIR coefficients are real, the term:
is conjugate symmetric in continuous variable f. Hence, equation (32) is rewritten as follows:
For a fixed f, any of the three summations over s1 and s2 of equation (32) may be expressed as the product of a row vector, a matrix, and a column vector, that is:
is a row vector, size 1×S, containing all S synthesis filters' responses at discretized frequency f. The entries of the square matrix φr,S×S(f), size S×S, are given as:
In addition, φr,S×S(f) may be expressed as the product of the analysis matrix
and its conjugate that is:
where the matrix
size S by D, is defined as:
and consists of D·N column vectors, size C by 1, defined in row vector notation as:
with d∈{0, 1, . . . , D−1}. Hence, equation (34) is rewritten as follows:
The right hand side of equation (33) is then expressed as a product of a block diagonal matrix and column vector, that is:
and by transpose and conjugation, the following is provided:
JI,p(G) may thus be approximated in a computationally tractable form. Using equation (42), the computationally tractable approximation of equation (16) is provided as:
The JS term of the objective function of equation (14) is a penalty term. The penalty term forces selection of a sparse array of sub-channels. A computationally tractable and efficient sparse sub-channel penalty term JS,compute(Gcompute) may be derived. The derivation begins by defining JS,sgn(G), which counts the number of active sub-channels by determining whether each sub-channel's synthesis filter frequency response is non-zero. Alternatively, a sufficiently low (e.g., thresholded) level of frequency response may be treated as zero response. The continuous frequency variable, w, is discretized along a finite set of uniformly or otherwise spaced points to give the computationally tractable term JS,sgn,compute(Gcompute). Finally, an L1-like penalty is substituted, both to increase computational efficiency and to induce sparse solutions, giving the desired JS,compute(Gcompute).
Solving the objective function not only minimizes the gain (e.g., time-averaged energy or other measure of gain) of interference sources but also encourages sparse sub-channels. Sparse sub-channels are a small sub-set of the possible sub-channels. For example, only a few of the N·C sub-channels are active. As before, N is the number of microphones, and C is the number of sub-channels of each microphone. In one embodiment, a sub-channel is inactive if its synthesis filter frequency response is zero or very small in magnitude. Any threshold may be used for “very small.” The number of active sub-channels is represented as follows:
where sgn²(x), the square of the sign function, is 1 if x<0, 0 if x=0, and 1 if x>0. In other words, a sub-channel is considered active if any portion of its frequency response is non-zero. The range 0≦w<π rather than 0≦w<2π is used since the filter taps are real and hence the frequency response is conjugate symmetric. Equation (46) counts the number of active sub-channels, and since sgn² is applied to the maximum of absolute values, the value of equation (46) lies in the appropriate range of 0 to N·C.
To make equation (46) computationally tractable, the continuous frequency variable, w, is discretized using the Nf points as is done for equation (21), so equation (46) becomes:
The summations over n and l are combined into a single summation over s using equation (25), as before. In addition, in the spirit of compressive sampling, the sgn² function is replaced by the absolute value. Equation (47) becomes:
The penalty term of the objective function is thus a sum, over the sub-channels, of the maximum absolute value (an infinity norm) of each synthesis filter response over the discrete frequencies.
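The difference between counting active sub-channels and the relaxed penalty may be illustrated as follows. This is a minimal sketch under the assumption that each sub-channel's synthesis response is available as a list of samples at the discrete frequencies; the function names and sample values are hypothetical.

```python
def count_active(responses):
    """Number of sub-channels whose sampled response is not identically zero."""
    def sgn2(x):
        return 0 if x == 0 else 1  # square of the sign function
    return sum(sgn2(max(abs(g) for g in row)) for row in responses)

def relaxed_penalty(responses):
    """Convex surrogate: sgn^2 replaced by the absolute value of the peak magnitude."""
    return sum(abs(max(abs(g) for g in row)) for row in responses)

# Hypothetical sampled responses of three sub-channels at three frequencies.
responses = [
    [0.0, 0.0, 0.0],         # inactive sub-channel
    [0.2 + 0.1j, 0.0, 0.5],  # active, peak magnitude 0.5
    [0.0, -0.25, 0.0],       # active, peak magnitude 0.25
]
n_active = count_active(responses)
penalty = relaxed_penalty(responses)
```

The count is a discontinuous function of the responses, whereas the relaxed penalty varies continuously with the response magnitudes and shrinks to zero only when every sub-channel response is zero.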
In the objective function of equation (15), equation (48) is used. Shortening the notation provides:
$$J_{S,\mathrm{compute}}(G_{\mathrm{compute}}) = J_{S,\mathrm{abs},\mathrm{compute}}(G_{\mathrm{compute}}) \qquad (49).$$
Using this term with the constant λ, the optimization may be iteratively performed. Different values of the constant are tested until the desired number of sub-channels results from minimization of the objective function. The user inputs a number of sub-channels to be used in the audio system. The number is less than N*C. The optimization solves with the penalty term including a count of the sub-channels with the respective frequency responses above a threshold or active. The sub-channels with the respective frequency response above the threshold are included in the sub-set of the placement, and the sub-channels with the respective frequency response below the threshold are not included in the sub-set. Different values of the constant result in different numbers of sub-channels in the active and inactive sub-sets.
In one embodiment, the optimization problem is run iteratively to tune the parameter λ at each iteration until the desired number (e.g., 20 out of 64) of active sub-channels results. Any search pattern or approach may be used to select the next value of the constant to use in each iteration. For example, λ, a non-negative scalar, is found through a bisection algorithm since, as λ increases, the number of active sub-channels decreases, and similarly, as λ decreases, the number of active sub-channels increases.
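A bisection search of this kind may be sketched as follows. The sketch assumes only that the number of active sub-channels is non-increasing in λ. The callable `active_count` stands in for running the full optimization at a given λ, and the toy model used here (a sub-channel stays active while a fixed peak value exceeds λ) is purely hypothetical.

```python
def tune_lambda(active_count, target, lo=0.0, hi=1.0, max_iter=60):
    """Bisection on lam, assuming active_count(lam) is non-increasing in lam."""
    # Grow the upper bracket until the penalty is severe enough.
    while active_count(hi) > target:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        count = active_count(mid)
        if count == target:
            return mid
        if count > target:
            lo = mid  # too many active sub-channels: increase lam
        else:
            hi = mid  # too few active sub-channels: decrease lam
    return None  # the target may be skipped if several sub-channels drop together

# Hypothetical stand-in for the optimization (not the described solver).
peaks = [0.9, 0.7, 0.4, 0.35, 0.1, 0.05]

def toy_count(lam):
    return sum(1 for p in peaks if p > lam)

lam = tune_lambda(toy_count, target=3)
```

In practice each evaluation of `active_count` would be a complete solve, so the number of bisection steps directly sets the computational cost of the tuning.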
In one embodiment, a sub-channel is considered inactive if the maximum magnitude of its synthesis filter response is less than 1/1000 of the greatest maximum magnitude of the responses of the synthesis filters.
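The relative threshold of this embodiment can be sketched as below, assuming each sub-channel's synthesis response is given as samples at the discrete frequencies. The function name `active_subchannels` and the sample values are hypothetical.

```python
def active_subchannels(responses, rel_threshold=1e-3):
    """Indices of sub-channels whose peak magnitude is at least rel_threshold
    times the greatest peak magnitude over all sub-channels."""
    peaks = [max(abs(g) for g in row) for row in responses]
    greatest = max(peaks)
    if greatest == 0.0:
        return []  # every response is zero, so nothing is active
    return [s for s, p in enumerate(peaks) if p >= rel_threshold * greatest]

# Hypothetical sampled responses of three sub-channels.
responses = [
    [2.0, 1.0],    # peak 2.0 (the greatest)
    [1e-4, 5e-4],  # peak 5e-4 is below 2.0/1000: inactive
    [0.0, 0.01],   # peak 0.01: active
]
selected = active_subchannels(responses)
```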
The objective function with the multiple terms is subject to a constraint of target source perfect reconstruction during the minimization. Other than perfect reconstruction may be used in alternative embodiments. The target perfect reconstruction condition, TPR, discretizes the continuous variable, w, using Nf points. For f=0, 1, . . . ,Nf−1, the TPR is given as:
Since there are D constraints for each of the Nf discretized frequencies, the TPR condition has a total of Nf·D constraints.
In matrix-vector form, the D target perfect reconstruction conditions of equation (50) for a fixed f∈{0, 1, . . . ,Nf−1} are written as:
where S=N·C, the number of sub-channels. The matrix ḞS×D(f), size D by S, is defined as:
and includes D·N row vectors, size 1 by C, defined as:
with d∈{0, 1, . . . , D−1}.
The column vector
size S by 1, is defined as:
and includes N column vectors, size C by 1, defined as:
The column vector eTk,1×D, size D×1, is defined as the D×1 zero vector but with the k-th entry set to 1, that is:
If the same set of C analysis filters is used for each filterbank, that is:
for all 0≦n1, n2≦N−1, 0≦d≦D−1, and 0≦f≦Nf−1, and the target inversion pre-filter removes the effect on phase from propagation perfectly, that is:
for all 0≦n≦N−1, 0≦d≦D−1, and 0≦f≦Nf−1, then equation (52) is of rank=min(D,C) since every C columns are scalar multiples of the previous C columns. Since D constraints are to be fulfilled, these two additional assumptions imply that D≦C.
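The rank argument may be checked numerically with a small sketch. It assumes real, rational matrix entries for exactness, whereas the actual entries are complex frequency responses, and it builds a D-by-N·C matrix in which each block of C columns is a scalar multiple of the first block. The function names and matrix values are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def stacked(base, scales):
    """Place scaled copies of a D-by-C block side by side: D rows, N*C columns."""
    return [[a * v for a in scales for v in row] for row in base]

# Hypothetical blocks; the scales play the role of the amplitude changes.
wide = stacked([[1, 2, 3], [0, 1, 4]], [1, 2, -3, 5])    # D=2, C=3, N=4
tall = stacked([[1, 2], [3, 4], [5, 6]], [1, 2, -3, 5])  # D=3, C=2, N=4
```

With D=2 and C=3 the rank equals D, so the D constraints can be met. With D=3 and C=2 the rank is only 2 regardless of N, which illustrates why D≦C is needed under these two assumptions.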
If all filter taps are real, then the number of constraints may be almost halved using the conjugate symmetry of filter responses and the 2π periodicity in w in the z-transform for z=ejw. First, if equation (51) holds for 0≦f≦Nf−1, then the conjugate of the entire equation also holds, that is:
where the last equality follows because eT0,1×D contains all real entries. Next:
where the second line follows from conjugate symmetry and the third line follows from 2π periodicity in w. In summary, if TPRcompute(f) holds, so does TPRcompute(−f mod Nf). Hence, the constraints of equation (51) for all f are written as a block-diagonal matrix-vector product, that is:
and by transposition:
where the block diagonal matrix
size
is given by:
the row vector of unknowns
size
is given by:
and the row vector of constraints
size
is given by:
As before, S represents the total number of sub-channels and is equal to the product of the number of filterbanks N and the number of sub-channels per filterbank C, that is, S=N·C.
In addition, by conjugating equation (62), the terms are consistent with equation (40) where the unknown synthesis filters are a column vector, resulting in:
The results of the above equations provide the objective function to be optimized. This objective function is computationally tractable. The optimization is represented as:
where Gcompute is given by equation (8), JI,p,compute is given by equation (45), JS,compute is given by equation (48) via equation (49), and TPRcompute is given by equation (63). The optimization of equation (68) is convex.
The solution of act 44 may be iteratively performed to provide a desired, predetermined, or user set number of placed sub-channels. Different values of λ are used until the optimization results in the set number of sub-channels. In alternative embodiments, a given value of λ is used and the resulting sub-set of sub-channels, regardless of the specific number, are placed or used. In yet another embodiment, different values of λ are used until the optimization results in a number of microphones being used. Each selected microphone of the sub-set may be associated with all or only some of the available sub-channels for that microphone.
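Determining which microphones are used from a set of active sub-channels may be sketched as follows, assuming the flat index s=l+nC of equation (25). The function name `microphones_in_use` and the example indices are hypothetical.

```python
def microphones_in_use(active_s, c):
    """Map active flat sub-channel indices s = l + n*C to microphone indices n."""
    per_mic = {}
    for s in sorted(active_s):
        n, l = divmod(s, c)
        per_mic.setdefault(n, []).append(l)
    return per_mic

# Hypothetical result: 5 active sub-channels out of N*C = 4*4 = 16.
selected = microphones_in_use([0, 1, 6, 13, 15], c=4)
num_microphones = len(selected)
```

Here three of the four microphones are selected, and each selected microphone uses only some of its four available sub-channels.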
Referring again to
The linking associates the optimized filter responses for the synthesis filters with each selected sub-channel. Labeling, loading the filter taps into the synthesis filter, assignment by reference number, or other linking associates the appropriate filter response with the appropriate microphone and microphone placement. For each of the selected possible locations of the sub-set identified by solving the objective function, linked filter responses are provided.
The linking is stored. For example, the association is stored with the filter responses. When the audio system for the target source location is to be used, the linked filter responses are loaded from memory into the programmable synthesis filters. The communication network provides the analysis filtered outputs for the desired or selected sub-channels from microphones at the selected locations for filtering by the programmed synthesis filters. Alternatively, the linking is used to program the synthesis filters without storage.
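One possible way to store and reload the linking is sketched below, assuming the association is kept as a mapping from a target location label to the FIR taps of each selected sub-channel. The JSON format, the function names, and the label `target_A` are hypothetical choices made for illustration only.

```python
import json

def store_linking(linking):
    """Serialize {target location: {sub-channel index: FIR taps}} to a JSON string."""
    return json.dumps(linking, sort_keys=True)

def load_linking(text):
    """Restore the mapping; JSON keys are strings, so sub-channel indices are re-cast."""
    raw = json.loads(text)
    return {loc: {int(s): taps for s, taps in subs.items()} for loc, subs in raw.items()}

# Hypothetical stored solution for one target location label.
linking = {"target_A": {3: [0.5, -0.25], 10: [0.125, 0.0]}}
restored = load_linking(store_linking(linking))
```

Keying the stored data by target location allows the taps for a given audio system to be looked up and loaded into the programmable synthesis filters when that system is activated.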
Acts 44 and 46 may be repeated for different target source locations and/or target source acoustic signals. An optimization is performed for each target source location and/or signal. Each optimization may result in different sub-sets of sub-channels and corresponding microphone locations of the possible locations. The same availability of sub-channels and possible locations are used, but the difference in location of the target source results in different placement of microphones and sub-channels as well as different filter responses. The same placement of microphones and/or sub-channels may occur with different filter responses or vice versa.
The repetition results in different audio systems for different target source locations and/or target signals. Where the same microphone array is to be used for the different audio systems, the microphones needed for all the audio systems are placed, either through selection of existing microphones or installation of microphones. When any given audio system is active, the sub-set of sub-channels for that audio system is active or used.
In act 48, the optimized audio system is used. The microphones and sub-channels for the audio system are activated and/or connected through the communications network. The synthesis filters are programmed with the optimized filter responses. The beamformer designed by the optimization is established or configured with the microphones and sub-channels of the sub-set of possible locations.
Once configuration is complete, the signal or data representing the audio signals sensed by the microphones are processed along the beamformer channels. The active sub-channels provide the processing. For each active sub-channel, analysis filtering and synthesis filtering are provided. The resulting sub-channel signals are summed, providing signal or data representing the target source, if any, at the target location with attenuation of any interference sources.
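The per-sub-channel processing and summation may be sketched in simplified form. The example below assumes single-rate FIR filtering and omits the decimation and expansion of a multirate filterbank, so it illustrates only the signal flow. The function names, signals, and tap values are hypothetical.

```python
def fir(signal, taps):
    """Causal FIR filtering, output truncated to the input length."""
    return [
        sum(taps[k] * signal[i - k] for k in range(len(taps)) if i - k >= 0)
        for i in range(len(signal))
    ]

def beamform(mic_signals, subchannels):
    """Sum the outputs of the active sub-channels.

    subchannels: list of (microphone index, analysis taps, synthesis taps).
    """
    length = len(next(iter(mic_signals.values())))
    out = [0.0] * length
    for n, analysis, synthesis in subchannels:
        # Analysis filtering followed by synthesis filtering (rate changes omitted).
        y = fir(fir(mic_signals[n], analysis), synthesis)
        out = [a + b for a, b in zip(out, y)]
    return out

# Hypothetical signals from two selected microphones and two active sub-channels.
mics = {0: [1.0, 0.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0, 0.0]}
active = [(0, [1.0], [0.5]), (2, [1.0, 1.0], [0.5])]
output = beamform(mics, active)
```

Only microphones and sub-channels listed as active contribute to the sum, which mirrors the use of the selected sub-set in the configured audio system.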
In one example using the 32 possible locations of microphones and target source location of
The objective function used Nf=16. Nine of the 16 discrete frequencies are unique since all the filter taps are real and therefore the frequency responses are conjugate symmetric.
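The count of unique frequencies can be verified with a brief sketch. It assumes real FIR taps sampled at w=2πf/Nf with Nf even; the function names and tap values are hypothetical.

```python
import cmath

def dft(taps, n_f):
    """Frequency response of real FIR taps at w = 2*pi*f/n_f."""
    return [
        sum(h * cmath.exp(-2j * cmath.pi * f * k / n_f) for k, h in enumerate(taps))
        for f in range(n_f)
    ]

def unique_frequency_count(n_f):
    """Indices 0..n_f/2 suffice when the response is conjugate symmetric (n_f even)."""
    return n_f // 2 + 1

n_f = 16
response = dft([0.3, -0.1, 0.7, 0.2], n_f)  # hypothetical real taps
# The response at index (-f mod n_f) should equal the conjugate of the response at f.
max_mismatch = max(
    abs(response[(-f) % n_f] - response[f].conjugate()) for f in range(n_f)
)
```

The responses at indices 9 through 15 are the conjugates of those at indices 7 through 1, leaving indices 0 through 8 as the nine unique frequencies.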
While there have been shown, described and pointed out fundamental novel features of the invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the form and details of the methods and systems illustrated and in its operation may be made by those skilled in the art without departing from the spirit of the invention. It is the intention, therefore, to be limited only as indicated by the scope of the claims.
The present application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/212,147, filed on Aug. 31, 2015, which is incorporated herein by reference in its entirety.
One or more aspects described herein were supported by the National Science Foundation (NSF) under contract numbers DMS-1109498 and 1440493. The U.S. Government may have certain rights in the claimed inventions.
Number | Date | Country
---|---|---
62/212,147 | Aug 2015 | US