The present invention relates to a sound source localization method and a sound system, and more particularly, to a sound source localization method and a sound system with low computational complexity and high accuracy.
Sound source localization is an important technology in the field of sound signal processing. In sound source separation or in the reduction of environmental noise interference, knowing the position of the target source or of the interference source greatly improves the performance of the separation or the noise cancellation. In addition, in voice-related processing applications, the location of the sound source is also important information for the system, for example, for confirming the position of the speaker in a video conference or identifying the direction of the talker for a smart robot. Generally, a more accurate sound source localization system requires a microphone array, in which a plurality of microphones are arranged at different positions in space in a certain manner. Owing to its spatial selectivity, the microphone array may perform sound source localization within a certain range.
The multiple signal classification (MUSIC) algorithm is a commonly used sound source localization method. However, the MUSIC algorithm has high computational complexity, since it requires an exhaustive search over candidate directions, and it cannot localize the sound source accurately.
Therefore, it is necessary to improve the prior art.
It is, therefore, a primary objective of the present invention to provide a sound source localization method and a sound system with low computational complexity and high accuracy, to improve over the disadvantages of the prior art.
An embodiment of the present invention discloses a sound source localization method applied to a sound system, the sound system comprising a microphone array, the method comprising: the microphone array receiving a received signal; establishing a cost function according to the received signal; forming a plurality of particles, wherein the plurality of particles are a plurality of virtual particles; and computing a plurality of update positions of the plurality of particles according to a plurality of current positions of the plurality of particles and the cost function, and obtaining at least one sound source location according to the plurality of update positions.
An embodiment of the present invention further discloses a sound system, comprising: a microphone array, comprising a plurality of microphones, configured to receive a received signal; and a sound source localization module, configured to perform the following steps: establishing a cost function according to the received signal; forming a plurality of particles, wherein the plurality of particles are a plurality of virtual particles; and computing a plurality of update positions of the plurality of particles according to a plurality of current positions of the plurality of particles and the cost function, and obtaining at least one sound source location according to the plurality of update positions.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Unlike the prior art, the sound source localization module 14 may obtain a sound source location from a received signal received by the microphone array 12, e.g., by using a particle swarm optimization (PSO) algorithm.
Step 202: The microphone array receives a received signal.
Step 204: Establish a cost function according to the received signal.
Step 206: Form a plurality of particles.
Step 208: Compute a plurality of update positions of the plurality of particles according to a plurality of current positions of the plurality of particles and the cost function, and obtain at least one sound source location according to the plurality of update positions.
In Step 202, the microphone array 12 receives a received signal r, which may be expressed in vector notation as r=[r1, . . . , rM]T, wherein rm represents the signal received by the microphone 120_m and T denotes transposition.
In Step 204, the sound source localization module 14 establishes a cost function CF according to the received signal r. The cost function CF may represent or reflect the reliability of the computed sound source location, and there is a monotonically increasing or monotonically decreasing relation between the cost function CF and this reliability. When the relation is monotonically increasing, a larger cost value of the cost function CF indicates a higher reliability of the computed sound source location.
Methods of establishing the cost function CF are not limited. In an embodiment, the function used in the MUSIC algorithm (denoted SMUSIC) may be applied as the cost function CF in Step 204.
In detail, the sound source localization module 14 may compute a correlation matrix Rrr of the received signal r as Rrr=E[r·rH], where rH denotes the conjugate (Hermitian) transpose of r. The notation E[⋅] represents an averaging operation, which may be an ensemble average or a time average in statistics.
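The time-average form of E[⋅] can be illustrated with a short Python sketch. It assumes NumPy and a hypothetical snapshot matrix whose K columns are successive received vectors r, and simply averages r·rH over those snapshots; it is an illustration, not the implementation of the module 14.

```python
import numpy as np

def correlation_matrix(snapshots: np.ndarray) -> np.ndarray:
    """Time-average estimate of Rrr = E[r r^H].

    snapshots: complex array of shape (M, K); column k is the k-th
    received vector r = [r1, ..., rM]^T (assumed data layout).
    """
    _, k = snapshots.shape
    # Sum of the outer products r_k r_k^H, divided by the number of snapshots K.
    return snapshots @ snapshots.conj().T / k

# Usage: 4 microphones, 200 snapshots of complex white noise.
rng = np.random.default_rng(0)
r = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
Rrr = correlation_matrix(r)
print(Rrr.shape)                        # (4, 4)
print(np.allclose(Rrr, Rrr.conj().T))   # True: Rrr is Hermitian
```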
After the sound source localization module 14 obtains the correlation matrix Rrr, the sound source localization module 14 can perform an eigenvalue decomposition on the correlation matrix Rrr to obtain a plurality of eigenvalues λ1, . . . , λM and a plurality of eigenvectors v1, . . . , vM of the correlation matrix Rrr, wherein λ1≥ . . . ≥λM and the eigenvectors v1, . . . , vM correspond to the eigenvalues λ1, . . . , λM, respectively.
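A minimal NumPy sketch of this decomposition follows. Because Rrr is Hermitian, `numpy.linalg.eigh` applies; it returns eigenvalues in ascending order, so the sketch reverses them to obtain λ1≥ . . . ≥λM. The function name `sorted_eig` and the 2×2 example matrix are illustrative only.

```python
import numpy as np

def sorted_eig(Rrr: np.ndarray):
    """Eigen-decomposition of a Hermitian matrix with eigenvalues ordered
    lambda_1 >= ... >= lambda_M; column m of the result is v_m."""
    eigvals, eigvecs = np.linalg.eigh(Rrr)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # indices for descending order
    return eigvals[order], eigvecs[:, order]

# Usage with a small Hermitian example matrix (eigenvalues 3 and 1).
Rrr = np.array([[2.0, 1.0j], [-1.0j, 2.0]])
lam, V = sorted_eig(Rrr)
print(lam)
```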
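After the sound source localization module 14 obtains the eigenvectors v1, . . . , vM, the sound source localization module 14 can establish a projection matrix PN corresponding to a noise subspace as
PN=vD+1·vD+1H+ . . . +vM·vMH,
i.e., the sum of vm·vmH over m=D+1, . . . , M, wherein D is the number of sound sources, and M is the number of microphones within the microphone array. The eigenvectors vD+1, . . . , vM, associated with the M−D smallest eigenvalues, span the noise subspace.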
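Under the standard MUSIC reading that PN projects onto the span of the eigenvectors vD+1, . . . , vM, the projection matrix can be sketched as below. The helper name and the synthetic rank-one correlation matrix are assumptions for illustration; for that matrix, PN should equal I−a·aH for the unit-norm vector a.

```python
import numpy as np

def noise_projection(Rrr: np.ndarray, num_sources: int) -> np.ndarray:
    """PN = sum over m = D+1..M of v_m v_m^H, using the M - D eigenvectors
    with the smallest eigenvalues (the noise subspace)."""
    eigvals, eigvecs = np.linalg.eigh(Rrr)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # lambda_1 >= ... >= lambda_M
    noise_vecs = eigvecs[:, order][:, num_sources:]   # v_{D+1}, ..., v_M
    return noise_vecs @ noise_vecs.conj().T

# Usage: one "source" direction a plus a small noise floor, M = 3, D = 1.
a = np.array([1.0, 1j, -1.0]) / np.sqrt(3)
Rrr = 5.0 * np.outer(a, a.conj()) + 0.1 * np.eye(3)
PN = noise_projection(Rrr, num_sources=1)
print(np.allclose(PN @ PN, PN))   # True: PN is a projection
print(abs(a.conj() @ PN @ a))     # close to 0: a lies in the signal subspace
```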
In addition, the sound source localization module 14 can obtain an array manifold vector a corresponding to the microphone array 12 according to the topology of the microphone array 12. For example, if the microphone array 12 is a uniform linear array (ULA) as shown in
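The exact manifold depends on the array topology. As one illustration, the following sketch shows a common far-field, narrowband model of the array manifold vector for an arbitrary microphone layout. The coordinate convention (elevation θ measured from the x-y plane), the phase sign, the speed of sound of 343 m/s and the function name are all assumptions, not the definition used in this document.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air

def array_manifold(mic_pos: np.ndarray, theta: float, phi: float, freq: float) -> np.ndarray:
    """Hypothetical far-field, narrowband array manifold vector a(theta, phi).

    mic_pos: (M, 3) microphone coordinates in metres.
    theta: elevation (rad) above the x-y plane; phi: azimuth (rad).
    """
    # Unit vector pointing from the array towards the source.
    u = np.array([np.cos(theta) * np.cos(phi),
                  np.cos(theta) * np.sin(phi),
                  np.sin(theta)])
    # Relative phase of each microphone for a plane wave arriving from direction u.
    return np.exp(1j * 2 * np.pi * freq * (mic_pos @ u) / SPEED_OF_SOUND)

# Usage: 4-element uniform linear array on the x axis with 4 cm spacing.
d = 0.04
mics = np.column_stack([np.arange(4) * d, np.zeros(4), np.zeros(4)])
a = array_manifold(mics, theta=0.0, phi=np.pi / 3, freq=1000.0)
print(np.round(np.abs(a), 6))   # every entry has unit magnitude
```

With a linear array along the x axis, only the product cosθ·cosφ affects the phases, so a linear array alone cannot separate elevation from azimuth; a planar or 3-D layout is needed to resolve both angles.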
After the sound source localization module 14 obtains the array manifold vector a, the sound source localization module 14 can obtain the cost function CF, i.e., the function SMUSIC, as CF(θ, φ)=SMUSIC(θ, φ)=1/(aH(θ, φ)·PN·a(θ, φ)) according to the projection matrix PN and the array manifold vector a. Since the array manifold vector of a true source lies in the signal subspace, which is orthogonal to the noise subspace, when (θSS, φSS) corresponds to a sound source location SS, ideally aH(θSS, φSS)·PN·a(θSS, φSS)=0, so that CF(θSS, φSS)=SMUSIC(θSS, φSS) tends to infinity. In practice, Rrr is estimated from finite data, so the denominator is small but nonzero and CF exhibits a sharp peak at the sound source location.
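To see this peak behaviour, the following self-contained NumPy sketch simulates one source (D=1) impinging on a hypothetical 6-microphone uniform circular array, builds Rrr, PN and CF(θ, φ)=1/(aH·PN·a), and evaluates CF on a 2° grid. The array geometry, frequency, noise level and far-field model are assumptions chosen for illustration; elevation is restricted to θ≥0 because a planar array cannot distinguish a source above the array plane from its mirror image below it. This grid evaluation is exactly the kind of exhaustive search that the PSO approach aims to avoid; it is shown only to visualize the cost surface.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air (m/s)

def manifold(mic_pos, theta, phi, freq):
    """Assumed far-field model: one steering vector per (theta, phi) pair."""
    u = np.stack([np.cos(theta) * np.cos(phi),
                  np.cos(theta) * np.sin(phi),
                  np.sin(theta)])                          # shape (3,) or (3, G)
    return np.exp(2j * np.pi * freq * (mic_pos @ u) / C)  # shape (M,) or (M, G)

def music_cost(PN, mic_pos, theta, phi, freq):
    """CF(theta, phi) = 1 / (a^H PN a) for 1-D arrays theta and phi of length G."""
    a = manifold(mic_pos, theta, phi, freq)                # (M, G)
    denom = np.real(np.sum(a.conj() * (PN @ a), axis=0))  # a^H PN a per column
    return 1.0 / denom

# Simulated scene (all values assumed): 6 microphones on a 5 cm circle, one source.
rng = np.random.default_rng(1)
M, K, D, freq = 6, 400, 1, 2000.0
ang = 2 * np.pi * np.arange(M) / M
mics = 0.05 * np.column_stack([np.cos(ang), np.sin(ang), np.zeros(M)])
theta0, phi0 = np.deg2rad(30.0), np.deg2rad(120.0)
a0 = manifold(mics, theta0, phi0, freq)                    # (M,)
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
noise = 0.1 * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
r = np.outer(a0, s) + noise                                # snapshots, shape (M, K)

# Rrr (time average), eigen-decomposition and noise-subspace projection PN.
Rrr = r @ r.conj().T / K
_, V = np.linalg.eigh(Rrr)      # ascending eigenvalues
En = V[:, : M - D]              # eigenvectors of the M - D smallest eigenvalues
PN = En @ En.conj().T

# Evaluate CF on a 2-degree grid (elevation 0..90, azimuth 0..358).
th = np.deg2rad(np.arange(0, 91, 2))
ph = np.deg2rad(np.arange(0, 360, 2))
TH, PH = np.meshgrid(th, ph, indexing="ij")
cf = music_cost(PN, mics, TH.ravel(), PH.ravel(), freq).reshape(TH.shape)
i, j = np.unravel_index(np.argmax(cf), cf.shape)
est_theta, est_phi = np.rad2deg(th[i]), np.rad2deg(ph[j])
print(est_theta, est_phi)   # expected to land at or next to (30, 120)
```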
In Step 206, the sound source localization module 14 forms a plurality of particles ptcij, wherein the plurality of particles ptcij are a plurality of virtual particles. In an embodiment, the sound source localization module 14 forms the plurality of virtual particles ptcij in the 2-dimensional space spanned by the elevation angle θ and the azimuth angle φ, and each particle location xij of the virtual particle ptcij corresponds to an azimuth angle φi and an elevation angle θj. For convenience, the particle location xij of the particle ptcij can be expressed as xij=(φi, θj).
In Step 208, the sound source localization module 14 computes the plurality of update positions xij(tn+1) of the plurality of particles ptcij according to the plurality of current positions xij(tn) of the plurality of particles ptcij and the cost function CF, and obtains at least one sound source location according to the plurality of update positions xij(tn+1).
Details of Step 208 can be referred to
Step 300: Obtain a plurality of initial particle positions xij(t0) of the plurality of particles ptcij.
Step 302: Compute a plurality of cost values CF(φi(tn), θj(tn)) corresponding to the plurality of particles ptcij according to the plurality of particle positions xij(tn) of the plurality of particles ptcij and the cost function CF.
Step 304: Obtain a global best position g(tn) and a plurality of personal best positions pij(tn) corresponding to the plurality of particles ptcij.
Step 306: Compute a plurality of particle velocities vij(tn+1) corresponding to the plurality of particle positions xij(tn) according to the plurality of particle positions xij(tn), the global best position g(tn), and the plurality of personal best positions pij(tn).
Step 308: Compute the plurality of particle positions xij(tn+1) according to the plurality of particle positions xij(tn) and the plurality of particle velocities vij(tn+1).
Step 310: Determine whether a stopping criterion is achieved. If yes, go to Step 312; if not, go to Step 302.
Step 312: Obtain a sound source location S=(φS, θS) according to the plurality of update positions xij(tn+1).
In Step 300, the sound source localization module 14 may distribute the plurality of particle positions xij(t0) over the 2-dimensional space spanned by the elevation angle θ and the azimuth angle φ. In an embodiment, the sound source localization module 14 may uniformly distribute the plurality of initial particle positions xij(t0) over the 2-dimensional space spanned by the elevation angle θ and the azimuth angle φ (as shown in
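One way to realize the uniform initial distribution of Step 300 is a cell-centred grid over the (φ, θ) box. The sketch below is a hypothetical Python helper; the ranges of 0° to 360° for azimuth and 0° to 90° for elevation and the 12×4 grid size are assumptions. Particles are keyed by (i, j) to mirror the notation xij=(φi, θj).

```python
def initial_grid(phi_range=(0.0, 360.0), theta_range=(0.0, 90.0), n_phi=12, n_theta=4):
    """Hypothetical uniform initialisation of x_ij(t0) = (phi_i, theta_j) in degrees."""
    phi_lo, phi_hi = phi_range
    th_lo, th_hi = theta_range
    # Cell-centred spacing, so no particle starts exactly on a boundary.
    phis = [phi_lo + (i + 0.5) * (phi_hi - phi_lo) / n_phi for i in range(n_phi)]
    thetas = [th_lo + (j + 0.5) * (th_hi - th_lo) / n_theta for j in range(n_theta)]
    return {(i, j): (phis[i], thetas[j]) for i in range(n_phi) for j in range(n_theta)}

x0 = initial_grid()
print(len(x0))       # 48 particles
print(x0[(0, 0)])    # (15.0, 11.25)
```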
In Step 302, the sound source localization module 14 may substitute the plurality of particle positions xij(tn)=(φi(tn), θj(tn)) of the plurality of particles ptcij into the cost function CF to compute the plurality of cost values CF(φi(tn), θj(tn)) corresponding to the plurality of particles ptcij.
In Step 304, the sound source localization module 14 may choose the global best position g(tn) according to the plurality of cost values CF(φi(tn), θj(tn)). In addition, for a specific particle ptcij, the sound source localization module 14 may choose the personal best position pij(tn) corresponding to the particle ptcij according to the historical positions xij(t0), . . . , xij(tn) of the particle ptcij. The global best position g(tn) is the position, among the plurality of particle positions xij(tn), having (or corresponding to) the maximum cost value CF(φi(tn), θj(tn)). The personal best position pij(tn) corresponding to the particle ptcij is the position, among the historical positions xij(t0), . . . , xij(tn), having (or corresponding to) the maximum cost value CF(φi(t), θj(t)).
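Step 304 can be sketched in plain Python as follows, with a larger cost value taken as better (the monotonically increasing case). Following the text, g(tn) is the best of the current positions, while pij(tn) is the best of each particle's history. The function and variable names are hypothetical.

```python
def update_bests(positions, costs, pbest_pos, pbest_cost):
    """Step 304 sketch: refresh personal bests p_ij and pick the global best g.

    positions/costs: dicts keyed by particle index (i, j) for the current iteration.
    pbest_pos/pbest_cost: each particle's best-so-far; updated in place.
    Returns g(tn), the current position with the largest cost value.
    """
    for key, cost in costs.items():
        if key not in pbest_cost or cost > pbest_cost[key]:
            pbest_cost[key] = cost
            pbest_pos[key] = positions[key]
    g_key = max(costs, key=costs.get)
    return positions[g_key]

pos = {(0, 0): (10.0, 20.0), (0, 1): (50.0, 40.0)}
cf = {(0, 0): 0.3, (0, 1): 0.9}
pb_pos, pb_cost = {}, {}
g = update_bests(pos, cf, pb_pos, pb_cost)
print(g)                     # (50.0, 40.0)
# Next iteration: particle (0, 0) got worse, so its personal best is kept.
pos2 = {(0, 0): (12.0, 22.0), (0, 1): (52.0, 41.0)}
cf2 = {(0, 0): 0.1, (0, 1): 0.95}
g = update_bests(pos2, cf2, pb_pos, pb_cost)
print(pb_pos[(0, 0)], g)     # (10.0, 20.0) (52.0, 41.0)
```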
In Step 306, the sound source localization module 14 may compute the particle velocity vij(tn+1) as vij(tn+1)=w·vij(tn)+r1c1(pij(tn)−xij(tn))+r2c2(g(tn)−xij(tn)), wherein w is the inertia weight, c1 and c2 are acceleration constants, and r1 and r2 are random variables uniformly distributed over the interval [0,1]. Moreover, w·vij(tn) is the inertia term, (pij(tn)−xij(tn)) is the cognition term, and (g(tn)−xij(tn)) is the social term.
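The velocity update of Step 306 and the position update of Step 308 can be sketched per particle as below. The values w=0.7 and c1=c2=1.5 are common choices in the PSO literature and are assumptions here, not values from this document; as in the text, a single r1 and a single r2 are drawn per update (many PSO variants draw them per dimension instead). Python's `random.random()` samples [0, 1), which differs from [0, 1] only at the endpoint.

```python
import random

def update_velocity(v, x, p, g, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Step 306: v(tn+1) = w*v(tn) + r1*c1*(p - x) + r2*c2*(g - x).

    v, x, p, g are (phi, theta) tuples for one particle; w, c1, c2 are assumed values.
    """
    r1, r2 = rng.random(), rng.random()  # one draw each, uniform on [0, 1)
    return tuple(w * vd + r1 * c1 * (pd - xd) + r2 * c2 * (gd - xd)
                 for vd, xd, pd, gd in zip(v, x, p, g))

def update_position(x, v):
    """Step 308: x(tn+1) = x(tn) + v(tn+1)."""
    return tuple(xd + vd for xd, vd in zip(x, v))

# Usage: a particle sitting on its personal best, attracted towards the global best.
rng = random.Random(42)
x, v = (100.0, 20.0), (0.0, 0.0)
p, g = (100.0, 20.0), (120.0, 30.0)
v_new = update_velocity(v, x, p, g, rng=rng)
x_new = update_position(x, v_new)
print(x_new)   # moves from (100, 20) towards (120, 30), possibly overshooting
```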
In Step 308, the sound source localization module 14 may compute the particle position xij(tn+1) as xij(tn+1)=xij(tn)+vij(tn+1).
In Step 310, the sound source localization module 14 determines whether the stopping criterion is achieved. The stopping criterion may be |xij(tn+1)−xij(tn)|<ε, or the iteration index n reaching a maximum iteration limit N. If |xij(tn+1)−xij(tn)|<ε or n==N holds, the sound source localization module 14 determines that the stopping criterion is achieved and may proceed to Step 312 to obtain the sound source location S=(φS, θS) according to the plurality of update positions xij(tn+1). Otherwise, the sound source localization module 14 may return to Step 302 to perform the next iteration, which includes incrementing the iteration index (n=n+1).
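Putting Steps 300 to 312 together, the following plain-Python sketch runs the process on a hypothetical smooth cost function with a single peak at (φ, θ)=(120°, 30°), which stands in for CF. Several choices are assumptions: the grid size and PSO parameters, the clipping of positions to the search box, a stopping test that requires every particle to move less than ε, and a Step 312 that reports the best position ever visited by any particle.

```python
import math
import random

def toy_cost(phi, theta):
    # Hypothetical stand-in for CF(phi, theta): one smooth peak at (120, 30) degrees.
    return math.exp(-((phi - 120.0) ** 2 + (theta - 30.0) ** 2) / (2 * 20.0 ** 2))

def pso_single_source(cost, n_phi=12, n_theta=4, w=0.7, c1=1.5, c2=1.5,
                      eps=1e-3, max_iter=100, seed=0):
    rng = random.Random(seed)
    lo, hi = (0.0, 0.0), (360.0, 90.0)   # assumed (phi, theta) search box in degrees
    # Step 300: cell-centred uniform grid of initial positions x_ij(t0).
    x = {(i, j): ((i + 0.5) * 360.0 / n_phi, (j + 0.5) * 90.0 / n_theta)
         for i in range(n_phi) for j in range(n_theta)}
    v = {k: (0.0, 0.0) for k in x}
    p, p_cost = {}, {}
    for _ in range(max_iter):
        # Step 302: cost values at the current positions.
        c = {k: cost(*xk) for k, xk in x.items()}
        # Step 304: personal bests and global best (maximum over current positions).
        for k, ck in c.items():
            if k not in p_cost or ck > p_cost[k]:
                p[k], p_cost[k] = x[k], ck
        g = x[max(c, key=c.get)]
        # Steps 306 and 308: velocity and position updates, positions clipped to the box.
        max_step = 0.0
        for k in x:
            r1, r2 = rng.random(), rng.random()
            v[k] = tuple(w * vd + r1 * c1 * (pd - xd) + r2 * c2 * (gd - xd)
                         for vd, xd, pd, gd in zip(v[k], x[k], p[k], g))
            new = tuple(min(max(xd + vd, lo_d), hi_d)
                        for xd, vd, lo_d, hi_d in zip(x[k], v[k], lo, hi))
            max_step = max(max_step, math.dist(new, x[k]))
            x[k] = new
        # Step 310: stop once every particle moved less than eps.
        if max_step < eps:
            break
    # Step 312 (assumed rule): report the best position found by any particle.
    best = max(p_cost, key=p_cost.get)
    return p[best]

phi_s, theta_s = pso_single_source(toy_cost)
print(round(phi_s, 1), round(theta_s, 1))   # expected to be close to 120 and 30
```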
For the n-th iteration (corresponding to time tn), the particle positions xij(tn) may be regarded as the current positions of the particles ptcij in Step 302, and the particle positions xij(tn+1) may be regarded as the update positions of the particles ptcij in Step 308.
The process 30 is suitable for the single sound source scenario. Nevertheless, the PSO algorithm may also be applied to the scenario of multiple sound sources.
Please refer to
Step 400: Obtain the plurality of initial particle positions xij(t0) of the plurality of particles ptcij.
Step 402: Compute the plurality of cost values CF(φi(tn), θj(tn)) corresponding to the plurality of particles ptcij according to the plurality of particle positions xij(tn) of the plurality of particles ptcij and the cost function CF.
Step 404: Obtain the plurality of local best positions Lij(tn) corresponding to the plurality of particles ptcij and the plurality of personal best positions pij(tn).
Step 406: Compute the plurality of particle velocities vij(tn+1) corresponding to the plurality of particle positions xij(tn) according to the plurality of particle positions xij(tn), the plurality of local best positions Lij(tn) and the plurality of personal best positions pij(tn).
Step 408: Compute the plurality of particle positions xij(tn+1) according to the plurality of particle positions xij(tn) and the plurality of particle velocities vij(tn+1).
Step 410: Determine whether the stopping criterion is achieved. If yes, go to Step 412; otherwise, go to Step 402.
Step 412: Obtain a plurality of sound source locations S according to the plurality of update positions xij(tn+1).
The process 40 is similar to the process 30. The difference between the process 40 and the process 30 is that the sound source localization module 14 replaces the global best position g(tn) used in Steps 304 and 306 with the local best positions Lij(tn) in Steps 404 and 406 to compute the particle velocities vij(tn+1).
In Step 404, the sound source localization module 14 forms a region RGij centered at the particle ptcij, i.e., at the particle position xij(tn), and chooses, from the plurality of particle positions xij(tn), a plurality of regional particles ptcij(RG) located in the region RGij. That is, the plurality of regional particle positions xij(RG) corresponding to the plurality of regional particles ptcij(RG) are within RGij.
In an embodiment, the region RGij is the set of particle positions whose distances from the particle position xij(tn) are not greater than a parameter σ. Generally speaking, the region RGij may be expressed as RGij={x=(φ, θ)|∥x−xij(tn)∥≤σ}, where ∥·∥ denotes a norm operation, and ∥x∥ may be ∥x∥1, ∥x∥2 or ∥x∥∞. The norms ∥x∥1, ∥x∥2 and ∥x∥∞ are known to those skilled in the art and are not detailed herein for brevity. Moreover, ∥x∥2 is the Euclidean norm, and the region RGij formed by the Euclidean norm, expressed as RGij={x=(φ, θ)|∥x−xij(tn)∥2≤σ}, is a circle centered at xij(tn) with radius σ.
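The effect of the norm choice on the shape of RGij can be checked with a small, hypothetical helper: with the 1-norm the region is a diamond, with the 2-norm a circle, and with the ∞-norm a square. The sketch treats (φ, θ) in degrees as plain Euclidean coordinates and ignores the 360° wrap-around of azimuth, which is a simplification.

```python
import math

def in_region(x, center, sigma, norm="l2"):
    """Hypothetical helper: is x = (phi, theta) in RG = {x : ||x - center|| <= sigma}?"""
    diff = [a - b for a, b in zip(x, center)]
    if norm == "l1":
        dist = sum(abs(d) for d in diff)   # diamond-shaped region
    elif norm == "l2":
        dist = math.hypot(*diff)           # Euclidean norm: circle of radius sigma
    elif norm == "linf":
        dist = max(abs(d) for d in diff)   # square of half-width sigma
    else:
        raise ValueError(f"unknown norm: {norm}")
    return dist <= sigma

# The same offset (6, 6) from the centre under the three norms, with sigma = 10.
for n in ("l1", "l2", "linf"):
    print(n, in_region((106.0, 36.0), (100.0, 30.0), 10.0, n))
# l1 False (distance 12), l2 True (about 8.49), linf True (distance 6)
```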
Moreover, the radius σ of the region may be determined according to practical situations or rules of thumb. If two sound sources are too close to each other or the radius σ is too large, the local best positions of all particles may point to the sound source with stronger energy, which is detrimental to sound source separation.
The sound source localization module 14 may compute the plurality of regional cost values CF(RG)(φi(tn), θj(tn)) corresponding to the plurality of regional particles ptcij(RG) (wherein CF(RG)(φi(tn), θj(tn))=CF(φi(tn), θj(tn)) for xij(RG)=(φi(tn), θj(tn))∈RGij), and choose the local best position Lij(tn) corresponding to the particle ptcij according to the plurality of regional cost values CF(RG)(φi(tn), θj(tn)), wherein the local best position Lij(tn) is the position, among the plurality of regional particle positions xij(RG), having (or corresponding to) the maximum regional cost value CF(RG)(φi(tn), θj(tn)).
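The local best selection of Step 404 can be sketched as below, using the Euclidean region of radius σ and treating a larger cost as better. The function name, the string particle keys and the sample values are illustrative assumptions; the example shows two well-separated groups of particles keeping separate local bests, which is what lets the process track more than one source.

```python
import math

def local_bests(positions, costs, sigma):
    """Step 404 sketch: for every particle, the regional particle (within
    Euclidean distance sigma, itself included) with the largest cost value."""
    L = {}
    for key, center in positions.items():
        # Regional particles: current positions inside RG centred at this particle.
        region = [k for k, xk in positions.items() if math.dist(xk, center) <= sigma]
        best = max(region, key=costs.get)
        L[key] = positions[best]
    return L

pos = {"a": (10.0, 10.0), "b": (15.0, 10.0), "c": (80.0, 40.0), "d": (85.0, 45.0)}
cf = {"a": 0.2, "b": 0.9, "c": 0.7, "d": 0.4}
print(local_bests(pos, cf, sigma=20.0))
# {'a': (15.0, 10.0), 'b': (15.0, 10.0), 'c': (80.0, 40.0), 'd': (80.0, 40.0)}
```

With a very large σ every region contains all particles, and all local bests collapse onto the single global best, which illustrates the warning above about choosing σ too large.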
In Step 406, the sound source localization module 14 may compute the particle velocities vij(tn+1) as vij(tn+1)=w·vij(tn)+r1c1(pij(tn)−xij(tn))+r2c2(Lij(tn)−xij(tn)).
The other steps of the process 40 are the same as those of the process 30 and are not repeated herein for brevity.
The processes 30 and 40 are the embodiments to realize Step 208. The process 30 may be applied to single sound source scenario, while the process 40 may be applied to the scenario of multiple sound sources.
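The following plain-Python sketch runs the local-best variant on a hypothetical cost function with two equal peaks, at (90°, 30°) and (220°, 60°), standing in for CF. The text does not specify how Step 412 turns the update positions into several locations; the sketch assumes the number of sources D is known (as it is for PN) and greedily keeps the highest-cost personal best positions that are more than σ apart. The 10° grid, σ=25°, the PSO parameters and the clipping to the search box are also assumptions.

```python
import math
import random

def toy_cost(phi, theta):
    # Hypothetical stand-in for CF: two equal peaks at (90, 30) and (220, 60) degrees.
    return sum(math.exp(-((phi - a) ** 2 + (theta - b) ** 2) / (2 * 15.0 ** 2))
               for a, b in ((90.0, 30.0), (220.0, 60.0)))

def pso_multi_source(cost, num_sources, sigma=25.0, w=0.7, c1=1.5, c2=1.5,
                     eps=1e-3, max_iter=30, seed=0):
    rng = random.Random(seed)
    lo, hi = (0.0, 0.0), (360.0, 90.0)   # assumed (phi, theta) search box in degrees
    # Step 400: 36 x 9 cell-centred grid with 10-degree spacing.
    x = {(i, j): (5.0 + 10.0 * i, 5.0 + 10.0 * j) for i in range(36) for j in range(9)}
    v = {k: (0.0, 0.0) for k in x}
    p, p_cost = {}, {}
    for _ in range(max_iter):
        c = {k: cost(*xk) for k, xk in x.items()}               # Step 402
        for k, ck in c.items():                                 # personal bests
            if k not in p_cost or ck > p_cost[k]:
                p[k], p_cost[k] = x[k], ck
        # Step 404: local best = best current position within sigma (Euclidean).
        L = {}
        for k, xk in x.items():
            region = [q for q, xq in x.items() if math.dist(xq, xk) <= sigma]
            L[k] = x[max(region, key=c.get)]
        max_step = 0.0
        for k in x:                                             # Steps 406 and 408
            r1, r2 = rng.random(), rng.random()
            v[k] = tuple(w * vd + r1 * c1 * (pd - xd) + r2 * c2 * (ld - xd)
                         for vd, xd, pd, ld in zip(v[k], x[k], p[k], L[k]))
            new = tuple(min(max(xd + vd, lo_d), hi_d)
                        for xd, vd, lo_d, hi_d in zip(x[k], v[k], lo, hi))
            max_step = max(max_step, math.dist(new, x[k]))
            x[k] = new
        if max_step < eps:                                      # Step 410
            break
    # Step 412 (assumed rule): keep the best personal bests lying more than sigma apart.
    found = []
    for k in sorted(p_cost, key=p_cost.get, reverse=True):
        if all(math.dist(p[k], s) > sigma for s in found):
            found.append(p[k])
        if len(found) == num_sources:
            break
    return found

for phi_s, theta_s in pso_multi_source(toy_cost, num_sources=2):
    print(round(phi_s, 1), round(theta_s, 1))
```

The neighbourhood search compares every pair of particles, so each iteration costs on the order of P² distance evaluations for P particles; a spatial index could reduce this if needed.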
In the prior art, sound source localization using the MUSIC algorithm requires an exhaustive search, so its computational complexity is high. In addition, the resolution of the sound source localization depends on the number M of microphones in the microphone array. In comparison, the present invention utilizes the PSO algorithm to perform sound source localization, which does not require a large number M of microphones to achieve accurate sound source localization. In addition, the computational complexity of the PSO algorithm is lower than that of the MUSIC algorithm.
In summary, the present invention utilizes the PSO algorithm to perform sound source localization, which can achieve better accuracy and lower computation complexity.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
108136524 | Oct 2019 | TW | national |