The present invention relates to a reverberation removal device, a parameter estimation device, a reverberation removal method, a parameter estimation method, and a program.
Dereverberation, which removes reverberation from an observed mixed sound signal, is widely used as preprocessing for speech recognition and similar tasks. The weighted prediction error (WPE) method (NPL 1) is a known method for removing reverberation from an observed mixed sound signal by using one or more microphones.
WPE has the problem that its dereverberation performance deteriorates due to model errors in noisy environments or under underdetermined conditions (where the number of sound sources is larger than the number of microphones).
In view of the foregoing problem, the present invention aims to provide a reverberation removal device that is highly accurate even in noisy environments and underdetermined conditions.
A reverberation removal device of the present invention removes reverberation by applying a plurality of reverberation prediction filters to an observation signal, switching among the filters for each time-frequency bin of the observation signal.
The reverberation removal device of the present invention is highly accurate even under noisy environments or underdetermined conditions.
Hereinafter, embodiments of the present invention will be described in detail. It should be noted that components having the same function are given the same number, and overlapping descriptions thereof are omitted accordingly.
First, a reverberation removal method (Switching WPE) disclosed in the present invention will be described.
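The dereverberation problem to be solved by the present invention is the problem of estimating the signal obtained after dereverberation, given by equation (3), from an observation signal x expressed by equation (1).
[Math. 1]
$$x_{f,t} = \sum_{\tau=0}^{N_1} A_{f,\tau}\, s_{f,t-\tau} + n_{f,t} \in \mathbb{C}^{M}, \tag{1}$$
$$A_{f,\tau} \in \mathbb{C}^{M \times K},\quad s_{f,t} \in \mathbb{C}^{K},\quad n_{f,t} \in \mathbb{C}^{M}, \tag{2}$$
[Math. 2]
$$z_{f,t} := \sum_{\tau=0}^{N_2} A_{f,\tau}\, s_{f,t-\tau} + n'_{f,t} \in \mathbb{C}^{M}. \tag{3}$$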
Note that $M$ is the number of microphones, $K$ is the number of sound sources, $f$ is the frequency bin index ($f = 1, \ldots, F$), $t$ is the time frame index ($t = 1, \ldots, T$), $s_{f,t} \in \mathbb{C}^{K}$ is a vector composed of the $K$ sound source signals, $n_{f,t} \in \mathbb{C}^{M}$ is background noise, $\{A_{f,\tau}\}_{\tau=0}^{N_1} \subset \mathbb{C}^{M \times K}$ is the acoustic (room) impulse response from the sound sources to the microphones, and $N_2$ satisfies $0 \le N_2 \ll N_1$. The first term of equation (3) represents the direct and early reflection components of the $K$ sound source signals (the aim being to remove the late reverberation components), and the second term of equation (3) represents a noise signal $n'_{f,t}$, which may differ from the original noise signal $n_{f,t}$.
Equations (4), (5) and (6) represent a model of the Switching WPE of the present invention. However, since the dereverberation problem can be handled independently for each frequency bin, the index f of the frequency bin will be omitted hereinafter.
Here, $0_M \in \mathbb{C}^{M}$ is a zero vector, $I_M \in \mathbb{C}^{M \times M}$ is an identity matrix, $\lambda = \{\lambda_t\}_{t=1}^{T}$ is the power spectral density of $\{z_t\}_{t=1}^{T}$ averaged over all microphones, $G_1, \ldots, G_n$ are the filters of WPE (reverberation prediction filters), $\varepsilon > 0$ is a small constant, $\{\alpha_{t,i}\}_{i=1}^{n}$ is the (binary) mixed weight in time frame $t$, and $z_{t,i}$ is a signal obtained after dereverberation.
Note that $\bar{x}_t$ is expressed as follows.
Here, $\bar{x}_t$ denotes the observation signal in a predetermined section (frames $t-\delta_1$ to $t-\delta_p$) preceding the time frame $t$.
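Equation (7) is not reproduced in this text. As a minimal sketch, assuming that the past observation is formed by stacking the delayed frames one after another and that frames before the start of the signal are treated as zeros (both are assumptions, as are the function and argument names), the construction could look like this for one frequency bin:

```python
def stack_past_frames(frames, t, delays):
    """Stack the observation frames at t - delta_1, ..., t - delta_p.

    frames: list of T frames, each a list of M complex microphone values
            for one frequency bin (0-based frame index).
    delays: the delays delta_1, ..., delta_p as positive integers.
    Frames that would fall before the start of the signal are replaced
    by zeros (an assumption of this sketch).
    """
    num_mics = len(frames[0])
    stacked = []
    for delay in delays:
        past = t - delay
        if past >= 0:
            stacked.extend(frames[past])
        else:
            stacked.extend([0j] * num_mics)
    return stacked
```

With M microphones and p delays, the returned vector has M times p entries.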
The parameters to be estimated in this model are the following three: the reverberation prediction filters G1, . . . , Gn, the mixed weight α, and the power spectrum λ.
The reverberation removal method Switching WPE disclosed in the present invention coincides with the conventional reverberation removal method WPE when n=1.
The Switching WPE disclosed in the present invention switches among a plurality of reverberation prediction filters G1, . . . , Gn so that the most appropriate filter is used in each time-frequency bin. This reduces the model errors that have been a problem in WPE and improves dereverberation performance.
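Equations (4) to (6) are not reproduced in this text. One form that is consistent with the symbol definitions above and with the standard WPE formulation is sketched below. It is an assumption rather than the equations of the specification. In particular, the use of the Hermitian transpose follows the usual WPE convention, and the place where the small constant ε enters is not shown.

$$z_{t,i} = x_t - G_i^{\mathsf{H}} \bar{x}_t, \qquad z_t = \sum_{i=1}^{n} \alpha_{t,i}\, z_{t,i},$$
$$\alpha_{t,i} \in \{0, 1\}, \qquad \sum_{i=1}^{n} \alpha_{t,i} = 1, \qquad z_t \sim \mathcal{N}_{\mathbb{C}}\!\left(0_M,\ \lambda_t I_M\right).$$

Under this reading, setting n=1 forces $\alpha_{t,1} = 1$ in every frame, so a single filter $G_1$ is applied everywhere and the model reduces to conventional WPE.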
<<Reverberation Removal Device 11>>
A functional configuration of a reverberation removal device 11 for removing reverberation by using the parameters obtained by the aforementioned Switching WPE will be described with reference to
The reverberation removal device 11 of the present example is characterized in that it removes reverberation by applying a plurality of reverberation prediction filters to an observation signal, switching among the filters for each time-frequency bin of the observation signal.
As shown in the diagram, the reverberation removal device 11 of the present example includes a reverberation prediction filter storage unit 110a, a mixed weight storage unit 110b, and a post-dereverberation signal estimation unit 111.
<Reverberation Prediction Filter Storage Unit 110a>
The reverberation prediction filter storage unit 110a stores a plurality of (n) reverberation prediction filters G1, . . . , Gn that are estimated by the Switching WPE described above.
<Mixed Weight Storage Unit 110b>
The mixed weight storage unit 110b stores a mixed weight $\{\alpha_{t,i}\}_{i=1}^{n}$ ($t = 1, \ldots, T$) estimated by the Switching WPE described above. The mixed weight is a binary vector that determines which one of the reverberation prediction filters G1, . . . , Gn should be applied in each time-frequency bin.
<Post-Dereverberation Signal Estimation Unit 111>
The post-dereverberation signal estimation unit 111 estimates the post-dereverberation signal $z_t$ in the time frame $t$ by applying the reverberation prediction filter selected by the mixed weight to the past observation signal $\bar{x}_t$ (the observation signal in the predetermined section preceding the time frame $t$; see equation (7)) and subtracting the result from the observation signal $x_t$ in the time frame $t$ (S111,
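The estimation step described above can be sketched as follows for a single time-frequency bin. The sketch assumes that a filter is applied as its Hermitian transpose times the stacked past observation (the usual WPE convention, not confirmed by this text) and that the mixed weight is a one-hot list. The function names and the nested-list data layout are hypothetical.

```python
def apply_filter(x_t, x_bar_t, G):
    """Return z = x_t - G^H x_bar_t for one time-frequency bin.

    x_t:     list of M complex values (current frame).
    x_bar_t: list of L complex values (stacked past frames).
    G:       L rows of M complex values (one prediction filter).
    """
    return [
        x_t[m] - sum(G[k][m].conjugate() * x_bar_t[k] for k in range(len(x_bar_t)))
        for m in range(len(x_t))
    ]


def dereverberate_frame(x_t, x_bar_t, filters, alpha_t):
    """Apply the one filter whose binary mixed weight equals 1."""
    selected = alpha_t.index(1)
    return apply_filter(x_t, x_bar_t, filters[selected])
```

Because the mixed weight is binary, only one filter is evaluated per bin at this stage, so the cost of applying the stored parameters is the same as for a single WPE filter.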
<<Parameter Estimation Device 12>>
A functional configuration of the parameter estimation device 12 which is a device for estimating a parameter by the foregoing Switching WPE will be described hereinafter with reference to
Operations of the respective functional configurations will be described hereinafter with reference to
<Initial Value Setting Unit 121>
The initial value setting unit 121 sets appropriate initial values for the reverberation prediction filters G1, . . . , Gn (S121).
<Dereverberation Unit 122>
The dereverberation unit 122 estimates the post-dereverberation signal $z_t$ in the time frame $t$ by applying any one of the plurality of reverberation prediction filters to the past observation signal $\bar{x}_t$ (the observation signal in a predetermined section preceding the time frame $t$) and subtracting the result from the observation signal $x_t$ in the time frame $t$ (S122).
<Mixed Weight/Power Spectrum Updating Unit 123>
The mixed weight/power spectrum updating unit 123 updates a mixed weight αt determining which reverberation prediction filter should be applied according to each time frequency bin, and a power spectrum λt obtained after dereverberation in the time frame t (S123). Specifically, the mixed weight/power spectrum updating unit 123 updates the power spectrum λ and the mixed weight α based on equations (8) and (9).
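Equations (8) and (9) are not reproduced in this text. As a hedged sketch, the update below assumes the form that follows from maximizing a zero-mean complex Gaussian likelihood with variance λ in each frame: the mixed weight selects the filter with the smallest residual power, and the power spectrum is that residual power averaged over the microphones, floored at a small constant. The function name, the default value of `eps`, and the use of `eps` as a floor are assumptions.

```python
def update_weight_and_power(candidates, eps=1e-6):
    """Update the mixed weight and power spectrum for one frame.

    candidates[i] is the residual z_{t,i} (list of M complex values)
    obtained with the i-th reverberation prediction filter.
    Returns (alpha_t, lambda_t): a one-hot list marking the filter with
    the smallest residual power, and that power averaged over the
    microphones, floored at eps (assumed role of the small constant).
    """
    powers = [sum(abs(v) ** 2 for v in z) / len(z) for z in candidates]
    best = min(range(len(powers)), key=powers.__getitem__)
    alpha_t = [1 if i == best else 0 for i in range(len(powers))]
    lambda_t = max(powers[best], eps)
    return alpha_t, lambda_t
```

The floor keeps the later division by the power spectrum well defined in silent frames.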
<Reverberation Prediction Filter Updating Unit 124>
The reverberation prediction filter updating unit 124 updates the reverberation prediction filters (S124). Specifically, the reverberation prediction filter updating unit 124 updates the reverberation prediction filters G1, . . . , Gn, based on equation (12) which is the optimum solution of the following equation (10).
Here, * represents a matrix of size M×M, and matrices Ri and Pi are represented by the following equations (11) and (12).
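The equations referred to above are not reproduced in this text. A form consistent with standard WPE, given here only as an assumption, is a weighted least-squares problem for each filter in which frame $t$ contributes only when the mixed weight assigns it to filter $i$:

$$\min_{G_i} \sum_{t=1}^{T} \frac{\alpha_{t,i}}{\lambda_t} \left\lVert x_t - G_i^{\mathsf{H}} \bar{x}_t \right\rVert^2, \qquad G_i = R_i^{-1} P_i,$$
$$R_i = \sum_{t=1}^{T} \frac{\alpha_{t,i}}{\lambda_t}\, \bar{x}_t \bar{x}_t^{\mathsf{H}}, \qquad P_i = \sum_{t=1}^{T} \frac{\alpha_{t,i}}{\lambda_t}\, \bar{x}_t x_t^{\mathsf{H}}.$$

The closed form follows from setting the gradient with respect to $G_i$ to zero, which gives $P_i - R_i G_i = 0$. It requires $R_i$ to be invertible, which in practice means that enough frames must be assigned to filter $i$. The matrix denoted by * in the text above is not covered by this sketch.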
<Control Unit 125>
The control unit 125 transmits a control command for repeatedly executing the processing (S122) of the dereverberation unit 122, the processing (S123) of the mixed weight/power spectrum updating unit 123, and the processing (S124) of the reverberation prediction filter updating unit 124, until a predetermined condition is satisfied (S125). Examples of the predetermined condition include a predetermined number of repetitions being reached, and the update amount of the parameters (the mixed weight αt, the power spectrum λt, and the reverberation prediction filters) becoming equal to or less than a predetermined threshold.
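The repetition control described above can be sketched generically as follows. The sketch assumes that one pass of S122 to S124 is wrapped in a single callable and that the parameters are flattened into a list of numbers; the function name, the defaults, and the use of the largest absolute change as the update amount are all hypothetical choices.

```python
def run_until_converged(step, params, max_iters=10, tol=1e-4):
    """Repeat `step` until the parameters stop changing or max_iters is hit.

    step:   callable that maps a flat list of numbers (real or complex)
            to an updated list of the same length; it stands in for one
            pass of S122, S123, and S124.
    Returns (params, iterations_used).
    """
    for iteration in range(1, max_iters + 1):
        new_params = step(params)
        # Largest absolute change of any parameter in this pass.
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change <= tol:
            return params, iteration
    return params, max_iters
```

Either stopping rule alone would also satisfy the description; combining them guards against a threshold that is never reached.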
The device of the present invention includes, for example, as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (e.g., a communication cable) capable of communicating with the exterior of the hardware entity can be connected, a CPU (Central Processing Unit, which may include a cache memory, registers, etc.), a RAM and a ROM serving as memories, an external storage device such as a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage device such that data can be exchanged among them. If necessary, a device (drive) capable of reading from and writing to a storage medium such as a CD-ROM may be provided in the hardware entity. A general-purpose computer or the like is an example of a physical entity including such hardware resources.
The external storage device of the hardware entity stores a program needed to realize the above-mentioned functions and data needed for the processing of this program (the program may be stored not only in the external storage device, but also in, for example, a ROM which is a read-only storage device). Also, the data and the like obtained through the processing of the program are stored as needed in a RAM, an external storage device, and the like.
In the hardware entity, each program stored in the external storage device (or ROM, etc.) and the data needed for processing each program are loaded to the memory as needed, and interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes predetermined functions (respective configuration requirements represented as . . . unit, . . . means and the like as described above).
The present invention is not limited to the embodiments described above, and can be modified appropriately within a scope not departing from the gist of the present invention. Further, the processes described in the foregoing embodiments are not only executed in chronological order in the described order, but also may be executed in parallel or individually according to a processing capability of a device that executes the processes or as necessary.
As described above, when the processing functions in the hardware entity (the device of the present invention) described in the foregoing embodiments are realized by a computer, the processing contents of the functions to be included in the hardware entity are described by a program. By executing this program on the computer, the processing functions in the above-described hardware entity are realized on the computer.
The various types of processing described above can be executed by causing a recording unit 10020 of a computer shown in
The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of the computer-readable recording medium include a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like can be used as the optical disk; an MO (Magneto-Optical disc) or the like can be used as the magneto-optical recording medium; and an EEPROM (Electrically Erasable Programmable Read-Only Memory) or the like can be used as the semiconductor memory.
The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Alternatively, the program may be stored in advance in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
A computer executing such a program is configured to, for example, first, temporarily store, in its own storage device, a program recorded on a portable recording medium or a program transferred from a server computer. Then, at the time of executing the processing, the computer reads the program stored in its own recording medium and executes the processing according to the read program. As another execution form of the program, the computer may directly read the program from the portable recording medium and execute processing according to the program, or may sequentially execute processing according to the received program every time the program is transferred from the server computer to the computer. In addition, by a so-called ASP (Application Service Provider) type service which does not transfer a program from the server computer to the computer but implements a processing function only by the execution instruction and the result acquisition, the above-mentioned processing may be executed. It is assumed that the program in the present embodiment includes data which is information to be provided for processing by an electronic computer and equivalent to a program (data or the like which is not a direct command to the computer but has a property to specify the processing of the computer).
Further, according to this aspect, the computer is caused to execute a predetermined program to constitute the hardware entity, but at least part of the processing contents may be realized by means of hardware.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/004097 | 2/4/2021 | WO | |