1. Field of the Invention
The present invention relates to signal processing apparatuses, signal processing methods, and programs, and particularly relates to a signal processing apparatus capable of more easily and more reliably detecting noise, a signal processing method, and a program.
2. Description of the Related Art
In apparatuses which collect sound using incorporated microphones such as IC recorders, for example, it is likely that noise referred to as “touch noise” is generated since users touch the apparatuses when sound is collected.
In particular, click noise is generated when energy is concentrated within a short period of time, for example when various function switches are clicked during recording. When the collected sound is reproduced, the click noise is output as abnormal noise which is not masked by other sounds and which is offensive to the ear. Therefore, there is a demand for a technique of detecting and reducing such click noise.
As a method for reducing click noise, a method for performing filter processing on a signal to be processed using a high-pass filter and detecting click noise using a ratio of a maximum value to a movement average value (refer to Japanese Examined Patent Application Publication No. 7-105692, for example) and a method for detecting click noise using a difference between a maximum value and a minimum value in a frame (refer to Japanese Patent No. 3420831, for example) have been proposed.
However, in these methods, if the signal to be processed includes both a high-energy portion and a low-energy portion, not only click noise but also music, voice (especially consonants), and the like may be detected as click noise. For example, a signal which has a high energy level for a certain period may be detected as click noise.
Therefore, a method for detecting a persistence length of a pulse signal and determining that the signal is not click noise but a music signal when the persistence length is equal to or larger than a certain length has been proposed (refer to Japanese Patent No. 2702446, for example).
However, in the method for detecting a persistence length, a high-pass filter and a low-pass filter are used for detecting click noise, and in addition, the low-pass filter has to have a relatively steep characteristic. Accordingly, a calculation amount inevitably becomes large.
It is desirable to more easily and more reliably detect noise.
According to an embodiment of the present invention, there is provided a signal processing apparatus including absolute value means for converting an audio signal into absolute values, representative value calculation means for calculating representative values of consecutive sample values included in blocks of the audio signal which has been converted into the absolute values using at least maximum sample values among values of the samples included in the blocks for individual blocks, average value calculation means for determining a section which includes a predetermined number of consecutive blocks as a frame and calculating a maximum value of the representative values of the blocks included in the frame and an average value of the representative values of the blocks included in the frame, and detection means for detecting click noise in the frame on the basis of a ratio of the maximum value to the average value.
The representative value calculation means may determine that maximum sample values among the values of the samples included in the blocks correspond to the representative values for individual blocks.
The detection means may determine that the frame includes the click noise when the ratio of the maximum value to the average value is equal to or larger than a predetermined threshold value.
The detection means may detect the click noise in the frame to be processed using the maximum value and the average value of the frame to be processed and maximum values and average values of other frames located in the vicinity of the frame to be processed.
The signal processing apparatus may further include past interpolation waveform generation means for generating a past interpolation waveform to be used for interpolation of a noise section including the click noise using a first waveform of a section of the audio signal which has the same length as the noise section and which is located on a past side relative to the noise section of the audio signal, future interpolation waveform generation means for generating a future interpolation waveform to be used for the interpolation of the noise section using a second waveform of a section of the audio signal which has the same length as the noise section and which is located on a future side relative to the noise section of the audio signal, interpolation waveform generation means for generating an interpolation waveform by cross-fade using the past interpolation waveform and the future interpolation waveform, and replacing means for reducing the click noise by replacing the noise section of the audio signal by the interpolation waveform.
The signal processing apparatus may further include noise section detection means for determining, when the click noise is detected in the frame to be processed, that a noise starting block corresponds to one of the blocks which has a representative value equal to or smaller than a threshold value which is one of representative values of a frame located immediately before the frame to be processed and which is located, on the past side, in a nearest position relative to one of the blocks which has the maximum representative value of the frame to be processed, and for detecting a position of one of the samples which performs zero-cross first and which is located on the past side relative to a last sample included in the noise starting block.
The signal processing apparatus may further include noise section detection means for determining, when the click noise is detected in the frame to be processed, that a noise terminating block corresponds to one of the blocks which has a representative value equal to or smaller than a threshold value corresponding to one of representative values of a frame located immediately after the frame to be processed and which is located, on the future side, in a nearest position relative to one of the blocks which has the maximum representative value of the frame to be processed, and for detecting a position of one of the samples which performs zero-cross first and which is located on the future side relative to a leading sample included in the noise terminating block.
The past interpolation waveform generation means may generate the past interpolation waveform by performing time reversal on the first waveform of the section of the audio signal which has the same length as the noise section and which is located adjacent to the noise section on the past side. The future interpolation waveform generation means may generate the future interpolation waveform by performing the time reversal on the second waveform of the section of the audio signal which has the same length as the noise section and which is located adjacent to the noise section on a future side.
The past interpolation waveform generation means may generate the past interpolation waveform by performing the time reversal on the first waveform and inverting signs of values of samples located before and after an end sample of the noise section on the past side when the signs of the values of the samples are different from each other. The future interpolation waveform generation means may generate the future interpolation waveform by performing the time reversal on the second waveform and inverting signs of values of samples located before and after an end sample of the noise section on the future side when the signs of the values of the samples are different from each other.
The signal processing apparatus may further include noise section detection means for determining, when the click noise is detected in the frame to be processed, that a starting position of the click noise corresponds to a position of a leading sample of one of the blocks which has a representative value equal to or smaller than a threshold value corresponding to one of representative values of a frame located immediately before the frame to be processed and which is located, on the past side, in a nearest position relative to one of the blocks which has the maximum representative value of the frame to be processed.
The signal processing apparatus may further include noise section detection means for determining, when the click noise is detected in the frame to be processed, that a terminating position of the click noise corresponds to a position of a last sample of one of the blocks which has a representative value equal to or smaller than a threshold value corresponding to one of representative values of a frame located immediately after the frame to be processed and which is located, on the future side, in a nearest position relative to one of the blocks which has the maximum representative value of the frame to be processed.
The replacing means may generate an adjacent interpolation waveform by performing cross-fade using a waveform of a section which has a predetermined length and which is located immediately before the noise section of the audio signal and a waveform of a section which has a predetermined length and which is located immediately before the section corresponding to the first waveform of the audio signal, and replace the adjacent section by the adjacent interpolation waveform.
The replacing means may generate an adjacent interpolation waveform by performing cross-fade using a waveform of a section which has a predetermined length and which is located immediately after the noise section of the audio signal and a waveform of a section which has a predetermined length and which is located immediately after the section corresponding to the second waveform of the audio signal, and replace the adjacent section by the adjacent interpolation waveform.
According to another embodiment of the present invention, there is provided a signal processing method including the steps of converting an audio signal into absolute values, calculating representative values of consecutive sample values included in blocks of the audio signal which has been converted into the absolute values using at least maximum sample values among values of the samples included in the blocks for individual blocks, determining a section which includes a predetermined number of consecutive blocks as a frame and calculating a maximum value of the representative values of the blocks included in the frame and an average value of the representative values of the blocks included in the frame, and detecting click noise in the frame on the basis of a ratio of the maximum value to the average value.
According to a further embodiment of the present invention, there is provided a program which causes a computer to perform a process including the steps of converting an audio signal into absolute values, calculating representative values of consecutive sample values included in blocks of the audio signal which has been converted into the absolute values using at least maximum sample values among values of the samples included in the blocks for individual blocks, determining a section which includes a predetermined number of consecutive blocks as a frame and calculating a maximum value of the representative values of the blocks included in the frame and an average value of the representative values of the blocks included in the frame, and detecting click noise in the frame on the basis of a ratio of the maximum value to the average value.
Accordingly, noise may be more reliably and more easily detected.
Embodiments of the present invention will be described hereinafter with reference to the accompanying drawings.
For example, a signal processing apparatus 11 corresponds to a recording/reproducing apparatus which collects surrounding sound and reproduces the collected sound. To the signal processing apparatus 11, a signal such as a sound signal collected using a microphone or the like is input. The signal processing apparatus 11 detects click noise in the input signal, removes the click noise, and outputs the signal from which the click noise is removed as an output signal.
The signal processing apparatus 11 includes a noise detection unit 21 and a noise reduction unit 22. An input signal is supplied to the noise detection unit 21 and the noise reduction unit 22.
The noise detection unit 21 detects a section including click noise in the input signal and supplies a result of the detection to the noise reduction unit 22. Note that click noise is a signal in which energy (amplitude) larger than that of the surrounding sections is concentrated in a short section in the time direction.
The noise reduction unit 22 removes the click noise from the input signal where appropriate in accordance with the result of the detection of click noise supplied from the noise detection unit 21 and outputs a resultant signal.
The noise detection unit 21 illustrated in
The full-wave rectifying circuit 51 converts the input signal into absolute values and supplies the absolute values to the representative-value determination unit 52. The representative-value determination unit 52 divides the signal which has been converted into the absolute values and which has been supplied from the full-wave rectifying circuit 51 into blocks corresponding to sections each of which has a predetermined length, calculates representative values of the blocks, and supplies the representative values to the average-value calculation unit 53. For example, the maximum value among the values of the samples of the input signal included in a block serves as the representative value of that block.
The average-value calculation unit 53 calculates a maximum value and an average value of the representative values of the consecutive blocks included in a frame using the representative values of the blocks supplied from the representative-value determination unit 52 and supplies the maximum value and the average value to the determination unit 54. The determination unit 54 obtains a ratio of the maximum value to the average value of the frame supplied from the average-value calculation unit 53, determines whether the frame includes click noise in accordance with the ratio, and supplies a result of the determination as a result of detection of click noise to the noise reduction unit 22.
Furthermore, the noise reduction unit 22 illustrated in
Specifically, the noise reduction unit 22 includes a noise section determination unit 81, a past interpolation waveform generation unit 82, a future interpolation waveform generation unit 83, a synthesis unit 84, and a replacing unit 85. In the noise reduction unit 22, the signal is input to the noise section determination unit 81, the past interpolation waveform generation unit 82, the future interpolation waveform generation unit 83, and the replacing unit 85.
The noise section determination unit 81 specifies a section including click noise in the input signal in accordance with the result of the detection of click noise supplied from the determination unit 54 and supplies a result of the specifying to the past interpolation waveform generation unit 82, the future interpolation waveform generation unit 83, and the replacing unit 85. Note that, the section including click noise included in the input signal may be referred to as a “noise section” hereinafter.
The past interpolation waveform generation unit 82 generates a past interpolation waveform used for interpolation of the noise section using a section which is temporally before the noise section included in the input signal in accordance with the result of the specifying supplied from the noise section determination unit 81 and the input signal, and supplies the past interpolation waveform to the synthesis unit 84.
The future interpolation waveform generation unit 83 generates a future interpolation waveform used for interpolation of the noise section using a section which is temporally after the noise section included in the input signal in accordance with the result of the specifying supplied from the noise section determination unit 81 and the input signal, and supplies the future interpolation waveform to the synthesis unit 84.
The synthesis unit 84 synthesizes the past interpolation waveform supplied from the past interpolation waveform generation unit 82 and the future interpolation waveform supplied from the future interpolation waveform generation unit 83, and supplies a resultant interpolation waveform to the replacing unit 85. The replacing unit 85 removes the click noise by replacing the noise section included in the input signal by the interpolation waveform supplied from the synthesis unit 84 using the specifying result supplied from the noise section determination unit 81, and outputs a resultant signal.
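The cooperation of the past interpolation waveform generation unit 82, the future interpolation waveform generation unit 83, the synthesis unit 84, and the replacing unit 85 may be sketched as follows. This is only an illustrative Python sketch and not the implementation of the embodiment: the function name interpolate_noise_section, the half-open index convention, and the linear cross-fade weights are assumptions made for illustration. Time reversal of the sections adjacent to the noise section is used as one of the generation options described above.

```python
def interpolate_noise_section(signal, start, end):
    """Return a copy of `signal` with samples start..end-1 replaced.

    `signal` is a list of samples and [start, end) is the noise section
    (half-open indexing is an assumption of this sketch).
    """
    length = end - start
    if length <= 0 or start - length < 0 or end + length > len(signal):
        raise ValueError("need `length` samples on both sides of the section")

    # Past interpolation waveform: the section of the same length adjacent
    # to the noise section on the past side, time-reversed.
    past = signal[start - length:start][::-1]
    # Future interpolation waveform: the section of the same length adjacent
    # to the noise section on the future side, time-reversed.
    future = signal[end:end + length][::-1]

    out = list(signal)
    for i in range(length):
        w_future = i / length      # assumed linear ramp from 0 toward 1
        w_past = 1.0 - w_future    # complementary ramp from 1 toward 0
        out[start + i] = w_past * past[i] + w_future * future[i]
    return out


# A constant level with a two-sample click in the middle.
clean = [1.0] * 12
noisy = list(clean)
noisy[5], noisy[6] = 50.0, -40.0
repaired = interpolate_noise_section(noisy, 4, 8)
```

Because the past waveform is time-reversed, its first sample is the sample immediately before the noise section, and the last sample of the time-reversed future waveform is the sample immediately after it, so the weighting chosen here joins the surrounding signal without a jump.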
Referring now to
In step S11, the full-wave rectifying circuit 51 performs full-wave rectification on an input signal, that is, converts the input signal into absolute values, and supplies resultant values to the representative-value determination unit 52.
When an input signal having a waveform illustrated in an upper portion in
Note that, in
As described above, among waveforms having a predetermined time length, a waveform having large amplitude only in a considerably short section is determined as a waveform of click noise. Noise having such a waveform is also referred to as petit noise or pulse noise which is offensive to the ear.
In the signal processing apparatus 11, when click noise is to be detected, an input signal is converted into absolute values. Since human ears do not distinguish click noise by the sign of an amplitude value, converting the input signal into absolute values does not affect the detection of click noise. Note that human ears recognize click noise from a considerable change of amplitude, that is, a dramatic increase and decrease of power within a short period of time.
Referring back to the flowchart illustrated in
As illustrated in
In step S13, the average-value calculation unit 53 obtains a maximum value and an average value of representative values of the blocks included in a frame using the representative values of the blocks supplied from the representative-value determination unit 52 and supplies the maximum value and the average value to the determination unit 54.
For example, as shown in
For example, in the example shown in
In step S14, the determination unit 54 obtains a ratio of the maximum value to the average value for each frame supplied from the average-value calculation unit 53. For example, when the maximum value of the representative values of the blocks included in the frame to be processed is represented by PK and the average value of the representative values of the blocks included in the frame is represented by AVC, the determination unit 54 calculates a ratio RT of the maximum value to the average value as follows: RT=(PK/AVC).
In step S15, the determination unit 54 determines whether the frame to be processed includes click noise in accordance with the obtained ratio RT of the maximum value to the average value. Specifically, when the obtained ratio RT is equal to or larger than a predetermined threshold value th, it is determined that the frame to be processed includes click noise.
For example, when the threshold value th is “3”, the maximum value PK is three times or more larger than the average value AVC in the example shown in
In the signal processing apparatus 11, accuracy of the detection of click noise is improved since the average value of the representative values of the blocks is used instead of an average value of the values of the samples of the input signal.
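The detection flow from the conversion into absolute values to the comparison with the threshold value th may be sketched in Python as follows. The function names, the block length of four samples, and the handling of a completely silent frame are assumptions introduced only for illustration and are not taken from the embodiment.

```python
def block_representatives(samples, block_len):
    """Rectify the signal and take the maximum of each block as its
    representative value. Trailing samples that do not fill a block are
    ignored (an assumption of this sketch)."""
    rectified = [abs(s) for s in samples]
    return [max(rectified[i:i + block_len])
            for i in range(0, len(rectified) - block_len + 1, block_len)]


def frame_has_click(reps, th=3.0):
    """Compare RT = PK / AVC with the threshold value th for one frame."""
    pk = max(reps)                 # maximum value PK of the frame
    avc = sum(reps) / len(reps)    # average value AVC of the frame
    if avc == 0:
        return False               # silent frame: assumed to contain no click
    return pk / avc >= th


# One frame of 8 blocks with 4 samples each.
quiet = [0.1, -0.1, 0.1, -0.1] * 8
clicked = list(quiet)
clicked[13] = -2.0                 # a single large negative sample

quiet_result = frame_has_click(block_representatives(quiet, 4))      # False
clicked_result = frame_has_click(block_representatives(clicked, 4))  # True
```

In the second frame the representative values are seven values of 0.1 and one value of 2.0, so the ratio RT is about 5.9 and exceeds the threshold value of 3.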
It is assumed that, as shown in an upper portion in
Although the input signal shown in the upper portion in
When the input signal is to be processed, the input signal is converted into absolute values. By this, an input signal shown in a lower portion in
Then, the input signal which has been converted into the absolute values is divided into blocks as shown in
Assuming here that the threshold value th for detection of click noise is “3”, since the ratio RT of the maximum value to the average value (RT=(PK21/AVC21)) is smaller than the threshold value th “3” in this example, it is reliably determined that the frame does not include click noise.
On the other hand, the ratio (PK21/AVS21) of the maximum value PK21 to an average value AVS21 of the values of all the samples included in the frame is equal to or larger than the threshold value th “3”. Therefore, if the determination as to whether the frame to be processed includes click noise is performed by comparing this ratio with the threshold value th, a normal sound waveform may be determined to be click noise.
As described above, by detecting click noise using the ratio of the maximum value of the representative values of the blocks to the average value of the representative values of the blocks, the waveform (undulation) of the entire frame is reliably recognized and accuracy of the detection is further improved. That is, even for an input signal which is likely to be mistakenly detected as click noise, such as an audio signal whose average amplitude is small overall but whose amplitude changes considerably in some sections, it may be more reliably determined whether the signal includes click noise.
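The difference between the two averages may be illustrated with a small numerical example in Python. The frame below is synthetic: the pulse heights of 0.8 and 1.0, the floor level of 0.05, and the block length of eight samples are values assumed only for illustration, chosen so that each block contains one narrow pulse, as in a waveform with sparse peaks.

```python
def ratio_block_based(samples, block_len):
    """PK / AVC, where AVC is the average of the per-block maxima."""
    rect = [abs(s) for s in samples]
    reps = [max(rect[i:i + block_len]) for i in range(0, len(rect), block_len)]
    return max(reps) / (sum(reps) / len(reps))


def ratio_sample_based(samples):
    """PK / AVS, where AVS is the average of all rectified samples."""
    rect = [abs(s) for s in samples]
    return max(rect) / (sum(rect) / len(rect))


def pulse_block(peak, block_len=8, floor=0.05):
    """One block holding a single narrow pulse over a low floor."""
    return [peak] + [floor] * (block_len - 1)


# Eight blocks; every block has a pulse, one of them slightly higher.
frame = []
for k in range(8):
    frame += pulse_block(1.0 if k == 3 else 0.8)

rt_blocks = ratio_block_based(frame, 8)    # about 1.2, below th = 3
rt_samples = ratio_sample_based(frame)     # about 6.8, above th = 3
```

With a threshold value of 3, the sample-based ratio would flag this frame although every block carries a similar pulse, whereas the block-based ratio does not.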
Note that, although, in the foregoing description, it is determined whether the frame includes click noise using the maximum value and the average value of the representative values of the blocks included in the frame to be processed, the determination may be made using not only the frame to be processed but also frames in the vicinity of that frame. When the detection of click noise is performed using a plurality of frames including the frame to be processed, accuracy of the detection of click noise may be further improved.
It is assumed that a signal having an audio waveform illustrated in
The audio waveform illustrated in
Since the waveform is generated when the sound “ka” is produced, the waveform does not represent click noise. However, in a case where a frame to be processed includes the rising portion designated by the arrow mark A11 but does not include the pitch waveform portion designated by the arrow mark A12, if detection of click noise is performed using only one frame, false detection may occur. That is, a consonant portion corresponding to a leading portion of the sound denoted by the arrow mark A11 may be detected as click noise.
Accordingly, when the detection of click noise is performed using representative values of blocks of some of the frames, accuracy of the detection is further improved. Specifically, it is assumed that an input signal having the audio waveform illustrated in
In an example shown in
When a maximum value and an average value of representative values of blocks are obtained for each frame, a maximum value PK(n) and an average value AVC(n) are obtained in the frame F(n), a maximum value PK(n+1) and an average value AVC(n+1) are obtained in the frame F(n+1), and a maximum value PK(n+2) and an average value AVC(n+2) are obtained in the frame F(n+2).
Here, in the frames F(n) and F(n+2), the maximum values PK(n) and PK(n+2) are large to some extent due to the consonant portion and the pitch waveform portion. On the other hand, since the frame F(n+1) does not include a sample having large amplitude, the maximum value PK(n+1) is comparatively small.
Furthermore, since the frames F(n) and F(n+1) include only a small number of samples having large amplitude, the average values AVC(n) and AVC(n+1) are comparatively small. On the other hand, in the frame F(n+2) including a pitch waveform having large amplitude, the average value AVC(n+2) is comparatively large.
It is now assumed that the frame F(n) corresponds to a frame to be processed. For example, the determination unit 54 obtains ratios of the maximum value PK(n) of the frame F(n) to be processed to the individual average values AVC(n) to AVC(n+2) of the frames F(n) to F(n+2), respectively, and compares the individual ratios with a threshold value th.
Then, when all of the conditions (PK(n)/AVC(n)≧th), (PK(n)/AVC(n+1)≧th), and (PK(n)/AVC(n+2)≧th) are satisfied, it is determined that the frame F(n) to be processed includes click noise. That is, when the maximum value PK(n) is equal to or larger than the value obtained by multiplying each of the average values of the frames F(n) to F(n+2) by the threshold value, only the amplitude of the portion of the block having the maximum value PK(n) as its representative value projects considerably within the three consecutive frames. Therefore, in this case, it is determined that the frame F(n) includes click noise.
Furthermore, in a case where inequalities PK(n)/AVC(n)≧th and PK(n)/AVC(n+2)<th are satisfied, the maximum value PK(n) is not considerably projected when compared with a degree of the average amplitude of the frame F(n+2) and does not correspond to click noise. Therefore, in this case, it is determined that the frame F(n) does not include click noise.
As described above, by comparing a maximum value of a frame to be processed with average values of other frames near the frame to be processed, accuracy of the detection of click noise may be improved.
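The determination using the average values of a plurality of frames may be sketched in Python as follows. The function name click_in_frame_multi and the numerical values in the example are assumptions for illustration only. The comparison is written in the multiplied form PK(n) ≧ th × AVC, which is equivalent to the ratio test for positive average values and avoids division by zero.

```python
def click_in_frame_multi(pk_n, avcs, th=3.0):
    """Decide whether frame F(n) includes click noise.

    pk_n -- maximum value PK(n) of the frame to be processed
    avcs -- average values [AVC(n), AVC(n+1), AVC(n+2)]
    """
    # PK(n) must stand out against the average value of every frame.
    return all(pk_n >= th * avc for avc in avcs)


# Consonant followed by a pitch waveform: AVC(n+2) is large.
consonant = click_in_frame_multi(0.9, [0.2, 0.1, 0.6])   # False

# Isolated pulse: all three average values stay small.
isolated = click_in_frame_multi(0.9, [0.2, 0.1, 0.15])   # True
```

In the first case PK(n)/AVC(n) is 4.5 but PK(n)/AVC(n+2) is only 1.5, so the frame is not judged to include click noise.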
Note that, click noise may be detected in another way such that a maximum value of a frame to be processed is compared with maximum values of other frames near the frame to be processed. In this case, when the maximum value PK(n) of the frame F(n) to be processed is larger than the maximum values PK(n+1) and PK(n+2) by a predetermined value, for example, it is determined that the frame F(n) includes click noise.
Referring back to the flowchart shown in
Then, the noise section determination unit 81 instructs the replacing unit 85 to output an output signal representing the frame to be processed of the input signal in accordance with the result of the determination supplied from the determination unit 54. The replacing unit 85 outputs the output signal representing a section corresponding to the frame to be processed of the input signal in accordance with the instruction supplied from the noise section determination unit 81, and thereafter, the process proceeds to step S21.
On the other hand, when it is determined that the frame includes click noise in step S15, the determination unit 54 supplies a result of the determination representing that the frame to be processed includes click noise to the noise section determination unit 81, and thereafter, the process proceeds to step S16.
Here, the result of the determination representing that click noise is included includes representative values of the blocks included in the frame to be processed and frames which are adjacent to the frame to be processed so as to sandwich the frame to be processed, a maximum value of the representative values, and an average value of the representative values.
In step S16, the noise section determination unit 81 specifies a noise section including click noise in the section corresponding to the frame to be processed of the input signal using the determination result of the click noise supplied from the determination unit 54.
For example, as shown in an upper portion of
Note that, in
In
First, the noise section determination unit 81 detects a starting position of a noise section of click noise including the block BK(n)−4 having the representative value serving as the maximum value PK(n), that is, a left end of the noise section in the drawing. In this case, the noise section determination unit 81 uses the average value AVC(n−1) of representative values of the blocks of the frame F(n−1) which is the preceding frame relative to the frame F(n) to be processed and which is positioned adjacent to the frame F(n) to be processed as a threshold value ths.
Then, the noise section determination unit 81 detects the first block which has a representative value equal to or smaller than the threshold value ths in a past direction from the block BK(n)−4 which is a center of the click noise. The detected block is determined as a noise starting block.
It is assumed that, in
Furthermore, the noise section determination unit 81 refers to a section corresponding to the block BK(n)−2 serving as the noise starting block of the input signal so as to specify a sample which first performs zero-cross in the past direction from the last sample in the section (block). Then, a position of the specified sample is determined as a starting position of the noise section.
For example, as designated by an arrow mark A41 shown in
In
Here, in the portion of the input signal designated by the arrow mark A41, a sample SP11 located in a right end in the drawing corresponds to the last sample of the section corresponding to the block BK(n)−2 of the input signal, that is, the latest sample in the section. Since a value of the sample SP11 is a positive value, a sample which has a negative value, which is located in a past position relative to the sample SP11, and which is located nearest the sample SP11 corresponds to a sample located in the starting position of the noise section. Therefore, in
After specifying the starting position of the noise section in this way, the noise section determination unit 81 detects a terminating position of the noise section of the click noise, that is, a right end of the noise section in the drawing, which includes the block BK(n)−4 having the maximum value PK(n) serving as the representative value. In this case, the noise section determination unit 81 uses the average value AVC(n+1) of representative values of the blocks included in the frame F(n+1) which is located adjacent to the frame F(n) to be processed in a future direction as a threshold value the.
The noise section determination unit 81 detects the first block which has a representative value equal to or smaller than the threshold value the in the future direction from the block BK(n)−4 which is the center of the click noise, and determines the detected block as a noise terminating block.
It is assumed that, in
Furthermore, the noise section determination unit 81 refers to a section corresponding to the block BK(n)−6 serving as the noise terminating block in the input signal so as to specify a sample which performs zero-cross first in the future direction from a leading sample of the section (block). Then, a location of the sample is determined as a termination position of the noise section.
For example, as designated by an arrow mark A42 shown in
In
The section from the starting position to the termination position specified as described above, that is, the section from the sample SP12 to the sample SP22, corresponds to a noise section NZ. Note that the length of the noise section NZ is specifically referred to as an “interpolation length”.
As described above, in the signal processing apparatus 11, average values of frames which sandwich the frame F(n) to be processed are used as threshold values, and a section including blocks having representative values larger than the threshold values is determined as the noise section NZ.
Assuming that click noise is not included in the frames which sandwich the frame F(n) to be processed, the average values of the representative values of the frames located before and after the frame F(n) represent average values of large amplitude in the vicinity of the frame F(n) in the input signal. Since the representative values of the blocks included in a portion of the click noise may be larger than these average values, a section in which blocks having representative values larger than the average values are consecutively aligned corresponds to a section of the click noise. Accordingly, when the average values of the frames before and after the frame F(n) to be processed are used as the threshold values, the section of the click noise is reliably specified.
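The specification of the noise section described above may be sketched in Python as follows. All function names, the treatment of a zero-valued sample as positive, and the inclusive sample indices are assumptions made for this sketch. The representative values are assumed to be given as one list covering the frame to be processed and its neighbouring frames, and the threshold values ths and the are assumed to have been obtained beforehand as AVC(n−1) and AVC(n+1).

```python
def first_block_at_or_below(reps, peak_idx, threshold, step):
    """Walk from the peak block in direction `step` (-1 past, +1 future)
    and return the first block whose representative value is equal to or
    smaller than `threshold`, or None if there is no such block."""
    i = peak_idx + step
    while 0 <= i < len(reps):
        if reps[i] <= threshold:
            return i
        i += step
    return None


def first_sign_change(samples, origin, step):
    """Return the nearest sample in direction `step` whose sign differs
    from that of samples[origin] (zero is treated as positive here)."""
    ref_positive = samples[origin] >= 0
    i = origin + step
    while 0 <= i < len(samples):
        if (samples[i] >= 0) != ref_positive:
            return i
        i += step
    return None


def noise_section(samples, reps, block_len, peak_idx, ths, the):
    """Return (start, end) sample indices, both inclusive, or None."""
    start_blk = first_block_at_or_below(reps, peak_idx, ths, -1)
    end_blk = first_block_at_or_below(reps, peak_idx, the, +1)
    if start_blk is None or end_blk is None:
        return None
    last_of_start = (start_blk + 1) * block_len - 1   # last sample of start block
    first_of_end = end_blk * block_len                # leading sample of end block
    start = first_sign_change(samples, last_of_start, -1)
    end = first_sign_change(samples, first_of_end, +1)
    if start is None or end is None:
        return None
    return start, end


# Six blocks of four samples; block 3 holds the click.
samples = [0.1, 0.1, -0.1, -0.1] * 6
samples[12:16] = [2.0, -1.8, 1.5, -1.2]
reps = [max(abs(s) for s in samples[i:i + 4]) for i in range(0, 24, 4)]
section = noise_section(samples, reps, 4, peak_idx=3, ths=0.1, the=0.1)
```

In this example the noise starting block is block 2 and the noise terminating block is block 4, and the zero-cross search moves the boundaries to samples 9 and 18.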
Note that the noise section may be determined such that a length of the noise section has a value corresponding to a power of two.
In this case, if the number of samples in a section from the noise starting position to the noise terminating position, that is, a section from the sample SP12 to the sample SP22 corresponds to the power of two, the section from the sample SP12 to the sample SP22 is determined as a noise section without change.
On the other hand, when the number of samples in the section from the sample SP12 to the sample SP22 does not correspond to the power of two, among values corresponding to the power of two which are larger than the number of samples in the section from the sample SP12 to the sample SP22, the smallest value is determined as a length of the noise section. It is assumed that the number of samples in the section from the sample SP12 to the sample SP22 is “368”. Since “368” is not a value corresponding to the power of two, a value “512” which is larger than “368” but which is the smallest value corresponding to the power of two is determined as the length of the noise section.
Furthermore, when the length of the noise section represents a value corresponding to the power of two, the starting position of the noise section is located in the sample SP12, that is, located in a position of a sample which first performs zero-cross viewed from an end of the noise starting block. Therefore, a terminating position of the noise section is located in a terminal end of the section which has the length corresponding to the power of two and which is started from the position of the sample SP12.
As described above, since the length of the noise section is determined to be the smallest value among values which correspond to the power of two and which are equal to or larger than the number of samples in the section from the sample SP12 to the sample SP22, a calculation amount of an interpolation process performed in a later stage may be reduced. Specifically, for example, a process in step S19 which will be described hereinafter, that is, a weighting calculation performed at a time of cross-fade of a preceding interpolation waveform and a succeeding interpolation waveform, may be realized using only multiplication and shift operations.
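The rounding to a power of two, and the reason it permits a shift-based weighting calculation, may be illustrated by the following sketch. The function names are hypothetical, and the weights of the form i/N with integer sample values are an assumption made for the example, not a statement of the weights actually used in step S19.

```python
def noise_section_length(num_samples):
    """Smallest power of two that is equal to or larger than num_samples."""
    if num_samples < 1:
        raise ValueError("num_samples must be positive")
    return 1 << (num_samples - 1).bit_length()


def crossfade_shift(past, future, shift):
    """Cross-fade two integer waveforms of length N = 2**shift.

    The future weight is assumed to be i/N and the past weight (N - i)/N,
    so the division by N reduces to a right shift.
    """
    n = 1 << shift
    if len(past) != n or len(future) != n:
        raise ValueError("both waveforms must have length 2**shift")
    return [((n - i) * p + i * f) >> shift
            for i, (p, f) in enumerate(zip(past, future))]
```

For the example given above, noise_section_length(368) yields 512, whereas a section that already has 512 samples is left unchanged.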
Furthermore, in the foregoing description, the noise starting position and the noise terminating position are reliably specified by specifying samples which first perform zero-cross from the ends of the noise starting block and the noise terminating block. However, this process may not be performed. In this case, for example, a leading sample of the noise starting block is determined as a starting position of the noise section whereas a last sample of the noise terminating block is determined as a terminating position.
As described above, by omitting the process of searching for zero-cross points and performing interpolation for each block, the calculation amount is reduced and the noise section is immediately specified. In this case, since the starting position and the terminating position of the noise section may not correspond to zero-cross points, a slight direct current component may be generated due to the interpolation of the noise section. However, this is unlikely to deteriorate acoustic quality.
Referring back to the flowchart shown in
In step S17, the past interpolation waveform generation unit 82 generates a past interpolation waveform using a sample which has the interpolation length and which is located in the past relative to the noise starting position and using the information representing the noise section NZ supplied from the noise section determination unit 81 and supplies the past interpolation waveform to the synthesis unit 84.
For example, when a signal having a waveform designated by an arrow mark A43 shown in
Specifically, the section PR of the input signal is adjacent to the noise section NZ on a past side, that is, adjacent to the noise section NZ on a left side in
In step S18, the future interpolation waveform generation unit 83 generates a future interpolation waveform using a sample which has the interpolation length and which is located on a future side relative to the noise terminating position and using information on the noise section NZ supplied from the noise section determination unit 81, and supplies the future interpolation waveform to the synthesis unit 84.
For example, when the signal having the waveform designated by the arrow mark A43 shown in
Specifically, the section FR is adjacent to the noise section NZ on a future side, that is, adjacent to the noise section NZ on a right side in
As described above, since the waveforms used for the interpolation of the noise section NZ are generated using the sections which have the interpolation length and which are located before and after the noise section NZ of the input signal, powers of portions in the vicinity of the noise section NZ in the input signal after being subjected to the interpolation may be uniform. By this, a natural waveform is obtained without feeling of strangeness.
Furthermore, since the sections of the input signal before and after the noise section NZ are subjected to the time reversal, the first sample of the past interpolation waveform PS and the last sample of the future interpolation waveform FS correspond to the sample located immediately before the noise section and the sample located immediately after the noise section, respectively. Accordingly, when the interpolation is performed on the noise section using the past interpolation waveform PS and the future interpolation waveform FS, a connection between a waveform to be interpolated and waveforms located in boundaries of the noise section may become more natural without feeling of strangeness.
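The generation of the two interpolation waveforms by time reversal may be sketched as follows. The function name and the half-open index convention (start is the first sample of the noise section, end is one past its last sample) are assumptions made for the example.

```python
def interpolation_sources(signal, start, end):
    """Return (PS, FS) for the noise section signal[start:end].

    PR is the section of interpolation length immediately before the noise
    section, FR the one immediately after it; each is time-reversed.
    """
    length = end - start  # interpolation length
    if start - length < 0 or end + length > len(signal):
        raise ValueError("not enough samples around the noise section")
    pr = signal[start - length:start]
    fr = signal[end:end + length]
    ps = pr[::-1]  # first sample of PS = sample immediately before the noise
    fs = fr[::-1]  # last sample of FS = sample immediately after the noise
    return ps, fs
```

With this arrangement, the first sample of PS and the last sample of FS coincide with the samples bordering the noise section, which is the property relied on above.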
Referring back to the flowchart shown in
Specifically, the synthesis unit 84 multiplies values of samples included in the past interpolation waveform PS by weights designated by an arrow mark A44 shown in
In an example shown in
On the other hand, a weight to multiply a sample at a right end of the future interpolation waveform FS in
The synthesis unit 84 obtains sums of values of the samples included in the past interpolation waveform PS which are multiplied by the weights and values of the samples included in the future interpolation waveform FS which are multiplied by the weights and which are located so as to correspond to the samples of the past interpolation waveform PS so as to generate an interpolation waveform HS. For example, a sum of a value of the sample at the right end of the past interpolation waveform PS in
Referring back to the flowchart shown in
In step S20, the replacing unit 85 replaces the noise section NZ of the input signal by the interpolation waveform HS supplied from the synthesis unit 84 using the information representing the noise section NZ supplied from the noise section determination unit 81 so that the click noise is reduced.
For example, when a signal having a waveform designated by an arrow mark A46 shown in
After the noise is removed in step S20 or when it is determined that the click noise is not included in step S15, the process proceeds to step S21 where the signal processing apparatus 11 determines whether the process is to be terminated. For example, when the removal of the click noise has been performed on all sections of the input signal, it is determined that the process is to be terminated.
When it is determined that the process is not to be terminated in step S21, the process returns to step S11 and the operations described above are performed again. That is, a next frame is determined as a frame to be processed, and the detection and the removal of click noise are performed on the frame.
On the other hand, when it is determined that the process is to be terminated in step S21, the noise reduction process is terminated.
As described above, the signal processing apparatus 11 divides an input signal into a plurality of blocks, obtains representative values of the blocks, and detects click noise using a ratio of an average value and a maximum value of the representative values of the blocks included in a frame. Then, the signal processing apparatus 11 specifies a click noise section of the input signal, generates an interpolation waveform using sections which have the same length as the noise section and which are located before and after the noise section, and removes the click noise.
By this, since representative values are calculated for individual blocks and a ratio of an average value to a maximum value of the representative values of a frame including the blocks is obtained, click noise is detected more easily and more reliably with a reduced calculation amount. Accordingly, the click noise may be reliably removed from the input signal, and natural sound is obtained in terms of acoustic sense without feeling of strangeness.
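The block-based detection summarized above may be sketched as follows. The function names, the use of absolute sample values for the representative value, the direction of the ratio (maximum divided by average), and the ratio threshold are all assumptions made for the example.

```python
def block_representatives(signal, block_len):
    """Representative value of each complete block (assumed: max magnitude)."""
    return [max(abs(s) for s in signal[i:i + block_len])
            for i in range(0, len(signal) - block_len + 1, block_len)]


def frame_has_click(rep_values, ratio_threshold):
    """Flag a frame when its largest representative value stands out
    against the average of the representative values of the frame."""
    average = sum(rep_values) / len(rep_values)
    if average == 0:
        return False  # silent frame: nothing to detect
    return max(rep_values) / average > ratio_threshold
```

Because only one value per block enters the ratio, the cost per frame grows with the number of blocks rather than with the number of samples.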
Note that, specifically, when a past interpolation waveform or a future interpolation waveform is generated, if signs of samples located before or after a sample at a starting position or a terminating position of a noise section are different from each other, signs of values of samples included in a sample group of a section of the input signal used for interpolation are inverted.
Specifically, it is assumed that, as illustrated in an upper portion of
Note that, in
In the input signal illustrated on the upper side of
In this case, the past interpolation waveform generation unit 82 determines whether signs of the samples SP43 and SP44 which are temporally located before and after the sample SP42 are the same as each other and generates a past interpolation waveform. For example, in the example shown in
Therefore, the past interpolation waveform generation unit 82 extracts a portion of the input signal surrounded by a rectangle K11 which is illustrated in a center portion of the drawing, that is, a section which has the interpolation length (noise section length) and which includes the sample SP43 at a right end in the drawing and performs time reversal on the section. Furthermore, the past interpolation waveform generation unit 82 inverts signs of values of samples of a waveform obtained by performing the time reversal on the portion of the input signal surrounded by the rectangle K11 so as to obtain a past interpolation waveform. By this, as illustrated in a lower portion of
In the lower portion of
By this, when signs of samples located before and after the sample SP42 at the starting position of the noise section are different from each other, signs of samples included in a section of the input signal used for generation of a past interpolation waveform are inverted when the past interpolation waveform is generated. Accordingly, when the noise section of the input signal is replaced by the past interpolation waveform as illustrated in the lower portion in
On the other hand, as illustrated on an upper portion of
Note that, also in
In an example shown in the upper portion of
Here, the past interpolation waveform generation unit 82 determines whether signs of values of the samples SP63 and SP64 which are temporally located before and after the sample SP62, respectively, are the same as each other. For example, in the example shown in
Therefore, the past interpolation waveform generation unit 82 extracts a portion of the input signal surrounded by a rectangle K31, that is, a section which has the interpolation length and which includes the sample SP63 at a right end thereof as shown in a center portion in the drawing and performs time reversal on the section so as to obtain a past interpolation waveform. By this, as shown in a lower portion of
In the lower portion of
As described above, when the signs of the values of the samples located before and after the sample SP62 located in the starting position of the noise section are the same as each other, the signs of the values of samples included in the section of the input signal used for the generation of the past interpolation waveform are not inverted. Accordingly, as shown in the lower portion of
Note that, as with the case of the past interpolation waveform, in a case where the future interpolation waveform is generated, when signs of values of samples located before and after a noise terminating position are different from each other, signs of values of samples used for the future interpolation waveform are inverted.
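The sign handling for the past interpolation waveform may be sketched as follows. The function name is hypothetical, and it is assumed for the example that the samples "before and after" the starting position are its immediate neighbors and that a zero-valued sample is treated as non-negative.

```python
def past_interpolation_waveform(signal, start, length):
    """Past interpolation waveform for a noise section starting at `start`.

    The source section has `length` samples and ends at the sample just
    before `start`; it is time-reversed, and its signs are inverted when
    the neighbors of the starting sample have different signs.
    """
    if start - length < 0 or start + 1 >= len(signal):
        raise ValueError("not enough samples around the starting position")
    section = signal[start - length:start]
    reversed_section = section[::-1]
    before, after = signal[start - 1], signal[start + 1]
    if (before < 0) != (after < 0):
        # Signs differ: invert so the waveform keeps crossing zero in the
        # same direction as the original signal at the starting position.
        return [-v for v in reversed_section]
    return reversed_section
```

The future interpolation waveform may be handled in the same manner around the noise terminating position.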
Furthermore, in the foregoing description, a maximum value of values of samples included in a block is determined as a representative value of the block. However, the representative value may be determined by a calculation using values of samples included in the block which satisfy a predetermined condition. For example, the representative value may be obtained by performing weighted summation on the values of all the samples included in the block. Alternatively, a predetermined number of samples may be selected in a descending order of the sample values, and an average value of the values of the selected samples may be determined as the representative value.
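The three ways of obtaining a representative value mentioned above may be sketched as follows. The function names and the choice of weights are hypothetical, and whether raw or absolute sample values are supplied is left to the caller.

```python
def rep_max(block):
    """Maximum of the sample values in the block."""
    return max(block)


def rep_weighted_sum(block, weights):
    """Weighted summation over all sample values in the block."""
    if len(block) != len(weights):
        raise ValueError("one weight per sample is required")
    return sum(w * v for w, v in zip(weights, block))


def rep_top_k_mean(block, k):
    """Average of the k largest sample values in the block."""
    top = sorted(block, reverse=True)[:k]
    return sum(top) / len(top)
```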
In the foregoing description, instead of a correlation calculation method, which is accompanied with a large amount of calculation and cost, the method for realizing effective reduction of click noise has been described. The calculation amount is reduced by replacing a waveform in a noise section by an interpolation waveform. However, in this method, when an obtained output signal is reproduced, sound corresponding to a discontinuous waveform of the output signal may be obtained in the vicinity of ends of a noise section which has been replaced by an interpolation waveform.
Specifically, it is assumed that a signal designated by an arrow mark A61 shown in an upper portion in
Note that, in
As designated by the arrow mark A61, when the noise section NZ31 is detected in the input signal, in the noise reduction process illustrated in
Then, as designated by an arrow mark A64, the noise section NZ31 of the input signal is replaced by an interpolation waveform HS21 obtained by performing cross-fade using the past interpolation waveform and the future interpolation waveform so that click noise is removed.
In this noise removal method, since the final interpolation waveform HS21 is generated using the past interpolation waveform and the future interpolation waveform by performing weighting in accordance with a distance to the noise section NZ31, unnaturalness of the waveform in the noise section NZ31 is reduced. Furthermore, in this method, since discontinuity of sample values at a starting position and a terminating position of the noise section NZ31 is avoided in principle, it is unlikely to generate apparent feeling of strangeness and abnormal sound.
However, when waveforms having low frequencies are included in portions before and after the noise section NZ31 of the input signal, aliasing waveforms apparently appear in the portions before and after the noise section NZ31 of an output signal and the aliasing portions have high frequency components. Therefore, when the output signal is reproduced, abnormal sound corresponding to the discontinuity of a waveform of the output signal is obtained as a result.
In the example shown in
Similarly, a section E13 located in the vicinity of the noise section terminating position has a waveform including a high frequency component. This is because when the click noise is to be removed, only continuity of sample values is taken into consideration among continuity to be considered in the noise section starting position and the noise section terminating position.
Accordingly, a noise reduction process may be performed so that a smoother waveform of the interpolation portion of the output signal is obtained. Hereinafter, referring to
In step S57, a past interpolation waveform generation unit 82 generates a past interpolation waveform using samples of a section which has the interpolation length and which precedes the noise starting position, on the basis of information representing a noise section supplied from a noise section determination unit 81, and supplies the past interpolation waveform to a synthesis unit 84.
For example, when a signal having a waveform designated by an arrow mark A81 shown in
Note that, in
In an example shown in
In step S58, a future interpolation waveform generation unit 83 generates a future interpolation waveform using a sample which is located on a future side relative to a noise terminating position and which has the interpolation length using the information representing the noise section supplied from the noise section determination unit 81 and supplies the future interpolation waveform to the synthesis unit 84.
For example, when the signal having the waveform designated by the arrow mark A81 shown in
As described above, in the noise reduction process shown in
In step S59, the synthesis unit 84 performs cross-fade using the past interpolation waveform supplied from the past interpolation waveform generation unit 82 and the future interpolation waveform supplied from the future interpolation waveform generation unit 83 so as to generate an interpolation waveform.
In step S59, the same process as step S19 in
For example, weights to multiply the samples of the past interpolation waveform gradually become smaller toward the future side, and a weight of the most preceding sample in a past direction is “1” and a weight of the most succeeding sample in a future direction is “0”. Conversely, weights to multiply the samples of the future interpolation waveform gradually become larger toward the future side, and a weight of the most preceding sample in the past direction is “0” and a weight of the most succeeding sample in the future direction is “1”.
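The cross-fade with the weights described above may be sketched as follows. The function name is hypothetical, and the linear change of the weights between “1” and “0” is an assumption, since the description only states that the weights change gradually.

```python
def crossfade(past, future):
    """Weighted sum of two waveforms of equal length.

    The past weight falls from 1 (first sample) to 0 (last sample) and the
    future weight rises from 0 to 1; a linear ramp is assumed.
    """
    n = len(past)
    if n != len(future) or n < 2:
        raise ValueError("waveforms must have the same length of at least 2")
    out = []
    for i in range(n):
        w_future = i / (n - 1)
        w_past = 1.0 - w_future
        out.append(w_past * past[i] + w_future * future[i])
    return out
```

The first output sample therefore equals the first sample of the past interpolation waveform, and the last output sample equals the last sample of the future interpolation waveform.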
After the synthesis unit 84 generates the interpolation waveform and supplies the interpolation waveform to a replacing unit 85, the process proceeds from step S59 to step S60.
In step S60, a replacing unit 85 replaces the noise section of the input signal by the interpolation waveform supplied from the synthesis unit 84 using information representing the noise section supplied from the noise section determination unit 81 so that the click noise of the input signal is reduced.
For example, when a signal designated by an arrow mark A82 shown in
As described above, in a state in which the noise section NZ41 is simply replaced by the interpolation waveform HS31, discontinuity (jump of sample values) of the waveform apparently occurs in a boundary section PS11 located in the vicinity of a noise starting position and a boundary section FS11 located in the vicinity of a noise terminating position. Note that the boundary section PS11 includes the noise starting position and the boundary section FS11 includes the noise terminating position.
Accordingly, the replacing unit 85 replaces a waveform in the vicinity of the boundary section PS11 and a waveform in the vicinity of the boundary section FS11 by waveforms which are newly generated by cross-fade so as to prevent the generation of the discontinuity of a waveform of an output signal.
Specifically, in step S61, the replacing unit 85 performs replacement of a waveform of the input signal included in a section which is adjacent to the noise starting position of the input signal which is obtained by performing the replacement using the interpolation waveform, that is, the input signal obtained by the process performed in step S60.
Specifically, as designated by an arrow mark A83 shown in
Next, the replacing unit 85 determines a section MP11 which is a predetermined section, which has the same length as the section BP11 and which is temporally located before (past) the section BP11 of the input signal. In an example shown in
Then, the replacing unit 85 performs cross-fade using a waveform of the section BP11 of the input signal and a waveform of the section MP11 of the input signal and replaces the section BP11 by a waveform HP11 obtained by the cross-fade as designated by an arrow mark A84 so that discontinuity of a waveform is avoided.
For example, when the cross-fade is performed, weights to multiply samples included in the section BP11 become gradually smaller toward a future side, and a weight of the most preceding sample on a past side is “1” and a weight of the most succeeding sample on the future side is “0”. Conversely, weights to multiply samples included in the section MP11 gradually become larger toward the future side, and a weight of the most preceding sample in the past side is “0” and a weight of the most succeeding sample in the future side is “1”.
Accordingly, in the vicinity of the section BP11 of the input signal which has been replaced by the waveform HP11, a waveform in the vicinity of a terminating position of the section MP11 is smoothly continued to a waveform in the vicinity of a starting position of the section PR31. Accordingly, the discontinuity of the waveform is avoided. As a result, natural sound is obtained without feeling of strangeness in terms of acoustic sense.
Specifically, when the interpolation waveform HS31 is generated, a weight to multiply a sample at a left end of the section PR31 in the drawing is “1” whereas a weight to multiply a sample at a left end of the section FR31 in the drawing is “0”. Accordingly, a sample at a left end of the interpolation waveform HS31 in the drawing is the same as the sample at the left end of the section PR31.
On the other hand, when the waveform HP11 is generated, a weight to multiply a sample at a right end of the section MP11 in the drawing is “1” whereas a weight to multiply a sample at a right end of the section BP11 in the drawing is “0”. Accordingly, a sample at a right end of the waveform HP11 in the drawing is the same as the sample at the right end of the section MP11.
When the waveform HP11 obtained as described above is arranged immediately before the interpolation waveform HS31, in a boundary portion between the waveform HP11 and the interpolation waveform HS31, the sample at the right end of the section MP11 and the sample at the left end of the section PR31 which are adjacent to each other in the original input signal are arranged adjacent to each other. That is, since the section BP11 of the input signal is replaced by the waveform HP11, a natural and smooth waveform is obtained in the vicinity of the starting position of the noise section NZ41.
Referring back to the flowchart shown in
Specifically, as designated by the arrow mark A83 shown in
Next, the replacing unit 85 determines a predetermined section which has the same length as the section BF11 and which is temporally located after the section BF11 of the input signal as a section MF11. In the example shown in
Then, the replacing unit 85 performs cross-fade on a waveform of the section BF11 and a waveform of the section MF11 and replaces the section BF11 of the input signal by a waveform HF11 obtained by the cross-fade as designated by an arrow mark A84 so that the discontinuity of a waveform is avoided.
For example, when the cross-fade is performed, weights to multiply samples included in the section BF11 gradually become larger toward a future side, and a weight of the most preceding sample on a past side is “0” and a weight of the most succeeding sample on the future side is “1”. Conversely, weights to multiply samples of the section MF11 gradually become smaller toward the future side, and a weight of the most preceding sample on the past side is “1” and a weight of the most succeeding sample on the future side is “0”.
Accordingly, in the vicinity of the section BF11 of the input signal which has been replaced by the waveform HF11, as with the case of the section BP11, a waveform in the vicinity of a starting position of the section MF11 and a waveform in the vicinity of a terminating position of the section FR31 are smoothly connected to each other. As a result, the discontinuity of a waveform is avoided, and natural sound corresponding to an output signal is obtained without feeling of strangeness in terms of acoustic sense.
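The replacement of the boundary sections BP11 and BF11 may be sketched as follows. The function names, the half-open index convention, and the linear weights are assumptions made for the example. It is also assumed that the sections PR31 and FR31 are the sections of the interpolation length immediately before and immediately after the noise section, that the sections MP11 and MF11 lie immediately before PR31 and immediately after FR31, respectively, and that the noise section signal[start:end] has already been replaced by the interpolation waveform.

```python
def _fade(fade_out, fade_in):
    """Linear cross-fade: fade_out weighted 1 -> 0, fade_in weighted 0 -> 1."""
    n = len(fade_out)
    return [((n - 1 - i) * a + i * b) / (n - 1)
            for i, (a, b) in enumerate(zip(fade_out, fade_in))]


def smooth_boundaries(signal, start, end, m):
    """Replace the m-sample sections adjacent to the noise section."""
    length = end - start  # interpolation length
    if m < 2 or start - length - m < 0 or end + length + m > len(signal):
        raise ValueError("not enough samples around the noise section")
    out = list(signal)
    # Past side: BP11 (weights 1 -> 0) cross-faded with MP11 (weights 0 -> 1).
    bp = signal[start - m:start]
    mp = signal[start - length - m:start - length]
    out[start - m:start] = _fade(bp, mp)
    # Future side: MF11 (weights 1 -> 0) cross-faded with BF11 (weights 0 -> 1).
    bf = signal[end:end + m]
    mf = signal[end + length:end + length + m]
    out[end:end + m] = _fade(mf, bf)
    return out
```

Under these assumptions, the last sample of the waveform HP11 equals the last sample of the section MP11 and the first sample of the waveform HF11 equals the first sample of the section MF11, so each is followed or preceded by the sample that was adjacent to it in the original input signal.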
The replacing unit 85 outputs the input signal obtained through the process described above to a subsequent stage as the output signal.
Referring back to the flowchart shown in
In step S63, the signal processing apparatus 11 determines whether the process is to be terminated. When removal of the click noise has been performed on all sections of the input signal, for example, it is determined that the process is to be terminated.
When it is determined that the process is not to be terminated in step S63, the process returns to step S51 and the processes described above are performed again. On the other hand, when it is determined that the process is to be terminated in step S63, the noise reduction process is terminated.
As described above, the signal processing apparatus 11 replaces the noise section of the input signal by the interpolation waveform, newly generates waveforms using sections adjacent to the noise section and the sections adjacent to sections used for the generation of the interpolation waveform, and thereafter, replaces the sections adjacent to the noise section by the newly-generated waveforms. By this, connection of the interpolation waveforms is attained so that the discontinuity of a waveform is prevented from being generated, and natural sound is obtained without feeling of strangeness in terms of acoustic sense.
When the noise reduction process illustrated in
Note that although the sections BP11 and BF11 which are adjacent to the noise section NZ41 shown in
The series of processes described above may be executed by hardware or software. When the series of processes is executed by software, programs included in the software are installed from a program recording medium to a computer which is incorporated in dedicated hardware or to a general-purpose personal computer capable of executing various functions by installing various programs.
In the computer, a CPU (Central Processing Unit) 301, a ROM (Read Only Memory) 302, and a RAM (Random Access Memory) 303 are connected to one another through a bus 304.
An input/output interface 305 is also connected to the bus 304. To the input/output interface 305, an input unit 306 including a keyboard, a mouse, and a microphone, an output unit 307 including a display and a speaker, a recording unit 308 including a hard disk or a nonvolatile memory, a communication unit 309 including a network interface, and a drive 310 which drives a removable medium 311 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory are connected.
In the computer configured as described above, when the CPU 301 loads programs recorded in the recording unit 308 to the RAM 303 through the input/output interface 305 and the bus 304 and executes the programs, the series of processes described above is performed.
The programs executed by the computer (CPU 301) are provided by being recorded in the removable medium 311 which is a package medium such as a magnetic disk (including a flexible disk), an optical disc (a CD-ROM (Compact Disc-Read Only Memory), a DVD (Digital Versatile Disc), or the like), a magneto-optical disc, or a semiconductor memory or by a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
The programs may be installed in the recording unit 308 through the input/output interface 305 by inserting the removable medium 311 into the drive 310. Furthermore, the programs may be received by the communication unit 309 through the wired or wireless transmission medium and installed in the recording unit 308. Alternatively, the programs may be installed in advance in the ROM 302 or the recording unit 308.
Note that the programs to be executed by the computer may be processed in time series in accordance with the order described in this specification, and alternatively, the programs may be processed in parallel or at a timing when the programs are called.
Note that embodiments of the present invention are not limited to the foregoing embodiment, and various modifications may be made without departing from the scope of the present invention.
The present application contains subject matter related to that disclosed in Japanese Priority Patent Application JP 2010-092817 filed in the Japan Patent Office on Apr. 14, 2010 and Japanese Priority Patent Application JP 2010-175335 filed in the Japan Patent Office on Aug. 4, 2010, the entire contents of which are hereby incorporated by reference.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.
| Number | Date | Country | Kind |
|---|---|---|---|
| P2010-092817 | Apr 2010 | JP | national |
| P2010-175335 | Aug 2010 | JP | national |