This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2016-254286, filed on Dec. 27, 2016, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to an audio coding device and an audio coding method.
Spectral band replication (SBR) is one of the audio coding techniques used to compress and expand audio signals such as voice and music. The SBR technique compresses an audio signal by reproducing its high-band component from its low-band component. Because it enables high-quality coding at a low bit rate, the SBR technique is used for a variety of purposes.
In audio coding, the SBR technique extracts a low-band component from an input sound source and, to reduce the amount of information, extracts only envelope information and tone information from the high-band component. The low-band component is replicated to reproduce the high-band component, and the envelope information is used to correct the energy magnitude of the replicated high-band component. However, a signal that exists only in the high-band component is difficult to reproduce by replicating the low-band component. The SBR technique therefore acquires, as tone information, the frequency and energy magnitude of any tone signal that exists only in the high-band component. A tone signal is an artificially generated signal having a single frequency; tone signals that exist only in the high band occur, for example, in music performed on electronic musical instruments. At the time of decoding, the tone signal is added, based on the tone information, to the high-band component reproduced with the envelope information, so that the high-band component may be decoded accurately. For example, a technique using SBR is disclosed in Japanese Laid-open Patent Publication No. 2008-96567.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2008-96567
According to an aspect of the embodiment, an audio coding device includes a filter configured to extract, from an input signal, a low-band signal having a first frequency component; a memory; and a processor coupled to the memory and configured to extract envelope information relating to an envelope of a high-band signal in the input signal, the high-band signal having a second frequency component higher than the first frequency component; detect, from the input signal, tone information on a tone signal included in a high-band signal spectrum; correct the envelope information based on a difference between the frequency of the tone signal and the frequency of a peak of the envelope; and code the low-band signal, the tone information, and the corrected envelope information.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In the technique of Japanese Laid-open Patent Publication No. 2008-96567, a peak of the envelope reproduced from the envelope information and the peak of the tone signal given by the tone information may be separated by only a very small frequency difference. In that case, when the high-band component is reproduced by the SBR technique from the envelope information and the tone information, two adjacent peaks appear in the decoded signal. The two adjacent peaks produce an audible beat, and the quality of the decoded audio signal deteriorates significantly.
The disclosed technique aims to implement coding processing that allows a tone signal to be decoded without a beat, even when an envelope peak whose frequency is close to that of the tone signal is extracted.
The envelope information correcting unit 5 corrects envelope information based on the envelope information output from the envelope information extracting unit 3 and the tone information output from the tone information detecting unit 4. The envelope information correcting unit 5 includes an envelope peak detecting unit 7, a correction determining unit 8, and a peak suppressing unit 9.
When the envelope peak detecting unit 7 detects, in the envelope information, a peak equal to or larger than a threshold set in advance, it outputs the frequency and the value of the peak as peak information. The correction determining unit 8 determines whether or not the envelope information needs to be corrected, based on the peak information output from the envelope peak detecting unit 7 and the tone information output from the tone information detecting unit 4. If it determines, from the peak information and the frequency and peak value included in the tone information, that correction is necessary, the correction determining unit 8 outputs, as the determination result, a correction control signal instructing the peak suppressing unit 9 to correct the envelope information. On receiving this correction control signal, the peak suppressing unit 9 corrects the envelope information received from the envelope information extracting unit 3 based on the peak information received from the envelope peak detecting unit 7, and outputs the corrected envelope information to the coding unit 6.
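As a minimal sketch of what the envelope peak detecting unit 7 could do, the function below scans an envelope sampled on a frequency-index grid and reports every local maximum at or above the preset threshold as a (frequency index, peak value) pair. The list representation, the local-maximum rule, and the name `detect_envelope_peaks` are assumptions made for illustration, not details taken from the embodiment.

```python
def detect_envelope_peaks(envelope: list[float], threshold: float) -> list[tuple[int, float]]:
    """Return (frequency index, peak value) for every local maximum >= threshold.

    Hypothetical helper. Only interior points are examined, so the first and
    last indices are never reported as peaks.
    """
    peaks = []
    for k in range(1, len(envelope) - 1):
        v = envelope[k]
        # Local maximum: strictly above the left neighbour, not below the right one.
        if v >= threshold and v > envelope[k - 1] and v >= envelope[k + 1]:
            peaks.append((k, v))
    return peaks


envelope = [1.0, 2.0, 6.0, 3.0, 2.5, 4.0, 1.0]
print(detect_envelope_peaks(envelope, threshold=5.0))  # [(2, 6.0)]
```

The peak at index 5 (value 4.0) is a local maximum but falls below the threshold, so only index 2 is reported.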
The coding unit 6 executes coding and multiplexing processing of a low-band signal received from the low-pass filter 2, the corrected envelope information received from the envelope information correcting unit 5, and the tone information received from the tone information detecting unit 4 and outputs the processing result as a stream signal.
As described above, the audio coding device 1 may correct the envelope information based on the envelope information and the tone information.
A spectrum 45 is the frequency spectrum obtained by transforming the input sound source into the frequency domain with a Fourier transform or a similar transform. The low-pass filter 2 in the audio coding device 1 extracts the low-band spectrum lying in the region 41 of the spectrum 45. An envelope 43 is the envelope information extracted by the envelope information extracting unit 3 from the high-band spectrum lying in the region 42 of the spectrum 45. A peak 44 is the tone information detected by the tone information detecting unit 4 from the high-band spectrum in the region 42 of the spectrum 45.
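The division of labour among the low-pass filter 2, the envelope information extracting unit 3, and the tone information detecting unit 4 can be sketched on a plain power spectrum as follows. This is a simplified illustration under stated assumptions: the envelope is taken as the mean power of each high-band sub-band, and a simple peak-to-mean ratio test (with a hypothetical `tone_ratio` parameter) stands in for whatever tonality detection the tone information detecting unit 4 actually uses. The function name is also hypothetical.

```python
from typing import Optional


def split_and_extract(power: list[float], crossover: int, band_edges: list[int],
                      tone_ratio: float = 5.0):
    """Split a power spectrum at `crossover` and derive simplified SBR-style parameters.

    Returns (low_band, envelope, tone): envelope holds the mean power of each
    high-band sub-band [band_edges[i], band_edges[i+1]); tone is the
    (frequency index, power) of the strongest high-band bin, or None if that bin
    is not at least `tone_ratio` times the mean high-band power.
    """
    low_band = power[:crossover]                  # role of the low-pass filter 2

    envelope = []                                 # role of the extracting unit 3
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        envelope.append(sum(power[lo:hi]) / (hi - lo))

    high = range(crossover, len(power))           # role of the detecting unit 4
    k_max = max(high, key=lambda k: power[k])
    high_mean = sum(power[k] for k in high) / len(high)
    tone: Optional[tuple[int, float]] = None
    if power[k_max] >= tone_ratio * high_mean:
        tone = (k_max, power[k_max])

    return low_band, envelope, tone


spectrum = [5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 30.0, 1.0, 1.0, 1.0, 1.0]
print(split_and_extract(spectrum, crossover=4, band_edges=[4, 8, 12]))
# ([5.0, 4.0, 3.0, 2.0], [8.25, 1.0], (7, 30.0))
```

Note that the tone at index 7 also raises the envelope value of the first sub-band (8.25 versus 1.0), which is the kind of overlap between envelope and tone information that the later correction addresses.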
As described above, the audio coding device 1 may enhance the compression ratio in coding by executing the SBR processing on the input sound source to extract the envelope information and the tone information regarding the high-band signal.
A graph 18 represents the processing of extracting tone information from an original sound consisting of a tone signal, after a frequency transform. In the graph 18, a spectrum 11 represents the frequency-transformed spectrum of the original sound, and regions 17a and 17b represent sub-band regions. The sub-band regions are obtained by dividing the frequency region targeted by the audio coding into plural frequency regions. If the peak of the spectrum 11 of the original sound lies at the boundary between the region 17a and the region 17b, as in the graph 18, information on the peak of the spectrum 11 is included in both regions. In the audio coding device 1, the extraction of the envelope information and the detection of the tone information are executed separately in each sub-band region. Therefore, if, for example, the two processes are executed at different resolutions, the tone information may be acquired in a different sub-band region from the envelope information. In the graph 18, an envelope 12 is obtained when the envelope information extracting unit 3 extracts the spectrum 11 of the original sound in the region 17a, and tone information 13 is obtained when the tone information detecting unit 4 extracts information on the tone signal from the spectrum 11 in the region 17b. Because information on the original sound is extracted as envelope information and as tone information in two different sub-band regions, the coded information contains two adjacent peaks even though the original sound contains only one.
A graph 19 is the result of decoding in the case represented by the graph 18, in which, for an original sound consisting of the single tone signal 11, a peak is extracted as envelope information (the envelope 12) and another peak is detected as tone information (the tone information 13) at a frequency different from the peak frequency of the envelope 12. In decoding the high-band signal subjected to the SBR processing, the low-band spectrum is copied into the high band and its energy level is adjusted based on the envelope information. If a peak of the copied spectrum coincides in frequency with the peak of the envelope 12, the peak extracted as envelope information remains in the high-band signal spectrum. When the tone signal decoded from the tone information 13 is then added to this high-band signal spectrum, the decoded spectrum contains two adjacent peaks, as represented by a spectrum 15.
A graph 16 is the time waveform corresponding to the spectrum 15. When a spectrum with two adjacent peaks is transformed into a time waveform by an inverse Fourier transform or a similar transform, the signals at the two adjacent frequencies interfere with each other and a beat occurs, as represented by the graph 16. Because no such beat exists in the original sound, it degrades the quality of the decoded sound.
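The following hypothetical example illustrates how one peak can land in different sub-band regions when envelope extraction and tone detection use different frequency resolutions. The sub-band edges, the two resolutions (23.4375 Hz and 46.875 Hz), and the assignment of the finer resolution to envelope extraction are all assumptions, chosen so that the rounding falls on opposite sides of a boundary as in the graph 18; the helper names are likewise hypothetical.

```python
import bisect


def quantize(freq_hz: float, resolution_hz: float) -> float:
    """Snap a frequency to the nearest multiple of the given resolution."""
    return int(freq_hz / resolution_hz + 0.5) * resolution_hz


def subband_of(freq_hz: float, edges_hz: list[float]) -> int:
    """Index i of the sub-band [edges_hz[i], edges_hz[i+1]) that contains freq_hz."""
    return bisect.bisect_right(edges_hz, freq_hz) - 1


# Hypothetical sub-band edges in Hz; the boundary of interest is 10020 Hz.
EDGES = [9000.0, 10020.0, 11000.0]
true_peak = 10010.0                        # single tone just below the boundary

env_freq = quantize(true_peak, 23.4375)    # assumed finer grid  -> 10007.8125 Hz
tone_freq = quantize(true_peak, 46.875)    # assumed coarser grid -> 10031.25 Hz
print(subband_of(env_freq, EDGES), subband_of(tone_freq, EDGES))  # 0 1
```

The same physical tone is thus attributed to sub-band 0 by one process and to sub-band 1 by the other, which is how two nearby peaks can end up in the coded information.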
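The beat in the graph 16 follows from the sum-to-product identity. For two sinusoids at frequencies $f_1$ and $f_2$ (equal amplitudes are assumed here for simplicity):

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos\!\bigl(\pi (f_1 - f_2) t\bigr)\,\sin\!\bigl(\pi (f_1 + f_2) t\bigr)
```

The slowly varying factor $|2\cos(\pi (f_1 - f_2) t)|$ has period $1/|f_1 - f_2|$, so the loudness rises and falls $|f_1 - f_2|$ times per second; for example, peaks at 10000 Hz and 10005 Hz would produce a 5 Hz beat. When $f_1 = f_2$ the factor is constant and no beat occurs.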
A graph 32 represents that the peak frequency in the envelope information should be separated from the peak frequency in the tone information by Δ or more. Δ is a value close to zero; when the two peak frequencies coincide exactly (a difference of zero), no beat occurs. The condition represented in the graph 32 is therefore intended to exclude the case in which no beat occurs.
A graph 33 represents the correction of the envelope information when a peak of the envelope information satisfying the conditions represented in the graph 31 and the graph 32 is detected. In the graph 33, a dotted line represents the envelope information before the correction and a solid line 38 represents the envelope information after the correction. As represented by the solid line 38, the envelope information correcting unit 5 corrects the detected envelope information over a certain range 37 defined in advance. As a result of the correction, the peak energy of the envelope information becomes sufficiently lower than the peak energy of the tone information, so the occurrence of a beat may be suppressed.
The envelope information correcting unit 5 detects a peak of the envelope information within the detection range set based on the tone information (step S11). If the value of the detected peak is equal to or larger than a threshold set in advance (step S12: YES), the envelope information correcting unit 5 calculates the difference between the peak frequency of the detected envelope information and the peak frequency of the tone information (step S13). If the value of the detected peak is smaller than the threshold (step S12: NO), the envelope information correcting unit 5 ends the envelope information correction processing.
If the difference calculated in step S13 is equal to or larger than a threshold set in advance (step S14: YES), the envelope information correcting unit 5 suppresses the peak of the envelope information in the detection range, correcting its value to a level at which no beat occurs (step S15). If the difference is smaller than the threshold (step S14: NO), the envelope information correcting unit 5 ends the envelope information correction processing.
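The flow of steps S11 to S15 can be sketched as follows. In this sketch, frequencies are expressed as frequency indices, the detection range is passed in as a Python `range`, and replacing the peak with a fixed `suppressed_level` is a placeholder for "a level at which no beat occurs". The function and all parameter names are hypothetical.

```python
def correct_envelope(envelope: list[float], tone_index: int, search: range,
                     peak_threshold: float, diff_threshold: int,
                     suppressed_level: float) -> list[float]:
    """Sketch of steps S11 to S15: suppress an envelope peak that would beat with the tone."""
    # S11: strongest envelope value inside the detection range.
    k_peak = max(search, key=lambda k: envelope[k])
    # S12: peaks below the preset threshold are left alone.
    if envelope[k_peak] < peak_threshold:
        return envelope
    # S13: frequency difference between the envelope peak and the tone (index units).
    diff = abs(k_peak - tone_index)
    # S14: differences below the preset threshold (e.g. zero) are treated as beat-free.
    if diff < diff_threshold:
        return envelope
    # S15: lower the peak to the placeholder level assumed not to cause a beat.
    corrected = list(envelope)
    corrected[k_peak] = suppressed_level
    return corrected


env = [1.0, 1.0, 2.0, 8.0, 2.0, 1.0, 1.0, 1.0]
print(correct_envelope(env, tone_index=5, search=range(2, 8),
                       peak_threshold=5.0, diff_threshold=1, suppressed_level=1.0))
# [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0]
```

If the tone sat at index 3 instead, the difference in step S13 would be zero and the envelope would be returned unchanged, matching the exclusion described for the graph 32.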
As described above, the envelope information correcting unit 5 may suppress the occurrence of a beat by correcting the envelope information based on the envelope information correction processing flow.
(Expression 1) is an expression that represents the relationship between a sub-band number i and a sub-band width SBW. In (Expression 1), INT denotes a function that rounds a value down to an integer, and “pow” denotes a power (exponentiation) function. F denotes the frequency resolution, “start” denotes a high-band generation start frequency index, “stop” denotes a high-band generation end frequency index, and “numbands” denotes the number of sub-bands. Frequency indices are numbers assigned in ascending order, starting from the lowest band, to the frequency bands obtained by dividing the spectrum at the frequency resolution F. For example, if a signal sampled at 48 kHz is frequency-transformed by an orthogonal transform such as a modified discrete cosine transform with an analysis length of 1024 samples, a frequency spectrum of 512 samples with an upper limit of 24 kHz is obtained. If this frequency spectrum is expressed as spec[j] (j = 0 to 511), j is the frequency index.
The sub-band number i is a number assigned in ascending order, starting from the lowest frequency band, to each of the plural bands into which the frequency band targeted by the audio coding processing is divided. The sub-band width SBW is the bandwidth of the sub-band with sub-band number i. As represented in the graph 91 in
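With the numbers in the example above, each frequency index spans 24000 / 512 = 46.875 Hz. The sketch below converts between frequency indices and hertz under that assumption. Expressing F in hertz per index and treating index j as the lower edge of its band are conventions chosen for illustration, and the helper names are hypothetical.

```python
FS = 48_000                  # sampling frequency in Hz
N = 1024                     # analysis length in samples
NUM_INDICES = N // 2         # 512 spectral samples covering 0 Hz to FS / 2
F = (FS / 2) / NUM_INDICES   # frequency resolution: Hz per frequency index


def index_to_hz(j: int) -> float:
    """Lower edge, in Hz, of the band with frequency index j."""
    return j * F


def hz_to_index(freq_hz: float) -> int:
    """Frequency index whose band contains freq_hz (rounded down, like INT)."""
    return int(freq_hz // F)


print(F)                     # 46.875
print(hz_to_index(10000))    # 213
print(index_to_hz(213))      # 9984.375
```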
In the embodiment of
(Expression 2) is an expression that defines the detection range W of peak detection based on (Expression 1).
Compared with (Expression 1), (Expression 2) changes the integer value added to the sub-band number i from 1 to 2. The envelope information correcting unit 5 may carry out peak detection of the envelope information within the detection range W, which is defined by adjusting the integer value added to the sub-band number i based on (Expression 2).
As described above, the envelope information correcting unit 5 may detect the peak of the envelope information 12 having a relation to the tone information 13 more efficiently by setting the detection range W centered at the tone frequency.
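A detection range centred on the tone frequency could be computed as below. Because (Expression 2) is not reproduced here, the width W is simply passed in as a number of frequency indices; the clipping at the spectrum edges and the function name are assumptions.

```python
def detection_range(tone_index: int, width: int, num_indices: int) -> range:
    """Frequency indices within `width` bins centred on the tone, clipped to the spectrum."""
    half = width // 2
    start = max(0, tone_index - half)
    stop = min(num_indices, tone_index + half + 1)  # range end is exclusive
    return range(start, stop)


print(detection_range(214, 8, 512))  # range(210, 219)
print(detection_range(2, 8, 512))    # range(0, 7)
```

Restricting the search to this window keeps the peak detection from reacting to envelope peaks far from the tone, which cannot form an audible beat with it.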
The envelope information correcting unit 5 calculates i0 and i1 from (Expression 3) and the sub-band number b of the sub-band region in which the peak of the envelope information 12 has been detected, and corrects the envelope information 12 so that the values between the value corresponding to i0 and the value corresponding to i1 lie on the straight line connecting those two values. By suppressing, through such correction, the envelope peak that causes a beat, the audio coding device 1 may code the input signal so that the quality of the decoded audio signal is improved.
The masking threshold may be set to the minimum value of the equal-loudness contour over the frequency band of the signal to be coded. Alternatively, it may be set to the sound pressure level given by the equal-loudness contour at the frequency of the envelope peak to be corrected.
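The straight-line correction can be sketched as linear interpolation between the envelope values at i0 and i1. Since (Expression 3) is not reproduced here, i0 and i1 are taken as given indices into the envelope list; the function name is hypothetical.

```python
def suppress_peak_linear(envelope: list[float], i0: int, i1: int) -> list[float]:
    """Replace envelope values strictly between i0 and i1 by a straight line."""
    if i1 <= i0:
        raise ValueError("i1 must be greater than i0")
    corrected = list(envelope)
    y0, y1 = envelope[i0], envelope[i1]
    for k in range(i0 + 1, i1):
        t = (k - i0) / (i1 - i0)          # position between i0 and i1, in (0, 1)
        corrected[k] = y0 + t * (y1 - y0)
    return corrected


print(suppress_peak_linear([1.0, 2.0, 9.0, 3.0, 1.0], 1, 3))  # [1.0, 2.0, 2.5, 3.0, 1.0]
```

The peak value 9.0 at index 2 is replaced by 2.5, the midpoint of the line joining the values at i0 and i1, while the endpoints themselves are left unchanged.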
By correcting the envelope information based on the magnitude relationship with the masking threshold, a beat at the time of decoding may be suppressed with a smaller amount of calculation.
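One possible reading of the masking-threshold approach, sketched below under that assumption, is a single comparison per peak: if the envelope peak (in dB) exceeds the masking threshold, it is lowered to the threshold; otherwise it is left unchanged. The exact correction rule, the dB representation, and the function name are assumptions; the sketch only shows why such a comparison needs less computation than interpolating over a range.

```python
def clamp_peak_to_masking_threshold(envelope_db: list[float], peak_index: int,
                                    masking_threshold_db: float) -> list[float]:
    """Lower one envelope peak to the masking threshold if it exceeds it (hypothetical rule)."""
    corrected = list(envelope_db)
    if corrected[peak_index] > masking_threshold_db:
        corrected[peak_index] = masking_threshold_db
    return corrected


print(clamp_peak_to_masking_threshold([20.0, 35.0, 22.0], 1, 25.0))  # [20.0, 25.0, 22.0]
```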
The CPU 50 functionally implements the respective functional blocks illustrated in
The input device 56 is a device for inputting, from outside, information used in the processing of the audio coding device 1, and includes a microphone, a keyboard, a mouse, and so forth. The output device 58 is a device for outputting the processing result of the audio coding device 1 to the outside, and includes a speaker, a display, and so forth. The DSP 60 is a digital signal processor that executes, at high speed, frequency transforms and other processing of audio signals converted into digital signals. The interface device 62 is a connection unit for connecting the audio coding device 1 to a network or to an external storage device.
As described above, the audio coding device 1 may be implemented by executing the audio coding program by using a general-purpose computer.
The DEMUX 71 is a demultiplexer that separates a multiplexed stream signal into plural signals. The low-band signal decoding unit 72 decodes the coded low-band signal spectrum contained in the demultiplexed signals. The high-band generating unit 73 generates a high-band signal spectrum by copying the decoded low-band signal spectrum into the high band. The envelope information decoding unit 74 decodes the coded envelope information contained in the demultiplexed signals, and the tone information decoding unit 75 decodes the coded tone information contained in them. The high-band shaping unit 76 corrects the energy level of the high-band signal spectrum generated by the high-band generating unit 73 based on the envelope information output from the envelope information decoding unit 74. The tone generating unit 77 generates a tone signal based on the decoded tone information. The MIX 78 combines the corrected high-band signal spectrum output from the high-band shaping unit 76 with the tone signal output from the tone generating unit 77 and outputs the resulting decoded signal spectrum.
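The decoder-side steps of the high-band generating unit 73, the high-band shaping unit 76, the tone generating unit 77, and the MIX 78 can be sketched on a plain power spectrum as follows. This is a simplification: it works on power values rather than the filter-bank representation an actual SBR decoder would use, it scales each sub-band so its mean power matches the decoded envelope value, and it adds the tone power directly at its frequency index. The function name and data layout are hypothetical.

```python
from typing import Optional


def decode_high_band(low: list[float], envelope: list[float], band_edges: list[int],
                     tone: Optional[tuple[int, float]] = None) -> list[float]:
    """Rebuild a full power spectrum from the low band, envelope, and tone information.

    Assumes band_edges[0] == len(low), i.e. the high band starts right after the low band.
    """
    crossover, total = len(low), band_edges[-1]
    spec = list(low) + [0.0] * (total - crossover)

    # High-band generating unit 73: copy the low band upward (repeating it if needed).
    for k in range(crossover, total):
        spec[k] = low[(k - crossover) % crossover]

    # High-band shaping unit 76: scale each sub-band to the decoded envelope power.
    for target, lo, hi in zip(envelope, band_edges[:-1], band_edges[1:]):
        mean = sum(spec[lo:hi]) / (hi - lo)
        gain = target / mean if mean > 0 else 0.0
        for k in range(lo, hi):
            spec[k] *= gain

    # Tone generating unit 77 and MIX 78: add the tone power at its frequency index.
    if tone is not None:
        k, power = tone
        spec[k] += power
    return spec


print(decode_high_band([4.0, 2.0], [1.5], [2, 4], tone=(3, 10.0)))  # [4.0, 2.0, 2.0, 11.0]
```

If a copied peak survives the shaping step next to the index where the tone is added, the output contains the two adjacent peaks described for the spectrum 15.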
As described above, the audio decoding device 10 may output the decoded signal based on the signal coded by the present embodiment.
In a graph 102, an envelope 83 represents an envelope of the high-band signal spectrum based on the envelope information and a peak 84 represents the peak of a tone signal based on the tone information. The high-band shaping unit 76 carries out correction of the energy level based on the envelope 83 for the high-band signal spectrum arising from the copying. The MIX 78 combines the peak 84 with the high-band signal spectrum corrected based on the envelope 83.
As described above, the audio decoding device 10 may decode an audio signal based on the low-band signal spectrum, the envelope information, and the tone information that are decoded.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2016-254286 | Dec 2016 | JP | national |