SYSTEM AND METHOD FOR MEASUREMENT OF PERCEIVABLE QUANTIZATION NOISE IN PERCEPTUAL AUDIO CODERS

Information

  • Patent Application
  • 20080027721
  • Publication Number
    20080027721
  • Date Filed
    November 09, 2006
  • Date Published
    January 31, 2008
Abstract
A technique for computing perceptual noise in an audio signal that is computationally efficient. In one example embodiment, the technique includes computing perceptual noise in an input audio signal. The steps involve pre-computing NER (noise-to-excitation ratio) values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop, and also assuming bands with lower spectral energy than the band under consideration are zeroed out during quantization. When a critical band is zeroed out during quantization, the associated pre-computed NER values are used in computing an overall perceptual distortion of the frame.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

A more detailed understanding of the invention may be had from the following description of preferred embodiments, given by way of example only, and to be understood in conjunction with the accompanying drawings in which:



FIG. 1 is a flowchart illustrating measurement of perceptual noise according to an embodiment of the present subject matter.



FIG. 2 is an example of a suitable computing environment for implementing the measurement of perceptual noise according to various embodiments of the present invention, such as that shown in FIG. 1.





DETAILED DESCRIPTION

In the following detailed description of the various embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims and their equivalents.


Referring to FIG. 1, at step 110, the method 100 in this example embodiment begins by pre-computing NER (noise-to-excitation ratio) values associated with each critical band within a frame by zeroing out associated spectral coefficient values, before the quantization loop. In some embodiments, NER for each critical band is computed as follows.


The noise is calculated assuming that the reconstructed values are zero for each critical band. The noise for each critical band is calculated using the equation







$$NP[b] = \sum_{k=0}^{B[b]} A^2[k]\,X^2[k]$$








Wherein X[k] are the original spectral coefficients, A[k] is the outer ear transform, and B[b] is the length of critical band b.


The excitation for each critical band is computed assuming that the critical band is zeroed out. All critical bands with spectral coefficient values lower than those of the current critical band are also assumed to have been zeroed out for the purpose of excitation computation.


At step 120, quantization is performed on the original spectral coefficients associated with each of the critical bands within the frame to obtain quantized spectral coefficients. In some embodiments, an encoder applies a uniform, scalar quantization step size to a block of spectral data that was previously weighted by critical bands according to a quantization matrix. Alternatively, the encoder applies a non-uniform quantization to weight the block by quantization bands, or applies the quantization matrix and the uniform, scalar quantization step size.


At step 130, an inverse quantization is performed on the obtained quantized spectral coefficients for each of the critical bands within the frame to obtain reconstructed spectral coefficients. In some embodiments, an encoder reconstructs the block of spectral data from the quantized data. For example, the encoder applies the inverse quantization to reconstruct the block, and then applies an inverse multi-channel transform to return the block to independently coded channels.
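Steps 120 and 130 can be illustrated with a minimal sketch of a uniform scalar quantizer and its inverse. The function names `quantize` and `dequantize`, the single `step` parameter, and the sample values are assumptions made for illustration; the quantization matrix weighting and the inverse multi-channel transform mentioned above are omitted.

```python
def quantize(X, step):
    """Step 120 (simplified): map each spectral coefficient to an integer index."""
    # Python's round() rounds exact halves to the nearest even integer.
    return [round(x / step) for x in X]


def dequantize(Q, step):
    """Step 130 (simplified): reconstruct coefficients Xr[k] from the indices."""
    return [q * step for q in Q]


X = [0.2, -0.3, 4.1, -7.9]   # hypothetical coefficients for one frame
Q = quantize(X, step=1.0)
Xr = dequantize(Q, step=1.0)
print(Q)   # [0, 0, 4, -8]
print(Xr)  # [0.0, 0.0, 4.0, -8.0]
```

In this toy frame the first two coefficients reconstruct to zero, which is the situation in which the pre-computed NER values are intended to apply.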


In these embodiments, the encoder processes the reconstructed block in critical bands according to an auditory model. The number and placement of the critical bands depends on the auditory model, and may be different from the number and placement of quantization bands. By processing the block by critical bands, the encoder improves the accuracy of subsequent quality measurements.


At step 140, the method 100 determines whether to use the pre-computed NER values associated with the critical bands, as a function of the obtained reconstructed spectral coefficients, or to compute new NER values for the critical bands using the original excitation values.


This step involves measuring the quality of the reconstructed block, for example, measuring the NER as described above.


In some embodiments, the noise pattern between the original transform coefficients X[k] and the reconstructed transform coefficients Xr[k] is computed by calculating the sample-by-sample differences X[k]−Xr[k]. An outer ear transfer function A[k] is applied to the differences to obtain the distortion coefficients N[k], as described below.






N[k]=A[k](X[k]−Xr[k])


Using the distortion coefficients N[k] thus obtained, the noise pattern in critical band ‘b’ is accumulated over the length B[b] of the critical band, as described above.







$$NP[b] = \sum_{k=0}^{B[b]} N^2[k]$$
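The noise pattern accumulation can be sketched as follows. This is an illustrative sketch, not the encoder's actual code: the name `noise_pattern`, the `band_edges` list of half-open index ranges (used here in place of per-band lengths B[b]), and the sample values are all assumptions.

```python
def noise_pattern(X, Xr, A, band_edges):
    """Return NP[b]: the sum of N[k]**2 over each critical band."""
    # N[k] = A[k] * (X[k] - Xr[k]), the weighted reconstruction error.
    N = [a * (x - xr) for x, xr, a in zip(X, Xr, A)]
    return [
        sum(n * n for n in N[start:stop])
        for start, stop in zip(band_edges[:-1], band_edges[1:])
    ]


X = [1.0, -2.0, 0.5, 3.0]    # hypothetical original coefficients
A = [1.0, 1.0, 2.0, 0.5]     # hypothetical outer ear weights
edges = [0, 2, 4]            # two bands of two coefficients each

print(noise_pattern(X, [0.0, 0.0, 0.5, 2.0], A, edges))  # [5.0, 0.25]
print(noise_pattern(X, [0.0] * 4, A, edges))             # [5.0, 3.25]
```

The second call passes an all-zero reconstruction, so each NP[b] reduces to the sum of A²[k]X²[k] used for the pre-computed values at step 110.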







In some embodiments, the excitation pattern is computed using the steps outlined below. Transform coefficients X[k] are multiplied by the outer ear transform A[k] to obtain Y[k].






Y[k]=X[k]*A[k]


The energies of the coefficients Y[k] are summed within each critical band to obtain En[b].







$$En[b] = \sum_{k=0}^{B[b]} Y^2[k]$$
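A minimal sketch of the band energy computation, together with one reading of the zero-out assumption used when pre-computing excitation at step 110, is given below. The helper names, the `band_edges` representation, and the choice to keep bands whose energy ties with the current band are assumptions; "lower" is interpreted as lower band energy, following the Abstract.

```python
def band_energies(X, A, band_edges):
    """En[b]: energy of Y[k] = X[k] * A[k] summed over each critical band."""
    Y = [x * a for x, a in zip(X, A)]
    return [
        sum(y * y for y in Y[start:stop])
        for start, stop in zip(band_edges[:-1], band_edges[1:])
    ]


def energies_with_zeroed_bands(En, b):
    """Energies assumed when pre-computing the excitation for band b.

    Band b itself and every band with lower energy than band b are
    treated as zeroed out (an interpretation, not a quoted rule).
    """
    return [e if (j != b and e >= En[b]) else 0.0 for j, e in enumerate(En)]


X = [1.0, -2.0, 0.5, 3.0, 0.5, 0.5]   # hypothetical coefficients
A = [1.0] * 6                         # flat outer ear weighting for simplicity
En = band_energies(X, A, [0, 2, 4, 6])
print(En)                                  # [5.0, 9.25, 0.5]
print(energies_with_zeroed_bands(En, 0))   # [0.0, 9.25, 0.0]
```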







Frequency smearing is performed on En[b] bands. This can involve a process of convolution of En[b] with a level dependent spreading function to obtain Ec[b]. This spreading function models the frequency masking phenomenon of the inner ear.
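The description does not specify the spreading function, so the following is only a simplified sketch of frequency smearing as a convolution across bands. It assumes a fixed two-slope spreading function expressed in dB per band rather than a level-dependent one, and the slope values are placeholders, not values taken from this description.

```python
def frequency_smear(En, upward_db=10.0, downward_db=27.0):
    """Ec[b]: each band's energy spread into its neighbours.

    upward_db is the attenuation per band when energy spreads to higher
    bands; downward_db applies when it spreads to lower bands. Both are
    hypothetical, level-independent placeholders.
    """
    Ec = []
    for b in range(len(En)):
        total = 0.0
        for j, energy in enumerate(En):
            distance = b - j
            if distance >= 0:
                attenuation_db = upward_db * distance
            else:
                attenuation_db = downward_db * (-distance)
            total += energy * 10.0 ** (-attenuation_db / 10.0)
        Ec.append(total)
    return Ec


# A single active band leaks mostly into the bands above it.
print(frequency_smear([0.0, 1.0, 0.0]))
```

A level-dependent variant would make the slopes a function of `energy`; that detail is left out because the source does not give it.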


Time smearing is performed on Ec[b] to obtain the final excitation values E[b]. Time smearing can involve first order low pass filtering on the excitation values on a per-band basis.






E[b]=aEPrev[b]+(1−a) Ec[b]


Wherein EPrev[b] is the excitation value of band b in the previous frame, and a is the coefficient of the first order low pass filter.
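The time smearing recursion can be sketched in a few lines. The function name `time_smear` and the sample values are assumptions; the coefficient `a` is whatever value the auditory model prescribes, which this description does not state.

```python
def time_smear(Ec, E_prev, a):
    """E[b] = a * EPrev[b] + (1 - a) * Ec[b], applied per band."""
    return [a * prev + (1.0 - a) * cur for prev, cur in zip(E_prev, Ec)]


Ec = [4.0, 0.0]        # hypothetical frequency-smeared energies, current frame
E_prev = [0.0, 8.0]    # hypothetical final excitation, previous frame
print(time_smear(Ec, E_prev, a=0.5))  # [2.0, 4.0]
```

Values of `a` closer to 1 give the previous frame more weight, so the excitation decays more slowly after a loud frame.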


At step 150, an overall perceptual distortion of the frame is computed by using either the pre-computed NER values or new NER values computed using the original excitation values based on the determination at step 140.
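One possible reading of steps 140 and 150 for a single band is sketched below: if every reconstructed coefficient in the band is zero, the pre-computed NER is reused, and otherwise a new NER is formed from the noise pattern and the original excitation. The name `band_ner`, the `band_edges` representation, and the definition of NER as NP[b] divided by E[b] are assumptions; the description names the ratio but does not write it out.

```python
def band_ner(b, X, Xr, A, band_edges, E, precomputed_ner):
    """Return the NER for band b, reusing the pre-computed value when possible."""
    start, stop = band_edges[b], band_edges[b + 1]
    if all(xr == 0 for xr in Xr[start:stop]):
        # Band was zeroed out by quantization: step 110's value applies.
        return precomputed_ner[b]
    # Otherwise compute the noise pattern and divide by the original excitation.
    noise = sum((A[k] * (X[k] - Xr[k])) ** 2 for k in range(start, stop))
    return noise / E[b]


X = [1.0, -2.0, 0.5, 3.0]     # hypothetical original coefficients
Xr = [0.0, 0.0, 0.5, 2.0]     # band 0 zeroed out, band 1 partly preserved
A = [1.0, 1.0, 2.0, 0.5]      # hypothetical outer ear weights
E = [10.0, 0.5]               # hypothetical original excitation values
pre = [0.7, 0.9]              # hypothetical pre-computed NER values

print([band_ner(b, X, Xr, A, [0, 2, 4], E, pre) for b in range(2)])  # [0.7, 0.5]
```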


At step 160, the computed NER values associated with the critical bands are summed to obtain a summed NER value. At step 170, the method 100 compares the summed NER value with a target NER value and determines whether a target NER is achieved. The method 100 goes to step 180 and continues with the bit-rate loop process if the target NER is achieved. The method 100 goes to step 120 and repeats steps 120-170 if the target NER is not achieved.
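The control flow of steps 120 through 180 can be sketched as a loop. Several details here are assumptions rather than statements from this description: the target is treated as achieved when the summed NER is at or below the target, the step size is halved on each retry, and `summed_ner` is a hypothetical callable standing in for steps 140 through 160.

```python
def quantization_loop(X, target_ner, summed_ner, step=4.0, max_iter=16):
    """Repeat quantization until the summed NER meets the target.

    Returns (step, total) on success, or None if max_iter is exhausted.
    """
    for _ in range(max_iter):
        Q = [round(x / step) for x in X]      # step 120: quantize
        Xr = [q * step for q in Q]            # step 130: inverse quantize
        total = summed_ner(X, Xr)             # steps 140-160: summed NER
        if total <= target_ner:               # step 170: compare with target
            return step, total                # step 180: continue bit-rate loop
        step *= 0.5                           # assumed refinement before retrying
    return None


def toy_summed_ner(X, Xr):
    """Stand-in metric: one band, unit weights, excitation = signal energy."""
    noise = sum((x - xr) ** 2 for x, xr in zip(X, Xr))
    return noise / sum(x * x for x in X)


print(quantization_loop([1.0, -2.0, 0.5, 3.0], 0.05, toy_summed_ner))
```

With these toy inputs the loop rejects step sizes 4.0 and 2.0 and stops at 1.0, where only the coefficient 0.5 is lost.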


Although the method 100 includes steps 110-180 that are arranged serially in the exemplary embodiments, other embodiments of the present subject matter may execute two or more acts in parallel, using multiple processors or a single processor organized as two or more virtual machines or sub-processors. Moreover, still other embodiments may implement the acts as two or more specific interconnected hardware modules with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the exemplary process flow diagrams are applicable to software, firmware, and/or hardware implementations.


Various embodiments of the present invention can be implemented in software, which may be run in the environment shown in FIG. 2 (to be described below) or in any other suitable computing environment. The embodiments of the present invention are operable in a number of general-purpose or special-purpose computing environments. Some computing environments include personal computers, general-purpose computers, server computers, hand-held devices (including, but not limited to, telephones and personal digital assistants (PDAs) of all types), laptop devices, multi-processors, microprocessors, set-top boxes, programmable consumer electronics, network computers, minicomputers, mainframe computers, distributed computing environments and the like to decode code stored on a computer-readable medium. The embodiments of the present invention may be implemented in part or in whole as machine-executable instructions, such as program modules that are decoded by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like to perform particular tasks or to implement particular abstract data types. In a distributed computing environment, program modules may be located in local or remote storage devices.



FIG. 2 shows an example of a suitable computing system environment for implementing embodiments of the present invention. FIG. 2 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which certain embodiments of the inventive concepts contained herein may be implemented.


A general computing device, in the form of a computer 210, may include a processor 202, memory 204, removable storage 201, and non-removable storage 214. Computer 210 additionally includes a bus 205 and a storage area network interface (NI) 212.


Computer 210 may include or have access to a utility computing environment that includes one or more computing servers 240 and one or more disk arrays 260, a SAN 250 and one or more communication connections 220 such as a network interface card or a USB connection. The computer 210 may operate in a networked environment using the communication connection 220 to connect to the one or more computing servers 240. A remote server may include a personal computer, server, router, network PC, a peer device or other network node, and/or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), and/or other networks.


The memory 204 may include volatile memory 206 and non-volatile memory 208. A variety of computer-readable media may be stored in and accessed from the memory elements of computer 210, such as volatile memory 206 and non-volatile memory 208, removable storage 201 and non-removable storage 214. Computer memory elements can include any suitable memory device(s) for storing data and machine-readable instructions, such as read only memory (ROM), random access memory (RAM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), hard drive, removable media drive for handling compact disks (CDs), digital video disks (DVDs), diskettes, magnetic tape cartridges, memory cards, Memory Sticks™, and the like; chemical storage; biological storage; and other types of data storage.


“Processor” as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, explicitly parallel instruction computing (EPIC) microprocessor, a graphics processor, a digital signal processor, or any other type of processor or processing circuit. The term also includes embedded controllers, such as generic or programmable logic devices or arrays, application specific integrated circuits, single-chip computers, smart cards, and the like.


Embodiments of the present invention may be implemented in conjunction with program modules, including functions, procedures, data structures, application programs, etc., for performing tasks, or defining abstract data types or low-level hardware contexts.


Machine-readable instructions stored on any of the above-mentioned storage media are executable by the processor 202 of the computer 210. For example, a computer program 225 may comprise machine-readable instructions capable of measuring perceptual noise according to the teachings and herein described embodiments of the present invention. In one embodiment, the computer program 225 may be included on a CD-ROM and loaded from the CD-ROM to a hard drive in non-volatile memory 208. The machine-readable instructions cause the computer 210 to measure perceptual noise according to the various embodiments of the present invention.


The perceptual noise estimation technique of the present invention is modular and flexible in terms of usage in the form of a “Distributed Configurable Architecture”. As a result, parts of the perceptual estimation system may be placed at different points of a network, depending on the model chosen. For example, the technique can be deployed in a server and the input and output instructions streamed over from a client to the server and back, respectively. Such flexibility allows faster deployment to provide a cost effective solution to changing business needs.


The above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those skilled in the art. The scope of the invention should therefore be determined by the appended claims, along with the full scope of equivalents to which such claims are entitled.


The above-described methods and apparatus provide various embodiments for measuring perceptual noise. It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the subject matter should, therefore, be determined with reference to the following claims, along with the full scope of equivalents to which such claims are entitled.


The above-described process reduces the complexity of computing perceptual noise by about 40-50% compared with traditional quantization techniques, after accounting for the initial calculation of the noise-to-excitation ratio for each band as described above. The above-described process avoids the conventional iterative process of excitation computation. Further, in the above process the excitation values are computed only once, prior to quantization.


As shown herein, the present invention can be implemented in a number of different embodiments, including various methods, a circuit, an I/O device, a system, and an article comprising a machine-accessible medium having associated instructions.


Other embodiments will be readily apparent to those of ordinary skill in the art. The elements, algorithms, and sequence of operations can all be varied to suit particular requirements. The operations described above with respect to the method 100 illustrated in FIG. 1 can be performed in a different order from those shown and described herein. FIGS. 1 and 2 are merely representational and are not drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. FIGS. 1-2 illustrate various embodiments of the invention that can be understood and appropriately carried out by those of ordinary skill in the art.


It is emphasized that the Abstract is provided to comply with 37 C.F.R. §1.72(b) requiring an Abstract that will allow the reader to quickly ascertain the nature and gist of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.


The above-described implementation is intended to be applicable, without limitation, to situations where improvement to an OFDM system is sought, considering the use of SFO estimation. The description hereinabove is intended to be illustrative, and not restrictive. The various embodiments of the method of improving the OFDM system described herein are applicable generally to any OFDM system, and the embodiments described herein are in no way intended to limit the applicability of the invention. Many other embodiments will be apparent to those skilled in the art. The scope of this invention should therefore be determined by the appended claims as supported by the text, along with the full scope of equivalents to which such claims are entitled.

Claims
  • 1. A method of computing perceptual noise in an audio signal, comprising: pre-computing NER (noise-to-excitation ratio) values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop; and computing an overall perceptual distortion of the frame using the pre-computed NER values.
  • 2. The method of claim 1, wherein pre-computing the NER values comprises: computing NER values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop based on assuming that all critical bands, within the frame, having the spectral coefficient values lower than a current critical band are zeroed out.
  • 3. The method of claim 2, wherein computing the overall perceptual distortion of the frame comprises: performing quantization on original spectral coefficients associated with each of the critical bands within the frame to obtain quantized spectral coefficients; performing inverse quantization on the quantized spectral coefficients for each of the critical bands within the frame to obtain reconstructed spectral coefficients; determining whether to use pre-computed NER values associated with critical bands as a function of the obtained reconstructed spectral coefficients or to compute new NER values for the critical bands using original excitation values; and computing the overall perceptual distortion of the frame by using either the pre-computed NER values or new NER values computed using the original excitation values based on the determination.
  • 4. The method of claim 3, further comprising: summing the computed NER values associated with the critical bands to obtain a summed NER value; comparing the summed NER value with a target NER value; determining whether the target NER value is achieved based on an outcome of the comparison; and if so, then continue with the bit-rate loop.
  • 5. The method of claim 4, further comprising: if not, repeating the steps of performing quantization, performing inverse quantization, determining, using, summing, comparing, and determining.
  • 6. An article comprising: a storage medium having instructions that, when decoded by a computing platform, will result in a method for computing perceptual noise, comprising: pre-computing NER (noise-to-excitation ratio) values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop; and computing an overall perceptual distortion of the frame using computed NER values.
  • 7. The article of claim 6, wherein computing the NER values comprises: computing NER values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop based on assuming that all critical bands, within the frame, having the spectral coefficient values lower than a current critical band are zeroed out.
  • 8. The article of claim 7, wherein computing the overall perceptual distortion of the frame comprises: performing quantization on original spectral coefficients associated with each of the critical bands within the frame to obtain quantized spectral coefficients; performing inverse quantization on the quantized spectral coefficients for each of the critical bands within the frame to obtain reconstructed spectral coefficients; determining whether to use pre-computed NER values associated with critical bands as a function of the obtained reconstructed spectral coefficients or to compute new NER values for the critical bands using original excitation values; and computing the overall perceptual distortion of the frame by using either the pre-computed NER values or new NER values computed using the original excitation values based on the determination.
  • 9. The article of claim 8, further comprising: summing the computed NER values associated with the critical bands to obtain a summed NER value; comparing the summed NER value with a target NER value; determining whether the target NER value is achieved based on an outcome of the comparison; and if so, then continue with the bit-rate loop process.
  • 10. The article of claim 9, further comprising: if not, repeating the steps of performing quantization, performing inverse quantization, computing, determining, using, summing, comparing, and determining.
  • 11. A computer system comprising: a computer network, wherein the computer network has a plurality of network elements, and wherein the plurality of network elements has a plurality of network interfaces; a network interface; an input module coupled to the network interface that receives topology data via the network interface; a processing unit; and a memory coupled to the processor, the memory having stored therein code associated with a method for computing perceptual noise, the code causes the processor to perform a method comprising: pre-computing NER (noise-to-excitation ratio) values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop; and computing an overall perceptual distortion of the frame using computed NER values.
  • 12. The system of claim 11, wherein computing the NER values comprises: pre-computing NER values associated with critical bands within a frame by zeroing out associated spectral coefficient values before the quantization loop based on assuming that all critical bands, within the frame, having the spectral coefficient values lower than a current critical band are zeroed out.
  • 13. The system of claim 12, wherein computing the overall perceptual distortion of the frame comprises: performing quantization on original spectral coefficients associated with each of the critical bands within the frame to obtain quantized spectral coefficients; performing inverse quantization on the quantized spectral coefficients for each of the critical bands within the frame to obtain reconstructed spectral coefficients; determining whether to use pre-computed NER values associated with critical bands as a function of the obtained reconstructed spectral coefficients or to compute new NER values for the critical bands using original excitation values; and computing the overall perceptual distortion of the frame by using either the pre-computed NER values or new NER values computed using the original excitation values based on the determination.
  • 14. The system of claim 13, further comprising: summing the computed NER values associated with the critical bands to obtain a summed NER value; comparing the summed NER value with a target NER value; determining whether the target NER value is achieved based on an outcome of the comparison; and if so, then continue with the bit-rate loop process.
  • 15. The system of claim 14, further comprising: if not, repeating the steps of performing quantization, performing inverse quantization, computing, determining, using, summing, comparing, and determining.
Priority Claims (1)
Number Date Country Kind
IN 1295/CHE/2006 Jul 2006 IN national