BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a conventional audio data transcoding system.
FIG. 2 is a block diagram of an embodiment of the inventive audio data transcoding system.
FIG. 3 is a block diagram of a simplified implementation of filterbanks 4 and 6 of the FIG. 1 system.
FIG. 4 is a block diagram of elements of a simplified, maximally-decimated implementation of filterbank 5 of FIG. 2, also including blocks that indicate filtering functions implemented by other elements of the FIG. 2 system.
FIG. 5 is a block diagram of an embodiment of the inventive transcoding system.
FIG. 5A is a block diagram of another embodiment of the inventive transcoding system.
FIG. 6 is a block diagram of another simplified implementation of elements of filterbank 5 of FIG. 2 (which is not a maximally-decimated implementation), also including blocks indicating filtering functions implemented by other elements of the FIG. 2 system.
FIG. 7 is a diagram of steps that can be employed to generate the filters Mp,q(z) implemented by filter stage 37 of FIG. 6.
FIG. 8 is a block diagram of another embodiment of the inventive transcoding system.
FIG. 9 is a block diagram of another embodiment of the inventive transcoding system.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
A class of embodiments of the inventive system will be described with reference to FIGS. 2 and 4. The system of FIG. 2 performs MP3 encoding of input audio data (using analysis filterbank 2 and quantization circuits Q), transcodes the resulting MP3 format audio data (using inverse quantization circuits IQ, synthesis and analysis filterbank 5, and quantization circuits Q′, connected as shown) to generate transcoded audio data having SBC format, and performs SBC decoding on the transcoded audio data (using inverse quantization circuits IQ′ and synthesis filterbank 8) to generate time-domain samples of decoded audio data. All elements of FIG. 2 that are labeled identically in FIGS. 1 and 2 are identical to the corresponding elements of FIG. 1, and the foregoing description of them will not be repeated with reference to FIG. 2.
The FIG. 2 system performs transcoding efficiently in accordance with the invention using a combined synthesis and analysis filterbank 5 configured to generate (and assert to quantization circuits Q′) transformed frequency-domain coefficients in response to frequency-domain coefficients of audio data from inverse quantization circuits IQ without performing synthesis filtering (as performed by filterbank 4 of FIG. 1 or elements 10 and 12 of FIG. 3) followed by separate analysis filtering (as performed by filterbank 6 of FIG. 1 or elements 16 and 18 of FIG. 3). Thus filterbank 5 replaces a conventional cascade of separate synthesis and analysis filterbanks (e.g., cascaded filterbanks 4 and 6 of FIG. 1, in which filterbank 4 reconstructs the original input audio samples that were input to analysis filterbank 2 and asserts them to analysis filterbank 6, and filterbank 6 generates transformed frequency-domain coefficients in response thereto).
Next, with reference to FIGS. 3 and 4 we explain and contrast in more detail the manner in which the systems of FIGS. 1 and 2 perform transcoding. Recall that in FIGS. 1 and 2, filterbanks 4 and 5 receive 576 streams of frequency-band coefficients of MP3 format audio data and filterbanks 6 and 5 output eight streams of frequency-band coefficients of SBC format audio data to quantization circuits Q′. For simplicity, FIGS. 3 and 4 show (and their description assumes) only thirty-two filters in each of encoder 30 and stages 12 and 36, and thirty-two circuits 10, 32, and 34, for generating and transcoding thirty-two streams of frequency-band coefficients in response to a stream of time-domain input audio samples. Thus, the structures shown in FIGS. 3 and 4 (and FIG. 6 to be discussed below) could be employed to implement MPEG1(Layer I)-to-SBC transcoding or MPEG1(Layer II)-to-SBC transcoding. It will be apparent from the explanation below how the description of each of FIGS. 3 and 4 (and FIG. 6) should be modified to explain how the systems of FIGS. 3, 4, and 6 can implement MP3-to-SBC transcoding.
FIG. 4 is a block diagram of a simplified implementation of filterbank 5 of FIG. 2, in which blocks 30, 32, 38, and 40 indicate filtering functions implemented by other elements of FIG. 2. Up-sampling circuits 34 and filters 36 of filterbank 5 perform all their processing operations on data in the frequency domain. However, since these operations are performed on frequency-band coefficients having characteristics that are similar to those of time-domain samples of audio data, these operations will be described as if they were performed on time-domain samples of audio data. For example, each stream of frequency-band coefficients asserted to one of up-sampling circuits 34 is described herein as being up-sampled by such circuit 34 as if it were a stream of time-domain samples of audio data.
Similarly, FIG. 4 assumes that MP3 encoder 30 and decimation (down-sampling) circuits 32, and up-sampling circuits 38 and SBC decoding filterbank 40 of the FIG. 4 system perform all processing operations in the frequency domain. However, since circuits 32 and 38 operate on frequency-band coefficients having characteristics that are similar to those of time-domain audio data, the operations of circuits 32 and 38 (and encoder 30 and filterbank 40) will be described as if they were performed on time-domain samples of audio data. For example, each stream of frequency-band coefficients asserted to one of down-sampling circuits 32 is described as being down-sampled by such circuit 32 as if it were a stream of time-domain samples of audio data.
FIG. 5 is a block diagram of a system for implementing transcoding in accordance with the invention to generate M streams of quantized frequency coefficients (quantized frequency-band coefficients) of data having a second encoding format (with M coefficients of such data, one from each stream, being indicative of M samples of audio data) in response to N streams of quantized frequency coefficients (quantized frequency-band coefficients) of data having a first encoding format (with N coefficients of such data, one from each stream, being indicative of N samples of audio data), where N and M are arbitrary numbers that satisfy N>M and M=N/L (where L is described below). For example, in case of MP3-to-SBC transcoding, N=576, M=8, and L=72.
Typically, the N streams of quantized frequency-domain coefficients to be transcoded by the FIG. 5 system have been quantized using perceptual criteria in order to achieve the highest audio quality at the desired bit rate. The FIG. 5 system includes an inverse quantization stage (comprising N inverse quantization circuits I1, I2, . . . , and IN) configured to reconstruct the original (pre-quantization) frequency-domain coefficients of audio data by performing inverse quantization thereon. The reconstructed frequency-domain coefficients are asserted to filterbank 103.
Filterbank 103 implements partial transcoding of the data values from the inverse quantization stage in accordance with the invention and asserts the partially transcoded data values to a quantization stage (comprising M quantization circuits Q′1, Q′2, . . . , and Q′M). More specifically, a new set of N data values is clocked into filterbank 103's up-sampling stage (comprising N up-sampling circuits, U1, U2, . . . , and UN) once per N clock cycles, and is clocked out of the up-sampling stage to filter stage 105 once per N/M clock cycles. Filter stage 105 of filterbank 103 generates a new set of M filtered frequency coefficients once per each N/M clock cycles in response to each set of N data values from the up-sampling stage.
Filter stage 105 asserts each such set of M partially transcoded frequency coefficients to the quantization stage comprising quantization circuits Q′1, Q′2, . . . , and Q′M. The quantization stage performs quantization on the partially transcoded frequency coefficients (typically in accordance with perceptual criteria) to generate a set of M fully transcoded frequency-domain coefficients (once per M clock cycles). These fully transcoded frequency-domain coefficients can then undergo conventional decoding to reconstruct the original time-domain audio samples therefrom.
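The relationship between N, M, and L can be checked with a short calculation. The following is a minimal sketch only, assuming (as stated above for the quantization stage) that one set of M output coefficients is produced per M clock cycles and one set of N input coefficients arrives per N clock cycles; the function name clocking_schedule and the dictionary keys are hypothetical and do not appear in the drawings.

```python
def clocking_schedule(n_in: int, m_out: int) -> dict:
    """Rate bookkeeping for the N > M case, with M = N / L."""
    if n_in <= m_out or n_in % m_out != 0:
        raise ValueError("expects N > M with N an integer multiple of M")
    factor_l = n_in // m_out
    return {
        "L": factor_l,
        # one new set of N input coefficients every N clock cycles
        "input_set_period": n_in,
        # one set of M output coefficients every M clock cycles
        "output_set_period": m_out,
        # hence L output sets are produced for each input set
        "output_sets_per_input_set": factor_l,
    }


# MP3-to-SBC example from the text: N=576, M=8
print(clocking_schedule(576, 8))
```

For the simplified thirty-two band case (N=32, M=8) the same calculation yields four output sets per input set.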
FIG. 5A is a block diagram of a system that is a variation on the FIG. 5 system, and (like the FIG. 5 system) is configured to implement transcoding in accordance with the invention to generate M streams of quantized frequency coefficients (quantized frequency-band coefficients) of data having a second encoding format (with M coefficients of such data, one from each stream, being indicative of M samples of audio data) in response to N streams of quantized frequency coefficients (quantized frequency-band coefficients) of data having a first encoding format (with N coefficients of such data, one from each stream, being indicative of N samples of audio data). However, in the FIG. 5A system, N and M are arbitrary numbers that satisfy N<M and N=M/L (where L is described below).
Typically, the N streams of quantized frequency-domain coefficients to be transcoded by the FIG. 5A system have been quantized using perceptual criteria in order to achieve the highest audio quality at the desired bit rate. The FIG. 5A system includes an inverse quantization stage (comprising N inverse quantization circuits IQ1, IQ2, . . . , and IQN) configured to reconstruct the original (pre-quantization) frequency-domain coefficients of audio data by performing inverse quantization thereon. The reconstructed frequency-domain coefficients are asserted to filterbank 203.
Filterbank 203 implements partial transcoding of the data values from the inverse quantization stage in accordance with the invention and asserts the partially transcoded data values to a quantization stage (comprising M quantization circuits Q′1, Q′2, . . . , and Q′M). More specifically, a new set of N data values is clocked into filter stage 205 of filterbank 203 once per N clock cycles. Filter stage 205 generates a new set of M filtered frequency coefficients once per each M/N clock cycles in response to each set of N data values from the inverse quantization stage. Each set of M filtered frequency coefficients is down-sampled (by the above-mentioned factor “L”) in a down-sampling stage comprising M down-sampling circuits, once per M clock cycles, such that each such set of M filtered frequency coefficients is clocked out of the down-sampling stage to the quantization stage once per M/N clock cycles. The quantization stage performs quantization on the partially transcoded frequency coefficients (typically in accordance with perceptual criteria) to generate a set of M fully transcoded frequency-domain coefficients (once per M clock cycles). These fully transcoded frequency-domain coefficients can then undergo conventional decoding to reconstruct the original time-domain audio samples therefrom.
With reference again to FIG. 4, MP3 encoder 30 and decimation circuits 32 implement the conventional MP3 encoding function of filterbank 2 of FIG. 2. Encoder 30 generates thirty-two frequency-band coefficients (each corresponding to one of filters E0(z)-E31(z) indicated in FIG. 4) in response to thirty-two consecutive time-domain samples of input audio. Decimation circuits 32 output one such set of thirty-two frequency-band coefficients per each thirty-two consecutive clock periods. Although each such set of thirty-two coefficient values is actually a set of thirty-two frequency-band coefficients (each corresponding to a different frequency band), the processing of these coefficients is sometimes referred to herein as if it were performed on time-domain values. One set of thirty-two frequency-band coefficients is asserted to filterbank 5 per each thirty-two consecutive clock periods. Typically, the frequency-band coefficients clocked into filterbank 5 have undergone quantization in the encoder of a creating device (typically the creating device is a content server, e.g., content server 400 of FIG. 9), and inverse quantization has then been performed on the quantized coefficients to reconstruct the thirty-two frequency-band coefficients that are asserted to filterbank 5 (for processing in elements 34, 36, and S0-S7 of filterbank 5), but the circuitry for performing these operations is not shown in FIG. 4 for simplicity. Filter stage 36 of filterbank 5 generates a set of eight data values (each corresponding to one of filters (Mi(z)+Mi+1(z)+Mi+2(z)+Mi+3(z)), where i=0, 4, 8, 12, 16, 20, 24, and 28) and asserts this set of eight values (which together are indicative of eight samples of input audio data) at its outputs once per each eight clock cycles, in response to each set of thirty-two frequency-band coefficients clocked out of decimation circuits 32.
Each set of eight values generated by stage 36 is actually a set of eight frequency-band coefficients (each corresponding to a different frequency band), but these coefficients may sometimes be referred to herein as time-domain values. Thus, filterbank 5 asserts four sets of eight frequency-band coefficients during each consecutive thirty-two clock cycles.
Each set of eight frequency-band coefficients output from filterbank 5 of FIG. 2 undergoes quantization (but the circuitry for performing quantization is not shown in FIG. 4 for simplicity). At the decoder in the final consuming device (e.g., device 404 of FIG. 9, which is typically a pair of headphones) one set of eight inverse-quantized frequency-band coefficients (sometimes referred to herein as a set of eight time-domain values) is equivalently clocked into up-sampling circuits 38 once per eight clock cycles. Up-sampling circuits 38 and SBC decoding filterbank 40 implement the conventional SBC decoding function of filterbank 8 of FIG. 2. Filterbank 40 applies filters F0(z)-F7(z) to each set of eight frequency-band coefficient values output from block 5 to generate a sequence of eight reconstructed time-domain samples of the original input audio data. One such time-domain sample is clocked out of filterbank 40 per clock cycle. In order to match the sample rates at the inputs of encoder 30 and the outputs of filterbank 40, a new set of eight data values is clocked into up-sampling circuits 38 once per eight clock cycles of the FIG. 4 system, and is clocked out of up-sampling circuits 38 to filterbank 40 once per clock cycle (eight times per eight clock cycles).
Filterbank 5 includes up-sampling circuits 34, transcoding filter stage 36, and summation circuits S0-S7, connected as shown in FIG. 4. Up-sampling circuits 34 and filter stage 36 implement MP3-to-SBC transcoding in which filter stage 36 applies filters M0(z)−M31(z) to each set of thirty-two reconstructed data values asserted to filterbank 5, to generate a set of thirty-two filtered data values. The outputs of filters M0(z)−M3(z) are combined in summation unit S0 to generate a first transcoded value (corresponding to filter M0(z)+M1(z)+M2(z)+M3(z)), the outputs of filters M4(z)−M7(z) are combined in another summation unit (not shown) to generate a second transcoded value (corresponding to filter M4(z)+M5(z)+M6(z)+M7(z)), the outputs of filters M8(z)−M11(z) are combined in another summation unit (not shown) to generate a third transcoded value, the outputs of filters M12(z)−M15(z) are combined in another summation unit (not shown) to generate a fourth transcoded value, the outputs of filters M16(z)−M19(z) are combined in another summation unit (not shown) to generate a fifth transcoded value, the outputs of filters M20(z)−M23(z) are combined in another summation unit (not shown) to generate a sixth transcoded value, the outputs of filters M24(z)−M27(z) are combined in another summation unit (not shown) to generate a seventh transcoded value, and the outputs of filters M28(z)−M31(z) are combined in summation unit S7 to generate an eighth transcoded value (corresponding to filter M28(z)+M29(z)+M30(z)+M31(z)).
One such set of eight transcoded values (indicative of at least eight time-domain audio samples) is clocked out of filterbank 5 per eight clock cycles of the FIG. 4 system. In order to match the sample rates at the inputs of encoder 30 and the outputs of filterbank 5, a new set of thirty-two data values is clocked into up-sampling circuits 34 once per thirty-two clock cycles of the FIG. 4 system, and is clocked out of up-sampling circuits 34 to filter stage 36 once per eight clock cycles (four times per thirty-two clock cycles).
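The data flow through up-sampling circuits 34, filter stage 36, and the summation circuits can be illustrated with the following minimal sketch. It assumes that up-sampling is performed by zero insertion and that each filter is a direct-form FIR filter; the tap values are random placeholders and are not the filters Mi(z) of the invention. The function names are hypothetical.

```python
import numpy as np


def upsample(stream, factor):
    """Zero-stuffing: place factor-1 zeros after every input value (assumed)."""
    out = np.zeros(len(stream) * factor)
    out[::factor] = stream
    return out


def combined_filterbank(coeffs, filters, group=4):
    """coeffs: (32, T) array, one row per input coefficient stream.
    filters: (group, taps) array of impulse responses, reused for every
    group of four streams because M_{4p+q}(z) = M_q(z)."""
    n_streams, n_frames = coeffs.shape
    out = np.zeros((n_streams // group, n_frames * group))
    for i in range(n_streams):
        p, q = divmod(i, group)                # stream index i = 4p + q
        up = upsample(coeffs[i], group)        # one of circuits 34
        filtered = np.convolve(up, filters[q])[: out.shape[1]]  # stage 36
        out[p] += filtered                     # summation circuit S_p
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 6))      # six sets of thirty-two coefficients
m = rng.standard_normal((4, 55))      # placeholder taps, NOT the patent's M(z)
y = combined_filterbank(x, m)
print(y.shape)  # (8, 24)
```

Each set of thirty-two input values thus yields four sets of eight output values, matching the four-to-one rate change described above.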
For simplicity, FIG. 3 shows and the description thereof assumes that circuits 10 receive only thirty-two (rather than 576) frequency-band coefficients indicative of time-domain samples of input audio data. Thus, the structure shown in FIG. 3 could be employed to implement MPEG1(Layer I)-to-SBC transcoding or MPEG1(Layer II)-to-SBC transcoding. In order to describe how a variation on the structure shown in FIG. 3 would implement MP3-to-SBC transcoding, the description of FIG. 3 provided below should be modified to include another filterbank to the left of FIG. 3. This so-called inner filterbank consists of summation units that assert their outputs to upsamplers 10 in FIG. 3. Each of the summation units of the inner filterbank sums 18 inputs. An 18× upsampling stage followed by a band-pass filter produces each input to a summation unit. Thus, there are 576 (=32*18) 18× upsamplers followed by 576 band-pass filters (J′0(z) to J′575(z), where J′n(z)=J′n+18(z), i.e. only J′0(z) to J′17(z) are unique) followed by 32 summation units that assert their outputs to stage 10 of FIG. 3. The 576 18× upsamplers receive the 576 reconstructed MP3 frequency sub-band coefficients, one per upsampler.
Filterbank 2 (to implement MP3 encoding) actually consists of two cascaded filterbanks, with the first creating thirty-two streams of frequency-band samples and the second creating eighteen streams of frequency sub-band samples for each stream of frequency-band samples, and thus creates 576 streams of frequency sub-band samples. However, for simplicity, FIG. 4 shows and the description thereof assumes only thirty-two filters in each of encoder 30 and stage 36, and thirty-two circuits 32 and thirty-two circuits 34, for generating and transcoding thirty-two streams of frequency-band coefficients (each stream corresponding to a different one of filters E0(z)−E31(z)) in response to a stream of time-domain input audio samples. Thus, the structure shown in FIG. 4 could be employed to implement MPEG1(Layer I)-to-SBC transcoding or MPEG1(Layer II)-to-SBC transcoding. In order to implement MP3-to-SBC transcoding, the FIG. 4 system would include implementations of encoder 30 and stage 36 and circuits 32 and 34 (and other circuitry not shown) configured to generate and transcode 576 streams of frequency-band coefficients (one for each different one of 576 frequency sub-bands, each corresponding to one of 576 different paths through the two filterbanks) in response to a stream of time-domain input audio samples.
The description below of FIG. 4 applies, with minor modifications that will be apparent to one of ordinary skill in the art, to an implementation in which filterbank 5 receives 576 (rather than thirty-two) streams of data values indicative of time-domain samples of input audio data, and generates (and asserts to up-sampling circuits 38) seventy-two sets of eight transcoded frequency-band coefficients in response to each set of 576 data values clocked into filterbank 5. For example, in such an implementation of FIG. 4 (in which filterbank 5 receives 576 streams of data values), each of 576 up-sampling circuits 34 (connected in parallel within filterbank 5) would receive one stream of data values and implement “72×” upsampling thereon (in the sense described below), and filter stage 36 would apply 576 filters M′i(z), where “i” varies from 0 to 575, to generate eight transcoded frequency components in response to each set of 576 data values clocked into stage 36. One such set of eight transcoded frequency components (indicative of eight time-domain audio samples) would be clocked out of filterbank 5 per eight clock cycles. In order to match the sample rates at the inputs to encoder 30 and the outputs of filterbank 5, a new set of 576 data values would be clocked into the up-sampling circuits 34 once per 576 clock cycles of the FIG. 4 system, and would be clocked out of up-sampling circuits 34 to filter stage 36 once per eight clock cycles (seventy-two times per 576 clock cycles).
Before explaining in more detail the structure within filterbank 5 of FIG. 4, it is helpful first to consider the conventional structure shown in FIG. 3. FIG. 3 is a block diagram of a simplified implementation of elements 4 and 6 of the conventional FIG. 1 system. FIG. 3 is simplified in the sense that it processes 32 streams of coefficients (rather than 576 streams as discussed above).
In FIG. 3, a new set of thirty-two frequency-band coefficients (output from inverse quantization blocks IQ of FIG. 1) is clocked into up-sampling circuits 10 of filterbank 4 once per thirty-two clock cycles of the FIG. 1 system. In order to match the sample rates at the inputs of encoder 30 and the outputs of filter stage 12, each set of thirty-two data values is clocked out of up-sampling circuits 10 to filter stage 12 once per clock cycle (thirty-two times per thirty-two clock cycles). Once per clock cycle, filter stage 12 applies filters Hi(z) and combines the filtered outputs of filters Hi(z), to output one reconstructed sample (“T”) of the original input audio. Thus, filter stage 12 outputs thirty-two reconstructed samples of the original input audio per thirty-two clock cycles, in response to each new set of thirty-two data values clocked into up-sampling circuits 10.
Once per eight consecutive clock cycles, filter stage 16 of FIG. 3 applies filters Gi(z), where i ranges from 0 to 7, to eight reconstructed input audio samples received from filter stage 12 to generate a set of eight partially SBC-encoded data values (each corresponding to a different frequency band) and asserts such set of eight values to decimation (down-sampling) circuits 18. Decimation circuits 18 output one such set of eight partially SBC-encoded values per clock cycle, to MDCT and anti-aliasing circuitry (not shown) which transforms these values into a set of eight frequency coefficients. Each such set of eight frequency coefficients is then quantized in quantizers Q′ of FIG. 1 to generate a set of eight quantized frequency coefficients indicative of eight SBC-encoded samples of the original time-domain input audio. Thus, filter stage 16, circuits 18, and quantizers Q′ together output one set of eight quantized frequency coefficients (indicative of eight SBC-encoded samples of the original time-domain input audio) per eight clock cycles, in response to each eight time-domain data values clocked into filter stage 16 during eight consecutive clock cycles.
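For comparison, the conventional two-step approach of FIG. 3 (synthesis to time-domain samples, followed by separate analysis) can be sketched as follows. This is an illustrative sketch only: zero-insertion up-sampling and direct-form FIR filtering are assumed, the function names are hypothetical, and the filter taps are random placeholders with the stated lengths (512 and 80) rather than the actual filters Hi(z) and Gi(z).

```python
import numpy as np


def synthesis(coeffs, h_filters):
    """Circuits 10 and stage 12: up-sample each of K streams by K,
    filter each with its own filter, and sum into one time-domain stream."""
    k, frames = coeffs.shape
    n = k * frames
    out = np.zeros(n)
    for i in range(k):
        up = np.zeros(n)
        up[::k] = coeffs[i]
        out += np.convolve(up, h_filters[i])[:n]
    return out


def analysis(samples, g_filters):
    """Stage 16 and circuits 18: filter with each analysis filter and
    keep every K-th output value, where K is the number of filters."""
    k = len(g_filters)
    rows = [np.convolve(samples, g)[: len(samples)][::k] for g in g_filters]
    return np.stack(rows)


rng = np.random.default_rng(1)
mp3_coeffs = rng.standard_normal((32, 4))   # four sets of thirty-two values
h = rng.standard_normal((32, 512))          # placeholder synthesis taps
g = rng.standard_normal((8, 80))            # placeholder analysis taps
time_samples = synthesis(mp3_coeffs, h)     # intermediate time-domain audio
sbc_coeffs = analysis(time_samples, g)
print(time_samples.shape, sbc_coeffs.shape)  # (128,) (8, 16)
```

The intermediate array time_samples is exactly the reconstructed audio that the combined filterbank of the invention avoids computing.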
In FIG. 4, the impulse response of filters Hi(z) is given by hi(n) below:
According to the MPEG1-Layer I, II and III standard specification, filter h(n) is of length 512 and is a low-pass filter with cut-off at π/64.
The impulse response of filters Gi(z) of FIG. 4 is given by gi(n) below:
According to the Bluetooth A2DP SBC specification, filter g(n) is of length 80 and is a low-pass filter with cut-off at π/16.
Ideally after replacing filters 4 and 6 of FIG. 1 with new filterbank 5 of FIG. 2, the end-to-end transfer function of the FIG. 2 system should be near-unity (it should implement near-perfect reconstruction). This means that in the absence of quantization, the cascade of analysis filterbank 2, filterbank 5, and synthesis filterbank 8 of FIG. 2 should not introduce any aliasing distortions and its transfer function is just a delay.
Preferably, a maximally-decimated implementation of filterbank 5 (as shown in FIG. 4) is used to implement filterbank 5 of FIG. 2. In order to achieve an efficient implementation, it is desirable to derive all the filters Mi(z) of such maximally-decimated implementation of filterbank 5 by cosine-modulation of a low-pass prototype filter M(z). Thus:
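$$M_0(z)=M_4(z)=\cdots=M_{24}(z)=M_{28}(z)=e^{j\phi}\cdot M\left(ze^{-j\pi(0+0.5)/4}\right)+e^{-j\phi}\cdot M\left(ze^{j\pi(0+0.5)/4}\right)$$
$$M_1(z)=M_5(z)=\cdots=M_{25}(z)=M_{29}(z)=e^{j\phi}\cdot M\left(ze^{-j\pi(1+0.5)/4}\right)+e^{-j\phi}\cdot M\left(ze^{j\pi(1+0.5)/4}\right)$$
$$M_2(z)=M_6(z)=\cdots=M_{26}(z)=M_{30}(z)=e^{j\phi}\cdot M\left(ze^{-j\pi(2+0.5)/4}\right)+e^{-j\phi}\cdot M\left(ze^{j\pi(2+0.5)/4}\right)$$
$$M_3(z)=M_7(z)=\cdots=M_{27}(z)=M_{31}(z)=e^{j\phi}\cdot M\left(ze^{-j\pi(3+0.5)/4}\right)+e^{-j\phi}\cdot M\left(ze^{j\pi(3+0.5)/4}\right)$$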
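In the time domain, the modulation expressed by the relations above reduces to multiplying the prototype taps by a cosine. The following minimal sketch assumes the usual convention M(z)=Σ m(n) z^(−n) with real taps m(n), under which the taps of Mq(z) are 2·m(n)·cos(π(q+0.5)n/4+φ). The function name modulated_taps and the example prototype are hypothetical.

```python
import numpy as np


def modulated_taps(prototype, q, phi, bands=4):
    """Taps of M_q(z) obtained by cosine modulation of real prototype taps.

    With M(z) = sum_n m(n) z**-n, substituting z*exp(-j*theta) for z
    multiplies tap n by exp(+j*theta*n), so the two-term sum collapses to
    2*m(n)*cos(theta*n + phi), where theta = pi*(q + 0.5)/bands.
    """
    prototype = np.asarray(prototype, dtype=float)
    n = np.arange(len(prototype))
    theta = np.pi * (q + 0.5) / bands
    return 2.0 * prototype * np.cos(theta * n + phi)


# Placeholder prototype (a simple window), NOT the patent's M(z)
m = np.hanning(55)
bank = np.stack([modulated_taps(m, q, phi=0.0) for q in range(4)])
print(bank.shape)  # (4, 55)
```

Because only four distinct filters result (q=0, 1, 2, 3), the same four sets of taps serve all thirty-two filter positions.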
MP3 decoding should achieve near-perfect reconstruction, and sufficient conditions for such near-perfect reconstruction are:
$$H_{4p+q}(z)=M_{4p+q}(z^{8})\cdot F_p(z)\quad\text{for } p=0,1,\ldots,7 \text{ and } q=0,1,2,3.$$
Note that M4p+q(z)=Mq(z), so that the conditions become
$$H_{4p+q}(z)=M_q(z^{8})\cdot F_p(z)\quad\text{for } p=0,1,\ldots,7 \text{ and } q=0,1,2,3.$$
The prototype low-pass filter M(z) is judiciously chosen to satisfy
$$H(z)=M(z^{8})\cdot F(z).$$
H(z) and F(z) are low-pass prototype filters for MP3 cosine-modulated synthesis filterbank 12, and SBC cosine-modulated synthesis filterbank 40, respectively. Note that H(z) has support from −π/64 to π/64, and F(z) has support from −π/16 to π/16. Therefore M(z) must have support from −π/8 to π/8.
It may not be possible (or practical) to find a filter M(z) that exactly satisfies the criteria set forth above and has a small finite impulse response. It is contemplated that typical embodiments of the invention will implement a small FIR filter M(z) (and the corresponding filters Mi(z) of FIG. 4) that only approximately satisfies the criterion:
$$H(z)=M(z^{8})\cdot F(z).$$
Preferably, the phase factor φ in the expressions set forth above for filters Mi(z) is chosen so that the FIG. 4 system meets an end-to-end linear phase requirement.
By choosing filter M(z) to be a short (512−80)/8 or 54th order FIR filter that meets the above constraints, and implementing filters Mi(z) in accordance with such choice of filter M(z), maximally-decimated filterbank 5 of FIG. 4 can be implemented more efficiently than can MP3 synthesis filterbank 4 of FIG. 1 followed by SBC analysis filterbank 6 of FIG. 1 (e.g., maximally-decimated filterbank 5 of FIG. 4 can be implemented more efficiently than can the FIG. 3 system).
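The filter order (512−80)/8=54 follows from counting taps in the product M(z^8)·F(z). The short check below is a sketch only: the tap values are placeholders, and only the lengths matter. The helper name expand_by is hypothetical.

```python
import numpy as np


def expand_by(taps, factor):
    """Taps of M(z**factor): insert factor-1 zeros between the taps of M(z)."""
    out = np.zeros((len(taps) - 1) * factor + 1)
    out[::factor] = taps
    return out


m = np.ones(55)   # an order-54 FIR filter has 55 taps (placeholder values)
f = np.ones(80)   # 80-tap SBC prototype (placeholder values)
h = np.convolve(expand_by(m, 8), f)   # taps of M(z^8) * F(z)
print(len(h))  # 512
```

An order-54 filter expanded by eight has order 432; multiplying by the 80-tap (order 79) filter gives order 511, i.e. the 512 taps of the MP3 prototype.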
To implement the functions of stages 34 and 36 of filterbank 5 of FIG. 4 (with filters Mi(z) determined by the above-noted specific choice of filter M(z)), the only operations required are eight 4×4 DCT computations each followed by low-pass filtering by a filter of order 54. In other words, the only required computations are eight small (4×4) DCTs followed by eight small (54-point) FIR filters run over four samples. For example, in FIG. 4, the top four up-samplers of stage 34 and filters M0(z), M1(z), M2(z), and M3(z) of stage 36 can be implemented as circuitry for performing a 4×4 DCT followed by a 54-point FIR filter run over four samples.
In contrast, in order to implement the FIG. 3 system (an MP3 synthesis filterbank comprising elements 10 and 12 followed by an SBC analysis filterbank comprising elements 16 and 18), the required computations include one large (32×32) DCT and one large (512-point) FIR filter for MP3 synthesis and one medium-size (8×8) DCT and one medium-size (80-point) FIR filter run four times. In other words, the required computations include one large (32×32) DCT, four medium-size (8×8) DCTs, one large (512-point) FIR filtering operation, and four medium-size (80-point) FIR filtering operations.
FIG. 8 is an example of an embodiment of the inventive system which includes combined synthesis and analysis filterbank 303 configured to implement discrete cosine transform (DCT) computations (in DCT stage 304) each followed by low-pass filtering (in filter stage 305). When the FIG. 8 system implements MPEG1(Layer I)-to-SBC transcoding or MPEG1(Layer II)-to-SBC transcoding, stages 304 and 305 implement the functions of stages 34 and 36 of filterbank 5 of FIG. 4 (with filters Mi(z) determined by the above-noted specific choice of filter M(z)), as described in the paragraph preceding the previous paragraph. Alternatively, the FIG. 8 system can implement MP3-to-SBC transcoding (as described below) or transcoding of another type.
The FIG. 8 system includes an inverse quantization stage (comprising N inverse quantization circuits I1, I2, . . . , and IN) configured to reconstruct original (pre-quantization) frequency-domain coefficients of audio data by performing inverse quantization thereon. The reconstructed frequency-domain coefficients are asserted to filterbank 303. The index N is equal to 576 to implement MP3-to-SBC transcoding, and is equal to 32 to implement MPEG1(Layer I)-to-SBC or MPEG1(Layer II)-to-SBC transcoding.
Filterbank 303 implements partial transcoding of the data values from the inverse quantization stage in accordance with the invention and asserts the partially transcoded data values to a quantization stage (comprising M quantization circuits Q′1, Q′2, . . . , and Q′M). More specifically, a new set of N data values is clocked into filterbank 303's DCT stage 304 once per N clock cycles, and a set of M transformed data values is clocked out of stage 304 to filter stage 305 once per N/M clock cycles. Filter stage 305 of filterbank 303 generates a new set of M filtered (“partially transcoded”) frequency coefficients once per each N/M clock cycles in response to each set of M data values from stage 304. Filter stage 305 asserts each such set of M partially transcoded frequency coefficients to the quantization stage comprising M quantization circuits Q′1, Q′2, . . . , and Q′M. The quantization stage performs quantization on the partially transcoded frequency coefficients (typically in accordance with perceptual criteria) to generate a set of M fully transcoded frequency-domain coefficients (once per M clock cycles). These fully transcoded frequency-domain coefficients can then undergo conventional decoding to reconstruct the original time-domain audio samples therefrom.
To implement the functions of non-simplified versions of stages 34 and 36 of filterbank 5 of a non-simplified version of FIG. 4 for performing MP3-to-SBC transcoding on sets of 576 frequency-band coefficients (rather than sets of 32 frequency-band coefficients as in the simplified version described above), the inventive filterbank (e.g., filterbank 303 of FIG. 8) should implement the equivalent of a MP3 synthesis filterbank implemented as a cascade of two filterbanks (an 18-band inner filterbank and a 32-band outer filterbank), and also an SBC analysis filterbank. The above-described simplified (32-band) version of FIG. 4 combines such an outer MP3 synthesis filterbank and an SBC analysis filterbank (which is a single stage, 8-band filterbank) in accordance with the invention. The combination of just the outer MP3 synthesis filterbank and the SBC analysis filterbank in accordance with the invention improves the efficiency of MP3-to-SBC transcoding significantly. However, its efficiency is further improved by extending the combination to include also the inner MP3 synthesis filterbank.
A non-simplified version of stages 34 and 36 of filterbank 5 of a version of FIG. 4 configured to perform MP3-to-SBC transcoding (e.g., an implementation of filterbank 303 of FIG. 8) operates on sets of 576 inverse-quantized frequency-band coefficients (each set including one coefficient for each of 18 different frequency sub-bands of each of 32 different frequency bands) to implement the equivalents of both the above-mentioned inner and outer MP3 synthesis filterbanks (and the equivalent of an SBC analysis filterbank). The only operations required on each set of 576 coefficients are eight 72×72 DCT computations, each followed by low-pass filtering by a 198-point FIR filter (when such DCT computations are performed in DCT stage 304 of FIG. 8, such low-pass filtering can be implemented by FIR filter stage 305). The non-simplified version of filterbank 5 performs operations equivalent to those performed by a version of stage 34 consisting of 576 upsamplers (each performing 72× upsampling on a different stream of frequency-band coefficients), a version of stage 36 consisting of 576 filters M0(z), . . . , M575(z), each such filter Mj(z) being a 198-point (198=(512−80)/8+36*4) FIR filter, and eight summation circuits (corresponding to circuits S0-S7 of FIG. 4), each such summation circuit configured to combine the outputs of a different subset of 72 of the filters M0(z)-M575(z) (i.e., the first summation circuit configured to add the outputs of M0(z)-M71(z), the second summation circuit configured to add the outputs of M72(z)-M143(z), . . . , and the eighth summation circuit configured to add the outputs of M504(z)-M575(z)).
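The grouping of the 576 filters and the 198-point filter length can be verified with simple arithmetic. The sketch below assumes contiguous groups of 72 filters per summation circuit, as in the ranges M0(z)-M71(z) through M504(z)-M575(z) given above; the function name summation_circuit is hypothetical.

```python
def summation_circuit(filter_index: int, group_size: int = 72) -> int:
    """Zero-based index of the summation circuit fed by filter M_j(z)."""
    if not 0 <= filter_index < 576:
        raise ValueError("filter index must lie in 0..575")
    return filter_index // group_size


# Filter length stated in the text: 198 = (512 - 80)/8 + 36*4
taps = (512 - 80) // 8 + 36 * 4

# Number of distinct summation circuits needed for 576 filters in groups of 72
circuits = len({summation_circuit(j) for j in range(576)})
print(taps, circuits)  # 198 8
```

Filters M0(z) through M71(z) map to circuit 0 and filters M504(z) through M575(z) map to circuit 7, so 576 filters in groups of 72 occupy eight summation circuits.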
In contrast, a non-simplified version of the conventional FIG. 3 system for performing MP3-to-SBC transcoding on sets of 576 inverse-quantized frequency-band coefficients would include an MP3 synthesis filterbank comprising 576 up-samplers (each corresponding to one of up-samplers 10 of FIG. 3, but configured to perform 18× upsampling) and a non-simplified (32×18 band) filter stage corresponding to stage 12, followed by a similar second filterbank comprising thirty-two up-samplers 10 of FIG. 3 and a filter stage corresponding to stage 12, followed by an SBC analysis filterbank comprising a filter stage 16 and decimation circuits 18 (as in FIG. 3). The operations required on each set of 576 coefficients received by such a system include thirty-two (18×18) DCT computations and thirty-two (36-point) FIR filters on eighteen samples (to implement the inner MP3 synthesis filterbank), eighteen (32×32) DCT computations and eighteen large (512-point) FIR filters on 576 samples (to implement the outer MP3 synthesis filterbank), and a medium-size (8×8) DCT computation and medium-size (80-point) FIR filter run four times (to implement the SBC analysis filterbank). In other words, the required computations include eighteen large (32×32) DCTs, thirty-two large (18×18) DCTs, four (8×8) DCTs, eighteen large (512-point) FIR filtering operations, thirty-two 36-point FIR filtering operations, and four 80-point FIR filtering operations.
Clearly, processing in accordance with typical implementations of the FIG. 4 embodiment of the invention (e.g., to perform any of MPEG1(Layer I)-to-SBC transcoding, MPEG1(Layer II)-to-SBC transcoding, or MP3-to-SBC transcoding) has significant advantages (e.g., reduced computational complexity) relative to processing in accordance with the traditional FIG. 3 approach. In addition, the storage required for operating typical implementations of the FIG. 4 system is much smaller than the storage required to operate typical implementations of the FIG. 3 system. For example, to implement MPEG1(Layer I)-to-SBC or MPEG1(Layer II)-to-SBC transcoding as described with reference to FIG. 4, only a 54-point filter M(z) needs to be stored, as opposed to 512-point and 80-point filters H(z) and G(z) for MPEG1(Layer I)-to-SBC or MPEG1(Layer II)-to-SBC transcoding as described with reference to FIG. 3.
Thus, filterbank 5 of FIG. 2 (and filterbank 5 of FIG. 4) can be configured to perform a small number of cosine transforms (e.g., eight or seventy-two DCTs), each on a different subset of a set of data values indicative of at least one time-domain sample of input audio data (e.g., a set of data values indicative of thirty-two or 576 time-domain samples of input audio data) to generate cosine-transformed data, and to perform low-pass filtering on the cosine-transformed data to generate transformed data values. The transformed data values are transformed frequency-band coefficients that can be quantized to generate transcoded audio data in SBC format (e.g., transcoded audio data in SBC format indicative of thirty-two time-domain samples of input audio data).
In another class of embodiments of the inventive system, filterbank 5 of FIG. 2 is implemented as a non-maximally decimated filterbank. Such a system is shown in FIG. 6. All elements of FIG. 6 that are numbered identically to corresponding elements of FIG. 4 are identical in both FIGS. 4 and 6 and the description of these elements will not be repeated with reference to FIG. 6. Combined synthesis and analysis filterbank 5′ of FIG. 6 implements filterbank 5 of FIG. 2 (as does combined synthesis and analysis filterbank 5 of FIG. 4).
Filterbank 5′ of FIG. 6 differs from filterbank 5 of FIG. 4 in that it includes sixty up-sampling circuits 35 (in place of thirty-two up-sampling circuits 34 as in FIG. 4), and in that its filter stage 37 includes more elements than does corresponding filter stage 36 of FIG. 4. Filter stage 37 generates a set of eight data values (each corresponding to the output of one of summation circuits S′0-S′7) and asserts this set of eight values (which together are indicative of eight samples of input audio data) at its outputs once every eight clock cycles, in response to each set of thirty-two frequency coefficients clocked out of decimation circuits 32. Thus, filterbank 5′ asserts at its outputs four such sets of eight data values during each consecutive thirty-two clock cycles. In order to match the sample rates at the inputs of encoder 30 and the outputs of filterbank 5′, a new set of thirty-two data values is clocked into up-sampling circuits 35 once per thirty-two clock cycles of the FIG. 6 system, and is clocked out of up-sampling circuits 35 to filter stage 37 once per eight clock cycles (four times per thirty-two clock cycles).
In elements 35 and 37 of the FIG. 6 system, approximately eight streams of MP3 frequency-band coefficients (which resemble eight streams of time-domain audio data samples and are thus sometimes referred to herein as eight streams of time-domain samples) are combined to generate a signal for each of eight SBC frequency-bands. Note that the first SBC frequency-band signal combines the first 6 MP3 frequency-bands, while the last SBC frequency-band combines the last 6 MP3 frequency-bands. To generate each of the six intermediate SBC frequency-band signals, four overlapping MP3 frequency-bands and two adjacent MP3 frequency-bands on either side are combined.
More specifically, the top six down-sampling circuits 32 are coupled to the top six up-sampling circuits 35 (whose outputs are filtered in filters M0,0(z), M1,0(z), M2,0(z), M3,0(z), M4,0(z), and M5,0(z)), the bottom six down-sampling circuits 32 are coupled to the bottom six up-sampling circuits 35 (whose outputs are filtered in filters M26,7(z), M27,7(z), M28,7(z), M29,7(z), M30,7(z), and M31,7(z)), the eight down-sampling circuits 32 above the bottom two circuits 32 are coupled to the eight corresponding up-sampling circuits 35 (whose outputs are filtered in filters M22,6(z), M23,6(z), M24,6(z), M25,6(z), M26,6(z), M27,6(z), M28,6(z), and M29,6(z)), and so on. The outputs of filters M0,0(z), M1,0(z), M2,0(z), M3,0(z), M4,0(z), and M5,0(z) are combined in circuit S′0, the outputs of filters M26,7(z), M27,7(z), M28,7(z), M29,7(z), M30,7(z), and M31,7(z) are combined in circuit S′7, the outputs of filters M22,6(z), M23,6(z), M24,6(z), M25,6(z), M26,6(z), M27,6(z), M28,6(z), and M29,6(z) are combined in circuit S′6, and so on.
In order to derive the correct filters Mp,q(z), where index “q” ranges from 0 to 7 and index “p” ranges from 0 to 31, it should be noted that the correct branches of the MP3 synthesis filter (H(z)) of FIG. 3 should be combined with corresponding branches of the SBC analysis filter (G(z)) of FIG. 3, and that theoretically each SBC analysis sub-band filter Gq(z) of FIG. 3 should be cascaded with each MP3 synthesis sub-band filter Hp(z). As noted above, in the FIG. 6 implementation the combinations are restricted to no more than eight MP3 sub-band filters Hp(z) that overlap or are adjacent to each Gq(z). In other words, p=4q−2, 4q−1, 4q, . . . , 4q+4, 4q+5, and 0<=p<=31. One such branch is illustrated in FIG. 7.
More specifically, filter stage 37 of FIG. 6 has sixty branches and implements only sixty filters Mp,q(z). The sixty filters Mp,q(z) that are implemented are determined with the restrictions that each Gq(z) is paired with up to eight overlapping filters Hp(z), where p=4q−2, 4q−1, 4q, . . . , 4q+4, 4q+5, but G0(z) can be paired with just six filters from the H bank, H0(z), H1(z), . . . , H5(z), since H−2(z) and H−1(z) do not exist, and G7(z) can be paired with just six filters H26(z), H27(z), . . . , H31(z), since H32(z) and H33(z) do not exist.
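The pairing rule can be enumerated directly to confirm the count of sixty branches. The following Python is a minimal sketch under the stated restriction p=4q−2, . . . , 4q+5 with 0<=p<=31; the function name branch_pairs and its parameters are hypothetical and are used here only for illustration.

```python
def branch_pairs(num_mp3_bands=32, num_sbc_bands=8):
    """List the (p, q) index pairs for which a filter Mp,q(z) is implemented.

    For each SBC band q, candidate MP3 bands are p = 4q-2 .. 4q+5;
    candidates outside 0..num_mp3_bands-1 do not exist and are dropped.
    """
    pairs = []
    for q in range(num_sbc_bands):
        for p in range(4 * q - 2, 4 * q + 6):
            if 0 <= p < num_mp3_bands:
                pairs.append((p, q))
    return pairs


if __name__ == "__main__":
    pairs = branch_pairs()
    print("branches:", len(pairs))
    for q in range(8):
        print(f"G{q}(z) pairs with H{[p for p, qq in pairs if qq == q]}")
```

The first and last SBC bands each receive six branches and the six intermediate bands each receive eight, for a total of sixty.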
Consistent with FIG. 7, the result is that each filter Mp,q(z) is given by:
Mp,q(z)=(Hp(z)·Gq(z))↓8.
That is, the filter Mp,q(z) is one of the eight polyphase components of the filter Hp(z)Gq(z). Since Hp(z) is of order 512 and Gq(z) is of order 80, the filters Mp,q(z) are of order (512+80)/8 or 74.
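The relationship between the product filter and its polyphase components can be sketched as follows. This Python is illustrative only: the helper names (convolve, polyphase_components, combined_branch_filter) are assumptions, the coefficient lists are placeholders rather than actual MP3 or SBC prototype filters, and the choice of which of the eight components is used (the phase argument) depends on a delay convention that is not specified here.

```python
def convolve(a, b):
    """Coefficients of the product of two FIR filters (polynomial product)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def polyphase_components(h, factor):
    """Split filter h into `factor` polyphase components (every factor-th tap)."""
    return [h[k::factor] for k in range(factor)]


def combined_branch_filter(hp, gq, phase=0, factor=8):
    """One polyphase component of Hp(z)*Gq(z); `phase` is an assumed convention."""
    return polyphase_components(convolve(hp, gq), factor)[phase]


if __name__ == "__main__":
    hp = [1.0] * 512   # placeholder for a 512-tap MP3 synthesis branch
    gq = [1.0] * 80    # placeholder for an 80-tap SBC analysis branch
    print(len(combined_branch_filter(hp, gq)))
```

With 512-tap and 80-tap placeholders the product has 591 taps, so no component exceeds 74 taps, which agrees with the (512+80)/8 figure given above.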
FIG. 9 is a block diagram of another embodiment of the inventive transcoding system. The FIG. 9 system includes encoder 400 which generates encoded audio data in a first encoding format. This audio data is transmitted over a link or network (e.g., the Internet) to transcoder 402 (which may be implemented in a portable media player). Transcoder 402 performs transcoding on the audio data in accordance with the invention, to generate transcoded audio data in a second encoding format in response thereto. The transcoded audio data are transmitted over a link or network (e.g., a wireless link) to decoder 404 (which may be implemented in a pair of headphones or other consuming device). Encoder 400 can implement filterbank 2 (and quantizers Q) of FIG. 2, transcoder 402 can implement filterbank 5 (and inverse quantizers IQ and quantizers Q′) of FIG. 2, and decoder 404 can implement filterbank 8 (and inverse quantizers IQ′) of FIG. 2. Regardless of how encoder 400 and decoder 404 are implemented, the end-to-end transfer function of encoder 400, transcoder 402, and decoder 404 should be near-unity (it should implement near-perfect reconstruction). The manner in which encoder 400 and decoder 404 are implemented is not an object of the present invention. Indeed, some embodiments of the invention are limited to a transcoder, and do not include an encoder (for asserting encoded data to the transcoder) or a decoder (for decoding the transcoded data output from the transcoder).
Although the specific embodiments of the invention described herein are chosen because of their commercial importance, the principles of operation described herein are also applicable to transcoding of audio data in other formats (e.g., other perceptual transform coding formats).
It should be understood that while some embodiments of the present invention are illustrated and described herein, the invention is defined by the claims and is not to be limited to the specific embodiments described and shown.