Coding of audio signals for data reduction is a ubiquitous technology. High-quality, low-bitrate coding is essential for enabling cost-effective media storage and for facilitating distribution over constrained channels (such as Internet streaming). The efficiency of the compression is vital to these applications since the capacity requirements for uncompressed audio may be prohibitive in many scenarios.
Several existing audio coding approaches are based on sliding-window time-frequency transforms. Such transforms convert a time-domain audio signal into a time-frequency representation which is amenable to leveraging psychoacoustic principles to achieve data reduction while limiting the introduction of audible artifacts. In particular, the modified discrete cosine transform (MDCT) is commonly used in audio coders since the sliding-window MDCT can achieve perfect reconstruction using overlapping nonrectangular windows without oversampling, that is, while maintaining the same amount of data in the transform domain as in the time domain; this property is inherently favorable for audio coding applications.
While the time-frequency representation of an audio signal derived by a sliding-window MDCT provides an effective framework for audio coding, it is beneficial for coding performance to extend the framework such that the time-frequency resolution of the representation can be adapted based upon changes or variations in characteristics of the signal to be coded. For instance, such adaptation can be used to limit the audibility of coding artifacts. Several existing audio coders adapt to the signal to be coded by changing the window used in the sliding-window MDCT in response to the signal behavior. For tonal signal content, long windows may be used to provide high frequency resolution; for transient signal content, short windows may be used to provide high time resolution. This approach is commonly referred to as window switching.
Window switching approaches typically provide for short windows, long windows, and transition windows for switching from long to short and vice versa. It is common practice to switch to short windows based on a transient detection process. If a transient is detected in a portion of the audio signal to be coded, that portion of the audio signal is processed using short windows.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In one example aspect, a method of encoding an audio signal is provided. Multiple different time-frequency transformations are applied to an audio signal frame across a frequency spectrum to produce multiple transforms of the frame, each transform including a corresponding time-frequency resolution across the frequency spectrum. Measures of coding efficiency are produced across multiple frequency bands within the frequency spectrum, for multiple time-frequency resolutions from among the multiple transforms. A combination of time-frequency resolutions is selected to represent the frame at each of the multiple frequency bands within the frequency spectrum, based at least in part upon the produced measures of coding efficiency. A window size and a corresponding transform size are determined for the frame, based at least in part upon the selected combination of time-frequency resolutions. A modification transformation is determined for at least one of the frequency bands based at least in part upon the selected combination of time-frequency resolutions and the determined window size. The frame is windowed using the determined window size to produce a windowed frame. The windowed frame is transformed using the determined transform size to produce a transform of the windowed frame that includes a time-frequency resolution at each of the multiple frequency bands of the frequency spectrum. A time-frequency resolution within at least one frequency band of the transform of the windowed frame is modified based at least in part upon the determined modification transformation.
In another example aspect, a method of decoding a coded audio signal is provided. A coded audio signal frame (frame), modification information, transform size information, and window size information are received. A time-frequency resolution within at least one frequency band of the received frame is modified based at least in part upon the received modification information. An inverse transform is applied to the modified frame based at least in part upon the received transform size information. The inverse transformed modified frame is windowed using a window size based at least in part upon the received window size information.
It should be noted that alternative embodiments are possible, and steps and elements discussed herein may be changed, added, or eliminated, depending on the particular embodiment. These alternative embodiments include alternative steps and alternative elements that may be used, and structural changes that may be made, without departing from the scope of the disclosure.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
FIGS. 11C1-11C4 are illustrative functional block diagrams representing a sequence of frames flowing through a pipeline within the analysis block of the encoder of
FIG. 13B1 is an illustrative drawing representing an example first optimal transition sequence across frequency for a single frame through the trellis structure of
FIG. 13B2 is an illustrative first time-frequency tile frame corresponding to the first transition sequence across frequency of FIG. 13B1.
FIG. 13C1 is an illustrative drawing representing an example second optimal transition sequence across frequency for a single frame through the trellis structure of
FIG. 13C2 is an illustrative second time-frequency tile frame corresponding to the second transition sequence across frequency of FIG. 13C1.
In the following description of embodiments of an audio codec and method, reference is made to the accompanying drawings. These drawings show, by way of illustration, specific examples of how embodiments of the audio codec system and method may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.
An audio signal 101 denoted with time line 102 may represent an excerpt of a longer audio signal or stream, which may be a representation of time-varying physical sound features. A framing block 403 of the encoder 400 segments the audio signal into frames 120-128 for processing as indicated by the frame boundaries 103-109. The windowing block 407 multiplicatively applies the sequence of windows 111, 113, and 115 to the audio signal to produce windowed signal segments for further processing. The windows are time-aligned with the audio signal in accordance with the frame boundaries. For example, window 113 is time-aligned with the audio signal 101 such that the window 113 is centered on the frame 124 having frame boundaries 105 and 107.
The audio signal 101 may be denoted as a sequence of discrete-time samples x[t] where t is an integer time index. A windowing block audio signal value scaling function, as for example depicted by 111, may be denoted as w[n] where n is an integer time index. The windowing block scaling function may be defined in one embodiment as
for 0 ≤ n ≤ N−1 where N is an integer value representing the window time length. In another embodiment, a window may be defined as
Other embodiments may use other windowing scaling functions, provided that the windowing function satisfies certain conditions, as will be understood by those of ordinary skill in the art. See J. P. Princen, A. W. Johnson, and A. B. Bradley, "Subband/transform coding using filter bank designs based on time domain aliasing cancellation," in Proc. IEEE Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2161-2164, 1987.
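To make the windowing condition concrete, the following sketch constructs a sine window and verifies the Princen-Bradley condition that enables perfect reconstruction with 50% overlap. It is an illustrative example only; the window choice and length are assumptions, not a definition from this disclosure.

```python
import numpy as np

def sine_window(N):
    """Sine window of length N, a common choice for MDCT-based coders."""
    n = np.arange(N)
    return np.sin(np.pi / N * (n + 0.5))

N = 1024
w = sine_window(N)
# Princen-Bradley condition for perfect reconstruction with 50% overlap
# (with a symmetric window): w[n]^2 + w[n + N/2]^2 == 1 for 0 <= n < N/2.
assert np.allclose(w[:N // 2] ** 2 + w[N // 2:] ** 2, 1.0)
```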
A windowed segment may be defined as

xi[n] = wi[n] x[n + ti]   (3)
where i denotes an index for the windowed segment, wi[n] denotes the windowing function used for the segment, and ti denotes a starting time index in the audio signal for the segment. In some embodiments, the windowing scaling function may be different for different segments. In other words, different windowing time lengths and different windowing scaling functions may be used for different parts of the signal 101, for example for different frames of the signal or in some cases for different portions of the same frame.
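As a hedged illustration of equation (3), the sketch below extracts 50%-overlapping windowed segments from a signal; the hop size, window, and signal length are assumptions for the example rather than requirements of the disclosure.

```python
import numpy as np

def windowed_segment(x, w, t_i):
    """Equation (3): x_i[n] = w_i[n] * x[n + t_i]."""
    N = len(w)
    return w * x[t_i:t_i + N]

x = np.random.randn(8192)                     # placeholder audio samples
N = 1024
w = np.sin(np.pi / N * (np.arange(N) + 0.5))  # sine window from the earlier sketch
hop = N // 2                                   # 50% overlap between adjacent segments
segments = [windowed_segment(x, w, t) for t in range(0, len(x) - N + 1, hop)]
```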
In an audio coder based on a sliding-window transform, it may be beneficial to adapt the window and transform size based on the time-frequency behavior of the audio signal. As used herein, especially in the context of the MDCT, the term ‘transform size’ refers to the number of input data elements that the transform accepts; for some transforms other than the MDCT, e.g., the discrete Fourier transform (DFT), ‘transform size’ may instead refer to the number of output points (coefficients) that a transform computes. The concept of ‘transform size’ will be understood by those of ordinary skill in the related art. For tonal signals, the use of long windows (and likewise long-window frames) may improve coding efficiency. For transient signals, the use of short windows (and likewise short-window frames) may limit coding artifacts. For some signals, intermediate window sizes may provide coding advantages. Some signals may display tonal, transient, or yet other behaviors at different times throughout the signal such that the most advantageous window choice for coding may change in time. In such cases, a window-switching scheme may be used wherein windows of different sizes are applied to different segments of an audio signal that have different behaviors, for instance to different audio signal frames, and wherein transition windows are applied to change from one window size to another. In an audio coder, the selection of windows of a certain size in accordance with the audio signal behavior may improve coding performance; coding performance may be referred to as ‘coding efficiency,’ which is used herein to describe how relatively effective a certain coding scheme is at encoding audio signals. If a particular audio coder, say coder A, can encode an audio signal at a lower data rate than a different audio coder, coder B, while introducing the same or fewer artifacts (such as quantization noise or distortion) as coder B, then coder A may be said to be more efficient than coder B. In some cases, ‘efficiency’ may be used to describe the amount of information in a representation, i.e., its ‘compactness.’ For instance, if a signal representation, say representation A, can represent a signal with less data than a signal representation B but with the same or less error incurred in the representation, we may refer to representation A as being more ‘efficient’ than representation B.
The control signals provided to the windowing block 407, based upon the analysis results, may indicate a sequence of windowing operations to be applied by the windowing block 407 to a sequence of frames of audio data. The windowing block 407 produces a windowing signal waveform that includes a sequence of scaling windows. The analysis and control block 405 may cause the windowing block 407 to apply different scaling operations and different window time lengths to different audio frames, based upon different analysis results for the different audio frames, for example. Some audio frames may be scaled according to long windows. Others may be scaled according to short windows, and still others may be scaled according to transition windows, for example. In some embodiments, the control block 405 may include a transient detector 415 to determine whether an audio frame contains transient signal behavior. For example, in response to a determination that a frame includes transient signal behavior, the analysis and control block 405 may provide to the windowing block 407 control signals to indicate that a sequence of windowing operations consisting of short windows should be applied.
The windowing block 407 applies windowing functions to the audio frames to produce windowed audio segments and provides the windowed audio segments to the transform block 409. It will be appreciated that individual windowed time segments may be shorter in time duration than the frame from which they are produced; that is, a given frame may be windowed using multiple windows as illustrated by the short windows 317 of
The transform block 409 may be configured to carry out an MDCT, which may be defined mathematically as:

Xi[k] = Σ_{n=0}^{N−1} xi[n] cos[(2π/N)(n + n0)(k + 1/2)], for 0 ≤ k ≤ N/2 − 1,

where n0 = (N/2 + 1)/2
and where the values xi[n] are windowed time samples, i.e. time samples of a windowed audio segment. The values Xi[k] may be referred to generally as transform coefficients or specifically as modified discrete cosine transform (MDCT) coefficients. In accordance with this definition, the MDCT converts N time samples into N/2 transform coefficients. For the purposes of this specification, the MDCT as defined above is considered to be of size N. Conversely, an inverse modified discrete cosine transform (IMDCT), which may be performed by a decoder 1600, discussed below, may be defined as

yi[n] = Σ_{k=0}^{N/2−1} Xi[k] cos[(2π/N)(n + n0)(k + 1/2)]

where 0 ≤ n ≤ N−1. As those of ordinary skill in the art will understand, a scale factor may be associated with either the MDCT, the IMDCT, or both. In some embodiments, the forward and inverse MDCT are each scaled by a factor of √(2/N) to normalize the result of applying the forward and inverse MDCT successively. In other embodiments, a scale factor of 2/N may be applied to either the forward MDCT or the inverse MDCT. In yet other embodiments, an alternate scaling approach may be used.
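The sketch below implements the MDCT and IMDCT along the lines defined above, with the 2/N scaling applied to the inverse (one of the scaling options mentioned), and checks time-domain aliasing cancellation by overlap-add. It is an illustrative reference implementation under those assumptions, not an optimized or authoritative one.

```python
import numpy as np

def mdct(x):
    """Forward MDCT of size N: N windowed samples -> N/2 coefficients (unscaled)."""
    N = len(x)
    n0 = (N / 2 + 1) / 2
    n = np.arange(N)[:, None]
    k = np.arange(N // 2)[None, :]
    return x @ np.cos(2 * np.pi / N * (n + n0) * (k + 0.5))

def imdct(X):
    """Inverse MDCT scaled by 2/N: N/2 coefficients -> N time samples."""
    N = 2 * len(X)
    n0 = (N / 2 + 1) / 2
    n = np.arange(N)[:, None]
    k = np.arange(N // 2)[None, :]
    return (2.0 / N) * (np.cos(2 * np.pi / N * (n + n0) * (k + 0.5)) @ X)

# Time-domain aliasing cancellation: window, transform, inverse-transform,
# window again, and overlap-add with 50% overlap using a Princen-Bradley window.
N, hop = 64, 32
w = np.sin(np.pi / N * (np.arange(N) + 0.5))
x = np.random.randn(12 * hop)
y = np.zeros_like(x)
for t in range(0, len(x) - N + 1, hop):
    y[t:t + N] += w * imdct(mdct(w * x[t:t + N]))
# Samples covered by two overlapping windows are reconstructed exactly.
assert np.allclose(y[hop:-hop], x[hop:-hop])
```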
In typical embodiments, a transform operation such as an MDCT is carried out by transform block 409 for each windowed segment of the input signal 401. This sequence of transform operations converts the time-domain signal 401 into a time-frequency representation comprising MDCT coefficients corresponding to each windowed segment. The time and frequency resolution of the time-frequency representation are determined at least in part by the time length of the windowed segment, which is determined by the window size applied by the windowing block 407, and by the size of the associated transform carried out by the transform block 409 on the windowed segment. In accordance with some embodiments, the size of an MDCT is defined as the number of input samples, and one-half as many transform coefficients are generated as the number of input samples. In an alternative embodiment using other transform techniques, the input sample length (size) and the corresponding output coefficient number (size) may have a more flexible relationship. For example, a size-8 FFT may be produced based upon a length-32 signal segment.
In some embodiments, a coder 400 may be configured to select among multiple window sizes to use for different frames. The analysis and control block 405 may determine that long windows should be used for frames consisting primarily of tonal content whereas short windows should be used for frames consisting of transient content, for example. In other embodiments, the coder 400 may be configured to support a wider variety of window sizes including long windows, short windows, and windows of intermediate size. The analysis and control block 405 may be configured to select an appropriate window size for each frame based upon characteristics of the audio content (e.g., tonal content, transient content).
In some embodiments, transform size corresponds to window length. For a windowed segment corresponding to a long time-length window, for example, the resulting time-frequency representation has low time resolution but high frequency resolution. For a windowed segment corresponding to a short time-length window, for example, the resulting time-frequency representation has relatively higher time resolution but lower frequency resolution than a time-frequency representation corresponding to a long-window segment. In some cases, a frame of the signal 401 may be associated with more than one windowed segment, as illustrated by the example short windows 317 of the example frame 307 of
As will be understood by those of ordinary skill in the art, an audio signal frame may be represented as an aggregation of signal transform components, such as MDCT components, for example. This aggregation of signal transform components may be referred to as a time-frequency representation. Furthermore, each of the components in such a time-frequency representation may have specific properties of time-frequency localization. In other words, a certain component may represent characteristics of the audio signal frame which correspond to a certain time span and to a certain frequency range. The relative time span for a signal transform component may be referred to as the component's time resolution. The relative frequency range for a signal transform component may be referred to as the signal transform component's frequency resolution. The relative time span and frequency range may be jointly referred to as the component's time-frequency resolution. As will also be understood by those of ordinary skill in the art, a representation of an audio signal frame may be described as having time-frequency resolution characteristics corresponding to the components in the representation. This may be referred to as the audio signal frame's time-frequency resolution. As will also be understood by those of ordinary skill in the art, a component refers to the function part of the transform, such as a basis vector. A coefficient refers to the weight of that component in a time-frequency representation of a signal. The components of a transform are the functions to which the coefficients correspond. The components are static. The coefficients describe how much of each component is present in the signal.
As will be understood by those of ordinary skill in the art, a time-frequency transform can be expressed graphically as a tiling of a time-frequency plane. The time-frequency representation corresponding to a sequence of windows and associated transforms can likewise be expressed graphically as a tiling of a time-frequency plane. As used herein the term time-frequency tile (hereinafter, ‘tile’) of an audio signal refers to a “box” which depicts a particular localized time-frequency region of the audio signal, i.e. a particular region of the time-frequency plane centered at a certain time and frequency and having a certain time resolution and frequency resolution, where the time resolution is indicated by the width of the tile in the time dimension (usually the horizontal axis) and the frequency resolution is indicated by the width of the tile in the frequency dimension (usually the vertical axis). A tile of an audio signal may represent a signal transform component e.g., an MDCT component. A tile of a time-frequency representation of an audio signal may be associated with a frequency band of the audio signal. Different frequency bands of a time-frequency representation of an audio signal may comprise similarly or differently shaped tiles i.e. tiles with the same or different time-frequency resolutions. As used herein a time-frequency tiling (hereinafter ‘tiling’) refers to a combination of tiles of a time-frequency representation, for example of an audio signal. A tiling may be associated with a frequency band of an audio signal. Different frequency bands of an audio signal may have the same or different tilings i.e. the same or different combinations of time-frequency resolutions. A tiling of an audio signal may correspond to a combination of signal transform components, e.g., a combination of MDCT components.
Thus, each tile in the graphical depictions described in this description indicates a signal transform component and its corresponding time resolution and frequency resolution for that region of the time-frequency representation. Each component in a time-frequency representation of an audio signal may have a corresponding coefficient value; analogously, each tile in a time-frequency tiling of an audio signal may have a corresponding coefficient value. A collection of tiles associated with a frame may be represented as a vector comprising a collection of signal transform coefficients corresponding to components in the time-frequency representation of the signal within the frame. Examples of window sequences and corresponding time-frequency tilings are depicted in
Referring to
Tile frame 532 represents the time-frequency resolution of a time-frequency representation of audio signal frame 506. Tile frame 534 represents the time-frequency resolution of a time-frequency representation of audio signal frame 508. Tile frame 536 represents the time-frequency resolution of a time-frequency representation of audio signal frame 510. Tile dimensions within tile frames indicate time-frequency resolution. As explained above, tile width in the (vertical) frequency direction is indicative of frequency resolution. The narrower a tile is in the (vertical) frequency direction, the greater the number of tiles aligned vertically, which is indicative of higher frequency resolution. Tile width in the (horizontal) time direction is indicative of time resolution. The narrower a tile is in the (horizontal) time direction, the greater the number of tiles aligned horizontally, which is indicative of higher time resolution. Each of the tile frames 530-536 includes a plurality of individual tiles that are narrow along the (vertical) frequency axis, indicating a high frequency resolution. The individual tiles of tile frames 530-536 are wide along the (horizontal) time axis, indicating a low time resolution. Since all of the tile frames 530-536 have identical tiles that are narrow vertically and wide horizontally, all of the corresponding audio signal frames 504-510 represented by the tile frames 530-536 have the same time-frequency resolution as shown.
Referring to
Referring to
In some embodiments, the coder 400 may be configured to use a multiplicity of window sizes which are not related by powers of two. In some embodiments, it may be preferred to use window sizes related by powers of two as in the example in
The time-frequency tile frames depicted in
As depicted in
Modification of Time-Frequency Resolution of an Audio Signal Frame
As will be understood by those of ordinary skill in the art, the time-frequency resolution of an audio signal representation may be modified by applying a time-frequency transformation to the time-frequency representation of the signal. The modification of the time-frequency resolution of an audio signal may be visualized using time-frequency tiles.
Tile frame 901 represents an initial time-frequency tile frame consisting of tiles 902 with higher time resolution and lower frequency resolution. For the purposes of explanation, the corresponding signal representation may be expressed as a vector (not shown) consisting of four elements. In one embodiment, the resolution of the time-frequency representation may be modified by a time-frequency transformation process 903 to yield a time-frequency tile frame 905 consisting of tiles 904 with lower time resolution and higher frequency resolution. In some embodiments, this transformation may be realized by a matrix multiplication of the initial signal vector. Denoting the initial representation by the vector X and the modified representation by the vector Y, the time-frequency transformation process 903 may be realized in one embodiment as
where the matrix is based in part on a Haar analysis filter bank, which may be implemented using matrix transformations, as will be understood by those of ordinary skill in the art. In other embodiments, alternate time-frequency transformations such as a Walsh-Hadamard analysis filter bank, which may be implemented using matrix transformations, may be used. In some embodiments, the dimensions and structure of the transformation may be different depending on the desired time-frequency resolution modification. As those of ordinary skill in the art will understand, in some embodiments alternate transformations may be constructed based in part on iterating a two-channel Haar filter bank structure.
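As a hedged sketch of such a modification, a two-channel Haar analysis applied to pairs of time-adjacent coefficients within a band trades time resolution for frequency resolution. The 4×4 matrix and the coefficient ordering below are illustrative assumptions, not the matrix of any particular embodiment.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
# Block-diagonal two-channel Haar analysis: each pair of time-adjacent coefficients
# in a band is replaced by its scaled sum and difference.
H_analysis = np.array([[s,  s, 0., 0.],
                       [s, -s, 0., 0.],
                       [0., 0., s,  s],
                       [0., 0., s, -s]])

X = np.array([1.0, 0.9, -0.2, -0.1])   # band vector with higher time resolution
Y = H_analysis @ X                      # modified vector with higher frequency resolution
```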
As another example, an initial time-frequency tile frame 907 represents a simple time-frequency tiling consisting of tiles 906 with higher frequency resolution and lower time resolution. For the purposes of explanation, the corresponding signal representation may be expressed as a vector (not shown) consisting of four elements. In one embodiment, the resolution of the tile frame 907 may be modified by a time-frequency transformation process 909 to yield a modified time-frequency tile frame 911 consisting of tiles 910 with higher time resolution and lower frequency resolution. As above, this transformation may be realized by a matrix multiplication of the initial signal vector. Denoting again the initial representation by the vector X and the modified representation by the vector Y, the time-frequency transformation 909 may be realized in one embodiment as
where the matrix is based in part on a Haar synthesis filter bank as will be understood by those of ordinary skill in the art. In other embodiments, alternate time-frequency transformations such as a Walsh-Hadamard synthesis filter bank, which may be implemented using matrix transformations, may be used. In some embodiments, the dimensions and structure of the time-frequency transformation may be different depending on the desired time-frequency resolution modification. As those of ordinary skill in the art will understand, in some embodiments alternate time-frequency transformations may be constructed based in part on iterating a two-channel Haar filter bank structure.
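Continuing the illustrative sketch above, the synthesis direction trades frequency resolution back for time resolution; because the assumed analysis matrix is orthonormal, its transpose serves as the synthesis matrix and inverts the modification exactly.

```python
H_synthesis = H_analysis.T        # orthonormal analysis, so the transpose is the inverse
X_restored = H_synthesis @ Y      # back to the higher-time-resolution representation
assert np.allclose(X_restored, X)
```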
More particularly, the transform block 409 of the encoder 400 of
The time-frequency transformation modification block 1007 may perform time-frequency transformations on the frequency band groups in a manner generally described above with reference to
In some embodiments, the audio coder 400 may be configured with a control mechanism to determine an adaptive time-frequency resolution for the encoder processing. In such embodiments, the analysis and control block 405 may determine windowing functions for windowing block 407, transform sizes for time-frequency transform block 1003, and time-frequency transformations for time-frequency transformation modification block 1007. As explained with reference to
The analysis and control block 405 performs multiple different time-frequency transforms with different time-frequency resolutions on the analysis frame 1021. More specifically, first, second, third and fourth time-frequency transform analysis blocks 1023, 1025, 1027 and 1029 perform different respective first, second, third and fourth time-frequency transformations of the analysis frame 1021. The illustrative drawing of
First, second, third and fourth frequency band grouping blocks 1033-1039 may arrange the time-frequency signal transform coefficients (derived respectively by blocks 1023-1029), which may be MDCT coefficients, into groups according to frequency bands. The frequency band grouping may be represented as a vector arrangement of the transform coefficients organized in a prescribed fashion. For example, when grouping coefficients for a single window, the coefficients may be arranged in frequency order. When grouping coefficients for more than one window (e.g., when there is more than one set of signal transform coefficients computed—one set for each window), the multiple sets of transform outputs may be rearranged into a vector with like frequencies adjacent to each other in the vector and arranged in time order (in the order of the sequence of windows to which they correspond). While
The frequency-band groupings of time-frequency transform coefficients corresponding to different time-frequency resolutions may be provided to the analysis block 1043 configured according to a time-frequency resolution analysis process. In some embodiments, the analysis process may only analyze the coefficients corresponding to a single analysis frame. In some embodiments, the analysis process may analyze the coefficients corresponding to a current analysis frame as well as the coefficients of preceding frames. In some embodiments, the analysis process may employ an across-time trellis data structure and/or an across-frequency trellis data structure, as described below, to analyze coefficients across multiple frames. The analysis and control block 405 may provide control information for processing of an encoding frame. In some embodiments, the control information may include windowing functions for the windowing block 407, transform sizes (e.g. MDCT sizes) for block 1003 of transform block 409 of the encoder 400, and local time-frequency transformations for modification block 1007 of transform block 409 of the encoder 400. In some embodiments, the control information may be provided to block 411 for inclusion in the encoder output bitstream 413.
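The regrouping described above (like frequencies adjacent, arranged in time order) might be sketched as follows; the band edges, array shapes, and names are illustrative assumptions rather than parameters of any particular embodiment.

```python
import numpy as np

def group_by_band(coeffs, band_edges):
    """coeffs: (num_windows, bins) MDCT coefficients, windows in time order.
    Returns one vector per band with like frequencies adjacent, in time order."""
    groups = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Transpose so all time instances of bin lo come first, then bin lo+1, ...
        groups.append(coeffs[:, lo:hi].T.ravel())
    return groups

coeffs = np.random.randn(4, 32)                # e.g., four short windows, 32 bins each
bands = group_by_band(coeffs, [0, 4, 12, 32])  # three example frequency bands
```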
Similarly, the second time-frequency transform analysis block 1025 performs a second time-frequency transform of the analysis frame 1021 across an entire frequency spectrum of interest (F) to produce a second time-frequency transform frame 1052 that includes a second set of signal transform coefficients (e.g., MDCT coefficients) {CT-F2}. The second time-frequency transform may, for example, correspond to the time-frequency resolution of tiles 742 of frame 732 of
Likewise, the third time-frequency transform analysis block 1027 performs a third time-frequency transform to produce a third time-frequency transform frame 1054 that includes a third set of signal transform components {CT-F3}. The third time-frequency transform may, for example, correspond to the time-frequency resolution of tiles 744 of frame 734 of
Finally, the fourth time-frequency transform analysis block 1029 similarly performs a fourth time-frequency transform to produce a fourth time-frequency transform frame 1056 that includes a fourth set of signal transform components {CT-F4}. The fourth time-frequency transform may, for example, correspond to the time-frequency resolution of tiles 746 of frame 736 of
Thus, it will be appreciated that in the example embodiment of
Operation 1101 receives a received frame 1186. Operation 1103 buffers the received frame 1186. The framing block 403 may buffer a set of frames that includes the encoding frame 1182, the analysis frame 1021, the received frame 1186, and any intermediate buffered frames 1188 received in a sequence between receipt of the encoding frame 1182 and receipt of the received frame 1186. Although the example in
Operation 1105 employs the multiple time-frequency transform analysis blocks 1023, 1025, 1027 and 1029 to compute multiple different time-frequency transforms (having different time-frequency resolutions) of the analysis frame 1021 as explained above, for example. In some embodiments, the operation of a time-frequency transform block such as 1023, 1025, 1027, or 1029 may comprise applying a sequence of windows and correspondingly sized MDCTs across the analysis frame 1021, where the size of the windows in the sequence of windows may be chosen from a predetermined set of window sizes. Each of the time-frequency transform blocks may have a different corresponding window size chosen from the predetermined set of window sizes. The predetermined set of window sizes may for example correspond to short windows, intermediate windows, and long windows. In other embodiments, alternate transforms may be computed in transform blocks 1023-1029 whose time-frequency resolutions correspond to these various windowed MDCTs.
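As an illustrative sketch of this multi-resolution analysis (the candidate window sizes and the helper names are assumptions, and mdct() is the helper from the earlier MDCT sketch), one analysis of the frame can be computed per candidate window size by tiling the frame with correspondingly sized windows and MDCTs.

```python
import numpy as np
# mdct() as defined in the earlier MDCT sketch is assumed to be available here.

def analyze_at_resolutions(frame, window_sizes=(256, 512, 1024, 2048)):
    """Compute one windowed-MDCT analysis of the frame per candidate window size."""
    analyses = {}
    for N in window_sizes:
        w = np.sin(np.pi / N * (np.arange(N) + 0.5))
        hop = N // 2
        coeffs = [mdct(w * frame[t:t + N])
                  for t in range(0, len(frame) - N + 1, hop)]
        analyses[N] = np.array(coeffs)      # shape: (num_windows, N // 2)
    return analyses
```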
Operation 1107 may configure the analysis block 1043 of
Operation 1113 communicates the window size to the windowing block 407 and the bitstream 413. Operation 1115 determines the optimal local transformations based on the window size choice and the optimal trellis path. Operation 1117 communicates the transform size and the optimal local transformations for the encoding frame 1182 to the transform block 409 and the bitstream 413.
Thus, it will be appreciated that an analysis frame 1021 is a frame on which analysis is currently being performed. A received frame 1186 is queued for analysis and encoding. An encoding frame is a frame 1182 on which encoding currently is being performed that may have been received before the current analysis frame. In some embodiments, there may be one or more additional intermediate buffered frames 1188.
In operation 1105, one or more sets of time-frequency tile frame transform coefficients are computed and grouped into frequency bands by blocks 1023-1029 and 1033, 1035, 1037, 1039 of the control block 405 of
The determined optimal transformation may be provided by the control module 405 to the processing path that includes blocks 407 and 409. Transforms such as a Walsh-Hadamard transform or a Haar transform, as determined by control block 405, may be applied by modification block 1007 of the transform block 409 of
In operation 1107, the time-frequency resolution tile frame data generated in operation 1105 is analyzed in some embodiments, using cost functions associated with a trellis algorithm to determine the efficiency of each possible time-frequency resolution for coding the analysis frame. In some embodiments, operation 1107 corresponds to computing cost functions associated with a trellis structure. A cost function computed for a path through a trellis structure may indicate the coding effectiveness of the path (i.e. the coding cost, such as a metric that encapsulates how many bits would be needed to encode that representation). In some embodiments, the analysis may be carried out in conjunction with transform data from previous audio signal frames. In operation 1109, an optimal set of time-frequency tile resolutions for an encoding frame is determined based upon results of the analysis in operation 1107. In other words, in some embodiments, in operation 1109, an optimal path through the trellis structure is identified. All path costs are evaluated and a path with the optimal cost is selected. An optimal time-frequency tiling of a current encoding frame may be determined based upon an optimal path identified by the trellis analysis. In some embodiments, an optimal time-frequency tiling for a signal frame may be characterized by a higher degree of sparsity of the coefficients in the time-frequency representation of the signal frame than for any other potential tiling of that frame considered in the analysis process. In some embodiments, the optimality of a time-frequency tiling for a signal frame may be based in part on the cost of encoding the corresponding time-frequency representation of the frame. In some embodiments, an optimal tiling for a given signal may yield improved coding efficiency with respect to a suboptimal tiling, meaning that the signal may be encoded with the optimal tiling at a lower data rate but the same error or artifact level as a suboptimal tiling or that the signal may be encoded with the optimal tiling at a lower error or artifact level but the same data rate as with a suboptimal tiling. Those of ordinary skill in the art will understand that the relative performance of encoders may be assessed using rate-distortion considerations.
In some embodiments, the encoding frame 1182 may be the same frame as the analysis frame 1021. In other embodiments, the encoding frame 1182 may precede the analysis frame 1021 in time. In some embodiments, the encoding frame 1182 may immediately precede the analysis frame 1021 in time with no intermediate buffered frames 1188. In some embodiments, the analysis and control block 405 may process multiple frames to determine the results for the encoding frame 1182; for example, the analysis may process one or more of the frames, some of which may follow the encoding frame 1182 in time, such as the encoding frame 1182, buffered frames 1188 (if any) between the encoding frame 1182 and the analysis frame 1021, and the analysis frame 1021. For example, if the encoding frame 1182 is before the analysis frame in time, then analysis and control block 405 can use the "future" information to process an analysis frame 1021 currently being analyzed to make final decisions for the encoding frame. This "lookahead" ability helps improve the decisions made for the encoding frame. For example, better encoding may be achieved for an encoding frame 1182 because of new information that the trellis navigation may incorporate from an analysis frame 1021. In general, lookahead benefits apply to encoding decisions made across multiple frames such as those illustrated in
In operation 1111, the analysis and control block 405 determines an optimal window size for the encoding frame 1182 at least in part based on the optimal time-frequency tile frame transform determined for the frame in operation 1109. The optimal path (or paths) for the encoding frame may indicate the best window size to use for the encoding frame 1182. The window size may be determined based on the path nodes of the optimal path through the trellis structure. For example, in some embodiments, the window size may be selected as the mean of the window sizes indicated by the path nodes of the optimal path through the trellis for the frame. In operation 1113, the analysis and control block 405 sends one or more signals to the windowing block 407, the transform block 409 and the data reduction and bitstream formatting block 411, to indicate the determined optimal window size. The data reduction and bitstream formatting block 411 encodes the window size into the bitstream for use by a decoder (not shown), for example. In operation 1115, optimal local time-frequency transformations for the encoding frame are determined at least in part based on the optimal time-frequency tile frame for the frame determined in step 1109. The optimal local time-frequency transforms also may be determined in part based on the optimal window size determined for the frame. More particularly, in accordance with some embodiments for example, in each frequency band, a difference is determined between the optimal time-frequency resolution for the band (indicated by the optimal trellis path) and the resolution provided by the window choice. That difference determines a local time-frequency transformation for that band in that frame. It will be appreciated that a single window size ordinarily must be selected to perform a time-frequency transform of an encoding frame 1182. The window size may be selected to provide a best overall match to the different time-frequency resolutions determined for the different frequency bands within the encoding frame 1182 based upon the trellis analysis. However, the selected window may not be an optimal match to time-frequency resolutions determined based upon the trellis analysis for one or more frequency bands. Such a window mismatch may result in inefficient coding or distortion of information within certain frequency bands. The local transformations according to the process of
In operation 1117, the optimal set of time-frequency transformations is provided to the transform block 409 and the data reduction and bitstream formatting block 411, which encodes the set of time-frequency transformations in the bitstream 413 so that a decoder can carry out the local inverse transformations.
In some embodiments, the time-frequency transformations may be encoded differentially with respect to transformations in adjacent frequency bands. In some embodiments, the actual transformation used (the matrix that is applied to the frequency band data) may be indicated in the bitstream. Each transformation may be indicated using an index into a set of possible transformations. The indices may then be encoded differentially instead of based upon their actual values. In some embodiments, the time-frequency transformations may be encoded differentially with respect to transformations in adjacent frames. In some embodiments, the data reduction and bitstream formatting block 411 may, for each frame, encode the base window size, the time-frequency resolutions for each band of the frame, and the transform coefficients for the frame into the bitstream for use by a decoder (not shown), for example. In some embodiments, one or more of the base window size, the time-frequency resolutions for each band, and the transform coefficients may be encoded differentially.
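As a hedged sketch of the differential signaling just described (the index values are illustrative), per-band transformation indices can be encoded as differences between adjacent bands and recovered by a decoder with a running sum.

```python
def differential_encode(indices):
    """Difference between each band's index and the previous band's (first vs. 0)."""
    prev, deltas = 0, []
    for idx in indices:
        deltas.append(idx - prev)
        prev = idx
    return deltas

def differential_decode(deltas):
    """Running sum restores the original indices."""
    indices, prev = [], 0
    for d in deltas:
        prev += d
        indices.append(prev)
    return indices

assert differential_decode(differential_encode([3, 3, 2, 0, 0])) == [3, 3, 2, 0, 0]
```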
As discussed with reference to
FIGS. 11C1-11C4 are illustrative functional block diagrams representing a sequence of frames flowing through a pipeline 1150 within the analysis block 405 and illustrating use of analysis results, produced during the flow, by the windowing block 407, transform block 409 and data reduction and bitstream formatting block 411 of the encoder 400 of
Referring to FIG. 11C1, at a first time interval, analysis data for a current analysis frame F4 is stored at the analysis frame storage stage 1152, analysis data for a current second buffered frame F3 is stored at the second buffered frame storage stage 1154, analysis data for a current first buffered frame F2 is stored at the first buffered frame storage stage 1156, and analysis data for a current encoding frame F1 is stored at the encoding frame storage stage 1158. As explained in detail below, in some embodiments, the analysis block 1043 is configured to perform a trellis process to determine an optimal combination of time-frequency resolutions for multiple frequency bands of the current encoding frame F1. In some embodiments, the analysis block 1043 is configured to select a single window size for use by the windowing block 407 in production of an encoded frame F1C corresponding to the current encoding frame F1 in the analysis pipeline 1150. The analysis block produces the first, second and third control signals C407, C1003 and C1005 based upon the selected window size. The selected window size may not match an optimal time-frequency transformation determined for one or more frequency bands within the current encoding frame F1. Accordingly, in some embodiments, the analysis block 1043 produces the fourth time-frequency modification signal C1007 for use by the time-frequency transformation modification block 1007 to modify time-frequency resolutions within frequency bands of the current encoding frame F1 for which the optimal time-frequency resolutions determined by the analysis block 1043 are not matched to the selected window size. The analysis block 1043 produces the fifth control signal C411 for use by the data reduction and bitstream formatting block 411 to inform the decoder 1600 of the determined encoding of the current encoding frame, which may include an indication of the time-frequency resolutions used in the frequency bands of the frame.
During each time interval, an optimal time-frequency resolution for a current encoding frame and coding information for use by the decoder 1600 to decode the corresponding time-frequency representation of the encoding frame are produced based upon frames currently contained within the pipeline. More particularly, referring to FIGS. 11C1-11C4, at successive time intervals, analysis data for a new current analysis frame shifts into the pipeline 1150 and the analysis data for the previous frames shifts (left), such that the analysis data for a previous encoding frame shifts out. Referring to
Referring to FIG. 11C2, F5 is the current analysis frame, F4 is the current second buffered frame, F3 is the current first buffered frame, F2 is the current encoding frame, and control signals 1160 are produced that are used to generate a current encoding frame version F2C. Referring to FIG. 11C3, F6 is the current analysis frame, F5 is the current second buffered frame, F4 is the current first buffered frame, F3 is the current encoding frame, and control signals 1160 are produced that are used to generate a current encoding frame version F3C. Referring to FIG. 11C4, F7 is the current analysis frame, F6 is the current second buffered frame, F5 is the current first buffered frame, F4 is the current encoding frame, and control signals 1160 are produced that are used to generate a current encoding frame version F4C.
It will be appreciated that the encoder 400 may produce a sequence of encoding frame versions (F1C, F2C, F3C, F4C) based upon a corresponding sequence of current encoding frames (F1, F2, F3, F4). The encoding frame versions are invertible based at least in part upon frame size information and time-frequency modification information, for example. In particular, for example, a window may be selected to produce an encoding frame that does not match the optimal determined time-frequency resolution within one or more frequency bands within the current encoding frame in the pipeline 1150. The analysis block may determine time-frequency resolution modification transformations for the one or more mismatched frequency bands. The modification signal information C1007 may be used to communicate the selected adjustment transformation such that appropriate inverse modification transformations may be carried out in the decoder according to the process described above with reference to
Trellis Processing to Determine Optimal Time-Frequency Resolutions for Multiple Frequency Bands
In some embodiments, analysis and control block 405 may determine an optimal window size and a set of optimal time-frequency resolution transformations for an encoding frame of an audio signal using a trellis structure configured as in
In some embodiments, a node in the trellis structure of
Referring to
Thus, in some embodiments, a node may be associated with a state that includes transform coefficients corresponding to the node's frequency band and time-frequency resolution. For example, in some embodiments node 1317 may be associated with a second frequency band (in accordance with column 1311) and a lowest frequency resolution (in accordance with row 1301). In some embodiments, the transform coefficients may correspond to MDCT coefficients corresponding to the node's associated frequency band and resolution. MDCT coefficients may be computed for each analysis frame for each of a set of possible window sizes and corresponding MDCT transform sizes. In some embodiments, the MDCT coefficients may be produced according to the transform process of
In some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the 1-norm of the transform coefficients of the node state. In some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the number of transform coefficients having a significant absolute value, for instance an absolute value above a certain threshold. In some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the entropy of the transform coefficients. It will be appreciated that in general, the more sparse the transform coefficients corresponding to the time-frequency resolution associated with a node, the lower the cost associated with the node. In some embodiments, a transition path cost associated with a transition path between nodes may be a measure of the data cost for encoding a change between the time-frequency resolutions associated with the nodes connected by the transition path. More specifically, in some embodiments, a transition path cost may be a function in part of the time-frequency resolution difference between the nodes connected by the transition path. For example, a transition path cost may be a function in part of the data required for encoding the difference between integer values corresponding to the time-frequency resolution of the states of the connected nodes. Those of ordinary skill in the art will understand that the trellis structure may be configured to direct a dynamic trellis-based optimization process to use other cost functions than those disclosed.
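The sketch below illustrates one way such an across-frequency trellis search could be implemented; the specific cost weights, the sparsity threshold, and the lambda trade-off are illustrative assumptions, not the cost functions of any particular embodiment. Each band selects one candidate resolution by minimizing an accumulated state cost plus transition cost via dynamic programming.

```python
import numpy as np

def state_cost(coeffs, threshold=1e-3):
    """Sparsity-based node cost: 1-norm plus count of significant coefficients."""
    c = np.abs(np.asarray(coeffs))
    return float(np.sum(c) + np.count_nonzero(c > threshold))

def transition_cost(res_a, res_b):
    """Cost of signaling a resolution change between adjacent bands."""
    return abs(res_a - res_b)

def best_resolution_path(band_coeffs, lam=1.0):
    """band_coeffs[b][r]: coefficients of band b under candidate resolution r.
    Returns the optimal resolution index per band (Viterbi-style search)."""
    n_bands, n_res = len(band_coeffs), len(band_coeffs[0])
    cost = np.full((n_bands, n_res), np.inf)
    back = np.zeros((n_bands, n_res), dtype=int)
    cost[0] = [state_cost(c) for c in band_coeffs[0]]
    for b in range(1, n_bands):
        for r in range(n_res):
            prev = cost[b - 1] + lam * np.array([transition_cost(p, r)
                                                 for p in range(n_res)])
            back[b, r] = int(np.argmin(prev))
            cost[b, r] = prev[back[b, r]] + state_cost(band_coeffs[b][r])
    # Trace back the optimal sequence of resolutions across frequency.
    path = [int(np.argmin(cost[-1]))]
    for b in range(n_bands - 1, 0, -1):
        path.append(int(back[b, path[-1]]))
    return path[::-1]

# Example: 5 bands, 3 candidate resolutions, random placeholder coefficients.
rng = np.random.default_rng(0)
bands = [[rng.standard_normal(8) for _ in range(3)] for _ in range(5)]
print(best_resolution_path(bands))
```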
FIG. 13B1 is an illustrative drawing representing an example first optimal transition sequence across frequency through the trellis structure of
It will be appreciated that for the example trellis processing of FIG. 13B1 and FIG. 13C1, since there is no trellis processing across time in the trellis, there is no need or benefit from extra lookahead. The trellis analysis is run on an analysis frame, which in some embodiments may be the same frame in time as the encoding frame. In other embodiments, the analysis frame may be the next frame in time after the encoding frame. In other embodiments, there may be one or more buffered frames between the analysis frame and the encoding frame. The trellis analysis for the analysis frame may indicate how to complete the windowing of the encoding frame prior to transformation. In some embodiments it may indicate what window shape to use to conclude windowing the encoding frame in preparation for transforming the encoding frame and in preparation for a subsequent processing cycle wherein the present analysis frame becomes the new encoding frame.
FIG. 13C1 is an illustrative drawing representing an example second optimal transition sequence across frequency through the trellis structure of
In some embodiments, analysis and control block 405 is configured to use the trellis structure of
In some embodiments, the analysis and control block 405 may be configured to use additional enumerations; for example, a +2 may indicate a specific increase in frequency resolution greater than that enumerated by +1. In some embodiments, an enumeration of a time-frequency resolution change may correspond to the number of rows in the trellis spanned by the corresponding transition path of an optimal transition sequence. In some embodiments, the control block 405 may be configured to use enumerations to control the transform modification block 1007. In some embodiments, the enumeration may be encoded into the bitstream 413 by the data reduction and bitstream formatting block 411 for use by a decoder (not shown).
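As a hedged sketch of how such an enumeration could be derived (the log2 resolution scale, base window size, and names are illustrative assumptions), the signed difference between each band's optimal resolution from the trellis path and the resolution implied by the selected window gives the per-band modification: positive values call for additional frequency resolution (analysis stages), negative values for additional time resolution (synthesis stages), and zero means the window already matches.

```python
import math

def local_modifications(optimal_res_per_band, window_size, base_size=256):
    """Signed resolution difference per band relative to the selected window."""
    window_res = int(math.log2(window_size // base_size))   # 0 = shortest window
    return [band_res - window_res for band_res in optimal_res_per_band]

# Example: per-band optimal resolutions from the trellis vs. a 1024-sample window.
mods = local_modifications([2, 2, 1, 0, 0], window_size=1024)   # -> [0, 0, -1, -2, -2]
```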
In some embodiments, the analysis block 1043 of the analysis and control block 405 may be configured to determine an optimal window size and a set of optimal time-frequency resolution modification transformations for an audio signal using a trellis structure configured as in
In some embodiments the first frame may be an encoding frame, the second and third frames may be buffered frames and the fourth frame may be an analysis frame. Referring to
In some embodiments, a node in the trellis structure of
In some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the 1-norm of the transform coefficients of the node state. As explained above, in some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the number of transform coefficients having a significant absolute value, for instance an absolute value above a certain threshold. In some embodiments, a state cost of a node state in terms of transform coefficient sparsity may be a function in part of the entropy of the transform coefficients. It will be appreciated that in general, the more sparse the transform coefficients corresponding to the time-frequency resolution associated with a node, the lower the cost associated with the node. Moreover, as explained above, in some embodiments, a transition cost associated with a transition path between nodes may be a measure of the data cost for encoding a change in the time-frequency resolutions associated with the nodes connected by the transition path. More specifically, in some embodiments, a transition path cost may be a function in part of the time-frequency resolution difference between the nodes connected by the transition path. For example, a transition path cost may be a function in part of the data required for encoding the difference between integer values corresponding to the time-frequency resolution of the states of the connected nodes. Those of ordinary skill in the art will understand that the trellis structure may be configured to direct a dynamic trellis-based optimization process to use other cost functions than those disclosed.
Thus, for lookahead-based processing using a trellis decoder, for example, an optimal path may be computed up to the current analysis frame. Nodes on that optimal path from the past (e.g., three frames back) may then be used for the encoding. Referring to
In embodiments in accordance with
Example of Modification of Signal Transform Time-Frequency Resolution within a Frequency Band of a Frame Due to Selection of Mismatched Window Size

Referring again to
The machine 1700 can comprise, but is not limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system or system component, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web appliance, a network router, a network switch, a network bridge, a headphone driver, or any machine capable of executing the instructions 1716, sequentially or otherwise, that specify actions to be taken by the machine 1700. Further, while only a single machine 1700 is illustrated, the term “machine” shall also be taken to include a collection of machines 1700 that individually or jointly execute the instructions 1716 to perform any one or more of the methodologies discussed herein.
The machine 1700 can include or use processors 1710, such as including an audio processor circuit, non-transitory memory/storage 1730, and I/O components 1750, which can be configured to communicate with each other such as via a bus 1702. In an example embodiment, the processors 1710 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) can include, for example, a circuit such as a processor 1712 and a processor 1714 that may execute the instructions 1716. The term “processor” is intended to include a multi-core processor 1712, 1714 that can comprise two or more independent processors 1712, 1714 (sometimes referred to as “cores”) that may execute the instructions 1716 contemporaneously. Although
The memory/storage 1730 can include a memory 1732, such as a main memory circuit, or other memory storage circuit, and a storage unit 1736, both accessible to the processors 1710 such as via the bus 1702. The storage unit 1736 and memory 1732 store the instructions 1716 embodying any one or more of the methodologies or functions described herein. The instructions 1716 may also reside, completely or partially, within the memory 1732, within the storage unit 1736, within at least one of the processors 1710 (e.g., within the cache memory of processor 1712, 1714), or any suitable combination thereof, during execution thereof by the machine 1700. Accordingly, the memory 1732, the storage unit 1736, and the memory of the processors 1710 are examples of machine-readable media.
As used herein, “machine-readable medium” means a device able to store the instructions 1716 and data temporarily or permanently and may include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., electrically erasable programmable read-only memory (EEPROM)), and/or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1716. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., instructions 1716) for execution by a machine (e.g., machine 1700), such that the instructions 1716, when executed by one or more processors of the machine 1700 (e.g., processors 1710), cause the machine 1700 to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes signals per se.
The I/O components 1750 may include a variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1750 that are included in a particular machine 1700 will depend on the type of machine 1700. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1750 may include many other components that are not shown here.
In further example embodiments, the I/O components 1750 can include biometric components 1756, motion components 1758, environmental components 1760, or position components 1762, among a wide array of other components. For example, the biometric components 1756 can include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram based identification), and the like, such as can influence the inclusion, use, or selection of a listener-specific or environment-specific impulse response or HRTF, for example. In an example, the biometric components 1756 can include one or more sensors configured to sense or provide information about a detected location of the listener in an environment. The motion components 1758 can include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth, such as can be used to track changes in the location of the listener. The environmental components 1760 can include, for example, illumination sensor components (e.g., photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., barometer), acoustic sensor components (e.g., one or more microphones that detect reverberation decay times, such as for one or more frequencies or frequency bands), proximity sensor or room volume sensing components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1762 can include location sensor components (e.g., a Global Position System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.
Communication can be implemented using a wide variety of technologies. The I/O components 1750 can include communication components 1764 operable to couple the machine 1700 to a network 1780 or devices 1770 via a coupling 1782 and a coupling 1772 respectively. For example, the communication components 1764 can include a network interface component or other suitable device to interface with the network 1780. In further examples, the communication components 1764 can include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1770 can be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a USB).
Moreover, the communication components 1764 can detect identifiers or include components operable to detect identifiers. For example, the communication components 1764 can include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar code, multi-dimensional bar codes such as Quick Response (QR) code, Aztec code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar code, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information can be derived via the communication components 1764, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth. Such identifiers can be used to determine information about one or more of a reference or local impulse response, reference or local environment characteristic, or a listener-specific characteristic.
In various example embodiments, one or more portions of the network 1780 can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, the network 1780 or a portion of the network 1780 can include a wireless or cellular network and the coupling 1782 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1782 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1×RTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, Third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standard, others defined by various standard-setting organizations, other long range protocols, or other data transfer technology. In an example, such a wireless communication protocol or network can be configured to transmit headphone audio signals from a centralized processor or machine to a headphone device in use by a listener.
The instructions 1716 can be transmitted or received over the network 1780 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1764) and using any one of a number of well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, the instructions 1716 can be transmitted or received using a transmission medium via the coupling 1772 (e.g., a peer-to-peer coupling) to the devices 1770. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1716 for execution by the machine 1700, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
Example 1 can include a method of encoding an audio signal comprising: receiving the audio signal frame (frame); applying multiple different time-frequency transforms to the frame across a frequency spectrum to produce multiple transforms of the frame, each transform having a corresponding time-frequency resolution across the frequency spectrum; computing measures of coding efficiency for multiple frequency bands within the frequency spectrum, for multiple time-frequency resolutions corresponding to the multiple transforms; selecting a combination of time-frequency resolutions to represent the frame at each of the multiple frequency bands within the frequency spectrum, based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size for the frame, based at least in part upon the selected combination of time-frequency resolutions; determining a modification transformation for at least a one of the frequency bands based at least in part upon the selected combination of time-frequency resolutions and the determined window size; windowing the frame using the determined window size to produce a windowed frame; transforming the windowed frame using the determined transform size to produce a transform of the windowed frame that has a corresponding time-frequency resolution at each of the multiple frequency bands of the frequency spectrum; modifying a time-frequency resolution within at least one frequency band of the transform of the windowed frame based at least in part upon the determined modification transformation.
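For readers who prefer pseudocode, the following is a minimal sketch of the flow in Example 1, assuming a block DCT-IV as a stand-in for the MDCT, sparsity as a toy coding-efficiency measure, and a majority vote over bands to pick the frame-wide window size. All function names and the band-splitting scheme are illustrative assumptions, not the claimed method.

```python
# Minimal sketch of the Example 1 encoding flow (illustrative assumptions only).
import numpy as np

def block_dct(frame, block_size):
    """Orthonormal DCT-IV applied block-by-block at one time-frequency resolution."""
    n = block_size
    k = np.arange(n)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi / n * (k[:, None] + 0.5) * (k[None, :] + 0.5))
    return frame.reshape(-1, n) @ basis          # shape: (num_blocks, block_size)

def band_slices(num_bins, num_bands):
    """Split each transform's spectrum into equal-width frequency bands."""
    edges = np.linspace(0, num_bins, num_bands + 1, dtype=int)
    return [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]

def sparsity(coeffs):
    """Toy coding-efficiency measure: fraction of near-zero coefficients."""
    mags = np.abs(coeffs)
    return float(np.mean(mags < 0.01 * (mags.max() + 1e-12)))

def encode_frame(frame, block_sizes=(256, 64), num_bands=4):
    # 1) Apply multiple time-frequency transforms (one per candidate resolution).
    transforms = {b: block_dct(frame, b) for b in block_sizes}
    # 2) Compute a coding-efficiency measure per frequency band, per resolution.
    efficiency = {b: [sparsity(c[:, s]) for s in band_slices(b, num_bands)]
                  for b, c in transforms.items()}
    # 3) Select, for each band, the resolution with the best measure.
    selection = [max(block_sizes, key=lambda b: efficiency[b][i])
                 for i in range(num_bands)]
    # 4) Determine one window/transform size for the whole frame (majority vote).
    window_size = max(set(selection), key=selection.count)
    coeffs = transforms[window_size]
    # 5) Bands whose selected resolution differs from the frame-wide choice get
    #    a modification transformation (here only the band indices are recorded).
    modified_bands = [i for i, b in enumerate(selection) if b != window_size]
    return coeffs, window_size, selection, modified_bands

# Usage on a synthetic frame: a tone plus a click-like transient.
fs, n = 48000, 1024
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 440 * t)
frame[700:708] += 5.0
print(encode_frame(frame)[1:])
```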
Example 2 can include, or can optionally be combined with the subject matter of Example 1, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; wherein the combination of time-frequency resolutions selected to represent the frame includes for each of the multiple frequency bands a subset of each corresponding set of coefficients; and wherein the computed corresponding measures of coding efficiency provide measures of coding efficiency of the corresponding subsets of coefficients.
Example 3 can include, or can optionally be combined with the subject matter of Example 2, wherein computing measures of coding efficiency includes computing measures based upon a combination of data rate and error rate.
Example 4 can include, or can optionally be combined with the subject matter of Example 2, wherein computing measures of coding efficiency includes computing measures based upon the sparsity of the coefficients.
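Examples 3 and 4 leave the form of the coding-efficiency measure open. The toy measures below, which assume a simple scalar quantizer and a near-zero threshold (both illustrative assumptions), show how a rate/error combination and a sparsity measure might each be computed over one band's coefficients.

```python
# Toy coding-efficiency measures for one band of transform coefficients.
# The scalar quantizer, nonzero-count rate proxy, and "near zero" threshold
# are illustrative assumptions.
import numpy as np

def rate_error_cost(coeffs, step=0.05, lam=1.0):
    """Combination of data rate and error rate: lower cost = more efficient."""
    q = np.round(coeffs / step)
    rate = np.count_nonzero(q)                       # stand-in for coded bits
    error = float(np.sum((coeffs - q * step) ** 2))  # quantization-error energy
    return rate + lam * error

def sparsity_measure(coeffs, eps=1e-3):
    """Sparsity of the coefficients: higher = sparser = cheaper to code."""
    mags = np.abs(coeffs)
    return float(np.mean(mags < eps * (mags.max() + 1e-12)))

band = np.random.default_rng(0).laplace(scale=0.1, size=256)
print(rate_error_cost(band), sparsity_measure(band))
```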
Example 5 can include, or can optionally be combined with the subject matter of Example 1, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size.
Example 6 can include, or can optionally be combined with the subject matter of Example 1, wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed frame includes modifying time-frequency resolution within at least one frequency band of the transform of the windowed frame to match a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands.
Example 7 can include, or can optionally be combined with the subject matter of Example 1, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size; and wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed frame includes modifying a time-frequency resolution within the at least one frequency band of the transform of the windowed frame to match the time-frequency resolution selected to represent the frame in the at least a one of the frequency bands.
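Examples 5-7 describe a modification transformation driven by the difference between a band's selected resolution and the resolution implied by the chosen window size. As one illustrative possibility (an assumption of this sketch, not the claimed transformation), a small orthonormal DCT applied across the time axis of a band's short-block coefficients trades time resolution for frequency resolution within that band, and its transpose undoes the change.

```python
# Illustrative modification transformation: a secondary DCT across the time
# (block) axis of one band of short-block coefficients. The choice of DCT-II
# and the band layout are assumptions for this sketch.
import numpy as np

def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)
    m = np.cos(np.pi / n * (k[None, :] + 0.5) * k[:, None])
    m[0] *= 1.0 / np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def modify_band_resolution(coeffs, band, inverse=False):
    """Raise (or, with inverse=True, restore) the effective frequency
    resolution inside `band` of a (num_blocks x num_bins) coefficient array."""
    out = coeffs.copy()
    T = dct_ii_matrix(coeffs.shape[0])
    out[:, band] = (T.T if inverse else T) @ coeffs[:, band]
    return out

# Usage: 8 short blocks of 32 bins; modify one band, then invert the change.
c = np.random.default_rng(1).standard_normal((8, 32))
band = slice(8, 16)
m = modify_band_resolution(c, band)
print(np.allclose(modify_band_resolution(m, band, inverse=True), c))  # True
```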
Example 8 can include, or can optionally be combined with the subject matter of Example 1, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing the measures of coding efficiency for the multiple frequency bands across the frequency spectrum includes determining respective measures of coding efficiency for multiple respective combinations of subsets of coefficients, each respective combination of coefficients having a subset of coefficients from each set of corresponding coefficients in each frequency band.
Example 9 can include, or can optionally be combined with the subject matter of Example 8, wherein selecting the combination of time-frequency resolutions includes comparing the determined respective measures of coding efficiency for multiple respective combinations of subsets of coefficients.
Example 10 can include, or can optionally be combined with the subject matter of Example 1, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including:
grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum;
wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using a trellis structure to compute the measures of coding efficiency, wherein a node of the trellis structure corresponds to one of the subsets of coefficients and a column of the trellis structure corresponds to one of the multiple frequency bands.
Example 11 can include, or can optionally be combined with the subject matter of Example 10, wherein respective measures of coding efficiency include respective transition costs associated with respective transition paths between nodes in different columns of the trellis structure.
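Examples 10 and 11 describe a trellis whose columns are frequency bands, whose nodes are candidate coefficient subsets (resolutions), and whose transition paths carry costs. A standard dynamic-programming (Viterbi-style) search over such a trellis might look like the sketch below; the flat switching penalty used as the transition cost is an illustrative assumption.

```python
# Viterbi-style search over a trellis: columns = frequency bands,
# rows = candidate time-frequency resolutions, transition cost = a flat
# penalty for changing resolution between adjacent bands (an assumption).
import numpy as np

def best_resolution_path(node_cost, switch_penalty=1.0):
    """node_cost: (num_bands, num_resolutions) array of per-band coding costs.
    Returns the minimum-cost resolution index per band and the total cost."""
    num_bands, num_res = node_cost.shape
    penalty = switch_penalty * (np.arange(num_res)[:, None] != np.arange(num_res)[None, :])
    cost = node_cost[0].astype(float).copy()
    back = np.zeros((num_bands, num_res), dtype=int)
    for b in range(1, num_bands):
        total = cost[:, None] + penalty          # total[p, r]: arrive at r from p
        back[b] = np.argmin(total, axis=0)
        cost = node_cost[b] + np.min(total, axis=0)
    path = [int(np.argmin(cost))]
    for b in range(num_bands - 1, 0, -1):        # backtrack the best combination
        path.append(int(back[b][path[-1]]))
    return path[::-1], float(np.min(cost))

# Usage: 4 bands, 2 candidate resolutions (e.g., long vs. short blocks).
costs = np.array([[1.0, 2.0], [1.5, 0.5], [2.0, 0.4], [0.3, 1.0]])
print(best_resolution_path(costs))               # ([0, 1, 1, 1], 3.9)
```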
Example 12 can include a method of encoding an audio signal comprising: receiving a sequence of audio signal frames (frames), wherein the sequence of frames includes an audio frame received before one or more other frames of the sequence; designating the audio frame received before one or more other frames of the sequence as the encoding frame; applying multiple different time-frequency transforms to each respective received frame across a frequency spectrum to produce for each respective frame multiple transforms of the respective frame, each transform of the respective frame having a corresponding time-frequency resolution of the respective frame across the frequency spectrum; computing measures of coding efficiency of the sequence of received frames across multiple frequency bands within the frequency spectrum, for multiple time-frequency resolutions of the respective frames corresponding to the multiple transforms of the respective frames; selecting a combination of time-frequency resolutions to represent the encoding frame at each of the multiple frequency bands within the frequency spectrum, based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size for the encoding frame, based at least in part upon the combination of time-frequency resolutions selected to represent the encoding frame; determining a modification transformation for at least a one of the frequency bands based at least in part upon the selected combination of time-frequency resolutions for the encoding frame and the determined window size; windowing the encoding frame using the determined window size to produce a windowed frame; transforming the windowed encoding frame using the determined transform size to produce a transform of the windowed encoding frame that has a corresponding time-frequency resolution at each of the multiple frequency bands of the frequency spectrum; and modifying a time-frequency resolution within at least one frequency band of the transform of the windowed encoding frame based at least in part upon the determined modification transformation.
Example 13 can include, or can optionally be combined with the subject matter of Example 12, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; wherein the combination of time-frequency resolutions selected to represent the encoding frame includes for each of the multiple frequency bands a subset of each corresponding set of coefficients; and wherein the computed measures of coding efficiency provide measures of coding efficiency of the corresponding subsets of coefficients.
Example 14 can include, or can optionally be combined with the subject matter of Example 13, wherein computing measures of coding efficiency includes computing measures based upon a combination of data rate and error rate.
Example 15 can include, or can optionally be combined with the subject matter of Example 13, wherein computing measures of coding efficiency includes computing measures based upon sparsity of coefficients.
Example 16 can include, or can optionally be combined with the subject matter of Example 12, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size.
Example 17 can include, or can optionally be combined with the subject matter of Example 12, wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame includes modifying time-frequency resolution within at least one frequency band of the transform of the windowed encoding frame to match a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands.
Example 18 can include, or can optionally be combined with the subject matter of Example 12, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size; and wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame includes modifying a time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame to match the time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands.
Example 19 can include, or can optionally be combined with the subject matter of Example 12, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing the measures of coding efficiency for the multiple frequency bands across the frequency spectrum includes determining respective measures of coding efficiency for multiple respective combinations of subsets of coefficients, each respective combination of coefficients having a subset of coefficients from each corresponding set of coefficients in each frequency band.
Example 20 can include, or can optionally be combined with the subject matter of Example 19, wherein selecting the combination of time-frequency resolutions includes comparing the determined respective measures of coding efficiency for multiple respective combinations of subsets of coefficients.
Example 21 can include, or can optionally be combined with the subject matter of Example 12, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using a trellis structure that includes a plurality of nodes arranged in rows and columns to compute the measures of coding efficiency, wherein a node of the trellis structure corresponds to one of the subsets of coefficients for one of the multiple frequency bands and a column of the trellis structure corresponds to one of the frames of the sequence of frames.
Example 22 can include, or can optionally be combined with the subject matter of Example 21, wherein computing measures of coding efficiency includes determining respective transition costs associated with respective transition paths between nodes of the trellis structure.
Example 23 can include, or can optionally be combined with the subject matter of Example 12, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using multiple trellis structures to compute the measures of coding efficiency, wherein each trellis structure corresponds to a different one of the multiple frequency bands, wherein each trellis structure includes a plurality of nodes arranged in rows and columns, wherein each column of each trellis structure corresponds to one of the frames of the sequence of frames, and wherein each node of each respective trellis structure corresponds to one of the subsets of coefficients for the frequency band corresponding to that trellis structure.
Example 24 can include, or can optionally be combined with the subject matter of Example 23, wherein computing measures of coding efficiency includes computing respective transition costs associated with respective transition paths between nodes of the respective trellis structures.
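Examples 21-24 move the trellis into the time direction: columns index frames of the look-ahead sequence, and either a single trellis or one trellis per frequency band is searched. A compact per-band version of the same dynamic program (again with a flat switching penalty as an assumed transition cost) is sketched below.

```python
# Per-band trellises over a frame sequence: for each frequency band, columns
# index frames and rows index candidate resolutions. The flat switching
# penalty between consecutive frames is an illustrative assumption.
import numpy as np

def per_band_paths(node_cost, switch_penalty=1.0):
    """node_cost: (num_bands, num_frames, num_resolutions) per-node coding
    costs. Returns, per band, the minimum-cost resolution index per frame."""
    num_bands, num_frames, num_res = node_cost.shape
    penalty = switch_penalty * (np.arange(num_res)[:, None] != np.arange(num_res)[None, :])
    paths = np.empty((num_bands, num_frames), dtype=int)
    for band in range(num_bands):
        cost = node_cost[band, 0].astype(float).copy()
        back = np.zeros((num_frames, num_res), dtype=int)
        for f in range(1, num_frames):
            total = cost[:, None] + penalty
            back[f] = np.argmin(total, axis=0)
            cost = node_cost[band, f] + np.min(total, axis=0)
        idx = int(np.argmin(cost))
        for f in range(num_frames - 1, -1, -1):  # backtrack within this band
            paths[band, f] = idx
            idx = int(back[f][idx])              # back[0] is unused by design
    return paths

# Usage: 2 bands, 3 frames in the look-ahead sequence, 2 resolutions per node.
costs = np.random.default_rng(0).uniform(size=(2, 3, 2))
print(per_band_paths(costs))
```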
Example 25 can include an audio encoder comprising: at least one processor; one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: receiving an audio signal frame (frame); applying multiple different time-frequency transforms to the frame across a frequency spectrum to produce multiple transforms of the frame, each transform having a corresponding time-frequency resolution across the frequency spectrum; computing measures of coding efficiency for multiple frequency bands within the frequency spectrum, for multiple time-frequency resolutions corresponding to the multiple transforms; selecting a combination of time-frequency resolutions to represent the frame at each of the multiple frequency bands within the frequency spectrum, based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size for the frame, based at least in part upon the selected combination of time-frequency resolutions; determining a modification transformation for at least one of the frequency bands based at least in part upon the selected combination of time-frequency resolutions and the determined window size; windowing the frame using the determined window size to produce a windowed frame; transforming the windowed frame using the determined transform size to produce a transform of the windowed frame that has a corresponding time-frequency resolution at each of the multiple frequency bands of the frequency spectrum; modifying a time-frequency resolution within at least one frequency band of the transform of the windowed frame based at least in part upon the determined modification transformation.
Example 26 can include, or can optionally be combined with the subject matter of Example 25, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; wherein the combination of time-frequency resolutions selected to represent the frame includes for each of the multiple frequency bands a subset of each corresponding set of coefficients; and wherein the computed corresponding measures of coding efficiency provide measures of coding efficiency of the corresponding subsets of coefficients.
Example 27 can include, or can optionally be combined with the subject matter of Example 26, wherein computing measures of coding efficiency includes computing measures based upon a combination of data rate and error rate.
Example 28 can include, or can optionally be combined with the subject matter of Example 26, wherein computing measures of coding efficiency includes computing measures based upon the sparsity of the coefficients.
Example 29 can include, or can optionally be combined with the subject matter of Example 25, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size.
Example 30 can include, or can optionally be combined with the subject matter of Example 25, wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed frame includes modifying time-frequency resolution within at least one frequency band of the transform of the windowed frame to match a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands.
Example 31 can include, or can optionally be combined with the subject matter of Example 25, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size; and wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed frame includes modifying a time-frequency resolution within the at least one frequency band of the transform of the windowed frame to match the time-frequency resolution selected to represent the frame in the at least a one of the frequency bands.
Example 32 can include, or can optionally be combined with the subject matter of Example 25, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing the measures of coding efficiency for the multiple frequency bands across the frequency spectrum includes determining respective measures of coding efficiency for multiple respective combinations of subsets of coefficients, each respective combination of coefficients having a subset of coefficients from each set of corresponding coefficients in each frequency band.
Example 33 can include, or can optionally be combined with the subject matter of Example 32, wherein selecting the combination of time-frequency resolutions includes comparing the determined respective measures of coding efficiency for multiple respective combinations of subsets of coefficients.
Example 34 can include, or can optionally be combined with the subject matter of Example 25, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum;
wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using a trellis structure to compute the measures of coding efficiency, wherein a node of the trellis structure corresponds to one of the subsets of coefficients and a column of the trellis structure corresponds to one of the multiple frequency bands.
Example 35 can include, or can optionally be combined with the subject matter of Example 34, wherein respective measures of coding efficiency include respective transition costs associated with respective transition paths between nodes in different columns of the trellis structure.
Example 36 can include an audio encoder comprising: at least one processor; one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: receiving a sequence of audio signal frames (frames), wherein the sequence of frames includes an audio frame received before one or more other frames of the sequence; designating the audio frame received before one or more other frames of the sequence as the encoding frame; applying multiple different time-frequency transforms to each respective received frame across a frequency spectrum to produce for each respective frame multiple transforms of the respective frame, each transform of the respective frame having a corresponding time-frequency resolution of the respective frame across the frequency spectrum; computing measures of coding efficiency of the sequence of received frames across multiple frequency bands within the frequency spectrum, for multiple time-frequency resolutions of the respective frames corresponding to the multiple transforms of the respective frames; selecting a combination of time-frequency resolutions to represent the encoding frame at each of the multiple frequency bands within the frequency spectrum, based at least in part upon the computed measures of coding efficiency; determining a window size and a corresponding transform size for the encoding frame, based at least in part upon the combination of time-frequency resolutions selected to represent the encoding frame; determining a modification transformation for at least a one of the frequency bands based at least in part upon the selected combination of time-frequency resolutions for the encoding frame and the determined window size; windowing the encoding frame using the determined window size to produce a windowed frame; transforming the windowed encoding frame using the determined transform size to produce a transform of the windowed encoding frame that has a corresponding time-frequency resolution at each of the multiple frequency bands of the frequency spectrum; and modifying a time-frequency resolution within at least one frequency band of the transform of the windowed encoding frame based at least in part upon the determined modification transformation.
Example 37 can include, or can optionally be combined with the subject matter of Example 36, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; wherein the combination of time-frequency resolutions selected to represent the encoding frame includes for each of the multiple frequency bands a subset of each corresponding set of coefficients; and wherein the computed measures of coding efficiency provide measures of coding efficiency of the corresponding subsets of coefficients.
Example 38 can include, or can optionally be combined with the subject matter of Example 37, wherein computing measures of coding efficiency includes computing measures based upon a combination of data rate and error rate.
Example 39 can include, or can optionally be combined with the subject matter of Example 37, wherein computing measures of coding efficiency includes computing measures based upon sparsity of coefficients; and wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size.
Example 40 can include, or can optionally be combined with the subject matter of Example 36, wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame includes modifying time-frequency resolution within at least one frequency band of the transform of the windowed encoding frame to match a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands.
Example 41 can include, or can optionally be combined with the subject matter of Example 36, wherein determining the modification transformation for the at least a one of the frequency bands includes determining based at least in part upon a difference between a time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands and a time-frequency resolution corresponding to the determined window size; and wherein modifying the time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame includes modifying a time-frequency resolution within the at least one frequency band of the transform of the windowed encoding frame to match the time-frequency resolution selected to represent the encoding frame in the at least a one of the frequency bands.
Example 42 can include, or can optionally be combined with the subject matter of Example 36, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing the measures of coding efficiency for the multiple frequency bands across the frequency spectrum includes determining respective measures of coding efficiency for multiple respective combinations of subsets of coefficients, each respective combination of coefficients having a subset of coefficients from each corresponding set of coefficients in each frequency band.
Example 43 can include, or can optionally be combined with the subject matter of Example 42, wherein selecting the combination of time-frequency resolutions includes comparing the determined respective measures of coding efficiency for multiple respective combinations of subsets of coefficients.
Example 44 can include, or can optionally be combined with the subject matter of Example 36, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using a trellis structure that includes a plurality of nodes arranged in rows and columns to compute the measures of coding efficiency, wherein a node of the trellis structure corresponds to one of the subsets of coefficients for one of the multiple frequency bands and a column of the trellis structure corresponds to one of the frames of the sequence of frames.
Example 45 can include, or can optionally be combined with the subject matter of Example 44, wherein computing measures of coding efficiency includes determining respective transition costs associated with respective transition paths between nodes of the trellis structure.
Example 46 can include, or can optionally be combined with the subject matter of Example 36, wherein each corresponding time-frequency resolution across the frequency spectrum corresponds to a corresponding set of coefficients across the frequency spectrum; further including: grouping each corresponding set of coefficients into corresponding subsets of coefficients for each of the multiple frequency bands within the frequency spectrum; wherein computing a measure of coding efficiency for the multiple frequency bands across the frequency spectrum includes using multiple trellis structures to compute the measures of coding efficiency, wherein each trellis structure corresponds to a different one of the multiple frequency bands, wherein each trellis structure includes a plurality of nodes arranged in rows and columns, wherein each column of each trellis structure corresponds to one of the frames of the sequence of frames, and wherein each node of each respective trellis structure corresponds to one of the subsets of coefficients for the frequency band corresponding to that trellis structure.
Example 47 can include, or can optionally be combined with the subject matter of Example 46, wherein computing measures of coding efficiency includes computing respective transition costs associated with respective transition paths between nodes of the respective trellis structures.
Example 48 can include a method of decoding a coded audio signal comprising: receiving the coded audio signal frame (frame); receiving modification information; receiving transform size information; receiving window size information; modifying a time-frequency resolution within at least one frequency band of the received frame based at least in part upon the received modification information; applying an inverse transform to the modified frame based at least in part upon the received transform size information; and windowing the inverse transformed modified frame using a window size based at least in part upon the received window size information.
Example 49 can include, or can optionally be combined with the subject matter of Example 48, further including: overlap-adding the windowed inverse transformed modified frame with adjacent windowed inverse transformed modified frames.
Example 50 can include, or can optionally be combined with the subject matter of Example 48 further including: overlap-adding short windows within the windowed inverse transformed modified frame.
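Examples 48-50 mirror the encoder: the decoder receives the modification, transform-size, and window-size information, undoes the modification, inverse-transforms, windows, and overlap-adds. The sketch below reuses the block DCT-IV and secondary-DCT stand-ins from the earlier encoder sketches; those transforms, the sine window, and the 50% overlap are assumptions of this sketch rather than requirements of the examples.

```python
# Sketch of the decode flow of Examples 48-50 (illustrative assumptions only).
import numpy as np

def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix (the assumed secondary 'modification')."""
    k = np.arange(n)
    m = np.cos(np.pi / n * (k[None, :] + 0.5) * k[:, None])
    m[0] *= 1.0 / np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def dct_iv_matrix(n):
    """Orthonormal DCT-IV matrix (self-inverse), a stand-in for the MDCT."""
    k = np.arange(n)
    return np.sqrt(2.0 / n) * np.cos(np.pi / n * (k[:, None] + 0.5) * (k[None, :] + 0.5))

def decode_frame(coeffs, modified_bands, window_size, out_buffer, offset=0):
    # 1) Undo the transform-domain modification in each signalled band by
    #    applying the inverse (transpose) of the secondary time-axis DCT.
    restored = coeffs.copy()
    T_mod = dct_ii_matrix(coeffs.shape[0])
    for band in modified_bands:
        restored[:, band] = T_mod.T @ coeffs[:, band]
    # 2) Inverse-transform each block using the signalled transform size.
    blocks = restored @ dct_iv_matrix(window_size)
    # 3) Window the short blocks and overlap-add them into the output buffer.
    win = np.sin(np.pi * (np.arange(window_size) + 0.5) / window_size)
    hop = window_size // 2
    for i, block in enumerate(blocks):
        start = offset + i * hop
        out_buffer[start:start + window_size] += win * block
    return out_buffer

# Usage: 4 short blocks of 64 coefficients, one modified band (bins 8..16).
c = np.random.default_rng(2).standard_normal((4, 64))
out = decode_frame(c, [slice(8, 16)], 64, np.zeros(64 + 3 * 32))
print(out.shape)                                 # (160,)
```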
Example 51 can include a method of decoding a coded audio signal comprising: receiving the coded audio signal frame (frame); receiving modification information; receiving transform size information; receiving window size information; modifying a coefficient within at least one frequency band of the received frame based at least in part upon the received modification information; applying an inverse transform to the modified frame based at least in part upon the received transform size information; and windowing the inverse transformed modified frame using a window size based at least in part upon the received window size information.
Example 52 can include, or can optionally be combined with the subject matter of Example 51 further including: overlap-adding the windowed inverse transformed modified frame with adjacent windowed inverse transformed modified frames.
Example 53 can include, or can optionally be combined with the subject matter of Example 51 further including: overlap-adding short windows within the windowed inverse transformed modified frame.
Example 54 can include an audio decoder comprising: at least one processor; one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: receiving the coded audio signal frame (frame); receiving modification information; receiving transform size information; receiving window size information; modifying a time-frequency resolution within at least one frequency band of the received frame based at least in part upon the received modification information; applying an inverse transform to the modified frame based at least in part upon the received transform size information; and windowing the inverse transformed modified frame using a window size based at least in part upon the received window size information.
Example 55 can include, or can optionally be combined with the subject matter of Example 54, further including: one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: overlap-adding the windowed inverse transformed modified frame with adjacent windowed inverse transformed modified frames.
Example 56 can include, or can optionally be combined with the subject matter of Example 54 further including: one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: overlap-adding short windows within the windowed inverse transformed modified frame.
Example 57 can include an audio decoder comprising: at least one processor; one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: receiving the coded audio signal frame (frame); receiving modification information; receiving transform size information; receiving window size information; modifying a coefficient within at least one frequency band of the received frame based at least in part upon the received modification information; applying an inverse transform to the modified frame based at least in part upon the received transform size information; and windowing the inverse transformed modified frame using a window size based at least in part upon the received window size information.
Example 58 can include, or can optionally be combined with the subject matter of Example 57, further including: one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: overlap-adding the windowed inverse transformed modified frame with adjacent windowed inverse transformed modified frames.
Example 59 can include, or can optionally be combined with the subject matter of Example 57 further including: one or more computer-readable mediums storing instructions that, when executed by the one or more computer processors, cause the system to perform operations comprising: overlap-adding short windows within the windowed inverse transformed modified frame.
The above description is presented to enable any person skilled in the art to create and use a system and method to determine window sizes and time-frequency transformations in audio coders. Various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the scope of the invention. In the preceding description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art will realize that the invention might be practiced without the use of these specific details. In other instances, well-known processes are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. Identical reference numerals may be used to represent different views of the same or similar item in different drawings. Thus, the foregoing description and drawings of embodiments in accordance with the present invention are merely illustrative of the principles of the invention. Therefore, it will be understood that various modifications can be made to the embodiments by those skilled in the art without departing from the scope of the invention, which is defined in the appended claims.
This patent application is a Continuation of U.S. patent application Ser. No. 15/967,119, filed on Apr. 30, 2018, which claims the benefit of priority to U.S. Provisional Patent Application No. 62/491,911, filed on Apr. 28, 2017, the contents of which are incorporated by reference herein in their entireties.
Related U.S. application data: Provisional Application No. 62/491,911, filed April 2017 (US); Parent Application Ser. No. 15/967,119, filed April 2018 (US); Child Application Ser. No. 17/080,548 (US).