This disclosure relates to audio signal coding. In particular, this disclosure relates to range extension techniques for decoding digital audio signals.
The development of digital encoding and decoding for digital audio signals continues to have a significant effect on the delivery and enjoyment of entertainment content. Perceptual audio coding systems, which enable conveying encoded audio signals at low bitrates while maintaining perceptual quality of the signal when decoded, can employ techniques to characterize certain properties of the audio signal. Parameters can be used to characterize such properties. These parameters can, for example, indicate an energy or level of the signal as a function of time and of frequency. For this purpose, often a time/frequency grid is used. The grid includes a set of time segments, also referred to as time frames, and a set of frequency bands represented in each time frame. For each point in the grid, a respective parameter describes a signal property for the corresponding frequency band and time frame. The points in the grid are sometimes referred to as time/frequency tiles. With most audio signals, the parameters often vary slowly across time and frequency. Thus, time-differential or frequency-differential coding can be performed on the parameter to convey it efficiently, for instance, at a sufficiently low bitrate in a bitstream.
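The time/frequency grid and the two differential coding modes described above can be sketched as follows. The grid values, function names, and plain-integer representation are illustrative assumptions for this sketch only, not part of any particular codec:

```python
# Hypothetical parameter grid: rows are time frames, columns are frequency bands.
# The values are invented for illustration.
grid = [
    [10, 12, 15, 15, 11],  # frame 0
    [11, 13, 15, 14, 10],  # frame 1
]

def freq_differential(frame):
    """Frequency-differential coding of one frame: the first band is coded
    absolutely, each later band as a delta to the previous band."""
    return [frame[0]] + [b - a for a, b in zip(frame, frame[1:])]

def time_differential(prev_frame, frame):
    """Time-differential coding: each band is a delta to the same band
    in the previous frame."""
    return [b - a for a, b in zip(prev_frame, frame)]

print(freq_differential(grid[0]))            # [10, 2, 3, 0, -4]
print(time_differential(grid[0], grid[1]))   # [1, 1, 0, -1, -1]
```

Because the parameters vary slowly across time and frequency, the deltas cluster near zero and can be entropy-coded with fewer bits than the absolute values.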
Disclosed are some examples of systems, apparatus, methods and computer program products implementing techniques for audio coding with range extension.
In some examples, a set of encoded values for a sequence of frequency bands in an identifiable time frame of an audio signal is processed. The encoded values vary in relation to a sequence of time frames of the audio signal and in relation to the sequence of frequency bands. The set of encoded values is decoded to produce decoded values. The decoding uses at least a first coding protocol of a set of coding protocols, where the first coding protocol is associated with direct coding of the audio signal. For at least one frequency band of the sequence of frequency bands in the identifiable time frame, such as a lowest frequency band, it is determined that a decoded value corresponds to a minimum of a first range of values of the first coding protocol. The determined value is modified to be below the minimum to produce an extended value. In some implementations, a second decoded value associated with a second frequency band of the sequence is identified as being below the minimum of the first range of values, and the second value is provided as the extended value. The decoded values including the extended value can be provided for further processing.
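One possible reading of the steps above can be sketched in Python. The function name, the minimum value of 0, and the sample values are assumptions for illustration, not a definitive implementation:

```python
F0_MIN = 0  # assumed minimum of the first coding protocol's range of values

def extend_first_band(decoded):
    """If the first band decoded to the minimum and a later decoded value
    sits below that minimum, adopt that below-minimum value as the
    extended value (a sketch of one described variant)."""
    values = list(decoded)
    if values[0] == F0_MIN:
        below = [v for v in values[1:] if v < F0_MIN]
        if below:
            values[0] = below[0]  # extended value, now below the old minimum
    return values

print(extend_first_band([0, -7, 3, 5]))  # [-7, -7, 3, 5]
```

If no later value lies below the minimum, the decoded values pass through unchanged, so legacy frames are unaffected.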
In some examples, a set of decoded values for a sequence of frequency bands in an identifiable time frame of an audio signal is received. The decoded values vary in relation to a sequence of time frames of the audio signal and in relation to the sequence of frequency bands. For at least one frequency band of the sequence of frequency bands in the identifiable time frame, it is determined that a decoded value corresponds to a minimum of a first range of values of a first coding protocol, the first coding protocol being associated with direct coding of the audio signal. The determined value is modified to be below the minimum to produce an extended value. In some implementations, decoded values associated with an upper range of frequency bands of the sequence of frequency bands are identified, and the extended value is determined as an extrapolation of the identified decoded values.
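A minimal sketch of the extrapolation variant, assuming a linear extrapolation from the two preceding bands and a codebook minimum of 0 (both are assumptions for illustration):

```python
F0_MIN = 0  # assumed minimum of the first coding protocol's range of values

def extrapolate_extension(decoded, num_upper=2):
    """If a band decoded to the minimum, replace it with a linear
    extrapolation of the decoded values in the bands preceding it
    (a sketch only; the disclosure does not fix the extrapolation rule)."""
    values = list(decoded)
    for i, v in enumerate(values):
        if v == F0_MIN and i >= num_upper:
            a, b = values[i - 2], values[i - 1]
            candidate = b + (b - a)  # continue the local slope
            if candidate < F0_MIN:
                values[i] = candidate
    return values

print(extrapolate_extension([20, 12, 4, 0]))  # [20, 12, 4, -4]
```

The extrapolation only replaces a value when it would actually fall below the minimum, so bands that legitimately decoded to the minimum as part of a flat envelope are left alone.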
In some examples, an audio coding system includes an encoder and a decoder. The encoder is operable to obtain parameters characterizing at least one property of an audio signal. The parameters vary in relation to a sequence of time frames of the audio signal and in relation to a sequence of frequency bands in each time frame. For each time frame, the encoder is further operable to encode a set of the parameters for the sequence of frequency bands in the time frame to produce a set of encoded values. The encoding uses at least a first coding protocol of a set of coding protocols. The encoder is further operable to store the set of encoded values on a storage medium, and/or provide the set of encoded values on a communications medium. The decoder is operable, for each time frame, to retrieve the set of encoded values from the storage medium, and/or receive the set of encoded values on the communications medium. The decoder is further operable to decode the set of encoded values to produce a set of decoded values. The decoder is further operable to identify any decoded values as corresponding to a minimum of a first range of values of the first coding protocol, and modify any identified values to be below the minimum as explained above.
Some examples of the disclosed systems, apparatus, methods and computer program products may be implemented via hardware, firmware, software stored in one or more non-transitory data storage media, and/or combinations thereof. For example, at least some aspects of this disclosure may be implemented in apparatus that includes an interface system and a control system. The interface system may include a user interface and/or a network interface. In some implementations, the apparatus may include a memory system. The interface system may include at least one interface between the control system and the memory system. The control system may include at least one processor, such as a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and/or combinations thereof. In some non-limiting examples, the control system may be capable of performing part or all of a range extension process, as disclosed herein.
Details of some implementations of the subject matter described in this disclosure are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.
Like reference numbers and designations in the various Figures indicate like elements.
Disclosed are some examples of systems, apparatus, methods and computer program products implementing techniques for extending the range of decoded parameter values of a digital audio signal. By way of illustration, a decoding system can receive from an encoding system a set of encoded parameter values characterizing a sequence of frequency bands in a given time frame of the digital audio signal. A first stage of the decoding system is configured to decode the set of encoded values using one or more codebooks to produce a set of decoded values. A second stage of the decoding system is configured to perform range extension on the set of decoded values by identifying one or more of the decoded values as being equal to a minimum of a first range of values representable by one of the codebooks. The second stage of the decoding system can extend any identified value(s) to be below the minimum to produce a modified set of decoded values for further processing. In some implementations, the first stage of the decoding system can perform time- and/or frequency-differential decoding using three codebooks explained in further detail below, while the second stage can perform a decoded parameter value modification that does not affect the processing of the first stage.
The teachings disclosed herein can be applied in various different ways. In some implementations, examples of the disclosed techniques can be applied to extend the dynamic range of parameters in a high frequency reconstruction (HFR) system in a backwards compatible manner. As an example, a decoder implementing some of the disclosed techniques can also decode legacy bitstreams, generally referring to bitstreams without extended ranges. This is made possible since the disclosed examples of range extension techniques do not call for changes to the underlying bitstream syntax, nor changes to associated codebooks. Some examples provided in this disclosure can be implemented in the context of the Dolby AC-4 audio format of the Dolby Audio™ family, the Dolby AC-3 audio codec (also known as “Dolby Digital”), or the Enhanced AC-3 audio codec (also known as E-AC-3 or “Dolby Digital Plus”), although the disclosed teachings are not limited to such Dolby Audio™ contexts. Some examples of the concepts disclosed herein can be implemented in the context of other audio codecs, including but not limited to MPEG-2 AAC and MPEG-4 AAC. Some of the disclosed examples may be implemented in various audio encoders and/or decoders provided by various manufacturers, and may be included in mobile telephones, smartphones, desktop computers, hand-held or portable computers, netbooks, notebooks, smartbooks, tablets, stereo systems, portable listening devices, televisions, DVD players, digital recording devices and a variety of other devices and systems. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
In the case of frequency-differential coding, the value of a parameter for the first, often the lowest, frequency band of a sequence of frequency bands in a given time frame is generally coded absolutely rather than differentially. Absolute coding is also referred to herein as direct coding. Absolute or direct coding is generally used on the first frequency band because there usually is no previous frequency band relative to which the lowest frequency band could be differentially coded. Thus, different codebooks can be used, depending on the mode of coding (time- or frequency-differential coding) to be performed and depending on the particular frequency band to be coded for a given time frame.
In some implementations, a parameter is coded using three different codebooks: F0, DF, and DT. For a time frame using frequency-differential coding, the value for the first frequency band in the sequence of bands is conveyed directly using codebook F0, and each subsequent frequency band is coded relative to the previous band in the same frame using codebook DF. For a frame using time-differential coding, the value of the parameter in each frequency band is coded relative to the same band in the previous frame using codebook DT. To minimize error propagation in a decoder in case of transmission errors and to enable fast tune-in when starting to decode the bitstream of an ongoing broadcast, by way of example, frequency-differential coding can be used by an encoder in regular time intervals, for instance, once or twice per second. For other frames, the encoder can be configured to select the coding mode that is most efficient, for instance, that uses the smallest number of bits.
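A minimal sketch of the two decoding modes, assuming the decoder already holds the coded values as plain integers (codebook lookup and bitstream parsing are omitted); `decode_frame` and its mode names are hypothetical:

```python
def decode_frame(mode, coded, prev_frame=None):
    """Decode one frame's parameter values for the three-codebook scheme.

    mode 'freq': coded[0] is an absolute F0 value, the rest are DF deltas.
    mode 'time': coded are DT deltas relative to prev_frame.
    """
    if mode == 'freq':
        values = [coded[0]]
        for delta in coded[1:]:
            values.append(values[-1] + delta)
        return values
    if mode == 'time':
        return [p + d for p, d in zip(prev_frame, coded)]
    raise ValueError(mode)

frame0 = decode_frame('freq', [10, 2, 3, 0, -4])               # [10, 12, 15, 15, 11]
frame1 = decode_frame('time', [1, 1, 0, -1, -1], prev_frame=frame0)
print(frame0, frame1)
```

The periodic frequency-differential frames act as resynchronization points: a decoder tuning in mid-stream can start from the next 'freq' frame without any previous-frame state.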
Generally, the range of values a parameter can have immediately after being decoded is determined by the range of values covered by the codebook used. In particular, for the coding mode of frequency-differential coding, the range of parameter values for the first band is determined by the range covered by the F0 codebook. The range of parameter values possible for the remaining bands, or for all of the bands in the case of time-differential coding, is often larger. By way of illustration, in a non-limiting example, the three codebooks cover integer values of parameters in the following ranges:

F0: 0 to 35
DF: −35 to 35
DT: −35 to 35
In this example, for a frame using frequency-differential coding, and for the first band, only parameter values in the range of 0 to 35 can be represented, while for the second band, values as low as −35 or as high as 70 can be represented, assuming that the first band had the lowest or highest possible values, respectively.
In a legacy implementation, however, an encoder can be configured to truncate negative values to 0 in a quantization step. Similarly, if the decoder finds values below 0, for instance, after delta decoding, the frame can be deemed erroneous. The highest value (35 in this example) can represent the power obtained in a time/frequency tile when encoding a full-scale sinusoid centered in the corresponding frequency band. In the present example, since values are forced into the range of 0 to 35 (the range of the F0 codebook), the largest positive step possible from one value to the next when delta coding in time or frequency is from 0 to 35 (+35), which is the positive limit of the DF and DT codebooks, and the largest negative step possible is from 35 to 0 (−35), which is the negative limit of the DF and DT codebooks.
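The truncation behavior and the resulting step limits can be checked numerically; `legacy_quantize` is a hypothetical name for the legacy truncation described above:

```python
F0_MIN, F0_MAX = 0, 35   # range of the example F0 codebook
D_MIN, D_MAX = -35, 35   # range of the example DF and DT codebooks

def legacy_quantize(value):
    """Legacy encoder behavior per the text: values are forced into the
    F0 range, so negative values truncate to 0."""
    return max(F0_MIN, min(F0_MAX, value))

# Because values are forced into 0..35, the largest steps between adjacent
# values are +35 (0 -> 35) and -35 (35 -> 0), matching the DF/DT limits.
assert legacy_quantize(-7) == 0
assert legacy_quantize(40) == 35
assert F0_MAX - F0_MIN == D_MAX
assert F0_MIN - F0_MAX == D_MIN
```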
Different techniques are disclosed for extending, or facilitating the extension of, the range of values a decoded parameter can have. Some techniques are provided in the context of frequency-differential coding, while at least one technique is applicable in the context of time-differential coding. In some examples, techniques are applied to extend decoded parameter values beyond the range of values of the F0 codebook. Such techniques can be beneficial in implementations where the F0 codebook has been set and cannot easily be altered to extend the codebook's range.
Below are some non-limiting examples of different range extension techniques that illustrate some of the disclosed implementations. In the numerical examples, the same three codebooks (F0, DF and DT) having the same numerical ranges as in the example above are used. Using some of the disclosed techniques, values even lower than 0 can be achieved for the first frequency band, where 0 is the lowest possible value if no range extension techniques are used. In some examples below, values in the range from −12 to 35 can be represented for the first band.
In some but not all examples, lower parameter values represent soft sounds corresponding to low energy levels in a time/frequency tile, and relatively higher parameter values represent loud sounds corresponding to high energy levels. In the following examples, the value 0 corresponds to a very soft but still audible sound level, while the value −12 corresponds to a sound level that is below the threshold of perception and, hence, characterizes complete silence. Those skilled in the art should appreciate that the claims are not limited to the disclosed numerical ranges of these examples, and that low and high parameter values can represent signal properties other than soft and loud sounds.
In
In
In
In the example of
At 408 of
At 416 of
At 420 of
At 424 of
In
At 712 of
At 716 of
As an alternative to the processing of
In some alternative implementations, the last (e.g., highest) frequency band of a sequence of frequency bands in a given time frame can be assigned the value of the next-to-last band when delta decoding. Delta-coded values would then start from the F0 band when the decoded value for the first frequency band is equal to the minimum of the F0 codebook; that is, the first DF value would indicate the delta offset for the first band, the same band that is also coded using the F0 codebook. Similarly, in the above case, an extra DF value could also be signaled when F0 equals 0, representing the otherwise missing delta value for the last band.
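A sketch of this alternative scheme, under the assumption that the F0 minimum is 0 and that delta values arrive as plain integers; `decode_with_restarted_deltas` is a hypothetical name:

```python
F0_MIN = 0  # assumed minimum of the example F0 codebook

def decode_with_restarted_deltas(f0_value, df_deltas):
    """When the F0 value equals the codebook minimum, the first DF delta
    applies to the first band itself (re-offsetting it below the minimum),
    and the last band inherits the value of the next-to-last band."""
    if f0_value != F0_MIN:
        # Ordinary frequency-differential decoding.
        values = [f0_value]
        for d in df_deltas:
            values.append(values[-1] + d)
        return values
    # Deltas restart at the F0 band: the first delta offsets band 0 from F0_MIN.
    values = [F0_MIN + df_deltas[0]]
    for d in df_deltas[1:]:
        values.append(values[-1] + d)
    values.append(values[-1])  # last band copies the next-to-last band
    return values

print(decode_with_restarted_deltas(0, [-7, 3, 5]))  # [-7, -4, 1, 1]
```

Note that both branches consume the same number of coded values and produce the same number of bands, which is what keeps the scheme compatible with an unchanged bitstream syntax.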
In some implementations, when envelope coder 344 encodes digital audio signal 312 using the F0, DF and DT codebooks, the encoded value for the first frequency band of a sequence of frequency bands in a time frame is often limited to the range of values representable by the F0 codebook, for instance, the range of 0 to 35 in the example above. Also, in some implementations, when the range extension techniques of
At 816 of
In this example, system 1400 includes an interface system 1405. Interface system 1405 may include a network interface, such as a wireless network interface. Alternatively, or additionally, interface system 1405 may include a universal serial bus (USB) interface or another such interface.
System 1400 includes a logic system 1410. Logic system 1410 may include a processor, such as a general purpose single- or multi-chip processor. Logic system 1410 may include a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or combinations thereof. Logic system 1410 may be configured to control the other components of system 1400. Although no interfaces between the components of system 1400 are shown in
Logic system 1410 may be configured to perform encoder and/or decoder functionality, including but not limited to the types of encoding and/or decoding processes described herein. In some such implementations, logic system 1410 may be configured to operate (at least in part) according to software stored on one or more non-transitory media. The non-transitory media may include memory associated with logic system 1410, such as random access memory (RAM) and/or read-only memory (ROM). The non-transitory media may include memory of memory system 1415. Memory system 1415 may include one or more suitable types of non-transitory storage media, such as flash memory, a hard drive, etc.
For example, logic system 1410 may be configured to receive frames of encoded audio data via interface system 1405 and to decode the encoded audio data according to the decoding processes described herein. Alternatively, or additionally, logic system 1410 may be configured to receive frames of encoded audio data via an interface between memory system 1415 and logic system 1410. Logic system 1410 may be configured to control speaker(s) 1420 according to decoded audio data. In some implementations, logic system 1410 may be configured to encode audio data according to conventional encoding methods and/or according to encoding methods described herein. Logic system 1410 may be configured to receive such audio data via microphone 1425, via interface system 1405, etc.
Display system 1430 may include one or more suitable types of display, depending on the manifestation of system 1400. For example, display system 1430 may include a liquid crystal display, a plasma display, a bistable display, etc.
User input system 1435 may include one or more devices configured to accept input from a user. In some implementations, user input system 1435 may include a touch screen that overlays a display of display system 1430. User input system 1435 may include buttons, a keyboard, switches, etc. In some implementations, user input system 1435 may include microphone 1425: a user may provide voice commands for system 1400 via microphone 1425. The logic system may be configured for speech recognition and for controlling at least some operations of system 1400 according to such voice commands.
Power system 1440 may include one or more suitable energy storage devices, such as a nickel-cadmium battery or a lithium-ion battery. Power system 1440 may be configured to receive power from an electrical outlet.
The techniques described herein can be implemented by one or more computing devices. For example, a controller of a special-purpose computing device may be hard-wired to perform the disclosed operations or cause such operations to be performed and may include digital electronic circuitry such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) persistently programmed to perform operations or cause operations to be performed. In some implementations, custom hard-wired logic, ASICs, and/or FPGAs with custom programming are combined to accomplish the techniques.
In some other implementations, a general-purpose computing device can include a controller incorporating a central processing unit (CPU) programmed to cause one or more of the disclosed operations to be performed pursuant to program instructions in firmware, memory, other storage, or a combination thereof. Examples of general-purpose computing devices include servers, network devices and user devices such as smartphones, tablets, laptops, desktop computers, portable media players, various other portable handheld devices, and any other device that incorporates data processing hardware and/or program logic to implement the disclosed operations or cause those operations to be performed. A computing device may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
The terms “storage medium” and “storage media” as used herein refer to any media that store data and/or instructions that cause a computer or other type of machine to operate in a specific fashion. Any of the components, models, modules, units, engines and operations described herein may be at least partially implemented as, or caused to be implemented by, software code executable by a processor of a controller using any suitable computer language. The software code may be stored as a series of instructions or commands on a computer-readable medium for storage and/or transmission and for use by a computer program product. Examples of suitable computer-readable media include random access memory (RAM), read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disc (CD) or DVD (digital versatile disc), a solid state drive, flash memory, and any other memory chip or cartridge. The computer-readable medium may be any combination of such storage devices. Computer-readable media encoded with the software/program code may be packaged as part of a computer program product with a compatible device, such as a user device or a server as described above, or provided separately from other devices. Any such computer-readable medium may reside on or within a single computing device or an entire computer system, and may be among other computer-readable media within a system or network.
In some implementations, a non-transitory computer-readable storage medium stores instructions executable by a computing device to cause some or all of the operations described above to be performed. Non-limiting examples of computing devices include servers and desktop computers, as well as portable handheld devices such as a smartphone, a tablet, a laptop, a portable music player, etc. In some instances, one or more servers can be configured to encode and/or decode a digital audio signal using one or more of the disclosed techniques and stream a processed output signal to a user's device over the Internet as part of a cloud-based service.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Despite references to particular computing paradigms and software tools herein, the disclosed techniques are not limited to any specific combination of hardware and software, nor to any particular source for the instructions executed by a computing device or data processing apparatus. Program instructions on which various implementations are based may correspond to any of a wide variety of programming languages, software tools and data formats, and be stored in any type of non-transitory computer-readable storage media or memory device(s), and may be executed according to a variety of computing models including, for example, a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various functionalities may be effected or employed at different locations. In addition, references to particular protocols herein are merely by way of example. Suitable alternatives known to those of skill in the art may be employed.
Various modifications to the implementations described in this disclosure may be readily apparent to those skilled in the art. The general principles defined herein may be applied to other implementations without departing from the spirit or scope of this disclosure. Thus, the claims are not intended to be limited to the implementations shown herein, but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2016/057232 | 4/1/2016 | WO | 00

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2016/162283 | 10/13/2016 | WO | A
Number | Date | Country
---|---|---
20180130480 A1 | May 2018 | US

Number | Date | Country
---|---|---
62144163 | Apr 2015 | US
62260845 | Nov 2015 | US