The present invention relates generally to audio encoding and decoding.
In the last twenty years, microprocessor speed has increased by several orders of magnitude and Digital Signal Processors (DSPs) have become ubiquitous. It became feasible and attractive to transition from analog communication to digital communication. Digital communication offers the major advantages of more efficient bandwidth utilization and the ability to use error correcting techniques. Thus, by using digital technology, one can send more information through a given allocated spectrum space and send it more reliably. Digital communication can use radio links (wireless) or physical network media (e.g., fiber optics, copper networks).
Digital communication can be used for different types of communication, such as speech, audio, image, video, or telemetry. A digital communication system includes a sending device and a receiving device. In a system capable of two-way communication, each device has both sending and receiving circuits. In a digital sending or receiving device, the signal and the resulting data pass through multiple processing stages between the stage at which the signal is received at an input (e.g., microphone, camera, sensor) and the stage at which a digitized version of the signal is used to modulate a carrier wave and is transmitted. After (1) the signal is received at the input and digitized, (2) some initial noise filtering may be applied, followed by (3) source encoding and (4) finally channel encoding. At a receiving device, the process works in reverse order: channel decoding, source recovery, and then conversion to analog. The present invention, as described in the following pages, can be considered to fall primarily in the source encoding stage.
The main goal of source encoding is to reduce the bit rate while maintaining perceived quality to the extent possible. Different standards have been developed for different types of media.
The features of the invention believed to be novel are set forth with particularity in the appended claims. The invention itself however, both as to organization and method of operation, together with objects and advantages thereof, may be best understood by reference to the following detailed description, which describes certain exemplary embodiments of concepts that include the invention. The description is meant to be taken in conjunction with the accompanying drawings in which:
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
While this invention is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the present disclosure is to be considered as an example of the principles of the invention and not intended to limit the invention to the specific embodiments shown and described. In the description below, like reference numerals are used to describe the same, similar or corresponding parts in the several views of the drawings.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Reference throughout this document to “one embodiment”, “certain embodiments”, “an embodiment” or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
The term “or” as used herein is to be interpreted as an inclusive or meaning any one or any combination. Therefore, “A, B or C” means “any of the following: A; B; C; A and B; A and C; B and C; A, B and C”. An exception to this definition will occur only when a combination of elements, functions, steps or acts is in some way inherently mutually exclusive.
Embodiments described herein relate to encoding signals. The signals can be speech or other audio such as music that are converted to digital information and communicated by wire or wirelessly.
Turning now to the drawings, wherein like numerals designate like components,
The human interface system 120 is a system that comprises a processing system and electronic components that support the processing system, such as peripheral I/O circuits and power control circuits, as well as electronic components that interface to users, such as a microphone 102, a display/touch keyboard 104, and a speaker 106. The processing system comprises a central processing unit (CPU) and memory. The CPU processes software instructions stored in the memory that primarily relate to human interface aspects of the mobile communication device 100, such as presenting information on the display/keyboard 104 (lists, menus, graphics, etc.) and detecting human entries on a touch surface of the display/keyboard 104. These functions are shown as a set of human interface applications (HIA) 130. The HIA 130 may also receive speech audio from the microphone 102 through the analog-to-digital (A/D) converter 125, then perform speech recognition of the speech and respond to commands made by speech. The HIA 130 may also send tones, such as ring tones, to the speaker 106 through the digital-to-analog (D/A) converter 135. The human interface system 120 may comprise other human interface devices not shown in
The radio system 199 is a system that comprises a processing system and electronic components that support the processing system, such as peripheral I/O circuits and power control circuits, as well as electronic components that interface to the antenna, such as RF amplifiers. The processing system comprises a central processing unit (CPU) and memory. The CPU processes software instructions stored in the memory that primarily relate to radio interface aspects of the mobile communication device 100, such as transmitting digitized signals that have been encoded into data packets (shown as transmitter system 170) and receiving data packets that are decoded to digitized signals (shown as receiver system 140). But for the antenna 108 and certain radio frequency interface portions of receiver system 140 and transmitter system 170 (not explicitly shown in
The receiver system 140 is coupled to the antenna 108. The antenna 108 intercepts radio frequency (RF) signals that may include a channel having a digitally encoded signal. The intercepted signal is coupled to the receiver system 140, which decodes the signal and, in these embodiments, couples a recovered digital signal to the human interface system 120, which converts it to an analog signal to drive a speaker. In other embodiments, the recovered digital signal may be used to present an image or video on a display of the human interface system 120. The transmitter system 170 accepts a digitized signal 126 from the human interface system 120. The digitized signal 126 may be, for example, a digitized speech signal, digitized music signal, digitized image signal, or digitized video signal, and may be coupled from the receiver system 140, stored in the wireless electronic communication device 100, or sourced from an electronic device (not shown) coupled to the electronic communication device 100. The digitized signal is one that has been sampled at a periodic digitizing sampling rate. The digitizing sampling rate may be, for example, 8 KHz, 16 KHz, 32 KHz, 48 KHz, or another sampling rate that is not necessarily a multiple of 8 KHz. It will be appreciated that the bandwidth of the signal being sampled may be less than ½ the sampling rate. For example, in some embodiments a signal having a bandwidth of 12 KHz may have been sampled at a 48 KHz sampling rate. The transmitter system 170 analyzes and encodes the digitized signal 126 into digital packets that are transmitted on an RF channel by antenna 108.
The transmitter system 170 comprises an audio coding function 181 that periodically analyzes the samples of the digitized signal and encodes them into bandwidth efficient code words 182. The code words 182 are generated at a bit rate determined by a frequency analysis of the digitized signal 126 and by a bit rate value 141 that is received in a message from a network device and coupled from the receiver system 140 to the audio coding function 181. In some embodiments, a bit rate value 141 received from a network may define a permitted bit rate that the device 100 must not exceed for transmissions to the network; such a value would typically be determined by a network operator or network device based on the current network traffic loading. In some embodiments, the bit rate value may define a permitted bit rate that the device 100 must meet as an average value, with instantaneous values allowed within some tolerance (e.g., not more than 10% above the average value). An example of this type of bit rate value is one that restricts the transmission bit rate used by the device 100 in accordance with a fee structure. In some embodiments, the bit rate value 141 may be coupled from the human interface system 120 instead of the receiver system 140. A packet generator 187 uses the code words 182 to form packets that are coupled to an RF transmitter 190 for amplification and are then radiated by antenna 108.
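The average-with-tolerance form of the bit rate value 141 can be made concrete with a small compliance check. This is a minimal Python sketch, assuming the device records one instantaneous bit rate per frame and that the average is the arithmetic mean over the recorded frames; the function name `meets_bit_rate_value` and its parameters are hypothetical, and the 10% default simply mirrors the example tolerance above.

```python
def meets_bit_rate_value(frame_rates_bps, permitted_bps, tolerance=0.10):
    """Check per-frame bit rates against a permitted average bit rate.

    Hypothetical helper: the mean of the frame rates must not exceed
    permitted_bps, and no single frame may exceed permitted_bps by more
    than `tolerance` (0.10 means 10% above the permitted average).
    """
    if not frame_rates_bps:
        return True  # nothing transmitted, so nothing violates the limit
    average = sum(frame_rates_bps) / len(frame_rates_bps)
    ceiling = permitted_bps * (1.0 + tolerance)
    return average <= permitted_bps and max(frame_rates_bps) <= ceiling
```

With a permitted rate of 24000 bit/s, frames of 24000, 25000 and 23000 bit/s pass (average 24000, peak within 10%), while frames of 20000 and 27000 bit/s fail because the 27000 bit/s frame exceeds the 26400 bit/s ceiling even though the average is below the limit.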
Referring to
Referring to
The energy analysis is coupled to the band split functions 310-325, which determine the total amount of energy in each sub-band. The sub-band ranges for an example that will be used herein are 0-7 KHz for band split #1 310, 7-8 KHz for band split #2 315, 8-16 KHz for band split #3 320, and 16-20 KHz for band split #4 (not shown in
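The band split step can be sketched as summing spectral power over the frequency bins that fall inside each sub-band. This Python sketch is illustrative only: the name `subband_energies`, the use of a plain DFT, and the rule that a bin lying exactly on a shared edge is credited to the lower sub-band are all assumptions rather than details of the band split functions 310-325.

```python
import cmath
import math


def subband_energies(frame, sample_rate_hz, edges_khz):
    """Return the total spectral energy of one frame in each sub-band.

    frame: real-valued samples; sample_rate_hz: e.g. 48000;
    edges_khz: ascending sub-band edges in KHz, e.g. [0, 7, 8, 16, 20].
    A naive O(N^2) DFT keeps the sketch dependency-free; a real encoder
    would more likely use an FFT or a filter bank.
    """
    n = len(frame)
    energies = [0.0] * (len(edges_khz) - 1)
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq_khz = k * sample_rate_hz / n / 1000.0
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        for b in range(len(energies)):
            lo, hi = edges_khz[b], edges_khz[b + 1]
            # A bin on a shared edge counts toward the lower sub-band and the
            # DC bin counts toward the first one; bins above the top edge
            # match no sub-band and are ignored.
            if lo < freq_khz <= hi or (b == 0 and freq_khz == lo):
                energies[b] += power
                break
    return energies
```

For a 10 KHz tone sampled at 48 KHz over 96 samples, nearly all of the energy is credited to the third entry, which is the 8-16 KHz sub-band when the edges listed above are used.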
Referring back to
Referring to Table 1, a set of threshold values is shown, in accordance with certain embodiments. The set is one that could be used for the example that has been described herein above, and may be included in the bias table 370 (
It will be appreciated that when the energy density is uniform, the total energy in each sub-band would be, from the lowest sub-band to the highest sub-band, 35, 5, 20, and 40, respectively. When the bit rate value 141 is Low and the energy density is uniform, the respective outputs of the threshold-with-hysteresis functions 350-365, from lowest to highest, would be TRUE, FALSE, FALSE, and FALSE because the only threshold that is exceeded is the one for 0-7 KHz. Since the highest sub-band for which the threshold output is TRUE is the 0-7 KHz sub-band, the selected bandwidth is 7 KHz. When the energy density is uniform and the bit rate value 141 is High, the respective outputs of the threshold-with-hysteresis functions 350-365, from lowest to highest, would be TRUE, TRUE, FALSE, and TRUE. Since the highest sub-band for which the threshold output is TRUE is the 12-20 KHz sub-band, the threshold logic function 215 selects the protocol that provides a 20 KHz bandwidth. Below plots 405, 410 in
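The arithmetic and the selection rule in this example can be checked with a short sketch. It assumes a total energy of 100 spread uniformly over 0-20 KHz and sub-band edges at 0, 7, 8, 12 and 20 KHz, which are the edges that yield the 35, 5, 20 and 40 split and the 12-20 KHz top sub-band named in the example. The helper names and the fallback used when no output is TRUE are hypothetical.

```python
def uniform_subband_energies(edges_khz, total_energy=100.0):
    """Energy per sub-band when the energy density is flat across all edges."""
    span = edges_khz[-1] - edges_khz[0]
    return [total_energy * (hi - lo) / span
            for lo, hi in zip(edges_khz, edges_khz[1:])]


def select_bandwidth_khz(threshold_outputs, edges_khz):
    """Return the upper edge of the highest sub-band whose output is TRUE.

    threshold_outputs: booleans from the threshold-with-hysteresis
    functions, ordered from the lowest sub-band to the highest.
    """
    for band in range(len(threshold_outputs) - 1, -1, -1):
        if threshold_outputs[band]:
            return edges_khz[band + 1]
    return edges_khz[1]  # hypothetical fallback: the lowest-bandwidth protocol


EDGES_KHZ = [0, 7, 8, 12, 20]
print(uniform_subband_energies(EDGES_KHZ))                           # [35.0, 5.0, 20.0, 40.0]
print(select_bandwidth_khz([True, False, False, False], EDGES_KHZ))  # Low: 7
print(select_bandwidth_khz([True, True, False, True], EDGES_KHZ))    # High: 20
```

Scanning from the top sub-band downward means a TRUE output in a higher sub-band wins even when an intermediate sub-band (here 8-12 KHz) is FALSE, which matches the High case above.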
In certain embodiments, if there is a maximum permitted transmit data rate that would be exceeded by using one or more of the selectable bandwidths, the transmitter system 170 may include logic that prevents protocols having such bandwidths from being used, by limiting the selection to lower bandwidth protocols that always keep the transmitted data rate below the maximum permitted transmitted data rate. This additional restriction may be incorporated in the threshold logic function 215 based on an indication carried in a protocol message received by the receiver system 140. The indication could be used, for example, to select one of several different tables of values, some of which have thresholds chosen to preclude the use of high bandwidths, or by logic that lowers the selected bandwidth if it would otherwise result in an excessive transmitted data rate.
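The second option, logic that lowers an otherwise selected bandwidth, might look like the following sketch. The protocol table and its bit rates are hypothetical placeholders, and the fallback to the narrowest protocol when nothing fits is an assumption; a real system would take both from the encoding protocols actually in use.

```python
# Hypothetical protocol table: audio bandwidth (KHz) -> channel bit rate (bit/s).
# The rates are placeholders for illustration only.
PROTOCOL_RATES_BPS = {7: 16000, 8: 20000, 12: 28000, 20: 40000}


def cap_bandwidth_khz(selected_khz, max_rate_bps, rates=PROTOCOL_RATES_BPS):
    """Return the widest bandwidth not above `selected_khz` whose protocol
    stays within `max_rate_bps`.

    If no protocol fits, fall back to the narrowest one (an assumption; in
    that case the limit is still exceeded and would need handling elsewhere).
    """
    allowed = [bw for bw, rate in rates.items()
               if bw <= selected_khz and rate <= max_rate_bps]
    return max(allowed) if allowed else min(rates)
```

For example, if the threshold logic selects 20 KHz but the permitted rate is 30000 bit/s, the sketch returns 12 KHz, the widest placeholder protocol at or below 30000 bit/s.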
It will be appreciated that, by having the flexibility to define sets of threshold values (and, in some embodiments, corresponding hysteresis values) that are selected by choosing a bit rate value, the average transmitted bit rate can be lowered in accordance with channel conditions while audio quality is maintained better than when bit rate restrictions are imposed in systems that use conventional techniques. In some embodiments, it is desirable to match the audio bandwidth of the encoding protocol as closely as possible to that of the input signal while the bandwidth of the input signal varies over time. In other words, the threshold values are empirically determined so that the audio bandwidths of the encoding protocols that are sequentially selected during an input signal track the varying bandwidth of the input signal. The input signal used for this determination is one or more audio sequences typical of those that are expected to be encoded. Such a configuration would be appropriate to achieve medium channel bit rates (a so-called Med bit rate setting). In some embodiments, when, for example, the channel bit rate available to the encoding protocol is limited and better sounding synthetic audio is produced when the input signal bandwidth is reduced, the sub-band spectral analysis function 210 may be biased such that lower audio bandwidth encoding protocols are favoured (a so-called Low bit rate setting). In some embodiments, when a higher channel bit rate is available to the encoding protocol, the sub-band spectral analysis function 210 may be biased such that higher audio bandwidth encoding protocols are favoured (a so-called High bit rate setting). In some embodiments, a change in the bit rate value during the audio signal alters the selection of the set of thresholds from the available sets as soon as practical within the constraints of the encoding protocols that are used, which provides a quicker change of the average channel bit rate.
This allows better control of the combined bandwidth of several devices that are using a shared bandwidth.
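Switching between the Low, Med and High threshold sets when a new bit rate value arrives can be sketched as a small state holder. This assumes, hypothetically, that "as soon as practical" means the next frame boundary of the encoding protocol; the class and method names are illustrative only.

```python
VALID_SETTINGS = ("Low", "Med", "High")


class BitRateSettingSelector:
    """Tracks which threshold set (Low, Med or High) the encoder uses.

    A newly received bit rate value is latched at once but only takes effect
    at the next frame boundary (hypothetical model of "as soon as practical").
    """

    def __init__(self, initial="Med"):
        self.active = initial
        self.pending = None

    def on_bit_rate_value(self, setting):
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown bit rate setting: {setting!r}")
        self.pending = setting

    def on_frame_boundary(self):
        # Apply any pending change before the next frame is analysed.
        if self.pending is not None:
            self.active, self.pending = self.pending, None
        return self.active
```

Deferring the change to a frame boundary is one way to respect protocol constraints; an encoder whose protocols allow mid-frame switching could apply the new set immediately instead.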
Lower audio bandwidth encoding protocols being “favoured” means that the thresholds are empirically set so that, by default, the output is encoded using a low audio bandwidth encoding protocol, switching only for limited time periods to a higher bandwidth encoding protocol whose channel bit rate is similar to that of the low audio bandwidth encoding protocol (e.g., within 10% in some embodiments; in other embodiments the similarity tolerance may be as high as 50%). This switching occurs when the energy in a higher sub-band is large enough that the perceptual advantage of encoding the higher audio bandwidth outweighs the degradation caused by reducing the number of encoding bits allocated to the audio signal within the lower audio bandwidths. The low audio bandwidth encoding protocol encodes a bandwidth that includes the lowest audio sub-band and may include higher sub-band(s) up to and including a particular higher audio sub-band (but not the highest sub-band). The low audio bandwidth is determined based on input signals of the type expected to be encoded, and may be determined by theoretical methods (e.g., accuracy) or empirical methods (e.g., expert listening or Mean Opinion Score (MOS) tests), or may be the lowest encoding protocol bandwidth usable in a system at a particular time. Higher audio bandwidth encoding protocols being “favoured” means that the thresholds are empirically set so that the output is encoded using a high audio bandwidth encoding protocol, switching to a lower bandwidth encoding protocol only during time periods in which the high frequency energy (e.g., the energy corresponding to the top sub-band of the input signal) is imperceptible to the average listener. The high audio bandwidth encoding protocol encodes a bandwidth that includes the highest audio sub-band and may include lower sub-band(s) down to and including a particular lower audio sub-band.
The high audio bandwidth is determined based on input signals of the type expected to be encoded, and may be determined by theoretical methods (e.g., accuracy) or empirical methods (e.g., expert listening or Mean Opinion Score (MOS) tests), or may be the highest encoding protocol bandwidth usable in a system at a particular time. The empirically determined threshold settings for the above described Med, Low, and High bit rates could be used in a single embodiment in the form of a correspondence table such as the one shown in Table 1 (but containing the empirically determined values). The first and second hysteresis values could also be empirically determined for the Med, Low, and High bit rates in the single embodiment. The first and second hysteresis values may be the same for the transitions in each of the Med, Low, and High bit rates.
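A correspondence table of this kind, the threshold-with-hysteresis functions that consume it, and the rate-similarity test described for favouring low bandwidths can be sketched together. All numbers in `BIAS_TABLE` are placeholders, not Table 1 values; they were chosen only so that the uniform-density energies of 35, 5, 20 and 40 give the TRUE/FALSE patterns described for the Low and High settings. Treating the first hysteresis value as a rising margin and the second as a falling margin, and reading "similar" as a symmetric relative tolerance, are likewise assumptions.

```python
from dataclasses import dataclass

# Hypothetical correspondence table in the spirit of Table 1. Each bit rate
# setting has one threshold per sub-band (lowest first) plus a first (rising)
# and second (falling) hysteresis value. All numbers are placeholders.
BIAS_TABLE = {
    "Low":  {"thresholds": [10.0, 30.0, 60.0, 80.0], "hyst_up": 2.0, "hyst_down": 2.0},
    "Med":  {"thresholds": [10.0, 10.0, 30.0, 50.0], "hyst_up": 2.0, "hyst_down": 2.0},
    "High": {"thresholds": [5.0, 2.0, 25.0, 20.0], "hyst_up": 1.0, "hyst_down": 1.0},
}


@dataclass
class HysteresisThreshold:
    """One threshold-with-hysteresis function; remembers its last output."""
    state: bool = False

    def update(self, energy, threshold, hyst_up, hyst_down):
        # Turn TRUE only above threshold + hyst_up and FALSE only below
        # threshold - hyst_down, so energies hovering near the threshold
        # do not make the selected protocol toggle every frame.
        if not self.state and energy > threshold + hyst_up:
            self.state = True
        elif self.state and energy < threshold - hyst_down:
            self.state = False
        return self.state


def threshold_outputs(energies, setting, comparators):
    """Run every sub-band comparator with the threshold set for `setting`."""
    row = BIAS_TABLE[setting]
    return [c.update(e, t, row["hyst_up"], row["hyst_down"])
            for c, e, t in zip(comparators, energies, row["thresholds"])]


def rate_is_similar(candidate_bps, low_bw_bps, tolerance=0.10):
    """True if a higher-bandwidth protocol's channel bit rate is within
    `tolerance` (relative) of the low audio bandwidth protocol's rate."""
    return abs(candidate_bps - low_bw_bps) <= tolerance * low_bw_bps
```

Because each comparator keeps its state between frames, a change of bit rate setting only alters the thresholds applied from the next call onward while the hysteresis state carries over.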
Referring to
Referring to
The processes illustrated in this document, for example (but not limited to) the method steps described in
In the foregoing specification, specific embodiments of the present invention have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. For example, in some embodiments some method steps may be performed in a different order than that described, and the functions described within functional blocks may be arranged differently (e.g., the bias table 370 and the threshold-with-hysteresis blocks 350-365 could be part of the threshold logic function 215 instead of the sub-band spectral analysis function 210). As another example, any specific organizational and access techniques known to those of ordinary skill in the art may be used for tables such as the bias table 370. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all of the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application and all equivalents of those claims as issued.
Number | Date | Country |
---|---|---|
20130151260 A1 | Jun 2013 | US |