There are many situations where it is necessary or desirable for audio communication to occur with low latency in limited-bandwidth environments where interference can cause data transmission errors. As one example, modern hearing aids and other hearable devices support low latency audio communication with various electronic devices. Bandwidth and latency requirements can generally be reduced using audio compression techniques that remove unnecessary redundancy from the signal. One popular compression technique is adaptive differential pulse code modulation (ADPCM), some modifications of which enhance robustness to transmission errors, though at a significant performance cost, whether measured in terms of reproduction quality or compression rate. In “Error Resilience Enhancement for a Robust ADPCM Audio Coding Scheme” (2014 IEEE ICASSP, pp. 3685-3689), which is hereby incorporated herein by reference, Simkus et al. propose one approach that achieves improved performance but unfortunately requires the use of a sideband channel. In many contexts, it would be infeasible or unnecessarily complex to provide for communication of such sideband channel information.
Accordingly, there are disclosed herein devices, systems, and methods employing adaptive differential pulse code modulation (ADPCM) techniques providing for optimum performance even while ensuring robustness against transmission errors. One illustrative audio communication device includes: a difference element that produces a sequence of prediction error values by subtracting a sequence of predicted audio sample values from a sequence of audio samples; a scaling element that produces a sequence of scaled error values by dividing each prediction error value by a corresponding envelope estimate; a quantizer that operates on the sequence of scaled error values to produce a sequence of quantized error values; a multiplier that uses the corresponding envelope estimates to produce a sequence of reconstructed error values; a predictor that produces the sequence of predicted audio sample values based on reconstructed audio samples derived from the sequence of reconstructed error values; and an envelope estimator. The envelope estimator includes: an updater that applies a dynamic gain to the reconstructed error values to produce a sequence of update values; and an integrator that combines each of the update values with the corresponding envelope estimate to produce a subsequent envelope estimate.
An illustrative audio communication receiver receives an audio data stream conveying a sequence of quantized error values, and includes: a multiplier that uses corresponding envelope estimates to produce a sequence of reconstructed error values based on the sequence of quantized error values; a summation element that combines the sequence of reconstructed error values with a sequence of predicted audio sample values to produce a sequence of reconstructed audio samples; a predictor that produces the sequence of predicted audio sample values based on the sequence of reconstructed audio samples; and an envelope estimator. The envelope estimator includes: an updater that applies a dynamic gain to the reconstructed error values to produce a sequence of update values; and an integrator that combines each of the update values with the corresponding envelope estimate to produce a subsequent envelope estimate.
An illustrative audio communication method includes: obtaining a sequence of quantized error values from an audio data stream; using corresponding envelope estimates to produce a sequence of reconstructed error values based on the sequence of quantized error values; combining the sequence of reconstructed error values with a sequence of predicted audio sample values to produce a sequence of reconstructed audio samples; producing the sequence of predicted audio sample values based on the sequence of reconstructed audio samples; and deriving the corresponding envelope estimates. The estimates are derived by: applying a dynamic gain to the reconstructed error values to produce a sequence of update values; and combining each of the update values with the corresponding envelope estimate to produce a subsequent envelope estimate.
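The receiver-side method above can be sketched in Python. The dequantization table, the leaky one-tap predictor, and all numeric constants here are illustrative assumptions, not taken from the disclosure:

```python
def decode(quantized_errors, dequant_table, beta=0.98, w=0.1, gain=0.5, v0=1.0):
    """Reconstruct audio samples from quantized error values (illustrative sketch)."""
    v = v0            # envelope estimate
    prediction = 0.0  # predicted audio sample
    samples = []
    for q in quantized_errors:
        e_hat = dequant_table[q] * v          # reconstructed error value
        x_hat = prediction + e_hat            # reconstructed audio sample
        samples.append(x_hat)
        # envelope estimator: a gain applied to the reconstructed error,
        # then lossy integration with damping factor beta (all assumed values)
        v = max(1e-6, ((1 - w) * v ** (2 * beta) + w * (gain * e_hat) ** 2) ** 0.5)
        prediction = 0.9 * x_hat              # leaky one-tap predictor (assumption)
    return samples
```

Note that the envelope estimate is derived solely from reconstructed error values, so the transmitter and receiver can track the same quantity without exchanging side information.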
Each of these illustrative embodiments may be employed separately or conjointly, and may optionally include one or more of the following features in any suitable combination: 1. the quantizer is nonlinear. 2. a dequantizer that operates on the sequence of quantized error values to provide the multiplier with reconstructed scaled error values. 3. an encoder that converts the sequence of quantized error values into an audio data stream for storage or transmission. 4. a decoder that, based on the audio data stream, supplies the dequantizer with the sequence of quantized error values. 5. the dynamic gain at the input of the envelope estimator varies based on the previous envelope estimate. 6. the dynamic gain decreases from a maximum gain value to a minimum gain value as the corresponding envelope estimate increases. 7. the envelope estimator includes: a second difference element that determines a difference between the maximum gain value and a scaled version of the corresponding envelope estimate; and a range limiter that produces the dynamic gain by limiting the difference to a range between the minimum and maximum gain values. 8. the envelope estimator includes a comparator to select a larger weight factor for the update values having a larger magnitude than the corresponding envelope estimate and a smaller weight factor for the update values having a smaller magnitude than the corresponding envelope estimate.
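Features 5 through 7 describe a dynamic gain that falls off from a maximum toward a minimum as the envelope estimate grows. A minimal sketch of that computation, with purely illustrative constants:

```python
def dynamic_gain(envelope, g_max=1.0, g_min=0.1, slope=0.05):
    """Dynamic gain per features 5-7: decreases from g_max toward g_min
    as the corresponding envelope estimate increases. Constants are
    illustrative assumptions, not values from the disclosure."""
    g = g_max - slope * envelope        # second difference element (feature 7)
    return min(g_max, max(g_min, g))    # range limiter (feature 7)
```

For example, a small envelope yields the maximum gain, while a large envelope drives the gain down to its floor.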
It should be understood that the following description and accompanying drawings are provided for explanatory purposes, not to limit the disclosure. In other words, they provide the foundation for one of ordinary skill in the art to recognize and understand all modifications, equivalents, and alternatives falling within the scope of the claims.
The present disclosure is best understood in light of a suitable application, which is described below as context.
Illustrated media device 106 is a television generating sound 112 as part of an audiovisual presentation, but other sound sources are also contemplated including doorbells, (human) speakers, audio speakers, computers, and vehicles. Illustrated media device 108 is a mobile phone, tablet, or other processing device, which may have access to a network access point 110 (shown here as a cell tower). Media device 108 sends and receives streaming data 114 potentially representing sound to enable a user to converse with (or otherwise interact with) a remote user, service, or computer application. Arrays of one or more microphones 118 and 120 may receive sound 112, which the devices 102, 104 may digitize, process, and play through earphone speakers 119, 121 in the ear canal. The wireless audio devices 102, 104 employ a low latency streaming link 116 to convey the digitized audio between them, enabling improved audio signals to be rendered by the speakers 119, 121.
Various suitable implementations exist for the low latency streaming link 116, such as a near field magnetic induction (NFMI) protocol, which can be implemented with a carrier frequency of about 10 MHz. NFMI enables dynamic exchange of data between audio devices 102, 104 at low power levels, even when on opposite sides of a human head. Streaming data 114 is more typically conveyed via Bluetooth or Bluetooth Low Energy (BLE) protocols.
For CROS and BiCROS operation, the audio devices detect, digitize, and apply monaural processing to the sound received at that ear. One or both of the audio devices convey the digitized sound as a cross-lateral signal to the other audio device via the dedicated point-to-point link 116. The receiving device(s) apply a binaural processing operation to combine the monaural signal with the cross-lateral signal before converting the combined signal to an in-ear audio signal for delivery to the user's ear. Audio data streaming entails rendering (“playing”) the content represented by the data stream as it is being delivered. CROS and audio data streaming employ wireless network packets to carry the data payloads to the target device. Channel noise and interference may cause packet loss, so the various protocols may employ varying degrees of buffering and redundancy, subject to relatively strict limits on latency. For example, latencies in excess of 20 ms are noticeable to participants in a conversation and widely regarded as undesirable. To support CROS and BiCROS features, very low latencies (e.g., below 5 ms end-to-end) are required to avoid undesirable “echo” effects. In energy-limited applications such as hearing aids, the latency requirements must be met while the operation is subject to strict power consumption limits.
A signal detection unit 214 collects, filters, and digitizes signals from local input transducers 216 (such as a microphone array). The detection unit 214 further provides direct memory access (DMA) transfer of the digitized signal data into the system memory 212, with optional digital filtering and downsampling. Conversely, a signal rendering unit 218 employs DMA transfer of digital signal data from the system memory 212, with optional upsampling and digital filtering prior to digital-to-analog (D/A) conversion. The rendering unit 218 may amplify the analog signal(s) and provide them to local output transducers 220 (such as a speaker or piezoelectric transducer array).
Controller 208 extracts digital signal data from the wireless streaming packets received by radio module 204, optionally buffering the digital signal data in system memory 212. As signal data is acquired by the signal detection unit 214, the controller 208 may collect it and perform audio compression to form data payloads for the radio module to frame and send, e.g., as cross-lateral data via the point-to-point wireless link 116. The controller 208 may provide error correction code encoding to add controlled redundancy for protection against errors in transmitted data, and conversely may employ an error correction code decoder to detect bit errors in received data, correcting them if possible prior to performing decompression to convert the received audio data into a received audio stream. Latency and power consumption restrictions may limit audio compression and complexity.
The controller 208 or the signal rendering unit 218 combines the acquired digital signal data with the wirelessly received signal data, applying filtering and digital signal processing as desired to produce a digital output signal which may be directed to the local output transducers 220. Controller 208 may further include general purpose input/output (GPIO) pins to measure the states of control potentiometers 222 and switches 224, using those states to provide for manual or local control of on/off state, volume, filtering, and other rendering parameters. At least some contemplated embodiments of controller 208 include a RISC processor core, a digital signal processor core, special purpose or programmable hardware accelerators for filtering, array processing, and noise cancelation, as well as integrated support components for power management, interrupt control, clock generation, and standards-compliant serial and parallel wiring interfaces.
The software or firmware stored in memories 210, 212 may cause the processor core(s) of the controller 208 to implement a low-latency wireless streaming method using ADPCM compression with enhanced performance as described further below. Alternatively, the controller 208 may implement this method using application-specific integrated circuitry.
As the compression process removes most of the signal redundancy, an error correction code (ECC) encoder 304 re-introduces a controlled amount of redundancy to enable error detection and correction (within limits). The added redundancy may take the form of parity bits sufficient to enable correction of a single bit error in each data packet.
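As one concrete illustration of such parity protection, a Hamming(7,4) code corrects any single bit error in a 7-bit block; the disclosure only requires parity sufficient for single-error correction, so this particular code is an assumption:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    p1 = d[0] ^ d[1] ^ d[3]   # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers codeword positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_correct(c):
    """Correct up to one bit error in a 7-bit codeword and return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```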
Box 306 represents a digital communications channel that includes a modulator to convert the ECC-encoded digital audio data dk into channel symbols, a transmitter to send the channel symbols across a wireless signaling medium, and a receiver-demodulator that receives potentially-corrupted channel symbols from the signaling medium and converts them to estimated digital audio data d̂k that potentially includes bit errors. An ECC decoder 308 operates on the estimated digital audio data to detect one or more bit errors in each packet, correcting them when possible (e.g., when only a single error is present).
An audio decompressor 310 reverses the operation of compressor 302 to reconstruct a stream of digital audio samples âk from the stream of audio error samples q̂k. A digital to analog converter 312 converts the stream of digital audio samples into an analog audio signal at, which a speaker or other audio transducer 314 converts into a sound signal st.
Elements 412-422 mimic the operation of the receiving device so as to enable the receiving device to reconstruct the audio sample stream xk from the quantized error values qk. A dequantizer 412 converts the quantized error value qk into a reconstructed version of the scaled error value. A multiplier 414 multiplies this scaled error value by the envelope estimate vk-1 to obtain a reconstructed error value êk. An envelope estimator 418 operates on the sequence of reconstructed error values êk to provide the envelope estimate vk to a delay element 416, which makes the preceding estimate vk-1 available to the multiplicative inverter 408 and multiplier 414. A summation element 420 adds the reconstructed error values êk to the predicted value to obtain the reconstructed audio sample stream x̂k. The prediction filter 422 operates on the reconstructed audio sample stream x̂k to obtain the next audio sample prediction which is used by difference element 402.
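The encoder loop formed by elements 402-422 can be sketched as follows. The quantizer levels, damping factor, fixed update weight (standing in here for the dynamic gain), and one-tap predictor are all illustrative assumptions:

```python
def adpcm_encode(samples, levels=(-1.5, -0.5, 0.5, 1.5), beta=0.98, w=0.1, v0=1.0):
    """Illustrative sketch of elements 402-422; constants are assumptions."""
    v_prev = v0       # envelope estimate from delay element 416
    prediction = 0.0  # output of prediction filter 422
    out = []
    for x in samples:
        e = x - prediction                     # difference element 402
        scaled = e / v_prev                    # scaling via multiplicative inverter 408
        q = min(range(len(levels)),            # quantizer 410: nearest level index
                key=lambda i: abs(levels[i] - scaled))
        out.append(q)
        e_hat = levels[q] * v_prev             # dequantizer 412 and multiplier 414
        x_hat = prediction + e_hat             # summation element 420
        # envelope estimator 418 with lossy integration; result latched for next step
        v_prev = max(1e-6, ((1 - w) * v_prev ** (2 * beta) + w * e_hat ** 2) ** 0.5)
        prediction = 0.9 * x_hat               # prediction filter 422 (leaky one-tap)
    return out
```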
The audio compressor and decompressor make the best use of the available bit resolution for the quantized error qk when the envelope estimators 418 provide an accurate scale factor for matching the range of the prediction error ek to that of the quantizer 410. For faithful reconstruction of the audio sample stream, the envelope estimate on the receiver side must converge with that on the transmit side, even in the presence of data transmission errors. Estimators 418 use lossy integration with a damping factor β chosen to provide the desired tradeoff between robustness and performance. Fidelity of the reconstructed audio sample stream quickly degrades when scaled prediction errors exceed the range of the quantizer, which can occur when the envelope estimate is overly damped.
In the integration operation, the selected parameter sets the weighting between the previous envelope value and the new error contribution. A difference element 512 subtracts the selected parameter value from one to obtain the weight for the previous envelope value. A multiplier 514 multiplies the damped (squared) previous envelope value with the calculated weight, while another multiplier 516 multiplies the (squared) amplified error value by the selected parameter value. An adder 520 combines the weighted values to obtain the new squared envelope estimate. A square root element 522 takes the square root to provide the new envelope estimate. A limiter 524 may be used to ensure the envelope estimate vk does not exceed a maximum value or fall below a minimum value.
A delay element 526 latches the envelope estimate vk to make a previous envelope estimate vk-1 available for use. A power element 518 calculates the damped squared previous envelope value vk-1^(2β), where β is the damping factor chosen to provide robustness against transmission errors. The damping factor β is in the range between zero and one. Setting β equal to one would provide no protection against transmission errors. As β decreases toward zero, the rate of recovery from transmission errors increases at the expense of reduced audio quality.
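One step of this estimator, operating in the squared domain as described by elements 512-524, might look like the following (the weight, gain, and limits are illustrative assumptions):

```python
def envelope_update(v_prev, e_hat, gain, w, beta, v_min=1e-4, v_max=1e4):
    """One envelope-estimator step:
    v_k = sqrt((1-w) * v_{k-1}^(2*beta) + w * (gain * e_hat)^2), then range-limited."""
    damped_sq = v_prev ** (2 * beta)            # power element 518
    update_sq = (gain * abs(e_hat)) ** 2        # amplified error value, squared
    v_sq = (1 - w) * damped_sq + w * update_sq  # multipliers 514, 516 and adder 520
    v = v_sq ** 0.5                             # square root element 522
    return min(v_max, max(v_min, v))            # limiter 524
```

With β equal to one and a matched input, the estimate holds steady; with β below one and no input energy, it decays, which is what allows a receiver-side estimate to forget the effect of a corrupted sample.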
The envelope estimator of
The inventor has observed that the use of a dynamic gain drastically accelerates the recovery from transmission errors, as any resulting mismatch in the encoder's and decoder's envelope detector values is corrected on the decoder side by the combined effects of the damping factor and the mismatch in the dynamic gain. This accelerated correction obviates any incentive for communicating the transmitter's dynamic gain and envelope values via a side channel or other means.
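This recovery behavior can be demonstrated with a small simulation: two estimators (standing in for transmitter and receiver) process the same reconstructed-error sequence from mismatched starting states, as would occur after a transmission error, and the mismatch decays toward zero. All parameter values here are illustrative assumptions:

```python
def step(v, e, beta=0.9, w=0.2, g_max=1.0, g_min=0.1, slope=0.05):
    """One envelope-estimator step with dynamic gain and damping (assumed constants)."""
    g = min(g_max, max(g_min, g_max - slope * v))   # dynamic gain from envelope
    return ((1 - w) * v ** (2 * beta) + w * (g * e) ** 2) ** 0.5

v_tx, v_rx = 1.0, 4.0                    # receiver's envelope disturbed by an error
errors = [0.5, -0.8, 0.3, 1.1, -0.4] * 20   # shared reconstructed-error sequence
for e in errors:
    v_tx, v_rx = step(v_tx, e), step(v_rx, e)
mismatch = abs(v_tx - v_rx)              # shrinks by orders of magnitude
```

Because the damping and the envelope-dependent gain both pull the larger estimate down faster than the smaller one, no side channel is needed to resynchronize the two estimators.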
While the foregoing discussion has focused on audio streaming in the context of hearing aids, the foregoing principles are expected to be useful for many applications, particularly those involving low latency wireless audio streaming to or from smart phones or other devices. Any of the controllers described herein, or portions thereof, may be formed as a semiconductor device using one or more semiconductor dice. Though the operations shown and described in
It will be appreciated by those skilled in the art that the words during, while, and when as used herein relating to circuit operation are not exact terms that mean an action takes place instantly upon an initiating action, but that there may be some small but reasonable delay(s), such as various propagation delays, between the initial action and the reaction that it initiates. Additionally, the term while means that a certain action occurs at least within some portion of a duration of the initiating action. The use of the word approximately or substantially means that a value of an element has a parameter that is expected to be close to a stated value or position. The terms first, second, third and the like in the claims and/or in the Detailed Description or the Drawings, as used in a portion of a name of an element, are used for distinguishing between similar elements and not for describing a sequence, either temporally, spatially, in ranking, or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments described herein are capable of operation in other sequences than described or illustrated herein. Inventive aspects may lie in less than all features of any one given implementation example. Furthermore, while some implementations described herein include some but not other features included in other implementations, combinations of features of different implementations are meant to be within the scope of the invention, and form different embodiments as would be understood by those skilled in the art.
The present application claims priority to Provisional U.S. Application 63/260,431, filed 2021 Aug. 19 and titled “Transmission Error Robust Adaptive Quantization Step Adjustment with Rapid and Optimum Response” by inventor Erlam Onat, which is hereby incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
7430255 | Shibuya et al. | Sep 2008 | B2 |
8601338 | Kolze | Dec 2013 | B2 |
8649523 | Chau | Feb 2014 | B2 |
9754601 | Hirschfeld | Sep 2017 | B2 |
11545164 | Petersen | Jan 2023 | B2 |
20090254783 | Hirschfeld | Oct 2009 | A1 |
20120243710 | Chau | Sep 2012 | A1 |
20130204630 | Ragot | Aug 2013 | A1 |
20160064007 | Villemoes | Mar 2016 | A1 |
20170330572 | Johnston | Nov 2017 | A1 |
Entry |
---|
David L. Cohn et al., “The Relationship Between an Adaptive Quantizer and a Variance Estimator,” IEEE Transactions on Information Theory, Nov. 1975, pp. 669-671. |
Gediminas Simkus et al., “Error Robust Delay-Free Lossy Audio Coding Based on ADPCM,” Proc. of the 16th Int. Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland, Sep. 2-5, 2013, 8 pages. |
Gediminas Simkus et al., “Error Resilience Enhancement for a Robust ADPCM Audio Coding Scheme,” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 3685-3689. |
Number | Date | Country | |
---|---|---|---|
20230058583 A1 | Feb 2023 | US |
Number | Date | Country | |
---|---|---|---|
63260431 | Aug 2021 | US |