The present invention relates generally to an audio reproduction device, and, more particularly, to a headphone.
A system and method for an audio reproduction device are described. Audio sound reproduction devices may include headphones and earbuds. Humans have evolved to hear sounds within physical spaces. The physical configuration of our two ears, with our head between them, and the way in which we perceive sound are the result of the interface with, and the physical characteristics of, the environment within which sounds are created and transported. However, since the introduction of the Walkman® in 1979, headphones (and later earbuds) have become very popular ways to enjoy listening to sound. By closely coupling two sound transducers with our two ears independently, all of the environmental effects and the natural perception of sound are circumvented. This creates a synthetic, artificial listening environment, and substantially changes our psychoacoustic interpretation of the sounds that we hear.
Further, entertainment content such as music and film soundtracks is typically created in carefully designed physical environments (studios and sound stages). Therefore, when the resulting music or film soundtracks are heard through headphones, the psychoacoustic experience is typically significantly different from that intended by the creators, producers, or editors of the content. This presents numerous problems. In some examples, creating content using headphones is highly challenging, which forces creators to rely on carefully designed studio spaces and expensive monitor loudspeakers. In some examples, a listener's psychoacoustic experience while consuming audible content differs when the content is accessed through loudspeakers versus headphones. There is a need to solve one or more of these problems. It is with these needs in mind that this disclosure arises.
In one embodiment, a method for enhancing audio reproduced by an audio reproduction device is disclosed. For a given time period, x samples of audio signals for at least a first channel and a second channel are received by a digital signal processor. The received x samples of audio signals for the first channel are stored in a first portion of an input buffer with 2x positions, and the rest of the x positions in the first portion are padded with zeroes. The received x samples of audio signals for the second channel are stored in a second portion of the input buffer with 2x positions, and the rest of the x positions in the second portion are padded with zeroes. The contents of the first portion and the second portion are transformed to frequency domain components by a frequency domain transformation engine. The transformed frequency domain components are multiplied with first filter coefficients indicative of a short echo, to generate frequency domain components with short echo effect, by a filter coefficient multiplier. The transformed frequency domain components are multiplied with second filter coefficients indicative of a long echo, to generate frequency domain components with long echo effect, by the filter coefficient multiplier. The frequency domain components with short echo effect are converted to time domain components with short echo effect by a time domain transformation engine. The frequency domain components with long echo effect are converted to time domain components with long echo effect by the time domain transformation engine. Selective time domain components with short echo effect and selective time domain components with long echo effect are combined by an overlap adder engine to generate a convolved first channel output and a convolved second channel output.
In another embodiment, a system for enhancing audio reproduced by an audio reproduction device is disclosed. For a given time period, x samples of audio signals for at least a first channel and a second channel are received by a digital signal processor. The received x samples of audio signals for the first channel are stored in a first portion of an input buffer with 2x positions, and the rest of the x positions in the first portion are padded with zeroes. The received x samples of audio signals for the second channel are stored in a second portion of the input buffer with 2x positions, and the rest of the x positions in the second portion are padded with zeroes. The contents of the first portion and the second portion are transformed to frequency domain components by a frequency domain transformation engine. The transformed frequency domain components are multiplied with first filter coefficients indicative of a short echo, to generate frequency domain components with short echo effect, by a filter coefficient multiplier. The transformed frequency domain components are multiplied with second filter coefficients indicative of a long echo, to generate frequency domain components with long echo effect, by the filter coefficient multiplier. The frequency domain components with short echo effect are converted to time domain components with short echo effect by a time domain transformation engine. The frequency domain components with long echo effect are converted to time domain components with long echo effect by the time domain transformation engine. Selective time domain components with short echo effect and selective time domain components with long echo effect are combined by an overlap adder engine to generate a convolved first channel output and a convolved second channel output.
This brief summary has been provided so that the nature of the disclosure may be understood quickly. A more complete understanding of the disclosure can be obtained by reference to the following detailed description of the preferred embodiments thereof in connection with the attached drawings.
The foregoing and other features of several embodiments are now described with reference to the drawings. In the drawings, the same components have the same reference numerals. The illustrated embodiments are intended to illustrate but not limit the invention.
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
The embodiments herein disclose an audio reproduction device. Referring now to the drawings, where similar reference characters denote corresponding features consistently throughout the figures, various examples of this disclosure are described.
According to an example of this disclosure, real-time convolutions are applied to the digital sound signals, with separate convolution functions for each incoming channel and for each ear. For example, with a two-channel stereo signal, convolutions will be applied in real time for the left channel to the left ear, sometimes referred to as the LL convolution, the left channel to the right ear, sometimes referred to as the LR convolution, the right channel to the left ear, sometimes referred to as the RL convolution, and the right channel to the right ear, sometimes referred to as the RR convolution.
In one example, each convolution function applies pre-calculated coefficients, associated with the impulse response data from a specific physical space. The number of coefficients for each convolution set can be calculated as follows: n=s*t, where n is the number of coefficients per convolution set, s is the sample rate of the digital signal source in samples per second, and t is the maximum convolution time in seconds. For example, with a signal sample rate of 64,000 samples per second and 0.25 seconds of maximum convolution time, n=16,000 coefficients are required.
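For illustration, the coefficient count and the four-path convolution may be sketched in Python as follows (a minimal sketch assuming numpy; the impulse responses here are random, decaying placeholders rather than coefficients measured from a physical space):

    import numpy as np

    # Coefficient count per convolution set: n = s * t.
    s = 64_000            # sample rate, in samples per second
    t = 0.25              # maximum convolution time, in seconds
    n = int(s * t)        # 16,000 coefficients per convolution set

    # Placeholder impulse responses for the four paths (LL, LR, RL, RR).
    rng = np.random.default_rng(0)
    h_LL, h_LR, h_RL, h_RR = (
        rng.standard_normal(n) * np.exp(-np.arange(n) / (0.05 * s))
        for _ in range(4)
    )

    def binaural_mix(left, right):
        """Apply the four convolutions and mix per ear."""
        out_left = np.convolve(left, h_LL) + np.convolve(right, h_RL)
        out_right = np.convolve(left, h_LR) + np.convolve(right, h_RR)
        return out_left, out_right

Each ear's output is the sum of the two convolutions terminating at that ear, mirroring the way sound from both loudspeakers reaches each ear in a physical room.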
In one example, a non-linear bass distortion (NLBD) function generator is used to digitally generate a controlled harmonic distortion (sometimes referred to as CH distortion) associated with physical subwoofers. The digital NLBD function generator includes a low-pass filter to separate only the low frequencies, a circuit to generate even and/or odd harmonics, and another low-pass filter. The generated CH distortion is then mixed with the original signal.
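A minimal sketch of such an NLBD chain follows (assuming numpy and scipy.signal; the cutoff frequencies, the squaring and cubing nonlinearity, and the mix level are illustrative assumptions rather than values prescribed by this disclosure):

    import numpy as np
    from scipy.signal import butter, lfilter

    def nlbd(signal, fs, bass_cutoff=120.0, harm_cutoff=400.0, mix=0.2):
        """Low-pass -> harmonic generation -> low-pass -> mix with original."""
        # First low-pass filter: separate only the low frequencies.
        b1, a1 = butter(2, bass_cutoff / (fs / 2), btype="low")
        bass = lfilter(b1, a1, signal)
        # Nonlinearity: squaring yields even harmonics, cubing yields odd ones.
        harmonics = 0.5 * bass**2 + 0.5 * bass**3
        # Second low-pass filter: band-limit the generated CH distortion.
        b2, a2 = butter(2, harm_cutoff / (fs / 2), btype="low")
        harmonics = lfilter(b2, a2, harmonics)
        # Mix the CH distortion with the original signal.
        return signal + mix * harmonics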
In one example, a middle-side filter (MS filter) circuit is used to adjust the perceived physical separation of the original sound source, which may be referred to as the perceived “sound stage”. In the case of a stereo signal, the middle-side filter determines the perceived distance between the right and left virtual speakers within this sound stage. One implementation of a MS filter includes summing the signals from the left and right channels to create a “middle” signal. It also includes calculating the difference between the signals from the left and right channels to create a separate “side” signal. The middle channel then contains just the information that appears in both the left and right channels, and the side channel contains all the information that differs between the left and right channels. In other words, the middle signal represents sounds that would be perceived by a listener to be emanating mainly from a center location. Similarly, the side signal represents sounds that would be perceived by a listener to be emanating from either the left or right sides of the perceived sound stage. Therefore, by independently amplifying or attenuating the middle and side signals, it is possible to emphasize or reduce sounds that appear to originate from either the center or the left and right sides of the perceived sound stage. Among other things, this has the effect of determining how far apart the virtual speakers are located within the perceived sound stage. After the middle and side signals are amplified or attenuated, they are summed together and halved to re-create the left signal, and the side signal is subtracted from the middle signal and the result halved to re-create the right signal, as set out in the equations and the sketch below.
Given:
L=left signal
R=right signal
M=middle signal
S=side signal
MG=middle gain; MG>1 represents amplification, 0<MG<1 represents attenuation
SG=side gain; SG>1 represents amplification, 0<SG<1 represents attenuation
Then:
M=MG*(L+R) Equation 1
S=SG*(L−R) Equation 2
Finally:
Recreated Left Signal L′=0.5*(M+S) Equation 3
Recreated Right Signal R′=0.5*(M−S) Equation 4
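A minimal sketch of Equations 1 through 4 (assuming numpy; the gain values shown are illustrative):

    import numpy as np

    def ms_filter(left, right, mid_gain=1.0, side_gain=1.5):
        """Widen or narrow the perceived sound stage via middle-side processing."""
        m = mid_gain * (left + right)     # Equation 1
        s = side_gain * (left - right)    # Equation 2
        new_left = 0.5 * (m + s)          # Equation 3
        new_right = 0.5 * (m - s)         # Equation 4
        return new_left, new_right

With both gains equal to 1, the original left and right signals are recovered exactly; raising SG above 1 moves the virtual speakers farther apart within the perceived sound stage.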
A combination of one or more of the convolution coefficients, the CH distortion, and the MS filter may be applied to the original digital sound. Such a corrected digital sound may assist in recreating the perception of listening to sound as if it were being reproduced by loudspeakers in a defined (modeled) space. For example, the LL, LR, RL and RR convolutions emulate the sounds that would be received by the listener's ears within the modeled space. Instead of perceiving a narrow phantom center channel, the listener's brain reconstructs the processed left and right analog signals reproduced by the left and right headphone drivers into natural left and right channels, and enables reconstruction of an accurate center channel.
To generate the required convolution coefficients, the desired (modeled) space must first be evaluated. A left ear microphone 306 and a right ear microphone 308 are selectively placed within the desired space 300, for example, at locations that substantially correspond to a listener's left ear and right ear, respectively.
Impulse responses are then measured for each speaker-to-ear path. For example, the signal received at the left ear microphone 306 from the left speaker 302 is deconvolved to generate the LL coefficients. The signal received at the right ear microphone 308 from the left speaker 302 is deconvolved to generate the LR coefficients.
Similarly, the signal received at the left ear microphone 306 from the right speaker 304 is deconvolved to generate the RL coefficients, and the signal received at the right ear microphone 308 from the right speaker 304 is deconvolved to generate the RR coefficients.
In one example, a digital signal processor may be configured to modify an input signal based on the convolution coefficients measured for a modeled space. An example audio system 400 that includes such a digital signal processor (DSP) 408 is now described.
The communication management engine 402 is configured to communicate with external devices, for example, computing device 416, over a wired connection 418 or a wireless connection 420. In one example, the communication management engine 402 is configured to communicate with the computing device 416 and receive various parameters for configuring the audio system 400, including the digital signal processor 408. In one example, the communication management engine 402 is configured to receive digital audio signal to be reproduced by the audio system 400, over the wired connection 418 or wireless connection 420. The received digital audio signal (for example, two channel digital audio signals L and R) is fed to the DSP 408.
The analog input tuner 404 is configured to communicate with an analog sound source 422, for example, over an analog wired connection 424, to receive audio signal to be reproduced by the audio system 400. In one example, a two-channel audio signal (left and right) is received. The analog input tuner 404 is configured to optimize impedance and frequency response characteristics of the analog audio signal received from the analog audio source 422. The output of the analog input tuner 404 is fed to the A/D converter 406, to generate digital audio signal (for example, two channel digital audio signals L and R). The digital audio signal is fed to the DSP 408.
The DSP 408 processes the received digital audio signal, applying modifications to the received digital audio signal, based on the convolution coefficients, generated CH distortion and the middle-side filter (MS filter) digital settings. Modified digital audio signal is then fed to the D/A converter 410 to generate modified analog audio signal. The modified analog audio signal in some examples may be amplified by the amplifier 412 to generate an amplified modified analog audio signal. The amplified modified analog audio signal is then fed to an analog output tuner 414. The analog output tuner 414 feeds the amplified modified analog audio signal to left driver 426 and right driver 428, for reproduction of the amplified modified analog audio signal. As one skilled in the art appreciates, if the amplifier 412 is not used, the modified analog audio signal will be fed to the analog output tuner 414 which in turn will feed the modified analog audio signal to the left driver 426 and the right driver 428, for reproduction of the modified analog audio signal. The analog output tuner 414 is configured to optimize impedance and frequency response characteristics of the modified analog audio signal for the left driver 426 and the right driver 428.
Having described the general operation of the audio system 400, functions and features of the DSP 408 will now be described. In general, the DSP 408 is configured to receive digital audio signal (for example, as L and R signals) from the A/D converter 406 (for audio received from an analog audio source) or the communication management engine 402 (for audio received from a digital audio source). The DSP 408 then selectively modifies the received digital audio signal to generate the modified digital audio signal and output the modified digital audio signal, to be fed to the D/A converter 410.
The DSP 408 includes a coefficients and parameters data store 430, a selected convolution coefficients data store 432, a selected DSP filter parameters data store 434, a LL convolution generator 436, a LR convolution generator 438, a RL convolution generator 440, a RR convolution generator 442, a CH distortion generator 444 and a middle-side filter circuit 446. The coefficients and parameters data store 430 stores various coefficients and parameters for one or more modeled spaces. In one example, various coefficients and parameters are received by the communication management engine 402, from an external computing device and loaded into the coefficients and parameters data store 430.
When a specific modeled space is selected, corresponding coefficients and parameters are retrieved from the coefficients and parameters data store 430 and selectively loaded into the selected convolution coefficients data store 432 and the selected DSP filter parameters data store 434. As one skilled in the art appreciates, the selected convolution coefficients data store 432 and the selected DSP filter parameters data store 434 may be configured to be high speed memory, so that data may be retrieved from them at a speed to process the data in real time.
The LL convolution generator 436, the LR convolution generator 438, the RL convolution generator 440, and the RR convolution generator 442 selectively retrieve the selected convolution coefficients from the selected convolution coefficients data store 432 and apply appropriate convolution to each of the channels (L and R) of the digital audio signal to generate a convolved digital audio signal. The convolved digital audio signal is then fed to the D/A converter 410, to generate modified analog audio signal.
In one example, the CH distortion generator 444 adds CH distortion to the convolved digital audio signal. The middle-side filter circuit 446, based on the selected parameters, applies appropriate correction to the convolved digital audio signal with CH distortion, to generate the modified digital audio signal. The modified digital audio signal is then fed to the D/A converter 410, to generate modified analog audio signal.
In one example, the audio system 400 may be selectively placed within an enclosure of an audio reproduction device 448. The audio reproduction device 448 may be a headphone with the left driver 426 and the right driver 428. Additionally, any power source needed to operate the audio system 400 may also be selectively placed within the enclosure of the audio reproduction device 448.
Now, an example method for enhancing audio reproduced by an audio reproduction device is described. In a first block, a plurality of convolution coefficients is generated for a modeled space, for example, for the desired space 300, as previously described.
In block S604, a digital audio signal is modified based on the generated plurality of convolution coefficients, to generate a convolved digital audio signal. For example, the DSP 408 applies the convolution coefficients to the digital audio signal to generate the convolved digital audio signal, as previously described.
In block S606, a convolved analog audio signal is generated based on the generated convolved digital audio signal. For example, the D/A converter 410 converts the convolved digital audio signal into the convolved analog audio signal, as previously described.
In block S608, the generated convolved analog audio signal is fed to an audio reproduction device. For example, the generated convolved analog audio signal is fed to the audio reproduction device 448, as previously described.
The process of convolution requires significant signal processing, yet must not introduce perceivable delays, or latency. Typically, such processing involves multiple finite impulse response (FIR) filters, where the impulse response characteristics can be of significant duration, generally greater than 100 milliseconds.
One approach to implementing a FIR filter is to calculate the convolution in the frequency domain, in real time. However, with long impulse response characteristics, such an implementation can introduce unacceptable delays into the audio stream.
An alternate solution that meets the low latency requirement is a partitioned convolution algorithm. Partitioning enables splitting both the audio and the impulse response characteristics into smaller sub-components and processing them separately in the frequency domain. The combined result is then merged into the final signal in the time domain.
The use of a partitioned convolution algorithm implies a significant tradeoff between the size of each partition and the required processing time. Reducing partition size results in less latency but requires more processing effort, therefore consuming more energy. The increased processing demands can become impractical due to hardware limitations such as available processing speed or power consumption, particularly in mobile or wearable devices. An improvement to the partitioned convolution algorithm that reduces the required processing time and energy is therefore desirable.
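For reference, a single-channel sketch of a uniformly partitioned overlap-add convolution is shown below (assuming numpy; the partition size is a tunable assumption that embodies the latency versus processing tradeoff described above):

    import numpy as np

    def partitioned_convolve(x, h, block=256):
        """Split h into blocks, filter in the frequency domain, overlap-add."""
        nfft = 2 * block
        # Pre-transform each zero-padded partition of the impulse response.
        parts = [np.fft.rfft(h[i:i + block], nfft)
                 for i in range(0, len(h), block)]
        y = np.zeros(len(x) + len(h) + nfft)
        for i in range(0, len(x), block):
            X = np.fft.rfft(x[i:i + block], nfft)
            for k, H in enumerate(parts):
                start = i + k * block
                y[start:start + nfft] += np.fft.irfft(X * H, nfft)
        return y[:len(x) + len(h) - 1]

Smaller blocks make the first output samples available sooner, at the cost of more transforms per second.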
In accordance with an example of this disclosure, an alternate example of processing the input streams is now described.
As one skilled in the art appreciates, spatial emulation of audio signals requires two or more FIRs processed simultaneously, and the system described herein advantageously processes multiple channels and multiple echo filters together.
Now, a convolution generator, referred to as ICG 720, that implements such an improvement is described.
ICG 720 includes a multiplexer 724, an input buffer (IB) 726, a frequency domain transform engine (FDTE) 728, a filter coefficient multiplier (FCM) 730, a time domain transform engine (TDTE) 732, a holding buffer (HB) 734, an overlap adder engine (OAE) 736, an output buffer (OB) 738 and a de-multiplexer 740.
Multiplexer 724 is configured to receive samples of input audio signals from a plurality of channels. For example, multiplexer 724 may receive a plurality of samples of two channel digital audio signals (L and R). In one example, the received plurality of samples of two channel digital audio signals are multiplexed by the multiplexer 724 and selectively stored in the input buffer 726 so as to be processed together by the FDTE 728.
FDTE 728 converts the received two channel digital audio signals into frequency domain. The FCM 730 selectively multiplies the converted two channel digital audio signals in frequency domain, with corresponding filter coefficients. Output of the FCM 730 is fed to the TDTE 732, which converts the two channel digital audio signals to time domain. Output of the TDTE 732 is stored in a holding buffer 734. Holding buffer 734 is configured to hold output of the TDTE 732 for multiple time periods. Selective output of the TDTE 732 is fed from the holding buffer 734 to the OAE 736.
The OAE 736 selectively adds various samples of the output of the TDTE 732 to generate a combined convolved digital audio signal for both channels. The output of the OAE 736 is stored in the output buffer 738. The de-multiplexer 740 selectively separates the combined convolved digital audio signal into separate convolved digital audio signals, for example, a convolved digital audio signal for the L channel and a convolved digital audio signal for the R channel. The convolved digital audio signals for the L channel and the R channel may be further processed as previously described with reference to the digital signal processor 408.
Now, the operation of the ICG 720 is described in further detail, using an example with four samples per channel for each time period.
The input buffer 726 has a first portion 742, configured to receive samples of the L channel, and a second portion 744, configured to receive samples of the R channel. In this example, each of the first portion 742 and the second portion 744 of the input buffer 726 has eight buffers and is configured to hold eight samples of audio signals for each channel (L and R). According to an example of this disclosure, multiplexer 724 loads four samples of the L channel (shown as Sa0, Sa1, Sa2, and Sa3) into only the first four of the available eight buffers of the first portion 742 and loads zeros in the second four of the available eight buffers of the first portion 742 for the L channel. Similarly, the multiplexer 724 loads four samples of the R channel (shown as Sb0, Sb1, Sb2, and Sb3) into only the first four of the available eight buffers of the second portion 744 and loads zeros in the second four of the available eight buffers of the second portion 744 for the R channel. This selective partial loading of the samples in the input buffer 726 advantageously keeps the processed data separated by L channel and R channel when the FDTE 728 processes the samples of audio signals of the L channel and the R channel.
In general, the number of samples x for each channel may be chosen as x=2^p, where p is an integer, and the number of buffers in each of the first portion 742 and the second portion 744 will be 2×2^p, so that there will be an equal number of samples and zeroes in the first portion 742 and the second portion 744. In this example, p=2, the number of samples is x=2^2=4, and the number of buffers in each of the first portion 742 and the second portion 744 is 2×2^2=2×4=8. The first four buffers in each of the first portion 742 and the second portion 744 are filled with samples, and the remaining four buffers in each are filled with zeroes.
The samples of the first portion 742 are processed by the FDTE 728 and transformed into frequency domain components (shown as Fa0, Fa1, Fa2, Fa3, Fa4, Fa5, Fa6, and Fa7) for channel L, as shown in FDTE output 729. Similarly, the samples of the second portion 744 are processed by the FDTE 728 and transformed into frequency domain components (shown as Fb0, Fb1, Fb2, Fb3, Fb4, Fb5, Fb6, and Fb7) for channel R, as shown in FDTE output 729. The difference between a traditional 16-input FFT and the FDTE 728 of this disclosure is further described below.
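This selective partial loading and per-portion transform may be sketched as follows (assuming numpy; the sample values are placeholders):

    import numpy as np

    x = 4                                    # samples per channel (p = 2)
    sa = np.array([0.1, 0.2, 0.3, 0.4])      # Sa0..Sa3, L channel samples
    sb = np.array([0.5, 0.6, 0.7, 0.8])      # Sb0..Sb3, R channel samples

    # Input buffer 726: each portion holds x samples followed by x zeros.
    first_portion = np.concatenate([sa, np.zeros(x)])    # first portion 742
    second_portion = np.concatenate([sb, np.zeros(x)])   # second portion 744

    # FDTE 728: transform each portion independently, keeping channels separate.
    Fa = np.fft.fft(first_portion)    # Fa0..Fa7
    Fb = np.fft.fft(second_portion)   # Fb0..Fb7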
Next, there are two sets of filter coefficients, one for the short echo and another for the long echo. The first filter coefficients 722-1 correspond to the short echo and the second filter coefficients 722-2 correspond to the long echo. For example, the first filter coefficients 722-1 include coefficients Ka10, Ka11, Ka12, Ka13, Ka14, Ka15, Ka16, and Ka17, corresponding to channel L. Each of these filter coefficients is multiplied with the corresponding frequency domain component Fa0, Fa1, Fa2, Fa3, Fa4, Fa5, Fa6, and Fa7 by the FCM 730, to yield an output of Fa10, Fa11, Fa12, Fa13, Fa14, Fa15, Fa16, and Fa17 respectively, for channel L. For example, Fa0 is multiplied with Ka10 to yield an output of Fa10. The output is still represented in the frequency domain and includes the effect of the short echo for channel L. These outputs may be referred to as frequency domain components with short echo effect for channel L.
The first filter coefficients 722-1 also include coefficients Kb10, Kb11, Kb12, Kb13, Kb14, Kb15, Kb16, and Kb17, corresponding to channel R. Each of these filter coefficients is multiplied with the corresponding frequency domain component Fb0, Fb1, Fb2, Fb3, Fb4, Fb5, Fb6, and Fb7 by the FCM 730, to yield an output of Fb10, Fb11, Fb12, Fb13, Fb14, Fb15, Fb16, and Fb17 respectively, for channel R. For example, Fb0 is multiplied with Kb10 to yield an output of Fb10. The output is still represented in the frequency domain and includes the effect of the short echo for channel R. These outputs may be referred to as frequency domain components with short echo effect for channel R.
The second filter coefficients 722-2 include coefficients Ka20, Ka21, Ka22, Ka23, Ka24, Ka25, Ka26, and Ka27, corresponding to channel L. Each of these filter coefficients is multiplied with the corresponding frequency domain component Fa0, Fa1, Fa2, Fa3, Fa4, Fa5, Fa6, and Fa7 by the FCM 730, to yield an output of Fa20, Fa21, Fa22, Fa23, Fa24, Fa25, Fa26, and Fa27 respectively, for channel L. For example, Fa0 is multiplied with Ka20 to yield an output of Fa20. The output is still represented in the frequency domain and includes the effect of the long echo for channel L. These outputs may be referred to as frequency domain components with long echo effect for channel L.
The second filter coefficients 722-2 also include coefficients Kb20, Kb21, Kb22, Kb23, Kb24, Kb25, Kb26, and Kb27, corresponding to channel R. Each of these filter coefficients is multiplied with the corresponding frequency domain component Fb0, Fb1, Fb2, Fb3, Fb4, Fb5, Fb6, and Fb7 by the FCM 730, to yield an output of Fb20, Fb21, Fb22, Fb23, Fb24, Fb25, Fb26, and Fb27 respectively, for channel R. For example, Fb0 is multiplied with Kb20 to yield an output of Fb20. The output is still represented in the frequency domain and includes the effect of the long echo for channel R. These outputs may be referred to as frequency domain components with long echo effect for channel R.
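The FCM 730 multiplications then reduce to element-wise products of the transformed components with each coefficient set, as sketched below (the coefficient values are random placeholders; in practice they would be transforms of measured impulse response partitions):

    import numpy as np

    x = 4
    Fa = np.fft.fft([0.1, 0.2, 0.3, 0.4, 0, 0, 0, 0])   # FDTE output, L channel
    Fb = np.fft.fft([0.5, 0.6, 0.7, 0.8, 0, 0, 0, 0])   # FDTE output, R channel

    # Placeholder short echo (Ka1, Kb1) and long echo (Ka2, Kb2) spectra.
    rng = np.random.default_rng(1)
    Ka1, Kb1, Ka2, Kb2 = (
        rng.standard_normal(2 * x) + 1j * rng.standard_normal(2 * x)
        for _ in range(4)
    )

    Fa1, Fb1 = Fa * Ka1, Fb * Kb1   # short echo components (Fa1*, Fb1*)
    Fa2, Fb2 = Fa * Ka2, Fb * Kb2   # long echo components (Fa2*, Fb2*)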
The output of the FCM 730 corresponding to the short echo, stored in buffer 746-1, is then processed by the TDTE 732-1, and the output of the TDTE 732-1 is stored in the first holding buffer 734-1. The TDTE 732-1 converts the output of the FCM 730 from the frequency domain to the time domain. For example, the output of the TDTE 732-1 corresponding to input samples Sa0, Sa1, Sa2, and Sa3 at time t−1 is shown as Sa10, Sa11, Sa12, and Sa13 and Sa1′4, Sa1′5, Sa1′6, and Sa1′7 in the first holding buffer 734-1. The output shown as Sa10, Sa11, Sa12, and Sa13 will be selectively used at time t−1. The output shown as Sa1′4, Sa1′5, Sa1′6, and Sa1′7 will be selectively used in the next time slot, namely time t.
Similarly, the output of the TDTE 732-1 corresponding to input samples Sb0, Sb1, Sb2, and Sb3 at time t−1 is shown as Sb10, Sb11, Sb12, and Sb13 and Sb1′4, Sb1′5, Sb1′6, and Sb1′7 in the first holding buffer 734-1. The output shown as Sb10, Sb11, Sb12, and Sb13 will be selectively used at time t−1. The output shown as Sb1′4, Sb1′5, Sb1′6, and Sb1′7 will be selectively used in the next time slot, namely time t.
In other words, when the output of the FCM is converted from the frequency domain to the time domain by the TDTE, two sets of output are generated. A selective first set of output will have an effect on the output signal at the then-current time period (in this case, time t−1), and a selective second set of output will have an effect on the output signal at the next time period (in this case, time t).
The output of the FCM 730 corresponding to the long echo, stored in buffer 746-2, is then processed by the TDTE 732-2, which converts the output from the frequency domain to the time domain. The output of the TDTE 732-2 is stored in the second holding buffer 734-2.
For example, output of the TDTE 732-2 corresponding to input samples Sa0, Sa1, Sa2, and Sa3 at time t−1 is shown as Sa24, Sa25, Sa26, and Sa27 and Sa2′8, Sa2′9, Sa2′10, and Sa2′11 in the second holding buffer 734-2. The output of the TDTE 732-2 in the second holding buffer 734-2 shown as Sa24, Sa25, Sa26, and Sa27 will be selectively used in subsequent time t, as this corresponds to long echo effect. The output of the TDTE 732-2 in the second holding buffer 734-2 shown as Sa2′8, Sa2′9, Sa2′10, and Sa2′11 in the second holding buffer 734-2 may be selectively used subsequently in one of the next time slots after time t, depending upon the long echo effect of input samples Sa0, Sa1, Sa2, and Sa3.
In other words, when the output of the FCM is converted from the frequency domain to the time domain by the TDTE, two sets of output are generated. In the case of the long echo effect, a selective first set of output will have an effect on the output signal at the next time period (in this case, time t), and a selective second set of output will have an effect on the output signal at a subsequent time period, depending upon the residual effect of the long echo beyond one time period.
The TDTE 732-2 similarly converts the output of the FCM 730 for the long echo from the frequency domain to the time domain for channel R. For example, the output of the TDTE 732-2 corresponding to input samples Sb0, Sb1, Sb2, and Sb3 at time t−1 is shown as Sb24, Sb25, Sb26, and Sb27 and Sb2′8, Sb2′9, Sb2′10, and Sb2′11 in the second holding buffer 734-2. The output shown as Sb24, Sb25, Sb26, and Sb27 will be selectively used at subsequent time t, as this corresponds to the long echo effect. The output shown as Sb2′8, Sb2′9, Sb2′10, and Sb2′11 may be selectively used in one of the next time slots after time t, depending upon the long echo effect of input samples Sb0, Sb1, Sb2, and Sb3.
Selective samples of the first holding buffer 734-1 and the second holding buffer 734-2 are added by the OAE 736, to generate the convolved audio signal for the L channel and the R channel, which is stored in the output buffer 738. For example, the convolved audio signal for the L channel is Sac0, Sac1, Sac2, and Sac3. Similarly, the convolved audio signal for the R channel is Sbc0, Sbc1, Sbc2, and Sbc3. Functions and features of the OAE 736 are further described below.
The de-multiplexer 740 selectively retrieves the convolved audio signal for L channel and R channel from the output buffer 738 and outputs as convolved L channel and convolved R channel signals. For example, convolved Sac0, Sac1, Sac2, and Sac3 signals are output in sequence by the de-multiplexer 740 as convolved L channel signals. And, convolved Sbc0, Sbc1, Sbc2, and Sbc3 signals are output in sequence by the de-multiplexer 740 as convolved R channel signals.
Now, the contents of the holding buffers 734-1 and 734-2, and of the output buffer 738, at various time slots are described.
For example, output of the TDTE 732-1 corresponding to input samples Saa, Sab, Sac, and Sad for the L channel at time t−2 is shown in the first holding buffer 734-1 as Sa1a, Sa1b, Sa1c, and Sa1d in cell 750, and Sa1′0, Sa1′1, Sa1′2, and Sa1′3 in cell 752.
Similarly, output of the TDTE 732-2 corresponding to input samples Saa, Sab, Sac, and Sad at time t−2 is shown in the second holding buffer 734-2 as Sa20, Sa21, Sa22, and Sa23 in cell 754 and Sa2′4, Sa2′5, Sa2′6, and Sa2′7 in cell 756.
For example, output of the TDTE 732-1 corresponding to input samples Sba, Sbb, Sbc, and Sbd for R channel at time t−2 is shown in the first holding buffer 734-1 as Sb1a, Sb1b, Sb1c, and Sb1d in cell 758 and as Sb1′0, Sb1′1, Sb1′2, and Sb1′3 in cell 760.
Similarly, output of the TDTE 732-2 corresponding to input samples Sba, Sbb, Sbc, and Sbd at time t−2 is shown in the second holding buffer 734-2 as Sb20, Sb21, Sb22, and Sb23 in cell 762 and as Sb2′4, Sb2′5, Sb2′6, and Sb2′7 in cell 764.
Output of the TDTE 732-1 corresponding to input samples Sa0, Sa1, Sa2, and Sa3 for the L channel at time t−1 is shown in the first holding buffer 734-1 as Sa10, Sa11, Sa12, and Sa13 in cell 766, and output corresponding to input samples Sb0, Sb1, Sb2, and Sb3 for the R channel is shown as Sb10, Sb11, Sb12, and Sb13 in cell 768.
In general, subscripts (a-d), (0-3), (4-7), (8-11), (12-15), and (16-19) in holding buffers 734-1 and 734-2 for each calculated time slot (for example, time slots t−2, t−1, t, and t+1) indicate the output samples for which the corresponding block of data for the short echo effect and the long echo effect is selectively added by the OAE 736.
Having shown contents of the holding buffer at various time slots for various input samples, function of the OAE 736 is now described. At time t−1, the OAE 736 selectively adds the contents of cell 752, which holds Sa1′0-3 (from time t−2) with contents of cell 754 which holds Sa20-3 (from time t−2) and contents of cell 766, which holds Sa10-3 (from time t−1) to generate convolved output Sac0-3 as shown in cell 770 of output buffer 738, as convolved output at time t−1, for L channel.
Similarly, the OAE 736 selectively adds the contents of cell 760, which holds Sb1′0-3 (from time t−2) with contents of cell 762 which holds Sb20-3 (from time t−2) and contents of cell 768, which holds Sb10-3 (from time t−1) to generate convolved output Sbc0-3 as shown in cell 772 of output buffer 738, as convolved output at time t−1, for R channel.
The OAE 736 selectively adds selective contents of the holding buffers 734-1 and 734-2 at various time slots to generate the corresponding convolved output for the L channel and the R channel. For example, cell 774 shows convolved output Sac4-7 (at time t) for the L channel and cell 776 shows convolved output Sbc4-7 (at time t) for the R channel. Similarly, cell 778 shows convolved output Sac8-11 (at time t+1) for the L channel and cell 780 shows convolved output Sbc8-11 (at time t+1) for the R channel.
As one skilled in the art appreciates, the overlap adder selectively adds portions of the output of the TDTE with short echo effect and long echo effect from a given time period with portions of the output of the TDTE from the next time period to generate a convolved output signal.
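This selective addition may be sketched for one channel as follows (assuming numpy; the history layout, with the newest 2x-sample TDTE block last, and the two-period deferral of the long echo tail are assumptions drawn from the description above):

    import numpy as np

    x = 4  # samples per channel per time period

    def overlap_add(short_blocks, long_blocks):
        """Combine selective halves of the TDTE outputs into one output block.

        short_blocks / long_blocks: 2x-sample time domain blocks for one
        channel, one per time period, newest last."""
        return (short_blocks[-1][:x]     # short echo onset, current period
                + short_blocks[-2][x:]   # short echo tail from the prior period
                + long_blocks[-2][:x]    # long echo onset from the prior period
                + long_blocks[-3][x:])   # long echo tail, two periods back

    # Histories are pre-filled with zero blocks before the first period.
    zero = np.zeros(2 * x)
    short_history, long_history = [zero, zero], [zero, zero, zero]

For the example above, the output block at time t−1 sums the contents of cells 766, 752, and 754, consistent with this sketch.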
As one skilled in the art appreciates, there can be additional filter coefficients representing additional convolution applied to the input signal. For example, there can be a set of filter coefficients representing, say, a long-long echo (for example, due to multiple reflections of the sound signal in a specified space), which may have an effect on a subsequent set of input signal. As an example, signals sampled at time t−2 may have an effect on the convolved output signal at time t. The example system described herein may be expanded to apply such additional filter coefficients, with corresponding additional holding buffers whose contents are selectively added by the OAE 736.
Although the example system herein is described with reference to two channels of input signal, the system may be expanded to additional channels of input signal, with corresponding additional processing circuitry.
Now, the difference between a traditional 16-input FFT and the FDTE 728 of this disclosure is described. A traditional 16-input FFT processes all 16 positions of the input buffer together, requiring log2(16)=4 decimation steps and mixing the samples of both channels. The modified FFT 804, in contrast, transforms the first portion 742 and the second portion 744 as two independent 8-input transforms within the same buffer, keeping the L channel and R channel data separated.
In general, the number of decimation steps y in the modified FFT 804 will be equal to p+1. So, for a value of p=2, the number of decimation steps y will be 2+1=3, and the sample size x for each channel will be 2^p, which is equal to 2^2=4. And, for a value of p=3, the number of decimation steps y will be 3+1=4, and the sample size x for each channel will be 2^p, which is equal to 2^3=8.
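The saving may be illustrated as follows (assuming numpy; np.fft.fft stands in for the hand-built decimation stages of the modified FFT 804):

    import numpy as np

    p = 2
    x = 2 ** p                  # 4 samples per channel
    buf = np.zeros(4 * x)       # 16-position buffer: two portions of 2x each
    buf[:x] = [0.1, 0.2, 0.3, 0.4]            # L samples, then x zeros
    buf[2 * x:3 * x] = [0.5, 0.6, 0.7, 0.8]   # R samples, then x zeros

    # A traditional 16-input FFT needs log2(16) = 4 decimation steps and mixes
    # the channels; two independent 2x-point transforms need only
    # p + 1 = log2(2x) = 3 steps each and keep the channels separated.
    Fa, Fb = np.fft.fft(buf.reshape(2, 2 * x), axis=1)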
Now, an example method for enhancing audio reproduced by an audio reproduction device is described. In a first block, x samples of audio signals for at least a first channel and a second channel are received for a given time period. For example, the multiplexer 724 receives four samples each of the L channel and R channel digital audio signals, as previously described.
In block S904, the received x samples of audio signals of the first channel are stored in a first portion of an input buffer with 2x positions. For example, the received samples of the first channel, namely samples Sa0, Sa1, Sa2, and Sa3, are stored in the first portion 742 of the input buffer 726.
In block S906, the rest of the x positions of the first portion of the input buffer are padded with zeros. For example, zeros are loaded into the second four of the available eight buffers of the first portion 742, as previously described.
In block S908, the received x samples of audio signals of the second channel are stored in a second portion of the input buffer with 2x positions. For example, the received samples of the second channel, namely samples Sb0, Sb1, Sb2, and Sb3, are stored in the second portion 744 of the input buffer 726.
In block S910, the rest of the x positions of the second portion of the input buffer are padded with zeros. For example, zeros are loaded into the second four of the available eight buffers of the second portion 744, as previously described.
In block S912, the contents of the first portion and the second portion are transformed to frequency domain components. For example, the contents of the first portion of the input buffer are transformed into frequency domain components Fa0, Fa1, Fa2, Fa3, Fa4, Fa5, Fa6, and Fa7 by the frequency domain transformation engine 728, as shown in FDTE output 729. And the contents of the second portion of the input buffer are transformed into frequency domain components Fb0, Fb1, Fb2, Fb3, Fb4, Fb5, Fb6, and Fb7 by the frequency domain transformation engine 728, as shown in FDTE output 729.
In block S914, the transformed frequency domain components are multiplied with first filter coefficients indicative of a short echo, to generate frequency domain components with short echo effect. For example, the transformed frequency domain components shown in FDTE output 729 are multiplied with the filter coefficients 722-1, to generate frequency domain components with short echo effect, as shown in block 746-1.
In block S916, the transformed frequency domain components are multiplied with second filter coefficients indicative of a long echo, to generate frequency domain components with long echo effect. For example, the transformed frequency domain components shown in FDTE output 729 are multiplied with the filter coefficients 722-2, to generate frequency domain components with long echo effect, as shown in block 746-2.
In block S918, the frequency domain components with short echo effect are converted to time domain components with short echo effect. For example, the time domain transformation engine 732 converts the frequency domain components with short echo in block 746-1 to time domain components with short echo effect, as shown in block 734-1.
In block S920, the frequency domain components with long echo effect are converted to time domain components with long echo effect. For example, the time domain transformation engine 732 converts the frequency domain components with long echo in block 746-2 to time domain components with long echo effect, as shown in block 734-2.
In block S922, selective time domain components with short echo effect and selective time domain components with long echo effect are combined to generate a convolved first channel output and a convolved second channel output. For example, the overlap adder 736 selectively adds time domain components with short echo effect and time domain components with long echo effect, as described in detail above, to generate the convolved output for the L channel and the R channel.
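Putting the blocks of this method together, one time period may be sketched as follows (assuming numpy; the filter spectra Ka1, Kb1, Ka2, and Kb2 and the state layout that carries deferred block halves between periods are illustrative assumptions):

    import numpy as np

    def process_period(sa, sb, Ka1, Kb1, Ka2, Kb2, state):
        """One time period of the method, for an L channel block sa and an
        R channel block sb of x samples each."""
        x = len(sa)
        # Store x samples plus x zeros (S904-S910); transform (S912).
        Fa = np.fft.fft(np.concatenate([sa, np.zeros(x)]))
        Fb = np.fft.fft(np.concatenate([sb, np.zeros(x)]))
        # Multiply with short/long echo coefficients (S914/S916) and
        # convert back to the time domain (S918/S920).
        short_a = np.fft.ifft(Fa * Ka1).real
        short_b = np.fft.ifft(Fb * Kb1).real
        long_a = np.fft.ifft(Fa * Ka2).real
        long_b = np.fft.ifft(Fb * Kb2).real
        out = {}
        for ch, s_, g_ in (("L", short_a, long_a), ("R", short_b, long_b)):
            due_now, due_later = state[ch]
            out[ch] = s_[:x] + due_now                 # overlap-add (S922)
            state[ch] = (s_[x:] + g_[:x] + due_later,  # due next period
                         g_[x:])                       # due two periods out
        return out["L"], out["R"]

    # Usage: deferred halves start as zeros.
    x = 4
    state = {"L": (np.zeros(x), np.zeros(x)), "R": (np.zeros(x), np.zeros(x))}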
People who create professional audio content, including but not limited to musicians, recording engineers, producers, and mixers, often struggle with the limitations of traditional headphones. These limitations drive them to seek professionally treated physical spaces to deliver professional-sounding content. Such a space includes high fidelity loudspeakers and carefully designed positioning and geometry of hard surfaces within the room, such as walls, ceiling, and other reflective objects which shape the sound. The purpose of this space is to deliver an optimal sound experience with the listener located at a well-defined location, sometimes referred to as the “sweet spot” in the room. However, it is not practical for many audio professionals to utilize sonically treated spaces, such as recording studios. These spaces typically cost money, may be in inconvenient locations, and require advance reservations. Yet many professionals prefer to work with headphones.
The physical space emulation described in this disclosure enables creating all of the effects of a professionally treated physical space within headphones, whenever and wherever inspiration strikes. By modeling multiple different recording studio spaces and allowing the user to alternately select them, the content creator can even test their work in different virtual studios with the same set of headphones, even if the studios are geographically dispersed. For example, a recording engineer can test their work in an emulated studio located in Los Angeles, another studio in London, and a third in Nashville, all with the same set of headphones.
Our perception is trained to sense stereo sound in three-dimensional space. Traditional stereo headphones isolate our two ears and destroy that perception. Many people prefer to perceive sound with the sensation of an emulated 3D space. For example, music processed according to this disclosure sounds more natural and less fatiguing, and is generally more desirable. Since most music is created in carefully designed recording studios, adding emulation of a studio space to music allows the listener to enjoy a sonic experience that is similar to that intended by the producer, recording engineer, and artist. Additionally, live venue spaces can also be emulated, allowing the listener to experience music as if she were hearing it in a dance club, concert hall, outdoor concert venue, or any other physical space which can be modeled.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the claims as described herein.
This application is a continuation-in-part application of patent application Ser. No. 16/927,792 filed on Jul. 17, 2020, entitled “SYSTEM AND METHOD FOR AN AUDIO REPRODUCTION DEVICE”, which claims priority to provisional patent application No. 62/873,803 filed on Jul. 12, 2019, entitled “SYSTEM AND METHOD FOR AN AUDIO REPRODUCTION DEVICE”. The contents of application Ser. No. 16/927,792 are incorporated herein by reference in their entirety. The contents of application No. 62/873,803 are incorporated herein by reference in their entirety.