With the advancement of technology, the use and popularity of electronic devices have increased considerably. Multiple electronic devices may be used to play audio at the same time. The devices may perform audio placement in order to output an audio sample at exactly the correct time across multiple devices. When audio samples are played at exactly the correct time, the audio is synchronized and a user may hear the audio at the same time. Disclosed herein are technical solutions to improve audio placement.
For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
Multiple electronic devices may be used to play audio at the same time. The devices may perform audio placement in order to output an audio sample at exactly the correct time across multiple devices. During audio playback, a device may divide audio data into frames of audio data, with each frame of audio data including a fixed number of audio samples (e.g., 8 ms of audio data). The device may add the frames of audio data to an audio pipeline to be output by a speaker and the speaker may output a single audio sample at a time. When audio samples are played at exactly the correct time by each of the multiple devices, the audio is synchronized and a user may hear audio from each of the devices at the same time.
Some devices perform audio placement using a processor at the end of the audio pipeline. For example, a device may include timing information (e.g., a timestamp) with each frame of audio data to enable a processor located just before the speaker in the audio pipeline to perform the audio placement. Due to the proximity between the processor and the speaker in the audio pipeline, there is a fixed delay between the processor sending the audio sample and the audio sample being output by the speaker, enabling the processor to precisely control when the audio sample is output. For example, if the processor is about to output a first audio sample of a frame at time 1000 μs, but a corresponding timestamp indicates that the frame should start at time 1020 μs, the processor may add audio samples before the first audio sample in order to delay the frame so that the speaker outputs the first audio sample at the correct time of 1020 μs. Alternatively, if the processor is expecting to output the first audio sample at 1030 μs but a timestamp indicates that the first audio sample should start at time 1020 μs, the processor may skip audio samples so that the speaker outputs the first audio sample at the correct time of 1020 μs. Thus, the processor may output the audio samples at the correct time based on the timing information associated with each frame of audio data.
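For illustration, the timing comparison described above can be sketched in C. The function and parameter names below are hypothetical (they do not come from the disclosure), and the sketch assumes a 48 kHz sample rate:

```c
#include <stdint.h>

#define SAMPLE_PERIOD_NS 20833  /* one 48 kHz sample = 1e9 / 48000 ns */

/* Positive result: insert that many samples to delay the frame.
 * Negative result: skip that many samples to advance the frame. */
int32_t placement_adjustment(int64_t now_ns, int64_t frame_start_ns)
{
    int64_t error_ns = frame_start_ns - now_ns;
    /* Round to the nearest whole sample; e.g., now = 1000 us and a
     * target of 1020 us yields +1 (insert one sample). */
    if (error_ns >= 0)
        return (int32_t)((error_ns + SAMPLE_PERIOD_NS / 2) / SAMPLE_PERIOD_NS);
    return (int32_t)((error_ns - SAMPLE_PERIOD_NS / 2) / SAMPLE_PERIOD_NS);
}
```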
Other devices perform audio placement using a processor at the beginning of the audio pipeline. For example, some devices may not be able to include timing information with each frame of audio data. If the size of the audio pipeline is known, the processor at the beginning of the audio pipeline may perform audio placement by controlling when an audio sample is added to the audio pipeline. For example, the processor may add audio samples in order to output the next audio sample at a later time or may skip audio samples in order to output the next audio sample at an earlier time. Thus, when the size of the audio pipeline is known, the processor may precisely control when the audio sample is output by the speaker. However, the device may not know the size of the audio pipeline and may therefore not know when an audio sample added to the audio pipeline will actually be output by the speaker.
To improve audio placement and/or synchronization of audio output by separate devices, devices, systems and methods are disclosed that may determine a playback delay between when an audio sample is added to an audio pipeline and when the audio sample is output from the audio pipeline to a speaker. For example, a first component may generate a test signal corresponding to a saturation threshold and may send the test signal to a second component at a first time, followed by blank audio samples. The second component may detect that an audio sample exceeds the saturation threshold at a second time, generate a timestamp and send the timestamp to the first component. The first component may determine a playback delay between the first time and the second time and may also determine a number of blank audio samples sent between the first time and the second time, which corresponds to a number of audio samples currently in the audio pipeline. Based on the playback delay and the number of audio samples in the audio pipeline, the system may add audio samples to the audio pipeline in order to output the next audio sample at a later time or may skip audio samples in the audio pipeline in order to output the next audio sample at an earlier time. Thus, the system may precisely control when the next audio sample is added to the audio pipeline so that the next audio sample is output by the speaker at the correct time.
During audio playback, the device 102 may divide audio data into frames of audio data, with each frame of audio data including a fixed number of audio samples (e.g., 8 ms of audio data). For example, when the speaker(s) 134 operate using a 48 kHz clock frequency, a frame of audio data may correspond to 8 ms of the audio data and may include 384 audio samples, with each audio sample corresponding to roughly 21 μs (e.g., 20.833 μs) of the audio data. The device 102 may send the audio data to the speaker(s) 134 by adding the frames of audio data to an audio pipeline to be output by the speaker(s) 134 and the speaker(s) 134 may output a single audio sample at a time. The audio pipeline may include multiple processes and/or multiple tasks, each of which having a buffer of unknown size. Therefore, the device 102 may not know a size of the audio pipeline.
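The frame arithmetic in this example follows directly from the sample rate; a minimal sketch of the constants, assuming the 48 kHz clock and 8 ms frames described above:

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ    48000u
#define FRAME_MS          8u
/* 48,000 samples/s * 0.008 s = 384 samples per frame */
#define SAMPLES_PER_FRAME (SAMPLE_RATE_HZ * FRAME_MS / 1000u)

/* 1,000,000 us / 48,000 samples ~= 20.833 us per audio sample */
static const double SAMPLE_PERIOD_US = 1e6 / (double)SAMPLE_RATE_HZ;
```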
As discussed above, audio placement may be performed by a first component associated with a beginning of the audio pipeline or a second component associated with an end of the audio pipeline. For example, if timing information was included with each frame of audio data, a digital signal processor (DSP) 124 located just before the speaker(s) 134 in the audio pipeline may perform the audio placement.
The system 100 illustrated in
In order to control the speaker(s) 134 to play an audio sample at the correct time, the time aligner 140 may determine a playback delay between when the audio sample is added to the audio pipeline (e.g., sent to the speaker(s) 134 at a first time) and when the audio sample is output from the audio pipeline (e.g., output by the speaker(s) 134 at a second time), as discussed in greater detail below. In addition, the time aligner 140 may determine a number of audio samples currently in the audio pipeline. For example, the time aligner 140 may determine a number of audio samples added to the audio pipeline between the first time and the second time. Based on the playback delay and the number of audio samples in the audio pipeline, the device 102 may output a next audio sample to the speaker(s) 134 such that audio output generated by multiple devices 102 and/or speaker(s) 134 are time-aligned with each other. For example, the time aligner 140 may add audio samples (e.g., blank audio samples having a value of zero) to the audio pipeline in order to delay the next audio sample or may skip audio samples (e.g., remove audio samples from the audio pipeline) in order to play the next audio sample sooner, such that the next audio sample added to the audio pipeline is output by the speaker(s) 134 at a coordinated time.
In some examples, the system 100 may synchronize a first audio sample in a group of audio samples (e.g., audio data corresponding to a song) using the playback delay and/or the number of audio samples currently in the audio pipeline and may synchronize remaining audio samples in the group of audio samples using other techniques, discussed in greater detail below. However, the disclosure is not limited thereto and the system 100 may synchronize two or more audio samples of the group of audio samples based on the playback delay and/or the number of audio samples currently in the audio pipeline without departing from the disclosure.
In order to determine the playback delay, the time aligner 140 may generate an audio test signal and may add the audio test signal to the audio pipeline at a first time. After adding the audio test signal to the audio pipeline, the time aligner 140 may add blank audio frames to the audio pipeline. The DSP 124 may detect the audio test signal at a second time, generate a timestamp corresponding to the second time, and send the timestamp to the time aligner 140. After being detected, the audio test signals may be removed from the audio pipeline prior to the speaker(s) 134, resulting in minimal audible effects. Based on the timestamp, the time aligner 140 may determine the playback delay between the first time and the second time. In addition, the time aligner 140 may determine a number of the blank audio samples added to the audio pipeline between the first time and the second time.
Typically, audio samples received by the DSP 124 do not have a positive peak value exceeding a positive saturation threshold or a negative peak value below a negative saturation threshold. Thus, the time aligner 140 may generate the audio test signal to include a first audio sample having a peak value equal to a relatively large positive value (e.g., maximum positive value) and a second audio sample having a peak value equal to a relatively large negative value (e.g., maximum negative value). As a result, the DSP 124 may detect the first audio sample by detecting an audio sample having a peak value exceeding the positive saturation threshold and may detect the second audio sample by detecting an audio sample having a peak value that is below the negative saturation threshold.
To illustrate an example, the audio samples may have a value represented by a signed integer, having a negative maximum (e.g., −32767) and a positive maximum (e.g., 32767). Thus, a positive saturation threshold may correspond to the positive maximum (e.g., 32767) or a value near the positive maximum (e.g., 32000) and a negative saturation threshold may correspond to the negative maximum (e.g., −32767) or a value near the negative maximum (e.g., −32000). However, the disclosure is not limited thereto and values of the saturation thresholds may vary without departing from the disclosure. In some examples, the saturation thresholds may be lower (e.g., +/−28000) without departing from the disclosure, provided that the positive audio test signal exceeds the positive saturation threshold and the negative audio test signal is below the negative saturation threshold. Additionally or alternatively, the values of the audio samples may be represented by an unsigned integer without departing from the disclosure. For example, the negative saturation threshold may correspond to a value of zero, whereas the positive saturation threshold may correspond to a positive maximum (e.g., 65534).
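A minimal C sketch of generating and detecting such a test signal, assuming 16-bit signed samples and the example thresholds above (all names are hypothetical):

```c
#include <stdint.h>

#define POS_SAT_THRESHOLD  32000   /* near the positive maximum (32767) */
#define NEG_SAT_THRESHOLD -32000   /* near the negative maximum (-32767) */

/* Write the two-sample test signal: a maximally positive sample
 * followed by a maximally negative sample. */
void write_test_signal(int16_t *buf)
{
    buf[0] =  32767;   /* first audio sample: positive maximum */
    buf[1] = -32767;   /* second audio sample: negative maximum */
}

/* Scan a frame for a saturated sample; returns its index or -1. */
int find_test_signal(const int16_t *frame, int num_samples)
{
    for (int i = 0; i < num_samples; i++) {
        if (frame[i] >= POS_SAT_THRESHOLD || frame[i] <= NEG_SAT_THRESHOLD)
            return i;  /* precise position within the frame */
    }
    return -1;
}
```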
In some examples the DSP 124 may detect both the first audio sample and the second audio sample, but the disclosure is not limited thereto. Instead, the DSP 124 may detect only one of the first audio sample or the second audio sample without departing from the disclosure. For example, the audio test signal may be combined with positive audio samples having a positive value. Due to the positive value, a combination of the positive audio samples and the second audio sample may not have a value that is below the negative saturation threshold. However, a combination of the positive audio samples and the first audio sample would have a value that exceeds the positive saturation threshold. Similarly, the audio test signal may be combined with negative audio samples having a negative value. Due to the negative value, a combination of the negative audio samples and the first audio sample may not have a value that exceeds the positive saturation threshold. However, a combination of the negative audio samples and the second audio sample would have a value that is below the negative saturation threshold. Thus, including the first audio sample and the second audio sample increases a likelihood that the DSP 124 may detect the audio test signal.
In the examples described herein, the audio test signal includes the first audio sample and the second audio sample so that regardless of other audio samples combined with the audio test signal, the DSP 124 will detect at least one audio sample having a peak value exceeding the positive saturation threshold or below the negative saturation threshold. However, the disclosure is not limited thereto and the audio test signal may include the first audio sample and/or the second audio sample without departing from the disclosure. For example, the audio test signal may include the first audio sample or the second audio sample multiple times. Additionally or alternatively, while the examples described herein describe the audio test signal as including a positive audio sample followed by a negative audio sample, the disclosure is not limited thereto and the audio test signal may include the positive audio sample and/or the negative audio sample in any order without departing from the present disclosure.
The device 102 may include audio sources 112, such as a notification player 112a, a text-to-speech (TTS) player 112b and a media player 112c. The notification player 112a may generate notifications, such as beeping noises or other audio signals indicating an event. The TTS player 112b may generate text to speech, such as acknowledgement of voice commands, speech indicating a command being performed, responses to voice commands and/or other speech directed to a user. The media player 112c may generate audio corresponding to media such as videos, music, audiobooks, podcasts or the like. For example, the media player 112c may generate audio samples corresponding to the synchronized audio output by multiple devices 102 and/or speaker(s) 134.
Audio samples from the notification player 112a, the text-to-speech (TTS) player 112b and the media player 112c may be combined using an audio mixer 114 and the combined audio samples may be stored in an audio buffer 116. For example, the audio mixer 114 may combine audio samples received from the notification player 112a, the TTS player 112b and/or the media player 112c using saturating arithmetic. Thus, when a sum of the audio samples exceeds a positive saturation threshold, the audio mixer 114 clips the arithmetic at the positive saturation threshold and outputs the positive saturation threshold instead of wrapping around. Similarly, when a sum of the audio samples is below a negative saturation threshold, the audio mixer 114 clips the arithmetic at the negative saturation threshold and outputs the negative saturation threshold instead of wrapping around. The audio buffer 116 may store the combined audio samples and output the combined audio samples to the DSP 124.
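Saturating arithmetic of this kind might be implemented as in the following sketch, using the symmetric ±32767 range from the earlier example (not necessarily the audio mixer 114's actual implementation):

```c
#include <stdint.h>

/* Saturating addition of two 16-bit audio samples: the sum is clipped
 * at the saturation thresholds instead of wrapping around. */
int16_t saturating_add(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum >  32767) return  32767;   /* clip at positive saturation */
    if (sum < -32767) return -32767;   /* clip at negative saturation */
    return (int16_t)sum;
}
```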
As illustrated in
Audio placement may be performed by a time aligner 140 included in the device 102. The time aligner 140 may be configured to resample audio samples, add audio samples and/or drop audio samples, along with additional processing or other functionality. For example, in order to determine the playback delay, the time aligner 140 may generate the audio test signal. To do so, the time aligner 140 may receive audio samples from the media player 112c and may generate the first audio sample having the positive value and the second audio sample having the negative value based on the received audio samples. In some examples, the time aligner 140 may modify existing values of the received audio samples to generate the audio test signal. However, the disclosure is not limited thereto and the time aligner 140 may generate the audio test signals during blank audio samples (e.g., audio samples that do not include any data) or the like.
The time aligner 140 may send the audio test signal to the audio mixer 114 at a first time, followed by blank audio frames. Sending audio samples to the audio mixer 114 may correspond to adding the audio samples to the audio pipeline. In some examples, the time aligner 140 may generate a first timestamp corresponding to the first time. The audio mixer 114 may combine the audio test signal with other audio samples from the notification player 112a and/or the TTS player 112b. Due to the first audio sample being followed by the second audio sample, at least one of the combined audio samples is likely to be saturated (e.g., clipped at the positive saturation threshold or the negative saturation threshold). If peak values of the audio samples being combined with the audio test signal are small enough, a first combined audio sample may be saturated (e.g., clipped at the positive saturation threshold) in a positive direction and a second combined audio sample may be saturated in a negative direction (e.g., clipped at the negative saturation threshold).
The audio mixer 114 may output the combined audio samples to the audio buffer 116, which may output to the DSP 124. The DSP 124 may apply digital signal processing to the combined audio samples, such as performing equalization, range control and/or other processing. In addition, the DSP 124 may detect the audio test signal by detecting that at least one of the combined audio samples has a peak value above the positive saturation threshold or below the negative saturation threshold. Thus, the DSP 124 may detect the audio test signal within a frame of audio data and determine a precise position of the audio test signal within the frame of audio data (e.g., number of audio samples from a beginning of the frame to the audio test signal).
The DSP 124 may output the combined audio samples to the DMA engine 128, which may receive a frame of audio samples at a time and may sequentially output each of the audio samples to the speaker(s) 134 for audio playback. Thus, the DMA engine 128 may operate continuously, such that once the DMA engine 128 is finished with a first frame of audio data, the DMA engine 128 may request a second frame of audio data from the DSP 124. When the DMA engine 128 is finished with the first frame of audio data, the DMA engine 128 may generate an interrupt service routine (ISR) and send the ISR to the DSP 124.
The ISR indicates to the DSP 124 that the DMA engine 128 is ready to receive a second frame of audio data. In response to receiving the ISR, the DSP 124 may send the DMA engine 128 the second frame of audio data. In addition, the DSP 124 previously detected the audio test signal within the second frame of audio data and determined a precise position of the audio test signal within the second frame of audio data (e.g., number of audio samples from a beginning of the second frame to the audio test signal). As the rest of the audio pipeline is completely deterministic (e.g., the DMA engine 128 outputs audio samples sequentially to the speaker(s) 134 for audio playback), the DSP 124 may determine exactly when the audio test signal will be output by the speaker(s) 134 based on the precise position of the audio test signal within the second frame of audio data and the ISR, which indicates a beginning of the second frame.
After receiving the ISR, the DSP 124 may generate a second timestamp corresponding to a second time and may send the second timestamp to the time aligner 140. In some examples the second time corresponds to when the audio test signal will be output by the speaker(s) 134. For example, the DSP 124 may calculate the second time based on the ISR and the precise position of the audio test signal within the second frame. However, the disclosure is not limited thereto and the second time may instead correspond to when the ISR is received by the DSP 124. For example, the DSP 124 may send the second timestamp and position information of the audio test signal within the second frame to the time aligner 140 and the time aligner 140 may determine when the audio test signal will be output by the speaker(s) 134 without departing from the disclosure. Additionally or alternatively, the DSP 124 may send the position information (e.g., frame number and sample number within the frame associated with the audio test signal) and the DMA engine 128 may send the second timestamp in response to the ISR.
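Under the assumption that the pipeline after the DMA engine 128 is deterministic, the output time of the test signal reduces to the arithmetic sketched below; the fixed delay parameter is a hypothetical stand-in for the constant latency between the DMA engine 128 and the speaker(s) 134:

```c
#include <stdint.h>

#define SAMPLE_PERIOD_NS 20833  /* one 48 kHz sample, 1e9 / 48000 ns */

/* Output time of the detected test signal: the ISR marks the start of
 * the frame, the position converts to an offset within the frame, and
 * the fixed delay covers the deterministic path to the speaker(s). */
int64_t test_signal_output_time(int64_t isr_time_ns,
                                int32_t position_in_frame,
                                int64_t fixed_delay_ns)
{
    return isr_time_ns
         + (int64_t)position_in_frame * SAMPLE_PERIOD_NS
         + fixed_delay_ns;
}
```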
Based on the first timestamp and the second timestamp, the time aligner 140 may determine the playback delay. For example, the time aligner 140 may subtract the first timestamp from the second timestamp to determine the playback delay. In addition, the time aligner 140 may determine a number of audio samples currently in the audio pipeline by determining a number of blank audio frames added to the audio pipeline (e.g., sent from the time aligner 140 to the audio mixer 114) between the first time (e.g., first timestamp) and the second time (e.g., second timestamp). Based on the playback delay and the number of audio samples in the audio pipeline, the time aligner 140 may add audio samples (e.g., blank audio samples having a value of zero) to the audio pipeline in order to delay the next audio sample or may skip audio samples (e.g., remove audio samples from the audio pipeline) in order to play the next audio sample sooner, such that the next audio sample added to the audio pipeline is output by the speaker(s) 134 at a specific time.
As illustrated in
The DSP 124 may detect (156) the audio test signal at a second time, may generate (158) a second timestamp corresponding to the second time, and may send (160) the second timestamp to the time aligner 140.
The time aligner 140 may determine (162) the playback delay. For example, the time aligner 140 may determine a difference between the first timestamp and the second timestamp (e.g., difference between the first time and the second time). The time aligner 140 may determine (164) the number of blank samples added to the audio pipeline after the audio test signal (e.g., between the first time and the second time). Using the playback delay and the number of blank samples, the time aligner 140 may synchronize (166) the audio. For example, the time aligner 140 may add audio samples (e.g., blank audio samples having a value of zero) to the audio pipeline in order to delay the next audio sample or may skip audio samples (e.g., remove audio samples from the audio pipeline) in order to play the next audio sample sooner, such that the next audio sample added to the audio pipeline is output by the speaker(s) 134 at a specific time.
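Steps (162) and (164) amount to simple arithmetic, sketched below with hypothetical names; the disclosure does not specify this structure:

```c
#include <stdint.h>

/* Results of the test-signal exchange (hypothetical structure). */
typedef struct {
    int64_t playback_delay_ns;    /* step 162: second minus first timestamp */
    int32_t samples_in_pipeline;  /* step 164: blanks sent in between */
} pipeline_measurement;

pipeline_measurement measure_pipeline(int64_t first_timestamp_ns,
                                      int64_t second_timestamp_ns,
                                      int32_t blank_samples_sent)
{
    pipeline_measurement m;
    m.playback_delay_ns   = second_timestamp_ns - first_timestamp_ns;
    m.samples_in_pipeline = blank_samples_sent; /* plus the test samples */
    return m;
}
```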
In order to synchronize audio data between multiple speaker(s) 134 and/or multiple devices 102, the system 100 may also compensate for differences in clock frequencies between different integrated circuits. For example, a first device (e.g., first wireless speaker) may have a first clock frequency associated with a first audio pipeline and a second device (e.g., second wireless speaker) may have a second clock frequency associated with a second audio pipeline. By synchronizing the first clock frequency and the second clock frequency, the first device and/or the second device may perform audio placement and output audio at the same time, despite separate audio pipelines. In some examples, the first device and/or the second device may use the techniques disclosed herein to perform audio placement. Additionally or alternatively, a first processor of the device 102 may have a first clock frequency and a second processor of the device 102 may have a second clock frequency. Thus, two processors associated with a single audio pipeline may have two separate clock frequencies, and converting between the clock frequencies may be included as part of audio placement.
A clock frequency corresponds to a timing clock signal produced by a crystal oscillator, such as an electronic oscillator circuit that uses the mechanical resonance of a vibrating crystal of piezoelectric material to create an electrical signal with a precise frequency. An integrated circuit may use the clock frequency as a measure of time, such as by timing operations of the integrated circuit based on the cycles of the clock frequency. For example, the electrical signal may be used to increment a counter that counts each “tick” of the electrical signal, such as a high resolution timer configured to provide high-resolution elapsed times. A clock frequency of 48 kHz corresponds to 48,000 ticks of the electrical signal per second. Thus, the high resolution timer may have a first value (e.g., 24000) at a first time and may generate a first timestamp indicating the first value. Over time, the high resolution timer may increment with each tick of the electrical signal (e.g., 24001, 24002, etc.), may have a second value (e.g., 36000) at a second time and may generate a second timestamp indicating the second value. The first timestamp and the second timestamp may be compared to determine a difference in time between the first time and the second time (e.g., 12,000 ticks at 48 kHz, or 250 ms). If there is a slight difference between nominally identical clock frequencies, that difference can result in some devices operating faster or slower than others.
A problem for generating synchronized audio occurs when a first integrated circuit has a different sampling rate (e.g., clock frequency) than a second integrated circuit. This can occur between two devices, between two integrated circuits included in a single device, and/or between a device and a loudspeaker (e.g., the loudspeaker may have its own crystal oscillator that provides an independent clock signal). Thus, the audio data that the device transmits to the loudspeaker may be output at a subtly different sampling rate by the loudspeaker, such that a playback rate of the audio is subtly different than the audio data that had been sent to the loudspeaker. For example, consider loudspeakers that transfer audio data using a 48 kHz sampling rate (i.e., 48,000 digital samples per second of analog audio signal). An actual rate based on a first integrated circuit's clock signal might be 48,000.001 samples per second, whereas another integrated circuit's clock signal might operate at an actual rate of 48,000.002 samples per second. This difference of 0.001 samples per second between actual frequencies is referred to as a frequency offset. The consequence of a frequency offset is an accumulated drift in the timing between the integrated circuits over time. Uncorrected, after one thousand seconds, the accumulated drift is an entire sample of difference between the integrated circuits.
Using techniques known to one of skill in the art, the frequency offset and/or drift between different sampling rates can be measured and corrected. Thus, during playback of synchronized audio (e.g., while the speaker(s) 134 are producing audio corresponding to audio samples), the device 102 may add or remove a certain number of digital samples per second in order to compensate for the frequency offset between a local sampling rate (e.g., first clock frequency associated with the device 102) and a master sampling rate (e.g., second clock frequency associated with a remote device). For example, the device 102 may add at least one sample per second when the frequency offset is positive and may remove at least one sample per second when the frequency offset is negative. Therefore, first audio samples sent to a first speaker 134 may be aligned with second audio samples sent to a second speaker 134, such that the first audio samples are output by the first speaker and the second audio samples are output by the second speaker at roughly the same time (e.g., substantially simultaneously).
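The correction interval follows from the frequency offset; a small sketch using the rates from the earlier example, where a 0.001 samples-per-second offset implies one corrective sample every 1,000 seconds:

```c
#include <math.h>

/* Seconds between one-sample corrections: drift accumulates to a full
 * sample every 1/|offset| seconds, so one sample is added (positive
 * offset) or removed (negative offset) at that interval. */
double seconds_per_sample_correction(double local_rate_hz,
                                     double master_rate_hz)
{
    double offset_hz = master_rate_hz - local_rate_hz;
    /* e.g., 48,000.002 vs. 48,000.001 -> 0.001 -> every 1,000 s */
    return 1.0 / fabs(offset_hz);
}
```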
In order to compensate for the frequency offset and/or the drift between sampling rates on different devices, the device 102 may perform audio placement, as illustrated in
In some examples, the device 102 may compensate for a frequency offset and/or drift between a first processor and a second processor included in the device 102. For example, the device 102 may include an Advanced RISC Machines (ARM) processor 210 associated with a beginning of the audio pipeline and a digital signal processor (DSP) 220 associated with an end of the audio pipeline, as illustrated in
The ARM processor 210 may be associated with a first clock frequency (e.g., 1 GHz) and the DSP 220 may be associated with a second clock frequency (e.g., 800 MHz). In addition to operating at different frequencies, the first clock frequency and the second clock frequency may have other variations that result in timer offset, frequency offset and/or drift. While the ARM processor 210 and the DSP 220 communicate, the second clock frequency is not visible to the ARM processor 210 and the first clock frequency is not visible to the DSP 220. Thus, a first timestamp generated by the ARM processor 210 (e.g., using a first high resolution timer) cannot be compared to a second timestamp generated by the DSP 220 (e.g., using a second high resolution timer) without first synchronizing the clock frequencies and compensating for a timer offset (e.g., skew) between the first high resolution timer and the second high resolution timer, which may vary over time based on the frequency offset and/or drift.
To synchronize the clock frequencies and compensate for the timer offset, the frequency offset and/or drift between the ARM processor 210 and the DSP 220, the device 102 may generate a first timestamp using the first clock frequency and a second timestamp using the second clock frequency at one or more points in time. The device 102 may perform a timestamp exchange between the ARM processor 210 and the DSP 220 during an initialization step (e.g., when the device 102 is powered on) to determine the timer offset, the frequency offset and/or drift. After determining the timer offset, the frequency offset and/or drift, a timestamp generated by the DSP 220 (e.g., associated with the second clock frequency) may be translated or converted to a timestamp associated with the ARM processor 210 (e.g., associated with the first clock frequency) or vice versa. For example, a timestamp converter 242 illustrated in
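One plausible form of such a conversion is sketched below, assuming a reference timestamp pair captured during the initialization exchange and a measured frequency ratio; the structure and names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical calibration from the initialization timestamp exchange:
 * a DSP/ARM timestamp pair captured at the same instant, plus the
 * measured ratio between the two clock frequencies. */
typedef struct {
    int64_t dsp_ref_ticks;   /* DSP timer value at the exchange */
    int64_t arm_ref_ticks;   /* ARM timer value at the same instant */
    double  arm_per_dsp;     /* ARM ticks elapsed per DSP tick */
} clock_calibration;

/* Translate a DSP-domain timestamp into the ARM processor's domain by
 * removing the timer offset and rescaling by the frequency ratio. */
int64_t dsp_to_arm(const clock_calibration *cal, int64_t dsp_ticks)
{
    int64_t elapsed_dsp = dsp_ticks - cal->dsp_ref_ticks;
    return cal->arm_ref_ticks + (int64_t)((double)elapsed_dsp * cal->arm_per_dsp);
}
```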
The media player 112c may send audio samples to the time aligner 140. The time aligner 140 may resample the audio samples, add audio samples and/or drop audio samples. In addition, the time aligner 140 may generate the audio test signals and/or the blank audio samples. The time aligner 140 may output the audio samples, including the audio test signals and/or the blank audio samples, to the audio mixer 114.
Audio samples from the notification player 112a, the text-to-speech (TTS) player 112b and the time aligner 140 may be combined by the audio mixer 114. For example, the audio mixer 114 may combine audio samples received from the notification player 112a, the TTS player 112b and/or the time aligner 140 using saturating arithmetic. Thus, when a sum of the audio samples exceeds a positive saturation threshold, which may correspond to the first audio sample included in the audio test signal, the audio mixer 114 clips the arithmetic at the positive saturation threshold and outputs the positive saturation threshold instead of wrapping around. Similarly, when a sum of the audio samples is below a negative saturation threshold, which may correspond to the second audio sample included in the audio test signal, the audio mixer 114 clips the arithmetic at the negative saturation threshold and outputs the negative saturation threshold instead of wrapping around.
The audio mixer 114 may output the combined audio samples to the audio buffer 116, which may store the combined audio samples and output the combined audio samples to an audio proxy 218. The audio proxy 218 may be configured as an interface between the ARM processor 210 and the DSP 220. For example, the audio proxy 218 may receive audio samples, divide the audio samples into frames of audio data and send the audio samples to the DSP 220.
As illustrated in
The ring buffer 222 may output frames of audio samples to the ping-pong buffer 226 included in the DSP 220. For example, the ping-pong buffer 226 may be configured to output audio samples corresponding to a first frame of audio data stored in a first portion of the ping-pong buffer 226 while receiving and storing audio samples corresponding to a second frame of audio data in a second portion of the ping-pong buffer 226. After completely outputting the first frame of audio data from the first portion, the ping-pong buffer 226 may output audio samples corresponding to the second frame of audio data stored in the second portion while receiving and storing audio samples corresponding to a third frame of audio data in the first portion. Thus, the ping-pong buffer 226 bounces between the first portion and the second portion, receiving and outputting one frame of audio data at a time.
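A simplified sketch of a ping-pong buffer follows; a real implementation would hand off halves to the DMA engine 128 asynchronously rather than swapping synchronously as shown, and all names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define SAMPLES_PER_FRAME 384

/* Minimal ping-pong buffer: one half is drained toward the DMA engine
 * while the other half is being filled with the next frame. */
typedef struct {
    int16_t half[2][SAMPLES_PER_FRAME];
    int     fill_idx;   /* half currently being written */
} pingpong_buffer;

/* Store the incoming frame in the idle half, then bounce to the other
 * half; returns a pointer to the frame now ready for output. */
const int16_t *pingpong_swap(pingpong_buffer *pp, const int16_t *frame_in)
{
    memcpy(pp->half[pp->fill_idx], frame_in,
           SAMPLES_PER_FRAME * sizeof(int16_t));
    const int16_t *out = pp->half[pp->fill_idx];
    pp->fill_idx ^= 1;  /* bounce between the two portions */
    return out;
}
```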
The DSP 124 may copy audio samples from the ring buffer 222 to the ping-pong buffer 226. The DSP 124 may apply digital signal processing to the combined audio samples, such as performing equalization, range control and/or other processing. In addition, the DSP 124 may detect the audio test signal by detecting that at least one of the combined audio samples has a peak value above the positive saturation threshold or below the negative saturation threshold. Thus, the DSP 124 may detect the audio test signal within a frame of audio data and determine a precise position of the audio test signal within the frame of audio data (e.g., number of audio samples from a beginning of the frame to the audio test signal).
The ping-pong buffer 226 may output frames of audio data to the DMA engine 128 one frame at a time. The DMA engine 128 may receive a single frame of audio samples from the ping-pong buffer 226 and may sequentially output each of the audio samples included in the frame. The DMA engine 128 may output audio samples and/or frames of audio samples to an Integrated Interchip Sound (I2S) Driver 230 that sends the audio samples to a digital-to-analog converter (DAC) 232 associated with the speaker(s) 134 for audio playback. The DAC 232 may convert the audio samples from a digital signal to an analog signal and the speaker(s) 134 may produce audible sound by driving a “voice coil” with an amplified version of the analog signal.
The DMA engine 128 may operate continuously, such that once the DMA engine 128 is finished with a first frame of audio data, the DMA engine 128 may request a second frame of audio data from the ping-pong buffer 226. For example, when the DMA engine 128 is finished with the first frame of audio data, the DMA engine 128 may generate an interrupt service routine (ISR) and send the ISR to the DSP 124 requesting the second frame of audio data.
The ISR indicates to the DSP 124 that the DMA engine 128 is ready to receive a second frame of audio data, which provides timing information associated with output generated by the speaker(s) 134. For example, as the rest of the audio pipeline is completely deterministic (e.g., audio samples are sequentially sent from the DMA engine 128 to the speaker(s) 134 for audio playback), the DSP 124 may determine a fixed delay from when an audio sample is output by the DMA engine 128 to when the audio sample is output by the speaker(s) 134. Thus, the DSP 124 may determine exactly when a first audio sample in the second frame will be output by the speaker(s) 134 based on the ISR. In addition, the DSP 124 previously detected the audio test signal within the second frame of audio data and determined a precise position of the audio test signal within the second frame of audio data (e.g., number of audio samples from a beginning of the second frame to the audio test signal). Therefore, the DSP 124 may determine exactly when the audio test signal will be output by the speaker(s) 134 based on the ISR and the precise position of the audio test signal within the second frame of audio data.
After receiving the ISR, the DSP 124 may generate a second timestamp (e.g., TimestampDSP) corresponding to a second time and may send the second timestamp to the time aligner 140. In some examples, the second time corresponds to when the audio test signal will be output by the speaker(s) 134. For example, the DSP 124 may calculate the second time based on the ISR and the precise position of the audio test signal within the second frame. However, the disclosure is not limited thereto and the second time may instead correspond to when the ISR is received by the DSP 124. For example, the DSP 124 may send the second timestamp and position information of the audio test signal within the second frame to the time aligner 140 and the time aligner 140 may determine when the audio test signal will be output by the speaker(s) 134 without departing from the disclosure. Additionally or alternatively, the DSP 124 may send the position information (e.g., frame number and sample number within the frame associated with the audio test signal) and the DMA engine 128 may send the second timestamp in response to the ISR.
In the example illustrated in
The timestamp converter 242 may send the third timestamp and/or additional information to the time aligner 140. The timestamp converter 242 may be included in the DSP 220 and/or the ARM processor 210 without departing from the disclosure. In some examples, the timestamp converter 242 may determine when the audio test signal will be output by the speaker(s) 134 without departing from the disclosure. For example, the DSP 124 may send the position information (e.g., frame number and sample number within the frame associated with the audio test signal) to the timestamp converter 242 in response to detecting the audio test signal and the DMA engine 128 may send the second timestamp to the timestamp converter 242 in response to the ISR.
Audio placement may be performed by the time aligner 140, as discussed above. As the functionality of the time aligner 140 is identical between
In the examples described herein, the audio test samples 312 include the first audio sample and the second audio sample so that regardless of other audio samples combined with the audio test samples 312, the DSP 124 will detect at least one audio sample having a peak value exceeding the positive saturation threshold or below the negative saturation threshold. However, the disclosure is not limited thereto and the audio test samples 312 may include the first audio sample and/or the second audio sample without departing from the disclosure. In some examples, the audio test samples 312 may include two or more of the first audio sample and/or two or more of the second audio sample without departing from the disclosure. For example, the audio test samples 312 may include a first audio sample having a peak value of a maximum value (e.g., positive saturation threshold), a second audio sample having a peak value of a minimum value (e.g., negative saturation threshold), a third audio sample having a peak value of the maximum value and a fourth audio sample having a peak value of the minimum value.
The audio samples 410 may be output to the audio mixer 114 at a first time and the time aligner 140 may generate a first timestamp 430 corresponding to the first time. The audio mixer 114 may combine the audio samples 410 with other audio samples to generate audio samples 420 (e.g., audio samples DSP61-DSP70). Thus, audio sample Aligner1 (e.g., the positive test audio sample) in the audio samples 410 corresponds to audio sample DSP68 in the audio samples 420, as shown by the grey highlighting.
The DSP 124 may detect audio samples having a value greater than the positive saturation threshold or below the negative saturation threshold. Thus, the audio samples 420 may be output to the DSP 124 and the DSP 124 may detect that the audio sample DSP68 has a value greater than the positive saturation threshold at a second time. The DSP 124 may generate a second timestamp 432 corresponding to the second time.
As illustrated in
In addition to the playback delay 440, the time aligner 140 may determine the number of blank audio samples 442 (e.g., audio samples Aligner3-Aligner7) that were added to the audio pipeline between the first time and the second time. Thus, the time aligner 140 may determine that the playback delay 440 corresponded to the number of blank audio samples 442 (e.g., five blank audio samples) for a total number of audio samples in the audio pipeline 444 (e.g., seven audio samples) when including the audio test samples 412. Based on the playback delay 440 and the total number of audio samples in the pipeline 444, the time aligner 140 may add audio samples (e.g., blank audio samples having a value of zero) to the audio pipeline in order to delay the next audio sample or may skip audio samples (e.g., remove audio samples from the audio pipeline) in order to play the next audio sample sooner, as discussed in greater detail above. Thus, the device 102 may add audio samples or skip audio samples so that the next audio sample added to the audio pipeline is output by the speaker(s) 134 at a specific time.
As illustrated in
In order to precisely output the first audio sample (e.g., Audio1) of the audio data, the device 102 may determine a time difference 534 between the original timing 530 and the synchronized timing 532 and determine a number of audio samples corresponding to the time difference 534. For example, the device 102 may determine that the time difference 534 corresponds to added audio samples 536, which are included in the synchronized buffer 520 (e.g., audio buffer 116 after adding the audio samples). Thus, the device 102 may output the original blank audio samples (Blank20-Blank21), followed by five blank added audio samples 536 (Blank22-Blank26) and then output the first audio sample (e.g., Audio1) of the audio data, which is output at the synchronized timing 532. While this example illustrates the device 102 adding blank audio samples, the disclosure is not limited thereto and the device 102 may add any type of audio samples without departing from the disclosure. For example, the device 102 may duplicate an audio sample and/or may output audio samples having values other than zero without departing from the disclosure.
As illustrated in
In order to precisely output the first audio sample, the device 102 may determine a time difference 544 between the original timing 540 and the synchronized timing 542 and determine a number of audio samples corresponding to the time difference 544. For example, the device 102 may determine that the time difference 544 corresponds to skipped audio samples 546 (e.g., Blank25-Blank27), which are removed from the synchronized buffer 522 (e.g., audio buffer 116 after removing the audio samples). Thus, the device 102 may output the blank audio samples (e.g., Blank20-Blank24), followed by the first audio sample (e.g., Audio1) of the audio data, which is output at the synchronized timing 542.
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
The time aligner 140 may determine (816) a difference between the first number and the second number and may determine (818) whether the difference is positive. If the difference is positive, the time aligner 140 may add (820) to the audio pipeline a third number of audio samples corresponding to the difference. If the difference is negative, the time aligner 140 may remove (822) from the audio pipeline the third number of audio samples corresponding to the difference. The time aligner 140 may then add (824) the first audio data to the audio pipeline. Thus, the device 102 may add or remove audio samples from the audio pipeline so that a first audio sample of the first audio data is output by the speaker(s) 134 at a specific time.
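Steps (816) through (824) might be sketched as follows; the pipeline primitives are assumptions for illustration, not functions named by the disclosure:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pipeline primitives (assumptions, not from the source). */
static void add_blank_samples(int32_t n)  { printf("pad %d blanks\n", n); }
static void remove_samples(int32_t n)     { printf("trim %d samples\n", n); }
static void queue_audio(const int16_t *s, int32_t n)
{
    (void)s;
    printf("queue %d samples\n", n);
}

/* Compare the required and actual number of samples in the pipeline,
 * pad or trim by the difference, then queue the first audio data. */
void align_and_queue(int32_t required, int32_t current,
                     const int16_t *first_audio, int32_t len)
{
    int32_t diff = required - current;  /* step 816 */
    if (diff > 0)                       /* step 818: positive? */
        add_blank_samples(diff);        /* step 820: delay the audio */
    else if (diff < 0)
        remove_samples(-diff);          /* step 822: advance the audio */
    queue_audio(first_audio, len);      /* step 824 */
}
```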
The device 102 may include an audio output device for producing sound, such as speaker(s) 134, and the audio output device may be integrated into the device 102 or may be separate. The device 102 may be an electronic device capable of receiving audio data and outputting the audio data using precise timing. For example, the device 102 may output the audio data in synchronization with audio data output by other speaker(s) and/or device(s). Examples of electronic devices may include computers (e.g., a desktop, a laptop, a server or the like), portable devices (e.g., smart phone, tablet, speech controlled devices or the like), media devices (e.g., televisions, video game consoles, or the like), or the like. The device 102 may also be a component of any of the abovementioned devices or systems.
As illustrated in
The device 102 may include one or more controllers/processors 904, which may each include a central processing unit (CPU) for processing data and computer-readable instructions, and a memory 906 for storing data and instructions. The memory 906 may include volatile random access memory (RAM), non-volatile read only memory (ROM), non-volatile magnetoresistive memory (MRAM) and/or other types of memory. The device 102 may also include a data storage component 908, for storing data and controller/processor-executable instructions (e.g., instructions to perform the algorithms illustrated in
The device 102 includes input/output device interfaces 910. A variety of components may be connected through the input/output device interfaces 910, such as the speaker 134, microphone(s) (not illustrated), wireless speaker(s) (not illustrated) or the like. The input/output interfaces 910 may include digital to analog (D/A) converters for converting audio signals into an analog current to drive the speaker(s) 134, if the speaker(s) 134 are integrated with or hardwired to the device 102. However, if the speaker(s) 134 are independent, the D/A converters will be included with the speaker(s) 134, and may be clocked independent of the clocking of the device 102 (e.g., conventional Bluetooth speakers). Likewise, the input/output interfaces 910 may include analog to digital (A/D) converters for converting output of the microphone(s) into digital signals if the microphones are integrated with or hardwired directly to the device 102. If the microphone(s) are independent, the A/D converters will be included with the microphone(s), and may be clocked independent of the clocking of the device 102.
The input/output device interfaces 910 may also include an interface for an external peripheral device connection such as universal serial bus (USB), FireWire, Thunderbolt or other connection protocol. The input/output device interfaces 910 may also be configured to connect to one or more network(s) 999 via an Ethernet port, a wireless local area network (WLAN) (such as WiFi) radio, Bluetooth, and/or wireless network radio, such as a radio capable of communication with a wireless communication network such as a Long Term Evolution (LTE) network, WiMAX network, 3G network, etc. The network(s) 999 may include a local or private network or may include a wide network such as the internet. Through the network(s) 999, the system 100 may be distributed across a networked environment.
The device 102 further includes a synchronization module 924, which may comprise processor-executable instructions stored in storage 908 to be executed by controller(s)/processor(s) 904 (e.g., software, firmware, hardware, or some combination thereof). For example, components of the synchronization module 924 may be part of a software application running in the foreground and/or background on the device 102. The synchronization module 924 may control the device 102 as discussed above, for example with regard to
Executable computer instructions for operating the device 102 and its various components may be executed by the controller(s)/processor(s) 904, using the memory 906 as temporary “working” storage at runtime. The computer instructions may be stored in a non-transitory manner in non-volatile memory 906, storage 908, or an external device. Alternatively, some or all of the executable instructions may be embedded in hardware or firmware in addition to or instead of software.
The components of the device 102, as illustrated in
The concepts disclosed herein may be applied within a number of different devices and computer systems, including, for example, general-purpose computing systems, server-client computing systems, mainframe computing systems, telephone computing systems, laptop computers, cellular phones, personal digital assistants (PDAs), tablet computers, video capturing devices, video game consoles, speech processing systems, distributed computing environments, etc. Thus the modules, components and/or processes described above may be combined or rearranged without departing from the scope of the present disclosure. The functionality of any module described above may be allocated among multiple modules, or combined with a different module. As discussed above, any or all of the modules may be embodied in one or more general-purpose microprocessors, or in one or more special-purpose digital signal processors or other dedicated microprocessing hardware. One or more modules may also be embodied in software implemented by a processing unit. Further, one or more of the modules may be omitted from the processes entirely.
The above embodiments of the present disclosure are meant to be illustrative. They were chosen to explain the principles and application of the disclosure and are not intended to be exhaustive or to limit the disclosure. Many modifications and variations of the disclosed embodiments may be apparent to those of skill in the art. Persons having ordinary skill in the field of computers and/or digital audio should recognize that components and process steps described herein may be interchangeable with other components or steps, or combinations of components or steps, and still achieve the benefits and advantages of the present disclosure. Moreover, it should be apparent to one skilled in the art, that the disclosure may be practiced without some or all of the specific details and steps disclosed herein.
Embodiments of the disclosed system may be implemented as a computer method or as an article of manufacture such as a memory device or non-transitory computer readable storage medium. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure. The computer readable storage medium may be implemented by a volatile computer memory, non-volatile computer memory, hard drive, solid-state memory, flash drive, removable disk and/or other media.
Embodiments of the present disclosure may be performed in different forms of software, firmware and/or hardware. Further, the teachings of the disclosure may be performed by an application specific integrated circuit (ASIC), field programmable gate array (FPGA), or other component, for example.
Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
Conjunctive language such as the phrase “at least one of X, Y and Z,” unless specifically stated otherwise, is to be understood with the context as used in general to convey that an item, term, etc. may be either X, Y, or Z, or a combination thereof. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of X, at least one of Y and at least one of Z to each be present.
As used in this disclosure, the term “a” or “one” may include one or more items unless specifically stated otherwise. Further, the phrase “based on” is intended to mean “based at least in part on” unless specifically stated otherwise.