The presently disclosed subject matter relates to playback of digital audio, and in particular to implementation of systems for simultaneous playback of digital audio on multiple speakers.
Problems of implementation in systems of digital audio playback have been recognized in the conventional art and various techniques have been developed to provide solutions.
According to a further aspect of the presently disclosed subject matter there is provided a computerized microphone-equipped audio playback device comprising a processing circuitry, the processing circuitry comprising a speaker and microphone, and being configured to:
a) receive data indicative of digital audio; and
b) play the digital audio on a speaker, in accordance with a playback delay,
a) receiving, by a processor of a first microphone-equipped playback device, data indicative of digital audio; and
b) playing the digital audio, by the processor, on a speaker of the first microphone-equipped playback device, in accordance with a playback delay,
According to another aspect of the presently disclosed subject matter there is provided a computer program product comprising a non-transitory computer readable storage medium retaining program instructions, which, when read by a processing circuitry, cause the processing circuitry to perform a computerized method of audio playback, the method comprising:
a) receiving, by a processor of a first microphone-equipped playback device, data indicative of digital audio; and
b) playing the digital audio, by the processor, on a speaker of the first microphone-equipped playback device, in accordance with a playback delay,
In order to understand the invention and to see how it can be carried out in practice, embodiments will be described, by way of non-limiting examples, with reference to the accompanying drawings, in which:
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the presently disclosed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the presently disclosed subject matter.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing”, “computing”, “generating”, “playing”, “detecting”, “noting”, “calculating”, “receiving”, “providing”, “obtaining”, “measuring”, “communicating” or the like, refer to the action(s) and/or process(es) of a computer that manipulate and/or transform data into other data, said data represented as physical, such as electronic, quantities and/or said data representing the physical objects. The term “computer” should be expansively construed to cover any kind of hardware-based electronic device with data processing capabilities including, by way of non-limiting example, the processing circuitry disclosed in the present application.
The terms “non-transitory memory” and “non-transitory storage medium” used herein should be expansively construed to cover any volatile or non-volatile computer memory suitable to the presently disclosed subject matter.
The operations in accordance with the teachings herein may be performed by a computer specially constructed for the desired purposes or by a general-purpose computer specially configured for the desired purpose by a computer program stored in a non-transitory computer-readable storage medium.
Embodiments of the presently disclosed subject matter are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the presently disclosed subject matter as described herein.
Attention is now directed to
In recent years, “smart” speakers have become increasingly popular. A smart speaker is, in some examples, a wireless device (that includes a processor) which communicates with a user via a voice command interface, i.e. the user issues requests or commands (e.g. for weather, news, checking a schedule, control of a home thermostat, alarm, appliances etc.), and the speaker responds by performing the requested actions and by communicating with the user in a human-like voice. Google Home™, Amazon Echo™ and Apple HomePod™ are examples of smart speakers.
Playing music is another common use of smart speakers—for example: using streaming applications such as Spotify™, Apple Music™, Deezer™ etc. While many smart speakers are stereophonic, their compact design limits their ability to give the listener a stereophonic sound experience.
A system using two or more smart speaker devices can—in principle—play music with an enhanced stereophonic or multichannel experience. But such an arrangement can have synchronization problems: the devices' clocks are not synchronized, and the latency imposed in each speaker by digital-analog conversion (DAC) and other delays is not necessarily identical.
Moreover, as in every stereophonic system, if a listener is closer to one loudspeaker than to the other, the audio from the closer loudspeaker arrives earlier and louder at the listener's ears. Consequently the listener can perceive that the sound comes from a place near the closer loudspeaker rather than from the center of both loudspeakers.
In some embodiments of the presently disclosed subject matter, a system of multiple microphone-equipped playback devices that performs time synchronization (and optionally gain alignment) relative to the listener's location can enhance and optimize the listening experience.
It is noted that the amplitude of a direct sound wave decays approximately in proportion to 1/r, where r is the distance between the listener and the sound source (within the distance range in which reverberation can be neglected).
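By way of illustration only, the 1/r relationship can be expressed as a level difference in decibels. The following Python sketch is not part of the disclosed method; the function name and the example distances are assumptions chosen for illustration.

```python
import math


def level_difference_db(r_near: float, r_far: float) -> float:
    """Level drop (dB) of direct sound at r_far relative to r_near, assuming 1/r amplitude decay."""
    if r_near <= 0 or r_far <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(r_far / r_near)


# Hypothetical listener 1 m from one loudspeaker and 2 m from the other.
print(round(level_difference_db(1.0, 2.0), 2))
```

Under this approximation, doubling the distance lowers the direct-sound level by roughly 6 dB.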
In some embodiments of the presently disclosed subject matter, two or more microphone-equipped playback devices play the same audio signal. In some embodiments, two or more microphone-equipped playback devices play respective channels of multi-channel content. In some embodiments, an external device transmits digital audio to all of the microphone-equipped playback devices. In some embodiments, each of the microphone-equipped playback devices accesses identical audio content (e.g. from internal disk or network server).
The description hereinbelow addresses an example scenario where one external device transmits individual channels of a multi-channel stream to respective microphone-equipped playback devices. The same method, with minor modifications, can be utilized for other audio-source cases such as those mentioned hereinabove, as known in the art.
The term “optimal listening position” (or “sweet spot”) can refer to a point at which all wave fronts from all loudspeakers arrive simultaneously. The optimal listening position can be steered to a listener's location by adjusting the play time on each loudspeaker. Similarly playback gain can be adjusted in order to correct the level difference at the optimal listening position.
Some embodiments of the presently disclosed subject matter employ a computer-based method that considers all factors affecting arrival time of audio at the listener's location (e.g. clocks not in sync, codec delay, driver delay, buffer drops, DAC delay etc.) and compensates for all of them together—without knowledge of the geometry or positions of loudspeakers and/or listener, and without explicitly computing the absolute positions or relative positions of the loudspeakers and/or listener. In addition, some embodiments of the presently disclosed subject matter compensate for loudness differences between the loudspeakers at the listener's location due to the different distances from the listener, resulting in a different decay in energy.
In
Attention is now directed to
Microphone-equipped playback device 110 can include processing circuitry 200. Processing circuitry 200 can include processor 210 and memory 220.
Processor 210 can be a suitable hardware-based electronic device with data processing capabilities, such as, for example, a general purpose processor, digital signal processor (DSP), a specialized Application Specific Integrated Circuit (ASIC), one or more cores in a multicore processor etc. Processor 210 can also consist, for example, of multiple processors, multiple ASICs, virtual processors, combinations thereof etc.
Memory 220 can be, for example, a suitable kind of volatile or non-volatile storage, and can include, for example, a single physical memory component or a plurality of physical memory components. Memory 220 can also include virtual memory. Memory 220 can be configured to, for example, store various data used in computation.
Network interface 225 can be a suitable type of interface to a wired or wireless network communications device that provides data connectivity to e.g. other microphone-equipped speakers, streaming playback devices, etc.
Clock subsystem 270 can be a suitable type of hardware and/or software mechanism for making time available to components of microphone-equipped playback device 110. In some embodiments, the time made available by clock subsystem 270 need not be synchronized with clocks of peer microphone-equipped playback devices.
Microphone subsystem 230 can be a suitable type of hardware and/or software subsystem that receives sound (e.g. voice commands, recordable audio etc.) from an area external to microphone-equipped playback device 110. Microphone subsystem 230 can include e.g. a hardware microphone, an analog-to-digital component, software etc. There can be a delay between the time that a sound reaches the microphone and the time that e.g. a digital representation of the sound is handled by processor 210. This delay imposed by microphone subsystem 230 or its components can be at least part of a delay that is herein termed “ingress delay”.
Speaker subsystem 240 can be a suitable type of hardware and/or software subsystem that receives data indicative of digital audio (from e.g. processor 210) and plays the audible sound. Speaker subsystem 240 can include e.g. codec processing software, a digital-to-analog component, a hardware speaker, etc. There can be a delay between the time that digital audio is transmitted by processor 210 and the time that the sound is played. This delay imposed by speaker subsystem 240 or its components can be at least part of a delay that is herein termed “egress delay”.
Processor 210 can be configured to execute several functional modules in accordance with computer-readable instructions implemented on a non-transitory computer-readable storage medium. Such functional modules are referred to hereinafter as comprised in the processor. These modules can include, for example, delay calibration module 250, audio playback delay module 260 and gain module 265.
Delay calibration module 250 can be operably connected to microphone subsystem 230 and can receive data indicative of received sound. Delay calibration module 250 can be operably connected to speaker subsystem 240 and can receive data indicative of sound for playback. Delay calibration module 250 can be operably connected to network interface 225 and can exchange data with e.g. peer microphone-equipped playback devices and/or servers. Delay calibration module 250 can perform methods of delay and gain calibration, as described in detail below with reference to
Audio playback delay module 260 can impose a delay value (e.g. as determined from data provided by delay calibration module 250) for digital audio that is to be played out e.g. on speaker subsystem 240. This procedure is described in more detail below with reference to
Gain module 265 can impose a gain value (e.g. as determined from data provided by delay calibration module 250) for digital audio that is to be played out e.g. on speaker subsystem 240. This procedure is described in more detail below with reference to
It is noted that the teachings of the presently disclosed subject matter are not bound by the system described with reference to
Attention is now directed to
Processing circuitry 200 (e.g. delay calibration module 250) can begin by performing a calibration (310) method that is herein termed a listener position inbound sound detection procedure. This procedure is described in detail below with reference to
Processing circuitry 200 (e.g. delay calibration module 250) can next perform a calibration (320) method that is herein termed an inter-peer latency detection procedure. This procedure is described in detail below with reference to
The inter-peer latency detection procedure can additionally result in data indicative of a generation time (e.g. by processor 210) of a calibration sound that was played by a speaker subsystem 240. Generation time (e.g. as generated by clock subsystem 270) on a first device of a sound played and subsequently received at a particular peer microphone-equipped playback device is herein denoted as Tfirst->peer.
Optionally: processing circuitry 200 (e.g. delay calibration module 250) can perform (330) additional inter-peer latency detection procedures with additional peer microphone-equipped playback devices. Each additional performance of the procedure can result in another Rpeer->first value and corresponding Tfirst->peer value for the respective peer microphone-equipped playback device. It is noted that inter-peer latency detection need not be carried out separately for each peer, and that methods can simultaneously perform inter-peer latency detection to multiple peers, as described below with reference to
Processing circuitry 200 (e.g. delay calibration module 250) can next receive (340) an audio playback delay value derivative of data resulting from the detection procedures. In some embodiments, a central server communicates with each microphone-equipped playback device to receive measured calibration data, and then computes audio playback delay values which it then transmits back to the microphone-equipped playback devices. Details of this procedure are described below, with reference to
In some embodiments, the audio playback delay value is in accordance with a calculated “listener position propagation differential” e.g. a calculated difference in the time required for egress delay and sound propagation from the current speaker to the listener position and time required for egress delay and sound propagation from a peer speaker to the listener position.
By way of non-limiting example: in a scenario of playing streaming audio over two microphone-equipped playback devices, it might be calculated that the left channel microphone-equipped playback device has a delay of 10 ms from generation of sound by a processor until reception of the sound at the listener position 100 (this delay can include egress delay such as DAC delay etc., sound propagation delay, etc.). Similarly, it might be calculated that the right channel microphone-equipped playback device has a delay of 12 ms from generation of sound by a processor until reception of the sound at the listener position 100.
In this example scenario, the left-channel microphone-equipped playback device can be configured to delay audio output for 2 ms (i.e. the listener position propagation differential)—thus synchronizing sound arrival at the listener position 100.
Alternatively, in this example scenario, the right-channel microphone-equipped playback device can be configured to delay audio output for 1 ms, and the left-channel microphone-equipped playback device can be correspondingly configured to delay audio output for 3 ms (i.e. in accordance with the listener position propagation differential), thus synchronizing sound arrival at the listener position 100.
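The arithmetic of this example scenario can be sketched in a few lines of Python. The function name, the device labels and the optional common base delay are assumptions introduced for illustration; the sketch simply delays every device so that all paths to the listener position take as long as the slowest one.

```python
def playback_delays_ms(path_delay_ms: dict[str, float], base_ms: float = 0.0) -> dict[str, float]:
    """Delay per device so that (added delay + egress + propagation) is equal for all devices."""
    slowest = max(path_delay_ms.values())
    return {dev: base_ms + (slowest - delay) for dev, delay in path_delay_ms.items()}


# Values from the example scenario: 10 ms for the left channel, 12 ms for the right channel.
paths = {"left": 10.0, "right": 12.0}
print(playback_delays_ms(paths))               # left delayed by 2 ms, right by 0 ms
print(playback_delays_ms(paths, base_ms=1.0))  # left delayed by 3 ms, right by 1 ms
```

Only the difference between the two delays matters for synchronization, which is why both variants of the example yield simultaneous arrival.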
Optionally: Processing circuitry 200 (e.g. delay calibration module 250) can also receive (350) an audio playback gain adjustment that is derivative of data resulting from the detection procedures, as described below with reference to
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
Processing circuitry 200 (e.g. audio playback delay module 260) can receive (410) a digital audio segment from e.g. a network-based media server. The digital audio segment can arrive at the microphone-equipped playback device 110 via e.g. network interface 225. The digital audio segment can be in any compressed or uncompressed digital audio format.
Processing circuitry 200 (e.g. audio playback delay module 260) can delay (420) before playing the digital audio (for example on speaker subsystem 240), in accordance with—for example—a received or calculated audio playback delay value. The delaying can be performed by buffering the digital audio data, instructing speaker subsystem 240 to perform the delay, or other techniques known in the art.
Optionally: processing circuitry 200 (e.g. gain module 265) can also adjust (430) the gain of the audio (for example: before playback on speaker subsystem 240 or by instructing speaker subsystem 240 to perform the adjustment, or other suitable methods) in accordance with a received or calculated gain adjustment.
Following delay and optional gain adjustment, processing circuitry 200 (e.g. speaker subsystem 240) can play (440) the audio segment.
By way of non-limiting example: Two microphone-equipped playback devices can be receiving a stream of music from a server of an internet-based streaming music service. Upon receiving a segment of audio, one microphone-equipped playback device can delay it by a received value (e.g. 2 ms) and adjust the gain by a received value (e.g. 6 dB) before playback. After playback, the sound can reach the listener position with the same timing and loudness as its peer microphone-equipped playback devices—resulting in an enhanced listening experience in comparison to unsynchronized or non-gain adjusted listening.
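A minimal Python sketch of the delay and gain steps follows, using the example values of 2 ms and 6 dB. It assumes uncompressed floating-point samples and a 48 kHz sample rate, and it implements the delay by prepending silence, which is only one of the buffering techniques mentioned above; the function name and parameters are hypothetical.

```python
def delay_and_gain(samples, sample_rate_hz, delay_ms, gain_db):
    """Prepend silence for the delay and scale amplitudes by the gain (float PCM samples)."""
    pad = round(sample_rate_hz * delay_ms / 1000.0)   # delay expressed in whole samples
    factor = 10.0 ** (gain_db / 20.0)                 # dB to linear amplitude factor
    return [0.0] * pad + [s * factor for s in samples]


segment = [0.1, -0.2, 0.3]                            # hypothetical audio samples
out = delay_and_gain(segment, 48_000, delay_ms=2.0, gain_db=6.0)
print(len(out) - len(segment))                        # number of silent samples inserted
```

In a continuous stream the silence would be inserted once, at the start of playback, rather than before every segment.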
As described above with reference to
As will be described in more detail below, a listener position propagation differential can be derivative of, at least:
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
A user or device at the listener-position 100 can generate (510) an inbound calibration sound 520. This can be e.g. a user uttering a calibration phrase (e.g. “calibrate”), a smartphone app generating a particular type of sound, etc.
Processing circuitry 200 of each microphone-equipped playback device 110a, 110b, 110c, 110d can receive (e.g. at microphone subsystem 230) the inbound calibration sound 520, detect (e.g. at delay calibration module 250) that the listener-position-generated calibration sound has been received (e.g. by detecting data indicative of the listener-position-generated calibration sound), and note (e.g. at delay calibration module 250) the time of arrival (e.g. in accordance with a time provided by clock subsystem 270).
The detection of the calibration sound can be performed—for example—by utilizing a speech-to-text module.
In some such embodiments, the processing circuitry 200 does not in all cases detect the listener-position-generated calibration sound or its arrival time. In some such embodiments, when the listener-position-generated calibration sound is identified on a first device using e.g. a speech-to-text module, the processing circuitry 200 of the first microphone-equipped playback device requests a recording of the most recent seconds of received audio from each of the peer microphone-equipped playback devices. Then, using for example GCC-PHAT (generalized cross-correlation with phase transform, an advanced cross-correlation algorithm), the processing circuitry 200 of the first device compares the location of the calibration sound in each peer device's recording to its location in the first device's own recording, so that the time difference between each peer device and the first device can be calculated.
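The cross-correlation step can be sketched as follows. This is a minimal pure-Python illustration of GCC-PHAT under stated assumptions, not the disclosed implementation: the helper names are hypothetical, the small radix-2 FFT is included only to keep the example self-contained, and a practical system would more likely use an optimized FFT library.

```python
import cmath


def _fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is left unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft(x[0::2], inverse)
    odd = _fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def gcc_phat_lag(sig, ref):
    """Lag in samples by which `sig` trails `ref` (negative if `sig` leads)."""
    n = 1
    while n < len(sig) + len(ref):          # zero-pad so the correlation does not wrap around
        n *= 2
    spec_sig = _fft(list(sig) + [0.0] * (n - len(sig)))
    spec_ref = _fft(list(ref) + [0.0] * (n - len(ref)))
    cross = []
    for s, r in zip(spec_sig, spec_ref):
        c = s * r.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)   # PHAT weighting: keep phase only
    corr = [v.real for v in _fft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    return peak if peak < n // 2 else peak - n         # map the upper half to negative lags
```

Dividing the returned lag by the sample rate gives the time difference between the two recordings, provided both were captured at the same rate.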
It is noted that the delay between the origination of the calibration sound at the listener position and the detection of the sound at a processing circuitry (e.g. delay calibration module 250) can include several components such as: the distance-dependent sound propagation delay, the ingress delay of the microphone-equipped playback device, etc. It is further noted that the ingress delay can include the time necessary for analog-to-digital conversion and other delays.
In some embodiments, processing circuitry (e.g. delay calibration module 250) can also detect the loudness of the calibration sound.
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
Processing circuitry 200 (e.g. delay calibration module 250) of peer microphone-equipped playback device 110b can generate (610) a calibration sound 605a. This can be e.g. music that is being played at the time, “pink noise” etc.
Processing circuitry 200 of microphone-equipped playback device 110a can receive (620) (e.g. at microphone subsystem 230) peer-generated calibration sound 605a, detect (e.g. at delay calibration module 250) that peer-generated calibration sound 605a has been received (e.g. by receiving data indicative of the calibration signal), and note (e.g. at delay calibration module 250) the time of arrival (e.g. in accordance with a time provided by clock subsystem 270).
It is noted that the delay between the generation of the calibration sound at the generating microphone-equipped playback device 110b and the noting of the time of sound arrival at the processing circuitry 200 (e.g. at delay calibration module 250) of receiving microphone-equipped playback device 110a can include several components including: the egress delay of the generating microphone-equipped playback device, the distance-dependent sound propagation delay, and the ingress delay of the receiving microphone-equipped playback device. It is further noted that the egress delay can include the time necessary for digital-to-analog conversion and other delays, and that ingress delay can include the time necessary for analog-to-digital conversion and other delays.
It is further noted that—in some embodiments—the time of generation of the calibration sound at microphone-equipped playback device 110b is the time of generation at its processing circuitry 200 (e.g. at delay calibration module 250).
Similarly, microphone-equipped playback device 110a can generate (630) a calibration sound 605b and note the transmission time. Calibration sound 605b can then be received at peer microphone-equipped playback device 110b, which can detect the calibration sound and note the arrival time. It is noted that—in some embodiments—the time of arrival of the calibration sound at receiving microphone-equipped playback device 110b is its time of reception at its processing circuitry 200 (e.g. at delay calibration module 250).
Additional peer microphone-equipped playback devices can also receive calibration sound 605b, detect that calibration sound 605b has been received, and note the time of arrival.
It is noted that various mechanisms (e.g. network-based messaging) can be used to ensure that the microphone-equipped playback devices do not simultaneously generate calibration sounds.
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
In some embodiments, the method is performed by a central server which receives calibration measurements (timing and loudness) from each microphone-equipped playback device. In some such embodiments, this server can be colocated in one of the microphone-equipped playback devices (e.g. delay calibration module 250). In some embodiments, the method can be implemented in a distributed manner utilizing multiple servers and microphone-equipped playback devices.
For clarity, the following description addresses a case of synchronizing two microphone-equipped playback devices. In scenarios involving more than two microphone-equipped playback devices, the method can be performed repeatedly—for example between a first microphone-equipped playback device and each peer microphone-equipped playback device.
Processing circuitry 200 (e.g. delay calibration module 250) can determine (710), for a pair of microphone-equipped playback devices, a value herein termed a listener position inbound sound reception differential, for example as described below with reference to
Processing circuitry 200 (e.g. delay calibration module 250) can determine (720), for the pair of microphone-equipped playback devices, a value herein termed an inter-peer sound latency differential, for example as described below with reference to
Processing circuitry 200 (e.g. delay calibration module 250) can determine (730), for the pair of microphone-equipped playback devices, a value herein termed a listener position propagation differential, for example by subtracting the listener position inbound sound reception differential from the inter-peer sound latency differential.
It is noted that in some embodiments, the inter-peer sound latency differential is in accordance with the expression:
D1=EgressLatency2+PeerSoundPropagationLatency+IngressLatency1−EgressLatency1−PeerSoundPropagationLatency−IngressLatency2
where PeerSoundPropagationLatency refers to the sound propagation delay from one peer to the other (and in which the propagation latencies are assumed to be the same), and where EgressLatency1 refers to the egress latency for the first device, EgressLatency2 refers to the egress latency for the peer device, etc.
Similarly it is noted that in some embodiments, the listener position inbound sound reception differential is in accordance with the expression:
D2=IngressLatency1+LPSoundPropagationLatency1−IngressLatency2−LPSoundPropagationLatency2
Consequently, in some embodiments, D1−D2 is in accordance with the expression:
(EgressLatency2−EgressLatency1)+(LPSoundPropagationLatency2−LPSoundPropagationLatency1)
and is thus indicative of the difference in egress delay and propagation delay to the listener position 100.
It is noted that this calculation also compensates for any deviation between the clocks of the two microphone-equipped playback devices—so that the clocks need not be synchronized.
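The cancellation of the ingress delays can be checked numerically. The following Python sketch simulates two devices with unsynchronized clocks; the `Device` class, its field names, the emission times and all numeric values are assumptions made for illustration, and the subtraction order used (latency to the first device minus latency to the peer, and first-device arrival time minus peer arrival time) is one possible convention.

```python
from dataclasses import dataclass


@dataclass
class Device:
    egress: float        # processor to emitted sound (ms)
    ingress: float       # sound at microphone to processor (ms)
    to_listener: float   # propagation between device and listener position (ms)
    clock_offset: float  # local clock reading minus true time (ms)

    def stamp(self, true_time: float) -> float:
        """Timestamp as read from this device's own clock."""
        return true_time + self.clock_offset


def propagation_differential(first: Device, peer: Device, peer_to_peer: float) -> float:
    # Listener-position calibration sound emitted at true time 0.
    r_lp_first = first.stamp(first.to_listener + first.ingress)
    r_lp_peer = peer.stamp(peer.to_listener + peer.ingress)
    d2 = r_lp_first - r_lp_peer
    # Peer generates its calibration sound at true time 100, the first device at 200.
    t_pf = peer.stamp(100.0)
    r_pf = first.stamp(100.0 + peer.egress + peer_to_peer + first.ingress)
    t_fp = first.stamp(200.0)
    r_fp = peer.stamp(200.0 + first.egress + peer_to_peer + peer.ingress)
    d1 = (r_pf - t_pf) - (r_fp - t_fp)
    return d1 - d2


first = Device(egress=4.0, ingress=3.0, to_listener=6.0, clock_offset=1000.0)
peer = Device(egress=7.0, ingress=2.0, to_listener=4.0, clock_offset=-500.0)
print(propagation_differential(first, peer, peer_to_peer=5.0))
```

In this simulated model the result equals the difference in egress-plus-propagation delay between the peer and the first device, plus the offset between the two local clocks; the ingress delays and the peer-to-peer propagation delay drop out, and with both offsets set to zero only the egress and listener-propagation terms remain.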
Processing circuitry 200 (e.g. delay calibration module 250) can then provide (740) an audio playback delay value to one or more microphone-equipped playback devices in accordance with the listener position propagation differential, to enable synchronized sound arrival at the listener position 100, as described above with reference to
Optionally: processing circuitry 200 (e.g. delay calibration module 250) can then provide (740) a gain adjustment to one or more microphone-equipped playback devices. The gain adjustment can be derivative of:
a) a loudness of the first calibration sound detected by the processor, and
b) a loudness of the first calibration sound detected by the second microphone-equipped playback device.
In some embodiments, the gain adjustment is in accordance with (for example: equal to) a ratio between the loudness of the first calibration sound detected by the processor, and the loudness of the first calibration sound detected by the second microphone-equipped playback device.
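A minimal Python sketch of such a ratio follows. It assumes that each "loudness" is a positive linear amplitude (for example an RMS value) measured with comparable microphones; that assumption, the function name and the sample values are illustrative only.

```python
import math


def gain_adjustment(loudness_first: float, loudness_peer: float) -> tuple[float, float]:
    """Return (linear ratio, ratio in dB) of two detected calibration-sound amplitudes."""
    if loudness_first <= 0 or loudness_peer <= 0:
        raise ValueError("loudness values must be positive")
    ratio = loudness_first / loudness_peer
    return ratio, 20.0 * math.log10(ratio)


ratio, ratio_db = gain_adjustment(0.5, 0.25)   # hypothetical measured amplitudes
print(ratio, round(ratio_db, 2))
```

Under the 1/r approximation, a ratio of 2 corresponds to one device being about twice as far from the listener position as the other, i.e. a level difference of roughly 6 dB.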
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
In some embodiments, the method is performed by a central server which receives calibration measurements from each microphone-equipped playback device. In some such embodiments, this server can be colocated in one of the microphone-equipped playback devices (e.g. delay calibration module 250). In some embodiments, the method can be implemented in a distributed manner utilizing multiple servers and microphone-equipped playback devices.
For clarity, the following description addresses a case of synchronizing two microphone-equipped playback devices.
Processing circuitry 200 (e.g. delay calibration module 250) can receive (810) data indicative of reception time of the inter-peer delay calibration sound at a first device.
Processing circuitry 200 (e.g. delay calibration module 250) can receive (820) data indicative of generation time of the inter-peer delay calibration sound at a peer device.
Processing circuitry 200 (e.g. delay calibration module 250) can subtract (830) the peer device generation time from the first device reception time, resulting in a value indicative of the time between the peer generation of the calibration sound and the processor detection of the sound, i.e. the “inter-peer sound latency”.
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
In some embodiments, the method is performed by a central server which receives calibration measurements from each microphone-equipped playback device. In some such embodiments, this server can be colocated in one of the microphone-equipped playback devices (e.g. delay calibration module 250). In some embodiments, the method can be implemented in a distributed manner utilizing multiple servers and microphone-equipped playback devices.
For clarity, the following description addresses a case of synchronizing two microphone-equipped playback devices.
Processing circuitry 200 (e.g. delay calibration module 250) can receive (910) Tpeer->first from the peer microphone-equipped playback device and Rpeer->first from the first microphone-equipped playback device.
Processing circuitry 200 (e.g. delay calibration module 250) can subtract (920) Tpeer->first from Rpeer->first resulting in the inter-peer sound latency to the first microphone-equipped playback device from the particular peer (i.e. Lpeer->first).
Processing circuitry 200 (e.g. delay calibration module 250) can receive (930) Tfirst->peer from the first microphone-equipped playback device and Rfirst->peer from the peer microphone-equipped playback device.
Processing circuitry 200 (e.g. delay calibration module 250) can subtract (940) Tfirst->peer from Rfirst->peer, resulting in the inter-peer sound latency to the peer microphone-equipped playback device from the first device (i.e. Lfirst->peer).
Processing circuitry 200 (e.g. delay calibration module 250) can subtract (950) Lfirst->peer from Lpeer->first resulting in an inter-peer sound latency differential.
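The subtractions (920), (940) and (950) can be sketched in Python as follows. The `Measurement` type and the timestamp values are assumptions for illustration; each timestamp is taken from the local, unsynchronized clock of the device that recorded it, so an individual latency value may include a clock offset.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    generated: float   # T: generation time on the sender's clock (ms)
    received: float    # R: reception time on the receiver's clock (ms)

    @property
    def latency(self) -> float:
        """Inter-peer sound latency L = R - T."""
        return self.received - self.generated


def inter_peer_differential(peer_to_first: Measurement, first_to_peer: Measurement) -> float:
    """Subtract Lfirst->peer from Lpeer->first."""
    return peer_to_first.latency - first_to_peer.latency


# Hypothetical timestamps from two devices whose clocks differ by 1500 ms.
peer_to_first = Measurement(generated=-400.0, received=1115.0)
first_to_peer = Measurement(generated=1200.0, received=-289.0)
print(inter_peer_differential(peer_to_first, first_to_peer))
```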
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
Attention is now directed to
In some embodiments, the method is performed by a central server which receives calibration measurements from each microphone-equipped playback device. In some such embodiments, this server can be colocated in one of the microphone-equipped playback devices (e.g. delay calibration module 250). In some embodiments, the method can be implemented in a distributed manner utilizing multiple servers and microphone-equipped playback devices.
For clarity, the following description addresses a case of synchronizing two microphone-equipped playback devices.
Processing circuitry 200 (e.g. delay calibration module 250) can receive (1010) RLP->peer from the peer microphone-equipped playback device and RLP->first from the first microphone-equipped playback device.
Processing circuitry 200 (e.g. delay calibration module 250) can subtract (1020) RLP->peer from RLP->first, resulting in the listener position inbound sound reception differential.
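Where more than one peer is present, the same subtraction can be applied to each peer in turn. The Python sketch below assumes that the locally stamped arrival times have already been collected into a dictionary keyed by device label; the function name, the labels and the values are hypothetical.

```python
def inbound_reception_differentials(arrivals: dict[str, float], first: str) -> dict[str, float]:
    """RLP->first minus RLP->peer for every peer, from locally stamped arrival times (ms)."""
    reference = arrivals[first]
    return {dev: reference - t for dev, t in arrivals.items() if dev != first}


arrivals = {"110a": 1009.0, "110b": -494.0, "110c": 1000.0}   # hypothetical local timestamps
print(inbound_reception_differentials(arrivals, first="110a"))
```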
It is noted that the teachings of the presently disclosed subject matter are not bound by the flow diagram illustrated in
It is to be understood that the invention is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
It will also be understood that the system according to the invention may be, at least partly, implemented on a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by the computer for executing the method of the invention.
Those skilled in the art will readily appreciate that various modifications and changes can be applied to the embodiments of the invention as hereinbefore described without departing from its scope, defined in and by the appended claims.
Number | Date | Country
---|---|---
62913183 | Oct 2019 | US