The present invention relates to a method of synchronized audio over a network, and in particular to a method of synchronized audio with low latency and low jitter over a network.
The invention has been developed primarily for use with audio signals transmitted between distant and/or different types of audio receiver/transmission systems streamed under one medium and then transmitted via ethernet to be played by a receiver/player and will be described hereinafter with reference to this application. However, it will be appreciated that the invention is not limited to this particular field of use.
There is a fundamental problem in maintaining synchronized audio clocks between a transmitted audio stream and a received audio stream over a network. Accurate clock synchronization is required to provide low jitter and low latency of an audio stream transmitted over the network.
As shown in
In another form it is required to have the transmission TX connected directly to the receiver RX without the use of a connecting network.
It is known to approach these problems by use of a method of facilitating clock synchronization over networks using PTP (Precision Time Protocol).
The PTP solution can include having a SYNC message and a DELAY message of a PTP clock synchronization cycle being carried by different redundant networks, and adjusting a timestamp associated with one of the messages to emulate transfer of the SYNC and the DELAY messages as if by the same redundant network.
There can be various ways of facilitating PTP clock synchronization. A network need not be a redundant (secondary) network for PTP to function for carrying messages from a slave clock to a master clock.
This requirement for linking slave clock and master clock or other uses of different forms of a “global clock” is an added complexity and further a limitation to operation in many streaming situations.
It can be seen that known prior art methods of streaming synchronized audio over a network have the problems of:
The present invention seeks to provide a method of synchronized audio over a network, which will overcome or substantially ameliorate at least one or more of the deficiencies of the prior art, or to at least provide an alternative.
It is to be understood that, if any prior art information is referred to herein, such reference does not constitute an admission that the information forms part of the common general knowledge in the art, in Australia or any other country.
According to a first aspect of the present invention, there is provided a method of synchronized audio over a network using asynchronous clock reconstruction from audio sources including the steps of:
The audio data transmission channel can be a wireless channel such as WiFi or a wired channel such as ethernet.
In a further aspect of the invention there is provided a method of streaming synchronized audio over a network using asynchronous clock reconstruction from the sourcing audio including the steps of:
The processing of the received transmitted sourced audio is at the changeable receiver frequency.
The changeable receiver frequency is determined from the known source frequency and an observation of the IP packets of the received transmitted sourced audio in the buffer of the receiving audio processor.
The observation of the IP packets of the received transmitted sourced audio in the buffer of the receiving audio processor can be an indirect frequency correction of the known source frequency. This indirect frequency correction in one form includes observing a position of the buffer and counting the number of IP packets in the buffer to that point, and particularly observing a particular value such as 50% of the PBS (Packet Buffer Size), such that it can be determined whether the corrected frequency is higher or lower than the known source frequency, whereby the frequency of streaming is indirectly provided by the observation.
The observation of the IP packets of the received transmitted sourced audio in the buffer of the receiving audio processor can be a direct frequency correction of the known source frequency. This direct frequency correction in one form can include monitoring a particular marking on the sourcing audio and maintaining observation of the frequency of observation of consecutive markings. The direct frequency correction can include timestamps included in received packets which deliver the information on departure times of packets, such that the corresponding arrival time is measured with a receiver clock whose frequency is thereby directly determinable. The direct frequency correction can use such timestamps, but may also be performed without received packet timestamps by measuring packet arrival times alone.
The observation of the IP packets of the received transmitted sourced audio in the buffer of the receiving audio processor can be a combination of direct and indirect frequency correction of the known source frequency.
For emergency actions, watermarks can be created near the high and low end of the IP packet buffer, and if the number of IP packets is detected to be near the watermarks then a more severe emergency frequency correction is undertaken so as to allow for emergency recovery back to the predetermined correct position such as at 50% of PBS.
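By way of a non-limiting illustration, the watermark behaviour described above might be sketched as follows. The function name `correction_ppm`, the watermark positions (10% and 90% of PBS) and the correction magnitudes are assumptions chosen for the example only and are not taken from the specification.

```python
def correction_ppm(fill, pbs, low_frac=0.1, high_frac=0.9,
                   normal_ppm=50.0, emergency_ppm=1000.0):
    """Return a signed playback-rate correction in parts per million.

    fill is the number of IP packets currently in the buffer and pbs is
    the Packet Buffer Size. All thresholds and magnitudes are
    illustrative assumptions.
    """
    target = pbs / 2
    if fill >= high_frac * pbs:
        return emergency_ppm    # near the high watermark: consume much faster
    if fill <= low_frac * pbs:
        return -emergency_ppm   # near the low watermark: consume much slower
    if fill > target:
        return normal_ppm       # slightly too full: consume a little faster
    if fill < target:
        return -normal_ppm      # slightly too empty: consume a little slower
    return 0.0                  # exactly at 50% of PBS: no correction


if __name__ == "__main__":
    for fill in (1, 6, 8, 10, 15):
        print(fill, correction_ppm(fill, pbs=16))
```

In this sketch the emergency correction simply replaces the normal correction while the fill level is near a watermark, so that the buffer is driven back towards 50% of PBS more quickly.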
In a further aspect of the invention there is provided a method of streaming synchronized audio over a network using asynchronous clock reconstruction from the sourcing audio including the steps of:
The processing of the received transmitted sourced audio is at the changeable receiver frequency.
The changeable receiver frequency is determined from the known source frequency and an observation of the IP packets of the received transmitted sourced audio in the buffer of the plurality of receiving audio processors.
The observation of the IP packets of the received transmitted sourced audio in the buffers of the plurality of receiving audio processors can be an indirect frequency correction of the known source frequency. This indirect frequency correction in one form includes observing a position of the buffers and counting the number of IP packets in each buffer to that point, and particularly observing a particular value such as 50% of PBS, such that it can be determined whether the corrected frequency is higher or lower than the known source frequency, whereby the frequency of streaming is indirectly provided by the observation.
The observation of the IP packets of the received transmitted sourced audio in the buffers of the plurality of receiving audio processors can be a direct frequency correction of the known source frequency. This direct frequency correction in one form can include monitoring a particular marking on the sourcing audio and maintaining observation of the frequency of observation of consecutive markings. The direct frequency correction can include timestamps included in received packets which deliver the information on departure times of packets, such that the corresponding arrival time is measured with a receiver clock whose frequency is thereby directly determinable. The direct frequency correction can use such timestamps, but may also be performed without received packet timestamps by measuring packet arrival times alone.
The observation of the IP packets of the received transmitted sourced audio in the buffers of the plurality of receiving audio processors can be a combination of direct and indirect frequency correction of the known source frequency.
For emergency actions, watermarks can be created near the high and low end of the IP packet buffers, and if the number of IP packets is detected to be near the watermarks then a more severe emergency frequency correction is undertaken so as to allow for emergency recovery back to the predetermined correct position such as at 50% of PBS.
The invention provides a method of synchronized audio over a network wherein for minimising time variation, or drift over time of network packets at each receiver for uni-cast traffic and when there is no global clock in a network having multiple receivers, a round robin mode can be used which averages out delays related to the ordering and timing of transmitted packets and how network switches route these packets in hardware.
Packets having specific destination network addresses PD are sent by transmitter A through the transmission network to receivers C, D, E, up to receiver N, each having its own destination network address.
The packets can be addressed in the round robin order of:
Also, when there is no global clock in a network having multiple receivers, a Ring Mode can be used separately or in combination, wherein the timing of when each packet is transmitted is made more precise (thereby reducing drift and timing variation in received packets) by staggering the sending time at equally spaced intervals per clock.
In another aspect of the invention the method of synchronized audio over an audio data transmission network can be achieved without a global clock by a time correction of audio in transmissible packets including the steps of:
In another aspect of the invention the method of synchronized audio over an audio data transmission network can be achieved without a global clock by a time correction of audio in transmissible packets including the steps of:
It can be seen that the invention of a method of synchronized audio over a network provides the benefit of it being possible to transmit high and ultra-high resolution audio across a network without the use of global clocking. Sending packets of audio over a network in real time requires the audio clock to be recovered at the receiving side.
Bound by real-world constraints (100M, 1G, 5G or 10G wired Ethernet, 8-channel 192 kHz audio, latency <10 ms), it is possible to overcome the requirement for global clocking by recovering the clocking information from the audio data itself.
Achieving this is a major breakthrough.
This approach does not require any complex global network synchronization mechanism and enables a multitude of use cases which were not possible with existing standards.
Other aspects of the invention are also disclosed.
Notwithstanding any other forms which may fall within the scope of the present invention, a preferred embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
It should be noted in the following description that like or the same reference numerals in different embodiments denote the same or similar features.
Referring to the drawings there is shown an asynchronous clock reconstruction from audio sources. In one form the method uses source clock frequency recovery (SCFR) in packet networks.
In order to recover the source clock frequency without a common reference clock or any reference period in packet streams, transmitter TX receives audio from the audio inputs and prepares the next IP packet of audio to be streamed. The IP packets are transmitted using the UDP protocol (and the UDP payload may further carry an RTP header). Each IP packet contains both a header (which can be 20 or 24 bytes long) and data (of variable length). The header includes the IP addresses of the source and destination, plus other fields that help to route the packet. The audio data is the actual IP packet content (also known as the payload).
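By way of a non-limiting illustration, an audio payload for such a packet might be assembled and parsed as sketched below. The payload layout (a 32-bit sequence number and a 64-bit source timestamp followed by signed 16-bit PCM samples) and the names `HEADER`, `build_payload` and `parse_payload` are hypothetical; the specification does not define a payload format, and the IP and UDP headers themselves would be added by the network stack.

```python
import struct

# Hypothetical application header: 32-bit sequence number followed by a
# 64-bit source timestamp, both in network byte order.
HEADER = struct.Struct("!IQ")


def build_payload(seq, timestamp, samples):
    """Pack signed 16-bit PCM samples behind the hypothetical header."""
    body = struct.pack(f"!{len(samples)}h", *samples)
    return HEADER.pack(seq, timestamp) + body


def parse_payload(payload):
    """Recover the sequence number, timestamp and samples from a payload."""
    seq, timestamp = HEADER.unpack_from(payload)
    count = (len(payload) - HEADER.size) // 2
    samples = list(struct.unpack_from(f"!{count}h", payload, HEADER.size))
    return seq, timestamp, samples


if __name__ == "__main__":
    data = build_payload(1, 123456, [0, 1, -1, 32767])
    print(len(data), parse_payload(data))
```

A payload of this kind could be handed to a UDP socket for unicast or multicast transmission; the timestamp field is only needed where a timestamp-based frequency correction is used.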
The transmitter sends the IP packets at a regular and accurate packet rate (PR) which is related to the audio input clock rate (SR).
It should be noted that the UDP packet may be sent by many means such as (but not limited to) Unicast and Multicast methods.
The IP packets propagate through the network to the receiver (RX).
At the receiver the IP packets are processed at the same rate (SR).
The audio is output at the sample rate (SR) to speakers or to other self-contained audio systems such as multichannel Digital Signal Processor (DSP) amplifiers, networked audio receivers and stereo amplifier receivers feeding their own speaker networks.
In another embodiment of the invention as shown in
The transmitter 230 sends the IP packets at a regular and accurate packet rate (PR) which is related to the audio input clock rate (SR).
The IP packets propagate through the network 135 to a plurality of receivers 235 in such a way that the sequence in which the plurality of receivers receive the IP packets changes in a cyclical order per IP packet until the last IP packet has been distributed.
At the plurality of receivers the IP packets are processed at the same rate (SR).
For distributed multi-channel applications (such as a single receiver at each of many speakers) one IP packet containing multiple audio channels may be transmitted to multiple receivers, with each receiver selecting one or more channels to output from the received packet. While not limited to two channel and as shown in
The audio is output at the sample rate (SR) to speakers or to other self-contained audio systems such as multichannel Digital Signal Processor (DSP) amplifiers, networked audio receivers and stereo amplifier receivers feeding their own speaker networks.
Referring to
To minimise this time variation two schemes are presented: Round Robin Mode and Ring Mode. Both modes can be used together or separately. In all cases DMA, zero copy techniques and priority to real-time audio processing are employed in hardware processing. To minimise time variation and drift a round-robin transmission scheme is used.
The round-robin approach averages out delays related to the ordering and timing of transmitted packets and how network switches route these packets in hardware. The result is lower drift and variance over time between receivers. In the case where transmitter A sends the same (or different) audio buffer content to more than one receiver at the same time, transmitter A sends packets at a regular rate (the packet transmit interval) related to the sample rate of the audio input. The order in which the packets are sent is changed in round-robin fashion each packet transmit interval. Refer to Next Packet Algorithm (NPA) for detail of how this works in practice.
Furthermore, for Ring Mode the timing of when each packet is transmitted can be made more precise (thereby reducing drift and timing variation in received packets) by staggering the sending time at equally spaced intervals per clock.
Refer to Next Packet Algorithm (NPA) for detail of how this works in practice.
Packets with specific destination network addresses (IP/MAC addresses) PD are sent by transmitter A (230) through the transmission network (135) to receivers C, D, E, up to receiver N (235), each having its own destination network address.
By way of example:
In this scheme the transmitter device should be accurate in its timing of each packet transmission (to multiple receivers) within the packet transmit interval so as to avoid contributing to receiver drift. Techniques should be used in the transmitting device to minimise this, such as: packet construction, duplication and buffering in hardware; interrupt-driven DMA transfers of the packet data; and equally spaced timing of the transmission of each packet within the packet transmit interval.
In an example, but not limited to, the method of sending the IP packet from a transmitter to a plurality of receivers 235 is:
As enumerated in the embodiment above, sending each packet from the transmitter 230 at a regular and accurate packet rate which is related to the audio input clock rate makes it possible for synchronized audio to be streamed over a network to a plurality of receivers without the use of a complex system or a single global audio clock. Once an IP packet has been propagated through the network 135, the plurality of receivers 235 will then receive the IP packet. When that IP packet has been distributed to the plurality of receivers, another IP packet will be sent by the transmitter to the network. Since there is only a single transmitter and a plurality of receivers, the sequence of the plurality of receivers receiving the IP packets changes in a cyclical order per IP packet until the last IP packet has been distributed. It can be seen in the example above that the sequence of the plurality of receivers changes in cyclical order per IP packet distributed. The change of sequence in cyclical order allows the IP packets to be distributed to the plurality of receivers in an equal manner to average out the delays relating to the ordering and timing of packets, resulting in a lower drift and variance between receivers. The steps are repeated until the last IP packet has been distributed.
Referring to
The sample rate (SR) of the audio originating at the audio processor is substantially in the range of 32 kHz to 384 kHz. The hardware clock provides the sample rate frequency fSR, which is divided such that the packet rate of the transmissible audio packets is an integer division of the sample rate (SR). For example, for a 48 kHz sample rate, the packet rate of the transmissible audio packets is 1.5 kHz, 3 kHz or 6 kHz, or another integer division such as 750 Hz or 375 Hz.
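By way of a non-limiting illustration, the relationship between the sample rate and the packet rate can be checked numerically as sketched below. The function names `packet_rate` and `packet_interval_us` are assumptions made for the example; the divider is referred to as `div`, corresponding to the integer division of the sample rate.

```python
from fractions import Fraction


def packet_rate(sample_rate_hz, div):
    """Packet rate in Hz obtained by integer division of the sample rate."""
    if sample_rate_hz <= 0 or div <= 0:
        raise ValueError("sample rate and divider must be positive")
    return Fraction(sample_rate_hz, div)


def packet_interval_us(sample_rate_hz, div):
    """Packet transmit interval in microseconds for the same divider."""
    if sample_rate_hz <= 0 or div <= 0:
        raise ValueError("sample rate and divider must be positive")
    return Fraction(div * 1_000_000, sample_rate_hz)


if __name__ == "__main__":
    for div in (8, 16, 32, 64, 128):
        rate = packet_rate(48_000, div)
        interval = packet_interval_us(48_000, div)
        print(f"div={div:3d} rate={float(rate):7.1f} Hz "
              f"interval={float(interval):8.2f} us")
```

For a 48 kHz sample rate, dividers of 8, 16, 32, 64 and 128 give the 6 kHz, 3 kHz, 1.5 kHz, 750 Hz and 375 Hz packet rates mentioned above. Exact fractions are used so that sample rates that do not divide evenly are represented without rounding.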
The system uses a Round Robin Mode as detailed above and using this Next Packet Algorithm NPA so as to determine next packet and destination to send PD for N destinations.
In Normal Mode, the mapping table is a fixed sequential mapping to each destination. In Round Robin Mode, the mapping table changes after the last index n is transmitted: each table entry at index becomes the entry at index+1, with the table wrapped about n.
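By way of a non-limiting illustration, the mapping table behaviour in Normal Mode and Round Robin Mode might be sketched as follows. The class name `NextPacketTable` and its method are hypothetical, and the direction of rotation (the entry at index moving to index+1, with the last entry wrapping to the start) is one reading of the description above.

```python
from collections import deque


class NextPacketTable:
    """Mapping table of destinations for one packet transmit interval."""

    def __init__(self, destinations, round_robin=True):
        if not destinations:
            raise ValueError("at least one destination is required")
        self._table = deque(destinations)
        self._round_robin = round_robin

    def interval_order(self):
        """Return the send order for this interval, then update the table."""
        order = list(self._table)
        if self._round_robin:
            # After the last index is transmitted, shift every entry up by
            # one index; the last entry wraps around to index 0.
            self._table.rotate(1)
        return order


if __name__ == "__main__":
    table = NextPacketTable(["C", "D", "E"])
    for interval in range(4):
        print(interval, table.interval_order())
```

With three destinations the send order repeats every three packet transmit intervals, so that each receiver takes each position in the order equally often.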
When Ring Mode is disabled, packets to all n PD destinations are sent as soon as possible on each fPS interrupt. When Ring Mode is enabled, one PD is sent per fPS interrupt, and the fPS rate is multiplied by n (DIV is divided by n).
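By way of a non-limiting illustration, the send timing within one packet transmit interval, with and without Ring Mode, might be sketched as follows. The function name `send_schedule` and the representation of offsets as exact fractions of a second are assumptions made for the example.

```python
from fractions import Fraction


def send_schedule(destinations, packet_rate_hz, ring_mode):
    """Return (interrupt_rate_hz, [(offset_seconds, destination), ...]).

    The offsets are measured from the start of one packet transmit interval.
    """
    n = len(destinations)
    if n == 0 or packet_rate_hz <= 0:
        raise ValueError("need at least one destination and a positive rate")
    if not ring_mode:
        # All destinations are sent back to back on a single interrupt.
        return packet_rate_hz, [(Fraction(0), d) for d in destinations]
    # One destination per interrupt; the interrupt rate is multiplied by n
    # so the sends are equally spaced across the packet transmit interval.
    step = Fraction(1, packet_rate_hz) / n
    return packet_rate_hz * n, [(i * step, d) for i, d in enumerate(destinations)]


if __name__ == "__main__":
    rate, schedule = send_schedule(["C", "D", "E"], 1500, ring_mode=True)
    print(rate)
    for offset, dest in schedule:
        print(f"{float(offset) * 1e6:8.2f} us -> {dest}")
```

For three destinations at a 1.5 kHz packet rate, this sketch yields a 4.5 kHz interrupt rate with sends spaced one third of the packet transmit interval apart.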
Referring to
The clock reconstruction from the sourcing audio can be an indirect frequency correction such that a characteristic of the sourcing audio is observed over time to see fluctuations and thereby indirectly note the change of frequency by noting the change of characteristic of the streaming.
The source clock frequency recovery SCFR through periodic packet streams is a special case where the constant packet generation interval, assumed to be known at both the sender and the receiver through service specifications, can be used to extract this information instead of timestamps.
The clock reconstruction from the sourcing audio can be a direct frequency correction such that a predefined feature of the sourcing audio is marked and is directly observed over time at the receiver to thereby directly note the change of frequency by noting the predefined marked feature of the sourcing audio in the streaming.
As the IP packets are streamed and received by the receiver there is formed a buffer with a Packet Buffer Size (PBS). By observing a position of the buffer and counting the number of IP packets in the buffer to that point and particularly observing a particular value such as 50% of PBS then the frequency of streaming is indirectly provided by the observation.
It can be assessed whether the streaming rate is ahead of or behind the known source frequency. The output frequency of the streaming audio can then be adjusted based on this observation of the PBS by adjusting the processing rate in the buffer to increase or decrease the number of IP packets in the buffer and return the observed point of 50% to match half the Packet Buffer Size (i.e. PBS/2).
This is the same as keeping the error at zero in a control loop. The control loop can consist of PID, PI, P controller architecture, or could consist of fixed rate changes above and below the 50% mark.
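By way of a non-limiting illustration, a PI form of such a control loop might be sketched as follows. The class name `BufferFillController`, the gains, the correction limit and the simple anti-windup clamp are all assumptions made for the example and would need tuning for any real system.

```python
class BufferFillController:
    """PI loop that steers the buffer fill level towards PBS/2."""

    def __init__(self, pbs, nominal_hz, kp=2.0, ki=0.1, max_ppm=200.0):
        self.target = pbs / 2.0      # desired fill level: 50% of PBS
        self.nominal_hz = nominal_hz  # known source frequency
        self.kp = kp
        self.ki = ki
        self.max_ppm = max_ppm
        self.integral = 0.0

    def update(self, fill):
        """Return the adjusted output frequency for the observed fill level."""
        # Positive error: buffer too full, so the receiver must run faster.
        error = fill - self.target
        self.integral += error
        # Simple anti-windup: bound the integral contribution to max_ppm.
        limit = self.max_ppm / self.ki if self.ki else 0.0
        self.integral = max(-limit, min(limit, self.integral))
        ppm = self.kp * error + self.ki * self.integral
        ppm = max(-self.max_ppm, min(self.max_ppm, ppm))
        return self.nominal_hz * (1.0 + ppm * 1e-6)


if __name__ == "__main__":
    controller = BufferFillController(pbs=16, nominal_hz=48_000.0)
    for fill in (8, 9, 10, 9, 8, 7):
        print(fill, round(controller.update(fill), 4))
```

A P controller corresponds to setting `ki` to zero, while the fixed rate changes above and below the 50% mark correspond to replacing the proportional term with a constant step of either sign.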
The IP Packets can be directly observed to determine frequency by monitoring a particular marking on the sourcing audio and maintaining observation of the frequency of observation of consecutive markings. In one form this is by use of a timestamp.
Timestamps are included in received packets which deliver the information on departure times of packets. A packet timestamp is generated by the source clock whose frequency is known. The corresponding arrival time is measured with a receiver clock whose frequency is known. The arrival time includes a packet delay measured in the receiver clock whose true value is unknown.
The arrival times, the timestamps, and the packet delays are modeled by linear regression, where not only the frequency ratio but also the phase difference between the clocks are to be estimated. Because only the ratio needs to be estimated for SCFR, the linear regression can be simplified by subtracting initial values from the arrival times, timestamps, and packet delays.
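By way of a non-limiting illustration, the frequency ratio might be estimated by a least-squares fit through the origin once the initial values have been subtracted, as sketched below. The function name `estimate_frequency_ratio` and the choice of an ordinary least-squares fit, with the packet delay variation treated as noise, are assumptions made for the example.

```python
def estimate_frequency_ratio(timestamps, arrivals):
    """Estimate receiver ticks per source tick from packet timing.

    timestamps are departure times in source clock ticks and arrivals are
    the matching arrival times in receiver clock ticks. The first pair is
    used as the initial value that is subtracted from all others.
    """
    if len(timestamps) != len(arrivals) or len(timestamps) < 2:
        raise ValueError("need two or more matching timestamp/arrival pairs")
    t0, a0 = timestamps[0], arrivals[0]
    xs = [t - t0 for t in timestamps[1:]]
    ys = [a - a0 for a in arrivals[1:]]
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("timestamps must not all be identical")
    # Slope of the best-fit line through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sxx


if __name__ == "__main__":
    source = [i * 1000 for i in range(100)]
    # Receiver clock assumed 100 ppm fast, with a fixed 5-tick delay.
    received = [5 + 1.0001 * t for t in source]
    print(estimate_frequency_ratio(source, received))
```

Subtracting the initial values removes the constant phase offset, so only the slope, which is the frequency ratio, remains to be fitted.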
The output frequency (and phase relationship with source audio) can be directly altered to match the detected Direct Frequency Correction provided by the timestamp.
It can be understood that a combination of direct and indirect frequency correction methods can be used.
Referring to
The watermarks are created near the high and low end of the IP packet buffer, as shown in
Referring to
The audio transmission channel is a wireless or an ethernet channel, wherein the processing received transmitted sourced audio is transmittable to a plurality of audio outputting means at a changeable receiver frequency. The changeable receiver frequency is determined from the known source frequency and an observation of the IP packets of the plurality of received transmitted sourced audio in a plurality of buffers of the plurality of receiving audio processors.
Referring to
The predefined input framework includes one or more categories of networked receiver/transmission audio devices selected from:
The transmission is through WiFi or Ethernet Network.
Other variations understood by a person skilled in the art are included within the scope of this invention.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
Similarly it should be appreciated that in the above description of example embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description of Specific Embodiments are hereby expressly incorporated into this Detailed Description of Specific Embodiments, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
As used herein, unless otherwise specified the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In describing the preferred embodiment of the invention illustrated in the drawings, specific terminology will be resorted to for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents which operate in a similar manner to accomplish a similar technical purpose. Terms such as “forward”, “rearward”, “radially”, “peripherally”, “upwardly”, “downwardly”, and the like are used as words of convenience to provide reference points and are not to be construed as limiting terms.
In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” are used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
Any one of the terms: including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
Thus, while there has been described what are believed to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as fall within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added or deleted to methods described within the scope of the present invention.
Although the invention has been described with reference to specific examples, it will be appreciated by those skilled in the art that the invention may be embodied in many other forms.
It is apparent from the above, that the arrangements described are applicable to the audio streaming industries.
Number | Date | Country | Kind |
---|---|---|---|
2022902628 | Sep 2022 | AU | national |
This Patent Application is a Continuation of co-pending Australian PCT Patent Application No. PCT/AU2023/050879, filed Sep. 12, 2023, which designated the United States and is now pending. This Patent Application and Australian PCT Patent Application No. PCT/AU2023/050879 claim priority to Australian Patent Application No. 2022902628, filed Sep. 12, 2022. The entire teachings and disclosure of each application are incorporated herein by reference thereto.
 | Number | Date | Country |
---|---|---|---|
Parent | PCT/AU2023/050879 | Sep 2023 | WO |
Child | 19077838 | | US |