The present invention generally relates to a wireless conference system as used in a conference room to enable multiple users to participate in a conference. Such a conference system typically comprises a central access point under control of the host or chairperson of the conference, and multiple conference units for the users installed on or integrated in the user desks present in the conference room. The present invention more particularly concerns the latency-sensitive, bi-directional audio transmission between the conference units and the access point in such a conference system.
Conference rooms are often equipped with conference systems enabling a large number of users, for instance tens up to hundreds of users, to participate in a single conference. A wireless conference system typically comprises a central access point and multiple conference units coupled to the central access point. A separate conference unit may for instance be provided for each user, or a conference unit may be shared by two users at neighbouring seats in the conference room. Each conference unit typically has a microphone connector for connecting a microphone, for instance a gooseneck microphone, a built-in loudspeaker, and one or several headphone connectors for connecting headphones. Each conference unit further has a controller, a data processor and a transceiver configured for wireless, bi-directional data transfer with a similar transceiver in the central access point. The transceiver of a conference unit comprises a transmitter able to upstream transmit digital data packets to the receiver that forms part of the transceiver in the central access point. Similarly, the transceiver of the central access point comprises a transmitter able to downstream transmit digital data packets to the receiver(s) that form(s) part of the transceiver(s) of the conference unit(s). In a typical situation wherein a single participant is speaking while the other participants are listening, the audio captured by the microphone of the conference unit as a result of a person speaking is digitized, packetized, and upstream transmitted to the central access point. The access point processes the received audio packets, for example by mixing them with other audio input sources, and downstream distributes the processed audio to all other conference units in the meeting room, wherein the receiver shall receive the audio and the processor shall process the received audio packets for playout via headphones.
This includes downstream transmission of the audio to the originating conference unit where the audio packets were received from such that the speaking person can hear his/her own speech via headphones.
The wireless, bidirectional communication of audio between conference units and an access point is subject to low latency requirements, sometimes called real-time requirements. Depending on the applicable quality norm, the end-to-end latency on the round trip time (RTT), i.e. the maximum delay for an audio packet to travel back and forth between a conference unit and the central access point including all processing in the conference unit and central access point, in conference systems may for instance be limited to a value in the range from 10 milliseconds up to 30 milliseconds, for example 15 milliseconds. An audio packet that arrives later than the maximum acceptable delay will not be used for playout. When an audio packet does not arrive in time (packet loss), the lost packet will be replaced by a packet that is determined through a packet loss concealment algorithm to avoid audible artefacts, noticeable by the conference participant(s) whose conference unit did not receive the audio packet in time. In order to enable such packet loss concealment algorithm to timely generate the replacement packet, it is important that packet loss is detected as early as possible.
In an example conference system, the audio packets are transferred in timeslots of a TDMA frame over a Wi-Fi connection between the conference unit and central access point. The TDMA frame may for instance have a length of 5 milliseconds subdivided in 10 timeslots each having a length of 500 microseconds. The conference unit shall capture 5 milliseconds of audio, typically speech by a conference participant, digitize the 5 milliseconds of audio into a digital audio packet, and transmit this digital audio packet in a single timeslot of a TDMA frame. The packet generation in this example already introduces 5 milliseconds of delay. In case soft TDMA is applied to avoid interference with other users of the channel, i.e. a TDMA scheme wherein timeslots are fixedly assigned to transceivers but wherein transceivers listen to the channel for interfering traffic within the timeslot to determine if the channel is free before transmitting data in such timeslot, the soft TDMA introduces a variable delay or jitter, the so-called Listen-Before-Talk jitter or LBT jitter. The interrupt mechanisms that control the data packet processing at the transmitter side and receiver side, the packet and audio processing time, the propagation time of the data packet through the wireless channel, and any synchronization inaccuracies between transmitter and receiver introduce further delays that, in combination with LBT jitter, may amount to up to 2 milliseconds for single direction transmission. The single direction latency experienced by an audio packet in a conference system implementing a soft TDMA scheme with TDMA frame length of 5 milliseconds may thus amount to up to 7 milliseconds.
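The timing figures of this example can be illustrated with a minimal Python sketch. The frame length, slot count and the 2 millisecond bound are taken from the example above; the function names and the convention of measuring time in microseconds since the start of frame 0 are assumptions made purely for illustration.

```python
FRAME_US = 5_000                         # TDMA frame length from the example (5 ms)
SLOTS_PER_FRAME = 10                     # timeslots per frame from the example
SLOT_US = FRAME_US // SLOTS_PER_FRAME    # 500 microseconds per timeslot


def slot_start_us(frame_index: int, slot_index: int) -> int:
    """Nominal start of a timeslot, in microseconds since frame 0 began (assumed convention)."""
    if not 0 <= slot_index < SLOTS_PER_FRAME:
        raise ValueError("slot_index out of range")
    return frame_index * FRAME_US + slot_index * SLOT_US


def one_way_latency_us(packetization_us: int = FRAME_US,
                       other_delays_us: int = 2_000) -> int:
    """Worst-case single direction latency: packet generation plus the
    combined LBT jitter, interrupt, processing, propagation and sync delays."""
    return packetization_us + other_delays_us


print(SLOT_US)               # 500
print(slot_start_us(2, 3))   # 11500
print(one_way_latency_us())  # 7000
```

The last value corresponds to the 7 milliseconds of single direction latency mentioned in the example.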
A straightforward way to detect packet loss in order to initiate packet loss concealment relies on sequence numbering of the digital audio packets. A packet is then assumed lost when the audio packet with a following sequence number is received by the receiver. When relying on sequence numbering, the earliest moment whereon loss of a data packet can be detected however is the point in time whereon the next data packet is received. In order to generate and transmit the next audio packet, at least another 5 milliseconds will have lapsed in the above example with TDMA frames of 5 milliseconds, leaving insufficient time for the receiver to apply packet loss concealment and generate a replacement packet for a previous, lost audio packet.
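The timing limitation of sequence numbering can be made concrete with a minimal Python sketch. The function name `detect_gaps` and the representation of arrivals as (arrival time in milliseconds, sequence number) pairs are hypothetical and serve only to illustrate the reasoning above.

```python
def detect_gaps(arrivals):
    """Return (lost_sequence_number, detection_time_ms) pairs.

    A loss is only noticed when a packet with a higher sequence number
    than expected arrives, so the detection time equals the arrival
    time of that later packet.
    """
    detected = []
    next_expected = None
    for arrival_ms, seq in arrivals:
        if next_expected is not None and seq > next_expected:
            for lost in range(next_expected, seq):
                detected.append((lost, arrival_ms))
        if next_expected is None or seq >= next_expected:
            next_expected = seq + 1
    return detected


# Packets are sent every 5 ms; packet 1 (due around t = 6 ms) never arrives.
print(detect_gaps([(1.0, 0), (11.0, 2)]))  # [(1, 11.0)]
```

In this illustration the loss of packet 1 is only noticed at 11 milliseconds, i.e. a full TDMA frame after the moment the packet should have arrived.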
A solution to enhance the detection of packet loss relying on sequence numbering of audio packets, could be found in reducing the TDMA frame length and consequently in reducing the length of an audio segment embedded in a single audio packet. Reducing the packet length however has a negative impact on the capacity of the wireless channel. The overhead/payload ratio increases as a result thereof as it may be assumed that the overhead per packet remains constant, and consequently the effective capacity of the wireless link shall decrease. In addition, the packet processing speed must increase when packets are shortened, requiring more expensive processors.
United States Patent Application US 2015/0201289 A1, entitled “Method and Apparatus for Rendering Audio in Wireless Hearing Instruments”, recognizes the problem of audio packet loss over a wireless communication link in a different application, more precisely the application of hearing instruments worn on one or both sides of a person's head to assist a patient suffering hearing loss. As is described in par. [0030] of US 2015/0201289 A1, packets may be resent from the audio source device in an effort to improve the overall packet error rate performance of the wireless link. The retransmissions happen as a result of the audio source device failing to receive an acknowledgment from the audio sink device, or they can be sent unconditionally with a number of retransmissions. The suggestion to send acknowledgements and retransmissions indicates that the latency constraints in the application of hearing instruments are less stringent than in conference systems. The quality constraints of conference systems, as illustrated by the above example, do not allow retransmissions. Paragraphs [0029] and [0031] of US 2015/0201289 A1 further describe that link layer information shared between the radio of the wireless communication circuit and the DSP (Digital Signal Processor) of the processing circuit can be used to determine a packet concealment strategy. The TDMA mechanism deployed by the radio has inherently good timing mechanisms that allow packet arrivals to be scheduled. In case a scheduled receiving event takes place without a received packet, the radio will inform the DSP of the missing packet to allow the DSP to insert a packet loss concealment frame of information.
United States Patent Application US 2019/0104423 A1, entitled “Ultra-Low Latency Audio over Bluetooth” recognises the problem of low latency requirements for audio packets being transferred over a wireless connection in a different application, more precisely the application of audio-over-Bluetooth transfer between a device and a wireless headset or wireless ear buds. Also US 2019/0104423 A1 relies on acknowledgements and retransmissions, and teaches to reduce the latency for wireless audio packet transmission by enhancing the acknowledgement through combining BTC (Bluetooth Classic) packets and BLTE (Bluetooth Low Energy) packets within a single Bluetooth frame, by limiting the number of retransmissions and packet concealments per frame cycle to an upper limit, and by using time-efficient audio coding and decoding implementing FEC (Forward Error Correction) such as RS (Reed-Solomon). This is for instance described in paragraph [0005] of US 2019/0104423 A1.
It is an object of the present invention to disclose embodiments of a conference system that resolve or mitigate one or several of the above-mentioned drawbacks of existing solutions. More particularly, it is an object of the present invention to disclose embodiments of a conference system wherein audio packet loss as a result of wireless transfer between a conference unit and central access point or vice versa is detected faster thereby enabling faster initiation of packet concealment to allow the conference system to meet latency requirements and quality standards applicable for conference systems. It is a further object of the present invention to disclose such embodiments of a conference system without negatively affecting the wireless link's effective bandwidth.
According to embodiments of the invention, the above-defined object is achieved by the wireless conference system adapted to enable a plurality of users to participate in a conference in a conference room, the wireless conference system comprising an access point and a plurality of conference units,
Thus, according to embodiments of the invention, the access point and one or more conference units are equipped with clocks or timers that are actively synchronized with high accuracy. As a consequence thereof, the receiver is aware of the transmit time of a data packet and the receiver can determine the expected arrival time of a data packet. As a result of the actively synchronized clocks, all transceivers know the start and end times of the TDMA frames and timeslots within these TDMA frames, up to some clock synchronization tolerance. The receiver consequently can derive from its local clock signal what the expected arrival time of an audio data packet is, namely the transmit time of that audio data packet plus an expected transmission delay. The transmit time is known from a synchronisation clock used by the transmitter and receiver for the TDMA based wireless communication. Indeed, each conference unit knows the TDMA schedule and its synchronisation clock allows to derive where they are in the TDMA frame. The expected transmission delay accounts for the overall time required for interrupt handling at the transmitter and receiver side, propagation of the audio data packet over the wireless link, jitter of various nature, and inaccuracies of various nature, and consequently represents an upper limit for the overall time between the data packet processor at the transmitter's side releasing the audio data packet for transmission and the data packet processor at the receiver's side receiving that same audio data packet for processing. If no data packet is received by the expected arrival time, packet loss concealment is activated at the receiver's side in order to produce a replacement packet.
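The deadline-based detection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name `LossDetector`, its fields, and the choice of the nominal timeslot start as the transmit time are hypothetical, and the default of 1.5 milliseconds for the expected transmission delay is only an example value.

```python
from dataclasses import dataclass


@dataclass
class LossDetector:
    frame_us: int = 5_000           # TDMA frame length (example value)
    slot_us: int = 500              # timeslot length (example value)
    expected_delay_us: int = 1_500  # expected transmission delay (example value)

    def deadline_us(self, frame_index: int, slot_index: int) -> int:
        """Expected arrival time: transmit time (assumed to be the nominal
        slot start on the synchronized clock) plus the expected delay."""
        transmit_time = frame_index * self.frame_us + slot_index * self.slot_us
        return transmit_time + self.expected_delay_us

    def is_lost(self, frame_index: int, slot_index: int,
                now_us: int, received: bool) -> bool:
        """True when the deadline has passed without a packet, i.e. when
        packet loss concealment should be activated."""
        return (not received) and now_us >= self.deadline_us(frame_index, slot_index)


detector = LossDetector()
print(detector.deadline_us(2, 3))             # 13000
print(detector.is_lost(2, 3, 12_999, False))  # False
print(detector.is_lost(2, 3, 13_000, False))  # True
```

In such a sketch the receiver needs no later packet to notice the loss: the decision depends only on the synchronized local clock and the known TDMA schedule.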
As the transmission delay, i.e. the overall delay due to propagation, jitter, interrupt handling, processing and synchronization inaccuracies, in conference systems is in the range of a few milliseconds, typically between 1 and 3 milliseconds for a single direction, the packet loss concealment in embodiments according to the invention can be initiated much faster than in known conference systems that rely on acknowledgements, packet retransmissions and time-outs at the receiver side for packet loss detection, or conference systems that rely on audio packet numbering and packet loss detection based on missing audio packet numbers at the receiver side. Faster initiation of packet loss concealment allows to reduce the overall end-to-end system latency, and the risk reduces that the packet loss concealment does not timely generate the replacement for insertion and playout of the audio stream, hence resulting in a reduced number of audible artefacts. Furthermore, embodiments of the conference system according to the invention do not require to reduce the TDMA frame length and/or the audio data packet length. Such measure would also enable a faster detection of packet loss and consequently a faster initiation of packet loss concealment at the price of an increased overhead/payload ratio and thus at the price of a reduced effective bandwidth (amount of useful data, i.e. audio samples, transferable per unit of time) of the wireless link.
A conference unit in the context of the current invention comprises any unit installed on or integrated in a user's desk in a conference room. Such conference unit typically comprises a built-in microphone array or an audio input connector like for instance a connector for a gooseneck microphone that can be used by a single user or can be shared between two users when the conference unit is installed in between the seats of two neighbouring users. The conference unit typically also comprises one or plural audio output connectors like a connector for headphones for one or plural users, and typically also comprises a built-in speaker. It is noticed that the audio input connector and audio output connector may also be integrated into a single connector. The conference unit may also comprise a connector for a camera or other sensors, may be equipped with a display and with physical or virtual (i.e. displayed) buttons to control the audio input (e.g. muting the microphone), to control the audio output (e.g. controlling the volume of headphones), to control other sensors, to interact with the chairperson (e.g. request to speak), and/or to serve as voting buttons. The conference unit further may comprise indicators, for instance coloured LEDs indicating to the chairperson, the user of the conference unit, and/or to other conference participants what the status of the conference unit is. The conference unit further has a processor for digitizing and packetizing audio captured by a microphone connected to its audio input connector, and a wireless transmitter for transmitting audio data packets to a central unit, the so-called access point. The conference unit also has a receiver for receiving audio data packets from the central unit, a processor for de-packetizing the received audio data packets and producing an (analogue or digital) audio stream sourced via the audio output connector. 
The transmitter and receiver jointly form a transceiver for bi-directional communication with the central unit. The processor generating the audio data packets for transmission and the processor processing the received audio data packets may be integrated to form a single physical processor.
An access point in the context of the current invention constitutes a central unit, managed and controlled by a chairperson or conference organisation. The access point provides bi-directional wireless communication with all conference units in the conference room in a TDMA-based manner. The access point thereto comprises a transmitter and receiver that respectively transmit and receive audio packets in timeslots of a TDMA frame. Audio data packets received in different timeslots originate from different conference units. The access point typically has a processor to process the received audio data packets from different conference units, to select or combine (generally process) the audio packets from one or plural conference units into a single audio stream for transmission by its transmitter to the conference units. The access point thereto may receive input from a chairperson who controls the conference and decides at any point in time which conference participants are allowed to speak. In addition to the conference units, the audio stream also may be provided to interpreter units in order to enable an interpreter or translator to upload an interpretation or translation of the audio stream that is further distributed by the central access point to the conference units.
A wireless conference system in the context of the present invention comprises the set of conference units installed in a conference room and the access point where these conference units connect to in a multipoint-to-point fashion through a wireless, bi-directional link which is shared in a TDMA-based manner.
Packet loss concealment in the context of the present invention comprises any algorithm or technology that generates audio samples in replacement for a lost audio packet, i.e. an audio data packet that never arrives at the receiver or that arrives late at the receiver as a result of which it can no longer be processed and timely integrated in the audio stream. Packet loss concealment techniques typically use the recently received audio packet(s) to generate a replacement packet for the lost audio data packet. Packet loss concealment techniques typically strive at avoiding or minimizing audible effects as a result of the replacement, and may for instance be based on frequency or tonality of recent audio samples.
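As a purely illustrative example of such a technique, the following Python sketch generates a replacement packet by repeating the most recently received packet at reduced amplitude, and falls back to silence when no earlier packet is available. This deliberately naive approach is an assumption chosen for brevity; it is not presented as the packet loss concealment algorithm used in embodiments of the invention, and the name `conceal` and its parameters are hypothetical.

```python
def conceal(last_packet, attenuation=0.5, length=240):
    """Return replacement samples for a lost audio packet.

    last_packet: list of integer samples from the most recent good packet,
                 or None if nothing has been received yet.
    attenuation: gain applied to the repeated samples to soften the repeat.
    length:      number of samples to emit when falling back to silence.
    """
    if not last_packet:
        return [0] * length
    return [int(sample * attenuation) for sample in last_packet]


print(conceal([100, -200, 300]))  # [50, -100, 150]
print(conceal(None, length=3))    # [0, 0, 0]
```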
In embodiments of the wireless conference system according to the invention, the access point and the one or more conference units are configured to not acknowledge receipt of audio data packets.
Indeed, preferred embodiments of the wireless conference system according to the invention implement a protocol without receipt acknowledgements for audio data packets, or with receipt acknowledgement being deactivated. Acknowledgements or the absence thereof will trigger retransmissions, but any attempt to recover a lost audio data packet through retransmission will delay the activation of packet loss concealment and therefore increase the overall latency of the wireless conference system.
In embodiments of the wireless conference system according to the invention, the access point and the one or more conference units are configured to not retransmit a lost audio data packet.
Indeed, preferred embodiments of the wireless conference system according to the invention implement a protocol without retransmissions of audio data packets in the upstream and downstream directions. As explained here above, retransmission attempts will delay the activation of packet loss concealment and therefore increase the overall latency of the wireless conference system.
In embodiments of the wireless conference system according to the present invention, the latency sensitive audio data packets have a round trip time latency limit of 25 milliseconds for wireless transfer from a conference unit to the access point, and wireless transfer from the access point to the conference unit.
Thus, embodiments of the wireless conference system may set a restriction of 25 milliseconds for the round-trip time of audio data packets. This restriction in other words sets the maximum acceptable delay for a conference participant between speaking in the microphone connected to his conference unit and hearing his/her own speech in headphones connected to that same conference unit. The skilled person shall appreciate that alternative embodiments of the wireless conference system according to the invention may implement any other round-trip time latency limit smaller than 25 milliseconds. Such alternative embodiments set a higher quality standard on the audio on the condition that underlying technology like packet loss concealment algorithms are able to meet the lower round trip time limit.
In embodiments of the wireless conference system according to the present invention, the latency sensitive audio data packets have a round trip time latency limit of 15 milliseconds for wireless transfer from a conference unit to the access point, and wireless transfer from the access point to the conference unit.
Indeed, preferred embodiments of the wireless conference system set a restriction of 15 milliseconds for an audio data packet to travel back and forth between a conference unit and the central access point. The skilled person shall appreciate that round trip time limits below 15 milliseconds may as well be implemented, at the risk however that higher capacity data packet processors that are more expensive must be deployed, and/or shorter TDMA frames and shorter audio packet lengths must be implemented negatively impacting the effective bandwidth of the wireless link, in order to minimize the audible artefacts.
In embodiments of the wireless conference system according to the present invention, the TDMA based wireless communication uses TDMA frames of 5 milliseconds.
Thus, preferred embodiments implement a TDMA frame with a length of 5 milliseconds. This implies that audio data packets also comprise audio samples spanning 5 milliseconds. At an audio sampling rate of 48 kHz, this means each audio data packet comprises 240 audio samples. These 240 audio samples constitute the payload section of an audio data packet. In addition, the audio data packet comprises overhead. The skilled person shall appreciate that shorter TDMA frames and shorter audio data packets have a negative impact on the effective bandwidth of the wireless link: the payload section of an audio data packet will reduce whereas its overhead section shall remain constant. The skilled person shall further appreciate that longer TDMA frames and longer audio data packets may complicate the task of packet processors and packet loss concealment technology to timely produce the audio stream without audible artefacts. The preferred TDMA frame length of 5 milliseconds in other words is the result of trading off effective bandwidth on the wireless link between conference units and access point versus quality standards to be met for conference systems.
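The trade-off can be illustrated numerically with a short Python sketch. The sampling rate of 48 kHz and the frame length of 5 milliseconds come from the description above; the sample width of 2 bytes and the per-packet overhead of 60 bytes are hypothetical values chosen only to make the arithmetic visible.

```python
def samples_per_packet(frame_ms: float, sample_rate_hz: int = 48_000) -> int:
    """Number of audio samples spanning one TDMA frame."""
    return int(sample_rate_hz * frame_ms / 1000)


def overhead_ratio(frame_ms: float,
                   overhead_bytes: int = 60,     # hypothetical constant overhead
                   bytes_per_sample: int = 2) -> float:  # hypothetical 16-bit samples
    """Overhead/payload ratio of one audio data packet."""
    payload_bytes = samples_per_packet(frame_ms) * bytes_per_sample
    return overhead_bytes / payload_bytes


print(samples_per_packet(5))  # 240
print(overhead_ratio(5))      # 0.125
print(overhead_ratio(2.5))    # 0.25
```

Under these assumed values, halving the frame length doubles the overhead/payload ratio, which is the effect on the effective bandwidth described above.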
In embodiments of the wireless conference system according to the present invention, the transmitter is configured to listen for interfering traffic within an assigned timeslot within a TDMA frame before transmitting an audio data packet therein.
Indeed, preferred embodiments of the invention implement a so-called soft TDMA scheme wherein the timeslots that form part of a TDMA frame are fixedly assigned to conference units following a predefined scheme, but wherein the transmit time within each timeslot of a TDMA frame is flexibly determined by the conference units based on an LBT (Listen Before Talk) mechanism. The LBT mechanism brings the advantage that the wireless channel can be used simultaneously by different wireless systems, resulting in a more effective use of the overall available bandwidth of a wireless channel. The LBT mechanism on the other hand introduces jitter or uncertainty for the receiver in the arrival time of audio data packets, as the transmitter will transmit the audio data packet only after having established that a timeslot is not used by any other transmitter, either internal or external to the wireless conference system. The LBT jitter in conference systems typically resides in the order of 1 to 2 milliseconds.
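The listen-before-talk behaviour can be sketched in Python as follows. This is a simplified model under stated assumptions: the channel is represented by a hypothetical callable `channel_busy(time_us)`, and the sensing step of 50 microseconds and maximum wait of 1.5 milliseconds are illustrative values rather than parameters prescribed by the invention.

```python
def lbt_transmit_time_us(slot_start_us, channel_busy,
                         max_wait_us=1_500, sense_step_us=50):
    """Return the time at which the transmitter sends its packet, or None.

    The transmitter senses the channel from the start of its assigned
    timeslot and transmits as soon as the channel is found free. If the
    channel stays busy for longer than max_wait_us, nothing is sent.
    """
    waited = 0
    while waited <= max_wait_us:
        if not channel_busy(slot_start_us + waited):
            return slot_start_us + waited
        waited += sense_step_us
    return None


# Interfering traffic occupies the channel until t = 10300 microseconds.
print(lbt_transmit_time_us(10_000, lambda t: t < 10_300))  # 10300
print(lbt_transmit_time_us(10_000, lambda t: False))       # 10000
print(lbt_transmit_time_us(10_000, lambda t: True))        # None
```

The difference between the returned transmit time and the nominal slot start is the LBT jitter seen by the receiver.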
In embodiments of the wireless conference system according to the invention, the wireless communication uses Wi-Fi (IEEE 802.11).
Indeed, the Wi-Fi protocol serves well as wireless technology for connecting conference units with the central access point in a conference system. Wi-Fi has a reach that enables to cover the area of conference rooms with a single access point (or at most a few access points) and offers multiple channels or frequency bands to deal with interference.
In embodiments of the wireless conference system according to the present invention, the one or more conference units comprise clock synchronization units, configured to actively synchronize their respective clocks with a clock in the access point based on a timestamp inserted in beacon messages regularly broadcasted by the access point.
Thus, a preferred way to implement active synchronisation between the clock of the access point and the clocks of the conference units relies on beacon messages regularly broadcasted by the access point. In case the access point relies on the Wi-Fi protocol, the beacon messages for instance may correspond to the messages wherein the access point regularly broadcasts its SSID in order to enable devices to detect presence of the access point and establish connectivity. In comparison with other messages, such beacon messages are typically transmitted at a lower modulation scheme, i.e. using a less complex constellation scheme and increased redundancy, such that these beacon messages are more robust: they have a larger range and reduced risk for being lost before reaching the receiver. Using beacon messages to convey a time value or timestamp between the access point and conference unit of a conference system thus makes the active clock synchronisation that is essential to the present invention, more robust. An additional advantage in case of Wi-Fi, is that the beacon messages transferred therein are backwards compatible with earlier flavours or versions of the Wi-Fi technology.
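A minimal Python sketch of beacon-based synchronisation is given below. It assumes that each beacon carries the access point time in microseconds, that the conference unit records its own local time when the beacon is received, and that TDMA frames are aligned to time zero of the access point clock. The class name `SyncedClock`, its methods and the optional propagation compensation are hypothetical.

```python
class SyncedClock:
    """Tracks the offset between a local clock and the access point clock."""

    def __init__(self):
        self.offset_us = 0

    def on_beacon(self, beacon_timestamp_us, local_rx_time_us, propagation_us=0):
        """Update the offset from the timestamp carried in a beacon message."""
        self.offset_us = beacon_timestamp_us + propagation_us - local_rx_time_us

    def now_us(self, local_time_us):
        """Convert a local time into access point time."""
        return local_time_us + self.offset_us

    def position_in_frame(self, local_time_us, frame_us=5_000, slot_us=500):
        """Return (slot index, offset within slot) for a given local time."""
        t = self.now_us(local_time_us) % frame_us
        return t // slot_us, t % slot_us


clock = SyncedClock()
clock.on_beacon(beacon_timestamp_us=1_000_000, local_rx_time_us=250_000)
print(clock.now_us(251_700))             # 1001700
print(clock.position_in_frame(251_700))  # (3, 200)
```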
In embodiments of the wireless conference system according to the invention, the predetermined expected transmission delay is determined as a sum of a propagation delay, jitter, an interrupt handling delay, processing delay, and clock synchronisation inaccuracy.
Indeed, the overall inaccuracy on the receipt time of an audio data packet in preferred embodiments comprises a first contribution resulting from the effective propagation through the air, typically in the order of 100 to 200 microseconds. The overall inaccuracy on the receipt time of an audio packet further may comprise a second contribution resulting from LBT jitter in embodiments wherein a soft TDMA scheme is implemented. This second contribution is substantial, typically in the range of 1 to 1.5 milliseconds for TDMA frames of 5 milliseconds. The overall inaccuracy on the receipt time of an audio packet further comprises third, fourth and fifth contributions, respectively resulting from the interrupt handling at the transmitter and receiver side, i.e. the processing of interrupts indicating that an event has occurred like for instance the receipt of a packet, resulting from the packet and audio processing time, and resulting from synchronisation inaccuracies between the transmitter clock and receiver clock that may depend on the active synchronisation mechanism deployed.
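The summation of these contributions can be written out in a short Python sketch. The propagation and LBT jitter values fall within the typical ranges mentioned above; the values used for interrupt handling, processing and synchronisation inaccuracy are hypothetical placeholders, since no figures for these contributions are given here.

```python
def expected_transmission_delay_us(propagation_us, jitter_us, interrupt_us,
                                   processing_us, sync_inaccuracy_us):
    """Sum of the five contributions to the expected transmission delay."""
    return (propagation_us + jitter_us + interrupt_us
            + processing_us + sync_inaccuracy_us)


delay = expected_transmission_delay_us(
    propagation_us=150,      # within the typical 100 to 200 microseconds
    jitter_us=1_000,         # lower end of the typical LBT jitter range
    interrupt_us=100,        # hypothetical value
    processing_us=200,       # hypothetical value
    sync_inaccuracy_us=50,   # hypothetical value
)
print(delay)  # 1500
```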
In embodiments of the wireless conference system according to the invention, the jitter delay comprises a listen-before-talk jitter contribution.
As already explained here above, jitter of various nature may contribute to the overall inaccuracy on the receipt time of an audio packet. In case a soft TDMA scheme is applied, wherein the transmitter waits until the channel is free before transmitting in the scheduled timeslot, an important jitter contribution stems from the listen-before-talk behaviour of the transmitter.
In embodiments of the wireless conference system according to the invention, the predetermined expected transmission delay is set at a value between 1.5 milliseconds and 2 milliseconds.
Tests have shown that in conference systems using Wi-Fi with a soft TDMA scheme with time frames of 5 milliseconds for communication between conference units and a central access point of a conference system, and using timestamps in Wi-Fi beacons for active synchronisation, an overall expected transmission delay set at 1.5 milliseconds allows to implement the present invention with substantial gain in the detection and early activation of packet loss concealment without sacrificing effective bandwidth on the wireless link.
According to a second aspect, the present invention relates to a method for transfer of latency sensitive audio data packets between one or more conference units and an access point in a conference system adapted to enable a plurality of users to participate in a conference in a conference room, the method for transfer comprising bi-directional, time division multiple access based or TDMA based wireless communication of the audio data packets, the method further comprising:
Although the present invention has been illustrated by reference to specific embodiments, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied with various changes and modifications without departing from the scope thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. In other words, it is contemplated to cover any and all modifications, variations or equivalents that fall within the scope of the basic underlying principles and whose essential attributes are claimed in this patent application. It will furthermore be understood by the reader of this patent application that the words “comprising” or “comprise” do not exclude other elements or steps, that the words “a” or “an” do not exclude a plurality, and that a single element, such as a computer system, a processor, or another integrated unit may fulfil the functions of several means recited in the claims. Any reference signs in the claims shall not be construed as limiting the respective claims concerned. The terms “first”, “second”, “third”, “a”, “b”, “c”, and the like, when used in the description or in the claims are introduced to distinguish between similar elements or steps and are not necessarily describing a sequential or chronological order. Similarly, the terms “top”, “bottom”, “over”, “under”, and the like are introduced for descriptive purposes and not necessarily to denote relative positions.
It is to be understood that the terms so used are interchangeable under appropriate circumstances and embodiments of the invention are capable of operating according to the present invention in other sequences, or in orientations different from the one(s) described or illustrated above.
Number | Date | Country | Kind |
---|---|---|---|
20191690.5 | Aug 2020 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/067652 | 6/28/2021 | WO | |