The present invention relates to voice over IP communication, and in particular, but not exclusively, to a method and device for making a call over a voice over IP network.
In conventional communication systems, all telephonic devices are designed to yield a frequency response of the transfer function representing all stages from the acoustic signal to the digital signal prior to the speech encoder that matches the characteristics of the sending intermediate reference system (IRS) specified in ITU-T P.48 standard, “Specification for an Intermediate Reference System,” ITU-T Recommendation P.48, 1988. The frequency characteristics of the Intermediate Reference System according to ITU-T P.48 are shown in
The frequency characteristics of the IRS emphasize the speech frequency band that is considered most important for speech intelligibility. That is, more weight is given to the second formant frequencies than to the first formant, which is known to increase the intelligibility of clipped speech, as discussed in I. B. Thomas, “The Influence of First and Second Formants on the Intelligibility of Clipped Speech,” Journal of the Audio Engineering Society, Vol. 16, No. 2, 1968.
By concentrating the energy of a narrowband signal into the second formant frequencies the intelligibility of the narrowband signal is improved, allowing improved intelligibility of a speech signal at a receiver of a call without increasing the bandwidth requirements.
Thus, conventional communication systems, for example the public switched telephone network based on fixed line and/or mobile networks, are designed to have average frequency responses as defined in the IRS specification, that emphasize the second formant frequencies.
Some communication systems allow the user of a device, such as a personal computer, to communicate across a packet-based computer network such as the Internet. Such communication systems include voice over internet protocol (“VoIP”) systems. These systems are beneficial to the user as they are often of significantly lower cost than conventional fixed line or mobile networks. This may particularly be the case for long-distance communication. To use a VoIP system, the user installs and executes client software on their device. The client software sets up the VoIP connections as well as providing other functions such as registration and authentication.
In order to be able to communicate using VoIP, the device must be capable of capturing the voice signal. Commonly, a device may be coupled to a headset, or may contain a built-in microphone that can be used for this purpose. Often, when a computer is used to place VoIP calls, the microphone or headset used will be a general purpose audio input device, and may not necessarily conform to the IRS specification of classical telephony.
When a call is made from a computer to a fixed/mobile phone using an audio input device that is not compliant with the IRS specification, the sound received at the mobile or fixed phone tends to be muffled, resulting in reduced intelligibility of the reproduced speech compared to, for example, a regular mobile to mobile call.
This is because the computer will encode a speech signal with a spectral emphasis that differs from the IRS specification, owing to the general purpose design of the microphone or headset. However, the fixed/mobile phone receiving a call from the computer will treat the received signal as though it had been captured by another fixed/mobile phone.
It is an aim of some embodiments of the invention to address at least some of the problems associated with the prior art.
According to an aspect of the invention, there is provided a method of making a call in a packet switched network between a calling device and a called device, the method comprising receiving at a processor of the calling device samples of a speech signal and an identity of the called device, executing code on the processor to perform the steps of: determining based on the identity of the called device whether a filter should be applied to the samples, when it is determined that a filter should be applied, filtering the samples, and encoding the filtered samples for transmission on the packet switched network.
Filtering the samples may further comprise filtering the samples in accordance with a telephonic standard. The telephonic standard may comprise the P.48 “Specification for an intermediate reference system,” ITU-T Recommendation P.48, 1988 standard.
Filtering may be applied when it is determined that the called device comprises one of a mobile phone or a fixed phone. In particular, the filtering may be applied to the samples when it is determined based on the ID of the called receiver that the called receiver complies with the P.48 “Specification for an intermediate reference system,” ITU-T Recommendation P.48, 1988 standard.
The samples may be filtered in an adaptive filter. The method may further comprise adapting filter coefficients of the adaptive filter to match the frequency response of the filtered samples to a target frequency response.
Encoding the filtered samples may comprise encoding the filtered samples into a plurality of blocks, and wherein the method further comprises calculating an average power/magnitude spectra for the plurality of blocks to determine the frequency response of the filtered samples.
According to a further aspect of the invention, there is provided a terminal for making a call over a packet switched network to a called device, the terminal comprising a processor configured to receive digital samples of a speech signal and an identity of a called device, and a memory configured to store program code arranged so as when executed on the processor to: determine based on the identity of the called device whether a filter should be applied to the samples, when it is determined that the filter should be applied, filter the samples, and encode the filtered samples for transmission on the packet switched network.
According to a further aspect of the invention, there is provided a computer program product for making a call in a packet switched network between a calling device and a called device, the program comprising code arranged so as when executed on a processor to: receive digital samples of a speech signal and an identity of the called device, determine based on the identity of the called device whether a filter should be applied to the samples, when it is determined that the filter should be applied, filter the samples, and encode the filtered samples for transmission on the packet switched network.
According to a further aspect of the invention, there is provided a communication system comprising a plurality of end-user terminals as described above.
For a better understanding of the present invention and to show how it may be carried into effect, reference will now be made by way of example to the accompanying drawings in which:
Embodiments of the invention are described herein by way of particular examples and specifically with reference to exemplary embodiments. It will be understood by one skilled in the art that the invention is not limited to the details of the specific embodiments given herein.
Embodiments of the invention provide selective filtering of a speech signal in a VoIP device when placing a call to a mobile or fixed telephone to thereby alleviate the muffled quality of the speech reproduced at the receiver. According to embodiments, a digital filter is applied to a speech signal prior to the speech encoder inside the VoIP client.
The gateway 206 provides a connection between the packet switched network 204, as used for voice over IP telephony, and the circuit switched network 208 to allow a VoIP call originating at the VoIP client 202 to be routed to a traditional telephone 210, 212.
The destination of the VoIP call is determined in the VoIP client 202 based on an identity of a called party, allowing the call to be correctly routed over the packet switched network 204. If it is determined that the called party is a mobile or fixed telephone located in the circuit switched network 208, the encoded speech is transmitted to the gateway 206, where the speech is decoded and then transmitted over the circuit switched network 208 to the called party as a normal telephone call.
A block diagram of a VoIP device 300 for placing a call over a packet switched network 204 according to an embodiment of the invention is shown in
The signal output by the microphone 302 is sampled in an analogue to digital converter, before being received by the VoIP client 202. The sampled microphone output is coupled to an echo and noise canceller 304. The echo and noise canceller 304 has an output coupled to an input of an adaptive filter 306. The adaptive filter 306 has an output coupled to an input of the speech encoder 308. The adaptive filter 306 also receives the filtered output samples for use in adapting the filter coefficients. The speech encoder 308 outputs an encoded speech signal for transmission over the packet switched network 204.
A target response selector 310 receives information relating to call characteristics of a current call at an input, and has an output coupled to the adaptive filter 306 to provide a selected target frequency response to the adaptive filter 306.
In operation, a speech signal is captured by the microphone 302 and sampled in an analogue to digital converter (not shown), and the sampled signal is passed to the echo and noise canceller 304, which processes the captured speech signal to reduce echoes and unwanted noise components of the captured signal. The target response selector 310 determines from the call characteristics information an appropriate target frequency response, and this selected target frequency response is provided to the adaptive filter 306. The adaptive filter coefficients are then updated to match the desired target frequency response.
The target response selector 310 selects an appropriate target frequency response for a particular call scenario, based on the call characteristic information. For example, if it is determined that the call being placed is to a mobile phone, a target frequency response that emphasizes the frequency region where the second formant sits might be desirable in order to improve the speech intelligibility on the mobile side. In a further example scenario, the call characteristic may indicate that the call is a wideband call between two VoIP clients, and a target frequency response will then be chosen accordingly.
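The selection step can be sketched as a lookup from call characteristics to a target response. This is a minimal illustration only: the `CallType` values, the function names, and the 6 dB boost over 1-4 kHz are hypothetical placeholders and are not taken from the P.48 specification or from the described embodiments.

```python
from enum import Enum
from typing import Callable, Dict


class CallType(Enum):
    """Hypothetical call characteristics reported to the selector."""
    MOBILE = "mobile"
    FIXED = "fixed"
    VOIP_WIDEBAND = "voip_wideband"


def flat_response(freq_hz: float) -> float:
    """Target gain in dB with no spectral shaping."""
    return 0.0


def second_formant_emphasis(freq_hz: float) -> float:
    """Target gain in dB with an illustrative boost over roughly 1-4 kHz.

    The 6 dB figure is an assumption for the sketch, not a standard value.
    """
    return 6.0 if 1000.0 <= freq_hz <= 4000.0 else 0.0


# Map each call type to the target response the adaptive filter should aim for.
TARGETS: Dict[CallType, Callable[[float], float]] = {
    CallType.MOBILE: second_formant_emphasis,
    CallType.FIXED: second_formant_emphasis,
    CallType.VOIP_WIDEBAND: flat_response,
}


def select_target_response(call_type: CallType) -> Callable[[float], float]:
    """Return the target frequency response for the given call type."""
    return TARGETS[call_type]
```

A call to a mobile phone would then obtain the emphasized response, while a wideband call between two VoIP clients would obtain the flat one.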
A block diagram of a VoIP device 600 for placing a call via a gateway 206 over a circuit switched network 208 according to an embodiment of the invention is shown in
The VoIP device 600 is similar to that shown in
While the switch 612 is illustrated as a hardware switch, it will be understood that the switch could be implemented in software within the VoIP client 202.
A controller 610 is coupled to the switch to command the switch between the first and second positions. The controller 610 is further coupled to the filter 306 to allow control over the filter coefficients.
In operation, a speech signal is captured by the microphone 302 and sampled in an analogue to digital converter (not shown), and the sampled signal is passed to the echo and noise canceller 304, which processes the captured speech signal to reduce echoes and unwanted noise components of the captured signal. The controller 610 determines from the identity of the called party whether the receiver of the call is a mobile or fixed telephone, and if so controls the switch to the first position. With the switch in the first position, the speech signal is filtered in filter 306 before being encoded in the speech encoder 308.
If the controller 610 determines that the receiver of the call is not a telephonic device, for example the receiver may be a further VoIP client attached to the packet switched network 204, the switch is commanded to the second position, and the filter 306 is bypassed.
Thus, the filter 306 is only applied to the captured speech signal when it is determined that a call is to be connected between the VoIP client 202 and a mobile or fixed phone 210, 212. The filter 306 is not applied for a call between two VoIP clients communicating across the packet switched network 204.
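The controller and switch behaviour described above might be sketched in software as follows. The rule used to recognise a telephone identity (a string of digits with an optional leading "+") is a hypothetical assumption, as are the function names; the description does not specify how the identity is classified.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def is_telephone_identity(called_identity: str) -> bool:
    """Hypothetical rule: identities that look like phone numbers are
    treated as mobile or fixed telephones on the circuit switched side."""
    digits = called_identity.lstrip("+").replace(" ", "").replace("-", "")
    return digits.isdigit() and len(digits) >= 3


def process_block(
    samples: List[float],
    called_identity: str,
    speech_filter: Callable[[List[float]], List[float]],
    encoder: Callable[[List[float]], T],
) -> T:
    """Route samples through the filter or around it, then encode them."""
    if is_telephone_identity(called_identity):
        # Switch in the first position: filter before the speech encoder.
        samples = speech_filter(samples)
    # Otherwise the switch is in the second position and the filter is bypassed.
    return encoder(samples)
```

Passing the filter and encoder in as callables keeps the decision logic separate from the signal processing, which mirrors the way the controller only commands the switch.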
In the embodiment of
The average frequency response for the combination of all stages prior to the speech encoder 308 may be calculated based on information provided by the speech encoder 308. For example, the speech encoder may be configured to provide information for each block of encoded speech that allows the calculation of an average power/magnitude spectra for the blocks of the encoded speech signal. This average power/magnitude spectra for the blocks of the encoded speech signal can be considered to be a product of the frequency response for the stages prior to the speech encoder with an average power spectrum of speech.
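As a rough illustration of averaging power spectra over blocks, the sketch below computes a discrete Fourier transform of each block directly from its samples and averages the results. This is an assumption made for the example: the description says only that the speech encoder provides information allowing the calculation, and does not state that a DFT of the samples is used. The function names are hypothetical, and all blocks are assumed to have the same length.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(block: Sequence[float]) -> List[float]:
    """Power per frequency bin (0 .. N/2) of one block, via a direct DFT."""
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def average_power_spectrum(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-block power spectra bin by bin."""
    if not blocks:
        raise ValueError("at least one block is required")
    n_bins = len(blocks[0]) // 2 + 1
    totals = [0.0] * n_bins
    for block in blocks:
        for k, power in enumerate(power_spectrum(block)):
            totals[k] += power
    return [total / len(blocks) for total in totals]
```

The direct DFT is quadratic in the block length and is used here only to keep the sketch free of external dependencies.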
A target frequency response can then be determined as the product of an average power spectrum of voiced speech and the desired frequency response for the combination of all stages prior to the speech encoder including the filter 306, for example the power spectrum in
The filter coefficients are then adapted based on a comparison of the calculated frequency response and the target frequency response.
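The comparison step might look like the following sketch, which keeps a correction gain in dB for each frequency bin and nudges it toward the target by a fraction of the spectral error. This is a stand-in for illustration: the update rule, the step size, and the function names are assumptions, and the mapping from these per-bin corrections to actual filter coefficients is not shown.

```python
import math
from typing import List, Sequence


def to_db(power: float, floor: float = 1e-12) -> float:
    """Convert a power value to dB, with a floor to avoid log of zero."""
    return 10.0 * math.log10(max(power, floor))


def update_correction_db(
    correction_db: Sequence[float],
    measured_power: Sequence[float],
    target_power: Sequence[float],
    step: float = 0.1,
) -> List[float]:
    """Move each bin's correction gain toward the target spectrum.

    measured_power is the average power spectrum of the filtered signal;
    target_power is the desired spectrum for the same bins.
    """
    if not (len(correction_db) == len(measured_power) == len(target_power)):
        raise ValueError("all spectra must have the same number of bins")
    return [
        c + step * (to_db(t) - to_db(m))
        for c, m, t in zip(correction_db, measured_power, target_power)
    ]
```

With a small step size the correction changes slowly, so short-term variation in the speech content has limited influence on the filter.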
According to the described embodiment of the invention, the filter 306 may comprise an Infinite Impulse Response (IIR) filter, i.e. having a transfer function defined by:
where the filter coefficients a_n and b_n are subject to tuning.
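For reference, a general IIR transfer function with numerator coefficients b_n and denominator coefficients a_n is commonly written in the form below. The filter orders N and M and the sign convention of the denominator are assumptions here and may differ from those of the described embodiment.

```latex
H(z) = \frac{\sum_{n=0}^{N} b_n z^{-n}}{1 + \sum_{n=1}^{M} a_n z^{-n}}
```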
Embodiments of the invention provide for filtering of the speech signal prior to the signal being encoded, for example to give spectral emphasis to the frequency region ~1-4 kHz, the second formant frequencies, when placing a call to a mobile or fixed telephone. This filtering alleviates the muffled quality experienced when placing a call from a VoIP client using a general purpose microphone to a fixed/mobile phone, thereby improving speech intelligibility at the receiving side.
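One simple way to obtain emphasis in this region is a second-order peaking filter. The sketch below uses the widely known audio equalizer "cookbook" peaking biquad, which is a swapped-in technique chosen for illustration and is not stated to be the filter of the described embodiments. The sampling rate, centre frequency, gain, and Q used in the usage note are likewise assumed values.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def peaking_biquad(
    fs_hz: float, f0_hz: float, gain_db: float, q: float
) -> Tuple[List[float], List[float]]:
    """Return normalized (b, a) coefficients of a peaking biquad, a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def iir_filter(
    b: Sequence[float], a: Sequence[float], samples: Sequence[float]
) -> List[float]:
    """Direct form I IIR filter; assumes a[0] == 1."""
    out: List[float] = []
    for i in range(len(samples)):
        acc = sum(b[n] * samples[i - n] for n in range(len(b)) if i - n >= 0)
        acc -= sum(a[n] * out[i - n] for n in range(1, len(a)) if i - n >= 0)
        out.append(acc)
    return out


def gain_db_at(
    b: Sequence[float], a: Sequence[float], fs_hz: float, freq_hz: float
) -> float:
    """Magnitude response in dB of the filter at a single frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    num = sum(c * z_inv ** n for n, c in enumerate(b))
    den = sum(c * z_inv ** n for n, c in enumerate(a))
    return 20.0 * math.log10(abs(num / den))
```

For example, `peaking_biquad(8000.0, 2000.0, 6.0, 0.7)` gives a filter with a boost centred at 2 kHz and unity gain at DC for narrowband speech sampled at 8 kHz; a fixed design of this kind would not track differences between input devices the way an adaptive filter can.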
Advantageously, adaptation of the filter coefficients to match the average frequency response of the microphone 302, echo and noise canceller 304, and filter 306 to a desired target frequency response allows the VoIP client 202 to adapt to variations in frequency response of different input devices, and thus produce a more consistent audio quality at the receiving side.
The modules of the VoIP client 202 are implemented in software, such that each of the components 304 to 308 comprises a software module stored on one or more memory devices and executed on a processor.
Embodiments of the invention have been described in the context of the ITU-T P.48 Intermediate Reference System as one example target frequency response that is appropriate for calls having certain characteristics. However, the design is by no means limited to match the IRS specification, and it would be understood that other target frequency responses might yield even better speech intelligibility.
It will be appreciated that the above embodiments are described only by way of example. For instance, some or all of the modules of the VoIP client could be implemented in dedicated hardware units. Further, instead of a user input device like a microphone, the input speech signal could be received from some other source such as a storage device. Similarly, the echo and noise canceller 304 may be omitted, or further processing blocks may be included in the VoIP client 202. The filter 306 may be adapted to match an average frequency response for the combination of all stages prior to the speech encoder, including the further processing blocks, to the target frequency response.
The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
0921464.4 | Dec 2009 | GB | national |