The present invention relates generally to push-to-talk (PTT), or push to transmit, systems.
Emergency Response Teams (ERTs) often utilize PTT devices to facilitate their communication. PTT devices, which include two-way radios or other devices which support two-way communications, include buttons that may be engaged to transmit media, e.g., a voice signal or voice data, and disengaged to receive media. Some PTT systems facilitate floor control such that only a single end user may control the floor and send media, while all other end users associated with the system may only listen to the single end user with control of the floor.
As ERTs often operate in environments which are relatively noisy, communications utilizing PTT devices may be impeded. For example, if an end user transmits media, surrounding noise is also transmitted. The surrounding noise may include significant noise such as noise from sirens, noise associated with traffic, and noise associated with helicopters and aircraft. When the voice of an end user is transmitted along with significant noise, a receiver may not be able to determine what message the end user is trying to convey. Hence, communications using PTT devices may not be efficient in the presence of surrounding noise.
The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:
In one embodiment, a method includes obtaining a first media stream using a microphone when a PTT functionality of a PTT communications system is in a first state, and identifying a first set of characteristics associated with noise in the first media stream. The method also includes obtaining a second media stream using the microphone that includes the noise and a first sound when the PTT functionality is in a second state. A second set of characteristics associated with the first sound in the second media stream is identified, and parameters associated with a filtering arrangement are determined using the first and second sets of characteristics. Finally, the method includes applying the filtering arrangement to the second media stream to filter out the noise such that a communications stream is created.
By reducing the effect of surrounding noise on a transmission of a voice of a speaker or an end user using a push-to-talk (PTT) device by modifying either a transmitting path or a receiving path, communications using PTT devices may be enhanced. The voice characteristics of the speaker are captured when the PTT function of the PTT device is engaged, and surrounding noise characteristics are captured when the PTT function is not engaged. Both voice characteristics and noise characteristics may be captured in a media signal while the PTT function is engaged. Hence, knowledge of what the surrounding noise characteristics are when the speaker is not speaking, e.g., when the PTT function is not engaged, allows a filter to be designed to filter out the noise characteristics from the media signal such that the effect of surrounding noise may be reduced.
In one embodiment, a single microphone such as one intended to capture the voice of a speaker or an end user may be used in an intelligent, time-multiplexed manner. When a PTT function of a PTT device is engaged and the speaker speaks, the microphone captures both the voice of the speaker and surrounding noise. If the PTT function is not engaged and the speaker is not speaking, the microphone captures surrounding noise. Hence, when the PTT function is engaged, speaker voice characteristics may be collected. Surrounding noise characteristics may be collected when the PTT function is not engaged.
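The time-multiplexed use of a single microphone described above may be illustrated with a minimal sketch. The class name, method name, and frame representation below are hypothetical and are chosen only for illustration; the sketch assumes that each microphone frame arrives together with the current PTT state.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TimeMultiplexedCapture:
    """Routes frames from one microphone by PTT state (hypothetical helper)."""

    noise_frames: List[Sequence[float]] = field(default_factory=list)
    voice_frames: List[Sequence[float]] = field(default_factory=list)

    def on_frame(self, frame: Sequence[float], ptt_engaged: bool) -> None:
        # PTT engaged: the frame holds the speaker's voice plus surrounding noise.
        # PTT released: the frame is treated as surrounding noise only.
        if ptt_engaged:
            self.voice_frames.append(frame)
        else:
            self.noise_frames.append(frame)


capture = TimeMultiplexedCapture()
capture.on_frame([0.01, -0.02, 0.01], ptt_engaged=False)  # ambient noise
capture.on_frame([0.40, -0.35, 0.38], ptt_engaged=True)   # voice plus noise
```

In this sketch the microphone remains active in both states, and only the destination of each frame changes with the PTT state.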
Referring initially to
Coupled to microphone 108 is a control subsystem 112 which provides multiplexing and noise reduction. A multiplexing arrangement 116 allows microphone 108 to be used in a time-multiplexed manner, while a noise reduction arrangement 120 generates a filter that allows surrounding noise 124 to be filtered out of media streams associated with a voice of speaker 104. Multiplexing arrangement 116 may further be arranged to allow microphone 108 to remain on or active even when PTT functionality is not engaged. In general, control subsystem 112 may either be located at a core of system 100 or at an endpoint or PTT device of system 100.
At a time t1, when the PTT functionality associated with microphone 108 is engaged or is in a first state, a voice of speaker 104 as well as surrounding noise 124 may be captured by microphone 108. At a time t2, when the PTT functionality associated with microphone 108 is not engaged or is in a second state, surrounding noise 124 is still captured by microphone 108. Capturing noise 124 and/or a voice of speaker 104 in media streams is generally at least partially controlled by multiplexing arrangement 116. Multiplexing arrangement 116 facilitates the use of microphone 108 to capture the voice of speaker 104 and surrounding noise 124 when PTT functionality is engaged, and to capture surrounding noise 124 when PTT functionality is not engaged. A voice characteristics analyzer 118 cooperates with multiplexing arrangement 116 and noise reduction arrangement 120 to analyze the characteristics of the voice of speaker 104 as well as characteristics of surrounding noise 124.
Media streams may be provided to voice characteristics analyzer 118 and to noise reduction arrangement 120 such that characteristics of noise 124 and characteristics of a voice of speaker 104 may be used to generate a filter to reduce noise associated with a transmission of the voice of speaker 104 while substantially minimizing the impact to the media associated with speaker 104. In one embodiment, noise reduction arrangement 120 generates and implements a notch filter using parameters which are determined using characteristics of noise 124 and characteristics of the voice of speaker 104.
When noise reduction arrangement 220 includes a notch filter, characteristics of noise 224 that are obtained when the PTT functionality of a PTT device is not engaged may be used to substantially prevent noise 224 from being included in communications stream 232. That is, a notch filter may block out certain noise frequencies from being included in communications stream 232 such that a voice of speaker 204 is transmitted without significant corruption from noise 224.
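By way of illustration only, a notch filter of the kind described may be sketched as follows. The sketch assumes that the noise is dominated by a single tone, estimates that tone from a stream captured while the PTT functionality is not engaged, and applies a standard second-order (biquad) notch at that frequency to a stream captured while the PTT functionality is engaged. The function names, the sample rate, the quality factor q, and the test tones are assumptions made for the example; unlike the described arrangement, the sketch does not use voice characteristics when choosing the filter parameters.

```python
import math
from typing import List, Sequence, Tuple


def tone_power(samples: Sequence[float], freq: float, sample_rate: float) -> float:
    """Normalized power of `samples` at `freq` (a single-frequency DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate) for i, s in enumerate(samples))
    return (re * re + im * im) / len(samples) ** 2


def dominant_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest DFT bin, ignoring the DC bin."""
    n = len(samples)
    best = max(range(1, n // 2),
               key=lambda k: tone_power(samples, k * sample_rate / n, sample_rate))
    return best * sample_rate / n


def notch_coefficients(f0: float, sample_rate: float, q: float = 10.0) -> Tuple[tuple, tuple]:
    """Biquad notch in the common audio-EQ-cookbook form, normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def apply_biquad(samples: Sequence[float], b: tuple, a: tuple) -> List[float]:
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


RATE = 8000.0  # assumed sample rate in Hz
# PTT not engaged: a 1 kHz tone stands in for siren-like surrounding noise.
noise_only = [0.5 * math.sin(2 * math.pi * 1000 * i / RATE) for i in range(400)]
# PTT engaged: a 300 Hz tone stands in for the voice, mixed with the same noise.
engaged = [math.sin(2 * math.pi * 300 * i / RATE)
           + 0.5 * math.sin(2 * math.pi * 1000 * i / RATE) for i in range(800)]

noise_hz = dominant_frequency(noise_only, RATE)
b, a = notch_coefficients(noise_hz, RATE)
cleaned = apply_biquad(engaged, b, a)
```

A larger q narrows the stopband, which reduces the impact on nearby voice frequencies at the cost of a longer settling time after the filter is first applied.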
Noise may be filtered out of a media stream using an adaptive noise filter at an endpoint, e.g., a PTT device, or at a core processor arrangement of an overall communications system. In other words, the analysis of a media stream that includes the voice of a speaker may occur either at an endpoint of a deployment architecture or at a core of a deployment architecture. In accordance with one deployment architecture, system 220 of
In accordance with a second deployment architecture, system 220 of
With reference to
System 400 includes a plurality of endpoints 406, 408 which may be PTT devices. In one embodiment, endpoints 408, which are located in an IP network system, may be IP-based PTT devices such as a Cisco Push-to-Talk Management Center (PMC) available commercially from Cisco Systems, Inc. of San Jose, Calif. Endpoints 406, 408, however, may instead be computing systems which are in communication with PTT devices. Each endpoint 406, 408 has an associated microphone, and is arranged both to capture and to analyze media signals, e.g., media signals associated with the voice of a speaker and media signals associated with surrounding noise.
In lieu of being located at an endpoint, digital signal processing functionality may be located at the core of a centric or central architecture.
In one embodiment, central media server 550 is in communication with endpoints 506 through a local area network (LAN) or a wide area network (WAN) 580. Directory 584 is substantially attached to LAN/WAN 580, and provides a mechanism or functionality for storing voice and noise signatures of the users of system 500. As users log on to system 500, the users may retrieve their specific voice characteristics and use them to initiate the calculation of an applicable notch filter before speaking.
Endpoints 506 capture media streams, which are then communicated to central media server 552 such that digital signal processing functionality 576 may be used to determine voice and noise signatures, and to enable noise to be filtered out of media streams that include the voice of a speaker. As system 500 analyzes the media stream of the speakers, system 500 compares the voice characteristics with the characteristics stored in directory 584 and updates them accordingly.
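The retrieval and updating of stored voice characteristics may be sketched as follows. The description does not specify how stored characteristics are updated, so the sketch assumes, purely for illustration, that a signature is a list of numbers and that a newly observed signature is blended into the stored signature with a fixed weight. The class, its methods, and the user name are hypothetical.

```python
from typing import Dict, List, Optional


class SignatureDirectory:
    """In-memory stand-in for a network directory of per-user voice signatures."""

    def __init__(self, blend: float = 0.2) -> None:
        self._signatures: Dict[str, List[float]] = {}
        self._blend = blend  # weight given to the newest observation (assumed)

    def fetch(self, user: str) -> Optional[List[float]]:
        """Called at log-on: return a copy of the stored signature, if any."""
        stored = self._signatures.get(user)
        return list(stored) if stored is not None else None

    def update(self, user: str, observed: List[float]) -> None:
        """Blend a newly measured signature into the stored one."""
        stored = self._signatures.get(user)
        if stored is None or len(stored) != len(observed):
            # Nothing comparable is stored yet, so keep the observation as-is.
            self._signatures[user] = list(observed)
            return
        self._signatures[user] = [
            (1 - self._blend) * old + self._blend * new
            for old, new in zip(stored, observed)
        ]


directory = SignatureDirectory(blend=0.5)
directory.update("alice", [1.0, 0.0])
directory.update("alice", [0.0, 1.0])
starting_point = directory.fetch("alice")  # used to seed the filter calculation
```

A signature fetched at log-on serves only as a starting point; later observations continue to refine the stored values.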
With reference to
In step 609, a determination is made as to whether the PTT function of the PTT device is engaged, e.g., it is determined if floor control has been granted to a speaker associated with the PTT device who wishes to speak into the PTT device. If it is determined that the PTT function is engaged, the indication is that voice characteristics of the speaker are to be captured. Accordingly, process flow moves to step 613 in which speaker voice characteristics and surrounding noise are captured using a microphone of the PTT device. The media stream that is captured by the microphone generally includes the speech or voice characteristics of the speaker including, but not limited to including, frequency and power, as corrupted by noise. The combined voice and noise characteristics may be stored either on the PTT device or in a central mixing facility.
The output voice stream, or the voice stream that is to be transmitted by the PTT device, is adjusted based on previously captured noise characteristics in step 617. In other words, noise is filtered out of the captured media stream using information relating to known noise characteristics. One method of adjusting the output voice stream will be discussed below with reference to
Returning to step 609, if it is determined that the PTT function is not engaged, noise characteristics are captured through the microphone of the PTT device in step 625. The noise characteristics, which may include but are not limited to including frequency and power, relate to the surrounding or ambient noise at the location at which the PTT device is being used. In general, once the noise characteristics are obtained, the noise characteristics may be stored. Methods for capturing noise characteristics will be discussed below with reference to
Once noise characteristics are captured, it is determined in step 629 whether the user has logged out. If it is determined that the user has logged out, the process of utilizing a PTT device is completed. Alternatively, if the determination is that the user has not logged out, process flow returns to step 609 in which it is determined if the PTT functionality of the PTT device is engaged.
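The process flow of steps 609 through 629 may be sketched as a simple loop. In the sketch, the noise characteristic is reduced to an RMS level and the filter is replaced by a crude gate; both are stand-ins chosen for brevity and are not the notch filter of the described embodiments. The event tuple layout and all function names are hypothetical, and the sketch assumes that process flow returns to step 609 after step 617.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Frame = Sequence[float]
Event = Tuple[Frame, bool, bool]  # (microphone frame, PTT engaged?, user logged out?)


def noise_level(frame: Frame) -> float:
    """Stand-in noise characteristic: RMS level of a noise-only frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate(frame: Frame, level: float) -> List[float]:
    """Stand-in for the filter: zero samples at or below the stored noise level."""
    return [s if abs(s) > level else 0.0 for s in frame]


def run_session(events: Iterable[Event]) -> List[List[float]]:
    stored_noise: Optional[float] = None
    sent: List[List[float]] = []
    for frame, ptt_engaged, logged_out in events:
        if ptt_engaged:  # step 609
            # Steps 613 and 617: capture voice plus noise, then adjust the
            # output stream using previously captured noise characteristics.
            if stored_noise is not None:
                sent.append(gate(frame, stored_noise))
            else:
                sent.append(list(frame))
        else:
            stored_noise = noise_level(frame)  # step 625
            if logged_out:  # step 629
                break
    return sent


events = [
    ([0.1, -0.1, 0.1, -0.1], False, False),  # PTT released: noise is captured
    ([0.9, 0.05, -0.8, 0.02], True, False),  # PTT engaged: voice plus noise
    ([0.1, -0.1], False, True),              # PTT released, user logs out
    ([0.9, 0.9], True, False),               # never processed
]
sent = run_session(events)
```

The log-out check appears only in the branch taken when the PTT function is not engaged, which mirrors the placement of step 629 after step 625.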
Referring next to
After the characteristics of the combined speaker voice and surrounding noise are obtained and stored, noise characteristics are obtained in step 709, e.g., during time interval 244b of
As mentioned above with respect to
After the packets are collected from the media stream associated with surrounding noise, the candidate packets are correlated to captured packets associated with speaker voice characteristics in step 833. In other words, the candidate packets collected when the PTT functionality is released are compared to packets that were collected when the PTT functionality was previously engaged. Any suitable method may be employed to correlate the candidate packets with the captured packets associated with speaker voice characteristics.
A determination is made in step 837 as to whether the parameters of the candidate packets and the parameters of the captured packets associated with speaker voice characteristics exhibit common characteristics. For example, the system may determine if the two media streams possess overlapping frequency spectrums and identify frequency components which exist substantially only in the media stream received when the PTT function is engaged.
If it is determined that the parameters collected during the time interval in which the PTT is engaged and during the time interval in which the PTT is not engaged are similar, the implication is that the candidate packets likely contain the speaker voice and may not be used as surrounding noise packets. In one example embodiment, if the system cannot identify a frequency spectrum which is unique to the media stream which is received when the PTT function is engaged, the system concludes that both media streams contain the speaker's voice. As such, in step 841, the candidate packets are discarded, and it is determined in step 849 whether PTT functionality is engaged. If it is determined that PTT functionality is engaged, the process of capturing noise characteristics is completed. Alternatively, if PTT functionality is determined not to be engaged, the indication is that a speaker is not speaking and that candidate packets may include noise characteristics. As such, process flow moves from step 849 to step 825 in which packets collected from a media stream are marked as candidates for surrounding noise packets.
Alternatively, if it is determined in step 837 that the overlap between the parameters is not relatively high, then the indication is that the candidate packets are suitable for use as surrounding noise packets. Therefore, process flow moves from step 837 to step 845 in which the candidate packets are analyzed for determining the noise characteristics and creating an appropriate filter to notch out the surrounding noise that is present in packets that include speaker voice characteristics.
Once the candidate packets are analyzed for noise packets and noise characteristics are extracted, it is determined in step 849 whether PTT functionality is engaged. It should be appreciated that if PTT functionality is engaged, then candidate packets are not collected, as the packets collected while PTT functionality is engaged are packets that include the voice of a speaker. If the determination is that PTT functionality is not engaged, process flow returns to step 825 in which collected packets are marked. Alternatively, if it is determined that PTT functionality is engaged, the process of capturing noise characteristics is completed.
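The correlation and decision of steps 833 through 845 may be sketched as follows. Because any suitable correlation method may be employed, the sketch assumes one simple possibility: each stream is reduced to the set of DFT bins that carry significant power, and a candidate is accepted only if the stream captured while the PTT functionality is engaged contains at least one bin that is absent from the candidate. The function names, the threshold, and the test tones are assumptions, and the loop back through step 849 is omitted.

```python
import math
from typing import Optional, Sequence, Set


def active_bins(samples: Sequence[float], threshold: float = 0.1) -> Set[int]:
    """DFT bins whose power is at least `threshold` times the strongest bin."""
    n = len(samples)
    powers = {}
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        powers[k] = re * re + im * im
    peak = max(powers.values())
    if peak == 0.0:
        return set()  # a silent stream has no active bins
    return {k for k, p in powers.items() if p >= threshold * peak}


def vet_candidate(candidate: Sequence[float], engaged: Sequence[float]) -> Optional[Set[int]]:
    """Return the candidate's noise bins, or None if the candidate is discarded."""
    candidate_bins = active_bins(candidate)
    unique_to_engaged = active_bins(engaged) - candidate_bins  # step 837
    if not unique_to_engaged:
        return None  # step 841: the candidate likely contains the voice
    return candidate_bins  # step 845: usable as noise characteristics


N = 64


def tone(k: int, amplitude: float = 1.0):
    """A sinusoid occupying exactly DFT bin k of an N-sample frame."""
    return [amplitude * math.sin(2 * math.pi * k * i / N) for i in range(N)]


voice, noise = tone(3), tone(8)  # stand-ins for speaker voice and surrounding noise
engaged = [v + w for v, w in zip(voice, noise)]
leaky = [0.5 * (v + w) for v, w in zip(voice, noise)]  # candidate that contains voice

accepted = vet_candidate(noise, engaged)
rejected = vet_candidate(leaky, engaged)
```

In this sketch the accepted candidate yields the bins at which a filter would be placed, while the rejected candidate contributes nothing to the noise characteristics.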
Although only a few embodiments of the present invention have been described, it should be understood that the present invention may be embodied in many other specific forms without departing from the spirit or the scope of the present invention. By way of example, the voice characteristics of each speaker or end user who may use a PTT device associated with a system may be stored either at an endpoint or end device, or at a directory which is attached to the network. If voice characteristics of a speaker are stored, when the speaker joins a virtual talk group (VTG) using a PTT device, the system may download the stored voice characteristics for use as a starting point for determining parameters of an adaptive filter for use in notching out noise from a media stream that carries the voice or the speech of the speaker and the surrounding noise. In one embodiment, voice characteristics may be stored at an endpoint. However, voice characteristics may also be stored in a central directory of the system attached to the network.
A filter that may be created to filter out noise from a media stream that carries the speech of a speaker or end user has been described as being a notch filter. Other filters may be implemented for use in filtering out noise. For instance, substantially any band-stop or band-rejection filter with a relatively narrow stopband may be implemented in lieu of a notch filter.
In general, a PTT device may include a hardware or soft button or similar mechanism that is pushed to engage PTT functionality and released to disengage PTT functionality. That is, a PTT device may include a button that is pushed by a speaker when he or she wishes to speak, and is released by the speaker when he or she does not wish to speak. It should be appreciated, however, that a variety of different methods may be used to engage and to disengage PTT functionality.
The present invention has generally been described as being deployed on either an endpoint or a core of a central media server. The invention, however, is not limited to being used in such deployment architectures. By way of example, the present invention may be implemented as a hybrid deployment architecture wherein some services of the system are located at the endpoint while others are located at the central media server without departing from the spirit or the scope of the present invention. Further, it should be understood that in other embodiments, the noise reduction components may reside in the receiving endpoints or may be distributed among any combination of a transmitting endpoint, a receiving endpoint, and a component attached to a LAN/WAN network.
PTT devices or endpoints may be widely varied. In other words, devices which support PTT functionality may be widely varied. For example, PTT devices may include, but are not limited to, land mobile radios, walkie-talkie devices, and a PTT Management Center (PMC) client available commercially from Cisco Systems, Inc.
The steps associated with the methods of the present invention may vary widely. Steps may be added, removed, altered, combined, and reordered without departing from the spirit or the scope of the present invention. Therefore, the present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope of the appended claims.