An architecture 100 according to a first embodiment is depicted in
The gateway 116 can be any suitable device for controlling ingress to and egress from the corresponding LAN. The gateway is positioned logically between the other components in the corresponding enterprise premises 108 and the network 112 to process communications passing between the server 120 and internal communication device 128 on the one hand and the network 112 on the other. The gateway 116 typically includes an electronic repeater functionality that intercepts and steers electrical signals from the network 112 to the corresponding LAN 124 and vice versa and provides code and protocol conversion. When processing voice communications, the gateway 116 further performs a number of VoIP functions, particularly silence suppression and jitter buffer processing. The gateway 116 therefore includes a Voice Activity Detector 132 to perform VAD and SAD and a comfort noise generator (not shown) to generate comfort noise during periods of silence. Comfort noise is synthetic background noise that prevents the listener from mistaking the periods of absolute silence produced by silence suppression for a disconnected communication channel. Examples of suitable gateways include modified versions of Avaya Inc.'s G700, G650, G350, Crossfire, and MCC/SCC media gateways and Acme Packet's Net-Net 4000 Session Border Controller.
The server 120 processes call control signaling, such as incoming Voice Over IP or VoIP and telephone call set up and tear down messages. The term “server”, as used herein, should be understood to include an ACD, a Private Branch Exchange or PBX (or Private Automatic Exchange or PAX), an enterprise switch, an enterprise server, or other type of telecommunications system switch or server, as well as other types of processor-based communication control devices such as media servers, computers, adjuncts, etc. Illustratively, the server of
The external and internal communication devices 104 and 128 are preferably packet-switched stations or communication devices, such as IP hardphones (e.g., Avaya Inc.'s 4600 Series IP Phones™), IP softphones (e.g., Avaya Inc.'s IP Softphone™), Personal Digital Assistants or PDAs, Personal Computers or PCs, laptops, packet-based H.320 video phones and conferencing units, packet-based voice messaging and response units, peer-to-peer based communication devices, and packet-based traditional computer telephony adjuncts. Examples of suitable devices are the 4610™, 4621SW™, and 9620™ IP telephones of Avaya, Inc.
The voice activity detector 132, as can be seen from
The detector 132 exploits the periodicity of a fixed signal by detecting peaks and troughs (i.e., turning points). In addition to time-based periodicity, the detector 132 uses amplitude-based periodicity. It relies on the detection of regular patterns within the signal. The detector 132 can be efficient, as it does not require significant signal processing resources to detect a fixed power signal.
A buffer 136 of n audio samples is stored. The number of samples is typically the same as the number of audio samples contained in a packet (or frame) to be transmitted to the destination communication device. The value of n is frequently 80, as this represents 10 milliseconds of voice sampled at 8 kHz. The detector 132 iterates over this buffer 136, one sample at a time, and records selected characteristics of the sampled portion of the signal. In particular, the high and low points of the signal (e.g., peaks and troughs) are recorded. This information, when combined with the previous history of the recorded signal features, provides a condensed history of the signal's pattern.
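The frame-size arithmetic above can be illustrated with a minimal sketch. The function names `samples_per_frame` and `split_into_frames` are hypothetical, and dropping a trailing partial frame is an assumption made only for this illustration.

```python
from typing import Iterator, List, Sequence


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples n in one frame, e.g. 8000 Hz * 10 ms = 80."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples: Sequence[int], n: int) -> Iterator[List[int]]:
    """Yield consecutive buffers of exactly n samples.

    Assumption for this sketch: a trailing partial buffer is dropped.
    """
    for start in range(0, len(samples) - n + 1, n):
        yield list(samples[start:start + n])


n = samples_per_frame(8000, 10)
buffers = list(split_into_frames(list(range(200)), n))
```

Here 200 input samples yield two full buffers of 80 samples each, and the remaining 40 samples are left for the next call in a real stream.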
Following this, a post-processing step searches the gathered information for a pattern (or template). This is typically done by searching for repetitions. For example, with a dual frequency signal, the detector 132 searches for a signal pattern having two distinct peaks and two distinct troughs and, for a single frequency signal, for a signal pattern having only one peak and only one trough. When the values do not fit the selected pattern, the sampled signal is deemed to be a more random signal and is rejected by the algorithm. Account can be taken of the noise floor waveform and any possible interference by establishing a range within which two values are considered to be similar. This allows the algorithm to execute in the presence of background noise.
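One way to picture the similarity range is to group recorded peak and trough amplitudes that lie within a tolerance of one another and then count the distinct groups. The sketch below is illustrative only; the names `distinct_levels` and `classify_pattern`, the greedy grouping strategy, and the tolerance value are assumptions and are not taken from the described embodiment.

```python
from typing import List, Sequence


def distinct_levels(values: Sequence[float], tolerance: float) -> List[List[float]]:
    """Greedily group values lying within `tolerance` of a group's first member."""
    levels: List[List[float]] = []
    for v in values:
        for level in levels:
            if abs(v - level[0]) <= tolerance:
                level.append(v)
                break
        else:
            # No existing group is close enough, so start a new one.
            levels.append([v])
    return levels


def classify_pattern(peaks: Sequence[float], troughs: Sequence[float],
                     tolerance: float) -> str:
    """Return 'single', 'dual', or 'rejected' from the count of distinct levels."""
    n_peaks = len(distinct_levels(peaks, tolerance))
    n_troughs = len(distinct_levels(troughs, tolerance))
    if n_peaks == 1 and n_troughs == 1:
        return "single"    # one distinct peak and one distinct trough
    if n_peaks == 2 and n_troughs == 2:
        return "dual"      # two distinct peaks and two distinct troughs
    return "rejected"      # values do not fit either pattern


single = classify_pattern([100, 102, 99], [-100, -98, -101], tolerance=5)
dual = classify_pattern([100, 40, 101, 42], [-100, -41, -99, -40], tolerance=5)
noisy = classify_pattern([100, 10, 55, 80], [-3, -70, -20, -45], tolerance=5)
```

Widening the tolerance makes the check more forgiving of background noise, at the cost of accepting more signals that are not truly fixed.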
An example of the recorded data structures generated during processing of the samples in the buffer 136 is shown in
The resulting recorded data is then examined for the occurrence of a fixed pattern within the signal itself based on the periodicity of the turning points and the amplitude at those points. The fixed pattern within the signal may be identified by comparing the data to one or more templates typical of different types of progress tones, such as intercept tones, ringback tones, busy tones, dial tones, reorder tones, and the like, to determine whether the analyzed sampled signal segment is a fixed signal. As noted, the pattern searched for in a dual frequency signal has first and second sets of distinct peaks and first and second sets of distinct troughs arranged in alternating fashion. The pattern searched for in a single frequency signal has a set of peaks and a set of troughs arranged in alternating fashion. Most progress tones are single frequency signals. The pattern is defined using not only the temporal periodicity of the turning points but also the signal amplitude at the turning points. A probability may be used to determine how well the segment fits the pattern. Segments with probabilities below a specified threshold are not deemed to be fixed signals, while segments with probabilities at or above the specified threshold are deemed to be fixed signals. As can be seen from the data structures in
As will be appreciated, any suitable pattern matching algorithm may be used to post process. Such algorithms generally check for the presence of the constituents of a given pattern.
An example of a relatively simple algorithm is to construct first and second arrays describing a sampled audio signal segment. The first array comprises the number of instances of selected temporal distances between turning points. For example, the array would contain a number of instances for each of the selected temporal distances of 1, 2, 3, 4, . . . . The second array comprises the number of instances of a number of selected amplitude ranges at turning points. For example, the array would contain a number of instances for each of the amplitude ranges A-B, B-C, C-D, . . . , where A, B, C, D, . . . are amplitude values. The resulting instances in each array column could then be compared to specified templates for temporal and amplitude periodicity to determine if the signal segment is likely a fixed signal segment. The templates may be, for example, a maximum permissible distribution of the instances among differing array columns. If the instances are too widely distributed, the comparison would indicate that the signal segment is variable while a tighter distribution indicates that the signal segment is fixed. The template match probabilities from the comparisons to the first and second arrays can then be weighted to arrive at a combined probability that the signal segment is characteristic of a fixed or variable signal.
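The two-array approach might be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the amplitude column width, the use of "share of instances in the k most populated columns" as the template comparison, and the equal weights are all hypothetical choices, and the returned value is a score in the range 0 to 1 rather than a calibrated probability.

```python
from collections import Counter
from typing import Sequence


def top_k_share(values: Sequence[int], bin_width: int, k: int) -> float:
    """Share of instances that fall in the k most populated array columns."""
    if not values:
        return 0.0
    # Each column counts the instances whose value falls in one bin.
    counts = Counter(v // bin_width for v in values)
    return sum(c for _, c in counts.most_common(k)) / len(values)


def fixed_signal_score(distances: Sequence[int], amplitudes: Sequence[int],
                       amp_bin: int = 50, k_dist: int = 2, k_amp: int = 4,
                       w_time: float = 0.5, w_amp: float = 0.5) -> float:
    """Weighted combination of the temporal and amplitude template matches."""
    p_time = top_k_share(distances, 1, k_dist)       # first array
    p_amp = top_k_share(amplitudes, amp_bin, k_amp)  # second array
    return w_time * p_time + w_amp * p_amp


# A steady tone: identical distances, amplitudes alternating between two values.
tone = fixed_signal_score([4] * 6, [1000, -1000] * 3 + [1000])
# A more random segment: instances spread across many columns.
speech = fixed_signal_score([1, 7, 3, 12, 5, 9, 2, 15],
                            [10, -400, 930, -75, 260, -810, 555, -1320])
```

A tight distribution drives the score toward 1, while widely scattered instances drive it down. Amplitudes that straddle a column boundary can split across two columns, which is one reason the sketch allows several columns per template.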
This analytical approach is further shown in
The operation of the detector 132 will now be described with reference to
In step 600, a frame comprising n audio signal samples is received. The samples in the frame are generated when the received analog audio signal is converted to digital form. The following steps are performed sample-by-sample and frame-by-frame. As noted, a packet will commonly contain one frame of 80 samples.
In step 604, a next sample is selected for analysis.
In step 608, the trend indicated by the selected sample is determined. As noted, the trend is typically determined by comparing the amplitude of the selected sample with the amplitude of the prior sample. If the amplitude is increasing, the trend is positive, and, if the amplitude is decreasing, the trend is negative.
In decision diamond 612, it is determined whether the sample includes a turning point. When a trend changes from positive in the prior sample to negative in the selected sample or from negative in the prior sample to positive in the selected sample, the selected sample is deemed to include a turning point.
When the selected sample includes a turning point, the temporal distance to the prior turning point is determined in step 616. This is done by counting the number of samples between the selected sample and the most recent (prior) sample containing a turning point.
In step 620, the sample identifier, a turning point indicator, a temporal distance from the turning point in the selected sample to the prior turning point, and an amplitude of the current turning point are saved.
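Steps 604 through 620 can be sketched as a single pass over the samples. The record type `TurningPoint` and the function `record_turning_points` are hypothetical names. As an assumption made for this sketch, the turning point is attributed to the prior sample, where the extreme value sits, and flat runs (equal consecutive amplitudes) leave the trend unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TurningPoint:
    sample_id: int           # index of the sample holding the extreme value
    distance: Optional[int]  # samples since the prior turning point, if any
    amplitude: int           # amplitude at the turning point
    kind: str                # "peak" or "trough"


def record_turning_points(samples: Sequence[int]) -> List[TurningPoint]:
    records: List[TurningPoint] = []
    prev_trend = 0                 # +1 rising, -1 falling, 0 unknown
    last_tp: Optional[int] = None  # sample index of the prior turning point
    for i in range(1, len(samples)):
        diff = samples[i] - samples[i - 1]
        trend = (diff > 0) - (diff < 0)   # step 608: trend of this sample
        if trend == 0:
            continue                      # flat: keep the previous trend
        if prev_trend != 0 and trend != prev_trend:   # diamond 612
            tp = i - 1
            distance = None if last_tp is None else tp - last_tp  # step 616
            kind = "peak" if prev_trend > 0 else "trough"
            records.append(TurningPoint(tp, distance, samples[tp], kind))  # step 620
            last_tp = tp
        prev_trend = trend
    return records


wave = [0, 2, 4, 2, 0, -2, -4, -2, 0, 2, 4, 2, 0]
points = record_turning_points(wave)
```

For this triangle wave the sketch records a peak at sample 2, a trough at sample 6, and a peak at sample 10, with a temporal distance of 4 samples between successive turning points.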
When the selected sample does not include a turning point or after step 620, it is determined, in decision diamond 624, whether there is a next sample. If so, the detector returns to step 604. If not, the detector, in decision diamond 628, determines whether the recorded data defines a pattern. When the recorded data likely defines a pattern, the detector, in step 632, concludes that the audio samples in the selected packet are not silence and overrides any contrary determination made by another technique, such as by using the noise floor waveform. When the recorded data likely does not define a pattern, the detector, in step 636, concludes that the audio samples in the selected packet are not a fixed signal. Therefore, no change is made to the result determined by another technique.
Depending on the contents of the frame, it is either discarded as silence or packetized and transmitted to the destination endpoint as an active signal.
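The override logic of steps 628 through 636 reduces to a small decision function. In the sketch below, `classify_frame`, its parameters, and the default threshold of 0.8 are hypothetical; the source does not specify a threshold value.

```python
def classify_frame(other_says_silence: bool, pattern_score: float,
                   threshold: float = 0.8) -> str:
    """Return 'active' (packetize and transmit) or 'silence' (discard)."""
    if pattern_score >= threshold:
        # Step 632: a fixed pattern was found, so the frame is not silence,
        # regardless of what the other technique concluded.
        return "active"
    # Step 636: no fixed pattern, so the other technique's result stands.
    return "silence" if other_says_silence else "active"


overridden = classify_frame(other_says_silence=True, pattern_score=0.9)
unchanged = classify_frame(other_says_silence=True, pattern_score=0.2)
```

The pattern check can only promote a frame from silence to active; it never demotes a frame that the other technique already considers active.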
A number of variations and modifications of the invention can be used. It would be possible to provide for some features of the invention without providing others.
For example, in one alternative embodiment, the present invention is used for non-VoIP applications, such as speech coding and automatic speech recognition.
In yet another embodiment, dedicated hardware implementations including, but not limited to, Application Specific Integrated Circuits or ASICs, programmable logic arrays, and other hardware devices can likewise be constructed to implement the methods described herein. Furthermore, alternative software implementations including, but not limited to, distributed processing or component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the methods described herein.
It should also be stated that the software implementations of the present invention are optionally stored on a tangible storage medium, such as a magnetic medium like a disk or tape, a magneto-optical or optical medium like a disk, or a solid state medium like a memory card or other package that houses one or more read-only (non-volatile) memories. A digital file attachment to e-mail or other self-contained information archive or set of archives is considered a distribution medium equivalent to a tangible storage medium. Accordingly, the invention is considered to include a tangible storage medium or distribution medium and prior art-recognized equivalents and successor media, in which the software implementations of the present invention are stored.
Although the present invention describes components and functions implemented in the embodiments with reference to particular standards and protocols, the invention is not limited to such standards and protocols. Other similar standards and protocols not mentioned herein are in existence and are considered to be included in the present invention. Moreover, the standards and protocols mentioned herein and other similar standards and protocols not mentioned herein are periodically superseded by faster or more effective equivalents having essentially the same functions. Such replacement standards and protocols having the same functions are considered equivalents included in the present invention.
The present invention, in various embodiments, includes components, methods, processes, systems and/or apparatus substantially as depicted and described herein, including various embodiments, subcombinations, and subsets thereof. Those of skill in the art will understand how to make and use the present invention after understanding the present disclosure. The present invention, in various embodiments, includes providing devices and processes in the absence of items not depicted and/or described herein or in various embodiments hereof, including in the absence of such items as may have been used in previous devices or processes, e.g., for improving performance, achieving ease and/or reducing cost of implementation.
The foregoing discussion of the invention has been presented for purposes of illustration and description. The foregoing is not intended to limit the invention to the form or forms disclosed herein. In the foregoing Detailed Description for example, various features of the invention are grouped together in one or more embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the following claims are hereby incorporated into this Detailed Description, with each claim standing on its own as a separate preferred embodiment of the invention.
Moreover, though the description of the invention has included description of one or more embodiments and certain variations and modifications, other variations and modifications are within the scope of the invention, e.g., as may be within the skill and knowledge of those in the art, after understanding the present disclosure. It is intended to obtain rights which include alternative embodiments to the extent permitted, including alternate, interchangeable and/or equivalent structures, functions, ranges or steps to those claimed, whether or not such alternate, interchangeable and/or equivalent structures, functions, ranges or steps are disclosed herein, and without intending to publicly dedicate any patentable subject matter.