This invention relates generally to wireless communication devices, and particularly to a wireless device, such as a headset, utilized for speech recognition and other speech applications.
Wireless communication devices are used for a variety of different functions and to provide a communication platform for a user. One particular wireless communication device is a headset. Generally, headsets incorporate speakers that convey audio signals for the wearer to hear and microphones that capture speech from the wearer. Such audio and speech signals are generally converted to electrical signals and processed to be wirelessly transmitted or received.
Wireless headsets have become somewhat commonplace. Wireless headsets are generally wirelessly coupled with other devices such as cell phones, computers, stereos, and other devices that process audio signals. In use, a wireless headset may be coupled with other equipment utilizing various RF communication protocols, such as the IEEE 802.11 standard for wireless communication. Other wireless communication protocols have been more recently developed, such as the Bluetooth protocol.
Bluetooth is a low-cost, low-power, short-range radio technology designed specifically as a cable replacement to connect devices, such as headsets, mobile phone handsets, and computers or other terminal equipment. One particular use of the Bluetooth protocol is to provide a communication protocol between a mobile phone handset and an earpiece or headpiece. The Bluetooth protocol is a well-known protocol understood by a person of ordinary skill in the art, and thus all of its particulars are not set forth herein.
While wireless headsets are utilized for wireless telephone communications, their use is also desirable for other voice or audio applications. For example, wireless headsets may play a particular role in speech recognition technology. U.S. patent application Ser. No. 10/671,140, entitled “Wireless Headset for Use in a Speech Recognition Environment,” and filed on Sep. 25, 2003, sets forth one possible use for a wireless headset, and that application is incorporated herein by reference in its entirety. Speech recognition applications demand a high-quality speech or audio signal, and thus a significantly robust communication protocol. While Bluetooth provides an effective means for transmission of voice for typical telephony applications, the current Bluetooth standard has limitations that make it significantly less effective for speech recognition applications and systems.
For example, the most frequently used standard for representing voice or speech data in the telephony industry utilizes 8-bit data digitized at an 8,000 Hz sample rate. This communication standard generally evolved from the early days of analog telephony, when it was generally accepted that a frequency range of 250 Hz to 4,000 Hz was adequate for voice communication over a telephone. More recent digital voice protocol standards, including the Bluetooth protocol, have built upon this legacy. In order to achieve an upper bandwidth limit of 4,000 Hz, a minimum sample rate of at least twice that, or 8,000 Hz, is required. To minimize link bandwidth, voice samples are encoded as 8 bits per sample and employ a non-linear transfer function to provide increased dynamic range on the order of 64-72 dB. The Bluetooth standard generally supports the most common telephony encoding schemes. At the physical layer, the Bluetooth protocol uses a “synchronous connection oriented” (SCO) link to transfer voice data. An SCO link sends data at fixed, periodic intervals. The data rate of an SCO link is fixed at 64,000 bits per second (64 Kbps). Voice packets transmitted over an SCO link do not employ flow control and are not retransmitted. Therefore, some packets are dropped during normal operation, resulting in the loss of portions of the audio signal.
For most human-to-human communication applications, such as telephony applications, the current Bluetooth voice sampling and encoding techniques using SCO links and voice packets are adequate. Generally, humans have the ability to subconsciously use reasoning, context, and other clues to mentally reconstruct the original speech over a more lossy communication medium. Furthermore, where necessary, additional mechanisms, such as the phonetic alphabet, can be employed to ensure the reliability of the information transferred (e.g., “Z” as in Zulu).
However, for human-to-machine communication, such as speech recognition systems, significantly better speech sampling and encoding performance is necessary. First, a more reliable data link is necessary, because dropped voice packets in the typical telephony Bluetooth protocol can significantly reduce the performance of a speech recognition system. For example, each dropped Bluetooth SCO packet can result in a loss of 3.75 milliseconds of speech. This can drastically increase the probability of a speech recognition error.
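As an illustrative check (a sketch, not part of the original disclosure), the 3.75 ms figure follows from the fixed 64 Kbps SCO rate together with an assumed 30-byte voice payload per SCO (HV3) packet, a figure taken from the Bluetooth specification rather than from this text:

```python
# Illustrative arithmetic for the speech lost per dropped SCO packet.
# The 30-byte HV3 voice payload is an assumption from the Bluetooth
# specification; the text above states only the resulting 3.75 ms.
SCO_RATE_BPS = 64_000        # fixed SCO link data rate
HV3_PAYLOAD_BYTES = 30       # assumed voice payload per packet

loss_ms = HV3_PAYLOAD_BYTES * 8 / SCO_RATE_BPS * 1_000
print(f"speech lost per dropped packet: {loss_ms} ms")  # 3.75 ms
```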
Additionally, the information-bearing frequency range of speech is now understood to be in the range of 250 Hz to 6,000 Hz, with additional less critical content available up to 10,000 Hz. The intelligibility of consonants has been shown to diminish when the higher frequencies are filtered out of the speech signal. Therefore, it is important to preserve this high end of the spectrum.
However, increasing the sample rate of the audio signal to 12,000 Hz, while still maintaining 8-bit encoding, exceeds the capability of the Bluetooth SCO link, because such an encoding scheme would require a data rate of 96 Kbps, which is above the 64 Kbps Bluetooth SCO rate.
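The arithmetic may be illustrated with the following sketch, which simply compares raw PCM bit rates against the fixed SCO rate:

```python
# Raw PCM bit-rate check against the fixed 64 Kbps Bluetooth SCO rate.
SCO_RATE_BPS = 64_000

def pcm_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a linear PCM audio stream."""
    return sample_rate_hz * bits_per_sample

for rate_hz, bits in [(8_000, 8), (12_000, 8)]:
    bps = pcm_rate_bps(rate_hz, bits)
    verdict = "fits SCO" if bps <= SCO_RATE_BPS else "exceeds SCO"
    print(f"{rate_hz} Hz x {bits} bits = {bps} bps -> {verdict}")
# 8000 Hz x 8 bits  = 64000 bps -> fits SCO exactly
# 12000 Hz x 8 bits = 96000 bps -> exceeds SCO
```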
Speech samples digitized as 8-bit data also contain a high degree of quantization error, which has the effect of reducing the signal-to-noise ratio (SNR) of the data fed to the recognition system. Speech signals also exhibit a variable dynamic range across different phonemes and different frequencies. In the frequency ranges where dynamic range is decreased, the effect of quantization error is proportionally increased. A speech system with 8-bit resolution can have up to 20 dB of additional quantization error in certain frequency ranges for the “unvoiced” components of the speech signal. Most speech systems reduce the effect of quantization error by increasing the sample size to a minimum of 12 bits per sample. Thus, the current Bluetooth voice protocol for telephony is not adequate for speech applications such as speech recognition.
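For illustration, the textbook rule of thumb SNR ≈ 6.02·N + 1.76 dB for an N-bit linear quantizer (a standard result, not taken from this disclosure) shows the benefit of moving from 8 to 12 bits:

```python
# Rule-of-thumb quantization SNR of an N-bit linear quantizer,
# SNR ~ 6.02*N + 1.76 dB (standard result, not from the text above).
def quantization_snr_db(bits: int) -> float:
    return 6.02 * bits + 1.76

print(quantization_snr_db(8))   # ~49.9 dB for 8-bit samples
print(quantization_snr_db(12))  # ~74.0 dB for 12-bit samples
# The ~24 dB improvement illustrates why speech systems prefer
# at least 12 bits per sample, as noted above.
```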
Therefore, there is a need for an improved wireless device for use in speech and voice applications. There is particularly a need for a wireless headset device that is suitable for use in speech recognition applications and systems. Still further, it would be desirable to incorporate a Bluetooth protocol in a wireless headset suitable for use with speech recognition systems.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with a general description of the invention given above, and the detailed description of the embodiments given below, serve to explain the principles of the invention.
The present invention addresses the above-referenced issues and noted drawbacks in the prior art by providing a wireless device that is useful for speech applications and, particularly, useful for speech recognition applications that require higher quality speech signals for proper performance. To that end, rather than relying upon the voice sampling and coding techniques used for human-to-human communication, such as over the telephone, the present invention utilizes correlation processing and represents the spectral characteristics of an audio or speech signal in the form of data.
Particularly, an autocorrelation component of the invention generates a set of coefficients for successive portions or frames of a digitized audio signal. The coefficients are reflective of spectral characteristics of the audio signal portions, which are represented by multiple successive frames. The sets of coefficients reflective of audio signal frames are transmitted as data packets in a wireless format. Although various wireless transmission protocols might be used, one particular embodiment utilizes a Bluetooth transceiver and a Bluetooth protocol. However, rather than utilizing standard Bluetooth voice processing and voice packets, the present invention transmits the sets of coefficients as data, utilizing data packets in the Bluetooth protocol. Other wireless transmission schemes may likewise utilize their data transmission parameters, in accordance with the principles of the present invention, as opposed to the voice parameters generally utilized to transmit voice for human-to-human communication. In one particular aspect, the Bluetooth transceiver utilizes an asynchronous connection-less (ACL) link for transmitting the coefficients as data.
Therefore, the present invention overcomes the inherent limitations of Bluetooth and other wireless communication methods for use with speech recognition by providing the desired link reliability between the devices, by providing high dynamic range and lower quantization error in the coding, and by requiring less link bandwidth than current methods, while avoiding additional computational complexity in the speech recognition system.
In one particular embodiment of the invention, it is incorporated into a wireless headset device worn by a user, and the data is transceived as data packets utilizing a Bluetooth protocol. Therefore, in the example discussed herein, a Bluetooth-enabled headset device is described. However, it should be understood that this is only one particular device and one particular data-transceiving protocol that might be utilized. Other such devices and transceiving protocols could also be used in accordance with aspects of the present invention. Therefore, the invention is not limited only to Bluetooth headsets.
Referring again to the figures, one particular speech application for the wireless headset device 12 or other inventive wireless device is a speech recognition application, wherein the speech generated by user 10 is analyzed and processed for performing multiple tasks. For example, a user might be directed to perform a task through headset 12. Upon or during completion of the task, the user might speak to the system, through microphone 18, to confirm the instructions and task, ask for additional information, or report certain conditions, for example. The speech of the user and the words spoken must then be analyzed or “recognized” to extract the information therefrom. U.S. patent application Ser. No. 10/185,995, entitled “Terminal and Method for Efficient Use and Identification of Peripherals” and filed on Jun. 27, 2002, discusses use of a headset and speech recognition in an inventory management system, for example; that application is incorporated herein by reference in its entirety. Various different speech recognition technologies may be used to process the unique data generated by the wireless headset or other device of the invention, and such technologies are known to persons of ordinary skill in the art. Therefore, the particulars of a specific speech recognition system are not set forth herein.
The digital audio data, or the digitized audio signal, is supplied to a digital processor 42. The digital processor includes a microprocessor or other digital signal processor, volatile and non-volatile memory, and the associated logic necessary to provide the desired processing of the signal for implementing the invention. For example, as discussed further below, the digital processor 42 may provide pre-emphasis processing, frame generation, windowing, and autocorrelation processing of the digital data stream. The product of the digital processor 42 is processed, digitized audio or speech data, which is then supplied to a baseband processor 44, such as a Bluetooth baseband processor, for example.
The baseband processor 44 then formats the processed digital speech data according to transceiving protocol standards and, in the exemplary embodiment, according to Bluetooth protocol standards. However, the digital speech data provided by baseband processor 44 is not transmitted as voice packets under the Bluetooth protocol, as it would be under typical Bluetooth telephony applications. Rather, in accordance with one aspect of the invention, the digitized speech is transmitted as data using data packets under the Bluetooth protocol. The baseband processor may perform such operations as adding packet header information, forward error correction, cyclic redundancy check, and data encryption. It also implements and manages the Bluetooth stack. As noted above, the Bluetooth transmission protocol is a standard transmission protocol, and thus will be readily understood by a person of ordinary skill in the art. As such, all of the various specifics associated with Bluetooth transmission are not discussed herein.
A wireless transceiver, such as a Bluetooth transceiver 46, coupled to an antenna 48, performs all operations necessary to transmit and receive the voice data over a wireless link, such as a Bluetooth link. Wireless transceiver 46 might be operable under another wireless communication protocol even though the exemplary embodiment discussed herein utilizes Bluetooth. The operations of Bluetooth transceiver 46 may include, but are not limited to, such typical transceiver operations as conversion to RF frequencies, modulation and demodulation, spreading, and amplification. Antenna 48 provides efficient transmission and reception of signals in a wireless format.
While one aspect of the invention is directed to transmitting a representation of captured speech signals from a device for use in speech recognition applications, wireless headset 12 also implements a receive data link. All of the various functional blocks shown in the figures may likewise be involved in receiving audio, and for ordinary audio playback a conventional Bluetooth voice link may suffice.
However, if a more reliable link is necessary or desired, then an ACL link might be employed on the receive side as well, according to the invention. In that case, audio processing would be performed by the digital processor 42. A more reliable receive data link may be necessary, for example, for safety-critical applications, such as for use by emergency first responders.
As noted above, it will be apparent to a person of ordinary skill in the art that the disclosed embodiment is exemplary only and a wide range of other embodiments may be implemented in accordance with the principles of the present invention. For example, various commercially available components may be used to implement the elements described in the figures.
Referring now to the flowchart of processing steps, the audio signal is digitized in an A/D conversion step 62. The output of the A/D conversion step 62 may, therefore, provide a continuous bit stream of from 132.3 Kilobits/second (Kbps) (i.e., 11,025 Hz×12-bit resolution) to around 256 Kbps (i.e., 16,000 Hz×16-bit resolution). While such a bit stream would clearly exceed the capability of a typical Bluetooth SCO link using voice packets to transmit the speech signal, the present invention provides generation of data reflective of the audio signal and utilizes an ACL link with data packets. Additional processing of the bit stream enhances the data for transmission and subsequent use with a speech application, such as a speech recognition system; the additional processing also reduces the bandwidth needed to transfer the data over a Bluetooth link.
Specifically, to further process the bit stream, a pre-emphasis step 64 may be utilized. The pre-emphasis step may be performed, for example, by the digital processor 42. In one embodiment, the pre-emphasis is provided in the digital processor by a first-order filter that is used to emphasize the higher frequencies of the speech spectra, which may contain information of greater value to a speech recognition system than the lower frequencies. One suitable filter may have an equation of the form:
y(t)=x(t)−a*x(t−1) EQ 1
where “a” is a scaling factor that is utilized to control the amount of pre-emphasis applied. The range of the scaling factor is typically between 0.9 and 1.0 depending upon the amount of spectral tilt present in the speech data. Spectral tilt essentially refers to the overall slope of the spectrum of a speech signal as is known to those of skill in the art.
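A minimal sketch of the pre-emphasis filter of EQ 1 follows, with an illustrative scaling factor of 0.95 chosen from the 0.9-1.0 range noted above:

```python
import numpy as np

# Minimal sketch of the first-order pre-emphasis filter of EQ 1,
# y(t) = x(t) - a*x(t-1). The value a = 0.95 is an illustrative
# choice within the 0.9-1.0 range given in the text.
def pre_emphasize(x: np.ndarray, a: float = 0.95) -> np.ndarray:
    y = np.empty_like(x)
    y[0] = x[0]               # no previous sample for the first output
    y[1:] = x[1:] - a * x[:-1]
    return y
```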
To further process the digitized audio signal in the form of the bit stream, the data stream is then processed through a frame generation step or steps 66. The frame generation might also be performed by digital signal processing circuitry, such as the digital processor 42 described above.
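As an illustrative sketch of frame generation, the following splits the sample stream into successive frames; the 25 ms frame length and 10 ms step are assumed values, since the disclosure leaves these parameters open:

```python
import numpy as np

# Illustrative frame generation. The 25 ms frame length and 10 ms
# step are assumptions for this sketch; the text above does not fix them.
def make_frames(x: np.ndarray, sample_rate: int = 11_025,
                frame_ms: float = 25.0, step_ms: float = 10.0) -> np.ndarray:
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // step)
    return np.stack([x[i * step : i * step + frame_len]
                     for i in range(n_frames)])
```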
Referring again to the sequence of processing steps, a windowing function may then be applied to each successive frame, per the windowing noted above, in preparation for the autocorrelation processing.
In accordance with a further aspect of the present invention, an autocorrelation step 70 is performed. That is, the autocorrelation of each frame is calculated in sequence. The autocorrelation step 70 generates a set of coefficients for each frame. The coefficients are reflective of spectral characteristics of the audio signal portion represented by the frame. That is, the data sent by the present invention is not simply a digitized voice signal, but rather is a set of coefficients configured as data that are reflective of spectral characteristics of the audio signal portion.
In a speech signal, it is the envelope of the spectrum that contains the data of interest to a speech recognition system. The autocorrelation step 70 computes a set of coefficients that parameterize the spectral envelope of the speech signal. That is, the coefficient set is reflective of the spectral envelope. This is a particular advantage of the present invention for use in speech recognition systems, because speech recognition systems also use autocorrelation coefficients. Therefore, in further processing the data sent by the inventive wireless device, no additional computational complexity is imposed on the speech recognition system.
Autocorrelation is computed on each frame as follows, for example:

R(i)=Σt s(t)*s(t+i) EQ 2

where “R(i)” is the i-th autocorrelation coefficient and “s(t)” is the frame of speech samples,

where “i” is in the range of 0 to the number of autocorrelation coefficients generated minus 1, and

where the summation over “t” is based on the size of the frame.
Autocorrelation algorithms are known to a person of ordinary skill in the art to generate spectral information useful to a speech recognition system. The number of coefficients to use depends primarily on the speech frequency range and the spectral tilt of the speech signal. As a general rule, two coefficients are generated for every 1,000 Hz of speech bandwidth, plus additional coefficients as needed for the speech recognition system to compensate for spectral tilt. In accordance with one aspect of the present invention, typical values of “i,” as the number of coefficients, range from 10 to 21 coefficients per frame. Each coefficient that is generated in the invention is represented as a data word, and the data word sizes typically range from 16 to 32 bits per coefficient. Of course, different numbers of coefficients might be utilized, as well as different-sized data words; however, the noted ranges are typical for an exemplary embodiment of the invention. The autocorrelation step is also a process provided by the digital signal processor, or digital processor 42.
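A sketch of the per-frame autocorrelation of EQ 2 follows; the default of 17 coefficients is one illustrative value within the 10-to-21 range noted above:

```python
import numpy as np

# Sketch of the per-frame autocorrelation of EQ 2, R(i) = sum_t s(t)*s(t+i).
# The default of 17 coefficients is an illustrative value within the
# 10-21 range given in the text (roughly 2 per 1,000 Hz plus extras).
def autocorrelation(frame: np.ndarray, n_coeffs: int = 17) -> np.ndarray:
    n = len(frame)
    return np.array([np.dot(frame[: n - i], frame[i:])
                     for i in range(n_coeffs)])
```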
The resulting output from the autocorrelation step 70 is digital speech data 72 that consists of a set of autocorrelation coefficients reflective of the spectral characteristics of the captured analog audio input. The coefficients can therefore be used to recreate the original voice waveform, although with some loss compared with the original waveform due to the digitization and signal processing, as noted above.
In accordance with another aspect of the present invention, a wireless transceiver is configured for transmitting the set of coefficients as data. In an example utilizing a Bluetooth transceiver, the set of coefficients may be transmitted as data utilizing data packets in the Bluetooth protocol and utilizing a Bluetooth ACL link. The transceiver is configured for transmitting the set of coefficients as data to another device to utilize for speech applications, such as speech recognition applications. The speech recognition system utilizes the autocorrelation data to compute speech features generally referred to as “cepstra,” as is known in the art of speech recognition. The cepstra are then used with a pattern-matching approach to identify the spoken word, also in line with recognized speech recognition technology. Therefore, since speech recognition systems already use the autocorrelation coefficients that are sent as data by the present invention, no additional computational complexity is imposed on the speech recognition system, as noted above. The speech recognition system may exist elsewhere in the processing stream, such as in main server 24, portable terminal 20, or in another Bluetooth-enabled device 22.
Providing a speech signal as coefficient data over a Bluetooth or other transceiving protocol, rather than as traditionally digitized voice, provides the significant benefits noted above. Reviewing the bit rates achieved by the invention, the bit rate of the digital speech data produced by the processing chain can range, for example, from around 1.6 Kbps to 67.2 Kbps, depending on the parameters chosen for implementing the embodiment of the invention. For example:
Minimum rate=10 frames/second*10 words/frame*16 bits/word=1,600 bits/second (1.6 Kbps) EQ 4

Maximum rate=100 frames/second*21 words/frame*32 bits/word=67,200 bits/second (67.2 Kbps) EQ 5
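The EQ 4 and EQ 5 rates may be reproduced directly from the stated parameters, as in the following sketch:

```python
# Direct computation of the EQ 4 / EQ 5 coefficient-data rates.
def coeff_data_rate_bps(frames_per_sec: int, words_per_frame: int,
                        bits_per_word: int) -> int:
    return frames_per_sec * words_per_frame * bits_per_word

print(coeff_data_rate_bps(10, 10, 16))   # 1,600 bps  (1.6 Kbps minimum, EQ 4)
print(coeff_data_rate_bps(100, 21, 32))  # 67,200 bps (67.2 Kbps maximum, EQ 5)
```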
The proper choice of parameters for an embodiment of the invention would be dependent upon the characteristics of the speech recognition system and, thus, the particular parameters with respect to frame size, coefficients per frame, and data word size may be selectively adapted as desired, according to the present invention.
In one particular embodiment of the invention, as noted, a Bluetooth transceiver may be utilized for transmitting the coefficient data, utilizing data packets rather than voice packets. Thus, the present invention provides the reliable transfer of digital speech data 72 over a Bluetooth link utilizing data packets, to provide higher quality voice data for a speech recognition system or other speech application, and also a reduced data rate for transmission over the Bluetooth link.
To provide reliable transfer of the digital speech data over the Bluetooth link, one embodiment of the invention uses the ACL link (instead of the typical voice SCO link) at the physical layer.
Generally, in the Bluetooth protocol, packets of information are transmitted in numbered time slots. The data packets may have various lengths spanning multiple slots. For example, a one-slot packet might be sent, whereas other packets may require three slots or five slots, respectively. Shorter packets (i.e., Dx1) provide lower data throughput, but are less susceptible to non-recoverable burst errors. Longer packets, on the other hand (i.e., Dx5), provide higher data throughput, but are more susceptible to non-recoverable burst errors. In the present invention, the data packets are utilized to transmit voice information. Once the voice data (i.e., coefficient data) is generated, the Bluetooth protocol contains built-in algorithms to monitor the quality and reliability of the link and to determine which packet types are appropriate at any given time. That is, the Bluetooth transceiver 46 of the device may dynamically select the data packet type to balance throughput against robustness on the link.
In any case, due to the reduced data rate necessary for high quality voice transmission utilizing the present invention, any type of ACL data packet transmitted in symmetric mode is capable of handling the data rate for the digital speech data 72. For example, for the embodiment of the invention discussed herein, a maximum rate of 67.2 Kbps is required. Any of the ACL packet types in the table of ACL packet types and data rates can support that rate in symmetric mode.
In accordance with another aspect of the present invention, depending upon the desired system parameters, the invention can be parameterized in such a way that any of the ACL packets in either symmetric mode or asymmetric mode are capable of handling the link bandwidth. For example, 100 frames/second×21 words/frame×16 bits/word=33.6 Kbps. This is less than the smallest maximum asymmetric rate of 36.3 Kbps for a DM5 packet.
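As a sketch of this feasibility check, the following compares the 33.6 Kbps example against the maximum ACL packet rates; the rate table is an assumption drawn from the Bluetooth 1.x specification rather than from this disclosure:

```python
# Feasibility check of a parameterization against ACL packet throughput.
# The maximum-rate table (Kbps, symmetric / asymmetric-reverse) is an
# assumption taken from the Bluetooth 1.x specification, not this text.
ACL_MAX_KBPS = {
    "DM1": (108.8, 108.8), "DH1": (172.8, 172.8),
    "DM3": (258.1, 54.4),  "DH3": (390.4, 86.4),
    "DM5": (286.7, 36.3),  "DH5": (433.9, 57.6),
}

required_kbps = 100 * 21 * 16 / 1000  # 33.6 Kbps, per the example above
for pkt, (sym, asym_rev) in ACL_MAX_KBPS.items():
    ok = required_kbps <= sym and required_kbps <= asym_rev
    print(f"{pkt}: sym {sym} / asym-rev {asym_rev} Kbps -> {'OK' if ok else 'no'}")
# Every packet type supports 33.6 Kbps, including the smallest
# asymmetric-reverse rate of 36.3 Kbps for a DM5 packet.
```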
Once the coefficient data is determined, it is then transmitted by the wireless device to another device or system that has a suitable receiver. The received data is utilized for speech applications, such as speech recognition applications or other speech applications. As noted, autocorrelation coefficients may be utilized directly by the speech recognition system without additional computational complexity in the system. Various different speech recognition systems might be utilized as known by a person of ordinary skill in the art, and thus the present invention is not directed to a specific type of speech recognition system. Of course, those systems that are capable of directly handling the autocorrelation coefficient data as transmitted may be most desirable.
While the exemplary embodiment discussed herein is directed to transmitting the coefficient data to another device, as noted above, the wireless device, such as a headset, may also receive data. To that end, all of the functional blocks of the device described above may be operated in the receive direction as well.
Another advantage of the present invention in using the autocorrelation coefficients as the speech representation, and sending them as data, is the ability to leverage this representation to reproduce the speech signal at the receiver, such as for replay or storage of the audio signals of the speech. With additional data bits representing a residual signal, the speech signal may be effectively regenerated, such as to be replayed in audio. This aspect of the invention is useful in various applications where the ability to collect the speech signal (or listen in), or the ability to recreate the audio speech, is required along with the speech recognition capabilities of the invention. In the proposed implementation, the autocorrelation values that are generated by a transmitter (such as a headset) and sent to a receiver (such as a terminal) are used to generate a predictor to remove the redundancy in the speech signal and produce a residual signal. The residual is then encoded. Generally, fewer bits per sample are needed for the residual signal than for the speech signal and respective coefficients (e.g., 2 to 4 bits per sample versus 16 bits per sample). The encoded residual signal is then transmitted to the receiver. At the receiver, the residual signal is reconstructed from the encoded values, the redundancy of the speech signal is reinserted using the available autocorrelation values that were transmitted, and the speech signal is thus reproduced.
Generally, the steps at the transmitter in accordance with one embodiment of the invention are as follows and as illustrated in the accompanying flowchart. First, the prediction coefficients a(k), for k=1 to p, are generated from the available autocorrelation values by solving the normal equations:

Σk a(k)*R(|i−k|)=R(i), for i=1 to p EQ 6
Usually the number of prediction coefficients p is one less than the number of correlation values available. So, for example, if 17 correlation values, R(0) through R(16), are calculated, then p would equal 16. The above equations represent p linear equations in p unknowns. These equations may be solved in a variety of ways for the purposes of the invention. For example, matrix inversion, Gaussian elimination, or a Levinson-Durbin algorithm might be used. The method of solution generally does not change the resulting prediction coefficients (other than numerical round-off errors).
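As an illustrative sketch, the normal equations of EQ 6 may be solved by exploiting their Toeplitz structure, here via SciPy's solve_toeplitz, which plays the role of the Levinson-Durbin recursion mentioned above:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Sketch: solve the EQ 6 normal equations for a(1)..a(p) given the
# autocorrelation values R(0)..R(p). solve_toeplitz exploits the
# Toeplitz structure of the system, much as Levinson-Durbin does.
def prediction_coefficients(R: np.ndarray) -> np.ndarray:
    p = len(R) - 1                     # e.g., 17 values R(0)..R(16) -> p = 16
    return solve_toeplitz(R[:p], R[1 : p + 1])
```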
The prediction coefficients are then used to generate a predicted speech signal per step 82 using the following equation:

ŝ(n)=Σk a(k)*s(n−k), for k=1 to p EQ 7
The residual speech signal e(n) is then defined as the difference between the original and predicted speech signals:
e(n)=s(n)−ŝ(n) EQ 8
That is, as noted in step 84 of the flowchart, the residual signal is formed by subtracting the predicted speech signal from the original speech signal.
The residual signal is then normalized by dividing each sample by a normalization factor G given by:

G=√(R(0)−Σk a(k)*R(k)), for k=1 to p EQ 9
The normalized residual signal is then encoded, as noted in step 86, using a desirable number of bits (e.g., 2-10 bits) that might be determined by the design and the desired quality of the audio reproduction. Four (4) bits may be desired, although fewer, such as 2 bits, may also be possible. If 2-4 bits per sample are utilized, it represents a great savings compared to the 16 bits per sample used to represent the original speech signal. At 11,025 samples per second, the bit rate for transmitting the speech signal values is reduced from 176,400 bits per second to 22,050-44,100 bits per second. The encoded residual is then transmitted to the receiver, in accordance with the methodology outlined hereinabove, per step 88 of the flowchart.
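A sketch of these transmitter-side steps follows, covering EQ 7 through EQ 9; the 4-bit uniform quantizer is an illustrative assumption within the 2-10 bit range noted above:

```python
import numpy as np

# Sketch of the transmitter steps: predict (EQ 7), form the residual
# (EQ 8), normalize by G (EQ 9), and quantize. The 4-bit uniform
# quantizer is one illustrative choice within the 2-10 bit range above.
def encode_residual(s: np.ndarray, a: np.ndarray, R: np.ndarray,
                    bits: int = 4):
    p = len(a)
    s_pred = np.zeros_like(s)
    for n in range(p, len(s)):
        s_pred[n] = np.dot(a, s[n - p : n][::-1])            # EQ 7
    e = s - s_pred                                           # EQ 8
    G = np.sqrt(max(R[0] - np.dot(a, R[1 : p + 1]), 1e-12))  # EQ 9
    levels = 2 ** (bits - 1)
    q = np.clip(np.round(e / G * levels), -levels, levels - 1).astype(int)
    return q, G
```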
The steps at the receiver in accordance with one embodiment of the invention are then as follows.
The prediction coefficients are generated in the receiver, such as a terminal, in the same manner as they were generated in the transmitter, such as a headset, since they are derived from the same autocorrelation values. The normalization value G is also calculated as shown above. The received residual signal is decoded and multiplied by G to remove the effect of the normalization.
For those applications requiring audio, the speech signal is regenerated, such as to transmit it or play it back as an audio signal, by adding the predicted value of the speech to the decoded residual signal using the following equation:

s(n)=e(n)+Σk a(k)*s(n−k), for k=1 to p EQ 10

where e(n) here denotes the decoded, de-normalized residual.
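A corresponding sketch of the receiver-side reconstruction of EQ 10 follows, where G and the prediction coefficients are assumed to have been recomputed from the transmitted autocorrelation values as described above:

```python
import numpy as np

# Sketch of the receiver-side reconstruction of EQ 10: de-normalize
# the decoded residual, then re-insert the predicted (redundant)
# component sample by sample. G and a are assumed to be recomputed
# at the receiver from the same transmitted autocorrelation values.
def decode_residual(q: np.ndarray, a: np.ndarray, G: float,
                    bits: int = 4) -> np.ndarray:
    levels = 2 ** (bits - 1)
    e = q.astype(float) / levels * G   # undo quantization and normalization
    p = len(a)
    s = np.zeros_like(e)
    for n in range(len(e)):
        k = min(n, p)                  # shorter history for the first samples
        s[n] = e[n] + np.dot(a[:k], s[n - k : n][::-1])   # EQ 10
    return s
```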
This aspect of the invention takes advantage of the availability of autocorrelation values at the receiver, according to the invention as described herein, to reduce the number of bits per sample needed to represent the speech signal and to reproduce the speech signal at the receiver or elsewhere. The approach is based on the well-known Linear Prediction method of speech representation, which is the source of many approaches to speech coding. In accordance with one embodiment of the invention, a specific methodology is described herein; however, other approaches may also be used. That is, while a basic method is described, the invention contemplates the use of other Linear-Prediction-based methods. Of course, as noted above, where audio is not necessary at the receiver site, the data, such as the autocorrelation coefficients, may be used directly for speech recognition applications.
While the present invention has been illustrated by a description of various embodiments and while these embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative apparatus and method, and illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the spirit or scope of applicant's general inventive concept.