Methods and apparatus for automatic user-specific, condition-specific communication system intelligibility testing and optimization are provided.
The hearing loss experienced by people who are hard of hearing is rarely uniform across the entire audio spectrum. For example, a person's hearing may be down by only 5 dB at 500 Hz, and down by 20 dB at 2,000 Hz. For users with this type of hearing loss, it can be helpful to provide a compensating amount of amplification at frequencies where the user is known to have a specific amount of hearing loss. Using the above example, this compensation could be a 5 dB boost at 500 Hz and a 20 dB boost at 2,000 Hz. An underlying assumption of this approach is that intelligibility, i.e., the ability of a listener to discriminate between two essentially similar sounds, is highly correlated with the ability to perceive all frequencies in the acoustic spectrum at the correct amplitude.
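The simple spectral compensation just described can be sketched as follows, using the 5 dB / 20 dB example above as a hypothetical audiogram. This is an illustration under stated assumptions, not part of the disclosed method: the names `AUDIOGRAM`, `compensation_gain_db`, and `db_to_amplitude` are hypothetical, and interpolating the loss linearly on a log-frequency axis between measured points (holding the nearest value outside the measured range) is an assumed convention.

```python
import bisect
import math

# Hypothetical audiogram: hearing loss in dB at each test frequency in Hz,
# taken from the 500 Hz / 2,000 Hz example in the text.
AUDIOGRAM = {500: 5.0, 2000: 20.0}


def compensation_gain_db(freq_hz, audiogram):
    """Return a boost in dB equal to the (interpolated) hearing loss at freq_hz.

    Assumption: linear interpolation on a log-frequency axis between measured
    points; outside the measured range the nearest measured loss is held.
    """
    freqs = sorted(audiogram)
    if freq_hz <= freqs[0]:
        return audiogram[freqs[0]]
    if freq_hz >= freqs[-1]:
        return audiogram[freqs[-1]]
    i = bisect.bisect_left(freqs, freq_hz)
    f_lo, f_hi = freqs[i - 1], freqs[i]
    t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
    return audiogram[f_lo] + t * (audiogram[f_hi] - audiogram[f_lo])


def db_to_amplitude(gain_db):
    """Convert a gain in dB into a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

For instance, `{f: db_to_amplitude(compensation_gain_db(f, AUDIOGRAM)) for f in (250, 500, 1000, 2000, 4000)}` yields per-band amplitude multipliers for an equalizer. As the following discussion argues, compensation of this kind addresses audibility only and does not by itself address codec, packet-loss, or noise-related degradations.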
Although there are electronic audio devices that allow users to adjust the spectral characteristics for themselves, typically via what are commonly referred to as “tone controls” or “graphic equalizers,” a problem with this approach when applied to telecommunication systems is that users tend to adjust the characteristics to maximize the aesthetic quality of the voice rather than its intelligibility. (The inability of hard of hearing users to self-adjust audio systems optimally is a reason why audiologists, and not the individual users, make the spectral adjustments on users' hearing aids.) But perhaps the most important reason why self-adjustment of the spectral characteristics may not yield optimal speech intelligibility for hard of hearing users is that certain types of audio degradation that are common in telecommunication systems can affect these users differently from users with normal hearing, and are best mitigated through techniques that do not rely exclusively on simple spectral compensation. Examples include the distortions introduced by audio compression (e.g., GSM or G.729), packet loss, ambient noise, transducer quality, and poor signal-to-noise ratio. In this context, it is important to note that the optimal mitigation strategy will differ among individuals depending on the nature of the individual's hearing loss.
In summary, when considering the needs of hard of hearing users of telecommunication systems:
(a) Optimal intelligibility is not reliably achieved when users self-adjust the audio characteristics of the device.
(b) Many of the audio distortions commonly experienced in telecommunication systems are best mitigated on a per-user basis through techniques that are not limited to simple spectral compensation.
For these reasons, a method is required that relies on the results of individually administered intelligibility tests (rather than hearing acuity tests) to provide automatic optimization of audio factors that include, but are not limited to, spectral adjustments.
Systems and methods for improving the intelligibility of speech delivered to a user through a communication system are provided. More particularly, an automatic user-specific, condition-specific intelligibility testing and optimization system and method are provided. According to embodiments of the disclosed invention, an intelligibility test that evaluates the user's ability to discriminate between two essentially similar speech sounds is automatically administered to a user. After the intelligibility test is administered, the results are analyzed, and the system automatically modifies the speech signal in order to maximize the intelligibility of speech for the user.
Systems in accordance with the present disclosure include a communication server or set of communication servers and at least one user endpoint. The communication server includes or has access to an interactive voice response (IVR) system or script that operates to administer the intelligibility test. The communication server additionally includes application programming that can identify patterns in the user's ability, or inability, to discriminate between different speech sounds. The communication server can then identify audio adjustments that would maximize intelligibility for the user and make the adjustments automatically. The system can additionally identify how user-specific discrimination patterns change as a function of factors associated with the communication or telecom system and the user's environment. Following the intelligibility testing and analysis, different sets of automatic adjustments can be stored for a user, for use in connection with different communication systems and/or communication devices.
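One way the pattern-identification step could work is sketched below, assuming each test item plays one sound from a minimal pair and records what the user reports hearing. The `Trial` record, the contrast labels such as `"p/t"`, and the 25% error threshold are all hypothetical choices for illustration; the disclosure does not specify a scoring rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical test item drawn from a minimal pair."""
    contrast: str   # label for the sound contrast, e.g. "p/t" (hypothetical)
    presented: str  # the sound actually played to the user
    response: str   # the sound the user reported hearing


def error_rates(trials):
    """Return {contrast: fraction of trials answered incorrectly}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for trial in trials:
        totals[trial.contrast] += 1
        if trial.response != trial.presented:
            errors[trial.contrast] += 1
    return {c: errors[c] / totals[c] for c in totals}


def problem_contrasts(trials, threshold=0.25):
    """Return, sorted, the contrasts whose error rate exceeds threshold.

    The default threshold is an assumption, not a value from the disclosure.
    """
    rates = error_rates(trials)
    return sorted(c for c, rate in rates.items() if rate > threshold)
```

Grouping errors by contrast, rather than reporting a single overall score, is what exposes a user-specific pattern (for example, confusions among unvoiced plosives but not among fricatives) that can later be mapped to a targeted adjustment.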
Methods in accordance with embodiments of the present disclosure include initiating a communication session between a user and a communication server. After establishing the communication session, the communication server administers an intelligibility test for the user of the communication device. The user's responses are analyzed and used to identify patterns in the user's ability, or inability, to discriminate between speech sounds. The method further includes using the user responses to identify adjustments to the output parameters of the speech signal in order to maximize the intelligibility of speech signals for the user. The adjustments are then applied automatically. The automatic adjustments or compensation can include, but are not limited to, spectral shaping and/or modifications to frames of speech signal data. Embodiments of the method additionally include performing automatic optimization of intelligibility for different users, and applying different adjustments to reproduced audio signals for the different users. Further embodiments of the present disclosure can include performing intelligibility tests for a user under different conditions and/or using different communication devices or systems, and applying the adjustments determined to be best suited for the different conditions, devices, or systems.
Additional features and advantages of embodiments of the present disclosure will become more readily apparent from the following description, particularly when taken together with the accompanying drawings.
The communication server 104 may comprise a general purpose computer or server device. The communication server 104 can include an interactive voice response (IVR) system 124 that is operable to administer an intelligibility test to a user 116, as described in greater detail elsewhere herein. The communication server 104 can additionally include an analysis and modification unit 120 that operates to determine and implement adjustments to the reproduction of speech for a user 116 through a communication device 108 as described herein.
A communication endpoint or device 108 may comprise a desktop telephone, cellular telephone, soft phone, two-way radio, or other device capable of supporting voice communications or the delivery of speech to the user 116. In addition, different communication endpoints 108 can be associated with different networks or audio encoding algorithms. In general, each communication endpoint 108 is associated with at least one user 120. In addition, one user 120 may be associated with multiple communication devices 108. For example, one user 120 may be associated with a first communication device 108a comprising a desk phone, and a second communication device 108b comprising a cellular telephone. As can be appreciated by one of skill in the art, different telephones can operate with different networks 112 and different audio encoding algorithms, which affect the quality and characteristics of speech or audio signals.
All the functions defined in the communication server 104, as well as an emulation of the network 112, may reside within the communication endpoint or device 108. For example, the communication device 108 could encode speech using one of many available codecs, feed the resulting encoded bit stream through network emulation software, such as the Linux Netem network emulator, that replicates real network conditions, capture the bit stream output by this network function, and decode it into speech that is played out of the speaker of the communication device 108 in real time. Such emulation is equally valuable when performed in the different user acoustic environments described elsewhere herein.
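To illustrate the kind of impairment a network emulation step introduces, the sketch below generates bursty packet losses with a two-state Gilbert-Elliott model and conceals each lost frame by repeating the last received one. This is a hypothetical offline stand-in written for illustration; it does not use or reproduce Netem (which provides its own loss models), and the function names, parameters, and repeat-last-frame concealment are assumptions.

```python
import random


def gilbert_elliott_losses(n_packets, p_good_to_bad, p_bad_to_good,
                           loss_in_bad=1.0, loss_in_good=0.0, seed=None):
    """Return a list of booleans, True where a packet is lost.

    Two-state Markov (Gilbert-Elliott) burst-loss model. Parameter values
    are illustrative only, not measured from any real network.
    """
    rng = random.Random(seed)
    bad = False  # start in the "good" state
    lost = []
    for _ in range(n_packets):
        loss_prob = loss_in_bad if bad else loss_in_good
        lost.append(rng.random() < loss_prob)
        # Choose the state used for the next packet.
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
    return lost


def conceal(frames, lost):
    """Replace each lost frame by the last received frame (an assumed,
    very simple concealment); emit silence if nothing was received yet."""
    out = []
    last = None
    for frame, is_lost in zip(frames, lost):
        if is_lost:
            out.append(list(last) if last is not None else [0.0] * len(frame))
        else:
            out.append(frame)
            last = frame
    return out
```

A burst model is used here rather than independent random losses because congestion-related loss tends to cluster, and clustered losses affect intelligibility differently from isolated ones; testing a user under both patterns is one way to make the results condition-specific.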
A communication server 104 can also include memory 208 for use in connection with the execution of application programming or instructions by the processor 204, and for the temporary or long term storage of program instructions and/or data. As an example, the memory 208 may comprise RAM, SDRAM, or other solid state memory. Alternatively or in addition, data storage 212 might be provided as part of a communication server 104. In accordance with embodiments of the present invention, data storage 212 can contain programming code or instructions implementing various applications or functions executed by the communication server 104. Like the memory 208, the data storage 212 may comprise a solid state memory device or devices. Alternatively or in addition, the data storage 212 may comprise a hard disk drive or other random access memory.
In accordance with embodiments of the present invention, the data storage 212 can include various applications and data. For example, the data storage 212 can include an IVR application 216, for example in connection with providing an IVR system 124 or IVR function as described herein. As a further example, the data storage 212 can include user data 220, such as information identifying individual users, and adjusted audible signal characteristics that are applied in connection with providing speech signals to particular users 120 and/or communication devices 108. A communication server 104 can additionally include one or more communication interfaces 224. For example, a first communication interface 224a can be provided to operably interconnect the communication server 104 to a first network 116a, and a second communication interface 224b can be provided to interconnect the communication server 104 to the second network 116b.
At step 312, a determination can be made as to whether adjustments to the speech signal parameters are warranted, based on the responses of the user 120 to the speech intelligibility test. If changes to the parameters of the speech signal are warranted, the adjustments determined through administration of the intelligibility test to be applicable to the user 120 can be stored (step 316), for example as part of user data 220. The stored, adjusted speech signal parameters can then be made available for later communications involving the user 120 and the communication endpoint 108.
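Steps 312 and 316 can be sketched as a function that turns poorly discriminated contrasts into a parameter set and stores it only when something needs to change. The `REMEDIES` table, its parameter names and dB values, and the `(user_id, context)` storage key are hypothetical; the disclosure leaves the mapping from test results to adjustments open.

```python
# Hypothetical mapping from a poorly discriminated contrast to the
# parameter adjustments that might address it (values are illustrative).
REMEDIES = {
    "s/f": {"high_band_boost_db": 6.0},   # fricative confusions
    "p/t": {"plosive_boost_db": 4.0},     # unvoiced plosive confusions
}


def derive_adjustments(problem_contrasts):
    """Merge remedies for all problem contrasts, keeping the strongest
    value when several contrasts call for the same parameter."""
    adjustments = {}
    for contrast in problem_contrasts:
        for name, value in REMEDIES.get(contrast, {}).items():
            adjustments[name] = max(adjustments.get(name, value), value)
    return adjustments


def store_if_warranted(user_data, user_id, context, problem_contrasts):
    """Step 312: decide whether adjustments are warranted.
    Step 316: if so, store them in user_data (a plain dict here).
    Returns True when a parameter set was stored."""
    adjustments = derive_adjustments(problem_contrasts)
    if not adjustments:
        return False
    user_data[(user_id, context)] = adjustments
    return True
```

Keying the stored set by context as well as by user keeps results from, say, a desk phone test separate from those of a cellular test, in line with the condition-specific storage described below.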
After storing the adjusted speech signal parameters, or after determining that adjustments to the parameters are not required, a determination can be made as to whether a communication is in progress (step 320). If a communication is determined to be in progress, a next determination can be made as to whether adjusted speech signal parameters are available for a communication device 108 or user 120 involved in the communication (step 324). If adjusted parameters are available, they can be applied in connection with the communication (step 328). The application of adjusted speech signal parameters can include modifying the speech signal provided by the communication server 104 to the communication device 108 associated with the user 120 for whom adjusted speech signal parameters have been determined as part of the administration of an intelligibility test as described herein. The adjusted speech signal parameters can include spectral shaping, in which different frequencies of an audio signal are amplified or attenuated in order to improve the intelligibility of the speech signal to the user 120. As a further example, the adjusted speech signal parameters can include adjustments to the length of the data frames containing the audio data that make up the speech signal. For example, lengthening data frames containing plosive sounds can improve the intelligibility of those sounds. Another technique for improving the intelligibility of speech, described in U.S. Pat. No. 6,889,186 to Michaelis, identifies portions of the speech signal that include sounds that typically present intelligibility problems and modifies those portions in an appropriate manner. For example, the amplitude of frames determined to include unvoiced plosive sounds may be boosted. In addition, the amplitude of frames preceding such unvoiced plosive sounds can be reduced to better accentuate the plosive.
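The frame-level modification described above (boosting frames with unvoiced plosives and attenuating the frame just before each one) can be sketched as follows. This is a simplified illustration of the idea as summarized here, not the implementation of U.S. Pat. No. 6,889,186; the per-frame plosive flags are assumed to come from a detector that is not shown, and the function name and default gains are hypothetical.

```python
def accentuate_plosives(frames, is_unvoiced_plosive, boost_db=6.0, duck_db=-6.0):
    """Scale frame amplitudes to accentuate unvoiced plosives.

    frames: list of frames, each a list of float samples.
    is_unvoiced_plosive: one bool per frame, from an assumed upstream detector.
    Flagged frames are boosted; the frame immediately preceding a flagged
    frame is attenuated unless it is itself flagged.
    """
    if len(frames) != len(is_unvoiced_plosive):
        raise ValueError("need exactly one flag per frame")
    boost = 10.0 ** (boost_db / 20.0)
    duck = 10.0 ** (duck_db / 20.0)
    gains = [1.0] * len(frames)
    for i, flagged in enumerate(is_unvoiced_plosive):
        if flagged:
            gains[i] = boost
            if i > 0 and not is_unvoiced_plosive[i - 1]:
                gains[i - 1] = duck
    return [[sample * gain for sample in frame] for frame, gain in zip(frames, gains)]
```

Working on whole frames keeps the sketch simple; a practical system would likely smooth the gain at frame boundaries to avoid audible discontinuities.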
After applying adjusted parameters, or after determining that no adjusted parameters are available, the process can end.
The intelligibility test can be administered in connection with each communication device 108 and/or network 112 through which a user 120 may receive speech signals. Accordingly, a user 120 can connect to a communication server 104 for intelligibility testing in connection with different communication endpoints 108, networks 112, and/or combinations thereof. Speech signal adjustment parameters determined as a result of the intelligibility testing can be stored and subsequently applied when speech signals are provided to the user 120.
The application of speech signal adjustment parameters stored as part of the user's data 220 can depend on the communication device 108 and/or communication network 112 involved in a communication session with the user 120. Accordingly, different sets of speech signal adjustment parameters determined while testing the intelligibility of speech for a user 120 can be applied when different communication devices 108 and/or communication networks 112 are used to transmit speech signals to that user 120. In addition, different sets of speech signal adjustment parameters can be established through testing and applied in use for different communication environments. For example, a user 120 may have a set of speech signal adjustment parameters that is applied when the user 120 is involved in a communication session that uses a cellular telephone connected via Bluetooth to a microphone and speakers provided as part of an automobile. As yet another example, one set of speech signal adjustment parameters can be determined for a particular communication endpoint 108 when that endpoint is used in the home, a second set can be developed for the same communication endpoint 108 when the user 120 is on a city street, and yet another set can be applied when the user 120 is in an automobile. In accordance with still other embodiments, the conditions that affect intelligibility can change mid-call, and the set of speech signal adjustment parameters that is applied can accordingly be changed during the call. For example, a move by the user 120 from a quiet environment to a noisy one or vice versa, a change in packet loss rate due to network congestion, or any other change that can be detected by the communication server 104 or endpoint 108 can result in an automatic change in the applied speech signal adjustment parameters.
Accordingly, continuous optimization of the parameters is possible.
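Mid-call switching of this kind can be sketched as a small controller that classifies measured conditions and swaps the active parameter set only when the classification changes. The two measurements (ambient noise in dBA and packet loss rate), the 60 dBA and 2% thresholds, and the names `Conditions`, `classify`, and `ParameterSwitcher` are all hypothetical; the disclosure does not say which measurements or thresholds are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    environment: str  # "quiet" or "noisy" (hypothetical labels)
    network: str      # "clean" or "lossy" (hypothetical labels)


def classify(noise_dba, packet_loss_rate, noise_threshold_dba=60.0, loss_threshold=0.02):
    """Map raw measurements to a coarse condition label (thresholds assumed)."""
    env = "noisy" if noise_dba >= noise_threshold_dba else "quiet"
    net = "lossy" if packet_loss_rate >= loss_threshold else "clean"
    return Conditions(env, net)


class ParameterSwitcher:
    """Keeps the active parameter set in step with detected conditions."""

    def __init__(self, parameter_sets, default):
        self.parameter_sets = parameter_sets  # {Conditions: parameter dict}
        self.default = default                # used when no set matches
        self.current = None                   # last classified conditions
        self.active = default                 # parameters currently applied

    def update(self, noise_dba, packet_loss_rate):
        """Re-evaluate conditions; return True if the active set changed."""
        conditions = classify(noise_dba, packet_loss_rate)
        if conditions == self.current:
            return False
        self.current = conditions
        new_params = self.parameter_sets.get(conditions, self.default)
        changed = new_params != self.active
        self.active = new_params
        return changed
```

Switching only on a change of class, rather than on every new measurement, avoids reapplying identical parameters; a deployed system might also add hysteresis so that a noise level hovering near a threshold does not cause repeated switching.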
Different speech signal adjustment parameters for inclusion in user data 220 can be developed during a set-up or initialization process. Moreover, a user 120 can be provided with an opportunity to establish a new set of speech signal adjustment parameters for each new environment and/or combination of equipment 108, 112 associated with the communication. In this way, optimal or more favorable speech signal characteristics for a particular user 120 can be applied in different situations. The application of different speech signal adjustment parameters can be automatic, in that the communication server 104, for example through operation of the IVR application 216, can select a particular set of speech signal adjustment parameters for a particular set of equipment 108, 112, communication protocols, environments in which the user 120 is located during the communication session, etc. Alternatively, a user 120 can select a particular set of speech signal adjustment parameters for application during a communication session. In accordance with still other embodiments, different sets of speech signal adjustment parameters can be applied for different users 120 communicating with one another during the communication session. In particular, a first set of speech signal adjustment parameters drawn from user data 220 associated with a first user 120a can be applied to speech signals provided to that first user 120a, and a second set of speech signal adjustment parameters stored as user data 220 and associated with a second user 120b can be applied to speech signals provided to that second user 120b.
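The selection logic described above (automatic choice by equipment and environment, a user's explicit choice taking precedence, and a separate set for each listener in the session) can be sketched as a lookup with fallback. The key layout `(user_id, device, network, environment)`, the use of `None` as a wildcard, and the fallback order are assumed conventions for illustration only.

```python
def select_parameters(user_data, user_id, device=None, network=None,
                      environment=None, user_choice=None):
    """Pick the parameter set to apply for one listener.

    user_data maps (user_id, device, network, environment) tuples to
    parameter dicts; None in a stored key acts as a wildcard (assumed
    convention). A set explicitly chosen by the user takes precedence.
    """
    if user_choice is not None:
        return user_choice
    # Try the most specific stored set first, then progressively less specific ones.
    candidates = [
        (device, network, environment),
        (device, network, None),
        (device, None, None),
        (None, None, None),
    ]
    for dev, net, env in candidates:
        params = user_data.get((user_id, dev, net, env))
        if params is not None:
            return params
    return {}  # no stored adjustments: leave the speech signal unmodified


def parameters_for_call(user_data, participants):
    """participants: {user_id: (device, network, environment)}.
    Returns the parameter set to apply to the speech sent to each listener."""
    return {uid: select_parameters(user_data, uid, *ctx)
            for uid, ctx in participants.items()}
```

Because each listener's set is resolved independently, two hard of hearing users on the same call can each receive speech processed for their own test results and conditions.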
The foregoing discussion of the invention has been presented for purposes of illustration and description. Further, the description is not intended to limit the invention to the form disclosed herein. Consequently, variations and modifications commensurate with the above teachings, within the skill or knowledge of the relevant art, are within the scope of the present invention. The embodiments described hereinabove are further intended to explain the best mode presently known of practicing the invention and to enable others skilled in the art to utilize the invention in such or in other embodiments and with various modifications required by the particular application or use of the invention. It is intended that the appended claims be construed to include alternative embodiments to the extent permitted by the prior art.