This disclosure relates to automated speech systems and, more particularly, to a system and method for imparting the proper inflection to numbers.
In many automated systems it is necessary to provide spoken numbers under automated control. For example, in an interactive voice response (IVR) system it is necessary for the automated system to “speak” numbers from time to time. Such a response could be, for example, “your balance is 5 dollars and 38 cents.” Usually the response is a number sequence made up of individual strings. An example would be “your account number is 38 4041 256,” which has three strings in the sequence: the first string has a length of 2, the second a length of 4, and the third a length of 3.
Current IVR systems have ten numbers (0-9) prerecorded. In order to create a group of numbers, the prerecorded numbers are concatenated together in the right order. This was acceptable in situations where the user (listener) was inputting numbers using mechanical touch-tones. In such systems, it was expected that any voice response would sound mechanical. However, as systems began to migrate toward speech recognition, users have begun to want the “speech” coming from an automated system to be more conversational, such that the message sounds to them the way a real person would speak.
When a real person says a number string, such as the phone number 972-454-8316, pauses are inserted and each number has an inflection based on where in the string it falls. Concatenated number strings played to a user do not have the proper inflections, and thus such systems are becoming unacceptable.
The present invention is directed to a system and method which begins with a speaker recording strings of numbers in different string lengths. The system takes advantage of the natural speech patterns that occur when numbers are spoken in strings. For instance, social security numbers, phone numbers, and zip codes are spoken in groups, and people naturally say them a certain way. Advantage is taken of the fact that speakers typically break numbers into group sizes of two, three, or four. Thus, by way of example, a recorder (speaker) records two 1s, two 2s, two 3s, etc. Then the recorder records three 1s, three 2s, three 3s, etc., followed by four 1s, four 2s, four 3s, etc. These strings are then broken apart and stored. Advantage is taken of the upward inflection, the middle inflection, and the downward inflection that are imparted to each number depending upon its position in the string as well as the length of the string. When a number string is to be spoken (for example, the number string 782), the system retrieves from the stored three-digit number strings a first 7, a middle 8, and an end 2. Using the systems and methods discussed herein, the proper inflection is achieved for each digit when these number values are transmitted to a recipient.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized that such equivalent constructions do not depart from the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
Turning now to
Note that while the stored sound values are shown in conjunction with IVR 10 (
For a string where L=2 the spoken values are stored in files with names temp2 of 1.wav-temp2 of 0.wav, as shown in Table A2.
For a string where L=3 the spoken values are stored in files with names temp3 of 1.wav-temp3 of 0.wav as shown in Table A3.
For a string where L=4 the spoken values are stored in files with names temp4 of 0.wav as shown in Table A4.
For a string where L=5, the spoken values are stored in files with names temp5 of 0.wav as shown in Table A5.
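The naming scheme for the recorded strings can be sketched as follows. This is a minimal illustration only: it assumes every length L uses ten files named "tempL of d.wav", ordered 1 through 9 and then 0, as the ranges quoted for L=2 and L=3 suggest. The function name and the digit-order constant are hypothetical and do not come from the tables.

```python
# Digit order is an assumption taken from the "1.wav" through "0.wav" ranges in the text.
DIGIT_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]


def recording_filenames(length):
    """Return the ten assumed file names for recorded strings of one length."""
    return [f"temp{length} of {digit}.wav" for digit in DIGIT_ORDER]


if __name__ == "__main__":
    for length in (2, 3, 4, 5):
        names = recording_filenames(length)
        print(f"L={length}: {names[0]} ... {names[-1]}")
```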
Note that the number values for each position within each string are the same (1,1,1; 2,2,2; etc.). This is done for convenience in recording and in further processing. The numbers could be recorded randomly so long as for each string length L there is a first, second, third, etc. value recorded for each digit 0-9.
Process 202 of
Thus, by way of example, looking at the three-digit string 1, 1, 1, there will be a beginning 1, a middle 1, and an ending 1. When the three 1's are recorded, the natural inflection on the first 1 would be an upward inflection, the natural inflection on the middle 1 would be a flat inflection, and the natural inflection on the last 1 would be a falling inflection. Note then that when all ten digits have been recorded as triple digits (a string of 3), the individual number values therein can be interchanged because the string will be cut apart and stored so that there will be a beginning 1, a beginning 2, a beginning 3, etc., a middle 1, a middle 2, a middle 3, etc. and an end 1, an end 2, and an end 3, etc. This would also be true when the string length L=2, 4, or 5, or any desired length.
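One way to picture the result of cutting the recorded strings apart is a lookup keyed by string length, position, and digit. The sketch below is illustrative only: the segment file names are hypothetical placeholders (the actual names are those of Table B), and the rising/flat/falling labels simply generalize the three-digit description above to other lengths, which is an assumption.

```python
def inflection_label(length, position):
    """Label a position as described for the three-digit case (assumed for other lengths)."""
    if position == 1:
        return "rising"
    if position == length:
        return "falling"
    return "flat"


def build_segment_index(lengths=(2, 3, 4, 5)):
    """Map (length, position, digit) to a hypothetical segment file name."""
    index = {}
    for length in lengths:
        for digit in range(10):
            for position in range(1, length + 1):
                # Placeholder name; the real stored names are not reproduced here.
                index[(length, position, digit)] = (
                    f"L{length}_pos{position}_digit{digit}.wav"
                )
    return index


if __name__ == "__main__":
    index = build_segment_index()
    for position in (1, 2, 3):
        key = (3, position, 1)
        print(key, inflection_label(3, position), index[key])
```

Keying on all three values reflects the point made above: a beginning 1 from a three-digit string is a different stored sound from a beginning 1 in a four-digit string.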
Once these values are recorded they can be reused, thereby limiting the number of recordings that must be made to achieve a natural sound for number strings. This system avoids having the recorder record every possible combination in every one of the strings. To do so would require making thousands and thousands of recordings, which is not feasible: it would take too long to record, and it would use a large amount of memory, which is not always available, particularly in portable systems where memory is limited. Moreover, over such a long recording effort the spoken values would not sound consistent, and thus, when the number values are replayed, they would not sound right to a recipient.
Strings of two, three, four, and five are recorded separately because each value has a different inflection for each such string length. A string of three flows differently than does a string of two, a string of four, or a string of five. A string of six, seven, etc., is different still. While it is possible to record strings of two, three, four, five, etc., up to strings of any length, it is not necessary to go beyond a string of five because numbers are most often communicated in strings of two, three, four, or five. Phone numbers, zip codes, credit account numbers, and social security numbers all have a format that is broken into such strings. Even if a customer has a long account number, it is almost always broken into a particular pattern, such as the first four (dash), the next five (dash), etc. Thus, strings greater than five are almost never used. However, if desired, any string length could be used.
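The saving can be made concrete with a short calculation. The sketch below assumes string lengths of two through five and ten repeated-digit recordings per length, as described above; the function names are illustrative.

```python
LENGTHS = (2, 3, 4, 5)  # string lengths assumed to be recorded


def recordings_needed(lengths=LENGTHS):
    """One repeated-digit recording (e.g. 1-1-1) per digit per length."""
    return 10 * len(lengths)


def segments_stored(lengths=LENGTHS):
    """Each recording of length L is cut into L position-specific segments."""
    return sum(10 * length for length in lengths)


def exhaustive_recordings(lengths=LENGTHS):
    """Recordings needed if every possible digit combination were recorded."""
    return sum(10 ** length for length in lengths)


if __name__ == "__main__":
    print("recordings made:     ", recordings_needed())      # 40
    print("segments stored:     ", segments_stored())        # 140
    print("exhaustive recording:", exhaustive_recordings())  # 111100
```

Under these assumptions, 40 recordings yield 140 reusable segments, against 111,100 recordings if every combination of two to five digits were spoken separately.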
Process 302 determines how the string(s) are to be played. For example, in a personal social security number the system must play back a string of three, pause, a string of two, pause, a string of four. For a ten digit phone number (XXX YYY-ZZZZ), the system would require strings of three, three, and four.
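The grouping step can be sketched as a lookup of a format pattern followed by a split of the digits. The format table and function name below are hypothetical; only the 3-2-4 and 3-3-4 patterns come from the description above.

```python
# Hypothetical format table: group lengths for each kind of number.
FORMATS = {
    "ssn": (3, 2, 4),    # social security number
    "phone": (3, 3, 4),  # ten-digit phone number
}


def split_into_strings(digits, group_lengths):
    """Split a run of digits into strings of the given lengths."""
    if sum(group_lengths) != len(digits):
        raise ValueError("group lengths do not match the number of digits")
    strings = []
    start = 0
    for length in group_lengths:
        strings.append(digits[start:start + length])
        start += length
    return strings


if __name__ == "__main__":
    print(split_into_strings("9724548316", FORMATS["phone"]))
    # ['972', '454', '8316']
```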
Process 303 obtains from memory the numbers needed to play to the recipient. For example, assume an account number of 972-8816-54 is to be communicated. Table C shows the stored files that are to be retrieved. Note that the 9 from the first position of the first string is selected from the “three digit” recording as shown on line 01 of Table C. (First recorded and stored as shown in Table A3 and then broken apart and stored as shown in Table B, as discussed above.) The 7 from the second position of the first string is selected from the “three digit” recording as shown on line 02 of Table C, while the 2 from the third position of the first string is selected from the “three digit” recording as shown on line 03 of Table C.
A string of numbers having L=5 (for example, the string 62109), would use the number values 6, 2, 1, 0, and 9.
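The selection for a single string can be sketched as follows. Each digit is paired with the string length and its position, and that triple identifies which stored segment to fetch. The triple representation and the function name are assumptions made for illustration; the actual stored file names are those listed in the tables.

```python
def select_segments(string):
    """Return (length, position, digit) keys for each digit of one string."""
    length = len(string)
    return [
        (length, position, int(char))
        for position, char in enumerate(string, start=1)
    ]


if __name__ == "__main__":
    # The first 6, second 2, third 1, fourth 0, and last 9 of a five-digit string.
    print(select_segments("62109"))
    # [(5, 1, 6), (5, 2, 2), (5, 3, 1), (5, 4, 0), (5, 5, 9)]
```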
Process 304 determines if the string is complete, and process 303 reiterates until all values for the string are obtained. Process 305 determines if the sequence of strings is complete. If not, then the next string (in the example, the four-position string 8816) is retrieved from memory, followed by the next string, which is the two-position string 54. When the values corresponding to all numbers for all strings are available, the values are assembled by process 306 as shown in Table C, with pauses (or other sounds, such as dash, etc.) inserted as shown on lines 04 and 09 of Table C. Once assembled, the sequence of strings is played as shown by process 307.
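The assembly of a full sequence can be sketched as a playlist in which a pause marker separates consecutive strings. This is a minimal illustration under stated assumptions: the pause marker, the (length, position, digit) keys, and the function name are hypothetical stand-ins for the stored files of Table C.

```python
PAUSE = "pause"  # hypothetical marker for the pause (or dash) sound


def assemble_sequence(strings):
    """Build an ordered playlist of segment keys with pauses between strings."""
    playlist = []
    for index, string in enumerate(strings):
        if index > 0:
            playlist.append(PAUSE)
        length = len(string)
        for position, char in enumerate(string, start=1):
            playlist.append((length, position, int(char)))
    return playlist


if __name__ == "__main__":
    # Account number 972-8816-54 from the example above.
    for line_number, entry in enumerate(assemble_sequence(["972", "8816", "54"]), start=1):
        print(f"{line_number:02d}", entry)
```

For the three strings 972, 8816, and 54 this yields eleven entries, with the pause markers falling on the fourth and ninth entries, which matches the pause lines 04 and 09 cited for Table C.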
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.