Speech playback with prebuffered openings

Description

RELATED APPLICATIONS

(Not Applicable)

FEDERALLY SPONSORED RESEARCH

(Not Applicable)

BACKGROUND OF THE INVENTION

This invention relates to speech playback systems and, more particularly, to such systems with reduced latent periods prior to and during playback of pre-recorded speech.

Speech playback systems of different types are available for a variety of applications. The design, operation and capabilities of such systems are well known to skilled persons. A typical application involves informational responses to inquiries provided by callers using an “800” type telephone service.

The responses to callers provided by many such systems are subject to delays or latent periods prior to and during playback of recorded messages. Playback systems may typically have a capacity to store and play thousands of recorded messages (e.g., opening messages and responses to callers' inquiries). Such capacity may represent thousands of hours of recorded speech. Speech (e.g., in digital format) may be stored in active solid state memory directly associated with a computer. However, at the present state of the art, use of such memory is subject to limitations in capacity and economic tradeoffs. Based on both technical and economic considerations, it is typically not practical to provide adequate speech storage capacity in active computer memory. As a result, a separate or associated speech storage facility is relied upon in order to provide adequate speech storage capacity. Such a facility may utilize large capacity electromagnetic or other storage units to which access is provided by a speech data server unit which may be linked to other units of the playback system by a local area network or other communication channel.

Systems of the type described provide adequate capabilities and capacity for speech storage and retrieval. However, the need to retrieve recorded messages from a speech storage facility introduces delays and latent periods prior to and during speech playback. Such delays in initial response and latent periods (e.g., “dead air” gaps) between message portions result from the response times and signal transmission delays inherent in speech retrieval from a typical speech storage facility.

Objects of the present invention are, therefore, to provide new and improved speech playback systems and methods, and such systems and methods having one or more of the following advantages and characteristics:

rapid opening speech playback response;

reduced latent periods during speech playback;

limitation of active computer memory capacity requirements;

improved flow of speech during concatenation; and

economical high-capacity speech storage with rapid opening of speech response.

SUMMARY OF THE INVENTION

In accordance with the invention, a speech playback system, using concatenation with rapid opening to respond to an incoming call from a caller, includes memory to store opening fragments of speech elements and permit rapid retrieval access thereto, and storage to store remaining portions of speech elements and permit retrieval access thereto which is slower than such rapid retrieval access. The system also includes a controller responsive to the incoming call (i) to determine a responsive message, (ii) to cause an opening fragment of a speech element to be retrieved from memory, (iii) to cause a remaining portion of that speech element to be retrieved from storage, and (iv) to cause a message beginning with the opening fragment and continuing with the remaining portion to be provided to the caller.

Typically, the speech playback system will also include a speech playback unit to convert retrieved speech elements into audio signals, the memory may be a solid state computer memory, and a storage facility may include an electromagnetic medium speech storage unit controlled by a speech data server.

Also in accordance with the invention, a speech playback method, using concatenation with rapid opening to respond to an incoming call from a caller, includes the steps of:

(a) storing opening fragments of speech elements to permit rapid retrieval access thereto;

(b) storing remaining portions of speech elements to permit retrieval access thereto which is slower than such rapid retrieval access;

(c) determining a responsive message in response to the incoming call;

(d) retrieving an opening fragment stored in step (a);

(e) initiating action to transmit to the caller a message beginning with the opening fragment retrieved in step (d);

(f) retrieving a remaining portion stored in step (b); and

(g) continuing transmission to the caller of the message initiated in step (e) by transmission of the remaining portion retrieved in step (f).

For a better understanding of the invention, together with other and further objects, reference is made to the accompanying drawings and the scope of the invention will be pointed out in the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech playback system in accordance with the invention.

FIG. 2 shows a multi-unit configuration, utilizing a common speech data server and speech storage facility, in accordance with the invention.

FIG. 3 is a flow chart useful in describing a speech playback method pursuant to the invention.

DESCRIPTION OF THE INVENTION

FIG. 1 illustrates an embodiment of a speech playback system 10 utilizing the invention. As shown, system 10 is arranged to receive calls from an individual caller via unit 12, which may be a telephone instrument, for example. Communication path 14 shown linking unit 12 and system 10 may comprise a public telephone network or other suitable communication facility enabling two-way communication. As represented in FIG. 1, the system will typically be configured to serve a plurality of callers initiating individual calls.

As shown, system 10 includes a controller 20, which may be a suitable microprocessor-based or other computer facility utilizing suitable computer programs, a memory shown as speech memory 22, and a speech playback unit 24. Memory 22, which may comprise on the order of 4 megabytes of active solid state computer memory or other suitable rapid access storage capacity, is shown as positioned internal to controller 20; however, other configurations may be employed. Speech playback unit 24, also shown internal to controller 20 for purposes of example, comprises a suitable unit or capability enabling digital or other signals representative of retrieved speech to be constituted as audio signals appropriate for playback to a caller. The FIG. 1 speech playback system, as shown, also includes speech storage and speech transport facilities 30 and 32, respectively. Speech storage unit 30 may comprise a high capacity electromagnetic medium or other suitable data storage device or multi-unit configuration arranged for storage and retrieval of speech elements. For present purposes a “speech element” is defined as a speech portion of any length as appropriate for use in an application (e.g., a single digit number, a word, a sentence, a paragraph, or any shorter or longer portion of speech). Speech transport 32 may comprise any suitable form of link or path (e.g., a conductor, bus, local area network, etc.) enabling signal transmission between speech storage unit 30 and other portions of the system.

As will be further described, in this embodiment controller 20 is responsive to an incoming call and arranged:

(i) to determine an appropriate message responsive to the incoming call from caller 12;

(ii) to cause an opening fragment of a speech element (e.g., the first 4 kilobytes) to be retrieved from memory 22;

(iii) to cause a remaining portion (e.g., the remainder, if any) of that speech element to be retrieved from speech storage 30; and

(iv) to cause a message beginning with the retrieved opening fragment and continuing with the retrieved remaining portion to be sent to caller 12 (e.g., via playback unit 24).
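The four controller functions (i) through (iv) can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name `Controller`, the dictionary-based `memory` and `storage`, the element identifiers and the toy inquiry lookup in `determine_message` are all hypothetical.

```python
from typing import Dict, Iterator


class Controller:
    """Illustrative stand-in for controller 20 (all names are assumptions)."""

    def __init__(self, memory: Dict[str, bytes], storage: Dict[str, bytes]):
        self.memory = memory    # fast store of opening fragments (memory 22)
        self.storage = storage  # slower store of remaining portions (storage 30)

    def determine_message(self, inquiry: str) -> str:
        # (i) map the caller's inquiry to a speech element id (toy lookup)
        return {"balance": "elem_balance"}.get(inquiry, "elem_default")

    def respond(self, inquiry: str) -> Iterator[bytes]:
        element_id = self.determine_message(inquiry)
        # (ii) the opening fragment is retrieved from fast memory
        yield self.memory[element_id]
        # (iii) the remaining portion, if any, is retrieved from slower storage
        remainder = self.storage.get(element_id, b"")
        # (iv) the message continues with the remaining portion
        if remainder:
            yield remainder
```

A caller of `respond` receives the opening fragment first and the remaining portion second, so playback can begin before the slower retrieval has completed. A short element that fits entirely in its opening fragment simply has no entry in `storage`.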

As will now be appreciated, use of the invention reduces the time required to initiate the playback of the responsive message to the caller. The playback of a message comprising a speech element pre-stored in storage 30 would entail a finite delay while such speech element was retrieved. However, with the opening fragment more rapidly retrieved from active computer memory 22, message playback to the caller is enabled to begin more quickly. Then while the opening fragment is being provided to the caller, the remaining portion of the speech element is moved from storage 30 to memory 22 in time to provide a continuous playback of the complete speech element, or a close approximation of continuous playback, to the caller. At the present state of the development and use of speech playback equipment, it will be apparent that once having an understanding of the invention skilled persons will be capable of implementing the components of the FIG. 1 system, including other and alternative features and capabilities already known for use in such systems.

As discussed above, in a currently preferred configuration the opening fragments stored in memory 22 consist of a fixed portion of each pre-recorded speech element. Thus, for a given application a speech playback system may be supplied with thousands of pre-recorded speech elements. Some elements may be single numerical digits which may by concatenation techniques be assembled into multi-digit numerical responses. Other speech elements may be complete sentences, each responsive to a particular caller inquiry suitable for use by a bank, brokerage house, public utility or other system provider. Each pre-recorded speech element may be of any desired length. For each system implementation, an opening fragment “length” is selected (e.g., the first 4 or 8 kilobytes of each speech element). Then, for each speech element the first 4 kilobytes, for example, of digital data representative of that speech element is stored in memory 22. One half second of audio may typically be represented by 4 kilobytes of digital data. Correspondingly, the remainder of each speech element is stored in speech storage facility 30. Then, when controller 20 determines that a particular speech element (e.g., speech element No. 4,321, of 6,000 pre-recorded speech elements) should be played to a caller, the opening 4 kilobyte fragment of speech element No. 4,321 is retrieved and played from memory 22. Contemporaneously, the remaining portion of speech element No. 4,321 is retrieved from storage 30, placed in memory 22, and by concatenation techniques made to smoothly follow the opening fragment in providing a complete version of speech element No. 4,321 to the caller. Where a response to a caller is to be assembled from a plurality of pre-recorded speech elements (e.g., from individual numbers, words or phrases, or combinations thereof) intra-message latent periods or “dead air” gaps are similarly avoided by rapid openings using opening fragments from active computer memory, followed by concatenated remaining portions, for a succession of speech elements retrieved from a speech storage facility. Where a very short speech element is involved, the opening fragment may comprise the entire speech element with no associated remaining portion. More generally, however, a speech element will typically be long enough so that both an opening fragment and a remaining portion will be involved.
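The sizing figures in the preceding paragraph imply a simple budget, sketched below. The function names, the choice of 1 kilobyte = 1024 bytes and the element count used in the note that follows are assumptions for illustration; the 4-kilobyte and 8-kilobyte fragment lengths and the half-second figure are the examples given in the text.

```python
KILOBYTE = 1024  # assumption: binary kilobytes


def bytes_per_second(fragment_kb: float = 4, fragment_seconds: float = 0.5) -> float:
    """Data rate implied by '4 kilobytes is about one half second of audio'."""
    return fragment_kb * KILOBYTE / fragment_seconds


def fragment_memory_bytes(num_elements: int, fragment_kb: int = 4) -> int:
    """Upper bound on fast memory used if every element stores a full fragment."""
    return num_elements * fragment_kb * KILOBYTE


def cover_seconds(fragment_kb: float, rate: float) -> float:
    """Playback time during which the remaining portion can be retrieved."""
    return fragment_kb * KILOBYTE / rate
```

Under these assumptions the implied data rate is 8192 bytes per second, so an 8-kilobyte fragment gives roughly one second in which to retrieve the remaining portion. As an illustrative count, 1,000 full 4-kilobyte fragments occupy 4,096,000 bytes, which is of the same order as the 4 megabytes mentioned for memory 22.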

Referring now to FIG. 2, there is illustrated a speech playback system 10a, which includes two controllers 20a and 20b coupled to speech storage facility 30 via speech transport 32 (e.g., a LAN) and speech data server 34. Controllers 20a and 20b may each be a unit as described with reference to controller 20 of FIG. 1. There are thus effectively two systems (systems 1 and 2) with shared speech storage facilities. In this configuration, a larger volume of activity is enabled by inclusion of speech data server 34, which is a computer-based system based on known techniques to supply recorded speech data to a plurality of controllers.

Operational understanding of the invention will be enhanced by consideration of a speech playback method pursuant to the invention. An exemplary method as illustrated in FIG. 3 includes the following steps. Initially, a collection of pre-recorded speech elements suitable for responding to incoming calls in a particular application (e.g., response to brokerage customers) is provided. Such collection may include a large number of sentences, phrases, words and spoken numbers usable to form appropriate responses by use of concatenation techniques.

At 40, opening fragments of speech elements are stored in memory 22 of FIG. 1, to permit rapid access to such fragments. A fragment for this purpose may comprise the first 4 or 8 kilobytes of digital data representing a speech element. In some applications, the length of the opening fragments stored in memory 22 may not be uniform. For example, short words or individual numerical digits may be treated as opening fragments and stored in their entirety in active computer memory 22.

At 41, remaining portions of speech elements are stored in speech storage facility 30. For example, if a speech element is a sentence and the first 4 kilobytes of recorded speech data has been stored in memory 22, the speech data for the remainder of the sentence is stored in storage 30. As will be appreciated, memory 22 provides rapid retrieval access, while storage 30 provides retrieval access which is slower than the rapid retrieval access of memory 22.
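The two storage steps at 40 and 41 amount to splitting each speech element at the chosen fragment length. The sketch below illustrates that split; the function names and the use of dictionaries keyed by element number are assumptions, and the 4096-byte default stands in for the 4-kilobyte example in the text.

```python
from typing import Dict, Tuple


def split_element(data: bytes, fragment_len: int = 4096) -> Tuple[bytes, bytes]:
    """Return (opening fragment, remaining portion) for one speech element."""
    return data[:fragment_len], data[fragment_len:]


def store_elements(
    elements: Dict[int, bytes], fragment_len: int = 4096
) -> Tuple[Dict[int, bytes], Dict[int, bytes]]:
    """Build the fast-memory and slow-storage contents for a collection."""
    memory: Dict[int, bytes] = {}   # plays the role of memory 22
    storage: Dict[int, bytes] = {}  # plays the role of storage 30
    for element_id, data in elements.items():
        opening, remainder = split_element(data, fragment_len)
        memory[element_id] = opening
        # A short element is held entirely in memory and has no remainder.
        if remainder:
            storage[element_id] = remainder
    return memory, storage
```

An element shorter than the fragment length ends up wholly in `memory`, which matches the treatment of short words and single digits described at 40.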

At 42, controller 20 determines the appropriate make-up of a message which will be responsive to an incoming call. By application of appropriate software in known manner, controller 20 utilizes whatever associated caller data is available relative to a particular incoming call to determine the content of a responsive message.

At 43, controller 20 causes an appropriate opening fragment of a speech element, to be used in providing the responsive message, to be retrieved from memory 22.

At 44, controller 20 initiates action to transmit to the caller a message beginning with the retrieved opening fragment.

At 45, controller 20 causes the remaining portion of the speech element to be retrieved from storage 30.

At 46, controller 20 causes transmission of the message to the caller to be continued with the retrieved remaining portion by use of concatenation techniques, to provide a composite message intelligible to the caller. In providing the message to the caller, retrieved speech data in digital form is converted into audio signals via speech playback unit 24.

At 47, steps 42 through 46 are repeated as appropriate to respond to further inquiries from the caller.

It should be understood that the above steps are not necessarily executed strictly in order. For example, retrieval of the remaining portion may be initiated before or concurrently with action to transmit the opening fragment. In any event, it is normally an objective that, as transmission of the opening fragment is completed, the speech data for the remaining portion is available for use in providing what will be perceived by the caller as a continuous speech element.
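One way to overlap the two retrievals, as the preceding paragraph allows, is to start the slower retrieval on a worker thread before the opening fragment is transmitted. The sketch below is an illustration only: `play_element`, `demo`, the thread pool and the sleep that simulates storage latency are assumptions, not details taken from the description.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Mapping


def play_element(
    element_id: int,
    memory: Mapping[int, bytes],
    fetch_remainder: Callable[[int], bytes],
    transmit: Callable[[bytes], None],
    executor: Executor,
) -> None:
    """Transmit one speech element, overlapping slow retrieval with playback."""
    # Start the slow retrieval first so it runs while the opening plays.
    pending = executor.submit(fetch_remainder, element_id)
    transmit(memory[element_id])   # opening fragment from fast memory
    remainder = pending.result()   # waits only if storage is still busy
    if remainder:
        transmit(remainder)


def demo() -> List[bytes]:
    """Toy run in which a short sleep stands in for storage and network delay."""
    memory = {1: b"OPEN"}
    storage = {1: b"-REST"}
    sent: List[bytes] = []

    def slow_fetch(element_id: int) -> bytes:
        time.sleep(0.01)  # simulated retrieval latency
        return storage.get(element_id, b"")

    with ThreadPoolExecutor(max_workers=1) as pool:
        play_element(1, memory, slow_fetch, sent.append, pool)
    return sent
```

If the retrieval finishes before the opening fragment has been transmitted, `pending.result()` returns at once and the caller hears no gap; otherwise the wait at that call corresponds to the residual latent period.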

Each responsive message may comprise one or a plurality of speech elements. In the case of a plurality, as the transmission of the first speech element is completed, the process of opening fragment and remaining portion retrieval for the next speech element can be implemented, so that extensive messages can be provided with reduction of both initial and intra-message latent periods.
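For a message built from several speech elements, the same pattern can be repeated element by element. The sketch below assumes a hypothetical `stream_message` generator and a caller-supplied `fetch_remainder` function. As a design choice of this example, it requests the next element's remaining portion slightly before the current element has finished, which is one possible scheduling and not one prescribed by the description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Mapping, Sequence


def stream_message(
    element_ids: Sequence[int],
    memory: Mapping[int, bytes],
    fetch_remainder: Callable[[int], bytes],
    max_workers: int = 2,
) -> Iterator[bytes]:
    """Yield speech data for a message made of several speech elements."""
    if not element_ids:
        return
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Begin retrieving the first remaining portion immediately.
        pending = pool.submit(fetch_remainder, element_ids[0])
        for index, element_id in enumerate(element_ids):
            yield memory[element_id]      # opening fragment from fast memory
            remainder = pending.result()  # remaining portion of this element
            # Request the next element's remainder before playing this one's.
            nxt = index + 1
            if nxt < len(element_ids):
                pending = pool.submit(fetch_remainder, element_ids[nxt])
            if remainder:
                yield remainder
```

Each opening fragment is available without delay, and each remaining portion has at least the playback time of the preceding data in which to arrive, which is how both initial and intra-message latent periods are reduced in this sketch.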

While there have been described the currently preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made without departing from the invention and it is intended to claim all modifications and variations as fall within the scope of the invention.

Claims

1. A speech playback system, using concatenation with rapid opening to respond to an incoming call from a caller, comprising: memory to store opening fragments of speech elements and permit rapid retrieval access thereto; storage to store remaining portions of speech elements and permit retrieval access thereto which is slower than said rapid retrieval access; and a controller responsive to said incoming call (i) to determine a responsive message, (ii) to cause an opening fragment of a speech element to be retrieved from said memory, (iii) to cause a remaining portion of said speech element to be retrieved from said storage, and (iv) to cause a message beginning with said opening fragment and continuing with said remaining portion to be provided to said caller.
2. A speech playback system as in claim 1, wherein said memory is a solid state computer memory.
3. A speech playback system as in claim 1, wherein said storage includes an electromagnetic medium speech storage unit controlled by a speech data server.
4. A speech playback system as in claim 1, wherein said controller is a computer utilizing suitable computer programs.
5. A speech playback system as in claim 1, additionally comprising: a speech playback unit arranged to convert retrieved opening fragments and remaining portions of speech elements into audio signals for transmission to the caller.
6. A speech playback system as in claim 5, additionally including a speech transport arranged to couple remaining portions of speech elements retrieved from the speech storage unit to the speech playback unit.
7. A speech playback system as in claim 1, wherein the controller is arranged in clause (iii) to cause said remaining portion after retrieval from said storage to be temporarily stored in said memory, prior to causing the message to be provided to said caller.
8. A speech playback method, using concatenation with rapid opening to respond to an incoming call from a caller, comprising the steps of: (a) storing opening fragments of speech elements to permit rapid retrieval access thereto; (b) storing remaining portions of speech elements to permit retrieval access thereto which is slower than said rapid retrieval access; (c) determining a responsive message in response to said incoming call; (d) retrieving an opening fragment stored in step (a); (e) initiating action to transmit to said caller a message beginning with said opening fragment retrieved in step (d); (f) retrieving a remaining portion stored in step (b); and (g) continuing transmission to said caller of the message initiated in step (e) by transmission of said remaining portion retrieved in step (f).
9. A method as in claim 8, wherein steps (e) and (g) respectively include converting retrieved opening fragments and remaining portions into audio signals for transmission to the caller.
10. A method as in claim 8, wherein step (a) includes storing opening fragments in a solid state computer memory.
11. A method as in claim 10, wherein step (b) includes storing said remaining portions in a speech storage facility separate from the memory utilized in step (a).
12. A method as in claim 8, wherein step (a) includes storage in a solid state computer memory and step (f) includes temporarily storing said remaining portion in said computer memory prior to its transmission to said caller in step (g).
13. A speech playback method, using concatenation with rapid opening to respond to an incoming call from a caller, comprising the steps of: (a) storing opening fragments of speech elements to permit rapid retrieval access thereto; (b) storing remaining portions of speech elements to permit retrieval access thereto which is slower than said rapid retrieval access; (c) determining a responsive message in response to said incoming call; (d) retrieving an opening fragment stored in step (a); (e) retrieving a remaining portion stored in step (b); and (f) transmitting to said caller a message beginning with said opening fragment retrieved in step (d) and continuing with said remaining portion retrieved in step (e).
14. A method as in claim 13, wherein step (f) includes converting retrieved opening fragments and remaining portions into audio signals for transmission to the caller.
15. A method as in claim 13, wherein step (a) includes storing opening fragments in a solid state computer memory.
16. A method as in claim 15, wherein step (b) includes storing said remaining portions in a speech storage facility separate from the memory utilized in step (a).
17. A method as in claim 13, wherein step (a) includes storage in a solid state computer memory and step (e) includes temporarily storing said remaining portion in said computer memory prior to its transmittal to said caller in step (f).
