This invention relates to an applications server operable to provide a user driven service in accordance with an application program. The invention also relates to a method for providing a user driven service, the service being provided in response to user commands for selecting service options. The invention also relates to an application program operable to provide a user driven service in response to user commands for selecting service options.
Services provided on an applications server may be accessed by a user in response to user commands issued by the user. The services may be provided over a network, for instance a mobile network including a server, and could include, for example, services such as initiating a telephone call, retrieving voicemail or sending and retrieving text or picture messages. User commands may take a number of different forms. For instance, users may be able to issue a command by pressing a button or a series of buttons on a keypad of a user terminal such as a mobile telephone. Alternatively, the user may be able to issue a command by navigating and selecting menu items on a graphical user interface of a user terminal, or by providing a voice command. The services may be accessed using a set of dialogs conducted between a user and an application program provided on an applications server. The applications server may communicate with the user via a set of audio prompts tailored to the information required from the user. The user can, in response to these prompts, supply the applications server with commands.
According to a first aspect of the invention, there is provided a speech applications server operable to provide a user driven service in accordance with an application program. The application program is arranged to provide the service in response to user commands for selecting service options, the user commands being prompted by audio prompts. The application program comprises a state machine operable by a state machine engine to determine a state of the application program from one of a predetermined set of states defining a logical procedure through the user selected service options, transitions between states being determined in accordance with logical conditions to be satisfied in order to change between one state of the set and another state of the set. The logical conditions include whether a user has provided one of a set of possible commands. The application program further comprises a set of prompt selection rules operable by a prompt selection engine to generate the audio prompts for prompting the commands from the user in accordance with predetermined rules. The prompt selected by the prompt selection engine is determined at run-time, and the state machine of the application program is defined separately from the prompt selection rule set, to the effect that a change can be made to the prompt selection rule set, which defines a dialogue generated by the prompt selection engine for the user driven service, independently of the operation of the state machine.
In accordance with this first aspect, by providing that the state machine and the prompt selection rule set are separate entities, it is possible to effect a change to the set of rules defining a dialogue from the prompt selection engine for a particular service in a manner which is independent from the operation of the state machine. That is, different customisations can be provided by different sets of rules for defining the dialogues, which are applied to the prompt selection engine to be used in providing services to different users in accordance with their needs, without requiring a correspondingly customised state machine to be provided. For example, users of the service within different countries or localities may be provided with specific audio prompts from a set of rules for the prompt selection engine, which have a specific customisation tailored to the local languages and dialects of the users of the service within that locality. The predetermined rules used by the prompt selection engine may simply be a one-to-one mapping of a state determined by the state machine to a given voice prompt, or alternatively a given state determined by the state machine may correspond to a number of possible prompts, the actual prompt chosen being selected on the basis of the predetermined rules.
Providing a separate prompt selection rule set is advantageous compared to an alternative approach in which a customisation of the service simply involves recording a new set of audio prompt files, which may not be sufficient for an alternative language customisation or other complex customisation. Further, the present invention is preferable to an alternative approach in which a customisation of the service involves providing a modified state machine for each customisation. This results in onerous maintenance requirements, since any changes in service logic (e.g. dialog flow, addition of new dialogs or bug fixes) require developers to apply the changes to every customisation of the service. Further, customers are unlikely to be allowed access to the state machine of the service, and will therefore be unable to create customisations themselves.
In contrast, embodiments of the invention allow customisations to be developed without any alteration to the service logic, so there can be a single code base for the service. Customisations of the service can be deployed (or removed) without redeploying the service and can be developed and deployed independently from each other. Thus customisations will not introduce bugs to the service itself or to existing customisations. In addition, because customisation development is separate from service development, an operator (or other customer) may be provided with the capability to create its own customisations, with prompt selection reflecting brand values or any other criteria they desire without having to wait for their service vendor to release a new version of the service.
The present approach stands in contrast to the typical practice for localising non-speech applications using message catalogues. Message catalogues contain all the text strings used by an application. These text strings are extracted from the application and replaced with index keys, each of which points to the message catalogue entry containing the extracted text. Creating a new localisation for an application is then a matter of creating a new message catalogue. This is analogous to a speech application in which the collection of prompt audio recordings is replaced. While a one-for-one substitution of audio prompt files is possible with this traditional approach, a change in the dialogue format and structure is not achievable.
According to one embodiment of the invention, the speech applications server comprises a command recognition engine. The command recognition engine includes a speech recogniser which is operable to provide the command recognition engine with a set of possible user commands which may be received from the user for changing from a current one of the predetermined set of states to another of the states to which the state machine may change. The command recognition engine is operable to analyse the user commands and the possible commands provided by the speech recogniser to provide the state machine engine with an estimate of one of the possible commands which the user provided. The state machine engine is operable to change state in response to the estimated user command.
According to this embodiment, a set of possible user commands which can be recognised and acted upon by the server is specified by the grammar rules and is used by the command recognition engine in a process of identifying possible commands which are deemed a likely match to the user inputted commands. The set of commands issued to the command recognition engine acts as a constraint on the number and type of user command estimates that can be provided by the command recognition engine and focuses the task of the command recognition engine on relevant commands only. Either a single user command estimate may be provided, or alternatively a plurality of user command estimates may be provided. The state machine engine is operable to use these estimates to determine an appropriate state transition.
In addition to providing an estimate of the user command, the speech recogniser may also be operable to provide confidence levels corresponding to each of the command estimates, the confidence levels indicating how likely it is, in the estimation of the speech recogniser, that a user command estimate matches the inputted user command. In this case, the state machine engine determines a change of state from, for example, the estimated user command in combination with the determined confidence level.
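By way of illustration, the combination of an estimated command and a confidence level might be applied to state selection as in the following sketch. The class, the method names, the transition table and the 0.9 threshold are all hypothetical assumptions for illustration, not taken from any particular implementation:

```java
import java.util.Map;

// Hypothetical sketch of confidence-gated state selection; the class,
// method names and transition table are illustrative assumptions.
public class TransitionSelector {

    // Threshold above which a recognised command is acted upon directly;
    // the value 0.9 is an assumed example.
    static final double CONFIDENCE_THRESHOLD = 0.9;

    /**
     * Chooses the next state from the estimated user command and the
     * confidence level reported by the speech recogniser. A low-confidence
     * estimate routes to a confirmation state rather than acting directly.
     */
    public static String nextState(String estimatedCommand, double confidence,
                                   Map<String, String> transitions) {
        if (!transitions.containsKey(estimatedCommand)) {
            return "Main"; // unrecognised command: remain at the main state
        }
        if (confidence < CONFIDENCE_THRESHOLD) {
            return "ConfirmCall"; // low confidence: confirm with the user first
        }
        return transitions.get(estimatedCommand);
    }
}
```

In such an arrangement the transition table itself would be derived from the state machine, while the confidence handling remains a property of the state machine engine.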
According to another embodiment, the server is arranged to accept voice commands from the user, and in one example, all communications between the server and the user, both in terms of prompts from the server and commands from the user, may be carried out via speech dialog, advantageously providing a fluid hands-free service to the user. However, it will be appreciated that in the case of some command types for controlling the application program, spoken commands may be either unsuitable, or less expedient than non-spoken commands, such as for example providing “dialled” commands using DTMF tones. In other examples, a combination of spoken and non-spoken commands may be adopted.
According to another embodiment of the invention, the application program is operable to generate a mark-up language page in accordance with a current state of the application program as determined by the state machine, the mark-up language page including universal resource locators (URLs) defining a location for data files providing the audio prompts. The URLs may also specify grammar files, each grammar file providing a set of possible commands for the command recognition engine, some of which may be generated dynamically, whilst others may exist statically. In one example the mark-up language is VoiceXML. The use of a mark-up language to define the prompts is particularly advantageous in the context of a web-server based system.
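A minimal VoiceXML fragment of the kind such a mark-up language page might contain is sketched below. The element structure follows the VoiceXML specification, but the URLs, file names and form identifier are purely illustrative assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="MainCall">
    <field name="place">
      <!-- Audio prompt and grammar are fetched via URLs; paths are illustrative -->
      <prompt>
        <audio src="http://server/media/prompts/which_place.wav"/>
      </prompt>
      <grammar src="http://server/grammars/places.grxml"
               type="application/srgs+xml"/>
      <filled>
        <!-- The recognised value is submitted back as a page request -->
        <submit next="http://server/page" namelist="place"/>
      </filled>
    </field>
  </form>
</vxml>
```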
According to other embodiments, one or more of the state machines, the prompt selection rule set and the command recognition grammars may be defined using mark-up languages.
Various further aspects and features of the present inventions are defined in the appended claims. Other aspects of the invention include a speech application system, a speech application method and an application program.
Embodiments of the present invention will now be described by way of example only with reference to the accompanying drawings where like parts are provided with corresponding reference numerals and in which:
An example embodiment of the present invention will now be described with reference to a voice-activated service.
Embodiments of the present invention provide a facility for an audio based service, which in some examples allows a user to voice activate a service. The voice activation of the service is effected by providing voice commands in response to audio prompts for user command options. However in other examples the user commands may be provided by DTMF tones.
A diagram providing a more detailed representation of one possible run-time implementation of the speech applications server of
The applications server 10 is arranged to provide a platform for running application programs for providing voice activated services to users. According to the present technique, the application program separates the rules for prompt selection from the service logic defining states of the application program, such states implementing the tasks to be performed for the user.
A set of rules run by the prompt selection engine define prompts to be generated for the user. The user responds to the prompts by uttering commands to specify the task to be performed by the service logic. An operative association between the state machine engine and the prompt selection engine is made at run time, so that the prompts to be generated for a particular state are established at run-time. As such the application program when executing on the applications server may be considered as comprising:
As shown in
As will be explained for each of the states of the application program defined by the state machine, certain actions are to be performed in accordance with the session state. A session state manager 106 is therefore arranged to access a data access layer 112 which provides to the user a facility for performing tasks in accordance with the state in the application program which has been reached. The data access layer 112 may handle certain events and may access external resources such as email, SMS or Instant Messaging via an external gateway 110. The data access layer 112 may also receive external events from the external gateway 110 and forward these to the session state manager 106.
The data access layer 112 provides a facility for retrieving data from databases and other data stores. The data access layer 112 is provided with access to data stored in a database 114 and may also be provided with access to XML data resources and other data stores such as:
As mentioned above, the application program also includes a prompt selection engine 120 for selecting audio prompts for communication to the user. The audio prompts are selected by the prompt selection engine from media 122 via a media locator 115. The media resources are identified by Universal Resource Locators (URLs) which identify, amongst other things, prompts in the form of audio files 124 which are accessible by the data access layer 112. The data access layer 112 also provides access to a command recognition engine 126 which is arranged to process commands received from the user and to generate a confidence score indicating how confident the command recognition engine 126 is that a particular command has been issued.
The confidence scores are passed to the service logic for determining whether a logical condition for changing between one state and another has been satisfied. The data access layer 112 also provides a facility for the user to provide information in accordance with the service being provided. For example, recordings made by the user may be stored by the data access layer 112 in a recordings repository 128. In addition, spoken commands generated by the user may be stored in an utterances data store 130.
The application program also includes a presentation generator 132 which is arranged to receive data for presentation to the user from the session state manager 106 and the prompt selection engine 120. The presentation generator 132 is arranged to form data for presentation to the user, the data being deployed by the web server 100. In one example, the data for presentation to the user is in the form of a VoiceXML page which may include one or more URLs to data objects such as audio prompt files 124.
The state machine 104 of the application program is arranged to ensure that the input handling processor 102 and the presentation generator 132 are maintained in a corresponding one of the predetermined states of the application program with respect to which particular actions are performed. The state of the application program is determined for the state machine 104 by the state machine engine 134.
The web server 100 includes a page request servlet 100.2 and a media request servlet 100.4. The page request servlet 100.2 is arranged to formulate a VoiceXML page for communication to the telephony platform 30 in accordance with data received from the presentation generator 132. The telephony platform 30 interprets the received VoiceXML page in accordance with what is specified in the VoiceXML page. The telephony platform 30 accesses the media request servlet 100.4 to obtain media data 122 in response to the VoiceXML page. The VoiceXML page may include one or more URLs, which access the media data 122 via the data access layer 112. The web server 100 also receives page requests from the telephony platform 30, in response to <submit> or <goto> elements in the VoiceXML page, which are processed by the web server 100 and returned in the form of VoiceXML pages.
As explained above, examples of the present technique provide a facility for separating service logic which defines the states of the application program for providing tasks to the user from prompt selection rules and in some examples also from the user commands which are recognised by the user command recogniser 126. As illustrated in
As a result, a particular advantage is provided by the specification and execution of the application program in that a user command driven service may be adapted to different audio prompts in accordance with preferences of the user. For example, the user may receive audio prompts in the form of a female voice rather than a male voice if the user so prefers. Similarly, in some implementations, the service may be adapted to different languages. Accordingly, by separating the state machine defining the service logic from the prompt selection rules, the same service may be deployed in different countries by simply replacing the audio prompt recordings, adapting the prompt selection rules and adapting the user command recogniser 126.
Such an arrangement is particularly advantageous when applied to language customisation because tailoring the user interface to a particular language does not always simply involve substituting a set of prompt recordings of one language with a set of prompt recordings of an alternative language. For instance, to create a French language version of an existing English language service, the simplest course of action would be to translate each of the English prompts into French. However, this approach will not provide a high quality French language user interface because grammatical differences between the two languages sometimes dictate different sentence structures for communicating the same concept. A prompt sequence may include a mixture of static and dynamic prompts, for example “You have six messages, of which two are new. The first message is from Fred Smith”, where the terms “six”, “two” and “Fred Smith” are dynamic. The full prompt is therefore composed of at least six portions, of which three are static and three are dynamic. In an alternative language these prompt sections may preferably be arranged in a different order, or alternatively a different prompt sequence may be used, depending on stylistic reasons or depending on differences in the syntactical rules of different languages. Clearly, a direct substitution of prompt recordings is unable to address these issues.
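One way such reordering can be accommodated is for each language customisation to supply its own prompt template, fixing the segment order for that language, with the dynamic portions filled in at run time. The following sketch assumes a hypothetical brace-placeholder convention; the class and method names are likewise invented for illustration:

```java
import java.util.Map;

// Illustrative sketch: each language customisation supplies its own
// template, so the order of static and dynamic segments can differ per
// language. The brace-placeholder convention is a hypothetical choice.
public class PromptComposer {

    /** Fill the dynamic slots of a language-specific prompt template. */
    public static String compose(String template, Map<String, String> slots) {
        String out = template;
        for (Map.Entry<String, String> e : slots.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }
}
```

Because the template is part of the customisation rather than the service logic, a French customisation is free to order the same slots differently from the English one.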
In order to illustrate advantages provided by embodiments of the present invention, an example service driven by voice activated user commands will now be explained with reference to
Call: <place> 206, this state is reached if the user has uttered “call” followed by a place to which he wishes to place a call, the place having been recognised with a confidence value of less than 0.9.
Call: <place> 208, this state corresponds to the call: <place> state 206 except that the confidence level returned by the user command recogniser 126 is greater than or equal to 0.9.
Call: <person> 210, this state is reached from the main state if the user uttered the word “call” followed by the person to be called where the command recogniser 126 has returned a confidence level for the confidence of detecting the person to be called of greater than or equal to 0.9 and where the person to be called has only one number, for instance a home number or a work number.
Call: <person> 212, this state is reached from the main state if the user uttered the word “call” followed by the person to be called where the command recogniser 126 has returned a confidence level for the confidence of detecting the person to be called of greater than or equal to 0.9 and where the person to be called has more than one number, for instance both a home number and a work number.
Call: <person> <place> state 214, this state is reached from the main state if the user uttered the word “call” followed by the name of a person and the name of a place where the confidence level for both the person and the place is less than 0.9.
For the example illustrated in
For the example of where the call: <place> state 206 was reached, represented as the second row 206.1, the suggested prompt would request whether the specified place was correct because the confidence that the place was recognised returned by the command recognition engine 126 was less than 0.9. Accordingly the transition setup would be to go to the “ConfirmCall” dialogue state. In contrast, if the state call: <place> 208 had been reached, represented by the third row 208.1, then because the place was recognised with greater than or equal to 0.9 confidence level, the suggested prompt would inform the user that the call was being placed to <place>, an action would be performed to initiate the call and the transition setup would be to go to the “Background” dialogue state. The background state is a state in which the application program is idle except for monitoring whether the user expresses a “wakeup” word.
For the example of where the call: <person> state 210 was reached, represented as the fourth row 210.1, the suggested prompt informs the user that the call is being placed to <person>, the action is to initiate the call to the person, and the next state is Background, because the person has been recognised with a confidence score of greater than or equal to 0.9 and there is only one available number for that person. In contrast, where the call: <person> state 212 was reached, represented as the fifth row 212.1, the suggested prompt asks the user which of the <person>'s places to call, and the slot to be filled is <Place>, because there is more than one number, corresponding to a particular place, associated with that person, and the particular place has not been specified. Accordingly, the transition setup specifies that a grammar for recognising place names should be made active.
As explained above, an advantage provided by the present technique is that prompt generation and selection is separate from the state logic of the application program. Accordingly the prompt suggestions represented in the fourth column 228 of
According to the present technique the states of the application program and the transitions between those states are expressed and specified in the form of a mark-up language which has been designed and developed in order to specify the states of an application program. From the state specification described by the mark-up language, code is generated which, when executed at run time, forms the state machine 104 shown in
The service logic, as described by a dialog flow, may be implemented using an XML-based service description language. Prompt selection logic on the other hand is specified separately from dialog flow, and may also be implemented using an XML-based language. The service logic in this case is specified as a set of dialogs, or forms, each of which has a set of slots, which can be filled. For example, a dialog for composing a voice message might have a slot for the name of a recipient and another slot for the importance of the message. Within each form, the mark-up language describes a set of situations, where each situation represents a different combination of slots to be filled within the form, along with possible events that may occur during the execution of the form. For each situation there may be a prompt to be played, an action that may be performed, and a transition to a new state of the service logic that may take place. Therefore, for the example illustrated in
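A hypothetical fragment of such an XML-based service description is sketched below. The FORM and SITUATION element names follow the structure described above, but the attributes and child elements are invented for illustration:

```xml
<!-- Illustrative service-description fragment; attribute and child
     element names are assumptions, not a definitive schema. -->
<FORM id="MainCall">
  <SLOT name="Person"/>
  <SLOT name="Place"/>
  <SITUATION id="CallPerson.1">
    <!-- Person slot filled with high confidence; person has one number -->
    <ACTION><INVOKE method="initiateCall"/></ACTION>
    <PROMPT ref="CallingPerson"/>
    <TRANSITION><GOTO form="Background"/></TRANSITION>
  </SITUATION>
</FORM>
```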
The prompt selection portion of a service customisation may be specified as a set of rules, there being one or more rules for each situation that appears in the dialog of the corresponding service. Each rule may consist of a situation identifier (as illustrated for example in
Within the FORM there is a SITUATION element identified by the situation ID “CallPersonPlace.1B” 502. The table shown in
Within the SITUATION element there is a set of logical CONDITIONS for executing the form state call: <person> <place>, which are provided by the logical operators within the ALLOF element 506. However, in order to perform an ACTION required by the class, certain preconditions have to be satisfied. These preconditions are defined by a PREDICATE element 508. The PREDICATE element 508 invokes a command 510 which determines whether the person has a contact number at the recognised place, as identified by the person and place arguments. If the person does have a number at the given place, then the PREDICATE is evaluated as true and the state proceeds to the ACTION element 512 in order to execute the action concerned. For the present example, this is to call the person identified by the voice dialling package. Within the ACTION element there is provided a DIARY command to determine whether the user has called the contact specified by person and place before. The diary is used to record a user's interactions with the application program, which in turn may be used by the service and prompt selection logic to influence the manner of future interactions. That is, the diary can be used to select a different type of prompt in dependence upon previous actions taken by the user. This is provided by a DIARY command 514 and a command 516 which retrieves the number at the place.
A further action is also taken, using the commands provided by an INVOKE element 518, to update the number of times that voice dialling has been executed for the given call. At step 520 a PROMPT is generated by requesting the prompt selection engine to produce a prompt indicating that the application program is about to call a number. Thereafter a TRANSITION element 522 is reached, which includes a command GOTO “ready to dial” indicating the transition to the ready to dial state, as is correspondingly reflected in
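Drawing the elements of this walkthrough together, the situation described above might be expressed in the service description mark-up language along the following lines. This is a speculative reconstruction for illustration; the element, attribute and method names are assumptions:

```xml
<!-- Speculative reconstruction of the CallPersonPlace.1B situation;
     all names below are illustrative assumptions. -->
<SITUATION id="CallPersonPlace.1B">
  <CONDITIONS>
    <ALLOF>
      <FILLED slot="Person"/>
      <FILLED slot="Place"/>
    </ALLOF>
  </CONDITIONS>
  <PREDICATE>
    <INVOKE method="personHasNumberAt" args="Person Place"/>
  </PREDICATE>
  <ACTION>
    <DIARY record="calledBefore" args="Person Place"/>
    <INVOKE method="getNumberAtPlace" args="Person Place"/>
    <INVOKE method="incrementVoiceDialCount"/>
  </ACTION>
  <PROMPT ref="AboutToCall"/>
  <TRANSITION><GOTO state="ReadyToDial"/></TRANSITION>
</SITUATION>
```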
As mentioned above, the prompt generation and prompt selection is separated from the state specification.
Corresponding examples are given for each of the situation IDs provided in the first column 250 which corresponds to situation IDs presented in the third column of
According to the present technique the prompt selection and rules pertaining to the prompt selection are specified in accordance with a situation based prompt mark-up language. The situation based prompt mark-up language defines a prompt to be selected for a given state (situation ID) of the application program in accordance with conditions for selecting that prompt as defined by the prompt rules. From the situation based mark-up language, code is generated, such as Java code, for execution at run time by the prompt selection engine 120 shown in
An example sequence for invoking prompt selection at run time may include the following steps:
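Although the individual steps of the sequence are not reproduced here, the overall shape of a run-time prompt selection call, in which the state machine supplies a situation identifier and the prompt selection engine resolves it to an audio prompt, might be sketched as follows. All class and method names, and the URL convention, are invented for illustration:

```java
import java.util.Map;

// Hypothetical sketch of run-time prompt selection: the state machine
// supplies a situation identifier and the prompt selection engine
// resolves it to the URL of an audio file. All names are invented.
public class PromptSelectionEngine {

    private final Map<String, String> rules; // situation ID -> prompt URL

    public PromptSelectionEngine(Map<String, String> rules) {
        this.rules = rules;
    }

    /** Resolve the prompt for the current situation, with a default fallback. */
    public String selectPrompt(String situationId) {
        return rules.getOrDefault(situationId, "prompts/default.wav");
    }
}
```

In a fuller implementation the rules would be the generated prompt selection classes rather than a flat map, and could consult slot values and the diary before choosing among candidate prompts.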
The application program may comprise only a single service, or may comprise a collection of services. In general, each service will deliver some set of related features to a user. In either case, the service or services provided by the application program may include customisations. As described above with reference to
The subscribed services 620 will include at least a base service which represents the minimum basic functionality of the application program, and may also include one or more additional services representing additional functionality. The subscribed services 620 can be separately installed, removed and customised prior to run-time, and during run-time will determine a coherent set of service logic.
The voice dialling service 720 includes a “MainCall” form 722 and a “Base.TopLevel” form 724. The MainCall form 722 includes a group of states 726 for enabling a user to initiate a voice call to a third party. The Base.TopLevel form includes a group of states 728 to be combined with the TopLevel form states 718 of the base service 710. In other words, the Base.TopLevel form 724 constitutes a modification to a form within another service, in this case the TopLevel form 714 within the base service 710. The forms and corresponding situations within the situation registry 730 are filtered according to the services to which a current user is subscribed to generate an overall eligible list of states. The group of states 726 within the MainCall form 722 of the voice dialling service 720 are self-contained within the present example and do not directly combine with any states within the base service 710. Accordingly, the group of states 726 can be passed to a Form State Filter to define part of the service logic as a discrete group. The same applies to the SetPrefs form 712 within the base service 710. In contrast, the group of states 728 within the Base.TopLevel form 724 of the voice dialling service 720 are arranged to be combined with the group of states 718 within the TopLevel form 714 of the base service 710 to define a modified TopLevel form comprising the eligible state set 750. The eligible state set 750 therefore comprises both state set 718 and state set 728. In general, when additional services are provided, the TopLevel menu of the base service will be adapted in this way to provide the user with access to the functionality of the additional services. The eligible state set 750 is then passed on to a Form State Filter described below with reference to
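The assembly of the eligible state set from subscribed services might be sketched as follows. The registry representation and all names are hypothetical; a real situation registry would also merge modifying forms such as Base.TopLevel into their targets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of assembling the eligible state set from the
// services to which a user subscribes; all names are invented.
public class SituationRegistry {

    /**
     * Given a registry mapping each service to the form states it
     * contributes, keep only the states of subscribed services. States
     * contributed by an add-on service to a base form would then be
     * merged with that form's own states.
     */
    public static List<String> eligibleStates(Map<String, List<String>> registry,
                                              Set<String> subscribed) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : registry.entrySet()) {
            if (subscribed.contains(e.getKey())) {
                eligible.addAll(e.getValue());
            }
        }
        return eligible;
    }
}
```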
A filtering process for an eligible state set 750 generated in accordance with
Once a new current user interface state 790 is selected, a corresponding action may be performed, and prompt generation, grammar generation and page generation can be invoked.
In addition to updating the service logic 104 on the introduction of a new service, the command recognition grammars 126 may also be updated to include new commands and conditions. By modifying the service logic 104 and command recognition grammars 126, a new service can accomplish several things, including adding new commands to the set of valid commands, overriding the handling of existing commands, handling new events, and overriding the handling of existing events.
In order to appreciate the advantages provided by the present technique, a general process for design and deployment of a service is illustrated in
The customer locale and brand requirements 304 also influence the design of the persona of the audio prompts which are generated for the user in order to provide the voice activated service. From the design of the persona 314, in combination with the dialogue description 308, a script writing process 316 identifies the prompts and commands which are required from the user, and from which design inputs are produced for the prompt selection rules 318 and for the user commands 320 which are to be used to activate the tasks. Dialogs or forms defining the situations or states of the application program are also defined from the dialogue descriptions 308.
As illustrated in
The prompt selection rules 318 and the commands 320 are input to develop the customisation package for the application program. The prompt rules 318 serve to identify to the service designer the prompts and the prompt selection rules, which are specified in the prompt selection based mark-up language 440 using a prompt editor 442 controlled by the service designer. The commands 320 serve to define the command recognition specifications 444 which are developed in accordance with the design using a command recognition editor 446.
The rule mark-up language is then translated by a mark-up language translator 450 into prompt selection classes 452, which are then used by a customisation packager 454 to define the customisation package 456. Also used by the customisation packager are the user commands to be recognised 455 and utility classes 460. These components are combined into a single physical package by the customisation packager. Finally, the prompts themselves are spoken by a voice talent and recorded by a prompt recorder 464, and input to the customisation packager to produce the customisation package 456 to be applied to the service to be deployed.
Various modifications may be made to the embodiments herein before described without departing from the scope of the present invention. It will be appreciated that an aspect of the present invention is a computer program, which when used to control an applications server carries out the methods described herein.
Number | Date | Country | Kind
---|---|---|---
05290038.8 | Jan 2005 | EP | regional

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/EP2006/000110 | 1/3/2006 | WO | 00 | 5/27/2008