There have been several attempts to enable natural language and speech based interaction with computers. The results of these attempts have so far been limited. This is due to a combination of technology imperfections, a lack of non-intrusive microphone infrastructure, high authoring costs, entrenched customer behaviors, and a competitor in the form of the graphical user interface (GUI), which offers high value for many tasks. The present invention focuses on two of these limitations: closer integration with the GUI and reduced authoring costs.

The GUI is a widely used interface mechanism. GUIs are very good for positioning tasks (e.g., resizing a rectangle), visual modifier tasks (e.g., making something an indescribable shade of blue), and selection tasks (e.g., choosing the one of a hundred pictures to be rotated). The GUI is also good for speedy access to quick, single-step features. An application's GUI is a useful toolbox that is organized from a functional perspective (e.g., organized into menus, toolbars, etc.) rather than a task-oriented perspective (e.g., organized by the higher level tasks that users want to accomplish, such as "make my computer secure against hackers").
However, GUIs present many problems to the user as well. Using the toolbox analogy, a user has difficulty finding the tools in the box or figuring out how to use the tools to complete a task. An interface described by single words, tiny buttons, and tabs forced into an opaque hierarchy does not lend itself to the way people think about their tasks. The GUI requires the user to decompose a task in order to determine what elements are necessary to accomplish it. This requirement leads to complexity. Aside from the complexity issue, it takes time to assemble GUI elements (i.e., menu clicks, dialog clicks, etc.). This can be inefficient and time consuming, even for expert users.
One existing mechanism for addressing GUI problems is a written help procedure. Help procedures often take the form of Help documents, PSS (Product Support Services) KB (Knowledge Base) articles, and newsgroup posts, which fill the gap between customer needs and GUI problems. They are analogous to the manual that comes with the toolbox, and have many benefits. These benefits include, by way of example:
However, Help documents, PSS KB articles and newsgroups have their own set of problems. These problems include, by way of example:
Another existing mechanism for addressing GUI problems is the wizard. Wizards were created to address the weaknesses of the GUI and of written help procedures. There are now thousands of wizards, and they can be found in almost every software product, because wizards address a real need that is not met by existing text-based help and assistance. They allow users to access functionality in a task-oriented way and can assemble the GUI or tools automatically. Wizards give program managers and developers a means of addressing customer tasks. They are like an expert in the box, stepping the user through the steps necessary for task success. Some wizards help customers set up a system (e.g., Setup Wizards), some wizards include content with features and help customers create content (e.g., Newsletter Wizards or PowerPoint's AutoContent Wizard), and some wizards help customers diagnose and solve problems (e.g., Troubleshooters).
Wizards provide many benefits to the user. Some of the benefits of wizards are that:
However, wizards, too, have their own set of problems. Some of the problems with wizards include, by way of example:
Active Content Wizards (ACWs), which help computer users perform tasks, are executed using an ACW interpreter. In one aspect of the present invention, the interpreter provides multiple levels of user interaction for a given ACW script. In order to help focus the user's attention, various methods are used to increase the conspicuity of the user interface elements associated with sub-tasks during execution of the ACW script. In one embodiment, areas around the user interface element are also de-emphasized.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connections depicted in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
Task prediction module 210 is configured to determine a task associated with the inputted user command 206. In one embodiment, task prediction module 210 leverages an existing help search module to search task database 220 to find matches to the user command 206. Task prediction module 210 receives a user input command 206 and converts and/or processes command 206 into a format that allows for searching of task database 220. Module 210 then executes a search against task database 220 to obtain information associated with the task represented by command 206.
Following the search, task prediction module 210 receives the results of the search from task database 220 and provides, to the user through an appropriate interface 221, one or more task documents from database 220 that likely match the user query 206. In one embodiment, module 210 simply selects one of the task documents as a selected task. In another embodiment, the user can select, through interface 221, one of those documents as a selected document. Task prediction module 210 then returns an active content wizard (ACW) script corresponding to the selected task to the ACW Interpreter 230. It should be noted that task prediction module 210 has been described as a conventional information retrieval component. However, other methods can be used to determine the desired task represented by user command 206. By way of example, any other well-known information retrieval technique can be used, such as pattern or word matching, context-free grammars (CFGs) for speech support, or other classifiers such as support vector machines and naive Bayesian networks.
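By way of illustration only, the following is a minimal sketch, in Python, of how a task prediction component might score task documents against a user command using simple keyword overlap. The `TaskDocument` and `predict_tasks` names are illustrative assumptions, not part of the specification; an actual implementation may use any of the retrieval techniques noted above.

```python
# Illustrative sketch of task prediction: score task documents in the
# task database by keyword overlap with the user command, then return
# the best-matching candidates for the user to choose from.
from dataclasses import dataclass

@dataclass
class TaskDocument:
    title: str
    keywords: set[str]
    acw_script: str  # identifier of the associated ACW script

def predict_tasks(command: str, task_database: list[TaskDocument],
                  top_n: int = 3) -> list[TaskDocument]:
    """Return the top_n task documents that best match the user command."""
    terms = set(command.lower().split())
    scored = [(len(terms & doc.keywords), doc) for doc in task_database]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Example: the command "change my system path" would match a task
# document whose keywords include {"system", "path", "environment"}.
```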
ACW interpreter 230 is a computer program configured to execute the atomic steps for the task selected by the user. In one embodiment, ACW interpreter 230 contains a GUI automation module implemented using Microsoft User Interface Automation, also by Microsoft Corporation. This module simulates user inputs such as keyboard key depressions, mouse clicks, mouse wheel rotations, etc. However, the GUI automation module of ACW interpreter 230 can be implemented using any application that is able to programmatically navigate a graphical user interface and to perform and execute commands on the user interface. Thus, ACW interpreter 230, in some embodiments, may actually use a programmatic interface, such as a user interface automation module, to send messages directly to the user interface control(s).
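The sketch below illustrates the kind of programmatic interface such a GUI automation module might expose to the interpreter. The `GuiAutomation` class and its method names are hypothetical; a concrete implementation might wrap Microsoft UI Automation or any comparable technology.

```python
# Hypothetical interface for the GUI automation module used by the ACW
# interpreter.  Method names are illustrative only; a real module might
# wrap Microsoft UI Automation or another automation technology.
from abc import ABC, abstractmethod

class GuiAutomation(ABC):
    @abstractmethod
    def find_element(self, path: str):
        """Locate a UI element (button, icon, menu item) from Path information."""

    @abstractmethod
    def invoke(self, element) -> None:
        """Programmatically click/activate the element."""

    @abstractmethod
    def send_keys(self, element, text: str) -> None:
        """Simulate keyboard input directed at the element."""

    @abstractmethod
    def wait_for_window(self, title: str, timeout: float = 10.0):
        """Block until a window with the given title appears on screen."""
```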
ACW interpreter 230 thus executes each of the atomic steps associated with a selected task, in order. For instance, when the task requires the user to click a button on the GUI to display a new menu or window, ACW interpreter 230 uses the GUI automation module to locate the button on the display device 191 (such as a monitor), clicks the button, and then waits for the new window to appear on the display device. The type/name of the window expected is detailed in the ACW script file 211.
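Continuing the sketch above, executing a "click a button, then wait for the expected window" atomic step might look like the following. The attributes on `step` (an element path and an expected window name) are an assumed schema that mirrors the information the ACW script file is said to carry.

```python
# Sketch of one "click a button, then wait for the expected window"
# atomic step, using the hypothetical GuiAutomation interface above.
def execute_click_step(gui: "GuiAutomation", step) -> None:
    button = gui.find_element(step.element_path)   # locate the button on screen
    gui.invoke(button)                             # simulate the mouse click
    gui.wait_for_window(step.expected_window)      # block until the window appears
```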
At 428, system 200 selects the first step in the number of atomic steps to be executed by the ACW Interpreter 230. At 434, the system 200 determines whether a user input is required to complete this particular atomic step. If user input is required to complete the step, system 200 displays, at 440, the particular step to the user. The display can be a window on display device 191 requesting an input, or it can be the GUI associated with the particular atomic step. Following display of the text for that particular step, system 200 waits, and does not advance to the next atomic step, until it receives the required user input at 446. The system can also display any additional information that is useful to the user in making a decision, such as related information.
Following receipt of the required input, or if no such input is required, system 200 proceeds to execute the current atomic step at 452. At step 458, system 200 looks ahead to determine whether there is another atomic step to be executed for the selected task. If there are additional atomic steps to execute, system 200 checks, at 464, whether the user has selected a step-by-step mode. If so, system 200 executes each individual atomic step only after it receives an input from the user indicating that the user is ready to advance to the next atomic step in the list of atomic steps. This input is received at 470. If system 200 is not in step-by-step mode, the system returns to step 428 and executes the next step in the list of atomic steps as discussed above. If, at step 458, there are no additional atomic steps to execute, system 200 has finished executing the desired task at step 476.
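The loop described at blocks 428 through 476 can be summarized in the following sketch. The helper functions `display_step`, `wait_for_user_input`, and `wait_for_next_click` are illustrative stand-ins for the user interface calls described above.

```python
# Sketch of the execution loop at blocks 428-476: walk the list of
# atomic steps, pausing for user input where a step requires it and,
# in step-by-step mode, pausing between every step.
def display_step(step) -> None:
    print(step.instruction_text)                           # block 440: show instructions

def wait_for_user_input(step) -> None:
    input("Enter the required information, then press Enter: ")  # block 446

def wait_for_next_click() -> None:
    input('Press Enter to simulate clicking "Next Step": ')      # block 470

def run_acw_script(steps, step_by_step: bool) -> None:
    for index, step in enumerate(steps):                   # block 428: select next step
        if step.requires_user_input:                       # block 434: input needed?
            display_step(step)
            wait_for_user_input(step)
        step.execute()                                     # block 452: perform the step
        if index < len(steps) - 1 and step_by_step:        # blocks 458/464: look ahead
            wait_for_next_click()
    # block 476: all atomic steps executed; the task is complete
```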
The set of screen shots in
The text 501 to display in window 500 is “Open Control Panel”. The ACW Interpreter 230 executes this step by executing a shortcut called control.exe, and displays the control panel window under window 500 as shown in
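A minimal sketch of this step follows, assuming a Windows environment where launching control.exe opens the Control Panel; `wait_for_window` is from the hypothetical `GuiAutomation` interface sketched earlier.

```python
# Sketch of the "Open Control Panel" step: execute the control.exe
# shortcut, then block until the Control Panel window appears so the
# interpreter does not advance prematurely.
import subprocess

def open_control_panel(gui: "GuiAutomation") -> None:
    subprocess.Popen(["control.exe"])        # execute the Control Panel shortcut
    gui.wait_for_window("Control Panel")     # wait for the window to appear
```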
The text 511 to display in window 510 is "Click the System icon". The ACW Interpreter 230 finds the System icon 515 on the control panel window using the Path information contained in the script file. The Path information is used by the ACW Interpreter to programmatically locate the icon on the screen using a GUI automation technology (e.g., Windows UI Automation). When ACW Interpreter 230 finds the icon, the interpreter calls the "invoke" method on the icon (using Windows UI Automation) to click it.
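The find-then-invoke pattern can be sketched as follows. The "window title/element name" path encoding in the comment is an assumption for illustration; the actual Path format of the ACW script is not detailed here.

```python
# Sketch of locating an element from the script's Path information and
# invoking it, using the hypothetical GuiAutomation interface above.
def click_by_path(gui: "GuiAutomation", path: str) -> None:
    element = gui.find_element(path)    # e.g., resolve "Control Panel/System"
    if element is None:
        raise RuntimeError(f"UI element not found for path: {path}")
    gui.invoke(element)                 # call the "invoke" method to click it
```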
In
In
As there are additional steps required to complete the task, system 200 displays to the user the next set of instructions in window 540. Window 540 instructs the user to “Click on the Path icon” 541. At the same time the ACW interpreter 230 locates the Path icon 543 on window 542 and highlights it for the user. System 200 then executes a click command on path icon 543 causing window 550 to appear as illustrated in
The user is again presented with instructions to complete this next step in the sequence of atomic steps. Window 550 instructs the user to click on the Edit button 553 through text 551. At the same time ACW Interpreter 230 locates the edit button 553 on window 542 and highlights the edit button 553 on the GUI. System 200 then executes a click command clicking edit button 553, which causes window 562 to open as illustrated in
The action is listed as a USERACTION, which lets the ACW Interpreter know that user input is expected in this step and that it cannot proceed until the user finishes.
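A sketch of how a parsed script step carrying a USERACTION marker might be represented appears below; these field names are illustrative assumptions, not the actual ACW script schema.

```python
# Sketch of a parsed ACW script step.  A step whose action is
# "USERACTION" signals the interpreter to display the instruction and
# block until the user finishes before advancing.
from dataclasses import dataclass

@dataclass
class ScriptStep:
    action: str            # e.g., "INVOKE" or "USERACTION"
    instruction_text: str  # text shown to the user
    element_path: str      # Path information for locating the UI element

def requires_user_input(step: ScriptStep) -> bool:
    return step.action == "USERACTION"
```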
Window 550 changes to highlight a second instruction 563 to the user. This instruction instructs the user to make desired changes to the path. As this step requires user input, system 200 does not advance until the user enters the desired information and clicks Next. Then system 200 causes window 570 to open, instructing the user to click the "OK" button 572. At the same time, the ACW Interpreter 230 locates and highlights button 572 on window 562, as illustrated in
In accordance with one embodiment of the invention, ACW interpreter 230 can provide different levels of user interaction with the interface. For example, in steps where no user data is required, such as those described above with respect to
In accordance with another embodiment of the present invention, as each step of the ACW script is executed by ACW interpreter 230, the conspicuity of the corresponding user interface element is increased. The increase in conspicuity can be done in a number of ways. First, the element itself can be highlighted, or otherwise emphasized as set forth above. Additionally or alternatively, the conspicuity of the user interface element can be increased by de-emphasizing areas surrounding the element of interest. For example, a fog can be applied to all areas of the user interface except the element of interest and the ACW window, such as window 500 in
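The fog effect can be sketched as follows, as one possible implementation rather than the one the specification mandates. The sketch uses tkinter window attributes; the `-transparentcolor` attribute relied on to punch a fully transparent hole over the element of interest is Windows-specific, so other platforms would need a different hole-punching mechanism.

```python
# Minimal sketch of the "fog" effect: a full-screen, semi-transparent
# overlay dims everything except the rectangle occupied by the element
# of interest (and, in practice, the ACW window would be excluded too).
import tkinter as tk

def show_fog(element_bbox: tuple[int, int, int, int]) -> tk.Tk:
    """element_bbox is (left, top, right, bottom) in screen coordinates."""
    root = tk.Tk()
    root.overrideredirect(True)                    # no title bar or border
    root.attributes("-topmost", True)              # keep the fog above other windows
    root.attributes("-alpha", 0.5)                 # make the fog semi-transparent
    root.attributes("-transparentcolor", "white")  # white pixels become a hole (Windows)
    w, h = root.winfo_screenwidth(), root.winfo_screenheight()
    root.geometry(f"{w}x{h}+0+0")
    canvas = tk.Canvas(root, bg="black", highlightthickness=0)
    canvas.pack(fill="both", expand=True)
    left, top, right, bottom = element_bbox        # leave the element un-fogged
    canvas.create_rectangle(left, top, right, bottom, fill="white", outline="white")
    return root  # caller runs the event loop and destroys the fog when done
```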
When a user's attention needs to be drawn to an element of the interface, it is preferable that the conspicuity of the element be increased by de-emphasizing its surroundings. Accordingly, when ACW interpreter 230 is operating in “show me” mode, the entire screen, with the exception of the ACW window and the user interface element with which the ACW script is interacting, may be de-emphasized.
As ACW interpreter 230 executes a given ACW script, it preferably employs a timer for each step that requires a user action. The use of such a timer ensures that, if a user is struggling, as indicated by the passing of a selected amount of time (for example, three seconds), ACW interpreter 230 will provide a further prompt to the user. For example, if ACW interpreter 230 is waiting for the user to press the "Next Step" button 860, in
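A minimal sketch of such a per-step timer, assuming a threading-based implementation, follows; `flash_next_step_button` is a placeholder for whatever further prompt the interpreter provides.

```python
# Sketch of the per-step "struggle" timer: if the awaited user action
# does not arrive within the selected interval (three seconds in the
# example above), a further prompt is issued.
import threading

def flash_next_step_button() -> None:
    print('Reminder: click "Next Step" to continue.')  # placeholder prompt

def start_struggle_timer(timeout_seconds: float = 3.0) -> threading.Timer:
    timer = threading.Timer(timeout_seconds, flash_next_step_button)
    timer.start()
    return timer

# The interpreter cancels the pending prompt once the user acts:
#   timer = start_struggle_timer()
#   ... user presses "Next Step" ...
#   timer.cancel()
```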
Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
This application is a Continuation-In-Part Application of U.S. patent application Ser. No. 10/337,745, filed Jan. 7, 2003 entitled ACTIVE CONTENT WIZARD: EXECUTION OF TASKS AND STRUCTURED CONTENT.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10337745 | Jan 2003 | US |
| Child | 10944688 | Sep 2004 | US |