The invention relates generally to a system and method for controlling the behavior of a social robotic character, which may be embodied as a physical or virtual character.
Characters (i.e., robotic and virtual/animated characters) are becoming capable of interacting with people in an increasingly life-like manner. The term character as used herein refers to a social robotic character, which may be embodied as a physical or virtual character. Characters are especially well-suited for carrying out discrete, purposeful tasks or exercises. An example of such a task would be for a character to teach an autistic student how to politely thank people for gifts. However, to carry out such tasks in a life-like manner, the character must monitor and adapt to each human user's unpredictable behavior while continuing to perform the tasks at hand. As such, developing life-like programs or applications for a character is exceedingly complex and difficult. In particular, it is difficult for the character to perform in an apparently coherent and responsive fashion in the face of multiple simultaneous goals, perceptions, and user inputs.
Furthermore, if these applications were executed solely using locally available hardware and software, they would require complex software and expensive computer hardware to be installed locally. The locally available hardware and software is referred to as the local agent. Meanwhile, modern computer networks have made it possible to access very powerful processors in centralized server locations (“the cloud”) at a much lower cost per computational operation than on a local agent. These central servers, referred to as the remote agent, offer throughput and cost advantages over local systems, but can only be accessed over the network relatively infrequently (compared to local resource accesses), with significant time latency, and subject to common network reliability and performance concerns. Using a distributed network computing approach can thus exacerbate the problem, discussed above, of maintaining coherence and responsiveness in the character's performance.
Thus, there is a need for a system for efficiently developing programs and/or applications for a character to perform discrete, purposeful tasks or exercises, including where such tasks require the character to coherently perform many functions sequentially as well as simultaneously. Further, there is a need for such a system to account for and adapt to the environment in which the character is operating. Still further, there is a need for a system executing such programs and/or applications to operate efficiently and be implementable using low-cost hardware at the local-agent level. Thus, there is a need for a system that offloads computationally difficult tasks to a remote system, while taking into account the latency, reliability, and coherence issues inherent in network communication among distributed systems.
The present invention provides a system for controlling the behavior of a social robotic character. The system comprises a scene planner module. The scene planner module is configured to assemble a scene specification record comprising one or more behavior records. A scene execution module is configured to receive the scene specification record and to process the scene specification record to generate an output. A character interaction module is configured to receive the output and from the output cause the social robotic character to perform one or more behaviors specified by the one or more behavior records. The social robotic character may be embodied as a physical robot or a virtual robot.
The present invention provides a method for controlling the behavior of a social robotic character. The method comprises the step of assembling a scene specification record comprising one or more behavior records. Then, the scene specification record is processed and an output is generated. Finally, the output then causes the social robotic character to perform one or more behaviors specified by the one or more behavior records.
The present invention also provides a non-transitory computer readable storage medium having stored thereon machine readable instructions for controlling the behavior of a social robotic character. The non-transitory computer readable storage medium comprises instructions for assembling a scene specification record comprising one or more behavior records. The non-transitory computer readable storage medium further comprises instructions for processing the scene specification record to generate an output, as well as instructions for causing the social robotic character to perform one or more behaviors specified by the one or more behavior records based on the output.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Refer now to the drawings wherein depicted elements are, for the sake of clarity, not necessarily shown to scale and wherein like or similar elements are designated by the same reference numeral through the several views. In the interest of conciseness, well-known elements may be illustrated in schematic or block diagram form in order not to obscure the present invention in unnecessary detail, and details concerning various other components known to the art, such as computers, electronic processors, and the like necessary for the operation of many electrical devices, have not been shown or discussed in detail inasmuch as such details are not considered necessary to obtain a complete understanding of the present invention, and are considered to be within the skills of persons of ordinary skill in the relevant art. Additionally, as used herein, the term “substantially” is to be construed as a term of approximation.
It is noted that, unless indicated otherwise, all functions described herein may be performed by a processor such as a microprocessor, a controller, a microcontroller, an application-specific integrated circuit (ASIC), an electronic data processor, a computer, or the like, in accordance with code, such as program code, software, integrated circuits, and/or the like that are coded to perform such functions. Furthermore, it is considered that the design, development, and implementation details of all such code would be apparent to a person having ordinary skill in the art based upon a review of the present description of the invention.
Referring to
A content authoring and scene generator module (CASGM) 110 is responsible for generating the scene, which is more specifically referred to as the scene specification record. The CASGM 110 accesses various cloud graphs 120, which contain the data necessary to determine the motivations of the character, and provides an output comprising a scene to certain cloud graphs 120. The CASGM 110 and the cloud graphs 120 comprise the higher-layer functions of the system 100 and are preferably implemented using remote hardware and software, i.e., in the “cloud” at a remote agent. In alternative embodiments, the higher-layer functions may be implemented on a local agent. In yet other embodiments, the local agent has a less powerful version of the higher-layer functions that may be used if communication with the remote system fails. A scene execution module (SEM) 130 accesses certain information in the cloud graphs 120, including the scene specification record. It processes the behaviors in the scene by accessing various local graphs 140 and provides an output to various local graphs 140 and also to the cloud graphs 120. The SEM 130 and certain local graphs 140 comprise the middle-layer functions of the system 100 and are implemented on a local agent. Preferably, the higher-layer functions, when implemented remotely, may service a plurality of local agents.
A character interaction module (CIM) 150 accesses information on certain local graphs 140 and may cause the character to perform the desired behavior. Preferably, graphs are implemented in compliance with Resource Description Framework (RDF) standards published by the World Wide Web Consortium (W3C). Alternatively, instead of graphs, other forms for data transfer and storage may be used, including SQL databases, column stores, object caches, and shared file systems.
A preferred implementation of the higher-layer functions is shown in
A scene planner module 260 is provided and is responsible for translating current motivations into a scene specification record. The scene planner module 260 first accesses the motivation graph 230 to retrieve the current motivations for a particular character. It also accesses the scene results graph 250 for the character to determine if activity from the previous scene requires further processing. The scene planner module 260 then generates a complete scene specification record by accessing the behavior template graph 200 whose records provide a starting point. The scene specification record is preferably defined by the following pseudo-code:
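By way of illustration only, one possible form of the scene specification record may be sketched as Python-style pseudo-code; all field and type names below are assumptions for illustration and an actual implementation may differ:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Guard:
    # A predicate evaluated over one or more wired input channels/graphs.
    channel: str        # e.g., "SPEECH_IN"
    predicate: str      # condition that must be satisfied before the step runs

@dataclass
class Step:
    # An action by the character, or a change in the character's state.
    action: str
    guards: List[Guard] = field(default_factory=list)  # zero or more guards

@dataclass
class BehaviorRecord:
    behavior_id: str
    steps: List[Step] = field(default_factory=list)

@dataclass
class SceneSpecificationRecord:
    scene_id: str
    behavior_records: List[BehaviorRecord] = field(default_factory=list)

# Example: a scene with one behavior containing one guarded step.
scene = SceneSpecificationRecord(
    scene_id="scene-001",
    behavior_records=[
        BehaviorRecord(
            behavior_id="BR1",
            steps=[Step(action='say("You\'re welcome!")',
                        guards=[Guard(channel="SPEECH_IN",
                                      predicate='heard("Thank you!")')])],
        )
    ],
)
```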
The scene specification record is output to a scene source graph 270, which is then accessed by middle-layer functions to run the scene. The middle-layer functions also provide the results of scenes being run via the scene results graph 250.
Referring to
The scene query module 310 then loads the retrieved scene specification (including all behavior records therein) into memory, and control passes to a behavior processing module (BPM) 320.
Prior to the BPM 320 processing the behavior records, a channel access module 330 sets up or “wires” any necessary graphs to input and output channels that are needed to process the scene. Input channels are wired to readable graphs, which are accessed by the BPM 320 to evaluate guards from the scene's behaviors (discussed below). Output channels are wired to writable graphs, which are accessed by the BPM 320 to accomplish the output steps from the scene's behaviors (discussed below). Preferably, the wired graphs include: a perception parameter graph 350, an estimate graph 352, a goal graph 360, the scene results graph 250, and the other knowledge base graph 240.
After wiring is complete, the BPM 320 begins to process behavior records of the scene specification record in order to determine appropriate output actions for the character to perform. Each behavior record comprises a set of one or more steps. A step is generally an action by the character, or a change in the character's state. Steps that a character may perform include, but are not limited to: outputting speech in the character's voice with lip synchronization; beginning or modifying a walking motion in some direction; establishing eye contact with a user; playing a musical phrase or sound effect; updating a variable stored in some graph including the local working state graph 354 or cloud-hosted scene results graph 250. Each step has zero or more guards associated with it. A guard defines a condition that must be satisfied before the associated step can be performed. A guard is preferably a predicate evaluated over one or more of the graphs that are currently wired. For example, a step instructing the character to say “You're welcome!” may have a guard associated with it that requires a student to first say “Thank you!”, or an equivalent synonym. The example predicate may be written in pseudo-code as follows:
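One illustrative rendering of such a predicate, sketched as Python-style pseudo-code (the channel name SPEECH_IN is from the example above; the function name and synonym set are assumptions for illustration):

```python
# Hypothetical guard evaluation: the SPEECH_IN channel resolves to the
# estimate graph, which holds the results of speech input processing.
THANK_YOU_SYNONYMS = {"thank you!", "thanks!", "thank you very much!"}

def guard_satisfied(estimate_graph: dict) -> bool:
    # The guard is satisfied once the student has said "Thank you!"
    # or an equivalent synonym.
    heard = estimate_graph.get("SPEECH_IN", "").strip().lower()
    return heard in THANK_YOU_SYNONYMS

# The guarded step runs only after the guard evaluates true.
if guard_satisfied({"SPEECH_IN": "Thank you!"}):
    perform = 'say("You\'re welcome!")'
```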
To evaluate this example, the BPM 320 would access the SPEECH_IN channel, which would be resolved by the channel access module 330 to the estimate graph 352 containing the results of speech input processing. As the BPM 320 processes the scene (as discussed in more detail below), output related to various steps is provided by writing records into the goal graph 360, which triggers processing by lower-layer functions as discussed below.
Referring to
Referring to
Referring to
Referring to
The lower layer also comprises a character embodiment 750. The character is preferably embodied as a physical character (e.g., a robot). Alternatively, the character may be embodied as a virtual character. In either embodiment, the character may interact in the physical world with one or more human end users. A physical character embodiment may directly interact with the user, while a virtual character embodiment (also referred to as an avatar) may be displayed on the screen of one or more tablets, computers, phones, or the like and thus interact with the users. The character embodiment 750 contains sensors 754 for receiving input. For example, sensors 754 may include: cameras, microphones, proximity sensors, accelerometers, gyroscopes, touchscreens, keyboards and mice, and GPS receivers. The character embodiment 750 also contains actuators 756 for providing output and interacting with the user. For example, actuators 756 may include: servo motor mechanisms for physical movements (e.g., waving a hand or walking), speakers, lights, display screens, and other kinds of audiovisual output mechanisms. In the case of a virtual character embodiment, the body joints of the virtual character or avatar represent virtual servos and are controlled through a process analogous to that used with a physical robot character. Furthermore, in the virtual case, it is preferred to make use of sensors and actuators attached to or embedded in the computer, tablet, or phone on which the virtual avatar is displayed.
Information provided by sensors 754 is continually processed by a perception module 710. Parameters of the perception process are maintained in the perception parameter graph 350. Results of the perception process are intermittently posted into the estimate graph 352. For example, the perception module 710 monitors input from a microphone sensor and may determine that some sound heard on the microphone contains the words “Hello there, nice robot.” If monitoring for that phrase, the perception module 710 would then post a corresponding estimated result into the estimate graph 352, which may be used by another component, such as the BPM to evaluate a guard and trigger a step to be performed. In another example, the perception module 710 may determine that an image from a camera sensor contains a familiar looking human face and calculate an estimate of that person's identity and location, which is posted into the estimate graph 352 for use by the BPM 320 and other interested components.
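The microphone example above may be sketched as follows in Python-style pseudo-code; the function name, graph key, and dictionary representation of the estimate graph are assumptions for illustration only:

```python
# Illustrative perception flow: the perception module monitors a sensor
# stream and, when a monitored phrase is detected, posts a corresponding
# estimated result into the estimate graph for use by the BPM and other
# interested components.
def process_audio(heard_text, watch_phrases, estimate_graph):
    for phrase in watch_phrases:
        if phrase.lower() in heard_text.lower():
            # Post the estimated result into the estimate graph.
            estimate_graph["speech_estimate"] = phrase
            return True
    return False

estimate_graph = {}
process_audio("Hello there, nice robot.", ["nice robot"], estimate_graph)
```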
Goals for actions of the character are set by middle-layer components as discussed above through the goal graph 360. An action module 720 monitors these goals and sends appropriate commands to the appropriate actuators 756 to cause the character to perform the action or goal. For example, a step executed by the BPM may configure a speech-output goal to say a particular piece of speech text along with synchronized mouth movements commands sent to the character's body. Other goals may include, e.g., to play a particular musical score, to walk in a particular direction, or to make eye contact with a particular user. The action module 720 records progress towards the completion of each goal by sending goal progress update records into the goal graph 360. These records are then available for reading by middle and higher layer functions. In some cases, such as maintaining eye contact with a user, the action module 720 may need to process frequently updated sensor information in a closed feedback control loop. The action module 720 may do this by directly accessing the estimate graph 352.
We now consider a detailed example illustrating a preferred embodiment of the present invention with reference to
Using an administrative interface provided by the administrative agent, Tammy may control certain behaviors of Zig, e.g., by causing him to play a scene. Here, Tammy has instructed Zig to tell a story about a rocket ship, which Zig has begun to tell. Also present in the room with Zig are three child users, named Wol 850, Xerc 852, and Yan 854. Zig is aware of and knows certain information about the three child users. Yan is closest to Zig, and Zig has known him for seven months. Zig just met Xerc three minutes ago, and so far Zig knows only Xerc's name and face. Wol is a sibling of Yan who has some emotional development challenges of which Zig is aware. Zig has known Wol for about as long as Yan.
All this information is stored in cloud graphs maintained by the remote agent 810. More particularly, the higher-layer agent modules 210 organize the information as it is processed and generally store the information in the other knowledge base graph 240. The higher-layer agent modules 210 also use the total available set of information to create and update a set of persistent motivations, which are stored in the motivation graph 230. The higher-layer agent modules 210 also create a specific story-telling motivation in response to the instruction received from Tammy.
At a certain point in time, Zig's motivation graph 230 (maintained at the remote agent 810) may have the exemplary motivations shown in Table I:
The set of six motivations described above is maintained through the combined action of all higher-layer agent modules 210 with access to Zig's motivation graph 230.
The scene planner module 260 makes use of all available information in the cloud graphs to translate each of the six motivations above into one or more behavior records, which collectively form a scene specification record. For example, the scene planner module 260 may convert the motivations M1-M6 into the behavior records (comprising steps and guards) shown in Table II:
In this exemplary case, the mapping from the cognitive motivation set to the behavioral intention set is one-to-one, but the scene planner module is free to rewrite the motivation set into any larger or smaller set of combined behavior intentions that most coherently and pleasingly represents the best-conceived scene of appropriate duration, typically 10 to 100 seconds. Further, each behavior record comprises zero or more steps for carrying out the desired behavior which satisfies a motivation. And each step may have zero or more guards. For example, behavior record BR4 may have the steps and guards shown in Table III:
Similarly, the other behavior records contain steps and guards.
The scene specification record is retrieved from the scene source graph 270 by the scene query module 310 of the scene execution module (SEM) 130. The channel access module 330 wires the necessary channels to process the behavior records. Then the behavior records are processed by the behavior processing module (BPM) 320. The BPM writes goals (physical, verbal, musical, etc.) to the goal graph 360, which are then read by the action module 720 of the character interaction module 150. The action module 720 then causes the actuators to perform the step. For example, when S3 is executed, the BPM would write a speech-output goal to the goal graph, which would be read by the action module. The action module would then use a text-to-speech system to produce output audio that would be sent to the speaker actuator, thereby causing Zig's speaker actuator to say, “I made it, plug me in!” The action module is also responsible for instructing servo actuators in Zig's mouth to move in synchronization with the output audio, thus creating a convincing performance.
The BPM, as a software component, performs the six behaviors as cooperative tasks on a single thread. Preferably, it refreshes each behavior's processing at least five times each second. Typically, the exact order in which behaviors are asked to proceed is not significant to the total performance, as they are being performed simultaneously. That is, Zig may ask Xerc a question (BR3), while at the same time walking towards the charger (S2 of BR4) and continuing to intend to eventually return to the rocket ship story. What is more significant is the fact that the local CPU sharing between behaviors is single-threaded, and thus the behaviors may operate free from low-level locking concerns on individual state variables.
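The single-threaded cooperative scheduling described above may be sketched as follows; the function names, the representation of a behavior as a callable, and the fixed refresh period are assumptions for illustration only:

```python
import time

def run_scene(behaviors, duration_s, hz=5):
    # Cooperative tasks on a single thread: each behavior's processing is
    # refreshed at least `hz` times per second. Because only one behavior
    # runs at a time, no low-level locking of state variables is needed.
    period = 1.0 / hz
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for behavior in behaviors:
            behavior()  # each callable performs a small, non-blocking slice
        time.sleep(period)

# Example: two behaviors refreshed in round-robin fashion.
ticks = []
run_scene([lambda: ticks.append("BR1"), lambda: ticks.append("BR2")],
          duration_s=0.25, hz=10)
```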
However, the locking concerns that do matter are at the intentional level, where behaviors seek to avoid trampling upon each other. That is, they should seek to avoid producing output activity that will appear conflicting from the end users' perspective. Knowing this, the scene planner module 260 generates behavior specifications that guard against conflicting with each other using certain arbitrary variables in the local working state graph 354. For example, a “topic” variable may be used to establish a sequential context for verbal interactions, and thus prevent the different verbally dependent behaviors from conflicting unnecessarily. The following pseudo-code illustrates such an example of using guards and steps employing the WorkingState.Topic parameter to resolve these issues:
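One illustrative sketch of such topic-based arbitration, in Python-style pseudo-code (the function names and the dictionary representation of the working state are assumptions for illustration only):

```python
# Hypothetical working-state arbitration: verbally dependent behaviors
# claim the "Topic" variable before producing speech output, and release
# it when done, so that concurrent behaviors do not talk over each other.
working_state = {"Topic": None}

def try_begin_topic(topic):
    # Guard: proceed only when no other verbal behavior holds the topic.
    if working_state["Topic"] is None:
        working_state["Topic"] = topic   # step: claim the verbal context
        return True
    return False

def end_topic(topic):
    if working_state["Topic"] == topic:
        working_state["Topic"] = None    # step: release for other behaviors
```

For example, the rocket-ship story behavior would claim the topic before speaking, causing a question-asking behavior's guard to remain unsatisfied until the story pauses and releases the topic.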
The scene planner module 260 faces other difficulties in short-term character performance related to the complexity of physical (and musical) performance and sensing in a dynamic environment when multiple physical goals are active. These difficulties may similarly be resolved using an appropriate working state variable. The scene planner module is aware of the various steps being inserted into the scene specification record and thus may insert the appropriate guards when constructing the scene specification record.
Having thus described the present invention by reference to certain of its preferred embodiments, it is noted that the embodiments disclosed are illustrative rather than limiting in nature and that a wide range of variations, modifications, changes, and substitutions are contemplated in the foregoing disclosure and, in some instances, some features of the present invention may be employed without a corresponding use of the other features. Many such variations and modifications may be considered obvious and desirable by those skilled in the art based upon a review of the foregoing description of preferred embodiments. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the invention.
The present application claims the benefit of U.S. Provisional Application No. 61/784,839, titled “System and Method for Robotic Behavior,” filed on Mar. 14, 2013, which is hereby incorporated by reference in its entirety.