The present systems, methods, control modules, and computer program products generally relate to robot control, and particularly relate to deploying, harnessing, and/or generally using a large language model in the control of a robot.
Robots are machines that may be deployed to perform tasks. Robots may come in a variety of different form factors, including humanoid form factors. Humanoid robots may be operated by tele-operation systems through which the robot is caused to emulate the physical actions of a human operator or pilot. Special-purpose robots may be designed to perform a specific task, whereas general purpose robots may be designed to perform a multitude of tasks.
Humans perform many tasks in their personal and work lives. Examples of tasks include everything from making a bed, to washing dishes, to loading a dishwasher, to mowing a lawn, to taking inventory, to checking out customers, to stocking shelves, to painting, to hairstyling, to preparing a meal, to cleaning, to taking measurements, to performing calculations, to recording data, to performing analyses, to creating art/music, to performing art/music, to building, to manufacturing, to assembling, to destroying, to disassembling, to displacing, to pick-and-placing, to navigating, and on and on. In many cases, there is a strong desire, and an ongoing need, to automate various tasks so that humans may direct their time and/or attention to other things.
A large language model (LLM) is a form of artificial intelligence that has been trained on a large corpus of text data to produce human-like text responses to natural language (NL) inputs. Popular examples in the art today include the various incarnations of OpenAI™'s Generative Pre-Trained Transformer (GPT), such as text-davinci-003, text-curie-001, text-babbage-001, and text-ada-001. LLMs can be accessed by, or deployed in, text-based user interfaces to allow chat-like interactions between a user and a computer, such as in OpenAI™'s ChatGPT™ application built on the GPT-3™ family of LLMs.
A method of operation of a robot system may be summarized as including: identifying, by the robot system, a person in an environment of the robot system; accessing, by the robot system, information about the person; generating a first natural language (NL) query by the robot system, the first NL query including a NL description of the information about the person, a NL description of contextual information, and a NL request for an outbound verbalization for the robot system to deliver to the person; providing the first NL query to a large language model (LLM) module of the robot system; receiving, from the LLM module, the outbound verbalization for the robot system to deliver to the person; and delivering, by the robot system, the outbound verbalization to the person. Identifying, by the robot system, the person in the environment of the robot system may include: capturing, by at least one camera, an image of a face of the person; and determining an identity of the person based on the image of the face of the person. Identifying, by the robot system, the person in the environment of the robot system may include: scanning, by at least one sensor, an identifier associated with the person; and determining an identity of the person based on the identifier associated with the person.
Accessing, by the robot system, information about the person may include retrieving, by the robot system, digital information about the person from a database of digital information about multiple people, the database stored in a non-transitory processor-readable storage medium. Retrieving, by the robot system, digital information about the person from a database of digital information about multiple people may include retrieving, by the robot system, digital information about the person from a database of digital information about multiple people, the digital information about multiple people collected through multiple channels including at least one channel selected from a group consisting of: purchasing histories of the multiple people; location histories of the multiple people; internet browsing histories of the multiple people; account profiles of the multiple people; event history of the environment; and information about past interactions between the robot system and the multiple people.
The NL description of contextual information may include a NL description of at least a portion of the environment. The NL description of contextual information may include a NL description of a respective role of each of the robot system and the person. The NL description of contextual information may include a NL description of information accessed by the robot system from at least one source selected from a group consisting of: a local news report, a national news report, an international news report, and a weather report.
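As an illustration of how the three parts of the first NL query could be assembled, the following is a minimal sketch only. The `PersonInfo` record, the `build_first_nl_query` function, the example facts, and the wording of the request are all hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class PersonInfo:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    facts: list[str] = field(default_factory=list)


def build_first_nl_query(person: PersonInfo, context: list[str]) -> str:
    """Compose an NL query from person information, context, and a request."""
    # NL description of the information about the person.
    person_part = f"The person in front of me is {person.name}."
    if person.facts:
        person_part += " " + " ".join(person.facts)
    # NL description of contextual information (environment, roles, weather, etc.).
    context_part = " ".join(context)
    # NL request for an outbound verbalization (assumed wording).
    request_part = (
        "What should I say to this person? "
        "Reply with only the words I should speak."
    )
    # Skip empty parts so the query never contains doubled spaces.
    return " ".join(p for p in (person_part, context_part, request_part) if p)


query = build_first_nl_query(
    PersonInfo("Alex", ["Alex bought a chess set here last week."]),
    ["I am a retail assistant robot in a toy store.", "It is raining outside today."],
)
print(query)
```

The resulting string would then be provided to the LLM module, and the text returned would be delivered to the person as the outbound verbalization.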
Delivering, by the robot system, the outbound verbalization to the person may include verbalizing the outbound verbalization by the robot system. The outbound verbalization may include a question about the person, in which case verbalizing the outbound verbalization by the robot system includes verbally asking the person the question by the robot system.
The method may further include: receiving, by the robot system, an inbound verbalization from the person; generating a second NL query by the robot system, the second NL query including a NL transcription of the inbound verbalization received from the person by the robot system, a NL description of the outbound verbalization delivered from the robot system to the person, a NL description of the first NL query, and a NL request for a response to the inbound verbalization received from the person by the robot system; providing the second NL query to the LLM module of the robot system; receiving, from the LLM module, the response; and delivering the response to the person by the robot system. The NL description of the first NL query may include at least one NL description selected from a group consisting of: a NL summary of the first NL query, a NL excerpt from the first NL query, and a NL copy of the first NL query.
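The composition of the second NL query can be sketched as follows. This is a non-authoritative example: the function names, the `mode` parameter, and the phrasing of each part are assumptions, and the "summary" branch is a crude placeholder (a real system might instead ask the LLM module to summarize the first NL query).

```python
def describe_first_query(first_query: str, mode: str = "copy", max_chars: int = 80) -> str:
    """Return an NL description of the first query: a copy, an excerpt, or a summary."""
    if mode == "copy":
        return first_query
    if mode == "excerpt":
        return first_query[:max_chars]
    if mode == "summary":
        # Placeholder summary: keep only the first sentence.
        first_sentence = first_query.split(". ")[0].rstrip(".")
        return first_sentence + "."
    raise ValueError(f"unknown mode: {mode}")


def build_second_nl_query(first_query: str, outbound: str, inbound: str,
                          mode: str = "copy") -> str:
    """Combine prior context, the robot's last utterance, and the person's reply."""
    return " ".join([
        f"Earlier context: {describe_first_query(first_query, mode)}",
        f'I said: "{outbound}"',
        f'The person replied: "{inbound}"',
        "How should I respond? Reply with only the words I should speak.",
    ])


second_query = build_second_nl_query(
    "I am a retail assistant robot. The person is Alex.",
    "Hello Alex, how is the chess set?",
    "Great, thanks. Do you sell chess clocks?",
    mode="summary",
)
print(second_query)
```

Carrying a copy, excerpt, or summary of the first NL query forward gives the LLM module the context of the ongoing exchange when it generates the response.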
A method of operation of a robot system may be summarized as including: receiving, by the robot system, an inbound verbalization from a person in an environment of the robot system; identifying the person by the robot system; accessing, by the robot system, information about the person; generating a natural language (NL) query by the robot system, the NL query including a NL description of the information about the person, a NL description of contextual information, a NL transcription of the inbound verbalization received from the person by the robot system, and a NL request for a response verbalization for the robot system to deliver to the person; providing the NL query to a large language model (LLM) module of the robot system; receiving, from the LLM module, the response verbalization for the robot system to deliver to the person; and delivering, by the robot system, the response verbalization to the person. Identifying the person by the robot system may include: capturing, by at least one camera, an image of a face of the person; and determining an identity of the person based on the image of the face of the person. Identifying the person by the robot system may include: scanning, by at least one sensor, an identifier associated with the person; and determining an identity of the person based on the identifier associated with the person.
Accessing, by the robot system, information about the person may include retrieving, by the robot system, digital information about the person from a database of digital information about multiple people, the database stored in a non-transitory processor-readable storage medium. Retrieving, by the robot system, digital information about the person from a database of digital information about multiple people may include retrieving, by the robot system, digital information about the person from a database of digital information about multiple people, the digital information about multiple people collected through multiple channels including at least one channel selected from a group consisting of: purchasing histories of the multiple people; location histories of the multiple people; internet browsing histories of the multiple people; account profiles of the multiple people; event history of the environment; and information about past interactions between the robot system and the multiple people.
The NL description of contextual information may include a NL description of at least a portion of the environment. The NL description of contextual information may include a NL description of a respective role of each of the robot system and the person. Delivering, by the robot system, the response verbalization to the person may include verbalizing the response verbalization by the robot system.
The various elements and acts depicted in the drawings are provided for illustrative purposes to support the detailed description. Unless the specific context requires otherwise, the sizes, shapes, and relative positions of the illustrated elements and acts are not necessarily shown to scale and are not necessarily intended to convey any information or limitation. In general, identical reference numbers are used to identify similar elements or acts.
The following description sets forth specific details in order to illustrate and provide an understanding of the various implementations and embodiments of the present systems, methods, control modules, and computer program products. A person of skill in the art will appreciate that some of the specific details described herein may be omitted or modified in alternative implementations and embodiments, and that the various implementations and embodiments described herein may be combined with each other and/or with other methods, components, materials, etc. in order to produce further implementations and embodiments.
In some instances, well-known structures and/or processes associated with computer systems and data processing have not been shown or provided in detail in order to avoid unnecessarily complicating or obscuring the descriptions of the implementations and embodiments.
Unless the specific context requires otherwise, throughout this specification and the appended claims the term “comprise” and variations thereof, such as “comprises” and “comprising,” are used in an open, inclusive sense to mean “including, but not limited to.”
Unless the specific context requires otherwise, throughout this specification and the appended claims the singular forms “a,” “an,” and “the” include plural referents. For example, references to “an embodiment” and “the embodiment” include “embodiments” and “the embodiments,” respectively, and references to “an implementation” and “the implementation” include “implementations” and “the implementations,” respectively. Similarly, the term “or” is generally employed in its broadest sense to mean “and/or” unless the specific context clearly dictates otherwise.
The headings and Abstract of the Disclosure are provided for convenience only and are not intended, and should not be construed, to interpret the scope or meaning of the present systems, methods, control modules, and computer program products.
The various implementations described herein provide systems, methods, control modules, and computer program products that use one or more LLM(s) to enhance, facilitate, augment, or implement control of one or more robot system(s). Exemplary robot systems that may employ the teachings of the present systems, methods, control modules, and computer program products include, without limitation, the general-purpose humanoid robots developed by Sanctuary Cognitive Systems Corporation, various aspects of which are described in U.S. patent application Ser. No. 18/375,943, U.S. patent application Ser. No. 18/513,440, U.S. patent application Ser. No. 18/417,081, U.S. patent application Ser. No. 18/424,551, U.S. patent application Ser. No. 16/940,566 (Publication No. US 2021-0031383 A1), U.S. patent application Ser. No. 17/023,929 (Publication No. US 2021-0090201 A1), U.S. patent application Ser. No. 17/061,187 (Publication No. US 2021-0122035 A1), U.S. patent application Ser. No. 17/098,716 (Publication No. US 2021-0146553 A1), U.S. patent application Ser. No. 17/111,789 (Publication No. US 2021-0170607 A1), U.S. patent application Ser. No. 17/158,244 (Publication No. US 2021-0234997 A1), U.S. Provisional Patent Application Ser. No. 63/001,755 (Publication No. US 2021-0307170 A1), and/or U.S. Provisional Patent Application Ser. No. 63/057,461, as well as U.S. Provisional Patent Application Ser. No. 63/151,044, U.S. Provisional Patent Application Ser. No. 63/173,670, U.S. Provisional Patent Application Ser. No. 63/184,268, U.S. Provisional Patent Application Ser. No. 63/213,385, U.S. Provisional Patent Application Ser. No. 63/232,694, U.S. Provisional Patent Application Ser. No. 63/316,693, U.S. Provisional Patent Application Ser. No. 63/253,591, U.S. Provisional Patent Application Ser. No. 63/293,968, U.S. Provisional Patent Application Ser. No. 63/293,973, and/or U.S. Provisional Patent Application Ser. No. 63/278,817, each of which is incorporated herein by reference in its entirety.
In some implementations, a robot system or control module may employ a finite Instruction Set comprising generalized reusable work primitives that can be combined (in various combinations and/or permutations) to execute a task. For example, a robot control system may store a library of reusable work primitives each corresponding to a respective basic sub-task or sub-action that the robot is operative to autonomously perform (hereafter referred to as an Instruction Set). A work objective may be analyzed to determine a sequence (i.e., a combination and/or permutation) of reusable work primitives that, when executed by the robot, will complete the work objective. The robot may execute the sequence of reusable work primitives to complete the work objective. In this way, a finite Instruction Set may be used to execute a wide range of different types of tasks and work objectives across a wide range of industries. This approach is described in US Patent Publication No. 2022-0258340 based on U.S. patent application Ser. No. 17/566,589, which is incorporated herein by reference in its entirety.
To expand on the above, a general-purpose robot is able to complete multiple different work objectives. As used throughout this specification and the appended claims, the term “work objective” refers to a particular task, job, assignment, or application that has a specified goal and a determinable outcome, often (though not necessarily) in the furtherance of some economically valuable work. Work objectives exist in many aspects of business, research and development, commercial endeavors, and personal activities. Exemplary work objectives include, without limitation: cleaning a location (e.g., a bathroom) or an object (e.g., a bathroom mirror), preparing a meal, loading/unloading a storage container (e.g., a truck), taking inventory, collecting one or more sample(s), making one or more measurement(s), building or assembling an object, destroying or disassembling an object, delivering an item, harvesting objects and/or data, and so on. The various implementations described herein provide robots, systems, control modules, computer program products, and methods for operating a robot system, to at least semi-autonomously complete tasks or work objectives.
In accordance with the present robots, systems, control modules, computer program products, and methods, a work objective can be deconstructed or broken down into a “workflow” comprising a set or plurality of “work primitives”, where successful completion of the work objective involves performing each work primitive in the workflow. Depending on the specific implementation, completion of a work objective may be achieved by (i.e., a workflow may comprise): i) performing a corresponding set of work primitives sequentially or in series; ii) performing a corresponding set of work primitives in parallel; or iii) performing a corresponding set of work primitives in any combination of in series and in parallel (e.g., sequentially with overlap) as suits the work objective and/or the robot performing the work objective. Thus, in some implementations work primitives may be construed as lower-level activities, steps, or sub-tasks that are performed or executed as a workflow in order to complete a higher-level work objective.
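One possible way to represent a workflow that mixes series and parallel execution is as an ordered list of stages, where stages run one after another and the work primitives within a stage may overlap. The sketch below is illustrative only; `Stage`, `run_workflow`, and the use of a thread pool as a stand-in for concurrent execution on a robot are assumptions rather than features of the present disclosure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A stage is a group of work primitives that may be performed in parallel.
Stage = list[str]


def run_workflow(stages: list[Stage], execute: Callable[[str], str]) -> list[list[str]]:
    """Run stages in series; run the work primitives inside each stage in parallel."""
    results = []
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # pool.map returns results in the order of the inputs,
            # even though the calls themselves may overlap in time.
            results.append(list(pool.map(execute, stage)))
    return results


workflow = [
    ["locate the mirror"],                      # performed alone
    ["move left arm", "open right end effector"],  # performed in parallel
]
print(run_workflow(workflow, lambda primitive: f"done: {primitive}"))
```

A purely sequential workflow is the special case in which every stage holds one work primitive, and a purely parallel workflow is the special case of a single stage.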
Advantageously, and in accordance with the present robots, systems, control modules, computer program products, and methods, a catalog of “reusable” work primitives may be defined. A work primitive is reusable if it may be generically invoked, performed, employed, or applied in the completion of multiple different work objectives. For example, a reusable work primitive is one that is common to the respective workflows of multiple different work objectives. In some implementations, a reusable work primitive may include at least one variable that is defined upon or prior to invocation of the work primitive. For example, “pick up *object*” may be a reusable work primitive where the process of “picking up” may be generically performed at least semi-autonomously in furtherance of multiple different work objectives and the *object* to be picked up may be defined based on the specific work objective being pursued.
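A reusable work primitive with a variable that is defined upon invocation can be sketched as a template that is bound to a concrete value for each work objective. The `WorkPrimitive` class and its `bind` method are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkPrimitive:
    # Template with named variables, e.g. "pick up {object}".
    template: str

    def bind(self, **variables: str) -> str:
        """Define the primitive's variables for a specific work objective."""
        return self.template.format(**variables)


PICK_UP = WorkPrimitive("pick up {object}")

# The same primitive is reused across different work objectives.
print(PICK_UP.bind(object="spray bottle"))  # cleaning a mirror
print(PICK_UP.bind(object="chess pawn"))    # kitting a chess set
```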
As stated previously, the various implementations described herein provide robots, systems, control modules, computer program products, and methods where a robot is enabled to at least semi-autonomously perform tasks or complete work objectives. Unless the specific context requires otherwise, the term “autonomously” is used throughout this specification and the appended claims to mean “without control by another party” and the term “semi-autonomously” is used to mean “at least partially autonomously.” In other words, throughout this specification and the appended claims, the term “semi-autonomously” means “with limited control by another party” unless the specific context requires otherwise. An example of a semi-autonomous robot is one that can independently and/or automatically execute and control some of its own low-level functions, such as its mobility and gripping functions, but relies on some external control for high-level instructions such as what to do and/or how to do it.
In accordance with the present robots, systems, control modules, computer program products, and methods, a catalog of reusable work primitives may be defined, identified, developed, or constructed such that any given work objective across multiple different work objectives may be completed by executing a corresponding workflow comprising a particular combination and/or permutation of reusable work primitives selected from the catalog of reusable work primitives. Once such a catalog of reusable work primitives has been established, one or more robot(s) may be trained to autonomously or automatically perform each individual reusable work primitive in the catalog of reusable work primitives without necessarily including the context of: i) a particular workflow of which the particular reusable work primitive being trained is a part, and/or ii) any other reusable work primitive that may, in a particular workflow, precede or succeed the particular reusable work primitive being trained. In this way, a semi-autonomous robot may be operative to autonomously or automatically perform each individual reusable work primitive in a catalog of reusable work primitives and only require instruction, direction, or guidance from another party (e.g., from an operator, user, or pilot) when it comes to deciding which reusable work primitive(s) to perform and/or in what order. In other words, an operator, user, pilot, or LLM module may provide a workflow consisting of reusable work primitives to a semi-autonomous robot system and the semi-autonomous robot system may autonomously or automatically execute the reusable work primitives according to the workflow to complete a work objective. For example, a semi-autonomous humanoid robot may be operative to autonomously look left when directed to look left, autonomously open its right end effector when directed to open its right end effector, and so on, without relying upon detailed low-level control of such functions by a third party. 
Such a semi-autonomous humanoid robot may autonomously complete a work objective once given instructions regarding a workflow detailing which reusable work primitives it must perform, and in what order, in order to complete the work objective. Furthermore, in accordance with the present robots, systems, methods, control modules and computer program products, a robot system may operate fully autonomously if it is trained or otherwise configured to (e.g. via consultation with an LLM module, which can be included in the robot system) analyze a work objective and independently define a corresponding workflow itself by deconstructing the work objective into a set of reusable work primitives from a library of reusable work primitives that the robot system is operative to autonomously perform.
In the context of a robot system, reusable work primitives may correspond to basic low-level functions that the robot system is operable to (e.g., autonomously or automatically) perform and that the robot system may call upon or execute in order to achieve something. Examples of reusable work primitives for a humanoid robot include, without limitation: look up, look down, look left, look right, move right arm, move left arm, close right end effector, open right end effector, close left end effector, open left end effector, move forward, turn left, turn right, move backwards, and so on, as well as cognitive functions like analyze, calculate, plan, determine, reason, and so on; however, a person of skill in the art will appreciate that: i) the foregoing list of exemplary reusable work primitives for a humanoid robot is by no means exhaustive; and ii) in accordance with the present robots, systems, control modules, computer program products, and methods, the high-level functions that a robot is operative to perform are deconstructed or broken down into a set of basic components or constituents, referred to throughout this specification and the appended claims as “work primitives”. Unless the specific context requires otherwise, work primitives may be construed as the building blocks of which higher-level robot functions are constructed.
In some implementations training a robot system to autonomously perform a reusable work primitive may be completed in a real-world environment or a simulated environment. Once a robot has been trained to autonomously perform a catalog of reusable work primitives, operation of the robot may be abstracted to the level of reusable work primitives; e.g. an LLM module which prepares a task plan for the robot may do so by determining which reusable work primitive(s) to perform and, in some implementations, in what order to perform them, and the robot may have sufficient autonomy or automation to execute a complete work objective based on such limited control instructions.
As described previously, “clean a bathroom mirror” is an illustrative example of a work objective that can be deconstructed into a set of work primitives to achieve a goal and for which the outcome is determinable. The goal in this case is a clean bathroom mirror, and an exemplary set of work primitives (or workflow) that completes the work objective is as follows:
A person of skill in the art will appreciate that the exemplary workflow above, comprising nine work primitives, is used as an illustrative example of a workflow that may be deployed to complete the work objective of cleaning a bathroom mirror; however, in accordance with the present robots, systems, control modules, computer program products, and methods the precise definition and composition of each work primitive and the specific combination and/or permutation of work primitives selected/executed to complete a work objective (i.e., the specific construction of a workflow) may vary in different implementations. For example, in some implementations work primitives 3, 4, and 5 above (i.e., locate mirror, aim the cleaning solution at the mirror, and dispense the cleaning solution onto the mirror) may all be combined into one higher-level work primitive as “spray cleaning solution on the mirror” whereas in other implementations those same work primitives may be broken down into additional lower-level work primitives as, for example:
Based on the above example and description, a person of skill in the art will appreciate that the granularity of work primitives may vary across different implementations of the present robots, systems, control modules, computer program products, and methods. Furthermore, in accordance with the present robots, systems, control modules, computer program products, and methods the work primitives are advantageously “reusable” in the sense that each work primitive may be employed, invoked, applied, or “reused” in the performance of more than one overall work objective. For example, while cleaning a bathroom mirror may involve the work primitive “grasp the cleaning solution,” other work objectives may also use the “grasp the cleaning solution” work primitive, such as for example “clean the toilet,” “clean the window,” and/or “clean the floor.” In some implementations, work primitives may be abstracted to become more generic. For example, “grasp the cleaning solution” may be abstracted to “grasp the spray bottle” or “grasp the *object1*” where the *object1* variable is defined as “*object1*=spray bottle”, and “locate the mirror” may be abstracted to “locate the object that needs to be sprayed” or simply “locate *object2*” where “*object2*=mirror”. In such cases, the “grasp the spray bottle” work primitive may be used in tasks that do not involve cleaning, such as “paint the wall” (where the spray bottle=spray paint), “style the hair” (where the spray bottle=hairspray), or “prepare the stir-fry meal” (where the spray bottle=cooking oil spray).
Unless the specific context requires otherwise, throughout this specification and the appended claims reference to an “LLM” or “LLM module” should be construed as including one or more LLM(s) or one or more LLM module(s), and/or one or more application(s) or program(s) that run, access, use, or otherwise leverage at least one LLM. For example, reference to interactions with an LLM or LLM module (e.g. providing input to the LLM, receiving output from the LLM, asking the LLM, querying the LLM, etc.) can be performed through an application or interface which uses the LLM module (e.g. a chat application which accesses an LLM to interpret inputs and formulate outputs, such as OpenAI™'s ChatGPT™ application built on the GPT-3™ family of LLMs).
In some implementations of the present systems, methods, control modules, and computer program products, an LLM is used to assist in determining a sequence of reusable work primitives (hereafter “Instructions”), selected from a finite library of reusable work primitives (hereafter “Instruction Set”), that when executed by a robot will cause or enable the robot to complete a task. In some implementations, an LLM is used to assist in determining a “workflow”. For example, a robot control system may take a Natural Language (NL) command as input and return a Task Plan formed of a sequence of allowed Instructions drawn from an Instruction Set whose completion achieves the intent of the NL input. Throughout this specification and the appended claims, unless the specific context requires otherwise a Task Plan may comprise, or consist of, a workflow depending on the specific implementation. Take as an exemplary application the task of “kitting” a chess set comprising sixteen white chess pieces and sixteen black chess pieces. A person could say, or type, to the robot, e.g., “Put all the white pieces in the right hand bin and all the black pieces in the left hand bin” and an LLM could support a fully autonomous system that converts this input into a sequence of allowed Instructions that successfully performs the task. In this case, the LLM may help to allow the robot to perform general tasks specified in NL. General tasks include but are not limited to all work in the current economy.
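Because a Task Plan must be formed only of allowed Instructions, a robot control system may wish to check an LLM response against the Instruction Set before executing it. The following is a minimal sketch under stated assumptions: the Instruction names, the `name(arg, ...)` line format, and the `parse_task_plan` function are all hypothetical, and the actual format of an LLM response depends on how the prompt is written.

```python
import re

# Hypothetical Instruction Set; real Instructions depend on the specific robot.
INSTRUCTION_SET = {"look_at", "grasp", "place_in", "release"}

# Matches an optional step number followed by name(arg1, arg2, ...).
STEP_PATTERN = re.compile(r"(?:\d+[.)]\s*)?(\w+)\((.*)\)")


def parse_task_plan(response: str) -> list[tuple[str, list[str]]]:
    """Parse an LLM response into Instructions, rejecting any not in the Instruction Set."""
    plan = []
    for line in response.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        match = STEP_PATTERN.fullmatch(line)
        if match is None:
            raise ValueError(f"unparseable step: {line!r}")
        name, raw_args = match.groups()
        if name not in INSTRUCTION_SET:
            raise ValueError(f"instruction not in Instruction Set: {name}")
        args = [a.strip() for a in raw_args.split(",") if a.strip()]
        plan.append((name, args))
    return plan


response = """
1. grasp(white_chess_pawn_1)
2. place_in(white_chess_pawn_1, right_hand_bin)
"""
print(parse_task_plan(response))
```

Rejecting a response that contains an unknown Instruction, rather than attempting to execute it, keeps the robot within the finite set of work primitives it is operative to perform.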
Throughout the present systems, methods, control modules, and computer program products, the term “natural language” refers to any language that has evolved naturally in humans and includes as examples without limitation: English, French, Spanish, Chinese (Mandarin, Yue, Wu, etc.), Portuguese, Japanese, Russian, Korean, Arabic, Hebrew, German, Polish, Hindi, Bengali, Italian, Punjabi, Vietnamese, Hausa, Swedish, Finnish, and so on.
While
Here is a specific example of a prompt and response pair, obtained by running a Python script:
In the above example, the RESPONSE provided by the LLM corresponds to a Task Plan. If a robot system executes the sequence of Instructions specified in the Task Plan, then the task specified in NL via the PROMPT will be successfully completed by the robot system. Throughout this disclosure, the term “motion plan” could be used in place of “task plan”. In this regard, the sequence of instructions specified in the task plan (motion plan) can comprise instructions which cause the robot to undergo a series of motions or movements.
Returning to
At 202, sensor data is captured representing information about an environment of a robot body of the robot system. To this end, the robot body can carry at least one exemplary sensor which captures the sensor data, as discussed later with reference to
At 204, at least one processor of the robot system generates a natural language (NL) description of at least one aspect of the environment based on the sensor data. Such an NL description is referenced (as a scene description) in
In an exemplary implementation, the at least one processor executes an object or feature detection model (e.g. a classification module such as a YOLO model, or any other appropriate model) which identifies objects or features in the environment (as represented in the sensor data), and assigns text labels to such features or objects. Such text labels can be in “robot language”. Throughout this disclosure, the term “robot language” or similar refers to language which is a result of, or intended for, use within a robot or programmatic context, as opposed to natural human language which humans use to communicate with each other. With reference to the chess kit example earlier, a particular chess pawn could be identified in robot language as “chess_pawn_54677”. This is an example of robot language in that underscores are used instead of spaces, and the numerical identifier for the pawn is far larger than a human would use in ordinary conversation.
Regardless, there are commonalities between robot language and human language which can be useful (particularly, common vocabulary). In the example of “chess_pawn_54677”, the terms “chess” and “pawn” are also used in human natural language. In order to generate the NL description of at least one aspect of the environment, the at least one processor can execute a text string matching module which matches text in robot-language text labels to NL vocabulary. For example, an NL description of “chess_pawn_54677” can be generated as “chess pawn 1”. Further, identified objects or features in the environment can also be associated with metadata which can be used in generating the NL description of the environment. For example, the label “chess_pawn_54677” can be associated with metadata indicating a color of the chess pawn (typically “white” or “black”). The at least one processor can use this metadata to generate an NL description of “chess_pawn_54677” as “white chess pawn 1”, for example. The inclusion of metadata is not necessary, however. For example, the label itself could indicate such information (e.g. “white_chess_pawn_54677”).
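The matching of robot-language labels to NL vocabulary can be sketched as follows. This is one possible approach under stated assumptions: the `NL_VOCABULARY` set, the `SceneNamer` class, and the `"color"` metadata key are hypothetical, and the short per-scene index (e.g. "1") is assigned in the order in which labels are first seen.

```python
from typing import Optional

# Hypothetical NL vocabulary used for string matching.
NL_VOCABULARY = {"white", "black", "chess", "pawn", "rook", "bishop", "bin"}


class SceneNamer:
    """Maps robot-language labels to short NL descriptions within one scene."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}
        self._names: dict[str, str] = {}

    def name(self, label: str, metadata: Optional[dict] = None) -> str:
        if label in self._names:
            return self._names[label]  # same object keeps the same description
        # Keep only the tokens that match NL vocabulary (drops e.g. "54677").
        words = [w for w in label.lower().split("_") if w in NL_VOCABULARY]
        # Optionally enrich the description with metadata such as color.
        color = (metadata or {}).get("color")
        if color and color not in words:
            words.insert(0, color)
        base = " ".join(words)
        # Replace the large robot-language identifier with a small index.
        self._counts[base] = self._counts.get(base, 0) + 1
        description = f"{base} {self._counts[base]}"
        self._names[label] = description
        return description


namer = SceneNamer()
print(namer.name("chess_pawn_54677", {"color": "white"}))
print(namer.name("chess_pawn_54678", {"color": "white"}))
print(namer.name("black_chess_pawn_60012"))
```

Keeping the mapping from label to description also allows the robot system to translate an NL reference in an LLM response back to the robot-language label.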
Additional NL descriptions of other aspects of the environment can also be generated. With reference to the exemplary prompt discussed above, NL descriptions for several different chess pieces, the bins, the person, and the table are generated. Such NL descriptions can be generated in a manner similar to that discussed above.
Further, generating the NL description of the environment is not necessarily limited to generating NL descriptions of objects or features in the environment. In some implementations, locations or placement of such objects or features can also be described. With reference to the exemplary prompt discussed above, the sentence “There is white chess pawn number 1, white chess pawn number 2, black chess pawn number 1, black chess pawn number 2, a white chess rook, a black chess bishop and two blue bins on the table, and a person stood opposite me.” describes several objects in the environment, as well as their positions. Such an NL description could be generated by the at least one processor by, for example, populating a template description with a list of objects, based on where said objects fit into the template.
At 206, the NL query is provided to a large language model (LLM) module. In some implementations, the LLM module is a software or data module stored on at least one non-transitory processor-readable storage medium of the system (either at the robot body or at a robot controller remote from the robot body). In such implementations, the NL query can be prepared and provided as input to the LLM module by the at least one processor of the robot system. In other implementations, the LLM module can be a software or data module stored at a non-transitory processor-readable storage medium of a device separate from the robot system. In yet other implementations, the LLM module can refer to a hardware module which receives input prompts and executes the LLM module on the inputs. In such other implementations, the NL query can be prepared by the at least one processor of the robot system, and provided to the device where the LLM module is located via a communication interface of the robot system. As one specific example, the LLM module can be stored at one or more servers remote from the robot body, and may take prompts as input via a website, form, or appropriate API. The at least one processor of the robot system can prepare the NL query in the appropriate format, and the robot system can send the NL query via the communication interface of the robot system.
The NL query provided to the LLM module includes the NL description of at least one aspect of the environment, as generated at 204. Additionally, the NL query includes an NL description of a work objective, an NL description of an Instruction Set executable by the robot system, and an NL request for a task plan. Each of these NL descriptions is described in detail below.
As mentioned earlier, a work objective generally refers to a particular task, job, assignment, or application that has a specified goal and determinable outcome. An NL description of such a work objective is an expression of such a work objective in a format natural to humans. With reference to the example of kitting the chess set discussed earlier, the prompt includes the phrase “My goal is to do what the person says.” This in itself can be considered as an NL description of a work objective, but further input provides more details on specific goals the robot system should complete. In the example, the prompt also includes the phrase “The person says ‘Put all the white pieces in the right hand bin and all the black pieces in the left hand bin’.” This can also be considered an NL description of a work objective, and provides a specific statement on what the robot system is expected to do. In some implementations, the NL description of the work objective comprises the entirety of the two phrases “My goal is to do what the person says. The person says ‘Put all the white pieces in the right hand bin and all the black pieces in the left hand bin’.”
The NL description of the work objective can be based on various information or data. In some implementations, an indication of the work objective may be made explicitly available to the robot system (e.g. sent from a management device or server to the robot system, or stored in at least one non-transitory processor-readable storage medium of the robot system). The indication of the work objective can be made available to the robot system in an NL format, such that the at least one processor only needs to access the indication of the work objective and provide the same to the LLM module. In this sense, it is not necessary for the at least one processor of the robot system to generate the NL description of the work objective, but rather an existing NL description of the work objective can be provided to the LLM module. Alternatively, the indication of the work objective may not be in NL format (e.g. it may be in robot-language), and the at least one processor may generate the NL description of the work objective based on the indication of the work objective (e.g. by executing a robot-language conversion module such as a text-string matching module similar to that discussed earlier). In other implementations, the at least one processor of the robot system may generate the NL description of the work objective based on other information, such as a role in which the robot is deployed. In this context, a “role” generally refers to a category of purposes which a robot may serve within a pertinent environment. For example, a janitorial robot can serve a role of cleaning up a particular area or facility. In such a case, with reference to the earlier example of kitting the chess set, the at least one processor may generate an NL description of a work objective as “Clean up loose chess pieces and place in appropriate bins”.
NL descriptions of work objectives can be generated based on any appropriate additional information. In another case, capabilities of the robot system may be accounted for when generating an NL description of a work objective. For example, a robot body which lacks locomotion elements can only successfully complete work objectives in an immediate area of the robot body.
As mentioned earlier, the Instruction Set executable by the robot system can be a library of reusable work primitives, such as “grasp object”, “place object on object”, or any other appropriate action. These examples are presented here in natural language, but may be stored and accessed in a robot-language form, such as “grasp(object)” or “place(object1, object2)” (as non-limiting examples). In the example of kitting the chess set discussed earlier, the “options” 1, 2, 3, 4, 5, and 6 represent an NL description of the Instruction Set executable by the robot system. The exemplary prompt also includes qualifying statements “I can only choose strictly from the following options:” and “Where object should be replaced with an appropriate object on the table.”, which provide the LLM module additional information on how the NL description of the Instruction Set should be interpreted, used, or applied. Such qualifying statements can be added, removed, or altered as appropriate for a given application and a given Instruction Set. Such qualifying statements could further be included in the NL query, for example by inclusion in a template on which the NL query is based, as is discussed in more detail later.
In some implementations, the NL description of the Instruction Set can be pre-generated and loaded to the robot system, such that the robot system can provide this pre-generated NL description of the Instruction Set to the LLM module at 206. For example, a management device, server, or configuration device can generate the NL description of the Instruction Set, which can be stored at a non-transitory processor-readable storage medium of the robot system for subsequent access (e.g. during configuration or deployment of the robot system). As a specific example, a reusable work primitive “place(object1, object2)” can be stored with metadata of an NL description of the reusable work primitive as “place object on object”. Such NL descriptions of instructions can be provided manually by a human, or can be generated by at least one processor (and possibly reviewed and/or modified by a human for accuracy).
In some implementations, the NL description of the Instruction Set can be generated by the robot system. Regardless of where generation of the NL description of the Instruction Set is performed (in examples where the NL descriptions are generated by at least one processor), the at least one processor which performs the generation can execute a robot-language conversion module which generates an NL description of each instruction in the Instruction Set, based on the respective instruction as expressed in robot-language. As discussed earlier, such a robot-language conversion module can comprise a text-string matching module operable to compare robot-language instructions in the Instruction Set to natural language vocabulary representative of actions performable by the robot system. Matching text strings can be identified for inclusion in the NL description of the Instruction Set. As an example, for the instruction “grasp(object)”, the text-string matching module can identify “grasp” and “object” as corresponding to NL vocabulary. Further, the at least one processor can infer, based on general structure of programmatic functions, that the intent of this instruction is to cause a robot to “grasp” the input “object”. To this end, the NL description “grasp object” can be generated for this instruction. As another example, for the instruction “place(object1, object2)”, the text-string matching module can identify “place” and “object” as corresponding to NL vocabulary. Further, the at least one processor can infer, based on general structure of programmatic functions, that the intent of this instruction is to cause a robot to “place” the input “object1” on the input “object2”. To this end, the NL description “place object on object” can be generated for this instruction.
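As a non-limiting illustration, a robot-language conversion module of this kind could be sketched as follows. The function name `instruction_to_nl` and the `CONNECTIVES` table are hypothetical; in particular, the choice of connecting word (e.g. “on”) is an assumption of this sketch, since the disclosure states only that intent is inferred from the general structure of programmatic functions.

```python
import re

# Hypothetical table of connecting words per instruction name (an assumption).
CONNECTIVES = {"place": "on", "drop": "in"}


def instruction_to_nl(signature, vocabulary):
    """Turn e.g. 'place(object1, object2)' into 'place object on object'."""
    match = re.fullmatch(r"\s*(\w+)\s*\(([^)]*)\)\s*", signature)
    if match is None:
        raise ValueError(f"not a function-style instruction: {signature!r}")
    name, raw_args = match.groups()
    # Strip trailing digits so 'object1' and 'object2' both read as 'object'.
    args = [re.sub(r"\d+$", "", a.strip()) for a in raw_args.split(",") if a.strip()]
    # Text-string matching: every word must correspond to NL vocabulary.
    unknown = [w for w in [name, *args] if w not in vocabulary]
    if unknown:
        raise ValueError(f"no NL vocabulary match for: {unknown}")
    connective = f" {CONNECTIVES.get(name, 'and')} "
    return f"{name} {connective.join(args)}".strip()


vocabulary = {"grasp", "place", "object"}
for instruction in ("grasp(object)", "place(object1, object2)"):
    print(instruction_to_nl(instruction, vocabulary))
```

Raising an error on unmatched words, rather than guessing, leaves room for a human to supply or review the NL description, as contemplated above.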
The NL request for a task plan generally refers to a statement or phrase intended to tell the LLM module what to do with the other information in the NL query. With reference to the example of kitting the chess set discussed above, the NL request for a task plan is the phrase “Here is a list of the above commands I should perform in order to complete my goal:”. In the example, this phrase is intended to inform the LLM module that it is to generate a list of commands selected from the Instruction Set, in order to accomplish the stated goal (work objective, as discussed earlier). The NL request for a task plan can be generated, for example, by the at least one processor of the robot system based on a request template. As another example, the NL request can be included in an NL query template which is utilized to structure or format the NL query, as discussed below.
In some implementations, at least one non-transitory processor-readable storage medium of the robot system stores at least one pre-generated NL template. Such an NL template can include any or all of respective template aspects for an NL description of at least one aspect of an environment, an NL description of a work objective, an NL description of an Instruction Set, and/or an NL request for a task plan. An exemplary template is discussed below, with reference to generation of the example prompt discussed earlier for kitting the chess set. However, other exemplary templates could be used, in different scenarios, to generate different NL queries. The discussed non-limiting exemplary NL template could be:
In the above template, elements in square brackets can be populated by the at least one processor inserting appropriate NL descriptions. In particular, object_array_1 represents at least one array of objects at position_1 (for example, an array of chess pieces on a table). In this example, the at least one processor can replace the text [object_array_1] with an NL description of the chess pieces, and the text [position_1] with an NL description of the table. Further in the example, object_array_2 represents a person standing at a position_2 opposite the robot body. In this example, the at least one processor can replace the text [object_array_2] with an NL description of the person, and the text [position_2] with an NL description of the person's position. Further, from the text [(,)(and)], the at least one processor can select either “,” or “and”, to connect the text regarding object_array_1 and object_array_2 in a natural way (based on whether there is an object_array_3). In the example, there are no additional object arrays (no object_array_3 or object_array_j), so the at least one processor selects the connecting text “and”. Further, because there are no additional object arrays, the at least one processor deletes or ignores (e.g. replaces with no text) the text “[object_array_j] at [position_j]”.
As a result of the above steps, the first sentence of the NL query as generated based on the template can be “There is white chess pawn number 1, white chess pawn number 2, black chess pawn number 1, black chess pawn number 2, a white chess rook, a black chess bishop and two blue bins at a table, and a person stood opposite me.” This is similar to the exemplary prompt as discussed earlier, but for the chess pieces being described as being “at a table” instead of “on the table”. To improve generation of the NL query, the template can include options for transitional or locational words like “on” or “at”, such that the at least one processor can select the most natural word for a given scenario.
Returning to the above template, the at least one processor can replace the text [work_objective] with an NL description of the work objective of the robot system. In the example, the at least one processor can replace the text [work_objective] such that the second sentence of the NL query is “My goal is to do what the person says. The person says ‘Put all the white pieces in the right hand bin and all the black pieces in the left hand bin’.”, similar to the prompt in the example discussed earlier.
Further, the at least one processor can replace the text for the available instructions “1.)[reusable_work_primitive_1 (variable)] . . . k.)[reusable_work_primitive_k(variable)]” with NL descriptions of each available reusable work primitive. In the example scenario, the text for available instructions can be replaced with “1.) Grasp object”, “2.) Place object on object”, “3.) Look at object”, “4.) Slice object”, “5.) Say hello to the person”, and “6.) Drop object in object”, as in the exemplary prompt discussed earlier.
In the example, the “variable” for each reusable work primitive is replaced with appropriate text of “object” or “person”, depending on what the given reusable work primitive is applicable to. Further, the text [variable] and [position_1] . . . [position_j] in the second to last sentence of the template are also replaced with appropriate text of “object”, and relevant positions of the objects. In this regard, the second to last sentence of the generated NL query reads “Where object should be replaced with an appropriate object at the table”, as in the exemplary prompt presented earlier.
In view of the above, by replacing or inputting select elements in a pre-generated template, an NL query is generated which is suitable for provision to an LLM module.
While the above describes the NL template as text where certain elements are “replaced” or “input”, this is not strictly necessary in terms of implementation. For example, instead of “replacing” text in a literal sense, the NL template can also be implemented as a set of instructions or functions (e.g. a program or a script) which pieces base sentences together with relevant elements in a piece-wise manner. In such an example, the NL query is “assembled” as pieces instead of elements being literally “replaced”. In this sense, the presented NL template is intended to be a logical representation of how elements can be pieced together, rather than a strict process by which text generation actually occurs.
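As a non-limiting illustration of such a piece-wise implementation, the following sketch assembles an NL query from its constituent elements. The function names and argument layout are hypothetical, and the fixed sentences are taken from the exemplary prompt phrases quoted in this description; this is a sketch under those assumptions rather than a required implementation.

```python
def join_items(items):
    """Join NL items as 'a', 'a and b', or 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def build_nl_query(object_arrays, work_objective, primitives, variable, location):
    """Assemble an NL query piece by piece.

    object_arrays: list of (list of NL object descriptions, NL position phrase).
    primitives:    NL descriptions of the reusable work primitives, in order.
    """
    clauses = [f"{join_items(objs)} {pos}" for objs, pos in object_arrays]
    # Connect clauses with "," and place "and" before the last clause.
    if len(clauses) > 1:
        scene = ", ".join(clauses[:-1]) + ", and " + clauses[-1]
    else:
        scene = "".join(clauses)
    lines = [
        f"There is {scene}.",
        work_objective,
        "I can only choose strictly from the following options:",
    ]
    lines += [f"{k}.) {p}" for k, p in enumerate(primitives, start=1)]
    lines.append(
        f"Where {variable} should be replaced with an appropriate {variable} {location}."
    )
    lines.append(
        "Here is a list of the above commands I should perform in order to complete my goal:"
    )
    return "\n".join(lines)


query = build_nl_query(
    object_arrays=[
        (["white chess pawn number 1", "white chess pawn number 2",
          "black chess pawn number 1", "black chess pawn number 2",
          "a white chess rook", "a black chess bishop", "two blue bins"],
         "on the table"),
        (["a person"], "stood opposite me"),
    ],
    work_objective="My goal is to do what the person says. The person says "
                   "'Put all the white pieces in the right hand bin and all "
                   "the black pieces in the left hand bin'.",
    primitives=["Grasp object", "Place object on object", "Look at object",
                "Slice object", "Say hello to the person", "Drop object in object"],
    variable="object",
    location="on the table",
)
print(query)
```

Because the locational phrase (“on the table”) is passed in as data, the most natural word for a given scenario can be selected by the caller.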
Returning to method 200 in
At 210, the robot system executes the task plan. For example, the at least one processor of the robot controller can cause at least one element (e.g. actuatable element) to perform any actions specified in the task plan.
As mentioned above, the task plan provided by the LLM module is expressed in NL. For example, the task plan can indicate at least one action performable by the robot system expressed in NL. In order for the robot system to execute the task plan, the at least one processor can first generate a robot-language task plan based on the task plan as expressed in NL. The robot-language task plan can comprise a set of robot control instructions which when executed by the at least one processor cause the robot system to perform the at least one action indicated in the task plan. For example, the set of robot control instructions can comprise a library or set of at least one reusable work primitive executable by the robot system. Further, the at least one action indicated in the task plan as expressed in NL can comprise an NL description of a particular reusable work primitive (e.g. grasp chess pawn 1), whereas the robot control instructions in the robot-language task plan can comprise actions of the NL task plan, but specified in a language format usable by the robot system (e.g. grasp(chess_pawn_54677)).
As described earlier, generating the robot-language task plan can comprise executing a robot-language conversion module which converts the at least one action performable by the robot system as expressed in NL to at least one reusable work primitive in the Instruction Set executable by the robot system. With reference to the example where the NL task plan includes an action expressed in NL as “grasp chess pawn 1”, the robot-language conversion module can match text strings in the action as expressed in NL to text strings available in reusable work primitives usable by the robot system (e.g. grasp(object)), or objects in the environment with which the robot system can interact (e.g. chess_pawn_54677). As a result, the at least one processor can generate robot-language actions such as grasp(chess_pawn_54677).
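As a non-limiting illustration, converting an NL action to a robot-language action by text-string matching could be sketched as follows. The function name `nl_action_to_robot`, the two lookup dictionaries, and the label `bin_00012` are hypothetical and introduced only for this sketch; dictionary keys are assumed to be lowercase.

```python
import re


def nl_action_to_robot(action, primitives, objects):
    """Map e.g. 'grasp chess pawn 1' to 'grasp(chess_pawn_54677)'.

    primitives: lowercase NL verb phrase -> robot-language primitive name.
    objects:    lowercase NL object name -> robot-language label.
    """
    text = action.strip().lower()
    # Pick the longest primitive phrase that starts the action.
    verb = max(
        (v for v in primitives if text == v or text.startswith(v + " ")),
        key=len,
        default=None,
    )
    if verb is None:
        raise ValueError(f"no matching primitive for: {action!r}")
    args = []
    if objects:
        # Longest names first, with word boundaries, so that
        # 'chess pawn 1' does not match inside 'chess pawn 10'.
        names = sorted(objects, key=len, reverse=True)
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
        args = [objects[m.group(1)] for m in pattern.finditer(text[len(verb):])]
    return f"{primitives[verb]}({', '.join(args)})"


primitives = {"grasp": "grasp", "place": "place"}
objects = {"chess pawn 1": "chess_pawn_54677", "blue bin 1": "bin_00012"}
print(nl_action_to_robot("grasp chess pawn 1", primitives, objects))
```

Arguments are emitted in the order in which the object names appear in the NL action, which suits instructions such as “place object on object”.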
In some implementations, an LLM module may be used to autonomously troubleshoot a task plan. For example, if a given task plan fails to execute (i.e., fails to be validated, fails to proceed through to completion, and/or fails to complete an intended task) or encounters an error, an NL prompt can be sent (back) to the LLM module including all of the successful parts of the task plan executed or validated, with additional verbiage describing what failed and asking the LLM module what to do next. In addition, an external checker can review or validate a proposed plan and reject it for some reason. The external checker could be a logic-based system or reasoning engine, such as the CYC® machine reasoning AI platform from Cycorp Inc., as a non-limiting example. Reasoning engines (sometimes called inference engines) can utilize a library of logical rules, statements, terms, pieces of knowledge, or similar, and can make logical conclusions based on the same. In this way, a task plan as referenced in method 200 can be validated by a reasoning engine, by comparing the task plan to a set of rules (or similar) specified at least in part by a reasoning engine. That is, at least a part of the logic of a reasoning engine can be applied to a task plan to validate whether the task plan makes logical sense, and/or to identify any logical inconsistencies or impossibilities in the task plan. A reason for rejecting could be, for example, a safety violation in relation to robot safety or safety of any human or other living being. In the event of a rejection, an NL prompt could be sent back to the LLM module, modified to prevent a subsequent plan from failing the external check.
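As a non-limiting illustration, the troubleshooting and external-check flow could be sketched as follows. Everything named here is hypothetical: `ask_llm` stands in for whatever interface reaches the LLM module and is assumed to return a list of NL steps, the simple rule list stands in for a reasoning engine, and the prompt wording is illustrative only.

```python
def build_retry_prompt(original_query, completed_steps, failure):
    """Compose an NL prompt listing successful steps and describing the failure."""
    lines = [original_query.rstrip(), ""]
    if completed_steps:
        lines.append("So far I have successfully completed these steps:")
        lines += [f"{i}.) {step}" for i, step in enumerate(completed_steps, 1)]
    lines.append(f"The next step failed: {failure}")
    lines.append("What should I do next?")
    return "\n".join(lines)


def check_plan(plan, rules):
    """Return the reasons for every rule that any step of the plan violates."""
    return [reason for violates, reason in rules if any(violates(s) for s in plan)]


def plan_with_validation(query, ask_llm, rules, max_attempts=3):
    """Ask for a plan, re-prompting with rejection reasons until it passes."""
    prompt = query
    for _ in range(max_attempts):
        plan = ask_llm(prompt)
        violations = check_plan(plan, rules)
        if not violations:
            return plan
        # Modify the prompt so the next plan can avoid the same rejection.
        prompt = (
            f"{query}\nThe previous plan was rejected because: "
            f"{'; '.join(violations)}.\nPlease propose a plan that avoids this."
        )
    raise RuntimeError("no plan passed the external check")
```

Bounding the number of attempts keeps the robot system from re-prompting indefinitely when no acceptable plan is produced.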
In some implementations, an LLM may help to autonomously assign parameters or definitions to generalized and/or parameterized objects in a robot control system. For example, parameterized work primitives or “Instructions” can be assigned by the LLM as in the case of the chess kitting example above. As another example, if a task plan successfully executes, the successful task plan can be stored and then re-parameterized to become generalized. When the robot encounters a future instance of a similar task, it can recall the stored successful task plan and ask the LLM module (e.g., via a simple NL prompt) to replace the parameterized objects from the previously successful instance of the task plan with new objects specific to the current instance of the task plan. For example, if a plan was generated to successfully sort two types of specific object, a robot can re-use it by asking the LLM to replace those objects with different objects.
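As a non-limiting illustration, storing a successful task plan in generalized form and preparing a re-use prompt could be sketched as follows. The placeholder syntax, function names, and prompt wording are hypothetical; the sketch only prepares the NL prompt and does not itself call an LLM module.

```python
def generalize_plan(plan, bindings):
    """Replace specific NL object names in a plan with named placeholders.

    bindings: placeholder name -> NL object name used in the successful plan.
    """
    # Replace longer names first so a short name cannot clobber a longer one.
    ordered = sorted(bindings.items(), key=lambda kv: len(kv[1]), reverse=True)
    generalized = []
    for step in plan:
        for placeholder, name in ordered:
            step = step.replace(name, f"[{placeholder}]")
        generalized.append(step)
    return generalized


def build_reuse_prompt(generalized_plan, scene_description):
    """Compose a simple NL prompt asking for placeholders to be filled in."""
    steps = "\n".join(f"{i}.) {s}" for i, s in enumerate(generalized_plan, 1))
    return (
        f"{scene_description}\n"
        "Here is a plan that worked for a similar task:\n"
        f"{steps}\n"
        "Replace each item in square brackets with an appropriate object "
        "from the current scene:"
    )


stored = generalize_plan(
    ["Grasp white chess pawn 1", "Drop white chess pawn 1 in right hand bin"],
    {"item_a": "white chess pawn 1", "container_a": "right hand bin"},
)
print(build_reuse_prompt(stored, "There is a red block and a green bin on the table."))
```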
Various implementations of the present systems, methods, control modules, and computer program products involve using NL expressions (descriptions) (e.g., via an NL prompt, which may be entered directly in text by a user or may be spoken vocally by a user and converted to text by an intervening voice-to-text system) to control functions and operations of a robot, where an LLM module may provide an interface between the NL expressions and the robot control system. This framework can be particularly advantageous when certain elements of the robot control architecture employ programming and/or instructions that can be expressed in NL. A suitable, but non-limiting, example of this is the aforementioned Instruction Set. For example, as mentioned earlier, a task plan output of an LLM module can be parsed (e.g., autonomously by the robot control system) by looking for a word match to Instruction Set commands, and the arguments of the Instruction Set can be found by string matching within the input NL prompt (e.g. by a text-string matching module as discussed earlier). In some implementations, a one-to-one map may be generated between the arguments used in the robot control system and NL variants, in order to increase the chance of the LLM module processing the text properly. For example, even though an object is represented in the robot control system (e.g., in a world model environment portion of the robot control system) as chess_pawn_54677, it may be referred to in the NL prompt as “chess pawn 1”. In this case, if the returned task plan contains the phrase “grasp chess pawn 1”, this may be matched to the Instruction Set command “grasp” and the object “chess pawn 1”, so the phrase may be mapped to grasp(chess_pawn_54677). Such parsing and/or word matching (e.g. the text-string matching module) can be employed in any of the situations discussed herein where robot language is converted to natural language or vice-versa.
In some implementations, a robot control system may generate and/or employ a scene graph describing the robot's environment, and a function may be applied to act on the scene graph and create an NL prompt or description describing the scene from the robot's perspective (e.g. in the context of act 204 of method 200). This auto-generated NL prompt or description may then be used as an input into an LLM module in order to facilitate various operations, such as reasoning, fact-checking, and task planning.
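As a non-limiting illustration, a function acting on a scene graph to create an NL description could be sketched as follows. The representation of the scene graph as a list of (object, relation, reference) triples already expressed in NL words is an assumption of this sketch, as is the function name `describe_scene`.

```python
def describe_scene(scene_graph):
    """Create an NL description from (object, relation, reference) triples."""
    # Group objects that share the same relation to the same reference.
    grouped = {}
    for obj, relation, reference in scene_graph:
        grouped.setdefault((relation, reference), []).append(obj)
    sentences = []
    for (relation, reference), objs in grouped.items():
        verb = "is" if len(objs) == 1 else "are"
        if len(objs) == 1:
            listed = objs[0]
        else:
            listed = ", ".join(objs[:-1]) + " and " + objs[-1]
        # Capitalize only the first character to leave names untouched.
        sentences.append(f"{listed[0].upper()}{listed[1:]} {verb} {relation} {reference}.")
    return " ".join(sentences)


scene_graph = [
    ("a white chess rook", "on", "the table"),
    ("two blue bins", "on", "the table"),
    ("a person", "opposite", "me"),
]
print(describe_scene(scene_graph))
```

Describing relations relative to “me” reflects that the description is generated from the robot's perspective.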
In some implementations, the quality of a task plan may depend, at least in part, on the robot's knowledge of its environment, so the robot control system may regularly check and compare its scene graph and inner world model in the background. In accordance with the present systems, methods, control modules and computer program products, this checking and comparing of the scene graph (e.g., actual data from the robot's external environment) and inner world model (e.g., robot's simulation of its external environment) can be done by automatically generating NL prompts or descriptions of each and feeding these NL prompts or descriptions through an LLM module.
In some implementations, an LLM module used as a task planner may be engaged frequently by a robot control system to answer the question “what could I (the robot) do here/now?”. For example, the robot control system may automatically generate an NL description of at least one aspect of its environment (scene graph) and capabilities (Instruction Set) and feed these NL descriptions into the LLM module along with the query “What can I do?” or “What should I do?” (or similar variations, such as “What would be the most useful thing for me to do?”, “What is a productive thing that I could do?”, etc.). A set of answers to this or similar questions can each then be run through generation of task plans (e.g., as described above with reference to method 200 in
Some task plans may contain steps that cannot be resolved to Instruction Set elements and are inherently computational. For example, a task plan may require the computation of an integral, or some other computational process, that might not be possible given a particular Instruction Set. In these cases, the robot system can send these task plan steps to an LLM-based system or LLM module with a request for the generation of a piece of code, for example a Python script, that defines a function to execute the task. In some implementations, that script can then live in a “code repository” where human engineers look at all the scripts auto-generated by the background “what could I do here?” process, and check that they do what is intended. Such scripts generated by an LLM-based device or module can provide new Instruction Set elements that can be called to “unlock” task plans that were blocked by not having access to an appropriate instruction, or can be otherwise accessible to the robot system for incorporation and use in task plans.
In some implementations, an LLM module may be stored and executed outside of a robot (e.g., in the cloud) and called or accessed by a robot system (as illustrated in the example of
Robot body 501 further includes at least one sensor 503 that detects and/or collects data about the environment and/or objects (e.g., including people, such as customers) in the environment of robot system 500. In the illustrated implementation, sensor 503 corresponds to a sensor system including a camera, a microphone, and an inertial measurement unit that itself comprises three orthogonal accelerometers, a magnetometer, and a compass. However, any appropriate sensor could be included or excluded in the at least one sensor 503, as appropriate for a given application. Sensor data such as captured in act 202 of method 200 can be captured for example by sensor 503.
For the purposes of illustration,
In some implementations, actions or processes can be performed entirely locally at robot body 501. For example, in some implementations the entirety of method 200 can be performed locally at robot body 501. In such implementations, the at least one sensor 503 captures the sensor data in act 202, and the at least one processor 530 generates the NL description of the at least one aspect of the environment in act 204. The at least one processor 530 can further generate any of the other NL descriptions included in the NL query as discussed earlier. Further in such implementations, memory 540 also stores an LLM module, to which the NL query is provided in act 206. Providing the NL query in such cases can refer to the at least one processor 530 executing the LLM module, with the NL query as input. Further, receiving the task plan from the LLM module as in act 208 of method 200 can comprise the at least one processor 530 receiving the task plan as output by the LLM module. Executing the task plan as in act 210 comprises the at least one processor 530 executing instructions which cause robot body 501 to perform actions specified in the task plan.
In some implementations, actions or processes can be performed either locally at robot body 501, or separately by a device separate from the robot body 501. In this regard, the at least one processor 530 is also communicatively coupled to a wireless transceiver 550 via which robot body 501 sends and receives wireless communication signals 570.
In particular, separate device 580 is also illustrated as including at least one processor 582 communicatively coupled to wireless transceiver 581, and at least one non-transitory processor-readable storage medium 590 (or “memory” 590) communicatively coupled to the at least one processor 582. Memory 590 stores data 591 and processor-executable instructions 592 (e.g., together as a robot control module or computer program product) that, when executed by processor 582, cause separate device 580 (or components thereof) to perform actions and/or functions in association with the present systems, robots, methods, robot control modules, and computer program products. Memory 590 can also store an LLM module. Alternatively, separate device 580 can access an LLM module stored at yet another device (e.g. a cloud or internet based LLM module).
Methods or processes discussed herein (e.g. method 200 in
The various implementations described herein include systems, methods, control modules, and computer program products for leveraging one or more LLM(s) in a robot control system, including for example establishing an NL interface between the LLM(s) and the robot control system and calling the LLM(s) to help autonomously instruct the robot what to do. Example applications of this approach include task planning, motion planning, reasoning about the robot's environment (e.g., “what could I do now?”), and so on. Such implementations are particularly well-suited in robot control systems for which at least some control parameters and/or instructions (e.g., the Instruction Set described previously) are amenable to being specified in NL. Thus, some implementations may include converting or translating robot control instructions and/or parameters into NL for communicating such with the LLM(s) via the NL interface.
The various implementations described herein also include systems, methods, and computer program products for leveraging one or more LLM(s) in or by a robot system in order to enhance interactions with one or more humans or persons in the environment of a robot body. For example, in some implementations a robot system with access to an LLM may serve as a customer-facing omnichannel input/output point (i.e., interface) for a business, commercial enterprise, retailer, or any engagement where targeting knowledge-based interactions with individual people is desired. As a specific non-limiting example, a robot system with access to an LLM may operate at a retail store and be stationed “on the floor” either as a greeter or more generally available throughout the retail area. Such a robot may have access to all customer data available to the retail store and a means for identifying individual customers. An example of a means of identifying customers includes automatic facial recognition, where the robot system may be equipped with, or have access to, a camera and access to suitable facial recognition software including, for example, a database of facial images for at least some of the store's customers. In some implementations customers may need to “opt-in” to facial recognition by granting the retail store and/or robot system rights to capture and/or store their facial image and identity information. Another example of a means of identifying customers includes registration and/or detection of a customer identification, such as for example by having the robot system scan or otherwise detect a customer ID card carried by the customer, or similar. In any case, a robot system with access to an LLM that also has access to customer data may leverage both to create personal interactions with a customer.
Examples of personal interactions include, without limitation: conversation, including asking questions based on knowledge about the customer (e.g., “how is your (daughter), (name)?”), discussing past purchases the customer has made at the store (e.g., “did you enjoy (mint chocolate chip ice cream)?”), and advising the customer about promotions or sales that the customer is likely to be interested in based on their customer profile (including purchase history). Such a robot system is well-suited to answer questions or queries from the customer as well (e.g., by converting spoken language into NL text, inputting the NL text into the LLM along with a query for a response, receiving an NL response from the LLM, and converting the NL response to spoken language by the robot system—where the LLM may leverage or otherwise have access to the customer's data in its knowledge base), such as “Where can I find X?”, “What size of Y will be best for me?”, and generally any question the customer may typically have for an employee of the retail store.
Importantly, customer data accessed by the LLM (e.g., in the example above or otherwise) may come from a wide range of different sources. For example, the customer data may include, without limitation, information collected: by the brick and mortar store such as the customer's on-site purchase history and visitation schedule; by the robot system in observing and interacting with the customer at the store; by the store's website based on how the customer interacts with the store's website, including through their laptop, PC, smartphone, etc.; and from other sources where the retail store may lawfully access information about the customer (such as website or application data from other third party websites or applications accessed by the customer via their laptop, PC, smartphone, or other connected device). This synthesis of customer data from multiple touch points, or “channels”, is referred to as “omnichannel” and the various implementations described herein include omnichannel customer experiences in which a robot system with access to an LLM may serve as both a source of data (i.e., a “channel” into the omnichannel system) and as a platform for leveraging the data to enhance the customer's experience. An example of this omnichannel system is illustrated in
Robot body 602 may also serve as an “output” of Customer Data 610 to customer 601. For example, robot body 602 may process, either directly or through LLM 603, Customer Data 610 to deliver an interaction to customer 601. Exemplary details of such interactions are described in relation to
Returning to
At 701, the robot system identifies a person in the environment of the robot system. As previously described, in order to identify the person the robot system may, for example: i) capture, by at least one camera, an image of a face of the person and determine an identity of the person based on the image of the face of the person (i.e., by employing conventional known software and algorithms for facial recognition based on camera images); and/or ii) scan, by at least one sensor, an identifier associated with the person (such as a QR code or barcode on a badge or card carried by the person) and determine an identity of the person based on the identifier associated with the person (e.g., by looking up a scan result in a customer database).
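As a non-limiting illustration, the two identification options described for act 701 might be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the names `Identity`, `identify_person`, `id_index`, and `match_face` are hypothetical, and `match_face` merely stands in for whatever facial recognition software a given implementation employs.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Identity:
    person_id: str
    method: str  # "identifier" or "face"


def identify_person(
    scanned_code: Optional[str],
    face_image: Optional[bytes],
    id_index: Mapping[str, str],
    match_face: Callable[[bytes], Optional[str]],
) -> Optional[Identity]:
    """Return the identity of a person, or None if no match is found.

    id_index (hypothetical) maps a scanned code, such as a QR code or barcode
    payload, to a person ID. match_face (hypothetical) stands in for facial
    recognition software and returns a person ID or None.
    """
    # Option ii): look up the scanned identifier in the customer database.
    if scanned_code is not None and scanned_code in id_index:
        return Identity(id_index[scanned_code], "identifier")
    # Option i): otherwise, try facial recognition on the captured image.
    if face_image is not None:
        person_id = match_face(face_image)
        if person_id is not None:
            return Identity(person_id, "face")
    return None
```

In this sketch the scanned identifier is tried first only because it is a simple exact lookup; an implementation may equally try facial recognition first or combine both results.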
At 702, the robot system accesses information about the person. For example, the robot system may access customer data (such as Customer Data 610) stored in a database. In some implementations, at 702 the robot system may retrieve digital information about the person from a database of digital information about multiple people, the database stored in a non-transitory processor-readable storage medium. As previously described, the digital information in the database may be collected through multiple channels and may include, without limitation: purchasing histories of the multiple people; location histories of the multiple people; internet browsing histories of the multiple people; account profiles of the multiple people; event history of the environment; and/or information about past interactions between the robot system and the multiple people.
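As a non-limiting illustration, the retrieval at 702 from a database populated through multiple channels might be sketched as follows. This is a minimal sketch assuming an in-memory store; the record layout, the channel names, and the functions `build_profiles` and `retrieve_person_info` are hypothetical and are used for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record is (person_id, channel, fact). The layout is an assumption.
Record = Tuple[str, str, str]


def build_profiles(records: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group records by person, then by the channel that produced them."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for person_id, channel, fact in records:
        grouped[person_id][channel].append(fact)
    # Convert to plain dicts so that lookups do not create empty entries.
    return {pid: dict(channels) for pid, channels in grouped.items()}


def retrieve_person_info(
    profiles: Dict[str, Dict[str, List[str]]], person_id: str
) -> Dict[str, List[str]]:
    """Return the per-channel information for one person (empty if unknown)."""
    return profiles.get(person_id, {})
```

Keeping the originating channel alongside each fact allows a later step to decide which channels to include in an NL description of the person.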
At 703, the robot system generates a first natural language (NL) query, similar in some implementations to act 206 of method 200 though the composition of the first NL query in method 700 may be different. The first NL query of method 700 may include a NL description of the information about the person accessed at 702, a NL description of contextual information, and a NL request for an outbound verbalization for the robot system to deliver to the person. Throughout this specification and the appended claims, the term “outbound” verbalization is used to refer to a verbalization that a robot system may provide or deliver outward to its environment (including to a person in its environment), whereas the term “inbound” verbalization is used to refer to a verbalization that a robot system may receive from its environment (including from a person in its environment).
In some implementations, the NL description of the information about the person accessed at 702 may include any or all of: a NL summary of the information about the person, a NL excerpt from the information about the person, and/or a NL copy of the information about the person.
In some implementations, the NL description of contextual information may include an NL description of the respective roles of the person and the robot system. For example, if the robot system is operating as a greeter or other retail support at a retail store and the person is a customer at the retail store, then the NL description of contextual information may include an NL description of the person as a customer and of the robot system as a greeter or retail support. In some implementations, the NL description of contextual information may include an NL description of at least a portion of the environment. For example, in some implementations, the NL description of contextual information may include an NL description of at least one aspect of the environment based on sensor data collected by at least one sensor of the robot system, in a manner substantially similar to that described for acts 202 and 204 of method 200.
In some implementations, the NL description of contextual information may include other information accessed by the robot system (e.g., via a wireless communicative link to the internet), such as information from local news reports, national or international news reports, and/or weather reports.
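As a non-limiting illustration, the composition of the first NL query described for act 703 (information about the person, contextual information, and a request for an outbound verbalization) might be sketched as follows. This is a minimal sketch; the function `build_first_nl_query`, its parameters, and the particular wording of the query are hypothetical and are not prescribed by the present description.

```python
from typing import Mapping, Sequence


def build_first_nl_query(
    person_info: Mapping[str, Sequence[str]],
    robot_role: str,
    person_role: str,
    environment_notes: Sequence[str] = (),
    external_notes: Sequence[str] = (),
) -> str:
    """Compose the three parts of the first NL query as plain text."""
    # Part 1: NL description of the information about the person.
    parts = ["Information about the person:"]
    for channel, facts in person_info.items():
        for fact in facts:
            parts.append(f"- ({channel}) {fact}")

    # Part 2: NL description of contextual information (roles, environment,
    # and other information such as news or weather reports).
    parts.append("Context:")
    parts.append(f"- The robot is acting as a {robot_role}; the person is a {person_role}.")
    parts.extend(f"- Environment: {note}" for note in environment_notes)
    parts.extend(f"- Other information: {note}" for note in external_notes)

    # Part 3: NL request for an outbound verbalization.
    parts.append("Request: provide one short, friendly thing for the robot to say to the person.")
    return "\n".join(parts)
```

For example, calling `build_first_nl_query({"purchase_history": ["bought mint chocolate chip ice cream"]}, "greeter", "customer")` yields a plain-text query whose final line is the request for something for the robot system to say.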
At 704, the robot system provides the first NL query to a large language model (LLM) module, in a manner that may, in some implementations, be substantially similar to that described in relation to act 206 of method 200. As described in relation to
At 705, the robot system receives (from the LLM module) the outbound verbalization for the robot system to deliver to the person. In various implementations, the outbound verbalization may include a question, comment, statement, request, query, exclamation, or generally “something for the robot system to say”.
At 706, the robot system delivers the outbound verbalization to the person. In some implementations, at 706 the robot system may “verbalize” or “say” the outbound verbalization. In the context of a robot system, to verbalize or say a verbalization means to generate and emit audio that includes a voice (e.g., an artificial voice) saying the verbalization. Thus, in some implementations, at 706 the robot system converts the outbound verbalization into a digital audio file that includes an audible expression of the outbound verbalization and then executes or plays the digital audio file to cause at least one speaker (or audio output device) of the robot system to generate the audio represented in the digital audio file. In various implementations, when the robot system delivers the outbound verbalization at 706, the effective result may be that the robot system audibly asks a question about the person, makes a comment, statement, or declaration to the person, or similar.
In some implementations of method 700, the interaction with the person initiated by the robot system may continue beyond act 706. For example, in some implementations the person may respond (e.g., verbally respond) to the delivery of the outbound verbalization by the robot system at 706. In such implementations, method 700 may further include: receiving, by the robot system, an inbound verbalization from the person; generating a second NL query by the robot system, the second NL query including a NL transcription of the inbound verbalization received from the person by the robot system, a NL description of the outbound verbalization delivered from the robot system to the person, a NL description of the first NL query (which may, in some implementations, include any or all of: a NL summary of the first NL query, a NL excerpt from the first NL query, and/or a NL copy of the first NL query), and a NL request for a response to the inbound verbalization received from the person by the robot system; providing the second NL query to the LLM module of the robot system; receiving, from the LLM module, the response; and delivering (e.g., verbalizing) the response to the person by the robot system. In this way, the interaction with a person initiated by the robot system in method 700 may continue through multiple iterations, for any number of successive inbound and outbound verbalizations between the robot system and the person, with the robot system leveraging both an LLM module and omnichannel customer data available to the robot system.
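As a non-limiting illustration, the continuation of method 700 through successive inbound and outbound verbalizations might be sketched as follows. This is a minimal sketch; the functions `build_followup_query` and `continue_interaction` are hypothetical, the `llm` parameter stands in for the LLM module as any callable that maps an NL query to an NL response, and here the description of the first NL query is simply a copy of it.

```python
from typing import Callable, List, Sequence, Tuple


def build_followup_query(first_query: str, outbound: str, inbound: str) -> str:
    """Compose a second (or later) NL query from the parts described above."""
    return "\n".join(
        [
            "Earlier context:",
            first_query,  # NL copy of the first NL query
            f'The robot said: "{outbound}"',  # outbound verbalization delivered
            f'The person replied: "{inbound}"',  # transcription of inbound verbalization
            "Request: write the robot's response to the person's reply.",
        ]
    )


def continue_interaction(
    first_query: str,
    first_outbound: str,
    inbound_turns: Sequence[str],
    llm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (inbound, response) pairs for each successive turn."""
    transcript: List[Tuple[str, str]] = []
    outbound = first_outbound
    for inbound in inbound_turns:
        query = build_followup_query(first_query, outbound, inbound)
        outbound = llm(query)  # the response becomes the next outbound verbalization
        transcript.append((inbound, outbound))
    return transcript
```

In this sketch each query carries only the most recent outbound verbalization; an implementation may instead include a summary or the full history of the interaction.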
Method 700 describes a method of operation of a robot system in which the robot system initiates an interaction (e.g., a verbal interaction) with a person. However, in some situations, it may be the person who initiates an interaction (e.g., a verbal interaction) with the robot system.
At 801, the robot system receives an inbound verbalization from a person in an environment of the robot system. The inbound verbalization may include, without limitation: a comment, a question, a statement, a declaration, a query, an inquiry, a request, or similar. The robot may receive the inbound verbalization through at least one sensor, such as through a microphone that detects the audible inbound verbalization and converts the audio signal into an electrical signal that is processed by at least one processor on-board the robot body of the robot system.
At 802, the robot system identifies the person. In some implementations, act 802 of method 800 may be substantially similar to act 701 of method 700.
At 803, the robot system accesses information about the person based on the identity of the person determined at 802. In some implementations, act 803 of method 800 may be substantially similar to act 702 of method 700.
At 804, the robot system generates a natural language (NL) query, similar in some implementations to act 703 of method 700 though the composition of the NL query in method 800 may be different. In method 800, the NL query may include a NL description of the information about the person accessed at 803, a NL description of contextual information, a NL transcription of the inbound verbalization received from the person by the robot system at 801, and a NL request for a response verbalization (i.e., an outbound verbalization in response to the inbound verbalization) for the robot system to deliver to the person.
At 805, the robot system provides the NL query to a large language model (LLM) module of the robot system. In some implementations, act 805 of method 800 may be substantially similar to act 704 of method 700.
At 806, the robot system receives, from the LLM module, the response verbalization for the robot system to deliver to the person. In some implementations, act 806 of method 800 may be substantially similar to act 705 of method 700.
At 807, the robot system delivers the response verbalization to the person. In some implementations, act 807 of method 800 may be substantially similar to act 706 of method 700.
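As a non-limiting illustration, acts 801 through 807 of method 800 might be chained as sketched below. This is a minimal sketch under stated assumptions: the function `handle_inbound_verbalization` and all of its parameters are hypothetical, and speech-to-text, identification, database lookup, the LLM module, and speech output are each represented by a caller-supplied callable rather than by any particular library.

```python
from typing import Callable, Optional, Sequence


def handle_inbound_verbalization(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    identify: Callable[[], Optional[str]],
    lookup: Callable[[str], Sequence[str]],
    llm: Callable[[str], str],
    speak: Callable[[str], None],
    context: str,
) -> str:
    """Run one pass of receive, identify, access, query, and deliver."""
    inbound_text = transcribe(audio)  # act 801: inbound verbalization as NL text
    person_id = identify()  # act 802: identify the person (None if unknown)
    facts = lookup(person_id) if person_id is not None else []  # act 803
    query = "\n".join(  # act 804: generate the NL query
        ["Information about the person:"]
        + [f"- {fact}" for fact in facts]
        + [
            f"Context: {context}",
            f'The person said: "{inbound_text}"',
            "Request: write the robot's spoken response to the person.",
        ]
    )
    response = llm(query)  # acts 805 and 806: provide query, receive response
    speak(response)  # act 807: deliver the response verbalization
    return response
```

Passing the components in as callables keeps the sketch independent of any specific speech, recognition, or LLM service, consistent with the description above that leaves those choices open.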
Throughout this specification and the appended claims the term “communicative” as in “communicative coupling” and in variants such as “communicatively coupled,” is generally used to refer to any engineered arrangement for transferring and/or exchanging information. For example, a communicative coupling may be achieved through a variety of different media and/or forms of communicative pathways, including without limitation: electrically conductive pathways (e.g., electrically conductive wires, electrically conductive traces), magnetic pathways (e.g., magnetic media), wireless signal transfer (e.g., radio frequency antennae), and/or optical pathways (e.g., optical fiber). Exemplary communicative couplings include, but are not limited to: electrical couplings, magnetic couplings, radio frequency couplings, and/or optical couplings.
Throughout this specification and the appended claims, infinitive verb forms are often used. Examples include, without limitation: “to encode,” “to provide,” “to store,” and the like. Unless the specific context requires otherwise, such infinitive verb forms are used in an open, inclusive sense, that is as “to, at least, encode,” “to, at least, provide,” “to, at least, store,” and so on.
This specification, including the drawings and the abstract, is not intended to be an exhaustive or limiting description of all implementations and embodiments of the present systems, methods, control modules and computer program products. A person of skill in the art will appreciate that the various descriptions and drawings provided may be modified without departing from the spirit and scope of the disclosure. In particular, the teachings herein are not intended to be limited by or to the illustrative examples of computer systems and computing environments provided.
This specification provides various implementations and embodiments in the form of block diagrams, schematics, flowcharts, and examples. A person skilled in the art will understand that any function and/or operation within such block diagrams, schematics, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, and/or firmware. For example, the various embodiments disclosed herein, in whole or in part, can be equivalently implemented in one or more: application-specific integrated circuit(s) (i.e., ASICs); standard integrated circuit(s); computer program(s) executed by any number of computers (e.g., program(s) running on any number of computer systems); program(s) executed by any number of controllers (e.g., microcontrollers); and/or program(s) executed by any number of processors (e.g., microprocessors, central processing units, graphical processing units), as well as in firmware, and in any combination of the foregoing.
Throughout this specification and the appended claims, a “memory” or “storage medium” is a processor-readable medium that is an electronic, magnetic, optical, electromagnetic, infrared, semiconductor, or other physical device or means that contains or stores processor data, data objects, logic, instructions, and/or programs. When data, data objects, logic, instructions, and/or programs are implemented as software and stored in a memory or storage medium, such can be stored in any suitable processor-readable medium for use by any suitable processor-related instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the data, data objects, logic, instructions, and/or programs from the memory or storage medium and perform various acts or manipulations (i.e., processing steps) thereon and/or in response thereto. Thus, a “non-transitory processor-readable storage medium” can be any element that stores the data, data objects, logic, instructions, and/or programs for use by or in connection with the instruction execution system, apparatus, and/or device. As specific non-limiting examples, the processor-readable medium can be: a portable computer diskette (magnetic, compact flash card, secure digital, or the like), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), a portable compact disc read-only memory (CDROM), digital tape, and/or any other non-transitory medium.
The claims of the disclosure are below. This disclosure is intended to support, enable, and illustrate the claims but is not intended to limit the scope of the claims to any specific implementations or embodiments. In general, the claims should be construed to include all possible implementations and embodiments along with the full scope of equivalents to which such claims are entitled.
This application claims priority to U.S. Provisional Patent Application No. 63/441,897, filed on Jan. 30, 2023, titled “Robot Control Systems, Methods, and Computer Program Products That Leverage Large Language Models”, the entirety of which is incorporated by reference herein.
Number | Date | Country
---|---|---
63441897 | Jan 2023 | US