AGENT DEVICE, METHOD OF CONTROLLING AGENT DEVICE, AND COMPUTER-READABLE NON-TRANSIENT STORAGE MEDIUM

CROSS-REFERENCE TO RELATED APPLICATION

Priority is claimed on Japanese Patent Application No. 2019-056595, filed Mar. 25, 2019, the content of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION
Field of the Invention

The present invention relates to an agent device, a method of controlling the agent device, and a computer-readable non-transient storage medium.

Description of Related Art

In the related art, a technology has been disclosed that relates to an agent function that, while conversing with an occupant in a vehicle, provides information related to driving assistance, control of the vehicle, other applications, or the like, according to requirements of the occupant (for example, see Japanese Unexamined Patent Application, First Publication No. 2006-335231).

SUMMARY OF THE INVENTION

In recent years, while the practical application of vehicles equipped with a plurality of agent functions, including an agent function based on a network connection, has been advancing, cooperation among the plurality of agent functions when the communication environment of the network connection changes has not been sufficiently studied. For this reason, in the related art, when the communication environment of the network connection changes, the plurality of agent functions may fail to cooperate or none of them may be able to operate, which may inconvenience an occupant in the vehicle.

An aspect of the present invention is directed to providing an agent device, a method of controlling the agent device, and a computer-readable non-transient storage medium that are capable of continuously providing agent functions even when the communication environment of a network connection changes.

An agent device, a method of controlling the agent device, and a program according to the present invention employ the following configurations.

(1) An agent device according to an aspect of the present invention includes a plurality of agent function parts configured to provide a service including causing an output section to output a voice response to a speech of an occupant in a vehicle, at least partially by using a network connection established through a wireless communication device, wherein the plurality of agent function parts includes a first agent function part and a second agent function part that depends less on the network connection than the first agent function part does, the agent device further includes a network observation part configured to recognize or predict a change in the communication environment of the network connection, and the second agent function part is configured to store the speech of the occupant in the vehicle in a storage when the first agent function part is operating and deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(2) In the aspect of the above-mentioned (1), the second agent function part may store, in the storage, an order executed by the first agent function part according to a requirement from the occupant in the vehicle when the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(3) In the aspect of the above-mentioned (1) or (2), the agent device may further include a notification controller configured to notify the occupant in the vehicle that a function of the first agent function part is restricted when the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(4) In the aspect of any one of the above-mentioned (1) to (3), the agent device may further include a notification controller configured to notify the occupant in the vehicle so as to urge the occupant to speak to the second agent function part when the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(5) In the aspect of any one of the above-mentioned (1) to (4), the agent device may further include an agent starting part configured to control starting of each of the plurality of agent function parts, wherein the agent starting part starts the second agent function part when the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(6) In the aspect of any one of the above-mentioned (1) to (5), the agent device may further include an agent starting part configured to control starting of each of the plurality of agent function parts, wherein the storage stores at least a part of a function of the second agent function part, and the agent starting part acquires at least the part of the function of the second agent function part from the storage and starts the second agent function part when the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part.

(7) In the aspect of any one of the above-mentioned (1) to (6), the network observation part may predict a change in the communication environment of the network connection based on position information of the vehicle and communication environment information of the network connection corresponding to the position of the vehicle.

(8) In the aspect of any one of the above-mentioned (1) to (7), the first agent function part may acquire the speech of the occupant in the vehicle from the storage when the network observation part recognizes that the communication environment of the network connection has returned to its state before the deterioration.

(9) A method of controlling an agent device according to another aspect of the present invention, which is performed by a computer, includes: starting any one of a first agent function part and a second agent function part having less dependence on a network connection than that of the first agent function part using a wireless communication device; providing a service including causing an output section to output a response by a voice in response to a speech of an occupant in a vehicle using the network connection at least partially as a function of the started agent function part; recognizing or predicting a change in communication environment of the network connection; and causing the second agent function part to store the speech of the occupant in the vehicle in a storage when the first agent function part is under operation and deterioration of the communication environment of the network connection is recognized or predicted.

(10) A computer-readable non-transient storage medium according to still another aspect of the present invention storing a program executed in a computer, the program stored in the computer-readable non-transient storage medium includes processing of starting any one of a first agent function part and a second agent function part having less dependence on a network connection than that of the first agent function part using a wireless communication device; processing of providing a service including causing an output section to output a response by a voice in response to a speech of an occupant in a vehicle using the network connection at least partially as a function of the started agent function part; processing of recognizing or predicting a change in communication environment of the network connection; and processing of causing the second agent function part to store the speech of the occupant in the vehicle in a storage when the first agent function part is under operation and deterioration of the communication environment of the network connection is recognized or predicted.

According to the aspects of the above-mentioned (1) to (10), an agent function can be continuously provided even when the communication environment of a network connection changes.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a view showing a configuration of an agent system including an agent device.

FIG. 2 is a view showing a configuration of the agent device according to the embodiment, and instruments mounted on a vehicle.

FIG. 3 is a view showing an arrangement example of a display and operation device.

FIG. 4 is a view for describing an example of an agent image.

FIG. 5 is a view showing a configuration of an agent server and a part of the configuration of the agent device.

FIG. 6 is a flowchart for describing a flow of a series of processing of the agent device according to the embodiment.

FIG. 7 is a view for describing an operation of the agent device according to the embodiment.

DETAILED DESCRIPTION OF THE INVENTION

Hereinafter, embodiments of an agent device, a method of controlling an agent device, and a computer-readable non-transient storage medium of the present invention will be described with reference to the accompanying drawings. The agent device is a device configured to realize a part or all of an agent system. In the following, an agent device that is mounted on a vehicle (hereinafter, a vehicle M) and includes a plurality of types of agent functions will be described as an example. The agent function is, for example, a function that, while conversing with an occupant in the vehicle M, provides various types of information based on requirements (commands) contained in a speech of the occupant or mediates network services, at least partially by using a network connection established through, for example, a wireless communication device. The plurality of types of agents may differ from one another in their functions, processing sequences, controls, and output modes and contents. In addition, some of the agent functions may have a function of controlling instruments in the vehicle (for example, instruments related to driving control or vehicle body control).

The agent functions are realized by integrally using, for example, a natural language processing function (a function of understanding the structure or meaning of text), a conversation management function, a network search function of searching another device via a network or searching a predetermined database held by the host device, or the like, in addition to a voice recognition function of recognizing the voice of an occupant (a function of converting voice into text). Some or all of these functions may be realized by artificial intelligence (AI) technology. In addition, a part of the configuration that performs these functions (in particular, the voice recognition function or the natural language processing and interpretation function) may be mounted on an agent server (an external device) that can communicate with an on-vehicle communication device of the vehicle M or with a general purpose communication device brought into the vehicle M. In the following description, it is assumed that a part of the configuration is mounted on the agent server, and that the agent device and the agent server cooperate to realize the agent system. In addition, a service provider (a service entity) that virtually appears through cooperation between the agent device and the agent server is referred to as an agent.

FIG. 1 is a configuration view of an agent system 1 including an agent device 100. The agent system 1 includes, for example, the agent device 100, a plurality of agent servers 200-1, 200-2 and 200-3, and the like. The numbers following the hyphens at the ends of the reference numerals are identifiers for distinguishing the agents. When the agent servers are not distinguished from one another, they are simply referred to as the agent servers 200. While three agent servers 200 are shown in FIG. 1, the number of agent servers 200 may be two, or four or more. The agent servers 200 are managed by mutually different providers of agent systems. Accordingly, the agents in the present invention are realized by different providers. Examples of the providers include a car manufacturer, a network service business operator, an electronic commerce business operator, and a seller or manufacturer of portable terminals, and any entity (a corporation, a group, a private individual, or the like) may become a provider of an agent system.

The agent device 100 is in communication with the agent servers 200 via a network NW. The network NW includes, for example, some or all of the Internet, a cellular network, a Wi-Fi network, a wide area network (WAN), a local area network (LAN), a public line, a telephone line, a radio base station, and the like. Various types of web servers 300 are connected to the network NW, and the agent servers 200 or the agent device 100 can acquire web pages from the various types of web servers 300 via the network NW.

The agent device 100 converses with the occupant in the vehicle M, transmits the voice of the occupant to the agent server 200, and presents a reply obtained from the agent server 200 to the occupant in the form of voice output or image display.

[Vehicle]

FIG. 2 is a view showing a configuration of the agent device 100 according to the embodiment, and instruments mounted on the vehicle M. For example, one or more microphones 10, a display and operation device 20, a speaker unit 30 (an output section), a navigation device 40, a vehicle instrument 50, an on-vehicle communication device 60, and the agent device 100 are mounted on the vehicle M. In addition, a general purpose communication device 70 such as a smartphone may be brought into the passenger compartment and used as a communication device. The on-vehicle communication device 60 and the general purpose communication device 70 are examples of “wireless communication devices.” These devices are connected to each other via a multiplex communication line such as a controller area network (CAN) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 2 is merely an example; a part of the configuration may be omitted, or another configuration may be added.

The microphone 10 is a sound pickup part configured to collect voice emitted in the passenger compartment. The display and operation device 20 is a device (or a group of devices) configured to display images and receive input operations. The display and operation device 20 includes a display device configured as, for example, a touch panel. The display and operation device 20 may further include a head up display (HUD) or a mechanical input device. The speaker unit 30 includes, for example, a plurality of speakers (sound output sections) disposed at different positions in the passenger compartment. The display and operation device 20 may be shared by the agent device 100 and the navigation device 40. Details are described below.

The navigation device 40 includes a navigation human machine interface (HMI), a global positioning device such as a global positioning system (GPS) device, a storage device in which map information is stored, and a control device (a navigation controller) configured to perform route search and the like. Some or all of the microphone 10, the display and operation device 20 and the speaker unit 30 may be used as the navigation HMI. The navigation device 40 searches for a route (a navigation route) from the position of the vehicle M specified by the global positioning system device to a destination input by the occupant, and outputs guide information using the navigation HMI so that the vehicle M can travel along the route.

The route search function may be provided in a navigation server that is accessible via the network NW. In this case, the navigation device 40 acquires a route from the navigation server and outputs guide information. Further, the agent device 100 may be built on the basis of the navigation controller; in this case, the navigation controller and the agent device 100 are integrally configured in hardware.

The vehicle instrument 50 includes, for example, a driving force output device such as an engine or a traveling motor, a starting motor of the engine, a door lock device, a door opening/closing device, windows, a window opening/closing device and a window opening/closing control device, seats, a seat position control device, a rearview mirror and a control device for its angle and position, illumination devices inside and outside the vehicle and control devices thereof, a wiper or a defogger and a control device thereof, a direction indicator lamp and a control device thereof, an air-conditioning device, and a vehicle information device holding information such as a traveling distance, tire pressures, and the remaining fuel quantity.

The on-vehicle communication device 60 is accessible to, for example, the network NW using a cellular network or a Wi-Fi network.

FIG. 3 is a view showing an arrangement example of the display and operation device 20. The display and operation device 20 includes, for example, a first display 22, a second display 24 and an operation switch ASSY 26. The display and operation device 20 may further include an HUD 28.

In the vehicle M, for example, there are a driver's seat DS provided with a steering wheel SW, and an assistant driver's seat AS provided next to the driver's seat DS in a vehicle width direction (a Y direction in the drawings). The first display 22 is a laterally elongated display device extending from a middle area of an instrument panel between the driver's seat DS and the assistant driver's seat AS to a position facing a left end portion of the assistant driver's seat AS.

The second display 24 is located midway between the driver's seat DS and the assistant driver's seat AS in the vehicle width direction, below the first display 22. For example, the first display 22 and the second display 24 are both configured as touch panels, and each includes a liquid crystal display (LCD), an organic electroluminescence (EL) display, a plasma display, or the like, as a display section. The operation switch ASSY 26 is an assembly in which dial switches, button type switches, and the like, are integrated. The display and operation device 20 outputs the contents of operations performed by the occupant to the agent device 100. The contents displayed on the first display 22 or the second display 24 may be determined by the agent device 100.

[Agent Device]

Returning to FIG. 2, the agent device 100 includes a management part 110, agent function parts 150-1, 150-2 and 150-3, and a pairing application execution part 152. The management part 110 includes, for example, a sound processing part 112, a wake up (WU) determination part 114 for each agent, a display controller 116, a voice controller 118, a network observation part 120, an agent starting part 122 and a storage 124. The display controller 116 and the voice controller 118 are examples of “a notification controller.” When the agent function parts are not distinguished, they are simply referred to as the agent function parts 150. The three agent function parts 150 are illustrated merely as an example corresponding to the number of agent servers 200 in FIG. 1; the number of agent function parts 150 may be two, or four or more. The software layout shown in FIG. 2 is simplified for ease of explanation; in practice, the layout can be modified arbitrarily, for example, so that the management part 110 is interposed between the agent function parts 150 and the on-vehicle communication device 60.

The components of the agent device 100 except the storage 124 are realized by executing a program (software) using a hardware processor such as a central processing unit (CPU) or the like. Some or all of these components may be realized by hardware (a circuit part; including circuitry) such as a large scale integration (LSI), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a graphics processing unit (GPU), or the like, or may be realized by cooperation of software and hardware. The program may be previously stored on a storage device such as a hard disk drive (HDD), a flash memory, or the like (a storage device including a non-transient storage medium), or the program may be stored on a detachable storage medium (non-transient storage medium) such as DVD, CD-ROM, or the like, and installed when the storage medium is mounted on a drive device. The storage 124 is realized by an HDD, a flash memory, a random access memory (RAM), or the like.

The management part 110 functions when a program such as an operating system (OS), middleware, or the like is executed.

The sound processing part 112 of the management part 110 performs sound processing on the input sound so that the wake-up-words preset for each agent can be appropriately recognized.

The WU determination part 114 for each agent recognizes the wake-up-words that are predetermined for each agent and correspond to the agent function parts 150-1, 150-2 and 150-3. The WU determination part 114 for each agent recognizes the meaning of voice from the voice stream on which sound processing has been performed. First, the WU determination part 114 for each agent detects a voice section on the basis of the amplitude and zero crossings of the voice waveform in the voice stream. The WU determination part 114 for each agent may perform section detection based on frame-by-frame voice/non-voice identification using a Gaussian mixture model (GMM).
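
The amplitude and zero-crossing criterion described above can be illustrated with a short Python sketch. The frame length, both thresholds, and the rule that a frame counts as voice when its mean absolute amplitude is high and its zero-crossing rate is moderate are illustrative assumptions, not values from the embodiment; the GMM-based alternative is not shown.

```python
from typing import List, Sequence, Tuple


def frame_features(samples: Sequence[float], frame_len: int) -> List[Tuple[float, float]]:
    """Return (mean absolute amplitude, zero-crossing rate) for each full frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        amplitude = sum(abs(x) for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((amplitude, crossings / (frame_len - 1)))
    return features


def detect_voice_sections(samples: Sequence[float], frame_len: int = 160,
                          amp_threshold: float = 0.02,
                          zcr_max: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames judged to be voice.

    Assumed rule: a frame is voice when its amplitude exceeds amp_threshold and
    its zero-crossing rate is below zcr_max (very high rates suggest noise).
    """
    sections: List[Tuple[int, int]] = []
    start = None
    features = frame_features(samples, frame_len)
    for i, (amplitude, zcr) in enumerate(features):
        is_voice = amplitude > amp_threshold and zcr < zcr_max
        if is_voice and start is None:
            start = i * frame_len                      # a voice section opens
        elif not is_voice and start is not None:
            sections.append((start, i * frame_len))    # the section closes
            start = None
    if start is not None:                              # voice ran to the last frame
        sections.append((start, len(features) * frame_len))
    return sections
```

Contiguous voice frames are merged into one section. A practical detector would usually add a hangover period so that short pauses inside a phrase do not split it; this sketch omits that.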

Next, the WU determination part 114 for each agent converts the voice in the detected voice section into text and sets the text as character information. Then, the WU determination part 114 for each agent determines whether the character information corresponds to a wake-up-word. When it determines that the character information is a wake-up-word, the WU determination part 114 for each agent notifies the agent starting part 122 that there is a starting instruction for the corresponding agent function part 150. The function corresponding to the WU determination part 114 for each agent may instead be mounted on the agent server 200. In this case, the management part 110 transmits the voice stream processed by the sound processing part 112 to the agent server 200, and when the agent server 200 determines that the voice stream contains a wake-up-word, the agent function part 150 is started according to an instruction from the agent server 200. Alternatively, each of the agent function parts 150 may always be running and determine the wake-up-word by itself. In this case, the management part 110 does not need to include the WU determination part 114 for each agent.
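
A minimal sketch of the wake-up-word determination, assuming the recognized text is already available as a string. The wake-up-words, the agent identifiers used as dictionary values, and the `notify_start` callback standing in for the notification to the agent starting part 122 are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical wake-up-word table: normalized word -> agent function part id.
WAKE_UP_WORDS: Dict[str, str] = {
    "hey agent one": "150-1",
    "hey agent two": "150-2",
    "hey agent three": "150-3",
}


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def determine_wake_up(text: str, notify_start: Callable[[str], None]) -> Optional[str]:
    """If text is a wake-up-word, notify a start instruction for its agent."""
    agent_id = WAKE_UP_WORDS.get(normalize(text))
    if agent_id is not None:
        notify_start(agent_id)  # stands in for notifying the agent starting part 122
    return agent_id


started = []
print(determine_wake_up("Hey, Agent Two!", started.append))  # 150-2
```

Exact matching after normalization is the simplest reading of "corresponds to a wake-up-word"; a real system might instead tolerate recognition errors with fuzzy matching.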

The agent function part 150 cooperates with the corresponding agent server 200 to make the agent appear, and provides a service including causing the output section to output a response by the voice in response to the speech of the occupant in the vehicle. The agent function parts 150 may include those authorized to control the vehicle instrument 50. In addition, some of the agent function parts 150 may communicate with the agent server 200 in cooperation with the general purpose communication device 70 via the pairing application execution part 152.

For example, the agent function part 150-1 communicates with the agent server 200-1 via the on-vehicle communication device 60. The agent function part 150-2 has the authority to control the vehicle instrument 50. The agent function part 150-2 is in communication with the agent server 200-2 via the on-vehicle communication device 60. The agent function part 150-3 cooperates with the general purpose communication device 70 via the pairing application execution part 152 and is in communication with the agent server 200-3. The pairing application execution part 152 performs pairing with the general purpose communication device 70 using, for example, Bluetooth (registered trade name), and connects the agent function part 150-3 and the general purpose communication device 70. Further, the agent function part 150-3 may be connected to the general purpose communication device 70 through wired communication using a universal serial bus (USB) or the like.

In addition, the agent function parts 150 include a first agent function part and one or more second agent function parts that depend less on the network connection than the first agent function part does. For example, the first agent function part is an agent function part 150 that presupposes a network connection, and the second agent function part is an agent function part 150 that does not presuppose a network connection. For example, when the communication environment of the network connection used by the vehicle M deteriorates, the first agent function part cannot continue operating, whereas the second agent function part can continue operating to some extent in a mode simpler than its original operation. Operating to some extent means, for example, responding by using at least a part of the function of the second agent function part stored in the storage 124. In the embodiment, the agent function part 150-1 and the agent function part 150-3 are examples of “the first agent function part,” and the agent function part 150-2 is an example of “the second agent function part.” In the following description, the agent function part 150-1 may be referred to as the first agent function part 150-1, and the agent function part 150-2 may be referred to as the second agent function part 150-2.
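
The distinction between first and second agent function parts can be sketched as a per-agent flag. Reducing "less dependence on the network connection" to a single boolean `requires_network` is a simplifying assumption made here for illustration, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class AgentSpec:
    part_id: str
    requires_network: bool  # assumed simplification: True for a "first" part


AGENTS = [
    AgentSpec("150-1", requires_network=True),   # first agent function part
    AgentSpec("150-2", requires_network=False),  # second agent function part
    AgentSpec("150-3", requires_network=True),   # first agent function part
]


def usable_agents(agents: Sequence[AgentSpec], network_ok: bool) -> List[str]:
    """Agents that can keep operating under the current network condition."""
    return [a.part_id for a in agents if network_ok or not a.requires_network]
```

With a good connection all three parts are usable; after deterioration only the second agent function part 150-2 remains.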

The display controller 116 displays an image on the first display 22 or the second display 24 according to an instruction from the agent function parts 150. In the following description, the first display 22 is used. The display controller 116 generates, for example, an anthropomorphic image of the agent that communicates with the occupant in the passenger compartment (hereinafter referred to as an agent image) under the control of some of the agent function parts 150, and displays the generated agent image on the first display 22. The agent image is, for example, an image of the agent in a mode of talking to the occupant. The agent image may include, for example, a face image detailed enough that at least a viewer (the occupant) can recognize an expression or a face orientation. For example, parts imitating the eyes and the nose may be represented in a face region of the agent image, and the expression or the face orientation may be recognized on the basis of the positions of these parts in the face region. In addition, the agent image may be perceived three-dimensionally: the viewer may recognize the face orientation of the agent from a head image in a three-dimensional space, or recognize an action, a behavior, a posture, or the like of the agent from an image of a main body (a torso, or hands and feet). The agent image may also be an animation image.

The voice controller 118 causes some or all of the speakers included in the speaker unit 30 to output voice according to an instruction from the agent function parts 150. The voice controller 118 may use the plurality of speakers of the speaker unit 30 to localize the sound image of the agent voice at a position corresponding to the display position of the agent image. The position corresponding to the display position of the agent image is, for example, a position at which the occupant is expected to perceive that the agent image is speaking the agent voice, specifically, a position in the vicinity of the display position of the agent image. Localizing the sound image means, for example, determining the spatial position of the sound source as perceived by the occupant by adjusting the loudness of the sound reaching the left and right ears of the occupant.
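
Localizing the sound image by adjusting left and right loudness can be sketched with constant-power panning between a left and a right speaker. Constant-power (sine/cosine) panning is a common technique swapped in here for illustration; the embodiment does not specify a gain law, and the mapping from the display position of the agent image to the `position` value is assumed.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power gains (left, right) for position in [-1.0, +1.0].

    -1.0 is fully left, 0.0 is centred, +1.0 is fully right; values outside
    the range are clamped. left**2 + right**2 == 1 keeps total power constant.
    """
    position = max(-1.0, min(1.0, position))
    theta = (position + 1.0) * math.pi / 4.0   # maps to 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def localize(samples: Sequence[float], position: float) -> Tuple[List[float], List[float]]:
    """Split a mono agent voice into left and right speaker signals."""
    left_gain, right_gain = pan_gains(position)
    return [s * left_gain for s in samples], [s * right_gain for s in samples]
```

With more than two speakers, as in the speaker unit 30, the same gain law could be applied to the pair of speakers nearest the display position of the agent image; that extension is not shown.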

The network observation part 120 observes the communication environment of the network connection used by the vehicle M. The network observation part 120 recognizes, for example, a change in the communication environment of the network connection based on the signal input via the on-vehicle communication device 60 and the general purpose communication device 70. A change in the communication environment of the network connection includes deterioration of the communication environment and improvement of the communication environment. In addition, the network observation part 120 may predict a change in the communication environment of the network connection. The network observation part 120 may predict, for example, a change in the communication environment of the network connection based on position information of the vehicle M and communication environment information of the network connection corresponding to the position of the vehicle. The communication environment information may be, for example, information related to a radio field intensity received from a radio base station by the vehicle M, or may be information related to a telecommunications standard (4G, 5G, or the like) of the network connection.
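
Recognizing deterioration or improvement from the received signal can be sketched as a threshold classifier with hysteresis, so that a signal hovering near a single threshold does not switch the agents back and forth. Measuring the environment as a signal strength in dBm, the two threshold values, and the class name are assumptions, not details from the embodiment.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    DETERIORATED = "deteriorated"
    IMPROVED = "improved"


class NetworkObserver:
    """Recognizes changes in the communication environment with hysteresis.

    The dBm thresholds are illustrative assumptions.
    """

    def __init__(self, poor_below_dbm: float = -110.0, good_above_dbm: float = -100.0):
        if poor_below_dbm >= good_above_dbm:
            raise ValueError("poor threshold must be below good threshold")
        self.poor_below = poor_below_dbm
        self.good_above = good_above_dbm
        self.good = True  # assume a usable connection at start

    def update(self, signal_dbm: float) -> Optional[Change]:
        """Feed one measurement; return a Change only when the state flips."""
        if self.good and signal_dbm < self.poor_below:
            self.good = False
            return Change.DETERIORATED
        if not self.good and signal_dbm > self.good_above:
            self.good = True
            return Change.IMPROVED
        return None
```

Readings between the two thresholds leave the current state unchanged, which is what suppresses rapid toggling.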

In this case, the network observation part 120 may, for example, identify the position of the vehicle M through a GPS sensor mounted on the navigation device 40, and acquire communication environment information around the identified position of the vehicle M from the various types of web servers 300. Then, the network observation part 120 may predict the change in the communication environment of the network connection from the communication environment information acquired from the web servers 300, with reference to a traveling route of the vehicle M set through the navigation device 40. The network observation part 120 may also predict the change in the communication environment of the network connection based on, for example, traveling circumstances of the vehicle M. As an example, the traveling circumstances of the vehicle M are acquired with reference to the traveling route of the vehicle M set through the navigation device 40 and the map data of the navigation device 40. In this case, the network observation part 120 may predict that the communication environment of the network connection will change when a predetermined place is included in the traveling route of the vehicle M. The predetermined places include places in which it is difficult to maintain the communication environment of the network connection, for example, a tunnel, an underground parking lot, a border, or the like.
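
The route-based prediction can be sketched as a scan over upcoming route points that flags the first point that is either one of the predetermined places or lies in a cell whose coverage quality falls below a threshold. The `RoutePoint` structure, the grid-cell coverage lookup, the 0 to 1 quality scale, and the look-ahead horizon are hypothetical stand-ins for the map data of the navigation device 40 and the communication environment information acquired from the web servers 300.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Predetermined places named in the description.
HARD_PLACES = {"tunnel", "underground_parking_lot", "border"}


@dataclass
class RoutePoint:
    distance_m: float            # distance along the route from the vehicle
    cell: Tuple[int, int]        # hypothetical coverage-grid cell
    place_type: str = "road"     # from map data, e.g. "tunnel"


def predict_deterioration(route: List[RoutePoint],
                          coverage: Dict[Tuple[int, int], float],
                          min_quality: float = 0.3,
                          horizon_m: float = 2000.0) -> Optional[float]:
    """Return the distance to the first predicted deterioration, or None.

    route is assumed sorted by distance_m. coverage maps a cell to a quality
    score in [0, 1]; cells without data are not treated as poor.
    """
    for point in route:
        if point.distance_m > horizon_m:
            break
        quality = coverage.get(point.cell)
        if point.place_type in HARD_PLACES or (quality is not None and quality < min_quality):
            return point.distance_m
    return None
```

Returning a distance rather than a boolean lets the caller decide how early to hand over to the second agent function part, for example by converting it to a time using the current vehicle speed.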

The network observation part 120 outputs a notification to the display controller 116 and the voice controller 118 when a change in the communication environment of the network connection is recognized or predicted. On acquiring the notification from the network observation part 120, the display controller 116 modifies the agent image displayed on the first display 22, and the voice controller 118 modifies the contents of the voice output through the speaker unit 30.

The agent starting part 122 controls starting of the plurality of agent function parts 150-1, 150-2 and 150-3. The agent starting part 122 starts the corresponding agent function part when there is a notification from the WU determination part 114 for each agent and the change in the communication environment of the network connection is recognized or predicted.

The agent starting part 122 shifts the first agent function part 150-1 from an operating state to a standby state when the first agent function part 150-1 has been activated and deterioration of the communication environment of the network connection is recognized or predicted. In addition, the agent starting part 122 starts the second agent function part 150-2. When the deterioration of the communication environment of the network connection is recognized or predicted, the started second agent function part 150-2 stores the speech of the occupant in the vehicle M in the storage 124. As an example, the speech of the occupant in the vehicle M may be data expressing an order that represents a task executed by the activated first agent function part 150-1 according to a requirement from the occupant in the vehicle M. As an example, the order may be voice data acquired by the operating first agent function part 150-1 from the microphone 10 via the management part 110, or may be data expressing the type of voice command executed on the basis of the voice input. While the communication environment of the network connection remains deteriorated, the second agent function part 150-2 acquires the order from the storage 124 and executes it.

When the communication environment returns to the state before deterioration, the agent starting part 122 starts the first agent function part 150-1 that was active immediately before the deterioration. In this case, the first agent function part 150-1 acquires from the storage 124 the order that was being executed immediately before the deterioration and executes it. The agent starting part 122 also shifts the second agent function part 150-2, which had been operating until the communication environment recovered, from the operating state to the standby state.

FIG. 4 is a view for describing an example of the agent image before and after deterioration of the communication environment of the network connection is recognized or predicted. In the example shown in FIG. 4, while the first agent function part 150-1 is operating, the display controller 116 displays an anthropomorphic agent image G-1 of agent 1 on the first display 22. Next, when deterioration of the communication environment of the network connection is recognized or predicted, the second agent function part 150-2 is started on the basis of the notification acquired from the network observation part 120, and the display controller 116 displays an anthropomorphic agent image G-2 of agent 2 on the first display 22.

In addition, when deterioration of the communication environment of the network connection is recognized or predicted, the display controller 116 displays on the first display 22, on the basis of the notification acquired from the network observation part 120, a notification image G-3 that overlaps the agent image G-1 and indicates that the function of the first agent function part 150-1 is restricted. Likewise, a notification image G-4 that overlaps the agent image G-2 is displayed on the first display 22 to prompt the occupant to speak to the second agent function part 150-2, which is active and can operate to some extent in the present communication environment. Further, when deterioration of the communication environment of the network connection is recognized or predicted, the voice output by the voice controller 118 may be modified in addition to (or instead of) the image displayed by the display controller 116. In this case, for example, the voice controller 118 may output a notification voice indicating that the function of the first agent function part 150-1 is restricted, or a notification voice prompting the occupant to speak to the second agent function part 150-2.
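The screen contents described for FIG. 4 can be summarized as a function of a single flag. The Python sketch below is an illustration under assumptions: it takes the text's statement that G-3 overlaps G-1 to mean that G-1 stays on screen after deterioration, and the descriptive strings and spoken wording are hypothetical.

```python
from typing import List, Tuple

def display_layers(deteriorated: bool) -> List[Tuple[str, str]]:
    """Images to draw on the first display 22, listed bottom to top."""
    layers = [("G-1", "anthropomorphic image of agent 1")]
    if deteriorated:
        layers += [
            ("G-3", "overlay on G-1: functions of agent 1 are restricted"),
            ("G-2", "anthropomorphic image of agent 2"),
            ("G-4", "overlay on G-2: please speak to agent 2"),
        ]
    return layers

def notification_voices(deteriorated: bool) -> List[str]:
    """Optional spoken notices that may accompany or replace G-3 and G-4."""
    if not deteriorated:
        return []
    return ["Some functions of agent 1 are currently restricted.",
            "Please speak to agent 2."]
```

Ordering each overlay directly after the image it covers keeps the bottom-to-top drawing order explicit.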

[Agent Server]

FIG. 5 is a view showing a configuration of the agent server 200 and a part of a configuration of the agent device 100. Hereinafter, an operation of the agent function parts 150 or the like will be described together with the configuration of the agent server 200. Here, description of physical communication from the agent device 100 to the network NW will be omitted.

The agent server 200 includes a communication part 210, which is a network interface such as a network interface card (NIC). The agent server 200 further includes, for example, a voice recognition part 220, a natural language processing part 222, a conversation management part 224, a network search part 226, and an answer sentence generating part 228. These components are realized, for example, by a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (a circuit part, including circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation of software and hardware.

The program may be stored in advance on a storage device such as an HDD or a flash memory (a storage device including a non-transient storage medium), or may be stored on a detachable storage medium (a non-transient storage medium) such as a DVD or a CD-ROM and installed when the storage medium is mounted in a drive device.

The agent server 200 includes a storage 250. The storage 250 is realized by various types of storage devices. Data or programs such as a personal profile 252, a dictionary database (DB) 254, a knowledge-based DB 256, a response regulation DB 258, and the like, are stored on the storage 250.

In the agent device 100, the agent function parts 150 transmit the voice stream, or a voice stream that has undergone processing such as compression or encoding, to the agent server 200. When a voice command that can be processed locally (that is, without passing through the agent server 200) is recognized, the agent function parts 150 may perform the processing requested by that voice command. A voice command that can be processed locally is one that can be answered by referring to a storage (not shown) included in the agent device 100, or, in the case of the agent function part 150-1, one that controls the vehicle instrument 50 (for example, a command to turn on the air-conditioning device). Accordingly, the agent function parts 150 may have some of the functions of the agent server 200.
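The local-versus-server decision described above can be illustrated with a small routing function. This Python sketch is not the device's implementation. It assumes the request has already been recognized as text on the device, and the lookup tables, their entries, and the `send_to_server` callback are hypothetical placeholders for the unspecified device storage and server link.

```python
from typing import Callable, Dict

# Hypothetical local tables; the real contents of the device storage are not specified.
LOCAL_VEHICLE_COMMANDS: Dict[str, str] = {
    "turn on the air conditioner": "air_conditioner.on",
}
LOCAL_ANSWERS: Dict[str, str] = {
    "what is your name": "I am agent 1.",
}

def handle_request(text: str, controls_vehicle: bool,
                   send_to_server: Callable[[str], str]) -> str:
    """Serve a recognized request locally when possible, else defer to agent server 200."""
    key = text.strip().lower().rstrip("?")
    if controls_vehicle and key in LOCAL_VEHICLE_COMMANDS:
        # Control of vehicle instrument 50 (only for an agent such as 150-1).
        return "local control: " + LOCAL_VEHICLE_COMMANDS[key]
    if key in LOCAL_ANSWERS:
        return LOCAL_ANSWERS[key]
    return send_to_server(text)  # no local match: forward to the server
```

Passing the server call in as a function keeps the routing logic testable without any network connection.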

When the voice stream is obtained, the voice recognition part 220 performs voice recognition and converts the voice into character information (text), and the natural language processing part 222 interprets the meaning of the character information by referring to the dictionary DB 254. The dictionary DB 254 associates character information with abstracted meaning information, and may include table information of synonyms or near-synonyms.

The processing of the voice recognition part 220 and the processing of the natural language processing part 222 need not be clearly separate, and may interact with each other such that the voice recognition part 220 receives the processing result of the natural language processing part 222 and modifies the recognition result.

When a meaning such as “Today's weather?” or “How is the weather?” is recognized, the natural language processing part 222 generates, for example, a command in which the recognized text is replaced with the standard character information “Today's weather”. Thus, even when the wording of a request varies, the requested conversation can easily be carried out. The natural language processing part 222 may also recognize the meaning of the character information by using artificial intelligence processing, such as machine learning processing that uses probabilities, and generate a command based on the recognition result.
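The replacement of varied wording with one standard command can be sketched as a normalized dictionary lookup. In the Python example below, the synonym table is a hypothetical stand-in for part of the dictionary DB 254, and the normalization rule (lower-casing and dropping trailing punctuation) is an assumption. The probabilistic, machine-learning variant mentioned in the text is not shown.

```python
import re
from typing import Dict, Optional

# Hypothetical fragment of dictionary DB 254: surface forms mapped to one standard command.
SYNONYMS: Dict[str, str] = {
    "today's weather": "Today's weather",
    "how is the weather": "Today's weather",
    "what's the weather like": "Today's weather",
}

def normalize(text: str) -> str:
    # Lower-case, trim, and drop trailing punctuation such as "?" before lookup.
    return re.sub(r"[?!.\s]+$", "", text.strip().lower())

def to_command(text: str) -> Optional[str]:
    """Return the standard command for a recognized phrase, or None if unknown."""
    return SYNONYMS.get(normalize(text))
```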

The conversation management part 224 determines the contents of the speech to the occupant in the vehicle M on the basis of the processing results (commands) of the natural language processing part 222, while referring to the personal profile 252, the knowledge-based DB 256, and the response regulation DB 258. The personal profile 252 stores, for each occupant, personal information, interests and preferences, a history of past conversations, and the like. The knowledge-based DB 256 is information that defines relationships between things. The response regulation DB 258 is information that defines the operation (a reply, the contents of instrument control, or the like) that the agent is to perform in response to a command.

The conversation management part 224 may also identify the occupant by comparing feature information obtained from the voice stream with the personal profile 252. In this case, the personal profile 252 associates, for example, personal information with feature information of the voice. The feature information of the voice is, for example, information on speaking characteristics such as voice pitch, intonation, and rhythm (pitch pattern), or feature values such as mel-frequency cepstral coefficients (MFCCs). The feature information is obtained, for example, by having the occupant utter a predetermined word, sentence, or the like at initial registration and analyzing the uttered voice.
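The text does not say how the feature information is compared. One common approach, assumed here purely for illustration, is to treat each voice as a fixed-length feature vector (for example, averaged MFCCs) and pick the enrolled profile with the highest cosine similarity above a threshold. The threshold of 0.9 and the profile vectors below are hypothetical values.

```python
import math
from typing import Dict, Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify_occupant(features: Sequence[float],
                      profiles: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching enrolled occupant, or None if no match clears threshold."""
    best_name, best_score = None, threshold
    for name, enrolled in profiles.items():
        score = cosine(features, enrolled)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical enrolled feature vectors captured at initial registration.
profiles = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
```

Returning `None` below the threshold lets an unregistered speaker fall back to a generic, non-personalized answer.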

The conversation management part 224 causes the network search part 226 to perform a search when the command requires information that can be searched for via the network NW. The network search part 226 accesses the various web servers 500 via the network NW and acquires the desired information. “Information that can be searched for via the network NW” is, for example, reviews by general users of restaurants around the vehicle M, or the weather forecast for that day at the position of the vehicle M.
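How a command might be resolved against the response regulation DB 258, calling the network search only when the rule requires it, can be sketched as follows. The `Rule` structure, the table entries, and the `search` callback standing in for the network search part 226 are hypothetical assumptions. Personal-profile and knowledge-base lookups are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Rule:
    """Hypothetical entry of response regulation DB 258."""
    needs_search: bool  # True if the answer must be fetched via network NW
    template: str       # reply text; "{result}" is filled from the search result

RESPONSE_RULES: Dict[str, Rule] = {
    "Today's weather": Rule(needs_search=True, template="Today's forecast: {result}"),
    "Turn on air conditioner": Rule(needs_search=False,
                                    template="Turning on the air conditioner."),
}

def decide_speech(command: str, search: Callable[[str], str]) -> str:
    """Decide the speech contents for a command, searching only when the rule requires it."""
    rule = RESPONSE_RULES.get(command)
    if rule is None:
        return "Sorry, I could not understand."
    result = search(command) if rule.needs_search else ""
    # str.format ignores the unused "result" argument for templates without a placeholder.
    return rule.template.format(result=result)
```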

The answer sentence generating part 228 generates an answer sentence that conveys the contents of the speech determined by the conversation management part 224 to the occupant of the vehicle M, and transmits it to the agent device 100. When the occupant is identified as an occupant registered in the personal profile, the answer sentence generating part 228 may address the occupant by name or generate an answer sentence in a speaking style resembling that of the occupant.

The agent function parts 150 perform voice synthesis and instruct the voice controller 118 to output the voice when the answer sentence is acquired. In addition, the agent function parts 150 instruct the display controller 116 to display the image of the agent according to the voice output. As a result, the agent function in which a virtually appearing agent responds to the occupant in the vehicle M is realized.

[Processing Flow of Agent Device]

Hereinafter, a flow of a series of processing of the agent device 100 according to the embodiment will be described using a flowchart. FIG. 6 is a flowchart for describing the flow of the series of processing of the agent device 100 according to the embodiment. The processing of the flowchart may be started, for example, when the first agent function part 150-1 starts.

First, the network observation part 120 determines whether the deterioration of the communication environment of the network connection is recognized or predicted during operation of the first agent function part 150-1 (step S10). The agent starting part 122 starts the second agent function part 150-2 when it is determined that the deterioration of the communication environment of the network connection is recognized or predicted by the network observation part 120 (step S12). The second agent function part 150-2 outputs an instruction to the agent starting part 122 such that an order that is being executed by the first agent function part 150-1 is stored in the storage 124 (step S14). The agent starting part 122 acquires the order that is being executed by the first agent function part 150-1 from the first agent function part 150-1 and stores the order in the storage 124 based on the instruction acquired from the second agent function part 150-2 (step S16). In addition, the agent starting part 122 acquires the order from the storage 124 and outputs the order to the second agent function part 150-2 (step S18). Then, the agent starting part 122 shifts the first agent function part 150-1 from the operating state to the standby state (step S20).

Next, the network observation part 120 determines whether the communication environment of the network connection has returned to the state before deterioration (step S22). When the network observation part 120 determines that the communication environment has returned, the agent starting part 122 starts the first agent function part 150-1 (step S24). The agent starting part 122 then acquires the order from the storage 124 and outputs it to the first agent function part 150-1 (step S26). After that, the agent starting part 122 shifts the second agent function part 150-2 from the operating state to the standby state (step S28), and the processing of the flowchart ends.
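Steps S12 to S28 of FIG. 6 can be traced in a minimal Python sketch. It is an illustration under simplifying assumptions, not the device's implementation: an order is a plain string, the storage 124 is a list, the instruction from the second agent function part (S14) is folded into a direct call, and the `AgentFunctionPart` class with its `execute` method is hypothetical.

```python
from enum import Enum
from typing import List, Optional

class State(Enum):
    OPERATING = "operating"
    STANDBY = "standby"

class AgentFunctionPart:
    """Minimal stand-in for an agent function part; records the orders it executes."""
    def __init__(self, name: str, state: State = State.STANDBY) -> None:
        self.name, self.state = name, state
        self.current_order: Optional[str] = None
        self.executed: List[str] = []

    def execute(self, order: str) -> None:
        self.current_order = order
        self.executed.append(order)

class AgentStartingPart:
    """Sketch of agent starting part 122 following FIG. 6 (steps S12 to S28)."""
    def __init__(self, first: AgentFunctionPart, second: AgentFunctionPart) -> None:
        self.first, self.second = first, second
        self.storage: List[str] = []  # stand-in for storage 124

    def on_deterioration(self) -> None:            # S10 answered "yes"
        self.second.state = State.OPERATING        # S12: start 150-2
        if self.first.current_order is not None:   # S14/S16: store 150-1's current order
            self.storage.append(self.first.current_order)
        if self.storage:
            self.second.execute(self.storage[-1])  # S18: hand the order to 150-2
        self.first.state = State.STANDBY           # S20: 150-1 to standby

    def on_recovery(self) -> None:                 # S22 answered "yes"
        self.first.state = State.OPERATING         # S24: restart 150-1
        if self.storage:
            self.first.execute(self.storage[-1])   # S26: resume the stored order
        self.second.state = State.STANDBY          # S28: 150-2 to standby

# Usage: one deterioration followed by one recovery.
first = AgentFunctionPart("150-1", State.OPERATING)
second = AgentFunctionPart("150-2")
first.execute("route to destination")
starter = AgentStartingPart(first, second)
starter.on_deterioration()
starter.on_recovery()
```

The stored order is read rather than removed at S18, so it is still available when the first agent function part resumes at S26.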

FIG. 7 is a view for describing an operation of the agent device 100 according to the embodiment. FIG. 7 shows an example in which the agent device 100 starts the second agent function part 150-2 when deterioration of the communication environment of the network connection is recognized.

First, when it is recognized at time t1 that the communication environment of the network connection has deteriorated, the management part 110 of the agent device 100 starts the second agent function part 150-2 and operates an agent function having low dependence on the network connection through it. In this case, the second agent function part 150-2 stores the order being executed by the first agent function part 150-1 in the storage 124, and then acquires the order from the storage 124 and executes it. The management part 110 of the agent device 100 also shifts the first agent function part 150-1 from the operating state to the standby state.

After that, when it is recognized at time t2 that the communication environment of the network connection has recovered, the management part 110 of the agent device 100 starts the first agent function part 150-1 and operates the agent function having high dependence on the network connection through it. In this case, the first agent function part 150-1 acquires from the storage 124 the order that was interrupted while the communication environment was deteriorated, and executes it. The management part 110 of the agent device 100 also shifts the second agent function part 150-2 from the operating state to the standby state.

According to the agent device 100 of the above-described embodiment, the agent function can be provided continuously even when the communication environment of the network connection changes. For example, when the communication environment deteriorates while the first agent function part 150-1, which depends heavily on the network connection, is operating, the service provided by the first agent function part 150-1 may be interrupted. In this case, none of the plurality of agent function parts 150 may be able to provide the service until the second agent function part 150-2, which depends less on the network connection, has been activated. In contrast, in the agent device 100 according to the embodiment, the second agent function part 150-2 is activated when deterioration of the communication environment of the network connection is recognized or predicted. The second agent function part 150-2 stores the order being executed by the first agent function part 150-1 in the storage 124, and then acquires the order from the storage 124 and executes it. Therefore, even when the communication environment of the network connection deteriorates, the agent function can be provided continuously through cooperation between the first agent function part 150-1 and the second agent function part 150-2.

Furthermore, the agent device 100 can provide the agent function stably even when the communication environment of the network connection changes. Suppose, for example, that the second agent function part 150-2 were started only after the communication environment of the network connection had actually deteriorated. In that case, the first agent function part 150-1, which depends heavily on the network connection, would remain active under the deteriorated communication environment until the second agent function part 150-2 finished starting, and the service provided by the first agent function part 150-1 could become unstable. In contrast, in the agent device 100 according to the embodiment, the second agent function part 150-2 is activated when deterioration of the communication environment of the network connection is predicted, even before the deterioration actually occurs. Therefore, the agent function parts 150 can provide the agent function stably.
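The text does not specify how deterioration is predicted. To illustrate the difference between "recognized" and "predicted", the following Python sketch assumes a scalar link-quality measurement sampled periodically. It fits a least-squares slope over a short window and extrapolates from the latest sample a few periods ahead. The threshold, window, and horizon values, and the quality metric itself, are all hypothetical.

```python
from collections import deque
from typing import Deque

class LinkQualityPredictor:
    """Hypothetical predictor: least-squares slope over recent samples, extrapolated ahead."""
    def __init__(self, threshold: float, window: int = 5, horizon: int = 3) -> None:
        self.threshold = threshold  # quality below this counts as "deteriorated"
        self.horizon = horizon      # number of sample periods to look ahead
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, quality: float) -> None:
        self.samples.append(quality)

    def recognized(self) -> bool:
        # Deterioration is recognized when the latest sample is already below threshold.
        return bool(self.samples) and self.samples[-1] < self.threshold

    def predicted(self) -> bool:
        # Deterioration is predicted when the extrapolated value falls below threshold.
        n = len(self.samples)
        if n < 2:
            return False
        xs = range(n)
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.samples))
        den = sum((x - mean_x) ** 2 for x in xs)
        forecast = self.samples[-1] + (num / den) * self.horizon
        return forecast < self.threshold

# Usage: a falling link triggers a prediction before the threshold is actually crossed.
predictor = LinkQualityPredictor(threshold=20.0)
for q in [60.0, 50.0, 40.0, 30.0]:
    predictor.add(q)
```

In this sketch, a caller would start the second agent function part when either `predicted()` or `recognized()` returns `True`, giving it time to finish starting before the link actually drops.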

Furthermore, the agent device 100 allows the plurality of agent functions to cooperate smoothly even when the communication environment of the network connection changes. Suppose, for example, that the function of the operating first agent function part 150-1 is restricted when the communication environment of the network connection deteriorates. In this case, if the occupant of the vehicle M speaks while the function of the first agent function part 150-1 is restricted, the first agent function part 150-1 may not be able to provide the service smoothly. In contrast, in the agent device 100 according to the embodiment, when the communication environment of the network connection changes, the occupant in the vehicle M is notified that the function of the first agent function part 150-1 is restricted. The occupant is therefore prompted to speak to the operating second agent function part 150-2 rather than to the first agent function part 150-1 while its function is restricted, so the plurality of agent functions can cooperate smoothly even when the communication environment changes. In addition, when the communication environment of the network connection changes, the agent device 100 according to the embodiment explicitly prompts the occupant in the vehicle M to speak to the second agent function part 150-2, which allows the plurality of agent functions to cooperate even more smoothly.

While preferred embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, omissions, substitutions, and other modifications can be made without departing from the scope of the present invention. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
