ELECTRONIC DEVICE AND OPERATING METHOD FOR GENERATING RESPONSE TO USER INPUT

Information

  • Patent Application
  • Publication Number
    20220165267
  • Date Filed
    December 01, 2021
  • Date Published
    May 26, 2022
Abstract
Disclosed herein are an electronic device and method. The electronic device includes input/output circuitry, a memory, and a processor. The processor implements the method, including receiving, via input/output circuitry, an input including textual or voice information, detecting, by at least one processor, a tone from among a plurality of predefined tones from the received textual or voice information, detecting a user-specific expression from the received textual or voice information, generating a response to the received textual or voice information based on the detected tone and the detected user-specific expression, and outputting the generated response based on at least one of vocal or textual output.
Description
TECHNICAL FIELD

The disclosure relates to an electronic device and operation method thereof for generating a response to a user input.


BACKGROUND

With the advancement of mobile communication technology and processor technology, portable electronic devices (hereinafter, electronic devices) implement a variety of functions in addition to conventional call functions. Recently, with the development of artificial intelligence (AI) technology, AI-related functions have been incorporated into portable electronic devices. For example, voice assistants such as Bixby, Alexa, and Google Assistant are AI-related functions now implemented in portable electronic devices. These voice assistants may receive and interpret a user's voice or text-based command, execute a corresponding action, and output a voice or textual response to the user. The voice assistant can provide visual feedback, such as media imagery, user interfaces (UI), and text, and audio feedback, such as music and synthesized speech, so that the user may immediately recognize the voice assistant's response.


SUMMARY

When a voice assistant outputs responses to users using predefined templates aimed at responding to specific groups or classes, it often executes an operation corresponding to the user's basic query, but may fail to provide a service that evokes the user's interest and/or intrigue. A voice assistant built upon pre-written templates cannot adequately reflect the user's emotion and tone, which limits its ability to provide truly customized service reflecting users' individual characteristics. Accordingly, this may degrade the user's experience with the AI, and may even reduce usage of the AI.


Certain embodiments of the disclosure provide a device and method which can generate a response to a user by reflecting characteristics and expressions appropriate to the user's current context and the topic of discussion.


According to certain embodiments of the disclosure, an electronic device may include: input/output circuitry, a memory, and a processor operably connected to the memory, wherein the processor may be configured to: receive, via the input/output circuitry, an input including textual or voice information, detect a tone from among a plurality of predefined tones from the received textual or voice information, detect a user-specific expression from the received textual or voice information, generate a response to the received textual or voice information, based on the detected tone and the detected user-specific expression, and output the generated response based on at least one of textual or vocal output.


According to certain embodiments of the disclosure, an operation method of the electronic device may include: receiving, using input/output circuitry, an input including textual or voice information, detecting, by at least one processor, a tone from among a plurality of predefined tones from the received textual or voice information, detecting a user-specific expression from the received textual or voice information, generating a response to the received textual or voice information based on the detected tone and the detected user-specific expression, and outputting the generated response using at least one of vocal or textual output.


According to certain embodiments of the disclosure, a response to a user's input may be generated that reflects the individual characteristics of a particular user, accounting for variations in the style, expression, tone, etc. of a specific user. Further, the generated response may account for time and/or place, corresponding to the situation and context specified in the user's input (e.g., voice, text).


According to certain embodiments of the disclosure, the familiarity and intimacy of the user experience with the AI may be increased by providing a more contextual and nuanced AI-generated response to the user, thereby increasing usage of the AI service.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a block diagram of an electronic device in a network environment according to certain embodiments.



FIG. 2 is a block diagram illustrating an integrated intelligence system according to certain embodiments.



FIG. 3 is a diagram illustrating a form of relation information between concepts and actions stored in a database according to certain embodiments.



FIG. 4 is a diagram illustrating a user terminal that displays a screen for processing a received voice input through an intelligent application according to certain embodiments.



FIG. 5 is a configuration diagram of a voice assistant system according to certain embodiments.



FIG. 6 illustrates an architecture of the electronic device according to certain embodiments.



FIG. 7 illustrates an example in which an expression detection module detects a user-specific expression according to certain embodiments.



FIG. 8 illustrates an example in which the expression detection module detects a user's expression on a specific topic according to certain embodiments.



FIG. 9 shows an example of managing a user-specific expression database according to certain embodiments.



FIG. 10 is a structural diagram of a response generation module according to certain embodiments.



FIG. 11 illustrates an example of generating a response based on a user's characteristics according to certain embodiments.



FIG. 12 illustrates an example of generating a response based on a user's situation according to certain embodiments.



FIG. 13 illustrates an example of generating a response based on a user's characteristics according to certain embodiments.



FIG. 14 illustrates an example of generating a response based on the location of the user according to certain embodiments.



FIG. 15 illustrates an example of generating a response based on a specific topic and a user's expression according to certain embodiments.



FIG. 16 illustrates another example of generating a response based on a specific topic and a user's expression according to certain embodiments.



FIG. 17 is a flowchart illustrating a process in which the electronic device generates a response to a user's input according to certain embodiments.





DETAILED DESCRIPTION


FIG. 1 is a block diagram illustrating an electronic device 101 in a network environment 100 according to certain embodiments. Referring to FIG. 1, the electronic device 101 in the network environment 100 may communicate with an electronic device 102 via a first network 198 (e.g., a short-range wireless communication network), or at least one of an electronic device 104 or a server 108 via a second network 199 (e.g., a long-range wireless communication network). According to an embodiment, the electronic device 101 may communicate with the electronic device 104 via the server 108. According to an embodiment, the electronic device 101 may include a processor 120, memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connecting terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module(SIM) 196, or an antenna module 197. In some embodiments, at least one of the components (e.g., the connecting terminal 178) may be omitted from the electronic device 101, or one or more other components may be added in the electronic device 101. In some embodiments, some of the components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be implemented as a single component (e.g., the display module 160).


The processor 120 may execute, for example, software (e.g., a program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 coupled with the processor 120, and may perform various data processing or computation. According to an embodiment, as at least part of the data processing or computation, the processor 120 may store a command or data received from another component (e.g., the sensor module 176 or the communication module 190) in volatile memory 132, process the command or the data stored in the volatile memory 132, and store resulting data in non-volatile memory 134. According to an embodiment, the processor 120 may include a main processor 121 (e.g., a central processing unit (CPU) or an application processor (AP)), or an auxiliary processor 123 (e.g., a graphics processing unit (GPU), a neural processing unit (NPU), an image signal processor (ISP), a sensor hub processor, or a communication processor (CP)) that is operable independently from, or in conjunction with, the main processor 121. For example, when the electronic device 101 includes the main processor 121 and the auxiliary processor 123, the auxiliary processor 123 may be adapted to consume less power than the main processor 121, or to be specific to a specified function. The auxiliary processor 123 may be implemented as separate from, or as part of the main processor 121.


The auxiliary processor 123 may control at least some of functions or states related to at least one component (e.g., the display module 160, the sensor module 176, or the communication module 190) among the components of the electronic device 101, instead of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active state (e.g., executing an application). According to an embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another component (e.g., the camera module 180 or the communication module 190) functionally related to the auxiliary processor 123. According to an embodiment, the auxiliary processor 123 (e.g., the neural processing unit) may include a hardware structure specified for artificial intelligence model processing. An artificial intelligence model may be generated by machine learning. Such learning may be performed, e.g., by the electronic device 101 where the artificial intelligence is performed or via a separate server (e.g., the server 108). Learning algorithms may include, but are not limited to, e.g., supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. The artificial intelligence model may include a plurality of artificial neural network layers. The artificial neural network may be a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), deep Q-network or a combination of two or more thereof but is not limited thereto. The artificial intelligence model may, additionally or alternatively, include a software structure other than the hardware structure.


The memory 130 may store various data used by at least one component (e.g., the processor 120 or the sensor module 176) of the electronic device 101. The various data may include, for example, software (e.g., the program 140) and input data or output data for a command related thereto. The memory 130 may include the volatile memory 132 or the non-volatile memory 134.


The program 140 may be stored in the memory 130 as software, and may include, for example, an operating system (OS) 142, middleware 144, or an application 146.


The input module 150 may receive a command or data to be used by another component (e.g., the processor 120) of the electronic device 101, from the outside (e.g., a user) of the electronic device 101. The input module 150 may include, for example, a microphone, a mouse, a keyboard, a key (e.g., a button), or a digital pen (e.g., a stylus pen).


The sound output module 155 may output sound signals to the outside of the electronic device 101. The sound output module 155 may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as playing multimedia or playing a recording. The receiver may be used for receiving incoming calls. According to an embodiment, the receiver may be implemented as separate from, or as part of, the speaker.


The display module 160 may visually provide information to the outside (e.g., a user) of the electronic device 101. The display module 160 may include, for example, a display, a hologram device, or a projector and control circuitry to control a corresponding one of the display, hologram device, and projector. According to an embodiment, the display module 160 may include a touch sensor adapted to detect a touch, or a pressure sensor adapted to measure the intensity of force incurred by the touch.


The audio module 170 may convert a sound into an electrical signal and vice versa. According to an embodiment, the audio module 170 may obtain the sound via the input module 150, or output the sound via the sound output module 155 or a headphone of an external electronic device (e.g., an electronic device 102) directly (e.g., wiredly) or wirelessly coupled with the electronic device 101.


The sensor module 176 may detect an operational state (e.g., power or temperature) of the electronic device 101 or an environmental state (e.g., a state of a user) external to the electronic device 101, and then generate an electrical signal or data value corresponding to the detected state. According to an embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, an atmospheric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.


The interface 177 may support one or more specified protocols to be used for the electronic device 101 to be coupled with the external electronic device (e.g., the electronic device 102) directly (e.g., wiredly) or wirelessly. According to an embodiment, the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, a secure digital (SD) card interface, or an audio interface.


A connecting terminal 178 may include a connector via which the electronic device 101 may be physically connected with the external electronic device (e.g., the electronic device 102). According to an embodiment, the connecting terminal 178 may include, for example, a HDMI connector, a USB connector, a SD card connector, or an audio connector (e.g., a headphone connector).


The haptic module 179 may convert an electrical signal into a mechanical stimulus (e.g., a vibration or a movement) or electrical stimulus which may be recognized by a user via his tactile sensation or kinesthetic sensation. According to an embodiment, the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electric stimulator.


The camera module 180 may capture a still image or moving images. According to an embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.


The power management module 188 may manage power supplied to the electronic device 101. According to an embodiment, the power management module 188 may be implemented as at least part of, for example, a power management integrated circuit (PMIC).


The battery 189 may supply power to at least one component of the electronic device 101. According to an embodiment, the battery 189 may include, for example, a primary cell which is not rechargeable, a secondary cell which is rechargeable, or a fuel cell.


The communication module 190 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and the external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108) and performing communication via the established communication channel. The communication module 190 may include one or more communication processors that are operable independently from the processor 120 (e.g., the application processor (AP)) and supports a direct (e.g., wired) communication or a wireless communication. According to an embodiment, the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication (PLC) module). A corresponding one of these communication modules may communicate with the external electronic device via the first network 198 (e.g., a short-range communication network, such as Bluetooth™, wireless-fidelity (Wi-Fi) direct, or infrared data association (IrDA)) or the second network 199 (e.g., a long-range communication network, such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., LAN or wide area network (WAN)). These various types of communication modules may be implemented as a single component (e.g., a single chip), or may be implemented as multi components (e.g., multi chips) separate from each other. The wireless communication module 192 may identify and authenticate the electronic device 101 in a communication network, such as the first network 198 or the second network 199, using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module 196.


The wireless communication module 192 may support a 5G network, after a 4G network, and next-generation communication technology, e.g., new radio (NR) access technology. The NR access technology may support enhanced mobile broadband (eMBB), massive machine type communications (mMTC), or ultra-reliable and low-latency communications (URLLC). The wireless communication module 192 may support a high-frequency band (e.g., the mmWave band) to achieve, e.g., a high data transmission rate. The wireless communication module 192 may support various technologies for securing performance on a high-frequency band, such as, e.g., beamforming, massive multiple-input and multiple-output (massive MIMO), full dimensional MIMO (FD-MIMO), array antenna, analog beam-forming, or large scale antenna. The wireless communication module 192 may support various requirements specified in the electronic device 101, an external electronic device (e.g., the electronic device 104), or a network system (e.g., the second network 199). According to an embodiment, the wireless communication module 192 may support a peak data rate (e.g., 20 Gbps or more) for implementing eMBB, loss coverage (e.g., 164 dB or less) for implementing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or a round trip of 1 ms or less) for implementing URLLC.


The antenna module 197 may transmit or receive a signal or power to or from the outside (e.g., the external electronic device) of the electronic device 101. According to an embodiment, the antenna module 197 may include an antenna including a radiating element implemented using a conductive material or a conductive pattern formed in or on a substrate (e.g., a printed circuit board (PCB)). According to an embodiment, the antenna module 197 may include a plurality of antennas (e.g., array antennas). In such a case, at least one antenna appropriate for a communication scheme used in the communication network, such as the first network 198 or the second network 199, may be selected, for example, by the communication module 190 (e.g., the wireless communication module 192) from the plurality of antennas. The signal or the power may then be transmitted or received between the communication module 190 and the external electronic device via the selected at least one antenna. According to an embodiment, another component (e.g., a radio frequency integrated circuit (RFIC)) other than the radiating element may be additionally formed as part of the antenna module 197.


According to certain embodiments, the antenna module 197 may form a mmWave antenna module. According to an embodiment, the mmWave antenna module may include a printed circuit board, a RFIC disposed on a first surface (e.g., the bottom surface) of the printed circuit board, or adjacent to the first surface and capable of supporting a designated high-frequency band (e.g., the mmWave band), and a plurality of antennas (e.g., array antennas) disposed on a second surface (e.g., the top or a side surface) of the printed circuit board, or adjacent to the second surface and capable of transmitting or receiving signals of the designated high-frequency band.


At least some of the above-described components may be coupled mutually and communicate signals (e.g., commands or data) therebetween via an inter-peripheral communication scheme (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)).


According to an embodiment, commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 via the server 108 coupled with the second network 199. Each of the electronic devices 102 or 104 may be a device of a same type as, or a different type, from the electronic device 101. According to an embodiment, all or some of operations to be executed at the electronic device 101 may be executed at one or more of the external electronic devices 102, 104, or 108. For example, if the electronic device 101 should perform a function or a service automatically, or in response to a request from a user or another device, the electronic device 101, instead of, or in addition to, executing the function or the service, may request the one or more external electronic devices to perform at least part of the function or the service. The one or more external electronic devices receiving the request may perform the at least part of the function or the service requested, or an additional function or an additional service related to the request, and transfer an outcome of the performing to the electronic device 101. The electronic device 101 may provide the outcome, with or without further processing of the outcome, as at least part of a reply to the request. To that end, a cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used, for example. The electronic device 101 may provide ultra low-latency services using, e.g., distributed computing or mobile edge computing. In another embodiment, the external electronic device 104 may include an internet-of-things (IoT) device. The server 108 may be an intelligent server using machine learning and/or a neural network. According to an embodiment, the external electronic device 104 or the server 108 may be included in the second network 199. The electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or healthcare) based on 5G communication technology or IoT-related technology.



FIG. 2 is a block diagram illustrating an integrated intelligence system according to certain embodiments.


With reference to FIG. 2, the integrated intelligent system of an embodiment may include a user terminal 210, an intelligent server 230, and a service server 250.


The user terminal 210 of an embodiment may be a terminal device (or, electronic device) connectable to the Internet, and may be, for example, a mobile phone, a smartphone, a personal digital assistant (PDA), a notebook computer, a TV, a domestic appliance, a wearable device, an HMD, or a smart speaker.


According to the illustrated embodiment, the user terminal 210 may include a communication interface 213, a microphone 212, a speaker 216, a display 211, a memory 215, or a processor 214. The components listed above may be operably or electrically connected to each other. The user terminal 210 may include at least some of the configurations and/or functions of the electronic device 101 in FIG. 1.


The communication interface 213 of an embodiment may be configured to transmit and receive data via connection to an external device. The microphone 212 may receive sound (e.g., user's utterance) and convert it into an electrical signal. The speaker 216 of an embodiment may output an electrical signal as sound (e.g., voice). The display 211 may be configured to display an image or video. The display 211 may also display a graphical user interface (GUI) of an application (e.g., an application program) that is presently executed.


The memory 215 of an embodiment may store a client module 218, a software development kit (SDK) 217, and a plurality of applications 219a and 219b. The client module 218 and the SDK 217 may include a framework (or a solution program) for performing general functions. In addition, the client module 218 or SDK 217 may include a framework for processing voice input.


The plurality of applications 219a and 219b in memory 215 may represent programs for performing a specified function or functions. According to an embodiment, the plurality of applications may include at least a first application 219a and a second application 219b. According to an embodiment, the plurality of applications 219a and 219b may each include a plurality of operations for performing a specified function. For example, the applications 219a and 219b may include an alarm application, a message application, and/or a schedule application. According to an embodiment, the plurality of applications 219a and 219b may be executed by the processor 214 to sequentially execute at least some of the plurality of operations.


The processor 214 of an embodiment may control the overall operation of the user terminal 210. For example, the processor 214 may be electrically connected to the communication interface 213, the microphone 212, the speaker 216, and the display 211 to perform specified operations.


The processor 214 of an embodiment may also execute a program stored in the memory 215 to perform a designated function. For example, the processor 214 may execute at least one of the client module 218 or the SDK 217 to perform the following operations for processing a voice input. The processor 214 may control the operation of the plurality of applications 219a and 219b through, for example, the SDK 217. The following operations described as operations of the client module 218 or the SDK 217 may be executed by the processor 214.


The client module 218 of an embodiment may receive voice input. For example, the client module 218 may receive a voice signal corresponding to the user's utterance as detected through the microphone 212. The client module 218 may transmit the received voice input to the intelligent server 230. The client module 218 may transmit state information of the user terminal 210 to the intelligent server 230 together with the received voice input. The state information may be, for example, information about the execution state of an application.


The client module 218 of an embodiment may receive a result corresponding to the received voice input. For example, when the intelligent server 230 produces a result corresponding to the received voice input, the client module 218 may receive the result corresponding to the received voice input. The client module 218 may display the received result on the display 211.


The client module 218 of an embodiment may receive a plan corresponding to a received voice input. The client module 218 may display a result obtained by executing a plurality of operations of an application according to the plan on the display 211. For example, the client module 218 may sequentially display execution results of a plurality of operations on the display 211. As another example, the user terminal 210 may display a partial result of executing a plurality of operations (e.g., result of the last operation) on the display 211.


According to an embodiment, the client module 218 may receive a request from the intelligent server 230 to obtain information utilized to produce a result corresponding to a voice input. According to an embodiment, the client module 218 may transmit the information to the intelligent server 230 in response to the request.


The client module 218 of an embodiment may transmit information including a result obtained by executing a plurality of operations according to the plan to the intelligent server 230. The intelligent server 230 may confirm that the received voice input has been correctly processed using the result information.


The client module 218 of an embodiment may include a speech recognition module. According to an embodiment, the client module 218 may recognize a voice input for performing a limited function through the speech recognition module. For example, the client module 218 may execute an intelligent application for processing a voice input to perform systematic operations through a designated input (e.g., “wake up!”).


The intelligent server 230 of an embodiment may receive information related to a user's voice input from the user terminal 210 through a communication network. According to an embodiment, the intelligent server 230 may convert data related to the received voice input into text data. According to an embodiment, the intelligent server 230 may generate a plan for performing a task corresponding to the user's voice input based on the text data.


According to an embodiment, the plan may be generated by an artificial intelligence (AI) system. The AI system may be a rule-based system, a neural network-based system (e.g., a feedforward neural network (FNN) or a recurrent neural network (RNN)), a combination thereof, or another type of AI system. According to an embodiment, the plan may be selected from a set of predefined plans, or may be generated in real time in response to a user request. For example, the AI system may select at least one plan from among a plurality of predefined plans.


The intelligent server 230 of an embodiment may transmit a result according to a generated plan to the user terminal 210 or transmit a generated plan to the user terminal 210. According to an embodiment, the user terminal 210 may display a result according to the plan on the display 211. According to an embodiment, the user terminal 210 may display a result of executing operations according to the plan on the display 211.


The intelligent server 230 of an embodiment may include a front end 231, a natural language platform 232, a capsule database 238, an execution engine 233, an end user interface 234, a management platform 235, a big data platform 236, or an analytics platform 237.


The front end 231 according to an embodiment may receive a voice input from the user terminal 210. The front end 231 may transmit a response corresponding to the voice input.


According to an embodiment, the natural language platform 232 may include an automatic speech recognition module (ASR module) 232a, a natural language understanding module (NLU module) 232b, a planner module 232c, a natural language generator module (NLG module) 232d, or a text-to-speech module (TTS module) 232e.


The automatic speech recognition module 232a of an embodiment may convert a voice input received from the user terminal 210 into text data. The natural language understanding module 232b of an embodiment may identify the user's intent using the text data of the voice input. For example, the natural language understanding module 232b may identify the user's intention by executing syntactic analysis or semantic analysis on the text data. The natural language understanding module 232b of an embodiment may identify a meaning of a word extracted from the voice input based on linguistic features (e.g., grammatical elements) of morphemes or phrases, and may determine an intention of the user by matching the meaning of the identified word with the intention.


The planner module 232c of an embodiment may generate a plan using the intention and parameters determined by the natural language understanding module 232b. According to an embodiment, the planner module 232c may determine a plurality of domains utilized to perform the task based on the determined intention. The planner module 232c may determine a plurality of actions included in each of the plurality of domains determined based on the intention. According to an embodiment, the planner module 232c may determine parameters utilized to execute the determined plurality of actions, or result values output by the execution of the plurality of actions. The parameter and the result value may be defined as a concept of a specified format (or, class). Consequently, the plan may include a plurality of actions and a plurality of concepts determined by the user's intention. The planner module 232c may determine the relationship between the plurality of actions and the plurality of concepts in stages (or, hierarchically). For example, the planner module 232c may determine the execution order of the plurality of actions, determined based on the user's intention, based on the plurality of concepts. In other words, the planner module 232c may determine the execution order of the plurality of actions based on the parameters utilized for execution of the plurality of actions and the results output by the execution of the plurality of actions. Accordingly, the planner module 232c may generate a plan including association information (e.g., ontology) between the plurality of actions and the plurality of concepts. The planner module 232c may generate a plan by using information stored in a capsule database in which a set of relationships between concepts and actions is stored.
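
As a purely illustrative sketch (not part of the disclosed embodiments), the dependency between actions and concepts described above can be modeled as a small graph in which each action consumes and produces concepts, and the execution order follows from those dependencies. The action and concept names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """One action in a plan, with the concepts it consumes and produces."""
    name: str
    requires: set = field(default_factory=set)  # input concepts (parameters)
    produces: set = field(default_factory=set)  # output concepts (result values)

def order_actions(actions, known_concepts=()):
    """Order actions so that each one runs only after the concepts it requires exist."""
    available, ordered, pending = set(known_concepts), [], list(actions)
    while pending:
        ready = [a for a in pending if a.requires <= available]
        if not ready:
            raise ValueError("plan cannot be completed with the available concepts")
        for a in ready:
            ordered.append(a)
            available |= a.produces
            pending.remove(a)
    return ordered

# Illustrative schedule-lookup plan.
plan = [
    Action("ShowSchedule", requires={"ScheduleList"}),
    Action("ResolveDateRange", requires={"Utterance"}, produces={"DateRange"}),
    Action("QueryCalendar", requires={"DateRange"}, produces={"ScheduleList"}),
]
print([a.name for a in order_actions(plan, known_concepts={"Utterance"})])
# -> ['ResolveDateRange', 'QueryCalendar', 'ShowSchedule']
```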


The natural language generator module 232d of an embodiment may convert specified information into a text form. The information in text form may be in the form of a natural language utterance. The text-to-speech module 232e of an embodiment may convert information in text form into information in speech form.


According to an embodiment, some or all of the functions of the natural language platform 232 may be implemented in the user terminal 210.


The capsule database may store information on relationships between a plurality of concepts and actions corresponding to a plurality of domains. A capsule according to an embodiment may include a plurality of action objects (or, action information) and concept objects (or, concept information) included in a plan. According to an embodiment, the capsule database may store a plurality of capsules in the form of a concept action network (CAN). According to an embodiment, a plurality of capsules may be stored in a function registry included in the capsule database.


The capsule database may include a strategy registry in which strategy information utilized for determining a plan corresponding to a voice input is stored. The strategy information may include reference information for determining one plan from among a plurality of plans corresponding to a voice input. According to an embodiment, the capsule database may include a follow-up registry in which information for suggesting a follow-up action to the user in a specified situation is stored. The follow-up action may include, for example, a follow-up utterance. According to an embodiment, the capsule database may include a layout registry that stores layout information of information output through the user terminal 210. According to an embodiment, the capsule database may include a vocabulary registry in which vocabulary information included in the capsule information is stored. According to an embodiment, the capsule database may include a dialog registry in which information about a dialog (or, interaction) with the user is stored. The capsule database may update a stored object through a developer tool. The developer tool may include a function editor for updating, for example, an action object or a concept object. The developer tool may include a vocabulary editor for updating the vocabulary. The developer tool may include a strategy editor for creating and registering strategies for determining plans. The developer tool may include a dialog editor for creating a dialog with the user. The developer tool may include a follow-up editor that may activate a follow-up goal and edit a follow-up utterance that provides a hint. The follow-up goal may be determined based on a currently set goal, a user's preference, or an environmental condition. In an embodiment, the capsule database may be implemented in the user terminal 210.


The execution engine 233 of an embodiment may produce a result by using a generated plan. The end user interface 234 may transmit the produced result to the user terminal 210. Hence, the user terminal 210 may receive the result and provide the received result to the user. The management platform 235 of an embodiment may manage information used in the intelligent server 230. The big data platform 236 of an embodiment may collect user data. The analytics platform 237 of an embodiment may manage the quality of service (QoS) of the intelligent server 230. For example, the analytics platform 237 may manage the components and processing speed (or, efficiency) of the intelligent server 230.


The service server 250 of an embodiment may provide a specified service (e.g., food order or hotel reservation) to the user terminal 210. According to an embodiment, the service server 250 may be a server operated by a third party. The service server 250 of an embodiment may provide the intelligent server 230 with information for generating a plan corresponding to the received voice input. The provided information may be stored in the capsule database. In addition, the service server 250 may provide result information according to a plan to the intelligent server 230. The service server 250 may further include one or more “CP” service servers 251, 252, 253, etc.


In the integrated intelligent system described above, the user terminal 210 may provide various intelligent services to the user in response to user inputs. The user input may include, for example, an input through a physical button, a touch input, or a voice input.


In an embodiment, the user terminal 210 may provide a speech recognition service through an intelligent application (or, speech recognition application) stored therein. In this case, for example, the user terminal 210 may recognize a user utterance or a voice input received through the microphone 212 and provide a service corresponding to the recognized voice input to the user.


In an embodiment, the user terminal 210 may perform a designated action alone or together with the intelligent server 230 and/or the service server 250 based on the received voice input. For example, the user terminal 210 may execute an application corresponding to the received voice input and perform a specified action through the executed application.


In an embodiment, when the user terminal 210 provides a service together with the intelligent server 230 and/or the service server 250, the user terminal 210 may detect a user's utterance by using the microphone 212 and generate a signal (or, voice data) corresponding to the detected user's utterance. The user terminal 210 may transmit the voice data to the intelligent server 230 through the communication interface 213.


As a response to the voice input received from the user terminal 210, the intelligent server 230 of an embodiment may generate a plan for performing a task corresponding to the voice input, or a result of performing an action according to the plan. The plan may include, for example, a plurality of actions for performing a task corresponding to a user's voice input, and a plurality of concepts related to the plurality of actions. The concept may define parameters input to the execution of the plurality of actions or result values output by the execution of the plurality of actions. The plan may include association information between the plurality of actions and the plurality of concepts.


The user terminal 210 of an embodiment may receive the response through the communication interface 213. The user terminal 210 may output a voice signal generated inside the user terminal 210 to the outside by using the speaker 216, and may output an image generated inside the user terminal 210 to the outside by using the display 211.



FIG. 2 illustrates an example in which the intelligent server 230 executes speech recognition of a voice input received from the user terminal 210, then executes natural language understanding and generation on the same, and finally produces a result using a plan, but certain embodiments of the present document are not limited thereto. For example, at least some components of the intelligent server 230 (e.g., the natural language platform 232, the execution engine 233, and the capsule database 238) may be embedded in the user terminal 210 (or, the electronic device 101 in FIG. 1), and their operations may be performed by the user terminal 210.



FIG. 3 is a diagram illustrating a form of relation information between concepts and actions stored in a database according to certain embodiments.


The capsule database 300 (e.g., capsule database 238 in FIG. 2) of the intelligent server (e.g., intelligent server 230 in FIG. 2) may store the capsules in the form of a concept action network (CAN). The capsule database may store actions for processing a task corresponding to a user's voice input and parameters utilized for the actions in the form of a concept action network (CAN).


The capsule database may store a plurality of capsules (capsule A 310, capsule B 320) corresponding to each of a plurality of domains (e.g., applications). According to an embodiment, one capsule (e.g., capsule A 310) may correspond to one domain (e.g., location (geo), application). Also, at least one service provider (e.g., CP 1 331, CP 2 332, CP 3 333, or CP 4 334) for performing a function for a domain related to the capsule may correspond to one capsule. According to an embodiment, one capsule may include at least one action 350 and at least one concept 360 for performing a specified function.


The natural language platform (e.g., natural language platform 232 in FIG. 2) may generate a plan for performing a task corresponding to the received voice input by using the capsules stored in the capsule database. For example, the planner module (e.g., planner module 232c in FIG. 2) of the natural language platform may generate a plan by using capsules stored in the capsule database. For example, actions 311 and 313 and concepts 312 and 314 of capsule A 310, and actions 321 and concepts 322 of capsule B 320 may be used to create a plan (e.g., in conjunction with actions and concepts 311, 312, 313 and 314).
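
For illustration only, a capsule store keyed by domain might be sketched as a small in-memory dictionary from which a planner pulls actions and concepts. The domains, providers, and names below are hypothetical and are not the structure of the capsule database 238 itself.

```python
# A toy in-memory capsule store keyed by domain (entries are illustrative).
capsule_db = {
    "calendar": {
        "provider": "CP 1",
        "actions": ["ResolveDateRange", "QueryCalendar", "ShowSchedule"],
        "concepts": ["Utterance", "DateRange", "ScheduleList"],
    },
    "music": {
        "provider": "CP 2",
        "actions": ["FindLatestTrack", "PlayTrack"],
        "concepts": ["Artist", "Track"],
    },
}

def capsule_for(domain: str) -> dict:
    """Return the actions and concepts registered for a domain, as a planner might."""
    return capsule_db[domain]

print(capsule_for("calendar")["actions"])
# -> ['ResolveDateRange', 'QueryCalendar', 'ShowSchedule']
```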



FIG. 4 is a diagram illustrating a user terminal that displays a screen for processing a received voice input through an intelligent application according to certain embodiments.


The user terminal 400 may execute an intelligent application to process a user input through the intelligent server (e.g., intelligent server 230 in FIG. 2).


According to an embodiment, upon recognizing a specified voice input (e.g., "wake up!") or receiving an input through a hardware key (e.g., a predesignated hardware key selectable to execute the intelligent application), the user terminal 400 may execute an intelligent application for processing a voice input. The user terminal 400 may execute the intelligent application in a state where, for example, a schedule application is running. According to an embodiment, the user terminal 400 may display an object (e.g., icon 411) corresponding to the intelligent application on the display 410. According to an embodiment, the user terminal 400 may receive a voice input corresponding to a user's utterance. For example, the user terminal 400 may receive a voice input "Tell me this week's schedule." According to an embodiment, the user terminal 400 may display, on the display 410, a user interface (UI) (e.g., input field) 413 of the intelligent application on which text data corresponding to the received voice input is displayed.


According to an embodiment, the user terminal 400 may display a result corresponding to the received voice input on the display 420. For example, the user terminal 400 may receive a plan corresponding to the received user input, and display a present week's schedule within the schedule application on the display 420 according to the received plan.



FIG. 5 is a configuration diagram of a voice assistant system according to certain embodiments.


According to an embodiment, the voice assistant system 500 (e.g., electronic device 101 in FIG. 1) may be implemented in an electronic device. The voice assistant system 500 may include a natural language understanding (NLU) module 530, a dialog management module 540, and a natural language generation (NLG) module 560. According to an embodiment, the voice assistant system 500 may further include an automatic speech recognizer (ASR) 525, a text-to-speech (TTS) converter 585, an application database 550, a dialog database 552, and a user database 554. According to an embodiment, at least some components of the voice assistant system 500 may be implemented with a processor.


According to an embodiment, the voice assistant system 500 may receive a text 510 or a voice 520 input from the user. When the voice assistant system 500 receives a voice 520 input from the user, it may convert the voice 520 into text using the automatic speech recognizer 525.


According to an embodiment, the natural language understanding module 530 may perform syntactic analysis and/or semantic analysis on the text 510 received from the user, or the transcribed text generated from the voice 520 and received from the automatic speech recognizer 525. The natural language understanding module 530 may infer the meaning of the received text through syntactic analysis, and may derive the intention of the received text through semantic analysis. For example, upon receiving “Play the latest song,” the natural language understanding module 530 may infer that an intended function requested by the command is music playback, and that the object is a most-recent song, from the words “song,” “latest” and “play”.
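
A minimal sketch of this kind of intent inference, assuming a toy keyword-rule approach rather than the full syntactic and semantic analysis described above (the intent names and rules are hypothetical):

```python
import re

# Toy keyword rules standing in for syntactic/semantic analysis (illustrative only).
INTENT_RULES = {
    "music.play": {"play", "song", "music"},
    "message.send": {"text", "send", "message"},
}

def understand(utterance: str) -> dict:
    """Guess an intent and a crude object slot from surface keywords."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    intent = max(INTENT_RULES, key=lambda i: len(INTENT_RULES[i] & words))
    slot = "latest" if "latest" in words else None
    return {"intent": intent, "object": slot}

print(understand("Play the latest song"))
# -> {'intent': 'music.play', 'object': 'latest'}
```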


According to an embodiment, the dialog management module 540 may store information derived through the natural language understanding module 530 in the dialog database 552 and/or the user database 554. According to an embodiment, the dialog management module 540 may retrieve information from the application database 550, the dialog database 552, and/or the user database 554 by using the information inferred through the natural language understanding module 530. According to an embodiment, the dialog management module 540 may search the dialog database 552 to extract a template to be utilized by the natural language generation module 560 as a response. For example, if the requested command is music playback, the dialog management module 540 may search the application database 550 to retrieve an application related to music playback, and may search the dialog database 552 to retrieve a template that can be used as a response.


According to an embodiment, the natural language generation module 560 may generate a response using a result obtained through the dialog management module 540. The natural language generation module 560 may generate a response by using a template. For example, the natural language generation module 560 may generate “I will play the latest song” in response to the command “Play the latest song.” As another example, in response to the command stating, “Text my mother; I am leaving work now”, the natural language generation module 560 may generate a response stating, “I sent a text message to your mother saying ‘I am leaving work now.’”
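
A minimal sketch of template-based response generation along these lines; the template strings and slot names are assumptions, not those of the dialog database 552.

```python
# Hypothetical response templates keyed by intent, as a dialog database might store them.
TEMPLATES = {
    "music.play": "I will play the {qualifier} song.",
    "message.send": "I sent a text message to {recipient} saying '{body}.'",
}

def generate(intent: str, **slots) -> str:
    """Fill the template retrieved for the given intent with slot values."""
    return TEMPLATES[intent].format(**slots)

print(generate("music.play", qualifier="latest"))
print(generate("message.send", recipient="your mother", body="I am leaving work now"))
```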


According to an embodiment, the response generated by the natural language generation module 560 may be displayed as text 570 by using the display, and/or may be converted into a vocal output 580, using the text-to-speech converter 585.



FIG. 6 illustrates an architecture of the electronic device according to certain embodiments.


With reference to FIG. 6, the electronic device 600 (e.g., voice assistant system 500 in FIG. 5) may include a tone detection module 620, an expression detection module 630, and a response generation module 640. The electronic device 600 may further include an automatic speech recognizer 610 (e.g., automatic speech recognizer 525 in FIG. 5), a text-to-speech converter 650 (e.g., text-to-speech converter 585 in FIG. 5), and/or a group-specific expression database 645.


According to certain embodiments, when a voice is received, the electronic device 600 may convert it into text by using the automatic speech recognizer 610.


According to certain embodiments, the tone detection module 620 may store a natural language processing result such as domain, intention, and object for the received voice or text, a context such as received time and place, and/or user information. The tone detection module 620 may infer or detect the user's explicit intention and implicit intention included in the received voice or text by using the stored information.


According to certain embodiments, the tone detection module 620 may detect a tone by using a pattern of a voice or text received from the user. Specifically, the tone detection module 620 may determine or detect the formality, fluency, and sentiment based on the tone. The tone detection module 620 may include a formality classification module 622, a fluency classification module 624, and a sentiment analysis module 626.


According to certain embodiments, the formality classification module 622 may determine a preset degree of formality through, for example, particular prestored words, phrases, or expressions used in the received or converted text, and display the determined degree of formality as a value. The degree of formality may indicate whether the text or voice was received in a public setting or in a private setting. For example, the value output by the formality classification module 622 may be one of 1 to 5. The formality classification module 622 may determine the degree of formality by using at least one of, for example, the form of the sentence-closing ending, the usage frequency of words containing Chinese characters, or the conciseness and clarity of the sentence. According to an embodiment, the value of the formality classification module 622 may be determined through learning on a collated corpus. The formality classification module 622 may determine how formal the received or converted text is by performing contrastive learning on a corpus that records, for example, speakers' verbal expressions in newspapers or news broadcasts, or conversations between friends. Thereafter, the response generation module 640 may infer a situation based on the value of the formality classification module 622 to generate a response suitable for the situation.
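
As a rough illustration of mapping lexical cues to a 1-to-5 formality value; the cue lists are hypothetical, and an actual module may instead be trained on a corpus as described above.

```python
# Rough stand-ins for the cues the module might weigh (sentence endings, word choice).
FORMAL_CUES = {"dear", "regards", "hereby", "please", "sincerely"}
CASUAL_CUES = {"hey", "lol", "gonna", "wanna", "btw"}

def formality_score(text: str) -> int:
    """Map a text to a 1 (casual) .. 5 (formal) score from simple lexical cues."""
    words = set(text.lower().split())
    raw = 3 + len(words & FORMAL_CUES) - len(words & CASUAL_CUES)
    return max(1, min(5, raw))

print(formality_score("hey btw gonna be late"))        # -> 1
print(formality_score("Dear team, please see below"))  # -> 5
```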


According to certain embodiments, the fluency classification module 624 may evaluate the level of fluency of the received or converted text with regard to the current language and/or the level of familiarity with the subject matter of the contents included in the received or converted text, and display it as a fluency value. For example, the value output by the fluency classification module 624 may be "1" to "5." Even if a text or voice is presented in a language the user commonly utilizes in daily life, the level of familiarity with a specific field (e.g., a topical subject) may vary. The electronic device 600 may determine the degree of fluency, facilitating generation of a response that accounts for an estimate of the user's understanding of the specific field (e.g., the topical subject). For example, the degree of fluency may differ depending on whether the user is an adult or a child, even if the voice or text data indicates the same subject. As another example, when the subject indicates a particular disease, even if all the users are adults, the degree of fluency between a medical doctor and an ordinary layperson on the particular subject may be different. Therefore, the response generation module 640 may select a word, term, or expression to be used in the response based on the value determined by the fluency classification module 624.
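
A similar sketch for estimating topical familiarity from the presence of specialist terms; the term list is hypothetical and stands in for whatever domain vocabulary a deployed module would use.

```python
# A hypothetical list of specialist terms for one topical domain (medicine, here).
DOMAIN_TERMS = {"etiology", "prognosis", "contraindication", "idiopathic", "comorbidity"}

def fluency_score(text: str, domain_terms=DOMAIN_TERMS) -> int:
    """Estimate 1..5 familiarity with a topic from how many specialist terms appear."""
    hits = len(set(text.lower().split()) & domain_terms)
    return min(5, 1 + hits)

print(fluency_score("What is this disease?"))                                  # -> 1
print(fluency_score("Is the condition idiopathic and what is the prognosis"))  # -> 3
```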


According to certain embodiments, the sentiment analysis module 626 may display a result of determining the user's psychological state by using the expressions included in the received or converted text. The sentiment analysis module 626 may determine the user's subjective impression, emotion, attitude, and/or opinion on the subject of the contents included in the received or converted text, and may infer and display an emotional state such as joy, anger, sadness, and/or urgency. For example, if the user's mood is inferred to be "relaxed" and "good" based on characteristics detected from at least the received voice input, the electronic device 600 may provide various pieces of related information in tandem with a basic response generated according to the subject of the contents included in the received or converted text. As another example, if "anger" or "urgency" is inferred from the received or converted text, the electronic device 600 may provide the basic response generated based on the subject of the contents included in the received or converted text, without the additional related information.
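
A lexicon-based sketch of this kind of emotion inference; the lexicon entries are illustrative, and a deployed module would more likely use a trained classifier.

```python
# Tiny emotion lexicon (illustrative only).
EMOTION_LEXICON = {
    "great": "joy", "thanks": "joy", "furious": "anger", "annoyed": "anger",
    "now": "urgency", "immediately": "urgency", "sad": "sadness",
}

def dominant_emotion(text: str, default: str = "neutral") -> str:
    """Pick the most frequent emotion label found in the text."""
    counts = {}
    for word in text.lower().split():
        label = EMOTION_LEXICON.get(word.strip(".,!?"))
        if label:
            counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get) if counts else default

print(dominant_emotion("Call a taxi immediately, now!"))  # -> 'urgency'
```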


According to certain embodiments, the tone detection module 620 may combine the results of the formality classification module 622, the fluency classification module 624, and the sentiment analysis module 626 to generate a comprehensive attribute (e.g., tone, manner) of the response to be generated by the response generation module 640. The tone detection module 620 may transfer the determined result to the response generation module 640. Alternatively, the tone detection module 620 may transfer the individual results of the formality classification module 622, the fluency classification module 624, and the sentiment analysis module 626 to the response generation module 640.
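
The combination step might be sketched as a simple attribute bundle handed to the response generation module; the field names and the verbosity rule below are assumptions for illustration.

```python
def tone_profile(formality: int, fluency: int, emotion: str) -> dict:
    """Combine the three analyses into one attribute bundle for response generation."""
    return {
        "formality": formality,  # 1 (casual) .. 5 (formal)
        "fluency": fluency,      # 1 (layperson) .. 5 (expert)
        "emotion": emotion,      # e.g. 'joy', 'anger', 'urgency'
        # Keep the reply brief when the user sounds angry or pressed for time.
        "verbosity": "brief" if emotion in {"anger", "urgency"} else "rich",
    }

print(tone_profile(formality=2, fluency=1, emotion="urgency"))
```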


According to certain embodiments, the expression detection module 630 may identify and detect a user-specific expression (e.g., tone or narration) and an expression for a specific topic (e.g., specific contents) from the received or converted text. User-specific expressions may be learned through the growth process and experience of each user. For example, a user's particularities in dialect, sentence-ending forms, and habitually used "flowery words" (e.g., more advanced vocabulary) may be stored as user-specific expressions. Expressions for a specific topic may include those expressions utilized specifically within conversations on specific topics or with particular groups, and include words and phrases that are generally not included in the flow of normal conversation. As such, terms, abbreviations, and buzzwords commonly utilized in a community centered on particular topics may correspond to expressions for a specific topic. For example, these may include certain acronyms, terms, and phrases that are specifically used among fan clubs.


Likewise, "memes" or expressions that are popular in specific Internet communities may be included in the concept of an expression on a specific topic. Because normal expressions and contextual expressions may differ even for a single user, in order to detect a user-specific expression, the expressions used may be stored according to each topic and situation. The expression detection module 630 may detect a user's expression by associating each expression with each corresponding situation, and associating each term with each specific topic, through a comparison with a general corpus (e.g., by machine learning). The user's expression, as detected through learning in this manner, may enable the electronic device 600 to provide an AI service that can develop increased familiarity in usage and interaction with the user.


According to certain embodiments, the expression detection module 630 may include a general expression detection module 632, a user-specific expression database 634, and a named-entity expression detection module 636.


According to certain embodiments, the general expression detection module 632 may store and detect a user-specific general expression in the received voice or text. The general expression detection module 632 may store a user-specific general expression in the user-specific expression database 634 for each user, and detect it from the user-specific expression database 634. A general expression may be an expression, such as a verb, an adjective, or an exclamation, excluding an entity. An example of a user-specific general expression may be “sming” (Korean slang for “streaming”), which corresponds to “playing music.”


According to certain embodiments, the named-entity expression detection module 636 may store and detect a user-specific named-entity expression in the received voice or text. The named-entity expression detection module 636 may store a user-specific named-entity expression in the user-specific expression database 634 for each user, and detect it from the user-specific expression database 634. An example of a user-specific named-entity expression may be a user's pet name for a specific person.


According to certain embodiments, the user-specific expression database 634 may be segmented for each user. The user-specific expression database 634 may store user-specific general expressions and user-specific named-entity expressions. User-specific general expressions and user-specific named-entity expressions may be stored together with information about the domain, topic, situation, content, and counterpart (e.g., another user that is present). In addition, priorities may be stored for the expressions.
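
One possible, purely illustrative record layout for such a database is sketched below in Python; the SQLite backing store and every field name are assumptions made for illustration and are not the disclosed schema.

```python
# Illustrative sketch: a per-user expression table carrying type, domain, topic,
# situation, counterpart, and priority alongside each stored expression.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_expression (
        user_id      TEXT,
        expression   TEXT,     -- e.g., 'sming'
        meaning      TEXT,     -- e.g., 'play music (streaming)'
        expr_type    TEXT,     -- 'general' or 'named_entity'
        domain       TEXT,
        topic        TEXT,
        situation    TEXT,
        counterpart  TEXT,     -- e.g., 'friend', 'boss'
        priority     INTEGER
    )
""")
conn.execute("INSERT INTO user_expression VALUES "
             "('user1', 'sming', 'play music', 'general', 'music', 'fandom', NULL, NULL, 1)")
rows = conn.execute("SELECT expression, meaning FROM user_expression "
                    "WHERE user_id = 'user1' AND expr_type = 'general'").fetchall()
print(rows)  # [('sming', 'play music')]
```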


According to certain embodiments, expressions used in a community that the user has subscribed to or frequently accesses may be stored in the group-specific expression database 645. For example, if there is a history of a user accessing a cat community, and cat-related expressions such as “˜ha-nyang”, “nyang-nim (dear cat)”, “servant”, and “˜ong (meow)” are frequently detected in the community, these frequently-detected expressions may be stored in the group-specific expression database 645.


According to certain embodiments, the response generation module 640 may generate a response to the received voice or text by using the result of the tone detection module 620 and the result of the expression detection module 630. The response generation module 640 may generate a response by using a user-specific general expression and a user-specific named-entity expression. According to an embodiment, the response generation module 640 may generate a response by further using an expression stored in the group-specific expression database 645. The response generation module 640 may generate different responses according to the domain, topic, situation, content, and counterpart. For example, when the user inputs “send a text ‘hello,’” the response generation module 640 may generate a response message stating ‘Hi,’ if the counterpart is a friend, but may generate a response message stating, ‘Hello,’ if the counterpart is a boss.
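
By way of a minimal, non-limiting sketch of the counterpart-dependent behavior described in the example above (the mapping table and default are assumptions for illustration only):

```python
# Illustrative sketch: pick a greeting register based on the counterpart of the
# outgoing message, as in the "Hi" vs. "Hello" example.

GREETING_BY_COUNTERPART = {
    "friend": "Hi,",
    "boss": "Hello,",
}

def greet(counterpart, default="Hello,"):
    """Return a counterpart-appropriate greeting for the outgoing message."""
    return GREETING_BY_COUNTERPART.get(counterpart, default)

print(greet("friend"))  # Hi,
print(greet("boss"))    # Hello,
```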


According to certain embodiments, the electronic device 600 may use the text-to-speech converter 650 to output the generated response as a voice. The electronic device 600 may display the generated response as text by using the display or may output the generated response as a voice by using the speaker.


Next, a detailed description will be given of specific operations or various examples of some modules of the electronic device 600.



FIG. 7 illustrates an example in which the expression detection module detects a user-specific expression according to certain embodiments.


According to certain embodiments, the expression detection module 630 may utilize a user log 710 (e.g., a history) of the electronic device to detect a user-specific expression. The user log 710 may include utterance records, short message service (SMS) records, and/or automatic speech recognition (ASR) records of calls. The expression detection module 630 may compare the user log 710 with expressions stored in a system database 720 to detect user-specific expressions in the user log 710. The system database 720 may store natural language understanding training data (NLU training data) and/or a large conversation corpus. The expression detection module 630 may use an expression type classifier 730 to classify the user-specific expression detected in the user log 710 into a user-specific general expression 740 and a user-specific named-entity expression 750.
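
As a purely illustrative sketch of this FIG. 7 flow under simplifying assumptions, the following Python example treats tokens absent from the system vocabulary as candidates and reduces the expression type classifier to a capitalization heuristic; a real classifier 730 would be a trained model.

```python
# Illustrative sketch: compare user-log tokens against a system vocabulary and
# split the unknown tokens into general vs. named-entity expressions.

def detect_and_classify(user_log_tokens, system_vocabulary):
    general, named_entity = [], []
    for token in user_log_tokens:
        if token.lower() in system_vocabulary:
            continue  # known to the system, so not user-specific
        # Heuristic stand-in for expression type classifier 730.
        (named_entity if token[0].isupper() else general).append(token)
    return general, named_entity

tokens = ["sming", "Bangtans", "play", "song"]
vocab = {"play", "song", "music"}
print(detect_and_classify(tokens, vocab))  # (['sming'], ['Bangtans'])
```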


According to certain embodiments, the expression detection module 630 may store the classified user-specific general expression 740 and user-specific named-entity expression 750 in a user database for each user (e.g., user-specific expression database 634 in FIG. 6).



FIG. 8 illustrates an example in which the expression detection module detects a user's expression on a specific topic according to certain embodiments.


According to certain embodiments, the expression detection module 630 may utilize a user log 810 of the electronic device to detect a user's expression on a specific topic. The user log 810 may include content and/or corpus of a frequently visited website or community (e.g., an Internet discussion board directed to a specific topic). The expression detection module 630 may compare expressions included in the user log 810 with expressions stored in the system database 820 to detect a user's particular expressions on a specific topic in the user log 810. The system database 820 (e.g., system database 720 in FIG. 7) may store natural language understanding training data (i.e., NLU training data) and/or a literary style corpus of Wikipedia, novels, and articles. The expression detection module 630 may use an expression type classifier 830 (e.g., expression type classifier 730 in FIG. 7) to classify the user's expression on a specific topic detected in the user log 810 into a user-specific general expression 840 and a user-specific named-entity expression 850.


According to certain embodiments, the expression detection module 630 may also detect an expression that is not directly used by the user. For example, when a community that the user has subscribed to or frequently accesses is detected in the user log 810, the expression detection module 630 may detect expressions frequently used in the corresponding community (e.g., “˜ha-nyang”, “nyang-nim” meaning “dear cat,” “servant”, and “˜ong”, which is an onomatopoeia for a cat's meow). The expression detection module 630 may use the expression type classifier 830 to classify an expression frequently used in a specific community detected in the user log 810 into a user-specific general expression 840, a user-specific named-entity expression 850, or another category.
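
A minimal, non-limiting sketch of harvesting such community expressions follows; the frequency threshold and the use of a plain vocabulary set in place of a literary-style corpus are assumptions for illustration.

```python
# Illustrative sketch: collect expressions that recur in a community the user
# frequents, even if the user has not used them directly, for a group-specific store.

from collections import Counter

def harvest_community_expressions(community_posts, general_vocabulary, min_count=2):
    freq = Counter(w for post in community_posts for w in post.lower().split())
    return {w: c for w, c in freq.items()
            if c >= min_count and w not in general_vocabulary}

posts = ["nyang-nim is napping", "the servant feeds nyang-nim", "servant duties again"]
general_vocab = {"is", "the", "napping", "feeds", "again", "duties"}
print(harvest_community_expressions(posts, general_vocab))
# {'nyang-nim': 2, 'servant': 2}
```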


According to certain embodiments, the expression detection module 630 may store the classified user-specific general expression 840 and user-specific named-entity expression 850 in a user database for each user (e.g., user-specific expression database 634 in FIG. 6). According to an embodiment, the expression detection module 630 may separately store expressions directly used by the user, and expressions not directly used by the user.



FIG. 9 shows an example of managing a user-specific expression database according to certain embodiments.


According to certain embodiments, the expression detection module (e.g., expression detection module 630 in FIG. 6) may store a user-specific expression and a user's expression on a specific topic in the user-specific expression database 930 (e.g., user-specific expression database 634 in FIG. 6) by using the method described with reference to FIGS. 7 and 8.


According to certain embodiments, the response generation module (e.g., response generation module 640 in FIG. 6) may generate a response using an expression extracted from the user-specific expression database 930. The user may then provide positive or negative feedback 910 for the generated response. The generated response may be stored in the log 920 together with the user's feedback.


According to certain embodiments, the priorities of a user-specific expression and user's expression on a specific topic stored in the user-specific expression database 930 may be adjusted based on the log 920.


According to certain embodiments, the user-specific expression and user's expression on a specific topic whose priority is adjusted can improve the quality of a response to be generated later, thereby increasing user satisfaction.
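
As a purely illustrative sketch of one possible adjustment rule (the +1/-1 update is an assumption, not the disclosed mechanism), priorities could be raised for expressions that received positive feedback and lowered otherwise:

```python
# Illustrative sketch: adjust per-expression priorities from a feedback log so
# that later responses prefer well-received expressions.

def adjust_priorities(expression_priority, feedback_log):
    for expression, feedback in feedback_log:
        delta = 1 if feedback == "positive" else -1
        expression_priority[expression] = expression_priority.get(expression, 0) + delta
    return expression_priority

priorities = {"sming": 1, "sweetheart": 1}
log = [("sming", "positive"), ("sweetheart", "negative"), ("sming", "positive")]
print(adjust_priorities(priorities, log))  # {'sming': 3, 'sweetheart': 0}
```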



FIG. 10 is a configuration diagram of a deep learning module for generating a response according to certain embodiments.


According to certain embodiments, to generate a response, the response generation module (e.g., response generation module 640 in FIG. 6) may select a desired response template based on standardized values detected through natural language processing of the input text or voice.


According to certain embodiments, the response generation module 640 may generate a response using a deep learning module trained on corpus records including pairs of input text or voice and corresponding responses. According to certain embodiments, the response generation module 640 may include a deep learning module.


According to certain embodiments, the deep learning module may receive user characteristics in addition to the generated response, and convert the response into one appropriate to the user's known current situation. The user characteristics may be values output from the tone detection module (e.g., tone detection module 620 in FIG. 6) and/or the expression detection module (e.g., expression detection module 630 in FIG. 6). For example, the deep learning module may receive as input at least one of formality 1010, fluency 1012, sentiment 1014, general expression 1016, named-entity expression 1018, basic response 1020, or expected response 1022.


According to certain embodiments, in the deep learning module, at least one of formality 1010, fluency 1012, sentiment 1014, general expression 1016, named-entity expression 1018, basic response 1020, or expected response 1022 received by the input 1030 may be sent to the hidden layer 1040, from the hidden layer 1040 to the decoder 1050, and from the decoder 1050 to the output 1060.
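
For illustration only, the following PyTorch sketch shows one way an encoder-decoder could condition on such tone and expression features alongside an encoding of the basic response; the architecture, dimensions, and feature encodings are assumptions and do not represent the disclosed deep learning module.

```python
# Illustrative sketch: concatenate tone/expression features with the encoded
# basic response before decoding the final response.

import torch
import torch.nn as nn

class ResponseRewriter(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, tone_dim=8, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # hidden layer 1040
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # decoder 1050
        self.bridge = nn.Linear(hidden_dim + tone_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)                  # output 1060

    def forward(self, basic_response_ids, tone_features, decoder_input_ids):
        # Encode the basic response (part of input 1030).
        _, h = self.encoder(self.embed(basic_response_ids))
        # Mix in tone/expression features before decoding.
        h = torch.tanh(self.bridge(torch.cat([h[-1], tone_features], dim=-1))).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(decoder_input_ids), h)
        return self.out(dec_out)  # logits over the vocabulary

# Shapes only, to show the data flow: 5-token basic response, 8-dim tone vector,
# 6-token decoder input.
model = ResponseRewriter()
logits = model(torch.randint(0, 1000, (1, 5)), torch.randn(1, 8), torch.randint(0, 1000, (1, 6)))
print(logits.shape)  # torch.Size([1, 6, 1000])
```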



FIG. 11 illustrates an example of generating a response based on user's characteristics according to certain embodiments.


According to certain embodiments, in FIG. 11, the user characteristic may be a preference of the user. With reference to part (a) of FIG. 11, the user may generate an input stating, “play Bangtan Boys' song,” to the electronic device (e.g., electronic device 600 in FIG. 6) by voice or text. The tone detection module (e.g., tone detection module 620 in FIG. 6) may determine from the input voice or text that the degree of formality is “3,” the sentiment is “neutral,” and the fluency is “5.” The expression detection module (e.g., expression detection module 630 in FIG. 6) may detect “Bangtan Boys” as a named-entity expression. Then, the response generation module (e.g., response generation module 640 in FIG. 6) may output “I will play ON of BTS” based on the determined formality, sentiment, and fluency, and the detected named-entity expression.


With reference to part (b) of FIG. 11, the user may generate an input stating, “play our Bangtans' song” to the electronic device 600 by voice or text. The tone detection module 620 may determine from the input voice or text that the formality is “2,” the sentiment is “positive” and “excited,” and the fluency is “5.” The expression detection module 630 may detect “Bangtan Boys” as a named-entity expression. According to certain embodiments, the expression detection module 630 may further extract “sming” from the user-specific expression database (e.g., user-specific expression database 634 in FIG. 6) as an expression of the user characteristic for music playback. Then, the response generation module 640 may output “Let's sming together ON sung by our Bangtans” based on the determined formality, sentiment, and fluency, and the detected named-entity expression and user characteristic expression.



FIG. 12 illustrates an example of generating a response based on a user's situation according to certain embodiments.


With reference to FIG. 12, the user may generate an input, “text my girlfriend, ‘traffic is backed up, please wait a minute, sorry,’” by voice or text to the electronic device (e.g., electronic device 600 in FIG. 6). The tone detection module (e.g., tone detection module 620 in FIG. 6) may determine from the input voice or text that the formality is “2,” the sentiment is “neutral,” and the fluency is “5.” The expression detection module 630 may extract “backed up” and “sorry” from the user-specific expression database (e.g., user-specific expression database 634 in FIG. 6) as expressions indicative of the user's current situation. Then, the response generation module (e.g., response generation module 640 in FIG. 6) may output “I texted your girlfriend, ‘traffic is backed up! please wait a minute. sorry,’” based on the determined formality, sentiment, fluency, and expression for the user's situation.



FIG. 13 illustrates an example of generating a response based on user's characteristics according to certain embodiments.


According to certain embodiments, in FIG. 13, the user characteristic may be the language proficiency of a user. For example, the language proficiency of an adult may differ from that of a child.


With reference to part (a) of FIG. 13, an adult user may generate the input, “What is autumn?” to the electronic device (e.g., electronic device 600 in FIG. 6) by voice or text. The tone detection module (e.g., tone detection module 620 in FIG. 6) may determine from the input voice or text that the formality is “3,” the sentiment is “neutral,” and the fluency is “5,” and the expression detection module (e.g., expression detection module 630 in FIG. 6) may detect “autumn” as a named-entity expression. Then, the response generation module (e.g., response generation module 640 in FIG. 6) may generate the responsive output, “Autumn is one of the four seasons in the temperate region. It is also called fall.” based on the determined formality, sentiment, and fluency, and the detected named-entity expression.


With reference to part (b) of FIG. 13, a child user may generate the same input, “What is autumn?” to the electronic device 600 by voice or text. The tone detection module 620 may determine from the input voice or text that the formality is “2,” the sentiment is “positive” and “excited,” and the fluency is “5,” and the expression detection module 630 may detect “autumn” as a named-entity expression. Then, the response generation module 640 may generate a different response, “Autumn is the season when the weather is cool between summer and winter. September to November is usually called autumn.” based on the determined formality, sentiment, and fluency, and the detected named-entity expression.



FIG. 14 illustrates an example of generating a response based on the location of the user according to certain embodiments.


Part (a) of FIG. 14 shows an example of using the electronic device (e.g., electronic device 600 in FIG. 6) at home, and part (b) of FIG. 14 shows an example of using the electronic device 600 outside or at the office.


With reference to part (a) of FIG. 14, the user may generate an input stating, “Hello, good morning today” to the electronic device 600 by voice or text. The tone detection module (e.g., tone detection module 620 in FIG. 6) may determine from the input voice or text that the formality is “3,” the sentiment is “happy” and “positive,” and the fluency is “5,” and the expression detection module (e.g., expression detection module 630 in FIG. 6) may detect nothing. Then, the response generation module (e.g., response generation module 640 in FIG. 6) may output the response, “Let's start this morning lively and happily,” based on the determined formality, sentiment, and fluency.


With reference to part (b) of FIG. 14, the user may input “Good morning” to the electronic device 600 by voice or text. The tone detection module 620 may determine from the input voice or text that the formality is “5”, the sentiment is “positive,” and the fluency is “5,” and the expression detection module 630 may detect nothing. Then, the response generation module 640 may output “Hello OO, good morning.” based on the determined formality, sentiment, and fluency.



FIG. 15 illustrates an example of generating a response based on a specific topic and a user's expression according to certain embodiments.


With reference to FIG. 15, the user may generate the input “Sming ‘ON’ of our Bangtans” to the electronic device (e.g., electronic device 600 in FIG. 6) by voice or text. The voice or text input by the user may be input to the tone detection module (e.g., tone detection module 620 in FIG. 6) and the expression detection module (e.g., expression detection module 630 in FIG. 6).


According to certain embodiments, the general expression detection module (e.g., general expression detection module 632 in FIG. 6) of the expression detection module 630 may search the user-specific expression database (e.g., user-specific expression database 634 in FIG. 6) to detect “Sming” as a user's general expression, and the named-entity expression detection module (e.g., named-entity expression detection module 636 in FIG. 6) of the expression detection module 630 may search the user-specific expression database 634 to detect “Bangtans” as a named-entity expression.


According to certain embodiments, the formality classification module (e.g., formality classification module 622 in FIG. 6) of the tone detection module 620 may determine a degree of formality of an expression used in the input voice or text. The formality classification module 622 may determine the formality of “Sming ‘ON’ of our Bangtans” to be “2,” for example.


According to certain embodiments, the fluency classification module (e.g., fluency classification module 624 in FIG. 6) of the tone detection module 620 may determine the level of fluency of the input voice or text and/or the level of familiarity with the topic included in the input voice or text. The fluency classification module 624 may determine the fluency of “Sming ‘ON’ of our Bangtans” to be “5,” for example.


According to certain embodiments, the sentiment analysis module (e.g., sentiment analysis module 626 in FIG. 6) of the tone detection module 620 may display the result of inferring the user's psychological state by using the expression included in the input voice or text. The sentiment analysis module 626 may determine the sentiment of “Sming ‘ON’ of our Bangtans” to be “positive” and “excited,” for example.


According to certain embodiments, the response generation module (e.g., response generation module 640 in FIG. 6) may generate a response based on the result of the tone detection module 620 and the result of the expression detection module 630. For example, in reply to “Sming ‘ON’ of our Bangtans”, the response generation module 640 may generate the response, “Let's sming together ‘ON’ of our Bangtans.”
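
For illustration only, the following Python sketch composes a reply in the spirit of the FIG. 15 walk-through by swapping the user's own expressions into a basic template and relaxing the register when the formality is low and the sentiment is positive; the template, threshold, and dictionaries are assumptions, not the claimed implementation.

```python
# Illustrative sketch: combine detected tone and user-specific expressions into
# a response, roughly following the FIG. 15 example.

def generate_reply(song, artist, user_general, user_entity, formality, sentiment):
    verb = user_general.get("play music", "play")   # e.g., 'sming'
    name = user_entity.get(artist, artist)          # e.g., 'our Bangtans'
    if formality <= 2 and "positive" in sentiment:
        return f"Let's {verb} together '{song}' of {name}."
    return f"I will play '{song}' of {artist}."

print(generate_reply("ON", "BTS",
                     user_general={"play music": "sming"},
                     user_entity={"BTS": "our Bangtans"},
                     formality=2, sentiment=["positive", "excited"]))
# Let's sming together 'ON' of our Bangtans.
```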



FIG. 16 illustrates another example of generating a response based on a specific topic and a user's expression according to certain embodiments.


With reference to FIG. 16, the user may generate the input, “text to Dayoung that I am sorry I am late because the traffic is backed up today,” by voice or text to the electronic device (e.g., electronic device 600 of FIG. 6). The voice or text input by the user may be input to the tone detection module (e.g., tone detection module 620 in FIG. 6) and the expression detection module (e.g., expression detection module 630 in FIG. 6).


According to certain embodiments, the general expression detection module (e.g., general expression detection module 632 in FIG. 6) of the expression detection module 630 may search the user-specific expression database (e.g., user-specific expression database 634 in FIG. 6), and thus detect the phrase “the traffic is really backed up today” as a user's general expression for “the traffic is backed up today”, and the phrase “I am really sorry I am late,” as a user's general expression for “I am sorry I am late.” According to certain embodiments, the named-entity expression detection module (e.g., named-entity expression detection module 636 in FIG. 6) of the expression detection module 630 may search the user-specific expression database 634 to detect “sweetheart” as a named-entity expression for “Dayoung.”


According to certain embodiments, the formality classification module (e.g., formality classification module 622 in FIG. 6) of the tone detection module 620 may infer the degree of formality of an expression used in the input voice or text. The formality classification module 622 may determine the formality of “text to Dayoung that I am sorry I am late because the traffic is backed up today” to be “2,” for example.


According to certain embodiments, the fluency classification module (e.g., fluency classification module 624 in FIG. 6) of the tone detection module 620 may determine the level of fluency of the input voice or text and/or the level of familiarity with the topic included in the input voice or text. The fluency classification module 624 may determine the fluency of “text to Dayoung that I am sorry I am late because the traffic is backed up today” to be “5,” for example.


According to certain embodiments, the sentiment analysis module (e.g., sentiment analysis module 626 in FIG. 6) of the tone detection module 620 may display the result of inferring the user's psychological state by using the expression included in the input voice or text. The sentiment analysis module 626 may determine the sentiment of “text to Dayoung that I am sorry I am late because the traffic is backed up today” to be “negative” and “apologetic,” for example.


According to certain embodiments, the response generation module (e.g., response generation module 640 in FIG. 6) may generate a response based on the result of the tone detection module 620 and the result of the expression detection module 630. For example, in reply to “text to Dayoung that I am sorry I am late because the traffic is backed up today”, the response generation module 640 may generate the response: “Sweetheart, the traffic is really backed up today . . . I am really sorry I am late.”



FIG. 17 is a flowchart for the electronic device to generate a response to a user's input according to certain embodiments.


According to certain embodiments, at operation 1710, the electronic device (e.g., electronic device 600 in FIG. 6) may receive a text or voice input from the user. The electronic device 600 may receive a voice-based input through the microphone or receive a text-based input using the display. When a voice is received, the electronic device 600 may convert the voice information into text.


According to certain embodiments, at operation 1720, the electronic device 600 may detect a tone from the received voice or text. The electronic device 600 may classify and/or analyze formality, fluency, and sentiment from the detected tone. For example, the electronic device 600 may have previously stored classifications of formality and fluency into levels or degrees, select a formality level or a fluency level based on characteristics within the received voice or text, and analyze the sentiment from those characteristics.


According to certain embodiments, at operation 1730, the electronic device 600 may detect a user-specific expression from the received voice or text. According to certain embodiments, user-specific expressions may include a general expression and a named-entity expression. The electronic device 600 may store a user-specific expression in a database assigned for each user. According to certain embodiments, the electronic device 600 may obtain a user-specific expression by using the user log and store it in the database. The electronic device 600 may detect a user-specific expression from the received voice or text by using user-specific expressions stored in the database. According to certain embodiments, the user log may include at least one of a text message log, a call log, or content of a site frequently visited by the user using the electronic device 600. When content of a site frequently visited by the user is used, the topic of the site may be further detected.


According to certain embodiments, the electronic device 600 may also identify the time or place at which the text or voice is received. The electronic device 600 may further determine whether there is a counterpart of the received text or voice.


According to certain embodiments, at operation 1740, the electronic device 600 may generate a response to the received voice or text based on the detected tone and expression.


According to certain embodiments, at operation 1750, the electronic device 600 may output the generated response. The electronic device 600 may output the generated response as a voice using the speaker or as text using the display.
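
As a purely illustrative, end-to-end sketch of the flow of FIG. 17 (operations 1710 to 1750), the following Python example uses placeholder rules in place of the modules described above; every value and rule in it is an assumption for illustration only.

```python
# Illustrative sketch: top-level flow corresponding to operations 1710-1750.

def handle_user_input(raw_input_text):
    # Operation 1710: receive text (a voice input would first be converted to text).
    text = raw_input_text
    # Operation 1720: detect tone (formality, fluency, sentiment) -- placeholder values.
    tone = {"formality": 3, "fluency": 5, "sentiment": "neutral"}
    # Operation 1730: detect user-specific expressions -- placeholder rule.
    expressions = [w for w in text.split() if w.lower() == "sming"]
    # Operation 1740: generate a response based on the detected tone and expressions.
    response = f"Got it (sentiment={tone['sentiment']}, expressions={expressions})."
    # Operation 1750: output the response as text or synthesized voice.
    return response

print(handle_user_input("sming ON please"))
```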


According to certain embodiments of the disclosure, an electronic device may include: a memory; and a processor operably connected to the memory, such that the processor may be configured to: receive a text or voice; detect a tone from the received voice or text; detect a user-specific expression from the received voice or text; generate a response to the received voice or text based on the detected tone and user-specific expression; and output the generated response as text or voice.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to determine the degree of formality from the detected tone, and generate a response in further consideration of the determined degree of formality.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to determine the degree of fluency from the detected tone, and generate a response in further consideration of the determined degree of fluency.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to determine the degree of sentiment from the detected tone, and generate a response in further consideration of the determined degree of sentiment.


The processor of the electronic device according to certain embodiments of the disclosure may be further configured to determine whether there is a user-specific general expression from the detected expression, and determine whether there is a user-specific named-entity expression from the detected expression.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to detect a user-specific expression by using a user log.


In the electronic device according to certain embodiments of the disclosure, the user log may include at least one of a text message log, a call log, or content of a site frequently visited by the user.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to detect a user-specific expression by using the content of the site frequently visited by the user, and further detect a topic of the site.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to identify the time or place at which the text or voice is received, and generate a response in further consideration of the identified time or place.


The processor of the electronic device according to certain embodiments of the disclosure may be configured to determine whether there is a counterpart of the received voice or text, and generate a response in further consideration of the determined counterpart.


According to certain embodiments of the disclosure, an operation method of an electronic device may include: receiving a text or voice; detecting a tone from the received voice or text; detecting a user-specific expression from the received voice or text; generating a response to the received voice or text based on the detected tone and user-specific expression; and outputting the generated response as text or voice.


According to certain embodiments of the disclosure, the operation method of the electronic device may further include determining the degree of formality from the detected tone, and generating a response to the received voice or text may be generating the response in further consideration of the determined degree of formality.


According to certain embodiments of the disclosure, the operation method of the electronic device may further include determining the degree of fluency from the detected tone, and generating a response to the received voice or text may be generating the response in further consideration of the determined degree of fluency.


According to certain embodiments of the disclosure, the operation method of the electronic device may further include determining the degree of sentiment from the detected tone, and generating a response to the received voice or text may be generating the response in further consideration of the determined degree of sentiment.


According to certain embodiments of the disclosure, in the operation method of the electronic device, detecting a user-specific expression may further include: determining whether there is a user-specific general expression from the detected expression; and determining whether there is a user-specific named-entity expression from the detected expression.


According to certain embodiments of the disclosure, in the operation method of the electronic device, detecting a user-specific expression may be detecting a user-specific expression by using a user log.


According to certain embodiments of the disclosure, in the operation method of the electronic device, the user log may include at least one of a text message log, a call log, or content of a site frequently visited by the user.


According to certain embodiments of the disclosure, in the operation method of the electronic device, detecting a user-specific expression by using content of a site frequently visited by the user may include further detecting a topic of the site.


According to certain embodiments of the disclosure, the operation method of the electronic device may further include identifying the time or place at which the text or voice is received, and generating a response to the received voice or text may be generating the response in further consideration of the identified time or place.


According to certain embodiments of the disclosure, in the operation method of the electronic device, detecting a user-specific expression may further include determining whether there is a counterpart of the received voice or text, and generating a response to the received voice or text may be generating the response in further consideration of the determined counterpart.


The electronic device according to certain embodiments may be one of various types of electronic devices. The electronic devices may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. According to an embodiment of the disclosure, the electronic devices are not limited to those described above.


It should be appreciated that certain embodiments of the present disclosure and the terms used therein are not intended to limit the technological features set forth herein to particular embodiments and include various changes, equivalents, or replacements for a corresponding embodiment. With regard to the description of the drawings, similar reference numerals may be used to refer to similar or related elements. It is to be understood that a singular form of a noun corresponding to an item may include one or more of the things, unless the relevant context clearly indicates otherwise. As used herein, each of such phrases as “A or B,” “at least one of A and B,” “at least one of A or B,” “A, B, or C,” “at least one of A, B, and C,” and “at least one of A, B, or C,” may include any one of, or all possible combinations of the items enumerated together in a corresponding one of the phrases. As used herein, such terms as “1st” and “2nd,” or “first” and “second” may be used simply to distinguish a corresponding component from another, and do not limit the components in other aspects (e.g., importance or order). It is to be understood that if an element (e.g., a first element) is referred to, with or without the term “operatively” or “communicatively”, as “coupled with,” “coupled to,” “connected with,” or “connected to” another element (e.g., a second element), it means that the element may be coupled with the other element directly (e.g., wiredly), wirelessly, or via a third element.


As used in connection with certain embodiments of the disclosure, the term “module” may include a unit implemented in hardware, software, or firmware, and may interchangeably be used with other terms, for example, “logic,” “logic block,” “part,” or “circuitry”. A module may be a single integral component, or a minimum unit or part thereof, adapted to perform one or more functions. For example, according to an embodiment, the module may be implemented in a form of an application-specific integrated circuit (ASIC).


Certain embodiments as set forth herein may be implemented as software (e.g., the program 140) including one or more instructions that are stored in a storage medium (e.g., internal memory 136 or external memory 138) that is readable by a machine (e.g., the electronic device 101). For example, a processor (e.g., the processor 120) of the machine (e.g., the electronic device 101) may invoke at least one of the one or more instructions stored in the storage medium, and execute it, with or without using one or more other components under the control of the processor. This allows the machine to be operated to perform at least one function according to the at least one instruction invoked. The one or more instructions may include a code generated by a compiler or a code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. The term “non-transitory” simply means that the storage medium is a tangible device, and does not include a signal (e.g., an electromagnetic wave), but this term does not differentiate between where data is semi-permanently stored in the storage medium and where the data is temporarily stored in the storage medium.


According to an embodiment, a method according to certain embodiments of the disclosure may be included and provided in a computer program product. The computer program product may be traded as a product between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or be distributed (e.g., downloaded or uploaded) online via an application store (e.g., PlayStore™), or between two user devices (e.g., smart phones) directly. If distributed online, at least part of the computer program product may be temporarily generated or at least temporarily stored in the machine-readable storage medium, such as memory of the manufacturer's server, a server of the application store, or a relay server.


According to certain embodiments, each component (e.g., a module or a program) of the above-described components may include a single entity or multiple entities, and some of the multiple entities may be separately disposed in different components. According to certain embodiments, one or more of the above-described components may be omitted, or one or more other components may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In such a case, according to certain embodiments, the integrated component may still perform one or more functions of each of the plurality of components in the same or similar manner as they are performed by a corresponding one of the plurality of components before the integration. According to certain embodiments, operations performed by the module, the program, or another component may be carried out sequentially, in parallel, repeatedly, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.

Claims
  • 1. An electronic device, comprising: input/output circuitry; a memory; and a processor operably connected to the memory, wherein the processor is configured to: receive, via the input/output circuitry, an input including textual or voice information; detect a tone from among a plurality of predefined tones from the received textual or voice information; detect a user-specific expression from the received textual or voice information; generate a response to the received textual or voice information, based on the detected tone and the detected user-specific expression; and output the generated response based on at least one of textual or vocal output.
  • 2. The electronic device of claim 1, wherein the processor is configured to: determine a degree of formality from among a plurality of predefined degrees of formality based on the detected tone, wherein generating the response is further based on the determined degree of formality.
  • 3. The electronic device of claim 1, wherein the processor is configured to: determine a degree of fluency from among a plurality of predefined degrees of fluency based on the detected tone; and wherein generating the response is further based on the determined degree of fluency.
  • 4. The electronic device of claim 1, wherein the processor is configured to: determine a degree of sentiment from among a plurality of predefined degrees of sentiment based on the detected tone; and wherein generating the response is further based on the determined degree of sentiment.
  • 5. The electronic device of claim 1, wherein the processor is further configured to: receive a general expression, wherein the user-specific expression is detected as included in the received general expression; and determine whether a user-specific named-entity expression is included within the detected user-specific expression.
  • 6. The electronic device of claim 5, wherein the user-specific expression is detected based in part on a user history log.
  • 7. The electronic device of claim 6, wherein the user history log includes at least one of a text message log, a call log, or cached contents of a frequently-visited website.
  • 8. The electronic device of claim 7, wherein the user-specific expression is detected further based in part on the cached content of the frequently-visited website, and a topic of the frequently-visited website is determined.
  • 9. The electronic device of claim 1, wherein the processor is configured to: identify at least one of a time at which the textual or voice information is received, or a location in which the electronic device is disposed at the time at which the textual or voice information is received; and wherein generating the response is further based on at least one of the identified time or location.
  • 10. The electronic device of claim 1, wherein the processor is configured to: determine whether a second user is present proximate to the electronic device; and wherein when the second user is present, generating the response is further based on a presence of the second user.
  • 11. A method of an electronic device, the method comprising: receiving, via input/output circuitry, an input including textual or voice information; detecting, by at least one processor, a tone from among a plurality of predefined tones from the received textual or voice information; detecting a user-specific expression from the received textual or voice information; generating a response to the received textual or vocal information based on the detected tone and the detected user-specific expression; and outputting the generated response based on at least one of vocal or textual output.
  • 12. The method of claim 11, further comprising: determining a degree of formality from among a plurality of predefined degrees of formality based on the detected tone, wherein generating the response to the received textual or voice information is further based on the determined degree of formality.
  • 13. The method of claim 11, further comprising: determining a degree of fluency from among a plurality of predefined degrees of fluency based on the detected tone, wherein generating the response is further based on the determined degree of fluency.
  • 14. The method of claim 11, further comprising: determining a degree of sentiment from among a plurality of predefined degrees of sentiment based on the detected tone, wherein generating the response is further based on the determined degree of sentiment.
  • 15. The method of claim 11, further comprising: receiving a general expression, wherein the user-specific expression is detected from the general expression; and determining whether a user-specific named-entity expression is included in the detected general expression.
  • 16. The method of claim 11, wherein the user-specific expression is detected based in part on a user history log.
  • 17. The method of claim 16, wherein the user history log includes at least one of a text message log, a call log, or cached content of a frequently-visited website.
  • 18. The method of claim 17, wherein the user-specific expression is detected further based in part on the cached content of the frequently-visited website, the method further including detecting a topic of the frequently-visited website.
  • 19. The method of claim 11, further comprising: identifying at least one of a time at which the textual or voice information is received, or a location in which the electronic device is disposed at the time at which the textual or voice information is received, and wherein generating the response is further based on at least one of the identified time or place.
  • 20. The method of claim 11, further comprising: determining whether a second user is present proximate to the electronic device, wherein when the second user is present, generating the response is further based on a presence of the second user.
Priority Claims (1)
Number Date Country Kind
10-2020-0160312 Nov 2020 KR national
CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of International Application No. PCT/KR2021/017180, filed on Nov. 22, 2021 which is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2020-0160312, filed on Nov. 25, 2020, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.

Continuations (1)
Number Date Country
Parent PCT/KR2021/017180 Nov 2021 US
Child 17539398 US