Emotionally Intelligent Responses to Information Seeking Questions

Information

  • Patent Application
  • Publication Number
    20230298580
  • Date Filed
    March 18, 2022
  • Date Published
    September 21, 2023
Abstract
A method for generating emotionally intelligent responses to information seeking questions includes receiving audio data corresponding to a query spoken by a user and captured by an assistant-enabled device associated with the user, and processing, using a speech recognition model, the audio data to determine a transcription of the query. The method also includes performing query interpretation on the transcription of the query to identify an emotional state of the user that spoke the query, and an action to perform. The method also includes obtaining a response preamble based on the emotional state of the user and performing the identified action to obtain information responsive to the query. The method further includes generating a response including the obtained response preamble followed by the information responsive to the query.
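The abstract outlines a five-step pipeline: transcribe the spoken query, interpret it for an emotional state and an action, fetch a preamble matched to that state, perform the action, and prepend the preamble to the result. The following is a minimal, self-contained sketch of that flow; all component names (transcribe, interpret, perform_action), the canned outputs, and the example preambles are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    emotional_state: str        # e.g. "anxious", "neutral"
    action: str                 # in this toy sketch, always a search action
    query_terms: list[str]

# A stand-in for the preamble datastore of claim 5: each preamble maps to a
# different emotional state. The wording here is invented for illustration.
PREAMBLES = {
    "anxious": "I understand this can be worrying. ",
    "sad": "I'm sorry to hear that. ",
}

def transcribe(audio: bytes) -> str:
    """Stand-in for the speech recognition model (returns a canned transcription)."""
    return "I'm worried, is the storm going to hit tonight?"

def interpret(transcription: str) -> Interpretation:
    """Toy query interpretation: keyword-spot an emotional state and pick an action."""
    state = "anxious" if "worried" in transcription.lower() else "neutral"
    return Interpretation(state, "web_search", transcription.split())

def perform_action(interp: Interpretation) -> str:
    """Stand-in for performing the identified action (e.g. a search-engine query
    using terms from the transcription, as in claim 2)."""
    return "The storm is expected to pass south of your area tonight."

def respond(audio: bytes) -> str:
    """Generate a response: the obtained preamble followed by the information."""
    transcription = transcribe(audio)
    interp = interpret(transcription)
    # When the emotional state indicates no emotional need, the response is
    # generated without a preamble (claim 10); the empty default models that.
    preamble = PREAMBLES.get(interp.emotional_state, "")
    information = perform_action(interp)
    return preamble + information

print(respond(b""))
# -> "I understand this can be worrying. The storm is expected to pass south..."
```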
Claims
  • 1. A computer-implemented method when executed by data processing hardware causes the data processing hardware to perform operations comprising: receiving audio data corresponding to a query spoken by a user and captured by an assistant-enabled device associated with the user; processing, using a speech recognition model, the audio data to determine a transcription of the query; performing query interpretation on the transcription of the query to identify: an emotional state of the user that spoke the query; and an action to perform; obtaining a response preamble based on the emotional state of the user; performing the identified action to obtain information responsive to the query; and generating a response comprising the obtained response preamble followed by the information responsive to the query.
  • 2. The computer-implemented method of claim 1, wherein performing the identified action to obtain the information responsive to the query further comprises querying a search engine using one or more terms in the transcription to obtain the information responsive to the query.
  • 3. The computer-implemented method of claim 1, wherein the operations further comprise: obtaining a prosody embedding based on the identified emotional state of the user that spoke the query; and converting, using a text-to-speech (TTS) system, a textual representation of the emotionally intelligent response into synthesized speech having a target prosody specified by the prosody embedding.
  • 4. The computer-implemented method of claim 3, wherein: performing query interpretation on the transcription of the query further comprises identifying a severity of the emotional state of the user; and obtaining the prosody embedding is further based on the severity of the emotional state of the user.
  • 5. The computer-implemented method of claim 1, wherein obtaining the response preamble based on the emotional state of the user further comprises querying, using the identified emotional state of the user, a preamble datastore comprising a set of different preambles, each preamble in the set of different preambles mapping to a different emotional state.
  • 6. The computer-implemented method of claim 1, wherein obtaining the response preamble based on the emotional state of the user further comprises generating, using a preamble generator configured to receive the emotional state of the user as input, a preamble mapped to the emotional state of the user.
  • 7. The computer-implemented method of claim 1, wherein obtaining the response preamble based on the emotional state of the user further comprises determining whether the emotional state of the user indicates an emotional need.
  • 8. The computer-implemented method of claim 7, wherein determining whether the emotional state of the user comprises an emotional need is based on the content of the query.
  • 9. The computer-implemented method of claim 7, wherein determining whether the emotional state of the user comprises an emotional need further comprises determining whether the emotional state of the user is associated with an emotion category.
  • 10. The computer-implemented method of claim 7, wherein the operations further comprise, when the identified emotional state of the user does not indicate an emotional need, generating the response without obtaining the response preamble.
  • 11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving audio data corresponding to a query spoken by a user and captured by an assistant-enabled device associated with the user; processing, using a speech recognition model, the audio data to determine a transcription of the query; performing query interpretation on the transcription of the query to identify: an emotional state of the user that spoke the query; and an action to perform; obtaining a response preamble based on the emotional state of the user; performing the identified action to obtain information responsive to the query; and generating a response comprising the obtained response preamble followed by the information responsive to the query.
  • 12. The system of claim 11, wherein performing the identified action to obtain the information responsive to the query further comprises querying a search engine using one or more terms in the transcription to obtain the information responsive to the query.
  • 13. The system of claim 11, wherein the operations further comprise: obtaining a prosody embedding based on the identified emotional state of the user that spoke the query; and converting, using a text-to-speech (TTS) system, a textual representation of the emotionally intelligent response into synthesized speech having a target prosody specified by the prosody embedding.
  • 14. The system of claim 13, wherein: performing query interpretation on the transcription of the query further comprises identifying a severity of the emotional state of the user; and obtaining the prosody embedding is further based on the severity of the emotional state of the user.
  • 15. The system of claim 11, wherein obtaining the response preamble based on the emotional state of the user further comprises querying, using the identified emotional state of the user, a preamble datastore comprising a set of different preambles, each preamble in the set of different preambles mapping to a different emotional state.
  • 16. The system of claim 11, wherein obtaining the response preamble based on the emotional state of the user further comprises generating, using a preamble generator configured to receive the emotional state of the user as input, a preamble mapped to the emotional state of the user.
  • 17. The system of claim 11, wherein obtaining the response preamble based on the emotional state of the user further comprises determining whether the emotional state of the user indicates an emotional need.
  • 18. The system of claim 17, wherein determining whether the emotional state of the user comprises an emotional need is based on the content of the query.
  • 19. The system of claim 17, wherein determining whether the emotional state of the user comprises an emotional need further comprises determining whether the emotional state of the user is associated with an emotion category.
  • 20. The system of claim 17, wherein the operations further comprise, when the identified emotional state of the user does not indicate an emotional need, generating the response without obtaining the response preamble.
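Claims 3 and 4 above (mirrored by system claims 13 and 14) add a prosody path: a prosody embedding is obtained from the identified emotional state, further conditioned on that state's severity, and a TTS system converts the textual response into synthesized speech with the target prosody the embedding specifies. Below is a hedged sketch of one way such a lookup could work; the embedding dimensionality, the numeric values, the 0-to-1 severity scale, and the synthesize stub are all assumptions for illustration, not the patent's method.

```python
import numpy as np

# Invented 4-dimensional prosody embeddings keyed by emotional state. A real
# TTS system would learn such embeddings; these values are placeholders.
BASE_PROSODY = {
    "anxious": np.array([0.2, 0.8, 0.1, 0.5]),
    "sad":     np.array([0.1, 0.3, 0.9, 0.2]),
    "neutral": np.array([0.5, 0.5, 0.5, 0.5]),
}

def prosody_embedding(emotional_state: str, severity: float) -> np.ndarray:
    """Per claims 3-4, the embedding depends on the identified emotional state
    and on its severity (assumed here to be normalized to the range 0..1)."""
    target = BASE_PROSODY.get(emotional_state, BASE_PROSODY["neutral"])
    neutral = BASE_PROSODY["neutral"]
    # Interpolate from neutral toward the emotional embedding as severity grows.
    return neutral + severity * (target - neutral)

def synthesize(text: str, embedding: np.ndarray) -> bytes:
    """Stand-in for the TTS system that conditions on the target prosody."""
    return b""  # a real system would return audio samples

emb = prosody_embedding("anxious", severity=0.7)
audio = synthesize("I understand this can be worrying. "
                   "The storm is expected to pass south of your area tonight.", emb)
```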