A standard communications session often involves two people communicating over wireless technologies such as cell phones. Other types of communications sessions involve conferencing, which may be used as an umbrella term for various types of online collaborative services, including web seminars ("webinars"), webcasts, and peer-level web meetings. Such meetings can also be conducted over the phone with available wireless cell-phone technologies. Conferencing may also be used in a narrower sense to refer only to the peer-level meeting context, in an attempt to differentiate it from other types of collaborative sessions. In general, communications sessions of differing types are made possible by various communications technologies and protocols. For instance, services may allow real-time point-to-point communications as well as multicast communications from one sender to many receivers. Such services allow data streams of text-based messages, voice, and video chat to be shared concurrently across geographically dispersed locations. Communications applications include meetings, training events, lectures, or presentations delivered from a web-connected computer to other web-connected computers. Most conferencing, however, is conducted between users over their cell phones, where communications are mostly for personal reasons.
This disclosure relates to retrieval of personal media content for participants of a communications session (e.g., bi-directional communications), such as content relating to past memories and experiences of the participants. Such memories, stored as personal media content, can be retrieved based on a determined context and emotional states detected during communications between participants. A user interface is provided to present media content and to enable user interaction with devices and methods executing thereon during the communications session. For example, the user interface can be executed on a device such as a cell phone or laptop computer. In this way, one or more participants are provided personal media content (in real time or near real time) that is directly relevant to the ongoing communications session without having to stop or pause the session and conduct a manual search.
As an example, an emotion detector can be implemented (e.g., as instructions executable by a processor or processors) to determine an emotional state of a given participant of the session to enable media retrieval. For example, the emotion detector can include an emotion recognizer and emotion analyzer. The emotion recognizer detects emotional state parameters from participants of the communications session, such as can be detected from audio and/or video data that is communicated as part of the session. For example, the emotional state parameters can include voice inflections, voice tones, silent periods, facial patterns, pupil dilation, eye movements, hand movements, or body movements.
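By way of illustration only, the emotional state parameters above can be modeled as a simple data structure that a recognizer populates over the course of a session. The following Python sketch is not part of the disclosure; the class names, fields, and the one-second silence heuristic are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmotionalStateParameters:
    """Signals the emotion recognizer may extract from session audio/video."""
    voice_inflections: List[float] = field(default_factory=list)   # pitch deltas
    voice_tones: List[float] = field(default_factory=list)         # spectral features
    silent_periods_s: List[float] = field(default_factory=list)    # pause durations
    facial_patterns: List[str] = field(default_factory=list)       # e.g. "smile", "frown"
    pupil_dilation: List[float] = field(default_factory=list)      # relative dilation
    eye_movements: List[str] = field(default_factory=list)
    hand_movements: List[str] = field(default_factory=list)
    body_movements: List[str] = field(default_factory=list)


class EmotionRecognizer:
    """Accumulates emotional state parameters for one participant over a session."""

    def __init__(self) -> None:
        self.params = EmotionalStateParameters()

    def observe_audio(self, pitch_delta: float, pause_s: float) -> None:
        # In practice these values would come from an audio analysis pipeline.
        self.params.voice_inflections.append(pitch_delta)
        if pause_s > 1.0:                      # treat gaps longer than 1 s as silent periods
            self.params.silent_periods_s.append(pause_s)

    def observe_video(self, facial_pattern: str, pupil_dilation: float) -> None:
        # Likewise, these labels would come from a vision pipeline.
        self.params.facial_patterns.append(facial_pattern)
        self.params.pupil_dilation.append(pupil_dilation)
```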
A context analyzer, operating with the emotion recognizer, determines context data based on information shared between the participants during the communications session, such as voice-recognized data specifying dates and times as well as names of people, things, and locations. The emotion analyzer processes the emotional state parameters and the context data. The emotion analyzer determines a probability (e.g., a confidence value) that a given participant has a predetermined emotional state (e.g., happy, sad, bored) during the communications session with respect to the context data. The emotion analyzer thus generates a media request if the emotional state probability exceeds a threshold (e.g., a probability threshold indicating a given emotional state). A search function retrieves personal media content from a datastore associated with at least one of the participants based on the media request and provides the personal media content to the user interface.
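As a rough illustration of the threshold logic described above, the following Python sketch gates a media request on the most probable emotional state exceeding a configurable threshold. The MediaRequest fields, the 0.8 threshold, and the example probabilities are illustrative assumptions rather than values from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MediaRequest:
    participant_id: str
    emotional_state: str          # e.g. "happy"
    context: Dict[str, str]       # e.g. {"place": "beach", "date": "2012-07"}


class EmotionAnalyzer:
    """Gates media requests on an emotional-state probability threshold."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold

    def analyze(self, participant_id: str,
                state_probabilities: Dict[str, float],
                context: Dict[str, str]) -> Optional[MediaRequest]:
        # Pick the most likely emotional state reported by the recognizer.
        state, prob = max(state_probabilities.items(), key=lambda kv: kv[1])
        if prob > self.threshold:
            return MediaRequest(participant_id, state, context)
        return None   # Below threshold: no retrieval is triggered.


# Example: a "happy" state at 0.92 probability exceeds the 0.8 threshold,
# so a media request carrying the session context is generated.
request = EmotionAnalyzer(0.8).analyze(
    "participant-1",
    {"happy": 0.92, "sad": 0.03, "bored": 0.05},
    {"place": "beach", "date": "2012-07"},
)
```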
The personal media content can be represented as a graphical user interface (GUI) icon on the user interface. Such GUI icons, for example, can be implemented as executable instructions stored in non-transitory storage media to render various shapes and sizes of interactive GUI elements that represent the retrieved personal media. The GUI thus can present a user-interactive representation of the personal media content, such as in the form of thumbnails, high-resolution images, annotated graphical images, or animated graphical representations. Examples of personal media content include images, audio files, audio/video files, text files, and so forth. A sharing function can be activated (e.g., in response to a user input) via the GUI icon to control sharing of the retrieved personal media content when the displayed personal media content is selected by the participants. For example, the sharing function can include executable program code to invoke a print option, a save option, a social media posting option, a text message option, or an e-mail option to enable the participants to share the personal media content with others. The GUI can also be selected to execute other functions, such as an editing option that allows a user to filter, add content, or alter the retrieved personal media content.
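The sharing function can be pictured as a dispatch from the selected GUI option to an automated handler. The sketch below is a minimal illustration; the handler names and the print statements standing in for real print, save, and e-mail operations are assumptions.

```python
from typing import Callable, Dict


def print_media(path: str) -> None:
    print(f"sending {path} to printer")        # placeholder for a print job


def save_media(path: str) -> None:
    print(f"saving {path} to local storage")   # placeholder for a save operation


def email_media(path: str) -> None:
    print(f"attaching {path} to a new e-mail")  # placeholder for e-mail generation


# The GUI icon dispatches the selected option to the matching handler.
SHARING_OPTIONS: Dict[str, Callable[[str], None]] = {
    "print": print_media,
    "save": save_media,
    "email": email_media,
}


def on_icon_selected(option: str, media_path: str) -> None:
    handler = SHARING_OPTIONS.get(option)
    if handler is None:
        raise ValueError(f"unknown sharing option: {option}")
    handler(media_path)


on_icon_selected("print", "memories/beach_2012.jpg")
```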
An emotion detector 124, which is implemented as executable instructions, detects an emotional state of a given participant of the communications session and generates a media request 130 based on a detected emotional state and the context data. A representation of personal media content 140 is provided to at least one of the participants based on the media request 130. For example, the representation can include an image that is displayed to one of the participants and/or can include another tangible representation, such as a printed version (e.g., a photograph) of the personal media content 140 that is retrieved, as described further below.
In an example, the non-transitory medium 100 can be executed as part of a system having a video chat with a touch screen for the user interface 120. Other peripherals can be connected to the system, such as a printer or storage device. The system can listen (e.g., via voice recognition program code executing on the device or remotely) for a key word, for example "Remember when . . . ", to identify information (e.g., derived from context and/or emotional data) about a past memory between participants. Relevant information might include location, rough time, time of day, venue information, and people who were part of the event. With this information the system can retrieve photos from a datastore (or datastores) that is associated with the user, such as a local memory of the device executing the non-transitory medium 100 or remote storage (e.g., the user's cloud storage, such as a public and/or private cloud storage system for photo storage, video storage, or another cloud system where the user stores media). The user can then choose a sharing function, such as printing out a memorable photo that has been retrieved (e.g., on a local or remote printer, such as a cloud print application). The system can respond to more than one keyword that is extracted from the audio/video data during the communications session.
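One way to picture the keyword-and-context flow described above is a trigger detector followed by a crude context extractor. The sketch below is illustrative only; the trigger phrases, regular expressions, and context categories are assumptions, and a real system would rely on speech recognition and named-entity models.

```python
import re
from typing import Dict, List, Optional

# Illustrative trigger phrases; the disclosure names "Remember when . . ." as an example.
TRIGGER_PHRASES = ["remember when", "that time we"]


def detect_trigger(transcript: str) -> Optional[str]:
    """Return the text following a trigger phrase, or None if no phrase occurs."""
    lowered = transcript.lower()
    for phrase in TRIGGER_PHRASES:
        idx = lowered.find(phrase)
        if idx != -1:
            return transcript[idx + len(phrase):].strip()
    return None


def extract_context(utterance: str) -> Dict[str, List[str]]:
    """Crude context extraction; placeholders for real speech/NER pipelines."""
    years = re.findall(r"\b(?:19|20)\d{2}\b", utterance)
    places = re.findall(r"\b(?:in|at|to)\s+([A-Z][a-z]+)", utterance)
    people = re.findall(r"\bwith\s+([A-Z][a-z]+)", utterance)
    return {"years": years, "places": places, "people": people}


tail = detect_trigger("Remember when we went to Lisbon with Maria in 2016?")
if tail is not None:
    print(extract_context(tail))
    # {'years': ['2016'], 'places': ['Lisbon'], 'people': ['Maria']}
```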
In some examples, the emotion detector 124 is configured to employ facial and/or gesture recognition as part of the emotion recognition process, such as based on video data during the communications session. As mentioned, recognized facial expressions and/or gestures can be utilized to determine an emotional state used to implement retrieval of personal media content. With access to social media information, the retrieval process can be utilized to retrieve socially relevant personal media content, such as by using facial recognition to match retrieved photos to profile photos of people mentioned during the session. In addition, if there are many retrieved photos, the user can employ touch-screen gestures on the user interface to scroll through the photos and select from a plurality of sharing options by touching graphical icons representing the retrieved information, as will be described below.
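As one possible illustration of matching retrieved photos to profile photos of mentioned people, the sketch below uses the open-source face_recognition package, which is an assumed dependency not named in the disclosure.

```python
import face_recognition


def photo_shows_person(profile_photo_path: str, candidate_photo_path: str) -> bool:
    """Return True if the face in the profile photo appears in the candidate photo."""
    profile_image = face_recognition.load_image_file(profile_photo_path)
    candidate_image = face_recognition.load_image_file(candidate_photo_path)

    profile_encodings = face_recognition.face_encodings(profile_image)
    candidate_encodings = face_recognition.face_encodings(candidate_image)
    if not profile_encodings or not candidate_encodings:
        return False   # no detectable face in one of the images

    # Compare the profile face against every face found in the candidate photo.
    matches = face_recognition.compare_faces(candidate_encodings, profile_encodings[0])
    return any(matches)
```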
The emotion detector 204 includes an emotion recognizer 250 to detect emotional state parameters based on data communicated between the participants of the communications session. For example, the emotional state parameters can include voice inflections, voice tones, silent periods, facial patterns, pupil dilation, eye movements, hand movements and/or body movements that can indicate a given emotional state of a user. An emotion analyzer 260 processes the emotional state parameters and the context data from the context analyzer 210. The emotion analyzer 260 determines a probability of the emotional state of the given participant of the communications session with respect to the context data and generates the media request 230 in response to the probability exceeding a threshold 270. In one example, audio and/or video can be analyzed by artificial intelligence (AI) instructions to assign probabilities (e.g., confidence or likelihood value) to a detected emotional state or multiple such states. In one example, the AI instructions can be implemented by a classifier such as a support vector machine (SVM).
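For the SVM example, a sketch along the following lines could assign per-class probabilities to a detected emotional state. It uses scikit-learn as an assumed dependency, and the feature vectors and labels are synthetic placeholders for features derived from the emotional state parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Toy training data: 3-dimensional feature vectors (e.g., pitch delta,
# pause length, smile score) labeled with an emotional state.
X_train = np.array([
    [0.90, 0.10, 0.80], [0.80, 0.20, 0.90], [0.85, 0.15, 0.75], [0.95, 0.05, 0.85],  # happy
    [0.10, 0.90, 0.10], [0.20, 0.80, 0.20], [0.15, 0.85, 0.15], [0.05, 0.95, 0.10],  # sad
    [0.40, 0.60, 0.30], [0.50, 0.55, 0.20], [0.45, 0.50, 0.25], [0.35, 0.65, 0.30],  # bored
])
y_train = ["happy"] * 4 + ["sad"] * 4 + ["bored"] * 4

# probability=True enables per-class probability estimates (Platt scaling).
classifier = SVC(kernel="rbf", probability=True, random_state=0)
classifier.fit(X_train, y_train)

# Classify one new observation and report the class probabilities.
sample = np.array([[0.85, 0.15, 0.70]])
probabilities = dict(zip(classifier.classes_, classifier.predict_proba(sample)[0]))
print(probabilities)   # e.g. {'bored': ..., 'happy': ..., 'sad': ...}
```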
As mentioned, the context data from the context analyzer 210 can represent a date, person, time, and/or place that is derived from audio and/or video recognition of the communications data. For example, the context data may represent a prior experience involving the participants to the communications session. The search function (e.g., instructions to search a database) 208 can execute a search of datastores 244 based on the context data to retrieve the personal media content 240. The datastores 244 may include personal storage (e.g., local and/or remote non-transitory storage media) that is associated with at least one of the participants. The search function 208 provides the personal media content 240 to the communications interface 220 based on the media request 230. The personal media content 240 can be stored on the datastore with metadata that includes, or is a semantic equivalent to, emotional metadata tags or context metadata tags determined by the emotion detector 204 and the context analyzer 210, respectively. The search function 208 can match at least one of the emotional or context metadata tags with the determined emotional state and context data to retrieve the personal media content 240.
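A minimal illustration of tag-based retrieval is sketched below: each stored item carries emotion and context metadata tags, and the search returns items whose tags match the detected emotional state or context values. The MediaItem structure and the matching rule (match on either tag type) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MediaItem:
    path: str
    emotion_tags: Set[str] = field(default_factory=set)   # e.g. {"happy"}
    context_tags: Set[str] = field(default_factory=set)   # e.g. {"beach", "2012"}


def search_media(datastore: List[MediaItem],
                 emotional_state: str,
                 context: Dict[str, str]) -> List[MediaItem]:
    """Return items whose tags match the detected emotional state or context."""
    context_values = {v.lower() for v in context.values()}
    results = []
    for item in datastore:
        emotion_match = emotional_state in item.emotion_tags
        context_match = bool(context_values & {t.lower() for t in item.context_tags})
        if emotion_match or context_match:
            results.append(item)
    return results


datastore = [
    MediaItem("photos/beach_2012.jpg", {"happy"}, {"beach", "2012"}),
    MediaItem("photos/office_2020.jpg", {"bored"}, {"office", "2020"}),
]
print([m.path for m in search_media(datastore, "happy", {"place": "beach"})])
# ['photos/beach_2012.jpg']
```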
For example, the sharing function 320 can be activated to enable sharing of the retrieved personal media content described herein in response to the GUI being selected by a user input, such as shown at 310. The sharing function can be activated by executing instructions that associate the GUI icon with a list of predetermined automated actions, such as printing actions, saving/storage actions, e-mail generation, text message generation, and so forth. The automated actions assigned to the GUI icon can be edited by a given user by entering e-mail addresses, cell phone numbers, social media pages, and so forth, where retrieved content may be further shared. By way of example, the sharing function 320 can include at least one of a print option, a save option, a social media posting option, a text message option, or an e-mail option. In another example, the sharing function 320 can include an editing option to enable participants to filter, add content, or alter the retrieved personal media content in response to user input.
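The editable action list can be illustrated as a small registry bound to the GUI icon, to which a user adds destinations such as e-mail addresses or phone numbers. The sketch below is illustrative; the registry class and the print statements standing in for actual delivery are assumptions.

```python
from typing import Callable, List

# Each automated action receives the path of the retrieved media item.
Action = Callable[[str], None]


class IconActionRegistry:
    """Maps a GUI icon to an editable list of automated sharing actions."""

    def __init__(self) -> None:
        self._actions: List[Action] = []

    def add_email_action(self, address: str) -> None:
        # The address is supplied by the user when editing the icon's actions.
        self._actions.append(lambda path: print(f"e-mailing {path} to {address}"))

    def add_text_message_action(self, phone_number: str) -> None:
        self._actions.append(lambda path: print(f"texting {path} to {phone_number}"))

    def run_all(self, media_path: str) -> None:
        # Invoked when the user selects the icon's automated-action option.
        for action in self._actions:
            action(media_path)


registry = IconActionRegistry()
registry.add_email_action("friend@example.com")
registry.add_text_message_action("+1-555-0100")
registry.run_all("memories/beach_2012.jpg")
```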
In another example, the sharing function includes or activates executable instructions to implement an editing option that allows users to filter, add content, or alter the retrieved personal media content in response to user input. Although not shown, instructions can be provided in the non-transitory memory 410 for an emotion detector to detect an emotional state of a given participant of the communications session and to generate a media request that is used by the search function 450 to retrieve the personal media content based on the detected emotional state and the context data.
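As an illustration of the editing option, the sketch below applies simple filter and brightness edits to a retrieved image using Pillow, an assumed dependency; the option names and parameter values are placeholders.

```python
from PIL import Image, ImageEnhance, ImageFilter


def apply_edit(path: str, option: str, output_path: str) -> None:
    """Apply a simple edit to a retrieved image and save the result."""
    image = Image.open(path)
    if option == "filter":
        edited = image.filter(ImageFilter.GaussianBlur(radius=2))   # soften the image
    elif option == "brighten":
        edited = ImageEnhance.Brightness(image).enhance(1.3)        # alter brightness
    elif option == "annotate":
        # "Add content" could mean overlaying text or graphics; left as a
        # placeholder copy here.
        edited = image.copy()
    else:
        raise ValueError(f"unknown editing option: {option}")
    edited.save(output_path)
```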
In view of the structural and functional features described above, an example method will be better appreciated with reference to the foregoing examples.
What has been described above are examples. One of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, this disclosure is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite “a,” “an,” “a first,” or “another” element, or the equivalent thereof, it should be interpreted to include one such element and neither requiring nor excluding two or more such elements. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.