This patent document contains material subject to copyright protection. The copyright owner has no objection to the reproduction of this patent document or any related materials in the files of the United States Patent and Trademark Office, but otherwise reserves all copyrights whatsoever.
This application is related to U.S. Provisional Patent Application No. 61/860,222, titled “Unified and Consistent Multimodal Communication Framework,” filed Jul. 30, 2013, the entire contents of which are fully incorporated herein by reference for all purposes. A copy of Application No. 61/860,222 is included herewith as Appendix A hereto and is considered part of this application.
This invention relates to a framework for communication, and, more particularly, to a unified framework for multi-modal consistent communication supporting voice conversations.
Other objects, features, and characteristics of the present invention as well as the methods of operation and functions of the related elements of structure, and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification. None of the drawings are to scale unless specifically stated otherwise.
FIGS. 4 and 5A-5B show aspects of exemplary data structures in accordance with embodiments hereof;
As used herein, unless used otherwise, the following terms or abbreviations have the following meanings:
API means application programming interface;
CA means Certificate Authority;
CRL means certificate revocation list;
GUI means graphical user interface;
HTTP means Hyper Text Transfer Protocol;
HTTPS means HTTP Secure;
IP means Internet Protocol;
IPv4 means Internet Protocol Version 4;
IPv6 means Internet Protocol Version 6;
IP address means an address used in the Internet Protocol, including both IPv4 and IPv6, to identify electronic devices such as servers and the like;
JSON means JavaScript Object Notation;
MIME means Multipurpose Internet Mail Extensions;
OCSP refers to the Online Certificate Status Protocol;
PKI means Public-Key Infrastructure;
POTS means plain old telephone service;
TCP means Transmission Control Protocol;
UI means user interface;
URI means Uniform Resource Identifier;
URL means Uniform Resource Locator;
VKB means virtual keyboard; and
VOIP means Voice over IP.
Computers and computing devices, including so-called smartphones, are ubiquitous, and much of today's communication takes place via such devices. In many parts of the world, computer-based inter-party communication has superseded POTS systems. Much of today's computer-based communication is built on older protocols that were designed to carry simple messages between devices on homogeneous networks.
The inventors realized that existing communications systems do not support a consistent model of voice and notification within conversations over multiple heterogeneous devices.
It is desirable to provide a system that can maintain or provide a consistent model of voice and notification within conversations over multiple heterogeneous devices.
Overview—Structure
Those of ordinary skill in the art will realize and appreciate, upon reading this description, that a particular user/device association may change over time, and further, that a particular device may be associated with multiple users (for example, multiple users may share a computer).
It should be appreciated that a user 102 may not correspond to a person or human, and that a user 102 may be any entity (e.g., a person, a corporation, a school, etc.).
Users 102 may use their associated device(s) 104 to communicate with each other within the framework 100. As will be explained in greater detail below, a user's device(s) may communicate with one or more other users' device(s) via network 106 and a backend 108, using one or more backend applications 112. The backend 108 (using, e.g., backend application(s) 112) maintains a record/history of communications between users in one or more databases 110, and essentially acts as a persistent store through which users 102 share data.
The backend database(s) 110 may comprise multiple separate or integrated databases, at least some of which may be distributed. The database(s) 110 may be implemented in any manner, and, when made up of more than one database, the various databases need not all be implemented in the same manner. It should be appreciated that the system is not limited by the nature or location of database(s) 110 or by the manner in which they are implemented.
It should be appreciated that multiple devices 104 associated with the same user 102 may be communicating via the backend 108 at the same time (for example, as shown in the drawing, some or all of the devices 104-A-1, 104-A-2 . . . 104-A-n associated with user 102-A may be communicating via the backend 108 at the same time).
Devices
The devices 104 can be any kind of computing device, including mobile devices (e.g., phones, tablets, etc.), computers (e.g., desktops, laptops, etc.), and the like. Computing devices are described in greater detail below.
Each device preferably includes at least one display and at least some input mechanism. The display and input mechanism may be separate (as in the case, e.g., of a desktop computer and detached keyboard and mouse), or integrated (as in the case, e.g., of a tablet device such as an iPad or the like). The term “mouse” is used here to refer to any component or mechanism that may be used to position a cursor on a display and, optionally, to interact with the computer. A mouse may include a touchpad that supports various gestures. A mouse may be integrated into or separate from the other parts of the device. A device may have multiple displays and multiple input devices.
As used herein, the term “mechanism” refers to any device(s), process(es), service(s), or combination thereof. A mechanism may be implemented in hardware, software, firmware, using a special-purpose device, or any combination thereof. A mechanism may be integrated into a single device or it may be distributed over multiple devices. The various components of a mechanism may be co-located or distributed. The mechanism may be formed from other mechanisms. In general, as used herein, the term “mechanism” may thus be considered to be shorthand for the term device(s) and/or process(es) and/or service(s).
Device 104-B (
As another example, a device may be integrated into a television or a set-top box or the like. Thus, e.g., with reference again to
Device 104-C (
These exemplary devices are shown here to aid in this description, and are not intended to limit the scope of the system in any way. Other devices may be used and are contemplated herein.
Backend
A conversation stored and maintained in the backend is considered to be the “true” or authoritative version of that conversation within the system. To the extent that any other version of that conversation exists within the system (e.g., on any device), the version stored and maintained in the backend controls: if there are any discrepancies between the conversation version in the backend and any other version of that conversation in the system, the backend's version is authoritative and is considered to be the “true” version of that conversation.
It should be appreciated that the categorization of data in the backend database(s) is made for the purposes of this description, and those of ordinary skill in the art will realize and appreciate, upon reading this description, that different and/or other categorizations and/or organizations of the data may be used. It should also be appreciated that the backend database(s) 110 preferably include appropriate index data to facilitate fast and efficient access to and update of the various data stored therein.
Each user 102 within the system 100 is preferably identified by an internal user identifier (user ID) that is unique within the system 100. This user ID may be assigned to the user in a registration process. While in a presently preferred implementation there is no explicit signup of users, in an exemplary registration process users may register via one or more external services (114,
With reference again to
It should be appreciated that system is not limited by the manner in which a user ID is assigned to a user. It should also be appreciated that the system is not limited by whether or how registration takes place. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that although registration is described herein via external services, different and/or other registration techniques may be used.
User information such as the user ID and a user's foreign ID(s) may be stored in the user data 140 in the backend database(s) 110.
Device Identifiers (IDs)
Each client/device 104 using or associated with the system 100 needs a device identifier (device ID). Each device is associated with at least one user, and the device identifier is unique within the system 100 for a single user. Thus, within the system 100, there may be identical device identifiers, but each <user ID, device ID> pair will be unique.
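By way of illustration only, the following sketch (in Python, using hypothetical names such as device_registry and register_device that are not part of any particular implementation) shows one possible way of enforcing uniqueness of the <user ID, device ID> pair while allowing identical device identifiers for different users; it is not intended to limit the scope of the system.

    # Hypothetical sketch: device identifiers need only be unique per user,
    # so the registry is keyed by the (user_id, device_id) pair.
    device_registry = {}

    def register_device(user_id, device_id, device_info):
        key = (user_id, device_id)
        if key in device_registry:
            raise ValueError("device ID already registered for this user")
        device_registry[key] = device_info
        return key

    # Two different users may reuse the same device identifier:
    register_device("user-A", "device-1", {"type": "phone"})
    register_device("user-B", "device-1", {"type": "tablet"})   # allowed
    # register_device("user-A", "device-1", ...) would raise an error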
Authentication
A client/device 104 using the system 100 preferably needs to authenticate itself with the system. In presently preferred implementations, there are two options for authentication: client certificates and access tokens. Client certificates are the preferred approach, and the access token approach is preferably only used to facilitate clients that cannot use client certificates (e.g., web applications). An exemplary client/device 104 may have one or more client certificates and/or access tokens associated therewith. Typically a client/device 104 will have either certificate(s) or tokens, depending on the type of connection that client/device will make with the backend 108.
The system 100 may use or provide one or more certificate authorities (CAs) 119 (
Connecting a Device to a User
As noted, a device 104 needs to be associated with a user 102 in order to operate within the system 100. When connecting a device 104 to a user 102, the client application 114 has to obtain and authenticate a foreign ID from the user 102. Next the client application 114 has to use the foreign ID and authentication data it obtained from the corresponding foreign service and request either a certificate or an access token from the backend 108.
Acquiring the Foreign ID
As some of these foreign services may be embedded into mobile devices, in presently preferred implementations it is the device's responsibility to perform authentication with the foreign service and to acquire the necessary data to allow the system to confirm successful authentication with the service.
Within an OAuth-based authorization service, authentication typically provides an access token. When proceeding with connecting the device, the application has to provide information about the foreign service's authentication data as parameters of the request.
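By way of example only, the following sketch (in Python, using the requests library, a hypothetical backend endpoint URL, and assumed field names) illustrates how a client might pass the foreign service's OAuth authentication data to the backend when requesting a credential; the endpoint and field names are assumptions for illustration and do not reflect any particular implementation.

    import requests

    # Hypothetical request: the client forwards the foreign (OAuth) access token
    # so the backend can confirm successful authentication with the foreign service.
    connect_request = {
        "foreign_service": "example-oauth-provider",     # assumed field name
        "foreign_id": "user@example.com",                 # assumed field name
        "foreign_access_token": "oauth-token-from-provider",
        "credential_type": "access_token",                # or "certificate"
    }

    response = requests.post("https://backend.example.com/v1/connect",  # assumed URL
                             json=connect_request, timeout=30)
    response.raise_for_status()
    backend_credentials = response.json()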
Acquiring a Client Certificate
In order to acquire a client certificate, a client/device 104 requests a certificate from the backend 108. The request is preferably in the form of a certificate signing request that is made of the backend 108. Preferably information about the foreign authentication is included in the request. In some cases the certificate request includes information about the device.
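The following is a minimal sketch of generating such a certificate signing request, assuming the pyca/cryptography library is available and assuming (purely for illustration) that the subject name encodes the user and device identifiers; the exact fields a given implementation expects, including any embedded foreign-authentication or device information, may differ.

    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Generate a key pair for this client/device (the private key never leaves the device).
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    # Build a certificate signing request whose subject identifies the user/device pair.
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COMMON_NAME, u"user-A/device-1"),  # assumed naming convention
        ]))
        .sign(private_key, hashes.SHA256())
    )

    # PEM-encoded CSR to be sent to the backend, e.g., together with the
    # foreign-authentication information described above.
    csr_pem = csr.public_bytes(serialization.Encoding.PEM)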
Managing the Requesting Device and Its User
With reference again to
The optional Agent Information may include the following:
An authenticated device 104 may ascertain information about itself, its user, and operations directly related to the two from the backend 108. In a presently preferred implementation a device 104 may obtain some or all of the following information: user ID and device ID.
In addition the device may also obtain attributes of the user's profile. In a presently preferred implementation, the device requests the information using an HTTPS GET request and the information is returned to the device in an object (e.g., a JSON response).
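By way of illustration, the following sketch (in Python, using the requests library with a client certificate, an assumed endpoint URL, and assumed response field names) shows how a device might retrieve its own identifiers and profile attributes over HTTPS; it is a sketch only, not a definitive API.

    import requests

    # Hypothetical HTTPS GET authenticated with the device's client certificate.
    response = requests.get(
        "https://backend.example.com/v1/self",          # assumed endpoint
        cert=("client-cert.pem", "client-key.pem"),     # certificate obtained as described above
        timeout=30,
    )
    response.raise_for_status()

    info = response.json()            # e.g., {"user_id": "...", "device_id": "...", ...}
    user_id = info["user_id"]         # assumed field names
    device_id = info["device_id"]
    profile = info.get("profile", {})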
Conversations
Recall from above that a “conversation” is the term used herein to refer to an ongoing interaction between a set of one or more users. In some aspects, a conversation may be considered to be a time-ordered sequence of events and associated event information or messages. The first event occurs when the conversation is started, and subsequent events are added to the conversation. The time of an event in a conversation is the time at which the event occurred on the backend.
The Event Information Associated with an Event—Contents of Conversations
Events in a conversation may be represented as or considered to be objects, and thus a conversation may be considered to be a time-ordered sequence of objects. An object (and therefore a conversation) may include or represent text, images, video, audio, files, and other assets. As used herein, an asset refers to anything in a conversation, e.g., images, videos, audio, links (e.g., URIs) and other objects of interest related to a conversation.
As will be understood and appreciated by those of ordinary skill in the art, upon reading this description, in some aspects, a conversation may be considered to be a timeline with associated objects. Each object in the conversation may represent or correspond to an event. Thus, in some aspects, an event may be considered to be a <time, object> pair, and a conversation is a collection of such events for a particular user set.
It should be appreciated that the time interval between any two objects may differ. The time intervals between events (including adjacent events) in a conversation may be, e.g., fractions of a second, hours, days, weeks, months, years, etc.
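As a non-limiting sketch, the following Python fragment models a conversation as a time-ordered sequence of <time, object> events, with the event time taken to be the time at which the event occurred on the backend; all names are hypothetical and are used only to illustrate the structure described above.

    from dataclasses import dataclass, field
    from typing import Any, List

    @dataclass(order=True)
    class Event:
        backend_time: float                 # time the event occurred on the backend
        obj: Any = field(compare=False)     # the associated object (text, asset reference, metadata, ...)

    @dataclass
    class Conversation:
        conversation_id: str
        events: List[Event] = field(default_factory=list)

        def add_event(self, event: Event) -> None:
            # Events are kept in backend-time order; intervals between adjacent
            # events may be fractions of a second or years.
            self.events.append(event)
            self.events.sort()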
An object may contain the actual data of the conversation (e.g., a text message) associated with the corresponding event, or it may contain a link or reference to the actual data or a way in which the actual data may be obtained. For the sake of this discussion, a conversation object that contains the actual conversation data is referred to as a direct object, and a conversation object that contains a link or reference to the data (or some other way to get the data) for the conversation is referred to as an indirect or reference object. A direct object contains, within the object, the information needed to render that portion of the conversation, whereas an indirect object requires additional access to obtain the information needed to render the corresponding portion of the conversation. Thus, using this terminology, an object may be a direct object or an indirect object.
As used herein, the term “render” (or “rendering”) with respect to data refers to presenting those data in some manner, preferably appropriate for the data. For example, a device may render text data (data representing text) as text on a screen of the device, whereas the device may render image data (data representing an image) as an image on a screen of the device, and the device may render audio data (data representing an audio signal) as sound played through a speaker of the device (or through a speaker or driver somehow connected to the device), and a device may render video data (data representing video content) as video images on a screen of the device (or somehow connected to the device). The list of examples is not intended to limit the types of data that devices in the system can render, and the system is not limited by the manner in which content is rendered.
It should be appreciated that a particular implementation may use only direct objects, only indirect objects, or a combination thereof. It should also be appreciated that any particular conversation may comprise direct objects, indirect objects, or any combination thereof. The determination of which conversation data are treated as direct objects and which as indirect objects may be made, e.g., based on the size or kind of the data and on other factors affecting efficiency of transmission, storage, and/or access. For example, certain types of data may be treated as indirect objects because they are typically large (e.g., video or images) and/or because they require special rendering or delivery techniques (e.g., streaming).
As used herein, the term “message” refers to an object or its (direct or indirect) contents. Thus, for a direct object that includes text, the message is the text in that direct object, whereas for an indirect object that refers to an asset, the message is the asset referred to by the indirect object.
A presently preferred implementation uses a combination of direct and indirect objects, where the direct objects are used for text messages and the indirect objects are used for all other assets. In some cases, text messages may be indirect objects, depending on their size (that is, an asset may also include or comprise a text message). It should be appreciated that even though an asset may be referenced via an indirect object, that asset is considered to be contained in a conversation and may be rendered (e.g., displayed) as part of (or apart from) a conversation.
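Purely by way of example, a direct object and an indirect (reference) object might be represented as follows (JSON-like Python dictionaries with assumed field names); a direct object carries the message itself, whereas an indirect object carries asset metadata and a reference from which the asset may be obtained.

    # Hypothetical direct object: the text message is contained in the object itself.
    direct_object = {
        "conversation_id": "#AB",
        "type": "text",
        "content": "See you at 5?",
    }

    # Hypothetical indirect (reference) object: the object refers to an asset
    # (here an image) that must be fetched separately before it can be rendered.
    indirect_object = {
        "conversation_id": "#AB",
        "type": "asset",
        "asset_metadata": {
            "content_type": "image/jpeg",
            "size_bytes": 2048576,
            "location": "https://assets.example.com/abc123",  # assumed asset store URL
        },
    }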
Each conversation has a unique conversation identifier (ID) that is preferably assigned by the backend 108 when a conversation between a set of users begins.
A message can only belong to one conversation and should not move to another conversation. Note, however, that in some cases a conversation may become part of another (e.g., new) conversation (e.g., as its parent), in which case the messages in the conversation may be considered to be part of the new conversation.
A direct object preferably includes the actual contents of the conversation (e.g., the message). A direct object may also include some or all of the following attributes:
An indirect object may have some of the same attributes as a direct object, except that instead of the actual contents of the conversation, an indirect object includes asset metadata.
Assets typically originate outside of the system 100. Preferably each asset is obtained and stored by the system 100 before (or while) being provided to users in a conversation. In this manner a conversation will be reproducible at future times, without reliance on the persistence of external content. Assets are maintained and stored by the backend 108 in assets 144 in the database(s) 110.
The asset metadata may include some or all of the following:
It will be appreciated, as discussed below, that each device should be able to render each asset in some manner. The device may use the content type to determine how the asset should be rendered. In some cases, a device may be able to determine from the location information how to render the asset (e.g., the location information may have type information or other information encoded therein), or the device may determine the type of the asset after obtaining some of or the entire asset.
The conversation data 142 in the database(s) 110 on the backend 108 contains conversation metadata 402 and participant(s) list(s) and may include some or all of the following attributes (with reference to
Each item on the users attribute 416 is an object preferably containing some or all of the following attributes:
It should be appreciated that various techniques may be used to point to the location of an asset. For example, the location data may be a URL or a URI. Those of ordinary skill in the art will appreciate and understand, upon reading this description, that different and/or other ways of pointing to or referring to assets may be used, and that the system is not limited by the manner or location(s) in which asset locations are specified. It should further be appreciated that a particular system may use more than one technique to refer to assets. The different techniques may depend, e.g., on the type or size of an asset.
It should be appreciated that the assets in a conversation (i.e., the assets referenced by indirect objects in the conversation) may be of different types (e.g., audio, pictures, video, files, etc.), and that the assets may not all be of the same size, or stored in the same place or in the same way. Thus, for example, in an exemplary conversation one asset may be a photograph, another asset may be a video, and yet another asset may be a PDF file, and so on.
Those of ordinary skill in the art will appreciate and understand, upon reading this description, that the phrase “an asset may be a video” means that “the asset may refer to content that represents a video.” In general, referring to an asset as being of some type (e.g., audio, picture, video, file, text, etc.) means that the asset comprises data representing something of that type. So that, e.g., if an asset is an audio asset, this means that the asset comprises data representing audio content, if an asset is an image, this means that the asset comprises data representing image content, and so on. It should be appreciated that all assets are essentially data, possibly with associated metadata, and that the type of an asset will affect what the data in an asset represent and, possibly, how those data are to be rendered. For example, when an asset is a video, the asset comprises data (and possibly metadata) representing video content in some form. When appropriately rendered, the video asset data will comprise a video (video images, possibly including audio).
The system may store multiple copies of the same asset.
As used herein, a user participating in a conversation is said to be conversing or engaging in that conversation. The term “converse” or “conversing” includes, without any limitation, adding any kind of content or object to the conversation. It should be appreciated that the terms “converse” and “conversing” include active and passive participation (e.g., viewing or reading or listening to a conversation). It should further be appreciated that the system is not limited by the type of objects in a conversation or by the manner in which such objects are included in or rendered within a conversation.
It should be appreciated that a conversation may also include conversation metadata (e.g., data about when events occur within the conversation). Conversation metadata may be treated as conversation objects, with their rendering (if at all) being dependent, e.g., on system policies. For example, when a user Jane leaves a conversation, that event may be a conversation object with the text “Jane has left the conversation.” In this example, a device may render the object as text on a display. Some other metadata (e.g., system status information) may be stored as part of the conversation but not rendered by any device.
Identifying Conversations
Within the system 100, a conversation may have one or more participants and is uniquely identified (e.g., using a conversation ID 404 in
Note that the set of users in a conversation is preferably unordered, so that a conversation between users A and B may be considered to be the same as the conversation between users B and A.
Users can be added to and/or removed from a conversation, and the manner of doing so is described in greater detail below.
As noted above, when a conversation between a set of users begins, that conversation is given a unique conversation identifier (conversation ID, e.g., 404 in
Users may be added to (or removed from) a conversation. A change in the membership of a conversation may cause a new conversation ID to be generated, based on the new membership. In order to facilitate access to the entire interaction/communication history, for each conversation the system may also maintain information about the conversation(s) from which a current conversation originated (e.g., a parent conversation). For example, a first conversation that starts at time T0 between user A and user B may be given conversation ID “#AB”. At a later time T1>T0, user C is added to the conversation between user A and user B. The new conversation may be given conversation ID “#ABC”, and its parent ID is set to refer to conversation “#AB”. In this example the conversation metadata 402 for the conversation with users A and B would have conversation ID 404 set to “#AB” and parent 406 unset (or null). The user(s) 416 would be set to “{A, B}”. For user A, the conversation name 420 may be set to “B”, whereas for user B, the conversation name 420 may be set to “A”. When user C is added to the conversation (at time T1), the conversation ID for the new conversation may be set to “#ABC”, the parent 406 for the new conversation may be set to “#AB” (previously it was null or unset), and the users 416 may be set to “{A, B, C}”. The conversation name 420 for conversation “#ABC” may be set to “B and C” for user A, to “A and C” for user B, and to “A and B” for user C.
It should be appreciated that these settings are provided only by way of example, and are not intended to limit the scope of the system in any way.
In some embodiments there may only be (or the system may only allow) one conversation for any set of users. That is, in some embodiments, a set of users defines (and may only define) one and only one conversation. For example, in such embodiments, there can only be one conversation between users A and B. Some other embodiments may allow for multiple conversations with the same set of participants. In such embodiments a set of users may define more than one conversation, so, e.g., in such embodiments there may be multiple conversations between users A and B. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that when the system allows multiple conversations between identical participants, the internal naming of conversations (the conversation ID) will be a function of more than just the user IDs of its participants. An exemplary function to generate a unique conversation ID may use time of day as an input along with the user IDs.
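One possible (purely illustrative) conversation-ID function is sketched below in Python; it derives an identifier from the unordered set of participant user IDs, optionally mixing in the creation time so that multiple conversations among the same participants receive distinct IDs. The hashing scheme and the “#” prefix are assumptions made here for illustration, not a required implementation.

    import hashlib
    import time
    from typing import Iterable, Optional

    def make_conversation_id(user_ids: Iterable[str],
                             created_at: Optional[float] = None) -> str:
        # Sort so the ID does not depend on participant order ({A, B} == {B, A}).
        basis = ",".join(sorted(user_ids))
        if created_at is not None:
            # Mix in the creation time when multiple conversations are allowed
            # for the same set of participants.
            basis += "|" + repr(created_at)
        digest = hashlib.sha256(basis.encode("utf-8")).hexdigest()
        return "#" + digest[:16]

    # One-conversation-per-user-set policy:
    conv_ab = make_conversation_id(["A", "B"])
    # Multiple conversations allowed for the same participants:
    conv_ab_2 = make_conversation_id(["A", "B"], created_at=time.time())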
Techniques and policies for merging and splitting conversations between and among users and for adding and removing users from conversations are described in greater detail below.
As described above, the backend 108 (backend application(s) 112) maintains conversations (as identified by the user IDs of the users having the conversation—i.e., the participants) in a conversation database 142 in database(s) 110 (
As shown in
The user's non-public or internal system information 506 may include conversation information 516, the user ID 146 and foreign IDs 148, information about the user's device(s) 518 (e.g., a list of all devices associated with the user, perhaps by device ID), and other information 519.
It should be appreciated that the categorization of user information described here is given by way of example, and that different and/or other categorizations of the user information may be made. It should also be appreciated that the system need not impose a strict boundary between public/visible and non-visible information for users. It should further be appreciated that the term “private” does not imply any legal notion of privacy and is used here to describe which information may or may not be visible to other users in the system.
The system 100 may allow the user to set and possibly change some or all of the user's public information.
With reference again to
The information described above with reference to
With reference to
The information described above with reference to
Each user 102 in a conversation may interact with that conversation using any device 104 associated with that user (as identified by the user ID), and changes to a conversation are preferably reflected on each present device 104 associated with each user in the conversation, with the version of the conversation in the conversation database 142 being considered to be the true version of the conversation.
As used herein, a device is “present” if it is online and accessing (or has access to and/or is accessible from) the backend 108. Whether a particular device is present or not present in the system is sometimes referred to as the device's “presence” in the system. A device's presence may be determined in a number of ways. For example, in some cases the system may maintain persistent connections with some devices, and, in those cases, the existence of a persistent connection with a device may be used to determine that the device is present. Those of ordinary skill in the art will appreciate and understand that different and/or other techniques may be used to determine a device's presence, and that the system is not limited by the manner in which presence is determined.
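A minimal sketch of presence tracking based on persistent connections follows (Python, with hypothetical names such as PresenceTracker); a device is treated as present while a persistent connection for its <user ID, device ID> pair is open. Other presence-determination techniques may of course be used.

    from typing import Set, Tuple

    DeviceKey = Tuple[str, str]   # (user_id, device_id)

    class PresenceTracker:
        def __init__(self) -> None:
            self._present: Set[DeviceKey] = set()

        def connection_opened(self, user_id: str, device_id: str) -> None:
            self._present.add((user_id, device_id))

        def connection_closed(self, user_id: str, device_id: str) -> None:
            self._present.discard((user_id, device_id))

        def is_present(self, user_id: str, device_id: str) -> bool:
            return (user_id, device_id) in self._present

        def present_devices_of(self, user_id: str) -> Set[DeviceKey]:
            return {key for key in self._present if key[0] == user_id}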
By way of example, user A may have n devices (device A-1, device A-2, . . . device A-n) associated therewith; user B may have m devices (device B-1, device B-2, . . . device B-m) associated therewith; and user C may have devices 104-C associated therewith. At any time, some or all of user A's devices may be online (or offline). Similarly for user B's devices and user C's devices. User A's devices that are offline (whether turned on or off) are considered to be not present, whereas user A's devices that are online may be considered to be present.
When users A and user B converse, e.g., in a conversation #AB, the messages of that conversation and any other changes to that conversation are preferably reflected in each present device associated with users A and B. The backend 108 maintains conversation #AB in the database(s) 110, and each present device of each participant in that conversation (users A and B) preferably has a view of the conversation #AB consistent with the version in the backend database(s) 110.
If another user C joins the conversation (e.g., as conversation #ABC), the messages of that conversation and any other changes to that conversation are preferably reflected in each present device associated with users A, B, and C. The backend 108 maintains conversation #ABC in the database(s) 110, and each present device of each participant in that conversation (users A, B, and C) preferably has a view of the conversation #ABC consistent with the version in the backend database(s) 110.
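The following non-limiting sketch (Python, building on the hypothetical Conversation and PresenceTracker sketches above) illustrates the fan-out just described: when an event is added to a conversation, it is delivered upstream to every present device of every participant, with the backend copy remaining the authoritative version.

    def deliver_event(conversation, event, participants, presence, send_upstream):
        """Append an event to the backend copy and push it to all present devices.

        `participants` is the set of user IDs in the conversation;
        `presence` is a PresenceTracker; `send_upstream(device_key, event)`
        stands in for whatever upstream routing mechanism an implementation provides.
        """
        conversation.add_event(event)                 # backend copy is the "true" version
        for user_id in participants:
            for device_key in presence.present_devices_of(user_id):
                send_upstream(device_key, event)      # each present device sees the change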
Users may interact with and view conversations on their devices 104 via user interfaces (UIs) 128 on the respective devices. A user interface 128 provides a view or rendering of an aspect or portion of a conversation (possibly the entire conversation). A view of a conversation within a time period is a view of the objects in that conversation in that time period.
While each present device of each user in a conversation should have the same conversation data, it should be appreciated that the UI of each device 104 of each user associated with a particular conversation may be viewing a different portion of the conversation.
Note that the views of a conversation may cover non-overlapping time periods (as in the example just described), and that the time periods need not be of the same duration or contain the same number of events. That is, each device that has a view of a conversation may have a different view, starting at a different and arbitrary time within the conversation and having an arbitrary duration. However, when two or more devices do have overlapping views of a conversation, the overlapping portions of those views are preferably the same (or contain the same information, however rendered). That is, when two or more devices have views of the same time period of a particular conversation, those views are preferably of the same events/messages, however rendered.
As noted above, while it should be appreciated that the two devices should have a consistent view of any overlapping data, those of ordinary skill in the art will realize and appreciate, upon reading this description, that actual rendering of that view on different devices may differ. In other words, while the data (objects) corresponding to a view may be identical for two devices, each of those devices may render or present those data differently to the user. For example, the format of the data (e.g., video data or audio data or text data or image data) presented may differ for different devices, even though they are rendering the same data.
Those of ordinary skill in the art will realize and appreciate, upon reading this description, that fully overlapping views of the same conversation data will be the same (although they may be presented differently, e.g., due to device differences, rendering differences, etc.). Furthermore, it should be appreciated that overlapping views of different participant users in the same conversation will preferably also be the same.
As conversations are consistent on all devices, the overlapping views show the same information (regardless of how formatted or rendered). Those of ordinary skill in the art will realize and appreciate, upon reading this description, that the same information may be presented in different manners on different devices, especially on devices of different types. For example, a region or time period which is common to all three views may be presented or rendered differently on a device that is a mobile phone from a device that is running a web-based application using a laptop computer.
As described above, a conversation may be considered to be or include a sequence of events. Some events may involve sending data from one participant to another (or others) via the backend, and some of the events may involve receiving data from another participant (again, via the backend). A conversation may comprise different types of data, e.g., text/message data, metadata, voice data, control data, etc. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that different and/or other categorizations of conversation data may be used. For example, a particular implementation of the system may split metadata into two distinct metadata channels, one for system data and the other for participant data (e.g., “knock, knock” type of data).
A conversation may thus be considered to comprise or be a collection of multiple logical channels, over which data may be sent and received by the conversation's participant(s), with different types of content (data) requiring or using different logical channels.
As shown in
Each type of data may use its corresponding channel. For example, text data within a conversation may use the text channel 602, asset data within that same conversation may use the asset channel 605, metadata within that same conversation may use the metadata channel 606, voice data within that conversation may use the voice channel 608, and control data within that conversation may use the control channel 610.
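As an illustrative (non-limiting) sketch, the mapping of content types onto logical channels might be expressed as follows in Python; the channel names mirror the channels described above, while the function name and content-type strings are hypothetical.

    from enum import Enum

    class Channel(Enum):
        TEXT = "text"
        ASSET = "asset"
        METADATA = "metadata"
        VOICE = "voice"
        CONTROL = "control"

    def channel_for(content_type: str) -> Channel:
        # Each kind of conversation data travels over its corresponding logical channel.
        mapping = {
            "text": Channel.TEXT,
            "image": Channel.ASSET,
            "video": Channel.ASSET,
            "file": Channel.ASSET,
            "metadata": Channel.METADATA,
            "voice": Channel.VOICE,
            "control": Channel.CONTROL,
        }
        return mapping[content_type]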
Each logical channel in a conversation may have its own semantics, policies, and rules, associated therewith, as implemented within the framework 100. In that manner, each type of content may be processed/treated in an appropriate manner for that type of content. For example, in presently preferred implementations, text data (e.g., text messages) are received by intended recipients without any affirmative acts by the intended recipients. On the other hand, as will be explained in greater detail below, voice data (also referred to herein as “voice”) can be treated differently from other types of content (e.g., in some preferred implementations a recipient may have to affirmatively accept voice data). It should be appreciated that requiring an affirmative acceptance of voice data is a policy/implementation feature for the voice channel, and is not a limitation of the system.
The term “voice data,” as used herein, refers generally and without limitation to any kind of data that may be rendered as audio or sound data. Thus, voice data or “voice” may include speech (including synthesized speech) and music.
The various channels may use different types of processing, e.g., in the backend, depending on the way in which the corresponding types of data are expected to be processed. Thus, although voice may be treated/considered as just another kind of data being sent/received (via the backend), voice differs from other data types/content in that (a) voice occurs in “real time” (discussed below); and (b) the recipient preferably has to affirmatively accept the voice data (although, as noted, that is just an implementation/policy decision).
For other purposes, “voice” may be treated as data (content) that is sent/received in a conversation, albeit in its own logical channel.
Thus, in the drawing in
In the drawings (
As noted earlier, the drawings are not to scale (unless specifically stated otherwise), and so it should be appreciated that the time period (T1 to T2) being depicted in the example conversation 600 in
As noted above, in preferred embodiments each participant may choose whether or not to accept voice data from other participants. A participant may preferably choose whether or not to receive voice data at any time during a conversation, and may switch between receiving voice data and not receiving voice data at any time during a conversation. In addition, in some embodiments a particular participant may choose to receive voice data from some other participants and not others during the same time period. In preferred embodiments, a participant may choose to receive voice data on all present devices, on some devices and not others, or on no devices. In some embodiments a user may choose to receive data from some participants on some devices and from other participants on other devices. The selection (acceptance or rejection) of voice data is preferably controlled by a user interface on the user's devices, although in some implementations users may be able to set default control values for voice data. An exemplary UI is described below. Such default control values may be based, e.g., on time of day, which devices are present, the identity of the voice originator, etc. Thus, e.g., a user (participant) may set a default to no voice (without specific acceptance) from 10 PM to 7 AM on all devices; or to no voice (without specific acceptance) unless the voice is from participant Px. As another example, a user (participant) may set a default to no voice without specific acceptance at all times on all devices except device D which, if present, may accept all voice from all participants. This default effectively sets device D to a voice-accepting device whenever device D is present. The default is preferably no voice without specific acceptance at all times and for all participants and on all devices. It should be appreciated that these examples are not limiting of the kinds of control that the system may provide for voice data.
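The default-control behavior described above might be sketched as follows (Python, with hypothetical rule fields and names); the policy shown encodes the preferred default of “no voice without specific acceptance,” with optional exceptions based on time of day, originating participant, and device. It is an illustration only, and the precedence among rules is a design choice.

    from datetime import time as clock_time

    def accept_voice_by_default(now, device_id, originator, rules):
        """Return True if voice from `originator` should be accepted on `device_id`
        without an explicit acceptance. `now` is a datetime.time; `rules` is a dict
        of the user's (hypothetical) default control values."""
        # Quiet hours: e.g., no voice between 10 PM and 7 AM on any device.
        quiet_start, quiet_end = rules.get("quiet_hours", (clock_time(22, 0), clock_time(7, 0)))
        if now >= quiet_start or now < quiet_end:     # interval wraps midnight
            return False
        # Always accept from specifically allowed participants (e.g., participant Px).
        if originator in rules.get("always_accept_from", set()):
            return True
        # A designated device (e.g., device D) may accept all voice whenever present.
        if device_id in rules.get("voice_accepting_devices", set()):
            return True
        # Preferred system default: no voice without specific acceptance.
        return False

    # Example rules: accept voice on device "D" from anyone, and from participant "Px" anywhere.
    rules = {"always_accept_from": {"Px"}, "voice_accepting_devices": {"D"}}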
A user may be able to accept voice data at any time after that voice data has begun and arbitrarily to switch back and forth between accepting the voice data and not accepting the voice data. For example,
In the example in
It should be appreciated that in some embodiments a particular user (participant) may be accepting voice from some participants and not others during the same time period. Thus, e.g., with reference again to the example conversation 600″ in
A user may accept voice data on one or more present devices simultaneously.
Some implementations may limit the number of devices on which a participant may receive simultaneous voice data from the same conversation.
As noted, various channels may use different types of processing, e.g., in the backend, depending on the way in which the corresponding types of data are expected to be processed. A channel may thus provide a way to define and control the manner in which the system handles and processes different kinds of data within a conversation. Each channel may thus have semantics associated therewith, where the semantics of a channel define how the system operates on data in that channel.
By way of example, in some implementations, the semantics of the text channel are that text messages are immediately sent to all present devices of all participants. The semantics of the asset channel are that when an asset is sent, a placeholder is immediately put in the conversation and sent to all present devices of all participants. But the actual asset is then uploaded in a separate process (which may be right away, or delayed). The semantics of the metadata channel may depend on the kind of metadata. For example, metadata about a user leaving or joining the conversation may be displayed on participants' devices, whereas some system metadata may not be rendered at all. And the semantics of the voice channel may include that intended recipients have to actively accept participation. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that these are only examples of semantics of various channels. Within a particular implementation, a channel's semantics may include different and/or other processing rules and/or procedures. The system may have the same semantics for different channels.
Channels and channel semantics may also provide discriminatory handling and processing of data by the backend. For example, the backend may search and/or index all data in the text channel, but not necessarily do the same for all asset data or voice data.
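Purely as an illustration, per-channel semantics of the kind described above might be captured in a small policy table (Python, with assumed field names) that the backend could consult when deciding how to deliver data, whether to require affirmative acceptance, and whether to index data in each channel.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ChannelSemantics:
        deliver_immediately: bool      # push to all present devices of all participants
        placeholder_then_upload: bool  # e.g., assets: placeholder first, content uploaded separately
        requires_acceptance: bool      # e.g., voice: recipient must affirmatively accept
        indexed_for_search: bool       # e.g., text may be indexed; assets/voice need not be

    CHANNEL_SEMANTICS = {
        "text":     ChannelSemantics(True,  False, False, True),
        "asset":    ChannelSemantics(True,  True,  False, False),
        "metadata": ChannelSemantics(True,  False, False, False),
        "voice":    ChannelSemantics(False, False, True,  False),
        "control":  ChannelSemantics(True,  False, False, False),
    }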
As discussed above, with reference again to
A user may use (e.g., initiate or accept) voice on each voice channel in each conversation associated with that user. Thus, as shown in
Note that the drawing does not distinguish between the user's devices, and those of ordinary skill in the art will realize and appreciate, upon reading this description, that when using voice, the user may be using voice on one or more of the user's devices.
While there is preferably no overlap between voice use on different voice channels (i.e., in different conversations), in some embodiments a user may be actively involved in different conversations on different devices, in which case the user may be using voice simultaneously in more than one conversation. For example, as shown in
Some implementations may limit the number of voice channels (or conversations) a user may be simultaneously involved in, even on different devices.
Getting Users' Attention—Knocks and Knocking
The etiquette associated with VOIP calls has changed from that of POTS calls. VOIP cold calls are rare, and there is typically a back and forth between the participants using text messaging or email or the like to establish times for such calls (e.g., “Are you there?” “Can I call now?” “How about in 5 minutes?”).
The system as described herein provides suitable support for such approaches. As described, embodiments of the system provide a substantially permanent voice channel per conversation that any participant in the conversation can join and leave at any time.
It is therefore desirable and useful for users to be able to get each other's attention in some way, whether or not they are involved in a conversation.
In the communication framework described herein a conversation, once begun, generally does not end even when there is no activity in that conversation. This is particularly the case for conversations actively using voice, since the preferred default behavior of the system is to require users to affirmatively accept voice data.
As noted before, the backend provides a persistent store through which users share data. Users may therefore connect or reconnect to a conversation at any time and obtain essentially the entire conversation, including parts of the conversation for which they were absent (i.e., for which they did not have any device present or for which they had devices present but did not accept some aspects of the conversation, such as voice).
It should be appreciated that in some embodiments the backend may store voice data associated with each conversation, although this option is less preferred. When voice data are stored, they are preferably stored in a compressed form. Even though the backend may, in some embodiments, maintain and store voice data, voice conversations are preferably real-time. Thus, regardless of whether or not the backend maintains voice data, the nature of voice conversations makes it generally preferable to have other conversation participants listening.
It is therefore generally desirable for each user to be able to get the attention of other users, regardless of whether or not they are already in a conversation with each other.
The approach described herein whereby a user tries to get the attention of one or more other users is referred to herein as “knocking.” In some aspects, as used herein, the term “knocking” refers to the act(s) or process(es) of getting one or more users' attention. As used herein, the user doing the knocking is sometimes referred to as the “knocker,” and a user whose attention a knocker is trying to get is sometimes referred to as a “recipient.”
While knocking is particularly useful for voice conversations (i.e., conversations which are actively using voice), it should be appreciated and understood that knocking is applicable to other aspects of conversations. Thus, for example, knocking may be used before, during, or after joining a voice channel of a conversation. Even in embodiments in which users do not have to affirmatively accept voice data (i.e., when the default is to accept all voice data), knocking may be used by a user to get or try to get the attention of other users.
As will be described below, the system's user interface preferably provides ways for users to get each other's attention, e.g., by so-called knocking. When a particular user knocks, in order to get the attention of one or more other users, each of the other users is preferably given some indication that the particular user is trying to get their attention. As the user interface supports multiple views (e.g., a list view in which other users are listed, a conversation view in which conversations are listed, a conversation list view, etc.), the user interface may provide different forms of knock indication depending on the view.
The user interface preferably also provides ways for users to respond to a knock (i.e., to an attempt to get their attention). In this context, a “knock” may refer to an indication on a user's device that another user is trying to get their attention. The response may be to reject or ignore the knock or to interact with the knocking user (e.g. to join or rejoin a conversation with that user or to open and accept a voice channel in a conversation with that user, etc.). In some cases, a user may provide a default response message, either by text or voice, to knocks from other users.
The types and options of responses that a user may provide to a knock may depend on the user's current state within the system. For example, if the user is already in another conversation using voice, the user may have different response options provided than if the user is inactive.
Knocking indications preferably include both visual and sound indicators. For example, knocking may cause a recipient's devices that are present in the system to play distinctive sounds and have visual indicators (e.g., animation, flashing, etc.).
Since knocking generally indicates some sense of time sensitivity, in preferred embodiments knocks expire after some period of time, and the visual and/or sound indicators associated with a knock may increase in intensity over time.
In some embodiments the system, via the UI, may provide the ability for users to escalate their notifications (i.e., knocks) to other users thereby to express a greater sense of urgency and possibly increase the chances of the recipient noticing the knocks. These escalated knocks are sometimes referred to herein as “hot knocks.”
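By way of a non-limiting sketch, a knock and its expiry/escalation behavior might be modeled as follows (Python, with hypothetical names and parameters); the escalation simply increases an intensity value over time until the knock expires, and a “hot knock” starts at a higher intensity. The specific lifetime and intensity scheme are assumptions for illustration.

    import time
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Knock:
        knocker: str                      # user ID of the user doing the knocking
        recipient: str                    # user whose attention is sought
        conversation_id: str
        hot: bool = False                 # escalated ("hot") knock
        created_at: float = field(default_factory=time.time)
        lifetime_seconds: float = 60.0    # knocks expire after some period of time

        def is_expired(self, now: Optional[float] = None) -> bool:
            now = time.time() if now is None else now
            return now - self.created_at >= self.lifetime_seconds

        def intensity(self, now: Optional[float] = None) -> float:
            """0.0-1.0 value that visual/sound indicators may scale with over time."""
            now = time.time() if now is None else now
            if self.is_expired(now):
                return 0.0
            progress = (now - self.created_at) / self.lifetime_seconds
            base = 0.5 if self.hot else 0.2
            return min(1.0, base + progress * (1.0 - base))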
Aspects of knocking, including the UI associated with knocking, are described in greater detail below.
An architecture of an exemplary implementation of system 100 is described here.
Clients 700 in
As used herein, for this description, “downstream” refers to the direction from devices to the backend, whereas “upstream” refers to the direction from the backend to one or more devices.
Clients 700 communicate with the backend 108 and with each other via the backend. In order to communicate with the backend or with other clients, a client may communicate with one or more backend services 702 via connection and routing mechanism(s) 704. Connection and routing between clients and backend services in the downstream direction may use downstream connection and routing mechanism(s) 706. Clients 700 may use API 708 in order to communicate downstream. It should be appreciated that not all backend system services 702 need be directly visible to or accessible directly by clients 700, even using the API 708. Connection and routing between the backend services and clients in the upstream direction may use upstream connection and routing mechanism(s) 710.
The backend system services 702 may include configuration services 712, user services 714, utilities and miscellaneous services 715, and conversation/asset manager services 716. The conversation/asset manager services 716 may include conversation services 718 and asset services 720. The utilities and miscellaneous services 715 may include search services 717. The backend system services 702 may correspond to, or be implemented by, backend applications 112 in
The backend system services 702 may maintain and access storage/data 722. The storage/data 722 may be stored/maintained in the backend databases 110 of
The backend system preferably maintains and can access state information via state mechanism(s) 734. State mechanism(s) 734 may include presence (device or client status) mechanisms 736. The state mechanism(s) 734 preferably provide state information (for example, presence information about devices) to the upstream connection/routing mechanism(s) 710. The state mechanism(s) 734 may obtain or determine state information directly from clients 700 and/or via connection and routing mechanism(s) 704 (for this reason, in the drawing in
The connection and routing mechanism(s) 704 may use authentication/verification mechanism(s) 742, e.g., to authenticate client downstream requests to the backend and/or backend upstream communications to clients. It is preferable and desirable that clients authenticate themselves when communicating downstream with the backend. Likewise, it is preferable and desirable that backend upstream communication with clients be authenticated and/or verified. Various authentication techniques may be used, including certificate-based and token-based authentication, and it should be appreciated that the system is not limited by the authentication/verification scheme(s) used. The authentication/verification mechanism(s) 742 may include, for example, CA(s) 119 (
To aid in this description,
A User Interface (UI)
Clients (users' devices) interact with each other and the system 100 via the backend 108. These interactions preferably take place using a user interface (UI) application 128 running on each client (e.g., device 104,
Devices
A UI is implemented, at least in part, on a device 104, and preferably uses the device's display(s) and input/interaction mechanism(s). Use of a UI may require selection of items, navigation between views, and input of information. It should be appreciated that different devices support different techniques for presentation of and user interaction with the UI. For example, a device with an integrated touch screen (e.g., device 104-C as shown in
UI Interactions
A UI presents information to a user, preferably in the form of text and/or graphics (including drawings, pictures, icons, photographs, etc.) on the display(s) of the user's device(s). The user may interact with the UI by variously selecting regions of the UI (e.g., corresponding to certain desired choices or functionality), by inputting information via the UI (e.g., entering text, pictures, etc.), and performing acts (e.g., with the mouse or keyboard) to affect movement within the UI (e.g., navigation within and among different views offered by the UI).
The UI application(s) 128 (
It should be appreciated that, depending on the device, the UI may not actually display information corresponding to navigation, and may rely on parts of the screen and/or gestures to provide navigation support. For example, different areas of a screen may be allocated for various functions (e.g., bottom for input, top for search, etc.), and the UI may not actually display information about these regions or their potential functionality.
As has been explained, and as will be apparent to those of ordinary skill in the art, upon reading this description, the manner in which UI interactions take place will depend on the type of device and interface mechanisms it provides.
As used herein, in the context of a UI, the term “select” (or “selecting”) refers to the act of a user selecting an item or region of a UI view displayed on a display/screen of the user's device. The user may use whatever mechanism(s) the device provides to position the cursor appropriately and to make the desired selection. For example, a touch screen 204 on device 104-C may be used for both positioning and selection, whereas device 104-D may require the mouse 208 (and/or keyboard 206) to position a cursor on the display 204 and then to select an item or region on that display. In the case of a touch screen display, selection may be made by tapping the display in the appropriate region. In the case of a device such as 104-A, selection may be made using a mouse click or the like.
Touch Screen Interfaces and Gestures
Touch-screen devices (e.g., an iPad, iPhone, etc.) may recognize and support various kinds of touch interactions, including gestures, such as touching, pinching, tapping, and swiping. These gestures may be used to move within and among views of a UI.
Voice Menu
The UI preferably provides a user with a way to use and control voice aspects of a conversation. For example, the UI on a device preferably provides the user with a way to initiate a voice conversation, mute a voice conversation, join an existing voice conversation, leave a voice conversation, and switch devices within a voice conversation. It should be appreciated that, as discussed above, voice is a kind of content that may have its own channel(s) within a conversation. Thus, as used herein, the term “voice conversation” may refer to the voice channel of a conversation. Thus, e.g., the phrase “initiate a voice conversation” may mean to “initiate the voice channel of a conversation,” and so on.
The UI on a device may provide access to a menu of options relating to voice conversations. The menu may be provided, e.g., as a sliding menu below an input cursor on the display of the device, e.g., as described in U.S. Provisional Patent Application No. 61/838,942, titled “User Interface With Sliding Cursor For Multimodal Communication Framework,” filed Jun. 25, 2013, the entire contents of which are fully incorporated herein by reference for all purposes (and which is included herein as Appendix B).
Although the drawings in
The user can select the cursor 802 or a region around it in order to enter text in the conversation. In addition, as described here, the user may slide or move the cursor (or a portion of the screen to the right of the cursor) to the right in order to expose an underlying menu (e.g., as shown in
In the example shown in view (ii) in
As shown in the drawings, the menu exposed by sliding the cursor to the right (from view (i) to view (ii) in
When the user selects the “talk” or “voice” icon (in view (ii) in
Other menu options may also be included, as shown in exemplary view (iii) in
The “voice” menu may include menu options (e.g., icons or regions) that are not initially exposed. In some embodiments, these may be displayed or exposed by moving other menu options (e.g., icons or regions) to the left or right. Thus, for example, as shown in
In
Avatars
Recall from the discussion above, with reference again to
In presently preferred exemplary implementations the avatar is a round (circular) image derived or extracted from a user picture 512.
A user's avatar (or an area around the avatar) may be animated and/or color coded by the UI to show activity or status of that user. Exemplary animation of an avatar may include changing its color and/or intensity, causing the avatar image to flash or appear to pulsate or throb, or the like. In addition, or instead, an area such as a ring around the avatar may be animated. As used herein, animation of an avatar includes animation or changing of the avatar image itself and/or animation and/or changing of an area around the avatar image. The area around the avatar image may be changed, e.g., to a different color and/or hue and/or intensity. An avatar is said to be animated if it differs from the avatar that was derived from the picture 512 in some way, and it should be appreciated that animation may include, but does not require, movement or repeated change. Thus, e.g., an avatar 902 with a colored ring around it is said to be animated. Similarly, an avatar 902 with a ring around it that rotates or appears to rotate or changes size or appears to change size is said to be animated. In some cases the hue or intensity of the image itself may be modified as part of an animation of an avatar. For example, a black and white image (e.g., as shown in
In general, the term “animation” refers to any change in the avatar. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that the system is not limited by the manner in which an avatar is selected or animated.
An avatar may be animated in different ways, e.g., to show that a user is: (i) talking; (ii) knocking; (iii) connecting; (iv) listening; or (v) muted. It should be appreciated that these are just some states in which a user may be, that a particular system is not required to show all these user states, and that a particular system may show different and/or other states.
In some cases multiple rings may be used around the image 902 to indicate state. For example,
Although three rings were shown in the example above, those of ordinary skill in the art will realize and appreciate, upon reading this description, that a particular system may use fewer or more rings. It should also be appreciated that although the drawings show discrete boundaries between adjacent rings, in particular implementations the rings may be made to appear to blend into each other.
When movement is shown in an avatar's animation, the speed of that movement may be used to indicate an aspect of the user's state.
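The avatar states and animations described above lend themselves to a simple mapping from a participant's state to animation parameters such as ring count and movement speed. The following sketch is purely illustrative; the state names, parameter names, and particular values are assumptions and are not mandated by the description above.

```typescript
// Hypothetical sketch: map a participant's state to avatar animation parameters.
type ParticipantState = "talking" | "knocking" | "connecting" | "listening" | "muted";

interface AvatarAnimation {
  rings: number;        // number of rings rendered around the avatar image
  pulsate: boolean;     // whether the avatar (or ring area) appears to pulsate
  speedFactor: number;  // speed of any movement, itself usable to indicate state
}

// One possible mapping; a particular system may show different and/or other states.
function animationFor(state: ParticipantState): AvatarAnimation {
  switch (state) {
    case "talking":    return { rings: 3, pulsate: true,  speedFactor: 1.5 };
    case "knocking":   return { rings: 2, pulsate: true,  speedFactor: 2.0 };
    case "connecting": return { rings: 1, pulsate: true,  speedFactor: 1.0 };
    case "listening":  return { rings: 1, pulsate: false, speedFactor: 0.0 };
    case "muted":      return { rings: 0, pulsate: false, speedFactor: 0.0 };
  }
}
```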
Voice Headers
A conversation includes one or more users (also referred to as participants), and it is useful to provide an indication to each participant of which users are the participants of a conversation. It should be appreciated that the term “participant” is used throughout this description to refer to the membership of a conversation, and does not imply any active participation in the conversation. For example, a particular participant may read messages from others and listen to voice from others without sending any messages or voice himself.
When at least one user of a conversation is using a voice channel in that conversation, then that conversation's voice channel is said to be active. It should be appreciated and understood that an active voice channel does not require any actual activity on the channel. When a conversation has an active voice channel (i.e., when at least one user is using or has opened the voice channel), the conversation preferably includes header information indicative of the active voice channel. As used herein, this header may be referred to as the conversation's voice header.
A conversation's participants may change, as may their degree of activity in the conversation. The voice header of a conversation may be updated to reflect changes in the conversation's participants. Thus, for example, a conversation voice header preferably reflects the current list of participants. In some embodiments, the voice header may change dynamically to reflect actual activity level (e.g., recency of activity) of the conversation's various participants.
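One way to model the voice header is as an optional list of entries attached to a conversation, present only while the voice channel is active. The sketch below is illustrative only; the field names are assumptions, not part of the disclosure.

```typescript
// Hypothetical sketch of a conversation's voice header.
interface VoiceHeaderEntry {
  userId: string;
  avatarUrl: string;  // identifying information derived from the user's picture, as described herein
  joinedAt: number;   // time the participant joined the voice channel (ms since epoch)
}

interface Conversation {
  id: string;
  participants: string[];           // membership of the conversation, independent of activity
  voiceHeader?: VoiceHeaderEntry[]; // present only while the voice channel is active
}

// The voice channel is "active" when at least one participant has opened it,
// even if there is no actual activity on the channel.
function voiceChannelActive(c: Conversation): boolean {
  return (c.voiceHeader?.length ?? 0) > 0;
}
```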
In presently preferred embodiments user information 502 (
With reference again to
In presently preferred embodiments a user's public information 504, including one or more of the user's name 508 and a picture 512 or avatar derived or based on a picture 512 (as described above), may be used to identify that user within a voice header.
In the examples in
With reference again to
The voice header reflects which members have joined the voice channel of the conversation; note that a user may already be a participant in the conversation before joining the voice channel of that conversation. Thus, e.g., when user #4 joins the voice conversation (i.e., joins the voice channel of the conversation) in screen portion (iv) of
The order of the list of users in a voice header is preferably defined by the time at which the participants join the voice channel. As users join, their identifying information (e.g., an avatar) is preferably added to the beginning (i.e., to the left side) of the list of users in the voice header. The identifying information (e.g., avatar(s)) of previous users is then pushed or moved to the right. If the number of users exceeds the available width of the display area (as seen, e.g., in
In some embodiments a user's voice activity (e.g., the user actively talking) does not affect the sorting of the user list in the voice header or the scroll position.
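Reusing the VoiceHeaderEntry type from the sketch above, the join-time ordering described here might be implemented as follows. This is a sketch under the stated assumptions (newest joiner prepended, no re-sorting on voice activity), not a definitive implementation.

```typescript
// Hypothetical sketch: newly joining users are prepended (left side of the header),
// pushing earlier joiners to the right; voice activity does not re-sort the list.
function addToVoiceHeader(
  header: VoiceHeaderEntry[],
  entry: VoiceHeaderEntry
): VoiceHeaderEntry[] {
  // Preserve the existing order (and hence the scroll position); avoid duplicate entries.
  return [entry, ...header.filter(e => e.userId !== entry.userId)];
}

// If the entries exceed the available display width, the header is rendered as a
// horizontally scrollable strip; only the first `visibleCount` entries are shown initially.
function visibleHeader(header: VoiceHeaderEntry[], visibleCount: number): VoiceHeaderEntry[] {
  return header.slice(0, visibleCount);
}
```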
The voice headers shown in
In presently preferred implementations the identifying information of a user is derived directly or otherwise from the user's picture(s) 412 and is preferably that user's “avatar.”
As noted above, a user's identifying information (e.g., their avatar) may be used to show state information about that user. For example, a user's identifying information may show one or more of whether that user is: listening, talking, connecting, knocking, or mute. Some of the state information about a user may not be shown to other participants (e.g., when the user is on mute).
The UI and Knocking
The UI (i.e., the UI application(s) 128) preferably deals with three aspects of knocking. First, the UI preferably supports a user knocking one or more other users in order to try to get their attention. In this regard, the UI may provide one or more mechanisms to support selection of which user(s) to knock and to support knocking those users. Second, the UI should support providing or rendering knock indication(s) to knock recipient(s), and third, the UI should preferably provide knock recipients with options for responding to knocks (including ignoring knocks).
As noted, a knock preferably has a timeout, that is, a period of time during which it is considered to be valid and relevant. The timeout period for a knock is preferably relatively short, on the order of one to two minutes, although other periods may be used. The timeout period (and the remaining time) are preferably indicated to the user (at least to the knocker) in some manner.
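The knock timeout described above can be captured by a small record carrying the time the knock was sent and its validity period. The sketch below assumes a sixty-second timeout purely for illustration; the field names are likewise assumptions.

```typescript
// Hypothetical sketch of a knock with a short validity window.
interface Knock {
  senderId: string;
  conversationId: string;
  sentAt: number;     // ms since epoch
  timeoutMs: number;  // e.g., 60_000 for a one-minute timeout
  hot?: boolean;      // set when consecutive knocks are grouped into a hot knock
}

function knockExpired(k: Knock, now: number = Date.now()): boolean {
  return now - k.sentAt >= k.timeoutMs;
}

// Remaining validity, which may be surfaced to the knocker as described above.
function knockRemainingMs(k: Knock, now: number = Date.now()): number {
  return Math.max(0, k.sentAt + k.timeoutMs - now);
}
```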
Knocking
As noted, the UI on client devices preferably supports a user knocking one or more other users in order to try to get their attention. The consequences of such a knock depend, at least in part, on the status (e.g., the presence and activity status) of the intended recipient(s), as discussed below. In some embodiments a user may knock within a conversation by two or three taps on a name or names in the conversation list. In order to prevent inadvertent knocks, preferably three or more taps are needed to initiate (or re-initiate) a knock. In some implementations, the UI may require that the user confirm a knock. In a list or group view the user may knock on one or more other users by tapping (two, three, or more taps) on the name of a group or conversation.
Knock Indications
The following table summarizes aspects of exemplary knock indications in embodiments hereof. Those of ordinary skill in the art will realize and appreciate, upon reading this description, that these indications are merely exemplary, and that different and/or other indications and conditions may be used.
It should be appreciated that in all cases the visual indicators and notifications are preferably accompanied by a distinctive sound.
In general, as shown in
In an example implementation the timeout period is one minute, and the five stages shown in
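Reusing the Knock record from the sketch above, a stage for the knock indicator could be derived from the elapsed fraction of the timeout period. The division into equal-length stages is an assumption made only for this illustration.

```typescript
// Hypothetical sketch: derive the indicator stage (0 = newest, stages-1 = about to expire)
// from the elapsed fraction of the knock's timeout period.
function knockIndicatorStage(k: Knock, stages = 5, now: number = Date.now()): number | null {
  const elapsed = now - k.sentAt;
  if (elapsed < 0 || elapsed >= k.timeoutMs) return null; // not yet valid, or expired
  return Math.floor((elapsed / k.timeoutMs) * stages);
}
```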
Hot Knocks
Hot knocks allow a knocker to express urgency and thereby possibly increase the chances of the recipient noticing the knock. Hot knocks may group consecutive knocks into one more intense (e.g., bigger, louder, longer, brighter, etc.) and in general more noticeable notification. A hot knock indicator may increase its intensity in any way, e.g., in terms of one or more of size, speed, and/or audio feedback. Like other knocks, hot knocks may be indicated in the conversation, in the list, and preferably as push notifications.
As shown in
In a presently preferred implementation, in order for two consecutive knocks to be grouped into a hot knock, the following requirements should be met:
1. After the first knock, no other action from any participant (including the person who knocked) has happened in the conversation. Actions that count for this purpose are new messages and a participant joining the voice channel of the conversation.
2. The second knock happens within the expiration time of the previous knock.
Knocks following a hot knock (and during its timeout period) are preferably ignored until the hot knock expires. After that, a new regular knock may be created. For example, as shown in
In some implementations, a knock occurring within a short period after the expiration of a hot knock (e.g., within 5-10 seconds) may also be a hot knock. Thus, as shown for example in
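The grouping rules above (same sender, no intervening message or voice join, and arrival within the previous knock's validity, optionally extended by a short grace window) might be checked as in the following sketch, which reuses the Knock record from the earlier sketch. All names and the grace-window parameter are illustrative assumptions.

```typescript
// Hypothetical sketch of the hot-knock grouping rules described above.
interface ConversationEvent {
  kind: "message" | "voiceJoin";
  at: number; // ms since epoch
}

function becomesHotKnock(
  previous: Knock,
  next: Knock,
  eventsSincePrevious: ConversationEvent[],
  graceMs = 0 // optionally allow a short window (e.g., 5-10 s) after expiration
): boolean {
  if (previous.senderId !== next.senderId) return false;  // only knocks from the same sender group
  if (eventsSincePrevious.length > 0) return false;       // no message or voice join in between
  const latestAllowed = previous.sentAt + previous.timeoutMs + graceMs;
  return next.sentAt <= latestAllowed;                    // within the previous knock's validity (plus grace)
}
```

Knocks arriving while a hot knock is still valid would simply be dropped by the caller, consistent with the behavior described above.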
Knock in Groups
Knocks from different participants in a group are preferably treated as separate knocks and preferably do not affect each other or combine into hot knocks. Knocks preferably become hot knocks only if they come from the same sender. If a receiver responds to a knock (e.g., taps on a knock indicator), preferably only the sender and the receiver who responded will join voice automatically. The rest of the participants have to actively join (e.g., respond to the knock) themselves. Knocks from different senders are preferably sorted by recency in the conversation. Hot knocks update their position.
Consider the following example scenario (with reference to
In
List Views
Knocks and hot knocks are preferably indicated in the list of a list view. (Lists may include dots adjacent to users' names indicating other aspects of the system, and knock indicators are preferably in the same place as the dots.) A knock (regular or hot) affects the position of the conversation in the list as any other new message does. Muted conversations do not indicate knocks. Archived conversations that get a knock preferably become un-archived; however, conversations that are both archived and muted remain archived.
Knocks (regular or hot) expire after their timeout, and this is indicated in the list (e.g., by the intensity of the animation). After expiration, the animation stops and an indication equivalent to one regular message is used in the dot indication.
Knocks preferably override any dots present during their valid time. After the timeout, dots are updated with the previous value plus the missed knock (counted as one message) plus any value coming from other messages that may have arrived during the knock period. As used herein, a missed knock is a knock that has not been followed by a message from any of the recipients on any channel (including joining the voice channel).
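The dot update described above reduces to simple arithmetic, sketched below; the function and parameter names are assumptions introduced only for illustration.

```typescript
// Hypothetical sketch: after a knock expires, the dot count becomes the previous value,
// plus one for a missed knock, plus any messages that arrived during the knock period.
function dotCountAfterKnockTimeout(
  previousDotCount: number,
  messagesDuringKnock: number,
  knockWasMissed: boolean // no reply on any channel, including joining the voice channel
): number {
  return previousDotCount + (knockWasMissed ? 1 : 0) + messagesDuringKnock;
}
```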
Voice-on Indications in Lists
Multiple conversations may have voice activity ongoing at the same time. However, a user is preferably in only one voice conversation at a time. Conversations with voice activity are preferably indicated in the list, with voice activity indications preferably kept separate from new message and knock indicators.
As noted, voice activity in a conversation may affect the conversation's position in the list by communicating the following events:
These events may affect archived and muted conversations as any other regular message:
In some cases, the microphone of the conversation the user is in can be muted. If so, this is preferably indicated in the list.
Voice Scenarios
Embodiments of the system may support various scenarios relating to voice and voice conversations including some or all of the following:
Exemplary Voice Conversation Scenarios and Processing and an Exemplary UI
Various exemplary voice conversation scenarios are discussed here and described in
While Texting, Participants Agree to Switch to Voice
Embodiments of the system may support scenarios where, while texting, the participants agree to switch to voice.
Join Ongoing Voice Channel
Embodiments of the system may support scenarios in which a participant may join an ongoing voice channel.
In views 1 and 2, participant PC uses her menu (exposed by sliding the cursor to the right) to access the voice menu. Participant PC selects the voice option and joins the voice conversation in view 3 (as indicated by the addition of her avatar to the left side of the voice header).
Multitasking—Texting while on Voice
Embodiments of the system may support scenarios in which a participant may text while on a voice channel.
Multitasking—Send Image while on Voice
Embodiments of the system may support scenarios in which a participant may send an image to others while on a voice channel.
The exemplary scenario in
Multitasking—Switch from Conversation with Voice to Other and Back
Embodiments of the system may support scenarios in which a participant may switch back and forth between voice conversations with different participants.
In view 1 the two participants (PA and PB) are in a voice conversation (i.e., in a conversation using the voice channel), as indicated by their avatars in the voice header.
Participant PA slides his screen to the right, thereby exposing the list view (participant PA's view 2). Participant PA's list shows his voice conversation with participant PB as first on his list. In view 2 participant PA selects another user and engages in conversation with that user. Note that during that conversation participant PA's screen displays a minimized header (e.g., as shown in
In view 3, participant PA again slides his conversation view aside to again expose his list view (view 4) from which he can select participant PB to continue their voice conversation (as shown in view 5).
Note that participant PB's view remains the same throughout this scenario.
Multitasking—Switch to Other Voice Channel
Embodiments of the system may support scenarios in which a participant may switch to another voice channel.
In view 1 the two participants (PA and PB) are in a voice conversation (i.e., in a conversation using the voice channel), as indicated by their avatars in the voice header. In views 2 and 3 someone is knocking on participant PA, as indicated by the knock indicator—the other party's animated avatar—on PA's screen.
Participant PA selects the knock indicator (e.g., by tapping it on the screen) and is initially transferred to a text conversation with Participant PC (“Kate”) as shown in view 3. View 3 has a minimized voice header showing that Participant PA is still in a voice conversation with Participant PB. When Participant PA selects to enter a voice conversation with Participant PC (“Kate”) (in view 4), he leaves the voice conversation with Participant PB.
When Participant PA wants to rejoin the voice conversation with Participant PB he may do so via the conversation list (view 6 in
Note that Participant PB's view during this scenario shows that she is on the voice channel (her avatar remains in the voice header) in all views, even after Participant PA left.
Multitasking—Join Voice while Listening Music
Embodiments of the system may support scenarios in which a participant may join a voice channel while listening to music on a device.
Multitasking—Play Music while on Voice
Embodiments of the system may support scenarios in which a participant may play music while on a voice channel.
Switch from a One-One Voice Conversation to a Multi-User Voice Conversation
Embodiments of the system may support scenarios in which a participant may switch from a one-one voice conversation to a multi-user voice conversation.
Continue Voice from One Device to Other
Embodiments of the system may support scenarios in which a participant may switch a voice conversation from one device to another.
As shown in
Mute Microphone
Embodiments of the system may support scenarios in which a participant may mute their microphone.
Embodiments of the system may support scenarios in which a participant may mute the microphones of other participants.
Enable Loudspeakers
Embodiments of the system may support scenarios in which a user may enable loudspeakers.
Have Voice on while Using Other Application
Embodiments of the system may support scenarios in which a user may have a voice conversation on while using another application.
Knock Scenarios
Embodiments of the system may support various scenarios relating to knocks and knocking including some or all of the following:
The Application is not Running or Running in the Background (CASE A)
Embodiments of the system may support scenarios in which a user is knocked while the application is not running or is running in the background. FIGS. 14A and 14A-1 depict aspects of such an exemplary scenario. Although the knock is shown as an avatar in
Conversation of the Incoming Knock in the Foreground (CASE B.1)
Embodiments of the system may support scenarios in which a conversation of the incoming knock is in the foreground. FIGS. 14B and 14B-2 depict aspects of such an exemplary scenario.
List View (CASE B.2)
Embodiments of the system may support scenarios of knocking using the list view.
Other Views (CASE B.3)
Embodiments of the system may support scenarios of knocking using other views.
Knock from the List Before Joining
Embodiments of the system may support scenarios of knocking using the list view before joining.
Knock from the Conversation Before Joining
Embodiments of the system may support scenarios of knocking from the conversation before joining.
Join the Voice Channel First and then Knock
Embodiments of the system may support scenarios of joining the voice channel first and then knocking.
Decline a Knock with a Predefined Text Answer
Embodiments of the system may support scenarios of declining a knock with a predefined text answer.
Decline a Knock with a Custom Text Answer
Embodiments of the system may support scenarios of declining a knock with a custom text answer.
Ignore/Miss a Knock
Embodiments of the system may support scenarios of ignoring or missing a knock.
The User Creates Hot Knocks after the Knocks Expire
Embodiments of the system may support scenarios of a user creating hot knocks after knocks expire.
The User Creates Hot Knocks Before Knocks Expire
Embodiments of the system may support scenarios of a user creating hot knocks before knocks expire.
The User Creates Hot Knocks Before Knocks Expire by Long Tap
Embodiments of the system may support scenarios of a user creating hot knocks before knocks expire by a long tap.
It should be appreciated that not every implementation of the system will provide all of these features, and not every feature need be provided to all users of a particular system. In addition, a system 100 may provide additional and/or different features.
The services, mechanisms, operations and acts shown and described above are implemented, at least in part, by software running on one or more computers or computer systems or devices. It should be appreciated that each user device is, or comprises, a computer system.
Programs that implement such methods (as well as other types of data) may be stored and transmitted using a variety of media (e.g., computer readable media) in a number of manners. Hard-wired circuitry or custom hardware may be used in place of, or in combination with, some or all of the software instructions that can implement the processes of various embodiments. Thus, various combinations of hardware and software may be used instead of software only.
One of ordinary skill in the art will readily appreciate and understand, upon reading this description, that the various processes described herein may be implemented by, e.g., appropriately programmed general purpose computers, special purpose computers and computing devices. One or more such computers or computing devices may be referred to as a computer system.
According to the present example, the computer system 1500 includes a bus 1502 (i.e., interconnect), one or more processors 1504, one or more communications ports 1514, a main memory 1506, removable storage media 1510, read-only memory 1508, and a mass storage 1512. Communication port(s) 1514 may be connected to one or more networks by way of which the computer system 1500 may receive and/or transmit data.
As used herein, a “processor” means one or more microprocessors, central processing units (CPUs), computing devices, microcontrollers, digital signal processors, or like devices or any combination thereof, regardless of their architecture. An apparatus that performs a process can include, e.g., a processor and those devices such as input devices and output devices that are appropriate to perform the process.
Processor(s) 1504 can be (or include) any known processor, such as, but not limited to, an Intel® Itanium® or Itanium 2® processor(s), AMD® Opteron® or Athlon MP® processor(s), or Motorola® lines of processors, and the like. Communications port(s) 1514 can be any of an RS-232 port for use with a modem based dial-up connection, a 10/100 Ethernet port, a Gigabit port using copper or fiber, or a USB port, and the like. Communications port(s) 1514 may be chosen depending on a network such as a Local Area Network (LAN), a Wide Area Network (WAN), a CDN, or any network to which the computer system 1500 connects. The computer system 1500 may be in communication with peripheral devices (e.g., display screen 1516, input device(s) 1518) via Input/Output (I/O) port 1520. Some or all of the peripheral devices may be integrated into the computer system 1500, and the input device(s) 1518 may be integrated into the display screen 1516 (e.g., in the case of a touch screen).
Main memory 1506 can be Random Access Memory (RAM), or any other dynamic storage device(s) commonly known in the art. Read-only memory 1508 can be any static storage device(s) such as Programmable Read-Only Memory (PROM) chips for storing static information such as instructions for processor(s) 1504. Mass storage 1512 can be used to store information and instructions. For example, hard disks such as the Adaptec® family of Small Computer Serial Interface (SCSI) drives, an optical disc, an array of disks such as Redundant Array of Independent Disks (RAID), such as the Adaptec® family of RAID drives, or any other mass storage devices may be used.
Bus 1502 communicatively couples processor(s) 1504 with the other memory, storage and communications blocks. Bus 1502 can be a PCI/PCI-X, SCSI, a Universal Serial Bus (USB) based system bus (or other) depending on the storage devices used, and the like. Removable storage media 1510 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc-Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Versatile Disk-Read Only Memory (DVD-ROM), etc.
Embodiments herein may be provided as one or more computer program products, which may include a machine-readable medium having stored thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. As used herein, the term “machine-readable medium” refers to any medium, a plurality of the same, or a combination of different media, which participate in providing data (e.g., instructions, data structures) which may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random access memory, which typically constitutes the main memory of the computer. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to the processor. Transmission media may include or convey acoustic waves, light waves and electromagnetic emissions, such as those generated during radio frequency (RF) and infrared (IR) data communications.
The machine-readable medium may include, but is not limited to, floppy diskettes, optical discs, CD-ROMs, magneto-optical disks, ROMs, RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, embodiments herein may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., modem or network connection).
Various forms of computer readable media may be involved in carrying data (e.g. sequences of instructions) to a processor. For example, data may be (i) delivered from RAM to a processor; (ii) carried over a wireless transmission medium; (iii) formatted and/or transmitted according to numerous formats, standards or protocols; and/or (iv) encrypted in any of a variety of ways well known in the art.
A computer-readable medium can store (in any appropriate format) those program elements that are appropriate to perform the methods.
As shown, main memory 1506 is encoded with application(s) 1522 that support(s) the functionality as discussed herein (an application 1522 may be an application that provides some or all of the functionality of one or more of the mechanisms described herein). Application(s) 1522 (and/or other resources as described herein) can be embodied as software code such as data and/or logic instructions (e.g., code stored in the memory or on another computer readable medium such as a disk) that supports processing functionality according to different embodiments described herein.
For example, as shown in
As shown, e.g., in
As noted above, backend system services 802 (1522-2 in
During operation of one embodiment, processor(s) 1504 accesses main memory 1506 via the use of bus 1502 in order to launch, run, execute, interpret or otherwise perform the logic instructions of the application(s) 1522. Execution of application(s) 1522 produces processing functionality of the service(s) or mechanism(s) related to the application(s). In other words, the process(es) 1524 represents one or more portions of the application(s) 1522 performing within or upon the processor(s) 1504 in the computer system 1500.
For example, as shown in
It should be noted that, in addition to the process(es) 1524 that carries (carry) out operations as discussed herein, other embodiments herein include the application 1522 itself (i.e., the un-executed or non-performing logic instructions and/or data). The application 1522 may be stored on a computer readable medium (e.g., a repository) such as a disk or in an optical medium. According to other embodiments, the application 1522 can also be stored in a memory type system such as in firmware, read only memory (ROM), or, as in this example, as executable code within the main memory 1506 (e.g., within Random Access Memory or RAM). For example, application 1522 may also be stored in removable storage media 1510, read-only memory 1508, and/or mass storage device 1512.
Those skilled in the art will understand that the computer system 1500 can include other processes and/or software and hardware components, such as an operating system that controls allocation and use of hardware resources.
As discussed herein, embodiments of the present invention include various steps or operations. A variety of these steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the operations. Alternatively, the steps may be performed by a combination of hardware, software, and/or firmware. The term “module” refers to a self-contained functional component, which can include hardware, software, firmware or any combination thereof.
One of ordinary skill in the art will readily appreciate and understand, upon reading this description, that embodiments of an apparatus may include a computer/computing device operable to perform some (but not necessarily all) of the described process.
Embodiments of a computer-readable medium storing a program or data structure include a computer-readable medium storing a program that, when executed, can cause a processor to perform some (but not necessarily all) of the described process.
Where a process is described herein, those of ordinary skill in the art will appreciate that the process may operate without any user intervention. In another embodiment, the process includes some human intervention (e.g., a step is performed by or with the assistance of a human).
Real Time
Those of ordinary skill in the art will realize and understand, upon reading this description, that, as used herein, the term “real time” means near real time or sufficiently real time. It should be appreciated that there are inherent delays in network-based communication (e.g., based on network traffic and distances), and these delays may cause delays in data reaching various components. Inherent delays in the system do not change the real-time nature of the data. In some cases, the term “real-time data” may refer to data obtained in sufficient time to make the data useful for its intended purpose.
Although the term “real time” may be used here, it should be appreciated that the system is not limited by this term or by how much time is actually taken. In some cases, real time computation may refer to an online computation, i.e., a computation that produces its answer(s) as data arrive, and generally keeps up with continuously arriving data. An “online” computation is contrasted with an “offline” or “batch” computation.
As used in this description, the term “portion” means some or all. So, for example, “A portion of X” may include some of “X” or all of “X”. In the context of a conversation, the term “portion” means some or all of the conversation.
As used herein, including in the claims, the phrase “at least some” means “one or more,” and includes the case of only one. Thus, e.g., the phrase “at least some ABCs” means “one or more ABCs”, and includes the case of only one ABC.
As used herein, including in the claims, the phrase “based on” means “based in part on” or “based, at least in part, on,” and is not exclusive. Thus, e.g., the phrase “based on factor X” means “based in part on factor X” or “based, at least in part, on factor X.” Unless specifically stated by use of the word “only”, the phrase “based on X” does not mean “based only on X.”
As used herein, including in the claims, the phrase “using” means “using at least,” and is not exclusive. Thus, e.g., the phrase “using X” means “using at least X.” Unless specifically stated by use of the word “only”, the phrase “using X” does not mean “using only X.”
In general, as used herein, including in the claims, unless the word “only” is specifically used in a phrase, it should not be read into that phrase.
As used herein, including in the claims, the phrase “distinct” means “at least partially distinct.” Unless specifically stated, distinct does not mean fully distinct. Thus, e.g., the phrase, “X is distinct from Y” means that “X is at least partially distinct from Y,” and does not mean that “X is fully distinct from Y.” Thus, as used herein, including in the claims, the phrase “X is distinct from Y” means that X differs from Y in at least some way.
As used herein, including in the claims, a list may include only one item, and, unless otherwise stated, a list of multiple items need not be ordered in any particular manner. A list may include duplicate items. For example, as used herein, the phrase “a list of XYZs” may include one or more “XYZs”.
It should be appreciated that the words “first” and “second” in the description and claims are used to distinguish or identify, and not to show a serial or numerical limitation. Similarly, the use of letter or numerical labels (such as “(a)”, “(b)”, and the like) are used to help distinguish and/or identify, and not to show any serial or numerical limitation or ordering.
No ordering is implied by any of the labeled boxes in any of the flow diagrams unless specifically shown and stated. When disconnected boxes are shown in a diagram the activities associated with those boxes may be performed in any order, including fully or partially in parallel.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Appendix A (U.S. Provisional Patent Application No. 61/860,222, titled “Unified and Consistent Multimodal Communication Framework,” filed Jul. 30, 2013, the entire contents of which are fully incorporated herein by reference for all purposes.)
Related U.S. Provisional Application: No. 61/905,410, filed November 2013 (US).