This invention pertains generally to communication and, more particularly, to computer-facilitated communication.
The use of mobile phones to share multimedia such as photos, videos, text, and location coordinates over cellular and Wi-Fi data networks has become common. While the adoption of data-oriented services has grown, voice calling is still the most popular use case for mobile phones. However, challenges related to operating systems, handset hardware and network limitations have contributed to keeping voice and data service largely separate and uncoupled. The user experience can be confusing, frustrating and/or inefficient. Effective communication between users can be compromised. Conventional attempts to address these issues are flawed.
For example, some conventional systems require users to instantiate each data sharing service independently of a voice call and independently identify one or more sharing participants. A particular set of installed applications may be required. One or more of the sharing participants may not be able to receive data simultaneously with voice, and it may be difficult to determine if this is the case in advance. Some conventional systems provide for a type of non-real-time or delayed sharing, but delays can be significant, marring and even disrupting a conversation. Some conventional systems provide insufficient access to communication device components and/or are inflexible with respect to communication device resource allocation between voice and data aspects of a communication session. Some conventional systems and methods fail to provide a user interface and/or protocol that facilitates effective communication with voice and multiple data-based sharing activities and/or that is extensible, for example, with respect to new data-based sharing activities.
Some conventional systems require relatively high levels of computational resources. This can be particularly problematic in resource-constrained environments such as mobile computing environments and other power-constrained environments. Some conventional systems create an expectation of, or even enforce, socially awkward communication behaviors, situations, protocols and/or idioms (collectively “communication scenarios”). Communication scenarios that are appropriate for formalized communications can be inappropriate and/or ineffective for casual, spontaneous and/or ad hoc communication (collectively, “casual communication”). Some conventional systems require custom and/or specialized hardware, which can impose constraints on widespread adoption. Such conventional systems can be richly featured yet of limited utility due to a relatively low number of potential communication participants.
Embodiments of the invention are directed toward solving these and other problems individually and collectively.
As part of real-time communication between users of communication devices, a communication connection may be established between the communication devices in accordance with a telephony protocol. A voice call may be maintained over the communication connection. During the voice call, media may be captured by one of the communication devices and unidirectionally streamed to another of the communication devices. For example, the media may include video. During the streaming of the media, one or more communicative activities may be provided that are concurrent with and contextualized by the streaming media. Such communication may be facilitated by one or more components incorporated into communication devices and/or computing devices.
The terms “invention,” “the invention,” “this invention” and “the present invention” used in this patent are intended to refer broadly to all of the subject matter of this patent and the patent claims below. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the patent claims below. Embodiments of the invention covered by this patent are defined by the claims below, not this summary. This summary is a high-level overview of various aspects of the invention and introduces some of the concepts that are further described in the Detailed Description section below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings and each claim.
Illustrative embodiments of the present invention are described in detail below with reference to the following drawing figures:
Note that the same numbers are used throughout the disclosure and figures to reference like components and features.
The subject matter of embodiments of the present invention is described here with specificity to meet statutory requirements, but this description is not necessarily intended to limit the scope of the claims. The claimed subject matter may be embodied in other ways, may include different elements or steps, and may be used in conjunction with other existing or future technologies. This description should not be interpreted as implying any particular order or arrangement among or between various steps or elements except when the order of individual steps or arrangement of elements is explicitly described.
In accordance with at least one embodiment of the invention, extensible media and/or activity sharing within a voice call or concurrent real-time communication session is enabled. Real-time media and/or activities may be offered and shared within the context of a synchronous real-time communication, such as a phone call. One or more media types and/or user activities can be initiated, offered and shared either serially (during distinct time intervals) or concurrently (during a same time interval). A concurrent voice call may provide context for (may “contextualize”) shared media and/or activities. In accordance with at least one embodiment of the invention, a first communicative activity contextualizes a second communicative activity when the first communicative activity enhances, changes or modifies a meaning, connotation, cognitive association or content of the second communicative activity. Streaming media, such as audio and video, may contextualize shared activities and/or activity sharing. Such contextualization can significantly enhance communication ease, efficiency and/or effectiveness, as well as reducing user confusion and/or frustration.
In accordance with at least one embodiment of the invention, users may share live experiences within the context of a voice call. Continuous, changing real-time media types such as periodic media (e.g., a user's current location) and streaming media (e.g., what the user is currently “seeing” through their personal video camera) may be shared, as well as interactive activities and static or “atomic” media types such as an image or a text message. Different types of media (e.g., streaming, periodic and atomic) and/or activities may be shared with the context of a voice call. Such sharing may have a unidirectional modality, for example, one party may offer to share media and/or an activity and another party may accept or reject it. Once accepted, the media and/or activity may be available until the sharing party terminates the call or ends the sharing.
A voice call may terminate at a communication device that incorporates a media contextualized activity sharing component in accordance with at least one embodiment of the invention. Alternatively, or in addition, the voice call may terminate at a communication device that does not incorporate the media contextualized activity sharing component. In the latter case, shared media and/or activities may be stored at an intermediate location (e.g., a network storage device) for later retrieval by the intended party, for example, when the intended party gains access to the media contextualized activity sharing component.
In accordance with at least one embodiment of the invention, the receiving party may independently manipulate shared media and/or activities while a voice call continues in real-time between the sending party and the receiving party. For example, the receiving party may pause or initiate an “instant replay” of video broadcast by the sending party before returning to “live” view.
In accordance with at least one embodiment of the invention, sharing of streaming media utilizing a unidirectional modality (e.g., “unicast” video) has multiple advantages. For example, computing device resource utilization may be reduced relative to sharing of streaming media using a bidirectional modality (e.g., conventional video “conferencing”). This can be a significant advantage in power-constrained computing environments. As another example, unidirectional modalities can enable socially graceful communication scenarios, particularly in a casual communication context. As yet another example, aspects of an available, deployed and/or installed mobile computing environment, such as the power-constrained, bandwidth-constrained, small-display computing environment of so-called “smart” phones (e.g., the Apple® iPhone®), may be relatively adaptable to unidirectional streaming media sharing modalities and user interfaces. This is not insignificant at least because such adaptability can enable rapid, widespread adoption of new communication modalities, thereby enhancing human communication.
One or more of the communication clients 102-110 may include one or more media contextualized activity sharing components in accordance with at least one embodiment of the invention.
In accordance with at least one embodiment of the invention, the USM component 206 facilitates a unicast streaming video mode at times herein called “see what I see” (SWIS) mode. Such a sharing mode may enable a receiving party to “see what I see” with a relatively simple user interface and/or with relatively efficient computing device resource usage, e.g., with respect to bandwidth and processing power. In accordance with at least one embodiment of the invention, recipients in such a sharing mode may initiate an “instant replay” of streamed media while the audio call continues in real-time. For example, streamed media may be stored in a data storage “buffer” (not shown in
The USM component 206 may include a USM media encoder/decoder (codec) 212 optimized for SWIS mode media sharing and/or for a particular communication client (e.g., one of the communication clients 102-110 of
For example, suppose the communication client 108 of
The USM component 206 may further include a USM activities component 214 configured at least to facilitate media contextualized activity sharing. For example, during SWIS mode, users may offer, accept and/or reject a variety of sharing activities such as media annotation including freehand touch-based drawing and text captioning, contextualized messaging including text messaging and rich media messaging, sharing of contacts from a contact database 216 maintained by the communication client 200, media processing including object recognition and visual code (e.g., QR code) recognition, web link sharing including sharing of web links associated with object and code recognitions, concurrent browsing (“co-browsing”) of shared web links and communication client sensor data sharing including sharing of geographic location data. Such sharing activities may be initiated and terminated independent of SWIS mode with one of the user interfaces including the graphical user interface and/or physical user interface components of the communication client including buttons and motion sensors such as sensors that detect device “shaking.” Example details in accordance with at least one embodiment of the invention are described below in more detail with reference to
For example, an address book of a user's contacts (e.g., maintained by the user's communication client 200 of
Two communication clients may be able to connect in a variety of ways. For example, one communication path between the two clients may be through the PSTN, at least in part. Another communication path between the two clients may lie entirely in an IP-based network. Where multiple communication paths exist, one may be preferred or “default”. For example, a lowest cost communication path may be preferred, or a type of communication path, such as an all IP-based communication path, may be preferred. Alternatively, or in addition, a set of communication path choices may be presented to a user for selection. In the case that the contacted client is offline and a push message sent through one communication path goes unanswered, contact may be attempted through an alternate communication path. For example, if a user does not respond to a notification sent through an IP-based communication path, a call may be placed through the PSTN. This example has the advantage that the call may terminate in the recipient's conventional voicemail box if the recipient does not answer the call. In the case that multiple contact locations (e.g., telephone numbers) are associated with a particular contact, the calling client may attempt contact at each contact location in a default or specified order, and may select an appropriate communication path (or sequence of communication paths) based at least in part on each contact location. Alternatively, or in addition, a set of contact location choices may be presented to a user for selection.
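The ordered path and contact-location selection just described can be sketched as follows. This is an illustration only; the data shapes, preference rules and function names are assumptions, not the actual implementation.

```python
# Illustrative sketch of communication-path and contact-location
# selection; data shapes and names are assumptions for illustration.

def order_paths(paths):
    """Default preference: an all IP-based path first, then lowest cost."""
    return sorted(paths, key=lambda p: (not p["all_ip"], p["cost"]))

def attempt_contact(contact_locations, paths, place_call):
    """Try each contact location in order; for each location, try each
    communication path in preference order until an attempt succeeds."""
    for location in contact_locations:
        for path in order_paths(paths):
            if place_call(location, path):
                return location, path["name"]
    # No location reachable on any path; a PSTN call could still be
    # placed here so the attempt terminates in conventional voicemail.
    return None
```

A user-facing client might instead present the ordered choices for manual selection, as the passage above also contemplates.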
Once a client to client connection has been established, a session management layer operating over the server-routed messaging layer (e.g., maintained by a communication session manager 312) may be used to establish a bi-directional client to client voice channel, for example, in accordance with Internet Engineering Task Force (IETF) XMPP, Jingle (XEP-0166), ICE (RFC 5245) and STUN (RFC 5389) standards, as appropriate. A client to client video channel may be negotiated (e.g., with respect to codec format and bit rate) at the same time as the voice channel is negotiated, although not necessarily utilized until later in a communication session.
In accordance with at least one embodiment of the invention, an extensible layer for negotiating activities and media sharing over the messaging layer is provided. For example, this may be implemented in accordance with an XMPP over a conventional text messaging channel by using a custom type attribute or globally unique identifier (GUID), for example based on location, to identify the packet as belonging to a specific activity type. Alternatively, or in addition, a namespace-extensible scheme can be utilized to allow third party activities to be added and the information to be routed to the appropriate system component.
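The namespace-extensible routing just described might be sketched as below. The handler registry and packet shape are assumptions made for illustration, not the patent's implementation.

```python
# Minimal sketch of namespace-extensible activity routing over the
# messaging layer; registry and packet shape are illustrative assumptions.

_activity_handlers = {}

def register_activity(namespace, handler):
    """Third-party activities register a handler under a unique namespace."""
    _activity_handlers[namespace] = handler

def route_packet(packet):
    """Dispatch a messaging packet to the handler for its activity type,
    identified here by a custom "type" attribute on the packet."""
    handler = _activity_handlers.get(packet.get("type"))
    if handler is None:
        return False  # unknown activity type; the packet is ignored
    handler(packet["payload"])
    return True
```

Because dispatch is keyed only by the namespace string, new activity types can be added without changes to the routing layer itself.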
Within messaging packets, an object description language such as “JAVASCRIPT” object notation (JSON) may be utilized to specify serialized data interchanged between clients. For example, a location object may be specified as follows:
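The serialized object itself does not survive in this text; a reconstruction consistent with the field descriptions that follow might look like this (all values are invented for illustration):

```json
{
  "longitude": -122.031,
  "latitude": 37.332,
  "jid": "a8098c1a-f86e-11da-bd1a-00112444be1e",
  "description": "Current location of the messaging originator",
  "timestamp": 3526841226000000
}
```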
where “longitude” and “latitude” correspond to geographic coordinates of the messaging originator, “jid” corresponds to a GUID for the data object, “description” is a plain text, human readable description of the data object, and “timestamp” corresponds to a date and time specification (e.g., a number of microseconds since the year 1900).
Different types of data may be handled differently. For example, with respect to atomic units of small data, small objects such as a location can be sent via the messaging channel as described above. With respect to periodic units of small data, some data types such as a location track can be sent as atomic units of small data but may be preceded or ended by a “starting to share” message sent via the same messaging channel. With respect to large data objects, larger data objects such as a picture, saved video, file, or VCard may be transmitted by uploading the object to cloud based storage (e.g., through the communication server and/or an associated web service) and then having the web service inform the other client that the object is available for retrieval. This also allows these larger objects to be persisted indefinitely in the case that they are not able to be retrieved immediately. With respect to streaming video or other streaming media, control messages may be sent via the messaging channel to offer video from one party to the other and to either accept or reject streaming video. In accordance with at least one embodiment of the invention, video streams may correspond to uni-directional offers (for example, a person might offer to show the other person in the call what they are seeing right now) rather than a bi-directional session. In accordance with at least one embodiment of the invention, this does not prevent both parties from offering to share video in a socially graceful communication scenario.
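The four cases above amount to selecting a transport strategy per data type. A sketch of that selection follows; the threshold, strategy labels and function name are assumptions for illustration.

```python
# Sketch of per-type transport selection for shared data, following the
# four cases described above; names and threshold are assumptions.

SMALL_OBJECT_LIMIT = 16 * 1024  # bytes; an assumed small-object threshold

def choose_transport(kind, size=0):
    """Return a transport strategy for a shared object.

    kind: "atomic", "periodic" or "stream"
    """
    if kind == "stream":
        # Offer/accept control messages travel over the messaging channel;
        # media flows over the uni-directional stream once accepted.
        return "control-messages+unicast-stream"
    if kind == "periodic":
        # Sent as atomic units bracketed by "starting to share" messages
        # on the same messaging channel.
        return "messaging-channel-with-share-markers"
    if size <= SMALL_OBJECT_LIMIT:
        return "messaging-channel"
    # Large objects: upload to cloud storage, then notify the other
    # client that the object is available for (possibly delayed) retrieval.
    return "cloud-upload+notify"
```

The cloud-upload case is what allows larger objects to persist indefinitely when immediate retrieval is not possible.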
User account management functionality, such as creating, accessing, updating and deleting user account information including user preferences, as well as user authentication and service configuration including service billing and related functionality, may be provided by a user account manager component 314. Activity sharing functionality, such as processing activity sharing protocol messages, as well as enabling immediate and/or delayed access to media related to shared activities, may be provided by an activity sharing manager component 316.
As described above, the USM codec 212 (
In accordance with at least one embodiment of the invention, the device hardware 414 corresponds to hardware of a mobile computing device. For example, the device hardware 414 may include one or more processors including central processing units (CPUs) and special-purpose processors such as telephony protocol processors and video encoding processors, one or more data storage components such as volatile and non-volatile data storage components including DRAM and “Flash” memory, one or more power sub-systems including a battery, as well as one or more network interfaces including wireless network interfaces and/or radios. In the case of mobile computing devices, access to device hardware 414 may be relatively strictly controlled by the operating system 406. For example, rather than providing applications 408 direct access to “device drivers” that in turn directly access the hardware components 402-404, applications 408 may be required to access the hardware components 402-404 indirectly through APIs 410-412 implementing relatively high-level functionality. There are good reasons for such restrictions. For example, “rogue” applications (e.g., applications incorporating incompetent or malicious programming) could otherwise rapidly drain the mobile device's battery of power and/or otherwise abuse the shared computing resources of the device (e.g., exceed a power budget of the application) and/or the communication networks 114 (
In accordance with at least one embodiment of the invention, the hardware components 402-404 include a special-purpose media encoder 402 not directly accessible to user applications 408, however, the operating system 406 provides indirect access to the special-purpose media encoder through a media file writer programmatic object (“Writer object”), for example, incorporated into one of the APIs 410-412. While media may be encoded by appropriately configuring a general-purpose CPU using computer-executable instructions, this may be relatively power-inefficient. This is significant since a mobile computing device without power provides very little functionality at all. In contrast, the special-purpose media encoder 402 may perform its function in a relatively power-efficient manner. Accordingly, access to the encoder 402 utilizing Writer objects may be desirable, and even required for effective media streaming with respect to some mobile computing devices.
In accordance with at least one embodiment of the invention, an application 416 may incorporate and/or be incorporated by the communication client 200 (
Power constraints can be significant when utilizing a mobile computing device to communicate. Another type of constraint includes user interface constraints. Mobile computing devices typically have a relatively small form factor and, accordingly, relatively small user interface components such as graphical displays. Utilization of such user interface resources can have a significant impact on the effectiveness and/or efficiency of communication with the mobile computing device.
In a first mode of operation or user interface context 502, two mobile computing devices, an initiator or sender S and a responder or receiver R, are powered on, but not in communication, for example, through one of the networks 114 (
In accordance with at least one embodiment of the invention, the sender S may initiate a streaming media unicast 508, such as a video unicast, to the receiver R. For example, the sender S may activate a user interface component (e.g., “swipe” an icon and/or slider component). Assuming the receiver R agrees to receive the unicast, the mobile computing device may transition to a streaming media unicast mode or context 510, in which a media stream is generated, transmitted and presented at a user interface component of both the sender S and the receiver R. For example, the sender S may stream video captured in real-time, and the video may be concurrently presented (with respect to transmission system delays) at displays of the mobile computing devices of both the sender S and receiver R. The unicast context 510 may thus be a “see what I see” (SWIS) mode or context. The diagram in the resource utilization column indicates that a significant increase in utilization of computing and bandwidth resources may occur in this context 510. In at least one embodiment of the invention, this enhanced utilization is still less than that of conventional video conferencing, as well as within the resource capacities of both mobile computing devices.
The streaming media unicast 510 is contextualized by the initial voice call 506. Alternatively, or in addition, the unicast 510 may contextualize further communicative activities. In accordance with at least one embodiment of the invention, the sender S and/or receiver R may initiate activities 512 that are contextualized by the streaming media 510, thereby entering a contextualized activity mode or context 514. For example, such activities may include streaming media annotations including text annotations and freehand drawing, concurrent text-based “chat” or “texting” functionality, sharing of still images captured from a video stream, automated recognition of objects in the video stream (e.g., recognition of objects, faces, text, codes such as QR codes) and triggered actions based on recognitions of sufficient confidence (e.g., accessing and/or transmitting information associated with a recognized QR code), as well as sharing of a current location or other information determined from data received by one or more sensors of the mobile computing device. The diagram in the resource utilization column shows further data being exchanged between the mobile computing devices in accordance with the contextualized activity.
Streaming media contextualized activities 514 may be transitory with respect to the streaming media context 510. Either party may terminate 516 the activity and return to the streaming media context 518. Similarly, the streaming media context may be transitory with respect to the voice call context 506. Again, either party may terminate 520 the unicast and return to the voice call context 522. At call termination 524, both the sender S and the receiver R may decide whether to save or store the streaming media that was unicast and/or media associated with the contextualized activities. In at least one embodiment of the invention, the “tiered” nature of the contextualized communication scheme 500 provides an efficient, effective and/or enhanced mode of communication between the sender S and the receiver R that respects the constraints of the associated mobile computing devices.
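The tiered contexts behave like a small state machine, which might be sketched as follows. State names, event names and the transition table are assumptions for illustration; reference numerals from the description are noted in comments.

```python
# Sketch of the "tiered" contextualized communication scheme as a state
# machine; state and event names are illustrative assumptions.

TRANSITIONS = {
    "idle":       {"place_call": "voice_call"},
    "voice_call": {"start_unicast": "unicast",   # enter SWIS context 510
                   "hang_up": "idle"},           # call termination 524
    "unicast":    {"start_activity": "activity", # contextualized activity 514
                   "stop_unicast": "voice_call", # terminate 520 the unicast
                   "hang_up": "idle"},
    "activity":   {"stop_activity": "unicast",   # terminate 516 the activity
                   "hang_up": "idle"},
}

def next_context(context, event):
    """Events from either party drive transitions; unrecognized events
    leave the current context unchanged."""
    return TRANSITIONS.get(context, {}).get(event, context)
```

The nesting enforces the tiering described above: an activity is entered only from the unicast context, and the unicast only from an established voice call.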
As depicted in
User interaction with the contact button 624 of the in-call activity dial 614 may enable addition of (or switch to) a new call participant. User interaction with the unicast streaming media button 626 may initiate unidirectional streaming of media captured by a camera (not shown in
The GUI of the sender 702 may include a unicast streaming media activity menu 712. The unicast streaming media activity menu 712 may be transient and/or translucent. For example, the menu 712 may become visible and/or fully visible responsive to touching the display surface 706. The unicast streaming media activity menu 712 of the sender 702 may include a stop unicast button (as depicted in
At some later time, the user may interact with the communication client to initiate USM mode (step 814), for example, using the GUI. In response, the communication client may capture (step 816) and encode (step 818) media, for example, with the USM codec 212 (
While USM mode is active, a participant may further initiate activity sharing (step 902) that is contextualized by the streamed media. For example, the USM activities component 214 (
The USM codec 212 (
The Writer array maintenance thread 1004 may wait for an array event (step 1014). Example array events include initialization, array object dispatch, and/or a periodically generated timing signal. Responsive to array event occurrence, it may be determined (step 1016) whether a number of Writer objects in a Writer array is less than a target number (e.g., 10). If so, the thread 1004 may progress to step 1018 to instantiate and add one or more Writer objects to the array. Otherwise, the thread 1004 may return to step 1014 to wait for a next array event.
The Writer dispatch thread 1008 may wait for a timer signal (step 1020). For example, such a signal may be generated every second. Responsive to receiving the timer signal, a Writer object in the Writer array may be dispatched (step 1022). For example, a currently active and/or oldest Writer object in the Writer array may be signaled (e.g., with a Writer object API call) to finish capturing video to a file and to make the file ready for further processing (e.g., to “close” the file). At step 1024, the Writer dispatch thread 1008 may signal the video streamer thread 1012 that the file is ready for further processing. The thread 1008 may then return to step 1020 to wait for the next timer signal.
The video streamer thread 1012 may wait for dispatch signals (step 1026) such as signals from the Writer dispatch thread 1008. Responsive to receipt of such signals, the video streamer thread 1012 may extract streaming media from the file created by a Writer object (step 1028). For example, the file may be in QuickTime® format, and the USM codec 212 (
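The cooperation of the three threads described above can be approximated by the following single-threaded sketch of the rotation logic. Threading, synchronization and the platform's actual Writer API are omitted; the stand-in class and function names are assumptions.

```python
from collections import deque

TARGET_WRITERS = 10  # target Writer array size, per the description above

class FakeWriter:
    """Stand-in for a media-file Writer object; the real object captures
    encoded video to a file via the device's hardware encoder."""
    _next_id = 0
    def __init__(self):
        self.file_id = FakeWriter._next_id
        FakeWriter._next_id += 1
    def finish(self):
        # Close the capture file and return a handle for further processing.
        return f"segment-{self.file_id}.mov"

def maintain_pool(pool):
    """Writer array maintenance: top the array up to the target size."""
    while len(pool) < TARGET_WRITERS:
        pool.append(FakeWriter())

def on_timer(pool, ready_files):
    """Writer dispatch: on each timer tick (e.g., every second), finish
    the oldest Writer and hand its file to the streamer."""
    writer = pool.popleft()
    ready_files.append(writer.finish())
    maintain_pool(pool)

def stream_ready(ready_files, send):
    """Video streamer: extract media from each ready file and send it."""
    while ready_files:
        send(ready_files.pop(0))
```

Each tick thus yields roughly one second of captured media as a closed file, which the streamer converts into stream packets for transmission.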
An instant replay request corresponding to a “time shift” of x seconds may be received (step 1104), for example, responsive to user interaction with a GUI element. A corresponding “shifted” position in the replay queue may be determined (step 1106) at least in part by calculating an absolute time point (“replay start timepoint”) of x seconds ago by subtracting x from the current time, and then walking through the queue from the first packet backwards to find the first packet with a timestamp less than the replay start timepoint. The position of that packet in the queue (i.e., the shifted position) becomes the “on deck” packet to be played next (for example, if it is the third packet in the queue, then the on-deck value is 3). When the time shift is greater than 0, the on-deck value may be incremented every time a new packet is received and pushed onto the queue; in addition, whenever a packet is pushed onto the queue, a packet “get” operation may be performed to obtain the packet to be placed in the decoder pipeline. For example, the get operation may decrement the on-deck value and return the packet at the corresponding position in the queue (step 1108). Alternatively, or in addition, the get operation may be triggered by an independent timer.
When a request to return to live playing is received (step 1110), the on-deck value may be reset to zero (step 1112) and incoming packets may again be sent directly to the decoder. For example, the live play request may be received responsive to user interaction with a GUI element. Responsive to the live play request, a stream reset message may be sent (step 1114). For example, the unicast media stream may include sequences of encoded frames that depend on previous frames (e.g., video frames that contain only changes with respect to one or more previous frames) and periodic “key” frames that are independent of previous frames. The stream reset message may instruct the stream encoder to restart encoding with a new key frame. With respect to the requested time shift associated with an instant replay request, there may not be a key frame in the replay queue near the point where the instant replay should start. Consequently, the stream decoder may not be able to decode some number of encoded packets that are passed to it after instant replay starts until it “syncs” to the stream (e.g., encounters a key frame or some sufficient number of non-key frames). In accordance with at least one embodiment of the invention, a “spool up” time interval (e.g., several seconds) may be added to the “time shift” specified by the instant replay request to compensate. As part of step 1108, packets in the spool up interval (and in the replay queue before the queue position determined in step 1106) may be rapidly provided to the stream decoder. In accordance with at least one embodiment of the invention, the additional packets can significantly increase a chance that the stream decoder will be able to decode each of the frames in the requested instant replay window.
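The queue mechanics of steps 1104-1112 might be sketched as below. This is a simplified model: the class and method names are assumptions, the queue is kept newest-first, and the key-frame "spool up" compensation and stream reset message described above are omitted.

```python
import time

class ReplayQueue:
    """Sketch of the instant-replay buffer described above; packets are
    (timestamp, payload) pairs kept newest-first. Names are assumptions."""

    def __init__(self):
        self.queue = []   # index 0 is the most recently received packet
        self.on_deck = 0  # 1-based position of next packet to play; 0 = live

    def push(self, packet, decode):
        """Receive a packet; in live mode decode it immediately, otherwise
        shift the on-deck position and play the next replay packet."""
        self.queue.insert(0, packet)
        if self.on_deck > 0:
            self.on_deck += 1
            decode(self.get())
        else:
            decode(packet)

    def get(self):
        """Decrement the on-deck value; return the packet at that position."""
        self.on_deck -= 1
        return self.queue[self.on_deck]

    def request_replay(self, x, now=None):
        """Time-shift by x seconds: walk back from the newest packet to the
        first packet older than the replay start timepoint (step 1106)."""
        start = (now if now is not None else time.time()) - x
        for i, (timestamp, _) in enumerate(self.queue):
            if timestamp < start:
                self.on_deck = i + 1
                return
        self.on_deck = len(self.queue)  # shift past the whole buffer

    def go_live(self):
        """Return to live playing (step 1112)."""
        self.on_deck = 0
```

Because the on-deck value advances by one with each incoming packet, replayed playback proceeds forward in time at the live packet rate until the viewer returns to live.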
In accordance with at least some embodiments, the system, apparatus, methods, processes and/or operations for communication may be wholly or partially implemented in the form of a set of instructions executed by one or more programmed computer processors such as a central processing unit (CPU) or microprocessor. Such processors may be incorporated in an apparatus, server, client or other computing device operated by, or in communication with, other components of the system. As an example,
It should be understood that the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software.
Any of the software components, processes or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium, such as a random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM. Any such computer readable medium may reside on or within a single computational apparatus, and may be present on or within different computational apparatuses within a system or network.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and/or were set forth in its entirety herein.
The use of the terms “a” and “an” and “the” and similar referents in the specification and in the following claims is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “having,” “including,” “containing” and similar referents in the specification and in the following claims are to be construed as open-ended terms (e.g., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value inclusively falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation to the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to each embodiment of the present invention.
Different arrangements of the components depicted in the drawings or described above, as well as components and steps not shown or described are possible. Similarly, some features and subcombinations are useful and may be employed without reference to other features and subcombinations. Embodiments of the invention have been described for illustrative and not restrictive purposes, and alternative embodiments will become apparent to readers of this patent. Accordingly, the present invention is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.
This application claims the benefit of U.S. Provisional Application No. 61/542,674, filed Oct. 3, 2011, titled “Synchronous Real-time Communication With Media Contextualized Activity Sharing,” and having Attorney Docket No. 93795-821924 (000200US), the contents of which is hereby incorporated in its entirety by reference.
Number | Date | Country
61542674 | Oct 2011 | US