The disclosed subject matter relates to video communication systems that allow the ad-hoc integration of tablets and phones.
Subject matter related to the present application can be found in the following commonly assigned patents and/or patent applications: U.S. Pat. No. 7,593,032, entitled “System and Method for a Conference Server Architecture for Low Delay and Distributed Conferencing Applications”; International Patent Application No. PCT/US06/62569, entitled “System and Method for Videoconferencing using Scalable Video Coding and Compositing Scalable Video Servers”; International Patent Application No. PCT/US06/061815, entitled “Systems and methods for error resilience and random access in video communication systems”; International Patent Application No. PCT/US07/63335, entitled “System and method for providing error resilience, random access, and rate control in scalable video communications”; International Patent Application No. PCT/US08/50640, entitled “Improved systems and methods for error resilience in video communication systems”; International Patent Application No. PCT/US11/038,003, entitled “Systems and Methods for Scalable Video Communication using Multiple Cameras and Multiple Monitors”; International Patent Application No. PCT/US12/041,695, entitled “Systems and Methods for Improved Interactive Content Sharing in Video Communication Systems”; International Patent Application No. PCT/US09/36701, entitled “System and method for improved view layout management in scalable video and audio communication systems”; and International Patent Application No. PCT/US10/058,801, entitled “System and method for combining instant messaging and video communication systems.” All of the aforementioned related patents and patent applications are hereby incorporated by reference herein in their entireties.
Certain video communication applications allow the sharing of “content”. The term “content” as discussed herein can refer to or include any visual content that is not the video stream of one of the participants. Examples of content include the visual contents of a computer's screen—either the entire screen (“desktop”) or a portion thereof—or of a window where one of the computer's applications may be displaying its output.
Some systems use a “document camera” to capture such content. This camera can be positioned so that it can image a document placed on a table or special flatbed holder, and can capture an image of the document for distribution to all session participants. In modern systems, where computers are the primary business communication tool, the document camera can be replaced with a VGA input, so that any VGA video-producing device can be connected. In advanced systems, the computer can directly interface with the video communication system using an appropriate network or other connection so that it directly transmits the relevant content material to the session, without the need for conversion to VGA or other intermediate analog format.
On one end of the spectrum, content sharing may be completely passive (“passive content sharing”). In this scenario the video communication system encodes and transmits the content to the participants without providing the capability to modify it in any way. When content is driven by a computer, e.g., sharing a page of a document, it can be possible to show the cursor as well as any highlighting that is applied by the underlying software. This, however, is captured as imagery—it is not possible, in other words, for a remote participant to “take over” the cursor and perform remote editing of the document. This mode is used in many video communication applications.
On the other end of the spectrum there are distributed collaboration applications, such as shared whiteboards, sometimes referred to as “active content sharing.” In this scenario, users are able to collaboratively edit and view a document in a synchronized fashion. The complexity of building such systems can be significant, and they require specialized protocols and applications. Oftentimes, users are not able to use their favorite applications and are forced to use special, network-aware programs (typically of lower sophistication). Thus, video communication applications typically use passive rather than active content sharing.
Certain video communication systems that rely on the Multipoint Control Unit (MCU) architecture, such as those using the ITU-T Rec. H.323 standard, “Packet-based multimedia communications systems,” incorporated herein by reference in its entirety, also can support a single content stream. ITU-T Rec. H.239, “Role management and additional media channels for H.3xx-series terminals”, incorporated herein by reference in its entirety, defines mechanisms through which two video channels can be supported in a single H.323 session or call. The first channel can be used to carry the video of the participants, and the second can be used to carry a PC graphics presentation or video. For presentations in multipoint conferencing, H.239 defines token procedures to guarantee that only one endpoint in the conference sends the additional video channel, which is then distributed to all conference participants.
When an H.323 call is connected, signaling defined in ITU-T Rec. H.245, “Control protocol for multimedia communication”, incorporated herein by reference in its entirety, can be used to establish the set of capabilities for all connected endpoints and MCUs. When the set of capabilities includes an indication that H.239 presentations are supported, a connected endpoint can choose to open an additional video channel. The endpoint must request a token from the MCU, and the MCU can check whether another endpoint is currently sending an additional video channel. If so, the MCU can use token messages to make that other endpoint stop sending its additional video channel. The MCU can then acknowledge the token request from the first endpoint, which can then begin to send the additional video channel, which can contain, as an example, encoded video from a computer's video output at XGA resolution. Similar procedures can be defined for the case when two endpoints are directly connected to each other without an intermediate MCU.
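As an illustration, the token arbitration performed by the MCU can be sketched in Python as follows; the class, method, and message names are placeholders rather than actual H.245/H.239 message encodings, and the endpoint objects are assumed to expose a simple send() method.

class PresentationTokenArbiter:
    """Illustrative MCU-side arbitration of the single additional H.239
    video channel; message strings are placeholders, not H.245 PDUs."""

    def __init__(self):
        self.holder = None   # endpoint currently granted the presentation token

    def request_token(self, requester):
        # If another endpoint is sending the additional channel, ask it to stop.
        if self.holder is not None and self.holder is not requester:
            self.holder.send("presentationTokenRelease")   # placeholder message
        # Grant the token; the requester may now open the additional channel.
        self.holder = requester
        requester.send("presentationTokenAck")             # placeholder message

    def token_released(self, endpoint):
        if self.holder is endpoint:
            self.holder = None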
Certain video communication systems used for traditional videoconferencing can involve a single camera and a single display for each of the endpoints. High-end systems, for use in dedicated conferencing rooms, can feature multiple monitors. A second monitor is often dedicated to content sharing. When no such content is used, one monitor can feature the loudest speaker whereas another monitor can show some or all of the remaining participants. When only one monitor is available, video and content must be switched, or the screen must be split between the two.
Video communication systems that run on personal computers (or tablets or other general-purpose computing devices) can have more flexibility in terms of how they display both video and content, and can also become sources of content sharing. Indeed, any portion of the computer's screen can be indicated as a source of content and be encoded for transmission without any knowledge of the underlying software application (“screen dumping”, as allowed by the display device driver and operating system software). Inherent system architecture limitations, such as allowing only two streams (one video and one content) with H.300-series specifications, can prohibit otherwise viable operating scenarios (i.e., multiple video streams and multiple content streams).
So-called “telepresence” systems can convey the sense of “being in the same room” as the remote participant(s). In order to accomplish this goal, these systems can utilize multiple cameras as well as multiple displays. The displays and cameras can be positioned at carefully calculated locations in order to give a sense of eye-contact. Some systems involve three displays—left, center, and right—although configurations with only two or more than three displays are also available.
The displays can be situated in carefully selected positions in the conferencing room. Looking at each of the displays from any physical position at the conferencing room table can give the illusion that a remote participant is physically located in the room. This can be accomplished by matching the exact size of the person as displayed to the expected physical size of the subject if he or she were actually present at the perceived position in the room. Some systems go as far as matching the furniture, room colors, and lighting, to further enhance the lifelike experience.
In order to be effective, telepresence systems should offer very high resolution and operate with very low latency. For example, these systems can operate at high definition (HD) 1080p/30 resolutions, i.e., 1080 horizontal lines progressive at 30 frames per second. To minimize latency and packet loss, the systems can use dedicated multi-megabit networks and can operate in point-to-point or switched configurations (i.e., they avoid transcoding). Some video conferencing systems assume that each endpoint is equipped with a single camera, although they can be equipped with several displays. For example, in a two-monitor system, the active speaker can be displayed in the primary monitor, with the other participants shown in the second monitor in a matrix of smaller windows. A “continuous presence” matrix layout permits participants to be continuously present on the screen rather than being switched in and out depending on who is the active speaker. In a continuous presence layout for a large number of participants, when the size of the matrix is exhausted (e.g., 9 windows for a 3×3 matrix), participants can be entered and removed from the continuous presence matrix based on a least-recently-active audio policy.
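For purposes of illustration, such a least-recently-active eviction policy can be sketched in Python as follows; the class and method names are illustrative and are not taken from any cited system.

import time

class ContinuousPresenceLayout:
    """Sketch of a continuous-presence matrix that evicts the participant
    whose audio has been inactive the longest when the matrix is full."""

    def __init__(self, rows=3, cols=3):
        self.capacity = rows * cols
        self.last_active = {}   # participant id -> time of last audio activity

    def on_audio_activity(self, participant):
        if participant not in self.last_active and len(self.last_active) >= self.capacity:
            # Evict the least-recently-active participant from the matrix.
            stale = min(self.last_active, key=self.last_active.get)
            del self.last_active[stale]
        self.last_active[participant] = time.time()

    def visible_participants(self):
        return list(self.last_active)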
A similar configuration to the continuous presence layout is the “preferred speaker” layout, where one speaker (or a small set of speakers) can be designated as the preferred speaker and can be shown in a window that is larger than the windows of other participants (e.g., double the size).
The primary monitor can show the participants as in a single-monitor system, while the second monitor displays content (e.g., a slide presentation from a computer). In this case, the primary monitor can feature a preferred speaker layout as well, i.e., the preferred speaker can be shown in a larger size window, together with a number of other participants shown in smaller size windows.
Telepresence systems that feature multiple cameras can be designed so that each camera is assigned to its own codec. For example, a system with three cameras and three screens can use three separate codecs to perform encoding and decoding at each endpoint. These codecs can make connections to three counterpart codecs on the remote site, using proprietary signaling or proprietary signaling extensions to existing protocols.
The three codecs are typically identified as “left,” “right,” and “center.” The positional references discussed herein are made from the perspective of a user of the system; left, in this context, refers to the left-hand side of a user (e.g., a remote video conference participant) who is sitting in front of a camera(s) and is using the telepresence system. Audio, e.g., stereo, can be handled through the center codec. In addition to the three video screens, the telepresence system can include additional screens to display a “content stream” or “data stream,” that is, computer-related content such as presentations.
The Primary codec is responsible for audio handling.
Telepresence systems face certain challenges that may not be found in traditional videoconferencing systems. One challenge is that telepresence systems handle multiple video streams. A typical videoconferencing system only handles a single video stream, and optionally an additional “data” stream for content. Even when multiple participants are present, the MCU is responsible for compositing the multiple participants in a single frame and transmitting the encoded frame to the receiving endpoint. Certain systems address this in different ways. For example, the telepresence system can establish as many connections as there are video cameras (e.g., for a three-camera system, three separate connections are established), and provide mechanisms to properly treat these separate streams as a unit, i.e., as coming from the same location.
The telepresence system can also use extensions to existing signaling protocols, or use new protocols, such as the Telepresence Interoperability Protocol (TIP). TIP is currently managed by the International Multimedia Telecommunications Consortium (IMTC); the specification can be obtained from IMTC at the address 2400 Camino Ramon, Suite 375, San Ramon, Calif. 94583 or from the web site http://www.imtc.org/tip. TIP allows multiple audio and video streams to be transported over a single RTP (Real-time Transport Protocol, RFC 3550) connection. TIP enables the multiplexing of up to four video or audio streams in the same RTP session, using proprietary RTCP (RTP Control Protocol, defined in RFC 3550 as part of RTP) messages. The four video streams can be used for up to three video streams and one content stream.
In both traditional as well as telepresence system configurations, content handling is thus simplistic. There are inherent limitations of the MCU architecture, in both its switching and transcoding configurations. The transcoding configuration can introduce delay due to cascaded decoding and encoding, in addition to quality loss, and is thus problematic for a high-quality experience. Switching, on the other hand, can become awkward, such as when used between systems with a different number of screens.
Scalable video coding (‘SVC’), an extension of the well-known video coding standard H.264 that is used in certain digital video applications, is a video coding technique that is effective in interactive video communication. The bitstream syntax and decoding process are formally specified in ITU-T Recommendation H.264, and particularly Annex G. ITU-T Rec. H.264, incorporated herein by reference in its entirety, can be obtained from the International Telecommunication Union, Place des Nations, 1211 Geneva 20, Switzerland, or from the web site www.itu.int. The packetization of SVC for transport over RTP is defined in RFC 6190, “RTP payload format for Scalable Video Coding,” incorporated herein by reference in its entirety, which is available from the Internet Engineering Task Force (IETF) at the web site http://www.ietf.org.
Scalable video and audio coding has been used in video and audio communication using the Scalable Video Coding Server (SVCS) architecture. The SVCS is a type of video and audio communication server and is described in commonly assigned U.S. Pat. No. 7,593,032, entitled “System and Method for a Conference Server Architecture for Low Delay and Distributed Conferencing Applications”, as well as commonly assigned International Patent Application No. PCT/US06/62569, entitled “System and Method for Videoconferencing using Scalable Video Coding and Compositing Scalable Video Servers,” both incorporated herein by reference in their entirety. It provides an architecture that allows for high quality video communication with high robustness and low delay.
Commonly assigned International Patent Application Nos. PCT/US06/061815, entitled “Systems and methods for error resilience and random access in video communication systems,” PCT/US07/63335, entitled “System and method for providing error resilience, random access, and rate control in scalable video communications,” and PCT/US08/50640, entitled “Improved systems and methods for error resilience in video communication systems,” all incorporated herein by reference in their entireties, further describe mechanisms through which a number of features such as error resilience and rate control are provided through the use of the SVCS architecture.
In one example, the SVCS can receive scalable video from a transmitting endpoint and selectively forward layers of that video to receiving participant(s). In a multipoint configuration, and contrary to an MCU, this exemplary SVCS performs no decoding/composition/re-encoding. Instead, all appropriate layers from all video streams can be sent to each receiving endpoint by the SVCS, and each receiving endpoint is itself responsible for performing the composition for final display. Therefore, in the SVCS system architecture, all endpoints can have multiple stream support, because the video from each transmitting endpoint is transmitted as a separate stream to the receiving endpoint(s). Of course, the different streams can be transmitted over the same RTP session (i.e., multiplexed), but the endpoint should be configured to receive multiple video streams, and to decode and compose them for display. This is an important advantage for SVC/SVCS-based systems in terms of the flexibility of handling multiple streams.
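By way of example, the selective forwarding performed by an SVCS can be sketched in Python as follows; the stream, packet, and receiver attributes are assumptions made for this sketch rather than an actual SVCS interface.

def forward_packets(incoming_streams, receivers):
    # Selectively forward scalable layers; no decoding, composition, or
    # re-encoding takes place at the server.
    for receiver in receivers:
        for stream in incoming_streams:
            if stream.sender is receiver.endpoint:
                continue                      # do not echo a stream back to its source
            for packet in stream.packets:
                # Always forward the base layer; forward enhancement layers only
                # if the receiver's bandwidth/resolution profile allows it.
                if packet.layer == "base" or packet.layer in receiver.accepted_layers:
                    receiver.send(packet)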
In systems that use the SVC/SVCS architecture, content sharing can work as follows. The user interface of the endpoint application, which can run on a personal computer, can allow the user to select any existing application window for sharing with other participants. When such a window is selected, it can appear in a list of available “shares” in the user interface of the other users. To alert them to the new share if no share is currently shown in their window, the newly introduced share can be shown in a “preferred view” (i.e., larger size view) in the main application window together with the videos of the session participants (i.e., the same way as a video participant). Since the size of this view may be small, and at any rate smaller than the size of the typical application window, the user can double-click on it so that it “pops out” into its own window, thus allowing the user to freely resize it. In a room-based system with, for example, two monitors, the content can be shown on its own monitor; if only one monitor is available, the screen can be split between video windows and the content window.
When the shared content is viewed by one or more of the participants, the originating endpoint can encode and transmit the content in the same way that it does any other source of video. The video encoding and decoding can differ in order to accommodate the particular features of computer-generated imagery, but from a system perspective, the content stream is treated as any other video stream. Note that the same video encoder can be used for content as well, but with different tuning and/or optimization settings (e.g., lower frame rate, higher spatial resolution with finer quantization, etc.). The system can support multiple content shares per endpoint, even if it may be confusing for the end user to have multiple active content shares. The inherent multi-stream support of the SVCS architecture makes content handling a natural extension of video.
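As an illustrative sketch, the different tuning of the same encoder for camera video and for shared content can be captured in a small settings table in Python; the parameter names and values below are assumptions chosen to reflect the trade-offs mentioned above.

# Illustrative encoder profiles; parameter names and values are assumptions
# reflecting the different tuning of camera video versus shared content.
ENCODER_PROFILES = {
    "video": {                       # talking-head camera video
        "frame_rate": 30,
        "max_resolution": (1280, 720),
        "quantization": "coarser",   # favor motion smoothness over pixel fidelity
    },
    "content": {                     # screen or document sharing
        "frame_rate": 5,             # slides change infrequently
        "max_resolution": (1920, 1080),
        "quantization": "finer",     # keep text and line art crisp
    },
}

def settings_for(stream_type):
    return ENCODER_PROFILES[stream_type]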
Commonly assigned International Patent Application No. PCT/US11/038003, entitled “Systems and Methods for Scalable Video Communication using Multiple Cameras and Multiple Monitors,” incorporated herein by reference in its entirety, describes systems and methods for video communication using scalable video coding with multiple cameras and multiple monitors. In this case the architecture can be expanded to include multiple video displays and/or multiple sources for a particular endpoint.
Commonly assigned International Patent Application No. PCT/US12/041,695, entitled “Systems and Methods for Improved Interactive Content Sharing in Video Communication Systems,” incorporated herein by reference in its entirety, describes improved mechanisms for handling interactive content based on the concept of “grab and draw”. This concept allows an end-user to “grab” content that is currently being shared in a session, use it in an application component that allows annotation (e.g., “draw”), and initiate his or her own share of the grabbed, annotated content. Coupled with touch-sensitive displays, which allow both intuitive “grabbing” as well as direct annotation, this mechanism can be a very effective way of visual communication.
Considering that many users today carry with them smartphones (e.g., Apple iPhone) or tablets (e.g., Apple iPad) that are equipped with a touch-sensitive display, it is desirable to design systems and methods with which such devices can be integrated into video communication systems in a seamless fashion.
Systems and methods for ad-hoc integration of tablets and phones in video communication systems are disclosed herein. In one embodiment of the present disclosure, a user can initiate the connection of a tablet or phone by triggering the display of a QR code on the communication system display. The tablet or phone scans the displayed QR code and obtains information in order to connect to the conference and participate. In one embodiment, the tablet or phone is attached to the endpoint that triggered the ad-hoc integration, whereas in another embodiment the tablet or phone is attached to one of the servers that participate in the conference. Upon connection, the tablet or phone can display shared content in full virtual resolution, and can allow the user to perform sharing of annotated content. In yet another embodiment, the displayed QR code can be used to bring the communication system into a session that has been initiated by the tablet or phone.
Throughout the figures the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the disclosed subject matter will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments.
The present disclosure describes an audiovisual collaboration system that allows ad-hoc connection of portable devices such as smartphones and tablet computers to facilitate improved content interaction.
In one or more exemplary embodiments of the disclosed subject matter, the collaboration system can be integrated with a video communication system, which uses H.264 SVC and is based on the concept of the SVCS (see U.S. Pat. No. 7,593,032, previously cited). The same collaboration system can be used in legacy systems, including telepresence systems.
One or more embodiments of the disclosed subject matter can use the H.264 standard for encoding the video signals and the Speex scalable codec for audio. Speex is an open-source audio compression format; a specification is available at the Speex web site at http://www.speex.org. Some of the H.264 video streams can be encoded using single-layer AVC, whereas others can be encoded using its scalable extension SVC. Similarly, some of the Speex audio streams can contain only narrowband data (8 kHz), whereas others can contain narrowband, as well as, or separately, wideband (16 kHz) or ultra-wideband (32 kHz) audio. Alternate scalable codecs can be used, including, for example, MPEG-4/Part 2 or H.263++ for video, or G.729.1 (EV) for audio. The Network 202 can be any packet-based network; e.g., an IP-based network, such as the Internet.
In one or more embodiments of the disclosed subject matter, the Receiver 201 and the Sender 221 are each a general-purpose computer, such as a PC or Apple computer (desktop, laptop, tablet, etc.), running a software application. They can also be dedicated computers engineered to run only the single software application, for example, using embedded versions of commercial operating systems, or even standalone devices engineered to perform the functions of the receiving and sending application, respectively. The receiving software application can be responsible for communicating with the server(s), for establishing connections, and/or for receiving, decoding, and displaying or playing back received video, content, and/or audio streams. The sending application, or the same receiving application for systems that are both senders and receivers, can also transmit back to a server its own encoded video, content, and/or audio stream.
Transmitted streams can be the result of real-time encoding of the output of one or more cameras and/or microphones attached to Sender 221, or they can be pre-coded video and/or audio stored locally on the Sender 221 or on a data source either accessible from the Sender 221 over the Network 202, or directly attached to it. For content streams, the source material can be obtained directly from a computer screen, through an intermediate analog or digital format (e.g., VGA), or it can be produced by a camera (e.g., a document camera). Other means of obtaining visual content are also possible as persons skilled in the art will recognize.
In one or more embodiments, the Sender 221 can be equipped with a connected camera and/or microphone, and can encode and transmit the produced video and audio signals to other participants via a Server 2 220 over a Stream 2. The Sender 221 can also produce one or more content streams that are similarly transmitted to the Server 220 over the same Stream 2.
In accordance with the SVCS architecture, a Receiver can compose the decoded video streams (as well as any content streams) received from the Server(s) on its display, and can mix and play back the decoded audio streams. Traditional multi-point video servers such as transcoding MCUs can perform this function on the server itself, either once for all receiving participants, or separately for each receiving participant.
The operation of the Servers 210 and 220 is further detailed in the accompanying figures.
In one or more embodiments of the disclosed subject matter, scalable coding can be used for the video, content, and audio signals. The video and content signals can be coded, e.g., using H.264 SVC with three layers of temporal scalability and two layers of spatial scalability, with a ratio of 2 between the horizontal and/or vertical picture dimensions of the base and enhancement layers (e.g., QVGA and VGA).
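For purposes of illustration, the resulting spatiotemporal operating points can be enumerated with the Python sketch below; the QVGA base resolution matches the example above, while the 7.5 frames-per-second base rate is an assumption chosen so that three dyadic temporal layers reach 30 frames per second.

def layer_configuration(base_width=320, base_height=240,
                        spatial_layers=2, temporal_layers=3, base_fps=7.5):
    # Enumerate the operating points of an H.264 SVC stream with a spatial
    # ratio of 2 per dimension (QVGA base, VGA enhancement) and dyadic
    # temporal layering (7.5, 15, 30 frames per second).
    points = []
    for s in range(spatial_layers):
        width, height = base_width * 2 ** s, base_height * 2 ** s
        for t in range(temporal_layers):
            points.append({"spatial_layer": s, "temporal_layer": t,
                           "resolution": (width, height), "fps": base_fps * 2 ** t})
    return points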
Each of the senders, Sender 1 331, Sender 2 332, and Sender 3 333 can be connected to the Server 320, through which the sender can transmit one or more media streams—audio, video and/or content. Each of the senders, Sender 1 331, Sender 2 332, and Sender 3 333 also can have a signaling connection with Server 320 (labeled ‘SIG’). The streams in each connection are labeled according to: 1) the type of signal, i.e., A for audio, V for video, and C for content; and 2) the layers present in each stream, B for base and E for enhancement.
The Server 320 can be connected to the Receiver 310; packets of the different layers from the different streams can be received by the Server 320, and can be selectively forwarded to the Receiver 310. Although there may be a single connection between the Server 320 and the Receiver 310, those skilled in the art will recognize that different streams can be transmitted over different connections (including different types of networks). In addition, there need not be a direct connection between such elements (i.e., one or more intervening elements can be present).
The spatiotemporal picture prediction structure used in one or more embodiments of the disclosed subject matter is illustrated in the accompanying figures.
As discussed above, the SVCS system architecture is inherently multi-stream, since each system component must be able to handle multiple streams of each type. Significantly, the actual composition of video and/or mixing of audio typically occurs at the receivers.
An exemplary screen layout with content sharing is depicted in the accompanying figures.
When the share is activated, all participants can start receiving an additional content window displaying the shared content.
Since users can simultaneously share application windows, and a user can share more than one window, a mechanism can be provided to select which share a user views. This can be performed in an exemplary system via the “Toggle” button in the user interface.
In one or more embodiments of the disclosed subject matter, the video communication system can feature an interactive content sharing unit, as described in commonly assigned International Patent Application No. PCT/US12/041,695, entitled “Systems and Methods for Improved Interactive Content Sharing in Video Communication Systems,” previously cited. More specifically, in one or more embodiments of the disclosed subject matter, the system can use a touch-screen All-In-One (AIO) personal computer that can run a content-sharing-only videoconferencing client (i.e., the system does not have a camera and/or microphone connected). The touch screen display can act as a whiteboard. During normal operation, it can show the data share of the conference. As explained above, this can be accomplished by encoding, at the originating participant, a window of the computer's screen and distributing it to all other participants as with regular video streams. In the same or another embodiment of the disclosed subject matter, the content sharing window can originate from an H.239 client, or any other visual communication protocol. The image shown on the touch screen can also be a regular video stream showing one or more of the participants.
The touch screen can allow a user to touch the screen, thus “grabbing” the image. When doing so, the system can take a snapshot of the content currently displayed on the share window, and create a new share. In one embodiment of the present disclosure, the new share can be shown in a window that features a whiteboard application, through which the user can annotate the snapshot image. In one or more embodiments of the present disclosure, whiteboard annotations can include, for example, selecting different colors from a palette, drawing on the snapshot, or clearing all annotation.
The snapshot image with which the whiteboard interaction starts can be the image previously shared by the other party. The whiteboard image, including any annotations, can be shared with other session participants as any other window share.
In one or more embodiments of the disclosed subject matter, a user can be equipped with a portable device such as, for example, a smartphone or tablet; the device will be referred to herein as an Ad-Hoc Unit (AHU). In one or more embodiments of the present invention, the AHU can be equipped with a touch screen and a camera.
In order to allow the AHU to connect to an on-going communication session, two attachment architectures can be used: an endpoint-based embodiment and a server-based embodiment. In the endpoint-based embodiment, the AHU is attached to the endpoint, whereas in the server-based embodiment, the AHU connects to the server. Both embodiments are described in detail in the following.
Node 760 is a special node in that it includes a touch screen 766 instead of a regular monitor 720. The touch screen 766 can be connected to its Node Unit 765 via a video connection 764, but it also can have a second connection 762 that can provide information about the touch screen status. The second connection 762 can be, for example, a USB or Bluetooth connection, or any other suitable connection. It is also possible that the two connections are over the same physical connection (e.g., over a Thunderbolt connection). In one or more embodiments of the disclosed subject matter, the Content Sharing Node unit can have a regular monitor (e.g., monitor 720) and can be equipped with a pointing device such as a mouse (not shown).
The endpoint 700 can participate in a video communication session like a regular endpoint, as described in International Patent Application No. PCT/US11/038,003 (previously cited). Further, as explained in International Patent Application No. PCT/US12/041,695 (previously cited), the touch screen 766 can allow the user to touch a point on the screen (e.g., a “grab” button) and can instruct the system to “grab” a snapshot of the image currently shown on the screen. The Node 760 can allow the user to annotate the grabbed content (e.g., “draw”), which at the same time can be shared with all other participants as any other type of shared content. This process is referred to as “grab and draw”.
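As an illustration, the grab-and-draw interaction can be sketched in Python as follows; the display and session interfaces are hypothetical stand-ins for the Node Unit's actual components.

class GrabAndDraw:
    """Sketch of the grab-and-draw interaction: snapshot the currently
    displayed share, annotate it locally, and publish it as a new share."""

    def __init__(self, display, session):
        self.display = display        # touch screen showing the current content share
        self.session = session        # communication session (hypothetical interface)
        self.snapshot = None
        self.annotations = []

    def grab(self):
        self.snapshot = self.display.current_frame()   # freeze what is on screen
        self.session.start_share(source=self)          # the annotated snapshot becomes a new share

    def draw(self, stroke, color="red"):
        self.annotations.append((stroke, color))       # annotation overlaid on the snapshot

    def clear(self):
        self.annotations = []

    def render(self):
        # Composite of snapshot and annotations; this image is what gets
        # encoded and transmitted like any other content stream.
        return {"background": self.snapshot, "overlay": list(self.annotations)}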
An AHU has all the characteristics of a Content Sharing Node or an endpoint that implements the grab and draw functionality. AHUs generally feature general-purpose computational processing capabilities, coupled with a touch-sensitive display and typically a camera. Most commercial videoconferencing systems today, in fact, feature support for popular smartphone architectures and tablets. An AHU, however, may not be configured to be a permanent component of a collaboration system. It may thus be necessary to have a mechanism through which the AHU can become a temporary component of the collaboration system so that it can be used in an on-going session. Before describing exemplary mechanisms for such ad-hoc integration, we first describe the two different embodiment architectures.
We now examine the process for the ad-hoc attachment of the AHU to a system, in accordance with the principles of the disclosed subject matter. A key objective is that no prior configuration should be necessary, with the exception of, for example, installing the appropriate client software on the AHU. A second key objective is that the process should require no modification to the operating procedures of existing systems.
The process of ad-hoc attachment in one or more embodiments of the present invention is initiated by the end user on the main system, e.g., Endpoint 820a or Endpoint 820b. The end user can, for example, press a button on the user interface (e.g., labeled “Add Phone/Tablet”). Upon initiation, in one embodiment of the disclosed subject matter, the system can display a QR (Quick Response) Code on one of its display monitors. QR Codes are two-dimensional bar codes and are described in U.S. Pat. No. 5,726,435, “Optically readable two-dimensional code and method and apparatus using the same”, as well as International Standard ISO/IEC 18004:2000, “Information Technology: Automatic identification and data capture techniques. Bar code symbology. QR Code.” QR codes can store fairly large amounts of data in different formats. The actual capacity can depend on the data type (mode, or input character set), version (1 through 40, indicating the overall dimensions of the symbol) and/or error correction level. For example, for 40-L symbols (version 40, error correction level L) a QR Code can store 7,089 characters of numeric only data or 4,296 alphanumeric (including punctuation) characters, whereas for binary data the QR Code can store 2,953 bytes. Such capacity can be used to encode in a single QR code information about how to connect to an on-going communication session, appropriate authentication credentials such as a login and password and/or other pertinent information.
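By way of example, an endpoint could generate such a QR code with the open-source Python “qrcode” package (which uses Pillow for image output); the payload string below is a hypothetical example of connection information.

import qrcode

# Hypothetical connection information to be handed to the phone/tablet.
join_info = "http://www.example.com/join?session=HQ&user=guest"

qr = qrcode.QRCode(
    version=10,                                   # 57x57 modules, as discussed below
    error_correction=qrcode.constants.ERROR_CORRECT_L,
)
qr.add_data(join_info)
qr.make(fit=True)                                 # allow a larger version if the data requires it
qr.make_image().save("join_session_qr.png")       # image shown on the endpoint's monitor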
Although QR codes are used in all examples indicated in the present disclosure, other types of visually identifiable codes are also possible, such as traditional one-dimensional bar codes, as well as other two-dimensional codes including High Capacity Color Barcodes (developed by Microsoft), MaxiCode (used by United Parcel Service), PDF417 (developed by Symbol Technologies), etc. It is also possible that multiple such codes are displayed at the same time, providing duplicate or complementary information.
In one or more embodiments of the disclosed subject matter, a QR Version 10 (57×57) code is used, capable of encoding up to 174 characters. In one or more embodiments of the disclosed subject matter, the encoded data can be in the form of a URL coupled with appropriate arguments. The URL can point to a server from which the AHU can download the client software to participate in the type of conference session that is being run, whereas the arguments to the URL can be information with which the downloaded software will be configured, and which will allow it to connect to the conference. One example of an encoded string is: “http://www.abc.com/ahu.php?e=192.168.1.5&s=HQ&u=guest&p=test”. In this example, the URL can instruct the AHU to connect to the server www.abc.com using the standard HTTP protocol, and execute the script “ahu.php” with the arguments “e=192.168.1.5” to indicate the IP address of the endpoint, “s=HQ” to indicate the name of the conference session to join, and “u=guest” with “p=test” to provide a corresponding user name and password. The “ahu.php” script can package these parameters together with the executable program, which can then be fed back to the AHU.
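For purposes of illustration, the AHU-side software could recover these parameters from the scanned string using Python's standard urllib.parse module.

from urllib.parse import urlparse, parse_qs

# The example string from the text, as scanned from the QR code.
scanned = "http://www.abc.com/ahu.php?e=192.168.1.5&s=HQ&u=guest&p=test"

query = urlparse(scanned).query
args = {name: values[0] for name, values in parse_qs(query).items()}

endpoint_address = args["e"]   # "192.168.1.5": endpoint (or server) to attach to
session_name     = args["s"]   # "HQ": conference session to join
user_name        = args["u"]   # "guest"
password         = args["p"]   # "test"

# The downloaded (or pre-installed) client software would then use these
# values to register with the endpoint or server and join the session.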
In another embodiment of the disclosed subject matter the QR coded string can be read directly by pre-loaded conferencing software that can exist on the AHU. The QR coded string would then only need to provide information about the system to connect to and/or other session parameters. Use of URLs to connect to conference sessions is described in commonly assigned International Patent Application No. PCT/US10/058,801, entitled “System and method for combining instant messaging and video communication systems,” incorporated herein by reference in its entirety.
Depending on the attachment architecture used, i.e., endpoint-based or server-based, the QR coded data may need to provide information on the network address of the device to attach to, as well as session naming information and/or login credentials. A time window can be encoded into the URL during which the URL can be considered valid. This can ensure that a code can only be used over a set period of time. In one or more embodiments of the disclosed subject matter, the conferencing system can generate pre-authorization tokens that can be embedded in the URL, which can also have a specific period of validity. This allows the system to unilaterally revoke the validity of a given URL.
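As an illustrative sketch, a time window and a pre-authorization token could be embedded in the URL as shown below in Python; the “exp” and “tok” parameter names and the HMAC-based signing scheme are assumptions describing one possible realization, not a mechanism mandated by the disclosure.

import hashlib, hmac, time
from urllib.parse import urlencode

SECRET_KEY = b"server-side-secret"   # known only to the conferencing system

def make_join_url(endpoint_address, session_name, valid_for_seconds=300):
    # Embed an expiry time and an HMAC token so the system can reject
    # stale codes and unilaterally revoke URLs by rotating the key.
    expires = int(time.time()) + valid_for_seconds
    payload = f"{endpoint_address}|{session_name}|{expires}".encode()
    token = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    query = urlencode({"e": endpoint_address, "s": session_name,
                       "exp": expires, "tok": token})
    return "http://www.abc.com/ahu.php?" + query

def is_valid(endpoint_address, session_name, expires, token):
    payload = f"{endpoint_address}|{session_name}|{expires}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token) and time.time() < int(expires)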
The software of the AHU in one or more embodiments of the present disclosure allows the AHU to display shared content at full resolution by letting the user pan the image on the typically small AHU screen. In this way, a user can view high-resolution imagery on a relatively small screen. Some phones and tablets allow easy zooming in and out using multi-touch gestures (e.g., pinching). In one or more embodiments of the disclosed subject matter, the AHU software also can allow images that are locally stored on the AHU to become sources of shared content in the conferencing session. In one or more embodiments of the present disclosure, the AHU can offer full capability for annotating the content, thus enabling full support for features such as grab and draw at the AHU.
The steps of an exemplary process of attaching an AHU to a conferencing session in accordance with the principles of the disclosed subject matter are shown in the accompanying figures.
If the AHU is not equipped with a camera, the conferencing system can display a shortened URL on the screen, using the services of, for example, bit.ly. The end-user can manually type the URL string on the AHU. In both this technique and the QR Code technique, the user who owns the AHU does not need to have an account on the conferencing system.
In another embodiment, where the system is assumed to know the user's email address, e.g., because he or she is logged into the main conferencing system, the conferencing system can email the URL to the user. The user can access the email message on the AHU and click on the emailed URL. If a different user owns the AHU, the logged-in user can easily forward the URL to the email address of the actual AHU owner.
In the preceding description it has been assumed that the conference session is set up by a conferencing system endpoint, and that the AHU is brought into the conference as an ancillary device. The exact same mechanisms, however, can be used in the reverse direction: bringing a conferencing system endpoint into a session that has been initiated by the AHU. The growing use of phones and tablets as stand-alone conferencing system endpoints makes them likely candidates for initiating sessions by individual users. Contrary to room-based endpoints, these portable devices are usually owned and configured by the users themselves, and are thus more convenient for initiating conferencing sessions.
In embodiments where the AHU is the main conferencing device, and where the conferencing system endpoint is brought into an existing conference, the server-based AHU attachment architecture described above can be used.
In order to bring a conferencing system endpoint such as Room System 830 into a conferencing session in which an AHU 821/822 is participating, nearly the same steps described above can be followed.
While the ad-hoc integration process is described herein in the context of videoconferencing systems, it is obvious to persons skilled in the art that the same techniques can be applied to audioconferencing systems or, indeed, web-only conferencing systems. Furthermore, the AHU may feature full video and audio communication capability.
The methods for ad-hoc integration of tablets and phones in video communication systems described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. The computer software can be encoded using any suitable computer language. The software instructions can be executed on various types of computers, for example, the exemplary computer system 1000 described below.
Computer system 1000 includes a display 1032, one or more input devices 1033 (e.g., keypad, keyboard, mouse, stylus, etc.), one or more output devices 1034 (e.g., speaker), one or more storage devices 1035, and various types of storage media 1036.
The system bus 1040 links a wide variety of subsystems. As understood by those skilled in the art, a “bus” refers to a plurality of digital signal lines serving a common function. The system bus 1040 can be any of several types of bus structures including a memory bus, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Enhanced ISA (EISA) bus, the Micro Channel Architecture (MCA) bus, the Video Electronics Standards Association local bus (VLB), the Peripheral Component Interconnect (PCI) bus, the PCI-Express (PCIe) bus, and the Accelerated Graphics Port (AGP) bus.
Processor(s) 1001 (also referred to as central processing units, or CPUs) optionally contain a cache memory unit 1002 for temporary local storage of instructions, data, or computer addresses. Processor(s) 1001 are coupled to storage devices including memory 1003. Memory 1003 includes random access memory (RAM) 1004 and read-only memory (ROM) 1005. As is well known in the art, ROM 1005 acts to transfer data and instructions uni-directionally to the processor(s) 1001, and RAM 1004 is used typically to transfer data and instructions in a bi-directional manner. Both of these types of memories can include any of the computer-readable media described below.
A fixed storage 1008 is also coupled bi-directionally to the processor(s) 1001, optionally via a storage control unit 1007. It provides additional data storage capacity and can also include any of the computer-readable media described below. Storage 1008 can be used to store operating system 1009, EXECs 1010, application programs 1012, data 1011, and the like, and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It should be appreciated that the information retained within storage 1008 can, in appropriate cases, be incorporated in standard fashion as virtual memory in memory 1003.
Processor(s) 1001 is also coupled to a variety of interfaces such as graphics control 1021, video interface 1022, input interface 1023, output interface 1024, storage interface 1025, and these interfaces in turn are coupled to the appropriate devices. In general, an input/output device can be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, or other computers. Processor(s) 1001 can be coupled to another computer or telecommunications network 1030 using network interface 1020. With such a network interface 1020, it is contemplated that the CPU 1001 might receive information from the network 1030, or might output information to the network in the course of performing the above-described method. Furthermore, method embodiments of the present disclosure can execute solely upon CPU 1001 or can execute over a network 1030 such as the Internet in conjunction with a remote CPU 1001 that shares a portion of the processing.
According to various embodiments, when in a network environment, i.e., when computer system 1000 is connected to network 1030, computer system 1000 can communicate with other devices that are also connected to network 1030. Communications can be sent to and from computer system 1000 via network interface 1020. For example, incoming communications, such as a request or a response from another device, in the form of one or more packets, can be received from network 1030 at network interface 1020 and stored in selected sections in memory 1003 for processing. Outgoing communications, such as a request or a response to another device, again in the form of one or more packets, can also be stored in selected sections in memory 1003 and sent out to network 1030 at network interface 1020. Processor(s) 1001 can access these communication packets stored in memory 1003 for processing.
In addition, embodiments of the present disclosure further relate to computer storage products with a computer-readable medium that have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. Those skilled in the art should also understand that the term “computer readable media” as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.
As an example and not by way of limitation, the computer system having architecture 1000 can provide functionality as a result of processor(s) 1001 executing software embodied in one or more tangible, computer-readable media, such as memory 1003. The software implementing various embodiments of the present disclosure can be stored in memory 1003 and executed by processor(s) 1001. A computer-readable medium can include one or more memory devices, according to particular needs. Memory 1003 can read the software from one or more other computer-readable media, such as mass storage device(s) 1035 or from one or more other sources via communication interface. The software can cause processor(s) 1001 to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in memory 1003 and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Reference to software can encompass logic, and vice versa, where appropriate. Reference to a computer-readable media can encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of the disclosed subject matter. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosed subject matter and are thus within its spirit and scope.
This application claims priority to U.S. Provisional App. Ser. No. 61/668,567, titled “Systems and Methods for Ad-Hoc Integration of Tablets and Phones in Video Communication Systems,” filed Jul. 6, 2012, the disclosure of which is incorporated herein by reference in its entirety.