The present embodiments relate generally to digital conferences, such as video conferences, audio conferences, or both video and audio conferences.
A digital conference may be a conference that allows two or more conferencing devices to interact via two-way video and/or audio transmissions. Digital conferencing uses telecommunications of audio and/or video to bring people at different sites together for a meeting. This may include a conversation between two people in private offices (e.g., point-to-point) or involve several sites (e.g., multi-point) with more than one person in large rooms at different sites. Besides the audio and visual transmission of meeting activities, videoconferencing can be used to share documents, computer-displayed information, and whiteboards.
The present embodiments relate to digital conferences. Digital conferences may include video conferences, audio conferences, or both video and audio conferences. However, other technology may be included in the digital conferences, such as document sharing, computer-displayed information, and whiteboards. The present embodiments relate to video conferences in which a mobile device is used as a video conferencing system. The mobile device may be, for example, a small screen mobile device, such as a cellular telephone, smart phone, personal digital assistant, book reader, or electronic tablet. In one embodiment, a video signal may be adjusted to correspond to a display device of the mobile device. The resolution, size, bandwidth, frame rate, and/or focus of the video signal may be adjusted. For example, the video signal may be adjusted to focus on and optimize the display of the face of the speaker that is presently speaking in the video conference. In another embodiment, a video signal may be selected and displayed based on conference participant input, the conference participant speaking or scheduled to speak, or a time interval.
Adjusting a video signal to correspond to a display device of a mobile device may be beneficial because video conference systems generally provide a full size (e.g., a “life size”) image on a screen appropriate for room based systems. For a conference participant using a video capable mobile device, rendering the full size image onto a small screen is of limited use. The present embodiments relate to adjusting the full size image to fit or correspond to a display device of the video capable mobile device. Adjusting the full size image may include adjusting the size (e.g., shrinking) of the full size image, adjusting the resolution, and/or focusing on one or more portions of the full size image. Focusing may include cropping or clipping the full size image, which may involve removing the background of the full size image. The cropped image may focus on a video conference participant's face. For example, focus may be on the video conference participant that is speaking or is scheduled to speak, allowing the video conference participant using the video capable mobile device to view a close up image or video of the video conference participant speaking.
Selecting and displaying a video signal may be beneficial because the video conference may include multiple video conference participants and a video conference participant may want or need to scroll through close up images or video of the conference participants in the video conference.
In one aspect, a method may be performed by a conferencing gateway. The method includes receiving a video signal at a conferencing gateway, the video signal being received as input to one or more conferencing devices that are used to participate in a video conference, adjusting the video signal to conform to a mobile conferencing device specification to optimize viewing on a mobile conferencing device, and transmitting the adjusted conferencing signal to the mobile conferencing device for display on a display device of the mobile conferencing device.
In a second aspect, computer readable storage media may include logic that is executed by a processor to receive one or more video signals, the one or more video signals being output to one or more conferencing devices that are used to participate in a video conference, select a video signal based on a conference context, adjust the selected video signal to conform to a display device of a mobile conferencing device and the conference context, transmit the adjusted video signal to the mobile conferencing device for display on the display device of the mobile conferencing device, and transmit the one or more video signals to the one or more conferencing devices.
In a third aspect, a system includes a video conferencing device configured to generate a video signal, a conference gateway configured to receive the video signal and adjust the video signal to conform to a mobile conferencing device specification, and a mobile conferencing device configured to receive the adjusted video signal from the conference gateway and present the adjusted video signal on a display.
The networks 102-106 may be telecommunication networks, digital networks, wireless networks, wired networks, radio networks, Internet networks, intranet networks, Transmission Control Protocol (TCP)/Internet Protocol (IP) networks, Ethernet networks, packet-based networks, fiber optic networks, telephone networks, cellular networks, computer networks, public switched telephone networks, or any other now known or later developed networks. Example telecommunication networks may include wide area networks, local area networks, virtual private networks, peer-to-peer networks, and wireless local area networks. The networks 102-106 may be operable to transmit messages, communication, information, or other data to and/or from the server 150.
The conferencing devices 110-140 may be owned, operated, managed, controlled, viewed, programmed, or otherwise used by one or more users. For example, in one embodiment, as shown in
The conferencing devices 110-140 may be public switched telephones, cellular telephones, personal computers, personal digital assistants, mobile devices, electronic tablets, remote conferencing systems, small-screen devices, large-screen devices, video conferencing systems, or other devices that are operable to participate in video conferences.
For example, in one embodiment, the conferencing device 110 may be a video-enabled cellular telephone, such as an iPhone® sold by Apple, Inc. or an HTC Fuze® sold by HTC, Inc. The video-enabled cellular telephone may be operable to stream video from the server 150. The video-enabled cellular telephone may include a video camera 116, which may or may not be used during a video conference.
In an example embodiment, the conferencing device 120 may be a telepresence system, such as the Cisco TelePresence System 3000 sold by Cisco, Inc. The Cisco TelePresence System 3000 is an endpoint for group meetings, creating an environment for multiple people to meet in one location, and to be “virtually” joined by additional people. In one embodiment, the Cisco TelePresence System 3000 integrates three 65-inch plasma screens and a specially designed table that seats six participants on one side of the “virtual table.” The Cisco TelePresence System 3000 may support life-size images with ultra-high-definition video and spatial audio. A multipoint meeting can support many locations on a single call. The Cisco TelePresence System 3000 may include one or more cameras, a lighting array, microphones, and speakers. Cisco TelePresence System 3000 allows participants to see and hear each conference participant.
The conferencing devices 110-140 may include a display device 112, an input device 114, and a video camera 116. Additional, different, or fewer components may be provided. For example, in one embodiment, the video camera 116 is not provided or is simply not used. As discussed below, the conferencing device 110 may be a cellular telephone that includes a video camera 116, but because the video camera 116 is located on the opposite side of the telephone from the display device 112, the video camera 116 may or may not be used during a video conference. In another embodiment, a wireless communication system may be provided. The wireless communication system may be operable to communicate via a wireless network.
The display device 112 may be a cathode ray tube (CRT), monitor, flat panel, touch screen, a general display, liquid crystal display (LCD), projector, printer or other now known or later developed display device for outputting information. The display device 112 may be operable to display one or more images, text, video, graphic, or data. Additional, different, or fewer components may be provided. For example, multiple displays and/or speakers may be provided.
As shown in
The display device 112 may be a small screen or a large screen. A small screen may be sized to display only one or only a few (e.g., 2, 3, or 4) conference images 116. For example, the display device 112 of the conferencing system 110 may be a small screen display device. In contrast to the display device of the conferencing system 120, which may be a projection screen sized to display a plurality of images, the display device 112 may only be large enough to display a single conference image 116. For example, the small screen display device 112 may only be large enough to display a video signal from a single user. A large screen may have one or more display devices that are sized to display a plurality of conference images. For example, the conferencing device 120 may be sized to display a conference image 116 of all or some of the users participating in the video conference, for example, Users U1, U3, and U4. A small screen may be able to display multiple images or a single image combined from multiple cameras. However, the size may result in undesired resolution or detail being shown for an image or images displayed at the same time on the small screen.
Example sizes of small screen display devices may include approximately 0.5-24 inches. In one embodiment, the size of a small screen display device is less than 8 inches. Example sizes of large screen display devices may include approximately 12 inches to 8 feet. In one embodiment, the size of a large screen display device is 60 inches.
The input device 114 may be a user input, network interface, external storage, other device for providing data to the server 150, or a combination thereof. Example user inputs include mouse devices, keyboards, track balls, touch screens, joysticks, touch pads, buttons, knobs, sliders, combinations thereof, or other now known or later developed user input devices. The user input may operate as part of a user interface. For example, one or more buttons may be displayed on a display. The user input is used to control a pointer for selection and activation of the functions associated with the buttons. The input device 114 may be a hard-wired or wireless network interface. For example, the input device 114 may be coupled with the networks 102-108 to receive data from one or more communication devices. For example, the conferencing devices 110-140 may be controlled from a remote location. A universal asynchronous receiver/transmitter (UART), a parallel digital interface, a software interface, Ethernet, or any combination of known or later developed software and hardware interfaces may be used. The network interface may be linked to various types of networks, including a local area network (LAN), a wide area network (WAN), an intranet, a virtual private network (VPN), and the Internet. The input device 114 may include a telephone keypad. The telephone keypad may include keys that produce dual-tone multi-frequency (DTMF) tones and may be referred to as “DTMF keys.” For example, DTMF keys 2, 4, 6, and 8 may be used as arrows for providing input.
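As an illustration of the DTMF keys acting as arrows, the following minimal sketch maps key presses to a scroll through conference participants. The direction assigned to each key follows the usual keypad layout and is an assumption, as are the names DTMF_ARROWS and next_participant; the embodiments do not prescribe a particular mapping.

```python
# Hypothetical mapping of DTMF keys to arrow directions (assumed from keypad layout).
DTMF_ARROWS = {"2": "up", "4": "left", "6": "right", "8": "down"}


def next_participant(current_index, key, participant_count):
    """Return the index of the participant to display after a key press."""
    direction = DTMF_ARROWS.get(key)
    if direction in ("right", "down"):
        return (current_index + 1) % participant_count  # scroll forward, wrapping
    if direction in ("left", "up"):
        return (current_index - 1) % participant_count  # scroll backward, wrapping
    return current_index  # any other key leaves the selection unchanged


# Example: a participant viewing the first of three images presses "6".
selected = next_participant(0, "6", 3)
```

Wrapping around at either end lets a participant cycle through every conference image using only two keys.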
The server 150 may be a DSP/video gateway, central server, telepresence server, Web server, video conferencing server, secure server, internal server, conferencing server, personal computer, or other device or system operable to support a video conference. The server 150 may be configured or programmed to support video conferencing. Video conferencing uses telecommunications of audio and video to bring the Users U1-U4, which may be at the same or different sites, together for a meeting. Video conferencing may include a conversation between two people in private offices (point-to-point) or involve several sites (multi-point) with more than one person in large rooms at different sites. Besides the audio and visual transmission of meeting activities, videoconferencing can be used to share documents, computer-displayed information, and whiteboards. More than one server 150 may be used for a given video conference.
Supporting a video conference may include establishing, setting up, joining, and/or maintaining a connection to a video conference. As shown in
The server 150 may receive one or more conferencing signals from the conferencing devices 110-140. A conferencing signal may include an audio signal and a video signal. Alternately, the video signal may include audio. The video signal may include one or more conference images 116. For example, the server 150 may receive a conference image 116 of User U1 from the conferencing device 110; a conference image 116 of User U2 from the conferencing device 120; a conference image 116 of User U3 from the conferencing device 130; and a conference image 116 of User U4 from the conferencing device 140.
The server 150 may transmit one or more conference signals, including conference images 116, to the conferencing devices 110-140. For example, the server 150 may transmit a conference image 116 of User U2, U3, and/or U4 to the conferencing device 110. The conference images may be transmitted at the same or different times.
The server 150 may generate an adjusted conferencing signal that conforms to a specification of the conferencing device. A specification may include a requirement, capability, preference, setting, or other specification that optimizes viewing. The adjusted conferencing signal may include an adjusted conference image 116. The resolution, size, or focus of the conferencing signal, or any combination thereof, may be adjusted. For example, in one embodiment, the server 150 may adjust the resolution of a video signal. In this example, the conferencing device 120 may record a video signal with a resolution of 1080p and transmit the video signal to the server 150. Prior to sending the video signal to the conferencing system 110, which may be a mobile device with a low-resolution display device 112, the server 150 may adjust the resolution of the video signal to correspond to the display device 112. A low-resolution display device may have a resolution of 720p, 480 by 320 pixels, or 800 by 480 pixels. Video signals with such low resolutions may be transmitted at a lower bandwidth than video signals with a higher resolution (e.g., 1080p). Accordingly, the server 150 may adjust the resolution of the video signal to lower the bandwidth of the video signal. In other words, the server 150 may adjust the video signal to include the optimum or acceptable video for a particular display device or mobile device.
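The resolution adjustment described above can be sketched as a calculation of a target frame size. This is only an illustration: the function name fit_resolution and the choice to preserve the source aspect ratio and never upscale are assumptions, not requirements of the embodiments.

```python
def fit_resolution(src_w, src_h, max_w, max_h):
    """Scale a source frame size down to fit a display, keeping the aspect ratio."""
    if src_w <= max_w and src_h <= max_h:
        return src_w, src_h  # already fits; this sketch never upscales
    if max_w * src_h <= max_h * src_w:
        # Width is the limiting dimension (integer math avoids rounding drift).
        return max_w, src_h * max_w // src_w
    # Height is the limiting dimension.
    return src_w * max_h // src_h, max_h


# Example: a 1080p source sent to an 800 by 480 pixel display.
target = fit_resolution(1920, 1080, 800, 480)
```

Because the number of pixels per frame falls with the square of the scale factor, even a modest reduction in resolution may noticeably lower the bandwidth of the video signal.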
In another implementation, the server 150 may adjust a display size. As used herein, the “display size” relates to the size that the video signal is displayed on a viewing device, such as a display device 112. Adjusting the display size may include shrinking or enlarging. For example, the conferencing system 120 may record a video signal that is to be viewed on a large screen conferencing system, such as a projection screen (e.g., approximately 60+ inches). The video signal may be transmitted to the server 150. The server 150 may recognize that the display device 112 of the conferencing device 110 includes a small screen (e.g., approximately 3 inches). The server 150 may adjust the display size of the video signal to fit on the small screen. For example, surrounding regions for a life size image are clipped so that the image may be displayed smaller than life size with desired resolution. Adjusting the display size may also include adjusting resolution or focus, in order to avoid rendering the video signal unclear or fuzzy. Adjusting the display size also reduces the required bandwidth for the video signal.
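The clipping of surrounding regions can be illustrated with a minimal sketch that computes a crop rectangle around a region of interest, such as a participant's face. The function name crop_box, the margin parameter, and the assumption that a bounding box for the region is already available (for example, from a separate face detector) are all hypothetical.

```python
def crop_box(frame_w, frame_h, roi, aspect_w, aspect_h, margin=0.25):
    """Return (left, top, width, height) of a crop around roi.

    roi is (x, y, w, h) of the region to keep. The crop is padded by
    `margin` on each side and widened or heightened to match the
    display aspect ratio aspect_w:aspect_h.
    """
    x, y, w, h = roi
    crop_w = w * (1 + 2 * margin)
    crop_h = h * (1 + 2 * margin)
    # Grow one dimension so the crop matches the display aspect ratio.
    if crop_w * aspect_h < crop_h * aspect_w:
        crop_w = crop_h * aspect_w / aspect_h
    else:
        crop_h = crop_w * aspect_h / aspect_w
    # Shrink uniformly if the crop would be larger than the frame.
    shrink = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w, crop_h = int(crop_w * shrink), int(crop_h * shrink)
    # Centre the crop on the region, then clamp it to the frame edges.
    left = int(x + w / 2 - crop_w / 2)
    top = int(y + h / 2 - crop_h / 2)
    left = max(0, min(left, frame_w - crop_w))
    top = max(0, min(top, frame_h - crop_h))
    return left, top, crop_w, crop_h


# Example: a 200 by 200 pixel face region in a 1080p frame, 3:2 display.
box = crop_box(1920, 1080, (900, 300, 200, 200), 3, 2)
```

Clamping keeps the crop inside the frame when the region of interest lies near an edge, at the cost of the region no longer being centred.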
In another implementation, the server 150 may adjust the frame rate of the video signal to correspond to the capabilities of display device 112 of the mobile device. A frame rate of the display device may be anywhere from 1 frame per second to 80 FPS. Frame rates suitable for broadcast quality video (e.g. 60 FPS) can be achieved by mobile devices. However, lower frame rates (e.g. 10 FPS) require less bandwidth. Accordingly, the server 150 may adjust the frame rate of the video signal to lower the bandwidth of the video signal.
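One simple way to lower a frame rate is to drop frames at regular intervals. The sketch below shows that approach under stated assumptions: the function name frames_to_keep is hypothetical, and frame dropping is only one possible technique, since the embodiments do not specify how the frame rate is adjusted.

```python
def frames_to_keep(frame_count, src_fps, dst_fps):
    """Return indices of source frames to keep when reducing the frame rate."""
    if dst_fps >= src_fps:
        return list(range(frame_count))  # nothing to drop
    kept = []
    for i in range(frame_count):
        # Keep frame i when it is the first frame of a new output interval.
        if (i * dst_fps) // src_fps != ((i - 1) * dst_fps) // src_fps:
            kept.append(i)
    return kept


# Example: one second of 60 FPS video reduced to 10 FPS.
kept = frames_to_keep(60, 60, 10)
```

Using integer arithmetic keeps the selection evenly spaced even when the source rate is not a multiple of the target rate.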
In yet another embodiment, the server 150 may adjust the display focus. Adjusting the display focus may include focusing or cropping around, for example, a face of a conference participant (e.g., User U1-U4). For example, in one embodiment, as shown in
The server 150 may be operable to select a conference image 118 based on conferencing context. Conferencing context may include speaker information, a time interval, user input, or other data about the video conference.
For example, in one embodiment, as shown in
In response to recognizing that user U4 is speaking, the server 150 may transmit the conference signal from conferencing device 140 to the conferencing device 110, the conferencing device 120, the conferencing device 130, or a combination thereof. In the event that user U3 was speaking prior to user U4, the server 150 may stop or cease transmitting the conference signal from user U3 in response to detection of user U4 speaking 300. In one embodiment, both the conference signal from user U3 and U4 may be transmitted, for example, when user U3 and U4 are speaking simultaneously.
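The switching behavior described above can be sketched as a small selection function. The name select_signals is hypothetical, and the fallback used when nobody is speaking (continuing to send the previous signals) is an assumption that the embodiments do not address.

```python
def select_signals(previous, speaking):
    """Return the users whose conference signals should be transmitted.

    previous: users whose signals are currently being transmitted.
    speaking: users currently detected as speaking.
    """
    if not speaking:
        # Assumption: with no active speaker, keep sending the previous signals.
        return list(previous)
    # Send every active speaker once, sorted for a stable order.
    return sorted(set(speaking))


# Example: user U3 was speaking and user U4 starts speaking instead.
active = select_signals(["U3"], ["U4"])
```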
Alternatively, or additionally, a conferencing device 110-140 may determine speaker information and detect which user is speaking. For example, as shown in
As shown in
As shown in
The server 150 may transmit the adjusted conferencing signal to the mobile conferencing device for display on a display device of the mobile conferencing device. For example, the server 150 may transmit the adjusted conferencing signal using a protocol, such as the session initiation protocol (SIP), H.323, or a web based protocol such as HTTP/HTML and/or RTSP or some other rich-media protocol. In alternative embodiments, the mobile conference device receives the conferencing signals and adjusts them at the mobile conference device.
The server 150 may include a processor and memory. Additional, different, or fewer components may be provided. The processor may be coupled with the memory. Although the server 150 is referred to herein as a server, the server 150 may be a personal computer, gateway, router, mobile device, or other networking device. In an alternative embodiment, one, some, or all of the acts performed by the server 150 may be performed on or in a conferencing device or intermediary component.
The processor may be a general processor, digital signal processor, application specific integrated circuit, field programmable gate array, analog circuit, digital circuit, combinations thereof, or other now known or later developed processors. The processor may be a single device or a combination of devices, such as associated with a network or distributed processing. Any of various processing strategies may be used, such as multi-processing, multi-tasking, parallel processing, or the like. Processing may be local, as opposed to remote. In an alternative embodiment, processing may be performed remotely. Processing may be moved from one processor to another processor. The processor may be responsive to logic encoded in tangible media. The logic may be stored as part of software, hardware, integrated circuits, firmware, micro-code or the like.
The memory may be computer readable storage media. The computer readable storage media may include various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. The memory may be a single device or combinations of devices. The memory may be adjacent to, part of, programmed with, networked with and/or remote from the processor.
The processor may be operable to execute logic encoded in one or more tangible media, such as memory. Logic encoded in one or more tangible media for execution may be instructions that are executable by the processor and that are provided on the computer-readable storage media, memories, or a combination thereof. The processor is programmed with and executes the logic. The functions, acts or tasks illustrated in the figures or described herein are executed in response to one or more sets of logic or instructions stored in or on computer readable storage media. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro-code and the like, operating alone or in combination.
In one embodiment, the memory may be computer readable storage media and may include logic that is executed by the processor to receive one or more video signals, the one or more video signals being input to one or more conferencing devices that are used to participate in a video conference; select a video signal based on conference context; adjust the selected video signal to conform to a specification of a mobile conferencing device; and transmit the adjusted conferencing signal to the mobile conferencing device for display on a display device of the mobile conferencing device.
In act 810, the server or conferencing gateway may receive one or more video signals. The one or more video signals may be input to one or more conferencing devices that are used to participate in a video conference. For example, the one or more conferencing devices may have video conferencing cameras and/or microphones to capture video signals and audio signals. The video and audio signals may be transmitted to the server. The video signal may be a high resolution signal or formatted to fit a large screen. The server optionally determines a conference context.
The conference context may include speaker information, a time interval, facial recognition, user input, or a combination thereof. For example, the server may determine which user is speaking based on facial recognition of the user speaking, based on a time interval when the user speaking is scheduled to speak, based on audio present in the video signal of the user speaking, or simply based on a user input. The user input could originate with either the user of the mobile conferencing device or the user speaking.
In act 820, the server may adjust the video signal to conform to a mobile conferencing device specification and/or based on the conference context. For example, the video signal may be adjusted to focus on a speaker's face based on the display size of the mobile conferencing device. In another example, the video signal may be adjusted from a first resolution to a second resolution. In act 830, the adjusted video signal may be transmitted to the mobile conferencing device for display on a display device of the mobile conferencing device.
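Acts 810 through 830 can be sketched end to end as follows. This is a simplified illustration, not a definitive implementation: the VideoSignal and DeviceSpec types, the "speaker" key of the conference context, the fallback to the first received signal, and the transmit callback are all hypothetical, and the signals are represented only by their metadata rather than by actual video frames.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VideoSignal:  # hypothetical metadata for a received video signal
    user: str
    width: int
    height: int
    fps: int


@dataclass(frozen=True)
class DeviceSpec:  # hypothetical mobile conferencing device specification
    max_width: int
    max_height: int
    max_fps: int


def select_signal(signals, context):
    """Pick the signal of the speaker named in the conference context."""
    for signal in signals:
        if signal.user == context.get("speaker"):
            return signal
    return signals[0]  # assumption: fall back to the first received signal


def adjust_signal(signal, spec):
    """Cap resolution and frame rate at the device specification."""
    width, height = signal.width, signal.height
    if width > spec.max_width or height > spec.max_height:
        if spec.max_width * height <= spec.max_height * width:
            width, height = spec.max_width, height * spec.max_width // width
        else:
            width, height = width * spec.max_height // height, spec.max_height
    return replace(signal, width=width, height=height,
                   fps=min(signal.fps, spec.max_fps))


def handle_conference(signals, context, spec, transmit):
    """Receive signals (act 810), adjust (act 820), and transmit (act 830)."""
    selected = select_signal(signals, context)
    adjusted = adjust_signal(selected, spec)
    transmit(adjusted)
    return adjusted


# Example: two 1080p signals, user U4 speaking, 800 by 480 pixel device.
sent = []
result = handle_conference(
    [VideoSignal("U3", 1920, 1080, 30), VideoSignal("U4", 1920, 1080, 30)],
    {"speaker": "U4"},
    DeviceSpec(800, 480, 15),
    sent.append,
)
```

Passing the transmit step in as a callback keeps the sketch independent of any particular transport protocol.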
Various embodiments described herein can be used alone or in combination with one another. The foregoing detailed description has described only a few of the many possible embodiments. For this reason, this detailed description is intended by way of illustration, and not by way of limitation.