Video conferencing is an established method of simulated face-to-face collaboration between remotely located participants. A video image of a remote environment is broadcast onto a local display, allowing a local user to see and talk to one or more remotely located participants.
Social interaction during face-to-face collaboration is an important part of the way people work. There is a need to allow people to have effective social interaction in a simulated face-to-face meeting over distance. Key aspects of this are nonverbal communication between members of the group and a sense of being copresent in the same location even though some participants are at a remote location and only seen via video. Many systems have been developed that try to enable this. However, key problems have prevented them from being successful or widely used.
For the reasons stated above, and for other reasons that will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for alternative video conferencing methods.
In the following detailed description of the present embodiments, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments of the disclosure which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the subject matter of the disclosure, and it is to be understood that other embodiments may be utilized and that process or mechanical changes may be made without departing from the scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and equivalents thereof.
The various embodiments involve methods for compositing images from multiple meeting locations onto one image display. The various embodiments provide environmental rules to facilitate a composite image that promotes proper eye gaze awareness and social connectedness for all parties in the meeting. These rules enable the joining of widely distributed endpoints into effective face-to-face meetings with little customization.
By characterizing aspects of social connectedness, the various embodiments can be used to automatically blend images from different endpoints. This results in improvements in social connectedness in a widely distributed network of endpoints.
Poor, inconsistent eye contact is reduced for all attendees by establishing consistent rules for camera positions and viewpoint arrangement using a central layout and local views. Gaze awareness is also facilitated using a central layout and local views. Participants shown on-screen from separate locations can acknowledge one another's relative positions, for example by looking toward a person when speaking to that person.
Relative sizes of people and furniture are made geometrically consistent using rules for image capture. People across separate locations are represented on-screen at a consistent size established by the local view as opposed to arbitrary sizes established by the media stream.
An immersive sense of space is created by making items such as eye level, floor level and table level consistent. Rules are established so that these items agree from image to image, and between each image and the local environment. In current systems, these items are seldom controlled, so images appear to be taken from different angles, often from above.
The system of rules for central layout, local views, camera view and other environmental factors allows many types of endpoints from different manufacturers to interconnect into a consistent, multipoint meeting space that is effective for face-to-face meetings with high social connectedness.
The various embodiments facilitate creation of a panoramic image from images captured from different physical locations that, when combined, can create a single image to facilitate the impression of a single location. This is accomplished by providing rules for image capture that enable generation of a single panorama from multiple different physical locations. For some embodiments, no cropping or stitching of individual images is necessary to form a panorama. Such embodiments allow images to be simply tiled into a composited panorama with only scaling and image frame shape adjustments.
A meeting topology is defined via a central layout that shows the relative orientation of seating positions and endpoints in the layout. This layout can be an explicit map as depicted in
A central layout may also be defined in terms of metadata or other abstract means. For example, a layout type “round” may be defined with attributes of sites=4, seatspersite=6 and orientation map of [A,B,C,D], indicating that four participant locations would be arranged in circular fashion in order A, B, C, D with a maximum view of six seating widths. This would permit automated ordering and scaling of images as will be described herein.
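As a minimal sketch of the metadata form described above, the attributes could be held in a small record from which the display order of remote sites is derived. The class name, field names and the convention that each site sees the remaining sites in circular order starting with its successor are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CentralLayout:
    """Hypothetical container for the layout attributes named in the text."""
    layout_type: str        # e.g. "round"
    sites: int              # number of participant locations
    seats_per_site: int     # maximum view, in seating widths
    orientation: List[str]  # circular order of the sites

    def remote_order(self, local_site: str) -> List[str]:
        """Return the other sites in circular order, starting after local_site.

        Assumption: the local view follows the orientation map, skipping
        the local site itself.
        """
        i = self.orientation.index(local_site)
        return self.orientation[i + 1:] + self.orientation[:i]


layout = CentralLayout("round", sites=4, seats_per_site=6,
                       orientation=["A", "B", "C", "D"])
print(layout.remote_order("C"))
```

Under these assumptions, site C would display sites D, A and B in that order, while site A would display B, C and D.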
The central layout may include data structures that define environment dimensions such as distances between sites, seating widths, desired image table height, desired image foreground width and locations of media objects like white boards and data displays.
Generically, a local environment is a place where people participate in a social collaboration event or video conference, such as through audio-visual and data equipment and interfaces. A local environment can be described in terms of fields of video capture. By establishing standard or known fields of capture, consistent images can be captured at each participating location, facilitating automated construction of panoramic composite images.
For some embodiments, the field of capture for a local environment is defined by the central layout. For example, the central layout may define that each local environment has a field of capture to place six seating locations in the image. Creating video streams from standard fields of capture can be accomplished physically via Pan-Tilt-Zoom-Focus controls on cameras or digitally via digital cropping from larger images. Multiple fields can be captured from a single local space and used as separate modules. Central layouts can account for local environments with multiple fields by treating them as separate local environments, for example. One example would be an endpoint that uses three cameras, with each camera adjusted to capture two seating positions in its image, thus providing three local environments from a single participant location.
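The digital alternative mentioned above can be sketched as follows: one wide image covering six seating positions is divided into three two-seat fields, each of which could then be treated as a separate local environment. The function name is hypothetical, and the sketch assumes seating positions are spread evenly across the image width.

```python
from typing import List, Tuple


def split_into_fields(image_width_px: int,
                      seats_in_image: int,
                      seats_per_field: int) -> List[Tuple[int, int]]:
    """Return (left, right) pixel columns for each equal-width crop.

    Assumes seats are evenly spaced across the image (an illustrative
    simplification; a real system would calibrate per camera).
    """
    if seats_in_image % seats_per_field != 0:
        raise ValueError("seats_in_image must be a multiple of seats_per_field")
    n_fields = seats_in_image // seats_per_field
    return [(k * image_width_px // n_fields,
             (k + 1) * image_width_px // n_fields)
            for k in range(n_fields)]


fields = split_into_fields(1920, seats_in_image=6, seats_per_field=2)
print(fields)
```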
Each local environment participating in a conference would have its own view of the event. For some embodiments, each local environment will have a different view corresponding to its positioning as defined in the central layout.
The local layout is a system for establishing locations for displaying media streams that conform to these rules. The various embodiments will be described using the example of an explicit portal defined by an image or coordinates. Portals could also be defined in other ways, such as via vector graphic objects or algorithmically.
For one embodiment, the width of the table 220 is wider than the foreground width 222 at line A-A′ such that edges of the table do not appear in the portal 230. The portal 230 further has an image table height 226 representing a height of the table 220 within the portal 230 and an image presumed eye height 228 representing a presumed eye height of a participant 225 within the portal 230 as will be described in more detail herein.
The table 220 has a height 234 above the floor 231. A presumed eye height of a participant 225 is given as height 238 from the floor 231. The presumed eye height 238 does not necessarily represent an actual eye height of a participant, but merely the level at which the eyes of an average participant might be expected to occur when seated at the table 220. For example, using ergonomic data, one might expect a 50% seated stature eye height of 47″. The choice of a presumed eye height 238 is not critical. For one embodiment, however, the presumed eye height 238 is consistent across each local environment participating in a video conference, facilitating consistent scaling and placement of portals for display at a local environment.
The portal 230 is defined by such parameters as the field of capture 215 of the camera 212, the height 234 of the table 220, the angle 213 of the camera 212 and the distance 240 from the camera 212 to the intersection of the field of capture 215 with the table 220. The presumed eye height 238 of a local environment 205 defines the image presumed eye height 228 within the portal 230. In other words, the eyes of a hypothetical participant having a seated eye height occurring at presumed eye height 238 of the local environment would result in an eye height within the portal 230 defining the image presumed eye height 228.
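The relationship between the presumed eye height 238 and the image presumed eye height 228 can be illustrated with an ideal pinhole-camera model. This is a sketch under stated assumptions only: the disclosure does not specify a camera model, and the function name, the interpretation of the camera angle as a downward tilt, and the vertical field-of-view parameter are all hypothetical.

```python
import math


def image_height_fraction(point_height: float,
                          camera_height: float,
                          distance: float,
                          tilt_down: float,
                          vertical_fov: float) -> float:
    """Vertical image position of a point, as a fraction from the bottom edge.

    Ideal pinhole model (assumption). Heights and distance share one unit;
    tilt_down and vertical_fov are in radians. A result of 0.5 is the image
    center; values outside [0, 1] fall outside the frame.
    """
    # Elevation of the point as seen from the camera, relative to horizontal.
    elevation = math.atan2(point_height - camera_height, distance)
    # Angle of the point above the optical axis of the tilted camera.
    alpha = elevation + tilt_down
    return 0.5 + 0.5 * math.tan(alpha) / math.tan(vertical_fov / 2.0)


# A point at camera height, seen by a level camera, lands at the image center.
center = image_height_fraction(47.0, 47.0, 100.0, 0.0, math.radians(40.0))
print(center)
```

Evaluating the same function at the table height and at the presumed eye height gives the two image-space levels that would need to agree across portals.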
For one embodiment, the distance 236 from the camera 212 to the back edge 218 of table 220 and the angle 213 are consistent across each local environment 205 involved in a collaboration. In such an embodiment, as the field of capture 215 is increased to increase the foreground width 222 of the portal 230, the distance 240 from the camera 212 to the intersection of the field of capture 215 with the table 220 is lessened, thus resulting in an increase in the image table height 226 and a reduction of the image presumed eye height 228 of the portal 230.
For further embodiments, by maintaining consistency of height 234 of table 220 and distance 236 of the back edge 218 of the table 220 from the camera 212, as well as the height 242 of the camera 212, consistent portals 230 may be produced across each local environment 205 using different zoom factors. This facilitates alignment of table heights and presumed eye heights within each portal produced using the same field of capture, allowing the images to be placed adjacent one another to provide an impression of a single work space. Alternatively, or in addition, fields of capture 215 for each local environment 205 may be selected from a group of standard fields of capture. The standard fields of capture may be defined to view a set number of seating widths. For example, a first field of capture may be defined to view two seating positions, a second field of capture may be defined to view four seating positions, a third field of capture may be defined to view six seating positions, and so on.
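Selection from a group of standard fields of capture might be sketched as below. The rule used here, picking the smallest standard field that covers the number of seats needed, is an illustrative assumption; the disclosure only states that a field may be selected from the group. The names are hypothetical, and the tuple lists only the three example fields given in the text.

```python
from typing import Iterable

# Seating positions per standard field, taken from the example in the text.
STANDARD_FIELDS = (2, 4, 6)


def choose_standard_field(seats_needed: int,
                          standard_fields: Iterable[int] = STANDARD_FIELDS) -> int:
    """Return the smallest standard field that views at least seats_needed seats."""
    for seats in sorted(standard_fields):
        if seats >= seats_needed:
            return seats
    raise ValueError("no standard field of capture is wide enough")


print(choose_standard_field(3))
```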
When parameters are chosen to define the fields of capture such that the scaled portals have similar pixel dimensions (to a casual observer) between their presumed eye height (228 in
As shown in
For some embodiments, the portals 230 are displayed such that their image presumed eye height is aligned with the presumed eye height of the local environment displaying the images. This can further facilitate an impression that the participants at the remote environments are seated in the same space as the participants of the local environment when their presumed eye heights are aligned.
For some embodiments, it may not be possible to display the participants of the portal 230 at their full or normal size. For example, the viewing area of the display 210 may not permit full-size display of the participants due to size limitations of the display 210 and the number of participants that are desired to be displayed. In such situations, a compromise may be in order as bringing the displayed presumed eye height in alignment with the presumed eye height of a local environment may bring the displayed table height 256 to a different level than the table height 234 of a local environment, and vice versa. For some embodiments, wherein the displayed image is less than full scale, the portal 230 could be shifted up from the bottom of the display a distance 254 that would bring the displayed presumed eye height 258 to a level less than the presumed eye height 238 of the local environment, thus bringing the displayed table height 256 to a level greater than the table height 234 of the local environment.
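The compromise described above can be illustrated numerically. The sketch below computes the upward shift (corresponding to distance 254) as a weighted blend between the shift that would align eye heights and the shift that would align table heights. The function name, the weighting parameter and the sample dimensions are assumptions chosen for illustration; all lengths are in inches, measured from the floor for the local environment and from the bottom edge of the portal for the image.

```python
def portal_offset(local_eye: float, local_table: float, display_bottom: float,
                  image_eye: float, image_table: float, scale: float,
                  eye_weight: float = 0.5) -> float:
    """Shift of the portal above the bottom of the display.

    image_eye and image_table are full-scale heights within the portal;
    scale is displayed size divided by full size; eye_weight = 1 aligns
    eye heights exactly, eye_weight = 0 aligns table heights exactly.
    """
    offset_eye = local_eye - display_bottom - scale * image_eye
    offset_table = local_table - display_bottom - scale * image_table
    return eye_weight * offset_eye + (1.0 - eye_weight) * offset_table


# Assumed values: 47 in eye height, 29 in table, display bottom edge at 20 in,
# portal shown at half scale.
shift = portal_offset(47.0, 29.0, 20.0, image_eye=28.0, image_table=10.0, scale=0.5)
displayed_eye = 20.0 + shift + 0.5 * 28.0
displayed_table = 20.0 + shift + 0.5 * 10.0
print(shift, displayed_eye, displayed_table)
```

With these assumed numbers the displayed presumed eye height falls below the local presumed eye height while the displayed table height rises above the local table height, which is the trade-off the paragraph describes.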
At 872, video image streams are received from two or more remote locations. The video image streams represent the portals of the local environments of the remote endpoints.
At 874, the video image streams are scaled in response to a number of received image streams to produce a composite image that fits within the display area of a local endpoint. If non-participant video image streams are received, such as white boards or other data displays, these video image streams may be similarly scaled, or they may be treated without regard to the scaling of the remaining video image streams.
At 876, the scaled video image streams are displayed in panorama for viewing at a local environment. By maintaining consistency of camera and table placement, and using a single field of capture, the scaled video image streams may be displayed adjacent one another to promote the appearance that participants of all of the remote endpoints are seated at a single table. As noted above, the scaled video image streams may be positioned within a viewable area of a display to obtain eye heights similar to those of the local environment in which they are displayed. One or more of the scaled video image streams may further be displayed in perspective. For further embodiments, the video image streams are displayed in an order representative of a central layout chosen for the video conference of the various endpoints. As noted previously, non-participant video image streams may be displayed along with video image streams of participant seating.
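The receive, scale and display steps at 872, 874 and 876 might be sketched as follows. A single scale factor is derived from the number and size of the received streams so that the tiled panorama fits the display, and each stream is then assigned a horizontal position in layout order. The function name and the choice of one uniform scale are illustrative assumptions; vertical placement is left to the eye-height alignment discussed earlier.

```python
from typing import List, Tuple


def tile_panorama(stream_sizes: List[Tuple[int, int]],
                  display_w: int,
                  display_h: int) -> Tuple[float, List[Tuple[float, float, float]]]:
    """Return (scale, placements), where each placement is (x, width, height).

    stream_sizes lists (width, height) in pixels, already in display order.
    One uniform scale is used so that relative sizes stay consistent.
    """
    total_w = sum(w for w, _ in stream_sizes)
    max_h = max(h for _, h in stream_sizes)
    scale = min(display_w / total_w, display_h / max_h)
    placements = []
    x = 0.0
    for w, h in stream_sizes:
        placements.append((x, w * scale, h * scale))
        x += w * scale  # the next portal sits directly adjacent
    return scale, placements


scale, placements = tile_panorama([(1280, 720), (1280, 720)], 1920, 1080)
print(scale, placements)
```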
The client management system 983 may be part of an endpoint, such as a computer associated with each endpoint, or it may be a separate component, such as a server computer. The central management system 982 may be part of an endpoint or separate from all endpoints.
In practice, the central management system 982 may contact each of the endpoints involved in a given video conference. The central management system 982 may determine their individual capabilities, such as camera control, display size and other environmental factors. For embodiments using global control of portal characteristics, the central management system 982 may then define a single standard field of capture for use among the endpoints 101-104 and communicate it via local meeting layouts passed to the client management systems 983. The client management systems 983 use information from the local meeting layout to cause cameras of the endpoints 101-104 to be properly aligned in response to the specified standard field of capture. Local, specific fields of capture are then ensured to result in video image streams that correspond to the standardized stream defined by the local and central layout.
Upon defining the characteristics controlling the capture and display of video information, the central management system 982 may create a local meeting layout for each local endpoint. Each client management system 983 uses its local layout to create a local panorama, receiving a portal from each remaining endpoint for viewing on the local display as part of the constructed panorama. The remote portals are displayed in panorama as a continuous frame of reference to the video conference for each endpoint. The topology of the central layout may be maintained at each endpoint to promote gaze awareness and eye contact among the participants. Other attributes of the frame of reference may be maintained across the panorama including alignment of tables, image scale, presumed eye height and background color and content.
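One way a central management system could settle on a single standard field of capture is sketched below. The capability format (a set of supported seat counts per endpoint) and the rule of choosing the widest field supported by every endpoint are assumptions made for illustration; the disclosure does not prescribe how the field is chosen.

```python
from typing import Dict, Set


def common_field_of_capture(capabilities: Dict[str, Set[int]]) -> int:
    """Return the widest field (in seating positions) that all endpoints support."""
    shared = set.intersection(*capabilities.values())
    if not shared:
        raise ValueError("endpoints share no standard field of capture")
    return max(shared)


# Hypothetical capability reports from four endpoints.
reported = {
    "101": {2, 4, 6},
    "102": {2, 4},
    "103": {4, 6},
    "104": {2, 4, 6},
}
print(common_field_of_capture(reported))
```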
This is a continuation application of U.S. patent application Ser. No. 12/921,378, titled “DISPLAYING PANORAMIC VIDEO IMAGE STREAMS” and filed Sep. 7, 2010 (pending), which is a National Stage Entry of PCT/US08/58006, titled “DISPLAYING PANORAMIC VIDEO IMAGE STREAMS” and filed Mar. 24, 2008 (published), which claims priority to U.S. Provisional Patent Application Ser. No. 61/037,321, titled “DISPLAYING PANORAMIC VIDEO IMAGE STREAMS” and filed Mar. 17, 2008 (expired), each of which is commonly assigned and incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
61037321 | Mar 2008 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12921378 | Sep 2010 | US
Child | 13891625 | | US