The invention is better understood with reference to the following drawings. The elements of the drawings are not necessarily to scale relative to one another. Rather, emphasis has instead been placed upon clearly illustrating the invention. Furthermore, like reference numerals designate corresponding similar parts throughout the several views.
The present disclosure does not describe the creation of a metaphorical auditory space or an artificial 3D representational video space, both of which differ from the actual physical environment of the attendees. Rather, the present disclosure describes and claims what is referred to as a “blended space” for audio and video that extends the various attendees' actual physical environments with respective geometrically consistent apparent spaces that represent the other attendees' remote environments.
Accordingly, a method is described for aligning video streams and positioning cameras in a collaboration event to create this “blended space.” A “blended space” is defined such that it combines a local physical environment of one set of attendees with respective apparent spaces of other sets of attendees that are transmitted from two or more remote environments to create a geometrically consistent shared space for the collaboration event that maintains natural collaboration cues such as eye contact and directional gaze awareness. That is, the other attendees' remote environments are represented in the local physical environment of the local attendees in a fashion that is geometrically consistent with the local physical environment. By maintaining the geometric consistency, the resulting blended space extends the local physical environment naturally and consistently with the way the remote environments may be similarly extended with their own blended spaces. In this manner, each blended space for each set of attendees experiences natural collaboration cues such as sufficient eye contact and sufficient directional awareness of where other event attendees are looking (e.g., gaze awareness). Each blended space thus provides, with dimensional consistency, an apparent shared space that is sufficiently similar for all sets of attendees, whether in local or remote locations.
A blended space for more than two meeting rooms is presented to allow for conference meetings in a multi-point meeting. The blended space should provide for approximate directional awareness and substantially direct eye contact with at least one person. Further, as additional sites are added to or removed from a meeting, the blended space should allow for adding or removing persons while maintaining the correct geometry of the meeting space, thereby maintaining a geometrically consistent environment. Additionally, the geometric environment can be allowed to grow or shrink in a dimensionally consistent manner as needed to accommodate the appropriate number of participants (such as two, three, or four available seats per site, as non-limiting examples). For instance, the blended space conference table may grow larger as people enter to accommodate more seats, while objects spanning screens do not bend or break. Each site thus accommodates the same number of available seats (although some may be unoccupied or vacant) during each blended space event.
As used in the present specification and in the appended claims, the term “media” is defined to include text, video, sound, images, data, or any other information that may be transmitted over a computer network.
Additionally, as used in the present specification and in the appended claims, the term “node” is defined to include any system with means for displaying and/or transmitting media that is capable of communication with a remote system directly or through a network. Suitable node systems include, but are not limited to, a videoconferencing studio, a computer system, a notebook computer, a telephone, a cell phone, a personal digital assistant (PDA), or any combination of the previously mentioned or similar devices.
Similarly, as used in the present specification and in the appended claims, the term “event” is meant to be understood broadly as including any designated time and virtual meeting place providing systems a framework to exchange information. An event allows at least one node to transmit and receive media information. According to one exemplary embodiment, the event exists separate and distinct from all nodes participating in collaboration. Further, an event may exist while nodes are exchanging information and may also exist while no nodes are participating.
Further, as used in the present specification and in the appended claims, the term “topology” is meant to represent the logical relationship of the nodes in an event, including their connections with each other and their position within the event.
Moreover, as used in the present exemplary specification, the terms “subsystem” and “module” shall be used interchangeably to include any number of hardware, software, firmware components, or any combination thereof. As used in the present specification, the subsystems and modules may be a part of or hosted by one or more computing devices including, but in no way limited to, servers, personal computers, personal digital assistants, or any other processor-containing apparatus such as codecs, switches, and routers, to name a few. Various subsystems and modules may perform differing functions or roles and together remain a single unit, program, device, or system.
An “event management client” is an originator of an event management request. The request may be human driven, such as through a user interface, or a machine request from another node, such as a concierge system running an event management application. Nodes may change their manner of participation in an event. Accordingly, the “event management client,” whether human or machine driven, allows for requesting to start and/or update the collaboration event.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present systems and methods. It will be apparent, however, to one skilled in the art that the present systems and methods may be practiced without these specific details. Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
A look at a two-environment system may be helpful in understanding the difficulty in extending into multi-point configurations. For example,
The present system solves this dilemma with a system that includes a management subsystem configured to dynamically configure the topology of a virtual collaborative event to create a blended space. The management subsystem is configured to receive and process requests originating from at least one event management client, such as a user interface or a server request. The configuration of the collaborative event topology includes the determination of various media stream connections among multiple nodes based on at least one policy for maintaining a geometrically consistent space. This space preserves eye contact and directional awareness. The media stream connections establish and maintain actual relationships among said nodes.
In one exemplary embodiment, the system is made up of a communication network and a plurality of nodes communicatively coupled to the communication network. A management subsystem is communicatively coupled to the network and interfaces to an event management client. The management subsystem is configured to dynamically manage the topology of a blended space collaborative event based on the event management client.
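By way of non-limiting illustration, the following minimal sketch (in Python, with hypothetical names and structures not drawn from the claims) indicates the kind of state such a management subsystem might track for one event: the participating nodes and the media stream connections that make up the event topology.

```python
# Minimal sketch (hypothetical names): state a management subsystem might
# track for a single collaboration event.
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str   # e.g., a studio, computer, or phone identifier
    displays: int  # number of displays available at this node
    cameras: int   # number of cameras available at this node


@dataclass
class Event:
    event_id: str
    nodes: dict = field(default_factory=dict)        # node_id -> Node
    connections: list = field(default_factory=list)  # (src camera, dst display) pairs


class ManagementSubsystem:
    """Receives requests from an event management client and updates topology."""

    def __init__(self):
        self.events = {}

    def start_event(self, event_id):
        self.events[event_id] = Event(event_id)
        return self.events[event_id]

    def add_node(self, event_id, node):
        event = self.events[event_id]
        event.nodes[node.node_id] = node
        self._recompute_topology(event)

    def _recompute_topology(self, event):
        # Placeholder: a real policy would assign camera outputs to remote
        # displays so eye contact and directional awareness are preserved.
        event.connections = []
```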
According to one exemplary embodiment, the virtual relationships established between the various nodes of the present exemplary system can simulate spatial relationships between attendees and promote meaningful interaction. Particularly, according to one exemplary embodiment, the perceived topology and issued directives may correspond to certain virtual relationships being envisioned as seats around an imaginary conference table, where video and audio are perceived to come from the left, right, or directly in front of the attendee. According to one exemplary embodiment, the virtual relationships are maintained throughout an event, giving an event a sense of realism and eliminating distractions.
According to one exemplary embodiment, the consideration of virtual relationships between nodes and their corresponding video streams allows an attendee to speak with remote attendees as if they were looking through a virtual window. One type of virtual relationship may include, for example, the association of a video input stream from an identified node with a corresponding display, camera, and video output stream to allow natural eye contact between attendees at the two nodes. If video from a first node is displayed on the left-most display of a second node, the left-most camera of the second node may be configured to capture the video stream sent back to the first node. Consequently, when an attendee turns to view the left display, his expressions and comments are transmitted as if he were speaking directly to the attendee displayed on his screen. The connection of video streams to appropriate displays maintains natural eye contact and facilitates natural communication among attendees. Additionally, this exemplary configuration allows the participants to know when other participants are distracted or are shifting their attention from one participant to another.
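As a non-limiting sketch of this virtual relationship (hypothetical structure, not a claimed implementation), the pairing of each local display with the camera mounted at that display can be expressed as a simple mapping:

```python
# Sketch: pair each display showing a remote node with the camera at that
# display, so the stream sent back captures the local attendee facing it.
def pair_return_cameras(display_assignments):
    """display_assignments: dict mapping display index -> remote node id."""
    return {remote: display_idx
            for display_idx, remote in display_assignments.items()}

# Example: node B shows node A on its left-most display (index 0), so B's
# left-most camera (index 0) captures the stream sent back to node A.
print(pair_return_cameras({0: "A", 1: "C"}))   # {'A': 0, 'C': 1}
```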
In conjunction with the video arrangement described above, audio streams may also be linked between attendees based on a virtual relationship between the nodes. Specifically, according to one exemplary embodiment, audio recorded from a specific node may be reproduced at the recipient node with the same orientation as the display showing the attendee transmitting the audio stream. Each attendee's voice received then corresponds spatially with the video image of that attendee, enhancing the perceived relationship between the attendees.
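A corresponding sketch for the audio linkage (the pan values are assumptions for illustration only) reproduces each remote attendee's audio at the spatial position of the display showing that attendee:

```python
# Sketch: reproduce a remote node's audio at the position of the display
# showing that node (assumed pan values: -1 = left, 0 = center, +1 = right).
DISPLAY_PAN = {0: -1.0, 1: 0.0, 2: +1.0}

def audio_pan_for(node_id, display_assignments):
    """Return the stereo pan for a remote node's audio stream."""
    for display_idx, remote in display_assignments.items():
        if remote == node_id:
            return DISPLAY_PAN[display_idx]
    return 0.0  # default to center if the node is not currently on screen

print(audio_pan_for("A", {0: "A", 1: "C"}))   # -1.0 (audio comes from the left)
```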
For the B location, its left screen M1 is imaged with the empty table. The middle screen M2 is imaged with the camera associated with the C location's middle screen M2, and it is directed at C's attendees seated at the center of the table. The right screen M3 in the B location is imaged with the camera associated with the A location's left screen M1, and it is directed at the A attendees seated at the center of their conference table 34, thus having an angle 36 that is directed from their left side. Both of the A and C cameras' zooms 38 are set to display the two attendees as close to life size as possible on the respective screens in B's location to simulate a real presence feel.
For the C location, the left screen M1 contains the A attendees imaged with the camera associated with the A location's middle screen M2 and directed at the A attendees, having an angle 36 that is directed directly at them. The middle screen M2 in the C location contains the B attendees imaged with the camera associated with the B location's middle screen M2 and directed at the B attendees, having an angle 36 that is directed directly at them. Both of the A and B cameras' zooms 38 are set to display the two attendees as close to life size as possible on the respective screens in C's location to simulate a real presence feel. The right screen M3 in the C location is imaged with the empty table.
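The screen-to-source assignments described above for the B and C locations can be summarized in a small data structure (a non-limiting sketch; "empty" denotes the geometrically consistent empty-table image, and the A location's assignments would be built in the same manner):

```python
# Sketch of the screen-to-source assignments described above.  Each entry is
# either "empty" (the empty-table image) or (remote location, screen whose
# associated camera supplies the video).
SCREEN_SOURCES = {
    "B": {"M1": "empty",
          "M2": ("C", "M2"),   # camera associated with C's middle screen M2
          "M3": ("A", "M1")},  # camera associated with A's left screen M1
    "C": {"M1": ("A", "M2"),   # camera associated with A's middle screen M2
          "M2": ("B", "M2"),   # camera associated with B's middle screen M2
          "M3": "empty"},
}
```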
For all locations there can be one or more additional monitors (such as D1,
When creating a geometrically consistent environment for the participants, there needs to be some way to aesthetically control the visual and audio environments so that they appear natural to the participants. Collaboration events appear very natural when attendees' everyday expectations regarding their visual relationships to other attendees are preserved. Accordingly, the blended space of the present disclosure is configured to be geometrically consistent to facilitate both natural eye contact and third-party awareness of interactions among other attendees.
In such a geometrically consistent blended space, the camera angles 36 are determined based on the assignment of a set of attendees into a location in the virtual space to allow for sufficient direct eye contact. Further, if an assignment is left open, the video stream is substituted with an acceptable image to maintain the illusion of a geometrically consistent environment. For example, an empty table image (but geometrically and thus dimensionally consistent) is one extreme of this illusion when there may be multiple screens but not enough participating sites with sets of attendees.
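A minimal sketch of this substitution (hypothetical names) simply falls back to the dimensionally consistent empty-table image whenever a virtual position has no assigned set of attendees:

```python
# Sketch: substitute a dimensionally consistent empty-table image for any
# virtual position that has no assigned live video stream.
EMPTY_TABLE_STREAM = "empty_table_image"   # hypothetical placeholder source

def stream_for_position(assignments, position):
    """assignments: dict mapping virtual position -> live video stream id."""
    return assignments.get(position, EMPTY_TABLE_STREAM)

print(stream_for_position({1: "cam_A", 2: "cam_B"}, 3))   # 'empty_table_image'
```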
A blended space therefore combines a local physical environment (a set of attendees' actual local collaboration room) with apparent spaces transmitted from one or more remote environments (the other sets of attendees' local collaboration rooms) that are represented locally in a fashion that is geometrically consistent with the local environment. This resulting blended space extends the local environment naturally and consistently with the way the remote environments may be similarly extended. That is, each local collaboration room has its own local environment that has a blended space created with the other remote environments. However, each blended space must be created to allow the others to maintain geometric consistency. In this manner, each blended space experiences natural collaboration cues such as sufficient eye contact and thus sufficient awareness of where other event attendees are looking (gaze awareness). Accordingly, an apparent shared space is created that is sufficiently similar for all attendees, local and remote.
The blended space is typically designed to correspond to a natural real-world space, such as a meeting room with a round conference table 34 arranged with meeting attendees around it. A particular blended space for each local collaboration studio is determined based upon the geometrical positioning and zoom factor of the video camera(s) and display(s) within each physical local environment that is participating in a collaboration event. Determination of the blended space considers the relative positioning of the cameras and displays to assign where the output of each camera will be displayed. Therefore, for a given combination of environment types (e.g., three cameras, each center-mounted above three side-by-side displays) and collaboration event types (e.g., three environments of the same type each displaying four attendees), the blended space may be represented by meta-data sufficient for each environment to be configured for participating in the event. Such meta-data may be determined by formula or by other means. One assignment scheme uses a modulo of the number of positions. For example, the formula =MOD(virtual_position−1, N), where N=4 for four positions, will generate the results shown in Table 1.
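For clarity, the following is a Python equivalent of the spreadsheet formula =MOD(virtual_position−1, N) referenced above; the printed slot numbers correspond to the modulo results tabulated in Table 1.

```python
# Python equivalent of =MOD(virtual_position - 1, N): maps each virtual
# position around the table onto one of N display/camera slots (here N = 4).
N = 4

def slot_for(virtual_position):
    return (virtual_position - 1) % N

print([slot_for(p) for p in range(1, 9)])   # [0, 1, 2, 3, 0, 1, 2, 3]
```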
A collaboration event may change, for example when another set of attendees from another remote environment joins the collaboration event. Under these circumstances, the blended space may be re-calculated and the camera and display assignments updated to reflect a different blended space, which typically will be defined substantially similar to the blended space it replaces.
When the nature of the collaboration event allows or requires, non-attendee video stream(s) may be assigned to the displays of the environment(s) to enhance the appearance of the blended space. For example, an image of a portion of a meeting table may complete the blended space when there is a lack of a set of attendees to complete the desired collaboration event.
One typical generalized multi-point blended space may include a videoconference system with one or more acquisition devices, such as cameras, microphones, and scanners, and one or more reproduction devices, such as displays, speakers, and printers. In addition, the videoconference system will need one or more data paths connecting these devices, sufficient to carry the acquired data over one or more connections. Given a collaboration studio environment with three monitors plus the local table, consider a video conference with four connections between a first company COA with two sites CVB4 and CVB10 and a second company COB with its two sites DWRC and DWGD. One can arbitrarily assign each site a position in the blended space around a round conference table, such as in Table 2 and illustrated in
For audio and video, a model is made for which stream carries the active camera and audio for each position. At any site, any table position may be activated for a video stream, but care must be taken to mix the corresponding audio into that video stream. Additional flexibility in the configuration allows for, rather than four physical sites, just three physical sites with one site activating two cameras to achieve four streams. Thus, each seat around the round conference table represents a monitor with an associated camera. Accordingly, each camera at a site is treated as a separate position around the table. A physical site with just two active cameras when there are three available may have either a dead (inactive) display or a display with an image of just the conference table.
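The position model described above can be sketched as follows (non-limiting; the site names follow the example of Table 2, while the camera identifiers are hypothetical). Each position around the round table carries one camera/monitor pair, and a given site displays every remote position other than its own:

```python
# Sketch of the round-table position model: each position is a monitor with
# an associated camera, and a site may contribute more than one position.
POSITIONS = {
    1: {"site": "CVB4",  "camera": "CVB4-cam1"},
    2: {"site": "CVB10", "camera": "CVB10-cam1"},
    3: {"site": "DWRC",  "camera": "DWRC-cam1"},
    4: {"site": "DWGD",  "camera": "DWGD-cam1"},
}

def remote_positions_for(local_site):
    """Positions a site must render on its displays (its own position is the
    local room itself, or an empty-table image if the seat is vacant)."""
    return {pos: info for pos, info in POSITIONS.items()
            if info["site"] != local_site}

print(sorted(remote_positions_for("CVB4")))   # [2, 3, 4]
```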
Assuming we number the displays as in
This mapping of the video streams thus creates the 4-point camera mapping as in
To ensure that the blended space is configured properly, a user interface must be able to represent this blended space visually so that the meeting participants easily comprehend it.
One method of allowing additional attendees to join the blended space is to provide one or more event management clients, such as with a user interface (UI) (see
The UI provides feedback in the form of a connection sequence animation that provides confirmation to the meeting organizer that the invites have been sent, and that connections are occurring with the studios before people actually show up on display and audio.
The UI provides a spatial orientation map that users can easily grasp. Overall, the 3D tabletop icons represent the meeting space as occupied by the number of table sections in the map. The orientation is from the observer's point of view, with the one ‘front’ table representing the ‘here’ for each observer location, and one to three tabletops across from it representing the relative ‘there’ locations assigned to or joined in the meeting.
The UI allows for invitation usability: with the ‘there’ tables mapped to their respective people displays, setting up invitations clearly communicates to the users in advance which displays their meeting attendees will occupy. The ordering can be a default sequence, or it can be customized during invitation to rearrange attendees to match the appropriate table locations for the meeting.
The UI permits people locations ‘on camera’ and ‘off camera’ to be distinguished graphically. For example, iconic seating locations, matching the number of seats in each respective studio, are highlighted or dimmed to indicate which seat locations will be sent to the other studios' people displays. This graphical distinction helps users understand whether there may be additional users at locations that they can hear, but not see, on the display.
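The on-camera/off-camera distinction can be sketched as a simple seat-state computation (hypothetical names; the state strings stand in for the highlighted and dimmed icon renderings):

```python
# Sketch: mark each seat icon as highlighted (its camera view is sent to the
# remote people displays) or dimmed (audible but not visible remotely).
def seat_states(total_seats, on_camera_seats):
    return {seat: ("highlighted" if seat in on_camera_seats else "dimmed")
            for seat in range(total_seats)}

print(seat_states(4, {1, 2}))   # seats 1 and 2 are on camera
```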
One advantage of this 3D icon interface is the spatial orientation it gives users of the entire blended space event relative to their individual location. This spatial orientation includes who is at the meeting table and where each seating location is. Invitation usability is enhanced by the purposeful placement of one or more locations at their seating positions in the event space. The location names and local times are tied to people displays. Further, people on the display can be visually mapped to their unique physical location by the relative position of text and table section on the table icon graphic. Accordingly, people locations ‘on camera’ and ‘off camera’ are distinguished graphically.
While the present invention has been particularly shown and described with reference to the foregoing preferred and alternative embodiments, those skilled in the art will understand that many variations may be made therein without departing from the spirit and scope of the invention as defined in the following claims. This description of the invention should be understood to include all novel and non-obvious combinations of elements described herein, and claims may be presented in this or a later application to any novel and non-obvious combination of these elements. The foregoing embodiments are illustrative, and no single feature or element is essential to all possible combinations that may be claimed in this or a later application. Where the claims recite “a” or “a first” element or the equivalent thereof, such claims should be understood to include incorporation of one or more such elements, neither requiring nor excluding two or more such elements.
This application claims the benefit of U.S. Provisional Application No. 60/803,584, filed May 31, 2006 and herein incorporated by reference. This application also claims the benefit of U.S. Provisional Application No. 60/803,588, filed May 31, 2006 and herein incorporated by reference.