The present invention relates to videoconferencing units.
Today's video conferencing systems do not allow for a conversation-like video conference where people are seated in a circle or around a system that is in the middle of the space or table. There are systems today that place a 360 degree camera in the center of a table and the far site is displayed on a wall at one end of the room. The participants are seated on three sides of the table and naturally face the wall display. This results in each participant facing the far site at a different angle, with most participants not facing the camera but instead having at least a portion of the side of their head being seen by the camera. Further, this portion varies with each participant so that it is clear that the participants are not looking at the camera but looking at the display on the wall. This results in a videoconference that is completely different from a normal conversation held in person, where the participants look at each other, and reduces the value of the videoconference.
There have been attempts to address this problem by the use of “presence” systems. However, most presence systems are very expensive and very difficult to set up properly and require significant bandwidth for their communications. This has limited the use of “presence” systems to only the most demanding environments.
According to embodiments of the present invention, systems for videoconferencing are designed where people are seated around a video conferencing system. The systems include a camera so the far site can see the local participants and the systems include displays that show the far site. The displays are properly aligned with the cameras and the local participants so that when people at the far site view the displayed images of the near site, it looks like they have eye contact with the near site. The reverse is also true if the far site has a similar system, so that both groups of participants can have a much more conversational videoconference without the expense and bandwidth of presence systems.
The embodiments allow for participants to sit in a circle or in a geometry where participants see each other around a space and they are all seen by the far site equally well. This is done by placing a surround camera in the center of the space along with the displays that show the far site. When the near site participants look at the far site on the displays, the near site camera provides a near eye-to-eye view to the far site since the camera is placed appropriately with the image of the far site.
Obtaining the alignments of the camera and the displays to provide this apparent eye contact result requires meeting a series of different constraints relating to the various sizes and angles of the components and the locations of the participants.
The present invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
Embodiments according to the present invention allow for participants to sit in a circle or in a geometry where participants see each other around a space and they are all seen by the far site equally well. This is done by placing a surround camera in the center of the space along with the displays that show the far site. When participants look at the far site, the near site camera provides near eye-to-eye view to the far site since the camera is placed as close as possible to the image of the far site.
A number of variables are relevant to this description and many are shown graphically in
Desired camera height for eye contact—Hcam
Display size width—Dw
Display size height—Dh
Vertical field of view (FOV) of the camera below horizontal axis—FOV−v
Vertical field of view of the camera above horizontal axis—FOV+v
Occlusion of the local participants from each other by the centrally located camera
How far people sit from the camera
Height of the 360° camera relative to the top edge of the display, so that the camera does not see the tops of the displays—Hdisp
For optimum eye contact with the far site, the camera needs to be positioned close to the display where the far site is shown.
Display angle from vertical, participant's eye is perpendicular to the display—αdisp
The most important factor is the desired camera height Hcam. This can be derived by placing the camera at about an average eye level of an average person.
In practical embodiments, all of these factors come into play. With these factors in mind, it is then best to find the optimum camera height, display height and display tilt angle. Once these quantities are determined, the dimensions of an actual videoconferencing system can be determined. Because of the interrelationships, it is preferred to not allow a user to adjust these alignments so as to provide a “plug and play” experience.
With reference to
To simplify the problem, the following factors were selected to be “known quantities”. These are factors either determined through experiment or left with limited variation or choices due to practical considerations.
Eye level—Heye
This is the eye level of an average person in a sitting position. This is derived through experiment and statistics. It is best to err on the lower end of the eye level distribution to ensure the vast majority of people will not see the camera as an obstruction to the view of the person sitting across the videoconferencing system.
Camera field of view—FOV+v, FOV−v
Generally a particular camera is chosen for other reasons, so that the FOV of the camera is fixed by the choice.
Display dimension—Dw, Dh
Based on view angle, weight, cost, and other practical considerations, these factors usually limit the display choice to a few options, such as a display size of 27 inches or 23 inches (diagonal). An optimal set of (Hcam, Hdisp, αdisp) should best fit the selected display sizes. Note that the physical dimensions vary, even for a given diagonal size, from model to model due to bezel size variations.
Optimum sitting distance—D
This is how far from the display most people will sit. This is determined by target room size, view angle, social considerations (how close people can comfortably sit together), etc.
To solve for the unknown quantities based on the known quantities, the following constraints are applied:
Constraint 1. The camera should be as close to eye level as possible (to maintain good eye contact), though if the camera's FOV+v is sufficiently large, the camera can readily be below eye level.
Hcam≈Heye (Eq. 1)
Constraint 2. The camera should be lower than eye level (to avoid obstruction of people by the camera). As mentioned with regard to Constraint 1, if the camera has a large FOV+v, the camera can be lowered additional amounts as compared to cameras with less FOV+v. This lower position is advantageous because it reduces the angle between the line of sight to the camera and the line of sight to the center of the display, as discussed in Constraint 3.
Hcam<Heye (Eq. 2)
Constraint 3. The angle between the line of sight to the camera and the line of sight to the center of the display should be less than 20°, with smaller angles such as 15°-16° or 10° being advantageous.
θ<20° (Eq. 3)
This constraint is to maintain good eye contact. A study by Milton Chen, “Leveraging the Asymmetric Sensitivity of Eye Contact for Videoconferencing”, Proceedings of the CHI 2002 Conference on Human Factors in Computing Systems, pp. 49-56, Apr. 20-25, 2002, which is hereby incorporated by reference, suggests that the angular separation between the camera and monitor should be less than 10 degrees before people start to notice eye contact issues. The inventors' experience, based on testing of units according to the present invention, indicates that 20° is a more practical upper bound, with designs often employing angles in the 15°-16° range.
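As a rough numerical illustration of Constraint 3 (this sketch and its sample values are assumptions for illustration, not taken from the patent), the angular bound translates into a vertical budget: the farther the display center may sit below the camera while keeping the separation angle under a chosen limit, for a participant at a given sitting distance.

```python
import math

def max_camera_display_gap(d, theta_max_deg, depression_to_camera_deg=0.0):
    """Largest vertical drop from the camera to the display center that
    keeps the camera/display angular separation under theta_max_deg,
    for a participant at horizontal distance d. Assumes the line of
    sight to the camera is at the given depression angle (0 = camera
    at eye level). Units of d carry through to the result."""
    total = math.radians(depression_to_camera_deg + theta_max_deg)
    base = math.radians(depression_to_camera_deg)
    return d * (math.tan(total) - math.tan(base))

# Illustrative sitting distance of 1800 mm: Chen's 10 degree threshold
# versus the 20 degree practical upper bound used in the text.
gap_strict = max_camera_display_gap(1800, 10)   # ~317 mm
gap_loose = max_camera_display_gap(1800, 20)    # ~655 mm
```

This makes concrete why the 20° bound is so much easier to meet than Chen's 10° figure: at the same distance it roughly doubles the allowable vertical separation between camera and display center.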
Constraint 4. The line of sight to the center of the display should be perpendicular to the display. If the display is instead purely vertical, the result is a less pleasing experience, as the display is effectively canted with respect to the participant, which causes distortions in the displayed items.
φ=90° (Eq. 4)
Constraint 5. The display's edge should not be seen by the camera.
This means the angle to the display's outer edge should be outside of the camera's field of view, as shown in
A similar equation could be derived if 3 or 5 monitors are used to form the center of the room displays.
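A minimal sketch of Constraint 5 for the four-display case follows. The plan-view geometry (four displays forming a square, with the farthest top corner at the diagonal) and the numeric values are assumptions for illustration; the patent derives its bound from its own figures.

```python
import math

def min_camera_height_above_display(display_width, fov_below_deg):
    """Smallest Hdisp (camera height above the display top edge) for
    which the farthest top corner of four displays arranged as a
    square in plan view stays below the camera's downward field of
    view (FOV-v), so the camera does not see the display edges."""
    # Farthest top corner: half a width out to the display plane,
    # plus half a width along the display to its outer edge.
    r_corner = math.hypot(display_width / 2, display_width / 2)
    # The corner must lie at a depression angle of at least FOV-v.
    return r_corner * math.tan(math.radians(fov_below_deg))

# Illustrative values only: 655 mm wide displays, 25 degree FOV-v.
h_disp_min = min_camera_height_above_display(655, 25)
```

As the sketch shows, a wider display or a larger downward field of view both push the camera higher above the display edge, which is the lower bound on Hdisp that Eq. 7 captures.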
Given the constraint equations Eqs. 1-5, and taking proper approximations considering
we can solve the unknown quantities as follows
Eq. 6 basically says to place the camera just below eye level. The most important factor is the height of the camera relative to the top edge of the display, Hdisp.
Eq. 7 gives the lower bound of Hdisp based on Constraint 5 (edge of the display not in camera view).
Eq. 8 gives the upper bound of Hdisp based on Constraint 3 (eye contact). Note that the larger Dw and Dh are, the narrower the range of acceptable Hdisp becomes. Therefore it is generally preferred to use a display with a thinner bezel to minimize Dw and Dh for the same viewing area.
Eq. 9 shows that the display tilt angle is a function of Hdisp, Dh and D. Once Hdisp is determined from the range given by Eq. 7-8, the tilt angle can be easily derived.
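The tilt relationship of Eq. 9 can be sketched as follows. This uses a simplifying assumption (not stated in the patent) that the display center sits Dh/2 below the top edge as if the display were flat and vertical; the exact value shifts slightly once the tilt itself moves the display center, so the numbers are illustrative only.

```python
import math

def display_tilt_deg(h_eye, h_cam, h_disp, display_height, d):
    """Tilt from vertical that makes the display perpendicular to the
    seated participant's line of sight (Constraint 4). Approximates
    the display center as Dh/2 below the top edge, which itself is
    h_disp below the camera. Heights and d in consistent units."""
    h_center = h_cam - h_disp - display_height / 2
    # Perpendicularity means the tilt from vertical equals the
    # depression angle of the line of sight to the display center.
    return math.degrees(math.atan2(h_eye - h_center, d))

# Illustrative values only (mm).
alpha = display_tilt_deg(1240, 1062, 225, 435, 1800)
```

The sketch confirms the stated dependence: once Hdisp is chosen from the Eq. 7-8 range, the tilt angle follows directly from Hdisp, Dh and D.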
As can be seen, the video displays 302A-D generally form the sides of an equilateral polyhedron. Four displays are shown but other numbers could be used if desired, with the video displays still generally forming the sides of an equilateral polyhedron. A 360° panoramic camera is preferred to allow full flexibility in the location of the participants, but in other embodiments the camera could just receive images from an axis aligned with the video displays. In the illustrated case the camera would then get four different images, one for each video display, to capture participants looking at the respective video display.
The central column 304 and arms 306A-D over a separate base 310 are preferred, but other configurations can be used. For example, a sheet metal chassis that is in the shape of the equilateral polyhedron could be used, the video displays mounted to the faces or sides of the chassis and the camera mounted to the top. The chassis could sit on a table or be floor standing, as desired. Numerous other structures could be used to hold the video displays and camera in the determined locations, as dictated by the aesthetics desired by the designers.
In a preferred embodiment of a floor mounted or standing videoconferencing system 300 the camera height is 1062 mm from the ground, the screen dimensions are 655 mm by 435 mm at a αdisp of 15°, with the Hdisp value being 225 mm and the participant seated 1800 mm away from the display with his eye at 1240 mm above the ground. With this embodiment the camera height from the ground can vary by 10 mm, from 1057 mm to 1067 mm and the user distance from the screen can vary from 450 mm to 2500 mm, with the eye height varying between 1140 mm and 1340 mm and still provide acceptable results.
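Under a simplified flat-display geometry (an assumption made here for illustration; the patent's figures account for the display tilt), the nominal embodiment numbers above can be sanity-checked against the 20° upper bound of Constraint 3:

```python
import math

# Nominal embodiment values from the text, in mm.
H_CAM, H_DISP, D_H = 1062, 225, 435   # camera height, Hdisp, display height
H_EYE, DIST = 1240, 1800              # participant eye height and distance

center = H_CAM - H_DISP - D_H / 2         # approximate display center height
to_cam = math.atan2(H_EYE - H_CAM, DIST)  # depression angle to the camera
to_ctr = math.atan2(H_EYE - center, DIST) # depression angle to display center
theta = math.degrees(to_ctr - to_cam)     # camera/display separation angle
ok = theta < 20                           # Constraint 3 upper bound
```

Under this approximation the separation angle comes out in the low teens of degrees, comfortably inside the 20° bound and broadly consistent with the 15°-16° design range discussed above.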
Different numbers of imagers can be used in either a panoramic camera or a camera having views only over the video displays. The camera 602 is connected to a CPU/DSP complex 606. The included CPUs and DSPs provide the processing components to form the panoramic image from the images, encode the image and any audio for transmission using industry standard formats, decode any received image for display, encode any local audio and decode any audio from the far site. A memory 618 holds the necessary software programs and working memory needed for the CPUs and DSPs. A microphone 608 is used to receive the local speech signals. A simple microphone is shown for explanatory purposes, it being understood that many different microphone arrangements could be used, such as linear arrays, circular arrays and the like. An amplifier 610 receives the analog audio output developed by decoding the far site audio and drives a speaker 612 so that participants can hear the audio of the far site. A network adapter 614 is connected to the CPU/DSP complex 606 to provide the interface to connect to a far site videoconferencing unit. Typically the network adapter 614 is an Ethernet adapter configured to use a TCP/IP connection but other types can be used as is well known. The CPU/DSP complex 606 is connected to video displays 616A-D to provide the images from the far site. In preferred embodiments the video displays 616A-D are touch screens to allow easier participant control, though it is noted that when a participant is actually using the touch screen, the participant may be too close to the video display to allow eye contact at the far site. When the participant returns to a normal position after completing operations on the touch screen, the participant will have returned to a position providing eye contact. The images can be presented in various formats. For example, each video display 616A-D could show a panoramic strip and a single large window for the current speaker.
Alternatively, each video display 616A-D could show a composite of the four images directly in front of the far site video displays, if a similar videoconferencing unit is present at the far site. Other arrangements and formats can be done if desired and are familiar to those skilled in the art.
With this it is shown that both table top and floor standing videoconference units can be developed that provide eye contact to the far site by following the described procedures.
Embodiments of the invention include:
A near site videoconferencing unit for use with a far site videoconferencing unit, the near site videoconferencing unit comprising: a plurality of rectangular video displays having a predetermined height and width arranged generally vertically and generally forming the sides of an equilateral polyhedron, each video display having a predetermined angle from vertical, the plurality of video displays for displaying video provided from the far site videoconferencing unit; a camera receiving images from at least a plurality of views equal to the plurality of video displays, the camera for providing video images of the near site to the far site videoconferencing unit; and a central structure to which the plurality of video displays are mounted to fix the individual video displays at the predetermined angle from vertical and in the generally equilateral polyhedron configuration and to which the camera is mounted so that the camera is mounted above the plurality of video displays, the central structure having a relationship to a floor of a room so that the camera is a predetermined distance from the floor, wherein the relationship between the height of the camera and the height, width and angle from vertical of the plurality of video displays is such that a participant whose eyes are within a predetermined distance from the camera and a predetermined distance from the floor is perceived by participants at the far site as making eye contact.
The near site videoconferencing unit of embodiment 1, wherein the predetermined distance from the camera, the predetermined distance from the floor, the height of the camera and the height and angle from vertical of the plurality of video displays are such that the angle between the camera and the center of an individual video display with reference to the participant's eyes is less than 10 degrees.
The near site videoconferencing system of embodiment 1, wherein the near end videoconferencing unit is configured to be located on the floor and the predetermined distance of the participant's eyes from the floor is a distance attained in a seated position.
The near site videoconferencing system of embodiment 1, wherein the near end videoconferencing unit is configured to be located on a table, which is in turn located on the floor, the predetermined distance of the participant's eyes from the floor is a distance attained in a seated position and the predetermined distance of the participant's eyes from the camera is attained when the participant is seated at the table.
The near site videoconferencing system of embodiment 1, further comprising: an electronics module coupled to the plurality of video displays and the camera and for connection to the far site videoconferencing unit, the electronics module converting the images from the camera into a videoconferencing format signal for transmission to the far site videoconferencing unit and converting a received videoconferencing format signal from the far site videoconferencing unit for display by the plurality of video displays.
The near site videoconferencing system of embodiment 1, wherein the camera is a 360 degree panoramic camera.
The near site videoconferencing system of embodiment 1, wherein the predetermined distance from the camera and the predetermined distance from the floor of the participant's eyes are each ranges of distances and distance values within each range allow the far site participants to perceive the near site participant as making eye contact.
The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this disclosure. The scope of the invention should therefore be determined not with reference to the above description, but instead with reference to the appended claims along with their full scope of equivalents.
This application is a continuation of U.S. patent application Ser. No. 15/354,404, entitled “Method and Design for Optimum Camera and Display Alignment of Center of the Room Video Conferencing Systems,” filed Nov. 17, 2016, which is a continuation of U.S. patent application Ser. No. 14/872,817, entitled “Method and Design for Optimum Camera and Display Alignment of Center of the Room Video Conferencing Systems,” filed Oct. 1, 2015, both of which are hereby incorporated by reference. This application is related to U.S. patent application Ser. No. 29/539,282, entitled “Videoconferencing Unit,” filed Sep. 11, 2015, which is hereby incorporated by reference.
Number | Name | Date | Kind |
---|---|---|---|
5801919 | Griencewic | Sep 1998 | A |
5808663 | Okaya | Sep 1998 | A |
5890787 | McNelley | Apr 1999 | A |
7746373 | Martini | Jun 2010 | B2 |
7852369 | Cutler | Dec 2010 | B2 |
7920159 | Ueno | Apr 2011 | B2 |
8044990 | Kawaguchi | Oct 2011 | B2 |
8970655 | Paripally | Mar 2015 | B2 |
9615054 | McNelley | Apr 2017 | B1 |
20020018140 | Suemoto | Feb 2002 | A1 |
20030122957 | Emme | Jul 2003 | A1 |
20040114032 | Kakii et al. | Jun 2004 | A1 |
20050024485 | Castles | Feb 2005 | A1 |
20080151053 | Ishii | Jun 2008 | A1 |
20080260131 | Akesson | Oct 2008 | A1 |
20100118112 | Nimri | May 2010 | A1 |
20120162356 | Van Bree et al. | Jun 2012 | A1 |
20140104244 | Baldwin | Apr 2014 | A1 |
20140198172 | Wessling | Jul 2014 | A1 |
20160054756 | Lan | Feb 2016 | A1 |
20160353058 | Caviedes | Dec 2016 | A1 |
20170031434 | Files | Feb 2017 | A1 |
Entry |
---|
Milton Chen, “Leveraging the Asymmetric Sensitivity of Eye Contact for Videoconferencing”, Proceedings of the CHI 2002 Conference on Human Factors in Computing Systems, pp. 49-56, Apr. 20-25, 2002. |
European Search Report received in copending EPO Application No. 16852713.3 dated Apr. 4, 2019, 12 pages. |
Number | Date | Country | |
---|---|---|---|
20190158782 A1 | May 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15354404 | Nov 2016 | US |
Child | 16037984 | US | |
Parent | 14872817 | Oct 2015 | US |
Child | 15354404 | US |