This application is directed, in general, to providing images for display, such as providing images for a videoconferencing terminal.
This section introduces aspects that may be helpful in facilitating a better understanding of the disclosure. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.
Communication via computer networks frequently involves far more than transmitting text. Computer networks, such as the Internet, can also be used for audio communication and visual communication. Still images and video are examples of visual data that may be transmitted over such networks.
One or more cameras may be coupled to a computing device, such as a personal computer (PC), to provide visual communication. The camera or cameras can then be used to transmit real-time visual information, such as video, over a computer network. Dual transmission can be used to allow audio to be transmitted along with the video information. Whether in one-to-one communication sessions or through videoconferencing with multiple participants, participants can communicate via audio and video in real time over a computer network (i.e., voice-video communication).
One aspect provides an apparatus. In one embodiment, the apparatus includes: (1) an audio source identifier configured to locate an audio source based on multimodal sensor data from at least two different types of sensors and (2) an image selector configured to automatically direct a camera to view the audio source.
In another aspect, a method of directing a camera to view an audio source is disclosed. In one embodiment, the method includes: (1) locating an audio source based on multimodal sensor data from at least two different types of sensors and (2) automatically directing a camera to view the audio source.
In yet another aspect, a video conferencing terminal is provided. In one embodiment, the video conferencing terminal includes: (1) a camera configured to capture images within a field of view and (2) an audio source locator and tracker configured to locate an audio source based on multimodal sensor data from at least two different types of sensors and automatically direct the camera to view the audio source.
Reference is now made to the following descriptions of embodiments, provided as examples only, taken in conjunction with the accompanying drawings, in which:
The disclosure provides a locating and tracking scheme that employs sensor data from multiple types of sensors (i.e., multimodal sensor data) to locate and track audio sources. The disclosure provides an apparatus for locating and tracking a single audio source or multiple audio sources and directing a camera to capture an image, or images, of the located and tracked audio source. Locating an audio source enables pointing a camera thereat even when there may be multiple audio sources in the vicinity. Tracking an audio source enables directing the camera to follow the audio source as it moves.
A video conferencing terminal may employ the disclosed locating and tracking functionality. Accordingly, the audio source to locate and track may be a participant of a video conference who is speaking. In a video-conferencing scenario where there are multiple persons in a meeting room, detecting which participant is speaking and targeting the camera on that participant so that a remote location receives the image of the active speaker can be a challenge. A video conferencing terminal with the locating and tracking functionality disclosed herein allows a person at a location remote from the camera to view the participant who is speaking without having to manually steer the camera to stay on the speaker.
As such, a video conference terminal disclosed herein may include speaker localization that allows pointing a camera at the speaker even when there are multiple persons seated around a meeting table. Additionally, the video conferencing terminal may include speaker tracking that allows following the speaker who is not static but is moving around. An example of this case would be when the speaker gets up and starts walking towards a whiteboard.
The locating and tracking functionality disclosed herein may combine audio, video and other sensors, such as thermal and ultrasonic sensors, to locate and track an audio source. In contrast, speaker localization schemes that use only audio (sound source localization) to locate speakers may be prone to errors from background noise and may fail when there are multiple simultaneous speakers. Thus, the disclosure combines sensor data, such as sound source localization with thermal and ultrasonic measurements, to increase accuracy when pointing a camera. The combination of the various types of sensors provides sensor data fusion, which is an algorithmic combination of multimodal sensor inputs, i.e., combining data from not just multiple sensors but also different types of sensors. The combination of the thermal and ultrasonic sensors enables the detection of a person even when the person is not speaking. This is advantageous over audio-only methods (which cannot detect a person who is quiet) and video methods such as face detection (where detection can fail due to occlusions or rotation of a target face away from the camera). The sensors that are employed may be mounted on a locating and tracking apparatus, such as a video conferencing terminal. In addition, information from other sensors mounted on the walls, ceiling or furniture may be used for sensor data fusion.
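By way of illustration only, the following sketch shows one way such a fusion of sound, thermal and ultrasonic data might be scored; the data structure, weights and thresholds are assumptions made for the example and are not part of the embodiments described herein.

```python
# Illustrative sketch only: fuse a sound-source-localization (SSL) confidence
# with thermal and ultrasonic readings taken at candidate directions. The
# fields, weights and thresholds below are hypothetical.

from dataclasses import dataclass


@dataclass
class SensorReading:
    angle_deg: float        # candidate direction relative to the terminal's home position
    ssl_confidence: float   # 0..1 confidence from sound source localization
    temperature_c: float    # average temperature measured in the sensor cone
    distance_m: float       # ultrasonic range-finder reading


def fuse_readings(readings, background_temp_c, person_delta_c=1.5, max_range_m=5.0):
    """Score each candidate direction by combining the three modalities.

    A direction scores highly when speech is heard there (SSL), the thermal
    reading exceeds the room background (a person is present), and the range
    finder reports an object within conversational distance.
    """
    best = None
    for r in readings:
        thermal_score = 1.0 if r.temperature_c >= background_temp_c + person_delta_c else 0.0
        range_score = 1.0 if 0.3 <= r.distance_m <= max_range_m else 0.0
        # Weighted combination; audio dominates, thermal/ultrasonic confirm.
        score = 0.6 * r.ssl_confidence + 0.25 * thermal_score + 0.15 * range_score
        if best is None or score > best[0]:
            best = (score, r.angle_deg)
    return best  # (score, angle) of the most likely audio source, or None
```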
In one embodiment, the video conferencing terminal 200 may be implemented as a single device, such as illustrated in
The camera 210 is configured to capture images. The camera 210 may be a video camera, such as a webcam. Additionally, the camera 210 can be used for locating and tracking audio sources such as, for example, individuals who are speaking during a video conference. Accordingly, the camera 210 has pan, tilt and zoom capabilities that allow the camera 210 to dynamically capture images of located and tracked audio sources. The camera 210 may include pan and tilt servos to view a located and tracked audio source. To view an audio source, the camera 210 is manipulated so that a field of view thereof includes the audio source. In some embodiments, the video conferencing terminal 200 itself may move to allow the camera 210 to view a located or tracked audio source. Accordingly, the video conferencing terminal 200 may include pan and tilt servos that move the video conferencing terminal 200 to view an audio source. As such, the pan and tilt servos may be located in a base of the camera 210 or in a base of the video conferencing terminal 200. In addition to pan and tilt capability, the camera 210 may include the ability to zoom-in and zoom-out.
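For illustration, and assuming a hypothetical servo interface (pan_to and zoom_to are not part of the camera 210 described above), directing the camera toward a located source might look like the following sketch.

```python
# Hypothetical sketch of steering a pan/tilt/zoom camera so that a located
# audio source is brought within its field of view. The camera interface
# (pan_to, zoom_to) is assumed for illustration only.

import math


def direct_camera(camera, source_angle_deg, source_distance_m,
                  max_fov_deg=60.0, subject_width_m=1.0):
    """Pan toward the source, then zoom so that roughly a head-and-shoulders
    width (subject_width_m) fills the frame at the measured distance."""
    # Pan so the located source sits at the center of the field of view.
    camera.pan_to(source_angle_deg)

    # The farther the source, the narrower the field of view needed to keep
    # the subject a constant apparent size; zoom_to is assumed to accept a
    # horizontal field of view in degrees.
    desired_fov = 2.0 * math.degrees(
        math.atan2(subject_width_m / 2.0, max(source_distance_m, 0.5)))
    camera.zoom_to(min(max_fov_deg, desired_fov))
```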
The display 220 may be a conventional display, such as a flat panel display, that presents a view based on input data. In one embodiment, the display 220 may be a liquid crystal display (LCD). The display 220 is coupled to the audio source locator and tracker 240. Conventional audio-video cable may be used to couple the devices together. Wireless connections may also be employed. In some embodiments, the display 220 may be a stand-alone, projector display.
The locating and tracking sensors 230 include multiple types of sensors for locating and tracking an audio source. The various types of sensors are used to provide multimodal sensor data for audio source locating and tracking. The locating and tracking sensors 230 may include a sound sensor 232, a thermal sensor 234 and a distance sensor 236. The locating and tracking sensors 230 may include an additional sensor or sensors as represented by the component 238.
The sound sensor 232 may be a microphone or multiple microphones that are configured to generate an audio signal based on acoustic energy received thereby. As such, the sound sensor 232 may be used to locate the audio source based on audio. In some embodiments, an array of microphones may be used. In one embodiment, stereo microphones may be used.
The thermal sensor 234 is configured to detect an audio source based on temperature. In one embodiment, the thermal sensor 234 may measure the average temperature sensed in a cone of a given angle. The cone may be in a range between about 10 degrees and about 35 degrees. The average temperature may be obtained as a background temperature of a location, such as a room, without a person present. The average temperature can then be used as a reference. When a person steps into the purview (i.e., the cone) of the thermal sensor 234, such as a speaker during a video conference, the measured temperature would be higher than the background temperature. The distance of the person from the thermal sensor 234 can be determined from the temperature measured when the person is present. The distance may be determined based on a corresponding range of expected temperature values. The corresponding ranges may be stored in a memory associated with a controller of the video conferencing terminal 200. In one embodiment, the thermal sensor 234 may be a conventional thermal sensor.
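A rough illustration of the stored ranges follows; the temperature bands and distances are invented for the example and would in practice be calibrated for the particular thermal sensor.

```python
# Illustrative only: map a measured cone temperature to a coarse distance
# estimate using stored ranges of expected values. The bands below are
# invented for the example, not calibrated sensor data.

DISTANCE_BANDS = [
    # (min_rise_above_background_C, max_rise_C, approx_distance_m)
    (3.0, float("inf"), 0.5),
    (2.0, 3.0, 1.0),
    (1.0, 2.0, 2.0),
    (0.5, 1.0, 3.0),
]


def estimate_distance(measured_temp_c, background_temp_c):
    """Return an approximate distance to a person, or None if the reading
    is indistinguishable from the room's background temperature."""
    rise = measured_temp_c - background_temp_c
    for low, high, distance_m in DISTANCE_BANDS:
        if low <= rise < high:
            return distance_m
    return None
```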
The thermal sensor 234 may include multiple thermal sensors or thermal detecting devices. In one embodiment, the thermal sensor 234 may include an array of thermal sensing devices. The multiple thermal sensing devices may be distributed around a rotating portion of the videoconferencing terminal 200. As such, a map of a room can be provided with a scan of a smaller angular range for the video conferencing terminal 200.
The distance sensor 236 obtains and provides data on the distance of objects from the distance sensor 236. As such, the distance sensor 236 may be a conventional range finder. Accordingly, the distance sensor 236 may also be configured to detect movement towards and away therefrom. In one embodiment, the distance sensor 236 may be an ultrasonic range finder. An ultrasonic range finder with an accuracy of up to about one inch can be used. Other types of range finders besides an acoustic range finder, such as an optical or radar-based range finder, may also be used.
The distance sensor 236 may also include multiple distance sensing devices such as range finders. In one embodiment, the distance sensor 236 may include an array of distance sensing devices. The multiple distance sensing devices may be distributed around the rotating portion of the videoconferencing terminal 200 to allow mapping of a room employing a smaller angular scan. Thus, compared to having just a single sensor, the videoconferencing terminal 200 would not have to make a larger scan of the room (e.g., 360 degrees) to obtain a map of the people in the room.
The additional sensor 238 may be yet another type of sensor used to collect data for locating and tracking an audio source. The additional sensor 238 may be a video-based sensor that is used to detect movement of an audio source. As such, the additional sensor 238 may be a motion detector in one embodiment. In other embodiments, the additional sensor 238 may be another type of sensor (e.g., another type of conventional sensor) that may be employed to collect and provide data for locating and tracking an audio source.
The audio source locator and tracker 240 is configured to locate and track an audio source and direct the camera 210 to view the located and tracked audio source. The audio source locator and tracker 240 performs the locating and tracking based on multimodal sensor data received from multiple types of sensors. The audio source locator and tracker 240 may be embodied as a processor with an associated memory that includes a series of operating instructions that direct the operation of the processor when initiated thereby. In some embodiments, the audio source locator and tracker 240 may be implemented as dedicated hardware or a combination of dedicated hardware and software. When embodied as a processor, the functions of the audio source locator and tracker 240 may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included. In one embodiment, the audio source locator and tracker 240 may be implemented as part of the controller of the video conferencing terminal 200.
The audio source locator and tracker 240 includes an audio source identifier 244 and an image selector 248. The audio source identifier 244 is configured to locate an audio source based on multimodal sensor data from the locating and tracking sensors 230. The image selector 248 is configured to automatically direct the camera 210 to view the audio source. In some embodiments, the audio source identifier 244 is further configured to locate potential audio sources based on at least some of the multimodal sensor data. The image selector 248 may also be configured to generate a map of the potential audio sources. The location of the potential audio sources may be mapped with respect to a location of the camera 210, the locating and tracking sensors 230 or the video conferencing terminal 200 itself. In one embodiment, the map may be pre-determined before locating the audio source. In other embodiments, the map may be dynamically determined when locating the audio source.
The video conferencing terminal 300 can generate the map 350 before a video conference even begins. With use of mechanical motion, an initial scan of the conference room may be performed to pre-determine the proximate locations of participants in the room. Knowing the proximate locations of the participants can assist the video conferencing terminal 300 in making intelligent decisions about the location of actual audio sources during a video conference.
The video conferencing terminal 300 may make an initial scan of the room and infer from thermal and distance information where the participants are located relative to a position of the video conferencing terminal 300. In some embodiments, the scan may be 360 degrees. In other embodiments, the scan may be less than 360 degrees, such as when the videoconferencing terminal 300 has multiple sensors of the same type. In another embodiment, the video conferencing terminal 300 may determine the positions of the participants as a video conference progresses using the directions (e.g., a radial angle with respect to a “home” position of the video conferencing terminal 300) where speech and participants are detected. Both of these methods allow the video conferencing terminal 300 to form and maintain a map of the participants in the room as illustrated by the map 350.
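A minimal sketch of such an initial scan, assuming hypothetical sensor and servo interfaces on the terminal, is given below.

```python
# Hypothetical sketch of the initial room scan: step the terminal through a
# set of pan angles, read the thermal and range sensors at each step, and
# record angles where a person appears to be present. The terminal interface
# (pan_to, thermal_sensor, range_finder) and thresholds are assumptions.

def scan_room(terminal, background_temp_c, step_deg=15, sweep_deg=360,
              person_delta_c=1.5, max_range_m=5.0):
    participants = []  # list of (angle_deg, distance_m) relative to the home position
    for angle in range(0, sweep_deg, step_deg):
        terminal.pan_to(angle)
        temp = terminal.thermal_sensor.read_average_c()
        dist = terminal.range_finder.read_m()
        if temp >= background_temp_c + person_delta_c and dist <= max_range_m:
            participants.append((angle, dist))
    terminal.pan_to(0)  # return to the home position
    return participants
```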
The video conferencing terminal 300 includes an audio source locator and tracker. Additionally, the video conferencing terminal 300 may include a camera, various types of sensors, and a display. A field of view for a camera of the video conferencing terminal 300 is denoted in
The track state 410 is maintained when the angle θSSL is zero and the temperature T is equal to the threshold temperature TP. Accordingly, tracking a located audio source can be performed without detecting speech. If the angle θSSL is greater than zero, or there is silence (i.e., no speech detected) or the measured temperature T is less than the threshold temperature TP, then the wait state 420 is entered. At the wait state 420, a timer is initiated. The timer may be set based on experience. Different times may be established for the timer based on desired sensitivity levels or based on different locations. The timer may be set during manufacturing or may be set by an end user employing a user interface. A display may provide a user interface to set the timer.
The wait state 420 is maintained as long as the angle θSSL is greater than zero, the measured temperature T is less than the threshold temperature TP and the timer is greater than zero. Additionally, the wait state is maintained when there is silence, the timer is greater than zero and the measured temperature T is equal to the threshold temperature TP.
From the wait state 420, all of the other states may be entered depending on the status of the various conditions. If there is silence, the timer equals zero and the measured temperature T is less than the threshold temperature TP, then the idle state 430 is entered from the wait state 420. Upon reaching the idle state 430, the video conferencing terminal can move to either the search state 440 or the track state 410 depending on the angle θSSL and the measured temperature T. If the angle θSSL is greater than zero and the measured temperature T is less than the threshold temperature TP, then the search state 440 is entered. If the angle θSSL is equal to zero and the measured temperature T is equal to the threshold temperature TP, then the track state 410 is entered. Thus, even if speech is not detected, the video conferencing terminal may move from the idle state 430.
If the timer is equal to zero and the angle θSSL is equal to zero, then the video conferencing terminal moves from the wait state 420 to the track state 410. Additionally, if the timer is equal to zero and the angle θSSL is greater than zero, then the video conferencing terminal moves from the wait state 420 to the search state 440. The search state 440 is maintained when the angle θSSL is greater than zero and the measured temperature T is less than the threshold temperature TP. When the angle θSSL is equal to zero and the measured temperature T is equal to the threshold temperature TP, then the track state 410 is entered from the search state 440. At the search state 440, servos are activated to move the sensors to locate an audio source.
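The transitions described above can be summarized in the following sketch; it treats the comparison of T to the threshold TP as "at or above" and simplifies the wait-state conditions to "remain waiting while the timer is running", which are interpretations made for illustration rather than limitations of the state diagram.

```python
# Schematic rendering of the track/wait/idle/search transitions described
# above. Inputs: the sound-source angle offset theta_ssl, a silence flag, the
# measured temperature t versus the threshold t_p, and a countdown timer.
# This is a sketch of the described logic, not production tracking code.

TRACK, WAIT, IDLE, SEARCH = "track_410", "wait_420", "idle_430", "search_440"


def next_state(state, theta_ssl, silent, t, t_p, timer):
    if state == TRACK:
        if theta_ssl > 0 or silent or t < t_p:
            return WAIT          # source lost or off-axis: start waiting
        return TRACK
    if state == WAIT:
        if timer > 0:
            return WAIT          # keep waiting until the timer runs out
        if silent and t < t_p:
            return IDLE          # nothing heard and no person sensed
        return TRACK if theta_ssl == 0 else SEARCH
    if state == IDLE:
        if theta_ssl == 0 and t >= t_p:
            return TRACK
        if theta_ssl > 0 and t < t_p:
            return SEARCH        # speech heard off-axis: go look for it
        return IDLE
    if state == SEARCH:
        if theta_ssl == 0 and t >= t_p:
            return TRACK         # source centered and person detected
        return SEARCH
    return state
```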
The display 510 may be a conventional display that is configured to provide images for viewing. The display 510 may provide images from a remote location for the video conference. The display 510 may also be configured to provide a user interface. The user interface may include menus activated by touch or by a coupled keyboard, mouse, etc., via the coupling interface. The user interface may allow a user to program various settings for the video conferencing terminal 500 or adjust the picture of the display 510.
The sound sensor 520 is configured to detect sound. The sound sensor 520 includes stereo microphones. The thermal sensor 530 is used to detect heat and the range finder 540 is used to determine distance. Each of these sensors may be conventional devices. In one embodiment, the range finder 540 may be an ultrasonic sensor. These sensors provide the multimodal sensor data that is used by an audio source locator and tracker (not illustrated) of the video conferencing terminal 500 to locate and track audio sources.
The camera 550 is configured to capture images and the speaker 560 is configured to provide audio. The camera 550 and the speaker 560 may be conventional devices that are employed with video conferencing systems.
The base 570 is configured to support the components of the video conferencing terminal 500. The base 570 is configured to sit on top of a table for a video conference. The base 570 includes servos to rotate and tilt the video conferencing terminal 500. As illustrated, the base 570 may rotate the video conferencing terminal 500 by 360 degrees and tilt the video conferencing terminal 500 by 45 degrees.
In a step 610, a map of potential audio sources is generated based on multimodal sensor data. The data may be provided by multiple sensors or different types of sensors. For example, a thermal sensor and a range finder may be used to provide the multimodal sensor data. The map may be generated with the potential audio sources positioned with respect to a video conferencing terminal or a camera of the video conferencing terminal.
In a step 620, an audio source is located based on multimodal sensor data from at least two different types of sensors. The map may be used to assist in locating the audio source. In addition to the thermal sensor and the range finder, a sound sensor may also be employed to provide the multimodal sensor data. In some embodiments, other types of sensors may also be used to provide multimodal sensor data.
A camera is automatically directed to view the audio source in a step 630. The camera is moved such that the audio source is within the field of view of the camera. The camera may also be directed to zoom-in or zoom-out.
In a step 640, the audio source is tracked. Multimodal sensor data may be used to track the audio source. Tracking may be performed according to the state diagram illustrated in
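Pulled together, and reusing the helper functions from the earlier sketches, the four steps of the method 600 might be arranged as in the sketch below; the terminal methods invoked here (read_sensors_at, in_conference, read_tracking_inputs, activate_search_servos) and the person-detection threshold are assumptions made purely for illustration.

```python
# Hypothetical end-to-end sketch of the method: build the map (step 610),
# locate the audio source (step 620), direct the camera (step 630), then
# track it (step 640). Reuses scan_room, fuse_readings, direct_camera,
# next_state, TRACK and SEARCH from the earlier sketches.

def run_method_600(terminal, background_temp_c):
    # Step 610: generate a map of potential audio sources with an initial scan.
    participant_map = scan_room(terminal, background_temp_c)

    # Step 620: locate the active audio source from multimodal sensor data,
    # using the map to restrict the candidate directions.
    readings = [terminal.read_sensors_at(angle) for angle, _ in participant_map]
    best = fuse_readings(readings, background_temp_c)
    if best is None:
        return
    _, source_angle = best

    # Step 630: automatically direct the camera to view the located source.
    distance_m = dict(participant_map).get(source_angle, 2.0)
    direct_camera(terminal.camera, source_angle, distance_m)

    # Step 640: track the source as it moves, per the state transitions above.
    state = TRACK
    threshold_temp_c = background_temp_c + 1.5  # assumed person-detection threshold
    while terminal.in_conference():
        theta_ssl, silent, temp_c, timer = terminal.read_tracking_inputs()
        state = next_state(state, theta_ssl, silent, temp_c, threshold_temp_c, timer)
        if state == SEARCH:
            terminal.activate_search_servos()
```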
A person of skill in the art would readily recognize that steps of various above-described methods, including method 600, can be performed by programmed computers. For example, an audio source locator and tracker may be employed to work with other components of a video teleconferencing terminal to perform the steps of the method 600. Herein, some embodiments are also intended to cover program storage devices, e.g., digital data storage media, which are machine or computer readable and encode machine-executable or computer-executable programs of instructions, wherein said instructions perform some or all of the steps of said above-described methods. The program storage devices may be, e.g., digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media. The embodiments are also intended to cover computers programmed to perform said steps of the above-described methods.
Those skilled in the art to which the application relates will appreciate that other and further additions, deletions, substitutions and modifications may be made to the described embodiments. Additional embodiments may include other specific apparatus and/or methods. The described embodiments are to be considered in all respects as only illustrative and not restrictive. In particular, the scope of the invention is indicated by the appended claims rather than by the description and figures herein. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/388,149, filed by Hock M. Ng on Sep. 30, 2010, entitled “TECHNIQUE FOR VIDEOCONFERENCING INCLUDING SPEAKER LOCALIZATION AND TRACKING,” and incorporated herein by reference in its entirety. This application also relates to commonly assigned co-pending U.S. patent application Ser. No. 12/759,823, filed on Apr. 14, 2010, and U.S. patent application Ser. No. 12/770,991, filed on Apr. 30, 2010, both of which are incorporated herein by reference in their entirety.