The invention relates to an image acquisition system and in particular to an image acquisition system in which a selected area of an image is optimized.
The use of video conferencing or video telephony, which allows remote parties to both see and hear one another, is becoming increasingly popular. As used herein, “video telephony” refers to communications using both video and audio transmitted over a communications network. Such applications facilitate remote communications by providing a visual image of each conference participant. Accordingly, video conferencing allows parties to communicate audibly and visibly, without requiring lengthy and expensive travel.
In a typical video telephony application, a camera is positioned to obtain an image of each of the participants at each endpoint of the communication. The image of a participant at one endpoint is then provided to a participant at another endpoint, so that each participant is viewing the other during the communication session. The video telecommunications interaction can include two or more endpoints, and each endpoint can include more than one participant.
The image that is transmitted during a video conference is often of inferior quality. A number of factors can contribute to the inferior quality of transmitted images. For example, contrast and color saturation levels may be incorrectly set at the transmitting end. In addition, the amount of data that is used to describe an image is often limited, for example due to transmission bandwidth constraints. Furthermore, these limitations on image quality are often exacerbated by poor lighting conditions.
Video cameras that support backlight compensation can remove peaks in image intensity. Such backlight compensation may operate by equalizing the overall image histogram to remove those peaks. Although such techniques can be effective at providing an image having improved overall quality, they do not specifically act to improve the quality of those portions of the image that correspond to the face of an imaged participant. Accordingly, the area of the image corresponding to the face of a participant in a video conference may remain of relatively low quality.
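For purposes of illustration only, the following sketch shows global histogram equalization of the general kind such backlight compensation may employ, using the OpenCV library. This is a generic illustration rather than any particular camera's implementation, and the input file name is a placeholder:

```python
import cv2
import numpy as np

def equalize_frame(frame_bgr: np.ndarray) -> np.ndarray:
    """Equalize the luminance histogram of a BGR frame.

    This flattens intensity peaks across the entire image; it does not
    treat the face region any differently from the background.
    """
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])  # equalize the Y (luma) channel only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

frame = cv2.imread("scene.png")  # placeholder input image
compensated = equalize_frame(frame)
```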
In order to allow a camera to provide an image that is centered on the face of a participant, face tracking capabilities are available. In a system that provides automatic face tracking, the camera zooms in on a detected face and attempts to make the face dominate the image, thus reducing the effect of the surrounding environment. Although such systems can be effective at following a participant moving around a scene, image information related to background objects is described using the same image parameters as those portions of the image comprising the tracked face. As a result, the portion of the image corresponding to the tracked face can be of lower quality than desired. In particular, because there is a fixed amount of image detail that can be encoded, and because the same range of image parameters is devoted to background information as to the face of the participant, a portion of the finite image information is consumed describing the relatively unimportant background portions of the image.
The present invention is directed to solving these and other problems and disadvantages of the prior art. In accordance with embodiments of the present invention, the portion of an image corresponding to the face of a person is optimized. Such optimization is done with the recognition that background information, that is, any portion of the image that does not comprise the face of a participant, will have poorer image quality as a result. In this way, the limited information available for describing an image is unevenly allocated between portions of the image corresponding to a participant's face and other areas of the image, improving the quality of the image in the significant areas.
In accordance with embodiments of the present invention, an image of a scene is obtained. Face tracking technology is then used to identify the portion or portions of the image that contain the face of a person. The area or areas corresponding to those portions of the image that contain the face of a person are then optimized. Such optimization may include allocating a greater number or range of available parameter values to the image information included in the area corresponding to the participant's face than are allocated to other areas of the image. Examples of such parameters include an available range of colors, an available range of brightness levels, an available amount of resolution, and an available range of contrast.
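For purposes of illustration only, one simple way such a differential allocation might be realized is sketched below, assuming a face mask is already available: the background is requantized to a coarse set of brightness levels while the face region is left at full precision. The function name and default level counts are illustrative assumptions:

```python
import numpy as np

def allocate_levels(image: np.ndarray, face_mask: np.ndarray,
                    face_levels: int = 256, bg_levels: int = 8) -> np.ndarray:
    """Spend more quantization levels on the face region than on the background.

    image:     H x W x 3 uint8 frame
    face_mask: H x W boolean array, True where a face was detected
    """
    out = image.copy()
    # Requantize the background to a coarse set of levels, leaving the
    # face region at full precision.
    step = 256 // bg_levels
    coarse = (image // step) * step + step // 2
    out[~face_mask] = coarse[~face_mask]
    if face_levels < 256:  # optionally coarsen the face region as well
        fstep = 256 // face_levels
        fine = (image // fstep) * fstep + fstep // 2
        out[face_mask] = fine[face_mask]
    return out
```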
In accordance with further embodiments of the present invention, a system providing for dynamic video equalization of images using face tracking is provided. The system may include an imaging camera for obtaining an image of a scene. The system may further include a processor capable of implementing a face tracking application. Furthermore, the processor is generally capable of selectively optimizing an image based on output from the face tracking application. More particularly, a greater number or range of available image parameters can be dedicated to describing portions or areas of the image that correspond to the face of a person than to background areas. In accordance with still other embodiments of the present invention, a video output device or display may be provided for reproducing dynamically equalized images.
Additional features and advantages of the present invention will become more readily apparent from the following discussion, particularly when taken together with accompanying drawings.
With reference now to FIG. 1, an exemplary video conferencing system 100 in accordance with embodiments of the present invention is depicted. In general, the video conferencing system 100 includes an audio transceiver 108, comprising a speaker 109 and a microphone 110, a display 112, and a camera 116 positioned to obtain an image of a scene that includes the face 124 of a first participant 104.
The video conferencing system 100 may additionally provide a user selection input device 132 associated with a processor 136, in order to control aspects of the operation of the video conferencing system 100. Furthermore, the processor 136 can operate to process image information obtained by the camera 116 to enhance those portions of the image that include or comprise the face 124 of the first participant 104, as described in greater detail elsewhere herein.
With reference now to FIG. 2, components of a video conferencing system 100 in accordance with embodiments of the present invention are depicted in block diagram form. In general, the video conferencing system 100 comprises a plurality of video conference locations 204 interconnected to one another across a communication network 212.
Each video conference location 204 in accordance with embodiments of the present invention may include an audio transceiver 108 comprising a speaker 109 and microphone 110. In addition, each video conference location 204 may include a display 112, camera 116 and a selection input device 132. In addition, the devices or components associated with a video conference location 204 may be interconnected to a processor or controller 136. The processor 136 may be sited at the first video conference location 204. Alternatively, the processor 136 may be sited at a different location. Furthermore, functions of the processor or controller 136 may be distributed among various locations 204 or nodes associated with the video conferencing system 100. The processor or controller 136 may be interconnected to or associated with a communication network interface 208 that interconnects the processor or controller 136 and the associated components to a communication network 212, and in turn to a participant or participants at a second video conferencing location 204b.
As also depicted in FIG. 2, the second video conference location 204b may generally include the same components as the first video conference location 204, interconnected to the communication network 212 by an associated communication network interface 208.
The audio transceiver 108 provides audio output through a speaker 109 and audio input through a microphone 110. In accordance with embodiments of the present invention, the audio transceiver 108 comprises a speaker phone having common telephony functionality. According to further embodiments of the present invention, the audio transceiver 108 comprises a speaker 109 and a microphone 110 that function as part of a soft phone or video phone running on a processor 136 comprising a general purpose or personal computer. According to other embodiments, the audio transceiver 108 may be provided as part of a video telephone. In general, the audio transceiver 108 may be any device capable of translating acoustical signals into electrical signals and vice versa.
The display 112 may comprise any device capable of receiving a video signal and displaying a corresponding image. Accordingly, the display 112 may comprise a cathode ray tube or a liquid crystal display. The display 112 may be provided, for example, as part of a general purpose computer, as part of a video telephone, or as a monitor.
The camera 116 may be any device capable of translating images of a scene into electronic signals. For example, a camera 116 may comprise an optical lens system in combination with an image sensor, such as a charge coupled device (CCD). The camera 116 may be provided, for example, as part of a video telephone or as a computer peripheral.
The user selection input device 132 may comprise various devices for receiving input from a user, such as a video conferencing participant 104. For example, the user selection input device 132 may comprise a keyboard; a pointing device, such as a mouse or track ball; a numeric keypad; a touch sensitive display screen integrated with the display 112; and/or a voice recognition system operating in connection with the audio transceiver 108. Signals from the user selection input device 132 are provided to the processor 136.
The processor 136 may, as mentioned above, comprise a general purpose or personal computer. In addition, the processor 136 may comprise a specially adapted video conferencing processor unit, for example utilizing a specialized controller or a general purpose processor running code specifically adapted for performing video conferencing functions. For example, the processor 136 may comprise a personal computer running a video conferencing software application in connection with a standard operating system, such as the Windows® operating system. As a further example, the processor 136 may comprise a video telephone incorporating a suitably programmed controller running firmware for implementing functions described herein.
In particular, in connection with embodiments of the present invention, the processor 136 runs a video conferencing image acquisition and processing application that incorporates the ability to enhance portions of an image obtained by a camera 116 at the expense of other portions of that image. More particularly, embodiments of the present invention combine face tracking functions with image enhancement functions that provide an improved or optimized image in those areas of the overall image that correspond to the face 124 of a video conference participant 104. Specifically, the image taken by a camera 116 is processed by the processor 136 to determine the portion or portions of that image that correspond to the face of a video conference participant 104. As can be appreciated by one of skill in the art, such face tracking functions may apply spatial segmentation algorithms or techniques. Once the area or areas corresponding to the face of a video conference participant 104 have been determined, those portions of the image are enhanced relative to other portions of the image. For example, as will be described in greater detail elsewhere herein, image data parameters, such as contrast, brightness, color depth, and resolution, are optimized for the areas of the image corresponding to the face 124 of a participant 104. Such optimization may comprise devoting a greater portion of an available range of parameter values to those portions of the image corresponding to a participant's face 124 as compared to other portions of the image. Furthermore, such functions may be considered dynamic, since the image pixels comprising a face 124 will typically change from frame to frame of video.
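Any suitable face detection technique may be employed. Purely as an illustrative stand-in for the spatial segmentation techniques noted above, the following sketch uses the stock Haar cascade detector shipped with OpenCV to return a bounding rectangle for each detected face:

```python
import cv2

# OpenCV ships trained Haar cascade files with the library.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def find_face_regions(frame_bgr) -> list[tuple[int, int, int, int]]:
    """Return an (x, y, w, h) rectangle for each face detected in the frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(f) for f in faces]
```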
With reference now to FIG. 3A, an image 300 of a scene at a video conference location 204, as obtained and displayed conventionally, is depicted. In the conventional image 300, the available image information is allocated evenly across the entire image.
The differences between the image 300 of FIG. 3A and the image 304 of FIG. 3B illustrate the effect of dynamic video equalization as described herein. In particular, in the image 304 of FIG. 3B, a greater share of the available image information is allocated to the area corresponding to the face 124 of the participant 104 than to the background 312.
Examples of the image information that may be allocated differentially between different portions of an image 304 include contrast, brightness, color depth, and resolution. A specific example of allocating greater image information to a selected portion of an image may be given in terms of color depth. In this example, a system or video conferencing protocol limited to transmitting and displaying no more than 256 colors is assumed. In representing the area corresponding to the face 124 of the video conference participant 104, a greater number of colors may be made available by embodiments of the present invention to represent the face 124 of the participant 104 than to represent the background 312. For instance, 254 colors could be available for representing the portion of the image 304 corresponding to the face 124, while the remaining two colors could be allocated to representing the background 312. Of course, other allocations of available image parameters can be made. For example, 200 colors could be used to represent the face 124 while the remaining 56 colors could be used to represent the background 312. Furthermore, a particular video conferencing system 100 may make other differential allocations of image parameters that favor detailed representation of the face 124 of a participant 104 at the expense of the background 312 of the image. Because of such differential allocation, it can be appreciated that the face 124 may be represented in greater detail, while distortions may occur with respect to the background 312, as compared to a conventional system, as depicted in FIG. 3A.
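For purposes of illustration only, the 200/56-color allocation described above may be sketched as follows using the Pillow imaging library, under the simplifying assumption that the area corresponding to the face 124 is a rectangle. The function name, coordinates, and file name are illustrative assumptions:

```python
from PIL import Image

def split_palette(img: Image.Image, face_box: tuple[int, int, int, int],
                  face_colors: int = 200, bg_colors: int = 56) -> Image.Image:
    """Render the face box with face_colors colors and the rest with bg_colors,
    so the whole frame uses at most face_colors + bg_colors (here 256) colors."""
    # Quantize the face crop and the full frame to separate, differently
    # sized palettes, then paste the higher-fidelity face back in place.
    face = img.crop(face_box).quantize(colors=face_colors).convert("RGB")
    out = img.quantize(colors=bg_colors).convert("RGB")
    out.paste(face, face_box[:2])
    return out

frame = Image.open("scene.png").convert("RGB")  # placeholder input image
optimized = split_palette(frame, face_box=(80, 40, 240, 220))
```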
With reference now to FIG. 4, aspects of the operation of a video conferencing system 100 in accordance with embodiments of the present invention are depicted. Initially, an image of a scene, for example a scene at a first video conference location 204 that includes a participant 104, is obtained by the camera 116 (step 404).
A determination is then made as to whether dynamic video equalization as described herein has been activated (step 408). If dynamic video equalization has not been activated, an image of the scene is provided, without dynamic video equalization. That is, the image of the scene is represented conventionally, with parameters comprising the image data allocated equally across the entire image (step 412). If dynamic video equalization has been activated, a determination is then made as to whether the image obtained by the camera 116 contains within it an image of a human face 124 (step 416). If it is determined that the image does not contain or include a human face 124, the process may proceed to step 412, and an image of the scene is provided without dynamic video equalization. If it is determined that the scene does include a human face 124, the area or areas of the image that correspond to a human face are identified (step 420). In general, when a single participant 104 is included in an image, conventional face tracking and identification application software will identify a single area that comprises the face 124 of that participant 104. Furthermore, where a number of participants 104 are included in an image from a video conference location 204, the face 124 of each of those participants 104 will generally be identified as a separate area within the image. As used herein, the area comprising the “face” 124 of a participant 104 may comprise just the face itself, the face and hair (i.e., the head) of a participant 104, or all areas of the participant's body that are included in the image, such as the head and shoulders of the participant 104.
At step 424, the image is optimized in the area or areas identified as corresponding to a human face 124. As noted elsewhere herein, optimization of areas corresponding to a human face 124 within the image may comprise allocating or making available a larger number or range of parameters for use in describing those areas to be optimized, as compared to other areas within an image. The effect of such optimization is to provide an image quality that is superior in those areas of the image that have been optimized, as compared to unoptimized areas of the image. In addition to representing optimized areas with greater fidelity and/or detail, it can be appreciated that the background 312 of an optimized image 304 may exhibit characteristics that are usually considered undesirable. For example, the background 312 of an optimized image 304 may experience color banding, aliasing, or other image defects. However, such defects are considered acceptable, because it is the face 124 of video conference participants 104 that is of primary importance to video conferencing applications. At step 428, the optimized image of the scene is provided as output from the first video conference location 204, and in particular from the processor 136 associated with or used by the first video conference location 204. The output may then be displayed at other video conference locations 204.
After providing an optimized image as output at step 428, or after providing an image of the scene without dynamic video equalization or optimization at step 412, a determination is made as to whether the video conferencing system 100 has been deactivated (step 432). If the video conferencing system 100 has not been deactivated, the process may return to step 404, and an image of the scene is again obtained. Accordingly, it can be appreciated that the process of identifying an area or areas within an image corresponding to the face 124 of a video conference participant 104, and of optimizing the available image information for such area or areas, may be performed continuously. Furthermore, optimization may be performed for each frame of image information (i.e., for each frame of video) that is collected. Because the position of a participant's face 124 within a frame will typically vary from frame to frame, the area within which equalization is applied will typically vary as well. Accordingly, the area within which optimization or video equalization is performed is dynamic. If at step 432 it is determined that the video conferencing system 100 has been deactivated, the process may end.
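For purposes of illustration only, the process described above may be sketched as the following loop, reusing the illustrative helpers presented earlier. Here, send_frame and conference_deactivated are hypothetical placeholders for the system's transmission and deactivation logic, and the step numbers in the comments refer to the flowchart:

```python
import cv2
import numpy as np

def run_conference_loop(equalization_enabled: bool = True) -> None:
    """Illustrative sketch of the dynamic video equalization loop."""
    capture = cv2.VideoCapture(0)              # camera 116
    while True:
        ok, frame = capture.read()             # step 404: obtain image of scene
        if not ok:
            break
        faces = find_face_regions(frame) if equalization_enabled else []
        if faces:                              # steps 408/416: equalization on
            mask = np.zeros(frame.shape[:2], dtype=bool)  # and a face found
            for x, y, w, h in faces:           # step 420: mark each face area
                mask[y:y + h, x:x + w] = True
            frame = allocate_levels(frame, mask)  # step 424: optimize face areas
        send_frame(frame)                      # steps 412/428: provide the image
                                               # (hypothetical transmit helper)
        if conference_deactivated():           # step 432: hypothetical check
            break
    capture.release()
```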
Although embodiments of the present invention have been described in connection with the transmission of video between video conferencing locations or endpoints in real time or substantially real time (e.g., after processing delays), it should be appreciated that the present invention is not so limited. In particular, embodiments of the present invention may be applied wherever video information comprising the face of a person as a subject is to be transmitted and/or recorded. Furthermore, it should be appreciated by one of skill in the art from the description provided herein that embodiments of the present invention may be applied wherever image information comprising the face 124 of a person is to be transmitted, recorded, or output using a limited number of image parameters.
The foregoing discussion of the invention has been presented for purposes of illustration and description. Further, the description is not intended to limit the invention to the form disclosed herein. Consequently, variations and modifications commensurate with the above teachings, within the skill or knowledge of the relevant art, are within the scope of the present invention. The embodiments described hereinabove are further intended to explain the best mode presently known of practicing the invention and to enable others skilled in the art to utilize the invention in such or in other embodiments and with the various modifications required by their particular application or use of the invention. It is intended that the appended claims be construed to include alternative embodiments to the extent permitted by the prior art.