A goal of multi-way video conferencing systems is to provide a natural interaction experience that simulates a meeting in which all the participants are in the same room. Some higher end video conferencing systems attempt to control the physical environment by using the same cameras, paint, lighting, etc. at every site in order to provide a more natural viewing experience. Unfortunately, even in these controlled settings, differences in lighting conditions, camera settings, paint tolerances, etc. can result in undesired differences in the video streams and thus in the appearance of the participants. In lower end video conferencing solutions (desktop to desktop and also mobile video conferencing solutions), controlling the physical environment to minimize the differences in the video streams is not a practical solution. Since humans are very sensitive to such differences, these differences tend to disrupt the video conference experience.
The figures depict implementations/embodiments of the invention and not the invention itself. Some embodiments are described, by way of example, with respect to the following Figures.
The drawings referred to in this Brief Description should not be understood as being drawn to scale unless specifically noted.
For simplicity and illustrative purposes, the principles of the embodiments are described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one of ordinary skill in the art, that the embodiments may be practiced without limitation to these specific details. Also, different embodiments may be used together. In some instances, well known methods and structures have not been described in detail so as not to unnecessarily obscure the description of the embodiments.
Multi-way video conferencing systems aim to provide a natural experience to the participants, simulating a meeting in which all the participants are in the same physical location. Unfortunately, differences in camera settings, lighting conditions, etc. at the different sites can result in substantial undesired differences in the appearance of the participants. Since humans are very sensitive to such differences, they tend to disrupt the user experience. The invention describes a multi-way video coordination and modification system that samples the multiple video streams and more closely matches human perceptible factors (color balance, contrast, etc.) between the video streams in order to provide more consistent video streams with respect to the human perceptible factors, thus providing a more natural interaction environment.
The video modification system 112 comprises: a human factor value determination component 102 for determining the value of at least one human perceptible factor for at least a subset of the plurality of video streams 104a-n, wherein the plurality of video streams 104a-n are each captured independently by at least n image capture devices 106a-n; a human factor value comparator component 120 for comparing the value of the at least one human perceptible factor for the at least a subset of the n video streams; and a human factor modification component 124 for modifying the value of the human perceptible factor for the at least a subset of the n video streams to minimize the differences in the values of the human perceptible factor between the n video streams.
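As a purely illustrative sketch (the function names and the choice of frame representation are assumptions, not part of the described system), the three components can be read as a pipeline that measures a factor per stream, compares the measurements, and applies a correction:

```python
# Illustrative pipeline for components 102, 120 and 124, assuming each
# sampled video frame is an HxWx3 float RGB array. Names are hypothetical.
import numpy as np

def determine_factor(frame):
    """Component 102: measure a human perceptible factor; here, mean RGB."""
    return frame.reshape(-1, 3).mean(axis=0)

def compare_factors(values):
    """Component 120: compare per-stream values against the joint mean."""
    values = np.asarray(values)
    return values.mean(axis=0) - values            # per-stream correction offsets

def modify_streams(frames, offsets):
    """Component 124: shift each stream toward the joint mean to minimize differences."""
    return [np.clip(f + o, 0, 255) for f, o in zip(frames, offsets)]

def coordinate(frames):
    values = [determine_factor(f) for f in frames]
    return modify_streams(frames, compare_factors(values))
```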
Referring to
In one embodiment, each of the n image capture devices captures video of at least one video conference participant. Although more than one participant could be captured by the image capture devices, in the embodiment shown in
Referring to
Referring to
The human perceptible factor determination component includes a plurality of modules that could include but are not limited to: a skin detection module 152, a face detection module 154, a contrast module 156, a sharpness module 158, and a lightness module 160. The "modules" and/or "components" in the described system can be implemented in hardware or software or a combination of both. Also, although an element may be shown as a single element, the modules or components can be combined with other components or modules to form a single module that performs the same function. Alternatively, a single module could be separated into multiple modules or components that perform the same function.
The video modification system compares at least one human perceptible factor in the video streams. In one embodiment, only one human perceptible factor is compared and modified. For example, for the embodiment described with respect to
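As an illustration of matching a single factor, the sketch below matches contrast, measured as the RMS deviation of luma; this particular measure is an assumption, since the contrast module 156 is not specified at this level of detail:

```python
# Hedged sketch: equalize RMS luma contrast across streams by rescaling each
# frame's deviation from its mean toward the joint mean contrast.
import numpy as np

REC601 = np.array([0.299, 0.587, 0.114])           # standard luma weights

def rms_contrast(frame):
    return (frame @ REC601).std()

def match_contrast(frames):
    contrasts = [rms_contrast(f) for f in frames]
    target = np.mean(contrasts)
    out = []
    for f, c in zip(frames, contrasts):
        mean = f.mean(axis=(0, 1), keepdims=True)
        out.append(np.clip(mean + (f - mean) * (target / c), 0, 255))
    return out
```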
The human perceptible factor determination component 102 uses video information (captured by sensors, the video capture devices, etc.) and analyzes it in order to determine the human perceptible factor value. For example, a frame of the video can be analyzed and information from the video frame can be used to determine the color balance in the image. For example, the skin detection module 152 may be used to detect the skin of a video conference participant. In one example described with respect to
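One way such a measurement could be implemented (a sketch only; the detector choice and the nominal skin color are assumptions, not figures from the described system) is to locate a face, sample its central region as skin, and compare the sampled color to a nominally known skin tone:

```python
# Sketch: estimate per-channel color balance from detected skin pixels,
# using an OpenCV Haar face detector. Nominal values are placeholders.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def color_balance_from_skin(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return np.array([1.0, 1.0, 1.0])           # no face: default balance
    x, y, w, h = faces[0]
    skin = frame_bgr[y + h // 4 : y + 3 * h // 4,  # central face region,
                     x + w // 4 : x + 3 * w // 4]  # likely to be skin pixels
    mean_bgr = skin.reshape(-1, 3).mean(axis=0)
    nominal_bgr = np.array([170.0, 190.0, 220.0])  # assumed nominal skin color (BGR)
    return mean_bgr / nominal_bgr                  # per-channel balance estimate
```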
Referring to
The output 138 of the comparison of the human perceptible factor values is used as input to the human factor modification component. Based on the comparison of the human perceptible factor values in the video streams, the output video streams are modified (step 230 in
When all of the video streams are displayed on the video conference participant's display screen, the human perceptible factors for each video stream are determined, compared and modified. However, when only a subset of the video streams is displayed, it is not required that all n video streams be processed. Instead, only a subset (for example, only the video streams to be displayed) of the video streams is processed. This, for example, could be the case for the embodiment shown in
The method shown in
The previous example describes the case where only a subset of the n independently captured video streams may be compared according to the method 200. In an alternative embodiment where all of the n video streams are displayed, the human perceptible factors are determined, compared and modified for each of the n video streams. Where all of the n video streams are displayed, the video streams may be processed according to the method comprising the following steps: determining the value of at least one human perceptible factor for each of the n video streams, where n is an integer greater than one and the n video streams are captured independently by at least n image capture devices (step 210); comparing the value of the at least one human perceptible factor for each of the n video streams (step 220); and, based on the comparison, modifying the value of the human perceptible factor for each of the n video streams to minimize the differences in the values of the human perceptible factor between the n video streams (step 230).
In one embodiment of the described invention, at least a subset of the n video streams is processed jointly and automatically modified, instead of enhancing each of the video streams individually. For the case of two video streams, the two video streams could both be jointly processed and compared at the same time t1 to determine the appropriate color balance. In an alternative embodiment, the two video streams could be processed and average values compared over some time interval. For example, for the case where n=2 and color balance is the human perceptible factor, the two video streams could each be adjusted to minimize the differences between them. For example, if the first video stream has color balance RGB=[1 1 1.8] and the second video stream has color balance RGB=[1 1 1.6], the color balances of both video streams could be modified to RGB=[1 1 1.7] to make the color balance in these video streams appear more consistent.
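A minimal sketch of this n=2 example, assuming the color balance is a per-stream RGB gain vector and the joint adjustment is a simple average:

```python
# Move every stream's color balance to the joint mean so the streams appear
# consistent; reproduces the [1 1 1.7] result from the example above.
import numpy as np

def match_color_balance(balances):
    balances = np.asarray(balances, dtype=float)   # shape (n, 3)
    return np.tile(balances.mean(axis=0), (len(balances), 1))

streams = match_color_balance([[1, 1, 1.8], [1, 1, 1.6]])
# streams -> [[1. 1. 1.7], [1. 1. 1.7]]
```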
Referring to
In the method shown in
Once the video conference participant's face is detected, the color balance of at least a subset of the n video streams is determined (step 242) using facial features of the video conference participants. From the facial features in the video stream, the colors of the facial features (for example, skin pixels or pixels of the eye whites) are measured and used to determine the nominal color balance of the video based on the combined lighting and capture system used to capture the video stream of interest. In one embodiment, once the face is detected, the color balance is determined from the skin of the video conference participants. Once it is determined what part of the face is skin using skin detection techniques, nominally known skin tone colors are used to determine the color balance in the video frame. In an alternative embodiment, the facial feature used to determine the color balance is the eyes. Once the face (and eyes) are detected, eye white detection is used to determine the color balance. Alternatively, one can apply an eye feature detector independent of a face detector. Nominally known eye white colors are used to determine the color balance in the video frame. The white balancing can be done only if the face is detected; otherwise, a default white balance can be used.
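For the eye white variant, a sketch (assuming the eye white pixels have already been located by an eye feature detector; the nominal value is an assumption) can compute per-channel gains that map the measured eye white color to a nominally achromatic white:

```python
# Sketch: per-channel white balance gains from eye white pixels, which are
# nominally achromatic under neutral lighting.
import numpy as np

def gains_from_eye_whites(eye_white_pixels):
    """Gains that map the measured eye white color to neutral white."""
    measured = np.asarray(eye_white_pixels, dtype=float).reshape(-1, 3).mean(axis=0)
    nominal_white = np.array([255.0, 255.0, 255.0])  # assumed nominal eye white
    return nominal_white / measured

def apply_white_balance(frame, gains):
    return np.clip(frame * gains, 0, 255)
```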
After the color balance of each of the n independently captured video streams is determined (step 242), the system jointly processes the n video streams to determine the desired color balance for at least a subset of the video streams (step 248). After determining the desired color balance, the color balances of the video streams are compared in order to determine how far each video stream is from the desired color balance (step 250). Steps 248 and 250 in
The balancing and comparison of the human perceptible factors can be described and optimized according to the present invention, with respect to equation (1):
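One consistent reconstruction of equation (1) from the term definitions given below (the equation body itself is not available, so this form is an assumption):

$$\min_{\{p_i\}} \; \sum_{i} F\big(T(f_i; p_i)\big) \;+\; \lambda \sum_{i<j} G\big(T(f_i; p_i),\, T(f_j; p_j)\big) \tag{1}$$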
Here f_i represents the input frame for the video stream indexed by i, and p_i represents the parameters of the transformation T applied to frame f_i. The transformation T corresponds to a human perceptible factor. Although equation (1) is applicable to different human perceptible factors, for purposes of the color balance example described, p_i is a parameter vector used to adjust the white balance function T, which maps input colors to output colors. The terms p_i are the parameters adjusted in order to minimize the expression in equation (1). The term F is an error criterion that keeps the results of the white balance function T close to the target values, and G is an error criterion that keeps the results of the white balancing for the different streams close to each other. The relative importance given to the first term in the optimization versus the second term is controlled with the weight λ, for the implementation where the human perceptible color balance is being matched.
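A sketch of this joint optimization, assuming T is a per-channel gain and F and G are squared-error criteria (both assumptions made for illustration):

```python
# Hedged sketch of equation (1): jointly choose per-stream gains p_i so each
# corrected balance stays near a target (term F) while the corrected streams
# stay near each other (term G), weighted by lam (the weight lambda).
import numpy as np
from scipy.optimize import minimize

def joint_white_balance(measured, target, lam=1.0):
    measured = np.asarray(measured, dtype=float)    # shape (n, 3): balance per stream
    target = np.asarray(target, dtype=float)
    n = measured.shape[0]

    def objective(flat_p):
        p = flat_p.reshape(n, 3)
        corrected = p * measured                    # T(f_i; p_i) as per-channel gains
        f_term = np.sum((corrected - target) ** 2)  # F: stay near the target values
        g_term = sum(np.sum((corrected[i] - corrected[j]) ** 2)
                     for i in range(n) for j in range(i + 1, n))  # G: match streams
        return f_term + lam * g_term

    return minimize(objective, np.ones(n * 3)).x.reshape(n, 3)

# With a large lam, the two streams of the earlier example are pulled together:
gains = joint_white_balance([[1, 1, 1.8], [1, 1, 1.6]], target=[1, 1, 1], lam=10.0)
```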
Once the desired color balance value is determined, the values are compared to determine how far the color balance value of each video stream is from the desired color balance value. Once this difference is known, it is used to minimize the differences between the color balance values of the different video streams. If the difference between the color balance values is zero (or small but non-zero because of random errors), then the color balance values of the video streams match (to the extent possible using the described method). In the embodiment described with respect to
After the colors in the video are modified to the desired color balance (and optionally modified to apply skin tone correction (step 254)), the colors of at least a subset of the n video streams may optionally be modified towards an improved aesthetic. The process of modifying the video stream to an improved aesthetic (step 258) is defined in more detail with respect to the implementation described with respect to
Referring to the embodiment shown in
In the method shown in
After the color balance of each of the n independently captured video streams is determined (step 242), the system jointly processes the n video streams to determine the desired color balance for the video streams (step 248). After determining the desired color balance, the color balances of the video streams are compared in order to determine how far each video stream is from the desired color balance (step 250). Steps 248 and 250 in
Once the desired color balance value is determined, the values are compared to determine how far the color balance value of each video stream is from the desired color balance value. Once this difference is known, it is used to minimize the differences between the color balance values of the different video streams. If the difference between the color balance values is zero (or the minimized value), then the color balance values of the video streams match.
After the colors in the video are modified to the desired color balance (and optionally modified to apply skin tone correction (step 254)), the colors of at least a subset of the n video streams may optionally be modified towards an improved aesthetic. The process of modifying the video stream to an improved aesthetic (step 258) is defined in more detail with respect to the implementation described with respect to
In one embodiment, the first step in determining whether the video streams are modified towards an improved aesthetic is to determine whether an improved aesthetic is defined (step 260). In some cases, the improved aesthetic is a user or participant defined local system preference. For example, the video conference participant may want his personal or local system to have a color balance with more reddish tones than is standard. In this case, the more reddish "improved" aesthetic may be applied only to the video streams that are displayed on the local participant's display screen according to the local participant's specifications, while the other participants in the conference may display the standard color balance. In an alternative embodiment, the improved aesthetic may be a system defined aesthetic. For example, if the desired color balance determined by the balance control component falls outside the range the system defines as aesthetically acceptable, the system may modify the desired color balance toward what the system defines as ideal.
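As a sketch of the local preference case (the preference representation and the gain values are assumed for illustration):

```python
# Apply a participant-defined "more reddish" aesthetic only to locally
# displayed streams; streams shown to other participants keep the standard
# balance. The gain values are hypothetical.
import numpy as np

LOCAL_PREFERENCE = np.array([1.10, 1.00, 0.95])    # assumed: warmer RGB rendering

def apply_local_aesthetic(frame, preference=LOCAL_PREFERENCE):
    return np.clip(frame * preference, 0, 255)
```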
Some or all of the operations set forth in the method 200 may be contained as utilities, programs or subprograms, in any desired computer accessible medium. In addition, the method 200 may be embodied by computer programs, which may exist in a variety of forms both active and inactive. For example, they may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats. Any of the above may be embodied on a computer readable medium, which include storage devices and signals, in compressed or uncompressed form.
The computing apparatus 300 includes one or more processors 302 that may implement or execute some or all of the steps described in the method 200. Commands and data from the processor 302 are communicated over a communication bus 304. The computing apparatus 300 also includes a main memory 306, such as a random access memory (RAM), where the program code for the processor 302 may be executed during runtime, and a secondary memory 308. The secondary memory 308 includes, for example, one or more hard drives 310 and/or a removable storage drive 312, representing a removable flash memory card, etc., where a copy of the program code for the method 200 may be stored. The removable storage drive 312 reads from and/or writes to a removable storage unit 314 in a well-known manner.
These methods, functions and other steps may be embodied as machine readable instructions stored on one or more computer readable mediums, which may be non-transitory. Exemplary non-transitory computer readable storage devices that may be used to implement the present invention include but are not limited to conventional computer system RAM, ROM, EPROM, EEPROM and magnetic or optical disks or tapes. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that any electronic device and/or system capable of executing the functions of the above-described embodiments are encompassed by the present invention.
Although shown stored on main memory 306, any of the memory components described 306, 308, 314 may also store an operating system 330, such as Mac OS, MS Windows, Unix, or Linux; network applications 332; and a balance control component 334. The operating system 330 may be multi-participant, multiprocessing, multitasking, multithreading, real-time and the like. The operating system 330 may also perform basic tasks such as recognizing input from input devices, such as a keyboard or a keypad; sending output to the display 320; controlling peripheral devices, such as disk drives, printers, and image capture devices; and managing traffic on the one or more buses 304. The network applications 332 include various components for establishing and maintaining network connections, such as software for implementing communication protocols including TCP/IP, HTTP, Ethernet, USB, and FireWire.
The computing apparatus 300 may also include input devices 316, such as a keyboard, a keypad, functional keys, etc., a pointing device, such as a tracking ball, cursors, etc., and a display(s) 320, such as the LCD screen display 130 shown for example in
The processor(s) 302 may communicate over a network, for instance, a cellular network, the Internet, a LAN, etc., through one or more network interfaces 324 such as a Local Area Network (LAN), a wireless 802.11x LAN, a 3G mobile WAN or a WiMax WAN. In addition, an interface 326 may be used to receive an image or sequence of images from imaging components 328 such as the image capture device.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.