The invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawing, in which:
The figures (or drawings) depict a preferred embodiment of the present invention for purposes of illustration only. It is noted that similar or like reference numbers in the figures may indicate similar or like functionality. One of skill in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods disclosed herein may be employed without departing from the principles of the invention(s) herein. It is to be noted that the examples that follow focus on webcams, but that embodiments of the present invention could be applied to other image capturing devices as well.
In one embodiment, the data captured by the image capture device 100 is still image data. In another embodiment, the data captured by the image capture device 100 is video data (accompanied in some cases by audio data). In yet another embodiment, the image capture device 100 captures either still image data or video data depending on the selection made by the user 120. The image capture device 100 includes a sensor for capturing image data. In one embodiment, the image capture device 100 is a webcam. Such a device can be, for example, a QuickCam® from Logitech, Inc. (Fremont, Calif.). It is to be noted that in different embodiments, the image capture device 100 is any device that can capture images, including digital cameras, digital camcorders, Personal Digital Assistants (PDAs), cell-phones that are equipped with cameras, etc. In some of these embodiments, host system 110 may not be needed. For instance, a cell phone could communicate directly with a remote site over a network. As another example, a digital camera could itself store the image data.
Referring back to the specific embodiment shown in
In one embodiment, the device 100 may be coupled to the host 110 via a wireless link, using any wireless technology (e.g., RF, Bluetooth, etc.). In one embodiment, the device 100 is coupled to the host 110 via a cable (e.g., USB, USB 2.0, FireWire, etc.). It is to be noted that in one embodiment, the image capture device 100 is integrated into the host 110. An example of such an embodiment is a webcam integrated into a laptop computer.
The image capture device 100 captures the image of a user 120 along with a portion of the environment surrounding the user 120. In one embodiment, the captured data is sent to the host system 110 for further processing, storage, and/or sending on to other users via a network.
The intelligent image quality engine 140 is shown residing on the host system 110 in the embodiment shown in
The intelligent image quality engine 140 includes a set of image processing features, a policy to control them based on system-level parameters, and a set of ways to interact with the user, also controlled by the policy. Several image processing features are described in detail below. These image processing features improve some aspects of the image quality, depending on various factors such as the lighting environment, the movement in the images, and so on. However, image quality is not a single-dimensional measure, and there are many trade-offs. Specifically, several of these features, while bringing some improvement, have some drawbacks, and the purpose of the intelligent image quality engine 140 is to use these features appropriately depending on various conditions, including device capture settings, system conditions, analysis of the image quality (influenced by environmental conditions, etc.), and so on. In a system in accordance with an embodiment of the present invention, the image data is assessed, and a determination is made of the causes of poor image quality. Various parameters are then changed to optimize the image quality given this assessment, so that subsequent images are captured with optimized parameters.
In order to make informed and intelligent decisions, the intelligent image quality engine 140 needs to be aware of various pieces of information, which it obtains from the captured image, the webcam 100 itself, as well as from the host 110. This is discussed in more detail below with reference to
The intelligent image quality engine 140 is implemented in one embodiment as a state machine. The state machine contains information regarding what global parameters should be changed in response to an analysis of the information it obtains from various sources, and on the basis of various predefined thresholds. The state machine is discussed in greater detail below with respect to
As mentioned above, a system in accordance with an embodiment of the present invention uses information gathered from various sources. An image frame is received (step 210). This image is captured using certain preexisting parameters of the system (e.g., gain of the device, frame rate, exposure time, brightness, contrast, saturation, white balance, focus).
Information is obtained (step 220) from the host 110. Examples of information provided to the intelligent image quality engine 140 by the host 110 include the processor type and speed of the host system 110, the format requested by the application to which the image data is being provided (including resolution and frame-rate), the other applications being used at the same time on the host system 110 (indicating the availability of the processing power of the host system 110 for the image quality engine 140 and also giving information about what the target use of the image could be), the country in which the host system 110 is located, current user settings affecting the image quality engine 140, etc. Information is obtained (step 230) from the device 100. Examples of information provided by the device 100 include the gain, frame rate, exposure, and backlight evaluation (a metric to evaluate backlight conditions). Examples of information extracted (step 240) from the image frame include the zone of interest, auto-exposure information (this can also be done in the device by the hardware or the firmware, depending on the implementation), backlight information (again, this can also be done in the device as mentioned above), etc. In addition, other information used can include focus, information regarding color content, more elaborate auto-exposure analysis to deal with images with non-uniform lighting, and so on. It is to be noted that some of the information needed by the intelligent image quality engine can come from a source different from those mentioned above, and/or can come from more than one source.
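As a rough illustration only, the kinds of inputs gathered in steps 220 through 240 could be collected into a structure such as the following Python sketch before the engine is invoked. The field names and types are assumptions made for the example and are not part of the specification.

```python
# Hypothetical grouping of the inputs described above (steps 220-240).
from dataclasses import dataclass

@dataclass
class HostInfo:
    cpu_speed_mhz: int        # processor type/speed of the host system 110
    requested_width: int      # format requested by the application
    requested_height: int
    requested_fps: int
    other_apps_running: bool  # hint about available processing power
    country_code: str

@dataclass
class DeviceInfo:
    gain: int                 # current sensor gain
    frame_rate: float
    exposure: float
    backlight_metric: float   # metric to evaluate backlight conditions

@dataclass
class FrameInfo:
    zoi: tuple                # zone of interest (x, y, w, h), if known
    mean_brightness: float    # auto-exposure statistics
    backlight_estimate: float # may instead come from the device

@dataclass
class EngineInput:
    host: HostInfo
    device: DeviceInfo
    frame: FrameInfo
```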
The intelligent image quality engine 140 is then called (step 250). Based on the received information, the intelligent image quality engine 140 analyzes, in one embodiment, not only whether the quality of the received image frame is poor, but also why this might be the case. For instance, the intelligent image quality engine can determine that the presence of backlight is what is probably causing the exposure of the image to be non-optimal. In other words, the intelligent image quality engine 140 not only knows where the system is (in terms of its various parameters, etc.), but also the trajectory of how it got there (e.g., the gain was increased, then the frame rate was decreased, and so on). This is important because even if the result is the same (e.g., bad picture quality), different parameters may be changed to improve the image quality depending on the assessed cause of this result (e.g., backlighting, low light conditions, etc.). This is discussed below in more detail with respect to
The parameters are then updated (step 260), as determined by the intelligent image quality engine 140. Some sets of parameters are continually tweaked in order to improve image quality in response to changing circumstances. In one embodiment, such continual tweaking of a set of parameters is in accordance with a specific image processing algorithm implemented in response to specific circumstances. For instance, a low light environment may trigger the frame rate control algorithm, and a back light environment may trigger the smart auto-exposure algorithm. Such algorithms are described in more detail below.
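A highly simplified sketch of this triggering logic follows. The threshold names and values are hypothetical and serve only to illustrate how an assessed cause of poor image quality might map to the algorithm whose parameters are then updated.

```python
# Illustrative sketch: map assessed causes to algorithms (assumed thresholds).
def select_algorithms(gain, backlight_estimate,
                      max_gain_threshold=96, backlight_threshold=40):
    """Return the set of image-quality algorithms to activate."""
    active = []
    if gain >= max_gain_threshold:
        # Low-light: gain already high, so trade frame rate for exposure time.
        active.append("frame_rate_control")
    if backlight_estimate >= backlight_threshold:
        # Back-lit scene: weight exposure toward the zone of interest.
        active.append("smart_auto_exposure")
    return active

# Example: a dark, back-lit scene triggers both algorithms.
print(select_algorithms(gain=120, backlight_estimate=55))
# -> ['frame_rate_control', 'smart_auto_exposure']
```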
Table 1 below illustrates an example of output parameters provided by an intelligent image quality engine 140 in accordance with an embodiment of the present invention.
These updated parameters are then communicated (step 265) appropriately (such as to the device 100, and host 110), for future use. Examples of such parameters are provided below in various tables. This updating of parameters results in improved received image quality going forward.
It is to be noted that in one embodiment of the present invention, the intelligent image quality engine 140 is called (step 250) on every received image frame. This is important because the intelligent image quality engine 140 is responsible for updating the parameters automatically, as well as for translating the user settings into parameters to be used by the software and/or the hardware. Further, the continued use of the intelligent image quality engine 140 keeps it apprised regarding which parameters are under its control and which ones are manual at any given time. The intelligent image quality engine 140 can determine what to do depending upon its state, the context, and other input parameters, and produce appropriate output parameters and a list of actions to carry out.
As can be seen from
As mentioned above, in one embodiment of the present invention, the intelligent image quality engine 140 is implemented as a state machine.
In one embodiment of a state machine, when the state machine is invoked, it looks up the current state in the associated context and then uses a predefined table of function pointers to invoke the correct function for that state. The state machine implements all the required decisions and creates the proper output using other functions (if needed) that can be shared with other state functions if appropriate. If a transition occurs, it updates the current state in the context so that the next time the state machine is invoked the new state is assumed. With this approach, adding a state is as simple as adding an additional function, and changing a transition amounts to locally adjusting a single function.
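The table-of-function-pointers approach can be illustrated with the following sketch, written in Python with a dictionary standing in for the function-pointer table. The state names, transitions, and thresholds below are assumptions chosen for the example, not the specification's actual states.

```python
# Minimal sketch of a table-driven state machine (assumed states/thresholds).
def state_normal(ctx, inputs):
    if inputs["gain"] > ctx["max_gain_threshold"]:
        return "LOW_LIGHT_A"            # light has dropped
    return "NORMAL"

def state_low_light_a(ctx, inputs):
    if inputs["gain"] < ctx["recovery_gain_threshold"]:
        return "NORMAL"                 # light has recovered
    if inputs["frame_rate"] <= ctx["min_frame_rate"]:
        return "LOW_LIGHT_B"            # frame rate floor reached
    return "LOW_LIGHT_A"

def state_low_light_b(ctx, inputs):
    if inputs["gain"] < ctx["recovery_gain_threshold"]:
        return "NORMAL"
    return "LOW_LIGHT_B"

STATE_TABLE = {
    "NORMAL": state_normal,
    "LOW_LIGHT_A": state_low_light_a,
    "LOW_LIGHT_B": state_low_light_b,
}

def run_state_machine(ctx, inputs):
    """Look up the current state in the context, invoke its function,
    and store any resulting state transition back into the context."""
    handler = STATE_TABLE[ctx["state"]]
    ctx["state"] = handler(ctx, inputs)
    return ctx["state"]

# Adding a state amounts to one new function plus one table entry.
ctx = {"state": "NORMAL", "max_gain_threshold": 96,
       "recovery_gain_threshold": 32, "min_frame_rate": 5}
print(run_state_machine(ctx, {"gain": 120, "frame_rate": 15}))  # LOW_LIGHT_A
```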
In one embodiment, the various transitions depend on various predefined thresholds. The value of the specific thresholds is a critical component in the performance of the system. In one embodiment, these thresholds are specific to a device 100, while the state machine is generic across different devices. In one embodiment, the thresholds are stored on the device 100, while the state machine itself resides on the host 110. In this manner, the same state machine works differently for different devices, because of the different thresholds specified. In another embodiment, the state machine itself may have certain states that are not entered for specific devices 100, and/or other states that exist only for certain devices 100.
In one embodiment, the state machine is fully abstracted from the hardware via a number of interfaces. Further, in one embodiment, the state machine is independent of the hardware platform. In one embodiment, the state machine is not dependent on the Operating System (OS). In one embodiment, the state machine is implemented with cross platform support in mind. In one embodiment, the state machine is implemented as a static or dynamic library.
Table 2 below provides an example of how low light states are selected based on the processor speed and the image format expressed in pixels per second (Width×Height×FramesPerSecond) in different modes of the intelligent image quality engine 140 (OFF / Normal mode / Limited CPU mode).
Examples of Low-LightA and Low-LightB are provided in Tables 3 and 4 respectively.
As mentioned above, various reasons for poor image quality are addressed by various embodiments of the present invention. These include low light conditions, back light conditions, noise, etc. In addition, several image pipe controls (such as contrast, saturation etc.) can also be handled. These are now discussed in some detail below.
Smart Auto-Exposure (AE):
If image quality is assessed to be poor due to back-light situations, smart AE is invoked. Smart AE is a feature that improves the auto-exposure algorithm of the camera by optimizing exposure in the area of the image most important to the user (the zone of interest). In one embodiment, the smart AE algorithm can be located in firmware. In one embodiment, it can be located in software. In another embodiment, it can be located in both the firmware and the software. In one embodiment, the smart AE algorithm relies on statistical estimation of the average brightness of the scene, and for that purpose will average statistics over a number of windows or blocks with potentially user-settable size and origin.
The zone (or region) of interest (ZOI) is first computed (step 410) based upon the received image. This zone of interest can be obtained in various ways. In one embodiment, machine vision algorithms are used to determine the zone of interest. In one embodiment, a human face is perceived as constituting the zone of interest. In one embodiment, the algorithms used to compute the region of interest in the image are a face-detector, face tracker, or a multiple face-tracker. Such algorithms are available from several companies, such as Logitech, Inc. (Fremont, Calif.), and Neven Vision (Los Angeles, Calif.). In one embodiment, a rectangle encompassing the user's face is compared in size with a rectangle of a predefined size (the minimum size of the ZOI). If the rectangle encompassing the user's face is not smaller than the minimum size of the ZOI, this rectangle is determined to be the ZOI. If it is smaller than the minimum size of the ZOI, the rectangle encompassing the user's face is increased in size until it matches or exceeds the minimum size of the ZOI. This modified rectangle is then determined to be the ZOI. In one embodiment, the ZOI is also corrected so that it does not move faster than a predetermined speed on the image in order to minimize artifacts caused by excessive adaptation of the algorithm. In another embodiment, a feature tracking algorithm, such as that from Neven Vision (Los Angeles, Calif.) is used to determine the zone of interest.
In yet another embodiment, when no zone of interest is available from machine vision, a default zone of interest is used (for instance, corresponding to the center of the image and 50% of its size). It is to be noted that in one embodiment, the zone of interest also depends upon the application for which the captured video is being used (e.g., for Video Instant Messaging, either the location of motion in the image or the location of the user's face in the image may be of interest). In one embodiment, the ZOI location module will output coordinates of a sub-window where the user is located. In one embodiment, this window encompasses the face of the user, and may encompass other moving objects as well. In one embodiment, the window is updated after every predefined number of milliseconds. In one embodiment, each coordinate cannot move by more than a predetermined number of pixels per second towards the center of the window, or by more than a second predetermined number of pixels per second in the other direction. Additionally, in one embodiment, the minimal window dimensions are no less than a predetermined number of pixels, both horizontally and vertically, of the sensor dimensions.
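The following sketch illustrates the kind of ZOI post-processing described above: enforcing a minimum size and limiting how quickly the zone may move. It simplifies the specification (a single per-frame step limit rather than separate per-second limits toward and away from the center), and all numeric limits are hypothetical.

```python
# Illustrative ZOI post-processing: minimum size and motion limiting.
def enforce_min_size(rect, min_w, min_h):
    """Grow a (x, y, w, h) rectangle about its center until it meets the
    minimum ZOI dimensions."""
    x, y, w, h = rect
    if w < min_w:
        x -= (min_w - w) // 2
        w = min_w
    if h < min_h:
        y -= (min_h - h) // 2
        h = min_h
    return (x, y, w, h)

def limit_motion(prev, new, max_step):
    """Clamp each coordinate so the ZOI moves by at most max_step pixels
    per frame, minimizing artifacts from excessive adaptation."""
    return tuple(p + max(-max_step, min(max_step, n - p))
                 for p, n in zip(prev, new))

face_rect = (300, 200, 80, 60)                      # e.g. from a face tracker
zoi = enforce_min_size(face_rect, 160, 120)
zoi = limit_motion(prev=(290, 195, 160, 120), new=zoi, max_step=8)
print(zoi)
```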
The zone of interest computed for the frame is then translated (step 420) into the corresponding region on the sensor of the image capture device 100. In one embodiment, when the ZOI is computed (step 410) in the host 110, it needs to be communicated to the camera 100. The interface used to communicate the ZOI is defined for each camera. In one embodiment, the auto-exposure algorithm reports its capacities in a bitmask for a set of different ZOIs. Then, the driver for the camera 100 posts the ZOI coordinates to the corresponding property, expressed in sensor coordinates. The driver knows the resolution of the camera, and uses this to translate (step 420) from window coordinates to sensor coordinates.
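A hypothetical sketch of the translation in step 420 follows, assuming the driver knows both the streamed image resolution and the native sensor resolution; the function name and the simple proportional scaling are assumptions for illustration.

```python
# Sketch: translate ZOI from image-window coordinates to sensor coordinates.
def window_to_sensor(zoi, image_size, sensor_size):
    ix, iy = image_size
    sx, sy = sensor_size
    x, y, w, h = zoi
    return (x * sx // ix, y * sy // iy, w * sx // ix, h * sy // iy)

# Example: a 640x480 stream from a 1280x960 sensor.
print(window_to_sensor((282, 187, 160, 120), (640, 480), (1280, 960)))
# -> (564, 374, 320, 240)
```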
The ZOI is then mapped (step 430) to specific hardware capabilities depending on the AE algorithm used. For example, if the AE algorithm uses a number of averaging zones on the sensor, the ZOI is made to match up as closely as possible to a zone made of these averaging zones. The AE algorithm will then use the zones corresponding to the ZOI with a higher averaging weight while determining exposure needs. In one embodiment, each averaging zone in the ZOI has a weight which is a predetermined amount more than the other averaging zones (outside the ZOI) in the overall weighted average used by the AE algorithm. This is illustrated in
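A simple sketch of this weighted averaging follows. The 4×4 grid of averaging zones, the weight values, and the brightness figures are assumptions chosen only to show how zones overlapping the ZOI can dominate the exposure target.

```python
# Sketch: weight averaging zones that overlap the ZOI more heavily.
def weighted_scene_brightness(zone_means, zoi_zones, zoi_weight=4, other_weight=1):
    """zone_means: dict mapping zone index -> mean brightness of that zone.
    zoi_zones: set of zone indices that overlap the zone of interest."""
    total = 0.0
    weight_sum = 0
    for zone, mean in zone_means.items():
        w = zoi_weight if zone in zoi_zones else other_weight
        total += w * mean
        weight_sum += w
    return total / weight_sum

# 16 averaging zones; the four central zones overlap a dark, back-lit face.
zone_means = {i: (40 if i in {5, 6, 9, 10} else 180) for i in range(16)}
print(weighted_scene_brightness(zone_means, zoi_zones={5, 6, 9, 10}))  # 100.0
```

Because the weighted average is pulled down by the dark ZOI zones, the AE algorithm increases exposure for the subject rather than for the bright background.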
Table 5 below illustrates some possible values of some of the parameters discussed above for one embodiment of the smart AE algorithm.
In one embodiment, some of the above parameters are fixed across all image capture devices, while others vary depending on which camera is used. In one embodiment, some of the parameters can be set/chosen by the user. In one embodiment, some of the parameters are fixed. In one embodiment, some of the parameters are specific to the camera, and are stored on the camera itself.
In one embodiment, the smart auto-exposure algorithm reports certain parameters to the intelligent image quality engine 140, for example the current gain, in units chosen so that meaningful thresholds can be set using integer numbers. For example, in one embodiment, to allow sufficient precision, the gain is defined as an 8-bit integer, with 8 being a gain of 1, and 255 being a gain of 32.
In one embodiment, the smart auto-exposure algorithm reports to the intelligent image quality engine 140 an estimation of the degree to which smart AE is required (backlight estimation), obtained by subtracting the average of the outside windows from the average of the center windows. For that purpose, in one embodiment the default size of the center window is approximately half the size of the entire image. Once the smart AE feature is enabled, that center window becomes the ZOI as discussed above. In one embodiment, depending on the implementation, this estimation of the degree to which smart AE is required is based on the ratio (rather than the difference) between the average of the center and the average of the outside. In one embodiment, a uniform image will yield a small value, and the bigger the brightness difference between the center and the surrounding, the larger this value (regardless of whether the center or the outside is brighter).
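The difference-based form of this backlight estimation might look like the following sketch. The pixel layout and use of an absolute value are assumptions for the example; as noted above, a ratio could be used instead, depending on the implementation.

```python
# Sketch: backlight estimate as |avg(outside) - avg(center)| over a
# center window covering roughly half of the image.
def backlight_estimate(pixels, width, height):
    """pixels: flat list of luma values, row-major."""
    cw, ch = width // 2, height // 2
    x0, y0 = (width - cw) // 2, (height - ch) // 2
    center, outside = [], []
    for y in range(height):
        for x in range(width):
            v = pixels[y * width + x]
            if x0 <= x < x0 + cw and y0 <= y < y0 + ch:
                center.append(v)
            else:
                outside.append(v)
    # A uniform image yields a small value; a strong brightness difference
    # between center and surroundings yields a large value.
    return abs(sum(outside) / len(outside) - sum(center) / len(center))
```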
Frame Rate Control:
When low light conditions are encountered, the frame rate control feature may be implemented in accordance with an embodiment of the present invention. This provides for a better signal-to-noise ratio in low-light conditions.
When the frame rate requested by the application is reached, and the light available decreases further, the gain is increased steadily (as depicted by the horizontal part of the plot). As the light available decreases even further, a point is reached (the maximum gain threshold) when increasing the gain further is not acceptable. This is because an increase in gain makes the image noisy, and the maximum gain threshold is the point when a further increase in noisiness is no longer acceptable. If the available light decreases further beyond this point, then the frame rate is decreased again (the integration time is increased). Finally, when the frame rate has been reduced to a minimum threshold (min frame rate), if the available light decreases further, other measures are tried. For instance, the gain may be increased further, and/or other image pipe controls may be adjusted (for instance, desaturation may be increased, contrast may be manipulated, and so on).
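The ladder described above might be sketched as follows. This is a simplified illustration, with assumed thresholds and step sizes, of the order in which the controls are traded off: gain first, then frame rate (integration time), then other image pipe controls.

```python
# Sketch of the low-light exposure ladder (illustrative thresholds only).
def adjust_for_low_light(state, target_brightness, measured_brightness,
                         max_gain=96, min_fps=5, gain_step=4, fps_step=1):
    if measured_brightness >= target_brightness:
        return state                                 # enough light: nothing to do
    if state["gain"] < max_gain:
        state["gain"] = min(max_gain, state["gain"] + gain_step)
    elif state["fps"] > min_fps:
        state["fps"] = max(min_fps, state["fps"] - fps_step)  # longer integration
    else:
        state["desaturation"] += 1                   # last resort: image pipe controls
    return state

state = {"gain": 90, "fps": 15, "desaturation": 0}
for _ in range(12):
    state = adjust_for_low_light(state, target_brightness=120, measured_brightness=60)
print(state)  # gain capped at 96, frame rate stepped down toward the minimum
```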
In one embodiment, the frame rate algorithm has the parameters shown in Table 6.
When the maximum frame time is shorter than the maximum frame time corresponding to the frame rate requested by the application, in one embodiment this parameter is disregarded in order to optimize image quality (this is what happens on the left side of
Image Pipe Controls
Several other features are implemented in accordance with an embodiment of the present invention, and are discussed here under image pipe controls. Image pipe controls are a set of knobs in the image pipe that have an influence on image quality, and that may be set differently to improve some aspects of the image quality at the expense of some others. For instance, these include saturation, contrast, brightness, and sharpness. Each of these controls has some tradeoffs. For instance, controlling saturation levels trades colorfulness for noise, controlling sharpness trades clarity for noise, and controlling contrast trades brightness for noise. In accordance with embodiments of the present invention, the user-specified level of a control will be met as much as possible, while taking into account the interplay of this control with several other factors, to ensure that the overall image quality does not degrade to unacceptable levels.
In one embodiment, these image pipe controls are controlled by the intelligent image quality engine 140. In another embodiment, a user can manually set one or more of these image pipe controls to different levels, as discussed in further detail below. In another embodiment, one or more image pipe controls can be controlled by both the user and the intelligent image quality engine, with the user's choice overruling that of the intelligent image quality engine.
In one embodiment, the various controls are part of the image pipe, either in software or in hardware. Some of the parameters for the image pipe controls are in Table 7 below.
Temporal Filter
As mentioned above with respect to
In one embodiment, the temporal noise filter is a software image processing algorithm that removes the noise by averaging pixels temporally in non-motion areas of the image. While temporal filtering reduces temporal noise in fixed parts of the image, it does not affect the fixed pattern noise. This algorithm is useful when the gain reaches levels at which noise becomes more apparent. In one embodiment, this algorithm is activated only when the gain level is above a certain threshold.
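A minimal sketch along the lines described above is given below: pixels that change little between frames (non-motion areas) are averaged with the previous output, while moving pixels are passed through to avoid ghosting. The blend factor and motion threshold are illustrative assumptions, not the specification's parameters.

```python
# Sketch: temporal noise filter that averages only in non-motion areas.
def temporal_filter(prev_out, curr, motion_threshold=12, alpha=0.5):
    out = []
    for p, c in zip(prev_out, curr):
        if abs(c - p) < motion_threshold:
            out.append(alpha * p + (1 - alpha) * c)   # static pixel: average temporally
        else:
            out.append(float(c))                      # motion: keep current pixel
    return out

prev = [100.0, 100.0, 100.0, 100.0]
curr = [104, 97, 101, 180]          # last pixel is genuine motion
print(temporal_filter(prev, curr))  # -> [102.0, 98.5, 100.5, 180.0]
```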
In one embodiment, temporal filtering has the parameters shown in Table 8:
User Interface
In one embodiment, the default implemented in the image capture device 100 is that the intelligent image quality engine 140 is enabled, but not implemented without user permission. Initially the actions of the intelligent image quality engine 140 are limited to detecting conditions affecting the quality of the image (such as lighting conditions (low-light or backlight)), and/or using the features as long as they do not have any negative impact on user experience. However, in one embodiment, the user is asked for permission before implementing algorithms that make tradeoffs as described above.
As mentioned above, improvements to the image quality that can be made without impacting the user experience are made automatically in one embodiment. When any of the triggers is reached requiring further improvements that will result in tradeoffs, the user 120 is asked whether to enable such features, and is informed about the negative effects or given the option to optimize those himself. The user 120 is also asked, in one embodiment, whether he wants to be similarly prompted in future instances, or whether he would like the intelligent image quality engine to proceed without prompting him in the future.
In one embodiment, if the user 120 accepts the implementation of the intelligent image quality engine 140, and chooses not to be asked next time, then the intelligent image quality engine 140 will use various features in the future without notifying the user 120 again, unless the user 120 changes this setting manually. If the user 120 accepts the implementation of the intelligent image quality engine 140, but chooses to be notified next time, then the intelligent image quality engine 140 will use various features without notifying the user 120, until no such features including tradeoffs are needed, or the camera 100 is suspended or closed. If the user 120 refuses to use the intelligent image quality engine 140, then the actions taken will be limited to those that do not have any negative impact on the user experience.
In one embodiment, several of the features associated with the intelligent image quality engine 140 can also be manually set.
Table 9 below includes the mapping of User Interface (UI) controls to parameters in accordance with an embodiment of the present invention.
As can be seen from
While in Auto mode (9 or 10), the UI behaves as follows:
There is a distinction between auto modes 9 and 10. Mode 9 is the mode with high power consumption by the CPU of the host system 110, and mode 10 is the mode with low power consumption by the CPU of the host system 110. Other features/applications being used (e.g., intelligent face tracking, use of avatars, etc.) affect the selection of these modes.
In one embodiment, these modes are stored on a per-device level in the application. If the user puts one camera in manual mode and plugs in a new camera, the new camera is initialized into the default mode. Plugging the old camera in will initialize it in the manual mode. If the user cancels (presses esc key) while the prompt dialog shown in
In accordance with an embodiment of the present invention, an image capture device 100 is equipped with one or more LEDs. These LED(s) will be used to communicate to the user information regarding the intelligent image quality engine 140. For instance, in one embodiment, a steady LED is the default in normal mode. A blinking mode for the LED is used, in one embodiment, to give feedback to the user about specific modes the camera 100 may transition into. For instance, when none of the intelligent image quality algorithms (e.g., the frame rate control, the smart AE, etc.) are being implemented, the LED is green. When the intelligent image quality engine enters one of the states where such an algorithm will be implemented, the LED blinks. Blinking in this instance indicates that user interaction is required. When the user interaction (such as in
While particular embodiments and applications of the present invention have been illustrated and described, it is to be understood that the invention is not limited to the precise construction and components disclosed herein. For example, other metrics and controls may be added, such as software based auto-focus, different uses for the ZOI, more advanced backlight detection and AE algorithms, non uniform gain across the image etc. Various other modifications, changes, and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus of the present invention disclosed herein, without departing from the spirit and scope of the invention as defined in the following claims.