1. Field of the Invention
The present invention relates generally to a method and apparatus for the capture, analysis, and enhancement of digital images and digital image sequences and to a data format resulting therefrom.
2. Description of the Related Art
Millions of users are turning to digital devices for capturing and storing their documents and still and motion pictures. Market analysts estimate that more than 140 million digital image sensors were produced for digital cameras and scanners in all applications in 2002. This number is expected to grow over sixty percent per year through 2006. The digital image sensor is the “film” that captures the image and sets the foundations of image quality in a digital imaging system. Present camera designs require significant processing of the data from the digital image sensors in order to obtain a meaningful digital image from the “film” after the picture is taken. Despite this processing, millions of users are also being exposed to the need (and opportunity) to correct or adjust these images on computers using image manipulation software to obtain the desired image quality.
The body of algorithms, mathematics, and techniques for the correction, adjustment, compression, transmission, or interpretation of digital images and image sequences falls within the broad field of digital image processing. Almost every digital imaging application incorporates some digital image processing algorithms into either the system software or hardware to achieve the desired objective. Most of these methods are used to process the image after it has been acquired. Image processing methods applied after image formation are called post-processing methods. Post-processing methods make up the majority of techniques implemented in current imaging systems and include techniques for the enhancement, restoration and compression of digital still images and image sequences.
Growing with the millions who are essentially becoming their own photo labs, fixing, printing, and distributing their own digital images and video, is the demand for more sophisticated means of post-processing images and video. Even film photographers are seeking solace in the digital domain, scanning their film images at kiosks in the hope of correcting problems with special post-processing algorithms. Furthermore, the growth in digital imaging is producing a burgeoning number of images and image sequences in digital format, and the need to compress, describe, catalogue, and transmit objects in digital still images and video is becoming paramount. This trend toward object-based or content-based processing presents new opportunities as well as new challenges for the processing of digital still images and video.
Picture quality often needs to be adjusted after capture for many reasons. For example, lossy compression, inaccurate lens settings, inappropriate lighting conditions, erroneous exposure times, sensor limitations, and uncertain scene structure and dynamics all affect final image quality. Sensor noise, motion blur, defocus, color aberrations, low contrast, and over/under exposure are all examples of distortions that may be introduced into the image during image formation. Lossy compression of the image further aggravates these distortions.
The field of image restoration is the area of digital image processing that provides rigorous mathematical methods for the estimation of an original, undistorted image from a degraded, observed image. Restoration methods are based on (parameterized) models of the image formation and the image distortion process. In contrast, the field of image enhancement provides methods for ad hoc, subjective adjustment of digital still images and video. Image enhancement methods are implemented without the guide of a rigorous image model. The overwhelming majority of software and hardware implementations of image processing algorithms utilize image enhancement methods because of their simplicity. However, because of their ad hoc application, image enhancement algorithms are effective on only a limited class of image distortions.
The need for improved image enhancement is demonstrated by the market-driven efforts of major digital imaging software companies such as Adobe Systems Incorporated. Approximately $66 million of Adobe's reported $297 million in sales in the quarter ending Feb. 28, 2003, was spent on research and development in digital imaging software. Adobe also reported a 23% increase in digital imaging software sales over the same quarter of 2002. Among the most recent technical advances in this area is the new opportunity to access the camera raw, or “digital negative,” image for more powerful post-processing. The “digital negative” is the image data closest to the sensor array, before any post-processing. However, post-processing of even raw camera data remains limited if information regarding the scene and the camera is not incorporated into the post-processing effort.
Many digital image distortions are caused by the physical limitations of practical cameras. These limitations begin with the passive image formation process used in many digital imaging systems. Traditional imaging systems, as shown in
where $\check{f}(l)$ is the continuous value of image intensity (before analog-to-digital conversion) at pixel location $l = (x, y)$, $\tau_e$ is the exposure time in seconds, $\varepsilon = (\varepsilon_x, \varepsilon_y)$ is the pixel pitch in the x and y directions, respectively, and $i_{ph}(l, t)$ and $i_n(l, t)$ are the photoelectronic current and the electronic noise current at location $l$ at time $t$.
The equation describes the pixel-level image formation found in almost all digital and chemical film imaging systems. It also describes image formation as a passive, continuous-time process that requires shutter management and exposure time determination. Shutter management and exposure time determination are among the weaknesses of conventional image formation and are based on a century-old film image capture philosophy. This is the same image formation approach that provided the original motivation to digitize film photographs for post-processing in the 1960s.
Shuttering is used to prevent bright light from saturating chemical film and to limit bleaching and blooming in electronic imaging arrays. In shuttering, the entire film/array surface is subject to the same exposure time, even though the brightness of the incident light varies across the film. For this reason, some areas of the film are often underexposed or overexposed because of the global determination of exposure time. In addition, most exposure time determination strategies are easily tricked by scene dynamics, lens settings and changing lighting conditions. The global shuttering approach to image formation is suitable only for capturing static, low-contrast images, where the scene and camera are stationary and the difference between bright and dark regions in the image is small.
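To make the global-exposure tradeoff concrete, the following sketch models each pixel as accumulating charge linearly with its incident brightness over one array-wide exposure time, clipped at a full-well capacity. The linear model, the `FULL_WELL` and `NOISE_FLOOR` values, and the function names are illustrative assumptions, not parameters taken from this disclosure.

```python
# Hypothetical sensor constants, chosen only for illustration.
FULL_WELL = 1000.0    # charge at which a pixel saturates (over-exposed)
NOISE_FLOOR = 20.0    # charge below which the signal is lost in noise

def expose(brightness, exposure_time):
    """Charge collected by each pixel during one global exposure.

    Charge grows linearly with brightness * time and is clipped at
    FULL_WELL, as in a passive, globally shuttered array.
    """
    return [min(b * exposure_time, FULL_WELL) for b in brightness]

def classify(charges):
    """Label pixels under-exposed ('L'), saturated ('X') or usable ('N')."""
    labels = []
    for q in charges:
        if q >= FULL_WELL:
            labels.append("X")
        elif q < NOISE_FLOOR:
            labels.append("L")
        else:
            labels.append("N")
    return labels

# High-contrast scene: deep shadow, mid-tone, bright highlight.
scene = [5.0, 200.0, 50000.0]
for t in (0.01, 0.1, 1.0):
    print(t, classify(expose(scene, t)))
```

Under these assumed constants, no single exposure time labels all three pixels "N": the shadow needs t >= 4 to clear the noise floor, while the highlight saturates for any t >= 0.02.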
For these and other reasons presented later herein, the performance of current digital and film cameras is limited by design. The passive image formation process described in the equation limits low-light imaging performance, limits array (or film) sensitivity, limits array (or film) dynamic range, limits image brightness and clarity, and allows a host of distortions, including noise, blur, and low contrast, to corrupt the final image.
Whether in a digital or chemical film imaging system, the sensor array 22 sets the foundation of image quality. How this image is captured is key because the quality of the signal read from the “film” guides the ultimate image quality downstream. The image formation process as shown in
The earliest post-processing algorithms were developed to correct the distortions observed in moon images caused by the inherent limitations of the television camera aboard the Ranger 7 probe launched in 1964. Almost 40 years later, post-processing algorithms remain necessary to correct image distortions from cameras. The major obstacle to accurate and reliable post-processing of digital images and video is the lack of detailed knowledge of the imaging system, the image distortion, and the image formation process. Without this information, adjusting the image quality after the image formation is an inefficient guessing game. Many post-processing software packages, for example, Adobe Photoshop and Corel Paint, give the user some control over their image enhancement algorithms. However, without detailed knowledge of the image formation process, the suite of image improvement tools in these packages: cannot correct the underlying source of the distortion; are limited to user selectable or global algorithm implementation; are not compatible with object oriented post-processing; are useful on a limited class of image distortions; are often applied in image regions that are not distorted; are not suitable for reliable automatic removal of many distortions; and are applied after the image formation process is complete.
The most successful applications of post-processing for image enhancement are those where one or more of the following is known: the scene, the distortion, or the system used to acquire the image. A striking example of successful post-processing is the Hubble Space Telescope (HST). The images from the billion-dollar HST were distorted because of a flaw in its primary mirror. The behavior of the HST was well known and highly engineered, so it was possible to derive accurate image distortion models that could be used to restore the degraded HST images. The HST optics were later corrected in another mission; in the meantime, thanks to the available technology, many distorted images were salvaged by post-processing.
Unfortunately, most post-processing software and hardware implementations neither have access to, nor incorporate or convey, even limited knowledge of the scene, the distortion, or the camera in their processing. In addition, the parameters that characterize the filters and algorithms used to reliably remove distortions from digital images and video require additional knowledge that is often lost after the image is formed and stored.
Detailed information is required to properly (and automatically) adjust image quality. The beginnings of such information include camera settings (aperture, f-stop, focal length, exposure time) and film/sensor array parameters (speed, color filter array type, pixel size and pitch), which are examples of the parameters available for exchange according to the digital camera standard EXIF V2.2. However, these parameters describe only the camera, not the scene structure or dynamics. Detailed scene information is not extracted or conveyed to the end user (external devices) in conventional cameras. Meta-data regarding the scene structure and dynamics is extremely valuable to those who want to restore images, correct severe distortions, or analyze complex digital images quickly.
In general, post-processing becomes inefficient in the absence of such knowledge because the perceived distortion may not lie in the user-selected region of the image. In this case, post-processing is applied in areas where no distortion exists, wasting computational effort and possibly introducing unwanted artifacts.
Despite the definition of sophisticated content or object based encoding standards for digital still images and digital video images, there remains the challenge of breaking down the image into its component objects. This process is called image segmentation. Efficient and reliable image segmentation remains an open challenge. In order for the higher level content-based functionality of multimedia standards, such as MPEG-4 and MPEG-7 to expand in popularity, segmenting the image (sequence) into its components and providing a framework for post processing these objects will be required.
A powerful cue for image segmentation is motion. The evidence and nature of the motion in an image sequence provides salient cues for differentiating background objects from foreground objects. Important information regarding the motion of objects in a still image is lost during image formation. If an object moves during image formation, a blur will be evident in the final image. Characterizing the blur in the image requires more information than what is available in a single frame. However, sufficient information regarding the motion and the extent of a moving object can be derived by monitoring the behavior of pixels during image formation.
The present invention extracts, records, and provides critical scene and image formation data, referred to herein as meta-data, to improve the effectiveness and performance of still image and video image processing using hardware and software resources. Without loss of generality, from this point forward, post-processing will refer to hardware and software apparatus and methods for both digital still image and video image processing. Digital still image and video image processing includes methods for the enhancement, restoration, manipulation, automatic interpretation and compression of visual communications data.
Many image distortions can be detected and, in some cases, prevented at the pixel level during image formation. Post-processing can be used to reduce or eliminate these distortions without pixel-level processing if sufficient information is provided to the post-processing algorithms. Part of the present invention is the definition of the relevant information required for post-processing to efficiently remove difficult distortions.
Key innovations of the various embodiments of this invention are to improve image and video post-processing through: extraction of meta-data from the image both at and during the image formation process; computation and provision of meta-data describing the type and presence of a distortion or activity in an image or image sequence region; computation and provision of meta-data to focus processing effort on specific regions of interest within an image or image sequence; and/or to provide sufficient meta-data for the correction of an image or image sequence region based on the type and extent of the distortion of digital images and video.
The invention disclosed in this document in its various embodiments can be: used in any array of sensors where all or part of the array elements are used to extract an image or some other interpretable information; used in multi-dimensional imaging systems, including 3D and 4D imaging systems; applied to arrays of sensors that are sensitive to thermal, mechanical, or electromagnetic energy; applied to a sequence of images to derive a high-quality individual frame; and/or implemented in hardware or software.
a is a schematic diagram of a generic conventional digital imaging system;
b is a flow diagram of the process steps being carried out by the imaging system of
FIGS. 2a, 2b, 2c and 2d are graphs of pixel charge accumulation;
FIGS. 3a, 3b, 3c and 3d are graphs of pixel signal intensity;
a is a block diagram showing a basic digital camera OEM development system architecture;
b is a block diagram of a basic digital camera with a meta-data processor;
a is a schematic diagram showing a meta-data enabled image formation;
b is a flow diagram showing a meta-data enabled image formation of
a is a block diagram of a meta-data processor implementation having the meta-data processor combined with the system controller;
b is a block diagram of a meta-data processor implementation having the meta-data processor combined with the DSP/RISC processor;
c is a block diagram of a meta-data processor implementation having the meta-data processor combined with the system controller and the DSP/RISC processor; and
In an embodiment of the present invention, information regarding the scene is derived from analyzing (i.e. filtering and processing) the evolution of pixels (or pixel regions) during image formation. This methodology is possible since many common image distortions have pixel level profiles that deviate from the ideal. Pixel profiles provide valuable information that is inaccessible in conventional (passive) image formation. Pixel signal profiles are shown in
Signal distributions shown in
The graphs of
In an embodiment of the invention, meta-data refers to a set of information that can be used to improve the performance or add new functionality to the post-processing of digital images and video in either software or hardware. Meta-data may include one or more of the following: camera parameters, sensor/film parameters, scene parameters, algorithm parameters, pixel values, time instants or distortion indicator flags. This list is not exhaustive, and further aspects of the image may be identified in the meta-data. The meta-data in various embodiments conveys information regarding single pixels or arbitrarily shaped or sized regions, such as object regions.
Using this definition, meta-data can be put into one of two categories: (1) pre-acquisition meta-data (P-Data) and (2) intra-acquisition meta-data (I-Data). Pre-acquisition meta-data refers to the scene and imaging system information available before the image is formed on the sensor array. The P-Data may vary from image to image but is static during image formation. Such pre-acquisition data can also apply to film systems. P-Data is derived by the imaging system before acquiring an image of the desired light (energy). Specific examples of pre-acquisition meta-data include all of the tags in the EXIF standard, for example, exposure time, speed, f-stop, and aperture size.
Some of this information is available far in advance of the image acquisition, such as the sensor parameters and lens focal length. Other information is available only immediately before the image acquisition begins, such as ambient light conditions and exposure time. The present invention also encompasses meta-data within the class of pre-acquisition meta-data that is captured and defined during the image capture, or acquisition. For instance, exposure time could be set by the imaging system prior to initiating the image acquisition or may be changed during the course of image acquisition as a result of changes in the lighting conditions, for example, or due to real time monitoring of the image capture by light sensors or the like. This information is included within the definition of pre-acquisition meta-data for purposes of this invention even if some of the data is derived during the acquisition of the image.
The determination of the pre-acquisition parameters facilitates the attainment of meaningful images. Many image distortions occur and cannot be addressed in subsequent processing when these parameters are improperly set or are unknown. With such information available, processing of the image can be carried out in a meaningful way.
Intra-acquisition meta-data, or I-Data, refers to the information regarding the image that can be derived during the image formation process. The I-Data tends to be dynamic information that can be used to detect the onset or presence of an image distortion in a specific pixel or region of pixels. The intra-acquisition data is, in one embodiment of the invention, derived on a pixel or pixel region basis by monitoring the pixels or pixel regions, although it is within the scope of this invention that the intra-acquisition data could be image-wide. I-Data conveys information for image post-processing software or hardware to correct or, in some cases, prevent distortions from corrupting the details of the final image. Those skilled in the art will also note that I-Data can assist in motion estimation and analysis and in image segmentation. I-Data can include, but is not limited to, distortion indicator flags and time instants for a pixel or group of pixels. An efficient representation for I-Data according to the present embodiment is as masks, where each pixel or pixel block location is mapped to a specific I-Data location. For example, in an image-sized mask, each pixel can map to a specific I-Data mask location.
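A minimal sketch of the mask representation, assuming I-Data is stored at a configurable block size so that every pixel location maps to exactly one mask entry (a block size of 1 gives an image-sized mask). The `IDataMask` class, its fields, and its methods are hypothetical names for illustration; entries are kept in a sparse dictionary so that only blocks with a recorded distortion consume storage.

```python
from dataclasses import dataclass, field

@dataclass
class IDataMask:
    """Hypothetical I-Data mask with one entry per pixel block."""
    width: int            # image width in pixels
    height: int           # image height in pixels
    block: int = 1        # block edge length; 1 gives an image-sized mask
    entries: dict = field(default_factory=dict)

    def mask_index(self, x, y):
        """Map pixel (x, y) to its (column, row) location in the mask."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("pixel outside image")
        return (x // self.block, y // self.block)

    def mask_shape(self):
        """Mask size (columns, rows); partial edge blocks get their own entry."""
        return (-(-self.width // self.block), -(-self.height // self.block))

    def record(self, x, y, flag, time):
        """Store a distortion flag and time instant for the pixel's block."""
        self.entries[self.mask_index(x, y)] = (flag, time)

    def lookup(self, x, y):
        """Return the (flag, time) stored for the pixel's block, or None."""
        return self.entries.get(self.mask_index(x, y))

mask = IDataMask(width=640, height=480, block=8)
mask.record(100, 37, "PB", 12)
print(mask.mask_index(100, 37), mask.lookup(103, 39))  # (12, 4) ('PB', 12)
```

A dense two-dimensional array of shape `mask_shape()` would be the natural alternative when most blocks carry I-Data.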
The present method addresses both the rate of accumulation of the signal intensity and changes in the rate of signal accumulation or signal intensity at the sensor, pixel or pixel region that occur at or after a time of acquisition of the image. These may be a result of, for example, movement that occurs by one or more objects in the image frame or by the image capture device during the acquisition, unexpected time variations in illumination or reflectance, or under-exposure (low light) or over-exposure (saturation) of the sensors, pixels or pixel regions during the acquisition of the image. The events which are characterized as changes in the rate of signal accumulation may be described as temporal events or temporal changes in the image during the acquisition since they occur at some time or over some time during the image acquisition interval. They may also be thought of as temporal perturbations or unexpected temporal changes. Motion is one class of such temporal change. The rate of change of the intensity signal is used to identify and correct the temporal events, and can also be used to identify and correct low light conditions wherein insufficient light reaches the sensor to overcome the effects of noise on the desired signal.
In one embodiment, the intra-acquisition meta-data extraction process utilizes an image sensor 200, distortion detector 202, image estimator 204, mask formatter 206, and an image sequence formatter 208, as shown in
In further detail as shown in
In
The distance measure module in the blur processor determines which facet of the signal will be detected to indicate a distortion. Motion blur distortions occur when individual pixels in an image region observe a mixture of multiple intensities caused by moving objects during image formation. Detecting motion blur at the pixel level amounts to detecting a change in image intensity at the pixel during image formation. By detecting this change, the original (pre-blur) pixel intensity can be preserved. The distance measure may be used to detect a change in the mean, variance, correlation, or sign of correlation of the residual $r^B_k$. Since the pixels in an imaging array experience both signal-dependent noise (e.g., shot noise) and signal-independent noise (e.g., thermal noise), change-in-mean, change-in-variance and change-in-correlation measures can all be applied. In this embodiment, the change-in-mean distance measure, $s^B_k = r^B_k$, is used. Examples of change-in-variance, change-in-correlation and change-in-sign-of-correlation distance measures include $s^B_k = (r^B_k)^2 - \sigma_r^2$, $s^B_k = r^B_k f_{k-m}(l)$ and $s^B_k = \mathrm{sign}(r^B_k r^B_{k-1})$, respectively, where $\sigma_r^2$ is a known residual variance and $m < k$.
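The four distance measures can be computed directly from the residual sequence. The sketch below assumes the residuals $r^B_k$ and the pixel samples $f_k(l)$ are already available as Python lists indexed from 0, and that the residual variance $\sigma_r^2$ and the lag $m$ are supplied by the caller; the function name is hypothetical.

```python
def distance_measures(r, f, k, sigma_r2, m=1):
    """Return the four candidate distance measures s_k^B at sample index k.

    r        : residual sequence r_k^B from the blur processor filter stage
    f        : pixel samples f_k(l)
    sigma_r2 : known residual variance
    m        : lag for the correlation measure; with 0-based lists this
               requires 1 <= m <= k
    """
    if not (0 < m <= k):
        raise ValueError("need 0 < m <= k")
    product = r[k] * r[k - 1]
    return {
        "mean": r[k],                          # change in mean
        "variance": r[k] ** 2 - sigma_r2,      # change in variance
        "correlation": r[k] * f[k - m],        # change in correlation
        # Change in sign of correlation: +1 if consecutive residuals agree
        # in sign, -1 if they disagree, 0 if either residual is zero.
        "sign": (product > 0) - (product < 0),
    }

print(distance_measures([0.1, -0.2, 0.5], [10, 11, 12], k=2, sigma_r2=0.04))
```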
When a distortion is detected, the blur detection module emits an alarm consisting of the time of the distortion, $k_B$, and a (pre-distortion) pixel value, $f_B$. The blur detection algorithm in the change-in-mean case uses the CUSUM (cumulative sum) algorithm,
where $n > 0$ is a drift parameter and $h_k > 0$ is an index-dependent detection threshold parameter. This algorithm is resistant to false positives caused by large instantaneous errors below the threshold $h_k$, thus permitting integration or filtering of the pixel intensity to continue. The drift parameter adds temporal low-pass filtering that effectively filters or “subtracts off” spurious errors, reduces false positives, and biases the detection process toward large localized errors or small clustered errors characteristic of motion blur. When $g^B_k$ exceeds the threshold $h_k$, an alarm is emitted and the algorithm is restarted with $g^B_k = 0$ at the next time instant. The threshold $h_k$ is allowed to be index dependent to maximize integration time at each pixel. The threshold is ignored at the first sample time, $k = 1$, and may be allowed to increase toward the end of the exposure interval, since larger intensity deviations are required to corrupt a pixel near the end of the exposure time. This allowance further reduces signal-independent noise at the pixel. The essential tradeoff in change detection is sensitivity versus delay. The values $h_k$ and $n$ are tuned to optimize detection time and to prevent false positives; those skilled in the art are familiar with methods to design these parameters. The disclosed method of blur detection is superior to the work first by Tull and later by El-Gamal because it introduces forgetting into the detection process and allows meta-data to be generated from the detection process.
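The CUSUM equation itself is not reproduced in this text, so the sketch below assumes the standard change-in-mean form $g_k = \max(0, g_{k-1} + s_k - n)$, applied to both $s_k$ and $-s_k$ (a common two-sided variant, so that both brightening and darkening are caught), with an alarm when either statistic exceeds $h_k$ and a restart afterwards, as the description states. It also assumes the residual is the sample minus the running mean of the samples integrated since the last restart; that simple stand-in for the filter stage, the function name, and the alarm payload convention ($f_B$ as the intensity accumulated up to $k_B$, so that the $(N/k)\times f_B$ scaling used later in this section recovers a full-exposure value for a single change) are illustrative choices, not the disclosed design.

```python
def cusum_blur_detect(samples, drift, thresholds):
    """Two-sided change-in-mean CUSUM for one pixel's sample sequence.

    Assumed form (the source equation is not reproduced in this text):
        g_k = max(0, g_{k-1} + s_k - drift), tested on s_k and -s_k.

    samples    : per-interval pixel intensities f_1..f_N
    drift      : drift parameter n > 0
    thresholds : detection thresholds h_1..h_N (index dependent)
    Returns a list of alarms (k_B, f_B): k_B is the last sample index before
    the alarm and f_B is the intensity accumulated since the last restart.
    """
    alarms = []
    g_up = g_down = 0.0      # statistics for upward / downward mean changes
    total, count = 0, 0      # intensity integrated since the last restart
    for k, f in enumerate(samples, start=1):
        if count == 0:
            # Very first sample: no reference level yet, so no test is run
            # (the threshold is ignored at k = 1).
            total, count = f, 1
            continue
        s = f - total / count              # residual against the running mean
        g_up = max(0.0, g_up + s - drift)
        g_down = max(0.0, g_down - s - drift)
        if g_up > thresholds[k - 1] or g_down > thresholds[k - 1]:
            alarms.append((k - 1, total))  # pre-distortion time and value
            g_up = g_down = 0.0            # restart the test ...
            total, count = f, 1            # ... and integrate the new level
        else:
            total += f
            count += 1
    return alarms

print(cusum_blur_detect([10] * 5 + [30] * 5, drift=2, thresholds=[15] * 10))  # [(5, 50)]
```

A constant threshold is used in the usage line for simplicity; an index-dependent schedule that grows toward the end of the exposure can be passed in the same list.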
The magnitude processor 212 shown in
In the filter stage of the exposure processor, an estimate of the current image intensity, $\hat{q}^E_k$, is obtained using a second-order auto-regressive (AR) prediction error estimator, which gives the prediction error $r^B_k = f_k(l) - \hat{q}^B_k$.
The output of the exposure processor distance measure module is computed from $s^E_k = \hat{q}^E_k + (N - k)\, r^B_k$, which is an extrapolation of the current intensity estimate to its final pixel intensity.
The exposure detector module implements two CUSUM based algorithms,
where $h_L$ and $h_U$ are the lower and upper detector thresholds, $n_L$ and $n_U$ are the lower and upper drift coefficients, and $g^L_k$ and $g^U_k$ are the lower and upper test statistics, respectively. The drift coefficients and thresholds are set to perform upper and lower boundary detection for the pixel intensity. When either test statistic exceeds its respective threshold, an alarm consisting of the instantaneous prediction error, stored in $f_E$, and the time instant of the alarm, $k_E$, is sent to the distortion interpreter.
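The two CUSUM equations are missing from this text. A minimal sketch, assuming the common form in which the drift coefficients act as the upper and lower intensity boundaries: $g^U_k = \max(0, g^U_{k-1} + s^E_k - n_U)$ and $g^L_k = \max(0, g^L_{k-1} + n_L - s^E_k)$, with an alarm when $g^U_k > h_U$ or $g^L_k > h_L$. The alarm payload follows the text (the prediction error at the alarm as $f_E$ and the time instant $k_E$); the update rules, the returned side label, and the function name are assumptions.

```python
def exposure_detect(s, r, n_lower, n_upper, h_lower, h_upper):
    """Lower/upper CUSUM boundary tests on extrapolated intensities s_k^E.

    Assumed update rules (the source equations are not reproduced here):
        g_U = max(0, g_U + s_k - n_upper)   -> approaching saturation
        g_L = max(0, g_L + n_lower - s_k)   -> heading toward under-exposure
    s : extrapolated final-intensity sequence s_1^E..s_N^E
    r : prediction errors at the same instants (reported as f_E on alarm)
    Returns (side, k_E, f_E) for the first alarm, or None if the pixel
    stays within bounds for the whole exposure.
    """
    g_upper = g_lower = 0.0
    for k, (s_k, r_k) in enumerate(zip(s, r), start=1):
        g_upper = max(0.0, g_upper + s_k - n_upper)
        g_lower = max(0.0, g_lower + n_lower - s_k)
        if g_upper > h_upper:
            return ("upper", k, r_k)
        if g_lower > h_lower:
            return ("lower", k, r_k)
    return None

print(exposure_detect([500, 800, 1200, 1300], [0, 1, 2, 3],
                      n_lower=100, n_upper=1000, h_lower=50, h_upper=250))
```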
The distortion interpreter (DI) 214 prioritizes the distortion vectors and prepares the intra-acquisition meta-data for each pixel. The interpreter tracks changes in the distortion vectors and eliminates redundant detections. In this embodiment, the interpreter is responsible for recording one distortion event (per pixel per exposure) to minimize storage. Multiple distortion events per pixel per exposure time can be catalogued given sufficient memory resources. The distortion interpreter generates, stores and emits meta-data based on events obtained from the exposure and blur detectors. The meta-data output vector format for each pixel is
$v(l) = \{(\text{distortion class}, \text{time}, \text{value}), (\text{distortion class}, \text{time}, \text{value})\}$
Each pixel can have only a single exposure-class distortion, a single blur-class distortion, or both. Two distortions of the same class are not allowed. For example, let a pixel experience a single change corresponding to motion at instant $k$ during the exposure time. At the end of the exposure time, the DI generates a vector $v(l) = \{PB, k, f_B\}$, where PB is a distortion class symbol indicating partially blurred, $k$ is the time instant and $f_B$ is the pre-distortion value of the pixel. This vector allows the fully exposed value of the original pixel intensity to be reconstructed in post-processing as $f_N(l) = (N/k) \times f_B$, where $N$ is the number of observations made during image formation. Now consider the same pixel, but suppose the new intensity value observed by this pixel will saturate the pixel. In this case the meta-data vector becomes $v(l) = \{PB, k, f_B, X, k+1, f_E\}$. This vector allows post-processing software to accurately reconstruct the original unblurred pixel at time $k$ and the high-intensity pixel value observed at instant $k+1$, which is given by $f_{k+1}(l) = (N/(k+1)) \times f_E$. If the pixel is reset at this point, more intensities could be estimated. By predicting the onset of saturation, light intensities $N$ times brighter than the dynamic range of the pixel can be represented in post-processing, where $N$ is the number of observations of the pixel.
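The reconstruction arithmetic can be written as a small helper. This sketch assumes the meta-data vector is stored as a flat tuple of (class, time, value) triples, following the $v(l)$ format above, and that $N$ is known to the post-processor; the function name is hypothetical. Each recorded value is scaled by $N$ divided by its own time instant, which gives $(N/k) f_B$ for the partially blurred event and $(N/(k+1)) f_E$ for a following saturation event.

```python
def reconstruct(v, n_obs):
    """Scale each (class, time, value) event in a pixel's meta-data vector
    to a full-exposure intensity using N / time.

    v     : flat tuple, e.g. ("PB", k, f_B, "X", k + 1, f_E)
    n_obs : N, the number of observations made during image formation
    Returns a dict mapping distortion class to reconstructed intensity.
    """
    if len(v) % 3:
        raise ValueError("meta-data vector must hold (class, time, value) triples")
    out = {}
    for i in range(0, len(v), 3):
        cls, k, value = v[i:i + 3]
        out[cls] = n_obs / k * value
    return out

print(reconstruct(("PB", 4, 100.0), n_obs=16))                  # {'PB': 400.0}
print(reconstruct(("PB", 4, 100.0, "X", 5, 250.0), n_obs=16))   # {'PB': 400.0, 'X': 800.0}
```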
The distortion interpreter generates one of three blur distortion class symbols per pixel: partially blurred (PB), blurred (B), or no blur (S). The S class is typically dropped in practice. This classification is based on the number of changes observed during image formation. In the case of a PB pixel, a single change is observed during image formation, as is the case when an object covers or uncovers a pixel (or pixel region). When two or more intensity changes are observed during image formation, the pixel is said to be a blurred (B) pixel. When no changes are detected during image formation, the pixel is a stationary (S) pixel. In practice, PB and B pixels do not occur in isolation. The distortion interpreter enforces this constraint on the blur processor detector by checking neighboring pixels for other PB and B pixels to ensure consistency. The distortion interpreter may reset the condition of the blur processor to enforce this condition at a local pixel.
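A sketch of the blur classification and the neighborhood consistency rule, assuming the per-pixel change counts are available as a 2D list and that "not in isolation" means at least one of the eight neighbors is also PB or B. The 8-neighbor rule and the function name are assumptions; demoting an isolated label to S stands in for the interpreter resetting the blur processor at that pixel.

```python
def classify_blur(change_counts):
    """Map per-pixel change counts to 'S', 'PB' or 'B', then demote isolated
    PB/B labels to 'S' (hypothetical 8-neighbour consistency rule)."""
    rows, cols = len(change_counts), len(change_counts[0])

    def label(n):
        if n == 0:
            return "S"      # no change observed: stationary
        if n == 1:
            return "PB"     # single change: object covered/uncovered the pixel
        return "B"          # two or more changes: blurred

    labels = [[label(n) for n in row] for row in change_counts]
    result = [row[:] for row in labels]
    for y in range(rows):
        for x in range(cols):
            if labels[y][x] == "S":
                continue
            has_blurred_neighbour = any(
                labels[ny][nx] != "S"
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_blurred_neighbour:
                result[y][x] = "S"   # isolated detection treated as spurious
    return result

counts = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 3]]
for row in classify_blur(counts):
    print(row)
```

Here the adjacent PB and B pixels support each other and survive, while the lone B in the corner is demoted to S.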
The distortion interpreter also generates one of three exposure distortion class symbols per pixel, under-exposed (L), over-exposed (X) or sufficiently exposed (N). In practice (L and X) pixels do not occur in isolation. The distortion interpreter enforces this constraint on the exposure processor by checking neighborhood pixels for other (L and X) pixels to ensure consistency. The distortion interpreter may reset the condition of the exposure processor to enforce this condition. The (L) assignment will allow the noise in under-exposed pixels to be spatially filtered with similar pixels in post-processing. Numerous methods to filter noise are known to those skilled in the art.
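As an illustration of how an L label could drive spatial filtering in post-processing, the sketch replaces each under-exposed pixel with the mean of itself and its L-labelled 8-neighbors, leaving all other pixels untouched. Treating "similar pixels" as "neighbors with the same L label" is an assumption, and plain averaging is only one of the many noise filters the text alludes to; the function name is hypothetical.

```python
def filter_underexposed(image, labels):
    """Average each 'L' pixel with its 'L'-labelled 8-neighbours.

    image  : 2D list of pixel intensities
    labels : 2D list of exposure classes ('L', 'X' or 'N'), same shape
    Returns a new image; the input is not modified.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(rows):
        for x in range(cols):
            if labels[y][x] != "L":
                continue
            vals = [image[ny][nx]
                    for ny in range(max(0, y - 1), min(rows, y + 2))
                    for nx in range(max(0, x - 1), min(cols, x + 2))
                    if labels[ny][nx] == "L"]
            out[y][x] = sum(vals) / len(vals)   # includes the pixel itself
    return out

img = [[2, 4, 100], [6, 8, 100]]
lab = [["L", "L", "N"], ["L", "L", "N"]]
print(filter_underexposed(img, lab))  # [[5.0, 5.0, 100], [5.0, 5.0, 100]]
```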
The image intensity estimator develops the final value of the image from the samples $f_k(l)$ and produces a two-dimensional array of intensity values, $f$. Various filtering methods can be used to estimate the final image intensity to reduce noise. In this embodiment, the image intensity is accumulated (and later averaged) as in a conventional imaging system, while distortions are managed by the distortion detector.
The mask formatter structures the intra-acquisition meta-data for each pixel into masks for efficient storage and transmission. The intra-acquisition meta-data may be provided for pixel groups rather than for individual pixels in some instances. The groups or regions of pixels may be defined in any number of ways. In one embodiment, the regions of pixels are defined by binning of the pixels during imaging. Binning is the process whereby groups of adjacent pixels are combined to act as a single pixel during the image capture.
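A minimal sketch of binning, assuming square b-by-b bins whose samples are summed into one super-pixel and an image whose dimensions are multiples of b; a per-bin I-Data mask would then hold one entry per super-pixel. The function name and the summing convention are illustrative assumptions.

```python
def bin_pixels(image, b):
    """Sum each non-overlapping b x b block of `image` into one super-pixel."""
    rows, cols = len(image), len(image[0])
    if rows % b or cols % b:
        raise ValueError("image dimensions must be multiples of the bin size")
    return [[sum(image[y + dy][x + dx] for dy in range(b) for dx in range(b))
             for x in range(0, cols, b)]
            for y in range(0, rows, b)]

print(bin_pixels([[1, 2, 3, 4],
                  [5, 6, 7, 8]], 2))  # [[14, 22]]
```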
For purposes of the present invention, the terms pixel and pixel regions include sensors having multiple sensor elements, sensor elements arranged in a sensor array, single or multiple chip sensors, binned pixels or individual pixels, groupings of neighboring pixels, arrangements of sensor components, scanners, progressively exposed linear arrays, etc. The sensor or sensor array is more commonly sensitive to visible light, but the present invention encompasses sensors that detect other wavelengths of energy, including infrared sensors (such as near and/or far infrared sensors), ultraviolet sensors, radar sensors, X-ray sensors, T-ray (Terahertz radiation) sensors, etc.
The present invention refers to masks for defining various regions and/or groups of pixels or sensors. The identification of such groups of sensors or regions need not be described by a mask in the traditional sense of image processing, but for purposes of the present invention encompasses identification and/or definition of the sensors, pixels, or regions by whatever means provides a communication of the identified sensors, pixels or regions. References to masks herein include such definitions or identifications.
A blur mask is provided according to some embodiments of the invention. In a still image, motion blur is both an objectionable image distortion and an important visual cue. There is psychophysical evidence from the visual science literature that motion-related distortions are used by the human visual system to adjust the perceived spatial and temporal resolution of the images on the retina. For this reason, appropriate treatment of blur in the image is important both for preserving visual cues for the observer and for removing undesired blur. The blur mask is therefore an important meta-data component in some embodiments of the invention. The purpose of the blur mask is threefold: to define regions corresponding to fast-moving objects, to facilitate object-oriented post-processing, and to remove motion-related distortions.
Each element of the blur mask 80 can classify a pixel in one of three categories, as noted in
Category S—Stationary: A pixel is assigned this designation if it has been determined that the pixel observed a single energy intensity during image formation and therefore did not experience a motion related blur. This determination can be made deterministically or stochastically. An example of a stationary pixel or pixel group is indicated in
Category PB—Partially blurred: A sensor pixel is assigned this designation if it has been determined that, at any instant, the sensor pixel observed a mixture of two or more distinguishable energy intensities during the image formation time, or exposure time. In this case, the sensor pixel contains a blurred observation of the original scene. When used in conjunction with pixel motion estimates and the classification B—Blurred, the PB—partially blurred classification specifically designates pixels that observed a combination of moving and stationary objects. In the usual case, the moving objects are foreground objects and the stationary objects are background objects, although this is not always so. An example of a partially blurred pixel or pixel group is indicated in
Category B—Blurred: A pixel is assigned this designation if it has been determined that the pixel or pixel region observed a mixture of multiple energy intensities throughout the image formation time and therefore the pixel is a blurred observation of the original scene. An example of a blurred pixel or pixel region is indicated in
When used in conjunction with pixel motion estimates and the PB—partially blurred pixel classification, the B—blurred pixel classification specifically designates pixels or pixel regions that only observed moving, usually foreground, objects during the exposure time. The reference to objects here and throughout is not limited to physical objects, but includes image areas that may include background, foreground or mid-ground objects or areas or portions of objects.
The classification process for each pixel or pixel region can be made deterministically (such as by detecting changes in slope of the pixel profile) or stochastically (such as by using estimation theory and detecting changes in an estimated parameter vector), in each case using a single pixel or pixel region or multiple pixels or pixel regions. In the absence of pixel or pixel region motion estimates, only the S—stationary and PB—partially blurred classifications are used in the blur mask, since only the distinction between blurred and non-blurred pixels is derivable from pixel profiles alone. Additional information such as motion estimates facilitates the distinction between the B—blurred and PB—partially blurred pixel classifications for the purpose of object-based motion blur restoration.
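As a minimal sketch of the deterministic route, the following Python assumes that a pixel's accumulated charge is read non-destructively at equally spaced times. It labels the pixel S when every successive increment stays within a relative tolerance of the mean increment (constant slope, one intensity), and PB otherwise. The function name and the `rel_tol` parameter are illustrative assumptions, not part of the described system; a practical tolerance would have to be matched to the sensor's noise.

```python
def classify_profile(samples, rel_tol=0.1):
    """Label a sampled charge profile 'S' (stationary) or 'PB' (partially blurred).

    Hypothetical helper: samples are charge readings taken at equal time steps.
    A constant slope means a single intensity was observed during the exposure;
    a change in slope means the observed intensity changed.
    """
    increments = [b - a for a, b in zip(samples, samples[1:])]
    if len(increments) < 2:
        return "S"  # too few samples to detect a change in slope
    mean = sum(increments) / len(increments)
    scale = max(abs(mean), 1e-12)  # avoid a zero tolerance for dark pixels
    if all(abs(d - mean) <= rel_tol * scale for d in increments):
        return "S"
    return "PB"
```

For example, a profile of `[0, 10, 20, 30, 40]` is labeled S, while `[0, 10, 20, 25, 30]`, whose slope halves partway through the exposure, is labeled PB.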
The areas of the image having common categories of pixels or pixel regions are grouped into bounded regions, and these bounded regions provide the blur mask of the meta-data. Thus, the blur mask 80 is used to indicate areas of an image in which motion resulted in blurring of the image. Post-processing methods can use such masks to reduce, remove, or otherwise process the areas of the image defined by the mask. Detection of the blurred portions of the image may also be used for motion detection or object identification, such as in vision systems for intelligent systems, autonomous vehicles, security systems, or other applications where such information could be useful.
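Grouping per-pixel categories into bounded regions can be sketched as connected-component labeling. This hypothetical helper assumes 4-connectivity and describes each region by its category, an inclusive bounding box, and its pixel count. Both the connectivity and the bounding-box description are assumptions chosen for illustration.

```python
from collections import deque


def bounded_regions(mask):
    """Group 4-connected pixels that share a category into regions.

    mask: list of rows of category labels such as 'S', 'PB', 'B'.
    Returns a list of (category, (row_min, col_min, row_max, col_max), size),
    with inclusive bounding boxes, in raster-scan order of each region's first pixel.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            cat = mask[r0][c0]
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            rmin, cmin, rmax, cmax, size = r0, c0, r0, c0, 0
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                size += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and mask[nr][nc] == cat):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append((cat, (rmin, cmin, rmax, cmax), size))
    return regions
```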
An important concept embodied in the foregoing discussion of the blur mask is that neighboring pixels or pixel regions experience the same or similar results during the imaging process. Blur does not occur in only a single pixel but instead is found over an area of the image. The detection of blur is assisted by computing a result for a neighborhood of pixels and the processing of the image to remove or otherwise treat the blur is carried out on the neighborhood of pixels. This neighborhood concept carries through to the following discussion of intensity masks and event time masks as well. Any distortion determined using the present invention may be recognized or processed by relying on neighboring pixels or pixel regions.
The detection of blurring in the image requires sampling the sensor during image acquisition. This may be performed in a number of ways, including sampling only selected pixels of the image or sampling all or most of the pixels in the sensor. Accomplishing this, particularly with the latter approach, requires a sensor or sensor array that permits non-destructive reading of the signal during image acquisition. Examples of sensors that permit this are CMOS (Complementary Metal Oxide Semiconductor) sensors and CID (Charge Injection Device) sensors. The pixels or pixel groups can thus be read at multiple times during image formation. Where non-destructive sensing is not possible, intra-acquisition pixel values may be stored in external memory for processing.
As shown in
Signal-dependent noise includes, for example, shot noise, whose standard deviation is typically proportional to the square root of the signal intensity (equivalently, its variance is proportional to the signal intensity). In low lighting conditions, pixel responses to incident light can be dominated by both signal-dependent and signal-independent noise sources, and should be processed accordingly.
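To illustrate why low-light pixels are noise-dominated, the sketch below assumes the common model of Poisson shot noise plus a signal-independent read noise, with both quantities expressed in electrons. Under that assumption, the signal-to-noise ratio is SNR = S / sqrt(S + r^2), where S is the signal and r is the read-noise standard deviation. The function name and the electron units are assumptions for illustration.

```python
import math


def snr(signal_electrons, read_noise_electrons):
    """Signal-to-noise ratio under an assumed shot-noise-plus-read-noise model.

    Shot-noise variance equals the mean signal (Poisson statistics), so its
    standard deviation grows as sqrt(signal); read noise adds a constant variance.
    """
    noise = math.sqrt(signal_electrons + read_noise_electrons ** 2)
    return signal_electrons / noise if noise > 0 else 0.0
```

With a read noise of 5 electrons, a 10,000-electron pixel has an SNR near 100, while a 25-electron pixel has an SNR of about 3.5. In the second case, noise is a large fraction of the useful signal.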
State X—Saturated: A pixel or pixel region receiving this designation has observed high-intensity light relative to the camera or imaging system settings; for example, the received light is too intense for the length of the exposure. Pixels having this designation either have saturated or will saturate during the image exposure time. An example of state X is shown at 90.
State L—Low light: A pixel or pixel region assigned this designation has observed low light intensity relative to camera settings and may be underexposed. Consequently, a pixel or pixel region with the state L will be contaminated with noise. In other words, the noise will be a significant portion of the useful signal available from the pixel. An example of a pixel or pixel region with state L is at 92.
State N—Normal: A pixel or pixel region assigned this designation has been determined to have been properly exposed according to the camera settings and will need minimal noise processing. In other words, the noise signal is not a significant portion of the useful signal from this pixel or pixel region (because the useful signal is much higher than the noise portion of the signal) and the pixel has not reached or neared saturation. An example of a pixel or pixel region at state N is at 94.
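The three states can be sketched as a threshold test. This hypothetical function assumes the pixel's charge is sampled once partway through the exposure and grows linearly. Projecting that sample to the end of the exposure flags pixels that will saturate as well as those that already have. The `full_well` (saturation level) and `low_level` thresholds are assumed camera-dependent parameters.

```python
def intensity_state(sample_value, sample_time, exposure_time, full_well, low_level):
    """Classify a pixel as 'X' (saturated), 'L' (low light) or 'N' (normal).

    Hypothetical helper: the value read at sample_time is projected linearly to
    the end of the exposure, so a pixel that will saturate later is flagged 'X' early.
    """
    projected = sample_value * exposure_time / sample_time
    if projected >= full_well:
        return "X"
    if projected < low_level:
        return "L"
    return "N"
```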
The areas of the image having these states are grouped to form the bounded areas of the intensity mask. The intensity mask is a component of the meta-data according to embodiments of the invention.
The intensity mask 88 allows for powerful post-processing that localizes computation efforts to remove distortions and extend camera performance. State L—low light pixels detected by this mask can be corrected by, among other methods, local filtering among other low light pixels or pixel regions. In other words, the noise signal is filtered out of the under-exposed, state L pixels or pixel regions. Bright state X—saturated class pixels that have not yet reached the saturation level may be extrapolated to their ultimate value with the assistance of an event time mask. The event time mask is discussed in greater detail hereinafter. It may also be possible to extrapolate an ultimate value for pixels that have already reached saturation. In such instances it may be necessary to shift the brightness, or intensity, range of the image to accommodate the extrapolated value. This post-processing capability expands the linear dynamic range of the captured image for richer color and greater detail, or at least recovers detail in an area of the image that would otherwise be void of information (a region of saturated pixels).
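A minimal sketch of the extrapolation assumes that the pixel's charge grows linearly with time while it observes a constant intensity. It fits a least-squares line to the samples taken before the pixel reached `full_well` and evaluates that line at the end of the exposure. The names are hypothetical. A real event time mask would also need to confirm that no other event, such as motion, changed the slope before saturation.

```python
def extrapolate_saturated(times, values, full_well, exposure_time):
    """Estimate the charge a saturating pixel would have reached at exposure_time.

    Hypothetical helper: fits value = a + b * t by least squares to the samples
    below full_well, then evaluates the fitted line at exposure_time.
    """
    pts = [(t, v) for t, v in zip(times, values) if v < full_well]
    if len(pts) < 2:
        raise ValueError("need at least two unsaturated samples")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept + slope * exposure_time
```

For example, samples of 800 and 1600 at 0.25 and 0.5 of the exposure, followed by two readings clipped at a full-well level of 2000, extrapolate to 3200 at the end of the exposure.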
The intensity mask 88 also allows for the detection of isolated false pixel values in an image. In general, isolated low light or bright light pixels are highly unlikely in an image: low light or bright light pixels correspond to objects in the image and are nearly always grouped with neighboring pixels having the same or similar light conditions. If saturated or low light pixels do occur in isolation, the cause is generally a noise source such as temporal noise, shot noise, and/or fixed pattern noise. These pixels are easily identified with an intensity mask such as shown in
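This hypothetical helper flags state L or X pixels that have no 8-connected neighbor in the same state. It follows the reasoning above that genuine low light or bright light areas occur in groups. Requiring exactly the same state, rather than a merely similar one, is a simplifying assumption.

```python
def isolated_pixels(states, flagged=("L", "X")):
    """Return (row, col) of flagged-state pixels with no 8-connected neighbour in the same state.

    Hypothetical helper: states is a list of rows of intensity-mask labels ('X', 'L', 'N').
    """
    rows, cols = len(states), len(states[0])
    out = []
    for r in range(rows):
        for c in range(cols):
            s = states[r][c]
            if s not in flagged:
                continue
            # Collect the labels of all in-bounds neighbours, excluding the pixel itself.
            neighbours = [states[nr][nc]
                          for nr in range(max(r - 1, 0), min(r + 2, rows))
                          for nc in range(max(c - 1, 0), min(c + 2, cols))
                          if (nr, nc) != (r, c)]
            if s not in neighbours:
                out.append((r, c))
    return out
```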
As shown in
In
Charge levels of pixels or pixel regions are determined at the various sampling times. This information may be used in post-processing to reconstruct what the charge curve of a pixel or pixel region would have been without the distortion event, and thereby remove the distortion from the image. For example, movement of an object in the image frame during image acquisition causes blurring in the image. The sampling may capture portions of the exposure before or after the blurring occurs, and these sampled image signals are used to reconstruct the image without the blur. The same approach may apply to other events that occur during image acquisition.
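The reconstruction described above can be sketched for a single pixel, assuming equally spaced non-destructive samples and linear charge accumulation between events. The first sampling interval whose increment departs from the first interval's increment is taken as the event. The final value is then re-estimated from the pre-event slope. Treating the pre-event portion as the undistorted observation is an assumption. It fits, for example, a stationary background pixel that a moving object crosses later in the exposure. Function names and `rel_tol` are hypothetical.

```python
def event_index(samples, rel_tol=0.1):
    """Index i of the first interval (samples[i] -> samples[i + 1]) whose increment
    departs from the first interval's increment by more than rel_tol; None if none does."""
    ref = samples[1] - samples[0]
    scale = max(abs(ref), 1e-12)  # avoid a zero tolerance for dark pixels
    for i in range(1, len(samples) - 1):
        if abs((samples[i + 1] - samples[i]) - ref) > rel_tol * scale:
            return i
    return None


def reconstruct_without_event(samples, rel_tol=0.1):
    """Final charge the pixel would have reached had the pre-event slope continued."""
    i = event_index(samples, rel_tol)
    if i is None:
        return samples[-1]  # no event detected: keep the measured value
    pre_slope = (samples[i] - samples[0]) / i  # mean increment over intervals 0..i-1
    return samples[0] + pre_slope * (len(samples) - 1)
```

For the samples `[0, 10, 20, 30, 35, 40]`, the event falls in interval 3, where the increment halves. The reconstructed end-of-exposure value is 50 rather than the measured 40.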
The event time mask may be used in the detection or correction of blur, overexposure, or underexposure in the image. The various masks of the meta-data are thus best used together in the post-processing of the image. In addition to the image features addressed in the foregoing, various other image characteristics and distortions may be determined by monitoring the timing of events during image acquisition. These additional characteristics and distortions are within the scope of this invention as well.
According to various embodiments of the invention, an imaging system is provided with a meta-data processor.
b shows a digital imaging system 130 with the addition of a meta-data processor 132, wherein the same or similar elements are provided with identical reference characters. The meta-data processor 132 is connected directly to the sensor array 112 and to the DSP/RISC 124 and also receives the timing control signals over the connection 126. The meta-data processor 132 stores global P-Data (pre-acquisition data) and samples the image sensor 112 during image formation to extract and compute I-Data (intra-acquisition data) masks for use by an internal DSP/RISC (Digital Signal Processor/Reduced Instruction Set Computer) and/or external software for post processing. The meta-data processor 132 may be a separate programmable chip processor such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or a microprocessor.
With reference to
The sensor array 22 or 112 used in the present invention may be a black and white sensor array or a color sensor array. In color sensor arrays, pixel elements are commonly provided with color filters, also known as a color filter array, to enable the sensing of the various colors of the image. The meta-data may apply to all the pixels or pixel regions of the sensor array or may apply separately to pixels or pixel regions assigned to common colors in the color filter array. For example, all pixels under the blue filters of the filter array may have one meta-data component and pixels under the yellow filters a different meta-data component, etc. The image sensing array may be sensitive to wavelengths other than visible light. For example, the sensor may be an infrared sensor. Other wavelengths are of course possible.
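The following sketches per-color meta-data under the assumption of a 2x2 RGGB Bayer tile. The blue/yellow example above implies that other filter layouts are possible, so the tile is a parameter. Each pixel is compared against the low-light threshold of its own filter color, producing one mask per color. The tile, thresholds, and function names are hypothetical.

```python
RGGB = (("R", "G"), ("G", "B"))  # assumed 2x2 Bayer tile; other CFA layouts can be passed in


def cfa_color(r, c, tile=RGGB):
    """Filter colour over pixel (r, c) for a repeating CFA tile."""
    return tile[r % len(tile)][c % len(tile[0])]


def per_color_low_light(raw, low_level_by_color, tile=RGGB):
    """Build a separate low-light mask for each CFA colour, each with its own threshold.

    Returns {colour: set of (row, col) whose raw value is below that colour's threshold}.
    """
    masks = {color: set() for color in low_level_by_color}
    for r, row in enumerate(raw):
        for c, v in enumerate(row):
            color = cfa_color(r, c, tile)
            if v < low_level_by_color[color]:
                masks[color].add((r, c))
    return masks
```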
The sensor of the present invention may be a single chip or may be a collection of chips arranged in an array. Other sensor configurations are also possible and are included within the scope of this invention.
Meta-data extraction, computation and storage can be integrated with other components of the imaging system to reduce chip count and decrease manufacturing cost and power consumption.
FIGS. 11a, 11b and 11c illustrate three additional configurations for incorporating meta-data processing into the imaging system. As above, the same or similar elements are provided with identical reference characters. In
FIG. 11b illustrates an embodiment in which a combination meta-data processor and DSP/RISC processor 150 is provided, thereby eliminating the separate DSP/RISC element. In
The meta-data is used by post image acquisition processing hardware and software. The meta-data developed according to the foregoing is output from the imaging system along with the image data, and may be included in the image data file, such as in header information, or as a separate data file. An example of the meta-data structure, whether it is to be separate or incorporated with image data, is shown in
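One possible shape for meta-data that travels as a separate file is sketched below as a Python dataclass serialized to JSON. The field names, the JSON encoding, and the sidecar-file arrangement are assumptions for illustration. The text leaves the format open and also allows the meta-data to be embedded in the image file header.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ImageMetaData:
    """Hypothetical container for P-Data (pre-acquisition) and I-Data (intra-acquisition) masks."""
    p_data: dict = field(default_factory=dict)  # e.g. exposure time, gain settings
    i_data: dict = field(default_factory=dict)  # mask name -> 2-D list of labels

    def to_json(self) -> str:
        """Serialize to a JSON string (lists stay lists, so a round trip is lossless)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ImageMetaData":
        """Rebuild the container from a string produced by to_json."""
        d = json.loads(text)
        return cls(p_data=d["p_data"], i_data=d["i_data"])
```

For example, `ImageMetaData({"exposure_s": 0.01}, {"blur": [["S", "PB"]]}).to_json()` produces a string. That string could be written next to the image under a hypothetical name such as `IMG_0001.meta.json` and read back with `from_json`.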
The example of the data structure of
The meta-data has been described as being extracted during the acquisition of the image data. The present invention also encompasses the extraction of the meta-data after the acquisition of the image data. For example, the data structure of
Meta-data-enabled software is preferably provided to process the image file provided with this additional information. The software of a preferred embodiment includes a graphical user interface (GUI) that runs on a personal computer or workstation under Windows, Linux or Mac OS. Other operating systems are of course possible. The software communicates with the imaging device via the camera's I/O (Input/Output) interface to receive the image data and meta-data. Alternatively, the software receives the stored data from storage or memory. For example, the image may be stored to a solid state memory card, and the memory card connected to the image processing computer through an appropriate slot in the computer or an external memory card reader. It is also within the scope of the present invention that the image data along with the meta-data is stored to magnetic tape, hard disk storage, optical storage, or other storage means. In a security system, for example, the image data is stored on a mass storage system, and only selected portions of the image data may be processed when needed.
The software for processing the image data displays the original degraded image and provides a window for viewing the post-processed scene. Alternatively, the software may perform the necessary processing and show only the final, processed image. The software provides pull-down menus and options to display post-acquisition processing methods and algorithms and their parameters. The user of the software is preferably guided through the image processing based on the information in the meta-data, or the processing may be performed automatically or semi-automatically. The software performs the meta-data-enabled post-processing by accessing the I-Data and P-Data meta-data in the memory locations in the meta-data processor or memory via the I/O block. The I/O block can provide images and meta-data either via a wireless connection such as Bluetooth or 802.11 (a, b, or g) or via a wired connection such as a parallel interface or a serial interface such as USB 1 or USB 2 or FireWire.
The meta-data-aware post-processing software of a preferred embodiment indicates to the user that meta-data of a specific class is available to assist in post-processing. The GUI is capable of showing pixel regions that were found to be distorted according to the meta-data. These areas can be color coded to indicate to the user the type of distortion in a specific pixel region. The user can select pixel regions to enable or disable processing of a specific distortion. The user may also select a region for automatic or manual post-processing.
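The color coding can be sketched as an alpha blend of a per-category color over a grayscale preview, applied only to the categories the user has enabled. The color key, the `alpha` default, and the function name are hypothetical GUI choices. The labels follow the blur and intensity masks described earlier.

```python
# Hypothetical colour key for the GUI overlay (RGB, 0-255).
COLORS = {"PB": (255, 255, 0), "B": (255, 0, 0), "X": (255, 0, 255), "L": (0, 0, 255)}


def overlay(gray, labels, enabled, alpha=0.5, colors=COLORS):
    """Blend each enabled category's colour over a greyscale image; other pixels stay grey.

    gray: rows of 0-255 intensities; labels: rows of mask labels of the same shape;
    enabled: set of labels the user has switched on for display.
    """
    out = []
    for grow, lrow in zip(gray, labels):
        row = []
        for g, lab in zip(grow, lrow):
            if lab in enabled and lab in colors:
                row.append(tuple(round((1 - alpha) * g + alpha * k) for k in colors[lab]))
            else:
                row.append((g, g, g))
        out.append(row)
    return out
```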
Compression, enhancement or manipulation of the image data such as rotation, zoom, or scaling of the image sequence can be dictated by the downloaded meta-data. After the image or image sequence has been processed, the new image data may be saved via the software.
A method and apparatus for extracting and providing meta-data for the improved post-processing of digital images and video has thus been presented. The present improvements overcome the performance limitations to which most hardware- and software-based post-processing methods are subject because those methods fail to account for, or provide access to, information regarding the scene, the distortion, or the image formation process. The present method and apparatus make possible post-processing that uses knowledge of the scene, the distortion, or the image formation process. The use of meta-data improves image and video processing performance, including compression, manipulation, and automatic interpretation.
Although other modifications and changes may be suggested by those skilled in the art, it is the intention of the inventors to embody within the patent warranted hereon all changes and modifications as reasonably and properly come within the scope of their contribution to the art.
The present application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/462,388, filed Apr. 14, 2003, and U.S. Provisional Patent Application Ser. No. 60/468,262, filed May 7, 2003. The entire content of both provisional applications is incorporated herein by reference.
Provisional Application No. | Filing Date | Country
---|---|---
60/462,388 | Apr. 14, 2003 | US
60/468,262 | May 7, 2003 | US