This disclosure relates generally to the field of image processing and more particularly to a method and apparatus for decomposition and compression of motion imagery.
In conventional image processing scenarios, the sensor data obtained from the image or scene being observed often overwhelms the processing capabilities of the image processing systems. Spatial-temporal compression of images is standardized in MPEG-4/H.264 and other MPEG standards, which apply the same image compression to the whole image. However, such conventional techniques are geared toward TV/movie type scenes and are not optimized for airborne/space sensors. As a result, using such conventional techniques creates communications bottlenecks. Some conventional techniques characterize the apparent motion of the background and compress this information iteratively. However, such iterative processes are computationally intensive and require tracking a large number of changes to distinguish moving pixels/objects from static image pixels/objects.
In accordance with an embodiment, a method for processing images includes receiving, at an image processor, a set of images corresponding to a scene changing with time, decomposing, at the image processor, the set of images to detect static objects, leaner objects, and mover objects in the scene, the mover objects being objects that change spatial orientation in the scene with time, and compressing, using the image processor, the mover objects in the scene separately at a rate different from that of the static objects and the leaner objects for storage and/or transmission.
In accordance with an embodiment, an image processing system includes an imaging platform having a sensor that is configured to capture images of a scene, each image comprising a plurality of pixels, and an image processor coupled to the imaging platform and to one or more memory devices having instructions thereupon. The instructions when executed by the image processor cause the image processor to receive a set of images corresponding to a scene changing with time, decompose the set of images to detect static objects, leaner objects, and mover objects in the scene, the mover objects being objects that change spatial orientation in the scene with time, and compress the mover objects in the scene separately at a rate different from that of the static objects and the leaner objects for storage and/or transmission.
In accordance with an embodiment, a tangible computer-readable storage medium includes one or more computer-readable instructions thereon for processing images, which when executed by one or more processors cause the one or more processors to receive a set of images corresponding to a scene changing with time, decompose the set of images to detect static objects, leaner objects, and mover objects in the scene, the mover objects being objects that change spatial orientation in the scene with time, and compress the mover objects in the scene separately at a rate different from that of the static objects and the leaner objects for storage and/or transmission.
These and other features and characteristics, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various Figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of claims. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
In the description that follows, like components have been given the same reference numerals, regardless of whether they are shown in different embodiments. To illustrate an embodiment(s) of the present disclosure in a clear and concise manner, the drawings may not necessarily be to scale and certain features may be shown in somewhat schematic form. Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
When imaging platform 202 is mobile, some stationary features of scene 100 may appear to move because of the platform's motion, even though they do not actually move. Such stationary features of scene 100 are known as “leaners” or “leaning objects.”
In one embodiment, the background pixels are spatially compressed using one of many possible techniques known in the art, such as JPEG-2000. The spatial compression amount may be selected by the user. Compression factors of two to three (2-3×) can usually be obtained for mathematically lossless compression. Compression of as much as 10-20× may be selected depending on the user's requirements and tolerance for loss in the background. The mover pixels are identified and transmitted in parallel with the compressed background. By way of example only, the mover pixels typically make up 0.1-1% of all pixels in scene 100. In one embodiment, these mover pixels may not be compressed. “Decompression” of the ensemble is executed by overlaying the mover pixels from each image on top of the decompressed background image of scene 100.
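By way of a non-limiting illustration only, the following Python/NumPy sketch shows one way the sparse mover pixels might be separated from the shared background and later overlaid during “decompression.” The function names and the mover mask are assumptions made for clarity; the spatial codec (e.g., JPEG-2000) applied to the background is omitted and may be any technique selected by the user.

```python
import numpy as np

def extract_mover_pixels(frame, mover_mask):
    """Collect the sparse mover pixels (typically ~0.1-1% of the frame) so they
    can be transmitted, optionally uncompressed, alongside the compressed background."""
    rows, cols = np.nonzero(mover_mask)
    return rows, cols, frame[rows, cols]

def overlay_movers(decoded_background, mover_pixels):
    """'Decompression' of the ensemble: paint one frame's mover pixels on top
    of the decoded static background image."""
    rows, cols, values = mover_pixels
    frame = decoded_background.copy()
    frame[rows, cols] = values
    return frame
```

In this sketch the background array would be passed through whatever spatial codec the user selects, while the mover samples bypass compression entirely.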
In one embodiment, the rate of motion of the leaner pixels is determined and converted to a height estimate. This height estimate is transmitted along with the background image and the movers, for example, in a file. The decompressed image may be annotated or color coded for height. In some embodiments a group of contiguous leaner pixels may be segmented and then labeled as a leaning object. In this case, a spatial compression algorithm may be applied to the height values of the object. The compressed values are transmitted along with the background image and movers and are decompressed prior to labeling or color coding for height.
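The disclosure does not give the conversion formula, but under a simple near-nadir parallax assumption the apparent motion rate of a leaner scales with its height. The hypothetical sketch below uses that assumed geometry; all parameter names and the relation itself are illustrative assumptions, not details taken from the text.

```python
def leaner_height_estimate(pixel_shift_per_frame, ground_sample_distance_m,
                           frame_interval_s, platform_altitude_m, platform_speed_mps):
    """Rough height estimate from the apparent motion rate of leaner pixels.

    Assumes near-nadir imaging so the classical parallax relation
    height ~ altitude * parallax / baseline holds, with the baseline being the
    platform travel between frames.  Illustrative model only."""
    parallax_m = pixel_shift_per_frame * ground_sample_distance_m
    baseline_m = platform_speed_mps * frame_interval_s
    return platform_altitude_m * parallax_m / baseline_m
```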
In one embodiment, a local region of pixels (referred to as a “chip” or “image chip”) around each mover is selected and transmitted with the static background image. Each chip may be transmitted in its original form or it may be compressed using a known spatial compression technique, such as those discussed above.
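A minimal sketch of chip extraction follows, assuming 2-D array frames and an arbitrary chip half-size; neither is specified above.

```python
def extract_chip(frame, center_row, center_col, half_size=16):
    """Cut a local 'image chip' around a detected mover, clipped at the frame
    borders, and return it with its offset so it can be placed back later."""
    r0 = max(center_row - half_size, 0)
    r1 = min(center_row + half_size + 1, frame.shape[0])
    c0 = max(center_col - half_size, 0)
    c1 = min(center_col + half_size + 1, frame.shape[1])
    return frame[r0:r1, c0:c1].copy(), (r0, c0)
```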
Image processing system 400 includes input sensor pixel data unit 402, image processor 404, and image output unit 406, among other hardware components, as described in the following paragraphs. In one embodiment, such hardware components may be standard computer processor(s) coupled with digital memory device(s) or other computer readable media (tangible and/or non-transitory) with instructions thereupon to cause the processor(s) to carry out the steps associated with features and functionality of the various aspects of this disclosure.
Input sensor pixel data unit 402 includes a plurality of sensors that capture a plurality of frames making up field of view 200 of scene 100. Each captured frame is stored as a pixel array or matrix in a memory of input sensor pixel data unit 402. Such captured data is typically received at data rates in the range of hundreds of Megabits per second (Mbps) to Gigabits per second (Gbps). Since such capturing of images is known to one of ordinary skill in the art, it will not be described herein. Captured images, in the form of pixel data stored in input sensor pixel data unit 402, are provided over communications channel 403 to image processor 404. Examples of sensors include, but are not restricted to, cameras, charge coupled devices, and the like.
Image processor 404 is configured to receive input sensor pixel data from input sensor pixel data unit 402 for processing. Image processor 404 includes image freezing module 404a, decomposition module 404b, and image compression module 404c, in addition to other hardware components including microprocessors configured to execute one or more instructions residing on memory devices in image processor 404, which when executed by the microprocessors cause image processor 404 to carry out various image processing functionalities described herein. Image freezing module 404a, decomposition module 404b, and image compression module 404c are described below. Output from image processor 404 is provided to image output unit 406 over communication channel 405. Advantageously, in one embodiment, compressed output data from image processor 404 is provided at substantially lower data rates, below about 1 Megabit per second (Mbps), as compared to the input data provided by input sensor pixel data unit 402 in the Gbps range.
Image output unit 406 may be a visual display unit or another image processing unit that prioritizes processing of data based upon specific applications. By way of example only, image output unit 406 may be a memory unit or a transmitter depending on how compressed output of image processor 404 is to be processed.
In one embodiment, compressed output from image output unit 406 is transmitted over communication channel 408 to decompression output unit 410 to reproduce a decompressed version of the images captured by imaging platform 202. An example of the decompression carried out by decompression output unit 410 is described with reference to
Output of image freezing module 404a is provided to image decomposition module 404b. Image decomposition module 404b processes each frame to separate static pixels, leaners, and movers. Movers or mover objects are defined as objects in scene 100 that change their physical location and spatial orientation with time. Examples of movers are vehicles and people. The procedure carried out by decomposition module 404b is described using method 600 of
In step 604, frame differencing based upon thresholding is carried out. Frame differencing helps to show movers clearly, along with the edges of leaners. In one embodiment, frame differencing is carried out between pairs of successive frames to detect changes between the two successive frames. For example, a threshold value derived from the average frame brightness of the two successive frames may be applied to their difference to produce a thresholded difference frame. By way of example only and not by way of limitation,
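A minimal sketch of this frame differencing and thresholding step is given below, assuming the threshold is taken as a fraction of the mean brightness of the two frames (one plausible reading of the passage above); the tuning fraction k is an assumption.

```python
import numpy as np

def difference_frame(frame_a, frame_b, k=0.1):
    """Difference two successive frames and keep only changes that exceed a
    threshold tied to the average frame brightness; k is an assumed tuning
    fraction, not a value from the disclosure."""
    diff = frame_b.astype(np.float64) - frame_a.astype(np.float64)
    threshold = k * 0.5 * (frame_a.mean() + frame_b.mean())
    changed = np.abs(diff) > threshold
    return np.where(changed, diff, 0.0), changed
```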
In step 606, movers are selected using shape and size filtering. Shape and size filtering is described in
where Fdiff = frame difference (positive/negative separately) and n_o, n_b are the linear dimensions of object box 1004 and background box 1002, respectively. Linear scaling implies the background contribution of a line will cancel the object box contribution, with CumDiff=0. A larger blob will have a larger background contribution, with CumDiff<0. Applying the threshold on CumDiff yields movers.
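The cumulative-difference expression itself is not reproduced in the text above, so the sketch below implements one version consistent with the described behaviour (a line crossing both boxes cancels to roughly zero, a large blob goes negative, a compact mover stays positive). The n_o/(n_b − n_o) weighting of the background annulus is an assumption, not the disclosure's exact formula.

```python
def cumulative_difference(f_diff, center, n_o=5, n_b=15):
    """CumDiff-style shape/size filter: object-box sum minus a scaled
    background-annulus sum over a signed difference frame f_diff."""
    r, c = center

    def box_sum(n):
        h = n // 2
        return f_diff[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1].sum()

    object_sum = box_sum(n_o)
    annulus_sum = box_sum(n_b) - object_sum
    # A line contributes ~n_o to the object box and ~(n_b - n_o) to the annulus,
    # so the scaling below cancels it; a 2-D blob overshoots and goes negative.
    return object_sum - (n_o / (n_b - n_o)) * annulus_sum
```

Thresholding this value over candidate pixels (CumDiff greater than a positive threshold) then yields the movers.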
In step 608, fusion of difference frames from at least two different pairs of frames is carried out. For example, in
In one embodiment, for example, as shown in
First pair of difference frames 1202(1) includes a first difference frame that is a positive difference of frames 600(1) and 600(2) and a second difference frame that is a negative difference of frames 600(1) and 600(2). Similarly, second pair of difference frames 1202(2) includes a first difference frame that is a positive difference of frames 600(2) and 600(3) and a second difference frame that is a negative difference of frames 600(2) and 600(3). Thresholding is applied to each difference frame in the first and second pairs of difference frames 1202(1) and 1202(2) in steps 1204(1) and 1204(2), respectively. Step 606 is carried out for each difference frame as shown in steps 1206(1) and 1206(2). Step 608 is refined into fusion step 1208, where two frame differences, positive and negative, are created as fusion frames. Positive differences show where a pixel or object has moved and made the intensity brighter. Negative differences show where a pixel has moved and reduced the intensity. For example, vehicles moving on a dark road result in positive changes at the new location of the vehicle and negative changes where the vehicle was previously located. Remaining steps of
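As one hedged illustration of the positive/negative differencing and fusion just described, the sketch below marks where a mover arrived (positive change) and where it departed (negative change) and then fuses the two frame pairs; the pixelwise-minimum fusion rule is an assumption chosen to match the described behaviour, not the disclosure's exact procedure.

```python
import numpy as np

def signed_parts(diff, threshold):
    """Split a difference frame into positive changes (a pixel arrived and the
    scene got brighter) and negative changes (a pixel left and it got darker)."""
    positive = np.where(diff > threshold, diff, 0.0)
    negative = np.where(diff < -threshold, -diff, 0.0)
    return positive, negative

def fuse_middle_frame(pos_first_pair, neg_second_pair):
    """A mover sitting in the middle frame appears as a positive change in the
    first pair (it arrived there) and a negative change in the second pair
    (it then left), so the pixelwise minimum highlights movers while
    suppressing noise and leaner edges."""
    return np.minimum(pos_first_pair, neg_second_pair)
```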
Referring back to
Once movers have been identified and extracted, in step 614 a reference or background scene or frame is created from the selected “freeze frame,” with the movers removed and replaced with background from adjacent frames. Such construction of the reference frame according to step 614 is illustrated in
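A minimal sketch of this reference-frame construction follows, assuming the mover mask is already known and using a per-pixel median over adjacent frozen frames as the background fill; the median is an assumed choice, since the movers occupy different pixels in the adjacent frames.

```python
import numpy as np

def build_reference_frame(freeze_frame, mover_mask, adjacent_frames):
    """Create the static reference frame: keep the selected 'freeze frame'
    where it is clean, and fill mover pixels from adjacent frames in which
    the mover has moved elsewhere."""
    reference = freeze_frame.astype(np.float64).copy()
    stack = np.stack([f.astype(np.float64) for f in adjacent_frames], axis=0)
    background_fill = np.median(stack, axis=0)
    reference[mover_mask] = background_fill[mover_mask]
    return reference
```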
Referring back to
In step 1404, image processor 404 calculates a direction of leaning based on engagement geometry (i.e., based on pointing direction of imaging platform 202 and field of view 200).
In step 1406, compression module 404c separates out static non-movers, leaners, and movers as obtained from the movies or video streams in decomposition module 404b.
In step 1408, compression module 404c applies spatial compression to static non-mover objects. By way of example only and not by way of limitation, spatial compression may include compression techniques such as JPEG-2000, or other static image compression techniques known to one of ordinary skill in the art.
In parallel, in step 1410, the moving pixels are segregated by pixel and by track file, and, optionally, the moving pixels are identified as mover or leaner.
In step 1412, metadata information associated with leaners is stored in files for transmission and/or compression followed by transmission. In this respect, the term “transmission” may be associated with source coding, or with channel coding of the channel used for transmission, and aspects of this disclosure are not limited by whether compression is done at the source (i.e., at the imaging platform) or during transmission accounting for transmission channel artifacts. By way of example only and not by way of limitation, metadata information includes the time of each image frame, the position of imaging platform 202, the position of the four corners of the image, the velocity of imaging platform 202, the angle rates of any pointing system, and the like. In addition, in step 1412, only the movie or video stream including movers (created in step 616 of
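By way of a purely illustrative example (the actual file layout and the values shown are assumptions, not taken from the disclosure), the per-frame metadata listed above might be serialized as follows.

```python
import json

# Illustrative record only; field names and values are assumed for this sketch.
frame_metadata = {
    "frame_time_s": 1234.567,
    "platform_position_m": [1113194.9, -4842330.1, 3985512.7],
    "platform_velocity_mps": [120.0, -15.0, 0.0],
    "image_corner_latlon_deg": [[34.01, -117.30], [34.01, -117.28],
                                [33.99, -117.28], [33.99, -117.30]],
    "pointing_angle_rates_radps": {"azimuth": 0.002, "elevation": -0.001},
    "leaners": [{"object_id": 7, "height_estimate_m": 21.5}],
}

payload = json.dumps(frame_metadata).encode("utf-8")  # ready for storage or transmission
```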
During decompression, for example, as shown in
In an alternative embodiment, in step 1402, instead of assessing motion in field of view 200, a motion vector is assigned to each moving pixel in each frame 200(1)-200(n). Further in this embodiment, the compressed mover video of step 1412 is created using eigenvalue coefficients obtained from processing of the motion vectors assigned to each pixel of frames 200(1)-200(n). In this embodiment, decompression and reconstruction of field of view 200 is then carried out according to
In step 1502, similar to step 1414 of
In step 1508, leaners extracted in step 1504 are overlaid on the decompressed static-object images. In step 1510, the overlaid frames are used to create the frozen series of frames 200(1)′-200(n)′. In step 1512, movers extracted in step 1506 are overlaid on frozen frames 200(1)′-200(n)′, which do not have movers. In step 1514, unfrozen video is created to reconstruct scene 100 using the eigenvalue coefficients provided from step 1412. By way of example only, the use of eigenvalue coefficients is described in U.S. patent application Ser. No. 12/908,540, entitled “CORRECTING IMAGE DISTORTION FOR THREE DIMENSIONAL (3-D) PERSISTENT OBSERVATIONS,” filed Oct. 20, 2010, incorporated by reference herein in its entirety, and will not be described in detail herein. In an alternative embodiment, step 1514 is optional.
Referring to
Referring to
Referring now to
In step 1703, the stabilization prediction carried out by stabilization prediction module 1703a may be used to generate the approximate homography transform of step 1702 that freezes the background scene in frames 1 and (n+2). Because of expected errors in the homography transform, the transform may be varied slightly, and a determination may be made as to which transform best optimizes the freezing.
In step 1704, sparse regions are selected in frames 1 and (n+2) to optimize the background freezing. Because subsequent steps 1706, 1708, and 1710 may be time consuming, and because the entire background image is not necessary to optimize the freezing, processing may be significantly reduced without loss of performance. These regions are selected to be small regions of background pixels spread across the scene. Determining the optimum homography transform by applying candidate transforms only to these selected background regions may reduce processing, while ultimately providing a high-quality transform that may be applied to the whole scene 100 for good freezing and decomposition.
In step 1706, various X and Y coordinate offsets may be applied to the selected frames 200(1) and 200(n+2) over all selected sparse regions of such selected frames. The quality of freezing may be evaluated for each offset to optimize freezing. This step may be carried out if the stabilization prediction is not accurate to the sub-pixel level that may be achieved by image correlation.
In step 1708, correlations over such sparse regions may be calculated using a computing device in order to select the combination of homography transform and X/Y offset with the best correlation as the optimum freezing transformation.
In step 1710, based upon such sparse region correlations, the best homography transform and X and Y coordinate offsets may be determined and stored in a memory device. These may be used to freeze the full frame in step 1712. In this way, only the optimum transform is applied to the full frame, while the sequence of candidate transforms is tested only on sparse regions of the frame.
In step 1712, the optimum homography transform and X/Y offsets are applied to the full (n+2)th frame to freeze it relative to the first frame.
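The following sketch ties steps 1706 through 1712 together: each candidate homography and X/Y offset is scored by correlation over the sparse background regions only, and the winning combination is then applied to the full (n+2)th frame. OpenCV's warpPerspective is assumed for the warp; for brevity the sketch warps the whole frame per candidate, whereas an optimized implementation would, as described above, warp only the sparse regions.

```python
import numpy as np
import cv2  # assumed available for the perspective warp

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12))

def best_freeze(frame_1, frame_n2, candidate_homographies, candidate_offsets, regions):
    """Try every candidate homography and (dx, dy) offset, score the freezing
    only over the sparse background regions, and keep the best combination."""
    h, w = frame_1.shape[:2]
    best_h, best_offset, best_score = None, None, -np.inf
    for H in candidate_homographies:
        warped = cv2.warpPerspective(frame_n2, H, (w, h))
        for dx, dy in candidate_offsets:
            shifted = np.roll(np.roll(warped, dy, axis=0), dx, axis=1)
            score = np.mean([normalized_correlation(frame_1[r0:r1, c0:c1],
                                                    shifted[r0:r1, c0:c1])
                             for (r0, r1, c0, c1) in regions])
            if score > best_score:
                best_h, best_offset, best_score = H, (dx, dy), score
    # Step 1712: apply only the optimum transform and offset to the full frame.
    frozen = np.roll(np.roll(cv2.warpPerspective(frame_n2, best_h, (w, h)),
                             best_offset[1], axis=0), best_offset[0], axis=1)
    return frozen, best_h, best_offset
```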
In step 1716, movers are detected in the (n+2)th frame and the first frame by comparing the two scenes. When movers are discovered in these “end” frames, the magnitude of their motion may be estimated and regions of interest may be defined in the intermediate frames. Additionally or optionally, the frame rate for FPA 1604 may be adapted after every “n” number of frames, n being an integer value. If the motion of all relevant objects from one frame to the next is consistently small, the frame rate may be decreased; however, if the motion is large, the frame rate may be increased.
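One hypothetical rule for the frame-rate adaptation described above is sketched below; the pixel thresholds and scaling step are illustrative assumptions only.

```python
def adapt_frame_rate(current_fps, mover_shifts_px, low_px=1.0, high_px=8.0, step=1.25):
    """Increase the frame rate when any tracked mover shifts a lot between
    frames (risking association ambiguity); decrease it when all shifts are
    consistently small (wasted processing).  Thresholds are illustrative."""
    if not mover_shifts_px:
        return current_fps
    if max(mover_shifts_px) > high_px:
        return current_fps * step
    if max(mover_shifts_px) < low_px:
        return current_fps / step
    return current_fps
```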
In step 1718, newly detected movers in the first and (n+2)th frame from step 1716 are associated with each other and may also be associated with old movers and/or track files associated with the old movers.
In step 1720, regions of interest in intermediate frames are identified. These regions of interest encompass the area through which each mover in the first frame may travel to reach the associated mover in the (n+2)th frame. If there is too much ambiguity in the association or the mover path, the number of frames skipped, n, may be reduced.
In step 1714, homography transforms are applied to the regions of interest (“ROIs”) in intermediate full resolution frames. If the observer motion is small or closely matches the stabilization prediction, the optimum transform for intermediate frames may be interpolated from that transform used to freeze the (n+2)th frame relative to the first frame. In some cases, additional variation of the transform and correlation of background portions of the scene may be needed to optimize the freezing and decomposition. Step 1714 combines the best transformation and offset information for background freezing from step 1710 with the selected regions of interest for movers from step 1720 to freeze the regions where movers are expected to be in the intermediate frames.
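As a rough, hedged sketch of the interpolation mentioned above, the transform for an intermediate frame may be blended between the identity (frame 1) and the optimum homography for frame (n+2). Entry-wise linear blending of homography matrices is only an approximation and is assumed here for simplicity.

```python
import numpy as np

def interpolated_transform(h_end, k, n):
    """Approximate freezing transform for intermediate frame k (1 < k < n + 2),
    blended linearly between the identity for frame 1 and the optimum
    homography h_end found for frame n + 2."""
    alpha = (k - 1) / (n + 1)
    h_k = (1.0 - alpha) * np.eye(3) + alpha * np.asarray(h_end, dtype=np.float64)
    return h_k / h_k[2, 2]   # keep the conventional H[2, 2] = 1 normalization
```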
In step 1722, movers are detected in intermediate frames. Additionally, the images in these small groups of pixels comprise the scene chips which observe the movers in each frame. These chips may be used to characterize the moving objects. The frame rate may be adapted if large motion is causing ambiguities in the association of movers and the identification of intermediate regions of interest, or if small motion is causing unnecessary processing.
It is to be noted that various steps of the flowcharts discussed in the disclosure may be carried out using computing devices having processors, memory, buses, and ports to aid carrying out such steps. Alternatively, or additionally, such steps may be carried out by executing instructions residing upon non-transitory or tangible computer readable media using one or more processor devices. Further, steps of the flowcharts may be merged, skipped, or new steps added for specific applications in which various embodiments may be implemented, as may be contemplated by those of ordinary skill in the art after reading this disclosure.
Although the above disclosure discusses what is currently considered to be a variety of useful embodiments, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims.