The present invention relates to a method and apparatus for video image content retrieval, and more particularly, to a method and apparatus for identifying foregrounds of visual content.
Content-based similarity retrieval for multimedia data has become important since international coding standards such as JPEG, MPEG-1, MPEG-2, and MPEG-4 were established and widely adopted over the Internet. In general, one image can express more than a thousand words. When performing similarity retrieval on multimedia databases, the retrieval result depends on the user's definition of image similarity. For a content-based image retrieval system, the retrieval performance is affected by the result of image segmentation. In general, if features extracted from the entire image include trivial background information, they bias the retrieval result.
Concerning image pre-processing, good retrieval performance can be achieved only when the key subject of visual contents is precisely specified. For example, shape descriptors should be applied to describe the shapes of meaningful objects instead of blindly describing the entire image.
Mathematical morphology is a set-theoretic method for image processing. It is a powerful tool and can be employed in removing backgrounds or extracting foregrounds of visual content. Some basic morphological operations, such as erosion, dilation, opening, and closing, are introduced as follows.
Dilation and erosion of a gray-level image I(x, y) by a two dimensional structure element (SE) B (for example, a disk or square) are respectively defined as
(I[+]B)(x,y)=max{I(x−k,y−l)|(k,l)∈B} (1)
(I[−]B)(x,y)=min{I(x+k,y+l)|(k,l)∈B} (2)
where [+] and [−] are the dilation and erosion operators, respectively. When performing dilation and erosion operations to an image by using a structure element in the shape of a circular disk, the operations behave as if the disk moves along the boundary between foreground areas and background areas: the disk broadens the foreground under dilation and shrinks it under erosion.
Opening operation is accomplished by performing erosion and then dilation; closing operation is accomplished by performing dilation and then erosion. Opening and closing operations for a gray-level image I are respectively defined as
I∘B=(I[−]B)[+]B (3)
I•B=(I[+]B)[−]B (4)
where ∘ and • are the opening and closing operators, respectively. The opening operation smoothes the contours of an object and removes thin protrusions; the closing operation generally fuses narrow breaks, and fills long thin gulfs and small holes. Please refer to the corresponding figure.
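As a concrete illustration, equations (1)-(4) can be sketched in Python on small gray-level grids. This is an illustrative sketch only: the list-of-offsets structure element and the skipping of out-of-range pixels are implementation choices, not part of the claimed method.

```python
def dilate(img, se):
    # Eq. (1): maximum of I(x-k, y-l) over offsets (k, l) in B;
    # out-of-range pixels are simply skipped
    h, w = len(img), len(img[0])
    return [[max(img[y - l][x - k] for (k, l) in se
                 if 0 <= y - l < h and 0 <= x - k < w)
             for x in range(w)] for y in range(h)]

def erode(img, se):
    # Eq. (2): minimum of I(x+k, y+l) over offsets (k, l) in B
    h, w = len(img), len(img[0])
    return [[min(img[y + l][x + k] for (k, l) in se
                 if 0 <= y + l < h and 0 <= x + k < w)
             for x in range(w)] for y in range(h)]

def opening(img, se):   # Eq. (3): erosion followed by dilation
    return dilate(erode(img, se), se)

def closing(img, se):   # Eq. (4): dilation followed by erosion
    return erode(dilate(img, se), se)

SE = [(k, l) for k in (-1, 0, 1) for l in (-1, 0, 1)]  # 3x3 square

img = [[0, 0, 0],
       [0, 9, 0],
       [0, 0, 0]]
print(opening(img, SE))   # all zeros: the thin bright spot is removed

hole = [[9, 9, 9],
        [9, 0, 9],
        [9, 9, 9]]
print(closing(hole, SE))  # all nines: the small dark hole is filled
```

The two prints demonstrate the smoothing behaviors named in the text: opening removes thin bright protrusions, closing fills small dark holes.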
However, conventional opening and closing operations cannot preserve the boundary information between foreground areas and background areas on an image. Conventional morphology therefore has drawbacks when used to identify visual content foregrounds.
The present invention performs multi-scale opening and closing by reconstruction (MSOR and MSCR) with a three-dimensional (3-D) structure element to an image which is realized as position (x,y) and pixel value in three-dimensional space. In contrast, conventional morphology performs opening and closing operations with a two-dimensional (2-D) structure element to an image which is realized as position (x,y) in two-dimensional space. For image segmentation, the present invention can precisely process the boundary between foreground areas and background areas, and can preserve the boundary information as well.
The present invention performs MSOR and MSCR operations to an image for identifying visual content foregrounds. The present invention provides a foreground identification method and an apparatus implementing the same. The method comprises steps of: (a) respectively performing MSOR and MSCR operations to an original image by using plural values of 3-D structure element for determining a 3-D opening-by-reconstruction modest structure element and a 3-D closing-by-reconstruction modest structure element; (b) comparing the original image and an image obtained by performing MSOR operation with the 3-D opening-by-reconstruction modest structure element to the original image so as to generate an enhanced top-hat image, and comparing the original image and an image obtained by performing MSCR operation with the 3-D closing-by-reconstruction modest structure element to the original image so as to generate an enhanced bottom-hat image; (c) locating an overlap region between the enhanced top-hat image and the enhanced bottom-hat image, where the overlap region forms a foreground identification screen; (d) simulating the variation of pixel colors in background areas extracted by the foreground identification screen for generating an interpolated background mesh; (e) dividing the original image into a plurality of regions and indexing the plural regions so that an image with indexed regions is obtained; (f) comparing the interpolated background mesh and the image with indexed regions for determining a refined foreground identification screen according to a refinement calculation; and (g) identifying and extracting foregrounds from the original image by utilizing the refined foreground identification screen. The foreground identification apparatus comprises modules whose functions respectively correspond to steps (a)-(g).
a shows an original image I.
b shows an image, denoted as IBO, which is obtained by performing opening operation to the original image I.
c illustrates processes of performing MSOR operation to the original image I, where images generated correspondingly are denoted as IBOR.
d shows an image, denoted as IOR, which is obtained by performing MSOR operation to the original image I.
a illustrates an image realized as position (x,y) and pixel value in a three-dimensional space.
b shows a gray-level background mesh generated after performing MSOR and MSCR operations to an image.
Two operations, multi-scale opening and closing operations, will be introduced as follows. The multi-scale opening and closing operations for an image I by structure element (SE) B are respectively defined as
I∘nB=(I[−]nB)[+]nB (5)
I•nB=(I[+]nB)[−]nB (6)
where n is an integer scaling factor. Equation (5) can be implemented as
I∘nB=(I[−]B[−]B . . . [−]B)[+]B[+]B . . . [+]B (7)
in which the erosion [−]B is performed n times and then the dilation [+]B is performed n times; likewise for I•nB. In addition, the n-scale opening (closing) operation is equivalent to performing the opening (closing) operation by using a structure element of n-times size, i.e. nB. For example, the 2-scale opening operation is equivalent to performing the opening operation by using a structure element of 2-times size, i.e. 2B.
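The stated equivalence between the iterated form and an n-times-sized structure element can be checked numerically. The following sketch is illustrative only: it assumes flat square structure elements and models nB by widening the window radius.

```python
def square_se(r):
    # flat square structuring element of radius r; square_se(n) plays
    # the role of the n-times-sized element nB for B = square_se(1)
    return [(k, l) for k in range(-r, r + 1) for l in range(-r, r + 1)]

def dilate(img, se):
    h, w = len(img), len(img[0])
    return [[max(img[y - l][x - k] for (k, l) in se
                 if 0 <= y - l < h and 0 <= x - k < w)
             for x in range(w)] for y in range(h)]

def erode(img, se):
    h, w = len(img), len(img[0])
    return [[min(img[y + l][x + k] for (k, l) in se
                 if 0 <= y + l < h and 0 <= x + k < w)
             for x in range(w)] for y in range(h)]

def open_iterated(img, se, n):
    # Eq. (5)/(7): n successive erosions followed by n successive dilations
    out = img
    for _ in range(n):
        out = erode(out, se)
    for _ in range(n):
        out = dilate(out, se)
    return out

img = [[(x * 7 + y * 3) % 10 for x in range(6)] for y in range(6)]
# 2-scale opening by B equals plain opening by the twice-sized element 2B
assert open_iterated(img, square_se(1), 2) == \
       dilate(erode(img, square_se(2)), square_se(2))
print("equivalence holds")
```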
Two operations, multi-scale opening and closing by reconstruction (MSOR and MSCR), will be introduced as below. The MSOR operation is defined as
IBOR=I∘̃nB=δBm+1(I∘nB,I)=min(δBm(I∘nB,I)[+]nB,I) (8)
where m is an integer representing the number of times that reconstruction is performed, δB1=min((I∘nB)[+]nB,I), and n is an integer scaling factor. Similarly, the MSCR operation is defined as
IBCR=I•̃nB=εBm+1(I•nB,I)=max(εBm(I•nB,I)[−]nB,I) (9)
where m is an integer representing the number of times that reconstruction is performed, εB1=max((I•nB)[−]nB,I), and n is an integer scaling factor.
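The reconstruction of equation (8) can be sketched as iterating the geodesic step until stability. The flat square structure element and the grid image below are illustrative assumptions. The example demonstrates the boundary-preservation property claimed for MSOR: a thin protrusion clipped by plain opening is regrown by reconstruction.

```python
def square_se(r):
    return [(k, l) for k in range(-r, r + 1) for l in range(-r, r + 1)]

def dilate(img, se):
    h, w = len(img), len(img[0])
    return [[max(img[y - l][x - k] for (k, l) in se
                 if 0 <= y - l < h and 0 <= x - k < w)
             for x in range(w)] for y in range(h)]

def erode(img, se):
    h, w = len(img), len(img[0])
    return [[min(img[y + l][x + k] for (k, l) in se
                 if 0 <= y + l < h and 0 <= x + k < w)
             for x in range(w)] for y in range(h)]

def pmin(a, b):
    # pointwise minimum of two images
    return [[min(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def msor(img, r):
    se = square_se(r)                     # flat nB modeled as radius r
    marker = dilate(erode(img, se), se)   # the opened image I ∘ nB
    while True:                           # iterate Eq. (8) to stability
        nxt = pmin(dilate(marker, se), img)
        if nxt == marker:
            return marker
        marker = nxt

# A 3x3 bright block with a one-pixel protrusion: plain opening clips
# the protrusion, but reconstruction grows it back from the opened
# marker, preserving the object's boundary exactly.
img = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        img[y][x] = 9
img[2][4] = 9                             # the thin protrusion
print(msor(img, 1) == img)                # True
```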
MSOR and MSCR operations can preserve the boundary information between foreground areas and background areas on an image. Please refer to the corresponding figure.
When performing MSOR and MSCR operations to an image by using a three-dimensional (3-D) structure element (such as a ball), the operations behave as if the ball rolls in a three-dimensional space consisting of position (x,y) and pixel value (as shown in the corresponding figure).
Please refer to the corresponding figure.
In Step S402, the modest SE determination module 302 receives an inputted original image, for example, an image which has an obvious key object and background. In the beginning, the original image is processed by performing the MSOR (MSCR) operation with an initial value of the three-dimensional (3-D) structure element. The value of the 3-D structure element is then increased gradually, and each increased value is used for performing the MSOR (MSCR) operation to the original image. If no change occurs after performing the MSOR operation to the original image by using the increased value of the 3-D structure element, a 3-D opening-by-reconstruction modest structure element, denoted as BO, is determined. Likewise, if no change occurs after performing the MSCR operation to the original image by using the increased value of the 3-D structure element, a 3-D closing-by-reconstruction modest structure element, denoted as BC, is determined.
In Step S404, the enhanced top-hat image and bottom-hat image generation module 304 receives BO and BC which are determined by the modest SE determination module 302. MSOR operation is performed to the original image by using BO; MSCR operation is performed to the original image by using BC. The original image and an image obtained by performing MSOR operation with BO are compared so as to generate an enhanced top-hat image; the original image and an image obtained by performing MSCR operation with BC are compared so as to generate an enhanced bottom-hat image. An example of aforesaid comparison is to calculate the pixel value difference between the two images, the original image and the image obtained by performing MSOR (or MSCR).
In Step S406, the foreground identification screen generation module 306 receives the enhanced top-hat image and the enhanced bottom-hat image which are generated by the enhanced top-hat image and bottom-hat image generation module 304. The foreground identification screen generation module 306 locates an overlap region between the enhanced top-hat image and the enhanced bottom-hat image. The present invention takes the overlap region as a foreground identification screen, which is capable of identifying and extracting foregrounds from the original image. In the present invention, the foreground identification screen is refined for obtaining delicate foregrounds. The refinement will be described as follows.
In Step S408, the background mesh generation module 308 receives the original image and the foreground identification screen which is generated by the foreground identification screen generation module 306. In the beginning, the foreground identification screen is transformed into a background identification screen. Taking a two-value foreground identification screen (each pixel value is 1 or 0) for example, during transformation, those original pixels of value “1” are replaced with value “0” and those original pixels of value “0” are replaced with value “1”. The background identification screen is utilized for extracting the background of original image. After that, the background mesh generation module 308 simulates the variation of pixel colors in the extracted background areas for generating an interpolated background mesh.
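The screen inversion and background-mesh generation of Step S408 can be sketched as follows. The linear interpolation along rows is a simplified stand-in for the Lagrangian interpolation named later in the embodiment, and the binary grids are illustrative.

```python
def invert(screen):
    # transform the foreground identification screen into the
    # background identification screen by flipping each binary pixel
    return [[1 - v for v in row] for row in screen]

def background_mesh(img, fg_screen):
    # Fill foreground pixels by linear interpolation along each row
    # between the nearest known background pixels (a simple stand-in
    # for the embodiment's Lagrangian interpolation scheme).
    bg = invert(fg_screen)
    mesh = []
    for row, mask in zip(img, bg):
        known = [i for i, m in enumerate(mask) if m == 1]
        if not known:                    # whole row is foreground
            mesh.append(list(row))
            continue
        out = []
        for i, v in enumerate(row):
            if mask[i] == 1:             # background pixel: keep as-is
                out.append(v)
                continue
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                out.append(row[right])
            elif right is None:
                out.append(row[left])
            else:
                w = (i - left) / (right - left)
                out.append(row[left] * (1 - w) + row[right] * w)
        mesh.append(out)
    return mesh

print(background_mesh([[2, 9, 4]], [[0, 1, 0]]))  # [[2, 3.0, 4]]
```

The foreground pixel of value 9 is replaced by a value interpolated from its background neighbors, simulating the variation of background pixel colors across the masked area.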
In Step S410, the image segmentation module 310 receives the original image, divides the original image into a plurality of regions, and indexes the plural regions so that an image with indexed regions is obtained. A pixel, a square block consisting of plural pixels, or an arbitrarily shaped region can be utilized as a single region as aforementioned.
In Step S412, the refinement module 312 receives the image with indexed regions from the image segmentation module 310 and the interpolated background mesh from the background mesh generation module 308. The refinement module 312 compares the image with indexed regions and the interpolated background mesh for determining a refined foreground identification screen according to a refinement calculation.
In Step S414, the foreground identification module 314 receives the refined foreground identification screen from the refinement module 312 and the original image. The foreground identification module 314 identifies and extracts foregrounds from the original image by using the refined foreground identification screen.
By the above-mentioned steps (Step S402 to S414), foregrounds on the original image are thus identified and outputted.
In Step S402, the difference between all pixels in the original image I and an image, denoted as IBOR(t), obtained by performing the MSOR operation to the original image I, is represented as
ΔIBOR(t)=Σx,y|I(x,y)−IBOR(t)(x,y)| (10)
where t is a scaling factor for changing the value of the 3-D structure element. The relation between the difference ΔIBOR(t) and the 3-D structure element B is depicted as the MSOR curve in the corresponding figure. Likewise, the difference between all pixels in the original image I and an image, denoted as IBCR(t), obtained by performing the MSCR operation to the original image I, is represented as
ΔIBCR(t)=Σx,y|I(x,y)−IBCR(t)(x,y)| (11)
where t is a scaling factor for changing the value of the 3-D structure element. The relation between the difference ΔIBCR(t) and the 3-D structure element B is depicted as the MSCR curve in the corresponding figure.
For the MSOR curve, when increasing the value of the 3-D structure element, the difference ΔIBOR(t) approaches a constant. When the 3-D structure element B is larger than a value BO, the difference ΔIBOR(t) remains approximately constant. In the meanwhile, no change would occur after performing the MSOR operation to the original image, i.e. Σx,y|IBOR(t)−IBOR(t+Δt)|≈0.
Referring to the corresponding figure, the determination of BO proceeds as follows.
In the beginning, the present invention sets two values of the 3-D structure element, B1 and B2, where B2>B1 (Step S652), and then calculates the difference between all pixels in the original image and an image obtained by performing the MSOR operation to the original image by using B1, i.e. ΔIB1OR (Step S654), and the difference between all pixels in the original image and an image obtained by performing the MSOR operation to the original image by using B2, i.e. ΔIB2OR (Step S656). The slope, (ΔIB2OR−ΔIB1OR)/(B2−B1), is calculated (Step S658). If the slope approaches zero (Step S660), B1 is adopted as BO, the 3-D opening-by-reconstruction modest structure element. Otherwise, B2 is increased and the increased value of B2 is set as a new value of B2, while the original value of B2 is set as a new value of B1, i.e. B2+δB is set to B2 and the original B2 is set to B1 (Step S664). Then steps S654 to S658 are performed again by using the new values of B1 and B2. In addition, the original image can be resized to a smaller one, and the resized image can be used to determine BO through steps S652 to S658; in this manner, the time for determining BO is reduced. Similarly, the time for determining BC can also be reduced by using the resized image.
Please refer to the corresponding figure.
Step S702 corresponds to and is similar to Step S402. For clarity and conciseness, the description of Step S702 is omitted herein.
Step S704 corresponds to Step S404. In Step S704, to generate the enhanced top-hat image, each pixel in the original image subtracts the corresponding pixel in the image obtained by performing the MSOR operation to the original image by using BO, represented as
IBOetop(x,y)=(I−IBOOR)(x,y) (12)
where IBOOR is the image obtained by performing the MSOR operation to the original image by using BO. Likewise, to generate the enhanced bottom-hat image, each pixel in the image obtained by performing the MSCR operation to the original image by using BC subtracts the corresponding pixel in the original image, represented as
IBCebot(x,y)=(IBCCR−I)(x,y) (13)
where IBCCR is the image obtained by performing the MSCR operation to the original image by using BC.
Step S706 corresponds to Step S406. In Step S706, the overlap region between the enhanced top-hat image and the enhanced bottom-hat image is taken as the foreground identification screen. A two-value foreground identification screen (each pixel value is 1 or 0) is obtained by the following equation:
IFGα(x,y)=1 if IBOetop(x,y)>α and IBCebot(x,y)>α; otherwise IFGα(x,y)=0 (14)
where IFGα is the foreground identification screen, IFGα(x,y)∈{0,1}, IBOetop is the enhanced top-hat image, IBCebot is the enhanced bottom-hat image, and α is a threshold.
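Because equation (14) does not survive legibly in this text, the following sketch assumes one plausible reading of the overlap criterion: a pixel joins the screen when both enhanced hat images exceed a threshold α. The sample responses are hypothetical.

```python
def fg_screen(etop, ebot, alpha):
    # a pixel belongs to the screen where BOTH the enhanced top-hat
    # and the enhanced bottom-hat responses exceed the threshold alpha
    return [[1 if t > alpha and b > alpha else 0
             for t, b in zip(rt, rb)]
            for rt, rb in zip(etop, ebot)]

etop = [[5, 0], [3, 1]]   # hypothetical enhanced top-hat responses
ebot = [[4, 6], [2, 0]]   # hypothetical enhanced bottom-hat responses
print(fg_screen(etop, ebot, 1))  # [[1, 0], [1, 0]]
```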
Step S708 corresponds to Step S408. In Step S708, the foreground identification screen obtained from Step S706 is transformed into a background identification screen. The background of the original image is extracted by utilizing the background identification screen. After that, an interpolated background mesh is generated by utilizing a conventional Lagrangian interpolation algorithm to simulate the variation of pixel colors in the extracted background. Utilizing the background identification screen to extract the background of the original image is represented as
IGα=IBGα•I (15)
where IGα is the extracted background, IGα(x,y)∈{0, 1, . . . , 2^n−1} for an n-bit image, IBGα is the background identification screen, and the operation symbol • denotes element-by-element multiplication.
Step S710 corresponds to Step S410. In Step S710, conventional JSEG (J measure based segmentation) algorithm can be utilized for dividing the original image into arbitrary shaped regions.
Step S712 corresponds to Step S412. In Step S712, the refined foreground identification screen determined according to a refinement calculation is represented as
ĨFGα=∪i(└Σ(x,y)∈I(i)|I(i)−IBGM(i)|>TN┘•U) (16)
where ĨFGα is the refined foreground identification screen, I(i) is an i-th region in the original image, IBGM(i) is the corresponding i-th region in the interpolated background mesh, ∪i is a union operation over the sets of image pixels in the i-th regions, • is an element-by-element logical AND operation between the image matrix └•┘ and a unit matrix U of the same dimension, and TN is a threshold to determine whether one region-map unit I(i) in I is close enough to the corresponding region IBGM(i) or not. The threshold TN can be a predetermined value, or can be determined from the result of the foregrounds extracted by the refined foreground identification screen calculated from equation (16).
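The exact form of equation (16) is not fully recoverable from this text; the sketch below assumes a region-wise reading of the refinement: a region whose accumulated distance to the background mesh exceeds TN is kept, as a whole, in the refined screen. The region lists and grids are illustrative.

```python
def refine_screen(img, mesh, regions, tn):
    # regions: list of regions, each a list of (y, x) pixel
    # coordinates produced by a segmentation such as JSEG
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for region in regions:
        # region-wise distance between the image and the background mesh
        dist = sum(abs(img[y][x] - mesh[y][x]) for (y, x) in region)
        if dist > tn:                 # region is not close to the mesh:
            for (y, x) in region:     # keep the whole region as foreground
                out[y][x] = 1
    return out

img = [[9, 1], [1, 1]]
mesh = [[1, 1], [1, 1]]
regions = [[(0, 0)], [(0, 1), (1, 0), (1, 1)]]
print(refine_screen(img, mesh, regions, 2))  # [[1, 0], [0, 0]]
```

The first region differs strongly from the mesh and survives as foreground; the second matches the mesh and is dropped, which is the intended effect of the TN comparison.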
Step S714 corresponds to Step S414. In Step S714, to identify and extract foregrounds from the original image, each pixel in the original image multiplies the corresponding pixel in the refined foreground identification screen, represented as
IFG=ĨFGα•I (17)
where IFG is the foregrounds extracted from the original image, ĨFGα is the refined foreground identification screen, and the operation symbol • denotes element by element multiplications.
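The element-by-element multiplication of equation (17) amounts to masking; a minimal sketch on integer grids:

```python
def extract_foreground(img, screen):
    # Eq. (17): background pixels are zeroed out by the binary screen,
    # foreground pixels keep their original values
    return [[v * s for v, s in zip(ri, rs)]
            for ri, rs in zip(img, screen)]

print(extract_foreground([[9, 4], [3, 8]], [[1, 0], [0, 1]]))  # [[9, 0], [0, 8]]
```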
While the preferred embodiments of the present invention have been illustrated and described in detail, various modifications and alterations can be made by persons skilled in this art. The embodiment of the present invention is therefore described in an illustrative but not restrictive sense. It is intended that the present invention should not be limited to the particular forms as illustrated, and that all modifications and alterations which maintain the spirit and scope of the present invention are within the scope as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---
097132384 | Aug 2008 | TW | national |