This application claims priority from Korean Patent Application No. 10-2017-0084002 filed on Jul. 3, 2017 in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference in its entirety.
The present invention relates to a method and apparatus for extracting a foreground. More specifically, the present invention relates to a method and apparatus for extracting a foreground in which a foreground is extracted by dividing an image into a foreground region and a background region.
Recently, as the installation of closed-circuit television (CCTV) has spread, interest in intelligent image analysis technology for efficient monitoring has increased. Intelligent image analysis technology detects predefined events through image analysis and automatically issues alarms. Examples of events detected by intelligent image analysis include intrusion detection and object counting.
Intelligent image analysis is performed, for example, through foreground extraction, object detection, object tracking, and event detection. The foreground objects extracted by dividing an image into background and foreground in the foreground extracting process are subsequently used as basic data for object detection and tracking. Therefore, the foreground extracting process is a fundamental and important step in intelligent image analysis.
In order to extract a foreground from an image as described above, various foreground extracting algorithms have been proposed. However, most of the proposed algorithms suffer from low accuracy, sensitivity to noise, or high computational complexity. Specifically, frame difference-based algorithms have very poor foreground extraction accuracy, and Gaussian mixture model (GMM)-based algorithms are sensitive to noise and require a large amount of computation for image post-processing, so extracting the foreground takes considerable time. It is therefore difficult to apply these algorithms to intelligent image analysis, which requires accurate foreground extraction in real time.
Accordingly, a method is required that can rapidly extract a foreground through noise-resistant, low-complexity operations.
An aspect of the present invention is to provide a method and apparatus for extracting a foreground that are resistant to noise and can guarantee a certain level of accuracy and reliability of foreground extraction results.
Another aspect of the present invention is to provide a method and apparatus for extracting a foreground, which can rapidly separate a foreground and a background by reducing the complexity of operations used for foreground extraction.
In accordance with an aspect of the disclosure, there is provided a method, comprising: acquiring, by a device, encoded image data corresponding to an original image; decoding, by the device, the encoded image data; acquiring, by the device, a foreground extraction target frame and an encoding parameter associated with an encoding process of the original image based on decoding the encoded image data; extracting, by the device, a first candidate foreground associated with the foreground extraction target frame based on the encoding parameter; extracting, by the device, a second candidate foreground associated with the foreground extraction target frame based on a preset image processing algorithm; and determining, by the device, a final foreground associated with the foreground extraction target frame based on the first candidate foreground and the second candidate foreground.
In accordance with another aspect of the disclosure, there is provided a method, comprising: acquiring, by a device, encoded image data associated with an original image that was encoded based on an encoding process; decoding, by the device, the encoded image data and acquiring a foreground extraction target frame and an encoding parameter associated with the encoding process based on decoding the encoded image data, wherein the encoding parameter includes a motion vector; and extracting, by the device, a foreground associated with the foreground extraction target frame using a cascade classifier based on the motion vector.
In accordance with another aspect of the disclosure, there is provided an apparatus, comprising: a memory configured to store instructions; and at least one processor configured to execute the instructions to: acquire encoded image data generated through an encoding process performed on an original image; perform a decoding process on the encoded image data and acquire a foreground extraction target frame and an encoding parameter associated with the encoding process based on the decoding process; extract a first candidate foreground associated with the foreground extraction target frame using the encoding parameter; extract a second candidate foreground associated with the foreground extraction target frame using a preset image processing algorithm; and determine a final foreground associated with the foreground extraction target frame based on the first candidate foreground and the second candidate foreground.
However, aspects of the present invention are not restricted to the one set forth herein. The above and other aspects of the present invention will become more apparent to one of ordinary skill in the art to which the present invention pertains by referencing the detailed description of the present invention given below.
The above and other aspects and features of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
Advantages and features of the present invention and methods of accomplishing the same may be understood more readily by reference to the following detailed description of exemplary embodiments and the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the present invention to those skilled in the art, and the present invention will only be defined by the appended claims. In the drawings, the size and relative sizes of layers and regions may be exaggerated for clarity. Like reference numerals refer to like elements throughout the specification. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present invention.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the specification and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It will be further understood that the terms “comprises” and/or “comprising,” or “includes” and/or “including” when used in this specification, specify the presence of stated features, regions, integers, steps, operations, instructions, elements, components, and/or groups, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, instructions, elements, components, and/or groups thereof.
Hereinafter, embodiments of the present invention will be described with reference to the attached drawings.
Referring to
In this embodiment, the intelligent image analysis system may include an image capturing apparatus 200, a foreground extracting apparatus 100, and an image analyzing apparatus 300. However, this configuration is only a preferred embodiment for achieving an object of the present invention, and, if necessary, some components may be added or omitted. Further, the respective components of the intelligent image analysis system shown in
In the intelligent image analysis system, the image capturing apparatus 200 is an apparatus for providing image data generated through image capturing. The image capturing apparatus 200 may be implemented as, for example, a CCTV camera, but the present invention is not limited thereto.
As shown in
Here, the encoding process may be a process of converting an original image into a designated image format. Examples of the image format may include, but are not limited to, standard image formats such as MPEG-1, MPEG-2, MPEG-4, and H.264.
In the intelligent image analysis system, the foreground extracting apparatus 100 is a computing apparatus that extracts a foreground by separating the foreground and the background of a given image. Examples of the computing apparatus may include, but are not limited to, a notebook, a desktop, and a laptop, and may include any kind of apparatus equipped with computing means and communication means. However, since foreground extraction must be performed quickly in order to perform intelligent image analysis in real time, the foreground extracting apparatus 100 may preferably be implemented as a high-performance server computing apparatus.
Specifically, as shown in
According to an embodiment of the present invention, the encoding parameter may include a motion vector (MV), a discrete cosine transform (DCT) coefficient, and partition information including the number and size of prediction blocks. However, the present invention is not limited thereto.
In an embodiment, the foreground extracting apparatus 100 may extract a first candidate foreground using the encoding parameters, and may extract a second candidate foreground using a preset image processing algorithm. Further, the foreground extracting apparatus 100 may determine a final foreground for a foreground extraction target frame from the first and second candidate foregrounds using a Markov Random Field (MRF) model. Here, the preset image processing algorithm may be, for example, a frame difference-based image processing algorithm or a GMM-based image processing algorithm, but is not limited thereto, and any image processing algorithm widely known in the art may be used. According to this embodiment, since the final foreground is determined using a plurality of candidate foregrounds, the accuracy and reliability of the extracted foreground can be improved. Moreover, comparative experiments showed that even in this embodiment the complexity of the entire operation is not high; for these comparative experimental results, refer to the results shown in
In another embodiment, the foreground extracting apparatus 100 may extract a first candidate foreground for a foreground extraction target frame using the encoding parameters, and may determine a final foreground for the foreground extraction target frame from the first candidate foreground alone using the MRF model. According to this embodiment, since the final foreground is determined directly from a single candidate foreground, the foreground extraction results can be provided quickly. Moreover, comparative experiments showed that even in this embodiment a noise-resistant, highly accurate foreground can be extracted; for these comparative experimental results, refer to the results shown in
In the intelligent image analysis system, the image analyzing apparatus 300 is a computing apparatus for performing intelligent image analysis on the basis of foreground information provided by the foreground extracting apparatus 100. For example, the image analyzing apparatus 300 may recognize an object from the extracted foreground, track the recognized object, or perform image analysis for object counting.
In the intelligent image analysis system, the foreground extracting apparatus 100 and the image capturing apparatus 200 may communicate with each other through a network. Here, any kind of wired/wireless network, such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, or wireless broadband internet (WiBro), may be used.
Up to now, an intelligent image analysis system according to an embodiment of the present invention has been described with reference to
Referring to
The image acquiring unit 110 acquires encoded image data. For example, the image acquiring unit 110 may receive image data encoded in the form of a bitstream in real time, but the method of acquiring the encoded image data using the image acquiring unit 110 is not limited thereto.
The image decoding unit 130 performs a decoding process of the encoded image data acquired by the image acquiring unit 110, and acquires a foreground extraction target frame and encoding parameters as a result of the decoding process. Since the decoding process is already obvious to those skilled in the art, a detailed description thereof will be omitted.
The candidate foreground extracting unit 150 extracts a candidate foreground from the foreground extraction target frame. For this purpose, as shown in
The first candidate foreground extracting unit 151 extracts a first candidate foreground for the foreground extraction target frame using the encoding parameters acquired as a result of the decoding process. Details thereof will be described later with reference to
The second candidate foreground extracting unit 153 extracts a second candidate foreground for the foreground extraction target frame using a preset image processing algorithm. Here, as the preset image processing algorithm, any algorithm may be used.
According to an embodiment of the present invention, the second candidate foreground extracting unit 153 may extract a plurality of second candidate foregrounds using a plurality of image processing algorithms in order to improve the accuracy and reliability of the foreground extraction result. In this case, as shown in
The final foreground determining unit 170 determines a final foreground from at least one candidate foreground using the MRF model. For example, the final foreground determining unit 170 may determine a final foreground by performing an operation that minimizes an MRF-based energy function. Details thereof will be described later with reference to
Each of the components in
Referring to
The processor 101 controls the overall operation of each component of the foreground extracting apparatus 100. The processor 101 may be configured to include a central processing unit (CPU), a microprocessor unit (MPU), a microcontroller unit (MCU), a graphics processing unit (GPU), or any type of processor that is well known in the art. Further, the processor 101 may perform operations on at least one application or program for executing a method according to embodiments of the present invention. The foreground extracting apparatus 100 may include one or more processors.
The memory 103 stores various data, commands and/or information. The memory 103 may load one or more programs 109a from the storage 109 in order to execute a foreground extracting method according to embodiments of the present invention.
The bus 105 provides a communication function between the components of the foreground extracting apparatus 100. The bus 105 may be implemented as various types of buses such as an address bus, a data bus, and a control bus.
The network interface 107 supports wired/wireless internet communication of the foreground extracting apparatus 100. In addition, the network interface 107 may support various communication methods other than internet communication. For this purpose, the network interface 107 may be configured to include a communication module that is well known in the art.
The storage 109 may non-temporarily store the one or more programs 109a. In
The storage 109 may be configured to include non-volatile memory, such as read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the art to which the present invention pertains.
The foreground extracting software 109a may perform a foreground extracting method according to an embodiment of the present invention.
Specifically, the foreground extracting software may be loaded into the memory 103 and executed by the one or more processors 101 to perform the following operations: acquiring encoded image data generated by an encoding process performed on an original image; performing a decoding process on the encoded image data and acquiring a foreground extraction target frame and encoding parameters calculated in the encoding process as a result of the decoding process; extracting a first candidate foreground for the foreground extraction target frame using the encoding parameters; extracting a second candidate foreground for the foreground extraction target frame using a preset image processing algorithm; and determining a final foreground for the foreground extraction target frame on the basis of the first candidate foreground and the second candidate foreground.
Alternatively, the foreground extracting software may execute the following operations: acquiring encoded image data generated by an encoding process performed on an original image; performing a decoding process on the encoded image data and acquiring a foreground extraction target frame and encoding parameters calculated in the encoding process as a result of the decoding process, the encoding parameters including a motion vector; and extracting a foreground for the foreground extraction target frame using a cascade classifier based on the motion vector.
Up to now, the foreground extracting apparatus 100 according to the embodiment of the present invention has been described with reference to
Each step of the foreground extracting method according to an embodiment of the present invention, described below, may be performed by a computing apparatus, for example the foreground extracting apparatus 100. For convenience of explanation, the subject performing each step of the foreground extracting method may be omitted from the description. In addition, each step of the foreground extracting method may be an operation performed in the foreground extracting apparatus 100 when the processor 101 executes the foreground extracting software 109a.
Referring to
Next, the foreground extracting apparatus 100 performs the decoding process for the encoded image data, and acquires a foreground extraction target frame and encoding parameters calculated from the encoding process as a result of the decoding process (S200). As described above, the encoding parameters may include a motion vector, a DCT coefficient, and partition information including the number and size of prediction blocks.
For ease of understanding, the motion vector among the encoding parameters is briefly explained. As a block matching algorithm is performed in units of prediction blocks during the encoding process, a motion vector is calculated for each prediction block, and the motion vector is included in the encoded image data in the form of a difference value. Therefore, in the decoding process, the motion vector of each prediction block may be recovered using the difference value. Since such contents are obvious to those skilled in the art, a detailed description thereof will be omitted.
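As a minimal sketch of this recovery, and not the actual logic of any particular codec, the following assumes an H.264-style median predictor over three neighboring blocks' motion vectors; all names are illustrative:

```python
# Hedged sketch: recover a prediction block's motion vector from the decoded
# difference value, assuming a median predictor (predictor choice varies by codec).
def recover_motion_vector(mv_diff, left_mv, top_mv, top_right_mv):
    """Return (x, y) motion vector = median predictor + decoded difference."""
    pred_x = sorted([left_mv[0], top_mv[0], top_right_mv[0]])[1]  # median of x components
    pred_y = sorted([left_mv[1], top_mv[1], top_right_mv[1]])[1]  # median of y components
    return (pred_x + mv_diff[0], pred_y + mv_diff[1])
```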
Next, the foreground extracting apparatus 100 extracts a first candidate foreground for the foreground extraction target frame using the encoding parameters (S300). Specifically, the foreground extracting apparatus 100 may extract the first candidate foreground using a cascade classifier constructed based on various features of the encoding parameters. Here, the reason for utilizing the cascade classifier is to minimize the influence of noise that may be included in the encoding parameters. Details thereof will be described later with reference to
Next, the foreground extracting apparatus 100 extracts a second candidate foreground for the foreground extraction target frame using a preset image processing algorithm (S400). As the preset image processing algorithm, any image processing algorithm, such as a frame difference-based image processing algorithm or a GMM-based image processing algorithm, may be used.
In an embodiment, a plurality of second candidate foregrounds may be extracted using a plurality of image processing algorithms. That is, the foreground extracting apparatus 100 may extract n second candidate foregrounds, such as a 2-1st candidate foreground, . . . , and a 2-nth candidate foreground, using n image processing algorithms (n being a natural number of 2 or more). According to this embodiment, the accuracy and reliability of the extracted final foreground can be improved compared to when only one second candidate foreground is used.
In the above embodiment, the value of n may be a predetermined fixed value or a variable value that varies depending on the situation. For example, the value of n may be set to a larger value as the computing performance of the foreground extracting apparatus 100 increases, as the resolution of the foreground extraction target frame decreases, or as the accuracy requirement of the intelligent image analysis system increases.
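As a minimal sketch of step S400 with n = 2 image processing algorithms, the following uses a GMM-based extractor (OpenCV's BackgroundSubtractorMOG2) and a simple frame-difference extractor; the threshold value and the treatment of shadows are assumptions for illustration, not the disclosed implementation:

```python
import cv2

gmm = cv2.createBackgroundSubtractorMOG2()

def gmm_candidate(frame):
    # 2-1st candidate foreground: per-pixel GMM background subtraction
    # (nonzero mask values, including detected shadows, treated as foreground).
    return gmm.apply(frame) > 0

def frame_diff_candidate(frame, prev_frame, threshold=25):
    # 2-2nd candidate foreground: per-pixel frame difference.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    return cv2.absdiff(gray, prev) > threshold
```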
Next, the foreground extracting apparatus 100 determines a final foreground for the foreground extraction target frame using the first candidate foreground and the second candidate foreground (S500). According to this embodiment, the foreground extracting apparatus 100 may determine the final foreground using an MRF-based probability model. Details thereof will be described later with reference to
Meanwhile, according to this embodiment, before performing the step (S500) of determining the final foreground, when the foreground classification units of the first candidate foreground and the second candidate foreground are different, a step of matching them may be performed. Here, the foreground classification unit refers to a size of a unit area in which foreground and background are classified in an image.
For example, since the encoding parameters are calculated in units of blocks (e.g., macroblocks), the first candidate foreground extracted using the encoding parameters may be a candidate foreground in which foreground and background are classified in units of blocks. In contrast, the second candidate foreground extracted using an image processing algorithm such as GMM may be a candidate foreground in which foreground and background are classified in units of pixels. When the foreground classification units differ in this way, as block versus pixel, a step of matching the foreground classification unit of the first candidate foreground with that of the second candidate foreground may be performed. A detailed description thereof will be given with reference to the examples shown in
Up to now, the foreground extracting method according to the embodiment of the present invention has been described with reference to
Hereinafter, the step (S300) of extracting the encoding parameter-based first candidate foreground will be described in detail with reference to
According to an embodiment, the foreground extracting apparatus 100 may extract the first candidate foreground through a cascade classifier using various features based on the encoding parameters as classification criteria. Here, the cascade classifier refers to a classifier that classifies each block included in the foreground extraction target frame into foreground or background by sequentially performing a plurality of classification steps. For reference, each of the plurality of classification steps may be referred to as a step-by-step classifier.
In some embodiments of the present invention, the cascade classifier may include a first-step classifier using features based on a first encoding parameter and a second-step classifier using features based on a second encoding parameter. The first-step classifier may include a 1-1-step classifier using a first feature based on the first encoding parameter (hereinafter, briefly referred to as a "first parameter feature") and/or a 1-2-step classifier using a second feature based on the first encoding parameter (hereinafter, briefly referred to as a "second parameter feature"). As such, the kind and number of encoding parameters used in the cascade classifier, and the kind and number of features based on those encoding parameters, may vary depending on the embodiment.
Hereinafter, a cascade classifier-based foreground extracting method performed in the step (S300) will be described in more detail with reference to the cascade classifier shown in
Referring to
As described above, it should be noted that the motion vector-based cascade classifier shown in
Hereinafter, the encoding parameters that can be used in each classification step of the above cascade classifier, the features based on the encoding parameters, and the classification conditions based on the features will be described.
In an embodiment, a motion vector may be used as a classification criterion of the cascade classifier. The length (or magnitude) and direction of the motion vector may be used as motion vector features, and the comparison result between the motion vector feature of a classification target block and the motion vector features of its peripheral blocks may also be used.
Specifically, for example, in a specific classification step, it may be determined whether the length of the motion vector of a classification target block is equal to or less than a first threshold value, and the classification target block may be classified as background if so, since a block with little motion is likely to belong to the static background.
As another example, in a specific classification step, it may be determined whether the length of the motion vector of the classification target block is equal to or greater than a second threshold value, and the block may be classified as background if so, because a block whose motion vector is excessively long is likely to be noise.
As another example, in the specific classification step, classification target blocks may be classified based on the comparative result between the motion vector feature of the classification target block and the motion vector features of peripheral blocks adjacent to the classification target block. Here, as shown in
In an embodiment, DCT coefficients may be used as a classification criterion of the cascade classifier. For example, when the number of peripheral blocks having a nonzero DCT coefficient, among the peripheral blocks located within a predetermined distance from the classification target block, is equal to or less than a threshold value, the classification target block may be classified as background.
In an embodiment, partition information including the number and size of prediction blocks may be used as a classification criterion of the cascade classifier. The partition information indicates information about the prediction blocks included in a macroblock; since it is obvious to those skilled in the art, a description thereof will be omitted. For example, when the number of prediction blocks included in the classification target block is equal to or greater than a threshold value, or the number of its prediction blocks having a predetermined size or less is equal to or greater than a threshold value, the classification target block may be classified as foreground; in the opposite case, it may be classified as background. This is because a foreground object is generally characterized by being composed of a large number of small prediction blocks. As another example, when the number of peripheral blocks of the classification target block that satisfy the above conditions (a number of prediction blocks equal to or greater than the threshold value and/or prediction blocks of a predetermined size or less) is equal to or greater than a threshold value, the classification target block may be classified as foreground.
For reference, the number of classification steps (or step-by-step classifiers) constituting the above-described cascade classifier may be a predetermined fixed value or a variable value that varies depending on the situation. For example, the number of classification steps may be set to a larger value as the computing performance of the foreground extracting apparatus 100 increases, as the resolution of the foreground extraction target frame decreases, or as the accuracy requirement of the intelligent image analysis system increases.
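A minimal sketch, under assumed thresholds, of a motion vector-based cascade classifier as described above: each stage may classify the block as background and stop, and a block surviving all stages is classified as foreground. The threshold values and the neighbor condition are illustrative assumptions, not values from the disclosure:

```python
T_MIN = 1.0   # assumed first threshold on motion vector length
T_MAX = 64.0  # assumed second threshold on motion vector length

def classify_block(mv_length, neighbor_mv_lengths):
    """Classify one classification target block as 'foreground' or 'background'."""
    # Stage 1: a very short motion vector suggests static background.
    if mv_length <= T_MIN:
        return 'background'
    # Stage 2: an excessively long motion vector is likely noise.
    if mv_length >= T_MAX:
        return 'background'
    # Stage 3: compare with peripheral blocks; a moving block whose
    # neighbors are almost all static is treated as isolated noise.
    moving_neighbors = sum(1 for l in neighbor_mv_lengths if l > T_MIN)
    if moving_neighbors < 2:  # assumed neighbor threshold
        return 'background'
    return 'foreground'
```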
Up to now, a cascade classifier-based foreground classifying method that can be referred to in some embodiments of the present invention has been described with reference to
Hereinafter, a method of matching the classification units of the first candidate foreground and the second candidate foreground will be described with reference to
According to embodiments of the present invention, the foreground extracting apparatus 100 may match the classification units of the first candidate foreground and the second candidate foreground based on the block size, which is the classification unit of the first candidate foreground. This matching is performed so that the final foreground determination can operate in units of blocks, reducing the complexity of the operations used for foreground extraction.
Specifically, the foreground extracting apparatus 100 groups the pixels included in the second candidate foreground into blocks, such that the position and size of each block correspond to those of a block of the first candidate foreground. The foreground extracting apparatus 100 may then match the classification units of the first candidate foreground and the second candidate foreground by classifying each block of the second candidate foreground as foreground or background according to Equation 1 below. In Equation 1, σ_u indicates the classification result of block u, j indicates the index of a pixel included in block u, N(A) indicates the number of pixels in the set A that are classified as foreground, and T indicates a threshold value. The classification result "0" indicates that the block is classified as background, and "1" indicates that it is classified as foreground.
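Equation 1 itself is not reproduced in this text; based on the symbol definitions above, a plausible reconstruction (an assumption, not the published formula) is:

$\sigma_u = \begin{cases} 1, & \text{if } N(A_u) \geq T \\ 0, & \text{otherwise} \end{cases}$ [Equation 1]

where $A_u$ denotes the set of pixels $j$ included in block $u$.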
In Equation 1, the threshold value T may be a predetermined fixed value or a variable value that varies depending on the situation. For example, the threshold value T may be set to a smaller value when the number of adjacent peripheral blocks classified as foreground is equal to or greater than a threshold value, and to a larger value when the number of adjacent peripheral blocks classified as background is equal to or greater than a threshold value.
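A minimal sketch of this unit-matching step under the Equation 1 rule: the per-pixel second candidate foreground is grouped into blocks aligned with the first candidate foreground's blocks, and each block is labeled foreground when it contains at least T foreground pixels. The block size and the fixed threshold are illustrative assumptions:

```python
import numpy as np

def to_block_units(pixel_mask, block_size=16, T=128):
    """Convert a boolean per-pixel mask (H, W) to per-block 0/1 labels."""
    h, w = pixel_mask.shape
    bh, bw = h // block_size, w // block_size
    # Group pixels into (bh, bw) blocks of block_size x block_size pixels.
    blocks = pixel_mask[:bh * block_size, :bw * block_size].reshape(
        bh, block_size, bw, block_size)
    counts = blocks.sum(axis=(1, 3))          # N(A_u) per block u
    return (counts >= T).astype(np.uint8)     # sigma_u: 1 = foreground
```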
Referring to
Up to now, the method of matching the classification units of the first candidate foreground and the second candidate foreground has been described with reference to
Hereinafter, the step (S500) of determining the final foreground will be described in detail using an MRF-based probability model.
Referring to
According to embodiments of the present invention, the foreground extracting apparatus 100 may determine the classification result ω of each block included in the final foreground so that the energy value of the energy function expressed in Equation 2 below is minimized. Since those skilled in the art will understand that a foreground extracting process can be modeled as a problem of minimizing the energy value of an MRF-based energy function, a detailed description thereof will be omitted. Further, those skilled in the art will understand that Equation 2 below is determined based on the MRF model shown in
$E = \alpha E_v + \beta E_u + E_\omega$ [Equation 2]
In Equation 2, the first energy term $E_v$ is an energy term according to the relationship between the first block of the final foreground and the second block of the first candidate foreground, the second energy term $E_u$ is an energy term according to the relationship between the first block of the final foreground and the third block of the second candidate foreground, and the third energy term $E_\omega$ is an energy term according to the relationship between the first block of the final foreground and the peripheral blocks adjacent to the first block. α and β are scaling factors controlling the weight of each energy term. Hereinafter, the calculation of each energy term is described.
According to embodiments of the present invention, the energy value of the first energy term $E_v$ may be calculated using the energy values of a plurality of frames including the foreground extraction target frame, in order to account for the temporal continuity between image frames. The reason is that a unit block classified as foreground in both the previous frame and the subsequent frame of the foreground extraction target frame is likely to be classified as foreground in the current frame as well.
Specifically, the first energy term $E_v$ may be calculated by accumulating the energy values of the previous frame, the foreground extraction target frame, and the subsequent frame, as expressed by Equation 3 below. In Equation 3, $E_v^t$ indicates the energy term of the foreground extraction target frame (t), and $E_v^{t-1}$ and $E_v^{t+1}$ indicate the energy terms of the previous frame (t−1) and the subsequent frame (t+1), respectively; the first energy term $E_v$ is thus calculated over three consecutive frames.
$E_v = E_v^{t-1} + E_v^{t} + E_v^{t+1}$ [Equation 3]
Each of the energy terms in Equation 3 may be calculated according to Equation 4 below, where $f \in \{t-1, t, t+1\}$. In Equation 4, $D_v^f(v_i, \omega)$ indicates the similarity between the first block ($\omega$) of the final foreground and the second block ($v_i$) of the first candidate foreground. The minus sign means that the energy value of each term decreases as the similarity between the two blocks increases.
$E_v^f = -D_v^f(v_i, \omega)$ [Equation 4]
In Equation 4, the similarity between two blocks may be calculated using, for example, the sum of squared differences (SSD), the sum of absolute differences (SAD), or whether the labels indicating the classification results match (e.g., 1 for foreground and 0 for background), but any other method may also be used.
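A minimal sketch of similarity measures usable for D in Equation 4 follows. Note that SSD and SAD are dissimilarities, so a similarity can be taken as their negation; the label-agreement measure returns 1 when two blocks carry the same foreground/background label. All names are illustrative:

```python
import numpy as np

def similarity_ssd(block_a, block_b):
    d = block_a.astype(np.float64) - block_b.astype(np.float64)
    return -np.sum(d * d)  # negated SSD: larger means more similar

def similarity_sad(block_a, block_b):
    d = block_a.astype(np.float64) - block_b.astype(np.float64)
    return -np.sum(np.abs(d))  # negated SAD

def similarity_label(label_a, label_b):
    return 1.0 if label_a == label_b else 0.0  # labels agree -> similar
```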
Next, the energy value of the second energy term $E_u$ may be calculated according to Equations 5 and 6 below. The second energy term $E_u$ is likewise calculated by accumulating the energy values of the previous frame, the foreground extraction target frame, and the subsequent frame in consideration of temporal continuity. Detailed descriptions of Equations 5 and 6 are omitted because they parallel the calculation of the first energy term $E_v$.
$E_u = E_u^{t-1} + E_u^{t} + E_u^{t+1}$ [Equation 5]
$E_u^f = -D_u(\sigma_u^f, \omega)$ [Equation 6]
Next, the energy value of the third energy term $E_\omega$ may be calculated according to Equation 7 below, which considers the similarity between the corresponding block and its peripheral blocks. The intuition is that, given the compact form of a rigid body, if a peripheral block is classified as part of a foreground object, the corresponding block is also likely to belong to the same foreground object. In Equation 7, the first peripheral blocks (1st-order neighborhood blocks) may be peripheral blocks located within a first distance, for example, the upper, lower, left, and right peripheral blocks. Further, the second peripheral blocks (2nd-order neighborhood blocks) may be peripheral blocks located within a second distance greater than the first distance, for example, the diagonal peripheral blocks, but the present invention is not limited thereto.
Further, in Equation 7, in order to give a higher weight to similarity with the closer first peripheral blocks, the energy term coefficient γ1 for the first peripheral blocks may be set to a higher value than the energy term coefficient γ2 for the second peripheral blocks, but the present invention is not limited thereto.
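Equation 7 is likewise not reproduced in this text. Based on the description above — block similarity summed over the first- and second-order neighborhoods, weighted by γ1 and γ2, with energy decreasing as similarity increases — a plausible reconstruction (an assumption, not the published formula) is:

$E_\omega = -\gamma_1 \sum_{q \in N_1} D_\omega(\omega, \omega_q) - \gamma_2 \sum_{q \in N_2} D_\omega(\omega, \omega_q)$ [Equation 7]

where $N_1$ and $N_2$ denote the sets of first and second peripheral blocks of the corresponding block, and $\omega_q$ is the classification result of peripheral block $q$.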
The final foreground classification result, i.e., the solution of Equation 2, may be determined using an algorithm such as iterated conditional modes (ICM) or stochastic relaxation (SR). Since solving such equations is well known to those skilled in the art, a description thereof will be omitted.
According to embodiments of the present invention, the solution of Equation 2 is derived for each block included in the final foreground. In other words, the operation for deriving the solution of Equation 2 is performed in units of blocks rather than in units of pixels. Thus, the complexity of the operation for determining the final foreground (step S500) can be greatly reduced.
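A minimal block-level ICM sketch for minimizing an Equation 2-style energy follows, assuming binary block labels, label-agreement similarity D (1 when labels match), first-order (4-connected) neighbors only, and a single frame (the temporal accumulation over frames t−1, t, t+1 is omitted). All coefficients and the iteration count are illustrative assumptions:

```python
import numpy as np

def icm_final_foreground(first_cand, second_cand,
                         alpha=1.0, beta=1.0, gamma1=1.0, n_iters=5):
    """first_cand, second_cand: (H, W) block-level 0/1 label maps."""
    labels = first_cand.copy()  # initialize the final foreground
    h, w = labels.shape
    for _ in range(n_iters):
        for i in range(h):
            for j in range(w):
                best_label, best_energy = labels[i, j], np.inf
                for omega in (0, 1):
                    # Candidate-foreground terms: -alpha*D(v_i, omega) and
                    # -beta*D(sigma_u, omega), with label-agreement D.
                    e = -alpha * float(first_cand[i, j] == omega)
                    e -= beta * float(second_cand[i, j] == omega)
                    # Neighborhood term: reward agreement with neighbors.
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            e -= gamma1 * float(labels[ni, nj] == omega)
                    if e < best_energy:
                        best_label, best_energy = omega, e
                labels[i, j] = best_label
    return labels
```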
Meanwhile, according to embodiments of the present invention, a plurality of second candidate foregrounds, extracted using a plurality of image processing algorithms, may be used to determine the final foreground. In this case, Equation 2 can be expanded as shown in Equation 8 below, in which the first energy term $E_v$ is the energy term for the first candidate foreground, the 2-1st energy term $E_{u_1}$ is the energy term for the 2-1st candidate foreground, and the 2-nth energy term $E_{u_n}$ is the energy term for the 2-nth candidate foreground.
$E = \alpha E_v + \beta_1 E_{u_1} + \cdots + \beta_n E_{u_n} + E_\omega$ [Equation 8]
According to an embodiment, a plurality of first candidate foregrounds may be used. For example, a 1-1st candidate foreground determined through a motion vector-based cascade classifier and/or a 1-2nd candidate foreground determined through a DCT coefficient- and partition information-based cascade classifier may be used to determine the final foreground. In this case, the MRF model-based energy function may include a plurality of first energy terms.
According to an embodiment, the final foreground may be determined using only the first candidate foreground in order to provide faster foreground extraction results; in Equation 2, this corresponds to setting the scaling factor β to zero. For example, if the intelligent image analysis system provides a heat map of foot traffic through image analysis, the foreground extraction accuracy need not be high. In this case, only a first candidate foreground is extracted, and the final foreground may be quickly provided from it alone. For reference, according to the experimental results to be described later with reference to
Up to now, a method of determining the final foreground using the MRF-based probability model in step S500 has been described in detail with reference to
Next, comparative experimental results of a conventional foreground extracting method and a foreground extracting method according to some embodiments of the present invention will be briefly described with reference to
Referring to
Further, referring to the foreground extraction results (730 and 750) of
In summary, it can be seen that the proposed method rapidly provides foreground extraction results while eliminating noise as compared with the conventional method.
Next, comparative experimental results for the case of determining the final foreground using only the first candidate foreground according to an embodiment of the present invention, the GMM-based image processing algorithm, and the frame difference-based image processing algorithm will be described with reference to
Referring to
Referring to the foreground extraction results (810, 830, and 850) shown in
Finally, comparative experimental results for conventional optical flow and the proposed method will be described with reference to
As typical methods of performing motion estimation in an image, there are the block matching algorithm and optical flow. A motion estimation result can be obtained from the motion vectors calculated through the block matching algorithm, but because these motion vectors include noise, accuracy is lower than with optical flow. However, when the method proposed in the embodiments of the present invention is used, the noise included in the motion vectors is purified through the cascade classifier and the MRF model, so the method may replace optical flow. For example, the foreground extraction result according to the proposed method may be defined as a motion map, and the motion vector of a block is output only when the motion map value of the block is 1 (that is, when the block is classified as foreground), thereby rapidly acquiring the motion estimation result.
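A minimal sketch of this motion map idea: the block-level foreground extraction result serves as a motion map, and a block's motion vector is output only when its motion map value is 1. Array shapes and names are illustrative assumptions:

```python
import numpy as np

def motion_estimation(motion_map, motion_vectors):
    """motion_map: (H, W) 0/1 block labels; motion_vectors: (H, W, 2) per-block MVs."""
    result = np.zeros_like(motion_vectors)
    mask = motion_map.astype(bool)
    result[mask] = motion_vectors[mask]  # keep MVs only for foreground blocks
    return result
```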
Although various optical flow algorithms exist, dense optical flow techniques, which calculate the optical flow per pixel, are too computationally complex to be applied to an actual system, so sparse optical flow techniques, which extract several feature points and then calculate the optical flow only for those points, are generally used.
As shown in
Up to now, the comparative experimental results of the conventional foreground extracting method and the proposed foreground extracting method according to some embodiments of the present invention have been briefly described with reference to
The concepts of the present invention having been described above with reference to
Although operations are shown in a specific order in the drawings, it should not be understood that desired results can be obtained only when the operations must be performed in the specific order shown in the drawings or in a sequential order, or all the shown operations must be performed. In certain situations, multitasking and parallel processing may be advantageous. Moreover, it should not be understood that the separation of the various configurations in the above-described embodiments is necessarily required, and it should be understood that the described program components and systems may generally be integrated together into a single software product or packaged into a plurality of software products.
As described above, according to the embodiments of the present invention, a candidate foreground is extracted using an encoding parameter calculated in the encoding process of an image. Since the encoding parameter is information calculated through the complex operations of the encoding process, a relatively accurate foreground can be extracted with a small amount of computation. Moreover, the encoding parameters are not used directly for candidate foreground extraction; instead, classification is performed through the plurality of classification steps constituting the cascade classifier, so the noise included in the encoding parameters can be purified. Therefore, the foreground extraction result is relatively resistant to noise and highly reliable.
Further, since the encoding parameters are information derived naturally in the image decoding process, it is not necessary to perform additional operations to acquire the encoding parameters. Further, since the cascade classifier does not perform an operation with high complexity, there is an effect that the foreground extraction result can be provided quickly.
Further, the final foreground can be determined using both the first candidate foreground extracted using the encoding parameters and the second candidate foreground extracted using a pixel-based image processing algorithm. Here, the final foreground may be determined using a Markov Random Field (MRF)-based probability model. Accordingly, the accuracy and reliability of the foreground extraction result can be improved compared to the conventional art.
In addition, the process of determining the final foreground using the MRF-based probability model is performed in units of blocks rather than in units of pixels. Therefore, the complexity of the operations for foreground extraction is reduced, so that the accuracy of the foreground extraction result can be improved and the processing performance of foreground extraction can also be improved.
The effects of the present invention are not limited to the foregoing, and various other effects may be achieved.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
Exemplary embodiments of the present invention have been described with reference to the accompanying drawings. However, those skilled in the art will appreciate that various modifications, additions and/or substitutions are possible, without materially departing from the scope and spirit of the present invention. All such modifications are intended to be included within the scope of the present invention as defined by the following claims, with equivalents of the claims to be included therein. Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the foregoing is illustrative and is not to be construed as limiting the scope of the present invention.