These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. Embodiments are described below to explain the present invention by referring to the figures.
Thus, according to an embodiment of the present invention, a process for detecting captions in videos includes a localization process that localizes text domains, a binarization process that removes the background from a localized text domain, and a recognition process that recognizes the text after the background has been removed.
Such a binarization process may precisely remove the background from a localized text domain by using the response value of a stroke filter, thereby increasing text recognition rates and improving processing speed. Since the localization and recognition processes are well understood, further discussion of them will be omitted.
A stroke filter according to an embodiment of the present invention, for example, can be seen in more detail in Korean Patent Application No. 10-2005-111432, filed in 2005. Accordingly, such stroke filters will only be briefly described below.
When the length of the first filter ① is d, the length d1 of each of the second filter ② and the third filter ③ may be ½ of the length of the first filter ①, for example. Also, the distance d2 between the first filter ① and the second filter ② may be ½ of the length of the first filter ①, for example, and the distance between the first filter ① and the third filter ③ may likewise be ½ of the length of the first filter ①. Here, it should be noted that these values are only examples, as embodiments of the present invention can be implemented with various filters.
The stroke filter detects text strokes while changing the angle α of the stroke filter. For example, by rotating the stroke filter to angles α of 0, 45, 90, and 135 degrees, strokes can be detected at each angle from the values of the pixels included in the stroke filter.
In this case, the image of the text domain on which bright stroke filtering and dark stroke filtering are performed is a text domain of an image extracted by the localization process.
Hereinafter, example operations of this text extraction method, according to an embodiment of the present invention, will be described in greater detail.
In operation S210, the bright stroke filtering and the dark stroke filtering are performed on the image of the text domain extracted by the localization process, and a response value is acquired by each filtering.
In this case, the response values acquired by the bright stroke filtering and the dark stroke filtering may be expressed as shown in the below Equation 1 and Equation 2, for example.
$$R_B(\alpha, d) = m_1 - m_2 + m_1 - m_3 - |m_2 - m_3| \qquad \text{(Equation 1)}$$
$$R_D(\alpha, d) = m_2 - m_1 + m_3 - m_1 - |m_2 - m_3| \qquad \text{(Equation 2)}$$
Here, RB and RD indicate the response values of the bright stroke filter and the dark stroke filter, respectively, α indicates the angle of the gradient of the stroke filter, d indicates the length of the first filter ①, and m1, m2, and m3 indicate the means of the values of the pixels included in the first filter ①, the second filter ②, and the third filter ③, respectively, e.g., as shown in
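The following is a minimal Python/NumPy sketch of Equations 1 and 2 at a single pixel. Because the drawing of the filter layout is not reproduced here, the geometry is an assumption: filter ① is taken as a line of length d through the pixel along angle α, and filters ② and ③ as parallel lines of length d/2 offset by d/2 on either side, sampled by nearest-neighbor rounding. The function name `stroke_responses` and the choice of keeping the maximum response over the four angles are likewise illustrative, not taken from the source.

```python
import numpy as np

ANGLES = (0, 45, 90, 135)  # filter orientations alpha, in degrees

def _line_mean(img, r0, c0, along, perp, offset, half_len):
    """Mean pixel value on a line segment, nearest-neighbor sampled (assumed layout)."""
    t = np.arange(-half_len, half_len + 1)
    rows = np.rint(r0 + t * along[0] + offset * perp[0]).astype(int)
    cols = np.rint(c0 + t * along[1] + offset * perp[1]).astype(int)
    rows = np.clip(rows, 0, img.shape[0] - 1)
    cols = np.clip(cols, 0, img.shape[1] - 1)
    return float(img[rows, cols].mean())

def stroke_responses(img, r, c, d, angles=ANGLES):
    """Return (RB, RD) at pixel (r, c): the maximum of Equations 1 and 2 over the angles."""
    img = np.asarray(img, dtype=float)
    best_rb, best_rd = -np.inf, -np.inf
    for a in np.deg2rad(angles):
        along = (-np.sin(a), np.cos(a))  # direction of filter 1 as (row, col); 0 deg = horizontal
        perp = (np.cos(a), np.sin(a))    # perpendicular direction toward filters 2 and 3
        m1 = _line_mean(img, r, c, along, perp, 0, d // 2)           # filter 1, length d
        m2 = _line_mean(img, r, c, along, perp, d // 2, d // 4)      # filter 2, length d/2
        m3 = _line_mean(img, r, c, along, perp, -(d // 2), d // 4)   # filter 3, length d/2
        rb = (m1 - m2) + (m1 - m3) - abs(m2 - m3)  # Equation 1
        rd = (m2 - m1) + (m3 - m1) - abs(m2 - m3)  # Equation 2
        best_rb, best_rd = max(best_rb, rb), max(best_rd, rd)
    return best_rb, best_rd
```

On a one-pixel-wide bright vertical stroke, the 90-degree orientation gives m1 = 255 and m2 = m3 = 0, so RB = 510; inverting the image swaps the roles, and RD reaches the same value.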
In operation S220, the color polarity of the text, bright or dark, is determined by one of two techniques, depending on the polarities of the text and of its background.
The first technique may be applied when the polarity of the background differs from the polarity of the text: the color polarity of the text is determined by using the ratio FR of the response value RB of the bright stroke filter to the response value RD of the dark stroke filter. See Equation 3, below.
$$F_R = \frac{\sum R_B}{\sum R_D} \qquad \text{(Equation 3)}$$
As shown in the above Equation 3, for example, the polarity of the text may be determined to be bright, when RB is much greater than RD, “FR>>1”, or the polarity of the text may be determined to be dark, when RB is much smaller than RD, “FR<<1”. Accordingly, when the polarity of the text is different from the polarity of the background, the color polarity of the text may be determined by using only a value of FR.
The second technique may be applied when the polarity of the text is similar to the polarity of the background, for example. In that case, the value of FR is close to 1 whether the text is bright or dark, so the ratio of the numbers of crossings in a binarized image is used in addition to the value of FR.
Here, the ratio FE of the number NB of bright crossings to the number ND of dark crossings in the binarized image may be expressed as shown in the below Equation 4, for example.
$$F_E = \frac{\sum N_B}{\sum N_D} \qquad \text{(Equation 4)}$$
As can be seen from Equation 4, the polarities of the text and the background may be considered bright when NB is less than ND, "FE<1", for example, and may be considered dark when NB is greater than ND, "FE>1". Accordingly, when the polarity of the text is similar to the polarity of the background, the color polarity of the text may be determined by using both the value of FR and the value of FE. Namely, when the value of FR is close to 1, the color polarity of the text may be determined to be bright when the value of FE is less than 1, and dark when the value of FE is greater than 1.
Measured values according to the color polarities of the background and the text, for the two techniques of determining the color polarity, are shown in Table 1 below, for example.
Here, BonD denotes an image containing bright text on a dark background, DonB an image containing dark text on a bright background, BonB an image containing bright text on a bright background, and DonD an image containing dark text on a dark background.
Based on Table 1, the determination of the color polarity of the text may be expressed as the following four cases, according to one embodiment of the present invention.
When FR is greater than 1.1 (FR>1.1), the color polarity of the text may be determined to be bright, when FR is less than 0.9 (FR<0.9), the polarity of the text may be determined to be dark, when FR is greater than or equal to 0.9 and less than or equal to 1.1 (0.9≦FR≦1.1) and FE is less than or equal to 1 (FE≦1), the color polarity of the text may be determined to be bright, and when FR is greater than or equal to 0.9 and less than or equal to 1.1 (0.9≦FR≦1.1) and FE is greater than 1 (FE>1), the color polarity of the text may be determined to be dark.
The threshold values referenced in this embodiment for determining the color polarity of the text, such as 0.9 and 1.1, are not fixed and may be changed depending upon circumstances. Thus, alternate embodiments are equally available.
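The four cases above can be written as a small decision function. This is a sketch that assumes the sums ΣRB and ΣRD and the crossing counts ΣNB and ΣND have already been computed and are positive; how bright and dark crossings are counted is not detailed here, so they are taken as inputs. The name `text_polarity` is hypothetical, and 0.9 and 1.1 are exposed as parameters because, as noted, they are not fixed.

```python
def text_polarity(sum_rb, sum_rd, n_bright, n_dark, low=0.9, high=1.1):
    """Decide text color polarity from FR (Equation 3) and, if needed, FE (Equation 4).

    sum_rb, sum_rd: summed bright/dark stroke filter responses over the text domain.
    n_bright, n_dark: numbers of bright/dark crossings in the binarized image.
    """
    if sum_rd <= 0:
        raise ValueError("sum_rd must be positive to form the ratio FR")
    f_r = sum_rb / sum_rd            # Equation 3
    if f_r > high:
        return "bright"              # FR > 1.1
    if f_r < low:
        return "dark"                # FR < 0.9
    # 0.9 <= FR <= 1.1: text and background polarities are similar, so use FE.
    if n_dark <= 0:
        raise ValueError("n_dark must be positive to form the ratio FE")
    f_e = n_bright / n_dark          # Equation 4
    return "bright" if f_e <= 1 else "dark"
```

For example, `text_polarity(1.0, 1.0, 3, 5)` falls in the ambiguous FR band, and FE = 0.6 ≤ 1 classifies the text as bright.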
When the color polarity of the text is determined in operation S220, a binarization process with respect to the response value of the stroke filter may be performed by using a threshold, in operation S230. A binarized domain acquired by operation S230 may be used for an initial seed domain to expand a local domain, for example. In this case, depending on embodiment, the threshold may be selectively assigned by a designer.
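Operation S230 can be sketched as a simple threshold on a stroke filter response map. Which response map is thresholded is an assumption here: the bright response RB for bright text and the dark response RD for dark text, which is consistent with operation S270 re-running S230 after the polarity is inverted. The function name is hypothetical, and the threshold is left to the caller, as the designer assigns it.

```python
import numpy as np

def binarize_stroke_response(rb_map, rd_map, polarity, threshold):
    """Operation S230 sketch: threshold the stroke filter response selected by polarity.

    Assumption: bright text uses the bright response map RB, dark text uses RD.
    Returns a uint8 mask in which 1 marks an initial seed (text) pixel.
    """
    if polarity not in ("bright", "dark"):
        raise ValueError("polarity must be 'bright' or 'dark'")
    response = np.asarray(rb_map if polarity == "bright" else rd_map, dtype=float)
    return (response > threshold).astype(np.uint8)
```

The resulting mask serves as the initial seed domain for the local domain expansion of operation S240.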
In operation S240, the local domain may further be expanded by using the binarized domain.
Here, the domain expansion condition is satisfied for a pixel determined to be non-text in the window when the probability Pr(s) of that pixel is greater than a predetermined value T1 and the difference in density between that pixel and a neighboring text pixel is less than a predetermined value T2. T1 and T2 may be 0.75 and 15, respectively, only as examples. Again, embodiments of the present invention are not limited to such values, and T1 and T2 may be changed depending upon circumstances. The probability Pr(s) of the corresponding pixel may be determined by using a probability density function PDF(s), as shown below in Equation 5, for example.
$$\Pr(s) = \mathrm{PDF}(s) \qquad \text{(Equation 5)}$$
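One possible way to build PDF(s) from the binarized stroke image and the original image is sketched below. The estimator is an assumption, not taken from the source: "density" is treated as the 8-bit gray level of the original image, the PDF is a histogram of the gray levels of pixels marked as text, and it is scaled so that its peak equals 1 so that a threshold such as T1 = 0.75 is on a comparable scale. Both function names are hypothetical.

```python
import numpy as np

def text_density_pdf(original, stroke_binary, levels=256):
    """Estimate PDF(s) over gray levels s from pixels marked as text (sketch).

    Assumptions: density = 8-bit gray level; PDF = histogram of text-pixel gray
    levels, scaled so that its peak is 1.
    """
    original = np.asarray(original)
    mask = np.asarray(stroke_binary).astype(bool)
    values = original[mask].astype(np.int64).ravel()
    hist = np.bincount(values, minlength=levels)[:levels].astype(float)
    peak = hist.max()
    return hist / peak if peak > 0 else hist

def pixel_probability(pdf, s):
    """Pr(s) = PDF(s) (Equation 5): look up the gray level s of a pixel."""
    return float(pdf[int(s)])
```

With text pixels at gray levels 10, 10, and 20, the sketch gives Pr(10) = 1.0 and Pr(20) = 0.5.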
The process of expanding the local domain, illustrated in
In operation S410, a binarized stroke image and an original image of a text domain, shown in portion (a) of
In operation S420, a window having a predetermined number of pixels, such as 9 pixels, may be selected. Thereafter, in operation S430, it may be determined whether 4 to 8 of the pixels in the corresponding window are determined to be text.
When the number of pixels determined to be the text in the window is 4 to 8 pixels, as a result of the determination of operation S430, operations S440 through operation S470 may be performed. When the number of pixels determined to be the text in the window does not correspond to 4 to 8 pixels, operation S470 may be performed.
When the number of pixels determined to be the text in the window is 4 to 8 pixels, for example, as shown in portion (b) of
When the domain expansion process has been performed with respect to every window of the text domain and the label of a pixel no longer changes in operation S470, a text domain portion (d) of
When operation S240 of expanding the local domain has been performed through this series of processes, in operation S250, an OCR may recognize the text domain in which the local domain is expanded.
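The expansion loop of operations S420 through S470 can be sketched as follows. The interpretation choices are assumptions: each window is the 3x3 (9-pixel) neighborhood of an interior pixel, the "neighboring text pixel" is the text pixel in the window whose gray level is closest, and the PDF is passed in as an array indexed by gray level. The function name and parameter names are hypothetical.

```python
import numpy as np

def expand_local_domain(original, labels, pdf, t1=0.75, t2=15, min_text=4, max_text=8):
    """Sketch of operations S420-S470: grow the binarized text domain.

    original: 2-D gray-level image; labels: binarized stroke image (1 = text);
    pdf: array indexed by gray level s, so that Pr(s) = pdf[s] (Equation 5).
    Assumptions: windows are 3x3 neighborhoods of interior pixels, and the
    neighboring text pixel is the text pixel in the window closest in gray level.
    """
    img = np.asarray(original).astype(np.int64)
    lab = np.asarray(labels).astype(bool).copy()
    h, w = lab.shape
    changed = True
    while changed:                                   # S470: repeat until no label changes
        changed = False
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                win = lab[r - 1:r + 2, c - 1:c + 2]  # S420: 9-pixel window
                if not (min_text <= int(win.sum()) <= max_text):  # S430: 4 to 8 text pixels
                    continue
                text_vals = img[r - 1:r + 2, c - 1:c + 2][win]   # text gray levels (a copy)
                for rr in range(r - 1, r + 2):
                    for cc in range(c - 1, c + 2):
                        if lab[rr, cc]:
                            continue
                        s = img[rr, cc]
                        # Expansion condition: Pr(s) > T1 and density difference < T2.
                        if pdf[s] > t1 and np.abs(text_vals - s).min() < t2:
                            lab[rr, cc] = True
                            changed = True
    return lab.astype(np.uint8)
```

Because labels only ever change from non-text to text, the loop is guaranteed to terminate; pixels whose gray level is unlikely under the text PDF are never absorbed.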
Again, with reference to
When the recognition score is low, in operation S270, the text color polarity may be inverted and the process may return to operation S230. Namely, the text color polarity may be set to the polarity opposite to the color polarity determined in operation S220, and operations S230 through S260 may be repeated.
In experiments performing such processes according to one embodiment of the present invention, the precision of the color polarity determination was 97.4%, and the text extraction results were excellent.
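The retry described in operation S270 can be sketched as a small control-flow wrapper. The pipeline of operations S230 through S250 is represented by a hypothetical callable returning the recognized text and its recognition score; the single retry and the rule of keeping the higher-scoring attempt are assumptions added here so that the loop cannot alternate indefinitely.

```python
from typing import Callable, Tuple

def extract_with_polarity_retry(
    polarity: str,
    run_s230_to_s250: Callable[[str], Tuple[str, float]],
    min_score: float,
) -> Tuple[str, str]:
    """Sketch of S230-S270: binarize, expand and recognize; on a low recognition
    score, invert the polarity and run the steps once more.

    run_s230_to_s250 is a hypothetical callable returning (text, score).
    Returns (recognized_text, polarity_used).
    """
    text, score = run_s230_to_s250(polarity)
    if score >= min_score:                                   # S260: score is acceptable
        return text, polarity
    flipped = "dark" if polarity == "bright" else "bright"   # S270: opposite polarity
    text2, score2 = run_s230_to_s250(flipped)
    # Assumption: a single retry, keeping whichever attempt scored higher.
    return (text2, flipped) if score2 >= score else (text, polarity)
```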
As shown in
Similarly, as shown in
Namely, according to an embodiment of the present invention, the color polarity of the text in a text domain detected by a localization process is determined by using the response values acquired by a stroke filter, and the original image is converted into a binary image and locally expanded, thereby extracting a precise text domain from the original image.
The stroke filter unit 910 filters an original image of an input text domain by using stroke filters. In this case, the stroke filter unit 910 may perform both bright stroke filtering and dark stroke filtering and output the response values, for example.
The text color polarity determiner 920 may determine the color polarity of the text by using the response values of the stroke filter unit 910. Here, the text color polarity may be determined by using the ratio of the response values of the bright stroke filter to those of the dark stroke filter, for example. When the ratio is greater than 1, the text color polarity may be determined to be bright, and when the ratio is less than or equal to 1, the text color polarity may be determined to be dark.
In addition, the ratio of the number of bright crossings to the number of dark crossings in a binarized image may be used. When the ratio of the response values is between 0.9 and 1.1, the text color polarity may be determined to be bright when the ratio of the crossing numbers is less than or equal to 1, and dark when the ratio of the crossing numbers is greater than 1.
The binarization performer 930 may perform binarization of the text domain with respect to the response values of the stroke filter unit 910. In this case, the binarization may be performed based on a simple threshold.
The local domain expander 940 may further expand a local domain by using a binarized domain, e.g., acquired by the binarization of the binarization performer 930, and output a result of the local domain expansion to an OCR to recognize the extracted text domain.
The probability density calculator 1010 may calculate a PDF of text domain density by using a binarized stroke image and an original image.
The window selector 1020 may further select a window having a predetermined number of pixels, such as 9 pixels, for example.
The text domain expander 1030 performs domain expansion when the probability Pr(s) of each pixel determined to be non-text in the window selected by the window selector 1020 is greater than a predetermined value T1, such as 0.75, and the difference in density from a neighboring text pixel is less than a predetermined value T2, such as 15, again noting that alternative values are equally available.
The domain expansion completion determiner 1040 may still further determine whether a pixel label of the binarized stroke image has changed. When a label has changed, it may send the binarized stroke image back to the window selector 1020, and when there are no pixel label changes, it may output the text domain in which the local domain is expanded to the OCR.
In addition to the above described embodiments, embodiments of the present invention can also be implemented through computer readable code/instructions in/on a medium, e.g., a computer readable medium, to control at least one processing element to implement any above described embodiment. The medium can correspond to any medium/media permitting the storing and/or transmission of the computer readable code.
The computer readable code can be recorded/transferred on a medium in a variety of ways, with examples of the medium including magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, or DVDs), and storage/transmission media such as carrier waves, as well as through the Internet, for example. Here, the medium may further be a signal, such as a resultant signal or bitstream, according to embodiments of the present invention. The media may also be a distributed network, so that the computer readable code is stored/transferred and executed in a distributed fashion. Still further, as only an example, the processing element could include a processor or a computer processor, and processing elements may be distributed and/or included in a single device.
An aspect of an embodiment of the present invention provides a text extraction method, medium, and system in which a color polarity of a text is determined by using a response value of a stroke as a feature forming text, thereby improving precision of color polarity determination.
An aspect of an embodiment of the present invention further provides a text extraction method, medium, and system in which a non-stroke background domain is removed by stroke filters, thereby improving text domain extraction performance.
An aspect of an embodiment of the present invention further provides a text extraction method, medium, and system in which the response values of the stroke filters used in detecting text are used, thereby reducing the amount of calculation and the processing time of text extraction.
An aspect of an embodiment of the present invention further provides a text extraction method, medium, and system in which text extraction is performed based on strokes, thereby providing improved results when the color polarity of the background of the text is similar to the color polarity of the text.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2006-0055606 | Jun 2006 | KR | national |