The invention relates to a method of adapting an image.
The invention also relates to control software for making a programmable device operative to perform such a method.
The invention further relates to an electronic device comprising electronic circuitry operative to adapt an image.
The invention also relates to electronic circuitry for use in such a device.
An example of such a method is known from US 2003/0021586. The known method controls the display of closed captions and subtitles for a combination system of an optical or other recording/reproducing apparatus and a television. The known method ensures that the displayed closed captions and subtitles, which both exist as text in ASCII format, do not overlap. The known method has the drawback that it cannot be used to control the display of closed captions and subtitles if the subtitles form an integral part of the image.
It is a first object of the invention to provide a method of the type described in the opening paragraph, which can be used to control the display of text forming an integral part of the image.
It is a second object of the invention to provide an electronic device of the type described in the opening paragraph, which can be used to control the display of text forming an integral part of the image.
According to the invention, the first object is realized in that the method comprises the steps of identifying a text in the image, the text having a typographical aspect, and modifying the typographical aspect of the text. Analog video material (e.g. analog video broadcasts or analog video tapes) often contains overlay captions and/or subtitles. The method of the invention makes it possible to customize the appearance of overlay text on a display.
In an embodiment of the method of the invention, the typographical aspect comprises font size. The typographical aspect may additionally or alternatively comprise, for example, font type and/or font color. Increasing the font size makes the text easier to read for people who have difficulty reading and/or who use devices with small displays, e.g. mobile phones.
The step of identifying a text in the image may comprise detecting horizontal text line boundaries by determining which ones of a plurality of image lines comprise a highest number of horizontal edges. This improves the text detection performance of the identifying step. By first detecting horizontal text line boundaries, the area that has to be processed in the next step of the text detection algorithm can be relatively small. The inventive idea of detecting horizontal text line boundaries in order to decrease the area that has to be processed, and embodiments of this idea, can also be used without the need to modify the typographical aspect of the text, e.g. when it is used in multimedia indexing and retrieval applications.
The step of identifying a text in the image may further comprise determining a set of pixel values only occurring between the horizontal text line boundaries and identifying pixels as text pixels if the pixels have a value from said set of pixel values. Unlike some alternative text detection algorithms, this text detection algorithm makes it possible to detect inverted text as well as normal text.
The step of identifying a text in the image may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary. This ensures that a larger number of the text pixels in the video image can be correctly identified.
The step of modifying the typographical aspect of the text may comprise processing text pixels, which form the text, and overlaying the processed pixels on the image. This is useful for adapting images that are composed of pixels.
The method of the invention may further comprise the step of replacing at least one of the text pixels with a replacement pixel, the value of the replacement pixel being based on a value of a non-text pixel, i.e. a pixel which did not form the text. Removal of original text may be necessary if the reformatted text does not completely overlap the original text. By using a replacement pixel, which is based on a value of a non-text pixel, the number of visible artifacts decreases. This inventive way of removing text causes a relatively low number of artifacts and is useful in any application in which text is removed. If a user simply wants to remove subtitles, because he can understand the spoken language, it is not necessary to modify the typographical aspect of the subtitles.
The value of the replacement pixel may be based on a median color of non-text pixels in a neighborhood of the at least one text pixel. In tests, this resulted in replacement pixels that were less noticeable than replacement pixels that were determined with alternative algorithms.
The method of the invention may further comprise the step of replacing a further text pixel in a neighborhood of the replacement pixel with a further replacement pixel, the value of the further replacement pixel being at least partly based on the replacement pixel. Simply increasing the neighborhood size when a text pixel has fewer than a predetermined number of non-text pixels in its neighborhood is not appropriate, because the estimated color may not be accurate if distant background pixels are used, and the larger the neighborhood size, the more computation is needed. If the value of the further replacement pixel is at least partly based on the replacement pixel, and especially if it is based on a plurality of replacement pixels in the neighborhood of the further replacement pixel, a relatively small neighborhood size is sufficient to achieve a good reduction of visible artifacts.
The step of modifying the typographical aspect of the text may comprise scrolling the text in subsequent images. If the enlarged subtitles or captions have to be fit in their entirety in the video image, the enlargement of the subtitles or captions is limited to a certain maximum. This maximum may be insufficient for some persons. By scrolling the reformatted text pixels in subsequent video images, the text size can be enlarged even further.
The method of the invention may further comprise the step of enabling a user to define a rate at which the text will be scrolled. This allows a user to adjust the rate to his reading speed.
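The user-defined scrolling rate can be sketched as follows. This is a hypothetical helper, not taken from the original disclosure; the function name, the pixels-per-frame unit, and the wrap-around behavior are all assumptions.

```python
# Sketch (assumption, not from the original disclosure): given a
# user-defined scroll rate in pixels per frame, compute the horizontal
# position of the reformatted text for each video frame. The text
# re-enters from the right once it has scrolled fully out of view.

def scroll_offset(frame_index, rate_px_per_frame, text_width, view_width):
    """Return the x position of the text's left edge for this frame."""
    travel = text_width + view_width        # distance for one full pass
    progress = (frame_index * rate_px_per_frame) % travel
    return view_width - progress            # start off-screen right
```

A slow reader might choose a rate of 2 pixels per frame, a faster reader 6; the text then takes proportionally fewer frames to cross the display.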
According to the invention, the second object is realized in that the electronic circuitry functionally comprises an identifier for identifying a text in the image, the text having a typographical aspect, and a modifier for modifying the typographical aspect of the text. The electronic device may be, for example, a PC, a television, a set-top box, a video recorder, a video player, or a mobile phone.
These and other aspects of the invention are apparent from and will be further elucidated, by way of example, with reference to the drawings.
Corresponding elements in the drawings are denoted by the same reference numerals.
The method of the invention comprises a step 1 of identifying a text in the image, the text having a typographical aspect, and a step 3 of modifying the typographical aspect of the text.
Step 3 of modifying the typographical aspect of the text may comprise scrolling the text in subsequent images.
Overlay text detection in video has recently become popular as a result of the increasing demand for automatic video indexing tools. All existing text detection algorithms exploit the high-contrast property of overlay text regions in one way or another. In a favorable text detection algorithm, the horizontal and vertical derivatives of the frame in which text is to be detected are computed first in order to enhance the high-contrast regions. It is well known in the image and video-processing literature that simple masks, such as masks 61 and 63 of the drawings, can be used to compute these derivatives.
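The derivative step can be sketched as follows. The exact mask coefficients are an assumption (the simple two-tap difference [-1, 1] and its transpose); masks 61 and 63 refer to the drawings and may differ.

```python
# Sketch of the derivative computation described above, using simple
# two-tap difference masks. The coefficients are assumptions; the
# patent's masks 61 and 63 are shown only in the drawings.
import numpy as np

def derivative_maps(gray):
    """Return absolute horizontal and vertical derivative images."""
    gray = gray.astype(np.float32)
    dx = np.abs(np.diff(gray, axis=1))   # horizontal derivative
    dy = np.abs(np.diff(gray, axis=0))   # vertical derivative
    return dx, dy
```

High-contrast overlay text then shows up as rows and columns of large values in the two derivative maps, which the later classification stages operate on.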
A statistical learning tool can be used to find an optimal text/non-text classifier. Support Vector Machines (SVMs) result in binary classifiers and have good generalization capabilities. An SVM-based classifier trained with 1,000 text blocks and at most 3,000 non-text blocks, for which edge-orientation features are computed, has provided good results in experiments. As it is difficult to find representative hard-to-classify non-text examples, the popular bootstrapping approach introduced by K. K. Sung and T. Poggio in “Example-based learning for view-based human face detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, January 1998, can be followed. Bootstrap-based training is completed in several iterations; in each iteration, the resulting classifier is tested on some images that do not contain text. False alarms over this data set represent difficult non-text examples that the current classifier cannot correctly classify. These non-text samples are added to the training set; hence, the non-text training dataset grows, and the classifier is retrained with the enlarged dataset. When a classifier is being trained, an important issue to decide upon is the size of the image blocks that are fed to the classifier, because the height of a block determines the smallest detectable font size, whereas its width determines the smallest detectable text width. Blocks of 12×12 pixels provide good results for training the SVM classifier, because in a typical frame with a height of 400 pixels, it is rare to find a font size smaller than 12. Font-size independence is achieved by running the classifier with a 12×12 window over multiple resolutions, and location independence is achieved by moving the window in the horizontal and vertical directions to evaluate the classifier over the whole image. The described text detection algorithm results in block-based text regions, as shown in the drawings.
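The bootstrap loop described above can be sketched with scikit-learn's SVC. The feature vectors here are synthetic placeholders (assumptions); the text specifies edge-orientation features computed on 12×12 blocks, which are not reproduced here.

```python
# Hedged sketch of the bootstrap training loop: train an SVM, find
# false alarms on text-free images, add them to the negative set, and
# retrain. Feature extraction is out of scope; inputs are assumed to be
# precomputed feature rows.
import numpy as np
from sklearn.svm import SVC

def bootstrap_train(text_feats, nontext_feats, candidate_feats, rounds=3):
    """Train a text/non-text SVM, growing the non-text set with false
    alarms found on blocks taken from images that contain no text."""
    X_neg = nontext_feats
    clf = None
    for _ in range(rounds):
        X = np.vstack([text_feats, X_neg])
        y = np.r_[np.ones(len(text_feats)), np.zeros(len(X_neg))]
        clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
        # Blocks from text-free images classified as text are false
        # alarms: hard negatives the current classifier gets wrong.
        false_alarms = candidate_feats[clf.predict(candidate_feats) == 1]
        if len(false_alarms) == 0:
            break                     # no new hard negatives found
        X_neg = np.vstack([X_neg, false_alarms])
    return clf
```

Font-size and location independence would then come from sliding the fixed 12×12 window over multiple image resolutions, as the text describes.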
Step 1 of identifying a text in the image may comprise detecting horizontal text line boundaries by determining which ones of a plurality of image lines comprise a highest number of horizontal edges. One way of obtaining a pixel-accurate text mask is to specifically locate text line and word boundaries (primarily to be able to display text in multiple lines and to extract the text mask more accurately) and then extract the binary text mask. A morphological analysis can be performed after the text regions in the same line and adjacent rows have been combined into a single joint region to be processed; ROI 71 of the drawings shows such a region.
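The edge-counting step can be sketched as follows. The thresholds (edge strength, and the fraction of the per-row maximum that marks a text row) are assumptions chosen for illustration.

```python
# Sketch of horizontal text line boundary detection: count horizontal
# edges per image row; rows with a high edge count are taken to lie
# inside a text line. Threshold values are assumptions.
import numpy as np

def text_line_rows(gray, edge_thresh=40, row_frac=0.5):
    """Return a boolean mask over image rows marking likely text lines."""
    dx = np.abs(np.diff(gray.astype(np.float32), axis=1))
    edges_per_row = (dx > edge_thresh).sum(axis=1)
    return edges_per_row >= row_frac * edges_per_row.max()
```

Runs of consecutive marked rows then give the horizontal text line boundaries, so the later, more expensive stages only need to process those narrow bands.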
Step 1 of identifying a text in the image may further comprise determining a set of pixel values only occurring between the horizontal text line boundaries and identifying pixels as text pixels if the pixels have a value from said set of pixel values. After text lines are detected, a threshold Tbinarization is automatically computed to find the binary, pixel-wise more accurate text mask. The parameter Tbinarization is set in such a way that no pixel outside the detected text lines, shown in ROI 75 of the drawings, is identified as a text pixel.
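The exclusive-value idea can be sketched directly on pixel values. This is an assumed implementation: it compares the sets of values inside and outside the detected text-line rows, which works equally for normal and inverted (light-on-dark) text.

```python
# Sketch (assumed implementation): pixel values that occur only within
# the detected text-line rows are taken as text values; pixels with
# such a value inside those rows form the binary text mask.
import numpy as np

def exclusive_value_mask(gray, row_mask):
    """Mark pixels whose value occurs only within the text-line rows."""
    inside = np.unique(gray[row_mask])      # values seen in text lines
    outside = np.unique(gray[~row_mask])    # values seen elsewhere
    exclusive = np.setdiff1d(inside, outside)
    return np.isin(gray, exclusive) & row_mask[:, None]
```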
Step 1 of identifying a text in the image may further comprise determining a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary. A morphological closing operation, whose result is shown in ROI 79 of the drawings, can be used to determine the word boundaries; further pixels located within a word boundary are then also identified as text pixels.
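The closing step can be sketched with a standard morphological closing. The structuring element (a short horizontal line, to bridge inter-character gaps) is an assumption.

```python
# Sketch (assumed implementation) of the closing step: a morphological
# closing with a horizontal structuring element merges nearby character
# strokes, so pixels enclosed within a word boundary join the mask.
import numpy as np
from scipy.ndimage import binary_closing

def word_mask(text_pixels, width=5):
    """Close small horizontal gaps between strokes to recover words."""
    structure = np.ones((1, width), dtype=bool)  # bridges gaps < width
    return binary_closing(text_pixels, structure=structure)
```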
Step 3 of modifying the typographical aspect of the text may comprise processing text pixels, which form the text, and overlaying the processed pixels on the image. Before or after overlaying the processed pixels on the image, a step 9 of replacing at least one of the text pixels with a replacement pixel may be performed, the value of the replacement pixel being based on a value of a non-text pixel. The value of the replacement pixel may be based on a median color of non-text pixels in a neighborhood of the at least one text pixel. An enlarged text mask, as shown in ROI 79 of the drawings, may be used to ensure that all pixels belonging to the text are replaced.
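The font-size modification can be sketched as a nearest-neighbour upscaling of the masked text region followed by an overlay. This is a hypothetical helper; the scale factor, paste position, and the choice to overlay the whole bounding box (rather than only masked pixels) are assumptions.

```python
# Hypothetical sketch of step 3 for font-size modification: upscale the
# bounding box of the text mask by an integer factor and paste the
# result back onto the frame, clipping at the frame borders.
import numpy as np

def enlarge_and_overlay(frame, mask, top, left, factor=2):
    """Scale the masked text region by `factor` and paste it back."""
    ys, xs = np.nonzero(mask)
    region = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    big = region.repeat(factor, axis=0).repeat(factor, axis=1)
    out = frame.copy()
    h = min(big.shape[0], out.shape[0] - top)   # clip to frame
    w = min(big.shape[1], out.shape[1] - left)
    out[top:top + h, left:left + w] = big[:h, :w]
    return out
```

Because enlargement that must still fit in the frame is limited, this kind of helper would typically be combined with the scrolling of step 3 described earlier.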
The method of the invention may further comprise the step of replacing a further text pixel in a neighborhood of the replacement pixel with a further replacement pixel, the value of the further replacement pixel being at least partly based on the replacement pixel. If a text pixel is distant from the boundary of the text mask, even a large window may not contain enough non-text pixels to approximate the color to be used for filling in the text pixel. Furthermore, the use of larger windows for these pixels is not appropriate, because 1) they are far from the background, so the estimated color may not be accurate if distant background pixels are used, and 2) the larger the window size, the more computation is needed. In these cases, the median color of the pixels in a small, e.g. 3×3, neighborhood of the current text pixel is assigned as its color. This neighborhood is defined in accordance with the processing direction, so that all text pixels in the neighborhood have already been assigned a color. Note that the color values of all pixels in this small window are used, regardless of whether they were originally text or non-text pixels. The result of this text removal algorithm is shown in the drawings.
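The removal step can be sketched as follows for a grayscale frame (a simplifying assumption; a color frame would be treated per channel). The window size, the minimum background count, and the top-to-bottom, left-to-right scan order are illustrative choices consistent with the description above.

```python
# Sketch (assumed implementation) of steps 9 and the fallback: each
# text pixel takes the median of non-text pixels in a small window; if
# the window holds too few non-text pixels, the median of already
# replaced neighbours is used. Scanning top-to-bottom, left-to-right
# guarantees such neighbours have been assigned a colour already.
import numpy as np

def remove_text(frame, text_mask, half=2, min_bg=3):
    """Replace text pixels with locally estimated background colours."""
    out = frame.astype(np.float32).copy()
    done = ~text_mask                      # pixels with a trusted colour
    h, w = text_mask.shape
    for y in range(h):
        for x in range(w):
            if not text_mask[y, x]:
                continue
            y0, y1 = max(0, y - half), min(h, y + half + 1)
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            win = out[y0:y1, x0:x1]
            bg = ~text_mask[y0:y1, x0:x1]
            filled = done[y0:y1, x0:x1]
            if bg.sum() >= min_bg:         # enough true background
                out[y, x] = np.median(win[bg])
            elif filled.any():             # fall back on filled pixels
                out[y, x] = np.median(win[filled])
            done[y, x] = True
    return out.astype(frame.dtype)
```

Keeping the window small (here 5×5) keeps the computation low, while the fallback path propagates estimated colours into the interior of thick strokes, matching the rationale given above.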
The electronic device 21 of the invention comprises electronic circuitry which functionally comprises an identifier for identifying a text in the image, the text having a typographical aspect, and a modifier for modifying the typographical aspect of the text.
While the invention has been described in connection with favorable embodiments, it will be understood that modifications thereof within the principles outlined above will be evident to those skilled in the art, and thus the invention is not limited to the favorable embodiments but is intended to encompass such modifications. The invention resides in each and every novel characteristic feature and each and every combination of characteristic features. Reference numerals in the claims do not limit their protective scope. Use of the verb “to comprise” and its conjugations does not exclude the presence of elements other than those stated in the claims. Use of the article “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed device. ‘Control software’ is to be understood to mean any software product stored on a computer-readable medium, such as a floppy disk, downloadable via a network, such as the Internet, or marketable in any other manner.
Number | Date | Country | Kind
---|---|---|---
04105759.7 | Nov 2004 | EP | regional

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/IB05/53661 | 11/8/2005 | WO | | 5/9/2007