Video displays on multimedia devices come in many sizes. When a video image is scaled to fit the display size, textual information that may be contained in the video image is also scaled. Compact video displays may result in the scaling of text to the extent that the text is unreadable.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.
A system and/or method is provided for processing text in a video stream, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims. Advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Aspects of the present invention relate to techniques for modifying the way in which text is presented in video material, either to suit the capabilities of a display device or to improve its accessibility to users with special requirements. The following methods and systems may be used, for example, in conjunction with set-top-box decoders and multimedia processors. Although the following description may refer to particular wireless communication standards, many other standards may also use these systems and methods.
The following methods and systems may be particularly applicable to small or low-resolution display screens. This type of display is generally used in mobile telephones and in portable media players. If the video content was originally intended for display on a conventional television, the text may be difficult to read on a small screen. The following methods and systems can make the text easier to read. Moreover, the following methods and systems can be used by partially-sighted users to improve the clarity of text displayed on a conventional television or video screen.
The text content is then decoded, 103. The text to be extracted may be included in the main video image, or it may be included in supplementary data (“metadata”) that is part of or associated with the television transmission or the media file. If the text is in an image format, the text would be decoded using optical character recognition techniques. For example, text may be in an image format included in a video image, encoded as a bitmap, or stored in another video format in the metadata.
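By way of illustration only, the choice between the two decoding paths described above might be sketched as follows. The class names `MetadataText` and `ImageText`, the function `decode_text`, and the pluggable `ocr` callable are hypothetical and are not taken from the present description; the optical character recognition step itself is left to whatever engine an implementation supplies.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class MetadataText:
    """Text carried as characters in the supplementary data (assumed layout)."""
    payload: bytes
    encoding: str = "utf-8"


@dataclass
class ImageText:
    """Text carried as pixels, in the video image or as a bitmap (assumed layout)."""
    pixels: List[List[int]]  # rows of grayscale values


# Hypothetical OCR hook: any callable that maps an image region to a string.
OcrFn = Callable[[ImageText], str]


def decode_text(source: Union[MetadataText, ImageText], ocr: OcrFn) -> str:
    """Return the characters represented by the extracted text."""
    if isinstance(source, MetadataText):
        # Character data only needs its byte encoding resolved.
        return source.payload.decode(source.encoding)
    # Image-format text is handed to the optical character recognition step.
    return ocr(source)
```

In this sketch the caller decides which OCR engine to pass in, so the dispatch logic stays independent of any particular recognition library.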
The extracted and decoded text may be modified in various ways prior to being presented to the user. The extracted text may be re-rendered and displayed, 105. The re-rendered text may typically replace the original text. The re-rendered text may be displayed in a clearer font or in a larger font. The processed text may be, for example, news and stock tickers, captions, subtitles for the hearing impaired and subtitles that translate foreign-language speech.
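As a minimal sketch of how a larger font might be chosen, the following assumes that the original text height and the source and display heights are known in pixels, and that a minimum legible height can be stated for the display. The function name `rerendered_text_height` and the default floor of 12 pixels are assumptions made for illustration, not values from the present description.

```python
def rerendered_text_height(
    original_height_px: float,
    source_height_px: int,
    display_height_px: int,
    min_legible_px: int = 12,  # assumed legibility floor, not from the description
) -> int:
    """Pick a text height for re-rendering on the target display.

    The text is first scaled by the same factor as the video image; if the
    result falls below the legibility floor, the floor is used instead.
    """
    scaled = original_height_px * display_height_px / source_height_px
    return round(max(scaled, min_legible_px))
```

For instance, 36-pixel text in a 1080-line source shown on a 240-line display would scale to 8 pixels and would therefore be raised to the assumed 12-pixel floor, whereas the same text on a 1080-line display would be left at 36 pixels.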
The decoded text may be translated into a different language, 107. For example, subtitles intended for the hearing impaired could be translated for use by users that do not understand the language of the soundtrack, and subtitles on foreign-language content could be translated into a third language.
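The translation step might be arranged, for example, as a thin wrapper around a pluggable translation function. In the sketch below, `translate_subtitle` and the `translator` callable are hypothetical names, and the phrase table in the usage note is a toy stand-in for a real translation engine.

```python
from typing import Callable

# Hypothetical hook: (text, source language, target language) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_subtitle(
    text: str, source_lang: str, target_lang: str, translator: Translator
) -> str:
    """Translate decoded text unless it is already in the target language."""
    if source_lang == target_lang:
        return text  # nothing to do; avoid a needless translation call
    return translator(text, source_lang, target_lang)
```

A toy translator such as `lambda t, s, d: {"bonjour": "hello"}.get(t, t)` is enough to exercise the wrapper; an actual implementation would substitute its own engine.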
The decoded text may also be used in conjunction with an automatic speech generation system to speak the text that is displayed on the screen, 109. This may be useful for blind and partially-sighted users and for users that have difficulty reading. Audio processing may be used to make the generated speech and the original soundtrack appear to originate from different locations. Audio processing may also be combined with language translation to generate speech in a language other than the language of the decoded text.
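One possible form of the audio processing mentioned above is stereo panning, in which the soundtrack and the generated speech are placed at different positions between the left and right channels. The sketch below uses constant-power panning, which is a swapped-in technique chosen for illustration and is not specified by the present description; the names `pan` and `mix_spatially` and the default positions are likewise assumptions.

```python
import math
from typing import List, Tuple

Stereo = List[Tuple[float, float]]  # (left, right) sample pairs


def pan(mono: List[float], position: float) -> Stereo:
    """Constant-power pan; position -1.0 is full left, +1.0 is full right."""
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [(s * left_gain, s * right_gain) for s in mono]


def mix_spatially(
    soundtrack: List[float],
    speech: List[float],
    soundtrack_pos: float = -0.5,  # assumed placement
    speech_pos: float = 0.5,       # assumed placement
) -> Stereo:
    """Mix two mono signals so that each appears at its own stereo position."""
    n = max(len(soundtrack), len(speech))
    # Pad the shorter signal with silence so both cover the same duration.
    a = pan(soundtrack + [0.0] * (n - len(soundtrack)), soundtrack_pos)
    b = pan(speech + [0.0] * (n - len(speech)), speech_pos)
    return [(l1 + l2, r1 + r2) for (l1, r1), (l2, r2) in zip(a, b)]
```

Because the two gains are the cosine and sine of the same angle, the total power of each source stays constant wherever it is placed, which keeps the perceived loudness steady as the positions are adjusted.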
Enabling or disabling the foregoing functionality may be automatic or user-controlled.
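A simple way to combine the two modes is to let an explicit user setting take precedence and to fall back to an automatic rule otherwise. In the sketch below, the function `processing_enabled` is hypothetical, and the automatic rule (enable the functionality on displays shorter than an assumed 480 lines) is an illustrative assumption rather than a rule stated in the present description.

```python
from typing import Optional


def processing_enabled(
    user_choice: Optional[bool],
    display_height_px: int,
    auto_threshold_px: int = 480,  # assumed cut-off for a "small" display
) -> bool:
    """Decide whether the text processing should run.

    user_choice is True or False when the user has set a preference and
    None when the decision is left to the automatic rule.
    """
    if user_choice is not None:
        return user_choice
    return display_height_px < auto_threshold_px
```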
The text content of the video stream is extracted by a text detector, 203. The text to be extracted may be included in the main video image, or it may be included in supplementary data (“metadata”) that is part of or associated with the television transmission or the media file.
The extracted text is decoded by the text decoder, 205. If the text is in an image format, the text would be decoded using optical character recognition techniques. For example, text may be in an image format included in a video image, encoded as a bitmap, or stored in another video format.
The decoded text may be modified in various ways prior to being presented to the user. The extracted text may be re-rendered by a display engine, 207. The display engine, 207, may insert the re-rendered text in place of the extracted text. The re-rendered text may be displayed in a clearer font or in a larger font. For example, a mobile media device, 209, may have a small screen. The display engine, 207, may automatically display the text with a legible font. Alternatively, the re-rendered text size may be adjustable by the user of the mobile media device, 209.
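The cooperation of the text detector, 203, the text decoder, 205, and the display engine, 207, might be sketched as follows. The types `TextRegion` and `Overlay`, the function `process_frame`, and the callable signatures are hypothetical; the detector and decoder are passed in as plain functions so that the sketch does not presume any particular detection or recognition method.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in frame pixels


@dataclass
class TextRegion:
    """A region of extracted, not yet decoded, text (assumed structure)."""
    box: Box
    raw: Any


@dataclass
class Overlay:
    """Re-rendered text to be drawn in place of the extracted text."""
    box: Box
    text: str
    font_px: int


Detector = Callable[[Any], List[TextRegion]]  # stands in for text detector 203
Decoder = Callable[[TextRegion], str]         # stands in for text decoder 205


def process_frame(
    frame: Any, detect: Detector, decode: Decoder, font_px: int
) -> List[Overlay]:
    """Build the overlays that a display engine would draw for one frame."""
    overlays = []
    for region in detect(frame):
        text = decode(region)
        if text:  # skip regions in which nothing could be decoded
            overlays.append(Overlay(region.box, text, font_px))
    return overlays
```

In this arrangement the `font_px` argument is where either an automatically chosen legible size or a user-adjusted size would be supplied.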
The processed text may be, for example, news and stock tickers, captions, subtitles for the hearing impaired, and subtitles that translate foreign-language speech.
The decoded text may also be translated into a different language.
Additionally, subtitles intended for the hearing impaired could be translated for use by users that do not understand the language of the soundtrack, and subtitles on foreign-language content could be translated into a third language.
The decoded text may also be used in conjunction with an automatic speech generation system to speak the text that is displayed on the screen.
Audio processing may also be combined with language translation to generate speech in a language other than the language of the decoded text.
The present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in an integrated circuit or in a distributed fashion where different elements are spread across several circuits. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.