The invention relates to a method and to an apparatus for composition of subtitles for audio/video presentations, which can be used e.g. for HDTV subtitles in pre-recorded formats like the so-called Blu-ray Disc.
The technique of subtitling Audio-Visual (AV) material has been in use from the first celluloid cinema movies up to the recent digital media. The main purpose of subtitling has been to support handicapped people or small ethnographic language groups. Subtitling therefore often aims at presenting text information, even when that information has been encoded as graphic data such as pixel maps. Consequently, pre-produced AV material for broadcasting (Closed Caption, Teletext, DVB-Subtitle etc.) and movie discs (DVD Sub-Picture etc.) is primarily optimized for subtitles representing simple static textual information. However, progress in PC software for the presentation and animation of textual information creates a corresponding demand for such features in the digital subtitling techniques used for pre-recording and broadcasting. With straightforward approaches and no special precautions, these increased requirements would consume too large a portion of the limited overall bandwidth. The conflicting requirements for a ‘full feature’ subtitle, ranging from karaoke to genuine animations, are coding efficiency on the one hand and full control for the subtitle author on the other.
In today's state of the art of digitally subtitling AV material with separate subtitling information, two main approaches exist: subtitling can be based either on pixel data or on character data. In both cases, subtitling schemes comprise a general framework, which for instance deals with the synchronization of subtitling elements along the AV time axis.
In the character-based subtitling approach, e.g. in the teletext system ETS 300 706 of European analog or digital TV, strings are described by sequences of letter codes, e.g. ASCII or UNICODE, which intrinsically allows for a very efficient encoding. But from character strings alone, subtitling cannot be converted into a graphical representation to be overlaid over video. For this, the intended character set, font and some font parameters, most notably the font size, must either be coded explicitly within the subtitling bitstream or an implicit assumption must be made about them within a suitably defined subtitling context. Also, any subtitling in this approach is confined to what can be expressed with the letters and symbols of the specific font(s) in use. The DVB Subtitling specification ETS 300 743, in its mode of “character objects”, constitutes another state-of-the-art example of character-based subtitling.
In the pixel-based subtitling approach, subtitling frames are conveyed directly in the form of graphical representations by describing them as (typically rectangular) regions of pixel values on the AV screen. Whenever anything is meant to be visible in the subtitling plane superimposed onto video, its pixel values must be encoded and provided in the subtitling bitstream, together with appropriate synchronization information. Hence, for the full feature animation of subtitles, all changed pixels must be transported. Obviously, while it removes the limitations that teletext imposes on full feature animations, the pixel-based approach carries the penalty of a considerably increased bandwidth for the subtitling data. Examples of pixel-based subtitling schemes can be found in DVD's sub-picture concept (“DVD Specification for Read-Only Disc”, Part 3: Video), as well as in the “pixel object” concept of DVB Subtitling, specified in ETS 300 743.
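The bandwidth gap between the two approaches can be made concrete with a rough calculation. The sketch below is illustrative only: the sample text, the 720x64 region size and the 8-bit pixel depth are assumed values that do not come from any of the cited specifications, and no run-length or other compression is applied to the pixel data.

```python
# Rough size comparison of one subtitle line, character-coded vs. pixel-coded.
# All figures are assumptions chosen for illustration.

text = "Twinkle, twinkle, little star"

# Character-based: one byte per letter code (ASCII assumed).
char_bytes = len(text.encode("ascii"))

# Pixel-based: a rectangular region of 8-bit CLUT indices, uncompressed.
region_width, region_height = 720, 64   # hypothetical region size in pixels
bits_per_pixel = 8                       # hypothetical pixel depth
pixel_bytes = region_width * region_height * bits_per_pixel // 8

print(f"character data: {char_bytes} bytes")
print(f"pixel data:     {pixel_bytes} bytes")
print(f"ratio:          about {pixel_bytes // char_bytes}x")
```

Real pixel-based schemes compress their data, so the actual ratio is smaller, but every animation step that changes pixels still has to be paid for again in the bitstream.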
Embodiments of the invention include a subtitling format encompassing elements of enhanced syntax and semantics to provide improved animation capabilities. The disclosed embodiments improve subtitle performance without stressing the available subtitle bitrate. This will become essential for authoring content of high-end HDTV subtitles in pre-recorded format, which can be broadcast or pressed on high-capacity optical media, e.g. the Blu-ray Disc. The invention gives content producers improved authoring possibilities for animating subtitles.
The disclosure introduces elements of syntax and semantics describing the color change for parts of the graphics to display. This can be used for highlight effects in applications such as karaoke, avoiding the repeated transfer of pixel data.
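A minimal sketch of this idea follows. It assumes a region stored as CLUT indices and a horizontal sub-area that is rendered with an alternative look-up table; the function name `render_region`, the parameter names and the RGB color representation are hypothetical and do not reproduce the actual segment syntax.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]          # RGB, for illustration only
Clut = Dict[int, Color]

def render_region(indices: List[List[int]], clut: Clut, sub_clut: Clut,
                  sub_left: int, sub_right: int) -> List[List[Color]]:
    """Map CLUT indices to colors; columns in [sub_left, sub_right) use sub_clut.

    The pixel indices are never modified, so only the location of the
    highlighted sub-area has to be re-sent to move the highlight.
    """
    out = []
    for row in indices:
        out.append([
            (sub_clut if sub_left <= x < sub_right else clut)[idx]
            for x, idx in enumerate(row)
        ])
    return out

WHITE, YELLOW, BLACK = (255, 255, 255), (255, 255, 0), (0, 0, 0)
clut = {0: BLACK, 1: WHITE}           # default: white text on black
sub_clut = {0: BLACK, 1: YELLOW}      # highlighted: yellow text on black

pixels = [[0, 1, 1, 0, 1, 1, 0, 1]]   # one row of an 8-pixel-wide region

# Karaoke-style highlight growing from the left as the song progresses.
frame_a = render_region(pixels, clut, sub_clut, 0, 3)
frame_b = render_region(pixels, clut, sub_clut, 0, 6)
```

Between `frame_a` and `frame_b` only the two boundary values change, while the pixel array is transferred once.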
Other disclosed elements of syntax and semantics facilitate cropping parts of the subtitles before displaying them. By subsequently transferring cropping parameters for an object to display, a bit-saving animation of subtitles becomes available. Such cropping parameters can be used, for example, to generate text changes by wiping boxes, blinds, scrolling, wipes, checker boxes, etc.
Furthermore, the disclosed elements can be used to provide interactivity on textual and graphical information. In particular, the positioning and/or color settings of subtitles can be manipulated upon user request.
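The cropping technique can be sketched as follows, under simplifying assumptions: the region is modeled as rows of characters rather than pixels, and the `Crop` class and `apply_crop` function are hypothetical helpers, not part of any specification.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Crop:
    """Hypothetical cropping rectangle in region coordinates."""
    x: int
    y: int
    width: int
    height: int

def apply_crop(region: List[str], crop: Crop, blank: str = ".") -> List[str]:
    """Return the region with everything outside the crop rectangle blanked."""
    out = []
    for y, row in enumerate(region):
        out.append("".join(
            ch if (crop.x <= x < crop.x + crop.width and
                   crop.y <= y < crop.y + crop.height) else blank
            for x, ch in enumerate(row)
        ))
    return out

# The object is transferred once ...
region = ["HELLO WORLD"]

# ... and afterwards only small cropping updates are sent to wipe it in.
wipe = [Crop(0, 0, w, 1) for w in (0, 4, 8, 11)]
frames = [apply_crop(region, c)[0] for c in wipe]
for f in frames:
    print(f)
```

Each animation step costs four integers instead of a fresh pixel map, which is where the bit saving comes from.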
Exemplary embodiments of the invention are described with reference to the accompanying drawings and tables, which show:
The invention can preferably be embodied based on the syntax and semantics of the DVB subtitle specification (DVB-ST). To provide improved capabilities for the manipulation of graphic subtitle elements, the semantics of DVB-ST's page composition segment (PCS) and region composition segment (RCS) are expanded.
DVB-ST uses page composition segments (PCS) to describe the positions of one or more rectangular regions on the display screen. The region composition segments (RCS) define the size of each such rectangular area and identify the color look-up table (CLUT) used within it.
Embodiments of the proposed invention keep backward compatibility with DVB-ST by using different segment_types for the enhanced PCS and RCS elements, as listed in
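The division of labor between the two segment types can be pictured with a small data model. This is a simplified sketch, not the DVB-ST bitstream syntax: the class and field names are assumptions chosen for readability, and many fields of the real segments are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RegionPlacement:
    """One entry of a page composition: where a region sits on screen."""
    region_id: int
    x: int
    y: int

@dataclass
class PageComposition:
    """Simplified PCS: positions of the regions on the display."""
    regions: List[RegionPlacement] = field(default_factory=list)

@dataclass
class RegionComposition:
    """Simplified RCS: size of one region and the CLUT it uses."""
    region_id: int
    width: int
    height: int
    clut_id: int

def region_rectangles(pcs: PageComposition,
                      rcs_list: List[RegionComposition]
                      ) -> Dict[int, Tuple[int, int, int, int]]:
    """Combine PCS positions with RCS sizes into (x, y, width, height)."""
    sizes = {r.region_id: r for r in rcs_list}
    return {
        p.region_id: (p.x, p.y, sizes[p.region_id].width, sizes[p.region_id].height)
        for p in pcs.regions
        if p.region_id in sizes
    }

pcs = PageComposition([RegionPlacement(region_id=1, x=100, y=480)])
rcs = [RegionComposition(region_id=1, width=520, height=60, clut_id=0)]
rects = region_rectangles(pcs, rcs)
```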
The enhanced PCS shown in
The enhanced RCS shown in
The enhanced PCS and enhanced RCS elements allow subtitles to be manipulated independently of the encoding method, i.e. independently of whether they are encoded as character data or pixel data.
The enhanced PCS and RCS can be used to perform many different animation effects for subtitles. Those could be wiping boxes, blinds, scrolling, wipes, checker boxes, etc. The following figures show an application example for karaoke.
The region sub-CLUT location shown in the lower part of
Picking up all parameters defined with the previous figures results in the displayed subtitle as depicted in
As the enhanced PCS are sent within MPEG packetized elementary stream (PES) packets labeled with presentation time stamps (PTS), any effect can be synchronized to the AV material.
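How time-stamped updates drive an effect can be sketched as below. MPEG presentation time stamps count a 90 kHz clock; everything else here is assumed for illustration: the `highlight_width` parameter, the helper names, and the simplification that PTS wrap-around is ignored.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

PTS_CLOCK_HZ = 90_000  # MPEG presentation time stamps count a 90 kHz clock

def seconds_to_pts(seconds: float) -> int:
    """Convert seconds to PTS ticks (wrap-around of the counter is ignored)."""
    return round(seconds * PTS_CLOCK_HZ)

def active_update(updates: List[Tuple[int, Dict[str, int]]],
                  now_pts: int) -> Optional[Dict[str, int]]:
    """Return the parameters of the latest update whose PTS is <= now_pts.

    `updates` must be sorted by PTS. Returns None before the first update.
    """
    stamps = [pts for pts, _ in updates]
    i = bisect_right(stamps, now_pts)
    return updates[i - 1][1] if i else None

# Hypothetical karaoke highlight growing in step with the audio.
updates = [
    (seconds_to_pts(10.0), {"highlight_width": 0}),
    (seconds_to_pts(10.5), {"highlight_width": 40}),
    (seconds_to_pts(11.0), {"highlight_width": 80}),
]

current = active_update(updates, seconds_to_pts(10.7))
```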
Another idea of the invention is the superseding of subtitle animation parameters by the user. This offers a way to realize interactive subtitles. The enhanced PCS parameters are transferred as a default, and the user may change them via a remote control for example. Thus the user is able to move, crop or highlight the subtitle.
This could be advantageous for user-defined repositioning of subtitle text, so that the user can subjectively minimize the annoyance caused by subtitle text placed on top of the motion video. The color of the subtitles could also be set according to the user's preferences.
Another application for overriding subtitle animation parameters like position, cropping rectangle, CLUTs and sub-CLUTs is the realization of some very basic sort of interactive gaming. The subtitle may carry pixel data of an animated character. This character is subsequently moved on the display screen driven by either user interaction, programmatic control or both.
The overriding of subtitle animation parameters can be implemented in at least two ways. The first option is that the overriding parameters SD replace the parameters DD sent in the bitstream. The second option is that the overriding parameters SD are used as an offset that is added to or subtracted from the subtitle animation parameters DD sent in the bitstream.
The enhanced PCS and RCS provide many more animation capabilities than those explained here. The following is a non-exhaustive list of examples: wiping boxes, blinds, scrolling, wipes and checker boxes.
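The two options can be sketched in a few lines. The dictionary representation of the parameters, the `OverrideMode` enum and the `apply_override` function are hypothetical; only the names DD (parameters from the bitstream) and SD (overriding parameters) follow the text.

```python
from enum import Enum
from typing import Dict

class OverrideMode(Enum):
    REPLACE = "replace"   # SD replaces DD
    OFFSET = "offset"     # SD is added to DD (negative values subtract)

def apply_override(dd: Dict[str, int], sd: Dict[str, int],
                   mode: OverrideMode) -> Dict[str, int]:
    """Combine bitstream defaults DD with user overrides SD.

    Parameters that SD does not mention keep their bitstream value.
    """
    result = dict(dd)
    for key, value in sd.items():
        if mode is OverrideMode.REPLACE:
            result[key] = value
        else:
            result[key] = dd.get(key, 0) + value
    return result

dd = {"x": 100, "y": 400}     # default subtitle position from the bitstream

moved_abs = apply_override(dd, {"y": 300}, OverrideMode.REPLACE)
moved_rel = apply_override(dd, {"y": -50}, OverrideMode.OFFSET)
```

With the offset option the user's adjustment keeps following any later position changes sent in the bitstream, whereas the replace option pins the subtitle to a fixed place.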
Exemplary video and graphics planes are shown in
The apparatus contains a still picture decoder SPDec and an MPEG-2 video decoder MVDec, but since only one of them is used at a time, a switch s1 can select which data shall be used for further processing. Moreover, two identical decoders AVSGDec1, AVSGDec2 are used for decoding subtitle and animation data. The outputs of these two decoders AVSGDec1, AVSGDec2 may be switched by independent switches s2, s3 either to a mixer MX or, for preprocessing, to a mixer and scaler MXS, which outputs its resulting data to said mixer MX. These two units MX, MXS superimpose their various input data, thus controlling the display order of the layers. The mixer MX has inputs for a front layer f2, a middle front layer mf, a middle back layer mb and a background layer b2. The front layer f2 may be unused if the corresponding switch s3 is in a position to connect the second AV sync graphics decoder AVSGDec2 to the mixer and scaler MXS. This unit MXS has inputs for front layer f1, middle layer m and background layer b. It superimposes these data correspondingly and sends the resulting picture data to the background input b2 of the mixer MX. Thus, these data represent e.g. a frame comprising up to three layers of picture and subtitles, which can be scaled and moved together within the final picture. The background input b1 of the mixer and scaler MXS is connected to the switch s1 mentioned above, so that the background can be generated from a still picture or an MPEG-2 video.
The output of the first AV sync graphics decoder AVSGDec1 is connected to a second switch s2, which may switch it to the middle layer input m of the mixer and scaler MXS or to the middle back layer input mb of the mixer MX. The output of the second AV sync graphics decoder AVSGDec2 is connected to a third switch s3, which may switch it to the front layer input f1 of the mixer and scaler MXS or to the middle front layer input mf of the mixer MX.
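The effect of the switch positions on the stacking order can be modeled in a few lines. This is a sketch under assumptions: it follows the routing of s2 and s3 as stated in the paragraph above, treats the background selected by s1 as a single layer, leaves the front input f2 of the mixer MX unused, and uses a hypothetical function name `layer_order`.

```python
from typing import List

def layer_order(s2: str, s3: str) -> List[str]:
    """Return the layers from back to front for the given switch positions.

    s2 routes AVSGDec1 to "m" (mixer and scaler MXS) or "mb" (mixer MX).
    s3 routes AVSGDec2 to "f1" (mixer and scaler MXS) or "mf" (mixer MX).
    """
    if s2 not in ("m", "mb") or s3 not in ("f1", "mf"):
        raise ValueError("unknown switch position")

    # Mixer and scaler MXS, back to front: background, middle m, front f1.
    mxs = ["background"]
    if s2 == "m":
        mxs.append("AVSGDec1")
    if s3 == "f1":
        mxs.append("AVSGDec2")

    # Mixer MX, back to front: b2 (the MXS output), then mb, then mf.
    mx = list(mxs)
    if s2 == "mb":
        mx.append("AVSGDec1")
    if s3 == "mf":
        mx.append("AVSGDec2")
    return mx

# AVSGDec1 bypasses MXS while AVSGDec2 goes through it: AVSGDec1 ends up on top.
first_on_top = layer_order("mb", "f1")

# Both decoders routed to the same unit: AVSGDec2 ends up on top.
second_on_top = layer_order("m", "f1")
```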
Depending on the positions of the second and third switches s2, s3, either the output of the first or of the second AV sync graphics decoder AVSGDec1, AVSGDec2 may have priority over the other, as described above. To have the data from the first decoder AVSGDec1 in the foreground, the second switch s2 may route the subtitle data to the middle back input mb of the mixer MX, while the third switch s3 routes the animation graphics data to the front input f1 of the mixer and scaler MXS, so that it ends up at the background input b2 of the mixer MX. Otherwise, to have the data from the second decoder AVSGDec2 in the foreground, the switches s2, s3 may route their outputs to the same unit, either the mixer and scaler MXS or the mixer MX, as shown in
Number | Date | Country | Kind |
---|---|---|---|
02025474 | Nov 2002 | EP | regional |
This application is a Continuation of co-pending U.S. application Ser. No. 12/800,418, herein incorporated by reference in its entirety, which is a Continuation-In-Part of U.S. application Ser. No. 10/535,106, filed May 16, 2005, herein incorporated by reference in its entirety. This application claims benefit of U.S. application Ser. No. 12/800,418, filed May 14, 2010, which claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP03/12261, filed Nov. 3, 2003, which was published in accordance with PCT Article 21(2) on Jun. 3, 2004 in English and which claims the benefit of European patent application No. 02025474.4, filed Nov. 15, 2002.
Entry |
---|
Bloom et al: “A Watermarking to Track Motion Picture Theft”, 2004 IEEE, pp. 362, 362-367. |
Brett et al: “Video Processing for Single-Chip DVB Decoder,” 2001 IEEE. Manuscript received Jun. 25, 2001, pp. 385-393. |
Haitsma et al: “A Watermarking Scheme for Digital Cinema”, 2001 IEEE, pp. 487-489. |
ETSI: ETS 300 743, “Digital Video Broadcasting (DVB); Subtitling system”, pp. 1-45, Sep. 1997. |
Number | Date | Country | |
---|---|---|---|
20150281634 A1 | Oct 2015 | US | |
20170195618 A9 | Jul 2017 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12800418 | May 2010 | US |
Child | 14224197 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10535106 | | US |
Child | 12800418 | | US |