This application claims the benefit of Korean Patent Application No. 10-2005-0038491, filed on May 9, 2005, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to an apparatus for processing or using a moving-picture, such as audio and/or video storage media, multimedia personal computers, media servers, Digital Versatile Disks (DVDs), recorders, digital televisions and so on, and more particularly, to an apparatus and method for summarizing a moving-picture using events, and a computer-readable recording medium storing a computer program for controlling the apparatus.
2. Description of Related Art
Recently, along with the increase in the capacity of data storage media into the terabyte range, developments in data compression technologies, the growing variety of digital devices, the move to multichannel broadcasting, the explosion in the creation of personal contents, and so on, the creation of multimedia contents has become widespread. However, users often do not have sufficient time to search these various and voluminous multimedia contents for their desired contents, and they also experience difficulties in searching them. Accordingly, many users want their PCs, etc., to summarize and show their desired contents. For example, many users want to see their desired contents wherever they are, to see summarized or highlighted portions of their desired contents, to have their desired contents or scenes indexed, and to be provided with contents or scenes according to their tastes or moods.
In order to satisfy these user requirements, various methods for summarizing a moving-picture have been developed. Conventional methods for segmenting and summarizing a moving-picture shot by shot are disclosed in U.S. Pat. Nos. 6,072,542, 6,272,250, and 6,493,042. However, since these conventional moving-picture summarizing methods divide a moving-picture into too many segments, they do not provide usefully summarized moving-picture information to users.
Conventional methods for summarizing a moving-picture based on the similarity of a single type of information are disclosed in U.S. Pat. Nos. 5,805,733, 6,697,523, and 6,724,933. These conventional methods summarize a moving-picture based on color similarity instead of segmenting the moving-picture based on its contents. Therefore, they do not always summarize a moving-picture accurately according to its contents.
A conventional method for compressing a moving-picture based on multi-modal information is disclosed in U.S. Patent Publication No. 2003-0131362. However, this conventional method compresses a moving-picture at a very slow speed.
An aspect of the present invention provides a moving-picture summarizing apparatus using events, for correctly and quickly summarizing a moving-picture based on its contents using video and audio events.
An aspect of the present invention further provides a moving-picture summarizing method using events, for correctly and quickly summarizing a moving-picture based on its contents using video and audio events.
An aspect of the present invention furthermore provides a computer-readable recording medium storing a computer program for controlling the moving-picture summarizing apparatus using the events.
According to an aspect of the present invention, there is provided a moving-picture summarizing apparatus using events, comprising: a video summarizing unit combining or segmenting shots considering a video event component detected from a video component of a moving-picture, and outputting the combined or segmented result as a segment; and an audio summarizing unit combining or segmenting the segment on the basis of an audio event component detected from an audio component of the moving-picture, and outputting a summarized result of the moving-picture, wherein the video event is an effect inserted where the content of the moving-picture changes and the audio event is the type of sound by which the audio component is identified.
According to another aspect of the present invention, there is provided a moving-picture summarizing method comprising: combining or segmenting shots considering a video event component detected from a video component of a moving-picture, and deciding the combined or segmented result as a segment; and combining or segmenting the segment on the basis of an audio event component detected from an audio component of the moving-picture, and obtaining a summarized result of the moving-picture, wherein the video event is an effect inserted where the content of the moving-picture changes and the audio event is the type of sound by which the audio component is identified.
According to still another aspect of the present invention, there is provided a computer-readable recording medium having embodied thereon a computer program for controlling a moving-picture summarizing apparatus performing a moving-picture summarizing method using events, the method comprises: combining or segmenting shots considering a video event component detected from a video component of a moving-picture, and deciding the combined or segmented result as a segment; and combining or segmenting the segment on the basis of an audio event component detected from an audio component of the moving-picture, and obtaining a summarized result of the moving-picture, wherein the video event is an effect inserted where the content of the moving-picture changes and the audio event is the type of sound by which the audio component is identified.
Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
Alternately, the moving-picture summarizing apparatus shown in
The operations 40 and 42 shown in
Referring to
Referring to
Referring to
After operation 80, the scene transition detector 62 receives the video component of the moving-picture through the input terminal IN3, detects a scene transition portion from the received video component, outputs the scene transition portion to the audio summarizing unit 12 through an output terminal OUT4, creates time and color information of the same scene period using the scene transition portion, and outputs the created time and color information of the same scene period to the video shot combining/segmenting unit 64 (operation 82). Here, the same scene period consists of frames between scene transition portions, that is, a plurality of frames between a frame at which a scene transition occurs and a frame at which a next scene transition occurs. The same scene period is also called a ‘shot’. The scene transition detector 62 selects a single representative image frame or a plurality of representative image frames from each shot, and can output the time and color information of the selected frame(s). The operation performed by the scene transition detector 62, that is, detecting the scene transition portion from the video component of the moving-picture, is disclosed in U.S. Pat. Nos. 5,767,922, 6,137,544, and 6,393,054.
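While the cited patents detail particular detection techniques, a minimal sketch of one common approach, histogram differencing between consecutive frames, may help clarify what the scene transition detector 62 outputs. This is an illustrative assumption, not the patented method; the function names and the threshold are invented for the example:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Quantize an RGB frame (H x W x 3, uint8) into a normalized color histogram."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def detect_scene_transitions(frames, threshold=0.5):
    """Return frame indices where the color distribution jumps, i.e. candidate
    scene transition portions; shots are the runs of frames between them."""
    cuts = []
    prev = color_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = color_histogram(frame)
        # half the L1 distance of two normalized histograms lies in [0, 1]
        if 0.5 * np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```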
According to the present embodiment, operation 82 may be performed before operation 80, or operations 80 and 82 may be performed simultaneously, unlike the order illustrated in
After operation 82, the video shot combining/segmenting unit 64 measures the similarity of the shots received from the scene transition detector 62 using the color information of the shots, combines or segments the shots based on the measured similarity and the video event component received from the video event detector 60, and outputs the combined or segmented result as a segment through an output terminal OUT3 (operation 84).
Referring to
The similarity calculating unit 102 reads a first predetermined number of pieces of color information belonging to a search window from the color information stored in the buffer 100, calculates the color similarity of the shots using the read color information, and outputs the calculated color similarity to the combining unit 104. Here, the size of the search window corresponds to the first predetermined number and can be variously set according to EPG (Electronic Program Guide) information. According to the present embodiment, the similarity calculating unit 102 can calculate the color similarity using the following Equation 1:

Sim(H1, H2) = Σ_{n=1}^{N} min(H1(n), H2(n))    (1)

Here, Sim(H1, H2) represents the color similarity of the two shots H1 and H2 to be compared, received from the scene transition detector 62, H1(n) and H2(n) respectively represent the color histograms of the two shots H1 and H2, N is the number of levels of the histograms, and min(x, y) represents the minimum value of x and y, in accordance with the existing histogram intersection method.
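Under these definitions, Equation 1 is the standard histogram intersection. A minimal sketch follows; the function name is an illustrative assumption:

```python
import numpy as np

def shot_similarity(h1, h2):
    """Equation 1: histogram intersection of two shot color histograms.
    With h1 and h2 each normalized to sum to 1 over N levels, the result
    lies in [0, 1]; larger values mean more similar shots."""
    return float(np.minimum(h1, h2).sum())
```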
The combining unit 104 compares the color similarity calculated by the similarity calculating unit 102 with a threshold value, and combines the two shots in response to the compared result.
The video shot combining/segmenting unit 64 can further include a segmenting unit 106. If a video event component is received through an input terminal IN5, that is, if the result combined by the combining unit 104 contains a video event component, the segmenting unit 106 segments the result combined by the combining unit 104 on the basis of the video event component received from the video event detector 60 and outputs the segmented results as segments through an output terminal OUT5.
According to an embodiment of the present invention, as shown in
According to another embodiment of the present invention, instead of providing the combining unit 104 and the segmenting unit 106 separately, the video shot combining/segmenting unit 64 can include a combining/segmenting unit 108 in which the combining unit 104 is integrated with the segmenting unit 106, as shown in
To facilitate understanding of the present embodiment, it is assumed that the size of the search window, that is, the first predetermined number, is '8'. However, it is to be understood that this is a non-limiting example.
Referring to
For example, the similarity calculating unit 102 can check the similarity of two shots starting from the final buffer of the search window. That is, it is assumed that the similarity calculating unit 102 checks the similarity of two shots by comparing the shot corresponding to the color information stored in the first buffer (B#=1) with the shot corresponding to the color information stored in the eighth buffer (B#=8), then comparing the shot corresponding to the color information stored in the first buffer (B#=1) with the shot corresponding to the color information stored in the seventh buffer (B#=7), then comparing the shot corresponding to the color information stored in the first buffer (B#=1) with the shot corresponding to the color information stored in the sixth buffer (B#=6), and so on.
Under this assumption, the combining/segmenting unit 108 compares the similarity Sim(H1,H8) between the first buffer (B#=1) and the eighth buffer (B#=8), calculated by the similarity calculating unit 102, with a threshold value. If the similarity Sim(H1,H8) is smaller than the threshold value, the combining/segmenting unit 108 determines whether or not the similarity Sim(H1,H7) between the first buffer (B#=1) and the seventh buffer (B#=7), calculated by the similarity calculating unit 102, is greater than the threshold value. If the similarity Sim(H1,H7) is greater than the threshold value, all SIDs corresponding to the first through seventh buffers (B#=1 through B#=7) are set to ‘1’. In this case, similarity comparisons between the first buffer (B#=1) and the sixth through second buffers (B#=6 through B#=2) are not performed. Accordingly, the combining/segmenting unit 108 combines the first through seventh shots.
However, assume that, in order to provide a video event, for example, a fade effect, a black frame is included in the fourth shot. In this case, when a video event component is received from the video event detector 60 through an input terminal IN5, the combining/segmenting unit 108 sets the SIDs of the first buffer (B#=1) through the fourth buffer (B#=4) to ‘1’ and sets the SID of the fifth buffer (B#=5) to ‘2’, as shown in
Then, the combining/segmenting unit 108 checks whether or not to combine or segment shots 5 through 12 belonging to a new search window (that is, a search window 112 shown in
The combining/segmenting unit 108 compares the similarity Sim(H5,H12) between the fifth buffer (B#=5) and the twelfth buffer (B#=12), calculated by the similarity calculating unit 102, with the threshold value. If the similarity Sim(H5,H12) is smaller than the threshold value, the combining/segmenting unit 108 determines whether or not the similarity Sim(H5,H11) between the fifth buffer (B#=5) and the eleventh buffer (B#=11), calculated by the similarity calculating unit 102, is greater than the threshold value. If the similarity Sim(H5,H11) is greater than the threshold value, the combining/segmenting unit 108 sets all SIDs of the fifth buffer (B#=5) through the eleventh buffer (B#=11) to ‘2’ as shown in
The combining/segmenting unit 108 performs the above operation until SIDs for all shots, that is, for all B#s stored in the buffer 100 are obtained using the color information of the shots stored in the buffer 100.
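Gathering the walkthrough above into one place, the combining/segmenting behavior of the combining/segmenting unit 108 can be sketched as follows. This is a hedged reconstruction: the anchor-advance policy, the names, and the threshold are assumptions inferred from the example with the first predetermined number set to 8, not a definitive implementation of the patented method:

```python
import numpy as np

def assign_segment_ids(shot_histograms, event_shots, window=8, threshold=0.5):
    """Assign a segment ID (SID) to every shot.

    Starting from an anchor shot, the farthest shot inside the search
    window whose Equation-1 similarity to the anchor exceeds the threshold
    is merged with the anchor into one segment; a shot carrying a video
    event component (e.g. a black fade frame) cuts the segment at that
    shot. 'window' plays the role of the first predetermined number;
    'event_shots' is a set of 0-based shot indices holding video events.
    """
    n = len(shot_histograms)
    sids = [0] * n
    sid, anchor = 1, 0
    while anchor < n:
        end = anchor  # merge at least the anchor shot itself
        # compare from the far end of the search window back toward the anchor
        for j in range(min(anchor + window, n) - 1, anchor, -1):
            sim = float(np.minimum(shot_histograms[anchor],
                                   shot_histograms[j]).sum())
            if sim > threshold:
                end = j
                break  # closer shots need not be compared
        # a video event inside the merged run forces a boundary at that shot
        for j in range(anchor, end + 1):
            if j in event_shots and j < end:
                end = j
                break
        for j in range(anchor, end + 1):
            sids[j] = sid
        sid += 1
        anchor = end + 1  # the next search window starts after the segment
    return sids
```

With the example above (twelve shot histograms and a black frame in the fourth shot, i.e. event_shots={3} in 0-based indices), the returned SIDs begin [1, 1, 1, 1, 2, ...], matching the described behavior.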
Referring to
Meanwhile, after operation 40, the audio summarizing unit 12 receives the audio component of the moving-picture through an input terminal IN2, detects an audio event component from the received audio component, combines or segments the segments received from the video summarizing unit 10 on the basis of the detected audio event component, and outputs the combined or segmented result as a summarized result of the moving-picture (operation 42). Here, the audio event means the type of sound by which the audio component is identified, and the audio event component may be one of music, speech, environment sound, hand clapping, a shout of joy, clamor, and silence.
The audio characteristic value generator 120 shown in
The frame divider 150 divides the audio component of the moving-picture received through an input terminal IN9 by a predetermined time, for example, into frame units of 24 ms. The feature extractor 152 extracts audio features from the divided frame units. The average/standard deviation calculating unit 154 calculates an average and a standard deviation of the audio features extracted by the feature extractor 152 over a second predetermined number of frames, and outputs the calculated average and standard deviation as audio characteristic values through an output terminal OUT7.
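As a concrete illustration of the frame divider 150, the feature extractor 152, and the average/standard deviation calculating unit 154, the sketch below divides the audio into 24 ms frames as stated in the text; the choice of short-time energy and zero-crossing rate as features, and the group size of 40 frames standing in for the second predetermined number, are assumptions:

```python
import numpy as np

FRAME_MS = 24   # frame unit from the text
GROUP = 40      # the "second predetermined number" of frames (assumed)

def frame_features(samples, rate):
    """Divide the audio signal into 24 ms frames and extract per-frame
    features (short-time energy and zero-crossing rate, as stand-ins for
    the patent's unspecified feature set)."""
    hop = int(rate * FRAME_MS / 1000)
    feats = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        energy = float(np.mean(frame ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return np.array(feats)

def characteristic_values(feats, group=GROUP):
    """Average and standard deviation over each group of frames, mirroring
    the average/standard deviation calculating unit 154."""
    out = []
    for start in range(0, len(feats) - group + 1, group):
        block = feats[start:start + group]
        out.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.array(out)
```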
Conventional methods for generating audio characteristic values from the audio components of a moving-picture are disclosed in U.S. Pat. No. 5,918,223, U.S. Patent Publication No. 2003-0040904, a paper entitled “Audio Feature Extraction and Analysis for Scene Segmentation and Classification,” by Yao Wang and Tsuhan Chen, and a paper entitled “SVM-based Audio Classification for Instructional Video Analysis,” by Ying Li and Chitra Dorai.
Referring to
Conventional methods for detecting audio event components from audio characteristic values use various statistical learning models, such as a GMM (Gaussian Mixture Model), an HMM (Hidden Markov Model), an NN (Neural Network), an SVM (Support Vector Machine), and the like. For example, a conventional method for detecting audio events using an SVM is disclosed in the paper entitled “SVM-based Audio Classification for Instructional Video Analysis,” by Ying Li and Chitra Dorai.
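As an illustration of the SVM alternative, a minimal sketch using scikit-learn follows; the event label set comes from the text above, while the model configuration and all names are assumptions:

```python
from sklearn.svm import SVC

AUDIO_EVENTS = ["music", "speech", "environment sound", "hand clapping",
                "shout of joy", "clamor", "silence"]

def train_audio_event_detector(train_vectors, train_labels):
    """Fit an SVM on labeled audio characteristic-value vectors;
    train_labels holds indices into AUDIO_EVENTS."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(train_vectors, train_labels)
    return clf

def detect_audio_events(clf, vectors):
    """Classify each characteristic value into one of the audio event types."""
    return [AUDIO_EVENTS[i] for i in clf.predict(vectors)]
```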
After operation 142, the recombining/resegmenting unit 124 combines or segments the segments received from the video summarizing unit 10 through an input terminal IN8, using the scene transition portions received from the scene transition detector 62 through the input terminal IN7, on the basis of the audio event components received from the audio event detector 122, and outputs the combined or segmented result as a summarized result of the moving-picture through an output terminal OUT6 (operation 144).
The recombining/resegmenting unit 124 receives segments 160, 162, 164, 166, and 168 as shown in
The recombining/resegmenting unit 124 receives segments 180, 182, 184, 186, and 188 as shown in
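The text states only that the segments are combined or re-segmented on the basis of the detected audio event components; the sketch below shows one plausible policy, merging adjacent segments when a single non-silence audio event continues across their boundary. The policy itself and all names are assumptions, not the patented method:

```python
def recombine_segments(segments, audio_events):
    """Merge adjacent video segments whose shared boundary falls inside one
    continuing audio event (e.g. uninterrupted music).

    segments: list of (start_time, end_time) tuples, in order
    audio_events: list of (start_time, end_time, event_type) tuples
    """
    def event_at(t):
        for start, end, kind in audio_events:
            if start <= t < end:
                return kind
        return "silence"

    merged = [segments[0]]
    for seg in segments[1:]:
        boundary = seg[0]
        # the same audio event continuing across the boundary -> combine
        if event_at(boundary - 1e-3) == event_at(boundary + 1e-3) != "silence":
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged
```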
Meanwhile, according to another embodiment of the present invention, the moving-picture summarizing apparatus shown in
The metadata generator 14 receives the summarized result of the moving-picture from the audio summarizing unit 12, generates metadata of the summarized result of the moving-picture, that is, characteristic data, and outputs the generated metadata and the summarized result of the moving-picture to the storage unit 16. Then, the storage unit 16 stores the metadata generated by the metadata generator 14 and the summarized result of the moving-picture and outputs the stored result through an output terminal OUT2.
According to another embodiment of the present invention, the moving-picture summarizing apparatus shown in
The summarizing buffer 18 buffers the segments received from the video summarizing unit 10 and outputs the buffered result to the display unit 20. To perform this operation, the video summarizing unit 10 outputs the previous segment to the summarizing buffer 18 whenever a new segment is generated. The display unit 20 displays the buffered result received from the summarizing buffer 18 and the audio component of the moving-picture received through the input terminal IN2.
According to the present embodiment, the video components of the moving-picture can include EPG information and video components included in a television broadcast signal, and the audio components of the moving-picture can include EPG information and audio components included in a television broadcast signal.
The video summarizing unit 210, the audio summarizing unit 216, the metadata generator 218, the storage unit 220, the summarizing buffer 212, and the display unit 214, shown in
Referring to
The video decoder 206 decodes the video component received from the MUX 204 and outputs the decoded result as the video component of the moving-picture to the video summarizing unit 210. Likewise, the audio decoder 208 decodes the audio component received from the MUX 204 and outputs the decoded result as the audio component of the moving-picture to the audio summarizing unit 216 and the speaker 215. The speaker 215 provides the audio component of the moving-picture as sound.
The video summarizing unit 318, the audio summarizing unit 324, the metadata generator 326, the storage unit 328, the summarizing buffer 320, and the display unit 322, shown in
The moving-picture summarizing apparatus shown in
As shown in
Meanwhile, the above-described embodiments of the present invention can also be embodied as computer readable codes/instructions/programs on a computer readable recording medium. Examples of the computer readable recording medium include storage media, such as magnetic storage media (for example, ROMs, floppy disks, hard disks, magnetic tapes, etc.), optical reading media (for example, CD-ROMs, DVDs, etc.), carrier waves (for example, transmission through the Internet) and the like. The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
In the moving-picture summarizing apparatus and method using events, and the computer-readable recording medium for controlling the apparatus, according to the above-described embodiments of the present invention, shots can be correctly combined or segmented based on contents using video and audio events, and the first predetermined number can be variously set according to genre on the basis of EPG information, so that a moving-picture can be summarized differentially according to genre. Also, since a moving-picture can be summarized in advance using video events, it is possible to summarize a moving-picture at high speed.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.