The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The exemplary embodiments are described below in order to explain the present invention by referring to the figures.
Referring to
The event detection unit 110 detects a video event and an audio event from multimedia contents. Specifically, the video event is generated from at least one of a scene transition part and a contents change part of the multimedia contents, and the audio event is generated according to an auditory component change.
The event detection unit 110 detects the video event by referring to shot information corresponding to a shot extracted from a video signal of the multimedia contents. The shot information may include at least one of shot time information and shot color information corresponding to the shot. In this specification, a shot indicates a predetermined multimedia frame section captured by a single camera movement when recording the multimedia, and serves as a basic processing unit for dividing the multimedia contents into scenes.
Also, as an embodiment of the present invention, the video event, detected from the event detection unit 110, is generated according to application of a GT effect. The GT effect indicates a graphic effect which is intentionally inserted into a transition part of the multimedia contents. Therefore, the point where the GT effect is applied is considered to be where a contents change has occurred in the transition part of the multimedia contents. As an example, the GT effect may include at least one of a fade effect, a dissolve effect, and a wipe effect. Generally, the fade effect exists between a frame to be faded-in and a frame to be faded-out, and a single-color frame exists at the center of these frames.
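The single-color center frame of a fade can be located by checking how far each pixel deviates from the frame's mean color. The following is a minimal illustrative sketch, not the patented implementation; the function names, the tolerance value, and the frame representation (numpy arrays of RGB values) are all assumptions introduced here.

```python
import numpy as np

def is_monochrome(frame, tol=8.0):
    """True if the frame is close to a single color, e.g. the
    single-color frame at the center of a fade transition.
    `tol` is an assumed threshold on mean absolute deviation."""
    mean_color = frame.reshape(-1, frame.shape[-1]).mean(axis=0)
    deviation = np.abs(frame - mean_color).mean()
    return deviation < tol

def detect_fade_points(frames, tol=8.0):
    """Return indices of frames that look like the single-color
    center frame of a fade (a candidate GT-effect location)."""
    return [i for i, f in enumerate(frames) if is_monochrome(f, tol)]
```

In this sketch a black frame inserted between two ordinary frames would be flagged as a candidate fade point; a dissolve or wipe would need a different test, such as frame-difference analysis.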
Referring to
Also, as another embodiment of the present invention, the event detection unit 110 calculates an average and a standard deviation of an audio feature, corresponding to each frame, using audio features extracted, per predetermined frame, from an audio signal of the multimedia contents, and detects the audio event using the calculated average and standard deviation of the audio feature. The audio feature may include at least one of a Mel-frequency cepstral coefficient (MFCC), a spectral flux, a centroid, a rolloff, a Zero Crossing Rate (ZCR), an energy, and a pitch.
Specifically, the event detection unit 110 generates an audio feature value using the calculated average and the standard deviation of the audio feature, and detects the audio event, generated according to the auditory component change, by dividing the audio features using the audio feature value.
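The per-frame average and standard deviation described above can be sketched as follows. This is an illustrative example only: it computes two of the named features (short-time energy and ZCR) with plain numpy, since MFCC extraction would require a DSP library, and it flags an auditory component change wherever a frame's energy deviates from the average by more than an assumed number of standard deviations. The frame length and z-score threshold are assumptions, not values from the text.

```python
import numpy as np

def frame_features(signal, frame_len=512):
    """Per-frame short-time energy and zero-crossing rate, two of
    the audio features named above."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return energy, zcr

def detect_audio_events(signal, frame_len=512, z_thresh=2.0):
    """Flag frames whose energy deviates from the average by more
    than z_thresh standard deviations -- a simple stand-in for
    detecting an auditory component change."""
    energy, _ = frame_features(signal, frame_len)
    mu, sigma = energy.mean(), energy.std()
    if sigma == 0:
        return []
    return [i for i, e in enumerate(energy) if abs(e - mu) / sigma > z_thresh]
```

For example, a loud burst in an otherwise quiet signal would be returned as a single flagged frame index.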
The segment generation unit 120 generates at least one segment by dividing or merging at least one shot which forms the multimedia contents, by referring to the video event.
Referring to
The shot color information reader 310 reads shot color information which is included in a predetermined search window size, from an event buffer, the event buffer recording the shot color information corresponding to a shot included in the video event. As an example, the search window size may be determined by an electronic program guide (EPG).
The similar shot color detection unit 320 calculates a similarity between the read shot color information using Equation 1 below, and detects similar shot color information using the calculated similarity.
The segment merging unit 330 merges the similar shot color information to generate a segment.
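The three units above can be sketched together as follows. Equation 1 is not reproduced in this text, so histogram intersection is used here purely as a stand-in similarity measure; the function names, the window size, and the similarity threshold are all assumptions introduced for illustration.

```python
import numpy as np

def hist_similarity(h1, h2):
    """Histogram intersection in [0, 1] -- an assumed stand-in for
    Equation 1, which is not reproduced in this text."""
    return np.minimum(h1, h2).sum() / max(h1.sum(), h2.sum())

def merge_similar_shots(histograms, window=4, threshold=0.7):
    """Greedily merge each shot with later shots inside the search
    window whose color histograms are similar to the segment's first
    shot; returns segments as lists of shot indices."""
    segments, current = [], [0]
    for i in range(1, len(histograms)):
        anchor = current[0]
        if i - anchor < window and hist_similarity(
                histograms[anchor], histograms[i]) >= threshold:
            current.append(i)
        else:
            segments.append(current)
            current = [i]
    segments.append(current)
    return segments
```

Two shots with nearly identical color histograms would fall into one segment, while a shot with a disjoint histogram starts a new one.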
Referring to
Initially, the segment generation unit 120 of
As illustrated in part II of
More specifically, the shot color information reader 310 reads the shot color information of the at least one shot included in the search window size 410, and the similar shot color detection unit 320 of
In this case, the similar shot color detection unit 320 of
As another example, when a frame where the fade effect, i.e. the GT effect, has been applied is included in a fourth buffer B# 4 as illustrated in
Referring back to
Referring to
The event feature extraction unit 510 extracts event feature information with respect to a video event and an audio event corresponding to the segment.
As an embodiment of the present invention, the event feature information with respect to the video event corresponds to a shot change rate of the video event, and the shot change rate of the video event is calculated using Equation 2 below.
As another embodiment of the present invention, the event feature information with respect to the audio event corresponds to an audio signal energy, and the audio signal energy is calculated using Equation 3 below.
As still another embodiment of the present invention, the event feature information corresponds to a music class ratio of the audio event, and the music class ratio is calculated using Equations 4 and 5 below.
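Equations 2 through 5 are not reproduced in this text, so the three event features can only be sketched under plausible readings: shots per second for the shot change rate, mean squared amplitude for the audio signal energy, and the fraction of audio frames classified as music for the music class ratio. The function names and definitions below are assumptions for illustration, not the patented formulas.

```python
def shot_change_rate(num_shots, duration_sec):
    """Shots per second within a segment -- an assumed reading of
    Equation 2, which is not reproduced here."""
    return num_shots / duration_sec

def audio_energy(samples):
    """Mean squared amplitude of the segment's audio samples
    (cf. Equation 3, not reproduced here)."""
    return sum(s * s for s in samples) / len(samples)

def music_class_ratio(frame_classes):
    """Fraction of audio frames labeled as music
    (cf. Equations 4 and 5, not reproduced here)."""
    return frame_classes.count("music") / len(frame_classes)
```

For example, a segment of 10 shots over 5 seconds has an assumed shot change rate of 2 shots per second.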
The uprush calculation unit 520 calculates the uprush degree corresponding to each of the segments using the event feature information.
The selection unit 530 selects a segment whose uprush degree is greater than a predetermined level according to the calculated uprush degree.
As an example of the selection unit 530, the selection unit 530 selects a segment whose uprush degree is greater than the predetermined level by applying a weight to at least any one of the shot change rate, the audio signal energy, and the music class ratio of the audio event. As an example, when it is determined that the music class ratio of the audio event is important, the selection unit 530 selects the segment by applying the weights, e.g. 5:2:3, with respect to the shot change rate, the audio signal energy, and the music class ratio of the audio event. As another example of the selection unit 530, the selection unit 530 selects the segment according to at least any one of a user's request, a type of the multimedia contents, and a desired time. As an example, when the multimedia contents is an action movie, since the shot change rate, the audio signal energy, and the music class ratio of the audio event are all important, the selection unit 530 selects the segment by applying the weights, e.g. 4:3:3, with respect to the shot change rate, the audio signal energy, and the music class ratio of the audio event.
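The weighted selection described above can be sketched as a normalized weighted sum. This is an illustrative assumption about how the weights combine the three features; the 5:2:3 weighting mirrors the example in the text, and the features are assumed to be pre-normalized to [0, 1] and the selection level is an assumed value.

```python
def uprush_degree(features, weights=(5, 2, 3)):
    """Weighted combination of (shot change rate, audio signal
    energy, music class ratio), normalized by the weight total.
    Features are assumed pre-normalized to [0, 1]."""
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, features)) / total

def select_segments(segment_features, level=0.5, weights=(5, 2, 3)):
    """Return indices of segments whose uprush degree is greater
    than the predetermined level."""
    return [i for i, f in enumerate(segment_features)
            if uprush_degree(f, weights) > level]
```

Under this sketch, a segment with high values on all three features clears the level while a uniformly low segment does not; changing the weights to, e.g., 4:3:3 shifts emphasis toward the audio features as in the action-movie example.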
Referring back to
Referring to
As an example of operation S610, the video event may be detected by referring to shot information, the shot information corresponding to a shot which is extracted from a video signal of the multimedia contents. The shot information may include at least any one of shot time information and shot color information corresponding to the shot.
As an embodiment of the present invention, the video event may be generated according to application of a GT effect. The GT effect indicates a graphic effect which is intentionally inserted into a transition part of the multimedia contents. Therefore, the point where the GT effect is applied is considered to be where a contents change has occurred in the transition part of the multimedia contents. As an example, the GT effect may include at least one of a fade effect, a dissolve effect and a wipe effect.
As another example of operation S610, an average and a standard deviation of an audio feature corresponding to each frame are calculated using audio features extracted, per predetermined frame, from an audio signal of the multimedia contents, and the audio event is detected using the calculated average and standard deviation of the audio feature. As an example, the audio feature may include at least any one of a Mel-frequency cepstral coefficient (MFCC), a spectral flux, a centroid, a rolloff, a Zero Crossing Rate (ZCR), an energy, and a pitch.
In operation S620, the summary clip generation method generates at least one segment by dividing or merging at least one shot which forms the multimedia contents, by referring to the video event.
Referring to
In operation S720, the summary clip generation method calculates a similarity between the read shot color information using Equation 1 below, and detects similar shot color information using the calculated similarity.
In operation S730, the summary clip generation method generates a segment by merging the similar shot color information.
Referring back to
Referring to
As an embodiment of the present invention, the event feature information with respect to the video event corresponds to a shot change rate of the video event, and the shot change rate of the video event is calculated using Equation 2 below.
As another embodiment of the present invention, the event feature information with respect to the audio event corresponds to an audio signal energy, and the audio signal energy is calculated using Equation 3 below.
As still another embodiment of the present invention, the event feature information corresponds to a music class ratio of the audio event, and the music class ratio is calculated using Equations 4 and 5 below.
Also, in operation S820, the summary clip generation method calculates the uprush degree corresponding to each of the segments using the event feature information.
Also, in operation S830, the summary clip generation method selects a segment whose uprush degree is greater than a predetermined level according to the calculated uprush degree.
As an example of operation S830, the summary clip generation method selects a segment whose uprush degree is greater than the predetermined level by applying a weight to at least any one of the shot change rate, the audio signal energy, and the music class ratio of the audio event. As another example of operation S830, the segment is selected according to at least any one of a user's request, a type of multimedia contents, and a desired time.
Referring back to
Hereinafter, a detailed description will be omitted since the summary clip generation method according to the present invention is similar to the method described above, and the aforementioned embodiments from
The summary clip generation method according to the above-described embodiment of the present invention may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVD; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. The media may also be a transmission medium such as optical or metallic lines, wave guides, and the like, including a carrier wave transmitting signals specifying the program instructions, data structures, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments of the present invention.
According to the present invention, there is provided a summary clip generation system and a summary clip generation method which can generate a summary clip of multimedia contents using uprush degree of at least one segment which is calculated by dividing or merging a shot forming the multimedia contents.
Also, according to the present invention, there is provided a summary clip generation method which can satisfy a user's need since a summary clip is generated by selecting a segment according to a user's requirements or a type of multimedia contents.
Also, according to the present invention, there is provided a summary clip generation method which can accurately extract a highlight portion since a summary clip of multimedia contents is generated using a shot change rate, an audio signal energy, and a music class ratio.
Although a few exemplary embodiments of the present invention have been shown and described, the present invention is not limited to the described exemplary embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these exemplary embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
10-2006-0079788 | Aug 2006 | KR | national