This application claims the priority of Korean Patent Application No. 2004-0095903, filed on Nov. 22, 2004, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to a method of and apparatus for summarizing a sports moving picture, and more particularly, to a method and apparatus for summarizing a sports moving picture based on audio and image data contained in the moving picture.
2. Description of Related Art
In general, an image reproduction apparatus such as a personal video recorder (PVR) has a main function of reproducing a moving picture stored in a storage device on a display screen. In addition, the image reproduction apparatus has an additional function of decoding encrypted image data and outputting the decoded image data. Recently, as techniques for networking, digital data storage devices, image compressing, and image recovering have been greatly developed, the image reproduction apparatuses for reproducing digital images stored in the storage devices have been widely popularized.
In general, a long time, for example, about two hours, is taken to reproduce a moving picture of sports such as a football game. Therefore, there is a need for a function of easily and promptly retrieving, editing, and reproducing only the interesting scenes such as goal and shooting scenes. Such a function is called “moving picture summarizing.”
In a conventional moving picture summarizing method, events such as an attack, a fast break attack, and shooting are detected based on information on color, motion, audio, or the like extracted from a sports moving picture, and then, the moving picture is summarized based on important ones of the detected events. In another conventional moving picture summarizing method, a sports moving picture is segmented into play and non-play shots, and then, the moving picture is summarized by joining only the play shots.
However, in the conventional moving picture summarizing methods, unimportant scenes are inserted into the summarized moving picture, so that reliability of the summarized moving picture may decrease. In addition, the moving picture is not effectively summarized, so that the length of the summarized moving picture may be much longer than a desired length thereof.
An aspect of the present invention provides a method and apparatus for summarizing a sports moving picture based on audio and image data extracted from the sports moving picture.
According to an aspect of the present invention, there is provided a method of summarizing a sports moving picture, including: segmenting the sports moving picture into shots and extracting audio and image data of the segmented shots; calculating a level of importance for each of the shots based on the extracted audio and image data; and selecting important shots of the shots based on the calculated levels of importance of the shots and summarizing the sports moving picture based on the selected important shots.
The calculating of the levels of importance may include: detecting events which occur in each of the shots based on the extracted audio and image data; and calculating the level of importance of the shot based on the detected events of the shots.
The events may include at least one of cheering, whistle, important area, and replay events. In addition, in a case where the sports moving picture is of a football game, hockey, a handball game, or the like, the important area may be the penalty area.
The cheering event may be detected from a shot if an STE (short time energy) of extracted audio data of the shot is more than a specified value. The whistle event may be detected from a shot if a ZCR (zero crossing rate) of extracted audio data of the shot is more than a specified value.
The penalty area may be detected by performing: extracting a long view from views of the shot; extracting white regions from the extracted long view; extracting straight-line regions from the extracted white regions; and detecting the penalty area based on the extracted straight-line regions.
The extracting of the long view may be performed by extracting an image of which field color has an occupation ratio more than a specified value as the long view. The extracting of the white regions may be performed by extracting from the extracted long views a region of which brightness is more than a specified multiple of an average brightness of the extracted long view.
The detecting of the penalty area may be performed by detecting the penalty area from the extracted straight-line regions based on slopes of the extracted straight lines.
The replay event may be detected if a ZCR of a brightness difference between continuous images of the shot is more than a specified value. The replay event may be detected from the shots which follow in a specified time after a specified event occurs.
The level of importance of the shot may be calculated based on weighting factors allocated to the detected events of the shot. The weighting factors may be allocated to the events by a user.
The important shots may be selected so that a sum of playing times of the important shots is shorter than a summarizing time input by a user.
According to another aspect of the present invention, there is provided a method of summarizing a sports moving picture, including: detecting a field color from the sports moving picture; segmenting the sports moving picture into shots and extracting audio and image data of the segmented shots; detecting audio events of the shots based on the extracted audio data and detecting visual events of the shots based on the detected field color and the extracted image data; calculating a level of importance for each of the shots based on the detected audio and visual events; and summarizing the sports moving picture based on the calculated levels of importance of the shots.
The detecting of the field color may include: obtaining color distributions of pixels of images of the sports moving picture for a specified time; detecting a dominant color in which a largest number of pixels are distributed based on the obtained color distributions; and determining adjacent colors in a specified range at a center of the detected dominant color as the field color.
The color distributions may be YUV distributions of the pixels of the images. The field color may be updated every specified time. The field color may be updated every time when a ratio of pixels having the field color to the entire pixels of the image of the sports moving picture is more than a specified value.
According to another aspect of the present invention, there is provided a sports moving picture summarizing apparatus including: a data extraction unit segmenting the sports moving picture into shots and extracting audio and image data of the segmented shots; an event detection unit detecting events of the shots based on the extracted audio and image data; a level-of-importance calculation unit calculating a level of importance for each of the shots based on the detected events; and a summarizing unit selecting important shots of the shots based on the calculated levels of importance of the shots and summarizing the sports moving picture by joining the selected important shots.
The events may include at least one of cheering, whistle, important area, and replay events. In addition, in a case where the sports moving picture is of a football game, hockey, a handball game, or the like, the important area may be the penalty area.
The cheering event may be detected from a shot if an STE (short time energy) of extracted audio data of the shot is more than a specified value. The whistle event may be detected from a shot if a ZCR (zero crossing rate) of extracted audio data of the shot is more than a specified value.
The event detection unit may include: a long view extraction unit extracting a long view from views of the shot; a white region extraction unit extracting white regions from the extracted long view; a straight-line region extraction unit extracting straight-line regions from the extracted white regions; and a penalty area detection unit detecting the penalty area based on the extracted straight-line regions.
The replay event may be detected if a ZCR of a brightness difference between continuous images of the extracted shots is more than a specified value. The replay event may be detected from shots which follow in a specified time after a specified event occurs.
The level-of-importance calculation unit may calculate the levels of importance of the shots based on weighting factors allocated to the detected events of the shots. The weighting factors may be allocated to the events by a user.
The important shots may be selected so that a sum of playing times of the important shots is shorter than a summarizing time input by a user.
The sports moving picture summarizing apparatus may further include a field color detection unit detecting a field color from the sports moving picture. The field color detection unit may include: a color distribution calculation unit obtaining color distributions of pixels of images of the sports moving picture for a specified time; a dominant color detection unit detecting a dominant color in which a largest number of pixels are distributed based on the obtained color distributions; and a field color determination unit determining adjacent colors in a specified range at a center of the detected dominant color as the field color.
The color distributions may be YUV distributions of the pixels of the image. The field color may be updated every specified time. The field color may be updated every time when a ratio of pixels having the field color to the entire pixels of the image of the sports moving picture is more than a specified value.
According to other aspects of the present invention, there are provided computer-readable storage media encoded with processing instructions for causing a processor to perform the aforementioned methods.
According to another aspect of the present invention, there is provided an apparatus including: a data extractor segmenting a moving picture into shots and extracting audio data and image data from the segmented shots; an event detector detecting events occurring in the shots based on the extracted audio and image data; a level-of-importance calculator calculating a level of importance for each of the shots based on the detected events; and a summarizer selecting at least one of the shots based on the calculated levels and summarizing the moving picture based on the at least one selected shot.
Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
The operations of the sports moving picture summarizing apparatus of
The field color detection unit 100 analyzes a received sports moving picture to detect a field color of a field where a sports game is being played (operation 1100). In operation 1100, instead of the field color being detected by the field color detection unit 100, a user may input the field color. Alternatively, field color data attached to the sports moving picture may be used to detect the field color.
The data extraction unit segments the received sports moving picture into shots, that is, images showing the same scene (operation 1110). Next, audio data is extracted from the sports moving picture, so that the audio and image data of the segmented shots are extracted (operation 1120).
The event detection unit 120 detects events of the shots based on the extracted audio and image data of the shots (operation 1130). The visual events detected from the image data include an important area scene event and a replay event. The important area scene event includes, for example, penalty area scene events, central area scene events, and the like, in case of a football game, hockey, a handball game, or the like. The audio events detected from the audio data include a cheering event, a whistle event, and the like. In addition, it is preferable that a function of defining desired events be provided to a user.
The level-of-importance calculation unit 130 receives the event data of the shots input from the event detection unit 120 and calculates levels of importance of the shots based on the events (operation 1140). The levels of importance of the shots may be calculated based on weighting factors allocated to the events. When a cheering event and a penalty area scene event are detected at a first shot, if the weighting factors of 2 and 10 are allocated to the cheering and penalty area scene events, respectively, the level of importance of the first shot becomes 12.
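By way of a non-limiting illustration, the weighted calculation described above may be sketched as follows. The function name, the event labels, and the dictionary layout are assumptions made for this sketch only; the weighting factors 2 and 10 are the ones given in the example above.

```python
# Hypothetical event labels; only the weights 2 and 10 come from the example in the text.
EVENT_WEIGHTS = {"cheering": 2, "penalty_area": 10}


def level_of_importance(detected_events, weights=EVENT_WEIGHTS):
    """Sum the weighting factors of the distinct events detected in one shot.

    Events without an allocated weighting factor contribute 0.
    """
    return sum(weights.get(event, 0) for event in set(detected_events))


first_shot_events = ["cheering", "penalty_area"]
print(level_of_importance(first_shot_events))  # 12
```

A shot in which no weighted event is detected receives a level of importance of 0 under this sketch.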
The weighting factors may be allocated to the events in advance by taking into consideration the levels of importance of the events. Alternatively, the user may allocate the weighting factors to the events or modify the pre-allocated weighting factors. For example, if the user desires to watch only penalty scenes in a moving picture of a football game, an arbitrary weighting factor is allocated to only the whistle event, and the weighting factor of 0 is allocated to the remaining events.
The summarizing unit 140 receives the levels of importance of the shots input from the level-of-importance calculation unit 130, selects important shots based on the levels of importance of the shots (operation 1150), and summarizes the sports moving picture based on the selected shots (operation 1160). In operation 1150, in a case where the user inputs a desired length (time) of the summarized moving picture, the important shots are selected so that a sum of playing times of the important shots is shorter than the desired length (time) of the summarized moving picture input by the user. For example, when the user desires to summarize a football game moving picture having 200 segmented shots into a summarized moving picture having a length of 1 minute, if the sum of playing times of the 20 most important shots is 58 seconds and the sum of the playing times of the 21 most important shots is 1 minute and 5 seconds, the 20 most important shots are selected as the aforementioned important shots. The selected important shots are joined in time sequence to generate a summarized moving picture.
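By way of a non-limiting illustration, the selection of important shots within a desired length may be sketched as follows. The tuple layout (start time, playing time, level of importance) and the function name are assumptions made for this sketch. The sketch stops at the first shot that would make the total reach the desired length, which matches the 20-shot example above; whether less important but shorter shots could then be added is not specified and is not attempted here.

```python
def select_important_shots(shots, target_seconds):
    """Pick the most important shots whose total playing time stays
    shorter than target_seconds, then return them in time order.

    shots: list of (start_time, playing_time, importance) tuples
           (hypothetical layout chosen for this sketch).
    """
    ranked = sorted(shots, key=lambda shot: shot[2], reverse=True)
    selected, total = [], 0
    for shot in ranked:
        if total + shot[1] >= target_seconds:
            break  # the next most important shot no longer fits
        selected.append(shot)
        total += shot[1]
    # Join the selected shots in time sequence.
    return sorted(selected, key=lambda shot: shot[0])


shots = [(0, 30, 5), (40, 20, 9), (70, 15, 7), (90, 10, 1)]
print(select_important_shots(shots, 60))  # [(40, 20, 9), (70, 15, 7)]
```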
The operation of the field color detection unit 100 of
The color distribution calculation unit 200 integrates colors of all pixels belonging to continuous images of the sports moving picture for a specified time t to obtain the color distribution of the pixels (operation 1200). The color distribution is a YUV color distribution.
In order to reduce the amount of data to be processed and the time taken to detect the field color, the view sizes of the images are reduced before the color distribution is obtained in the color distribution calculation unit 200.
The dominant color detection unit 210 receives the information of the color distribution of the pixels input from the color distribution calculation unit 200 and detects a dominant color, that is, a color of a pixel having a largest color distribution (operation 1210). The field color determination unit 220 determines adjacent colors in a specified range at a center of the detected dominant color as the field colors (operation 1220).
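By way of a non-limiting illustration, operations 1200 through 1220 may be sketched as follows. The representation of each image as a flat list of (Y, U, V) tuples, the parameter `radius` standing in for the specified range, and the function names are assumptions made for this sketch; the input is assumed to contain at least one pixel.

```python
from collections import Counter


def detect_field_color(frames, radius=10):
    """Accumulate a YUV histogram over the frames and return the dominant
    color together with the lower and upper bounds of the field-color range.

    frames: iterable of images, each a flat list of (Y, U, V) tuples.
    radius: hypothetical half-width of the range centered on the dominant color.
    """
    histogram = Counter()
    for frame in frames:
        histogram.update(frame)  # count every pixel color
    dominant, _count = histogram.most_common(1)[0]
    low = tuple(c - radius for c in dominant)
    high = tuple(c + radius for c in dominant)
    return dominant, low, high


def is_field_color(pixel, low, high):
    """True if every YUV component of the pixel lies inside the range."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, low, high))


frames = [
    [(120, 90, 100)] * 5 + [(200, 128, 128)] * 2,
    [(120, 90, 100)] * 3 + [(30, 128, 128)],
]
dominant, low, high = detect_field_color(frames)
print(dominant)                                 # (120, 90, 100)
print(is_field_color((125, 95, 95), low, high)) # True
```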
In order to detect the field colors depending on weather, time, and illumination of the field, the field colors are updated by repeatedly performing the aforementioned operations of
Now, an operation of detecting events from the audio and image data by the event detection unit 120 will be described in detail.
The operations of the event detection unit 120 of
The long view extraction unit 600 extracts a long view from views of the shots (operation 1310).
The white region extraction unit 610 extracts white regions from the extracted long views (operation 1130). In the white region extraction unit 610, the white regions are obtained by extracting regions (pixels) whose brightness is more than a specified multiple of an average brightness of the extracted long views. For example, the white regions are formed of the pixels whose brightness is more than 1.2 times the average brightness of the extracted long views.
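By way of a non-limiting illustration, the brightness test used to extract the white regions may be sketched as follows. The representation of a long view as a two-dimensional list of brightness values and the function name are assumptions made for this sketch; the multiple of 1.2 is the example value given above.

```python
def extract_white_pixels(luma, multiple=1.2):
    """Return a boolean mask marking pixels brighter than
    `multiple` times the average brightness of the long view.

    luma: non-empty 2-D list of brightness values (hypothetical layout).
    """
    values = [value for row in luma for value in row]
    threshold = multiple * sum(values) / len(values)
    return [[value > threshold for value in row] for row in luma]


long_view = [
    [50, 50, 50],
    [50, 200, 50],
]
# Average brightness is 75, so the threshold is 90.
print(extract_white_pixels(long_view))
# [[False, False, False], [False, True, False]]
```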
The straight-line region extraction unit 620 extracts straight-line regions from the extracted white regions (operation 1320). The straight-line region extraction unit 620 extracts the straight-line regions based on a Hough transformation scheme. In the Hough transformation scheme, a set of points, of which two points constitute a straight line having a slope larger than a specified value, is extracted as a straight-line region.
The penalty area detection unit 630 detects the penalty area based on the extracted straight-line regions (operation 1130).
In the event detection unit 120, the ZCRs of the brightness differences between the consecutive images belonging to the shots are obtained, and if the ZCRs are more than a specified value, it is determined that the replay events occur in the associated shots. The ZCR can be calculated by using Equation 1.
Here, Zc, t, θ, L, D, and f denote a ZCR, a time, a specified threshold value, a length of a normalized window of each of the images, an intensity difference between the images, and the number of repeating images, respectively. In the event detection unit 120, it is determined whether the replay events occur in a specified time, for example, in 2 minutes after a specific event (for example, a penalty event) occurs.
Now, an operation of detecting a whistle event from the aforementioned audio data of the sports moving picture by the event detection unit 120 will be described. In general, a whistle has a large ZCR like a voice which is generated by a vibration of a vocal cord of a human. In the event detection unit 120, the ZCRs of the audio data of the shots are calculated by using Equation 2, and if the ZCRs are more than a specified value, it is determined that the whistle events occur in the associated shots.
Here, Zc, w(m-n), s(n), and N denote a ZCR, a normalized window function of audio data, a size of n-th audio data, and the number of audio data samples.
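By way of a non-limiting illustration, the whistle test may be sketched as follows. Because Equation 2 is not reproduced here, the sketch assumes the commonly used short-time zero crossing rate with a rectangular window normalized by the number of samples N, which may differ from the equation intended in this description. The function names and the threshold value are likewise assumptions.

```python
def sign(x):
    """Sign convention assumed for this sketch: zero counts as positive."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(samples):
    """Number of sign changes between adjacent samples, divided by N
    (rectangular window of weight 1/N, an assumption of this sketch)."""
    n = len(samples)
    if n < 2:
        return 0.0
    crossings = sum(
        abs(sign(samples[i]) - sign(samples[i - 1])) // 2
        for i in range(1, n)
    )
    return crossings / n


def has_whistle(samples, threshold=0.3):
    """Hypothetical threshold; the text only says 'a specified value'."""
    return zero_crossing_rate(samples) > threshold


print(zero_crossing_rate([1, -1, 1, -1]))  # 0.75
print(has_whistle([1, 1, 1, 1]))           # False
```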
Now, an operation of detecting a cheering event from the aforementioned audio data of the sports moving picture by the event detection unit 120 will be described. In the event detection unit 120, the short time energy (STE) of the audio data of each of the shots is calculated by using Equation 3, and if the STE is more than a specified value, it is determined that the cheering event occurs in the associated shot.
Here, Es, w(m-n), s(n), and N denote an STE of audio data, a normalized window function of audio data, a size of n-th audio data, and the number of audio data samples.
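By way of a non-limiting illustration, the cheering test may be sketched as follows. Because Equation 3 is not reproduced here, the sketch assumes the commonly used short-time energy with a rectangular window, that is, the mean of the squared sample values over N samples, which may differ from the equation intended in this description. The function names are assumptions, and the threshold is left to the caller because the text only refers to a specified value.

```python
def short_time_energy(samples):
    """Mean of squared sample values over the window
    (rectangular window of weight 1/N, an assumption of this sketch)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def has_cheering(samples, threshold):
    """True if the short-time energy exceeds the specified value."""
    return short_time_energy(samples) > threshold


print(short_time_energy([1, -1, 2, -2]))  # 2.5
print(has_cheering([1, -1, 2, -2], 2.0))  # True
```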
In a method and apparatus for summarizing a sports moving picture according to the above-described embodiments of the present invention, the sports moving picture is summarized by calculating levels of importance of shots in the sports moving picture based on extracted audio and image data and selecting important shots based on the calculated levels of importance, so that reliability of the summarized sports moving picture can increase and a user can generate a summarized sports moving picture of a desired length.
The above-described embodiments of the present invention can also be embodied as computer readable codes on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2004-0095903 | Nov 2004 | KR | national |