The present disclosure is generally related to video signal processing and, more particularly, is related to systems, methods, and computer readable media having programs for classifying sports video.
In recent years, video has become an important component among the various kinds of multimedia. Video refers to moving images together with sound and can be transmitted, received, and stored using a variety of techniques and formats. Video can include many different genres including, but not limited to, episodic programming, movies, music, and sports, among others. End users, editors, viewers, and subscribers may wish to view only selected types of content within each genre. For example, a sports viewer may have great interest in identifying specific types of sporting events within a video stream or clip. Previous methods for classifying sports video have required the analysis of video segments and corresponding motion information. These methods, however, require significant processing resources that may be costly and cumbersome to employ.
Embodiments of the present disclosure provide a system, method, and computer readable medium having a program for classifying sports video. In one embodiment, a system includes: logic configured to collect a plurality of key audio samples from a plurality of types of sports; logic configured to extract a plurality of sample audio features from a plurality of frames within each of the plurality of key audio samples; logic configured to generate a plurality of patterns corresponding to the plurality of key audio samples; logic configured to extract a plurality of audio features from the plurality of frames within an audio stream of a video clip; logic configured to compare the plurality of sample audio features in the plurality of patterns with the plurality of audio features extracted from the audio stream; and logic configured to classify the video clip based on the location and the frequency of the key audio components.
In another embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip.
In a further embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
Other systems and methods will be or become apparent from the following drawings and detailed description.
The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. In the drawings, like reference numerals designate corresponding parts throughout the several views.
Reference will now be made to the drawings. While the disclosure will be provided in connection with these drawings, there is no intent to limit the disclosure to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications, and equivalents.
Beginning with
In block 104, features are extracted for the key audio components. Features can include, but are not limited to, mel-frequency cepstrum coefficients 106, noise frame ratio 107, and pitch 108. Other features that can be used include, for example, LPC coefficients 109, LSP coefficients 111, audio energy 113, and zero-crossing rate 114. The mel-frequency cepstrum coefficients are derived from the known variation of critical bandwidths of the human ear. Filters are spaced linearly at low frequencies and logarithmically at high frequencies, and a compact representation of an audio feature can be produced using coefficients corresponding to each of the bandwidths. These features can be extracted in a frame-by-frame manner. After the features are extracted in block 104, a pattern is built for each key audio component 110. The pattern can include the specific mel-frequency cepstrum coefficients 106 and pitch 108 that are exclusive to the key audio component. A model for each key audio component is trained in block 112 in order to capture the unique audio characteristics of the key audio component without being constrained by the limitations of a particular sample of the key audio component.
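As an illustrative sketch only, the frame-by-frame extraction of mel-frequency cepstrum coefficients described above might be implemented as follows. The disclosure does not specify an implementation; the frame length, hop size, filter count, and sample rate here are assumptions chosen for the example.

```python
import numpy as np

def mel_filterbank(n_filters=20, n_fft=512, sample_rate=16000):
    """Triangular filters spaced linearly at low frequencies and
    logarithmically (mel-scale) at high frequencies."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)

    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc_frames(signal, frame_len=512, hop=256, n_coeffs=13):
    """Extract MFCCs frame by frame, mirroring the extraction of block 104."""
    fb = mel_filterbank(n_fft=frame_len)
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hamming(frame_len)
        power = np.abs(np.fft.rfft(frame)) ** 2
        mel_energy = np.log(fb @ power + 1e-10)
        # DCT-II of the log filterbank energies yields the cepstral coefficients
        n = len(mel_energy)
        dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n) + 0.5) / n)
        coeffs.append(dct @ mel_energy)
    return np.array(coeffs)
```

Each row of the returned array is one frame's compact feature vector, suitable for building the per-component patterns of block 110.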
Reference is now made to
Distribution and frequency characteristics corresponding to a specific sport are matched in block 130 and the video is classified as a particular sports type in block 132. The distribution and frequency characteristics in combination with the identity of the key audio components can be used to specifically classify the sports type. For example, the distribution and frequency of a ball bouncing in a basketball game are distinctive from those occurring in a tennis match. Further, optionally, the video clips can be edited based on the location and distribution of key audio components in block 134. For example, in a football game the time between plays can be edited out of a video clip by retaining the portion of the video segment that occurs starting a few seconds before a helmet collision key audio component and ending a few seconds after a whistle key audio component. One example of an audio distribution coefficient is the tempo of the occurrence of a key audio component, as expressed, for example, in the time domain.
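A minimal sketch of the distribution and frequency statistics discussed above follows. The input timestamp list and the definitions used (occurrences per second for frequency, mean inter-occurrence interval for tempo, interval variance for regularity) are illustrative assumptions rather than quantities the disclosure specifies.

```python
def occurrence_stats(timestamps, clip_duration):
    """Frequency of occurrence and time-domain tempo of one key audio component.

    timestamps: sorted times (seconds) at which the component was detected.
    clip_duration: total length of the clip in seconds.
    """
    if not timestamps:
        return {"frequency": 0.0, "tempo": None, "regularity": None}
    frequency = len(timestamps) / clip_duration  # occurrences per second
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    tempo = sum(intervals) / len(intervals) if intervals else None
    # Low interval variance suggests a regular, rhythmic component (e.g. a
    # bouncing ball); high variance suggests irregular events (e.g. whistles).
    regularity = None
    if intervals:
        regularity = sum((i - tempo) ** 2 for i in intervals) / len(intervals)
    return {"frequency": frequency, "tempo": tempo, "regularity": regularity}
```

These per-component statistics are the kind of input the matching of block 130 could consume.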
Reference is now made to
By way of example, a key audio component 152 may be extracted from a sports video and identified as a whistle. In the absence of any other key audio components 152, if the whistle occurs at a low frequency throughout the event and at irregular intervals, the event may be classified based on the timing and/or regularity of the whistle (e.g. classified as a soccer match). Alternatively, where the video includes a second key audio component 152 such as a ball/floor collision or a stick/puck collision, the sport type 150 can be categorized with greater certainty (e.g. as a basketball or hockey game, respectively). Other sport types 150 may include the same key audio component 152 and only be distinguishable by frequency 154 or distribution 156. For example, the key audio component of a water splash as found in swimming and diving events may differ only in the frequency of the water splash. Alternatively, the swimming event may include a second key audio component 152 such as a starter pistol. In this manner, many different formats of the same sport can be classified through the use of the key audio components 152.
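The swimming/diving example above can be sketched as a small decision rule. The component names and the frequency threshold are hypothetical, chosen only to mirror the discussion; a deployed system would derive such thresholds from training data.

```python
def classify_water_sport(splash_frequency, other_components=()):
    """Distinguish sports that share a water-splash key audio component.

    splash_frequency: water-splash occurrences per minute.
    other_components: any additional key audio components detected.
    """
    # A second key audio component resolves the ambiguity outright.
    if "starter_pistol" in other_components:
        return "swimming"
    # With only the shared splash component, fall back on its frequency:
    # continuous stroke splashes (swimming) occur far more often than the
    # single entry splash per dive attempt.
    return "swimming" if splash_frequency > 10.0 else "diving"
```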
Reference is now made to
Similarly, the audio component sample string of
Reference is now made to
The system 180 further includes logic to extract audio features from an audio stream of a video clip in block 190. A video clip can be a digital or analog streaming video signal or a video stored on any of a variety of storage media types. The audio features from the patterns are compared to the audio features from the audio stream in the logic of block 194. Based on the outcome of the comparison in block 194, the video is classified by sports type in the logic of block 196. In addition to comparing the types of audio components in the patterns and the video clip, the distribution and frequency of occurrence of the key audio components can also be compared.
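One way the comparison of block 194 could work is a sliding-window match of a pattern's per-frame feature vectors against the stream's. This is a sketch under assumed data shapes (arrays of frame feature vectors); the Euclidean-distance measure and the threshold are illustrative choices, since the disclosure leaves the comparison method open (e.g. the trained models of block 112 could be used instead).

```python
import numpy as np

def find_component_occurrences(stream_features, pattern_features, threshold=1.0):
    """Slide the pattern over the stream and report matching frame offsets.

    stream_features: array of shape (n_frames, n_features) from the video clip.
    pattern_features: array of shape (m_frames, n_features) for one component.
    """
    n, m = len(stream_features), len(pattern_features)
    hits = []
    for start in range(n - m + 1):
        window = stream_features[start:start + m]
        # Mean per-frame distance between the window and the pattern
        dist = np.mean(np.linalg.norm(window - pattern_features, axis=1))
        if dist < threshold:
            hits.append(start)
    return hits
```

The resulting offsets give the locations of the key audio components, from which the distribution and frequency of occurrence can be derived for block 196.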
Reference is now made to
The key audio components correspond to specific identifiable audio events that occur within the audio stream of specific sports videos. For example, a key audio component of an auto racing sporting event can be an engine sound. Similarly, a key audio component of a sumo wrestling event can be the unique referee sound made at the beginning of each match. Further, the sound of a whistle may be a key audio component in a basketball game. While the whistle is not exclusive to the game of basketball, the frequency of whistle sounds combined with the distribution of the whistle sounds can be utilized in combination with other key audio components to determine whether or not a sports video is a basketball game. For example, a sports video of a hockey game may include the key audio component of a whistle and a key audio component of a skate/ice engagement sound. Alternatively, a soccer match may include a whistle sound as a key audio component with few or no other key audio components identified. In this case, the frequency of occurrence and the distribution throughout the match would be used to identify the video as a soccer match.
After extracting the key audio components, the method 200 further includes classifying a sport type contained in the video in block 230. By determining the type of key audio components, the frequency of occurrence of each of the key audio components, and the distribution of the key audio components within the video, the video can be classified as a specific sport type.
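The classification of block 230 can be sketched as a small rule table over the detected components. The component names and the rules themselves are hypothetical examples drawn from the hockey/basketball/soccer discussion above, not a specification from the disclosure, and a practical system would also weigh the frequency and distribution statistics rather than component identity alone.

```python
SPORT_RULES = [
    # (required components, forbidden components, sport type)
    ({"whistle", "skate_ice"}, set(), "hockey"),
    ({"whistle", "ball_bounce"}, set(), "basketball"),
    ({"whistle"}, {"skate_ice", "ball_bounce"}, "soccer"),
]

def classify_sport(detected_components):
    """Return the first sport whose required components are all present
    and whose forbidden components are all absent."""
    detected = set(detected_components)
    for required, forbidden, sport in SPORT_RULES:
        if required <= detected and not (forbidden & detected):
            return sport
    return "unknown"
```

Ordering the rules from most to least specific lets a shared component such as the whistle act as a fallback classifier only when no distinguishing second component was found.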
Reference is now made to
The computer-readable medium 300 also includes logic to classify a sports type of the video in block 330. Classification is performed by comparing the key audio components from samples with the key audio component data from the video. In some embodiments a pattern corresponding to each key audio component can be used for the comparison. In this manner, the key audio components can be identified. Additionally, the frequency of occurrence within the video and the distribution throughout the video are determined for classifying the sports type of video. The distribution and frequency information can be further utilized to edit the video.
Embodiments of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. Some embodiments can be implemented in software or firmware that is stored in a memory and that is executed by a suitable instruction execution system. If implemented in hardware, an alternative embodiment can be implemented with any or a combination of the following technologies, which are all well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
Any process descriptions or blocks in flow charts should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of an embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.
A program according to this disclosure that comprises an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can contain, store, communicate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of the present disclosure includes embodying the functionality of the illustrated embodiments of the present disclosure in logic embodied in hardware or software-configured mediums.
It should be emphasized that the above-described embodiments of the present disclosure, particularly, any illustrated embodiments, are merely possible examples of implementations. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure.
Number | Date | Country
---|---|---
20070250777 A1 | Oct 2007 | US