Systems and methods for analyzing video content

Abstract
Disclosed are systems, methods, and computer readable media having programs for analyzing video. In one embodiment, a method includes: detecting a plurality of whistle sounds in an audio stream of a video; and determining a video content based on a plurality of properties corresponding to the plurality of whistle sounds. In one embodiment a computer readable medium having a computer program for analyzing video includes: logic configured to generate a plurality of whistle sound patterns; logic configured to detect a whistle sound in a video; and logic configured to analyze the video using the whistle sound.
Description
TECHNICAL FIELD

The present disclosure is generally related to video signal processing and, more particularly, is related to systems, methods, and computer readable media having programs for analyzing the content of video.


BACKGROUND

In recent years, video has become an important component of multimedia. Video refers to moving images together with sound and can be transmitted, received, and stored using a variety of techniques and formats. Video can include many different genres including, but not limited to, episodic programming, movies, music, and sports, among others. End users, editors, viewers, and subscribers may wish to view only selected types of content within each genre. For example, a sports viewer may have great interest in identifying specific types of sporting events within a video stream or clip. Previous methods for classifying sports video have required the analysis of video segments and corresponding motion information. These methods, however, require significant processing resources that may be costly and cumbersome to employ.


SUMMARY

Embodiments of the present disclosure provide a system, method and computer readable medium having a program for analyzing video content. In one embodiment a system includes: logic configured to collect sample whistle sounds corresponding to a plurality of sport types; logic configured to determine a plurality of sample whistle features; logic configured to generate a plurality of whistle sound patterns; logic configured to extract a plurality of audio features corresponding to a plurality of frames in a video; logic configured to compare the plurality of sample whistle features with the plurality of audio features to determine a plurality of whistle sounds in the video; logic configured to determine a sport type using a type of whistle indicator; logic configured to determine a sport type using a quantity of whistle occurrences data value; and logic configured to determine a sport type using a time of whistle occurrences data set.


In another embodiment, a method includes: detecting a plurality of whistle sounds in an audio stream of a video; and determining a video content based on a plurality of properties corresponding to the plurality of whistle sounds.


In a further embodiment, a computer readable medium having a computer program for analyzing video includes: logic configured to generate a plurality of whistle sound patterns; logic configured to detect a whistle sound in a video; and logic configured to analyze the video using the whistle sound.


Other systems and methods will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description.




BRIEF DESCRIPTION OF THE DRAWINGS

Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.



FIG. 1 is a block diagram illustrating an embodiment of building whistle patterns for use in analyzing video.



FIG. 2 is a block diagram illustrating an embodiment that uses the patterns of FIG. 1 to analyze video.



FIG. 3 is a table illustrating exemplary embodiments of sports types as related to whistle sounds.



FIGS. 4A-4C are diagrams illustrating audio sample strings with whistle sounds corresponding to different sports types.



FIGS. 5A and 5B are diagrams illustrating audio strings with whistle sounds corresponding to entire events of two different sports types.



FIG. 6 is a block diagram illustrating an embodiment of a system for analyzing video.



FIG. 7 is a block diagram illustrating an embodiment of a method for analyzing video.



FIG. 8 is a block diagram illustrating an embodiment of a computer readable medium having a program for analyzing video.




DETAILED DESCRIPTION

Having summarized various aspects of the present disclosure, reference will now be made in detail to the description of the disclosure as illustrated in the drawings. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims.


Beginning with FIG. 1, illustrated is a block diagram of an embodiment for building whistle patterns for analyzing video. The patterns can include patterns of one or more data features for whistle sounds. The patterns can be compared to the data features of video clips to determine the whistle sounds present in the video. In building the patterns, whistle sound samples are collected for different sports in block 102. The whistle sound samples can be collected for any number of sports including, but not limited to, football, soccer, basketball, lacrosse, hockey, and field hockey, among others. Examples of how whistles are used within these types of sports can include, for example, starting and stopping plays, signaling the start and end of periods of play, fouls, penalties, and time-outs, among others.


In block 104, features of sample whistle sounds are extracted from an audio sample in a frame-by-frame manner. Features can include, but are not limited to, mel-frequency cepstrum coefficients 106, noise frame ratio 107, and pitch 108. Other features that can be used include LPC coefficients 109, LSP coefficients 111, audio energy 113, and zero-crossing rate 114. The mel-frequency cepstrum coefficients are derived from the known variation of critical bandwidths of the human ear. Filters are spaced linearly at low frequencies and logarithmically at high frequencies, and a compact representation of an audio feature can be produced using coefficients corresponding to each of the bandwidths. After the features are extracted in block 104, a whistle sound pattern is built for whistles corresponding to each of the different sports 110. The pattern can include the specific mel-frequency cepstrum coefficients 106 and pitch 108 that are statistically exclusive to the whistles used in different sport types.
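By way of a non-limiting illustration, frame-by-frame extraction of two of the named features, audio energy and zero-crossing rate, might be sketched as follows. The function names, the non-overlapping framing, the 8000 Hz sample rate, and the synthetic sine-wave inputs are assumptions made for this sketch only and are not taken from the disclosure.

```python
import math

def frame_features(samples, frame_size):
    """Return one (energy, zero_crossing_rate) pair per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Audio energy: mean of the squared sample values in the frame.
        energy = sum(x * x for x in frame) / frame_size
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_size - 1)))
    return features

def sine(freq_hz, sample_rate, count):
    """Synthetic test tone standing in for real audio (illustrative only)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

SAMPLE_RATE = 8000  # assumed sample rate in Hz
low = frame_features(sine(200, SAMPLE_RATE, 400), 400)[0]
high = frame_features(sine(3000, SAMPLE_RATE, 400), 400)[0]
```

In this sketch the two tones have the same energy, while the higher-pitched tone yields a markedly higher zero-crossing rate, which is the kind of per-frame difference a pattern could capture.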


Reference is now made to FIG. 2, which is a functional block diagram illustrating use of the patterns of FIG. 1 to analyze video. A video is input in block 120 and the sound features are extracted from the video clip in block 122. The video clip can be a digital or analog streaming video signal or a video stored on a variety of storage media types. For example, the video can be stored in solid state hardware or on magnetic or optical storage media using analog or digital technology. The extracted sound features are compared to whistle sound patterns 126, in block 124. The occurrences of whistles in the video are determined in block 128.
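As a hedged sketch of blocks 124 and 128, the comparison can be reduced to checking whether a per-frame pitch falls within a range stored in a pattern, and then grouping consecutive matching frames into whistle occurrences. The function name, the use of pitch alone, the pitch range, and the minimum run length are assumptions for illustration; an embodiment may compare other features or use other matching criteria.

```python
def detect_whistles(frame_pitches_hz, pitch_range_hz, frame_seconds, min_frames=3):
    """Return (start_time, duration) for each run of frames matching the pattern.

    frame_pitches_hz: one pitch estimate per frame, or None where no pitch was found.
    pitch_range_hz:   (low, high) range taken from a whistle sound pattern.
    """
    low, high = pitch_range_hz
    occurrences = []
    run_start = None
    # A trailing None closes any run that is still open at the end of the audio.
    for index, pitch in enumerate(list(frame_pitches_hz) + [None]):
        matches = pitch is not None and low <= pitch <= high
        if matches and run_start is None:
            run_start = index
        elif not matches and run_start is not None:
            length = index - run_start
            # Very short runs are treated as noise rather than whistles.
            if length >= min_frames:
                occurrences.append((run_start * frame_seconds, length * frame_seconds))
            run_start = None
    return occurrences

# Illustrative pitch track with 0.5 second frames (values are made up).
pitches = [300, 3900, 3950, 4000, 310, 3900, 280, 3950, 3900, 3900, 3950]
found = detect_whistles(pitches, (3500, 4500), frame_seconds=0.5)
```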


A sports type is determined in block 130 based on whistle occurrences. For example, by analyzing whistle occurrence characteristics such as the quantity of whistles and the time between each of the whistles or groups of whistles, it can be determined that the video is a soccer match. Further, optionally, the video clips can be manipulated based on the whistle information in block 132. For example, in a football game the time between plays can be edited out of a video by retaining the portion of the video segment that occurs starting a few seconds before a whistle sound that is determined to be a play-ending whistle. Similar periods of non-play can be edited out by identifying the halftime based on the lack of whistle sounds.
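One possible way to express the editing of block 132 is to compute the time ranges to retain around each play-ending whistle and to merge ranges that overlap. The function name, the lead and tail parameters, and the example times below are hypothetical and serve only to illustrate the idea.

```python
def segments_to_keep(whistle_times, lead_seconds, tail_seconds=0.0):
    """Return merged (start, end) ranges covering the time before each whistle.

    lead_seconds: how far before each play-ending whistle the retained range begins.
    tail_seconds: optional margin kept after the whistle (an assumption of this sketch).
    """
    segments = []
    for t in sorted(whistle_times):
        start = max(0.0, t - lead_seconds)
        end = t + tail_seconds
        if segments and start <= segments[-1][1]:
            # Overlaps the previous range, so extend it instead of adding a new one.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments

# Whistles at 10 s, 14 s and 60 s; keep 8 s before and 1 s after each.
kept = segments_to_keep([10.0, 14.0, 60.0], lead_seconds=8.0, tail_seconds=1.0)
```

Everything outside the returned ranges, such as the time between plays, would be a candidate for removal.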


Reference is now made to FIG. 3, which is a table illustrating exemplary embodiments of sports types as related to whistle sounds. The table includes a column for sports type 150, which features an example of a variety of different sports that can be classified under the methods and systems herein. The table also includes a whistle type column 152 that can list the sport-specific attributes of whistle sounds corresponding to the sports type of column 150. For example, whistle type can include characteristics describing the tonal frequency or pitch of whistles used in a particular sport. The whistle type can also include characteristics describing the average duration of a whistle sound as it is used in a particular sport. The table also includes a quantity column 154, which includes a quantity of whistle sounds that are likely to occur in a particular type listed in column 150.


Similarly, the table also includes a relative occurrence time column 156 that describes a distribution of the whistle sounds in a typical event listed in column 150. One example of a relative occurrence time that can be specific to each sport is the beginning and ending of a period of play. The entries in the relative occurrence time column describe, for example, the structure of play corresponding to the sports types in column 150. By analyzing the relative occurrence time of the whistles, the number and duration of play periods can be determined. The structure of play can be used to determine the sports type.
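For illustration only, the number and duration of play periods might be estimated by splitting the sequence of whistle times wherever the gap between consecutive whistles exceeds a break threshold. The function name, the threshold, and the example times are assumptions of this sketch rather than values from the disclosure.

```python
def play_periods(whistle_times, break_seconds):
    """Group whistle times into periods separated by gaps longer than break_seconds.

    Returns a list of (first_whistle_time, last_whistle_time), one per period.
    """
    times = sorted(whistle_times)
    if not times:
        return []
    periods = []
    start = previous = times[0]
    for t in times[1:]:
        if t - previous > break_seconds:
            # A long silence between whistles is treated as a break in play.
            periods.append((start, previous))
            start = t
        previous = t
    periods.append((start, previous))
    return periods

# Made-up whistle times in seconds with one long gap between 1200 s and 2700 s.
periods = play_periods([0, 600, 1200, 2700, 3300, 3900], break_seconds=900)
period_count = len(periods)
durations = [end - start for start, end in periods]
```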


Another example of a relative occurrence time that can be specific to a particular sport is the length of an individual play in, for example, a football game. A whistle is sounded, for example, at the end of a play in a football game. The next end-of-play whistle is likely to occur within a few seconds, in the case of a rushed down and a short play, or after a greater number of seconds in the circumstance where a team uses the entire play clock before executing the next play.


By way of example, the quantity and relative occurrence time of whistle sounds may be used to determine the sports type in the absence of a distinctive whistle type 152. In the case where the whistle occurs a high quantity 154 of times throughout the event and in two periods of play 156, the event may be classified based on the quantity and/or relative occurrence times of the whistle (e.g. classified as a basketball game).


Alternatively, where the whistle occurs a high quantity 154 of times throughout the event and in four periods of play 156, the event may be classified based on the quantity and/or relative occurrence times or rhythms of the whistle (e.g. classified as a football game). Many sport types 150 may include the same or indistinguishable whistle types 152 and only be distinguishable by quantity 154 and relative occurrence time 156.
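The two examples above can be sketched as a lookup keyed on the quantity category and the number of play periods. The rule table contains only the two cases described in the preceding paragraphs; the function names, the numeric thresholds used to derive a category, and the "unknown" fallback are hypothetical.

```python
# (quantity category, number of play periods) -> sports type, per the examples above.
RULES = {
    ("high", 2): "basketball",
    ("high", 4): "football",
}

def quantity_category(whistle_count, low_max, high_min):
    """Map a whistle count to 'low', 'medium' or 'high' using caller-supplied thresholds."""
    if whistle_count <= low_max:
        return "low"
    if whistle_count >= high_min:
        return "high"
    return "medium"

def classify(whistle_count, period_count, low_max, high_min):
    """Return a sports type, or 'unknown' when no rule matches."""
    category = quantity_category(whistle_count, low_max, high_min)
    return RULES.get((category, period_count), "unknown")

# Thresholds and counts here are placeholders, not measured values.
example_two_periods = classify(150, 2, low_max=40, high_min=100)
example_four_periods = classify(150, 4, low_max=40, high_min=100)
```

A table-driven form keeps the rules separate from the logic, so further rows corresponding to other entries of FIG. 3 could be added without changing the functions.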


Additionally, while the quantity, for example, is depicted as being described in terms of categories such as high, medium, and low, the quantity can also be evaluated and determined in numerical terms. Such terms can be determined based on statistical or numeric techniques and can include values such as median, mean, and standard deviation, among others. All applicable statistical or numerical techniques are contemplated within the scope and spirit of this disclosure.
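As one hedged example of a numerical treatment, whistle counts from sample events can be summarized by mean, median, and standard deviation, and a new count can be assigned to the sports type whose mean is nearest in units of standard deviation. The sport labels and counts below are placeholders invented for this sketch, and the nearest-mean rule is only one of many applicable techniques.

```python
import statistics

def summarize(counts):
    """Summarize the whistle counts observed in sample events of one sports type."""
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "stdev": statistics.stdev(counts),
    }

def closest_sport(count, summaries):
    """Pick the sports type whose mean count is nearest, measured in standard deviations."""
    def distance(sport):
        summary = summaries[sport]
        return abs(count - summary["mean"]) / summary["stdev"]
    return min(summaries, key=distance)

# Placeholder sample counts; real values would come from collected sample events.
sample_counts = {"sport_a": [100, 110, 120], "sport_b": [20, 30, 40]}
summaries = {sport: summarize(counts) for sport, counts in sample_counts.items()}
```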


Reference is now made to FIGS. 4A-4C, which are diagrams illustrating audio component sample strings with whistle sounds corresponding to different sports types. Reference is first made to FIG. 4A, which is an audio component sample string corresponding to a football game. Each of the bars represents an audio sample that occurs along a timeline 176. The relevance of the bars to the analysis of the video is illustrated by the different heights of the bars. For example, a tall bar represents a whistle sound occurrence 170 and a short bar represents other audio 172. The high quantity of whistles 170 that occur in a substantially regular distribution throughout the time of the event can occur in a football or a basketball game, for example. Where the relative occurrence times of the whistles indicate that the game includes four periods or quarters of play, the sports type of the video can be determined using the relative occurrence times of the whistle (e.g. determined to be a football game). Alternatively, where the relative occurrence times of the whistles indicate that the game includes two periods or halves of play, the sports type of the video can be determined using the relative occurrence times of the whistle (e.g. determined to be a basketball game) as in FIG. 4C.


Similarly, the audio sample string of FIG. 4B can be identified as a soccer match where a whistle 170 is contained in the video in a low quantity and the relative occurrence times indicate that there are two halves of play with a total duration consistent with a soccer match. FIG. 4C can be identified as a basketball game based on the high quantity of whistles and the relative occurrence times. In contrast with football, basketball can include many plays and possession changes without the occurrence of a whistle. This difference renders the relative occurrence times of whistle sounds in football games distinguishable from those of basketball games.


Reference is made to FIGS. 5A and 5B, which are diagrams illustrating audio strings with whistle sounds corresponding to entire events of two different sports types. Reference is first made to FIG. 5A, which is an audio string corresponding to an entire football game. Each of the bars represents an audio sample that occurs during the game. The tall bars represent whistle sound occurrences 170 and the short bars represent other audio. A football game can be, for example, characterized by a high quantity of whistles 170 that occur in a substantially regular distribution coupled with the breaks in play that occur during the quarter change 175 and the halftime 173. Similarly, referring to FIG. 5B, fewer whistle occurrences 170 and a game having only a single break in play at a halftime 173 allow the sports type to be determined using the quantity and relative occurrence times of the whistle sounds (e.g. as a soccer match).


Reference is now made to FIG. 6, which is a block diagram illustrating an embodiment of a system for analyzing video. The system 180 includes logic to collect sample whistle sounds in block 182. The system 180 further includes logic to determine sample whistle features, including, for example, pitch and mel-frequency cepstrum coefficients. The mel-frequency cepstrum coefficients provide a compact representation of an audio feature that can be produced using coefficients corresponding to a specific series of bandwidths. The system 180 further includes logic to generate whistle sound patterns in block 186. The system 180 further includes logic to extract audio features from a video in block 188, so that the extracted features can be compared with the whistle sound patterns. A video can be a digital or analog streaming video signal or a video stored on a variety of storage media types.


The system 180 further includes logic to compare audio features and the whistle sound patterns in block 190. The mel-frequency cepstrum coefficients and pitch data from the patterns are compared to the extracted mel-frequency cepstrum coefficient and pitch data from the audio stream. Similarly, the system 180 includes logic to determine a sports type using whistle type information in block 192. The whistle type information can include, for example, tonal pitch or frequency and duration, among others. Additionally or alternatively, the sports type can be determined using the quantity of whistles in a video in block 194. Also, the sports type can be determined using the time of the whistle occurrences in block 196.
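A minimal sketch of the comparison of block 190, assuming each pattern is stored as a reference feature vector and that a frame matches the nearest pattern when the Euclidean distance is within a threshold, might look as follows. The distance measure, the threshold, the pattern names, and the two-element vectors are assumptions for illustration; the disclosure does not specify a particular comparison technique.

```python
import math

def nearest_pattern(features, patterns, max_distance):
    """Return the name of the closest pattern, or None if none is close enough.

    features: feature vector extracted from one frame of the audio stream.
    patterns: mapping of pattern name -> reference feature vector of equal length.
    """
    best_name, best_distance = None, float("inf")
    for name, reference in patterns.items():
        distance = math.dist(features, reference)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None

# Placeholder two-dimensional vectors standing in for real coefficient and pitch data.
patterns = {"pattern_a": (1.0, 0.0), "pattern_b": (0.0, 1.0)}
match = nearest_pattern((0.9, 0.1), patterns, max_distance=0.5)
no_match = nearest_pattern((5.0, 5.0), patterns, max_distance=0.5)
```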


Reference is now made to FIG. 7, which is a block diagram illustrating an embodiment of a method for analyzing video. The method 200 begins with detecting whistle sounds in an audio stream in block 210. The whistle sounds can be detected using, for example, previously calculated features corresponding to sample whistle sounds. Examples of such features can include mel-frequency cepstrum coefficients, pitch, LPC coefficients, LSP coefficients, audio energy, zero-crossing rate, and noise frame ratios, among others. The audio stream can be processed into the same features and the features compared to those of the samples. The content of the video is determined based on the whistle sounds, using, for example, multiple whistle sound characteristics. Examples of whistle sound characteristics include, but are not limited to, rhythms of whistle occurrences in a video, the type of whistle, and the quantity of whistle sounds in a video. For example, a high quantity of whistles that occur throughout the time of the event can occur in a football or a basketball game. Where the rhythms of whistle occurrences indicate that the game is continuously played without regular whistle interruption after individual plays, the video can be determined using the rhythms of whistle occurrences (e.g. to be a basketball game).
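One way a rhythm of whistle occurrences could be quantified, offered here only as an assumption, is the coefficient of variation of the intervals between consecutive whistles: a value near zero suggests whistles at regular intervals, and a larger value suggests irregular interruption. The function names and the example times are hypothetical.

```python
import statistics

def whistle_intervals(whistle_times):
    """Return the time between each pair of consecutive whistles."""
    ordered = sorted(whistle_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def interval_variation(whistle_times):
    """Coefficient of variation (stdev / mean) of the intervals between whistles."""
    intervals = whistle_intervals(whistle_times)
    if len(intervals) < 2:
        raise ValueError("at least three whistle times are needed")
    return statistics.stdev(intervals) / statistics.mean(intervals)

# Made-up whistle times in seconds.
regular = interval_variation([0, 30, 60, 90])
irregular = interval_variation([0, 5, 100, 110])
```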


Reference is now made to FIG. 8, which is a block diagram illustrating an embodiment of a computer-readable medium having a program for analyzing video. The computer-readable medium 300 includes logic to generate whistle sound patterns from samples in block 310. The computer-readable medium 300 also includes logic to detect whistle sound in a video in block 320. The video can be a digital or analog streaming video signal or a video stored on a variety of storage media types. The whistle sound data is extracted from an audio stream of the video.


The computer-readable medium 300 further includes logic to analyze the video in block 330 using the whistle sounds. The analysis is performed by determining multiple whistle sound characteristics. For example, a whistle type might be distinctive among specific sporting events. Whistle type might be used to describe actual structural or functional differences in whistles or the style of using the whistle in the video. For example, some whistle types might be characterized by long duration whistle sounds. In contrast, other whistle types might be characterized by multiple short bursts or patterns of bursts.
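The distinction between a long-duration whistle and multiple short bursts can be sketched, under assumed thresholds, from the durations of the individual bursts that make up one whistle occurrence. The function name, the one-second boundary, and the category labels are illustrative assumptions.

```python
def whistle_style(burst_durations, long_seconds=1.0):
    """Label one whistle occurrence from the durations of its bursts, in seconds."""
    if not burst_durations:
        raise ValueError("at least one burst duration is required")
    if len(burst_durations) == 1 and burst_durations[0] >= long_seconds:
        return "long"
    if len(burst_durations) >= 2 and all(d < long_seconds for d in burst_durations):
        return "short bursts"
    # Anything else, such as a single brief burst, is left unlabeled.
    return "other"

single_long = whistle_style([1.8])
triple_short = whistle_style([0.2, 0.2, 0.3])
```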


Additionally, the whistle data can be further utilized to manipulate the video. In this manner, a user can experience improved playback quality by eliminating or bypassing undesirable segments of the video. Also, a cost reduction can be realized through reduced storage media requirements of the manipulated video. Further, the cost may be reduced through lower power consumption based on the reduced playback time of reviewing manipulated video.


Embodiments of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. Some embodiments can be implemented in software or firmware that is stored in a memory and that is executed by a suitable instruction execution system. If implemented in hardware, an alternative embodiment can be implemented with any or a combination of the following technologies, which are all well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.


Any process descriptions or blocks in flow charts should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of an embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.


A program according to this disclosure that comprises an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of the present disclosure includes embodying the functionality of the illustrated embodiments of the present disclosure in logic embodied in hardware or software-configured mediums.


It should be emphasized that the above-described embodiments of the present disclosure, particularly, any illustrated embodiments, are merely possible examples of implementations. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure.

Claims
  • 1. A system for analyzing video, comprising: logic configured to collect sample whistle sounds corresponding to a plurality of sport types; logic configured to determine a plurality of sample whistle features; logic configured to generate a plurality of whistle sound patterns; logic configured to extract a plurality of audio features corresponding to a plurality of frames in a video; logic configured to compare the plurality of sample whistle features with the plurality of audio features to determine a plurality of whistle sounds in the video; logic configured to determine a sport type using a type of whistle indicator; logic configured to determine a sport type using a quantity of whistle occurrences data value; and logic configured to determine a sport type using a time of whistle occurrences data set.
  • 2. The system of claim 1, further comprising means for manipulating the video based on the content, the quantity of whistle occurrences data value, and the time of whistle occurrences data set.
  • 3. A method for analyzing video, comprising: detecting a plurality of whistle sounds in an audio stream of a video; and determining a video content based on a plurality of properties corresponding to the plurality of whistle sounds.
  • 4. The method of claim 3, further comprising generating a plurality of whistle sound patterns.
  • 5. The method of claim 4, wherein the generating comprises collecting a plurality of whistle sound samples corresponding to a plurality of sports types.
  • 6. The method of claim 4, wherein the generating further comprises collecting the plurality of whistle sound samples for the plurality of sports types.
  • 7. The method of claim 4, wherein the generating further comprises determining a plurality of whistle sound sample features.
  • 8. The method of claim 3, wherein the detecting comprises extracting a plurality of whistle sounds from the video.
  • 9. The method of claim 3, wherein the detecting comprises determining a plurality of whistle sound features.
  • 10. The method of claim 9, wherein the plurality of whistle sound features are determined for each of a plurality of frames in the video.
  • 11. The method of claim 3, wherein the determining comprises comparing the plurality of whistle sound features with a plurality of whistle sound sample features.
  • 12. The method of claim 3, wherein the determining further comprises classifying a sport type using a plurality of whistle sound characteristics.
  • 13. The method of claim 12, wherein one of the plurality of whistle sound characteristics comprises a quantity of occurrences in the video.
  • 14. The method of claim 12, wherein one of the plurality of whistle sound characteristics comprises a plurality of rhythms of whistle occurrences in the video.
  • 15. The method of claim 12, wherein one of the plurality of whistle sound characteristics comprises a whistle duration.
  • 16. The method of claim 12, wherein one of the plurality of whistle sound characteristics comprises a whistle tonal frequency.
  • 17. The method of claim 3, further comprising manipulating the video based on the video content and a plurality of whistle sound characteristics.
  • 18. A computer readable medium having a computer program for analyzing video, comprising: logic configured to generate a plurality of whistle sound patterns; logic configured to detect a whistle sound in a video; and logic configured to analyze the video using the whistle sound.
  • 19. The computer readable medium of claim 18, wherein the detect logic is configured to extract the whistle sound from the video.
  • 20. The computer readable medium of claim 19, wherein the detect logic is further configured to determine a plurality of whistle features.
  • 21. The computer readable medium of claim 20, wherein one of the plurality of features comprises a pitch for each of a plurality of frames.
  • 22. The computer readable medium of claim 18, wherein the analyze logic is configured to determine a sport type using a plurality of whistle characteristics.
  • 23. The computer readable medium of claim 22, wherein one of the plurality of whistle characteristics comprises a quantity of occurrences in the video.
  • 24. The computer readable medium of claim 22, wherein one of the plurality of whistle characteristics comprises a plurality of rhythms of whistle occurrences in the video.
  • 25. The computer readable medium of claim 18, further comprising logic configured to manipulate the video using a characteristic of the whistle sound.
  • 26. The computer readable medium of claim 18, wherein the analyze logic is configured to determine a sport type using a time interval between whistles.