The present disclosure is generally related to video signal processing and, more particularly, is related to systems, methods, and computer readable media having programs for analyzing the content of video.
In recent years, video has become an important component among the various kinds of multimedia. Video refers to moving images together with sound and can be transmitted, received, and stored using a variety of techniques and formats. Video can include many different genres including, but not limited to, episodic programming, movies, music, and sports, among others. End users, editors, viewers, and subscribers may wish to view only selected types of content within each genre. For example, a sports viewer may have great interest in identifying specific types of sporting events within a video stream or clip. Previous methods for classifying sports video have required the analysis of video segments and corresponding motion information. These methods, however, require significant processing resources that may be costly and cumbersome to employ.
Embodiments of the present disclosure provide a system, method and computer readable medium having a program for analyzing video content. In one embodiment a system includes: logic configured to collect sample whistle sounds corresponding to a plurality of sport types; logic configured to determine a plurality of sample whistle features; logic configured to generate a plurality of whistle sound patterns; logic configured to extract a plurality of audio features corresponding to a plurality of frames in a video; logic configured to compare the plurality of sample whistle features with the plurality of audio features to determine a plurality of whistle sounds in the video; logic configured to determine a sport type using a type of whistle indicator; logic configured to determine a sport type using a quantity of whistle occurrences data value; and logic configured to determine a sport type using a time of whistle occurrences data set.
In another embodiment, a method includes: detecting a plurality of whistle sounds in an audio stream of a video; and determining a video content based on a plurality of properties corresponding to the plurality of whistle sounds.
In a further embodiment, a computer readable medium having a computer program for analyzing video includes: logic configured to generate a plurality of whistle sound patterns; logic configured to detect a whistle sound in a video; and logic configured to analyze the video using the whistle sound.
Other systems and methods will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
Having summarized various aspects of the present disclosure, reference will now be made in detail to the description of the disclosure as illustrated in the drawings. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims.
Beginning with
In block 104, features of sample whistle sounds are extracted from an audio sample in a frame-by-frame manner. Features can include, but are not limited to, mel-frequency cepstrum coefficients 106, noise frame ratio 107, and pitch 108. Other features that can be used include, for example, LPC coefficients 109, LSP coefficients 111, audio energy 113, and zero-crossing rate 114. The mel-frequency cepstrum coefficients are derived from the known variation of critical bandwidths of the human ear. Filters are spaced linearly at low frequencies and logarithmically at high frequencies, and a compact representation of an audio feature can be produced using coefficients corresponding to each of the bandwidths. After the features are extracted in block 104, a whistle sound pattern is built for whistles corresponding to each of the different sports 110. The pattern can include the specific mel-frequency cepstrum coefficients 106 and pitch 108 that are statistically exclusive to the whistles used in different sport types.
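By way of a non-limiting illustration, the frame-by-frame extraction of two of the simpler features named above, audio energy and zero-crossing rate, might be sketched as follows. The function name `frame_features`, the frame size, and the synthetic 3 kHz test tone are assumptions introduced for this sketch only and are not part of the disclosure.

```python
import math

def frame_features(samples, frame_size):
    """Return one (energy, zero_crossing_rate) pair per full frame.

    Hypothetical helper: the frame size and the exact feature
    definitions are illustrative assumptions.
    """
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_size
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_size - 1)
        features.append((energy, zcr))
    return features

# Synthetic stand-in for a whistle: a 3 kHz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 3000 * n / sample_rate) for n in range(1600)]
tone_features = frame_features(tone, 400)
```

A steady tone of this kind yields a nearly constant energy and zero-crossing rate from frame to frame, which is one reason such features may help separate a whistle from broadband crowd noise.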
Reference is now made to
A sports type is determined in block 130 based on whistle occurrences. By analyzing whistle occurrence characteristics, such as the quantity of whistles and the time between each of the whistles or groups of whistles, it can be determined that the video is, for example, a soccer match. Further, optionally, the video clips can be manipulated based on the whistle information in block 132. For example, in a football game the time between plays can be edited out of a video by retaining the portion of the video segment that starts a few seconds before a whistle sound that is determined to be a play-ending whistle. Similar periods of non-play can be edited out by identifying the halftime based on the lack of whistle sounds.
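The editing step of block 132 can be illustrated with a minimal sketch that computes which time ranges of a video to retain around play-ending whistles. The function name `segments_to_keep`, the 8-second lead, and the 1-second tail are hypothetical values chosen for illustration; the disclosure states only that a few seconds before the whistle are retained.

```python
def segments_to_keep(whistle_times, lead_seconds=8.0, tail_seconds=1.0):
    """Return merged (start, end) ranges, in seconds, to retain.

    Each range begins lead_seconds before a play-ending whistle and
    ends tail_seconds after it. Both defaults are assumptions.
    """
    segments = []
    for t in sorted(whistle_times):
        start = max(0.0, t - lead_seconds)
        end = t + tail_seconds
        if segments and start <= segments[-1][1]:
            # Overlaps the previous range, so extend it instead.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments

# Whistles at 30 s and 35 s share one range; the one at 90 s stands alone.
kept = segments_to_keep([30.0, 35.0, 90.0])
```

Everything outside the returned ranges would be treated as time between plays and could be skipped or removed.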
Reference is now made to
Similarly, a table also includes a relative occurrence time column 156 that describes a distribution of the whistle sounds in a typical event listed in column 150. One example of a relative occurrence time that can be specific to each sport is the beginning and ending of a period of play. The entries in the relative occurrence time describe, for example, the structure of play corresponding to the sports types in column 150. By analyzing the relative occurrence time of the whistles, the number and duration of play periods can be determined. The structure of play can be used to determine the sports type.
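As a hedged sketch of how the number and duration of play periods might be recovered from whistle times, the following groups whistles into periods wherever the silence between consecutive whistles exceeds a threshold. The function name `play_periods`, the threshold, and the sample timestamps are assumptions made for illustration.

```python
def play_periods(whistle_times, break_gap_seconds):
    """Split whistle timestamps into periods of play.

    A gap longer than break_gap_seconds between consecutive whistles
    is treated as a break between periods (an illustrative assumption).
    Returns a list of (first_whistle, last_whistle) pairs.
    """
    times = sorted(whistle_times)
    if not times:
        return []
    periods = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > break_gap_seconds:
            periods.append((start, prev))
            start = t
        prev = t
    periods.append((start, prev))
    return periods

# Hypothetical timestamps (seconds) with one long pause in the middle.
times = [0, 500, 1200, 1900, 2700, 3600, 4200, 4900, 5600, 6300]
periods = play_periods(times, break_gap_seconds=850)
durations = [end - start for start, end in periods]
```

The count of returned periods and their durations together give a rough description of the structure of play.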
Another example of a relative occurrence time that can be specific to a particular sport is the length of an individual play in, for example, a football game. A whistle is sounded, for example, at the end of a play in a football game. The next end-of-play whistle is likely to occur within a few seconds, in the case of a rushed down and a short play, or after a greater number of seconds where a team uses the entire play clock before executing the next play.
By way of example, the quantity and relative occurrence time of whistle sounds may be used to determine the sports type in the absence of a distinctive whistle type 152. In the case where the whistle occurs a high quantity 154 of times throughout the event and in two periods of play 156, the event may be classified based on the quantity and/or relative occurrence times of the whistle (e.g. classified as a basketball game).
Alternatively, where the whistle occurs a high quantity 154 of times throughout the event and in four periods of play 156, the event may be classified based on the quantity and/or relative occurrence times or rhythms of the whistle (e.g. classified as a football game). Many sport types 150 may include the same or indistinguishable whistle types 152 and only be distinguishable by quantity 154 and relative occurrence time 156.
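The two examples above can be expressed as a simple rule lookup. The sketch below is illustrative only: the function name `classify_by_structure` and the rule table are assumptions, and the table holds just the two example classifications given in the text.

```python
def classify_by_structure(quantity, periods_of_play):
    """Map (quantity category, number of play periods) to a sport type.

    The rules mirror the two examples in the description; any other
    combination returns None rather than guessing.
    """
    rules = {
        ("high", 2): "basketball",
        ("high", 4): "football",
    }
    return rules.get((quantity, periods_of_play))

two_period_event = classify_by_structure("high", 2)
four_period_event = classify_by_structure("high", 4)
unknown_event = classify_by_structure("low", 3)
```

Returning no result for unlisted combinations leaves room for the whistle type 152 or other indicators to decide the classification.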
Additionally, while the quantity, for example, is depicted as being described in terms of categories such as high, medium, and low, the quantity can also be evaluated and determined in numerical terms. Such terms can be determined based on statistical or numeric techniques and can include values such as median, mean, and standard deviation, among others. All applicable statistical or numerical techniques are contemplated within the scope and spirit of this disclosure.
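As one possible numerical treatment, the intervals between consecutive whistles can be summarized with the mean, median, and standard deviation. The sketch below uses the Python standard `statistics` module; the function name `interval_statistics` and the sample timestamps are hypothetical.

```python
import statistics

def interval_statistics(whistle_times):
    """Summarize the gaps between consecutive whistles, in seconds.

    Requires at least three timestamps so that the sample standard
    deviation of the intervals is defined.
    """
    times = sorted(whistle_times)
    intervals = [b - a for a, b in zip(times, times[1:])]
    return {
        "count": len(times),
        "mean": statistics.mean(intervals),
        "median": statistics.median(intervals),
        "stdev": statistics.stdev(intervals),
    }

# Hypothetical whistle timestamps in seconds.
stats = interval_statistics([0, 10, 20, 40])
```

Thresholds on such values could stand in for the high, medium, and low categories where finer distinctions are needed.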
Reference is now made to
Similarly, the audio sample string of
Reference is made to
Reference is now made to
The system 180 further includes logic to compare audio features and the whistle sound patterns in block 190. The mel-frequency cepstrum coefficients and pitch data from the patterns are compared to the mel-frequency cepstrum coefficients and pitch data extracted from the audio stream. Similarly, the system 180 includes logic to determine a sports type using whistle type information in block 192. The whistle type information can include, for example, tonal pitch or frequency and duration, among others. Additionally or alternatively, the sports type can be determined using the quantity of whistles in a video in block 194. Also, the sports type can be determined using the time of the whistle occurrences in block 196.
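The comparison of block 190 is not tied to a particular metric in this description. One simple possibility, sketched below under that assumption, is a nearest-pattern match using Euclidean distance with a rejection threshold. The function name `match_whistle`, the pattern labels, the threshold, and the feature values are hypothetical, and the features are assumed to have been normalized to comparable scales beforehand.

```python
import math

def match_whistle(frame_features, patterns, max_distance):
    """Return the label of the closest pattern, or None if none is close.

    Euclidean distance is an illustrative choice of metric; features
    are assumed to be normalized to comparable ranges.
    """
    best_label, best_distance = None, max_distance
    for label, pattern in patterns.items():
        distance = math.dist(frame_features, pattern)
        if distance <= best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical normalized (coefficient, coefficient, pitch) patterns.
patterns = {
    "type_a": (0.8, 0.1, 0.6),
    "type_b": (0.2, 0.7, 0.9),
}
near_match = match_whistle((0.75, 0.15, 0.6), patterns, max_distance=0.2)
no_match = match_whistle((0.5, 0.4, 0.1), patterns, max_distance=0.2)
```

Frames that match no pattern within the threshold are treated as containing no whistle sound.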
Reference is now made to
Reference is now made to
The computer-readable medium 300 further includes logic to analyze the video in block 330 using the whistle sounds. The analysis is performed by determining multiple whistle sound characteristics. For example, a whistle type might be distinctive among specific sporting events. Whistle type might be used to describe actual structural or functional differences in whistles or the style of using the whistle in the video. For example, some whistle types might be characterized by long duration whistle sounds. In contrast, other whistle types might be characterized by multiple short bursts or patterns of bursts.
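To illustrate the distinction between long-duration whistles and short bursts, the following sketch groups consecutive whistle-flagged frames into events and labels each event by its duration. The function names, the 0.25-second frame length, and the 1-second boundary between "long" and "short" are assumptions made for this example.

```python
def whistle_events(frame_flags, frame_seconds):
    """Group consecutive True flags into (start_time, duration) events."""
    events = []
    run_start = None
    # A trailing False closes any run that reaches the last frame.
    for i, flag in enumerate(list(frame_flags) + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            events.append(
                (run_start * frame_seconds, (i - run_start) * frame_seconds)
            )
            run_start = None
    return events

def label_events(events, long_seconds=1.0):
    """Label each event 'long' or 'short' (the threshold is an assumption)."""
    return ["long" if d >= long_seconds else "short" for _, d in events]

# Hypothetical per-frame detections: one long blast, then two short bursts.
flags = [False, True, True, True, True, True, False, False,
         True, False, True, False]
events = whistle_events(flags, frame_seconds=0.25)
labels = label_events(events)
```

The resulting sequence of labels is one way to represent a whistle style, such as a single long blast followed by a pattern of short bursts.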
Additionally, the whistle data can be further utilized to manipulate the video. In this manner, a user can experience improved playback quality by eliminating or bypassing undesirable segments of the video. Also, a cost reduction can be realized through reduced storage media requirements of the manipulated video. Further, the cost may be reduced through lower power consumption based on the reduced playback time of reviewing manipulated video.
Embodiments of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. Some embodiments can be implemented in software or firmware that is stored in a memory and that is executed by a suitable instruction execution system. If implemented in hardware, an alternative embodiment can be implemented with any or a combination of the following technologies, which are all well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
Any process descriptions or blocks in flow charts should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of an embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.
A program according to this disclosure that comprises an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). In addition, the scope of the present disclosure includes embodying the functionality of the illustrated embodiments of the present disclosure in logic embodied in hardware or software-configured mediums.
It should be emphasized that the above-described embodiments of the present disclosure, particularly, any illustrated embodiments, are merely possible examples of implementations. Many variations and modifications may be made to the above-described embodiment(s) of the disclosure without departing substantially from the spirit and principles of the disclosure.