The present invention relates to a method and system for video searching, video mining, content association, and clustering. The present invention further relates to automatically detecting repeated video clips.
Modern mobile telecommunications devices, such as cellular telephones, may download a variety of media content. This media content may include such media types as video. The video content may be in any of a variety of formats, such as the standards provided by the Moving Picture Experts Group (MPEG) (including MPEG-1 Audio Layer 3 (MP3)), and others.
The video content may be made of a set of individual frames, showing images without any temporal component. These frames may be grouped into video clips, showing a series of frames over a specified temporal period. Often a video sequence of a set of video data content may include a number of repeated video clips. These video clips may be intentionally included by the video content provider, or may be due to errors that may occur during the transmission of the data. A user may want to have the extra clips removed prior to viewing the video data content. Sorting out the repetitive video clips currently requires a substantial amount of processing power.
The major difficulty of repetitive clip discovery is that, barring personally watching the video, the user may not know where the repetitive clips are or how long they are. One method includes checking every different length of video clip for every frame of video data. This naïve mining method is computationally expensive. For example, suppose the database is of size n; the total number of possible segments that must be queried in the database is:
$$\sum_{l=1}^{n} (n - l + 1) = \frac{n(n+1)}{2} = O(n^2)$$
For each query, the database must be searched to find its best-matched candidates. Therefore the computational cost of the naïve mining method can reach a complexity of $O(n^4)$. This is not a reasonable solution for a large database.
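As a small illustration of why exhaustive enumeration grows quickly, the following sketch counts every contiguous segment (every start frame paired with every feasible length) in a sequence of n frames and checks the count against the closed form n(n+1)/2. The function names are hypothetical and only the counting step is shown, not the search itself.

```python
def count_segments(n: int) -> int:
    """Count contiguous segments by enumerating each length 1..n.

    A segment of a given length can start at n - length + 1 positions.
    """
    return sum(n - length + 1 for length in range(1, n + 1))


def count_segments_closed_form(n: int) -> int:
    """Closed form of the same count: n(n+1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (4, 100, 1000):
        print(n, count_segments(n), count_segments_closed_form(n))
```

Even a 1000-frame sequence already yields about half a million candidate query segments before any matching is performed.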
A method, mobile telecommunications apparatus, and electronic device for searching for repetitive video content are disclosed. A memory may store a set of video data. A processor may match a premier query window to a trellis match video window of the set of video data. The processor may compare a successive query window to a successive trellis match video window. The processor may disregard the trellis match video window if the successive trellis match video window does not match the successive query window.
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
a-b illustrate in block diagrams two types of video searches.
a-b illustrate in block diagrams the creation of an ordinal feature signature.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth herein.
Various embodiments of the invention are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the invention.
The present invention comprises a variety of embodiments, such as a method, an apparatus, and an electronic device, and other embodiments that relate to the basic concepts of the invention. The electronic device may be any manner of computer, mobile device, or wireless communication device.
A method, apparatus, and electronic device for searching for repetitive video content are disclosed. A memory may store a set of video data. A processor may match a premier query window to a trellis match video window of the set of video data. The processor may compare a successive query window to a successive trellis match video window. The processor may disregard the trellis match video window if the successive trellis match video window does not match the successive query window.
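The window-by-window pruning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each window is assumed to be summarized by a numeric signature, two windows are assumed to "match" when their L1 distance is at most a threshold, and the names `windows_match` and `find_repeats` are hypothetical.

```python
from typing import List, Sequence

Signature = Sequence[float]


def l1_distance(a: Signature, b: Signature) -> float:
    """Sum of absolute differences between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def windows_match(a: Signature, b: Signature, threshold: float) -> bool:
    """Assumed match test: L1 distance at or below a threshold."""
    return l1_distance(a, b) <= threshold


def find_repeats(query: List[Signature],
                 database: List[Signature],
                 threshold: float) -> List[int]:
    """Return database start indices where all query windows match in order."""
    if not query:
        return []
    last_start = len(database) - len(query)
    # Match the first query window against every feasible database window.
    candidates = [j for j in range(last_start + 1)
                  if windows_match(query[0], database[j], threshold)]
    # Compare each successive query window with the successive database
    # window; a candidate is disregarded as soon as one comparison fails.
    for k in range(1, len(query)):
        candidates = [j for j in candidates
                      if windows_match(query[k], database[j + k], threshold)]
        if not candidates:
            break
    return candidates
```

Because a candidate is dropped at its first failed comparison, most database positions are examined only once, against the first query window.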
For the tasks of video clip search and repetition discovery, feature (or signature) extraction may produce compact, robust, and distinguishable signatures. An ordinal feature and a color feature may be combined to serve as the signature. Segmenting long video sequences into fixed-length windows and comparing the video feature signatures of those windows may classify the video content of the video database. Compared with key-frame-based shot representation, this avoids the ambiguity of key frame selection and the difficulty of detecting gradual shot transitions. Such gradual shot transitions appear very commonly in commercials and in program lead-ins and lead-outs due to post-editing.
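The fixed-length segmentation step can be sketched as below. The function name is hypothetical, and two choices are assumptions not stated in the text: windows do not overlap, and a trailing partial window is dropped.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_windows(frames: Sequence[Frame],
                       window_length: int) -> List[List[Frame]]:
    """Split a frame sequence into consecutive, non-overlapping windows.

    Assumption: frames left over after the last full window are discarded.
    """
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    last_start = len(frames) - window_length
    return [list(frames[i:i + window_length])
            for i in range(0, last_start + 1, window_length)]
```

Each resulting window would then be reduced to one signature, so that comparing windows means comparing short feature vectors rather than raw frames.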
Using video feature signatures to characterize video segments has many advantages. An ordinal pattern distribution histogram provides a unique sparse distribution, and thus is more distinguishable than a color feature signature (CFS) alone. The ordinal feature signature (OFS) is a good supplement to a CFS as it provides spatial-temporal information. Thus, when combined with the color histograms, such signatures can lead towards a robust feature set. Ordinal pattern distribution is insensitive to global color shifting, color format changes, and other coding variations such as frame size change, rate change, and others.
a illustrates in a block diagram reducing 400 the image into different spatial layouts. The image 410 may be divided into multiple sub-images. In the present example, the image 410 is divided into four sub-images. The layouts may arrange the sub-images into three spatial layouts, a 2×2 pattern 420, a 4×1 pattern 430, or a 1×4 pattern 440. By measuring different spatial layouts of the images, the signature becomes more distinguishable.
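A minimal sketch of dividing one image channel into four sub-images under each layout is given below. It assumes the image is a list of pixel rows, that each sub-image is summarized by its mean value, and that a layout is written as (rows, columns); the function name and that reading of "4×1" and "1×4" are assumptions.

```python
from typing import List, Sequence


def block_means(image: Sequence[Sequence[float]],
                rows: int, cols: int) -> List[float]:
    """Mean pixel value of each sub-image in a rows x cols grid.

    Sub-images are listed row by row. The image must have at least
    `rows` pixel rows and `cols` pixel columns.
    """
    height, width = len(image), len(image[0])
    means = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            pixels = [image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right)]
            means.append(sum(pixels) / len(pixels))
    return means


# The three layouts named in the text, each yielding four sub-images.
LAYOUTS = [(2, 2), (4, 1), (1, 4)]
```

For a 4×4 test image, `block_means(image, 2, 2)` returns one mean per quadrant, while the (4, 1) and (1, 4) layouts return one mean per horizontal or vertical strip.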
For example, a video segment can be compactly represented by three normalized 24-dimensional ordinal pattern distribution histograms, corresponding to the Y, Cb, and Cr channels respectively. For each channel c = Y, Cb, Cr, the video clip is represented as:
$$H^{opd}_c = (h_1, h_2, \ldots, h_l, \ldots, h_{NoP}), \quad 0 \le h_l \le 1, \quad \sum_{l=1}^{NoP} h_l = 1$$
Here the number of possible patterns (NoP = 4! = 24) is the dimension of the histogram. As a result, the total dimension of the spatial-temporal signature $H^{opd}$ is 3 × 24 = 72.
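One way to build such a histogram for a single channel is sketched below. It assumes each frame has already been reduced to four sub-image means, that a frame's ordinal pattern is the ordering of its four sub-images by ascending mean (ties broken by sub-image index), and that the histogram is normalized by the number of frames. The names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Index each of the 4! = 24 possible orderings of four sub-images.
PATTERNS: Dict[Tuple[int, ...], int] = {
    p: i for i, p in enumerate(permutations(range(4)))
}


def ordinal_pattern(means: Sequence[float]) -> Tuple[int, ...]:
    """Sub-image indices ordered by ascending mean (stable on ties)."""
    return tuple(sorted(range(4), key=lambda i: means[i]))


def opd_histogram(per_frame_means: Sequence[Sequence[float]]) -> List[float]:
    """Normalized 24-bin histogram of ordinal patterns over a segment."""
    if not per_frame_means:
        raise ValueError("segment must contain at least one frame")
    hist = [0.0] * len(PATTERNS)
    for means in per_frame_means:
        hist[PATTERNS[ordinal_pattern(means)]] += 1.0
    total = len(per_frame_means)
    return [h / total for h in hist]
```

Because only the ordering of the sub-image means is recorded, adding a constant to every pixel of a frame leaves its pattern unchanged, which is consistent with the insensitivity to global color shifts noted earlier.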
The cumulative color histograms of all the frames within a video segment may be used as the color signature:
$$H^{ccd}_c(j) = \frac{1}{M} \sum_{i=b_k}^{b_k+M-1} H_i(j), \quad j = 1, \ldots, B$$
where $H_i$, $i = b_k, b_k+1, \ldots, b_k+M-1$, denotes the color histogram of the corresponding frame within the video sequence, M is the number of frames, and B is the color bin number. The video processor may set the color bin number to equal the number of possible patterns (Block 520). In this example, B is selected as 24 for uniform quantization. The video processor may create a color feature vector for each color channel, such as Y, Cb, and Cr (Block 530). $H^{ccd}_c$ may thus be a 24-dimensional feature vector. The total size of the color signature $H^{ccd}$ may be 72-dimensional. Finally, the dimensionality of the combined video feature signature becomes 72 + 72 = 144.
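The color signature and the final concatenation can be sketched as follows. Several details are assumptions made for illustration: channel values are 8-bit integers (0 to 255), each frame histogram is normalized by its pixel count, the per-frame histograms are averaged over the M frames, and all function names are hypothetical.

```python
from typing import List, Sequence


def frame_histogram(values: Sequence[int], bins: int = 24,
                    levels: int = 256) -> List[float]:
    """Uniformly quantized, normalized histogram of one frame's channel."""
    hist = [0.0] * bins
    for v in values:
        hist[v * bins // levels] += 1.0
    return [h / len(values) for h in hist]


def cumulative_color_histogram(frames: Sequence[Sequence[int]],
                               bins: int = 24) -> List[float]:
    """Average the per-frame histograms over the M frames of a segment."""
    m = len(frames)
    hists = [frame_histogram(f, bins) for f in frames]
    return [sum(h[j] for h in hists) / m for j in range(bins)]


def video_feature_signature(ordinal_by_channel: Sequence[Sequence[float]],
                            color_by_channel: Sequence[Sequence[float]]
                            ) -> List[float]:
    """Concatenate three ordinal and three color histograms (Y, Cb, Cr)."""
    signature: List[float] = []
    for hist in list(ordinal_by_channel) + list(color_by_channel):
        signature.extend(hist)
    return signature
```

With three 24-bin ordinal histograms and three 24-bin color histograms, the concatenated vector has 144 entries, matching the dimensionality stated above.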
The video clip search problem may be formulated as an approximate nearest neighbor search problem. In ε-Nearest Neighbor Search (ε-NNS), given a set P of n points in a normed space $l^d$, P is preprocessed so as to efficiently return a point p in P for any given query point q such that d(q, p) ≤ (1 + ε) d(q, P), where d(q, P) is the distance from q to its closest point in P.
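The acceptance condition in this definition can be checked directly with a brute-force sketch. This only verifies whether a returned point satisfies the ε-NNS bound; it is not an efficient index, and the Euclidean distance and function names are assumptions chosen for illustration.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_epsilon_nearest(q: Point, p: Point, points: Sequence[Point],
                       epsilon: float,
                       dist: Callable[[Point, Point], float] = euclidean
                       ) -> bool:
    """True if d(q, p) <= (1 + epsilon) * d(q, P) for the point set P."""
    nearest = min(dist(q, x) for x in points)
    return dist(q, p) <= (1.0 + epsilon) * nearest
```

Setting ε to 0 reduces the condition to exact nearest neighbor search; larger values of ε accept progressively more distant points.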
The controller/processor 910 may be any programmed processor known to one of skill in the art. However, the decision support method can also be implemented on a general-purpose or a special purpose computer, a programmed microprocessor or microcontroller, peripheral integrated circuit elements, an application-specific integrated circuit or other integrated circuits, hardware/electronic logic circuits, such as a discrete element circuit, a programmable logic device, such as a programmable logic array, field programmable gate-array, or the like. In general, any device or devices capable of implementing the decision support method as described herein can be used to implement the decision support system functions of this invention.
The memory 920 may include volatile and nonvolatile data storage, including one or more electrical, magnetic or optical memories such as a random access memory (RAM), cache, hard drive, or other memory device. The memory may have a cache to speed access to specific data. The memory 920 may also be connected to a compact disc-read only memory (CD-ROM), digital video disc-read only memory (DVD-ROM), DVD read write input, tape drive or other removable memory device that allows media content to be directly uploaded into the system.
The digital media processor 940 is a separate processor that may be used by the system to more efficiently present digital media. Such digital media processors may include video cards, audio cards, or other separate processors that enhance the reproduction of digital media.
The Input/Output interface 950 may be connected to one or more input devices that may include a keyboard, mouse, pen-operated touch screen or monitor, voice-recognition device, or any other device that accepts input. The Input/Output interface 950 may also be connected to one or more output devices, such as a monitor, printer, disk drive, speakers, or any other device provided to output data.
The network interface 960 may be connected to a communication device, modem, network interface card, a transceiver, or any other device capable of transmitting and receiving signals over a network. The network interface 960 may be used to transmit the media content to the selected media presentation device. The network interface may also be used to download the media content from a media source, such as a website or other media sources. The components of the computer system 900 may be connected via an electrical bus 970, for example, or linked wirelessly.
Client software and databases may be accessed by the controller/processor 910 from memory 920, and may include, for example, database applications, word processing applications, the client side of a client/server application such as a billing system, as well as components that embody the decision support functionality of the present invention. The user access data may be stored in either a database accessible through the database interface 940 or in the memory 920. The computer system 900 may implement any operating system, such as Windows or UNIX, for example. Client and server software may be written in any programming language, such as ABAP, C, C++, Java or Visual Basic, for example.
Although not required, the invention is described, at least in part, in the general context of computer-executable instructions, such as program modules, being executed by the electronic device, such as a general purpose computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that other embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like.
Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network.
Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.
Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
Although the above description may contain specific details, they should not be construed as limiting the claims in any way. Other configurations of the described embodiments of the invention are part of the scope of this invention. For example, the principles of the invention may be applied to each individual user where each user may individually deploy such a system. This enables each user to utilize the benefits of the invention even if any one of the large number of possible applications does not need the functionality described herein. In other words, there may be multiple instances of the electronic devices each processing the content in various possible ways. It does not necessarily need to be one system used by all end users. Accordingly, the appended claims and their legal equivalents should only define the invention, rather than any specific examples given.