A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
This invention relates to the field of stream media indexing based on similarities.
2. Description of the Related Art
Streams of media, such as the slides of a captured presentation, need to be segmented for indexing and subsequent full-text retrieval. Traditionally, this segmentation has been performed based on visual similarity. Once segmented, text was extracted from each slide via optical character recognition (OCR), and a full-text index entry (document) was built for each slide. While this approach worked reasonably well, it was limited in at least two ways. First, OCR introduced recognition errors that decreased the performance of subsequent full-text queries, and the relatively small amount of text per slide made it harder to identify the term co-occurrence that underpins effective query performance. Second, segmented data streams are hard to index when the textual information associated with each segment is limited and noisy, yet accurate textual information is important for ad hoc retrieval of segments from data streams.
Various embodiments of the present invention enable an approach to index segments of a media stream containing visual and textual information, using a combination of visual, textual, auditory and temporal features to group segments that correspond to topical contexts into logical groups. A visual/temporal/auditory/textual weighting scheme is adopted, which allows segments from elsewhere in the same presentation to affect the index terms associated with the current segment.
Preferred embodiment(s) of the present invention will be described in detail based on the following figures, wherein:
The invention is illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” or “some” embodiment(s) in this disclosure are not necessarily to the same embodiment, and such references mean at least one.
Referring to
Referring to
In some embodiments, the similarities between the indexed segment and its neighboring segments include, but are not limited to: the overlap among terms found on the neighboring segments, where overlap can be measured by, but is not limited to, syntactic, semantic, linguistic, or statistical similarity; the temporal and sequential proximity of the segments; and the similarity between the visual features of the segments. The resulting expanded and re-weighted term vector is used to index each segment. This allows the retrieval of concepts that are distributed among neighboring segments, and it improves term-frequency-based metrics by smoothing them over multiple segments.
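As a non-limiting illustration, the following Python sketch shows one way the similarity cues listed above could be combined into a per-neighbor weight and used to expand a segment's term vector. The `Segment` fields, the exponential temporal decay `tau`, the Gaussian kernel over the Euclidean distance between visual features (`sigma`), the `max_hops` cutoff, and the equal weighting of the four cues are all assumptions made for illustration; they are not prescribed by this disclosure.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segment (e.g. a captured slide). Field names are illustrative."""
    terms: Counter                 # term -> frequency, e.g. from OCR text
    time: float                    # segment start time, in seconds
    visual: list[float] = field(default_factory=list)  # hypothetical visual features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (term overlap)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def neighbor_weight(s: Segment, n: Segment, hops: int, tau: float = 60.0,
                    sigma: float = 1.0, max_hops: int = 2) -> float:
    """Hypothetical weight in [0, 1) for neighbor n of segment s, `hops` apart."""
    if hops == 0 or hops > max_hops:
        return 0.0                                         # self or too far away
    textual = cosine(s.terms, n.terms)                     # term overlap
    temporal = math.exp(-abs(s.time - n.time) / tau)       # temporal proximity
    sequential = 1.0 / (1 + hops)                          # sequential proximity
    visual = 0.0
    if s.visual and n.visual:
        d = math.dist(s.visual, n.visual)                  # Euclidean distance
        visual = math.exp(-d * d / (2 * sigma * sigma))    # Gaussian kernel
    return (textual + temporal + sequential + visual) / 4  # equal weights: assumed


def expanded_vector(segments: list[Segment], i: int) -> dict[str, float]:
    """Segment i's own terms plus each neighbor's terms scaled by its weight."""
    vec = {t: float(c) for t, c in segments[i].terms.items()}
    for j, n in enumerate(segments):
        w = neighbor_weight(segments[i], n, abs(i - j))
        if w == 0.0:
            continue
        for t, c in n.terms.items():
            vec[t] = vec.get(t, 0.0) + w * c
    return vec
```

In this sketch a neighbor that shares no terms with the current segment can still contribute through its temporal, sequential, and visual similarity, which is what lets terms missing from one segment be picked up from its neighbors.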
In some embodiments, the textual terms of a segment, which are used to assess its textual similarity to its neighbors, can be generated in a number of ways. One standard text segmentation technique runs a fixed-length window over the text, computes a measure of coherence over the window (for example, a statistical, symbolic, or probabilistic measure), and thresholds the resulting value to generate coherent passages. Alternatively, lexical units such as paragraphs or sentences can be used to generate passages. Finally, text may be segmented into passages with a fixed word count. Although these techniques are traditionally used to split a document into multiple pieces, they can also be used in reverse, to join the text associated with neighboring segments into a single weight vector.
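The "reverse" use of window-based segmentation can be sketched as follows, purely as an assumption-laden example: at each boundary between neighboring segments, the bag of words in a fixed window before the boundary is compared with the bag of words after it, and the boundary is joined rather than split when the coherence reaches a threshold. Here the window is measured in segments, coherence is cosine similarity, and the `window` and `threshold` values are hypothetical choices.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bags of words, used as the coherence measure."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def join_coherent(texts: list[str], window: int = 1,
                  threshold: float = 0.2) -> list[list[int]]:
    """Group adjacent segments into passages by joining coherent boundaries."""
    if not texts:
        return []
    bags = [Counter(t.lower().split()) for t in texts]
    groups = [[0]]
    for k in range(1, len(bags)):
        # Words in the `window` segments before and after boundary k.
        left = sum((bags[j] for j in range(max(0, k - window), k)), Counter())
        right = sum((bags[j] for j in range(k, min(len(bags), k + window))), Counter())
        if cosine(left, right) >= threshold:
            groups[-1].append(k)   # coherent: join segment k to current passage
        else:
            groups.append([k])     # incoherent: start a new passage
    return groups


def passage_vector(texts: list[str], group: list[int]) -> Counter:
    """Single term vector for a joined passage."""
    return sum((Counter(texts[i].lower().split()) for i in group), Counter())


texts = ["stream indexing with slides", "slides indexing and OCR errors", "lunch menu today"]
print(join_coherent(texts))  # [[0, 1], [2]]
```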
In some embodiments, the weight vector can be computed based on a distance within some feature space, for example, but not limited to, a Euclidean or statistical distance, with features derived from one or more of the following factors:
In some embodiments, the term weight vector can be incorporated into an index once it is computed. Two exemplary strategies for incorporating the term weight vector are index-time grouping and query-time grouping.
Index-time grouping involves creating coherent documents based on groups of adjacent segments of sufficient similarity. Two or more adjacent segments can be grouped together into a single document, indexed with all their contained terms, and retrieved as a unit.
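A minimal sketch of index-time grouping, under the assumption that "sufficient similarity" means a cosine similarity between adjacent segments' term vectors at or above a hypothetical `threshold`: runs of adjacent, similar segments are merged into one document, an inverted index is built over those documents, and a query returns each matching group as a unit.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_grouped_index(segments: list[Counter], threshold: float = 0.3):
    """Merge adjacent similar segments into documents, then index the documents."""
    docs: list[list[int]] = []
    for i, seg in enumerate(segments):
        if docs and cosine(segments[i - 1], seg) >= threshold:
            docs[-1].append(i)     # similar to the previous segment: same document
        else:
            docs.append([i])       # otherwise start a new document
    index: dict[str, set[int]] = defaultdict(set)
    for d, members in enumerate(docs):
        for i in members:
            for term in segments[i]:
                index[term].add(d)  # document d contains all its segments' terms
    return docs, index


def search(docs: list[list[int]], index: dict[str, set[int]], term: str) -> list[list[int]]:
    """Return each matching document as its group of segment positions."""
    return [docs[d] for d in sorted(index.get(term, ()))]
```

Because grouping happens before indexing, the groups are fixed and independent of any particular query.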
In query-time grouping, segments are indexed individually, and then grouped after query evaluation to produce a query-biased grouping in which the weights of query terms or other related terms are boosted in computing the grouping.
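Query-time grouping can be illustrated with a similar sketch. Segments stay individually indexed; after the query is evaluated, each hit is grown into a run of adjacent segments whose pairwise similarity reaches a threshold, where the similarity is computed after multiplying query-term weights by a boost factor. The `boost` and `threshold` values and the grow-from-hit strategy are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse weighted term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def boosted(vec: Counter, query: set[str], boost: float) -> dict[str, float]:
    """Multiply the weight of every query term by `boost`."""
    return {t: c * (boost if t in query else 1.0) for t, c in vec.items()}


def query_time_groups(segments: list[Counter], query: set[str],
                      boost: float = 3.0, threshold: float = 0.6) -> list[list[int]]:
    """Group individually indexed segments around query hits (query-biased)."""
    hits = [i for i, seg in enumerate(segments) if seg.keys() & query]
    vecs = [boosted(seg, query, boost) for seg in segments]
    groups: list[list[int]] = []
    covered: set[int] = set()
    for h in hits:
        if h in covered:
            continue               # already part of an earlier hit's group
        lo = hi = h
        while lo > 0 and cosine(vecs[lo - 1], vecs[lo]) >= threshold:
            lo -= 1                # extend backwards over similar neighbors
        while hi + 1 < len(segments) and cosine(vecs[hi], vecs[hi + 1]) >= threshold:
            hi += 1                # extend forwards over similar neighbors
        groups.append(list(range(lo, hi + 1)))
        covered.update(range(lo, hi + 1))
    return groups


segs = [Counter({"ocr": 1, "intro": 1}), Counter({"ocr": 1, "errors": 1}),
        Counter({"lunch": 1}), Counter({"ocr": 1})]
print(query_time_groups(segs, {"ocr"}))             # [[0, 1], [3]]
print(query_time_groups(segs, {"ocr"}, boost=1.0))  # [[0], [1], [3]]
```

In the usage example, the boost raises the similarity of the first two segments from 0.5 to 0.9, so they are grouped only when the query term is boosted.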
In some embodiments, the segment group approach can compensate for OCR errors by increasing the likelihood that a correctly recognized term will be associated with a group of segments. As a non-limiting example, assume a term (feature) occurs in three consecutive segments and is misrecognized in two of the three. Without segment grouping, only the segment that contains the correctly recognized word would be retrieved. With segment grouping, the correctly spelled variant would be propagated to its neighboring segments, increasing the likelihood of retrieval.
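The three-segment example above can be reproduced with a small sketch. Here each segment's terms are propagated to its immediate neighbors with a fixed, hypothetical `weight`; the misrecognized spellings "segmentatlon" and "scgmentation" are invented for illustration.

```python
from collections import Counter


def propagate(segments: list[Counter], weight: float = 0.5) -> list[dict[str, float]]:
    """Add each immediate neighbor's terms to a segment, scaled by `weight`."""
    out = []
    for i, seg in enumerate(segments):
        vec = {t: float(c) for t, c in seg.items()}
        for j in (i - 1, i + 1):
            if 0 <= j < len(segments):
                for t, c in segments[j].items():
                    vec[t] = vec.get(t, 0.0) + weight * c
        out.append(vec)
    return out


def matches(vectors: list[dict[str, float]], term: str) -> list[int]:
    """Positions of the segments whose vector contains `term`."""
    return [i for i, v in enumerate(vectors) if v.get(term, 0) > 0]


# The term is recognized correctly only in the middle segment.
segments = [Counter(["segmentatlon", "video"]),
            Counter(["segmentation", "slides"]),
            Counter(["scgmentation", "index"])]
print(matches(segments, "segmentation"))             # [1]
print(matches(propagate(segments), "segmentation"))  # [0, 1, 2]
```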
One embodiment may be implemented using a conventional general purpose or a specialized digital computer or microprocessor(s) programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
One embodiment includes a computer program product which is a machine readable medium (media) having instructions stored thereon/in which can be used to program one or more computing devices to perform any of the features presented herein. The machine readable medium can include, but is not limited to, one or more types of disks including floppy disks, optical discs, DVD, CD-ROMs, micro drive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data. Stored on any one of the computer readable medium (media), the present invention includes software for controlling both the hardware of the general purpose/specialized computer or microprocessor, and for enabling the computer or microprocessor to interact with a human user or other mechanism utilizing the results of the present invention. Such software may include, but is not limited to, device drivers, operating systems, execution environments/containers, and applications.
The foregoing description of the preferred embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. Particularly, while the concept "module" is used in the embodiments of the systems and methods described above, it will be evident that such concept can be interchangeably used with equivalent concepts such as bean, class, method, type, component, interface, object model, and other suitable concepts. Embodiments were chosen and described in order to best describe the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.