The present teachings relate generally to methods and systems for enhancing visual content database retrieval, and more particularly, to platforms and techniques for performing visual search and retrieval by first combining various visual features derived from visual content, and then searching for similar visual content using the combined visual features and/or indexing the combined visual features.
Typically, when image/video search and retrieval are conducted, low-level visual features are used. For instance, a color histogram is used to compare the similarity between a query image/video and videos in a database. Recently, researchers have begun to pay greater attention to image/video retrieval using high-level visual features, such as retrieval based on visual concepts in an image/video.
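By way of illustration only, the color histogram comparison mentioned above can be sketched as follows. The function names `color_histogram` and `histogram_intersection`, the choice of four bins per channel, and the use of histogram intersection as the similarity measure are assumptions made for this sketch and are not specified by the present teachings.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 8-bit channel value to a bin index in 0..bins_per_channel-1.
        rb, gb, bb = (v * bins_per_channel // 256 for v in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels) or 1  # avoid division by zero for an empty frame
    return [h / total for h in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms; 1.0 means identical."""
    return sum(min(x, y) for x, y in zip(a, b))


# Hypothetical frames: a pure red frame, a pure blue frame, and a half/half frame.
red_frame = [(255, 0, 0)] * 4
blue_frame = [(0, 0, 255)] * 4
mixed_frame = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

red_vs_mixed = histogram_intersection(
    color_histogram(red_frame), color_histogram(mixed_frame)
)
```

Such a measure compares only color distributions, so two frames with unrelated content but similar colors would score as similar.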
However, there are limitations with using either low-level or high-level visual features alone to retrieve images/videos. For instance, low-level visual features do not take image content into consideration, so results retrieved using low-level visual features may reflect visual similarity without being meaningful. Using high-level visual features to retrieve images/videos may also return poor results because of the sensitivity of high-level visual feature extraction.
According to the present teachings in one or more aspects, methods and systems for enhancing visual content database retrieval are provided, in which a visual content retrieval system performs visual search and content retrieval by combining low-level and high-level visual features derived from visual content, and then searches for and retrieves similar visual content using the combined visual features and/or indexes the combined visual features. In general implementations of the present teachings, the visual content retrieval system can combine low-level and high-level visual descriptors of a query video into combined visual descriptors, and can then search for and retrieve one or more similar videos in a video database using the combined visual descriptors of the query video.
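As a minimal, non-limiting sketch, combining a low-level descriptor with a high-level descriptor and searching for nearest neighbors might look as follows. The weighted concatenation of L2-normalized vectors, the Euclidean distance, the `low_weight` parameter, and all function names are assumptions introduced for illustration; the present teachings do not prescribe a particular combination or distance.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_descriptors(
    low: Sequence[float], high: Sequence[float], low_weight: float = 0.5
) -> List[float]:
    """Concatenate normalized low-level and high-level descriptors.

    low_weight (hypothetical parameter) sets the share of squared distance
    contributed by the low-level part; the rest goes to the high-level part.
    """
    w_low = math.sqrt(low_weight)
    w_high = math.sqrt(1.0 - low_weight)
    return [w_low * x for x in l2_normalize(low)] + [
        w_high * x for x in l2_normalize(high)
    ]


def nearest_neighbors(
    query: Sequence[float], database: Dict[str, Sequence[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k database entries closest to the query, nearest first."""
    scored = sorted((math.dist(query, desc), vid) for vid, desc in database.items())
    return [(vid, dist) for dist, vid in scored[:k]]


# Hypothetical descriptors: the low-level part could be a color histogram and
# the high-level part a vector of visual-concept scores.
features = {
    "video_a": combine_descriptors([1.0, 0.0], [0.0, 1.0]),
    "video_b": combine_descriptors([0.0, 1.0], [1.0, 0.0]),
}
query = combine_descriptors([0.9, 0.1], [0.1, 0.9])
matches = nearest_neighbors(query, features, k=1)
```

Normalizing each part before concatenation keeps one feature type from dominating the distance merely because of its scale.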
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate aspects of the present teachings and together with the description, serve to explain principles of the present teachings. In the figures:
FIG. 1 illustrates an exemplary visual content retrieval system that performs visual searching and retrieval by combining low-level and high-level visual features derived from visual content, and then searching for similar visual content using the combined visual features and/or indexing the combined visual features, consistent with various embodiments of the present teachings;
Reference will now be made in detail to various embodiments of the present teachings, an example of which is illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
In the following description, reference is made to the accompanying drawings that form a part thereof, and in which are shown, by way of illustration, specific implementations in which the present teachings may be practiced. These implementations are described in sufficient detail to enable those skilled in the art to practice these implementations, and it is to be understood that other implementations may be utilized and that modifications and equivalents may be made without departing from the scope of the present teachings. The following description is, therefore, merely exemplary.
Additionally, in the subject description, the word “exemplary” is used to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion.
Aspects of the present teachings relate to systems and methods for enhancing visual content database retrieval. More particularly, in various aspects, and as for example generally shown in
According to various embodiments, and as generally shown in
Image processor 110 can process videos in video database 120 and populate video features database 130 offline, i.e., when not searching for nearest neighboring videos for query video 150, thus improving turnaround time when searching for the nearest neighboring videos. Although
As shown in
Subsequently, in 250, visual content retrieval system 100 can determine whether the visual content is a query video (e.g., query video 150) or not (e.g., a video in video database 120). If the visual content is determined to not be a query video, then processing 200 can proceed directly to 280. Alternatively, if in 250 the visual content is determined to be a query video, then in 260 visual content retrieval system 100 can use image visual content retriever 170 to search for and provide one or more nearest neighboring videos for the query video based on the combined visual descriptors. Next, in 270, visual content retrieval system 100 can store the query video in video database 120 for future retrieval.
In 280, visual content retrieval system 100 can store and/or index the combined visual descriptors of the visual content in video features database 130, which can then be used by visual content retrieval system 100 to search for and retrieve the visual content in the future. Finally, in 290, visual content retrieval system 100 can determine whether or not to continue processing 200. If yes, then processing 200 returns to 210; if no, then processing 200 ends.
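The branch between query and non-query content in 250 through 280 can be sketched as follows, purely as an illustration. The sketch starts from already-combined descriptors, and the `RetrievalState` class, the `process_visual_content` function, the in-memory dictionaries standing in for video database 120 and video features database 130, and the Euclidean distance are all hypothetical stand-ins rather than the described implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class RetrievalState:
    # Hypothetical in-memory stand-ins for the two databases.
    video_database: Dict[str, str] = field(default_factory=dict)  # id -> video reference
    features_database: Dict[str, List[float]] = field(default_factory=dict)


def process_visual_content(
    state: RetrievalState,
    video_id: str,
    video_ref: str,
    combined: Sequence[float],
    is_query: bool,
    k: int = 1,
) -> List[str]:
    """Handle one item; return nearest-neighbor ids (empty for non-query content)."""
    neighbors: List[str] = []
    if is_query:  # 250: the content is a query video
        # 260: rank previously indexed videos by distance to the query descriptors.
        ranked = sorted(
            state.features_database.items(),
            key=lambda item: math.dist(combined, item[1]),
        )
        neighbors = [vid for vid, _ in ranked[:k]]
        # 270: keep the query video for future retrieval.
        state.video_database[video_id] = video_ref
    # 280: store/index the combined descriptors for both kinds of content.
    state.features_database[video_id] = list(combined)
    return neighbors


state = RetrievalState()
first = process_visual_content(state, "a", "a.mp4", [0.0, 0.0], is_query=False)
second = process_visual_content(state, "b", "b.mp4", [1.0, 1.0], is_query=False)
result = process_visual_content(state, "q", "q.mp4", [0.1, 0.0], is_query=True)
```

Because the query's descriptors are indexed only after the search, a query video is never returned as its own nearest neighbor in this sketch.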
As shown, system 300 may include at least one processor 302, a keyboard 317, a pointing device 318 (e.g., a mouse, a touchpad, and the like), a display 316, main memory 310, an input/output controller 315, and a storage device 314. Storage device 314 can comprise, for example, RAM, ROM, flash memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. A copy of the computer program embodiment of visual content retrieval system 100 can be stored on, for example, storage device 314. System 300 may also be provided with additional input/output devices, such as a printer (not shown). The various components of system 300 communicate through a system bus 312 or similar architecture. In addition, system 300 may include an operating system (OS) 320 that resides in memory 310 during operation. One skilled in the art will recognize that system 300 may include multiple processors 302. For example, system 300 may include multiple copies of the same processor. Alternatively, system 300 may include a heterogeneous mix of various types of processors. For example, system 300 may use one processor as a primary processor and other processors as co-processors. For another example, system 300 may include one or more multi-core processors and one or more single core processors. Thus, system 300 may include any number of execution cores across a set of processors (e.g., processor 302). As to keyboard 317, pointing device 318, and display 316, these components may be implemented using components that are well known to those skilled in the art. One skilled in the art will also recognize that other components and peripherals may be included in system 300.
Main memory 310 serves as a primary storage area of system 300 and holds data that is actively used by applications, such as visual content retrieval system 100, running on processor 302. One skilled in the art will recognize that applications are software programs that each contain a set of computer instructions for instructing system 300 to perform a set of specific tasks during runtime, and that the term “applications” may be used interchangeably with application software, application programs, and/or programs in accordance with embodiments of the present teachings. Memory 310 may be implemented as a random access memory or other forms of memory as described below, which are well known to those skilled in the art.
OS 320 is an integrated collection of routines and instructions that are responsible for the direct control and management of hardware in system 300 and system operations. Additionally, OS 320 provides a foundation upon which to run application software. For example, OS 320 may perform services, such as resource allocation, scheduling, input/output control, and memory management. OS 320 may be predominantly software, but may also contain partial or complete hardware implementations and firmware. Well known examples of operating systems that are consistent with the principles of the present teachings include MICROSOFT WINDOWS (e.g., WINDOWS CE, WINDOWS NT, WINDOWS 2000, WINDOWS XP, and WINDOWS VISTA), MAC OS, LINUX, UNIX, ORACLE SOLARIS, OPEN VMS, and IBM AIX.
The foregoing description is illustrative, and variations in configuration and implementation may occur to persons skilled in the art. For instance, the various illustrative logics, logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor (e.g., processor 302), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. For a software implementation, the techniques described herein can be implemented with modules (e.g., procedures, functions, subprograms, programs, routines, subroutines, modules, software packages, classes, and so on) that perform the functions described herein. A module can be coupled to another module or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, or the like can be passed, forwarded, or transmitted using any suitable means including memory sharing, message passing, token passing, network transmission, and the like. The software codes can be stored in memory units and executed by processors. The memory unit can be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both tangible computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example, and not limitation, such tangible computer-readable media can comprise RAM, ROM, flash memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include CD, laser disc, optical disc, DVD, floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Combinations of the above should also be included within the scope of computer-readable media. Resources described as singular or integrated can in one embodiment be plural or distributed, and resources described as multiple or distributed can in embodiments be combined. The scope of the present teachings is accordingly intended to be limited only by the following claims, and modifications and equivalents may be made to the features of the claims without departing from the scope of the present teachings.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201210280182.3 | Aug 2012 | CN | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2013/053937 | 8/7/2013 | WO | 00 |