Embodiments relate generally to digital image processing. More specifically, embodiments relate to a triangle mesh based image descriptor.
With the increasing popularity of mobile camera devices and high-speed mobile Internet, the ability to search an image database for images that match a query image captured by a mobile camera device is desirable. The query image may be sent, e.g., via a mobile Internet connection, to an image search system so that similar images may be searched for in a remote image database. Further, mobile device users would typically like to find context (i.e., semantic) information related to visual objects in the physical world by taking a picture of the objects. As such, a service that allows mobile device users to take pictures of objects and receive information about the objects from a database containing semantic context information about images would be regarded as a value-added service. For example, by taking a picture of a movie advertisement that includes a picture, context information about the movie, such as who the main actors and actresses are, what the movie is about, and the like, may be displayed for review by the user on the display screen of the mobile terminal.
A visual interaction based service of the type mentioned above depends heavily upon being able to effectively and efficiently perform image matching, which, in turn, depends upon having a robust descriptor of image content. It is typically quite challenging to define an image descriptor that is insensitive to image noise, geometric distortion, spatial translation variation, rotation variation, scale variation, and the like, and that still optimizes both local and global image content representations. Conventional image descriptors typically can be classified into two categories: global image descriptors and local image descriptors. Global image descriptors describe the image as a whole, using representations such as a global histogram, an image gradient, and so on. Local image descriptors, by contrast, are local features extracted around previously detected interest points, and the image is represented by all or some of these local features. Although local descriptors have strong representative ability, they typically fail to utilize global information of the image, thereby limiting their effectiveness in representing image content. As such, an improved image descriptor would advance the art.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the invention. The summary is not an extensive overview of the invention. It is neither intended to identify key or critical elements of the invention nor to delineate the scope of the invention. The following summary merely presents some concepts of the invention in a simplified form as a prelude to the more detailed description below.
Embodiments are directed to creating a triangle mesh by using a distance-minimum criterion on a plurality of feature points detected from an image, computing, based on the triangle mesh, global features that describe a global representation of content of the image, and computing, based on the triangle mesh, local features that describe a local representation of content of the image. The global features may include a triangle distribution scatter of mesh that shows a texture density of the content of the image and a color histogram of mesh region that represents image color information corresponding to a mesh region of interest. The local features may include a definition of each mesh triangle shape via its three angles and a color histogram of each mesh triangle to represent image color information corresponding to each triangle region.
A more complete understanding of the present invention and the advantages thereof may be acquired by referring to the following description in consideration of the accompanying drawings, in which like reference numbers indicate like features, and wherein:
In the following description of the various embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration various embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional modifications may be made without departing from the scope of the present invention.
The mobile terminal 10 includes an antenna 12 in operable communication with a transmitter 14 and a receiver 16. The mobile terminal 10 further includes a controller 20 or other processing element that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively. The signals include signaling information in accordance with the air interface standard of the applicable cellular system, and also user speech and/or user generated data. In this regard, the mobile terminal 10 is capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the mobile terminal 10 is capable of operating in accordance with any of a number of first, second and/or third-generation communication protocols or the like. For example, the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA) or third-generation wireless communication protocol Wideband Code Division Multiple Access (WCDMA).
It is understood that the controller 20 includes circuitry required for implementing audio and logic functions of the mobile terminal 10. For example, the controller 20 may be comprised of a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities. The controller 20 thus may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller 20 can additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory. For example, the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content, according to a Wireless Application Protocol (WAP), for example.
The mobile terminal 10 also comprises a user interface including an output device such as a conventional earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, all of which are coupled to the controller 20. The user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown) or other input device. In embodiments including the keypad 30, the keypad 30 may include the conventional numeric (0-9) and related keys (#, *), and other keys used for operating the mobile terminal 10. Alternatively, the keypad 30 may include a conventional QWERTY keypad. The mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are required to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
In certain embodiments, the mobile terminal 10 includes a camera module 36 in communication with the controller 20. The camera module 36 may be any means for capturing an image for storage, display or transmission. For example, the camera module 36 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 36 includes all hardware, such as a lens or other optical device, and software necessary for creating a digital image file from a captured image. Alternatively, the camera module 36 may include only the hardware needed to view an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image. In certain embodiments, the camera module 36 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a JPEG standard format.
The mobile terminal 10 may further include a user identity module (UIM) 38. The UIM 38 is typically a memory device having a processor built in. The UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), etc. The UIM 38 typically stores information elements related to a mobile subscriber. In addition to the UIM 38, the mobile terminal 10 may be equipped with memory. For example, the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The mobile terminal 10 may also include other non-volatile memory 42, which can be embedded and/or may be removable. The non-volatile memory 42 can additionally or alternatively comprise an EEPROM, flash memory or the like, such as that available from the SanDisk Corporation of Sunnyvale, Calif., or Lexar Media Inc. of Fremont, Calif. The memories can store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10. For example, the memories can include an identifier, such as an international mobile equipment identification (IMEI) code, capable of uniquely identifying the mobile terminal 10.
The MSC 46 may be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN). The MSC 46 may be directly coupled to the data network. In certain embodiments, however, the MSC 46 is coupled to a GTW 48, and the GTW 48 is coupled to a WAN, such as the Internet 50. In turn, devices such as processing elements (e.g., personal computers, server computers or the like) can be coupled to the mobile terminal 10 via the Internet 50. For example, as explained below, the processing elements may include one or more processing elements associated with a computing system 52 (two shown in
The BS 44 may also be coupled to a signaling GPRS (General Packet Radio Service) support node (SGSN) 56. As known to those skilled in the art, the SGSN 56 is typically capable of performing functions similar to the MSC 46 for packet switched services. The SGSN 56, like the MSC 46, may be coupled to a data network, such as the Internet 50. The SGSN 56 may be directly coupled to the data network. In certain embodiments, however, the SGSN 56 is coupled to a packet-switched core network, such as a GPRS core network 58. The packet-switched core network is then coupled to another GTW 48, such as a GTW GPRS support node (GGSN) 60, and the GGSN 60 is coupled to the Internet 50. In addition to the GGSN 60, the packet-switched core network may also be coupled to a GTW 48. Also, the GGSN 60 may be coupled to a messaging center. In this regard, the GGSN 60 and the SGSN 56, like the MSC 46, may be capable of controlling the forwarding of messages, such as MMS messages. The GGSN 60 and SGSN 56 may also be capable of controlling the forwarding of messages for the mobile terminal 10 to and from the messaging center.
In addition, by coupling the SGSN 56 to the GPRS core network 58 and the GGSN 60, devices such as a computing system 52 and/or origin server 54 may be coupled to the mobile terminal 10 via the Internet 50, SGSN 56 and GGSN 60. In this regard, devices such as the computing system 52 and/or origin server 54 may communicate with the mobile terminal 10 across the SGSN 56, GPRS core network 58 and the GGSN 60. By directly or indirectly connecting mobile terminals 10 and the other devices (e.g., computing system 52, origin server 54, etc.) to the Internet 50, the mobile terminals 10 may communicate with the other devices and with one another, such as according to the Hypertext Transfer Protocol (HTTP), to thereby carry out various functions of the mobile terminals 10.
Although not every element of every possible mobile network is shown and described herein, it should be appreciated that the mobile terminal 10 may be coupled to one or more of any of a number of different networks through the BS 44. In this regard, the network(s) may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G) and/or future mobile communication protocols or the like. For example, one or more of the network(s) can be capable of supporting communication in accordance with 2G wireless communication protocols IS-136 (TDMA), GSM, and IS-95 (CDMA). Also, for example, one or more of the network(s) may be capable of supporting communication in accordance with 2.5G wireless communication protocols GPRS, Enhanced Data GSM Environment (EDGE), or the like. Further, for example, one or more of the network(s) may be capable of supporting communication in accordance with 3G wireless communication protocols such as Universal Mobile Telephone System (UMTS) network employing Wideband Code Division Multiple Access (WCDMA) radio access technology. Some narrow-band AMPS (NAMPS), as well as TACS, network(s) may also benefit from embodiments of the present invention, as should dual or higher mode mobile stations (e.g., digital/analog or TDMA/CDMA/analog phones).
The mobile terminal 10 may be further coupled to one or more wireless access points (APs) 62. The APs 62 may comprise access points configured to communicate with the mobile terminal 10 in accordance with techniques such as, for example, radio frequency (RF), Bluetooth (BT), infrared (IrDA) or any of a number of different wireless networking techniques, including wireless LAN (WLAN) techniques such as IEEE 802.11 (e.g., 802.11a, 802.11b, 802.11g, 802.11n, etc.), WiMAX techniques such as IEEE 802.16, and/or ultra wideband (UWB) techniques such as IEEE 802.15 or the like. The APs 62 may be coupled to the Internet 50. Like with the MSC 46, the APs 62 may be directly coupled to the Internet 50. In certain embodiments, however, the APs 62 are indirectly coupled to the Internet 50 via a GTW 48. Furthermore, in certain embodiments, the BS 44 may be considered as another AP 62. As will be appreciated, by directly or indirectly connecting the mobile terminals 10 and the computing system 52, the origin server 54, and/or any of a number of other devices, to the Internet 50, the mobile terminals 10 may communicate with one another, the computing system, etc., to thereby carry out various functions of the mobile terminals 10, such as to transmit data, content or the like to, and/or receive content, data or the like from, the computing system 52. As used herein, the terms “data,” “content,” “information” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments.
Although not shown in
Certain embodiments will now be discussed with reference to
The system of
A triangle mesh constructor 404 is configured to create a triangle mesh 405 with feature points as mesh nodes based on distance-minimum criterion, as is discussed in more detail below. The resulting triangle mesh is affine-invariant in accordance with certain embodiments.
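By way of illustration, one possible distance-minimum construction may be sketched as follows: candidate edges between the detected feature points are considered in order of increasing length, and an edge is accepted only if it crosses no previously accepted edge. This greedy sketch is merely one plausible reading of the distance-minimum criterion; the function names and the crossing test are illustrative and not part of the embodiments described herein.

```python
from itertools import combinations

def _orient(a, b, c):
    # Sign of the cross product (b - a) x (c - a):
    # > 0 left turn, < 0 right turn, 0 collinear.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _properly_cross(p1, p2, q1, q2):
    # True if segments p1p2 and q1q2 cross at an interior point;
    # segments that merely share an endpoint do not count as crossing.
    if len({p1, p2, q1, q2}) < 4:
        return False
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def greedy_mesh_edges(points):
    """Greedy distance-minimum mesh: consider all point pairs in order of
    increasing length and accept an edge only if it crosses no accepted edge."""
    pairs = sorted(combinations(points, 2),
                   key=lambda e: (e[0][0] - e[1][0]) ** 2 + (e[0][1] - e[1][1]) ** 2)
    edges = []
    for a, b in pairs:
        if all(not _properly_cross(a, b, c, d) for c, d in edges):
            edges.append((a, b))
    return edges
```

For four feature points at the corners of a unit square, the sketch yields the four sides plus one diagonal (five edges), i.e., two mesh triangles.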
A global descriptor computation module 406 is configured to compute global features that describe a global representation of image content. Based on the constructed triangle mesh, the global descriptor computation module computes a triangle distribution scatter of mesh, which shows the texture density of the image content, and a color histogram of mesh region, which represents image color information corresponding to a mesh region of interest. These global features, namely, the triangle distribution scatter of mesh and the color histogram of mesh region, describe a global representation of image content.
A local descriptor computation module 408 is configured to compute local features that describe a local representation of image content. The local descriptor computation module defines each mesh triangle shape by its three angles, and computes a color histogram of each mesh triangle to represent image color information corresponding to the triangle region. These local features, namely, mesh triangle shape and mesh triangle color histogram, describe a local representation of image content. The shape of a triangle may be represented by its three angles: if the three angles of two triangles are the same, then the two triangles are similar, differing at most by a scale variation, i.e., they have the same shape (although not necessarily the same size). The similarity of the shapes of two triangles may therefore be evaluated by the similarity of the corresponding angles.
A patch feature computation module 410 is configured to compute a patch feature around each feature point region based on the detected feature points in an image. The patch feature computation module may be configured to compute patch features in accordance with various techniques, several of which are known in the art, including, but not limited to, the discussion of computation of patch features in D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Cascade Filtering Approach. IJCV, 60: 91-110, 2004, and the discussion of computation of patch features in H. Bay, et al., SURF: Speeded Up Robust Features. ECCV, I:404-417, 2006.
A triangle mesh based image descriptor 412 comprises the global features, the local features, and the patch feature discussed above.
Feature points are detected from an input image, as shown at 502. Feature point detection from an image may be performed by various methods, which are known in the art, as is discussed in A. Alexandrov. Corner Detection Overview and Comparison. Computer Vision “http://www.cisnav.com/alex/cs558/CornerDet.pdf”, 2002. The robust feature point detection method disclosed in U.S. patent application Ser. No. 11/428,903, filed Jul. 6, 2006, by Kongqiao Wang et al., may also be used. To summarize this feature point detection method, assume a potential corner block that has eight neighboring blocks surrounding the potential corner block of a particular image frame. Each of the blocks may represent a pixel. As such, each of the blocks may include a greyscale value descriptive of information associated with the pixel. Alternatively, each of the blocks may represent a group of pixels. In any case, if the potential corner block is a pixel or pixel group representing a corner of an object in a particular image, then there should be a relatively large greyscale or color difference between the corner block and the eight neighboring blocks in at least two directions. Meanwhile, if the potential corner block instead represents, for example, a portion of a side edge of an object, then only blocks along the edge (i.e., blocks in one direction) may have substantially different greyscale values than that of the potential corner block, while all remaining blocks may have substantially similar greyscale values to that of the potential corner block. Furthermore, if the potential corner block is disposed at an interior portion of an object, the potential corner block may have a substantially similar greyscale value to that of each of the eight neighboring blocks.
The difference in energy amount E between a given image block and the eight neighboring blocks of the given image block is written as shown in Equation (1) below.

E(x,y)=Σu,vWu,v(Ix+u,y+v−Ix,y)² (1)
In Equation (1), Ix,y represents the given image block, Ix+u,y+v represents the eight neighboring blocks, and Wu,v represents the weighted values for each of the eight neighboring blocks. The above formula is expanded as a Taylor series at (x, y) as shown below in Equation (2).

E(x,y)=Σu,vWu,v[xX+yY+O(x²,y²)]² (2)
In Equation (2), X=I∗(−1,0,1)=∂I/∂x, and Y=I∗(−1,0,1)T=∂I/∂y, where ∗ denotes convolution. Further, Equation (2) can be rewritten as shown below in Equation (3).
E(x,y)=Ax²+2Cxy+By² (3)
In Equation (3), A=X²∗w, B=Y²∗w, and C=(XY)∗w. Variable w represents a window region including the potential corner block and the eight neighboring blocks with a center at point (x,y). Finally, E(x,y) can be written in matrix form as shown in Equation (4) below.

E(x,y)=(x,y)M(x,y)T, where M=(A C; C B) (4)
In Equation (4), M describes the shape of E(x,y). If both eigenvalues of M are relatively small, then the given block is likely part of a smooth region. If both eigenvalues of M are relatively large, and E(x,y) shows a deep valley, then the given block likely includes a corner. If one eigenvalue is relatively large, while the other eigenvalue is relatively small, then the given block likely includes an edge.
Throughout an image frame, the two M eigenvalues of each pixel point are calculated by a feature extractor, and those points for which both M eigenvalues are relatively large may be considered to be potential corners or feature points. The potential corners in a same frame are then sorted by the smaller of their two M eigenvalues, and a predetermined number of feature points having the largest such values are selected from among the potential corners.
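The selection described above may be illustrated by the following sketch, which computes, for each interior pixel of a small greyscale image, the smaller eigenvalue of M from 3x3 sums of squared central-difference gradients, and keeps the points with the largest such values. The window size, the gradient operator, and the uniform weights are illustrative assumptions rather than the specific values of any embodiment.

```python
import math

def min_eigenvalues(img):
    """Smaller eigenvalue of M = [[A, C], [C, B]] at each interior pixel,
    with A, B, C accumulated over a 3x3 window of central-difference
    gradients (uniform weights w assumed)."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0  # X ~ dI/dx
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0  # Y ~ dI/dy
    lam = {}
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            A = B = C = 0.0
            for v in range(-1, 2):
                for u in range(-1, 2):
                    X, Y = gx[y + v][x + u], gy[y + v][x + u]
                    A += X * X
                    B += Y * Y
                    C += X * Y
            # Closed form for the smaller eigenvalue of a 2x2 symmetric matrix.
            lam[(x, y)] = ((A + B) - math.sqrt((A - B) ** 2 + 4 * C * C)) / 2.0
    return lam

def feature_points(img, k):
    """Select the k points whose smaller M eigenvalue is largest."""
    lam = min_eigenvalues(img)
    return sorted(lam, key=lam.get, reverse=True)[:k]
```

For a bright square against a dark background, the highest-ranked point falls near the square's corner, while pure edges and smooth regions score near zero, matching the eigenvalue analysis of Equation (4).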
A triangle mesh is constructed as shown at 504. Based on the detected feature points,
Extraction of a triangle mesh based descriptor may be performed on the constructed triangle mesh and corresponding image content. Four components, namely, a triangle structure, a triangle distribution scatter of mesh, a color histogram of triangle mesh region, and a patch feature, may be generated as follows.
To generate a triangle structure in accordance with certain embodiments, as shown at 506, assume that the three angles of a mesh triangle Tj are αj, βj, and γj, and that the three angles are arranged counterclockwise such that αj≤βj and αj≤γj (γj=π−αj−βj). Based on the geometric structure feature of the triangle, we further define a measurement to evaluate a degree of similarity of two triangles Ti and Tj:
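Because Equation (5) is not reproduced here, the following sketch assumes one plausible angle-based measurement: the sorted interior angles of the two triangles are compared, and their summed absolute difference is normalized so that identical shapes score 1.0. Both the normalization and the function names are assumptions for illustration only.

```python
import math

def angles(tri):
    """Interior angles of a triangle given as three (x, y) vertices,
    returned in ascending order (alpha <= beta <= gamma)."""
    a0, b0, c0 = tri
    a = math.dist(b0, c0)  # side length opposite vertex a0
    b = math.dist(a0, c0)
    c = math.dist(a0, b0)
    alpha = math.acos((b * b + c * c - a * a) / (2 * b * c))  # law of cosines
    beta = math.acos((a * a + c * c - b * b) / (2 * a * c))
    return sorted((alpha, beta, math.pi - alpha - beta))

def shape_similarity(ti, tj):
    """Assumed measurement: 1 minus the summed absolute differences of the
    sorted angles, normalized by 2*pi so identical shapes score 1.0."""
    return 1.0 - sum(abs(p - q) for p, q in zip(angles(ti), angles(tj))) / (2 * math.pi)
```

Two triangles related by a pure scale change have equal angle triples and therefore score 1.0 under this measurement.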
A triangle distribution scatter may be defined for each mesh, as shown at 508, to evaluate a global texture density of an image. First, compute the centroid cj(xj,yj) of each triangle Tj. Then, compute the center point of the centroids for the triangles in the mesh:

c0(x0,y0)=(1/N)Σj=1..N cj(xj,yj) (6)
Here, N is the number of triangles in the mesh.
Finally, the scatter R for triangle distribution of this mesh may be calculated according to Eq. (7):
Here, dj=∥cj(xj,yj)−c0(x0,y0)∥ denotes the distance from the centroid of triangle Tj to the center point c0, and W and H are the width and height of the image, respectively.
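The scatter computation may be sketched as follows. Since Equation (7) is not reproduced here, the normalization of the mean centroid distance by the image diagonal, i.e., the square root of W²+H², is an assumption for illustration.

```python
import math

def triangle_centroid(tri):
    # Centroid c_j of triangle T_j: the mean of its three vertices.
    xs, ys = zip(*tri)
    return sum(xs) / 3.0, sum(ys) / 3.0

def distribution_scatter(triangles, W, H):
    """Mean distance d_j of the triangle centroids c_j from their center
    point c_0, normalized (by assumption) by the image diagonal."""
    N = len(triangles)
    cents = [triangle_centroid(t) for t in triangles]
    x0 = sum(c[0] for c in cents) / N  # center point of the centroids
    y0 = sum(c[1] for c in cents) / N
    d = [math.hypot(c[0] - x0, c[1] - y0) for c in cents]
    return (sum(d) / N) / math.hypot(W, H)
```

A mesh whose triangles cluster tightly (texture concentrated in one region) yields a small scatter, while triangles spread across the image yield a larger one.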
To compute a color histogram, as shown at 510, first, we classify the R, G, B (red, green, blue) color space into several relatively small subspaces, and then count the number of pixels falling in each relatively small color subspace.
Formally, for each triangle Tj, its color histogram
Here, bin is the number of categories into which each color level is classified, n is the total number of pixels in the triangle Tj, and C is a counting function, which is defined as:
Having obtained the color histograms for the individual triangles, integrating them yields the color histogram of the triangle mesh region
Here,
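The per-triangle color histogram and its integration over the mesh may be sketched as follows. The quantization into bins³ cells, the normalization by the triangle's pixel count n, and the unweighted averaging used to integrate the triangle histograms are illustrative assumptions.

```python
def _inside(p, tri):
    # Point-in-triangle test: p lies inside (or on) tri when the three
    # edge cross products share a sign.
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    a, b, c = tri
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

def triangle_histogram(img, tri, bins=4):
    """Quantize each of R, G, B into `bins` levels and count the pixels of
    `img` (a 2-D grid of (r, g, b) tuples) falling inside triangle `tri`;
    the counts are normalized by the triangle's pixel count n."""
    hist = [0] * (bins ** 3)
    step = 256 // bins  # width of each quantized color level
    n = 0
    for y, row in enumerate(img):
        for x, (r, g, b) in enumerate(row):
            if _inside((x, y), tri):
                hist[(r // step) * bins * bins + (g // step) * bins + b // step] += 1
                n += 1
    return [v / n for v in hist] if n else hist

def mesh_histogram(img, triangles, bins=4):
    """Integrate the per-triangle histograms into a single mesh-region
    histogram (here an unweighted average, by assumption)."""
    per = [triangle_histogram(img, t, bins) for t in triangles]
    return [sum(col) / len(per) for col in zip(*per)]
```

For a uniformly red region, all of the histogram mass falls in the single quantized cell containing (255, 0, 0), and the histogram sums to 1.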
To represent the local content of the image, the local patch feature, as shown at 512, may be computed in a manner known in the art, such as the manner disclosed in D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Cascade Filtering Approach, IJCV, 60: 91-110, 2004, and the manner disclosed in H. Bay, et al., SURF: Speeded Up Robust Features, ECCV, I: 404-417, 2006. For each feature point, a square region centered around the feature point is constructed, and the patch feature vector is extracted. Any feature transformation may be performed on such square patches to get the feature representation. For example, in certain embodiments, a patch feature may be constructed in accordance with the discussion on pages 409-41 of H. Bay, et al., SURF: Speeded Up Robust Features, ECCV, I: 404-417, 2006. Commonly adopted features include the Haar feature, the Gabor feature, the histogram feature, and the like. In accordance with certain embodiments, however, the patch feature is not limited to the Haar feature, Gabor feature, histogram feature, or the like; instead, the patch feature may be any one or more of various suitable image features.
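As an illustration of a simple patch feature of the kind described above, the following sketch computes an L2-normalized orientation histogram of greyscale gradients over a square region centered on a feature point. It merely stands in for SIFT- or SURF-style descriptors; the region size, bin count, and function name are assumptions, not the descriptor of either cited reference.

```python
import math

def patch_feature(gray, cx, cy, half=4, nbins=8):
    """Toy patch descriptor: an L2-normalized orientation histogram of
    central-difference gradients over the (2*half + 1)-pixel square
    centered on feature point (cx, cy) of greyscale image `gray`."""
    hist = [0.0] * nbins
    h, w = len(gray), len(gray[0])
    for y in range(max(1, cy - half), min(h - 1, cy + half + 1)):
        for x in range(max(1, cx - half), min(w - 1, cx + half + 1)):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % (2 * math.pi)
            # Accumulate gradient magnitude into the orientation bin.
            hist[int(ang / (2 * math.pi) * nbins) % nbins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

For a patch straddling a vertical intensity edge, the gradient energy concentrates in the horizontal-orientation bin and the resulting vector has unit length.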
A triangle mesh based image descriptor in accordance with certain embodiments describes local features of an image, e.g., a patch description of each feature point and a color histogram of the image on each mesh triangle region, and also describes the global structure feature of the image through the neighboring relationship between mesh triangles and the distribution scatter of the triangle mesh. The constructed triangle mesh is affine invariant, i.e., scale-invariant, translation-invariant, rotation-invariant, and the like. Each triangle can accurately characterize its local content in the image, and the mesh scatter describes the texture density of the image.
One or more aspects of the invention may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. As will be appreciated by one of skill in the art, the functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), and the like. The term “processor” and “memory” comprising executable instructions should be interpreted to include the variations described in this paragraph and equivalents thereof.
For example, in certain embodiments, functions, including, but not limited to, the following functions, may be performed by a processor executing computer-executable instructions that are recorded on a computer-readable medium: constructing a triangle mesh to describe an image based on feature points detected from the image; extracting a triangle mesh based descriptor from the constructed triangle mesh and from content of the image by computing a triangle structure, a triangle distribution scatter of mesh, a color histogram of triangle mesh region, and a patch feature; (a) computing lines composed by the detected feature points,
Embodiments include any novel feature or combination of features disclosed herein either explicitly or any generalization thereof. While embodiments have been described with respect to specific examples including presently preferred modes of carrying out the invention, those skilled in the art will appreciate that there are numerous variations and permutations of the above described systems and techniques. Thus, the scope of the invention should be construed broadly as set forth in the appended claims.