This disclosure relates generally to computer vision based object recognition applications, and in particular but not exclusively, relates to building feature databases for such systems.
A challenge to enabling Augmented Reality (AR) on mobile phones or other mobile platforms is the problem of detecting and tracking objects in real-time. Object detection for AR applications has very demanding requirements: it must deliver a full six-degrees-of-freedom pose, give absolute measurements with respect to a given coordinate system, be very robust, and run in real-time. Of interest are methods to compute camera pose using computer vision (CV) based approaches, which rely on first detecting, and subsequently tracking, objects within the camera view. In one aspect, the detection operation includes detecting a set of features contained within the digital image. A feature may refer to a region in the digital image that differs in properties, such as brightness or color, compared to areas surrounding that region. In one aspect, a feature is a region of a digital image in which some properties are constant or vary within a prescribed range of values.
The detected features are then compared to known features contained in a feature database in order to determine whether a real-world object is present in the image. Thus, an important element in the operation of a vision-based AR system is the composition of the feature database. In some systems, the feature database is built pre-runtime by taking multiple sample images of known target objects from a variety of known viewpoints. Features are then extracted from these sample images and added to the feature database. However, storing every extracted feature results in prohibitively large databases, which lead to poor performance.
Some embodiments discussed herein provide a feature database for object recognition/detection that is generated by pruning similar features extracted from multi-view sample images of a known object. In general, features are extracted from the multi-view images, and a derived feature that is representative of a group of similar features is generated and stored in the database. The group of similar features may then be discarded (i.e., pruned). Thus, the database avoids storing many similar features, and the unmanageable database size that would result. Accordingly, the derived features that are added to the database are not the extracted features themselves, but instead are each derived from a group of like extracted features.
According to one aspect of the present disclosure, a method of building a database for an object recognition system includes acquiring several multi-view images of a target object and then extracting a first set of features from the images. In one example, the first set of features is limited to only those features which correspond to the target object. Next, one of these extracted features is selected and a second set of features is determined based on the selected feature. In one example, the features included in the second set are taken from the first set of features and include those features that have both a descriptor that matches (e.g., is similar to) the descriptor of the selected feature and a keypoint location that is the same as, or proximate to, the keypoint location of the selected feature. In one example, if a repeatability of the selected feature is greater than a repeatability threshold and if a discriminability is greater than a discriminability threshold, then at least one derived feature is stored to the database, where the derived feature is representative of the second set of features. The second set of features may then be discarded and the process repeated for each remaining feature included in the first set of extracted features.
According to another aspect of the present disclosure, a computer-readable medium including program code stored thereon is provided. The program code is configured to build a database containing a plurality of features corresponding to a 3-dimensional (3D) target object and includes instructions to acquire a plurality of images of the target object, where each of the plurality of images is acquired from a distinct and known viewpoint of the target object. The program code also includes instructions to extract a first set of features from the plurality of images, where each extracted feature includes a descriptor and a corresponding keypoint location. A feature is then selected from the first set of features and a series of instructions is performed on the selected feature. For example, a second set of features may be features chosen from the first set of features that have both a descriptor that matches a descriptor of the selected feature and a keypoint location proximate to a keypoint location of the selected feature. Next, a repeatability and a discriminability of the selected feature are determined. A derived feature, representative of the entire second set, is then stored based on the repeatability and discriminability of the selected feature.
In yet another aspect of the present disclosure an apparatus includes both memory and a processing unit. The memory is adapted to store program code for building a database containing a plurality of features corresponding to a 3-dimensional (3D) target object. The processing unit is coupled to the memory and adapted to access and execute instructions included in the program code. When the instructions are executed by the processing unit, the processing unit directs the apparatus to acquire a plurality of images of the target object, where each of the plurality of images is acquired from a distinct and known viewpoint of the target object. The processing unit also directs the apparatus to extract a first set of features from the plurality of images, where each extracted feature includes a descriptor and a corresponding keypoint location. The processing unit then selects a feature from the first set of features; and then, (a) determines a second set of features corresponding to the selected feature, wherein the second set of features includes features of the first set that have both a descriptor that matches a descriptor of the selected feature and a keypoint location proximate to a keypoint location of the selected feature; (b) determines a repeatability of the selected feature; (c) determines a discriminability of the selected feature; and (d) stores at least one derived feature based, at least, on the repeatability of the selected feature, a repeatability threshold, the discriminability of the selected feature, and a discriminability threshold, wherein the at least one derived feature is representative of the second set of features.
An apparatus according to another aspect of the present disclosure is for use in building a database containing a plurality of features corresponding to a 3-dimensional (3D) target object. The apparatus includes means for acquiring a plurality of images of the target object, where each of the plurality of images is acquired from a distinct and known viewpoint of the target object. The apparatus also includes means for extracting a first set of features from the plurality of images, where each extracted feature includes a descriptor and a corresponding keypoint location. Also included in the apparatus are means for selecting a feature from the first set of features, and then: (a) determining a second set of features corresponding to the selected feature, wherein the second set of features includes features of the first set that have both a descriptor that matches a descriptor of the selected feature and a keypoint location proximate to a keypoint location of the selected feature; (b) determining a repeatability of the selected feature; (c) determining a discriminability of the selected feature; and (d) storing at least one derived feature based, at least, on the repeatability of the selected feature, a repeatability threshold, the discriminability of the selected feature, and a discriminability threshold, wherein the at least one derived feature is representative of the second set of features.
The present disclosure also provides a method of building a pruned database from an existing model containing a first set of features. This method includes rendering several synthetic images of a target object. Features are then extracted from the synthetic images to create a second set of features. The extracted features of the second set are matched to features included in the first set. Then it is determined how many matches there are for each feature of the first set. The feature with the most matches is added to the pruned database and then removed from the first set. The feature from the first set with the next most matches is then added to the pruned database and so on, until each viewpoint used to render the synthetic images includes a threshold number of features that have been added to the pruned database.
In particular, building a pruned database from an existing database is accomplished by way of a computer-implemented method, where the existing database contains a first set of features corresponding to a target object. The method includes: rendering a plurality of synthetic images of the target object based on the first set of features contained in the existing database, wherein each of the plurality of synthetic images is rendered using a distinct and known viewpoint; extracting a second set of features from the plurality of synthetic images; matching features of the second set to features included in the first set; determining a number of times each feature of the first set is matched to a feature of the second set; and then, (a) adding a feature of the first set that has the most matches to the pruned database and removing the feature from the first set; and (b) repeating (a) until each viewpoint used to render the plurality of synthetic images includes a threshold number of features added to the pruned database.
In addition, according to several embodiments, the above computer-implemented method of building a pruned database from an existing database may be embodied by way of a computer-readable medium that includes program code stored thereon for building the pruned database. The program code may include instructions to: render a plurality of synthetic images of the target object based on the first set of features contained in the existing database, wherein each of the plurality of synthetic images is rendered using a distinct and known viewpoint; extract a second set of features from the plurality of synthetic images; match features of the second set to features included in the first set; determine a number of times each feature of the first set is matched to a feature of the second set; and then, (a) add a feature of the first set that has the most matches to the pruned database and remove the feature from the first set; and (b) repeat (a) until each viewpoint used to render the plurality of synthetic images includes a threshold number of features added to the pruned database.
Furthermore, the present disclosure further provides for an apparatus that includes memory and a processing unit. The memory is adapted to store program code for building a pruned database from an existing database containing a plurality of features corresponding to a target object. The processing unit is adapted to access and execute instructions included in the program code, wherein when the instructions are executed by the processing unit, the processing unit directs the apparatus to: render a plurality of synthetic images of the target object based on the first set of features contained in the existing database, wherein each of the plurality of synthetic images is rendered using a distinct and known viewpoint; extract a second set of features from the plurality of synthetic images; match features of the second set to features included in the first set; determine a number of times each feature of the first set is matched to a feature of the second set; and then, (a) add a feature of the first set that has the most matches to the pruned database and remove the feature from the first set; and (b) repeat (a) until each viewpoint used to render the plurality of synthetic images includes a threshold number of features added to the pruned database.
The above and other aspects, objects, and features of the present disclosure will become apparent from the following description of various embodiments, given in conjunction with the accompanying drawings.
Non-limiting and non-exhaustive embodiments of the invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Reference throughout this specification to “one embodiment”, “an embodiment”, “one example”, or “an example” means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Any example or embodiment described herein is not to be construed as preferred or advantageous over other examples or embodiments.
Returning now to
In some embodiments, the first set of features (e.g., feature set 308) includes only those extracted features that have keypoint locations associated with the target object. For example, all features in the first set may be limited to such features that belong to the object of interest. This determination may be made by using a CAD model corresponding to the target object, together with the known camera pose, to segment out the features belonging to the object. Alternative solutions are possible, including using a measured depth of extracted features together with a known camera pose, or object segmentation techniques based on known background properties.
Once the first set of features is extracted, process block 115 includes selecting one of the features from the first set. Next, in process block 120, a second set of features is determined based on this selected feature. For example, process block 120 may include examining the first set of features to find those features that include both a descriptor that is similar to that of the selected feature and a keypoint location that is proximate to that of the selected feature. These matched features are then added to the second set of features.
In one embodiment, a descriptor is an L-dimensional vector describing the occurrence of a keypoint from one viewpoint (image). Thus, two descriptors are similar if their difference (which is itself an L-dimensional vector) is small in norm/magnitude. Accordingly, process block 120 may include determining whether two descriptors are similar by subtracting one descriptor from another and comparing the result to a descriptor distance threshold (e.g., |f1−fi|<descth, where descth is the descriptor distance threshold). Determining whether keypoint locations are proximate is similar to that described above, except that keypoint locations are 3-dimensional vectors of (x,y,z) coordinates according to a pre-defined (or set) coordinate system (e.g., |k1−ki|<dkptth, where dkptth is the keypoint distance threshold).
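The two similarity tests above can be illustrated with a short sketch. This is a minimal illustration only; the threshold values desc_th and kpt_th are hypothetical stand-ins for descth and dkptth, and a plain Euclidean distance is assumed for the norm:

```python
from math import dist  # Euclidean distance between equal-length sequences

def descriptors_similar(f1, fi, desc_th=0.2):
    # Two L-dimensional descriptors match when the magnitude of their
    # difference is below the descriptor distance threshold: |f1 - fi| < desc_th
    return dist(f1, fi) < desc_th

def keypoints_proximate(k1, ki, kpt_th=0.01):
    # Two 3D keypoint locations (x, y, z) are proximate when they lie
    # within the keypoint distance threshold: |k1 - ki| < kpt_th
    return dist(k1, ki) < kpt_th
```

For example, descriptors (0.0, 0.0) and (0.1, 0.0) differ by 0.1 in norm and would be considered similar under the hypothetical 0.2 threshold.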
Accordingly, the second set of features is a subset of the extracted first set of features whose descriptors are similar to that of the selected feature and whose keypoint locations are also proximate to that of the selected feature. In one embodiment, the second set of features includes the selected feature. Once this second set of features is determined, decision blocks 125 and 130 decide whether the selected feature is both repeatable and discriminable. The repeatability of a feature refers to the number of viewpoints in which the same (or a similar) feature is observed and, in one example, may simply be the number of features included in the second set of features, since each image was taken from a distinct viewpoint. In one embodiment, determining the repeatability of the selected feature includes determining whether a keypoint location of the selected feature is observable from multiple distinct viewpoints and, if so, determining a number of viewpoints in which the keypoint location of the selected feature is described by a descriptor similar to the descriptor of the selected feature. It is noted that this determination of the number of viewpoints includes analysis of the selected feature's keypoint location, as well as proximally located keypoints (e.g., within the keypoint distance threshold dkptth). Thus, the repeatability may be determined by counting the number of similar observations of the same or a proximally located keypoint. In other words, similar descriptors attached to keypoints that are distinct but essentially co-located count as two observations of the same keypoint. Once quantified, the repeatability of the selected feature may then be compared against a fixed repeatability threshold (ri>rth?).
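The construction of the second set and the repeatability count described above might be sketched as follows, assuming each feature is represented as a (descriptor, keypoint) pair and using the same hypothetical thresholds:

```python
from math import dist

def second_set(selected, first_set, desc_th=0.2, kpt_th=0.01):
    # Features of the first set whose descriptor is similar to the selected
    # feature's descriptor AND whose keypoint location is the same as or
    # proximate to the selected feature's keypoint location.
    d_sel, k_sel = selected
    return [f for f in first_set
            if dist(f[0], d_sel) < desc_th and dist(f[1], k_sel) < kpt_th]

def repeatability(selected, first_set, **thresholds):
    # Since each image is taken from a distinct viewpoint, the size of the
    # second set counts the viewpoints observing the same (or a co-located)
    # keypoint with a similar descriptor; compare r_i > r_th downstream.
    return len(second_set(selected, first_set, **thresholds))
```

Note that distinct but essentially co-located keypoints count as separate observations of the same keypoint, which is exactly what the list-comprehension filter captures.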
The discriminability of a feature refers to the ability to discriminate between the selected feature and other extracted features. In one example, the discriminability may be quantified as the ratio of the number of features in the second set to the number of all extracted features that have similar descriptors. Determining the discriminability of the selected feature may include determining a first number of viewpoints in which a keypoint location of the selected feature (or proximally located keypoints) is described by a descriptor similar to a descriptor of the selected feature. Then a second number, counting all features in the first set of features that have descriptors similar to the descriptor of the selected feature regardless of keypoint location, is determined. The discriminability may then be represented as the ratio of this first number to the second number. In one embodiment, the discriminability is compared against a fixed discriminability threshold to determine whether the discriminability of the selected feature is high enough (di>dth?).
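Under the same assumed (descriptor, keypoint) feature representation and hypothetical thresholds, the discriminability ratio might be computed as:

```python
from math import dist

def discriminability(selected, first_set, desc_th=0.2, kpt_th=0.01):
    # d_i = (features similar in descriptor AND proximate in keypoint)
    #       / (all features with a similar descriptor, any keypoint).
    # A ratio near 1.0 means the descriptor occurs at only one location.
    d_sel, k_sel = selected
    similar = [f for f in first_set if dist(f[0], d_sel) < desc_th]
    proximate = [f for f in similar if dist(f[1], k_sel) < kpt_th]
    return len(proximate) / len(similar) if similar else 0.0
```

If a similar descriptor also appears at a distant keypoint, the denominator grows while the numerator does not, pulling d_i below 1.0.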
If, in decision block 130, it is determined that the selected feature is not discriminable (e.g., di<dth), then this indicates that the features from the second set are not to be represented in the pruned database due to low discriminability. That is, besides a cluster of similar descriptors at a keypoint location, the first set of features contains at least one more similar descriptor at a different keypoint location. In the matching process, an observation of this keypoint may then easily be mistaken for an observation of another keypoint, and vice versa. Thus, the features of the second set, as well as the features in the first set that have a similar descriptor, may be discarded (e.g., process block 140), as they are not consistent with a unique geometric location. In one embodiment, these discarded features may still figure in calculating the discriminability of other unrelated features, but by the symmetric nature of "similarity" relationships and by the fact that the descriptors in the second set of features are by nature grouped tightly together, all these features are safe to ignore from that point onwards.
If, in decision block 125, it is determined that the selected feature is not repeatable (e.g., ri<rth), then this indicates that the features from the second set are not to be represented in the pruned database due to low repeatability. The same low repeatability will hold true not just for the selected feature, but for all the features in the second set; thus, none of them should be represented in the pruned database. Moreover, if a keypoint location is genuinely so hard to observe, then its descriptors need not penalize similar descriptors attached to a different, more repeatable keypoint by rendering them non-discriminative. Therefore, again, for all practical purposes, it is safe to simply discard all features in the second set (e.g., process block 140) from that point onward.
If, however, the second set of features is determined to be both repeatable and discriminable, then process 100 proceeds to process block 135, where at least one derived feature is generated and added to the feature database. The derived feature is representative of the second set of features and, in one example, may include a descriptor that is an average of the descriptors included in the second set.
In one example, the derived feature is a single feature representative of all the features included in the second set. In another example, process block 135 includes generating an M number of derived features for the selected feature, where the M number of derived features are generated by clustering together features of the second set into M number of clusters and then taking cluster centers.
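The single-derived-feature case might be sketched as a component-wise average of the second set's descriptors. Averaging the keypoint locations along with the descriptors is an assumption made here for completeness, and the M-cluster variant (e.g., taking cluster centers) is omitted for brevity:

```python
def derive_feature(second_set_features):
    # One derived feature representing the whole second set: the
    # component-wise average of the descriptors (and, as an assumption,
    # of the keypoint locations) of the set's features.
    def average(vectors):
        return tuple(sum(components) / len(vectors)
                     for components in zip(*vectors))
    descriptors = [f[0] for f in second_set_features]
    keypoints = [f[1] for f in second_set_features]
    return average(descriptors), average(keypoints)
```

Because the second set's descriptors are, by construction, tightly grouped, their average is a faithful representative of the group.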
Once the derived feature(s) is added to the database, the features of the second set may then be discarded (i.e., process block 140). In one embodiment, discarding features includes removing them from the first set of extracted features.
Next, in decision block 145 it is determined whether pruning is complete. In one example, pruning may be deemed as complete if all features of the first set have been processed by the pruning process 100. If pruning is done then process 100 completes (150). If pruning is not complete, process 100 returns to process block 115 to select another feature from the first set (e.g., fi+1) to examine for feature pruning.
Next, repeatability detector 210 examines the feature set S1 and determines whether the selected feature is repeatable. In one example, the repeatability of the selected feature is quantified as ri and may simply be the number of features included in the feature set S1. A larger feature set S1 corresponds to a larger number of viewpoints in which the keypoint location of the selected feature (or keypoint locations proximate to it) is described by a descriptor similar to that of the selected feature. The more views in which similar descriptors of a keypoint location (or a proximate keypoint) appear, the more repeatable the selected feature is. Thus, the higher the repeatability ri, the better. In one example, the repeatability ri of the selected feature is compared against a repeatability threshold rth in order to determine whether the repeatability is high enough. In one embodiment, the repeatability threshold rth is fixed; however, in other embodiments the repeatability threshold may vary based, for example, on the number of distinct viewpoints from which the images were acquired. By way of further example, the repeatability threshold may be directly related (e.g., proportional, a percentage, etc.) to the number of distinct viewpoints, such that as the number of viewpoints increases so too does the repeatability threshold.
Discriminability detector 212 also examines the feature set S1 and determines whether the selected feature is discriminable. That is, discriminability may refer to how easy it is to notice and understand that the selected feature is different from other extracted features. In one example, the discriminability of the selected feature is quantified as di and may be equal to the ratio of the number of features in set S1 to the number of features in set S0 (i.e., di=|S1|/|S0|). The higher the discriminability, the easier it is to discriminate the selected feature from other extracted features. In one example, discriminability di is compared against a discriminability threshold discth in order to determine whether the discriminability of the selected feature is high enough. Ideally, the discriminability di is equal to 1.0 (i.e., all of the extracted features with similar descriptors have proximate keypoint locations). In one embodiment, the discriminability threshold discth is fixed at about 0.75.
If the repeatability detector 210 determines that the repeatability of the selected feature is high enough and the discriminability detector 212 determines that the discriminability of the selected feature is high enough, then feature averager 214 may proceed with generating a derived feature gi (or M clusters of features) that is representative of the entire feature set S1. In one example, derived feature gi includes a descriptor that is the average of the descriptors included in feature set S1. Next, the derived feature gi is written to feature database 216. Optionally, the features included in set S1 may then be discarded and the process repeated for the remaining extracted features.
As shown, feature database 216 includes a j number of derived features gj, while an i number of features fi were extracted by the feature extractor 204. Since each derived feature gj is representative of a set of like extracted features, the number of derived features gj added to the feature database 216 may be much less than the total number of extracted features fi (e.g., j<<i). In one embodiment, the number of derived features added to database 216 may be orders of magnitude less than the number of extracted features included in the first set. Accordingly, embodiments of building a pruned feature database 216 may avoid the issue of exceedingly large database sizes, while also providing a model of a target object from a large, if not exhaustive, number of viewpoints.
Apparatus 400 also includes a control unit 404 that is connected to and communicates with the camera 402 and user interface 406, if present. The control unit 404 accepts and processes images received from the camera 402 and/or from network adapter 416. Control unit 404 may be provided by a processing unit 408 and associated memory 414, hardware 410, software 415, and firmware 412.
Processing unit 200 of
The processes described herein may be implemented by various means depending upon the application. For example, these processes may be implemented in hardware 410, firmware 412, software 415, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the processes may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any computer-readable medium tangibly embodying instructions may be used in implementing the processes described herein. For example, program code may be stored in memory 414 and executed by the processing unit 408. Memory may be implemented within or external to the processing unit 408.
If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, Flash Memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
As discussed above, an object recognition system includes capturing an image of a target object and then extracting features from this image. These extracted features are then compared against a feature database containing previously extracted features of known objects in order to produce reliable matches. However, the detection performance of different features of an object is not the same. For example, some physical object points can be reliably detected from wider viewing angles, which often depends on the surrounding texture and on the shape of the object itself. In addition, feature descriptor variations across different viewing angles can depend on local texture variations and on the object shape. Accordingly, embodiments of the present disclosure further provide a process of improving object detection speed and robustness by generating a feature database containing only those features that can significantly contribute to the detection task.
Given a camera frame, most pose estimation algorithms succeed if they can find a number of good matches above a predefined threshold Tv. In one embodiment, the threshold Tv is four (4). However, in another embodiment, the threshold Tv is ten (10). In yet another embodiment, the threshold Tv is fifteen (15). Thus, one aim of the pruning method described infra is to select a reduced feature set that gives at least Tv matches for all synthetic views. This in turn should improve the probability that a target is detectable from any real view. For example,
As shown, process 500 includes first rendering several synthetic (i.e., virtual) images of a target object based on the features contained in existing database 502 (i.e., process block 505). Each synthetic image generated may be rendered from a distinct and known viewpoint of the target object. For example,
Returning now to
Once the features are extracted, process block 515 includes matching the extracted features of process block 510 to those features contained in the existing database 502. In one embodiment, features are matched similarly to the process described above, using descriptor distance (e.g., |f−fi|<descth). Next, information on the relative pose between the virtual camera and the target object may be used to geometrically verify each match and discard false matches. By way of example, given a known 3D position of an object point in the object-centric coordinate system and given the camera pose, the point can be projected into the camera image in a 2D coordinate system. Then, if the 2D projection is within a fixed radius of the matched feature, the match is kept; otherwise it is discarded.
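The geometric verification step can be illustrated with a simple pinhole projection. The 3x3 rotation R, translation t, intrinsic matrix K, and the pixel radius below are illustrative assumptions, not values taken from the disclosure:

```python
from math import dist

def project(point3d, R, t, K):
    # Pinhole projection of an object-space 3D point into the 2D image,
    # given camera rotation R (3x3 list of lists), translation t, and
    # intrinsic matrix K. Camera-space coordinates: X_c = R @ X + t.
    xc = [sum(R[i][j] * point3d[j] for j in range(3)) + t[i]
          for i in range(3)]
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return (u, v)

def match_verified(point3d, matched_uv, R, t, K, radius=5.0):
    # Keep a match only if the projection lands within a fixed pixel
    # radius of the matched feature's image location.
    return dist(project(point3d, R, t, K), matched_uv) <= radius
```

With an identity rotation, a camera one unit in front of the object, and a 100-pixel focal length, the object origin projects to the principal point, so only matches near that pixel survive verification.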
Next, in process block 520 it is determined how many extracted features match to each feature contained in the existing database 502. That is, a first count may be maintained for each feature in existing database 502 indicating how many times an extracted feature is found that matches that feature in the database 502.
Next, in process block 525, the feature of existing database 502 that has the most matches is added to the pruned database 504. The feature with the most matches represents a feature that appears in the largest number of viewpoints and thus is likely to aid in the object detection process. Also in process block 525, the feature that was just added to the pruned database 504 (i.e., the feature with the most matches) is removed from the existing database 502.
Process 500 also includes maintaining a second count that represents the number of matches for each rendered viewpoint (e.g., V1-V5) generated by features that have been added to the pruned database 504. Thus, as a feature is added to the pruned database 504, the second count for each viewpoint associated with that feature is incremented.
Next, in decision block 530, if the second count for any viewpoint exceeds the threshold TV, then process block 535 reduces the influence of all subsequent feature matches corresponding to that viewpoint. In one example, reducing the influence of the matches corresponding to a viewpoint may be done by simply decrementing the first count of those features that have a match in the viewpoint. If, in decision block 540, the second count for each of the viewpoints is greater than or equal to the threshold TV, then process 500 may proceed to optional process block 550 (discussed in more detail below). That is, a second count meeting the threshold TV for each viewpoint means that a sufficient number of features have been added to the pruned database 504 to allow detection of the target object from each viewpoint, such that no additional features need to be added to the pruned database 504.
However, if, in decision block 540, not all viewpoints meet the threshold TV number of matches with the features currently added to the pruned database 504, then process 500 proceeds to process block 545, which determines whether any features remain in the existing database 502. If features remain, process 500 returns to process block 525, where the feature with the next highest number of matches is added to the pruned database 504. Thus, in summary, process 500 is an iterative process: the feature in the existing database 502 with the next most matches is taken and added to the pruned database 504, and this repeats until each viewpoint used to render the synthetic images has a threshold number TV of matches generated by features that have been added to the pruned database 504.
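The iterative selection of process blocks 520 through 545 can be sketched as follows. The representation of matches as a mapping from feature identifiers to viewpoint sets is an assumption made for the sketch, as is the exact decrement rule for reducing a covered viewpoint's influence.

```python
def greedy_prune(matches, viewpoints, T_V):
    """Greedy feature selection sketch (process blocks 520-545).

    matches: dict mapping each feature id in the existing database to the
             set of viewpoint ids in which it found a verified match.
    Returns the feature ids selected for the pruned database, in order."""
    existing = {f: set(vs) for f, vs in matches.items()}
    first_count = {f: len(vs) for f, vs in existing.items()}   # block 520
    second_count = {v: 0 for v in viewpoints}  # matches per viewpoint
    covered = set()
    pruned = []
    # Iterate until every viewpoint has at least T_V matches (block 540)
    # or the existing database is exhausted (block 545).
    while existing and any(c < T_V for c in second_count.values()):
        # Add the feature with the most (possibly down-weighted) matches
        # to the pruned database and remove it from the existing one (525).
        best = max(existing, key=lambda f: first_count[f])
        pruned.append(best)
        for v in existing.pop(best):
            second_count[v] += 1
        del first_count[best]
        # When a viewpoint first reaches T_V matches, reduce the influence
        # of the remaining features that match in it (block 535).
        for v, c in second_count.items():
            if c >= T_V and v not in covered:
                covered.add(v)
                for f, vs in existing.items():
                    if v in vs:
                        first_count[f] -= 1
    return pruned
```

With three viewpoints and T_V=1, a feature matching in two viewpoints is picked first, those viewpoints become covered, and selection continues only until the remaining viewpoint is also served.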
Given better knowledge of the pose estimation algorithm, it is possible to further enhance the pruning process 500 described above by using probability theory. Thus, process 500 includes an optional process block 550 of performing probabilistic pruning of the pruned database 504. For example,
As shown in
In one embodiment, a RANdom SAmple Consensus (RANSAC) algorithm is used to estimate the target pose from a set of matches. In this embodiment, the detection probability Pd can be computed as follows. RANSAC estimates the target pose from a minimal set of d points, typically d=3 or 4, which are randomly selected from the matches, and then checks the consensus of the remaining matches with the candidate pose. For RANSAC to succeed, the d matches must all be inliers (i.e., good matches). Assuming that v matches out of the m matches extracted from an image are inliers, the probability of randomly picking an inlier from the set is Pin(v,m)=v/m. Thus, the probability of failure for a minimal set of d points is PF=1−(Pin)^d. From this it follows that the probability of success over k iterations is PS=1−(PF)^k. Finally, taking into account the multiple n views and the minimum match count threshold TV, we can compute the overall detection probability as:
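The per-view RANSAC success probability above can be computed directly from the inlier and match counts. Since the overall equation itself is not reproduced in this text, the combination rule below (averaging the per-view success probability over the n views, counting only views that meet the TV inlier threshold) is an assumption made for the sketch.

```python
def detection_probability(views, d=4, k=100, T_V=4):
    """Detection probability over n views (sketch; combination rule assumed).

    views: list of (v, m) pairs -- inlier count and total match count
           per rendered view."""
    n = len(views)
    total = 0.0
    for v, m in views:
        if v < T_V or m == 0:
            continue  # too few inliers for a valid pose from this view
        p_in = v / m                  # Pin(v, m): chance of picking an inlier
        p_fail = 1.0 - p_in ** d      # PF: one RANSAC iteration fails
        p_succ = 1.0 - p_fail ** k    # PS: success within k iterations
        total += p_succ
    return total / n
```

A view whose matches are all inliers contributes a success probability of one, while a view below the TV inlier threshold contributes nothing.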
The detection probability gain for a feature is the increase in detection probability obtained by adding a feature to the pruned database.
Next, in decision block 625, a detection probability given by the pruned database 504 is calculated. If the detection probability given by the pruned database 504 is greater than a probability threshold TP, or if there are no features remaining in the existing database 502, then process 600 ends at 630. Otherwise, process 600 returns to process block 605 to again calculate the detection probability gain of those features remaining in the existing database 502.
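The probabilistic pruning loop can be sketched as a greedy selection driven by detection probability gain. The `detection_prob` callable is a hypothetical stand-in for whatever detection-probability evaluation the system uses; its form and the stopping threshold are assumptions for the sketch.

```python
def probabilistic_prune(existing, detection_prob, T_P):
    """Greedy probabilistic pruning sketch (process 600).

    existing: list of candidate features remaining in the existing database.
    detection_prob(pruned): hypothetical callable returning the detection
        probability given the current contents of the pruned database."""
    pruned = []
    # Stop once the pruned database alone reaches the probability
    # threshold T_P (decision block 625) or no candidates remain.
    while existing and detection_prob(pruned) <= T_P:
        # Pick the feature with the highest detection probability gain,
        # i.e., the largest increase from adding it (process block 605).
        best = max(existing,
                   key=lambda f: detection_prob(pruned + [f]) - detection_prob(pruned))
        pruned.append(best)
        existing.remove(best)
    return pruned
```

With a toy probability model where each feature contributes a fixed amount, the loop picks features in order of decreasing contribution and stops as soon as the threshold is crossed.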
In another embodiment, the same probabilistic framework of
In yet another embodiment, a feature pruning process accounts for performance under different illumination conditions. In particular, matching performance may degrade significantly in low-lighting scenarios. Low-lighting conditions result in a lower number of detected key-points and in a reduction of the overall number of matches (inliers plus outliers). Accordingly, this embodiment may include simulating low-lighting conditions by applying a higher threshold to the cornerness scores used during key-point detection. Then, similar to the occlusion scenarios (see above), the detection probability gain is computed for each feature in low-lighting scenarios. Selection of the best feature to add to the pruned database may then be based on a probability gain that is a combination of the lighting, occluded, and non-occluded scenario gains.
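Combining the per-scenario gains into one selection score can be as simple as a weighted average. The text does not specify the combination rule, so the weighted average below is purely an assumption for illustration.

```python
def combined_gain(gains, weights=None):
    """Combine detection-probability gains from the normal, occluded and
    low-light scenarios into one selection score (assumed weighted average).

    gains: per-scenario detection probability gains for one feature."""
    if weights is None:
        weights = [1.0] * len(gains)  # default: equal scenario weighting
    return sum(g * w for g, w in zip(gains, weights)) / sum(weights)
```

The feature with the highest combined score would then be the next one added to the pruned database, so a feature that helps in all scenarios outranks one that helps only in ideal lighting.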
As shown, pruned database 504 includes an m number of features, while a j number of features were included in the existing database 502. Since only those features that are determined to sufficiently aid in object detection are added to the pruned database 504, the number m of features added to the pruned database 504 may be much less than the total number j of features included in the existing database 502 (e.g., m&lt;&lt;j). Accordingly, embodiments of pruning an existing database to build a pruned database may avoid the issue of exceedingly large database sizes, while also providing a model of a target object that includes only features that can significantly contribute to the detection process.
The mobile platform 902 may include a display to show images captured by the camera. The mobile platform 902 may also be used for navigation based on, e.g., determining its latitude and longitude using signals from a satellite positioning system (SPS), which includes satellite vehicle(s) 906, or any other appropriate source for determining position including cellular tower(s) 904 or wireless communication access points 905. The mobile platform 902 may also include orientation sensors, such as a digital compass, accelerometers or gyroscopes, that can be used to determine the orientation of the mobile platform 902.
As used herein, a mobile platform refers to a device such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop or other suitable mobile device which is capable of receiving wireless communication and/or navigation signals, such as navigation positioning signals. The term “mobile platform” is also intended to include devices which communicate with a personal navigation device (PND), such as by short-range wireless, infrared, wireline connection, or other connection—regardless of whether satellite signal reception, assistance data reception, and/or position-related processing occurs at the device or at the PND. Also, “mobile platform” is intended to include all devices, including wireless communication devices, computers, laptops, etc. which are capable of communication with a server, such as via the Internet, WiFi, or other network, and regardless of whether satellite signal reception, assistance data reception, and/or position-related processing occurs at the device, at a server, or at another device associated with the network. In addition, a “mobile platform” may also include all electronic devices which are capable of augmented reality (AR), virtual reality (VR), and/or mixed reality (MR) applications. Any operable combination of the above is also considered a “mobile platform.”
A satellite positioning system (SPS) typically includes a system of transmitters positioned to enable entities to determine their location on or above the Earth based, at least in part, on signals received from the transmitters. Such a transmitter typically transmits a signal marked with a repeating pseudo-random noise (PN) code of a set number of chips and may be located on ground based control stations, user equipment and/or space vehicles. In a particular example, such transmitters may be located on Earth orbiting satellite vehicles (SVs) 906. For example, a SV in a constellation of Global Navigation Satellite System (GNSS) such as Global Positioning System (GPS), Galileo, Glonass or Compass may transmit a signal marked with a PN code that is distinguishable from PN codes transmitted by other SVs in the constellation (e.g., using different PN codes for each satellite as in GPS or using the same code on different frequencies as in Glonass).
In accordance with certain aspects, the techniques presented herein are not restricted to global systems (e.g., GNSS) for SPS. For example, the techniques provided herein may be applied to or otherwise enabled for use in various regional systems, such as, e.g., Quasi-Zenith Satellite System (QZSS) over Japan, Indian Regional Navigational Satellite System (IRNSS) over India, Beidou over China, etc., and/or various augmentation systems (e.g., a Satellite Based Augmentation System (SBAS)) that may be associated with or otherwise enabled for use with one or more global and/or regional navigation satellite systems. By way of example but not limitation, an SBAS may include an augmentation system(s) that provides integrity information, differential corrections, etc., such as, e.g., Wide Area Augmentation System (WAAS), European Geostationary Navigation Overlay Service (EGNOS), Multi-functional Satellite Augmentation System (MSAS), GPS Aided Geo Augmented Navigation or GPS and Geo Augmented Navigation system (GAGAN), and/or the like. Thus, as used herein, an SPS may include any combination of one or more global and/or regional navigation satellite systems and/or augmentation systems, and SPS signals may include SPS, SPS-like, and/or other signals associated with such one or more SPS.
The mobile platform 902 is not limited to use with an SPS for position determination, as position determination techniques may be implemented in conjunction with various wireless communication networks, including cellular towers 904 and wireless communication access points 905, such as a wireless wide area network (WWAN), a wireless local area network (WLAN), or a wireless personal area network (WPAN). Further, the mobile platform 902 may access one or more servers 908 to obtain data, such as reference images and reference features from a database 912, using various wireless communication networks via cellular towers 904 and wireless communication access points 905, or using satellite vehicles 906 if desired. The terms “network” and “system” are often used interchangeably. A WWAN may be a Code Division Multiple Access (CDMA) network, a Time Division Multiple Access (TDMA) network, a Frequency Division Multiple Access (FDMA) network, an Orthogonal Frequency Division Multiple Access (OFDMA) network, a Single-Carrier Frequency Division Multiple Access (SC-FDMA) network, Long Term Evolution (LTE), and so on. A CDMA network may implement one or more radio access technologies (RATs) such as cdma2000, Wideband-CDMA (W-CDMA), and so on. Cdma2000 includes IS-95, IS-2000, and IS-856 standards. A TDMA network may implement Global System for Mobile Communications (GSM), Digital Advanced Mobile Phone System (D-AMPS), or some other RAT. GSM and W-CDMA are described in documents from a consortium named “3rd Generation Partnership Project” (3GPP). Cdma2000 is described in documents from a consortium named “3rd Generation Partnership Project 2” (3GPP2). 3GPP and 3GPP2 documents are publicly available. A WLAN may be an IEEE 802.11x network, and a WPAN may be a Bluetooth network, an IEEE 802.15x network, or some other type of network. The techniques may also be implemented in conjunction with any combination of WWAN, WLAN and/or WPAN.
As shown in
The order in which some or all of the process blocks appear in each process discussed above should not be deemed limiting. Rather, one of ordinary skill in the art having the benefit of the present disclosure will understand that some of the process blocks may be executed in a variety of orders not illustrated.
Those of skill would further appreciate that the various illustrative logical blocks, modules, engines, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, engines, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Various modifications to the embodiments disclosed herein will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
This application claims the benefit of U.S. Provisional Application No. 61/883,736, filed Sep. 27, 2013.