The invention relates generally to place classification in image processing systems, and in particular to labeling places using model-based runtime change-point detection.
In computer image analysis such as intelligent transportation systems, a common task is to consistently classify and label places in a captured image scene. For example, place recognition is the task of consistently labeling a particular place (e.g., “kitchen on 2nd floor with a coffee machine”) every time the place is visited, while place categorization is to consistently label places according to their category (e.g., “kitchen”, “living room”). Place recognition and categorization are important for a robot or an intelligent agent to recognize places in a manner similar to that done by humans.
Most existing place recognition systems assume a finite set of place labels, which are learned offline from supervised training data. Some existing place recognition systems use place classifiers, which categorize places during runtime based on some measurements of input data. For example, one type of place recognition method models local features and distinctive parts of input images. Alternatively, a place recognition method extracts global representations of input images and learns place categories from the global representations of the images.
Existing place recognition systems face a variety of challenges including the requirement of large training data and limited place recognition (e.g., only recognizing places known from training data). For example, existing place recognition methods in robotics range from matching scale-invariant feature transform (SIFT) features across images to other derived measures of distinctiveness for places such as Fourier signatures, subspace representations and color histograms. These methods have the disadvantage of not being able to generalize and also are invariant to perspective mainly through the use of omnidirectional images.
An embodiment of the invention is now described with reference to the figures where like reference numbers indicate identical or functionally similar elements.
Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times, to refer to certain arrangements of steps requiring physical manipulations of physical quantities as modules or code devices, without loss of generality.
However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the invention could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
The invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode of the invention.
In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the claims.
Embodiments of the invention provide a place recognition method/system for labeling places of a video/image sequence using runtime change-point detection. A video/image sequence is represented by a measurement stream comprising multiple image histograms of feature frequencies associated with the video/image sequence. The measurement stream is segmented into segments corresponding to places recognized in the video/image sequence based on change-point detection. Change-point detection is to detect abrupt changes to the parameters of a statistical place model. By detecting the changes in the parameters of the statistical place model modeling a video/image sequence, the place boundaries in the video/image sequence are obtained, where a place is exited or entered at each place boundary.
One embodiment of a disclosed system includes a segmentation module for providing place boundaries in a video/image stream. The segmentation module is configured to compute the probability of a change-point occurring at each time-step of the segments of a measurement stream representing a video/image stream. The segmentation module tracks the probabilities of detected change-points in the measurement stream. The probability of a change-point at any given time-step is obtained by combining a prior on the occurrences of change-points with the likelihood of the current measurement of the measurement stream given all the possible scenarios in which change-points could have occurred in the past.
One embodiment of a disclosed system also includes a place label generation module for labeling places known or unknown to pre-learned place models. The place label generation module is configured to assign place labels probabilistically to places in a video/image sequence based on the measurements of the measurement stream representing the video/image sequence, the most recently assigned place label and change-point distribution. Statistical hypothesis testing can be used to determine if the current measurement could have been generated by any of pre-learned place models. If a measurement cannot be generated by a pre-learned place model, the measurement is determined to be a previously unknown place and is assigned a new place label.
System Overview
The place recognition problem described above can be formulated as follows. Given a measurement stream representing a video/image sequence, measurements at some (possibly changing) intervals are generated. For simplicity, the intervals are referred to as time-steps. In one embodiment, an image from a video/image sequence is represented by one or more image histograms of feature frequencies associated with the image. Each image of the video/image sequence is represented by a spatial pyramid of multiple image histograms of feature frequencies associated with the image. For simplicity, histograms of feature frequencies associated with the image from the video/image sequence are referred to as image histograms. The image histograms of a video/image sequence form a measurement stream of the video/image sequence, where each image histogram is a measurement of the measurement stream.
It is noted that a place label remains the same for periods of time when a robot is moving inside a particular place. The place label only changes sporadically when the robot travels into the next place. Thus, a measurement stream representing a video sequence at runtime can be segmented into segments corresponding to places recognized in the video sequence, where measurements in each segment are assumed to be generated by a corresponding place model. The start and end of a segment where the corresponding place model associated with the segment changes are referred to as “change-points.” The change-points of a segment provide a reliable indication regarding the place label of the segment.
For each measurement, a label corresponding to the type of the place (e.g., kitchen, living room, office) is generated. If a measurement does not correspond to any type of place, the measurement is likely to represent an unknown place. In one embodiment, the place types are given in the form of $L$ place models $M_1, M_2, \ldots, M_L$. The place models $M_1, M_2, \ldots, M_L$ are learned offline from pre-labeled training data. An example computer system for learning the place models offline from pre-labeled training data is further described below with reference to
At run time, the place recognition system receives 220 an input video for place labeling. The place recognition system generates image representations (e.g., spatial pyramids of image histograms) of images of the input video. The spatial pyramids of image histograms of images at different spatial resolutions are combined to generate 222 a corresponding measurement stream of the input video. The place recognition system segments 224 the measurement stream into multiple segments. Using the learned place models, the place recognition system generates 226 place labels for the input video.
Image Representation by Image Histograms
Turning now to
The image training set 110 comprises multiple pre-labeled images. In one embodiment, the image training set 110 comprises video sequences obtained from the Visual Place Categorization (VPC) dataset. The dataset contains image sequences from six different homes, each containing multiple floors. The data set from each home consists of between 6000 and 10000 frames. In one embodiment, image sequences from each floor are treated as a different image sequence. The dataset has been manually labeled into 5 categories (e.g., living room, office) to provide ground truth for the place categorization problem to be solved by the disclosed method/system. In addition, a “transition” category is used to mark segments that do not correspond determinatively to any place category.
The memory 120 stores data and/or instructions that may be executed by the processor 150. The instructions may comprise code for performing any and/or all of the techniques described herein. Memory 120 may be a DRAM device, a static random access memory (SRAM), Flash RAM (non-volatile storage), combinations of the above, or some other memory device known in the art. In one embodiment, the memory 120 comprises a feature detection module 122, a feature clustering module 124, an image representation module 126 and a label learning module 128. The feature detection module 122 detects and extracts image features and/or textures from the images in the image training set 110. The feature clustering module 124 groups the extracted image/texture features into clusters. The image representation module 126 generates image representations of images of the image training set 110. The label learning module 128 learns multiple place models from the image representations and stores the learned place models in the data store 160.
The feature detection module 122 comprises computer executable instructions for detecting and extracting image/texture features from input images. In one embodiment, the feature detection module 122 detects scale-invariant feature transform (SIFT) features on a dense grid on each of a set of input images. SIFT is a way to detect and describe local features in an image by detecting multiple feature description key points of objects in an image. The feature detection module 122 extracts SIFT features by transforming an input image into a large collection of feature vectors, each of which is invariant to image translation, scaling and rotation and partially invariant to illumination and to local geometric distortion.
In another embodiment, the feature detection module 122 detects CENTRIST features from input images. CENTRIST is based on the census transform of an image, which is a local feature computed densely for every pixel of the image, and encodes the value of a pixel's intensity relative to that of its neighbors. The feature detection module 122 computes the census transform by considering a patch centered at every pixel of an image. The transform value is a non-negative integer that takes a range of values depending on the size of the patch. For instance, a patch size of 3, where there are 8 pixels in the patch apart from the central pixel, has transform values between 0 and 255.
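The census transform described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names, the bit ordering, and the convention that a bit is 1 when the central pixel is at least as bright as its neighbor are all assumptions made for the example. Border pixels are skipped for simplicity.

```python
def census_transform(image):
    """Census transform with a 3x3 patch.

    image: list of rows of gray values. Returns one value in [0, 255]
    per interior pixel (border pixels are skipped in this sketch).
    """
    height, width = len(image), len(image[0])
    out = [[0] * (width - 2) for _ in range(height - 2)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            value = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # the central pixel is not compared with itself
                    # Assumed convention: bit is 1 when centre >= neighbour.
                    bit = 1 if image[r][c] >= image[r + dr][c + dc] else 0
                    value = (value << 1) | bit
            out[r - 1][c - 1] = value
    return out


def centrist_histogram(image):
    """256-bin histogram of census transform values (no clustering needed)."""
    hist = [0] * 256
    for row in census_transform(image):
        for value in row:
            hist[value] += 1
    return hist
```

With 8 neighbors, each pixel yields an 8-bit code, which is why the values range from 0 to 255 and the histogram has 256 bins.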
In yet another embodiment, the feature detection module 122 detects texture features of the input images. A texture feature of an image is a function of the spatial variation in pixel intensities (e.g., gray values) of the image. The feature detection module 122, in one embodiment, extracts texture features from the input images using a 17-dimensional filter bank (e.g., Leung-Malik filter bank).
The feature clustering module 124 comprises computer executable instructions for clustering features extracted by the feature detection module 122. In one embodiment, the feature clustering module 124 clusters the extracted SIFT image features by quantizing the image features using K-means to create a codebook/dictionary of code words of a pre-specified size. A code word of the dictionary is represented by a cluster identification of the quantized image feature. Similar to SIFT image feature clustering, the feature clustering module 124 uses K-means to cluster texture features of the input images to create a dictionary comprising the cluster identifications of the quantized texture features.
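The codebook construction described above can be illustrated with a small K-means sketch. The function names, the fixed iteration count, and the deterministic initialization from the first k feature vectors are assumptions chosen to keep the example short and reproducible; a practical system would typically use a library implementation with a better initialization.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, point):
    """Index of the closest cluster center (the code word id)."""
    return min(range(len(centers)), key=lambda k: squared_distance(centers[k], point))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; returns k cluster centers (the codebook)."""
    centers = [list(p) for p in points[:k]]  # assumed init: first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for idx, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[idx] = [sum(dim) / len(group) for dim in zip(*group)]
    return centers


def quantize(centers, features):
    """Replace each feature vector by the id of its nearest code word."""
    return [nearest(centers, f) for f in features]
```

For example, `quantize(kmeans(features, k), features)` maps every extracted feature to a cluster identification, and counting those identifications gives the feature-frequency histogram.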
The image representation module 126 comprises computer executable instructions for representing input images by image histograms. For example, an image of a video sequence is represented by one or more image histograms of feature frequencies associated with the image. The image histograms of a video sequence form a measurement stream for segmentation and place label generation at runtime. The image histograms are the measurements in the change-point detection procedure described below.
In one embodiment, the image representation module 126 uses a spatial pyramid of image histograms to represent an image of a video sequence at different spatial resolutions. Specifically, the image representation module 126 obtains a spatial pyramid of an image by computing histograms of feature frequencies at various spatial resolutions across the image. The histogram bins contain the number of quantized image features in each of the image feature clusters in the image region being processed. The image representation module 126 divides the image into successive resolutions. In one embodiment, the image representation module 126 only computes the image histograms at the finest resolution since the coarser resolution image histograms can be obtained by adding the appropriate image histograms at an immediately finer level. All the image histograms from the different resolutions are then concatenated to produce the spatial pyramid representation of the image.
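A minimal sketch of this construction is given below. It assumes the image has already been converted into a 2-D map of code word ids (one per feature location), that each level splits every cell into 2x2 sub-cells, and that `levels` denotes the index of the finest level; these names and conventions are illustrative assumptions. Only the finest-level histograms are counted directly, and each coarser level is obtained by adding the four child histograms.

```python
def finest_level_histograms(word_map, levels, codebook_size):
    """Count code words in each cell of the finest (2**levels x 2**levels) grid."""
    cells = 2 ** levels
    height, width = len(word_map), len(word_map[0])
    grid = [[[0] * codebook_size for _ in range(cells)] for _ in range(cells)]
    for r, row in enumerate(word_map):
        for c, word in enumerate(row):
            grid[r * cells // height][c * cells // width][word] += 1
    return grid


def coarsen(grid):
    """Next coarser level: add the histograms of each 2x2 block of cells."""
    half = len(grid) // 2
    size = len(grid[0][0])
    return [
        [
            [sum(grid[2 * i + di][2 * j + dj][w] for di in (0, 1) for dj in (0, 1))
             for w in range(size)]
            for j in range(half)
        ]
        for i in range(half)
    ]


def spatial_pyramid(word_map, levels, codebook_size):
    """Concatenate the histograms of all levels, whole image (level 0) first."""
    per_level = [finest_level_histograms(word_map, levels, codebook_size)]
    while len(per_level[-1]) > 1:
        per_level.append(coarsen(per_level[-1]))
    pyramid = []
    for level_grid in reversed(per_level):
        for row in level_grid:
            for hist in row:
                pyramid.extend(hist)
    return pyramid
```

Summing child histograms avoids a second pass over the features for every coarser resolution, which is the saving the paragraph above refers to.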
To compute the spatial pyramid of image histograms of an image, the image representation module 126 needs two parameters: the number of levels in the spatial pyramid and the number of feature clusters. Taking a spatial pyramid based on SIFT features of an image as an example, the image representation module 126 computes the spatial pyramid of the image using the number of levels in the pyramid corresponding to the number of spatial resolutions of the image and the number of the image clusters computed in SIFT space (i.e., the size of the codebook). SIFT features have local information about an image patch while an image histogram has global information. By combining both SIFT features and image histograms at different scales, the spatial pyramid of image histograms obtains more fine-grained discriminative power of the image.
In addition to SIFT features, the image representation module 126 can compute spatial pyramids using two other features, CENTRIST and texture. For example, the image representation module 126 computes a census transform histogram of an image based on the CENTRIST features extracted from the image. One advantage of using CENTRIST features to compute the image histograms is that the image representation module 126 can directly compute the census transform histograms from the CENTRIST features without clustering the features, thus reducing the computation load. In another embodiment, the image representation module 126 computes a texture-based spatial pyramid of an image using texture features of the image.
Turning to
The label learning module 128 comprises computer executable instructions for learning place models from the image training set 110. In one embodiment, the label learning module 128 interacts with the image representation module 126 to obtain image histograms generated from the image training set 110, and from the image histograms, the label learning module 128 learns one or more place models. The place labels learned by the label learning module 128 are represented in the form of $L$ place models $M_1, M_2, \ldots, M_L$. The learned place models are stored in the data store 160 and are used at runtime to label places recognized in a video sequence. The label learning module 128 is further described below with reference to section of “learning place models using image histograms” and
Model-Based Change-Point Detection
The segmentation module 115 is configured to segment a measurement stream representing the input video 105 into non-overlapping and adjacent segments corresponding to places recognized in the input video 105. The measurement stream of the input video 105 is generated based on the image histograms of the input video 105, where the image histograms of the input video 105 are the measurements of the measurement stream. The boundaries between the segments are the change-points. In one embodiment, the segmentation module 115 uses a Bayesian change-point detection method to compute the probability of a change-point occurring at each time-step. The probability of a change-point at any given time-step is obtained by combining a prior on the occurrence of change-points with the likelihood of the current measurement given all the possible scenarios in which change-points could have occurred in the past.
In one embodiment, the segmentation module 115 generates the image histograms of the input video 105 similar to the image representation module 126 illustrated in
Assume that a sequence of input data (e.g., the measurement stream of the input video 105) $y_1, y_2, \ldots, y_t$ can be segmented into non-overlapping and adjacent segments. The boundaries between the segments are the change-points. In one embodiment, the change-points are model based, where the form of the probability distribution in each segment remains the same and only the parameter value of the model for the segment changes. Further assume that the data are independent and identically distributed (i.i.d.) within each segment, and let $c_t$ denote the length of the segment at time $t$. $c_t$ also indicates the time since the last change-point. If the current time-step is a change-point, $c_t = 0$, indicating that a new place model is used for the segment. If no change-points have occurred, $c_t = t$.
Denote the place label at time $t$ as $x_t^c$. The place label $x_t^c$ is indexed by the current segment since the whole segment has a single place label. The place label $x_t^c$ is also updated with each measurement at time $t$. The probability distribution over $x_t^c$ is taken to be a discrete distribution of size $L$, with one entry for each of the learned place models.
To obtain the place label $x_t^c$ at time $t$, a joint posterior on $c_t$ and $x_t^c$ given the input data, $p(c_t, x_t^c \mid y_{1:t})$, is computed, where $y_{1:t}$ denotes all the input data from time 1 to time $t$. The posterior can be factored as Equation (1) below:
$$p(c_t, x_t^c \mid y_{1:t}) = p(c_t \mid y_{1:t})\, p(x_t^c \mid c_t, y_{1:t}) \tag{1}$$
The first term of Equation (1), $p(c_t \mid y_{1:t})$, is the posterior over the segment length. Computation of $p(c_t \mid y_{1:t})$ over the input data from time 1 to time $t$ provides the change-point detection of the input data.
The likelihood of the input data in segment $c_t$ is represented as $p(y_t \mid \xi_t^c)$, where $\xi_t^c$ is a parameter set. The data inside each segment are assumed to be i.i.d., and the parameters are assumed to be i.i.d. according to a prior parameter distribution. The change-point posterior from Equation (1) can be expanded using Bayes law as Equation (2) below:
$$p(c_t \mid y_{1:t}) \propto p(y_t \mid c_t, y_{1:t-1})\, p(c_t \mid y_{1:t-1}) \tag{2}$$
The first term of Equation (2) is the data likelihood, and the second term of Equation (2) can be further expanded by marginalizing over the segment length at the previous time step to yield a recursive formulation for $c_t$ as Equation (3) below:
where $p(c_t \mid c_{t-1})$ is the transition probability, $p(c_{t-1} \mid y_{1:t-1})$ is the posterior from the previous step, and $c_1, c_2, \ldots, c_t$ form a Markov chain.
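Written out from the terms just defined, the marginalization over the previous segment length takes the following form. This is a reconstruction from the surrounding definitions (transition probability times previous posterior, summed over $c_{t-1}$), offered as a sketch of Equation (3) rather than a verbatim reproduction.

```latex
p(c_t \mid y_{1:t-1}) \;=\; \sum_{c_{t-1}} p(c_t \mid c_{t-1})\, p(c_{t-1} \mid y_{1:t-1}) \tag{3}
```

Substituting this sum into Equation (2) gives a recursion in which the posterior at time $t$ depends only on the posterior at time $t-1$, the transition probability, and the data likelihood.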
To characterize the transition probability $p(c_t \mid c_{t-1})$ in Equation (3), it is noted that the only two possible outcomes are $c_t = c_{t-1} + 1$ when there is no change-point at time $t$, and $c_t = 0$ otherwise. Hence, this is a prior probability on the “lifetime” of this particular segment, where the segment ends if a change-point occurs. Using survival analysis, the prior probability predicting the likelihood of occurrence of a change-point in a segment can be modeled using a hazard function, which represents the probability of failure in a unit time interval conditional on the fact that failure has not already occurred. If $H(\cdot)$ is a hazard function, the transition probability can be modeled as below in Equation (4):
In the special case where the length of a segment is modeled using an exponential distribution with time scale $\lambda$, the probability of a change-point at every time-step is constant so that $H(t) = 1/\lambda$.
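A hazard-based transition probability with the two outcomes described above can be written as follows. This is a sketch consistent with the text; the choice of $c_{t-1} + 1$ as the argument of the hazard function follows the common convention in Bayesian online change-point detection and is an assumption here.

```latex
p(c_t \mid c_{t-1}) =
\begin{cases}
H(c_{t-1} + 1) & \text{if } c_t = 0, \\
1 - H(c_{t-1} + 1) & \text{if } c_t = c_{t-1} + 1, \\
0 & \text{otherwise.}
\end{cases} \tag{4}
```

With the constant hazard $H(t) = 1/\lambda$, the first case reduces to $1/\lambda$ and the second to $1 - 1/\lambda$ regardless of the current segment length.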
The data likelihood from Equation (2) can be calculated if the distribution parameter to use is known. Since it is not, the data likelihood is integrated over the parameter value using the parameter prior as Equation (5) below:
$$p(y_t \mid c_t, y_{1:t-1}) = \int_{\xi^c} p(y_t \mid \xi^c)\, p(\xi^c \mid c_t, y_{t-1}^c)\, d\xi^c \tag{5}$$
where $\xi^c$ is the model parameter for segment $c_t$, and $y_{t-1}^c$ is the data from the current segment. The above integral can be computed in a closed form if the two distributions inside the integral are in the conjugate-exponential family of distributions.
In one embodiment, the conjugate distribution is used and the integrated function is denoted $p(y_t \mid c_t, \eta_t^c)$, where $\eta_t^c$ parametrizes the integrated data likelihood. Even though the integrated function $p(y_t \mid c_t, \eta_t^c)$ is usually not in the exponential family, it can be directly updated using the statistics of the data corresponding to the current segment $\{y_{t-1}^c, y_t\}$. In other words, for computing efficiency, the integration need not be performed at every step. In the case where $t$ is a change-point (i.e., $c_t = 0$), the integrated function $p(y_t \mid c_t, \eta_t^c)$ is computed with prior values for $\eta_t^{(0)}$.
As described above, the locations of change-points are obtained by maintaining the posterior over segment lengths $c_t$ for all $t$. The posterior can be approximated using $N$ weighted particles to obtain a constant runtime computation. Specifically, the segment length posterior can be obtained by combining Equations (2), (3), and (4) as the following:
where $w_t^{(c)} = p(y_t \mid c_t, y_{t-1}^c)$ and, for the case where $t$ is a change-point and $y_{t-1}^c$ is the empty set, $w_t^{(0)} = p(y_t \mid c_t)$. $\rho_{t-1} = p(c_{t-1} \mid y_{1:t-1})$ is the posterior from the previous time-step.
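Combining Equations (2), (3), and (4) with the weights $w_t$ and the previous posterior $\rho_{t-1}$ defined above gives a posterior of the following form. This is a reconstruction from those definitions, shown as a sketch of Equation (6); the hazard argument $c_{t-1} + 1$ is the same assumed convention as in the transition probability.

```latex
p(c_t \mid y_{1:t}) \;\propto\;
\begin{cases}
w_t^{(0)} \displaystyle\sum_{c_{t-1}} H(c_{t-1} + 1)\, \rho_{t-1} & \text{if } c_t = 0, \\[2ex]
w_t^{(c)} \bigl(1 - H(c_{t-1} + 1)\bigr)\, \rho_{t-1} & \text{if } c_t = c_{t-1} + 1, \\[1ex]
0 & \text{otherwise.}
\end{cases} \tag{6}
```

The first case collects probability mass from every previous segment length, since a change-point can follow a segment of any length; the second case simply extends one existing segment by one time-step.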
The posterior computed in Equation (6) can be approximated using particle filtering for computing efficiency. For example, the posterior computed in Equation (6) is used with a Rao-Blackwellized particle filter with $w_t$ as the particle weights. The particle weights are given by Equation (5). Since the likelihood parameters $\xi^c$ in Equation (5) are integrated out, the Rao-Blackwellized particle filter has lower variance than a standard particle filter, which makes the computation of the posterior converge more efficiently.
Place Label Inference
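The recursion over segment lengths can be illustrated with a small exact (non-particle) sketch. To keep it self-contained, it swaps in a Beta-Bernoulli model on binary measurements in place of the histogram place model, and it uses a constant hazard; both choices, and all function names, are assumptions for illustration only. A particle filter would keep only N of the hypotheses that this sketch enumerates exhaustively.

```python
def step(state, y, hazard, a0=1.0, b0=1.0):
    """One update of the segment-length posterior.

    state[i] = (probability, a, b) for previous segment length c_{t-1} = i,
    where (a, b) are Beta parameters summarizing that segment's data.
    Returns the same structure indexed by the new segment length c_t.
    """
    def predictive(a, b):
        # Integrated likelihood of y under a Beta(a, b) prior on P(y = 1).
        p_one = a / (a + b)
        return p_one if y == 1 else 1.0 - p_one

    # c_t = 0: a change-point; the measurement is scored under prior values.
    change_mass = sum(prob * hazard for prob, _, _ in state)
    new_state = [(predictive(a0, b0) * change_mass, a0 + y, b0 + (1 - y))]
    # c_t = c_{t-1} + 1: the segment continues with its own statistics.
    for prob, a, b in state:
        weight = predictive(a, b) * prob * (1.0 - hazard)
        new_state.append((weight, a + y, b + (1 - y)))
    total = sum(prob for prob, _, _ in new_state)
    return [(prob / total, a, b) for prob, a, b in new_state]


def run(measurements, hazard=0.05):
    """Return the segment-length posterior after every measurement."""
    state = [(1.0, 1.0, 1.0)]  # before any data: length 0, prior parameters
    history = []
    for y in measurements:
        state = step(state, y, hazard)
        history.append([prob for prob, _, _ in state])
    return history
```

Running `run([0] * 20 + [1] * 5)` shows the behaviour described in the text: while the measurements stay the same the most probable segment length keeps growing, and shortly after the data change the posterior mass moves to a short segment that starts at the change.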
The label generation module 125 is configured to assign place labels to places detected in an input video probabilistically. In one embodiment, the label generation module 125 uses statistical hypothesis testing to determine if the current measurement of the measurement stream of the input video could have been generated by any of the pre-learned place models. If the current measurement could not be generated by one of the pre-learned place models, the place associated with the measurement is declared to be a previously unknown place. Thus, the label generation module 125 can systematically recognize a previously unknown place type and assign it a new label if needed.
In one embodiment, the conditional posterior on a place label associated with a segment is represented by the second term, $p(x_t^c \mid c_t, y_{1:t})$, of Equation (1) given the segment length. The conditional posterior $p(x_t^c \mid c_t, y_{1:t})$ over the input data from time 1 to time $t$ provides predictions of place labels of the input video. The conditional posterior on the place label $p(x_t^c \mid c_t, y_{1:t})$ from Equation (1) can be expanded using Bayes law as:
$$p(x_t^c \mid c_t, y_{1:t}) \propto p(y_t^c \mid x_t^c, c_t)\, p(x_t^c \mid c_t) \tag{7}$$
where $y_t^c$ is the measurement data in the current segment, i.e., $y_t^c = \{y_{t-c_t}, \ldots, y_t\}$.
For detection of an unknown place, the label generation module 125 needs to indicate that the place being evaluated is not predicted by any of the known place models. In one embodiment, the label generation module 125 detects an unknown place using statistical hypothesis testing. Specifically, at each time-step, the label generation module 125 performs $L$ hypothesis tests to determine whether the place is a known place. If these tests are computationally expensive, the label generation module 125 may perform the statistical hypothesis tests once every $T$ time-steps, or stop the tests when $c_t > C$ for some large segment length $C$ after which the decision regarding the place label is unlikely to change.
The label generation module 125 considers hypothesis testing for model $M_i$ with parameter vector $\eta$ so that the exact probability under this place model is $p_0 = p(y_t^c \mid \eta)$. The indication of the observed data is the probability $p_\sigma = \sum_{p(y \mid \eta) < p_0} p(y \mid \eta)$
where $\eta_{mi} = \arg\max_{\eta}\, p(y_t^c \mid \eta)$.
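Given the maximum likelihood parameter $\eta_{mi}$ defined above, a likelihood ratio of the usual form compares the probability of the segment data under the place model with its probability under the best-fitting parameter. The expression below is a sketch of Equation (8) reconstructed under that standard definition, not a verbatim reproduction.

```latex
R \;=\; \frac{p(y_t^c \mid \eta)}{p(y_t^c \mid \eta_{mi})} \tag{8}
```

Because $\eta_{mi}$ maximizes the likelihood, $R \le 1$ and the statistic $-2 \ln R$ is non-negative; larger values mean that the place model explains the segment data less well than the best-fitting parameter does.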
The statistic used in the hypothesis test is $-2 \ln R$, where $R$ is the likelihood ratio in Equation (8). This statistic can be shown to converge to the Chi-squared distribution with $k-1$ degrees of freedom, where $k$ represents the dimensions of the parameter vector $\eta$. The place model $M_i$ can be rejected if the Chi-squared probability is less than a threshold, which is usually set at 5% (0.05) or 1% (0.01). The test statistic converges to the Chi-squared distribution at the rate of $O(N^{-1})$, where $N$ is the number of measurements used to compute the maximum likelihood parameter value $\eta_{mi}$.
The label generation module 125 performs the statistical hypothesis testing for each known place model and declares the place to be previously unknown if the tests reject all of the known place models. Since the Chi-squared probabilities from the hypothesis testing do not say anything about the probability of the new label, the place distribution is set to a prior value for unknown places, $p(x \mid \text{new label})$. The new place label can be either stored for future reference along with the maximum likelihood model parameters $\eta_{mi}$, or be discarded if new places are of no interest.
In terms of implementation, the label generation module 125 augments the change-point detection described above so that the discrete distribution on places is stored with each segment $c_t$. Similarly, in change-point detection with particle filtering, each particle maintains a place distribution. The place distribution becomes increasingly confident as the length of the segment $c_t$ increases and is robust to image noise and outliers. The cost of updating the place distribution is linear in the number of labels and hence does not affect the runtime of the change-point detection.
Learning Place Models using Image Histograms
A place model is used to compute the data likelihood in Equation (5). In one embodiment as illustrated in
$$P(y \mid \alpha) = \int_{\theta} P(y \mid \theta)\, P(\theta \mid \alpha)\, d\theta \tag{9}$$
where $\theta = [\theta_1, \theta_2, \ldots, \theta_W]$ and $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_W]$ are the multinomial parameter and Dirichlet prior respectively, and $W$ is the codebook size.
Assuming that the histogram measurement $y$ has bin counts given by $[n_1, n_2, \ldots, n_W]$, the distributions in the integrand above can be written as
where $P(y \mid \theta)$ is a multinomial distribution and $P(\theta \mid \alpha)$ is a Dirichlet distribution. The likelihood model in Equation (9) is called the Multivariate Polya Model, where a quantized image feature that appears once is more likely to appear multiple times in the entire input video stream.
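For reference, the standard multinomial and Dirichlet densities for a histogram with bin counts $[n_1, \ldots, n_W]$ are shown below, with $n = \sum_w n_w$ and $|\alpha| = \sum_w \alpha_w$. These are the textbook forms of the two distributions named above; the equation numbers (10) and (11) are assigned on the assumption that they fill the gap between Equations (9) and (12).

```latex
P(y \mid \theta) = \frac{n!}{\prod_{w=1}^{W} n_w!} \prod_{w=1}^{W} \theta_w^{\,n_w} \tag{10}

P(\theta \mid \alpha) = \frac{\Gamma(|\alpha|)}{\prod_{w=1}^{W} \Gamma(\alpha_w)} \prod_{w=1}^{W} \theta_w^{\,\alpha_w - 1} \tag{11}
```

The Dirichlet distribution is the conjugate prior of the multinomial, which is what allows the integral in Equation (9) to be evaluated in closed form.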
Performing the integration in Equation (9) gives the final form of the histogram measurement likelihood, which is the place model to be learned, as
where $n = \sum_w n_w$, $|\alpha| = \sum_w \alpha_w$ and $\Gamma(\cdot)$ denotes the Gamma function. If the likelihood of a set of measurements is to be computed, then $n$ is taken to be the total count across all measurements, while $n_w$ is the total count for a particular image feature.
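Integrating a multinomial likelihood against a Dirichlet prior yields the standard Multivariate Polya closed form, $P(y \mid \alpha) = \frac{n!}{\prod_w n_w!} \frac{\Gamma(|\alpha|)}{\Gamma(n + |\alpha|)} \prod_w \frac{\Gamma(n_w + \alpha_w)}{\Gamma(\alpha_w)}$. The sketch below evaluates that expression in log space to avoid overflow for large counts; the function name is hypothetical, and the formula is the textbook form of the model rather than a verbatim copy of Equation (12).

```python
from math import lgamma, exp


def dcm_log_likelihood(counts, alpha):
    """Log-likelihood of histogram bin counts under a Multivariate Polya model.

    counts: bin counts [n_1, ..., n_W]; alpha: Dirichlet parameter of length W.
    """
    n = sum(counts)
    alpha_sum = sum(alpha)
    # Multinomial coefficient n! / prod(n_w!)
    log_p = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    # Gamma(|alpha|) / Gamma(n + |alpha|)
    log_p += lgamma(alpha_sum) - lgamma(n + alpha_sum)
    # prod Gamma(n_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(lgamma(c + a) - lgamma(a) for c, a in zip(counts, alpha))
    return log_p
```

As a sanity check, with two bins and a uniform parameter, `exp(dcm_log_likelihood([1, 0], [1.0, 1.0]))` is 0.5, and the probabilities of all histograms with the same total count sum to one.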
Given a set of $D$ images with features detected from the images, the maximum likelihood value for the Dirichlet parameter $\alpha$ can be learned by using iterative gradient descent optimization. It can be shown that this leads to the following fixed point update for the $\alpha$ parameter:
where $|\alpha| = \sum_w \alpha_w$ as before, and $\Psi(\cdot)$ is the Digamma function, the derivative of the logarithm of the Gamma function.
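A commonly used fixed-point iteration for the maximum likelihood Dirichlet parameter of this model is shown below, where $n_{dw}$ is the count of feature $w$ in image $d$ and $n_d = \sum_w n_{dw}$. It is given as a sketch of Equation (13) under the assumption that the update takes this standard form; the per-image notation $n_{dw}$ and $n_d$ is introduced here for illustration.

```latex
\alpha_w^{\text{new}} \;=\; \alpha_w \,
\frac{\sum_{d=1}^{D} \bigl[\Psi(n_{dw} + \alpha_w) - \Psi(\alpha_w)\bigr]}
     {\sum_{d=1}^{D} \bigl[\Psi(n_d + |\alpha|) - \Psi(|\alpha|)\bigr]} \tag{13}
```

The update is applied to every component $w$ and repeated until the parameter values stop changing appreciably.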
The label learning module 128 uses the Dirichlet Compound Multinomial (DCM) distributions, i.e., the Multivariate Polya model above, as place models in change-point detection. The Dirichlet parameter $\alpha$ for each place is learned from labeled images in an offline training phase. During runtime, the distribution is used to compute the data likelihood in Equation (5), and the $\alpha$ parameter is also updated after each measurement using the iterative rule in Equation (13). This facilitates runtime place label learning, and if runtime place label learning is not required, the updated parameter can be discarded at the end of the segment.
To reduce computation load, the label learning module 128 uses the spatial pyramids of image histograms at the finest level of an input video as input. Thus, for a pyramid with $V$ levels and level 0 denoting the whole image, the dimension of the Dirichlet parameter $\alpha$ is $4^V W$, where $W$ is the size of the codebook of image features. The expression for the hypothesis testing statistic, which is $-2 \ln R$, can also be obtained by substituting the distribution expression in Equation (12) into the likelihood ratio in Equation (8).
Table I below illustrates a summary of particle filtering using the DCM model
For a place that is recognized by one or more place models, the label generation module 125 selects a place label based on the probabilities of the place label generated by all the learned place models. In one embodiment, the label generation module 125 selects the place label having the largest probability. For example, if a place is recognized by three place models corresponding to “kitchen,” “living room” and “office,” respectively, and the probability for the place being a kitchen is 0.8, the probability of being an office is 0.4 and the probability of being a living room is 0.1, the label generation module 125 selects kitchen as the place label for the place.
Experiments and Applications
While particular embodiments and applications of the invention have been illustrated and described herein, it is to be understood that the invention is not limited to the precise construction and components disclosed herein and that various modifications, changes, and variations may be made in the arrangement, operation, and details of the methods and apparatuses of the invention without departing from the spirit and scope of the invention as it is defined in the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/314,461, filed Mar. 16, 2010, entitled “PLISS: Detecting and Labeling Places Using Online Change-Point Detection,” which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20110229032 A1 | Sep 2011 | US |
Number | Date | Country | |
---|---|---|---|
61314461 | Mar 2010 | US |