The invention relates to video processing and object identification, and more particularly relates to analyzing images of objects to identify attributes.
Automatically identifying the locations of objects and their parts in video is important for many tasks. For example, in the case of human body parts, automatically identifying the locations of human body parts is important for tasks such as automated action recognition, human pose estimation, etc. Body parsing is a term used to describe the computerized localization of individual body parts in video. Current methods for body parsing in video estimate only part locations such as head, legs, arms, etc. See e.g., “Strike a Pose: Tracking People by Finding Stylized Poses,” Ramanan et al., Computer Vision and Pattern Recognition (CVPR), San Diego, Calif., June 2005; and “Pictorial Structures for Object Recognition,” Felzenszwalb et al., International Journal of Computer Vision (IJCV), January 2005.
Most previous methods in fact only perform syntactic object parsing, i.e., they only estimate the localization of object parts (e.g., arms, legs, face, etc.) without efficiently estimating semantic attributes associated with the object parts.
In view of the foregoing, there is a need for a method and system for effectively identifying semantic attributes of objects from images.
The invention resides in a method, computer program product, computer system and process for estimating parts and attributes of an object in video. The method, computer program product, computer system and process comprise producing a plurality of versions of an image of an object derived from a video input, wherein each version has a different resolution of said image of said object, and computing an appearance score at each of a plurality of regions on the lowest resolution version of said plurality of versions of said image for at least one semantic attribute for said object, wherein said appearance score denotes a probability of the at least one semantic attribute appearing in the region. Such techniques also include analyzing one or more other versions of the multiple versions to compute a resolution context score for each of the plurality of regions in the lowest resolution version, wherein said resolution context score denotes an extent to which finer spatial structure exists in the one or more other versions than in the lowest resolution version for each of the plurality of regions, and determining a configuration of the at least one semantic attribute in the lowest resolution version based on the appearance score and the resolution context score in each of the plurality of regions in the lowest resolution version.
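The production of a plurality of versions of an image at different resolutions can be sketched as follows. This is a minimal illustration assuming a grayscale NumPy image; the function name `resolution_pyramid` and the choice of 2x2 average pooling as the downsampling step are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def resolution_pyramid(image: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return [lowest_res, ..., highest_res] versions of `image`."""
    versions = [image]
    for _ in range(levels - 1):
        h, w = versions[-1].shape
        if h < 2 or w < 2:
            break
        cur = versions[-1][: h - h % 2, : w - w % 2]  # trim to even size
        # 2x2 average pooling halves each dimension.
        half = cur.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        versions.append(half)
    versions.reverse()  # lowest resolution first
    return versions
```

Each successive version retains the coarse structure of the object while discarding the finer spatial detail that the resolution context features later recover.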
These and other features of the invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings that depict various embodiments of the invention, in which:
It is noted that the drawings are not to scale. The drawings are intended to depict only typical aspects of the invention, and therefore should not be considered as limiting the scope of the invention. While the drawings illustrate the processing of human bodies in video, the invention extends to the processing of other objects in video. In the drawings, like numbering represents like elements between the drawings.
The invention relates to video processing and object identification, and more particularly relates to analyzing images of objects to identify attributes.
Aspects of the invention provide an improved solution for detecting semantic attributes of objects in video. For example, aspects of the invention provide for the extraction of attributes from body parts to enable automatic searching of people in videos based on a personal description. In another example, the invention provides for the extraction of attributes from cars to enable automatic searching of cars in video based on a description of a car. A possible query could be: “show all people entering IBM last month with beard, wearing sunglasses, wearing a red jacket and blue pants” or “show all blue two-door Toyota with diamond hub caps entering the IBM parking lot last week.”
The invention deals with the problem of semantic object parsing, where the goal is to effectively estimate both part locations and semantic attributes in the same process. Using human body parsing as an example, embodiments of the invention provide for the estimation of semantic attributes of human body parts together with the localization of body parts in the same process. Overcoming the inefficiency and inaccuracy of the previous approaches, the invention leverages a global optimization scheme to estimate both parts and their corresponding attributes simultaneously.
Unlike previous approaches, embodiments of the invention use semantic attributes such as “beard,” “moustache,” and “no facial hair” to not only locate the human body part but also identify the attribute of the body part. For example, instead of only identifying a body part such as a “leg,” the invention uses semantic attributes such as “black trousers,” “long skirts,” and “shorts” to both locate the body part and identify its attributes. The invention maintains a data table relating each semantic attribute to a corresponding body part. For example, the semantic attribute “beard” corresponds to the body part “lower face region.”
Embodiments of the invention are based on three kinds of features: appearance features, resolution context features, and geometric features. The appearance features refer to the scores obtained by comparing semantic attributes from an image library to what appears in the image, to evaluate the probability of a match. The resolution context features refer to object consistency under different image resolutions. The resolution context score for a particular region is the weighted average score from that region's higher resolution image. A total score is computed for the higher resolution image by adding up the appearance scores, the geometric scores and, if a still higher resolution image is available, the resolution context scores. The resolution context score is then computed from a higher resolution image as the total score at a given region divided by the number of sub-regions which compose that region on the higher resolution image being analyzed. The geometric features refer to the scores computed based on the spatial relationships among the underlying parts in a probable configuration. For example, a potential attribute of “beard” corresponds to a “face” and a “black shirt” corresponds to a “torso.” The geometric features test the accuracy of the candidate semantic attributes by applying the general human body configuration principle that a “face” is both above a “torso” and within a certain distance of a “torso.”
In the example of human body parsing, aspects of the invention estimate not only human body part locations, but also their semantic attributes such as color, facial hair type, presence of glasses, etc. In other words, aspects of the invention utilize a unified learning scheme to perform both syntactic parsing, i.e., location estimation, and semantic parsing, i.e., extraction of semantic attributes that describe each body part. The invention detects both body parts and attributes in the same process to more accurately identify the attributes of a human body over the prior art.
Turning to the drawings,
Computing device 14 is shown including a processor 20, a memory 22A, an input/output (I/O) interface 24, and a bus 26. Further, computing device 14 is shown in communication with an external I/O device/resource 28 and a non-transitory computer readable storage device 22B (e.g., a hard disk, a floppy disk, a magnetic tape, an optical storage such as a compact disc (CD) or a digital video disc (DVD)). In general, processor 20 executes program code, such as semantic attribute detection program 30, which is stored in a storage system, such as memory 22A (e.g., a dynamic random access memory (DRAM), a read-only memory (ROM), etc.) and/or storage device 22B. While executing program code, processor 20 can read and/or write data, such as data 36 to/from memory 22A, storage device 22B, and/or I/O interface 24. A computer program product comprises the storage device 22B on which the program code is stored for subsequent execution by the processor 20 to perform a method for estimating parts and attributes of an object in video. Bus 26 provides a communications link between each of the components in computing device 14. I/O device 28 can comprise any device that transfers information between a user 16 and computing device 14 and/or digital video input 40 and computing device 14. To this extent, I/O device 28 can comprise a user I/O device to enable an individual user 16 to interact with computing device 14 and/or a communications device to enable an element, such as digital video input 40, to communicate with computing device 14 using any type of communications link. I/O device 28 represents at least one input device (e.g., keyboard, mouse, etc.) and at least one output device (e.g., a printer, a plotter, a computer screen, a magnetic tape, a removable hard disk, a floppy disk).
In any event, computing device 14 can comprise any general purpose computing article of manufacture capable of executing program code installed thereon. However, it is understood that computing device 14 and semantic attribute detection program 30 are only representative of various possible equivalent computing devices that may perform the process described herein. To this extent, in other embodiments, the functionality provided by computing device 14 and semantic attribute detection program 30 can be implemented by a computing article of manufacture that includes any combination of general and/or specific purpose hardware and/or program code. In each embodiment, the program code and hardware can be created using standard programming and engineering techniques, respectively. Such standard programming and engineering techniques may include an open architecture to allow integration of processing from different locations. Such an open architecture may include cloud computing. Thus the present invention discloses a process for supporting computer infrastructure, comprising integrating, hosting, maintaining, and deploying computer-readable code into the computer system 12, wherein the code in combination with the computer system 12 is capable of performing a method for estimating parts and attributes of an object in video.
Similarly, computer system 12 is only illustrative of various types of computer systems for implementing aspects of the invention. For example, in one embodiment, computer system 12 comprises two or more computing devices that communicate over any type of communications link, such as a network, a shared memory, or the like, to perform the process described herein. Further, while performing the process described herein, one or more computing devices in computer system 12 can communicate with one or more other computing devices external to computer system 12 using any type of communications link. In either case, the communications link can comprise any combination of various types of wired and/or wireless links; comprise any combination of one or more types of networks; and/or utilize any combination of various types of transmission techniques and protocols.
As discussed herein, semantic attribute detection program 30 enables computer system 12 to detect semantic attributes of objects, such as person 92 (
Aspects of the invention provide an improved solution for detecting semantic attributes of objects, such as person 92 (
At S1, object detection module 32 (
D2 maintains a list of semantic attributes and associated images. In addition to describing a semantic attribute, each semantic attribute corresponds to a body part. For example, semantic attributes “sunglasses,” “eyeglasses,” and “no glasses” all correspond to the body part “middle face region;” semantic attributes “beard,” “moustache,” and “no facial hair” all correspond to the body part “lower face region.”
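The data table relating each semantic attribute to its corresponding body part can be sketched as a simple mapping. The entries below are only the examples named in the text; the helper `attributes_for_part` is an illustrative addition, not part of the disclosure.

```python
# Mapping from semantic attribute to corresponding body part, populated with
# the examples given in the text.
ATTRIBUTE_TO_PART = {
    "sunglasses": "middle face region",
    "eyeglasses": "middle face region",
    "no glasses": "middle face region",
    "beard": "lower face region",
    "moustache": "lower face region",
    "no facial hair": "lower face region",
}

def attributes_for_part(part: str) -> list[str]:
    """All semantic attributes whose detections localize the given body part."""
    return sorted(a for a, p in ATTRIBUTE_TO_PART.items() if p == part)
```

Because every attribute names a part, detecting an attribute simultaneously localizes the part, which is what lets the invention parse locations and attributes in one process.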
At S2, the appearance score module 34 (
At S2, in evaluating the probability of semantic attributes being present at regions of the image, aspects of the invention employ a method described in the works of Viola et al. in “Robust Real-time Object Detection,” Cambridge Research Laboratory Technical Report, February 2001. The method is further described with real-valued confidence scores in the works of Bo Wu et al. in “Fast Rotation Invariant Multi-View Face Detection Based on Real Adaboost,” IEEE International Conference on Automatic Face and Gesture Recognition, 2004. The method provides steps to calculate an appearance score to represent the probability of an attribute being present at a region. The presence of a semantic attribute is evaluated through the application of a semantic attribute detector. A detector for a semantic attribute is a function that maps a region of an image into a real number in the interval [0,1], where the output indicates the probability that the semantic attribute is present in the image region given as input. Under the invention, the resulting value of an appearance score can range from 0 to 1. At each region of the image, there may be multiple appearance scores corresponding to the probability of multiple semantic attributes being present at the same region.
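A detector in the sense above is any function mapping an image region to a probability in [0,1]. One hedged sketch, assuming the real-valued confidences of the cited Real AdaBoost detectors, squashes the raw confidence with a logistic function; the logistic choice is an assumption, since any monotone map onto [0,1] fits the definition.

```python
import math

def appearance_score(raw_confidence: float) -> float:
    """Map a real-valued detector confidence to a probability in [0, 1].
    The logistic squashing is an assumed choice, not taken from the cited
    detectors; it only illustrates the required [0, 1] output range."""
    return 1.0 / (1.0 + math.exp(-raw_confidence))

def region_scores(region, detectors):
    """Appearance scores for every semantic attribute at one image region;
    `detectors` maps attribute name -> raw-confidence function (hypothetical
    interface for illustration)."""
    return {name: appearance_score(f(region)) for name, f in detectors.items()}
```

This reflects the statement that a single region may carry several appearance scores, one per candidate attribute.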
Subsequently in S2 (
The output of S2 (
At S3 (
To further illustrate how region 675 on image 670 on
The weighted average score for the highest resolution image corresponding to region x on image 690 is:
(0.95*0.5+0.9*0.3+0*0.2+0.9*0.5+0.8*0.3+0*0.2)/2=0.7175
The sum is divided by 2 because two regions (region xx and region xy) enter the calculation. The output of 0.7175 becomes the resolution context score of region x on image 690. Similarly, assume that, upon analysis of the higher resolution images of region y and region z, the resolution context scores for region y and region z are 0.6 and 0.5 respectively. Table 2, depicting the scores for region x, region y and region z on image 690, is shown below.
Therefore, the weighted average score for image 690 is:
(0.9*0.5+0.5*0.3+0.7175*0.2+0.8*0.5+0.6*0.3+0.6*0.2+0.9*0.5+0.35*0.3+0.5*0.2)/3≈0.7
Because image 690 is the corresponding higher resolution image of region 675 on image 670, region 675 on image 670 has a resolution context score of 0.7.
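The recursive scoring just illustrated can be sketched as follows, using the weights that appear in the worked example (0.5 for appearance, 0.3 for geometric, 0.2 for resolution context); the function names are illustrative, not part of the disclosure.

```python
# Weights as used in the worked example above.
W_APPEARANCE, W_GEOMETRIC, W_RESOLUTION = 0.5, 0.3, 0.2

def total_score(appearance, geometric, resolution_context=0.0):
    """Weighted total score of one region; resolution_context is 0 when no
    higher-resolution image remains to be analyzed."""
    return (W_APPEARANCE * appearance + W_GEOMETRIC * geometric
            + W_RESOLUTION * resolution_context)

def resolution_context_score(sub_regions):
    """Resolution context score of a region: the summed total scores of the
    sub-regions composing it on the next-higher-resolution image, divided by
    the number of those sub-regions. `sub_regions` is a list of
    (appearance, geometric, resolution_context) triples."""
    return sum(total_score(*s) for s in sub_regions) / len(sub_regions)
```

Applying `resolution_context_score` level by level, from the highest resolution image down, yields the resolution context score of each region on the lowest resolution version.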
As further demonstrated in
The output of S3 (
At S4 (
Geometric Score (Gi) Examples
The geometric score (Gi) for body part i (or region i) may be expressed in terms of a geometric score (GAi) based on angles and/or a geometric score (GDi) based on distances.
In one embodiment, Gi=(GAi+GDi)/2, which is a straight arithmetic average.
In one embodiment, Gi=WA*GAi+WD*GDi, which is a weighted arithmetic average, wherein the weights (WA, WD) are non-negative real numbers satisfying WA+WD=1, and wherein the weights (WA, WD) are inputs that may be selected or determined, in one example, based on such factors as the relative accuracy and/or importance of reference values of angles and distances (see below) used to calculate the geometric scores GAi and GDi.
In one embodiment, Gi=(GAi*GDi)^1/2, which is a geometric average.
In one embodiment, Gi=GAi, wherein only angles, and not distances, are used.
In one embodiment, Gi=GDi, wherein only distances, and not angles, are used.
Geometric Score (GAi) Based on Angles
Let Ai={Ai1, Ai2, . . . , AiN} denote an array of N angles determined as described supra between part i (or region i) and each pair of the other body parts (or regions).
Let ai={ai1, ai2, . . . , aiN} denote an array of N corresponding reference angles stored in a library or file, wherein N≧2.
Let δAi denote a measure of a differential between Ai and ai.
In one embodiment, δAi=[{(Ai1−ai1)^2+(Ai2−ai2)^2+ . . . +(AiN−aiN)^2}/N]^1/2.
In one embodiment, δAi=(|Ai1−ai1|+|Ai2−ai2|+ . . . +|AiN−aiN|)/N.
Let tA denote a specified or inputted angle threshold such that:
GAi=0 if δAi≧tA; and
GAi=1−δAi/tA if δAi<tA.
Thus, GAi satisfies 0≦GAi≦1. In particular, GAi=1 if δAi=0 (i.e., if all determined angles are equal to all of the corresponding reference angles). Furthermore, GAi=0 if δAi≧tA (i.e., if the measure of the differential between Ai and ai is intolerably large).
Geometric Score (GDi) Based on Distances
Let Di={Di1, Di2, . . . , DiM} denote an array of M distances determined as described supra between body part i (or region i) and each other body part (or region).
Let di={di1, di2, . . . , diM} denote an array of M corresponding reference distances stored in a library or file, wherein M≧2.
Let δDi denote a measure of a differential between Di and di.
In one embodiment, δDi=[{(Di1−di1)^2+(Di2−di2)^2+ . . . +(DiM−diM)^2}/M]^1/2.
In one embodiment, δDi=(|Di1−di1|+|Di2−di2|+ . . . +|DiM−diM|)/M.
Let tD denote a specified or inputted distance threshold such that:
GDi=0 if δDi≧tD; and
GDi=1−δDi/tD if δDi<tD.
Thus, GDi satisfies 0≦GDi≦1. In particular, GDi=1 if δDi=0 (i.e., if all determined distances are equal to all of the corresponding reference distances). Furthermore, GDi=0 if δDi≧tD (i.e., if the measure of the differential between Di and di is intolerably large).
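The scores GAi and GDi, and their combination into Gi, can be sketched as follows. The root-mean-square differential of the first embodiment is used (the mean-absolute-deviation embodiment could be substituted), and the equal default weights reproduce the straight arithmetic average embodiment.

```python
def rms_differential(measured, reference):
    """Root-mean-square deviation between measured and reference arrays,
    per the first embodiment of the differential measure."""
    n = len(measured)
    return (sum((m - r) ** 2 for m, r in zip(measured, reference)) / n) ** 0.5

def thresholded_score(delta, threshold):
    """1 at delta == 0, falling linearly to 0 at delta >= threshold."""
    return 0.0 if delta >= threshold else 1.0 - delta / threshold

def geometric_score(angles, ref_angles, t_a, distances, ref_distances, t_d,
                    w_a=0.5, w_d=0.5):
    """Weighted combination Gi = WA*GAi + WD*GDi with WA + WD = 1; the equal
    default weights give the straight arithmetic average embodiment."""
    ga = thresholded_score(rms_differential(angles, ref_angles), t_a)
    gd = thresholded_score(rms_differential(distances, ref_distances), t_d)
    return w_a * ga + w_d * gd
```

Setting w_a=1, w_d=0 (or the reverse) recovers the angle-only and distance-only embodiments.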
The outputs of S4 (
At S5 (
Therefore, at S5 (
As an alternative to having predetermined weights for the three types of scores when calculating the weighted average score, the weights can be dynamically determined. To compute an optimized weighted average score from all three types of scores, S6 (
At S7 (
Therefore, each configuration under analysis is composed of a set of parts where each part (i) is associated with an attribute and corresponding appearance score Ai, resolution context score Ri, and geometric score Gi. At S7 (
where Ai represents appearance scores, Gi represents geometric scores, and Ri represents resolution context scores for each part i of the configuration, and W1, W2, and W3 correspond to the weights provided at S6 by the structured learning module 35 (
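Because the total-score expression itself is not reproduced above, the following sketch assumes the linear combination implied by the surrounding text, with each part i contributing W1*Ai+W2*Gi+W3*Ri and the highest-scoring candidate configuration selected; this form is an assumption consistent with the weighted averages used throughout, not a quotation of the claimed formula.

```python
def configuration_score(parts, w1, w2, w3):
    """Total score of a candidate configuration; `parts` is a list of
    (Ai, Gi, Ri) triples, one per part. Assumes the linear combination
    W1*Ai + W2*Gi + W3*Ri per part."""
    return sum(w1 * a + w2 * g + w3 * r for a, g, r in parts)

def best_configuration(candidates, w1, w2, w3):
    """Return the candidate configuration with the highest total score."""
    return max(candidates, key=lambda c: configuration_score(c, w1, w2, w3))
```

With weights from the structured learning module in place of the fixed example weights, the selected configuration simultaneously fixes the part locations and their semantic attributes.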
As used herein, it is understood that “program code” means any set of statements or instructions, in any language, code or notation, that cause a computing device having an information processing capability to perform a particular function either directly or after any combination of the following: (a) conversion to another language, code or notation; (b) reproduction in a different material form; and/or (c) decompression. To this extent, program code can be embodied as any combination of one or more types of computer programs, such as an application/software program, component software/a library of functions, an operating system, a basic I/O system/driver for a particular computing, storage and/or I/O device, and the like.
The foregoing description of various aspects of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously, many modifications and variations are possible. Such modifications and variations that may be apparent to an individual skilled in the art are included within the scope of the invention as defined by the accompanying claims.
This application is a continuation of U.S. patent application Ser. No. 13/783,749, filed Mar. 4, 2013, which is a continuation of U.S. patent application Ser. No. 12/845,095, filed Jul. 28, 2010, both of which are incorporated by reference herein. The present application is related to U.S. patent application entitled “Multispectral Detection of Personal Attributes for Video Surveillance,” identified by Ser. No. 12/845,121 and filed Jul. 28, 2010, the disclosure of which is incorporated by reference herein in its entirety. Additionally, the present application is related to U.S. patent application entitled “Facilitating People Search in Video Surveillance,” identified by Ser. No. 12/845,116, and filed Jul. 28, 2010, the disclosure of which is incorporated by reference herein in its entirety. Also, the present application is related to U.S. patent application entitled “Attribute-Based Person Tracking Across Multiple Cameras,” identified by Ser. No. 12/845,119, and filed Jul. 28, 2010, the disclosure of which is incorporated by reference herein in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
5870138 | Smith et al. | Feb 1999 | A |
6549913 | Murakawa | Apr 2003 | B1 |
6608930 | Agnihotri et al. | Aug 2003 | B1 |
6829384 | Schneiderman et al. | Dec 2004 | B2 |
6885761 | Kage | Apr 2005 | B2 |
6920236 | Prokoski | Jul 2005 | B2 |
6967674 | Lausch | Nov 2005 | B1 |
7006950 | Greiffenhagen et al. | Feb 2006 | B1 |
7257569 | Elder et al. | Aug 2007 | B2 |
7274803 | Sharma et al. | Sep 2007 | B1 |
7277891 | Howard et al. | Oct 2007 | B2 |
7355627 | Yamazaki et al. | Apr 2008 | B2 |
7382894 | Ikeda et al. | Jun 2008 | B2 |
7391900 | Kim et al. | Jun 2008 | B2 |
7395316 | Osterbag et al. | Jul 2008 | B2 |
7406184 | Wolff et al. | Jul 2008 | B2 |
7450735 | Shah et al. | Nov 2008 | B1 |
7460149 | Donovan et al. | Dec 2008 | B1 |
7526102 | Ozer | Apr 2009 | B2 |
7822227 | Barnes et al. | Oct 2010 | B2 |
7974714 | Hoffberg | Jul 2011 | B2 |
8004394 | Englander | Aug 2011 | B2 |
8208694 | Jelonek et al. | Jun 2012 | B2 |
8411908 | Ebata et al. | Apr 2013 | B2 |
8421872 | Neven, Sr. | Apr 2013 | B2 |
8532390 | Brown et al. | Sep 2013 | B2 |
8588533 | Brown et al. | Nov 2013 | B2 |
20030120656 | Kageyama et al. | Jun 2003 | A1 |
20050013482 | Niesen | Jan 2005 | A1 |
20050162515 | Venetianer et al. | Jul 2005 | A1 |
20060165386 | Garoutte | Jul 2006 | A1 |
20060184553 | Liu et al. | Aug 2006 | A1 |
20060285723 | Morellas et al. | Dec 2006 | A1 |
20070052858 | Zhou et al. | Mar 2007 | A1 |
20070053513 | Hoffberg | Mar 2007 | A1 |
20070122005 | Kage et al. | May 2007 | A1 |
20070126868 | Kiyohara et al. | Jun 2007 | A1 |
20070177819 | Ma et al. | Aug 2007 | A1 |
20070183763 | Barnes et al. | Aug 2007 | A1 |
20070237355 | Song et al. | Oct 2007 | A1 |
20070237357 | Low | Oct 2007 | A1 |
20070294207 | Brown et al. | Dec 2007 | A1 |
20080002892 | Jelonek et al. | Jan 2008 | A1 |
20080080743 | Schneiderman et al. | Apr 2008 | A1 |
20080122597 | Englander | May 2008 | A1 |
20080123968 | Nevatia et al. | May 2008 | A1 |
20080159352 | Adhikari et al. | Jul 2008 | A1 |
20080201282 | Garcia et al. | Aug 2008 | A1 |
20080211915 | McCubbrey | Sep 2008 | A1 |
20080218603 | Oishi | Sep 2008 | A1 |
20080232651 | Woo | Sep 2008 | A1 |
20080252722 | Wang et al. | Oct 2008 | A1 |
20080252727 | Brown et al. | Oct 2008 | A1 |
20080269958 | Filev et al. | Oct 2008 | A1 |
20080273088 | Shu et al. | Nov 2008 | A1 |
20080317298 | Shah et al. | Dec 2008 | A1 |
20090046153 | Chen et al. | Feb 2009 | A1 |
20090060294 | Matsubara et al. | Mar 2009 | A1 |
20090066790 | Hammadou | Mar 2009 | A1 |
20090074261 | Haupt et al. | Mar 2009 | A1 |
20090097739 | Rao et al. | Apr 2009 | A1 |
20090174526 | Howard et al. | Jul 2009 | A1 |
20090261979 | Breed et al. | Oct 2009 | A1 |
20090295919 | Chen et al. | Dec 2009 | A1 |
20100106707 | Brown et al. | Apr 2010 | A1 |
20100150447 | GunasekaranBabu et al. | Jun 2010 | A1 |
20110087677 | Yoshio et al. | Apr 2011 | A1 |
20120039506 | Sturzel et al. | Feb 2012 | A1 |
Number | Date | Country |
---|---|---|
19960372 | Jun 2001 | DE |
2875629 | Mar 2006 | FR |
2004070514 | Mar 2004 | JP |
2009117607 | Sep 2009 | WO |
2009133667 | Nov 2009 | WO |
2010023213 | Mar 2010 | WO |
Entry |
---|
Ronfard et al., Learning to Parse Pictures of People, Lecture Notes in Computer Science—LNCS, vol. 2353, Jan. 1, 2002, pp. 700-714. |
Li-Jia Li et al., Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework, Computer Vision and Pattern Recognition, 2009, CVPR 2009. IEEE, Piscataway, NJ, USA, Jun. 20, 2009, pp. 2036-2043. |
Szeliski, Computer Vision: Algorithms and Applications, Jan. 1, 2011, Springer, pp. 615-621. |
Marr, Vision, Jan. 1, 1982, Freeman, pp. 305-313. |
Ramanan, Part-Based Models for Finding People and Estimating Their Pose, in: Thomas B. Moeslund et al., Visual Analysis of Humans, Jan. 1, 2011, Springer, pp. 1-25. |
Zhu et al., A Stochastic Grammar of Images, Jan. 1, 2007, Now Publishers, pp. 259-362. |
Vaquero et al., Chapter 14: Attribute-Based People Search, in: Yunqian Ma et al., Intelligent Video Surveillance: Systems and Technology, Jan. 1, 2009, pp. 387-405. |
Feris, Chapter 3, Case Study: IBM Smart Surveillance System, in: Yunqian Ma et al., Intelligent Video Surveillance: System and Technology, Jan. 1, 2009, pp. 47-76. |
Nowozin et al., Structured Learning and Prediction in Computer Vision, Jan. 1, 2011, Now Publishers, pp. 183-365. |
Wu, Integration and Goal-Guided Scheduling of Bottom-up and Top-Down Computing Processes in Hierarchical Models, UCLA Jan. 1, 2011. |
Lin L et al., A Stochastic Graph Grammar for Compositional Object Representation and Recognition, Pattern Recognition, Elsevier, GB, vol. 42, No. 7, Jul. 1, 2009, pp. 1297-1307. |
Yang et al., Evaluating Information Contributions of Bottom-up and Top-down Processes, Computer Vision, 2009 IEEE, Piscataway, NJ, USA, Sep. 29, 2009, pp. 1042-1049. |
Tan et al., Enhanced Pictorial Structures for Precise Eye Localization Under Uncontrolled Conditions, Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE, Piscataway, NJ, USA Jun. 20, 2009, pp. 1621-1628. |
Ioffe et al., Probabilistic Methods for Finding People, International Journal of Computer Vision, Kluwer Academic Publishers, Norwell, US, vol. 43, No. 1, Jun. 1, 2001, pp. 45-68. |
Mohan et al., Example-Based Object Detection in Images by Components, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, No. 4, Apr. 2001, pp. 349-361. |
Felzenszwalb, et al., Pictorial Structures for Object Recognition, International Journal of Computer Vision (IJCV), pp. 1-42, Jan. 2005. |
Viola et al., Robust Real-time Object Detection, Cambridge Research Laboratory Technical Report Series, pp. 1-24, Feb. 2001. |
Wu, Bo et al. Fast Rotation Invariant Multi-View Face Detection Based on Real Adaboost, IEEE International Conference on Automatic Face and Gesture Recognition (FGR'04), 2004. |
Ramanan et al., Strike a Pose: Tracking People by Finding Stylized Poses, Computer Vision and Pattern Recognition (CVPR), San Diego, CA, Jun. 2005. |
Tran et al., Configuration Estimates Improve Pedestrian Finding, Neural Information Processing Systems Foundation, 2007. |
N. Dalal et al, Histograms of Oriented Gradients for Human Detection, IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, Jun. 2005, vol. II, pp. 886-893, 2005. |
Tsochantaridis et al., Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), Sep. 2005. |
Naive Bayes Classifier, Wikipedia, http://en.wikipedia.org/wiki/Naive_Bayes_classifier, Jul. 27, 2010, 7 pages. |
Bi, Sheng, et al., Human Body Segmentation Based on Adaptive Feature Selection in Complex Situations, Society for Imaging Science and Technology, United States; Society of Photo-Optical Instrumentation Engineers, United States Image Conference (San Jose, CA, USA) 2008 Journal: Proceedings of SPIE—The International Society for Optical Engineering, 2008. |
Lao, Weilun et al., Fast Detection and Modeling of Human-Body Parts from Monocular Video, Springer-Verlag Berlin Heidelberg 2008, pp. 380-389, 2008. |
Jiang et al., View Synthesis from Infrared-visual Fused 3D Model for Face Recognition, Fifth Intl. Conference on Information, Communications and Signal Processing, pp. 177-180, 2005. |
Gundimada et al., Feature Selection for Improved Face Recognition in Multisensor Images, in R. Hammond, B. Abidi and M. Abidi, editors, Face Biometrics for Personal Identification, Signals and Communication Technology, pp. 109-120, Springer Berlin Heidelberg, 2007. |
Tian et al., Real-time Detection of Abandoned and Removed Objects in Complex Environments, IBM TJ Watson Research Center, 8 pages, Sep. 30, 2008. |
Viola et al., Rapid Object Detection Using a Boosted Cascade of Simple Features, IEEE, ISBN 0-7695-1272-0/01, 8 pages, 2001. |
Conaire et al., Multispectral Object Segmentation and Retrieval in Surveillance Video, Centre for Digital Video Processing, Dublin City University, Ireland, 2006 IEEE, pp. 2381-2384, 2006. |
Tseng et al., Mining from Time Series Human Movement Data, 2006 IEEE International Conference on Systems, Man, and Cybernetics, Oct. 8-11, 2006, Taipei, Taiwan, pp. 3241-3243, 2006. |
Raskar et al., Image Fusion for Context Enhancement and Video Surrealism, Association for Computing Machinery, Inc., 2004, pp. 85-93, and 153, 2004. |
Abidi et al., Survey and Analysis of Multimodal Sensor Planning and Integration for Wide Area Surveillance, ACM Computing Surveys, vol. 41, No. 1, Article 7, publication date 2006, pp. 7:2-7:36, 2006. |
Kong et al., Recent Advances in Visual and Infrared Face Recognition—a Review, Computer Vision and Image Understanding 97 (2005), 2004 Elsevier, Inc., pp. 103-134, 2004. |
Boyle et al., The Language of Privacy: Learning from Video Media Space Analysis and Design, ACM Transactions on Computer-Human Interactions, vol. 12, No. 2, Jun. 2005, pp. 328-370, 2005. |
Milian Carmelo, Virtual Interviews, IPCOM000170611D, May 22, 2008, 3 pages, 2008. |
Fukuda et al. Visual Surveillance System with Multiple Cameras in Wide Area Environment, Nihon Kikai Gakkai Ronbunshu C (Transactions of the Japan Society of Mechanical Engineers, pt. C), vol. 69, Issue 680, Apr. 2003, pp. 1011-1018, English Abstract Only, 2003. |
Petrushin et al., Multiple Sensor Integration for Indoor Surveillance, MDM/KDD 2005, Aug. 21, 2005, Chicago, Illinois, USA, pp. 53-60, 2005. |
Trease et al., Unstructured Data Analysis of Streaming Video Using Parallel, High-throughput Algorithms, Proceedings of the Ninth lasted International Conference on Signal and Image Processing, Aug. 20-22, 2007, Honolulu, Hawaii, USA, pp. 305-310, 2007. |
Hampapur et al., Smart Video Surveillance Exploring the Concept of Multiscale Spatiotemporal Tracking, IEEE Signal Processing Magazine, Mar. 2005, pp. 38-51, 2005. |
Kang et al., Continuous Tracking within and Across Camera Streams, Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), 6 pages, 2003. |
Comaniciu et al., Real-Time Tracking of Non-Rigid Objects Using Mean Shift, IEEE CVPR 2000, pp. 1-8, 2000. |
U.S. Appl. No. 12/845,121, Multispectral Detection of Personal Attributes for Video Surveillance, filed Jul. 28, 2010. |
U.S. Appl. No. 12/845,119, Attribute-Based Person Tracking Across Multiple Cameras, filed Jul. 28, 2010. |
U.S. Appl. No. 12/845,116, Facilitating People Search in Video Surveillance, filed Jul. 28, 2010. |
Samangooei et al., The Use of Semantic Human Description as a Soft Biometric, Biometrics: Theory, Applications and Systems, 2008. BTAS 2008. 2nd IEEE International Conference. |
Park et al. Multiresolution Models for Object Detection, in Proceedings of the 11th European Conference on Computer Vision: Part IV, pp. 241-254 (2010). |
Number | Date | Country | |
---|---|---|---|
20130308868 A1 | Nov 2013 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 13783749 | Mar 2013 | US |
Child | 13948325 | US | |
Parent | 12845095 | Jul 2010 | US |
Child | 13783749 | US |