This invention relates generally to implementations of gesture recognition systems, and more particularly to gesture recognition systems and methods employing machine vision and computer-aided vision systems and methods.
Machine vision systems generally include an image source, such as a camera, for retrieving an image of a subject, such as a person, coupled with a computer system. Many system implementations receive images from the image source, process them using the computer system, and utilize the computer system to implement various methods to determine whether a user being observed by the image source is using portions of his or her body to make particular actions or form particular shapes, or gestures. The computer system then associates the observed gestures with executable commands or instructions. Machine vision systems that analyze the images for gestures are referred to as gesture recognition systems.
Various implementations of gesture recognition systems, implementations of methods of gesture recognition, and implementations of methods of generating a depth map are presented in accordance with the present invention. The inventors of the present invention, however, have determined that many presently available gesture recognition systems are insufficient in their ability to recognize gestures and provide such recognition for subsequent processing.
Therefore, it would be desirable to provide an apparatus that overcomes the drawbacks of the prior art.
Gesture recognition systems provided in accordance with the present invention may be used in a wide variety of operating contexts and locations. For example, a gesture recognition system according to one or more embodiments of the present invention may be utilized to observe individuals standing by a wall of a building on which an interface has been projected. As the individuals move their arms, the system observes the gestures, recognizes them, and executes commands using a computer associated with the gesture recognition system to perform a variety of tasks, such as, by non-limiting example, opening a web site, saving files to a storage device, opening a document, viewing a video, viewing a picture, searching for a book, or any other task that a computer may be involved in performing.
In another situation, an implementation of a gesture recognition system in accordance with one or more embodiments of the present invention may be incorporated into or in the bezel of a laptop computer above the screen area, or in any other conveniently located position on such a laptop or other computing or mobile device. In this position, when the computer is in operation and the user is in the field of view of the image camera being used to view the user's actions, gesture recognition may be used to enable the performance of various tasks on the laptop screen like those previously discussed. Particular implementations of gesture recognition systems may also be developed to enable individuals with limited motor coordination or movement, or physical impairments, to be able to interface with or utilize a computer, by using certain predefined gestures and/or watching the movement of particular portions of the user's body.
Gesture recognition systems in accordance with one or more embodiments of the present invention may be employed in a wide variety of other use environments, conditions, and cases, including, by non-limiting example, to enable interactive video game play or exercise, in kiosks to allow individuals to get information without touching a screen, in vehicles, in interactive advertisements, to guide aircraft or other vehicles directly or remotely, to enable physical skill training exercises, to provide secure access to controlled areas, or in any other situation or location where allowing a user to communicate through actions would facilitate human/system interaction.
Still other objects and advantages of the invention will in part be obvious and will in part be apparent from the specification and drawings.
The invention accordingly comprises the several steps and the relation of one or more of such steps with respect to each of the others, and the apparatus embodying features of construction, combinations of elements and arrangement of parts that are adapted to effect such steps, all as exemplified in the following detailed disclosure, and the scope of the invention will be indicated in the claims.
For a more complete understanding of the invention, reference is made to the following description and accompanying drawings, in which:
The invention will now be described making reference to the following drawings in which like reference numbers denote like structure or steps. Referring first to
Referring back to
At a next step 120 of
Once the frames have been processed and clustering and locating the various body portions has been completed, implementations of TOF camera-employing gesture recognition systems may utilize implementations of any of the gesture recognition methods in accordance with this invention to process the resulting depth maps generated in accordance with the invention. When one or more gestures performed by an imaged individual 208 are recognized, the system may execute commands on, and allow the individual 208 to interact with, the computer 206 or any other system in communication with computer 206, and may also provide feedback to the individual 208 through display 204.
Referring next to
Referring next to
As is then shown in
Referring next to
Next, at step 520 the reference frame may be used to identify new objects in the image by subtracting the reference frame from a current frame obtained from one of the cameras. When the reference frame is subtracted from the current frame, all of the pixels whose information has not changed between the reference frame and the current frame are zeroed out or take on null values. This process is sometimes referred to as background subtraction. The remaining pixels represent areas of the frame that have changed between the time of the reference image and the time of the current frame, generally because an object or person has been moving since the reference image was taken. Finally, at step 530, the resulting portion of the image is thresholded and used to compute a motion mask, or area of interest within the image where depth values will be calculated.
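As an illustration of this reference-subtraction step, the following sketch (a minimal Python rendering offered for exposition only; the threshold value and the use of grayscale frames are assumptions, and the disclosed implementations are described elsewhere in terms of CUDA) differences a current frame against the reference frame and thresholds the result into a motion mask.

```python
import numpy as np

def compute_motion_mask(reference, current, threshold=25):
    """Background subtraction: zero out unchanged pixels and threshold
    the remainder into a binary motion mask.

    reference, current: 2-D uint8 grayscale frames of equal shape.
    threshold: illustrative difference magnitude (0-255) above which
               a pixel is considered to have changed.
    """
    diff = np.abs(current.astype(np.int16) - reference.astype(np.int16))
    # Pixels whose values have not changed (or changed very little) are
    # zeroed out; the survivors form the area of interest where depth
    # values will later be calculated.
    mask = (diff > threshold).astype(np.uint8)
    return mask

# Usage: depth estimation is then restricted to pixels where mask == 1.
```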
Once a motion mask has been created and the areas or regions within the images that are changing as a result of motion have been identified, implementations of gesture recognition systems in accordance with the present invention may utilize methods of evaluating the texture of the regions of the images within the motion mask and of segmenting within the regions based on differences in their texture. By “texture” is meant a particular pattern of lighter and darker pixels within a particular area or region of the frame or image. For example, a highly-textured region would be an area where half of the pixels were white and half were black and the black pixels were arranged in parallel lines across the region. An example of a very non-textured region would be one where practically all the pixels were white or black, or the same or very similar colors.
Referring next to
If at step 610 it is determined that the area is not highly textured, then processing passes to step 615 where it is queried whether the area of the image is very non-textured. If this inquiry is answered in the affirmative, and it is determined that the area of the image is very non-textured when compared to a low texture threshold, then processing preferably passes to step 640 to implement a method of block-based median filtering and color-based clustering, including clustering pixels of the same color with pixels of that color located at the edge of a region with a known depth. Then, at step 645, the known depth value is assigned to all pixels of that same color, and at step 650, median filtering is performed. This method will also be discussed subsequently in greater depth.
If at step 615 it is instead determined that the area of the image is not very non-textured, and therefore falls between the high texture threshold and the low texture threshold, processing preferably passes to step 655 where a stereo correspondence algorithm may be executed directly on the pixels in the area or image being evaluated to determine the pixel depths thereof.
Finally, in accordance with the invention, regardless of the path followed to generate the pixel depths, one or more depth maps may be generated. In some embodiments of the invention, all three methods may be employed one or more times for each motion mask region being analyzed to generate a portion of the depth map. When two or more of the methods are used to analyze portions of the area, the resulting depth map portions formed are joined together to form a depth map of the entire area for the particular frame or image within the motion mask area. Each of these methods will be discussed in greater detail in the following sections.
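A compact way to picture the three-way branch of steps 610 through 655 is sketched below; the local-standard-deviation texture measure and the two threshold values are illustrative assumptions, since the disclosure does not fix a particular texture metric.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_texture(gray, window=9):
    """Texture proxy: local standard deviation of intensity.
    High values indicate strong light/dark patterning; values near zero
    indicate flat, uniformly colored regions."""
    gray = gray.astype(np.float32)
    mean = uniform_filter(gray, window)
    mean_sq = uniform_filter(gray ** 2, window)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def classify_region(gray_region, high_thresh=40.0, low_thresh=5.0):
    """Route a motion-mask region to one of the three depth methods."""
    t = local_texture(gray_region).mean()
    if t > high_thresh:
        return "steerable_or_gabor_filter_bank"   # steps 620-635
    if t < low_thresh:
        return "median_filter_color_clustering"   # steps 640-650
    return "direct_stereo_correspondence"         # step 655
```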
Referring to
As noted in
An example of a noted steerable filter bank in accordance with an embodiment of the present invention is set forth in
In particular implementations presented in accordance with the present invention, as noted in step 620, a Gabor filter bank may also be used in place of a steerable filter bank. Relevant teachings and disclosure concerning the structure, function, implementation, and methods of using steerable filter banks and Gabor filter banks for texture segmentation and processing may be found in the following references, each of which is incorporated herein by reference in its entirety: W. T. Freeman, et al., "The Design and Use of Steerable Filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 13, p. 891-906 (September 1991); E. P. Simoncelli, et al., "Shiftable Multi-scale Transforms," IEEE Transactions on Information Theory, v. 38, p. 587-607 (March 1992); E. P. Simoncelli, et al., "The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation," Proceedings of ICIP-95, v. 3, p. 444-447 (October 1995); and J. Chen, et al., "Adaptive Perceptual Color-Texture Image Segmentation," IEEE Transactions on Image Processing, v. 14, No. 10, p. 1524-1536 (October 2005).
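As one concrete illustration of the filter-bank approach, the sketch below builds a small bank of Gabor kernels at several orientations and uses the squared filter responses as per-pixel texture features; the frequency, orientation count, and kernel size are assumed values chosen for exposition rather than parameters taken from the disclosure or the cited references.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Real-valued Gabor kernel: a Gaussian envelope modulating a
    cosine carrier oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * frequency * xr)
    return envelope * carrier

def texture_features(gray, frequency=0.2, n_orientations=4):
    """Per-pixel texture feature vector: filter-response energy at each
    orientation, suitable for clustering textured regions."""
    gray = gray.astype(np.float32)
    responses = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        resp = convolve(gray, gabor_kernel(frequency, theta))
        responses.append(resp ** 2)           # local energy per orientation
    return np.stack(responses, axis=-1)       # shape: H x W x n_orientations
```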
Referring next to
Each pixel in the depth map generated from a stereoscopic camera configuration contains information from a pixel in a left image and a corresponding pixel in a right image where both pixels are viewing the same point in the scene. The two corresponding or paired pixels may be located using a stereo correspondence algorithm as shown in step 1010. As is further illustrated in
In particular embodiments of the present invention, at step 1020, the method may include determining the number of pixels with non-zero values (valid pixels) in the motion mask. The method further includes processing at step 1030 so the SSD value may be scaled based on the percentage of valid differencing operations according to the following equation (because less than 100% of the pixels in the window were used for the calculation):
where SSDS is the scaled SSD value, Rh is the horizontal window radius, Rv is the vertical window radius, and Nd is the number of valid differencing operations used to calculate the SSD.
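A scaling consistent with these definitions (an assumed reconstruction: the raw SSD multiplied by the ratio of the full window size to the number of valid differencing operations) is:

SSDS = SSD × (2Rh + 1)(2Rv + 1) / Nd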
In particular embodiments of the invention, at step 1040, the SSD value may be considered a candidate SSD value and evaluated to determine its validity. The candidate SSD value may be deemed valid if at least 51% of the pixels in one window are valid and correspond with valid pixels in the other window. Any of a wide variety of other criteria and other percentages could also be used to determine the validity of a candidate SSD value in particular implementations.
Once the SSD values have been determined, at step 1050 they are used in calculations to generate a disparity map of the area within the motion mask. With the values in the disparity map, at step 1060 a depth map is calculated using any of a wide variety of known methods and techniques. The foregoing method may be used to form depth maps, or portions of depth maps, directly from the image data.
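The following sketch ties steps 1010 through 1060 together under stated assumptions: it evaluates the motion-mask-restricted SSD for one candidate disparity, scales it by the fraction of valid differencing operations, applies the 51% validity test, and converts a winning disparity to depth with the standard pinhole relation. The window radii, focal length, and baseline are placeholders, and only the left-image mask is checked for brevity.

```python
import numpy as np

def scaled_ssd(left, right, mask, row, col, disparity, rh=4, rv=4):
    """Sum of squared differences over a (2*rv+1) x (2*rh+1) window,
    using only pixels that are valid (non-zero) in the motion mask,
    then scaled up by the fraction of valid differencing operations."""
    h, w = left.shape
    if col - disparity - rh < 0 or col + rh >= w or row - rv < 0 or row + rv >= h:
        return None                                   # window falls off the image
    lw = left[row - rv:row + rv + 1, col - rh:col + rh + 1].astype(np.float32)
    rw = right[row - rv:row + rv + 1,
               col - disparity - rh:col - disparity + rh + 1].astype(np.float32)
    mw = mask[row - rv:row + rv + 1, col - rh:col + rh + 1] > 0
    n_valid = int(mw.sum())
    window_size = (2 * rh + 1) * (2 * rv + 1)
    # Candidate is rejected unless at least 51% of the window is valid.
    if n_valid < 0.51 * window_size:
        return None
    ssd = float(((lw - rw) ** 2)[mw].sum())
    return ssd * window_size / n_valid                # scaled SSD

def disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.06):
    """Pinhole stereo relation; focal length and baseline are placeholders."""
    return focal_px * baseline_m / max(disparity, 1)
```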
Implementations of many, if not all, of the methods presented in accordance with the present invention may be carried out on a computer as software programmed in a wide variety of languages. Any of a wide variety of computer hardware platforms, processor types, operating systems, and telecommunication networks may be involved in carrying out various method steps. In a particular implementation, the processor being used may be a graphics processing unit (GPU) such as those manufactured by NVIDIA® Corporation of Santa Clara, Calif. The software instructions utilized to carry out the various method steps may be programmed in a CUDA environment, a term used to describe the computing architecture and programming environment from NVIDIA® that currently supports C language programming. Accordingly, something being "programmed in CUDA" means that the code may be written in any language supported by the CUDA architecture, which generally provides a massively multithreaded computing environment on a many-core processor. Because the stereo correspondence and any of the other methods disclosed in this document may be implemented in CUDA on the GPU, the processing load of the central processing unit (CPU) may be substantially reduced, which may enable gesture detection with stereo cameras in real time at 70-80 frames per second.
Referring to
In particular embodiments in accordance with the invention, when any method or structure presented in accordance with the present invention is implemented using CUDA on a GPU, any, all, or some of the methods may be programmed to operate asynchronously and scalably. Each method, section of a method, and/or group of methods may be applied separately, and may serve as its own compartmentalized compute device. In particular embodiments in accordance with the present invention, no actual main thread may be used from which child or derived threads are run. Instead, the entire method and/or section may be run in separate threads, all interfacing with the CPU for input/output. In these implementations, the resulting scalability may ensure that the overall execution of the method(s) and/or sections does not slow down should a specific method and/or section require more time to execute on the GPU.
Referring next to
Referring to
Referring to
For those regions within the motion mask area that are identified as having low texture, various methods of block-based median filtering and color-based clustering may be employed in accordance with the present invention, as noted above. The overall process of block-based median filtering involves performing a pixel-wise operation on the neighborhood pixels of a particular pixel and assigning the median value of the neighborhood pixels to that pixel. In the case of generating depth maps, the median disparity value calculated in the neighborhood of a pixel will be assigned to that pixel as its disparity value.
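A minimal sketch of that neighborhood-median operation is shown below, assuming a disparity map in which zero marks pixels whose disparity is not yet known; the window size is an illustrative choice.

```python
import numpy as np

def block_median_filter(disparity, window=5):
    """Assign each pixel the median disparity of its neighborhood.
    Zero entries (unknown disparities) are ignored when taking the
    median, so known values propagate into small gaps."""
    half = window // 2
    h, w = disparity.shape
    out = disparity.copy()
    for r in range(half, h - half):
        for c in range(half, w - half):
            block = disparity[r - half:r + half + 1, c - half:c + half + 1]
            known = block[block > 0]
            if known.size:
                out[r, c] = np.median(known)
    return out
```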
Referring to
While the foregoing has dealt primarily with various implementations of methods of generating a depth map from stereo camera images, embodiments of the present invention contemplate gesture recognition systems utilizing TOF sensors to generate depth maps for use in implementations of gesture recognition methods.
Referring to
Referring to
With one or more clusters identified, the method may include finding a cluster corresponding with the user's body (or major portion of the body, such as a torso or face, depending upon the implementation), or other portion of another type of user actuator, and establishing an oval membrane around the body cluster at step 1725. The method may also include establishing the oval membrane as the background depth reference from which all other body portions will be tracked at step 1730. The method may then include, at step 1735, finding the arm clusters, at step 1740, locating the head and shoulders, at step 1745 calculating arm length, and finally at step 1750, finding a hand and tracking its position relative to the oval membrane. A wide variety of techniques can be employed to find and/or calculate the arm length such as, by non-limiting example, various biometric methods, databases of common human proportion values, and other methods, algorithms, and/or databases.
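The bookkeeping of steps 1725 through 1750 might be sketched as follows; everything in this sketch (the ellipse fitted from a cluster's second moments as the "oval membrane," its mean depth as the background reference, and the in-front-of-membrane test for the hand) is an illustrative assumption rather than the disclosed procedure.

```python
import numpy as np

def fit_oval_membrane(cluster_mask, depth):
    """Fit an axis-aligned ellipse (the 'oval membrane') around a body
    cluster and record its mean depth as the background depth reference."""
    rows, cols = np.nonzero(cluster_mask)
    cy, cx = rows.mean(), cols.mean()
    ry = max(2.0 * rows.std(), 1.0)          # ~2-sigma vertical radius
    rx = max(2.0 * cols.std(), 1.0)          # ~2-sigma horizontal radius
    ref_depth = depth[cluster_mask].mean()
    return {"cy": cy, "cx": cx, "ry": ry, "rx": rx, "ref_depth": ref_depth}

def hand_relative_to_membrane(hand_rc, hand_depth, membrane):
    """Report the hand position relative to the membrane: whether it lies
    inside the oval in the image plane and how far in front of the
    background depth reference it sits."""
    r, c = hand_rc
    inside = (((r - membrane["cy"]) / membrane["ry"]) ** 2 +
              ((c - membrane["cx"]) / membrane["rx"]) ** 2) <= 1.0
    depth_offset = membrane["ref_depth"] - hand_depth   # > 0: in front of body
    return inside, depth_offset
```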
Implementations of one or more clustering methods presented in accordance with the present invention may be implemented in CUDA. A non-limiting example of an embodiment of the invention including an implementation of a clustering method in CUDA will now be described. In this implementation, a map of cluster numbers is created that is updated as clusters merge through an agglomeration process. Three stages may be utilized by the algorithm. These stages are implemented in three kernels to allow the cluster map to be copied into texture memory after each stage.
Three of many possible clustering methods may be implemented employing the following clustering stages of the invention. A first clustering method of the invention may treat the image as binary, with no additional constraints beyond the 2D spatial window. Another clustering method may utilize the absolute difference in grayscale values (and thus depth values) as a distance metric according to Equation 2. Color-based clustering may be implemented by a third method in accordance with the invention, which uses an RGB Euclidean distance metric according to Equation 3.
ΔI = |I1 − I2|  (2)

ΔC = √((Cr,1 − Cr,2)² + (Cg,1 − Cg,2)² + (Cb,1 − Cb,2)²)  (3)
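The two distance metrics of Equations 2 and 3 translate directly into code; the merge threshold used to decide whether two pixels belong in the same cluster is an assumed parameter.

```python
import numpy as np

def grayscale_distance(i1, i2):
    """Equation 2: absolute difference of grayscale (depth) values."""
    return abs(float(i1) - float(i2))

def rgb_distance(c1, c2):
    """Equation 3: Euclidean distance between two RGB triples."""
    c1 = np.asarray(c1, dtype=np.float32)
    c2 = np.asarray(c2, dtype=np.float32)
    return float(np.sqrt(np.sum((c1 - c2) ** 2)))

def same_cluster(p1, p2, metric, threshold=10.0):
    """Two pixels are candidates for merging when their distance under
    the chosen metric falls below an (assumed) threshold."""
    return metric(p1, p2) < threshold
```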
Referring to
During a second stage or a linking stage depicted in accordance with an embodiment of the invention at
Referring finally to
Any of a wide variety of combinations of specific clustering methods and clustering stages is possible using the principles disclosed in accordance with this invention. In addition, the stages may be implemented in any order, iteratively performed, and repetitively performed depending upon the constraints of the clusters and the desired outcome. Also, implementations of the method of clustering described above may be utilized for clustering pixels based on any desired value expressed by and/or represented by a pixel, such as, by non-limiting example, depth, color, texture, intensity, chromaticity, or other pixel characteristic.
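A serial sketch of the cluster-map idea is given below (the disclosed implementation runs as three CUDA kernels with the cluster map copied into texture memory between stages; the union-find bookkeeping and the right/down 4-neighborhood used here are simplifying assumptions): every pixel begins as its own cluster number, neighboring pixels that are close under the chosen metric are linked, and linked labels are agglomerated into a single representative.

```python
import numpy as np

def cluster_map(values, metric, threshold):
    """Label pixels so that neighbors within `threshold` under `metric`
    share a cluster number, via a simple union-find agglomeration."""
    h, w = values.shape[:2]
    labels = np.arange(h * w).reshape(h, w)   # stage 1: one cluster per pixel
    parent = np.arange(h * w)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]     # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Stage 2 (linking): merge each pixel with its right and lower neighbor
    # when the distance metric falls below the threshold.
    for r in range(h):
        for c in range(w):
            if c + 1 < w and metric(values[r, c], values[r, c + 1]) < threshold:
                union(labels[r, c], labels[r, c + 1])
            if r + 1 < h and metric(values[r, c], values[r + 1, c]) < threshold:
                union(labels[r, c], labels[r + 1, c])

    # Stage 3 (agglomeration): rewrite every label as its root representative.
    out = np.empty((h, w), dtype=np.int64)
    for r in range(h):
        for c in range(w):
            out[r, c] = find(labels[r, c])
    return out
```

For example, cluster_map(depth_frame, grayscale_distance, threshold=8.0) would group pixels of similar depth into candidate body-part clusters, the threshold again being an assumed value.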
Once the finished depth maps have been produced by an implementation of a depth estimation system utilizing stereoscopic cameras or by an implementation of a depth estimation system using a TOF camera or sensor as described in accordance with one or more embodiments of this invention, various implementations of methods of gesture recognition can be used in accordance with additional embodiments of the invention. These methods may allow the computer to determine whether the user is making a static or a dynamic gesture. A static gesture may be a particular orientation of the hand or arm. Static gestures include orientations of the hand or arm that are recognized when the hand or arm forms a pattern that does not include a movement (such as many American Sign Language signs). Dynamic gestures include orientations of the hand or arm that are recognized when the hand, fingers, palm, wrist, or arm move in a particular orientation, perform a defined action, or make a defined movement. Based on whether the gesture, either static or dynamic, is recognized, the computer may preferably execute an instruction, process, or code associated with the gesture through a gesture library or database, and display results on a display or perform other resulting actions.
Referring to
At step 2130, the method may then determine whether the depth data in one or more of the frames includes a gesture that is likely to be static or dynamic. A wide variety of methods may be used to make the decision, including, by non-limiting example, a time requirement, recognition of movement within a particular time interval, identification that particular hand features are visible within a frame, or any other method of determining whether the gesture is executed in a fixed or a moving fashion. If the gesture is determined to be dynamic at step 2130, processing passes to step 2140, and the resulting set of depth data frames that contain the gesture (or portions of the set of frames containing the gesture) may be evaluated using a hidden Markov model and stored gestures in a gesture library or database to determine the likelihood of a match. Implementations of gesture libraries or databases may include video segments or maps of the movement of particular points of the hand through time to enable the hidden Markov model to determine which, if any, stored gestures in the database could have produced the observed gesture. An example of a type of hidden Markov model that may be used with implementations of the method may be found in the article by S. Rajko, et al., "HMM Parameter Reduction for Practice Gesture Recognition," Proceedings of the International Conference on Automatic Gesture Recognition (September 2008), which is incorporated entirely herein by reference. If the gesture is determined to be a match at step 2160, then the computer may execute a command or instruction corresponding with the matched gesture at step 2170, in the context of the context-aware algorithm.
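To make the dynamic-gesture matching step concrete, the sketch below scores an observed sequence of discretized hand observations against each stored gesture model using the standard scaled forward algorithm and returns the most likely match; the discrete observation alphabet, the library format, and the acceptance threshold are assumptions, and the parameter-reduction technique of the cited Rajko et al. article is not reproduced here.

```python
import numpy as np

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Log-likelihood of a discrete observation sequence under an HMM
    defined by start, transition, and emission probability matrices."""
    alpha = start_p * emit_p[:, obs[0]]
    s = alpha.sum()
    log_lik = np.log(s)
    alpha = alpha / s
    for o in obs[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, o]
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s                 # rescale to avoid underflow
    return log_lik

def match_dynamic_gesture(obs, gesture_library, min_log_lik=-50.0):
    """Return the best-matching stored gesture, or None if no model
    explains the observation well enough (an assumed threshold).

    gesture_library: dict mapping gesture name -> (start_p, trans_p, emit_p).
    """
    best_name, best_ll = None, -np.inf
    for name, (start_p, trans_p, emit_p) in gesture_library.items():
        ll = forward_log_likelihood(obs, start_p, trans_p, emit_p)
        if ll > best_ll:
            best_name, best_ll = name, ll
    return best_name if best_ll > min_log_lik else None
```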
If the observed gesture is determined at step 2150 to be a static gesture, then implementations of the method may utilize a generative artificial neural network to determine whether the gesture matches one included in a gesture database. In particular implementations, the network may operate by imagining the gestures possible in the given context (using inputs from the context-aware algorithm in some implementations). If the network determines that a match exists at step 2160, then at step 2170 a command or instruction may be executed in accordance therewith. Examples of implementations of generative artificial neural networks that may be utilized may be found in the article by Geoffrey Hinton, et al., entitled "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, v. 18, p. 1527-1554 (2006), the disclosure of which is hereby entirely incorporated herein by reference. Particular implementations in accordance with the invention may utilize deep belief networks. In accordance with one or more embodiments of the present invention, many modifications have been made to this network, specifically in its overall topology and architecture, such that the network is suited for gesture recognition.
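For the static-gesture path, a deep belief network of the kind described by Hinton et al. is built from stacked restricted Boltzmann machines; the sketch below shows a single RBM layer trained with one-step contrastive divergence on binary feature vectors (for instance, thresholded depth-map patches). The feature encoding, layer sizes, and learning rate are assumptions, and the topology modifications described above are not reflected here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One restricted Boltzmann machine layer (the building block of a DBN)."""

    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def train_step(self, v0):
        """One-step contrastive divergence (CD-1) on a batch of binary
        visible vectors v0 with shape (batch, n_visible)."""
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(np.float64)
        v1 = self.visible_probs(h0_sample)            # reconstruction
        h1 = self.hidden_probs(v1)
        batch = v0.shape[0]
        self.W += self.lr * ((v0.T @ h0) - (v1.T @ h1)) / batch
        self.b_vis += self.lr * (v0 - v1).mean(axis=0)
        self.b_hid += self.lr * (h0 - h1).mean(axis=0)
```

Stacking several such layers and fine-tuning the result against a labeled set of static gestures yields a classifier along the lines of the cited work.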
Referring to
If it is instead determined at step 2220 that the gesture is static, then processing alternatively passes to step 2240 where a generally unsupervised learning process may be implemented in combination with an implementation of a generative artificial neural network to record and store the new gesture. The particular generative artificial neural network used may be any previously presented in accordance with the invention.
Once the observed gesture has been recorded and stored, the method may also alternatively include associating the learned gesture with a particular context-aware algorithm and/or inputting or associating the instructions or steps that should be executed by the computer when the gesture is observed. Additional context-aware algorithms may be created in particular implementations of the present invention. Any of a wide variety of other application-specific information may also be input or associated with the gesture and/or the context-aware algorithm, depending upon what the command or instruction the gesture is associated with requires for execution.
Therefore, in accordance with the present invention, as can be seen, implementations of the described gesture recognition systems and related methods may have the following advantages, among any number of advantages:
Using CUDA to create a massively multithreaded application to process image data on a multi-core GPU may enable use of very inexpensive stereo camera equipment while still providing depth map data of sufficient quality. The use of hidden Markov models and generative artificial neural networks for gesture recognition and learning, in combination with real time or near real time depth map information, may enable accurate gesture recognition without requiring artificial user posing or positioning.
The materials used for the described embodiments of the invention for the implementation of gesture recognition systems may be conventional materials used to make similar goods in the art, such as, by non-limiting example, plastics, metals, semiconductor materials, rubbers, glasses, and the like. Those of ordinary skill in the art will readily be able to select appropriate materials and manufacture these products from the disclosures provided herein.
The implementations listed here, and many others, will become readily apparent from this disclosure. From this, those of ordinary skill in the art will readily understand the versatility with which this disclosure may be applied.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/180,351 filed May 21, 2009, to El Dokor et al., titled Gesture Recognition Systems and Related Methods, the contents thereof being incorporated herein by reference.
Other Publications
Freeman, W. T. et al., "The Design and Use of Steerable Filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 13 (Sep. 1991), 891-906.
Simoncelli, E. P. et al., "Shiftable Multi-scale Transforms," IEEE Transactions on Information Theory, v. 38 (Mar. 1992), 587-607.
Simoncelli, E. P. et al., "The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation," Proceedings of ICIP-95, v. 3 (Oct. 1995), 444-447.
Chen, J. et al., "Adaptive Perceptual Color-Texture Image Segmentation," IEEE Transactions on Image Processing, v. 14, No. 10 (Oct. 2005), 1524-1536 (2004 revised draft).
Halfhill, Tom R., "Parallel Processing with CUDA," Microprocessor Report (Jan. 28, 2008), available at http://www.nvidia.com/docs/IO/55972/220401_Reprint.pdf.
Farber, Rob, "CUDA, Supercomputing for the Masses: Part 4, The CUDA Memory Model," under the High Performance Computing section of the Dr. Dobb's website, p. 3, available at http://www.ddj.com/hpc-high-performance-computing/208401741.
Rajko, S. et al., "HMM Parameter Reduction for Practice Gesture Recognition," Proceedings of the International Conference on Automatic Gesture Recognition (Sep. 2008).
Hinton, Geoffrey et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, v. 18 (2006), 1527-1554.
Susskind, Joshua M. et al., "Generating Facial Expressions with Deep Belief Nets," Department of Psychology, Univ. of Toronto, I-Tech Education and Publishing (2008), 421-440.
Bleyer, Michael et al., "Surface Stereo with Soft Segmentation," Computer Vision and Pattern Recognition, IEEE (2010).
Chen, Junqing et al., "Adaptive Perceptual Color-Texture Image Segmentation," The International Society for Optical Engineering, SPIE Newsroom (2006), 1-2.
Forsyth, David A. et al., "Stereopsis," in Computer Vision: A Modern Approach, Prentice Hall (2003).
Harris, Mark et al., "Parallel Prefix Sum (Scan) with CUDA," in GPU Gems 3, ch. 39, edited by Hubert Nguyen (2007).
Hirschmuller, Heiko, "Stereo Vision in Structured Environments by Consistent Semi-Global Matching," Computer Vision and Pattern Recognition, CVPR 06 (2006), 2386-2393.
Ivekovic, Spela et al., "Dense Wide-baseline Disparities from Conventional Stereo for Immersive Videoconferencing," ICPR 2004 (2004), 921-924.
Kaldewey, Tim et al., "Parallel Search on Video Cards," First USENIX Workshop on Hot Topics in Parallelism (HotPar '09) (2009).
Kirk, David et al., "Programming Massively Parallel Processors: A Hands-on Approach," Elsevier (2010).
Klaus, Andreas et al., "Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure," Proceedings of ICPR 2006, IEEE (2006), 15-18.
Kolmogorov, Vladimir et al., "Computing Visual Correspondence with Occlusions via Graph Cuts," International Conference on Computer Vision (2001).
Kolmogorov, Vladimir et al., "Generalized Multi-camera Scene Reconstruction Using Graph Cuts," Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (2003).
Kuhn, Michael et al., "Efficient ASIC Implementation of a Real-Time Depth Mapping Stereo Vision System," Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, IEEE (2009).
Li, Shigang, "Binocular Spherical Stereo," IEEE Transactions on Intelligent Transportation Systems, v. 9, No. 4 (Dec. 2008), 589-600.
Marsalek, M. et al., "Semantic Hierarchies for Visual Object Recognition," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2007 (CVPR '07), IEEE (2007), 1-7.
Metzger, Wolfgang, "Laws of Seeing," MIT Press (2006).
Min, Dongbo et al., "Cost Aggregation and Occlusion Handling With WLS in Stereo Matching," IEEE Transactions on Image Processing, v. 17 (2008), 1431-1442.
"NVIDIA: CUDA Compute Unified Device Architecture, Programming Guide, Version 1.1," NVIDIA (2007).
Remondino, Fabio et al., "Turning Images into 3-D Models," IEEE Signal Processing Magazine (2008).
Richardson, Ian E., "H.264/MPEG-4 Part 10 White Paper," white paper, www.vcodex.com (2003).
Sengupta, Shubhabrata, "Scan Primitives for GPU Computing," Proceedings of the 2007 Graphics Hardware Conference, San Diego, CA (2007), 97-106.
Sintron, Eric et al., "Fast Parallel GPU-Sorting Using a Hybrid Algorithm," Journal of Parallel and Distributed Computing (Elsevier), v. 68, No. 10 (Oct. 2008), 1381-1388.
Wang, Zeng-Fu et al., "A Region Based Stereo Matching Algorithm Using Cooperative Optimization," CVPR (2008).
Wei, Zheng et al., "Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA," 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta (2010).
Wiegand, Thomas et al., "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, v. 13, No. 7 (Jul. 2003), 560-576.
Woodford, O. J. et al., "Global Stereo Reconstruction under Second Order Smoothness Priors," IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 31, No. 12 (2009), 2115-2128.
Yang, Qingxiong et al., "Stereo Matching with Color-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling," IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 31, No. 3 (Mar. 2009), 492-504.
Zinner, Christian et al., "An Optimized Software-Based Implementation of a Census-Based Stereo Matching Algorithm," Lecture Notes in Computer Science (SpringerLink), v. 5358 (2008), 216-227.
"PCT Search Report," PCT/US2010/035717 (Sep. 1, 2010), 1-29.
"PCT Written Opinion," PCT/US2010/035717 (Dec. 1, 2011), 1-9.
"PCT Search Report," PCT/US2011/49043 (Mar. 21, 2012), 1-4.
"PCT Written Opinion," PCT/US2011/49043 (Mar. 21, 2012), 1-4.
"PCT Search Report," PCT/US2011/049808 (Jan. 12, 2012), 1-2.
"PCT Written Opinion," PCT/US2011/049808 (Jan. 12, 2012), 1-5.
"Non-Final Office Action," U.S. Appl. No. 12/784,123 (Oct. 2, 2012), 1-20.
"Non-Final Office Action," U.S. Appl. No. 12/784,022 (Jul. 16, 2012), 1-14.
Tieleman, T. et al., "Using Fast Weights to Improve Persistent Contrastive Divergence," 26th International Conference on Machine Learning, New York, NY, ACM (2009), 1033-1040.
Sutskever, I. et al., "The Recurrent Temporal Restricted Boltzmann Machine," NIPS, MIT Press (2008), 1601-1608.
Parzen, E., "On the Estimation of a Probability Density Function and the Mode," Annals of Mathematical Statistics, v. 33 (1962), 1065-1076.
Hopfield, J. J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proceedings of the National Academy of Sciences, v. 79 (1982), 2554-2558.
Culibrk, D. et al., "Neural Network Approach to Background Modeling for Video Object Segmentation," IEEE Transactions on Neural Networks, v. 18 (2007), 1614-1627.
Bengio, Y. et al., "Curriculum Learning," ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, New York, NY, ACM (2009).
Bengio, Y. et al., "Scaling Learning Algorithms Towards AI," in L. Bottou et al. (Eds.), Large Scale Kernel Machines, MIT Press (2007).
Battiato, S. et al., "Exposure Correction for Imaging Devices: An Overview," in R. Lukac (Ed.), Single Sensor Imaging: Methods and Applications for Digital Cameras, CRC Press (2009), 323-350.