The present invention may be used as a teaching or instructional device, as a device to ensure quality control during manufacturing or assembly operations, or as an amusement device.
In preferred embodiments, the assembled product is displayed to the user in real time, along with feedback that indicates successful assembly.
In the event that the captured image does not conform to the reference standard, alternative audio and visual feedback is provided. The feedback may also include further instructions, delivered through the visual display, the audio output device, or both. In this way, the user may be provided instructional information that the user may use to assemble the device in conformance with the standard. Such instructional information may include information relating to the nature of the incorrect orientation of the part or element, as well as a video demonstration of the correct manner in which to orient and integrate the part or element so that the assembly is correct. In the event that positive feedback is not generated, the user is prompted to reassemble the components until such positive feedback is generated.
In embodiments, the assembly exercise may be subject to time limitations, and if the assembly is not completed before a predetermined time has elapsed, negative feedback is provided.
In other embodiments, the performance of a particular user completing the assembly is associated with a scoring heuristic, which may depend on time, accuracy, or both.
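By way of illustration, such a time- and accuracy-dependent scoring heuristic may be sketched as follows. This is a minimal hypothetical example; the function name, penalty weights, and time limit are illustrative assumptions, not part of the disclosed system.

```python
def assembly_score(elapsed_seconds, errors, time_limit=120, max_score=100):
    """Hypothetical scoring heuristic: start from max_score, then deduct
    up to half the score for elapsed time and a fixed amount per
    non-conforming step detected during assembly."""
    time_penalty = max_score * min(elapsed_seconds / time_limit, 1.0) * 0.5
    error_penalty = 10 * errors
    return max(0, round(max_score - time_penalty - error_penalty))
```

A perfect, instantaneous assembly scores the maximum, while slow or error-prone assemblies score progressively lower, bottoming out at zero.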
In yet further embodiments, a number of steps may be combined before a feedback step is implemented.
In a contemplated embodiment, the device may comprise a puzzle such as a Rubik's cube or other puzzles including both two dimensional and three dimensional manifestations.
In yet further embodiments, a continuous imaging system such as a video camera is employed and the captured image is displayed to the user in real time.
In a further contemplated embodiment, the assembly and the standard relate to a structure such as a model building. Such models may be created from commercially available materials such as Lego™ brand blocks.
In yet a further embodiment the assembly relates to a repair of a damaged device or article of manufacture.
In yet a further embodiment of the device, the image and reference standard relate to an actual or simulated medical procedure, such as a surgical or dental procedure.
In yet a further embodiment, the image of the assembled part is transmitted to a remote location for image processing.
In yet a further embodiment, an expert is located at the remote location along with the images and can provide further feedback to the user, including audio and visual images of the standard compared with the captured image or images, displayed to the user in proximity to the assembled device.
The manner in which the captured image is processed and then compared to the standard image can be performed in a plurality of manners and will depend in part on the nature of the assembly or procedure that is to be performed. In an embodiment, an algorithm is applied to the captured image data to convert the characteristics of the data into multidimensional vectors representing features including, but not limited to, shape, color, and size.
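The vector comparison described above may be sketched as follows. This is a hypothetical illustration; the particular feature names, the packing of characteristics into a vector, and the tolerance value are assumptions made for the example, not part of the disclosure.

```python
import math

def feature_vector(shape_score, color_rgb, size):
    """Pack illustrative characteristics (shape, color, size) into a
    single multidimensional vector."""
    return [shape_score, *color_rgb, size]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def conforms(captured, reference, tolerance=0.1):
    """A captured assembly conforms to the standard when its feature
    vector lies within `tolerance` of the reference vector."""
    return euclidean(captured, reference) <= tolerance
```

A captured image whose feature vector lies close to the reference standard's vector would trigger the positive feedback described above; a distant vector would trigger the alternative feedback.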
The description of the invention herein is intended to provide information for one skilled in the art to understand and practice the full scope of the invention, but is not intended to be limiting as to the scope of available knowledge, nor admit that any particular reference, nor the combinations and analysis of this information as presented herein, is itself a part of the prior art. Any of the references cited herein are expressly incorporated herein by reference, as if the entirety thereof was recited completely herein. The present invention is not limited by a narrow or precise discussion or the examples provided herein, nor is it intended that any disclaimer, limitation, or mandatory language as applied to any embodiment or embodiments be considered to limit the scope of the invention as a whole. The scope of the invention is therefore to be construed as the entire literal scope of the claims, as well as any equivalents thereof. It is also understood that the title, abstract, field of the invention, and dependent claims are not intended to, and do not, limit the scope of the independent claims.
Referring now to
Now referring to
Now referring to
According to a first embodiment of the invention, information relating to a standard with respect to an assembled device, or a plurality of standards with respect to a reference assembled device, is provided as input to a database or other data storage system that can be accessed by a processor. In addition to the database or data storage system, the system includes a video camera for the detection of images, a user input device, and a display panel. To use the device, a user of the system first selects, using the user input device, a reference device that will serve as the standard for the intended device to be assembled. Information relating to the reference device may be selected from a menu or may be downloaded from the internet through a website that is designed to provide data for the application. Next, the user is prompted to initiate an assembly process for the device in front of the camera or other image capturing device. As each part is integrated into the device, the camera will capture an image of the partially assembled device or structure. The image capturing step may be controlled by the user or automatically triggered by the absence of motion in a captured video frame. In this regard, the camera may be triggered by the user or, if the camera comprises a video device, a particular static image may be saved automatically once the absence of motion has been detected for a predetermined time. In further embodiments, either the device or the camera may be oriented so that multiple views of the partially assembled device may be captured during the assembly process. The data from the camera is then transmitted to the processor for comparison with the reference standard.
In an embodiment, after each step in the assembly process is completed, an image is captured and then compared to the reference standard. If the assembled product conforms to the standard at each step, positive feedback is generated to reflect that the step has been successfully completed. This feedback may comprise audio signals, such as a chime, and additional visual feedback may be displayed to the user on the display. In an embodiment wherein the method involves the assembly of a device, an image of the device is displayed for each assembly step in conjunction with an outline superimposed on the image that conforms to the outer edge or periphery of the reference standard device, using a dotted line in a first color such as white. If the assembly is correct, the image will be shown within the confines of the standard outline superimposed on the display. If the assembly is incorrect, the part of the assembly that does not conform is highlighted by superimposing an outline of the non-conforming part on the device in a second color on the display, such as red.
Now referring to
Referring now to
As illustrated in
As illustrated in
Now referring to
Now referring to
b depicts the parts of the assembly wherein elements 401 and 402 are in positions in conformance with the reference standard. Elements 401a and 402a are depicted in phantom, reflecting the starting positions of the elements.
Thus, in a first example, an application is activated on a computer 102, wherein the system includes camera 101. In response to a start command, the display will provide information relating to a standard. The display will provide a sequence of images, including the elements, the sequence of steps leading to the reference standard, and the reference standard itself.
Next, a user will manipulate three dimensional objects in an attempt to replicate the standard. The camera captures images of the work and displays the images on the display in real time. At the same time, the processor executes an algorithm to characterize the features of the image and then compare the features to the reference standard. In this example, the comparison is executed when the processor detects the absence of motion in the transmitted image after a predetermined time. The processor then compares the last image captured to the reference standard. If the captured image is consistent with the reference standard, the display will show an outline that reflects the successful completion of the step. In this embodiment, if the processor detects the successful completion of the first step, a signal is sent to a speaker that will provide an audio signal reflecting positive feedback, such as a bell or chime. If the processor fails to detect the successful completion of a step, an alternative signal is provided. In the event the step is successfully completed, the user can proceed to a second step and the process is repeated, but the reference standard is altered to a second reference standard. This sequence is repeated until the assembly is completed.
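The motion-absence trigger described above may be sketched as simple frame differencing over grayscale frames. This is a hypothetical illustration; the function names, the stillness threshold, and the number of consecutive still frames are assumptions for the example, not limitations of the disclosure.

```python
def frame_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two grayscale frames,
    each represented as a flat list of pixel intensities."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def capture_on_stillness(frames, still_threshold=2.0, still_frames_needed=3):
    """Return the first frame preceded by `still_frames_needed`
    consecutive near-identical frames, i.e. the static image saved once
    motion has stopped; return None if motion never stops."""
    still_run = 0
    for prev, cur in zip(frames, frames[1:]):
        if frame_diff(prev, cur) < still_threshold:
            still_run += 1
            if still_run >= still_frames_needed:
                return cur
        else:
            still_run = 0
    return None
```

Once a frame is captured in this manner, it would be passed to the feature-comparison step described above.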
If the user fails to complete the step successfully, the user is given an opportunity to again assemble the device and present the device to the camera for imaging. The processor will then compare the reassembled device to the first reference standard.
Now referring to
While the example discussed above involves the manipulation of three dimensional objects in a physical environment, in other embodiments the assembly may be achieved in a virtual environment. Accordingly, a user may select a reference standard and then follow a series of assembly steps using virtual elements. In addition, while embodiments discussed above are directed to the manipulation of both real and virtual objects, in a further embodiment the standard may be directed to preferred body positions and body movements. In this contemplated embodiment, the user may select a preferred reference standard, such as a golf swing, and then attempt to replicate the motion in front of the camera. The system can then compare the captured images against the reference standard. In yet further embodiments, the degree of deviation from the standard assembly or standard motion can be calculated and assigned a value. This value can then be displayed to the user in the form of a score. In yet further embodiments, the computer will measure the time elapsed for each step in an assembly process to be successfully completed and the time can be displayed to the user in the form of a score. In yet further embodiments, a countdown display may be provided and the user is prompted to complete an assembly process in conformance with a displayed standard before the countdown has elapsed.
A method that can be used to detect features in an image and then compare the features is the scale-invariant feature transform (SIFT), an algorithm in computer vision for the detection of local features that are present in detected images. The algorithm, which was published by David Lowe in 1999 in a paper entitled "Object Recognition from Local Scale-Invariant Features," Proceedings of the International Conference on Computer Vision, pp. 1150-1157, doi:10.1109/ICCV.1999, is further described in U.S. Pat. No. 6,711,293, "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," which is incorporated by reference herein. The SIFT algorithm may be used for object recognition, as well as for video tracking and match moving.
In summary, the algorithm behind the SIFT keypoints technique first extracts features from a set of reference images of objects that are stored in a database. Features of a new object may be recognized in the new image by individually comparing each new feature from the new image to the database, and candidate matching features are determined based on the Euclidean distance of their feature vectors. From the full set of matches, subsets of keypoints that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. The determination of consistent clusters may be rapidly implemented by the use of a hash table implementation of the generalized Hough transform algorithm. Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed model verification, and subsequently outliers are discarded. Finally, the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence. Using the SIFT algorithm, distinctive keypoints may be selected that are invariant to location, scale, and rotation, and which are robust to affine transformations (changes in scale, rotation, shear, and position) and changes in illumination for object recognition. The sequence proceeds as follows:
First, SIFT features are obtained from the input image using the algorithm described above. Next, the features from the input image are matched to the SIFT feature database of reference or standard images that has been created. In an embodiment, the feature matching is done through a Euclidean-distance based nearest neighbor approach. To increase robustness, matches are rejected for those keypoints for which the ratio of the nearest neighbor distance to the second nearest neighbor distance is greater than 0.8. To avoid the computationally expensive search required for finding the Euclidean-distance-based nearest neighbor, an approximate algorithm called the best-bin-first algorithm is then employed. See Beis, J. and Lowe, David G., "Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces," Conference on Computer Vision and Pattern Recognition, Puerto Rico, pp. 1000-1006, doi:10.1109/CVPR.1997.609451, which is incorporated by reference herein.
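The nearest-neighbor matching with the 0.8 ratio test described above may be sketched as follows. This is a minimal illustration using an exhaustive search rather than the best-bin-first approximation; the function names are assumptions for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(feature, database, ratio=0.8):
    """Return the index of the nearest database feature, or None when the
    ratio of the nearest to second-nearest distance exceeds `ratio`,
    indicating an ambiguous (and therefore rejected) match."""
    dists = sorted((euclidean(feature, f), i) for i, f in enumerate(database))
    (d1, i1), (d2, _) = dists[0], dists[1]
    if d2 == 0 or d1 / d2 > ratio:
        return None
    return i1
```

A feature clearly closer to one database entry than to all others is accepted; a feature nearly equidistant from two entries is discarded as unreliable.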
To further increase the reliability of the matching step the Hough transform is applied to create clusters of those features that belong to the same object and reject the matches that are left out in the clustering process. When clusters of features are found to vote for the same pose of an object, the probability of the interpretation being correct is much higher than for any single feature. Each keypoint votes for the set of object poses that are consistent with the keypoint's location, scale, and orientation. Bins that accumulate at least 3 votes are identified as candidate object/pose matches.
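The pose-voting step described above may be sketched as coarse quantization of each match's predicted pose into bins, with bins accumulating at least 3 votes retained as candidates. This is a simplified illustration; the bin sizes and function names are assumptions for the example, not the published algorithm's exact parameters.

```python
import math
from collections import Counter

def pose_bin(match, loc_bin=10, scale_step=2.0, angle_bin=30):
    """Quantize a match's predicted pose (x, y, scale, angle in degrees)
    into a coarse bin, analogous to a Hough transform accumulator cell."""
    x, y, scale, angle = match
    return (round(x / loc_bin), round(y / loc_bin),
            round(math.log(scale, scale_step)), round(angle / angle_bin))

def candidate_poses(matches, min_votes=3):
    """Bins receiving at least `min_votes` keypoint votes become
    candidate object/pose hypotheses; sparse bins are rejected."""
    votes = Counter(pose_bin(m) for m in matches)
    return [b for b, v in votes.items() if v >= min_votes]
```

Three keypoints voting for approximately the same location, scale, and orientation thus survive, while an isolated spurious match does not.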
Finally, for each candidate cluster, a least-squares solution for the best estimated affine projection parameters relating to the reference image to the input image is obtained. If the projection of a keypoint through these parameters lies within half the error range that was used for the parameters in the Hough Transform bins, the keypoint match is kept. If fewer than 3 points remain after discarding outliers for a bin, then the object match is rejected. The least-squares fitting is repeated until no more rejections take place.
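The iterative fit-and-reject loop described above may be sketched as follows. For brevity this illustration fits a translation-only model in place of the full six-parameter affine fit; the loop structure (refit, discard residuals outside the error range, reject the match if fewer than 3 points remain) is the point being illustrated. Function names and thresholds are assumptions for the example.

```python
import math

def fit_translation(pairs):
    """Least-squares translation (a simplified stand-in for the affine
    fit): the optimal shift is the mean of the per-pair displacements."""
    n = len(pairs)
    tx = sum(qx - px for (px, py), (qx, qy) in pairs) / n
    ty = sum(qy - py for (px, py), (qx, qy) in pairs) / n
    return tx, ty

def fit_with_rejection(pairs, max_error=2.0, min_points=3):
    """Refit and discard outliers until all residuals fall within
    `max_error`; reject the object match (return None) if fewer than
    `min_points` correspondences remain."""
    while len(pairs) >= min_points:
        tx, ty = fit_translation(pairs)
        kept = [((px, py), (qx, qy)) for (px, py), (qx, qy) in pairs
                if math.hypot(qx - (px + tx), qy - (py + ty)) <= max_error]
        if len(kept) == len(pairs):
            return (tx, ty), pairs
        pairs = kept
    return None
```

With consistent inliers and a single moderate outlier, the loop converges to the inliers' common displacement and drops the outlier.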
Additional information relating to the implementation of keypoint recognition systems can be found in the articles by K. Mikolajczyk and C. Schmid, "An Affine Invariant Interest Point Detector," in European Conference on Computer Vision, pages 128-142, Springer, 2002, Copenhagen; K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors," in Conference on Computer Vision and Pattern Recognition, pages 257-263, June 2003; and K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A Comparison of Affine Region Detectors," accepted to International Journal of Computer Vision, 2005, all of which are incorporated by reference herein. Software that implements a keypoint technique is sold under the brand "ImageModeler," which is available from Realviz Corporation, Arep Center, 1 traverse des Brucs, 06560 Sophia Antipolis Cedex, France, and allows semi-automated 3D reconstruction from a number of separate and distinct views.
An alternative method to compare image data involves the creation of feature histograms for each image, and then selecting the reference or standard image with the histogram closest to the input image's histogram. This technique may use three color histograms (red, green, and blue) and two texture histograms, direction and scale. This technique works best with images that are very similar to the database images; if the images differ significantly in scale or rotation, the method is not as effective. The computation of color histograms is fairly straightforward and first requires the selection of ranges for the histogram buckets. For each range, the number of pixels with a color value in that range is counted. As an example, a "green" histogram is created using four buckets: 0-63, 64-127, 128-191, and 192-255. For each pixel of the captured image, the green value is analyzed and a count is added to the appropriate bucket. After the counts are calculated, each bucket is divided by the total number of pixels in the entire image to obtain a normalized histogram for the green channel.
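The green-channel bucketing described above may be sketched as follows, assuming 8-bit RGB pixels. The function name is an assumption for the example.

```python
def green_histogram(pixels, buckets=4):
    """Normalized green-channel histogram over the bucket ranges 0-63,
    64-127, 128-191, and 192-255, assuming 8-bit (r, g, b) pixels."""
    counts = [0] * buckets
    for r, g, b in pixels:
        # Map 0-255 onto bucket indices 0..buckets-1.
        counts[min(g * buckets // 256, buckets - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]
```

The red and blue histograms would be computed identically on their respective channels.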
For creating a texture direction histogram, the edges of the image are first detected. Each edge point has a normal vector pointing in the direction perpendicular to the edge. Next, the normal vector's angle is quantized into one of 6 buckets between 0 and pi (since edges have 180-degree symmetry, angles between -pi and 0 are converted to lie between 0 and pi). The number of edge points in each direction is counted, and the result is an un-normalized histogram representing texture direction. This can then be normalized by dividing each bucket by the total number of edge points in the image.
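Given the normal angles of the detected edge points, the quantization described above may be sketched as follows. The function name is an assumption for the example; edge detection itself is outside the scope of the sketch.

```python
import math

def direction_histogram(edge_angles, buckets=6):
    """Quantize edge-normal angles (radians) into `buckets` bins over
    [0, pi); angles in (-pi, 0) are folded onto [0, pi) using the
    180-degree symmetry of edges. Returns a normalized histogram."""
    counts = [0] * buckets
    for angle in edge_angles:
        if angle < 0:
            angle += math.pi  # fold negative angles using edge symmetry
        bin_index = min(int(angle / math.pi * buckets), buckets - 1)
        counts[bin_index] += 1
    total = len(edge_angles)
    return [c / total for c in counts]
```

An edge at -90 degrees thus lands in the same bucket as one at +90 degrees, as the symmetry requires.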
To compute a texture scale histogram, for each edge point the distance to the next-closest edge point with the same direction is measured. For example, if edge point A has a direction of 45 degrees, the algorithm walks in that direction until it finds another edge point with a direction of 45 degrees (or within a reasonable deviation). After computing this distance for each edge point, the values are placed into a histogram, which is normalized by dividing by the total number of edge points. The five histograms for two images can then be compared by taking the absolute value of the difference between each pair of corresponding histogram buckets and summing these values.
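The final comparison step, summing absolute bucket differences across all five histograms and selecting the nearest reference image, may be sketched as follows. The function names and the dictionary-based reference store are assumptions for the example.

```python
def histogram_distance(hists_a, hists_b):
    """Sum of absolute bucket differences across the five histograms
    (three color, two texture) of two images, each given as a list of
    histograms (lists of normalized bucket values)."""
    return sum(abs(a - b)
               for ha, hb in zip(hists_a, hists_b)
               for a, b in zip(ha, hb))

def closest_reference(input_hists, references):
    """Select the reference image (keyed by name) whose histograms lie
    nearest the input image's histograms."""
    return min(references,
               key=lambda name: histogram_distance(input_hists, references[name]))
```

The selected reference would then serve as the standard against which conformance feedback is generated.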
Yet a further alternative technology that may be advantageously used with the invention is reported in a paper entitled "Keypoint Recognition Using Randomized Trees" by Vincent Lepetit and Pascal Fua, Ecole Polytechnique Federale de Lausanne (EPFL) Computer Vision Laboratory, CH-1015 Lausanne, Switzerland ({Vincent.Lepetit, Pascal.Fua}@epfl.ch, http://cvlab.epfl.ch), which is also incorporated by reference herein. This paper discloses a keypoint-based approach that is effective in this context by formulating the wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. The resulting algorithm is robust, accurate, and fast enough for frame-rate performance. See also "Fast Keypoint Recognition Using Random Ferns," which is faster and more scalable than the Lepetit 2006 approach.
In a further embodiment, the software platform to operate the device is based upon Microsoft's Kinect software and its software development kit (SDK). The SDK released by Microsoft includes Windows 7 compatible drivers for its Kinect device, which includes a camera and processor. The software kit provides Kinect capabilities to developers to allow them to build applications using C++, C#, or Visual Basic with Microsoft Visual Studio. Features included in the SDK include raw sensor streams and access to low-level data streams from a depth sensor, a color camera sensor, and a microphone array. An element location sensor can be optimized to locate an enhanced detection element as discussed below. While the Kinect system is focused and optimized for skeletal tracking, embodiments of the present invention can be directed to teaching body position, wherein the reference standard may be directed to body movements such as those that may be implemented in dance, exercise, and sports. For example, the reference standard may be directed to a golf swing or swimming stroke. The user then attempts to replicate the body position, and the processor will compare the reference standard against the detected body motion and position. The development kit provided by Microsoft further includes sample code and requisite documentation.
Now referring to
In other embodiments, the system can play back the successful solution as a positive reinforcement tool. Other positive feedback may be provided, such as a pleasant chime or applause, when the user implements the correct solution. Negative feedback, such as the audio of "oops" or a "booing" or "razzing" sound, may be broadcast when the user presents the incorrect solution.
Other object recognition and object comparison software that can be used in accordance with the teaching of the invention can be acquired from vendors such as Image Graphics Video, a division of Dynamic Ventures, Inc., of Cupertino, Calif.; Goepel electronic GmbH of Jena, Germany; and Imagu Ltd. of Tel-Aviv, Israel. Cognex Corporation of Natick, Mass. has developed commercially available software referred to as Patmax® that can be adapted for use with the invention and can integrate its solutions with various platforms. Other object recognition and comparison techniques that are well known in the object recognition field and can be employed in connection with the invention include the following: Normalized Cross Correlation, as disclosed by Brown, L. G., 1992, "A Survey of Image Registration Techniques," ACM Computing Surveys 24(4), pp. 325-376; Hausdorff Distance, as disclosed by Rucklidge, W. J., 1997, "Efficiently Locating Objects Using the Hausdorff Distance," International Journal of Computer Vision 24(3), pp. 251-270; Shape Based Matching, as disclosed by Steger, C., 2001, "Similarity measures for occlusion, clutter, and illumination invariant object recognition," in B. Radig and S. Florczyk (eds), Mustererkennung 2001, Springer, Munchen, pp. 148-154; and, as discussed above, the Modified Hough Transform, as disclosed inter alia by Ulrich, M., 2001, "Real-time object recognition in digital images for industrial applications," Technical Report PF-2002-01, Lehrstuhl fur Photogrammetrie und Fernerkundung, Technische Universitat Munchen. See also "Performance Comparison of 2D Object Recognition Techniques," Ulrich M. and Steger C., Commission III, Working Group III/5, and the papers cited therein, all of which are incorporated by reference herein.
Referring to
The data processing apparatus 1700 incorporates any combination of additional devices. These include, but are not limited to, a mass storage device 1715, one or more peripheral devices 1720, a loudspeaker or audio means 1725, one or more input devices 1730 which may comprise a touchscreen, mouse or keyboard, one or more portable storage medium drives 1735, a graphics subsystem 1740, a display 1745, and one or more output devices 1750. The input devices in the present invention include a camera. The various components are connected via an appropriate bus 1755 as known by those skilled in the art. In alternative embodiments, the components are connected through other communications media known in the art. In one example, processor 1705 and memory 1710 are connected via a local microprocessor bus; while mass storage device 1715, peripheral devices 1720, portable storage medium drives 1735, and graphics subsystem 1740 are connected via one or more input/output buses.
In embodiments, computer instructions for performing methods in accordance with exemplary embodiments of the invention also are stored in processor 1705 or mass storage device 1715. The computer instructions are programmed in a suitable language such as C++.
In embodiments, the portable storage medium drive 1735 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, CD-ROM, or other computer-readable medium, to input and output data and code to and from the data processing apparatus 1700. In some embodiments, methods performed in accordance with exemplary embodiments of the invention are implemented using computer instructions that are stored on such a portable medium or are downloaded to said processor from a wireless link.
Peripheral devices 1720 include any type of computer support device, such as a network interface card for interfacing the data processing apparatus 1700 to a network or a modem.
Still referring to
Loudspeaker or audio means 1725 includes a sound card, on-board sound processing hardware, or a device with built-in processing devices that attach via Universal Serial Bus (USB) or IEEE 1394 (Firewire). The audio means may also include input means such as a microphone for capturing and streaming audio signals.
In embodiments, instructions for performing methods in accordance with exemplary embodiments of the invention are embodied as computer program products. These generally include a storage medium having instructions stored thereon used to program a computer to perform the methods disclosed above. Examples of suitable storage medium or media include any type of disk including floppy disks, optical disks, DVDs, CD ROMs, magnetic or optical cards, hard disk, flash card, smart card, and other media known in the art.
Stored on one or more of the computer readable media, the program includes software for controlling both the hardware of a general purpose or specialized computer or microprocessor. This software also enables the computer or microprocessor to interact with a human or other mechanism utilizing the results of exemplary embodiments of the invention. Such software includes, but is not limited to, device drivers, operating systems and user applications. Preferably, such computer readable media further include software for performing the methods described above.
In certain other embodiments, a program for performing an exemplary method of the invention or an aspect thereof is situated on a carrier wave such as an electronic signal transferred over a data network. Suitable networks include the Internet, a frame relay network, an ATM network, a wide area network (WAN), or a local area network (LAN). Those skilled in the art will recognize that merely transferring the program over the network, rather than executing the program on a computer system or other device, does not avoid the scope of the invention.
It will be clear to one skilled in the art that the embodiments described above can be altered in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents.
This application claims the benefit of the filing date of U.S. Application No. 61/689,911 and U.S. Application No. 61/687,034. The present invention relates to a system and method that includes a camera, a display, and a processor, wherein the camera captures a series of images of a device or structure as the device or structure is assembled or constructed. A processor then compares the detected images against a standard and provides feedback to a user in the form of output that reflects compliance with the standard or a deviation from the standard. The feedback is provided, inter alia, on a display panel so that the user can either confirm that the assembly is in conformance with the standard or see a graphical representation of how the assembly deviates from the standard.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2013/036950 | 4/17/2013 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61687034 | Apr 2012 | US | |
61689911 | Jun 2012 | US |