1. Field of the Invention
The invention relates to automated detection and inspection of objects being manufactured on a production line, and more particularly to the related fields of industrial machine vision and automated image analysis.
2. Description of the Related Art
Industrial manufacturing relies on automatic inspection of objects being manufactured. One form of automatic inspection that has been in common use for decades is based on optoelectronic technologies that use electromagnetic energy, usually infrared or visible light, photoelectric sensors, and some form of electronic decision making.
One well-known form of optoelectronic automatic inspection uses a device that can capture a digital image of a two-dimensional field of view in which an object to be inspected is located, and then analyze the image and make decisions. Such a device is usually called a machine vision system, or simply a vision system. The image is captured by exposing a two-dimensional array of photosensitive elements for a brief period, called the integration or shutter time, to light that has been focused on the array by a lens. The array is called an imager and the individual elements are called pixels. Each pixel measures the intensity of light falling on it during the shutter time. The measured intensity values are then converted to digital numbers and stored in the memory of the vision system to form the image, which is analyzed by a digital processing element such as a computer, using methods well-known in the art to determine the status of the object being inspected.
In some cases the objects are brought to rest in the field of view, and in other cases the objects are in continuous motion through the field of view. An event external to the vision system, such as a signal from a photodetector, or a message from a PLC, computer, or other piece of automation equipment, is used to inform the vision system that an object is located in the field of view, and therefore an image should be captured and analyzed. Such an event is called a trigger.
Machine vision systems have limitations that arise because they make decisions based on a single image of each object, located in a single position in the field of view (each object may be located in a different and unpredictable position, but for each object there is only one such position on which a decision is based). This single position provides information from a single viewing perspective, and a single orientation relative to the illumination. The use of only a single perspective often leads to incorrect decisions. It has long been observed, for example, that a change in perspective of as little as a single pixel can in some cases change an incorrect decision to a correct one. By contrast, a human inspecting an object usually moves it around relative to his eyes and the lights to make a more reliable decision.
Machine vision systems have additional limitations arising from their use of a trigger signal. The need for a trigger signal makes the setup more complex—a photodetector must be mounted and adjusted, or software must be written for a PLC or computer to provide an appropriate message. When a photodetector is used, which is almost always the case when the objects are in continuous motion, a production line changeover may require it to be physically moved, which can offset some of the advantages of a vision system. Furthermore, photodetectors can only respond to a change in light intensity reflected from an object or transmitted along a path. In some cases, such a condition may not be sufficient to reliably detect when an object has entered the field of view.
Some prior art vision systems used with objects in continuous motion can operate without a trigger using a method often called self-triggering. These systems typically operate by monitoring one or more portions of captured images for a change in brightness or color that indicates the presence of an object. Self-triggering is rarely used in practice due to several limitations:
Many of the limitations of machine vision systems arise in part because they operate too slowly to capture and analyze multiple perspectives of objects in motion, and too slowly to react to events happening in the field of view. Since most vision systems can capture a new image simultaneously with analysis of the current image, the maximum rate at which a vision system can operate is determined by the larger of the capture time and the analysis time. Overall, one of the most significant factors in determining this rate is the number of pixels comprising the imager.
The time needed to capture an image is determined primarily by the number of pixels in the imager, for two basic reasons. First, the shutter time is determined by the amount of light available and the sensitivity of each pixel. Since having more pixels generally means making them smaller and therefore less sensitive, it is generally the case that increasing the number of pixels increases the shutter time. Second, the conversion and storage time is proportional to the number of pixels. Thus the more pixels one has, the longer the capture time.
For at least the last 25 years, prior art vision systems generally have used about 300,000 pixels; more recently some systems have become available that use over 1,000,000, and over the years a small number of systems have used as few as 75,000. Just as with digital cameras, the recent trend is to more pixels for improved image resolution. Over the same period of time, during which computer speeds have improved a million-fold and imagers have changed from vacuum tubes to solid state, machine vision image capture times generally have improved from about 1/30 second to about 1/60 second, only a factor of two. Faster computers have allowed more sophisticated analysis, but the maximum rate at which a vision system can operate has hardly changed.
The Vision Detector Method and Apparatus teaches novel methods and systems that can overcome the above-described limitations of prior art machine vision systems. These teachings also provide fertile ground for innovation leading to improvements beyond the scope of the original teachings. In the following section the Vision Detector Method and Apparatus is briefly summarized, and a subsequent section lays out the problems to be addressed by the present invention.
The Vision Detector Method and Apparatus provides systems and methods for automatic optoelectronic detection and inspection of objects, based on capturing digital images of a two-dimensional field of view in which an object to be detected or inspected may be located, and then analyzing the images and making decisions. These systems and methods analyze patterns of brightness reflected from extended areas, handle many distinct features on the object, accommodate line changeovers through software means, and handle uncertain and variable object locations. They are less expensive and easier to set up than prior art machine vision systems, and operate at much higher speeds. These systems and methods furthermore make use of multiple perspectives of moving objects, operate without triggers, provide appropriately synchronized output signals, and provide other significant and useful capabilities that will be apparent to those skilled in the art.
One aspect of the Vision Detector Method and Apparatus is an apparatus, called a vision detector, that can capture and analyze a sequence of images at higher speeds than prior art vision systems. An image in such a sequence that is captured and analyzed is called a frame. The rate at which frames are captured and analyzed, called the frame rate, is sufficiently high that a moving object is seen in multiple consecutive frames as it passes through the field of view (FOV). Since the object moves somewhat between successive frames, it is located in multiple positions in the FOV, and therefore it is seen from multiple viewing perspectives and positions relative to the illumination.
Another aspect of the Vision Detector Method and Apparatus is a method, called dynamic image analysis, for inspecting objects by capturing and analyzing multiple frames for which the object is located in the field of view, and basing a result on a combination of evidence obtained from each of those frames. The method provides significant advantages over prior art machine vision systems that make decisions based on a single frame.
Yet another aspect of the Vision Detector Method and Apparatus is a method, called visual event detection, for detecting events that may occur in the field of view. An event can be an object passing through the field of view, and by using visual event detection the object can be detected without the need for a trigger signal.
Additional aspects of the Vision Detector Method and Apparatus will be apparent by a study of the figures and detailed descriptions given therein.
In order to obtain images from multiple perspectives, it is desirable that an object to be detected or inspected moves no more than a small fraction of the field of view between successive frames, often no more than a few pixels. According to the Vision Detector Method and Apparatus, it is generally desirable that the object motion be no more than about one-quarter of the FOV per frame, and in typical embodiments no more than 5% of the FOV. It is desirable that this be achieved not by slowing down a manufacturing process but by providing a sufficiently high frame rate. In an example system the frame rate is at least 200 frames/second, and in another example the frame rate is at least 40 times the average rate at which objects are presented to the vision detector.
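As a rough illustration of the relationship described above, the following sketch computes the frame rate needed to keep object motion within a chosen fraction of the FOV per frame, and the resulting number of frames per object. The function names, parameters, and numeric values are hypothetical and are not taken from the Vision Detector Method and Apparatus; uniform object speed along one axis of the FOV is assumed.

```python
def min_frame_rate(object_speed, fov_length, max_fraction_per_frame):
    """Frames/second needed so that an object moves at most the given
    fraction of the FOV between successive frames.

    object_speed and fov_length must use consistent units
    (for example mm/s and mm). All names here are illustrative.
    """
    if object_speed <= 0 or fov_length <= 0:
        raise ValueError("speed and FOV length must be positive")
    if not 0.0 < max_fraction_per_frame <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return object_speed / (fov_length * max_fraction_per_frame)


def frames_per_crossing(frame_rate, object_speed, fov_length):
    """Approximate number of frames captured while an object crosses the FOV."""
    return frame_rate * fov_length / object_speed


if __name__ == "__main__":
    # Assumed example: 50 mm FOV, objects moving at 500 mm/s,
    # at most 5% of the FOV travelled per frame.
    rate = min_frame_rate(500.0, 50.0, 0.05)
    print("required frame rate:", rate)
    print("frames per object:", frames_per_crossing(rate, 500.0, 50.0))
```

With these assumed values the required rate comes out at about 200 frames/second, which is consistent with the example system mentioned above.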
An exemplary system is taught that can capture and analyze up to 500 frames/second. This system makes use of an ultra-sensitive imager that has far fewer pixels than prior art vision systems. The high sensitivity allows very short shutter times using very inexpensive LED illumination, which in combination with the relatively small number of pixels allows very short image capture times. The imager is interfaced to a digital signal processor (DSP) that can receive and store pixel data simultaneously with analysis operations. Using methods taught therein and implemented by means of suitable software for the DSP, the time to analyze each frame generally can be kept to within the time needed to capture the next frame. The capture and analysis methods and apparatus combine to provide the desired high frame rate. By carefully matching the capabilities of the imager, DSP, and illumination with the objectives of the invention, the exemplary system can be significantly less expensive than prior art machine vision systems.
The method of visual event detection involves capturing a sequence of frames and analyzing each frame to determine evidence that an event is occurring or has occurred. When visual event detection is used to detect objects without the need for a trigger signal, the analysis would determine evidence that an object is located in the field of view.
In an exemplary method the evidence is in the form of a value, called an object detection weight, that indicates a level of confidence that an object is located in the field of view. The value may be a simple yes/no choice that indicates high or low confidence, a number that indicates a range of levels of confidence, or any item of information that conveys evidence. One example of such a number is a so-called fuzzy logic value, further described below and in the Vision Detector Method and Apparatus. Note that no machine can make a perfect decision from an image, and so it will instead make judgments based on imperfect evidence.
When performing object detection, a test is made for each frame to decide whether the evidence is sufficient that an object is located in the field of view. If a simple yes/no value is used, the evidence may be considered sufficient if the value is “yes”. If a number is used, sufficiency may be determined by comparing the number to a threshold. Frames where the evidence is sufficient are called active frames. Note that what constitutes sufficient evidence is ultimately defined by a human user who configures the vision detector based on an understanding of the specific application at hand. The vision detector automatically applies that definition in making its decisions.
When performing object detection, each object passing through the field of view will produce multiple active frames due to the high frame rate of the vision detector. These frames may not be strictly consecutive, however, because as the object passes through the field of view there may be some viewing perspectives, or other conditions, for which the evidence that the object is located in the field of view is not sufficient. Therefore it is desirable that detection of an object begins when an active frame is found, but does not end until a number of consecutive inactive frames are found. This number can be chosen as appropriate by a user.
Once a set of active frames has been found that may correspond to an object passing through the field of view, it is desirable to perform a further analysis to determine whether an object has indeed been detected. This further analysis may consider some statistics of the active frames, including the number of active frames, the sum of the object detection weights, the average object detection weight, and the like.
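The detection logic described in the preceding paragraphs can be sketched as a small state machine over per-frame object detection weights. This is a minimal illustration under stated assumptions, not the implementation taught in the Vision Detector Method and Apparatus: the names `detect_objects`, `active_threshold`, `max_inactive`, and `min_active` are hypothetical, weights are assumed to be numbers in the range 0 to 1, and the "further analysis" is reduced to a minimum count of active frames.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Detection:
    """Statistics of the active frames for one candidate object."""
    active_weights: List[float] = field(default_factory=list)

    @property
    def active_count(self) -> int:
        return len(self.active_weights)

    @property
    def total_weight(self) -> float:
        return sum(self.active_weights)

    @property
    def average_weight(self) -> float:
        # A Detection produced by detect_objects always has >= 1 active frame.
        return self.total_weight / self.active_count


def detect_objects(weights: Iterable[float],
                   active_threshold: float = 0.5,
                   max_inactive: int = 3,
                   min_active: int = 2) -> Iterator[Detection]:
    """Yield one Detection per object found in a stream of per-frame weights.

    A frame is active when its weight reaches active_threshold. Detection
    starts at the first active frame and ends after max_inactive consecutive
    inactive frames. Candidates with fewer than min_active active frames are
    discarded as insufficient evidence.
    """
    current = None
    inactive_run = 0
    for w in weights:
        if w >= active_threshold:
            if current is None:
                current = Detection()
            current.active_weights.append(w)
            inactive_run = 0
        elif current is not None:
            inactive_run += 1
            if inactive_run >= max_inactive:
                if current.active_count >= min_active:
                    yield current
                current = None
                inactive_run = 0
    # Assumption: a candidate still open when the stream ends is also judged.
    if current is not None and current.active_count >= min_active:
        yield current


if __name__ == "__main__":
    stream = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.1, 0.1, 0.6, 0.0, 0.0, 0.0]
    for d in detect_objects(stream):
        print(d.active_count, d.total_weight, d.average_weight)
```

In this sketch a single inactive frame between active frames does not end the detection, and an isolated active frame is rejected by the minimum-count test.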
The method of dynamic image analysis involves capturing and analyzing multiple frames to inspect an object, where “inspect” means to determine some information about the status of the object. In one example of this method, the status of an object includes whether or not the object satisfies inspection criteria chosen as appropriate by a user.
In some aspects of the Vision Detector Method and Apparatus dynamic image analysis is combined with visual event detection, so that the active frames chosen by the visual event detection method are the ones used by the dynamic image analysis method to inspect the object. In other aspects of the Vision Detector Method and Apparatus, the frames to be used by dynamic image analysis can be captured in response to a trigger signal.
Each such frame is analyzed to determine evidence that the object satisfies the inspection criteria. In one exemplary method, the evidence is in the form of a value, called an object pass score, that indicates a level of confidence that the object satisfies the inspection criteria. As with object detection weights, the value may be a simple yes/no choice that indicates high or low confidence, a number, such as a fuzzy logic value, that indicates a range of levels of confidence, or any item of information that conveys evidence.
The status of the object may be determined from statistics of the object pass scores, such as an average or percentile of the object pass scores. The status may also be determined by weighted statistics, such as a weighted average or weighted percentile, using the object detection weights. Weighted statistics effectively weight evidence more heavily from frames wherein the confidence is higher that the object is actually located in the field of view for that frame. Evidence for object detection and inspection is obtained by examining a frame for information about one or more visible features of the object. A visible feature is a portion of the object wherein the amount, pattern, or other characteristic of emitted light conveys information about the presence, identity, or status of the object. Light can be emitted by any process or combination of processes, including but not limited to reflection, transmission, or refraction of a source external or internal to the object, or directly from a source internal to the object.
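The weighted statistics mentioned above can be illustrated with a short sketch. The function names are hypothetical, and the weighted percentile shown is only one of several common definitions (the smallest score at which the cumulative weight reaches the requested share of the total weight); the source does not specify which definition is intended.

```python
def weighted_average(scores, weights):
    """Average of object pass scores, weighted by object detection weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(s * w for s, w in zip(scores, weights)) / total


def weighted_percentile(scores, weights, percentile):
    """Smallest score whose cumulative weight reaches percentile% of the total.

    Frames with zero weight contribute nothing and are skipped.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    target = total * percentile / 100.0
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        if weight <= 0:
            continue
        cumulative += weight
        if cumulative >= target:
            return score
    # Reached only through rounding error in the cumulative sum.
    return max(s for s, w in zip(scores, weights) if w > 0)


if __name__ == "__main__":
    pass_scores = [0.9, 0.2, 0.8]          # one score per active frame
    detection_weights = [1.0, 0.25, 1.0]   # confidence the object was in view
    print(weighted_average(pass_scores, detection_weights))
    print(weighted_percentile(pass_scores, detection_weights, 50))
```

In the assumed example, the low pass score of 0.2 comes from a frame with a low object detection weight, so it pulls the weighted average down less than it would pull a plain average.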
One aspect of the Vision Detector Method and Apparatus is a method for obtaining evidence, including object detection weights and object pass scores, by image analysis operations on one or more regions of interest in each frame for which the evidence is needed. In one example of this method, the image analysis operation computes a measurement based on the pixel values in the region of interest, where the measurement is responsive to some appropriate characteristic of a visible feature of the object. The measurement is converted to a logic value by a threshold operation, and the logic values obtained from the regions of interest are combined to produce the evidence for the frame. The logic values can be binary or fuzzy logic values, with the thresholds and logical combination being binary or fuzzy as appropriate.
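The measure, threshold, and combine steps described above might be sketched as follows. Several details are assumptions made only for illustration and are not stated in the source: the measurement is mean brightness, the fuzzy threshold is a linear ramp between two hypothetical limits `t0` and `t1`, and fuzzy AND, OR, and NOT use the common min, max, and complement conventions. A frame is represented as a list of rows of pixel values.

```python
def mean_brightness(frame, roi):
    """Average pixel value inside a region of interest.

    roi is (top, left, height, width) in pixel coordinates.
    """
    top, left, height, width = roi
    pixels = [frame[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(pixels) / len(pixels)


def fuzzy_threshold(measurement, t0, t1):
    """Map a measurement to a fuzzy logic value in [0, 1].

    Returns 0 at or below t0, 1 at or above t1, and a linear ramp between.
    """
    if t1 <= t0:
        raise ValueError("t1 must be greater than t0")
    return min(1.0, max(0.0, (measurement - t0) / (t1 - t0)))


def fuzzy_and(*values):
    return min(values)


def fuzzy_or(*values):
    return max(values)


def fuzzy_not(value):
    return 1.0 - value


if __name__ == "__main__":
    frame = [
        [10, 10, 200, 200],
        [10, 10, 200, 200],
        [10, 10, 10, 10],
        [10, 10, 10, 10],
    ]
    # Hypothetical configuration: a bright feature is expected in the
    # upper-right region, and the lower-left region should stay dark.
    bright = fuzzy_threshold(mean_brightness(frame, (0, 2, 2, 2)), 100, 150)
    dark = fuzzy_not(fuzzy_threshold(mean_brightness(frame, (2, 0, 2, 2)), 100, 150))
    print("evidence for this frame:", fuzzy_and(bright, dark))
```

Setting `t0` and `t1` very close together approximates a binary threshold, which matches the statement that logic values may be binary or fuzzy.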
For visual event detection, evidence that an object is located in the field of view is effectively defined by the regions of interest, measurements, thresholds, logical combinations, and other parameters further described herein, which are collectively called the configuration of the vision detector and are chosen by a user as appropriate for a given application of the invention. Similarly, the configuration of the vision detector defines what constitutes sufficient evidence.
For dynamic image analysis, evidence that an object satisfies the inspection criteria is also effectively defined by the configuration of the vision detector.
Image analysis devices, including machine vision systems and vision detectors, must be configured to inspect objects. Typically, configuring such a device requires a human user to obtain at least one object whose appearance is representative of the objects to be inspected. The user captures an image of the object, generally called a training image, and uses it by choosing image analysis tools, positioning those tools on the training image, and setting operating parameters to achieve a desired effect. It is desirable that a training image be obtained under conditions as close to the actual production line as is practical. For production lines that operate in continuous motion, however, this can present difficulties for conventional vision systems. Generally a trigger signal must be used to obtain a useful training image, with the attendant limitations of triggers mentioned above. Furthermore, the single image captured in response to the trigger may not be the best viewing perspective from which to obtain the training image.
During configuration, a vision detector must be tested to determine whether its performance meets the requirements of the application. If the performance meets the requirements, the configuration can be considered complete; otherwise adjustments must be made. Since no vision detector makes perfect inspection decisions, testing performance generally includes some assessment of the probability that a correct decision will be made as to the status (e.g., pass or fail) of an object. Furthermore, since the error rates (incorrect decisions) are very low (desirably under one in 10,000 objects), this assessment can be very difficult. Therefore it is desirable that a large number of objects be tested, and that only those likely to represent incorrect decisions be assessed by the human user.
In one aspect the invention provides systems and methods for configuring a vision detector, wherein a training image is obtained from a production line operating in continuous motion so as to provide conditions substantially identical to those that will apply during actual manufacturing and inspection of objects. A training image can be obtained without any need for a trigger signal, whether or not the vision detector might use such a signal for inspecting the objects.
The production line contains a plurality of objects in continuous motion and passing through the field of view of a vision detector, which captures a sequence of images of the field of view. The frame rate is sufficiently high that a plurality of images are captured for each object as it passes through the field of view, and the sequence is sufficiently long that it contains images for a plurality of objects. The sequence of images will thus include a variety of viewing perspectives of each of a variety of objects, and will typically also include images where no object is present in the field of view, or where an object is only partially present.
Images from the captured sequence of images are displayed for a human user who will configure the vision detector. Typically it is not practical to display all of the captured images at once at a display resolution sufficient for the user to see useful detail in each image, so instead a portion of the sequence is displayed at one time. The user chooses the portion to be displayed by issuing scrolling commands to advance the portion to be displayed forward or backward in the sequence. The portion to be displayed at one time preferably includes several images, but can include as few as one image.
The user issues scrolling commands to advance the displayed portion forward and/or backward in the sequence until an image is found that the user judges to be sufficiently similar in appearance to the typical objects that will be inspected by the vision detector. The user will choose this image to be a training image. Since multiple images of each of multiple objects have been captured, the user will have a choice both of which object, and which viewing perspective (i.e. position in the field of view), he prefers to use for the training image.
The user will configure the vision detector by creating and configuring at least one vision tool, where a vision tool performs a suitable image analysis operation on a region of interest in frames that the vision detector will capture and analyze during the course of detecting and inspecting objects. Vision tools are created and configured using the chosen training image to perform actions including
In an exemplary embodiment, the images are displayed using a graphical user interface (GUI). The portion of images displayed at one time is contained in a filmstrip window of the GUI, which displays the portion of images as a succession of low-resolution “thumbnail” images. The resolution of the thumbnail images is chosen to be low enough that a useful number of images can be seen at one time, and high enough that each image is sufficiently detailed to be useful. The scrolling commands are provided by conventional GUI elements.
This exemplary embodiment further displays one image of the portion of images at full resolution in an image view window. As the scrolling commands advance the filmstrip forward and/or backward, the image displayed in the image view window will also be advanced forward or backward. The image view window may also contain a graphical display of any vision tools created by the user.
In a further modification to this exemplary embodiment, the captured sequence of images is stored in a memory on the vision detector, and the GUI is running on a separate human-machine interface (HMI) connected to the vision detector by means of a communications interface. To avoid excessive time that might be needed to transmit the entire captured sequence of images to the HMI, the vision detector creates and transmits the low-resolution thumbnail images for display in the filmstrip window. Only at such times as a new full-resolution image is needed by the HMI for display in the image view window is the appropriate full-resolution image transmitted.
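The division of work between the vision detector and the HMI described above can be sketched as follows. This is an illustration under assumptions, not the taught implementation: the class `DetectorImageStore` and its methods are hypothetical, the communications interface is replaced by direct method calls, and thumbnails are produced by block averaging, which is only one possible way to reduce resolution.

```python
def make_thumbnail(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    image is a list of rows of integer pixel values.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    thumb = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor)
                     for j in range(factor)]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb


class DetectorImageStore:
    """Stands in for the image memory on the vision detector."""

    def __init__(self, images, factor=4):
        self._images = images
        self._factor = factor
        self.full_res_transfers = 0  # counts full-resolution transmissions

    def thumbnails(self, start, count):
        """Thumbnails for the portion of the sequence shown in the filmstrip."""
        portion = self._images[start:start + count]
        return [make_thumbnail(img, self._factor) for img in portion]

    def full_resolution(self, index):
        """Sent only when the image view window needs a new image."""
        self.full_res_transfers += 1
        return self._images[index]


if __name__ == "__main__":
    images = [[[k] * 8 for _ in range(8)] for k in range(10)]  # ten 8x8 images
    store = DetectorImageStore(images, factor=4)
    strip = store.thumbnails(start=2, count=3)     # scroll position 2
    selected = store.full_resolution(3)            # user selects one image
    print(len(strip), len(strip[0]), len(selected), store.full_res_transfers)
```

With a reduction factor of 4 in each dimension, each thumbnail carries one sixteenth of the pixel data of a full-resolution image, which is the source of the transmission savings.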
In another exemplary embodiment, a configuration of the vision detector can be improved using additional images in the captured sequence of images. Note that there is no requirement or expectation that only one training image be used to configure a vision detector. It is desirable to confirm reasonable operation of the vision detector for a variety of viewing perspectives of a variety of objects, and to adjust the configuration as necessary to achieve that.
Furthermore, while at least one training image should be chosen to be sufficiently similar in appearance to the typical objects that will be inspected by the vision detector, it is often desirable to choose other training images that are not similar, as long as they are obtained under conditions as close to the actual production line as is practical. These other training images might include the images where no object is present in the field of view, or where an object is only partially present, for which it is desirable that the configuration of the vision detector be such that an object is not falsely detected.
It is expressly contemplated that the above described aspect of the invention could be used for configuring a prior art machine vision system that operates at a frame rate sufficiently high relative to the speed of the production line that a plurality of images are captured for each object.
In another aspect the invention provides systems and methods for testing a vision detector by capturing and storing selected images during a production run for display to a human user. As noted in the Background section, it is desirable that a large number of objects be tested, which would produce far more images than would be practical for a human user to review. Therefore the invention provides systems and methods to select, store, and display a limited number of images from a production run, where those images correspond to objects likely to represent incorrect decisions.
The production line contains a plurality of objects in continuous motion and passing through the field of view of a vision detector, which captures and analyzes a sequence of frames. The vision detector has previously been configured to detect and inspect the objects. The detection may use a trigger signal or visual event detection, as further described herein and in the Vision Detector Method and Apparatus. The inspection preferably uses dynamic image analysis, as further described herein and in the Vision Detector Method and Apparatus, but may use any means that produces information about the status of objects.
A group of frames from the sequence of frames corresponds to each object detected and inspected. When using visual event detection, the group of frames may include the active frames, and may also include inactive frames sandwiched between the active frames or terminating the set of active frames. In an exemplary embodiment, the group of frames includes the active frames and all inactive frames sandwiched between them. When using a trigger signal, the group of frames preferably includes all of the frames captured and analyzed responsive to the trigger.
Each object is inspected, resulting in information about the status of the object. This result information may include a simple pass/fail, and it may also include statistics of frame counts, object detection weights, or object pass scores calculated during the course of detecting and inspecting the object and further described herein and in the Vision Detector Method and Apparatus.
The result information for each object is used to decide whether to store the object's group of frames for subsequent display to a human user. Typically it is not practical to display all of the stored images at once at a display resolution sufficient for the user to see useful detail in each image, so instead a portion of the stored images is displayed at one time. The user chooses the portion to be displayed by issuing scrolling commands to advance the portion to be displayed forward or backward, either by individual images within a group or by groups. The portion to be displayed at one time preferably includes several images, but can include as few as one image.
The choice of which groups to store is preferably made by the human user prior to the production run. The user may choose to store all groups, just those groups corresponding to objects that pass inspection, or just those groups corresponding to objects that fail inspection. These choices are often useful, but do not meet the objective of storing images that correspond to objects likely to represent incorrect decisions.
To meet that objective the invention recognizes that pass/fail decisions are preferably made by comparing a number to a decision threshold, where the number represents a level of confidence that some condition holds, for example that an object has been detected or that it passes inspection. The number is included in the object's result information and may correspond to statistics of frame counts, object detection weights, or object pass scores calculated during the course of detecting and inspecting the object.
If the number is significantly above the decision threshold, one can be confident that the condition holds; if significantly below the decision threshold, one can be confident that the condition does not hold. If the number is close to the decision threshold, however, the decision is ambiguous and one cannot be confident in the outcome. Thus objects for which the number is close to the decision threshold are likely to represent incorrect decisions, and it is desirable to store the corresponding groups of frames.
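The selection rule described above can be sketched in two simple forms. The source does not define how close to the decision threshold a number must be, so the `margin` and `limit` parameters below are hypothetical, as are the function names and the representation of result information as (object identifier, number) pairs.

```python
import heapq


def is_ambiguous(number, decision_threshold, margin):
    """True when the number lies within margin of the decision threshold."""
    return abs(number - decision_threshold) <= margin


def most_ambiguous(results, decision_threshold, limit):
    """Return up to limit results whose numbers lie closest to the threshold.

    results is an iterable of (object_id, number) pairs. This variant suits
    a store with room for only a fixed number of groups of frames.
    """
    return heapq.nsmallest(
        limit, results, key=lambda r: abs(r[1] - decision_threshold))


if __name__ == "__main__":
    run = [("a", 0.95), ("b", 0.52), ("c", 0.10), ("d", 0.45)]
    print([obj for obj, n in run if is_ambiguous(n, 0.5, 0.1)])
    print(most_ambiguous(run, 0.5, limit=2))
```

The first form stores every group whose number falls in a band around the threshold; the second keeps a bounded number of the most ambiguous groups regardless of how wide that band turns out to be.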
In an exemplary embodiment, the images are displayed using a GUI. The portion of images displayed at one time is contained in a filmstrip window of the GUI, which displays the portion of images as a succession of low-resolution “thumbnail” images. The resolution of the thumbnail images is chosen to be low enough that a useful number of images can be seen at one time, and high enough that each image is sufficiently detailed to be useful. The scrolling commands are provided by conventional GUI elements.
This exemplary embodiment further displays one image of the portion of images at full resolution in an image view window. As the scrolling commands advance the filmstrip forward and/or backward, the image displayed in the image view window will also be advanced forward or backward. The image view window may also contain a graphical display of any vision tools that are part of the configuration of the vision detector.
In a further modification to this exemplary embodiment, the GUI is running on a separate HMI connected to the vision detector by means of a communications interface. To avoid excessive time that might be needed to transmit all of the stored images to the HMI, the vision detector creates and transmits the low-resolution thumbnail images for display in the filmstrip window. Only when a new full-resolution image is needed by the HMI for display in the image view window is the appropriate full-resolution image transmitted.
The invention will be more fully understood from the following detailed description, in conjunction with the accompanying figures, wherein:
In an alternate embodiment, the vision detector sends signals to a PLC for various purposes, which may include controlling a reject actuator.
In another embodiment, suitable in extremely high speed applications or where the vision detector cannot reliably detect the presence of an object, a photodetector is used to detect the presence of an object and sends a signal to the vision detector for that purpose.
In yet another embodiment there are no discrete objects, but rather material flows past the vision detector continuously, for example a web. In this case the material is inspected continuously, and signals are sent by the vision detector to automation equipment, such as a PLC, as appropriate.
When a vision detector detects the presence of discrete objects by visual appearance, it is said to be operating in visual event detection mode. When a vision detector detects the presence of discrete objects using an external signal such as from a photodetector, it is said to be operating in external trigger mode. When a vision detector continuously inspects material, it is said to be operating in continuous analysis mode.
If capture and analysis are overlapped, the rate at which a vision detector can capture and analyze images is determined by the longer of the capture time and the analysis time. This is the “frame rate”.
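The relationship between capture time, analysis time, and frame rate can be written out directly. In the sketch below the function name and the example times are assumptions; the non-overlapped case, in which the two times add, is included only for comparison and is an inference rather than a statement from the source.

```python
def frame_rate(capture_time, analysis_time, overlapped=True):
    """Frames/second for given per-frame capture and analysis times (seconds).

    With overlap, the next frame is captured while the current one is
    analyzed, so the frame period is the longer of the two times.
    Without overlap, the two steps run back to back and the times add.
    """
    if capture_time <= 0 or analysis_time <= 0:
        raise ValueError("times must be positive")
    if overlapped:
        period = max(capture_time, analysis_time)
    else:
        period = capture_time + analysis_time
    return 1.0 / period


if __name__ == "__main__":
    # Assumed example: 2.0 ms capture, 1.5 ms analysis.
    print(frame_rate(0.002, 0.0015))
    print(frame_rate(0.002, 0.0015, overlapped=False))
```

With these assumed times the overlapped rate is about 500 frames/second, limited by capture, so reducing analysis time further would not raise the rate.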
The Vision Detector Method and Apparatus allows objects to be detected reliably without a trigger signal, such as that provided by a photodetector. Referring to
Referring again to
Each analysis step first considers the evidence that an object is present. Frames where the evidence is sufficient are called active. Analysis steps for active frames are shown with a thick border, for example analysis step 540. In an illustrative embodiment, inspection of an object begins when an active frame is found, and ends when some number of consecutive inactive frames are found. In the example of
At the time that inspection of an object is complete, for example at the end of analysis step 548, decisions are made on the status of the object based on the evidence obtained from the active frames. In an illustrative embodiment, if an insufficient number of active frames were found then there is considered to be insufficient evidence that an object was actually present, and so operation continues as if no active frames were found. Otherwise an object is judged to have been detected, and evidence from the active frames is judged in order to determine its status, for example pass or fail. A variety of methods may be used to detect objects and determine status within the scope of the Vision Detector Method and Apparatus; some are described below and many others will occur to those skilled in the art.
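One possible way to group frames into inspected objects along these lines is sketched below. The function name, the threshold of 0.5 for deciding that a frame is active, and the default counts are hypothetical choices made for illustration; the description leaves these to the application.

```python
from typing import List, Tuple


def group_active_frames(
    weights: List[float],
    max_inactive_run: int = 3,     # hypothetical: consecutive inactive frames that end an inspection
    min_active_frames: int = 2,    # hypothetical: fewer active frames is treated as no object
    active_threshold: float = 0.5, # hypothetical: evidence level at which a frame counts as active
) -> List[Tuple[int, int, int]]:
    """Return (first_active, last_active, active_count) for each detected object."""
    objects = []
    start = None        # index of the first active frame of the current inspection
    last_active = None
    active_count = 0
    inactive_run = 0
    for i, w in enumerate(weights):
        active = w >= active_threshold
        if start is None:
            if active:  # inspection begins when an active frame is found
                start, last_active, active_count, inactive_run = i, i, 1, 0
            continue
        if active:
            last_active = i
            active_count += 1
            inactive_run = 0
        else:
            inactive_run += 1
            if inactive_run >= max_inactive_run:  # inspection ends
                if active_count >= min_active_frames:
                    objects.append((start, last_active, active_count))
                start = None  # otherwise continue as if no active frames were found
    return objects
```

An inspection still in progress when the frame sequence ends is not reported by this sketch.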
Once an object has been detected and a judgment made, a report may be made to appropriate automation equipment, such as a PLC, using signals well-known in the art. In such a case a report step would appear in the timeline. The example of
Note that the report 560 may be delayed well beyond the inspection of subsequent objects such as 510. The vision detector uses well-known first-in first-out (FIFO) buffer methods to hold the reports until the appropriate time.
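A first-in first-out buffer of pending reports could be sketched as follows. The class name, the use of a fixed delay measured from a per-object reference time, and the time units are assumptions made for illustration only.

```python
from collections import deque


class ReportQueue:
    """Hold inspection reports in FIFO order until their output time arrives."""

    def __init__(self, delay_s: float):
        self.delay_s = delay_s
        self._pending = deque()  # entries are (due_time_s, passed)

    def push(self, reference_time_s: float, passed: bool) -> None:
        # Reports are queued in the order the objects were inspected.
        self._pending.append((reference_time_s + self.delay_s, passed))

    def pop_due(self, now_s: float):
        """Return, oldest first, every report whose due time has been reached."""
        due = []
        while self._pending and self._pending[0][0] <= now_s:
            due.append(self._pending.popleft())
        return due
```

Because several objects may be inspected before the first report is due, the queue may hold more than one pending report at a time.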
Once inspection of an object is complete, the vision detector may enter an idle step 580. Such a step is optional, but may be desirable for several reasons. If the maximum object rate is known, there is no need to be looking for an object until just before a new one is due. An idle step will eliminate the chance of false object detection at times when an object couldn't arrive, and will extend the lifetime of the illumination system because the lights can be kept off during the idle step.
In another embodiment, the report step is delayed in a manner equivalent to that shown in
The DSP 900 can be any device capable of digital computation, information storage, and interface to other digital elements, including but not limited to a general-purpose computer, a PLC, or a microprocessor. It is desirable that the DSP 900 be inexpensive but fast enough to handle a high frame rate. It is further desirable that it be capable of receiving and storing pixel data from the imager simultaneously with image analysis.
In the illustrative embodiment of
The high frame rate desired by a vision detector suggests the use of an imager unlike those that have been used in prior art vision systems. It is desirable that the imager be unusually light sensitive, so that it can operate with extremely short shutter times using inexpensive illumination. It is further desirable that it be able to digitize and transmit pixel data to the DSP far faster than prior art vision systems. It is moreover desirable that it be inexpensive and have a global shutter.
These objectives may be met by choosing an imager with much higher light sensitivity and lower resolution than those used by prior art vision systems. In the illustrative embodiment of
It is desirable that the illumination 940 be inexpensive and yet bright enough to allow short shutter times. In an illustrative embodiment, a bank of high-intensity red LEDs operating at 630 nanometers is used, for example the HLMP-ED25 manufactured by Agilent Technologies. In another embodiment, high-intensity white LEDs are used to implement desired illumination.
In the illustrative embodiment of
As used herein an image capture device provides means to capture and store a digital image. In the illustrative embodiment of
As used herein an analyzer provides means for analysis of digital data, including but not limited to a digital image. In the illustrative embodiment of
As used herein an output signaler provides means to produce an output signal responsive to an analysis. In the illustrative embodiment of
It will be understood by one of ordinary skill that there are many alternate arrangements, devices, and software instructions that could be used within the scope of the Vision Detector Method and Apparatus to implement an image capture device 980, analyzer 982, and output signaler 984.
A variety of engineering tradeoffs can be made to provide efficient operation of an apparatus according to the Vision Detector Method and Apparatus for a specific application. Consider the following definitions:
From these definitions it can be seen that
To achieve good use of the available resolution of the imager, it is desirable that b is at least 50%. For dynamic image analysis, n should be at least 2. Therefore it is further desirable that the object moves no more than about one-quarter of the field of view between successive frames.
In an illustrative embodiment, reasonable values might be b=75%, e=5%, and n=4. This implies that m≦5%, i.e. that one would choose a frame rate so that an object would move no more than about 5% of the FOV between frames. If manufacturing conditions were such that s=2, then the frame rate r would need to be at least approximately 40 times the object presentation rate p. To handle an object presentation rate of 5 Hz, which is fairly typical of industrial manufacturing, the desired frame rate would be at least around 200 Hz. This rate could be achieved using an LM9630 with at most a 3.3 millisecond shutter time, as long as the image analysis is arranged so as to fit within the 5 millisecond frame period. Using available technology, it would be feasible to achieve this rate using an imager containing up to about 40,000 pixels.
With the same illustrative embodiment and a higher object presentation rate of 12.5 Hz, the desired frame rate would be at least approximately 500 Hz. An LM9630 could handle this rate by using at most a 300 microsecond shutter.
In another illustrative embodiment, one might choose b=75%, e=15%, and n=5, so that m≦2%. With s=2 and p=5 Hz, the desired frame rate would again be at least approximately 500 Hz.
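The arithmetic in these examples can be checked with a short sketch. The relations used, m ≦ (1 − b − e)/n and r ≧ s·p/m, are not stated in this passage; they are assumed here because they reproduce every worked number above. The function names are hypothetical.

```python
def max_motion_fraction(b: float, e: float, n: int) -> float:
    """Assumed relation: largest fraction m of the FOV an object may move per frame."""
    return (1.0 - b - e) / n


def min_frame_rate_hz(b: float, e: float, n: int, s: float, p_hz: float) -> float:
    """Assumed relation: minimum frame rate r for object presentation rate p."""
    return s * p_hz / max_motion_fraction(b, e, n)


# b=75%, e=5%, n=4 gives m of about 5%; with s=2 and p=5 Hz, r is about 200 Hz.
print(max_motion_fraction(0.75, 0.05, 4))
print(min_frame_rate_hz(0.75, 0.05, 4, 2, 5.0))
```

Under the same assumed relations, b=50% and n=2 with no error margin give m=25%, which agrees with the one-quarter figure stated earlier.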
A fuzzy logic value is a number between 0 and 1 that represents an estimate of confidence that some specific condition is true. A value of 1 signifies high confidence that the condition is true, 0 signifies high confidence that the condition is false, and intermediate values signify intermediate levels of confidence.
The more familiar binary logic is a subset of fuzzy logic, where the confidence values are restricted to just 0 and 1. Therefore, any embodiment described herein that uses fuzzy logic values can alternatively use binary logic values, with any fuzzy logic method or apparatus using those values replaced with an equivalent binary logic method or apparatus.
Just as binary logic values are obtained from raw measurements by using a threshold, fuzzy logic values are obtained using a fuzzy threshold. Referring to
In an illustrative embodiment, a fuzzy threshold comprises two numbers shown on the x-axis, low threshold t0 1120, and high threshold t1 1122, corresponding to points on the function 1124 and 1126. The fuzzy threshold can be defined by the equation
Note that this function works just as well when t1<t0. Other functions can also be used for a fuzzy threshold, such as the sigmoid
where t and σ are threshold parameters. In embodiments where simplicity is a goal, a conventional binary threshold can be used, resulting in binary logic values.
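As a minimal sketch, the two kinds of fuzzy threshold might be written as follows. The equations themselves are not reproduced in this passage, so the forms below are assumptions: a linear ramp clamped to the range 0 to 1 between t0 and t1, and a standard logistic function with center t and width σ. The function names are hypothetical.

```python
import math


def fuzzy_threshold(x: float, t0: float, t1: float) -> float:
    """Assumed ramp form: 0 at t0, 1 at t1, clamped to [0, 1].

    The same expression handles t1 < t0, which reverses the sense
    of the threshold.
    """
    if t0 == t1:
        # Degenerate case: behaves as a conventional binary threshold.
        return 1.0 if x >= t0 else 0.0
    f = (x - t0) / (t1 - t0)
    return min(max(f, 0.0), 1.0)


def sigmoid_threshold(x: float, t: float, sigma: float) -> float:
    """Assumed logistic form with center t and width sigma."""
    return 1.0 / (1.0 + math.exp(-(x - t) / sigma))
```

For instance, with t0=0 and t1=10 a measurement of 5 maps to 0.5, and swapping the two thresholds maps a measurement of 2.5 to 0.75.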
Fuzzy decision making is based on fuzzy versions of AND 1140, OR 1150, and NOT 1160. A fuzzy AND of two or more fuzzy logic values is the minimum value, and a fuzzy OR is the maximum value. Fuzzy NOT of f is 1−f. Fuzzy logic is identical to binary logic when the fuzzy logic values are restricted to 0 and 1.
In an illustrative embodiment, whenever a hard true/false decision is needed, a fuzzy logic value is considered true if it is at least 0.5, false if it is less than 0.5.
It will be clear to one skilled in the art that there is nothing critical about the values 0 and 1 as used in connection with fuzzy logic herein. Any number could be used to represent high confidence that a condition is true, and any different number could be used to represent high confidence that the condition is false, with intermediate values representing intermediate levels of confidence.
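The fuzzy operations described above are simple enough to state directly in code. The following sketch uses hypothetical function names and the 0-to-1 convention of the illustrative embodiment.

```python
def fuzzy_and(*values: float) -> float:
    """Fuzzy AND of two or more values: the minimum."""
    return min(values)


def fuzzy_or(*values: float) -> float:
    """Fuzzy OR of two or more values: the maximum."""
    return max(values)


def fuzzy_not(value: float) -> float:
    """Fuzzy NOT of f: 1 - f."""
    return 1.0 - value


def is_true(value: float) -> bool:
    """Hard decision: true if the value is at least 0.5."""
    return value >= 0.5
```

When every input is restricted to 0 or 1, these functions reproduce the ordinary binary truth tables.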
1. Is an object, or a set of visible features of an object, located in the field of view?
2. If so, what is the status of the object?
Information comprising evidence that an object is located in the field of view is called an object detection weight. Information comprising evidence regarding the status of an object is called an object pass score. In various illustrative embodiments, the status of the object comprises whether or not the object satisfies inspection criteria chosen as appropriate by a user. In the following, an object that satisfies the inspection criteria is sometimes said to “pass inspection”.
In the illustrative embodiment of
In the illustrative embodiment of
In the example of
In one embodiment, an object is judged to have been detected if the number of active frames found exceeds some threshold. In another embodiment, an object is judged to have been detected if the total object detection weight over all active frames exceeds some threshold. These thresholds are set as appropriate for a given application.
In the illustrative embodiment of
where the summation is over all active frames. The effect of this formula is to average the object pass scores, but to weight each score based on the confidence that the object really did appear in the corresponding frame.
In an alternate embodiment, an object is judged to pass inspection if the average of the object pass scores is at least 0.5. This is equivalent to a weighted average wherein all of the weights are equal.
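A weighted average of this kind might be computed as in the sketch below. The formula is not reproduced in this passage, so the form used here (sum of weight times score, divided by total weight) is an assumption based on the description of its effect. The function names and the handling of zero total weight are hypothetical.

```python
from typing import Sequence


def weighted_average_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    """Average of object pass scores, weighted by object detection weight.

    Both sequences cover the active frames only, in the same order.
    """
    total = sum(weights)
    if total == 0:
        return 0.0  # assumption: no evidence is treated as a score of zero
    return sum(w * s for w, s in zip(weights, scores)) / total


def passes_by_weighted_average(weights: Sequence[float], scores: Sequence[float],
                               threshold: float = 0.5) -> bool:
    """Object passes if the weighted average is at least the threshold."""
    return weighted_average_score(weights, scores) >= threshold
```

Passing equal weights for every frame yields the plain average of the alternate embodiment.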
In the example of
The weighted percentile method is based on the fraction Q(p) of total weight where the pass score is at least p:
The object is judged to pass if Q(p) is at least some threshold t. In the illustrative embodiment of
Useful behavior is obtained using different values of t. For example, if t=50%, the object is judged to pass inspection if the weighted median score is at least p. Weighted median is similar to weighted average, but with properties more appropriate in some cases. For higher values, for example t=90%, the object will be judged to pass inspection only if the overwhelming majority of the weight corresponds to active frames where the pass score is at least p. For t=100%, the object will be judged to pass inspection only if all of the active frames have a pass score that is at least p. The object may also be judged to pass inspection if Q(p) is greater than 0, which means that at least one active frame has a pass score that is at least p.
In another useful variation, the object is judged to pass inspection based on the total weight where the pass score is at least p, instead of the fraction of total weight.
In an alternate embodiment, a percentile method is used based on a count of the frames where the pass score is at least p. This is equivalent to a weighted percentile method wherein all of the weights are equal.
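The weighted percentile method can be sketched as follows. The expression for Q(p) is not reproduced in this passage; the form below follows the verbal definition (weight of frames whose pass score is at least p, divided by total weight). The function names and the zero-weight behavior are assumptions.

```python
from typing import Sequence


def weight_fraction_at_least(weights: Sequence[float], scores: Sequence[float],
                             p: float) -> float:
    """Q(p): fraction of total weight in active frames whose pass score is >= p."""
    total = sum(weights)
    if total == 0:
        return 0.0  # assumption: no weight means no supporting evidence
    return sum(w for w, s in zip(weights, scores) if s >= p) / total


def passes_by_percentile(weights: Sequence[float], scores: Sequence[float],
                         p: float = 0.5, t: float = 0.5) -> bool:
    """Object passes if Q(p) is at least the threshold t."""
    return weight_fraction_at_least(weights, scores, p) >= t
```

With all weights equal, Q(p) reduces to the fraction of frames whose pass score is at least p, which corresponds to the count-based variation.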
The above descriptions of methods for weighing evidence to determine whether an object has been detected, and whether it passes inspection, are intended as examples of useful embodiments, but do not limit the methods that can be used within the scope of the Vision Detector Method and Apparatus. For example, the exemplary constants 0.5 used above may be replaced with any suitable value. Many additional methods for dynamic image analysis will occur to those skilled in the art.
As illustrated, classes with a dotted border, such as Gadget class 1400, are abstract base classes that do not exist by themselves but are used to build concrete derived classes such as Locator class 1420. Classes with a solid border represent dynamic objects that can be created and destroyed as needed by the user in setting up an application, using an HMI 830. Classes with a dashed border, such as Input class 1450, represent static objects associated with specific hardware or software resources. Static objects always exist and cannot be created or destroyed by the user.
All classes are derived from Gadget class 1400, and so all objects that are instances of the classes shown in
The act of analyzing a frame consists of running each Gadget once, in an order determined to guarantee that all logic inputs to a Gadget have been updated before the Gadget is run. In some embodiments, a Gadget is not run during a frame where its logic output is not needed.
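One way to obtain such an order is a topological sort of the wiring. The sketch below uses the Python standard library `graphlib` module (Python 3.9 or later); the description does not say which ordering technique is used, and the Gadget names in the example are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def run_order(wiring: Dict[str, Iterable[str]]) -> List[str]:
    """Return Gadget names ordered so that each Gadget follows all of its inputs.

    `wiring` maps each Gadget name to the names of the Gadgets whose
    logic outputs are wired to its logic inputs.
    """
    return list(TopologicalSorter(wiring).static_order())


# Illustrative wiring: three Photos feed an AND Gate, which feeds a Judge.
example = {
    "Hole": [],
    "Label": [],
    "LabelEdge": [],
    "PassAnd": ["Hole", "Label", "LabelEdge"],  # hypothetical Gate name
    "ObjectPass": ["PassAnd"],
}
order = run_order(example)
```

`TopologicalSorter` raises `graphlib.CycleError` if the wiring contains a loop, in which case no valid run order exists.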
The Photo class 1410 is the base class for all Gadgets whose logic output depends on the contents of the current frame. These are the classes that actually do the image analysis. Every Photo measures some characteristic of a region of interest (ROI) of the current frame. The ROI corresponds to a visible feature on the object to be inspected. This measurement is called the Photo's analog output. The Photo's logic output is computed from the analog output by means of a fuzzy threshold, called the sensitivity threshold, that is among its set of parameters that can be configured by a user. The logic output of a Photo can be used to provide evidence to be used in making judgments.
The Detector class 1430 is the base class for Photos whose primary purpose is to make measurements in an ROI and provide evidence to be used in making judgments. In an illustrative embodiment all Detector ROIs are circles. A circular ROI simplifies the implementation because there is no need to deal with rotation, and having only one ROI shape simplifies what the user has to learn. Detector parameters include the position and diameter of the ROI.
A Brightness Detector 1440 measures a weighted average or percentile brightness in the ROI. A Contrast Detector 1442 measures contrast in the ROI. An Edge Detector 1444 measures the extent to which the ROI looks like an edge in a specific direction. A Spot Detector 1446 measures the extent to which the ROI looks like a round feature such as a hole. A Template Detector 1448 measures the extent to which the ROI looks like a pre-trained pattern selected by a user. The operation of the Detectors is further described below and in the Vision Detector Method and Apparatus.
The Locator class 1420 represents Photos that have two primary purposes. The first is to produce a logic output that can provide evidence for making judgments, and in this they can be used like any Detector. The second is to determine the location of an object in the field of view of a vision detector, so that the position of the ROI of other Photos can be moved so as to track the position of the object. Any Locator can be used for either or both purposes.
In an illustrative embodiment, a Locator searches a one-dimensional range in a frame for an edge. The search direction is normal to the edge, and is among the parameters to be configured by the user. The analog output of a Locator is similar to that for an Edge Detector. Locators are further described in the Vision Detector Method and Apparatus.
The Input class 1450 represents input signals to the vision detector, such as an external trigger. The Output class 1452 represents output signals from the vision detector, such as might be used to control a reject actuator. There is one static instance of the Input class for each physical input, such as exemplary input signal 926 (
The Gate base class 1460 implements fuzzy logic decision making. Each Gate has one or more logic inputs that can be connected to the logic outputs of other Gadgets. Each logic input can be inverted (fuzzy NOT) by means of a parameter that a user can configure. An AND Gate 1462 implements a fuzzy AND operation, and an OR Gate 1464 implements a fuzzy OR operation.
The Judge class 1470 is the base class for two static objects, the ObjectDetect Judge 1472 and the ObjectPass Judge 1474. Judges implement dynamic image analysis by weighing evidence over successive frames to make the primary decisions. Each Judge has a logic input to which a user connects the logic output of a Photo or, more typically, a Gate that provides a logical combination of Gadgets, usually Photos and other Gates.
The ObjectDetect Judge 1472 decides if an object has been detected, and the ObjectPass Judge 1474 decides if it passes inspection. The logic input to the ObjectDetect Judge provides the object detection weight for each frame, and the logic input to the ObjectPass Judge provides the object pass score for each frame.
The logic output of the ObjectDetect Judge provides a pulse that indicates when a judgment has been made. In one mode of operation, called “output when processing”, the leading edge of the pulse occurs when the inspection of an object begins, for example at the end of analysis step 540 in
The logic output of the ObjectPass Judge provides a level that indicates whether the most recently inspected object passed. The level changes state when the inspection of an object is complete, for example at the end of analysis step 548.
A Locator 1520 is used to detect and locate the top edge of the object, and another Locator 1522 is used to detect and locate the right edge.
A Brightness Detector 1530 is used to help detect the presence of the object. In this example the background is brighter than the object, and the sensitivity threshold is set to distinguish the two brightness levels, with the logic output inverted to detect the darker object and not the brighter background.
Together the Locators 1520 and 1522, and the Brightness Detector 1530, provide the evidence needed to judge that an object has been detected, as further described below.
A Contrast Detector 1540 is used to detect the presence of the hole 1512. When the hole is absent the contrast would be very low, and when present the contrast would be much higher. A Spot Detector could also be used.
An Edge Detector 1560 is used to detect the presence and position of the label 1510. If the label is absent, mis-positioned horizontally, or significantly rotated, the analog output of the Edge Detector would be very low.
A Brightness Detector 1550 is used to verify that the correct label has been applied. In this example, the correct label is white and incorrect labels are darker colors.
As the object moves from left to right through the field of view of the vision detector, Locator 1522 tracks the right edge of the object and repositions Brightness Detector 1530, Contrast Detector 1540, Brightness Detector 1550, and Edge Detector 1560 to be at the correct position relative to the object. Locator 1520 corrects for any variation in the vertical position of the object in the field of view, repositioning the detectors based on the location of the top edge of the object. In general Locators can be oriented in any direction.
A user can manipulate Photos in an image view by using well-known HMI techniques. A Photo can be selected by clicking with a mouse, and its ROI can be moved, resized, and rotated by dragging. Additional manipulations for Locators are described in the Vision Detector Method and Apparatus.
Referring still to the wiring diagram of
The logic output of AND Gate 1610 represents the level of confidence that the top edge of the object has been detected, the right edge of the object has been detected, and the background has not been detected. When confidence is high that all three conditions are true, confidence is high that the object itself has been detected. The logic output of AND Gate 1610 is wired to the ObjectDetect Judge 1600 to be used as the object detection weight for each frame.
Since the logic input to the ObjectDetect Judge in this case depends on the current frame, the vision detector is operating in visual event detection mode. To operate in external trigger mode, an Input Gadget would be wired to ObjectDetect. To operate in continuous analysis mode, nothing would be wired to ObjectDetect.
The choice of Gadgets to wire to ObjectDetect is made by a user based on knowledge of the application. In the example of
In the wiring diagram, Contrast Detector “Hole” 1640, corresponding to Contrast Detector 1540, Brightness Detector “Label” 1650, corresponding to Brightness Detector 1550, and Edge Detector “LabelEdge” 1660, corresponding to Edge Detector 1560, are wired to AND Gate 1612. The logic output of AND Gate 1612 represents the level of confidence that all three image features have been detected, and is wired to ObjectPass Judge 1602 to provide the object pass score for each frame.
The logic output of ObjectDetect Judge 1600 is wired to AND Gate 1670. The logic output of ObjectPass Judge 1602 is inverted and also wired to AND Gate 1670. The ObjectDetect Judge is set to “output when done” mode, so a pulse appears on the logic output of ObjectDetect Judge 1600 after an object has been detected and inspection is complete. Since the logic output of ObjectPass 1602 has been inverted, this pulse will appear on the logic output of AND Gate 1670 only if the object has not passed inspection. The logic output of AND Gate 1670 is wired to an Output Gadget 1680, named “Reject”, which controls an output signal from the vision detector that can be connected directly to a reject actuator 170. The Output Gadget 1680 is configured by a user to perform the appropriate delay 570 needed by the downstream reject actuator.
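The three AND Gates of this example wiring can be expressed as a minimal sketch using fuzzy AND (minimum) and fuzzy NOT (one minus the value). The function and argument names are hypothetical, and the Judges themselves, which weigh evidence over many frames, are not modeled here.

```python
def fuzzy_and(*values: float) -> float:
    return min(values)


def fuzzy_not(value: float) -> float:
    return 1.0 - value


def detection_weight(top_edge: float, right_edge: float, background: float) -> float:
    """AND Gate 1610: both edges detected and the bright background not detected."""
    return fuzzy_and(top_edge, right_edge, fuzzy_not(background))


def pass_score(hole: float, label: float, label_edge: float) -> float:
    """AND Gate 1612: all three inspected features detected."""
    return fuzzy_and(hole, label, label_edge)


def reject(detect_pulse: float, object_passed: float) -> float:
    """AND Gate 1670: the detect pulse gated by the inverted pass level."""
    return fuzzy_and(detect_pulse, fuzzy_not(object_passed))
```

In this sketch the reject output is high only while the detect pulse is present and the most recently inspected object did not pass.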
A user can manipulate Gadgets in a logic view by using well-known HMI techniques. A Gadget can be selected by clicking with a mouse, its position can be moved by dragging, and wires can be created by a drag-drop operation.
To aid the user's understanding of the operation of the vision detector, Gadgets and/or wires can change their visual appearance to indicate fuzzy logic values. For example, Gadgets and/or wires can be displayed red when the logic value is below 0.5, and green otherwise. In
One skilled in the art will recognize that a wide variety of objects can be detected and inspected by suitable choice, configuration, and wiring of Gadgets. One skilled in the art will also recognize that the Gadget class hierarchy is only one of many software techniques that could be used to practice the Vision Detector Method and Apparatus.
The invention herein described can be used to configure any image analysis device, including prior art vision systems; the illustrative embodiment described in the following is based on configuring a vision detector.
The vision detector 400 captures a sequence of images of its field of view. Referring to the illustrative apparatus of
For clarity in the drawing, ring buffer 5000 is shown to contain only 24 elements, but in practice a higher number is desirable. In one embodiment 160 elements are used, which requires just under two megabytes of storage, and which is capable of storing about 0.8 seconds of a production run at 200 frames/second, or about 0.32 seconds at 500 frames/second. Clearly, lower frame rates can be used to increase the amount of time for which images can be stored.
The operation and implementation of ring buffers are well-known in the art. In the illustrative embodiment of
At some point the ring buffer will become full, as is the case for full ring buffer 5002. Write pointer 5040 indicates the only available element 5034. In one embodiment, when the ring buffer becomes full, image capture terminates. In another embodiment image capture continues, with the next capture overwriting oldest element 5036 indicated by read pointer 5042, and further continuing to overwrite elements and advance the pointers until some condition, such as a command from the human user, occurs to terminate the image capture. Once image capture has terminated, ring buffer 5002 contains a sequence of images 5050 available for display for a human user.
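Both behaviors on a full buffer can be sketched with a small class. This is an illustrative simplification and not the arrangement of the figures: it tracks an element count instead of keeping one element free, and the class and method names are hypothetical.

```python
class ImageRingBuffer:
    """Fixed-capacity image store with a write index and a read (oldest) index."""

    def __init__(self, capacity: int, overwrite_oldest: bool = True):
        self._slots = [None] * capacity
        self._write = 0   # index of the next element to be written
        self._read = 0    # index of the oldest stored image
        self._count = 0
        self._overwrite = overwrite_oldest

    def capture(self, image) -> bool:
        """Store an image; return False if the buffer is full and capture stops."""
        capacity = len(self._slots)
        if self._count == capacity:
            if not self._overwrite:
                return False
            # Drop the oldest image; its slot is the one about to be written.
            self._read = (self._read + 1) % capacity
            self._count -= 1
        self._slots[self._write] = image
        self._write = (self._write + 1) % capacity
        self._count += 1
        return True

    def images(self) -> list:
        """Stored images from oldest to newest."""
        capacity = len(self._slots)
        return [self._slots[(self._read + i) % capacity] for i in range(self._count)]
```

With 160 elements, such a buffer holds 160/200 = 0.8 seconds of images at 200 frames/second, matching the figure given above.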
Referring back to
The GUI allows a portion of the sequence of images 5050 stored in vision detector memory 910 to be displayed for a human user. In the illustrative embodiment of
A set of scrolling controls 5150 is provided in filmstrip window 5102 for advancing the displayed portion 5052 forward or backward within the sequence of images 5050. Next image control 5160 advances displayed portion 5052 forward by one image, and previous image control 5162 advances displayed portion 5052 backward by one image. Controls 5164 and 5166 are described below.
Thumbnail 5120 displays a low-resolution image of object 5140, which may correspond for example to object 114 (
In the illustrative embodiment of
To configure the vision detector, the user issues scrolling commands until an image appears in image view window 5100 that is suitable to use as a first training image. Generally an image is suitable if the user judges it to be sufficiently similar in appearance to the typical objects that will be inspected. Note that object 1500 in image view window 5100, which is also shown in
Once a configuration has been created using the first training image, it is desirable to confirm and possibly adjust the configuration using additional training images. Scrolling commands can be issued to choose a second training image, which will appear both in thumbnail 5120 and image view window 5100. The second training image can be used simply to assess the operation of previously created vision tools, or to update the configuration by creating additional vision tools, or adjusting the position, size, orientation, and/or operating parameters of previously created vision tools.
For example, the initial configuration created using the first training image may not have included Brightness Detector 1530, which as previously described is used in conjunction with Locators 1520 and 1522 to help detect the presence of the object. It may be that the user did not realize, looking only at the first training image, that Brightness Detector 1530 would be needed. It may be that only by considering a second training image where an object is only partially present in the field of view, or in which no object is present, could the user see that Brightness Detector 1530 is needed to prevent false detection.
Similarly, it may be that a second training image corresponding to defective object 116, which is dissimilar in appearance to the typical objects to be inspected, is necessary to properly adjust Contrast Detector 1540.
During the process of configuring a vision detector it is desirable to test the configuration more extensively than can be accomplished using second training images to assess vision tool operation. It is further desirable that such a test include a large number of objects, and capture and store a limited number of images, where those images correspond to objects likely to represent incorrect decisions. Similar tests may also be desirable during actual production operation. It is desirable that the production environment, including conveyer speed and ambient illumination, be identical during configuration and production operation. Comparing
In an illustrative embodiment used for testing a vision detector, the vision detector is configured for detecting and inspecting discrete objects. The vision detector may use visual event detection, operating as shown in
Referring to
Referring to
As previously described, objects are analyzed to determine results that contain information about the status of the object. The results may be as simple as whether or not the object passes inspection. In an illustrative embodiment, the results include numbers that indicate a relative confidence that the object has been detected, and that it passes inspection. For example, one such number may be the weighted average of object pass scores, where an object is judged to pass inspection if that number exceeds some decision threshold, as shown in equation 5 where the decision threshold is 0.5. Similarly, when using visual event detection, one such number may be the number of active frames or the total object detection weight, which is compared to a decision threshold to determine if an object has been detected (for more detail on these numbers see the above description of
These results are used to determine if the group of frames for the corresponding object might be of interest to a human user, in which case they would be stored for subsequent display. The user may be interested in all objects, or just those that pass or just those that do not pass. It is often most desirable to store and display images corresponding to objects likely to represent incorrect decisions, because these represent both very rare occurrences, and situations most in need of careful study. An incorrect decision is likely when numbers that indicate a relative confidence are close to the decision threshold.
The stored groups of frames form a set of images, a portion of which can be displayed by a graphical user interface such as that shown in
Note that if an active frame for a new object has not yet been found, then there is no current group, and so the read pointer 5222 will indicate frame buffer 5232 and will advance for each new frame until a new object is found. Note further that if the free pool 5210 fills up, some of the oldest frames in the group cannot be stored.
When an object has been analyzed and the result containing information about the status of the object is determined, the group is complete. If the result indicates that the group is not to be stored for subsequent display, the images are discarded by simply advancing the read pointer 5222. If the group is to be stored, the frame buffers are removed from the free pool 5210 and added to stored group pool 5204, which includes stored groups 5212, 5214, and 5216. If the number of frame buffers in the free pool 5210 becomes too small after removing the new stored group, then one or more older stored groups may be taken from stored group pool 5204 and placed back in the free pool. Those older groups will no longer be available for display.
In an illustrative embodiment, frame buffers are never copied. Instead frame buffers are moved between free pool 5210 and stored group pool 5204 by pointer manipulation using techniques well known in the art.
A list of stored groups 5202 is maintained, including list elements 5240, 5242, and 5244. List element 5240, for example, contains next element pointer 5250, frame buffer count 5252, result information 5254, and stored group pointer 5256. Result information 5254 includes the relative confidence number used to decide whether to store the group. Note in this case that the decision threshold is 0.5, and the three stored groups have confidence numbers near the threshold. Note further that the list is sorted in order of closeness to the decision threshold. If a stored group needs to be moved from stored group pool 5204 and placed back in free pool 5210, the first group on the list, which is farthest from the decision threshold, is taken.
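The ordering and eviction policy can be sketched as follows. A Python list stands in for the linked list of the illustrative embodiment, and the class and method names are hypothetical; the frame buffers are represented by placeholder values.

```python
class StoredGroups:
    """Stored groups ordered so that the first is farthest from the decision threshold."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self._groups = []  # entries are (distance, confidence, frames)

    def add(self, confidence: float, frames: list) -> None:
        distance = abs(confidence - self.threshold)
        self._groups.append((distance, confidence, frames))
        # Keep the list sorted with the farthest-from-threshold group first.
        self._groups.sort(key=lambda group: group[0], reverse=True)

    def evict(self) -> list:
        """Remove the first group (farthest from the threshold) and return its frames."""
        _, _, frames = self._groups.pop(0)
        return frames

    def confidences(self) -> list:
        return [confidence for _, confidence, _ in self._groups]
```

Evicting the farthest group first retains the groups whose results were closest to the threshold, which are those most likely to represent incorrect decisions.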
The foregoing has been a detailed description of illustrative embodiments of the invention. Various modifications and additions can be made without departing from the spirit and scope thereof. For example, the layout and control options of the GUI are highly variable according to alternate embodiments. The number and size of thumbnails in the filmstrip window of the GUI are also subject to variation. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of the invention.
This application is a continuation-in-part of co-pending U.S. patent application Ser. No. 10/865,155, entitled METHOD AND APPARATUS FOR VISUAL DETECTION AND INSPECTION OF OBJECTS, by William M. Silver, filed Jun. 9, 2004, the teachings of which are expressly incorporated herein by reference, and referred to herein as the “Vision Detector Method and Apparatus”.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10979572 | Nov 2004 | US |
| Child | 14546662 | | US |