The field of the invention relates, generally, to monitoring of industrial environments where humans and machinery interact or come into proximity, and in particular to systems and methods for detecting conditions in a monitored workspace that do not conform to industry safety standards/practices.
Industrial machinery may present potential hazards to humans. Some machinery may be completely shut down, while other machinery may have a variety of operating states, some of which may present potential hazards and some of which may not. In some cases, the degree of any potential hazard may depend on the location or distance of the human with respect to the machinery. As a result, many “guarding” approaches have been developed to separate humans and machines and to prevent interaction between machinery and humans. One very simple and common type of guarding is simply a cage that surrounds the machinery, configured such that opening the door of the cage causes an electrical circuit to place the machinery in a shut-down (immobile) state. If the door is placed sufficiently far from the machinery to ensure that the human cannot reach the machinery before the machinery shuts down, this ensures that humans can never approach the machinery while it is operating. Of course, this prevents all interaction between human and machine, and severely constrains use of the workspace.
The problem is exacerbated if not only humans but also the machinery (e.g., a robot) can move within the workspace. Both may change position and configuration in rapid and uneven ways. Typical industrial robots are stationary, but nonetheless have powerful arms that may present potential hazards over a wide “envelope” of possible movement trajectories. Additionally, robots are often mounted on a rail or other type of external axis, and additional machinery is often incorporated into the robot's end effector, both of which increase the effective total operating envelope of the robot.
Sensors such as light curtains can be substituted for cages or other physical barriers, providing alternative methods to prevent humans and machinery from coming into contact. Sensors such as two-dimensional (2D) light detection and ranging (LIDAR) sensors can provide more sophisticated capabilities, such as allowing the industrial machinery or robot to slow down or issue a warning when an intrusion is detected in an outer zone and stop only when an intrusion is detected in an inner zone. Additionally, a system using a 2D LIDAR can define multiple zones in a variety of shapes.
The guarding equipment must typically comply with stringent industry standards regarding function of the guarding equipment, such as ISO 13849, IEC 61508, and IEC 62061. These standards specify maximum failure rates for hardware components and define rigorous development practices for both hardware and software components that must be complied with in order for a system to be considered safety-rated for use in industrial settings.
Such guarding systems must ensure that potentially hazardous conditions and system failures can be detected with very high probability, and that the system responds to such events by transitioning the equipment being controlled into a safe state. For example, a system that detects zone intrusion may be biased toward registering an intrusion, i.e., risking false positives in order to avoid hazardous interaction between a machine and a human due to a false negative.
One new class of sensor that shows significant promise for use in machine guarding provides three-dimensional (3D) depth information. Examples of such sensors include 3D time-of-flight cameras, 3D LIDAR, and stereo vision cameras. These sensors offer the ability to detect and locate intrusions into the area surrounding industrial machinery in 3D, which has several advantages over 2D systems. In particular, for complex workcells it can be very difficult to determine a combination of 2D planes that effectively covers the entire space for monitoring purposes; 3D sensors, properly configured, can alleviate this issue.
For example, a 2D LIDAR system guarding the floor space of an industrial robot will have to preemptively stop the robot when an intrusion is detected well beyond an arm's-length distance away from the robot (the “Protective Separation Distance” or PSD), because if the intrusion represents a person's legs, that person's arms could be much closer and would be undetectable by the 2D LIDAR system. For sensors that cannot detect arms or hands, the PSD has an extra term called the intrusion distance that is typically set to 850 mm. A 3D system, by contrast, can allow the robot to continue to operate until the person actually stretches his or her arm towards the robot. This provides a much tighter interlock between the actions of the machine and the actions of the human, which avoids premature or unnecessary shutdowns, facilitates many new safety-rated applications and workcell designs, and saves space on the factory floor (which is always at a premium).
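By way of a purely illustrative sketch of how the intrusion-distance term affects the protective separation distance, the general form S = K·T + C used in ISO 13855 can be evaluated as follows; the stopping time and the reduced allowance shown for a 3D system are hypothetical example values, not prescriptions of any standard:

```python
def protective_separation_distance(stop_time_s: float,
                                   approach_speed_mm_s: float = 1600.0,
                                   intrusion_allowance_mm: float = 850.0) -> float:
    """General form S = K * T + C: approach speed K times overall stopping
    time T, plus an intrusion-distance allowance C (example values only)."""
    return approach_speed_mm_s * stop_time_s + intrusion_allowance_mm

# A 2D LIDAR that cannot detect a reaching arm carries the full 850 mm allowance:
psd_2d = protective_separation_distance(stop_time_s=0.5)                    # 1650.0 mm
# A 3D system that can detect the arm itself might justify a smaller
# (hypothetical) allowance, tightening the interlock and saving floor space:
psd_3d = protective_separation_distance(0.5, intrusion_allowance_mm=200.0)  # 1000.0 mm
```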
Another application of 3D sensing involves tasks that are best achieved by humans and machines working collaboratively together. Humans and machines have very different strengths and weaknesses. Typically, machines may be stronger, faster, more precise, and offer greater repeatability. Humans have flexibility, dexterity, and judgment far beyond the abilities of even the most advanced machines. An example of a collaborative application is the installation of a dashboard in a car, where the dashboard is heavy and difficult for a human to maneuver, but attaching it requires a variety of connectors and fasteners that require human dexterity. A guarding system based on 3D sensing could enable industrial engineers to design processes that optimally allocate subtasks to humans and machines in a manner that best exploits their different capabilities while preserving safety-rating of the system.
2D and 3D sensing systems may share underlying technologies. RGB cameras and stereo vision cameras, for example, utilize a lens and sensor combination (i.e., a camera) to capture an image of a scene that is then analyzed algorithmically. A camera-based sensing system typically includes several key components. A light source illuminates the object being inspected or measured. This light source may be part of the camera, as in active sensing systems, or independent of the camera, such as a lamp illuminating the field of view of the camera, or even ambient light. A lens focuses the reflected light from the object and provides a wide field of view. An image sensor (usually a CCD or CMOS array) converts light into electrical signals. A camera module usually integrates the lens, image sensor, and necessary electronics to provide electrical input for further analysis.
The signal from the camera module is fed to an image-capture system, such as a frame grabber, which stores and further processes the 2D or 3D image signal. A processor runs image-analysis software for identification, measurement, and location of objects within the captured scene. Depending on the specific design of the system, the processor can use central-processing units (CPUs), graphics-processing units (GPUs), field-programmable gate arrays (FPGAs), or any number of other architectures, which may be deployed in a stand-alone computer or integrated in the camera module.
2D camera-based methods are well-suited to detecting defects or taking measurements using well-known image-processing techniques, such as edge detection or template matching. 2D sensing is used in unstructured environments and, with the aid of advanced image-processing algorithms, may compensate for varying illumination and shading conditions. However, algorithms for deriving 3D information from 2D images may lack reliability and suitability for safety-critical applications, as their failure modes are hard to characterize.
While a typical image provides 2D information of an object or space, a 3D camera adds another dimension and estimates the distance to objects and other elements in a scene. 3D sensing can therefore provide the 3D contour of an object or space, which can itself be used to create a 3D map of the surrounding environment and position an object relative to this map. Reliable 3D vision overcomes many problems of 2D vision, as the depth measurement can be used to easily separate foreground from background. This is particularly useful for scene understanding, where the first step is to segment the subject of interest (foreground) from other parts of the image (background).
A widely used 3D camera-based sensing approach is stereoscopic vision, or stereo vision. Stereo vision generally uses two spaced-apart cameras in a physical arrangement similar to human eyes. Given a point-like object in space, the camera separation will lead to a measurable disparity of the object positions in the two camera images. Using simple pinhole camera geometry, the object's position in 3D can be computed from the images in each of the cameras. This approach is intuitive, but its real-world implementations are often not as simple. For example, features of the target need to be recognized first so that the two images can be compared for triangulation, but feature recognition involves relatively complex computation and may consume substantial processing power.
Further, 3D stereoscopic vision is highly dependent on the background lighting environment, and its effectiveness is degraded by shadows, occlusions, low contrast, lighting changes, or unexpected movements of the object or sensors. Therefore, often more than two sensors will be used to obtain a surrounding view of the target and thereby handle occlusions, or to provide redundancy to compensate for errors caused by a degraded and uncontrolled environment. Another common alternative is the use of structured light patterns to enhance a system's ability to detect features.
Another approach to 3D imaging utilizes lasers or other active light sources and detectors. A light source-detector system is similar to a camera-based system in that it also integrates lens and image sensors and converts optical signals into electrical signals, but there is no image captured. Instead, the image sensor measures the change of position and/or intensity of a tightly focused light beam, usually a laser beam, over time. This change of position and/or intensity of the detected light beam is used to determine object alignment, throughput, reflective angles, time of flight, or other parameters to create images or maps of the space or object under observation. Light source-detector combinations include active triangulation, structured light, LIDAR, and time-of-flight (ToF) sensors.
Active triangulation mitigates the environmental limitations of stereoscopic 3D by proactively illuminating objects under study with a narrowly focused light source. The wavelength of the active illumination can be controlled, and the sensors can be designed to ignore light at other wavelengths, thereby reducing ambient light interference. Further, the location of the light source can be changed, allowing the object to be scanned across points and from multiple angles to provide a complete 3D picture of the object.
3D structured light is another approach based on triangulation and an active light source. In this approach, a pre-designed light pattern, such as parallel lines, a grid, or speckles, is beamed on the target. The observed reflected pattern will be distorted by the contour of the target, and the contour as well as the distance to the object can be recovered by analysis of the distortion. Successive projections of coded or phase-shifted patterns are often required to extract a single depth frame, which leads to lower frame rates, which in turn mean that the subject must remain relatively still during the projection sequence to avoid blurring.
Compared to a simple active triangulation, structured light adds “feature points” to the target. As feature points are pre-determined (i.e., spatially encoded) and very recognizable, the structured light approach makes feature recognition easier and triangulation therefore faster and more reliable. This technology shifts complexity from the receiver to the source and requires more sophisticated light sources but simpler sensors and lower computational intensity.
Scanning LIDAR measures the distance to an object or space by illuminating it with a pulsed laser beam and measuring the reflected pulses with a sensor. By scanning the laser beam in 2D or 3D, differences in laser return times and wavelengths can be used to make 2D or 3D representations of the scanned object or space. LIDAR uses ultraviolet (UV), visible, or near-infrared light, which is typically reflected via backscattering to form an image or map of the space or object under study.
A 3D time-of-flight (ToF) camera works by illuminating the scene with a modulated light source and observing the reflected light. The phase shift between the illumination and the reflection is measured and translated to distance. Unlike LIDAR, the light source is not scanned; instead, the entire scene is illuminated simultaneously, resulting in higher frame rates. Typically, the illumination is from a solid-state laser or LED operating in the near-infrared range (of about 800-1500 nm) invisible to the human eye. Typically, an imaging sensor responsive to the same spectrum receives the light and converts the photonic energy to electrical current, then to charge, and then to a digitized value. The light entering the sensor has a component due to ambient light, and a component from the modulated illumination source. Distance (depth) information is only embedded in the component reflected from the modulated illumination. A high ambient component can saturate the sensor, introducing nonlinearities on the measurement and reducing the signal to noise ratio (SNR). Hence, a sufficiently strong component of the reflected modulated illumination and a low enough ambient component is necessary to achieve a high enough SNR for an accurate measurement of the distance between the sensor and the reflected object. Insufficient reflected illumination can be, among other reasons, a result of insufficient illumination or low reflectivity of the objects being illuminated.
To detect phase shifts between the illumination and the reflection, the light source in a 3D ToF camera may be pulsed to produce a square wave, or modulated by a continuous-wave source, typically a sinusoid. Distance is measured for every pixel in a 2D addressable array, resulting in a depth map, or collection of 3D points. The depth map may be rendered or otherwise projected into a 3D space as a collection of points, or a point cloud. The 3D points can be mathematically connected to form a mesh onto which a textured surface may be mapped.
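For clarity, the phase-to-distance relationship underlying continuous-wave ToF measurement can be sketched as follows (a minimal illustration; the 50 MHz modulation frequency and the example phase value are assumptions chosen only for the example):

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light

def phase_to_distance(phase_rad: float, mod_freq_hz: float) -> float:
    """Convert a measured phase shift (radians) to distance (meters) for a
    continuous-wave ToF pixel: d = c * phase / (4 * pi * f_mod)."""
    return C_M_PER_S * phase_rad / (4.0 * math.pi * mod_freq_hz)

def ambiguity_distance(mod_freq_hz: float) -> float:
    """Maximum unambiguous range, c / (2 * f_mod)."""
    return C_M_PER_S / (2.0 * mod_freq_hz)

# Example: a 2*pi/3 phase shift at 50 MHz modulation
d = phase_to_distance(2.0 * math.pi / 3.0, 50e6)   # ~1.0 m
r_max = ambiguity_distance(50e6)                   # ~3.0 m
```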
3D ToF cameras have been used in industrial settings but, to date, the deployments have tended to involve non-safety-critical applications such as bin-picking and palletizing. Because existing off-the-shelf 3D ToF cameras are not safety-rated, they cannot be used in safety-critical applications such as machine guarding or collaborative robotics applications that must meet, for example, the ISO 10218 standard, which at the time of this writing requires a Performance Level d (PLd) rating. Accordingly, there is a need for architectures and techniques that render 3D cameras, including ToF cameras, useful in applications requiring a high degree of safety and conformance to industry-recognized safety standards.
There is a need for architectures and sensing and imaging techniques that render 3D cameras, including ToF cameras, useful and reliable in applications requiring a sufficiently high degree of safety and conformance with industry-recognized safety standards. U.S. Pat. Nos. 10,887,578 and 10,887,579 describe an architecture utilizing one or more 3D cameras (e.g., ToF cameras) in industrial safety applications. However, those two patents are silent as to the imaging performance required of such a camera and how to achieve that level of performance.
For various reasons, on any given frame, a small fraction of the pixels in a ToF image may provide an incorrect range measurement or no range measurement at all. For non-safety applications, such incorrect pixels are typically not a major concern. However, for safety applications, an incorrect pixel could result in an undesired condition.
Accordingly, the present disclosure addresses a number of those issues.
In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the present disclosure. In the following description, various aspects of the present disclosure are described with reference to the following drawings, in which:
The following detailed description is meant to assist the understanding of one skilled in the art and is not intended in any way to unduly limit claims connected to or related to the present disclosure.
The following detailed description references various figures, where like reference numbers refer to like components and features across various figures, whether specific figures are referenced, or not.
The word “each” as used herein refers to a single object (i.e., the object) in the case of a single object or each object in the case of multiple objects. The words “a,” “an,” and “the” as used herein are inclusive of “at least one” and “one or more” so as not to limit the object being referred to as being in its “singular” form.
Any reference(s) to industry standard(s) made herein are made with respect to the content of such standard(s) as it exists at the time of writing this disclosure. Such standard(s), to which reference is made, is/are incorporated herein by reference in its/their entirety/entireties.
In the following discussion, an integrated system is described for monitoring a workspace, classifying regions therein for safety purposes, and dynamically identifying safe states. In some cases, the latter function involves semantic analysis of a robot in the workspace and identification of the workpieces with which it interacts. It should be understood, however, that these various elements may be implemented separately or together in desired combinations; the inventive aspects discussed herein do not require all of the described elements, which are set forth together merely for ease of presentation and to illustrate their interoperability. The system as described is exemplary. The ensuing discussion describes aspects involving ToF cameras, but it should be understood that the present disclosure may utilize any form of 3D sensor capable of recording a scene and assigning depth information, typically on a pixelwise basis, to a recorded scene. Functionally, the 3D camera generates a depth map, or a depth-space 3D image, that may be used by external hardware and software to classify objects in a workcell or workspace and generate control signals for machinery.
Referring to
Referring to
Referring to
The processor 110 may be or include any suitable type of computing hardware, e.g., a microprocessor, but in various aspects may be a microcontroller, peripheral integrated circuit element, a CSIC (customer-specific integrated circuit), an ASIC (application-specific integrated circuit), a logic circuit, a digital signal processor, a programmable logic device such as an FPGA (field-programmable gate array), PLD (programmable logic device), PLA (programmable logic array), RFID processor, graphics processing unit (GPU), smart chip, or any other device or arrangement of devices that is capable of implementing the steps of the processes of the present disclosure.
In the illustrated aspect, the processor 110 operates an FPGA 112 and may advantageously provide features to support safety-rated operation, e.g., Safety Separation Design Flow to lock down place and route for safety-critical portions of the design; clock check; single event upset; CRC functions for various data and communication paths that cross the FPGA boundary; and usage of safety-rated functions for individual sub-modules. Within the processor's integrated memory and/or in a separate, primary random-access memory (RAM) 125, typically dynamic RAM (DRAM), are instructions, conceptually illustrated as a group of modules that control the operation of the processor 110 and its interaction with the other hardware components. These instructions may be coded in any suitable programming language, including, without limitation, high-level languages such as C, C++, C#, Java, Python, Ruby, Scala, Lua, Julia, PHP or Go, utilizing, without limitation, any suitable frameworks and libraries such as TensorFlow, Keras, PyTorch, or Theano. Additionally, the software can be implemented in an assembly language and/or machine language directed to a microprocessor resident on a target device. An operating system (not shown) directs the execution of low-level, basic system functions such as memory allocation, file management and operation of mass storage devices. At a higher level, a pair of conventional depth-compute engines 1301, 1302 (generally referred to individually or collectively as depth-compute engine 130) receive raw 3D sensor data and assign depth values to each pixel of the recorded scene. Raw data refers to the uncalibrated data coming from a sensor (e.g., 12 bits per pixel). The RAM 125 supports error-correcting code (ECC), which is important for safety-rated applications.
Using two independent lenses and 3D sensor modules 115 creates two separate optical paths. This redundancy allows for immediate detection if one of the camera modules 115 fails during operation. Also, by not picking up the exact same image from each lens and sensor combination, additional levels of processing can be performed by an image comparison module 135, which projects the response of a pixel from one optical path into corresponding pixels of the other optical path. (This projection may be determined, for example, during a calibration phase.) Failure modes that can be detected through this comparison include errant detections due to multiple reflections and sensor-sensor interference. When the two sensors 115 agree within an established noise metric based on the performance characteristics of the cameras, the two independent images can also be used to reduce noise and/or increase resolution. Redundant sensing for dual-channel imaging ensures that reliability levels required for safety-critical operation in industrial environments can be met.
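The pixel-level comparison between the two optical paths may be sketched roughly as follows, assuming the calibration phase has produced a per-pixel correspondence map between the two sensors; the function, array names, and noise threshold are illustrative assumptions rather than the actual implementation:

```python
import numpy as np

def compare_dual_channel(depth_primary: np.ndarray,
                         depth_secondary: np.ndarray,
                         pixel_map: np.ndarray,
                         noise_threshold_m: float = 0.05):
    """Project each primary-path pixel onto its calibrated counterpart in the
    secondary path and test agreement.  pixel_map[v, u] holds the (row, col)
    coordinates in the secondary image corresponding to primary pixel (v, u),
    as established during the calibration phase.  Returns a boolean agreement
    mask and a fused depth map (mean where the paths agree, NaN where they do
    not, so downstream logic can treat disagreement conservatively)."""
    rows = pixel_map[..., 0]
    cols = pixel_map[..., 1]
    secondary_resampled = depth_secondary[rows, cols]
    agree = np.abs(depth_primary - secondary_resampled) <= noise_threshold_m
    fused = np.where(agree, 0.5 * (depth_primary + secondary_resampled), np.nan)
    return agree, fused
```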
If the comparison metric computed by the comparison module 135 is within the allowed range, the merged output is processed for output according to a network communication protocol. In the illustrated aspect, output is provided by a conventional low-latency Ethernet communication layer 140. This output may be utilized by a safety-rated processor system for controlled machinery as described, for example, in U.S. Pat. No. 11,543,798 issued on Jan. 3, 2023, the entire disclosure of which is hereby incorporated by reference.
The system 100 may include one or more environmental sensors 145 to measure conditions such as temperature and humidity. In one aspect, multiple on-board temperature sensors 145 are disposed at multiple locations across the sensors 115, e.g., at the center of the illumination array, on the camera enclosure, and within the camera enclosure internally (one near the primary sensor and one near the secondary sensor), for calibrating and correcting the 3D sensing modules as system-generated heat and ambient temperature changes or drifts affect the camera's operating parameters. For example, camera temperature variations can affect the camera's baseline calibration, accuracy, and operating parameters. Calibration may be employed to establish operating temperature ranges where performance is maintained; sensor detection of conditions outside these ranges can cause a shutdown, preventing undesired failures. As discussed in greater detail below, temperature correction parameters may be estimated during calibration and then applied in real-time during operation. In one aspect, the system 100 identifies a stable background image and uses this to constantly verify the correctness of the calibration and that the temperature-corrected image remains stable over time.
A fundamental problem with the use of depth sensors in safety-rated systems is that the depth result from each pixel is not known with 100% certainty. The actual distance to an object can differ from the reported depth. The error between the reported depth and actual depth may become significant, manifesting as a mismatch between an object's actual and apparent location, and this mismatch will be randomized on a per-pixel basis. Pixel-level errors may arise from, for example, raw data saturation or clipping, unresolvable ambiguity distance as calculated by different modulation frequencies, a large intensity mismatch between different modulation frequencies, a predicted measurement error above a certain threshold due to low SNR, or excessive ambient light level. A safety-rated system where it is desired to know accurate distances cannot afford such errors. The approach taken by typical ToF cameras is to zero out the data for a given pixel if the received intensity is below a certain level. For pixels with medium or low received optical intensity, the system can either conservatively disregard the data and be totally blind for that pixel, or it can accept the camera's reported depth result, which may be off by some distance.
Accordingly, depth data provided in the sensor output may include a predicted measurement error range of the depth result, on a per-pixel basis, based on raw data processing and statistical models. For example, it is common for ToF cameras to output two values per pixel: depth and optical intensity. Intensity can be used as a rough metric of data confidence (i.e., the reciprocal of error), so instead of outputting depth and intensity, the data provided in the output may be depth and an error range. The error range may also be predicted, on a per-pixel basis, based on variables such as sensor noise, dark frame data (as described below), and environmental factors such as ambient light and temperature.
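A minimal sketch of such a per-pixel error prediction is shown below; the shot-noise-style model and the coefficient values are assumptions for illustration only, since the precise statistical model will depend on the sensor characterization described elsewhere herein:

```python
import numpy as np

def predicted_range_error(active_intensity: np.ndarray,
                          ambient_level: np.ndarray,
                          read_noise_dn: float = 4.0,
                          k_sensor_m: float = 0.02) -> np.ndarray:
    """Per-pixel predicted depth error (meters): a simple shot-noise-style model
    in which noise grows with the square root of the total collected signal
    (active + ambient + read noise) while only the active component carries
    depth information, so error scales with the inverse of the effective SNR."""
    noise = np.sqrt(active_intensity + ambient_level + read_noise_dn ** 2)
    snr = np.maximum(active_intensity, 1e-6) / noise
    return k_sensor_m / snr

# The camera output per pixel then becomes (depth, predicted_error) rather than
# (depth, intensity), letting a downstream safety-rated process bound each reading.
```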
Thus, this approach represents an improvement over simple pass/fail criteria as described above, which ignore all depth data for pixels with a signal-to-noise ratio (SNR) below a threshold. With a simple pass/fail approach, depth data is presented as if there is zero measurement error, so a safety-critical process that relies on the integrity of this data may set the SNR threshold sufficiently high that the actual measurement error has no undesired impact at the system level. Pixels with medium to low SNR may still contain useful depth information despite having increased measurement error and are either completely ignored (at a high SNR threshold) or are used with the incorrect assumption of zero measurement error (at a low SNR threshold). Including the measurement error range on a per-pixel basis allows a higher-level safety-critical process to utilize information from pixels with low to mid SNR levels while properly bounding the depth result from such pixels. This may improve overall system performance and uptime over the simple pass/fail approach, although it should be noted that a pass/fail criterion for very low SNR pixels can still be used with this approach.
In accordance with the present disclosure, error detection can take different forms with the common objective of preventing erroneous depth results from being propagated to a higher-level safety-critical process, on a per-pixel basis, without simply setting a threshold for the maximum allowable error (or equivalently minimum required intensity). For example, a pixel's depth can be reported as 0 with a corresponding pixel error code. Alternatively, the depth-compute engine 130 can output the depth along with the expected range error, enabling the downstream safety-rated system to determine whether the error is sufficiently low to permit the pixel to be used.
For example, as described in U.S. Pat. No. 10,099,372 issued on Oct. 16, 2018, the entire disclosure of which is hereby incorporated by reference, a robot safety protocol may involve modulating the robot's maximum velocity (by which is meant modulating the velocity of the robot itself or any appendage thereof) proportionally to the minimum distance between any point on the robot and any point in the relevant set of sensed objects to be avoided. The robot is allowed to operate at maximum speed when the closest object is further away than some threshold distance beyond which collisions are not a concern, and the robot is halted altogether if an object is within a certain minimum distance. Sufficient margin can be added to the specified distances to account for movement of relevant objects or humans toward the robot at some maximum realistic velocity. Thus, in one approach, an outer envelope or 3D zone is generated computationally around the robot. Outside this zone, all movements of, for example, a detected person are considered safe because, within an operational cycle, they cannot bring the person sufficiently close to the robot to pose a hazard. Detection of any portion of the person's body within a second 3D zone, computationally defined (e.g., defined through computational/mathematical processes such as modeling or analysis) within the first zone, does not prohibit the robot from continuing to operate at full speed. If any portion of the detected person crosses the threshold of the second zone but is still outside a third interior zone within the second zone, the robot is signaled to operate at a slower speed. If any portion of the detected person crosses into the innermost zone or is predicted to do so within the next cycle based on a model of human movement, operation of the robot is halted.
In this case, the zones may be adjusted (or the space considered occupied by the detected person may be expanded) based on estimated depth errors. The greater the detected error, the larger the envelope of the zones or the space assumed to be occupied by the detected person will be. In this way, the robot may continue operating based on error estimates instead of shutting down because too many pixels do not satisfy a pass/fail criterion.
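The following sketch illustrates how an estimated depth error might be folded into the zone logic described above; the zone radii, state names, and the simple worst-case subtraction are hypothetical choices rather than the claimed implementation:

```python
def speed_command(min_distance_m: float, depth_error_m: float,
                  middle_zone_m: float = 1.5,
                  inner_zone_m: float = 0.8) -> str:
    """Shrink the measured person-to-robot distance by its estimated error
    (equivalently, grow the occupied envelope) before testing the nested zones,
    so that uncertain pixels make the system more conservative rather than
    forcing an outright stop."""
    worst_case_m = min_distance_m - depth_error_m
    if worst_case_m <= inner_zone_m:
        return "HALT"
    if worst_case_m <= middle_zone_m:
        return "SLOW"
    return "FULL_SPEED"
```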
Because any single image of a scene may contain shimmer and noise, in operation, multiple images of a scene are obtained by both sensors 115 in rapid succession following a frame trigger. These “subframes” are then averaged or otherwise combined to produce a single final frame for each sensor 115. The subframe parameters and timing relative to the frame trigger can be programmable at the system level and can be used to reduce crosstalk between sensors. Programming may include subframe timing to achieve time multiplexing, and also frequency modulation of the carrier. Subframe averaging may increase the SNR, thereby improving system performance.
As indicated in
Some aspects utilize a dark frame (i.e., an image of the scene without illumination) for real-time correction of ambient noise and sensor offset. Often a differential measurement technique that uses multiple subframe measurements to cancel out noise sources is effective. However, by using the dark subframe not only as a measurement of ambient levels but also as a measurement of inherent camera noise, the number of subframes required can be decreased, which increases the amount of signal available for each subframe.
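A minimal sketch of this dark-frame correction combined with subframe averaging is shown below (illustrative only; actual pipelines may weight or validate subframes differently):

```python
import numpy as np

def correct_subframes(subframes: np.ndarray, dark_subframe: np.ndarray) -> np.ndarray:
    """Subtract the unilluminated (dark) capture, which measures both ambient
    light and inherent camera offset, from each illuminated subframe of shape
    (N, H, W), then combine the corrected subframes into a single frame by
    averaging (improving SNR roughly with the square root of N)."""
    corrected = subframes.astype(np.float64) - dark_subframe[np.newaxis, ...]
    return corrected.mean(axis=0)
```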
As illustrated in
Each data path 222 may have multiple DDR interfaces with ECC support to allow for simultaneous reading and writing of memory, but the two data paths 222 are independent. Each of the depth-compute pipelines 2301, 2302 (generally referred to, individually or collectively, as depth-compute pipelines 230) operates in a pipelined fashion such that, after each processing step, a new frame can be started as an earlier frame is completed and intermediate frames are stepwise advanced through the processing path. Data relevant to calibration (e.g., temperature data) may be acquired and passed alongside contemporaneous sensor data to the depth-compute pipelines 230, so that at each processing step, the depth computation is performed in accordance with environmental conditions prevailing when the frame was acquired.
The new images with depth information that emerge after each time step from the depth-compute pipelines are compared by the sensor comparison processing unit 235 as described above and output as Ethernet data.
As described in U.S. Pat. No. 11,543,798, 3D sensor data may be processed to facilitate detection and classification of objects in the monitored space, their velocities, and distances between them. Computation modules in the external computer vision system process the depth images to generate and/or analyze the 3D volume. For example, the system may recognize hazards, e.g., as a person approaches controlled machinery such as a robot, the system issues commands to slow or stop the machinery, restarting it once the person has cleared the area. The computer vision system may also control sensor operation, e.g., triggering the sensors in a sequential fashion so as to prevent crosstalk among them.
In a typical deployment of the illustrated system 200, multiple 3D ToF cameras are mounted and fixed in place around the workspace or object to be measured or imaged (see for example,
In greater detail, and with reference to
Following this calibration step, the same images of the checkerboard used for calibration may be analyzed by conventional stereo calibration software that produces the rotation and translation components of the spatial transform. The checkerboard image obtained by the secondary sensor 215S is transformed using this coordinate transform and the result is compared with the image obtained by the primary sensor 215M (step 330). The result is used as input to the calibration process again as a fine-tuning step. The procedure 300 is repeated until a desired level of convergence in the parameters (i.e., deviation between the transformed and observed image) is achieved.
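One way to express the convergence check of this transform-and-compare loop is sketched below; the residual metric is an illustrative assumption, and the actual calibration software may use a different error measure:

```python
import numpy as np

def transform_residual_mm(corners_secondary_mm: np.ndarray,
                          corners_primary_mm: np.ndarray,
                          R: np.ndarray, t: np.ndarray) -> float:
    """Apply the estimated rotation R (3x3) and translation t (3,) to the 3D
    checkerboard corner positions observed by the secondary sensor and return
    the RMS deviation from the corresponding corners observed by the primary
    sensor; the calibration is iterated until this residual converges."""
    transformed = corners_secondary_mm @ R.T + t
    err = transformed - corners_primary_mm
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```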
Range calibration is employed to minimize error in the range value reported by each pixel of the sensors 215. For example, a range correction may be computed for every pixel for each of the raw data modes (various illumination patterns and illumination time windows) of the sensors 215. Most 3D cameras have an inherent property called fixed pattern phase noise (FPPN), which introduces a fixed offset value for the distance reported by each pixel. In order to make the system 200 report the correct distance, each sensor 215 is calibrated as herein described.
A representative secondary calibration procedure, which includes range calibration and adjustment for temperature, is illustrated within the broader calibration procedure 400 in
Other metadata may also be captured, such as the subframe expected background image, which may be used for real-time monitoring of camera measurement stability. Each camera 100 can trigger a frame or subframe exposure at varying illumination frequencies and illumination levels, including the dark level captured by the camera under no illumination. Through the external subframe sync 150, multiple 3D ToF cameras can be triggered at different frequencies and illumination levels to minimize interference and lower the latency of all the 3D ToF cameras in the workcell. By coordinating the overall timing of the cameras (to ensure that only one is illuminating the scene at a time), typically by an external computer vision system as described above, latency between all the cameras can be reduced and acquisition frequency increased.
As noted, the range data produced by an image sensor is generally temperature dependent. In accordance with the present disclosure the dependency may be empirically approximated linearly and used to recalculate the range values as if they were produced at a fixed reference temperature, e.g., 25° C. (
The linear relationship may be given by

C(T0) = D* − D(TC) + k·(TC − T0)
where C(T0) is the FPPN calibration value to be stored on the EEPROM and used for the range correction at a reference temperature T0 (e.g., 25° C.), TC is the on-sensor temperature as actually measured by a thermometer within or close to the sensor 215 in the system (e.g., camera) 200, D* is the theoretically calculated true value of the range distance, D(TC) is the range value directly calculated from the raw sensor data during the calibration at temperature TC, and k is a coefficient whose value depends on the sensor and the modulation frequency mode and may be obtained empirically without undue experimentation. In some aspects, since this coefficient depends on the attributes of the sensor 215 and the modulation frequency employed, there are four different coefficients k, i.e., for the primary and secondary sensors 215M, 215S and for each of the two modulation frequencies. The additional term k·(TC−T0) is added when computing the FPPN calibration value C(T0), i.e., the range offset. In particular, FPPN calibration involves gathering a number of frames for each angular orientation (pose) of the sensor. The range values for each frame are averaged, and the average range reading serves as D(TC) in the equation above. Correspondingly, the on-sensor temperature is acquired for each frame, and these values are averaged to obtain a general temperature value TC for the given pose. The process is repeated for each pose of the system 200.
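The calibration-time and run-time application of this linear temperature model can be sketched as follows; the run-time form shown is one algebraically consistent way to apply the stored offset, and a single coefficient k is used for brevity even though, as noted, four coefficients may be maintained in practice:

```python
import numpy as np

T_REF_C = 25.0  # reference temperature T0

def fppn_offset(d_true_m: np.ndarray, d_meas_m: np.ndarray,
                temp_cal_c: float, k_m_per_c: float) -> np.ndarray:
    """Per-pixel FPPN calibration value C(T0) computed at calibration
    temperature TC and referred back to the reference temperature T0:
    C(T0) = D* - D(TC) + k * (TC - T0)."""
    return d_true_m - d_meas_m + k_m_per_c * (temp_cal_c - T_REF_C)

def apply_range_correction(d_raw_m: np.ndarray, c_t0: np.ndarray,
                           temp_now_c: float, k_m_per_c: float) -> np.ndarray:
    """Run-time correction consistent with the linear model above; when the
    operating temperature equals the calibration temperature, the corrected
    range reduces to the theoretically true value D*."""
    return d_raw_m + c_t0 - k_m_per_c * (temp_now_c - T_REF_C)
```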
The resulting calibration parameters (i.e., the lens parameters and calibration maps) are uploaded to a non-volatile programmable read-only memory (PROM) 2451, 2452 of each sensor 215 (step 440). Alternatively, the PROMs 245 may be implemented as more easily modified memory, e.g., Flash memory. The calibration maps necessary for the correct range calculation are applied internally by the FPGA 210. After completion of the calibration (and, in some aspects, following a validation procedure that confirms the calibration on a benchmarking arrangement), the camera 200 is brought into production mode whereby it is made fully operational for customers (step 445).
Calibration can be adjusted not only for camera-specific performance differences but also to characterize interference between cameras in a multiple-camera configuration. During initialization, one camera at a time illuminates the scene and the other cameras determine how much signal is received. This procedure facilitates the creation of an interference matrix, which may be employed (e.g., by an external computer vision system as described above) to determine which cameras can illuminate at the same time. Alternatively, this approach can also be used to create a real-time correction similar to crosstalk correction techniques used for electronic signal transmission. In particular, multiple cameras may cooperate with each other (in, for example, an ad hoc network or with one camera designated as the primary and the others operating as secondaries) to sequentially cause each of the cameras to generate an output while the other cameras are illuminating their fields of view, and may share the resulting information to build up, and share, the interference matrix from the generated outputs. Alternatively (and more typically), these tasks may be performed by a supervisory controller (e.g., the external computer vision system) that operates all cameras.
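The construction of such an interference matrix might be sketched as follows; the illuminate_only(), stop_illumination(), and measure_received_signal() methods are hypothetical placeholders for whatever camera control interface is actually used:

```python
import numpy as np

def build_interference_matrix(cameras, signal_threshold: float) -> np.ndarray:
    """During initialization, let one camera at a time illuminate the scene
    while all others measure how much of that illumination they receive.
    Entry [i, j] is True when camera j receives significant signal from
    camera i's illuminator, meaning the two should not illuminate at the
    same time (or should be crosstalk-corrected)."""
    n = len(cameras)
    interferes = np.zeros((n, n), dtype=bool)
    for i, emitter in enumerate(cameras):
        emitter.illuminate_only()
        for j, receiver in enumerate(cameras):
            if i == j:
                continue
            interferes[i, j] = receiver.measure_received_signal() > signal_threshold
        emitter.stop_illumination()
    return interferes
```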
The depth-compute pipeline utilizes these data along with the streaming frame data as well as data characterizing the sensor's fixed noise properties in computing depth and error as described above. When the camera 200 is powered up, the corresponding FPGA flash image is activated by the camera's operating system. During the initialization stage, the operating system causes calibration parameters and other data to be retrieved from the boot PROMs 2451, 2452 and copied into the relevant registers (e.g., camera characterization parameters) or into the DDR memory banks 2171, 2172 (e.g., calibration maps). Following initialization, the system 200 is switched into a “ready” state and is ready for UDP communication with external control devices.
In accordance with the present disclosure, the following data may be stored in the boot PROMs 2451, 2452; each data field may be protected against errors on the communication channel using, for example, a cyclic redundancy check:
Optionally:
During run time, the depth-compute engine 230 accesses the calibration data in real time from DDR3 memory as needed. In particular, real-time recalibration adjusts, in a conventional fashion, for drift of operating parameters such as temperature or illumination levels during operation. Health and status monitoring information may also be sent after every frame of depth data, and may include elements such as temperatures, pipeline error codes, and FPGA processing latency margins as needed for real-time recalibration.
Data flows from each sensor 215 through a data reception path in the FPGA 210 and into the associated DDR 217. The data is stored in the DDR 217 at a subframe level. Once a depth-compute engine 230 recognizes that a full subframe has accumulated in the associated DDR 217, it starts pulling data therefrom. Those pixels flow through the depth-compute engine 230 and are stored back in the associated DDR 217 as single-frequency depth values. These contain ambiguous depth results that need to be resolved later in the pipeline via comparison. Accordingly, as soon as the first three or more subframes needed for calculating the first single-frequency result are available in the DDR 217, the associated depth-compute engine 2301, 2302 (see also 1301, 1302, in
However, rather than loading the second single-frequency depth result into memory as it is calculated, it is processed along with the first single-frequency depth result on a pixelwise basis to produce an unambiguous depth result. This result is then stored in memory as an intermediate value until it can be further compared to the second unambiguous depth result obtained from the third and fourth single-frequency depth results. This process is repeated until all the relevant subframes are processed. As a last step, all intermediate results are read from the DDR and final depth and intensity values are calculated.
An operating timer 250 (once again shown as an internal component for convenience, but which may be implemented externally) may be included to keep track of the hours of camera operation, periodically sending this data to the user via the communication layer 240. The calibration unit 242 may also receive this information to adjust operating parameters as the camera illumination system and other components age. Moreover, once the aging limit for VCSELs is reached, the timer 250 may produce an error condition to alert the user that maintenance is required.
The features described above address various possible failure modes of conventional 3D cameras or sensing systems, such as multiple exposures or common mode failures, enabling operation in safety-rated systems. The system may include additional features for safety-rated operation. One such feature is over/under monitoring of every voltage rail by a voltage monitor so that, if a failure condition is detected, the camera may be turned off immediately. Another is the use of a safety-rated protocol for data transmission between the different elements of the 3D ToF camera and the external environment, including the external sync. Broadly speaking, a safety-rated protocol will include some error checking to ensure that bad data does not get propagated through the system. It is possible to create a safety-rated protocol around a common protocol, such as UDP, which supports high bandwidths but is not inherently reliable. This is accomplished by adding features that effect a desired safety-rating, such as packet enumeration, CRC error detection, and frame ID tagging. These assure that the current depth frame is the correct depth frame for further downstream processing after the frame data is output from the camera.
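A rough sketch of wrapping depth-frame data with packet enumeration, frame ID tagging, and CRC error detection on top of UDP is shown below; the header layout and field widths are illustrative assumptions rather than a defined protocol:

```python
import struct
import zlib

SEQ = 0  # module-level packet enumeration counter

def wrap_safety_frame(frame_id: int, payload: bytes) -> bytes:
    """Prefix the depth-frame payload with a sequence counter and frame ID,
    and append a CRC-32 over header and payload so the receiver can reject
    corrupted, duplicated, or out-of-order data."""
    global SEQ
    SEQ = (SEQ + 1) & 0xFFFFFFFF
    header = struct.pack("!II", SEQ, frame_id)
    crc = zlib.crc32(header + payload) & 0xFFFFFFFF
    return header + payload + struct.pack("!I", crc)

def unwrap_safety_frame(packet: bytes):
    """Return (sequence, frame_id, payload), raising ValueError on CRC mismatch
    so that bad data is not propagated downstream."""
    header, payload = packet[:8], packet[8:-4]
    crc_rx = struct.unpack("!I", packet[-4:])[0]
    if zlib.crc32(header + payload) & 0xFFFFFFFF != crc_rx:
        raise ValueError("CRC mismatch: discard frame")
    seq, frame_id = struct.unpack("!II", header)
    return seq, frame_id, payload
```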
A common failure mode of cameras with active optical sensors that depend on reflection, such as LIDAR and ToF cameras, is that they do not return any signal from surfaces that are insufficiently reflective, and/or when the angle of incidence between the sensors and the surface being detected is too shallow. This may lead to undesired failure because the level of the signal may be indistinguishable from the one measured if no obstacle is encountered. The sensor, in other words, will report an empty volume despite the possible presence of an obstacle.
For various reasons, on any given imaged frame, a small fraction of the pixels in a ToF image may provide an incorrect range or distance (the terms distance and range are used interchangeably herein) measurement or no range measurement at all. For non-safety applications, such incorrect pixels are typically not a major concern. However, for safety applications, an incorrect pixel could result in a condition that may be a hazard.
Many safety applications involve determining the nearest possible location of a person to a machine, such as an industrial robot, and sending an emergency stop signal to the machine if that person is too close. In such applications, failure to detect an object that is actually present in the volume can create an undesirable situation when the object is a human and nearby machinery is operating such that the distance between the human and the machine is less than a predetermined protective separation distance.
This is why ISO standards for, e.g., 2D LIDAR sensors specify the minimum reflectivity of objects that must be detected; these reflectivity requirements, however, can be difficult to meet for some 3D sensor modalities such as ToF. In a representative workcell (such as those illustrated in
U.S. Pat. Nos. 10,099,372, 10,899,007 and 11,279,039 provide a methodology (using two or more sensors that can sense a given volume) in which, if one sensor does not see a return, a second (or additional) sensor can affirmatively receive a return from the same volume, confirming whether the space is actually empty or occupied.
A simpler approach (using a single sensor) to guaranteeing a return signal that meets safety-rated standards (such as, for example, IEC 61496-3) is to increase the intensity of the illumination: by using a more powerful light source, by driving a lower-power light source harder (i.e., increasing its power output), by using multiple lower-power light sources in parallel, by increasing the sensitivity of the light-sensing element, and/or by decreasing the field of view over which the light source is diffused so as to increase the illumination power density; as noted herein, the light source may be a VCSEL laser source or any other suitable light source. But this generates its own set of issues that must be addressed for operation of the machinery in accordance with the applicable standards, such as those described herein.
The imaging distortions introduced by more powerful light sources include pixel-level saturation caused by the higher dynamic range of the stronger signal return, as well as distortion of adjacent pixels caused by that saturation. The present disclosure provides methods and techniques that may correct these issues.
In addition to pixels providing no range data due to an insufficient return, a pixel may provide a range measurement that is incorrect. If an incorrect pixel results in an object being observed as farther from the machine than it actually is, the system may fail to send an emergency stop signal, which could result in an undesired situation; if the object is observed as closer to the machine than it actually is, the result may simply be unnecessary stops of the machine.
When using time-of-flight sensors for safety applications, it is typically desired to utilize various strategies for detecting and discarding invalid measurements. One strategy (as presented in U.S. Pat. Nos. 10,887,578 and 10,887,579) is to employ two redundant ToF imagers in the same camera and compare their results: if the results from the two imagers agree, the values can be used; if they disagree, they are discarded.
Discarding invalid measurements can result in unnecessarily conservative conclusions that lower productivity. For example, if a pixel is discarded, a solution would be to assume that the entire volume of space that would have been monitored by that pixel may be occupied. Especially when a ToF sensor is being used at relatively long range to monitor a large space, the volume monitored by a single pixel may be large enough to contain an entire person or at least a person's hand or arm. In this case, a single bad pixel could result in an undesired emergency stop.
In many cases where a pixel is flagged as invalid or suspect by one aspect of the system (such as the system 100 described herein), for example because of low intensity or a discrepancy with a second imager, additional elements of the sensor's data corresponding to secondary physical properties of the observed scene can be used to reconstruct a correct or conservative range value for the volume observed by that pixel. Such strategies are often desired to achieve a sufficiently low rate of both unnecessary stoppage of the machine and failure to stop the machine when desired for a target application.
Aspects of the present disclosure relate to systems and methods for three-dimensional ToF imaging of a large and diverse scene demanding reliability and accuracy for safety-rating purposes. Methods and techniques have been developed that may increase pixel-level range accuracy, accurately measure range for low-intensity pixels and for pixels that receive illumination reflected from multiple surfaces, and reduce or substantially eliminate various sources of range error.
One common source of error in ToF measurements occurs when a single pixel observes reflected light from multiple surfaces. This can occur at edges, such as where the field of view of the single pixel includes, for example, both the edge of a table and the floor beneath that table (or any other suitable edge of an object). In that case, some of the photons received by the pixel will have reflected off the table, while other photons will have reflected off the floor (see
This reflected light issue can be mitigated by exploiting the periodic range ambiguity inherent in ToF imaging. This periodic range ambiguity in meters is given by 150/MF (i.e., the speed of light divided by 2·MF), where MF is the modulation frequency of the sensor in MHz. As a result, an object positioned at x meters from the sensor will be indistinguishable from an object positioned at an integer number of ambiguity distances plus x. One method for addressing this is to take measurements at two frequencies f1, f2, resulting in measurements with different range or distance ambiguities that can be used to disambiguate each other. For example, a sensor with a frequency f1 of 50 MHz illumination will be able to measure a maximum of 3 m. Any distance greater than 3 m will result in a measurement between 0 and 3 m. In accordance with the present disclosure, and with reference to
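A minimal sketch of this two-frequency disambiguation is shown below; the second modulation frequency, agreement tolerance, and maximum range are example assumptions:

```python
def ambiguity_m(mod_freq_mhz: float) -> float:
    """Unambiguous range in meters: 150 / MF (i.e., c / (2 * MF))."""
    return 150.0 / mod_freq_mhz

def disambiguate(d1_m: float, d2_m: float,
                 f1_mhz: float = 50.0, f2_mhz: float = 60.0,
                 max_range_m: float = 15.0, tol_m: float = 0.10):
    """Search the candidate distances d_i + n * ambiguity_i for the pair that
    agrees within tol_m; return their mean, or None if no consistent pair
    exists (which flags the pixel as suspect)."""
    a1, a2 = ambiguity_m(f1_mhz), ambiguity_m(f2_mhz)
    best = None
    n1 = 0
    while d1_m + n1 * a1 <= max_range_m:
        c1 = d1_m + n1 * a1
        n2 = 0
        while d2_m + n2 * a2 <= max_range_m:
            c2 = d2_m + n2 * a2
            if abs(c1 - c2) <= tol_m and (best is None or abs(c1 - c2) < best[0]):
                best = (abs(c1 - c2), 0.5 * (c1 + c2))
            n2 += 1
        n1 += 1
    return None if best is None else best[1]

# Example: a target at 7.2 m measured at 50 MHz (3 m ambiguity) reads 1.2 m and
# at 60 MHz (2.5 m ambiguity) reads 2.2 m; the only candidate consistent with
# both readings below 15 m is ~7.2 m.
```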
Referring to
The controller 110, 112, 210, 512 is communicably connected to the at least two three-dimensional sensors 115M, 115S, 215M, 215S to receive pixelwise data PWD from each sensor 115M, 115S, 215M, 215S, the pixelwise data PWD embodying intensity and distance information from the illuminated objects. The controller 110, 112, 210, 512 is configured (e.g., with any suitable non-transitory computer program code) so as to disambiguate a distance to each of the illuminated objects, resolve error in the received pixelwise data PWD due to periodic distance ambiguity, and determine corrected pixelwise values CPWD indicative of true distance Td via each sensor illumination at the at least two different frequencies f1, f2.
In the image processing system, the controller 110, 112, 210, 512 is configured: so as to disambiguate the distance and resolve error in the received pixelwise data PWD of each sensor 115M, 115S, 215M, 215S, and determine corrected pixelwise values CPWD of the output array of pixelwise values PWV of each sensor 115M, 115S, 215M, 215S indicative of the true distance Td; and/or so that each sensor illumination at the at least two different frequencies f1, f2 describes a phase space in the corresponding field of view FOV that characterizes the relationship (within the corresponding field of view FOV) of different measured intensities (Ii) and corresponding measured distances/ranges (Ri) (or measured distances from the sensor 115M, 115S, 215M, 215S or any suitable reference datum/origin from which the distance is measured) of each object embodied in the pixelwise data PWD registered (e.g., recorded in any suitable memory) by the controller 110, 112, 210, 512 at the at least two different frequencies f1, f2. The phase space relationship is programmed in the controller 110, 112, 210, 512 and characterizes the relation between differences in the measured intensities (ΔIi,i+1) and differences of the measured distances (ΔRi,i+1) corresponding to the measured intensities (Ii) in the pixelwise data PWD.
In the image processing system, the controller 110, 112, 210, 512 is programmed to one or more of: identify (or quantify) discrepancies in the measured distances (Ri) from the measured intensities (Ii), the differences in the measured intensities (ΔIi,i+1), the measured distances (Ri), and the differences of the measured distances (ΔRi,i+1), and calculate a distance error in the measured distance (Ri); and determine a true distance Td value from the measured distance (Ri) and the distance error.
In practice, the two intensity and range (depth/distance) measurements at the two different frequencies f1, f2 will differ slightly because of different responses at those frequencies f1, f2 to secondary sources of error such as light reflecting from two or more surfaces (multipath and edge effects) and harmonic distortion resulting from the fact that such sensors typically emit light in a square wave rather than a true sine wave. These secondary differences can be employed to further reduce error in the resulting combined measurement.
This can be accomplished as follows. Consider two sets of intensity and range measurements, Ia, Ra, Ib, and Rb. The discrepancies between the intensity values and between the range values can be respectively denoted as (Iratio=Ia/Ib) and (ΔR). Theoretical calculations developed in accordance with the present disclosure provide a phase space in which the discrepancies Iratio and ΔR define the range measurement error and the relative intensity of the direct and multi-path optical paths. This phase space produces a 1:1 correspondence between the Iratio and ΔR values and the length and intensity of the multi-path optical path(s). Thus, given these discrepancies, it is possible to compute exactly how the data is affected by multi-path optical paths and to correct for it, allowing the system 100 to produce more accurate data.
In an ideal scenario, the intensity and unambiguous range measured at the two frequencies f1, f2 agree identically. However, in the presence of multi-path optical paths, there may be an error between the two ranges or distances (each range being measured at a respective one of the two frequencies f1, f2) and an error between the two intensities. The source of the multi-path optical paths (its intensity and range) will change the two error signals; hence, the error er(Ia, Ib), er(Ra, Rb) in the measured signals (Ia, Ra), (Ib, Rb) for each corresponding frequency f1, f2 can be determined, and the true range or distance Td of the object identified.
Accordingly, the present disclosure proposes a method in which all four pieces of measured information (Ia, Ra), (Ib, Rb), recast as the intensity average (Ia+Ib)/2, range average (Ra+Rb)/2, intensity difference (Ia−Ib), and range difference (Ra−Rb), are used to estimate four unknowns: the true range of the target, the intensity of the target, the multipath signal range, and the multipath signal intensity. Linear-algebra multivariable equation methods or nonlinear multivariate inverse methods may be employed to extract the true range and intensity of the target from the intensity and range averages and the intensity and range differences between the two frequencies.
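One possible realization of such a nonlinear multivariate inverse is sketched below, assuming a single dominant multipath return and a generic least-squares solver; the phasor forward model, starting point, and bounds are illustrative assumptions, and convergence will depend on initialization:

```python
import numpy as np
from scipy.optimize import least_squares

C = 299_792_458.0  # speed of light, m/s

def phasor(intensity, range_m, f_hz):
    """Complex phasor corresponding to an (intensity, range) pair at
    modulation frequency f_hz."""
    return intensity * np.exp(-1j * 4 * np.pi * f_hz * range_m / C)

def residuals(params, measured_phasors, freqs):
    """Difference between the two-return model (direct target plus one
    multipath return) and the measured phasors, split into real/imag parts:
    four equations for the four unknowns."""
    r_t, a_t, r_m, a_m = params
    out = []
    for m, f in zip(measured_phasors, freqs):
        predicted = phasor(a_t, r_t, f) + phasor(a_m, r_m, f)
        out.extend([(predicted - m).real, (predicted - m).imag])
    return out

def solve_true_range(Ia, Ra, Ib, Rb, f1_hz, f2_hz, x0=(2.0, 1.0, 4.0, 0.3)):
    """Estimate (true range, true intensity, multipath range, multipath
    intensity) from the four measured quantities at the two frequencies."""
    measured = [phasor(Ia, Ra, f1_hz), phasor(Ib, Rb, f2_hz)]
    sol = least_squares(residuals, x0, args=(measured, (f1_hz, f2_hz)),
                        bounds=([0.0, 0.0, 0.0, 0.0],
                                [30.0, np.inf, 60.0, np.inf]))
    return sol.x  # (R_true, I_true, R_multipath, I_multipath)
```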
Referring to
The method may include one or more of the following, in any suitable combination thereof and/or in combination with any of the features described herein: with the controller 110, 112, 210, 512, disambiguating the distance and resolving error in the received pixelwise data PWD of each sensor 115M, 115S, 215M, 215S, and determining corrected pixelwise values CPWD of the output array of pixelwise values PWV of each sensor 115M, 115S, 215M, 215S indicative of the true distance Td; the controller 110, 112, 210, 512 is configured so that each sensor illumination at the at least two different frequencies f1, f2 describes a phase space in the corresponding field of view FOV that characterizes the relationship (within the corresponding field of view FOV) of different measured intensities (Ii) and corresponding measured distances/ranges (Ri) (or measured distances from the sensor 115M, 115S, 215M, 215S or any suitable reference datum/origin from which the distance is measured) of each object embodied in the pixelwise data PWD registered (e.g., recorded in any suitable memory) by the controller 110, 112, 210, 512 at the at least two different frequencies f1, f2; the phase space relationship is programmed in the controller 110, 112, 210, 512 and characterizes the relation between differences in the measured intensities (ΔIi,i+1) and differences in the measured distances (ΔRi,i+1) corresponding to the measured intensities (Ii) in the pixelwise data PWD; with the controller, identifying discrepancies in the measured ranges (Ri) from the measured intensities (Ii), the differences in the measured intensities (ΔIi,i+1), the measured distances (Ri), and the differences in the measured distances (ΔRi,i+1), and calculating a distance error in the measured distance (Ri); and determining, with the controller 110, 112, 210, 512, a true distance Td value from the measured distance (Ri) and the distance error.
As mentioned above, one cause of invalid measurements is a pixel whose intensity is too low to provide a valid range measurement. This is particularly likely to occur with small, low-reflectivity features. In a camera that includes at least two imagers (a primary imager and a secondary imager), such as the one described in U.S. Pat. Nos. 10,887,578 and 10,887,579, the intensity values of the pixels from the primary imager and the secondary imager corresponding to a particular point can be used, via stereo vision techniques, to replace or estimate the distance from that point to the camera. Small, low-reflectivity features in a high frequency neighborhood are particularly likely to provide valid stereo information, as they stand out in an otherwise higher reflectivity image.
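As a purely illustrative sketch of this idea, the following Python fragment block-matches the two imagers' intensity images and uses the resulting stereo depth only where the ToF range is invalid. The choice of OpenCV's StereoBM matcher, the assumption of rectified images, and the `focal_px`/`baseline_m` parameters are assumptions for the sketch, not the specific implementation of the referenced patents.

```python
# Hedged sketch: fill invalid ToF ranges with stereo depth derived from the
# primary and secondary imagers' intensity images.
import numpy as np
import cv2

def fill_invalid_ranges(range_map, intensity_primary, intensity_secondary,
                        valid_mask, focal_px, baseline_m):
    """Replace invalid ToF ranges with stereo-derived depth where possible."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=9)
    left = cv2.normalize(intensity_primary, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    right = cv2.normalize(intensity_secondary, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point output
    stereo_depth = np.where(disparity > 0,
                            focal_px * baseline_m / np.maximum(disparity, 1e-6), 0.0)
    filled = range_map.copy()
    fill_here = (~valid_mask) & (stereo_depth > 0)   # only where ToF failed and stereo succeeded
    filled[fill_here] = stereo_depth[fill_here]
    return filled, fill_here
```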
The replacement approach would work like this: A point O projects onto (without loss of generality) a primary imager's pixel P1 (
The estimation approach would work like this: If P1 and P2 both have invalid range data, a new range can be estimated (
This use of stereo depth from intensity increases the signal to noise ratio. The stereo depth signal can be considered a third source of range in addition to the two direct versions of range provided by each of the ToF imagers/sensors 115M, 115S (
Referring to
In
The present disclosure provides for a method of validating uncertain pixelwise value(s) of at least one pixel of an output array of pixelwise values of at least one of the sensors 115M, 115S (
The method includes one or more of the following, in any suitable combination thereof and/or in any suitable combination with the features described herein: the pixelwise distance SD resolved from binocular images BM forms a third measure of confidence relative to the pixelwise values of the output array of pixelwise values PWV; the pixelwise distance SD resolved from binocular images BM is determined substantially simultaneously with obtaining the pixelwise values of the output array of pixelwise values PWV; each of the at least two three-dimensional sensors 115M, 115S (
Another source of invalid pixels is edge effects. Edge effects occur when a particular pixel or group of pixels is simultaneously imaging surfaces of materially different depths, where the depth signal from that pixel or group of pixels is effectively the “average” of both depths (see
In particular, high frequency gradient pixels HFGP can cause the appearance of objects where no object exists, which may result in false positives. However, those apparent objects cannot simply be ignored, because the actual locations of the surfaces 750A, 750B that caused those reflections could represent a potential hazard.
Edge effects worsen as the distance to the return surface 750A, 750B increases. The higher the cameras 100 are mounted, the higher the probability of edge effects, and the lower the probability that other cameras 100 overlap and compensate for the edge-effect error. Hence longer-range cameras, such as those with high-power illumination, are likely to exhibit worse edge effects. Using a single camera also prevents the use of other sensors to correct the edge effects. However, it is possible to identify a range value for high frequency gradient pixels HFGP that can be used for safety.
Referring to
The spot size imaged by a pixel grows larger as the distance to the return surface 750A, 750B increases; hence the high frequency gradient pixel analysis done by the depth-compute engine 130 has to account for both the distance to the return surface and the spot size. The gradient calculation is performed in real time, requiring computation cycles from the imaging computer or processing unit 110.
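One possible realization of this analysis, together with the distance-dependent lookup-table correction described below, is sketched here. The Sobel edge detector, the 3x3 neighborhood, the gradient threshold, and the placeholder calibration arrays `lut_distances`/`lut_corrections` are assumptions for illustration only.

```python
# Minimal sketch: locate high-frequency-gradient pixels in the depth map with an
# edge-detector convolution, then assign each such pixel the minimum depth of
# its neighbours, adjusted by a distance-dependent edge-effect correction.
import numpy as np
from scipy.ndimage import sobel, minimum_filter

def correct_edge_effects(depth_map, grad_threshold, lut_distances, lut_corrections):
    """Return a depth map where high-gradient (edge-effect) pixels take a safe minimum value."""
    # Depth-gradient magnitude via Sobel convolution (the "edge detector").
    gx = sobel(depth_map, axis=1)
    gy = sobel(depth_map, axis=0)
    hfgp = np.hypot(gx, gy) > grad_threshold          # high frequency gradient pixels

    # Minimum depth in the 3x3 neighbourhood of every pixel.
    neighbour_min = minimum_filter(depth_map, size=3)

    # Distance-dependent edge-effect correction from a calibration lookup table
    # (lut_distances/lut_corrections are placeholders for real calibration data).
    correction = np.interp(neighbour_min, lut_distances, lut_corrections)

    corrected = depth_map.copy()
    corrected[hfgp] = (neighbour_min - correction)[hfgp]  # conservative (closer) value
    return corrected, hfgp
```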
The present disclosure may employ a calibration step to determine the edge effect correction as a function of distance. Hence, the correction for spot size is not computed in real time but is instead applied using a lookup table LUP (see
Still referring to
The controller 110, 112, 210, 512 is programmed with a look up table LUP (see
In accordance with the present disclosure, and referring to
In the method, the controller 110, 112, 210, 512 is programmed with a look up table LUP (see
Another source of error is associated with dynamic range. Dynamic range error refers to the fact that brighter, higher-reflectivity surfaces produce a slightly different range error than lower-reflectivity surfaces at the same distance, due to non-linearities in sensor components. This is because sensors typically have a slightly non-linear relationship between the number of received photons and the measured intensity. For example, doubling the illumination returned from a low-reflectivity surface may cause only a 1.9× increase in the measured intensity, due to imperfections in the sensor hardware.
This dynamic range effect is expected to be the same for each pixel in the camera 100. As such, this error can be characterized by measuring the computed dynamic range for materials of diverse reflected intensities during a calibration process performed on each sensor/camera 100 at manufacturing time (
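A hedged sketch of such a calibration and run-time correction is shown below; the polynomial bias model, its degree, and the calibration inputs are illustrative assumptions, the only requirement from the description above being that the intensity-dependent range bias is characterized once per sensor and then applied to every frame.

```python
# Hedged sketch of factory calibration and run-time correction for the
# intensity-dependent range bias (dynamic range effect).
import numpy as np

def fit_dynamic_range_bias(calib_intensities, measured_ranges, true_range, degree=3):
    """Fit range bias as a function of measured intensity from calibration frames
    of targets with different reflectivities placed at a known distance."""
    bias = np.asarray(measured_ranges) - true_range
    return np.polyfit(np.asarray(calib_intensities), bias, degree)  # bias-model coefficients

def correct_dynamic_range(range_map, intensity_map, bias_coeffs):
    """Subtract the intensity-dependent bias from every pixel of a depth frame."""
    bias = np.polyval(bias_coeffs, intensity_map)
    return range_map - bias
```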
Referring to
Referring to
Another source of error arises because optical imagers are subject to a blurring function caused by imperfections in the optical and sensor components used. This blur is described by a point spread function (PSF). Blur is particularly important in ToF sensors, where it can hide small objects and cause edge effects. Thus, it is important to minimize the PSF through careful lens design, and to correct for it when processing the sensor data.
In sensor design, it is important to consider the path of light through the entire imager and to minimize the PSF. However, it may not be possible to fully eliminate the PSF due to the physical constraints of optical systems; as such, it is desirable to correct for the remaining effects in software. The PSF can be determined in at least three ways. First, the path of light through the sensor can be mathematically computed from the lens design, giving a mathematically determined PSF. Second, the path can be simulated using optical software and a model of the lens, similarly yielding a theoretical PSF. Finally, the PSF can be measured during sensor calibration; measured data has the advantage of including lens focusing and electrical leakage effects. To measure the PSF, one of several approaches can be used. A sharp, angled edge can be imaged by the sensor, and the PSF determined by measuring the blur across that edge; however, this assumes that the PSF is circular and constant over the imager. If that is not the case, a point source of light synchronized with the sensor's emission can be used; the point source produces a blurred region in the image from which the PSF can be determined. If a synchronous light source (or sufficiently reflective surface) is not obtainable, a “dark frame” in which the only illumination comes from a constant point source of light can be used, and the PSF is similarly calculated.
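As an illustration of the point-source measurement option, the following sketch estimates the PSF from a frame in which the only illumination is a small point source; the window size and the median-based background subtraction are assumptions made for the sketch.

```python
# Hedged sketch: estimate the PSF from a point-source (or dark-frame) image by
# cropping the blurred spot around the brightest pixel and normalizing it.
import numpy as np

def estimate_psf_from_point_source(frame, half_window=8):
    """Crop the blurred spot around the brightest pixel and normalize it to unit sum."""
    background = np.median(frame)                     # crude background level
    clean = np.clip(frame - background, 0, None)
    y, x = np.unravel_index(np.argmax(clean), clean.shape)
    patch = clean[max(y - half_window, 0):y + half_window + 1,
                  max(x - half_window, 0):x + half_window + 1].astype(np.float64)
    return patch / max(patch.sum(), 1e-12)            # PSF estimate (sums to 1)
```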
Once the PSF has been obtained, it can be used to correct the image at run-time, such as by the processing unit 110. In the noiseless case, this is as simple as convolving the inverse of the PSF with the image. However, on real data this amplifies noise. A wide range of alternative approaches can be used, including using a low-pass filter on the data before deconvolution, or using iterative approaches to minimize the difference between the image and the expected “uncorrupted” version of the image, which can be computed using the PSF.
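One concrete example of such an approach, offered here only as an illustrative sketch, is Wiener-style (regularized inverse-filter) deconvolution, which behaves like inverse filtering combined with the low-pass safeguard against noise amplification mentioned above; the regularization constant is an assumption.

```python
# Hedged sketch: Wiener-style deconvolution of a sensor image by its PSF.
import numpy as np

def wiener_deconvolve(image, psf, noise_to_signal=1e-2):
    """Deconvolve `image` by `psf` using a regularized inverse filter."""
    psf_padded = np.zeros_like(image, dtype=np.float64)
    psf_padded[:psf.shape[0], :psf.shape[1]] = psf
    # Center the kernel so the deconvolution does not shift the image.
    psf_padded = np.roll(psf_padded, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    H = np.fft.fft2(psf_padded)
    G = np.fft.fft2(image.astype(np.float64))
    W = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)   # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```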
Referring to
Referring to
The following are provided in accordance with the present disclosure and may be employed individually, in any combination with each other, and/or in any combination with the features described herein:
An image processing system is provided and includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view, each sensor being configured to generate illumination at least at two different frequencies so that objects in the corresponding field of view are illuminated at the two different frequencies; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor embodying intensity and distance information from the illuminated objects, and the controller is configured so as to disambiguate distance to each of the illuminated objects, resolve error in the received pixelwise data due to periodic distance ambiguity, and determine corrected pixelwise values indicative of true distance via each sensor illumination at the at least two different frequencies.
The image processing system includes one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is configured so as to disambiguate the distance and resolve error in the received pixelwise data of each sensor, and determine corrected pixelwise values of the output array of pixelwise values of each sensor indicative of the true distance; the controller is configured so that each sensor illumination at the at least two different frequencies describes a phase space in the corresponding field of view that characterizes the relationship of different measured intensities and corresponding measured distances, of each object embodied in the pixelwise data registered by the controller at the at least two different frequencies; the phase space relationship is programmed in the controller and the phase space relationship characterizes the relation between differences in the measured intensities and differences in the measured distances corresponding to the measured intensities in the pixelwise data; the controller is programmed to identify discrepancies in measured distances from the measured intensities, the differences in the measured intensities, the measured distances, and the differences of the measured distances, and calculate a distance error in the measured distance; and the controller is programmed to determine a true distance value from the measured distance and distance error; the three-dimensional sensor is a time-of-flight sensor.
An automated logistic system including the image processing system, where the image processing system includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view, each sensor being configured to generate illumination at least at two different frequencies so that objects in the corresponding field of view are illuminated at the two different frequencies; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor embodying intensity and distance information from the illuminated objects, and the controller is configured so as to disambiguate distance to each of the illuminated objects, resolve error in the received pixelwise data due to periodic distance ambiguity, and determine corrected pixelwise values indicative of true distance via each sensor illumination at the at least two different frequencies. The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is configured so as to disambiguate the distance and resolve error in the received pixelwise data of each sensor, and determine corrected pixelwise values of the output array of pixelwise values of each sensor indicative of the true distance; the controller is configured so that each sensor illumination at the at least two different frequencies describes a phase space in the corresponding field of view that characterizes the relationship of different measured intensities and corresponding measured distances, of each object embodied in the pixelwise data registered by the controller at the at least two different frequencies; the phase space relationship is programmed in the controller and the phase space relationship characterizes the relation between differences in the measured intensities and differences in the measured distances corresponding to the measured intensities in the pixelwise data; the controller is programmed to identify discrepancies in measured distances from the measured intensities, the differences in the measured intensities, the measured distances, and the differences of the measured distances, and calculate a distance error in the measured distance; and the controller is programmed to determine a true distance value from the measured distance and distance error; the three-dimensional sensor is a time-of-flight sensor.
A method is provided and includes: providing an image processing system having at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor; generating, with each of the at least two three-dimensional sensors, an output array of pixelwise values indicative of distances to illuminated objects in the field of view, where each sensor is configured to generate illumination at least at two different frequencies so that objects in the corresponding field of view are illuminated at the two different frequencies; and receiving, with a controller communicably connected to the at least two three-dimensional sensors, pixelwise data from each sensor embodying intensity and distance information from the illuminated objects; and with the controller: disambiguating a distance to each of the illuminated objects, resolving error in the received pixelwise data due to periodic range ambiguity, and determining corrected pixelwise values indicative of true distance via each sensor illumination at the at least two different frequencies.
The method includes one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is configured so as to disambiguate the distance and resolve error in the received pixelwise data of each sensor, and determine corrected pixelwise values of the output array of pixelwise values of each sensor indicative of the true distance; the controller is configured so that each sensor illumination at the at least two different frequencies describes a phase space in the corresponding field of view that characterizes the relationship of different measured intensities and corresponding measured distances, of each object embodied in the pixelwise data registered by the controller at the at least two different frequencies; the phase space relationship is programmed in the controller and the phase space relationship characterizes the relation between differences in the measured intensities and differences in the measured distances corresponding to the measured intensities in the pixelwise data; with the controller: identifying discrepancies in measured ranges from the measured intensities, the differences in the measured intensities, the measured distances, and the differences of the measured distances, and calculating a distance error in the measured distance; with the controller, determining a true distance value from the measured distance and distance error; and the three-dimensional sensor is a time-of-flight sensor.
An image processing system is provided and includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view, the at least two three-dimensional sensors being disposed to generate binocular images of the illuminated objects in the field of view; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor so as to register the binocular images and configured to effect stereo matching from the binocular images resolving a pixelwise distance of the illuminated object, which pixelwise distance resolved from the binocular images corresponds to at least one pixel of the output array of pixelwise values of each three-dimensional sensor; and wherein the controller is configured to validate an uncertain pixelwise value of the at least one pixel of the output array of pixelwise values of one of the at least two three-dimensional sensors.
The image processing system includes one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the pixelwise distance resolved from binocular images forms a third measure of confidence relative to the pixelwise values of the output array of pixelwise values; the pixelwise distance resolved from binocular images is determined substantially simultaneously with obtaining the pixelwise values of the output array of pixelwise values; each of the at least two three-dimensional sensors is a ToF sensor and the pixelwise values are obtained from ToF data of the sensor; and the controller is configured so as to provide the comparison unit with the validated pixelwise value of the sensor validated from the binocular vision.
An automated logistic system including the image processing system, where the image processing system includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view, the at least two three-dimensional sensors being disposed to generate binocular images of the illuminated objects in the field of view; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor so as to register the binocular images and configured to effect stereo matching from the binocular images resolving a pixelwise distance of the illuminated object, which pixelwise distance resolved from the binocular images corresponds to at least one pixel of the output array of pixelwise values of each three-dimensional sensor; and wherein the controller is configured to validate an uncertain pixelwise value of the at least one pixel of the output array of pixelwise values of one of the at least two three-dimensional sensors. The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the pixelwise distance resolved from binocular images forms a third measure of confidence relative to the pixelwise values of the output array of pixelwise values; the pixelwise distance resolved from binocular images is determined substantially simultaneously with obtaining the pixelwise values of the output array of pixelwise values; each of the at least two three-dimensional sensors is a time-of-flight sensor and the pixelwise values are obtained from time-of-flight data of the sensor; and the controller is configured so as to provide the comparison unit with the validated pixelwise value of the sensor validated from the binocular vision.
A method is provided and includes: providing an image processing system having at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view, the at least two three-dimensional sensors being disposed to generate binocular images of the illuminated objects in the field of view; and receiving, with a controller communicably connected to the at least two three-dimensional sensors, pixelwise data from each sensor so as to register the binocular images; effecting, with the controller, stereo matching from the binocular images resolving a pixelwise distance of the illuminated object, which pixelwise distance resolved from binocular images corresponds to at least one pixel of the output array of pixelwise values of each three-dimensional sensor; and validating, with the controller, an uncertain pixelwise value of the at least one pixel of the output array of pixelwise values of one of the at least two three-dimensional sensors.
The method includes one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the pixelwise distance resolved from binocular images forms a third measure of confidence relative to the pixelwise values of the output array of pixelwise values; the pixelwise distance resolved from binocular images is determined substantially simultaneously with obtaining the pixelwise values of the output array of pixelwise values; each of the at least two three-dimensional sensors is a time-of-flight sensor and the pixelwise values are obtained from time-of-flight data of the sensor; and the controller provides the comparison unit with the validated pixelwise value of the sensor validated from the binocular vision.
An image processing system is provided and includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor so as to register the output array of pixelwise values as a measured depth map array of the illuminated objects, the controller being configured to search for and find high frequency gradient pixels within the depth map array in substantially real time; and wherein the controller is further configured to identify a neighbor pixel, to each high frequency gradient pixel within the depth map array, with a value indicating a minimum distance to the illuminated objects and set the minimum distance as a true minimum distance to the illuminated objects.
The image processing system includes one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is programmed with a look up table of edge effect correction factors as a function of distance described by calibration data characterizing pixelwise response to edge effects as a function of distance for each sensor; and the controller finds the high frequency gradient pixels within the depth map array substantially in real time based on convolving the depth map array with an edge detector and application of the edge effect correction factors from the look up table.
An automated logistic system is provided and includes the image processing system, where the image processing system includes: at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least two three-dimensional sensors to receive pixelwise data from each sensor so as to register the output array of pixelwise values as a measured depth map array of the illuminated objects, the controller being configured to search for and find high frequency gradient pixels within the depth map array in substantially real time; and wherein the controller is further configured to identify a neighbor pixel, to each high frequency gradient pixel within the depth map array, with a value indicating a minimum distance to the illuminated objects and set the minimum distance as a true minimum distance to the illuminated objects. The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is programmed with a look up table of edge effect correction factors as a function of distance described by calibration data characterizing pixelwise response to edge effects as a function of distance for each sensor; and the controller finds the high frequency gradient pixels within the depth map array substantially in real time based on convolving the depth map array with an edge detector and application of the edge effect correction factors from the look up table.
A method is provided and includes: providing an image processing system having at least two three-dimensional sensors each for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; receiving, with a controller communicably connected to the at least two three-dimensional sensors, pixelwise data from each sensor so as to register the output array of pixelwise values as a measured depth map array of the illuminated objects; searching for and finding, with the controller, high frequency gradient pixels within the depth map array in substantially real time; and identifying, with the controller, a neighbor pixel, to each high frequency gradient pixel within the depth map array, with a value indicating a minimum distance to the illuminated objects and setting the minimum distance as a true minimum distance to the illuminated objects.
The method may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the controller is programmed with a look up table of edge effect correction factors as a function of distance described by calibration data characterizing pixelwise response to edge effects as a function of distance for each sensor; and the controller finds the high frequency gradient pixels within the depth map array substantially in real time based on convolving the depth map array with an edge detector and application of the edge effect correction factors from the look up table.
An image processing system is provided and includes: at least one three-dimensional sensor for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least one three-dimensional sensor to effect output from the at least one three-dimensional sensor of successive output arrays of pixelwise values, and to receive pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values; wherein the controller is programmed with a correction factor describing a pixelwise linearized response of the at least one three-dimensional sensor to intensity variances characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination, and wherein the controller is configured to correct in real time, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and form a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view.
The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; and the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor.
An automated logistic system is provided and includes the image processing system, where the image processing system includes at least one three-dimensional sensor for illuminating a corresponding field of view of the sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least one three-dimensional sensor to effect output from the at least one three-dimensional sensor of successive output arrays of pixelwise values, and to receive pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values; wherein the controller is programmed with a correction factor describing a pixelwise linearized response of the at least one three-dimensional sensor to intensity variances characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination, and wherein the controller is configured to correct in real time, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and form a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view. The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; and the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor.
A method is provided and includes: providing an image processing system including at least one three-dimensional sensor for illuminating a corresponding field of view of the at least one three-dimensional sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; effecting, with a controller communicably connected to the at least one three-dimensional sensor, output from the at least one three-dimensional sensor of successive output arrays of pixelwise values; receiving, with the controller, pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values, wherein the controller is programmed with a correction factor describing a pixelwise linearized response of the at least one three-dimensional sensor to intensity variances characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination; and correcting in real time, with the controller, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and forming a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view.
The method may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; and the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor.
An image processing system is provided and includes: at least one three-dimensional sensor for illuminating a corresponding field of view of the at least one three-dimensional sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least one three-dimensional sensor to effect output from the at least one three-dimensional sensor of successive output arrays of pixelwise values, and to receive pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values; wherein the controller is programmed with a correction factor describing a pixelwise blurring of the at least one three-dimensional sensor characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination, and wherein the controller is configured to correct in real time, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and form a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view.
The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor; and the calibration data characterizing the blurring includes a point spread function.
An automated logistic system is provided and includes the image processing system, where the image processing system includes at least one three-dimensional sensor for illuminating a corresponding field of view of the at least one three-dimensional sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; and a controller communicably connected to the at least one three-dimensional sensor to effect output from the at least one three-dimensional sensor of successive output arrays of pixelwise values, and to receive pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values; wherein the controller is programmed with a correction factor describing a pixelwise blurring of the at least one three-dimensional sensor characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination, and wherein the controller is configured to correct in real time, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and form a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view. The image processing system may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor; and the calibration data characterizing the blurring includes a point spread function.
A method is provided and includes: providing an image processing system including at least one three-dimensional sensor for illuminating a corresponding field of view of the at least one three-dimensional sensor, and generating an output array of pixelwise values indicative of distances to illuminated objects in the field of view; effecting, with a controller communicably connected to the at least one three-dimensional sensor, output from the at least one three-dimensional sensor of successive output arrays of pixelwise values; receiving, with the controller, pixelwise data from the at least one three-dimensional sensor so as to register each of the successive output arrays of pixelwise values, wherein the controller is programmed with a correction factor describing a pixelwise blurring of the at least one three-dimensional sensor characterized by calibration data of calibration fields of different reflection intensities, other than dark frame illumination; and correcting in real time, with the controller, with the correction factor, the pixelwise data of each of the successive output arrays of pixelwise values and forming a succession of corrected depth map arrays, substantially invariant to different intensities of the illuminated objects in the field of view.
The method may also include one or more of the following individually, in any combination with each other, and/or in any combination with the features described herein: the at least one three-dimensional sensor is a time-of-flight sensor; the pixelwise values are obtained from time-of-flight data of the time-of-flight sensor; and the calibration data characterizing the blurring includes a point spread function.
Certain aspects of the present disclosure are described above. It is, however, expressly noted that the present invention is not limited to those aspects; rather, additions and modifications to what is expressly described herein are also included within the scope of the invention. Accordingly, the present disclosure is intended to embrace all such alternatives, modifications and variances that fall within the scope of any claims appended hereto. Further, the mere fact that different features are recited in mutually different dependent or independent claims does not indicate that a combination of these features cannot be advantageously used, such a combination remaining within the scope of the present disclosure.
This application is a continuation-in-part of U.S. Ser. No. 17/577,487 (filed Jan. 18, 2022), which is a continuation-in-part of U.S. Ser. No. 17/103,427 (filed Nov. 24, 2020), which is a continuation of U.S. Ser. No. 16/553,724 (filed on Aug. 28, 2019), now U.S. Pat. No. 10,887,578, which claims priority to and the benefit of U.S. Ser. No. 62/724,941 (filed on Aug. 30, 2018). The entire disclosures of these priority documents are hereby incorporated by reference in their entireties.
Related U.S. Application Data

Provisional application:
Number | Date | Country
62/724,941 | Aug. 2018 | US

Parent/child continuity data:
Parent | Parent Filing Date | Child | Country
16/553,724 | Aug. 2019 | 17/103,427 | US
17/577,487 | Jan. 2022 | 18/920,605 | US
17/103,427 | Nov. 2020 | 17/577,487 | US