Collaborative robots (cobots) are increasingly regarded as cost-effective solutions for automating high-mix, low-volume processes. However, their application faces significant challenges when handling small, fragile objects commonly found in semiconductor manufacturing and high-precision applications. Existing end-effectors lack the capability for pressure-sensitive handling of sub-centimeter objects, particularly those made of delicate, reflective, or refractive materials such as glass and semiconductor components used in optoelectronics.
Previous approaches for addressing these limitations have relied on static light sources or light sources mounted on the robot wrist. Such configurations, however, present several drawbacks: occlusions at the tool center point (TCP), hard shadows and vibration-induced blur, a constrained range of motion due to the enlarged clearance required by the lighting source, reduced robot joint limits to avoid self-collisions, and prolonged operation times caused by alternating between sensing and recognition for six-dimensional pose estimation and grasping. In practice, these drawbacks create barriers in industrial and manufacturing scenarios that involve clutter, small devices, trays, or machine-tending chambers where tools, parts, and other collateral elements must be picked and placed.
In order to address challenging grasping scenarios involving small objects and non-Lambertian surfaces, aspects of the disclosure provide dynamic illumination from multiple sources directed at an object near the grasp point. Both the end-effector and the camera remain stationary; only the light pattern is manipulated. This approach employs simple software-based inter-frame synchronization among the finger-integrated lights and the in-hand camera(s).
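By way of illustration, the following is a minimal sketch of such inter-frame synchronization, assuming hypothetical camera and LED-array driver methods (set_pattern, wait_for_next_frame_start, grab) that are not part of the disclosure; only the active light pattern changes between frames.

```python
# Minimal sketch of software-based inter-frame synchronization between the
# finger-integrated light arrays and an in-hand camera. The driver methods
# used here are hypothetical placeholders; the end-effector and camera stay
# stationary while only the illumination pattern is switched per frame.
def capture_with_patterns(camera, led_arrays, patterns):
    """Apply one illumination pattern per camera frame and tag each capture."""
    frames = []
    for pattern_id, pattern in enumerate(patterns):
        for led_array in led_arrays:           # one array per gripper finger
            led_array.set_pattern(pattern)     # hypothetical driver call
        camera.wait_for_next_frame_start()     # switch patterns on frame boundaries
        frames.append((camera.grab(), pattern_id))
    return frames
```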
The NSI unit 200 (120) comprises a finger unit assembly 210 that includes a USB data and power connector, testing probes, and a dielectric elastomer for compliant tactile/force sensing. A mounting frame 220 encapsulates the components and includes a microcontroller unit (MCU) 230 (a component with processor circuitry) configured for tactile analog-to-digital sampling, signal smoothing and linearization, control of the dynamic illumination subsystem, management of robot operating system (ROS)-based communications with a robot host unit, and the like. A printed circuit board (PCB) 240 mounts both static components (LEDs, MCU, capacitors, and resistors) and dynamic components, including a replaceable tactile (pressure) transducer/sensor. An array of light sources 250 (LEDs or dynamic illumination sources) incorporates red, green, blue, and white (RGBW) channels for projecting precise luminance and chromatic patterns in high-dynamic-range mode; the example shown includes 18 light sources. A quick-replacement attachment mechanism for the transducer, comprising socket 260 and transducer 270, enables rapid sensor replacement and auto-calibration. A pressure transducer 280 has a characterized force/voltage behavior shown in graph 270b. A semi-transparent diffuser and protective cover 290 serves as an illumination spatial band-pass filter while protecting the PCB and LEDs from collisions and slippage during grasping operations. The NSI unit 200 (120) thus integrates both tactile sensing and dynamic illumination capabilities in a compact form factor optimized for robotic grasping applications.
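As an illustration of the MCU duties listed above (tactile sampling, smoothing and linearization, illumination control, and ROS publishing), the following sketch assumes hypothetical adc_read, set_led_channel, and ros_publish bindings and illustrative filter and calibration constants; it is not firmware from the disclosure.

```python
# Illustrative sketch of the MCU 230 control loop: sample the tactile
# transducer, smooth and linearize the signal, drive the RGBW channels,
# and publish the normalized pressure over a ROS-style topic.
ALPHA = 0.2                      # exponential-smoothing factor (assumed)
CAL_GAIN, CAL_OFFSET = 1.0, 0.0  # linearization terms from auto-calibration (assumed)

def smooth(prev, raw, alpha=ALPHA):
    """First-order low-pass filter on raw ADC counts."""
    return alpha * raw + (1.0 - alpha) * prev

def linearize(counts, gain=CAL_GAIN, offset=CAL_OFFSET):
    """Map smoothed counts to a normalized pressure in [0, 1] using the
    characterized force/voltage curve of the transducer."""
    return min(max(gain * counts + offset, 0.0), 1.0)

def control_loop(adc_read, set_led_channel, ros_publish, pattern):
    """Hypothetical bindings: adc_read() -> counts, set_led_channel(ch, duty),
    ros_publish(topic, value)."""
    filtered = 0.0
    while True:
        filtered = smooth(filtered, adc_read())
        pressure = linearize(filtered)
        for channel, duty in enumerate(pattern):    # RGBW duty cycles
            set_led_channel(channel, duty)
        ros_publish("tactile/pressure", pressure)   # topic name assumed
```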
The set of RGB(W) pixels 330 is considered an image g(x, y)∈{N3|(x, y)∈[0,3]×[0,5]}. Some elements {S0,0, S0,3, S4,0, S4,1, S5,0, S5,1} are not present due to space occupied by the tactile sensor, fasteners, and MCU. Formally, these few elements are computed as if they existed; practically, ignoring them leaves the application invariant because of the superposition and diffusion produced by the covering case.
The reference set 340 of the illumination basis Ω:={gi|g(x, y, λ, ψ, θ, σ, γ)} consists of a collection of discretized Gabor filters, where each value maps to either a luminance value (x, y)→N or a chromatic value (x, y)→N3. The Gabor gi(x, y) includes five degrees-of-freedom (350): aperture λ∈R, phase ψ, orientation θ (with rotated coordinates x′=x cos(θ)+y sin(θ) and y′=−x sin(θ)+y cos(θ)), smoothing/acutance σ, and spatial aspect ratio γ.
Due to the size and number of elements in g, only a subset of parameter values produces visible patterns beyond constant illumination. Reference numeral 360 shows five different illumination patterns (gh through gv) in which the phase of the Gabor function is varied to produce shifting illumination patterns. These functions, in combination with the saliency extractor 4110, allow the training of an AI model as described below.
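For illustration, the following sketch discretizes a standard Gabor kernel (an assumption consistent with the five degrees of freedom listed above) onto the 4×6 LED grid and sweeps the phase ψ to obtain shifting patterns of the kind labeled gh through gv.

```python
import numpy as np

def gabor_pattern(lam, psi, theta, sigma, gamma, shape=(4, 6)):
    """Discretized Gabor illumination pattern on the finger LED grid.

    Sketch only: assumes the standard Gabor kernel with aperture lam,
    phase psi, orientation theta, smoothing sigma, and aspect ratio gamma."""
    ys, xs = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]))
    x = xs - (shape[0] - 1) / 2.0
    y = ys - (shape[1] - 1) / 2.0
    xp = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yp = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xp**2 + (gamma * yp) ** 2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xp / lam + psi)
    return np.clip((g + 1) / 2, 0.0, 1.0)           # normalize to LED duty cycle

# Shifting wavefront: sweep the phase while the other parameters stay fixed.
patterns = [gabor_pattern(lam=4.0, psi=p, theta=0.0, sigma=2.0, gamma=1.0)
            for p in np.linspace(0, 2 * np.pi, 5, endpoint=False)]
```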
The architecture 400 comprises three processing stages that work together to enable illumination control during robotic manipulation.
The offline geometric process stage 400A, 400B includes five processing stages 410-450 that generate visible structural regions 450 and pre-grasping poses 410d optimized for both visibility and grasping functionality.
The process begins with grasping synthesis using a robot URDF (Unified Robot Description Format) 410a and associated inverse kinematics functionality 410b to generate feasible grasping sequences 410c. The system stores three frames: pre-grasping (before finger gripper closure), grasping (with contact to target object), and post-grasping (without contact in a collision-free position based on scene post-conditions). The resulting pre-grasping poses 410d are stored in a database after ensuring they maintain visibility of the structural regions of both the tray and target object.
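A possible representation of the three stored frames is sketched below; the pose format (4×4 homogeneous transforms as nested lists) and the occludes_structure visibility check are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GraspSequence:
    """Sketch of the three frames stored per synthesized grasp."""
    pre_grasp: List[List[float]]    # before finger closure, structure visible
    grasp: List[List[float]]        # in contact with the target object
    post_grasp: List[List[float]]   # collision-free retreat pose

def keep_if_visible(seq: GraspSequence, occludes_structure) -> bool:
    """Store only pre-grasp poses that keep the structural regions of the
    tray and target visible (occludes_structure is a hypothetical check)."""
    return not occludes_structure(seq.pre_grasp)
```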
The system processes input describing the spatial placement of target objects using parametric deviations expressed as probability functions, enabling bounded variational generation within physically plausible parameters. The inputs, which are CAD files for both the tray 430a and the target object 440a, are processed as tessellated meshes with small crease angles, allowing efficient removal of non-visible structural elements such as coplanar edges and short segments along curvatures (e.g., chamfers and other mesh-smoothing artifacts). In the same manner, the system filters edges by length and aperture while identifying concave-coplanar regions of significant area (430b, 440b).
This one-way process selects and associates visually salient regions that remain detectable despite reflections or refractions caused by discontinuity bounds. The final results are stored as visible structural regions 450 in a database that maintains the relationships between the visible structural regions and their associated grasping poses, enabling the system to optimize both visual detection and physical manipulation. The resulting set of pre-grasping poses ensures no occlusion of the visible structural regions of the tray or target object and is stored in a database.
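The edge filtering described above might be sketched as follows using the trimesh library; the crease-angle and length thresholds are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np
import trimesh

def visible_structural_edges(path, min_crease_deg=20.0, min_length=1e-3):
    """Sketch of the edge filtering described above: drop near-coplanar
    edges (small crease angle) and very short segments such as chamfer
    artifacts. Thresholds are illustrative assumptions."""
    mesh = trimesh.load(path, force='mesh')
    angles = np.degrees(mesh.face_adjacency_angles)   # dihedral angle per shared edge
    edges = mesh.face_adjacency_edges                 # vertex index pairs
    lengths = np.linalg.norm(mesh.vertices[edges[:, 0]] -
                             mesh.vertices[edges[:, 1]], axis=1)
    keep = (angles > min_crease_deg) & (lengths > min_length)
    return edges[keep]
```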
The offline simulation and self-data annotation stage 400C implements a pipeline for generating training data that enables the creation and training of an encoder-dual-decoder model for adaptive dynamic illumination pattern generation. These patterns dynamically illuminate the scene under inspection to amplify the saliency of visual elements for grasping.
The process begins with creating a training dataset 4120 by ensuring sufficient variability in vantage points and poses of target objects in the tray. The parametric sampler 460 obtains configurations from labeled six-dimensional placement distributions 420 (shown in
The system then selects illumination patterns gi and gj (480b) from the illumination pattern set 480a and applies (480c) them to the simulated scene 490 (shown in
In the saliency computation stage 4110, the system applies edge detection (Gabor-Jet) and semantic segmentation. The system retains only image pairs that demonstrate minimal correlation (intersection over union at 5-10%) between visible edges and stable regions for subsequent processing stages.
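A minimal sketch of this retention criterion is given below; the edge and stable-region masks are assumed to come from the Gabor-Jet detector and the segmentation stage, and the 5-10% band is applied directly as an intersection-over-union test.

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 0.0

def keep_pair(edge_mask, stable_region_mask, low=0.05, high=0.10):
    """Retain only renders whose edge/stable-region overlap falls in the
    5-10% band described above."""
    return low <= iou(edge_mask, stable_region_mask) <= high
```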
These patterns dynamically illuminate the scene under inspection to amplify the saliency of key visual elements for grasping, creating a self-supervised dataset that enables adaptive illumination control during robotic manipulation tasks.
D. AI Model Training of an Auto-Encoder with Dual Decoder
The inference architecture 500A employs an autoencoder 510 with dual decoders 520, 530 to adaptively create dynamic illumination patterns. The system begins with input images Ia(x, y)∈N3 captured without dynamic illumination under low-lighting conditions that allow proper focus given the object distance. The images are generated in 4100a of
From the collection of illumination patterns (480) Ω:={gi|g(x, y, λ, ψ, θ, σ, γ)}, the system selects a subset Ωa with low cardinality n=|Ωa|<<|Ω|, 2≤n≤8, based on the structure map in 4100c. This range is directly computed by the saliency function in 4110. The process utilizes tuples {Γa,i,j} associating each input image Ia with a selected illumination pattern pair [gi, gj] and the resulting illuminated image I′a,i,j.
The autoencoder structure includes an encoding sub-net Φ(Ia(x, y))→Z∈Rw that maps input images into the latent space Z, and two decoder sub-nets: a decoder base 520 Y(Z∈Rw, α)→I′a,i,j(x, y), used only during training to shape the latent space Z∈Rw, and a decoder extension 530 Δ(Z∈Rw, α)→[gi, gj] that generates illumination pattern pairs.
Once the model is trained, the decoder base sub-net is no longer needed during inference, significantly reducing the workload. The decoder extension 530, on the other hand, takes the query image Ia encoded in the latent space as za,α∈Z together with the associated orientation cue α and decodes them into a lower-dimensional pair of illumination patterns [gi, gj] in a single reshaped tensor.
During runtime inference, the system processes a real camera image captured without dynamic illumination, applying a high orientation cue α≈π/2 to create the encoding za,α∈Z. The decoder extension 530 then generates 2≤n≤20 illumination pattern pairs [gi,α, gj,α].
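A compact sketch of such an encoder/dual-decoder is shown below in PyTorch; the layer sizes, latent width w, image resolution, and the 2×4×6 pattern-pair tensor shape are assumptions for illustration, not the disclosed network.

```python
import math
import torch
import torch.nn as nn

class IlluminationAutoencoder(nn.Module):
    """Sketch of the encoder/dual-decoder described above (sizes assumed)."""
    def __init__(self, w=128, pattern_shape=(2, 4, 6)):
        super().__init__()
        self.encoder = nn.Sequential(                      # Phi: image -> z in R^w
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, w))
        self.decoder_base = nn.Sequential(                 # training only: reconstructs I'_{a,i,j}
            nn.Linear(w + 1, 64 * 64 * 3), nn.Sigmoid())
        self.decoder_ext = nn.Sequential(                  # runtime: (z, alpha) -> [g_i, g_j]
            nn.Linear(w + 1, 64), nn.ReLU(),
            nn.Linear(64, pattern_shape[0] * pattern_shape[1] * pattern_shape[2]),
            nn.Sigmoid())
        self.pattern_shape = pattern_shape

    def forward(self, image, alpha):
        z = self.encoder(image)
        z_alpha = torch.cat([z, alpha.unsqueeze(1)], dim=1)
        recon = self.decoder_base(z_alpha).view(-1, 3, 64, 64)
        patterns = self.decoder_ext(z_alpha).view(-1, *self.pattern_shape)
        return recon, patterns

# Runtime use: only the encoder and decoder extension are exercised.
model = IlluminationAutoencoder()
img = torch.rand(1, 3, 64, 64)              # query image without dynamic illumination
alpha = torch.tensor([math.pi / 2])         # high orientation cue, as described above
_, pattern_pair = model(img, alpha)         # [g_i, g_j] on the finger LED grids
```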
This architecture 500 enables adaptive generation of illumination patterns that optimize visibility of object features during robotic grasping operations, while maintaining a compact and efficient implementation suitable for real-time operation.
The calibration procedure 600B starts with measuring pressure 0≤ρs(t)≤1∈R for each gripper side s∈{L, R}, as shown at 640. The system computes the mean of ρs(t) for each side.
The calibration continues with the gripper closing in a step-wise closed loop combining gripper-encoder feedback. Here, the center of mass of the ball and its contour are tracked to identify the variation that correlates with contact, as shown at 650. In practice, the closure continues until both tactile sensors detect contact. Then the fingers open and the wrist moves, respectively, to ensure centering of the ball.
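A minimal sketch of this step-wise closure and re-centering loop follows, assuming hypothetical gripper, camera, and tactile interfaces and an illustrative contact threshold.

```python
def center_calibration_ball(gripper, camera, tactile, step=0.5, contact_thresh=0.05):
    """Sketch of the step-wise closure described above. The gripper, camera,
    and tactile interfaces are hypothetical; thresholds are illustrative."""
    while True:
        gripper.close_by(step)                     # small encoder step
        cx, cy = camera.ball_center_of_mass()      # tracked contour centroid
        left, right = tactile.read()               # normalized pressures in [0, 1]
        if left > contact_thresh and right > contact_thresh:
            break                                  # both sides report contact
    # Re-center: open the fingers slightly and move the wrist so the ball
    # centroid returns to the image center before the compression ramp.
    gripper.open_by(step)
    gripper.move_wrist_to_center(cx, cy)
```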
The final calibration stage involves closing the gripper while observing sensor readings until either saturation occurs or the gripper motors approach 50% of their power limits, preventing overload and prop damage. This creates a signal ramp at constant temperature that characterizes the prop's compressibility behavior δV=−κ·Vs·δp, where Vs represents the ball/sphere volume in m³ and δV indicates its deviation due to a pressure increase δp in N/m². The calibration process 600B employs volume approximation using an ellipsoid model Ve=(4/3)·π·a·b·c
of the compressed ball's semi-axes, with a simplified two-axis observation (a, b); hence an approximation of the form ΔVe=−κ·Vs·Δp. Here, the camera calibration (distortion compensation, namely unwrapping) and the known ball radius r allow approximation of the pixel-to-millimeter transformation. Moreover, only two of the semi-axes (a, b) of the ellipsoid (a, b, c) can be observed from a top view, hence the system performs a further approximation for the unobserved semi-axis c. Setting c≈c′0, with c′0 given by the aperture of the gripper from the URDF and its encoder, limits the physical consistency of the approximation but allows establishment of a visual two degrees-of-freedom approximation of the deformation via Ve≈(4/3)·π·a·b·c′0.
With κ remaining unknown, it is either obtained from experimental data or the system works in pseudo-pascal units via the product κ·Δp. The latter approach provides a linear behavior that models proportionality to the applied force. This means the visually approximated volume variation and the aperture of the gripper are then related to the pressure increase through Δp·κ=−ΔVe/Vs.
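The two-axis visual approximation can be sketched as follows; the function returns the product κ·Δp (pseudo-pascal) under the assumptions stated above, with the pixel-to-millimeter conversion taken as already applied.

```python
import math

def pseudo_pascal(a_mm, b_mm, c0_mm, a0_mm, b0_mm, r_mm):
    """Sketch of the two-axis approximation described above. a, b are the
    semi-axes observed from the top view (converted to millimeters via the
    camera calibration), c0 is fixed from the gripper aperture (URDF and
    encoder), and r is the known ball radius. Returns kappa * delta_p."""
    v_sphere = (4.0 / 3.0) * math.pi * r_mm**3           # rest volume V_s
    v0 = (4.0 / 3.0) * math.pi * a0_mm * b0_mm * c0_mm   # ellipsoid at first contact
    v = (4.0 / 3.0) * math.pi * a_mm * b_mm * c0_mm      # compressed ellipsoid, c fixed
    delta_v = v - v0                                     # negative under compression
    return -delta_v / v_sphere                           # kappa * delta_p = -dV / V_s
```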
The system can optionally leverage the arrays of light sources 122 integrated into the gripper fingers and a camera mounted on the end-effector to generate detailed multi-dimensional models of objects. By synchronizing dynamic illumination patterns projected from multiple angles with camera captures, the system enables rapid acquisition of object features under varying lighting conditions. This is particularly advantageous for Neural Radiance Fields (NeRF) and Neural Point-Based Graphics (splatting) technologies, where controlled lighting and diverse viewpoints enhance the learning of implicit multi-dimensional representations, especially for objects with complex surface properties such as the reflective and refractive materials common in semiconductor components.
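By way of example, a multi-view acquisition loop of this kind might look like the following sketch, assuming hypothetical robot, camera, and LED-array interfaces.

```python
def acquire_object_views(robot, camera, led_arrays, viewpoints, patterns):
    """Sketch of the optional multi-view acquisition described above: for
    each wrist viewpoint, cycle the finger illumination patterns and record
    (image, pose, pattern) triplets for NeRF/splatting training. The robot,
    camera, and LED interfaces are hypothetical placeholders."""
    dataset = []
    for pose in viewpoints:
        robot.move_to(pose)
        for pattern in patterns:
            for led_array in led_arrays:
                led_array.set_pattern(pattern)
            dataset.append({"image": camera.grab(),
                            "camera_pose": pose,
                            "pattern": pattern})
    return dataset
```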
Semiconductor, pharmaceutical, and bio-technology manufacturing processes and technology development commonly occur in high-mix, complex environments that present material handling challenges due to the sensitive, fragile, and cleanliness-intensive nature of the workpieces, as well as strict cost constraints. Prior solutions for transporting partially assembled workpieces are often complex, expensive, and tailored to a specific task, thereby limiting their flexibility. Consequently, many high-mix or non-value-added operations (e.g., metrology) frequently rely on manual handling, which increases the risk of contamination, damage, and human error.
In contrast, the visuo-haptic approach for grasping with collaborative robots as disclosed herein provides a more cost-effective option for automated material handling while mitigating the risks associated with manual operations. By integrating real-time information, this approach dynamically avoids obstacles during pickup and identifies optimal pickup parameters—such as position, orientation, and lighting—for each unique situation. Unlike prior solutions, which may require costly redesigns to accommodate a wide range of sample sizes, the aspects disclosed herein offer a single, scalable solution that delivers robust quality and cleanliness on par with automated equipment but at lower cost and overhead.
Additionally, this visuo-haptic methodology may be applied to various other low-volume, high-mix processes involving sensitive workpieces, where effective human-robot collaboration improves overall process quality, yield, and throughput. In such environments, it constitutes a better approach to both risky manual handling and expensive, fully automated systems.
Further, the aspects of the disclosure overcome limitations of existing technologies in semiconductor and pharmaceutical manufacturing applications in particular. Traditional robot grippers with tactile sensors often suffer from mechanical stress due to compression and wear, leading to frequent recalibrations and preventive replacements. In contrast, the disclosed solution integrates a durable transducer with automatic self-calibration capabilities, enabling rapid and cost-effective replacement without the need for specialized tools, saline fluids, or extensive engineering time. This significantly reduces downtime and operational costs, offering a seamless approach to visuo-haptic material handling without requiring alternative instrumentation. Moreover, sensors employing silicone with bonding properties are unsuitable in semiconductor and pharmaceutical manufacturing.
Moreover, the aspects of the disclosure introduce active illumination that overcomes the challenges posed by reflective and refractive surfaces, variable 6D object poses, and non-Lambertian materials. Unlike previous setups, which rely on heavy, rigid, and stationary configurations, this system integrates compact, energy-efficient lighting directly into the robot wrist and links. By eliminating the need for bulky setups and additional floor space, the system ensures dependable visual perception while maintaining flexibility and adaptability for high-mix, low-volume applications such as machine tending, inspection, and assembly.
Additionally, the solution's design addresses the limitations of prior active illumination approaches. It reduces computational and coordination overhead through single-source or dynamically modulated illumination, minimizing sensitivity to occlusion and dynamic shadows. The streamlined and compact end-effector design eliminates the bulky components and complex cabling that restrict movement in conventional systems, enabling precise manipulation even in confined environments. This adaptability makes it suitable for high-precision tasks requiring dynamic handling and fine servoing in demanding manufacturing settings.
By combining enhanced tactile sensing, automatic calibration, and advanced visual perception technologies, the disclosed solution redefines efficiency, reliability, and versatility in manufacturing automation.
The techniques of this disclosure may also be described in the following examples.
Example 1. A component of a system, comprising: processor circuitry; and a non-transitory computer-readable storage medium including instructions that, when executed by the processor circuitry, cause the processor circuitry to: receive image data of an object captured by a camera; analyze a visual feature of the object based on the received image data; generate illumination patterns based on the analyzed visual feature; and control arrays of light sources integrated into a plurality of fingers of a robotic gripper to project the illumination patterns within a grasp volume defined by the plurality of fingers during object manipulation to enhance detection of the visual feature of the object, wherein each light source in the arrays of light sources is individually controllable.
Example 2. The component of example 1, wherein: each of the light sources comprises RGB (W) (red, green, blue, and white) light-emitting diode elements (LED elements), or LEDs in a non-visible spectrum coupled with a multi-spectral camera, configured to project variable intensities or colors of light, and the instructions further cause the processor circuitry to generate the illumination patterns by dynamically varying intensity or color balance of each of the light sources.
Example 3. The component of any one or more of examples 1-2, wherein the instructions further cause the processor circuitry to: dynamically control the arrays of light sources to project the illumination patterns within the grasp volume to create a shifting illumination wavefront for edge detection, wherein the shifting illumination wavefront enhances detection of a horizontal, vertical, or diagonal edge and infers surface properties.
Example 4. The component of any one or more of examples 1-3, wherein the instructions further cause the processor circuitry to: receive pressure data from pressure sensors integrated into the fingers; and adjust manipulation of the object based on the received pressure data.
Example 5. The component of any one or more of examples 1-4, wherein the instructions further cause the processor circuitry to: receive pressure data from pressure sensors integrated into the fingers; acquire visual feedback of a compressible calibration object's deformation; and calibrate the pressure sensors automatically based on the pressure data and the compressible calibration object's deformation.
Example 6. The component of any one or more of examples 1-5, wherein the instructions further cause the processor circuitry to: capture a sequence of images using a camera mounted on the robotic gripper while controlling the arrays of light sources to project different illumination patterns within the grasp volume; generate a multi-dimensional model of the object based on the captured sequence of images and corresponding illumination patterns; and adjust subsequent illumination patterns based on features detected in the multi-dimensional model to enhance visual detection of object geometry during manipulation.
Example 7. The component of any one or more of examples 1-6, wherein the instructions further cause the processor circuitry to: receive a model of the object; extract a visible feature from the model; generate a set of illumination patterns using dynamic kernel and saliency functions; apply the generated illumination patterns to a simulated scene including the object; evaluate saliency of the illumination patterns based on detection of the visible feature; select illumination patterns that achieve a minimum saliency threshold; and train a neural network using the selected illumination patterns to generate an illumination pattern decoder for runtime operation.
Example 8. The component of example 7, wherein the set of illumination patterns are generated by the kernel function by varying aperture, phase, orientation, smoothing, or spatial aspect ratio parameters.
Example 9. The component of any one or more of examples 1-8, wherein the instructions further cause the processor circuitry to: generate a training dataset using an object model and simulated illumination patterns; train a neural network using the training dataset; and use the trained neural network to generate illumination patterns during operation.
Example 10. The component of any one or more of examples 1-9, wherein the instructions further cause the processor circuitry to: encode the image data into a latent space representation; and decode the latent space representation into illumination control parameters for the arrays of light sources.
Example 11. The component of example 10, wherein the instructions further cause the processor circuitry to: use a base decoder during training to shape the latent space representation; and use an extension decoder to generate the illumination control parameters during runtime operation.
Example 12. The component of example 10, wherein the instructions further cause the processor circuitry to: encode a camera image captured without dynamic illumination into the latent space representation; and decode the latent space representation into a plurality of pairs of illumination patterns for the fingers.
Example 13. The component of any one or more of examples 1-12, wherein the instructions further cause the processor circuitry to: extract visible geometric elements from a model of the object; define visible structural regions based on the extracted geometric elements; and determine the illumination patterns based on the visible structural regions.
Example 14. The component of example 13, wherein the instructions further cause the processor circuitry to: receive parametric placement distributions defining possible positions and orientations of the object; generate scene layouts based on the parametric placement distributions; and simulate illumination of the scene layouts to generate training data.
Example 15. The component of example 14, wherein the instructions further cause the processor circuitry to: render images of the simulated scene layouts with and without the determined illumination patterns; generate a structure map encoding geometric features of the rendered images; and evaluate saliency of the geometric features to select illumination patterns that enhance feature detection.
Example 16. The component of any one or more of examples 1-15, wherein the instructions further cause the processor circuitry to generate a training dataset comprising: camera images captured without dynamic illumination; pairs of illumination patterns for the fingers; and saliency-selected illuminated images showing enhanced geometric features.
Example 17. The component of example 16, wherein the instructions further cause the processor circuitry to: select illumination patterns having a minimal correlation between visible edges and stable regions; and train a neural network using the selected patterns to generate runtime illumination control.
Example 18. A robotic system, comprising: a gripper including fingers that together define a grasp volume; an array of light sources integrated into each of the fingers, wherein each of the light sources is individually controllable; a controller circuitry configured to: receive image data of an object; analyze a visual feature of the object based on the image data; generate illumination patterns based on the analyzed visual feature; and dynamically control the arrays of light sources to project the illumination patterns within the grasp volume during object manipulation to enhance detection of the visual feature of the object.
Example 19. The robotic system of example 18, further comprising: pressure sensors integrated into the fingers, wherein the controller circuitry is further configured to: receive pressure data from the pressure sensors; and adjust manipulation of the object based on the received pressure data.
Example 20. The robotic system of any one or more of examples 18-19, further comprising: pressure sensors integrated into the fingers, wherein the controller circuitry is configured to: receive pressure data from the pressure sensors; acquire visual feedback of a compressible calibration object's deformation; and calibrate the pressure sensors automatically based on the pressure data and the compressible calibration object's deformation.
While the foregoing has been described in conjunction with exemplary aspects, it is understood that the term “exemplary” is merely meant as an example, rather than the best or optimal. Accordingly, the disclosure is intended to cover alternatives, modifications, and equivalents, which may be included within the scope of the disclosure.
Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present application. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.