The present disclosure relates to systems and methods that use imaging systems and associated image analysis techniques to detect, identify, and manipulate an article of silverware.
One costly aspect of operating a restaurant is dealing with used dishware and flatware. The process of collecting and cleaning dishware and flatware is a time-intensive manual process, but it has the advantage of eliminating the use of single-use plastics.
It would be an advancement in the art to provide an improved approach for cleaning dishware and flatware.
Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified.
In the following disclosure, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific implementations in which the disclosure may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the present disclosure. References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed herein. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
An implementation of the devices, systems, and methods disclosed herein may communicate over a computer network. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links, which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter is described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described herein. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including an in-dash vehicle computer, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Further, where appropriate, functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the description and claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name, but not function.
It should be noted that the sensor embodiments discussed herein may comprise computer hardware, software, firmware, or any combination thereof to perform at least a portion of their functions. For example, a sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code. These example devices are provided herein for purposes of illustration, and are not intended to be limiting. Embodiments of the present disclosure may be implemented in further types of devices, as would be known to persons skilled in the relevant art(s).
At least some embodiments of the disclosure are directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a device to operate as described herein.
The systems and methods described herein use a robotic apparatus and one or more imaging systems and associated image analysis techniques to detect and identify an article of silverware (also referred to herein as an “article of cutlery”). In some embodiments, a robotic actuator is used to engage an article of cutlery from a collection of articles of cutlery. The robotic actuator then presents the article of cutlery in a field of view of an imaging system that is configured to capture an image of the article of cutlery. A processing system is configured to process the image and identify a type of the article of cutlery, as described herein.
Robotic actuator 104 is configured to receive commands from a processing system 102 to engage an article of cutlery 112 using magnetic end effector 110. Magnetic end effector 110 may also be referred to as a “magnetic gripper.” In some embodiments, engaging article of cutlery 112 by robotic actuator 104 is accomplished by processing system 102 issuing a command to robotic actuator 104 to move magnetic end effector 110 proximate a plurality (i.e., a collection) of articles of cutlery that includes an article of cutlery 116, an article of cutlery 118, an article of cutlery 120, and an article of cutlery 122, disposed on a work surface 114. In some embodiments, the plurality of articles of cutlery may be in a container such as a work bin or a cutlery bin.
In some embodiments, article of cutlery 112 is randomly engaged by magnetic end effector 110 from the collection of articles of cutlery (i.e., article of cutlery 116 through article of cutlery 122). In other words, magnetic end effector 110 engages a random (i.e., unidentified) article of cutlery from the collection. Robotic actuator 104 is then commanded by processing system 102 to present article of cutlery 112 in a field of view 108 of an imaging system 106. Robotic actuator 104 may be commanded by processing system 102 to present article of cutlery 112 in field of view 108 in a specific spatial orientation. In some embodiments, imaging system 106 may include any combination of imaging devices such as ultraviolet (UV) cameras, infrared (IR) cameras, visible light RGB (red green blue) cameras, hyperspectral imaging cameras, high dynamic range (HDR) cameras, and so on. Different lighting systems such as tungsten lighting, fluorescent lighting, UV lighting, IR lighting, or other lighting systems may also be included in imaging system 106 to appropriately illuminate article of cutlery 112.
In some embodiments, imaging system 106 is configured to capture an image of article of cutlery 112. In particular embodiments, a command to capture the image may be issued to imaging system 106 via processing system 102. Processing system 102 then receives the image and performs image processing on the image by, for example, running computer vision algorithms, to identify a type of article of cutlery 112. Details about these computer vision algorithms are provided herein.
In some embodiments, once a type of article of cutlery 112 has been identified, processing system 102 is configured to sort the article of cutlery.
In some embodiments, a process of engaging an article of cutlery from a collection of a plurality of articles of cutlery includes processing system 102 issuing a command to robotic actuator 104 to move magnetic end effector 110 towards the collection. When magnetic end effector 110 is in proximity to the collection, magnetic attraction from magnetic end effector 110 causes an article of cutlery to attach itself to magnetic end effector 110. In some embodiments, magnetic end effector 110 includes a permanent magnet that engages an article of cutlery from the collection via magnetic attraction when magnetic end effector 110 is close enough to the collection. In other embodiments, magnetic end effector 110 includes an electromagnet that is normally deactivated, but is activated when magnetic end effector 110 is close enough to the collection such that an article of cutlery is attracted to and engaged by the electromagnet. Commands for activating and deactivating an electromagnet associated with magnetic end effector 110 may be issued by processing system 102.
In some embodiments, processing system 102 may issue spatial orientation commands to robotic actuator 104 that place article of cutlery 112 in field of view 108 of imaging system 106 in a specific orientation. In particular embodiments, spatial orientation commands may include placing article of cutlery 112 in field of view 108 in front of a predetermined background. In other embodiments, imaging conditions such as lighting and background associated with imaging system 106 capturing an image of article of cutlery 112 may be controllable. For example, once an article of cutlery is engaged by magnetic end effector 110 and placed in field of view 108, and assuming that a pose of robotic actuator 104 relative to imaging system 106 is known and can be tracked by processing system 102, processing system 102 can identify and predetermine desired poses of robotic actuator 104 for imaging that could yield better contrast, a better signal-to-noise ratio, and a better viewing angle.
As discussed herein, holding a single article of cutlery at a time in field of view 108 greatly improves an identification success probability of a computer vision system running on processing system 102, where the computer vision system is configured to identify a type of the article of cutlery. The computer vision system in this case can also be used to identify objectionable issues with the article of cutlery. Some such objectionable issues are broken or bent cutlery, bent or missing tines on forks or any other damage, or dirt remaining on the silverware, as discussed herein. These items could then be segregated from the items that are going to be put back into service in, for example, a restaurant.
Other enhancements to a computer vision system used by processing system 102 to detect and identify an article of cutlery include an ability to perform more subtle sorting that goes beyond sorting an article of cutlery based on its type (e.g., spoon, fork, or knife). In some embodiments, the type of an article of cutlery may be identified based on its category (spoon, fork, knife) and other attributes such as orientation.
In some embodiments, silverware identification system 100 may be configured to generate multiple views of article of cutlery 112 by capturing multiple images of article of cutlery 112 using imaging system 106. For generating multiple views, multiple cameras may be included in imaging system 106, where each camera is configured to image article of cutlery 112 from a different perspective or angle of view. Using multiple cameras allows for relatively high-speed image acquisition for pose estimation. In other embodiments, additional flexibility may be added to silverware identification system 100 by configuring robotic actuator 104 such that robotic actuator 104 can modify poses in which article of cutlery 112 is presented to an imaging system 106.
In some embodiments, using a robotic actuator 104 to change a pose of an article of cutlery in front of imaging system 106 naturally allows for multiple views. This process essentially uses a plurality of spatial orientations associated with the article of cutlery when the article of cutlery is presented in field of view 108. Oftentimes, a single view, no matter how good the image is, is not sufficient for classification with a high confidence level. An alternative solution is to use multiple views associated with an article of cutlery. Once robotic actuator 104 engages an article of cutlery, it can present the article of cutlery with many different poses and scales in field of view 108 associated with imaging system 106. Variations in exposure time may be used to improve signal to noise ratios (SNRs) and contrast based on intermediate classification results. In this case, rather than capturing a set of images and combining them after collecting them, imaging system 106 acquires one image at a time, and processing system 102 outputs classification results based on the images collected up to that point; these intermediate results are used as feedback to select a next imaging pose, until silverware identification system 100 is highly confident about the classification results produced by processing system 102.
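By way of illustration, the following is a minimal sketch of this iterative multi-view classification loop. The robot, camera, and classifier interfaces, the class list, and the confidence threshold are assumptions for illustration and are not part of the disclosure; per-view class probabilities are fused in a simple naive Bayes style.

```python
import numpy as np

CLASSES = ["spoon", "fork", "knife"]
CONFIDENCE_THRESHOLD = 0.95
MAX_VIEWS = 8

def classify_with_multiple_views(robot, camera, classifier, candidate_poses):
    """Present the engaged article in successive poses until the fused class
    probabilities exceed a confidence threshold, then return (label, confidence)."""
    log_evidence = np.zeros(len(CLASSES))
    fused = np.full(len(CLASSES), 1.0 / len(CLASSES))
    for pose in candidate_poses[:MAX_VIEWS]:
        robot.move_to(pose)                                  # present article in the field of view
        probs = classifier.predict_proba(camera.capture())   # per-class probabilities for this view
        log_evidence += np.log(np.clip(probs, 1e-6, 1.0))    # fuse views (naive Bayes style)
        fused = np.exp(log_evidence - log_evidence.max())
        fused /= fused.sum()
        if fused.max() >= CONFIDENCE_THRESHOLD:
            break                                            # classification is sufficiently confident
    return CLASSES[int(fused.argmax())], float(fused.max())
```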
In some embodiments, a process of placing an article of cutlery at a designated location includes processing system 102 commanding robotic actuator 104 to move article of cutlery 112 engaged (grasped) by magnetic end effector 110 to the designated location, and placing article of cutlery 112 at the designated location. In order to place article of cutlery 112 at the designated location, article of cutlery 112 must be disengaged from magnetic end effector 110. If the magnet associated with magnetic end effector 110 is a permanent magnet, this disengagement may be done via mechanical methods, such as having a mechanical fixture that holds article of cutlery 112 in place while robotic actuator 104 pulls magnetic end effector 110 in an opposite direction to disengage magnetic end effector 110 from article of cutlery 112. If the magnet associated with magnetic end effector 110 is an electromagnet, disengagement may be done by deactivating the electromagnet to release article of cutlery 112. These disengagement methods may be used by silverware identification system 100 not only for placing article of cutlery 112 at a designated location, but also for sorting article of cutlery 112, as discussed herein.
In some embodiments, a collection of articles of cutlery includes spoons, forks, and knives, and silverware identification system 100 is configured to identify and place all the forks and all the knives from the collection at separate designated locations. When all the forks and knives in the collection have been identified and removed from the collection in this way, the remaining articles of cutlery in the collection are spoons. This constitutes an efficient sorting algorithm, where the computer vision algorithms associated with silverware identification system 100 need to detect and identify two types of articles of cutlery instead of three.
Some embodiments of silverware identification system 100 engage an article of cutlery before presenting the article of cutlery to imaging system 106. Other embodiments of silverware identification system 100 use imaging system 106 to identify a specific article of cutlery in a collection of articles of cutlery before engaging the article of cutlery. In restaurant applications, silverware identification system 100 can be used to sort a collection of dirty articles of cutlery before they are sent through a sanitizer in the restaurant. Or, if sorting is done after the articles of cutlery are run through the sanitizer, a sorting algorithm associated with silverware identification system 100 could pick out forks and knives, leaving only the spoons, or any single item of cutlery. This method reduces the sorting time considerably because a user does not have to spend time moving the items in the last batch of articles of cutlery; these items arrive at a sorted state by way of removing all of the other unlike items.
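By way of illustration, the following is a minimal sketch of this sort-by-elimination strategy, assuming hypothetical detector and robot interfaces and an illustrative score threshold; only forks and knives are picked out, so whatever remains is treated as spoons.

```python
TARGET_BINS = {"fork": "fork_location", "knife": "knife_location"}
SCORE_THRESHOLD = 0.8

def sort_by_elimination(robot, camera, detector, max_passes=100):
    """Remove every detected fork and knife; the remainder of the collection is spoons."""
    for _ in range(max_passes):
        detections = detector.detect(camera.capture())       # objects with .label, .score, .box
        targets = [d for d in detections
                   if d.label in TARGET_BINS and d.score >= SCORE_THRESHOLD]
        if not targets:
            break                                            # only spoons (or nothing) remain
        best = max(targets, key=lambda d: d.score)
        robot.pick(best.box)                                 # engage with the magnetic end effector
        robot.place(TARGET_BINS[best.label])                 # deposit at the designated location
```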
In some embodiments, processing system 102 includes a memory 204 that is configured to store data associated with silverware identification system 100. Data stored in memory 204 may be temporary data or permanent data. In some embodiments, memory 204 may be implemented using any combination of hard drives, random access memory, read-only memory, flash memory, and so on. In particular embodiments, data stored in memory 204 may include data from imaging system 106.
Some embodiments of processing system 102 may also include an imaging system interface 206 that is configured to interface processing system 102 with one or more imaging systems such as imaging system 106. In some embodiments, imaging system interface 206 may include any combination of connectivity protocols such as IEEE 1394 (FireWire), Universal Serial Bus (USB), and so on. Imaging system interface 206 allows processing system 102 to receive images from an associated imaging system, while also sending commands to the imaging system (e.g., a command to capture an image of article of cutlery 112 when article of cutlery 112 is in a field of view of imaging system 106).
In some embodiments, processing system 102 may include a robotic actuator interface 208 that is configured to interface processing system 102 with robotic actuator 104. Commands issued by processing system 102 to robotic actuator 104 are transmitted to robotic actuator 104 via robotic actuator interface 208. Examples of such commands include a command to move robotic actuator 104 to a specific point in space, a command to activate an electromagnet associated with magnetic end effector 110, and so on. Other examples of general commands include positioning commands, object gripping commands, object release commands, object repositioning commands, and so on. In some embodiments, robotic actuator interface 208 may be configured to receive feedback data from robotic actuator 104. For example, a Hall effect sensor included in magnetic end effector 110 may generate electrical signals that indicate that magnetic end effector 110 has engaged or gripped multiple articles of cutlery, as described herein.
Processing system 102 may also include a processor 210 that may be configured to perform functions that may include generalized processing functions, arithmetic functions, and so on. Processor 210 may also be configured to perform three-dimensional geometric calculations and solve navigation equations in order to determine relative positions, trajectories, and other motion-related and position-related parameters associated with manipulating an article of cutlery by robotic actuator 104.
A user interface 212 may be included in processing system 102. In some embodiments, user interface 212 is configured to receive commands from a user or display information to the user. For example, commands received from a user may be basic on/off commands, and may include variable operational speeds. Information displayed to a user by user interface 212 may include, for example, system health information and diagnostics. User interface 212 may include interfaces to one or more switches or push buttons, and may also include interfaces to touch-sensitive display screens.
In some embodiments, processing system 102 includes an image analysis system 214 that is configured to process images of an article of cutlery captured by, for example, imaging system 106 to identify the article of cutlery. Image analysis system 214 may include subsystems that implement computer vision algorithms for processing the images, as described herein.
In some embodiments, processing system 102 includes a conveyor belt magnet driver 216 that is configured to rotate a magnet under a conveyor belt that carries articles of cutlery. By rotating the magnet, conveyor belt magnet driver 216 individually orients each article of cutlery on the conveyor belt so that the article of cutlery is in a specific spatial orientation relative to an imaging system that is configured to capture an image of the article of cutlery. Details about this embodiment are provided herein.
A data bus 218 interconnects all subsystems associated with processing system 102, transferring data and commands within processing system 102.
Once an object is detected, an object identifier 304 determines a type of the object (i.e., the article of cutlery). For example, object identifier 304 may be configured to determine whether the article of cutlery is a fork, a spoon or a knife. Algorithms that may be implemented by the object identifier 304 are outlined below.
In some embodiments, image analysis system 214 includes a background identifier 306 that is configured to perform image processing on an image captured by imaging system 106 to detect, identify, and discriminate a background relative to an article of cutlery that is presented in field of view 108 by robotic actuator 104. Background identifier 306 effectively allows image analysis system 214 to distinguish an article of cutlery from background information in an image. In some embodiments, the background may be a predetermined background such as a solid-colored background. In other embodiments, it may not be possible to provide a standard background and the background may contain distracting elements that are rendered in the image. In this case, imaging conditions such as lighting and background associated with imaging system 106 capturing an image of article of cutlery 112 may be controllable. Note that background identification may make image analysis more accurate but may be omitted in some embodiments. For example, a black, non-reflective surface may be used so that article of cutlery 112 is more discernible than it would be on a metal work surface, which could obscure the edges of the cutlery articles. This improves the accuracy and robustness as well as the confidence of the inference results from the detection and identification models.
A dirt detector 307 and a damage detector 308 included in image analysis system 214 are respectively configured to determine a presence of dirt or damage on an article of cutlery. Examples of dirt include food soils and stains on the article of cutlery after the article of cutlery has been used. Examples of damage on an article of cutlery include bent or broken tines on a fork, or gouges or pitting on a spoon.
In some embodiments, when magnetic end effector 110 attempts to engage an article of cutlery from a collection (plurality) of articles of cutlery, multiple articles of cutlery that are stuck together may be simultaneously engaged by magnetic end effector 110. For example, a blade of a knife may be stuck within the tines of a fork. A stuck objects detector 310 included in image analysis system 214 is configured to detect and identify such stuck articles of cutlery.
In some embodiments, robotic actuator interface 208 includes a magnetic end effector controller 404 that is configured to issue commands to magnetic end effector 110. For example, if magnetic end effector 110 includes an electromagnet, then magnetic end effector controller 404 issues activate or deactivate commands to magnetic end effector 110. These activate or deactivate commands energize or de-energize the electromagnet, respectively. In other embodiments, if magnetic end effector 110 includes a permanent magnet, then magnetic end effector controller 404 may issue commands to a mechanical apparatus that engages or disengages an article of cutlery using the permanent magnet.
Although embodiments disclosed herein are described as using a magnetic end effector 110, other types of end effectors may be used in its place, such as a mechanical gripper including fingers that may be rotated or moved translationally relative to one another to pinch and release items.
In some embodiments, magnetic end effector 110 includes a Hall effect sensor that is used by silverware identification system 100 to detect whether multiple articles of cutlery have been engaged by robotic actuator 104, as discussed herein. Outputs generated by the Hall effect sensor are received by a Hall effect sensor interface 406 that is included in robotic actuator interface 208. These received outputs from the Hall effect sensor are further processed by processing system 102 to determine whether magnetic end effector 110 has engaged multiple articles of cutlery. In other embodiments, images generated by imaging system 106 are processed by processing system 102 to determine whether magnetic end effector 110 has engaged multiple articles of cutlery. In other embodiments, an inductive sensor may be used in place of the Hall effect sensor in order to detect the presence of multiple articles of cutlery, i.e., an amount of metal present may be sensed by detecting variation in a sensed inductance of an inductive coil incorporated into the magnetic end effector 110. The amount of metal present may then be used to estimate a number of articles of cutlery present, e.g., a predefined mapping between the measured inductance (or resonant frequency) of one or more inductive coils and a number of articles of cutlery present may be determined by measurement and used by the processing system 102 to determine the number of articles of cutlery present for a given measurement. The manner in which the inductive loop, circuit, and resonant frequency sensing is performed may be according to any approach known in the art.
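By way of illustration, the following is a minimal sketch of such a predefined mapping, assuming a small calibration table built by measurement; the sensor values shown are illustrative placeholders, not measured data.

```python
# (measured inductance shift in arbitrary units, number of articles engaged)
CALIBRATION_TABLE = [
    (0.0, 0),
    (1.0, 1),
    (1.9, 2),
    (2.7, 3),
]

def estimate_article_count(measured_shift: float) -> int:
    """Return the calibrated count whose reference measurement is closest to the
    observed inductance shift."""
    _, count = min(CALIBRATION_TABLE, key=lambda entry: abs(entry[0] - measured_shift))
    return count
```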
In some embodiments, processing system 102 may be configured to issue actuation commands to a combination of robotic actuator 506 and magnetic end effector 518, where actuation commands include positioning commands for robotic actuator 506 relative to coordinate system 502, and activation or deactivation commands associated with magnetic end effector 518.
In some embodiments, processing system 102 commands robotic actuator 506 to move in an X-Y direction, to a proximity of a collection of articles of cutlery placed on a work surface 510. In some embodiments, the collection of articles of cutlery may be in a cutlery bin (not shown). When robotic actuator 506 is in a proximity of the collection of articles of cutlery placed on work surface 510, processing system 102 commands robotic actuator 506 to move magnetic end effector 518 in a Z-direction referenced to coordinate system 502, towards the collection of articles of cutlery placed on work surface 510. Processing system 102 then issues a command to activate magnetic end effector 518, so that magnetic end effector 518 engages and grips a random, unidentified article of cutlery from the collection of articles of cutlery placed on work surface 510.
In some embodiments, processing system 102 commands robotic actuator 506 to spatially orient article of cutlery 508 so that article of cutlery 508 is in a field of view of imaging system 106 that is communicatively coupled with processing system 102. Processing system 102 then commands imaging system 106 to capture an image of article of cutlery 508 and transmit the image to processing system 102. Processing system 102 processes the image to identify a type of article of cutlery 508. Once a type of article of cutlery 508 has been identified by processing system 102, processing system 102 commands robotic actuator 506 to move along support rail 504 towards, and deposit article of cutlery 508 at, one of a workspace 512 that contains forks, a workspace 514 that contains spoons, and a workspace 516 that contains knives.
In some embodiments, imaging system 106 continues to capture images of article of cutlery 612; processing system 102 receives these images and processes them to determine and track an orientation of article of cutlery 612 as it rotates under the influence of the magnetic field of rotating magnet 608. When article of cutlery 612 is in a predetermined final orientation as determined by processing system 102, processing system 102 stops the rotation of magnet 608, and article of cutlery 612 assumes a final orientation relative to conveyor belt 602. Conveyor belt 602 then moves article of cutlery 612 away from field of view 606 to make way for a subsequent article of cutlery. Conveyor belt 602 moves away a set of oriented articles of cutlery 616 that are now arranged substantially parallel to one another in a specified orientation.
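By way of illustration, the following is a minimal sketch of this closed-loop orientation step, assuming hypothetical magnet driver, camera, and orientation-estimation interfaces and an illustrative angular tolerance.

```python
import time

ANGLE_TOLERANCE_DEG = 3.0

def orient_article(magnet_driver, camera, estimate_angle, target_angle_deg):
    """Rotate the article under the belt-mounted magnet until it reaches the
    predetermined final orientation, then stop the magnet."""
    magnet_driver.start_rotation()
    while True:
        angle = estimate_angle(camera.capture())          # current orientation from the image
        error = abs((angle - target_angle_deg + 180.0) % 360.0 - 180.0)
        if error <= ANGLE_TOLERANCE_DEG:
            magnet_driver.stop_rotation()                 # article assumes its final orientation
            return angle
        time.sleep(0.02)                                  # wait for the next frame
```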
In some embodiments, magnetic end effector 700 includes a Hall effect sensor 706 that is configured to detect a perturbation or deviation in a magnetic field generated between North pole 702 and South pole 704. When magnetic end effector 700 does not engage an article of cutlery, this magnetic field is not perturbed. When a single article of cutlery is engaged by magnetic end effector 700, the magnetic field between North pole 702 and South pole 704 is perturbed. This perturbation is detected by Hall effect sensor 706. In some embodiments, Hall effect sensor 706 is configured to transmit signals associated with perturbations in the magnetic field between North pole 702 and South pole 704 to Hall effect sensor interface 406.
In some embodiments, if magnetic end effector 700 engages multiple articles of cutlery, a corresponding perturbation in the magnetic field between North pole 702 and South pole 704 is different from the perturbation in this magnetic field corresponding to when a single article of cutlery is engaged by magnetic end effector 700. Hall effect sensor 706 correspondingly outputs a different signal when multiple articles of cutlery are engaged as compared to when a single article of cutlery is engaged. This difference in signals output by Hall effect sensor 706 is detected by processing system 102 to determine whether multiple articles of cutlery are engaged by magnetic end effector 700. In an event that multiple articles of cutlery are engaged by magnetic end effector 700, processing system 102 may command magnetic end effector 700 to release the multiple articles of cutlery and attempt to re-engage a single article of cutlery.
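By way of illustration, the following is a minimal sketch of interpreting the Hall effect sensor output and re-engaging when multiple articles are detected; the threshold values and the robot and sensor interfaces are assumptions that would be set by calibration.

```python
SINGLE_ITEM_MIN = 0.5   # calibrated perturbation magnitude for one article (arbitrary units)
MULTI_ITEM_MIN = 1.4    # calibrated perturbation magnitude for two or more articles

def classify_engagement(hall_reading: float) -> str:
    if hall_reading < SINGLE_ITEM_MIN:
        return "none"
    if hall_reading < MULTI_ITEM_MIN:
        return "single"
    return "multiple"

def engage_single_article(robot, hall_sensor, max_attempts=5) -> bool:
    """Retry until exactly one article of cutlery is gripped."""
    for _ in range(max_attempts):
        robot.engage()
        if classify_engagement(hall_sensor.read()) == "single":
            return True
        robot.release()            # release multiple (or zero) articles and re-engage
    return False
```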
If the method determines that multiple articles of cutlery are engaged, then the method goes to A and continues as described herein.
On the other hand, if method 1200 is at B, the method goes to 1218, where the processing system identifies a type of the article of cutlery based on the image. In some embodiments, aligning of the article of cutlery in a specific orientation by the robotic actuator and the predetermined background allow the processing system to detect and identify the article of cutlery with a higher confidence level. Finally, at 1220, the processing system detects a presence of dirt or damage on the article of cutlery based on the image. The method 1200 then terminates at 1222.
Note that step 1218 may be another application of an object detector that detects an article of cutlery, finds a bounding box associated with the article of cutlery, and classifies the article of cutlery to obtain a label representing the type of article of cutlery. The dirt and damage detection of step 1220 may also be another application of the dirt detection previously described herein.
For a pile of many articles of cutlery, the object detector may or may not locate graspable cutlery pieces. Even when the object detector produces bounding boxes for some cutlery articles, those pieces may be buried under other pieces, and the bounding boxes may be associated with very low confidence scores. When the bounding boxes are associated with confidence scores below a predetermined threshold, or when there are no bounding boxes at all, the processing system 102 may instruct the robotic actuator 104 to stir the pile of cutlery articles and subsequently execute the object detector on the stirred pile. This step could be repeated until cutlery articles are detected that produce bounding boxes with high confidence scores.
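By way of illustration, the following is a minimal sketch of this detect-stir-retry loop, assuming hypothetical detector and robot interfaces and an illustrative confidence threshold.

```python
CONFIDENCE_THRESHOLD = 0.7

def find_graspable_article(robot, camera, detector, max_stirs=5):
    """Return the highest-confidence detection, stirring the pile when none qualify."""
    for _ in range(max_stirs + 1):
        detections = detector.detect(camera.capture())    # objects with .score and .box
        confident = [d for d in detections if d.score >= CONFIDENCE_THRESHOLD]
        if confident:
            return max(confident, key=lambda d: d.score)
        robot.stir_pile()                                 # rearrange the pile and try again
    return None
```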
The stuck object detector 310 may be implemented in multiple ways. In one embodiment, a classification model receives a region of an image in which the stuck objects are shown. Such a region is an output from the object detection in the form of a bounding box. The model generates a predicted label with two categories: one indicates that the objects in the input image are not stuck together (or that there is only a single object), and the other indicates that the objects are stuck together. Examples of classification models include, but are not limited to, deep CNN architectures such as ResNets, DenseNets, SENets, and their variations.
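By way of illustration, the following is a minimal sketch of such a two-category classifier over a cropped bounding-box region, using a ResNet backbone as one example of the architectures named above. The input size, normalization constants, and untrained weights are assumptions, and transform behavior may differ slightly between torchvision versions.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Class 0: single article / not stuck; class 1: articles stuck together.
model = models.resnet18(num_classes=2)
model.eval()   # weights would come from training on annotated bounding-box crops

preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((224, 224)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def is_stuck(crop_rgb) -> bool:
    """crop_rgb: HxWx3 uint8 region cut out of the image using a detected bounding box."""
    with torch.no_grad():
        logits = model(preprocess(crop_rgb).unsqueeze(0))
        probs = torch.softmax(logits, dim=1)[0]
    return bool(probs[1] > probs[0])
```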
Next, at 1306, the processing system commands a robotic actuator to perform mechanical actions on the portions to separate individual articles of cutlery. For example, the processing system may command the robotic actuator to stir the articles of cutlery as described in method 1200. Or, the processing system may command the robotic actuator to engage the articles of cutlery that are stuck together and mechanically manipulate the articles of cutlery to separate them. In some embodiments, this mechanical manipulation is done by physically agitating or shaking the articles of cutlery that are stuck together. In other embodiments, the robotic actuator may use different kinds of grippers to separate the articles of cutlery that are stuck together.
At 1308, the processing system commands the robotic actuator to engage an individual article of cutlery. In some embodiments, the robotic actuator engages 1310 an individual article of cutlery using a magnetic end effector such as magnetic end effector 110. At 1312, the robotic actuator presents the article of cutlery in a field of view of the imaging system. Next, at 1314, the imaging system captures an image of the article of cutlery. Finally, at 1316, the processing system identifies a type of the article of cutlery based on the image. Step 1316 may include performing the functions of the object identifier 304 with respect to the image, as described herein.
For example, an image 1400 may be received by the processing system 102 from the imaging system 106. The object detector 302 identifies two-dimensional (2D) bounding boxes 1402 of objects present in the image 1400 and classifies the object within each bounding box, such as a fork, spoon, knife, or other item of cutlery.
The input to the object detector 302 may be composed of one or more images 1400, and the model produces 2D bounding boxes of the cutlery articles of interest in the form of the center, width, and height of the 2D bounding box, e.g., these values may be represented as pixel coordinates within the image 1400. Note that the sides of the bounding box 1402 may be parallel to the sides of the image 1400. Other references to a 2D bounding box herein below may be defined in a similar manner. The object detector model 302 can be a single-stage or multi-stage CNN such as Faster R-CNN, SSD, YOLOv3, or Mask R-CNN.
Each bounding box may also have a confidence score generated by the machine vision algorithm. The confidence score may indicate a probability that an object is present in the 2D bounding box and may additionally or alternatively indicate a confidence that a classification of the object present in the 2D bounding box is correct.
The object detector 302 may be trained before it is deployed for making online inference in the system 100. The data for training may include training images in which articles of cutlery are placed, with the bounding box and class label of each object annotated by humans. During training, both the annotated bounding boxes and the image are input to the object detector 302, from which the estimated 2D bounding boxes 1402 are produced. The object detector 302 is trained until the magnitude of the difference between the estimated bounding boxes 1402 and the human-annotated bounding boxes becomes lower than a specified threshold. The magnitude of the difference may be measured by a mean squared difference of the two bounding boxes. The minimization may be performed by stochastic gradient descent algorithms or their variants. There exist several popular software libraries that may be used to perform training, such as PyTorch and TensorFlow.
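By way of illustration, the following is a minimal sketch of one training step using the torchvision implementation of Faster R-CNN named above; the class count (background plus spoon, fork, and knife), the hyperparameters, and the exact constructor arguments (which differ between torchvision versions) are assumptions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Four classes: background plus spoon, fork, and knife.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

def training_step(images, targets):
    """images: list of CxHxW float tensors; targets: list of dicts containing
    human-annotated 'boxes' (Nx4 pixel coordinates) and 'labels' (N,) tensors."""
    loss_dict = model(images, targets)      # torchvision returns per-component losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```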
The imaging system 106 (e.g., cameras) may be calibrated relative to the robotic actuator 104. The mathematical transformation between the imaging system 106 and the robotic actuator 104 may be generated as a result of calibration. Such a calibration is known as robot hand-eye coordination and may be performed using software libraries that are widely available. The calibration output may be used to map image pixel coordinates in the image 1400 to physical coordinates in the workspace 114 and to position and orient the robotic actuator such that the actuator can grasp the target cutlery at an intended location with a desired orientation. The orientation of an item of cutlery may be determined using the approach described below.
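By way of illustration, the following is a minimal sketch of mapping pixel coordinates to workspace coordinates once calibration has produced corresponding point pairs; it assumes a planar work surface so that a single homography suffices, and the point values shown are illustrative only.

```python
import cv2
import numpy as np

# Pixel locations of calibration markers and their known workspace coordinates (mm).
pixel_pts = np.array([[100, 80], [520, 90], [510, 400], [110, 410]], dtype=np.float32)
world_pts = np.array([[0, 0], [300, 0], [300, 200], [0, 200]], dtype=np.float32)

H, _ = cv2.findHomography(pixel_pts, world_pts)

def pixel_to_workspace(u: float, v: float):
    """Map a pixel coordinate in the image to a coordinate on the work surface."""
    point = np.array([[[u, v]]], dtype=np.float32)
    x, y = cv2.perspectiveTransform(point, H)[0, 0]
    return float(x), float(y)
```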
The object detector may be trained in various ways to handle the case of cluttered cutlery in which items may be on top of one another.
In another approach, only the articles of cutlery that are in the pile and are not occluded by other articles of cutlery are labeled with a bounding box and used to train a detector.
In yet another embodiment, only the articles of cutlery that are not buried in the pile and can be manipulated are labeled with a bounding box and used to train a detector.
Given a mask 1502 of an article of cutlery, an oriented bounding box (OBB) calculator 1504 uses geometric computer vision to find an OBB 1506 that tightly encloses the mask 1502. One way to find the OBB 1506 for the mask is to use the function minAreaRect from the open source computer vision library, OpenCV. This function utilizes a computational geometry method called rotating calipers to find the rectangle with the smallest area that encloses the given mask 1502. Given the OBB 1506, a polarity calculator 1508 can be utilized to identify the polarity 1510 of the cutlery enclosed in the OBB 1506. For example, this may include determining the angle of the OBB 1506 using the OBB calculator 1504, followed by determining the polarity of the cutlery within the OBB 1506 to avoid confusion as to the orientation of the article of cutlery in the OBB 1506. For example, in the illustrated example, polarity 1510 output by the polarity calculator 1508 indicates which end of the OBB 1506 is closest to the handle of the illustrated spoon. In other examples, polarity calculator 1508 indicates which end of the OBB 1506 is closest to the handle of a fork, knife, or other type of cutlery. In some embodiments, the polarity calculator 1508 determines the end of the OBB closest to the bowl of a spoon, tines of a fork, blade of a knife, etc., rather than the end closest to the handle. The polarity calculator may be a CNN trained to identify polarity or a machine vision algorithm that compares predefined masks for different types of cutlery to the mask 1502 to estimate the polarity of the cutlery represented by the mask 1502.
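By way of illustration, the following is a minimal sketch of the minAreaRect-based OBB computation described above, assuming the mask is a binary array with non-zero pixels on the article of cutlery.

```python
import cv2
import numpy as np

def oriented_bounding_box(mask: np.ndarray):
    """Return the minimum-area rotated rectangle ((cx, cy), (w, h), angle) enclosing
    the non-zero pixels of a binary mask, plus its four corner points."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    rect = cv2.minAreaRect(largest)     # rotating-calipers minimum-area rectangle
    corners = cv2.boxPoints(rect)       # corner coordinates, useful for grasp planning
    return rect, corners
```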
In an alternative embodiment, the OBB 1506 of cutlery in a bounding box 1402 is calculated directly by the object detector 302, i.e. the object detector 302 is or includes a CNN trained to perform that function without first computing a mask 1502. Accordingly, the detector would propose an OBB 1506 that may be represented as (x, y, theta, height, width) with theta representing the angle of the OBB. The polarity calculator 1508 may then be a simple classifier that identifies the polarity as discussed above except that it is trained to operate on images of cutlery rather than masks of images.
Note that the approach described above for determining the OBB 1506 and polarity 1510 of an article of cutlery may be used instead of performing oriented and ordered placement of cutlery during sorting in other embodiments disclosed herein.
Once the OBB 1506 and polarity of an item of cutlery are known, the processing system 102 may control the robotic actuator 104 to grasp and move the article of cutlery according to the methods described above based on the known orientation and polarity, e.g., engaging the end effector with the handle of the article of cutlery.
In another alternative embodiment, an OBB 1506 and polarity 1510 are determined by the detector 302.
In some embodiments, identification of 2D bounding boxes 1402 of objects may be performed using a first CNN trained to perform that task and classification of the object enclosed by a 2D bounding box may be performed by a second CNN trained to perform that task. Multiple second CNNs may be used, each second CNN trained to output whether a particular type of cutlery is present. Alternatively, a single CNN may be trained to both detect and classify cutlery and thus function as both the object detector 302 and the object identifier 304.
The output of the object identifier 304 may be an object label 1700 for each bounding box 1402 that indicates the classification of the cutlery present in the bounding box 1402. The object label 1700 may further include a confidence score indicating a probability of accuracy of the label.
The object identifier 304 may be implemented in the form of an image classifier based on convolutional neural network (CNN) models including, but not limited to, ResNets, DenseNets, SENets, and their variants. The input to the object identifier 304 may include one or more images of an item of cutlery, which are then processed by the CNN models in order to infer one or more classifications. In particular, bounding boxes 1402 of an item of cutlery from one or more cameras may be processed in order to classify the item of cutlery represented in the one or more bounding boxes 1402.
These CNN models may be trained in a similar manner to the object detector 302 using software libraries such as PyTorch or TensorFlow. The data for training the models may include images of cutlery articles annotated with the category labels (e.g., large fork or a number that represents such a class or category) of the cutlery represented in the images. When the model is deployed in the system 100 to generate an inference result, the input to such a model may be a (cropped) image region that contains only a single article of cutlery, i.e., a single bounding box 1402.
In some embodiments, the input to the object identifier 304 is the portion of an image in the OBB 1506 of an item of cutlery either with or without annotation with the polarity 1510. For example, the x, y, height, width, and theta values defining the OBB 1506 may be input along with the image 1400 to the object identifier, which then classifies the portion of the image enclosed by the OBB 1506.
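By way of illustration, the following is a minimal sketch of extracting the image region enclosed by an OBB so that it can be passed to the object identifier 304; the OBB parameterization as (cx, cy, theta, height, width) in pixels follows the representation described above, and this cropping strategy is one of several reasonable choices.

```python
import cv2
import numpy as np

def crop_obb(image: np.ndarray, cx: float, cy: float,
             theta_deg: float, height: float, width: float) -> np.ndarray:
    """Rotate the image so the OBB becomes axis-aligned, then slice out its interior."""
    M = cv2.getRotationMatrix2D((cx, cy), theta_deg, 1.0)        # rotate about the OBB center
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    x0 = max(int(round(cx - width / 2.0)), 0)
    y0 = max(int(round(cy - height / 2.0)), 0)
    return rotated[y0:y0 + int(round(height)), x0:x0 + int(round(width))]
```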
The object detector 1800 may be a machine vision algorithm trained to identify a particular type of anomaly or multiple types of anomalies. The machine vision algorithm may be a CNN or other type of machine learning algorithm. The CNN may be trained to both identify and classify anomalies. Alternatively, a first CNN may be trained to identify 2D bounding boxes of anomalies and a second CNN may be trained to label each 2D bounding box with the anomaly bounded by it. Note that multiple second CNNs may be used, each second CNN trained to output whether a particular type of anomaly is present in a 2D bounding box.
The output of object detector 1800 is a 2D bounding box 1802 around any anomalies detected. Each 2D bounding box may be labeled with a type of the anomaly (damage, dirt, type of contaminant, type of damage) bounded by the 2D bounding box.
The object detector 1800 may be implemented as a detector, a classifier, or both. The object detector 1800 may be trained in an identical way to that used for training object detectors 302 and object identifiers 304. When the models of the object detector 1800 are embodied as classifiers, the input to such models is an image region containing only one piece of cutlery, often produced by an object detector 302 in an earlier stage (e.g., the area in bounding box 1402 or OBB 1506). The output from such models is, at the simplest level, a label representing whether the article of cutlery in the input image is clean, dirty, or damaged. In an alternative embodiment, the labels can be more fine-grained. For example, the labels could be clean, modestly dirty, or very dirty, to suggest the amount of additional cleaning required. Similarly, there can be a variety of types of damage, such as bent items, broken tines (for a fork), scratches, and so on, to inform and suggest further actions to the end users.
When the object detector 1800 is a detector rather than (or in addition to) a classifier, the models again receive a region of the image which contains only one piece of cutlery. The output in this case is a set of bounding boxes 1802 that show dirty or damaged spots, and each box is associated with a label indicating whether it represents dirt, damage, or possibly both. Another embodiment, similar to the classifier models, may have more fine-grained labels such as the type of damage (e.g., broken tines). These outputs could be used to guide additional user actions such as further cleaning, discarding, or repair.
In some embodiments, the visualization algorithm 1900 is implemented as a neural network class activation map visualization tool such as Grad-CAM(++), which produces the heat map 1902. The heat map 1902 may then be used to infer the sizes of dirty spots, which are approximately proportional to the sizes of the hot spots in the heat map 1902.
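By way of illustration, the following is a minimal sketch of estimating dirty-spot sizes from such a heat map; the threshold is an assumption, and the heat map is assumed to be normalized to the range [0, 1].

```python
import cv2
import numpy as np

def dirty_spot_areas(heat_map: np.ndarray, threshold: float = 0.6):
    """Return the pixel area of each connected hot spot in a [0, 1] heat map."""
    hot = (heat_map >= threshold).astype(np.uint8)
    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(hot, connectivity=8)
    # Row 0 of stats is the background; CC_STAT_AREA holds the pixel count per component.
    return [int(stats[i, cv2.CC_STAT_AREA]) for i in range(1, num_labels)]
```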
As an alternative, the image 2104 may be processed by an image classifier 2112 of the processing system 102 that is trained to output a number of objects detected in an image or whether an image includes no objects, a single item of cutlery, or multiple items of cutlery. For example, the image classifier 2112 may be a CNN trained with images that are each annotated with the scenario present in the image (no objects, a single item of cutlery, or multiple items of cutlery) or the number of items present in the image. The result of the image classifier 2112 is therefore an estimate 2114 of the number of objects present.
The scattered items 2106 of cutlery may then be grasped by the magnetic end effector 110 or by a separate end effector that may be magnetic or non-magnetic, e.g., an actuated gripper or other type of end effector. Note also that the magnetic end effector 110 may be embodied as an actuated gripper (e.g., the pinch gripper discussed above) or other type of end effector that may engage individual items of cutlery or multiple items of cutlery according to the methods described above.
While various embodiments of the present disclosure are described herein, it should be understood that they are presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. The description herein is presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the disclosed teaching. Further, it should be noted that any or all of the alternate implementations discussed herein may be used in any combination desired to form additional hybrid implementations of the disclosure.