The ability to detect objects, such as a person or a can of soda, enables applications that would not otherwise be possible, such as self-driving cars and pallet verification. The challenge in creating object detectors is that they require a large number of labeled images, which must be generated by hand. Ideally, an object detector would be trained with a set of labeled images captured at every possible angle, under every lighting condition, and with every camera and camera setting with which an object could be captured. In lieu of that ideal training set, a subset that is representative of the ideal set can be used to train object detectors by taking advantage of their ability to generalize. That is, an object detector trained only on a representative subset of the ideal set would still be able to detect all objects in the ideal set.
An example training data image capture system disclosed herein includes a support surface on which an object to be imaged would be supported. At least one camera is mounted proximate the support surface and positioned to image an object on the support surface. More than one camera could also be used to capture more images more quickly.
At least one light is directed toward the support surface, where the object would be located. Preferably, a plurality of lights are directed toward the support surface.
A computer is programmed to vary the lighting conditions from the at least one light and to record a plurality of images from the camera at a plurality of lighting conditions from the at least one light.
The computer may further be programmed to cause relative movement between the camera and the support surface between the plurality of images. For example, the computer may be programmed to cause the support surface to rotate relative to the camera. The computer may also be programmed to cause relative rotation between the camera and the object about a horizontal axis. For example, the camera may move along an arc relative to the support surface. The computer and camera record at least one of the plurality of images at each of a plurality of positions of the camera along the arc. The camera may be movable at least 90 degrees on the arc relative to the support surface.
The camera and computer record at least one of the plurality of images at each of a plurality of rotational positions of the support surface (and the object). The system may further include a backlight below the support surface, in which case the support surface may be translucent.
If a plurality of lights are used, the computer is programmed to control the plurality of lights to vary the intensity of each light independently, using different intensities and different combinations of intensities from the different lights for each of the plurality of images.
A method disclosed herein for creating training data includes capturing a plurality of images of an object at a plurality of angles and capturing the plurality of images of the object under a plurality of different lighting conditions. The method may further include training a machine learning model based upon the plurality of images. The method steps may be controlled by a computer and the images recorded by the computer.
The method may further include providing relative motion between a camera and the object and recording images at varying relative positions. The computer may cause relative motion between the camera and the object and cause the camera to capture the plurality of images at the plurality of angles. The computer may cause at least one light to illuminate the object at a variety of intensities and cause the camera to capture the plurality of images at the variety of intensities.
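As a non-limiting illustration, the nested structure of this capture method could be expressed as follows. The camera, light, turntable, and arm interfaces and their methods are hypothetical placeholders, not part of this disclosure:

```python
# Minimal sketch of the capture loop, assuming hypothetical hardware
# interfaces (camera, lights, turntable, arm) with the methods shown.

def capture_training_images(camera, lights, turntable, arm,
                            arc_angles, spin_angles, lighting_conditions):
    """Record one image per (arc angle, spin angle, lighting condition)."""
    images = []
    for arc_angle in arc_angles:          # rotation about a horizontal axis
        arm.move_to(arc_angle)
        for spin_angle in spin_angles:    # rotation about a vertical axis
            turntable.rotate_to(spin_angle)
            for intensities in lighting_conditions:
                # Each light is varied independently (including fully off).
                for light, level in zip(lights, intensities):
                    light.set_intensity(level)
                images.append(camera.capture())
    return images
```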
The example imaging station described herein is designed to capture a representative subset of the ideal set. It is designed primarily for mostly-non-deformable objects, that is, objects that mostly do not move, flex, or distort, such as a can or plastic bottle of soda. Somewhat deformable objects could also be imaged.
The imaging station may do this in a scalable fashion in two main parts: first, by automatically capturing images of an object at most angles, under many different lighting conditions, and with a few different cameras; and second, by automatically segmenting the object from the background, which segmentation is used to automatically create labels for the object. The imaging station may also be designed to capture the weight and dimensions of an object.
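For instance, once a binary segmentation mask is available, a bounding-box label can be derived mechanically. The sketch below is only one possible realization, assuming a NumPy mask and an illustrative plain-text label format not taken from this disclosure:

```python
import numpy as np

def mask_to_bbox_label(mask: np.ndarray, class_name: str) -> str:
    """Derive a bounding-box label from a binary segmentation mask.

    mask: 2-D array, nonzero where the object is, zero elsewhere.
    Returns an illustrative 'class x_min y_min x_max y_max' line; any
    detector's label format could be emitted instead.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no object pixels")
    return f"{class_name} {xs.min()} {ys.min()} {xs.max()} {ys.max()}"
```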
The example imaging station includes a motorized camera mover that moves the camera in such a way that it captures all vertical angles of an object. The imaging station also includes a motorized pedestal that spins about a vertical axis, allowing the camera to see the object from all horizontal angles. The combination of these two devices allows the imaging station to see most angles of the object.
To capture many different lighting conditions, the example imaging station includes a set of lights in many different positions around the object. The goal is to simulate directional lighting, glare, soft lighting, hard lighting, low lighting, and bright light scenarios. The imaging station may also include a device that can cast shadows on an object.
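One way to organize such lighting scenarios is as named presets of per-light intensities. The counts and values below are purely illustrative assumptions, not specified by this disclosure:

```python
# Hypothetical intensity presets (0.0 = off, 1.0 = full power) for a
# station assumed to have six directional soft lights and four glare lights.
SOFT = 6    # number of directional soft lights (assumed)
GLARE = 4   # number of glare lights (assumed)

LIGHTING_SCENARIOS = {
    "directional": [1.0] + [0.0] * (SOFT - 1) + [0.0] * GLARE,  # one soft light only
    "glare":       [0.2] * SOFT + [1.0] * GLARE,                # glare lights dominant
    "soft":        [0.5] * SOFT + [0.0] * GLARE,                # diffuse, even lighting
    "hard":        [0.0] * SOFT + [1.0] + [0.0] * (GLARE - 1),  # single point source
    "low":         [0.1] * SOFT + [0.0] * GLARE,                # dim ambient
    "bright":      [1.0] * SOFT + [1.0] * GLARE,                # everything at maximum
}
```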
To capture images with a few different cameras, a mounting device can be attached to the camera mover, allowing for the attachment of a few different cameras. The camera settings for each camera can be automatically programmed.
To automatically segment the object from the background, the imaging station includes semi-transparent, smooth screens that are back-lit using powerful lights. Being back-lit helps segment white objects on a white background. The back-lit screens may take advantage of a camera's Auto White Balance (AWB) feature, which adjusts an image so that the brightest white pixel is true-white, while all other whites appear to be a shade of gray. This creates a visual separation between the white object and the white background, which makes it possible to segment the object from the background. The rotating pedestal is also made of a semi-transparent material that is lit from the inside. The floor surrounding the pedestal may also be made of a semi-transparent material that is back-lit.
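A minimal segmentation sketch follows, assuming the back-lit background saturates to near true-white while the object renders at least slightly darker; the threshold value is an illustrative assumption:

```python
import cv2
import numpy as np

def segment_backlit_object(image_bgr: np.ndarray,
                           white_thresh: int = 250) -> np.ndarray:
    """Segment an object against a back-lit, near-true-white background.

    Assumes auto white balance maps the back-lit screen to nearly 255 in
    all channels, so any pixel darker than `white_thresh` is treated as
    object. Returns a binary mask (255 = object, 0 = background).
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Pixels darker than the saturated background belong to the object.
    _, mask = cv2.threshold(gray, white_thresh - 1, 255, cv2.THRESH_BINARY_INV)
    # Remove small speckles with a morphological open (kernel size assumed).
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```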
The imaging station may also include a scale underneath the pedestal to capture the weight of an object.
One of the cameras mounted to the motorized camera mover may be a depth camera that produces a depth map in meters for each image. Using this depth map and the object segmentation, a 3-dimensional point cloud of the object is generated in real-world coordinate space. This point cloud allows the user to obtain the length, width and height of the object.
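A sketch of that computation follows, assuming a pinhole depth camera with known intrinsics (fx, fy, cx, cy); note the axis-aligned extents only approximate length, width, and height when the object is roughly aligned with the camera axes:

```python
import numpy as np

def object_dimensions(depth_m: np.ndarray, mask: np.ndarray,
                      fx: float, fy: float, cx: float, cy: float):
    """Back-project masked depth pixels into a 3-D point cloud and
    return its axis-aligned extents, in meters.

    depth_m: per-pixel depth in meters; mask: nonzero on the object;
    fx, fy, cx, cy: pinhole intrinsics of the depth camera (assumed known).
    """
    ys, xs = np.nonzero(mask)
    z = depth_m[ys, xs]
    valid = z > 0                    # drop pixels with no depth reading
    xs, ys, z = xs[valid], ys[valid], z[valid]
    x = (xs - cx) * z / fx           # pinhole back-projection
    y = (ys - cy) * z / fy
    cloud = np.stack([x, y, z], axis=1)
    extents = cloud.max(axis=0) - cloud.min(axis=0)
    return tuple(extents)
```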
Surrounding the pedestal 2 are a plurality of directional soft lights 5 that cause directional lighting, and a plurality of glare lights 6 that cause glare. Behind the pedestal 2 is a semitransparent/translucent screen 3 and its corresponding back-light 9. A computer 7 is programmed to control the entire imaging station, including all of the lights 5, 6 (controlling independently for each light whether, when, and how much to illuminate), the rotation of the pedestal 2, the position of the shuttle 18 and camera 22 on the arc 20, and the operation of the camera 22. The computer 7 also records the images from the camera 22. The computer 7 includes a processor and storage containing suitable programs which, when executed by the processor, perform the functions described herein.
Referring to the flowchart, in step 34 the computer 7 records an image of the object 8 with the camera 22.
In step 36, the computer 7 controls the lights 5, 6, 9 to vary their intensity independently and to different degrees (including completely off) to produce a variety of lighting conditions on the object 8. At each lighting condition (1 to x), the computer 7 records another image of the object 8 with the camera 22 in step 34.
After all of the lighting conditions have been imaged at the first position, the computer 7 then controls the pedestal 2 and motor 28 to provide relative rotation about a vertical axis between the camera 22 and the object 8 in step 38. The computer 7 then images the object 8 again at every lighting condition, repeating steps 34-36 at this rotational position.
After all of the rotational positions (1-y) have been imaged at all the lighting conditions (1-x), the camera 22 is moved along the arc 20 (again as controlled by the computer 7) in step 40 to the next relative position about a horizontal axis. At each position of the camera 22 along the arc 20, the pedestal 2 rotates the object 8 through a plurality of positions (1-y) and again, at each rotational position of the object 8, the computer 7 controls the lights 5, 6 to provide the variety of different lighting (optionally shadows) on the object 8, including glare or diffuse lighting (lighting conditions 1-x). Then the camera 22 is moved to the next position on the arc 20 in step 40 and so on. This is repeated for a plurality of positions of the camera 22 all along the arc, from less than zero degrees (i.e. looking up at the object 8) to 90 degrees (i.e. looking straight down onto the object 8). Of course, optionally, less than all of the permutations of the lighting conditions, vertical axis rotational positions, and horizontal axis rotational positions could be used.
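The full sweep therefore records one image per permutation of lighting condition (1-x), vertical-axis position (1-y), and arc position. Where fewer images suffice, the permutation grid can be subsampled; the sketch below is one possible approach, with all names hypothetical:

```python
import itertools
import random

def sample_capture_plan(arc_angles, spin_angles, lighting_conditions,
                        fraction=1.0, seed=0):
    """Enumerate (arc, spin, lighting) permutations for the capture sweep,
    optionally keeping only a random fraction of them."""
    plan = list(itertools.product(arc_angles, spin_angles, lighting_conditions))
    if fraction < 1.0:
        rng = random.Random(seed)
        plan = rng.sample(plan, max(1, int(len(plan) * fraction)))
    return plan
```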
The plurality of images of the object 8 are collected in this manner by the computer 7 and sent to the machine learning model 21 for training.
In accordance with the provisions of the patent statutes and jurisprudence, exemplary configurations described above are considered to represent preferred embodiments of the inventions. However, it should be noted that the inventions can be practiced otherwise than as specifically illustrated and described without departing from their spirit or scope. Unless otherwise specified in the claims, alphanumeric labels on method steps are for ease of reference in dependent claims and do not indicate a required sequence.
Number | Date | Country
---|---|---
63014243 | Apr 2020 | US