Man-made environments are generally endowed with a preferred direction corresponding to the local orientation of Earth's gravity field. In simple terms, “up” and “down” define natural engineering directions for indoor settings (e.g., rooms) and outdoor settings (e.g., streets). Floors, walls, and ceilings are strongly constrained by the direction of local gravity. In particular, man-made environments are usually populated by horizontal Z-planes (e.g., tabletops, chair seats, floors, sidewalks).
As a person walks around a man-made environment while holding a 3D time-of-flight (TOF) imaging system in hand, the sensor's angular orientation relative to the natural “up” and “down” directions is typically unknown. Humans do not reliably align imaging systems to their environments, and having users align and re-align sensors to match their environment can be a time-consuming and frustrating process.
One potential application of TOF imaging systems is determining the dimensions of boxes. Measuring volumes of physical objects is a basic problem for various industrial and consumer markets, such as packing, shipping, and storage of objects. In typical packing and shipping contexts, humans use tape measures to measure box dimensions, which is a time-consuming process. Existing technical solutions are often fragile, expensive, and/or can only be used in certain settings. For example, some dimensioning solutions rely on fixed frames of reference, e.g., deriving the volume of a box placed on a designated surface from an image taken by a camera at a fixed position relative to the designated surface.
To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:
Overview
The systems, methods and devices of this disclosure each have several innovative aspects, no single one of which is solely responsible for all of the desirable attributes disclosed herein. Details of one or more implementations of the subject matter described in this specification are set forth in the description below and the accompanying drawings.
Reliable identification of Z-planes (e.g., a floor, a street, a tabletop) in an environment can be useful in many two-dimensional and three-dimensional image processing applications. In particular, it is useful to determine a time-of-flight sensor's roll and pitch angles relative to a Z-plane, as well as the sensor's height relative to Z-planes in its environment. As used herein, a Z-plane is a plane in a real-world environment that is parallel to the ground in a particular environment. Z-planes include the ground or floor, and surfaces parallel to the ground or floor. In many cases, the Z-plane is perpendicular to the direction of gravity. In some cases, e.g., on a hill or other slanted surface, Z-planes (e.g., the ground, a table resting on the ground) may be somewhat tilted with respect to the direction of gravity.
A base Z-plane is the lowest Z-plane within an image that captures an environment. For example, in an image of an environment that includes a box resting on a table placed on a floor, the box top, tabletop, and floor are all Z-planes, and the floor is the base Z-plane. If another image includes the box and the tabletop but does not include the floor, the tabletop is the base Z-plane for that image.
Methods and systems for identifying Z-planes in an environment and, in some cases, identifying a base Z-plane in an environment, are described herein. The method involves extracting parameters for the roll and pitch rotation angles relative to a Z-plane. In some embodiments, the method also extracts a parameter for the height of the sensor relative to the base Z-plane from a single input TOF depth frame. Once the two rotation parameters and the height (translation) parameter are extracted, the number of a priori unknown extrinsic camera calibration parameters is reduced from six (3 translations + 3 rotation angles) to three (2 translations + 1 rotation angle). Time-of-flight applications become easier and faster for processing systems to handle when the number of unknown sensor degrees of freedom is reduced in this way. In addition, aligning coordinate system axes to a Z-plane simplifies time-of-flight imagery exploitation in several applications, such as box dimensioning, object dimensioning, box packing, or obstacle detection.
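As a minimal sketch of what the extracted parameters look like, the Python below assumes the upward unit normal of the Z-planes has already been found in the sensor frame, and that the sensor axes follow the common optical convention of x right, y down, and z along the optical axis. The function names and this roll/pitch convention are illustrative assumptions rather than the disclosure's definitions; other conventions change the signs and axes involved.

```python
import math


def roll_pitch_from_up(n):
    """Return (roll, pitch) in radians from the upward Z-plane normal n in the sensor frame.

    Assumed axes: x right, y down, z along the optical axis, so a level sensor
    measures the upward normal as (0, -1, 0) and gets roll = pitch = 0.
    Roll is undefined when "up" is parallel to the optical axis.
    """
    nx, ny, nz = n
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    roll = math.atan2(nx, -ny)                  # tilt of "up" within the sensor's x-y plane
    pitch = math.atan2(nz, math.hypot(nx, ny))  # tilt of "up" toward the optical axis
    return roll, pitch


def sensor_height(base_points, n):
    """Mean height of the sensor (at the origin) above points on the base Z-plane."""
    nx, ny, nz = n
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    # Base-plane points lie below the sensor, so their projections onto "up" are negative.
    proj = [(x * nx + y * ny + z * nz) / norm for x, y, z in base_points]
    return -sum(proj) / len(proj)


print(roll_pitch_from_up((0.0, -1.0, 0.0)))  # (0.0, 0.0)
print(sensor_height([(0.0, 1.2, 2.0), (0.5, 1.2, 3.0)], (0.0, -1.0, 0.0)))  # 1.2
```

Together with the two rotation angles, this single height value is what removes three of the six extrinsic unknowns.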
Methods and systems for measuring dimensions of a box are also described herein. One method involves receiving a three-dimensional point cloud obtained from time-of-flight data and identifying a box within the point cloud. In particular, the method includes identifying a box top within the point cloud, and then identifying a surface on which the box is resting, such as a tabletop or the floor. The method then includes calculating the height of the box as the distance between the box top and the surface on which the box is resting, and identifying edges of the box top. The method then includes calculating width and length profiles for the edges, and determining a width and a length for the box based on the width and length profiles. Quantitative height, width, and length values, e.g., measured in centimeters, may be reported to a user, e.g., on a display of a TOF measurement device. In some examples, the device also generates a visualization of the identified box superimposed on an image of the box so that the user may qualitatively confirm the calculated dimensions.
Existing box dimensioning solutions are typically highly vulnerable to sunlight because sunlight creates significant noise in TOF data or image data. Prior box dimensioning systems were suitable only for indoor use or for use under particular lighting conditions. In some embodiments described herein, TOF measurement data is filtered to reduce the impact of ambient-light noise, enabling the TOF sensor system to be used in a variety of ambient lighting conditions, both indoors and outdoors. In one example, the measurement data is filtered at a first stage for identifying Z-planes in the observed environment. Because Z-planes are relatively large, an aggressive filter (e.g., a large filter window) can be used at this stage. As noted above, after the box has been identified using the Z-planes, the box edges are identified. A finer filter (e.g., a smaller filter window) may be used when filtering the measurement data to find the box edges, since this stage requires more precision.
One embodiment provides a method for identifying a Z-plane. The method includes receiving distance data describing distances between a sensor that captured the distance data and a plurality of surfaces in an environment of the sensor, where at least one of the surfaces is a Z-plane; generating a point cloud based on the distance data, the point cloud in a frame of reference of the sensor; identifying a basis vector representing a peak direction across the point cloud; transforming the point cloud into a frame of reference of the basis vector; and identifying a Z-plane in the transformed point cloud.
Another embodiment provides an imaging system that includes a TOF depth sensor and a processor. The TOF depth sensor obtains distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor. The processor receives the distance data from the TOF depth sensor; generates a point cloud based on the distance data, the point cloud in a frame of reference of the TOF depth sensor; identifies a basis vector representing a peak direction across the point cloud; transforms the point cloud into a frame of reference of the basis vector; and identifies a Z-plane in the transformed point cloud.
Yet another embodiment provides a method for determining dimensions of a physical box. The method includes receiving distance data describing distances between a sensor and a plurality of surfaces in an environment of the sensor, at least a portion of the surfaces corresponding to a box to be measured; transforming the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selecting, from the plurality of surfaces in the environment of the sensor, a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculating a height between the first surface and the second surface; and calculating a length and a width based on the selected first surface corresponding to the top of the box.
Another embodiment provides an imaging system including a TOF depth sensor and a processor. The TOF depth sensor obtains distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor. The processor receives the distance data from the TOF depth sensor; transforms the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selects a first surface corresponding to a top of a box to be measured and a second surface corresponding to a surface the box is resting on; calculates a height between the first surface and the second surface; and calculates a length and a width based on the selected first surface corresponding to the top of the box.
As will be appreciated by one skilled in the art, aspects of the present disclosure, in particular aspects of identifying a Z-plane and determining box dimensions based on TOF imagery, described herein, may be embodied in various manners (e.g., as a method, a system, a computer program product, or a computer-readable storage medium). Accordingly, aspects of the present disclosure may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Functions described in this disclosure may be implemented as an algorithm executed by one or more hardware processing units, e.g., one or more microprocessors, of one or more computers. In various embodiments, different steps and portions of the steps of each of the methods described herein may be performed by different processing units. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable medium(s), preferably non-transitory, having computer-readable program code embodied, e.g., stored, thereon. In various embodiments, such a computer program may, for example, be downloaded (updated) to the existing devices and systems (e.g., to the existing perception system devices and/or their controllers, etc.) or be stored upon manufacturing of these devices and systems.
The following detailed description presents various descriptions of certain specific embodiments. However, the innovations described herein can be embodied in a multitude of different ways, for example, as defined and covered by the claims and/or select examples. In the following description, reference is made to the drawings where like reference numerals can indicate identical or functionally similar elements. It will be understood that elements illustrated in the drawings are not necessarily drawn to scale. Moreover, it will be understood that certain embodiments can include more elements than illustrated in a drawing and/or a subset of the elements illustrated in a drawing. Further, some embodiments can incorporate any suitable combination of features from two or more drawings.
The following disclosure describes various illustrative embodiments and examples for implementing the features and functionality of the present disclosure. While particular components, arrangements, and/or features are described below in connection with various example embodiments, these are merely examples used to simplify the present disclosure and are not intended to be limiting. It will of course be appreciated that in the development of any actual embodiment, numerous implementation-specific decisions must be made to achieve the developer's specific goals, including compliance with system, business, and/or legal constraints, which may vary from one implementation to another. Moreover, it will be appreciated that, while such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
In the Specification, reference may be made to the spatial relationships between various components and to the spatial orientation of various aspects of components as depicted in the attached drawings. However, as will be recognized by those skilled in the art after a complete reading of the present disclosure, the devices, components, members, apparatuses, etc. described herein may be positioned in any desired orientation. Thus, the use of terms such as “above”, “below”, “upper”, “lower”, “top”, “bottom”, or other similar terms to describe a spatial relationship between various components or to describe the spatial orientation of aspects of such components, should be understood to describe a relative relationship between the components or a spatial orientation of aspects of such components, respectively, as the components described herein may be oriented in any desired direction. When used to describe a range of dimensions or other characteristics (e.g., time, pressure, temperature, length, width, etc.) of an element, operations, and/or conditions, the phrase “between X and Y” represents a range that includes X and Y.
Other features and advantages of the disclosure will be apparent from the following description and the claims.
TOF System Overview
The TOF sensor 110 collects distance data describing a distance between the TOF sensor 110 and various surfaces in the environment of the TOF sensor 110. The TOF sensor 110 may contain a light source, e.g., a laser, and an image sensor for capturing light reflected off the surfaces. In some embodiments, the TOF sensor 110 emits a pulse of light and captures multiple image frames at different times to determine the amount of time the light pulse takes to travel to the surface and return to the image sensor. In other embodiments, the TOF sensor 110 detects phase shifts in the captured light, and the phase shifts indicate the distance between the TOF sensor 110 and various surfaces. In some embodiments, the TOF sensor 110 may generate and capture light at multiple different frequencies. Emitting and capturing light at multiple frequencies can help resolve ambiguous distances and allows the TOF sensor 110 to work over larger distance ranges. For example, suppose that, at a first frequency, a first observed phase corresponds to a surface 0.5 meters, 1.5 meters, or 2.5 meters away, and, at a second frequency, a second observed phase corresponds to a surface 0.75 meters, 1.5 meters, or 2.25 meters away. Because 1.5 meters is the only distance consistent with both observations, the TOF sensor 110 can determine that the surface is 1.5 meters away. Using multiple frequencies may also improve robustness against noise caused by particular frequencies of ambient light, whether phase shift or pulse return time is used to measure distance. In alternate embodiments, different types of sensors may be used instead of and/or in addition to the TOF sensor 110 to obtain distance data.
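The two-frequency example above can be reproduced with a brute-force search over candidate distances. This sketch assumes continuous-wave phase measurements, for which a modulation frequency f has an unambiguous range of c / (2f); the example's 1.0 m and 0.75 m ranges would correspond to roughly 150 MHz and 200 MHz. The helper names and the `max_range_m` cutoff are hypothetical choices for illustration, not the sensor's actual algorithm.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wrapped_distance(phase_rad, freq_hz):
    """Return (distance modulo the unambiguous range, unambiguous range c / (2 f))."""
    unambiguous = SPEED_OF_LIGHT / (2.0 * freq_hz)
    fraction = (phase_rad % (2.0 * math.pi)) / (2.0 * math.pi)
    return fraction * unambiguous, unambiguous


def candidates(wrapped_m, unambiguous_m, max_range_m):
    """Every distance consistent with one wrapped measurement, up to max_range_m."""
    out, d = [], wrapped_m
    while d <= max_range_m:
        out.append(d)
        d += unambiguous_m
    return out


def resolve(meas1, meas2, max_range_m):
    """Return the mean of the closest-agreeing candidate pair, one per frequency.

    meas1 and meas2 are (wrapped distance, unambiguous range) tuples in meters.
    """
    best_err, best_d = float("inf"), None
    for d1 in candidates(*meas1, max_range_m):
        for d2 in candidates(*meas2, max_range_m):
            err = abs(d1 - d2)
            if err < best_err:
                best_err, best_d = err, 0.5 * (d1 + d2)
    return best_d


# Example from the text: unambiguous ranges of 1.0 m and 0.75 m.
print(resolve((0.5, 1.0), (0.75, 0.75), max_range_m=2.6))  # 1.5
```

With noisy phases the two candidate lists rarely agree exactly, which is why the sketch picks the closest pair rather than testing for equality.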
The processor 120 receives distance data from the TOF sensor 110 and processes the distance data to identify various features in the environment of the TOF sensor 110, as described in detail herein, e.g., with respect to
A camera 130 may capture image frames of the environment. The camera 130 may be a visible-light camera that captures images of the environment in the visible range. In other embodiments, the camera 130 is an infrared (IR) camera that captures IR intensities of the surfaces in the sensor system's environment. The fields of view of the camera 130 and the TOF sensor 110 partially or fully overlap, e.g., the field of view of the camera 130 may be slightly larger than the field of view of the TOF sensor 110. The camera 130 may pass captured images to the processor 120. In some embodiments, two processors or processing units may be included, e.g., a first processing unit for performing the Z-plane identification and box dimensioning algorithms described herein, and a second processing unit, e.g., a graphics processing unit, that receives images from the camera 130 and generates displays based on the images and data from the first processing unit. In some embodiments, image data from the camera 130 may be used to determine a level of sunlight in the environment of the TOF sensor 110. In alternate embodiments, the sensor system 100 may include a separate light sensor for detecting sunlight or other ambient light conditions in the environment of the TOF sensor 110.
The display device 140 provides visual output for a user of the sensor system 100. For example, the display device 140 may display box dimensions and/or a box volume calculated by the processor 120 based on distance data from the TOF sensor 110. In some embodiments, the display device 140 displays an image obtained by the camera 130 and overlays visual imagery indicating one or more features identified in the field of view of the camera 130 and TOF sensor 110 based on the distance data. For example, the processor 120 may instruct the display device 140 to display an outline of a box over an image of the box obtained by the camera 130. A user can use this display to determine whether the sensor system 100 has correctly identified the box and the box's edges. The sensor system 100 may include additional or alternative input and/or output devices, e.g., buttons, a speaker, a touchscreen, etc.
The memory 150 stores data for the sensor system 100. For example, the memory 150 stores processing instructions used by the processor 120 to identify features in the environment of the TOF sensor 110, e.g., instructions to identify one or more Z-planes and/or to calculate box dimensions of an observed box. The memory 150 may temporarily store data and images obtained by the camera 130 and/or TOF sensor 110 and accessed by the processor 120. The memory 150 may further store image data accessed by the display device 140 to generate an output display.
For example, a first pixel 210a has a ray direction 215a that extends straight out from the TOF sensor 110; the pixel 210a is in the center of the image frame 220. A second pixel 210b at a corner of the image frame 220 is associated with a ray direction 215b that extends out from the TOF sensor 110 at, for example, a 30° angle in both an x-direction and y-direction from the center of the image frame 220, where the image frame 220 is an x-y plane in a frame of reference of the TOF sensor 110. The TOF sensor 110 returns distance data (e.g., a distance, one or more phase shifts) to a surface along each valid pixel's ray. In one example, the first pixel 210a may have a measured distance of 1 meter representing a distance to a particular point on a box, and the second pixel 210b may have a measured distance of 2 meters representing a distance to a particular point on a wall behind the box.
Example Process for Identifying a Z-Plane
In some embodiments, the processor 120 filters 320 the received distance data. Ambient light in the environment of the TOF sensor 110 can create noise in the distance data. To reduce the effect of noise, a filter, e.g., an integral filter, may be applied to the distance data before further analysis is performed. Filtering the noise in this manner may be particularly useful if the TOF sensor 110 captures data in an outdoor environment, due to the noise caused by sunlight. To filter the distance data, the processor 120 may compute, for each pixel, an average pixel value based on pixel values in a region around the pixel. For example, the filtered pixel value for a given pixel may be the average value for an 11×11 or 21×21 square of pixels centered on the given pixel. In some embodiments, the processor 120 performs the filtering on phase measurement data received from the TOF sensor 110, e.g., the processor 120 first filters multiple phase measurements (for different frequencies, as described above), and then processes the filtered phase data to determine a distance measurement for each pixel. Alternatively, the processor 120 may filter the distance measurements, e.g., if the pulse return method is used to obtain the distance data.
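The window averaging described above can be computed in constant time per pixel with an integral image (summed-area table), which is one common reading of an “integral filter.” The sketch below is an illustration under assumptions: it operates on a plain 2-D list of values (phases or distances), and it shrinks the window at the image borders, a border policy the text does not specify.

```python
def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_filter(img, window):
    """Mean over a window x window square centered on each pixel, clipped at the borders."""
    if window % 2 != 1:
        raise ValueError("window must be odd so it can be centered on a pixel")
    half = window // 2
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        r0, r1 = max(0, r - half), min(h, r + half + 1)
        for c in range(w):
            c0, c1 = max(0, c - half), min(w, c + half + 1)
            # Four table lookups give the window sum regardless of window size.
            total = ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]
            out[r][c] = total / ((r1 - r0) * (c1 - c0))
    return out


print(box_filter([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], 3))
# [[3.0, 3.5, 4.0], [4.5, 5.0, 5.5], [6.0, 6.5, 7.0]]
```

The same function could serve both filtering stages discussed in this disclosure, e.g., a 21-pixel window for Z-plane identification and a 5-pixel window for the finer box-edge stage, since its cost does not grow with the window size.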
In some embodiments, the filtering step may be omitted, e.g., if the TOF sensor 110 is intended for use in an environment with relatively low noise levels, such as a TOF sensor 110 designed for indoor use only. In some cases, the processor 120 may perform filtering in response to determining that there is a threshold level of sunlight in the environment of the TOF sensor 110. Furthermore, in some embodiments, the processor 120 may perform adaptive filtering based on the type or level of ambient light in the environment of the TOF sensor 110, e.g., using a larger filter window when brighter sunlight is detected, when a broader frequency distribution is detected in the ambient light, or when particular frequencies known to interact with the TOF sensor 110 are detected in the ambient light.
The processor 120 generates 330 a point cloud based on the distance data and the pixel ray directions 215. For example, for each individual pixel, the processor 120 multiplies the ray direction 215 for the pixel by the measurement distance to the surface for that pixel, e.g., the measurement distance shown in
The point cloud is in the reference frame of the TOF sensor 110, also referred to as the ego frame. For example, if a user is holding the TOF sensor 110 at a slight angle relative to the ground in the environment, the Z-direction in the reference frame of the TOF sensor 110 does not align with the Z-planes in the environment (e.g., the Z-direction in the reference frame is angled relative to the direction of gravity, if the Z-planes are perpendicular to the direction of gravity).
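Generating the point cloud amounts to scaling each pixel's unit ray by its measured distance. The sketch assumes a pinhole camera model with hypothetical intrinsics `fx`, `fy` (focal lengths in pixels) and `cx`, `cy` (principal point), with z along the optical axis; under this model a pixel whose ray makes a 30° angle in x satisfies (u - cx) / fx = tan 30°. Distances are assumed to be radial (measured along the ray), and `None` marks an invalid pixel.

```python
import math


def ray_direction(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v) under an assumed pinhole model, z = optical axis."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)


def point_cloud(distances, fx, fy, cx, cy):
    """Ego-frame points: each pixel's unit ray scaled by its radial distance in meters.

    distances[v][u] holds the distance for pixel (u, v); None marks an invalid pixel.
    """
    points = []
    for v, row in enumerate(distances):
        for u, d in enumerate(row):
            if d is None:
                continue
            dx, dy, dz = ray_direction(u, v, fx, fy, cx, cy)
            points.append((dx * d, dy * d, dz * d))
    return points
```

For a center pixel such as pixel 210a, the ray is (0, 0, 1), so a 1 meter measurement maps to the point (0, 0, 1) in the ego frame.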
The processor 120 identifies 340 basis vectors for a frame of reference of the surfaces in the environment, also referred to as a “world” frame of reference. A first basis vector corresponds to the direction perpendicular to the Z-planes in the environment observed by the TOF sensor 110. Second and third basis vectors are each orthogonal to the first basis vector. The basis vectors define a “world” coordinate system, i.e., a coordinate system in which the Z-planes are horizontal.
Having computed the surface normals, the processor 120 extracts one or more basis vectors based on the computed surface normals. To extract the first basis vector, the processor 120 may bin 420 the coordinates of the surface normals, e.g., the processor 120 bins the polar and azimuthal angles of each of the computed surface normals. This binning results in a two-dimensional distribution. The bins may be represented visually by a histogram, e.g., the two-dimensional histogram that bins angular coordinates of the surface normals shown in
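A rough sketch of the binning step follows, assuming the surface normals are already computed and consistently oriented (for example, all flipped to face the sensor). Each normal is binned by its polar and azimuthal angles, and the mean of the most populated bin is returned as a candidate first basis vector. Taking the single fullest bin is an assumed peak-picking rule, and the bin counts are arbitrary defaults. Azimuthal bins shrink near the poles, so this parameterization works best when the Z-plane normal is far from the sensor's z-axis; for a sensor pointing straight down at a box, a different parameterization would likely be preferable.

```python
import math
from collections import defaultdict


def dominant_normal(normals, n_polar=90, n_azimuth=180):
    """Mean unit normal of the fullest (polar, azimuth) bin of a 2-D angle histogram."""
    bins = defaultdict(list)
    for nx, ny, nz in normals:
        norm = math.sqrt(nx * nx + ny * ny + nz * nz)
        nx, ny, nz = nx / norm, ny / norm, nz / norm
        polar = math.acos(max(-1.0, min(1.0, nz)))     # 0 .. pi, measured from sensor z
        azimuth = math.atan2(ny, nx) % (2.0 * math.pi)  # 0 .. 2*pi, around sensor z
        i = min(int(polar / math.pi * n_polar), n_polar - 1)
        j = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
        bins[(i, j)].append((nx, ny, nz))
    members = max(bins.values(), key=len)  # peak of the histogram
    sx = sum(m[0] for m in members)
    sy = sum(m[1] for m in members)
    sz = sum(m[2] for m in members)
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    return (sx / norm, sy / norm, sz / norm)
```

Averaging the members of the peak bin, rather than returning the bin center, keeps the estimate from being quantized to the bin size.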
Having selected the first basis vector, the processor 120 selects 450 the second and third basis vectors. The second and third basis vectors are orthogonal to the first basis vector (i.e., orthogonal to the surface normal to the Z-planes). The second and third basis vectors are also orthogonal to each other. The first, second, and third basis vectors define the world frame of reference.
In some embodiments, the processor 120 calculates a projection of the TOF sensor's pointing direction (e.g., the ray direction 215a, which extends straight out from the TOF sensor 110) into a Z-plane (e.g., a plane orthogonal to the first basis vector), and the processor 120 selects this projection as the second basis vector. The processor 120 selects the vector orthogonal to the first and second basis vectors as the third basis vector; the processor 120 may compute the third basis vector as the cross product of the first basis vector and the second basis vector. In other embodiments, the second and third basis vectors may be chosen in other ways.
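The second and third basis vectors, and the change of frame they enable, can be sketched as follows. The code assumes the first basis vector `e1` is a normal to the Z-planes in the sensor frame and that the sensor's pointing direction is its z-axis. It orders the world axes as (e2, e3, e1) so that the third coordinate is height along the normal; this gives a right-handed frame because e3 = e1 × e2. The helper names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    if n < 1e-9:
        raise ValueError("vector is (nearly) zero")
    return tuple(x / n for x in v)


def world_basis(e1, pointing=(0.0, 0.0, 1.0)):
    """e1: Z-plane normal; e2: pointing direction projected into the Z-plane; e3 = e1 x e2."""
    e1 = normalize(e1)
    d = dot(pointing, e1)
    # Removing the e1 component fails if the sensor points straight along the normal.
    e2 = normalize(tuple(p - d * a for p, a in zip(pointing, e1)))
    e3 = cross(e1, e2)
    return e1, e2, e3


def to_world(points, basis):
    """Express sensor-frame points as (x, y, z) = (along e2, along e3, height along e1)."""
    e1, e2, e3 = basis
    return [(dot(p, e2), dot(p, e3), dot(p, e1)) for p in points]


basis = world_basis((0.0, -1.0, 0.0))
print(to_world([(0.0, 1.2, 2.0)], basis))  # floor point 2 m ahead, height -1.2 m
```

Because the basis is orthonormal, this change of frame is a pure rotation, so distances and areas measured later are unaffected.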
Returning to
The processor 120 next identifies 360 Z-planes in the transformed point cloud.
The processor 120 then generates 520 a profile representation of the height map. For example, the processor 120 integrates the height map over the x- and y-directions to obtain a Z-profile of the height map. This profile represents the probability density of heights within the height map. Peaks within the profile correspond to various flat surfaces, i.e., surfaces that are orthogonal to the first basis vector.
The processor 120 identifies 530 the peaks in the profile representation of the height map as Z-planes. For example, for each portion of the Z-profile where the probability density rises above a given threshold (e.g., 0.01), the processor 120 identifies a Z-plane. In some embodiments, the processor 120 applies one or more other rules or heuristics to the Z-profile to identify the Z-planes, e.g., to remove spurious noise peaks while retaining genuine weak signals, such as a weak floor peak. For example, the processor 120 may identify peaks having at least a threshold number of associated points, or peaks in which the heights fall within a given range of each other.
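Steps 520 and 530 can be sketched as a one-dimensional histogram of point heights. In this illustration, the “probability density” is taken to be the fraction of points per height bin, the 1 cm bin width is an assumed default, each run of consecutive bins above the threshold counts as one Z-plane, and a plane's height is the center of the fullest bin in its run. Heights are then reported relative to the lowest (base) Z-plane, as described for the floor below.

```python
def z_profile(heights, bin_width=0.01):
    """Fraction of points in each height bin, starting at the lowest height."""
    lo = min(heights)
    n_bins = int((max(heights) - lo) / bin_width) + 1
    counts = [0] * n_bins
    for h in heights:
        counts[min(int((h - lo) / bin_width), n_bins - 1)] += 1
    return lo, [c / len(heights) for c in counts]


def find_z_planes(heights, bin_width=0.01, threshold=0.01):
    """Relative heights of Z-planes: one per run of bins above threshold, base plane = 0."""
    lo, density = z_profile(heights, bin_width)
    planes, run = [], []
    for i, p in enumerate(density + [0.0]):  # trailing 0.0 closes a final run
        if p > threshold:
            run.append(i)
        elif run:
            peak = max(run, key=lambda k: density[k])
            planes.append(lo + (peak + 0.5) * bin_width)  # center of the fullest bin
            run = []
    return [z - planes[0] for z in planes]
```

With synthetic heights placed at the floor, chair, table, and box-top levels of the example below, the recovered relative heights match those values to within about one bin width.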
The processor 120 may select a particular height point within a given peak (e.g., a highest point or a center point) as the Z-plane height for that peak. In some embodiments, the processor 120 selects the lowest Z-value peak as the height of the base Z-plane. The processor 120 may set the height of the base Z-plane to zero, and determine the heights of the other Z-planes based on their height relative to the base Z-plane. For example, the processor 120 sets the height of the floor peak 1610 as 0, and the processor 120 determines that the chair peak 1620 has a height of 0.569 meters, the table peak 1630 has a height of 0.815 meters, and the box top peak 1640 has a height of 0.871 meters.
Having identified the Z-planes and their associated heights, the processor 120 associates 540 various points in the point cloud with the identified Z-planes. For example, the processor 120 may associate a particular point in the point cloud with an identified Z-plane if the height of the point is within a height range of the identified Z-plane. For example, if the height of a particular point is within twice a peak's FWHM (full width at half maximum), the point is associated with the Z-plane corresponding to the peak. In other examples, other ranges around a Z-plane height are used to associate points in the point cloud to Z-planes.
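Step 540 with the FWHM rule can be sketched as below. The FWHM is measured on a binned height profile like the one from step 520, by walking outward from the peak bin while the profile stays at or above half the peak value. A point is assigned to the nearest Z-plane whose height lies within twice that plane's FWHM, and to no plane otherwise; the nearest-plane tie-break for overlapping ranges is an assumption.

```python
def fwhm(density, peak, bin_width):
    """Full width at half maximum of the peak at index `peak` in a binned height profile."""
    half = density[peak] / 2.0
    left = right = peak
    while left > 0 and density[left - 1] >= half:
        left -= 1
    while right < len(density) - 1 and density[right + 1] >= half:
        right += 1
    return (right - left + 1) * bin_width


def associate(heights, plane_heights, plane_fwhms):
    """Index of the nearest Z-plane within 2 x FWHM of each point's height, else None."""
    labels = []
    for h in heights:
        best, best_dist = None, float("inf")
        for k, (z, width) in enumerate(zip(plane_heights, plane_fwhms)):
            dist = abs(h - z)
            if dist <= 2.0 * width and dist < best_dist:
                best, best_dist = k, dist
        labels.append(best)
    return labels


print(associate([0.0, 0.03, 0.05, 0.5], [0.0, 0.5], [0.02, 0.02]))  # [0, 0, None, 1]
```

Scaling the acceptance range by each peak's own width lets sharp, well-measured planes claim only nearby points while broader, noisier peaks accept a wider band.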
The transformed point cloud and identified Z-planes can be used for various further processing on the point cloud data. In some examples, the processor 120 can proceed to locate a box in the environment of the TOF sensor 110 and determine dimensions and/or the volume of the box, as described further below. In other examples, the processor 120 can perform other types of identification or analysis on other types of objects in the environment of the TOF sensor 110.
In some embodiments, the sensor system 100 displays outputs of the Z-plane identification process to a user. For example, the processor 120 may correlate the identified Z-planes to various pixels in an image obtained by the camera 130, and generate a display with visual indications of the identified Z-planes. For example, the Z-planes may be outlined or color-coded in a display output by the display device 140. The display device 140 may alternatively or additionally output the determined heights of the identified Z-planes.
Example Process for Box Dimensioning
For the box dimensioning process, the processor 120 may assume that at least a portion of the surfaces in the environment of the TOF sensor 110 correspond to a box to be measured. Several additional assumptions may be made about the box being measured. Such assumptions can improve the speed and accuracy of the box dimensioning process, particularly in applications where fast detection and measuring of the box are important, e.g., if box dimensions are calculated and provided to a user in real or near-real time as the user points the TOF sensor 110 at a box. These assumptions may include that the angles between adjacent surfaces of the box are reasonably close to 90° (e.g., between 85° and 95°, or within some other range); that the box is located within a particular distance of the TOF sensor 110 (e.g., within 3 meters or within 5 meters); that each box dimension is within a particular range (e.g., at least 3 centimeters, or at least 10 centimeters; no more than 2 meters, or no more than 3 meters); that the box is closed; that the top face of the box is visible to the TOF sensor 110; and that the box rests on a flat, horizontal surface (i.e., a Z-plane) that is also visible to the TOF sensor 110.
It should be understood that, in some embodiments, one or more of these assumptions may be relaxed or removed. For example, the distance range between the sensor and the box and the minimum and maximum box dimensions are merely exemplary, and in other embodiments, different ranges and dimensions may be used. In some embodiments, the ranges may vary based on the intended uses or target users of the sensor system 100. For example, if the sensor system 100 is used to measure boxes being loaded into a moving truck (e.g., including wardrobe boxes and boxed furniture), greater distance ranges and greater box dimensions may be used. In some embodiments, a user may be able to input a distance range and/or maximum and minimum box dimensions.
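The numeric dimensioning assumptions, and their adjustable ranges, could be collected in a small configuration object. The defaults below are simply the first example values given in the text, and the class, field, and function names are hypothetical; only the numeric assumptions (face angles, distance, dimensions) are checked, not closure or visibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAssumptions:
    """Example limits taken from the text; all are adjustable."""
    min_face_angle_deg: float = 85.0
    max_face_angle_deg: float = 95.0
    max_distance_m: float = 3.0
    min_dim_m: float = 0.03
    max_dim_m: float = 2.0


def plausible_box(dims_m, distance_m, face_angles_deg, limits=BoxAssumptions()):
    """True if a measured box satisfies the numeric dimensioning assumptions."""
    if distance_m > limits.max_distance_m:
        return False
    if not all(limits.min_dim_m <= d <= limits.max_dim_m for d in dims_m):
        return False
    return all(limits.min_face_angle_deg <= a <= limits.max_face_angle_deg
               for a in face_angles_deg)


# A moving-truck deployment might widen the distance and size limits.
truck = BoxAssumptions(max_distance_m=5.0, max_dim_m=3.0)
print(plausible_box((2.5, 0.6, 0.5), 4.0, [90.0, 88.5], limits=truck))  # True
```

Keeping the limits in one immutable object makes it straightforward to expose them as the user-adjustable settings mentioned above.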
The processor 120 transforms 620 the distance data (e.g., the point cloud calculated based on distance data from the TOF sensor 110) into a frame of reference of a surface in the environment of the TOF sensor 110. For example, as described with respect to steps 340 and 350 in
As noted with respect to
The processor 120 identifies 730 connected components within at least some of the Z-slices. Each of the connected components may be a candidate box top. To identify a connected component, the processor 120 finds clusters of nearby or connecting points within a Z-slice. The processor 120 may identify a connected component by finding sets of pixels in a Z-slice that may be reached by moving across the Z-slice, e.g., pixels that are within a threshold distance of each other. For example, the processor 120 may select a particular pixel in a Z-slice and recursively add neighboring pixels that are also in the Z-slice to a connected component. Each connected component has a respective height along the Z-axis in the frame of reference of the basis vectors; the height corresponds to the height of the Z-slice.
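Connected-component labeling within a Z-slice can be sketched with a breadth-first flood fill over a boolean mask of the slice's pixels. The sketch uses 4-connectivity (up, down, left, right neighbors) as its definition of “neighboring,” which is an assumption; the text also allows a distance threshold. An explicit queue replaces the recursion described above, giving the same components without risking Python's recursion limit on large box tops.

```python
from collections import deque


def connected_components(mask):
    """Lists of (row, col) pixels for each 4-connected group of truthy cells in one Z-slice."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps
```

Each returned component inherits the height of the Z-slice it was found in, so it can be carried forward directly as a candidate box top.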
In some embodiments, prior to identifying the connected components in the Z-slices, the processor 120 may apply one or more rules to remove one or more Z-slices from consideration as the box top. For example, the processor 120 may eliminate the lowest or base Z-slice as potentially containing the box top, since it is assumed that the box top is above the lowest surface. The processor 120 may also remove a Z-slice that does not lie sufficiently close within the height map (in the x- and y-direction) to some other, lower Z-slice (i.e., a potential surface for the box to be resting on). For example, for the Z-plane slices shown in
Having identified the connected components representing candidate box tops, the processor 120 selects 740 one of the connected components as the box top. The processor 120 may apply various rules to the connected components to identify the box top. For example, the processor 120 may remove connected components that are very small (e.g., having a width and/or length below the threshold minimum box dimensions described above). The processor 120 may remove connected components that are highly elongated or non-compact (e.g., the connected component has a large perimeter compared to the square root of its area). The processor 120 may remove connected components for which a box bottom (i.e., the surface on which the box is resting) cannot be derived from the height map, e.g., because no other connected component or Z-slice is sufficiently close to the connected component in the height map in the x- and y-direction.
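Two of the candidate-pruning rules, minimum size and compactness, might look like this for components given as lists of (row, column) grid cells. The perimeter is counted as the number of exposed cell edges, so an s × s square scores 4s / s = 4 while long, thin strips score higher. The cell size, the axis-aligned bounding-box size test (conservative for rotated boxes), and the ratio limit of 6.0 are hypothetical values chosen for illustration.

```python
import math


def perimeter(component):
    """Number of cell edges that border a cell outside the component."""
    cells = set(component)
    edges = 0
    for y, x in cells:
        for nb in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if nb not in cells:
                edges += 1
    return edges


def keep_candidate(component, cell_size_m, min_dim_m=0.03, max_ratio=6.0):
    """Reject components that are too small or too elongated to be a box top."""
    ys = [p[0] for p in component]
    xs = [p[1] for p in component]
    extent_y = (max(ys) - min(ys) + 1) * cell_size_m
    extent_x = (max(xs) - min(xs) + 1) * cell_size_m
    if min(extent_x, extent_y) < min_dim_m:
        return False
    return perimeter(component) / math.sqrt(len(component)) <= max_ratio
```

The perimeter-to-square-root-of-area ratio is scale-invariant, so the same limit applies whether the box top spans tens or thousands of cells.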
Having applied these rules to the connected components shown in
While several example rules for identifying a box top are discussed above, the processor 120 may apply additional, fewer, or different rules to the connected components to identify the box top in different embodiments. In some embodiments, if multiple candidates pass each of the rules described above, the processor 120 may use an additional rule or rules for choosing between the possible box tops. For example, the processor 120 may select the candidate box top that is closest to the TOF sensor 110.
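The box-top rules of step 740, together with the closest-to-sensor tie-break, could be sketched as follows. The `Candidate` fields, the use of axis-aligned extents as a stand-in for width and length, and the `max_compactness` value are hypothetical; in practice extents would be measured after rotation and thresholds tuned to the sensor.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    pixels: set             # (row, col) height-map cells of one connected component
    range_to_sensor: float  # hypothetical: mean distance from the TOF sensor, in meters
    has_support: bool       # hypothetical: a lower Z-slice lies laterally nearby


def perimeter(pixels):
    """Count cell edges that border a cell outside the component."""
    edges = 0
    for r, c in pixels:
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in pixels:
                edges += 1
    return edges


def select_box_top(candidates, cell_size, min_side, max_compactness=6.0):
    """Apply the example rules, then prefer the surviving candidate nearest the sensor."""
    survivors = []
    for cand in candidates:
        rows = [r for r, _ in cand.pixels]
        cols = [c for _, c in cand.pixels]
        # Axis-aligned extents in meters (a simplification of box width and length).
        side_a = (max(rows) - min(rows) + 1) * cell_size
        side_b = (max(cols) - min(cols) + 1) * cell_size
        if min(side_a, side_b) < min_side:
            continue  # smaller than the minimum box dimension
        # Perimeter relative to sqrt(area): 4 for a square, larger when elongated.
        if perimeter(cand.pixels) / math.sqrt(len(cand.pixels)) > max_compactness:
            continue
        if not cand.has_support:
            continue  # no surface the box could be resting on
        survivors.append(cand)
    return min(survivors, key=lambda c: c.range_to_sensor, default=None)
```

Returning `None` when nothing survives gives a natural hook for prompting the user to adjust the scene.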
Having identified the box top, the processor 120 identifies 750 the surface on which the box is resting, which corresponds to the box bottom. For example, the processor 120 selects a Z-slice that has a lower height than the box top and that is closest in lateral range to the box top in the height map, e.g., the Z-slice that is closest to the identified box top in the x- and y-directions in the height map. In the example height map shown in
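One way to sketch step 750 and the subsequent height calculation, assuming each Z-slice is summarized by its height and an array of lateral (x, y) positions, and that "closest in lateral range" means the smallest pairwise (x, y) distance between slice points and box-top points. Both the data layout and the distance measure are assumptions.

```python
import numpy as np


def select_box_bottom(top_xy, top_z, slices):
    """Return the height of the Z-slice the box most likely rests on, or None.

    top_xy: (N, 2) lateral positions of box-top points; top_z: box-top height.
    slices: iterable of (z, xy) pairs, xy an (M, 2) array of lateral positions.
    """
    best_z, best_dist = None, np.inf
    for z, xy in slices:
        if z >= top_z:
            continue  # the resting surface must lie below the box top
        # Smallest lateral distance between any slice point and any box-top point.
        dist = np.linalg.norm(xy[:, None, :] - top_xy[None, :, :], axis=-1).min()
        if dist < best_dist:
            best_z, best_dist = z, dist
    return best_z


def box_height(top_z, bottom_z):
    """Box height as the separation of the two surfaces along the Z-axis."""
    return top_z - bottom_z
```

The pairwise distance is quadratic in point count; subsampling or a spatial index would be a reasonable substitute for large slices.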
Returning to
The processor 120 further calculates 650 the length and the width of the box based on the selected box top. Because the distance data for the box top is captured at an angle and may be noisy, the processor 120 may filter the box top data, rotate the box top so that it is aligned with an x-axis and a y-axis, and calculate horizontal and vertical profiles of the edges to determine the length and width. For example, the trailing edge of the box (i.e., the edge of the box farthest from the TOF sensor 110) may be blurred, which can make it difficult for the processor 120 to identify the trailing edge without additional data processing.
To filter the distance data, the processor 120 may compute, for each pixel in the distance data, an average pixel value based on pixel values in a region around the pixel. The processor 120 may use a different filter than the filter described with respect to step 320; in particular, the processor 120 may use a smaller filter window than the filter used to identify the Z-planes. For example, the filtered pixel value for a given pixel may be the average value of a 5×5 or 7×7 square of pixels centered on the given pixel. As described with respect to
In some embodiments, the filtering step may be omitted, e.g., if the TOF sensor 110 is intended for use in an environment with relatively low noise levels, such as a TOF sensor 110 designed for indoor use only. In some cases, the processor 120 may perform filtering in response to determining that there is at least a threshold level of sunlight in the environment of the TOF sensor 110. Furthermore, in some embodiments, the processor 120 may perform adaptive filtering based on the type or level of ambient light in the environment of the TOF sensor 110, e.g., using a larger filter window when brighter sunlight is detected, when a broader frequency distribution is detected in the ambient light, or when particular frequencies known to interact with the TOF sensor 110 are detected in the ambient light.
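The averaging filter described above is a box (mean) filter. A minimal NumPy sketch follows, assuming the distance data is a 2D array and that border pixels are handled by edge replication (an assumption; the disclosure does not specify border handling). `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_filter(distance, window=5):
    """Replace each pixel with the mean of the window x window square centered on it.

    distance: 2D array of per-pixel distances; window: odd size, e.g., 5 or 7.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a pixel")
    half = window // 2
    # Replicate edge values so border pixels still get a full window.
    padded = np.pad(distance, half, mode="edge")
    windows = sliding_window_view(padded, (window, window))  # shape (H, W, window, window)
    return windows.mean(axis=(-2, -1))
```

An adaptive variant, as described for bright sunlight, would simply pass a larger `window` when the ambient-light condition is detected.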
The processor 120 extracts 820 a subset of points within the transformed distance data corresponding to the box top, e.g., the connected component selected as the box top at step 740. If filtering 810 is performed, the processor 120 may calculate a second point cloud based on the filtered data (following the process described in step 330 in
Having rotated the box top subcloud, the processor 120 calculates 840 a width profile and a length profile for the box top. For many TOF sensors, the leading edges of the box closest to the sensor are sharp and easy for both humans and computers to identify, but flying voxels may blur the trailing edges of the box located farther from the TOF sensor 110. This makes the location of the trailing edges more ambiguous and harder to identify from the distance data. The processor 120 generates the box top width and length profiles by projecting the points of the rotated box top subcloud onto the horizontal and vertical axes.
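The rotation and projection steps can be sketched as below, assuming the box-top subcloud has been reduced to lateral (x, y) coordinates. The angle search is one reading of the "minimize a sum of projections" criterion in Example 36: for a rectangle with sides w and l rotated by φ, the summed extents are (w + l)(|cos φ| + |sin φ|), which is smallest when the edges are axis-aligned. The step size and bin size are hypothetical.

```python
import numpy as np


def rotate_to_axes(points_xy, step_deg=0.5):
    """Rotate (N, 2) box-top points so their edges align with the x- and y-axes.

    Searches angles in [0, 90) degrees and keeps the one minimizing the sum of the
    point set's extents along x and y.
    """
    centered = points_xy - points_xy.mean(axis=0)
    best_rot, best_cost = np.eye(2), np.inf
    for theta in np.deg2rad(np.arange(0.0, 90.0, step_deg)):
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        rotated = centered @ rot.T
        cost = np.sum(rotated.max(axis=0) - rotated.min(axis=0))
        if cost < best_cost:
            best_rot, best_cost = rot, cost
    return centered @ best_rot.T


def profiles(points_xy, bin_size):
    """Project points onto each axis and count them per bin.

    Returns [(x_counts, x_bin_centers), (y_counts, y_bin_centers)].
    """
    result = []
    for axis in (0, 1):
        coords = points_xy[:, axis]
        n_bins = max(1, int(np.ceil((coords.max() - coords.min()) / bin_size)))
        counts, edges = np.histogram(coords, bins=n_bins)
        result.append((counts, 0.5 * (edges[:-1] + edges[1:])))
    return result
```

Because the search covers only [0, 90) degrees, the result may be the aligned box rotated by a further 90 degrees, which swaps which axis carries the width; the edge-to-edge distances are unaffected.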
The processor 120 identifies 850 the leading edges and trailing edges in the width and length profiles. For example, the processor 120 applies one or more rules to identify the edges from the profiles. The processor 120 may fit a line to the interior of each profile and define a leading edge as the location where the profile equals a set percentage of the linear fit's value, e.g., 40%. The processor 120 may define the trailing edge as the location where the profile equals the same or a different percentage of the linear fit's value, e.g., a particular value in the range of 25% to 85%. The processor 120 may further apply one or more rules to determine the trailing-edge percentage. For example, the trailing-edge percentage threshold may vary with the height of the box. Alternatively, the percentage thresholds can differ for the shorter and longer of the two box top edges.
The processor 120 calculates 860 the width and length of the box top based on the determined leading edges and trailing edges. In particular, the width is the distance between the leading edge and trailing edge in the width projection, and the length is the distance between the leading edge and trailing edge in the length projection.
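Steps 850 and 860 for one profile might look like the sketch below. It assumes the leading (sensor-side) edge lies at the low-coordinate end of the profile, fits the line to the central half of the bins, and uses 40% for the leading edge and 60% for the trailing edge; the interior fraction and the 60% value are hypothetical picks within the ranges given above. Edge locations are resolved only to the bin size.

```python
import numpy as np


def find_edges(profile, centers, lead_frac=0.4, trail_frac=0.6, interior=0.5):
    """Locate the leading and trailing edges in one box-top profile.

    profile: per-bin point counts; centers: bin centers in meters.
    """
    n = len(profile)
    lo = int(n * (1.0 - interior) / 2)
    hi = n - lo
    # Linear fit to the profile interior, away from the blurred edges.
    slope, intercept = np.polyfit(centers[lo:hi], profile[lo:hi], 1)
    ratio = profile / (slope * centers + intercept)
    # First bin from the leading side that reaches lead_frac of the fit
    # (argmax returns 0 if no bin qualifies; a real implementation should check).
    lead_idx = int(np.argmax(ratio >= lead_frac))
    # Last bin, scanning back from the trailing side, that still reaches trail_frac.
    trail_idx = n - 1 - int(np.argmax(ratio[::-1] >= trail_frac))
    return centers[lead_idx], centers[trail_idx]


def edge_to_edge(profile, centers, **fracs):
    """Width (or length) as the distance between the leading and trailing edges."""
    lead, trail = find_edges(profile, centers, **fracs)
    return trail - lead
```

Calling `edge_to_edge` once on the width profile and once on the length profile yields the two box-top dimensions.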
Returning to
The processor 120 may project the image of the three-dimensional box onto a two-dimensional image plane of the camera 130 to generate an overlay image, e.g., an outline overlaying the image of the box. The calculated width, length, and height dimensions may also be reported in the graphical display, either along the edges or in a separate area. A user can view the graphical display in the display device 140 to qualitatively confirm that the sensor system 100 has correctly identified the box and correctly identified the edges and surfaces.
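Projecting the measured box onto the camera image can be sketched with a pinhole model. This assumes the box corners have already been expressed in the frame of the camera 130 (i.e., the TOF-to-camera transform is known and applied beforehand) and that lens distortion is ignored; the intrinsic values in `K` below are hypothetical.

```python
import itertools

import numpy as np


def box_corners(origin, x_axis, y_axis, z_axis, length, width, height):
    """Eight corners of a box with one bottom corner at origin and unit edge axes."""
    return np.array([
        origin + a * x_axis + b * y_axis + c * z_axis
        for a, b, c in itertools.product((0.0, length), (0.0, width), (0.0, height))
    ])


def project_points(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    homog = points_cam @ K.T              # rows are (fx*X + cx*Z, fy*Y + cy*Z, Z)
    return homog[:, :2] / homog[:, 2:3]   # divide by depth Z
```

For example, with `K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]`, a point at (0.2, -0.1, 2.0) meters projects to pixel (370, 215). Drawing line segments between projected corners that differ in exactly one of length, width, or height produces the outline overlay.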
In some embodiments, the sensor system 100 may additionally or alternatively report an intensity indicator that indicates a measured intensity at a particular pixel or across a set of pixels in the distance data collected by the TOF sensor 110 and/or a measured intensity in the corresponding pixel or set of pixels collected by the camera 130. In some cases, if the measured intensity in an area of interest in the image frame 220 is too low, it may be difficult for the processor 120 to find Z-planes, determine the box top dimensions, or perform other processing of the TOF distance data. The processor 120 can analyze the intensity of at least a portion of the sensor system's field of view and report the intensity to the user. Based on the reported intensity, the user may determine whether to adjust the environment, e.g., by changing lighting conditions, by changing the angle of the TOF sensor 110 relative to the box or other area of interest, by moving the box to a different location (e.g., onto a different Z-plane, into another room), etc., in order to increase the intensity. In some embodiments, if the processor 120 determines that the intensity is too low (e.g., the intensity is below a given threshold and/or the processor 120 is having difficulty finding Z-planes or the box, e.g., none of the identified connected components satisfies the rules for identifying the box top), the processor 120 may output an instruction to the user to make a change to the environment, sensor position, or location of the box to increase the intensity.
For example, if the camera 130 is an IR camera, the processor 120 may determine an IR intensity for at least a portion of the camera's field of view, e.g., at or near the center of the image frame of the camera 130. If the camera 130 is a visible light camera, the processor 120 may determine an intensity or brightness of the visible light at or near the center of the image frame. The intensity measurement may be correlated with the reflectivity of the material(s) in a given region, e.g., a reflectivity of a box material. Since a user typically points the TOF sensor 110 at a box, Z-plane, or other area of interest, and may be encouraged by the IoU score (as described above) to include a box top in the center of the image frame, the center of the image frame of the camera 130 typically corresponds to the box top, another portion of the box, a Z-plane, or another area of interest.
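A sketch of the centered-region checks, assuming IoU means intersection over union, that the circle is a disc centered in the image frame, and that the camera image is already registered to the TOF image frame. The radius and the `MIN_INTENSITY` threshold are hypothetical values, not figures from the disclosure.

```python
import numpy as np

MIN_INTENSITY = 40.0  # hypothetical threshold in raw camera units


def centered_disc(shape, radius):
    """Boolean mask of pixels within `radius` of the image center."""
    rows, cols = np.indices(shape)
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return (rows - cy) ** 2 + (cols - cx) ** 2 <= radius ** 2


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union else 0.0


def center_intensity(image, radius):
    """Average camera intensity inside the centered disc."""
    return float(image[centered_disc(image.shape, radius)].mean())


def needs_user_action(image, radius):
    """True if the center is too dark, prompting a change of lighting, angle, or box position."""
    return center_intensity(image, radius) < MIN_INTENSITY
```

The same disc mask serves both checks: IoU against the box-top mask guides framing, and the mean intensity inside it gauges whether the scene is bright enough.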
As a particular example, the camera 130 captures an image frame with an area corresponding to the image frame 220. The processor 120 may identify, in the image frame captured by the camera 130, an intensity near the center of the image frame, e.g., an intensity at a location corresponding to the pixel 215a in the center of the image frame 220 of the TOF sensor 110, or an average intensity for a set of pixels including the center of the image frame. For example, the processor 120 may determine an average intensity for a set of pixels corresponding to the circle 2510 shown in
The following paragraphs provide various examples of the embodiments disclosed herein.
Example 1 provides a method for identifying a Z-plane, the method including receiving distance data describing distances between a sensor that captured the distance data and a plurality of surfaces in an environment of the sensor, where at least one of the surfaces is a Z-plane; generating a point cloud based on the distance data, the point cloud in a frame of reference of the sensor; identifying a basis vector representing a peak direction across the point cloud; transforming the point cloud into a frame of reference of the basis vector; and identifying a Z-plane in the transformed point cloud.
Example 2 provides the method of example 1, where the sensor is a TOF sensor including a light source and an image sensor.
Example 3 provides the method of example 1, where the distance data is arranged in a plurality of pixels within an image frame of the sensor.
Example 4 provides the method of example 3, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the sensor, and the individual pixel has an associated ray direction describing a direction from the sensor to the surface.
Example 5 provides the method of example 4, where generating the point cloud involves multiplying the ray direction for the individual pixel by the distance to the one of the plurality of surfaces for the individual pixel.
Example 6 provides the method of example 1, where the distance data is arranged as a plurality of pixels, the method further including filtering the distance data by computing, for an individual pixel, an average pixel value based on pixel values in a region around the individual pixel.
Example 7 provides the method of example 1, where identifying the basis vector includes computing surface normals for points in the point cloud; and extracting the basis vector based on the computed surface normals, the basis vector representing the peak direction of the surface normals across the point cloud.
Example 8 provides the method of example 7, where computing the surface normals for points in the point cloud includes computing angular coordinates of the surface normals of the points in the point cloud.
Example 9 provides the method of example 8, where extracting the basis vector includes binning the angular coordinates of the surface normals; identifying a peak angle of each of the angular coordinates; and identifying the basis vector based on the identified peak angles.
Example 10 provides the method of example 7, where computing a surface normal for an individual point in the point cloud includes fitting a plane to a set of points in a region around the individual point.
Example 11 provides the method of example 1, where the basis vector is a first basis vector, the method further including selecting a second basis vector orthogonal to the first basis vector and a third basis vector orthogonal to the first basis vector and the second basis vector, where the frame of reference of the basis vector is a frame of reference of the first basis vector, the second basis vector, and the third basis vector.
Example 12 provides the method of example 11, where the second basis vector is selected as a projection of a pointing direction of the sensor into a Z-plane, and the third basis vector is set equal to a cross product of the first basis vector and the second basis vector.
Example 13 provides the method of example 1, where identifying the Z-plane in the transformed point cloud includes generating a height map of the transformed point cloud; generating a profile representation of the height map, the profile representation having a peak corresponding to each of a plurality of Z-planes; and identifying the Z-plane in the profile representation.
Example 14 provides the method of example 13, where the identified Z-plane is a base Z-plane, the method further including setting a height of the base Z-plane to zero.
Example 15 provides the method of example 13, further including associating a point in the transformed point cloud with the identified Z-plane based on determining that a height of the point is within a height range associated with the identified Z-plane.
Example 16 provides an imaging system including a TOF depth sensor to obtain distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor; and a processor to receive the distance data from the TOF depth sensor; generate a point cloud based on the distance data, the point cloud in a frame of reference of the TOF depth sensor; identify a basis vector representing a peak direction across the point cloud; transform the point cloud into a frame of reference of the basis vector; and identify a Z-plane in the transformed point cloud.
Example 17 provides the system of example 16, where the TOF depth sensor includes a light source to illuminate the environment of the TOF depth sensor and an image sensor to sense reflected light.
Example 18 provides the system of example 16, where the TOF depth sensor has an image frame, and the distance data is arranged in a plurality of pixels within the image frame.
Example 19 provides the system of example 18, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the TOF depth sensor, and the individual pixel has an associated ray direction describing a direction from the TOF depth sensor to the surface.
Example 20 provides the system of example 19, where, to generate the point cloud, the processor multiplies the ray direction for the individual pixel by the distance to the one of the plurality of surfaces for the individual pixel.
Example 21 provides the system of example 16, further including a camera to capture an image of the environment of the TOF depth sensor.
Example 22 provides the system of example 21, further including a display screen, the processor to display, on the display screen, the image captured by the camera and a visual indication of the identified Z-plane.
Example 23 provides the system of example 16, further including a light sensor for detecting sunlight in the environment of the TOF depth sensor, where the processor applies a filter to the distance data in response to detecting at least a threshold level of sunlight.
Example 24 provides a method for determining dimensions of a physical box, the method including receiving distance data describing distances between a sensor and a plurality of surfaces in an environment of the sensor, at least a portion of the surfaces corresponding to a box to be measured; transforming the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selecting, from the plurality of surfaces in the environment of the sensor, a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculating a height between the first surface and the second surface; and calculating a length and a width based on the selected first surface corresponding to the top of the box.
Example 25 provides the method of example 24, where the distance data is a point cloud in a frame of reference of the sensor.
Example 26 provides the method of example 25, where transforming the distance data into the frame of reference of one of the surfaces in the environment of the sensor includes identifying a basis vector representing a peak direction across the point cloud; and transforming the point cloud into a frame of reference of the basis vector.
Example 27 provides the method of example 26, where identifying the basis vector includes computing angular coordinates of surface normals for points in the point cloud; and extracting the basis vector based on the computed angular coordinates of the surface normals, the basis vector representing the peak direction of the surface normals across the point cloud.
Example 28 provides the method of example 24, where the sensor is a TOF sensor including a light source and an image sensor.
Example 29 provides the method of example 24, where the one of the surfaces used as the frame of reference for transforming the distance data is a Z-plane.
Example 30 provides the method of example 24, where selecting the first surface includes identifying a plurality of connected components within the transformed distance data, each connected component having a respective height along a Z-axis in the frame of reference of the one of the surfaces; and selecting, as the first surface, one of the plurality of connected components by applying a set of rules to the plurality of connected components.
Example 31 provides the method of example 30, where identifying the plurality of connected components includes identifying a plurality of Z-slices of the transformed distance data, each of the plurality of Z-slices having a respective height along the Z-axis; and identifying, within each of the plurality of Z-slices, at least one connected component of height map pixels.
Example 32 provides the method of example 31, where identifying the plurality of Z-slices includes generating a height map of the distance data; generating a profile representation of the height map, the profile representation having a peak corresponding to each Z-slice; and identifying the plurality of Z-slices from the profile representation.
Example 33 provides the method of example 31, where selecting the second surface corresponding to the surface the box is resting on includes selecting a Z-slice of the plurality of Z-slices within a lateral range of the selected first surface.
Example 34 provides the method of example 30, where the set of rules applied to the plurality of connected components includes removing a connected component having a width or length less than a threshold minimum width or length; removing a connected component at least a threshold distance from another connected component; and removing a connected component having an enclosing convex hull polygon that deviates from an expected rectangular shape by at least a threshold deviation.
Example 35 provides the method of example 24, where calculating the length and the width based on the selected first surface involves extracting a subset of the transformed distance data corresponding to the selected first surface; calculating a length profile and a width profile of the subset; identifying, within the width profile, a first leading edge and a first trailing edge of the box; identifying, within the length profile, a second leading edge and a second trailing edge of the box; and calculating the width of the box between the first leading edge and the first trailing edge and calculating the length of the box between the second leading edge and the second trailing edge.
Example 36 provides the method of example 35, further including determining an angle of rotation for the extracted subset corresponding to the selected first surface, the determined angle selected to minimize a sum of projections of edges of the selected first surface onto a set of axes of the frame of reference of the one of the surfaces in the environment of the sensor; and rotating the extracted subset corresponding to the selected first surface by the determined angle.
Example 37 provides the method of example 24, where the transformed distance data includes a plurality of pixels, and calculating the length and the width based on the selected first surface corresponding to the top of the box includes, for at least the pixels in the selected first surface, filtering the pixels by computing, for an individual pixel, an average pixel value based on pixel values in a region around the individual pixel; and calculating the length and width based on the filtered pixels in the selected first surface.
Example 38 provides the method of example 24, further including generating a visual representation of the box, the visual representation indicating the height, width, and length of the box.
Example 39 provides the method of example 24, further including calculating an IoU score based on an overlap between the first surface corresponding to the top of the box and a circle in a field of view of the sensor; and generating a display including the calculated IoU score.
Example 40 provides the method of example 24, further including receiving camera data from a camera, the camera having a camera field of view that at least partially overlaps with a field of view of the sensor; determining, based on the camera data, an intensity of at least a portion of the camera field of view; and generating a display including the determined intensity.
Example 41 provides an imaging system including a TOF depth sensor to obtain distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor; and a processor to receive the distance data from the TOF depth sensor; transform the distance data into a frame of reference of one of the surfaces in the environment of the TOF depth sensor; select a first surface corresponding to a top of a box and a second surface corresponding to a surface the box is resting on; calculate a height between the first surface and the second surface; and calculate a length and a width based on the selected first surface corresponding to the top of the box.
Example 42 provides the system of example 41, where the TOF depth sensor includes a light source to illuminate the environment of the TOF depth sensor and an image sensor to sense reflected light.
Example 43 provides the system of example 41, where the TOF depth sensor has an image frame, and the distance data is arranged in a plurality of pixels within the image frame.
Example 44 provides the system of example 43, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the TOF depth sensor, and the individual pixel has an associated ray direction describing a direction from the TOF depth sensor to the surface.
Example 45 provides the system of example 41, further including a camera to capture an image of the environment of the TOF depth sensor.
Example 46 provides the system of example 45, further including a display screen, the processor to display, on the display screen, the image captured by the camera and the calculated width, length, and height.
Example 47 provides the system of example 45, further including a display screen, the processor to display, on the display screen, the image captured by the camera and an overlaid depiction of the selected first surface.
Example 48 provides the system of example 47, the processor further to display, on the display screen, a plurality of box edges below the selected first surface.
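The Z-plane identification of Examples 1, 5, 9, 12, 13, and 14 can be sketched end to end as below. This is an illustration under stated assumptions, not the disclosed implementation: rays are unit vectors in the sensor frame, the sensor sits at the origin pointing along +z, and the per-point plane fit of Example 10 is replaced by a simpler swapped-in technique (crossing the image-grid tangents). Bin counts, `bin_size`, and `min_count` are hypothetical.

```python
import numpy as np


def point_cloud(distance, rays):
    """Example 5: scale each pixel's unit ray, (H, W, 3), by its distance, (H, W)."""
    return rays * distance[..., None]


def grid_normals(cloud):
    """Unit normals for an organized (H, W, 3) cloud, flipped to face the sensor.

    Stand-in for the plane fit of Example 10: cross product of the column and row
    tangents estimated by central differences.
    """
    du = np.gradient(cloud, axis=1)
    dv = np.gradient(cloud, axis=0)
    normals = np.cross(du, dv).reshape(-1, 3)
    normals /= np.maximum(np.linalg.norm(normals, axis=1, keepdims=True), 1e-12)
    # With the sensor at the origin, a normal facing it has a negative dot product with its point.
    away = np.einsum("ij,ij->i", normals, cloud.reshape(-1, 3)) > 0
    normals[away] *= -1
    return normals


def peak_normal(normals, bins=360):
    """Example 9: histogram each angular coordinate, then combine the two peak angles."""
    theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))  # polar angle from +z
    phi = np.arctan2(normals[:, 1], normals[:, 0])        # azimuth in the x-y plane
    peaks = []
    for values, lo, hi in ((theta, 0.0, np.pi), (phi, -np.pi, np.pi)):
        counts, edges = np.histogram(values, bins=bins, range=(lo, hi))
        k = np.argmax(counts)
        peaks.append(0.5 * (edges[k] + edges[k + 1]))  # center of the fullest bin
    t, p = peaks
    return np.array([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])


def basis(e1, pointing=(0.0, 0.0, 1.0)):
    """Example 12: e2 is the pointing direction projected into the Z-plane; e3 = e1 x e2.

    Returns a 3x3 matrix whose rows map sensor coordinates to (x, y, height).
    Degenerate if the sensor points exactly along e1.
    """
    pointing = np.asarray(pointing, dtype=float)
    e2 = pointing - np.dot(pointing, e1) * e1
    e2 /= np.linalg.norm(e2)
    e3 = np.cross(e1, e2)
    return np.stack([e2, e3, e1])


def z_planes(points, frame, bin_size=0.02, min_count=50):
    """Examples 13-14: histogram heights, keep local peaks, report heights above the base plane."""
    heights = points.reshape(-1, 3) @ frame[2]
    n_bins = max(1, int(np.ceil((heights.max() - heights.min()) / bin_size)))
    counts, edges = np.histogram(heights, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    padded = np.concatenate([[0], counts, [0]])
    # Local maximum: higher than the left neighbor and at least the right neighbor.
    is_peak = (counts > padded[:-2]) & (counts >= padded[2:]) & (counts >= min_count)
    peaks = centers[is_peak]
    return peaks - peaks.min() if peaks.size else peaks
```

Taking the lowest peak as the base Z-plane assumes the normals were oriented toward a sensor held above the surfaces, so that the peak normal points "up".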
Other Implementation Notes, Variations, and Applications
It is to be understood that not necessarily all objects or advantages may be achieved in accordance with any particular embodiment described herein. Thus, for example, those skilled in the art will recognize that certain embodiments may be configured to operate in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.
In one example embodiment, any number of electrical circuits of the figures may be implemented on a board of an associated electronic device. The board can be a general circuit board that can hold various components of the internal electronic system of the electronic device and, further, provide connectors for other peripherals. More specifically, the board can provide the electrical connections by which the other components of the system can communicate electrically. Any suitable processors (inclusive of digital signal processors, microprocessors, supporting chipsets, etc.), computer-readable non-transitory memory elements, etc. can be suitably coupled to the board based on particular configuration needs, processing demands, computer designs, etc. Other components such as external storage, additional sensors, controllers for audio/video display, and peripheral devices may be attached to the board as plug-in cards, via cables, or integrated into the board itself. In various embodiments, the functionalities described herein may be implemented in emulation form as software or firmware running within one or more configurable (e.g., programmable) elements arranged in a structure that supports these functions. The software or firmware providing the emulation may be provided on non-transitory computer-readable storage medium comprising instructions to allow a processor to carry out those functionalities.
It is also imperative to note that all of the specifications, dimensions, and relationships outlined herein (e.g., the number of processors, logic operations, etc.) have been offered for purposes of example and teaching only. Such information may be varied considerably without departing from the spirit of the present disclosure, or the scope of the appended claims. The specifications apply only to one non-limiting example and, accordingly, they should be construed as such. In the foregoing description, example embodiments have been described with reference to particular arrangements of components. Various modifications and changes may be made to such embodiments without departing from the scope of the appended claims. The description and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
Note that with the numerous examples provided herein, interaction may be described in terms of two, three, four, or more components. However, this has been done for purposes of clarity and example only. It should be appreciated that the system can be consolidated in any suitable manner. Along similar design alternatives, any of the illustrated components, modules, and elements of the FIGS. may be combined in various possible configurations, all of which are clearly within the broad scope of this Specification.
Note that in this Specification, references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) included in “one embodiment”, “example embodiment”, “an embodiment”, “another embodiment”, “some embodiments”, “various embodiments”, “other embodiments”, “alternative embodiment”, and the like are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained by one skilled in the art, and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims. Note that all optional features of the systems and methods described above may also be implemented with respect to the methods or systems described herein and specifics in the examples may be used anywhere in one or more embodiments.
In order to assist the United States Patent and Trademark Office (USPTO) and, additionally, any readers of any patent issued on this application in interpreting the claims appended hereto, Applicant wishes to note that the Applicant: (a) does not intend any of the appended claims to invoke paragraph (f) of 35 U.S.C. Section 112 as it exists on the date of the filing hereof unless the words “means for” or “step for” are specifically used in the particular claims; and (b) does not intend, by any statement in the Specification, to limit this disclosure in any way that is not otherwise reflected in the appended claims.
This application claims priority to U.S. provisional patent application Nos. 63/081,742, filed Sep. 22, 2020 and entitled “BOX DIMENSIONING USING THREE-DIMENSIONAL TIME-OF-FLIGHT IMAGING,” and 63/081,775, filed Sep. 22, 2020 and entitled “WORLD Z-PLANE IDENTIFICATION IN TIME-OF-FLIGHT IMAGERY,” which are hereby incorporated by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/051238 | 9/21/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63081742 | Sep 2020 | US | |
63081775 | Sep 2020 | US |