Man-made environments are generally endowed with a preferred direction corresponding to the local orientation of earth's gravity field. In simple terms, “up” and “down” define natural engineering directions for indoor settings (e.g., rooms) and outdoor settings (e.g., streets). Floors, walls, and ceilings are strongly constrained by the direction of local gravity. In particular, man-made environments are usually populated by horizontal Z-planes (e.g., tabletops, chair seats, floors, sidewalks).
As a person walks around a man-made environment while holding a 3D time-of-flight (TOF) imaging system in hand, the sensor's angular orientation relative to the natural “up” and “down” directions is typically unknown. Humans do not reliably align imaging systems to their environments, and having users align and re-align sensors to match their environment can be a time-consuming and frustrating process.
One potential application of TOF imaging systems is determining the dimensions of boxes. Measuring volumes of physical objects is a basic problem for various industrial and consumer markets, such as packing, shipping, and storage of objects. In typical packing and shipping contexts, humans use tape measures to measure box dimensions, which is a time-consuming process. Existing technical solutions are often fragile, expensive, and/or can only be used in certain settings. For example, some dimensioning solutions rely on fixed frames of reference, e.g., deriving the volume of a box placed on a designated surface from an image taken by a camera at a fixed position relative to the designated surface.
To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts.
Overview
The systems, methods and devices of this disclosure each have several innovative aspects, no single one of which is solely responsible for all of the desirable attributes disclosed herein. Details of one or more implementations of the subject matter described in this specification are set forth in the description below and the accompanying drawings.
Reliable identification of Z-planes (e.g., a floor, a street, a tabletop) in an environment can be useful in many two-dimensional and three-dimensional image processing applications. In particular, it is useful to determine a time-of-flight sensor's roll and pitch angles relative to a Z-plane, as well as the sensor's height relative to Z-planes in its environment. As used herein, a Z-plane is a plane in a real-world environment that is parallel to the ground in a particular environment. Z-planes include the ground or floor, and surfaces parallel to the ground or floor. In many cases, the Z-plane is perpendicular to the direction of gravity. In some cases, e.g., on a hill or other slanted surface, Z-planes (e.g., the ground, a table resting on the ground) may be somewhat tilted with respect to the direction of gravity.
A base Z-plane is the lowest Z-plane within an image that captures an environment. For example, in an image of an environment that includes a box resting on a table placed on a floor, the box top, tabletop, and floor are all Z-planes, and the floor is the base Z-plane. If another image includes the box and the tabletop but does not include the floor, the tabletop is the base Z-plane for that image.
Methods and systems for identifying Z-planes in an environment and, in some cases, identifying a base Z-plane in an environment, are described herein. An example of a method involves extracting parameters for the roll and pitch rotation angles relative to a Z-plane. In some embodiments, the method also extracts a parameter for the height of the sensor relative to the base Z-plane from a single input TOF depth frame. Once the two rotation parameters and the one translation parameter (the sensor height) are extracted, the number of a priori unknown extrinsic camera calibration parameters is reduced from six (3 translations+3 rotation angles) to three (2 translations+1 rotation angle). Time-of-flight applications become easier and faster for processing systems to handle when the number of unknown sensor degrees of freedom is reduced in this way. In addition, aligning coordinate system axes to a Z-plane simplifies time-of-flight imagery exploitation in several applications, such as box dimensioning, object dimensioning, box packing, or obstacle detection.
Methods and systems for measuring dimensions of a box are also described herein. One example of a method involves receiving a three-dimensional point cloud obtained from time-of-flight data and identifying a box within the point cloud. In particular, the method includes identifying a box top within the point cloud, and then identifying a surface on which the box is resting, such as a tabletop or the floor. The method then includes calculating the height of the box as the distance between the box top and the surface on which the box is resting, and identifying edges of the box top. The method then includes calculating width and length profiles for the edges, and determining a width and a length for the box based on the width and length profiles. Quantitative height, width, and length values, e.g., measured in centimeters, may be reported to a user, e.g., on a display of a TOF measurement device. In some examples, the device also generates a visualization of the identified box superimposed on an image of the box so that the user may qualitatively confirm the calculated dimensions.
Existing box dimensioning solutions are typically highly vulnerable to sunlight because sunlight creates significant noise in TOF data or image data. Prior box dimensioning systems were only suitable for indoor use or under particular lighting conditions. In some embodiments described herein, TOF measurement data is filtered to reduce the impact of visual noise, enabling the TOF sensor system to be used in a variety of ambient lighting conditions, including both indoors and outdoors. In one example, the measurement data is filtered at a first stage for identifying Z-planes in the observed environment. Because Z-planes are relatively large, an aggressive filter (e.g., a large filter window) can be used. As noted above, after the box has been identified using the Z-planes, the box edges are identified. A finer filter (e.g., a smaller filter window) may be used to filter the measurement data for finding the box edges, since more precision is needed at this stage.
One embodiment provides a method for identifying a Z-plane. An example of a method includes receiving distance data describing distances between a sensor that captured the distance data and a plurality of surfaces in an environment of the sensor, where at least one of the surfaces is a Z-plane; generating a point cloud based on the distance data, the point cloud in a frame of reference of the sensor; identifying a basis vector representing a peak direction across the point cloud; transforming the point cloud into a frame of reference of the basis vector; and identifying a Z-plane in the transformed point cloud.
Another embodiment provides an imaging system that includes a TOF depth sensor and a processor. The TOF depth sensor obtains distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor. The processor receives the distance data from the TOF depth sensor; generates a point cloud based on the distance data, the point cloud in a frame of reference of the TOF depth sensor; identifies a basis vector representing a peak direction across the point cloud; transforms the point cloud into a frame of reference of the basis vector; and identifies a Z-plane in the transformed point cloud.
Yet another embodiment provides a method for determining dimensions of a physical box. The method includes receiving distance data describing distances between a sensor and a plurality of surfaces in an environment of the sensor, at least a portion of the surfaces corresponding to a box to be measured; transforming the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selecting, from the plurality of surfaces in the environment of the sensor, a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculating a height between the first surface and the second surface; and calculating a length and a width based on the selected first surface corresponding to the top of the box.
Another embodiment provides an imaging system including a TOF depth sensor and a processor. The TOF depth sensor obtains distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor. The processor receives the distance data from the TOF depth sensor; transforms the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selects a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculates a height between the first surface and the second surface; and calculates a length and a width based on the selected first surface corresponding to the top of the box.
Another embodiment provides a method for identifying Z-planes including obtaining raw data from a TOF sensor indicating distance between the TOF sensor and a plurality of surfaces, applying an averaging filter to the raw data to smooth the raw data for increasing signal-to-noise ratio (SNR) of flat surfaces represented in the raw data, performing a depth compute process on the raw data, as filtered, to generate distance data, generating a point cloud based on the distance data, and identifying the Z-planes in the point cloud.
Another embodiment provides an imaging system including a TOF depth sensor configured to obtain raw data indicating distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor, and a processor. The processor is configured to apply an averaging filter to the raw data to smooth the raw data for increasing SNR of flat surfaces represented in the raw data, perform a depth compute process on the raw data, as filtered, to generate distance data, generate a point cloud based on the distance data, and identify the Z-planes in the point cloud.
As will be appreciated by one skilled in the art, aspects of the present disclosure, in particular aspects of identifying a Z-plane and determining box dimensions based on TOF imagery, described herein, may be embodied in various manners (e.g., as a method, a system, a computer program product, or a computer-readable storage medium). Accordingly, aspects of the present disclosure may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Functions described in this disclosure may be implemented as an algorithm executed by one or more hardware processing units, e.g. one or more microprocessors, of one or more computers. In various embodiments, different steps and portions of the steps of each of the methods described herein may be performed by different processing units. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer-readable medium(s), preferably non-transitory, having computer-readable program code embodied, e.g., stored, thereon. In various embodiments, such a computer program may, for example, be downloaded (updated) to the existing devices and systems (e.g. to the existing perception system devices and/or their controllers, etc.) or be stored upon manufacturing of these devices and systems.
The following detailed description presents various descriptions of certain specific embodiments. However, the innovations described herein can be embodied in a multitude of different ways, for example, as defined and covered by the claims and/or select examples. In the following description, reference is made to the drawings where like reference numerals can indicate identical or functionally similar elements. It will be understood that elements illustrated in the drawings are not necessarily drawn to scale. Moreover, it will be understood that certain embodiments can include more elements than illustrated in a drawing and/or a subset of the elements illustrated in a drawing. Further, some embodiments can incorporate any suitable combination of features from two or more drawings.
The following disclosure describes various illustrative embodiments and examples for implementing the features and functionality of the present disclosure. While particular components, arrangements, and/or features are described below in connection with various example embodiments, these are merely examples used to simplify the present disclosure and are not intended to be limiting. It will of course be appreciated that in the development of any actual embodiment, numerous implementation-specific decisions must be made to achieve the developer's specific goals, including compliance with system, business, and/or legal constraints, which may vary from one implementation to another. Moreover, it will be appreciated that, while such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
In the Specification, reference may be made to the spatial relationships between various components and to the spatial orientation of various aspects of components as depicted in the attached drawings. However, as will be recognized by those skilled in the art after a complete reading of the present disclosure, the devices, components, members, apparatuses, etc. described herein may be positioned in any desired orientation. Thus, the use of terms such as “above”, “below”, “upper”, “lower”, “top”, “bottom”, or other similar terms to describe a spatial relationship between various components or to describe the spatial orientation of aspects of such components, should be understood to describe a relative relationship between the components or a spatial orientation of aspects of such components, respectively, as the components described herein may be oriented in any desired direction. When used to describe a range of dimensions or other characteristics (e.g., time, pressure, temperature, length, width, etc.) of an element, operations, and/or conditions, the phrase “between X and Y” represents a range that includes X and Y.
Other features and advantages of the disclosure will be apparent from the following description and the claims.
TOF System Overview
The TOF sensor 110 collects and/or determines distance data describing a distance between the TOF sensor 110 and various surfaces in the environment of the TOF sensor 110. The TOF sensor 110 may contain a light source, e.g., a laser, and an image sensor for capturing light reflected off the surfaces. In some embodiments, the TOF sensor 110 can emit a pulse of light and capture multiple image frames at different times to determine an amount of time for the light pulse to travel to the surface and be returned to the image sensor. In other embodiments, the TOF sensor 110 can detect phase shifts in the captured light, and the phase shifts indicate the distance between the TOF sensor 110 and various surfaces. In some embodiments, the TOF sensor 110 may generate and capture light at multiple different frequencies. If the TOF sensor 110 emits and captures light at multiple frequencies, this can help resolve ambiguous distances and help the TOF sensor 110 work at larger distance ranges. For example, if, for a first frequency, a first observed phase may correspond to a surface 0.5 meters, 1.5 meters, or 2.5 meters away, and, for a second frequency, a second observed phase may correspond to a surface 0.75 meters, 1.5 meters, or 2.25 meters away, then by combining the two observations, the TOF sensor 110 can determine that the surface is 1.5 meters away. Using multiple frequencies may also improve robustness against noise caused by particular frequencies of ambient light, whether phase shift or pulse return time is used to measure distance. In alternate embodiments, different types of sensors may be used instead of and/or in addition to the TOF sensor 110 to obtain distance data.
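The two-frequency example above can be sketched as a search for the one distance that is consistent with both observations. This is a minimal illustration only: the function name, the tolerance, and the representation of each observation as a list of candidate distances are assumptions made for the sketch, not details of the TOF sensor 110.

```python
def resolve_distance(candidates_f1, candidates_f2, tolerance_m=0.01):
    """Return distances that appear in both candidate lists, within a tolerance.

    Each list holds the distances (in meters) that one modulation frequency
    cannot tell apart. The tolerance is a hypothetical noise allowance.
    """
    matches = []
    for d1 in candidates_f1:
        for d2 in candidates_f2:
            if abs(d1 - d2) <= tolerance_m:
                # Report the mean of the two agreeing candidates.
                matches.append((d1 + d2) / 2.0)
    return matches


# Candidate distances from the example in the text.
first_frequency = [0.5, 1.5, 2.5]
second_frequency = [0.75, 1.5, 2.25]
resolved = resolve_distance(first_frequency, second_frequency)
```

With the candidate lists from the text, only the 1.5 meter candidate is shared, so `resolved` holds a single value. If the lists shared more than one candidate, the ambiguity would remain and a third frequency or a range limit would be needed.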
In any case, the TOF sensor 110 can collect raw data, such as the returned pulse from objects and surfaces in the scene, or the TOF sensor 110 can compute raw data based on the returned pulse from objects and surfaces such as samples of the correlation function between the emitted pulse and the returned pulse. In some embodiments, a TOF sensor 110 may be equipped to compute raw data such as linear and nonlinear transformations of such a correlation function between the emitted pulse and the returned pulse. This raw data may include information about the amount of time for the light pulse to travel and return at the image sensor, the phase shift, etc., and can be used to compute the distance (or depth data) using a distance (or depth) compute process.
The processor 120 receives distance data from the TOF sensor 110 and processes the distance data to identify various features in the environment of the TOF sensor 110, as described in detail herein, e.g., with respect to
A camera 130 may capture image frames of the environment. The camera 130 may be a visual light camera that captures images of the environment in the visible range. In other embodiments, the camera 130 is an infrared (IR) camera that captures IR intensities of the surfaces in the sensor system's environment. The fields of view of the camera 130 and the TOF sensor 110 are partially or fully overlapping, e.g., the field of view of the camera 130 may be slightly larger than the field of view of the TOF sensor 110. The camera 130 may pass captured images to the processor 120. In some embodiments, two processors or processing units may be included, e.g., a first processing unit for performing the Z-plane identification and box dimensioning algorithms described herein, and a second graphical processing unit that receives images from the camera 130 and generates displays based on the images and data from the first processing unit. In some embodiments, image data from the camera 130 may be used to determine a level of sunlight in the environment of the TOF sensor 110. In alternate embodiments, the sensor system 100 may include a separate light sensor for detecting sunlight or other ambient light conditions in the environment of the TOF sensor 110. In one example, the camera 130 can be programmed to output active brightness (AB) and phases. AB-phase mode can allow for providing a fixed number of bits for the phase.
The display device 140 provides visual output for a user of the sensor system 100. For example, the display device 140 may display box dimensions and/or a box volume calculated by the processor 120 based on distance data from the TOF sensor 110. In some embodiments, the display device 140 displays an image obtained by the camera 130 and overlays visual imagery indicating one or more features identified in the field of view of the camera 130 and TOF sensor 110 based on the distance data. For example, the processor 120 may instruct the display device 140 to display an outline or wire-frame of a box over an image of the box obtained by the camera 130. A user can use this display to determine whether the sensor system 100 has correctly identified the box and the box's edges. The sensor system 100 may include additional or alternative input and/or output devices, e.g., buttons, a speaker, a touchscreen, etc.
The memory 150 stores data for the sensor system 100. For example, the memory 150 stores processing instructions used by the processor 120 to identify features in the environment of the TOF sensor 110, e.g., instructions to identify one or more Z-planes and/or to calculate box dimensions of an observed box. The memory 150 may temporarily store data and images obtained by the camera 130 and/or TOF sensor 110 and accessed by the processor 120. The memory 150 may further store image data accessed by the display device 140 to generate an output display.
For example, a first pixel 210a has a ray direction 215a that extends straight out from the TOF sensor 110; the pixel 210a is in the center of the image frame 220. A second pixel 210b at a corner of the image frame 220 is associated with a ray direction 215b that extends out from the TOF sensor 110 at, for example, a 30° angle in both an x-direction and y-direction from the center of the image frame 220, where the image frame 220 is an x-y plane in a frame of reference of the TOF sensor 110. The TOF sensor 110 returns distance data (e.g., a distance, one or more phase shifts) to a surface along each valid pixel's ray. In one example, the first pixel 210a may have a measured distance of 1 meter representing a distance to a particular point on a box, and the second pixel 210b may have a measured distance of 2 meters representing a distance to a particular point on a wall behind the box.
Example Process for Identifying a Z-Plane
In some embodiments, the processor 120 filters 320 the received distance data. Ambient light in the environment of the TOF sensor 110 can create noise in the distance data. To reduce the effect of noise, a filter, e.g., an integral filter, may be applied to the distance data before further analysis is performed. Filtering the noise in this manner may be particularly useful if the TOF sensor 110 captures data in an outdoor environment, due to the noise caused by sunlight. To filter the distance data, the processor 120 may compute, for each pixel, an average pixel value based on pixel values in a region around the pixel. For example, the filtered pixel value for a given pixel may be the average value for an 11×11 or 21×21 square of pixels centered on the given pixel. In some embodiments, the processor 120 performs the filtering on phase measurement data received from TOF sensor 110, e.g., the processor 120 first filters multiple phase measurements (for different frequencies, as described above), and then processes the filtered phase data to determine a distance measurement for each pixel. Alternatively, the processor 120 may filter the distance measurements, e.g., if the pulse return method is used to obtain the distance data.
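The window averaging described above can be sketched with a summed-area table (an integral image), which makes the cost per pixel independent of the window size. This is an illustrative sketch, not the filter used by the processor 120: the function name, the plain list-of-lists image format, and the choice to shrink the window at the image borders are all assumptions.

```python
def box_filter(image, half_window):
    """Average each pixel over a (2*half_window + 1)-square window.

    The window is clipped at the image borders (an assumed edge policy),
    so border pixels are averaged over fewer samples.
    """
    rows, cols = len(image), len(image[0])
    # Summed-area table padded with one zero row and one zero column.
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += image[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        r0, r1 = max(0, r - half_window), min(rows, r + half_window + 1)
        for c in range(cols):
            c0, c1 = max(0, c - half_window), min(cols, c + half_window + 1)
            # Sum over the window from four table lookups.
            total = sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]
            out[r][c] = total / ((r1 - r0) * (c1 - c0))
    return out


# A flat 5x5 patch at 2.0 with a single noisy pixel in the middle.
frame = [[2.0] * 5 for _ in range(5)]
frame[2][2] = 11.0
smoothed = box_filter(frame, half_window=1)
```

A `half_window` of 5 or 10 corresponds to the 11×11 and 21×21 windows mentioned above. The same routine could be applied to each phase image before the depth computation, or to the distance image directly.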
In some embodiments, the filtering step may be omitted, e.g., if the TOF sensor 110 is intended for use in an environment with relatively low noise levels, such as when the TOF sensor 110 is designed for indoor use only. In some cases, the processor 120 may perform filtering in response to determining that there is a threshold level of sunlight in the environment of the TOF sensor 110. Furthermore, in some embodiments, the processor 120 may perform adaptive filtering based on the type or level of ambient light in the environment of the TOF sensor 110, e.g., using a larger filter window when brighter sunlight is detected, using a larger filter window when a greater frequency distribution is detected in the ambient light, or using a larger filter window when particular frequencies known to interact with the TOF sensor 110 are detected in the ambient light.
The processor 120 generates 330 a point cloud based on the distance data and the pixel ray directions 215. For example, for each individual pixel, the processor 120 multiplies the ray direction 215 for the pixel by the measurement distance to the surface for that pixel, e.g., the measurement distance shown in
The point cloud is in the reference frame of the TOF sensor 110, also referred to as the ego frame. For example, if a user is holding the TOF sensor 110 at a slight angle relative to the ground in the environment, the Z-direction in the reference frame of the TOF sensor 110 does not align with the Z-planes in the environment (e.g., the Z-direction in the reference frame is angled relative to the direction of gravity, if the Z-planes are perpendicular to the direction of gravity).
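The point cloud generation described above, scaling each pixel's ray direction by its measured distance, can be sketched as follows. The function name, the use of `None` to mark an invalid pixel, and the example ray directions are assumptions for illustration; the resulting points are in the ego frame of the sensor.

```python
import math


def make_point_cloud(ray_directions, distances):
    """Scale each pixel's ray direction by the distance measured along it.

    ray_directions: one (x, y, z) direction per pixel, in the sensor frame.
    distances: one distance in meters per pixel, or None for an invalid
    pixel (an assumed convention for this sketch).
    """
    points = []
    for ray, dist in zip(ray_directions, distances):
        if dist is None:
            continue
        # Normalize so the distance is measured along a unit ray.
        norm = math.sqrt(sum(c * c for c in ray))
        points.append(tuple(dist * c / norm for c in ray))
    return points


# A center pixel looking straight out and an off-axis pixel (hypothetical rays),
# with measured distances of 1 meter and 2 meters.
rays = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)]
cloud = make_point_cloud(rays, [1.0, 2.0])
```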
The processor 120 identifies 340 basis vectors for a frame of reference of the surfaces in the environment, also referred to as a “world” frame of reference. A first basis vector corresponds to the direction perpendicular to the Z-planes in the environment observed by the TOF sensor 110. Second and third basis vectors are each orthogonal to the first basis vector. The basis vectors define a “world” coordinate system, i.e., a coordinate system in which the Z-planes are horizontal.
Having computed the surface normals, the processor 120 extracts one or more basis vectors based on the computed surface normals. To extract the first basis vector, the processor 120 may bin 420 the coordinates of the surface normals, e.g., the processor 120 bins the polar and azimuthal angles for each of the computed surface normals. This binning results in a two-dimensional distribution. The bins may be represented visually by a histogram, e.g., the two-dimensional histogram that bins angular coordinates of the surface normals shown in
Having selected the first basis vector, processor 120 selects 450 the second and third basis vectors. The second and third basis vectors are orthogonal to the first basis vector (i.e., orthogonal to the surface normal to the Z-planes). The second and third basis vectors are also orthogonal to each other. The first, second, and third basis vectors define the world frame of reference.
In some embodiments, the processor 120 calculates a projection of the TOF sensor's pointing direction (e.g., the ray direction 215a, which extends straight out from the TOF sensor 110) into a Z-plane (e.g., a plane orthogonal to the first basis vector), and the processor 120 selects this projection as the second basis vector. The processor 120 selects the vector orthogonal to the first and second basis vectors as a third basis vector; the processor 120 may compute the third basis vector as the cross product of the first basis vector and the second basis vector. In other embodiments, the second and third basis vectors may be chosen in other ways.
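The basis vector selection described above can be sketched in three parts: binning surface normals by polar and azimuthal angle and taking the fullest bin as the first basis vector, projecting the pointing direction into the Z-plane for the second, and taking a cross product for the third. Several details here are assumptions made for the sketch: the normals are taken as already computed, the 5° bin width and all function names are hypothetical, the pointing direction is assumed to be the sensor's +z axis, and the world axes are ordered as (second, third, first) so that height lies along the last coordinate.

```python
import math
from collections import defaultdict


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def peak_normal(normals, bin_deg=5.0):
    """Bin unit normals by (polar, azimuthal) angle; average the fullest bin."""
    bins = defaultdict(list)
    for n in normals:
        polar = math.degrees(math.acos(max(-1.0, min(1.0, n[2]))))
        azimuth = math.degrees(math.atan2(n[1], n[0]))
        bins[(int(polar // bin_deg), int(azimuth // bin_deg))].append(n)
    fullest = max(bins.values(), key=len)
    mean = tuple(sum(n[i] for n in fullest) / len(fullest) for i in range(3))
    return normalize(mean)


def world_basis(first, pointing=(0.0, 0.0, 1.0)):
    """Second vector: pointing direction projected into the Z-plane.
    Third vector: cross product of the first and second vectors.
    Fails if the sensor points exactly along the first vector."""
    first = normalize(first)
    along = dot(pointing, first)
    second = normalize(tuple(p - along * f for p, f in zip(pointing, first)))
    return first, second, cross(first, second)


def to_world(point, basis):
    """Express a sensor-frame point in world axes (x, y, height)."""
    first, second, third = basis
    return (dot(point, second), dot(point, third), dot(point, first))


# Three normals from a tilted horizontal surface and one from a wall.
normals = [normalize((0.0, -0.8, -0.6))] * 3 + [(1.0, 0.0, 0.0)]
first = peak_normal(normals)
basis = world_basis(first)
probe = tuple(2.0 * c for c in first)   # a point 2 m along the plane normal
world_point = to_world(probe, basis)
```

In this example the probe point lands at a height of 2 meters with zero x and y offsets in the world frame. The sign of the first basis vector (up versus down) is not resolved by the histogram alone and would need a separate convention.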
Returning to
The processor 120 next identifies 360 Z-planes in the transformed point cloud.
The processor 120 then generates 520 a profile representation of the height map. For example, the processor 120 integrates the height map over the x- and y-directions to obtain a Z-profile of the height map. This profile represents the probability density of heights within the height map. Peaks within the profile correspond to various flat surfaces, i.e., surfaces that are orthogonal to the first basis vector.
The processor 120 identifies 530 the peaks in the profile representation of the height map as Z-planes. For example, for each portion of the Z-profile where the probability density rises above a given threshold (e.g., 0.01), the processor 120 identifies a Z-plane. In some embodiments, the processor 120 applies one or more other rules or heuristics to the Z-profile to identify the Z-planes, e.g., to remove spurious noise peaks while retaining genuine weak signals, such as the floor peak shown on page 9. For example, the processor 120 may identify peaks having at least a threshold number of associated points, or peaks in which the heights fall within a given range of each other.
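The Z-profile and the thresholding step above can be sketched with a one-dimensional histogram of point heights. This is a simplified illustration under stated assumptions: the profile is built directly from a list of heights rather than from a height map, the bins are normalized to sum to 1 as a stand-in for the probability density, and the 1 cm bin size and function names are hypothetical. Only the 0.01 threshold comes from the text.

```python
def z_profile(heights, bin_size=0.01):
    """Histogram of heights, normalized so the bin values sum to 1."""
    lowest = min(heights)
    n_bins = int((max(heights) - lowest) / bin_size) + 1
    counts = [0] * n_bins
    for h in heights:
        counts[int((h - lowest) / bin_size)] += 1
    total = float(len(heights))
    return lowest, [c / total for c in counts]


def find_z_planes(heights, bin_size=0.01, threshold=0.01):
    """Return one height per contiguous run of bins above the threshold."""
    lowest, profile = z_profile(heights, bin_size)
    planes, run = [], []
    for i, value in enumerate(profile + [0.0]):  # sentinel closes a final run
        if value > threshold:
            run.append(i)
        elif run:
            # Use the center of the fullest bin in the run as the plane height.
            best = max(run, key=lambda j: profile[j])
            planes.append(lowest + (best + 0.5) * bin_size)
            run = []
    return planes


# Synthetic heights: a floor, a tabletop, a box top, and one stray point.
heights = [0.0] * 100 + [0.815] * 60 + [0.871] * 39 + [0.405]
planes = find_z_planes(heights)
```

The stray point at 0.405 meters contributes a bin value of 0.005, below the threshold, so it is not reported as a Z-plane. The returned heights are bin centers, so their resolution is limited by the bin size.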
The processor 120 may select a particular height point within a given peak (e.g., a highest point or a center point) as the Z-plane height for that peak. In some embodiments, the processor 120 selects the lowest Z-value peak as the height of the base Z-plane. The processor 120 may set the height of the base Z-plane to zero, and determine the heights of the other Z-planes based on their height relative to the base Z-plane. For example, the processor 120 sets the height of the floor peak 1610 as 0, and the processor 120 determines that the chair peak 1620 has a height of 0.569 meters, the table peak 1630 has a height of 0.815 meters, and the box top peak 1640 has a height of 0.871 meters.
Having identified the Z-planes and their associated heights, the processor 120 associates 540 various points in the point cloud with the identified Z-planes. For example, the processor 120 may associate a particular point in the point cloud with an identified Z-plane if the height of the point is within a height range of the identified Z-plane. For example, if the height of a particular point is within twice a peak's FWHM (full width at half maximum), the point is associated with the Z-plane corresponding to the peak. In other examples, other ranges around a Z-plane height are used to associate points in the point cloud to Z-planes.
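The association rule above can be sketched as a height test against each identified Z-plane. The sketch reads "within twice a peak's FWHM" as a height difference of at most 2×FWHM from the plane height, which is one possible interpretation. The function name, the (height, FWHM) pair format, and the 1 cm FWHM values are assumptions; the plane heights reuse the example values given above.

```python
def associate_points(points, planes):
    """Assign each point to the first plane whose height it falls near.

    points: iterable of (x, y, z) tuples in the world frame.
    planes: list of (height, fwhm) pairs, one per identified Z-plane.
    Returns a dict mapping plane index to its associated points; points
    that match no plane are left out.
    """
    assigned = {i: [] for i in range(len(planes))}
    for p in points:
        for i, (height, fwhm) in enumerate(planes):
            if abs(p[2] - height) <= 2.0 * fwhm:
                assigned[i].append(p)
                break
    return assigned


# Floor, chair, table, and box top heights; FWHM values are hypothetical.
planes = [(0.0, 0.01), (0.569, 0.01), (0.815, 0.01), (0.871, 0.01)]
points = [(0.1, 0.2, 0.005), (0.3, 0.1, 0.82), (0.3, 0.1, 0.86), (0.5, 0.5, 0.30)]
assigned = associate_points(points, planes)
```

Here the point at a height of 0.30 meters is not associated with any Z-plane, which is the expected outcome for points on vertical or sloped surfaces.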
The transformed point cloud and identified Z-planes can be used for various further processing on the point cloud data. In some examples, the processor 120 can proceed to locate a box in the environment of the TOF sensor 110 and determine dimensions and/or the volume of the box, as described further below. In other examples, the processor 120 can perform other types of identification or analysis on other types of objects in the environment of the TOF sensor 110.
In some embodiments, the sensor system 100 displays outputs of the Z-plane identification process to a user. For example, the processor 120 may correlate the identified Z-planes to various pixels in an image obtained by the camera 130, and generate a display with visual indications of the identified Z-planes. For example, the Z-planes may be outlined or color-coded in a display output by the display device 140. The display device 140 may alternatively or additionally output the determined heights of the identified Z-planes.
Example Process for Box Dimensioning
For the box dimensioning process, the processor 120 may assume that at least a portion of the surfaces in the environment of the TOF sensor 110 correspond to a box to be measured. Several additional assumptions may be made about the box being measured. Such assumptions can improve the speed and accuracy of the box dimensioning process, particularly in applications where fast detection and measuring of the box are important, e.g., if box dimensions are calculated and provided to a user in real or near-real time as the user points the TOF sensor 110 at a box. These assumptions may include that the angles between adjacent surfaces of the box are reasonably close to 90° (e.g., between 85° and 95°, or within some other range); that the box is located within a particular distance of the TOF sensor 110 (e.g., within 3 meters or within 5 meters); that each box dimension is within a particular range (e.g., at least 3 centimeters, or at least 10 centimeters; no more than 2 meters, or no more than 3 meters); that the box is closed; that the top face of the box is visible to the TOF sensor 110; and that the box rests on a flat, horizontal surface (i.e., a Z-plane) that is also visible to the TOF sensor 110.
It should be understood that, in some embodiments, one or more of these assumptions may be relaxed or removed. For example, the range between the sensor and box and the minimum and maximum box dimensions are merely exemplary, and in other embodiments, different ranges and dimensions may be used. In some embodiments, the ranges may vary based on the intended uses or target users of the sensor system 100. For example, if the sensor system 100 is used to measure boxes being loaded into a moving truck (e.g., including wardrobe boxes and boxed furniture), greater distance ranges and greater box dimensions may be used. In some embodiments, a user may be able to input a distance range and/or maximum and minimum box dimensions.
The processor 120 transforms 620 the distance data (e.g., the point cloud calculated based on distance data from the TOF sensor 110) into a frame of reference of a surface in the environment of the TOF sensor 110. For example, as described with respect to steps 340 and 350 in
As noted with respect to
The processor 120 identifies 730 connected components within at least some of the Z-slices. Each of the connected components may be a candidate box top. To identify a connected component, the processor 120 finds clusters of nearby or connecting points within a Z-slice. The processor 120 may identify a connected component by finding sets of pixels in a Z-slice that may be reached by moving across the Z-slice, e.g., pixels that are within a threshold distance of each other. For example, the processor 120 may select a particular pixel in a Z-slice and recursively add neighboring pixels that are also in the Z-slice to a connected component. Each connected component has a respective height along the Z-axis in the frame of reference of the basis vectors; the height corresponds to the height of the Z-slice.
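As a non-limiting illustration, the connected-component search described above can be sketched in Python as follows. The function name `connected_components`, the representation of a Z-slice as a two-dimensional grid of truthy values, and the use of 4-connectivity are assumptions made for this sketch only. An explicit queue is used in place of recursion; it visits the same neighboring pixels while avoiding recursion depth limits.

```python
from collections import deque


def connected_components(mask):
    """Group the truthy cells of a 2D grid into 4-connected components.

    `mask` is a list of equal-length rows; a truthy cell means the pixel
    belongs to the Z-slice. Returns a list of components, each a list of
    (row, col) tuples.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new component and grow it through neighboring pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components
```

Each returned component would then be treated as one candidate box top at the height of its Z-slice.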
In some embodiments, prior to identifying the connected components in the Z-slices, the processor 120 may apply one or more rules to remove one or more Z-slices from consideration as the box top. For example, the processor 120 may eliminate the lowest or base Z-slice as potentially containing the box top, since it is assumed that the box top is above the lowest surface. The processor 120 may also remove a Z-slice that does not lie sufficiently close within the height map (in the x- and y-direction) to some other, lower Z-slice (i.e., a potential surface for the box to be resting on). For example, for the Z-plane slices shown in
Having identified the connected components representing candidate box tops, the processor 120 selects 740 one of the connected components as the box top. The processor 120 may apply various rules to the connected components to identify the box top. For example, the processor 120 may remove connected components that are very small (e.g., having a width and/or length below the threshold minimum box dimensions described above). The processor 120 may remove connected components that are highly elongated or non-compact (e.g., the connected component has a large perimeter compared to the square root of its area). The processor 120 may remove connected components for which a box bottom (i.e., the surface on which the box is resting) cannot be derived from the height map, e.g., because no other connected component or Z-slice is sufficiently close to the connected component in the height map in the x- and y-direction.
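As one hedged illustration of the compactness rule, the ratio of perimeter to the square root of area can be computed directly from the pixels of a connected component. In the Python sketch below, the function names, the pixel-edge definition of perimeter, and the cutoff `max_ratio=6.0` are hypothetical choices for illustration and are not values taken from this disclosure. For reference, a filled square yields a ratio of 4 under this definition, and the ratio grows as the shape becomes more elongated.

```python
import math


def compactness_ratio(component):
    """Return perimeter / sqrt(area) for a collection of (row, col) pixels.

    Area is the pixel count; perimeter is the number of pixel sides that
    border a pixel outside the component.
    """
    pixels = set(component)
    if not pixels:
        raise ValueError("component must contain at least one pixel")
    perimeter = 0
    for r, c in pixels:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) not in pixels:
                perimeter += 1
    return perimeter / math.sqrt(len(pixels))


def is_compact(component, max_ratio=6.0):
    """Hypothetical rule: keep components whose ratio is at most max_ratio."""
    return compactness_ratio(component) <= max_ratio
```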
Having applied these rules to the connected components shown in
While several example rules for identifying a box top are discussed above, the processor 120 may apply additional, fewer, or different rules to the connected components to identify the box top in different embodiments. In some embodiments, if multiple candidates pass each of the rules described above, the processor 120 may use an additional rule or rules for choosing between the possible box tops. For example, the processor 120 may select the candidate box top that is closest to the TOF sensor 110.
Having identified the box top, the processor 120 identifies 750 the surface on which the box is resting, which corresponds to the box bottom. For example, the processor 120 selects a Z-slice that has a lower height than the box top and that is closest in lateral range to the box top in the height map, e.g., the Z-slice that is closest to the identified box top in the x- and y-directions in the height map. In the example height map shown in
Returning to
The processor 120 further calculates 650 the length and the width of the box based on the selected box top. For example, having identified the box top, the processor 120 calculates the length and width of the box top. Because the distance data for the box top is taken at an angle and may have noise, the processor 120 may filter the box top data, rotate the box top so it is aligned with an x-axis and a y-axis, and calculate horizontal and vertical profiles of the edges to determine the length and width. For example, the trailing edge of the box (i.e., the edge of the box farthest from the TOF sensor 110) may be blurred, which can make it difficult for the processor 120 to identify the trailing edge without performing additional data processing.
To filter the distance data, the processor 120 may compute, for each pixel in the distance data, an average pixel value based on pixel values in a region around the pixel. The processor 120 may use a different filter than the filter described with respect to step 320. In particular, the processor 120 may use a smaller filter window than the filter used to identify the Z-planes. For example, the filtered pixel value for a given pixel may be the average value for a 5×5 or 7×7 square of pixels centered on the given pixel. As described with respect to
In some embodiments, the filtering step may be omitted, e.g., if the TOF sensor 110 is intended for use in an environment with relatively low noise levels, such as when the TOF sensor 110 is designed for indoor use only. In some cases, the processor 120 may perform filtering in response to determining that there is a threshold level of sunlight in the environment of the TOF sensor 110. Furthermore, in some embodiments, the processor 120 may perform adaptive filtering based on the type or level of ambient light in the environment of the TOF sensor 110, e.g., using a larger filter window when brighter sunlight is detected, using a larger filter window when a greater frequency distribution is detected in the ambient light, or using a larger filter window when particular frequencies known to interact with the TOF sensor 110 are detected in the ambient light.
The processor 120 extracts 820 a subset of points within the transformed distance data corresponding to the box top, e.g., the connected component selected as the box top at step 740. If the filtering 810 is performed, the processor 120 may calculate a second point cloud based on the filtered data (following the process described in step 330 in
Having rotated the box top subcloud, the processor 120 calculates 840 a width profile and a length profile for the box top. For many TOF sensors, while the leading edges of the box closest to the sensor are sharp and easy for both humans and computers to identify, flying voxels may blur the trailing edges of the box located farther downstream from the TOF sensor 110. This causes the location of the trailing edges to be more ambiguous and difficult to identify from the distance data. The processor 120 generates box top width and length profiles by projecting the points of the rotated box top subcloud onto the horizontal and vertical axes.
The processor 120 identifies 850 the leading edges and trailing edges in the width and length profiles. For example, the processor 120 applies one or more rules to identify the edges from the profiles. The processor 120 may fit lines to each of the profiles' interiors and define a leading edge as a location where the profile equals a set percentage of the linear fit's value, e.g., 40% of the linear fit's value. The processor 120 may define the trailing edge by a location where the profile equals the same percentage or different percentage of the linear fit's value, e.g., a particular value in the range of 25% to 85%. The processor 120 may further apply one or more rules to determine the trailing edge fraction. For example, trailing edge percentage thresholds may vary with the height of the box. Alternatively, percentage thresholds can differ for the shorter and longer of the two box top edges. FIGS. 23A and 23B show examples of the trailing edges and leading edges identified based on the width and length profiles.
The processor 120 calculates 860 the width and length of the box top based on the determined leading edges and trailing edges. In particular, the width is the distance between the leading edge and trailing edge in the width projection, and the length is the distance between the leading edge and trailing edge in the length projection.
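The profile-based edge rules described above can be illustrated with a short Python sketch using NumPy. The sketch assumes the profile is a histogram of point coordinates along one axis of the rotated box top subcloud, that the leading edge lies at the low-coordinate end of the axis, and that the "interior" of the profile is the span between the 25th and 75th percentiles of the coordinates. The function name, bin width, and default fractions are illustrative assumptions only.

```python
import numpy as np


def edges_from_profile(coords, bin_width=0.005, leading_frac=0.4,
                       trailing_frac=0.4, interior=(0.25, 0.75)):
    """Estimate leading edge, trailing edge, and extent along one axis.

    coords: 1D coordinates (meters) of box top points projected on the axis.
    An edge is placed where the profile first (leading) or last (trailing)
    reaches the given fraction of a line fitted to the profile interior.
    """
    coords = np.asarray(coords, dtype=float)
    bins = np.arange(coords.min() - bin_width,
                     coords.max() + 2 * bin_width, bin_width)
    counts, bin_edges = np.histogram(coords, bins=bins)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

    # Fit a line to the interior of the profile only.
    lo, hi = np.quantile(coords, interior)
    inside = (centers >= lo) & (centers <= hi)
    slope, intercept = np.polyfit(centers[inside], counts[inside], 1)
    fit = slope * centers + intercept

    # Ignore bins where the extrapolated fit is not positive.
    positive = fit > 0
    lead_hits = np.nonzero(positive & (counts >= leading_frac * fit))[0]
    trail_hits = np.nonzero(positive & (counts >= trailing_frac * fit))[0]
    if lead_hits.size == 0 or trail_hits.size == 0:
        raise ValueError("profile never reaches the requested fraction")

    leading = bin_edges[lead_hits[0]]         # left side of first qualifying bin
    trailing = bin_edges[trail_hits[-1] + 1]  # right side of last qualifying bin
    return leading, trailing, trailing - leading
```

Calling the function once on the x-coordinates and once on the y-coordinates of the rotated subcloud would give candidate width and length values; different fractions can be passed for the trailing edge, consistent with the ranges discussed above.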
Returning to
The processor 120 may project the image of the three-dimensional box onto a two-dimensional image plane of the camera 130 to generate an overlay image, e.g., an outline overlaying the image of the box. The calculated width, length, and height dimensions may also be reported in the graphical display, either along the edges or in a separate area. A user can view the graphical display in the display device 140 to qualitatively confirm that the sensor system 100 has correctly identified the box and correctly identified the edges and surfaces.
In some embodiments, the sensor system 100 may additionally or alternatively report an intensity indicator that indicates a measured intensity at a particular pixel or across a set of pixels in the distance data collected by the TOF sensor 110 and/or a measured intensity in the corresponding pixel or set of pixels collected by the camera 130. In some cases, if the measured intensity in an area of interest in the image frame 220 is too low, it may be difficult for the processor 120 to find Z-planes, determine the box top dimensions, or perform other processing of the TOF distance data. The processor 120 can analyze the intensity of at least a portion of the sensor system's field of view and report the intensity to the user. Based on the reported intensity, the user may determine whether to adjust the environment, e.g., by changing lighting conditions, by changing the angle of the TOF sensor 110 relative to the box or other area of interest, by moving the box to a different location (e.g., onto a different Z-plane, into another room), etc., in order to increase the intensity. In some embodiments, if the processor 120 determines that the intensity is too low (e.g., the intensity is below a given threshold and/or the processor 120 is having difficulty finding Z-planes or the box, e.g., none of the identified connected components satisfies the rules for identifying the box top), the processor 120 may output an instruction to the user to make a change to the environment, sensor position, or location of the box to increase the intensity.
For example, if the camera 130 is an IR camera, the processor 120 may determine an IR intensity for at least a portion of the camera's field of view, e.g., at or near the center of the image frame of the camera 130. If the camera 130 is a visible light camera, the processor 120 may determine an intensity or brightness of the visible light at or near the center of the image frame. The intensity measurement may be correlated with the reflectivity of the material(s) in a given region, e.g., a reflectivity of a box material. Since a user typically points the TOF sensor 110 at a box, Z-plane, or other area of interest, and may be encouraged to include a box top in the center of the image frame by the IoU (as described above), the center of the image frame of the camera 130 typically corresponds to the box top, other portion of a box, Z-plane, or other area of interest.
As a particular example, the camera 130 captures an image frame with an area corresponding to the image frame 220. The processor 120 may identify, in the image frame captured by the camera 130, an intensity near the center of the image frame, e.g., an intensity at a location corresponding to the pixel 215a in the center of the image frame 220 of the TOF sensor 110, or an average intensity for a set of pixels including the center of the image frame. For example, the processor 120 may determine an average intensity for a set of pixels corresponding to the circle 2510 shown in
Another Example Process for Box Identification and Dimensioning
In process 2600, raw data 2602 can be obtained from the TOF sensor 110 and used in identifying the Z-plane and/or in steps of box dimensioning before the raw data is converted to distance or depth data. Different averaging filters can be applied to the raw data 2602 for different purposes, e.g., to smooth flat surfaces, which can aid in Z-plane identification, or to keep edges sharper, which can allow for more accurately estimating boundaries of the Z-planes (e.g., the identified box top and floor). In some examples, the raw data 2602 can be provided to a first depth compute process 2604 that can include a large averaging window size. In addition, in some examples, the raw data 2602 can also be provided to a second depth compute process 2606 that can include a small averaging window size, i.e., a window size smaller than the large averaging window size. In one specific example, the large averaging filter can be 17×17 pixels in size and the small averaging filter can be 5×5 pixels in size. For example, the larger filter kernel can ensure smoothness of the points in the raw data for more robustly determining the flat surfaces in the scene, such as a potential box top and floor. Once this is done, the smaller filter kernel can be used to more robustly estimate the boundaries of the box top and floor, as the edges can remain relatively sharper.
In process 2600, a surface normal computation process 2608 can be performed for the depth computed data that used the large averaging window size, and a surface normal computation process 2610 can be performed for the depth computed data that used the small averaging window size. The surface normal computation processes 2608 and 2610 can be used to identify the Z-planes, as described in examples herein. For example, the Z-planes can be identified as a collection of points in the depth computed data having a same surface normal and/or achieving a threshold size (e.g., a threshold number of adjacent points). Once the surface normals are computed, a box top and floor identification process 2612 can be performed, based on the fact that the box top and floor are parallel to each other, to identify potential pairs of box tops and floors. Spurious surfaces can be eliminated using heuristics such as checking that the shape of the box top is rectangular or requiring that the box top be contiguous except possibly for small, saturated areas, etc., as described further herein. In some examples, if multiple box tops are identified (i.e., multiple boxes are in the field of view), the one closest to the camera can be considered. The difference between the distance of the box top and the distance of the floor can yield the height of the box 2614.
In process 2600, a box top orientation identification process 2616 can be performed to determine the boundary of the box top (e.g., box width and length 2618) from the point cloud, which can include selecting the points whose surface normals are relatively aligned with that of the identified box top. An optional edge refinement process can also be performed as part of the box top orientation identification process 2616 to take the raw distance data and adjust the estimated box top boundary to ensure the edge pixel is at the average of its equidistant neighbors. The edge refinement process can ensure the box edges are properly captured and box width and length 2618 are not underestimated due to lens blur or other filtering operations. A box corners identification process 2620 can be performed to identify the pixels that correspond to the box corners so that line segments can be drawn on the 2D image as a wire-frame 2622, as described above and further herein.
Another Example Process for Identifying a Z-Plane
As described above,
In some embodiments, the processor 120 can apply 2720 averaging filter(s) to the raw data. As described, for example, ambient light in the environment of the TOF sensor 110 can create noise in the raw data. To reduce the effect of noise, one or more filters can be applied for one or more purposes. For example, the processor 120 can apply a first larger size filter to smooth the raw data and improve Z-Plane identification. In some examples, as described further herein, the processor 120 can also apply a smaller size filter to reduce smearing, which can improve boundary estimation for the Z-Plane. In an example, applying the larger size filter may smear the edges, so also separately applying the smaller size filter can help to reduce the smearing of edges. The averaging filters can have the effect of averaging a given pixel value based on pixel values in a region around the pixel. In the specific example described above, a 17×17 filter can, for a given pixel, average the pixel values in a 17×17 box around the given pixel, and set the given pixel as the computed average value. The averaging filters may include one or more of a box filter, a bilateral filter, a guided filter, an integral filter, etc., as described herein.
In some examples, the processor 120 can filter the raw data using a bilateral filter that preserves edges. The bilateral filter can also compute how “edgy” a pixel is (i.e., whether the pixel is likely to be on an edge), and if the pixel is too edgy, the processor 120 can invalidate the pixel, as described further herein. The bilateral filter computational cost and memory requirement can grow quadratically with the filter window size. The error can decrease linearly with the filter window size. To increase the SNR robustness and be computationally efficient, edge preservation can be ignored, and box filters, which compute the mean of points in a square window of pixels as described above, can be used. In another example, guided filters, which are computationally efficient and preserve edges, can be used. In another example, integral tables can be used, which have a computational cost that may not scale with filter window size.
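To illustrate why an integral table can make the cost of a box filter independent of the window size, the following Python sketch computes a windowed mean from a summed-area table using NumPy. The function name `box_filter_integral` and the choice to shrink the window at the image border (rather than pad) are assumptions of this sketch. Each output pixel requires four table lookups regardless of whether the window is 5×5 or 17×17.

```python
import numpy as np


def box_filter_integral(image, window):
    """Mean filter over a window x window neighborhood via a summed-area table.

    Near the image border the window is clipped, so the mean is taken over
    the pixels that actually exist.
    """
    if window < 1 or window % 2 != 1:
        raise ValueError("window must be a positive odd integer")
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    half = window // 2

    # sat[i, j] holds the sum of img[:i, :j]; the zero row/column simplifies lookups.
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

    rows = np.arange(h)
    cols = np.arange(w)
    r0 = np.clip(rows - half, 0, h)[:, None]
    r1 = np.clip(rows + half + 1, 0, h)[:, None]
    c0 = np.clip(cols - half, 0, w)[None, :]
    c1 = np.clip(cols + half + 1, 0, w)[None, :]

    # Four lookups per pixel give the window sum.
    total = sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]
    count = (r1 - r0) * (c1 - c0)
    return total / count
```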
In an example, as part of filtering (e.g., based on the larger size filter and/or the smaller size filter), the processor 120 can discard or otherwise consider certain pixels or points invalid (e.g., as not belonging to a collection of adjacent points that define a surface or Z-plane) based on one or more factors. In some examples, the processor 120 can consider a pixel invalid if the active brightness is below a threshold (which can be configurable). In some examples, the processor 120 can consider a pixel invalid if a confidence at the pixel is above a confidence threshold (which can be configurable). In some examples, the processor 120 can consider a pixel invalid if a radial distance is below a minimum or above a maximum (either of which can be configurable).
The processor 120 can perform 2730 a depth compute process on the raw data, as filtered, to generate distance data. In some examples, the processor 120 can obtain distance data describing distances between the TOF sensor 110, or a point from which the raw data is measured, and various objects captured in the raw data. In some examples, the processor 120 can discard or otherwise consider certain pixels invalid (e.g., as not belonging to a surface or Z-plane) if the depth computed after applying the larger size filter is not within a threshold percentage (or absolute difference) of the depth computed after applying the smaller size filter (where the threshold can be configurable).
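A few of the invalidation checks described above can be combined into a single validity mask, as in the following Python sketch. The function name, argument names, and all default thresholds are hypothetical and would be configurable in practice. The sketch covers only the active brightness check, the radial distance limits, and the consistency check between the depths computed with the larger and smaller filters, here expressed as a relative tolerance.

```python
import numpy as np


def valid_pixel_mask(active_brightness, radial_distance,
                     depth_large, depth_small,
                     min_brightness=5.0, min_range=0.2, max_range=5.0,
                     rel_tol=0.02):
    """Return a boolean mask that is True where a pixel passes every check.

    All inputs are 2D arrays of identical shape. Thresholds are illustrative.
    """
    ab = np.asarray(active_brightness, dtype=float)
    rd = np.asarray(radial_distance, dtype=float)
    dl = np.asarray(depth_large, dtype=float)
    ds = np.asarray(depth_small, dtype=float)

    bright_enough = ab >= min_brightness
    in_range = (rd >= min_range) & (rd <= max_range)
    # Depths from the two filter sizes must agree to within rel_tol.
    consistent = np.abs(dl - ds) <= rel_tol * np.abs(ds)
    return bright_enough & in_range & consistent
```

Pixels where the two depth estimates disagree tend to lie near edges, where the larger window mixes foreground and background, which is one reason such pixels may be excluded from a Z-plane.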
The processor 120 can generate 2740 a point cloud based on the distance data and the pixel ray directions 215. For example, processor 120 can generate the point cloud in a frame of reference of the TOF sensor 110. For example, for each individual pixel, the processor 120 multiplies the ray direction 215 for the pixel by the measurement distance to the surface for that pixel, e.g., the measurement distance shown in
The processor 120 can identify 2750 a basis vector representing a peak direction across the point cloud. For example, processor 120 can identify the basis vector for a frame of reference of a corresponding surface in the environment, also referred to as a “world” frame of reference. For example, processor 120 can identify the basis vector as described in reference to actions 410, 420, 430, and 440 of
The processor 120 can identify 2760 at least one Z-plane in the point cloud, where the at least one Z-plane can represent at least one surface.
Having computed the surface normals, the processor 120 can bin 2820 coordinates of the surface normals, which can include averaging neighboring bins. This binning results in a two-dimensional distribution. The bins may be represented visually by a histogram, e.g., the two-dimensional histogram that bins angular coordinates of the surface normals shown in
The processor 120 can generate 2840 a height map from the point cloud. For example, the processor 120 distributes the points of the point cloud into square “chimneys,” and subsequently selects a representative height for each chimney. Each chimney may be a shape of the same size, e.g., a (0.75 cm)² square. Other sizes or shapes may be used to construct the height map. The representative height may be, for example, the top point (maximum height) of the chimney, an average height, a median height, or another height selected or computed from the heights of the points falling within the chimney. Reducing the three-dimensional point cloud down to a two-dimensional height map simplifies data processing and increases computation speed. An example is shown in
The processor 120 can generate 2850 a profile representation of the height map. For example, the processor 120 integrates the height map over the x- and y-directions to obtain a Z-profile of the height map. This profile represents the probability density of heights within the height map. Peaks within the profile correspond to various flat surfaces, i.e., surfaces that are orthogonal to the first basis vector. An example is shown in
The processor 120 can identify 2860 at least one Z-plane as at least one peak in the profile representation of the height map. For example, for each portion of the Z-profile where the probability density falls above a given threshold (e.g., 0.01), the processor 120 can identify a Z-plane. In some embodiments, the processor 120 applies one or more other rules or heuristics to the Z-profile to identify the Z-planes, e.g., to remove spurious noise peaks while retaining genuine weak signals, such as the floor peak. For example, the processor 120 may identify peaks having at least a threshold number of associated points, or peaks in which the heights fall within a given range of each other.
The processor 120 may select a particular height point within a given peak (e.g., a highest point or a center point) as the Z-plane height for that peak. In some embodiments, the processor 120 selects the lowest Z-value peak as the height of the base Z-plane. The processor 120 may set the height of the base Z-plane to zero, and determine the heights of the other Z-planes based on their height relative to the base Z-plane. For example, the processor 120 sets the height of the floor peak 1610 as 0, and the processor 120 determines that the chair peak 1620 has a height of 0.569 meters, the table peak 1630 has a height of 0.815 meters, and the box top peak 1640 has a height of 0.871 meters.
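The peak-finding steps above can be sketched in Python as follows. In this illustration, the Z-profile is approximated as a histogram of height-map values normalized so that the bins sum to one, which is an assumption of the sketch rather than a stated normalization. Each contiguous run of bins above the threshold is treated as one Z-plane, the tallest bin in the run is taken as the plane height, and heights are reported relative to the lowest plane. The function name and bin width are hypothetical.

```python
import numpy as np


def find_z_planes(heights, bin_width=0.01, threshold=0.01):
    """Return Z-plane heights (meters) relative to the lowest detected plane.

    heights: representative heights from the height map, in any shape.
    """
    h = np.asarray(heights, dtype=float).ravel()
    h = h[np.isfinite(h)]
    n_bins = int(np.ceil((h.max() - h.min()) / bin_width)) + 1
    counts, edges = np.histogram(
        h, bins=n_bins, range=(h.min(), h.min() + n_bins * bin_width))
    profile = counts / counts.sum()  # fraction of samples per height bin
    centers = 0.5 * (edges[:-1] + edges[1:])

    above = profile > threshold
    planes = []
    i = 0
    while i < n_bins:
        if above[i]:
            j = i
            while j + 1 < n_bins and above[j + 1]:
                j += 1
            # Use the tallest bin of this run as the plane height.
            planes.append(centers[i + int(np.argmax(profile[i:j + 1]))])
            i = j + 1
        else:
            i += 1
    planes = np.array(planes)
    return planes - planes[0] if planes.size else planes
```

With a base plane at zero, the remaining values correspond to heights above that plane, in the same manner as the floor, chair, table, and box top heights in the example above.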
Having identified the at least one Z-plane and its associated height, the processor 120 can associate 2870 various points in the point cloud with the at least one identified Z-plane. For example, the processor 120 may associate a particular point in the point cloud with an identified Z-plane if the height of the point is within a height range of the identified Z-plane. For example, if the height of a particular point is within twice a peak's FWHM (full width at half maximum), the point is associated with the Z-plane corresponding to the peak. Each peak of the at least one peak can indicate a collection of adjacent points in the point cloud having a same surface normal estimate, and the collection of adjacent points can achieve a threshold size. In other examples, other ranges around a Z-plane height are used to associate points in the point cloud to Z-planes. An example is shown in
The transformed point cloud and identified Z-planes can be used for various further processing on the point cloud data, as described above and further herein.
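As an illustration of the height-range rule for associating points with Z-planes, the following Python sketch labels each point with the index of a Z-plane when the point's height lies within a multiple of that plane's FWHM of the plane height. Interpreting "within twice a peak's FWHM" as an absolute height difference of at most 2×FWHM is an assumption of this sketch, as are the function and parameter names. Points matching no plane receive the label -1, and a point within range of several planes is assigned to the nearest one.

```python
import numpy as np


def associate_points(point_heights, plane_heights, plane_fwhms, factor=2.0):
    """Label each point with a Z-plane index, or -1 if no plane is in range."""
    z = np.asarray(point_heights, dtype=float)
    labels = np.full(z.shape, -1, dtype=int)
    best = np.full(z.shape, np.inf)
    for idx, (plane_z, fwhm) in enumerate(zip(plane_heights, plane_fwhms)):
        dist = np.abs(z - plane_z)
        # Accept points in range that are closer to this plane than to any earlier match.
        take = (dist <= factor * fwhm) & (dist < best)
        labels[take] = idx
        best[take] = dist[take]
    return labels
```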
Another Example Process for Box Dimensioning
For the box dimensioning process, the processor 120 may assume that at least a portion of the surfaces in the environment of the TOF sensor 110 correspond to a box to be measured. Several additional assumptions may be made about the box being measured. Such assumptions can improve the speed and accuracy of the box dimensioning process, particularly in applications where fast detection and measuring of the box are important, e.g., if box dimensions are calculated and provided to a user in real or near-real time as the user points the TOF sensor 110 at a box. These assumptions may include that the angles between adjacent surfaces of the box are reasonably close to 90° (e.g., between 85° and 95°, or within some other range); that the box is located within a particular distance of the TOF sensor 110 (e.g., within 3 meters or within 5 meters); that each box dimension is within a particular range (e.g., at least 3 centimeters, or at least 10 centimeters; no more than 2 meters, or no more than 3 meters); that the box is closed; that the top face of the box is visible to the TOF sensor 110; and that the box rests on a flat, horizontal surface (i.e., a Z-plane) that is also visible to the TOF sensor 110.
It should be understood that, in some embodiments, one or more of these assumptions may be relaxed or removed. For example, the range between the sensor and box and the minimum and maximum box dimensions are merely exemplary, and in other embodiments, different ranges and dimensions may be used. In some embodiments, the ranges may vary based on the intended uses or target users of the sensor system 100. For example, if the sensor system 100 is used to measure boxes being loaded into a moving truck (e.g., including wardrobe boxes and boxed furniture), greater distance ranges and greater box dimensions may be used. In some embodiments, a user may be able to input a distance range and/or maximum and minimum box dimensions.
The processor 120 can identify 2920 Z-planes in the raw data, as described with respect to action 2760 of
As noted with respect to
The processor 120 can identify 3030 connected components within at least some of the Z-slices. Each of the connected components may be a candidate box top. To identify a connected component, the processor 120 finds clusters of nearby or connecting points within a Z-slice. The processor 120 may identify a connected component by finding sets of pixels in a Z-slice that may be reached by moving across the Z-slice, e.g., pixels that are within a threshold distance of each other. For example, the processor 120 may select a particular pixel in a Z-slice and recursively add neighboring pixels that are also in the Z-slice to a connected component. Each connected component has a respective height along the Z-axis in the frame of reference of the basis vectors; the height corresponds to the height of the Z-slice. Examples are shown in
In some embodiments, prior to identifying the connected components in the Z-slices, the processor 120 may apply one or more rules to remove one or more Z-slices from consideration as the box top. For example, the processor 120 may eliminate the lowest or base Z-slice as potentially containing the box top, since it is assumed that the box top is above the lowest surface. The processor 120 may also remove a Z-slice that does not lie sufficiently close within the height map (in the x- and y-direction) to some other, lower Z-slice (i.e., a potential surface for the box to be resting on). For example, for the Z-plane slices shown in
Having identified the connected components representing candidate box tops, the processor 120 can select 3040 one of the connected components as the box top. The processor 120 may apply various rules to the connected components to identify the box top. For example, the processor 120 may remove connected components that are very small (e.g., having a width and/or length below the threshold minimum box dimensions described above). The processor 120 may remove connected components that are highly elongated or non-compact (e.g., the connected component has a large perimeter compared to the square root of its area). The processor 120 may remove connected components for which a box bottom (i.e., the surface on which the box is resting) cannot be derived from the height map, e.g., because no other connected component or Z-slice is sufficiently close to the connected component in the height map in the x- and y-direction. The processor 120 may remove connected components that are too close to the edges of the capture from the TOF sensor 110. The processor 120 may remove connected components outside of a threshold distance from an optical center of the capture of the TOF sensor 110. The processor 120 may remove connected components that have at least a threshold number of gaps (e.g., pixels within the boundary of connected components that are not considered part of the box top). The processor 120 may remove connected components that have over a threshold difference in length between opposite edges, less than a minimum distance between corners or box extrema, over a maximum cosine of angles, etc. An example is shown in
While several example rules for identifying a box top are discussed above, the processor 120 may apply additional, fewer, or different rules to the connected components to identify the box top in different embodiments. In some embodiments, if multiple candidates pass each of the rules described above, the processor 120 may use an additional rule or rules for choosing between the possible box tops. For example, the processor 120 may select the candidate box top that is closest to the TOF sensor 110.
Having identified the box top, the processor 120 can identify 3050 the surface on which the box is resting, and may select this surface as corresponding to the box bottom. For example, the processor 120 selects a Z-slice that has a lower height than the box top and that is closest in lateral range to the box top in the height map, e.g., the Z-slice that is closest to the identified box top in the x- and y-directions in the height map. In the example height map shown in
Returning to
The processor 120 can further calculate 2950 the length and the width of the box based on the selected box top. For example, having identified the box top, the processor 120 calculates the length and width of the box top. Because the distance data for the box top is taken at an angle and may have noise, the processor 120 may filter the box top data, rotate the box top so it is aligned with an x-axis and a y-axis, and calculate horizontal and vertical profiles of the edges to determine the length and width. For example, the trailing edge of the box (i.e., the edge of the box farthest from the TOF sensor 110) may be blurred, which can make it difficult for the processor 120 to identify the trailing edge without performing additional data processing. In some examples, though described in terms of dimensioning a box top herein, the processor 120 can similarly calculate length, width, height, etc., of any identified Z-plane, regardless of whether it is a box top or box bottom, ground plane, wall, etc.
To filter the distance data, the processor 120 may compute, for each pixel in the distance data, an average pixel value based on pixel values in a region around the pixel. The processor 120 may use a different filter than the filter described with respect to action 2720. In particular, the processor 120 may use a smaller filter window than the filter used to identify the Z-planes. For example, the filtered pixel value for a given pixel may be the average value for a 5×5 or 7×7 square of pixels centered on the given pixel.
In some cases, the processor 120 may perform filtering in response to determining that there is a threshold level of sunlight in the environment of the TOF sensor 110. Furthermore, in some embodiments, the processor 120 may perform adaptive filtering based on the type or level of ambient light in the environment of the TOF sensor 110, e.g., using a larger filter window when brighter sunlight is detected, using a larger filter window when a greater frequency distribution is detected in the ambient light, or using a larger filter window when particular frequencies known to interact with the TOF sensor 110 are detected in the ambient light.
The processor 120 can maintain or discard 3210 a pixel based on whether the second surface normal estimate is substantially aligned with the identified basis vector or not. For example, where the second surface normal estimate for the pixel (e.g., as computed using the smaller size filter) is substantially aligned with the identified basis vector, the processor 120 can maintain the pixel in the collection of pixels defining the Z-plane. Otherwise, the processor 120 can discard the pixel from the collection of pixels defining the Z-plane.
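One way to express the alignment test at action 3210 is sketched below. The disclosure does not define "substantially aligned"; this sketch assumes it means the angle between the surface normal and the basis vector is below a configurable tolerance. The function name, the 10-degree default, and the decision to ignore the sign of the normal are illustrative assumptions.

```python
import numpy as np


def aligned_with_basis(normals: np.ndarray, basis: np.ndarray,
                       max_angle_deg: float = 10.0) -> np.ndarray:
    """Return a boolean mask, True where a normal lies within max_angle_deg of the basis vector.

    normals: (N, 3) array of surface normal estimates (need not be unit length).
    basis:   (3,) basis vector.
    """
    norms = np.linalg.norm(normals, axis=1, keepdims=True)
    # Zero-length normals are left as zero vectors and therefore fail the test.
    unit = normals / np.where(norms > 0, norms, 1.0)
    b = basis / np.linalg.norm(basis)
    # Assumption: a normal pointing along -basis counts as aligned, so compare |cos|.
    cos_angle = np.abs(unit @ b)
    return cos_angle >= np.cos(np.radians(max_angle_deg))
```

Pixels where the mask is True would be maintained in the collection of pixels defining the Z-plane, and the remaining pixels discarded.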
The processor 120 can maintain or discard 3220 a pixel based on an inner product of a first surface normal estimate and a second surface normal estimate. For example, for at least pixels or points in the point cloud that are determined to be candidates for edge points, the processor 120 can compute, for a given pixel, the inner product of the first surface normal computed for the pixel (based on the larger size filter) and the second surface normal computed for the pixel (based on the smaller size filter). In some examples, the first surface normal can correspond to that computed in the first surface normal computation process 2608, and the second surface normal can correspond to that computed in the second surface normal computation 2610 (and/or as similarly described at 2810 in
As another factor for refining the Z-plane (e.g., in addition or alternatively to action 3220), the processor 120 can compute 3230 a distance of coordinates of a pixel projected along an associated ray onto the Z-plane, and can maintain or discard 3240 the pixel based on comparing the distance to a threshold. For example, if the distance along the ray to the crudely determined Z-plane is within the threshold, this can indicate that the pixel belongs to the Z-plane, and the processor 120 can retain the pixel as part of the determined Z-plane. If the distance is not within the threshold, this can indicate that the pixel does not belong to the Z-plane, and the processor 120 can discard the pixel from the determined Z-plane. In one example, the processor 120 can project, for a point in the collection of adjacent points, the point to the at least one Z-plane along an associated ray; compute a distance of coordinates of the point corresponding to the projection; and maintain the point in the collection of adjacent points where the distance is within a threshold distance, or discard the point from the collection of adjacent points where the distance is not within the threshold distance. In one example, the distance can correspond to a height of the box, such as that computed at action 2940 in
As another factor for refining the Z-plane (e.g., in addition or alternatively to actions 3220, 3230, and/or 3240), the processor 120 can maintain or discard 3250 a pixel based on determining whether at least a portion of pixels in a square (or matrix) surrounding the pixel are saturated. For example, the square can be of various sizes, such as 3×3, 5×5, 7×7, 9×9, etc., and the processor 120 can determine whether a certain number or percentage of the pixels in the square are saturated (or unsaturated). In some examples, the processor 120 can determine the pixel values based on the raw data. For example, the processor 120 can consider a pixel saturated if it has a pixel value that reaches a threshold (e.g., 32767 for a 16-bit pixel). If a certain number of pixels in the square are valid (e.g., unsaturated), this can indicate that the pixel belongs to the Z-plane, and the processor 120 can retain the pixel as part of the determined Z-plane. If a certain number of pixels in the square are invalid (e.g., saturated), this can indicate that the pixel does not belong to the Z-plane, and the processor 120 can discard the pixel from the determined Z-plane. The threshold pixel value for determining saturation, and/or the number of pixels having the threshold pixel value for the subject pixel to be considered as not part of the Z-plane (or the number of pixels not having the threshold pixel value for the subject pixel to be considered as part of the Z-plane) can be configurable to adjust for certain conditions, as described above, such as based on a detected type or level of ambient light.
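The neighborhood saturation test at action 3250 might be sketched as follows. The function name, the default 3×3 window, the 25% limit, and the choice to treat out-of-image neighbors as unsaturated are assumptions for illustration; the 32767 saturation value is the example given above.

```python
import numpy as np

SATURATION_VALUE = 32767  # example saturation value for a 16-bit pixel, per the text


def keep_unsaturated(raw: np.ndarray, window: int = 3,
                     max_saturated_fraction: float = 0.25,
                     saturation_value: int = SATURATION_VALUE) -> np.ndarray:
    """Return a boolean mask, True where a pixel's neighborhood is mostly unsaturated."""
    saturated = (raw >= saturation_value).astype(np.float64)
    half = window // 2
    # Assumption: neighbors that fall outside the image count as unsaturated.
    padded = np.pad(saturated, half, mode="constant", constant_values=0.0)
    rows, cols = raw.shape
    count = np.zeros((rows, cols), dtype=np.float64)
    # Count saturated pixels in the window centered on each pixel.
    for dy in range(window):
        for dx in range(window):
            count += padded[dy:dy + rows, dx:dx + cols]
    return count / (window * window) <= max_saturated_fraction
```

Both `window` and `max_saturated_fraction` are exposed as parameters because the text notes that these thresholds can be configurable, for example based on ambient light.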
For example, the TOF sensor 110 can collect the set of raw capture data as an image having a number of rows and a number of columns. If the value at a pixel indicates saturation (e.g., a value of at least 32767 for a 16-bit pixel), the capacitor may have been saturated and the pixel value may not be meaningful. When filtering the data, if a pixel has at least a threshold number of saturated pixels around it (the threshold may be configurable), the processor 120 can exclude the pixel from or during filtering. In one example, the processor 120 can replace the pixel value pi
where the weights wi′
In one specific example, if a pixel is maintained through all of actions 3220, 3240, and 3250, the processor 120 can retain the pixel as part of the Z-plane. In another example, the processor 120 can retain the pixel as part of the Z-plane if a pixel is maintained through one or more, or substantially any combination of, actions 3220, 3240, and 3250.
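The ray-projection test of actions 3230 and 3240 can be sketched as below. The text leaves the exact distance computation open; this sketch assumes the sensor sits at the origin, each pixel's point is its measured distance times a unit ray direction, and the Z-plane is given by a unit normal and an offset. The function names and the plane parameterization are hypothetical.

```python
import numpy as np


def ray_distance_to_plane(rays: np.ndarray, depths: np.ndarray,
                          normal: np.ndarray, offset: float) -> np.ndarray:
    """Distance along each ray between the measured point and the plane normal . x = offset.

    rays:   (N, 3) unit ray directions from the sensor origin.
    depths: (N,) measured distances along those rays.
    Returns inf for rays that are parallel to the plane.
    """
    denom = rays @ normal
    with np.errstate(divide="ignore", invalid="ignore"):
        # Range at which each ray intersects the plane.
        t_plane = np.where(np.abs(denom) > 1e-9, offset / denom, np.inf)
    return np.abs(t_plane - depths)


def keep_near_plane(rays, depths, normal, offset, threshold):
    """True where the pixel lies within `threshold` of the plane, measured along its ray."""
    return ray_distance_to_plane(rays, depths, normal, offset) <= threshold
```

A pixel would then be retained as part of the Z-plane when its mask entry is True, alone or in combination with the outcomes of actions 3220 and 3250.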
Returning to
The processor 120 can extract 3130 a subset of points within the transformed distance data corresponding to the Z-plane, e.g., the connected component selected as the Z-plane at action 3030.
As described,
Having rotated the Z-plane subcloud, the processor 120 can calculate 3150 a width profile and a length profile for the Z-plane. For many TOF sensors, while the leading edges of the box closest to the sensor are sharp and easy for both humans and computers to identify, flying voxels may blur the trailing edges of the box located farther downstream from the TOF sensor 110. This causes the location of the trailing edges to be more ambiguous and difficult to identify from the distance data. The processor 120 generates Z-plane width and length profiles by projecting the points of the rotated Z-plane subcloud onto the horizontal and vertical axes.
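The projection onto the two axes can be illustrated with simple histograms. In this sketch, a profile is assumed to be the count of subcloud points falling in each fixed-width bin along an axis; the function name `axis_profiles` and the `bin_size` parameter are assumptions made for the example.

```python
import numpy as np


def axis_profiles(points_xy: np.ndarray, bin_size: float):
    """Project rotated Z-plane points onto the x and y axes.

    points_xy: (N, 2) coordinates of the rotated Z-plane subcloud.
    Returns [(x_edges, x_counts), (y_edges, y_counts)].
    """
    profiles = []
    for axis in (0, 1):
        coords = points_xy[:, axis]
        # Align the first bin edge to a multiple of bin_size at or below the minimum.
        start = np.floor(coords.min() / bin_size) * bin_size
        n_bins = int(np.ceil((coords.max() - start) / bin_size)) + 1
        edges = start + bin_size * np.arange(n_bins + 1)
        counts, _ = np.histogram(coords, bins=edges)
        profiles.append((edges, counts))
    return profiles
```

For a cleanly imaged rectangular box top, each profile is roughly flat across the interior and falls off at the edges, with a more gradual fall-off on the blurred trailing side.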
The processor 120 can identify 3160 the leading edges and trailing edges in the width and length profiles. For example, the processor 120 applies one or more rules to identify the edges from the profiles. The processor 120 may fit lines to each of the profiles' interiors and define a leading edge as a location where the profile equals a set percentage of the linear fit's value, e.g., 40% of the linear fit's value. The processor 120 may define the trailing edge by a location where the profile equals the same percentage or different percentage of the linear fit's value, e.g., a particular value in the range of 25% to 85%. The processor 120 may further apply one or more rules to determine the trailing edge fraction. For example, trailing edge percentage thresholds may vary with the height of the box. Alternatively, percentage thresholds can differ for the shorter and longer of the two Z-plane edges.
The processor 120 can calculate 3170 the width and length of the box top, or other Z-plane, based on the determined leading edges and trailing edges. In particular, the width is the distance between the leading edge and trailing edge in the width projection, and the length is the distance between the leading edge and trailing edge in the length projection.
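The edge rules and the resulting dimension can be sketched as follows. This is one possible reading, under stated assumptions: the "interior" is taken as the middle half of the occupied bins, the leading edge is assumed to lie at the low-coordinate end of the profile, and the crossing point is linearly interpolated between bins. The function name and all default values other than the 40% example are hypothetical, and the profile is assumed to contain enough occupied bins for a line fit.

```python
import numpy as np


def find_edges(centers, profile, lead_frac=0.4, trail_frac=0.4, interior=(0.25, 0.75)):
    """Return (leading, trailing) edge positions for one profile."""
    centers = np.asarray(centers, dtype=float)
    profile = np.asarray(profile, dtype=float)
    occupied = np.flatnonzero(profile > 0)
    lo, hi = int(occupied[0]), int(occupied[-1])
    i0 = lo + int(round(interior[0] * (hi - lo)))
    i1 = lo + int(round(interior[1] * (hi - lo)))
    # Fit a line to the interior of the profile and evaluate it at every bin.
    slope, intercept = np.polyfit(centers[i0:i1 + 1], profile[i0:i1 + 1], 1)
    fit = np.clip(slope * centers + intercept, 0.0, None)

    def crossing(indices, frac):
        # Walk from outside the box toward the interior until the profile
        # reaches the requested fraction of the fitted line.
        prev = None
        for i in indices:
            gap = profile[i] - frac * fit[i]
            if profile[i] > 0 and gap >= 0:
                if prev is None:
                    return centers[i]
                prev_gap = profile[prev] - frac * fit[prev]
                if prev_gap >= 0:
                    return centers[i]
                t = -prev_gap / (gap - prev_gap)
                return centers[prev] + t * (centers[i] - centers[prev])
            prev = i
        raise ValueError("profile never reaches the requested fraction of the fit")

    leading = crossing(range(0, i1 + 1), lead_frac)
    trailing = crossing(range(len(profile) - 1, i0 - 1, -1), trail_frac)
    return leading, trailing
```

The width (or length) is then the difference `trailing - leading` for the corresponding profile. Separate `lead_frac` and `trail_frac` arguments reflect the option of using different percentages for the two edges.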
Returning to
The processor 120 may project the image of the three-dimensional box onto a two-dimensional image plane of the camera 130 to generate an overlay image, e.g., an outline overlaying the image of the box, e.g., a wire-frame. The calculated width, length, and height dimensions may also be reported in the graphical display, either along the edges or in a separate area. A user can view the graphical display in the display device 140 to qualitatively confirm that the sensor system 100 has correctly identified the box and correctly identified the edges and surfaces. Examples are illustrated in
The following paragraphs provide various examples of the embodiments disclosed herein.
Example 1 provides a method for identifying a Z-plane, the method including receiving distance data describing distances between a sensor that captured the distance data and a plurality of surfaces in an environment of the sensor, where at least one of the surfaces is a Z-plane; generating a point cloud based on the distance data, the point cloud in a frame of reference of the sensor; identifying a basis vector representing a peak direction across the point cloud; transforming the point cloud into a frame of reference of the basis vector; and identifying a Z-plane in the transformed point cloud.
Example 2 provides the method of example 1, where the sensor is a TOF sensor including a light source and an image sensor.
Example 3 provides the method of example 1, where the distance data is arranged in a plurality of pixels within an image frame of the sensor.
Example 4 provides the method of example 3, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the sensor, and the individual pixel has an associated ray direction describing a direction from the sensor to the surface.
Example 5 provides the method of example 4, where generating the point cloud involves multiplying the ray direction for the individual pixel by the distance to the one of the plurality of surfaces for the individual pixel.
Example 6 provides the method of example 1, where the distance data is arranged as a plurality of pixels, the method further including filtering the distance data by computing, for an individual pixel, an average pixel value based on pixel values in a region around the individual pixel.
Example 7 provides the method of example 1, where identifying the basis vector includes computing surface normals for points in the point cloud; and extracting the basis vector based on the computed surface normals, the basis vector representing the peak direction of the surface normals across the point cloud.
Example 8 provides the method of example 7, where computing the surface normal for points in the point cloud includes computing angular coordinates of the surface normals of the points in the point cloud.
Example 9 provides the method of example 8, where extracting the basis vector includes binning the angular coordinates of the surface normals; identifying a peak angle of each of the angular coordinates; and identifying the basis vector based on the identified peak angles.
Example 10 provides the method of example 7, where computing a surface normal for an individual point in the point cloud includes fitting a plane to a set of points in a region around the individual point.
Example 11 provides the method of example 1, where the basis vector is a first basis vector, the method further including selecting a second basis vector orthogonal to the first basis vector and a third basis vector orthogonal to the first basis vector and the second basis vector, where the frame of reference of the basis vector is a frame of reference of the first basis vector, the second basis vector, and the third basis vector.
Example 12 provides the method of example 11, where the second basis vector is selected as a projection of a pointing direction of the sensor into a Z-plane, and the third basis vector is set equal to a cross product of the first basis vector and the second basis vector.
Example 13 provides the method of example 1, where identifying the Z-plane in the transformed point cloud includes generating a height map of the transformed point cloud; generating a profile representation of the height map, the profile representation having a peak corresponding to each of a plurality of Z-planes; and identifying the Z-plane in the profile representation.
Example 14 provides the method of example 13, where the identified Z-plane is a base Z-plane, the method further including setting a height of the base Z-plane to zero.
Example 15 provides the method of example 13, further including associating a point in the transformed point cloud with the identified Z-plane based on determining that a height of the point is within a height range associated with the identified Z-plane.
Example 16 provides an imaging system including a TOF depth sensor to obtain distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor; and a processor to receive the distance data from the TOF depth sensor; generate a point cloud based on the distance data, the point cloud in a frame of reference of the TOF depth sensor; identify a basis vector representing a peak direction across the point cloud; transform the point cloud into a frame of reference of the basis vector; and identify a Z-plane in the transformed point cloud.
Example 17 provides the system of example 16, where the TOF depth sensor includes a light source to illuminate the environment of the TOF depth sensor and an image sensor to sense reflected light.
Example 18 provides the system of example 16, where the TOF depth sensor has an image frame, and the distance data is arranged in a plurality of pixels within the image frame.
Example 19 provides the system of example 18, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the TOF depth sensor, and the individual pixel has an associated ray direction describing a direction from the TOF depth sensor to the surface.
Example 20 provides the system of example 19, where, to generate the point cloud, the processor multiplies the ray direction for the individual pixel by the distance to the one of the plurality of surfaces for the individual pixel.
Example 21 provides the system of example 16, further including a camera to capture an image of the environment of the TOF depth sensor.
Example 22 provides the system of example 21, further including a display screen, the processor to display, on the display screen, the image captured by the camera and a visual indication of the identified Z-plane.
Example 23 provides the system of example 16, further including a light sensor for detecting sunlight in the environment of the TOF depth sensor, where the processor applies a filter to the distance data in response to detecting at least a threshold level of sunlight.
Example 24 provides a method for determining dimensions of a physical box, the method including receiving distance data describing distances between a sensor and a plurality of surfaces in an environment of the sensor, at least a portion of the surfaces corresponding to a box to be measured; transforming the distance data into a frame of reference of one of the surfaces in the environment of the sensor; selecting, from the plurality of surfaces in the environment of the sensor, a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculating a height between the first surface and the second surface; and calculating a length and a width based on the selected first surface corresponding to the top of the box.
Example 25 provides the method of example 24, where the distance data is a point cloud in a frame of reference of the sensor.
Example 26 provides the method of example 25, where transforming the distance data into the frame of reference of one of the surfaces in the environment of the sensor includes identifying a basis vector representing a peak direction across the point cloud; and transforming the point cloud into a frame of reference of the basis vector.
Example 27 provides the method of example 26, where identifying the basis vector includes computing angular coordinates of surface normals for points in the point cloud; and extracting the basis vector based on the computed angular coordinates of the surface normals, the basis vector representing the peak direction of the surface normals across the point cloud.
Example 28 provides the method of example 24, where the sensor is a TOF sensor including a light source and an image sensor.
Example 29 provides the method of example 24, where the one of the surfaces used as the frame of reference for transforming the distance data is a Z-plane.
Example 30 provides the method of example 24, where selecting the first surface includes identifying a plurality of connected components within the transformed distance data, each connected component having a respective height along a Z-axis in the frame of reference of the one of the surfaces; and selecting, as the first surface, one of the plurality of connected components by applying a set of rules to the plurality of connected components.
Example 31 provides the method of example 30, where identifying the plurality of connected components includes identifying a plurality of Z-slices of the transformed distance data, each of the plurality of Z-slices having a respective height along the Z-axis; and identifying, within each of the plurality of Z-slices, at least one connected component of height map pixels.
Example 32 provides the method of example 31, where identifying the plurality of Z-slices includes generating a height map of the distance data; generating a profile representation of the height map, the profile representation having a peak corresponding to each Z-slice; and identifying the plurality of Z-slices from the profile representation.
Example 33 provides the method of example 31, where selecting the second surface corresponding to the surface the box is resting on includes selecting a Z-slice of the plurality of Z-slices within a lateral range of the selected first surface.
Example 34 provides the method of example 30, where the set of rules applied to the plurality of connected components includes removing a connected component having a width or length less than a threshold minimum width or length; removing a connected component at least a threshold distance from another connected component; and removing a connected component having an enclosing convex hull polygon that deviates from an expected rectangular shape by at least a threshold deviation.
Example 35 provides the method of example 24, where calculating the length and the width based on the selected first surface involves extracting a subset of the transformed distance data corresponding to the selected first surface; calculating a length profile and a width profile of the subset; identifying, within the width profile, a first leading edge and a first trailing edge of the box; identifying, within the length profile, a second leading edge and a second trailing edge of the box; and calculating the width of the box between the first leading edge and the first trailing edge and calculating the length of the box between the second leading edge and the second trailing edge.
Example 36 provides the method of example 24, further including determining an angle of rotation for the extracted subset corresponding to the first selected surface, the determined angle selected to minimize a sum of projections of edges of the first selected surface onto a set of axes of the frame of reference of one of the surfaces in the environment of the sensor; and rotating the extracted subset corresponding to the first selected surface by the determined angle.
Example 37 provides the method of example 24, where the transformed distance data includes a plurality of pixels, and calculating the length and the width based on the selected first surface corresponding to the top of the box includes, for at least pixels in the selected first surface, filtering the pixels by computing, for an individual pixel, an average pixel value based on pixel values in a region around the individual pixel; and calculating the length and width based on the filtered pixels in the selected first surface.
Example 38 provides the method of example 24, further including generating a visual representation of the box, the visual representation indicating the height, width, and length of the box.
Example 39 provides the method of example 24, further including calculating an IoU score based on an overlap between the first surface corresponding to the top of the box and a circle in a field of view of the sensor; and generating a display including the calculated IoU score.
Example 40 provides the method of example 24, further including receiving camera data from a camera, the camera having a camera field of view that at least partially overlaps with a field of view of the sensor; determining, based on the camera data, an intensity of at least a portion of the camera field of view; and generating a display including the determined intensity.
Example 41 provides an imaging system including a TOF depth sensor to obtain distance data describing distances between the TOF depth sensor and a plurality of surfaces in an environment of the TOF depth sensor; and a processor to receive the distance data from the TOF depth sensor; transform the distance data into a frame of reference of one of the surfaces in the environment of the sensor; select a first surface corresponding to a top of the box and a second surface corresponding to a surface the box is resting on; calculate a height between the first surface and the second surface; and calculate a length and a width based on the selected first surface corresponding to the top of the box.
Example 42 provides the system of example 41, where the TOF depth sensor includes a light source to illuminate the environment of the TOF depth sensor and an image sensor to sense reflected light.
Example 43 provides the system of example 41, where the TOF depth sensor has an image frame, and the distance data is arranged in a plurality of pixels within the image frame.
Example 44 provides the system of example 43, where an individual pixel has a distance to one of the plurality of surfaces in the environment of the TOF depth sensor, and the individual pixel has an associated ray direction describing a direction from the TOF depth sensor to the surface.
Example 45 provides the system of example 41, further including a camera to capture an image of the environment of the TOF depth sensor.
Example 46 provides the system of example 45, further including a display screen, the processor to display, on the display screen, the image captured by the camera and the calculated width, length, and height.
Example 47 provides the system of example 45, further including a display screen, the processor to display, on the display screen, the image captured by the camera and an overlaid depiction of the selected first surface.
Example 48 provides the system of example 47, the processor further to display, on the display screen, a plurality of box edges below the selected first surface.
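As an illustration of the point cloud generation in Examples 5 and 20 and the basis vectors in Examples 11 and 12, the following sketch builds a point cloud from per-pixel ray directions and distances, then constructs a rotation into the frame of the three basis vectors. The function names, and the axis ordering that places the first basis vector last so that height is the third coordinate, are assumptions for the example.

```python
import numpy as np


def point_cloud(ray_dirs: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Multiply each pixel's unit ray direction (H, W, 3) by its distance (H, W)."""
    return ray_dirs * distances[..., None]


def basis_frame(first: np.ndarray, pointing: np.ndarray) -> np.ndarray:
    """Return a 3x3 rotation whose rows are the second, third, and first basis vectors.

    first:    basis vector extracted from the surface normals (Z-plane normal).
    pointing: pointing direction of the sensor, in the sensor frame.
    """
    e1 = first / np.linalg.norm(first)
    # Second basis vector: pointing direction projected into the Z-plane.
    in_plane = pointing - (pointing @ e1) * e1
    length = np.linalg.norm(in_plane)
    if length < 1e-9:
        raise ValueError("pointing direction is parallel to the first basis vector")
    e2 = in_plane / length
    # Third basis vector: cross product of the first and second basis vectors.
    e3 = np.cross(e1, e2)
    # Assumed ordering: (x, y, z) = (e2, e3, e1), so z is height along the Z-plane normal.
    return np.stack([e2, e3, e1])
```

A cloud of shape (H, W, 3) can be transformed with `cloud @ basis_frame(first, pointing).T`, after which the last coordinate of each point is its height along the first basis vector.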
It is to be understood that not necessarily all objects or advantages may be achieved in accordance with any particular embodiment described herein. Thus, for example, those skilled in the art will recognize that certain embodiments may be configured to operate in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.
In one example embodiment, any number of electrical circuits of the figures may be implemented on a board of an associated electronic device. The board can be a general circuit board that can hold various components of the internal electronic system of the electronic device and, further, provide connectors for other peripherals. More specifically, the board can provide the electrical connections by which the other components of the system can communicate electrically. Any suitable processors (inclusive of digital signal processors, microprocessors, supporting chipsets, etc.), computer-readable non-transitory memory elements, etc. can be suitably coupled to the board based on particular configuration needs, processing demands, computer designs, etc. Other components such as external storage, additional sensors, controllers for audio/video display, and peripheral devices may be attached to the board as plug-in cards, via cables, or integrated into the board itself. In various embodiments, the functionalities described herein may be implemented in emulation form as software or firmware running within one or more configurable (e.g., programmable) elements arranged in a structure that supports these functions. The software or firmware providing the emulation may be provided on non-transitory computer-readable storage medium comprising instructions to allow a processor to carry out those functionalities.
It is also imperative to note that all of the specifications, dimensions, and relationships outlined herein (e.g., the number of processors, logic operations, etc.) have been offered for purposes of example and teaching only. Such information may be varied considerably without departing from the spirit of the present disclosure, or the scope of the appended claims. The specifications apply only to one non-limiting example and, accordingly, they should be construed as such. In the foregoing description, example embodiments have been described with reference to particular arrangements of components. Various modifications and changes may be made to such embodiments without departing from the scope of the appended claims. The description and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
Note that with the numerous examples provided herein, interaction may be described in terms of two, three, four, or more components. However, this has been done for purposes of clarity and example only. It should be appreciated that the system can be consolidated in any suitable manner. Along similar design alternatives, any of the illustrated components, modules, and elements of the FIGS. may be combined in various possible configurations, all of which are clearly within the broad scope of this Specification.
Note that in this Specification, references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) included in “one embodiment”, “example embodiment”, “an embodiment”, “another embodiment”, “some embodiments”, “various embodiments”, “other embodiments”, “alternative embodiment”, and the like are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained to one skilled in the art and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims. Note that all optional features of the systems and methods described above may also be implemented with respect to the methods or systems described herein and specifics in the examples may be used anywhere in one or more embodiments.
In order to assist the United States Patent and Trademark Office (USPTO) and, additionally, any readers of any patent issued on this application in interpreting the claims appended hereto, Applicant wishes to note that the Applicant: (a) does not intend any of the appended claims to invoke paragraph (f) of 35 U.S.C. Section 112 as it exists on the date of the filing hereof unless the words “means for” or “step for” are specifically used in the particular claims; and (b) does not intend, by any statement in the Specification, to limit this disclosure in any way that is not otherwise reflected in the appended claims.
This application is a continuation-in-part of PCT application no. PCT/US2021/051238, filed Sep. 21, 2021, and entitled “Z-PLANE IDENTIFICATION AND BOX DIMENSIONING USING THREE-DIMENSIONAL TIME-OF-FLIGHT IMAGING,” which claims priority to U.S. provisional patent application nos. 63/081,742, filed Sep. 22, 2020 and entitled “BOX DIMENSIONING USING THREE-DIMENSIONAL TIME-OF-FLIGHT IMAGING,” and 63/081,775, filed Sep. 22, 2020 and entitled “WORLD Z-PLANE IDENTIFICATION IN TIME-OF-FLIGHT IMAGERY,” which are hereby incorporated by reference in their entireties.
Number | Date | Country
---|---|---
63081742 | Sep 2020 | US
63081775 | Sep 2020 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2021/051238 | Sep 2021 | US
Child | 18188316 | | US